Patent classifications
G06F16/2438
CHANGE STREAM ANALYTICS FOR DATA REPLICATION SYSTEMS
A computer-implemented method includes, by a streaming analytics component, identifying a source database table and a target database table. The target database table includes one or more target database rows and one or more target database columns. The method further includes identifying a change stream including a plurality of row-level modifications that cause the target database table to replicate the source database table. The method further includes determining that each row-level modification affects one or more common columns of the target database columns, wherein the common columns exhibit one or more common values for those of the target database rows that are affected by the row-level modifications. The method further includes generating, based on the common values and the common columns, a reconstructed modification statement and sending the reconstructed modification statement to an apply component. A corresponding computer program product and computer system are also disclosed.
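To illustrate the coalescing idea, the following Python sketch merges row-level UPDATEs that set the same columns to the same values into one parameterized statement. The `RowChange` type, the `reconstruct_update` function, and the restriction to UPDATEs keyed by a single primary-key column are assumptions made for illustration. The abstract does not specify how the apply component consumes the statement.

```python
from dataclasses import dataclass, field


@dataclass
class RowChange:
    """One row-level UPDATE taken from a (hypothetical) change stream."""
    key: int                                         # primary-key value of the target row
    new_values: dict = field(default_factory=dict)   # column name -> new value


def common_assignments(changes):
    """Return the column/value pairs that every change sets identically."""
    if not changes:
        return {}
    shared = dict(changes[0].new_values)
    for change in changes[1:]:
        # Keep a column only if this change also sets it to the same value.
        shared = {col: val for col, val in shared.items()
                  if col in change.new_values and change.new_values[col] == val}
    return shared


def reconstruct_update(table, key_column, changes):
    """Coalesce row-level UPDATEs into one parameterized statement.

    Returns (sql, params), or None when the changes do not all assign exactly
    the same columns the same values (they would then be applied row by row).
    Table and column names are assumed to be trusted identifiers.
    """
    shared = common_assignments(changes)
    if not shared or any(change.new_values != shared for change in changes):
        return None
    columns = sorted(shared)
    keys = sorted(change.key for change in changes)
    set_clause = ", ".join(f"{col} = ?" for col in columns)
    placeholders = ", ".join("?" for _ in keys)
    sql = f"UPDATE {table} SET {set_clause} WHERE {key_column} IN ({placeholders})"
    return sql, [shared[col] for col in columns] + keys
```

A `None` result signals that the changes should be applied individually. The `?` placeholders keep values out of the SQL text and match the parameter style of Python's `sqlite3` module.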
Building a word embedding model to capture relational data semantics
A computer-implemented method according to one embodiment includes identifying a relational database; determining columns of interest within the relational database; creating an unordered group of string tokens for each row of the relational database, utilizing the determined columns of interest; assigning weights for one or more columns within the relational database to one or more string tokens within each unordered group of string tokens to create a plurality of weighted unordered groups of string tokens; and determining a meaning vector for an identifier of each row of the relational database, utilizing the plurality of weighted unordered groups of string tokens.
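Below is a minimal sketch of the token-grouping step. It assumes that rows arrive as Python dicts and that tokens take the form `column:word`. In place of the trained word-embedding model, it uses a weighted bag-of-tokens vector, so the "meaning vector" here is only a sparse stand-in. `weighted_token_groups`, `cosine`, and the column weights are hypothetical.

```python
import math
from collections import defaultdict


def weighted_token_groups(rows, id_column, columns_of_interest, column_weights):
    """Map each row identifier to an unordered group of token -> weight.

    Tokens are prefixed with their column name so equal words from different
    columns stay distinct; each token inherits its column's weight.
    """
    groups = {}
    for row in rows:
        group = defaultdict(float)
        for col in columns_of_interest:
            value = row.get(col)
            if value is None:
                continue
            for word in str(value).lower().split():
                group[f"{col}:{word}"] += column_weights.get(col, 1.0)
        groups[row[id_column]] = dict(group)
    return groups


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Because the city column carries the larger weight, rows that share a city score as more similar than rows that share only a product.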
Method and apparatus to implement a home computing cloud
A home computing system (cloud) integrates a protocols gateway, WiFi router, cloud server, and mass storage device to support one or more Internet of Things (IoT) devices, possibly with different connectivity protocols, in a local environment such as a residential home. The home computing cloud often reduces the amount of data traffic sent to a public computing cloud by processing collected device data locally rather than sending the device data to the public computing cloud for processing. The home computing cloud may download an appropriate data analytic model from the public computing cloud, train the model locally (for example, using reinforcement learning), and execute the trained model locally to obtain prediction information from collected IoT device data. The home computing cloud also allows user applications to access the connected IoT devices directly via the internet, through a protocols gateway and an IoT message translator.
Data manipulation record storage method, system, apparatus, and device
Computer-implemented methods, computer-implemented systems, and non-transitory, computer-readable media for data manipulation record storage. One computer-implemented method includes: sending, by a server, one or more manipulation instructions to a database, wherein the one or more manipulation instructions are in a structured query language (SQL) format; receiving, by the server from the database, an execution result of the one or more manipulation instructions; generating, by the server, one or more data records comprising the one or more manipulation instructions and the execution result; determining, by the server, that a predetermined condition for generating a data block is satisfied; and generating, by the server, the data block that includes at least a portion of the data records.
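The record-and-block flow can be sketched with Python's built-in `sqlite3` module standing in for the database. The `RecordStore` class is hypothetical. The sketch also assumes the "predetermined condition" is a fixed number of pending records, because the abstract specifies neither the condition nor the block format.

```python
import sqlite3


class RecordStore:
    """Logs each SQL instruction with its execution result and groups the
    records into blocks once a predetermined condition holds.

    Assumption: the condition is simply "records_per_block records are pending".
    """

    def __init__(self, connection, records_per_block=2):
        self.connection = connection
        self.records_per_block = records_per_block
        self.pending = []  # data records not yet placed in a block
        self.blocks = []   # generated data blocks, each a list of records

    def execute(self, sql, params=()):
        cursor = self.connection.execute(sql, params)
        # Queries return their rows; other statements report the affected-row count.
        result = cursor.fetchall() if cursor.description else cursor.rowcount
        self.pending.append({"instruction": sql, "params": tuple(params), "result": result})
        if len(self.pending) >= self.records_per_block:
            self.blocks.append(self.pending)
            self.pending = []
        return result
```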
SYSTEM AND METHOD FOR AT-SOURCE MASKING OF SENSITIVE DATA
This disclosure relates to a system and method for at-source data masking and for discovery of a unique identifier for at-source masking. The method reads, from a source database, a table of a production database that comprises a sensitive column. A unique identifier column is identified, and a temporary table having three or more columns is created. The columns of the temporary table comprise the sensitive column from the production table, a column for the masked data of the sensitive column, and the unique identifier column. The sensitive column of the temporary table is masked using a known masking technique, and the original data and the masked data of the sensitive column are inserted into the temporary table. Finally, the production database is updated with the masked data of the sensitive column.
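Here is a rough sketch of the masking steps against an SQLite database, using only the standard library. Several details are assumptions: the uniqueness test in `find_unique_identifier`, the temporary table name `mask_work`, and the use of a truncated SHA-256 digest as the "known masking technique". Real deployments often prefer format-preserving masking.

```python
import hashlib
import sqlite3


def find_unique_identifier(con, table, exclude=()):
    """Return the first column whose values are non-null and distinct in every row."""
    columns = [info[1] for info in con.execute(f"PRAGMA table_info({table})")]
    total = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for col in columns:
        if col in exclude:
            continue
        distinct, non_null = con.execute(
            f"SELECT COUNT(DISTINCT {col}), COUNT({col}) FROM {table}").fetchone()
        if distinct == total and non_null == total:
            return col
    raise ValueError(f"no unique identifier column in {table}")


def mask(value):
    """Stand-in masking technique: the first 12 hex digits of a SHA-256 digest."""
    return hashlib.sha256(str(value).encode()).hexdigest()[:12]


def mask_at_source(con, table, sensitive_column):
    """Mask one column in place via a temporary (uid, original, masked) table.

    Table and column names are interpolated, so they must be trusted identifiers.
    """
    uid = find_unique_identifier(con, table, exclude=(sensitive_column,))
    con.execute("CREATE TEMP TABLE mask_work (uid, original, masked)")
    rows = con.execute(f"SELECT {uid}, {sensitive_column} FROM {table}").fetchall()
    con.executemany("INSERT INTO mask_work VALUES (?, ?, ?)",
                    [(u, v, mask(v)) for u, v in rows])
    # Copy the masked values back into the production table, matched on uid.
    con.execute(
        f"UPDATE {table} SET {sensitive_column} = "
        f"(SELECT masked FROM mask_work WHERE mask_work.uid = {table}.{uid})")
    con.execute("DROP TABLE mask_work")
    con.commit()
```

The sensitive column is excluded from the identifier search so that the table is never matched on the values being overwritten.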
Architecture for resolution of inconsistent item identifiers in a global catalog
Systems and methods are provided for using language-agnostic embedding data to analyze a plurality of items associated with an item identifier in a catalog and determine a characteristic associated with the plurality of items or the item identifier. A distribution may be generated based on the language-agnostic embedding data and groups of items may be identified based on the distribution. Based on the groups of items, the item identifier may be classified as a consistent item identifier or an inconsistent item identifier across the catalog. A primary group of items and secondary groups of items can be identified for the plurality of items. The primary group of items may include items with a verified association with the item identifier and the secondary groups of items may include items with an unverified association with the item identifier.
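The classification step can be made concrete with a sketch that groups an identifier's item embeddings and then labels the identifier. It replaces the distribution-based grouping described above with a simple greedy cosine-threshold grouping. `group_items`, `classify_identifier`, and the 0.9 threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_items(embeddings, threshold=0.9):
    """Greedy single pass: an item joins the first group whose seed is similar enough."""
    groups = []  # each group is a list of item ids; the first id is the seed
    for item_id, vec in embeddings.items():
        for group in groups:
            if cosine(vec, embeddings[group[0]]) >= threshold:
                group.append(item_id)
                break
        else:
            groups.append([item_id])
    return groups


def classify_identifier(embeddings, threshold=0.9):
    """Label an identifier consistent when all of its items fall into one group.

    The largest group is treated as the primary group; any others are secondary.
    """
    groups = sorted(group_items(embeddings, threshold), key=len, reverse=True)
    primary, secondary = groups[0], groups[1:]
    label = "consistent" if not secondary else "inconsistent"
    return label, primary, secondary
```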
VECTOR-BASED SEARCH RESULT GENERATION
A system and method are provided to generate search results in response to a search query based on comparisons of embedding vectors. The system and method receive, from an end user system, a search query including a set of keywords associated with an entity. Using a neural network, an embedding vector is identified based on the set of keywords of the search query. The system and method compare the embedding vector associated with the search query to a set of embedding vectors associated with a set of structured data elements relating to the entity. Based on the comparison, a set of matching structured data elements is identified. The system and method generate a search result in response to the search query, wherein the search result includes at least a portion of the set of matching structured data elements. The search result is displayed via an interface of the end user system.
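The compare-and-match flow can be sketched as follows. A unit-length bag-of-words vector stands in for the neural-network embedding, so the example covers only the retrieval mechanics (cosine scoring and top-k selection). `embed`, `search`, and `top_k` are illustrative names and are not part of the disclosure.

```python
import math


def tokens(text):
    return text.lower().split()


def embed(text, vocab):
    """Toy stand-in for a neural encoder: a unit-length bag-of-words vector."""
    vec = [0.0] * len(vocab)
    for word in tokens(text):
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def search(query, elements, top_k=3):
    """Return up to top_k structured elements ranked by similarity to the query."""
    texts = [" ".join(str(v) for v in el.values()) for el in elements]
    vocab = {}
    for text in texts:
        for word in tokens(text):
            vocab.setdefault(word, len(vocab))
    q = embed(query, vocab)
    scored = []
    for element, text in zip(elements, texts):
        score = sum(a * b for a, b in zip(q, embed(text, vocab)))  # cosine of unit vectors
        if score > 0:
            scored.append((score, element))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [el for _, el in scored[:top_k]]
```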
System and method for SQL query extraction in an application
This disclosure relates generally to a method and system for extracting SQL queries in an application. Various conventional approaches model the problem of extracting SQL queries at a specific program point as an instance of the string expression problem. Many string-analysis-based solutions do not scale to large applications, and those that do scale do not explicitly account for heap-based data flow. In an embodiment, the disclosed method and system utilize a multi-criteria slicing-based model that takes into account the data flowing through the heap and contributing to SQL query generation.
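Program slicing itself can be illustrated with a single-criterion backward slice over straight-line code. Heap locations are modelled as `object.field` names, so data stored on the heap still reaches the query. This is a simplified stand-in for the multi-criteria slicing model described above: it ignores control flow and aliasing. `Stmt` and `backward_slice` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """A simplified statement: `target` is assigned from the names in `uses`.

    Heap locations appear as 'object.field' names (no alias analysis).
    """
    target: str
    uses: tuple
    text: str


def backward_slice(program, criterion_index, criterion_vars):
    """Return statements before criterion_index that may affect criterion_vars."""
    relevant = set(criterion_vars)
    sliced = []
    for stmt in reversed(program[:criterion_index]):
        if stmt.target in relevant:
            sliced.append(stmt)
            relevant.discard(stmt.target)   # this definition is now accounted for
            relevant.update(stmt.uses)      # its inputs become relevant in turn
    return list(reversed(sliced))
```

With a query executed on `sql` after the last statement, the slice keeps the heap write to `cfg.table` and drops the unrelated `limit` and `log` assignments.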
ACCOUNTING FOR ITEM ATTRIBUTES WHEN SELECTING ITEMS SATISFYING A QUERY BASED ON ITEM EMBEDDINGS AND AN EMBEDDING FOR THE QUERY
An online system maintains various items and maintains values for different attributes of the items, as well as an item embedding for each item. When the online system receives a query for retrieving one or more items, the online system generates an embedding for the query. Based on measures of similarity between the embedding for the query and item embeddings, the online system selects a set of items. The online system identifies a specific attribute of items and generates a whitelist of values for the specific attribute based on measures of similarity between item embeddings for items in the selected set and the embedding for the query. The online system removes items having values for the specific attribute outside of the whitelist of values from the selected set of items to identify items more likely to be relevant to the query.
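One way to realize the whitelist step is sketched below. The sketch takes the most similar items, scores each attribute value by the mean similarity of the selected items that carry it, and keeps values close to the best score. The mean aggregation, `keep_ratio`, `top_n`, and the function names are assumptions. The abstract states only that the whitelist is based on item-to-query similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_items(query_vec, items, attribute, top_n=4, keep_ratio=0.9):
    """Select the top_n most similar items, then drop those whose attribute
    value is not whitelisted. Assumes non-negative similarity scores.
    """
    scored = sorted(((cosine(query_vec, it["embedding"]), it) for it in items),
                    key=lambda pair: pair[0], reverse=True)[:top_n]
    if not scored:
        return [], set()
    by_value = {}
    for score, it in scored:
        by_value.setdefault(it[attribute], []).append(score)
    # Score each attribute value by the mean similarity of its selected items.
    value_scores = {v: sum(s) / len(s) for v, s in by_value.items()}
    best = max(value_scores.values())
    whitelist = {v for v, s in value_scores.items() if s >= keep_ratio * best}
    return [it for _, it in scored if it[attribute] in whitelist], whitelist
```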
Artificial intelligence and/or machine learning systems and methods for evaluating audiences in an embedding space based on keywords
In some embodiments, a method includes determining a position for a search query and a position for each audience record from multiple audience records in an embedding space. The method further includes receiving multiple device records, each associated with an audience record. The method further includes determining multiple keywords, each associated with an audience record, and determining a position for each keyword in the embedding space. The method further includes calculating a first distance between the position of the search query in the embedding space and the position of each audience record in the embedding space. The method further includes calculating a second distance between the position of the search query in the embedding space and the position of each keyword in the embedding space. The method further includes ranking each audience record based on the first distance and the second distance.
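The ranking step can be sketched under three assumptions: Euclidean distance, a second distance measured to each audience's nearest keyword, and a blend of the two distances controlled by a weight `alpha`. None of these choices comes from the abstract, and `rank_audiences` is a hypothetical name. Device records are omitted because they do not enter the ranking in this sketch.

```python
import math


def rank_audiences(query_pos, audiences, keywords, alpha=0.5):
    """Rank audience ids by alpha * d1 + (1 - alpha) * d2, smallest first.

    audiences: audience id -> position in the embedding space
    keywords:  audience id -> list of keyword positions
    d1 is the query-to-audience distance; d2 is the query-to-nearest-keyword
    distance, falling back to d1 when an audience has no keywords.
    """
    scores = {}
    for aud_id, aud_pos in audiences.items():
        first = math.dist(query_pos, aud_pos)
        second = min((math.dist(query_pos, k) for k in keywords.get(aud_id, [])),
                     default=first)
        scores[aud_id] = alpha * first + (1 - alpha) * second
    return sorted(scores, key=scores.get)
```

With `alpha=1.0` the keyword distance is ignored, and the ranking reduces to the first distance alone.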