G06F16/2282

RELATIONSHIP ANALYSIS USING VECTOR REPRESENTATIONS OF DATABASE TABLES
20230051059 · 2023-02-16

A computer-implemented method includes representing a plurality of database tables as respective vectors in a multi-dimensional vector space, receiving an indication that a first database table represented by a first vector and a second database table represented by a second vector are related to each other, moving the respective vectors representing the plurality of database tables in the multi-dimensional vector space in response to the indication, and grouping the plurality of database tables into one or more table clusters based on positions of the respective vectors representing the plurality of database tables in the multi-dimensional vector space.
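The abstract does not specify how the vectors move or how clusters are formed. The Python sketch below assumes a simple rule. When two tables are marked as related, their vectors are pulled toward each other by a fraction `rate` of the gap between them. Clusters are then formed by single-linkage grouping of vectors that lie within a distance `threshold` of each other. The function names, `rate`, and `threshold` are hypothetical. Only the related pair moves in this sketch, whereas the abstract describes moving the vectors of the whole plurality of tables.

```python
import math

def move_related(vectors, i, j, rate=0.5):
    # Hypothetical update rule: pull the vectors of related tables i and j toward
    # each other; rate=1.0 places both at their midpoint. Input is not mutated.
    vi, vj = vectors[i], vectors[j]
    new_i = [a + rate * (b - a) / 2 for a, b in zip(vi, vj)]
    new_j = [b + rate * (a - b) / 2 for a, b in zip(vi, vj)]
    moved = dict(vectors)
    moved[i], moved[j] = new_i, new_j
    return moved

def cluster_tables(vectors, threshold):
    # Single-linkage clustering with union-find: tables whose vectors are within
    # `threshold` of each other (directly or through a chain) share a cluster.
    names = sorted(vectors)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            if math.dist(vectors[a], vectors[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())
```

For example, with `orders` at (0, 0), `customers` at (2, 0), and `logs` far away, a `threshold` of 1.5 keeps all three tables apart. After `move_related(vectors, "orders", "customers", rate=0.5)` the two related tables sit at (0.5, 0) and (1.5, 0) and fall into one cluster.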

MACHINE LEARNING TECHNIQUES FOR EFFICIENT DATA PATTERN RECOGNITION ACROSS STRUCTURED DATA OBJECTS

Various embodiments of the present invention provide methods, apparatus, systems, computing devices, computing entities, and/or the like for performing predictive data analysis with respect to structured data objects. Certain embodiments perform this analysis using cross-table data similarity score generation machine learning models, unsupervised anomalous table row detection machine learning models, or both.

System and method for large scale anomaly detection

A system and method for detecting anomalies in very large datasets is disclosed. The method includes calculating statistics for data elements in a dataset over a range of time periods. These statistics are arranged into a 2D array and analyzed using a machine learning algorithm to detect anomalous regions. The method also includes analyzing time series of the data based on the detected anomalous regions, correcting any errors in the datasets, and storing the corrected values in a separate database to maintain data integrity.
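The abstract leaves both the statistics and the machine learning algorithm unspecified. As a minimal sketch, assume each cell of the 2D array is the mean of one data element's values in one time period (rows are elements, columns are periods). A robust median/MAD z-score per row stands in for the learned detector; this is a substituted technique, not the disclosed one. Flagged cells are corrected to the row median and returned as a separate mapping, mirroring the idea of storing corrections apart from the source data. The function names and the 3.5 cutoff are assumptions, and the time-series analysis step is omitted.

```python
import statistics

def stats_matrix(records, elements, periods):
    # records: iterable of (element, period, value). Each cell is the mean of the
    # raw values seen for that element in that period (assumed statistic).
    cells = {}
    for element, period, value in records:
        cells.setdefault((element, period), []).append(value)
    return [[statistics.mean(cells[(e, p)]) for p in periods] for e in elements]

def anomalous_cells(matrix, cutoff=3.5):
    # Stand-in detector: robust z-score 0.6745 * |x - median| / MAD within each row.
    flagged = []
    for r, row in enumerate(matrix):
        med = statistics.median(row)
        mad = statistics.median(abs(x - med) for x in row)
        if mad == 0:
            continue  # no spread in this row, so the robust score is undefined
        for c, x in enumerate(row):
            if 0.6745 * abs(x - med) / mad > cutoff:
                flagged.append((r, c))
    return flagged

def corrections_for(matrix, flagged):
    # Corrected value per flagged cell (row median), kept separate from the source.
    return {(r, c): statistics.median(matrix[r]) for r, c in flagged}
```

Keeping the corrections in their own mapping, rather than overwriting `matrix`, lets the original statistics remain available for audit.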

Using an object model to view data associated with data marks in a data visualization

A computer generates and displays a data visualization in a data visualization user interface according to the placement of data fields from a data source. The data visualization includes visual data marks representing data from the data source. The computer detects a user input to select a visual data mark. In response to detecting the user input, the computer obtains a data model encoding the data source as a tree of logical tables. The computer identifies one or more aggregated data values for the visual data mark, each of the aggregated data values corresponding to a respective data field in the data model. For each of the aggregated data values, the computer retrieves a respective disaggregated set of data rows from a respective logical table containing the respective data field. The computer displays a summary grid, with a respective tab corresponding to each of the retrieved disaggregated sets of data rows.

Attribute identification based on seeded learning

A system and method are presented in which known genetic attributes associated with a condition are used to seed the determination of additional attributes that are associated with the condition. The learned additional attributes (genetic, behavioral, or both) increase the correlation between the combined set of attributes and the condition. For behavioral attributes, a measure of the attribute's impact on the risk of the condition can be transmitted to another device or system.

Distributed pseudo-random subset generation

Distributed pseudo-random subset generation includes obtaining a data query that indicates a first table having a first column of unique values, a second table having a second column of unique values, a join clause joining the two tables on the first and second columns, and a limit value. The first table is pseudo-randomly filtered to obtain left intermediate data and left filtering criteria, and the second table is pseudo-randomly filtered to obtain right intermediate data and right filtering criteria. Intermediate results data are obtained by full outer joining the left and right intermediate data, and results data are obtained by filtering the intermediate results data with the most restrictive of the left and right filtering criteria. The results data are then output, with the number of rows limited to at most the limit value.
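The following Python sketch shows one way to realize these steps, assuming that "pseudo-random filtering" means keeping rows whose hashed join key falls at or below a threshold, and that this threshold is the side's filtering criterion. Because both sides hash the same join-key values, applying the smaller (more restrictive) threshold after the full outer join guarantees that every surviving key was retained from each table that contains it, so no row is falsely left unmatched. `BUCKETS`, `bucket`, `pseudo_random_filter`, `full_outer_join`, and `subset_join` are hypothetical names. SHA-256 is an arbitrary hash choice, and everything runs in one process rather than distributed.

```python
import hashlib

BUCKETS = 1_000_000  # hypothetical size of the pseudo-random hash space

def bucket(key):
    # Deterministic pseudo-random bucket for a join-key value.
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS

def pseudo_random_filter(rows, key, limit):
    # Returns (kept_rows, threshold); a row passes if bucket(row[key]) <= threshold.
    if len(rows) <= limit:
        return list(rows), BUCKETS - 1  # small input: keep everything, no restriction
    threshold = sorted(bucket(r[key]) for r in rows)[limit - 1]
    # Keep ties at the threshold so the criterion describes exactly the rows kept.
    return [r for r in rows if bucket(r[key]) <= threshold], threshold

def full_outer_join(left, right, lkey, rkey):
    # Join on unique key columns; a missing side is represented by None.
    lmap = {r[lkey]: r for r in left}
    rmap = {r[rkey]: r for r in right}
    keys = sorted(set(lmap) | set(rmap), key=lambda k: (bucket(k), str(k)))
    return [(k, lmap.get(k), rmap.get(k)) for k in keys]

def subset_join(left, right, lkey, rkey, limit):
    left_rows, left_thr = pseudo_random_filter(left, lkey, limit)
    right_rows, right_thr = pseudo_random_filter(right, rkey, limit)
    joined = full_outer_join(left_rows, right_rows, lkey, rkey)
    # Most restrictive criterion: every key at or below it was kept on both sides.
    threshold = min(left_thr, right_thr)
    results = [row for row in joined if bucket(row[0]) <= threshold]
    return results[:limit]  # cardinality capped at the limit value
```

Since the filter is hash-based rather than truly random, the same query always returns the same subset, which keeps results reproducible across nodes and reruns.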

Alternate states in associative information mining and analysis

Provided are methods, systems, and computer-readable media for user interaction with database systems. In one aspect, a user interface can be generated that dynamically generates displays for viewing data. The system can comprise a visualization component that dynamically generates one or more visual representations of the data for presentation in the state space.

High performance dictionary for managed environment

Systems and methods are provided for optimizing data structures to improve data retrieval through bucketing techniques. Bucketing drastically reduces the number of objects within the environment. Within each bucket, items are organized sequentially so that they can be located quickly. Keys with the same hash value are placed together in a bucket, and a mapping records, for each hash value, the offset of the first occurrence of a key with that hash value in the bucket. This guarantees that each lookup requires only two random read accesses. The systems and methods provided herein control garbage-collection pressure on the system and minimize memory usage with minimal impact on performance.
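Here is a minimal Python sketch of the layout the abstract describes. It assumes a read-only dictionary built once from unique keys. Entries are sorted by hash slot into flat parallel arrays, and an offset table maps each slot to the position of its first key. A lookup reads the offset table once, then jumps to one contiguous run of keys and scans it. In a managed (garbage-collected) runtime, the benefit is that a few large arrays replace one node object per entry. Python lists still hold object references, so this only illustrates the indexing scheme. `BucketedDict` and `num_slots` are hypothetical names.

```python
class BucketedDict:
    """Read-only map sketch: keys grouped by hash slot in flat parallel arrays."""

    def __init__(self, items, num_slots=None):
        items = list(items)  # assumes keys are unique
        self._num_slots = num_slots or max(1, len(items))
        # Sort entries by slot so keys sharing a hash slot are stored contiguously.
        items.sort(key=lambda kv: self._slot(kv[0]))
        self._keys = [k for k, _ in items]
        self._values = [v for _, v in items]
        # offsets[s] is the index of the first key in slot s; offsets[s + 1] ends the run.
        self._offsets = [0] * (self._num_slots + 1)
        for k in self._keys:
            self._offsets[self._slot(k) + 1] += 1
        for s in range(self._num_slots):
            self._offsets[s + 1] += self._offsets[s]

    def _slot(self, key):
        return hash(key) % self._num_slots

    def get(self, key, default=None):
        s = self._slot(key)
        # First access: the offset table gives the start and end of the slot's run.
        start, end = self._offsets[s], self._offsets[s + 1]
        # Second access: jump to the run, then scan its keys sequentially.
        for i in range(start, end):
            if self._keys[i] == key:
                return self._values[i]
        return default
```

Usage: `BucketedDict([("apple", 1), ("banana", 2)]).get("apple")` returns `1`. A missing key returns `default`.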

Machine learning system and method to map keywords and records into an embedding space

In some embodiments, a method includes determining a position in an embedding space for a search query and for each audience record from multiple audience records. The method further includes receiving multiple device records, each associated with an audience record. The method further includes determining multiple keywords, each associated with an audience record, and determining a position for each keyword in the embedding space. The method further includes calculating a first distance between the position of the search query and the position of each audience record in the embedding space. The method further includes calculating a second distance between the position of the search query and the position of each keyword in the embedding space. The method further includes ranking each audience record based on the first distance and the second distance.
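The abstract does not say how the two distances are combined. This sketch makes two assumptions. First, distances are Euclidean. Second, each audience record's score is its own distance to the query plus `weight` times the distance from the query to that record's nearest keyword, and the smallest score ranks first. `rank_audiences` and `weight` are hypothetical. The device records are omitted because the abstract does not describe their role in ranking.

```python
import math

def rank_audiences(query, audiences, keywords, weight=1.0):
    # query: position of the search query in the embedding space.
    # audiences: {audience_id: position}; keywords: [(audience_id, position), ...].
    # Second distance per audience: query to its nearest associated keyword.
    nearest_kw = {}
    for aid, pos in keywords:
        d2 = math.dist(query, pos)
        nearest_kw[aid] = min(d2, nearest_kw.get(aid, math.inf))
    scores = {}
    for aid, pos in audiences.items():
        d1 = math.dist(query, pos)  # first distance: query to audience record
        kw = nearest_kw.get(aid)
        # Assumed combination; audiences without keywords rank last.
        scores[aid] = (d1 + weight * kw) if kw is not None else math.inf
    # Best (smallest) score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda a: (scores[a], a))
```

With the query at the origin, an audience at distance 1 whose nearest keyword is 3 away scores 4. An audience at distance 2 whose keyword is 0.5 away scores 2.5 and ranks ahead of it, although it would rank behind with `weight=0.0`.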

Computer implemented predisposition prediction in a genetics platform

A method, software, database, and system for attribute partner identification and social-network-based attribute analysis are presented in which attribute profiles associated with individuals can be compared and potential partners can be identified. Connections can be formed within social networks based on analysis of genetic and non-genetic data. Degrees of attribute separation (genetic and non-genetic) can be used to analyze relationships and to identify individuals who might benefit from being connected.