Patent classifications
G06F16/2456
COST-BASED SEMI-JOIN REWRITE
A method, apparatus, and computer program product for executing a relational database management system (RDBMS) in a computer system, wherein the RDBMS manages a relational database comprised of one or more tables storing data. The RDBMS executes a query with a semi-join operation comprising an inclusion join and/or an exclusion join performed against at least an outer table and an inner table, wherein the inclusion join returns a row from the outer table when there is a match with a row in the inner table, and the exclusion join returns a row from the outer table when there is no match with a row in the inner table. The RDBMS performs a rewrite of the query to avoid spooling and/or sorting of the inner table, when the inner table is larger than the outer table and a cost after the rewrite is lower than before the rewrite.
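As an illustration of the two semi-join flavors described above, the following is a minimal Python sketch. It is not the patented rewrite. It assumes one plausible way to avoid spooling or sorting a large inner table: index the smaller outer table by join key and scan the inner table once. The function name `semi_join_streaming`, the row-as-dict representation, and the `key` parameter are hypothetical, and SQL NULL semantics are ignored.

```python
from collections import defaultdict


def semi_join_streaming(outer_rows, inner_rows, key, exclusion=False):
    """Return outer rows with (inclusion) or without (exclusion) an inner match.

    Assumption: outer_rows is a small in-memory list, while inner_rows may be
    any iterable that is consumed once and never materialized or sorted.
    """
    # Index outer row positions by join-key value.
    positions = defaultdict(list)
    for i, row in enumerate(outer_rows):
        positions[row[key]].append(i)

    # Single pass over the inner table: mark outer rows that found a match.
    matched = set()
    for row in inner_rows:
        for i in positions.get(row[key], ()):
            matched.add(i)

    # Inclusion keeps matched rows; exclusion keeps unmatched rows.
    return [row for i, row in enumerate(outer_rows) if (i in matched) != exclusion]
```

Each outer row is returned at most once, even when several inner rows match it, which is what distinguishes a semi-join from an ordinary join. Whether this plan is cheaper than spooling the inner table is the cost comparison the abstract leaves to the optimizer.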
Multilayered Generation and Processing of Computer Instructions
Systems, devices, computer-implemented methods, and tangible non-transitory computer readable media for performing multilayered generation and processing of computer instructions are provided. For example, a computing device may receive a request with instructions in a first computer language, parse the instructions in the first computer language, analyze the instructions in the first computer language in view of information describing structure of a first application, generate instructions in a second computer language different from the first computer language where the instructions in the second computer language are generated based on the instructions in the first computer language and the information describing structure of the first application, obtain a result from a second application where the result comprises information based on the instructions in the second computer language, and provide the result in response to the request comprising the instructions in the first computer language.
Hardware accelerator performing search using inverted index structure and search system including the hardware accelerator
A hardware accelerator includes a block processing circuit configured to read a block from a list stored in an inverted index structure; and a search core configured to extract a document number from the block read by the block processing circuit and to calculate a score corresponding to the document number.
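The read-block, extract-document-number, compute-score flow can be mimicked in software. The sketch below is only an analogy to the hardware pipeline and makes several assumptions that the abstract does not state: a posting list is a Python list of `(doc_id, term_frequency)` pairs, blocks are fixed-size slices, and the score is a simple TF-IDF product. The names `read_blocks` and `score_term` are hypothetical.

```python
import math


def read_blocks(posting_list, block_size):
    """Yield fixed-size blocks of a posting list (stand-in for the block reader)."""
    for start in range(0, len(posting_list), block_size):
        yield posting_list[start:start + block_size]


def score_term(posting_list, num_docs, block_size=2):
    """Extract each document number from every block and attach a score to it.

    Assumption: score = term frequency * log(num_docs / document frequency).
    """
    if not posting_list:
        return {}
    idf = math.log(num_docs / len(posting_list))
    scores = {}
    for block in read_blocks(posting_list, block_size):
        for doc_id, term_frequency in block:
            scores[doc_id] = term_frequency * idf
    return scores
```

Real inverted indexes usually store compressed, delta-encoded blocks, so the extraction step would also include decoding; that part is omitted here.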
Column data compression schemes for scaling writes and reads on database systems
A request for performing a data storing operation directed to a database table that comprises a plurality of table columns is received. Columnar compression metadata is accessed to identify one or more table columns in the database table, each of the one or more table columns being designated to store compressed columnar values. The columnar compression metadata is used to apply one or more columnar compression methods to generate, from one or more uncompressed columnar values received with the request for the data storing operation, one or more compressed columnar values to be persisted in the one or more table columns in the database table. A database statement is executed to persist the one or more compressed columnar values in the one or more table columns in the database table.
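A rough sketch of the write path described above, using Python's standard `sqlite3` and `zlib` modules. The metadata layout (`COMPRESSION_METADATA`), the helper names `store_row` and `load_payload`, and the `events`/`payload` identifiers used with them are illustrative assumptions; the abstract does not specify a compression method or a metadata format.

```python
import sqlite3
import zlib

# Hypothetical columnar compression metadata:
# column name -> (compress function, decompress function).
COMPRESSION_METADATA = {
    "payload": (zlib.compress, zlib.decompress),
}


def store_row(conn, table, row):
    """Compress the designated columns, then persist the row with one INSERT.

    Assumption: table and column names come from trusted code, since they are
    interpolated into the SQL text; values are always bound as parameters.
    """
    columns = list(row)
    values = []
    for col in columns:
        value = row[col]
        if col in COMPRESSION_METADATA:
            compress, _ = COMPRESSION_METADATA[col]
            value = compress(value.encode("utf-8"))
        values.append(value)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    conn.execute(sql, values)


def load_payload(conn, table, row_id):
    """Read back and decompress the 'payload' column of one row."""
    (blob,) = conn.execute(
        f"SELECT payload FROM {table} WHERE id = ?", (row_id,)
    ).fetchone()
    _, decompress = COMPRESSION_METADATA["payload"]
    return decompress(blob).decode("utf-8")
```

Keeping the compression choice in metadata rather than in the calling code means a column can be switched to a different method without changing the code that issues the store request.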
Data set generation for testing of machine learning pipelines
A system may include memory containing: (i) a master data set representable in columns and rows, and (ii) a query expression. The system may include a software application configured to apply a machine learning (ML) pipeline to an input data set. The system may include a computing device configured to: obtain the master data set and the query expression; apply the query expression to the master data set to generate a test data set, where applying the query expression comprises, based on content of the query expression, generating the test data set to have one or more columns or one or more rows fewer than the master data set; apply the ML pipeline to the test data set, where applying the ML pipeline results in either generation of a test ML model from the test data set or indication of an error in the test data set; and delete the test data set from the memory.
Compliant entity conflation and access
The disclosed embodiments provide a system for managing data conflation. During operation, the system generates matches between a first set of entities in a first dataset from a first data provider and a second set of entities in a second dataset from a second data provider based on comparisons of fields in the first and second datasets. Next, the system modifies a join query for joining the first and second datasets to include operators representing compliance rules for the first or second datasets. The system executes the modified join query to produce a joined dataset that adheres to the compliance rules and stores data related to the joined dataset within a platform that logically isolates the data from additional datasets. During processing of queries of the data, the system modifies the queries to include additional operators that enforce access control policies for the data.
System and method for bloom filters in large scale applications
A system and method for implementing bloom filters in large scale applications is disclosed. The system and method include at least one processor configured to create a plurality of sharded bloom filters based on signatures stored in memory, perform at least one lookup using the plurality of sharded bloom filters, and output a lookup result in real time.
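To make the idea of sharded bloom filters concrete, here is a small Python sketch. It assumes that each signature is routed to exactly one shard by hashing, so a lookup touches a single small filter. The class names, the use of SHA-256, and the default sizes are illustrative choices, not details taken from the disclosure.

```python
import hashlib


class BloomFilter:
    """A plain bloom filter over string items, backed by a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


class ShardedBloomFilter:
    """Routes each item to one of several independent bloom filters."""

    def __init__(self, num_shards=4, **filter_kwargs):
        self.shards = [BloomFilter(**filter_kwargs) for _ in range(num_shards)]

    def _shard(self, item):
        digest = hashlib.sha256(f"shard:{item}".encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def add(self, item):
        self._shard(item).add(item)

    def __contains__(self, item):
        return item in self._shard(item)
```

As with any bloom filter, a lookup can report a false positive but never a false negative. Sharding keeps each filter small, and in a distributed setting the shards could live on different machines.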
DETERMINING A DEGREE OF SIMILARITY OF A SUBSET OF TABULAR DATA ARRANGEMENTS TO SUBSETS OF GRAPH DATA ARRANGEMENTS AT INGESTION INTO A DATA-DRIVEN COLLABORATIVE DATASET PLATFORM
Various techniques are described, including evaluating ingested data including a dataset to identify one or more links to other datasets stored in a graph, using a similarity determination algorithm to identify a degree of similarity between datasets to determine joinability of ingested datasets with graph-stored datasets, determining a ratio to determine whether to perform an overlap or coverage function, associating a subset of similarity matrices with a subset of graph data joined to the ingested dataset, and forming links in a column of data between the dataset and another dataset of the ingested data based on the degree of similarity.
SYSTEM AND METHOD FOR IDENTIFYING APPROXIMATE K-NEAREST NEIGHBORS IN WEB SCALE CLUSTERING
The present teaching relates to a method, system, medium, and implementations for identifying k nearest neighbors. One or more KNN lists corresponding to one or more source data points are received. Each KNN list includes K neighbors of a source data point and each of the K neighbors is a data point represented by an index. Neighbor pairs and reverse neighbor pairs are generated based on the one or more KNN lists. The neighbor pairs and reverse neighbor pairs having the same source data point are grouped to generate grouped pairs of neighbors for the source data point. A local join operation is performed based on the grouped pairs of neighbors for each source data point to generate a combined neighborhood for the source data point, which is then sent to a KNN server, where combined neighborhoods generated by multiple local join executors are integrated to update a plurality of global KNN lists.
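The pair generation, grouping, and local join steps resemble one round of an NN-Descent style refinement. The Python sketch below follows that interpretation, which is an assumption rather than the claimed procedure. It runs in a single process, uses Euclidean distance, and the names `group_pairs`, `local_join`, and `update_knn` are hypothetical stand-ins for the local join executors and the KNN server.

```python
import math
from collections import defaultdict
from itertools import combinations


def group_pairs(knn_lists):
    """Group neighbor pairs and reverse neighbor pairs by their source point."""
    grouped = defaultdict(set)
    for source, neighbors in knn_lists.items():
        for n in neighbors:
            grouped[source].add(n)  # neighbor pair (source, n)
            grouped[n].add(source)  # reverse neighbor pair (n, source)
    return grouped


def local_join(grouped, points):
    """Propose every two members of a group as candidate neighbors of each other."""
    candidates = defaultdict(dict)
    for members in grouped.values():
        for a, b in combinations(sorted(members), 2):
            d = math.dist(points[a], points[b])
            candidates[a][b] = d
            candidates[b][a] = d
    return candidates


def update_knn(knn_lists, candidates, points, k):
    """Merge candidates into each KNN list and keep the k closest points."""
    updated = {}
    for source, neighbors in knn_lists.items():
        pool = {n: math.dist(points[source], points[n]) for n in neighbors}
        pool.update(candidates.get(source, {}))
        updated[source] = sorted(pool, key=pool.get)[:k]
    return updated
```

The intuition is that a neighbor of a neighbor is likely to be a neighbor, so points that share a source are compared directly. Repeating the round until the lists stop changing gives an approximate KNN graph.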
COMMUNICATION MANAGEMENT METHOD AND INFORMATION PROCESSING APPARATUS
A non-transitory computer-readable recording medium stores a program that causes a computer to execute a process, the process including obtaining first graph information indicating a relationship between elements of a plurality of elements included in communication information on a first group, obtaining second graph information indicating a relationship between elements of a plurality of elements included in communication information on a second group, comparing the first graph information and the second graph information, and outputting information that recommends merging the first group and the second group when it is determined that the first graph information and the second graph information are similar based on a result of the comparison.