Patent classifications
G06F16/24558
OUTPUT VALIDATION OF DATA PROCESSING SYSTEMS
A method is provided for output validation of data processing systems, performed by one or more processors. The method comprises performing a data comparison between a first data table and a second data table to determine a data differentiating table, wherein the first data table is based on an output of a first data pipeline, and wherein the second data table is based on an output of a second data pipeline; performing a schema comparison between the first data table and the second data table to determine a schema differentiating table; generating a first output validation score based on the data differentiating table; generating a second output validation score based on the schema differentiating table; and generating a summary comprising both the first and second output validation scores.
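The two comparisons and the summary described above can be sketched as follows. This is a minimal illustration, not the claimed method: the tables are modeled as lists of dictionaries, the key column and the scoring formula (fraction of matching keys or columns) are assumptions, and the function names are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def schema_of(rows: list[Row]) -> dict[str, str]:
    """Infer a column -> type-name mapping from the first value seen per column."""
    schema: dict[str, str] = {}
    for row in rows:
        for column, value in row.items():
            schema.setdefault(column, type(value).__name__)
    return schema


def schema_diff(first: list[Row], second: list[Row]) -> list[dict[str, Any]]:
    """Schema differentiating table: columns whose presence or type differs."""
    s1, s2 = schema_of(first), schema_of(second)
    return [
        {"column": c, "first": s1.get(c), "second": s2.get(c)}
        for c in sorted(set(s1) | set(s2))
        if s1.get(c) != s2.get(c)
    ]


def data_diff(first: list[Row], second: list[Row], key: str) -> list[dict[str, Any]]:
    """Data differentiating table: rows (matched on `key`) that differ."""
    i1 = {r[key]: r for r in first}
    i2 = {r[key]: r for r in second}
    return [
        {"key": k, "first": i1.get(k), "second": i2.get(k)}
        for k in sorted(set(i1) | set(i2))
        if i1.get(k) != i2.get(k)
    ]


def validate(first: list[Row], second: list[Row], key: str) -> dict[str, Any]:
    """Summary with both scores; the scoring formula is an assumption."""
    d = data_diff(first, second, key)
    s = schema_diff(first, second)
    n_keys = len({r[key] for r in first} | {r[key] for r in second}) or 1
    n_cols = len(set(schema_of(first)) | set(schema_of(second))) or 1
    return {
        "data_score": 1 - len(d) / n_keys,
        "schema_score": 1 - len(s) / n_cols,
        "data_diff": d,
        "schema_diff": s,
    }
```

Here a score of 1.0 means the two pipeline outputs agree completely; any other scoring rule could be substituted without changing the overall flow.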
Creating index in blockchain-type ledger
A method and an apparatus for creating an index in a blockchain-type ledger, and a device are disclosed. According to one implementation, a method may include obtaining, by a centralized database server, a data record, wherein the data record is stored in a blockchain-type ledger, and wherein the data record comprises a service attribute and a sequence number; determining location information of the data record in the blockchain-type ledger, wherein the location information comprises a block height of a data block comprising the data record, and an offset of the data record in the data block; establishing a mapping relationship between the service attribute, the location information, and the sequence number; and based on the sequence number, writing the mapping relationship to an index.
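The mapping between service attribute, location information, and sequence number might look like the sketch below. It assumes the ledger is a list of blocks, each block a list of records, and that the offset is the record's position within its block rather than a byte offset; `Location`, `LedgerIndex`, and `build_index` are hypothetical names.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Location:
    block_height: int  # height of the data block holding the record
    offset: int        # position of the record inside that block (assumption)


class LedgerIndex:
    """Maps a service attribute to (sequence number, location) pairs."""

    def __init__(self) -> None:
        self._entries: dict[str, list[tuple[int, Location]]] = {}

    def add(self, service_attribute: str, sequence_number: int, location: Location) -> None:
        # Write the mapping in sequence-number order.
        entries = self._entries.setdefault(service_attribute, [])
        bisect.insort(entries, (sequence_number, location))

    def lookup(self, service_attribute: str) -> list[tuple[int, Location]]:
        return list(self._entries.get(service_attribute, []))


def build_index(ledger: list[list[dict]]) -> LedgerIndex:
    """Scan every block and index each record by its service attribute."""
    index = LedgerIndex()
    for block_height, block in enumerate(ledger):
        for offset, record in enumerate(block):
            index.add(
                record["service_attribute"],
                record["sequence_number"],
                Location(block_height, offset),
            )
    return index
```

Keeping each entry list sorted by sequence number lets a reader walk one service attribute's records in order without scanning the ledger itself.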
SYSTEMS AND METHODS FOR DETERMINING THE SHAREABILITY OF VALUES OF NODE PROFILES
The present disclosure relates to determining the shareability of values of node profiles. Record objects and electronic activities of a system of record corresponding to a data source provider may be accessed. Each record object may correspond to a record object type and have one or more object field-value pairs. Node profiles may be maintained. Values of fields corresponding to a predetermined type of field including fewer than a predetermined threshold number of data source providers may be identified. A restriction tag used to restrict populating other node profiles may be generated. Provision of the value with a second data source provider may be restricted.
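One way to read the threshold rule above is sketched below. The abstract does not specify data structures, so the input format (provider, field, value triples) and the function names are assumptions made purely for illustration.

```python
from collections import defaultdict


def restricted_values(
    contributions: list[tuple[str, str, str]],
    field_type: str,
    threshold: int,
) -> set[str]:
    """Return values of `field_type` seen from fewer than `threshold` providers.

    Each contribution is a hypothetical (provider, field, value) triple.
    A returned value plays the role of carrying a restriction tag.
    """
    providers_by_value: dict[str, set[str]] = defaultdict(set)
    for provider, field, value in contributions:
        if field == field_type:
            providers_by_value[value].add(provider)
    return {v for v, p in providers_by_value.items() if len(p) < threshold}


def shareable(value: str, restricted: set[str]) -> bool:
    """A tagged value is not provided to other data source providers."""
    return value not in restricted
```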
Database key identification
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for database key identification. One of the methods includes receiving an identification of a first field in a first data set, the first data set including records. The method includes identifying a set of values, the set including, for each record, a value associated with the field. The method includes generating a filter mask based on the set of values, where application of the filter mask is capable of determining that a given value is not in the set of values. The method includes receiving a second data set including a second field, the second data set including records. The method includes determining a count of a number of records in the second data set having a value associated with the second field that passes the filter mask. The method also includes storing the count in a profile.
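A filter that can prove a value is absent from a set, but cannot prove it is present, behaves like a Bloom filter. The sketch below uses a Bloom filter as a stand-in for the filter mask; that choice, the bit-array size, the number of hash functions, and all names are assumptions rather than details from the abstract.

```python
import hashlib
from typing import Any, Iterable, Iterator


class FilterMask:
    """Bloom-filter stand-in: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as a bit array

    def _positions(self, value: Any) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, value: Any) -> None:
        for p in self._positions(value):
            self.bits |= 1 << p

    def might_contain(self, value: Any) -> bool:
        # False means the value is definitely not in the original set.
        return all((self.bits >> p) & 1 for p in self._positions(value))


def build_mask(records: Iterable[dict], field: str) -> FilterMask:
    """Generate the mask from the first data set's field values."""
    mask = FilterMask()
    for record in records:
        mask.add(record[field])
    return mask


def count_passing(records: Iterable[dict], field: str, mask: FilterMask) -> int:
    """Count records in the second data set whose field value passes the mask."""
    return sum(1 for record in records if mask.might_contain(record[field]))
```

A count close to the size of the second data set suggests the second field may reference the first field as a key; because of false positives, the count is an upper bound on the true number of matches.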
COMPARING DATASETS USING HASH VALUES OVER A SUBSET OF FIELDS
Methods and apparatus are disclosed for comparing relevant content of datasets. A hash value is computed over a selected subset of fields to obtain a signature of a dataset, with other fields being disregarded. A hash value can be computed directly for all records of the dataset, or by combining individual hash values for each record. Comparison of the signature with that of other datasets leads to efficient determination whether two datasets match with respect to relevant content in the selected fields. For larger groups of datasets, lists of matched and mismatched datasets can be reported. Optional features include matches insensitive to permutation of the records, or identification of which records in a dataset fail to match.
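The variant that combines per-record hashes can be sketched as below. SHA-256 and addition modulo 2^256 as the combining step are assumptions chosen to make the signature insensitive to record order; the abstract does not name a hash function or a combiner.

```python
import hashlib
from typing import Any, Iterable, Sequence


def record_hash(record: dict[str, Any], fields: Sequence[str]) -> bytes:
    """Hash only the selected fields; all other fields are disregarded."""
    h = hashlib.sha256()
    for field in fields:
        h.update(repr((field, record.get(field))).encode())
    return h.digest()


def dataset_signature(records: Iterable[dict[str, Any]], fields: Sequence[str]) -> str:
    """Combine per-record hashes with modular addition.

    Addition is commutative, so the result does not depend on record order.
    """
    total = 0
    for record in records:
        total = (total + int.from_bytes(record_hash(record, fields), "big")) % (1 << 256)
    return f"{total:064x}"


def mismatched_records(
    left: Iterable[dict[str, Any]],
    right: Iterable[dict[str, Any]],
    fields: Sequence[str],
) -> list[dict[str, Any]]:
    """Records in `left` whose selected-field hash has no counterpart in `right`."""
    right_hashes = {record_hash(r, fields) for r in right}
    return [r for r in left if record_hash(r, fields) not in right_hashes]
```

Two datasets that differ only in a disregarded field, such as a load timestamp, then produce the same signature.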
IN-MEMORY EFFICIENT MULTISTEP SEARCH
A cascading search system includes an associative memory array, a similarity match processor and an exact match processor. The columns of the array store a plurality of multiportion data vectors and have a first section for storing a first portion of a vector, a second section for storing a second portion of a vector, and a match row. The similarity match processor performs a parallel similarity search of a similarity query in the first sections and stores a match bit indication in the match row of each column. Each match bit indication indicates whether its column has a first portion which matches the similarity query. The exact match processor performs an exact search in parallel in the second section of each similarity matched column whose match bit indication indicates a match of its first section and outputs those similarity matched columns whose second portions match the exact query.
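The two-stage cascade can be simulated in software as follows. This sketch runs sequentially rather than in parallel in memory, and it assumes that first portions are bit vectors stored as integers and that similarity means a Hamming distance within a threshold; neither detail comes from the abstract.

```python
from typing import Any


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two bit vectors stored as ints."""
    return bin(a ^ b).count("1")


def cascading_search(
    columns: list[tuple[int, Any]],
    similarity_query: int,
    max_distance: int,
    exact_query: Any,
) -> list[int]:
    """Return indices of columns passing both stages.

    Each column is a (first_portion, second_portion) pair.
    """
    # Stage 1: similarity search over first portions fills the match row.
    match_row = [
        hamming(first, similarity_query) <= max_distance for first, _ in columns
    ]
    # Stage 2: exact search over second portions, only in matched columns.
    return [
        i
        for i, (matched, (_, second)) in enumerate(zip(match_row, columns))
        if matched and second == exact_query
    ]
```

The match row acts as a mask, so the exact comparison is never evaluated for columns that failed the similarity stage.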
SYSTEM AND METHOD FOR MICRO-CODED SEARCHING PLATFORM
Systems and methods are described that provide a backend micro-code architecture and a front-end user agent. For example, the user agent may accept an instruction that contains one or more components of an opcode. The backend system may receive the instruction and provide it to a bifurcated process. The first part of the process can decode the instruction and execute a series of search queries that correspond with the instruction. The second part of the process can receive the search results, create a data model/script that can be read by the user agent, and return/embed the data model/script to the user agent. The user agent may search the data model locally at the user device to reduce the number of electronic communications between the backend and front-end. The user agent can enable the user to dynamically create a new search by selecting a different combination of the five components of an opcode.
Delayed processing for electronic data messages in a distributed computer system
A computer system is provided that includes a matching engine and a freezing process. The matching engine freezes one side of a two-sided data structure when an order is determined to be matchable. The freezing process starts a timer based on the matching determination. Orders that are handled by the matching engine while the side is frozen are added to a queue. When the timer ends, the orders in the queue are processed against those orders that are now resting within the data structure.
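The freeze, queue, and delayed-processing sequence can be sketched with a logical clock. Several details here are assumptions: the frozen side is taken to be the one holding the resting order, the triggering order is queued along with every later order, each order has unit quantity, and the class and method names are hypothetical.

```python
from collections import deque


def opposite(side: str) -> str:
    return "sell" if side == "buy" else "buy"


class FreezingBook:
    """Two-sided order book that delays matching for a fixed period."""

    def __init__(self, freeze_duration: int) -> None:
        self.freeze_duration = freeze_duration
        self.resting: dict[str, list[int]] = {"buy": [], "sell": []}
        self.frozen_side: str | None = None
        self.thaw_time: int | None = None
        self.queue: deque[tuple[str, int]] = deque()
        self.trades: list[tuple[str, int, int]] = []

    def _crosses(self, side: str, price: int) -> bool:
        book = self.resting[opposite(side)]
        if not book:
            return False
        return price >= min(book) if side == "buy" else price <= max(book)

    def submit(self, now: int, side: str, price: int) -> None:
        self.tick(now)
        if self.frozen_side is not None:
            self.queue.append((side, price))      # handled while frozen
        elif self._crosses(side, price):
            self.frozen_side = opposite(side)     # freeze the resting side
            self.thaw_time = now + self.freeze_duration  # start the timer
            self.queue.append((side, price))
        else:
            self.resting[side].append(price)

    def tick(self, now: int) -> None:
        """When the timer ends, process queued orders against resting ones."""
        if self.frozen_side is not None and now >= self.thaw_time:
            self.frozen_side = None
            self.thaw_time = None
            while self.queue:
                self._process(*self.queue.popleft())

    def _process(self, side: str, price: int) -> None:
        if self._crosses(side, price):
            book = self.resting[opposite(side)]
            best = min(book) if side == "buy" else max(book)
            book.remove(best)
            self.trades.append((side, price, best))
        else:
            self.resting[side].append(price)
```

No trade is recorded until `tick` is called at or after the thaw time, so every order that arrives during the freeze window is processed in arrival order against whatever is resting at that point.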
Ranking data assets for processing natural language questions based on data stored across heterogeneous data sources
An analysis system connects to a set of data sources and processes natural language questions based on the data sources. The analysis system connects with the data sources and retrieves metadata describing data assets stored in each data source. The analysis system generates an execution plan for the natural language question. The analysis system finds data assets that match the received question based on the metadata. The analysis system ranks the data assets and presents the ranked data assets to users so that users can modify the execution plan. The analysis system may use execution plans of previously stored questions for executing new questions. The analysis system supports selective preprocessing of data to increase the data quality.
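Matching and ranking data assets against a question could be approximated with simple token overlap, as in the sketch below. The abstract does not describe the matching or ranking criteria, so the overlap score, the metadata format (a free-text description per asset), and the function names are all assumptions.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; underscores and punctuation split words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_assets(question: str, assets: dict[str, str]) -> list[str]:
    """Rank asset names by how many question tokens appear in their metadata.

    `assets` maps a hypothetical asset name to a free-text metadata string.
    Assets with no overlap are dropped; ties are broken alphabetically.
    """
    question_tokens = tokens(question)
    scored = []
    for name, metadata in assets.items():
        overlap = len(question_tokens & tokens(name + " " + metadata))
        if overlap:
            scored.append((overlap, name))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]
```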
Storing feature sets using semi-structured data storage
The subject technology receives, by a database system, raw input data from a source table provided by a machine learning development environment, the source table comprising multiple rows where each row includes multiple columns, the raw input data comprising values in a first format, the values comprising input features corresponding to datasets included in the raw input data for machine learning models, the machine learning development environment comprising a system external to the database system that is accessed by a plurality of different users that are external to the database system. The subject technology generates cell data for a feature store table based at least in part on the values from the source table. The subject technology performs at least one database operation to generate the feature store table including at least table metadata, column metadata, and the generated cell data.