G06F16/00

Visual search refinement
11714865 · 2023-08-01

Described is a system and method for enabling visual search for information. With each selection of a search term, additional search terms are dynamically selected and presented to the user in conjunction with results matching the currently selected search terms. Likewise, a selected search term may be tokenized and a graphical token presented to the user to represent the selected search term.

Using multiple trained models to reduce data labeling efforts

A method of labeling a dataset of input samples for a machine learning task includes selecting a plurality of pre-trained machine learning models that are related to the machine learning task. The method further includes processing a plurality of input data samples through each of the pre-trained models to generate a set of embeddings. The method further includes generating a plurality of clusterings from the set of embeddings. The method further includes analyzing, by a processing device, the plurality of clusterings to extract superclusters. The method further includes assigning pseudo-labels to the input samples based on the analysis.
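The supercluster step can be illustrated with a minimal sketch. It assumes the embeddings from each pre-trained model have already been clustered by some off-the-shelf algorithm, and it assumes a "supercluster" means a group of samples that every clustering places together; the abstract does not define the term, so this consensus rule, the function name `pseudo_labels`, and the `min_size` parameter are all hypothetical.

```python
from collections import defaultdict


def pseudo_labels(clusterings, min_size=2):
    """Assign pseudo-labels from several clusterings of the same samples.

    clusterings: one list of cluster ids per model; clusterings[m][i] is the
    cluster that model m's embedding of sample i fell into.
    Assumption (not from the abstract): a supercluster is a set of samples
    that share the same cluster id in every clustering.
    """
    n_samples = len(clusterings[0])
    groups = defaultdict(list)
    for i in range(n_samples):
        # The tuple of per-model cluster ids identifies a candidate supercluster.
        key = tuple(clustering[i] for clustering in clusterings)
        groups[key].append(i)

    labels = [None] * n_samples  # None marks samples left unlabeled
    next_label = 0
    for members in groups.values():  # dicts keep first-seen order
        if len(members) >= min_size:
            for i in members:
                labels[i] = next_label
            next_label += 1
    return labels
```

Samples on which the models disagree stay unlabeled (`None`) in this sketch and would still need manual labeling.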

Adaptive document understanding

An approach is provided in which a method, system, and program product create a plurality of page clusters in feature space from a plurality of feature vectors corresponding to a plurality of unstructured pages. The method, system, and program product assign one of a plurality of machine learning models to each one of the plurality of page clusters based on a relationship in the feature space between the plurality of page clusters and a plurality of training clusters corresponding to the plurality of machine learning models. The method, system, and program product identify one of the plurality of page clusters that corresponds to a selected one of the plurality of unstructured pages, and transform the selected unstructured page into a structured page using a selected one of the plurality of machine learning models assigned to the identified page cluster.
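One way to picture the model-assignment step is a nearest-centroid rule, sketched below. The abstract only says the assignment depends on "a relationship in the feature space"; Euclidean distance between centroids is an assumption here, and the names `assign_models`, `model_for_page`, and the model names in the usage note are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def assign_models(page_clusters, training_centroids):
    """Map each page cluster to the model whose training cluster is nearest.

    page_clusters: {cluster_id: [feature vectors of pages in that cluster]}
    training_centroids: {model_name: centroid of that model's training cluster}
    Assumption: "relationship in feature space" = Euclidean centroid distance.
    """
    assignment = {}
    for cluster_id, vectors in page_clusters.items():
        center = centroid(vectors)
        assignment[cluster_id] = min(
            training_centroids,
            key=lambda model: math.dist(center, training_centroids[model]),
        )
    return assignment


def model_for_page(page_vector, page_clusters, assignment):
    """Find the page cluster nearest to a page and return its assigned model."""
    nearest = min(
        page_clusters,
        key=lambda cid: math.dist(page_vector, centroid(page_clusters[cid])),
    )
    return assignment[nearest]
```

For example, with page clusters around (0, 0) and (5, 5) and two hypothetical models trained near those points, a page at (4.8, 5.1) would be routed to the model whose training cluster sits near (5, 5).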

Systems and methods for resource-efficient data collection for multi-stage ranking systems
11568309 · 2023-01-31

Systems, methods, and non-transitory computer-readable media can receive a set of candidate training items for training an early stage model in a multi-stage recall optimization model, wherein the multi-stage recall optimization model comprises the early stage model and a target model. A random subset of the candidate training items is selected from the set of candidate training items. For each training item in the subset of candidate training items, a score is determined based on the target model. Each training item in the subset of candidate training items is labeled with a label based on a probability of the training item being a top-K of the set of candidate training items had the set of candidate training items been scored based on the target model.
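The labeling step can be sketched as follows. The abstract does not say how the top-K probability is computed; this sketch assumes a uniform prior over how many items in the full candidate set outscore a sampled item and updates it with a hypergeometric likelihood from the item's rank within the random subset. The function names, the `target_score` callable, and the use of the raw probability as the label are all hypothetical, and ties in score are ignored.

```python
import math
import random


def prob_top_k(better_in_sample, sample_size, total_size, k):
    """Posterior probability that an item is in the top K of the full set.

    better_in_sample: how many other sampled items outscore this item.
    Assumption: uniform prior over B, the number of the (total_size - 1)
    other candidates that outscore the item; hypergeometric likelihood.
    """
    others_total = total_size - 1
    others_sampled = sample_size - 1

    def likelihood(b_full):
        # Ways to draw the observed split given b_full better items overall.
        return (math.comb(b_full, better_in_sample)
                * math.comb(others_total - b_full,
                            others_sampled - better_in_sample))

    weights = [likelihood(b) for b in range(others_total + 1)]
    # The item is top-K exactly when at most K - 1 candidates beat it.
    return sum(weights[:k]) / sum(weights)


def label_random_subset(candidates, target_score, sample_size, k, rng=random):
    """Score only a random subset with the target model and label each item."""
    subset = rng.sample(candidates, sample_size)
    ranked = sorted(subset, key=target_score, reverse=True)
    return [
        (item, prob_top_k(rank, sample_size, len(candidates), k))
        for rank, item in enumerate(ranked)
    ]
```

Only `sample_size` calls to the (expensive) target model are made, which is the resource saving the abstract points at; the early stage model would then be trained on the returned labels.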

System and method for user interactive contextual model classification based on metadata

A system and a method for contextual categorization of data comprises a server having a processor and a non-transitory computer-readable storage medium in electronic communication with the processor and comprising program instructions executable by the processor to access an initial inventory of data set and metadata associated with the initial inventory of data set. The system is then configured to classify the initial inventory of data set by using the metadata into (a) a reduced set of data comprising high level sensitivity classification and (b) a remainder data set. The system and method can be further configured for contextual categorization of data that involves receiving an initial data set to be categorized; establishing a library of contextual classifiers, the library comprising (1) a set of predetermined high level sensitivity classifications and (2) a set of user-generated business-specific sensitivity classifications subordinated below the high level sensitivity classifications; identifying and removing redundant, outdated, trivial or abandoned (ROTA) data from the initial data set to create a reduced data set and a remainder data set of ROTA data; applying the user-generated business-specific sensitivity classifications to the reduced data set to create a first set of classified data and a second set of unclassified data; and iteratively applying additional user-generated business-specific sensitivity classifications to both the first set of classified data and the second set of unclassified data until all data in the reduced data set has been classified in exactly one user-generated business-specific sensitivity classification.

Customized digital content generation systems and methods

The invention provides in some aspects a method, executed on a digital data processing system, of mass generation of customized digital content that includes continuously identifying current external events taken by or with respect to a plurality of respective prospective targets and, upon identification of such an event, generating a set of actions, each identifying a digital content piece and a digital delivery mechanism therefor. Each action is generated, according to the method, based on the current identified events for a particular prospective target and on a database of information about prior events taken by or with respect to him/her. The sets of actions are queued upon generation and continuously retrieved on a first-in-first-out basis. And, upon retrieval, an action for generation of digital content for the respective prospective target is selected for transmittal from the set based on quotas associated with that target and/or the delivery mechanism identified for it per the selected action.
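The queue-and-quota behavior can be sketched roughly as below. The abstract does not specify the quota rules, so this assumes simple per-target and per-delivery-mechanism send limits and picks the first action in a set that fits both; the function name `dispatch` and the data shapes are hypothetical.

```python
from collections import Counter, deque


def dispatch(queue, target_quota, channel_quota):
    """Drain action sets first-in-first-out, sending at most one action per set.

    queue: deque of (target, [(content_piece, delivery_mechanism), ...])
    target_quota / channel_quota: maximum sends per target / per mechanism.
    Assumption: quotas are plain counters; unknown keys have a quota of 0.
    """
    sent = []
    target_used = Counter()
    channel_used = Counter()
    while queue:
        target, actions = queue.popleft()  # oldest action set first
        if target_used[target] >= target_quota.get(target, 0):
            continue  # target already at its quota; drop this set
        for content, channel in actions:
            if channel_used[channel] < channel_quota.get(channel, 0):
                target_used[target] += 1
                channel_used[channel] += 1
                sent.append((target, content, channel))
                break  # only one action is selected from each set
    return sent
```

A production system would presumably reset the counters per time window and run the loop continuously; both are omitted here for brevity.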

Methods and apparatus to improve data training of a machine learning model using a field programmable gate array

Methods, apparatus, systems, and articles of manufacture are disclosed to improve data training of a machine learning model using a field-programmable gate array (FPGA). An example system includes one or more computation modules, each of the one or more computation modules associated with a corresponding user, the one or more computation modules training first neural networks using data associated with the corresponding users, and an FPGA to obtain a first set of parameters from each of the one or more computation modules, the first set of parameters associated with the first neural networks, configure a second neural network based on the first set of parameters, execute the second neural network to generate a second set of parameters, and transmit the second set of parameters to the first neural networks to update the first neural networks.

Method, device and computer program product for information processing

According to embodiments of the present disclosure, a method, device and computer program product for information processing are proposed. The method comprises: obtaining identification information of a shard of metadata at a first node of a blockchain-based metadata management system; determining, based on similarities of the identification information of the shard and identification information of candidate nodes of the metadata management system, a second node for positioning the shard from the candidate nodes; and enabling the second node to process the identification information of the shard, to manage storage of the shard in the metadata management system. Therefore, the present solution can improve efficiency, security and robustness of the metadata management system.
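A minimal sketch of the node-selection step follows. The abstract does not name the similarity measure; this assumes identifiers are fixed-width integers compared by XOR distance, as in Kademlia-style distributed hash tables, and the helper names `ident` and `select_node` are hypothetical.

```python
import hashlib


def ident(name: str) -> int:
    """Derive a 256-bit integer identifier from a name (assumed scheme)."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest(), "big")


def select_node(shard_id: int, node_ids: dict) -> str:
    """Pick the candidate node whose identifier is most similar to the shard's.

    node_ids: {node_name: integer identifier}
    Assumption: similarity is measured by XOR distance; smaller is closer.
    """
    return min(node_ids, key=lambda name: node_ids[name] ^ shard_id)
```

With small illustrative identifiers, a shard `0b1010` and nodes `0b1000`, `0b0010`, `0b1111` give XOR distances 2, 8, and 5, so the first node would be chosen to position the shard.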

Hash-based identification of data corruption issues in time-series data
20230025284 · 2023-01-26

An apparatus includes a memory and a processor. The memory stores a time-series of data sets, and a first version of a data structure generated from the time-series as it existed at a first time. The data structure includes a bottom level of nodes, and subsequent levels of nodes, ending with a top level terminal node. Each bottom level node stores a hash of an assigned time-series data set. Each node of each subsequent level stores data generated from an assigned group of nodes of a previous level. The processor receives a validation request. In response, the processor generates a second version of the data structure based on the time-series as it exists at a second time. The processor determines that the terminal nodes in the first and second versions of the data structure do not match. In response, the processor generates an alert.
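The data structure described resembles a hash (Merkle) tree, and the validation flow can be sketched as below. SHA-256, a default group size of two, concatenation of child hashes as the "data generated from an assigned group of nodes", and the function names are assumptions not stated in the abstract.

```python
import hashlib


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(data_sets, group_size=2):
    """Build the tree bottom-up; levels[0] is the bottom, levels[-1] the top.

    Each bottom-level node stores the hash of one time-series data set.
    Each higher node stores a hash over its assigned group of nodes from
    the previous level (assumption: hash of the concatenated child hashes).
    """
    if not data_sets:
        raise ValueError("time-series must contain at least one data set")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    level = [digest(d) for d in data_sets]
    levels = [level]
    while len(level) > 1:
        level = [
            digest(b"".join(level[i:i + group_size]))
            for i in range(0, len(level), group_size)
        ]
        levels.append(level)
    return levels


def validate(first_version, current_data_sets, group_size=2):
    """Rebuild the tree from the current data and compare terminal nodes.

    Returns None when the terminal nodes match, otherwise an alert message.
    """
    second_version = build_levels(current_data_sets, group_size)
    if first_version[-1][0] == second_version[-1][0]:
        return None
    return "ALERT: terminal nodes differ; the time-series has changed"
```

Comparing only the two terminal nodes is enough to detect that something changed; walking down the mismatching branches could then narrow the change to specific data sets, although the abstract does not describe that step.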