G06F18/2185

Annotation pipeline for machine learning algorithm training and optimization

Techniques are provided for enhancing the efficiency and accuracy of annotating data samples for supervised machine learning algorithms using an advanced annotation pipeline. According to an embodiment, a method can comprise collecting, by a system comprising a processor, unannotated data samples for input to a machine learning model and storing the unannotated data samples in an annotation queue. The method further comprises determining, by the system, annotation priority levels for respective unannotated data samples of the unannotated data samples, and selecting, by the system, from amongst different annotation techniques, one or more of the different annotation techniques for annotating the respective unannotated data samples based on the annotation priority levels associated with the respective unannotated data samples.
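The queueing and technique-selection steps above can be sketched in Python. The confidence-based priority heuristic and the manual/auto technique split below are illustrative assumptions, not taken from the patent:

```python
# Hypothetical sketch of the annotation-queue method: collect unannotated
# samples, assign priority levels, and pick an annotation technique per sample.
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class QueuedSample:
    priority: int                       # lower number = higher priority
    sample: dict = field(compare=False) # payload excluded from ordering

def annotation_priority(sample):
    # Assumed heuristic: low model confidence means high annotation priority.
    return 0 if sample["model_confidence"] < 0.5 else 1

def select_technique(priority):
    # Assumed split: uncertain samples go to human annotators, the rest
    # are auto-labeled.
    return "manual" if priority == 0 else "auto"

def build_queue(samples):
    queue = []
    for s in samples:
        heapq.heappush(queue, QueuedSample(annotation_priority(s), s))
    return queue

samples = [{"id": 1, "model_confidence": 0.3},
           {"id": 2, "model_confidence": 0.9}]
queue = build_queue(samples)
first = heapq.heappop(queue)  # the uncertain sample surfaces first
```

The heap keeps the queue ordered by priority so the most annotation-worthy samples are dispatched first.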

Off-policy control policy evaluation

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for off-policy evaluation of a control policy. One of the methods includes obtaining policy data specifying a control policy for controlling a source agent interacting with a source environment to perform a particular task; obtaining a validation data set generated from interactions of a target agent in a target environment; determining a performance estimate that represents an estimate of a performance of the control policy in controlling the target agent to perform the particular task in the target environment; and determining, based on the performance estimate, whether to deploy the control policy for controlling the target agent to perform the particular task in the target environment.
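One common way to realize the performance-estimate step is importance sampling over the validation episodes; the sketch below assumes that form (the patent does not commit to a specific estimator), with hypothetical callbacks supplying the target- and behavior-policy action probabilities:

```python
# Off-policy performance estimate via per-episode importance sampling,
# followed by a deployment gate on the estimate.
def performance_estimate(validation_episodes, target_policy_prob, behavior_policy_prob):
    total = 0.0
    for episode in validation_episodes:
        weight, episode_return = 1.0, 0.0
        for (state, action, reward) in episode:
            # Reweight by how likely the control policy is to take the
            # logged action relative to the policy that generated the data.
            weight *= target_policy_prob(state, action) / behavior_policy_prob(state, action)
            episode_return += reward
        total += weight * episode_return
    return total / len(validation_episodes)

def should_deploy(estimate, threshold=0.0):
    # Deploy the control policy only if its estimated performance in the
    # target environment clears a chosen bar.
    return estimate >= threshold
```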

Machine learned historically accurate temporal classification of objects

A method, computer program product, and a system where a processor(s) ingests content from a source(s) with an attribute(s) comprising a verified temporal context(s) of the source(s). The processor(s) cognitively analyzes the content, by applying an entity recognition algorithm(s) to identify and extract entities in the source(s). The processor(s) classifies each extracted entity into a given grouping from a plurality of groupings based on at least one attribute comprising the verified temporal context of the source from which the extracted entity was extracted. The processor(s) generates a corpus comprising the groupings; each grouping comprises extracted entities with verified temporal contexts consistent with a defined time period.
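A minimal sketch of the grouping step, with a trivial capitalized-word extractor standing in for the entity-recognition algorithm and decade buckets standing in for the defined time periods (both are assumptions for illustration):

```python
# Group extracted entities by the verified temporal context of their source,
# yielding a corpus keyed by time period.
from collections import defaultdict

def extract_entities(text):
    # Stand-in for a real entity-recognition algorithm.
    return [w for w in text.split() if w[0].isupper()]

def classify_period(year):
    # Assumed grouping scheme: decade buckets.
    return f"{(year // 10) * 10}s"

def build_corpus(sources):
    corpus = defaultdict(list)
    for src in sources:
        period = classify_period(src["verified_year"])
        corpus[period].extend(extract_entities(src["content"]))
    return dict(corpus)

sources = [{"content": "Lincoln spoke at Gettysburg", "verified_year": 1863},
           {"content": "Armstrong walked on the Moon", "verified_year": 1969}]
corpus = build_corpus(sources)
# Each grouping now holds only entities whose source context matches its period.
```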

Automated resolution of over- and under-specification in a knowledge graph

Systems and methods for automated resolution of over-specification and under-specification in a knowledge graph are disclosed. In embodiments, a method includes: determining, by a computing device, that a size of an object cluster of a knowledge graph meets a threshold value indicating under-specification of a knowledge base of the knowledge graph; determining, by the computing device, sub-classes for objects of the knowledge graph; re-initializing, by the computing device, the knowledge graph based on the sub-classes to generate a refined knowledge graph, wherein the size of the object cluster is reduced in the refined knowledge graph; and generating, by the computing device, an output based on information determined from the refined knowledge graph.
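The threshold-and-re-initialize loop can be sketched as follows; the sub-classing rule is passed in as a callback because the patent leaves the sub-class determination method open (the `Animal`/`Bird`/`Mammal` example is purely illustrative):

```python
# When an object cluster grows past a threshold (indicating an
# under-specified class), split its members into sub-classes and
# re-initialize the class-to-objects mapping with the smaller clusters.
def refine_graph(graph, threshold, subclass_of):
    refined = {}
    for cls, objects in graph.items():
        if len(objects) >= threshold:           # under-specified class
            for obj in objects:
                refined.setdefault(subclass_of(obj), []).append(obj)
        else:
            refined[cls] = list(objects)        # small clusters kept as-is
    return refined

graph = {"Animal": ["cat", "dog", "sparrow", "eagle"]}
refined = refine_graph(
    graph, threshold=4,
    subclass_of=lambda o: "Bird" if o in {"sparrow", "eagle"} else "Mammal")
# The oversized "Animal" cluster is replaced by two smaller sub-class clusters.
```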

Distributed and redundant machine learning quality management

Provided is a process including: writing modelling-object classes using object-oriented modelling of the modelling methods, the modelling-object classes being members of a set of class libraries; writing quality-management classes using object-oriented modelling of quality management, the quality-management classes being members of the set of class libraries; scanning modelling-object classes in the set of class libraries to determine modelling-object class definition information; scanning quality-management classes in the set of class libraries to determine quality-management class definition information; using the modelling-object class definition information and the quality-management class definition information to produce object manipulation functions that allow a quality management system to access methods and attributes of modelling-object classes to manipulate objects of the modelling-object classes; and using the modelling-object class definition information and the quality-management class definition information to produce access to the object manipulation functions.
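The class-scanning and manipulation-function steps map naturally onto runtime introspection; the sketch below uses Python's `inspect` module, with an assumed `LinearModel` modelling-object class standing in for the class libraries:

```python
# Scan class definitions and produce generic manipulation functions so a
# quality-management layer can call modelling-object methods without
# hard-coding each class's interface.
import inspect

class LinearModel:
    """Hypothetical modelling-object class."""
    def __init__(self):
        self.coefficients = []
    def fit(self, data):
        self.coefficients = [sum(data) / len(data)]

def scan_class(cls):
    # Class-definition information: the public methods the class exposes.
    methods = [name for name, _ in inspect.getmembers(cls, inspect.isfunction)
               if not name.startswith("_")]
    return {"name": cls.__name__, "methods": methods}

def make_manipulator(cls):
    # Produce an object-manipulation function from the scanned definition.
    info = scan_class(cls)
    def call(obj, method, *args):
        assert method in info["methods"], f"{method} not exposed by {info['name']}"
        return getattr(obj, method)(*args)
    return call
```

A quality-management system would scan every class library the same way and access objects only through the produced manipulators.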

Database replication error recovery based on supervised learning

System and methods are described for automated recovery from errors occurring during replication of a database. The method includes getting text from one or more log files generated during database replication processing in a cloud computing environment, transforming the text into a structured language form represented by vectors, and identifying patterns in the vectors. The method further includes classifying one or more errors based on the identified patterns using supervised learning as either a recoverable error or an unrecoverable error, analyzing the one or more errors to determine one or more recovery jobs associated with database replication processing in the cloud computing environment for each of the recoverable errors, and invoking the one or more recovery jobs.
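A compact sketch of the vectorize-classify-recover flow; the bag-of-words vocabulary, the keyword decision rule (a stand-in for the trained supervised classifier), and the job names are all assumptions:

```python
# Vectorize log text, classify each error as recoverable or not, and invoke
# the matching recovery job for recoverable errors.
def vectorize(line, vocabulary):
    words = line.lower().split()
    return [words.count(term) for term in vocabulary]

VOCAB = ["timeout", "deadlock", "corrupt"]
RECOVERY_JOBS = {"timeout": "retry_replication", "deadlock": "restart_transaction"}

def classify(vector):
    # Assumed rule learned from labeled logs: corruption is unrecoverable.
    return "unrecoverable" if vector[VOCAB.index("corrupt")] > 0 else "recoverable"

def recover(line):
    vec = vectorize(line, VOCAB)
    if classify(vec) != "recoverable":
        return None                      # unrecoverable: no job invoked
    for term, job in RECOVERY_JOBS.items():
        if vec[VOCAB.index(term)] > 0:
            return job                   # invoke the associated recovery job
    return None
```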

Sampling-based preview mode for a data intake and query system
11599549 · 2023-03-07

Systems and methods are described for providing a user interface through which a user can program operation of a data processing pipeline by specifying a graph of nodes that transform data and interconnections that designate routing of data between individual nodes within the graph. In response to a user request, a preview mode can be activated that causes the data processing pipeline to retrieve data from at least one source specified by the graph, transform the data according to the nodes of the graph, sample the transformed data, and display the sampling of the transformed data to at least one node without writing the transformed data to at least one destination specified by the graph.
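The preview behavior can be sketched as a flag on pipeline execution (the node and sink API below is an assumption; real pipeline graphs have many sources and branches):

```python
# Run a pipeline of transforms; in preview mode, sample the transformed
# records for display instead of writing them to the destination.
import random

def run_pipeline(source, transforms, sink, preview=False, sample_size=3, seed=0):
    records = list(source)
    for transform in transforms:
        records = [transform(r) for r in records]
    if preview:
        rng = random.Random(seed)
        # Sampled records are returned for display only; nothing is written.
        return rng.sample(records, min(sample_size, len(records)))
    sink.extend(records)                 # normal mode: write to destination
    return records
```

The key property is that preview mode exercises the same transforms as a real run while leaving the destination untouched.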

System capable of establishing model for cardiac ventricular hypertrophy screening
11476004 · 2022-10-18

A system for establishing a model for cardiac ventricular hypertrophy (VH) screening includes a storage and a processor. The storage stores multiple pieces of subject data respectively associated with multiple subjects. Each of the pieces of subject data contains a basic physiological parameter group, an electrocardiographic parameter group, and an actual VH condition that corresponds to a left or right ventricle of the subject associated with the piece of subject data. The processor is electrically connected to the storage, splits the pieces of subject data into a training set and a test set, and establishes the model for VH screening based on the pieces of subject data in the training set by using machine learning techniques.
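The split-and-train step can be sketched as below; the `r_wave_amplitude` feature and the midpoint-threshold learner are stand-ins for the patent's unspecified parameter groups and machine learning techniques:

```python
# Split subject data into training and test sets, then fit a trivial
# threshold model for ventricular-hypertrophy screening.
import random

def split(subjects, test_fraction=0.2, seed=42):
    shuffled = list(subjects)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def train_threshold_model(training_set, feature="r_wave_amplitude"):
    # Stand-in learner: threshold at the midpoint between class means of
    # one electrocardiographic parameter.
    pos = [s[feature] for s in training_set if s["vh"]]
    neg = [s[feature] for s in training_set if not s["vh"]]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda subject: subject[feature] > threshold
```

The held-out test set would then be scored against the actual VH conditions to validate the model.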

Anomaly and outlier explanation generation for data ingested to a data intake and query system
11475024 · 2022-10-18

Systems and methods are described for processing ingested data, detecting anomalies in the ingested data, and providing explanations of a possible cause of the detected anomalies as the data is being ingested. For example, a token or field in the ingested data may have an anomalous value. Tokens or fields from another portion of the ingested data can be extracted and analyzed to determine whether there is any correlation between the values of the extracted tokens or fields and the anomalous token or field having an anomalous value. If a correlation is detected, this information can be surfaced to a user.
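The correlation check can be sketched as a lift comparison: a candidate field value explains the anomaly if it co-occurs with the anomalous value far more often than it appears in the rest of the stream. The field names and lift ratio below are assumptions:

```python
# Surface candidate-field values that co-occur with an anomalous value
# markedly more often than in non-anomalous events.
from collections import Counter

def explain_anomaly(events, anomaly_field, anomalous_value, candidate_field, lift=2.0):
    hits = [e for e in events if e[anomaly_field] == anomalous_value]
    rest = [e for e in events if e[anomaly_field] != anomalous_value]
    hit_counts = Counter(e[candidate_field] for e in hits)
    rest_counts = Counter(e[candidate_field] for e in rest)
    explanations = []
    for value, count in hit_counts.items():
        hit_rate = count / len(hits)
        rest_rate = rest_counts[value] / max(len(rest), 1)
        if hit_rate > lift * rest_rate:
            # Correlation detected: surface this value to the user.
            explanations.append((candidate_field, value))
    return explanations
```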

Solution to end-to-end feature engineering automation

Aspects of the present disclosure involve systems, methods, devices, and the like for an end-to-end solution to auto-identifying features. In one embodiment, a novel architecture is presented that enables the identification of optimal features and feature processes for use by a machine learning model. The novel architecture introduces a feature orchestrator for managing, routing, and retrieving the data and features associated with an analytical job request. The novel architecture also introduces a feature store designed to identify, rank, and store the features and data used in the analysis. To aid in identifying the optimal features and feature processes, a training system may also be included in the solution, which can perform some of the training and scoring of the features.
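The orchestrator/store interplay can be sketched as below; all class names, method names, and the score-based ranking are assumptions chosen to illustrate the routing described above:

```python
# A feature store that ranks and caches features, and an orchestrator that
# routes an analytical job request to the top-ranked ones.
class FeatureStore:
    def __init__(self):
        self._features = {}              # name -> (score, values)
    def register(self, name, score, values):
        self._features[name] = (score, values)
    def get(self, name):
        return self._features[name][1]
    def top(self, k):
        # Rank stored features by score, best first.
        ranked = sorted(self._features, key=lambda n: self._features[n][0],
                        reverse=True)
        return ranked[:k]

class FeatureOrchestrator:
    def __init__(self, store):
        self.store = store
    def handle_job(self, request):
        # Route: retrieve the best-ranked features for the requesting job.
        names = self.store.top(request["num_features"])
        return {name: self.store.get(name) for name in names}
```

In this sketch the training system would periodically re-score features and call `register` to update the rankings the orchestrator routes against.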