Patent classifications
G06N3/006
Action selection by reinforcement learning and numerical optimization
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent interacting with an environment. In one aspect, a method comprises, at each of one or more time steps: generating a respective action score for each action in a set of possible actions, wherein the set of possible actions comprises: (i) a plurality of atomistic actions, and (ii) one or more optimization actions, wherein each optimization action is associated with a respective objective function that measures performance of the agent on a corresponding auxiliary task; selecting an action from the set of possible actions in accordance with the action scores, wherein the selected action is an optimization action; in response to selecting the optimization action, performing a numerical optimization to identify a sequence of one or more atomistic actions that are predicted to optimize the objective function.
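The selection step above can be illustrated with a minimal sketch. It assumes a hypothetical toy 1-D environment, a softmax over the action scores, and, in place of a real numerical optimizer, an exhaustive search over short sequences of atomistic actions. The names `ATOMISTIC`, `OPTIMIZATION_ACTIONS`, `make_reach_objective`, `optimize_sequence`, and `select_and_expand` are illustrative only and do not come from the patent.

```python
import itertools
import math
import random

# Hypothetical atomistic actions for a toy 1-D environment (not from the source).
ATOMISTIC = ["left", "right", "stay"]
DELTAS = {"left": -1, "right": 1, "stay": 0}


def make_reach_objective(target):
    """Hypothetical auxiliary task: finish the sequence as close to `target` as possible."""
    def objective(position, sequence):
        final = position + sum(DELTAS[a] for a in sequence)
        return -abs(final - target)  # higher is better
    return objective


# Each optimization action is paired with its own objective function.
OPTIMIZATION_ACTIONS = {"reach_5": make_reach_objective(5)}


def optimize_sequence(objective, position, horizon=3):
    """Stand-in for numerical optimization: exhaustive search over all
    atomistic sequences of length 1..horizon (first best sequence wins)."""
    best_seq, best_val = None, -math.inf
    for length in range(1, horizon + 1):
        for seq in itertools.product(ATOMISTIC, repeat=length):
            val = objective(position, seq)
            if val > best_val:
                best_seq, best_val = list(seq), val
    return best_seq


def select_and_expand(scores, position, rng):
    """Sample an action from a softmax over `scores`; an optimization action
    is expanded into the atomistic sequence returned by the optimizer."""
    names = list(scores)
    top = max(scores.values())
    weights = [math.exp(scores[n] - top) for n in names]  # numerically stable softmax
    choice = rng.choices(names, weights=weights, k=1)[0]
    if choice in OPTIMIZATION_ACTIONS:
        return choice, optimize_sequence(OPTIMIZATION_ACTIONS[choice], position)
    return choice, [choice]
```

For example, `select_and_expand({"left": -50.0, "right": -50.0, "stay": -50.0, "reach_5": 0.0}, 0, random.Random(0))` almost surely picks `reach_5` and expands it into three `right` steps. A gradient-based or sampling-based optimizer could replace the brute-force search when the atomistic action space is large.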
Continuous machine learning for extracting description of visual content
Aspects of the present disclosure relate to machine learning techniques for the continuous implementation and training of a machine learning system that identifies the natural language meaning of visual content. A computer vision model or other suitable machine learning model can predict whether a given descriptor is associated with the visual content. A set of such models can be used to determine which descriptors from a set of descriptors are associated with the visual content, with the determined descriptors representing a meaning of the visual content. This meaning can be refined using a multi-armed bandit that tracks and analyzes interactions between the visual content and users associated with personas related to the determined descriptors.
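The abstract does not say which bandit algorithm is used. As a hedged sketch, the bandit refinement could look like the UCB1 policy below, where each candidate descriptor is an arm and the reward is assumed to be 1.0 when a user with a matching persona engages with the content and 0.0 otherwise. The class `DescriptorBandit` and this reward definition are hypothetical.

```python
import math


class DescriptorBandit:
    """UCB1 bandit over candidate descriptors (a hypothetical stand-in for
    the multi-armed bandit described in the abstract)."""

    def __init__(self, descriptors):
        self.counts = {d: 0 for d in descriptors}
        self.totals = {d: 0.0 for d in descriptors}
        self.t = 0

    def select(self):
        # Show each descriptor's persona audience once before using confidence bounds.
        for d, n in self.counts.items():
            if n == 0:
                return d

        def ucb(d):
            mean = self.totals[d] / self.counts[d]
            return mean + math.sqrt(2 * math.log(self.t) / self.counts[d])

        return max(self.counts, key=ucb)

    def update(self, descriptor, reward):
        # reward: assumed 1.0 if a matching-persona user engaged, else 0.0
        self.t += 1
        self.counts[descriptor] += 1
        self.totals[descriptor] += reward

    def ranking(self):
        # Descriptors ordered by observed engagement rate (refined "meaning").
        return sorted(self.counts,
                      key=lambda d: self.totals[d] / max(self.counts[d], 1),
                      reverse=True)
```

Running the loop `d = bandit.select(); bandit.update(d, reward)` for a few hundred interactions concentrates exposure on descriptors that persona-matched users actually respond to, while still occasionally re-testing the others.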
SECURITY STATUS BASED ON HIDDEN INFORMATION
Techniques for determining and presenting security status are described herein. The disclosed techniques include collecting information associated with an item; determining a security status associated with the item by classifying the item into one of a plurality of classifications based on the information associated with the item; presenting on a first interface information indicative of the security status, wherein the first interface further comprises at least one selectable interface element in relation to the information indicative of the security status; and performing an operation related to the item in response to receiving input indicative of a selection by a user of the at least one selectable interface element.
COMBINING MATH-PROGRAMMING AND REINFORCEMENT LEARNING FOR PROBLEMS WITH KNOWN TRANSITION DYNAMICS
A computer implemented method of improving parameters of a critic approximator module includes receiving, by a mixed integer program (MIP) actor, (i) a current state and (ii) a predicted performance of an environment from the critic approximator module. The MIP actor solves a mixed integer mathematical problem based on the received current state and the predicted performance of the environment. The MIP actor selects an action a and applies the action to the environment based on the solved mixed integer mathematical problem. A long-term reward is determined and compared to the predicted performance of the environment by the critic approximator module. The parameters of the critic approximator module are iteratively updated based on an error between the determined long-term reward and the predicted performance.
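The actor-critic loop above can be sketched under stated assumptions: a hypothetical inventory problem with known, deterministic transition dynamics, a linear critic, and brute-force enumeration of the integer action set standing in for a real mixed integer program solver. All names (`step`, `critic`, `mip_actor`, `critic_update`, `train_episode`) and the cost constants are illustrative, not taken from the patent.

```python
# Hypothetical inventory problem with known transition dynamics (not from the source).
DEMAND = 3
MAX_STOCK = 10
ACTIONS = range(0, 6)  # integer order quantities 0..5


def step(state, action):
    """Known dynamics: order `action` units, then DEMAND units are consumed."""
    available = state + action
    shortage = max(DEMAND - available, 0)
    next_state = min(max(available - DEMAND, 0), MAX_STOCK)
    reward = -(2 * action + 10 * shortage + next_state)  # order, shortage, holding costs
    return next_state, reward


def features(state):
    return [1.0, state / MAX_STOCK]


def critic(w, state):
    """Linear critic approximator: predicted long-term reward from `state`."""
    return sum(wi * fi for wi, fi in zip(w, features(state)))


def mip_actor(state, w, gamma=0.9):
    """Stand-in for the MIP actor: enumerate the integer actions and maximize
    immediate reward plus the discounted critic prediction. A real system
    would pass this objective to a mixed integer programming solver."""
    def score(a):
        nxt, r = step(state, a)
        return r + gamma * critic(w, nxt)
    return max(ACTIONS, key=score)


def critic_update(w, state, target, alpha=0.05):
    """Move the critic's prediction for `state` toward the observed return."""
    error = target - critic(w, state)
    return [wi + alpha * error * fi for wi, fi in zip(w, features(state))]


def train_episode(w, state, horizon=20, gamma=0.9, alpha=0.05):
    states, rewards = [], []
    for _ in range(horizon):
        action = mip_actor(state, w, gamma)
        states.append(state)
        state, r = step(state, action)
        rewards.append(r)
    # Long-term (discounted Monte Carlo) returns, computed backwards.
    g, returns = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    errors = []
    for s, g in zip(states, returns):
        errors.append(abs(g - critic(w, s)))  # error between return and prediction
        w = critic_update(w, s, g, alpha)
    return w, errors
```

With an untrained critic (`w = [0.0, 0.0]`) the actor is purely myopic, for example ordering exactly 3 units from an empty inventory; repeated calls to `train_episode` let the critic's predictions shape the actor's choices.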
Method and system for distributed learning and adaptation in autonomous driving vehicles
The present teaching relates to a system, method, and medium for in-situ perception in an autonomous driving vehicle. A plurality of types of sensor data, acquired continuously by a plurality of types of sensors deployed on the vehicle, are first received, where the plurality of types of sensor data provide information about the surroundings of the vehicle. Based on at least one model, one or more items that appear in the surroundings of the vehicle are tracked from a first of the plurality of types of sensor data, acquired by one or more sensors of a first of the plurality of types of sensors. At least some of the one or more items are then automatically labeled on-the-fly via either cross-modality validation or cross-temporal validation of the one or more items, and are used to locally adapt, on-the-fly, the at least one model in the vehicle.
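The two validation routes can be illustrated with simple rules. In the sketch below, which is a hypothetical illustration rather than the patented method, an item tracked in camera data is confirmed by cross-modality validation if some lidar detection lies within a distance threshold, and by cross-temporal validation if it was tracked in enough consecutive frames. The function names, the 2-D positions, and the thresholds are assumptions.

```python
import math


def cross_modality_labels(camera_items, lidar_points, radius=1.0):
    """Hypothetical rule: label an item True if any detection from a second
    sensor type lies within `radius` metres of the camera-tracked position."""
    labels = {}
    for item_id, (x, y) in camera_items.items():
        labels[item_id] = any(
            math.hypot(x - lx, y - ly) <= radius for lx, ly in lidar_points
        )
    return labels


def cross_temporal_labels(tracks, min_frames=3):
    """Hypothetical rule: label an item True if it was tracked in at least
    `min_frames` consecutive frames."""
    labels = {}
    for item_id, frames in tracks.items():
        frames = sorted(frames)
        best = 1 if frames else 0
        run = best
        for prev, cur in zip(frames, frames[1:]):
            run = run + 1 if cur == prev + 1 else 1  # reset on a gap
            best = max(best, run)
        labels[item_id] = best >= min_frames
    return labels
```

Items labeled True by either rule could then be added to an on-vehicle training set used to adapt the tracking model locally; how that adaptation is performed is not specified in the abstract.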
Methods and systems of industrial processes with self organizing data collectors and neural networks
Systems and methods for data collection for an industrial heating process are disclosed. The system according to one embodiment can include a plurality of data collectors, including a swarm of self-organized data collector members, wherein the swarm of self-organized data collector members organize to enhance data collection based on at least one of capabilities and conditions of the data collector members of the swarm, and wherein the plurality of data collectors is coupled to a plurality of input channels for acquiring collected data relating to the industrial heating process, and a data acquisition and analysis circuit for receiving the collected data via the plurality of input channels and structured to analyze the received collected data using a neural network to monitor a plurality of conditions relating to the industrial heating process.
FEEDBACK-UPDATED DATA RETRIEVAL CHATBOT
A computer retrieves data from a database. The computer retrieves a Machine Learning (ML) model trained to generate database queries. The computer applies the ML model to generate a primary database query based, at least in part, on a user inquiry available to the computer. The computer retrieves, using the primary database query, an initial set of data from a database available to the computer. In response to retrieving the initial set of data, the computer receives feedback assessing the initial set of data. In response to receiving the feedback, the computer applies a Natural Language Processing (NLP) model to identify query adjustment content within the feedback. The computer revises the primary database query based, at least in part, on the query adjustment content to generate a secondary database query. The computer retrieves, using the secondary database query, a secondary set of data from the database.
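The feedback loop can be sketched end to end with Python's built-in `sqlite3` module. The trained ML query generator and the NLP model are replaced here by a fixed template and two regular expressions; `generate_primary_query`, `extract_adjustments`, `revise_query`, and the `orders` table are hypothetical, and the sketch assumes the primary query has no `WHERE` clause of its own.

```python
import re
import sqlite3


def generate_primary_query(inquiry):
    # Hypothetical stand-in for the trained query-generation model.
    if "orders" in inquiry.lower():
        return "SELECT id, city, year FROM orders", []
    raise ValueError("unsupported inquiry")


def extract_adjustments(feedback):
    # Hypothetical stand-in for the NLP model: pull simple constraints such
    # as "only Paris" or "after 2020" out of free-text feedback.
    adjustments = []
    city = re.search(r"only (?:in )?([A-Z][a-z]+)", feedback)
    if city:
        adjustments.append(("city = ?", city.group(1)))
    year = re.search(r"after (\d{4})", feedback)
    if year:
        adjustments.append(("year > ?", int(year.group(1))))
    return adjustments


def revise_query(sql, params, adjustments):
    # Append each extracted constraint as a parameterized WHERE clause
    # (assumes the primary query has no WHERE clause yet).
    if not adjustments:
        return sql, params
    clauses = " AND ".join(clause for clause, _ in adjustments)
    return f"{sql} WHERE {clauses}", params + [value for _, value in adjustments]
```

In use, the primary query returns every order; feedback such as "Too many rows, show only Paris orders after 2020" yields the secondary query `... WHERE city = ? AND year > ?` with parameters `["Paris", 2020]`. Using `?` placeholders rather than string interpolation keeps user-derived values out of the SQL text.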
Predicting yielding likelihood for an agent
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting how likely it is that a target agent in an environment will yield to another agent when the pair of agents are predicted to have overlapping future paths. In one aspect, a method comprises obtaining a first trajectory prediction specifying a predicted future path for a target agent in an environment; obtaining a second trajectory prediction specifying a predicted future path for another agent in the environment; determining that, at an overlapping region, the predicted future path for the target agent overlaps with the predicted future path for the other agent; and in response: providing as input to a machine learning model respective features for the target agent and the other agent; and obtaining the likelihood score as output from the machine learning model.
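A minimal sketch of the overlap check and scoring step follows. It assumes trajectories are lists of 2-D waypoints sampled at a fixed interval `dt`, and it replaces the machine learning model with a hypothetical logistic function of a single feature: how much later the target agent reaches the overlapping region than the other agent. `find_overlap`, `yield_likelihood`, and the weights are illustrative.

```python
import math


def find_overlap(path_a, path_b, threshold=1.0):
    """Return the first (i, j) pair of waypoints closer than `threshold`, or None."""
    for i, (ax, ay) in enumerate(path_a):
        for j, (bx, by) in enumerate(path_b):
            if math.hypot(ax - bx, ay - by) < threshold:
                return i, j
    return None


def yield_likelihood(target_path, other_path, dt=0.5, weights=(1.5, 0.0)):
    """Hypothetical logistic model: the later the target reaches the
    overlapping region relative to the other agent, the likelier it yields."""
    overlap = find_overlap(target_path, other_path)
    if overlap is None:
        return None  # predicted paths do not overlap; no yield score needed
    i, j = overlap
    time_gap = (i - j) * dt  # target arrival time minus other arrival time (s)
    w, b = weights
    return 1.0 / (1.0 + math.exp(-(w * time_gap + b)))
```

In a real system the feature vector would include much more (speeds, headings, agent types, right of way), and the weights would be learned from observed interactions rather than fixed by hand.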
Subset conditioning using variational autoencoder with a learnable tensor train induced prior
The proposed model is a Variational Autoencoder with a learnable prior parametrized by a Tensor Train (VAE-TTLP). The VAE-TTLP can be used to generate new objects, such as molecules, that have specific properties, including, for molecules, specific biological activity. Because the prior is parametrized with a Tensor Train, the VAE-TTLP can be trained on data from which one or more properties of an object are omitted and still generate an object with a desired property.
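One reason a Tensor Train parametrization tolerates missing properties is that a missing discrete variable can be summed out core by core, without enumerating the full joint table. The pure-Python sketch below shows only that marginalization step on a hypothetical two-property distribution with non-negative cores; it is not the VAE-TTLP training procedure, and `tt_prob`, `tt_normalized`, and the example cores are assumptions.

```python
def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def tt_prob(cores, values):
    """Unnormalized probability of a (possibly partial) assignment.
    cores[k][v] is the matrix for property k taking value v; values[k] is
    None when property k is missing, in which case it is summed out."""
    result = [[1.0]]
    for core, v in zip(cores, values):
        if v is None:
            rows, cols = len(core[0]), len(core[0][0])
            # Marginalize: add the slices for every value of this property.
            slice_ = [[sum(core[val][i][j] for val in range(len(core)))
                       for j in range(cols)] for i in range(rows)]
        else:
            slice_ = core[v]
        result = matmul(result, slice_)
    return result[0][0]


def tt_normalized(cores, values):
    """Divide by the normalizing constant (every property summed out)."""
    return tt_prob(cores, values) / tt_prob(cores, [None] * len(cores))
```

With first-core slices `[[1, 2]]` and `[[3, 1]]` and second-core slices `[[1], [1]]` and `[[2], [0]]`, the joint weights are 3, 2, 4, 6 (total 15), so fixing the first property to 1 and leaving the second missing gives 10/15.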
Information provision device, information provision method, and program
To provide appropriate information in response to a user query even when there are multiple information provision modules that differ in their answer generation processing. A query sending unit 212 sends a user query to each of a plurality of information provision module units 220 that differ in answer generation processing and that each generate an answer candidate for the user query. An output control unit 214 performs control such that the answer candidate acquired from each of the plurality of information provision module units 220 is displayed on a display unit 300 on a per-agent basis, together with information on the agent associated with that information provision module unit 220.
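The fan-out and per-agent display can be sketched as follows. The `ProvisionModule` dataclass, the lambda-based answer generators, and the `[agent] answer` display format are hypothetical; they only mirror the roles of the query sending unit 212, the module units 220, and the output control unit 214.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ProvisionModule:
    agent_name: str                 # agent associated with this module unit
    generate: Callable[[str], str]  # module-specific answer generation


def send_query(query: str, modules: List[ProvisionModule]) -> List[Tuple[str, str]]:
    """Query sending unit: ask every module for an answer candidate."""
    return [(m.agent_name, m.generate(query)) for m in modules]


def render_per_agent(candidates: List[Tuple[str, str]]) -> str:
    """Output control unit: one display line per agent with its candidate."""
    return "\n".join(f"[{agent}] {answer}" for agent, answer in candidates)
```

Because each candidate keeps its agent label, a user can see which module produced which answer even when the modules use entirely different generation methods.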