G06N3/092

Action selection for reinforcement learning using a manager neural network that generates goal vectors defining agent objectives

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to, at each of multiple time steps, generate a final goal vector for the time step. The worker subsystem is configured to, at each of the multiple time steps, use the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.
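The worker step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how the goal vector is turned into scores, so the sketch assumes each action has an embedding vector and scores it by a dot product with the goal. The names `action_scores`, `softmax`, and all numeric values are hypothetical.

```python
import math

def action_scores(action_embeddings, goal):
    """Score each action as the dot product of its embedding with the goal vector.

    Assumption: the dot-product form is illustrative; the abstract only says the
    worker uses the goal vector to produce one score per action.
    """
    return [sum(u * g for u, g in zip(row, goal)) for row in action_embeddings]

def softmax(scores):
    """Turn raw scores into a probability distribution over actions."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical goal vector from the manager and per-action embeddings from the worker.
goal = [1.0, 0.0]
action_embeddings = [[0.5, 0.2], [2.0, -1.0], [-1.0, 3.0]]

scores = action_scores(action_embeddings, goal)
probs = softmax(scores)
best_action = max(range(len(scores)), key=scores.__getitem__)
```

Here the second action aligns best with the goal, so it receives the highest score.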

Method and system for on-the-fly object labeling via cross modality validation in autonomous driving vehicles

The present teaching relates to a method, system, medium, and implementation for in-situ perception in an autonomous driving vehicle. A plurality of types of sensor data are acquired continuously via a plurality of types of sensors deployed on the vehicle, where the plurality of types of sensor data provide information about the surroundings of the vehicle. One or more items surrounding the vehicle are tracked, based on at least one model, from a first of the plurality of types of sensor data acquired by a first type of the plurality of types of sensors. A second of the plurality of types of sensor data is obtained from a second type of the plurality of types of sensors and is used to generate validation base data. At least some of the one or more items are labeled automatically via the validation base data to generate at least some labeled items, which are used to generate model update information for updating the at least one model.

METHOD FOR MULTI-TIME SCALE VOLTAGE QUALITY CONTROL BASED ON REINFORCEMENT LEARNING IN A POWER DISTRIBUTION NETWORK
20220405633 · 2022-12-22 ·

A method for multi-time scale reactive voltage control based on reinforcement learning in a power distribution network is provided, which relates to the field of power system operation and control. The method includes: constituting an optimization model for multi-time scale reactive voltage control in a power distribution network based on a reactive voltage control object of a slow discrete device and a reactive voltage control object of a fast continuous device in the power distribution network; constructing a hierarchical interaction training framework based on a two-layer Markov decision process based on the model; setting a slow agent for the slow discrete device and setting a fast agent for the fast continuous device; and deciding action values of the controlled devices by each agent based on measurement information inputted, so as to realize the multi-time scale reactive voltage control while the slow agent and the fast agent perform continuous online learning.
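The two-timescale arrangement described above can be illustrated with a minimal scheduling sketch. It assumes the slow agent acts once every `slow_period` steps while the fast agent acts at every step; the period, the stand-in policies, and the measurement function are all hypothetical, and no learning update is shown.

```python
def run_two_timescale(num_steps, slow_period, slow_agent, fast_agent, measure):
    """Run both agents on a shared clock; the slow action is held between updates."""
    history = []
    slow_action = None
    for t in range(num_steps):
        measurement = measure(t)
        if t % slow_period == 0:
            # Slow discrete device (for example a tap changer) decides rarely.
            slow_action = slow_agent(measurement)
        # Fast continuous device decides at every step.
        fast_action = fast_agent(measurement)
        history.append((t, slow_action, fast_action))
    return history

# Hypothetical stand-ins for measurements and policies.
def measure(t):
    return 1.0 + 0.01 * t          # voltage in per-unit, drifting upward

def slow_agent(voltage):
    return -1 if voltage > 1.02 else 0   # discrete tap step

def fast_agent(voltage):
    return 1.0 - voltage           # continuous reactive power correction

history = run_two_timescale(6, 3, slow_agent, fast_agent, measure)
slow_actions = [entry[1] for entry in history]
```

With a period of 3, the slow action changes only at steps 0 and 3 and is held in between.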

INVERSE REINFORCEMENT LEARNING-BASED DELIVERY MEANS DETECTION APPARATUS AND METHOD
20220405682 · 2022-12-22 ·

In an inverse reinforcement learning-based delivery means detection apparatus and method according to a preferred embodiment of the present invention, an artificial neural network model may be trained by using an actual deliveryman's driving record and imitated driving record, and from a specific deliveryman's driving record, a delivery means of the corresponding deliveryman may be detected by using the trained artificial neural network model, so that a deliveryman suspected of being abusive may be identified.

SYSTEM AND METHOD FOR RISK SENSITIVE REINFORCEMENT LEARNING ARCHITECTURE
20220405643 · 2022-12-22 ·

A computer-implemented system and method for training an automated agent are disclosed. An example system includes: a communication interface; at least one processor; memory in communication with said at least one processor; software code stored in said memory, which when executed causes said system to: instantiate an automated agent that maintains a reinforcement learning neural network and generates, according to outputs of said reinforcement learning neural network, signals for communicating task requests; receive a plurality of states and a plurality of actions for the automated agent; initialize a learning table Q for the automated agent based on the plurality of states and the plurality of actions; compute a plurality of updated learning tables based on the initialized learning table Q using a utility function, the utility function comprising a monotonically increasing concave function; and generate an averaged learning table Q′ based on the plurality of updated learning tables.
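The table-based steps above can be sketched as follows. This is an illustrative reading, not the disclosed method: it assumes the utility function is applied to the temporal-difference error, picks an exponential utility as one example of a monotonically increasing concave function, and uses hypothetical states, actions, transitions, and hyperparameters.

```python
import math

def utility(x, beta=0.5):
    """Monotonically increasing, concave utility: u(x) = (1 - exp(-beta * x)) / beta."""
    return (1.0 - math.exp(-beta * x)) / beta

def updated_table(q, transitions, alpha=0.1, gamma=0.9):
    """Return a copy of q after one pass of utility-shaped TD updates.

    Assumption: shaping the TD error with the utility is one possible way to
    use the utility function; the abstract does not specify where it enters.
    """
    new_q = {s: dict(row) for s, row in q.items()}
    for s, a, r, s_next in transitions:
        td_error = r + gamma * max(new_q[s_next].values()) - new_q[s][a]
        new_q[s][a] += alpha * utility(td_error)
    return new_q

def average_tables(tables):
    """Average the tables entry by entry to form Q'."""
    first = tables[0]
    return {s: {a: sum(t[s][a] for t in tables) / len(tables) for a in first[s]}
            for s in first}

# Hypothetical states, actions, and sampled transitions (s, a, reward, s_next).
states, actions = ["s0", "s1"], ["a0", "a1"]
q_init = {s: {a: 0.0 for a in actions} for s in states}
batches = [[("s0", "a0", 1.0, "s1")], [("s0", "a0", -1.0, "s1")]]

tables = [updated_table(q_init, batch) for batch in batches]
q_avg = average_tables(tables)
```

Because the utility is concave, a loss of 1 moves the estimate further than a gain of 1, so the averaged entry for (`s0`, `a0`) ends up slightly negative. That asymmetry is what makes the update risk sensitive.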

Autonomous Behavior Generation for Aircraft Using Augmented and Generalized Machine Learning Inputs
20220404831 · 2022-12-22 ·

An example method for training a machine learning algorithm (MLA) to control a first aircraft in an environment that comprises the first aircraft and a second aircraft can involve: determining a first-aircraft action for the first aircraft to take within the environment; sending the first-aircraft action to a simulated environment; generating and sending to both the simulated environment and the MLA, randomly-sampled values for each of a set of parameters of the second aircraft different from predetermined fixed values for the set of parameters; receiving an observation of the simulated environment and a reward signal at the MLA, the observation including information about the simulated environment after the first aircraft has taken the first-aircraft action and the second aircraft has taken a second-aircraft action based on the randomly-sampled values; and updating the MLA based on the observation of the simulated environment, the reward signal, and the randomly-sampled values.
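The randomization step above can be sketched as follows. The parameter names, ranges, and observation values are hypothetical, and the sketch covers only sampling the second aircraft's parameters and appending them to the input of the MLA; the simulator and the learning update are left out.

```python
import random

# Hypothetical parameter ranges for the second aircraft (names are illustrative).
PARAM_RANGES = {"max_speed": (150.0, 300.0), "turn_rate": (2.0, 8.0)}

def sample_params(ranges, rng):
    """Draw one randomly-sampled value per parameter instead of using fixed values."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}

def build_mla_input(observation, params):
    """Append the sampled values to the observation in a fixed key order,
    so the learner sees which opponent configuration produced the episode."""
    return list(observation) + [params[name] for name in sorted(params)]

rng = random.Random(0)                      # seeded for reproducibility
params = sample_params(PARAM_RANGES, rng)
mla_input = build_mla_input([0.1, -0.4, 0.9], params)
```

Feeding the sampled values to the learner along with the observation lets it condition its behavior on the opponent's configuration rather than averaging over it.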

MACHINE LEARNING FOR TRAINING NLP AGENT

A computer-implemented process for training a natural language processing (NLP) agent having a reinforced learning model includes the following operations. A type of document from a document corpus is identified using metadata particularly associated with the document. The NLP agent tokenizes the document to generate a plurality of tokens. Using a schema identified from the type of the document, one of the plurality of tokens is compared to a system of record (SOR) field from the schema. A similarity score between the one of the plurality of tokens and a correct value is generated, along with a reward based upon the similarity score. A determination is made that an optimum minimum average similarity rate has not been obtained. Based upon the determination, the reinforced learning model is trained using a loss function that includes the reward.
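The scoring and stopping check above can be sketched with the Python standard library. The abstract does not name a similarity measure, so this sketch assumes `difflib.SequenceMatcher` as a stand-in and takes the reward to be the similarity score itself. The function names, the `target_average` threshold, and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(token, correct_value):
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, token.lower(), correct_value.lower()).ratio()

def reward_from(score):
    """Assumption: the reward is the similarity score itself."""
    return score

def needs_training(scores, target_average):
    """True while the average similarity is still below the target rate."""
    return sum(scores) / len(scores) < target_average

# Hypothetical extracted tokens compared against the correct SOR field value.
scores = [similarity("Acme Corp", "ACME Corp"), similarity("Acme Crop", "ACME Corp")]
rewards = [reward_from(s) for s in scores]
keep_training = needs_training(scores, target_average=0.99)
```

An exact match scores 1.0, while the transposed "Crop" scores lower, so the average stays below the target and training continues.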

REINFORCEMENT LEARNING DEVICE AND OPERATION METHOD THEREOF
20220405642 · 2022-12-22 ·

A reinforcement learning device includes a computation circuit configured to perform an operation between a weight matrix and an input activation vector and to apply an activation function on an output of the operation to generate an output activation vector. The computation circuit quantizes the input activation vector when a quantization delay time has elapsed since beginning of a learning operation and does not quantize the input activation vector otherwise.
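The delayed quantization described above can be sketched in software, although the abstract describes a hardware computation circuit. The sketch assumes the elapsed time is counted in learning steps, uses uniform symmetric quantization, and picks ReLU as the activation function; none of these choices come from the abstract, and all names and values are hypothetical.

```python
def quantize(vector, num_bits=8):
    """Uniform symmetric quantization, returned as dequantized floats."""
    max_abs = max(abs(v) for v in vector)
    if max_abs == 0.0:
        return list(vector)
    levels = 2 ** (num_bits - 1) - 1
    scale = max_abs / levels
    return [round(v / scale) * scale for v in vector]

def forward(weights, activation, elapsed, quantization_delay, num_bits=8):
    """Matrix-vector product followed by ReLU.

    The input activation is quantized only once `elapsed` has reached
    `quantization_delay`; before that it is used at full precision.
    """
    if elapsed >= quantization_delay:
        activation = quantize(activation, num_bits)
    pre_activation = [sum(w * a for w, a in zip(row, activation)) for row in weights]
    return [max(0.0, p) for p in pre_activation]

# Hypothetical identity weights and a 2-bit quantizer to make the effect visible.
weights = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.4]
early = forward(weights, x, elapsed=0, quantization_delay=10, num_bits=2)
late = forward(weights, x, elapsed=10, quantization_delay=10, num_bits=2)
```

Before the delay the small component 0.4 passes through unchanged; after the delay the 2-bit quantizer rounds it to zero.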

DYNAMIC ADJUSTMENT OF PARALLEL REALITY DISPLAYS

In an approach for dynamically adjusting parallel reality (PR) displays, a processor configures a viewing event. A processor receives data from data collecting devices located throughout a location of the viewing event. A processor classifies a crowd of the viewing event into at least two partitions using a learning-based neural network that ingests the data. A processor selects content to be displayed to each of the at least two partitions. A processor enables a PR display to simultaneously display the content to each of the at least two partitions.

MULTI-LAYER NEURAL NETWORK SYSTEM AND METHOD

A computer-implemented method comprising: obtaining an input; processing the input using a neural network comprising a plurality of layers, comprising: calculating, using a first one or more layers of the plurality of layers, a first intermediate output from the input; reducing a size of one or more dimensions of the first intermediate output; calculating, using a second one or more layers, a second intermediate output from the first intermediate output, the second one or more layers comprising one or more ultra-low precision layers; reducing a size of one or more dimensions of the second intermediate output; combining a plurality of reduced intermediate outputs to derive a combined intermediate output, wherein the plurality of reduced intermediate outputs comprise the reduced first intermediate output and the reduced second intermediate output; and calculating, using one or more higher-precision layers of the plurality of layers, a neural network output using the combined intermediate output; and outputting an output based on the neural network output.
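The data flow in the claim above can be sketched on plain vectors. Several choices here are assumptions: sign-binarized weights stand in for an "ultra-low precision" layer, average pooling for the size reduction, and concatenation for the combining step. The sketch also feeds the unreduced first intermediate output to the second layer, which is one possible reading of the claim. All names and weights are hypothetical.

```python
def dense(weights, x):
    """Full-precision matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def relu(x):
    return [max(0.0, v) for v in x]

def binarize(weights):
    """Ultra-low precision stand-in: keep only the sign of each weight (1 bit)."""
    return [[1.0 if w >= 0 else -1.0 for w in row] for row in weights]

def avg_pool(x, window=2):
    """Reduce the vector length by averaging non-overlapping windows."""
    return [sum(x[i:i + window]) / window for i in range(0, len(x) - window + 1, window)]

def forward(x, w1, w2, w_out):
    h1 = relu(dense(w1, x))                  # first intermediate output
    h2 = relu(dense(binarize(w2), h1))       # second intermediate output, low precision
    combined = avg_pool(h1) + avg_pool(h2)   # reduce each, then concatenate
    return dense(w_out, combined)            # higher-precision output layer

# Hypothetical weights and input.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
w2 = [[0.3, -0.2, 0.1, 0.5], [-0.7, 0.4, -0.1, 0.2],
      [0.9, 0.8, 0.7, 0.6], [-0.1, -0.2, -0.3, -0.4]]
w_out = [[1.0, 1.0, 1.0, 1.0]]
output = forward([1.0, 2.0], w1, w2, w_out)
```

Combining the reduced outputs of both stages lets the higher-precision output layer see the full-precision features next to the low-precision ones.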