Patent classifications
G06N3/092
Reinforcement learning
In one embodiment, a method includes receiving a request to determine whether to perform an action, where the action is based on one or more feature values. A prediction of whether to perform the action is generated using a machine-learning model that is trained on the feature values, a heuristic value based on the feature values, and one or more feedback scores based on corresponding past predictions generated by the model; the heuristic value indicates whether to perform the action based on one or more predetermined conditions that depend on the feature values. The method performs the action when the prediction indicates that the action is to be performed, receives a feedback score that indicates a level of effectiveness of the prediction, and updates the machine-learning model based on the feedback score, the feature values, and the heuristic value.
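The claimed loop can be sketched in simplified form; the logistic scorer, the sum-threshold heuristic, and all names below are illustrative assumptions, not the patented implementation:

```python
import math

def heuristic(features, threshold=1.0):
    """Predetermined condition on the feature values (assumed: sum-threshold rule)."""
    return 1.0 if sum(features) > threshold else 0.0

class ActionPredictor:
    """Minimal online model scoring [features..., heuristic_value] with a logistic."""
    def __init__(self, n_features, lr=0.1):
        self.weights = [0.0] * (n_features + 1)   # last weight pairs with the heuristic
        self.lr = lr

    def predict(self, features):
        x = list(features) + [heuristic(features)]
        z = sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))          # probability the action should run

    def update(self, features, feedback_score):
        """Feedback score in [0, 1] rates how effective the past prediction was."""
        x = list(features) + [heuristic(features)]
        error = feedback_score - self.predict(features)
        self.weights = [w + self.lr * error * xi for w, xi in zip(self.weights, x)]

model = ActionPredictor(n_features=2)
prediction = model.predict([0.5, 0.9])             # 0.5 before any feedback arrives
# Suppose the action was performed and rated fully effective:
model.update([0.5, 0.9], feedback_score=1.0)
```

The heuristic enters both prediction and update as an extra input feature, so the model can learn how much weight to give the predetermined conditions.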
Image sensor having on-chip compute circuit
In one example, an apparatus comprises: a first sensor layer including an array of pixel cells configured to generate pixel data; and one or more semiconductor layers located beneath the first sensor layer and electrically connected to the first sensor layer via interconnects. The one or more semiconductor layers comprise on-chip compute circuits configured to receive the pixel data via the interconnects and process the pixel data, the on-chip compute circuits comprising: a machine learning (ML) model accelerator configured to implement a convolutional neural network (CNN) model to process the pixel data; a first memory to store coefficients of the CNN model and instruction codes; a second memory to store the pixel data of a frame; and a controller configured to execute the instruction codes to control operations of the ML model accelerator, the first memory, and the second memory.
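One layer of the described pipeline can be illustrated with a plain 2D convolution; the kernel, frame, and function names below are assumptions standing in for the coefficients in the first memory, the frame in the second memory, and the ML accelerator:

```python
def convolve2d(frame, kernel):
    """Valid-mode 2D convolution, standing in for one CNN-accelerator layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(frame[i + di][j + dj] * kernel[di][dj]
                            for di in range(kh) for dj in range(kw))
    return out

coefficients = [[0.25, 0.25], [0.25, 0.25]]   # "first memory": CNN coefficients
frame = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]     # "second memory": one pixel frame
features = convolve2d(frame, coefficients)    # controller drives the accelerator
```

Keeping both the coefficients and the frame on-chip is what lets the compute circuits process pixel data without streaming it off the sensor.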
TRAINING ACTION SELECTION NEURAL NETWORKS USING APPRENTICESHIP
An off-policy reinforcement learning actor-critic neural network system configured to select actions from a continuous action space to be performed by an agent interacting with an environment to perform a task. An observation defines environment state data and reward data. The system has an actor neural network which learns a policy function mapping the state data to action data. A critic neural network learns an action-value (Q) function. A replay buffer stores tuples of the state data, the action data, the reward data and new state data. The replay buffer also includes demonstration transition data comprising a set of the tuples from a demonstration of the task within the environment. The neural network system is configured to train the actor neural network and the critic neural network off-policy using stored tuples from the replay buffer comprising tuples both from operation of the system and from the demonstration transition data.
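The replay buffer described here, holding both agent transitions and permanent demonstration tuples, might look like the following sketch (the capacity, eviction policy, and demo-fraction sampling rule are assumptions):

```python
import random

class DemoReplayBuffer:
    """Replay buffer mixing agent transitions with permanent demonstration tuples."""
    def __init__(self, demo_transitions, capacity=1000):
        self.demo = list(demo_transitions)    # (state, action, reward, next_state)
        self.agent = []
        self.capacity = capacity

    def add(self, transition):
        self.agent.append(transition)
        if len(self.agent) > self.capacity:
            self.agent.pop(0)                 # demonstrations are never evicted

    def sample(self, batch_size, demo_fraction=0.25):
        """Draw a batch containing both demonstration and agent tuples."""
        n_demo = min(int(batch_size * demo_fraction), len(self.demo))
        batch = random.sample(self.demo, n_demo)
        batch += random.sample(self.agent, min(batch_size - n_demo, len(self.agent)))
        return batch

buf = DemoReplayBuffer(demo_transitions=[("s0", "a0", 1.0, "s1")])
buf.add(("s1", "a1", 0.0, "s2"))
batch = buf.sample(batch_size=2, demo_fraction=0.5)
```

Both the actor and the critic would then be trained off-policy on such mixed batches, so the demonstration of the task keeps shaping the learned policy and Q-function.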
ADAPTIVE LOOKAHEAD FOR PLANNING AND LEARNING
A method is performed by an agent operating in an environment. The method comprises computing a first value associated with each state of a number of states in the environment, determining a lookahead horizon for each of the states based on its computed first value, applying a first policy to compute a second value for at least one of the states based on the determined lookahead horizons, and determining a second policy based on the first policy and the second values for the states in the environment.
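A toy rendering of the four steps, on an assumed deterministic chain MDP and an assumed rule that low-value states get a deeper lookahead horizon:

```python
# Assumed toy setting: a deterministic four-state chain with a goal reward at 3.
STATES = [0, 1, 2, 3]
ACTIONS = [-1, +1]
GAMMA = 0.9

def step(state, action):
    nxt = max(0, min(3, state + action))
    return nxt, (1.0 if nxt == 3 else 0.0)

def rollout_value(state, policy, horizon):
    """Discounted return of following `policy` from `state` for `horizon` steps."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward = step(state, policy[state])
        total += discount * reward
        discount *= GAMMA
    return total

first_policy = {s: -1 for s in STATES}                        # always move left
# Step 1: first value per state.
first_value = {s: rollout_value(s, first_policy, 1) for s in STATES}
# Step 2: states with a low first value get a deeper lookahead horizon.
horizons = {s: 1 if first_value[s] > 0 else 3 for s in STATES}
# Step 3: apply the first policy over each state's own horizon.
second_value = {s: rollout_value(s, first_policy, horizons[s]) for s in STATES}
# Step 4: second policy is greedy with respect to the second values.
second_policy = {s: max(ACTIONS, key=lambda a, s=s: step(s, a)[1]
                        + GAMMA * second_value[step(s, a)[0]])
                 for s in STATES}
```

The per-state horizon is the adaptive part: planning depth is spent only where the current value estimate suggests it is needed.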
POLICY LEARNING METHOD, POLICY LEARNING APPARATUS, AND PROGRAM
A policy learning apparatus of the present invention includes: a first unit configured to select a first action element based on a selection rate for each choice of the first action element, the number of choices not depending on the state; a second unit configured to apply the selected first action element, further apply each choice of a second action element whose number of choices depends on the state to obtain another state for each choice, and determine the other state based on a reward obtained by shifting to the other state and a value of the other state; and a third unit configured to further learn a model by using learning data generated from information used when determining the other state.
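The first two units could be sketched as a two-stage action selection; the toy state space, selection rates, transition model, and value estimates below are all assumptions:

```python
import random

# State-independent choices for the first action element, with selection rates.
selection_rates = {"modeA": 0.7, "modeB": 0.3}
# Learned value estimates for states.
state_value = {"s1": 0.0, "s2": 1.0, "s3": 0.5}

def second_choices(state):
    """Choices of the second action element; their number depends on the state."""
    return ["s2", "s3"] if state == "s1" else ["s1"]

def model(state, mode, target):
    """Assumed transition model: returns the other state and the shift reward."""
    reward = 0.1 if mode == "modeA" else 0.0
    return target, reward

def select_action(state):
    # First unit: sample the first action element by its selection rate.
    modes, rates = zip(*selection_rates.items())
    mode = random.choices(modes, weights=rates)[0]
    # Second unit: score each second-element choice by reward plus state value.
    best = max(second_choices(state),
               key=lambda t: model(state, mode, t)[1] + state_value[t])
    return mode, best

mode, next_state = select_action("s1")
```

The third unit would then collect the (state, mode, choice, reward, value) tuples used in this selection as learning data for the model.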
MEMORY-AUGMENTED GRAPH CONVOLUTIONAL NEURAL NETWORKS
System and method for processing a graph that defines a set of nodes and a set of edges, the nodes each having an associated set of node attributes, the edges each representing a relationship that connects two respective nodes, comprising: generating a first node embedding for each node by: generating, for the node and each of a plurality of neighbor nodes, a respective first edge attribute defining a respective relationship type between the node and the neighbor node based on the node attributes of the node and the node attributes of the neighbor node; generating a first neighborhood vector that aggregates information from the generated first edge attributes and the node attributes of the neighbor nodes; and generating the first node embedding based on the node attributes of the node and the generated first neighborhood vector.
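A minimal sketch of the embedding computation, assuming toy attribute vectors, an elementwise-product edge attribute, and mean aggregation (all of these are illustrative choices, not the claimed ones):

```python
# Assumed toy graph: node attributes are vectors; edges list each node's neighbors.
node_attrs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = {"a": ["b", "c"]}

def edge_attribute(u, v):
    """First edge attribute: relationship type derived from both endpoints
    (assumed here to be the elementwise product of their attributes)."""
    return [x * y for x, y in zip(node_attrs[u], node_attrs[v])]

def node_embedding(u):
    neighbors = edges.get(u, [])
    dim = len(node_attrs[u])
    # Neighborhood vector: aggregate edge attributes and neighbor attributes.
    agg = [0.0] * dim
    for v in neighbors:
        e = edge_attribute(u, v)
        agg = [a + ev + nv for a, ev, nv in zip(agg, e, node_attrs[v])]
    agg = [a / max(len(neighbors), 1) for a in agg]
    # First node embedding: the node's own attributes plus the aggregate.
    return node_attrs[u] + agg

emb = node_embedding("a")
```

Deriving the edge attribute from both endpoints is what distinguishes this from plain neighbor averaging: the same neighbor contributes differently depending on its relationship to the node.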
Swarm Based Orchard Management
A method and system provide the ability to manage an orchard. Sensor data that represents a first state of the orchard is captured via one or more sensors. The sensor data is captured as the one or more sensors are traveling through the orchard. An almanac is maintained. The almanac provides a state library of sequential states of a representative orchard and a task library for one or more tasks to be performed to transition between the sequential states. A task manager queries the almanac to identify a first task of the one or more tasks and allocates the first task to one or more robots that perform the first task.
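The almanac query and task allocation could be sketched with plain dictionaries; the orchard states, task names, and least-loaded allocation rule are assumptions:

```python
# Assumed minimal almanac: a state library of sequential orchard states and a
# task library keyed by (from_state, to_state) transitions.
state_library = ["dormant", "budding", "flowering"]
task_library = {("dormant", "budding"): "prune", ("budding", "flowering"): "thin"}

def query_almanac(sensed_state):
    """Return the task that advances the orchard to its next sequential state."""
    i = state_library.index(sensed_state)
    if i + 1 >= len(state_library):
        return None
    return task_library[(sensed_state, state_library[i + 1])]

def allocate(task, robots):
    """Task manager: assign the task to the least-loaded robot."""
    robot = min(robots, key=lambda r: r["load"])
    robot["load"] += 1
    return robot["name"], task

robots = [{"name": "r1", "load": 2}, {"name": "r2", "load": 0}]
assignment = allocate(query_almanac("dormant"), robots)
```

The sensed first state plays the role of the key into the state library; the sensors traveling through the orchard would keep refreshing it.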
METHOD AND APPARATUS FOR GENERATING PROCESS SIMULATION MODELS
A method of generating a simulation model based on simulation data and measurement data of a target includes classifying weight parameters, included in a pre-learning model learned based on the simulation data, as a first weight group and a second weight group based on a degree of significance, retraining the first weight group of the pre-learning model based on the simulation data, and training the second weight group of a transfer learning model based on the measurement data, wherein the transfer learning model includes the first weight group of the pre-learning model retrained based on the simulation data.
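A schematic rendering of the weight-group split and two-stage training; the magnitude threshold as the "degree of significance" and the stand-in retraining update are assumptions:

```python
# Pre-learning model weights, assumed already learned from simulation data.
pretrained = {"w1": 0.9, "w2": 0.05, "w3": -0.8, "w4": 0.01}

def split_by_significance(weights, threshold=0.1):
    """Classify weights into a first (significant) and second group by magnitude."""
    first = {k: v for k, v in weights.items() if abs(v) >= threshold}
    second = {k: v for k, v in weights.items() if abs(v) < threshold}
    return first, second

def retrain(group, data, lr=0.1):
    """Stand-in update: nudge each weight toward the data mean."""
    mean = sum(data) / len(data)
    return {k: v + lr * (mean - v) for k, v in group.items()}

first_group, second_group = split_by_significance(pretrained)
first_group = retrain(first_group, data=[1.0, 1.0])      # simulation data
second_group = retrain(second_group, data=[0.0, 0.2])    # measurement data
transfer_model = {**first_group, **second_group}         # combined transfer model
```

Only the less significant second group sees the scarce measurement data, which is the transfer-learning point of the split.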
METHODS AND DECENTRALIZED SYSTEMS THAT EMPLOY DISTRIBUTED MACHINE LEARNING TO AUTOMATICALLY INSTANTIATE AND MANAGE DISTRIBUTED APPLICATIONS
The current document is directed to methods and systems that automatically instantiate complex distributed applications by deploying distributed-application instances across the computational resources of one or more distributed computer systems and that automatically manage instantiated distributed applications. Automatic deployment of multiple instances of a distributed application across computational resources, such as distribution of microservices of a microservice-based application across one or more distributed computer systems, and scaling of instantiated distributed applications are computationally difficult optimization problems that are not amenable to traditional centralized approaches. The current document discloses decentralized, distributed automated methods and systems that instantiate and manage distributed applications. Reinforcement-learning-based agents are installed within the computational resources of one or more distributed computer systems. Distributed-application instances are initially distributed to one or more agents. The agents then exchange distributed-application instances among themselves in order to locally optimize the set of distributed-application instances that they each manage.
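The local exchange step could be sketched as follows, with a load-imbalance measure standing in for each agent's local cost function (the agent structure and acceptance rule are assumptions):

```python
# Each agent manages distributed-application instances and accepts an exchange
# only when it lowers the combined local cost of the two agents involved.
class Agent:
    def __init__(self, name, capacity):
        self.name, self.capacity, self.instances = name, capacity, []

    def cost(self, n=None):
        """Local cost: distance of the instance count from the preferred load."""
        n = len(self.instances) if n is None else n
        return abs(n - self.capacity)

def try_exchange(giver, taker):
    """Move one instance between agents if it reduces their combined cost."""
    if not giver.instances:
        return False
    before = giver.cost() + taker.cost()
    after = giver.cost(len(giver.instances) - 1) + taker.cost(len(taker.instances) + 1)
    if after < before:
        taker.instances.append(giver.instances.pop())
        return True
    return False

a, b = Agent("a", capacity=1), Agent("b", capacity=2)
a.instances = ["svc1", "svc2", "svc3"]   # initial distribution lands on agent a
while try_exchange(a, b):                # agents trade until no exchange helps
    pass
```

In the described systems a learned, reinforcement-trained value would replace this hand-written cost, but the pairwise accept/reject exchange is the decentralizing mechanism either way.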