Apparatus and methods for operating robotic devices using selective state space training

Apparatus and methods for training and controlling of, e.g., robotic devices. In one implementation, a robot may be utilized to perform a target task characterized by a target trajectory. The robot may be trained by a user using supervised learning. The user may interface to the robot, such as via a control apparatus configured to provide a teaching signal to the robot. The robot may comprise an adaptive controller comprising a neuron network, which may be configured to generate actuator control commands based on the user input and the output of the learning process. During one or more learning trials, the controller may be trained to navigate a portion of the target trajectory. Individual trajectory portions may be trained during separate training trials. Some portions may be associated with the robot executing complex actions and may require additional training trials and/or denser training input compared to simpler trajectory actions.
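The trial-allocation idea in this abstract, more training trials (and denser teaching input) for complex trajectory portions, can be sketched as follows. This is an illustrative Python sketch; the function names, the per-segment complexity score, and the allocation rule are assumptions, not from the patent.

```python
def allocate_trials(segment_complexities, base_trials=2, extra_per_unit=3):
    """Assign training trials per trajectory portion: every portion gets a
    baseline number of trials, and more complex portions get extra trials
    (standing in here for denser training input as well)."""
    return [base_trials + round(extra_per_unit * c) for c in segment_complexities]

def train_trajectory(segments, complexities, train_portion):
    """Train each trajectory portion in its own sequence of separate trials."""
    for segment, n_trials in zip(segments, allocate_trials(complexities)):
        for trial in range(n_trials):
            train_portion(segment, trial)  # one supervised learning trial
```

A portion scored 0.0 gets only the baseline trials, while a portion scored 1.0 gets the baseline plus three extra, mirroring the abstract's contrast between simple and complex trajectory actions.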

MACHINE LEARNING METHODS AND APPARATUS RELATED TO PREDICTING MOTION(S) OF OBJECT(S) IN A ROBOT'S ENVIRONMENT BASED ON IMAGE(S) CAPTURING THE OBJECT(S) AND BASED ON PARAMETER(S) FOR FUTURE ROBOT MOVEMENT IN THE ENVIRONMENT
20220063089 · 2022-03-03 ·

Some implementations of this specification are directed generally to deep machine learning methods and apparatus related to predicting motion(s) (if any) that will occur to object(s) in an environment of a robot in response to particular movement of the robot in the environment. Some implementations are directed to training a deep neural network model to predict at least one transformation (if any), of an image of a robot's environment, that will occur as a result of implementing at least a portion of a particular movement of the robot in the environment. The trained deep neural network model may predict the transformation based on input that includes the image and a group of robot movement parameters that define the portion of the particular movement.
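As an interface sketch only: the abstract describes a model that consumes an image plus robot movement parameters and predicts a transformation of the image. A toy linear stand-in for the deep network (all names, and the 2x3 affine parameterization of the transformation, are assumptions for illustration) might look like:

```python
import numpy as np

def predict_transformation(image, movement_params, weights):
    """Toy stand-in for the trained deep network: map the flattened image
    concatenated with the robot movement parameters to the parameters of
    a 2x3 affine transform predicted to occur in the image."""
    x = np.concatenate([image.ravel(), np.asarray(movement_params)])
    theta = weights @ x          # single linear layer in place of the deep model
    return theta.reshape(2, 3)   # one predicted affine image transformation
```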

DEEP REINFORCEMENT LEARNING FOR ROBOTIC MANIPULATION

Using large-scale reinforcement learning to train a policy model that can be utilized by a robot in performing a robotic task in which the robot interacts with one or more environmental objects. In various implementations, off-policy deep reinforcement learning is used to train the policy model, and the off-policy deep reinforcement learning is based on self-supervised data collection. The policy model can be a neural network model. Implementations of the reinforcement learning utilized in training the neural network model utilize a continuous-action variant of Q-learning. Through techniques disclosed herein, implementations can learn policies that generalize effectively to previously unseen objects, previously unseen environments, etc.
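The "continuous-action variant of Q-learning" mentioned here requires maximizing a learned Q-function over a continuous action space. One common way to do that, shown below as a hedged sketch (the abstract does not specify this exact optimizer), is the cross-entropy method:

```python
import numpy as np

def cem_argmax_q(q_fn, state, action_dim, iters=5, pop=64, n_elite=6, seed=0):
    """Approximate argmax over actions of Q(state, action) by iteratively
    refitting a Gaussian to the highest-scoring sampled actions."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(action_dim), np.ones(action_dim)
    for _ in range(iters):
        actions = rng.normal(mu, sigma, size=(pop, action_dim))
        scores = np.array([q_fn(state, a) for a in actions])
        elites = actions[np.argsort(scores)[-n_elite:]]   # best actions
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu  # mean of the final elite distribution
```

With a toy quadratic Q-function peaked at 0.5, the routine converges near that action in a handful of iterations.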

DATA-EFFICIENT HIERARCHICAL REINFORCEMENT LEARNING
20210187733 · 2021-06-24 ·

Training and/or utilizing a hierarchical reinforcement learning (HRL) model for robotic control. The HRL model can include at least a higher-level policy model and a lower-level policy model. Some implementations relate to technique(s) that enable more efficient off-policy training to be utilized in training of the higher-level policy model and/or the lower-level policy model. Some of those implementations utilize off-policy correction, which re-labels higher-level actions of experience data, generated in the past utilizing a previously trained version of the HRL model, with modified higher-level actions. The modified higher-level actions are then utilized to off-policy train the higher-level policy model. This can enable effective off-policy training despite the lower-level policy model being a different version at training time (relative to the version when the experience data was collected).
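The off-policy correction described above can be sketched as a relabeling step: among candidate higher-level actions (goals), keep the one under which the current lower-level policy best reproduces the logged lower-level actions. Everything below (a deterministic lower-level policy, negative squared error as a log-likelihood surrogate) is an illustrative assumption:

```python
import numpy as np

def relabel_higher_level_action(lower_policy, states, logged_actions, candidates):
    """Re-label a logged higher-level action with the candidate that best
    explains the logged lower-level actions under the CURRENT lower-level
    policy (negative squared error stands in for log-likelihood)."""
    def fit(goal):
        return -sum(
            float(np.sum((np.asarray(lower_policy(s, goal)) - np.asarray(a)) ** 2))
            for s, a in zip(states, logged_actions))
    return max(candidates, key=fit)
```

The relabeled higher-level action, rather than the originally logged one, then feeds the off-policy update of the higher-level policy model.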

MITIGATING REALITY GAP THROUGH SIMULATING COMPLIANT CONTROL AND/OR COMPLIANT CONTACT IN ROBOTIC SIMULATOR
20210107157 · 2021-04-15 ·

Mitigating the reality gap through utilization of technique(s) that enable compliant robotic control and/or compliant robotic contact to be simulated effectively by a robotic simulator. The technique(s) can include, for example: (1) utilizing a compliant end effector model in simulated episodes of the robotic simulator; (2) using, during the simulated episodes, a soft constraint for a contact constraint of a simulated contact model of the robotic simulator; and/or (3) using proportional derivative (PD) control in generating joint control forces, for simulated joints of the simulated robot, during the simulated episodes. Implementations additionally or alternatively relate to determining parameter(s) for use in one or more of the techniques that enable effective simulation of compliant robotic control and/or compliant robotic contact.
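Technique (3), PD control of simulated joint forces, is the most self-contained of the three; a minimal sketch follows (names and gain values are illustrative; softer gains produce the more compliant behavior the abstract targets):

```python
def pd_joint_force(q_desired, q, q_dot, kp, kd):
    """Proportional-derivative joint force: a spring term toward the desired
    joint position plus velocity damping. Lower kp/kd gains make the
    simulated joint behave more compliantly."""
    return kp * (q_desired - q) - kd * q_dot
```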

CONFIGURING A SYSTEM WHICH INTERACTS WITH AN ENVIRONMENT
20200333752 · 2020-10-22 ·

A system is described for configuring another system, e.g., a robotics system. The other system interacts with an environment according to a deterministic policy by repeatedly obtaining, from a sensor, sensor data indicative of a state of the environment, determining a current action, and providing, to an actuator, actuator data causing the actuator to effect the current action in the environment. To configure the other system, the system optimizes a loss function based on an accumulated reward distribution with respect to a set of parameters of the policy. The accumulated reward distribution includes an action probability of an action of a previous interaction log being performed according to the current set of parameters. The action probability is approximated using a probability distribution defined by an action selected by the deterministic policy according to the current set of parameters.
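The key step in this abstract, assigning a probability to a logged action under a deterministic policy, can be sketched by centring a probability distribution on the action the current policy would select. The Gaussian form and the smoothing width below are assumptions for illustration, not taken from the publication:

```python
import math

def action_log_prob(policy_action, logged_action, sigma=0.1):
    """Approximate log-probability of a logged action under the current
    deterministic policy, using an isotropic Gaussian centred on the action
    that policy selects (sigma is an illustrative smoothing width)."""
    sq_dist = sum((p - a) ** 2 for p, a in zip(policy_action, logged_action))
    k = len(policy_action)
    return -sq_dist / (2 * sigma ** 2) - k * math.log(sigma * math.sqrt(2 * math.pi))
```

A logged action matching the policy's chosen action scores a higher log-probability than one far from it, which is what lets logged interactions be reweighted under the current parameters.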

REDUCED DEGREE OF FREEDOM ROBOTIC CONTROLLER APPARATUS AND METHODS
20200139540 · 2020-05-07 ·

Apparatus and methods for training and controlling of, for instance, robotic devices. In one implementation, a robot may be trained by a user using supervised learning. The user may be unable to control all degrees of freedom of the robot simultaneously. The user may interface to the robot via a control apparatus configured to select and operate a subset of the robot's complement of actuators. The robot may comprise an adaptive controller comprising a neuron network. The adaptive controller may be configured to generate actuator control commands based on the user input and the output of the learning process. Training of the adaptive controller may comprise partial set training. The user may train the adaptive controller to operate a first actuator subset. Subsequent to learning to operate the first subset, the adaptive controller may be trained to operate another subset of degrees of freedom based on user input via the control apparatus.
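The actuator-subset selection described above can be sketched as a blending rule: the user's commands drive the currently selected subset while the adaptive controller's learned output drives the remaining degrees of freedom. The names and the list-based command representation are illustrative assumptions:

```python
def blend_commands(user_cmd, learned_cmd, active_subset):
    """Compose one full command vector: user input controls the selected
    actuator subset; the adaptive controller's learned output covers the
    remaining actuators."""
    return [u if i in active_subset else l
            for i, (u, l) in enumerate(zip(user_cmd, learned_cmd))]
```

Changing `active_subset` between training phases mirrors the partial set training in the abstract: first one actuator subset is taught, then another, with the controller filling in the rest each time.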

Self-supervised robotic object interaction

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an object representation neural network. One of the methods includes obtaining training sets of images, each training set comprising: (i) a before image of a before scene of the environment, (ii) an after image of an after scene of the environment after the robot has removed a particular object, and (iii) an object image of the particular object; and training the object representation neural network on a batch of training data comprising the training sets, including determining an update to the object representation parameters that encourages the vector embedding of the particular object in each training set to be closer to the difference between (i) the vector embedding of the after scene in the training set and (ii) the vector embedding of the before scene in the training set.
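The training objective here is an embedding-arithmetic constraint. Following the abstract's wording literally (the object embedding pulled toward the after-scene embedding minus the before-scene embedding), a minimal loss sketch is:

```python
import numpy as np

def object_embedding_loss(e_object, e_before, e_after):
    """Squared distance between the object's vector embedding and the
    difference of the two scene embeddings, per the abstract's wording."""
    return float(np.sum((e_object - (e_after - e_before)) ** 2))
```

The loss is zero exactly when the object embedding equals the scene-embedding difference, which is the relationship the parameter update is said to encourage.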
