Patent classifications
G05B2219/39298
Deep reinforcement learning for robotic manipulation
Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes, each of which is an exploration of performing a task guided by the policy network and its current policy parameters during that episode. The experience data collected during the episodes is used to train the policy network by iteratively updating its policy parameters based on batches of the collected experience data. Further, prior to each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for use in performing that episode.
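A minimal sketch of that collection-and-update loop, with toy stand-ins: `run_episode` and `update_policy` are hypothetical names (the patent does not name these components), and a scalar stands in for the policy network's weights.

```python
import random

# Hypothetical stand-ins; not components named in the patent.
def run_episode(robot_id, policy_params):
    """One exploration episode guided by the current policy parameters.
    Returns a list of (state, action, reward, next_state) tuples."""
    return [(random.random(), random.random(), random.random(), random.random())
            for _ in range(10)]

def update_policy(policy_params, batch, lr=0.01):
    """Toy 'gradient step': nudge the scalar parameter using batch reward."""
    mean_reward = sum(r for _, _, r, _ in batch) / len(batch)
    return policy_params + lr * mean_reward

policy_params = 0.0      # stands in for the policy network's weights
experience = []          # experience pooled from all robots

for iteration in range(100):
    # Before each episode, every robot retrieves the latest parameters.
    for robot_id in range(4):
        experience.extend(run_episode(robot_id, policy_params))
    # The learner updates the parameters on a batch of pooled experience.
    batch = random.sample(experience, min(64, len(experience)))
    policy_params = update_policy(policy_params, batch)
```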
Systems and methods for learning to extrapolate optimal object routing and handling parameters
A system for object processing is disclosed. The system includes a framework of processes that enable reliable deployment of artificial intelligence-based policies in a warehouse setting to improve the speed, reliability, and accuracy of the system. The system harnesses a vast number of picks to provide data points to machine learning techniques, which use the data to refine or reinforce in-use policies and thereby optimize the speed and successful transfer of objects within the system. For example, objects in the system are identified at a supply location, and a predetermined set of information regarding each object is retrieved and combined with a set of object information and processing parameters determined by the system. The combined information is then used to determine routing of the object according to an initial policy. This policy is then observed, altered, tested, and re-implemented in altered form.
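The observe-alter-test-re-implement cycle can be illustrated with a deliberately simple stand-in: an epsilon-greedy choice among routing options, refined by pick outcomes. This is not the patented method, only a sketch of a policy being refined from pick data; all names are invented.

```python
import random

route_options = ["conveyor_a", "conveyor_b", "manual_station"]  # hypothetical routes
success_counts = {r: 1 for r in route_options}   # smoothed outcome statistics
attempt_counts = {r: 2 for r in route_options}

def choose_route(epsilon=0.1):
    # In-use policy: exploit the best observed success rate, explore occasionally.
    if random.random() < epsilon:
        return random.choice(route_options)
    return max(route_options, key=lambda r: success_counts[r] / attempt_counts[r])

def record_pick(route, succeeded):
    # Each pick is a data point that refines the routing policy.
    attempt_counts[route] += 1
    success_counts[route] += int(succeeded)

for _ in range(1000):                  # a large number of picks
    route = choose_route()
    record_pick(route, succeeded=random.random() < 0.8)
```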
TRAINING A POLICY MODEL FOR A ROBOTIC TASK, USING REINFORCEMENT LEARNING AND UTILIZING DATA THAT IS BASED ON EPISODES, OF THE ROBOTIC TASK, GUIDED BY AN ENGINEERED POLICY
Implementations disclosed herein relate to utilizing at least one existing manually engineered policy, for a robotic task, in training an RL policy model that can be used to at least selectively replace a portion of the engineered policy. The RL policy model can be trained for replacing a portion of the robotic task, and can be trained based on data from episodes of attempting performance of the robotic task, including episodes in which that portion and/or other portion(s) are performed based on the engineered policy. Once trained, the RL policy model can be used, at least selectively and in lieu of the engineered policy, to perform the portion of the robotic task, while other portion(s) of the robotic task are performed utilizing the engineered policy and/or other similarly trained (but distinct) RL policy model(s).
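A rough sketch of that selective replacement, assuming hypothetical task portions (approach/grasp/retract) and trivial policy stubs; the patent does not specify these portions:

```python
def engineered_policy(state, portion):
    """Hand-engineered controller covering every portion of the task."""
    return {"action": "engineered", "portion": portion}

def rl_policy(state):
    """Learned model trained to replace one portion of the task."""
    return {"action": "learned", "portion": "grasp"}

def perform_task(state, use_rl_for_grasp=True):
    actions = []
    for portion in ("approach", "grasp", "retract"):
        # Only the trained portion is (selectively) handed to the RL model;
        # the remaining portions still run under the engineered policy.
        if portion == "grasp" and use_rl_for_grasp:
            actions.append(rl_policy(state))
        else:
            actions.append(engineered_policy(state, portion))
    return actions

print(perform_task(state={}))
```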
Imitation Learning in a Manufacturing Environment
A computing system identifies a trajectory example generated by a human operator. The trajectory example includes trajectory information of the human operator while performing a task to be learned by a control system of the computing system. Based on the trajectory example, the computing system trains the control system to perform the task exemplified in the trajectory example. Training the control system includes generating an output trajectory of a robot performing the task. The computing system identifies an updated trajectory example generated by the human operator based on the trajectory example and the output trajectory of the robot performing the task. Based on the updated trajectory example, the computing system continues to train the control system to perform the task exemplified in the updated trajectory example.
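The alternating demonstrate-train-demonstrate loop can be sketched as follows; the trajectory representation and the `train`/`human_demonstration` stand-ins are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def train(controller, examples, lr=0.1):
    """Move the controller's output trajectory toward the demonstrations."""
    target = np.mean(examples, axis=0)
    return controller + lr * (target - controller)

def human_demonstration(robot_trajectory=None):
    """Operator stand-in: an example trajectory, corrected in view of the
    robot's most recent output (None before the first training round)."""
    ideal = np.linspace(0.0, 1.0, 5)
    if robot_trajectory is None:
        return ideal
    return 0.5 * (ideal + robot_trajectory)  # example refined via robot feedback

controller = np.zeros(5)              # toy control-system parameters
example = human_demonstration()       # initial trajectory example
for _ in range(10):
    controller = train(controller, [example])
    robot_output = controller         # the robot performs the task
    example = human_demonstration(robot_output)  # updated example from operator
```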
Adaptive predictor apparatus and methods
Apparatus and methods for training and operating robotic devices. A robotic controller may comprise a predictor apparatus configured to generate motor control output. The predictor may be operable in accordance with a learning process based on a teaching signal comprising the control output. An adaptive controller block may provide a control output that may be combined with the predicted control output, and the predictor learning process may be configured to learn the combined control signal. Predictor training may comprise a plurality of trials. During an initial trial, the control output may be capable of causing a robot to perform a task. During intermediate trials, the individual contributions from the controller block and the predictor may each be inadequate for the task. Upon learning, the control knowledge may be transferred to the predictor so as to enable task execution in the absence of subsequent inputs from the controller. The control output and/or the predictor output may comprise multi-channel signals.
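A toy numeric sketch of that transfer, assuming the teaching signal is simply the combined (controller plus predictor) output and the controller supplies whatever correction is still needed; the numbers and learning rule are invented for illustration.

```python
import numpy as np

target = np.array([1.0, -0.5, 0.25])     # motor command that performs the task
predictor = np.zeros(3)                  # predictor's learned control output
lr = 0.5

for trial in range(20):
    controller_out = target - predictor    # corrective teaching contribution
    combined = predictor + controller_out  # combined control signal
    # The predictor learns the combined signal (the teaching signal), so the
    # controller's required contribution shrinks with each trial.
    predictor += lr * (combined - predictor)

# After learning, the predictor alone can execute the task:
assert np.allclose(predictor, target, atol=1e-3)
```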
Reduced degree of freedom robotic controller apparatus and methods
Apparatus and methods for training and controlling, for instance, robotic devices. In one implementation, a robot may be trained by a user using supervised learning. The user may be unable to control all degrees of freedom of the robot simultaneously, and may interface with the robot via a control apparatus configured to select and operate a subset of the robot's complement of actuators. The robot may comprise an adaptive controller comprising a neuron network, configured to generate actuator control commands based on the user input and the output of the learning process. Training of the adaptive controller may comprise partial set training: the user may train the adaptive controller to operate a first actuator subset and, subsequent to learning to operate the first subset, the adaptive controller may be trained to operate another subset of degrees of freedom based on user input via the control apparatus.
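A minimal sketch of such partial set training, assuming a six-degree-of-freedom robot and a trivial per-actuator learning rule; the subsets and targets are invented for illustration.

```python
import numpy as np

n_dof = 6
target_commands = np.linspace(-1.0, 1.0, n_dof)  # desired per-actuator commands
learned = np.zeros(n_dof)

def train_subset(learned, subset, trials=50, lr=0.2):
    # The user teaches only the selected actuators; the others are untouched.
    for _ in range(trials):
        for joint in subset:
            learned[joint] += lr * (target_commands[joint] - learned[joint])
    return learned

learned = train_subset(learned, subset=[0, 1, 2])  # first actuator subset
learned = train_subset(learned, subset=[3, 4, 5])  # remaining degrees of freedom
```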
Apparatus and methods for operating robotic devices using selective state space training
Apparatus and methods for training and controlling, e.g., robotic devices. In one implementation, a robot may be utilized to perform a target task characterized by a target trajectory. The robot may be trained by a user using supervised learning. The user may interface with the robot, such as via a control apparatus configured to provide a teaching signal to the robot. The robot may comprise an adaptive controller comprising a neuron network, which may be configured to generate actuator control commands based on the user input and the output of the learning process. During one or more learning trials, the controller may be trained to navigate a portion of the target trajectory, with individual trajectory portions trained during separate training trials. Some portions may be associated with the robot executing complex actions and may require additional training trials and/or denser training input compared to simpler trajectory actions.
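The per-portion training with unequal trial budgets might be sketched like this; the portion names, waypoint counts, and trial counts are invented, and the "training" is a toy update rather than a neuron network.

```python
import numpy as np

# Hypothetical target trajectory split into portions of differing difficulty.
portions = {
    "straight_segment": {"waypoints": np.linspace(0, 1, 5),  "trials": 3},
    "complex_turn":     {"waypoints": np.linspace(1, 2, 20), "trials": 12},
}

learned = {}
for name, spec in portions.items():
    estimate = np.zeros_like(spec["waypoints"])
    # Separate training trials per portion; complex actions receive more
    # trials and denser teaching input than simpler ones.
    for _ in range(spec["trials"]):
        estimate += 0.5 * (spec["waypoints"] - estimate)
    learned[name] = estimate
```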
Controller and machine learning device
A machine learning device is provided in a versatile controller and is capable of inferring command data to be issued to each axis of a robot. The device includes: an axis angle conversion unit that calculates, from trajectory data, an amount of change of an axis angle of an axis of the robot; a state observation unit that observes axis angle data relating to the amount of change of the axis angle as a state variable representing a current state of an environment; a label data acquisition unit that acquires axis angle command data relating to command data for the axis as label data; and a learning unit that learns the association between the amount of change of the axis angle and the command data for the axis by using the state variable and the label data.
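At its core this is supervised learning of a mapping from axis-angle change (state variable) to axis command (label). A least-squares stand-in for the learning unit, on synthetic data, could look like the following; the real device presumably uses a richer model.

```python
import numpy as np

# Toy dataset: axis-angle changes (state variables) paired with the
# commands actually issued for that axis (label data).
rng = np.random.default_rng(0)
angle_deltas = rng.uniform(-0.5, 0.5, size=(200, 1))
commands = 3.0 * angle_deltas + rng.normal(0, 0.01, size=(200, 1))

# Least-squares "learning unit" associating angle change with command.
w, *_ = np.linalg.lstsq(angle_deltas, commands, rcond=None)

def infer_command(axis_angle_delta):
    """Infer the command data to issue for a given axis-angle change."""
    return float(axis_angle_delta * w[0, 0])
```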
ROBOT CONTROL DEVICE, ROBOT CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM
A robot control device is provided to accept information (201) specifying an object (70) manipulated by a robot (20) from among objects of a plurality of kinds and accept information (202) specifying a target relative positional relationship between the specified object (70) and the distal end of a hand of the robot (20). The robot control device extracts the object (70) from image information (501) obtained by photographing the objects of the plurality of kinds and the surrounding environment thereof, generates information (301) indicating the position and orientation of the object (70), generates an action instruction (401) from the result of learning by a learning module (103), the action instruction (401) serving to match the relative positional relationship between the object (70) and the distal end of the hand of the robot (20) with the target relative positional relationship, and outputs the action instruction (401) to the robot (20).
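A geometric sketch of the control objective, assuming poses are reduced to 3-D positions and the action instruction is a simple proportional correction of the relative-position error; none of this detail comes from the patent itself.

```python
import numpy as np

def action_instruction(object_pose, hand_pose, target_relative):
    """Proportional action that drives the hand/object relative position
    toward the specified target relative positional relationship."""
    current_relative = object_pose - hand_pose
    error = target_relative - current_relative
    return -0.5 * error      # move the hand to shrink the relative-pose error

object_pose = np.array([0.4, 0.1, 0.2])        # extracted from image information
hand_pose = np.array([0.0, 0.0, 0.5])          # distal end of the robot hand
target_relative = np.array([0.0, 0.0, 0.05])   # accepted target relationship

for _ in range(100):
    hand_pose += action_instruction(object_pose, hand_pose, target_relative)
```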
Storage medium having stored learning program, learning method, and learning apparatus
A learning method is performed by a computer. The method includes: inputting a first image to a model, which outputs, from an input image, candidates for a specific region and confidences indicating probabilities of the respective candidates being the specific region, to cause the model to output a plurality of candidates for the specific region and confidences for the respective candidates; calculating a first value for each of candidates whose confidences do not satisfy a certain criterion among the candidates output by the model, the first value increasing as the confidence increases; calculating a second value obtained by weighting the first value such that the second value decreases as the confidence increases; and updating the model such that the second value decreases.
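The abstract leaves the functional forms unspecified, so the sketch below picks forms that satisfy the stated monotonicity (the first value grows with confidence; the weighted second value shrinks with confidence). These particular functions and the criterion threshold are assumptions, not the patented choices.

```python
import math

CONF_CRITERION = 0.9   # the "certain criterion"; candidates at or above it are skipped

def second_value_loss(confidences):
    """Sum of second values over candidates whose confidence does not
    satisfy the criterion; the model would be updated to decrease it."""
    total = 0.0
    for c in confidences:
        if c >= CONF_CRITERION:
            continue
        first = math.exp(c)                  # first value: grows with confidence
        second = first * math.exp(-2.0 * c)  # weighted: exp(-c), shrinks with confidence
        total += second
    return total

# The model update (e.g., a gradient step) should reduce this sum.
print(second_value_loss([0.2, 0.5, 0.85, 0.95]))
```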