Patent classifications
G05B2219/40499
Control apparatus, robot, learning apparatus, robot system, and method
A control apparatus for a robot may include a state obtaining unit configured to obtain state observation data including flexibility-related observation data, i.e., observation data regarding the state of at least one of a flexible portion, a portion of the robot on the side where an object is gripped relative to the flexible portion, and the gripped object; and a controller configured to control the robot, in response to receiving the state observation data, so as to output an action for the robot to perform predetermined work on the object, based on output obtained by inputting the state observation data obtained by the state obtaining unit to a learning model, the learning model being trained in advance through machine learning and included in the controller.
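The control flow this abstract describes (obtain state observation data, feed it to a pre-trained model, output an action) can be sketched schematically. All names below are illustrative assumptions, not the patent's terminology:

```python
# Hypothetical sketch of the described control loop: a state obtaining
# unit supplies observation data, a pre-trained learning model maps it to
# an action, and the controller outputs that action to the robot.

def control_step(obtain_state, learning_model, send_action):
    """One control cycle: observe, infer, act."""
    state = obtain_state()            # state obtaining unit (incl. flexible portion)
    action = learning_model(state)    # model learned in advance via machine learning
    send_action(action)               # controller outputs the action to the robot
    return action
```

In practice `learning_model` would be a trained network rather than the simple callable assumed here.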
Automatic operation control method and system
An object of the present invention is to reduce the error between an actual machine and its simulation by removing the influence of overlearning (overfitting) during adjustment by a mathematically described function, and to optimize the automatic operation control of the machine. An automatic operation control system for controlling the automatic operation of a machine sets a first model representing the relation between a control signal string input to the machine on the basis of a mathematically described function and the data output from the machine controlled in accordance with that control signal string. In a learning process for learning the automatic operation control of the machine, the system executes learning using the first model until a first condition is satisfied. After the first condition is satisfied, learning is executed using a second model, obtained by changing the first model one or more times, until a second condition indicating overlearning is satisfied or the learning finishes without that condition being satisfied.
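The two-phase procedure described above can be sketched as a simple training loop; the model representation, the condition predicates, and the step budget below are all assumptions:

```python
# Minimal sketch of the two-phase learning loop: learn with the first
# model until a first condition holds, switch to a modified second model,
# then stop when an overlearning condition fires or the budget runs out.

def train_two_phase(learn_step, first_model, make_second_model,
                    first_condition, overlearning_condition, max_steps=1000):
    model = first_model
    switched = False
    for _ in range(max_steps):
        learn_step(model)
        if not switched and first_condition(model):
            model = make_second_model(model)   # first model changed one or more times
            switched = True
        if switched and overlearning_condition(model):
            break                              # second condition: overlearning detected
    return model
```

The key design point is that the overlearning check only gates the second phase, matching the abstract's ordering of conditions.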
DEEP REINFORCEMENT LEARNING FOR ROBOTIC MANIPULATION
Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
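The collect-then-batch-update pattern described above can be illustrated with a toy, dependency-free stand-in; the tabular "policy" and one-step environment below are assumptions replacing the policy neural network:

```python
# Toy sketch of the pattern: episodes collect (action, reward) experience
# under the current policy parameters, and the policy is iteratively
# updated from batches of that experience. Action 2 is the assumed
# "successful" action in this stand-in task.
import random

def run_episode(policy, epsilon=0.1, n_steps=10):
    """One episode guided by the current policy (epsilon-greedy)."""
    experience = []
    for _ in range(n_steps):
        if random.random() < epsilon:
            action = random.randrange(len(policy))        # exploration
        else:
            action = max(range(len(policy)), key=policy.__getitem__)
        reward = 1.0 if action == 2 else 0.0
        experience.append((action, reward))
    return experience

def update_policy(policy, batch, lr=0.2):
    """Iterative policy-parameter update from a batch of experience."""
    for action, reward in batch:
        policy[action] += lr * (reward - policy[action])

random.seed(0)
policy = [0.0] * 4                                        # current policy parameters
buffer = [(a, 1.0 if a == 2 else 0.0) for a in range(4)]  # seed experience
for _ in range(20):                                       # episodes from multiple robots
    update_policy(policy, buffer)                         # train on the batch
    buffer = run_episode(policy)                          # collect with updated params
best = max(range(4), key=policy.__getitem__)
```

As in the abstract, each episode is guided by the latest updated parameters, which were fetched before the episode begins.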
Integrating machine learning into control systems for industrial facilities
Methods, systems, apparatus and computer program products for implementing machine learning within control systems are disclosed. An industrial facility setting slate can be received from a machine learning system and a determination can be made as to whether to adopt the settings in the industrial facility setting slate. The machine learning model can be a neural network, e.g., a deep neural network, that has been trained, e.g., using reinforcement learning to predict a data setting slate that is predicted to optimize an efficiency of a data center.
PERFORMANCE RECREATION SYSTEM
The present disclosure generally relates to performance recreation, and in particular, the recreation of observed human performance using reinforcement learning. In this regard, a first object is identified from a plurality of objects. The manipulation of the first object is tracked from a first position to a second position. A characterization of the manipulation is generated. A policy that controls a mechanical gripper to recreate the manipulation is generated based on an iteratively increasing cumulative reward. The mechanical gripper iteratively recreates the manipulation to increase the cumulative reward with each recreation.
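The "characterization of the manipulation" step can be sketched concretely; the 2-D trajectory representation and the displacement/path-length summary below are assumptions, not the disclosure's definition:

```python
# Hypothetical characterization of a tracked manipulation: summarize the
# motion from the first to the second position as net displacement plus
# total path length along the observed trajectory.

def characterize_manipulation(trajectory):
    """trajectory: list of (x, y) positions from first to second position."""
    (x0, y0), (x1, y1) = trajectory[0], trajectory[-1]
    displacement = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    path_length = sum(
        ((bx - ax) ** 2 + (by - ay) ** 2) ** 0.5
        for (ax, ay), (bx, by) in zip(trajectory, trajectory[1:])
    )
    return {"start": trajectory[0], "end": trajectory[-1],
            "displacement": displacement, "path_length": path_length}
```

A characterization of this kind could then serve as the target against which each gripper recreation is scored to form the cumulative reward.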
METHODS FOR RISK MANAGEMENT FOR AUTONOMOUS DEVICES AND RELATED NODE
A method performed by a risk management node for autonomous devices. The risk management node may determine state parameters from a representation of an environment. The representation of the environment may include an object, an autonomous device, and a set of safety zones. The risk management node may determine a reward value based on evaluating a risk of a hazard with the object based on the determined state parameters and current location and speed of the autonomous device relative to a safety zone from the set of safety zones. The risk management node may determine a control parameter based on the determined reward value, and may initiate sending the control parameter to the autonomous device to control action of the autonomous device. The control parameter may be dynamically adapted to reduce the risk of hazard with the object based on reinforcement learning feedback from the reward value.
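The reward computation described above can be sketched under simple assumptions: here risk grows with speed and with how many nested safety zones the device has entered. The zone radii, weighting, and candidate speeds are all illustrative, not the patent's:

```python
# Hedged sketch: reward penalizes proximity to the object (measured by
# violated safety zones) scaled by current speed; the control parameter
# is then the speed with the best reward at the current range.

def reward(distance_to_object, speed, zones=(2.0, 1.0, 0.5)):
    """Negative risk: deeper zone violations and higher speed cost more."""
    violated = sum(1 for r in zones if distance_to_object < r)
    risk = violated * (1.0 + speed)
    return -risk                      # RL maximizes reward, so negate the risk

def choose_speed(distance, candidates=(0.0, 0.5, 1.0)):
    """Control parameter: pick the speed maximizing reward at this range."""
    return max(candidates, key=lambda s: reward(distance, s))
```

This mirrors the abstract's feedback loop: the reward value drives the control parameter, which in turn reduces the risk of a hazard with the object.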
CONTROL APPARATUS, CONTROL METHOD, AND COMPUTER-READABLE STORAGE MEDIUM STORING A CONTROL PROGRAM
A control apparatus according to one or more embodiments may calculate a first estimate value of the coordinates of an endpoint of a manipulator based on first sensing data obtained from a first sensor system, calculate a second estimate value of the coordinates of the endpoint of the manipulator based on second sensing data obtained from a second sensor system, and adjust a parameter value for at least one of a first estimation model or a second estimation model to reduce an error between the first estimate value and the second estimate value based on a gradient of the error.
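The gradient-based adjustment can be sketched for a scalar parameter: descend the gradient of the squared error between the two estimates. The numerical derivative, learning rate, and linear estimation model in the usage are assumptions:

```python
# Minimal sketch: treat the first estimate as a parameterized function
# f1(theta) and nudge theta along the gradient of the squared error
# toward the (fixed) second estimate.

def adjust_parameter(theta, first_estimate, second_estimate, lr=0.1, steps=100):
    """Gradient descent on E(theta) = (f1(theta) - f2)^2 for scalar theta."""
    h = 1e-6
    for _ in range(steps):
        error = first_estimate(theta) - second_estimate
        # d/dtheta error^2 = 2 * error * f1'(theta); f1' estimated numerically
        f1_prime = (first_estimate(theta + h) - first_estimate(theta - h)) / (2 * h)
        theta -= lr * 2 * error * f1_prime
    return theta
```

For example, calibrating a scale factor so that `0.5 * theta` matches a second estimate of `1.0` drives `theta` toward `2.0`.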
REMOTE CONTROLLED DEVICE, REMOTE CONTROL SYSTEM AND REMOTE CONTROL DEVICE
A remote controlled device comprises one or more memories and one or more processors. The one or more processors are configured to, when an event relating to a task being executed by a remote control object occurs: transmit information on a subtask of the task, receive a command relating to the subtask, and execute the task based on the command.
System and design of derivative-free model learning for robotic systems
A manipulator learning-control apparatus for controlling a manipulating system includes: an interface configured to receive manipulator state signals of the manipulating system and object state signals with respect to an object to be manipulated by the manipulating system in a workspace, wherein the object state signals are detected by at least one object detector; an output interface configured to transmit initial and updated policy programs to the manipulating system; a memory to store computer-executable programs including a data preprocess program, object state history data, manipulator state history data, a Derivative-Free Semi-parametric Gaussian Process (DF-SPGP) kernel learning program, a DF-SPGP model learning program, an update-policy program, and an initial policy program; and a processor, in connection with the memory, configured to transmit the initial policy program to the manipulating system to initiate a learning process that operates the manipulating system to manipulate the object for a preset period of time.