Patent classifications
G05B2219/40499
DEEP REINFORCEMENT LEARNING APPARATUS AND METHOD FOR PICK-AND-PLACE SYSTEM
Disclosed is a deep reinforcement learning apparatus and method for a pick-and-place system. According to the present disclosure, a simulation learning framework is configured to apply reinforcement learning to make pick-and-place decisions using a robot operating system (ROS) in a real-time environment, thereby generating stable path motion that meets various hardware and real-time constraints.
Artificial intelligence system for efficiently learning robotic control policies
A machine learning system builds and uses control policies for controlling robotic performance of a task. Such control policies may be trained using targeted updates. For example, two trials identified as similar may be compared and evaluated to determine which trial achieved a greater degree of task success; a control policy update may then be generated based on identified differences between the two trials.
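The abstract describes a targeted update: pair up two similar trials, decide which succeeded more, and update the policy based on where they differ. A minimal sketch of that idea, using a linear policy as a stand-in for the learned control policy (the episode format, success scores, and update rule here are assumptions for illustration, not the patent's method):

```python
import numpy as np

def targeted_update(policy_params, trial_a, trial_b, lr=0.1):
    """Compare two similar trials and nudge a linear policy (action = W @ state)
    toward the actions of the more successful trial, only at steps where
    the two trials' actions actually differ."""
    if trial_a["success"] >= trial_b["success"]:
        better, worse = trial_a, trial_b
    else:
        better, worse = trial_b, trial_a
    # Trials identified as similar are assumed step-aligned.
    for s, a_good, a_bad in zip(better["states"], better["actions"], worse["actions"]):
        if np.linalg.norm(a_good - a_bad) > 1e-6:   # an identified difference
            pred = policy_params @ s                 # policy's current action
            grad = np.outer(pred - a_good, s)        # move prediction toward a_good
            policy_params -= lr * grad
    return policy_params
```

The point of the sketch is the *targeting*: steps where both trials took the same action contribute no update, so learning concentrates on the behavioral differences that separated success from failure.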
Viewpoint invariant visual servoing of robot end effector using recurrent neural network
Training and/or using a recurrent neural network model for visual servoing of an end effector of a robot. In visual servoing, the model can be utilized to generate, at each of a plurality of time steps, an action prediction that represents a prediction of how the end effector should be moved to cause the end effector to move toward a target object. The model can be viewpoint invariant in that it can be utilized across a variety of robots having vision components at a variety of viewpoints and/or can be utilized for a single robot even when a viewpoint, of a vision component of the robot, is drastically altered. Moreover, the model can be trained based on a large quantity of simulated data that is based on simulator(s) performing simulated episode(s) in view of the model. One or more portions of the model can be further trained based on a relatively smaller quantity of real training data.
System for manufacturing dispatching using deep reinforcement and transfer learning
Example implementations described herein are directed to a system for manufacturing dispatching using reinforcement learning and transfer learning. The systems and methods described herein can be deployed in factories for manufacturing dispatching for reducing job-due related costs. In particular, example implementations described herein can be used to reduce massive data collection and reduce model training time, which can eventually improve dispatching efficiency and reduce factory cost.
LEARNING ROBOTIC SKILLS WITH IMITATION AND REINFORCEMENT AT SCALE
Utilizing an initial set of offline positive-only robotic demonstration data for pre-training an actor network and a critic network for robotic control, followed by further training of the networks based on online robotic episodes that utilize the network(s). Implementations enable the actor network to be effectively pre-trained, while mitigating occurrences of and/or the extent of forgetting when further trained based on episode data. Implementations additionally or alternatively enable the actor network to be trained to a given degree of effectiveness in fewer training steps. In various implementations, one or more adaptation techniques are utilized in performing the robotic episodes and/or in performing the robotic training. The adaptation techniques can each, individually, result in one or more corresponding advantages and, when used in any combination, the corresponding advantages can accumulate. The adaptation techniques include Positive Sample Filtering, Adaptive Exploration, Using Max Q Values, and Using the Actor in CEM.
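The pipeline described above has two phases: pre-train on positive-only demonstrations, then continue training on online episodes while filtering what the actor sees. A toy sketch of that flow, with least-squares behavior cloning standing in for the actor network and a simple success flag standing in for Positive Sample Filtering (both substitutions are assumptions; the patent's networks and filtering criteria are not specified here):

```python
import numpy as np

def pretrain_and_filter(demos, online_episodes):
    """Phase 1: behavior-cloning pre-training on positive-only demos.
    Phase 2: keep only successful online episodes for further actor training,
    so the actor keeps seeing positive-only data and forgetting is mitigated."""
    # Flatten demo steps into (state, action) training pairs.
    S = np.stack([s for ep in demos for (s, a) in ep["steps"]])
    A = np.stack([a for ep in demos for (s, a) in ep["steps"]])
    # Least-squares fit: action ≈ state @ actor_w (stand-in for the actor net).
    actor_w, *_ = np.linalg.lstsq(S, A, rcond=None)
    # Positive Sample Filtering (as interpreted here): drop failed episodes.
    kept = [ep for ep in online_episodes if ep["success"]]
    return actor_w, kept
```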
Systems and methods for learning reusable options to transfer knowledge between tasks
A robot includes an RL agent that is configured to learn a policy to maximize the cumulative reward of a task and to determine one or more features that are minimally correlated with each other. The features are then used as pseudo-rewards, called feature rewards, where each feature reward corresponds to an option policy, or skill, that the RL agent learns to maximize. In an example, the RL agent is configured to select the most relevant features to learn respective option policies from. The RL agent is configured to, for each of the selected features, learn the respective option policy that maximizes the respective feature reward. Using the learned option policies, the RL agent is configured to learn a new (second) policy for a new (second) task that can choose from any of the learned option policies or actions available to the RL agent.
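The first step above, picking features that are minimally correlated with each other so they can serve as independent pseudo-rewards, can be sketched as a greedy selection over a per-step feature matrix (the greedy strategy and matrix format are assumptions made here for illustration):

```python
import numpy as np

def select_feature_rewards(feature_traces, k=2):
    """Greedily pick k minimally-correlated features to use as pseudo-rewards.
    feature_traces: array of shape (timesteps, num_features)."""
    corr = np.abs(np.corrcoef(feature_traces, rowvar=False))
    chosen = [0]  # seed the greedy selection with the first feature
    while len(chosen) < k:
        # For each remaining feature, its worst-case correlation with the
        # already-chosen set; pick the feature minimizing that.
        scores = [max(corr[j, c] for c in chosen) if j not in chosen else np.inf
                  for j in range(corr.shape[0])]
        chosen.append(int(np.argmin(scores)))
    return chosen
```

Each selected feature would then be treated as its own reward signal, with one option policy trained to maximize it.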
POLICY LAYERS FOR MACHINE CONTROL
Apparatuses, systems, and techniques provide a policy that can be executed to cause a machine to move. In at least one embodiment, a first policy layer is provided to cause the machine to execute a first motion that causes the machine to accelerate to reach an unbiased state. A second policy layer is provided to cause the machine to execute a second motion without influencing the unbiased state to be reached by the machine. The policy can comprise the first and second policy layers.
Robotic Grasping Via RF-Visual Sensing And Learning
Described is the design, implementation, and evaluation of a robotic system configured to search for and retrieve RFID-tagged items in line-of-sight, non-line-of-sight, and fully-occluded settings. The robotic system comprises a robotic arm having a camera and antenna strapped around a portion thereof (e.g. a gripper) and a controller configured to receive information from the camera and radio frequency (RF) information via the antenna, and configured to use the information provided thereto to implement a method that geometrically fuses at least RF and visual information. This technique reduces uncertainty about the location of a target object even when the object is fully occluded. Also described is a reinforcement-learning network that uses fused RF-visual information to efficiently localize, maneuver toward, and grasp a target object. The systems and techniques described herein find use in many applications including robotic retrieval tasks in complex environments such as warehouses, manufacturing plants, and smart homes.
Robotic control using value distributions
Techniques are described herein for robotic control using value distributions. In various implementations, as part of performing a robotic task, state data associated with the robot in an environment may be generated based at least in part on vision data captured by a vision component of the robot. A plurality of candidate actions may be sampled, e.g., from continuous action space. A trained critic neural network model that represents a learned value function may be used to process a plurality of state-action pairs to generate a corresponding plurality of value distributions. Each state-action pair may include the state data and one of the plurality of sampled candidate actions. The state-action pair corresponding to the value distribution that satisfies one or more criteria may be selected from the plurality of state-action pairs. The robot may then be controlled to implement the sampled candidate action of the selected state-action pair.
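The selection loop described above (sample candidate actions, score each state-action pair with a distributional critic, pick the pair whose value distribution satisfies a criterion) can be sketched as follows. The critic interface, the 2-D action space, and the risk-averse low-quantile criterion are all assumptions made for illustration; the patent leaves the criteria open:

```python
import numpy as np

def select_action(state, critic, num_candidates=64, risk_quantile=0.1, rng=None):
    """Sample candidate actions from a continuous space, get a value
    distribution (array of quantile samples) for each (state, action) pair
    from the critic, and return the action with the best low quantile,
    i.e. the best worst-case value."""
    if rng is None:
        rng = np.random.default_rng()
    candidates = rng.uniform(-1.0, 1.0, size=(num_candidates, 2))
    scores = [np.quantile(critic(state, a), risk_quantile) for a in candidates]
    return candidates[int(np.argmax(scores))]
```

Scoring a distribution rather than a single expected value is what makes criteria like risk-aversion expressible: the same critic output could instead be reduced by its mean, median, or an upper quantile for optimistic exploration.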
ROBOT CONTROL METHOD, APPARATUS AND DEVICE, STORAGE MEDIUM AND PROGRAM PRODUCT
Embodiments of the disclosure provide a robot control method, apparatus, and device, a computer storage medium, and a computer program product, relating to the technical field of artificial intelligence. The method includes: acquiring environment interaction data and an actual target value indicating the target actually reached by executing the action corresponding to the action data in the environment interaction data; determining a return value after executing the action according to the state data, the action data, and the actual target value at the earlier of two adjacent times; updating the return value in the environment interaction data using the return value after executing the action; and training an agent corresponding to a robot control network using the updated environment interaction data, and controlling the action of a target robot using the trained agent.
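The core relabeling step, recomputing each stored return from the target the action *actually* reached rather than the one originally intended, resembles hindsight-style relabeling and can be sketched as below (the transition format and reward function are assumptions; the abstract does not specify them):

```python
def relabel_returns(transitions, reward_fn):
    """Replace each transition's stored reward/return with one recomputed
    from the state, action, and actually-reached target, producing the
    updated environment interaction data used to train the agent."""
    updated = []
    for t in transitions:
        new_reward = reward_fn(t["state"], t["action"], t["achieved_target"])
        updated.append({**t, "reward": new_reward})  # original data left intact
    return updated
```

The trained agent then sees returns that reflect outcomes the robot actually achieved, which turns otherwise-failed episodes into useful training signal.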