B25J9/163

Method for monitoring balanced state of biped robot

The present invention provides a method for monitoring a balanced state of a humanoid robot, comprising: acquiring state data of the robot both when falling in different directions and when stable, forming a support vector machine (SVM) training data set from the data, and obtaining an initial SVM classifier by training; inputting the state data of the robot to the trained SVM classifier, so that the SVM classifier outputs a classification result; computing, within a judgment buffer time after the SVM classifier outputs the classification result, the proportion of control cycles judged to have an impending fall out of the total number of control cycles, and determining a monitoring result of the balanced state of the robot according to the proportion; and extracting state data of misjudged cycles within the buffer time, adding the state data to the current training data set, and updating the SVM classifier, so that the classifier eventually matches the motion capabilities of the robot while monitoring its balanced state.
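The buffer-time decision described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the 0.6 threshold, and the rule that a cycle counts as "misjudged" when its label disagrees with the final monitoring result are all assumptions made for the example. The per-cycle labels are taken as given (1 = impending fall, 0 = stable), so no SVM library is needed here.

```python
from typing import Any, List, Sequence, Tuple


def monitor_balance(cycle_labels: Sequence[int],
                    fall_ratio_threshold: float = 0.6) -> bool:
    """Return True if the robot is judged to be falling.

    cycle_labels holds one classifier output per control cycle inside the
    judgment buffer time: 1 = impending fall, 0 = stable.
    The threshold value is a hypothetical choice for this sketch.
    """
    if not cycle_labels:
        raise ValueError("buffer must contain at least one control cycle")
    fall_cycles = sum(1 for label in cycle_labels if label == 1)
    fall_ratio = fall_cycles / len(cycle_labels)
    return fall_ratio >= fall_ratio_threshold


def misjudged_cycles(cycle_labels: Sequence[int],
                     cycle_states: Sequence[Any],
                     final_is_fall: bool) -> List[Tuple[Any, int]]:
    """Collect (state, corrected_label) pairs for cycles whose label
    disagrees with the final monitoring result (assumed definition of
    "misjudged"). These pairs could be appended to the training set
    before the classifier is retrained.
    """
    corrected = 1 if final_is_fall else 0
    return [(state, corrected)
            for label, state in zip(cycle_labels, cycle_states)
            if label != corrected]
```

Voting over the whole buffer, rather than acting on a single cycle, means one spurious "fall" label does not trigger a fall response on its own.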

Using a recursive reinforcement model to determine an agent action

According to examples, an apparatus may include a processor and a memory on which is stored machine readable instructions that may cause the processor to access data about an environment of an agent, identify an actor in the environment, and access candidate models, in which each of the candidate models may predict a certain action of the identified actor. The instructions may also cause the processor to apply a selected candidate model of the accessed candidate models on the accessed data to determine a predicted action of the identified actor and may implement a recursive reinforcement learning model using the predicted action of the identified actor to determine an action that the agent is to perform. The instructions may further cause the processor to cause the agent to perform the determined action.

Charging system for robot and control method thereof

A robot charging system and a control method thereof are provided to determine a charged state and charge a robot through self-driving. The robot charging system includes: a server configured to store boarding information of a user; a robot configured to receive the boarding information from the server, move the user to a destination included in the boarding information by self-driving using charged power, determine a discharge of the power, and move to a charging station for charging; and the charging station, provided with a power supply coil to wirelessly supply power to the robot and with a moving rail on top of the power supply coil to sequentially charge a plurality of robots.

Systems and methods for learning reusable options to transfer knowledge between tasks

A robot includes an RL agent that is configured to learn a policy to maximize the cumulative reward of a task and to determine one or more features that are minimally correlated with each other. The features are then used as pseudo-rewards, called feature rewards, where each feature reward corresponds to an option policy, or skill, that the RL agent learns to maximize. In an example, the RL agent is configured to select the most relevant features from which to learn respective option policies. The RL agent is configured to, for each of the selected features, learn the respective option policy that maximizes the respective feature reward. Using the learned option policies, the RL agent is configured to learn a new (second) policy for a new (second) task that can choose from any of the learned option policies or actions available to the RL agent.
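One way to picture "features that are minimally correlated with each other" is a greedy selection over recorded feature traces. The sketch below is an assumption-laden illustration: the abstract does not say how correlation is measured or how features are chosen, so the Pearson coefficient, the greedy rule, and the names `pearson` and `select_decorrelated` are all hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length traces (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_decorrelated(features: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick k feature indices.

    Starts from feature 0 (an arbitrary choice) and repeatedly adds the
    feature whose largest absolute correlation with the already chosen
    features is smallest.
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of features")
    chosen = [0]
    while len(chosen) < k:
        remaining = [i for i in range(len(features)) if i not in chosen]
        best = min(
            remaining,
            key=lambda i: max(abs(pearson(features[i], features[j]))
                              for j in chosen),
        )
        chosen.append(best)
    return chosen
```

Each selected index would then define one feature reward, and one option policy would be trained per reward; that training step is outside the scope of this sketch.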

ASSISTANCE FOR ROBOT MANIPULATION

A robot control system includes circuitry configured to: acquire an input command value indicating a manipulation of a robot by a subject user; acquire a current state of the robot and a target state associated with the manipulation of the robot; determine a state difference between the current state and the target state; acquire, from a learned model, a degree of distribution associated with a motion of the robot, based on the state difference, wherein the learned model is generated based on a past robot manipulation; set a level of assistance to be given during the manipulation of the robot by the subject user, based on the acquired degree of distribution; and generate an output command value for operating the robot, based on the input command value and the level of assistance.
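The last two steps (setting an assistance level from the degree of distribution, then generating the output command) might look like the sketch below. Everything specific here is an assumption: the abstract does not state whether a small spread should raise or lower assistance, nor how the output is formed. This example assumes that a small spread in past manipulations means the motion is well determined and so warrants more assistance, and it forms the output as a linear blend with a hypothetical assist command `assist_cmd`.

```python
from typing import List, Sequence


def assistance_level(spread: float, max_spread: float = 1.0) -> float:
    """Map a spread (degree of distribution) to an assistance level in [0, 1].

    Assumption for this sketch: small spread -> high assistance.
    Values outside [0, max_spread] are clipped.
    """
    if max_spread <= 0:
        raise ValueError("max_spread must be positive")
    clipped = min(max(spread, 0.0), max_spread)
    return 1.0 - clipped / max_spread


def blend_command(input_cmd: Sequence[float],
                  assist_cmd: Sequence[float],
                  level: float) -> List[float]:
    """Linearly blend the user's command with a hypothetical assist command."""
    return [(1.0 - level) * u + level * a
            for u, a in zip(input_cmd, assist_cmd)]
```

With `level = 0` the user's command passes through unchanged, and with `level = 1` the assist command fully replaces it.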

LEARNING DEVICE, LEARNING METHOD, AND COMPUTER PROGRAM PRODUCT FOR TRAINING
20220374767 · 2022-11-24

According to an embodiment, a learning device includes one or more hardware processors configured to: acquire a current state of a device; learn a reinforcement learning model, and determine a first action of the device on the basis of the current state and the reinforcement learning model; determine a second action of the device on the basis of the current state and a first rule; and select one of the first action and the second action as a third action to be output to the device according to a progress of learning of the reinforcement learning model.
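A minimal sketch of the selection step, under stated assumptions: the abstract says only that the third action is chosen "according to a progress of learning", so the probabilistic rule below (use the learned action with probability equal to a progress value in [0, 1]) and the name `select_action` are illustrative choices, not the claimed method.

```python
import random
from typing import Any


def select_action(rl_action: Any, rule_action: Any, progress: float,
                  rng: Any = random) -> Any:
    """Choose the action to output to the device.

    progress is a hypothetical learning-progress measure in [0, 1].
    With probability `progress` the reinforcement-learning action is used;
    otherwise the rule-based action is used. `rng` only needs a `random()`
    method returning a float in [0, 1).
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    return rl_action if rng.random() < progress else rule_action
```

Early in training (progress near 0) the device follows the rule almost always, and control shifts toward the learned policy as progress approaches 1.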

METHOD FOR CONTROLLING A ROBOTIC DEVICE

A method for controlling a robotic device. The method includes providing demonstrations for carrying out a skill by the robot, each demonstration including a robot pose, an acting force, and an object pose for each point in time of a sequence of points in time; ascertaining an attractor demonstration for each demonstration; training a task-parameterized robot trajectory model for the skill based on the attractor demonstrations; and controlling the robotic device according to the task-parameterized robot trajectory model.

METHOD FOR TRAINING A CONTROL ARRANGEMENT FOR A CONTROLLED SYSTEM
20220371185 · 2022-11-24

A method for training a control arrangement for a controlled system. The control arrangement includes a regulation device and an actuator that operates according to a control strategy. The method includes the generation of control actions by the regulation device, each control action being generated by detecting measured variables that indicate a state of the controlled system, ascertaining a correction term for the detected measured variables by the actuator according to the control strategy, adapting the detected measured variables using the correction term for the detected measured variables, and generating the control action by supplying the adapted measured variables to the regulation device as the actual value. The method further includes training the control strategy by reinforcement learning for maximizing the gain that is achieved by the generated control actions.
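The generation of a single control action can be sketched as below. This is an illustration under assumptions: the proportional regulator, its gain, the additive form of the correction, and the names `proportional_regulator` and `generate_control_action` are hypothetical. The reinforcement-learning step that trains the correction policy is omitted.

```python
from typing import Callable, List, Sequence


def proportional_regulator(setpoint: Sequence[float],
                           actual: Sequence[float],
                           gain: float = 2.0) -> List[float]:
    """Stand-in regulation device: a simple proportional controller."""
    return [gain * (s - a) for s, a in zip(setpoint, actual)]


def generate_control_action(
    measured: Sequence[float],
    setpoint: Sequence[float],
    correction_policy: Callable[[Sequence[float]], Sequence[float]],
    regulator: Callable[[Sequence[float], Sequence[float]], List[float]]
    = proportional_regulator,
) -> List[float]:
    """Adapt the measurements with the policy's correction term, then feed
    the adapted values to the regulator as the actual value."""
    correction = correction_policy(measured)
    adapted = [m + c for m, c in zip(measured, correction)]
    return regulator(setpoint, adapted)
```

For example, `generate_control_action([1.0], [2.0], lambda m: [0.5])` adapts the measurement to 1.5 before the regulator sees it, so the regulator reacts to the corrected value rather than the raw one.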

DISPLAY GUIDED HIGH-ACCURACY ROBOTIC NAVIGATION AND MOTION CONTROL SYSTEM AND METHODS
20220371284 · 2022-11-24

A display guided robotic navigation and control system comprises a display system including a display surface and a display device configured to display an image including a visual pattern onto the display surface, a robotic system including a mobile robotic device and an optical sensor attached to the mobile robotic device, and a computing system communicatively connected to the display system and the robotic system. Related methods are also disclosed.

Dual-robot position/force multivariate-data-driven method using reinforcement learning
20220371186 · 2022-11-24

Disclosed is a dual-robot position/force multivariate-data-driven method using reinforcement learning. A master robot adopts an ideal position meta-control strategy, learns a desired position by a reinforcement learning algorithm, and feeds back an actual position to the desired position, with the goal of generating an optimal force while the robot interacts with the environment so as to minimize a position error. A slave robot, based on a force meta-control strategy driven by the position deviation of the master robot, adopts a damping proportional-derivative (PD) control strategy suitable for an unknown environment and learns a desired acting force by the reinforcement learning algorithm, namely a minimum force for driving the slave robot to approach a desired reference point. The present invention may improve the dexterity of dual-robot collaboration and solve a parameter optimization problem in position/force control.