FOOT CONTACT PATTERN(S) AS INTERFACE FOR LANGUAGE TO CONTROL ROBOT(S)
20250291351 · 2025-09-18
Inventors
- Yujin Tang (Kanagawa, JP)
- Wenhao Yu (Mountain View, CA, US)
- Jie Tan (Mountain View, CA, US)
- Byungha Chun (Tokyo, JP)
- Aleksandra Faust (Palo Alto, CA, US)
Abstract
Various implementations are provided which include receiving an instance of natural language (NL) text input indicating a task for a multi-legged robot to perform in an environment. In many implementations, the system can process the NL text input using a large language model (LLM) to generate a foot contact pattern, indicating a sequence of leg positions of the robot relative to the surface, where one or more of the legs of the robot are in contact with the surface. Additionally or alternatively, the system can generate low-level robot control output by processing the foot contact pattern using a locomotion controller.
Claims
1. A method implemented by one or more processors, the method comprising: receiving an instance of natural language (NL) text input, wherein the instance of NL text input indicates a task for a robot to perform in an environment, wherein the robot has a plurality of legs, and wherein the robot is on a surface in the environment; processing the instance of NL text input using a large language model (LLM) to generate a foot contact pattern, wherein the foot contact pattern indicates a sequence of leg positions relative to the surface, of the plurality of legs of the robot, and wherein one or more of the legs of the robot are in contact with the surface; generating control output by processing the foot contact pattern using a locomotion controller of the robot; and causing the robot to perform one or more actions based on the control output.
2. The method of claim 1, wherein the foot contact pattern indicates whether one or more of the legs of the robot are in contact with the surface and wherein the foot contact pattern further indicates whether one or more of the legs of the robot are not in contact with the surface.
3. The method of claim 1, wherein the foot contact pattern indicates whether the one or more legs of the robot are in contact with the surface, wherein the foot contact pattern further indicates whether one or more of the legs of the robot not in contact with the surface are a first distance from the surface, and wherein the foot contact pattern further indicates whether one or more of the legs of the robot not in contact with the surface are a second distance from the surface.
4. The method of claim 1, wherein the task for the robot to perform in the environment, indicated by the instance of NL text input, is a locomotion task with a target gait of the robot.
5. The method of claim 4, wherein the target gait of the robot is a cyclic motion pattern that produces locomotion through a sequence of contacts with the surface.
6. The method of claim 4, wherein the target gait of the robot includes a bounding gait, a trotting gait, a pacing gait, standing still, and/or standing on three legs.
7. The method of claim 1, wherein the robot is a quadruped robot and wherein the plurality of legs of the robot includes a front left leg, a front right leg, a rear left leg, and a rear right leg.
8. The method of claim 1, wherein processing the instance of NL text input using the LLM to generate the foot contact pattern comprises: generating a prompt for the LLM based on the instance of NL text input; and generating the foot contact pattern based on processing the LLM prompt using the LLM.
9. The method of claim 8, wherein the prompt for the LLM, that is based on the instance of NL text input, includes one or more general instructions for the LLM, wherein the one or more general instructions for the LLM include instructions to translate the NL text input into the foot contact pattern, and wherein the one or more general instructions are in addition to any of the NL text input.
10. The method of claim 9, wherein the prompt for the LLM, that is based on the instance of NL text input, further includes one or more gait definitions, wherein each gait, of the one or more gait definitions, includes a NL text description of the gait, and wherein the NL text description of each gait is in addition to any of the NL text input.
11. The method of claim 10, wherein the NL text description of one or more of the gaits, of the one or more gait definitions, includes NL text indicating an emotion corresponding to the gait.
12. The method of claim 10, wherein the prompt for the LLM, that is based on the instance of NL text input, further includes one or more foot contact pattern output instructions, wherein the one or more foot contact pattern output instructions include NL text description of how to format the foot contact pattern, and wherein the NL text description of how to format the foot contact pattern is in addition to any of the NL text input.
13. The method of claim 12, wherein the prompt for the LLM, that is based on the instance of NL text input, further includes one or more example foot contact patterns.
14. The method of claim 1, wherein the NL text input, or additional input, includes a user defined velocity for the robot in performance of the task in the environment, and wherein generating the control output by processing the foot contact pattern using the locomotion controller of the robot comprises: identifying a current state of one or more components of the robot; and generating the control output by processing, using the locomotion controller, (1) the current state of the one or more components of the robot, (2) the user defined velocity for the robot, and (3) the foot contact pattern.
15. The method of claim 1, wherein a control policy of the locomotion controller is trained using one or more training foot contact patterns, where each of the training foot contact patterns is generated using a random pattern generator based on a training gait.
16. The method of claim 15, wherein training the control policy of the locomotion controller using a given training foot contact pattern, of the one or more training foot contact patterns comprises: processing the given training foot contact pattern using the control policy of the locomotion controller to generate a sequence of robot actions and corresponding robot states; determining a reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern; and updating one or more portions of the control policy of the locomotion controller based on the reward.
17. The method of claim 16, wherein determining the reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern comprises: generating the reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern to maximize an expected reward.
18. The method of claim 16, wherein the LLM is fine-tuned based on the generated reward.
19. The method of claim 16, further comprising, prior to receiving the instance of NL text input, fine-tuning the LLM based on a previously generated reward, wherein the previously generated reward is generated based on processing a prior sequence of robot actions and prior corresponding robot states and/or a prior training foot contact pattern.
20. The method of claim 1, wherein the instance of NL text input is a text representation of a spoken utterance captured in an instance of audio data and/or the NL text input is an instance of text provided by a user via a keyboard.
21. A method implemented by one or more processors, the method comprising: training a locomotion controller to generate control output for controlling a robot with a plurality of legs on a surface in an environment, wherein the control output is generated based on processing a foot contact pattern using the locomotion controller, wherein the foot contact pattern indicates a sequence of leg positions relative to the surface, of the plurality of legs of the robot where one or more of the legs of the robot are in contact with the surface, and wherein training a control policy of the locomotion controller comprises: selecting a training foot contact pattern generated based on a training gait using a random pattern generator; processing the training foot contact pattern using the control policy of the locomotion controller to generate a sequence of robot actions and corresponding robot states for use in controlling locomotion of the robot; determining a reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern; and updating one or more portions of the control policy of the locomotion controller based on the reward.
22. The method of claim 21, further comprising: transmitting the control policy of the locomotion controller for use in controlling a given robot.
Description
DETAILED DESCRIPTION
[0027] Large language models (LLMs) have demonstrated the potential to perform high-level planning. However, it remains a challenge for some LLMs to comprehend low-level commands, such as joint angle targets or motor torques. Various implementations described herein use foot contact patterns as an interface that bridges human commands in natural language and a locomotion controller that outputs these low-level commands. In some implementations, the system allows users to flexibly craft diverse locomotion behaviors for robot(s) (e.g., quadrupedal robots). In some implementations, the system can include LLM prompt design, a reward function, and/or exposing the controller to a feasible distribution of contact patterns. The resulting controller is capable of achieving diverse locomotion patterns that can be transferred to real world robot hardware.
[0028] In some implementations, simple and/or effective interaction between a human user and a quadrupedal robot can pave the way towards creating intelligent and capable helper robots. Some human-robot interaction systems can enable quadrupedal robots to respond to natural language instructions as language is an important communication channel for human beings. Recent developments in Large Language Models (LLMs) have engendered a spectrum of applications fueled by the proficiency of LLMs to ingest an enormous amount of historical data, to adapt in-context to novel tasks with few examples, and to understand and interact with user intentions through a natural language interface.
[0029] LLMs have been used to develop interactive and capable systems for physical robots. Researchers have demonstrated the potential of using LLMs to perform high-level planning and/or robot code writing. Nevertheless, unlike text generation, where LLMs directly interpret the atomic elements (tokens), it often proves challenging for LLMs to comprehend low-level robotic commands such as joint angle targets or motor torques, especially for inherently unstable legged robots necessitating high-frequency control signals. Consequently, many systems presume the provision of high-level APIs for LLMs to dictate robot behavior, which may inherently limit the system's expressive capabilities.
[0030] In some implementations, systems can use foot contact patterns as an interface that bridges human instructions in natural language and low-level commands. In some of those implementations, the resulting system for legged robots, particularly quadrupedal robots, can be interactive, where the system can allow users to craft diverse locomotion behaviors flexibly. Additionally or alternatively, the patterns of feet establishing and breaking contacts with the ground often govern the final locomotion behavior for legged robots due to the heavy reliance of quadruped locomotion on environmental contact. In some implementations, a contact pattern, describing the contact establishing and breaking timings for each leg, is a compact and flexible interface to author locomotion behaviors for legged robots. To leverage this interface for controlling quadruped robots, some implementations use an LLM-based approach to generate contact patterns, represented by 0s and 1s, from user instructions. Despite LLMs being trained mostly on natural language datasets, with proper prompting and in-context learning, the LLMs can produce contact patterns that represent diverse quadruped motions. In some implementations, a Deep Reinforcement Learning (DRL) based approach can be used to generate robot actions given a desired contact pattern. The DRL based approach can use a reward structure that concerns only contact timing, and can expose the policy to the right distribution of contact patterns. In some implementations, the system can include a controller capable of achieving diverse locomotion patterns that can be transferred to real robot hardware.
[0031] In some implementations, the system can include an interface of contact patterns for harnessing knowledge from LLMs to flexibly and interactively control quadruped robots; a pipeline to teach LLMs to generate complex contact patterns from user instructions; and/or a DRL-based method to train a low-level controller that realizes diverse contact patterns on real quadruped robots.
[0032] In some implementations, the system can include using desired foot contact patterns as an interface between human commands in natural language and the locomotion controller. The locomotion controller can be used to not only complete the main task (e.g., following specified velocities), but also to place the robot's feet on the ground at the right time, such that the realized foot contact patterns are as close as possible to the desired ones. In some of those implementations, the locomotion controller can process the robot's proprioceptive sensor data, task commands (e.g., following a desired linear velocity), and/or the desired foot contact pattern to generate output indicating the desired joint positions. In some implementations, the foot contact patterns can be extracted by a cyclic sliding window from a pattern template, which is generated by a random pattern generator during training, and is translated from human commands in natural language by an LLM.
[0033] In some implementations, a desired foot contact pattern can be defined by a cyclic sliding window of size L_w that extracts the four feet's ground contact flags between t+1 and t+L_w from a pattern template; the extracted pattern is of shape 4×L_w. A contact pattern template is a 4×T matrix of 0s and 1s. In some implementations, 0s can represent feet in the air and 1s can represent feet on the ground. Additionally or alternatively, from top to bottom, each row in the matrix gives the foot contact pattern of the front left (FL), front right (FR), rear left (RL), and rear right (RR) foot. In some implementations, the LLM can map human commands into foot contact pattern templates in specified formats accurately given properly designed prompts. In some of those implementations, the LLM can map human commands into foot contact pattern templates even when the commands are unstructured and/or vague. In some implementations, a random pattern generator can be used to produce contact pattern templates that are of various pattern lengths T, with foot-ground contact ratios within a cycle based on a given gait type G. In some of those implementations, the locomotion controller can be trained using a wide distribution of movements, which can enable the locomotion controller to generalize better.
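As a non-limiting illustration of the pattern template and cyclic sliding window described above, the following minimal Python sketch (assuming numpy) extracts the 4×L_w window from a 4×T template; the trot-like template values and the function name are illustrative only:

    import numpy as np

    def extract_window(template: np.ndarray, t: int, L_w: int) -> np.ndarray:
        # template: 4 x T matrix of 0s/1s; rows are FL, FR, RL, RR.
        # Returns the 4 x L_w block covering steps t+1 .. t+L_w, wrapping
        # cyclically around the template (the "cyclic sliding window").
        T = template.shape[1]
        cols = [(t + 1 + k) % T for k in range(L_w)]
        return template[:, cols]

    # Illustrative trot-like template of length T = 8 (diagonal pairs alternate).
    trot = np.array([
        [1, 1, 1, 1, 0, 0, 0, 0],  # FL
        [0, 0, 0, 0, 1, 1, 1, 1],  # FR
        [0, 0, 0, 0, 1, 1, 1, 1],  # RL
        [1, 1, 1, 1, 0, 0, 0, 0],  # RR
    ])
    window = extract_window(trot, t=6, L_w=5)  # shape (4, 5); wraps past T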
[0034] While LLMs can learn knowledge from a vast amount of text data at training, providing proper prompts at inference can be key to unlocking and/or directing the acquired knowledge in meaningful ways. Carefully designed prompts can serve as the starting point for the models to generate text and guide the direction and context of the outputs. In some implementations, the system aims to enable the LLM to map any human commands in natural language to foot contact patterns in a specified format.
[0035] In some implementations, the LLM prompt can include one or more general instructions, one or more gait definitions, one or more foot contact pattern output instructions, one or more example foot contact patterns, one or more additional or alternative prompt portions, and/or combinations thereof. The one or more general instructions can include natural language text which describes the task the LLM should accomplish. In some implementations, the one or more general instructions can include a task where the LLM is expected to translate an arbitrary command (e.g., the NL text input) to a foot contact pattern. The one or more gait definitions can include natural language text which gives basic knowledge of quadrupedal gaits. Although their descriptions are neither exhaustive nor sufficiently accurate, the one or more gait definitions provide enough information for the LLM to follow the rules. Additionally or alternatively, the one or more gait definitions connect the bounding gait to a general impression of emotion. This helps the LLM generalize over vague human commands that do not explicitly specify what gaits the robot should use.
[0036] In some implementations, the LLM prompt can include an output definition which specifies the format of the output. For example, the system can discretize the desired velocities (e.g., into the five values [−1.0, −0.5, 0.0, 0.5, 1.0]) so the LLM can give proper outputs corresponding to commands that contain words like fast(er) and/or slow(er).
[0037] In some implementations, the LLM prompt can include an examples block, which can include general knowledge of instruction fine-tuning and/or can provide the LLM with a few concrete input-output pairs. For example, the examples block can include one or more gait examples, one or more velocity examples, etc.
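As a non-limiting illustration of assembling the LLM prompt from the blocks described in the preceding paragraphs, the following Python sketch concatenates general instructions, gait definitions, an output format definition, and examples; all block strings are abbreviated placeholders paraphrasing the examples given elsewhere herein:

    GENERAL_INSTRUCTION = (
        "You are to translate an arbitrary user command into a desired "
        "foot contact pattern for a quadruped robot."
    )

    GAIT_DEFINITIONS = (
        "1. Trotting: two diagonally opposite legs strike the ground together.\n"
        "2. Pacing: the two legs on the same side strike the ground together.\n"
        "3. Bounding: the two front/rear legs strike the ground together; "
        "this gait gives a happy feeling."
    )

    OUTPUT_FORMAT = (
        "First output the velocity, then a 4-line foot contact pattern "
        "(lines labeled FL, FR, RL, RR; 0 = foot in air, 1 = foot on ground)."
    )

    EXAMPLES = "Input: trot slowly\nOutput: ..."  # a few concrete input-output pairs

    def build_prompt(nl_text_input: str) -> str:
        # Concatenate the blocks in the order described above, ending with
        # the user's command so the LLM completes the output.
        return "\n\n".join([
            GENERAL_INSTRUCTION,
            GAIT_DEFINITIONS,
            OUTPUT_FORMAT,
            EXAMPLES,
            f"Input: {nl_text_input}\nOutput:",
        ])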
[0038] In some implementations, robot locomotion control can be represented as a Markov Decision Process (MDP). In some of those implementations, the MDP can be solved using DRL algorithms. For example, an MDP can be a tuple (S, A, r, f, P_0, γ), where S is the state space, A is the action space, r(s_t, a_t, s_{t+1}) is the reward function, f(s_t, a_t) is the system transition function, P_0 is the distribution of initial states s_0, and γ ∈ [0, 1] is the reward discount factor. In some implementations, the goal of a DRL algorithm can be to optimize a policy π: S → A so that the expected accumulated reward J(π) = E_{s_0~P_0, a_t~π}[Σ_{t≥0} γ^t · r(s_t, a_t, s_{t+1})] is maximized.
[0039] In some implementations, the random pattern generator receives a gait type G, and can randomly sample a corresponding cycle length T and the ground contact ratio within the cycle for each foot, can conduct proper scaling and phase shifts, and/or can output a pattern template. While humans can give commands that map to a much wider set of foot contact pattern templates, as an example, the system can define and train on five types: G ∈ {BOUND, TROT, PACE, STAND_STILL, STAND_3LEGS}.
[0040] In some implementations, the system can use a feed-forward neural network as the control policy π_θ. The control policy can be used to generate the desired positions for each motor joint, and its input includes the base's angular velocities, the gravity vector g = [0, 0, −1] in the base's frame, the user specified velocity, current joint positions and velocities, the policy output from the last time step, the desired foot contact patterns, and/or one or more additional or alternative parameters. In some implementations, a Unitree A1 can be used as the quadrupedal robot. The A1 has 3 joints per leg (i.e., hip, thigh, and calf joints) and L_w = 5; therefore the dimensions of the policy's input and output are 65 and 12, respectively. The policy has three hidden layers of sizes [512, 256, 128] with ELU (α = 1.0) at each hidden layer as the non-linear activation function.
[0041] To encourage natural and symmetric behaviors, the system can employ a double-pass trick in the control policy. Specifically, instead of using a_t = π_θ(s_t) directly as the output, the system can use a_t = 0.5 · [π_θ(s_t) + Φ_act(π_θ(Φ_obs(s_t)))], where Φ_act(·) and Φ_obs(·) flip left-right the policy's output and the robot's state, respectively. Intuitively, this double-pass trick says the control policy should output consistently when it receives the original and the left-right mirrored states. In practice, this trick greatly improves the naturalness of the robot's movement and helps shrink the sim-to-real gap.
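As a non-limiting illustration of the control policy architecture and the double-pass trick, the following sketch assumes PyTorch; the left-right mirroring functions are passed as callables because their exact index permutations depend on the observation layout, which is not reproduced here:

    import torch
    import torch.nn as nn

    class ControlPolicy(nn.Module):
        # Feed-forward policy: 65-d observation -> 12 desired joint positions,
        # hidden layers [512, 256, 128] with ELU (alpha = 1.0) activations.
        def __init__(self, obs_dim: int = 65, act_dim: int = 12):
            super().__init__()
            sizes = [obs_dim, 512, 256, 128]
            layers = []
            for i in range(len(sizes) - 1):
                layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ELU(alpha=1.0)]
            layers.append(nn.Linear(sizes[-1], act_dim))
            self.net = nn.Sequential(*layers)

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            return self.net(obs)

    def double_pass_action(policy, obs, mirror_obs, mirror_act):
        # a_t = 0.5 * [pi(s_t) + Phi_act(pi(Phi_obs(s_t)))]: average the
        # policy's output on the original state with the un-mirrored output
        # on the left-right mirrored state, encouraging symmetric behavior.
        return 0.5 * (policy(obs) + mirror_act(policy(mirror_obs(obs))))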
[0042] One of the controller's main tasks is to follow user specified linear velocities along the robot's heading direction, while keeping the linear velocity along the lateral direction and the yaw angular velocity as close to zero as possible. Additionally or alternatively, the controller needs to plan the correct timing for foot-ground strikes so that the realized foot contact patterns match the desired ones. For real world deployment, the system can use a regularization term that penalizes the action changing rate so that the real robot's movement is smoother. In addition to applying domain randomization, extra reward terms that keep the robot base stable can greatly shrink the sim-to-real gap and produce natural looking gaits. Finally, although no heavy engineering is required to train the locomotion policy with extra contact pattern inputs, it can help to balance the ratio of the gait types during training.
[0043] In some implementations, the random pattern generator can generate a foot contact pattern given a specific gait G. As an illustrative example, the system can generate a foot contact pattern for G = PACE. In general, the system can sample T ∈ [24, 28]. When the control frequency is 50 Hz, the corresponding cycle length is 0.48–0.56 seconds. The system can sample a foot-ground contact length ratio within the cycle r_contact ∈ [0.5, 0.7], where T·r_contact indicates the number of 1s and T·(1 − r_contact) indicates the number of 0s in each row. In some implementations, the system needs to scale the length and/or shift bits of these ones and zeroes to produce feasible foot contact patterns on a real robot.
[0044] For example, for G = BOUND, the foot-ground contact time can be shortened to 60% of the sampled value (i.e., r′_contact = 0.6·r_contact). Similarly, the ones can be placed at the beginning of the FL and FR rows, and the ones in the RL and RR rows can be shifted by 0.5·T·r_contact bits to the right. In contrast, for G = TROT, the system does not perform any scaling. Additionally or alternatively, the 1s can be shifted to form a complete foot contact pattern, where the ones are placed at the beginning of the FL and RR rows and at the end of the FR and RL rows. In some implementations, r_contact is not changed for G = PACE, but the cycle length can be shortened to half of its sampled value (i.e., T′ = 0.5·T) to make the gait natural and/or feasible. The system can still place the ones at the beginning of the FL and RL rows and at the end of the FR and RR rows.
[0045] For G ∈ {STAND_STILL, STAND_3LEGS}, the system performs no scaling and fills the foot contact pattern matrix with ones. Additionally, for G = STAND_3LEGS, the system can randomly sample one row and replace it with zeroes.
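As a non-limiting illustration consolidating the random pattern generator of paragraphs [0043]-[0045], the following Python sketch samples T and r_contact and applies the gait-specific scaling and shifting described above; rounding and boundary handling details are illustrative assumptions:

    import numpy as np

    def random_pattern_template(gait: str, rng: np.random.Generator) -> np.ndarray:
        # Rows are FL, FR, RL, RR; columns are control steps within one cycle.
        T = int(rng.integers(24, 29))        # cycle length T in [24, 28]
        r = rng.uniform(0.5, 0.7)            # foot-ground contact ratio

        if gait == "STAND_STILL":
            return np.ones((4, T), dtype=int)
        if gait == "STAND_3LEGS":
            template = np.ones((4, T), dtype=int)
            template[rng.integers(0, 4)] = 0     # one random leg in the air
            return template

        template = np.zeros((4, T), dtype=int)
        if gait == "BOUND":
            n = int(T * 0.6 * r)                 # contact time shortened to 60%
            template[[0, 1], :n] = 1             # ones at the start of FL, FR
            shift = int(0.5 * T * r)             # RL, RR shifted right
            template[[2, 3], shift:shift + n] = 1
        elif gait == "TROT":
            n = int(T * r)                       # no scaling for trot
            template[[0, 3], :n] = 1             # FL, RR at the beginning
            template[[1, 2], T - n:] = 1         # FR, RL at the end
        elif gait == "PACE":
            T = max(1, T // 2)                   # cycle halved: T' = 0.5 * T
            n = int(T * r)
            template = np.zeros((4, T), dtype=int)
            template[[0, 2], :n] = 1             # FL, RL at the beginning
            template[[1, 3], T - n:] = 1         # FR, RR at the end
        return template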
[0046] In some implementations, the reward can consist of a set of weighted reward terms (e.g., 9 weighted reward terms): J = Σ_i w_i·r_i, where the w_i's are the weights and the r_i's are the reward terms. In some implementations, the definition of one or more of the reward terms and the values of the weights are described as follows, where the purpose of each reward term is in brackets. [0047] (1) A linear velocity tracking reward [task reward]. (The equations defining the individual reward terms are not reproduced in this text.)
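As a non-limiting illustration of the weighted reward J = Σ_i w_i·r_i, the following sketch combines named reward terms; the Gaussian-kernel shape of the velocity tracking term is an illustrative assumption, since the individual term equations are not reproduced in this text:

    import math

    def total_reward(terms: dict[str, float], weights: dict[str, float]) -> float:
        # J = sum_i w_i * r_i over the weighted reward terms.
        return sum(weights[name] * r for name, r in terms.items())

    def velocity_tracking_reward(v: float, v_target: float, sigma: float = 0.25) -> float:
        # Illustrative task-reward shape only: rewards a realized forward
        # velocity v close to the commanded velocity v_target. The exact
        # term used is not reproduced herein, so this kernel is an assumption.
        return math.exp(-((v - v_target) ** 2) / sigma)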
[0055] In some implementations, the control policy of the locomotion controller can be trained using the following training configurations. In some implementations, PD control can be used to convert positions to torques in the system. The base values for the two gains are k_p = 20 and k_d = 0.5, and the control frequency is 50 Hz. In some implementations, a gait G can be randomly assigned to a robot at environment resets. Additionally or alternatively, the system can also sample the gait again every 150 steps in simulation. Of the 5 gait types described herein, some gaits are harder to learn than others. To avoid the case where the hard-to-learn gaits die out, leaving the controller to learn only the easier gaits, the system can restrict the sampling distribution such that the ratios of the gait types are approximately the same.
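As a non-limiting illustration of the PD control described above (base gains k_p = 20 and k_d = 0.5 at a 50 Hz control frequency), the following sketch applies a standard PD law; the exact form used in practice may differ:

    import numpy as np

    K_P, K_D = 20.0, 0.5      # base PD gains from the text
    CONTROL_DT = 1.0 / 50.0   # 50 Hz control frequency

    def pd_torques(q_desired: np.ndarray, q: np.ndarray, qdot: np.ndarray) -> np.ndarray:
        # Convert the policy's desired joint positions into motor torques.
        # Standard PD law: tau = k_p * (q_des - q) - k_d * qdot; the exact
        # conversion used herein is not reproduced, so this is an assumption.
        return K_P * (q_desired - q) - K_D * qdot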
[0056] In some implementations, the system can use proximal policy optimization (PPO) as the reinforcement learning method to train the controller. In some of those implementations, PPO can be used to train an actor-critic policy, where the actor is the feed-forward control policy π_θ described herein (e.g., for a Unitree A1 quadrupedal robot with L_w = 5: input dimension 65, output dimension 12, and three hidden layers of sizes [512, 256, 128] with ELU (α = 1.0) as the non-linear activation function at each hidden layer).
[0057] In some implementations, the critic can use a network architecture similar to that of the actor policy, except the output size is 1 (instead of 12) and it can receive the base velocities in the local frame as part of its input. The system can keep the hyper-parameters the same and train for 1000 iterations. For safety reasons, the system can end an episode early if the body height of the robot is lower than 0.25 meters.
[0058] In some implementations, during training the system can sample noise from a uniform distribution and add it to the controller's observations. The system can use PD control to convert positions to torques, and domain randomization can be applied to the two gains k_p and k_d.
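As a non-limiting illustration of the observation noise and PD-gain domain randomization of paragraph [0058], the following sketch samples uniform perturbations; the noise bound and the ±20% gain range are illustrative assumptions, as the text does not reproduce them:

    import numpy as np

    def noisy_observation(obs: np.ndarray, rng: np.random.Generator,
                          scale: float = 0.01) -> np.ndarray:
        # Additive noise sampled from a uniform distribution; the bound
        # `scale` is an illustrative assumption.
        return obs + rng.uniform(-scale, scale, size=obs.shape)

    def randomized_gains(rng: np.random.Generator,
                         kp_base: float = 20.0, kd_base: float = 0.5):
        # Domain randomization of the two PD gains; the +/-20% range is an
        # illustrative assumption.
        return (kp_base * rng.uniform(0.8, 1.2),
                kd_base * rng.uniform(0.8, 1.2))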
[0059] Turning now to the figures, an example robot 100 is illustrated.
[0060] Robot 100 includes a base 102, a head 104, a front left leg 106, a rear left leg 108, a front right leg 110, and a rear right leg 112, where the legs are provided for locomotion of the robot. The front left leg 106 has a corresponding foot 114; the rear left leg 108 has a corresponding foot 116; the front right leg 110 has a corresponding foot 118; and the rear right leg 112 has a corresponding foot 120. The robot 100 may include, for example, one or more motors to move one or more legs of the robot 100 to achieve a desired direction, velocity, and/or acceleration of movement for the robot 100. Additional or alternative robots can include one or more robot arms (not depicted) with an end effector (not depicted) that takes the form of a gripper with two opposing fingers or digits.
[0061] Robot 100 also includes one or more vision components (not depicted) that can generate vision data (e.g., images) related to shape, color, depth, and/or other features of object(s) that are in the line of sight of the vision component(s). The vision component(s) can be, for example, a monocular camera, a stereographic camera (active or passive), and/or a 3D laser scanner. A 3D laser scanner can include one or more lasers that emit light and one or more sensors that collect data related to reflections of the emitted light. The 3D laser scanner can generate vision component data that is a 3D point cloud, with each of the points of the 3D point cloud defining a position of a point of a surface in 3D space. A monocular camera can include a single sensor (e.g., a charge-coupled device (CCD)), and generate, based on physical properties sensed by the sensor, images that each include a plurality of data points defining color values and/or grayscale values. For instance, the monocular camera can generate images that include red, blue, and/or green channels. Each channel can define a value for each of a plurality of pixels of the image, such as a value from 0 to 255 for each of the pixels of the image. A stereographic camera can include two or more sensors, each at a different vantage point. In some implementations, the stereographic camera generates, based on characteristics sensed by the two sensors, images that each include a plurality of data points defining depth values and color values and/or grayscale values. For example, the stereographic camera can generate images that include a depth channel and red, blue, and/or green channels.
[0062] Robot 100 also includes one or more processors that, for example, can implement all or aspects of process 700 and/or 800 described herein. Additional description of some examples of the structure and functionality of various robots is provided herein.
[0063] At a given time, one or more of the feet (e.g., 114, 116, 118, 120) of the robot can make contact with a surface in the environment of the robot. For example, one or more feet of the robot, at a given time, can make contact with the ground, a step, a wall, an object in the environment, one or more additional or alternative surfaces, and/or combinations thereof.
[0068] The <general instruction block> 402 can include natural language text which describes the task the LLM should accomplish, such as translating an arbitrary command (e.g., the NL text input) into a foot contact pattern; one or more additional or alternative task descriptions; and/or combinations thereof.
[0069] The <gait definition block> 404 can include basic knowledge of one or more quadrupedal gaits. In some implementations, the gait definitions can connect the bounding gait to a general impression of emotion, which helps the LLM generalize over vague human commands (that may not explicitly specify what gaits the robot should use). For example, <gait definition block> 404 can include: '1. Trotting is a gait where two diagonally opposite legs strike the ground at the same time.'; '2. Pacing is a gait where the two legs on the left/right side of the body strike the ground at the same time.'; '3. Bounding is a gait where the two front/rear legs strike the ground at the same time. It has a longer suspension phase where all feet are off the ground, for example, for at least 25% of the cycle length. This gait also gives a happy feeling.'; one or more additional or alternative descriptions of quadrupedal gaits; and/or combinations thereof.
[0070] The <output format definition block> 406 can specify the format of the output. In some implementations, the desired velocities can be discretized (e.g., into the five values [−1.0, −0.5, 0.0, 0.5, 1.0]) so the LLM can give outputs corresponding to words like fast(er) and slow(er). For example, <output format definition block> 406 can include: 'The following are rules for describing the velocity and foot contact patterns: 1. You should first output the velocity, then the foot contact pattern. 2. There are five velocities to choose from: [−1.0, −0.5, 0.0, 0.5, 1.0]. 3. A pattern has 4 lines, each of which represents the foot contact pattern of a leg. 4. Each line has a label. FL is the front left leg, FR is the front right leg, RL is the rear left leg, and RR is the rear right leg. 5. In each line, 0 represents the foot in the air, 1 represents the foot on the ground.'; one or more additional or alternative output format definitions; and/or combinations thereof.
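As a non-limiting illustration of consuming output that follows the format rules of <output format definition block> 406 (velocity first, then four labeled pattern lines), the following Python sketch parses an LLM response into a velocity and a 4×T template; the response layout shown in the comment is illustrative:

    import numpy as np

    def parse_llm_output(text: str) -> tuple[float, np.ndarray]:
        # Expected response shape (illustrative):
        #   0.5
        #   FL: 1111000011110000
        #   FR: 0000111100001111
        #   RL: 0000111100001111
        #   RR: 1111000011110000
        # Returns (velocity, 4 x T template with rows FL, FR, RL, RR).
        lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
        velocity = float(lines[0])
        rows = {}
        for ln in lines[1:5]:
            label, bits = ln.split(":")
            rows[label.strip()] = [int(b) for b in bits.strip()]
        template = np.array([rows[k] for k in ("FL", "FR", "RL", "RR")])
        return velocity, template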
[0071] The <examples block> 408 can include general knowledge of instruction fine-tuning and/or can show the LLM a few concrete input-output pairs. In some implementations, the LLM can generalize and handle various commands, including those which vaguely state what velocity or gait the robot should use, based on a small number of gait examples (e.g., 3 gait examples, 5 gait examples, 10 gait examples, etc.). For example, the <examples block> 408 can include an example corresponding to the input 'trot slowly', the input 'bound in place', and the input 'pace backward fast'.
[0073] In some implementations, a control policy of the locomotion controller 508 can generate one or more joint positions 514 (e.g., one or more low-level commands for the robot) based on processing the foot contact pattern 506, a linear velocity 510, one or more instances of sensor data 512 describing the current position of the robot, one or more additional or alternative values, and/or combinations thereof. In some implementations, the linear velocity 510 can be selected by a user. Additionally or alternatively, the linear velocity can be randomly sampled by the system and/or by the random pattern generator 504. In some implementations, the system can generate a sequence of robot states and a corresponding sequence of robot actions based on processing the foot contact pattern 506 using the control policy. In some of those implementations, the system can generate a reward based on the sequence of robot states and corresponding robot actions. One or more portions of the control policy can be updated based on the reward using reinforcement learning. For example, the system can update one or more portions of the control policy based on the determined reward in accordance with process 700 described herein.
[0075] In some implementations, the system can process the NL text input 602 using LLM 604 to generate a foot contact pattern 606. For example, the system can process an LLM prompt 400, generated based on the NL text input 602, as described herein.
[0076] In some implementations, a linear velocity 608 can be provided, where the linear velocity is a target velocity for the robot to maintain while performing the locomotion task. In some implementations, the user can provide a linear velocity 608 (not depicted). In some other implementations, the LLM 604 can generate the linear velocity 608 in addition to the foot contact pattern 606 based on processing the NL text input 602.
[0077] In some implementations, a locomotion controller 610 (e.g., a control policy of the locomotion controller) can generate one or more joint positions 614, which can be used to control the robot, based on processing the foot contact pattern 606, the linear velocity 608, and/or one or more instances of sensor data 612. For example, the system can generate the one or more joint positions 614 in accordance with process 800 described herein.
[0079] At block 702, the system generates a training foot contact pattern using a random pattern generator. In some implementations, the system can generate the training foot contact pattern based on a training gait. In some implementations, the random pattern generator can generate a foot contact pattern given a specific gait G. As an illustrative example, the system can generate a foot contact pattern for G = PACE. In general, the system can sample T ∈ [24, 28]. When the control frequency is 50 Hz, the corresponding cycle length is 0.48–0.56 seconds. The system can sample a foot-ground contact length ratio within the cycle r_contact ∈ [0.5, 0.7], where T·r_contact indicates the number of 1s and T·(1 − r_contact) indicates the number of 0s in each row. In some implementations, the system needs to scale the length and/or shift bits of these ones and zeroes to produce feasible foot contact patterns on a real robot.
[0080] For example, for G = BOUND, the foot-ground contact time can be shortened to 60% of the sampled value (i.e., r′_contact = 0.6·r_contact). Similarly, the ones can be placed at the beginning of the FL and FR rows, and the ones in the RL and RR rows can be shifted by 0.5·T·r_contact bits to the right. In contrast, for G = TROT, the system does not perform any scaling. Additionally or alternatively, the 1s can be shifted to form a complete foot contact pattern, where the ones are placed at the beginning of the FL and RR rows and at the end of the FR and RL rows. In some implementations, r_contact is not changed for G = PACE, but the cycle length can be shortened to half of its sampled value (i.e., T′ = 0.5·T) to make the gait natural and/or feasible. The system can still place the ones at the beginning of the FL and RL rows and at the end of the FR and RR rows. For G ∈ {STAND_STILL, STAND_3LEGS}, the system performs no scaling and fills the foot contact pattern matrix with ones. Additionally, for G = STAND_3LEGS, the system can randomly sample one row and replace it with zeroes.
[0081] At block 704, the system processes the training foot contact pattern using the control policy of the locomotion controller to generate a sequence of robot actions and a corresponding sequence of robot states. In some implementations, the system processes a linear velocity and/or one or more instances of sensor data indicating the position of the robot with the training foot contact pattern using the control policy of the locomotion controller to generate the sequence of robot actions and corresponding sequence of robot states.
[0082] At block 706, the system determines a reward based on the generated sequence of robot actions and the corresponding sequence of robot states and/or the training foot contact pattern.
[0083] At block 708, the system updates one or more portions of the control policy of the locomotion controller based on the reward.
[0084] At block 710, the system determines whether to process any additional training foot contact patterns. In some implementations, the system can determine whether to process any additional training foot contact patterns based on whether there are any remaining unprocessed training foot contact patterns, whether a threshold number of foot contact patterns have been processed, whether a threshold duration of time has been spent training the control policy, based on one or more additional or alternative metrics, and/or combinations thereof. If the system determines to process one or more additional foot contact patterns, the system generates an additional training foot contact pattern at block 712 before proceeding back to blocks 704, 706, and 708. If not, the process ends.
[0085] At block 712, the system generates an additional training foot contact pattern using the random pattern generator. In some implementations, the additional training foot contact pattern can be based on the training gait of the robot or an additional training gait of the robot.
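As a non-limiting illustration of the training loop of blocks 702-712, the following sketch ties the steps together; `env`, `policy`, and `reward_fn` are hypothetical stand-ins for a simulator, the control policy (with, e.g., PPO updates), and the weighted reward of paragraph [0046], and `random_pattern_template` refers to the generator sketch above:

    def train_locomotion_controller(policy, env, reward_fn, rng,
                                    num_iterations: int = 1000):
        gaits = ["BOUND", "TROT", "PACE", "STAND_STILL", "STAND_3LEGS"]
        for i in range(num_iterations):
            gait = gaits[i % len(gaits)]                 # keep gait ratios balanced
            template = random_pattern_template(gait, rng)  # blocks 702 / 712
            obs = env.reset(template)
            states, actions, rewards = [], [], []
            for _ in range(150):                         # gait resampled every 150 steps
                action = policy.act(obs)                 # block 704: roll out the policy
                next_obs = env.step(action)
                states.append(next_obs)
                actions.append(action)
                rewards.append(reward_fn(next_obs, action, template))  # block 706
                obs = next_obs
            policy.update(states, actions, rewards)      # block 708: e.g., a PPO step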
[0087] At block 802, the system receives an instance of natural language (NL) text input indicating a task for a robot to perform in the environment. For example, a user can provide NL text input of 'Good news, we are going to a picnic!'; 'bound forward slowly'; 'raise your right front leg'; 'act as if the ground is very hot'; one or more additional or alternative instances of NL text input; and/or combinations thereof.
[0088] At block 804, the system processes the instance of NL text input using a LLM to generate a foot contact pattern. In some implementations, the foot contact pattern indicates a sequence of leg positions, relative to the surface, of the plurality of legs of the robot. In some of those implementations, one or more legs of the robot are in contact with the surface. In some implementations, the prompt provided to the LLM can include one or more general instructions, one or more gait definitions, one or more output format definitions, one or more foot contact pattern examples, one or more additional or alternative instances of text description, and/or combinations thereof. An example LLM prompt 400 is described in additional detail herein.
[0089] At block 806, the system generates control output based on processing the foot contact pattern using a locomotion controller of the robot. In some implementations, the system can process the foot contact pattern, one or more instances of sensor data identifying the current pose of the robot, and/or a linear velocity (e.g., a user provided linear velocity) using a control policy of the locomotion controller to generate control output. In some implementations, the control output can include one or more low-level commands to control the joint positions and/or motor angles of the robot.
[0090] At block 808, the system causes the robot to perform one or more actions based on the control output. For instance, the system can cause the robot to perform a locomotion task in accordance with the NL text input provided at block 802.
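As a non-limiting illustration of process 800 end to end, the following sketch reuses the helper sketches above (`build_prompt`, `parse_llm_output`, `extract_window`, `pd_torques`); `llm_generate` and the `robot` interface are hypothetical stand-ins, and glue between array and tensor types is elided:

    def execute_nl_command(nl_text: str, llm_generate, policy, robot, L_w: int = 5):
        # Block 802: receive the NL text input (nl_text).
        prompt = build_prompt(nl_text)                 # block 804: prompt the LLM
        velocity, template = parse_llm_output(llm_generate(prompt))
        t = 0
        while robot.task_active():
            window = extract_window(template, t, L_w)  # desired contacts, next L_w steps
            obs = robot.observation(velocity, window)  # proprioception + command + pattern
            q_desired = policy(obs)                    # block 806: low-level control output
            robot.apply(pd_torques(q_desired, robot.q, robot.qdot))  # block 808
            t += 1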
[0092] Operational components 940a-940n may include, for example, one or more end effectors and/or one or more servo motors or other actuators to effectuate movement of one or more components of the robot. For example, the robot 920 may have multiple degrees of freedom and each of the actuators may control the actuation of the robot 920 within one or more of the degrees of freedom responsive to the control commands. As used herein, the term actuator encompasses a mechanical or electrical device that creates motion (e.g., a motor), in addition to any driver(s) that may be associated with the actuator and that translate received control commands into one or more signals for driving the actuator. Accordingly, providing a control command to an actuator may comprise providing the control command to a driver that translates the control command into appropriate signals for driving an electrical or mechanical device to create desired motion.
[0093] The robot control system 960 may be implemented in one or more processors, such as a CPU, GPU, and/or other controller(s) of the robot 920. In some implementations, the robot 920 may comprise a brain box that may include all or aspects of the control system 960. For example, the brain box may provide real time bursts of data to the operational components 940a-n, with each of the real time bursts comprising a set of one or more control commands that dictate, inter alia, the parameters of motion (if any) for each of one or more of the operational components 940a-n. In some implementations, the robot control system 960 may perform one or more aspects of the method(s) described herein, such as process 700 and/or process 800.
[0094] As described herein, in some implementations all or aspects of the control commands generated by control system 960, in controlling a robot during performance of a robotic task, can be generated based on robotic skill(s) determined to be relevant for the robotic task and, optionally, based on determined map location(s) for environmental object(s). Although control system 960 is illustrated as an integrated part of the robot 920, in some implementations, all or aspects of the control system 960 may be implemented in a component that is separate from, but in communication with, the robot 920.
[0096] User interface input devices 1022 may include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touch screen incorporated into the display, audio input devices such as voice recognition systems, microphones, and/or other types of input devices. In general, use of the term input device is intended to include all possible types of devices and ways to input information into computing device 1010 or onto a communication network.
[0097] User interface output devices 1020 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem may include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem may also provide non-visual display such as via audio output devices. In general, use of the term output device is intended to include all possible types of devices and ways to output information from computing device 1010 to the user or to another machine or computing device.
[0098] Storage subsystem 1024 stores programming and data constructs that provide the functionality of some or all of the modules described herein. For example, the storage subsystem 1024 may include the logic to perform selected aspects of process 700 and/or process 800 described herein.
[0099] These software modules are generally executed by processor 1014 alone or in combination with other processors. Memory 1025 used in the storage subsystem 1024 can include a number of memories including a main random access memory (RAM) 1030 for storage of instructions and data during program execution and a read only memory (ROM) 1032 in which fixed instructions are stored. A file storage subsystem 1026 can provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations may be stored by file storage subsystem 1026 in the storage subsystem 1024, or in other machines accessible by the processor(s) 1014.
[0100] Bus subsystem 1012 provides a mechanism for letting the various components and subsystems of computing device 1010 communicate with each other as intended. Although bus subsystem 1012 is shown schematically as a single bus, alternative implementations of the bus subsystem may use multiple buses.
[0101] Computing device 1010 can be of varying types including a workstation, server, computing cluster, blade server, server farm, or any other data processing system or computing device. Due to the ever-changing nature of computers and networks, the description of computing device 1010 is intended only as a specific example for purposes of illustrating some implementations. Many other configurations of computing device 1010 are possible.
[0102] In some implementations, a method implemented by one or more processors is provided, the method including receiving an instance of natural language (NL) text input, wherein the instance of NL text input indicates a task for a robot to perform in an environment, wherein the robot has a plurality of legs, and wherein the robot is on a surface in the environment. In some implementations, the method includes processing the instance of NL text input using a large language model (LLM) to generate a foot contact pattern, wherein the foot contact pattern indicates a sequence of leg positions relative to the surface, of the plurality of legs of the robot, and wherein one or more of the legs of the robot are in contact with the surface. In some implementations, the method includes generating control output by processing the foot contact pattern using a locomotion controller of the robot. In some implementations, the method includes causing the robot to perform one or more actions based on the control output.
[0103] These and other implementations of the technology can include one or more of the following features.
[0104] In some implementations, the foot contact pattern indicates whether one or more of the legs of the robot are in contact with the surface and wherein the foot contact pattern further indicates whether one or more legs of the robot are not in contact with the surface. For example, a first set of the leg positions of the sequence can indicate contact with the surface, and a second set of leg positions of the sequence, that is distinct from the first set, can indicate non-contact with the surface.
[0105] In some implementations, the foot contact pattern indicates whether the one or more legs of the robot are in contact with the surface, wherein the foot contact pattern further indicates whether one or more of the legs of the robot not in contact with the surface are a first distance from the surface, and wherein the foot contact pattern further indicates whether one or more of the legs of the robot not in contact with the surface are a second distance from the surface. For example, a first set of leg positions of the sequence can indicate contact with the surface; a second set of leg positions of the sequence, that is distinct from the first set, can indicate non-contact with the surface at a first distance from the surface; and a third set of leg positions of the sequence, that is distinct from the first set and the second set, can indicate non-contact with the surface at a second distance from the surface, where the second distance from the surface is distinct from the first distance.
[0106] In some implementations, the task for the robot to perform in the environment, indicated by the instance of NL text input, is a locomotion task with a target gait of the robot. In some versions of those implementations, the target gait of the robot is a cyclic motion pattern that produces locomotion through a sequence of contacts with the surface. In some versions of those implementations, the target gait of the robot includes a bounding gait, a trotting gait, a pacing gait, standing still, and/or standing on three legs.
[0107] In some implementations, the robot is a quadruped robot and wherein the plurality of legs of the robot includes a front left leg, a front right leg, a rear left leg, and a rear right leg.
[0108] In some implementations, processing the instance of NL text input using the LLM to generate the foot contact pattern includes generating a prompt for the LLM based on the instance of NL text input. In some implementations, the method further includes generating the foot contact pattern based on processing the LLM prompt using the LLM. In some versions of those implementations, the prompt for the LLM, that is based on the instance of NL text input, includes one or more general instructions for the LLM, wherein the one or more general instructions for the LLM include instructions to translate the NL text input into the foot contact pattern, and wherein the one or more general instructions are in addition to any of the NL text input. In some versions of those implementations, the prompt for the LLM, that is based on the instance of NL text input, further includes one or more gait definitions, wherein each gait, of the one or more gait definitions, includes a NL text description of the gait, and wherein the NL text description of each gait is in addition to any of the NL text input.
[0109] In some versions of those implementations, the NL text description of one or more of the gaits, of the one or more gait definitions, includes NL text indicating an emotion corresponding to the gait. In some versions of those implementations, the prompt for the LLM, that is based on the instance of NL text input, further includes one or more foot contact pattern output instructions, wherein the one or more foot contact pattern output instructions include NL text description of how to format the foot contact pattern, and wherein the NL text description of how to format the foot contact pattern is in addition to any of the NL text input.
[0110] In some versions of those implementations, the prompt for the LLM, that is based on the instance of NL text input, further includes one or more example foot contact patterns.
[0111] In some implementations, the NL text input, or additional input, includes a user defined velocity for the robot in performance of the task in the environment. In some versions of those implementations, generating the control output by processing the foot contact pattern using the locomotion controller of the robot includes identifying a current state of one or more components of the robot. In some implementations, the method further includes generating the control output by processing, using the locomotion controller, (1) the current state of the one or more components of the robot, (2) the user defined velocity for the robot, and (3) the foot contact pattern.
[0112] In some implementations, a control policy of the locomotion controller is trained using one or more training foot contact patterns, where each of the training foot contact patterns is generated using a random pattern generator based on a training gait. In some versions of those implementations, training the control policy of the locomotion controller using a given training foot contact pattern, of the one or more training foot contact patterns, includes processing the given training foot contact pattern using the control policy of the locomotion controller to generate a sequence of robot actions and corresponding robot states. In some implementations, the method further includes determining a reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern. In some implementations, the method further includes updating one or more portions of the control policy of the locomotion controller based on the reward. In some versions of those implementations, determining the reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern includes generating the reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern to maximize an expected reward. In some versions of those implementations, the LLM is fine-tuned based on the generated reward. In some versions of those implementations, the LLM is fine-tuned, prior to receiving the instance of NL text input, based on a previously generated reward, wherein the previously generated reward is generated based on processing a prior sequence of robot actions and prior corresponding robot states and/or a prior training foot contact pattern.
[0113] In some implementations, the instance of NL text input is a text representation of a spoken utterance captured in an instance of audio data and/or the NL text input is an instance of text provided by a user via a keyboard.
[0114] In some implementations, a method implemented by one or more processors is provided, the method including training a locomotion controller to generate control output for controlling a robot with a plurality of legs on a surface in an environment. In some implementations, the control output is generated based on processing a foot contact pattern using the locomotion controller. In some implementations, the foot contact pattern indicates a sequence of leg positions relative to the surface, of the plurality of legs of the robot, where one or more of the legs of the robot are in contact with the surface. In some implementations, training a control policy of the locomotion controller includes selecting a training foot contact pattern generated based on a training gait using a random pattern generator. In some implementations, the method includes processing the training foot contact pattern using the control policy of the locomotion controller to generate a sequence of robot actions and corresponding robot states for use in controlling locomotion of the robot. In some implementations, the method includes determining a reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern. In some implementations, the method includes updating one or more portions of the control policy of the locomotion controller based on the reward.
[0115] These and other implementations of the technology can include one or more of the following features.
[0116] In some implementations, the method further includes transmitting the control policy of the locomotion controller for use in controlling a given robot.
[0117] Other implementations can include a non-transitory computer readable storage medium storing instructions executable by one or more processor(s) (e.g., a central processing unit(s) (CPU(s)), graphics processing unit(s) (GPU(s)), and/or tensor processing unit(s) (TPU(s))) to perform a method such as one or more of the methods described herein. Yet other implementations can include a system of one or more computers and/or one or more robots that include one or more processors operable to execute stored instructions to perform a method such as one or more of the methods described herein.