FOOT CONTACT PATTERN(S) AS INTERFACE FOR LANGUAGE TO CONTROL ROBOT(S)

20250291351 · 2025-09-18

    Abstract

    Various implementations are provided which include receiving an instance of natural language (NL) text input indicating a task for a multi-legged robot to perform in an environment. In many implementations, the system can process the NL text input using a large language model (LLM) to generate a foot contact pattern indicating a sequence of positions of the robot's legs relative to a surface in the environment, where one or more of the legs of the robot are in contact with the surface. Additionally or alternatively, the system can generate low-level robot control output by processing the foot contact pattern using a locomotion controller.

    Claims

    1. A method implemented by one or more processors, the method comprising: receiving an instance of natural language (NL) text input, wherein the instance of NL text input indicates a task for a robot to perform in an environment, wherein the robot has a plurality of legs, and wherein the robot is on a surface in the environment; processing the instance of NL text input using a large language model (LLM) to generate a foot contact pattern, wherein the foot contact pattern indicates a sequence of leg positions relative to the surface, of the plurality of legs of the robot, and wherein one or more of the legs of the robot are in contact with the surface; generating control output by processing the foot contact pattern using a locomotion controller of the robot; and causing the robot to perform one or more actions based on the control output.

    2. The method of claim 1, wherein the foot contact pattern indicates whether one or more of the legs of the robot are in contact with the surface and wherein the foot contact pattern further indicates whether one or more of the legs of the robot are not in contact with the surface.

    3. The method of claim 1, wherein the foot contact pattern indicates whether the one or more legs of the robot are in contact with the surface, wherein the foot contact pattern further indicates whether one or more of the legs of the robot not in contact with the surface are a first distance from the surface, and wherein the foot contact pattern further indicates whether one or more of the legs of the robot not in contact with the surface are a second distance from the surface.

    4. The method of claim 1, wherein the task for the robot to perform in the environment, indicated by the instance of NL text input, is a locomotion task with a target gait of the robot.

    5. The method of claim 4, wherein the target gait of the robot is a cyclic motion pattern that produces locomotion through a sequence of contacts with the surface.

    6. The method of claim 4, wherein the target gait of the robot includes a bounding gait, a trotting gait, a pacing gait, standing still, and/or standing on three legs.

    7. The method of claim 1, wherein the robot is a quadruped robot and wherein the plurality of legs of the robot includes a front left leg, a front right leg, a rear left leg, and a rear right leg.

    8. The method of claim 1, wherein processing the instance of NL text input using the LLM to generate the foot contact pattern comprises: generating a prompt for the LLM based on the instance of NL text input; and generating the foot contact pattern based on processing the LLM prompt using the LLM.

    9. The method of claim 8, wherein the prompt for the LLM, that is based on the instance of NL text input, includes one or more general instructions for the LLM, wherein the one or more general instructions for the LLM include instructions to translate the NL text input into the foot contact pattern, and wherein the one or more general instructions are in addition to any of the NL text input.

    10. The method of claim 9, wherein the prompt for the LLM, that is based on the instance of NL text input, further includes one or more gait definitions, wherein each gait, of the one or more gait definitions, includes a NL text description of the gait, and wherein the NL text description of each gait is in addition to any of the NL text input.

    11. The method of claim 10, wherein the NL text description of one or more of the gaits, of the one or more gait definitions, includes NL text indicating an emotion corresponding to the gait.

    12. The method of claim 10, wherein the prompt for the LLM, that is based on the instance of NL text input, further includes one or more foot contact pattern output instructions, wherein the one or more foot contact pattern output instructions include NL text description of how to format the foot contact pattern, and wherein the NL text description of how to format the foot contact pattern is in addition to any of the NL text input.

    13. The method of claim 12, wherein the prompt for the LLM, that is based on the instance of NL text input, further includes one or more example foot contact patterns.

    14. The method of claim 1, wherein the NL text input, or additional input, includes a user defined velocity for the robot in performance of the task in the environment, and wherein generating the control output by processing the foot contact pattern using the locomotion controller of the robot comprises: identifying a current state of one or more components of the robot; and generating the control output by processing, using the locomotion controller, (1) the current state of the one or more components of the robot, (2) the user defined velocity for the robot, and (3) the foot contact pattern.

    15. The method of claim 1, wherein a control policy of the locomotion controller is trained using one or more training foot contact patterns, wherein each of the training foot contact patterns is generated using a random pattern generator based on a training gait.

    16. The method of claim 15, wherein training the control policy of the locomotion controller using a given training foot contact pattern, of the one or more training foot contact patterns comprises: processing the given training foot contact pattern using the control policy of the locomotion controller to generate a sequence of robot actions and corresponding robot states; determining a reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern; and updating one or more portions of the control policy of the locomotion controller based on the reward.

    17. The method of claim 16, wherein determining the reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern comprises: generating the reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern to maximize an expected reward.

    18. The method of claim 16, wherein the LLM is fine-tuned based on the generated reward.

    19. The method of claim 16, wherein, prior to receiving the instance of NL text input, the LLM is fine-tuned based on a previously generated reward, wherein the previously generated reward is generated based on processing a prior sequence of robot actions and prior corresponding robot states and/or a prior training foot contact pattern.

    20. The method of claim 1, wherein the instance of NL text input is a text representation of a spoken utterance captured in an instance of audio data and/or the NL text input is an instance of text provided by a user via a keyboard.

    21. A method implemented by one or more processors, the method comprising: training a locomotion controller to generate control output for controlling a robot with a plurality of legs on a surface in an environment, wherein the control output is generated based on processing a foot contact pattern using the locomotion controller, wherein the foot contact pattern indicates a sequence of leg positions, relative to the surface, of the plurality of legs of the robot, where one or more of the legs of the robot are in contact with the surface, and wherein training a control policy of the locomotion controller comprises: selecting a training foot contact pattern generated based on a training gait using a random pattern generator; processing the training foot contact pattern using the control policy of the locomotion controller to generate a sequence of robot actions and corresponding robot states for use in controlling locomotion of the robot; determining a reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern; and updating one or more portions of the control policy of the locomotion controller based on the reward.

    22. The method of claim 21, further comprising: transmitting the control policy of the locomotion controller for use in controlling a given robot.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0014] FIG. 1 illustrates an example of a robot in accordance with some implementations described herein.

    [0015] FIG. 2A illustrates an example foot contact pattern for a quadruped robot, where the foot contact pattern corresponds with trotting in accordance with some implementations described herein.

    [0016] FIG. 2B illustrates another example foot contact pattern for a quadruped robot, where the foot contact pattern corresponds with bounding in accordance with some implementations described herein.

    [0017] FIG. 3A illustrates an example foot contact pattern generated using an LLM, where the foot contact pattern corresponds with bounding in accordance with some implementations.

    [0018] FIG. 3B illustrates another example foot contact pattern generated using an LLM, where the foot contact pattern corresponds with trotting in accordance with some implementations.

    [0019] FIG. 3C illustrates a further example foot contact pattern generated using an LLM, where the foot contact pattern corresponds with pacing in accordance with some implementations.

    [0020] FIG. 4 illustrates example LLM prompts for generating a foot contact pattern in accordance with some implementations.

    [0021] FIG. 5 is a block diagram illustrating an example of training a locomotion controller in accordance with some implementations.

    [0022] FIG. 6 is a block diagram illustrating an example of controlling a robot based on natural language text input in accordance with some implementations.

    [0023] FIG. 7 is a flowchart illustrating an example process of training a locomotion controller in accordance with some implementations.

    [0024] FIG. 8 is a flowchart illustrating an example process of generating a foot contact pattern using a LLM for controlling a robot in accordance with some implementations.

    [0025] FIG. 9 schematically depicts an example architecture of a robot.

    [0026] FIG. 10 schematically depicts an example architecture of a computer system.

    DETAILED DESCRIPTION

    [0027] Large language models (LLMs) have demonstrated the potential to perform high-level planning. However, it remains a challenge for some LLMs to comprehend low-level commands, such as joint angle targets or motor torques. Various implementations described herein use foot contact patterns as an interface that bridges human commands in natural language and a locomotion controller that outputs these low-level commands. In some implementations, the system allows users to flexibly craft diverse locomotion behaviors for robot(s) (e.g., quadrupedal robots). In some implementations, the system can include LLM prompt design, a reward function, and/or exposing the controller to a feasible distribution of contact patterns. The resulting controller is capable of achieving diverse locomotion patterns that can be transferred to real world robot hardware.

    [0028] In some implementations, simple and/or effective interaction between a human user and a quadrupedal robot can pave the way towards creating intelligent and capable helper robots. Some human-robot interaction systems can enable quadrupedal robots to respond to natural language instructions as language is an important communication channel for human beings. Recent developments in Large Language Models (LLMs) have engendered a spectrum of applications fueled by the proficiency of LLMs to ingest an enormous amount of historical data, to adapt in-context to novel tasks with few examples, and to understand and interact with user intentions through a natural language interface.

    [0029] LLMs have been used to develop interactive and capable systems for physical robots. Researchers have demonstrated the potential of using LLMs to perform high-level planning and/or robot code writing. Nevertheless, unlike text generation, where LLMs directly interpret the atomic elements (tokens), it often proves challenging for LLMs to comprehend low-level robotic commands such as joint angle targets or motor torques, especially for inherently unstable legged robots necessitating high-frequency control signals. Consequently, many systems presume the provision of high-level APIs for LLMs to dictate robot behavior, which may inherently limit the system's expressive capabilities.

    [0030] In some implementations, systems can use foot contact patterns as an interface that bridges human instructions in natural language and low-level commands. In some of those implementations, the resulting system for legged robots, particularly quadrupedal robots, can be interactive, where the system can allow users to craft diverse locomotion behaviors flexibly. Additionally or alternatively, the patterns of feet establishing and breaking contacts with the ground often govern the final locomotion behavior for legged robots due to the heavy reliance of quadruped locomotion on environmental contact. In some implementations, a contact pattern, describing the contact establishing and breaking timings for each leg, is a compact and flexible interface to author locomotion behaviors for legged robots. To leverage this interface for controlling quadruped robots, some implementations use an LLM-based approach to generate contact patterns, represented by 0s and 1s, from user instructions. Despite LLMs being trained on mostly natural language datasets, with proper prompting and in-context learning, the LLMs can produce contact patterns that represent diverse quadruped motions. In some implementations, a Deep Reinforcement Learning (DRL) based approach can be used to generate robot actions given a desired contact pattern. The DRL based approach can use a reward structure that concerns only contact timing, while exposing the policy to the right distribution of contact patterns. In some implementations, the system can include a controller capable of achieving diverse locomotion patterns that can be transferred to the real robot hardware.

    [0031] In some implementations, the system can include an interface of contact pattern for harnessing knowledge from LLMs to flexibly and interactively control quadruped robots; a pipeline to teach LLMs to generate complex contact patterns from user instructions; and/or a DRL-based method to train a low-level controller that realizes diverse contact patterns on real quadruped robots.

    [0032] In some implementations, the system can include using desired foot contact patterns as an interface between human commands in natural language and the locomotion controller. The locomotion controller can be used to not only complete the main task (e.g., following specified velocities), but also to place the robot's feet on the ground at the right time, such that the realized foot contact patterns are as close as possible to the desired ones. In some of those implementations, the locomotion controller can process the robot's proprioceptive sensor data, task commands (e.g., following a desired linear velocity), and/or the desired foot contact pattern to generate output indicating the desired joint positions. In some implementations, the foot contact patterns can be extracted by a cyclic sliding window from a pattern template, which is generated by a random pattern generator during training, and is translated from human commands in natural language by an LLM.

    [0033] In some implementations, a desired foot contact pattern can be defined by a cyclic sliding window of size L.sub.w that extracts the four feet's ground contact flags between t+1 and t+L.sub.w from a pattern template, and is of shape 4×L.sub.w. A contact pattern template is a 4×T matrix of 0s and 1s. In some implementations, 0s can represent feet in the air and 1s can represent feet on the ground. Additionally or alternatively, from top to bottom, each row in the matrix gives the foot contact patterns of the front left (FL), front right (FR), rear left (RL), and rear right (RR) feet. In some implementations, the LLM can map human commands into foot contact pattern templates in specified formats accurately, given properly designed prompts. In some of those implementations, the LLM can map human commands into foot contact pattern templates even when the commands are unstructured and/or vague. In some implementations, a random pattern generator can be used to produce contact pattern templates that are of various pattern lengths T, with foot-ground contact ratios within a cycle based on a given gait type G. In some of those implementations, the locomotion controller can be trained using a wide distribution of movements, which can enable the locomotion controller to generalize better.
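As an illustrative sketch (not taken from the patent itself), the cyclic sliding-window extraction described above can be expressed in a few lines of Python, assuming the 4×T pattern template is a NumPy array:

```python
import numpy as np

def extract_contact_window(template, t, window=5):
    """Cyclic sliding window of size L_w over a 4 x T pattern template.

    Rows (top to bottom): FL, FR, RL, RR; 1 = foot on ground, 0 = foot in air.
    Returns the 4 x L_w desired contact flags for steps t+1 .. t+L_w.
    """
    T = template.shape[1]
    cols = [(t + 1 + k) % T for k in range(window)]  # wrap around the cycle
    return template[:, cols]

# A 4 x 8 trotting-style template: diagonal feet (FL+RR, FR+RL) alternate.
trot = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],  # FL
    [0, 0, 0, 0, 1, 1, 1, 1],  # FR
    [0, 0, 0, 0, 1, 1, 1, 1],  # RL
    [1, 1, 1, 1, 0, 0, 0, 0],  # RR
])
```

With L.sub.w=5, calling `extract_contact_window(trot, t)` at successive time steps yields the 4×5 blocks of desired contact flags that the locomotion controller consumes.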

    [0034] While LLMs can learn knowledge from a vast amount of text data at training, providing proper prompts at inference can be the key to unlock and/or direct the acquired knowledge in meaningful ways. Carefully designed prompts can serve as the starting point for the models to generate text and guide the direction and context of the outputs. In some implementations, the system aims to enable the LLM to map any human commands in natural language to foot contact patterns in a specified format.

    [0035] In some implementations, the LLM prompt can include one or more general instructions, one or more gait definitions, one or more foot contact pattern output instructions, one or more example foot contact patterns, one or more additional or alternative prompt portions, and/or combinations thereof. The one or more general instructions can include natural language text which describes the task the LLM should accomplish. In some implementations, the one or more general instructions can include a task where the LLM is expected to translate an arbitrary command (e.g., the NL text input) to a foot contact pattern. The one or more gait definitions can include natural language text which gives basic knowledge of quadrupedal gaits. Although these descriptions are neither exhaustive nor perfectly accurate, the one or more gait definitions provide enough information for the LLM to follow the rules. Additionally or alternatively, the one or more gait definitions can connect the bounding gait to a general impression of emotion, which helps the LLM generalize over vague human commands that do not explicitly specify what gait the robot should use.

    [0036] In some implementations, the LLM prompt can include an output definition which specifies the format of the output. For example, the system can discretize the desired velocities into {−1, −0.5, 0, 0.5, 1} m/s so the LLM can give proper outputs corresponding to commands that contain words like "fast(er)" and/or "slow(er)".

    [0037] In some implementations, the LLM prompt can include an examples block, which can include general knowledge of instruction fine-tuning and/or can provide the LLM with a few concrete input-output pairs. For example, the examples block can include one or more gait examples, one or more velocity examples, etc.
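The prompt structure described in the preceding paragraphs (general instructions, gait definitions, output instructions, and an examples block) can be sketched as below. The section texts and the helper name `build_prompt` are illustrative assumptions; the actual prompts are not reproduced here:

```python
# Hypothetical prompt blocks; the wording is illustrative, not from the patent.
GENERAL_INSTRUCTIONS = (
    "You translate a user's command into a foot contact pattern for a "
    "quadruped robot. Output a 4-row matrix of 0s and 1s (rows: FL, FR, RL, RR)."
)
GAIT_DEFINITIONS = (
    "Trotting: diagonal feet (FL+RR, FR+RL) touch the ground together.\n"
    "Bounding: both front feet strike, then both rear feet; often reads as excited."
)
OUTPUT_INSTRUCTIONS = (
    "Also pick a forward velocity from {-1, -0.5, 0, 0.5, 1} m/s; "
    "'faster'/'slower' should move the choice up/down the list."
)
EXAMPLES = "Input: trot in place\nOutput: velocity 0 m/s, pattern ..."

def build_prompt(nl_text_input):
    """Concatenate the prompt blocks with the user's NL command."""
    return "\n\n".join([
        GENERAL_INSTRUCTIONS, GAIT_DEFINITIONS, OUTPUT_INSTRUCTIONS, EXAMPLES,
        f"Input: {nl_text_input}\nOutput:",
    ])
```

The returned string would then be processed by the LLM to produce the foot contact pattern template in the specified format.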

    [0038] In some implementations, robot locomotion control can be represented as a Markov Decision Process (MDP). In some of those implementations, the MDP can be solved using DRL algorithms. For example, a MDP can be a tuple (S, A, r, f, P.sub.0, γ), where S is the state space, A is the action space, r(s.sub.t, a.sub.t, s.sub.t+1) is the reward function, f(s.sub.t, a.sub.t) is the system transition function, P.sub.0 is the distribution of initial states s.sub.0, and γ∈[0,1] is the reward discount factor. In some implementations, the goal of a DRL algorithm can be to optimize a policy π.sub.θ: S→A so that the expected accumulated reward J=E[Σ.sub.t γ.sup.t·r(s.sub.t, a.sub.t, s.sub.t+1)], with initial states s.sub.0∼P.sub.0, is maximized. In some implementations, a.sub.t=π.sub.θ(s.sub.t) and θ is the set of learnable parameters. In locomotion tasks, s.sub.t often includes sensory data and goal conditions (e.g., user specified velocity commands), and a.sub.t is desired joint angles or motor torques. In some implementations, s.sub.t can be expanded to include a desired foot contact pattern, and the controller needs to achieve the main task as well as realize the desired foot contact patterns.

    [0039] In some implementations, the random pattern generator receives a gait type G, and can randomly sample a corresponding cycle length T and the ground contact ratio within the cycle for each foot, can conduct proper scaling and phase shifts, and/or can output a pattern template. While humans can give commands that map to a much wider set of foot contact pattern templates, as an example, the system can define and train on five types: G{BOUND, TROT, PACE, STAND_STILL, STAND_3LEGS}.

    [0040] In some implementations, the system can use a feed-forward neural network as the control policy π.sub.θ. The control policy can be used to generate the desired positions for each motor joint, and its input includes the base's angular velocities, the gravity vector {right arrow over (g)}=[0, 0, −1] in the base's frame, the user specified velocity, current joint positions and velocities, the policy output from the last time step, the desired foot contact patterns, and/or one or more additional or alternative parameters. In some implementations, Unitree A1 can be used as the quadrupedal robot. A1 has 3 joints per leg (i.e., hip, thigh, and calf joints) and L.sub.w=5; therefore the dimensions of the policy's input and output are 65 and 12, respectively. The policy has three hidden layers of sizes [512, 256, 128] with ELU (α=1.0) at each hidden layer as the non-linear activation function.
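A minimal NumPy sketch of the described policy network (65-dimensional input, hidden layers [512, 256, 128] with ELU, 12-dimensional output); the random weight initialization is an assumption for illustration:

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU activation with alpha = 1.0, as used at each hidden layer."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

# 65-dim observation -> hidden layers [512, 256, 128] -> 12 joint position targets.
sizes = [65, 512, 256, 128, 12]
rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.05, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def policy(obs):
    """Feed-forward control policy pi_theta mapping observations to joint targets."""
    h = obs
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:   # ELU on hidden layers only
            h = elu(h)
    return h
```

In practice this network would be trained with DRL rather than used with random weights; the sketch only fixes the input/output dimensions and architecture described above.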

    [0041] To encourage natural and symmetric behaviors, the system can employ a double-pass trick in the control policy. Specifically, instead of using a.sub.t=π.sub.θ(s.sub.t) directly as the output, the system can use a.sub.t=0.5[π.sub.θ(s.sub.t)+Φ.sub.act(π.sub.θ(Φ.sub.obs(s.sub.t)))], where Φ.sub.act(·) and Φ.sub.obs(·) flip left-right the policy's output and the robot's state, respectively. Intuitively, this double-pass trick says the control policy should output consistently when it receives the original and the left-right mirrored states. In practice, this trick greatly improves the naturalness of the robot's movement and helps shrink the sim-to-real gap.
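The double-pass trick can be sketched as follows. The mirror maps Φ.sub.act and Φ.sub.obs below are hypothetical placeholders; a real implementation must encode the robot's actual left-right symmetry, including sign flips for lateral and yaw quantities in the state:

```python
import numpy as np

# Hypothetical left-right mirror maps for a 12-dim action (3 joints x 4 legs,
# ordered FL, FR, RL, RR): swap left/right legs and flip the hip joints' sign.
ACT_PERM = np.array([3, 4, 5, 0, 1, 2, 9, 10, 11, 6, 7, 8])
ACT_SIGN = np.array([-1, 1, 1, -1, 1, 1, -1, 1, 1, -1, 1, 1])

def phi_act(a):
    """Mirror the policy's 12-dim output left-right."""
    return ACT_SIGN * np.asarray(a)[ACT_PERM]

def phi_obs(s):
    """Mirror the robot's state left-right (identity placeholder here)."""
    return s  # a real phi_obs permutes and negates entries of the 65-dim state

def symmetric_policy(pi, s):
    """Double-pass trick: a_t = 0.5 * [pi(s) + phi_act(pi(phi_obs(s)))]."""
    return 0.5 * (pi(s) + phi_act(pi(phi_obs(s))))
```

Because `phi_act` is a signed permutation, applying it twice recovers the original action, so the averaged output is consistent under mirroring.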

    [0042] One of the controller's main tasks is to follow user specified linear velocities along the robot's heading direction, while keeping the linear velocity along the lateral direction and the yaw angular velocity as close to zeros as possible. Additionally or alternatively, the controller needs to plan for the correct timing for feet-ground strikes so that the realized foot contact patterns match the desired ones. For real world deployment, the system can use a regularization term that penalizes action changing rate so that the real robot's movement is smoother. In addition to applying domain randomization, extra reward terms that keep the robot base stable can greatly shrink the sim-to-real gap and produce natural looking gaits. Finally, although no heavy engineering is required to train the locomotion policy with extra contact pattern inputs, it can help to balance the ratio of the gait types during training.

    [0043] In some implementations, the random pattern generator can generate a foot contact pattern given a specific gait G. As an illustrative example, the system can generate a foot contact pattern for G=PACE. In general, the system can sample T∈[24, 28]. When the control frequency is 50 Hz, the corresponding cycle length is 0.48 to 0.56 seconds. The system can sample a foot-ground contact length ratio within the cycle r.sub.contact∈[0.5, 0.7], where T·r.sub.contact indicates the number of 1s and T·(1−r.sub.contact) indicates the number of 0s in each row. In some implementations, the system needs to scale the length and/or shift bits of these ones and zeroes to produce feasible foot contact patterns on a real robot.

    [0044] For example, for G=BOUND, the foot-ground contact time can be shortened to 60% of the sampled value (i.e., r′.sub.contact=0.6·r.sub.contact). Similarly, the ones can be placed at the beginning of the FL and FR rows, and the ones in the RL and RR rows can be shifted by 0.5·T·r.sub.contact bits to the right. In contrast, for G=TROT, the system does not perform any scaling. Additionally or alternatively, the 1s can be shifted to form a complete foot contact pattern, where the ones are placed at the beginning of the FL and RR rows and at the end of the FR and RL rows. In some implementations, r.sub.contact is not changed for G=PACE, but the cycle length can be shortened to half of its sampled value (i.e., T′=0.5·T) to make the gait natural and/or feasible. The system can still place the ones at the beginning of the FL and RL rows and at the end of the FR and RR rows.

    [0045] For G={STAND_STILL, STAND_3LEGS}, the system performs no scaling and fills the foot contact pattern matrix with ones. Additionally, for G=STAND_3LEGS, the system can randomly sample one row and replace it with zeroes.
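Putting paragraphs [0043] through [0045] together, a random pattern generator might be sketched as below. The sampling ranges come from the text; the exact rounding and bit-placement details are illustrative assumptions:

```python
import numpy as np

def random_pattern_template(gait, rng=None):
    """Sketch of a random pattern generator (rows: FL, FR, RL, RR; 1 = contact)."""
    rng = np.random.default_rng() if rng is None else rng
    T = int(rng.integers(24, 29))   # cycle length; ~0.48-0.56 s at 50 Hz
    r = rng.uniform(0.5, 0.7)       # foot-ground contact ratio within a cycle

    if gait in ("STAND_STILL", "STAND_3LEGS"):
        template = np.ones((4, T), dtype=int)   # no scaling; all ones
        if gait == "STAND_3LEGS":
            template[rng.integers(0, 4)] = 0    # one random leg stays in the air
        return template

    if gait == "BOUND":
        r *= 0.6                    # shorten contact time to 60% of sampled value
    elif gait == "PACE":
        T //= 2                     # halve the cycle length

    ones = max(1, int(round(T * r)))
    row = np.zeros(T, dtype=int)
    row[:ones] = 1                  # ones at the beginning of the row
    template = np.zeros((4, T), dtype=int)

    if gait == "BOUND":             # front pair leads; rear pair shifted right
        shift = int(round(0.5 * T * r))
        template[0] = template[1] = row
        template[2] = template[3] = np.roll(row, shift)
    elif gait == "TROT":            # diagonal pairs out of phase
        template[0] = template[3] = row
        template[1] = template[2] = np.roll(row, T - ones)
    elif gait == "PACE":            # lateral pairs out of phase
        template[0] = template[2] = row
        template[1] = template[3] = np.roll(row, T - ones)
    return template
```

During training, templates produced this way would feed the cyclic sliding window that supplies the controller's desired contact flags.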

    [0046] In some implementations, the reward can consist of a set of weighted reward terms (e.g., 8 weighted reward terms): J=Σ.sub.i=1.sup.8 w.sub.i·r.sub.i, where the w.sub.i's are the weights and the r.sub.i's are the rewards. In some implementations, the definition of one or more of the reward terms and the value of the weights are described as follows, where the purpose of each reward term is in brackets.

    [0047] (1) A linear velocity tracking reward [task reward] of r.sub.1=exp(−4·((v.sub.x−{circumflex over (v)}.sub.x).sup.2+v.sub.y.sup.2)), where v.sub.x and {circumflex over (v)}.sub.x are the current and desired linear velocities along the robot's heading direction, and v.sub.y is the current linear velocity along the lateral direction. In some implementations, all velocities are in the base frame, and w.sub.1=1.

    [0048] (2) An angular velocity tracking reward [task reward] of r.sub.2=exp(−4·ω.sub.z.sup.2), where ω.sub.z is the current yaw angular velocity in the base frame, and w.sub.2=0.5.

    [0049] (3) A penalty on foot contact pattern violations [task reward] of r.sub.3=(1/4)·Σ.sub.i=1.sup.4|c.sub.i−{circumflex over (c)}.sub.i|, where c.sub.i, {circumflex over (c)}.sub.i∈{0,1} are the realized and desired foot-ground contact indicators for the i-th foot, and w.sub.3=1.

    [0050] (4) A regularization on action rate [sim-to-real] of r.sub.4=Σ.sub.i=1.sup.12(a.sub.t−a.sub.t−1).sub.i.sup.2, where a.sub.t and a.sub.t−1 are the controller's outputs at the current and the previous time steps, and w.sub.4=0.005.

    [0051] (5) A penalty on roll and pitch angular velocities [sim-to-real]. In some implementations, the system encourages the robot's base to be stable during motion, hence r.sub.5=ω.sub.x.sup.2+ω.sub.y.sup.2, where ω.sub.x and ω.sub.y are the current roll and pitch angular velocities in the base frame, and w.sub.5=0.05. In some of those implementations, the penalty does not apply to G=BOUND.

    [0052] (6) A penalty on linear velocity along the z-axis [sim-to-real]. In some implementations, the system encourages base stability during motion, hence r.sub.6=v.sub.z.sup.2, where v.sub.z is the current linear velocity along the z-axis in the base frame, and w.sub.6=2. In some of those implementations, the penalty does not apply to G=BOUND.

    [0053] (7) A penalty on body collision [natural motion] of r.sub.7=Σ.sub.i=1.sup.K 1{F.sub.i>0.1}, where F.sub.i is the contact force on the i-th body. In some implementations, K=8 (i.e., 4 thighs and 4 calves) and w.sub.7=1.

    [0054] (8) A penalty on deviation from the default pose [natural motion] of r.sub.8=Σ.sub.a.sub.t.sub.∈hip|a.sub.t|, where the a.sub.t's are the actions (i.e., deviations from the default joint positions) applied to the hip joints, and w.sub.8=0.03.
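Two of the reward terms above, r.sub.1 and r.sub.3, can be sketched directly from their definitions (an illustrative implementation, not the actual training code):

```python
import numpy as np

def velocity_tracking_reward(vx, vx_desired, vy):
    """r_1 = exp(-4 * ((v_x - v_x_hat)^2 + v_y^2)); equals 1.0 at perfect tracking."""
    return float(np.exp(-4.0 * ((vx - vx_desired) ** 2 + vy ** 2)))

def contact_pattern_reward(realized, desired):
    """r_3 = (1/4) * sum_i |c_i - c_hat_i| over the four feet (a penalty term)."""
    return float(np.mean(np.abs(np.asarray(realized) - np.asarray(desired))))
```

For example, perfect velocity tracking gives r.sub.1=1, and a perfect contact match gives r.sub.3=0 (no penalty), while all four feet mismatching gives r.sub.3=1.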

    [0055] In some implementations, the control policy of the locomotion controller can be trained using the following training configurations. In some implementations, PD control can be used to convert positions to torques in the system. The base values for the 2 gains are k.sub.p=20 and k.sub.d=0.5, and the control frequency is 50 Hz. In some implementations, a gait G can be randomly assigned to a robot at environment resets. Additionally or alternatively, the system can also sample the gait again every 150 steps in simulation. Of the 5 gait types described herein, some gaits are harder to learn than others. To avoid the case where the hard-to-learn gaits die out, leaving the controller to learn only the easier gaits, the system can restrict the sampling distribution such that the ratios of the gait types are approximately the same.
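One simple way to keep the gait-type ratios approximately equal during sampling (the text does not specify the exact balancing scheme, so this least-sampled-first rule is an illustrative assumption):

```python
import random

GAITS = ["BOUND", "TROT", "PACE", "STAND_STILL", "STAND_3LEGS"]

def sample_balanced_gait(counts):
    """Pick a least-sampled gait so hard-to-learn gaits do not die out.

    counts: dict mapping gait name -> times sampled so far; ties broken randomly.
    """
    low = min(counts[g] for g in GAITS)
    candidates = [g for g in GAITS if counts[g] == low]
    gait = random.choice(candidates)
    counts[gait] += 1
    return gait
```

Calling this at each environment reset (and every 150 simulation steps) keeps the counts of the five gait types within one of each other.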

    [0056] In some implementations, the system can use proximal policy optimization (PPO) as the reinforcement learning method to train the controller. In some of those implementations, PPO can be used to train an actor-critic policy, where the actor policy is the feed-forward control policy π.sub.θ described above (i.e., the network with input dimension 65, hidden layers of sizes [512, 256, 128] with ELU activations, and output dimension 12 that generates the desired positions for each motor joint).

    [0057] In some implementations, the critic policy can use a similar network architecture as the actor policy, except that the output size is 1 (instead of 12) and it can receive the base velocities in the local frame as part of its input. The system can keep the hyper-parameters the same and train for 1000 iterations. For safety reasons, the system can end an episode early if the body height of the robot is lower than 0.25 meters.

    [0058] In some implementations, during training the system can sample noise from a uniform distribution and add it to the controller's observations. The system can use PD control to convert positions to torques, and domain randomization can be applied to the 2 gains k.sub.p and k.sub.d.
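A sketch of the training-time observation noise and PD-gain domain randomization; the ±0.05 noise range and the ±10% randomization range are assumptions, since the text gives only the base gains k.sub.p=20 and k.sub.d=0.5:

```python
import numpy as np

rng = np.random.default_rng()

def noisy_observation(obs, scale=0.05):
    """Add uniform noise to the controller's observations during training.
    The +/-0.05 range is an assumption; the text only says the noise is uniform."""
    return obs + rng.uniform(-scale, scale, size=np.shape(obs))

def randomized_pd_gains(kp=20.0, kd=0.5, frac=0.1):
    """Domain-randomize the PD gains around the base values k_p=20, k_d=0.5.
    The +/-10% range is an assumption for illustration."""
    return (kp * rng.uniform(1 - frac, 1 + frac),
            kd * rng.uniform(1 - frac, 1 + frac))

def pd_torque(q_desired, q, q_dot, kp, kd):
    """PD control: convert desired joint positions to torques (run at 50 Hz)."""
    return kp * (q_desired - q) - kd * q_dot
```

Randomizing the gains and corrupting the observations in simulation exposes the policy to variation it will encounter on real hardware, which is the purpose of the sim-to-real measures described above.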

    [0059] Turning now to the figures, FIG. 1 illustrates an example mobile robot 100 in accordance with some implementations described herein. However, additional and/or alternative robots can be utilized with techniques disclosed herein, such as additional robots that vary in one or more respects from robot 100 illustrated in FIG. 1. For example, a mobile forklift robot, an unmanned aerial vehicle (UAV), a non-mobile robot, and/or a humanoid robot can be utilized instead of or in addition to robot 100, in techniques described herein.

    [0060] Robot 100 includes a base 102, a head 104, front left leg 106, back left leg 108, front right leg 110, and rear right leg 112, where the legs are provided for locomotion of the robot. The front left leg 106 has a corresponding foot 114; the back left leg 108 has a corresponding foot 116; the front right leg 110 has a corresponding foot 118; and the rear right leg 112 has a corresponding foot 120. The robot 100 may include, for example, one or more motors to move one or more legs of the robot 100 to achieve a desired direction, velocity, and/or acceleration of movement for the robot 100. Additional or alternative robots can include one or more robot arms (not depicted) with an end effector (not depicted) that takes the form of a gripper with two opposing fingers or digits.

    [0061] Robot 100 also includes one or more vision components (not depicted) that can generate vision data (e.g., images) related to shape, color, depth, and/or other features of object(s) that are in the line of sight of the vision component(s). The vision component(s) can be, for example, a monocular camera, a stereographic camera (active or passive), and/or a 3D laser scanner. A 3D laser scanner can include one or more lasers that emit light and one or more sensors that collect data related to reflections of the emitted light. The 3D laser scanner can generate vision component data that is a 3D point cloud with each of the points of the 3D point cloud defining a position of a point of a surface in 3D space. A monocular camera can include a single sensor (e.g., a charge-coupled device (CCD)), and generate, based on physical properties sensed by the sensor, images that each include a plurality of data points defining color values and/or grayscale values. For instance, the monocular camera can generate images that include red, blue, and/or green channels. Each channel can define a value for each of a plurality of pixels of the image, such as a value from 0 to 255 for each of the pixels of the image. A stereographic camera can include two or more sensors, each at a different vantage point. In some implementations, the stereographic camera generates, based on characteristics sensed by the two sensors, images that each include a plurality of data points defining depth values and color values and/or grayscale values. For example, the stereographic camera can generate images that include a depth channel and red, blue, and/or green channels.

    [0062] Robot 100 also includes one or more processors that, for example, can implement all or aspects of process 700 and/or 800 described herein. Additional description of some examples of the structure and functionality of various robots is provided herein.

    [0063] At a given time, one or more of the feet (e.g., 114, 116, 118, 120) of the robot can make contact with a surface in the environment of the robot. For example, one or more feet of the robot, at a given time, can make contact with the ground, a step, a wall, an object in the environment, one or more additional or alternative surfaces, and/or combinations thereof.

    [0064] FIG. 2A illustrates an example foot contact pattern 200 in accordance with some implementations. The example foot contact pattern 200 represents a quadruped robot (e.g., robot 100 of FIG. 1) moving with a trotting gait. In some implementations, the robot is trotting based on processing natural language text input of "trot forward slowly". For each of the legs of the robot, the foot contact pattern includes a representation of when the foot makes contact with the ground and when the foot does not make contact with the ground over time. The foot contact pattern 200 includes a portion corresponding to the front left leg 202, the front right leg 204, the rear left leg 206, and the rear right leg 208. In the illustrated example, periods of time where the corresponding leg makes contact with the surface are indicated by diagonal shading.

    [0065] Similarly, FIG. 2B illustrates another example foot contact pattern 250 of a quadruped robot (e.g., robot 100 of FIG. 1) moving with a bounding gait. In some implementations, the robot is bounding based on processing natural language text input of "good news, we are going to a picnic this weekend!". As described above with respect to FIG. 2A, for each of the legs of the robot, the foot contact pattern 250 includes a representation of when the foot makes contact with the ground and when the foot does not make contact with the ground. The foot contact pattern 250 includes a portion corresponding to the front left leg 252, the front right leg 254, the rear left leg 256, and the rear right leg 258. In the illustrated example, periods of time where the corresponding leg makes contact with the surface are indicated by diagonal shading.

    [0066] FIGS. 3A-3C illustrate example foot contact patterns generated using the LLM in accordance with some implementations, where a leg making contact with the ground (e.g., the surface) is represented by 1 and a leg not making contact with the ground is represented by 0. FIG. 3A illustrates a foot contact pattern 300 representing a bounding gait of the robot over time. FIG. 3B illustrates a foot contact pattern 330 representing a trotting gait over time. FIG. 3C illustrates a pacing foot contact pattern 360 over time.
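    The binary encoding of FIGS. 3A-3C can be illustrated as small 4-row matrices, one row per leg (FL, FR, RL, RR). The cycle length of 8 timesteps below is illustrative:

```python
import numpy as np

# Illustrative cycle length; rows are FL, FR, RL, RR as in FIGS. 3A-3C.
# Trot: diagonally opposite pairs (FL+RR, FR+RL) alternate ground contact.
trot = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],  # FL
    [0, 0, 0, 0, 1, 1, 1, 1],  # FR
    [0, 0, 0, 0, 1, 1, 1, 1],  # RL
    [1, 1, 1, 1, 0, 0, 0, 0],  # RR
])

# Bound: front pair then rear pair strike together, with a suspension
# phase (columns of all zeros) where every foot is off the ground.
bound = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0],  # FL
    [1, 1, 1, 0, 0, 0, 0, 0],  # FR
    [0, 0, 0, 0, 1, 1, 1, 0],  # RL
    [0, 0, 0, 0, 1, 1, 1, 0],  # RR
])
```

    Each column is one timestep; a column summing to zero in the bounding pattern is a suspension-phase timestep.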

    [0067] FIG. 4 illustrates example LLM prompts in accordance with various implementations. In the illustrated example 400, the LLM prompts can be divided into categories including <general instruction block> 402, <gait definition block> 404, <output format definition block> 406, <examples block> 408, one or more additional or alternative block types (not depicted), and/or combinations thereof. The <general instruction block> 402 describes the task the LLM should accomplish, such as the task of translating a natural language text command to a foot contact pattern. For example, the <general instruction block> 402 can include one or more task descriptions such as "You are a dog foot contact pattern expert"; "Your job is to give a velocity and a foot contact pattern based on the input"; "You will always give the output in the correct format no matter what the input is"; one or more additional or alternative task descriptions; and/or combinations thereof.

    [0069] The <gait definition block> 404 can include basic knowledge of one or more quadrupedal gaits. In some implementations, the gait definitions can connect the bounding gait to a general impression of emotion, which helps the LLM generalize over vague human commands (that may not explicitly specify what gaits the robot should use). For example, <gait definition block> 404 can include: 1. Trotting is a gait where two diagonally opposite legs strike the ground at the same time.; 2. Pacing is a gait where the two legs on the left/right side of the body strike the ground at the same time.; 3. Bounding is a gait where the two front/rear legs strike the ground at the same time. It has a longer suspension phase where all feet are off the ground, for example, for at least 25% of the cycle length. This gait also gives a happy feeling.; one or more additional or alternative descriptions of quadrupedal gaits; and/or combinations thereof.

    [0070] The <output format definition block> 406 can specify the format of the output. In some implementations, the desired velocities can be discretized as v.sub.x∈{−1, −0.5, 0, 0.5, 1} m/s, so the LLM can give outputs corresponding to words like fast(er) and slow(er). For example, <output format definition block> 406 can include: "The following are rules for describing the velocity and foot contact patterns:"; "1. You should first output the velocity, then the foot contact pattern."; "2. There are five velocities to choose from: [−1.0, −0.5, 0.0, 0.5, 1.0]."; "3. A pattern has 4 lines, each of which represents the foot contact pattern of a leg."; "4. Each line has a label. FL is the front left leg, FR is the front right leg, RL is the rear left leg, and RR is the rear right leg."; "5. In each line, 0 represents the foot in the air, 1 represents the foot on the ground."; one or more additional or alternative output format definitions; and/or combinations thereof.
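    An LLM reply in this format can be parsed into a velocity and a contact matrix. The reply text below is hypothetical but follows the rules listed above:

```python
import numpy as np

def parse_llm_output(text):
    # Parse a reply of the assumed format: one velocity line followed by
    # four labeled rows (FL/FR/RL/RR) of 0s and 1s.
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    velocity = float(lines[0])
    rows = {}
    for ln in lines[1:]:
        label, bits = ln.split(":")
        rows[label.strip()] = [int(b) for b in bits.strip()]
    return velocity, np.array([rows[k] for k in ("FL", "FR", "RL", "RR")])

# Hypothetical LLM reply for a slow trot.
reply = """0.5
FL: 11110000
FR: 00001111
RL: 00001111
RR: 11110000"""
v, pattern = parse_llm_output(reply)
```

    A real system would also validate the reply (row lengths, allowed velocities) before passing the pattern to the locomotion controller.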

    [0071] The <examples block> 408 can follow the general practice of instruction fine-tuning and/or can show the LLM a few concrete input-output pairs. In some implementations, the LLM can generalize and handle various commands, including those which only vaguely state what velocity or gait the robot should use, based on a small number of gait examples (e.g., 3 gait examples, 5 gait examples, 10 gait examples, etc.). For example, the <examples block> 408 can include an example corresponding to the input "trot slowly", the input "bound in place", and the input "pace backward fast".
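    Assembling the blocks of FIG. 4 into a single prompt can be sketched as plain string concatenation. The block contents below are abbreviated placeholders for the text quoted above, not the full prompt:

```python
# Abbreviated placeholder contents for the FIG. 4 prompt blocks.
GENERAL_INSTRUCTION = "You are a dog foot contact pattern expert."
GAIT_DEFINITIONS = ("1. Trotting is a gait where two diagonally opposite "
                    "legs strike the ground at the same time.")
OUTPUT_FORMAT = ("You should first output the velocity, then the foot "
                 "contact pattern.")
EXAMPLES = "Input: trot slowly\nOutput: 0.5\n..."

def build_prompt(nl_text_input):
    # Concatenate the prompt blocks, ending with the user's NL command.
    blocks = [GENERAL_INSTRUCTION, GAIT_DEFINITIONS, OUTPUT_FORMAT, EXAMPLES]
    return "\n\n".join(blocks + [f"Input: {nl_text_input}\nOutput:"])

prompt = build_prompt("bound in place")
```

    The trailing "Output:" cue asks the LLM to continue in the same format as the in-context examples.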

    [0072] FIG. 5 is a block diagram illustrating an example 500 of training a control policy of a locomotion controller in accordance with various implementations. In some implementations, the system can select a desired gait type 502. For example, the system can define and train the control policy on five gait types: G∈{BOUND, TROT, PACE, STAND_STILL, STAND_3LEGS}. The random pattern generator 504 can process the desired gait type 502 to output a foot contact pattern 506. In some implementations, the random pattern generator 504 samples a pattern length T and/or the ground contact length within the cycle for each of the feet. Additionally or alternatively, the random pattern generator 504 can perform scaling and/or phase shifting on the foot contact pattern 506 as needed. As described above, the random pattern generator 504 can sample a foot contact pattern of length T, sample a contact ratio r.sub.contact, scale T or r.sub.contact as needed, and shift the bits based on the selected gait.

    [0073] In some implementations, a control policy of the locomotion controller 508 can generate one or more joint positions 514 (e.g., one or more low-level commands for the robot) based on processing the foot contact pattern 506, a linear velocity 510, one or more instances of sensor data 512 describing the current position of the robot, one or more additional or alternative values, and/or combinations thereof. In some implementations, the linear velocity 510 can be selected by a user. Additionally or alternatively, the linear velocity can be randomly sampled by the system and/or by the random pattern generator 504. In some implementations, the system can generate a sequence of robot states and a corresponding sequence of robot actions based on processing the foot contact pattern 506 using the control policy. In some of those implementations, the system can generate a reward based on the sequence of robot states and corresponding robot actions. One or more portions of the control policy can be updated based on the reward using reinforcement learning. For example, the system can update one or more portions of the control policy based on the determined reward in accordance with process 700 described herein with respect to FIG. 7.

    [0074] FIG. 6 is a block diagram illustrating an example 600 of generating one or more robot joint positions 614 based on a foot contact pattern 606 in accordance with various implementations. In some implementations, the system can receive natural language (NL) text input 602 from a user. For example, the system can receive NL text input 602 of "Go catch that squirrel on the tree". NL text input 602 can include a text representation of an utterance spoken by the user (e.g., a text representation of a spoken utterance captured in one or more instances of audio data). Additionally or alternatively, NL text input 602 can include text captured via one or more user interface input devices (e.g., a keyboard, a touchpad, etc.). In some implementations, the NL text input 602 can be generated based on processing data not directly provided by the user. For example, the system can process one or more frames of a video to generate a text description of a desired foot contact pattern based on movement and/or audio captured in the video.

    [0075] In some implementations, the system can process the NL text input 602 using LLM 604 to generate a foot contact pattern 606. For example, the system can process an LLM prompt 400 as described herein with respect to FIG. 4 using the LLM 604.

    [0076] In some implementations, a linear velocity 608 can be provided, where the linear velocity is a target velocity for the robot to maintain while performing the locomotion task. In some implementations, the user can provide a linear velocity 608 (not depicted). In some other implementations, the LLM 604 can generate the linear velocity 608 in addition to the foot contact pattern 606 based on processing the NL text input 602.

    [0077] In some implementations, a locomotion controller 610 (e.g., a control policy of the locomotion controller) can generate one or more joint positions 614 which can be used to control the robot based on processing the foot contact pattern 606, the linear velocity 608, and/or one or more instances of sensor data 612. For example, the system can generate the one or more joint positions 614 in accordance with process 800 of FIG. 8 described herein.
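    The FIG. 6 pipeline can be sketched end to end. The llm, controller, and robot objects below are stand-ins (the actual LLM call and learned controller are not specified in the text):

```python
def control_step(nl_text, llm, controller, robot):
    # NL text -> LLM -> (velocity, foot contact pattern) -> controller -> joints.
    velocity, pattern = llm(nl_text)
    sensors = robot.read_sensors()           # current robot state (sensor data 612)
    joints = controller(pattern, velocity, sensors)
    robot.apply(joints)                      # low-level commands to the robot
    return joints

class FakeRobot:
    # Stand-in robot interface for illustration.
    def __init__(self):
        self.last_command = None
    def read_sensors(self):
        return [0.0] * 12
    def apply(self, joints):
        self.last_command = joints

def fake_llm(text):
    # Stand-in for LLM 604: returns a velocity and a 4-row contact pattern.
    return 0.5, [[1, 0], [0, 1], [0, 1], [1, 0]]

def fake_controller(pattern, velocity, sensors):
    # Stand-in for locomotion controller 610 (a learned policy in practice).
    return [velocity * sum(row) for row in pattern]

robot = FakeRobot()
joints = control_step("trot forward slowly", fake_llm, fake_controller, robot)
```

    In a deployed system this loop would run at the control frequency, re-querying only the controller each step while the pattern from the LLM persists until a new command arrives.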

    [0078] FIG. 7 is a flowchart illustrating an example process 700 of training a control policy of a locomotion controller in accordance with various implementations described herein. For convenience, the operations of the process 700 are described with reference to a system that performs the operations. This system can include one or more components of a robot, such as a robot processor and/or robot control system of robot 100, robot 920, and/or other robot and/or can include one or more components of a computer system, such as computer system 1010. Moreover, while operations of process 700 are shown in a particular order, this is not meant to be limiting. One or more operations can be reordered, omitted and/or added.

    [0079] At block 702, the system generates a training foot contact pattern using a random pattern generator. In some implementations, the system can generate the training foot contact pattern based on a training gait. In some implementations, the random pattern generator can generate a foot contact pattern given a specific gait G. As an illustrative example, the system can generate a foot contact pattern for G=PACING. In general, the system can sample T∈[24, 28]. When the control frequency is 50 Hz, the corresponding cycle length is 0.48–0.56 seconds. The system can sample a foot-ground contact length ratio within the cycle r.sub.contact∈[0.5, 0.7], where T·r.sub.contact can indicate the number of 1s and T·(1−r.sub.contact) can indicate the number of 0s in each row. In some implementations, the system needs to scale the length and/or shift bits of these ones and zeroes to produce feasible foot contact patterns on a real robot.

    [0080] For example, for G=BOUND, the foot-ground contact time can be shortened to 60% of the sampled value (i.e., r′.sub.contact=0.6·r.sub.contact). Similarly, the ones can be placed at the beginning of the FL and FR rows, and the ones in the RL and RR rows can be shifted by 0.5·T·r.sub.contact bits to the right. In contrast, for G=TROT, the system does not perform any scaling. Additionally or alternatively, the 1s can be shifted to form a complete foot contact pattern, where the ones are placed at the beginning of the FL and RR rows and at the end of the FR and RL rows. In some implementations, r.sub.contact is not changed for G=PACE, but the cycle length can be shortened to half of its sampled value (i.e., T′=0.5·T) to make the gait natural and/or feasible. The system can still place the ones at the beginning of the FL and RL rows and at the end of the FR and RR rows. For G∈{STAND_STILL, STAND_3LEGS}, the system performs no scaling and fills the foot contact pattern matrix with ones. Additionally, for G=STAND_3LEGS, the system can randomly sample one row and replace it with zeroes.
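    The sampling and shifting rules of paragraphs [0079] and [0080] can be sketched as follows. The rounding behavior, and whether the bounding-gait shift uses the scaled or unscaled contact ratio, are assumptions where the text is ambiguous:

```python
import numpy as np

def random_pattern(gait, rng):
    # Sample a cycle length T in [24, 28] and a contact ratio in [0.5, 0.7],
    # then scale and shift per gait. Rounding details are illustrative.
    T = int(rng.integers(24, 29))
    r = rng.uniform(0.5, 0.7)
    if gait == "BOUND":
        r = 0.6 * r                   # shorten contact time to 60%
    if gait == "PACE":
        T = T // 2                    # halve the cycle length
    n_ones = int(round(T * r))
    front = np.zeros(T, dtype=int)
    front[:n_ones] = 1                # ones at the beginning of a row
    back = np.zeros(T, dtype=int)
    back[T - n_ones:] = 1             # ones at the end of a row
    if gait == "BOUND":
        # Rear rows shifted right by 0.5*T*r_contact bits (using the scaled
        # ratio here is an assumption).
        rear = np.roll(front, int(round(0.5 * T * r)))
        return np.stack([front, front, rear, rear])
    if gait == "TROT":                # ones lead in FL and RR, trail in FR and RL
        return np.stack([front, back, back, front])
    if gait == "PACE":                # ones lead in FL and RL, trail in FR and RR
        return np.stack([front, back, front, back])
    pattern = np.ones((4, T), dtype=int)   # STAND_STILL: all feet down
    if gait == "STAND_3LEGS":
        pattern[rng.integers(4)] = 0       # lift one randomly chosen leg
    return pattern

rng = np.random.default_rng(0)
p = random_pattern("TROT", rng)
```

    Row order is FL, FR, RL, RR throughout, matching the labels defined in the output format block.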

    [0081] At block 704, the system processes the training foot contact pattern using the control policy of the locomotion controller to generate a sequence of robot actions and a corresponding sequence of robot states. In some implementations, the system processes a linear velocity and/or one or more instances of sensor data indicating the position of the robot with the training foot contact pattern using the control policy of the locomotion controller to generate the sequence of robot actions and corresponding sequence of robot states.

    [0082] At block 706, the system determines a reward based on the generated sequence of robot actions and the corresponding sequence of robot states and/or the training foot contact pattern.

    [0083] At block 708, the system updates one or more portions of the control policy of the locomotion controller based on the reward.

    [0084] At block 710, the system determines whether to process any additional training foot contact patterns. In some implementations, the system can determine whether to process any additional training foot contact patterns based on whether there are any remaining unprocessed training foot contact patterns, whether a threshold number of foot contact patterns have been processed, whether a threshold duration of time has been spent training the control policy, based on one or more additional or alternative metrics, and/or combinations thereof. If the system determines to process one or more additional foot contact patterns, the system generates an additional training foot contact pattern at block 712 before proceeding back to blocks 704, 706, and 708. If not, the process ends.

    [0085] At block 712, the system generates an additional training foot contact pattern using the random pattern generator. In some implementations, the additional training foot contact pattern can be based on the training gait of the robot or an additional training gait of the robot.
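    The loop of blocks 702-712 can be sketched with stand-in callables for the pattern generator, rollout, reward, and update (all names below are hypothetical; a real implementation would roll out the policy in simulation and apply a reinforcement-learning update such as a policy gradient):

```python
def train_control_policy(policy, pattern_generator, reward_fn, update_fn, n_iters):
    # Blocks 702/712: generate a training pattern; block 704: roll out the
    # policy; block 706: compute a reward; block 708: update the policy;
    # block 710: repeat until the iteration budget is exhausted.
    for _ in range(n_iters):
        pattern = pattern_generator()
        states, actions = policy.rollout(pattern)
        reward = reward_fn(states, actions, pattern)
        update_fn(policy, reward)
    return policy

class StubPolicy:
    # Stand-in for a learned control policy.
    def __init__(self):
        self.updates = 0
    def rollout(self, pattern):
        # A real rollout would simulate the robot following the pattern.
        return [0] * len(pattern), [0] * len(pattern)

policy = train_control_policy(
    StubPolicy(),
    lambda: [1, 0, 1, 0],                       # stand-in pattern generator
    lambda s, a, p: float(len(s)),              # stand-in reward
    lambda pol, r: setattr(pol, "updates", pol.updates + 1),  # stand-in update
    n_iters=5,
)
```

    The termination test at block 710 maps to the fixed iteration budget here; a threshold on elapsed training time or remaining patterns would work the same way.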

    [0086] FIG. 8 is a flowchart illustrating an example process 800 of controlling a robot in accordance with various implementations described herein. For convenience, the operations of the process 800 are described with reference to a system that performs the operations. This system can include one or more components of a robot, such as a robot processor and/or robot control system of robot 100, robot 920, and/or other robot and/or can include one or more components of a computer system, such as computer system 1010. Moreover, while operations of process 800 are shown in a particular order, this is not meant to be limiting. One or more operations can be reordered, omitted and/or added.

    [0087] At block 802, the system receives an instance of natural language (NL) text input indicating a task for a robot to perform in the environment. For example, a user can provide NL text input of "Good news, we are going to a picnic!"; "bound forward slowly"; "raise your right front leg"; "act as if the ground is very hot"; one or more additional or alternative instances of NL text input; and/or combinations thereof.

    [0088] At block 804, the system processes the instance of NL text input using an LLM to generate a foot contact pattern. In some implementations, the foot contact pattern indicates a sequence of leg positions, relative to the surface, of the plurality of legs of the robot. In some of those implementations, one or more legs of the robot are in contact with the surface. In some implementations, the prompt provided to the LLM can include one or more general instructions, one or more gait definitions, one or more output format definitions, one or more foot contact pattern examples, one or more additional or alternative instances of text description, and/or combinations thereof. An example LLM prompt is described herein with respect to FIG. 4.

    [0089] At block 806, the system generates control output based on processing the foot contact pattern using a locomotion controller of the robot. In some implementations, the system can process the foot contact pattern, one or more instances of sensor data identifying the current pose of the robot, and/or a linear velocity (e.g., a user provided linear velocity) using a control policy of the locomotion controller to generate control output. In some implementations, the control output can include one or more low-level commands to control the joint positions and/or motor angles of the robot.

    [0090] At block 808, the system causes the robot to perform one or more actions based on the control output. For instance, the system can cause the robot to perform a locomotion task in accordance with the NL text input provided at block 802.

    [0091] FIG. 9 schematically depicts an example architecture of a robot 920. The robot 920 includes a robot control system 960, one or more operational components 940a-940n, and one or more sensors 942a-942m. The sensors 942a-942m may include, for example, vision sensors, light sensors, pressure sensors, pressure wave sensors (e.g., microphones), proximity sensors, accelerometers, gyroscopes, thermometers, barometers, and so forth. While sensors 942a-m are depicted as being integral with robot 920, this is not meant to be limiting. In some implementations, sensors 942a-m may be located external to robot 920, e.g., as standalone units.

    [0092] Operational components 940a-940n may include, for example, one or more end effectors and/or one or more servo motors or other actuators to effectuate movement of one or more components of the robot. For example, the robot 920 may have multiple degrees of freedom and each of the actuators may control the actuation of the robot 920 within one or more of the degrees of freedom responsive to the control commands. As used herein, the term actuator encompasses a mechanical or electrical device that creates motion (e.g., a motor), in addition to any driver(s) that may be associated with the actuator and that translate received control commands into one or more signals for driving the actuator. Accordingly, providing a control command to an actuator may comprise providing the control command to a driver that translates the control command into appropriate signals for driving an electrical or mechanical device to create desired motion.

    [0093] The robot control system 960 may be implemented in one or more processors, such as a CPU, GPU, and/or other controller(s) of the robot 920. In some implementations, the robot 920 may comprise a brain box that may include all or aspects of the control system 960. For example, the brain box may provide real time bursts of data to the operational components 940a-n, with each of the real time bursts comprising a set of one or more control commands that dictate, inter alia, the parameters of motion (if any) for each of one or more of the operational components 940a-n. In some implementations, the robot control system 960 may perform one or more aspects of method(s) described herein, such as process 700 of FIG. 7 and/or process 800 of FIG. 8.

    [0094] As described herein, in some implementations all or aspects of the control commands generated by control system 960, in controlling a robot during performance of a robotic task, can be generated based on robotic skill(s) determined to be relevant for the robotic task and, optionally, based on determined map location(s) for environmental object(s). Although control system 960 is illustrated in FIG. 9 as an integral part of the robot 920, in some implementations, all or aspects of the control system 960 may be implemented in a component that is separate from, but in communication with, robot 920. For example, all or aspects of control system 960 may be implemented on one or more computing devices that are in wired and/or wireless communication with the robot 920, such as computing device 1010.

    [0095] FIG. 10 is a block diagram of an example computing device 1010 that may optionally be utilized to perform one or more aspects of techniques described herein. Computing device 1010 typically includes at least one processor 1014 which communicates with a number of peripheral devices via bus subsystem 1012. These peripheral devices may include a storage subsystem 1024, including, for example, a memory subsystem 1025 and a file storage subsystem 1026, user interface output devices 1020, user interface input devices 1022, and a network interface subsystem 1016. The input and output devices allow user interaction with computing device 1010. Network interface subsystem 1016 provides an interface to outside networks and is coupled to corresponding interface devices in other computing devices.

    [0096] User interface input devices 1022 may include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touch screen incorporated into the display, audio input devices such as voice recognition systems, microphones, and/or other types of input devices. In general, use of the term input device is intended to include all possible types of devices and ways to input information into computing device 1010 or onto a communication network.

    [0097] User interface output devices 1020 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem may include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem may also provide non-visual display such as via audio output devices. In general, use of the term output device is intended to include all possible types of devices and ways to output information from computing device 1010 to the user or to another machine or computing device.

    [0098] Storage subsystem 1024 stores programming and data constructs that provide the functionality of some or all of the modules described herein. For example, the storage subsystem 1024 may include the logic to perform selected aspects of the process 700 of FIG. 7 and/or process 800 of FIG. 8.

    [0099] These software modules are generally executed by processor 1014 alone or in combination with other processors. Memory 1025 used in the storage subsystem 1024 can include a number of memories including a main random access memory (RAM) 1030 for storage of instructions and data during program execution and a read only memory (ROM) 1032 in which fixed instructions are stored. A file storage subsystem 1026 can provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations may be stored by file storage subsystem 1026 in the storage subsystem 1024, or in other machines accessible by the processor(s) 1014.

    [0100] Bus subsystem 1012 provides a mechanism for letting the various components and subsystems of computing device 1010 communicate with each other as intended. Although bus subsystem 1012 is shown schematically as a single bus, alternative implementations of the bus subsystem may use multiple buses.

    [0101] Computing device 1010 can be of varying types including a workstation, server, computing cluster, blade server, server farm, or any other data processing system or computing device. Due to the ever-changing nature of computers and networks, the description of computing device 1010 depicted in FIG. 10 is intended only as a specific example for purposes of illustrating some implementations. Many other configurations of computing device 1010 are possible having more or fewer components than the computing device depicted in FIG. 10.

    [0102] In some implementations, a method implemented by one or more processors is provided, the method including receiving an instance of natural language (NL) text input, wherein the instance of NL text input indicates a task for a robot to perform in an environment, wherein the robot has a plurality of legs, and wherein the robot is on a surface in the environment. In some implementations, the method includes processing the instance of NL text input using a large language model (LLM) to generate a foot contact pattern, wherein the foot contact pattern indicates a sequence of leg positions relative to the surface, of the plurality of legs of the robot, and wherein one or more of the legs of the robot are in contact with the surface. In some implementations, the method includes generating control output by processing the foot contact pattern using a locomotion controller of the robot. In some implementations, the method includes causing the robot to perform one or more actions based on the control output.

    [0103] These and other implementations of the technology can include one or more of the following features.

    [0104] In some implementations, the foot contact pattern indicates whether one or more of the legs of the robot are in contact with the surface and wherein the foot contact pattern further indicates whether one or more legs of the robot are not in contact with the surface. For example, a first set of the leg positions of the sequence can indicate contact with the surface, and a second set of leg positions of the sequence, that is distinct from the first set, can indicate non-contact with the surface.

    [0105] In some implementations, the foot contact pattern indicates whether the one or more legs of the robot are in contact with the surface, wherein the foot contact pattern further indicates whether one or more of the legs of the robot not in contact with the surface are a first distance from the surface, and wherein the foot contact pattern further indicates whether one or more of the legs of the robot not in contact with the surface are a second distance from the surface. For example, a first set of leg positions of the sequence can indicate contact with the surface; a second set of leg positions of the sequence, that is distinct from the first set, can indicate non-contact with the surface at a first distance from the surface; and a third set of leg positions of the sequence, that is distinct from the first set and the second set, can indicate non-contact with the surface at a second distance from the surface, where the second distance from the surface is distinct from the first distance.

    [0106] In some implementations, the task for the robot to perform in the environment, indicated by the instance of NL text input, is a locomotion task with a target gait of the robot. In some versions of those implementations, the target gait of the robot is a cyclic motion pattern that produces locomotion through a sequence of contacts with the surface. In some versions of those implementations, the target gait of the robot includes a bounding gait, a trotting gait, a pacing gait, standing still, and/or standing on three legs.

    [0107] In some implementations, the robot is a quadruped robot, and the plurality of legs of the robot includes a front left leg, a front right leg, a rear left leg, and a rear right leg.

    [0108] In some implementations, processing the instance of NL text input using the LLM to generate the foot contact pattern includes generating a prompt for the LLM based on the instance of NL text input. In some implementations, the method further includes generating the foot contact pattern based on processing the LLM prompt using the LLM. In some versions of those implementations, the prompt for the LLM, that is based on the instance of NL text input, includes one or more general instructions for the LLM, wherein the one or more general instructions for the LLM include instructions to translate the NL text input into the foot contact pattern, and wherein the one or more general instructions are in addition to any of the NL text input. In some versions of those implementations, the prompt for the LLM, that is based on the instance of NL text input, further includes one or more gait definitions, wherein each gait, of the one or more gait definitions, includes a NL text description of the gait, and wherein the NL text description of each gait is in addition to any of the NL text input.

    [0109] In some versions of those implementations, the NL text description of one or more of the gaits, of the one or more gait definitions, includes NL text indicating an emotion corresponding to the gait. In some versions of those implementations, the prompt for the LLM, that is based on the instance of NL text input, further includes one or more foot contact pattern output instructions, wherein the one or more foot contact pattern output instructions include NL text description of how to format the foot contact pattern, and wherein the NL text description of how to format the foot contact pattern is in addition to any of the NL text input.

    [0110] In some versions of those implementations, the prompt for the LLM, that is based on the instance of NL text input, further includes one or more example foot contact patterns.
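    Paragraphs [0108]-[0110] describe a prompt assembled from general instructions, NL gait definitions (optionally tagged with an emotion), output-format instructions, and example foot contact patterns, all in addition to the NL text input itself. A minimal sketch of such a prompt builder follows; every instruction string, gait description, and format choice below is an assumption for illustration, not the actual prompt of this disclosure:

```python
# Hypothetical prompt components; all wording is assumed for illustration.
GENERAL_INSTRUCTIONS = (
    "Translate the user's request into a foot contact pattern for a "
    "quadruped robot with legs FL, FR, RL, RR."
)

# NL gait definitions, each optionally tagged with an emotion.
GAIT_DEFINITIONS = {
    "trot": "Diagonal leg pairs touch the ground together; a neutral gait.",
    "bound": "Front legs then rear legs strike together; conveys excitement.",
    "stand": "All four feet stay on the ground; conveys calm.",
}

OUTPUT_INSTRUCTIONS = (
    "Output the pattern as four comma-separated rows of 0s and 1s, one row "
    "per leg, where 1 means the foot contacts the surface."
)

# Example foot contact patterns included in the prompt.
EXAMPLE_PATTERNS = [("trot", "1010,0101,0101,1010")]

def build_prompt(nl_text_input: str) -> str:
    """Assemble the LLM prompt from the components described above."""
    lines = [GENERAL_INSTRUCTIONS, "", "Gait definitions:"]
    for name, description in GAIT_DEFINITIONS.items():
        lines.append(f"- {name}: {description}")
    lines += ["", OUTPUT_INSTRUCTIONS, "", "Examples:"]
    for name, pattern in EXAMPLE_PATTERNS:
        lines.append(f"{name} -> {pattern}")
    lines += ["", f"User request: {nl_text_input}"]
    return "\n".join(lines)
```

    For example, `build_prompt("walk over to me happily")` yields a single prompt string containing the general instructions, the gait definitions, the format instructions, the example pattern, and the user's request.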

    [0111] In some implementations, the NL text input, or additional input, includes a user defined velocity for the robot in performance of the task in the environment. In some versions of those implementations, generating the control output by processing the foot contact pattern using the locomotion controller of the robot includes identifying a current state of one or more components of the robot. In some implementations, the method further includes generating the control output by processing, using the locomotion controller, (1) the current state of the one or more components of the robot, (2) the user defined velocity for the robot, and (3) the foot contact pattern.
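    The three inputs named in paragraph [0111] — (1) the current state of one or more robot components, (2) the user-defined velocity, and (3) the foot contact pattern — can be sketched as a single controller interface. The stub below stands in for a trained locomotion controller; the state layout, action dimensionality, and the random-weight "policy" are all assumptions made only to show the data flow:

```python
import random

# Hypothetical locomotion-controller interface; the state layout, action
# dimensionality, and the stub policy weights are all assumptions.
class LocomotionController:
    def __init__(self, state_dim: int, pattern_len: int, action_dim: int):
        # Inputs: component state + scalar user velocity + flattened pattern.
        n_inputs = state_dim + 1 + pattern_len
        # Random weights standing in for a trained control policy.
        self.weights = [
            [random.uniform(-0.1, 0.1) for _ in range(n_inputs)]
            for _ in range(action_dim)
        ]

    def control_output(self, state, velocity, contact_pattern):
        """Map (current component state, user-defined velocity, foot contact
        pattern) to low-level control output, e.g. per-joint targets."""
        flat_pattern = [bit for row in contact_pattern for bit in row]
        inputs = list(state) + [velocity] + flat_pattern
        return [sum(w * x for w, x in zip(row, inputs))
                for row in self.weights]
```

    A real controller would replace the linear stub with a trained policy network, but the interface — state, velocity, and pattern in; control output out — mirrors the processing described above.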

    [0112] In some implementations, a control policy of the locomotion controller is trained using one or more training foot contact patterns, where each of the training foot contact patterns is generated using a random pattern generator based on a training gait. In some versions of those implementations, training the control policy of the locomotion controller using a given training foot contact pattern, of the one or more training foot contact patterns, includes processing the given training foot contact pattern using the control policy of the locomotion controller to generate a sequence of robot actions and corresponding robot states. In some implementations, the method further includes determining a reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern. In some implementations, the method further includes updating one or more portions of the control policy of the locomotion controller based on the reward. In some versions of those implementations, determining the reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern includes generating the reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern to maximize an expected reward. In some versions of those implementations, the LLM is fine-tuned based on the generated reward. In some versions of those implementations, the LLM is fine-tuned prior to receiving the instance of NL text input, wherein fine-tuning the LLM is based on a previously generated reward, and wherein the previously generated reward is generated based on processing a prior sequence of robot actions and prior corresponding robot states and/or a prior training foot contact pattern.
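    The training loop described in paragraph [0112] — roll out the policy on a training foot contact pattern, score the resulting action/state sequence with a reward, update the policy — can be sketched in toy form. Everything below is a deliberate simplification: a one-parameter "policy", trivial dynamics, a reward that favors reaching a target distance, and a finite-difference update standing in for a reinforcement-learning gradient step:

```python
# Toy sketch of the training loop; the rollout dynamics, reward, and
# finite-difference update are all simplified assumptions.

# Assumed trot-like training foot contact pattern (one row per leg).
TRAIN_PATTERN = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0]]

def rollout(policy_weight, pattern, horizon=8):
    """Generate a sequence of (state, action) pairs from the policy."""
    state, trajectory = 0.0, []
    for t in range(horizon):
        contacts = sum(row[t % len(row)] for row in pattern)
        action = policy_weight * contacts   # policy output
        state = state + action              # trivial stand-in dynamics
        trajectory.append((state, action))
    return trajectory

def reward(trajectory, target_speed=2.0):
    """Score a rollout: higher when the final state nears a target distance."""
    final_state = trajectory[-1][0]
    return -abs(final_state - target_speed * len(trajectory))

def train(pattern, steps=200, lr=0.01):
    """Update the policy parameter toward higher reward."""
    w = 0.0
    for _ in range(steps):
        base = reward(rollout(w, pattern))
        bumped = reward(rollout(w + 1e-3, pattern))
        w += lr * (bumped - base) / 1e-3    # finite-difference gradient
    return w
```

    A real implementation would use a reinforcement-learning algorithm over a physics simulation rather than this toy gradient estimate, but the loop structure — pattern in, rollout, reward, update — matches the description above.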

    [0113] In some implementations, the instance of NL text input is a text representation of a spoken utterance captured in an instance of audio data and/or the NL text input is an instance of text provided by a user via a keyboard.

    [0114] In some implementations, a method implemented by one or more processors is provided, the method including training a locomotion controller to generate control output for controlling a robot with a plurality of legs on a surface in an environment. In some implementations, the control output is generated based on processing a foot contact pattern using the locomotion controller. In some implementations, the foot contact pattern indicates a sequence of leg positions relative to the surface, of the plurality of legs of the robot, where one or more of the legs of the robot are in contact with the surface. In some implementations, training a control policy of the locomotion controller includes selecting a training foot contact pattern generated, using a random pattern generator, based on a training gait. In some implementations, the method includes processing the training foot contact pattern using the control policy of the locomotion controller to generate a sequence of robot actions and corresponding robot states for use in controlling locomotion of the robot. In some implementations, the method includes determining a reward based on processing the sequence of robot actions and corresponding robot states and/or the training foot contact pattern. In some implementations, the method includes updating one or more portions of the control policy of the locomotion controller based on the reward.
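    The random pattern generator referenced in paragraph [0114] can be sketched as sampling variations of a base training gait. In the sketch below, each base gait is an assumed set of per-leg phase offsets, and randomness enters through a global phase shift and a sampled duty factor (the fraction of each cycle a foot spends in stance); all of these parameterization choices are assumptions for illustration:

```python
import random

# Hypothetical random pattern generator; base-gait phase offsets and the
# randomized parameters (phase shift, duty factor) are assumptions.
BASE_GAITS = {
    # Phase offsets (fraction of cycle) for FL, FR, RL, RR.
    "trot":  [0.0, 0.5, 0.5, 0.0],
    "pace":  [0.0, 0.5, 0.0, 0.5],
    "bound": [0.0, 0.0, 0.5, 0.5],
}

def random_foot_contact_pattern(gait, period=10, horizon=20, rng=random):
    """Sample a training pattern: the base gait's per-leg phase offsets plus
    a random global phase shift and a random stance (duty) fraction."""
    shift = rng.random()             # random global phase shift
    duty = rng.uniform(0.5, 0.8)     # fraction of each cycle in stance
    pattern = []
    for leg_phase in BASE_GAITS[gait]:
        row = []
        for t in range(horizon):
            phase = ((t / period) + leg_phase + shift) % 1.0
            row.append(1 if phase < duty else 0)
        pattern.append(row)
    return pattern
```

    Each call yields a different binary contact matrix for the same training gait, which is the property the training method relies on for varied training foot contact patterns.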

    [0115] These and other implementations of the technology can include one or more of the following features.

    [0116] In some implementations, the method further includes transmitting the control policy of the locomotion controller for use in controlling a given robot.

    [0117] Other implementations can include a non-transitory computer readable storage medium storing instructions executable by one or more processor(s) (e.g., a central processing unit(s) (CPU(s)), graphics processing unit(s) (GPU(s)), and/or tensor processing unit(s) (TPU(s))) to perform a method such as one or more of the methods described herein. Yet other implementations can include a system of one or more computers and/or one or more robots that include one or more processors operable to execute stored instructions to perform a method such as one or more of the methods described herein.