Method for controlling a robotic device
12447610 · 2025-10-21
Assignee
Inventors
- Philipp Christian Schillinger (Renningen, DE)
- Akshay Dhonthi Ramesh Babu (Ingolstadt, DE)
- Leonel Rozo (Boeblingen, DE)
CPC classification
B25J9/1661
PERFORMING OPERATIONS; TRANSPORTING
B25J9/163
B25J9/1653
B25J13/08
International classification
B25J9/00
PERFORMING OPERATIONS; TRANSPORTING
B25J13/08
Abstract
A method of controlling a robotic device. The method includes generating a robot control model for performing a task, wherein the robot control model comprises parameters which influence the performance of the task, adjusting the parameters of the robot control model by optimizing a target function which evaluates the adherence to at least one condition with respect to the temporal progression of at least one continuous sensor signal when performing the task, and controlling the robotic device according to the robot control model in order to perform the task using the adjusted parameters.
Claims
1. A method for controlling a robotic device, the method comprising the following steps: generating a robot control model for performing a task, wherein the robot control model includes parameters which influence the performance of the task; adjusting the parameters of the robot control model by optimizing a target function which evaluates the adherence to at least one condition with respect to a temporal progression of at least one continuous sensor signal when performing the task; representing the at least one condition according to temporal signal logic in at least one temporal signal logic formula; converting the at least one temporal signal logic formula into at least one measure of robustness; evaluating the target function by determining a value of the at least one measure of robustness for performing the task; and controlling the robotic device according to the robot control model to perform the task using the adjusted parameters.
2. The method according to claim 1, wherein the parameters of the robot control model include time-related parameters and location-related parameters.
3. The method according to claim 1, wherein the robot control model is a hidden semi-Markov model (HSMM).
4. The method according to claim 1, wherein the at least one continuous sensor signal indicates a location of a portion of the robotic device and/or a force acting on a portion of the robotic device.
5. A robot control device configured to control a robotic device, the robot control device configured to: generate a robot control model for performing a task, wherein the robot control model includes parameters which influence the performance of the task; adjust the parameters of the robot control model by optimizing a target function which evaluates the adherence to at least one condition with respect to a temporal progression of at least one continuous sensor signal when performing the task; represent the at least one condition according to temporal signal logic in at least one temporal signal logic formula; convert the at least one temporal signal logic formula into at least one measure of robustness; evaluate the target function by determining a value of the at least one measure of robustness for performing the task; and control the robotic device according to the robot control model to perform the task using the adjusted parameters.
6. A non-transitory computer-readable medium on which is stored a computer program for controlling a robotic device, the computer program, when executed by a processor, causing the processor to perform the following steps: generating a robot control model for performing a task, wherein the robot control model includes parameters which influence the performance of the task; adjusting the parameters of the robot control model by optimizing a target function which evaluates the adherence to at least one condition with respect to a temporal progression of at least one continuous sensor signal when performing the task; representing the at least one condition according to temporal signal logic in at least one temporal signal logic formula; converting the at least one temporal signal logic formula into at least one measure of robustness; evaluating the target function by determining a value of the at least one measure of robustness for performing the task; and controlling the robotic device according to the robot control model to perform the task using the adjusted parameters.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) FIG. 1 shows a robot.
(2) FIG. 2 illustrates the adjustment of the parameters of a robot control model by black-box optimization.
(3) FIG. 3 shows a flow diagram illustrating a method for controlling a robotic device according to one embodiment.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
(4) The following detailed description relates to the figures, which show, for clarification, specific details and aspects of this disclosure by way of which the present invention can be implemented. Other aspects can be used, and structural, logical, and electrical changes can be made without departing from the scope of protection of the present invention. The various aspects of this disclosure are not necessarily mutually exclusive since some aspects of this disclosure can be combined with one or more other aspects of this disclosure in order to form new aspects.
(5) Various examples are described in more detail below.
(6) FIG. 1 shows a robot 100.
(7) The robot 100 comprises a robotic arm 101, e.g., an industrial robotic arm used for handling or assembling a workpiece (or one or more other objects). The robotic arm 101 comprises manipulators 102, 103, 104 and a base (or support) 105, by means of which the manipulators 102, 103, 104 are supported. The term manipulator refers to the movable elements of the robotic arm 101, the actuation of which enables physical interaction with the environment, e.g., in order to perform a task. For the purpose of control, the robot 100 comprises a (robot) control device 106 configured to implement the interaction with the environment according to a control program. The last element 104 (farthest from the support 105) of the manipulators 102, 103, 104 is also referred to as the end effector 104 and can comprise one or more tools, e.g., a welding torch, a gripping instrument, a paint tool, or the like.
(8) The other manipulators 102, 103 (which are closer to the base 105) can form a positioning device, so that the robotic arm 101 is provided with the end effector 104 at its end. The robotic arm 101 is a mechanical arm that can provide functions similar to those of a human arm (possibly with a tool at its end).
(9) The robotic arm 101 can comprise joint elements 107, 108, 109 connecting the manipulators 102, 103, 104 to one another and to the base 105. A joint element 107, 108, 109 can comprise one or more joints that may each provide rotary movement (i.e., rotational movement) and/or translational movement (i.e., displacement) for associated manipulators relative to one another. The movement of the manipulators 102, 103, 104 can be initiated by means of actuators controlled by the control device 106.
(10) The term actuator can be understood to mean a component that is designed to influence a mechanism or process in response to the component being driven. The actuator can convert instructions output by the control device 106 (referred to as activation) into mechanical movements. The actuator, e.g. an electromechanical converter, can be designed to convert, in response to its activation, electrical energy into mechanical energy.
(11) The term control device can be understood to mean any type of logic implemented by an entity including, e.g., a circuit and/or a processor capable of executing software that is stored in a storage medium, firmware, or a combination of both, and which can issue instructions, e.g., to an actuator in the present example. For example, the control device can be configured by means of a program code (e.g., software) in order to control the operation of a robot.
(12) In the present example, the control device 106 comprises one or more processors 110 and a memory 111 that stores code and data, based on which the processor 110 controls the robotic arm 101. According to various embodiments, the control device 106 controls the robotic arm 101 on the basis of a machine learning model 112 stored in the memory 111. For example, the robot 100 is to pick up an object 113. For example, the end effector 104 is a gripper and is to pick up the object 113, but the end effector 104 can also be configured for example to apply suction to the object 113 in order to pick it up.
(13) According to various embodiments of the present invention, learning from demonstrations (LfD) is used to teach the robot 100 to perform a task. Human demonstrations can be encoded by the machine learning model 112 (in this case, a probabilistic or statistical model) representing the nominal plan of the task for the robot. The control device 106 can subsequently use the statistical model 112, which is also referred to as a robot trajectory model, to generate desired robotic movements.
(14) The basic idea of LfD is to fit a prescribed movement skill model, e.g., a GMM, to a set of demonstrations. $M$ demonstrations are provided, each containing $T_m$ data points, for a data set of $N = \sum_m T_m$ total observations $\{\xi_t\}_{t=1}^{N}$ with $\xi_t \in \mathbb{R}^d$. It is also assumed that the same demonstrations are recorded from the perspective of $P$ different coordinate systems (given by the task parameters, e.g., local coordinate systems or frames of reference of objects of interest). One conventional way of obtaining such data consists of transforming the demonstrations from a static global frame of reference into frame $p$ by $\xi_t^{(p)} = (A^{(p)})^{-1}(\xi_t - b^{(p)})$, where $\{(b^{(p)}, A^{(p)})\}_{p=1}^{P}$ denote the translation and rotation of frame $p$ relative to the global frame. A task-parameterized GMM (TP-GMM) then describes the demonstrations by the parameters $\{\pi_k, \{(\mu_k^{(p)}, \Sigma_k^{(p)})\}_{p=1}^{P}\}_{k=1}^{K}$, where $K$ is the number of Gaussian components, $\pi_k$ is the mixing coefficient of component $k$, and $(\mu_k^{(p)}, \Sigma_k^{(p)})$ are the mean and covariance of component $k$ in frame $p$.
(15) In contrast to a standard GMM, the mixture model above cannot be learned independently for each frame of reference. In fact, the mixing coefficients $\pi_k$ are shared by all frames of reference, and the $k$-th component in frame $p$ must map onto the corresponding $k$-th component in the global frame. Expectation maximization (EM) is an established method for learning such models.
(16) Once learned, the TP-GMM can be used during execution to reproduce a trajectory for the learned movement skill. This includes controlling the robot so that, starting from an initial configuration, it reaches a target configuration (e.g., its end effector 104 moves from an initial pose to an end pose). To this end, the (time-dependent) acceleration of the joint elements 107, 108, 109 is calculated. Given the observed frames of reference $\{(b^{(p)}, A^{(p)})\}_{p=1}^{P}$, the learned TP-GMM is converted into a single GMM with parameters $\{\pi_k, (\hat{\mu}_k, \hat{\Sigma}_k)\}_{k=1}^{K}$ by multiplying the affinely transformed Gaussian components across the frames of reference as follows:

$\hat{\Sigma}_k = \Big[\sum_{p=1}^{P} \big(\hat{\Sigma}_k^{(p)}\big)^{-1}\Big]^{-1}, \qquad \hat{\mu}_k = \hat{\Sigma}_k \sum_{p=1}^{P} \big(\hat{\Sigma}_k^{(p)}\big)^{-1} \hat{\mu}_k^{(p)}, \qquad (1)$

where the parameters of the updated Gaussian component in each frame $p$ are calculated as $\hat{\mu}_k^{(p)} = A^{(p)} \mu_k^{(p)} + b^{(p)}$ and $\hat{\Sigma}_k^{(p)} = A^{(p)} \Sigma_k^{(p)} (A^{(p)})^{\top}$.
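Equation (1) is a product of the linearly transformed Gaussian components, one per frame of reference: precisions add, and the mean is precision-weighted. A minimal sketch of this computation (function and variable names are illustrative, not from the patent):

```python
import numpy as np

def combine_frames(A, b, mu, sigma):
    """Combine the per-frame parameters of one TP-GMM component k into a
    single global Gaussian, following equation (1).

    A, b      : lists of the P frame rotations / translations {A^(p), b^(p)}
    mu, sigma : lists of the P frame-local means / covariances of component k
    Returns (mu_hat, sigma_hat) in the global frame.
    """
    P = len(A)
    # Affinely transform each frame-local Gaussian into the global frame.
    mu_f = [A[p] @ mu[p] + b[p] for p in range(P)]
    sig_f = [A[p] @ sigma[p] @ A[p].T for p in range(P)]
    # Product of Gaussians: precisions add; the mean is precision-weighted.
    sigma_hat = np.linalg.inv(sum(np.linalg.inv(s) for s in sig_f))
    mu_hat = sigma_hat @ sum(np.linalg.inv(sig_f[p]) @ mu_f[p] for p in range(P))
    return mu_hat, sigma_hat
```

For two frames with identity rotations, zero translations, and unit covariances, the result is the midpoint of the two means with halved covariance, as expected from the product formula.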
(17) Hidden semi-Markov models (HSMMs) extend standard hidden Markov models (HMMs) by embedding temporal information about the underlying stochastic process. In other words, whereas in an HMM the underlying hidden process is assumed to be Markov, i.e., the probability of transitioning to the next state depends only on the current state, in an HSMM the state process is assumed to be semi-Markov. This means that a transition to the next state depends on the current state as well as on the time elapsed since the state was entered. These models can be used in combination with TP-GMMs for robotic movement-skill encoding in order to learn spatio-temporal characteristics of the demonstrations. A task-parameterized HSMM (TP-HSMM) is defined as:

$\Theta = \big\{ \{a_{hk}\}_{h=1}^{K},\; (\mu_k^{D}, \sigma_k^{D}),\; \pi_k,\; \{(\mu_k^{(p)}, \Sigma_k^{(p)})\}_{p=1}^{P} \big\}_{k=1}^{K},$

where $a_{hk}$ is the transition probability from state $h$ to state $k$; $(\mu_k^{D}, \sigma_k^{D})$ describes the Gaussian distribution of the duration of state $k$, i.e., the probability that state $k$ persists for a certain number of consecutive steps; and $\{\pi_k, \{(\mu_k^{(p)}, \Sigma_k^{(p)})\}_{p=1}^{P}\}_{k=1}^{K}$ equals the previously introduced TP-GMM, which represents the observation probability corresponding to state $k$. In this context, it should be noted that the number of states equals the number of Gaussian components in the associated TP-GMM.
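For illustration, the parameter set $\Theta$ of such a TP-HSMM can be collected in a simple container. The layout and names below are illustrative, not part of the patent:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TPHSMM:
    """Container for the TP-HSMM parameters Theta (illustrative layout).

    trans   : (K, K)       transition probabilities a_hk
    dur_mu  : (K,)         means mu_k^D of the state-duration Gaussians
    dur_std : (K,)         std devs sigma_k^D of the state-duration Gaussians
    priors  : (K,)         mixing coefficients pi_k, shared across frames
    mu      : (P, K, d)    per-frame component means mu_k^(p)
    sigma   : (P, K, d, d) per-frame component covariances Sigma_k^(p)
    """
    trans: np.ndarray
    dur_mu: np.ndarray
    dur_std: np.ndarray
    priors: np.ndarray
    mu: np.ndarray
    sigma: np.ndarray

    @property
    def n_states(self) -> int:
        # The number of states equals the number of Gaussian components K.
        return self.trans.shape[0]
```

This mirrors the structure of $\Theta$: one duration distribution and one row of transition probabilities per state, plus the TP-GMM parameters per frame.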
(18) Given a particular (partial) sequence of observed data points $\{\xi_l\}_{l=1}^{t}$, assume that the associated sequence of states in $\Theta$ is $s_t = s_1 s_2 \ldots s_t$. The probability that data point $\xi_t$ belongs to state $k$ (i.e., $s_t = k$) is given by the forward variable $\alpha_t(k) = p(s_t = k, \{\xi_l\}_{l=1}^{t})$:

$\alpha_t(k) = \sum_{\tau=1}^{t-1} \sum_{h=1}^{K} \alpha_{t-\tau}(h)\, a_{hk}\, \mathcal{N}(\tau \mid \mu_k^{D}, \sigma_k^{D})\, o_\tau^{t}, \qquad (2)$

where $o_\tau^{t} = \prod_{l=t-\tau+1}^{t} \mathcal{N}(\xi_l \mid \hat{\mu}_k, \hat{\Sigma}_k)$ is the emission probability, and $(\hat{\mu}_k, \hat{\Sigma}_k)$ is derived from (1) given the task parameters. Further, the same forward variable can also be used during reproduction in order to predict future steps up to $T_m$.
(19) However, since future observations are not available in this case, only transition and duration information is used, i.e., by setting $\mathcal{N}(\xi_l \mid \hat{\mu}_k, \hat{\Sigma}_k) = 1$ for all $k$ and all $l > t$ in (2). Finally, the sequence of most likely states $s_T^* = s_1^* s_2^* \ldots s_T^*$ is obtained by choosing $s_t^* = \arg\max_k \alpha_t(k)$.
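With the emission probability set to 1, equation (2) reduces to a recursion over transition and duration probabilities only. A minimal sketch (illustrative names; per-step normalization is added for numerical stability, and the initialization with the mixing coefficients is an assumption not stated in the text):

```python
import math

def gauss(x, mu, std):
    """Scalar Gaussian density N(x | mu, std^2)."""
    return math.exp(-0.5 * ((x - mu) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def forward_no_obs(trans, dur_mu, dur_std, priors, T):
    """Forward variable of equation (2) with the emission probability set
    to 1 (no future observations). trans[h][k] = a_hk.
    Returns alpha as T rows of length K, normalized per step."""
    K = len(priors)
    alpha = [[0.0] * K for _ in range(T)]
    alpha[0] = list(priors)                 # assumed initialization alpha_1(k) = pi_k
    for t in range(1, T):
        for k in range(K):
            s = 0.0
            for tau in range(1, t + 1):     # duration tau spent in state k
                p_dur = gauss(tau, dur_mu[k], dur_std[k])
                s += sum(alpha[t - tau][h] * trans[h][k] for h in range(K)) * p_dur
            alpha[t][k] = s
        z = sum(alpha[t]) or 1.0
        alpha[t] = [a / z for a in alpha[t]]  # normalize for numerical stability
    return alpha
```

For a two-state model that always alternates states, the recursion deterministically moves the probability mass to the second state at the second step, then gradually spreads it according to the duration distributions.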
(20) A desired final observation of the robot state is now given as $\xi_T$, where $T$ is the time horizon of the movement skill (e.g., the average length of the demonstrations). Moreover, the initial robot state is observed as $\xi_1$. For the execution of the movement skill (i.e., movement-skill reproduction) using the learned model $\Theta_a$, the most likely state sequence $s_T^*$ is constructed given only $\xi_1$ and $\xi_T$.
(21) Reproduction using the forward variable cannot be carried out directly in this case, as the forward variable in equation (2) computes the sequence of marginally most likely states, whereas what is desired is the collectively most likely sequence of states given $\xi_1$ and $\xi_T$. Consequently, when (2) is used, there is no guarantee that the returned sequence $s_T^*$ will correspond to both the spatio-temporal patterns of the demonstrations and the final observation. In the example of picking up an object, it may return a most likely sequence corresponding to picking up from the side even if the desired end configuration is that the end effector is located on the top side of the object.
(22) According to one embodiment, a modification of the Viterbi algorithm is used to solve this problem. The classical Viterbi algorithm can be used to find the most likely sequence of states (also called the Viterbi path) in an HMM that results in a given sequence of observed events. The method used here differs from it in two main respects: (a) it works on an HSMM instead of an HMM; and, more significantly, (b) most observations are absent, apart from the first and the last. In particular, in the absence of observations, the Viterbi algorithm becomes

(23) $\delta_t(j) = \max_{d \in \mathcal{D}} \max_{h \neq j} \delta_{t-d}(h)\, a_{hj}\, p_j(d) \prod_{t'=t-d+1}^{t} \tilde{b}_j(\xi_{t'}), \qquad \delta_1(j) = b_j(\xi_1)\, \pi_j\, p_j(1),$

where $p_j(d) = \mathcal{N}(d \mid \mu_j^{D}, \sigma_j^{D})$ is the duration probability of state $j$, and $\delta_t(j)$ is the probability that the system is in state $j$ at time $t$ and not in state $j$ at time $t+1$; and

(24) $\tilde{b}_j(\xi_t) = \mathcal{N}(\xi_t \mid \hat{\mu}_j, \hat{\Sigma}_j)$ if $t = 1$ or $t = T$, and $\tilde{b}_j(\xi_t) = 1$ otherwise,

where $(\hat{\mu}_j, \hat{\Sigma}_j)$ is the global Gaussian component $j$ in $\Theta_a$ obtained from (1) given $\xi_t$. Specifically, at each time $t$ and for each state $j$, the two arguments that maximize the equation for $\delta_t(j)$ are recorded, and a simple backtracking procedure is used to find the most likely state sequence $s_T^*$. In other words, the above algorithm derives the most likely sequence $s_T^*$ for the movement skill $a$ that yields the final observation $\xi_T$ starting from $\xi_1$.
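A simplified version of this modified Viterbi recursion, restricted to scalar observations, might look as follows (a sketch under the stated assumptions, with illustrative names; not the patent's exact implementation):

```python
import math

def gauss(x, mu, std):
    """Scalar Gaussian density N(x | mu, std^2)."""
    return math.exp(-0.5 * ((x - mu) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def modified_viterbi(trans, dur_mu, dur_std, priors, emit_mu, emit_std,
                     xi_first, xi_last, T):
    """Most likely state sequence of length T given only the first and last
    observation. trans[i][j] = a_ij; b~_j is Gaussian at t in {1, T}, else 1."""
    K = len(priors)

    def b(j, t):  # emission term b~_j at 0-based time t
        if t == 0:
            return gauss(xi_first, emit_mu[j], emit_std[j])
        if t == T - 1:
            return gauss(xi_last, emit_mu[j], emit_std[j])
        return 1.0

    delta = [[0.0] * K for _ in range(T)]
    back = [[(None, 1)] * K for _ in range(T)]  # (previous state, duration of j)
    for j in range(K):
        delta[0][j] = priors[j] * gauss(1, dur_mu[j], dur_std[j]) * b(j, 0)
    for t in range(1, T):
        for j in range(K):
            best, arg = 0.0, (None, t + 1)
            for d in range(1, t + 2):  # duration d of state j, ending at time t
                obs = 1.0
                for s in range(t - d + 1, t + 1):
                    obs *= b(j, s)
                p = gauss(d, dur_mu[j], dur_std[j]) * obs
                if d == t + 1:  # state j has been occupied since the start
                    val, prev = priors[j] * p, (None, d)
                else:           # best predecessor state i != j ending at t - d
                    i = max((i for i in range(K) if i != j),
                            key=lambda i: delta[t - d][i] * trans[i][j])
                    val, prev = delta[t - d][i] * trans[i][j] * p, (i, d)
                if val > best:
                    best, arg = val, prev
            delta[t][j], back[t][j] = best, arg
    # Backtrack the collectively most likely sequence s*_T.
    j = max(range(K), key=lambda k: delta[T - 1][k])
    seq, t = [], T - 1
    while t >= 0:
        i, d = back[t][j] if t > 0 else (None, 1)
        seq[:0] = [j] * d
        t -= d
        if i is None:
            break
        j = i
    return seq
```

With two states whose emission means match the first and last observation respectively, the backtracked sequence starts in the state matching $\xi_1$ and ends in the state matching $\xi_T$, with durations shaped by the duration distributions.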
(25) For a (movement) skill of the robot 100, it may be desirable in certain applications to be able to define conditions that the robot 100 must satisfy when executing the skill. Examples are that a particular force or friction is not exceeded (e.g., two parts are not pressed together too hard, or a sleeve is not pushed over a rod with too much friction), or that the robot does not leave a particular spatial area.
(26) According to various embodiments, an approach is provided for taking formal specifications into account within skills (or capabilities) learned by LfD. Specifically, according to various embodiments, signal temporal logic (STL), a more expressive variant of temporal logic than linear temporal logic (LTL), is used to formulate task specifications in the form of reward functions, and a black-box optimization (BBO) approach is employed to adjust a skill learned by LfD as a hidden semi-Markov model (HSMM).
(27) Accordingly, various embodiments combine STL, BBO, and the learning of robotic skills by LfD as HSMMs. Regarding the conversion of an STL specification into a target function for BBO, see reference [3] for a description of various robustness measures that can be used in connection with the embodiments described below. Bayesian optimization (BO) or CMA-ES (covariance matrix adaptation evolution strategy) can be used for the BBO. The embodiments described below make no specific assumptions about the BBO technique used.
(28) According to various embodiments, an optimization method improves a robot control model for a particular skill so that given conditions are accounted for. This takes place by way of an iterative process, in which the following operations (1-4; see below) are repeated for a given number of times in order to obtain an improved version of the robot control model with respect to the specification of a target that reflects the given conditions.
(29) Based on an initial robot control model for a skill (according to various embodiments an HSMM), e.g., derived from a set of human demonstrations (e.g., reference trajectories), and a formal STL specification of conditions (e.g., targets to be achieved as auxiliary conditions when performing a task), the following steps are performed repeatedly (e.g., by the control device 106):
1) determining a modification (variant) of the robot control model for the skill;
2) executing the skill with the modified robot control model and recording the (sensor) signals occurring in the process;
3) assessing the extent to which the recorded signals comply with (i.e., satisfy) the predetermined STL specification of the conditions;
4) updating the BBO optimizer and remembering the currently best modification (with respect to satisfaction of the conditions).
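The four steps above can be sketched as a loop. The sketch below uses plain random search as the BBO method; `execute` and `robustness` are hypothetical callbacks standing in for the real skill execution on the robot and the STL robustness evaluation:

```python
import random

def optimize_skill(base_params, execute, robustness, n_iters=50, step=0.1, seed=0):
    """Iterative improvement loop of steps 1)-4), sketched with random-search
    BBO. `execute(params)` runs the skill and returns the recorded signals;
    `robustness(signals)` scores STL satisfaction (higher is better)."""
    rng = random.Random(seed)
    best_params = dict(base_params)
    best_score = robustness(execute(best_params))
    for _ in range(n_iters):
        # 1) propose a modification of the model parameters
        candidate = {k: v + rng.gauss(0.0, step) for k, v in best_params.items()}
        # 2) execute the skill with the modified model and record signals
        signals = execute(candidate)
        # 3) evaluate adherence to the STL specification
        score = robustness(signals)
        # 4) remember the currently best modification
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score
```

Swapping the proposal step for BO or CMA-ES changes only how candidates are generated and how step 4) updates the optimizer; the overall loop stays the same.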
(30) FIG. 2 illustrates the adjustment of the parameters 201 of a robot control model by black-box optimization.
(31) Regarding operation 1), a robot control model for a skill can be altered in a variety of ways, i.e., by modifying a number of different parameters 201 of the robot control model. Given the representation as an HSMM, the natural choices for such parameters 201 are the means $\mu_k^{(p)}$ of the components, the parameters $\mu_k^{D}, \sigma_k^{D}$ of the probability distributions for the durations of the components, and the transition probabilities $a_{hk}$ between the components. For each modification, these are altered by perturbations (changes) $\Delta\mu_k^{(p)}$, $\Delta\mu_k^{D}$, $\Delta\sigma_k^{D}$, $\Delta a_{hk}$ in order to modify the robot control model.
(32) In addition, regarding operation 1), the determination of the parameters for the modification can be performed in various ways and depends primarily on the choice of the BBO method. For example, the parameter values can be selected randomly. In Bayesian optimization (BO), the parameter values are typically determined (starting from an initialization 204 of the optimization parameters) by optimizing a so-called acquisition function 202, where a surrogate (e.g., a Gaussian process 203) of the (unknown) function to be optimized is formed in order to model the relationship between the choice of parameters and the expected target value. When CMA-ES is used, the parameter values are drawn from a probability distribution that is adapted over time, so that parameters which yield a higher target value are sampled with higher probability.
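To illustrate how such a sampling distribution can be adapted from observed target values, the sketch below uses the cross-entropy method on a single scalar parameter, as a much simplified stand-in for CMA-ES (function and parameter names are illustrative, not from the patent):

```python
import random

def cem_step(mean, std, objective, n_samples=64, n_elite=8, rng=None):
    """One update of a Gaussian sampling distribution over a scalar parameter,
    in the spirit of CMA-ES (here: the much simpler cross-entropy method).
    Sample candidates, rank them by the target function, and refit the
    distribution to the best ones."""
    rng = rng or random.Random(0)
    samples = [rng.gauss(mean, std) for _ in range(n_samples)]
    elite = sorted(samples, key=objective, reverse=True)[:n_elite]
    new_mean = sum(elite) / n_elite
    new_std = (sum((e - new_mean) ** 2 for e in elite) / n_elite) ** 0.5
    return new_mean, max(new_std, 1e-6)  # keep a minimal exploration level
```

Iterating `cem_step` drives the distribution toward parameter values with a higher target value; CMA-ES refines this idea with full covariance adaptation and step-size control.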
(33) Following modification of the robot control model, execution 206 of the skill is performed according to the modified robot control model.
(34) Regarding operation 2), the given STL specification 205 is expressed by a series of so-called predicates that demand certain characteristics of the execution of the skill, e.g., keeping the magnitude of the contact forces small or remaining outside or within certain areas of the workspace. This directly indicates which sensor signals must be recorded during the execution of the task, i.e., which variables must be measured (e.g., a respective contact force, or the position of, e.g., the end effector 104 of the robot).
(35) In operation 3), the signals recorded during the execution 206 are discretized, and the value of a robustness metric 207 of the STL specification 205 can be computed for these signals (e.g., as described in reference [3]). Various formulations of robustness metrics exist, with different characteristics suitable for task optimization. One illustrative metric is spatial robustness, which measures, at each point in time along the discretized signal, the distance of the signal value from the value at which the truth value of the corresponding proposition changes. For example, this metric measures the difference between the measured force and the specified contact-force limit, or the Euclidean distance to a particular area in the workspace.
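Such a spatial robustness value can be computed directly from a discretized signal. A minimal illustration for simple threshold predicates (the function names and the specific "always below" / "eventually below" predicates are illustrative, not taken from the patent):

```python
def robustness_always_below(signal, limit):
    """Spatial robustness of 'always (signal < limit)': the worst-case
    margin to the threshold over the discretized signal. Positive means
    satisfied with a margin; negative means violated."""
    return min(limit - x for x in signal)

def robustness_eventually_below(signal, limit):
    """Robustness of 'eventually (signal < limit)': the best margin reached."""
    return max(limit - x for x in signal)
```

A contact-force condition such as "the measured force always stays below 3 N", evaluated on a recorded force signal, then yields a scalar target value 207 that the BBO method can maximize.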
(36) Regarding operation 4), both the changes to the model parameters 201 and the resulting value 207 of the STL-specified target are now known and can be passed to the BBO method being used. This operation likewise depends on the BBO method: in the case of BO, for example, the Gaussian process (GP) 203 is updated so as to include the new observation; in the case of CMA-ES, the sampling distribution is updated accordingly. In the extreme case, e.g., with purely random sampling of parameters, this step can be omitted.
(37) In summary, according to various embodiments, a method is provided as shown in FIG. 3.
(38) FIG. 3 shows a flow diagram illustrating a method for controlling a robotic device according to one embodiment.
(39) At 301, a robot control model for performing a task is generated, wherein the robot control model comprises parameters that influence the performance of the task.
(40) At 302, the parameters of the robot control model are adjusted by optimizing a target function which evaluates the adherence to at least one condition with respect to the temporal progression of at least one continuous sensor signal when performing the task.
(41) At 303, the robotic device is controlled according to the robot control model in order to perform the task using the adjusted parameters.
(42) The method shown in FIG. 3 can be carried out by one or more computers comprising one or more data processing devices.
(43) The approach shown in FIG. 3 serves to generate a control signal for a robotic device.
(44) To generate the control signal, (sensor) data obtained from one or more received sensor signals are processed, e.g., in the form of scalar time series containing specific data about any type of trajectories, e.g., robot end effector poses (position and orientation), forces, robotic joint forces, etc.
(45) The sensor data are processed, which may include classifying the sensor data or performing semantic segmentation on the sensor data in order to detect the presence of objects (in the environment in which the sensor data were obtained) with respect to the adherence (and quantification of the adherence) to a user-provided formal specification of one or more conditions.
(46) Embodiments can be used in the context of training a machine learning system and controlling a robot, e.g., autonomously by robot manipulators, in order to accomplish various manipulation tasks under various scenarios. In particular, embodiments may be applied to the control and monitoring of the execution of manipulation tasks, e.g., in assembly lines. For example, they can be seamlessly integrated into a conventional GUI for a control process (e.g., in order to allow a user to specify conditions).
(47) Although specific embodiments have been illustrated and described herein, one skilled in the art will recognize that the specific embodiments shown and described can be substituted by a variety of alternative and/or equivalent implementations without departing from the scope of protection of the present invention. This application is to cover any adaptations or variations of the specific embodiments discussed herein.