Method and system for robot manipulation planning
11498212 · 2022-11-15
Assignee
Inventors
- Andras Gabor Kupcsik (Boeblingen, DE)
- Leonel Rozo (Boeblingen, DE)
- Marco Todescato (Stuttgart, DE)
- Markus Spies (Karlsruhe, DE)
- Markus Giftthaler (Freising, DE)
- Mathias Buerger (Stuttgart, DE)
- Meng Guo (Renningen, DE)
- Nicolai Waniek (Dornstadt, DE)
- Patrick Kesper (Korntal-Muenchingen, DE)
- Philipp Christian Schillinger (Renningen, DE)
Cpc classification
G06N7/01
PHYSICS
B25J9/1661
PERFORMING OPERATIONS; TRANSPORTING
B25J9/1664
PERFORMING OPERATIONS; TRANSPORTING
G05B2219/40629
PHYSICS
G06N5/01
PHYSICS
G05B2219/40429
PHYSICS
B25J9/163
PERFORMING OPERATIONS; TRANSPORTING
International classification
Abstract
A method for planning a manipulation task of an agent, particularly a robot. The method includes: learning a number of manipulation skills wherein a symbolic abstraction of the respective manipulation skill is generated; determining a concatenated sequence of manipulation skills selected from the number of learned manipulation skills based on their symbolic abstraction so that a given goal specification indicating a given complex manipulation task is satisfied; and executing the sequence of manipulation skills.
Claims
1. A computer-implemented method for planning a manipulation task of a robot, comprising the following steps: learning a number of manipulation skills and generating a symbolic abstraction of each of the manipulation skills, by which a respective statistic model of a transition to a respective final state from a respective current state is determined for each of the learned manipulation skills; determining a concatenated sequence of a subset of the manipulation skills selected from the number of learned manipulation skills based on their symbolic abstractions so that a given goal specification indicating a given complex manipulation task is satisfied, wherein: the determining of the concatenated sequence includes generating a complete model corresponding to a combination, in the sequence, of all of the subset of the manipulation skills; and the generating of the complete model is performed by computing, for each of a plurality of pairwise combinations of individual manipulation skills of the subset, a respective probability of a transition from a first one of the individual manipulation skills of the respective pairwise combination to a second one of the individual manipulation skills of the respective pairwise combination according to a divergence of emission probabilities between an end state of the first one of the individual manipulation skills of the respective pairwise combination and an initial state of the second one of the individual manipulation skills of the respective pairwise combination; and executing the sequence of manipulation skills.
2. The method according to claim 1, wherein: the statistical model is a Task Parameterized Hidden Semi-Markov model (TP-HSMM); and the learning of the number of manipulation skills includes, for each of the number of manipulation skills: demonstrating and recording a plurality of manipulation trajectories; determining a respective instance of the TP-HSMM depending on the plurality of manipulation trajectories of the respective manipulation skill; and determining a respective one of the symbolic abstractions.
3. The method according to claim 2, wherein the generating of the symbolic abstractions of the manipulation skills includes constructing a Planning Domain Definition Language (PDDL) model in which objects, the initial states, and a goal specification define problem instances, and in which predicates and actions define a domain of a given manipulation, wherein the symbolic abstractions of the manipulation skills use a classical PDDL planning language.
4. The method according to claim 2, where the determining of the concatenated sequence of manipulation skills is performed such that a probability of achieving the given goal specification is maximized, wherein a PDDL planning step is used to find a sequence of actions to fulfill the given goal specification, starting from a given one of the initial states.
5. The method according to claim 2, where the transition probabilities are determined using Expectation-Maximization.
6. The method according to claim 2, wherein the symbolic abstractions are determined by mapping low-variance geometric relations of segments of the manipulation trajectories into a set of predicates.
7. The method according to claim 2, wherein the determining of the concatenated sequence of manipulation skills includes optimizing a goal of minimization of a total length of a combined trajectory of the concatenated sequence.
8. The method according to claim 7, wherein the determining of the concatenated sequence of manipulation skills includes selectively reproducing one or more of the manipulation skills of the sequence of manipulation skills so as to maximize a probability of satisfying the given goal specification.
9. The method according to claim 7, wherein the determining of the concatenated sequence of manipulation skills further includes searching a most-likely complete state sequence between initial and goal states of the complex manipulation task as a whole using a Viterbi algorithm.
10. The method according to claim 7, wherein the determining of the concatenated sequence of manipulation skills further includes searching a most-likely complete state sequence between initial and goal states of the manipulation task as a whole using a Viterbi algorithm that includes missing observations and duration probabilities.
11. The method according to claim 1, wherein an individual one of the manipulation skills of the subset is included multiple times within the concatenated sequence.
12. A device for planning a manipulation task of a robot, the device comprising a processor, the processor configured to: learn a number of manipulation skills, thereby generating a symbolic abstraction of each of the manipulation skills and by which a respective statistic model of a transition to a respective final state from a respective current state is determined for each of the learned manipulation skills; determine a concatenated sequence of a subset of the manipulation skills selected from the number of learned manipulation skills based on their symbolic abstractions so that a given goal specification indicating a complex manipulation task is satisfied; and instruct execution of the sequence of manipulation skills; wherein: the determination of the concatenated sequence includes generating a complete model corresponding to a combination, in the sequence, of all of the subset of the manipulation skills; and the generating of the complete model is performed by computing, for each of a plurality of pairwise combinations of individual manipulation skills of the subset, a respective probability of a transition from a first one of the individual manipulation skills of the respective pairwise combination to a second one of the individual manipulation skills of the respective pairwise combination according to a divergence of emission probabilities between an end state of the first one of the individual manipulation skills of the respective pairwise combination and an initial state of the second one of the individual manipulation skills of the respective pairwise combination.
13. A non-transitory machine-readable storage medium on which is stored a computer program that is executable by a computer and that, when executed by the computer, causes the computer to perform a method for planning a manipulation task of a robot, the method comprising the following steps: learning a number of manipulation skills and generating a symbolic abstraction of each of the manipulation skills, by which a respective statistic model of a transition to a respective final state from a respective current state is determined for each of the learned manipulation skills; determining a concatenated sequence of a subset of the manipulation skills selected from the number of learned manipulation skills based on their symbolic abstractions so that a given goal specification indicating a given complex manipulation task is satisfied, wherein: the determining of the concatenated sequence includes generating a complete model corresponding to a combination, in the sequence, of all of the subset of the manipulation skills; and the generating of the complete model is performed by computing, for each of a plurality of pairwise combinations of individual manipulation skills of the subset, a respective probability of a transition from a first one of the individual manipulation skills of the respective pairwise combination to a second one of the individual manipulation skills of the respective pairwise combination according to a divergence of emission probabilities between an end state of the first one of the individual manipulation skills of the respective pairwise combination and an initial state of the second one of the individual manipulation skills of the respective pairwise combination; and executing the sequence of manipulation skills.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Example embodiments of the present invention are described in more detail in conjunction with the figures.
(2)
(3)
(4)
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
(5) .sup.3×S.sup.3×
.sup.1 (describing the Cartesian position, orientation and gripper state in a global coordinate system (frame)), that operates within a static or dynamic and known workspace. Also, within the reach of the robot arm 2, there are objects of interest denoted by O={o.sub.1, o.sub.1, . . . , o.sub.J}.
(6) Within this setup, a human user can perform several kinesthetic demonstrations on the arm to manipulate one or several objects for certain manipulation skills. Denote by A={a.sub.1, a.sub.1, . . . , a.sub.H} the set of demonstrated skills. Moreover, for manipulation skill a.sub.h∈A, the set of objects involved is given by O.sub.a.sub.
(7) The robot arm 2 is controlled by means of a control unit 3 which may actuate actuators to move the robot arm 2 and activate the end effector 22. Sensors may be provided at the robot arm 2 or at the robot workspace to record the state of objects in the robot workspace. Furthermore, the control unit 3 is configured to record movements made with the robot arm 2 and to obtain information about objects in the workspace from the sensors and further to perform a task planning process as described below. The control unit 3 has a processing unit where the algorithm as described below is implemented in hardware and/or software.
(8) All demonstrations are described by the structure of TP-GMM (task-parametrized-Gaussian Mixture Models). The basic idea is to fit a prescribed skill model such as GMMs to multiple demonstrations. GMMs are described, e.g., in S. Niekum et al. “Learning grounded finite-state representations from unstructured demonstrations”, The International Journal of Robotics Research, 34(2), pages 131-157, 2015. For a number M of given demonstrations (trajectory measurement results), each of which contains T.sub.m data points for a dataset, N=M*T.sub.m total observations ξ={ξ.sub.t}.sub.t=1.sup.N exists, where ξ.sub.t∈.sup.d for sake of clarity. Also, it is assumed the same demonstrations are recorded from the perspective of P different coordinate systems (given by the task parameters such as objects of interest). One common way to obtain such data is to transform the demonstrations from global frame (global coordinate system) to frame p by ξ.sub.t.sup.(p)=A.sub.t.sup.(p).sup.
(9) Differently from standard GMM learning, the mixture model above cannot be learned independently for each frame p. Indeed, the mixing coefficients π.sub.k are shared by all frames p and the k-th component in frame p must map to the corresponding k-th component in the global frame. For example, Expectation-Maximization (EM) is a well-established method to learn such models. In general, an expectation-maximization (EM) algorithm is an iterative method to find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in a statistical model, which depends on unobserved latent variables.
(10) Once learned, the TP-GMM can be used during execution to reproduce a trajectory for the learned skill. Namely, given the observed frames {b.sub.t.sup.(p),A.sub.t.sup.(p)}.sub.p=1.sup.P, the learned TP-GMM is converted into one single GMM with parameters {π.sub.k,μ.sub.t.sup.(p),Σ.sub.t.sup.(p)}.sub.k=1.sup.K, by multiplying the affine-transformed Gaussian components across different frames, as follows
(11)
where the parameters of the updated Gaussian at each frame p are computed as
{circumflex over (μ)}.sub.t,k.sup.(p)=A.sub.t.sup.(p)μ.sub.k.sup.(p)+b.sub.t.sup.(p) and {circumflex over (Σ)}.sub.t,k.sup.(p)=A.sub.t.sup.(p)μ.sub.k.sup.(p)+b.sub.t.sup.(p)
(12) Hidden semi-Markov Models (HSMMs) have been successfully applied, in combination with TP-GMMs, for robot skill encoding to learn spatio-temporal features of the demonstrations, such as manipulation trajectories of a robot or trajectories of a movable agent.
(13) Hidden semi-Markov Models (HSMMs) extend standard hidden Markov Models (HMMs) by embedding temporal information of the underlying stochastic process. That is, while in HMM the underlying hidden process is assumed to be Markov, i.e., the probability of transitioning to the next state depends only on the current state, in HSMM the state process is assumed semi-Markov. This means that a transition to the next state depends on the current state as well as on the elapsed time since the state was entered.
(14) More specifically, a task parametrized HSMM model consists of the following parameters
M.sub.θ={{a.sub.kh}.sub.h=1.sup.K,(μ.sub.k.sup.D,σ.sub.k.sup.D),π.sub.k,{μ.sub.k.sup.(p),Σ.sub.k.sup.(p)}.sub.p=1.sup.P}.sub.k=1.sup.K
where a.sub.kh is the transition probability from state k to h; (μ.sub.k.sup.D,σ.sub.k.sup.D) describes the Gaussian distributions for the duration of state k, i.e., the probability of staying in state k for a certain number of consecutive steps; {π.sub.k,{μ.sub.k.sup.(p),Σ.sub.k.sup.(p)}.sub.p=1.sup.P}.sub.k=1.sup.K equals the TP-GMM introduced earlier and, for each k, describe the emission probability, i.e., the probability of observation, corresponding to state k. In an HSMM the number of states correspond to the number of Gaussian components in the “attached” TP-GMM. In general, HSMM states are Gaussian distributions, which means that its observation probability distribution is represented as a classical GMM. To render HSMM to be object-centric, the observation probabilities can be parametrized as it is done in TP-GMM to obtain a TP-HSMM.
(15) Given a certain sequence of observed data points , the associated sequence of states in
is given by s.sub.t=s.sub.1, s.sub.2 . . . s.sub.t. The probability of data point ξ.sub.t belonging to state k (i.e., s.sub.t=k) is given by the forward variable a.sub.t(k)=p(s.sub.t=k,{
):
(16)
where o.sub.τ.sup.t=N(
) is the emission probability, where ({circumflex over (μ)}.sub.l,k,{circumflex over (Σ)}.sub.l,k) are derived from ({circumflex over (Σ)}.sub.t,k).sup.−1=Σ.sub.p=1.sup.P({circumflex over (Σ)}.sub.t,k.sup.(p)).sup.−1,{circumflex over (μ)}.sub.t,k={circumflex over (Σ)}.sub.t,kΣ.sub.p=1.sup.P({circumflex over (Σ)}.sub.t,k.sup.(p)).sup.−1{circumflex over (μ)}.sub.t,k.sup.(p) as shown above. Furthermore, the same forward variable can also be used during reproduction to predict future steps until T.sub.m. In this case however, since future observations are not available, only transition and duration information are used, i.e., by setting N(
)=1 for all k and
>t in a.sub.t(k)=Σ.sub.τ-1.sup.t-1Σ.sub.h-1.sup.Ka.sub.t-τ(h)a.sub.hkN(τ|μ.sub.k.sup.D,σ.sub.k.sup.D)o.sub.τ.sup.t. At last, the sequence of the most-likely states s*.sub.T.sub.
(17) All demonstrations are recorded from multiple frames. Normally, these frames are closely attached to the objects in O.sub.a.sub.
(18) In addition, consider a set of pre-defined predicates, denoted by B={b.sub.i, b.sub.2, . . . , b.sub.L}, representing possible geometric relations among the objects of interest. Here, predicates b∈B are abstracted as Boolean functions taking as inputs the status of several objects while outputting whether the associated geometric relation holds or not. For instance, grasp:=O.fwdarw.B indicates whether an object is grasped by the robot arm; within: O×O.fwdarw.B indicates whether an object is inside another object; and onTop: O×O.fwdarw.B indicates whether an object is on the top of another object. Note that these predicates are not bound to specific manipulation skills but rather shared among them. Usually, such predicate functions can be easily validated for the robot arm states and the object states (e.g., position and orientations).
(19) Finally, a goal specification G is given as a propositional logic expression over the predicates B, i.e., via nested conjunction, disjunction and negation operators. In general, the goal specification G represents the desired configuration of the arm and the objects, assumed to be feasible. As an example, one specification could be “within(peg,cylinder)ΛonTop(cylinder, box)”, i.e., “the peg should be inside the cylinder and the cylinder should be on top of the box”.
(20) In the following, a problem can be defined as follows: Given a set of demonstrations D for skills A and the goal G, the objective is a) to learn a TP-HSMM model M.sub.θ of the form
M.sub.θ={{a.sub.kh}.sub.h=1.sup.K,(μ.sub.k.sup.D,σ.sub.k.sup.D),π.sub.k,{μ.sub.k.sup.(p),Σ.sub.k.sup.(p)}.sub.p=1.sup.P}.sub.k=1.sup.K,
for each demonstrated skill a.sub.h and reproduce the skill for a given final configuration G. For the reproduction of the skill the sequence of states obtained through Viterbi algorithm can be used. b) to construct a PDDL model P of the form
P=(P.sub.D,P.sub.P)
where Objects, Initial State and Goal Specification define a problem instance P.sub.P, while Predicates and Actions define the domain of a problem P.sub.D; c) to derive and subsequently execute the sequence of manipulation skills, such that the probability of achieving the given goal specification G is maximized. Once the domain and problem files are specified, a PDDL (Planning Domain Definition Language.) planner has to find a sequence of actions to fulfill the given goal specification, starting from the initial state.
(21) The PDDL model P includes a domain for the demonstrated skills and a problem instance given the goal specification G.
(22) The Planning Domain Definition Language (PDDL) is the standard classic planning language. Formally, the language consists of the following key ingredients: Objects, everything of interest in the world; Predicates, object properties and relations; Initial States, set of grounded predicates as the initial states; A goal Specification, the goal states; and Actions, how predicates are changed by an action and also the preconditions on the actions.
(23) In the example embodiment described herein, motion planning is performed at the end-effector/gripper trajectory level. This means it is assumed that a low-level motion controller is used to track the desired trajectory.
(24) The example method for planning a manipulation task is described in detail with respect to the flowchart of
(25) Firstly, a TP-HSMM model M.sub.θ is to be learned for each demonstrated skill a.sub.h and reproduce the skill for a given final configuration G.
(26) One demonstrated skill a.sub.h∈A is considered. As described above, the set of available demonstrations, recorded in P frames, is given by D.sub.a.sub.
(27) Given a properly chosen number of components K which correspond to the TP-HSMM states, which are the Gaussian components representing the observation probability distributions, the TP-HSMM model M.sub.a.sub.
(28) A final goal configuration G is provided in step S2 which can be translated into the final state of the end effector 22 x.sub.G∈.sup.3×S.sup.3×
.sup.1. This configuration can be imposed as the desired final observation of the reproduction, i.e., ξ.sub.T.sub.
(29) The forward variable of formula
(30)
allows to compute the sequence of marginally most probable states, while we are looking for the jointly most probable sequence of states given the last observation ξ.sub.T.sub.
(31) To overcome this issue, in step S3 a modification of the Viterbi algorithm is used. Whereas the classical Viterbi algorithm has been extensively used to find the most likely sequence of states (also called the Viterbi path) in classical HMMs that result in a given sequence of observed events, the modified implementation differs in that: (a) it works on HSMM instead of HMM; and that (b) most observations except the first and the last ones are missing.
(32) Specifically, in the absence of observations the Viterbi algorithm becomes
(33)
(34) The Viterbi algorithm is modified to include missing observations, which is basically what is described for variable ‘b’. Moreover, the inclusion of duration probabilities p.sub.j(d) in the computation of variable ‘d_t(j)’ makes it work for HSMM.
(35) At each time t and for each state j, the two arguments that maximize equation δ.sub.t(j) are recorded, and a simple backtracking procedure can then be used to find the most probable state sequence s*.sub.T.sub.
(36) The above modified Viterbi algorithm provides the most likely state sequence for a single TP-HSMM model that produces the final observation δ.sub.T. As multiple skills are used, these models need to be sequenced and δ.sub.t(j) has to be computed for each individual model M.sub.a.sub.
(37) As a next step, symbolic abstractions of the demonstrated skills allow the robot to understand the meaning of each skill on a symbolic level, instead of the data level of HSMM. This may generalize a demonstrated and learned skill. Hence, the high-level reasoning of the herein described PDDL planner, needs to understand how a skill can be incorporated into an action sequence in order to achieve a desired goal specification starting from an initial state. A PDDL model contains the problem instance P.sub.p and the domain P.sub.D.
(38) While the problem P.sub.p can be easily specified given the objects O, the initial state and the goal specification G, the key ingredient for symbolic abstraction is to construct the actions description in the domain P.sub.D for each demonstrated skill, wherein P.sub.D should be invariant to different task parameters.
(39) a.sub.h∈A is a symbolic representation for one demonstrated skill in PDDL form. The learned TP-HSMM M.sub.a.sub.
{π.sub.k,{μ.sub.k.sup.(p),Σ.sub.k.sup.(p)}.sub.p=1.sup.P}.sub.k=1.sup.K.
(40) For each model, it is then possible to identify two sets, each of which containing initial and final states and denoted by ,
.Math.{1, . . . , K}, respectively.
(41) To construct the preconditions of a skill, the segments of demonstrations that belong to any of the initial state are to be identified, and to further derive the low-variance geometric relations which can be mapped into the set of predicates B. For each initial state i∈, its corresponding component in frame p is given by (μ.sub.i.sup.p,Σ.sub.i.sup.(p)) for p=1, . . . , P. These frames correspond to objects {o.sub.1, . . . , o.sub.P}, i.e., skill a.sub.h interacts with these objects. For each demonstration {ξ.sub.t.sup.(p)}.sub.t=1.sup.T.sup.
b.sub.l(o.sub.1, . . . ,o.sub.P)=T, if
(42)
where 0<η<1 is a design parameter (probability threshold). o.sub.1.sup.t, . . . , o.sub.P.sup.t are object states computed based on the recorded frame coordinates {b.sub.t.sup.(P),A.sub.t.sup.(p)}.sub.p=1.sup.P and object geometric dimensions. Denote by B.sub.i the set of instantiated predicates that are True within state i,∀i∈. As a result, the overall precondition of skill a.sub.h is given by the disjunction of the conjunction of these predicates for each initial state, i.e.,
(43)
where ∇ and Λ are the disjunction and conjunction operations. Similarly, to construct the effect of a skill, the procedure described above can be applied to the set of final states . In particular, for each final state f∈
, the set of instantiated predicates that are True within f is denoted by B.sub.f. However, in contrast to the precondition, the effect cannot contain a disjunction of predicates. Consequently, the effect of skill a.sub.h is given by
(44)
as the invariant predicates common for all of the final states. Based on the above elements, the PDDL model P can be generated in an automated way. More importantly, the domain P.sub.D can be constructed incrementally whenever a new skill is demonstrated and its descriptions are abstracted as above. On the other hand, the problem instance P.sub.p needs to be re-constructed whenever a new initial state or goal specification is given.
(45) Following, it is referred to the planning and sequencing of trained and abstracted skills. The PDDL definition P has been constructed, which can be directly fed into any compatible PDDL planner. Different optimization techniques can be enforced during the planning, e.g., minimizing the total length of the plan or total cost. Denote by a*.sub.D=a*.sub.1=a*.sub.2 . . . a*.sub.D the generated optimal sequence of skills, where a*.sub.d∈A holds for each skill. Moreover, denote by M.sub.a*.sub.
(46) Given this sequence a*.sub.D, each skill within a*.sub.D, is reproduced as the end-effector trajectory level, so to maximize the probability of satisfying the given goal G.
(47) The learned TP-HSMM encapsulates a general skill that might have several plausible paths and the choice relies heavily on the desired initial and final configurations. To avoid incompatible transitions from one skill to the next, a compatibility measure shall be embedded while concatenating the skills within a*.sub.D. Particularly, the proposed solution contains three main steps: a) Cascade the TP-HSMMs of each skill within a*.sub.D into one complete model {circumflex over (M)}.sub.a*.sub.
(48) Since the transition from one skill to another is never demonstrated, such transition probabilities are computed from the divergence of emission probabilities between the sets of final and starting states. Particularly, consider two consecutive skills a.sub.d* and a.sub.d+1* in α.sub.D* . The transition probability from one final state f of M.sub.a.sub.
(49)
where KL(⋅∥⋅) is a KL-divergence (Kullback-Leibler-Divergence; see also Kullback, S.; Leibler, R. A., “On Information and Sufficiency.” Ann. Math. Statist. 22 (1951), no. 1, 79-86. doi:10.1214/aoms/1177729694, P.sub.c is the set of common frames between these two skills, and a≥0 is a design parameter. The outgoing probability of any final state should be normalized. This process is repeated for all pairs of starting and final states between consecutive skills in a.sub.D* . In this way, one complete model {circumflex over (M)}.sub.a.sub..sub.3×S.sup.3×
.sup.1, the emission probabilities are therefore Riemannian Gaussian, and thus the step-wise reference is also given in M.sub.R. This implies that the required linear system dynamics for the end-effector 22 cannot be defined on M.sub.R, since it is not a vector space. However, the linear tangent spaces can be exploited to achieve a similar result. Specifically, the state error between the desired reference {circumflex over (μ)}.sub.s.sub.
(50)
that projects the minimum length path between {circumflex over (μ)}.sub.s.sub.
(51)
The covariance matrices {circumflex over (Σ)}.sub.s.sub.
(52)
Under the above assumptions, the control objective in the tangent space can be formulated as
(53)
(54) Finally, in step S4 the concatenated sequence of manipulation skills is executed.