DEVICE AND METHOD FOR CONTROLLING A ROBOT
20250249585 ยท 2025-08-07
Inventors
Cpc classification
B25J9/1607
PERFORMING OPERATIONS; TRANSPORTING
B25J9/1664
PERFORMING OPERATIONS; TRANSPORTING
International classification
Abstract
A method for controlling a robot. The method includes determining, for each robotic pose of a plurality of predetermined robot trajectories, a respective embedding in an embedding space having the structure of a hyperbolic manifold by searching an optimum of an objective function which incites, for each of the predetermined robot trajectories, the embeddings of the robotic poses of the predetermined robot trajectory to follow pre-defined dynamics of the embedding space, determining, for a starting pose from which the robot is to be controlled, a start embedding in the embedding space (, and, for a desired end pose, an end embedding in the embedding space and a geodesic between the start embedding and the end embedding according to a pullback metric of the embedding space and controlling the robot according to a sequence of robotic poses given by the determined geodesic.
Claims
1. A method for controlling a robot, comprising the following steps: determining, for each robotic pose of a plurality of predetermined robot trajectories, a respective embedding in an embedding space having a structure of a hyperbolic manifold, wherein the determining of the respective embeddings includes determining parameters of an encoder which maps robotic poses to embeddings, by searching an optimum of an objective function which incites, for each of the predetermined robot trajectories, the embeddings of the robotic poses of the predetermined robot trajectory to follow pre-defined dynamics of the embedding space; determining, for a starting pose from which the robot is to be controlled, a start embedding in the embedding space, and, for a desired end pose, an end embedding in the embedding space, and a geodesic between the start embedding and the end embedding according to a pullback metric of the embedding space, wherein the start embedding and the end embedding are determined by encoding the starting pose and the end pose using the encoder, respectively; and controlling the robot according to a sequence of robotic poses given by the determined geodesic, wherein the sequence of robotic poses is determined by mapping a sequence of embedding space elements given by the determined geodesic to a robotic pose space using a decoder which is configured to map from the embedding space to the robotic pose space and controlling the robot to follow the sequence of robotic poses, and wherein the pullback metric is a pullback metric according to a Jacobian of the decoder and Euclidean metric of robotic poses.
2. The method of claim 1, wherein the decoder implements a Gaussian process.
3. The method of claim 1, wherein the objective function further includes a term which, according to a taxonomy of robotic poses which includes a similarity measure between robotic poses, incites the embeddings to be determined such that a distance of embeddings of robotic poses in the embedding space reflects a similarity of the robotic poses according to the taxonomy.
4. A controller configured to control a robot, the controller configured to: determine, for each robotic pose of a plurality of predetermined robot trajectories, a respective embedding in an embedding space having a structure of a hyperbolic manifold, wherein the determining of the respective embeddings includes determining parameters of an encoder which maps robotic poses to embeddings, by searching an optimum of an objective function which incites, for each of the predetermined robot trajectories, the embeddings of the robotic poses of the predetermined robot trajectory to follow pre-defined dynamics of the embedding space; determine, for a starting pose from which the robot is to be controlled, a start embedding in the embedding space, and, for a desired end pose, an end embedding in the embedding space and a geodesic between the start embedding and the end embedding according to a pullback metric of the embedding space, wherein the start embedding and the end embedding are determined by encoding the starting pose and the end pose using the encoder, respectively; and control the robot according to a sequence of robotic poses given by the determined geodesic, wherein the sequence of robotic poses is determined by mapping a sequence of embedding space elements given by the determined geodesic to a robotic pose space using a decoder which is configured to map from the embedding space to the robotic pose space and controlling the robot to follow the sequence of robotic poses, and wherein the pullback metric is a pullback metric according to a Jacobian of the decoder and Euclidean metric of robotic poses.
5. A non-transitory computer-readable medium on which are stored instructions for controlling a robot, the instructions, when executed by a computer, causing the computer to perform the following steps: determining, for each robotic pose of a plurality of predetermined robot trajectories, a respective embedding in an embedding space having a structure of a hyperbolic manifold, wherein the determining of the respective embeddings includes determining parameters of an encoder which maps robotic poses to embeddings, by searching an optimum of an objective function which incites, for each of the predetermined robot trajectories, the embeddings of the robotic poses of the predetermined robot trajectory to follow pre-defined dynamics of the embedding space; determining, for a starting pose from which the robot is to be controlled, a start embedding in the embedding space, and, for a desired end pose, an end embedding in the embedding space and a geodesic between the start embedding and the end embedding according to a pullback metric of the embedding space, wherein the start embedding and the end embedding are determined by encoding the starting pose and the end pose using the encoder, respectively; and controlling the robot according to a sequence of robotic poses given by the determined geodesic, wherein the sequence of robotic poses is determined by mapping a sequence of embedding space elements given by the determined geodesic to a robotic pose space using a decoder which is configured to map from the embedding space to the robotic pose space and controlling the robot to follow the sequence of robotic poses, and wherein the pullback metric is a pullback metric according to a Jacobian of the decoder and Euclidean metric of robotic poses.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0036]
[0037]
[0038]
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0039] The following detailed description refers to the figures that show, by way of illustration, specific details and aspects of this disclosure in which the present invention may be practiced. Other aspects may be utilized and structural, logical, and electrical changes may be made without departing from the scope of the present invention. The various aspects of this disclosure are not necessarily mutually exclusive, as some aspects of this disclosure can be combined with one or more other aspects of this disclosure to form new aspects.
[0040] In the following, various examples will be described in more detail.
[0041]
[0042] The robot 100 includes a robot arm 101, for example an industrial robot arm for handling or assembling a work piece (or one or more other objects 113). The robot arm 101 includes manipulators 102, 103, 104 and a base (or support) 105 by which the manipulators 102, 103, 104 are supported. The term manipulator refers to the movable members of the robot arm 101, the actuation of which enables physical interaction with the environment, e.g. to carry out a task. For control, the robot 100 includes a (robot) controller 106 configured to implement the interaction with the environment according to a control program. The last member 104 (furthest from the support 105) of the manipulators 102, 103, 104 is also referred to as the end-effector 104 and includes a grasping tool (which may also be a suction gripper).
[0043] The other manipulators 102, 103 (closer to the support 105) may form a positioning device such that, together with the end-effector 104, the robot arm 101 with the end-effector 104 at its end is provided. The robot arm 101 is a mechanical arm that can provide similar functions as a human arm.
[0044] The robot arm 101 may include joint elements 107, 108, 109 interconnecting the manipulators 102, 103, 104 with each other and with the support 105. A joint element 107, 108, 109 may have one or more joints, each of which may provide rotatable motion (i.e. rotational motion) and/or translatory motion (i.e. displacement) to associated manipulators relative to each other. The movement of the manipulators 102, 103, 104 may be initiated by means of actuators controlled by the controller 106.
[0045] The term actuator may be understood as a component adapted to affect a mechanism or process in response to be driven. The actuator can implement instructions issued by the controller 106 (the so-called activation) into mechanical movements. The actuator, e.g. an electromechanical converter, may be configured to convert electrical energy into mechanical energy in response to driving.
[0046] The term controller may be understood as any type of logic implementing entity, which may include, for example, a circuit and/or a processor capable of executing software stored in a storage medium, firmware, or a combination thereof, and which can issue instructions, e.g. to an actuator in the present example. The controller may be configured, for example, by program code (e.g., software) to control the operation of a system, a robot in the present example.
[0047] In the present example, the controller 106 includes one or more processors 110 and a memory 111 storing code and data based on which the processor 110 controls the robot arm 101. According to various embodiments, the controller 106 controls the robot arm 101 on the basis of a machine-learning model stored in the memory 111, in particular, according to various embodiments, a Gaussian Process Hyperbolic Dynamical Model (GPHDM) as described in detail below.
[0048] The end-effector 104 may be a multi (e.g. five)-fingered hand, i.e. a robotic hand. This means that the end-effector 104 has a high number of degrees of freedom: in addition to its global pose, i.e. its position and orientation in space, it has additional degrees of freedom for finger joint angles. The increased amount of degrees of freedom increases the complexity of the control. In particular, approaches designed for control of parallel grippers are typically not suitable for controlling an end-effector 104 which has the form of a multi-fingered hand. In the following, a (robotic hand or gripper) pose is meant to includes the complete pose, i.e. the position and orientation of all components of the robotic hand which can move independently from each other.
[0049] As explained above, the GPHLVM model described by reference [4] allows generating embeddings of observed robotic hand poses in a hyperbolic manifold (i.e. an embedding space (or latent space) having the form of a hyperbolic manifold) such that high-dimensional observations belonging to the same taxonomy node are closely embedded in a hyperbolic space, forming distinct clusters and such that geodesics between them traverse intermediate clusters, aligning with the expected taxonomy structure, but generates motions which are physically impractical.
[0050] Therefore, according to various embodiments, a dynamics prior on hyperbolic manifold is incorporated into GPHLVM, resulting in an approach which is called Gaussian Process Hyperbolic Dynamical Model (GPHDM). It imposes a first-order Riemannian dynamical model prior on the embeddings learned by GPHLVM. This allows retrieving dynamics-aware (and thus physically-consistent motion) trajectories from geodesics generated in the GPHLVM latent space.
[0051] GPHDM differs from the classical GPDM in that it leverages the hyperbolic geometry on the latent space in order to accommodate the hierarchical structure of the observed data, which is associated with a corresponding taxonomy. It uses geodesics with respect to a pullback metric which provide as dynamically-consistent motion trajectories.
[0052] More specifically, according to various embodiments, a linear dynamics model is formulated based on the first-order Markov assumption on Riemannian manifolds to give a hyperbolic dynamics prior based on a Riemannian Gaussian distribution on the model's latent space, i.e. a dynamics prior is built on a first-order Riemannian linear dynamics, and motion trajectories on the hyperbolic space are generated via pullback metrics and geodesic interpolation.
[0053] In the following, an embodiment is described in detail.
[0054] First, a Gaussian process dynamical model (GPDM) on a hyperbolic manifold is formulated. To do so, a hyperbolic dynamics prior is formulated similarly to the Euclidean case. A linear dynamics model based on the Markov assumption is assumed
[0055] for a parameter matrix A.sup.N.sup.
.sub.t=[.sub.1(x.sub.t) . . . .sub.N.sub..sup.N.sup.
[0056] obtained from a current latent point x.sub.t as in the Euclidean case. It should be noted that in the Euclidean case, the exponential map reduces to a simple addition:
[0057] The remaining difference to the Euclidean case is the basis vector matrix V.sub.x.sub..sup.(D.sup.
.sup.D.sup.
is obtained.
[0058] Besides the linear dynamics model, noise is introduced to make the model probabilistic as follows:
[0059] It should be noted that the noise vector {tilde over ()}.sub.t.sup.D.sup.
.sup.(D.sup.
[0060] Having introduced noisy observations, the next step is to compute the probability density function of the state x.sub.t+1 given x.sub.t as follows:
[0061] More precisely, in equation (4), the PDF (probability density function) is formulated in terms of a Riemannian Gaussian distribution. Then, an approximation of the hyperbolic PDF by the Euclidean PDF on the tangent space w.r.t the current state x.sub.t is used. To achieve this, the logarithmic map of the mean .sub.A(x.sub.t) and the parallel transport of the degenerate covariance matrix are used. It should be noted that the Lorentzian (hyperbolic) tangent space vectors are represented in local coordinates and the definition {tilde over (x)}.sub.t+1: =V.sub.x.sub..sup.D.sup.
[0062] It should also be noted that as the basis vector matrix V.sub.x.sub.
[0063] Now the hyperbolic dynamics prior can be derived by marginalizing out the parameters A, similar to the Euclidean case. For a single trajectory X of N latent variables x.sub.1, . . . , x.sub.N the hyperbolic dynamics prior, following a first-order Markov chain assumption, is given as follows
[0064] Like in the Euclidean case, the marginalization integral over the parameters A is first built and then the Markov property is applied to compute the conditional single-step distributions. It should be noted that equation (11) handles only terms that lie on Euclidean tangent spaces, hence the next steps are the same as in the Euclidean case. Further, it should be noted that since Log.sub..sub.
[0065] The Gaussian Process Hyperbolic Dynamical Model (GPHDM) now combines the hyperbolic dynamics prior of (12) with the GPHLVM. The goal of the model is to embed each high-dimensional observed motions (i.e. trajectory, i.e. sequence of poses in real space, e.g. joint space)
Y=[y.sub.1. . . y.sub.N].sup.T.sup.ND.sup.
[0066] into the hyperbolic latent space such that a low-dimensional latent variable x.sub.t
(i.e. pose embedding) is obtained for each corresponding observed pose y.sub.t. Additionally, in the training of the model, the latent embeddings are required to preserve the trajectory structure of the high-dimensional motions while simultaneously resembling the graph structure of a robotics taxonomy.
[0067] This means that it is assumed that the robotics taxonomy is given as an undirected graph G=(V,E) with nodes V={c.sup.1, . . . , c.sup.N.sup.
[0068] Further, it is assumed that fully-labelled training data {(y.sub.t, c.sub.t)}.sub.t=1.sup.N is available, i.e., that for each observed pose y.sub.t.sup.D.sup.
[0069] Then, the optimization problem for determining the GPHDM optimization process can be written as follows:
[0070] where the latent mapping, dynamics mapping, stress loss, and back-constraints are given by
[0071] It should be noted that p(Y|X, ) denotes the latent mapping, or likelihood, of the model. For given observations Y, the embeddings X and hyperparameters are optimized, wherein the latter include the noise variance, the kernel outputscale, and the lengthscale. In other words, latent variables X and hyperparameters should be find that describe the given data Y as close as possible in the sense of maximum likelihood estimation. However, the model would have no incentive to structure its latent space by solely optimizing the likelihood.
[0072] Therefore, the dynamics mapping p(X|) is added as a prior to the objective function of the optimization problem (13), turning it into a maximum a posteriori estimation. It should be noted that the dynamics mapping induces consecutive latent points x.sub.t, x.sub.t+1 to keep close together and consequently form smooth latent trajectories. Additionally to the dynamics mapping, the stress loss .sub.stress is subtracted, which can be viewed as a second prior.
[0073] It should be noted that that maximizing the negative stress loss minimizes the difference between the pairwise node distance on the taxonomy graph (i.e. pose similarities according to the taxonomy) and the distance of the corresponding embeddings in the latent space. This induces the latent embeddings to resemble the taxonomy graph. The three individual losses, i.e., the latent mapping, the dynamic mapping, and the stress loss, are weighted using scalar weights .sub.1, .sub.2, .sub.3(0, 1) and are summed up to obtain the final GPHDM loss.
[0074] While it is possible to optimize this loss directly over the latent variables X, the parameters (e.g. weights) W of an encoder function g.sub.W (e.g. a neural network) are optimized. This also referred to as back-constraints. That allows embedding new observations after the training directly into the latent space without additional training using the encoder (function) trained in this manner. Furthermore, the encoder can be adapted to incorporate information about the taxonomy structure.
[0075] According to one embodiment, point-wise multiplication is used to combine the SE (squared exponential) kernel on the observations K.sub.Y with the graph-Matrn kernel K.sub.G on the taxonomy nodes.
[0076] In total, according to various embodiments, the GPHDM relies on four different kernel functions. One multi-output hyperbolic heat kernel for the latent mapping K.sub.X.sup.ND.sup.
.sup.(N1)D.sup.
.sup.NN and one graph-Matern kernel on the taxonomy nodes also for the back-constraints K.sub.G
.sup.NN.
[0077] It should be noted that if one kernel function is shared across dimensions, the size of the kernel matrices can be significantly reduced to
[0078] K.sub.X.sup.NN and K.sub.1:N1
.sup.(N1)(N=1) which allows for lower memory requirements and faster training. It should further be noted that the graph-Matrn kernel K.sub.G can only be constructed when fully-labelled training data is available.
[0079] To optimize Equation (13), Riemannian Adam may for example be used, which is a first-order optimization method. First-order optimization methods rely on an initialization of the optimized variables. A typical choice for the initialization of the latent variables X.sub.init is the Principal Component Analysis (PCA) which spans a linear subspace such that the retained amount of variance in the data is maximized. The problem is that, typically, high-dimensional motions are highly non-linear. Thus, the projection into the linear subspace cannot preserve the data's structure. Since GPLVM models require a good initialization to avoid getting stuck in local optima, initializing the latent variables with something other than PCA might offer an improved model performance. In view of that, according to various embodiments, the latent variables are initialized by optimizing the stress loss
[0080] This initialization is referred to as stress loss initialization. The minimization in equation (18) itself also requires an initialization for the latent variables, for which PCA can be used or random latent variables can be chosen.
[0081] In summary, the optimization process involves three steps: [0082] (i) obtaining the initial latent variables from PCA. [0083] (ii) optimizing the stress loss on these latent variables, which typically converges after a few seconds. [0084] (iii) optimizing the full GPHDM as detailed in equation (13) on the stress loss initialized latent variables.
[0085] Similarly as the GPHLVM latent space, the GPHDM latent space can be exploited to plan motions by following trajectories in the low-dimensional latent space. The latent space geometry can be exploited to plan motions by following geodesics, i.e., shortest paths, between two embeddings in the hyperbolic latent space. However, such trajectories do not account for the uncertainty of the model and may pass through regions of the latent space where no training data is available to support motion prediction, causing the prediction to revert to the non-informative default case, i.e., to the Gaussian process (GP) mean. Therefore, according to various embodiments, geodesics in the hyperbolic latent space are computed according to a metric related to the high-dimensional motion metric, the so-called pullback metric. Geodesics computed according to the pullback metric tend to avoid regions with high uncertainty, thus leading to the generation of motions that follow the training data. Assuming a Euclidean observation space, the pullback metric G.sup.D.sup.
.sup.D.sup.
.sup.D.sup.
[0086] with J.sup.D.sup.
[0087] with mean and covariance
[0088] This distribution induces the metric tensor to follow a non-central Wishart distribution
[0089] with mean prediction for the metric tensor given by
[0090] According to one embodiment, the computation of the pullback metric G is adapted to the hyperbolic latent space. This is achieved by considering the fact that the kernel k in (21) is a hyperbolic kernel and by adapting the computation of the first and second derivative of the kernel accordingly. The resulting pullback metric G is then used within the classical geodesic equation, which is solved to compute geodesics in the hyperbolic latent space. The resulting geodesics follow the transitions between classes defined in the taxonomy, while avoiding uncertain regions of the latent space. More precisely, once the pullback metric has been estimated, the motion trajectory generation boils down to first compute a geodesic on the GPHDM latent space, and decode this geodesic via the Gaussian process of the GPDM. Specifically, two different points x,y (i.e., points corresponding to latent motion poses) are chosen and the shortest curve that connects these two points is determined by solving,
[0091] Given the metric G.sub.(s).sup.DD of a D-dimensional manifold
, a geodesic can be obtained by solving a set of ordinary differential equations called the geodesic equations,
[0092] where i(s) denotes the i-th coordinate of the curve point and .sub.ij.sup.k are the Christoffel symbols
[0093] Finally, the latent points representing the geodesics on the latent space of the GPDM can then be straightforwardly decoded to the original space (or observation space) via the Gaussian process of the model.
[0094]
[0095] Hand poses 201 according to different grasp types are embedded into a 2D hyperbolic latent space 202. The dynamic prior of the GPHDM gives smooth latent trajectories, which are organized according to the grasp taxonomy thanks to the back-constraints and stress loss. Each point in the latent space 202 can be mapped to a hand pose (using the function f). Following a geodesic 203 according the pullback metric leads to trajectory following the taxonomy structure and passing through low-uncertainty regions of the latent space 202.
[0096] A diagram 204 shows the motion of one DoF of the hand 205 when following the geodesic 203: it transitions between the motion according one observed trajectory (where the geodesic 203 starts) to the motion according to another observed trajectory (where the geodesic 203 ends). Decoding the geodesic 203 results in a sequence of hand poses 206 with smooth and realistic interpolation between poses.
[0097] Following a geodesic 207 which follows the hyperbolic metric would instead pass through an area with high uncertainty.
[0098] In summary, according to various embodiments, a method is provided as illustrated in
[0099]
[0100] In 301, for each robotic pose (e.g. robotic hand pose) of a plurality of predetermined (e.g. observed) robot trajectories (e.g. robotic hand trajectories, i.e. sequences of hand poses, e.g. ending in a grasping pose), a respective embedding in an embedding space having the structure of a hyperbolic manifold is determined.
[0101] This is done by searching an optimum of an objective function which incites (i.e. rewards with a respective reward term (or, equivalently, loss term), for each of the predetermined robot trajectories, the embeddings of the robotic poses of the predetermined robot trajectory to follow pre-defined (hyperbolic space) dynamics of the embedding space (i.e. a hyperbolic dynamics prior, i.e. a probability distribution of sequences of embedding space elements (i.e. of poses in embedded form)). In other words, an optimization problem with such a kind of objective function is solved (or at least some iterations for solving it are performed since the perfect optimum is typically not found).
[0102] In 302, determining, for a starting pose from which the robot is to be controlled, a start embedding in the embedding space is determined and, for a desired end pose, an end embedding in the embedding space is determined. Further, a geodesic between the start embedding and the end embedding according to a pullback metric of the embedding space (with respect to Euclidean metric of the real robotic pose space and the mapping from embedding space to robotic pose space, i.e. the decoding function) is determined and, in 303, the robot is controlled according to a sequence of robotic poses given by the determined geodesic.
[0103] The approach of
[0104] Various embodiments may receive and use image data (i.e. digital images) from various visual sensors (cameras) such as video, radar, LiDAR, ultrasonic, thermal imaging, motion, sonar etc., for example as a basis for determining the desired end pose (e.g. a suitable grasp pose for grasping or otherwise manipulating an object).
[0105] The method of
[0106] Accordingly, according to one embodiment, the method is computer-implemented.