METHOD AND SYSTEM FOR DETERMINING OPTIMIZED PROGRAM PARAMETERS FOR A ROBOT PROGRAM

Abstract

The invention relates to a method for determining optimized program parameters for a robot program, wherein the robot program is used to control a robot having a manipulator, preferably in a robot cell, comprising the steps: creating the robot program by means of a component-based graphical programming system on the basis of user inputs, wherein the robot program is formed from program components which are parameterizable via program parameters, and wherein initial program parameters are generated for the program components of the robot program; providing an interface for selecting one or more critical program components, wherein optimizable program parameters can be defined for the critical program components; carrying out an exploration phase for exploring a parameter range in relation to the optimizable program parameters, the robot program being carried out multiple times, the parameter range being scanned for the critical program components and trajectories of the robot being recorded such that training data are present for the critical program components; carrying out a learning phase in order to generate component representatives for the critical program components of the robot program on the basis of the training data collected in the exploration phase, wherein a component representative represents a system model which, in the form of a differentiable function, maps a specified state of the robot and specified program parameters to a predicted trajectory; carrying out an inference phase for determining optimized program parameters for the critical program components of the robot program, wherein optimizable program parameters of the component representative are iteratively optimized in respect of a specified target function by means of a gradient-based optimization method using the component representative. The invention furthermore relates to a corresponding system.

Claims

1. A method for determining optimized program parameters for a robot program, wherein the robot program is used to control a robot having a manipulator, comprising the steps: generating the robot program by means of a component-based graphical programming system on the basis of user inputs, wherein the robot program is formed from program components which are parameterizable via program parameters, and wherein initial program parameters are generated for the program components of the robot program; providing an interface for selecting one or more critical program components, wherein optimizable program parameters can be defined for the critical program components; carrying out an exploration phase for exploring a parameter space in relation to the optimizable program parameters, the robot program being executed multiple times, the parameter space being sampled for the critical program components and trajectories of the robot being recorded such that training data are available for the critical program components; carrying out a learning phase in order to generate component representatives for the critical program components of the robot program on the basis of the training data collected in the exploration phase, wherein a component representative represents a system model which, in the form of a differentiable function, maps a specified state of the robot and specified program parameters to a predicted trajectory; carrying out an inference phase for determining optimized program parameters for the critical program components of the robot program, wherein optimizable program parameters of the component representatives are iteratively optimized with respect to a specified target function by means of a gradient-based optimization method using the component representatives.

2. The method according to claim 1, wherein parameter domains are defined for the optimizable program parameters, wherein the optimizable program parameters are optimized via the parameter domains.

3. The method according to claim 1, wherein the parameter domains for the optimizable program parameters are at least one of specified, able to be specified or able to be set.

4. The method according to claim 1, wherein in the exploration phase for sampling the parameter space, the optimizable program parameters are sampled from their respective parameter domain.

5. The method according to claim 1, wherein the robot program is stored in a serialized form in a format that allows reconstruction and parameterization of the robot program or its program components.

6. The method according to claim 1, wherein for an execution of the robot program, a sampled trajectory is stored in such a way that an associated program component and a parameterization of the associated program component can be uniquely assigned to each data point of the trajectory at the time of the respective execution.

7. The method according to claim 1, wherein in the exploration phase the robot program is executed automatically, wherein at least 100 executions or at least 1000 executions of the robot program are carried out to extract the training data.

8. The method according to claim 1, wherein the training data collected in the exploration phase for each execution of the robot program comprises a parameterization of the critical program components, and a sampled trajectory of the critical program components.

9. The method according to claim 1, wherein the training data collected in the exploration phase for each executed program component comprises at least one of an ID or a status code.

10. The method according to claim 1, wherein in the learning phase for the critical program components, learnable component representatives are first generated, wherein the learnable component representatives are trained with the training data of the exploration phase in order then to represent system models for sub-processes encapsulated in the associated critical program components as component representatives.

11. The method according to claim 1, wherein the component representatives comprise a recurrent neural network

12. The method that wherein to generate the component representatives an analytical trajectory generator is placed upstream of the recurrent neural network, the analytical trajectory generator being designed to generate a prior trajectory.

13. The method according to claim 1, wherein the target function is defined in such a way that the target function maps a trajectory to a rational number and that the target function is differentiable with respect to the trajectory.

14. The method according to claim 1, wherein the target function comprises at least one of a predefined function, a parametric function, or a neural network.

15. The method according to claim 1, wherein the target function comprises a function based on a force measurement.

16. The method according to claim 1, wherein with the interface a critical sub-sequence of the robot program can be selected, wherein the critical sub-sequence comprises a plurality of critical program components, wherein the component representatives of the plurality of critical program components are combined into a differentiable overall system model that maps the program parameters of the critical sub-sequence to a combined trajectory, so that the optimizable program parameters are optimized with respect to the target function for a contiguous sub-sequence of critical program components.

17. A system for determining optimized program parameters for a robot program, wherein the robot program is used to control a robot having a manipulator, comprising: a component-based graphical programming system for generating a robot program on the basis of user inputs, wherein the robot program is formed from program components which are parameterizable via program parameters, and wherein initial program parameters can be generated for the program components of the robot program; an interface for selecting one or more critical program components, wherein optimizable program parameters can be defined for the critical program components; an exploration module for exploring a parameter space in relation to the optimizable program parameters, the robot program being executed multiple times, the parameter space being sampled for the critical program components and trajectories of the robot being recorded such that training data are available for the critical program components; a learning module for generating component representatives for the critical program components of the robot program on the basis of the training data collected in the exploration phase, wherein a component representative represents a system model which, in the form of a differentiable function, maps a specified state of the robot and specified program parameters to a predicted trajectory; an inference module for determining optimized program parameters for the critical program components of the robot program, wherein optimizable program parameters of the component representatives are iteratively optimized with respect to a specified target function by means of a gradient-based optimization method using the component representatives.

18. The method according to claim 4, wherein the optimizable program parameters are sampled in a uniformly distributed manner or adaptively sampled.

19. The method according to claim 5, wherein the format comprises at least one of a sequential execution sequence of the program components, types of program components, IDs of the program components, constant program parameters or program parameters that can be optimized.

20. The method according to claim 1, wherein the robot program is used to control the robot having the manipulator in a robot cell.

Description

[0077] In the drawings

[0078] FIG. 1 shows an activity diagram for a method for determining optimized program parameters for a robot program according to an exemplary embodiment of the invention,

[0079] FIG. 2 shows a supplementary activity diagram for the exemplary embodiment according to FIG. 1, wherein the exploration phase indicated in FIG. 1 is illustrated,

[0080] FIG. 3 shows an exemplary robot program for a force-controlled spiral search, wherein the critical sub-program is outlined with a solid line,

[0081] FIG. 4 shows an exemplary robot program for a force-controlled contact run, wherein the critical sub-program is outlined with a solid line,

[0082] FIG. 5 shows an activity diagram in a schematic view for a system for determining optimized program parameters for a robot program according to an exemplary embodiment of the invention,

[0083] FIG. 6 shows a schematic representation of the database scheme implemented in an exemplary reference implementation for a system or a method according to an exemplary embodiment of the invention,

[0084] FIG. 7 shows a schematic illustration of a differentiable robot program,

[0085] FIG. 8 shows a schematic illustration of a differentiable program component in accordance with one exemplary embodiment of the invention,

[0086] FIG. 9 shows a schematic illustration for illustrating a simplified calculation graph of a differentiable component representative, and

[0087] FIG. 10 shows a recurrent network architecture for one exemplary embodiment of the invention.

[0088] FIG. 1 and FIG. 2 show an activity diagram for a method for determining optimized program parameters for a robot program according to an exemplary embodiment of the invention,

[0089] From a process point of view, the method according to an embodiment of the invention has different versions or possible applications in the programming, commissioning and maintenance phases of production plants or robot cells. FIG. 1 and FIG. 2 show an overview of the method steps of the exemplary embodiment, including optional method steps which can be skipped depending on their type. In general, in each of the three abovementioned phases in the life cycle of a plant or a robot there is a possible variant of the exemplary embodiment. The following describes the method according to the exemplary embodiment for the programming, commissioning and maintenance phases.

A. Programming Phase

[0090] I. Defining the program structure: The robot programmer creates a robot program from parameterizable program components (motion templates), which map atomic movements of the robot. The robot program consists of a sequence of arbitrary force- or position-controlled program components. The sequence of program components maps the steps necessary to solve the application task. [0091] Example of force-controlled spiral search: FIG. 3 shows a schematic illustration of an exemplary semi-symbolic robot program 1 for the force-controlled spiral search. The critical sub-program 2 or the critical program components 3 and 4 of the robot program 1 are solidly outlined in FIG. 3. [0092] Example of contact run: FIG. 4 shows a schematic illustration of an exemplary semi-symbolic robot program 5 for a force-controlled contact run. The critical sub-program 6 or the critical program components 7 and 8 of the robot program 5 are solidly outlined in FIG. 4.

[0093] The execution semantics of a component of the type “Linear motion” 3 or 7 (cf. FIGS. 3 and 4) can be described as follows: given a target pose as well as a velocity v and acceleration a, move the robot such that the tool center point (tool coordinate system of the robot) describes a linear path in Cartesian space from the current tool pose to the target pose with the specified velocity and acceleration.

[0094] The execution semantics of “Contact run (relative)” 8 (cf. FIG. 4) can be described as follows: given a motion specification relative to the current position of the tool coordinate system of the robot (for example, “1 centimeter translation in z direction and 3° rotation about the X-axis”), a force specification that specifies the contact force along the Z-axis of the tool coordinate system, as well as a velocity v and acceleration a, move the robot along a linear path in Cartesian space according to the motion specification and with the specified acceleration and speed until the specified force is reached. The execution of the motion is considered successful when the force specification has been reached (contact established), otherwise as failed.

[0095] II. Definition of the initial program parameters: A robot programmer can manually define the initial parameters of the program components using common methods (teach-in, CAD to Path, . . . ) to solve the application task approximatively (possibly violating the specified cycle times and quality requirements).

[0096] III. Fine tuning of the parameters of relevant sub-programs: The robot programmer uses a method according to an exemplary embodiment of the invention for the automatic optimization of program parameters to meet cycle-time specifications and quality requirements.

[0097] III.a. Selection of critical sub-programs: The robot programmer selects critical sub-sequences of the program (i.e. critical sub-programs) or individual critical program components, the program parameters of which are to be optimized. [0098] Example of force-controlled spiral search: Here, the critical sub-program 2 consists of the sequence [“Linear motion”, “Spiral search (Relative)”], since the parameters of the linear motion (in particular its target position) fundamentally affect the position and orientation of the spiral search (cf. FIG. 3). [0099] Example of contact run: Here, the critical sub-program 6 consists of the sequence [“Linear motion”, “Contact run (Relative)”], since the Z-coordinate of the target pose of the linear motion in particular fundamentally affects the expected length of the contact run (cf. FIG. 4).

[0100] III.b. Selection of the parameters to be optimized: Depending on the environment and application task, certain program parameters of the critical sub-programs or the critical program components must be labeled as constants in order to ensure safety or quality requirements. This concerns, for example, target poses of movements in areas of the robot cell with restricted accessibility or lower or upper force limits of force-controlled movements. The designation of constant parameters is application-specific and requires domain knowledge, but in many cases can be determined already at the cell design stage using the CAD models of the cell, process simulation software, if used, and offline robot simulation software. [0101] Example of force-controlled spiral search: In spiral search movements, primarily speed and acceleration, but also the extent of the spiral along its principal axes, the orientation of the spiral and the distance between the turns are critical to the success of the search action and must therefore be optimized. In addition, the Z-component of the orientation of the target pose of the preceding linear motion (in Tait-Bryan angles) is relevant, as this specifies the orientation of the (planar) spiral. The parameters [Extent (X), Extent (Y), Distance between spiral arms, v, a] of “Spiral search (relative)” as well as the Z-rotation component of the target position input of the linear motion can therefore be optimized, all other parameters are labeled as constant (cf. FIG. 3). [0102] Example of contact run: In order not to exceed force limits, speed and acceleration are particularly critical. The Z-component of the upstream linear motion together with the position of the workpiece determines the length of the contact run. Optimizable parameters here are [v, a] of “Contact run (relative)” and the z-coordinate of the target pose input of “Linear motion” (cf. FIG. 4).

[0103] Program parameters of a program component can be either input (target poses, target forces, etc.) or intrinsic parameters (velocity, acceleration). Both parameter types can be optimized.

[0104] III.c. Definition of the domain for optimizable parameters: For each program parameter of the critical sub-programs or the critical program components that is to be optimized or optimizable (i.e. not constant), the robot programmer can select a permissible value range over which the parameter is to be optimized. This is application-specific and usually sufficiently narrow that all safety requirements on the manufacturing process as well as minimum quality and cycle-time requirements can be satisfied. [0105] Example of force-controlled spiral search: The speed and acceleration limits of successful spiral searches are strongly dependent on the robot and the environment, but are usually in the range [0.001 m/s, 0.005 m/s] and [0.001 m/s.sup.2, 0.05 m/s.sup.2] respectively. The expected scatter of the hole positions is typically in the millimeter range, and the limits for the extent and the turn interval of the spiral are determined accordingly. The domain of the Tait-Bryan z-component of the target pose of the linear motion is based on the approximate point symmetry of the spiral [0, 180°] (cf. FIG. 3). [0106] Example of contact run: Here, the correct restriction of the velocity domain is safety-critical, since during fast contact runs, forces of any magnitude can occur before the force controller stops the movement. Restriction of the domain must also be used to prevent a collision with the workpiece during the execution of the linear motion (cf. FIG. 4).

[0107] III.d. Exploration phase: An automatic stochastic exploration of the parameter space is carried out. The robot program is now executed automatically under realistic conditions, but not yet in the production environment (for example, 1000<N<10,000). For each execution, the program parameters to be optimized are sampled from their respective domain. For example, this takes place via an equally distributed sampling. During execution, the position and orientation of the tool center point (TCP) of the robot as well as the forces and torques occurring at the TCP are sampled at an arbitrary but fixed sampling interval (8 ms<Δt<100 ms) and stored in a database. In addition to the data of each executed program component, an ID with which the program component can be identified in the robot program as well as a status code are transferred from the robot to the database. The status code identifies whether the executed action was successfully completed, according to the semantics of the program component. Force-controlled runs to contact end successfully, for example, if contact has been established and the contact force is within a set tolerance range. In addition, the randomly generated program parameters are stored in the database and associated with the program execution. The sampling interval Δt is application-specific and can be specified by the programmer. Large sampling intervals reduce the amount of data to be processed and stored and simplify the learning problem, reducing the number of necessary program executions (N), but leading to aliasing and undersampling in high-frequency or vibrating processes. The number of program executions N is also application-specific and depends on the complexity and length of the robot movements, the (non-)linearity of the force and torque profiles during interactions of the robot with the environment, and the stochasticity of the process. If workpiece variances are expected, workpieces of different batches should be used during the exploration phase in order to teach in the workpiece variances.

[0108] III.e. Learning phase: The system models are automatically trained. For each program component of the critical sub-programs, on the basis of the previously collected parameter sets and trajectories a system model is learned which maps the component parameters to the expected positions and orientations of the TCP, the expected forces and torques as well as the expected status code. No user interaction is required for the training. The duration of the training depends on the number and complexity of the program components as well as on the number, length, and sampling characteristics of the trajectories in the training data set.

[0109] In this context, a “system model” can be defined as a mathematical function ƒ, which outputs the expected trajectory Ŷ given the input parameters x and the system state p. ƒ therefore implicitly includes the program logic (the translation of x into control commands for the robot by the robot program), the kinematics and dynamics of the robot, and the physical properties of the environment.

[0110] III.f. Specification of the target function: An arbitrary target function is defined, with respect to which the program parameters are to be optimized. Each target function is valid if it maps a trajectory to a rational number and can be differentiated with respect to the trajectory. Concave target functions simplify the optimization problem because they only have one (global) maximum and the result of the optimization is independent of the initial parameterization. For non-concave target functions with local maxima, the optimization is sensitive to the initial parameterization. Arbitrary target functions can be combined by weighted addition, wherein local maxima can be created by the addition. By using iterative Monte-Carlo methods, the convergence of the optimization to globally optimal parameter sets, given the correctness of the learned system model, can be asymptotically guaranteed. The specification of the target function is application-specific and may need to be carried out by an expert in the respective production domain. A gradient-based optimization method is used for the optimization and the target function is expressed as a loss function for the equivalent minimization problem. Examples of simple loss functions are the cycle time, the path length in the Cartesian or configuration space, or the error probability. Complex loss functions are the distance to one or more reference trajectories, for example from human-performed demonstrations, or the deviation of specified contact forces at the end of a trajectory or during the execution of a program component. An initial target function can be automatically generated by inference over a knowledge base from the semantics of the components of the critical program parts and adjusted by the programmer using a graphical user interface. [0111] Example of force-controlled spiral search: By specifying a combined loss function from error probability and cycle time or path length, force-controlled spiral search movements can be optimized for the optimum balance between cycle-time and reject minimization. With regard to the learned system model, the optimization results in parameters that optimally balance the radii along the principal axes, distance between the turns, orientation, velocity, and acceleration. [0112] Example of contact run: Force-controlled contact runs can be optimized in their dynamic properties such that the average target force is achieved as precisely as possible, by specifying a loss function proportional to the distance of the predicted force along the Z axis from a specified target force.

[0113] III.g. Inference phase: The system models are optimized automatically. For each critical sub-program, the learned system models of the associated program components are automatically combined to form an overall model, which maps the parameters of the sub-program to the combined sub-trajectory. A gradient-based optimization algorithm iteratively optimizes the program parameters with respect to the specified target function. The optimized program parameters are automatically transferred to the robot program. [0114] Example of force-controlled spiral search: In spiral search movements, global parameter optima typically result in maximum coverage of the probability mass of the expected hole distributions while simultaneously maximizing velocity and acceleration to the point where further velocity increases come at the cost of excessive error rates. The orientation of the principal axes of the spiral are matched to the principal axes of the hole distribution. [0115] Example of contact run: After optimization the velocity and acceleration parameters of contact runs guarantee the maximum possible probability of reaching and not exceeding the specified target contact force. With simultaneous cycle time minimization, the length of the contact run is minimized by lowering the target position of the preceding linear motion.

[0116] IV. Manual acceptance by the programmer/user: The robot programmer runs the optimized robot program repeatedly and ensures compliance with all safety, cycle-time and quality requirements. Quantitative, statistical methods may be used for the measurement and process parameters.

B. Commissioning Phase

[0117] I. Adjustment of program parameters during ramp-up: Once the robot cell has been integrated into the rest of the production line, production usually starts with lower quality, reduced quantities, or higher reject rates. This is often due to minimal deviations in the environment, workpieces or structure compared with the programming phase. The usual practice is the manual, iterative adjustment of the program parameters in order to bring the process back within the specified cycle-time and quality limits. Existing tools for automatic process optimization or for tuning controller parameters only partially automate the optimization process and only for certain parameters or movements. Using a simplified version of the procedure described in A.III, the operator can adjust the parameters of the robot program fully automatically to suit the changed conditions. Steps A.III.a to A.III.c can be skipped, because the hyperparameters of the method set there are robust against stochastic changes in the system or environment. The number of training data samples required (cf. A.III.d) is a factor of 10-20 lower than in the programming phase, since the existing system models can be reconditioned to the changed environment using transfer learning methods. Step A.III.f can also be skipped in many cases if the cycle-time and quality specifications have not changed compared to the programming phase. Here, however, it is also possible to adapt the target function to the changed conditions in the plant. [0118] Example of force-controlled spiral search: During commissioning, the integrator notices that components from a different manufacturer are used in production than those for which the robot cell was finely adjusted during the programming phase. For example, the mean orientation of the pins of electronic components has a stochastic offset of up to 2° compared to the programming phase, which causes a large number of search movements to fail and the cycle-time specifications can no longer be met. By retraining the system model and parameter inference, the distribution of the offset can be implicitly estimated and compensated by the new program parameters. [0119] Example of contact run: During commissioning, the plant worker notices that due to the transport of the cell, the positioning of the boards to be populated deviates on average by 1 mm in the Z-direction from the expected height, which means that contact runs for placing components take 0.5 seconds longer on average. The original cycle time can be restored by retraining the system model and parameter inference.

C. Maintenance Phase/Series Production

[0120] I. Compensation of process and workpiece variances: During production runs, changes in the environment, the production plant or the workpieces may occur. If a manufacturer or batch is changed, components may have different surface or bending properties. In addition, the system behavior can change over the course of the operating time of the plant due to maintenance work on the plant, replacement of motors and sensors, or wear effects. Using a simplified version of the procedure described in A.III, the operator can adjust the parameters of the robot program fully automatically to suit the changed conditions. Steps A.III.a to A.III.c can be skipped, because the hyperparameters of the method set there are robust against stochastic changes in the system or environment. The number of training data samples required (cf. A.III.d) is a factor of 10-20 lower than in the programming phase, since the existing system models can be reconditioned to the changed environment using transfer learning methods. Step A.III.f can also be skipped if the cycle-time and quality specifications remain the same. [0121] Example of force-controlled spiral search: Due to wear effects of the positioning system of the electronic circuit boards to be populated, the variance of the hole positions has increased significantly after long operation of the production system, so that the circuit boards can no longer be reliably populated. By retraining the system model again, the new hole distribution can be implicitly estimated and the spiral search movements can be re-parameterized by parameter inference in order to comply with the quality specifications by expanding the search region and refining the search grid.

[0122] II. Adaptation to new target specifications: If, for example due to reconfigurations at other points on the production line, cycle-time specifications or quality requirements change, the operator can adapt the parameters of the robot program to the new specifications by executing steps A.III.f and A.III.g by specifying a corresponding target function. The existing system models remain valid and can be reused without retraining. [0123] Example of contact run: Due to a supplier change, the pins of the installed electronic components are less resilient than before and become warped at the currently designated contact force. By reducing the force specification of the corresponding target function and repeated parameter inference without retraining, a new parameterization can be found which ensures the new, lower contact force.

[0124] FIG. 5 shows a schematic view of an exemplary system architecture with individual system components for a system for determining optimized program parameters for a robot program according to an exemplary embodiment of the invention,

System Components:

[0125] a. Robot cell 9 with six-axis industrial manipulator: It is assumed that it is possible to measure forces and torques at the TCP. An external force-torque sensor may be required for this.

[0126] b. Component-based graphical programming system 10 for programming and executing robot programs: For the creation of the initial robot program, its parameterization and execution on the robot controller, a software system with a graphical user interface is required which can process semi-symbolic robot programs, compile them into executable robot code and execute them on the robot controller.

[0127] c. Database 11 for robot programs and trajectories: In database 11 robot programs are stored in serialized form in a format that allows the reconstruction of the program structure and parameterization (execution sequence, type and unique IDs of the program components, constant and optimizable parameters of the program components). For each execution of the robot program, the database contains a sampled trajectory consisting of the position and orientation of the TCP, forces and torques on the TCP, and the status code of the program component belonging to the data point. The memory format is such that the associated program component and the parameterization of the program component can be uniquely assigned to each data point of a trajectory at the time of execution. FIG. 6 shows a schematic representation of the database schema implemented in an exemplary reference implementation.

[0128] d. Learning system 12 for differentiable component representatives: The learning system 12 transforms a serialized representation of the program structure of the critical sub-programs into a set of differentiable (parameter-optimized) motion primitives. Each differentiable motion primitive is a functionally equivalent analog (“representative”, “system model”) to a component instance from the sub-program, which maps the parameters of the component instance onto a trajectory expected during execution.

[0129] A component representative is defined as a system model at the component level or a model of the execution of the corresponding program component. A component representative for program component B is therefore a mathematical function ƒ.sub.B which, given the input parameters x.sub.B of the program component and the system state p, outputs the expected trajectory Ŷ.sub.B that will result when the program component is executed on the robot. Component representatives are therefore mathematical models of the execution of program components. These models can be learned on the basis of training data and can be differentiated, i.e. they allow the calculation of the derivative of Ŷ.sub.B with respect to x.sub.B. This allows the optimization of x.sub.B with gradient-based optimization methods. Since all component representatives are differentiable models of the execution of program components, a program according to FIG. 7 composed of component representatives can also be differentiated and enables the joint optimization of the parameters of all the component representatives contained in the program for a target function over the entire trajectory. This differentiable and thus optimizable representation of robot programs is the basis of an optimization procedure for program parameters according to an exemplary embodiment of the invention.

[0130] e. Knowledge base or ontology 13 of component-specific sub-targets: In many cases, the target function for the parameter optimization contains sub-targets that result directly from the execution semantics of the component types. For example, a force-controlled contact run has an implicit contact target in a specified force range. These implicit sub-targets are stored in a knowledge base in the form of an ontology. At the time of the specification of the target function, reasoning over the ontology is used to create an initial target function from the given program structure, which maps these implicit sub-targets. This can be adapted by the user and supplemented by additional application-specific sub-targets. The use of ontologies or knowledge bases for automatic bootstrapping of target functions represents a major advantage.

[0131] An ontology is a structured representation of information with logical relations (a knowledge database), which makes it possible to draw logical conclusions (reasoning) from the information contained in the ontology using suitable processing algorithms.

[0132] Most ontologies follow the OWL standard (https://www.w3.org/OWL/). Examples of ontologies are BFO (https://basic-formal-ontology.org/) or LapOntoSPM (https://pubmed.ncbi.nlm.nih.gov/26062794/). The most common software framework for reasoning is HermiT (http://www.hermit-reasoner.com/). OWL and HermiT can be used in an exemplary implementation according to an exemplary embodiment.

[0133] In an exemplary reference implementation according to an exemplary embodiment of the invention, the developed ontology forms a “database for predefined target functions”, on which by reasoning from a given semi-symbolic robot program it is possible to automatically derive target functions which due to the fixed semantics of the program blocks must always be valid, for example, that a “Contact run (relative)” component should produce a contact force along the Z-axis of the tool coordinate system or that in a “linear motion” component the target point should be reached as precisely as possible. This reduces the task of specifying the target function for the user to the aspects of the target function that do not already follow from the semantics of the program components, but, for example, from the application (contact forces, speeds, . . . ) or for business-related reasons (minimization of the cycle time, . . . ).

[0134] f. System 14 for specifying differentiable target functions: Differentiable target functions are initially calculated in software by means of reasoning over the knowledge base of the component-specific sub-targets and can then be edited by the user using an interface if necessary. The resulting internal representation of the combined target function is then translated into a differentiable calculation graph of the loss function for the equivalent minimization problem.

[0135] Three types of target functions are possible and can be combined with one another as required: [0136] Predefined functions: Classical process parameters such as cycle time or path length, which output a variable to be minimized. If the above user interface is used, these must only be selected by the user. [0137] Parametric functions: Predefined functions that have additional user-definable parameters. Examples are distance functions to specification values such as contact forces, tightening torques, or Euclidean target poses. The specified values can be set by the user via an interface. [0138] Neural networks: Since any differentiable functions can be used as target functions, neural networks can also be used as differentiable function approximators for complex target functions.

[0139] g. Inference system 15 for optimal robot parameters: The inference system 15 forms an end-to-end optimizable calculation graph for each critical sub-program by considering the specified target function and the trained component representatives. On this graph, the inference algorithm calculates the optimal program parameters for the specified target function. This system is novel in its design and application in industrial robotics.

External Interfaces:

[0140] Graphical user interface for creating, editing and executing robot programs: A graphical user interface is provided for the initial creation and manual editing of program structure and program parameterization. In an exemplary reference implementation of a method according to an exemplary embodiment, the ArtiMinds Robot Programming Suite (RPS) is used as an interface to create and parameterize robot programs in the semi-symbolic ArtiMinds Robot Task Model (ARTM) representation. The user interface also provides infrastructure for running loaded robot programs on the robot controller. [0141] Machine interface for reading, writing and saving robot program structure and parameterization as well as version control: During the learning phase, the parameter space is randomly sampled and the parameterized robot programs are stored in a database in a version-controlled form (cf. System component a.). In order to automate this process, a machine interface is provided to import parameter sets generated by the learning framework into the robot program, and to store the parameterized robot program after execution permanently in a database with version control in order to associate the resulting trajectory with the program structure and parameterization at the time of training. In the exemplary reference implementation, the control plugin of the ArtiMinds RPS fulfills this function. [0142] Machine interface for recording robot trajectories: The executed robot trajectories are sampled. The position, orientation, force and torque data that can be read off the robot controller are transformed geometrically into poses, forces and torques at the TCP in world coordinates. After each component has been executed, a Boolean value is calculated on the robot controller, which indicates whether the component has been executed successfully. This data is transferred to a database via a machine interface. Both database and interfaces are provided in the exemplary reference implementation by the ArtiMinds RPS and LAR (Learning and Analytics for Robots). [0143] User interface for creating and editing differentiable target functions: The exemplary reference implementation comprises a console-based dialog system, via which the user can interactively adapt the sub-targets calculated in advance from the knowledge base and supplement them with further sub-targets.

[0144] In the context of an exemplary embodiment of the invention, the following phases—namely exploration phase, learning phase and inference phase—can be executed and implemented, components of this exemplary embodiment being illustrated by FIGS. 8, 9 and 10:

Exploration Phase:

[0145] Automatic sampling of the parameter space: The automatic random sampling of parameter configurations (or the optimizable program parameters) from their respective domains was implemented in an exemplary reference implementation using the external programming interface of the ArtiMinds Robot Programming Suite.

Learning Phase:

[0146] Generating a learnable representative for each critical component: Core of a system according to the exemplary embodiment is a representation of program components, which allows the gradient-based optimization of the parameters with respect to a target function. Basically, the inference problem of optimal parameters is divided into a learning phase and an inference phase, wherein in the learning phase a model of the system (robot and environment during the execution of a module) is learned and in the inference phase a gradient-based optimization algorithm optimizes the input parameters of the component representative using the learned system model.

[0147] Component representatives map the component parameters to an expected trajectory and guarantee the differentiability of the output trajectory with respect to the component parameters. This mapping is realized by means of a recurrent neural network. Since long, finely sampled trajectories in particular contain a lot of redundant information and when using neural networks for prediction large sequence lengths significantly complicate the learning problem, an analytical trajectory generator is placed upstream of the neural network, which generates a prior trajectory (cf. FIG. 8). In a reference implementation of the method according to the exemplary embodiment, the trajectory generator consists of a differentiable implementation of an offline robot simulator. The prior trajectory can correspond to a generic execution of the program component without consideration of the environment, i.e. in an artificial space with zero forces and under idealized robot kinematics and dynamics, starting from a given initial state. This strong prior is combined with the component parameters to form an augmented input sequence for the neural network. The network is trained to predict the residual between prior and posterior (i.e. actually measured) trajectory as well as the probability of success of the execution of the component (cf. FIG. 8 and the simplified calculation graph in FIG. 9).

[0148] The addition of the residual and priors results in the output expected posterior trajectory for this program component and the given component parameters. Simplifying the learning problem in the training of neural networks by introducing strong priors is established practice. Algorithmic priors can be defined both by the specific network structure (cf. R. Jonschkowski, D. Rastogi, and O. Brock, “Differentiable Particle Filters: End-to-End Learning with Algorithmic Priors,” ArXiv180511122 CS Stat, May 2018, Accessed: Apr. 3, 2020. [Online]. Available at: http://arxiv.org/abs/1805.11122) as well as by representing the output values as parameters of predefined parametric probability distributions (cf. the use of Gaussian processes, for example, in M. Y. Seker, M. Imre, J. Piater, and E. Ugur, “Conditional Neural Movement Primitives”, p. 9) or Gaussian mixes in A. Graves, “Generating Sequences with Recurrent Neural Networks,” ArXiv13080850 Cs, June 2014, Accessed: Nov. 22, 2019. [Online]. Available at: http://arxiv.org/abs/1308.0850). In this case, aspects of the velocity profile, the coarse positioning in the working space in absolute coordinates as well as deterministically pre-planned movements are generated by the generator and no longer need to be learned. In the case of force-controlled spiral search movements, the problem is partially linearized, since the deterministic spiral shape does not have to be learned as well, but only the deviations of the real from the planned trajectory. The use of strong priors can significantly reduce the need for training data by an order of magnitude. This effect is particularly noticeable in long trajectories or with strongly deterministic trajectories. When training a component representative for the force-controlled spiral search, the required amount of training data can be reduced by a factor of 20 as part of one exemplary embodiment. The use of a differentiably implemented analytical generator as a strong prior is a considerable advantage. [0149] Representation of the parameter vectors: The parameter vectors x.sub.i of each component representative i are component-dependent and are the result of the concatenation of the respective parameters. Pose-valued parameters can be represented as vectors of length 7, with the first 3 entries representing the position in Cartesian space and the last 4 entries representing the orientation as a quaternion. The quaternion representation has the advantage that they can be interpolated without singularities and the individual components assume smooth curves over time, which significantly simplifies the learning problem. Forces and torques can be represented as vectors of length 6, which designate the forces along the 3 Cartesian spatial directions and the torques around the 3 Cartesian spatial axes. The parameter vectors x.sub.i contain both optimizable and constant parameters. In principle, the component representatives can x.sub.i contain fewer or different parameters than the corresponding program components, as long as a bijection exists between the parameter vectors and the behavior is the same with the same parameterization. This is the case, for example, with “Spiral search (relative)”: for the calculation of the search region, the ARTM module accepts four poses, which lie in a plane and describe the four corners of a parallelogram relative to the starting pose. For the component representative, this representation is converted into two real numbers which describe the extent of the parallelogram in the x- and y-directions. This representation is much more compact, but mathematically equivalent. Long values of x.sub.i complicate the learning and inference problem significantly, and therefore the most compact representations of the parameters are advantageous. [0150] Representation of the state vectors: In an exemplary implementation, s.sub.i consists of the TCP pose of the last data point of the predicted trajectory, using the convention for poses described above. Depending on the form of the method, s.sub.i can exist around forces and torques, the joint-angle position of the robot or the poses of manipulated objects or objects detected in the environment by external sensors. [0151] Representation of trajectories: In one exemplary implementation, trajectories are represented as two-dimensional tensors, with the first variable-length dimension representing the time axis. The second dimension is of fixed length. In the reference implementation, trajectories in the second dimension have 14 entries, wherein the first 7 entries describe the pose of the TCP in world coordinates according to the above convention and the following 6 entries describe the forces and torques according to the above convention. The last entry is the probability of success p.sub.erfolg of the movement, with p.sub.erfolg∈[0, 1]. Furthermore, the space of the trajectories, in particular in the context of the exemplary embodiments, can be designated as y and a trajectory from this space as Y. The trajectory resulting from the execution of the i-th component of a robot program can be designated as Y.sub.i and the n-th vector in the trajectory Y.sub.i as (Y.sub.i).sub.n.

[0152] Training of the learnable representatives as system models for the sub-process encapsulated in the associated component: [0153] Training algorithm for differentiable component representatives: By implementing differentiable component representatives as neural networks, they become trainable. In the exemplary reference implementation according to one exemplary embodiment, these are trained to triples (x.sub.train, s.sub.train, Y.sub.train). x.sub.train is the parameter vector for the program component and contains both the constant and the component parameters that can be optimized. Y.sub.train is a sequence of vectors, each containing the absolute position and orientation of the TCP relative to the base coordinate system of the robot, forces and torques at the TCP in all Cartesian spatial directions, and the status code that encodes whether the component was executed successfully. s.sub.train is the measured system status at the start of execution of the component. The trajectory generator maps (x.sub.train, s.sub.train) to the prior trajectory Ŷ. The recurrent neural network maps (x.sub.train, Ŷ) to Y.sub.res. The expected posterior trajectory Y.sub.pred resulting from the addition of Y.sub.res and Ŷ. The prediction of the position, orientation, force and torque components is treated as a joint learning problem and a joint loss value is calculated using a special loss function. This regression loss is the weighted sum of the mean square error of the position, force and torque components as well as the angular difference of the orientation component encoded in quaternions. The prediction of the status code is treated as a binary classification problem and evaluated by means of the binary cross-entropy. Regression and classification loss are combined by weighted addition and the weights of the neural network are learned using a gradient-based optimization algorithm. The selected representation of trajectories as well as the regression loss function for trajectories are particularly advantageous. [0154] Implementation: For the implementation of the component representatives, in an exemplary reference implementation according to one exemplary embodiment a differentiable generator can be implemented for each supported component type. Since the representatives of different component types only differ structurally in the length of the parameter vector x.sub.i, component representatives can be constructed generically from the associated generator and an untrained neural network. In the reference implementation, the Adam optimization algorithm is used for training the neural networks (cf. D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” ArXiv14126980 Cs, December 2014, Accessed: Aug. 12, 2019. [Online]. Available at: http://arxiv.org/abs/1412.6980; algorithm 1, page 2). Before each training step, the entries of x.sub.train, s.sub.train and Y.sub.train are scaled to the domain [−1, 1]. An exception is the p.sub.erfolg entry of x.sub.train, because the binary cross-entropy loss function expects logs. For training the component representatives and subsequent parameter inference, both the label trajectories and the predicted trajectories are filled to a fixed length, since the recurrent components of the network architecture expect sequences of fixed length. To restore the original trajectory, a Boolean flag p.sub.padding is added to the last dimension of the trajectory sensors, which indicates whether the data point belongs to the padding sequence or not. In order to learn the padding, the training algorithm is extended to include another classification problem, similar to the prediction of p.sub.erfolg.

Inference Phase:

[0155] Combination of the learned representatives into complete system models for each contiguous sequence of critical components: [0156] Algorithm: Since program components are executed sequentially and the execution of previous components influences the execution of subsequent components, consecutive trained component representatives are combined to form a common calculation graph (cf. FIG. 9). Context information such as the current position and orientation of the TCP flows from one component to the next via the state vector s. The parameter vectors x.sub.i for each component i are fed into the calculation graph as leaf nodes. The resulting expected posterior overall trajectory of the sub-program is the concatenation of the expected posterior partial trajectories of the constituent component representatives. Each processing step within a component representative is configured in such a way that the output can be differentiated with respect to the input, from which it follows that the entire component representative can be differentiated with respect to the input parameters. The end-to-end differentiability (ability to differentiate the output trajectories with respect to the input parameters) of the component representatives as well as the state vectors s.sub.i ensure the end-to-end differentiability of the overall trajectory with respect to the parameter vectors. This differentiable representation of complex robot programs represents a significant innovation compared to the prior art. [0157] Implementation: Specifically, a Python class hierarchy is instantiated, which maps the program structure and the leaves of which contain the differentiable component representatives trained in step 3. The root object (the program abstraction) keeps an ordered list of all representatives. The differentiable calculation graph is dynamically generated by the Autograd framework of PyTorch during the successive evaluation of the component representatives (cf. A. Paszke et al., “Automatic differentiation in PyTorch”, October 2017, Accessed: Aug. 12, 2019. [Online]. Available at: https://openreview.net/forum?id=BJJsrmfCZ). This reduces the calculation of the overall trajectory to the evaluation of the calculation graph. The state vectors s.sub.i are calculated using only differentiable operations from the predicted sub-trajectories of the preceding components (Y.sub.i-1). In the reference implementation, the calculation corresponds to the removal of the last pose from Y.sub.i-1.

Inference of Optimal Parameters:

[0158] Formulation of the optimization problem: The target function is an input into the optimization algorithm with the signature ϕ: custom-character .fwdarw., and thus maps a trajectory to a real number. The goal of the optimization is to find the optimal parameterization x*, which also maximizes the target function φ.sub.P,ϕ:.fwdarw. with φ.sub.P,ϕ(x)=ϕ(P(x)), where denotes the space of the program parameters and P the differentiable program representation. In order to simplify the implementation, the loss function custom-character =−φ.sub.P,ϕ and the corresponding minimization problem

[00001] $x^{*} = \underset{x}{\arg \min} ℒ (x)$ are considered instead of the target function φ. [0159] Example: Loss function for the cycle time: A loss function for minimizing the cycle time can be defined as follows: custom-character .sub.Zyklus(Y)=Σ.sub.i=1.sup.N (1−σ((Y.sub.i,p.sub.padding−0.5)*T, where σ represents the sigmoid function, N the filled, fixed length of the trajectory, T (˜100) is a constant and Y.sub.i,padding is the entry p.sub.padding of the i-th vector of the trajectory tensor Y. .sub.Zyklus calculates the approximated unfilled length of the trajectory Y and can be differentiated. T determines the accuracy of the approximation. [0160] Example: Loss function for the cycle time: A loss function to minimize the probability of program execution failure can be defined as follow:

[00002] $ℒ_{Fehler} (Y) = 1 - \max (0, \min (\frac{1}{N} {.Math.}_{i = 1}^{N} Y_{i, p_{erfolg}}, 1)),$ where N represents the filled, fixed length of the trajectory and Y.sub.i,p.sub.erfolg the entry p.sub.erfolg of the i-th vector of the trajectory tensor Y. custom-character .sub.Fehler calculates the average probability that the execution of the robot program will fail, over all points of the trajectory. [0161] Algorithm: The program parameters are optimized using a variant of Neural Network Iterative Inversion (NNII) or gradient descent in the input space (cf. D. A. Hoskins, J. N. Hwang, and J. Vagners, “Iterative inversion of neural networks and its application to adaptive control”, IEEE Trans. Neural Netw., Volume 3, No. 2, pp. 292-301, March 1992, doi: 10.1109/72.125870): firstly, the parameter vectors x.sub.i in the calculation graph are initialized with an initial parameterization and the starting state so is initialized with the current state of the robot cell. In each step of the iterative optimization procedure, the expected overall trajectory is predicted by evaluating the calculation graph and the target function is evaluated. Using a gradient-based optimization method, the parameter vectors are adjusted incrementally in the direction of the gradient of the loss function, according to the following formula:

[00003] $\begin{matrix} \frac{d ℒ}{d x_{t}} = \frac{d ℒ}{d ϕ} \frac{\partial ϕ}{\partial P} \frac{\partial P}{x_{t}} = - \frac{\partial ϕ}{\partial P} \frac{\partial P}{\partial x_{t}} \\ x_{t + 1} = x_{t} - \frac{λ d ℒ}{d x_{t}} \end{matrix}$ [0162] The formula refers to a Neural Network Iterative Inversion (NNII) (gradient descent in the input space), where λ is the learning rate. The gradients of parameters labeled as constant are masked out in each optimization step. After a finite number of iterations (100<N<1000), the parameters converge to a local minimum. As with all optimization methods based on gradient descent, NNII is asymptotically optimal for a convex loss function, i.e. converges to a global minimum in an arbitrary number of iteration steps and at an arbitrarily small learning rate. In the actual application, the global convergence of NNII depends on the initial parameterization, due to local minima of the loss function. In practice, convergence can be guaranteed by using Monte Carlo methods (meta-optimization by repeated optimization based on randomly sampled initial parameter settings) or similar blackbox optimization methods, with additional expenditure of computing time. In addition, the initial parameterization, i.e. that originally specified by the robot programmer, is in many cases already located in a locally convex region of the target function around the global optimum. The use of NNII (gradient descent in the input space) for the inference of optimal robot program parameters represents a significant improvement. [0163] Implementation. The PyTorch implementation of the Adam optimization algorithm is used to solve the minimization problem. This is initialized with the parameters of the component representatives of the sub-program currently under consideration that are declared as optimizable (not constant). Reference is made to the following pseudocode for the Neural Network Iterative Inversion (NNII) procedure: [0164] optimizable_params= [0165] [(neural_template.optimizable_parameters( )) [0166] for neural_template in neural_program] [0167] optimizer=Adam(optimizable_params, lr=0.005) [0168] for i in range (n iterations): [0169] trajectory=neural_program.forward( ) [0170] loss=loss_fn(trajectory) [0171] backpropagate(neural_program, loss) [0172] optimizer.update_parameters( )

[0173] The increment (lr or λ) is a globally adjustable hyperparameter of the optimization algorithm, the choice of which depends on the application domain, limitations in the computation time for the optimization, and the desired convergence properties of the optimization method. For large values of λ, Adam converges faster, but with unfavorable combinations of target functions it can oscillate. For small values of λ, Adam converges more slowly, but oscillates much less and terminates closer to the global optimum. Depending on the nature of the procedure, the Adam optimization algorithm can be supplemented by mechanisms such as weight decay or learning rate scheduling, to dynamically balance convergence and runtime. The Autograd library of PyTorch is used to calculate the gradients (backpropagate). Apart from the optimizable input parameters of the components (optimizable_params), all other parameters (constant component parameters, but also the weights of the neural networks within the component representatives) remain constant.

[0174] FIG. 10 shows a recurrent network architecture for one exemplary embodiment of the invention. The length s of the state vector and the length x of the parameter vector can be set or are component-dependent. The sequence length here is set to 500. The batch dimension has been omitted for convenience.

[0175] The network maps inputs (left) to outputs (right).

[0176] Inputs: [0177] The prior trajectory (output of the trajectory generator), a tensor of dimension (500, 13) (a 500×13 matrix, i.e. 500 vectors of length 13) [0178] The current state, a vector of length p, depending on how the state is encoded as a vector. In an exemplary implementation, the length of the state vector depends on the component; some components may require additional information such as the current gripper opening, etc. that other components do not require. [0179] The vector of the input parameters with length x (the length depends on the component because the components have different parameters)

[0180] Outputs: [0181] The residual trajectory, a tensor of dimension (500, 13). In FIG. 8, this is Ŷ.sub.res,i. This residual, added to the prior trajectory, gives the posterior trajectory Ŷ.sub.i. [0182] p.sub.padding: a tensor of dimension (500, 1) that indicates for each time step of the trajectory whether the time step belongs to the padding or not (contains values between 0 and 1). [0183] p.sub.erfolg: a tensor of dimension (500, 1) that specifies for each time step of the trajectory the probability of success of the component at this time (contains values between 0 and 1).

[0184] From left to right, the following function is performed: [0185] First, the state and input vector are converted by repetition into tensors of dimensions (500, p) and (500, x). [0186] The resulting tensor is mapped to a tensor of dimension (500, 256) by a fully connected network layer (FCN). [0187] This is followed by 4 Gated Recurrent Units (GRU), recurrent network layers, each producing output tensors of dimension (500, 256). For a theoretical consideration of GRUs, see K. Cho, et al., “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation,” in EMNLP, Doha, Qatar, October 2014, pp. 1724-1734, doi: 10.3115/v1/D14-1179. For a practical implementation, see the PyTorch implementation of GRUs at https://pytorch.org/docs/master/generated/torch.nn.GRU.html. The GRUs are “residual” (this has nothing to do with the residual trajectory Ŷ.sub.res,i), i.e. the outputs of a GRU are not only inputs for the following GRU, but also the one after that. This is indicated in FIG. 10 by the thin arrows and the dashed tensors. [0188] The output of the last GRU is converted into the residual trajectory by a final fully connected layer, p.sub.padding and p.sub.erfolg. [0189] Each layer is followed by a downstream activation function, but for the sake of simplicity this is not shown in FIG. 10. Scaled Exponential Linear Units (SELU) are used here. For a theoretical consideration of SELUs, see G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, “Self-Normalizing Neural Networks,” in NeurIPS, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, published 2017, pp. 971-980. For a practical implementation, see the PyTorch implementation of GRUs at https://pytorch.org/docs/master/generated/torch.nn.SELU.html.

[0190] The training is particularly effective when the network is trained on batches of training data in parallel on a graphics card (GPU). The batch dimension has been omitted in FIG. 10 for simplification purposes. For example, according to a reference implementation according to an exemplary embodiment, batches of size 64 can be used.

[0191] With regard to further advantageous configurations of the method according to the invention and the system according to the invention, reference is made to the general part of the description and to the attached claims in order to avoid repetition.

[0192] Finally, it should be expressly pointed out that the above described exemplary embodiments of the method according to the invention and the system according to the invention serve only to elucidate the claimed teaching, but do not restrict it to the exemplary embodiments.

LIST OF REFERENCE NUMERALS

[0193] 1 semi-symbolic robot program [0194] 2 critical sub-program [0195] 3 critical program component [0196] 4 critical program component [0197] 5 semi-symbolic robot program [0198] 6 critical sub-program [0199] 7 critical program component [0200] 8 critical program component [0201] 9 robot cell [0202] 10 programming system [0203] 11 database [0204] 12 learning system [0205] 13 ontology [0206] 14 system for specifying target functions [0207] 15 inference system

METHOD AND SYSTEM FOR DETERMINING OPTIMIZED PROGRAM PARAMETERS FOR A ROBOT PROGRAM

Inventors

Cpc classification

Classification Explorer

B25J9/1656

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

B25J9/161

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G05B2219/40395

PHYSICS

Classification Explorer

G05B2219/40392

PHYSICS

Classification Explorer

B25J9/163

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G05B2219/39298

PHYSICS

International classification

Classification Explorer

B25J9/16

PERFORMING OPERATIONS; TRANSPORTING

Abstract

Claims

Description