Aggregating a dataset into a function term with the aid of transformer networks
20230032634 · 2023-02-02
Inventors
Cpc classification
G06N5/01
PHYSICS
International classification
Abstract
A method for aggregating a dataset, which respectively assigns an output variable value to a plurality of input variable vectors, into a function term. In the method, one or more elementary function expression(s) from an alphabet is/are sampled using a neural transformer network. The elementary function expressions are assembled to form one or more candidate function term(s). When the candidate function term(s) is/are complete, the input variable vectors are mapped to associated candidate output variable values using each candidate function term. A deviation between candidate output variable values and corresponding output variable values of the dataset is evaluated using a predefined metric. It is checked whether a predefined abort condition is satisfied. If the abort condition has not been satisfied, parameters which characterize the behavior of the transformer network are updated and branching back for sampling elementary function expressions using the transformer network takes place.
Claims
1-16. (canceled)
17. A method for aggregating a dataset, which respectively assigns an output variable value to a plurality of input variable vectors, into a function term, the method including the following steps: sampling one or a plurality of elementary function expressions from a given alphabet using a neural network, the neural network being a transformer network; assembling the one or plurality of elementary function expressions to form one or more candidate function terms; checking whether the one or more candidate function terms is complete; based on the one or more candidate function terms being not yet complete, branching back for sampling further elementary function expressions; based on the one or more candidate function terms being complete, respectively mapping the input variable vectors onto associated candidate output variable values using each of the one or more candidate function terms; evaluating a deviation between the associated candidate output variable values and corresponding output variable values from the dataset using a predefined metric; checking whether a predefined abort condition is satisfied; based on the abort condition not being satisfied: updating parameters that characterize a behavior of the transformer network with a goal that a renewed sampling of function expressions and assembling of the renewed sampled expressions to form one or more complete candidate function terms will likely improve the evaluation then obtained, and branching back to the sampling of elementary function expressions using the transformer network; and based on the predefined abort condition being satisfied, ascertaining a candidate function term of the one or more candidate function terms having the best evaluation as a desired function term into which the dataset is aggregated.
18. The method as recited in claim 17, wherein one or more elementary function expressions of at least one candidate function term and its/their positions in the candidate function term is/are additionally conveyed to the transformer network.
19. The method as recited in claim 17, wherein: numerical codes are respectively assigned to the elementary function expressions from the alphabet and to their positions in the candidate function term; at least one candidate function term is converted into a representation formed from the numerical codes; and the representation is supplied to the transformer network.
20. The method as recited in claim 19, wherein the numerical codes for the positions of elementary function expressions in the candidate function term indicate positions of the elementary function expressions in a semantic expression tree of the candidate function term, in which: operators or functions on the one hand and operands on the other hand form the nodes, and a node which belongs to an operator or a function has as children the nodes that belong to the operands that are processed by the operator or this function.
21. The method as recited in claim 20, wherein numerical codes are assigned also to non-occupied positions in the tree.
22. The method as recited in claim 20, wherein the numerical codes include vectors that respectively have separate components for levels of the tree, and each component assigned to a level indicates a direction in which branching took place on a path from a root of the tree to the node in a transition to the respective level.
23. The method as recited in claim 17, wherein the parameters that characterize the behavior of the transformer network are optimized toward a goal of improving an evaluation averaged across a plurality or distribution of candidate function terms.
24. The method as recited in claim 17, wherein only deviations that stem from a selection of best-evaluated candidate function terms are used for updating the parameters.
25. The method as recited in claim 17, wherein the input variable vectors and/or the output variable values, include measured data that were recorded using at least one sensor.
26. The method as recited in claim 25, wherein the output variable is a measured variable of a first sensor, and the input variable vectors include measured variables of further sensors from which the measured variable of the first sensor is ascertainable at least as an approximation.
27. The method as recited in claim 17, wherein: measured data that were recorded using at least one sensor are mapped as components of the input variable vectors, using the ascertained function term, to output variable values; an actuation signal is formed from the output variable values; and a vehicle is actuated using the actuation signal.
28. The method as recited in claim 17, wherein: the alphabet is restricted to operators or functions that are available on a predefined embedded platform for the evaluation of the ascertained function term, and the predefined embedded platform is set up for the evaluation of the ascertained function term.
29. The method as recited in claim 23, wherein the elementary function expressions of at least one best-evaluated candidate function term and their positions in the best-evaluated candidate function term in multiple epochs of the optimization are supplied to the transformer network.
30. A non-transitory machine-readable data carrier on which is stored a computer program including machine-readable instructions for aggregating a dataset, which respectively assigns an output variable value to a plurality of input variable vectors, into a function term, the instructions, when executed by a computer, causing the computer to perform the following steps: sampling one or a plurality of elementary function expressions from a given alphabet using a neural network, the neural network being a transformer network; assembling the one or plurality of elementary function expressions to form one or more candidate function terms; checking whether the one or more candidate function terms is complete; based on the one or more candidate function terms being not yet complete, branching back for sampling further elementary function expressions; based on the one or more candidate function terms being complete, respectively mapping the input variable vectors onto associated candidate output variable values using each of the one or more candidate function terms; evaluating a deviation between the associated candidate output variable values and corresponding output variable values from the dataset using a predefined metric; checking whether a predefined abort condition is satisfied; based on the abort condition not being satisfied: updating parameters that characterize a behavior of the transformer network with a goal that a renewed sampling of function expressions and assembling of the renewed sampled expressions to form one or more complete candidate function terms will likely improve the evaluation then obtained, and branching back to the sampling of elementary function expressions using the transformer network; and based on the predefined abort condition being satisfied, ascertaining a candidate function term of the one or more candidate function terms having the best evaluation as a desired function term into which the dataset is aggregated.
31. One or more computers configured to aggregate a dataset, which respectively assigns an output variable value to a plurality of input variable vectors, into a function term, the one or more computers configured to: sample one or a plurality of elementary function expressions from a given alphabet using a neural network, the neural network being a transformer network; assemble the one or plurality of elementary function expressions to form one or more candidate function terms; check whether the one or more candidate function terms is complete; based on the one or more candidate function terms being not yet complete, branch back for sampling further elementary function expressions; based on the one or more candidate function terms being complete, respectively map the input variable vectors onto associated candidate output variable values using each of the one or more candidate function terms; evaluate a deviation between the associated candidate output variable values and corresponding output variable values from the dataset using a predefined metric; check whether a predefined abort condition is satisfied; based on the abort condition not being satisfied: update parameters that characterize a behavior of the transformer network with a goal that a renewed sampling of function expressions and assembling of the renewed sampled expressions to form one or more complete candidate function terms will likely improve the evaluation then obtained, and branch back to the sampling of elementary function expressions using the transformer network; and based on the predefined abort condition being satisfied, ascertain a candidate function term of the one or more candidate function terms having the best evaluation as a desired function term into which the dataset is aggregated.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0043]
[0044]
[0045]
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0046]
[0047] In step 110, a function expression or a plurality of function expressions from an alphabet A is/are sampled with the aid of transformer network 1.
[0048] In the process, alphabet A according to block 111 may be restricted to operators or functions that are available on a predefined embedded platform for the evaluation of ascertained function term 4*.
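The restriction according to block 111 can be illustrated with a minimal sketch. The alphabet contents, the token names, and the list of operators supported by the platform are hypothetical and serve only as an example; the source does not name concrete operators.

```python
# Hypothetical alphabet: token -> arity (number of operands it consumes).
FULL_ALPHABET = {"add": 2, "mul": 2, "sin": 1, "exp": 1, "x0": 0, "x1": 0}

def restrict_alphabet(alphabet, supported_ops):
    """Keep all operands (arity 0) and only those operators or functions
    that the target platform can evaluate."""
    return {tok: arity for tok, arity in alphabet.items()
            if arity == 0 or tok in supported_ops}

# Assumed capability list of the embedded platform (illustration only).
PLATFORM_OPS = {"add", "mul"}
restricted = restrict_alphabet(FULL_ALPHABET, PLATFORM_OPS)
```

Sampling from `restricted` instead of `FULL_ALPHABET` means that every function term found can be evaluated on the platform without emulating missing functions.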
[0049] In step 120, these elementary function expressions 3a-3d are assembled to form one or more candidate function term(s) 4.
[0050] According to block 112, numerical codes 6a-6d; 7a-7d may be assigned, respectively, to elementary function expressions 3a-3d from alphabet A and to their positions 3a #-3d # in candidate function term 4. According to block 113, at least one candidate function term 4 may then be converted into a representation 8 formed from these numerical codes 6a-6d; 7a-7d. According to block 114, this representation 8 may then be conveyed to transformer network 1 during sampling 110, so that candidate function term 4 can also be developed over multiple sampling steps.
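One possible form of representation 8 is sketched below. The alphabet, the integer position numbering, and the use of code 0 for padding are assumptions made for illustration; the source only requires that expressions and positions each receive numerical codes.

```python
# Hypothetical alphabet; code 0 is reserved for "no expression" (padding).
ALPHABET = ["add", "mul", "sin", "x0", "x1"]
EXPR_CODE = {tok: i + 1 for i, tok in enumerate(ALPHABET)}

def encode(term, max_len):
    """Convert a candidate term, given as (token, position) pairs, into a
    fixed-length list of (expression code, position code) pairs."""
    rep = [(EXPR_CODE[tok], pos) for tok, pos in term]
    rep += [(0, 0)] * (max_len - len(rep))  # pad unused slots
    return rep

# add(sin(x0), x1), with positions numbered level by level in the tree
# (an assumed numbering, chosen only to keep the example small).
term = [("add", 1), ("sin", 2), ("x1", 3), ("x0", 4)]
representation = encode(term, max_len=6)
```

A fixed length keeps the input shape constant while the term grows from one sampling step to the next.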
[0051] In step 125, it is checked whether the candidate function term(s) 4 is/are complete. If this is not the case (truth value 0), branching back for the sampling 110 of further elementary function expressions then takes place in step 126.
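The completeness check of step 125 can be sketched as follows, assuming that the term is built in prefix order and that each elementary function expression has a fixed number of operands. Both assumptions, and the arity table, are illustrative and not taken from the source.

```python
# Hypothetical arities: how many operands each expression still requires.
ARITY = {"add": 2, "mul": 2, "sin": 1, "x0": 0, "x1": 0}

def is_complete(tokens):
    """Return True when the prefix-ordered token list forms a whole term,
    i.e. no operand slot is left open."""
    open_slots = 1  # the root of the expression tree must be filled
    for tok in tokens:
        if open_slots == 0:
            raise ValueError("token found after the term was already complete")
        open_slots += ARITY[tok] - 1  # fills one slot, opens ARITY[tok] new ones
    return open_slots == 0
```

For example, `["add", "sin", "x0"]` still has one open operand slot and would trigger the branch back to sampling, whereas `["add", "sin", "x0", "x1"]` is complete.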
[0052] However, if the candidate function term(s) 4 is/are complete (truth value 1), input variable vectors X_i are mapped in step 130 to associated candidate output variable values y_i* with the aid of each candidate function term 4.
[0053] In step 140, a deviation between candidate output variable values y_i* and corresponding output variable values y_i from dataset 2 is evaluated using a predefined metric 5.
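Steps 130 and 140 can be sketched as follows. The prefix token format, the operator tables, and the choice of the mean squared deviation as metric 5 are assumptions; the source leaves the metric open.

```python
import math
import operator

# Hypothetical operator tables for the example alphabet.
UNARY = {"sin": math.sin}
BINARY = {"add": operator.add, "mul": operator.mul}

def evaluate(tokens, x):
    """Step 130: evaluate a complete prefix-ordered term on one input vector x."""
    def walk(i):
        tok = tokens[i]
        if tok in UNARY:
            a, j = walk(i + 1)
            return UNARY[tok](a), j
        if tok in BINARY:
            a, j = walk(i + 1)
            b, k = walk(j)
            return BINARY[tok](a, b), k
        return x[int(tok[1:])], i + 1  # operand "xN" reads component N of x
    return walk(0)[0]

def mean_squared_deviation(tokens, X, y):
    """Step 140: compare candidate outputs with the dataset outputs."""
    preds = [evaluate(tokens, x) for x in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

X = [(0.0, 1.0), (1.0, 2.0)]
y = [1.0, 3.0]  # generated by x0 + x1
dev_add = mean_squared_deviation(["add", "x0", "x1"], X, y)
dev_mul = mean_squared_deviation(["mul", "x0", "x1"], X, y)
```

In this toy dataset the term `add x0 x1` reproduces every output exactly, so its deviation is smaller than that of `mul x0 x1`.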
[0054] In step 180, it is checked whether a predefined abort condition is satisfied. If this is not the case,
[0055] parameters θ that characterize the behavior of transformer network 1 are updated in step 150 with the goal that the renewed sampling of function expressions 3a-3d and the assembly of these expressions to form one or more complete candidate function term(s) 4 most likely improves the evaluation 5a then obtained, and
[0056] branching back to sampling 110 of elementary function expressions 3a-3d using transformer network 1 takes place in step 160.
[0057] In the process, according to block 151, parameters θ that characterize the behavior of transformer network 1 can be optimized with the goal of improving an evaluation 5a averaged across a plurality or distribution of candidate function terms 4.
[0058] According to block 152, only deviations that stem from a selection of best-evaluated candidate function terms 4 may be used for updating parameters θ.
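The two options of blocks 151 and 152 can be contrasted in a short sketch. The function names and the `keep_fraction` parameter are hypothetical; the source does not state how many candidate function terms belong to the selection.

```python
def mean_deviation(deviations):
    """Block 151: evaluation averaged across all sampled candidates."""
    return sum(deviations) / len(deviations)

def select_best(candidates, deviations, keep_fraction=0.25):
    """Block 152: keep only the best-evaluated candidates (smallest deviation)
    so that only they contribute to the parameter update."""
    k = max(1, int(len(candidates) * keep_fraction))
    order = sorted(range(len(candidates)), key=lambda i: deviations[i])
    return [candidates[i] for i in order[:k]]

candidates = ["a", "b", "c", "d"]   # placeholders for candidate terms
deviations = [3.0, 0.5, 2.0, 1.0]
average = mean_deviation(deviations)
elite = select_best(candidates, deviations, keep_fraction=0.5)
```

Restricting the update to the selection steers the parameters toward the best terms found so far rather than toward a good average, which fits the goal of ultimately returning a single best function term.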
[0059] Optionally, in step 170, one or more elementary function expression(s) 3a-3d of at least one candidate function term 4 and its/their position(s) 3a #-3d # in this candidate function term 4 may additionally be conveyed to transformer network 1. For instance, the elementary function expressions 3a-3d and their positions 3a #-3d # may be encoded by numerical codes 6a-6d; 7a-7d in the same way as in the original preparation of the complete candidate function term.
[0060] According to block 174, elementary function expressions 3a-3d of at least one best-evaluated candidate function term 4 and their positions 3a #-3d # in this best-evaluated candidate function term 4 in a plurality of optimization epochs may be conveyed to transformer network 1.
[0061] On the other hand, if the abort condition is satisfied (truth value 1 in step 180), then, in step 190, a candidate function term 4 having the best evaluation 5a is ascertained as the desired function term 4* into which dataset 2 is aggregated. If there is a selection from among a plurality of candidate function terms 4 of different complexity, then in particular a less complex candidate function term 4 may be given priority.
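The control flow of steps 110 through 190 can be sketched end to end. To keep the example short, a plain categorical distribution over tokens stands in for transformer network 1, and a simple heuristic that raises the logits of tokens occurring in the best candidates stands in for the update of parameters θ. Both are assumptions made for illustration and are not the method of the source; the alphabet, the abort tolerance, and all hyperparameters are likewise hypothetical.

```python
import math
import random

ARITY = {"add": 2, "mul": 2, "x0": 0, "x1": 0}  # hypothetical alphabet
TOKENS = list(ARITY)
TERMINALS = [t for t in TOKENS if ARITY[t] == 0]

def sample_term(logits, rng, max_len=7):
    """Steps 110-126: sample tokens until the prefix-ordered term is complete."""
    tokens, open_slots = [], 1
    while open_slots > 0:  # step 125: completeness check
        # Once the length budget is reached, only operands may be drawn.
        pool = TERMINALS if len(tokens) + open_slots >= max_len else TOKENS
        weights = [math.exp(logits[t]) for t in pool]
        tok = rng.choices(pool, weights=weights)[0]  # step 110
        tokens.append(tok)                           # step 120
        open_slots += ARITY[tok] - 1
    return tokens

def evaluate(tokens, x):
    """Step 130: evaluate a prefix-ordered term by scanning it right to left."""
    stack = []
    for tok in reversed(tokens):
        if ARITY[tok] == 0:
            stack.append(x[int(tok[1:])])
        else:
            a, b = stack.pop(), stack.pop()
            stack.append(a + b if tok == "add" else a * b)
    return stack[0]

def deviation(tokens, X, y):
    """Step 140: mean squared deviation (assumed metric)."""
    return sum((evaluate(tokens, x) - t) ** 2 for x, t in zip(X, y)) / len(y)

def aggregate(X, y, epochs=20, batch=50, keep=5, lr=0.5, tol=1e-12, seed=0):
    rng = random.Random(seed)
    logits = {t: 0.0 for t in TOKENS}  # stand-in for parameters theta
    best_term, best_dev = None, float("inf")
    for _ in range(epochs):
        scored = []
        for _ in range(batch):
            term = sample_term(logits, rng)
            scored.append((deviation(term, X, y), term))
        # Sort by deviation; on ties the shorter (less complex) term comes first.
        scored.sort(key=lambda s: (s[0], len(s[1])))
        if scored[0][0] < best_dev:
            best_dev, best_term = scored[0]
        if best_dev <= tol:  # step 180: abort condition
            break
        for _, term in scored[:keep]:  # step 150: update from the best candidates
            for tok in term:
                logits[tok] += lr / keep
    return best_term, best_dev  # step 190

X = [(1.0, 2.0), (3.0, 4.0), (0.5, -1.0)]
y = [x0 + x1 for x0, x1 in X]
best_term, best_dev = aggregate(X, y)
```

The sort key also reflects the preference for a less complex candidate function term among equally evaluated ones. The epoch limit acts as a second abort condition in case the tolerance is never reached.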
[0062] In step 210, measured data that were recorded using at least one sensor are mapped as components of input variable vectors X_i with the ascertained function term 4* to output variable values y_i.
[0063] In step 220, an actuation signal 220a is formed from these output variable values y_i.
[0064] In step 230, a vehicle 50 is actuated with the aid of this actuation signal 220a.
[0065] If alphabet A was restricted to the operators or functions available on a predefined embedded platform according to block 111, then this predefined embedded platform is set up in step 240 for the evaluation of ascertained function term 4*.
[0066]
[0067] Via preprocessing layers 11 and/or 12, elementary function expressions 3a-3d as well as their positions 3a #-3d #, and/or their numerical codes 6a-6d, 7a-7d are processed into an input 1a for transformer network 1. Transformer network 1 includes two multi-head attention layers 13 and 14, which generate an output 1b. This output 1b is combined in an averaging layer 15 and processed into a softmax probability distribution p(δ) for elementary function expressions 6. The next elementary function expression 3a-3d to be added to the function term is drawn from this probability distribution p(δ). This elementary function expression is assigned the position 7e with numerical code 5 in expression tree 9.
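The data flow from input 1a to probability distribution p(δ) can be sketched in plain Python. The sketch assumes that preprocessing layers 11 and 12 are embedding tables for expression codes and position codes, that their outputs are summed, and that a linear output layer precedes the softmax. It omits feed-forward sublayers and normalization, and it uses random, untrained weights. All of these choices, and all names and dimensions, are illustrative assumptions.

```python
import math
import random

def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(v):
    m = max(v)
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [z / s for z in e]

def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

def attention_layer(h, head_weights):
    """Multi-head self-attention with a residual connection; h is seq_len x d."""
    head_outputs = []
    for wq, wk, wv in head_weights:
        q, k, v = matmul(h, wq), matmul(h, wk), matmul(h, wv)
        scale = math.sqrt(len(q[0]))
        scores = matmul(q, [list(col) for col in zip(*k)])  # q times k transposed
        attn = [softmax([s / scale for s in row]) for row in scores]
        head_outputs.append(matmul(attn, v))
    # Concatenate the heads along the feature axis, then add the residual.
    concat = [sum((ho[i] for ho in head_outputs), []) for i in range(len(h))]
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(h, concat)]

def init_params(rng, n_expr, n_pos, d=4, heads=2):
    dh = d // heads
    return {
        "expr_emb": rand_matrix(rng, n_expr, d),  # assumed role of layer 11
        "pos_emb": rand_matrix(rng, n_pos, d),    # assumed role of layer 12
        "layers": [[tuple(rand_matrix(rng, d, dh) for _ in range(3))
                    for _ in range(heads)] for _ in range(2)],
        "out": rand_matrix(rng, d, n_expr),
    }

def next_expression_distribution(expr_codes, pos_codes, params):
    h = [[e + p for e, p in zip(params["expr_emb"][c], params["pos_emb"][q])]
         for c, q in zip(expr_codes, pos_codes)]      # input 1a
    for layer in params["layers"]:                    # attention layers 13 and 14
        h = attention_layer(h, layer)
    pooled = [sum(col) / len(h) for col in zip(*h)]   # averaging layer 15
    return softmax(matmul([pooled], params["out"])[0])

rng = random.Random(0)
params = init_params(rng, n_expr=5, n_pos=8)
p = next_expression_distribution([0, 2], [1, 2], params)
next_code = rng.choices(range(len(p)), weights=p)[0]  # draw the next expression
```

Averaging over the sequence makes the output size independent of how many elementary function expressions the partial term already contains.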
[0068]
[0069] According to
[0070] In contrast, in
[0071] According to
[0072] All nodes that are obtained by branching to the left from the root are given the direction −1 in the second component of their numerical code. All nodes that are obtained by branching to the right of the root receive the direction 1 in the second component of their numerical code. For the nodes at the second level of the tree, the third component is still 0 because the third level has not yet been reached.
[0073] In an analogous manner, branching to the left in the transition from the second to the third level of the tree leads to an entry −1, and branching to the right leads to an entry 1 in the third component of the numerical code.
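The direction-based position codes can be generated with a few lines. The surviving text does not state which value the first component (the root level) takes, so the sketch leaves it at 0 as a placeholder; this value and the `'L'`/`'R'` path notation are assumptions.

```python
def position_code(path, depth):
    """Return the position code of a tree node.

    path:  branch choices from the root to the node, 'L' or 'R' per level
    depth: number of levels in the tree, which is the length of the code
    """
    if len(path) > depth - 1:
        raise ValueError("path is longer than the tree is deep")
    # Component 0 belongs to the root level; 0 is a placeholder (assumption).
    code = [0] * depth
    for level, step in enumerate(path, start=1):
        code[level] = -1 if step == "L" else 1  # -1 for left, 1 for right
    return code

root = position_code("", 3)
left_child = position_code("L", 3)
right_then_left = position_code("RL", 3)
```

Components for levels that the path has not reached stay 0, so every node in the tree, including inner nodes, receives a distinct code of the same length.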