METHOD AND DEVICE FOR CREATING A MACHINE LEARNING SYSTEM INCLUDING A PLURALITY OF OUTPUTS
20230022777 · 2023-01-26
CPC classification: G06V10/255, G06N7/01, G06V20/70, G06V10/84, G06N3/0985 (all under PHYSICS)
Abstract
A method for creating a machine learning system configured for segmentation and object detection. The method includes: providing a directed graph; selecting a path through the graph, at least one additional node being selected from a subset, and the path being selected through the graph from the input node along the edges via the additional node up to the output node, the path initially being drawn as a function of the probabilities of the edges, which define a drawing probability for all architectures within the graph; creating a machine learning system as a function of the selected path; and training the created machine learning system.
Claims
1. A computer-implemented method for creating a machine learning system, which is configured for segmentation and object description, the machine learning system including an input for receiving an image and two outputs, a first output outputting the segmentation of the image and a second output outputting the object description, comprising the following steps: providing a directed graph, the directed graph including an input node, an output node, and a plurality of further nodes, the input node and the output node being connected via the further nodes using directed edges, the input, output, and further nodes representing data and the edges representing operations, which convert a first node of each respective edge into a further node connected to the respective edge, each respective edge of the edges being assigned a probability which characterizes with which probability the respective edge is selected; selecting a path through the graph, a subset of nodes being determined from the plurality of further nodes, all of which satisfy a predefined property with respect to a data resolution, at least one additional node being selected from the subset, which serves as a second output, the path through the graph from the input node along the edges via the additional node up to the output node being selected as a function of the probability assigned to the edges; creating a machine learning system as a function of the selected path and training the created machine learning system, adapted parameters of the trained machine learning system being stored in the corresponding edges of the directed graph and the probabilities of the edges of the path being adapted; multiple repeating of the selecting a path step and the creating and training a machine learning system step; and creating the machine learning system as a function of the directed graph; wherein the probabilities of the edges are set initially to one value, so that all paths through the directed graph are selected with equal probability.
2. The method as recited in claim 1, wherein for each respective node of the subset, a total number of first subpaths from the respective node of the subset up to the input node and a total number of second subpaths from the respective node of the subset up to the output node are counted, the probabilities of those edges contained in the first subpaths are each initially set to a number of possible paths which connect the input node to the respective node of the subset and extend over those edges contained in the first subpaths, divided by the total number of the first subpaths, and the probabilities of those edges contained in the second subpaths are each initially set to a number of possible paths which connect the output node to the respective node of the subset and extend over those edges contained in the second subpaths, divided by the total number of the second subpaths.
3. The method as recited in claim 1, wherein the nodes of the subset, which all satisfy a predefined property with respect to a data resolution, are each also assigned a probability, the probabilities of the nodes of the subset being normalized.
4. The method as recited in claim 3, wherein the probabilities of the nodes of the subset are initially set to a probability that the number of paths is set by the respective node of the subset divided by the total number of paths through the directed graph.
5. The method as recited in claim 3, wherein the probabilities of the nodes of the subset are initially set to a probability that all nodes of the subset are initially selected with equal probability.
6. The method as recited in claim 1, wherein when selecting the path, at least two additional nodes are selected, a path through the graph including at least two paths, each of which extends via one of the additional nodes to the output node, and the two paths from the input node to the additional nodes being created separately from one another starting at the additional nodes up to the input node.
7. The method as recited in claim 1, wherein during training of the machine learning system, a cost function is optimized, the cost function including one first function, which evaluates an efficiency of the machine learning system with respect to its outputs, and includes one second function, which estimates a latency and/or a computer resource consumption of the machine learning system as a function of a length of the path and of the operations of the edges.
8. A non-transitory machine-readable memory element on which is stored a computer program for creating a machine learning system, which is configured for segmentation and object description, the machine learning system including an input for receiving an image and two outputs, a first output outputting the segmentation of the image and a second output outputting the object description, the computer program, when executed by a computer, causing the computer to perform the following steps: providing a directed graph, the directed graph including an input node, an output node, and a plurality of further nodes, the input node and the output node being connected via the further nodes using directed edges, the input, output, and further nodes representing data and the edges representing operations, which convert a first node of each respective edge into a further node connected to the respective edge, each respective edge of the edges being assigned a probability which characterizes with which probability the respective edge is selected; selecting a path through the graph, a subset of nodes being determined from the plurality of further nodes, all of which satisfy a predefined property with respect to a data resolution, at least one additional node being selected from the subset, which serves as a second output, the path through the graph from the input node along the edges via the additional node up to the output node being selected as a function of the probability assigned to the edges; creating a machine learning system as a function of the selected path and training the created machine learning system, adapted parameters of the trained machine learning system being stored in the corresponding edges of the directed graph and the probabilities of the edges of the path being adapted; multiple repeating of the selecting a path step and the creating and training a machine learning system step; and creating the machine learning system as a function of the directed graph; wherein the probabilities of the edges are set initially to one value, so that all paths through the directed graph are selected with equal probability.
9. A device configured to create a machine learning system, which is configured for segmentation and object description, the machine learning system including an input for receiving an image and two outputs, a first output outputting the segmentation of the image and a second output outputting the object description, the device configured to: provide a directed graph, the directed graph including an input node, an output node, and a plurality of further nodes, the input node and the output node being connected via the further nodes using directed edges, the input, output, and further nodes representing data and the edges representing operations, which convert a first node of each respective edge into a further node connected to the respective edge, each respective edge of the edges being assigned a probability which characterizes with which probability the respective edge is selected; select a path through the graph, a subset of nodes being determined from the plurality of further nodes, all of which satisfy a predefined property with respect to a data resolution, at least one additional node being selected from the subset, which serves as a second output, the path through the graph from the input node along the edges via the additional node up to the output node being selected as a function of the probability assigned to the edges; create a machine learning system as a function of the selected path and train the created machine learning system, adapted parameters of the trained machine learning system being stored in the corresponding edges of the directed graph and the probabilities of the edges of the path being adapted; multiple repeating of the selection of the path and the creation and training of a machine learning system; and create the machine learning system as a function of the directed graph; wherein the probabilities of the edges are set initially to one value, so that all paths through the directed graph are selected with equal probability.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0042] Specific embodiments of the present invention are explained in greater detail below with reference to the figures.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0058] In order to find good architectures of deep neural networks for a predefined data set, automatic methods for the architecture search, so-called neural architecture search methods, may be applied. For this purpose, a search space of possible architectures of neural networks is explicitly or implicitly defined.
[0059] The term “operation” is used below in describing a search space; an operation is a calculation rule that converts one or multiple n-dimensional input data tensors into one or multiple output data tensors and may have adaptable parameters in the process. In image processing, frequently used operations are, for example, convolutions with various kernel sizes and different types of convolutions (regular convolution, depthwise-separable convolution), as well as pooling operations.
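To illustrate why both regular and depthwise-separable convolutions typically appear as candidate operations, the sketch below compares their parameter counts using the standard formulas; the concrete operation set and sizes are illustrative assumptions, not taken from the text.

```python
# Parameter counts of two candidate operations often used in image-processing
# search spaces (standard formulas; sizes are illustrative assumptions).

def params_regular_conv(k, c_in, c_out):
    # A regular k x k convolution mixes channels and space jointly.
    return k * k * c_in * c_out

def params_depthwise_separable_conv(k, c_in, c_out):
    # Depthwise k x k filter per input channel, then a 1 x 1 pointwise mix.
    return k * k * c_in + c_in * c_out

print(params_regular_conv(3, 64, 64))              # 36864
print(params_depthwise_separable_conv(3, 64, 64))  # 4672
```

The order-of-magnitude difference in parameters is one reason an architecture search trades off such operations against one another.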
[0060] Furthermore, a calculation graph (the so-called One-Shot model) will be defined, which includes all architectures in the search space as subgraphs. Since the One-Shot model may be very large, individual architectures may be drawn from the One-Shot model for the training. This occurs typically by drawing individual paths from an established input node to an established output node of the network.
[0061] In the simplest case, when the calculation graph is made up of a chain of nodes, each pair of which may be connected via various operations, it is sufficient to draw, for each two consecutive nodes, the operation which connects them.
[0062] If the One-Shot model is, more generally, a directed graph, a path may be drawn iteratively: starting at the input, the next node and the connecting operation are drawn, and this procedure is continued iteratively up to the target node.
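The iterative drawing of a path through a directed graph can be sketched as follows; the toy graph, operation names, and edge probabilities are illustrative assumptions.

```python
import random

# One-Shot model as a DAG: node -> list of (successor, operation, probability).
# The concrete graph and probabilities are toy assumptions.
GRAPH = {
    "in":  [("a", "conv3x3", 0.5), ("b", "conv5x5", 0.5)],
    "a":   [("out", "pool", 1.0)],
    "b":   [("out", "conv3x3", 1.0)],
    "out": [],
}

def draw_path(graph, start="in", target="out", rng=random):
    """Iteratively draw a path: at each node, sample an outgoing edge
    according to the edge probabilities until the target node is reached."""
    path, node = [], start
    while node != target:
        edges = graph[node]
        probs = [p for _, _, p in edges]
        nxt, op, _ = rng.choices(edges, weights=probs)[0]
        path.append((node, nxt, op))
        node = nxt
    return path

path = draw_path(GRAPH)
```

Each drawn path corresponds to one architecture contained in the One-Shot model as a subgraph.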
[0063] The One-Shot model may then be trained with drawing, by drawing an architecture for each mini-batch and adapting the weights of the operations in the drawn architecture with the aid of a standard gradient step method. Finding the best architecture may take place either as a separate step after the training of the weights, or alternatingly with the training of the weights.
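The per-mini-batch training with shared weights can be sketched as below; the graph is a toy assumption and the gradient step is stubbed out by a dummy increment, standing in for a real backpropagation update.

```python
import random

# Shared weights live on the edges of the One-Shot model; each mini-batch
# trains only the edges of the currently drawn path. The weight update is a
# stub standing in for -lr * gradient.
edge_weight = {("in", "a"): 0.0, ("in", "b"): 0.0,
               ("a", "out"): 0.0, ("b", "out"): 0.0}
edge_prob = {("in", "a"): 0.5, ("in", "b"): 0.5,
             ("a", "out"): 1.0, ("b", "out"): 1.0}
succ = {"in": ["a", "b"], "a": ["out"], "b": ["out"], "out": []}

def sample_path(rng=random):
    node, path = "in", []
    while node != "out":
        weights = [edge_prob[(node, s)] for s in succ[node]]
        nxt = rng.choices(succ[node], weights=weights)[0]
        path.append((node, nxt))
        node = nxt
    return path

for _ in range(10):              # one iteration per mini-batch
    arch = sample_path()         # draw an architecture for this mini-batch
    for edge in arch:            # adapt only the weights on the drawn path
        edge_weight[edge] += 1.0
```

Because every architecture shares the weights stored on the graph's edges, training one drawn path also improves all other architectures that reuse those edges.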
[0064] In order to draw architectures from a One-Shot model, which has branches and multiple outputs, a sampling model may be used in one specific embodiment for paths in the opposite direction. For this purpose, one path may be drawn for each output of the One-Shot model, which leads, starting from the output, to the input of the One-Shot model. For drawing the paths, the transposed One-Shot model may be considered for this purpose, in which all directed edges point in the opposite direction as in the original One-Shot model.
[0065] Once the first path has been drawn, it may happen that the next path reaches a node of the previous path. In this case, the drawing of the current path may be terminated, since a path from the shared node to the input already exists. Alternatively, it is possible to nevertheless continue drawing the path and thus possibly obtain a second path to the input node.
[0066] In addition, the case is to be considered that the drawn architectures include one or multiple node(s) of the One-Shot model, which is/are not situated at the full depth of the network and is/are referred to below as NOI (“Nodes of Interest”), as well as an output at full depth of the One-Shot model. In this case, the creation of the path may take place by a back-directed drawing for the NOIs in order to connect these to the input. In addition, a forward-directed drawing is also carried out for each NOI, which leads to the output of the One-Shot model. As in the case of the back-directed drawing, the drawing in the case of the forward-directed drawing may be stopped once a path is reached that already leads to the output.
[0067] As an alternative to the back-directed drawing, a purely forward-directed drawing may take place by drawing, for each NOI, a path from the input to the corresponding NOI. This is achieved by carrying out the drawing only on the subgraph made up of all nodes that lie on a path from the input of the network to the current NOI, as well as all edges of the One-Shot model between these nodes.
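The back-directed drawing on the transposed model, with early termination once an already drawn path is reached, can be sketched as follows; the toy graph is an assumption, and uniform edge choice stands in for the learned edge probabilities.

```python
import random

# Forward One-Shot graph (toy assumption): node -> successors.
FWD = {"in": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["out"], "out": []}

def transpose(graph):
    """Build G_t: all directed edges point in the opposite direction."""
    t = {n: [] for n in graph}
    for u, succs in graph.items():
        for v in succs:
            t[v].append(u)
    return t

def draw_backward(graph_t, noi, visited, rng=random):
    """Draw a path from an NOI toward the input on the transposed graph;
    stop early when a node of an already drawn path is reached, since a
    connection to the input exists from there."""
    path, node = [], noi
    while node != "in" and node not in visited:
        nxt = rng.choice(graph_t[node])   # uniform stand-in for edge probabilities
        path.append((nxt, node))          # record the edge in forward orientation
        node = nxt
    return path

GT = transpose(FWD)
first = draw_backward(GT, "c", visited=set())
visited = {u for edge in first for u in edge}
second = draw_backward(GT, "a", visited)  # may stop as soon as it meets `first`
```

The same stopping rule applies symmetrically to the forward-directed drawing from an NOI toward the output.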
[0068] One exemplary embodiment is a multi-task network for object detection and semantic segmentation. The NOIs in this case are nodes at which an object classification output (detection head or object detection head) may be attached. In addition, one more output is used for the semantic segmentation at the output at the full depth of the network.
[0069] The automatic architecture search initially requires the creation of a search space (step S21).
[0070] For each node in G, a probability distribution across the outgoing edges is defined. For each node and for each path, preferably for each set of NOIs, a separate probability distribution may be defined within one architecture. This means that the different paths within one architecture use different probabilities. In addition, the transposed One-Shot model G_t is considered, which has the same nodes, but all directed edges point in the reverse direction. On G_t, a probability distribution across the outgoing edges is also introduced for each node (this corresponds to a probability distribution across the incoming edges in G).
[0071] For the back-directed drawing, a path is drawn in G_t for the first NOI (23).
[0073] With each drawing of an architecture, the NOIs may vary, since these NOIs may also be randomly drawn.
[0074] Based on graph G, an artificial neural network 60 (depicted in the figures) may be created.
[0077] The method may start with step S21, in which graph G is provided.
[0078] This is followed by step S22. In this step, the probabilities of the edges are initialized as explained above.
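One way to initialize the edge probabilities so that every complete input-to-output path is drawn with equal probability is the following path-counting construction; this is a sketch of one standard construction consistent with the stated goal, on a toy graph, not necessarily the patent's exact procedure. At node u, edge (u, v) gets probability N(v)/N(u), where N(x) counts the paths from x to the output.

```python
from fractions import Fraction

# Toy DAG (an assumption): node -> successors.
GRAPH = {"in": ["a", "b"], "a": ["out"], "b": ["c", "out"], "c": ["out"], "out": []}

def count_paths_to_output(graph, out="out"):
    """Dynamic program: N(x) = number of paths from x to the output node."""
    n = {}
    def count(x):
        if x not in n:
            n[x] = 1 if x == out else sum(count(s) for s in graph[x])
        return n[x]
    for node in graph:
        count(node)
    return n

def init_edge_probs(graph, out="out"):
    """Edge (u, v) gets probability N(v) / N(u), making all paths equally likely."""
    n = count_paths_to_output(graph, out)
    return {(u, v): Fraction(n[v], n[u]) for u in graph for v in graph[u]}

probs = init_edge_probs(GRAPH)
# The toy graph has three paths; e.g. in -> b -> c -> out has probability
# (2/3) * (1/2) * 1 = 1/3, so each path is drawn with equal probability.
```

Exact rationals are used here only to make the uniformity of the construction easy to verify.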
[0079] Step S23 then follows. In this step, the architectures are drawn from the graph as a function of the probabilities of the edges. This is followed by the steps of training the drawn architecture, as well as of transferring the optimized parameters and probabilities resulting from the training into the graph.
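The overall loop of steps S21 to S23 can be sketched as below: provide the graph, initialize edge probabilities, then repeatedly draw an architecture, train it, and shift probability mass toward well-performing edges; the final system is read off as the most probable path. The graph, the reward values, and the multiplicative-style update rule are toy assumptions, standing in for the actual training and probability adaptation.

```python
import random

# Toy graph and probabilities (assumptions).
succ = {"in": ["a", "b"], "a": ["out"], "b": ["out"], "out": []}
prob = {("in", "a"): 0.5, ("in", "b"): 0.5, ("a", "out"): 1.0, ("b", "out"): 1.0}
reward = {"a": 1.0, "b": 0.0}               # stand-in for validation performance

rng = random.Random(0)
for _ in range(200):                        # S23, repeated many times
    node, path = "in", []
    while node != "out":                    # draw a path by edge probability
        weights = [prob[(node, s)] for s in succ[node]]
        nxt = rng.choices(succ[node], weights=weights)[0]
        path.append((node, nxt))
        node = nxt
    r = reward[path[0][1]]                  # "train and evaluate" (stubbed)
    for u, v in path:                       # adapt probabilities of the path's edges
        prob[(u, v)] += 0.01 * r
    for u in succ:                          # renormalize per node
        total = sum(prob[(u, s)] for s in succ[u]) or 1.0
        for s in succ[u]:
            prob[(u, s)] /= total

best = max(succ["in"], key=lambda s: prob[("in", s)])
```

After the loop, probability mass has concentrated on the better-performing branch, from which the final machine learning system is created.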
[0081] Control system 40 receives the sequence of sensor signals S of sensor 30 in an optional receiving unit 50, which converts the sequence of sensor signals S into a sequence of input images x (alternatively, each sensor signal S may also be directly adopted as input image x). Input image x may, for example, be a section or a further processing of sensor signal S. Input image x includes individual frames of a video recording. In other words, input image x is ascertained as a function of sensor signal S. The sequence of input images x is fed to a machine learning system, in the exemplary embodiment to an artificial neural network 60, which has been created, for example, according to the method described above.
[0082] Artificial neural network 60 is preferably parameterized by parameters ϕ, which are stored in a parameter memory St.sub.1 and are provided by the latter.
[0083] Artificial neural network 60 ascertains from input images x output variables y. These output variables y may include, in particular, a classification and semantic segmentation of input images x. Output variables y are fed to an optional forming unit 80, which ascertains therefrom activation signals A, which are fed to actuator 10 in order to activate actuator 10 accordingly. Output variable y includes pieces of information about objects that sensor 30 has detected.
[0084] Actuator 10 receives activation signals A, is activated accordingly and carries out a corresponding action. Actuator 10 in this case may include a (not necessarily structurally integrated) activation logic, which ascertains a second activation signal from activation signal A, with which actuator 10 is then activated.
[0085] In further specific embodiments, control system 40 includes sensor 30. In still further specific embodiments, control system 40 alternatively or additionally includes actuator 10.
[0086] In further preferred specific embodiments, control system 40 includes one or a plurality of processors 45 and at least one machine-readable memory medium 46, on which instructions are stored which, when carried out on processors 45, prompt control system 40 to carry out the method according to the present invention.
[0087] In alternative specific embodiments, a display unit 10a is provided alternatively or in addition to actuator 10.
[0089] Sensor 30 may, for example, be a video sensor preferably situated in motor vehicle 100.
[0090] Artificial neural network 60 is configured to reliably identify objects from input images x.
[0091] Actuator 10, preferably situated in motor vehicle 100, may, for example, be a brake, a drive, or a steering system of motor vehicle 100. Activation signal A may be ascertained in such a way that actuator or actuators 10 is/are activated so that motor vehicle 100 prevents, for example, a collision with the objects reliably identified by artificial neural network 60, in particular if these are objects of particular classes, for example, pedestrians.
[0092] Alternatively, the at least one semi-autonomous robot may also be another mobile robot (not depicted), for example, one which moves by flying, floating, diving or walking. The mobile robot may, for example, also be an at least semi-autonomous lawn mower or an at least semi-autonomous cleaning robot. In these cases as well, activation signal A may be ascertained in such a way that the drive and/or steering of the mobile robot is activated so that the at least semi-autonomous robot prevents, for example, a collision with objects identified by artificial neural network 60.
[0093] Alternatively or in addition, display unit 10a may be activated with activation signal A and, for example, the ascertained safe areas may be shown. It is also possible, for example, in a motor vehicle 100 with non-automated steering, that display unit 10a is activated with activation signal A in such a way that it outputs a visual or acoustic warning signal if it is ascertained that motor vehicle 100 threatens to collide with one of the reliably identified objects.
[0095] Sensor 30 may then, for example, be an optical sensor which detects, for example, properties of manufactured products 12a, 12b. It is possible that these manufactured products 12a, 12b are movable. It is possible that actuator 10 controlling manufacturing machine 11 is activated as a function of an assignment of the detected manufactured products 12a, 12b, so that manufacturing machine 11 accordingly carries out a subsequent processing step on the correct one of manufactured products 12a, 12b. It is also possible that, by identifying the correct properties of the same one of manufactured products 12a, 12b (i.e., without a misclassification), manufacturing machine 11 accordingly adapts the same manufacturing step for processing a subsequent manufactured product.
[0099] Control system 40 ascertains, as a function of the signals of sensor 30, an activation signal A of personal assistant 250, for example, by the neural network carrying out a gesture recognition. This ascertained activation signal A is then conveyed to personal assistant 250, which is thus activated accordingly. The ascertained activation signal A may be selected, in particular, in such a way that it corresponds to an activation presumably desired by user 249. This presumed desired activation may be ascertained as a function of the gesture recognized by artificial neural network 60. Control system 40 may then select activation signal A as a function of the presumed desired activation and convey it to personal assistant 250.
[0100] This corresponding activation may, for example, entail personal assistant 250 retrieving pieces of information from a database and intelligibly reproducing these for user 249.
[0101] Instead of personal assistant 250, a household appliance (not depicted), in particular, a washing machine, a stove, an oven, a microwave or a dishwasher may also be provided in order to be activated accordingly.
[0104] The methods carried out by training system 140, implemented as a computer program, may be stored on a machine-readable memory medium 147 and carried out by a processor 148.
[0105] It is, of course, not necessary to classify whole images. It is possible, for example, for image details to be classified as objects using a detection algorithm, for these image details to then be cut out, and optionally for a new image detail to be generated and used in the associated image instead of the cut-out image detail.
[0106] The term “computer” includes arbitrary devices for processing predefinable calculation rules. These calculation rules may be present in the form of software or in the form of hardware or also in a mixture of software and hardware.