METHOD AND DEVICE FOR CREATING A MACHINE LEARNING SYSTEM INCLUDING A PLURALITY OF OUTPUTS
20230022777 · 2023-01-26
CPC classification: G06V10/255, G06N7/01, G06V20/70, G06V10/84, G06N3/0985 (all under PHYSICS)
Abstract
A method for creating a machine learning system configured for segmentation and object detection. The method includes: providing a directed graph; selecting a path through the graph, at least one additional node being selected from a subset, and the path being selected through the graph from the input node along the edges via the additional node up to the output node, the path initially being drawn as a function of the probabilities of the edges, which define a drawing probability for all architectures within the graph; creating a machine learning system as a function of the selected path; and training the created machine learning system.
Claims
1. A computer-implemented method for creating a machine learning system, which is configured for segmentation and object description, the machine learning system including an input for receiving an image and two outputs, a first output outputting the segmentation of the image and a second output outputting the object description, comprising the following steps: providing a directed graph, the directed graph including an input node, an output node, and a plurality of further nodes, the input node and the output node being connected via the further nodes using directed edges, the input, output, and further nodes representing data and the edges representing operations, which convert a first node of each respective edge into a further node connected to the respective edge, each respective edge of the edges being assigned a probability which characterizes with which probability the respective edge is selected; selecting a path through the graph, a subset of nodes being determined from the plurality of further nodes, all of which satisfy a predefined property with respect to a data resolution, at least one additional node being selected from the subset, which serves as a second output, the path through the graph from the input node along the edges via the additional node up to the output node being selected as a function of the probability assigned to the edges; creating a machine learning system as a function of the selected path and training the created machine learning system, adapted parameters of the trained machine learning system being stored in the corresponding edges of the directed graph and the probabilities of the edges of the path being adapted; multiple repeating of the selecting a path step and the creating and training a machine learning system step; and creating the machine learning system as a function of the directed graph; wherein the probabilities of the edges are set initially to one value, so that all paths through the directed graph are selected with equal probability.
2. The method as recited in claim 1, wherein for each respective node of the subset, a total number of first subpaths from the respective node of the subset up to the input node and a total number of second subpaths from the respective node of the subset up to the output node are counted, the probabilities of those edges contained in the first subpaths are each initially set to a number of possible paths which connect the input node to the respective node of the subset and extend over those edges contained in the first subpaths, divided by the total number of the first subpaths, and the probabilities of those edges contained in the second subpaths are each initially set to a number of possible paths which connect the output node to the respective node of the subset and extend over those edges contained in the second subpaths, divided by the total number of the second subpaths.
3. The method as recited in claim 1, wherein the nodes of the subset, which all satisfy a predefined property with respect to a data resolution, are each also assigned a probability, the probabilities of the nodes of the subset being normalized.
4. The method as recited in claim 3, wherein the probabilities of the nodes of the subset are initially set to a probability that the number of paths is set by the respective node of the subset divided by the total number of paths through the directed graph.
5. The method as recited in claim 3, wherein the probabilities of the nodes of the subset are initially set to a probability that all nodes of the subset are initially selected with equal probability.
6. The method as recited in claim 1, wherein when selecting the path, at least two additional nodes are selected, a path through the graph including at least two paths, each of which extends via one of the additional nodes to the output node, and the two paths from the input node to the additional nodes being created separately from one another starting at the additional nodes up to the input node.
7. The method as recited in claim 1, wherein during training of the machine learning system, a cost function is optimized, the cost function including one first function, which evaluates an efficiency of the machine learning system with respect to its outputs, and includes one second function, which estimates a latency and/or a computer resource consumption of the machine learning system as a function of a length of the path and of the operations of the edges.
8. A non-transitory machine-readable memory element on which is stored a computer program for creating a machine learning system, which is configured for segmentation and object description, the machine learning system including an input for receiving an image and two outputs, a first output outputting the segmentation of the image and a second output outputting the object description, the computer program, when executed by a computer, causing the computer to perform the following steps: providing a directed graph, the directed graph including an input node, an output node, and a plurality of further nodes, the input node and the output node being connected via the further nodes using directed edges, the input, output, and further nodes representing data and the edges representing operations, which convert a first node of each respective edge into a further node connected to the respective edge, each respective edge of the edges being assigned a probability which characterizes with which probability the respective edge is selected; selecting a path through the graph, a subset of nodes being determined from the plurality of further nodes, all of which satisfy a predefined property with respect to a data resolution, at least one additional node being selected from the subset, which serves as a second output, the path through the graph from the input node along the edges via the additional node up to the output node being selected as a function of the probability assigned to the edges; creating a machine learning system as a function of the selected path and training the created machine learning system, adapted parameters of the trained machine learning system being stored in the corresponding edges of the directed graph and the probabilities of the edges of the path being adapted; multiple repeating of the selecting a path step and the creating and training a machine learning system step; and creating the machine learning system as a function of the directed graph; wherein the probabilities of the edges are set initially to one value, so that all paths through the directed graph are selected with equal probability.
9. A device configured to create a machine learning system, which is configured for segmentation and object description, the machine learning system including an input for receiving an image and two outputs, a first output outputting the segmentation of the image and a second output outputting the object description, the device configured to: provide a directed graph, the directed graph including an input node, an output node, and a plurality of further nodes, the input node and the output node being connected via the further nodes using directed edges, the input, output, and further nodes representing data and the edges representing operations, which convert a first node of each respective edge into a further node connected to the respective edge, each respective edge of the edges being assigned a probability which characterizes with which probability the respective edge is selected; select a path through the graph, a subset of nodes being determined from the plurality of further nodes, all of which satisfy a predefined property with respect to a data resolution, at least one additional node being selected from the subset, which serves as a second output, the path through the graph from the input node along the edges via the additional node up to the output node being selected as a function of the probability assigned to the edges; create a machine learning system as a function of the selected path and train the created machine learning system, adapted parameters of the trained machine learning system being stored in the corresponding edges of the directed graph and the probabilities of the edges of the path being adapted; multiple repeating of the selection of the path and the creation and training of a machine learning system; and create the machine learning system as a function of the directed graph; wherein the probabilities of the edges are set initially to one value, so that all paths through the directed graph are selected with equal probability.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0042] Specific embodiments of the present invention are explained in greater detail below with reference to the figures.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0058] In order to find good architectures of deep neural networks for a predefined data set, automatic methods for the architecture search, so-called neural architecture search methods, may be applied. For this purpose, a search space of possible architectures of neural networks is explicitly or implicitly defined.
[0059] The term “operation” is used below in describing a search space; an operation is a calculation rule that converts one or multiple n-dimensional input data tensors into one or multiple output data tensors and may have adaptable parameters in the process. In image processing, frequently used operations are, for example, convolutions with various kernel sizes and different types of convolutions (regular convolution, depthwise-separable convolution), as well as pooling operations.
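To illustrate why both regular and depthwise-separable convolutions typically appear as candidate operations, the sketch below compares their parameter counts using the standard formulas; the concrete operation set and sizes are illustrative assumptions, not taken from the text.

```python
# Parameter counts of two candidate operations often used in image-processing
# search spaces (standard formulas; sizes are illustrative assumptions).

def params_regular_conv(k, c_in, c_out):
    # A regular k x k convolution mixes channels and space jointly.
    return k * k * c_in * c_out

def params_depthwise_separable_conv(k, c_in, c_out):
    # Depthwise k x k filter per input channel, then a 1 x 1 pointwise mix.
    return k * k * c_in + c_in * c_out

print(params_regular_conv(3, 64, 64))              # 36864
print(params_depthwise_separable_conv(3, 64, 64))  # 4672
```

The order-of-magnitude difference in parameters is one reason an architecture search trades off such operations against one another.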
[0060] Furthermore, a calculation graph (the so-called One-Shot model) will be defined, which includes all architectures in the search space as subgraphs. Since the One-Shot model may be very large, individual architectures may be drawn from the One-Shot model for the training. This occurs typically by drawing individual paths from an established input node to an established output node of the network.
[0061] In the simplest case, when the calculation graph is made up of a chain of nodes, each pair of which may be connected via various operations, it is sufficient to draw, for each two consecutive nodes, the operation which connects them.
[0062] If the One-Shot model is, more generally, a directed graph, a path may be drawn iteratively: starting at the input, the next node and the connecting operation are drawn, and this procedure is continued iteratively up to the target node.
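The iterative drawing of a path through a directed graph can be sketched as follows; the toy graph, operation names, and edge probabilities are illustrative assumptions.

```python
import random

# One-Shot model as a DAG: node -> list of (successor, operation, probability).
# The concrete graph and probabilities are toy assumptions.
GRAPH = {
    "in":  [("a", "conv3x3", 0.5), ("b", "conv5x5", 0.5)],
    "a":   [("out", "pool", 1.0)],
    "b":   [("out", "conv3x3", 1.0)],
    "out": [],
}

def draw_path(graph, start="in", target="out", rng=random):
    """Iteratively draw a path: at each node, sample an outgoing edge
    according to the edge probabilities until the target node is reached."""
    path, node = [], start
    while node != target:
        edges = graph[node]
        probs = [p for _, _, p in edges]
        nxt, op, _ = rng.choices(edges, weights=probs)[0]
        path.append((node, nxt, op))
        node = nxt
    return path

path = draw_path(GRAPH)
```

Each drawn path corresponds to one architecture contained in the One-Shot model as a subgraph.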
[0063] The One-Shot model may then be trained with drawing, by drawing an architecture for each mini-batch and adapting the weights of the operations in the drawn architecture with the aid of a standard gradient step method. Finding the best architecture may take place either as a separate step after the training of the weights, or alternatingly with the training of the weights.
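The per-mini-batch training with shared weights can be sketched as below; the graph is a toy assumption and the gradient step is stubbed out by a dummy increment, standing in for a real backpropagation update.

```python
import random

# Shared weights live on the edges of the One-Shot model; each mini-batch
# trains only the edges of the currently drawn path. The weight update is a
# stub standing in for -lr * gradient.
edge_weight = {("in", "a"): 0.0, ("in", "b"): 0.0,
               ("a", "out"): 0.0, ("b", "out"): 0.0}
edge_prob = {("in", "a"): 0.5, ("in", "b"): 0.5,
             ("a", "out"): 1.0, ("b", "out"): 1.0}
succ = {"in": ["a", "b"], "a": ["out"], "b": ["out"], "out": []}

def sample_path(rng=random):
    node, path = "in", []
    while node != "out":
        weights = [edge_prob[(node, s)] for s in succ[node]]
        nxt = rng.choices(succ[node], weights=weights)[0]
        path.append((node, nxt))
        node = nxt
    return path

for _ in range(10):              # one iteration per mini-batch
    arch = sample_path()         # draw an architecture for this mini-batch
    for edge in arch:            # adapt only the weights on the drawn path
        edge_weight[edge] += 1.0
```

Because every architecture shares the weights stored on the graph's edges, training one drawn path also improves all other architectures that reuse those edges.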
[0064] In order to draw architectures from a One-Shot model, which has branches and multiple outputs, a sampling model may be used in one specific embodiment for paths in the opposite direction. For this purpose, one path may be drawn for each output of the One-Shot model, which leads, starting from the output, to the input of the One-Shot model. For drawing the paths, the transposed One-Shot model may be considered for this purpose, in which all directed edges point in the opposite direction as in the original One-Shot model.
[0065] Once the first path has been drawn, it may happen that the next path reaches a node of the previous path. In this case, the drawing of the current path may be terminated, since a path from the shared node to the input already exists. Alternatively, it is possible to nevertheless continue drawing the path and thus possibly obtain a second path to the input node.
[0066] In addition, the case is to be considered that the drawn architectures include one or multiple node(s) of the One-Shot model, which is/are not situated at the full depth of the network and is/are referred to below as NOI (“Nodes of Interest”), as well as an output at full depth of the One-Shot model. In this case, the creation of the path may take place by a back-directed drawing for the NOIs in order to connect these to the input. In addition, a forward-directed drawing is also carried out for each NOI, which leads to the output of the One-Shot model. As in the case of the back-directed drawing, the drawing in the case of the forward-directed drawing may be stopped once a path is reached that already leads to the output.
[0067] As an alternative to the back-directed drawing, a purely forward-directed drawing may take place by drawing, for each NOI, a path from the input to the corresponding NOI. This is achieved by carrying out the drawing only on the subgraph made up of all nodes that lie on a path from the input of the network to the current NOI, as well as all edges of the One-Shot model between these nodes.
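The back-directed drawing on the transposed model, with early termination once an already drawn path is reached, can be sketched as follows; the toy graph is an assumption, and uniform edge choice stands in for the learned edge probabilities.

```python
import random

# Forward One-Shot graph (toy assumption): node -> successors.
FWD = {"in": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["out"], "out": []}

def transpose(graph):
    """Build G_t: all directed edges point in the opposite direction."""
    t = {n: [] for n in graph}
    for u, succs in graph.items():
        for v in succs:
            t[v].append(u)
    return t

def draw_backward(graph_t, noi, visited, rng=random):
    """Draw a path from an NOI toward the input on the transposed graph;
    stop early when a node of an already drawn path is reached, since a
    connection to the input exists from there."""
    path, node = [], noi
    while node != "in" and node not in visited:
        nxt = rng.choice(graph_t[node])   # uniform stand-in for edge probabilities
        path.append((nxt, node))          # record the edge in forward orientation
        node = nxt
    return path

GT = transpose(FWD)
first = draw_backward(GT, "c", visited=set())
visited = {u for edge in first for u in edge}
second = draw_backward(GT, "a", visited)  # may stop as soon as it meets `first`
```

The same stopping rule applies symmetrically to the forward-directed drawing from an NOI toward the output.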
[0068] One exemplary embodiment is a multi-task network for object detection and semantic segmentation. The NOIs in this case are nodes at which an object classification output (detection head or object detection head) may be attached. In addition, one more output is used for the semantic segmentation at the output at the full depth of the network.
[0069] The automatic architecture search initially requires the creation of a search space (step S21).
[0070] For each node in G, a probability distribution across the outgoing edges is defined. For each node and for each path, preferably for each set of NOIs, a separate probability distribution may be defined within one architecture. This means that the different paths within one architecture use different probabilities. In addition, the transposed One-Shot model G_t is considered, which has the same nodes, but all directed edges point in the reverse direction. On G_t, a probability distribution across the outgoing edges is also introduced for each node (this corresponds to a probability distribution across the incoming edges in G).
[0071] For the back-directed drawing, a path is drawn in G_t for the first NOI (23).
[0073] With each drawing of an architecture, the NOIs may vary, since these NOIs may also be randomly drawn.
[0074] Based on graph G, an artificial neural network 60 (depicted in the figures) may be created.
[0077] The method may start with step S21, in which graph G is provided.
[0078] This is followed by step S22. In this step, the probabilities of the edges are initialized as explained above.
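One way to initialize the edge probabilities so that every complete input-to-output path is drawn with equal probability is the following path-counting construction; this is a sketch of one standard construction consistent with the stated goal, on a toy graph, not necessarily the patent's exact procedure. At node u, edge (u, v) gets probability N(v)/N(u), where N(x) counts the paths from x to the output.

```python
from fractions import Fraction

# Toy DAG (an assumption): node -> successors.
GRAPH = {"in": ["a", "b"], "a": ["out"], "b": ["c", "out"], "c": ["out"], "out": []}

def count_paths_to_output(graph, out="out"):
    """Dynamic program: N(x) = number of paths from x to the output node."""
    n = {}
    def count(x):
        if x not in n:
            n[x] = 1 if x == out else sum(count(s) for s in graph[x])
        return n[x]
    for node in graph:
        count(node)
    return n

def init_edge_probs(graph, out="out"):
    """Edge (u, v) gets probability N(v) / N(u), making all paths equally likely."""
    n = count_paths_to_output(graph, out)
    return {(u, v): Fraction(n[v], n[u]) for u in graph for v in graph[u]}

probs = init_edge_probs(GRAPH)
# The toy graph has three paths; e.g. in -> b -> c -> out has probability
# (2/3) * (1/2) * 1 = 1/3, so each path is drawn with equal probability.
```

Exact rationals are used here only to make the uniformity of the construction easy to verify.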
[0079] Step S23 then follows. In this step, the architectures are drawn from the graph as a function of the probabilities of the edges. This is followed by the steps of training the drawn architecture, as well as of transferring the optimized parameters and probabilities resulting from the training into the graph.
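The overall loop of steps S21 to S23 can be sketched as below: provide the graph, initialize edge probabilities, then repeatedly draw an architecture, train it, and shift probability mass toward well-performing edges; the final system is read off as the most probable path. The graph, the reward values, and the multiplicative-style update rule are toy assumptions, standing in for the actual training and probability adaptation.

```python
import random

# Toy graph and probabilities (assumptions).
succ = {"in": ["a", "b"], "a": ["out"], "b": ["out"], "out": []}
prob = {("in", "a"): 0.5, ("in", "b"): 0.5, ("a", "out"): 1.0, ("b", "out"): 1.0}
reward = {"a": 1.0, "b": 0.0}               # stand-in for validation performance

rng = random.Random(0)
for _ in range(200):                        # S23, repeated many times
    node, path = "in", []
    while node != "out":                    # draw a path by edge probability
        weights = [prob[(node, s)] for s in succ[node]]
        nxt = rng.choices(succ[node], weights=weights)[0]
        path.append((node, nxt))
        node = nxt
    r = reward[path[0][1]]                  # "train and evaluate" (stubbed)
    for u, v in path:                       # adapt probabilities of the path's edges
        prob[(u, v)] += 0.01 * r
    for u in succ:                          # renormalize per node
        total = sum(prob[(u, s)] for s in succ[u]) or 1.0
        for s in succ[u]:
            prob[(u, s)] /= total

best = max(succ["in"], key=lambda s: prob[("in", s)])
```

After the loop, probability mass has concentrated on the better-performing branch, from which the final machine learning system is created.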
[0081] Control system 40 receives the sequence of sensor signals S of sensor 30 in an optional receiving unit 50, which converts the sequence of sensor signals S into a sequence of input images x (alternatively, each sensor signal S may also be directly adopted as input image x). Input image x may, for example, be a section or a further processing of sensor signal S. Input image x includes individual frames of a video recording. In other words, input image x is ascertained as a function of sensor signal S. The sequence of input images x is fed to a machine learning system, in the exemplary embodiment to an artificial neural network 60, which has been created, for example, according to the method described above.
[0082] Artificial neural network 60 is preferably parameterized by parameters ϕ, which are stored in a parameter memory St.sub.1 and are provided by the latter.
[0083] Artificial neural network 60 ascertains from input images x output variables y. These output variables y may include, in particular, a classification and semantic segmentation of input images x. Output variables y are fed to an optional forming unit 80, which ascertains therefrom activation signals A, which are fed to actuator 10 in order to activate actuator 10 accordingly. Output variable y includes pieces of information about objects that sensor 30 has detected.
[0084] Actuator 10 receives activation signals A, is activated accordingly and carries out a corresponding action. Actuator 10 in this case may include a (not necessarily structurally integrated) activation logic, which ascertains a second activation signal from activation signal A, with which actuator 10 is then activated.
[0085] In further specific embodiments, control system 40 includes sensor 30. In still further specific embodiments, control system 40 alternatively or additionally includes actuator 10.
[0086] In further preferred specific embodiments, control system 40 includes one or a plurality of processors 45 and at least one machine-readable memory medium 46, on which instructions are stored which, when carried out on processors 45, prompt control system 40 to carry out the method according to the present invention.
[0087] In alternative specific embodiments, a display unit 10a is provided alternatively or in addition to actuator 10.
[0089] Sensor 30 may, for example, be a video sensor preferably situated in motor vehicle 100.
[0090] Artificial neural network 60 is configured to reliably identify objects from input images x.
[0091] Actuator 10, preferably situated in motor vehicle 100, may, for example, be a brake, a drive, or a steering system of motor vehicle 100. Activation signal A may be ascertained in such a way that actuator or actuators 10 is/are activated so that motor vehicle 100 prevents, for example, a collision with the objects reliably identified by artificial neural network 60, in particular if these are objects of particular classes, for example, pedestrians.
[0092] Alternatively, the at least one semi-autonomous robot may also be another mobile robot (not depicted), for example, one which moves by flying, floating, diving or walking. The mobile robot may, for example, also be an at least semi-autonomous lawn mower or an at least semi-autonomous cleaning robot. In these cases as well, activation signal A may be ascertained in such a way that the drive and/or steering of the mobile robot is activated so that the at least semi-autonomous robot prevents, for example, a collision with objects identified by artificial neural network 60.
[0093] Alternatively or in addition, display unit 10a may be activated with activation signal A and, for example, the ascertained safe areas may be shown. It is also possible, for example, in a motor vehicle 100 with non-automated steering, that display unit 10a is activated with activation signal A in such a way that it outputs a visual or acoustic warning signal if it is ascertained that motor vehicle 100 threatens to collide with one of the reliably identified objects.
[0095] Sensor 30 may then, for example, be an optical sensor which detects, for example, properties of manufactured products 12a, 12b. It is possible that these manufactured products 12a, 12b are movable. It is possible that actuator 10 controlling manufacturing machine 11 is activated as a function of an assignment of the detected manufactured products 12a, 12b, so that manufacturing machine 11 accordingly carries out a subsequent processing step on the correct one of manufactured products 12a, 12b. It is also possible that, by identifying the correct properties of the same one of manufactured products 12a, 12b (i.e., without a misclassification), manufacturing machine 11 accordingly adapts the same manufacturing step for processing a subsequent manufactured product.
[0099] Control system 40 ascertains, as a function of the signals of sensor 30, an activation signal A of personal assistant 250, for example, by the neural network carrying out a gesture recognition. This ascertained activation signal A is then conveyed to personal assistant 250, which is thus activated accordingly. The ascertained activation signal A may be selected, in particular, in such a way that it corresponds to an activation presumably desired by user 249. This presumed desired activation may be ascertained as a function of the gesture recognized by artificial neural network 60. Control system 40 may then select activation signal A as a function of the presumed desired activation and convey it to personal assistant 250.
[0100] This corresponding activation may, for example, entail personal assistant 250 retrieving pieces of information from a database and intelligibly reproducing these for user 249.
[0101] Instead of personal assistant 250, a household appliance (not depicted), in particular, a washing machine, a stove, an oven, a microwave or a dishwasher may also be provided in order to be activated accordingly.
[0104] The methods carried out by training system 140, implemented as a computer program, may be stored on a machine-readable memory medium 147 and carried out by a processor 148.
[0105] It is, of course, not necessary to classify whole images. It is possible, for example, for image details to be classified as objects using a detection algorithm, for these image details to then be cut out, and optionally for a new image detail to be generated and used in the associated image instead of the cut-out image detail.
[0106] The term “computer” includes arbitrary devices for processing predefinable calculation rules. These calculation rules may be present in the form of software or in the form of hardware or also in a mixture of software and hardware.