METHOD AND DEVICE FOR CREATING A MACHINE LEARNING SYSTEM
20220004806 · 2022-01-06
CPC classification
G06F18/217
G06V20/58
G06V10/87
G06F18/285
G06V30/18181
Abstract
A method for creating a machine learning system which is designed for segmentation and object detection in images. The method includes: providing a directed graph; selecting a path through the graph, at least one additional node being selected from this subset, a path through the graph from the input node along the edges via the additional node up to the output node being selected; creating a machine learning system as a function of the selected path; and training the machine learning system created.
Claims
1. A computer-implemented method for creating a machine learning system that is configured for segmentation and object detection in images, the machine learning system having one input for receiving an image and two outputs, a first output of the two outputs outputting the segmentation of the image and a second output of the two outputs outputting the object detection, the method comprising the following steps: providing a directed graph, the directed graph having an input node, an output node, and a number of further nodes, the output node being connected to the input node via the further nodes using directed edges, the nodes representing data and the edges representing operations that define a calculation rule and transfer the data of a first node of an edge to the further nodes connected to the respective edge; selecting a path through the graph, including: from the number of further nodes, a subset is determined, all of whose nodes satisfy a predetermined characteristic with respect to data resolution, from the subset, at least one additional node is selected which is used as output for the object detection, and the selected path is a path through the graph from the input node along the edges via the additional node up to the output node; creating a machine learning system as a function of the selected path; training the created machine learning system, adapted parameters of the machine learning system being stored in corresponding edges of the directed graph; repeating the step of selecting a path; and creating the machine learning system as a function of the directed graph.
2. The method as recited in claim 1, wherein at least two additional nodes are selected, the path through the graph has at least two routes, each of which runs via one of the additional nodes to the output node, and the two routes being created independently of each other, beginning at the additional nodes and extending up to the input node.
3. The method as recited in claim 2, wherein, when a second route of the two routes encounters a first route of the two routes, a remaining portion of the first route is used for the second route.
4. The method as recited in claim 3, wherein starting from the additional nodes, further routes are created up to the output node, the first and second route and the further routes yielding the path.
5. The method as recited in claim 4, wherein further routes are drawn independently of each other, and when the further routes meet, then a route already drawn continues to be used.
6. The method as recited in claim 1, wherein during the training of the machine learning system, a cost function is optimized, the cost function having a first function which assesses a performance capability of the machine learning system in terms of segmentation and object detection, and a second function which estimates a latency of the machine learning system based on a length of the path and the operations of the edges.
7. A non-transitory machine-readable storage medium on which is stored a computer program for creating a machine learning system that is configured for segmentation and object detection in images, the machine learning system having one input for receiving an image and two outputs, a first output of the two outputs outputting the segmentation of the image and a second output of the two outputs outputting the object detection, the computer program, when executed by a computer, causing the computer to perform the following steps: providing a directed graph, the directed graph having an input node, an output node, and a number of further nodes, the output node being connected to the input node via the further nodes using directed edges, the nodes representing data and the edges representing operations that define a calculation rule and transfer the data of a first node of an edge to the further nodes connected to the respective edge; selecting a path through the graph, including: from the number of further nodes, a subset is determined, all of whose nodes satisfy a predetermined characteristic with respect to data resolution, from the subset, at least one additional node is selected which is used as output for the object detection, and the selected path is a path through the graph from the input node along the edges via the additional node up to the output node; creating a machine learning system as a function of the selected path; training the created machine learning system, adapted parameters of the machine learning system being stored in corresponding edges of the directed graph; repeating the step of selecting a path; and creating the machine learning system as a function of the directed graph.
8. A device configured to create a machine learning system that is configured for segmentation and object detection in images, the machine learning system having one input for receiving an image and two outputs, a first output of the two outputs outputting the segmentation of the image and a second output of the two outputs outputting the object detection, the device configured to: provide a directed graph, the directed graph having an input node, an output node, and a number of further nodes, the output node being connected to the input node via the further nodes using directed edges, the nodes representing data and the edges representing operations that define a calculation rule and transfer the data of a first node of an edge to the further nodes connected to the respective edge; select a path through the graph, including: from the number of further nodes, a subset is determined, all of whose nodes satisfy a predetermined characteristic with respect to data resolution, from the subset, at least one additional node is selected which is used as output for the object detection, and the selected path is a path through the graph from the input node along the edges via the additional node up to the output node; create a machine learning system as a function of the selected path; train the created machine learning system, adapted parameters of the machine learning system being stored in corresponding edges of the directed graph; repeat the selection of a path; and create the machine learning system as a function of the directed graph.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0041] In order to find good architectures of deep neural networks for a given data set, automatic methods, commonly known as neural architecture search methods, may be used. To that end, a search space of possible architectures of neural networks is defined explicitly or implicitly.
[0042] To describe a search space, hereinafter the term operation shall be used, which describes a calculation rule that transfers one or more n-dimensional input-data tensors to one or more output-data tensors, and in this context, may have adaptable parameters. For example, convolutions with different kernel sizes and different types of convolutions (regular convolution, depth-wise separable convolution) and pooling operations are often used as operations in the processing of images.
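The notion of an operation as a parameterizable calculation rule can be illustrated with a minimal sketch. The toy 1-D "tensors", the fixed kernel weights, and the candidate-operation names below are illustrative assumptions; the patent itself prescribes no concrete operations or interfaces.

```python
# Sketch of search-space operations: calculation rules that map an input
# tensor to an output tensor and may carry adaptable parameters.
# Toy 1-D lists stand in for n-dimensional tensors.

def conv3(x, weights):
    """Toy 1-D convolution with kernel size 3 (zero padding, stride 1)."""
    padded = [0.0] + list(x) + [0.0]
    return [sum(w * padded[i + k] for k, w in enumerate(weights))
            for i in range(len(x))]

def max_pool2(x):
    """Toy max pooling with window 2 and stride 2."""
    return [max(x[i:i + 2]) for i in range(0, len(x) - 1, 2)]

# Candidate operations that could label an edge of the one-shot model.
CANDIDATE_OPS = {
    "conv3": lambda x: conv3(x, weights=[0.25, 0.5, 0.25]),
    "max_pool2": max_pool2,
    "identity": lambda x: list(x),
}
```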
[0043] In the following, a calculation graph (the so-called one-shot model) shall also be defined, which contains all architectures in the search space as subgraphs. Since the one-shot model may be very large, individual architectures may be drawn from the one-shot model for the training. Typically, this is done by drawing individual paths from a defined input node to a defined output node of the network.
[0044] In the simplest case, if the calculation graph is made up of a chain of nodes that are able to be connected in each case via various operations, then for each two successive nodes, it is sufficient to draw the operation which connects them.
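The chain case described above can be sketched as follows; the uniform random choice and the list-of-candidates encoding are assumptions made for illustration only.

```python
import random

# Sketch of drawing in the simplest (chain) case: for each pair of successive
# nodes, one connecting operation is drawn from the candidates for that edge.

def draw_chain_architecture(num_nodes, ops_per_edge, rng=random):
    """Return one drawn operation name per pair of successive nodes."""
    return [rng.choice(ops_per_edge[i]) for i in range(num_nodes - 1)]
```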
[0045] If the one-shot model is more generally a directed graph, a path may be drawn iteratively by beginning at the input, then drawing the next node and the connecting operation, and then continuing this procedure iteratively up to the destination node.
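The iterative forward drawing may be sketched as below; the adjacency-dict graph encoding (node mapped to a list of (successor, operation) pairs) is an illustrative assumption, not the patent's data structure.

```python
import random

# Sketch of iterative path drawing on a general directed one-shot graph:
# start at the input node and repeatedly draw an outgoing edge (together with
# its operation) until the destination node is reached.

def draw_path(graph, source, destination, rng=random):
    node, path = source, [source]
    while node != destination:
        node, _op = rng.choice(graph[node])  # draw next node and its operation
        path.append(node)
    return path
```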
[0046] The one-shot model may then be trained with drawing by sampling an architecture for each minibatch and adjusting the weights of the operations in the drawn architecture using a standard gradient-step method. Finding the best architecture may be carried out either as a separate step after the training of the weights, or alternately with the training of the weights.
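The per-minibatch training scheme can be sketched schematically. Here `draw_architecture`, `gradient_step`, the edge-keyed weight table, and the batch format are placeholder assumptions, not interfaces defined by the patent.

```python
# Sketch of one-shot training: per minibatch, one architecture (path) is
# drawn and only the weights of the drawn operations receive a gradient step.

def train_one_shot(one_shot_weights, minibatches, draw_architecture, gradient_step):
    for batch in minibatches:
        path = draw_architecture()                # draw a subnetwork for this batch
        for edge in path:                         # update only the drawn edges
            one_shot_weights[edge] = gradient_step(one_shot_weights[edge], batch)
    return one_shot_weights
```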
[0047] In order to draw architectures from a one-shot model which have branches and several outputs, in one specific embodiment, a sampling model for paths in the reverse direction may be used. To that end, for each output of the one-shot model, a path may be drawn which, beginning from the output, leads to the input of the one-shot model. To draw the paths, the transposed one-shot model may be considered, in which all directed edges point in the direction opposite of that in the original one-shot model.
[0048] As soon as the first path has been drawn, a subsequently drawn path may reach a node of the previous path. In this case, the drawing of the current path may be terminated, since a path from the shared node to the input already exists. Alternatively, the path may nevertheless be drawn further, possibly yielding a second path to the input node.
[0049] In addition, the case shall be considered where the architectures drawn contain one or more nodes of the one-shot model which do not lie at full depth of the network and hereinafter are called NOI (nodes of interest), as well as an output at full depth of the one-shot model. In this case, the path may be created by a backwards-directed drawing for the NOIs in order to connect them to the input. Furthermore, a forwards-directed drawing is also carried out for each NOI, which leads to the output of the one-shot model. As in the case of the backwards-directed drawing, in the case of the forwards-directed drawing, the drawing may be discontinued as soon as a path is reached which already leads to the output.
[0050] As an alternative to the backwards-directed drawing, a purely forwards-directed drawing may be carried out, in that for each NOI, a path is drawn from the input to the corresponding NOI. This is achieved by carrying out the drawing only on the subgraph made up of all nodes that lie on a path from the input of the network to the current NOI, together with all edges of the one-shot model between these nodes.
[0051] One exemplary embodiment is a multitask network for object detection and semantic segmentation. In this case, the NOIs are nodes to which an object detection head is attached. Moreover, at the output at full depth of the network, in addition an output for the semantic segmentation is used.
[0052] A specific embodiment of the present invention is described in the following:
[0053] The automatic architecture search first of all requires the creation of a search space (S21).
[0054] For each node in G, a probability distribution over the outgoing edges is defined. In addition, the transposed one-shot model G_t is considered, which has the same nodes but whose directed edges all point in the reverse direction. A probability distribution over the outgoing edges is introduced for each node in G_t as well (this corresponds to a probability distribution over the incoming edges in G).
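One way to realize the per-node distributions is a softmax over one adaptable logit per outgoing edge. The softmax parameterization is an assumption for illustration; the patent only requires that some distribution over the outgoing edges be defined for each node.

```python
import math
import random

# Sketch of the per-node sampling model: each node holds one adaptable logit
# per outgoing edge; a softmax over the logits gives the distribution from
# which the next edge is drawn.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def draw_edge(successors, logits, rng=random):
    """Draw one outgoing edge according to the node's distribution."""
    return rng.choices(successors, weights=softmax(logits), k=1)[0]
```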
[0055] For the backwards-directed drawing, a path is drawn in G_t for the first NOI (S22).
[0057] The NOIs may differ for each drawing of an architecture, since the probability distributions for backwards-directed and forwards-directed drawing are defined separately for all nodes.
[0058] An artificial neural network 60 (shown in
[0060] Control system 40 receives the sequence of sensor signals S of sensor 30 in an optional receiving unit 50, which converts the sequence of sensor signals S into a sequence of input images x (alternatively, in each case sensor signal S may also be accepted directly as input image x). For example, input image x may be a section of, or a further processing of, sensor signal S. Input image x includes individual frames of a video recording. In other words, input image x is determined as a function of sensor signal S. The sequence of input images x is supplied to a machine learning system, an artificial neural network 60 in the exemplary embodiment.
[0061] By preference, artificial neural network 60 is parameterized by parameters ϕ, which are stored in a parameter memory P that makes them available.
[0062] Artificial neural network 60 determines output quantities y from input images x. In particular, these output quantities y may include a classification and semantic segmentation of input images x. Output quantities y are fed to an optional conversion unit 80, which from them, determines control signals A that are supplied to actuator 10 in order to drive actuator 10 accordingly. Output quantity y includes information about objects which sensor 30 has detected.
[0063] Control system 40 also includes a monitoring unit 61 for monitoring the functioning of artificial neural network 60. Input image x is supplied to monitoring unit 61, as well. As a function thereof, monitoring unit 61 determines a monitoring signal d, which likewise is fed to conversion unit 80. Control signal A is determined as a function of monitoring signal d.
[0064] Monitoring signal d characterizes whether or not neural network 60 is determining output quantities y reliably. If monitoring signal d characterizes an unreliability, then, for example, control signal A may be determined according to a protected operating mode (while otherwise, it is determined in a normal operating mode). For example, the protected operating mode may include that a dynamic of actuator 10 is reduced, or that functionalities for driving actuator 10 are switched off.
[0065] Actuator 10 receives control signals A, is driven accordingly and carries out a corresponding action. In this case, actuator 10 may include a (not necessarily structurally integrated) control logic, which from control signal A, determines a second control signal with which actuator 10 is then controlled.
[0066] In further specific embodiments, control system 40 contains sensor 30. In other specific embodiments, control system 40 alternatively or additionally includes actuator 10, as well.
[0067] In further preferred specific embodiments, control system 40 includes one or more processors 45 and at least one machine-readable storage medium 46 on which instructions are stored which, when executed by processors 45, prompt control system 40 to carry out the method according to the present invention.
[0068] In alternative specific embodiments, alternatively or in addition to actuator 10, a display unit 10a is provided.
[0070] For example, sensor 30 may be a video sensor disposed preferably in motor vehicle 100.
[0071] Artificial neural network 60 is designed to reliably identify objects from input images x.
[0072] For example, actuator 10, disposed preferably in motor vehicle 100, may be a brake, a drive, or a steering system of motor vehicle 100. Control signal A may then be ascertained in such a way that actuator or actuators 10 is/are controlled so that, for example, motor vehicle 100 prevents a collision with the objects reliably identified by artificial neural network 60, especially if they are objects of certain classes, e.g., pedestrians.
[0073] Alternatively, the at least semi-autonomous robot may also be another mobile robot (not shown), for example, one which moves by flying, swimming, submerging or stepping. For instance, the mobile robot may also be an at least semi-autonomous lawn mower or an at least semi-autonomous cleaning robot. In these cases, as well, control signal A may be determined in a manner that the drive and/or steering of the mobile robot is/are controlled in such a way that, e.g., the at least semi-autonomous robot prevents a collision with objects identified by artificial neural network 60.
[0074] Alternatively or additionally, display unit 10a may be controlled by control signal A and, e.g., the ascertained safe areas are displayed. In the case of a motor vehicle 100 without automated steering, for instance, it is also possible for display unit 10a to be controlled by control signal A in such a way that it outputs a visual or acoustic warning signal if it is determined that motor vehicle 100 is in danger of colliding with one of the reliably identified objects.
[0076] As an example, sensor 30 may then be an optical sensor which, e.g., detects properties of manufacturing articles 12a, 12b. It is possible that these manufacturing articles 12a, 12b are movable. It is possible that actuator 10 controlling manufacturing machine 11 is driven as a function of an assignment of the detected manufacturing articles 12a, 12b, so that manufacturing machine 11 accordingly executes a subsequent processing step on the correct manufacturing article 12a, 12b. It is also possible that, by identification of the correct properties of the same one of manufacturing articles 12a, 12b (that is, without an incorrect assignment), manufacturing machine 11 adjusts the same manufacturing step accordingly for processing a following manufacturing article.
[0080] Depending on the signals of sensor 30, control system 40 determines a control signal A of personal assistant 250, for example, in that the neural network implements a gesture recognition and identification. This determined control signal A is then transmitted to personal assistant 250, thus controlling it accordingly. In particular, this ascertained control signal A may be selected in such a way that it corresponds to a control presumed to be desired by user 249. This presumed desired control may be ascertained as a function of the gesture recognized by artificial neural network 60. Depending on the presumed desired control, control system 40 may then select control signal A for transmission to personal assistant 250 in accordance with the presumed desired control.
[0081] For example, this corresponding control may include that personal assistant 250 retrieves information from a database and renders it in a manner apprehensible for user 249.
[0082] Instead of personal assistant 250, a household appliance (not shown) may also be provided, particularly a washing machine, a range, a baking oven, a microwave or a dishwasher, in order to be controlled accordingly.
[0085] The methods carried out by training system 140 may be implemented as a computer program stored on a machine-readable storage medium 147 and executed by a processor 148.
[0086] Of course, whole images do not have to be classified. It is possible that using a detection algorithm, for example, image sections may be classified as objects, these image sections may then be cut out, and a new image section may be generated if desired and inserted into the associated image in place of the cut-out image section.
[0087] The term “computer” includes any devices for processing predefinable calculation instructions. These calculation instructions may exist in the form of software, or in the form of hardware, or in a mixed form of software and hardware.