METHOD AND DEVICE FOR CREATING A MACHINE LEARNING SYSTEM
20220004806 · 2022-01-06
CPC classification
G06F18/217
G06V20/58
G06V10/87
G06F18/285
G06V30/18181
Abstract
A method for creating a machine learning system which is designed for segmentation and object detection in images. The method includes: providing a directed graph; selecting a path through the graph, at least one additional node being selected from this subset, a path through the graph from the input node along the edges via the additional node up to the output node being selected; creating a machine learning system as a function of the selected path; and training the machine learning system created.
Claims
1. A computer-implemented method for creating a machine learning system that is configured for segmentation and object detection in images, the machine learning system having one input for receiving an image and two outputs, a first output of the two outputs outputting the segmentation of the image and a second output of the two outputs outputting the object detection, the method comprising the following steps: providing a directed graph, the directed graph having an input node, an output node, and a number of further nodes, the output node being connected to the input node via the further nodes using directed edges, the nodes representing data and the edges representing operations that define a calculation rule and transfer the data of a first node of an edge to the further nodes connected to the respective edge; selecting a path through the graph, including: from the number of further nodes, a subset is determined, all of whose nodes satisfy a predetermined characteristic with respect to data resolution, from the subset, at least one additional node is selected which is used as output for the object detection, and the selected path is a path through the graph from the input node along the edges via the additional node up to the output node; creating a machine learning system as a function of the selected path; training the created machine learning system, adapted parameters of the machine learning system being stored in corresponding edges of the directed graph; repeating the step of selecting a path; and creating the machine learning system as a function of the directed graph.
2. The method as recited in claim 1, wherein at least two additional nodes are selected, the path through the graph has at least two routes, each of which runs via one of the additional nodes to the output node, and the two routes being created independently of each other, beginning at the additional nodes and extending up to the input node.
3. The method as recited in claim 2, wherein, when a second route of the two routes encounters a first route of the two routes, a remaining portion of the first route is used for the second route.
4. The method as recited in claim 3, wherein starting from the additional nodes, further routes are created up to the output node, the first and second route and the further routes yielding the path.
5. The method as recited in claim 4, wherein further routes are drawn independently of each other, and when the further routes meet, then a route already drawn continues to be used.
6. The method as recited in claim 1, wherein during the training of the machine learning system, a cost function is optimized, the cost function having a first function which assesses a performance capability of the machine learning system in terms of segmentation and object detection, and a second function which estimates a latency of the machine learning system based on a length of the path and the operations of the edges.
7. A non-transitory machine-readable storage medium on which is stored a computer program for creating a machine learning system that is configured for segmentation and object detection in images, the machine learning system having one input for receiving an image and two outputs, a first output of the two outputs outputting the segmentation of the image and a second output of the two outputs outputting the object detection, the computer program, when executed by a computer, causing the computer to perform the following steps: providing a directed graph, the directed graph having an input node, an output node, and a number of further nodes, the output node being connected to the input node via the further nodes using directed edges, the nodes representing data and the edges representing operations that define a calculation rule and transfer the data of a first node of an edge to the further nodes connected to the respective edge; selecting a path through the graph, including: from the number of further nodes, a subset is determined, all of whose nodes satisfy a predetermined characteristic with respect to data resolution, from the subset, at least one additional node is selected which is used as output for the object detection, and the selected path is a path through the graph from the input node along the edges via the additional node up to the output node; creating a machine learning system as a function of the selected path; training the created machine learning system, adapted parameters of the machine learning system being stored in corresponding edges of the directed graph; repeating the step of selecting a path; and creating the machine learning system as a function of the directed graph.
8. A device configured to create a machine learning system that is configured for segmentation and object detection in images, the machine learning system having one input for receiving an image and two outputs, a first output of the two outputs outputting the segmentation of the image and a second output of the two outputs outputting the object detection, the device configured to: provide a directed graph, the directed graph having an input node, an output node, and a number of further nodes, the output node being connected to the input node via the further nodes using directed edges, the nodes representing data and the edges representing operations that define a calculation rule and transfer the data of a first node of an edge to the further nodes connected to the respective edge; select a path through the graph, including: from the number of further nodes, a subset is determined, all of whose nodes satisfy a predetermined characteristic with respect to data resolution, from the subset, at least one additional node is selected which is used as output for the object detection, and the selected path is a path through the graph from the input node along the edges via the additional node up to the output node; create a machine learning system as a function of the selected path; train the created machine learning system, adapted parameters of the machine learning system being stored in corresponding edges of the directed graph; repeat the selection of a path; and create the machine learning system as a function of the directed graph.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0041] In order to find good architectures of deep neural networks for a given data set, automatic methods, commonly known as neural architecture search methods, may be used. To that end, a search space of possible architectures of neural networks is defined explicitly or implicitly.
[0042] To describe a search space, hereinafter the term operation shall be used, which describes a calculation rule that transfers one or more n-dimensional input-data tensors to one or more output-data tensors, and in this context, may have adaptable parameters. For example, convolutions with different kernel sizes and different types of convolutions (regular convolution, depth-wise separable convolution) and pooling operations are often used as operations in the processing of images.
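The notion of an operation as a parameterizable calculation rule can be illustrated with a minimal sketch. The toy 1-D "tensors", the fixed kernel weights, and the candidate-operation names below are illustrative assumptions; the patent itself prescribes no concrete operations or interfaces.

```python
# Sketch of search-space operations: calculation rules that map an input
# tensor to an output tensor and may carry adaptable parameters.
# Toy 1-D lists stand in for n-dimensional tensors.

def conv3(x, weights):
    """Toy 1-D convolution with kernel size 3 (zero padding, stride 1)."""
    padded = [0.0] + list(x) + [0.0]
    return [sum(w * padded[i + k] for k, w in enumerate(weights))
            for i in range(len(x))]

def max_pool2(x):
    """Toy max pooling with window 2 and stride 2."""
    return [max(x[i:i + 2]) for i in range(0, len(x) - 1, 2)]

# Candidate operations that could label an edge of the one-shot model.
CANDIDATE_OPS = {
    "conv3": lambda x: conv3(x, weights=[0.25, 0.5, 0.25]),
    "max_pool2": max_pool2,
    "identity": lambda x: list(x),
}
```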
[0043] In the following, a calculation graph (the so-called one-shot model) shall also be defined, which contains all architectures in the search space as subgraphs. Since the one-shot model may be very large, individual architectures may be drawn from the one-shot model for the training. Typically, this is done by drawing individual paths from a defined input node to a defined output node of the network.
[0044] In the simplest case, if the calculation graph is made up of a chain of nodes that are able to be connected in each case via various operations, then for each two successive nodes, it is sufficient to draw the operation which connects them.
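The chain case described above can be sketched as follows; the uniform random choice and the list-of-candidates encoding are assumptions made for illustration only.

```python
import random

# Sketch of drawing in the simplest (chain) case: for each pair of successive
# nodes, one connecting operation is drawn from the candidates for that edge.

def draw_chain_architecture(num_nodes, ops_per_edge, rng=random):
    """Return one drawn operation name per pair of successive nodes."""
    return [rng.choice(ops_per_edge[i]) for i in range(num_nodes - 1)]
```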
[0045] If the one-shot model is more generally a directed graph, a path may be drawn iteratively by beginning at the input, then drawing the next node and the connecting operation, and then continuing this procedure iteratively up to the destination node.
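The iterative forward drawing may be sketched as below; the adjacency-dict graph encoding (node mapped to a list of (successor, operation) pairs) is an illustrative assumption, not the patent's data structure.

```python
import random

# Sketch of iterative path drawing on a general directed one-shot graph:
# start at the input node and repeatedly draw an outgoing edge (together with
# its operation) until the destination node is reached.

def draw_path(graph, source, destination, rng=random):
    node, path = source, [source]
    while node != destination:
        node, _op = rng.choice(graph[node])  # draw next node and its operation
        path.append(node)
    return path
```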
[0046] The one-shot model may then be trained with drawing by sampling an architecture for each minibatch and adjusting the weights of the operations in the drawn architecture using a standard gradient-step method. Finding the best architecture may be carried out either as a separate step after the training of the weights, or alternately with the training of the weights.
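The per-minibatch training scheme can be sketched schematically. Here `draw_architecture`, `gradient_step`, the edge-keyed weight table, and the batch format are placeholder assumptions, not interfaces defined by the patent.

```python
# Sketch of one-shot training: per minibatch, one architecture (path) is
# drawn and only the weights of the drawn operations receive a gradient step.

def train_one_shot(one_shot_weights, minibatches, draw_architecture, gradient_step):
    for batch in minibatches:
        path = draw_architecture()                # draw a subnetwork for this batch
        for edge in path:                         # update only the drawn edges
            one_shot_weights[edge] = gradient_step(one_shot_weights[edge], batch)
    return one_shot_weights
```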
[0047] In order to draw architectures from a one-shot model which have branches and several outputs, in one specific embodiment, a sampling model for paths in the reverse direction may be used. To that end, for each output of the one-shot model, a path may be drawn which, beginning from the output, leads to the input of the one-shot model. To draw the paths, the transposed one-shot model may be considered, in which all directed edges point in the direction opposite of that in the original one-shot model.
[0048] As soon as the first path has been drawn, a subsequently drawn path may reach a node of the previous path. In this case, the drawing of the current path may be terminated, since a path from the shared node to the input already exists. Alternatively, the path may nevertheless be drawn further, possibly yielding a second path to the input node.
[0049] In addition, the case shall be considered where the architectures drawn contain one or more nodes of the one-shot model which do not lie at full depth of the network and hereinafter are called NOI (nodes of interest), as well as an output at full depth of the one-shot model. In this case, the path may be created by a backwards-directed drawing for the NOIs in order to connect them to the input. Furthermore, a forwards-directed drawing is also carried out for each NOI, which leads to the output of the one-shot model. As in the case of the backwards-directed drawing, in the case of the forwards-directed drawing, the drawing may be discontinued as soon as a path is reached which already leads to the output.
[0050] As an alternative to the backwards-directed drawing, a purely forwards-directed drawing may be carried out, in that for each NOI, a path is drawn from the input to the corresponding NOI. This is achieved by carrying out the drawing only on the subgraph made up of all nodes that lie on a path from the input of the network to the current NOI, together with all edges of the one-shot model between these nodes.
[0051] One exemplary embodiment is a multitask network for object detection and semantic segmentation. In this case, the NOIs are nodes to which an object detection head is attached. Moreover, at the output at full depth of the network, in addition an output for the semantic segmentation is used.
[0052] A specific embodiment of the present invention is described in the following:
[0053] The automatic architecture search first of all requires the creation of a search space (S21).
[0054] For each node in G, a probability distribution over the outgoing edges is defined. In addition, the transposed one-shot model G_t is considered, which has the same nodes but whose directed edges all point in the reverse direction. A probability distribution over the outgoing edges is introduced for each node in G_t as well (this corresponds to a probability distribution over the incoming edges in G).
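One way to realize the per-node distributions is a softmax over one adaptable logit per outgoing edge. The softmax parameterization is an assumption for illustration; the patent only requires that some distribution over the outgoing edges be defined for each node.

```python
import math
import random

# Sketch of the per-node sampling model: each node holds one adaptable logit
# per outgoing edge; a softmax over the logits gives the distribution from
# which the next edge is drawn.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def draw_edge(successors, logits, rng=random):
    """Draw one outgoing edge according to the node's distribution."""
    return rng.choices(successors, weights=softmax(logits), k=1)[0]
```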
[0055] For the backwards-directed drawing, a path is drawn in G_t for the first NOI (S22).
[0057] The NOIs may differ for each drawing of an architecture, since the probability distributions for backwards-directed and forwards-directed drawing are defined separately for all nodes.
[0058] An artificial neural network 60 (shown in
[0060] Control system 40 receives the sequence of sensor signals S of sensor 30 in an optional receiving unit 50, which converts the sequence of sensor signals S into a sequence of input images x (alternatively, in each case sensor signal S may also be accepted directly as input image x). For example, input image x may be a section of, or a further processing of, sensor signal S. Input image x includes individual frames of a video recording. In other words, input image x is determined as a function of sensor signal S. The sequence of input images x is supplied to a machine learning system, an artificial neural network 60 in the exemplary embodiment.
[0061] By preference, artificial neural network 60 is parameterized by parameters ϕ, which are stored in a parameter memory P that makes them available.
[0062] Artificial neural network 60 determines output quantities y from input images x. In particular, these output quantities y may include a classification and semantic segmentation of input images x. Output quantities y are fed to an optional conversion unit 80, which from them, determines control signals A that are supplied to actuator 10 in order to drive actuator 10 accordingly. Output quantity y includes information about objects which sensor 30 has detected.
[0063] Control system 40 also includes a monitoring unit 61 for monitoring the functioning of artificial neural network 60. Input image x is supplied to monitoring unit 61, as well. As a function thereof, monitoring unit 61 determines a monitoring signal d, which likewise is fed to conversion unit 80. Control signal A is determined as a function of monitoring signal d.
[0064] Monitoring signal d characterizes whether or not neural network 60 is determining output quantities y reliably. If monitoring signal d characterizes an unreliability, then, for example, control signal A may be determined according to a protected operating mode (while otherwise, it is determined in a normal operating mode). For example, the protected operating mode may include that a dynamic of actuator 10 is reduced, or that functionalities for driving actuator 10 are switched off.
[0065] Actuator 10 receives control signals A, is driven accordingly and carries out a corresponding action. In this case, actuator 10 may include a (not necessarily structurally integrated) control logic, which from control signal A, determines a second control signal with which actuator 10 is then controlled.
[0066] In further specific embodiments, control system 40 contains sensor 30. In other specific embodiments, control system 40 alternatively or additionally includes actuator 10, as well.
[0067] In further preferred specific embodiments, control system 40 includes one or more processors 45 and at least one machine-readable storage medium 46 on which instructions are stored which, when executed by processors 45, prompt control system 40 to carry out the method according to the present invention.
[0068] In alternative specific embodiments, alternatively or in addition to actuator 10, a display unit 10a is provided.
[0070] For example, sensor 30 may be a video sensor disposed preferably in motor vehicle 100.
[0071] Artificial neural network 60 is designed to reliably identify objects from input images x.
[0072] For example, actuator 10, disposed preferably in motor vehicle 100, may be a brake, a drive, or a steering system of motor vehicle 100. Control signal A may then be ascertained in such a way that actuator or actuators 10 is/are controlled so that, for example, motor vehicle 100 prevents a collision with the objects reliably identified by artificial neural network 60, especially if they are objects of certain classes, e.g., pedestrians.
[0073] Alternatively, the at least semi-autonomous robot may also be another mobile robot (not shown), for example, one which moves by flying, swimming, submerging or stepping. For instance, the mobile robot may also be an at least semi-autonomous lawn mower or an at least semi-autonomous cleaning robot. In these cases, as well, control signal A may be determined in a manner that the drive and/or steering of the mobile robot is/are controlled in such a way that, e.g., the at least semi-autonomous robot prevents a collision with objects identified by artificial neural network 60.
[0074] Alternatively or additionally, display unit 10a may be controlled by control signal A and, e.g., the ascertained safe areas are displayed. In the case of a motor vehicle 100 without automated steering, for instance, it is also possible for display unit 10a to be controlled by control signal A in such a way that it outputs a visual or acoustic warning signal if it is determined that motor vehicle 100 is in danger of colliding with one of the reliably identified objects.
[0076] As an example, sensor 30 may then be an optical sensor which, e.g., detects properties of manufacturing articles 12a, 12b. It is possible that these manufacturing articles 12a, 12b are movable. It is possible that actuator 10 controlling manufacturing machine 11 is driven as a function of an assignment of the detected manufacturing articles 12a, 12b, so that manufacturing machine 11 accordingly executes a subsequent processing step on the correct manufacturing article 12a, 12b. It is also possible that, by identification of the correct properties of the same one of manufacturing articles 12a, 12b (that is, without an incorrect assignment), manufacturing machine 11 adjusts the same manufacturing step accordingly for processing a following manufacturing article.
[0080] Depending on the signals of sensor 30, control system 40 determines a control signal A of personal assistant 250, for example, in that the neural network implements a gesture recognition and identification. This determined control signal A is then transmitted to personal assistant 250, thus controlling it accordingly. In particular, this ascertained control signal A may be selected in such a way that it corresponds to a control presumed to be desired by user 249. This presumed desired control may be ascertained as a function of the gesture recognized by artificial neural network 60. Depending on the presumed desired control, control system 40 may then select control signal A for transmission to personal assistant 250 in accordance with the presumed desired control.
[0081] For example, this corresponding control may include that personal assistant 250 retrieves information from a database and renders it in a manner apprehensible for user 249.
[0082] Instead of personal assistant 250, a household appliance (not shown) may also be provided, particularly a washing machine, a range, a baking oven, a microwave or a dishwasher, in order to be controlled accordingly.
[0085] The methods carried out by training system 140 may be implemented as a computer program stored on a machine-readable storage medium 147 and executed by a processor 148.
[0086] Of course, whole images do not have to be classified. It is possible that using a detection algorithm, for example, image sections may be classified as objects, these image sections may then be cut out, and a new image section may be generated if desired and inserted into the associated image in place of the cut-out image section.
[0087] The term “computer” includes any devices for processing predefinable calculation instructions. These calculation instructions may exist in the form of software, or in the form of hardware, or in a mixed form of software and hardware.