METHOD AND DEVICE FOR CREATING A MACHINE LEARNING SYSTEM

20230040014 · 2023-02-09


    Abstract

    A method for creating a machine learning system. The method includes: providing a directed graph including an input node and an output node, a probability being assigned to each edge, which characterizes the probability with which that edge is drawn. The probabilities are manipulated as a function of a characteristic degree of an exploration of the architectures of the directed graph prior to a random drawing of the architectures.

    Claims

    1. A computer-implemented method for creating a machine learning system, comprising the following steps: providing a directed graph including one or multiple input and output nodes, which are connected via a multitude of edges and nodes, a respective variable being assigned to each respective edge of the edges, which characterizes a probability with which the respective edge is drawn; randomly drawing a multitude of subgraphs from the directed graph as a function of the respective variables, the respective variables being changed in the graph as a function of a distribution of values of the respective variables; training a machine learning system corresponding to the drawn subgraph, wherein during the training, parameters of the machine learning system and the respective variables are adapted so that a cost function is optimized; and drawing a subgraph, as a function of the adapted respective variables, and creating the machine learning system corresponding to this subgraph.

    2. The method as recited in claim 1, wherein, when a measure of the distribution of the values of the respective variables relative to a predefined target measure of a target distribution is greater, the respective variables are changed in such a way that the edges are drawn with an essentially equal probability.

    3. The method as recited in claim 1, wherein the change of the respective variables takes place as a function of an entropy of the directed graph, and a number of training steps which have already been carried out.

    4. The method as recited in claim 3, wherein, when the entropy is greater than a predefined target entropy, a parameter by which the respective variables are changed is changed in such a way that it changes values of the respective variables, so that the probability distribution characterized by the respective variables has a lesser similarity to a uniform distribution, and when the ascertained entropy is smaller than the predefined target entropy, the parameter is changed in such a way that it changes values of the respective variables, so that the probability distribution characterized by the respective variables approaches a uniform distribution.

    5. The method as recited in claim 1, wherein the change of the respective variables takes place as a function of an exploration probability, the exploration probability characterizing a probability with which the edges are drawn either as a function of the respective variables assigned to the edges or with an essentially identical probability.

    6. The method as recited in claim 1, wherein the change of the respective variables takes place using a temperature scaling.

    7. The method as recited in claim 6, wherein, during the temperature scaling, the respective variables are scaled as a function of a temperature which is changed as a function of the distribution of the values of the respective variables.

    8. A non-transitory machine-readable memory element on which is stored a computer program for creating a machine learning system, the computer program, when executed by a computer, causing the computer to perform the following steps: providing a directed graph including one or multiple input and output nodes, which are connected via a multitude of edges and nodes, a respective variable being assigned to each respective edge of the edges, which characterizes a probability with which the respective edge is drawn; randomly drawing a multitude of subgraphs from the directed graph as a function of the respective variables, the respective variables being changed in the graph as a function of a distribution of values of the respective variables; training a machine learning system corresponding to the drawn subgraph, wherein during the training, parameters of the machine learning system and the respective variables are adapted so that a cost function is optimized; and drawing a subgraph, as a function of the adapted respective variables, and creating the machine learning system corresponding to this subgraph.

    9. A device configured to create a machine learning system, the device being configured to: provide a directed graph including one or multiple input and output nodes, which are connected via a multitude of edges and nodes, a respective variable being assigned to each respective edge of the edges, which characterizes a probability with which the respective edge is drawn; randomly draw a multitude of subgraphs from the directed graph as a function of the respective variables, the respective variables being changed in the graph as a function of a distribution of values of the respective variables; train a machine learning system corresponding to the drawn subgraph, wherein during the training, parameters of the machine learning system and the respective variables are adapted so that a cost function is optimized; and draw a subgraph, as a function of the adapted respective variables, and create the machine learning system corresponding to this subgraph.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0027] FIG. 1 shows a schematic representation of a flowchart of one specific example embodiment of the present invention.

    [0028] FIG. 2 shows a schematic representation of an actuator control system, in accordance with an example embodiment of the present invention.

    [0029] FIG. 3 shows one exemplary embodiment for controlling an at least semi-autonomous robot, in accordance with the present invention.

    [0030] FIG. 4 schematically shows one exemplary embodiment for controlling a manufacturing system, in accordance with the present invention.

    [0031] FIG. 5 schematically shows one exemplary embodiment for controlling an access system, in accordance with the present invention.

    [0032] FIG. 6 schematically shows one exemplary embodiment for controlling a monitoring system, in accordance with the present invention.

    [0033] FIG. 7 schematically shows one exemplary embodiment for controlling a personal assistant, in accordance with the present invention.

    [0034] FIG. 8 schematically shows one exemplary embodiment for controlling a medical imaging system, in accordance with the present invention.

    [0035] FIG. 9 shows a possible design of a training device, in accordance with an example embodiment of the present invention.

    DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

    [0036] To find good architectures of deep neural networks for a predefined data set, automatic methods for the architecture search may be employed, so-called neural architecture search (NAS) methods. For this purpose, a search space of possible architectures of neural networks is explicitly or implicitly defined.

    [0037] Hereafter, a calculation graph (the so-called one-shot model) is to be defined for describing a search space, which includes a plurality of possible architectures in the search space as subgraphs. Since the one-shot model may be very large, individual architectures from the one-shot model may be drawn for the training. This takes place, e.g., in that individual paths are drawn from an established input node to an established output node of the network.

    [0038] In the simplest case, when the calculation graph is made up of a simple chain of nodes, which may each be connected via different operations, it suffices to draw the operation for two consecutive nodes which connects them.

    [0039] If the one-shot model, more generally speaking, is an arbitrary directed graph, a path may, e.g., be drawn iteratively, starting at the input (input node), then drawing the next node and the connecting edge, this procedure being continued iteratively up to the target node.
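    The iterative path drawing described above can be sketched as follows; the graph representation (a mapping from each node to its outgoing edges) and all names are illustrative assumptions, not taken from the patent:

```python
import random

def draw_path(successors, input_node, output_node):
    """Iteratively draw a path from the input node to the output node.

    `successors` maps each node to a list of (edge, next_node) pairs.
    At each node one outgoing edge is drawn (here uniformly, for
    simplicity) until the output node is reached.
    """
    path = []
    node = input_node
    while node != output_node:
        edge, node = random.choice(successors[node])
        path.append(edge)
    return path
```

    In the full method, the uniform `random.choice` would be replaced by a draw according to the edge probabilities derived from the logits.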

    [0040] The path thus obtained by the drawing, which may correspond to a subgraph of the directed graph, may then be trained in that an architecture is drawn for each mini batch of training data, and the weights of the operations in the drawn architecture are adapted with the aid of a standard gradient step method. The locating of the best architecture may either take place as a separate step after the training of the weights, or be carried out alternately with the training of the weights.

    [0041] To draw architectures from a one-shot model, a multinomial distribution across the different discrete selection possibilities may be present during the drawing of a path/subgraph, i.e., an architecture of a machine learning system, which may in each case be parametrized by a real value vector named logits, which may be normalized by applying a softmax function to a probability vector. For a discrete selection, a logits vector α = (α_1, . . . , α_N) is defined, α_i ∈ ℝ being a real value, and N corresponding to the number of possible decisions. For NAS, the decisions are, for example, which of the edges is to be drawn next for the path.

    [0042] For drawing, the logits vector is normalized using the softmax function σ, the i-th component being calculated as σ_i(α) = e^(α_i) / Σ_k e^(α_k), probability vector p = σ(α) thus being given.
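    The softmax normalization above can be written as a short sketch (the max subtraction is a standard numerical-stability measure, not part of the formula):

```python
import math

def softmax(logits):
    """Normalize a logits vector into a probability vector p = sigma(alpha)."""
    m = max(logits)  # subtract the max for numerical stability; result unchanged
    exps = [math.exp(a - m) for a in logits]
    z = sum(exps)
    return [e / z for e in exps]
```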

    [0043] This probability vector p is used to draw a decision from a multinomial distribution. A decision could be, for example, to select between the outgoing edges for a node in the graph. The drawing of a complete path may necessitate multiple of these decisions.

    [0044] An optimization of the logits during the NAS process may cause a premature fixation to a smaller search space, better architectures outside this search space possibly not being further explored.

    [0045] In a first specific embodiment for overcoming the premature fixation of NAS, a so-called epsilon-greedy exploration is provided. This means that a decision is not drawn according to the corresponding logit, but from a uniform distribution, with a probability of ϵ∈[0,1]. In this way, the decision may be selected from all options with the same probability, e.g., in multiple locations in the network, and not based on the probability values which are derived from the corresponding logits vector. Probability ϵ is hereafter referred to as exploration probability.
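    The epsilon-greedy decision rule can be sketched as follows; the function name and argument layout are illustrative assumptions:

```python
import random

def draw_decision(probs, epsilon):
    """Epsilon-greedy drawing: with probability epsilon draw uniformly
    from all options, otherwise draw according to the probability vector
    derived from the logits."""
    n = len(probs)
    if random.random() < epsilon:
        return random.randrange(n)  # uniform exploration
    return random.choices(range(n), weights=probs)[0]
```

    With epsilon = 1 every decision is drawn uniformly; with epsilon = 0 the drawing follows the logits-derived probabilities only.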

    [0046] In a second specific embodiment, a temperature-dependent scaling of the logits is provided. For this purpose, a positive-real parameter T is introduced, which is referred to hereafter as the (exploration) temperature. The logits are then scaled as a function of this temperature, before they are normalized by the softmax function, i.e., the normalization takes on form:

    [00001] p = σ(α/T).

    [0047] In the case of large values of T, all components of the scaled logit vector α/T will be close to zero, and the distribution will thus be essentially uniform. For T = 1, the logit values are unchanged, and the drawing, as a function of the logits, takes place from the distribution defined by the logit vector. For T → 0, the random drawing approaches the argmax of the logit vector.
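    The temperature scaling p = σ(α/T) and its limiting behavior can be sketched as:

```python
import math

def softmax_with_temperature(logits, T):
    """Scale the logits by 1/T before the softmax normalization.

    Large T flattens the distribution toward uniform; small T
    concentrates it on the largest logit (argmax)."""
    scaled = [a / T for a in logits]
    m = max(scaled)  # max subtraction for numerical stability
    exps = [math.exp(a - m) for a in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```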

    [0048] During the architecture search, the exploration probability or the exploration temperature is cooled, i.e., the architecture search is slowly shifted from a broad exploration of the search space at the start of the architecture search to a focused search of promising architectures.

    [0049] A simple drop of the exploration probability or of the exploration temperature is directly implementable, but makes it necessary to establish a starting value of the exploration probability/temperature as well as a time schedule which establishes how pronounced the drop is to be. However, it is usually not clear how, e.g., an initial starting value is to be selected, and how quickly it is to cool off, since these values are usually application-specific.

    [0050] It is therefore provided to introduce an auxiliary measure to approximate how concentrated the architecture distribution defined by the logits is across the search space. Based on this auxiliary measure, the initial starting value of the exploration probability and the temperature may then be estimated. Furthermore, this auxiliary measure makes it possible to control how pronounced the drop is to be. It has been found that an entropy-based auxiliary measure leads to the best results. Preferably, the entropy of the search space is used for this purpose.

    [0051] A target corridor or a target value of the exploration probability or of the temperature is indirectly planned in that a target corridor or target value of the entropy is determined and then, with the aid of this target entropy (S_target), the exploration probability or temperature is accordingly regulated.

    [0052] However, it may be complex to precisely ascertain the entropy of a large search space, which is why an estimation of the entropy by random samples may be carried out. Furthermore, it is typically also not possible to directly calculate the required exploration probability or exploration temperature in order to achieve a predefined entropy.

    [0053] For this reason, the following procedure is provided for setting the exploration probability or temperature until a desired entropy is reached:

    [0054] It shall apply that S_target is the target entropy which the search space is to have, d ∈ [0,1] is a decay factor, λ ∈ [0,1) is a smoothing factor, s_max ∈ ℕ is a maximum number of steps, and κ is a small constant (e.g., κ = 10^−5), with stepcount = 0. For example, it is possible to initially select T = 1, and an averaged entropy of the search space S_avg is estimated, e.g., based on a low number of random samples (e.g., 25 random samples). Larger initial values for T are also possible.

    [0055] The following steps are then carried out iteratively so that, based on the entropy of the search space, a relaxation of the logits is determined:

    [0056] 1. Determining a new estimation of entropy S_new of the search space based on a predefined number of random samples (possibly, only by a single random sample);

    [0057] 2. Updating the overall estimation of the entropy as a function of the new estimation of entropy S_new, e.g., via a moving average: S_avg ← exp(λ log(S_avg) + (1−λ) log(S_new + κ));

    [0058] 3. Adapting a drop constant γ = 1 + d^stepcount;

    [0059] 4. If S_target > S_avg, then the temperature is adapted as a function of drop constant γ: T ← T·γ, otherwise: T ← T/γ;

    [0060] 5. Increasing counter stepcount by the value of one. If stepcount ≥ s_max, the method is terminated; otherwise, the next iteration is started with step 1.
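    Steps 1 through 5 can be sketched as a loop; `estimate_entropy` is an assumed callback that returns an entropy estimate of the search space from a few random samples at the current temperature, and the default parameter values are illustrative:

```python
import math

def adapt_temperature(estimate_entropy, S_target, T=1.0, d=0.9,
                      lam=0.9, s_max=100, kappa=1e-5):
    """Sketch of the iterative temperature adaptation (steps 1-5).

    While the averaged entropy is below the target, the temperature is
    raised by the drop constant gamma; otherwise it is lowered. Gamma
    decays toward 1 so the adaptation settles."""
    S_avg = estimate_entropy(T) + kappa
    for stepcount in range(s_max):
        S_new = estimate_entropy(T)                              # step 1
        S_avg = math.exp(lam * math.log(S_avg)
                         + (1 - lam) * math.log(S_new + kappa))  # step 2
        gamma = 1 + d ** stepcount                               # step 3
        T = T * gamma if S_target > S_avg else T / gamma         # step 4
    return T  # step 5: the loop bound plays the role of stepcount >= s_max
```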

    [0061] It shall be noted that other moving averages may also be used in step 2, e.g., such as an exponentially moving average or a simple moving average. It shall furthermore be noted that other adaptive control loops may also be used in step 4 to adapt the temperature based on the instantaneous entropy estimation. It shall furthermore be noted that more complex methods for determining the exploration probability/temperature which results in a desired entropy may also be used. One example of this is a noisy binary search algorithm (https://en.wikipedia.org/wiki/Binary_search_algorithm#Noisy_binary_search or https://www.cs.cornell.edu/~rdk/papers/karpr2.pdf).

    [0062] The just described steps 1 through 4 may also be used directly to accordingly adapt exploration probability ϵ. This is done namely in that temperature T is simply replaced by exploration probability ϵ in the above algorithm, and optionally an additional step is introduced, which ensures that ϵ∈[0,1] applies. Preferably, exploration probability ϵ is initially set to a large value, such as 0.9 or 1. In the event that the graph is initialized in such a way that the subgraphs at the beginning are drawn with the same probability, exploration probability ϵ may initially be set to value 0.

    [0063] The time planning of the exploration probability or temperature then functions as follows. The initial entropy of the architecture distribution is estimated prior to the NAS run, based on, e.g., 1000 random samples, and a decay schedule is selected (e.g., exponential decay). Every time the planner (scheduler) is called, the new target entropy S_target is calculated based on the initial entropy, and the scheduler then determines the required exploration probability or temperature, as described above.
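    An exponential decay schedule for the target entropy, as one possible choice, can be sketched as follows; the closure layout and decay form are illustrative assumptions:

```python
def make_entropy_schedule(S_initial, decay_rate):
    """Return a schedule that yields the target entropy after k scheduler
    calls, decaying exponentially from the initial entropy estimate."""
    def schedule(k):
        return S_initial * decay_rate ** k
    return schedule
```

    Each scheduler call then passes `schedule(k)` as S_target to the temperature adaptation described above.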

    [0064] FIG. 1 schematically shows a flowchart (20) of an improved method for the architecture search using a one-shot model.

    [0065] The automatic architecture search may be carried out as follows. The automatic architecture search first requires a provision of a search space (S21), which may exist here in the form of a one-shot model, logits (α) being assigned to the edges.

    [0066] In the subsequent step S22, an initial entropy is estimated prior to the application of a NAS method, based on, e.g., 1000 random samples of randomly drawn architectures from the one-shot model, and a decay schedule is selected for the scheduler (e.g., an exponential decay). The decay schedule thereupon ascertains a first target entropy S_target as a function of the initial entropy.

    [0067] After step S22 has ended, step S23 follows. In this step, the temperature or exploration probability ϵ is adapted according to above-described steps 1 through 5.

    [0068] In the subsequent step S24, a NAS run is carried out using the ascertained parameterization from step S23, i.e., a drawing of subgraphs, using the relaxation of probability distribution p as a function of the ascertained parameter T or ϵ, as well as the training of the machine learning systems corresponding to the subgraphs, etc. It shall be noted that an optimization of the parameters and probabilities during the training may not only take place with respect to the accuracy, but also for specific hardware (e.g., a hardware accelerator). This takes place, for example, in that, during the training, the cost function contains a further term, which characterizes the costs for executing the machine learning system with its configuration on the hardware.
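    One way such a hardware term can be formed is as an expected-latency penalty added to the task loss; the linear combination, the weight, and the dictionary layout below are illustrative assumptions, not the patent's prescribed form:

```python
def hardware_aware_cost(task_loss, edge_probs, edge_latency, weight=0.1):
    """Cost function with an additional hardware term.

    The expected execution cost weights each operation's measured
    latency by the probability of drawing its edge, so architectures
    that are cheap on the target hardware are preferred."""
    expected_latency = sum(p * edge_latency[e] for e, p in edge_probs.items())
    return task_loss + weight * expected_latency
```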

    [0069] After step S24 has ended, step S23, followed by step S24, may be consecutively repeated multiple times. During the repetitions of steps S23 and S24, the scheduler may be called beforehand to determine a new target entropy S_target based on the initial entropy and the decay schedule. Then, as described above, S23 is used to adapt T or ϵ, and S24 is then carried out again.

    [0070] The repetition of steps S23 and S24 may, e.g., be aborted when counter stepcount has reached the value of the maximum steps. This means that counter stepcount is used within S23. During each repetition of S23, counter stepcount is initially set back to 0.

    [0071] Thereafter, in step S25, a final subgraph may be drawn based on the graph, and a corresponding machine learning system may be initialized according to this subgraph.

    [0072] The created machine learning system after step S25 is preferably an artificial neural network 60 (shown in FIG. 2) and is used as described hereafter.

    [0073] FIG. 2 shows an actuator 10 in its surroundings 20 in interaction with a control system 40. At preferably regular intervals, surroundings 20 are detected by a sensor 30, in particular an imaging sensor, such as a video sensor, which may be provided by a multitude of sensors, for example a stereo camera. Other imaging sensors are also possible, such as for example radar, ultrasound or LIDAR. An infrared camera is also possible. Sensor signal S, or in the case of multiple sensors a respective sensor signal S, of sensor 30 is transmitted to control system 40. Control system 40 thus receives a sequence of sensor signals S. Control system 40 ascertains activation signals A therefrom, which are transferred to actuator 10.

    [0074] Control system 40 receives the sequence of sensor signals S of sensor 30 in an optional receiving unit 50, which converts the sequence of sensor signals S into a sequence of input images x (alternatively, it is also possible to directly adopt the respective sensor signal S as input image x). Input image x may, for example, be a portion or a further processing of sensor signal S. Input image x includes individual frames of a video recording. In other words, input image x is ascertained as a function of sensor signal S. The sequence of input images x is supplied to a machine learning system, an artificial neural network 60 in the exemplary embodiment.

    [0075] Artificial neural network 60 is preferably parameterized by parameters ϕ, which are stored in a parameter memory P and provided thereby.

    [0076] Artificial neural network 60 ascertains output variables y from the input images x. These output variables y may, in particular, encompass a classification and a semantic segmentation of input images x. Output variables y are supplied to an optional conversion unit 80, which ascertains activation signals A therefrom, which are supplied to actuator 10 to accordingly activate actuator 10. Output variable y encompasses pieces of information about objects which sensor 30 has detected.

    [0077] Actuator 10 receives activation signals A, is accordingly activated, and carries out a corresponding action. Actuator 10 may include a (not necessarily structurally integrated) activation logic, which ascertains a second activation signal, with which actuator 10 is then activated, from activation signal A.

    [0078] In further specific embodiments, control system 40 includes sensor 30. In still further specific embodiments, control system 40 alternatively or additionally also includes actuator 10.

    [0079] In further preferred specific embodiments, control system 40 includes one or multiple processor(s) 45 and at least one machine-readable memory medium 46 on which instructions are stored which, when they are executed on processors 45, prompt control system 40 to execute the method according to the present invention.

    [0080] In alternative specific embodiments, a display unit 10a is provided as an alternative or in addition to actuator 10.

    [0081] FIG. 3 shows how control system 40 may be used for controlling an at least semi-autonomous robot, here an at least semi-autonomous motor vehicle 100.

    [0082] Sensor 30 may, for example, be a video sensor preferably situated in motor vehicle 100.

    [0083] Artificial neural network 60 is configured to reliably identify objects from input images x.

    [0084] Actuator 10 preferably situated in motor vehicle 100 may, for example, be a brake, a drive or a steering system of motor vehicle 100. Activation signal A may then be ascertained in such a way that actuator or actuators 10 is/are activated in such a way that motor vehicle 100, for example, prevents a collision with the objects reliably identified by artificial neural network 60, in particular, when objects of certain classes, e.g., pedestrians, are involved.

    [0085] As an alternative, the at least semi-autonomous robot may also be another mobile robot (not shown), for example one which moves by flying, swimming, diving or walking. The mobile robot may, for example, also be an at least semi-autonomous lawn mower or an at least semi-autonomous cleaning robot. Activation signal A may also be ascertained in these cases in such a way that drive and/or steering system of the mobile robot is/are activated in such a way that the at least semi-autonomous robot, for example, prevents a collision with the objects identified by artificial neural network 60.

    [0086] As an alternative or in addition, display unit 10a may be activated using activation signal A, and, for example, the ascertained safe areas may be represented. It is also possible in the case of a motor vehicle 100 including non-automated steering, for example, that display unit 10a is activated, using activation signal A, in such a way that it outputs a visual or an acoustic warning signal when it is ascertained that motor vehicle 100 is at risk of colliding with one of the reliably identified objects.

    [0087] FIG. 4 shows one exemplary embodiment in which control system 40 is used for activating a manufacturing machine 11 of a manufacturing system 200, in that an actuator 10 controlling this manufacturing machine 11 is activated. Manufacturing machine 11 may, for example, be a machine for punching, sawing, drilling and/or cutting.

    [0088] Sensor 30 may be an optical sensor, for example, which, e.g., detects properties of manufacturing products 12a, 12b. It is possible that these manufacturing products 12a, 12b are movable. It is possible that actuator 10 controlling manufacturing machine 11 is activated as a function of an assignment of the detected manufacturing products 12a, 12b, so that manufacturing machine 11 accordingly executes a subsequent processing step of the correct one of manufacturing products 12a, 12b. It is also possible that manufacturing machine 11 accordingly adapts the same manufacturing step for the processing of a subsequent manufacturing product by identifying the correct properties of that one of manufacturing products 12a, 12b (i.e., without a misclassification).

    [0089] FIG. 5 shows one exemplary embodiment in which control system 40 is used for controlling an access system 300. Access system 300 may encompass a physical access control, for example a door 401. Video sensor 30 is configured to detect a person. This detected image may be interpreted with the aid of object identification system 60. If multiple persons are detected simultaneously, it is possible, for example, to ascertain the identity of the persons particularly reliably by an assignment of the persons (i.e., of the objects) with respect to one another, for example by an analysis of their movements. Actuator 10 may be a lock which unblocks, or does not unblock, the access control as a function of activation signal A, for example opens, or does not open, door 401. For this purpose, activation signal A may be selected as a function of the interpretation of object identification system 60, for example as a function of the ascertained identity of the person. Instead of the physical access control, a logical access control may also be provided.

    [0090] FIG. 6 shows one exemplary embodiment in which control system 40 is used for controlling a monitoring system 400. This exemplary embodiment differs from the exemplary embodiment shown in FIG. 5 in that, instead of actuator 10, display unit 10a is provided, which is activated by control system 40. For example, an identity of the objects recorded by video sensor 30 may be reliably ascertained by artificial neural network 60 in order to infer, e.g., which objects are suspicious, and activation signal A may then be selected in such a way that this object is represented highlighted in color by display unit 10a.

    [0091] FIG. 7 shows one exemplary embodiment in which control system 40 is used for controlling a personal assistant 250. Sensor 30 is preferably an optical sensor which receives images of a gesture of a user 249.

    [0092] As a function of the signals of sensor 30, control system 40 ascertains an activation signal A of personal assistant 250, for example in that the neural network carries out a gesture recognition. This ascertained activation signal A is then communicated to personal assistant 250, and it is thus accordingly activated. This ascertained activation signal A may then, in particular, be selected in such a way that it corresponds to a presumed desired activation by user 249. This presumed desired activation may be ascertained as a function of the gesture recognized by artificial neural network 60. Control system 40 may then select activation signal A, corresponding to the presumed desired activation, for the communication to personal assistant 250.

    [0093] This corresponding activation may, for example, include that personal assistant 250 retrieves pieces of information from a database, and reproduces them in a manner suitable for user 249.

    [0094] Instead of personal assistant 250, a household appliance (not shown), in particular, a washing machine, a stove, an oven, a microwave or a dishwasher may also be provided to be accordingly activated.

    [0095] FIG. 8 shows one exemplary embodiment in which control system 40 is used for controlling a medical imaging system 500, for example an MRI, X-ray or ultrasound device. Sensor 30 may, for example, be an imaging sensor, and display unit 10a is activated by control system 40. For example, it may be ascertained by neural network 60 whether an area recorded by the imaging sensor is conspicuous, and activation signal A may then be selected in such a way that this area is represented highlighted in color by display unit 10a.

    [0096] FIG. 9 shows an exemplary training device 140 for training one of the drawn machine learning systems, in particular neural network 60, from the graph. Training device 140 includes a provider 71, which provides input variables x, such as, e.g., input images, and setpoint output variables ys, for example setpoint classifications. Input variable x is supplied to artificial neural network 60 to be trained, which ascertains output variables y therefrom. Output variables y and setpoint output variables ys are supplied to a comparator 75, which ascertains new parameters ϕ′ therefrom, as a function of an agreement of the respective output variables y and setpoint output variables ys; the new parameters are transferred to parameter memory P, and replace parameters ϕ there.

    [0097] The methods executed by training system 140 may be stored on a machine-readable memory medium 147, implemented as a computer program, and executed by a processor 148.

    [0098] Of course, it is not necessary to classify entire images. It is possible that, e.g., image sections are classified as objects using a detection algorithm, that these image sections are then cut out, possibly a new image section is generated, and inserted into the associated image in place of the cut-out image section.

    [0099] The term “computer” encompasses arbitrary devices for processing predefinable computing rules. These computing rules may be present in the form of software, or in the form of hardware, or also in a mixed form made up of software and hardware.