MORE ROBUST TRAINING FOR ARTIFICIAL NEURAL NETWORKS

20220327387 · 2022-10-13

    Abstract

    A method for training an artificial neural network, ANN, which comprises a multiplicity of processing units. Parameters that characterize the behavior of the ANN are optimized according to a cost function. Depending on outputs determined from learning input quantity values and on learning output quantity values, an output of at least one selected processing unit is deactivated. Selection of the selected processing unit is achieved with the aid of a sequence of quasi-random numbers.

    Claims

    1. A method for training an artificial neural network (ANN), which includes a multiplicity of processing units, the method comprising: optimizing parameters that characterize a behavior of the ANN according to a cost function; and deactivating, depending on outputs determined from learning input quantity values and on learning output quantity values, an output of at least one selected processing unit, selection of the selected processing unit being achieved using a sequence of quasi-random numbers.

    2. The method as recited in claim 1, wherein the sequence of quasi-random numbers is initialized using a random value.

    3. The method as recited in claim 2, wherein the initialization of the sequence of quasi-random numbers is changed after each training pass has been carried out.

    4. The method as recited in claim 3, wherein the change in the initialization is performed by a specifiable increment.

    5. The method as recited in claim 1, wherein a specifiable proportion of the processing units of the ANN is selected and deactivated.

    6. The method as recited in claim 1, wherein the sequence of quasi-random numbers is one of the following sequences: Halton sequence, Hammersley sequence, Niederreiter sequence, Kronecker sequence, Sobol sequence, Van der Corput sequence.

    7. The method as recited in claim 1, wherein the ANN is configured as a classifier.

    8. The method as recited in claim 7, wherein the ANN is configured as a classifier of image data and/or audio data.

    9. A non-transitory machine-readable storage medium on which is stored a computer program for training an artificial neural network (ANN), which includes a multiplicity of processing units, the computer program, when executed by a computer, causing the computer to perform the following steps: optimizing parameters that characterize a behavior of the ANN according to a cost function; and deactivating, depending on outputs determined from learning input quantity values and on learning output quantity values, an output of at least one selected processing unit, selection of the selected processing unit being achieved using a sequence of quasi-random numbers.

    10. A training device configured to train an artificial neural network (ANN), which includes a multiplicity of processing units, the training device configured to: optimize parameters that characterize a behavior of the ANN according to a cost function; and deactivate, depending on outputs determined from learning input quantity values and on learning output quantity values, an output of at least one selected processing unit, selection of the selected processing unit being achieved using a sequence of quasi-random numbers.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0063] FIG. 1 shows an exemplary ANN.

    [0064] FIG. 2 shows an exemplary device for training the ANN, in accordance with an example embodiment of the present invention.

    [0065] FIG. 3 shows an exemplary embodiment of a method for training the ANN, in accordance with the present invention.

    DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

    [0066] FIG. 1 shows an ANN (1), which comprises layers (2, 3, 4), and is configured to determine from an input quantity value (x) an associated output (y). The input quantity value (x) may be in the form for example of image data, and the output (y) may be for example a semantic segmentation of these image data.

    [0067] In this context, a selected layer (2) comprises a plurality of neurons (F.sub.1,F.sub.2,F.sub.3,F.sub.4), whose output values (z.sub.1,z.sub.2,z.sub.3,z.sub.4) are forwarded as a typically multidimensional intermediate quantity (z) to a succeeding layer (3).

    [0068] The neurons may conventionally be arranged in multidimensional form, for example as a two-dimensional tensor of size M×N. It is possible to index the neurons in one layer by a one-dimensional count of the neurons.

    [0069] FIG. 2 shows a training device (140) for training the ANN (1). The parameters (Φ) of the ANN (1) are stored in a first memory (St.sub.1). A second memory (St.sub.2) provides training data (T). The training data (T) comprise pairs of learning input quantity values (x.sub.i) and respectively associated learning output quantity values (y.sub.i). During training, a unit (150) supplies learning input quantity values (x.sub.i) to the ANN (1), which determines an associated output (ŷ.sub.i) from these. This output (ŷ.sub.i) and the learning output quantity value (y.sub.i) are supplied to a comparator (180), which determines a value of a cost function from these, for example for a mini-batch of such pairs of outputs (ŷ.sub.i) and learning output quantity values (y.sub.i). Using a suitable optimization algorithm, such as stochastic gradient descent, new values (Φ′) are then determined for the parameters (Φ) of the ANN (1); these are supplied to the first memory (St.sub.1), where they update the existing values.
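    The cost-comparison and parameter-update step described above can be sketched in a few lines of Python. The mean-squared-error cost and the learning rate below are illustrative assumptions; the text does not fix a particular cost function or optimizer beyond naming stochastic gradient descent as one option, and the function names are ours.

```python
def mini_batch_cost(y_hats, ys):
    """Mean squared error over a mini-batch of outputs (y_hat_i) and
    learning output quantity values (y_i). MSE is an assumed example;
    the training method itself is cost-function agnostic."""
    return sum((a - b) ** 2
               for y_hat, y in zip(y_hats, ys)
               for a, b in zip(y_hat, y)) / len(y_hats)


def sgd_step(phi, grads, lr=0.01):
    """One stochastic-gradient-descent update: phi' = phi - lr * dC/dphi.
    The updated values would overwrite the old parameters in memory."""
    return [p - lr * g for p, g in zip(phi, grads)]
```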

    [0070] The training device (140) comprises for example a computer (145) which performs the training method, and a memory (146) in which there is stored a computer program which comprises instructions for performing the training method when it is run by the computer (145).

    [0071] FIG. 3 shows a specific embodiment of a method that is performed by the training device (140).

    [0072] First (1000), a sequence of quasi-random numbers, for example a Hammersley sequence, is initialized using an initial value, and is generated to a specifiable length. This specifiable length is preferably greater than the number of neurons of the ANN (1) that are to be deactivated.
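    As an illustration, the van der Corput sequence, the one-dimensional building block of the Halton and Hammersley sequences named above, can be generated and initialized at a specifiable start index as follows. This is a sketch; the function names and the choice of base are ours, not the text's.

```python
def van_der_corput(n, base=2):
    """Return the n-th element (1-indexed) of the van der Corput
    sequence in the given base, by reflecting the base-`base`
    digits of n about the radix point."""
    q, denom = 0.0, 1.0
    while n > 0:
        denom *= base
        n, r = divmod(n, base)
        q += r / denom
    return q


def quasi_random_sequence(length, base=2, start=1):
    """Generate `length` quasi-random numbers in [0, 1).
    `start` plays the role of the initialization value; shifting it
    by an increment changes the sequence for the next training pass."""
    return [van_der_corput(start + i, base) for i in range(length)]
```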

    [0073] Then (1100), with the aid of a floor operator, this sequence is mapped onto integer values in order to obtain a sequence of indices. These indices may address either exclusively the neurons (F.sub.1,F.sub.2,F.sub.3,F.sub.4) of the selected layer (2) (with the result that the illustrated method may be repeated layer by layer for the purpose of deactivating neurons) or, preferably, all the neurons of the ANN (1).
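    The floor mapping from a quasi-random number u in [0, 1) to a neuron index can be sketched as follows; scaling by the neuron count before applying the floor operator is our assumed scaling, chosen so that every index in the range is reachable.

```python
import math


def indices_from_sequence(seq, num_neurons):
    # Map each quasi-random u in [0, 1) to the integer index
    # floor(u * num_neurons), yielding values in 0..num_neurons-1.
    return [math.floor(u * num_neurons) for u in seq]
```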

    [0074] Thereafter (1200), these neurons are deactivated—that is, the associated output values (z.sub.1,z.sub.2,z.sub.3,z.sub.4) are preferably set to a value of zero.
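    Setting the selected outputs to zero amounts to applying a mask to the layer's output vector. A minimal sketch, with names of our choosing:

```python
def deactivate_outputs(z, selected):
    # Replace the output values of the selected (deactivated) neurons
    # with zero; all other outputs pass through unchanged.
    return [0.0 if i in selected else v for i, v in enumerate(z)]
```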

    [0075] Then (1300), a forward pass is performed—that is to say that, for learning input quantity values (x.sub.i) and respectively associated learning output quantity values (y.sub.i), associated outputs (ŷ.sub.i) are determined with the aid of the ANN (1), with neurons deactivated as described, and the cost function is determined from these.

    [0076] Thereafter (1400), a backward pass is performed—that is to say that, for example with the aid of gradient formation of the cost function and backpropagation, the weights of the non-deactivated neurons are adapted.
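    For a single linear layer trained with a mean-squared-error cost, a backward pass that adapts only the weights of non-deactivated neurons could be sketched as follows. The layer shape, cost function, and learning rate are illustrative assumptions, not prescribed by the text.

```python
def backward_pass(x, y, z, weights, deactivated, lr=0.1):
    """Adapt the weights of a linear layer (z_j = sum_k w_jk * x_k)
    by one gradient step on the MSE cost over the outputs;
    deactivated neurons receive no update."""
    n = len(z)
    for j, row in enumerate(weights):
        if j in deactivated:
            continue  # this neuron's output was zeroed; skip its weights
        grad_out = 2.0 * (z[j] - y[j]) / n  # dC/dz_j for MSE
        for k in range(len(row)):
            row[k] -= lr * grad_out * x[k]
    return weights
```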

    [0077] This procedure can be iterated until the training method is complete.