MORE ROBUST TRAINING FOR ARTIFICIAL NEURAL NETWORKS
20220327387 · 2022-10-13
Inventors
CPC classification
G06N7/01
PHYSICS
G06N3/082
PHYSICS
International classification
Abstract
A method for training an artificial neural network, ANN, which comprises a multiplicity of processing units. Parameters that characterize the behavior of the ANN are optimized according to a cost function. Depending on outputs determined from learning input quantity values and on learning output quantity values, an output of at least one selected processing unit is deactivated. Selection of the selected processing unit is achieved with the aid of a sequence of quasi-random numbers.
Claims
1. A method for training an artificial neural network (ANN), which includes a multiplicity of processing units, the method comprising: optimizing parameters that characterize a behavior of the ANN according to a cost function; and deactivating, depending on outputs determined from learning input quantity values and on learning output quantity values, an output of at least one selected processing unit, the selection of the selected processing unit being achieved using a sequence of quasi-random numbers.
2. The method as recited in claim 1, wherein the sequence of quasi-random numbers is initialized using a random value.
3. The method as recited in claim 2, wherein the initialization of the sequence of quasi-random numbers is changed after each training pass has been carried out.
4. The method as recited in claim 3, wherein the change in the initialization is performed by a specifiable increment.
5. The method as recited in claim 1, wherein a specifiable proportion of the processing units of the ANN is selected and deactivated.
6. The method as recited in claim 1, wherein the sequence of quasi-random numbers is one of the following sequences: Halton sequence, Hammersley sequence, Niederreiter sequence, Kronecker sequence, Sobol sequence, Van der Corput sequence.
7. The method as recited in claim 1, wherein the ANN is configured as a classifier.
8. The method as recited in claim 7, wherein the ANN is configured as a classifier of image data and/or audio data.
9. A non-transitory machine-readable storage medium on which is stored a computer program for training an artificial neural network (ANN), which includes a multiplicity of processing units, the computer program, when executed by a computer, causing the computer to perform the following steps: optimizing parameters that characterize a behavior of the ANN according to a cost function; and deactivating, depending on outputs determined from learning input quantity values and on learning output quantity values, an output of at least one selected processing unit, the selection of the selected processing unit being achieved using a sequence of quasi-random numbers.
10. A training device configured to train an artificial neural network (ANN), which includes a multiplicity of processing units, the training device configured to: optimize parameters that characterize a behavior of the ANN according to a cost function; and deactivate, depending on outputs determined from learning input quantity values and on learning output quantity values, an output of at least one selected processing unit, the selection of the selected processing unit being achieved using a sequence of quasi-random numbers.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0067] In this context, a selected layer (2) comprises a plurality of neurons (F.sub.1,F.sub.2,F.sub.3,F.sub.4), of which the output values (z.sub.1,z.sub.2,z.sub.3,z.sub.4) are forwarded as a typically multidimensional intermediate quantity (z) to a succeeding layer (3).
[0068] The neurons may conventionally be arranged in multidimensional form, for example as a two-dimensional tensor of size M×N. It is possible to index the neurons in one layer by a one-dimensional count of the neurons.
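As a hypothetical illustration of such a one-dimensional count (the function name and row-major convention are assumptions for illustration, not part of the disclosure), each neuron of an M×N layer may be mapped to a single index:

```python
def flat_index(m, n, N):
    # Row-major one-dimensional count of neuron (m, n) in an M x N
    # layer: row m occupies indices m*N .. m*N + N - 1.
    return m * N + n

# Example: neuron (1, 2) in a layer with N = 4 columns receives index 6.
```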
[0070] The training device (140) comprises for example a computer (145) which performs the training method, and a memory (146) in which there is stored a computer program which comprises instructions for performing the training method when it is run by the computer (145).
[0072] First (1000), a sequence of quasi-random numbers, for example a Hammersley sequence, is initialized using an initial value, and is generated to a specifiable length. This specifiable length is preferably greater than the number of neurons of the ANN (1) that are to be deactivated.
[0073] Then, with the aid of a floor operator, this sequence (1100) is mapped onto integer values in order to obtain a sequence of indices. These indices may either address exclusively the neurons (F.sub.1,F.sub.2,F.sub.3,F.sub.4) of the selected layer (2) (with the result that the illustrated method may be repeated layer by layer for the purpose of deactivating neurons), or, preferably, address all the neurons of the ANN (1).
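The two steps above (generating a quasi-random sequence and flooring it onto neuron indices) can be sketched as follows; a Van der Corput sequence in base 2 is used here for brevity, and the function names and the `seed_offset` parameter (standing in for the changeable initialization) are illustrative assumptions, not part of the disclosure:

```python
def radical_inverse(i, base=2):
    # Van der Corput radical inverse: mirrors the base-b digits of i
    # about the radix point, giving a quasi-random value in [0, 1).
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * f
        i //= base
        f /= base
    return inv

def select_indices(n_points, n_neurons, seed_offset=0):
    # Generate n_points quasi-random values and map them onto integer
    # neuron indices with a floor operation, as in paragraph [0073].
    # seed_offset shifts the start of the sequence, so a different
    # portion can be used after each training pass.
    vals = [radical_inverse(seed_offset + i + 1) for i in range(n_points)]
    return [int(v * n_neurons) for v in vals]  # floor to indices 0..n_neurons-1
```

For example, `select_indices(3, 4)` yields the indices of three neurons out of four, spread more evenly over the index range than pseudo-random draws typically would be.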
[0074] Thereafter (1200), these neurons are deactivated—that is, the associated output values (z.sub.1,z.sub.2,z.sub.3,z.sub.4) are preferably set to a value of zero.
[0075] Then (1300), a forward pass is performed—that is to say that, for learning input quantity values (x.sub.i) and respectively associated learning output quantity values (y.sub.i), with the aid of the ANN (1) and with neurons deactivated as described, associated output quantities (ŷ.sub.i) are determined, and the cost function is determined from these.
[0076] Thereafter (1400), a backward pass is performed—that is to say that, for example with the aid of gradient formation of the cost function and backpropagation, the weights of the non-deactivated neurons are adapted.
[0077] This procedure can be iterated until the training method is complete.
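The deactivation, forward-pass, and backward-pass steps (1200-1400) described above can be sketched for a minimal two-layer network with a squared-error cost; the network shape, ReLU activation, learning rate, and function name are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def train_step(W1, W2, x, y, drop_idx, lr=0.05):
    # Forward pass (1300) with selected hidden neurons deactivated:
    # their output values z are set to zero (1200).
    z = np.maximum(0.0, W1 @ x)          # ReLU hidden activations
    mask = np.ones_like(z)
    mask[drop_idx] = 0.0                 # deactivate the selected neurons
    z = z * mask
    out = W2 @ z                         # output quantities
    cost = 0.5 * np.sum((out - y) ** 2)  # squared-error cost function

    # Backward pass (1400): the mask blocks the gradient at the
    # deactivated neurons, so only the remaining weights are adapted.
    d_out = out - y
    d_z = (W2.T @ d_out) * mask * (z > 0)
    W2 -= lr * np.outer(d_out, z)        # in-place weight updates
    W1 -= lr * np.outer(d_z, x)
    return cost
```

Iterating `train_step` with indices drawn from a quasi-random sequence, and shifting the sequence initialization between passes, reproduces the overall procedure in miniature.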