Neural network learning device, method, and program
11580383 · 2023-02-14
Assignee
Inventors
CPC classification
G06V10/774 (PHYSICS)
G06F18/241 (PHYSICS)
G06F18/2132 (PHYSICS)
International classification
Abstract
A large amount of training data is typically required to perform deep network learning, making such learning difficult to achieve with only a few pieces of data. In order to solve this problem, the neural network learning device according to the present invention is provided with: a feature extraction unit which extracts features from training data using a learning neural network; an adversarial feature generation unit which generates an adversarial feature from the extracted features using the learning neural network; a pattern recognition unit which calculates a neural network recognition result using the training data and the adversarial feature; and a network learning unit which performs neural network learning so that the recognition result approaches a desired output.
Claims
1. A neural network learning device, comprising: a processor; and a memory storing executable instructions that, when executed by the processor, cause the processor to perform as: a feature extraction unit configured to extract features from training data using a neural network being currently learned; an adversarial feature generation unit configured to generate an adversarial feature by adding, to the extracted features, perturbations so that recognition by the neural network being currently learned becomes difficult; a pattern recognition unit configured to calculate a recognized result of the neural network using the extracted features and the adversarial feature; and a network learning unit configured to learn the neural network so that the recognized result approaches a desired output.
2. The neural network learning device as claimed in claim 1, wherein the adversarial feature generation unit is configured to generate the adversarial feature under a constraint which is represented by a linear combination of the training data.
3. A pattern recognition apparatus configured to perform pattern recognition based on a neural network which is learned by using the neural network learning device claimed in claim 1.
4. A neural network learning method comprising: extracting features from training data using a neural network being currently learned; generating an adversarial feature by adding, to the extracted features, perturbations so that recognition by the neural network being currently learned becomes difficult; calculating a recognized result of the neural network using the extracted features and the adversarial feature; and learning the neural network so that the recognized result approaches a desired output.
5. The neural network learning method as claimed in claim 4, wherein the generating generates the adversarial feature under a constraint which is represented by a linear combination of the training data.
6. A non-transitory computer readable recording medium for storing a neural network learning program for causing a computer to execute: a process for extracting features from training data using a neural network being currently learned; a process for generating an adversarial feature by adding, to the extracted features, perturbations so that recognition by the neural network being currently learned becomes difficult; a process for calculating a recognized result of the neural network using the extracted features and the adversarial feature; and a process for learning the neural network so that the recognized result approaches a desired output.
7. The non-transitory computer readable recording medium as claimed in claim 6, wherein the process for generating causes the computer to generate the adversarial feature under a constraint which is represented by a linear combination of the training data.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DESCRIPTION OF EMBODIMENTS
(9) [Explanation of Configuration]
(11) The feature extraction unit 12 extracts features from training data using a neural network being currently learned. The adversarial feature generation unit 14 generates, using the neural network being currently learned, an adversarial feature from the features extracted by the feature extraction unit 12. The pattern recognition unit 16 calculates an output recognized result of the neural network using the training data and the adversarial feature. The network learning unit 18 learns the neural network so that the recognized result approaches a desired output. Herein, a combination of the training data and the adversarial feature corresponds to data which are generated by processing the training data.
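The four units described above can be organized as in the following sketch. This is an illustrative outline only, not the patent's implementation; the class name, weight layout, and the random-direction perturbation used as a stand-in for the adversarial step are all hypothetical.

```python
import numpy as np

class NeuralNetworkLearningDevice:
    """Hypothetical sketch of the four units of the learning device 10."""

    def __init__(self, weights, learning_rate=0.1):
        self.weights = weights          # parameters of the network being currently learned
        self.learning_rate = learning_rate

    def extract_features(self, x):
        """Feature extraction unit 12: forward pass up to a feature layer."""
        return x @ self.weights["feature"]

    def generate_adversarial_feature(self, z, epsilon=0.1):
        """Adversarial feature generation unit 14: perturb z so that recognition
        becomes difficult (a random direction here as a simplified stand-in)."""
        r = np.random.randn(*z.shape)
        r = epsilon * r / (np.linalg.norm(r) + 1e-12)
        return z + r

    def recognize(self, z):
        """Pattern recognition unit 16: compute the recognized result (class probabilities)."""
        logits = z @ self.weights["output"]
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def learn_step(self, x, t):
        """Network learning unit 18: one update toward the desired output t."""
        z = self.extract_features(x)
        z_adv = self.generate_adversarial_feature(z)
        y, y_adv = self.recognize(z), self.recognize(z_adv)
        # gradient of a squared error w.r.t. the output weights (sketch)
        grad = np.outer(z, y - t) + np.outer(z_adv, y_adv - t)
        self.weights["output"] -= self.learning_rate * grad
        return y
```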
(12) [Explanation of Operation]
(13) Referring to
(14) The feature extraction unit 12 extracts features from input training data using a neural network being currently learned (step S101). The adversarial feature generation unit 14 adds, to the features extracted by the feature extraction unit 12, perturbations so that recognition by the neural network being currently learned becomes difficult, and generates an adversarial feature (step S102). The pattern recognition unit 16 calculates, for each of the features extracted by the feature extraction unit 12 and the adversarial feature generated by the adversarial feature generation unit 14, a recognized result using the neural network being currently learned and outputs the recognized result (step S103). The network learning unit 18 renews the neural network so that the recognized result produced by the pattern recognition unit 16 becomes a desired recognized result, and learns the neural network (step S104).
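Steps S101 to S104 above can be sketched as a training loop. The callables and the linear stand-in model below are hypothetical illustrations, not the patent's procedure.

```python
import numpy as np

def train(extract, perturb, recognize, update, params, data, labels, epochs=10):
    """One possible loop over steps S101-S104 for each training sample."""
    for _ in range(epochs):
        for x, t in zip(data, labels):
            z = extract(params, x)                                     # S101: extract features
            z_adv = perturb(params, z)                                 # S102: adversarial feature
            y, y_adv = recognize(params, z), recognize(params, z_adv)  # S103: recognized results
            params = update(params, z, z_adv, y, y_adv, t)             # S104: renew the network
    return params

# Minimal stand-in callables for a linear model
extract   = lambda p, x: x
perturb   = lambda p, z: z + 0.01 * np.sign(z)
recognize = lambda p, z: p @ z
update    = lambda p, z, za, y, ya, t: p - 0.1 * (np.outer(y - t, z) + np.outer(ya - t, za))

params = np.zeros((2, 2))
params = train(extract, perturb, recognize, update, params,
               [np.array([1.0, 0.0])], [np.array([1.0, 0.0])])
```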
(15) The advantageous effect of this example embodiment is that a neural network with high performance can be learned by processing the training data with an adversarial feature generated on a feature space, which efficiently generates data that contribute to an improvement of learning, and by learning the neural network using the data thus generated.
(16) The reason is as follows. The feature space is a space which well represents a distribution of the training data. Therefore, it is considered that a neighborhood of a pattern existing on the feature space is a set of patterns whose meanings are similar to that of the pattern in question. Accordingly, by generating the adversarial feature on the feature space, it is possible to generate, among patterns whose meanings are similar, a pattern which is most difficult to recognize and it is possible to efficiently generate the data which contribute to an improvement of the learning of the neural network.
(17) Using
(18) In a case of generating data using the data augmentation method of the related art, data are generated by preliminarily designating perturbations which may possibly occur in the data. For this reason, the data augmentation method can generate data following the distribution (the dotted line in the figure) of the training data as shown in
(19) On the other hand, the adversarial pattern generation method of the related art generates data so that discrimination becomes difficult. For this reason, the adversarial pattern generation method can generate data which are close to a discrimination boundary as shown in
(20) On the other hand, this example embodiment generates the adversarial feature on the one-dimensional feature space which well represents the distribution of the training data, as shown in
(21) In order to further facilitate understanding of this invention, description will proceed to differences between this example embodiment and the inventions described in the above-mentioned Patent Literatures 1-3.
(22) The invention disclosed in Patent Literature 1 optimizes the structure of the neural network by modifying the structure of the neural network. In comparison with this, this example embodiment processes the training data to be supplied to the neural network without modifying the structure of the neural network and learns the neural network using the data generated by the processing.
(23) In the invention disclosed in Patent Literature 2, a special-purpose feature calculation unit calculates a value of a feature without using a learning algorithm for a neural network in a defect classification unit or the like. In comparison with this, in this example embodiment, the feature extraction unit 12 extracts the features from the training data using the neural network being currently learned. The invention disclosed in Patent Literature 2 also generates (supplements), in pre-training, new teaching data in the vicinity of the existing teaching data in a case where the number of pieces of the teaching data is insufficient. In comparison with this, this example embodiment efficiently generates data which contribute to an improvement of the learning of the neural network by processing the training data supplied to the neural network, without generating (supplementing) new teaching data (training data).
(24) Although, in the invention disclosed in Patent Literature 3, the feature extraction unit extracts the n-dimensional feature, no description is made about which algorithm is specifically used for the purpose of extraction. In comparison with this, in this example embodiment, the feature extraction unit extracts the features from the training data using the neural network being currently learned. The invention disclosed in Patent Literature 3 generates the pattern recognition dictionary from a plurality of learning patterns. In comparison with this, this example embodiment uses and processes the training data and learns the neural network using the data generated by the processing without generating the pattern recognition dictionary.
(25) As described above, this example embodiment is quite different in problem to be solved, configuration, and function and effect from the inventions described in Patent Literatures 1-3.
Example 1
(26) Now, description will proceed to an operation of a mode for embodying this invention using a specific first example. This first example illustrates an example of learning a neural network 30 shown in
(27) The neural network 30 includes an input layer 31, an intermediate layer 32, and an output layer 33. The input layer 31 is supplied with a two-dimensional learning pattern. The neural network 30 produces, from the output layer 33 through the intermediate layer 32 having one hidden unit, a probability per each class as discriminated results of two classes. In this example, it is assumed that all of the layers 31 to 33 are fully connected to one another and an activating function is an identity function.
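Under the stated assumptions (fully connected layers, identity activation, one hidden unit), the network 30 reduces to a product of two linear maps. The following minimal numpy sketch uses hypothetical weight values, not ones from the patent.

```python
import numpy as np

W1 = np.array([[0.5], [0.5]])   # input layer 31 -> single hidden unit (2 x 1), hypothetical values
W2 = np.array([[1.0, -1.0]])    # hidden unit -> output layer 33 (1 x 2), hypothetical values

def forward(x):
    """With identity activations the whole network is linear: x -> (x W1) W2."""
    h = x @ W1        # value of the single hidden unit (the one-dimensional feature)
    return h @ W2     # two class scores

x = np.array([1.0, 1.0])
scores = forward(x)   # equals x @ (W1 @ W2), since everything is linear
```

Because the activations are identities, the extracted feature h lies on a one-dimensional subspace, which is the feature space on which the adversarial feature is generated in this example.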
(28) The feature extraction unit 12 extracts the features from the training data using the neural network 30. In a case of using the neural network 30 in
(29) The adversarial feature generation unit 14 generates the adversarial feature using the features extracted by the feature extraction unit 12 and the neural network being currently learned. Inasmuch as the neural network 30 of
(30) The pattern recognition unit 16 calculates the recognized result using the neural network 30 being currently learned for each of the features extracted by the feature extraction unit 12 and the adversarial feature generated by the adversarial feature generation unit 14. In the example in
(31) The network learning unit 18 renews the neural network 30 so that the recognized result produced by the pattern recognition unit 16 becomes the desired recognized result, and learns the neural network. In the example in
(32) As described above, in the first example, the discrimination boundary can be kept very far away from samples by generating the adversarial feature within the subspace where the training data exist and by learning the neural network. As a result, it is possible to learn the neural network with a large margin and high generalization performance.
Example 2
(33) Now, description will proceed to an operation of a mode for embodying this invention as regards a second example in a case where the intermediate layer is a multilayer. This second example illustrates an example of learning a neural network 50 shown in
(34) The neural network 50 includes an input layer 51, an intermediate layer 52, and an output layer 53. In such a neural network 50, the input layer 51 is supplied with a learning pattern and the output layer 53 produces a recognized result. The intermediate layer 52 includes four layers: an H1 layer 521, an H2 layer 522, an H3 layer 523, and an H4 layer 524.
(35) The feature extraction unit 12 extracts the features from the training data using the neural network 50 being currently learned. In a case of using the neural network 50 in
(36) When an input pattern is represented by x and a parameter in the network being currently learned is represented by θ, the extracted feature z is written as follows.
z=f(x|θ,In,H3) [Math. 1]
(37) Herein, f(x|θ,A,B) represents an operation of calculating a value of a B layer when a value of an A layer is given by x in the network having the parameter θ. Selection of the intermediate layer 52 to produce the feature may be carried out randomly or may be determined in a deterministic fashion in accordance with a method preliminarily determined.
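The operation f(x|θ,A,B), which computes the value of layer B when layer A is given the value x, can be sketched as a partial forward pass. Layer names follow the embodiment; the weight values and the choice of ReLU activations (except before the output) are assumptions for illustration.

```python
import numpy as np

LAYERS = ["In", "H1", "H2", "H3", "H4", "Out"]

def f(x, theta, a, b):
    """Run the network from layer a to layer b: a sketch of f(x | theta, A, B)."""
    start, stop = LAYERS.index(a), LAYERS.index(b)
    h = x
    for name in LAYERS[start:stop]:
        W = theta[name]  # weight matrix taking layer `name` to the next layer
        # ReLU between hidden layers, identity into the output (an assumption)
        h = np.maximum(h @ W, 0.0) if name != "H4" else h @ W
    return h

# Hypothetical 2-unit layers throughout, identity weights for illustration
theta = {name: np.eye(2) for name in LAYERS[:-1]}
z = f(np.array([1.0, 2.0]), theta, "In", "H3")   # feature extracted at the H3 layer
y = f(z, theta, "H3", "Out")                     # recognized result computed from the feature
```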
(38) The adversarial feature generation unit 14 generates the adversarial feature using the features extracted by the feature extraction unit 12 and the neural network 50 being currently learned. As a method of generating the adversarial feature, a method similar to the adversarial pattern generation method of the related art may be used. For example, in a case of using Virtual Adversarial Training (VAT), the adversarial feature z′ for z is generated as follows.
(39)
r*=argmax_{r:∥r∥≤ε} KL(f(z|θ,H3,Out),f(z+r|θ,H3,Out))
z′=z+r* [Math. 2]
(40) Herein, each of f(z|θ,H3,Out) and f(z+r|θ,H3,Out) represents an output of the output layer and therefore becomes a probability distribution of the class to which the input pattern belongs. KL (p,q) represents a function for calculating a KL divergence between two discrete probability distributions p and q.
(41)
KL(p,q)=Σ_i p_i log(p_i/q_i) [Math. 3]
(42) Herein, i represents an index of the probability distribution and, in the second example, represents the index of a unit of the output layer 53.
(43) In the second example, the adversarial feature generation unit 14 generates the adversarial feature by adding, to z, a perturbation providing a greatest change in the value of the output layer 53 among perturbations each having a magnitude which is equal to or less than ε.
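This step can be sketched in numpy as follows. VAT normally finds the worst-case perturbation by a gradient-based power iteration; the sketch below substitutes a random search over perturbations of norm ε, which is a simplified stand-in, and all names are hypothetical.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence between two discrete probability distributions p and q."""
    return float(np.sum(p * np.log(p / q)))

def adversarial_feature(z, head, epsilon=0.5, trials=64, seed=0):
    """Among sampled perturbations r with ||r|| == epsilon, keep the one that
    changes the output distribution the most (random-search stand-in for VAT)."""
    rng = np.random.default_rng(seed)
    p = softmax(head(z))
    best_r, best_gain = np.zeros_like(z), -1.0
    for _ in range(trials):
        r = rng.standard_normal(z.shape)
        r *= epsilon / np.linalg.norm(r)       # magnitude constraint
        gain = kl(p, softmax(head(z + r)))     # change in the output distribution
        if gain > best_gain:
            best_r, best_gain = r, gain
    return z + best_r

head = lambda v: v @ np.array([[2.0, -2.0], [0.0, 1.0]])  # hypothetical H3 -> Out map
z = np.array([1.0, 0.5])
z_adv = adversarial_feature(z, head)
```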
(44) The pattern recognition unit 16 calculates, for each of the features extracted by the feature extraction unit 12 and the adversarial feature generated by the adversarial feature generation unit 14, the recognized result using the neural network 50 being currently learned. In the second example, the pattern recognition unit 16 calculates values of the output layer 53 for z and z′, respectively.
y=f(z|θ,H3,Out)
y′=f(z′|θ,H3,Out) [Math. 4]
(45) Herein, y represents the recognized result for original training data and y′ represents the recognized result for the adversarial feature.
(46) The network learning unit 18 renews the neural network 50 so that the recognized result produced by the pattern recognition unit 16 becomes the desired recognized result and learns the neural network. As a method of renewing the network, a gradient method based on a commonly-used backpropagation method or the like may be used. For example, when a most simple steepest descent method is used, a parameter in the neural network is renewed as follows.
(47)
θ←θ−μ(∂/∂θ){E(y,t)+E(y′,t)} [Math. 5]
Herein, E represents an error function between a recognized result and the teaching signal.
(48) Herein, t represents the teaching signal indicative of the desired recognized result and μ represents a learning rate.
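As a sketch of this update, assume a squared-error loss between each recognized result and the teaching signal t (the patent does not fix a particular loss), and a single linear map from the feature to the output as a deliberately simplified stand-in for the full network.

```python
import numpy as np

def renew(theta, z, z_adv, t, mu=0.1):
    """One steepest-descent step on E = ||y - t||^2 + ||y' - t||^2,
    where y = z @ theta and y' = z_adv @ theta (linear stand-in model)."""
    y, y_adv = z @ theta, z_adv @ theta
    # gradient of the summed squared errors with respect to theta
    grad = 2.0 * (np.outer(z, y - t) + np.outer(z_adv, y_adv - t))
    return theta - mu * grad

theta = np.zeros((2, 2))
z, z_adv, t = np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([1.0, 0.0])
theta_new = renew(theta, z, z_adv, t)
```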
(49) As mentioned above, in the second example also, the discrimination boundary can be kept very far away from the samples by learning the neural network with the adversarial feature generated within the subspace where the training data exist. As a result, it is possible to learn the neural network with the large margin and the high generalization performance.
Example 3
(50) Now, description will proceed to a third example. In order to make the adversarial feature further follow the distribution of the training data, a restriction may be introduced on the adversarial feature or on the perturbation for generating the adversarial feature.
(51) In the examples mentioned above, the only restriction on the perturbation r* for generating the adversarial feature is the constraint that its magnitude is equal to or less than ε. In comparison with this, the third example introduces an additional constraint, for example, that the perturbation can be expressed by a linear combination of the training data. When a coefficient of the linear combination is given by c, the perturbation r is written as follows.
(52)
r=Zc [Math. 6]
(53) Herein, Z represents a matrix of features (z_1, ..., z_M) which are extracted from the training data. In this event, the adversarial feature z′ can be generated as follows.
(54)
c*=argmax_{c:∥Zc∥≤ε} KL(f(z|θ,H3,Out),f(z+Zc|θ,H3,Out))
z′=z+Zc* [Math. 7]
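The third example's constraint can be sketched by restricting the search to perturbations of the form r = Zc, i.e. the span of the extracted features. As before, a random search stands in for the inner maximization, and all names and values are hypothetical.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def constrained_adversarial_feature(z, Z, head, epsilon=0.5, trials=64, seed=0):
    """Search only perturbations r = Z @ c (a linear combination of the
    extracted features), with the magnitude constraint ||Z @ c|| == epsilon."""
    rng = np.random.default_rng(seed)
    p = softmax(head(z))
    best_r, best_gain = np.zeros_like(z), -1.0
    for _ in range(trials):
        c = rng.standard_normal(Z.shape[1])    # coefficients of the linear combination
        r = Z @ c                              # perturbation restricted to span(Z)
        r *= epsilon / np.linalg.norm(r)       # enforce the magnitude constraint
        gain = kl(p, softmax(head(z + r)))
        if gain > best_gain:
            best_r, best_gain = r, gain
    return z + best_r

Z = np.array([[1.0, 0.0], [0.0, 0.0]])   # features span only the first axis (hypothetical)
head = lambda v: v @ np.array([[2.0, -2.0], [0.0, 1.0]])
z_adv = constrained_adversarial_feature(np.array([1.0, 0.5]), Z, head)
```

Because every candidate perturbation lies in the span of Z, the generated adversarial feature stays within the subspace where the training data exist.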
(55) As described above, in the third example also, the discrimination boundary can be kept very far away from the samples by learning the neural network with the adversarial feature generated within the subspace where the training data exist. As a result, it is possible to learn the neural network with the large margin and the high generalization performance.
(56) It is possible to achieve a pattern recognition apparatus by using the neural network obtained by learning as described above. That is, the pattern recognition apparatus carries out pattern recognition on the basis of the neural network 30 or 50 which is learned using the above-mentioned neural network learning device 10.
(57) Respective parts (respective components) of the neural network learning device 10 may be implemented by a combination of hardware and software. In a form in which hardware and software are combined, the respective parts (components) are implemented as various kinds of means by developing a neural network learning program in a RAM (random access memory) and making hardware such as a control unit (a CPU (central processing unit)) operate based on the program. The program may be recorded in a recording medium to be distributed. The program recorded in the recording medium is read into a memory via a wire, wirelessly, or via the recording medium itself to operate the control unit and so on. By way of example, the recording medium may be an optical disc, a magnetic disk, a semiconductor memory device, a hard disk, or the like.
(58) Explaining the above-mentioned example embodiment (examples) in different words, the embodiment can be implemented by causing a computer operated as the neural network learning device 10 to act as the feature extraction unit 12, the adversarial feature generation unit 14, the pattern recognition unit 16, and the network learning unit 18 according to the neural network learning program developed in the RAM.
(59) As described above, according to the example embodiment (examples) of the present invention, it is possible to effectively learn the neural network even with a small number of pieces of training data.
(60) This invention is not strictly limited to the specific configurations of the above-mentioned example embodiment, and this invention involves any changes in a range not departing from the gist of this invention.
(61) While the present invention has been described with reference to the example embodiment and the examples thereof, the present invention is not limited to the foregoing example embodiment and examples. The configuration and the details of this invention may be modified within the scope of this invention in various manners which could be understood by those of ordinary skill.
(62) Industrial Applicability
(63) This invention is applicable to, in image processing or speech processing, uses for discriminating a pattern, for example, face recognition, object recognition, and so on.
(64) Reference Signs List
(65) 10 neural network learning device 12 feature extraction unit 14 adversarial feature generation unit 16 pattern recognition unit 18 network learning unit 30 neural network 31 input layer 32 intermediate layer 33 output layer 50 neural network 51 input layer 52 intermediate layer 521 H1 layer 522 H2 layer 523 H3 layer 524 H4 layer 53 output layer