Learning device, signal processing device, and learning method
11580407 · 2023-02-14
Abstract
A learning data processing unit accepts, as input, a plurality of pieces of learning data for a respective plurality of tasks, and calculates, for each of the tasks, a batch size which meets a condition that the value obtained by dividing the data size of the corresponding piece of learning data by the corresponding batch size is the same between the tasks. A batch sampling unit draws, for each of the tasks, samples from the corresponding piece of learning data with the corresponding batch size calculated by the learning data processing unit. A learning unit updates a weight of a discriminator for each of the tasks, using the samples drawn by the batch sampling unit.
Claims
1. A learning device for training a single neural network for a plurality of tasks of different types using learning data whose data size varies from task to task, comprising: a processor to execute a program; and a memory to store the program which, when executed by the processor, performs multi-task training of the neural network using stochastic gradient descent by performing processes of, accepting, as input, corresponding learning data, having a corresponding data size, for each of the plurality of tasks, the corresponding data size for each task varying from task to task; calculating, for each of the plurality of tasks, a corresponding batch size which meets a condition that a value obtained by dividing the data size of the corresponding learning data for a task by the corresponding batch size for the task is the same between the plurality of tasks; sampling, for each of the plurality of tasks, samples from the corresponding learning data with the calculated corresponding batch size; training the single neural network by updating a corresponding weight of a discriminator of the neural network for each of the plurality of tasks, using the samples sampled; and repeating the sampling and training until all samples in the learning data have been sampled.
2. A learning device for training a single neural network for a plurality of tasks of different types using learning data whose data size varies from task to task, comprising: a processor to execute a program; and a memory to store the program which, when executed by the processor, performs multi-task training of the neural network using stochastic gradient descent by performing processes of, accepting, as input, corresponding learning data for each of the plurality of tasks, the corresponding learning data for each task having a corresponding data size, the corresponding data size for each task varying from task to task; calculating, for the plurality of tasks, a respective batch size whose ratio to the corresponding data size for the task has a fixed value between the plurality of tasks; sampling, for each of the plurality of tasks, samples from the corresponding learning data with the calculated corresponding batch size; training the single neural network by updating a corresponding weight of a discriminator of the neural network for each of the plurality of tasks, using the samples sampled; and repeating the sampling and training until all samples in the learning data have been sampled.
3. A signal processing device comprising: an input information processor to accept input of input information; and a discriminator to perform a discrimination process using the input information accepted by the input information processor, the discriminator being caused to learn by the learning device according to claim 1.
4. A signal processing device comprising: an input information processor to accept input of input information; and a discriminator to perform a discrimination process using the input information accepted by the input information processor, the discriminator being caused to learn by the learning device according to claim 2.
5. A learning method of performing multi-task training of a single neural network using stochastic gradient descent for a plurality of tasks of different types using learning data whose data size varies from task to task, the method comprising: accepting, as input, corresponding learning data, having a corresponding data size, for each of the plurality of tasks, the corresponding data size for each task varying from task to task; calculating, for each of the plurality of tasks, a corresponding batch size which meets a condition that a value obtained by dividing the data size of the corresponding learning data for a task by the corresponding batch size for the task is the same between the plurality of tasks; sampling, for each of the plurality of tasks, samples from the corresponding learning data with the calculated corresponding batch size; training the single neural network by updating a corresponding weight of a discriminator of the neural network for each of the plurality of tasks, using the samples sampled; and repeating the sampling and training until all samples in the learning data have been sampled.
6. A learning method of performing multi-task training of a single neural network using stochastic gradient descent for a plurality of tasks of different types using learning data whose data size varies from task to task, the method comprising: accepting, as input, corresponding learning data, having a corresponding data size, for each of the plurality of tasks, the corresponding data size for each task varying from task to task; calculating, for the plurality of tasks, a respective batch size whose ratio to the corresponding data size for the task has a fixed value between the plurality of tasks; sampling, for each of the plurality of tasks, samples from the corresponding learning data with the calculated corresponding batch size; training the single neural network by updating a corresponding weight of a discriminator of the neural network for each of the plurality of tasks, using the samples sampled; and repeating the sampling and training until all samples in the learning data have been sampled.
Description
BRIEF DESCRIPTION OF DRAWINGS
DESCRIPTION OF EMBODIMENTS
(12) To describe the invention in more detail, modes for carrying out the invention will be described below with reference to the accompanying drawings.
First Embodiment
(14) The input information processing unit 2 generates information used in a discrimination process from input information, and provides the generated information for the discriminator 4. For example, when the signal processing device 1 is an object detection device that detects an object from video data, the input information processing unit 2 extracts features of video from video data obtained as input information, and outputs information on the features of video to the discriminator 4.
(15) The learning device 3 is a device that performs learning of the discriminator 4. Hereinafter, description is made assuming that the learning device 3 performs learning of the discriminator 4 using learning data for each of a task A and a task B of different types. Examples of the tasks A and B of different types include a classification task such as recognition of facial expression or estimation of face orientation, and a regression task such as detection of facial feature points.
(16) A combination of tasks serving as learning targets is based on the premise that the features of input information used in the tasks are similar. Since the above-described estimation of face orientation and detection of facial feature points are similar in that facial features are captured, multi-task learning is possible.
(17) Note that this premise holds to the following extent: if multi-task learning can be performed appropriately, then the features of the information used in the tasks serving as learning targets are also similar.
(18) The discriminator 4 discriminates a target object or event on the basis of the information inputted from the input information processing unit 2. An example of the discriminator 4 is a neural network.
(19) In the neural network, perceptrons, which are nodes, are arranged hierarchically, and discrimination results are calculated by processing input information with the perceptrons of each layer in the order of an input layer, a hidden layer, and an output layer. The output layer corresponds to the output of a task to be discriminated. In the case of a regression task, the output of an activation function is output as-is as a predicted value, and in the case of a classification task, the output layer outputs a value to which a softmax function is applied.
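The two kinds of output-layer behavior described in paragraph (19) can be illustrated with a short Python sketch. This is an illustration only; the function names and the `task_type` switch are assumptions, while the softmax itself follows the standard definition:

```python
from math import exp

def softmax(logits):
    # Subtract the maximum logit for numerical stability, then normalize
    # the exponentials so the outputs sum to 1 (class probabilities).
    m = max(logits)
    exps = [exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def output_layer(activations, task_type):
    # Regression task: the activation-function output is passed through
    # as-is as the predicted value.
    # Classification task: a softmax is applied to the activations.
    if task_type == "regression":
        return list(activations)
    return softmax(activations)
```

For a classification head the outputs are nonnegative and sum to 1; for a regression head the activations pass through unchanged.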
(21) The learning device 3 includes a learning data processing unit 5, a batch sampling unit 6, and a learning unit 7.
(22) The learning data processing unit 5 calculates batch sizes which meet a condition that a value obtained by dividing a data size of each of the pieces of learning data by a corresponding one of the batch sizes is the same between the task A and the task B.
(23) For example, when the data size and batch size of learning data for the task A are S1 and B1, respectively, and the data size and batch size of learning data for the task B are S2 and B2, respectively, the learning data processing unit 5 calculates B1 and B2 which meet S1/B1=S2/B2.
(24) Note that the data size of learning data is the number of samples included in the learning data.
(25) Note also that the batch size is the number of samples to be sampled at a time from the learning data.
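Given the definitions in paragraphs (22) to (25) (the data size is the number of samples; the batch size is the number of samples drawn at a time), the calculation performed by the learning data processing unit 5 can be sketched in Python. This is an illustration only, not the patent's implementation; the function name and the greatest-common-divisor fallback are assumptions:

```python
from functools import reduce
from math import gcd

def equal_epoch_batch_sizes(data_sizes, iterations_per_epoch=None):
    """Return one batch size per task so that S_t / B_t is the same for every task.

    data_sizes: the number of samples in each task's learning data (S1, S2, ...).
    iterations_per_epoch: the common value of S_t / B_t; if omitted, the
    greatest common divisor of the data sizes is used, which is guaranteed
    to divide every data size exactly.
    """
    if iterations_per_epoch is None:
        iterations_per_epoch = reduce(gcd, data_sizes)
    if any(s % iterations_per_epoch != 0 for s in data_sizes):
        raise ValueError("every data size must be divisible by the iteration count")
    return [s // iterations_per_epoch for s in data_sizes]
```

With the data sizes used later in the description (S1=50, S2=100) and 25 iterations per epoch, this yields B1=2 and B2=4, so that 50/2 = 100/4 = 25.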
(26) The batch sampling unit 6 draws samples from the learning data for each task with the batch sizes calculated by the learning data processing unit 5.
(27) Note that since the number of iterations of a single learning loop for the task A is S1/B1 and the number of iterations of a single learning loop for the task B is S2/B2, the timing at which a single learning loop is completed is the same between the task A and the task B.
(28) In this manner, in learning of the task A, sampling is appropriately performed with the batch size B1, and in learning of the task B, sampling is appropriately performed with the batch size B2.
(29) The learning unit 7 performs a weight update of the discriminator 4 by backpropagation, using the samples drawn in the learning loops for the task A and the task B.
(30) In stochastic gradient descent, the weights of the discriminator 4 are updated using backpropagation.
(31) Note that backpropagation is a method for updating a weight by propagating an output error of the neural network which is the discriminator 4 from the output layer to the input layer in turn.
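As a minimal illustration of such a gradient-based weight update, reduced here to a single weight with a squared-error loss (the function name, the one-weight model, and the learning rate are assumptions for illustration, not the patent's implementation):

```python
def sgd_step(w, batch, lr=0.1):
    """One stochastic-gradient-descent update of a single weight.

    The model is y_hat = w * x with squared error (w*x - y)**2; the gradient
    dL/dw = 2 * (w*x - y) * x is the output error propagated back to the
    weight, averaged over the mini-batch.
    """
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad
```

Starting from w = 0 on the batch [(1, 2), (2, 4)], whose underlying relation is y = 2x, repeated steps move the weight toward 2.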
(33) The processing circuit may be dedicated hardware or may be a central processing unit (CPU) or a graphics processing unit (GPU) that reads and executes programs stored in a memory.
(34) When the processing circuit is a processing circuit 100 which is dedicated hardware shown in
(35) In addition, a function of each of the learning data processing unit 5, the batch sampling unit 6, and the learning unit 7 may be implemented by a processing circuit, or the functions of the respective units may be all together implemented by a single processing circuit.
(36) When the above-described processing circuit is the CPU 101, the functions of the learning data processing unit 5, the batch sampling unit 6, and the learning unit 7 are implemented by software, firmware, or a combination of software and firmware.
(37) The software and the firmware are described as programs and stored in a memory 102. The CPU 101 implements the function of each unit by reading and executing the programs stored in the memory 102. Namely, the memory 102 is provided to store the programs which, when executed by the CPU 101, consequently perform the function of each unit.
(38) In addition, those programs cause a computer to perform procedures or methods for the learning data processing unit 5, the batch sampling unit 6, and the learning unit 7.
(39) Here, the memory corresponds, for example, to a nonvolatile or volatile semiconductor memory such as a random access memory (RAM), a read only memory (ROM), a flash memory, an erasable programmable ROM (EPROM), or an electrically erasable programmable ROM (EEPROM), or to a magnetic disk, a flexible disk, an optical disc, a compact disc, a MiniDisc, or a digital versatile disc (DVD).
(40) Note that some of the functions of the learning data processing unit 5, the batch sampling unit 6, and the learning unit 7 may be implemented by dedicated hardware and others may be implemented by software or firmware. For example, the function of the learning data processing unit 5 may be implemented by the processing circuit 100 which is dedicated hardware, and the functions of the batch sampling unit 6 and the learning unit 7 may be implemented by the CPU 101 reading and executing programs stored in the memory 102.
(41) As such, the above-described processing circuit can implement the aforementioned functions by hardware, software, firmware, or a combination thereof.
(43)-(45) [The drawings referenced here contrast two cases of learning data: a case in which every sample is tagged with both the label for the task A and the label for the task B, so that the data sizes of the two pieces of learning data are equal, and a case in which some samples carry only one of the labels, so that the data sizes differ between the tasks.]
(46) In this case, conventionally, there is no algorithm for determining batch sizes with which learning can be appropriately completed for the task A and the task B.
(47) Hence, the learning device 3 according to the first embodiment draws samples from the pieces of learning data for the task A and for the task B with batch sizes which meet the condition that the value obtained by dividing the data size of each of the pieces of learning data by the corresponding batch size is the same between the pieces of learning data.
(48) By this, even with the use of learning data including samples that are not tagged with both the label for the task A and the label for the task B, multi-task learning by stochastic gradient descent can be appropriately performed.
(49) Next, operation will be described.
(51) First, the learning device 3 reads learning data (step ST1) and performs learning of the discriminator 4 using the learning data (step ST2). Here, multi-task learning by stochastic gradient descent is performed using learning data for each of the task A and the task B.
(53) First, the learning data processing unit 5 calculates a batch size B1 for the task A and a batch size B2 for the task B (step ST1a).
(54) Here, the learning data processing unit 5 accepts, as input, learning data for the task A and learning data for the task B, and calculates batch sizes B1 and B2 on the basis of the data sizes of the pieces of learning data. Specifically, when the data size of the learning data for the task A is S1 and the data size of the learning data for the task B is S2, B1 and B2 which meet S1/B1=S2/B2 are calculated.
(55) For example, when S1=50 and S2=100, the batch size for the task A is B1=2 and the batch size for the task B is B2=4.
(56) Subsequently, the learning data processing unit 5 initializes a weight W1 to be updated with the learning data for the task A and a weight W2 to be updated with the learning data for the task B in the discriminator 4, and further initializes an epoch (step ST2a).
(57) The epoch is a learning loop in which all samples in the learning data are used once.
(58) Note that for the task A, sampling is repeated S1/B1 times in one epoch and for the task B, sampling is repeated S2/B2 times in one epoch.
(59) Then, the learning data processing unit 5 shuffles the samples of the learning data for the task A and shuffles the samples of the learning data for the task B (step ST3a).
(60) Note that shuffling of the samples refers to an arbitrary rearrangement of the order in which samples are drawn from the learning data.
(61) Subsequently, the batch sampling unit 6 draws samples with the batch size B1 from the learning data for the task A, out of the pieces of learning data whose samples have been shuffled by the learning data processing unit 5 (step ST4a).
(62) Furthermore, the batch sampling unit 6 draws samples with the batch size B2 from the learning data for the task B, out of the pieces of learning data whose samples have been shuffled by the learning data processing unit 5 (step ST5a).
(63) The learning unit 7 updates the weight W1 of the discriminator 4 using the samples sampled with the batch size B1 by the batch sampling unit 6 (step ST6a).
(64) Furthermore, the learning unit 7 updates the weight W2 of the discriminator 4 using the samples sampled with the batch size B2 by the batch sampling unit 6 (step ST7a).
(65) Thereafter, the learning unit 7 determines whether all samples have been sampled from the learning data for the task A and the learning data for the task B (step ST8a).
(66) If not all samples have been sampled from the pieces of learning data (step ST8a; NO), the learning unit 7 notifies the batch sampling unit 6 of such a fact.
(67) When the batch sampling unit 6 receives the notification from the learning unit 7, the batch sampling unit 6 samples next batches from the pieces of learning data. By this, a series of processes from step ST4a are repeated.
(68) On the other hand, if all samples have been sampled from the pieces of learning data (step ST8a; YES), the learning unit 7 notifies the learning data processing unit 5 of such a fact.
(69) When the learning data processing unit 5 receives the notification from the learning unit 7, the learning data processing unit 5 increments the epoch by one (step ST9a).
(70) Thereafter, the learning data processing unit 5 determines whether the epoch is greater than or equal to a predetermined number of iterations N (step ST10a).
(71) If the epoch is less than the number of iterations N (step ST10a; NO), the learning data processing unit 5 returns to step ST3a and repeats the aforementioned series of processes.
(72) In addition, if the epoch is greater than or equal to the number of iterations N (step ST10a; YES), the processes end. Note that since S1/B1=S2/B2, the timing at which a single epoch is completed is the same between the task A and the task B.
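The flow of steps ST1a to ST10a can be sketched as follows. This is an illustration only; the function signature and the weight-update callbacks (stand-ins for the backpropagation performed by the learning unit 7) are assumptions:

```python
import random

def train_first_embodiment(data_a, data_b, update_w1, update_w2, n_epochs, iters):
    """Multi-task loop with a shared iteration count S1/B1 = S2/B2 = iters.

    data_a, data_b: lists of samples whose lengths are divisible by iters.
    update_w1, update_w2: callbacks that update the discriminator weights
    W1 and W2 from one mini-batch.
    """
    b1 = len(data_a) // iters                       # ST1a: batch sizes
    b2 = len(data_b) // iters
    for _epoch in range(n_epochs):                  # ST2a, ST9a, ST10a
        random.shuffle(data_a)                      # ST3a: shuffle both tasks
        random.shuffle(data_b)
        for i in range(iters):                      # ST8a: until all sampled
            update_w1(data_a[i * b1:(i + 1) * b1])  # ST4a, ST6a
            update_w2(data_b[i * b2:(i + 1) * b2])  # ST5a, ST7a
```

Because both tasks perform exactly iters weight updates per epoch, the epochs for the task A and the task B complete at the same time, as noted in paragraph (72).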
(74) First, the input information processing unit 2 reads input information (step ST1b).
(75) The discriminator 4 performs signal processing for discriminating a target object or event, on the basis of the information inputted from the input information processing unit 2 (step ST2b).
(76) In the first embodiment, even with the use of learning data including samples that are not tagged with both the label for the task A and the label for the task B, batch sizes can be appropriately set.
(77) By this, multi-task learning by stochastic gradient descent using a plurality of pieces of learning data with different data sizes can be implemented.
(78) For example, it is possible to appropriately construct a neural network that performs a task of detection of feature points and a task of recognition of facial expression.
(79) Note that the discriminator 4 of the first embodiment may be of any type as long as it performs learning using stochastic gradient descent. Namely, the learning device 3 can be used for learning of, for example, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a long short-term memory (LSTM), or an autoencoder.
(80) In addition, although the first embodiment shows a case in which the task A and the task B are learned by the discriminator 4, a learning algorithm by the learning device 3 can be applied to multi-task learning for three or more tasks.
(81) For example, when a task A, a task B, and a task C are learning targets and the task A is a reference task, for the task B and the task C, respective batch sizes are determined on the basis of the number of iterations for the reference task.
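A sketch of this reference-task generalization (the function name is an assumption): the reference task fixes the per-epoch iteration count, and each remaining task's batch size is derived from it:

```python
def batch_sizes_from_reference(ref_size, ref_batch_size, other_sizes):
    # The reference task (e.g., the task A) fixes the number of iterations
    # per epoch; every other task's batch size is its data size divided by
    # that iteration count, so S_t / B_t is the same for all tasks.
    iterations = ref_size // ref_batch_size
    if any(s % iterations != 0 for s in other_sizes):
        raise ValueError("each data size must be divisible by the iteration count")
    return [s // iterations for s in other_sizes]
```

For example, with a reference task A of 50 samples and batch size 2 (25 iterations per epoch), tasks B and C with 100 and 150 samples receive batch sizes 4 and 6.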
(82) As described above, the learning device 3 according to the first embodiment includes the learning data processing unit 5, the batch sampling unit 6, and the learning unit 7. In this configuration, samples are drawn for the individual tasks with respective batch sizes which meet the condition that the value obtained by dividing the data size of each of the pieces of learning data by the corresponding batch size is the same between the pieces of learning data.
(83) By this, even with the use of learning data including samples that are not tagged with all the labels for the plurality of tasks serving as learning targets, multi-task learning by stochastic gradient descent can be appropriately performed.
(84) In addition, the signal processing device 1 according to the first embodiment includes the input information processing unit 2, the learning device 3, and the discriminator 4. By having this configuration, multi-task learning by stochastic gradient descent can be performed using learning data including samples that are not tagged with all the labels for a plurality of tasks serving as learning targets.
Second Embodiment
(85) In a second embodiment, multi-task learning is performed with batch sizes whose ratio has a fixed value between respective tasks.
(87) The learning device 3A includes a learning data processing unit 5A, a batch sampling unit 6A, and a learning unit 7.
(88) In addition, as in the first embodiment, the learning device 3A performs multi-task learning by stochastic gradient descent.
(89) The learning data processing unit 5A accepts, as input, a plurality of pieces of learning data for a respective plurality of tasks, and calculates, for the tasks, respective batch sizes whose ratio has a fixed value between the tasks. For example, when the batch size of learning data for the task A is B1, the batch size of learning data for the task B is B2, and the fixed value is R, the learning data processing unit 5A calculates B1 and B2 which meet B1/B2=R.
(90) The batch sampling unit 6A draws samples from the learning data for each task with a corresponding one of the batch sizes calculated by the learning data processing unit 5A.
(91) Note that because the number of iterations of a single learning loop for the task A differs from the number of iterations of a single learning loop for the task B, the task A and the task B require different loop processes from each other.
(92) In addition, as in the first embodiment, the functions of the learning data processing unit 5A, the batch sampling unit 6A, and the learning unit 7 are implemented by the processing circuit 100 which is dedicated hardware, or by the CPU 101 reading and executing programs stored in the memory 102.
(93) Next, operation will be described.
(95) The learning data processing unit 5A calculates a batch size B1 for the task A and a batch size B2 for the task B (step ST1c). Here, the learning data processing unit 5A calculates B1 and B2 which meet B1/B2=R.
(96) For example, when the data size of learning data for the task A is S1=50, the data size of learning data for the task B is S2=100, and the fixed ratio is R=0.5, since B1/B2=0.5, B1=2 and B2=4.
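The ratio-based calculation of this example can be sketched as follows (illustrative only; the function name and the choice of B2 as the given quantity are assumptions):

```python
def ratio_batch_sizes(b2, ratio):
    # Given the task-B batch size B2 and the fixed value R of the ratio
    # B1/B2, the task-A batch size is B1 = R * B2. Unlike in the first
    # embodiment, the iteration counts S1/B1 and S2/B2 may now differ.
    b1 = ratio * b2
    if b1 != int(b1) or b1 < 1:
        raise ValueError("R * B2 must be a positive integer batch size")
    return int(b1), b2
```

With R = 0.5 and B2 = 4, this yields B1 = 2, matching the example above.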
(97) Subsequently, the learning data processing unit 5A initializes a weight W1 to be updated with the learning data for the task A and a weight W2 to be updated with the learning data for the task B, and initializes an epoch 1 for the task A and an epoch 2 for the task B (step ST2c).
(98) The learning data processing unit 5A shuffles the samples of the learning data for the task A and shuffles the samples of the learning data for the task B (step ST3c).
(99) The batch sampling unit 6A draws samples with the batch size B1 from the learning data for the task A, out of the pieces of learning data whose samples have been shuffled by the learning data processing unit 5A (step ST4c). In addition, the batch sampling unit 6A draws samples with the batch size B2 from the learning data for the task B, out of the pieces of learning data whose samples have been shuffled by the learning data processing unit 5A (step ST5c).
(100) The learning unit 7 updates the weight W1 of a discriminator 4 using the samples sampled with the batch size B1 by the batch sampling unit 6A (step ST6c).
(101) Furthermore, the learning unit 7 updates the weight W2 of the discriminator 4 using the samples sampled with the batch size B2 by the batch sampling unit 6A (step ST7c).
(102) Then, the learning unit 7 determines whether all samples have been sampled from the learning data for the task A (step ST8c).
(103) If all samples have been sampled from the learning data for the task A (step ST8c; YES), the learning unit 7 notifies the learning data processing unit 5A of such a fact.
(104) When the learning data processing unit 5A receives the notification from the learning unit 7, the learning data processing unit 5A increments the epoch 1 by one (step ST9c). Then, the learning data processing unit 5A shuffles the samples of the learning data for the task A (step ST10c). Thereafter, the learning data processing unit 5A notifies the learning unit 7 of the completion of the process at step ST10c.
(105) When there are unprocessed samples in the learning data for the task A (step ST8c; NO) or step ST10c is completed, the learning unit 7 determines whether all samples have been sampled from the learning data for the task B (step ST11c).
(106) Here, if all samples have been sampled from the learning data for the task B (step ST11c; YES), the learning unit 7 notifies the learning data processing unit 5A of such a fact.
(107) When the learning data processing unit 5A receives the notification from the learning unit 7, the learning data processing unit 5A increments the epoch 2 by one (step ST12c). Then, the learning data processing unit 5A shuffles the samples of the learning data for the task B (step ST13c).
(108) Note that in the second embodiment the number of iterations of an epoch is determined with reference to the task A. Hence, when there are unprocessed samples in the learning data for the task B (step ST11c; NO) or the process at step ST13c is completed, the learning data processing unit 5A determines whether the epoch 1 is greater than or equal to N (step ST14c).
(109) If the epoch 1 is less than N (step ST14c; NO), the learning data processing unit 5A returns to step ST4c and repeats the aforementioned series of processes.
(110) In addition, if the epoch 1 is greater than or equal to the number of iterations N (step ST14c; YES), the processes end. Namely, the processes end depending not on the number of iterations of the epoch 2 for the task B but on the number of iterations of the epoch 1 for the task A.
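The flow of steps ST1c to ST14c can be sketched as follows. This is an illustration only; the function signature and the weight-update callbacks (stand-ins for the backpropagation performed by the learning unit 7) are assumptions. The task A drives termination, and each task's data is reshuffled independently whenever its samples are exhausted:

```python
import random

def train_second_embodiment(data_a, data_b, b1, b2, update_w1, update_w2, n_epochs):
    """Multi-task loop with a fixed batch-size ratio and independent epochs.

    Assumes b1 divides len(data_a) and b2 divides len(data_b), so that each
    position counter lands exactly on the end of its learning data.
    """
    random.shuffle(data_a)                          # ST3c
    random.shuffle(data_b)
    pos_a = pos_b = 0
    epoch1 = 0                                      # epoch counter for the task A
    while epoch1 < n_epochs:                        # ST14c: the task A decides the end
        update_w1(data_a[pos_a:pos_a + b1])         # ST4c, ST6c
        pos_a += b1
        update_w2(data_b[pos_b:pos_b + b2])         # ST5c, ST7c
        pos_b += b2
        if pos_a >= len(data_a):                    # ST8c to ST10c
            epoch1 += 1
            pos_a = 0
            random.shuffle(data_a)
        if pos_b >= len(data_b):                    # ST11c to ST13c
            pos_b = 0
            random.shuffle(data_b)
    return epoch1
```

With S1 = 50, B1 = 2, S2 = 100, B2 = 4, and N = 2, the loop performs 50 iterations (25 per task-A epoch); in this particular case both tasks happen to complete their epochs at the same time.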
(111) As described above, the learning device 3A according to the second embodiment includes the learning data processing unit 5A, the batch sampling unit 6A, and the learning unit 7. In this configuration, samples are drawn for the individual tasks with respective batch sizes whose ratio has a fixed value between the tasks. By this, even with the use of learning data including samples that are not tagged with all the labels for a plurality of tasks serving as learning targets, multi-task learning by stochastic gradient descent can be appropriately performed.
(112) In addition, a signal processing device 1 according to the second embodiment includes an input information processing unit 2, the learning device 3A, and the discriminator 4. Even with such a configuration, multi-task learning by stochastic gradient descent can be performed using learning data including samples that are not tagged with all the labels for a plurality of tasks serving as learning targets.
(113) Note that although the first embodiment and the second embodiment show the signal processing device 1 including the learning device 3 or the learning device 3A, the learning device 3 or the learning device 3A may be provided separately from the signal processing device 1. For example, a signal processing device 1A shown in
(114) Note that the discriminator 4 is trained by the learning device 3.
(116) Note that a free combination of the embodiments, a modification to any component of the embodiments, or an omission of any component of the embodiments is possible within the scope of the invention.
INDUSTRIAL APPLICABILITY
(117) Learning devices according to the invention can appropriately perform multi-task learning by stochastic gradient descent even with the use of learning data including samples that are not tagged with all the labels for a plurality of tasks serving as learning targets, and thus are suitable as learning devices for a discriminator that performs character recognition, etc.
REFERENCE SIGNS LIST
(118) 1 and 1A: Signal processing device, 2: Input information processing unit, 3 and 3A: Learning device, 4: Discriminator, 5 and 5A: Learning data processing unit, 6 and 6A: Batch sampling unit, 7: Learning unit, 100: Processing circuit, 101: CPU, and 102: Memory