NEURAL NETWORK MODEL TRAINING METHOD AND APPARATUS FOR COMPLEX CHARACTERISTIC CLASSIFICATION AND COMMON LOCALIZATION
20220406035 · 2022-12-22
Inventors
CPC classification
G06V10/778
PHYSICS
International classification
Abstract
A neural network model training method and an apparatus for complex characteristic classification and common localization are proposed. In the method, a neural network model includes: a convolution layer for performing a convolution operation on an input image by using a convolution filter; a pooling layer for performing pooling on an output of the convolution layer; and class-specific fully connected layers respectively corresponding to classes into which complex characteristics are classified and outputting values obtained by multiplying an output of the pooling layer by class-specific weights (w.sub.fc(T.sub.t)). The method includes: (a) inputting the input image to the convolution layer; (b) calculating class-specific observation maps for respective classes on the basis of the output of the convolution layer; (c) calculating an observation loss (L.sub.obs) common to the classes on the basis of the class-specific observation maps; and (d) back-propagating a loss based on the observation loss to the neural network model.
Claims
1. A neural network model training method for complex characteristic classification and common localization of an image, wherein a neural network model comprises: a convolution layer configured to perform a convolution operation on an input image by using a convolution filter; a pooling layer configured to perform pooling on an output of the convolution layer; and a plurality of class-specific fully connected layers configured to respectively correspond to a plurality of classes into which complex characteristics are classified and output values obtained by multiplying an output of the pooling layer by class-specific weights (w.sub.fc(T.sub.t)), wherein different criteria distinguish each of the plurality of classes, each of the plurality of classes is classified into a plurality of class-specific characteristics, and the neural network model is capable of providing class-specific characteristic probabilities for the class-specific characteristics of each of the plurality of classes according to an output of each class-specific fully connected layer, wherein the neural network model training method comprises: (a) inputting the input image to the convolution layer; (b) calculating class-specific observation maps for the plurality of respective classes on the basis of the output of the convolution layer; (c) calculating an observation loss (L.sub.obs) common to the plurality of classes on the basis of the class-specific observation maps; and (d) back-propagating a loss based on the observation loss (L.sub.obs) to the neural network model, wherein step (c) comprises: (c-1) generating a common observation map common to the plurality of classes on the basis of the class-specific observation maps; and (c-2) calculating the observation loss (L.sub.obs) by using the common observation map and a target region of the input image, and wherein each step is performed by a computer processor.
2. The method of claim 1, wherein the common observation map is an average value of the class-specific observation maps.
3. The method of claim 1, wherein the observation loss is calculated by calculating a cosine distance for concatenated values obtained by respectively projecting the common observation map and the target region of the input image in horizontal and vertical directions.
4. The method of claim 1, wherein, in step (b), the class-specific observation maps are calculated by the following equation:
Σ.sub.k=1.sup.C w.sub.fc.sup.k(T.sub.t)×o.sub.conv.sup.k (where T.sub.t denotes the classes, w.sub.fc(T.sub.t) denotes the weights of the class-specific fully connected layers, o.sub.conv denotes the output of the convolution layer, and C denotes the number of channels).
5. The method of claim 1, wherein the neural network model further includes: a plurality of class-specific classifiers configured to respectively correspond to the plurality of class-specific fully connected layers, and calculate the class-specific characteristic probabilities according to the outputs of the class-specific fully connected layers.
6. The method of claim 5, wherein step (d) comprises: (d-1) calculating class-specific classification losses (L.sub.cls(T.sub.t)) on the basis of an output result of each of the plurality of class-specific classifiers; (d-2) calculating class-specific characteristic losses (L(T.sub.t)) on the basis of the observation loss (L.sub.obs) and the class-specific classification losses (L.sub.cls(T.sub.t)); and (d-3) back-propagating, for each class, the class-specific characteristic losses (L(T.sub.t)) to the plurality of class-specific classifiers and the plurality of class-specific fully connected layers.
7. The method of claim 6, wherein, in step (d-2), the class-specific characteristic losses (L(T.sub.t)) are calculated by the following equation:
L(T.sub.t)=(1−α)L.sub.cls(T.sub.t)+αL.sub.obs (where 0≤α≤1).
8. The method of claim 6, wherein step (d) further comprises: (d-4) calculating a multi-label classification loss (L(T)) on the basis of the plurality of class-specific classification losses (L.sub.cls(T.sub.t)) and the observation loss (L.sub.obs); and (d-5) back-propagating the multi-label classification loss (L(T)) to the plurality of class-specific classifiers, the plurality of class-specific fully connected layers, the pooling layer, and the convolution layer.
9. The method of claim 1, wherein the pooling layer is a global average pooling layer.
10. A neural network model training apparatus for complex characteristic classification and common localization of an image and comprising a memory in which a neural network model is stored and a processor, wherein the neural network model comprises: a convolution layer configured to perform a convolution operation on an input image by using a convolution filter; a pooling layer configured to perform pooling on an output of the convolution layer; and a plurality of class-specific fully connected layers configured to respectively correspond to a plurality of classes into which complex characteristics are classified and output values obtained by multiplying an output of the pooling layer by class-specific weights (w.sub.fc(T.sub.t)), wherein different criteria distinguish each of the plurality of classes, each of the plurality of classes is classified into a plurality of class-specific characteristics, and the neural network model is capable of providing class-specific characteristic probabilities for the class-specific characteristics of each of the plurality of classes according to an output of each class-specific fully connected layer, and wherein the apparatus comprises the processor configured to input the input image to the convolution layer, calculate a plurality of class-specific observation maps for the plurality of respective classes on the basis of the output of the convolution layer, generate a common observation map common to the plurality of classes on the basis of the class-specific observation maps, calculate an observation loss (L.sub.obs) by using the common observation map and a target region of the input image, and back-propagate a loss based on the observation loss (L.sub.obs) to the neural network model.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE DRAWINGS
[0033] The terms or words used in this description and claims should be interpreted as meanings and concepts corresponding to the technical spirit of the present disclosure based on the principle that inventors may properly define the concept of a term in order to best describe their disclosure.
[0034] Throughout the description of the present disclosure, when a part is said to “include” or “comprise” a certain component, this means that the part may further include other components rather than excluding them, unless the context clearly indicates otherwise. In addition, when a first component is described as being “connected to”, “transmitted to”, “sent to”, “received from”, or “transferred to” a second component, this includes not only the case in which the first component is directly connected, transmitted, sent, received, or transferred to or from the second component, but also the case in which it is indirectly so connected with a third component interposed therebetween. In addition, the terms “~part”, “~unit”, “module”, and the like mean a unit for processing at least one function or operation, and may be implemented by hardware, software, or a combination thereof.
[0035] Hereinafter, a specific exemplary embodiment of the present disclosure will be described with reference to the drawings.
[0037] In
[0038] Referring to
[0039] Referring to
[0040] In the neural network model training method for complex characteristic classification and common localization of an image according to the exemplary embodiment of the present disclosure, the neural network model includes: a convolution layer configured to perform a convolution operation on an input image by using a convolution filter; a pooling layer configured to perform pooling on an output of the convolution layer; and a plurality of class-specific fully connected layers configured to respectively correspond to a plurality of classes into which complex characteristics are classified, and to output values obtained by multiplying an output of the pooling layer by class-specific weights w.sub.fc(T.sub.t).
[0041] The method includes: (a) inputting an input image to the convolution layer; (b) calculating class-specific observation maps on the basis of an output of the convolution layer; (c) calculating an observation loss L.sub.obs that is common to the plurality of classes on the basis of the class-specific observation maps; and (d) back-propagating a loss based on the observation loss L.sub.obs to the neural network model.
[0043] Referring to
[0044] The neural network model 1 may further include: a plurality of class-specific classifiers 40 configured to respectively correspond to the plurality of class-specific fully connected layers 30, and calculate class-specific characteristic probabilities according to outputs of the class-specific fully connected layers 30.
[0045] The convolution layer 10 performs a convolution operation on an input image by using a plurality of convolution filters, so as to extract a feature map. As shown in
[0046] The pooling layer 20 is positioned between the convolution layer 10 and the fully connected layers 30, and serves to reduce the size of the feature map o.sub.conv, so as to reduce the operations required in the fully connected layers 30, which will be described later, and to prevent overfitting. The pooling layer 20 may perform global average pooling, which outputs an average value for each channel of the feature map o.sub.conv.
[0047] Each class-specific fully connected layer 30 outputs values obtained by multiplying the output of the pooling layer 20 by the class-specific weights w.sub.fc(T.sub.1), w.sub.fc(T.sub.2), . . . w.sub.fc(T.sub.t), . . . w.sub.fc(T.sub.NT). In this case, each of the class-specific weights w.sub.fc(T.sub.1), w.sub.fc(T.sub.2), . . . w.sub.fc(T.sub.t), . . . w.sub.fc(T.sub.NT) may consist of a plurality of values corresponding to the number of channels.
[0048] The class-specific classifiers 40 respectively correspond to the class-specific fully connected layers 30, and calculate the class-specific characteristic probabilities according to the outputs of the class-specific fully connected layers 30. Referring to
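Under the structure described above (convolution backbone, global average pooling, per-class fully connected heads, per-class classifiers), the forward path can be sketched in NumPy. The shapes, the `n_chars` mapping, and the softmax classifier are illustrative assumptions rather than the patent's fixed configuration.

```python
import numpy as np

# Illustrative shapes (assumptions): C conv channels on an H x W feature
# map, and two classes T1/T2 with their own numbers of characteristics.
C, H, W = 8, 4, 4
n_chars = {"T1": 3, "T2": 5}

rng = np.random.default_rng(0)
o_conv = rng.random((C, H, W))   # output of the convolution layer
w_fc = {t: rng.random((C, n)) for t, n in n_chars.items()}  # class-specific weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Pooling layer: global average pooling, one value per channel.
pooled = o_conv.mean(axis=(1, 2))   # shape (C,)

# Each class-specific fully connected layer multiplies the pooled vector
# by its own weights; a per-class classifier yields characteristic
# probabilities P_1(T_t) ... P_n(T_t).
probs = {t: softmax(pooled @ w) for t, w in w_fc.items()}
```

Each head is independent, which is what later allows the class-specific losses to be back-propagated per head.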
[0049] Next, a neural network model training method according to the exemplary embodiment of the present disclosure will be described with reference to
[0051] Referring to
[0052] Next, in step S110, a plurality of class-specific observation maps is calculated on the basis of outputs of the convolution layer.
[0053] Each observation map, also called a class activation map (CAM), indicates how much each part of the input image affects a classification result. In the present exemplary embodiment, each observation map is calculated for each class.
Σ.sub.k=1.sup.C w.sub.fc.sup.k(T.sub.t)×o.sub.conv.sup.k,
[0054] where T.sub.t denotes the classes, w.sub.fc(T.sub.t) denotes the weights of the class-specific fully connected layers, o.sub.conv denotes the output of the convolution layer, and C denotes the number of channels.
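The channel-weighted sum in the equation above can be sketched as follows; representing the fully connected weights for class T.sub.t as a single C-dimensional vector `w_t` is an assumption made for illustration.

```python
import numpy as np

C, H, W = 8, 4, 4
rng = np.random.default_rng(1)
o_conv = rng.random((C, H, W))   # feature map o_conv from the convolution layer
w_t = rng.random(C)              # assumed per-channel weights w_fc^k(T_t)

# Observation map for class T_t: sum over channels k of w_fc^k(T_t) x o_conv^k.
obs_map = np.tensordot(w_t, o_conv, axes=([0], [0]))   # shape (H, W)

# Term-by-term check against the explicit summation in the equation.
explicit = sum(w_t[k] * o_conv[k] for k in range(C))
```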
[0055] Next, in step S120, an observation loss common to the plurality of classes is calculated on the basis of the plurality of class-specific observation maps.
[0056] According to the exemplary embodiment, step S120 may include: step S121 of generating a common observation map that is common to the plurality of classes on the basis of the plurality of class-specific observation maps; and step S122 of calculating an observation loss by using the common observation map and the target region of the input image. The observation loss may be calculated on the basis of differences between the target region of the input image and the common observation map. The common observation map may be an average value of the class-specific observation maps, and may be calculated by the following equation.
(1/N.sub.T)Σ.sub.t=1.sup.N.sup.TΣ.sub.k=1.sup.C w.sub.fc.sup.k(T.sub.t)×o.sub.conv.sup.k
[0057] Here, Σ.sub.k=1.sup.C w.sub.fc.sup.k(T.sub.t)×o.sub.conv.sup.k denotes the class-specific observation maps described above, and N.sub.T denotes the number of classes.
[0058] However, this is only an example, and the ratio of class-specific observation maps may be allowed to be different, or the common observation map may be calculated on the basis of the observation maps of some classes among all classes.
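A minimal sketch of the common observation map as the per-pixel average of the class-specific maps, plus a weighted variant for the unequal-ratio case mentioned above (the ratio values are hypothetical):

```python
import numpy as np

N_T, H, W = 3, 4, 4
rng = np.random.default_rng(2)
class_maps = rng.random((N_T, H, W))   # one observation map per class

# Common observation map: per-pixel average over the N_T classes.
common_map = class_maps.mean(axis=0)

# Unequal ratios (illustrative weights summing to 1).
ratios = np.array([0.5, 0.3, 0.2])
weighted_map = np.tensordot(ratios, class_maps, axes=1)
```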
[0059] The observation loss may be calculated by using the calculated common observation map and the target region of the input image.
[0060] For example, the observation loss may be calculated by the following equation:
L.sub.obs=1−cos([M.sub.H.sup.i, M.sub.V.sup.i], [{circumflex over (M)}.sub.H.sup.i, {circumflex over (M)}.sub.V.sup.i])
[0061] Here, ε denotes a lower bound value used in calculating the cosine similarity, and M.sub.H=Σ.sub.h=0.sup.HM(h, w) and M.sub.V=Σ.sub.w=0.sup.WM(h, w) denote projections of a map M in the horizontal and vertical directions, respectively.
[0063] M.sup.i denotes the target region of an input image x.sub.i, and {circumflex over (M)}.sup.i denotes the common observation map of the input image x.sub.i.
[0066] Although only the target region of the input image has been described as an example in
[0068] According to the exemplary embodiment, the observation loss may be obtained by calculating a cosine distance for the concatenated values obtained by respectively projecting the common observation map and the target region of the input image in the horizontal and vertical directions.
[0069] That is, the observation loss may be calculated as L.sub.obs=1−cos([M.sub.H.sup.i, M.sub.V.sup.i], [{circumflex over (M)}.sub.H.sup.i, {circumflex over (M)}.sub.V.sup.i]).
[0070] In a case of using the above equation, it is possible to reflect an overall distribution rather than the accuracy in units of pixels.
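The projection-and-cosine-distance formulation described above can be sketched as follows; the exact placement of the lower bound ε is an assumption.

```python
import numpy as np

def observation_loss(target, common, eps=1e-8):
    """Cosine distance between concatenated horizontal/vertical projections.

    target: target region M^i of the input image, shape (H, W)
    common: common observation map M-hat^i, shape (H, W)
    eps:    assumed lower bound guarding against zero norms
    """
    def proj(m):
        horiz = m.sum(axis=0)   # project onto the horizontal axis, length W
        vert = m.sum(axis=1)    # project onto the vertical axis, length H
        return np.concatenate([horiz, vert])

    a, b = proj(target), proj(common)
    cos_sim = (a @ b) / max(np.linalg.norm(a) * np.linalg.norm(b), eps)
    return 1.0 - cos_sim
```

Because only projected distributions are compared, the loss reflects the overall spatial distribution rather than pixel-wise accuracy, matching the remark above.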
[0071] Next, referring back to
[0072] Steps S100 to S130 described above may be performed on a plurality of input images, and accordingly, the neural network model may be trained.
[0073] Although an observation map is generated for each class in the present exemplary embodiment, the observation loss is applied equally to the plurality of classes. Accordingly, the observation maps for the plurality of classes converge toward one another, that is, an effect of common localization may be acquired.
[0075] First, in step S200, an image is input to a convolution layer of the neural network model.
[0076] Next, in steps S210 and S220, class-specific classification losses and an observation loss are calculated from the neural network model.
[0077] The class-specific classification losses are values indicating how accurately a characteristic belonging to each class is predicted, and are calculated for each class. The class-specific classification losses may be calculated on the basis of each output result of the plurality of class-specific classifiers. For example, the class-specific classification losses may be calculated from differences between the class-specific characteristics of the input image and the class-specific characteristic probabilities (refer to P.sub.1(T.sub.t), P.sub.2(T.sub.t), . . . P.sub.n(T.sub.t) of
[0078] The class-specific classification losses may be calculated by the following equation:
L.sub.cls(T.sub.t)=−(1/N.sub.X)Σ.sub.i=1.sup.N.sup.XΣ.sub.c=1.sup.C.sup.Tt y.sub.i.sup.c log p.sup.c(x.sub.i)
[0079] Here, p.sup.c(x.sub.i) denotes the output probability of the class-specific characteristic c of the class T.sub.t for the input image x.sub.i, and y.sub.i.sup.c denotes the corresponding ground-truth label.
[0080] N.sub.X denotes the number of training images, and C.sub.Tt denotes the number of class-specific characteristics belonging to the class T.sub.t.
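Assuming a standard categorical cross-entropy over the C.sub.Tt characteristics of one class (the patent's exact formulation may differ), the class-specific classification loss can be sketched as:

```python
import numpy as np

def class_specific_cls_loss(probs, labels):
    """Cross-entropy averaged over N_X training images for one class T_t.

    probs:  (N_X, C_Tt) characteristic probabilities p^c(x_i)
    labels: (N_X,) index of the true characteristic for each image
    """
    n = probs.shape[0]
    picked = probs[np.arange(n), labels]   # p^{true c}(x_i) per image
    return float(-np.log(picked + 1e-12).mean())

# Two images, three characteristics in class T_t (illustrative values).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = class_specific_cls_loss(probs, labels)
```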
[0081] Since the observation loss is the same as described above, a redundant description is omitted.
[0082] Next, in step S230, class-specific characteristic losses are calculated on the basis of the class-specific classification losses and the observation loss.
[0083] The class-specific characteristic losses are values that reflect the observation loss and the class-specific classification losses. As described above, although the observation loss is the same for the plurality of classes, since the class-specific classification losses are different for each class, the class-specific characteristic losses may have a different value for each class.
[0084] The class-specific characteristic losses may be calculated by the following equation.
L(T.sub.t)=(1−α)L.sub.cls(T.sub.t)+αL.sub.obs
[0085] Here, L.sub.cls(T.sub.t) denotes the class-specific classification losses, L.sub.obs denotes the observation loss, and the condition 0≤α≤1 is satisfied.
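The blend above is a single convex combination of the two losses; a direct sketch (the helper name is hypothetical):

```python
def characteristic_loss(l_cls_t, l_obs, alpha):
    """L(T_t) = (1 - alpha) * L_cls(T_t) + alpha * L_obs, with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha <= 1")
    return (1.0 - alpha) * l_cls_t + alpha * l_obs

# alpha trades classification accuracy against common localization:
# alpha=0 uses only the class-specific classification loss,
# alpha=1 uses only the shared observation loss.
```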
[0086] Next, in step S240, the class-specific characteristic losses are back-propagated for each class to the plurality of class-specific classifiers and the plurality of class-specific fully connected layers.
[0087] Referring to the neural network model of
[0088] Next, in step S250, a multi-label classification loss is calculated.
[0089] The multi-label classification loss, unlike the class-specific classification losses of the previous step (refer to S210), is a single value that reflects the class-specific classification losses calculated for all of the classes. The multi-label classification loss may be calculated on the basis of the plurality of class-specific classification losses and the observation loss. The multi-label classification loss is applied equally to the plurality of classes.
[0090] The class-specific weights of the plurality of class-specific fully connected layers 30 are adjusted by the back-propagation of step S240, and accordingly, the plurality of class-specific classification losses and the observation loss may also be changed. The multi-label classification loss may be calculated on the basis of the plurality of class-specific classification losses and the observation loss, which have been changed.
[0091] The multi-label classification loss may be calculated by the following equation:
L(T)=(1−α)(−(1/N.sub.X)Σ.sub.i=1.sup.N.sup.XΣ.sub.c=1.sup.C.sup.T y.sub.i.sup.c log P.sup.c(x.sub.i))+αL.sub.obs
[0092] Here, P.sup.c(x.sub.i) denotes the output probability of the class-specific characteristic c over the plurality of all classes for the input image x.sub.i, and y.sub.i.sup.c denotes the corresponding ground-truth label.
[0094] N.sub.X denotes the number of training images, and C.sub.T denotes the number of class-specific characteristics for the plurality of all classes.
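One plausible sketch of this combination, assuming the multi-label classification loss aggregates the class-specific classification losses and reuses the same α trade-off with the observation loss; both assumptions are illustrative, not the patent's exact formula.

```python
def multi_label_loss(class_losses, l_obs, alpha):
    """Aggregate per-class classification losses, then blend in L_obs.

    class_losses: dict mapping class name T_t -> L_cls(T_t)
    (assumed aggregation: plain sum over classes)
    """
    return (1.0 - alpha) * sum(class_losses.values()) + alpha * l_obs

# Illustrative values for two classes.
total = multi_label_loss({"T1": 0.30, "T2": 0.50}, l_obs=0.20, alpha=0.5)
```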
[0095] Next, in step S260, the multi-label classification loss is back-propagated throughout the entire neural network model.
[0096] Referring to
[0097] Steps S200 to S260 described above may be performed on a plurality of input images, and accordingly, the neural network model may be trained.
[0098] Next, a data flow for training the neural network model according to the exemplary embodiment of the present disclosure will be described with reference to
[0100] Referring to
[0101] As described above by referring to
[0102] As shown in
[0103] In addition, referring to
[0104] Next, referring to
[0105] Meanwhile, in step S320, class-specific observation maps are calculated on the basis of the output O.sub.conv of the convolution layer 10 and weights w.sub.fc(T.sub.t) of the class-specific fully connected layers 30.
[0106] Next, referring to
[0107] Next, referring to
[0108] Next, referring to
[0109] Next, referring to
[0110] Accordingly, the class-specific weights w.sub.fc(T.sub.t) of the class-specific fully connected layers 30 are adjusted. The processing of the class-specific fully connected layers 30, the processing of the class-specific classifiers 40, and the calculating of the class-specific classification losses (i.e. step S310) are performed again, whereby the class-specific classification losses L.sub.cls(T.sub.t) are adjusted. The calculating of the class-specific observation maps (i.e. step S320), the calculating of the common observation map (i.e. step S330), and the calculating of the observation loss (i.e. step S340) are performed again, whereby the observation loss L.sub.obs is adjusted.
[0111] Next, referring to
[0112] Next, referring to
[0114] The neural network model training apparatus 1000 includes: a memory 1100 in which a neural network model is stored; and a processor 1200.
[0115] The neural network model stored in the memory 1100 has already been described with reference to
[0116] The processor 1200 performs the neural network model training method described with reference to
[0117] As above, the present disclosure has been described in detail through the preferred exemplary embodiments, but the present disclosure is not limited thereto, and it is apparent to those skilled in the art that various changes and applications may be made within the scope of the present disclosure without departing from the technical spirit of the present disclosure. Accordingly, the true protection scope of the present disclosure should be construed by the following claims, and all technical ideas within the scope equivalent thereto should be construed as being included in the scope of the present disclosure.