METHOD FOR TRAINING AN IMAGE ANALYSIS NEURAL NETWORK, AND OBJECT RE-IDENTIFICATION METHOD IMPLEMENTING SUCH A NEURAL NETWORK
20230064615 · 2023-03-02
Assignee
Inventors
CPC classification
G06N3/082
PHYSICS
G06V20/52
PHYSICS
International classification
Abstract
A computer-implemented method for training a neural network providing a digital signature for each image given as input to the neural network. The method includes a first training phase of the neural network with a set of training images and a training algorithm aiming to minimize a first cost function. The method also includes a second training phase including at least one iteration of providing an image originating from said set of training images to said neural network in order to obtain a so-called real signature, generating at least one so-called artificial signature from said real signature, calculating an error based upon said real and artificial signatures, and updating at least one layer of said neural network, based upon said error, in order to minimize a second cost function. The disclosure also relates to a neural network trained by the method, and to a method for object re-identification in images implementing the neural network.
Claims
1. A computer-implemented method for training a neural network providing a digital signature for each image given as input to said neural network, said computer-implemented method comprising: a first training phase of said neural network with a set of training images and a training algorithm aiming to minimize a first cost function, and a second training phase comprising at least one iteration of providing an image originating from said set of training images to said neural network in order to obtain a real signature, generating at least one artificial signature from said real signature, calculating an error based upon said real signature and said at least one artificial signature, and updating at least one layer of said neural network, based upon said error, in order to minimize a second cost function.
2. The computer-implemented method according to claim 1, wherein the second training phase further comprises, prior to the updating, locking said at least one layer of the neural network so that said at least one layer that is locked is not updated during the updating.
3. The computer-implemented method according to claim 1, wherein the updating is performed using the same training algorithm as the first training phase.
4. The computer-implemented method according to claim 1, wherein the training algorithm uses a gradient backpropagation method.
5. The computer-implemented method according to claim 1, wherein said at least one artificial signature is generated from a normal distribution in which a mean matches the real signature and a variance is a predetermined value of 0.1.
6. The computer-implemented method according to claim 1, wherein, for said real signature, a number of artificial signatures generated is equal to five.
7. The computer-implemented method according to claim 1, wherein the first cost function is a double cost function taking into account a double error, said double error comprising a triplet loss or a contrastive loss, and an identification loss error or a classification error comprising a cross-entropy error.
8. The computer-implemented method according to claim 1, wherein the second cost function calculates an aggregate error taking into account, by addition: a real error, calculated based upon the real signature; and an artificial error, calculated by averaging all artificial errors obtained for all artificial signatures of said at least one artificial signature, each artificial error being calculated based upon a respective artificial signature.
9. A convolutional neural network trained by a computer-implemented training method for training said convolutional neural network providing a digital signature for each image given as input to said convolutional neural network, said computer-implemented training method comprising: a first training phase of said convolutional neural network with a set of training images and a training algorithm aiming to minimize a first cost function, and a second training phase comprising at least one iteration of providing an image originating from said set of training images to said convolutional neural network in order to obtain a real signature, generating at least one artificial signature from said real signature, calculating an error based upon said real signature and said at least one artificial signature, and updating at least one layer of said convolutional neural network, based upon said error, in order to minimize a second cost function.
10. A computer-implemented method for re-identifying objects in images implementing a neural network trained by the computer-implemented training method, the computer-implemented training method comprising: a first training phase of said neural network with a set of training images and a training algorithm aiming to minimize a first cost function, and a second training phase comprising at least one iteration of providing an image originating from said set of training images to said neural network in order to obtain a real signature, generating at least one artificial signature from said real signature, calculating an error based upon said real signature and said at least one artificial signature, and updating at least one layer of said neural network, based upon said error, in order to minimize a second cost function.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0056] Other benefits and features shall become evident upon examining the detailed description of one or more embodiments, and from the enclosed drawings in which:
[0057]
[0058]
[0059]
[0060]
DETAILED DESCRIPTION OF THE INVENTION
[0061] It is understood that the embodiments disclosed hereunder are by no means limiting. In particular, it is possible to imagine variants of the invention that comprise only a selection of the features disclosed hereinafter in isolation from the other features disclosed, if this selection of features is sufficient to confer a technical benefit or to differentiate the invention with respect to the prior state of the art. This selection comprises at least one preferably functional feature which is free of structural details, or only has a portion of the structural details if this portion alone is sufficient to confer a technical benefit or to differentiate the invention with respect to the prior state of the art.
[0062] In particular, all of the described variants and embodiments can be combined with each other if there is no technical obstacle to this combination.
[0063] In the figures and in the remainder of the description, the same reference has been used for the features that are common to several figures.
[0064]
[0065]
[0066] The two neural networks 102.sub.1 and 102.sub.2 share exactly the same parameters. The updates of the parameters are synchronized across the two networks 102.sub.1 and 102.sub.2, that is to say that when the parameters of one of the networks 102.sub.1 and 102.sub.2 are updated, those of the other one of the networks 102.sub.1 and 102.sub.2 are also updated in the same way. Thus, at each time t, the values of the parameters of the networks 102.sub.1 and 102.sub.2 are exactly the same.
[0067] Each of the networks 102.sub.1 and 102.sub.2 takes an image as input and provides a digital signature for this image as output. A comparator 104 takes the signatures provided by each of the networks 102.sub.1 and 102.sub.2 as input. The comparator 104 is configured to determine a distance, for example the cosine or Euclidean distance, or a similarity, for example the cosine or Euclidean similarity, between the signatures provided by the neural networks 102.sub.1 and 102.sub.2.
[0068] In at least one embodiment, the neural network 102.sub.1 produces a signature S.sub.i for the observation I.sub.i and the neural network 102.sub.2 produces a signature S.sub.j for the observation I.sub.j. The comparator 104 determines the standardized cosine distance, denoted d(S.sub.i,S.sub.j), between the two signatures S.sub.i and S.sub.j. This distance d(S.sub.i,S.sub.j) should be minimized if the two signatures belong to the same entity, and maximized otherwise.
[0069] A cost function can then be defined, for example as a sum of all the distances obtained for all the training images. The algorithm for training a neural network aims to minimize the cost function thus defined.
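The distance and the summed cost described in paragraphs [0067]-[0069] can be sketched as follows. This is a minimal illustration only: the function names, the exact standardization of the cosine distance, and the way different-entity pairs enter the sum are assumptions, not details taken from the patent.

```python
import numpy as np

def cosine_distance(s_i, s_j):
    # Standardized cosine distance d(S_i, S_j) in [0, 1]:
    # 0 for identical directions, 1 for opposite directions.
    sim = np.dot(s_i, s_j) / (np.linalg.norm(s_i) * np.linalg.norm(s_j))
    return (1.0 - sim) / 2.0

def pairwise_cost(pairs):
    # Sum over training pairs: the distance itself for same-entity
    # pairs (to be minimized) and (1 - distance) for different-entity
    # pairs (so that maximizing their distance also lowers the cost).
    total = 0.0
    for s_i, s_j, same_entity in pairs:
        d = cosine_distance(s_i, s_j)
        total += d if same_entity else (1.0 - d)
    return total
```

Minimizing `pairwise_cost` over all training pairs thus pulls matching signatures together and pushes non-matching ones apart, as the training algorithm of paragraph [0069] requires.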
[0070]
[0071]
[0072] Each of the networks 202.sub.1-202.sub.3 takes an image as input and provides a digital signature for this image as output.
[0073] A comparator 204 takes as input the signatures provided by each of the networks 202.sub.1-202.sub.3 and is configured to compare these signatures to one another, for example by calculating the distance between these signatures taken two by two.
[0074] In at least one embodiment, the neural network 202.sub.1 produces a signature S.sub.i for an image I.sub.i, the neural network 202.sub.2 produces a signature S.sub.j for an image I.sub.j, identical to the image I.sub.i, and the neural network 202.sub.3 produces a signature S.sub.k for an image I.sub.k different from the image I.sub.i. The comparator 204 determines an error based upon the digital signatures S.sub.i, S.sub.j and S.sub.k, with the digital signature S.sub.i as the anchor input, the signature S.sub.j as the positive input and the signature S.sub.k as the negative input. More information on the Triplet Loss can be found on the page: https://fr.wikipedia.org/wiki/Fonction_de_co%C3%BBt_par_triplet
[0075] A cost function can then be defined, for example as being a sum of all the triplet losses obtained for all the training images. The algorithm for training a neural network aims to minimize the cost function thus defined.
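The anchor/positive/negative comparison of paragraph [0074] can be sketched as a standard triplet loss. The Euclidean distance and the margin value below are illustrative assumptions; the patent does not fix either.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Penalizes the anchor being closer to the negative signature than
    # to the positive one by less than `margin`; zero once the positive
    # is sufficiently closer than the negative.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)
```

Summing this quantity over all training triplets gives one possible form of the cost function of paragraph [0075].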
[0076]
[0077] The method 200 can be used to train a neural network used to generate a digital signature of an image given as input to said neural network.
[0078] The neural network can be a convolutional neural network, for example a 50-layer CNN Resnet.
[0079] The method 200 depicted in
[0080] The training set can be a set of images from the academic world. For example, the training set can be any one, and preferably any combination, of the following image sets: [0081] CUHK01, “Human Reidentification with Transferred Metric Learning”, by Li Wei, Zhao Rui and Wang Xiaogang, ACCV, 2012 [0082] CUHK03, “DeepReID: Deep Filter Pairing Neural Network for Person Re-identification” by Li Wei, Zhao Rui, Xiao Tong and Wang Xiaogang, CVPR, 2014 [0083] Market1501, “Improving Person Re-identification by Attribute and Identity Learning”, by Lin Yutian, Zheng Liang, Zheng Zhedong,
[0084] In at least one embodiment, the first training phase comprises several iterations of the following steps.
[0085] During a step 204, an image is provided to the neural network. The latter then provides a digital signature for this image.
[0086] During a step 206, an error is calculated for this image, based upon the signature obtained during step 204. The error calculated during step 206 may be the “Contrastive Loss” described with reference to
[0087] Preferably, in one or more embodiments, the error calculated during step 206 may be a double error combining: [0088] a “Triplet loss” or a “Contrastive loss” or even a “Circle loss” (https://arxiv.org/pdf/2002.10857.pdf); and [0089] an “identification loss” error, for example a “cross-entropy” error (https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy).
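The double error of paragraphs [0087]-[0089] can be sketched as the sum of a metric term and an identification term. The margin, the equal weighting of the two terms, and the use of the Euclidean distance are assumptions for illustration; the patent leaves these choices open.

```python
import numpy as np

def cross_entropy(logits, true_class):
    # Categorical cross-entropy on an identity-classification head.
    logits = logits - logits.max()  # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[true_class])

def double_error(anchor, positive, negative, logits, true_class,
                 margin=0.3, weight=1.0):
    # Double error: triplet (metric) term plus weighted
    # identification (cross-entropy) term.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    triplet = max(d_pos - d_neg + margin, 0.0)
    return triplet + weight * cross_entropy(logits, true_class)
```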
[0090] During a step 208, the parameters of at least one layer of the neural network are updated in an attempt to minimize a first cost function taking into account all the errors calculated for all the training images. The parameters of the neural network are updated by a training algorithm, such as for example a gradient backpropagation algorithm.
[0091] Steps 204-208 are repeated as many times as desired until the neural network is sufficiently trained, that is to say until the first cost function is minimized, more particularly until the first cost function returns a value that is less than or equal to a predetermined threshold.
[0092] For example, in at least one embodiment, the first training phase 202 can be stopped, and the neural network can be considered sufficiently trained, when the first cost function no longer decreases during ten iterations of said first phase 202.
[0093] The method 200 depicted in
[0094] This second training phase 210 trains the neural network again, using, for example, the same set of training images as the one used during the first phase, but using a different cost function.
[0095] During a step 212 of this second training phase 210, a certain number of layers of the neural network are locked so that these layers will not be updated during the second training phase 210. For example, the number of locked layers can be equal to 30.
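The layer locking of step 212 can be sketched as follows, with each layer represented as a plain dictionary carrying a `trainable` flag. This representation is an assumption for illustration; in practice the flag would be set on the layers of the framework actually used.

```python
def lock_layers(layers, n_locked=30):
    # Step 212: freezes the first n_locked layers so that the update
    # of step 220 leaves their parameters untouched (the example in
    # the description locks 30 layers).
    for i, layer in enumerate(layers):
        layer["trainable"] = i >= n_locked
    return layers
```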
[0096] During a step 214, an image is provided to the neural network. The latter then provides a digital signature for this image. This signature, referred to as real signature and denoted S.sub.r hereinafter, is supposed to be a signature that perfectly matches this image since the neural network has already been trained during the first training phase 202.
[0097] During a step 216, N signature(s), referred to as artificial signature(s), are generated from the real signature provided in step 214 by the neural network. According to at least one embodiment, the artificial signatures are generated according to a normal distribution model of mean S.sub.r and variance V. According to a non-limiting exemplary embodiment, N=5 and V=0.1.
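The generation of step 216 can be sketched directly from the description: N draws from a normal distribution of mean S.sub.r and variance V, with N=5 and V=0.1 as in the non-limiting example. The use of an isotropic (per-component) normal distribution is an assumption.

```python
import numpy as np

def generate_artificial_signatures(s_r, n=5, variance=0.1, seed=None):
    # Step 216: draws n artificial signatures from a normal
    # distribution whose mean is the real signature s_r and whose
    # per-component variance is `variance`.
    rng = np.random.default_rng(seed)
    std = np.sqrt(variance)
    return [s_r + rng.normal(0.0, std, size=s_r.shape) for _ in range(n)]
```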
[0098] During a step 218, an error is calculated, based upon the real signature S.sub.r on the one hand, and the at least one artificial signature on the other hand.
[0099] The error calculated may be the “Contrastive Loss” described with reference to
[0100] According to at least one embodiment, an artificial error is calculated for each artificial signature. Then, an artificial error average is calculated by averaging all the artificial errors obtained for all the artificial signatures. Finally, a total error is calculated by adding the real error obtained for the real signature, and the average of the artificial errors.
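The aggregation of paragraph [0100] reduces to a simple formula: the total error is the real error plus the average of the artificial errors.

```python
def total_error(real_error, artificial_errors):
    # Paragraph [0100]: total error = real error + mean of the
    # artificial errors computed for each artificial signature.
    return real_error + sum(artificial_errors) / len(artificial_errors)
```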
[0101] During a step 220, the parameters of at least one layer of the neural network are updated in an attempt to minimize a second cost function that takes into account all the errors of all the training images, for example by addition. The parameters of the neural network are updated by a second training algorithm, such as for example a gradient backpropagation algorithm. During this update, the parameters of the layers locked in step 212 are not modified.
[0102] Steps 214-220 are repeated as many times as desired until the neural network is sufficiently trained, that is to say until the second cost function is minimized, in other words, when the second cost function returns a value that is less than or equal to a predetermined threshold.
[0103] For example, in at least one embodiment, the second training phase 210 may be stopped, and the neural network may be considered sufficiently trained, when the second cost function no longer decreases during ten iterations of said second training phase 210.
[0104]
[0105] The method 300 of
[0106] The method 300 uses a neural network trained according to one or more embodiments of the invention, such as for example a neural network trained by the method 200 of
[0107] The method 300 comprises a step 302 of providing a first image of a first object to the trained neural network. This step 302 provides a signature S.sub.1.
[0108] The method 300 comprises a step 304 of providing a second image of a second object to the trained neural network. This step provides a signature S.sub.2.
[0109] The neural network used during step 304 can be the same network as that used during step 302. In this case, steps 302 and 304 are carried out in turn.
[0110] Preferably, in at least one embodiment, the neural network used during step 304 is another neural network, identical to the neural network used during step 302. In other words, steps 302 and 304 use two Siamese neural networks. In this case, steps 302 and 304 can be carried out in turn, or, preferably, at the same time.
[0111] During a step 306, the signatures S.sub.1 and S.sub.2 are compared. For example, a distance d, for example the cosine or Euclidean distance, is calculated between the signatures S.sub.1 and S.sub.2.
[0112] This distance d is compared with a predetermined threshold value during a step 308. If the distance is less than the threshold value, then this indicates that the second object is the same as the first object. Otherwise, this indicates that the first object and the second object are different objects.
[0113] Thus, by repeating steps 304-308 on different images, it is possible to identify and track an object that appears on a first image.
[0114] Of course, the invention is not limited to the examples and embodiments disclosed above.