Method for person re-identification based on deep model with multi-loss fusion training strategy
11195051 · 2021-12-07
Assignee
- Tongji University (Shanghai, CN)
- Hefei University of Technology (Hefei, CN)
- Beijing E-Hualu Info Technology Co., Ltd. (Beijing, CN)
Inventors
- Deshuang Huang (Shanghai, CN)
- Sijia Zheng (Shanghai, CN)
- Zhongqiu Zhao (Hefei, CN)
- Xinyong Zhao (Beijing, CN)
- Jianhong Sun (Beijing, CN)
- Yang Zhao (Beijing, CN)
- Yongjun Lin (Beijing, CN)
CPC classification
- G06F18/214 (PHYSICS)
- G06V10/454 (PHYSICS)
- G06V10/7715 (PHYSICS)
- G06V20/52 (PHYSICS)
- G06V40/173 (PHYSICS)
- G06V40/103 (PHYSICS)
Abstract
The invention relates to a method for person re-identification based on a deep model with a multi-loss fusion training strategy. The method uses deep learning technology: preprocessing operations such as flipping, clipping, random erasing and style transfer are first performed, then feature extraction is performed through a backbone network model, and the network is jointly trained by fusing a plurality of loss functions. Compared with other deep learning-based person re-identification algorithms, the present invention greatly improves person re-identification performance by adopting a plurality of preprocessing modes, the fusion of three loss functions and an effective training strategy.
Claims
1. A method for person re-identification based on a deep model with multi-loss fusion training strategy, comprising the following steps: 1): Acquiring an original image data set, dividing the original image data set into a training set and a test set, and dividing the test set into a query set and a gallery set; 2): Sequentially subjecting image data of the training set to a data preprocessing process of flipping, noise adding, automatic clipping, random erasing and style transfer, and performing data augmentation after the preprocessing is completed; 3): Selecting and training a benchmark network, updating the weight, optimizing the benchmark network, and adjusting a hyper-parameter; wherein, the benchmark network is trained by a fusion of triplet loss function, cross-entropy loss function and center loss function, wherein the triplet loss function is used to increase an inter-class distance and shorten an intra-class distance, and the center loss function is used to make feature maps of the same identity (ID) close to the center; 4): Inputting the training set image data obtained in step 2) into the optimized and adjusted benchmark network for feature extraction; and 5): Calculating Euclidean distances for the extracted features in pairs, sorting the calculated Euclidean distances, and selecting from the gallery set an image closest to a target in the query set as an identification result.
2. The method for person re-identification based on a deep model with multi-loss fusion training strategy according to claim 1, wherein in step 3), for a large person re-identification data set, the transfer learning method is adopted to initialize the model with pre-trained parameters, and then training is further performed; and for a person data set with a small amount of data, the model trained on the large data set is used for fine-tuning.
3. The method for person re-identification based on a deep model with multi-loss fusion training strategy according to claim 1, wherein a triplet model is used as a skeleton and three images are used as a group of inputs, and an expression of a group of input images is:
$R_i=\langle R_i^{o},R_i^{+},R_i^{-}\rangle$ wherein $R_i^{o}$, $R_i^{+}$ and $R_i^{-}$ are the three images in the input group, respectively, $R_i^{o}$ and $R_i^{+}$ form a positive sample pair, and $R_i^{o}$ and $R_i^{-}$ form a negative sample pair.
4. The method for person re-identification based on a deep model with multi-loss fusion training strategy according to claim 1, wherein an expression of the loss function L for fusion of the triplet loss function, the cross entropy loss function and the center loss function is:
$L=\alpha_1 L_1+\alpha_2 L_2+\alpha_3 L_3$ wherein $L_1$ is the cross-entropy loss function, $L_2$ is the triplet loss function, $L_3$ is the center loss function, and $\alpha_1$, $\alpha_2$ and $\alpha_3$ are the weights of the proportions of the cross-entropy loss, the triplet loss and the center loss, respectively.
5. The method for person re-identification based on a deep model with multi-loss fusion training according to claim 4, wherein the expression of the cross-entropy loss function L.sub.1 is:
6. The method for person re-identification based on deep model with multi-loss fusion training strategy according to claim 5, wherein the corresponding expression of the triplet loss function L.sub.2 is:
$L_2=[\mathrm{thre}+d(F_w(R_i^{o}),F_w(R_i^{+}))-d(F_w(R_i^{o}),F_w(R_i^{-}))]_+$ wherein $\mathrm{thre}$ is a margin hyper-parameter used to make the distance between sample pairs of the same class smaller than the distance between sample pairs of different classes, $d(\cdot)$ represents the distance measurement function, $F_w(R_i^{o})$, $F_w(R_i^{+})$ and $F_w(R_i^{-})$ are the feature maps corresponding to $R_i^{o}$, $R_i^{+}$ and $R_i^{-}$, respectively, and $[x]_+$ denotes $\max(0,x)$.
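The triplet loss above can be sketched in NumPy. The margin value `thre=0.3` and the use of plain Euclidean distance for $d(\cdot)$ are illustrative assumptions, not values fixed by the claims:

```python
import numpy as np

def triplet_loss(f_anchor, f_pos, f_neg, thre=0.3):
    # L2 = [thre + d(a, p) - d(a, n)]_+ with Euclidean distance d.
    d_ap = np.linalg.norm(f_anchor - f_pos)   # anchor-positive distance
    d_an = np.linalg.norm(f_anchor - f_neg)   # anchor-negative distance
    return max(0.0, thre + d_ap - d_an)       # hinge [x]_+ = max(0, x)
```

The loss is zero whenever the negative sample is already further from the anchor than the positive sample by at least the margin.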
7. The method for person re-identification based on deep model with multi-loss fusion training strategy according to claim 6, wherein the corresponding expression of the center loss function L.sub.3 is:
8. The method for person re-identification based on deep model with multi-loss fusion training strategy according to claim 7, wherein the center of the center loss function L.sub.3 is continuously updated in the training process, and the update formula is as follows when s=y.sub.i:
9. The method for person re-identification based on deep model with multi-loss fusion training strategy according to claim 1, wherein in Step 3), adjusting the hyper-parameters of the benchmark network comprises iteration step adjustment, initial value adjustment of iteration step, and selection of learning functions.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
(4) The present invention is described in detail below with reference to the accompanying drawings and specific embodiments. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the disclosure without creative efforts shall fall within the protection scope of the disclosure.
(5) The present invention relates to a method for person re-identification based on a deep model with multi-loss fusion training strategy, including the following steps:
(6) Step 1: Acquiring an original image data set, and dividing the data set into a training set and a test set.
(7) Step 2: Performing data preprocessing and data augmentation on the benchmark data set of the training set. The embodiment of the present invention adopts the following data processing modes: 1) Randomly extracting a plurality of images from the benchmark data set for horizontal flipping. 2) Randomly extracting a plurality of images from the benchmark data set and adding Gaussian and salt-and-pepper noise. 3) Randomly extracting a plurality of images from the benchmark data set and erasing a region of random size at a random location. 4) Using CycleGAN to perform style transfer between images of the same person taken by different cameras in the same data set, which reduces the environmental difference between different camera views.
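The first three preprocessing modes can be sketched minimally in NumPy on images in $(H, W, C)$ layout with values in $[0, 1]$. The noise level and erasing bounds below are illustrative assumptions, and the CycleGAN style transfer is omitted since it requires a trained generator:

```python
import numpy as np

rng = np.random.default_rng(0)

def horizontal_flip(img):
    # Mode 1: mirror the image along its width axis.
    return img[:, ::-1, :]

def add_gaussian_noise(img, sigma=0.05):
    # Mode 2 (Gaussian part): additive noise, clipped back to [0, 1].
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def random_erase(img, max_frac=0.3):
    # Mode 3: erase a randomly sized rectangle at a random location,
    # filling it with the image mean.
    h, w, _ = img.shape
    eh = rng.integers(1, max(2, int(h * max_frac)))
    ew = rng.integers(1, max(2, int(w * max_frac)))
    y = rng.integers(0, h - eh + 1)
    x = rng.integers(0, w - ew + 1)
    out = img.copy()
    out[y:y + eh, x:x + ew, :] = img.mean()
    return out
```

Each transform returns a new array of the same shape, so the augmented copies can simply be appended to the training set.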
(8) Step 3: Selecting and training the benchmark network, updating the weights, optimizing the model, and adjusting the hyper-parameters of the benchmark network. For model training, a joint constraint is applied by fusing an identification loss function, a center loss function and a triplet loss function.
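The joint constraint can be sketched as the weighted sum of the three losses from claim 4. The single-sample formulations and the weight defaults below are illustrative assumptions, not the patent's values:

```python
import numpy as np

def cross_entropy(logits, label):
    # L1: softmax cross-entropy for one sample (the identification loss).
    z = logits - logits.max()                  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def center_loss(feature, centers, label):
    # L3: squared distance from a feature to its class (ID) center.
    return 0.5 * np.sum((feature - centers[label]) ** 2)

def fused_loss(l1, l2, l3, a1=1.0, a2=1.0, a3=0.0005):
    # L = a1*L1 + a2*L2 + a3*L3, the fusion in claim 4.
    return a1 * l1 + a2 * l2 + a3 * l3
```

In training, the class centers used by the center loss would themselves be updated each iteration, as claim 8 describes.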
(9) Step 4: After the data set has been organized and augmented as described above, the images are input into the CNN for feature extraction.
(10) Step 5: Calculating the Euclidean distances for the extracted features in pairs, sorting the calculated Euclidean distances, and selecting from the gallery set an image closest to a target in the query set as an identification result.
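The retrieval step can be sketched as a pairwise Euclidean distance matrix followed by an argsort per query:

```python
import numpy as np

def pairwise_euclidean(query_feats, gallery_feats):
    # (Q, D) x (G, D) -> (Q, G) matrix of Euclidean distances.
    diff = query_feats[:, None, :] - gallery_feats[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def rank_gallery(query_feats, gallery_feats):
    # For each query, gallery indices sorted by ascending distance;
    # column 0 is the identification result.
    return np.argsort(pairwise_euclidean(query_feats, gallery_feats), axis=1)
```

The mAP and rank-k metrics reported later are computed from exactly such ranked lists.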
(11) This embodiment takes the data set Market1501 as an example to illustrate the training process and test process of the network model. It should be understood that the specific embodiments described herein are merely illustrative of the present invention, but the invention is not used to limit a single special data set.
(12) Data Organization:
(13) A total of 12936 images of 751 person IDs in the Market1501 data set are taken as training data, and the remaining 19732 images are taken as test data. The test data is divided into the query set and the gallery set. The query set has 3368 images covering 750 person IDs; the remaining test images form the gallery set.
(14) Data Preprocessing:
(15) Several images are randomly extracted from the training data for horizontal flipping, noise adding and random erasing. At the same time, for the 6 cameras in the Market1501 data set, camera style transfer between images from different cameras is performed using CycleGAN, which augments the data set by multiples.
(16) Network Training:
(17) Market1501 is a relatively large person re-identification data set, so a network pre-trained on ImageNet is used for feature extraction. Due to parameter and time considerations, ResNet50 is used as the backbone network, and dropout is used to prevent over-fitting. Using the Adam method, the fused loss of the triplet loss, the identification loss and the center loss is continuously reduced to update the weights and optimize the network.
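A single Adam weight update, as used here to reduce the fused loss, can be sketched as follows. The learning rate shown is an illustrative assumption, not a value stated in the patent:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=3.5e-4, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: exponential moment estimates, bias correction,
    # then the parameter step. t is the 1-based iteration count.
    m = b1 * m + (1 - b1) * grad            # first moment (mean of grads)
    v = b2 * v + (1 - b2) * grad ** 2       # second moment (uncentered var)
    m_hat = m / (1 - b1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)               # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

In practice the gradient here would be the backpropagated gradient of the fused loss with respect to each network weight tensor.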
(18) Network Evaluation:
(19) The trained network is used for feature extraction on the query set and the gallery set, and the distances between the extracted features are calculated as Euclidean distances. The image in the gallery set closest to a target in the query set is retrieved to judge whether the persons in the gallery set and the query set are the same person; if so, the output is used as the identification result.
(20) Network Results:
(21) Through evaluation and calculation, the proposed method for person re-identification based on a deep model with multi-loss fusion training strategy achieves, on the Market1501 data set (without re-ranking), an mAP of 70.1, a rank-1 accuracy of 86.6 and a rank-5 accuracy of 94.6. Good experimental results are also achieved on other data sets.
(22) The foregoing descriptions are only specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present invention, and these equivalent modifications or replacements should fall within the protection scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.