PERSON SEARCH METHOD BASED ON PERSON RE-IDENTIFICATION DRIVEN LOCALIZATION REFINEMENT
20210365743 · 2021-11-25
Assignee
Inventors
- Nong Sang (Hubei, CN)
- Chuchu Han (Hubei, CN)
- Yuanjie Shao (Hubei, CN)
- Ruochen Zheng (Hubei, CN)
- Changxin Gao (Hubei, CN)
US classification
- 1/1
Cpc classification
G06V10/454
PHYSICS
G06V20/52
PHYSICS
G06V40/10
PHYSICS
G06V40/103
PHYSICS
International classification
G06T3/40
PHYSICS
Abstract
The invention discloses a person search method based on person re-identification driven localization refinement. On the one hand, a region of interest (ROI) conversion module converts the original input image into a small image corresponding to a ROI, which avoids the contradiction that arises when the person re-identification network and the detection network share part of their features. On the other hand, the loss of the person re-identification network can be propagated back to the detection network as gradients through the ROI conversion module, so that the loss of the person re-identification network supervises the detection bounding boxes output by the detection network. The adjusted detection bounding boxes effectively remove background interference, contain more useful attribute information, and are better suited to person search, so the person search accuracy is greatly improved.
Claims
1. A person search method based on person re-identification driven localization refinement, characterized in comprising: (1) constructing a person re-identification driven localization refinement model, wherein the person re-identification driven localization refinement model comprises a detection module, a region of interest conversion module, and a person re-identification module; wherein the detection module is configured to detect a person in an input image and obtain coordinates of detected bounding boxes corresponding to a person position; the region of interest conversion module is configured to compute and obtain affine transformation parameters from the input image to the coordinates of the detected bounding boxes according to the coordinates of the detected bounding boxes, and extract a region of interest in the input image according to the affine transformation parameters and bilinear sampling; the person re-identification module is configured to extract depth features in the region of interest; (2) using an original picture as an input of the person re-identification driven localization refinement model, and using a probability value of an identity tag corresponding to the person in the original picture as an expected output after classification of features output by the person re-identification driven localization refinement model, and training the person re-identification driven localization refinement model; (3) inputting a query image to be searched and gallery images respectively into the trained person re-identification driven localization refinement model to obtain a person feature of the query image to be searched and a person feature of the gallery images, computing a similarity between the person feature of the query image to be searched and the person feature of the gallery images, and obtaining a matching result of the query image to be searched.
2. The person search method based on person re-identification driven localization refinement according to claim 1, characterized in that the person re-identification module is supervised by adopting cross entropy loss and triplet proxy loss.
3. The person search method based on person re-identification driven localization refinement according to claim 2, characterized in that the method for supervising the person re-identification module by using the triplet proxy loss is specifically as follows: (01) initializing a triplet proxy table T∈R^(N×K) configured to store a feature value of each category, wherein N represents a total number of categories of samples, and K represents the number of features stored in each of the categories; (02) in forward propagation, making a distance between the samples of the same category shorter, and a distance between samples of different categories longer by computing a value of the triplet proxy loss; (03) in backward propagation, updating the features of corresponding category of the current sample in the triplet proxy table, wherein existing features are replaced based on a first-in first-out principle.
4. The person search method based on person re-identification driven localization refinement according to claim 1, characterized in that a loss function of the person re-identification module supervises the coordinates of the detected bounding boxes output by the detection module.
5. The person search method based on person re-identification driven localization refinement according to claim 1, characterized in that the detection module uses Faster R-CNN as a network backbone.
6. The person search method based on person re-identification driven localization refinement according to claim 5, characterized in that the Faster R-CNN comprises classification loss, but does not comprise regression loss.
7. The person search method based on person re-identification driven localization refinement according to claim 6, characterized in that an anchor frame aspect ratio adopted by the Faster R-CNN is less than 1.
8. The person search method based on person re-identification driven localization refinement according to claim 1, characterized in that the person re-identification module uses ResNet50 as a network backbone.
9. The person search method based on person re-identification driven localization refinement according to claim 8, characterized in that the ResNet50 uses a batch normalization layer to replace a final fully connected layer of a network.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DESCRIPTION OF THE EMBODIMENTS
[0027] In order to make the purposes, technical solutions and advantages of the present disclosure clearer, the following further describes the present disclosure in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present disclosure, but not to limit the present disclosure. In addition, the technical features involved in the various embodiments of the present disclosure described below can be combined with each other as long as they do not conflict with each other.
[0028] As shown in the accompanying drawings, the person search method based on person re-identification driven localization refinement provided by the disclosure includes the following steps.
[0029] (1) A person re-identification driven localization refinement model is constructed. As shown in the accompanying drawings, the model includes a detection module, a region of interest conversion module, and a person re-identification module.
[0030] Specifically, the detection module provided by the embodiment of the disclosure adopts Faster R-CNN as the backbone of the network. Since the detection target is a person, the aspect ratio of the anchor in Faster R-CNN should be modified to be less than 1 so as to better match the proportions of the human body. In the embodiment of the disclosure, the anchor aspect ratios in Faster R-CNN are modified from 1:1, 1:2, 2:1 to 1:1, 1:2, 1:3. Meanwhile, to let the re-identification loss dominate the formation of the detection bounding boxes, instead of merely making the detection bounding boxes close to the ground-truth boxes, only the classification loss of the original Faster R-CNN is retained, and the regression loss of the original network is removed.
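As a concrete illustration, the following sketch configures a torchvision Faster R-CNN (recent torchvision versions) with person-shaped anchors and keeps only the classification terms of the detection loss. The anchor sizes, the two-class setting, and the loss filtering are illustrative assumptions rather than the patent's reference implementation; note that torchvision expresses anchor aspect ratios as height/width, so the width:height ratios 1:1, 1:2, 1:3 above become 1.0, 2.0, 3.0.

    import torch
    from torchvision.models.detection import fasterrcnn_resnet50_fpn
    from torchvision.models.detection.anchor_utils import AnchorGenerator

    # Anchors with aspect ratios h/w = 1.0, 2.0, 3.0 (i.e. w:h = 1:1, 1:2, 1:3),
    # one size per FPN level (the sizes are an illustrative choice).
    anchor_generator = AnchorGenerator(
        sizes=((32,), (64,), (128,), (256,), (512,)),
        aspect_ratios=((1.0, 2.0, 3.0),) * 5,
    )
    detector = fasterrcnn_resnet50_fpn(
        weights=None, weights_backbone=None,
        num_classes=2,                       # background + person
        rpn_anchor_generator=anchor_generator,
    )

    # During training, keep only the classification terms and drop both
    # box-regression terms, so that box refinement is driven by the
    # re-identification loss propagated through the ROI conversion module.
    detector.train()
    images = [torch.rand(3, 600, 400)]
    targets = [{"boxes": torch.tensor([[50., 80., 150., 380.]]),
                "labels": torch.tensor([1])}]
    loss_dict = detector(images, targets)
    kept = {k: v for k, v in loss_dict.items()
            if k in ("loss_classifier", "loss_objectness")}
    detection_loss = sum(kept.values())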
[0031] Through the region of interest conversion module, the loss of the re-identification network can be propagated back to the detection network as gradients, so as to supervise the detected bounding box coordinates. Specifically, the following formula is adopted to compute the affine transformation parameters θ from the input image to the coordinates of the detected bounding boxes, according to the coordinates of the detected bounding boxes:

(x_i^s, y_i^s)^T = θ·(x_i^t, y_i^t, 1)^T,

[0032] where x_i^s and y_i^s represent the coordinates of the detection bounding box in the original input image, and x_i^t and y_i^t represent the coordinates of the small image of the region of interest that is extracted.
[0033] According to the affine transformation parameters θ and bilinear sampling, the small image of the region of interest corresponding to the detection bounding box can be obtained, and gradient backpropagation of the loss function can be realized. The small image of the region of interest is computed as follows.
V = B(P^S, U),
[0034] where B represents bilinear sampling, U and V respectively represent the original input image and the small image of the region of interest, and P^S represents the sampling points in the original image that correspond to each pixel of the small image, obtained according to the affine transformation.
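The ROI conversion module can be realized in the spirit of a spatial transformer: the detected box determines the affine parameters θ, a sampling grid P^S is built from θ, and bilinear sampling B produces the ROI image, so gradients from the re-identification loss reach the box coordinates. The sketch below is a minimal PyTorch illustration under these assumptions (256×128 output, boxes in pixel coordinates); it is not the patent's reference code.

    import torch
    import torch.nn.functional as F

    def roi_affine_theta(box, img_h, img_w):
        # box = (x1, y1, x2, y2) in pixel coordinates of the original image U.
        # theta maps the normalized coordinates (x_i^t, y_i^t) of the small
        # image V to the normalized coordinates (x_i^s, y_i^s) of U.
        x1, y1, x2, y2 = box.unbind(-1)
        zero = torch.zeros_like(x1)
        row1 = torch.stack([(x2 - x1) / img_w, zero, (x1 + x2) / img_w - 1.0])
        row2 = torch.stack([zero, (y2 - y1) / img_h, (y1 + y2) / img_h - 1.0])
        return torch.stack([row1, row2])          # shape (2, 3)

    def roi_convert(image, box, out_h=256, out_w=128):
        # V = B(P^S, U): build the sampling grid P^S from theta, then
        # bilinearly sample the original image U to obtain the ROI image V.
        _, c, img_h, img_w = image.shape
        theta = roi_affine_theta(box, img_h, img_w).unsqueeze(0)
        grid = F.affine_grid(theta, size=(1, c, out_h, out_w), align_corners=False)
        return F.grid_sample(image, grid, mode="bilinear", align_corners=False)

    image = torch.rand(1, 3, 600, 400)                          # U
    box = torch.tensor([50., 80., 150., 380.], requires_grad=True)
    roi = roi_convert(image, box)                               # V, shape (1, 3, 256, 128)
    roi.sum().backward()                                        # gradients reach the box coordinates
    assert box.grad is not None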
[0035] The person re-identification module uses ResNet50 as the backbone of the network. Since the original final fully connected layer of ResNet50 does not match the number of identity categories in the training set, it is removed to obtain a modified residual network, and a batch normalization layer is added after the modified residual network.
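A minimal sketch of such a re-identification backbone, assuming the standard torchvision ResNet50 with a 2048-dimensional pooled feature; the number of identity classes (751 below) is a hypothetical placeholder, and the added classifier head only serves the cross-entropy supervision of step (2).

    import torch
    import torch.nn as nn
    import torchvision

    class ReIDNet(nn.Module):
        """ResNet50 with the final fully connected layer removed and a
        batch normalization layer added in its place."""
        def __init__(self, num_identities, feat_dim=2048):
            super().__init__()
            resnet = torchvision.models.resnet50(weights=None)
            self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop fc
            self.bnneck = nn.BatchNorm1d(feat_dim)
            self.classifier = nn.Linear(feat_dim, num_identities, bias=False)

        def forward(self, x):
            f = self.backbone(x).flatten(1)   # pooled feature (used by the triplet proxy loss)
            f_bn = self.bnneck(f)             # feature after batch normalization
            logits = self.classifier(f_bn)    # identity logits for the cross-entropy loss
            return f, f_bn, logits

    net = ReIDNet(num_identities=751).eval()  # 751 identities is a hypothetical value
    rois = torch.rand(4, 3, 256, 128)         # ROI images from the conversion module
    with torch.no_grad():
        f, f_bn, logits = net(rois)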
[0036] (2) The original picture is used as the input of the person re-identification driven localization refinement model, and the probability value of the identity tag corresponding to the person in the original picture is used as the expected output after the classification of output features of the person re-identification driven localization refinement model, and the person re-identification driven localization refinement model is trained.
[0037] Specifically, the disclosure uses a cross-entropy loss and a triplet proxy loss to supervise the person re-identification module. The triplet loss is a measurement loss commonly used in the field of person re-identification; it makes the distance between samples of the same category shorter while making the distance between samples of different categories longer. However, since there are too few training samples in each batch of the person search task, the conventional triplet loss cannot be established. The disclosure therefore designs a triplet proxy loss, in which a triplet proxy table is used to store the features of all categories, and the features are updated in each iteration. In this manner, even if there are not enough batch training samples to establish a triplet, proxy samples can be taken from the triplet proxy table to establish one, and the method is therefore called the triplet proxy loss. The method for supervising the person re-identification module by using the triplet proxy loss is specifically as follows. (01) The triplet proxy table T∈R^(N×K) configured to store the feature value of each category is initialized, where N represents the total number of categories of the samples, and K represents the number of features stored for each category; the embodiment of the disclosure sets K=2. (02) In forward propagation, computing the value of the triplet proxy loss makes the distance between samples of the same category shorter and the distance between samples of different categories longer:

L_triplet = Σ_i max(D(f_i^a, f_i^p) − D(f_i^a, f_i^n) + m, 0),

[0038] where m is the margin that constrains the distance between the negative sample pair to be greater than the distance between the positive sample pair, f_i^a, f_i^p, f_i^n respectively represent the features of the anchor sample, the positive sample and the negative sample in the triplet, and D represents the Euclidean distance.
[0039] (03) In backward propagation, the features of the category corresponding to the current sample are updated in the triplet proxy table, and the existing features are replaced based on the first-in first-out principle.
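A compact sketch of how the triplet proxy table and its first-in first-out update could look in PyTorch. The margin value, the use of the hardest positive/negative proxies, and the 2048-dimensional features are illustrative assumptions not fixed by the disclosure.

    import torch
    import torch.nn.functional as F

    class TripletProxyLoss:
        """Triplet proxy table T of shape (N, K, d): K proxy features per
        identity stand in for the positives/negatives that a small training
        batch cannot supply."""
        def __init__(self, num_ids, k=2, feat_dim=2048, margin=0.3):
            self.table = torch.zeros(num_ids, k, feat_dim)     # T in R^(N x K x d)
            self.ptr = torch.zeros(num_ids, dtype=torch.long)  # FIFO write position
            self.margin = margin

        def __call__(self, feats, labels):
            # Forward: pull each sample toward its own proxies and push it
            # away from the proxies of all other identities.
            losses = []
            for f, y in zip(feats, labels):
                y = int(y)
                pos = self.table[y]                                         # K same-identity proxies
                neg = torch.cat([self.table[:y], self.table[y + 1:]]).flatten(0, 1)
                d_pos = torch.cdist(f[None], pos).max()                     # hardest positive proxy
                d_neg = torch.cdist(f[None], neg).min()                     # hardest negative proxy
                losses.append(F.relu(d_pos - d_neg + self.margin))
            return torch.stack(losses).mean()

        @torch.no_grad()
        def update(self, feats, labels):
            # Backward phase: first-in first-out replacement of stored proxies.
            for f, y in zip(feats, labels):
                y = int(y)
                self.table[y, self.ptr[y]] = f.detach()
                self.ptr[y] = (self.ptr[y] + 1) % self.table.shape[1]

    proxy_loss = TripletProxyLoss(num_ids=751, k=2, feat_dim=2048)    # hypothetical sizes
    feats = F.normalize(torch.randn(4, 2048), dim=1)
    labels = torch.tensor([12, 305, 12, 47])
    loss = proxy_loss(feats, labels)        # forward propagation
    proxy_loss.update(feats, labels)        # table update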
[0040] (3) The query image to be searched and the gallery images are respectively input into the trained person re-identification driven localization refinement model to obtain the person feature of the query image to be searched and the person feature of the gallery images, so as to compute the similarity between the person feature of the query image to be searched and the person feature of the gallery images, thereby obtaining the matching result of the query image to be searched.
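The disclosure does not fix the similarity measure; a common choice is the cosine similarity between L2-normalized features, sketched below with randomly generated placeholders standing in for the features extracted by the trained model.

    import torch
    import torch.nn.functional as F

    query_feat = F.normalize(torch.randn(1, 2048), dim=1)        # feature of the query person
    gallery_feats = F.normalize(torch.randn(100, 2048), dim=1)   # features of detected gallery persons
    similarity = query_feat @ gallery_feats.t()                  # cosine similarities, shape (1, 100)
    ranking = similarity.argsort(dim=1, descending=True)         # best-matching gallery persons first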
[0041] Those skilled in the art can easily understand that the above descriptions are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement and improvement, etc. made within the spirit and principle of the present disclosure should fall within the scope of the present disclosure.