PERSON RE-IDENTIFICATION DEVICE AND METHOD

20220165048 · 2022-05-26

    Abstract

    A person re-identification device comprises: a feature extracting and dividing unit, that receives images including a person to be re-identified and extracts a feature of each image according to a pre-learned pattern estimation method to acquire a 3-dimensional feature vector, and divides the 3-dimensional feature vector into a pre-designated size unit to acquire local feature vectors; a one-to-many relational reasoning unit, that estimates the relationship between each of the local feature vectors and remaining local feature vectors, and reflects the estimated relationship to acquire local relational features; a global contrastive pooling unit, that acquires a global contrastive feature by performing global contrastive pooling; and a person re-identification unit, that receives the local relational features and the global contrastive feature as a final descriptor of a corresponding image, and compares the final descriptor with a reference descriptor acquired in advance from an image including a person to be searched.

    Claims

    1. A person re-identification device for re-identifying a person included in an image, the device performing learning by receiving a plurality of learning images labeled with an identifier of a person included, and comprising: a feature extracting and dividing unit, that receives a plurality of images including a person to be re-identified, extracts a feature of each image according to a pre-learned pattern estimation method to acquire a 3-dimensional feature vector, and divides the 3-dimensional feature vector into a pre-designated size unit to acquire a plurality of local feature vectors; a one-to-many relational reasoning unit, that estimates a relationship between each of the plurality of local feature vectors and remaining local feature vectors according to a pre-learned pattern estimation method, and reflects the estimated relationship to each of the plurality of local feature vectors to acquire a plurality of local relational features; a global contrastive pooling unit, that acquires a global contrastive feature by performing global contrastive pooling in which a relationship between a maximum feature and an average feature of the plurality of local feature vectors is reflected back to the maximum feature according to a pre-learned pattern estimation method; and a person re-identification unit, that receives the plurality of local relational features and the global contrastive feature as a final descriptor of a corresponding image, and compares the final descriptor with a reference descriptor that is a final descriptor acquired in advance from an image including a person to be searched, thereby determining whether a person to be searched is included.

    2. The person re-identification device according to claim 1, wherein the one-to-many relational reasoning unit acquires the plurality of local relational features, by concatenating an enhanced local feature acquired by sequentially extracting features for each of the plurality of local feature vectors and a rest part enhanced average feature acquired by extracting features for an average pooling result of the remaining local feature vectors from which features are not extracted, extracting features again for the concatenated rest part enhanced average feature, and then adding the corresponding enhanced local feature.

    3. The person re-identification device according to claim 2, wherein the one-to-many relational reasoning unit includes: a local feature extracting unit that selects one of the plurality of local feature vectors in a pre-designated order, and extracts a feature of the selected local feature vector according to a pre-learned pattern estimation method, thereby acquiring the enhanced local feature; a rest part average sampling unit that acquires a rest part average feature by performing average pooling on a local feature vector not selected by the local feature extracting unit among the plurality of local feature vectors; a rest part average feature extracting unit that acquires the rest part enhanced average feature by extracting a feature of the rest part average feature according to a pre-learned pattern estimation method; an enhanced local feature concatenating unit that concatenates the enhanced local feature and the rest part enhanced average feature to generate a concatenated local feature; a concatenated local feature extracting unit that acquires an enhanced concatenated local feature by extracting a feature of the concatenated local feature according to a pre-learned pattern estimation method; and a local relational feature acquiring unit that acquires a local relational feature corresponding to a selected local feature vector by adding the enhanced concatenated local feature and the enhanced local feature.

    4. The person re-identification device according to claim 1, wherein the global contrastive pooling unit performs max pooling and average pooling on all of the plurality of local feature vectors, acquires an enhanced contrastive feature and an enhanced global maximum feature by extracting a feature of each of a global contrastive feature, which is a difference between a max pooling result and an average pooling result, and the max pooling result, extracts a feature from the result of concatenating the enhanced contrastive feature and the enhanced global maximum feature, and then adds the enhanced global maximum feature again, thereby acquiring the global contrastive feature.

    5. The person re-identification device according to claim 4, wherein the global contrastive pooling unit includes: a global max sampling unit that acquires a global maximum feature by performing global max pooling on all of the plurality of local feature vectors; a global average sampling unit that acquires a global average feature by performing global average pooling on all of the plurality of local feature vectors; a contrastive feature acquiring unit that acquires a contrastive feature by calculating a difference between the global maximum feature and the global average feature; an enhanced maximum feature extracting unit that acquires an enhanced global maximum feature by extracting a feature of the global maximum feature according to a pre-learned pattern estimation method; an enhanced contrastive feature extracting unit that acquires an enhanced contrastive feature by extracting a feature of the contrastive feature according to a pre-learned pattern estimation method; an enhanced global feature concatenating unit that generates a concatenated global feature by concatenating the enhanced global maximum feature and the enhanced contrastive feature; a concatenated global feature extracting unit that acquires an enhanced concatenated global feature by extracting a feature of the concatenated global feature according to a pre-learned pattern estimation method; and a global contrastive feature acquiring unit that acquires a global contrastive feature by adding the enhanced global maximum feature and the enhanced concatenated global feature.

    6. The person re-identification device according to claim 1, wherein the person re-identification device further includes a learning unit that receives a learning image labeled with an identifier at the time of learning, calculates triplet losses and cross-entropy losses from the difference between the identifier labeled in the learning image and the final descriptor acquired from the learning image to acquire a total loss, and backpropagates the acquired total loss.

    7. A person re-identification method comprising the steps of: performing learning by receiving a plurality of learning images labeled with an identifier of a person included; acquiring a 3-dimensional feature vector by receiving a plurality of images including a person to be re-identified and extracting a feature of each image according to a pre-learned pattern estimation method; acquiring a plurality of local feature vectors by dividing the 3-dimensional feature vector into a pre-designated size unit; acquiring a plurality of local relational features by estimating a relationship between each of the plurality of local feature vectors and remaining local feature vectors according to a pre-learned pattern estimation method and reflecting the estimated relationship to each of the plurality of local feature vectors; acquiring a global contrastive feature by performing global contrastive pooling in which a relationship between a maximum feature and an average feature of the entire plurality of local feature vectors is reflected back to the maximum feature according to a pre-learned pattern estimation method; and receiving the plurality of local relational features and the global contrastive feature as a final descriptor of a corresponding image, and comparing the final descriptor with a reference descriptor that is a final descriptor acquired in advance from an image including a person to be searched, thereby determining whether the person to be searched is included.

    8. The person re-identification method according to claim 7, wherein the step of acquiring the plurality of local relational features includes the steps of: selecting one of the plurality of local feature vectors in a pre-designated order, and extracting a feature of the selected local feature vector according to a pre-learned pattern estimation method, thereby acquiring an enhanced local feature; acquiring a rest part average feature by performing average pooling on a local feature vector not selected among the plurality of local feature vectors; acquiring a rest part enhanced average feature by extracting a feature of the rest part average feature according to a pre-learned pattern estimation method; concatenating the enhanced local feature and the rest part enhanced average feature to generate a concatenated local feature; acquiring an enhanced concatenated local feature by extracting a feature of the concatenated local feature according to a pre-learned pattern estimation method; and acquiring a local relational feature corresponding to a selected local feature vector by adding the enhanced concatenated local feature and the enhanced local feature.

    9. The person re-identification method according to claim 7, wherein the step of acquiring the global contrastive feature includes the steps of: acquiring a global maximum feature by performing global max pooling on all of the plurality of local feature vectors; acquiring a global average feature by performing global average pooling on all of the plurality of local feature vectors; acquiring a contrastive feature by calculating a difference between the global maximum feature and the global average feature; acquiring an enhanced global maximum feature by extracting a feature of the global maximum feature according to a pre-learned pattern estimation method; acquiring an enhanced contrastive feature by extracting a feature of the contrastive feature according to a pre-learned pattern estimation method; generating a concatenated global feature by concatenating the enhanced global maximum feature and the enhanced contrastive feature; acquiring an enhanced concatenated global feature by extracting a feature of the concatenated global feature according to a pre-learned pattern estimation method; and acquiring the global contrastive feature by adding the enhanced global maximum feature and the enhanced concatenated global feature.

    10. The person re-identification method according to claim 7, wherein the step of performing the learning includes the steps of: receiving a learning image labeled with an identifier; acquiring a final descriptor for the learning image; calculating triplet losses and cross-entropy losses from the difference between the identifier labeled in the learning image and the final descriptor acquired from the learning image to acquire a total loss; and backpropagating the acquired total loss.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0022] FIG. 1 is a diagram for explaining a concept of a person re-identification technique extracting local features.

    [0023] FIG. 2 shows a schematic structure of a person re-identification device according to an embodiment of the present disclosure.

    [0024] FIG. 3 is a diagram for explaining a concept in which the person re-identification device according to the present embodiment shown in FIG. 2 acquires a final descriptor.

    [0025] FIG. 4 shows an example of a detailed configuration of the one-to-many relational reasoning unit of FIG. 2.

    [0026] FIG. 5 shows an example of a detailed configuration of the global contrastive pooling unit of FIG. 2.

    [0027] FIG. 6 shows a person re-identification method according to an embodiment of the present disclosure.

    DETAILED DESCRIPTION

    [0028] In order to fully understand the present disclosure, operational advantages of the present disclosure, and objects achieved by implementing the present disclosure, reference should be made to the accompanying drawings illustrating preferred embodiments of the present disclosure and to the contents described in the accompanying drawings.

    [0029] Hereinafter, the present disclosure will be described in detail by describing preferred embodiments of the present disclosure with reference to accompanying drawings. However, the present disclosure can be implemented in various different forms and is not limited to the embodiments described herein. For a clearer understanding of the present disclosure, parts that are not of great relevance to the present disclosure have been omitted from the drawings, and like reference numerals in the drawings are used to represent like elements throughout the specification.

    [0030] Throughout the specification, reference to a part “including” or “comprising” an element does not preclude the existence of one or more other elements and can mean other elements are further included, unless there is specific mention to the contrary. Also, terms such as “unit”, “device”, “module”, “block”, and the like described in the specification refer to units for processing at least one function or operation, which may be implemented by hardware, software, or a combination of hardware and software.

    [0031] FIG. 2 shows a schematic structure of a person re-identification device according to an embodiment of the present disclosure, and FIG. 3 is a diagram for explaining a concept in which the person re-identification device according to the present embodiment shown in FIG. 2 acquires a final descriptor. FIG. 4 and FIG. 5 show an example of a detailed configuration of the one-to-many relational reasoning unit and the global contrastive pooling unit of FIG. 2.

    [0032] Referring to FIG. 2, a person re-identification device according to the present embodiment may include an image acquiring unit 110, a feature extracting unit 120, a feature dividing unit 130, a one-to-many relational reasoning unit 140, a global contrastive pooling unit 150 and a person re-identification unit 160.

    [0033] The image acquiring unit 110 acquires a plurality of images including a person to be re-identified as shown in (a) of FIG. 3. The image acquiring unit 110 may acquire a learning image from a database (not shown) in which a plurality of images are stored or from an image acquiring device such as a camera, or may acquire a learning image by receiving it from an external device over a network.

    [0034] In addition, during learning of the person re-identification device, the image acquiring unit 110 may acquire a plurality of learning images in which identifiers of the included person are pre-labeled.

    [0035] The feature extracting unit 120 is implemented as an artificial neural network in which a pattern estimation method has been learned in advance, and extracts a feature of the image received from the image acquiring unit 110, thereby acquiring a plurality of feature maps.

    [0036] The feature extracting unit 120 may be trained together with the rest of the person re-identification device during learning. However, since various artificial neural networks that acquire a feature map by extracting features from images have already been studied and disclosed, the feature extracting unit 120 may instead acquire a feature map by using a previously learned and disclosed artificial neural network. Here, as an example, it is assumed that the feature extracting unit 120 uses ResNet-50, which is one of the artificial neural networks learned for image classification, as shown in (b) of FIG. 3.

    [0037] The feature extracting unit 120 may acquire C feature maps of H×W size by extracting features from the received image. That is, it may acquire a 3-dimensional feature vector of H×W×C size.

    [0038] The feature dividing unit 130 divides the 3-dimensional feature vector acquired by the feature extracting unit 120 into a pre-designated size unit, samples each of the divided plurality of feature vectors, and thus acquires a plurality of local feature vectors (p.sub.1˜p.sub.n).

    [0039] The feature dividing unit 130 may divide the 3-dimensional feature vector into various forms according to a pre-designated method, however, (c) of FIG. 3 illustrates, as an example, a case in which a 3-dimensional feature vector is divided into six according to a horizontal grid. The feature dividing unit 130 may sample in a global max pooling method, as shown in (d) of FIG. 3, for each of the six divided 3-dimensional vectors, and thus acquire six local feature vectors (p.sub.1˜p.sub.6) each having a size of 1×1×C.
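    The division and sampling described above can be illustrated with a minimal NumPy sketch. It assumes that the horizontal grid splits the feature map into equal stripes along the height axis and that H is divisible by the number of parts; the function name and the sizes H=24, W=8, C=2048 are hypothetical and are not taken from the disclosure.

```python
import numpy as np

def split_and_pool(feature_map: np.ndarray, n_parts: int = 6) -> np.ndarray:
    """Split an H x W x C feature map into horizontal stripes and max-pool each.

    Returns an array of shape (n_parts, C): one local feature vector per stripe.
    """
    h, w, c = feature_map.shape
    if h % n_parts != 0:
        raise ValueError("this sketch assumes H is divisible by n_parts")
    # Group consecutive rows into stripes: (n_parts, H / n_parts, W, C).
    stripes = feature_map.reshape(n_parts, h // n_parts, w, c)
    # Global max pooling over the two spatial axes of each stripe.
    return stripes.max(axis=(1, 2))

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((24, 8, 2048))
local_features = split_and_pool(feature_map)
print(local_features.shape)  # (6, 2048)
```

    Each row of the result plays the role of one local feature vector of size 1×1×C.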

    [0040] The plurality of local feature vectors (p.sub.1˜p.sub.n) acquired by the feature dividing unit 130 are transmitted to each of the one-to-many relational reasoning unit 140 and the global contrastive pooling unit 150.

    [0041] Here, the feature extracting unit 120 and the feature dividing unit 130 may be integrated into the feature extracting and dividing unit.

    [0042] The one-to-many relational reasoning unit 140, which is shown as (e) in FIG. 3, receives a plurality of local feature vectors (p.sub.1˜p.sub.n) from the feature dividing unit 130, and estimates the relationship between each of the received plurality of local feature vectors (p.sub.1˜p.sub.n) and the remaining local feature vectors according to a pre-learned pattern estimation method, thereby enhancing the plurality of local feature vectors (p.sub.1˜p.sub.n) such that the estimated relationship is reflected in them. The one-to-many relational reasoning unit 140 thus acquires a plurality of local relational features (q.sub.1˜q.sub.n) that are enhanced local feature vectors, as shown in (f) of FIG. 3.

    [0043] Here, the one-to-many relational reasoning unit 140 may acquire a plurality of local relational features (q.sub.1˜q.sub.n) each having a size of 1×1×c (wherein, c≤C) from a plurality of local feature vectors (p.sub.1˜p.sub.n) having a size of 1×1×C.

    [0044] Referring to FIG. 4, the one-to-many relational reasoning unit 140 may include a local enhanced feature extracting unit 141, a rest part average sampling unit 142, a rest part average feature extracting unit 143, an enhanced local feature concatenating unit 144, a concatenated local feature extracting unit 145 and a local relational feature acquiring unit 146.

    [0045] First, the local enhanced feature extracting unit 141 selects the plurality of local feature vectors (p.sub.1˜p.sub.n) one at a time in a pre-designated order, and extracts a feature of each selected local feature vector according to a pre-learned pattern estimation method, thereby acquiring enhanced local features (p.sub.1˜p.sub.n). Here, each of the enhanced local features (p.sub.1˜p.sub.n) may have a size of 1×1×c.

    [0046] Although in FIG. 4 it is illustrated on the assumption that the local enhanced feature extracting unit 141 selects a local feature vector (p.sub.1) as an example, the local enhanced feature extracting unit 141 also selects the remaining unselected local feature vectors (p.sub.2˜p.sub.n) according to a pre-designated order to acquire the enhanced local features (p.sub.1˜p.sub.n).

    [0047] While the local enhanced feature extracting unit 141 acquires each of the enhanced local features (p.sub.1˜p.sub.n) by selecting one (p.sub.i) from among a plurality of local feature vectors (p.sub.1˜p.sub.n), the rest part average sampling unit 142 acquires a rest part average feature (r.sub.i) by performing average pooling on the remaining local feature vectors (p.sub.j, j≠i), that is, all local feature vectors except for the local feature vector (p.sub.i) selected by the local enhanced feature extracting unit 141.

    [0048] That is, the rest part average sampling unit 142 acquires the rest part average feature (r.sub.i) according to Equation 1.

    $$r_i = \frac{1}{n-1} \sum_{j \neq i} p_j$$  [Equation 1]

    [0049] (wherein, n is the number of local feature vectors, i is the index of the local feature vector selected by the local enhanced feature extracting unit 141, and j is the index of each of the remaining local feature vectors.)
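    As a worked illustration of Equation 1, the following sketch computes every rest part average feature at once by subtracting each vector from the sum of all vectors. The function name and the toy values are hypothetical.

```python
import numpy as np

def rest_part_average(p: np.ndarray) -> np.ndarray:
    """p has shape (n, C). Row i of the result is the mean of all rows except row i."""
    n = p.shape[0]
    if n < 2:
        raise ValueError("at least two local feature vectors are required")
    total = p.sum(axis=0, keepdims=True)
    # Removing p_i from the total leaves the sum over j != i (Equation 1).
    return (total - p) / (n - 1)

# Toy example with n = 3 local feature vectors of size C = 2.
p = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
r = rest_part_average(p)
print(r[0])  # [4. 5.]
```

    For i = 1 the remaining vectors are (3, 4) and (5, 6), whose average is (4, 5), matching the printed row.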

    [0050] The rest part average feature extracting unit 143 extracts a feature of the rest part average feature (r.sub.i) according to a pre-learned pattern estimation method, thereby acquiring a rest part enhanced average feature (r.sub.i). While the local enhanced feature extracting unit 141 acquires the enhanced local features (p.sub.1˜p.sub.n), the rest part average feature extracting unit 143 acquires the corresponding rest part enhanced average features (r.sub.i).

    [0051] The enhanced local feature concatenating unit 144 concatenates the enhanced local features (p.sub.i) acquired by the local enhanced feature extracting unit 141 and the rest part enhanced average feature (r.sub.i), thereby generating concatenated local features. The concatenated local feature extracting unit 145 receives the concatenated local features, and extracts features of the concatenated local features according to the pre-learned pattern estimation method, thereby acquiring enhanced concatenated local features.

    [0052] The local relational feature acquiring unit 146 acquires local relational features (q.sub.i) by adding the enhanced local features (p.sub.i) acquired by the local enhanced feature extracting unit 141 and the enhanced concatenated local features.

    [0053] That is, the one-to-many relational reasoning unit 140 concatenates a feature of each of a plurality of local feature vectors (p.sub.1˜p.sub.n) with the average feature of the remaining local feature vectors, thereby acquiring a plurality of local relational features (q.sub.1˜q.sub.n) including relationships between each of the local feature vectors (p.sub.1˜p.sub.n) and the remaining local feature vectors.

    [0054] A method in which the one-to-many relational reasoning unit 140 acquires a plurality of local relational features (q.sub.1˜q.sub.n) including relationships between each of a plurality of local feature vectors (p.sub.1˜p.sub.n) and the rest local feature vectors can be expressed as Equation 2.


    q.sub.i=p.sub.i+R.sub.p(T(p.sub.i,r.sub.i))  [Equation 2]

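    [0055] (wherein, T is a combining function representing the concatenation of features, and R.sub.p is a relational function mathematically expressing the concatenated local feature extracting unit 145 in which a pattern estimation method has been learned. On the right-hand side, p.sub.i and r.sub.i denote the enhanced local feature and the rest part enhanced average feature, respectively.)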
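    The data flow of Equations 1 and 2 can be sketched as follows. This is only an illustration under stated assumptions: each learned extractor is replaced by a hypothetical random linear map followed by a ReLU, the same extractor weights are shared across all local feature vectors, and the sizes C=8, c=4, n=6 are arbitrary. None of these choices is specified by the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(d_in: int, d_out: int):
    """Hypothetical stand-in for a learned extractor: random linear map + ReLU."""
    weight = rng.standard_normal((d_in, d_out)) * 0.1
    return lambda x: np.maximum(x @ weight, 0.0)

def one_to_many_relational(p, extract_local, extract_rest, relate):
    """p: (n, C) local feature vectors. Returns (n, c) local relational features."""
    n = p.shape[0]
    r = (p.sum(axis=0, keepdims=True) - p) / (n - 1)  # Equation 1
    p_enh = extract_local(p)                          # enhanced local features
    r_enh = extract_rest(r)                           # rest part enhanced average features
    concat = np.concatenate([p_enh, r_enh], axis=1)   # T: concatenation, shape (n, 2c)
    return p_enh + relate(concat)                     # Equation 2

C, c, n = 8, 4, 6  # hypothetical sizes with c <= C
p = rng.standard_normal((n, C))
q = one_to_many_relational(
    p, make_layer(C, c), make_layer(C, c), make_layer(2 * c, c)
)
print(q.shape)  # (6, 4)
```

    The final addition acts as a skip connection, so each local relational feature keeps its own enhanced local feature and adds the relation to the remaining parts on top of it.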

    [0056] Since the one-to-many relational reasoning unit 140 basically acquires a plurality of local relational features (q.sub.1˜q.sub.n) based on a plurality of local feature vectors (p.sub.1˜p.sub.n), it is possible to robustly extract features of a person even when a part of a person's body is missing or occlusion occurs due to being blocked from view.

    [0057] In the one-to-many relational reasoning unit 140, each of the local enhanced feature extracting unit 141, the rest part average feature extracting unit 143 and the concatenated local feature extracting unit 145 may be implemented as a convolutional neural network, for example.

    [0058] Meanwhile, the global contrastive pooling unit 150, which is shown as (g) in FIG. 3, receives a plurality of local feature vectors (p.sub.1˜p.sub.n) from the feature dividing unit 130, like the one-to-many relational reasoning unit 140, and performs global contrastive pooling to express the difference between the max sampling result and the average sampling result of all the received plurality of local feature vectors (p.sub.1˜p.sub.n) according to the pre-learned pattern estimation method. By performing global contrastive pooling, the global contrastive pooling unit 150 acquires one global contrastive feature (q.sub.0), as shown in (h) of FIG. 3, from a plurality of local feature vectors (p.sub.1˜p.sub.n).

    [0059] Here, the global contrastive pooling unit 150 may acquire one global contrastive feature (q.sub.0) having a size of 1×1×c (wherein, c≤C) from a plurality of local feature vectors (p.sub.1˜p.sub.n) having a size of 1×1×C.

    [0060] Referring to FIG. 5, the global contrastive pooling unit 150 may include a global max sampling unit 151, a global average sampling unit 152, a contrastive feature acquiring unit 153, an enhanced maximum feature extracting unit 154, an enhanced contrastive feature extracting unit 155, an enhanced global feature concatenating unit 156, a concatenated global feature extracting unit 157 and a global contrastive feature acquiring unit 158.

    [0061] The global max sampling unit 151 acquires a global maximum feature (p.sub.max) by performing global max pooling on all of the plurality of local feature vectors (p.sub.1˜p.sub.n). Meanwhile, the global average sampling unit 152 acquires a global average feature (p.sub.avg) by performing global average pooling on all of the plurality of local feature vectors (p.sub.1˜p.sub.n).

    [0062] The contrastive feature acquiring unit 153 acquires a contrastive feature (p.sub.cont) by calculating the difference between the global maximum feature (p.sub.max) and the global average feature (p.sub.avg). That is, it acquires the contrastive feature (p.sub.cont) by calculating the difference between the maximum value and the average value of the plurality of local feature vectors (p.sub.1˜p.sub.n).

    [0063] The enhanced maximum feature extracting unit 154 receives the global maximum feature (p.sub.max), and extracts a feature according to the pre-learned pattern estimation method, thereby acquiring an enhanced global maximum feature (p.sub.max). The enhanced contrastive feature extracting unit 155 receives the contrastive feature (p.sub.cont), and extracts a feature according to the pre-learned pattern estimation method, thereby acquiring an enhanced contrastive feature (p.sub.cont).

    [0064] The enhanced global feature concatenating unit 156 generates a concatenated global feature by concatenating the enhanced global maximum feature (p.sub.max) and the enhanced contrastive feature (p.sub.cont), and the concatenated global feature extracting unit 157 acquires an enhanced concatenated global feature by extracting a feature of the concatenated global feature according to a pre-learned pattern estimation method.

    [0065] The global contrastive feature acquiring unit 158 acquires a global contrastive feature (q.sub.0) by adding the enhanced global maximum feature (p.sub.max) acquired by the enhanced maximum feature extracting unit 154 and the enhanced concatenated global feature acquired by the concatenated global feature extracting unit 157.

    [0066] A method, in which the global contrastive pooling unit 150 reflects the contrastive value representing the difference between the maximum value and the average value of the plurality of local feature vectors (p.sub.1˜p.sub.n) to the maximum value of the plurality of local feature vectors (p.sub.1˜p.sub.n) and thus acquires a global contrastive feature (q.sub.0), can be expressed as Equation 3.


    q.sub.0=p.sub.max+R.sub.g(T(p.sub.max,p.sub.cont))  [Equation 3]

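    [0067] (wherein, T is a combining function representing the concatenation of features, and R.sub.g is a relational function mathematically expressing the concatenated global feature extracting unit 157 in which a pattern estimation method has been learned. On the right-hand side, p.sub.max and p.sub.cont denote the enhanced global maximum feature and the enhanced contrastive feature, respectively.)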
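    A minimal sketch of the global contrastive pooling of Equation 3 follows. The learned extractors are again replaced by hypothetical random linear maps with a ReLU, the pooling is taken element-wise across the n local feature vectors, and the contrastive feature is computed as maximum minus average. The disclosure only states that a difference is calculated, so the sign convention here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(d_in: int, d_out: int):
    """Hypothetical stand-in for a learned extractor: random linear map + ReLU."""
    weight = rng.standard_normal((d_in, d_out)) * 0.1
    return lambda x: np.maximum(x @ weight, 0.0)

def global_contrastive_pooling(p, extract_max, extract_cont, relate):
    """p: (n, C) local feature vectors. Returns one global feature of size c."""
    p_max = p.max(axis=0)                          # global maximum feature
    p_avg = p.mean(axis=0)                         # global average feature
    p_cont = p_max - p_avg                         # contrastive feature (sign assumed)
    max_enh = extract_max(p_max)                   # enhanced global maximum feature
    cont_enh = extract_cont(p_cont)                # enhanced contrastive feature
    concat = np.concatenate([max_enh, cont_enh])   # T: concatenation, size 2c
    return max_enh + relate(concat)                # Equation 3

C, c, n = 8, 4, 6  # hypothetical sizes with c <= C
p = rng.standard_normal((n, C))
q0 = global_contrastive_pooling(
    p, make_layer(C, c), make_layer(C, c), make_layer(2 * c, c)
)
print(q0.shape)  # (4,)
```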

    [0068] The global contrastive pooling unit 150 acquires a global contrastive feature (q.sub.0) on the basis of the relationship between the maximum value and the average value of a plurality of local feature vectors (p.sub.1˜p.sub.n) for the following reason. When max pooling is performed on a plurality of local feature vectors (p.sub.1˜p.sub.n), it has the advantage of being able to extract the most essential feature from the entire image, but the variety of features that can be expressed is limited. On the other hand, when average pooling is performed on a plurality of local feature vectors (p.sub.1˜p.sub.n), the proportion of unnecessary information included in the feature increases.

    [0069] Accordingly, the global contrastive pooling unit 150 according to the present embodiment applies contrastive pooling that adds the difference between max pooling and average pooling on a plurality of local feature vectors (p.sub.1˜p.sub.n) and the max pooling result, such that it is possible to increase the diversity of feature expression and at the same time prevent unnecessary information from being excessively included in the feature.

    [0070] In the global contrastive pooling unit 150, each of the enhanced maximum feature extracting unit 154, the enhanced contrastive feature extracting unit 155 and the concatenated global feature extracting unit 157 may be implemented as a convolutional neural network, for example.

    [0071] The person re-identification unit 160 receives a plurality of local relational features (q.sub.1˜q.sub.n) acquired by the one-to-many relational reasoning unit 140 and a global contrastive feature (q.sub.0) as a final descriptor, and re-identifies a person included in the image using the received final descriptor (q.sub.0˜q.sub.n).

    [0072] The person re-identification unit 160 may acquire and store in advance a reference descriptor that is a final descriptor (q.sub.0˜q.sub.n) for an image including a person to be searched. Then, when a final descriptor (q.sub.0˜q.sub.n) is acquired for a re-identification image, for which it has to be determined whether or not the person to be searched is included, the person re-identification unit 160 may re-identify a person included in the re-identification image by analyzing the similarity between the final descriptor (q.sub.0˜q.sub.n) for the re-identification image and the reference descriptor.

    [0073] For example, the person re-identification unit 160, if the degree of similarity between the final descriptor (q.sub.0˜q.sub.n) and the reference descriptor is equal to or higher than a pre-designated reference degree of similarity, may determine that a person to be searched is included in the re-identification image, and, if it is lower than the reference degree of similarity, may determine that a person to be searched is not included.
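    The threshold decision described above can be sketched as follows. The disclosure does not name a similarity measure, so this example assumes cosine similarity over the flattened final descriptor; the function name and the reference degree of similarity of 0.8 are hypothetical.

```python
import numpy as np

def is_person_included(descriptor, reference, threshold):
    """Compare a final descriptor with a reference descriptor.

    Both inputs have shape (n + 1, c), holding q_0 to q_n. Cosine similarity
    is an assumed measure; the source only refers to a degree of similarity.
    """
    a = np.asarray(descriptor, dtype=float).ravel()
    b = np.asarray(reference, dtype=float).ravel()
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0.0:
        raise ValueError("descriptors must be non-zero")
    similarity = float(a @ b / norm)
    # Included when the similarity reaches the reference degree of similarity.
    return similarity >= threshold, similarity

reference = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = reference + 0.05          # slightly perturbed copy of the reference
included, similarity = is_person_included(query, reference, threshold=0.8)
print(included)  # True
```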

    [0074] Meanwhile, the person re-identification device according to the present embodiment may further include a learning unit 170. The learning unit 170 is a configuration for making the one-to-many relational reasoning unit 140 and the global contrastive pooling unit 150 learn, and may be omitted when learning is completed.

    [0075] As described above, during learning of the person re-identification device, a plurality of learning images labeled in advance with a person's identifier are applied.

    [0076] In this embodiment, the learning unit 170 may calculate the loss (L) as in Equation 4 based on the triplet loss (L.sub.triplet) and the cross-entropy loss (L.sub.ce), which are already known losses in the field of artificial neural networks.


    $$\mathcal{L} = \mathcal{L}_{triplet} + \lambda \mathcal{L}_{ce}$$  [Equation 4]

    [0077] (wherein, λ represents a loss weight.)

    [0078] The cross-entropy loss (L.sub.ce) in Equation 4 is defined by Equation 5.

    \mathcal{L}_{\text{ce}} = -\sum_{n=1}^{N} \sum_{i} y^{n} \log \hat{y}_{i}^{n}  [Equation 5]

    [0079] (wherein, N denotes the number of images in a mini-batch, and y.sup.n denotes an identifier labeled in the learning image. In addition, ŷ.sub.i.sup.n is an identifier predicted for the final descriptor (q.sub.i) and is defined by Equation 6.)

    \hat{y}_{i}^{n} = \underset{c \in K}{\arg\max} \frac{\exp((w_{i}^{c})^{T} q_{i})}{\sum_{k=1}^{K} \exp((w_{i}^{k})^{T} q_{i})}  [Equation 6]

    [0080] (wherein, K is the number of identification labels, and w.sub.i.sup.k denotes the classifier for the final descriptor (q.sub.i) and the identification label (k).)
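
    As a rough illustration of Equations 5 and 6, the sketch below sums the negative log of the softmax score of the labeled identifier over every descriptor part (q.sub.i) and every image in a mini-batch. It assumes the label is given as a class index (one-hot y.sup.n), that each classifier is a list of K weight vectors without bias, and that the predicted identifier is the class with the highest softmax score. The function names are hypothetical and not taken from the disclosure.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(classifier, q_i):
    """Softmax over (w_i^k)^T q_i for k = 1..K; classifier holds K weight vectors."""
    logits = [sum(w * x for w, x in zip(w_k, q_i)) for w_k in classifier]
    return softmax(logits)


def predict_identifier(classifier, q_i):
    """Index of the identification label with the highest softmax score."""
    scores = class_scores(classifier, q_i)
    return max(range(len(scores)), key=scores.__getitem__)


def cross_entropy_loss(batch_descriptors, labels, classifiers):
    """Sum of -log(score of the labeled identifier) over images n and parts i."""
    loss = 0.0
    for descriptor, label in zip(batch_descriptors, labels):
        for q_i, classifier in zip(descriptor, classifiers):
            loss -= math.log(class_scores(classifier, q_i)[label])
    return loss


# One image, one descriptor part, two identification labels.
toy_classifier = [[1.0, 0.0], [0.0, 1.0]]
toy_batch = [[[2.0, 0.0]]]
toy_loss = cross_entropy_loss(toy_batch, [0], [toy_classifier])
```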

    [0081] Meanwhile, triplet loss (L.sub.triplet) is defined by Equation 7.

    \mathcal{L}_{\text{triplet}} = \sum_{k=1}^{N_K} \sum_{m=1}^{N_M} \left[ \alpha + \max_{n=1 \ldots M} \lVert q_{k,m}^{A} - q_{k,n}^{P} \rVert_{2} - \min_{\substack{l=1 \ldots K \\ n=1 \ldots N \\ l \neq k}} \lVert q_{k,m}^{A} - q_{l,n}^{N} \rVert_{2} \right]_{+}  [Equation 7]

    [0082] (wherein, N.sub.K is the number of identifiers in a mini-batch, N.sub.M is the number of images for each identifier (so that N=N.sub.KN.sub.M), and α is a margin variable that controls the distance between positive and negative pairs in the feature space. In addition, q.sup.A.sub.i,j, q.sup.P.sub.i,j, and q.sup.N.sub.i,j denote the person representations of the anchor, positive, and negative images, respectively, where i and j denote an identifier index and an image index.)
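
    Equation 7 can be read as follows: for each anchor, the farthest image with the same identifier and the nearest image with a different identifier are selected within the mini-batch, and the hinge [.].sub.+ is applied to the margin plus the difference of the two distances. The sketch below is a minimal illustration of that reading. It assumes the mini-batch is arranged as features[k][m] (identifier k, image m), that each identifier has the same number of images, and that the margin value 0.3 is only a placeholder. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """L2 distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(features, margin=0.3):
    """features[k][m] is the person representation of image m of identifier k."""
    loss = 0.0
    for k, anchors in enumerate(features):
        for anchor in anchors:
            # Hardest positive: farthest image sharing the anchor's identifier.
            hardest_positive = max(euclidean(anchor, positive) for positive in anchors)
            # Hardest negative: nearest image of any other identifier (l != k).
            hardest_negative = min(
                euclidean(anchor, negative)
                for l, others in enumerate(features)
                if l != k
                for negative in others
            )
            # Hinge [x]_+ = max(0, x).
            loss += max(0.0, margin + hardest_positive - hardest_negative)
    return loss


# Two identifiers with two images each.
well_separated = [[[0.0, 0.0], [0.0, 1.0]], [[5.0, 0.0], [5.0, 1.0]]]
overlapping = [[[0.0, 0.0], [0.0, 1.0]], [[0.5, 0.0], [0.5, 1.0]]]

loss_separated = batch_hard_triplet_loss(well_separated)
loss_overlapping = batch_hard_triplet_loss(overlapping)
```

    With well-separated identifiers every hinge term is zero, whereas the overlapping toy batch yields a positive loss that backpropagation would reduce.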

    [0083] When the loss (L) is calculated according to Equations 4 to 7, the learning unit 170 may backpropagate the calculated loss to the one-to-many relational reasoning unit 140 and the global contrastive pooling unit 150, thereby making them learn.

    [0084] FIG. 6 shows a person re-identification method according to an embodiment of the present disclosure.

    [0085] Referring to FIG. 2 to FIG. 5, the person re-identification method according to the present embodiment is described as follows. First, a plurality of images including a person to be re-identified are acquired (S11). However, in the case of a learning stage in which learning is performed, a plurality of learning images in which identifiers of an included person are pre-labeled are acquired.

    [0086] Then, a 3D feature vector is acquired by extracting a feature from each of the acquired images according to a pre-learned pattern estimation method (S12). When the 3D feature vector is acquired, a plurality of local feature vectors (p.sub.1˜p.sub.n) are acquired by dividing the 3D feature vector into a pre-designated size unit (S13).
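
    The division in step S13 can be pictured with the sketch below. It is an assumption-laden illustration only: the 3D feature vector is taken to be a nested list feature_map[c][h][w], the pre-designated size unit is assumed to be an equal-height horizontal stripe, and each stripe is reduced to one vector by max pooling. The function name split_into_local_features is hypothetical.

```python
def split_into_local_features(feature_map, num_parts):
    """Split feature_map[c][h][w] into num_parts horizontal stripes.

    Each stripe is max-pooled per channel, giving one local feature vector
    of length C per stripe (p_1 .. p_n).
    """
    height = len(feature_map[0])
    if height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts")
    stripe_height = height // num_parts
    local_features = []
    for part in range(num_parts):
        rows = range(part * stripe_height, (part + 1) * stripe_height)
        # Maximum over all rows and columns of the stripe, channel by channel.
        local_features.append(
            [max(max(channel[h]) for h in rows) for channel in feature_map]
        )
    return local_features


# Toy feature map with C=2 channels, H=4 rows, W=2 columns.
toy_feature_map = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
    [[8.0, 7.0], [6.0, 5.0], [4.0, 3.0], [2.0, 1.0]],
]
toy_local_features = split_into_local_features(toy_feature_map, num_parts=2)
```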

    [0087] Thereafter, the relationship between each of the plurality of local feature vectors (p.sub.1˜p.sub.n) acquired according to the pre-learned method and the remaining local feature vectors is estimated, and a plurality of local relational features (q.sub.1˜q.sub.n) which are enhanced local feature vectors are acquired by reflecting the estimated relationship to each of the plurality of local feature vectors (p.sub.1˜p.sub.n) (S14).

    [0088] The plurality of local relational features (q.sub.1˜q.sub.n) can be acquired as follows. Enhanced local features (p.sub.1˜p.sub.n) are acquired by sequentially extracting a feature for each of the plurality of local feature vectors (p.sub.1˜p.sub.n), and rest part enhanced average features (r.sub.i) are acquired by extracting a feature for the average pooling result of the remaining local feature vectors, that is, the local feature vectors other than the one currently being processed. Each enhanced local feature is then concatenated with the corresponding rest part enhanced average feature (r.sub.i), a feature is extracted again from the concatenated result, and the corresponding enhanced local feature (p.sub.1˜p.sub.n) is added to that result.
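
    The data flow of step S14 can be sketched as below. This is a simplified illustration, not the trained network: the learned feature extractors are replaced by a stand-in linear_relu helper with hand-picked weights, and the names enhance_local, enhance_rest, and relate are hypothetical. Whether the extractors share weights across parts is not assumed either way; a single callable is used here only for brevity.

```python
def average(vectors):
    """Element-wise average of equal-length vectors (average pooling)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def linear_relu(weights):
    """Stand-in for a learned feature extractor: x -> ReLU(W x)."""
    def apply(x):
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
    return apply


def local_relational_features(local_features, enhance_local, enhance_rest, relate):
    """Compute q_1..q_n from p_1..p_n by one-to-many relational reasoning."""
    outputs = []
    for i, p_i in enumerate(local_features):
        rest = local_features[:i] + local_features[i + 1:]  # all parts except p_i
        p_bar = enhance_local(p_i)            # enhanced local feature
        r_bar = enhance_rest(average(rest))   # rest part enhanced average feature
        relation = relate(p_bar + r_bar)      # list concatenation, then extraction
        # Add the enhanced local feature back to the extracted relation.
        outputs.append([a + b for a, b in zip(p_bar, relation)])
    return outputs


identity = linear_relu([[1.0, 0.0], [0.0, 1.0]])
# Hand-picked 2x4 weights: keep the first and the last concatenated element.
pick = linear_relu([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]])

toy_parts = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_relational = local_relational_features(toy_parts, identity, identity, pick)
```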

    [0089] In addition, a global contrastive feature (q.sub.0) is acquired by performing global contrastive pooling in which the relationship between the maximum feature and the average feature of the entire plurality of local feature vectors (p.sub.1˜p.sub.n) acquired according to a pre-learned method is reflected back to the maximum feature (S15).

    [0090] The global contrastive feature (q.sub.0) can be acquired as follows. Max pooling and average pooling are performed on all of the plurality of local feature vectors (p.sub.1˜p.sub.n). An enhanced contrastive feature (p.sub.cont) is acquired by extracting a feature of a contrastive feature, which is the difference between the max pooling result and the average pooling result, and an enhanced global maximum feature (p.sub.max) is acquired by extracting a feature of the max pooling result. A feature is then extracted from the result of concatenating the enhanced contrastive feature (p.sub.cont) and the enhanced global maximum feature (p.sub.max), and the enhanced global maximum feature (p.sub.max) is added to that result.
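
    The global contrastive pooling of step S15 can be sketched in the same spirit. The sketch assumes that max pooling and average pooling are taken element-wise across the local feature vectors, and it replaces the learned feature extractors with a stand-in linear_relu helper using hand-picked weights. The names enhance_max, enhance_cont, and relate are hypothetical.

```python
def linear_relu(weights):
    """Stand-in for a learned feature extractor: x -> ReLU(W x)."""
    def apply(x):
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
    return apply


def global_contrastive_feature(local_features, enhance_max, enhance_cont, relate):
    """Compute q_0 from p_1..p_n by global contrastive pooling."""
    count = len(local_features)
    p_max = [max(column) for column in zip(*local_features)]          # max pooling
    p_avg = [sum(column) / count for column in zip(*local_features)]  # average pooling
    contrast = [m - a for m, a in zip(p_max, p_avg)]  # max minus average
    p_max_bar = enhance_max(p_max)        # enhanced global maximum feature
    p_cont_bar = enhance_cont(contrast)   # enhanced contrastive feature
    relation = relate(p_cont_bar + p_max_bar)  # list concatenation, then extraction
    # Reflect the relation back onto the maximum feature.
    return [a + b for a, b in zip(p_max_bar, relation)]


identity = linear_relu([[1.0, 0.0], [0.0, 1.0]])
# Hand-picked 2x4 weights: keep the first and the last concatenated element.
pick = linear_relu([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]])

toy_parts = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_q0 = global_contrastive_feature(toy_parts, identity, identity, pick)
```

    Subtracting the average from the maximum keeps what distinguishes the strongest response from the typical response, which is why the contrast, rather than the average itself, is combined with the maximum feature.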

    [0091] Then, the acquired global contrastive feature (q.sub.0) and the plurality of local relational features (q.sub.1˜q.sub.n) are combined into the final descriptor (q.sub.0˜q.sub.n) (S16).

    [0092] When the final descriptor for the acquired image is acquired, it is determined whether or not it is a learning stage (S17). If it is not a learning stage, the similarity is analyzed by comparing the acquired final descriptors (q.sub.0˜q.sub.n) with the reference descriptor, which is the final descriptor (q.sub.0˜q.sub.n) acquired in advance from an image including a person to be searched (S18).

    [0093] Then, according to the similarity analysis result, it is determined whether or not the person to be searched is included in the acquired image to re-identify the person (S19).

    [0094] On the other hand, if it is determined that it is a learning stage, the loss (L) is calculated according to Equations 4 to 7 using the acquired final descriptor (q.sub.0˜q.sub.n) and the identifier labeled in the learning image (S20). Then, learning is performed by backpropagating the calculated loss (S21).

    [0095] A method according to the present disclosure can be implemented as a computer program stored in a medium for execution on a computer. Here, the computer-readable medium can be an arbitrary medium available for access by a computer, where examples can include all types of computer storage media. Examples of a computer storage medium can include volatile and non-volatile, detachable and non-detachable media implemented based on an arbitrary method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data, and can include ROM (read-only memory), RAM (random access memory), CD-ROMs, DVD-ROMs, magnetic tapes, floppy disks, optical data storage devices, etc.

    [0096] While the present disclosure is described with reference to embodiments illustrated in the drawings, these are provided as examples only, and the person having ordinary skill in the art would understand that many variations and other equivalent embodiments can be derived from the embodiments described herein.

    [0097] Therefore, the true technical scope of the present disclosure is to be defined by the technical spirit set forth in the appended scope of claims.