MULTI-TASK DEEP HASH LEARNING-BASED RETRIEVAL METHOD FOR MASSIVE LOGISTICS PRODUCT IMAGES

20220414144 · 2022-12-29


    Abstract

    The present disclosure provides a multi-task deep Hash learning-based retrieval method for massive logistics product images. According to the idea of multi-tasking, Hash codes of a plurality of lengths can be learned simultaneously as high-level image representations. Compared with single-tasking in the prior art, the method overcomes shortcomings such as waste of hardware resources and high time cost caused by model retraining under single-tasking. Compared with the traditional idea of learning a single Hash code as an image representation and using it for retrieval, in the present disclosure, information association among Hash codes of a plurality of lengths is mined, and a mutual information loss is designed to improve the representational capacity of the Hash codes, which addresses the poor representational capacity of a single Hash code and thus improves the retrieval performance of Hash codes.

    Claims

    1. A multi-task deep Hash learning-based retrieval method for massive logistics product images, comprising the following steps: a) conducting image preprocessing on an input logistics product image x.sub.i, and constructing a similarity matrix S among logistics product images according to a label of the image x.sub.i; b) conducting convolution and pooling on the preprocessed logistics product image to obtain a one-dimensional feature vector h.sub.img of the image, and taking the one-dimensional feature vector h.sub.img as a low-level image feature; c) inputting the low-level image feature h.sub.img to a multi-branch network to obtain a high-level image representation B.sub.k indicated by Hash codes of a plurality of lengths, wherein the multi-branch network is composed of N branches of a same structure; d) calculating a similarity loss function SI.sub.Loss by formula $SI_{Loss}=Loss(s_{ij}, b_i b_j^T)=-\frac{1}{n}\sum_{n=0}^{1000}\left(s_{ij}\, b_i b_j^T-\log\left(1+e^{b_i b_j^T}\right)\right)$, wherein s.sub.ij denotes similarity between an ith image and a jth image, s.sub.ij∈{1,0}, the value of s.sub.ij being 1 indicates the ith image is similar to the jth image, the value of s.sub.ij being 0 indicates the ith image is not similar to the jth image, b.sub.i denotes a binary Hash code regarding data of the ith image, b.sub.j denotes a binary Hash code regarding data of the jth image, and T denotes transposition; e) calculating a mutual information loss function MI.sub.Loss by formula $MI_{Loss}=Loss(B_k, W_k^T B_{k+1})+\gamma_k\lVert W_k\rVert_1=\sum_{k=0}^{N-1} a_k\lVert B_k-W_k^T B_{k+1}\rVert_1+\sum_{k=0}^{N-1}\gamma_k\lVert W_k\rVert_1$, wherein B.sub.k denotes a Hash code output from a kth branch, k∈{0, . . . , N−1}, B.sub.k+1 denotes a Hash code output from a (k+1)th branch, W.sub.k denotes a mapping matrix for mapping the Hash code output from the kth branch to the Hash code output from the (k+1)th branch, γ.sub.k denotes a regularization parameter, ∥⋅∥.sub.1 denotes an L1 norm, and a.sub.k denotes an optimization parameter; f) optimizing the similarity loss function SI.sub.Loss and the mutual information loss function MI.sub.Loss using a stochastic gradient descent algorithm, and after optimization, repeating Step a) to Step e) at least M times to obtain a trained model; g) inputting image data in a database to the trained model in Step f) to obtain a binary Hash code representation B.sub.database of different lengths for each image; h) inputting an image to be retrieved img.sub.query to the trained model in Step f) to obtain a binary Hash code representation B.sub.query of the image to be retrieved img.sub.query; and i) calculating a Hamming distance Dist.sub.Hamming by formula Dist.sub.Hamming=∥B.sub.query⊕B.sub.database∥, and returning, based on the calculated Hamming distance Dist.sub.Hamming, mean average precision of a query set of all images to be retrieved in a measurement manner of Average Precision to complete similarity retrieval.

    2. The multi-task deep Hash learning-based retrieval method for massive logistics product images according to claim 1, wherein there are five convolution layers in Step b), each of the convolution layers is connected to a pooling layer and adopts a convolution kernel with a size of 3*3, each of the pooling layers adopts a pooling kernel with a size of 2*2, and both the convolution layers and the pooling layers apply a ReLU activation function.

    3. The multi-task deep Hash learning-based retrieval method for massive logistics product images according to claim 1, wherein the multi-branch network in Step c) is composed of N branches of a same structure, and each branch is composed of three fully connected layers connected in series with one another.

    4. The multi-task deep Hash learning-based retrieval method for massive logistics product images according to claim 1, wherein N in Step c) is a positive integer.

    5. The multi-task deep Hash learning-based retrieval method for massive logistics product images according to claim 1, wherein M in Step f) is 5000.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0022] FIG. 1 is a flowchart of a method for multi-task feature extraction according to the present disclosure; and

    [0023] FIG. 2 is a flowchart of a method for Hash code learning according to the present disclosure.

    DETAILED DESCRIPTION OF THE EMBODIMENTS

    [0024] The present disclosure is further described with reference to FIG. 1 and FIG. 2.

    [0025] A multi-task deep Hash learning-based retrieval method for massive logistics product images, including the following steps: a) Conduct image preprocessing on an input logistics product image x.sub.i, and construct a similarity matrix S among logistics product images according to a label of the image x.sub.i.
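    [0025a] For illustration, the similarity matrix S of Step a) can be built directly from the image labels. The following sketch (function name and single-label assumption are the editor's, not from the disclosure) marks a pair of images as similar when their labels match:

```python
import numpy as np

def build_similarity_matrix(labels):
    """S[i, j] = 1 when images i and j carry the same label, else 0.

    A minimal sketch assuming single-label images; multi-label data
    would instead test whether the two label sets intersect.
    """
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(np.int8)

# Images 0 and 2 share a label, so S[0, 2] = S[2, 0] = 1.
S = build_similarity_matrix([0, 1, 0])
```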

    [0026] b) Conduct convolution and pooling on the preprocessed logistics product image: by stacking a certain quantity of convolution kernels and pooling kernels and processing the image data, obtain a one-dimensional feature vector h.sub.img of the image, which serves as a low-level image feature.
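    [0026a] The convolution-and-pooling pipeline of Step b) can be sketched in miniature. This is an illustrative single-channel NumPy version (not the disclosed network), using the 3*3 convolution kernel and 2*2 pooling kernel sizes stated in the preferred embodiment; the demo image and identity kernel are the editor's assumptions:

```python
import numpy as np

def conv3x3_relu(img, kernel):
    """Valid 3x3 convolution (stride 1) followed by ReLU, single channel."""
    H, W = img.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = (img[i:i + 3, j:j + 3] * kernel).sum()
    return np.maximum(out, 0.0)

def maxpool2x2(x):
    """2x2 max pooling with stride 2 (odd edges truncated)."""
    H, W = x.shape
    return x[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

# Stacking several conv/pool stages and flattening yields the low-level feature h_img.
img = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.zeros((3, 3)); kernel[1, 1] = 1.0   # identity kernel for the demo
h_img = maxpool2x2(conv3x3_relu(img, kernel)).ravel()
```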

    [0027] c) Adopt a hard-parameter-sharing network: the low-level feature networks have the same structure and share parameters, while the high-level feature networks have the same structure but the parameters of each branch network are differentiated according to the high-level features it generates. Input the low-level image feature h.sub.img to the multi-branch network to obtain a high-level image representation B.sub.k indicated by Hash codes of a plurality of lengths, where the multi-branch network is composed of N branches of a same structure.
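    [0027a] A minimal sketch of the multi-branch head in Step c), assuming N = 3 branches of three fully connected layers each as in the preferred embodiment; the layer widths, weight scales, and code lengths are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(h_img, hash_bits):
    """One branch: three fully connected layers, then sign() binarization."""
    d = h_img.shape[-1]
    w1 = rng.standard_normal((d, 512)) * 0.05     # illustrative widths
    w2 = rng.standard_normal((512, 256)) * 0.05
    w3 = rng.standard_normal((256, hash_bits)) * 0.05
    h = np.maximum(h_img @ w1, 0.0)   # fully connected layer 1 + ReLU
    h = np.maximum(h @ w2, 0.0)       # fully connected layer 2 + ReLU
    return np.sign(h @ w3)            # hash code B_k with entries in {-1, +1}

h_img = rng.standard_normal((4, 1024))              # shared low-level feature
B = [branch(h_img, bits) for bits in (16, 32, 48)]  # one code length per branch
```

    Every branch consumes the same shared feature h_img, which is the hard-parameter-sharing arrangement described above.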

    [0028] d) Calculate a similarity loss function SI.sub.Loss by formula

    [00003] $SI_{Loss}=Loss(s_{ij}, b_i b_j^T)=-\frac{1}{n}\sum_{n=0}^{1000}\left(s_{ij}\, b_i b_j^T-\log\left(1+e^{b_i b_j^T}\right)\right),$

    where s.sub.ij denotes similarity between an ith image and a jth image, s.sub.ij∈{1,0}, the value of s.sub.ij being 1 indicates the ith image is similar to the jth image, the value of s.sub.ij being 0 indicates the ith image is not similar to the jth image, b.sub.i denotes a binary Hash code regarding data of the ith image, b.sub.j denotes a binary Hash code regarding data of the jth image, and T denotes transposition. This formula mainly establishes a relationship between Hash codes and the similarity of the original samples: if the original samples are similar, the corresponding Hash codes should be as similar as possible; and if the original samples are not similar, the corresponding Hash codes should not be similar.
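    [0028a] The similarity loss of Step d) can be written down directly. The sketch below is a hedged NumPy illustration, not the disclosed training code: it operates on relaxed (real-valued) codes as is typical before binarization, and computes log(1 + e^x) stably via logaddexp:

```python
import numpy as np

def similarity_loss(S, B):
    """SI_Loss = -(1/n) * sum( s_ij * <b_i, b_j> - log(1 + e^{<b_i, b_j>}) )."""
    inner = B @ B.T                           # pairwise inner products b_i b_j^T
    terms = S * inner - np.logaddexp(0.0, inner)  # stable log(1 + e^x)
    return -terms.mean()

B = np.array([[1.0, 1.0], [1.0, 1.0]])            # two identical codes
loss_sim = similarity_loss(np.ones((2, 2)), B)    # pairs labelled similar
loss_dis = similarity_loss(np.zeros((2, 2)), B)   # pairs labelled dissimilar
```

    As the paragraph above requires, identical codes labelled similar incur a much smaller loss than the same codes labelled dissimilar.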

    [0029] e) Calculate a mutual information loss function MI.sub.Loss by formula

    [00004] $MI_{Loss}=Loss(B_k, W_k^T B_{k+1})+\gamma_k\lVert W_k\rVert_1=\sum_{k=0}^{N-1} a_k\lVert B_k-W_k^T B_{k+1}\rVert_1+\sum_{k=0}^{N-1}\gamma_k\lVert W_k\rVert_1,$

    where B.sub.k denotes a Hash code output from a kth branch, k∈{0, . . . , N−1}, B.sub.k+1 denotes a Hash code output from a (k+1)th branch, W.sub.k denotes a mapping matrix for mapping the Hash code output from the kth branch to the Hash code output from the (k+1)th branch, γ.sub.k denotes a regularization parameter, ∥⋅∥.sub.1 denotes an L1 norm, and a.sub.k denotes an optimization parameter. Generally speaking, the length of a Hash code is positively correlated with its representational capacity. The purpose of minimizing the mutual information loss MI.sub.Loss is to draw the representational capacity of a shorter Hash code closer to that of a longer Hash code and to further enhance the correlation among the plurality of Hash codes, so that the Hash codes learned have good representational capacity and Hash code retrieval is improved.
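    [0029a] The mutual information loss of Step e) can be sketched as follows. This is a hedged illustration using a row convention, in which B_k is n×L_k, so W_k^T B_{k+1} becomes B_{k+1} @ W_k with W_k of shape (L_{k+1}, L_k); the sample codes and coefficients are the editor's assumptions:

```python
import numpy as np

def mutual_information_loss(B_list, W_list, alpha, gamma):
    """MI_Loss = sum_k a_k ||B_k - W_k^T B_{k+1}||_1 + sum_k gamma_k ||W_k||_1."""
    loss = 0.0
    for k in range(len(B_list) - 1):
        approx = B_list[k + 1] @ W_list[k]   # map branch k+1's code back to branch k
        loss += alpha[k] * np.abs(B_list[k] - approx).sum()  # L1 reconstruction term
        loss += gamma[k] * np.abs(W_list[k]).sum()           # L1 sparsity penalty
    return loss

B0 = np.array([[1.0, -1.0]])                 # 2-bit code from branch 0
B1 = np.array([[1.0, -1.0, 1.0, 1.0]])       # 4-bit code from branch 1
W0 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0],
               [0.0, 0.0]])                  # mapping that reconstructs B0 exactly
loss = mutual_information_loss([B0, B1], [W0], alpha=[1.0], gamma=[0.1])
```

    With this exact mapping, the reconstruction term vanishes and only the sparsity penalty 0.1·∥W_0∥_1 = 0.2 remains, which is the sense in which the longer code "covers" the shorter one.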

    [0030] f) Optimize the similarity loss function SI.sub.Loss and the mutual information loss function MI.sub.Loss using a stochastic gradient descent algorithm, and after optimization, repeat Step a) to Step e) at least M times to obtain a trained model. g) Input image data in a database to the trained model in Step f) to obtain a binary Hash code representation B.sub.database of different lengths for each image. For example, the code lengths may take various combinations, such as [16 bits, 32 bits, 48 bits, 64 bits] or [128 bits, 256 bits, 512 bits].

    [0031] h) Input an image to be retrieved img.sub.query to the trained model in Step f) to obtain a binary Hash code representation B.sub.query of the image to be retrieved img.sub.query.

    [0032] i) Calculate a Hamming distance Dist.sub.Hamming by formula Dist.sub.Hamming=∥B.sub.query⊕B.sub.database∥, and return, based on the calculated Hamming distance Dist.sub.Hamming, the mean average precision of the query set of all images to be retrieved, measured by Average Precision, to complete similarity retrieval.
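    [0032a] For {−1, +1} codes, Steps h) and i) reduce to the identity d_H(b_q, b_i) = (L − b_q·b_i)/2, followed by an Average Precision computation over the ranked list. A hedged NumPy sketch with an assumed toy database:

```python
import numpy as np

def hamming_distance(b_query, B_database):
    """Hamming distances between one {-1,+1} query code and all database codes."""
    L = b_query.shape[-1]
    return (L - B_database @ b_query) / 2

def average_precision(dist, relevant):
    """AP of one query: rank by distance, average the precision at each relevant hit."""
    order = np.argsort(dist, kind="stable")
    rel = relevant[order].astype(bool)
    hits = np.cumsum(rel)
    return (hits[rel] / (np.flatnonzero(rel) + 1)).mean() if rel.any() else 0.0

b_q = np.array([1.0, 1.0, -1.0, -1.0])
B_db = np.array([[1.0, 1.0, -1.0, -1.0],    # distance 0, relevant
                 [1.0, 1.0, 1.0, 1.0],      # distance 2, irrelevant
                 [-1.0, -1.0, 1.0, 1.0]])   # distance 4, relevant
dist = hamming_distance(b_q, B_db)
ap = average_precision(dist, np.array([1, 0, 1]))
```

    Averaging AP over every image in the query set yields the mean average precision reported in Step i).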

    [0033] In the multi-task deep Hash learning-based retrieval method for massive logistics product images, the theory of multi-view learning is adopted to mine the potential relevance of Hash codes of different lengths. Hash codes of a plurality of lengths are essentially various feature representations of the original data in Hamming space. Associative learning of Hash codes of a plurality of lengths exploits the complementarity and correlation of these features, and this process can also be regarded as multi-level feature fusion of the same samples. Related theories of multi-feature fusion and multi-view learning provide a theoretical and technical guarantee for the feasibility of this method, which further improves the performance of Hashing retrieval.

    [0034] According to the idea of multi-tasking, Hash codes of a plurality of lengths can be learned simultaneously as high-level image representations. Compared with single-tasking in the prior art, the method overcomes shortcomings such as waste of hardware resources and high time cost caused by model retraining under single-tasking. Compared with the traditional idea of learning a single Hash code as an image representation and using it for retrieval, in the present disclosure, information association among Hash codes of a plurality of lengths is mined, and the mutual information loss is designed to improve the representational capacity of the Hash codes, which addresses the poor representational capacity of a single Hash code and thus improves the retrieval performance of Hash codes. Meanwhile, the model is based on end-to-end learning, that is, image feature extraction and Hash code learning are carried out simultaneously. Compared with the traditional linear Hash method, the model has an intuitive structure and is easy to migrate and deploy. The multi-task deep Hash learning-based image retrieval method can be well expanded to retrieval of massive images, and therefore has a broad prospect in image retrieval for masses of objects in the logistics industry.

    [0035] Table 1 provides a first simulation experiment result of the method of the present disclosure, measured by MAP. Test results on the NUS-WIDE data set show that the performance of multi-tasking is better than that of single Hash code learning, which verifies the rationality of the idea of multi-tasking.

    TABLE 1

    Method          24 bits  48 bits  64 bits  128 bits  256 bits
    DJMH-Single     0.73     0.78     0.79     0.827     0.833
    DJMH-Multiple   0.801    0.827    0.831    0.846     0.855

    [0036] Table 2 provides a second simulation experiment result of the method of the present disclosure, measured by MAP. The NUS-WIDE data set is further used to study the influence of the number of jointly learned Hash code lengths on a Hash code of any given length, and it is verified that learning more Hash codes at the same time also improves the retrieval performance of a Hash code of any given length (taking 24 bits as an example).

    TABLE 2

    Method                     24 bits  48 bits  64 bits  128 bits  256 bits
    DJMH-24, 48                0.755    0.777    —        —         —
    DJMH-24, 48, 64            0.777    0.8      0.806    —         —
    DJMH-24, 48, 64, 128       0.791    0.816    0.821    0.834     —
    DJMH-24, 48, 64, 128, 256  0.8      0.822    0.828    0.847     0.855

    [0037] Preferably, there are five convolution layers in Step b), each of the convolution layers is connected to a pooling layer and adopts a convolution kernel with a size of 3*3, each of the pooling layers adopts a pooling kernel with a size of 2*2, and both the convolution layers and the pooling layers apply a ReLU activation function.

    [0038] Preferably, the multi-branch network in Step c) is composed of N branches of a same structure, and each branch is composed of three fully connected layers connected in series with one another.

    [0039] Preferably, N in Step c) is a positive integer.

    [0040] Preferably, M in Step f) is 5000.

    [0041] Finally, it should be noted that the above descriptions are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Although the present disclosure is described in detail with reference to the foregoing embodiments, a person skilled in the art can still make modifications to the technical solutions described in the foregoing embodiments, or make equivalent replacement of some technical features therein. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principle of the present disclosure should be included within the protection scope of the present disclosure.