Structurally matching images by hashing gradient singularity descriptors
11640702 · 2023-05-02
Assignee
Inventors
Cpc classification
G06V30/424
PHYSICS
G06V10/7715
PHYSICS
G06V10/25
PHYSICS
G06V10/462
PHYSICS
G06F18/213
PHYSICS
G06V10/50
PHYSICS
G06V30/19127
PHYSICS
International classification
G06V10/25
PHYSICS
G06V30/424
PHYSICS
Abstract
The method of matching digital images of the same article in a data processor unit comprises the steps of: transforming each digital image of an article into a local divergence topographic map of the luminance gradient vector field; detecting singularities or extrema of local divergence in the luminance gradient vector field, such singularities corresponding to points of interest in said digital image; and, for each detected point of interest, encoding the values for the singularity of the gradient field that are located on a plurality of concentric rings centered on the point of interest so as to derive a digital data vector (210); and transforming said vector into a digital hash key (220) by means of a family of hash functions of the cosine Locality-Sensitive Hashing (LSH) type.
Claims
1. A method of sorting articles with a sorting conveyor, comprising obtaining first digital images of the articles with first cameras in an upstream portion of the sorting conveyor and obtaining second digital images of the articles with second cameras downstream from the first cameras, the second images constituting target images to be each matched with one of the first images by means of a data processor unit of a machine for processing the articles, said method comprising the following data processing steps: transforming each of the first and the second digital images into a local divergence topographic map of a luminance gradient vector field; from said topographic map, detecting singularities or extrema of local divergence in the luminance gradient vector field, such singularities corresponding to points of interest (POIs) in said each digital image; for each detected point of interest, encoding values for the corresponding singularity that are located on rings centered on the point of interest in the topographic map so as to derive a descriptor of singularity of the gradient field associated with said point of interest; and comparing said descriptors of singularity of one of the second digital images with said descriptors of singularity of one of the first digital images so as to assess graphical similarity between these two images in order to track and monitor the article corresponding to the second image throughout the sorting, so as to direct the article towards a sorting outlet; wherein the method further comprises the steps of: encoding the values for the singularity of the gradient field in the form of a digital data vector having a plurality of components that are associated with respective ones of a plurality of concentric rings centered on said point of interest, each component of the digital data vector that is associated with a ring being derived from computing the sum of the indices of singularity of the points of the ring in question; and transforming said digital data vector into at least one digital hash key by means of a family of hash functions of the cosine Locality-Sensitive Hashing (LSH) type, said hash key being representative of the singularity descriptor.
2. The method according to claim 1, wherein said transforming of the digital data vector comprises: multiplying the data vector by a hash matrix representing the families of hash functions of the cosine LSH type so as to produce a second digital data vector having positive or negative signed components; transforming each component of the second digital data vector into a binary value so as to produce a third vector having binary values; and computing a digital hash key value that corresponds to a conversion of the binary expression represented by the third vector.
3. The method according to claim 2, wherein the articles are postal articles, each of which bears delivery postal address information comprising symbols and characters of writing, and in that the concentric rings centered on the point of interest lie within a disk of singularity having a radius of 8 mm for an image resolution of 4 pixels/mm, and in that the vector of digital data associated with the rings centered on the point of interest is a vector having 32 components for 32 rings.
4. The method according to claim 3, wherein each function forming a hash family is a data vector having 32 components, each of which is chosen randomly and independently from among the values +1 and −1.
5. The method according to claim 4, wherein each hash family is a hash matrix formed by concatenating K hash functions, each of which is represented by the data vector.
6. The method according to claim 5, wherein K is chosen in the range 16 to 64.
7. The method according to claim 6, wherein L digital hash keys are computed for the descriptor of singularities on the basis of L families of hash functions, where L is chosen in the range 4 to 8.
8. The method according to claim 7, wherein a collision matrix is computed for the points of interest on the basis of the hash keys.
9. The method according to claim 8, wherein structural filtering of a search horizon is performed on the basis of said collision matrix.
10. The method according to claim 8, wherein structural clustering of the search horizon is performed on the basis of the collision matrix.
11. The method according to claim 10, wherein image recognition hypotheses are generated on the basis of hash tables.
12. A method of sorting postal articles comprising letter-type mails, parcels, or packets, wherein the method according claim 1 is used for matching a target image of a certain postal article with one from among a number N of candidate images associated with the number N of postal articles.
13. A postal sorting machine, comprising a monitoring and control unit specially arranged to implement the method according to claim 12.
14. The method according to claim 1, wherein the articles are postal articles, each of which bears delivery postal address information comprising symbols and characters of writing, and in that the concentric rings centered on the point of interest lie within a disk of singularity having a radius of 8 mm for an image resolution of 4 pixels/mm, and in that the vector of digital data associated with the rings centered on the point of interest is a vector having 32 components for 32 rings.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The present invention can be better understood and other advantages appear on reading the following detailed description of an implementation given by way of non-limiting example and with reference to the accompanying drawings, in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
DESCRIPTION OF AN IMPLEMENTATION OF THE METHOD OF THE INVENTION
(10) The Fast-DoGS method is thus based on approximating a measurement of correlation of local descriptors of Points of Interest (POIs) of images to be mutually compared.
(11) Since the local descriptors are centered and reduced DoGS signals, correlation between the DoGS is equivalent to computing a cosine similarity of the approximation and can therefore be performed, in accordance with the invention, by means of a hashing technique known as “Cosine LSH” or “Cosine Locality-Sensitive Hashing”.
(12) The approximation is performed in accordance with the invention by means of a cosine LSH hashing technique applied to indices of singularities of the gradient field in the vicinity of the point of interest and in particular on concentric rings centered on the point of interest insofar as the images include symbols and characters of writing.
(13) The matching method of the invention comprises implementing a plurality of steps shown in
(14) i) at 310, determining local descriptors characterizing the digital images, these local descriptors being DoGS signals in the Fast-DoGS method;
(15) ii) at 320, defining families of cosine LSH hash functions to be applied to the DoGS signals of the digital images;
(16) iii) at 330, computing hash values by applying a family of cosine LSH hash functions to the DoGS signals; and
(17) iv) at 340, matching digital images by means of LSH DoGS hash tables being generated.
(18) It should be noted that, in a process of sorting articles in a plurality of sorting passes, the steps of generating the hash tables are performed in a first sorting pass, while the image matching step iv is performed in a subsequent sorting pass. However, it is possible to generate the hash tables and to perform the matching on the fly without going beyond the ambit of the invention.
(19) On the left of
(20) Determining Descriptors DoGS of Singularities of the Luminance Gradient Vector Field
(21) In order to determine the DoGS signals that are the local descriptors of a digital image of an article of the invention, firstly, at 312, a digital map of the luminance gradient vector field of the image is extracted by local planar regression on, for example, portions of images having sides of 5 pixels, and then, at 314, the image of the singularities of said digital image, a portion 180 of which image of singularities is shown in
(22) At 316, the local extrema or topological singularities 182 of the image of the singularities shown in
(23) The local neighborhood of each POI is then described by means of a DoGS signal 190, which is represented, in accordance with the invention, by a data vector 192, each component of which is computed at 318 by summing the indices of singularity of the points lying on a ring 184 having a width of one pixel and centered on the POI in question, the entire set of the rings forming a disk of singularity having a given radius and centered on the POI.
(24) In
(25) Said sums are signed, i.e. they can take positive values (convergence of the vector field of the local gradient) as well as negative values (zones of divergence of the vector field of the local gradient), as a function of the singularity values associated with the pixels of the image of singularity.
(26) After determining the DoGS signals of each of the POIs of a digital image, the digital image is stored in a memory of the data processor unit at 319 in association with the set of DoGS signals that come from said image.
(27) For articles to be sorted that are of the postal article type, each of which is supposed to bear a delivery address block with address lines formed of handwritten or typed symbols and characters of writing, the delivery address block may serve as a distinguishing region for distinguishing the articles and the images of said articles. In accordance with the invention, it has been determined that an appropriate radius for the disk of singularity is 8 mm, with an image resolution of 4 pixels/mm, which is sufficient to detach characters of writing in the image.
(28) Therefore, in this implementation, the DoGS signals are constituted by data vectors of 32 components, each component corresponding to a sum of the singularities situated on a ring having a width of one pixel and centered on the POI, the radii of the rings being incremented successively to take values ranging from 1 pixel to 32 pixels, with one ring per radius.
(29) It is understood that a pixel having its geometrical center lying within one of the rings is considered to be part of that ring.
(30)
(31) More particularly, if it is considered that the disk of singularity of the gradient in the vicinity or neighborhood of said POI (
(32)
(33) Since these signals are centered and reduced by construction after the POIs have been extracted from the image, measuring a correlation between two DoGS amounts to computing the scalar product of two vectors of 32 components encoded on 8 bits (the signed integer values of a DoGS are amplitudes considerably less than 128 after application of a real accuracy factor ×10).
(34)
(35) Measuring similarity between centered and reduced DoGS signals thus amounts to measuring the “cosine similarity” formed by the DoGS vectors on IR.sup.32 as shown above.
(36) Definition of families of hash functions Below, two images are considered to be similar or not similar on the basis of an approximation of a measurement of similarity of the local descriptors associated with respective ones of the two images, i.e. on the basis of a measurement of similarity of the respective DoGS signals.
(37) As indicated above, a measurement of similarity of the DoGS signals characterizing the POIs is the normalized coefficient of correlation between the two signals.
(38) Since the signals are centered and reduced by construction after determining POIs of the image, measuring a correlation between two DoGS signals amounts to computing the scalar product of the two vectors respectively representing them.
(39) Measuring similarity between centered and reduced DoGS signals thus amounts to measuring the “cosine similarity”, and such a similarity measurement can advantageously be approximated by means of the “cosine Locality-Sensitive Hashing” method or “cosine LSH” method in which digital hash values that are each representative of a DoGS signal are computed.
(40) Thus, the invention takes advantage of the fact that two mutually identical or similar DoGS signals have a high probability of having the same digital hash value.
(41) In this implementation of the invention, L different hash values are computed for each DoGS signal by hashing the vector representing it by means of L different families of hash functions, each family being represented as a hash matrix 202 formed by concatenating K hash functions that are themselves each represented by a data vector 204 having the same number of components as the DoGS signals, i.e. 32 components in this implementation.
(42) Each of the functions forming the L families of hash functions is constructed by a value +1 or −1 being randomly and independently assigned to the 32 components of the vectors that represent them.
(43) As described below, approximating the measurement of similarity between two DoGS signals, and thus searching for similarity between two images, is based on comparisons between said hash values.
(44) It can be understood that increasing the value of K, where K>1, increases the accuracy by increasing the fineness of the hashing, and reduces the number of false positives, i.e. matching or pairing off of DoGS that are not similar, such matching increasing the risk of erroneous matching of two images.
(45) In accordance with the invention, K may advantageously lie in the range 16 to 64, and is preferably equal to 32, as in the present implementation.
(46) It can also be understood that increasing the value of L, where L>1, increases the recall, i.e. reduces the number of false negatives by increasing the number of hash values for the same DoGS signal, and thus the probability that at least one of said hash values makes it possible to find a similarity between two DoGS signals and thus, ultimately, to facilitate good matching of two images.
(47) In accordance with the invention, L may advantageously lie in the range 4 to 8, in the present implementation.
(48) The families of hash functions may be prepared in advance and be applied to any new set of new images or to any new current image.
(49) Computing the Hash Values
(50) To obtain the digital hash values of the DoGS signals, the data vectors 192 representative of the DoGS signals are hashed at 332 by means of the matrices 202 representing the families of hash functions by multiplying the vectors by the matrices.
(51) Each scalar product of a DoGS signal vector multiplied by a column 204 of a hash matrix results in a signed digital value 212.
(52) This signed digital value expresses the orientation of the DoGS signal vector relative to the hyperplane defined by the hash function represented by the column in question.
(53) Thus, the product of a DoGS signal vector multiplied by the entire set of the columns of a hash matrix results in a data vector 210 having the same number of components as there are columns in the DoGS signal vector matrix, i.e. K=32 in this implementation, the components having signed, positive or negative, digital values.
(54) In a second stage, the hash key 250 (which is a digital value or fingerprint) of the DoGS signal is obtained by assigning one of the binary values 0 and 1 to each component of said data vector 210, as a function of its sign, e.g. by assigning the value 0 to the negative components and the value 1 to the positive components, as illustrated by the data vector 220 having binary components that is shown in
(55)
(56) Matching of Digital Images
(57) By means of the above-described operations, each POI of a collection of images is associated with a hash key that is characteristic of the local environment of the POI.
(58) The hash values of the descriptors of the POIs are filed, at 242, in buckets of a database in a memory in the processor unit as a function of their values, each bucket receiving one or more images I.
(59) A comparison between the various DoGS vectors (each associated with a POI) is thus performed automatically: two DoGS signals are considered to be similar if they fall into the same bucket.
(60) In accordance with the invention, at 244, L hash tables 350 are advantageously constructed, one for each family of hash functions, by associating each bucket B with the images I that have their POIs associated with a digital hash value filed in the bucket in question following the hashing produced by the respective family.
(61)
(62) At 246, a target image is matched with a reference image on the basis of the number of their DoGS signals that are considered as being similar, which amounts to the number of POIs having digital fingerprints that are filed in the same buckets, i.e. on the basis of the number of POIs having local environments that have been assessed as being similar between the two images by the Fast-DoGS similarity approximation computation.
(63) The higher the number of hash values the target image and the reference image have in common, the more the two images are considered to be similar and can be matched if the level of similarity is deemed sufficient by the operative implementing the matching method.
(64) In other words, with the LSH of the descriptors of the POIs having concentric rings, it is possible to construct, in a memory, an ordered database (hash tables) from which it is possible to derive firstly an identification of the images having high graphical similarity, and secondly an identification of the points of interest having high capacity for distinguishing between the images.
(65) When the spectrum of the postal articles processed is uniform (high graphical and textual resemblance of the images), it is naturally possible to take into consideration criteria other than merely the number of POIs having the same hash values between the two images.
(66) For example, it is possible, for a collection of reference images, to establish a list of distinguishing POIs having respective hash values that do not correspond to the hash value of any other POI of a candidate image or to the hash values of at the most N POIs of candidate images, where N is, for example, equal to 1, and to test the similarity of a target image with the reference images by limiting the comparison to a comparison with the hash values of the distinguishing POIs rather than comparing with the entire set of POIs.
(67) In the spirit of the DoGS method, it is then possible to individualize a corresponding image in a set of candidate images that are graphically similar by omitting the common image background and by looking only at the POIs that constitute the potentially distinguishing zones.
(68) Regardless of the degree of sophistication of the matching algorithm used for exploiting the hash tables, matching based on the Fast-DoGS method implements only comparisons between digital values each representative of a point of the image that is defined as a point of interest (POI) during the method, such computations being compatible with real-time operating constraints.
(69) Such matching therefore makes it possible to track and to sort, in real time, a large number of distinct articles without it being necessary to affix identifiers such as bar codes to them.
(70) The Fast-DoGS matching method of the invention has a large number of applications.
(71) By way of example and as shown in
(72) These packets to be sorted may, for example, arrive at a postal sorting center with a view to them being sorted and delivered to destination addresses. They therefore bear delivery information such as addresses that include characters of text.
(73) The data processor unit 150 that may generally be considered to be the monitoring and control unit of the sorting machine 160 associates the first digital images with local descriptors and then with digital hash values filed in hash tables in accordance with the above-described method 300. The address information is recognized automatically by the unit 150 from the images, and the recognized addresses constituting sorting data may also be stored in a memory 140 in association with the reference images.
(74) In a second stage, downstream from the cameras C1, the packets are directed into sorting outlets indicated by arrows F1, where second images can be taken of them using cameras C2 for taking digital images.
(75) Thus, for each packet at the inlet of the conveyor 120, a second digital image of the packet is acquired and recorded in a memory of the unit 150, which second digital image constitutes the target image that is to be matched with one of the reference images, in order to track and monitor the packet throughout the logistics processing chain that processes the packet, so as to direct the packet towards a sorting outlet.
(76) In the same way as for the first digital images, the data processor unit 150 associates the second digital image with local descriptors 190 and then with digital hash values 250 filed in hash tables 350 in accordance with the Fast-DoGS method.
(77) In a variant, the packet in the sorting outlet may also return to the inlet of the sorting conveyor 20 as indicated by arrow F2 so that a second image of the packet is also acquired by the image-taking means C1.
(78) For the purpose of processing the packets, the first digital images, the second digital image of each packet, and the local descriptors and digital hash value or key are stored in a memory 140 of the data processor unit 150.
(79) As explained above, the unit 150 matches the second image of each packet with that one of the first images that has the most hash values filed in the same buckets as those in which the hash values of the second image are filed so as to retrieve, for example, the postal address information of the packet so that it is logistically processed in the same sorting machine 160.
(80) It can be understood that it is also possible for the process to lead to the second image being matched with a plurality of first reference images and, in such a situation, it is possible to hone the decision by conventional DoGS matching or pairing, or by some other matching or pairing, or indeed to use video-encoding with a human operative, e.g. by displaying the reference first images and the target image on a screen of the video-encoding operative in order to leave the human operative to monitor and control the last stage of the matching.
(81) The Fast-DoGS method of the invention is adapted to structurally matching images of postal articles, such as letters, flat postal articles, parcels, and packets, including “small import packets”.
(82) In the DoGS method, the rate of POIs matched with those of a target image is compared either with the total number POI.sub.tot of POIs in the target image, or with the subset POI.sub.dis of POIs considered as being distinguishing within the search context.
(83) The set POI.sub.tot is useful for “structurally” selecting the postal articles that share graphical and textual structures with the target image (as applies to the image background for the magazines shown in
(84) The subset POI.sub.dis is useful for individualizing the image corresponding to the image of the request article within the structural family.
(85) The status of “distinguishing POI” is thus relative to a research context. A POI of the target image is considered distinguishing if it has not been matched (by default) or if it has been matched to no more than N candidates (conventionally where N=1). Conversely, it is deduced that a POI being matched or paired off with a plurality of candidates is significant of that POI belonging to a local structure that is common to the entire family of articles and that is not distinguishing between a plurality of articles of that family.
(86) This differentiation between POI.sub.dis and POI.sub.tot may be taken into account in the Fast-DoGS method by analyzing the “POI collision” matrix illustrated by the tables below.
(87) Thus, after hashing of the DoGS of the P Points of Interest (POIs) selected from the target image according to the L LSH families considered when generating the L tables, it is possible, for each searched-for POI p, to take into account the number of candidate images colliding in a bucket, and to do so, with or without accumulation over the L hash tables following the algorithm below:
(88) .fwdarw. initialize the matrix M.sub.coll(p, i) with p=[1, P] and P i=[1, N] candidate images
(89) for each POI p of the request image .fwdarw. compute the L hashes h.sub.p.sup.1, h.sub.p.sup.2, . . . , h.sub.p.sup.L
for each table 1 .fwdarw. increment M.sub.coll[p, i]=M.sub.coll[p, i]+1 of each i⊂h.sub.p.sup.l
(90) As shown in
(91) A POI is set to the distinguishing status if it is not matched with more than one candidate of the context, matching with a candidate being effective if the number of collisions is greater than L/2 for the matrix with accumulation over L tables and if the number of collisions is non-zero for the matrix without accumulation over the L tables. The signal POI.sub.dis expresses the number of collisions, per candidate, of the distinguishing POIs of the request image.
(92) The Fast-DoGS method of the invention may advantageously apply to the structural filtering. In this application, a target image of a postal article and a set of image individuals that constitute the reference space are available. From this search context, it is sought to extract the minimum subset of individuals that have maximum structural similarity with the target individual.
(93) The signal POI.sub.tot coming from the matrix of collisions in context of the POIs of the target image is representative of the statistical accumulation of the luminance structures that are common among the candidates.
(94) Various conventional algorithms may be used to find the optimum cut-off threshold for the amplitudes of the signal, with effectiveness that depends on the spectrum of articles in question (small or large format flat articles, “small import packets”, packets, population that is very uniform or not very uniform).
(95) While the method of structurally filtering postal articles can make it possible to initiate a process of matching individuals over big image bases by structurally preselecting the candidates, it can, on a spectrum of postal articles that are highly non-uniform, directly individualize the target image in context. For example, it is possible to have a POI.sub.tot match of an article of the postcard type over a horizon of in the range several tens of thousands to several hundreds of thousands of letter-format flat articles.
(96) The Fast-DoGS method of the invention may also apply to structural clustering. In this application, a set of reference images is available that it is sought to distribute into groups of individuals sharing common graphical and textual characteristics. It is thus necessary to maximize the structural similarity between the individuals of the same group and to′ minimize the similarity between the groups. The constraints of the clustering depend on the criterion of uniformity of the partitions (groups of individuals) and on the application: “over-clustering” or “under-clustering” should be preferred.
(97) Like structural filtering, prior partitioning of the individuals forming the maximum search horizon may constitute the first stage of a matching problem for accelerating and optimizing the individualization second stage: preselecting the clusters that are closest to the target image individual by measuring similarity with the reference individuals of the groups.
(98) Regardless of the algorithm used for the clustering, it may be necessary to have a metric for comparison between the individuals (distance).
(99) The LSH of the DoGS of the selected POIs in each image individual makes it possible rapidly to construct the similarity matrix that expresses the structural proximity between each of the individuals in the reference space. The statistical collision of the POIs of the images by grouping together in buckets lends itself naturally to building such a matrix.
(100) The similarity values of the matrix may, for example, serve as a basis for an ascending hierarchical classification procedure that consists in progressively merging the clusters by cost of decreasing similarity (iterative cost of the partition) and makes it possible to obtain a dendrogram. The cut-off of the dendrogram may depend on the maximum number of partitions desired or on an intra-cluster uniformity threshold.
(101) The Fast-DoGS method of the invention may also advantageously apply to generating recognition hypotheses. In this application, a target image of a postal article is available as is a search horizon of N candidate images that is wide enough for it to compromise use of a conventional methodology: broad horizon to be processed in real time or big horizon of several tens of thousands of images to several millions of images:
(102) Assuming that the searched-for article has a corresponding image in the search horizon, the aim is to obtain the H minimum best recognition hypotheses for the target image including the correct solution.
(103) The H image recognition hypotheses (IRHs) can then enter an automatic decision processes (taking into account the absolute and relative reliability of the scored hypotheses or applying another distinguishing criterion) or a supervised decision process (validation by a human operative as applies to the “CAMS” computer-assisted manual sorting).
(104) The method used for generating IRHs combines all of the following processes: Generating the LSH DoGS hash tables; Extracting the POIs from N images and computing their DoGS descriptors; Setting the LSH hashing configuration: precision K-bits and number L of tables; Creating L families of cosine Locality-Sensitive Hashing (LSH) functions; Setting the mode of selection of the POIs (selective or non-selective); and LSH DoGS hashing and filling the L tables, as the hashing progresses or in deferred time; and Structural matching of the image; Extracting the POIs from the target image and computing their DoGS descriptors; Setting the mode of selection of the POIs (if the original extraction is selective or non-selective); DoGS hashing according to the L LSH families and constructing the collision matrix M.sub.coll Computing the POI.sub.tot signal representative of the global structural similarities of the N candidates; Conditional thresholding of the POI.sub.tot signal for structurally filtering the N′<N candidates; Identifying the distinguishing POIs in the context of the N or N′ candidates; Sorting the hypotheses according to the POI.sub.tot collisions with or without accumulation over the L tables; Sorting the hypotheses according to the POI.sub.dis collisions with or without accumulation over the L tables; and Generating the H best hypotheses by POI.sub.tot/POI.sub.dis combination with or without accumulation of the collisions over the hash tables.
(105) The POI.sub.tot filtering for preselecting the N′ candidates that are structurally close to the target image may be triggered in conditional manner by automatically analyzing the POI.sub.tot signal (curvature, modes of the histogram, etc.).
(106) As presented above, the matching performance of the Fast-DoGS method can benefit from an improvement of the recall, by overflow of the search for the DoGS hash h over the 32 hashes h′ that are such that the Hamming distance between h and h′ is equal to 1, without any significant risk of increasing the number of false positives if consideration is given to the accumulation of the collisions over the L hash tables.