METHOD AND SYSTEM FOR COMPARING VIDEO SHOTS
20170249516 · 2017-08-31
Abstract
A method (100) for comparing a first video shot (Vs1) comprising a first set of first images (I1(s)) with a second video shot (Vs2) comprising a second set of second images (I2(t)), at least one of the first and second sets comprising at least two images. The method comprises pairing (110) each first image of the first set with each second image of the second set to form a plurality of image pairs (IP(m)), and, for each image pair, carrying out the operations a)-g): a) identifying (120) first interest points in the first image and second interest points in the second image; b) associating (120) first interest points with corresponding second interest points in order to form corresponding interest point matches; c) for each pair of first interest points, calculating (130) the distance therebetween for obtaining a corresponding first length; d) for each pair of second interest points, calculating (130) the distance therebetween for obtaining a corresponding second length; e) calculating (130) a plurality of distance ratios, each distance ratio corresponding to a selected pair of interest point matches and being based on a ratio of a first term and a second term or on a ratio of the second term and the first term, said first term corresponding to the distance between the first interest points of said pair of interest point matches and said second term corresponding to the distance between the second interest points of said pair of interest point matches; f) computing (140) a first representation of the statistical distribution of the plurality of calculated distance ratios; g) computing (150) a second representation of the statistical distribution of distance ratios obtained under the hypothesis that all the interest point matches in the image pair are outliers. 
The method further comprises generating (160) a first global representation of the statistical distribution of the plurality of calculated distance ratios computed for all the image pairs based on the first representations of all the image pairs; generating (170) a second global representation of the statistical distribution of distance ratios obtained under the hypothesis that all the interest point matches in all the image pairs are outliers based on the second representations of all the image pairs; comparing (180) said first global representation with said second global representation, and assessing (190) whether the first video shot contains a view of an object depicted in the second video shot based on said comparison.
Claims
1. A method for comparing a first video shot comprising a first set of first images with a second video shot comprising a second set of second images, at least one of the first and second sets comprising at least two images, the method comprising: pairing each first image of the first set with each second image of the second set to form a plurality of image pairs; for each image pair, carrying out the operations a)-g): a) identifying first interest points in the first image and second interest points in the second image; b) associating first interest points with corresponding second interest points in order to form corresponding interest point matches; c) for each pair of first interest points, calculating the distance therebetween for obtaining a corresponding first length; d) for each pair of second interest points, calculating the distance therebetween for obtaining a corresponding second length; e) calculating a plurality of distance ratios, each distance ratio corresponding to a selected pair of interest point matches and being based on a ratio of a first term and a second term or on a ratio of the second term and the first term, said first term corresponding to the distance between the first interest points of said pair of interest point matches and said second term corresponding to the distance between the second interest points of said pair of interest point matches; f) computing a first representation of the statistical distribution of the plurality of calculated distance ratios; g) computing a second representation of the statistical distribution of distance ratios obtained under the hypothesis that all the interest point matches in the image pair are outliers; generating a first global representation of the statistical distribution of the plurality of calculated distance ratios computed for all the image pairs based on the first representations of all the image pairs; generating a second global representation of the statistical distribution of distance ratios obtained under the hypothesis that all the interest point matches in all the image pairs are outliers based on the second representations of all the image pairs; comparing said first global representation with said second global representation, and assessing whether the first video shot contains a view of an object depicted in the second video shot based on said comparison.
2. The method of claim 1, wherein the operation f) provides for arranging the plurality of distance ratios in a corresponding image pair histogram having a plurality of ordered bins, each one corresponding to a respective interval of distance ratio values, the image pair histogram enumerating for each bin a corresponding number of calculated distance ratios having values comprised within the respective interval.
3. The method of claim 2, wherein the operation g) provides for generating an image pair outlier probability mass function comprising for each of said bins the probability that, under the hypothesis that all the interest point matches are outliers, a distance ratio has a value that falls within said bin.
4. The method of claim 3, wherein the phase of generating a first global representation of the statistical distribution of the plurality of calculated distance ratios computed for all the image pairs based on the first representations of all the image pairs comprises generating a global histogram based on the image pair histograms, said global histogram being indicative of how the values of the distance ratios calculated for all the image pairs are distributed among the bins.
5. The method of claim 4, wherein the phase of generating a second global representation of the statistical distribution of distance ratios obtained under the hypothesis that all the interest point matches in all the image pairs are outliers based on the second representations of all the image pairs comprises generating a global outlier probability mass function by combining the image pair outlier probability mass functions.
6. The method of claim 5, wherein the phase of comparing said first global representation with said second global representation comprises comparing said global histogram with said global outlier probability mass function.
7. The method of claim 6, wherein the phase of generating the global histogram based on the image pair histograms comprises: for each bin of the plurality of ordered bins, summing the number of calculated distance ratios corresponding to that bin of all image pair histograms.
8. The method of claim 7, wherein the phase of generating the global outlier probability mass function comprises calculating a linear combination of the image pair outlier probability mass functions.
9. The method of claim 1, wherein said comparing said first global representation with said second global representation comprises performing a Pearson's test.
10. The method of claim 1, wherein said calculating the distance ratios provides for calculating the logarithm of the distance ratios.
11. A video shot comparing system comprising: a communication interface configured to receive a first video shot comprising a first set of first images and identify first interest points in the first images; a reference database storing a plurality of second video shots, each one comprising a respective second set of second images; and circuitry configured to associate, for each second video shot and for each image pair comprising a second image of said second video shot and a first image of the first video shot, first interest points in said first image with second interest points in said second image in order to form corresponding interest point matches; calculate, for each second video shot (Vs2) and for each image pair comprising a second image of said second video shot and a first image of the first video shot: for each pair of first interest points, the distance therebetween for obtaining a corresponding first length; for each pair of second interest points, the distance therebetween for obtaining a corresponding second length; a plurality of distance ratios, each distance ratio corresponding to a selected pair of interest point matches and being based on a ratio of a first term and a second term or on a ratio of the second term and the first term, said first term corresponding to the distance between the first interest points of said pair of interest point matches and said second term corresponding to the distance between the second interest points of said pair of interest point matches; a first representation of the statistical distribution of the plurality of calculated distance ratios; a second representation of the statistical distribution of distance ratios obtained under the hypothesis that all the interest point matches in the image pair are outliers; generate, for each second video shot: a first global representation of the statistical distribution of the plurality of calculated distance ratios computed for all the image pairs comprising second images of said second video shot, based on the first representations of all the image pairs comprising second images of said second video shot; a second global representation of the statistical distribution of distance ratios obtained under the hypothesis that all the interest point matches in all the image pairs comprising second images of said second video shot are outliers, based on the second representations of all the image pairs comprising second images of said second video shot; and compare, for each second video shot, the corresponding first global representation with the corresponding second global representation, and assess whether there is a second video shot containing a view of an object depicted in the first video shot based on said comparison.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0044] These and other features and advantages of the present invention will be made evident by the following description of some exemplary and non-limitative embodiments thereof, to be read in conjunction with the attached drawings, wherein:
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS OF THE INVENTION
[0052] The first phase of the method 100 (block 110 of
[0054] The second phase of the method 100 (block 120 of
[0056] The next phase of the method 100 (block 130 of
wherein $x_i$ represents the coordinates of a generic $i$-th interest point in the first image I1(s) of a generic image pair IP(m), $y_i$ represents the coordinates of the $i$-th interest point in the second image I2(t) matched with the interest point $x_i$ in the first image I1(s) of the same image pair IP(m), $x_j$ represents the coordinates of a different generic $j$-th interest point in the first image I1(s) of the same image pair IP(m), and $y_j$ represents the coordinates of the $j$-th interest point in the second image I2(t) matched with the interest point $x_j$ in the first image I1(s) of the same image pair IP(m). The interest points must be distinct, i.e., $x_i \neq x_j$ and $y_i \neq y_j$, and the LDR is undefined for $i = j$. The LDR is a function of the length ratio, which is an invariant for similarity transformations. Thanks to the logarithm, if the first image I1(s) of an image pair IP(m) is exchanged with the second image I2(t) of the same image pair IP(m) ($x$ becomes $y$ and vice versa), the LDR simply reverses sign. Given a set of $L_m$ matched interest points $(x_i, y_i)$ for a generic image pair IP(m), including $L_m$ interest points $x_i$ in the first image I1(s) of the pair and $L_m$ corresponding interest points $y_i$ in the second image I2(t) of the pair, there exist

$$N_m = \binom{L_m}{2} = \frac{L_m (L_m - 1)}{2}$$

distinct LDRs.
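For illustration, the LDR computation for one image pair can be sketched in Python with NumPy. The sketch assumes the usual definition $\mathrm{ldr}(x_i, x_j, y_i, y_j) = \ln(\lVert x_i - x_j \rVert / \lVert y_i - y_j \rVert)$, which is consistent with the sign-reversal property described above; the function name `pairwise_ldrs` and the (L_m, 2) array layout are hypothetical.

```python
import numpy as np


def pairwise_ldrs(x, y):
    """Return the LDRs of all distinct pairs of interest point matches.

    x, y: arrays of shape (L, 2); row i holds the coordinates of the matched
    interest points x_i (first image) and y_i (second image).
    Assumed definition (sketch): ldr = ln(||x_i - x_j|| / ||y_i - y_j||).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)       # each unordered pair i < j once
    dx = np.linalg.norm(x[i] - x[j], axis=1)  # first lengths
    dy = np.linalg.norm(y[i] - y[j], axis=1)  # second lengths
    valid = (dx > 0) & (dy > 0)               # skip coincident points
    return np.log(dx[valid] / dy[valid])
```

Each unordered pair {i, j} is visited once, so for $L_m$ distinct points the result has $L_m(L_m-1)/2$ entries; pairs with a zero length in either image are skipped rather than producing an infinite or undefined logarithm.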
[0057] The next phase of the method 100 (block 140 of
[0058] Each image pair histogram $g_m$ shows how the values of the $N_m$ LDRs that have been calculated for the corresponding image pair IP(m) are distributed. The image pair histograms $g_m$ are expressed in the form of frequency arrays:

$$g_m = [g_m(1), \ldots, g_m(k), \ldots, g_m(K)],$$

wherein each LDR may take values within $K$ predefined ordered intervals $T_1, \ldots, T_k, \ldots, T_K$ (hereinafter referred to as bins), and $g_m(k)$ is the number of LDRs calculated for the image pair IP(m) whose values fall within the $k$-th bin $T_k$.
[0059] For each image pair histogram $g_m$, the sum of its components $g_m(k)$ is equal to the number $N_m$ of LDRs calculated for the corresponding image pair IP(m):

$$g_m(1) + \cdots + g_m(k) + \cdots + g_m(K) = N_m.$$
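A minimal sketch of the image pair histogram $g_m$, assuming the bins $T_1, \ldots, T_K$ are given as $K+1$ increasing edges (a hypothetical representation; the edges need not be equally spaced). NumPy's `np.histogram` treats each bin as half-open except the last, which is closed; the function name is hypothetical.

```python
import numpy as np


def image_pair_histogram(ldrs, edges):
    """Count how many LDRs fall in each of the K bins T_1, ..., T_K.

    edges: K + 1 increasing bin boundaries (bin widths may differ).
    LDRs outside [edges[0], edges[-1]] are ignored in this sketch.
    """
    counts, _ = np.histogram(np.asarray(ldrs, dtype=float), bins=edges)
    return counts  # g_m = [g_m(1), ..., g_m(K)]
```

With these edges every LDR inside the covered range lands in exactly one bin, so the components sum to $N_m$; dropping out-of-range LDRs is a design choice of this sketch, not something stated in the source.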
[0060] The total number $N$ of LDRs calculated for all the image pairs IP(m) obtained from the two video shots Vs1 and Vs2 is equal to:

$$N = N_1 + \cdots + N_m + \cdots + N_M.$$
[0061] The next phase of the method 100 (block 150 of
wherein $p_m(k)$ is the probability that, under the hypothesis that all the interest point matches for the $m$-th image pair IP(m) are outliers, an LDR calculated using a pair of interest point matches $\{(x_i, y_i), (x_j, y_j)\}$ from said image pair IP(m) has a value that falls within the $k$-th bin $T_k$. The image pair outlier probability mass functions $p_m$ may be calculated based on a discretization of an outlier probability density function whose closed form is:
wherein $z$ is the LDR value, and $d$ is the ratio between the standard deviations of the coordinates of the interest points in the two images (see equation (6) of S. Lepsoy, G. Francini, G. Cordara, and P. P. de Gusmao, "Statistical modelling of outliers for fast visual search", in IEEE International Conference on Multimedia and Expo (ICME), pages 1-6, IEEE, 2011). In other words, each image pair outlier probability mass function $p_m$ corresponding to an image pair IP(m) is the probability mass function of LDRs calculated using pairs of interest point matches $\{(x_i, y_i), (x_j, y_j)\}$ obtained by selecting the interest points from said image pair IP(m) at random.
[0062] It should be appreciated that the image pair outlier probability mass functions $p_m$ corresponding to two different image pairs IP(m) may differ from each other, since they depend on the actual arrangement of the interest points $x_i$, $y_i$ in the respective image pairs.
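The discretization can be sketched as follows. Since the closed-form density is not reproduced in this text, the sketch assumes $f(z) = 2 d^2 e^{2z} / (e^{2z} + d^2)^2$, which is the density of $z = \ln(\lVert x_i - x_j\rVert / \lVert y_i - y_j\rVert)$ when the outlier points in the two images are independent isotropic Gaussians whose standard deviations have ratio $d$ (the squared length ratio is then $d^2$ times an $F(2,2)$ variable); it should be checked against equation (6) of the cited paper. Its CDF is $F(z) = 1/(1 + d^2 e^{-2z})$, so each bin probability is a difference of CDF values. The estimator `estimate_d` and all function names are hypothetical.

```python
import numpy as np


def outlier_cdf(z, d):
    """CDF of the assumed outlier LDR density
    f(z) = 2 d^2 e^(2z) / (e^(2z) + d^2)^2, namely F(z) = 1 / (1 + d^2 e^(-2z))."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + d * d * np.exp(-2.0 * z))


def estimate_d(x, y):
    """Hypothetical estimator of d: ratio between the coordinate standard
    deviations of the interest points in the first and second image."""
    sx = np.sqrt(np.var(np.asarray(x, dtype=float), axis=0).mean())
    sy = np.sqrt(np.var(np.asarray(y, dtype=float), axis=0).mean())
    return sx / sy


def outlier_pmf(edges, d):
    """Discretize the outlier density into bin probabilities p_m(k).

    The mass outside [edges[0], edges[-1]] is dropped and the rest is
    renormalized, mirroring histograms that ignore out-of-range LDRs.
    """
    p = np.diff(outlier_cdf(edges, d))  # F(upper edge) - F(lower edge)
    return p / p.sum()
```

Because $p_m$ depends on $d$, two image pairs with differently spread interest points generally receive different outlier PMFs; with the assumed density, the median of the outlier LDRs sits at $\ln d$.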
[0063] The phases of the method 100 described until now (blocks 110-150 of
[0064] The next phases of the method 100 (blocks 160-190 of
[0065] The first of these phases (block 160) provides for generating a global representation of the statistical distribution of the LDR values computed for all the image pairs IP(m). According to an embodiment of the present invention, said global representation is a further histogram, herein referred to as global histogram $g$, which indicates how the values of the LDRs calculated for all the image pairs IP(m) are distributed among the $K$ bins $T_1, \ldots, T_k, \ldots, T_K$. The global histogram $g$ is generated as follows:

$$g = g_1 + \cdots + g_m + \cdots + g_M = [g(1), \ldots, g(k), \ldots, g(K)],$$

wherein

$$g(k) = g_1(k) + \cdots + g_m(k) + \cdots + g_M(k)$$

is the number of LDRs (considering all the image pairs IP(m)) whose values fall within the $k$-th bin $T_k$.
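The bin-by-bin summation is straightforward; a minimal NumPy sketch (the function name is hypothetical):

```python
import numpy as np


def global_histogram(pair_histograms):
    """Global histogram g: bin-by-bin sum g(k) = g_1(k) + ... + g_M(k).

    pair_histograms: shape (M, K), one image pair histogram g_m per row.
    """
    return np.sum(np.asarray(pair_histograms), axis=0)


g = global_histogram([[1, 2, 0], [0, 3, 4]])  # M = 2 image pairs, K = 3 bins
print(g.tolist(), int(g.sum()))  # [1, 5, 4] 10
```

The total of $g$ equals $N = N_1 + \cdots + N_M$, as long as every LDR falls inside one of the bins.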
[0066] The next phase of the method (block 170) provides for generating a global representation of the statistical distribution of LDR values obtained under the hypothesis that all the interest point matches in all the image pairs IP(m) are outliers. According to an embodiment of the present invention, said global representation is a further probability mass function, herein referred to as global outlier probability mass function $p$, which is generated by means of a linear combination of the image pair outlier probability mass functions $p_m$ of all the image pairs IP(m):

$$p = [p(1), \ldots, p(k), \ldots, p(K)],$$

wherein:
wherein $p(k)$ is the probability that, under the hypothesis that all the interest point matches for all the image pairs IP(m) are outliers, an LDR calculated using a pair of interest point matches $\{(x_i, y_i), (x_j, y_j)\}$ from a generic image pair IP(m) has a value that falls within the $k$-th bin $T_k$.
[0067] In other words, the global outlier probability mass function $p$ is the probability mass function of LDRs calculated using pairs of interest point matches $\{(x_i, y_i), (x_j, y_j)\}$ obtained by selecting the interest points from any of the image pairs IP(m) at random.
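The weights of the linear combination are not given in this text. A natural choice, assumed here, is $p(k) = \sum_m (N_m / N)\, p_m(k)$: it describes the bin of an LDR drawn uniformly at random from all $N$ LDRs, and makes $N p(k) = \sum_m N_m p_m(k)$ the expected count of bin $T_k$ under the outlier hypothesis. Function and argument names are hypothetical.

```python
import numpy as np


def global_outlier_pmf(pair_pmfs, pair_counts):
    """Global outlier PMF p as the linear combination sum_m (N_m / N) p_m.

    pair_pmfs: shape (M, K), one outlier PMF p_m per image pair.
    pair_counts: shape (M,), the number N_m of LDRs per image pair (N > 0).
    The N_m / N weights are an assumption of this sketch.
    """
    pmfs = np.asarray(pair_pmfs, dtype=float)
    counts = np.asarray(pair_counts, dtype=float)
    weights = counts / counts.sum()  # non-negative, summing to 1
    return weights @ pmfs            # p(k) = sum_m w_m p_m(k)
```

Any non-negative weights summing to one would still yield a valid PMF; the count-proportional choice simply lets image pairs with more matches carry more weight, in the same way their LDRs dominate the global histogram.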
[0068] The next phase of the method (block 180 of
[0069] Indeed, the components of the global histogram g that are due to wrong matches will have a shape similar to that of global outlier probability mass function p, while the components of the global histogram g that are due to correct matches will have a shape different from that of the global outlier probability mass function p.
[0070] The difference in shape between the global histogram g and the global outlier probability mass function p is estimated by means of the well-known Pearson's test described on pages 402-403 of "An Introduction to Mathematical Statistics and its Applications" by R. J. Larsen and M. L. Marx, New Jersey, Prentice-Hall, second edition, 1986.
[0071] The Pearson's test statistic c is computed in the following way:

$$c = \sum_{k=1}^{K} \frac{\left(g(k) - N\,p(k)\right)^2}{N\,p(k)}.$$
[0072] The more the shape of the global histogram g is similar to that of the global outlier probability mass function p, the lower the value of the Pearson's test statistic c.
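A minimal sketch of the standard Pearson statistic $c = \sum_k (g(k) - N p(k))^2 / (N p(k))$, with $N$ taken as the total of the global histogram. Skipping bins whose expected count is zero is an assumption of this sketch (such bins would otherwise divide by zero); the function name is hypothetical.

```python
import numpy as np


def pearson_statistic(g, p):
    """Pearson's test statistic c for global histogram g and global outlier PMF p.

    Observed counts are g(k); expected counts are N * p(k), with N = sum_k g(k).
    Bins with zero expected count are skipped (an assumption of this sketch).
    """
    g = np.asarray(g, dtype=float)
    p = np.asarray(p, dtype=float)
    expected = g.sum() * p
    mask = expected > 0
    return float(np.sum((g[mask] - expected[mask]) ** 2 / expected[mask]))
```

When g reproduces the expected counts N p(k) exactly, c = 0; the more its shape departs from p, the larger c becomes.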
[0073] For this purpose, the next phase of the method 100 (block 190 of
[0074] If the Pearson's test statistic c is lower than the threshold TH (exit branch N of block 190), it means that the shape of the global histogram g is sufficiently similar to that of the global outlier probability mass function p to assume that the interest point matches among the M image pairs IP(m) are wrong (i.e., outliers). In this case, the video shots Vs1 and Vs2 are considered not to contain a view of a same object (block 195).
[0075] If the Pearson's test statistic c is higher than the threshold TH (exit branch Y of block 190), it means that the shape of the global histogram g is sufficiently different from the shape of the global outlier probability mass function p to assume that there is a sufficiently high number of correct interest point matches (i.e., inliers) among the M image pairs IP(m). In this case, the video shots Vs1 and Vs2 are considered to contain a view of a same object (block 197).
[0076] As is well known to those skilled in the art, the value of the threshold TH used in the Pearson's test should be set according to the number of false positives that can be tolerated.
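One possible way (not described in the source) to turn a tolerated false-positive rate alpha into a threshold TH is Monte Carlo calibration: simulate global histograms under the outlier hypothesis and take the (1 - alpha) quantile of the resulting statistics. The sketch below assumes the outlier density with CDF $F(z) = 1/(1 + d^2 e^{-2z})$, a single value of $d$, and independent LDRs; real LDRs from one image pair share interest points and are correlated, so the result is only approximate. All names and parameter values are hypothetical.

```python
import numpy as np


def outlier_pmf(edges, d):
    """Bin probabilities of the assumed outlier CDF F(z) = 1 / (1 + d^2 e^(-2z)),
    renormalized over the range covered by the edges."""
    cdf = 1.0 / (1.0 + d * d * np.exp(-2.0 * np.asarray(edges, dtype=float)))
    p = np.diff(cdf)
    return p / p.sum()


def pearson_statistic(g, p):
    """Pearson's statistic between observed counts g and expected counts N * p."""
    g = np.asarray(g, dtype=float)
    expected = g.sum() * p
    mask = expected > 0
    return float(np.sum((g[mask] - expected[mask]) ** 2 / expected[mask]))


def sample_outlier_ldrs(n, d, rng):
    """Inverse-CDF sampling: solving u = F(z) gives z = ln d + 0.5 ln(u / (1 - u))."""
    u = rng.uniform(1e-12, 1.0 - 1e-12, size=n)
    return np.log(d) + 0.5 * np.log(u / (1.0 - u))


def calibrate_threshold(n, d, edges, alpha=0.01, trials=2000, seed=0):
    """Estimate TH so that, when all matches are outliers, c > TH in about a
    fraction alpha of the simulated global histograms (n LDRs each).

    Out-of-range samples are dropped by np.histogram, consistently with the
    renormalized PMF, so N is the number of in-range LDRs.
    """
    rng = np.random.default_rng(seed)
    p = outlier_pmf(edges, d)
    stats = []
    for _trial in range(trials):
        g, _ = np.histogram(sample_outlier_ldrs(n, d, rng), bins=edges)
        stats.append(pearson_statistic(g, p))
    return float(np.quantile(stats, 1.0 - alpha))


def contains_same_object(c, threshold):
    """Decision rule: c above TH means the shots contain a view of the same object."""
    return c > threshold
```

For independent samples and enough counts per bin, c is approximately chi-squared with K - 1 degrees of freedom under the outlier hypothesis, so the calibrated TH should come out near the corresponding chi-squared quantile; the simulation is mainly useful when that approximation is doubtful.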
[0077] Compared with known solutions, the proposed method is more robust, since it allows the identification of small and/or poorly detailed objects depicted in the images of the video shots. Indeed, even if only a small number of the selected interest points correspond to such small and/or poorly detailed objects, the components corresponding to these few interest points are accumulated over all the image pairs during the generation of the global histogram, which increases their overall contribution. The capacity to assess whether two video shots depict a same object or a same scene increases with the total number of interest point matches, so that video shots depicting a same object or a same scene are detected even when the inliers are few compared with the total number of matched interest points.
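The accumulation argument can be made concrete with a small worked example. Suppose, hypothetically, that every image pair shares the same outlier PMF p and that each image pair histogram deviates from it by the same relative amounts. Then c scales linearly with the number of LDRs N, whereas the threshold for a fixed false-positive rate does not grow with N (for large N it approaches a chi-squared quantile with K - 1 degrees of freedom), so a deviation too weak to be detected in one image pair can become detectable once M image pairs are pooled.

```python
import numpy as np


def pearson_statistic(g, p):
    """Pearson's statistic with observed counts g and expected counts N * p."""
    g = np.asarray(g, dtype=float)
    expected = g.sum() * np.asarray(p, dtype=float)
    return float(np.sum((g - expected) ** 2 / expected))


# Hypothetical outlier PMF over K = 4 bins, shared by all image pairs.
p = np.array([0.25, 0.25, 0.25, 0.25])
# One image pair with N_m = 100 LDRs whose histogram deviates mildly from p.
g_pair = np.array([30, 20, 25, 25])

c_single = pearson_statistic(g_pair, p)       # one image pair: 2.0
c_pooled = pearson_statistic(10 * g_pair, p)  # M = 10 such pairs: 20.0
print(c_single, c_pooled)
```

With K = 4 bins the 95% chi-squared quantile with 3 degrees of freedom is about 7.81, so in this toy example the single pair would be judged non-matching and the pooled shots matching.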
[0079] According to an embodiment of the present invention illustrated in
[0080] A user of a terminal 420 who requests information related to an object depicted in a video shot sends said video shot (query video shot) to the visual search server 410 through the network 430.
[0081] The visual search server 410 includes a server interface 502 adapted to interact with the network 430 for receiving/transmitting data from/to the terminals 420. Through the server interface 502, the visual search server 410 receives the query video shot to be analyzed.
[0082] The query video shot is provided to an interest point detection unit 504 configured to identify the interest points within the images of the query video shot.
[0083] The visual search server 410 further includes a matching unit 508 coupled with a reference database 510 storing a plurality of pre-processed reference video shots. For each reference video shot, and for each image pair comprising an image of said reference video shot and an image of the query video shot, the matching unit 508 matches the interest points of the two images of said image pair.
[0084] The visual search server 410 further comprises a first processing unit 512 configured to: [0085] calculate, for each reference video shot and for each image pair involving an image of said reference video shot and an image of the query video shot, the LDRs from the corresponding interest point matches generated by the matching unit 508, [0086] arrange the LDRs of each image pair in a corresponding image pair histogram, and [0087] calculate, for each image pair, a corresponding image pair outlier probability mass function.
[0088] The visual search server 410 further comprises a second processing unit 514 configured to generate for each reference video shot: [0089] a global histogram (by using the image pair histograms corresponding to said reference video shot and said query video shot), and [0090] a global outlier probability mass function (by using the image pair outlier probability mass functions corresponding to said reference video shot and said query video shot).
[0091] The visual search server 410 further comprises a decisional unit 516 that is configured to assess whether there is a reference video shot containing a view of an object depicted in the query video shot. For this purpose, the decisional unit 516 is configured to make, for each reference video shot, a comparison between the corresponding global histogram and the corresponding global outlier probability mass function. The decisional unit 516 is further configured to provide the results to the terminal 420 through the network 430.
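To tie the units together, the server-side flow can be sketched end to end in Python with NumPy. Everything here is hypothetical: the data layout (each image as a tuple of interest point coordinates and descriptors), the nearest-neighbour ratio-test matcher standing in for the matching unit 508 (the source does not name a matcher), the binning, the assumed LDR and outlier-density forms, the N_m/N weights and the threshold value. Interest point detection (unit 504) is assumed to have been done already.

```python
import numpy as np

# Hypothetical LDR binning: K = 25 equal-width bins over [-2.5, 2.5].
EDGES = np.linspace(-2.5, 2.5, 26)


def match_points(desc1, desc2, ratio=0.8):
    """Stand-in for matching unit 508: nearest-neighbour descriptor matching
    with a ratio test. Returns index arrays (i into image 1, j into image 2)."""
    if len(desc1) == 0 or len(desc2) < 2:
        empty = np.zeros(0, dtype=int)
        return empty, empty
    dist = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    rows = np.arange(len(desc1))
    best, second = order[:, 0], order[:, 1]
    keep = dist[rows, best] < ratio * dist[rows, second]
    return rows[keep], best[keep]


def pair_stats(pts1, pts2, i, j):
    """First processing unit 512: image pair histogram g_m and outlier PMF p_m.

    Assumes ldr = ln(||x_i - x_j|| / ||y_i - y_j||) and the outlier CDF
    F(z) = 1 / (1 + d^2 e^(-2z)); returns (g_m, None) if no LDR is usable.
    """
    x, y = pts1[i], pts2[j]
    a, b = np.triu_indices(len(x), k=1)
    dx = np.linalg.norm(x[a] - x[b], axis=1)
    dy = np.linalg.norm(y[a] - y[b], axis=1)
    ok = (dx > 0) & (dy > 0)
    g, _ = np.histogram(np.log(dx[ok] / dy[ok]), bins=EDGES)
    if g.sum() == 0:
        return g, None
    d = np.sqrt(x.var(axis=0).mean() / y.var(axis=0).mean())
    p = np.diff(1.0 / (1.0 + d * d * np.exp(-2.0 * EDGES)))
    return g, p / p.sum()


def compare_shots(query, reference, threshold):
    """Second processing unit 514 and decisional unit 516 for one reference shot.

    query, reference: lists of images, each a (points (L, 2), descriptors (L, D))
    tuple of NumPy arrays. Returns (decision, Pearson statistic c).
    """
    k = len(EDGES) - 1
    g_total = np.zeros(k)   # global histogram g
    expected = np.zeros(k)  # N * p(k), assuming weights N_m / N
    for pts1, desc1 in query:
        for pts2, desc2 in reference:  # every image pair IP(m)
            i, j = match_points(desc1, desc2)
            g, p = pair_stats(pts1, pts2, i, j)
            if p is None:
                continue
            g_total += g
            expected += g.sum() * p
    if g_total.sum() == 0:
        return False, 0.0
    mask = expected > 0
    c = float(np.sum((g_total[mask] - expected[mask]) ** 2 / expected[mask]))
    return c > threshold, c


def search(query, database, threshold):
    """Return the ids of the reference shots judged to depict the queried object."""
    return [ref_id for ref_id, ref in database.items()
            if compare_shots(query, ref, threshold)[0]]
```

In this sketch, a query shot whose interest points are a similarity transform of a reference shot's points yields LDRs concentrated in a single bin and hence a large c, while a reference image with too few descriptors to match contributes nothing to either global representation.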
[0092] According to a further embodiment of the present invention illustrated in
[0093] The previous description presents and discusses in detail several embodiments of the present invention; nevertheless, several changes to the described embodiments, as well as different embodiments of the invention, are possible without departing from the scope defined by the appended claims.
[0094] For example, although reference has been made in the present description to the log distance ratio (LDR), similar considerations apply if the histograms are constructed with a different distance ratio, such as a plain distance ratio without the logarithm; moreover, similar considerations apply if the histograms are constructed with multiples and/or powers of the log distance ratio.
[0095] Moreover, the concepts of the present invention can be applied even if the widths of the bins of the histograms differ from each other.