Method and apparatus for providing signatures of audio/video signals and for making use thereof

10116838 · 2018-10-30

Abstract

A method and apparatus are disclosed for providing a video signature representative of a content of a video signal. A method and apparatus are further disclosed for providing an audio signature representative of a content of an audio signal. A method and apparatus for detecting lip sync are further disclosed and take advantage of the method and apparatus disclosed for providing a video signature and an audio signature.

Claims

1. A method for setting a signal delay based on generated video signatures representative of a content of a video signal, the method comprising: for each of a first video signal and second video signal comprising the first signal after at least one transmission operation: selecting, by a signature extraction unit, a first subset of pixels of a first image of the video signal and a corresponding second subset of pixels of a second image of the video signal, each of the first subset and second subset excluding one or more pixels of the corresponding image, incrementing, by a comparator of the signature extraction unit for each pixel of the first subset of pixels, a counter value responsive to a difference between pixel data of a pixel of the first subset of pixels and pixel data of a corresponding pixel of the second subset of pixels exceeding a threshold, dividing, by the signature extraction unit, the counter value by a value proportional to the number of the plurality of pixels, and generating, by the signature extraction unit, a video signature comprising the divided counter value; identifying a delay between the first video signal and second video signal based on a comparison of the video signature of the first video signal and the video signature of the second video signal; and automatically setting a signal delay based on the identified delay.

2. The method of claim 1, wherein incrementing the counter further comprises selecting the first image and the second image from a window of the video signal.

3. A video signature analyzer for setting a signal delay based on generated video signature signals representative of a content of a video signal, comprising: a video signature extraction unit comprising: a counter, a windowing unit configured to receive a video signal and select a first image and a second image from a window of the video signal, a first image pixel data providing unit configured to select a subset of pixels of the first image, a second image pixel data providing unit configured to select a corresponding subset of pixels of the second image, a comparator configured to, for each pixel of the subset of pixels, increment a value of the counter responsive to a difference between pixel data of a pixel of the first image and pixel data of a corresponding pixel of the second image exceeding a threshold, and a signature providing unit configured to divide the counter value by a value proportional to the number of the plurality of pixels, and generate a signature comprising the divided counter value; and a video signature analysis unit configured to: receive, from the signature providing unit, a first signature representative of a first video signal and a second signature representative of the first video signal after at least one transmission operation, identify a delay between the first video signal and second video signal based on a comparison of the first signature and the second signature; and automatically set a video signal delay based on the identified delay.

4. A method for setting a signal delay based on generated audio signatures representative of a content of an audio signal, the method comprising: for each of a first audio signal and a second audio signal comprising the first audio signal after at least one transmission operation: filtering the audio signal via a first filter to generate a first filtered audio signal and via a second filter different from the first filter to generate a second filtered audio signal, assigning a plurality of filtered audio comparison values responsive to a corresponding plurality of comparisons of the first filtered audio signal to the second filtered audio signal, the filtered audio comparison values comprising a first predetermined value responsive to the first filtered audio signal exceeding the second filtered audio signal, and comprising a second predetermined value responsive to the second filtered audio signal equaling or exceeding the first filtered audio signal, and decimating the plurality of assigned filtered audio comparison values to generate an audio signature representative of the content of the audio signal; identifying a delay between the first audio signal and second audio signal based on a comparison of the audio signature of the first audio signal and the audio signature of the second audio signal; and automatically setting a signal delay based on the identified delay.

5. The method of claim 4, wherein filtering the audio signal to generate the first filtered audio signal comprises detecting an envelope signal of the audio signal.

6. The method of claim 4, wherein filtering the audio signal further comprises calculating an absolute value of the audio signal.

7. The method of claim 4, wherein decimating the plurality of assigned values further comprises removing a subset of the plurality of assigned values.

8. An audio signature analyzer for setting a signal delay based on generated audio signature signals representative of a content of an audio signal, the audio signature extraction unit comprising: a first filter configured to filter an audio signal to generate a first filtered audio signal; a second filter, different from the first filter, configured to filter the audio signal to generate a second filtered audio signal; a comparator configured to assign a plurality of filtered audio comparison values responsive to a corresponding plurality of comparisons of the first filtered audio signal to the second filtered audio signal, the filtered audio comparison values comprising a first predetermined value responsive to the first filtered audio signal exceeding the second filtered audio signal, and comprising a second predetermined value responsive to the second filtered audio signal equaling or exceeding the first filtered audio signal; a decimator configured to decimate the plurality of assigned filtered audio comparison values to generate an audio signature representative of the content of the audio signal; and a signature analysis unit configured to: receive, from the decimator, a first signature representative of a first audio signal and a second signature representative of the first audio signal after at least one transmission operation, identify a delay between the first audio signal and second audio signal based on a comparison of the first signature and the second signature; and automatically set an audio signal delay based on the identified delay.

9. The audio signature extraction unit of claim 8, wherein the first filter is further configured to detect an envelope signal of the audio signal.

10. The audio signature extraction unit of claim 8, further comprising a third filter configured to calculate an absolute value of the audio signal.

11. The audio signature extraction unit of claim 8, wherein the decimator is further configured to remove a subset of the plurality of assigned values.

12. A method for determining a delay in a signal induced through transmission operations, the method comprising: for each of a first video signal and a second video signal comprising the first video signal after at least one transmission operation: incrementing, by a comparator of one or more signature extraction units, a counter value responsive to a difference between pixel data of a first pixel of a plurality of pixels of a first image of said video signal and pixel data of a corresponding first pixel of a corresponding plurality of pixels of a second image of said video signal exceeding a threshold, dividing, by the one or more signature extraction units, the counter value by a value proportional to the number of the plurality of pixels, and generating, by the one or more signature extraction units, a video signature signal for said video signal comprising the divided counter value, the video signature representative of content of said video signal; performing, by a signature analysis unit, a convolution of the first video signature signal representative of content of the first video signal and the second video signature signal representative of content of the first video signal processed by one or more intermediary processors to generate a first convolution signal; calculating, by the signature analysis unit, a video matching factor inversely proportional to a minimum value of the first convolution signal divided by a sum of one of the first and second video signature signals; determining, by the signature analysis unit, that the first video signal and second video signal are correlated, responsive to the value of the video matching factor being above a first predetermined threshold; identifying, by the signature analysis unit, a video delay between the correlated first video signal and correlated second video signal, responsive to the determination, the video delay representative of a delay induced during processing of the first video signal by the one or
more intermediary processors; and automatically setting a signal delay based on the identified video delay.

13. The method of claim 12, further comprising: for each of a first audio signal associated with the first video signal and a second audio signal associated with the second video signal comprising the first audio signal after the at least one transmission operation: assigning, by the one or more signature extraction units, a plurality of values responsive to a corresponding plurality of comparisons of a first filtering of said audio signal to a second filtering of said audio signal, the second filtering different from the first filtering, and decimating the plurality of assigned values to generate an audio signature signal representative of content of said audio signal; performing, by the signature analysis unit, a convolution of the first audio signature signal representative of content of the first audio signal and second audio signature signal representative of content of the second audio signal to generate a second convolution signal; calculating, by the signature analysis unit, an audio matching factor inversely proportional to a minimum value of the second convolution signal divided by a size of one of the first and second audio signature signals; determining, by the signature analysis unit, that the first audio signature signal and second audio signature signal are correlated, responsive to the value of the audio matching factor being above a second predetermined threshold; and identifying, by the signature analysis unit, an audio delay between the correlated first audio signal and correlated second audio signal, responsive to the determination, the audio delay representative of a delay induced during the at least one transmission operation.

14. The method of claim 13, further comprising: determining, by the signature analysis unit, that the identified video delay is different from the identified audio delay; and wherein automatically setting a signal delay based on the identified video delay further comprises automatically setting a signal delay based on a difference between the identified video delay and the identified audio delay.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) In order that the invention may be readily understood, embodiments of the invention are illustrated by way of example in the accompanying drawings.

(2) FIG. 1 is a flowchart which shows an embodiment of a method for generating a signature representative of a content of a video signal;

(3) FIG. 2 is a flowchart which shows an embodiment of a method for generating a signature representative of a content of an audio signal;

(4) FIG. 3A is a block diagram which shows an embodiment of an apparatus for generating a video signature representative of a content of a video signal;

(5) FIG. 3B is a schematic which shows how the video signature representative of a content of a video signal is generated according to one embodiment;

(6) FIG. 3C is a block diagram which shows an embodiment of a video signature extraction unit;

(7) FIG. 4A is a block diagram which shows an embodiment of an apparatus for generating an audio signature representative of a content of an audio signal;

(8) FIG. 4B is a block diagram which shows an embodiment of an apparatus for generating an audio signature representative of a content of an audio signal; in this embodiment, the apparatus for generating an audio signature representative of a content of an audio signal comprises, inter alia, an envelope detector, a mean detector and a comparator;

(9) FIG. 5 is a block diagram of an embodiment of an apparatus comprising an audio signal signature extraction unit, a video signal signature extraction unit and a signature analysis unit for providing, inter alia, an indication of a lip sync;

(10) FIG. 6 is a block diagram which shows an embodiment of the signature analysis unit disclosed in FIG. 5;

(11) FIG. 7A is a schematic which shows an embodiment of two streams of video signature signals delayed in time;

(12) FIG. 7B is a block diagram which shows an embodiment of a video signature analysis unit used for analyzing two video signature signals;

(13) FIG. 7C is a graph which shows an embodiment of a variation of the result of the convolution of a first video signature signal with a second video signature signal;

(14) FIG. 8A is a schematic which shows an embodiment of two streams of audio signature signals delayed in time;

(15) FIG. 8B is a block diagram which shows an embodiment of an audio signature analysis unit used for analyzing two audio signature signals; and

(16) FIG. 8C is a graph which shows an embodiment of a variation of the result of the convolution of a first audio signature signal with a second audio signature signal.

(17) Further details of the invention and its advantages will be apparent from the detailed description included below.

DETAILED DESCRIPTION

(18) In the following description of the embodiments, references to the accompanying drawings are by way of illustration of an example by which the invention may be practiced. It will be understood that other embodiments may be made without departing from the scope of the invention disclosed.

(19) Now referring to FIG. 1, there is shown an embodiment of a method for generating a video signature representative of a content of a video signal. It will be appreciated that generating a video signature representative of a content of a video signal may be of great advantage for various reasons as further explained herein below.

(20) According to processing step 102, pixel data of each of a plurality of given pixels in a first image are received.

(21) It will be appreciated that in one embodiment, the image originates from a digital video stream. Alternatively, the image may originate from an analog video stream. In other embodiments, the image may originate from a file-based source.

(22) The skilled addressee will appreciate that the given pixels are pixels that may be selected in the image according to various criteria. For instance, the given pixels may be selected according to spatial criteria. For instance, it has been contemplated that some parts of an image may be of less interest for the purpose of generating a signature.

(23) This may be the case for instance for parts of an image where no significant changes occur. Alternatively, it may be the case for parts of an image resulting from a format conversion. Alternatively, those parts of an image may be parts of images that have been modified for branding reasons.

(24) In one embodiment a windowing may be accordingly performed on a part of interest of an image in order to remove those parts of the image that have a limited interest.

(25) In a preferred embodiment, the given pixels are pixels each separated from one another by a given amount of pixels. The amount of given pixels does not change when the resolution of the video signal changes in the preferred embodiment. Alternatively, the amount of given pixels changes when the resolution of the video signal changes.

(26) According to processing step 104, pixel data of each of a plurality of given pixels in a second image are received.

(27) It will be appreciated that the second image is an image following in time the first image in the video signal.

(28) In a preferred embodiment, the second image is immediately following the first image in the video signal.

(29) According to processing step 106, a comparison is performed.

(30) In one embodiment, the comparison is performed for each given pixel of the plurality of given pixels.

(31) More precisely, corresponding pixel data of the given pixel of the first image is compared with the corresponding pixel data of the given pixel in the second image in order to provide a corresponding indication of a difference between the pixel data of the given pixels of the first image and the pixel data of the given pixel in the second image.

(32) In one embodiment, the comparison is a subtraction of the pixel data of the given pixel of the first image with corresponding pixel data of the given pixel of the second image. The comparison is performed for each given pixel of the plurality of given pixels.

(33) Alternatively the comparison may be a combination of operations involving the pixel data of each of the given pixels of the first image with corresponding pixel data of each of the given pixels of the second image.

(34) According to processing step 108, a counter value is incremented based on the result of the comparison.

(35) In one embodiment, the counter value is incremented in the case where the result of the operation is greater than a given threshold value.

(36) The skilled addressee will appreciate that various embodiments of the given threshold value may be provided. In a preferred embodiment, the threshold value is equal to thirty two (32) on an 8-bit precision video pixel.

(37) Moreover, it will be appreciated that the given threshold value may be provided according to various criteria such as a type of video signal.

(38) According to processing step 110, an indication of the counter value is provided. The indication of the counter value is used as the video signature representative of a content of the video signal.

(39) It will be appreciated that in one embodiment, the processing step of providing the indication of a counter value may comprise normalizing the counter value to provide a counter value limited by a given value.

(40) In a preferred embodiment, a normalizing of the counter value is performed by dividing the counter value by (N/15) where N is the number of given pixels.
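The counting and normalization of processing steps 106 through 110 can be sketched as follows. This is an illustration only, not part of the disclosure: the function name and the flat lists of selected 8-bit pixel values are assumptions, while the threshold of thirty two (32) and the division by N/15 follow the preferred embodiment described above.

```python
def video_signature(first_image, second_image, threshold=32):
    """Count the given pixels whose frame-to-frame difference exceeds
    a threshold, then normalize the count by N/15 (N = pixel count).

    first_image, second_image: equal-length sequences of 8-bit pixel
    values (the "given pixels" selected from each image).
    """
    assert len(first_image) == len(second_image)
    n = len(first_image)
    counter = 0
    for p_k, c_k in zip(first_image, second_image):
        # Increment the counter when the difference exceeds the threshold.
        if abs(p_k - c_k) > threshold:
            counter += 1
    # Normalize so the signature is limited by a given value (15 here).
    return counter / (n / 15)

# Example: 4 of 30 selected pixels change by more than 32 levels.
sig = video_signature([0] * 30, [100] * 4 + [0] * 26)  # 4 / (30/15) = 2.0
```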

(41) The skilled addressee will appreciate that the method disclosed for providing a video signature representative of a content of the video signal is of great advantage for various reasons.

(42) An advantage of the method disclosed is that such method may be used for providing information, i.e. a signature, for characterizing the content of the video signal. When signatures are used for characterizing each of two video signals, those signatures may be advantageously correlated and in turn be used for detecting a delay between the two video signals which is of great advantage.

(43) Another advantage of the method disclosed is that it requires limited processing resources for generating the signature, which is therefore of great advantage. Such method does not require any complex algorithm for analyzing the content of the video signal.

(44) Another advantage of the method is that its implementation requires limited storing resources.

(45) Now referring to FIG. 2, there is shown an embodiment of a method for generating an audio signature representative of a content of an audio signal. It will be appreciated that generating an audio signature representative of a content of an audio signal may be of great advantage for various reasons as further explained below.

(46) According to processing step 202, an audio signal is received. It will be appreciated that the audio signal may be received from various sources. For instance, the audio signal may be received from an audio stream. Alternatively, the audio signal may be received from a file. It will further be appreciated that the audio signal may be embedded in a video stream.

(47) Moreover, it will be appreciated that the audio signal may be provided in various forms such as in a digital format or in an analog format. Moreover it will be appreciated that the audio signal may be formatted according to various standards known to the skilled addressee.

(48) According to processing step 204, a first filtering of the received audio signal is performed.

(49) In one embodiment, the first filtering of the received audio signal comprises detecting an envelope of the audio signal.

(50) According to processing step 206, a second filtering of the received audio signal is performed.

(51) In one embodiment, the second filtering of the received audio signal comprises computing an average value of the received audio signal.

(52) It will be appreciated by the skilled addressee that processing steps 204 and 206 may be performed in parallel. Alternatively, processing steps 204 and 206 may be performed serially.

(53) According to processing step 208, the first filtered audio signal is compared to the second filtered audio signal.

(54) It will be appreciated that the comparison may be a combination of operations involving the first filtered audio signal and the second filtered audio signal.

(55) In a preferred embodiment, the comparison comprises checking if the first filtered audio signal is greater than the second filtered audio signal.

(56) According to processing step 210, a value is assigned depending on the result of the comparison. It will be appreciated that the value may be any type of value.

(57) In a preferred embodiment, the value is a binary value. Still in a preferred embodiment, binary value one (1) is assigned if the first filtered audio signal is greater than the second filtered audio signal while binary value zero (0) is assigned if the second filtered audio signal is greater than or equal to the first filtered audio signal.

(58) According to processing step 212, the assigned value is provided. The assigned value is used as the audio signature representative of a content of the audio signal.

(59) It will be appreciated that in a preferred embodiment the providing of the assigned value may comprise a decimation processing step. The skilled addressee will appreciate that the purpose of the decimation processing step is to remove a given amount of unwanted/redundant data. The skilled addressee will also appreciate that this will further result in a signature having a shorter size which is also of great advantage.
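Processing steps 204 through 212 can be sketched compactly as follows. This is an illustrative sketch under assumptions: the one-pole smoothing coefficients, the function name and the list representation are invented for the illustration, while the binary assignment and the take-one-sample, ignore-the-rest decimation follow the preferred embodiments described herein.

```python
def audio_signature(samples, alpha_env=0.9, alpha_mean=0.99, decimation=52):
    """Filter the rectified audio with two different one-pole filters,
    assign a binary value per sample (1 if the first filtered signal
    exceeds the second, 0 otherwise), then decimate by keeping one
    value out of every `decimation` values.
    """
    env = 0.0   # first filtering: faster-tracking envelope estimate
    mean = 0.0  # second filtering: slower-tracking mean estimate
    bits = []
    for s in samples:
        a = abs(s)  # absolute value of the audio signal
        env = alpha_env * env + (1 - alpha_env) * a
        mean = alpha_mean * mean + (1 - alpha_mean) * a
        bits.append(1 if env > mean else 0)
    # Decimation: take a sample and ignore the (decimation - 1) following ones.
    return bits[::decimation]
```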

(60) The skilled addressee will appreciate that the method disclosed for providing an audio signature representative of a content of the audio signal is of great advantage for various reasons.

(61) An advantage of the method disclosed is that such method may be used for providing information, i.e. a signature, for characterizing the content of the audio signal. When signatures are used for characterizing each of two audio signals, those two signatures may be advantageously correlated and in turn be used for detecting a delay between the two audio signals which is of great advantage.

(62) Another advantage of the method disclosed above is that it requires limited processing resources for generating the signature which is of great advantage.

(63) A further advantage is that the implementation of the method disclosed may require limited memory resource which is also of great advantage.

(64) Now referring to FIG. 3a, there is shown an embodiment of an apparatus 300 for providing a video signature.

(65) The apparatus 300 for providing a video signature comprises an optional windowing unit 302 and a video signature extraction unit 304.

(66) More precisely, the optional windowing unit 302 is used for performing a windowing of an incoming image signal V.sub.i. The optional windowing unit 302 provides a corresponding selected signal W.sub.i. The skilled addressee will appreciate that the optional windowing unit 302 may be implemented according to various embodiments known to the skilled addressee.

(67) The apparatus 300 for providing a video signature further comprises the video signature extraction unit 304.

(68) The video signature extraction unit 304 is used for extracting a video signature from an incoming video signal. In this embodiment, the video signature extraction unit 304 receives the selected signal W.sub.i provided by the optional windowing unit 302 and provides a video signature signal VS.sub.i.

(69) Now referring to FIG. 3B, there is shown an embodiment which shows how the video signature extraction unit 304 operates according to an embodiment.

(70) In accordance with this embodiment, a corresponding pixel data P.sub.k of a pixel k of a plurality of given pixels 1 to N of a first image is compared with a corresponding pixel data C.sub.k of the corresponding given pixel k in a second image in order to provide a corresponding indication of a difference |P.sub.k−C.sub.k| between the pixel data of the given pixel of the first image and the pixel data of the given pixel in the second image.

(71) An absolute value |P.sub.k−C.sub.k| of the result of the comparison is taken. In the case where the absolute value of the result is larger than a given threshold value, binary value one (1) is assigned. If this is not the case, binary value zero (0) is assigned. Such operation is performed for each pixel of the plurality of given pixels.

(72) Moreover, an optional normalization of the result may be performed. The video signature signal may be therefore defined as:

(73) VS.sub.i(f) = [Σ.sub.k=1.sup.N (1 if |P.sub.k−C.sub.k| > 32, 0 otherwise)]/(N/15)

(74) The skilled addressee will appreciate that various other alternative embodiments may be provided.

(75) Now referring to FIG. 3C, there is shown an embodiment of the video signature extraction unit 304.

(76) The video signature extraction unit 304 comprises a first image pixel data providing unit 306, a second image pixel data providing unit 308, a comparator 310, a counter 312 and a video signature providing unit 314.

(77) The first image pixel data providing unit 306 is used for receiving the selected signal W.sub.i provided by the optional windowing unit 302 and for providing a corresponding pixel data P.sub.k of a pixel k of a plurality of given pixels 1 to N of the first image.

(78) Similarly, the second image pixel data providing unit 308 is used for receiving the selected signal W.sub.i provided by the optional windowing unit 302 and for providing a corresponding pixel data C.sub.k of the corresponding given pixel k in the second image.

(79) The comparator 310 is used for receiving the corresponding pixel data P.sub.k of a pixel k of a plurality of given pixels 1 to N of the first image and the corresponding pixel data C.sub.k of the corresponding given pixel k in the second image and for comparing the corresponding pixel data P.sub.k with the corresponding pixel data C.sub.k.

(80) In one embodiment the comparator 310 outputs a logic value one (1) if |P.sub.k−C.sub.k| is greater than thirty two (32) and a logic value zero (0) otherwise.

(81) The counter 312 receives the output from the comparator 310 and provides a signal indicative of a number of logic value ones received by the counter 312.

(82) The video signature providing unit 314 receives the signal indicative of a number of logic value ones received by the counter 312 and provides the video signature signal VS.sub.i. In one embodiment, the video signature providing unit 314 performs a division of the signal indicative of a number of logic value ones received by the counter 312 by N/15 wherein N is the number of given pixels.
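The arrangement of FIG. 3C can also be mirrored structurally. In this hypothetical sketch the class and method names are invented for illustration only; the threshold of thirty two (32) and the division by N/15 come from the embodiments described above.

```python
class VideoSignatureExtractionUnit:
    """Mirrors FIG. 3C: the comparator feeds the counter, and the
    signature providing unit divides the count by N/15."""

    def __init__(self, threshold=32):
        self.threshold = threshold
        self.count = 0      # counter (312)
        self.n_pixels = 0   # number N of given pixels seen

    def compare(self, p_k, c_k):
        # Comparator (310): logic one if |P_k - C_k| exceeds the threshold.
        self.n_pixels += 1
        if abs(p_k - c_k) > self.threshold:
            self.count += 1  # counter increments on each logic one

    def signature(self):
        # Signature providing unit (314): divide the count by N/15.
        return self.count / (self.n_pixels / 15)
```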

(83) Now referring to FIG. 4A, there is shown an embodiment of an apparatus 402 for generating an audio signature representative of a content of an audio signal.

(84) In this embodiment, the apparatus 402 for generating an audio signature representative of a content of an audio signal receives an audio signal A.sub.i and provides a corresponding audio signature signal AS.sub.i.

(85) It will be appreciated that the audio signal A.sub.i may be of various forms such as in a digital format or in an analog format. Moreover it will be appreciated that the audio signal may be formatted according to various standards as already explained above.

(86) Similarly, it will be appreciated that the corresponding audio signature signal AS.sub.i may be of various forms such as in a digital format or in an analog format. Moreover it will be appreciated that the corresponding audio signature signal AS.sub.i may be formatted according to various standards as explained above.

(87) Now referring to FIG. 4B, there is shown an embodiment of the apparatus 402 for generating an audio signature representative of a content of an audio signal.

(88) In this embodiment, the apparatus 402 for generating an audio signature representative of a content of an audio signal comprises an absolute value providing unit 404, an envelope detector 406, a mean detector 408, a comparator 410 and a decimator 412.

(89) The absolute value providing unit 404 is used to provide a signal indicative of an absolute value of the audio signal A.sub.i.

(90) The envelope detector 406 is used to provide a signal E.sub.s indicative of an envelope of the signal indicative of an absolute value of the value of the audio signal A.sub.i. It will be appreciated that the envelope detector 406 is an embodiment of a first filtering unit.

(91) Moreover, it will be appreciated that the envelope detector 406 may be implemented in various ways as known by the skilled addressee.

(92) In a preferred embodiment, the envelope detector 406 is implemented using a one-tap Infinite Impulse Response (IIR) filter.

(93) The mean detector 408 is used to provide a signal M.sub.s indicative of a mean of the signal indicative of an absolute value of the value of the audio signal A.sub.i. It will be appreciated that the mean detector 408 is an embodiment of a second filtering unit.

(94) Moreover, it will be appreciated that the mean detector 408 may be implemented in various ways as known by the skilled addressee.

(95) In a preferred embodiment, the mean detector 408 is implemented using a one-tap Infinite Impulse Response (IIR) filter.

(96) The comparator 410 is used for making a comparison between two incoming signals. More precisely, the comparator 410 receives the signal M.sub.s indicative of a mean of the signal indicative of an absolute value of the value of the audio signal A.sub.i and the signal E.sub.s indicative of an envelope of the signal indicative of an absolute value of the value of the audio signal A.sub.i and performs a comparison between those two signals.

(97) If the signal E.sub.s indicative of an envelope of the signal indicative of an absolute value of the value of the audio signal A.sub.i is greater than the signal M.sub.s indicative of a mean of the signal indicative of an absolute value of the value of the audio signal A.sub.i, a one (1) logic value signal is outputted by the comparator 410. Otherwise a zero (0) logic value signal is outputted by the comparator 410.

(98) The skilled addressee will appreciate that various other types of comparison may be made.

(99) Moreover, it will be appreciated that the comparator 410 may be implemented in various ways.

(100) The decimator 412 is used for performing decimation on a signal provided by the comparator 410. The decimator 412 provides the audio signature signal AS.sub.i.

(101) It will be appreciated that the decimator 412 is optional.

(102) Moreover, it will be appreciated that the decimator 412 may be implemented in various ways as known by the skilled addressee. In one embodiment, a decimation by fifty two (52) is performed by the decimator 412.

(103) In a preferred embodiment, the decimator 412 takes a sample and ignores fifty one (51) following samples. Alternatively, the sample is taken every one (1) ms. Other methods may alternatively be used. For instance, those methods may take into consideration the value of each sample for selecting a given sample.
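The comparator of paragraphs (96)-(97) and the decimator of paragraphs (100)-(103) can be sketched together: the signature bit is one (1) when the envelope exceeds the mean, zero (0) otherwise, and the bit stream is then decimated by keeping one sample and ignoring the following ones. The default decimation factor of fifty-two (52) follows paragraph (102).

```python
def audio_signature(e_s, m_s, decimation=52):
    """Comparator 410 followed by decimator 412: 1 where envelope > mean, decimated."""
    bits = [1 if e > m else 0 for e, m in zip(e_s, m_s)]  # comparator 410
    return bits[::decimation]                             # decimator 412: keep 1 of `decimation`
```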

(104) Now referring to FIG. 5, there is shown an embodiment of an apparatus 500 which uses an audio signal signature extraction unit and a video signal signature extraction unit for providing, inter alia, an indication of a lip sync.

(105) In this embodiment, a first audio/video content is to be compared to a second audio/video content.

(106) More precisely, the first audio/video content comprises a first audio signal and a first video signal while the second audio/video content comprises a second audio signal and a second video signal.

(107) As shown below, by carefully comparing the signatures of the first audio/video content with the signatures of the second audio/video content it is possible to provide an indication of a lip sync, i.e. to provide an indication of a desynchronization that occurred between the video signal and its corresponding audio signal. As known by the skilled addressee, such desynchronization may occur for various reasons such as, for instance, for reasons associated with the transmission of the signals over a medium.

(108) More precisely and as shown in FIG. 5, the apparatus 500 comprises a first signature extraction unit 502, a second signature extraction unit 504 and a signature analysis unit 506.

(109) The first signature extraction unit 502 is used for providing a first video signature signal VS.sub.1 and a first audio signature signal AS.sub.1 of respectively a first video signal V.sub.1 and a first audio signal A.sub.1 associated with the first video signal V.sub.1.

(110) The first signature extraction unit 502 comprises, although not shown in FIG. 5, an audio signature extraction unit and a video signature extraction unit each responsible for respectively receiving the first audio signal A.sub.1 and providing the first audio signature signal AS.sub.1 and receiving the first video signal V.sub.1 and providing the first video signature signal VS.sub.1. Such video signature extraction unit and audio signature extraction unit have been already described above.

(111) The second signature extraction unit 504 is used for providing a second video signature signal VS.sub.2 and a second audio signature signal AS.sub.2 of respectively a second video signal V.sub.2 and a second audio signal A.sub.2 associated with the second video signal V.sub.2.

(112) The second signature extraction unit 504 comprises, although not shown in FIG. 5, an audio signature extraction unit and a video signature extraction unit each responsible for respectively receiving the second audio signal A.sub.2 and providing the second audio signature signal AS.sub.2 and receiving the second video signal V.sub.2 and providing the second video signature signal VS.sub.2. Such video signature extraction unit and audio signature extraction unit have also been already described above.

(113) The signature analysis unit 506 is used for receiving the first video signature signal VS.sub.1, the first audio signature signal AS.sub.1, the second video signature signal VS.sub.2 and the second audio signature signal AS.sub.2.

(114) The signature analysis unit 506 provides a signal VD indicative of a video delay, a signal AD indicative of an audio delay, a signal LS indicative of a lip sync, a signal VMF indicative of a video matching factor and a signal AMF indicative of an audio matching factor.

(115) The signal VD indicative of a video delay is generated by comparing the first video signature signal VS.sub.1 with the second video signature signal VS.sub.2 and by determining a delay between those two signals as shown further below.

(116) Similarly, the signal AD indicative of an audio delay is generated by comparing the first audio signature signal AS.sub.1 with the second audio signature signal AS.sub.2 and by determining a delay between those two signals as shown further below.

(117) The signal VMF indicative of a video matching factor and the signal AMF indicative of an audio matching factor are generated as further explained below.

(118) Finally and in one embodiment, the signal LS indicative of a lip sync is generated by comparing the signal VD indicative of a video delay with the signal AD indicative of an audio delay.

(119) In the implementation described herein the signal VMF indicative of a video matching factor and the signal AMF indicative of an audio matching factor are further used for generating the signal LS indicative of a lip sync.

(120) While the embodiment disclosed in FIG. 5 has been shown to be used for lip sync detection, the skilled addressee will appreciate that such an embodiment may be advantageously used for content comparison for instance.

(121) Alternatively, the embodiment may be used for detecting pirated copies. In an alternative embodiment, the method disclosed may be used for checking that a given ad has been properly inserted in a video signal. This may be done by comparing the signatures of the audio/video signal carrying the ad and the signatures of the ad per se.

(122) Now referring to FIG. 6, there is shown an embodiment of the signature analysis unit 506 disclosed in FIG. 5.

(123) The signature analysis unit 506 comprises a video signature analysis unit 602, an audio signature analysis unit 604 and a video signature correlation analysis unit 606.

(124) The video signature analysis unit 602 is used for determining an estimation VD of the signal indicative of a video delay between the first video signature signal VS.sub.1 and the second video signature signal VS.sub.2.

(125) The video signature analysis unit 602 is further used for determining an estimation VMF of a signal indicative of a video matching factor.

(126) Similarly, the audio signature analysis unit 604 is used for determining an estimation AD of the signal indicative of an audio delay between the first audio signature signal AS.sub.1 and the second audio signature signal AS.sub.2.

(127) The audio signature analysis unit 604 is further used for determining an estimation AMF of a signal indicative of an audio matching factor.

(128) The video signature correlation analysis unit 606 receives the estimation VD of the signal indicative of a video delay between the first video signature signal VS.sub.1 and the second video signature signal VS.sub.2, the estimation AD of the signal indicative of an audio delay between the first audio signature signal AS.sub.1 and the second audio signature signal AS.sub.2, the estimation VMF of a signal indicative of a video matching factor and the estimation AMF of a signal indicative of an audio matching factor.

(129) The video signature correlation analysis unit 606 provides the signal LS indicative of a lip sync, the signal VD indicative of a video delay, the signal AD indicative of an audio delay, the signal VMF indicative of a video matching factor and the signal AMF indicative of an audio matching factor.

(130) In one embodiment, the signal LS indicative of a lip sync is defined as LS=(VD−AD) if VMF>threshold1 & AMF>threshold2, wherein threshold1 and threshold2 are given values. In one embodiment, the threshold1 is equal to 50% while the threshold2 is equal to 40%. The skilled addressee will appreciate that various other values may be used.
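The lip-sync decision of paragraph (130) can be sketched as follows, reading the expression as LS = VD − AD gated on both matching factors exceeding their thresholds (50% for video, 40% for audio, per the values given above):

```python
THRESHOLD_VMF = 50.0  # threshold1, in percent
THRESHOLD_AMF = 40.0  # threshold2, in percent

def lip_sync(vd, ad, vmf, amf):
    """Return the lip-sync offset VD - AD, or None when the contents do not match."""
    if vmf > THRESHOLD_VMF and amf > THRESHOLD_AMF:
        return vd - ad
    return None
```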

(131) The skilled addressee will appreciate that the signal VD indicative of a video delay and the signal AD indicative of an audio delay are generated using respectively at least the estimation VD of the signal indicative of a video delay between the first video signature signal VS.sub.1 and the second video signature signal VS.sub.2 and the estimation AD of the signal indicative of an audio delay between the first audio signature signal AS.sub.1 and the second audio signature signal AS.sub.2. In fact, the skilled addressee will appreciate that estimations are first computed prior to providing the values for the sake of validating the values.

(132) Now referring to FIG. 7A, there are shown a graph 702 showing an example of the first video signature signal VS.sub.1 and a graph 704 showing an example of the second video signature signal VS.sub.2.

(133) It will be appreciated that the first video signature signal VS.sub.1 and the second video signature signal VS.sub.2 are desynchronized in time by an amount of time corresponding to the estimation VD of the signal indicative of a video delay between the first video signature signal VS.sub.1 and the second video signature signal VS.sub.2. Still in this embodiment, the first video signature signal VS.sub.1 is delayed in time compared to the second video signature signal VS.sub.2.

(134) Now referring to FIG. 7B, there is shown an embodiment of the video signature analysis unit 602.

(135) The video signature analysis unit 602 comprises a convolution unit 706 and a minimum detection and delay extraction unit 708.

(136) The convolution unit 706 is used for performing a convolution of the first video signature signal VS.sub.1 and the second video signature signal VS.sub.2. In one embodiment the convolution is a time shifted convolution of the second video signature signal VS.sub.2 window across the first video signature signal VS.sub.1 window.

(137) In one embodiment the second video signature signal VS.sub.2 window has a value of twenty (20) seconds while the first video signature signal VS.sub.1 window has a value of thirty (30) seconds.

(138) The convolution unit 706 provides a convolution signal E.sub.V(t).

(139) It will be appreciated by the skilled addressee that the convolution may be alternatively performed in various other ways. Moreover, it will be appreciated that the convolution unit 706 may be implemented in various ways.

(140) The minimum detection and delay extraction unit 708 is used for detecting a minimum in the convolution signal E.sub.V(t). It will be appreciated that the minimum in the convolution signal E.sub.V(t) is indicative of the estimation VD of the signal indicative of a video delay between the first video signature signal VS.sub.1 and the second video signature signal VS.sub.2.

(141) The minimum detection and delay extraction unit 708 is further used for providing the estimation VMF of a signal indicative of a video matching factor.

(142) It will be appreciated that the signal VMF indicative of a video matching factor is indicative of a level of similarity measured in the video signatures.

(143) It will be appreciated that the signal estimation VMF of a signal indicative of a video matching factor is a function of the convolution signal E.sub.V(t) and the estimation VD of the signal indicative of a video delay between the first video signature signal VS.sub.1 and the second video signature signal VS.sub.2.

(144) In a preferred embodiment, the signal estimation VMF of a signal indicative of a video matching factor is defined as

(145) VMF=100*(1−(Min(E.sub.V(t))/Sum(VS.sub.2))), wherein VMF is the estimated video matching factor between the first video signature signal VS.sub.1 and the second video signature signal VS.sub.2 and is expressed in percentage, Min(E.sub.V(t)) is the minimum error found in the video correlation graph disclosed at FIG. 7C, and Sum(VS.sub.2) is the sum of the signature vector in the window W.sub.V2. It will be appreciated that in that embodiment, when V.sub.1 and V.sub.2 are exactly the same, VMF is equal to 100% while if one of the two video sources is altered, VMF will be reduced. In this embodiment, under 50%, the video sources are considered to be different.
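The video signature analysis of FIG. 7B (paragraphs (136)-(145)) can be sketched as follows. The "convolution" E.sub.V(t) is modelled here as the count of mismatching signature samples at each trial shift, which is one assumed way to realize the time-shifted comparison; the position of the minimum of E.sub.V(t) gives the delay estimation VD, and VMF follows the formula of paragraph (145).

```python
def video_delay_and_vmf(vs1, vs2):
    """Slide the shorter window vs2 across vs1; return (VD, VMF).

    vs1 and vs2 are the signature vectors of windows W_V1 and W_V2.
    """
    n = len(vs2)
    errors = []
    for shift in range(len(vs1) - n + 1):
        # E_V(t): total mismatch between vs2 and the aligned slice of vs1
        err = sum(abs(a - b) for a, b in zip(vs1[shift:shift + n], vs2))
        errors.append(err)
    vd = min(range(len(errors)), key=errors.__getitem__)  # shift where E_V(t) is minimal
    vmf = 100.0 * (1.0 - errors[vd] / sum(vs2))           # formula of paragraph (145)
    return vd, vmf
```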

(146) It will be appreciated that the minimum detection and delay extraction unit 708 may be implemented in various ways.

(147) Now referring to FIG. 8A, there are shown a graph 800 showing an example of the first audio signature signal AS.sub.1 and a graph 802 showing an example of the second audio signature signal AS.sub.2.

(148) It will be appreciated that the first audio signature signal AS.sub.1 and the second audio signature signal AS.sub.2 are desynchronized in time by an amount of time corresponding to the estimation AD of the signal indicative of an audio delay between the first audio signature signal AS.sub.1 and the second audio signature signal AS.sub.2. Still in this embodiment, the first audio signature signal AS.sub.1 is delayed in time compared to the second audio signature signal AS.sub.2.

(149) Now referring to FIG. 8B, there is shown an embodiment of audio signature analysis unit 604 for determining the estimation AD of the signal indicative of an audio delay between the first audio signature signal AS.sub.1 and the second audio signature signal AS.sub.2 and the estimation AMF of a signal indicative of an audio matching factor.

(150) The audio signature analysis unit 604 comprises a convolution unit 804 and a minimum detection and delay extraction unit 806.

(151) The convolution unit 804 is used for performing a convolution of the first audio signature signal AS.sub.1 and the second audio signature signal AS.sub.2. In one embodiment the convolution is a time shifted convolution of the second audio signature signal AS.sub.2 window across the first audio signature signal AS.sub.1 window.

(152) In one embodiment, the second audio signature signal AS.sub.2 window has a value of one (1) second while the first audio signature signal AS.sub.1 window has a value of ten (10) seconds. It will be appreciated that those values may be changed depending on various needs.

(153) The convolution unit 804 provides a convolution signal E.sub.A(t).

(154) It will be appreciated by the skilled addressee that the convolution may be alternatively performed in various other ways. Moreover, it will be appreciated that the convolution unit 804 may be implemented in various ways.

(155) The minimum detection and delay extraction unit 806 is used for detecting a minimum in the convolution signal E.sub.A(t). It will be appreciated that the minimum in the convolution signal E.sub.A(t) is indicative of the estimation AD of the signal indicative of an audio delay between the first audio signature signal AS.sub.1 and the second audio signature signal AS.sub.2.

(156) The minimum detection and delay extraction unit 806 is further used for providing the signal estimation AMF of a signal indicative of an audio matching factor.

(157) It will be appreciated that the signal estimation AMF of a signal indicative of an audio matching factor is indicative of a level of similarity measured in the audio signatures.

(158) It will be appreciated that the signal estimation AMF of a signal indicative of an audio matching factor is a function of the convolution signal E.sub.A(t) and the estimation AD of the signal indicative of an audio delay between the first audio signature signal AS.sub.1 and the second audio signature signal AS.sub.2.

(159) In a preferred embodiment, the signal estimation AMF of a signal indicative of an audio matching factor is defined as:

(160) AMF=100*(1−(Min(E.sub.A(t))/(Size(AS.sub.2)/5))), wherein AMF is the estimated audio matching factor between the first audio signature signal AS.sub.1 and the second audio signature signal AS.sub.2 expressed in percentage, Min(E.sub.A(t)) is the minimum error found in the audio correlation graph shown in FIG. 8C and Size(AS.sub.2) is the size of the audio signature vector in the window W.sub.A2. It will be appreciated that when A.sub.1 and A.sub.2 are exactly the same, the signal estimation AMF of a signal indicative of an audio matching factor is equal to 100%. If one of the two audio sources is altered, the signal estimation AMF of a signal indicative of an audio matching factor will be reduced. Still in this embodiment, when the signal estimation AMF of a signal indicative of an audio matching factor is under 40%, the audio sources are considered to be different.
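The audio matching factor of paragraph (160) can be sketched under the same mismatch-count model of E.sub.A(t) used above for video; Size(AS.sub.2) is the number of samples in the AS.sub.2 window, and the divisor of five comes directly from the formula as given.

```python
def amf_from_errors(e_a, as2_size):
    """AMF = 100 * (1 - min(E_A(t)) / (Size(AS_2) / 5)), per paragraph (160).

    e_a is the list of convolution errors E_A(t) over all trial shifts;
    as2_size is the size of the audio signature vector in window W_A2.
    """
    return 100.0 * (1.0 - min(e_a) / (as2_size / 5.0))
```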

(161) It will be appreciated that the minimum detection and delay extraction unit 806 may be implemented in various ways. For instance, the minimum detection and delay extraction unit 806 may first perform a pre-scan in order to select a first selection of convoluted samples. In one embodiment the first selection represents one-eighth (⅛) of all the convoluted samples. A search may then be performed in the selected convoluted samples.

(162) Although the above description relates to a specific preferred embodiment as presently contemplated by the inventor, it will be understood that the invention in its broad aspect includes mechanical and functional equivalents of the elements described herein.