System, method, and computer program for encoding and decoding a unique signature in a video file
09813725 · 2017-11-07
Assignee
Inventors
Cpc classification
H04N19/42
ELECTRICITY
G10L19/018
PHYSICS
H04N21/8352
ELECTRICITY
G06V20/46
PHYSICS
International classification
H04N19/467
ELECTRICITY
H04N19/42
ELECTRICITY
H04N21/8352
ELECTRICITY
Abstract
The present disclosure describes a system, method, and computer program for encoding and decoding a unique signature for a user in a video file, wherein the video file was created using a video format that does not specifically support embedding a unique signature in the video file. A unique signature, comprising a plurality of data bits, is associated with a user and divided into groups. For each group, a different sine wave is created for each bit in the group. The frequencies of the sine waves correspond to the types of bits, and the amplitudes of the sine waves indicate the values of the bits. Group signals are overdubbed on the infrasound range of the audio track of the video file. The unique signature is decoded from the video file by analyzing the frequencies and amplitudes of sine waves in the infrasound range of the audio track.
Claims
1. A method for encoding and decoding a unique signature for a user in a video file, wherein the video file was created using a video format that does not support embedding a unique signature in the video file and wherein the video file includes an audio track, the method comprising: encoding a unique signature into a video file by performing the following: associating a unique sequence of data bits with a user; dividing the sequence of data bits into a plurality of sequential groups, wherein each group is associated with a sequence number and wherein each group includes one or more sequence number bits and one or more data bits; for each group, creating a different sine wave for each bit in the group having a non-zero value, wherein each of the sine waves has (1) a frequency that represents whether the bit is a sequence number bit or a data bit, and, if there is more than one sequence number bit or data bit in the group, which sequence number or data bit and (2) an amplitude that represents the value of the bit; for each group, creating a group signal by combining the sine waves of the group, wherein the frequency of each group signal is in the infrasound range; removing frequencies below the infrasound range from an audio track in the video file; overdubbing the group signals onto the audio track, wherein gaps are left between the groups and wherein the group signals are not detectable to a person during play of the video file; and decoding the unique signature from the video file by performing the following: removing signals with frequencies above the infrasound range from the audio track of the video file; identifying each of the group signals in the remaining audio track using gaps in the remaining audio track to distinguish between groups; deconstructing each group signal into individual sine waves; identifying the individual sine waves in each group having frequencies corresponding to the frequencies of the sine waves used in the encoding step to represent 
the data bits and the sequence number bits; for each group, ascertaining the value of each sequence number bit and each data bit in the group from the amplitude of the sine waves having frequencies corresponding to said bits; determining the sequence number of each group from the value of the sequence number bits; and recreating the sequence of data bits by concatenating the data bits in the groups in order of the sequence numbers of the groups.
2. The method of claim 1, wherein the unique sequence is a digital sequence of up to 123 bits.
3. The method of claim 1, wherein each user is associated with a different integer, and wherein the unique sequence of data bits is created by converting the integer to a ternary number.
4. The method of claim 3, further comprising: deriving the unique signature from the recreated sequence of data bits by converting the ternary number represented by the sequence of data bits into an integer.
5. The method of claim 1, wherein each user is associated with an integer, and wherein the unique sequence is created by converting the integer into a binary number.
6. The method of claim 1, wherein the group signals are overdubbed over the audio track at fixed intervals.
7. The method of claim 6, wherein the group signals are generated for 1 second and overdubbed over the audio track at a distance of 0.25 seconds from each other.
8. The method of claim 1, wherein, in the overdubbing step, the audio track is analyzed in frames, and a group signal is overdubbed over a frame in the audio track only if the amplitude of the audio track within the frame exceeds a threshold.
9. The method of claim 8, wherein the group signals are generated for 1 second and the frames are 1 second frames.
10. The method of claim 1, wherein the group signal is deconstructed into individual sine waves using a Fast Fourier Transform analysis with buckets wide enough to distinguish between the frequencies used in the encoding step.
11. A non-transitory, computer-readable medium comprising a computer program, that, when executed by a computer system, enables the computer system to perform the following method for encoding and decoding a unique signature for a user in a video file, wherein the video file was created using a video format that does not support embedding a unique signature in the video file and wherein the video file includes an audio track, the method comprising: encoding a unique signature into a video file by performing the following: associating a unique sequence of data bits with a user; dividing the sequence of data bits into a plurality of sequential groups, wherein each group is associated with a sequence number and wherein each group includes one or more sequence number bits and one or more data bits; for each group, creating a different sine wave for each bit in the group having a non-zero value, wherein each of the sine waves has (1) a frequency that represents whether the bit is a sequence number bit or a data bit, and, if there is more than one sequence number bit or data bit in the group, which sequence number or data bit and (2) an amplitude that represents the value of the bit; for each group, creating a group signal by combining the sine waves of the group, wherein the frequency of each group signal is in the infrasound range; removing frequencies below the infrasound range from an audio track in the video file; overdubbing the group signals onto the audio track, wherein gaps are left between the groups and wherein the group signals are not detectable to a person during play of the video file; and decoding the unique signature from the video file by performing the following: removing signals with frequencies above the infrasound range from the audio track of the video file; identifying each of the group signals in the remaining audio track using gaps in the remaining audio track to distinguish between groups; deconstructing each group signal into individual sine 
waves; identifying the individual sine waves in each group having frequencies corresponding to the frequencies of the sine waves used in the encoding step to represent the data bits and the sequence number bits; for each group, ascertaining the value of each sequence number bit and each data bit in the group from the amplitude of the sine waves having frequencies corresponding to said bits; determining the sequence number of each group from the value of the sequence number bits; and recreating the sequence of data bits by concatenating the data bits in the groups in order of the sequence numbers of the groups.
12. The non-transitory, computer-readable medium of claim 11, wherein the unique sequence is a digital sequence of up to 123 bits.
13. The non-transitory, computer-readable medium of claim 11, wherein each user is associated with a different integer, and wherein the unique sequence of data bits is created by converting the integer to a ternary number.
14. The non-transitory, computer-readable medium of claim 13, further comprising: deriving the unique signature from the recreated sequence of data bits by converting the ternary number represented by the sequence of data bits into an integer.
15. The non-transitory, computer-readable medium of claim 11, wherein each user is associated with an integer, and wherein the unique sequence is created by converting the integer into a binary number.
16. The non-transitory, computer-readable medium of claim 11, wherein the group signals are overdubbed over the audio track at fixed intervals.
17. The non-transitory, computer-readable medium of claim 11, wherein, in the overdubbing step, the audio track is analyzed in frames, and a group signal is overdubbed over a frame in the audio track only if the amplitude of the audio track within the frame exceeds a threshold.
18. The non-transitory, computer-readable medium of claim 11, wherein the group signal is deconstructed into individual sine waves using a Fast Fourier Transform analysis with buckets wide enough to distinguish between the frequencies used in the encoding step.
19. A computer system for encoding and decoding a unique signature for a user in a video file, wherein the video file was created using a video format that does not support embedding a unique signature in the video file and wherein the video file includes an audio track, the system comprising: one or more processors; one or more memory units coupled to the one or more processors, wherein the one or more memory units store instructions that, when executed by the one or more processors, cause the system to perform the operations of: encoding a unique signature into a video file by performing the following: associating a unique sequence of data bits with a user; dividing the sequence of data bits into a plurality of sequential groups, wherein each group is associated with a sequence number and wherein each group includes one or more sequence number bits and one or more data bits; for each group, creating a different sine wave for each bit in the group having a non-zero value, wherein each of the sine waves has (1) a frequency that represents whether the bit is a sequence number bit or a data bit, and, if there is more than one sequence number bit or data bit in the group, which sequence number or data bit and (2) an amplitude that represents the value of the bit; for each group, creating a group signal by combining the sine waves of the group, wherein the frequency of each group signal is in the infrasound range; removing frequencies below the infrasound range from an audio track in the video file; overdubbing the group signals onto the audio track, wherein gaps are left between the groups and wherein the group signals are not detectable to a person during play of the video file; and decoding the unique signature from the video file by performing the following: removing signals with frequencies above the infrasound range from the audio track of the video file; identifying each of the group signals in the remaining audio track using gaps in the 
remaining audio track to distinguish between groups; deconstructing each group signal into individual sine waves; identifying the individual sine waves in each group having frequencies corresponding to the frequencies of the sine waves used in the encoding step to represent the data bits and the sequence number bits; for each group, ascertaining the value of each sequence number bit and each data bit in the group from the amplitude of the sine waves having frequencies corresponding to said bits; determining the sequence number of each group from the value of the sequence number bits; and recreating the sequence of data bits by concatenating the data bits in the groups in order of the sequence numbers of the groups.
20. The system of claim 19, wherein the group signal is deconstructed into individual sine waves using a Fast Fourier Transform analysis with buckets wide enough to distinguish between the frequencies used in the encoding step.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(5)
(6) Encoding a Unique ID in a Video File
(7) Referring to
(8) The system divides the unique signature into a plurality of sequential groups (step 120), and it associates each group with a sequence number (step 130), which is represented by sequence number bits. In other words, each group comprises one or more sequence number bits and one or more data bits of the unique signature. The unique signature can be recreated from the groups, using the sequence numbers of the groups to put the data bits from the unique signature in the right order.
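By way of illustration only, the grouping of steps 120 and 130 and the later reordering may be sketched as follows. The function names, the fixed group size of two, and the zero-padding of a short final group are assumptions made for this sketch and are not taken from the disclosure.

```python
from typing import List, Tuple


def split_into_groups(digits: List[int], group_size: int = 2) -> List[Tuple[int, List[int]]]:
    """Split signature digits into (sequence_number, data_digits) groups."""
    # Assumption: pad the tail with zeros so every group has the same length.
    padded = digits + [0] * (-len(digits) % group_size)
    return [
        (index // group_size, padded[index:index + group_size])
        for index in range(0, len(padded), group_size)
    ]


def reassemble(groups: List[Tuple[int, List[int]]]) -> List[int]:
    """Restore the original digit order from groups received in any order."""
    ordered = sorted(groups, key=lambda group: group[0])
    return [digit for _, data in ordered for digit in data]
```

Because every group carries its own sequence number, `reassemble` returns the same digits regardless of the order in which the groups are recovered.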
(9) For each group, the system creates a different sine wave for each bit in the group having a non-zero value (step 140). Each sine wave has a frequency that represents whether the bit is a sequence number bit or a data bit and an amplitude that represents the value of the bit. For example, if a group comprises two sequence number bits, and two data bits, sine waves with 5 Hz and 13 Hz frequencies may be used for the sequence number bits, and sine waves with 3 Hz and 7 Hz frequencies may be used for the data bits. If the unique signature is a ternary number, then a sine wave with full amplitude could represent the value “2,” a sine wave with half amplitude could represent the value “1,” and the lack of a sine wave could represent the value 0. The system then combines the sine waves in each group to create a group signal (step 150).
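Steps 140 and 150 may be sketched, under stated assumptions, as the following sine-wave synthesis. The frequencies are those of the example above; the full-scale fraction of 0.25 for the value "2" is borrowed from the example implementation later in this description, and the function and constant names are hypothetical.

```python
import math

SAMPLE_RATE = 44_100            # samples per second
SEQ_FREQS = (5.0, 13.0)         # Hz, sequence number bits 0 and 1
DATA_FREQS = (3.0, 7.0)         # Hz, data bits 0 and 1
FULL_AMPLITUDE = 0.25           # assumed fraction of full scale for ternary value 2


def group_signal(seq_digits, data_digits, duration=1.0, sample_rate=SAMPLE_RATE):
    """Mix one sine wave per non-zero ternary digit into a single group signal."""
    n = int(duration * sample_rate)
    signal = [0.0] * n
    digits = tuple(seq_digits) + tuple(data_digits)
    for freq, digit in zip(SEQ_FREQS + DATA_FREQS, digits):
        if digit == 0:
            continue  # a zero digit is represented by the absence of a sine wave
        amplitude = FULL_AMPLITUDE * digit / 2  # value 1 -> half, value 2 -> full
        for i in range(n):
            signal[i] += amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
    return signal
```

For instance, `group_signal((1, 0), (1, 0))` mixes only the 5 Hz and 3 Hz waves, each at half amplitude.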
(10) In order for the unique signature to be resilient to transcoding, it needs to be embedded in the video file in a way that would cause an encoder/transcoder to preserve it as relevant information. Most audio/video encoders are lossy and only preserve information deemed relevant to reconstruct the video/audio signal in a way that is indistinguishable to a human from the original. The infrasound band of the audio spectrum is a good place to embed the unique signature, as common encoders (e.g., AAC and MP3) preserve this band and concentrate instead on compressing higher frequencies. Also, this part of the audio spectrum is inaudible to humans, and, therefore, embedding signals in this spectrum will not affect the sound of the video to a human. Consequently, in the preferred embodiment, all the frequencies of the sine waves created in step 140, as well as the resulting group signals, are in the infrasound band (e.g., less than 20 Hz).
(11) To add the unique signature to the video file, the system removes signal frequencies below the infrasound range from the audio track of the video file (step 160). The system then overdubs the group signals onto the audio signal (step 170), leaving gaps between the groups so that in the decoding process the groups can be distinguished from each other.
(12) In one embodiment, the group signals are overdubbed on the audio track at regular intervals (e.g. 1 second group signals separated by 0.25 seconds of silence). Once all the group signals have been added to the audio track, the process is repeated until the end of the audio track. This means that the audio track may include multiple instances of the group sequence throughout its duration.
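The fixed-interval embodiment may be sketched as follows, assuming the audio track and the group signals are plain lists of samples at the same sample rate. The function name and the choice to stop once a whole group signal no longer fits are assumptions of this sketch.

```python
def overdub_fixed(audio, group_signals, sample_rate, gap_seconds=0.25):
    """Add group signals to a copy of the audio, looping until the track ends."""
    out = list(audio)
    if not group_signals:
        return out
    gap = int(gap_seconds * sample_rate)
    position = 0
    index = 0
    while True:
        signal = group_signals[index % len(group_signals)]  # loop the group sequence
        if len(signal) + gap <= 0:
            raise ValueError("group signal and gap must not both be empty")
        if position + len(signal) > len(out):
            break  # assumption: never write a truncated group signal
        for offset, sample in enumerate(signal):
            out[position + offset] += sample
        position += len(signal) + gap  # leave silence between consecutive groups
        index += 1
    return out
```

With 1 second group signals and a 0.25 second gap, each group occupies 1.25 seconds of the track, so a six-group sequence (calibration plus five groups) would repeat roughly every 7.5 seconds.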
(13) In an alternate embodiment, the overdubbing is based on the signal strength in the audio track. The audio track signal is analyzed for strength (amplitude) in fixed-duration frames (e.g., 1 second). If the frame contains a strong enough signal to mask the unique signature, a group signal is overdubbed on the frame itself. Otherwise, a next candidate frame is analyzed by moving forward a certain amount (e.g., 0.25 seconds) on the audio track. In one embodiment, a frame is considered to have a strong enough signal if it exceeds a threshold amplitude, wherein the threshold is based on a theoretical maximum group signal amplitude. In one embodiment, the frame signal is considered strong enough to mask the group signal if the group signal amplitude is equal to or less than 25% of the amplitude of the frame signal.
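The frame-selection logic of this alternate embodiment may be sketched as below. Measuring frame strength as the peak absolute sample value, and skipping ahead by one frame plus one hop after a successful overdub, are assumptions of the sketch; the disclosure does not specify either detail.

```python
def find_masking_frames(audio, sample_rate, group_peak,
                        frame_seconds=1.0, hop_seconds=0.25, ratio=0.25):
    """Return start indices of frames loud enough to mask a group signal."""
    frame = int(frame_seconds * sample_rate)
    hop = int(hop_seconds * sample_rate)
    if frame <= 0 or hop <= 0:
        raise ValueError("frame and hop must each span at least one sample")
    starts = []
    position = 0
    while position + frame <= len(audio):
        # Assumption: frame strength is the peak absolute sample in the frame.
        frame_peak = max(abs(s) for s in audio[position:position + frame])
        if group_peak <= ratio * frame_peak:
            starts.append(position)
            position += frame + hop  # assumption: keep a gap after a used frame
        else:
            position += hop          # try the next candidate frame
    return starts
```

Each returned index marks a frame over which a group signal could be overdubbed; a silent track yields an empty list.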
(14) In one embodiment, the unique signature must not exceed 123 bits, and the video file must include an audio track sampled at 44.1 kHz or above.
(15) In one embodiment, the system is part of a platform that enables end users to create video animations. An example of such a platform is the GOANIMATE platform. As the platform system generates the video file for the user (based on user input), the unique signature is embedded in the video file.
(16) Decoding the Unique Signature from the Video File
(17)
(18) The system deconstructs each group signal into individual sine waves (step 230). In one embodiment, this is done using a Fast Fourier Transform analysis with buckets wide enough to distinguish between the frequencies used in the encoding process.
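As a simplified illustration of step 230, the amplitude at each frequency of interest can be measured directly. The sketch below is not a Fast Fourier Transform: it evaluates a single discrete Fourier transform bin per frequency, which gives the same value as the corresponding FFT bucket when the window holds a whole number of cycles. The function name is hypothetical.

```python
import cmath
import math


def tone_amplitude(samples, freq_hz, sample_rate):
    """Estimate the amplitude of one sine component via a single DFT bin."""
    n = len(samples)
    if n == 0:
        raise ValueError("samples must not be empty")
    accumulator = sum(
        sample * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
        for i, sample in enumerate(samples)
    )
    # Scaling by 2/n converts the bin magnitude to the sine wave's peak amplitude.
    return 2 * abs(accumulator) / n
```

Over a 1 second window, the 3 Hz, 5 Hz, 7 Hz, and 13 Hz components each complete a whole number of cycles, so they do not leak into one another's measurements.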
(19) The system then analyzes the individual sine waves to identify those having frequencies corresponding to the frequencies of the sine waves used in the encoding step to represent data bits (e.g., unique signature bits) and sequence number bits (step 240).
(20) The system maps each identified sine wave to a specific sequence number bit or data bit based on the frequency of the sine wave (step 250). The value of each sequence number bit and data bit in each group is determined from the amplitude of the particular sine wave (step 260). For example, if 5 Hz is used for the least significant sequence number bit, then the system knows this sine wave corresponds to such bit, and the amplitude of the sine wave indicates the value of the bit.
(21) Where the sequence number bits represent ternary or binary numbers, the system converts the sequence number bits for each group into an integer sequence number (step 270). For example, if the sequence number bits for a group are “2” and “0”, and the bits represent ternary numbers, the system knows that the sequence number for the group is integer number 1. The system recreates the digital signature by concatenating the data bits in each group in the order of the sequence numbers of the groups (step 280).
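Steps 270 and 280 may be sketched as follows. The lookup reproduces the sequence number encoding given in Table 1 of the example implementation in this description; the function names, and the choice to skip the calibration group during reassembly, are assumptions of the sketch.

```python
# (sequence bit 0 at 5 Hz, sequence bit 1 at 13 Hz) -> group sequence number,
# reproduced from Table 1 of the example implementation.
SEQUENCE_NUMBERS = {(0, 1): 0, (0, 2): 1, (1, 0): 2, (1, 1): 3, (1, 2): 4}
CALIBRATION = (2, 2)


def recreate_signature(groups):
    """groups: iterable of (sequence_bits, data_bits) pairs in any order."""
    numbered = []
    for seq_bits, data_bits in groups:
        seq_bits = tuple(seq_bits)
        if seq_bits == CALIBRATION:
            continue  # assumption: the calibration group carries no signature data
        numbered.append((SEQUENCE_NUMBERS[seq_bits], list(data_bits)))
    numbered.sort(key=lambda group: group[0])
    return [digit for _, data in numbered for digit in data]


def ternary_to_int(digits):
    """Interpret digits as a ternary number, most significant digit first."""
    value = 0
    for digit in digits:
        value = value * 3 + digit
    return value
```

Applied to groups carrying the data digits 01, 02, and 10 under sequence numbers 0, 1, and 2, this yields the digit sequence 0, 1, 0, 2, 1, 0 and the integer 102.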
Example Implementation
(22) An example of the above-described methods is described below with respect to Tables 1-9 and
(23) Signal Generation (Encoding)
(24) In this example, the user has the integer number “102” as his unique ID. To create the unique digital signature from the unique ID, 102 is converted into a ternary number, which is as follows:
(25) Ternary Representation: 01 02 10 (3+2×9+81=102)
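The conversion from the integer ID to a fixed-width ternary representation may be sketched as follows. The function name and the default width of six digits are assumptions chosen to match this example.

```python
def to_ternary(value, width=6):
    """Return the ternary digits of value, most significant first, zero-padded."""
    if value < 0 or value >= 3 ** width:
        raise ValueError("value does not fit in the requested number of digits")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 3)
        digits.append(remainder)
    return digits[::-1]
```

Here `to_ternary(102)` gives the digits 0, 1, 0, 2, 1, 0, which read in pairs as 01 02 10.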
(26) Four sine waves are used to embed this digital signature. Two sine waves with frequencies of 5 Hz and 13 Hz will be used for the sequence number bits, and two sine waves with frequencies of 3 Hz and 7 Hz will be used for the unique signature bits (i.e., the data bits). The unique signature, which consists of six bits, will be divided into groups, each with two sequence number bits and two data bits.
(27) Each group sequence number is represented as ternary bits, as indicated in the Table 1 below:
(28) TABLE 1
Group Sequence Number    Ternary Sequence Bit 0    Ternary Sequence Bit 1
(Integer format)         (5 Hz) - Value            (13 Hz) - Value
Calibration              2                         2
0                        0                         1
1                        0                         2
2                        1                         0
3                        1                         1
4                        1                         2
(29) In the above table, the least significant bit in a group is Ternary Bit 0, and the most significant bit in a group is Ternary Bit 1. Having up to five groups enables 10 (5×2) ternary bits to be encoded for a maximum of 59,049 combinations (or unique IDs with integer numbers between 0 and 59,048). The calibration sequence, which has all ternary bits set to 2, is useful to calibrate the decoding system (as discussed below).
(30) Table 2 below illustrates the unique signature bits (data bits) in each group for the unique ID 102:
(31) TABLE 2
Group Sequence Number    Ternary Data Bit 0    Ternary Data Bit 1
(Integer Format)         (3 Hz)                (7 Hz)
0                        0                     1
1                        2                     0
2                        1                     0
3                        0                     0
4                        0                     0
(32) In the above table, the least significant bit in a group is Ternary Bit 0, and the most significant bit in a group is Ternary Bit 1.
(33) Table 4 below illustrates the relative amplitudes of the four sine waves created for each group. The two left-most ternary bits are the sequence number bits, and the two right-most ternary bits are the data bits (i.e., unique signature bits).
(34) TABLE 4
               Ternary Bits           Sine Wave Amplitudes
Group          (sequence and data)    5 Hz     13 Hz    3 Hz     7 Hz
Calibration    2, 2, 2, 2             25%      25%      25%      25%
0              0, 1, 0, 1             —        12.5%    —        12.5%
1              0, 2, 0, 2             —        25%      25%      —
2              1, 0, 1, 0             12.5%    —        12.5%    —
3              1, 1, 0, 0             12.5%    12.5%    —        —
4              1, 2, 0, 0             12.5%    25%      —        —
(35) The amplitude percentages above are percentages of the maximum amplitude that can be stored on the audio track. Therefore, if the maximum amplitude that can be stored on the audio track is 40 dB, a sine wave having an amplitude of ~10 dB will represent the ternary value of “2,” and a sine wave having an amplitude of ~5 dB will represent a ternary value of “1.”
(36) The four sine waves in each group are combined to create a group signal for the group. The complete sequence of groups is illustrated in
(37) To understand the composition of each group signal, we can decompose the first group signal (the “calibration” group) and note that it was generated by mixing the four waves illustrated in
(38) Overdubbing
(39) The sequence of group signals is then overdubbed on the original audio track of a video file, looping it from start to finish in order to cover the whole duration of the original audio. For purposes of this example, each group signal is overdubbed for 1 second, followed by 0.25 second gaps between group signals.
(40) In order to overdub the unique signature, frequencies below 20 Hz are removed from the audio track. Sounds at these frequencies are inaudible and will not be noticed when removed.
(41) Decoding
(42) Removing Unnecessary Frequencies
(43) The first step in the decoding process is to remove all frequencies above 20 Hz through a low-pass filter since frequencies above 20 Hz are not part of the unique signature.
(44) Once the frequencies above 20 Hz are removed from the audio track, the group signals can be identified by looking at the gaps between signals.
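One possible way to locate the group signals from the gaps is sketched below, assuming the low-pass filtered track is available as a list of samples. The amplitude threshold and the minimum gap length are hypothetical parameters; brief quiet spans shorter than the minimum gap (such as zero crossings of the sine waves) are treated as part of the signal, and the detected boundaries are approximate.

```python
def find_segments(samples, sample_rate, threshold, min_gap_seconds=0.2):
    """Return (start, end) index pairs of signal runs separated by quiet gaps."""
    min_gap = int(min_gap_seconds * sample_rate)
    segments = []
    start = None   # index where the current run began, or None while in a gap
    quiet = 0      # consecutive quiet samples seen inside the current run
    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap:
                # End is exclusive and points just past the last loud sample.
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        segments.append((start, len(samples) - quiet))
    return segments
```

Each returned pair delimits one candidate group signal, which can then be passed to the frequency analysis described next.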
(45) In order to determine the amplitude of the group's components, an FFT analysis is run with buckets wide enough to distinguish between the original frequencies. An FFT analysis with 44,100 points over 1 second of audio data results in 22,050 buckets of 1 Hz each. This provides enough separation to isolate frequencies of interest. For the first group, this results in the following data:
(46) TABLE 5
Frequency (Hz)    Amplitude
(DC Offset)       30.4380376598
1                 58.5886510151
2                 420.2787637881
3                 683.9611593603
4                 832.9792092872
5                 629.7254774691
6                 824.2247666843
7                 701.6410952705
8                 405.1519539124
9                 45.7046487393
10                37.6339476121
11                39.821715285
12                400.7346142158
13                768.3206464348
14                399.724772282
15                34.4455805305
16                33.6250037487
17                31.8099392116
18                29.3880111935
19                27.0296059976
(47) Relevant frequencies (i.e., 3, 5, 7, and 13) are bolded. Since this is the “Calibration” group, the expected maximum amplitude (e.g., the amplitude corresponding to Ternary bit value “2”) for each frequency of interest is as follows:
(48) TABLE 6
Frequency    Meaning                  Amplitude
5 Hz         Sequence Number bit 0    629
13 Hz        Sequence Number bit 1    768
3 Hz         Data bit 0               683
7 Hz         Data bit 1               701
(49) Repeating the same FFT analysis over another group results in the following values:
(50) TABLE 7
Frequency (Hz)    Amplitude
(DC Offset)       5.5821762405
1                 30.7042724264
2                 206.2251123983
3                 323.7944205867
4                 405.6971668409
5                 325.8027326038
6                 204.8138063691
7                 29.6427971081
8                 6.5059694382
9                 6.5984732472
10                6.6533211611
11                6.647010936
12                6.6112156383
13                6.4805638885
14                6.3347136951
15                6.0413375486
16                5.8004016926
17                5.5289838432
18                4.8704968703
19                4.6858400346
(51) This gives the following mapping:
(52) TABLE 8
                                      Amplitude/Max Amplitude     % of Max amplitude
Frequency    Meaning                  from “Calibration” group    from Calibration Group
5 Hz         Sequence Number bit 0    325/629                     50%
13 Hz        Sequence Number bit 1    6/768                       0%
3 Hz         Data bit 0               323/683                     47%
7 Hz         Data bit 1               29/701                      4%
(53) The derived values for the ternary bits in the group would then be:
(54) TABLE 9
Element                  Value
Sequence Number bit 0    1
Sequence Number bit 1    0
Data bit 0               1
Data bit 1               0
This corresponds to the encoded bits (1,0,1,0) for the group with Sequence Number 2.
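The mapping from measured amplitudes to ternary values may be sketched as follows, using the rounded amplitudes of Tables 6 and 8. The decision thresholds of 25% and 75% of the calibration amplitude are assumptions of this sketch; the disclosure does not state how the ratios are quantized.

```python
# Calibration amplitudes from Table 6 and group amplitudes from Table 8,
# keyed by frequency in Hz.
CALIBRATION = {5: 629, 13: 768, 3: 683, 7: 701}
GROUP = {5: 325, 13: 6, 3: 323, 7: 29}


def ternary_value(amplitude, calibration_amplitude):
    """Quantize an amplitude, relative to the calibration maximum, to 0, 1, or 2."""
    ratio = amplitude / calibration_amplitude
    if ratio < 0.25:      # assumed threshold: closer to 0% than to 50%
        return 0
    if ratio < 0.75:      # assumed threshold: closer to 50% than to 100%
        return 1
    return 2


decoded = {freq: ternary_value(GROUP[freq], CALIBRATION[freq]) for freq in CALIBRATION}
# decoded == {5: 1, 13: 0, 3: 1, 7: 0}
```

Read in the order 5 Hz, 13 Hz, 3 Hz, 7 Hz, the result is (1, 0, 1, 0), in agreement with Table 9.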
(55) The foregoing decoding analysis is repeated on all sequences found in the audio track. Statistical analysis may be performed to weigh the likelihood of the values for each sequence, thus introducing error correction in the retrieval algorithm.
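The disclosure does not specify the statistical analysis. One simple possibility, offered purely as an assumption, is a majority vote over the repeated instances of each group found in the track; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(observations):
    """observations: iterable of (sequence_number, data_digits) decoded from the track.

    Returns the most frequently observed data digits for each sequence number.
    """
    votes = defaultdict(Counter)
    for sequence_number, data_digits in observations:
        votes[sequence_number][tuple(data_digits)] += 1
    return {
        sequence_number: counts.most_common(1)[0][0]
        for sequence_number, counts in votes.items()
    }
```

Because the group sequence is looped over the whole track, a single corrupted instance of a group is outvoted by the intact repetitions.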
(56) General
(57) The methods described with respect to
(58) As will be understood by those familiar with the art, the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Accordingly, the above disclosure is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.