SYNTHETIC ELECTRONIC VIDEO CONTAINING A HIDDEN IMAGE
20190297298 · 2019-09-26
Inventors
CPC classification
H04N1/32149
ELECTRICITY
H04N2201/327
ELECTRICITY
H04N5/74
ELECTRICITY
H04N21/41415
ELECTRICITY
G03B21/26
PHYSICS
International classification
H04N5/74
ELECTRICITY
G03B21/26
PHYSICS
H04N21/414
ELECTRICITY
Abstract
We present a method for hiding images in synthetic videos and revealing them by temporal averaging. We developed a visual masking method that hides the input image both spatially and temporally. Our masking approach consists of pixel-by-pixel spatial and temporal variations of the frequency band coefficients representing the image to be hidden. These variations ensure that the target image remains invisible. In addition, by applying a temporal expansion function derived from a dither matrix, we allow the video to carry a visible message that is different from the hidden image. The image hidden in the video can be revealed by software averaging or, with a camera, by long exposure photography. The method finds applications in the secure transmission of digital information.
Claims
1. A method for generating, in a computing system, a synthetic electronic video comprising a plurality of sequential video frames containing a hidden image that is not ascertainable by the naked eye of a human observer when the video is played on an electronic display, the method comprising the steps of: (a) providing an electronic file of the hidden image and decomposing the hidden image into a plurality of spatial frequency bands; (b) applying to pixels of said spatial frequency bands an expansion function that yields temporally varying instances of said spatial frequency bands, which, when averaged, enable recovering said spatial frequency bands; (c) summing at each time point the corresponding instance from each of the expanded spatial frequency bands to generate said video frames in which said hidden image is contained.
2. The method of claim 1, further including a method of recovering the hidden image comprising: (d) averaging said plurality of sequential video frames and recovering thereby the hidden image.
3. The method of claim 2, wherein step d) is performed by a camera that captures the video played on an electronic display and combines the plurality of sequential video frames into a still image that reveals the hidden image.
4. The method of claim 3, wherein the electronic display is a device selected from the set of a TV, a computer display, a tablet, a smartphone, and a smart watch.
5. The method of claim 1, where the expansion function is selected from the set of (i) random functions that generate both spatial and temporal noise, (ii) sinusoidal composite wave functions that generate spatial random noise evolving smoothly in time, (iii) combination of random and dither expansion functions, where the dither expansion function relies on a dither matrix animated in time.
6. The method of claim 3, wherein the camera is selected from a set of (i) a camera that captures the plurality of sequential video frames as a single image within an adjustable exposure time and (ii) a camera that captures the plurality of sequential video frames and averages them by software.
7. The method of claim 2 wherein before or during step (a) the contrast of the hidden image is reduced and after step (d) the contrast of the recovered hidden image is increased.
8. The method of claim 1, wherein said expansion function is applied to each color channel separately to generate said synthetic video in color.
9. The method of claim 1, further including embedding the synthetic electronic video within a classical video or movie.
10. A computing system operable for generating a synthetic electronic video comprising a plurality of sequential video frames containing a hidden image that is not ascertainable by the naked eye of a human observer when the video is played on an electronic display, said computing system comprising software modules operable for: (a) decomposing said hidden image into a plurality of spatial frequency bands; (b) applying to pixels of said spatial frequency bands an expansion function that yields temporally varying instances which, when averaged, enable recovering said spatial frequency bands; (c) summing at each time point the corresponding instance from each of the expanded spatial frequency bands to generate said video frames in which said hidden image is contained.
11. The computing system of claim 10, further comprising a camera operable for capturing and averaging said synthetic video frames, thereby recovering the hidden image.
12. A synthetic electronic video comprising a plurality of video frames containing a hidden image that is not ascertainable by the naked eye of a human observer when the video is played on an electronic display, and wherein the hidden image is revealed by averaging the plurality of video frames of said video.
13. The synthetic electronic video of claim 12, embedded within a classical video or movie.
14. The synthetic electronic video of claim 12, wherein the hidden image does not appear in any single video frame.
15. The synthetic electronic video of claim 12, comprising a dynamically evolving message different from the hidden image, where said dynamically evolving message comprises a visual element selected from the set of text, logo, graphic element, and picture.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0042] For a better understanding of the present invention, one may refer by way of example to the accompanying drawings.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
[0054] The goal of the present work is to hide an image in a video stream under the constraint that the temporal average of the video reveals the image. Specifically, the input image should remain invisible in each frame of the video and should not become visible due to the temporal integration of consecutive frames by the human visual system (HVS). In order to achieve this, a visual masking method that acts both in the spatial and in the temporal domain is required. Spatial masking inhibits orientation and frequency channels of the HVS. Temporal masking ensures that no information from the target image becomes visible through temporal averaging.
[0055] Our method hides an input image within a video. The image is revealed by averaging, which is achieved either by pixelwise mathematical averaging of the video frames or by long exposure photography. We call the video hiding the input image a tempocode or, equivalently, a tempocode video.
[0056] Regarding the vocabulary, we also call the image to be hidden within the tempocode video the target hidden image or simply the target image. Sometimes we refer to one pixel, called a target pixel, of the target image or of an instance of the target image obtained by processing it, for example by decomposition into frequency bands. A target pixel has a target intensity value or simply a target intensity. In analogy with the science of signal processing, the term target signal or simply target is used for the signal to be hidden. In the present disclosure, there is an implicit analogy between the terms target signal and target image, or between target signal and target image pixel.
[0058] In order to create such tempocodes, we apply the following self-masking approach. We first decrease the dynamic range of the input image and decompose it into a certain number of frequency bands. For each frequency band of the contrast reduced input image, we generate temporal samples by sampling a selected expansion function, whose integration along a certain time interval gives the corresponding frequency band. We then reconstruct each video frame from the temporal samples derived from the frequency bands. We consider the following expansion functions: random function, sinusoidal composite wave function, and a temporally-varying dither function. Using these functions we generate different masking effects such as smoothly evolving videos and videos with visible moving patterns.
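The self-masking steps above can be sketched in Python as a single-band simplification. The helper names are hypothetical, and the actual method additionally decomposes the image into several spatial frequency bands and offers sinusoidal and dither expansion functions; this sketch only uses a random expansion with an exact-mean correction.

```python
import random

def make_tempocode(image, n_frames=24, alpha=0.2, seed=1):
    """Single-band tempocode sketch: contrast-reduce the image, then
    expand each pixel into n temporally varying samples whose mean is
    exactly the contrast-reduced pixel value."""
    rng = random.Random(seed)
    # contrast reduction around mid-gray (assumed linear scaling form)
    reduced = [alpha * p + (1 - alpha) / 2 for p in image]
    frames = [[0.0] * len(image) for _ in range(n_frames)]
    for x, target in enumerate(reduced):
        # random expansion: shift the samples so their mean is the target
        samples = [rng.random() for _ in range(n_frames)]
        mean = sum(samples) / n_frames
        samples = [s - mean + target for s in samples]
        for i in range(n_frames):
            frames[i][x] = samples[i]
    return frames, reduced

def reveal(frames):
    """Software averaging: pixelwise mean over all frames."""
    n = len(frames)
    return [sum(f[x] for f in frames) / n for x in range(len(frames[0]))]

image = [0.0, 0.25, 0.5, 0.75, 1.0]   # tiny 1-D stand-in for an image
frames, reduced = make_tempocode(image)
recovered = reveal(frames)
assert all(abs(r - t) < 1e-9 for r, t in zip(recovered, reduced))
```

Averaging the frames reproduces the contrast-reduced image; a final contrast stretch would then recover the original intensities.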
[0059] We now describe our approach for hiding an image in a video. The hidden information is not perceivable by the human eye but the pixelwise average of the video over a time interval ranging between 2 seconds and 20 seconds reveals the hidden image. With the correct exposure time, conventional and digital cameras can detect the hidden information. Software averaging over the video frames also reveals the image.
[0060] The main challenge resides in masking the input image by spatio-temporal signals that are a function of the input image. To achieve this, we present a visual masking process that enables hiding the input image for both the spatial and the temporal perception of human beings.
[0061] In conventional visual masking methods, the mask and the target signal to be hidden are different stimuli. However, in our method, the mask is constructed from the target image. We call this approach self-masking.
[0062] We initially define the problem in the continuous domain. A constant target signal p is reproduced by the integration of φ(t), a time-dependent expansion function, over a duration T:

(1/T)·∫.sub.0.sup.T φ(t+δ)dt=p  (1)
[0063] In order to create spatial noise, a phase shift parameter δ is selected randomly at each spatial position. We assume that the display is linear. The target signal p, the duration T, and the phase shift δ are known parameters. The challenge resides in finding a function φ(t+δ) satisfying this integration and ensuring that the target signal is masked at each time and within each small time interval (≈40 ms). We present the different alternatives for the expansion function φ(t+δ) in the Expansion Functions section.
[0064] In practice, our signals are not continuous since the target image to be hidden is a digital image and the mask is a digital video designed for modern displays. Let I be a target image to be masked (i.e. hidden) into a video V having n frames. Initially, we reduce the contrast of the input image I by linear scaling and obtain the contrast reduced image I.sub.c. This is required in order to reach the masking threshold, i.e. the threshold where the target image is hidden.
[0065] A multi-band masking approach is required to mask both high frequency and low frequency target image contents. Applying the expansion function solely on input pixels would only mask the high frequency content. Therefore, we decompose the contrast reduced target image I.sub.c into spatial frequency bands. A Gaussian pyramid is computed from the contrast reduced target image I.sub.c. To obtain the frequency bands, we compute the differences of every two neighbouring pyramid levels. In practice, we use a standard Laplacian pyramid with a 1-octave spacing between frequency bands, see reference [11], herein incorporated by reference. Finally, for each contrast reduced pixel value I.sub.c.sup.l(x,y) in each band l, we solve a discretized instance of Eq. (1). Let t.sub.1, . . . , t.sub.n be a set of n uniformly spaced time points within the duration T:

(1/n)·Σ.sub.i=1.sup.n v.sub.i.sup.l(x,y)=I.sub.c.sup.l(x,y)  (2)

where v.sub.i.sup.l(x,y) is the frame V.sub.i of frequency band l at time point t.sub.i of the resulting video and where (x,y) indicates the pixel location. A different phase shift value δ.sub.l is assigned to each pixel (x,y) in each band l.
[0066] Once all bands v.sub.i.sup.l(x,y) of each frame v.sub.i(x,y) are constructed, we sum the corresponding bands to obtain the final frame at time point t.sub.i:

v.sub.i(x,y)=Σ.sub.l=1.sup.k v.sub.i.sup.l(x,y)

where k is the number of bands and (x,y) is the position of a given pixel within the frame.
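The per-frame band summation relies on the frequency bands summing back to the original signal. A minimal 1-D illustration of this identity, using a 3-tap box filter as a simplified stand-in for the Gaussian filtering of an actual Laplacian pyramid:

```python
def box_blur(signal):
    """3-tap box filter with clamped borders (a crude stand-in for the
    Gaussian filtering used to build a pyramid level)."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3
            for i in range(n)]

def two_bands(signal):
    low = box_blur(signal)                        # low-frequency band
    high = [s - l for s, l in zip(signal, low)]   # difference (detail) band
    return low, high

sig = [0.1, 0.9, 0.2, 0.8, 0.5]
low, high = two_bands(sig)
recon = [l + h for l, h in zip(low, high)]        # summing the bands
assert all(abs(a - b) < 1e-12 for a, b in zip(recon, sig))
```

Because the detail band is defined as a difference, the reconstruction is exact by construction; a multi-level pyramid repeats the same split on the low band.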
[0068] For decoding purposes, the average of the tempocode frames 219 gives the contrast reduced input image I.sub.c, from which the input image I 220 is recovered. In the present example, the resulting video has n=24 frames and is constructed with k=7 frequency bands.
Contrast Reduction for Masking Purposes
[0069] A masking signal with a certain contrast can mask a target signal having a contrast smaller than the masking threshold. In the present invention, we always generate our mask with 100 percent contrast in order to enable a maximal contrast of the target image to be hidden. To ensure that the target image is hidden, we first reduce the contrast of the target image I and move the contrast reduced image to the center of the available intensity range. The resulting contrast reduced image I.sub.c is:

I.sub.c(x,y)=α·I(x,y)+(1−α)/2

where α is the reduction factor and 0<α<1.
[0070] The amount of contrast reduction α depends on the contrast, spatial frequency, and orientation of the image to be hidden.
[0071] It is very important to select the correct contrast reduction factor α to reach the masking threshold. However, the input image consists of a mixture of locally varying contrasts, spatial frequencies, and orientations that affect masking. The contrast reduction factor should be selected by considering the local image element that requires the largest amount of contrast reduction. Once this image element is masked, all other image elements are masked as well.
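Assuming contrast reduction by linear scaling around mid-gray, I_c = α·I + (1−α)/2 (an assumed form consistent with moving the image to the center of the intensity range), the reduction and the post-recovery contrast restoration of claim 7 can be sketched as:

```python
def reduce_contrast(image, alpha):
    # I_c = alpha*I + (1-alpha)/2: compress intensities toward mid-gray
    return [alpha * p + (1 - alpha) / 2 for p in image]

def restore_contrast(image_c, alpha):
    # inverse scaling, applied after the hidden image has been recovered
    return [(p - (1 - alpha) / 2) / alpha for p in image_c]

img = [0.0, 0.3, 1.0]
ic = reduce_contrast(img, 0.2)     # values now lie in [0.4, 0.6]
back = restore_contrast(ic, 0.2)
assert all(abs(a - b) < 1e-9 for a, b in zip(back, img))
assert min(ic) >= 0.4 - 1e-9 and max(ic) <= 0.6 + 1e-9
```

With α = 0.2, the full dynamic range [0, 1] is compressed into [0.4, 0.6], and the round trip through restoration recovers the original values.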
Expansion Functions
[0072] Many different types of temporal expansion functions φ(t+δ) fulfill the requirements of Eq. (1). We can define a random function with uniform probability, a Gaussian function, a Bezier curve, a logarithmic function, or periodic functions such as a square wave, a triangle wave, or a sine wave. However, the following constraints need to be satisfied:
[0073] Eq. (1) must have a solution for the selected function within the dynamic range of each frequency band.
[0074] Masking must be achieved spatially and temporally during the whole video V. In other words, any visual element that could reveal the target image I or its contrast reduced instance I.sub.c must remain invisible to the human eye.
[0075] A smooth transition between frames is desirable. Therefore, we want our function to be continuous.
[0076] In the following, we describe random, periodic, and dither expansion functions.
1. Random Expansion Function
[0077] Our random expansion function is made of n random uniformly distributed samples varying temporally for each pixel of each band.
[0078] If the contrast of the target image is sufficiently reduced, the random function masks the target image to a large extent. However, this is only true when each frame is observed separately. When all frames are played as a video (e.g., at 30 frames per second), the target image might be slightly revealed. This is due to the fact that the target image is well masked spatially but not temporally. The human visual system has a temporal integration interval of approximately 40±10 ms. Therefore a few consecutive frames can be averaged by the human visual system.
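A per-pixel random expansion can be sketched as follows. The exact-mean correction, which redistributes the residual error over the samples, is an assumption consistent with the parent-sample error redistribution described later; the sampling half-width is chosen so that the raw samples stay inside the allowed intensity range.

```python
import random

def random_expansion(target, n, rng):
    """n uniformly distributed samples centered on the target (target in
    [0, 1]), with the residual error redistributed so that the temporal
    mean equals the target exactly."""
    d = min(target, 1.0 - target)          # half-width within the range
    samples = [target + rng.uniform(-d, d) for _ in range(n)]
    err = target - sum(samples) / n
    return [s + err for s in samples]      # redistribute the residual

rng = random.Random(7)
s = random_expansion(0.3, 24, rng)
assert abs(sum(s) / 24 - 0.3) < 1e-12
```

Played back, such per-pixel sample trains look like spatio-temporal noise while their 24-frame average reproduces the target intensity.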
2. A Sinusoidal Composite Wave
[0080] As we have seen in the previous section, a temporally continuous low frequency masking signal is required to avoid revealing the target signal by temporal integration of the human visual system. We thus propose a periodic function that results in spatial discontinuity and temporal continuity of the resulting video.
[0081] We use a sine function as our periodic function. Spatial juxtaposition of phase-shifted sine functions may reveal local parts of the target image. Therefore, instead of using a regular sine function, we create a sinusoidal composite wave by varying the function in amplitude for a given number of temporal segments.
[0082] In order to create m sine segments varying in amplitude, we first generate m uniformly distributed random temporal parent-samples p.sub.j.sup.l(x,y) for each pixel of each band ensuring that their mean is I.sub.c.sup.l(x,y):

(1/m)·Σ.sub.j=1.sup.m p.sub.j.sup.l(x,y)=I.sub.c.sup.l(x,y)
[0083] Since we have a small number of parent-samples (e.g. 4 samples), the mean I.sub.c.sup.l(x,y) will not be exactly achieved. Therefore, we redistribute the error across the samples. Next, for each parent-sample p.sub.j, we establish a function φ.sub.j(t+δ) in the form of Eq. 1 such that:

(m/T)·∫.sub.(j−1)T/m.sup.jT/m φ.sub.j(t+δ)dt=p.sub.j  (7)

where (j−1)·T/m is the start time, j·T/m is the end time, j∈[1, . . . , m] is the index of each parent-sample, and T is the total duration of the video to be averaged.
[0084] We define the expansion function φ.sub.j(t+δ) for each parent sample as a continuous section of a sine in a form that is analytically integrable and lies within the allowed intensity range for most of its values:

φ.sub.j(t+δ)=k.sub.j·sin(2π(t+δ)/T)  (8)

where k.sub.j is the amplitude and T is the period.
[0085] By inserting Eq. 8 into Eq. 7, we can express k.sub.j as a function of the other parameters.
[0086] For each pixel of each frequency band, these m functions φ.sub.j(t+δ) of parent samples p.sub.1 416, p.sub.2 417, p.sub.3 418, p.sub.4 419 are sampled by the n video frames 421.
[0087] In order to ensure phase continuity between the sinusoidal segments, we select the phase shift δ randomly only for the first sinusoidal segment φ.sub.1(t+δ). For all other functions associated with parent samples we use the current phase and the current period T. Nevertheless, due to the variations of the amplitudes, we obtain a non-continuous composite signal. These discontinuities 413a, 413b, 413c appear at the junctions between successive sinusoidal segments.
[0088] To remove the discontinuities at the junction points, we apply a refinement process by using differential values. From the samples of the composite wave, we first calculate the differential values by taking the backward temporal differences: Δv.sub.i.sup.l(x,y)=v.sub.i.sup.l(x,y)−v.sub.i-1.sup.l(x,y).
[0089] With the blended differential values Δ{tilde over (v)}.sub.i.sup.l(x,y), we re-calculate the intensity values v.sub.i.sup.l(x,y) for each pixel of each band by minimizing the following optimization function:

Σ.sub.i=2.sup.n((v.sub.i.sup.l(x,y)−v.sub.i-1.sup.l(x,y))−Δ{tilde over (v)}.sub.i.sup.l(x,y)).sup.2

subject to Eq. (2), where n is the total number of frames.
[0090] This optimization is solved as a sparse linear system. We obtain a smooth signal.
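When the blended differences are kept exactly, the least-squares problem has a simple closed-form solution: cumulatively sum the differences to rebuild the signal, then apply a global shift so that the temporal mean matches the target. A sketch under that simplifying assumption:

```python
def reintegrate(diffs, target_mean):
    """Rebuild a per-pixel signal from backward differences by cumulative
    summation, then shift it so the temporal mean equals the target."""
    v = [0.0]
    for d in diffs:               # v_i = v_{i-1} + d_i
        v.append(v[-1] + d)
    shift = target_mean - sum(v) / len(v)
    return [x + shift for x in v]

diffs = [0.1, -0.05, 0.02]        # toy blended differentials (n = 4 frames)
v = reintegrate(diffs, 0.5)
assert abs(sum(v) / len(v) - 0.5) < 1e-12
assert abs((v[1] - v[0]) - 0.1) < 1e-12   # differences preserved
```

The uniform shift leaves all frame-to-frame differences intact, so the smoothness gained by blending the differentials is preserved while the averaging constraint is restored.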
[0091] The deviations from the average I.sub.c.sup.l(x,y) remain small.
3. Temporal Dither Expansion Function
[0093] A sinusoidal composite wave enables masking the target image both spatially and temporally. However, the visible part, the tempocode video, does not convey any visual meaning. We thus propose to replace the spatial noise with meaningful patterns. For this purpose, we make use of artistic dither matrices which were described in U.S. Pat. No. 7,623,739 to Hersch and Wittwer, herein incorporated by reference.
[0094] When printing with bilevel pixels, dithering is used to increase the number of apparent intensities or colors. A full tone color image can be created with spatially distributed surface coverages of cyan (c), magenta (m), yellow (y), and black (k) inks. The human visual system integrates the tiny c, m, y, k inked and non-inked areas into the desired color.
[0095] A dither matrix includes in each of its cells a dither threshold value. These dither threshold values indicate at which intensity level pixels should be inked. Artistic dithering enables ordering these threshold levels so that, for most levels, the turned-on pixels depict a meaningful shape. We adapt artistic dithering to provide a visual meaning to tempocode videos.
[0096] We repeat the selected dither matrix horizontally and vertically so as to cover the whole target image.
[0097] Instead of finding such a dither input intensity 518, we directly assign white or black to the successive temporal dither threshold levels as follows: [0098] 1. Find the ratio r.sub.wb of white to black temporal pixel values to obtain the target intensity I.sub.c(x,y). Then derive the number w of white pixel values. This is calculated as follows:
[0102] A smooth transition between frames is desirable. Therefore, our expansion function should be continuous. This is ensured by the smooth displacement of the dither matrix.
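The white/black assignment for one pixel can be illustrated as follows. The rounding rule w = round(I_c·n) is an assumed reconstruction of the ratio computation in step 1; the patent derives w from the white-to-black ratio r.sub.wb.

```python
def white_frame_count(target, n):
    """Number w of 'white' temporal values among n binary frames so that
    their average approximates the target intensity (rounding rule is an
    assumption)."""
    return round(target * n)

n = 24
w = white_frame_count(0.3, n)
pixels = [1.0] * w + [0.0] * (n - w)
# quantization error of the temporal average is at most 1/(2n)
assert abs(sum(pixels) / n - 0.3) <= 1 / (2 * n)
```

With n = 24 frames, each pixel's temporal average is quantized to multiples of 1/24, which bounds the intensity error of the revealed image.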
4. Combination of Random Expansion and Temporal Dither Expansion Functions
[0103] Expansion by simple dithering satisfies one of our conditions, i.e., the average of the frames yields the target image (Eq. (2)). However, a multi-band decomposition cannot be carried out with the dithered binary images since they are bilevel. As shown previously, the multi-band decomposition is an important component for masking the target image. To overcome this problem, we create two parent frames I.sub.c.sup.P1 and I.sub.c.sup.P2 by random expansion, whose average yields the contrast reduced target image I.sub.c. Each parent frame is then expanded into frames by dither expansion using the temporal dither function as described above. Thanks to the dither expansion we get n dithered frames forming our final video V in which the target image is successfully masked.
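A per-pixel sketch of the two-parent-frame construction. The symmetric random split is an assumption; the patent creates the parents by random expansion, but any pair averaging to the contrast-reduced value works in principle.

```python
import random

def parent_frames(target, rng):
    """Split a contrast-reduced pixel value (in [0, 1]) into two parent
    values whose average is the target, each staying within [0, 1]."""
    d = min(target, 1.0 - target) * rng.random()
    return target + d, target - d

rng = random.Random(3)
p1, p2 = parent_frames(0.4, rng)
assert abs((p1 + p2) / 2 - 0.4) < 1e-12
assert 0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0
```

Each parent value would then be expanded by the temporal dither function into half of the video frames, so that the full-video average still equals the target.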
Results
[0105] The methods for generating tempocodes are described for grayscale target images. For color images, we use exactly the same procedure and apply the self-masking method to each color channel separately.
[0107] The present invention introduces a screen-camera channel for hiding information revealed by simple averaging. The encoding is complex, but the decoding is very simple. Thus, hidden images can be revealed by non-expert users but not created by them. The present method does not compete with existing watermarking or steganographic methods that require complex decoding procedures. It can rather be used as a first-level secure communication feature. More and more security applications, such as banking software, use smartphones to identify codes that appear on a display. In the present case, instead of directly acquiring the image of a code, the smartphone might acquire a video that incorporates that code. For example, instead of showing a QR code on an electronic document directly, our method can be used to hide it. Hiding a message into a video can be seen as one building block within a larger security framework. Furthermore, tempocodes can be used as video seals in movies against piracy. A video seal can be placed in the credits or titles section.
[0109] The final tempocode video is stored on disk 94 or transmitted over the network 96 to another computer in order to be played or to be inserted into a movie. For the display of the tempocode video, a computing system (e.g. TV, laptop, tablet, smartphone, smart watch) with a display 95 is required. The display shows the client's tempocode that has been received through the network or is stored in its memory. Authentication can be performed by an external camera which is not part of this computing system, or by another computing system (e.g. laptop, tablet, smartphone) equipped with a digital camera.
CITED NON PATENT PUBLICATIONS
[0110] 1. J. Fridrich, M. Goljan, and D. Hogea, "Steganalysis of JPEG images: breaking the F5 algorithm," in Information Hiding (2003), pp. 310-323.
[0111] 2. Z. Li, X. Chen, X. Pan, and X. Zeng, "Lossless data hiding scheme based on adjacent pixel difference," in International Conference on Computer Engineering and Technology (2009), Vol. 1, pp. 588-592.
[0112] 3. X. Li and J. Wang, "A steganographic method based upon JPEG and particle swarm optimization algorithm," Inform. Sci. 177, 3099-3109 (2007).
[0113] 4. A. Hashad, A. S. Madani, and A. E. M. A. Wandan, "A robust steganography technique using discrete cosine transform insertion," in IEEE International Conference on Information and Communications Technology (2005), pp. 255-264.
[0114] 5. R. T. McKeon, "Strange Fourier steganography in movies," in IEEE International Conference on Electro/Information Technology (2007), pp. 178-182.
[0115] 6. P. Wayner, Disappearing Cryptography: Information Hiding: Steganography & Watermarking (Morgan Kaufmann, 2009).
[0116] 7. G. C. Langelaar, I. Setyawan, and R. L. Lagendijk, "Watermarking digital image and video data: a state-of-the-art overview," IEEE Signal Process. Mag. 17(5), 20-46 (2000).
[0117] 8. A. Khan, A. Siddiqa, S. Munib, and S. A. Malik, "A recent survey of reversible watermarking techniques," Inform. Sci. 279, 251-272 (2014).
[0118] 9. M. Arsalan, S. A. Malik, and A. Khan, "Intelligent reversible watermarking in integer wavelet domain for medical images," J. Syst. Softw. 85, 883-894 (2012).
[0119] 10. M. U. Celik, G. Sharma, A. M. Tekalp, and E. Saber, "Lossless generalized-LSB data embedding," IEEE Trans. Image Process. 14, 253-266 (2005).
[0120] 11. M. N. Do and M. Vetterli, "Framing pyramids," IEEE Trans. Signal Process. 51, 2329-2342 (2003).