PROCESS AND SYSTEM FOR ENCODING AND PLAYBACK OF STEREOSCOPIC VIDEO SEQUENCES
20210235065 · 2021-07-29
Assignee
Inventors
- Nicholas Routhier (Candiac, CA)
- Claude THIBEAULT (Brossard, CA)
- Jean Belzile (Lachine, CA)
- Daniel Malouin (Longueuil, CA)
- Pierre-Paul Carpentier (St-Tite, CA)
- Martin Dallaire (Candiac, CA)
CPC classification (all codes in section H: ELECTRICITY)
- H04N2213/007
- H04N13/161
- H04N13/189
- H04N2213/002
- H04N19/597
International classification (all codes in section H: ELECTRICITY)
- H04N13/161
- H04N13/189
Abstract
A method for decoding a compressed image stream, the image stream having a plurality of frames, each frame consisting of a merged image including pixels from a left image and pixels from a right image. The method involves the steps of receiving each merged image; changing a clock domain from the original input signal to an internal domain; for each merged image, placing at least two adjacent pixels into an input buffer and interpolating an intermediate pixel, for forming a reconstructed left frame and a reconstructed right frame according to provenance of the adjacent pixels; and reconstructing a stereoscopic image stream from the left and right image frames. The invention also teaches a system for decoding a compressed image stream.
Claims
1. A system for displaying stereoscopic video sequences, the system comprising: a dual input head mountable display; an output controller in communication with the dual input head mountable display, the output controller being operable to: add additional left pixels into each left image of a left image sequence, the additional left pixels being created using spatial interpolation based at least on a plurality of other left pixels in the respective left image, add additional right pixels into each right image of a right image sequence, the additional right pixels being created using spatial interpolation based at least on a plurality of other right pixels in the respective right image, adjust an output frame rate of the left image sequence to correspond with a display frame rate of the head mountable display by inserting, at certain locations in the left image sequence, a modified left image at a position in the left image sequence immediately after a selected left image of the left image sequence, the modified left image being created using movement anticipation based at least in part on the selected left image, adjust an output frame rate of the right image sequence to correspond with a display frame rate of the head mountable display by inserting, at certain locations in the right image sequence, a modified right image at a position in the right image sequence immediately after a selected right image of the right image sequence, the modified right image being created using movement anticipation based at least in part on the selected right image, generate an output left video signal comprised of the adjusted left image sequence, generate an output right video signal comprised of the adjusted right image sequence, and send the output left video signal and the output right video signal to the dual input head mountable display.
2. The system of claim 1, further comprising a rate controller that is operable to monitor clock signals to detect clock signal variations and to instruct the output controller to adjust the output frame rate based on the detected clock signal variations.
3. The system of claim 1, further comprising an anti-flicker filter that is operable to decrease colors of at least one pixel of at least one image of the left image sequence and the right image sequence when at least one color of the at least one pixel is above a certain value.
4. The system of claim 1, wherein the head mountable display is a first display operating in a stereoscopic 3D viewing mode, and wherein the output controller is operable to simultaneously drive one of the output left video signal and the output right video signal to a second display for display in a conventional 2D display mode.
5. The system of claim 1, further comprising a digital input connectable to the internet, the digital input being configured to receive a stereoscopic video sequence over an internet connection, wherein the output controller generates the left image sequence and the right image sequence based on the received stereoscopic video sequence.
6. The system of claim 1, wherein each left image in the left image sequence is generated from a left image mosaic and each right image in the right image sequence is generated from a right image mosaic, each left image having a larger number of pixels than the corresponding left image mosaic and each right image having a larger number of pixels than the corresponding right image mosaic.
7. The system of claim 6, wherein each left image includes a plurality of additional left pixels generated from the corresponding left image mosaic using interpolation and each right image includes a plurality of additional right pixels generated from the corresponding right image mosaic using interpolation.
8. The system of claim 1, wherein the head mountable display is also operable to display 2D format video.
9. A method for displaying stereoscopic video sequences on a dual input head mountable display, the method comprising the steps of: adding additional left pixels into each left image of a left image sequence, the additional left pixels being created using spatial interpolation based at least on a plurality of other left pixels in the respective left image, adding additional right pixels into each right image of a right image sequence, the additional right pixels being created using spatial interpolation based at least on a plurality of other right pixels in the respective right image, adjusting an output frame rate of the left image sequence to correspond with a display frame rate of the head mountable display by inserting, at certain locations in the left image sequence, a modified left image at a position in the left image sequence immediately after a selected left image of the left image sequence, the modified left image being created using movement anticipation based at least in part on the selected left image; adjusting an output frame rate of the right image sequence to correspond with a display frame rate of the head mountable display by inserting, at certain locations in the right image sequence, a modified right image at a position in the right image sequence immediately after a selected right image of the right image sequence, the modified right image being created using movement anticipation based at least in part on the selected right image; generating an output left video signal comprised of the adjusted left image sequence; generating an output right video signal comprised of the adjusted right image sequence; and sending the output left video signal and the output right video signal to the head mountable display.
10. The method of claim 9, further comprising the step of monitoring clock signals to detect clock signal variations and to adjust the output frame rate based on the detected clock signal variations.
11. The method of claim 9, further comprising the step of reducing flickering by decreasing colors of at least one pixel of at least one image of the left image sequence and the right image sequence when at least one color of the at least one pixel is above a certain value.
12. The method of claim 9, wherein the head mountable display is a first display operating in a stereoscopic 3D viewing mode, and further including the step of simultaneously sending one of the output left video signal and the output right video signal to a second display for display in a conventional 2D display mode.
13. The method of claim 9, further comprising the steps of: receiving, via a digital input connectable to the internet, a stereoscopic video sequence over an internet connection; and generating the left image sequence and the right image sequence based on the received stereoscopic video sequence.
14. The method of claim 9, wherein each left image in the left image sequence is generated from a left image mosaic and each right image in the right image sequence is generated from a right image mosaic, each left image having a larger number of pixels than the corresponding left image mosaic and each right image having a larger number of pixels than the corresponding right image mosaic.
15. The method of claim 14, wherein each left image includes a plurality of additional left pixels generated from the corresponding left image mosaic using interpolation and each right image includes a plurality of additional right pixels generated from the corresponding right image mosaic using interpolation.
16. The method of claim 9, wherein the head mountable display is also operable to display 2D format video.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0062] Similar reference numerals refer to similar parts throughout the various Figures.
DETAILED DESCRIPTION OF THE DRAWINGS
[0063] Preferred embodiments of the method and associated systems for encoding and playback of stereoscopic video sequences according to the present invention will now be described in detail referring to the appended drawings.
[0064] Referring to
[0065] Stored digital image sequences, typically available in a 24 fps digital Y U V format such as Betacam 4:2:2 (motion pictures), are then converted to an RGB format by processors such as 5 and 8 and fed to inputs 29 and 30 of moving image mixer unit 1, the main element of the encoding system of the present invention. It should be noted, however, that the two image sequences can alternatively be converted on a time-sharing basis by a common processor, in order to reduce costs. Mixer 1 compresses the two planar RGB input signals into a 30 fps stereo RGB signal delivered at output 31, which is then converted by processor 9 into a Betacam 4:2:2 format at output 32 and in turn compressed into a standard MPEG2 bit stream format by a typical circuit 10. The resulting MPEG2 coded stereoscopic program can then be recorded on a conventional medium such as a Digital Video Disk (DVD) 11 or broadcast on a single standard channel through, for example, transmitter 13 and antenna 14. Alternative program transport media include, for instance, a cable distribution network or the internet.
[0066] Turning now to
[0067] Decoder 2 produces a synchronized pair of RGB signals at outputs 23 and 24, representative of the first and second image sequences, to drive a dual input stereoscopic progressive display device such as a head mounted display (HMD) 16. Further, decoder 2 produces a time-sequenced stereo RGB signal at output 25, to supply a single input progressive display device such as projector 17, LCD display 22, CRT monitor or a SDTV or HDTV 21, whereby images from the first and second image sequences are presented in an alternating page flipping mode. Alternatively, the stereo RGB signal from output 25 may be converted into an interlaced NTSC signal to be reproduced by an analog CRT television set, or into other stereoscopic formats (ex: column interleaved for autostereoscopic lenticular displays). Also, decoder 2 may be internally configured to output the stereo RGB signal at one of RGB outputs 23 or 24, thus eliminating output 25.
[0068] Decoder 2 further produces a sync-timing signal at output 26 to drive an infrared shutter spectacle driver 20, driving spectacles 19. Shutter spectacles 19 can be worn by a viewer to view a three-dimensional program projected, for instance, on screen 18 by projector 17 fed by stereo output 25, by enabling the viewer to alternately see an image from the first image sequence with one eye and an image from the second image sequence with the other eye.
[0069] As stated in the foregoing description, the two original image sequences contain too much information to enable direct storage onto a conventional DVD or broadcast through a conventional channel using the MPEG2 or equivalent multiplexing protocol handling information at a rate of 30 fps. Therefore mixer 1 carries out a decimation process to reduce each picture's information by half.
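The halving principle can be sketched as follows (an illustrative simplification: the exact pixel-selection pattern used to build the complementary mosaics is not restated in this passage, so keeping even-indexed pixels is an assumption):

```python
def decimate_and_merge(left_row, right_row):
    """Halve each source row by keeping every other pixel, then place the
    two half-width mosaics side by side to form one merged row.
    Keeping even-indexed pixels is an illustrative assumption."""
    left_mosaic = left_row[0::2]    # half of the left image's pixels
    right_mosaic = right_row[0::2]  # half of the right image's pixels
    return left_mosaic + right_mosaic  # left field | right field
```

Applied line by line, this yields a merged frame with the same pixel count as a single source frame, making the combined stereo stream fit a conventional single-channel format.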
[0070] The spatial decimation carried out by mixer 1 will now be described with reference to
[0071]
[0072] In a schematic representation,
[0073] As better illustrated in
[0074] The above operation is accomplished by inputting the data of one pixel at a time in a three-pixel input buffer 55 as shown in
[0075] Upon decoding of the merged images, reconstruction of the complete images is carried out by spatially interpolating missing pixels from the compressed half-size images (mosaics) located in the fields of the merged images such as 60. As illustrated in
[0076] In a preferred embodiment of the invention, data of one pixel at a time is stored into a three-pixel input buffer 65. As shown, the three pixels of the shadowed portion of input image 60 have been stored in input buffer 65, two adjacent pixels from the same mosaic being identified as P.sub.i and P.sub.i+1. Data of a third pixel P.sub.j is then calculated as the arithmetic mean of each of the 3 components of the RGB vectors of adjacent pixels P.sub.i and P.sub.i+1. For example, if pixel P.sub.i has an intensity vector of (10,0,30) and pixel P.sub.i+1 has an intensity vector of (20,0,60), then pixel P.sub.j will be calculated as being (15,0,45). Therefore, the mean of two identical pixels is another identical pixel. That calculated (topologically interpolated) pixel replaces the missing pixel decimated upon creation of the mosaics from original image sequences such as 50.
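The arithmetic-mean interpolation above can be written directly (a minimal sketch; integer division is an assumption, since the text does not state how fractional means are rounded):

```python
def interpolate_pixel(p_i, p_i1):
    """Rebuild a missing pixel as the component-wise arithmetic mean of the
    RGB vectors of two adjacent mosaic pixels P_i and P_i+1."""
    return tuple((a + b) // 2 for a, b in zip(p_i, p_i1))

# Worked example from the text:
# interpolate_pixel((10, 0, 30), (20, 0, 60)) -> (15, 0, 45)
```

Note that, as stated above, the mean of two identical pixels returns the same pixel unchanged.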
[0077] The original pixels and the interpolated pixels are then stored in appropriate memory locations of a frame buffer where the corresponding image is to be reconstructed (image 72 in the present example). Past the centre of each line of the merged frame 60 (entering the right field), data is stored into a second frame buffer 72′, to rebuild the image from the mosaic stored in the right hand field of the stereo image. The process is followed line by line, from left to right, until the two images are spatially reconstructed in their respective buffers.
[0078] Although the above embodiment interpolates a pixel as the mean of two adjacent pixels of a mosaic, the invention provides for a weighting of more than two pixels. For example, if pixel P.sub.j is to be interpolated, then the two or three preceding and following pixels from the mosaic can be used, with different coefficients. More specifically, referring to
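A weighted variant can be sketched as below; the number of neighbours and the coefficient values are illustrative assumptions, since the text specifies only that several preceding and following mosaic pixels may be combined with different coefficients:

```python
def weighted_interpolate(neighbours, coeffs):
    """Interpolate a missing pixel from several neighbouring mosaic pixels,
    each weighted by a coefficient; the coefficients should sum to 1 so that
    overall brightness is preserved. Coefficient values are illustrative."""
    assert len(neighbours) == len(coeffs)
    return tuple(round(sum(c * p[k] for c, p in zip(coeffs, neighbours)))
                 for k in range(3))

# With two equal weights this reduces to the arithmetic-mean case:
# weighted_interpolate([(10, 0, 30), (20, 0, 60)], [0.5, 0.5]) -> (15, 0, 45)
```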
[0079] In order to assure flickerless viewing, the decoding method further comprises temporal expansion of image sequences as will be described in detail in the following description. When frame buffers are completely filled to provide complete rebuilt or temporally interpolated images (no more than four frame buffers are required in any embodiment of the reconstruction process and system), they may be read according to different modes to provide different types of desired output signals.
[0080] A first embodiment of the mixing method carried out by mixer 1 according to the present invention is schematically represented in
[0081] A first sequence of images in RGB 24 fps format 50, identified as L1 to L4, is first time expanded by 25% to form a 30 fps sequence of images such as 51, by the creation and insertion of a new image 52 after every fourth image of the original sequence 50. New image 52 is time-interpolated from the topological information of the immediately preceding and following images (#4 and #5 of original sequence 50). Each pixel of the new image 52 is calculated as the arithmetic mean of the corresponding pixels in the preceding and following images, in a manner similar to the spatial interpolation technique explained in the foregoing description.
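The 25% temporal expansion can be sketched as follows (frames are modelled as rows of RGB tuples; integer-mean rounding is an assumption):

```python
def mean_frame(a, b):
    """Time-interpolated frame: per-pixel arithmetic mean of the immediately
    preceding frame a and following frame b."""
    return [[tuple((x + y) // 2 for x, y in zip(pa, pb))
             for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def expand_24_to_30(frames):
    """Insert one interpolated frame after every fourth original frame,
    expanding a 24 fps sequence by 25% to 30 fps."""
    out = []
    for i, f in enumerate(frames, start=1):
        out.append(f)
        if i % 4 == 0 and i < len(frames):  # need a following frame to interpolate
            out.append(mean_frame(frames[i - 1], frames[i]))
    return out
```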
[0082] Images of the time-expanded sequence 51 are then spatially compressed according to the technique illustrated in
[0083] It is worth mentioning that in spite of the schematic diagram of
[0084] The decoding and reconstruction carried out by decoder 2 according to the first embodiment of the present invention will now be described by referring to
[0085] In the example shown in
[0086] However, in order to enable comfortable and fatigue free viewing, decoder 2 significantly reduces flickering by providing output signals at a typical rate of 36 full definition frames per eye per second, while satisfactory results may be obtained at 30 fps per eye with high definition frames to match refresh rates of SDTV or HDTV for instance. On the other hand, output signals up to 120 fps (60 images per second per eye) can be provided by decoder 2 for very high fidelity reproduction, such an output being compatible however only with display devices such as DLP projectors and a limited number of high end devices. By experience, a playback rate of 72 fps provides very good results, provided image quality is preserved throughout the coding/decoding process as contemplated herein, such a frequency being a standard for most display devices currently encountered in home theatre systems.
[0087] Therefore, the playback process carried out by decoder 2 preferably includes a further step to increase the presentation rate of sequences 72 and 72′. Additional images are inserted at regular intervals in the image sequence, using the temporal interpolation technique already explained in the foregoing description of the mixing process referring to
[0088] In the example illustrated in
[0089] It should be noted that the foregoing description has been based on the fact that input sequences are supplied at a rate of 24 fps, which is common for motion picture movies. However, one can easily appreciate that the mixing process can be easily adapted to the case whereby two 30 fps sequences (ex. TV programs) would be supplied, by merely skipping the preliminary step of temporal interpolation represented by time-expanded sequences 51 and 51′ of
[0090] Alternatively, as illustrated in
[0091]
[0092] A more specific representation of the decoder 2 of the present invention is shown in
[0093] As can be seen, the decoder has two inputs: one analog and one digital. If the signal is analog, it is converted into a digital signal by ADC 101. FIFO buffer 103 changes the clock domain of the input signal into a clock domain used by the decoder. In practice, a broadcast signal or a DVD signal is clocked at a frequency different from the frequency used for RGB signals, hence the necessity of FIFO buffer 103. The signal is then passed through converter 105, which converts the signal from a Y C.sub.B C.sub.R signal into an RGB signal of 1×720×480 (pixels). This signal is then spatially interpolated according to the teachings of the present invention by spatial interpolator 107, resulting in a dual stream of 720×480 pixels. This dual stream is then scaled in scaler 109 to provide two 640×480 image streams (still in the RGB format). Alternatively, other resolutions can be supported by the system of the present invention. The frames are then placed in frame buffers 113, one for the right frames and the other for the left frames, the contents of which are controlled by input memory controller 111.
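The colour conversion performed by converter 105 can be illustrated as below; the BT.601 coefficients and full-range levels are assumptions, since the text does not specify which Y′CbCr matrix the decoder uses:

```python
def ycbcr_to_rgb(y, cb, cr):
    """Convert one Y'CbCr pixel to RGB (BT.601 full-range assumption),
    clamping each component to the 8-bit range."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    clamp = lambda v: max(0, min(255, round(v)))
    return clamp(r), clamp(g), clamp(b)

# A neutral grey passes through unchanged:
# ycbcr_to_rgb(128, 128, 128) -> (128, 128, 128)
```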
[0094] The output of the frame buffers is controlled by output memory controller 115 and, if necessary, time interpolator 117 to increase the frame rate.
[0095] A rate controller 119 is preferably provided. The purpose of the rate controller is to accommodate variations in the clock signals, which variations, although minute, de-synchronise the system. The rate controller monitors the difference in rate and corrects the output frequency by adding or removing a certain number of pixels on inactive lines of the frame. For example, for a given frame it may be necessary to add a few pixels to artificially slow the internal clock and properly synchronise the clocks.
[0096] Another advantageous component of decoder 2 is the anti-flicker filter 121. Flickering occurs when wearing shutter spectacles, when there is a contrast between the image and the closing of the shutter of the head display. It has been surprisingly discovered that by evaluating the value of the green level in each RGB pixel, and by decreasing the corresponding pixel colours proportionally when the green level is above a certain value, flickering is greatly reduced.
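This green-keyed attenuation can be sketched as follows; the threshold and attenuation factor are illustrative assumptions, as the text states only that colours are decreased proportionally when the green level exceeds a certain value:

```python
def anti_flicker(pixel, green_threshold=200, attenuation=0.85):
    """Anti-flicker sketch: when a pixel's green level exceeds a threshold,
    decrease all three colour components proportionally.
    Threshold and attenuation values here are illustrative only."""
    r, g, b = pixel
    if g > green_threshold:
        return tuple(int(c * attenuation) for c in (r, g, b))
    return (r, g, b)
```

Pixels with moderate green levels pass through unchanged, so the filter only darkens the bright regions that cause perceived flicker against the closed shutter.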
[0097] The output is then directly digital, or converted into an analog signal by DAC 123. Sync module 125 synchronises the head display with the output signal in order to open and close the shutters at the appropriate times.
[0098] Further preferably, an adjuster 127 is further provided. This adjuster is useful when the display device includes its own frame buffer, which would otherwise result in a de-synchronisation between the sync signal for the shutters and the actual display. This is a manual adjustment that the user makes in order to reduce crosstalk/ghosting of the image.
[0099] A second embodiment of the mixing method carried out by mixer 1 according to the present invention will now be described in detail, by reference to
[0100] Full definition images from the two 24 fps sequences 50 and 50′, comprising mosaics A and B by definition, are identified as L.sub.iAB and R.sub.iAB respectively (supposing two sequences of a stereoscopic program), index “i” representing the sequential number of a given image at time t. Dashed lines in
[0101] These fully saved images are nevertheless encoded in the form of two complementary mosaics stored in side-by-side merged images fields to ascertain homogeneity of the encoded sequence and compatibility with the MPEG2 compression/decompression protocol, by providing a certain temporal redundancy between successive images. Better definition and fidelity is thus generally obtained at playback with respect to the previously described embodiment, but at the expense of increased processing power requirement and system hardware cost. As for the above-described first embodiment, the encoding (mixing) process according to the present embodiment of the invention also further includes insertion of information in the compressed sequence 80 to enable identification of frame numbers as needed by the reconstruction process to identify image content and rebuild sequences with the proper sequential order and insert interpolated images at appropriate locations in the sequence. Again, such information may be stored in blank lines of merged images for instance.
[0102] The corresponding decoding process carried out by decoder 2 according to the present invention is schematically represented in
[0103] The five merged frames 81 to 85 representative of 30 fps RGB input sequence 80 are expanded to twelve images (six per channel), providing playback sequences 90 and 100 at 36 fps each (72 fps total, 36 per eye in the case of a three-dimensional stereoscopic program). In total, each group of twelve successive images of playback sequences 90 and 100, presented in a page flipping mode according to the frame sequence indicated by dashed lines 110, comprises two integral original images, six spatially interpolated images and four temporally interpolated images. Alternatively, sequences 90 and 100 could be outputted separately in parallel on two separate channels, as required by some display devices such as head mounted or auto-stereoscopic devices. In the illustrated example:
1. Image 91 (L.sub.1AB) is totally rebuilt from mosaic L.sub.1A stored in the left field of frame 81 of sequence 80, and mosaic L.sub.1B stored in the right field thereof;
2. Image 101 (R.sub.1AX) is spatially interpolated from mosaic R.sub.1A, taken from the left field of frame 82 of sequence 80;
3. Image 103 (R.sub.2BX) is spatially interpolated from mosaic R.sub.2B, taken from the right field of frame 82 of sequence 80;
4. Image 102 is temporally interpolated from image 101 and image 103;
5. Image 93 (L.sub.2AB) is totally rebuilt from mosaic image L.sub.2A, stored in the left field of frame 83 of sequence 80, and mosaic L.sub.2B stored in the right field thereof;
6. Image 92 is temporally interpolated from image 91 (L.sub.1AB) and image 93 (L.sub.2AB);
7. Image 94 (L.sub.3AX) is spatially interpolated from mosaic L.sub.3A, stored in the left field of frame 84 of sequence 80;
8. Image 96 (L.sub.4BX) is spatially interpolated from mosaic L.sub.4B, stored in the right field of frame 84 of sequence 80;
9. Image 95 is temporally interpolated from images 94 and 96;
10. Image 104 (R.sub.3AX) is spatially interpolated from mosaic R.sub.3A, stored in the left field of frame 85 of sequence 80;
11. Image 106 (R.sub.4BX) is spatially interpolated from mosaic R.sub.4B, stored in the right field of frame 85 of sequence 80; and
12. Image 105 is temporally interpolated from image 104 and image 106.
[0104] Obviously, one may easily understand that such a reconstruction process requires proper identification of frame order in the 5 frame sequences constituting input sequence 80. Therefore, a frame recognition circuit is provided in decoder 2 to interpret frame number information stored by mixer 1 in merged image sequence 80.
[0105] It can be observed that in this latter embodiment, as well as in the first one disclosed in the foregoing description, the first and second image sequences are encoded and decoded totally independently, without any interference between each other, enabling processing of original video sequences referring to independent scenes.
[0106] The above described example of the second embodiment, processing sources at 24 fps to yield a presentation rate of 72 fps, is only illustrative of a more general process applicable to 24 or 30 fps sources to produce a stereo output at presentation rates such as 60, 72, 96 or 120 fps. The chart below provides additional exemplary arrangements for 24 or 30 fps sources and 60, 72, 96 or 120 fps presentation rates:
TABLE-US-00001
  Source    Output    Original    Spatially-      Temporally-     Repeated
  (fps)     (fps)     images      interpolated    interpolated    images
                                  images          images
  24 + 24     60         12           36             12 or 0       0 or 12
  24 + 24     72         12           36             24            0
  24 + 24     96         12           36             0             48
  24 + 24    120         12           36             12            60
  30 + 30     60          0           60             0             0
  30 + 30     72          0           60             12 or 0       0 or 12
  30 + 30     96          0           60             36            0
  30 + 30    120          0           60             0             60
[0107] As stated above, RGB sequences 90 and 100 obtained through the above described processing could be directly outputted and displayed on a dual input device to reproduce the original programs or stereoscopic program signals at a 72 fps (36 per eye) presentation rate. Further processing is however carried out by decoder 2 to provide a combined stereoscopic RGB output signal (not shown) comprising images of sequences 90 and 100 in a time sequenced arrangement as indicated by dashed arrows such as 110. Still referring to the example of
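The time-sequenced combination described above can be sketched as a simple interleave (the sequence labels used here are illustrative):

```python
def page_flip(left_seq, right_seq):
    """Combine the left and right image sequences into a single stream
    alternating L1, R1, L2, R2, ... for page-flipping display, doubling the
    per-eye rate into the combined output rate (e.g. 36 + 36 -> 72 fps)."""
    combined = []
    for l, r in zip(left_seq, right_seq):
        combined.extend([l, r])
    return combined

# page_flip(['L1', 'L2'], ['R1', 'R2']) -> ['L1', 'R1', 'L2', 'R2']
```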
[0108] Presentation of the time sequenced combined signal with a standard projector or another display device is thus enabled to display the stereoscopic program in a page-flipping mode. Decoder 2 provides the necessary timing signals to a driver of shutter spectacles which can be worn by a viewer to view the displayed stereoscopic program in a three-dimensional mode, with high fidelity, negligible flickering and high comfort. As stated above, presentation rate can be increased up to 120 fps by inserting additional temporally interpolated image pairs or by repeating certain image pairs in the decoding process. It is also contemplated in the present invention that the RGB combined stereo output signal could be converted to another known standard presentation format such as an interlaced format or a conventional 2D format.
[0109] Therefore, one can easily appreciate that the above described embodiments of the present invention provide effective and practical solutions for the recording of two motion picture sequences on a conventional data storage medium, and playback with a conventional videodisk player or broadcast source and display device, to enable viewing of stereoscopic 3D movies at home with unmatched performance and comfort, still at an affordable cost, in a plurality of output modes to match the input signal requirements of a broad range of display devices. For example, a universal set top box fed with a single input signal format as defined in the foregoing description can be provided with selectable modes such as: page flipping, row interleaved, column interleaved, simultaneous dual presentation, anaglyphic, etc. The encoding/playback method and system of the present invention can thus be advantageously used in miscellaneous applications, including the processing of video sequences representing independent scenes, with numerous advantages over the solutions of the prior art.
[0110] It will thus be readily appreciated that the present invention presents advantages over the prior art. It provides a better quality of images, since no frequency filters (low pass or band pass) are used; decompression can be effected in real time with minimal resources; the process is compatible with progressive or interlaced systems, both at input and output; it allows for pause, forward, reverse, slow motion, etc.; and it supports all stereoscopic displays presently available.
[0111] Although the present invention has been described by means of preferred embodiments thereof, it is contemplated that various modifications may be made thereto without departing from the spirit and scope of the present invention. Accordingly, it is intended that the embodiment described be considered only as illustrative of the present invention and that the scope thereof should not be limited thereto but be determined by reference to the claims hereinafter provided and their equivalents.