Method for generating and reconstructing a three-dimensional video stream, based on the use of the occlusion map, and corresponding generating and reconstructing device
10051286 ยท 2018-08-14
CPC classification
H04N13/161
ELECTRICITY
H04N2013/0081
International classification
H04N19/597
Abstract
Devices and methods for generating a three-dimensional video stream starting from a sequence of video images. The sequence includes a first view (V.sub.0), at least one second view (V.sub.1) of a scene, and a depth map (D.sub.0) of said first view (V.sub.0), or a disparity map of said at least one second view (V.sub.1) with respect to the first view (V.sub.0). At least one occlusion image (O.sub.1) including the occluded pixels of said second view (V.sub.1) is obtained by starting from said depth map (D.sub.0) or from said disparity map. A compacted occlusion image (OC.sub.1) is generated by spatially repositioning said occluded pixels of said at least one occlusion image (O.sub.1), so as to move said pixels closer to one another. The three-dimensional video stream may include said first view (V.sub.0), said depth map (D.sub.0) or said disparity map, and said at least one compacted occlusion image (OC.sub.1).
Claims
1. A method for generating a three-dimensional video stream by starting from a sequence of video images, said sequence comprising a first view (V.sub.0), at least one second view (V.sub.1) of a scene, as well as a depth map (D.sub.0) of said first view (V.sub.0), or a disparity map of said at least one second view (V.sub.1) with respect to the first view (V.sub.0), the method comprising, for one image: obtaining at least one occlusion image (O.sub.1) comprising the occluded pixels of said second view (V.sub.1) by starting from said depth map (D.sub.0) or from said disparity map; generating a compacted occlusion image (OC.sub.1) by spatially repositioning said occluded pixels of said at least one occlusion image (O.sub.1), by moving said occluded pixels spatially closer to one another with respect to their positions in said occlusion image; said three-dimensional video stream comprising, for one image, said first view (V.sub.0), said depth map (D.sub.0) or said disparity map, and said at least one compacted occlusion image (OC.sub.1), wherein said at least one compacted occlusion image (OC1) of said at least one second view (V1) is obtained by: determining at least one first occlusion map (OM0,1) of at least one first estimated sequence (Vsyn1) of said at least one second view (V1) by starting from decoded values (Vdec0, Ddec0) of said first view (V0) or by starting from said first view (V0) and from said depth map (D0) corresponding to said first view (V0), said first occlusion map (OM0,1) comprising a representation of the positions of said occlusions; determining said at least one occlusion image (O1) having values corresponding to those of said at least one second view (V1) of the image in positions corresponding to those of the occlusions represented in said at least one first occlusion map (OM0,1); determining a spatial compaction of said at least one occlusion image (O1) for said positions corresponding to those of the occlusions represented in said at least 
one occlusion map (OM0,1), thus obtaining said at least one compacted occlusion image (OC1) of said at least one second view (V1).
2. The method according to claim 1, wherein said spatial compaction is obtained by moving occlusion pixels of said at least one occlusion image (O.sub.1), which are located in said positions corresponding to those of the occlusions represented in said at least one occlusion map (OM.sub.0,1), towards one or more sides of the images of said at least one occlusion image (O.sub.1).
3. The method according to claim 2, wherein said occlusion pixels are moved row by row towards one of the sides of the image of said at least one occlusion image (O.sub.1), said movement occurring by removing intermediate pixels of said at least one occlusion image (O.sub.1) which do not correspond to occlusion positions, while preserving the relative sequence of the occlusion pixels row by row or reversing it horizontally.
4. The method according to claim 2, wherein said occlusion pixels are moved row by row towards both sides of the image of said at least one occlusion image (O.sub.1) towards a first side for a first group of rows and towards a second side for a second group of rows of the images of said at least one occlusion image (O.sub.1) by removing intermediate pixels of said at least one occlusion image (O.sub.1) which do not correspond to occlusion positions, said first group and second group of rows being either adjacent or alternate rows.
5. The method according to claim 2, wherein said occlusion pixels are moved row by row towards both sides of the image of said at least one occlusion image (O.sub.1) the pixels of a first group of rows being moved towards a first side of the image in said first group of rows, and the pixels of a second group of rows being moved towards a second side of the image in said first group of rows of the images of said at least one occlusion image (O.sub.1), by removing intermediate pixels of said at least one occlusion image (O.sub.1) which do not correspond to occlusion positions, and leaving said second group of rows free from pixels.
6. The method according to claim 5, wherein said second group of rows is removed from said image of said at least one occlusion image (O.sub.1) by reducing the size of said image.
7. The method according to claim 5, wherein a group of columns (m/2) of said image of said at least one occlusion image (O.sub.1) unoccupied by said occlusion pixels is removed from said image of said at least one occlusion image (O.sub.1), thereby reducing the size of said image.
8. The method according to claim 1, wherein, in the presence of multiple occlusion areas in the image of said at least one occlusion image (O.sub.1), said occlusion pixels are positioned sequentially by removing pixels of said at least one occlusion image (O.sub.1) which do not correspond to occlusion positions.
9. The method according to claim 1, wherein, in the presence of multiple occlusion areas in the image of said at least one occlusion image (O.sub.1), said occlusion pixels are positioned sequentially by placing buffer pixels between pixels of different occlusion areas.
10. The method according to claim 9, wherein said buffer pixels have values which are calculated in a manner such as to decrease the gap in the signal level between two neighboring occlusion areas, and/or between one occlusion area and an adjacent neutral area with no occlusions, by introducing intermediate transition zones between pixels of different occlusion areas.
11. The method according to claim 1, comprising the generation of a sequence of coded images (Vcod.sub.0, Dcod.sub.0, OCcod.sub.0), wherein the coded images comprise the coding of said first view (V.sub.0), of said depth map (D.sub.0) or of said disparity map, and of said at least one compacted occlusion image (OC.sub.1).
12. A device for generating a three-dimensional video stream by starting from a sequence of video images, said sequence comprising a first view (V.sub.0), at least one second view (V.sub.1) of a scene, as well as a depth map (D.sub.0) of said first view (V.sub.0), or a disparity map of said at least one second view (V.sub.1) with respect to the first view (V.sub.0), the device comprising: a three-dimensional video stream generator, the three-dimensional video stream generator configured to: obtain said at least one occlusion image (O.sub.1) comprising the occluded pixels of said at least one second view (V.sub.1) by starting from a depth map (D.sub.0) of said first view (V.sub.0), or from a disparity map of said at least one second view (V.sub.1) with respect to the first view (V.sub.0); generate said at least one compacted occlusion image (OC.sub.1) by spatially repositioning said occluded pixels of said at least one occlusion image (O.sub.1) by moving said occluded pixels spatially closer to one another with respect to their positions in said occlusion image; obtain said three-dimensional video stream comprising, for one image, said first view (V.sub.0), said depth map (D.sub.0) or said disparity map, and said at least one compacted occlusion image (OC.sub.1), wherein said three-dimensional video stream generator is configured to generate said at least one compacted occlusion image (OC1) by: determining said at least one first occlusion map (OM0,1) of said at least one first estimated sequence (Vsyn1) of said at least one second view (V1) by starting from said decoded values (Vdec0, Ddec0) of said first view (V0) or by starting from said first view (V0) and from said depth map (D0) corresponding to said first view (V0), said first occlusion map (OM0,1) comprising a representation of the positions of said occlusions; determining said at least one occlusion image (O1) of images having values corresponding to those of said at least one second view (V1) of the image
in positions corresponding to those of the occlusions represented in said at least one first occlusion map (OM0,1); determining said spatial compaction of said at least one occlusion image (O1) for said positions corresponding to those of the occlusions of said at least one occlusion map (OM0,1), thus obtaining said at least one compacted occlusion image (OC1) of said at least one second view (V1).
13. The device according to claim 12, wherein said three-dimensional video stream generator is also configured to generate a sequence of coded images (Vcod.sub.0, Dcod.sub.0, OCcod.sub.0), wherein the coded images comprise the coding of said first view (V.sub.0), of said depth map (D.sub.0) or of said disparity map, and of said at least one compacted occlusion image (OC.sub.1).
14. A method for reconstructing a three-dimensional video stream comprising a sequence of video images, comprising, for one image: receiving a first view (Vdec.sub.0, V.sub.0) of said sequence of video images, a depth map (Ddec.sub.0, D.sub.0) of said first view (Vdec.sub.0, V.sub.0), or a disparity map between said first view (Vdec.sub.0, V.sub.0) and at least one second view (Vdec.sub.1, V.sub.1) of said sequence of video images, and at least one compacted occlusion image (OCdec.sub.1, OC.sub.1) obtained by spatially repositioning the occluded pixels of at least one occlusion image (O.sub.1) of said at least one second view (Vdec.sub.1), by moving said occluded pixels spatially closer to one another with respect to their positions in said occlusion image; obtaining at least one reconstructed occlusion image (Odec.sub.1, O.sub.1) comprising the occluded pixels of said at least one second view (Vdec.sub.1, V.sub.1) repositioned in the position they were in prior to the compaction operation carried out in order to obtain said at least one compacted occlusion image (OC.sub.1); reconstructing said at least one second view (Vdec.sub.1, V.sub.1) by starting from said first view (Vdec.sub.0, V.sub.0), from said depth map (Ddec.sub.0, D.sub.0) or, respectively, from said disparity map, and from said at least one reconstructed occlusion image (Odec.sub.1, O.sub.1); said reconstructed three-dimensional stream comprising said received first view (Vdec.sub.0, V.sub.0) and said at least one reconstructed second view (Vdec.sub.1, V.sub.1), wherein said at least one second view (Vdec1, V1) is reconstructed by: determining at least one second estimated sequence (Vsyn1) of said at least one second view (Vdec1, V1) by using values obtained from said first view (Vdec0, V0) and from said depth map (Ddec0, D0), wherein said at least one second estimated sequence (Vsyn1) is adapted to comprise occlusion areas; determining at least one second occlusion map (OM0,1) of said at least one second
estimated sequence (Vsyn1), said occlusion map (OM0,1) comprising a representation of the positions of the occlusions; spatially uncompacting the compacted occlusions of said at least one compacted occlusion image (OCdec1, OC1) in order to obtain at least one second occlusion image (Odec1, O1) comprising an uncompacted occlusion image, by restoring the actual original positions of said occlusions of said at least one second view (Vdec1, V1) by spatially repositioning said occluded pixels of said at least one compacted occlusion image by moving said occluded pixels away one from the other with respect to their position in the compacted occlusion image, based on the positions represented in the occlusion map (OM0,1); replacing the pixels of the occlusion positions of said at least one second occlusion image (Odec1) in corresponding positions of said at least one second estimated sequence (Vsyn1), while leaving the other positions of said at least one second estimated sequence (Vsyn1) unchanged, thus obtaining at least one second view (Vdec1, V1).
15. The method according to claim 14, wherein said at least one second view (Vdec.sub.1) is subjected to a combination artifact compensation operation.
16. The method according to claim 14, comprising the decoding of a sequence of coded images (Vcod.sub.0, Dcod.sub.0, OCcod.sub.0), wherein the coded images comprise the coding of said first view (V.sub.0), of said depth map (D.sub.0) or of said disparity map, and of said at least one compacted occlusion image (OC.sub.1), thus obtaining said first view (Vdec.sub.0) from said depth map (Ddec.sub.0) or, respectively, from said disparity map, and said at least one compacted occlusion image (OCdec.sub.1).
17. A device for reconstructing a three-dimensional video stream comprising a sequence of video images, said device comprising: a three-dimensional video signal reconstructor, the three-dimensional video signal reconstructor configured to: receive a first view (Vdec.sub.0, V.sub.0) of said sequence of video images, a depth map (Ddec.sub.0, D.sub.0) of said first view (Vdec.sub.0, V.sub.0), or a disparity map between said first view (Vdec.sub.0, V.sub.0) and at least one second view (Vdec.sub.1, V.sub.1) of said sequence of video images, and at least one compacted occlusion image (OCdec.sub.1, OC.sub.1) obtained by spatially repositioning the occluded pixels of at least one occlusion image (O.sub.1) of said at least one second view (Vdec.sub.1), moving said occluded pixels spatially closer to one another with respect to their positions in said occlusion image; obtain said at least one reconstructed occlusion image (Odec.sub.1, O.sub.1) by repositioning said occluded pixels of said at least one second view (Vdec.sub.1, V.sub.1) in the position they were in prior to the compaction operation carried out in order to obtain said at least one compacted occlusion image (OC.sub.1); reconstruct said at least one second view (Vdec.sub.1, V.sub.1) by starting from said first view (Vdec.sub.0, V.sub.0), from said depth map (Ddec.sub.0, D.sub.0) or, respectively, from said disparity map, and from said at least one reconstructed occlusion image (Odec.sub.1, O.sub.1), wherein said three-dimensional video signal reconstructor is configured to reconstruct said at least one second view (Vdec1, V1) by: obtaining said at least one second estimated sequence (Vsyn1) of said at least one second view (Vdec1, V1) by using values obtained from said first view (Vdec0, V0) and from said decoded depth map (Ddec0, D0), wherein said at least one second estimated sequence (Vsyn1) may comprise occlusion areas; determining said at least one second occlusion map (OM0,1) of said at least one second
estimated sequence (Vsyn1), said occlusion map (OM0,1) comprising a representation of the positions of the occlusions; spatially uncompacting said compacted occlusions of said at least one compacted occlusion image (OCdec1, OC1) in order to obtain said at least one second occlusion image (Odec1, O1) comprising an uncompacted occlusion image, based on the position of the occlusions represented in said at least one occlusion map (OM0,1), by restoring the actual original positions of said occlusions of said at least one second view (Vdec1, V1) by spatially repositioning said occluded pixels of said at least one compacted occlusion image by moving said occluded pixels away one from the other with respect to their position in the compacted occlusion image; replacing the pixels of the occlusion positions of said at least one second occlusion image (Odec1) in corresponding positions of said at least one second estimated sequence (Vsyn1), while leaving the other positions of said at least one second estimated sequence (Vsyn1) unchanged, thus obtaining said at least one second view (Vdec1, V1).
18. The device according to claim 17 wherein said three-dimensional video signal reconstructor is also configured to decode said sequence of coded images, by coding of said first view (V.sub.0), of said depth map (D.sub.0) or of said disparity map, and of said at least one compacted occlusion image (OC.sub.1), in order to obtain said first view (Vdec.sub.0) from said depth map (Ddec.sub.0) or, respectively, from said disparity map, and said at least one compacted occlusion image (OCdec.sub.1).
19. The device according to claim 17, comprising a combination artifact compensation block (CART) for said at least one reconstructed second view (Vdec.sub.1).
20. A method for generating a three-dimensional video stream by starting from a sequence of video images generated according to claim 1, wherein said at least one second view (V.sub.1) comprises (k1) views (V.sub.1, . . . V.sub.k1), where k>1 and integer, the method comprising, for one image: establishing an order of the views (V.sub.0, V.sub.1, . . . ,V.sub.k1) comprising said first view (V.sub.0) as a main view; obtaining (k1) occlusion images (O.sub.1, . . . O.sub.k1,), each comprising the occluded pixels of one of said second views (V.sub.1, . . . V.sub.k1) with the corresponding index, by starting from said depth map (D.sub.0, D.sub.dec0) or said disparity map and from said main view (V.sub.0, V.sub.dec0); generating (k1) compacted occlusion images (OC.sub.1,2,k1, OCcod.sub.1,2, . . . k1) by spatially repositioning said occluded pixels of said (k1) occlusion images (O.sub.1, . . . O.sub.k1), so as to move the respective pixels closer to one another; said three-dimensional video stream comprising, for one image, said first view (V.sub.0), said depth map (D.sub.0) of said first view (V.sub.0), and said compacted occlusion images (OC.sub.1,2,k1, OCcod.sub.1, 2, . . . k1).
21. A device for generating a three-dimensional video stream by starting from a sequence of video images, said sequence comprising a first view (V.sub.0) and at least one second view (V.sub.1) of a scene, wherein said at least one second view (V.sub.1) comprises (k1) views (V.sub.1, . . . , V.sub.k1), where k>1 and integer, the device comprising said three-dimensional video stream generator as in claim 1 also configured to: establish an order of the views (V0, V1, . . . , Vk1) comprising said first view (V0) as a main view; obtain (k1) occlusion images (O1, . . . Ok1), each comprising the occluded pixels of one of said second views (V1, . . . Vk1) with the corresponding index, by starting from said depth map (D0, Ddec0) or said disparity map and from said main view (V0, Vdec0); generate (k1) compacted occlusion images (OC1,2,k1, OCcod1,2, . . . k1) by spatially repositioning said occluded pixels of said (k1) occlusion images (O1, . . . Ok1), so as to move the respective pixels closer to one another; said three-dimensional video stream comprising, for one image, said first view (V0), said depth map (D0) of said first view (V0), and said compacted occlusion images (OC1,2,k1, OCcod1,2, . . . k1).
22. The method for reconstructing a three-dimensional video stream comprising a sequence of video images reconstructed according to claim 14, wherein said at least one second view (V.sub.1) comprises (k1) views (V.sub.1, . . . , V.sub.k1), where k>1 and integer, and compacted occlusion images (OC.sub.1,2, . . . k1, OCcod.sub.1,2, . . . k1) obtained by spatially repositioning the occluded pixels of the occlusion images (O.sub.1, . . . O.sub.k1) of said (k1) views (V.sub.1, . . . V.sub.k1), so as to move said pixels spatially closer to one another, the method comprising, for one image: obtaining (k1) reconstructed occlusion images (Odec.sub.1, . . . Odec.sub.k1; O.sub.1, . . . O.sub.k1) comprising the occluded pixels of said (k1) views (V.sub.1, . . . , V.sub.k1) repositioned in the position they were in prior to the compaction operation carried out in order to obtain said compacted occlusion images (OCdec.sub.1,2, . . . k1, OC.sub.1,2, . . . k1); reconstructing said (k1) views (V.sub.1, . . . , V.sub.k1) by starting from said first view (Vdec.sub.0, V.sub.0), from said depth map (Ddec.sub.0, D.sub.0) or, respectively, from said disparity map, and from said reconstructed occlusion images (Odec.sub.1, . . . Odec.sub.k1; O.sub.1, . . . O.sub.k1); said reconstructed three-dimensional stream comprising said received first view (Vdec.sub.0, V.sub.0) and said (k1) reconstructed views (Vdec.sub.1, . . . Vdec.sub.k1, V.sub.1, . . . , V.sub.k1).
23. The device for reconstructing a three-dimensional video stream comprising a sequence of video images, as in claim 17, wherein said at least one second view (V.sub.1) comprises (k1) views (V.sub.1, . . . , V.sub.k1), where k>1, and compacted occlusion images (OC.sub.1,2, . . . k1, OCcod.sub.1,2, . . . k1) obtained by spatially repositioning the occlusion images (O.sub.1, . . . O.sub.k1) of said (k1) views (V.sub.1, . . . V.sub.k1), so as to move said pixels spatially closer to one another, said three-dimensional video signal reconstructor being also configured to: obtain (k1) reconstructed occlusion images (Odec.sub.1, . . . Odec.sub.k1; O.sub.1, . . . O.sub.k1) comprising the occluded pixels of said (k1) views (V.sub.1, . . . V.sub.k1) repositioned in the position they were in prior to the compaction operation carried out in order to obtain said compacted occlusion images (OCdec.sub.1,2, . . . k1, OC.sub.1,2, . . . k1); reconstruct said (k1) views (V.sub.1, . . . , V.sub.k1) by starting from said first view (Vdec.sub.0, V.sub.0), from said depth map (Ddec.sub.0, D.sub.0) or, respectively, from said disparity map, and from said reconstructed occlusion images (Odec.sub.1, . . . Odec.sub.k1; O.sub.1, . . . O.sub.k1); said reconstructed three-dimensional stream comprising said received first view (Vdec.sub.0, V.sub.0) and said (k1) reconstructed views (Vdec.sub.1, . . . Vdec.sub.k1, V.sub.1, . . . , V.sub.k1).
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Further objects and advantages of the present invention will become apparent from the following detailed description of a preferred embodiment (and variants) thereof and from the annexed drawings, which are only supplied by way of non-limiting example, wherein:
(11) In the drawings, the same reference numerals and letters identify the same items or components.
DETAILED DESCRIPTION OF A FEW EMBODIMENTS OF THE INVENTION
(13) The diagram shown in
(14) According to the present invention, the signals V.sub.0 and D.sub.0 can also be coded by using a suitable compression technique, e.g., through a standard video encoder such as, for example, one of the AVC/H.264 type.
(15) A video encoder implemented via software usually also provides the decoded images of the input video signals or streams, in that they are used in the motion estimation/compensation process of the encoder. Should decoded images be unavailable in the standard encoder, a suitable decoder may be used for decoding the coded images produced by the encoder with the same video compression and decompression technique in use. The corresponding decoded video signals or streams Vdec.sub.0 and Ddec.sub.0 can then be used by an occlusion estimator block STOC.
(16) The block STOC may comprise a function (View Synthesis), e.g., implemented through a synthesis algorithm, capable of producing an estimated sequence of the view V.sub.1, referred to as Vsyn.sub.1: this sequence is not outputted, but can be used in order to determine the positions of the occluded pixels forming the so-called occlusion map. The synthesis stage implicitly produces a video sequence (also referred to as video stream or simply as video) of the occlusion map, consisting of binary images representing the set of occlusions OM.sub.0,1. A value of 1 in OM.sub.0,1 indicates that it is not possible to synthesize the corresponding pixel of the image of V.sub.1 by starting from the corresponding images of V.sub.0 and D.sub.0. The values 0 represent those areas where the synthesis is successful, i.e., for which there is an estimated value in the sequence Vsyn.sub.1.
(17) The techniques and rules pertaining to the definition of the occlusion maps may vary depending on the synthesis algorithm in use. For example, an occlusion estimation technique may be used by starting from images of the decoded video sequences Vdec.sub.0 and Ddec.sub.0, which generates images with synthesized pixels associated with probabilistic result validity values, depending on the reliability of the estimation. A decision block may assign the value 0 or 1 to a pixel depending on whether synthesis reliability is higher or lower, respectively, than a preset threshold value, or on the basis of any other decision criterion considered to be appropriate. On the coding side, video processing algorithms may also be used which do not generate any synthesized pixel value, while nonetheless being able to estimate the probability that a synthesis algorithm of the view V.sub.1 will or will not give a correct value for a given pixel of the image of the corresponding video sequence.
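By way of illustration only (this sketch is not part of the patent text), the threshold-based decision just described might look as follows, assuming per-pixel reliability values between 0 and 1; the threshold value and all names are hypothetical:

```python
RELIABILITY_THRESHOLD = 0.5  # preset threshold value (illustrative assumption)

def occlusion_map(reliability):
    """Binary occlusion map OM: 1 marks a pixel whose synthesis reliability
    falls below the threshold (occluded), 0 marks a pixel that the synthesis
    algorithm is expected to estimate correctly."""
    return [[1 if r < RELIABILITY_THRESHOLD else 0 for r in row]
            for row in reliability]

# A 2x2 image of per-pixel reliability estimates.
print(occlusion_map([[0.9, 0.2], [0.6, 0.4]]))  # [[0, 1], [0, 1]]
```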
(18) The decoded video sequences Vdec.sub.0 and Ddec.sub.0 are preferably used, as opposed to the original ones V.sub.0 and D.sub.0, so as to obtain procedures and results in line with those that can be obtained at the reconstruction stage, where the original video sequences cannot be accessed. If there is a video signal encoder, a decoder will also be present, arranged in cascade with the video encoder, to obtain the decoded video sequences Vdec.sub.0 and Ddec.sub.0 by starting from the coded streams Vcod.sub.0 and Dcod.sub.0.
(19) The above applies when the invention is implemented with coding on the generation side and decoding on the reconstruction side to decrease the occupation of the channel (for transmission) or of the storage medium (for storage) by the stereoscopic content. If the coding and decoding processes are not carried out, the view V.sub.1 can be calculated directly from V.sub.0 and D.sub.0. In such a case, the V+D+O triplet will be composed of three uncoded sequences V.sub.0, D.sub.0 and OC.sub.1, wherein the latter will have been spatially compressed or compacted in accordance with the present invention. Therefore, also the occlusion estimator block STOC will use the sequences V.sub.0, D.sub.0 instead of Vdec.sub.0 and Ddec.sub.0.
(20) At this point, it is possible to retrieve a sequence of video images of the occlusions O.sub.1 comprising those images which have pixel values other than zero (or another predetermined neutral value) for the occluded pixels alone, where the corresponding value is present. When zero is selected as the value of non-occluded pixels, the images of O.sub.1 can be obtained by simply multiplying the co-positioned coefficients of the images of the sequence OM.sub.0,1 and V.sub.1.
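The element-wise combination of the images of OM.sub.0,1 and V.sub.1 described above can be sketched as follows (illustrative only, not part of the patent text; images are modeled as lists of rows of pixel values, zero is assumed as the neutral value, and the function name is an assumption):

```python
def occlusion_image(om, v1):
    """Build the occlusion image O1: keep the pixel of view V1 wherever the
    binary occlusion map OM is 1 and leave the neutral value 0 elsewhere,
    i.e. multiply the co-positioned coefficients of OM and V1."""
    return [[v * m for m, v in zip(om_row, v_row)]
            for om_row, v_row in zip(om, v1)]

# Toy 2x4 example: two occluded pixels in the first row.
om = [[0, 1, 1, 0],
      [0, 0, 0, 0]]
v1 = [[10, 20, 30, 40],
      [50, 60, 70, 80]]
print(occlusion_image(om, v1))  # [[0, 20, 30, 0], [0, 0, 0, 0]]
```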
(21) Typically an image 200 of the video sequence comprising the occlusion images consists of a series of isolated regions containing occluded pixels, i.e., those pixels which are not visible in the corresponding image of the video sequence V.sub.0. One possible representation of a typical component image of O.sub.1 is shown in
(22) The video sequence O.sub.1 of the occluded images can be processed by an occlusion spatial compactor block CSO according to the present invention in various ways that may also take into account the fact that the resulting video sequence may possibly be subjected to compression by the standard video encoder. The first and simplest embodiment will now be described with reference to
(23) Continuing the row-by-row scanning of the image 200, the spatial compactor block detects that the next group of n.sub.B rows, from n.sub.A+1 to n.sub.A+n.sub.B, includes as occluded pixels the m.sub.A occluded pixels of the region A and the m.sub.B occluded pixels of the rectangular region B. Said block will thus copy these two groups of occluded pixels into the image 400 in the same order in which they appear in the image 200, moving them from left to right and removing any neutral non-occluded pixels between the regions A and B. The situation of
(24) In the next set of n.sub.C1−n.sub.D1 rows (from the (n.sub.A+n.sub.B+1)-th one to the (n.sub.A+n.sub.B+n.sub.C1−n.sub.D1)-th one) there are additional occluded pixels belonging to the upper part of the occluded region C.sub.1, which in turn constitutes the upper part of the larger occluded region C having a quadrilateral arrow-point shape with one diagonal parallel to the horizontal side of the image 200. The spatial compactor block copies into the image 400 of the sequence OC.sub.1, row by row, all the occluded pixels belonging to the regions A, B and C.sub.1, skipping all the non-occluded pixels between A and B and between B and C.sub.1. It should be noted that, after having been copied and compacted to the left, the pixels of the region C.sub.1 lying against B in the image 400 form a figure having a different shape than the shape they had in the image 200.
(25) For the next n.sub.D1 rows of the image 200 (from the (n.sub.A+n.sub.B+n.sub.C1−n.sub.D1+1)-th row to the (n.sub.A+n.sub.B+n.sub.C1)-th row), during the scan the spatial compactor encounters, in addition to the occluded pixels of A, B and C.sub.1, also the occluded pixels of the triangular region D.sub.1, which constitutes the upper part of a similar region designated D. These pixels are also compacted to the left against those of C.sub.1 previously encountered while scanning the image 200 from left to right, without the neutral non-occluded pixels between C.sub.1 and D.sub.1 being copied. Also the region D.sub.1 resulting from the compaction process takes, in the image 400, a different shape than the original shape it had in the image 200.
(26) The spatial compactor CSO then continues the compaction operation upon the remaining rows of the image 200, compacting to the left, in succession, also the regions C and D.sub.2, followed by E.sub.1, D.sub.3 and E.sub.2, and finally E.sub.3. The m×n image 400 resulting from this operation, shown in
(27) Of course, the occluded regions may alternatively be compacted to the right instead of to the left: in such a case, the compactor may carry out a scan of the rows from right to left and copy into the output image 400, row by row, the occluded pixels in the same order in which they are encountered while scanning the image 200, skipping the non-occluded pixels between any possible pair of occluded regions: the remaining pixels of the row will be filled with the value denoting occlusion absence. A compacted output image will thus be obtained, wherein all the occluded regions have been moved to the right one against the other, thus taking a shape and undergoing a disassembly which are different from those determined by compacting the same image to the opposite side.
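The row-by-row compaction described above, towards either side, can be sketched as follows (an illustrative sketch only, not the claimed implementation; zero is assumed as the neutral value denoting occlusion absence, and all names are hypothetical):

```python
NEUTRAL = 0  # value denoting occlusion absence (assumption)

def compact_row(row, to_left=True):
    """Move all occluded (non-neutral) pixels of one row towards one side,
    preserving their relative order, and fill the rest with the neutral value."""
    occluded = [p for p in row if p != NEUTRAL]
    padding = [NEUTRAL] * (len(row) - len(occluded))
    return occluded + padding if to_left else padding + occluded

def compact_image(image, to_left=True):
    """Compact every row of an occlusion image towards the same side."""
    return [compact_row(row, to_left) for row in image]

row = [0, 7, 0, 0, 8, 9, 0]
print(compact_row(row))                 # [7, 8, 9, 0, 0, 0, 0]
print(compact_row(row, to_left=False))  # [0, 0, 0, 0, 7, 8, 9]
```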
(28) It is important to point out that the compaction direction may be totally independent of the scan order and arrangement of the occluded pixels. Any combination of these two operating variables is possible, and the choice may depend, for example, on implementation advantages or on the bit rate reduction obtainable in the subsequent compression step. For example, let us assume that the image 200 of
(29) A second embodiment of the invention is illustrated in
(30) In this case, a generic row among the n rows contained in the image 200, where n is assumed to be an even number, is compacted differently depending on whether it belongs to the upper or lower half of the image.
(31) The occluded pixels in the first n/2 rows of the image are stacked at the left edge of the image produced by the spatial compactor from left to right, whereas those belonging to the last n/2 rows are stacked at the right edge in the same order from right to left. The compaction directions are indicated by the arrows in
(32)
(33) If no row contains a number of occluded pixels greater than m/2, then it is possible to resize the image 600 of OC.sub.1 without losing information, by copying the compacted occluded regions at the lower right edge of the last n/2 rows of the image 600 to the upper right edge of the first n/2 rows of the same image and then removing the last n/2 rows. The image 700 shown in
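The vertical resizing just described can be sketched as follows. This is an illustrative sketch under the assumptions that the upper half was compacted to the left, the lower half to the right, and that no row holds more than m/2 occluded pixels; `NEUTRAL` and the function name are assumptions for the example.

```python
NEUTRAL = 0  # assumed value denoting occlusion absence

def halve_vertically(image):
    """Fold the lower half of a compacted occlusion image (occlusions
    stacked at the right edge) into the free right-hand part of the upper
    half (occlusions stacked at the left edge), then drop the lower half.
    Lossless as long as no row holds more than m/2 occluded pixels."""
    n = len(image)
    folded = []
    for top, bottom in zip(image[:n // 2], image[n // 2:]):
        merged = list(top)
        for j, p in enumerate(bottom):
            if p != NEUTRAL:
                # occluded pixels of the two half-rows must not collide
                assert merged[j] == NEUTRAL
                merged[j] = p
        folded.append(merged)
    return folded
```

For example, folding the 2×4 image `[[5, 0, 0, 0], [0, 0, 0, 9]]` produces the single row `[5, 0, 0, 9]`, halving the vertical size without losing the occluded pixels.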
(34) The output images may be subjected to compression as they are, i.e., in m×n/2 size; as an alternative, the compactor may additionally carry out a step of halving the number of images of the output sequence by placing pairs of m×n/2 half-images 600 of the sequence (e.g., two temporally successive images) into single m×n images.
(35) In one embodiment of the present invention, the resulting video sequence can be compressed by a standard video encoder ENC. In this case, the spatial uncompactor DSO (
(36) The halving of the horizontal dimension of the images of the compacted occlusion image sequence OC.sub.1 is possible without losing information, if the number of occluded pixels in one row of the image does not exceed the value m/4: this hypothesis is verified in most cases, including the compacted occlusion image shown in
(37)
(38) In order to maximize the spatial compression efficiency, it is possible to arrange into a single output image four temporally successive m/2×n/2 images of the input sequence in accordance with a preset configuration, constant throughout the length of the video. In this case as well, the spatial uncompactor DSO of the decoder 1500 according to the present invention will carry out the operation inverse to that carried out by the spatial compactor of the encoder.
(39) At this point, it is clear that it is possible, in principle, to execute the horizontal halving procedure r times upon the input image O.sub.1 received by the compactor CSO without losing information, provided that the maximum number of occluded pixels per row therein does not exceed m/2.sup.r, i.e., the number of pixels per row of the full-resolution image divided by 2.sup.r. The vertical dimension being equal, the compactor can group 2.sup.r images with m/2.sup.r horizontal dimension into one image with m horizontal dimension, thereby reducing by a factor 2.sup.r the number of images contained in the compacted output sequence OC.sub.1 compared to the occlusion video sequence O.sub.1, and hence to the original video sequence V.sub.1.
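The lossless-halving condition just stated (at most m/2^r occluded pixels per row) can be checked directly on an occlusion image. The following is an illustrative sketch, not part of the invention; `NEUTRAL` and the function name are assumptions.

```python
NEUTRAL = 0  # assumed value denoting occlusion absence

def max_lossless_halvings(occlusion_image, m):
    """Largest r such that every row contains at most m / 2**r occluded
    pixels, i.e. the number of times the horizontal halving can be
    repeated without losing any occluded pixels."""
    worst = max(sum(1 for p in row if p != NEUTRAL) for row in occlusion_image)
    r = 0
    while 2 ** (r + 1) <= m and worst <= m // 2 ** (r + 1):
        r += 1
    return r
```

With m = 8 and a worst-case row of two occluded pixels, the function returns r = 2, since 2 ≤ 8/4 but 2 > 8/8; the compactor could then group 2^2 = 4 reduced images into one full-width image.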
(40) As a general rule, it is possible to remove, without losing any occluded pixels, any number of pixels in a row as long as it is smaller than or equal to the number of non-occluded pixels of the occlusion image. This leads to the possibility of making size reductions by any number of pixels, thus altering the width of the images by any quantity, which need not even be an integer submultiple of their horizontal dimension.
(41) It is clear from the above that it is possible to divide the rows of the images 200 that compose the uncompacted occlusion video O.sub.1 into two compaction areas (see
(42) Now it is also possible to define a larger number of such areas, as shown by way of example in
(43) In this case as well, the compactor may halve the vertical size of the resulting image by executing operations for copying the occlusions from the lower right edge to the upper right edge of the two upper and lower half-images with m×n/2 size, so that the occluded pixels of the area II will be stacked at the right edge of the area I, while the occluded pixels of the area IV will be stacked at the right edge of the area III. At this point, the areas II and IV will have been left with no occluded pixels and can be eliminated to obtain the m×n/2 image 1200 with halved vertical size shown in
(44) As illustrated for the image 700 of
(45) It is clear that, in general, the occlusion images can be sectioned into any integer number of compaction areas, each subjected to a given type of compaction. As their number increases, the complexity of the operations to be executed increases as well, especially on the compactor side; however, so does the possibility of reducing the area occupied by the neutral regions of the images, still without losing any occluded pixels if their reduced number allows it.
(46) In general, the images of the sequence of compacted occlusions OC.sub.1 outputted by the spatial compactor CSO are characterized by some areas abounding in information content, with high variance and with many high-frequency components, due to abrupt transitions of pixel values between the different occluded regions placed one against the other into a small space and between an occluded region and the neutral area. If the occlusion sequence is to be coded, this circumstance increases the bit rate required for that purpose. In order to reduce the presence of these high-frequency components, and hence to further increase the compression efficiency of the standard video encoder, it is possible to insert between the occluded regions, during the compaction process, a certain number of buffer pixels creating an intermediate transition zone composed of pixels having values suitably calculated to reduce the signal level difference between two neighboring occluded regions and between an occluded region and the adjacent neutral area.
(47) This removal of the abrupt transitions in the video of the images of the compacted occlusions OC.sub.1 can be carried out in many different ways: a fixed or variable number of pixels per row may be used, and the value of the buffer pixels may be calculated with many different mechanisms; moreover, this elimination process can be executed in combination with any one of the compaction techniques described so far. A simple way to implement this measure is to use a fixed number of pixels, preferably a small number of just a few units (e.g., 1, 3, 5 or 7 buffer pixels), preferably an odd number.
(48) In a simple embodiment, the values of the buffer pixels may only depend on pixels belonging to the same row and may be calculated as mean values of adjacent pixels. Let us assume that buffer areas composed of three consecutive pixels z1, z2 and z3, in this order, are interposed between two pixels r1, located before z1, and r2, located after z3, wherein r1 and r2 belong to two regions R1 and R2 which are assumed to be separated by an uninterrupted run of neutral pixels, and which would otherwise be placed one against the other by the compactor in the absence of a buffer area.
(49) One possible way of assigning values to the buffer pixels is as follows: z2=(r1+r2)/2, z1=(r1+z2)/2, and z3=(z2+r2)/2. In substance, for buffer areas composed of three pixels, the central pixel can be calculated as the mean value of the two row pixels adjacent to the area, while the two outermost pixels of the area are in turn the means of the central pixel and the closest adjacent pixel. In general, one may use more or less complex buffer pixel calculation formulae, possibly also taking into account pixels present in rows other than those where the buffer pixels are located. One may even consider pixels of the same row or of other rows present in images of the occlusion video referring to times earlier than the time the current image refers to.
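The three-pixel buffer formula above can be written directly in code. This is an illustrative sketch assuming integer pixel values (e.g., 8-bit samples), hence integer division; the function name is an assumption.

```python
def buffer_pixels(r1, r2):
    """Three-pixel transition zone (z1, z2, z3) interposed between the edge
    pixel r1 of one occluded region and the edge pixel r2 of the next:
    z2 is the mean of r1 and r2, and z1, z3 are in turn the means of the
    central pixel and the closest adjacent pixel."""
    z2 = (r1 + r2) // 2
    z1 = (r1 + z2) // 2
    z3 = (z2 + r2) // 2
    return z1, z2, z3
```

For example, between edge pixels of value 100 and 200 the buffer becomes (125, 150, 175), replacing an abrupt step of 100 levels with four steps of 25.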
(50) The additional complication due to the insertion of buffer areas is marginal. In the first place, it solely concerns the compactor included in the encoder: the uncompactor on the decoding side will only have to discard the buffer pixels added by the compactor on the coding side from the images of the compacted occlusion video, and therefore it will only need to know which those added pixels are. The other side effect is a decreased maximum allowable number of occlusions in the video sequence that can be represented without losing any occluded pixels from an occlusion video provided with buffer areas. This effect is however negligible in most cases, particularly when compaction methods are used which do not reduce the size of the occlusion images (
(51) Referring back to the encoder diagram of
(52) The compacted occlusion video sequence OC.sub.1 can be compressed by a standard encoder ENC (
(53) As an alternative, the triplet of contents can be stored, in a combined and coordinated manner, on any media in uncoded form, possibly with the addition of the signaling required by the regenerator for rebuilding the original component sequences, in particular the uncompacted occlusion sequence O.sub.1. In this case (
(54) In general, there may also be more than one standard encoder, each one compressing a subset of the three streams to be coded V.sub.0, D.sub.0 and OC.sub.1, so that the sequences can be compressed in a manner optimized in accordance with the characteristics of the video sequence. At any rate, the coded video stream or signal will turn out to be composed, from a logical viewpoint, of three data streams Vcod.sub.0, Dcod.sub.0 and OCcod.sub.1 corresponding to V.sub.0, D.sub.0 and OC.sub.1. This triplet of data streams will constitute the output of the encoder 100. It may be, for example, physically multiplexed into a single data stream by using known techniques in a DVB transport stream or in any other type of data stream container adapted to simultaneously transport multiple input video streams; in such a case, this task will be carried out by a multiplexer device not shown in
(55)
(56) When the reception device 1500 (
(57) The decoded video sequence Vdec.sub.0 relating to the first view may be sent to the display to be represented three-dimensionally in accordance with the particular technique in use, which may be, for example, a stereoscopic or self-stereoscopic one. The video sequence of the decoded depth map Ddec.sub.0 (or D.sub.0) is used in order to synthesize the video sequence relating to the second view Vsyn.sub.1 by starting from the first view Vdec.sub.0 (or V.sub.0) through a block SIV executing an algorithm for synthesizing the second view by starting from a generic view of the stereoscopic pair and its depth map. Contrary to the coding side, here it is necessary to generate the synthesized images of the sequence Vsyn.sub.1 containing all the pixels that can be synthesized by the algorithm. Generally, they will be composed of a preponderant majority of synthesized pixels occupying almost the whole m×n image, while some regions of occluded pixels having unknown values will occupy the remaining part of the image, as shown by way of example in
(58) The occlusion map OM.sub.0,1 is used for restoring the positions of the occluded pixels through a suitable occlusion spatial uncompactor block, which executes the uncompaction operations as well as any operations for expanding the horizontal and/or vertical dimensions and for eliminating the buffer areas, which are inverse to the operations carried out by the spatial compactor block, in the reverse order. It operates by starting from the video sequences of the decoded compacted occlusion images OCdec.sub.1 (or OC.sub.1) and of the occlusion map OM.sub.0,1 to obtain an output video sequence Odec.sub.1 (or O.sub.1) comprising the uncompacted occlusion images. With reference, for example, to the embodiment shown in
(59) The first n.sub.A rows of m.sub.A pixels of OM.sub.0,1 contain unknown pixels constituting the positions of the pixels of the m.sub.A×n.sub.A rectangular area of the occluded region A, compacted into the homologous area of
(60) In general, the corresponding images of the video sequences of Odec.sub.1 and Vsyn.sub.1 can be added up in the matrix direction to obtain the video sequence of the decoded second view Vdec.sub.1, the images of which contain values which are valid for both the non-occluded pixels coming from Vsyn.sub.1 and the occluded pixels coming from Odec.sub.1. In this particular embodiment of the invention, this operation can be practically carried out by simply copying the values of the pixels of OCdec.sub.1, row by row, into the positions of the non-synthesized pixels of Vsyn.sub.1 indicated in the occlusion map OM.sub.0,1, in the order in which they are encountered while scanning the rows from left to right: one can thus directly obtain Vdec.sub.1 from Vsyn.sub.1 without necessarily generating the intermediate uncompacted occlusion images Odec.sub.1.
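The direct reconstruction just described (copying the compacted occlusion pixels straight into the non-synthesized positions of Vsyn.sub.1, without materializing the intermediate Odec.sub.1) can be sketched as follows. This is an illustrative sketch assuming left-compacted occlusions and a boolean occlusion map; `NEUTRAL` and the function name are assumptions.

```python
NEUTRAL = 0  # assumed value for non-occluded pixels in the compacted image

def fill_occlusions(vsyn, oc_compacted, occlusion_map):
    """Reconstruct the second view directly from the synthesized view:
    copy the pixels of the compacted occlusion image, row by row and left
    to right, into the occluded positions flagged in the occlusion map.
    Assumes the compactor stacked the occluded pixels at the left edge,
    so they occupy the leading positions of each compacted row."""
    out = [list(row) for row in vsyn]
    for i, map_row in enumerate(occlusion_map):
        packed = iter(oc_compacted[i])  # left-compacted: occlusions first
        for j, is_occluded in enumerate(map_row):
            if is_occluded:
                out[i][j] = next(packed)
    return out
```

For a synthesized row `[1, ?, 3, ?]` with occluded positions 1 and 3 and compacted row `[8, 9, 0, 0]`, the reconstruction yields `[1, 8, 3, 9]`.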
(61) In the case shown in
(62) Of course, the occlusion spatial uncompactor DSO takes into account both the compaction direction and the repositioning order used by the spatial compactor for the occluded pixels at the coding stage. This ensures the obtainment of an uncompacted occlusion sequence Odec.sub.1 which is analogous to the sequence O.sub.1 obtained by starting from the second view V.sub.1, i.e., with the occluded pixels in the positions in which they are located in the respective view. In the case of an embodiment of the compacted occlusion video sequence as shown in
(63) Another embodiment of the occlusion spatial compactor CSO allows obtaining an input video sequence of the compacted occlusions with component images of m×n/2 size of the type shown in
(64) At this point, an image equivalent to that shown in
(65) The spatial compactor may have executed, at the coding stage, the additional step of reducing the horizontal dimension of the images 700 that compose the occlusion sequence as shown in
(66) As usual, the pixels belonging to uncopied areas will take the preset value assigned to non-occluded pixels. If the compactor has additionally grouped sets of four m/2×n/2 images 900 into single m×n images, then the uncompactor will additionally have to, prior to the above-described step, decompose the four sub-images 900 contained in one m×n image of the input sequence into pairs of m×n images each comprising two sub-images of m×n/2 size 800, obtained by executing the operation inverse to that previously carried out so as to switch from the image 800 to the image 900. Each one of said pairs of m×n images will in turn produce, by reiterating the decomposing step, a pair of m×n images, each containing one image 600 (
(67) Similar considerations apply to the process for uncompacting the video sequences containing compacted occlusion images in accordance with any one of the modalities shown in
(68) Referring back to the block diagram of the decoder 1500 (
(69) In a simpler case, the synthesized view and the occlusions are simply added up in the matrix direction, so that the occlusions will occupy the positions of those pixels that the synthesis algorithm was not able to estimate. In order to improve the quality of the reconstructed image, it may be useful to adopt filtering techniques adapted to reduce the artifacts created in discontinuity areas between synthesized pixels and decoded occluded pixels when mounting the synthesized view and the occluded areas. This operation is carried out by the optional combination artifact compensation block CART, which is located downstream of the adder and may consist of suitable prior-art numerical filters. In particular, a smoothing (or low-pass) filtering technique may be adopted along the discontinuities between occluded and non-occluded areas.
(70) The decoder 1500 will thus have reconstructed the two original views V.sub.0 and V.sub.1, which may then be used by a reproduction device to display the video stream in three-dimensional mode in accordance with any stereoscopic technique.
(71) In order to correctly reconstruct the video sequence comprising the occlusion images, i.e., with the positions of the occluded pixels determined by the respective map on the coding side, the uncompactor must know the modes in which the compaction process was carried out.
(72) In particular, such modes may relate to: the number of compaction areas, which may be an integer number greater than or equal to 1; the compaction direction used in a given compaction area, which may be either from left to right or from right to left, or, more concisely, to the right or to the left; the order of scanning or positioning of the occluded pixels in a given area, which may also be either to the right or to the left; the horizontal size reduction, i.e., the parameters that allow determining which non-occluded pixels have been eliminated in the rows of the non-occluded image; the vertical size reduction, i.e., the parameters that allow determining if and how many times the operation of vertically compacting the occlusions has been executed by moving the occluded pixels into the upper compaction area and removing from the image the compaction area that contained the occluded pixels; the possible presence of buffer areas and the characteristics thereof.
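The compaction modes listed above can be grouped, for illustration, into a single parameter record that a compactor could signal to the uncompactor. This is a hypothetical sketch; the field names are illustrative and not taken from any coding standard or from the invention's signaling syntax.

```python
from dataclasses import dataclass

@dataclass
class CompactionMode:
    """Hypothetical record of the compaction signaling the uncompactor
    needs; field names are illustrative assumptions."""
    num_areas: int = 1                   # compaction areas (integer >= 1)
    directions: tuple = ("left",)        # compaction direction per area
    scan_orders: tuple = ("left",)       # occluded-pixel scan order per area
    horizontal_halvings: int = 0         # r: horizontal size halved r times
    vertical_halvings: int = 0           # vertical compaction repetitions
    buffer_width: int = 0                # buffer pixels between regions
```

A two-area mode of the kind described earlier (upper half compacted left, lower half compacted right) would, for instance, be `CompactionMode(num_areas=2, directions=("left", "right"))`.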
(73) In general, these modes may vary from one image to another within a compacted occlusion video sequence. For simplicity, let us assume that they are constant within one video sequence or one part thereof.
(74) Various scenarios are conceivable as concerns how to communicate said modes to the uncompactor. In a first scenario, a specific compaction mode may be defined once for all, which will always be applied by default to any sequence by any compactor. In such a case, the uncompactor will know such mode and will only have to carry out the corresponding uncompaction operations.
(75) If this hypothesis is not verified, then the compactor may use different compaction modes, e.g., a smaller or greater number of compaction areas, different compaction directions and orders, etc., depending on the input occlusion video sequence. In such a case, the uncompactor can determine the compaction mode in two different ways. In a first approach, the uncompactor analyzes the images of the compacted occlusions OCdec.sub.1, possibly comparing them with the corresponding ones of the sequence OM.sub.0,1, and thus determines a posteriori the compaction modes used on the coding side. This solution offers the advantage of not requiring the transmission of any compaction parameters, while however implying higher computational costs for the uncompactor, which might also produce wrong analyses, or at any rate very complex ones.
(76) In a second approach, the operating parameters can be added by the compactor or can be communicated by the same to the standard video encoder, which can then enter them by whatever means into the coded video stream OCcod.sub.1. This can be done either by using data reserved for future applications and already included in current video coding standards, or by using existing or newly defined fields included in video stream container formats, such as, for example, DVB transport stream, Matroska, etc., which comprise the VDO signal triplet.
(77) In a particularly refined embodiment, the compactor may execute multiple compaction tests in different modes, possibly taking into account the characteristics of the occlusions (e.g., number of occluded pixels, and spatial and temporal distribution thereof) present in the video sequence. For each mode thus tested, the associated bit rate is calculated by subsequently decoding and reconstructing the associated view; finally, the most effective compaction mode resulting from the tests is applied. This technique is particularly advantageous when the coding and decoding processes are delayed, when there are no particular requirements in terms of coding speed, and also for video sequences for which the priority is to reduce as much as possible the bit rate required for transmission and/or storage.
(78) The present invention was tested on a video sequence used for 3D video coding experiments. The stereoscopic sequence called book arrival was used, made available to the scientific community for experimentation and research purposes by the Fraunhofer Heinrich Hertz Institut. Video resolution was 1024×768, with a 16.67 Hz frame rate. In all tests carried out, 300 frames of two stereoscopic views were coded. For the sequences in use, depth maps were also made available, estimated by a suitable algorithm.
(79)
(80)
(81)
(82)
(83)
(84)
(85)
(86) Three tests were carried out for coding and decoding this test signal. In the first test, which was used as a reference, V, D/2 and O, i.e., the video sequences of the main view, of the depth map undersampled to 50%, and of the non-compacted occlusion image (i.e., with the values of the occluded pixels in their original positions), respectively, were coded and decoded. The second test used V, D/2 and O*, i.e., the video sequences of the main view, of the depth map undersampled to 50%, and of the compacted occlusion images, respectively. The third test involved the sequences V, D and O*/2, i.e., the video sequences of the main view, of the non-undersampled depth map, and of the compacted occlusion images reduced by a factor of 2 both horizontally and vertically, respectively. For all tests, a standard H.264 AVC encoder was used for coding all the sequences with a constant quality parameter QP. In order to obtain various coding bit rates, several experiments were carried out with different QP values. The view synthesis algorithm was specially developed in accordance with the state of the art. It receives an input video signal and the associated depth map, and estimates a video obtained from a new viewpoint horizontally displaced with respect to the original video. It uses no strategy for resolving the occlusions, such as, for example, inpainting techniques, and outputs the synthesized video and the occlusion map.
(87)
(88) The above-described embodiment example may be subject to variations without departing from the protection scope of the present invention, including all embodiments equivalent for a man skilled in the art.
(89) The present invention is also applicable to more than two views. A non-limiting example of an embodiment of a generator of three-dimensional streams with more than two views when coding is used, i.e., an extension of the stereoscopic generator diagram of
(90) The reconstruction process follows the diagram shown in
(91) Finally, the artifact compensation block CART is independently applied to the k-1 reconstructed synthesized views obtained by combining the uncompacted occlusions with the synthesized views Vsyn.sub.1, . . . , Vsyn.sub.k-1 received from the synthesis module SIV.
(92) In the particular case of a stereoscopic video, i.e., a video signal having two views, the present invention can be used in order to implement a video transmission system that can be made backward-compatible by entering the left view, the depth map and the occlusions into a single frame through the use of frame packing arrangement strategies. For example, it is possible to use the tile format to send the left view in 720p format and the depth map undersampled by a factor of 2 in the lower right corner of the tile format; the occlusion image re-organized in accordance with the present invention can be entered as a right view of the tile format. Alternatively, one may use the full-resolution depth map and enter the occlusions in the lower right corner by exploiting the size reduction of the occlusion images as shown in
(93) If an encoder is used in order to reduce the bit rate required for representing the stereoscopic content, the present invention requires that the coding of the left view and of the depth map be carried out prior to the coding of the occlusions. The latter, in fact, must be calculated on the basis of the decoded left view and depth map. This poses a technical problem when using transport formats of the frame packing type that reuse a standard coding and transmission chain operating in real time. In this case, it is not possible to construct a single image including video, depth and occlusions relating to the same time instant, unless occlusion estimation errors can be tolerated. This problem can be solved by introducing a one-image delay when creating the occlusion image. The left view and the depth thereof are coded, with a frame packing approach, at time t.sub.0. After having been decoded, such information is used on the coding side to calculate the occlusions at time t.sub.0. Such occlusion information is however sent in frame packing mode at a time t.sub.1 later than t.sub.0. This means that at time t.sub.1 a composite frame will be built on the coding side, which will contain the left view, the depth map at time t.sub.1 and the occlusion image at time t.sub.0. By following this procedure, the decoder will be able to reconstruct the stereoscopic video with a delay of one frame, which is not however a problem, since it is very short (of the order of hundredths of a second) and cannot therefore be perceived by a viewer. Furthermore, it is only a minimal part of the delay introduced by the video stream decoding operations. It must be pointed out that the times t.sub.0, t.sub.1, t.sub.2 relate to the coding order of modern compression standards, which may in general be different from the display time of the same images.
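The one-image delay described above can be sketched as a simple scheduling rule. This is an illustrative sketch, not the invention's frame packing syntax; the function name and the tuple layout of the per-time data are assumptions.

```python
def delayed_occlusion_packing(frames):
    """Build the composite frames with a one-image delay on the occlusions:
    the frame packed at time t carries the view and depth of time t
    together with the occlusion image computed for time t-1 (None for the
    very first frame, for which no earlier occlusion data exists)."""
    packed = []
    previous_occlusions = None
    for view, depth, occlusions in frames:
        packed.append((view, depth, previous_occlusions))
        previous_occlusions = occlusions
    return packed
```

Packing the sequence [(V0, D0, O0), (V1, D1, O1)] thus yields [(V0, D0, None), (V1, D1, O0)]: the composite frame at t.sub.1 carries the view and depth of t.sub.1 but the occlusions of t.sub.0, exactly as described above.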
(94) The implementation of the present invention has been described above for the case of V+D+O three-dimensional video coding, i.e., using three video sequences comprising the main view, the depth map and the occlusions. It can however be used for any type of video coding using a disparity map, such as, for example, a video coding using a view, a disparity map and an occlusion sequence.
(95) The method for obtaining the disparity map of the second view V.sub.1 with respect to the first view V.sub.0 is per se known. In this case, the View Synthesis function of the block STOC (
(96) The assumption has been made in the present description that standard video encoders and decoders should be used to ensure the utmost compatibility with the video processing, storage and transmission devices and infrastructures currently in use. However, this does not exclude the possibility of applying the invention also to video coding and decoding systems employing non-standard encoders and decoders optimized for particular types of video processing.
(97) It has been underlined several times in this description that it is better to avoid subjecting the occlusion images to size reductions that may imply losses of occluded pixels; if occluded pixels are removed during such reductions, in fact, the spatial uncompactor will not be able to recover them from the compacted occlusion images. It may occur that, for a given, and typically limited, quantity of images of the video sequence to be coded, the number of occluded pixels is such that some of them are removed by the size reduction operations carried out by the spatial compactor. This loss can often be tolerated and produces artifacts which are scarcely or not at all perceivable by the viewer, since the occluded regions typically cover very small areas. Furthermore, the optional combination artifact compensation block included in the decoder can often fill the voids left by occluded pixels removed during the coding process, by using the video information of adjacent image areas and suitable video processing techniques. It is therefore conceivable that the compactor decides to apply a certain size reduction to the occlusion sequence without verifying whether it will cause losses of occluded pixels in its compacted representation that will be transmitted to the decoder, or that it will make such verification only for a limited and predefined part thereof.
(98) The verification may be limited, for example, to the initial part of the sequence, and then the whole occlusion video sequence may be subjected to the maximum size reduction that does not cause losses of occluded pixels, without worrying about the fact that it may cause such losses in some other parts of the sequence.
(99) The present invention can advantageously be implemented through computer programs comprising coding means for implementing one or more steps of the above-described methods, when such programs are executed by computers. It is therefore understood that the protection scope extends to said computer programs as well as to computer-readable means that comprise recorded messages, said computer-readable means comprising program coding means for implementing one or more steps of the above-described methods, when said programs are executed by computers. Further embodiment variations are possible to the non-limiting examples described, without departing from the scope of the invention, comprising all the embodiments equivalent for a person skilled in the art.
(100) The elements and characteristics described in the various forms of preferred embodiments can be mutually combined without departing from the scope of the invention.
(101) The advantages deriving from the application of the present invention are apparent.
(102) The present invention makes it possible to efficiently compress a stereoscopic video in V+D+O format by using current video compression techniques. The innovative elements of the technique consist of an occlusion position representation which does not need to be explicitly coded and sent to the decoder, and a reorganization of the occluded pixels to form an image that facilitates the subsequent compression carried out by using standard techniques. The technique proposed herein, furthermore, does not depend on a particular intermediate view synthesis algorithm, and can be easily adapted to the technologies that will become available in the near future. Finally, the present invention ensures backward compatibility with 2D systems, while at the same time allowing generalization for multi-view transmission for self-stereoscopic displays.
(103) From the above description, those skilled in the art will be able to produce the object of the invention without introducing any further construction details.