Method for combining images relating to a three-dimensional content
09549163 · 2017-01-17
Assignee
Inventors
CPC classification
H04N13/10
ELECTRICITY
H04N13/161
ELECTRICITY
H04N13/183
ELECTRICITY
International classification
Abstract
A method for superimposing images on three-dimensional content, wherein a video stream is received which includes the three-dimensional content and a depth map for superimposing images on the three-dimensional content. Once the video stream has been received, images are superimposed on the three-dimensional content in a position in depth dependent on the superimposition depth map (DM). The superimposition depth map contains information about the depth of the three-dimensional content and is inserted as an image contained in a frame (C) of the video stream. The depth map has a smaller number of pixels than that of a two-dimensional image associated with the three-dimensional content. The invention also relates to devices enabling the implementation of the methods.
Claims
1. A method for superimposing images on a three-dimensional content, the method comprising: receiving a video stream that includes a composite frame comprising said three-dimensional content and a depth map for superimposing images on said three-dimensional content, said three-dimensional content including a left image and a right image, said depth map containing information about a depth of said three-dimensional content, and being positioned as an image in said composite frame of said video stream in a region of the composite frame not occupied by the right image and the left image, using said depth map in a playback phase only for the superimposition of locally generated images to said three-dimensional content, said superimposition being made in a position in depth depending on said superimposition depth map, wherein said depth map has a smaller number of pixels than a two-dimensional image associated with said three-dimensional content.
2. The method according to claim 1, wherein said superimposition depth map only contains information about the depth of pixels located in a lower half or a lower third of said three-dimensional content.
3. The method according to claim 1, wherein the superimposition depth map has a non-uniform resolution, wherein a lower half or a lower third of said depth map has a higher resolution than an upper part of said depth map.
4. The method according to claim 1, wherein said superimposition depth map has a lower resolution than a two-dimensional image associated with said three-dimensional content.
5. The method according to claim 4, wherein said three-dimensional content is an image consisting of a plurality of pixels, and wherein said depth map is obtained by undersampling a depth map whose elements correspond to the depth of the pixels of said three-dimensional content.
6. The method according to claim 5, wherein, after undersampling said depth map, the undersampled map is divided into blocks and each pixel of the block is given a same value equal to a minimum depth of the pixels of said block or to a mean value of the depth of the pixels of the block.
7. The method according to claim 5, wherein, prior to undersampling said depth map, the depth map is divided into blocks and each pixel of the block is given a same value equal to a minimum depth of the pixels of said block or to a mean value of the depth of the pixels of the block.
8. The method according to claim 6, wherein said blocks have a size equal to a multiple of an elementary block of 2×2 pixels.
9. The method according to claim 1, wherein said superimposition depth map is entered into a portion of said frame not intended for display.
10. The method according to claim 1, wherein said depth map is broken up into blocks distributed in areas of said frame (C) which are not occupied by said three-dimensional content.
11. The method according to claim 1, wherein said frame comprises a right image, a left image and said depth map, wherein said depth map is broken up into blocks distributed in regions of the frame which are not occupied by said three-dimensional content, and wherein said frame is coded according to the H.264 coding standard.
12. The method according to claim 1, wherein said three-dimensional content comprises a two-dimensional image and information which allows the other image of a stereoscopic pair to be rebuilt, and wherein said superimposition depth map is entered into a portion of the two-dimensional image.
13. The method according to claim 1, wherein said frame comprises a flag adapted to indicate to a receiver the position of said superimposition depth map within said frame.
14. The method according to claim 1, wherein said video stream comprises a flag adapted to indicate to a receiver the position of said superimposition depth map within said frame, said flag being external to said frame.
15. A device for reproducing three-dimensional content, the device comprising: means for receiving a video stream comprising a composite frame containing three-dimensional content and a depth map, means for combining a locally generated image with said three-dimensional content, wherein said means for combining a locally generated image with said three-dimensional content is adapted to implement a method according to claim 1.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Said embodiments will be described with reference to the annexed drawings.
(16) Where appropriate, similar structures, components, materials and/or elements are designated by means of similar references in different figures.
DETAILED DESCRIPTION OF THE INVENTION
(18) For the purposes of the present invention, a three-dimensional (or 3D) content is an image or a video which is perceived by the observer as having variable depth, where elements can protrude from the screen plane on which said image or video is being displayed or projected.
(19) The expression to superimpose two images refers herein to any form of combination of two images, e.g. in transparency, half-transparency or complete opacity.
(20) The present invention equally applies to any type of superimposition, whether static or dynamic, i.e. having fixed or time-variable graphic characteristics, which in turn may be either two-dimensional or three-dimensional.
(21) The depth of a three-dimensional content relates to the dimension of the three-dimensional content which enters into the screen along an axis orthogonal to the screen on which the 3D content is being displayed. For the purposes of the present description, the screen corresponds to a zero depth point, while the minimum depth point is that point of the 3D content which is perceived by the user as closest to him/herself, i.e. farthest from the screen. Accordingly, the maximum depth point is that point which is perceived by the observer as deepest into the screen, i.e. farthest from him/herself, even beyond the screen plane.
(22) In
(23) As an alternative to the example of
(24) The device 100 makes it possible to implement a method for multiplexing two images of the two sequences 102 and 103 and the depth map of the sequence 106.
(25) In order to implement the method for multiplexing the right and left images and the depth map, the device 100 comprises a disassembler module 104 for breaking up an input image (the right image in the example of
(26) One example of a multiplexing method implemented by the device 100 will now be described with reference to
(27) The method starts in step 200. Subsequently (step 201), one of the two input images (right or left) is broken up into a plurality of regions, as shown in
(28) The frame R of
(29) The disassembly of the image R is obtained by dividing it into two portions of the same size and subsequently subdividing one of these portions into two portions of the same size.
(30) The region R1 has a size of 640×720 pixels and is obtained by taking the first 640 pixels of each row. The region R2 has a size of 640×360 pixels and is obtained by taking the pixels from 641 to 1280 of the first 360 rows. The region R3 has a size of 640×360 pixels and is obtained by taking the remaining pixels of the image R, i.e. the pixels from 641 to 1280 of the last 360 rows.
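The row/column arithmetic of this disassembly can be sketched as follows. This is a minimal illustration in Python (the patent contains no code; the function name and the list-of-rows image model are illustrative only):

```python
def disassemble(R):
    """Split a 1280x720 image R (modeled as a list of 720 rows of
    1280 pixels each) into the three regions R1, R2, R3 described above."""
    assert len(R) == 720 and len(R[0]) == 1280
    R1 = [row[:640] for row in R]        # first 640 columns of every row: 640x720
    R2 = [row[640:] for row in R[:360]]  # columns 641-1280 of the first 360 rows: 640x360
    R3 = [row[640:] for row in R[360:]]  # columns 641-1280 of the last 360 rows: 640x360
    return R1, R2, R3
```

Note that the three regions tile the source image exactly: no pixel is duplicated or discarded.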
(31) In the example of
(32) Subsequently (steps 202, 203 and 205) the composite image C is constructed, which comprises the information pertaining to both the right and the left images and to the depth map received; in the example described herein, said composite image C is a frame of the output stereoscopic video stream, and therefore it is also referred to as container frame.
(33) First of all (step 202), the input image received by the device 100 and not disassembled by the device 105 (the left image L in the example of
(34) In the example of
(35) When in the following description reference is made to entering an image into a frame, or transferring or copying pixels from one frame to another, it is understood that this means to execute a procedure which generates (by using hardware and/or software means) a new frame comprising the same pixels as the source image.
(36) The (software and/or hardware) techniques for reproducing a source image (or a group of pixels of a source image) into a target image are considered to be unimportant for the purposes of the present invention and will not be discussed herein any further, in that they are per se known to those skilled in the art.
(37) In the next step 203, the image disassembled in step 201 by the module 104 is entered into the container frame. This is achieved by the module 105 by copying the pixels of the disassembled image into the container frame C in the areas thereof which were not occupied by the image L, i.e. areas being external to the area C1.
(38) In order to attain the best possible compression and reduce the generation of artifacts when decompressing the video stream, the pixels of the subimages outputted by the module 104 are copied by preserving the respective spatial relations. In other words, the regions R1, R2 and R3 are copied into respective areas of the frame C without undergoing any deformation, exclusively by means of translation and/or rotation operations.
(39) An example of the container frame C outputted by the module 105 is shown in
(40) The regions R2 and R3 are copied under the area C1, i.e. respectively in the areas C3 and C4, which respectively comprise the first 640 pixels and the following 640 pixels of the last 360 rows.
(41) As an alternative to the solution shown in
(42) The operations for entering the images L and R into the container frame do not imply any alterations to the balance between horizontal and vertical resolution.
(43) In the free pixels of the frame C, i.e. in the area C5, the module 105 enters, in the form of an image, the depth map (DM) pertaining to the stereoscopic pair L and R (step 205). Prior to step 205, the depth map DM may be undersampled, filtered or further processed by the module 107.
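Steps 202 to 205 amount to pure pixel copies into fixed areas of the container frame. The following sketch assumes, for illustration, a 1920×1080 container with C1 at the top left, C2 beside it, C3 and C4 in the last 360 rows, and C5 in the bottom-right free area; the helper names and exact offsets are not taken from the patent:

```python
def blit(dst, src, x, y):
    """Copy src (a list of rows) into dst with its top-left corner at (x, y)."""
    for j, row in enumerate(src):
        dst[y + j][x:x + len(row)] = row

def build_container(L, R1, R2, R3, DM):
    """Assemble a 1920x1080 container frame C by translations only,
    without deforming any region (steps 202, 203 and 205)."""
    C = [[0] * 1920 for _ in range(1080)]
    blit(C, L,  0,    0)     # C1: left image, entered unchanged
    blit(C, R1, 1280, 0)     # C2: region R1 beside it
    blit(C, R2, 0,    720)   # C3: first 640 pixels of the last 360 rows
    blit(C, R3, 640,  720)   # C4: the following 640 pixels
    blit(C, DM, 1280, 720)   # C5: superimposition depth map in the free area
    return C
```

Because every copy is a translation, the spatial relations among neighbouring pixels inside each region are preserved, which is what keeps compression artifacts low.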
(44) The depth map is preferably coded as a gray scale image, the information content of which can therefore be transported by the luminance signal alone, since the chrominances are null; this makes it possible to obtain an effective compression of the container frame C.
(45) As shown in the example of
(46) In a preferred embodiment, the superimposition depth map DM has a resolution of 640×360 pixels, corresponding to a 4-to-1 undersampling (or decimation) of the original depth map having a resolution of 1280×720 pixels, matching that of the images L and R. Each pixel of the undersampled map DM corresponds to a 2×2 pixel region of the original map. In particular, the 4-to-1 undersampling step can be executed by selecting one row out of two and one column out of two of the original map.
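The 4-to-1 decimation described here reduces to two stride-2 slices. A minimal sketch (illustrative function name, image modeled as a list of rows):

```python
def undersample_4_to_1(depth):
    """Keep one row out of two and one column out of two:
    a 1280x720 map becomes 640x360, each kept pixel standing
    for a 2x2 region of the original map."""
    return [row[::2] for row in depth[::2]]
```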
(47) In another embodiment, after decimation the superimposition depth map DM undergoes a processing step wherein it is divided into 16×16-pixel macroblocks, and the pixels belonging to the same macroblock are assigned a single depth value. Preferably, this value equals the minimum depth within the macroblock, since this is the most significant value for properly positioning the overlays.
(48) Alternatively, this value is equal to the mean depth value within the macroblock.
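The macroblock flattening of the two preceding paragraphs can be sketched as follows (illustrative Python, pure lists; the patent does not prescribe an implementation):

```python
def quantize_blocks(dm, block=16, mode="min"):
    """Give every pixel of each block x block macroblock a single depth
    value: the minimum over the block (best for positioning overlays,
    since it is the point closest to the viewer) or the mean."""
    h, w = len(dm), len(dm[0])
    out = [row[:] for row in dm]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            vals = [dm[y][x]
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            v = min(vals) if mode == "min" else sum(vals) // len(vals)
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    out[y][x] = v
    return out
```

Making all pixels of a macroblock equal removes the high spatial frequencies inside the block, which is what yields the compression benefit noted below.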
(49) The choice of 16×16-pixel macroblocks is particularly advantageous when the compression standard in use is H.264, because such macroblocks coincide with those employed in the H.264 standard. With this solution, in fact, compression generates fewer artifacts and requires a lower bit rate.
(50) The subdivision into blocks of 8×8 or 4×4 pixels can also be considered advantageous in that, due to the particular characteristics of the H.264 compression algorithm, compression benefits are obtained if the pixels within these blocks are all equal.
(51) Alternatively, but giving up the subdivision into blocks or macroblocks within which the pixels are all equal, the 640×360 depth map may be filtered with a two-dimensional low-pass filter. Compression advantages are obtained in this case as well, because the highest spatial frequencies are eliminated or reduced.
(52) Alternatively, the depth map may have a resolution of 160×90 pixels, resulting from a 64-to-1 undersampling, wherein each pixel of the depth map DM corresponds to an 8×8 region of the original map.
(53) In a further embodiment, the superimposition depth map DM entered into the container frame C may have an uneven resolution; in particular, the lower half or third of the superimposition depth map has a higher resolution than the upper part. This solution turns out to be particularly advantageous as concerns the positioning of subtitles or other information such as the audio volume, which are generally placed in the lower part of the image. The receiver can thus use more accurate information about the depth of the pixels in a region of interest, e.g. the lower third of the 3D image, and can therefore position the images (text or graphics) correctly in that region. At the very least, the superimposition depth map may even only contain information about the depth of the pixels (all or only a portion thereof) located in a region of interest, in particular in the lower half or in the lower third of the three-dimensional content.
(54) In another embodiment, a region of the container frame which is not occupied by the right or left images, by portions thereof or by the superimposition depth map is intended for receiving a flag which is necessary for reconstructing the right and left images at demultiplexer level. For example, said flag may relate to how the composite image has been created. Preferably, the flag may contain information useful for properly using the depth map.
(55) The pixels of this flag region are, for example, colored in two colors (e.g. black and white) so as to create a bar code of any kind, e.g. linear or two-dimensional, which carries the flag information.
(56) Once the transfer of both images and of the superimposition depth map received (and possibly also of the flag) into the container frame has been completed, the method implemented by the device 100 ends, and the container frame can be compressed and transmitted on a communication channel and/or recorded onto a suitable medium (e.g. CD, DVD, Blu-ray, mass memory, etc.).
(57) Since the multiplexing operations explained above do not alter the spatial relations among the pixels of one region or image, the video stream outputted by the device 100 can be compressed to a considerable extent while preserving a good likelihood that the images will be reconstructed very faithfully to the transmitted ones, without significant artifacts.
(58) Before describing any further embodiments, it must be pointed out that, in the preferred embodiment, the division of the frame R into three regions R1, R2 and R3 corresponds to the division of the frame into the smallest possible number of regions, taking into account the space available in the composite image and the space occupied by the left image entered unchanged into the container frame.
(59) Said smallest number is, in other words, the minimum number of regions necessary to occupy the space left available in the container frame C by the left image.
(60) In general, therefore, the minimum number of regions into which the image must be disassembled is defined as a function of the format of the source images (right and left images) and of the target composite image (container frame C).
(61) Preferably, the image to be entered into the frame is disassembled by taking into account the need for breaking up the image (e.g. R in the above example) into the smallest number of rectangular regions.
(62) In a further embodiment, the right image R is disassembled as shown in
(63) The region R1 corresponds to the region R1 of
(64) The region R2 comprises the 320 columns of pixels adjacent to the region R1, whereas the region R3 comprises the last 320 columns of pixels.
(65) The container frame C can thus be constructed as shown in
(66) The regions R2 and R3 thus rotated occupy 720 pixels of 320 rows; therefore, the areas C3 and C4 are separated from the areas C1 and C2 that contain the pixels copied from the image L and from the region R1.
(67) Preferably, the areas C3 and C4 are separated from the other areas C1 and C2 by at least one safeguard line. In particular, it is advantageous and preferable to copy the pixels of the regions R2 and R3 into the last rows of the container frame C.
(68) Since in this case the container frame is made up of 1080 rows, in the embodiment of
(69) In the example of
(70) As an alternative to positioning R2 and R3 into the last rows of the container frame C (as described with reference to
(71) Finally, in the area C5 in the bottom right corner of the frame C, the superimposition depth map (DM) is entered with a resolution of 160×90 pixels, obtained by undersampling the original depth map as previously described. In general, the superimposition depth map may have any resolution, as long as it is contained within a free space of the frame C. For better exploiting the available space, the superimposition depth map may undergo a rotation and/or disassembly step prior to being entered into the frame C.
(72) In a further embodiment, which is described herein with reference to
(73) The region R1 corresponds to the region R1 of
(74) The segment R1 is thus a region having a size of 640×720 pixels and occupying the first columns of the frame R to be disassembled.
(75) The segment R3 occupies the last columns of the frame R to be disassembled, and borders on the central region R2. R3 includes, on the left side (the one bordering on R2), a buffer strip Ra3 containing pixels in common with the region R2. In other words, the last columns of R2 and the first ones of R3 (which constitute the buffer strip Ra3) coincide.
(76) Preferably, the size of the buffer strip Ra3 is chosen as a function of the type of compression to be subsequently applied to the container frame C, and in general to the video stream containing it. In particular, said strip has a size which is twice that of the elementary processing unit used in the compression process. For example, the H.264 standard provides for disassembling the image into macroblocks of 16×16 pixels, each of which represents this standard's elementary processing unit. Based on this assumption, the strip Ra3 has a width of 32 pixels. The segment R3 therefore has a size of 352 (320+32)×720 pixels and comprises the pixels of the last 352 columns of the image R.
(77) The segment R2 occupies the central part of the image R to be disassembled and includes, on its left side, a buffer strip Ra2 having the same size as the strip Ra3. In the example taking into account the H.264 compression standard, the strip Ra2 is thus 32 pixels wide and comprises pixels in common with the region R1. The segment R2 therefore has a size of 352×720 pixels and comprises the pixels of the columns from 609 to 960 of the frame R, i.e. it starts 32 columns before the end of the region R1.
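The column arithmetic of this buffer-strip disassembly can be sketched as follows (illustrative Python with 0-based slices, so columns 609-960 become indices 608-959; the function name is not from the patent):

```python
def disassemble_with_strips(R, strip=32):
    """Split a 1280x720 frame into three vertical segments; R2 and R3
    each carry a `strip`-pixel-wide buffer shared with the segment to
    their left (strip = twice the 16-pixel H.264 macroblock side)."""
    assert len(R[0]) == 1280
    R1 = [row[:640] for row in R]             # 640x720, columns 1-640
    R2 = [row[640 - strip:960] for row in R]  # 352x720, columns 609-960 (Ra2 = 609-640)
    R3 = [row[960 - strip:] for row in R]     # 352x720, columns 929-1280 (Ra3 = 929-960)
    return R1, R2, R3
```

The overlap means the decoder can later discard the strip columns, where compression artifacts are most likely, without losing any content.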
(78) The three subimages pertaining to the regions R1, R2 and R3 outputted by the module 104 (visible in
(79) In this embodiment as well, the superimposition depth map (DM) is entered into the area C5 in the bottom right corner of the frame C.
(80) The frame C thus obtained is subsequently compressed and transmitted or saved to a storage medium (e.g. a DVD). For this purpose, compression means are provided which are adapted to compress an image or a video signal, along with means for recording and/or transmitting the compressed image or video signal.
(82) The same remarks made for the receiver 1100 are also applicable to a reader (e.g. a DVD reader) which reads a container frame (possibly compressed) and processes it in order to obtain one pair of frames corresponding to the right and left images entered into the container frame read by the reader.
(83) Referring back to
(84) These frames C are then supplied to a reconstruction module 1103, which executes an image reconstruction and depth map extraction method as described below with reference to
(85) It is apparent that, if the video stream is not compressed, the decompression module 1102 may be omitted and the video signal may be supplied directly to the reconstruction module 1103.
(86) The reconstruction process starts in step 1300, when the decompressed container frame C is received. The reconstruction module 1103 extracts (step 1301) the left image L by copying the first 1280×720 pixels of the decompressed frame into a new frame which is smaller than the container frame, e.g. a frame of a 720p stream. The image L thus reconstructed is outputted to the receiver 1100 (step 1302).
(87) Subsequently, the method provides for extracting the right image R from the container frame C.
(88) The step of extracting the right image begins by copying (step 1303) a portion of the area R1 included in the frame C. In more detail, the pixels of the first 624 (640-16) columns of R1 are copied into the corresponding first 624 columns of the new frame representing the reconstructed image Rout, as shown in
(89) Then a central portion of R2 is extracted (step 1304). From the decompressed frame C (which, as aforesaid, corresponds to the frame C of
(90) By cutting the 16 outermost columns of the region R2, those columns are eliminated where formation of artifacts is most likely to occur. The width of the cut area (in this case 16 columns) depends on the type of compression used. Said area is preferably equal to the elementary processing unit used by the compression process; in the case described herein, the H.264 standard operates upon blocks of 16×16 pixels, and therefore 16 columns are to be cut.
(91) As regards R3 (step 1305), the pixels of the region C4 are extracted from the frame C and the subimage R3 is brought back to the original row/column format (see
(92) Of course, for both regions R2 and R3 the rotation step may be carried out in a virtual manner, i.e. the same result in terms of extraction of the pixels of interest may be obtained by copying the pixels of a row of the area C3 (C4 in the case of R3) into a column of the new frame Rout, except for the last 16 rows of the area C3 (or C4), which correspond to the sixteen columns to be cut, shown in
(93) At this point, the right image Rout has been fully reconstructed and can be outputted (step 1306).
(94) Finally, the reconstruction module 1103 extracts (step 1308) the superimposition depth map DM by copying into a register the luminance values of the last 160×90 pixels of the decompressed container frame C, corresponding to the area C5. The content of said register is outputted to the receiver 1100 (step 1309) and will be used for defining the position in depth of images (text or graphics) to be combined with the three-dimensional content transported by the stereoscopic video stream; in particular, it will be used for combining images to be superimposed on the three-dimensional content.
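For the simplest layout described earlier (L entered unchanged, R1/R2/R3 translated without rotation, 640×360 depth map in the free bottom-right area of an assumed 1920×1080 container), the whole reconstruction is the inverse set of pixel copies. A sketch under those assumptions (offsets and names are illustrative, not from the patent):

```python
def reconstruct(C):
    """Recover L, R and the superimposition depth map from a 1920x1080
    container frame laid out with L top-left, R1 top-right, R2/R3 in
    the last 360 rows, and the 640x360 depth map bottom-right."""
    L = [row[:1280] for row in C[:720]]
    # R1 supplies the first 640 columns of R; R2/R3 fill the rest
    R = [C[y][1280:1920] + [0] * 640 for y in range(720)]
    for j in range(360):
        R[j][640:1280] = C[720 + j][:640]            # R2 -> top-right of R
        R[360 + j][640:1280] = C[720 + j][640:1280]  # R3 -> bottom-right of R
    DM = [row[1280:1920] for row in C[720:1080]]     # depth map from area C5
    return L, R, DM
```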
(95) As an alternative or in addition to outputting the content of the depth map and the images L and R extracted from the input frames, the receiver 1100 comprises a character generator and/or a graphic generator and combines other images with the images L and R, i.e. with the three-dimensional content. The images to be combined are selected from a memory area of the receiver and may be stored when manufacturing the receiver (e.g. the graphics of some menus or of the channel numbers) or may be extracted from the video stream (e.g. program guide information and subtitles).
(96) These images are combined with the three-dimensional content in positions in depth that depend on the superimposition depth map extracted from the video stream. In particular, for each stereoscopic image (produced by the pair of images L and R), the combined image is placed at the point of minimum depth of the stereoscopic image. After the images have been combined with the 3D content, in this embodiment the receiver 1100 outputs a pair of images L* and R* which, when reproduced, will be perceived by the user as a three-dimensional content corresponding to the original one (produced by the images L and R) with images superimposed thereon, e.g. subtitles, menus, graphics, etc.
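The positioning rule above reduces to a minimum search over the superimposition depth map. A minimal illustrative sketch (the function name is not from the patent):

```python
def min_overlay_depth(dm):
    """Minimum depth over the map: an overlay placed at (or in front
    of) this depth lies in front of every point of the 3D content,
    so the content never pierces it."""
    return min(min(row) for row in dm)
```

With a map restricted to a region of interest (e.g. the lower third, as discussed earlier), the same search yields a more accurate placement for subtitles.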
(97) The process for reconstructing the right and left images and the depth map contained in the container frame C is thus completed (step 1307). Said process is repeated for each frame of the video stream received by the receiver 1100, so that the output will consist of two video streams 1104 and 1105 for the right image and for the left image, respectively, and one data signal deduced from the superimposition depth map.
(98) The process for reconstructing the right and left images and the superimposition depth map described above with reference to
(99) Of course, this is possible if the multiplexing method is standardized.
(100) In order to take into account the fact that the container frame may be generated according to any one of the above-described methods, or anyway according to any one of the methods that utilise the solution which is the subject of the appended claims, the demultiplexer uses the flag information contained in a predefined region of the composite image (e.g. a bar code, as previously described) in order to know how the contents of the composite image must be unpacked and how to reconstruct the right and left images and the superimposition depth map.
(101) After decoding the flag, the demultiplexer will know the position of the unchanged image (e.g. the left image in the above-described examples), as well as the positions and any transformations (rotation, translation or the like) of the regions into which the other image was disassembled (e.g. the right image in the above-described examples) and the position of the superimposition depth map.
(102) With this information, the demultiplexer can thus extract the unchanged image (e.g. the left image) and the depth map and reconstruct the disassembled image (e.g. the right image).
(103) Although the present invention has been illustrated so far with reference to some preferred and advantageous embodiments, it is clear that it is not limited to said embodiments and that many changes may be made thereto by a person skilled in the art wanting to combine into a composite image two images relating to two different perspectives (right and left) of an object or a scene.
(104) For example, the electronic modules that provide the above-described devices, in particular the device 100 and the receiver 1100, may be variously subdivided and distributed; furthermore, they may be provided in the form of hardware modules or as software algorithms implemented by a processor, in particular a video processor equipped with suitable memory areas for temporarily storing the input frames received. These modules may therefore execute in parallel or in series one or more of the video processing steps of the image multiplexing and demultiplexing methods according to the present invention.
(105) It is also apparent that, although the preferred embodiments refer to multiplexing two 720p video streams into one 1080p video stream, other formats may be used as well, such as, for example, two 640×480 video streams into one 1280×720 video stream, or two 320×200 video streams into one 640×480 video stream.
(106) Nor is the invention limited to a particular type of arrangement of the composite image, since different solutions for generating the composite image may offer specific advantages.
(107) For example, the embodiments described above with reference to
(108) Alternatively, it is conceivable that the images are also subjected to specular inversion steps, in addition to said rotation and/or translation operations, in order to obtain a composite image of the type shown in
(109) These additional operations are carried out for the purpose of maximizing the boundary perimeters between regions containing homologous pixels, thereby exploiting the strong correlation existing among them and minimizing the artifacts introduced by the subsequent compression step. In the example of
(110) In this figure, the left image L (shown in
(111) Instead, the right image R is disassembled according to the example of
(112) Subsequently, some regions (the regions R1 and R3 in the example of
(113) In the case of inversion relative to a vertical axis, the pixels of the column N (where N is an integer between 1 and 1080, 1080 being the number of columns of the image) are copied into the column 1080+1-N.
(114) In the case of inversion relative to a horizontal axis, the pixels of the row M (where M is an integer between 1 and 720, 720 being the number of rows of the image) are copied into the row 720+1-M.
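Both specular inversions are simple index reversals. A minimal sketch (illustrative names; the image is modeled as a list of rows):

```python
def invert_about_vertical_axis(img):
    """Column N -> width + 1 - N: reverse every row."""
    return [row[::-1] for row in img]

def invert_about_horizontal_axis(img):
    """Row M -> height + 1 - M: reverse the order of the rows."""
    return img[::-1]
```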
(116) The inverted region R1inv is entered into the first 640 pixels of the first 720 rows. As can be seen in the example of
(118) The container frame C is then completed by entering the region R2.
(119) In this example R2 is not inverted and/or rotated because it would not be possible, in either case, to match a boundary region of R2 with a boundary region made up of homologous pixels of another region of R or L.
(120) Finally, it is also apparent that the invention also relates to any demultiplexing method which allows a right image and a left image to be extracted from a composite image by reversing one of the above-described multiplexing processes falling within the protection scope of the present invention.
(121) The invention therefore also relates to a method for generating a pair of images starting from a composite image, which comprises the steps of: generating a first one (e.g. the left image) of said right and left images by copying one single group of contiguous pixels from a region of said composite image, generating a second image (e.g. the right image) by copying other groups of contiguous pixels from different regions of said composite image.
(122) According to one embodiment, the information for generating said second image is extracted from an area of said composite image. Said information is preferably encoded according to a bar code.
(123) In one embodiment of the method for generating the right and left images, the generation of the image which was disassembled in the composite image comprises at least one step of specular inversion of a group of pixels of one of said different regions. In one embodiment of the method for generating the right and left images, the generation of the image which was disassembled in the composite image comprises at least one step of removing pixels from one of the regions of the composite image that comprise the pixels of this image to be reconstructed. In particular, the pixels are removed from a boundary area of this region.
(124) In one embodiment, the image which was disassembled into different regions of the composite image is reconstructed by subjecting the pixel regions that include the pixels of the image to be disassembled to translation and/or rotation operations only.
(125) Although the above-described embodiment example refers to entering a superimposition depth map into a container frame in which either one of the two right and left images is disassembled into several parts, it is clear that the invention is not dependent on the manner in which the two right and left images are formatted within the container frame. For example, the two images may be undersampled and arranged side by side (side-by-side format) or one on top of the other (top-bottom format) in order to leave a free space in the frame wherein the superimposition depth map can be placed. Also, either one of the right and left images may be left unchanged, whereas the other one may be undersampled in order to free up space for the depth map.
(126) Finally, it must be remarked that the embodiment examples described above with reference to the annexed drawings relate to a whole depth map, i.e. a depth map computed by decimating or filtering a depth map of the 3D content without however subdividing it into several parts, unlike one of the two images L and R, for example. Nevertheless, this is not a limitation of the present invention, and the superimposition depth map, once generated (or received), may be entered into the container frame by an encoder, which will break it up into multiple parts that will be arranged in different regions of the container frame. For example, as known, in order to code a stereoscopic content, an H.264 encoder has to enter eight additional rows which will be cut by the decoder; in one embodiment, the superimposition depth map can be entered into these eight additional rows by dividing it, for example, into 240 blocks of 8×8 pixels, which when appropriately reassembled will form an image having dimensions proportional to the transported stereoscopic content. One example of block arrangement may be obtained by scanning the rows of a depth map decimated by 16, therefore with a 120×72 resolution, wherein strips of 120×8 pixels are lined up in order to obtain a 1080×8-pixel image. In another embodiment, the same decimated depth map may be subdivided into a greater number of strips 8 pixels high by using a 6-pixel offset instead of an 8-pixel one, so that the content becomes redundant and protection of the content at the boundary with the main image is improved. This appears to be particularly advantageous whenever the stereoscopic content includes a pair of right and left images multiplexed into a top-bottom, side-by-side or checkerboard format, with such a resolution as to occupy all the potentially displayable pixels in the frame, e.g. the pixels of a 1920×1080 format.
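The strip rearrangement just described (nine 120×8 strips lined up into a 1080×8 band, so the 120×72 map fits the eight extra rows) can be sketched as follows; the function name and offsets are illustrative:

```python
def pack_into_strips(dm, strip_h=8):
    """Re-arrange a 120x72 depth map (depth decimated by 16) into a
    1080x8 band by lining up nine 120x8 strips side by side, so that
    it fits the eight extra rows added by an H.264 encoder."""
    h = len(dm)
    band = [[] for _ in range(strip_h)]
    for top in range(0, h, strip_h):   # take one 120x8 strip at a time
        for r in range(strip_h):
            band[r].extend(dm[top + r])
    return band                        # 8 rows of 1080 pixels
```

The decoder reverses the scan to recover the 120×72 map before using it for overlay positioning.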
(127) Preferably, in the event that the frame includes a pair of asymmetrically decimated images (e.g. a side-by-side format wherein the columns are decimated more than the rows, or a top-bottom format wherein only the rows are decimated, not the columns), then the superimposition depth map is obtained by decimating a depth map with a row/column decimation ratio proportional to the one used for sampling the images placed in the same frame. By way of example, assuming that a side-by-side format is used for multiplexing the right and left images in the frame, the row/column decimation ratio will be 1:2, since all rows are kept and the columns are decimated by two. In this case, the superimposition depth map can be obtained by decimating a depth map with a 1:2 row/column decimation ratio.
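The proportional-decimation rule can be expressed as a single parameterized slice (illustrative sketch; the name and signature are not from the patent):

```python
def decimate(depth, row_step, col_step):
    """Decimate a depth map with a row/column ratio matching the one
    used for the images in the frame, e.g. (1, 2) for a side-by-side
    format in which all rows are kept and the columns are halved."""
    return [row[::col_step] for row in depth[::row_step]]
```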
(128) It is also clear that methods other than those described above, which provide for entering a flag into the image, may be used for signaling to the receiver the area occupied by the depth map; in fact, such a flag may also be included in a data packet of the signal carrying the video stream.