Method and device for coding the geometry of a point cloud

Abstract

The present principles relate to a method and device for encoding depth values of orthogonally projected points of a point cloud onto a projection plane. The present principles also relate to a method and device for decoding a point cloud, a computer readable program and a video signal.

Claims

1. A method of encoding points of a point cloud, comprising: obtaining a first encoded depth image by encoding a first depth image in a bitstream, said first depth image comprising at least first depth values of orthogonally projected points of the point cloud onto a projection plane; determining and encoding, in the bitstream, a depth coding mode per image region of a second depth image, the second depth image comprising second depth values of the orthogonally projected points of the point cloud onto the projection plane, said depth coding mode indicating if the second depth values in an image region of the second depth image are also encoded in the bitstream or if the second depth values in the image region of the second depth image are to be interpolated from the first depth values of the first encoded depth image when decoding the point cloud; and responsive to a determination that at least one depth coding mode indicates that the second depth values in an image region of the second depth image are encoded in the bitstream, encoding for the image region, said second depth values in the bitstream.

2. The method of claim 1, wherein determining if the second depth values in an image region of the second depth image are encoded in the bitstream comprises: obtaining a decoded first depth image by decoding the first encoded depth image and a decoded second depth image by encoding and decoding the second depth image; calculating a first rate-distortion cost using a first distance and a first bitrate, said first distance being calculated between the first depth values located in a co-located image region of said decoded first depth image and the second depth values located in a co-located image region of the decoded second depth image, said first bitrate being calculated for encoding said second depth image; and calculating a second rate-distortion cost using a second distance and a second bitrate for encoding said second depth image being considered as being null, said second distance being calculated between the first depth values of the co-located image region of said decoded first depth image and interpolated second depth values obtained by interpolating the second depth values from first depth values in said decoded first depth image; wherein, if the second rate-distortion cost is lower than the first rate-distortion cost then the depth coding mode for said image region indicates that the second depth values in the co-located image region of the second depth image are not encoded in the bitstream, otherwise, the depth coding mode for said image region indicates that the second depth values in the co-located image region of the second depth image are encoded in the bitstream.

3. The method of claim 2, wherein the first and the second distance is computed between at least a part of a reconstructed point cloud and the corresponding part of the point cloud, said at least part of the point cloud being reconstructed from the decoded first depth image and the second depth image.

4. The method of claim 3, wherein said at least part of the point cloud is reconstructed from the second depth values in said image region and from second depth values in at least one previously considered image region.

5. The method of claim 1, wherein determining if the second depth values in an image region of the second depth image are encoded in the bitstream comprises: calculating interpolated second depth values for said image region of the second depth image by interpolating the second depth values from first depth values in said first depth image; and calculating a distance between the second depth values located in said image region of the second depth image and interpolated second depth values obtained by interpolating the second depth values from first depth values of a decoded first depth image obtained by decoding the first encoded depth image; wherein, if the distance is lower than a threshold, then the depth coding mode for said image region indicates that the second depth values in said image region of the second depth image are not encoded in the bitstream, otherwise, the depth coding mode for said image region indicates that the second depth values in said image region of the second depth image are encoded in the bitstream.

6. The method of claim 1, wherein if the depth coding mode for said image region indicates that the second depth values in said image region of the second depth image are not encoded in the bitstream, the second depth values of pixels in said image region of the second depth image are replaced by a constant value before encoding, at least partially, the second depth image.

7. The method of claim 1, wherein the depth coding mode is encoded as a metadata associated with a reconstruction of the point cloud whose geometry is represented by said first and second depth images.

8. A method of reconstructing points of a point cloud, comprising: obtaining a decoded first depth image by decoding a bitstream, said decoded first depth image comprising first depth values of orthogonally projected points of the point cloud onto a projection plane; obtaining from the bitstream, a depth coding mode associated with an image region of a second depth image representing second depth values of the orthogonally projected points of the point cloud onto the projection plane, the depth coding mode indicating whether the second depth values are encoded in the bitstream or not; and if the depth coding mode indicates that the second depth values are encoded in the bitstream, decoding said second depth values from the bitstream; otherwise, determining the second depth values by interpolating said second depth values from first depth values in the decoded first depth image.

9. The method of claim 8, wherein the whole second depth image is decoded from the bitstream when at least one depth coding mode indicates that the second depth values in an image region of the second depth image are decoded from the bitstream.

10. The method of claim 8, wherein a size and shape of an image region of the second depth image are a size and shape of said second depth image.

11. The method of claim 8, wherein said image region of the second depth image is a block of said second depth image or a projected depth patch of said second depth image.

12. A device for encoding points of a point cloud, comprising at least one processor configured to: obtain a first encoded depth image by encoding a first depth image in a bitstream, said first depth image comprising at least first depth values of orthogonally projected points of the point cloud onto a projection plane; determine and encode, in the bitstream, a depth coding mode per image region of a second depth image, the second depth image comprising second depth values of the orthogonally projected points of the point cloud onto the projection plane, said depth coding mode indicating if the second depth values in an image region of the second depth image are also encoded in the bitstream or if the second depth values in the image region of the second depth image are to be interpolated from the first depth values of the first encoded depth image when decoding the point cloud; and responsive to a determination that at least one depth coding mode indicates that the second depth values in an image region of the second depth image are encoded in the bitstream, encode said second depth values in the bitstream.

13. The device of claim 12, wherein if the depth coding mode for said image region indicates that the second depth values in said image region of the second depth image are not encoded in the bitstream, the second depth values of pixels in said image region of the second depth image are replaced by a constant value before encoding, at least partially, the second depth image.

14. The device of claim 12, wherein the depth coding mode is encoded as a metadata associated with a reconstruction of the point cloud whose geometry is represented by said first and second depth images.

15. A device for reconstructing a point cloud, comprising at least one processor configured to: obtain a decoded first depth image by decoding a bitstream, said decoded first depth image comprising first depth values of orthogonally projected points of the point cloud onto a projection plane; obtain from the bitstream, a depth coding mode associated with an image region of a second depth image representing the orthogonally projected points of the point cloud onto the projection plane, the depth coding mode indicating whether the second depth values are encoded in the bitstream or not; and if the depth coding mode indicates that the second depth values are encoded in the bitstream, decode said second depth values from the bitstream; otherwise, determine the second depth values by interpolating said second depth values from first depth values in the decoded first depth image.

16. The device of claim 15, wherein the whole second depth image is decoded from the bitstream when at least one depth coding mode indicates that the second depth values in an image region of the second depth image are decoded from the bitstream.

17. The device of claim 15, wherein a size and shape of an image region of the second depth image are a size and shape of said second depth image.

18. The device of claim 15, wherein said image region of the second depth image is a block of said second depth image or a projected depth patch of said second depth image.

19. A non-transitory computer-readable medium including instructions for causing one or more processors to perform: encoding points of a point cloud, by: obtaining a first encoded depth image by encoding a first depth image in a bitstream, said first depth image comprising at least first depth values of orthogonally projected points of the point cloud onto a projection plane; determining and encoding, in the bitstream, a depth coding mode per image region of a second depth image, the second depth image comprising second depth values of the orthogonally projected points of the point cloud onto the projection plane, said depth coding mode indicating if the second depth values in an image region of the second depth image are also encoded in the bitstream or if the second depth values in the image region of the second depth image are to be interpolated from the first depth values of the first encoded depth image when decoding the point cloud; and responsive to a determination that at least one depth coding mode indicates that the second depth values in an image region of the second depth image are encoded in the bitstream, encoding for the image region said second depth values in the bitstream.

20. A non-transitory computer-readable medium including instructions for causing one or more processors to perform: reconstructing points of a point cloud, by: obtaining a decoded first depth image by decoding a bitstream, said decoded first depth image comprising first depth values of orthogonally projected points of the point cloud onto a projection plane; obtaining, from the bitstream, a depth coding mode associated with an image region of a second depth image representing second depth values of the orthogonally projected points of the point cloud onto the projection plane, the depth coding mode indicating whether the second depth values are encoded in the bitstream or not; and if the depth coding mode indicates that the second depth values are encoded in the bitstream, decoding said second depth values from the bitstream; otherwise, determining the second depth values by interpolating said second depth values from first depth values in the decoded first depth image.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) In the drawings, examples of the present principles are illustrated. The drawings show:

(2) FIG. 1 shows schematically a diagram of the steps of the method for encoding the geometry of a point cloud represented by first and second depth images in accordance with an example of the present principles;

(3) FIG. 2 shows schematically a diagram of the step 120 of the method of FIG. 1 in accordance with an embodiment of the present principles;

(4) FIG. 3 shows schematically a diagram of the step 120 of the method of FIG. 1 in accordance with an embodiment of the present principles;

(5) FIG. 4 shows schematically a diagram of the steps of the method for decoding the geometry of a point cloud from first and second depth images representing different depth values of orthogonally projected points of an original point cloud in accordance with an example of the present principles;

(6) FIG. 5 shows schematically the method for encoding the geometry and texture of a point cloud as defined in prior art (TMC2);

(7) FIG. 6 shows schematically an example of use of the methods 100 and 200 in the encoding method of FIG. 5;

(8) FIG. 7 shows schematically the method for decoding the geometry and texture of a point cloud as defined in prior art (TMC2);

(9) FIG. 8 shows schematically an example of use of the method 200 in the decoding method of FIG. 7;

(10) FIG. 9 shows an example of an architecture of a device in accordance with an example of present principles;

(11) FIG. 10 shows two remote devices communicating over a communication network in accordance with an example of present principles; and

(12) FIG. 11 shows the syntax of a signal in accordance with an example of present principles.

(13) Similar or same elements are referenced with the same reference numbers.

DESCRIPTION OF EXAMPLE OF THE PRESENT PRINCIPLES

(14) The present principles will be described more fully hereinafter with reference to the accompanying figures, in which examples of the present principles are shown. The present principles may, however, be embodied in many alternate forms and should not be construed as limited to the examples set forth herein. Accordingly, while the present principles are susceptible to various modifications and alternative forms, specific examples thereof are shown by way of examples in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the present principles to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present principles as defined by the claims.

(15) The terminology used herein is for the purpose of describing particular examples only and is not intended to be limiting of the present principles. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising,” “includes” and/or “including” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Moreover, when an element is referred to as being “responsive” or “connected” to another element, it can be directly responsive or connected to the other element, or intervening elements may be present. In contrast, when an element is referred to as being “directly responsive” or “directly connected” to another element, there are no intervening elements present. As used herein the term “and/or” includes any and all combinations of one or more of the associated listed items and may be abbreviated as “/”.

(16) It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element without departing from the teachings of the present principles.

(17) Although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.

(18) Some examples are described with regard to block diagrams and operational flowcharts in which each block represents a circuit element, module, or portion of code which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in other implementations, the function(s) noted in the blocks may occur out of the order noted. For example, two blocks shown in succession may, in fact, be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending on the functionality involved.

(19) Reference herein to “in accordance with an example” or “in an example” means that a particular feature, structure, or characteristic described in connection with the example can be included in at least one implementation of the present principles. The appearances of the phrase “in accordance with an example” or “in an example” in various places in the specification are not necessarily all referring to the same example, nor are separate or alternative examples necessarily mutually exclusive of other examples.

(20) Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.

(21) While not explicitly described, the present examples and variants may be employed in any combination or sub-combination.

(22) The present principles are described for encoding/decoding the geometry of a point cloud from two depth images but extend to the encoding/decoding of a sequence of point clouds (temporally dynamic point cloud) because the geometry of the sequence of point clouds is encoded/decoded by/from two sequences (video) of depth images, the two depth images associated with a point cloud being encoded independently of the two depth images of another point cloud of the sequence.

(23) As explained above, a point cloud is orthogonally projected onto a projection plane and two depth images D0 and D1 are obtained from the depth values associated with said projected 3D points. D0 is the first depth image that represents the depth values of the nearest points of the point cloud and D1 is the second depth image that represents the depth values of the farthest points of the point cloud. The first depth image D0 is encoded using, for example, a legacy image/video encoder.
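By way of non-limiting illustration, the following Python sketch shows how two depth images may be obtained from projected points. It assumes, for simplicity, that the projection plane is the XY plane, that the projection direction is the Z axis, that point coordinates are non-negative integers, and that unoccupied pixels are marked with None; the function name project_depths is hypothetical and is not part of the present principles.

```python
from typing import List, Optional, Sequence, Tuple

Depth = Optional[int]
DepthImage = List[List[Depth]]


def project_depths(points: Sequence[Tuple[int, int, int]],
                   width: int, height: int) -> Tuple[DepthImage, DepthImage]:
    """Orthogonally project (x, y, z) points along Z (ASSUMPTION).

    d0 keeps the nearest depth per pixel, d1 the farthest depth per pixel.
    Pixels onto which no point projects stay None.
    """
    d0: DepthImage = [[None] * width for _ in range(height)]
    d1: DepthImage = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if d0[y][x] is None or z < d0[y][x]:
            d0[y][x] = z  # nearest point seen so far for this pixel
        if d1[y][x] is None or z > d1[y][x]:
            d1[y][x] = z  # farthest point seen so far for this pixel
    return d0, d1
```

When a single point projects onto a pixel, both images hold the same depth value for that pixel.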

(24) In the following, the term “image region” designates a set of pixels of an image. These pixels may or may not be adjacent pixels but all of them share at least one common property.

(25) For example, an image itself may be considered as being an image region. An image may also be split into multiple blocks and a block is then an image region.

(26) An image region may also have a non-rectangular shape. This is the case, for example, when pixels of an image which have a same (or similar) extracted feature are associated to form an image region.

(27) Examples of features extracted from an image include a color, a texture, a normal vector, etc.

(28) FIG. 1 shows schematically a diagram of the steps of the method 100 for encoding the geometry of a point cloud represented by a first (D0) and a second (D1) depth image in accordance with an example of the present principles.

(29) In step 110, the first depth image D0 is encoded in a bitstream B.

(30) In step 120, a module determines a depth coding mode DCM.sub.i per image region, said depth coding mode indicating if the depth values of pixels in an image region of the second depth image D1 are also encoded in the bitstream B. The depth coding mode indicating that these depth values are encoded in the bitstream B is denoted the “explicit” mode in the following.

(31) In step 130, a module encodes said depth coding mode DCM.sub.i in the bitstream B.

(32) In step 140, if at least one depth coding mode DCM.sub.i indicates that the depth values of pixels in an image region of the second depth image D1 are encoded in the bitstream B (“explicit” mode), a module encodes at least partially the second depth image D1 in the bitstream B.

(33) The steps 130 and 140 are repeated until each of the I image regions has been considered.
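By way of non-limiting illustration, the control flow of steps 110 to 140 may be sketched in Python as follows. The "bitstream" is modelled here as a plain dictionary and the actual image/video encoding is replaced by placeholder strings; the names encode_geometry and decide_mode are hypothetical, and the mode decision of step 120 is supplied by the caller.

```python
from typing import Callable, Dict, List

EXPLICIT = "explicit"
IMPLICIT = "implicit"


def encode_geometry(num_regions: int,
                    decide_mode: Callable[[int], str]) -> Dict[str, object]:
    """Toy model of method 100; no real codec is involved (ASSUMPTION)."""
    bitstream: Dict[str, object] = {"D0": "encoded D0"}  # step 110
    modes: List[str] = []
    for i in range(num_regions):
        modes.append(decide_mode(i))                     # step 120
    bitstream["DCM"] = modes                             # step 130
    if EXPLICIT in modes:                                # step 140
        # At least one region is explicit: D1 is encoded, at least partially.
        bitstream["D1"] = "encoded D1 (at least partially)"
    return bitstream
```

If every region is in the “implicit” mode, nothing is written for the second depth image D1.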

(34) According to the present principles, an additional depth coding mode is encoded in a bitstream to indicate if an image region of the second depth image D1 is explicitly (or implicitly) encoded in the bitstream. When a depth coding mode associated with an image region of the second depth image D1 indicates that the depth values of pixels in that image region are not encoded in the bitstream (“implicit” mode), the bit rate is decreased compared to the effective transmission of coded data representative of said depth values as disclosed in prior art. Thus, transmitting such a depth coding mode per image region increases the coding efficiency of the depth images representing the geometry of a point cloud.

(35) According to an embodiment, the size and shape of an image region of the second depth image are the size and shape of said second depth image, i.e. the image region is the image itself.

(36) A single depth coding mode is then transmitted to indicate if the whole second depth image is (or not) encoded in the bitstream.

(37) According to step 140, the whole second depth image D1 is encoded in the bitstream B when at least one depth coding mode DCM.sub.i is set to “explicit” mode.

(38) According to another embodiment, a depth coding mode is assigned to each image region of the second depth image.

(39) Said image region may have a rectangular shape, e.g. a block of the image, or a non-rectangular shape such as projected depth patches in TMC2.

(40) These embodiments improve the coding efficiency by adapting locally the depth coding mode to the characteristics of the image content.

(41) According to an embodiment of step 120, as illustrated in FIG. 2, determining if the depth values of pixels in an image region of the second depth image D1 are encoded in the bitstream comprises the following steps.

(42) A module obtains a decoded first depth image by decoding the first encoded depth image, and a decoded second depth image by encoding and decoding the second depth image D1.

(43) A current image region of said decoded first depth image is considered. The depth values of pixels in said current image region form a first set of depth values. A first distance Dist.sub.0 (a quality metric) is calculated between the depth values of pixels in said current image region and the depth values of co-located pixels in the decoded second depth image, i.e. the depth values of pixels in a co-located image region of the decoded second depth image, which form a second set of depth values. A first bitrate RA.sub.0 (a data rate) for encoding said second depth image D1 is also calculated.

(44) A first rate-distortion cost Cost.sub.0 is then calculated by taking into account said first distance Dist.sub.0 and said first bitrate RA.sub.0.

(45) A module calculates interpolated depth values for the pixels of the co-located image region of the decoded second depth image by interpolating depth values of pixels in said decoded first depth image. These values form the set of interpolated depth values.

(46) A second distance Dist.sub.1 (a quality metric) is calculated between the depth values of pixels in said current image region and the interpolated depth values.

(47) A second rate-distortion cost Cost.sub.1 is then calculated by taking into account said second distance Dist.sub.1, the data rate being considered here as being null because the second depth image is not encoded (transmitted).

(48) If the second rate-distortion cost Cost.sub.1 is lower than the first rate-distortion cost Cost.sub.0 then the depth coding mode DCM.sub.i for the current image region i is set to “implicit”, i.e. indicates that the depth values in the current image region of the second depth image D1 are not encoded in the bitstream. Otherwise, the depth coding mode DCM.sub.i for the current image region i is set to “explicit”, i.e. indicates that the depth values in the current image region of the second depth image D1 are encoded in the bitstream.

(49) The steps of this embodiment are repeated until each of the I image regions has been considered.

(50) This embodiment of step 120 provides the best rate-distortion tradeoff for determining whether (or not) the depth values of pixels in an image region of a second depth image are encoded in a bitstream.
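By way of non-limiting illustration, the comparison of the two rate-distortion costs may be sketched in Python as follows. The text only states that each cost takes a distance and a bitrate into account; combining them as Dist + lambda × RA with a Lagrangian weight lam is an assumption of this sketch, as are the sum of squared differences used as the distance and the hypothetical names choose_mode_rd, reference, decoded_d1, interpolated and rate_bits.

```python
from typing import Sequence


def ssd(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two co-located sets of values."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def choose_mode_rd(reference: Sequence[float],
                   decoded_d1: Sequence[float],
                   interpolated: Sequence[float],
                   rate_bits: float,
                   lam: float = 1.0) -> str:
    """Return "implicit" or "explicit" for one image region.

    reference    : values the distances are measured against
    decoded_d1   : co-located values of the encoded/decoded second image
    interpolated : values interpolated from the decoded first image
    rate_bits    : bitrate RA_0 spent when the region is encoded
    lam          : ASSUMED Lagrangian weight between distance and rate
    """
    cost_0 = ssd(reference, decoded_d1) + lam * rate_bits  # explicit
    cost_1 = ssd(reference, interpolated)                  # rate is null
    return "implicit" if cost_1 < cost_0 else "explicit"
```

With equal costs the sketch falls back to the “explicit” mode, since the “implicit” mode is selected only when the second cost is strictly lower.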

(51) According to an alternative embodiment of step 120, as illustrated in FIG. 3, determining if the depth values of pixels in an image region of the second depth image D1 are encoded in the bitstream comprises the following steps.

(52) A module calculates interpolated depth values for the pixels of the co-located image region of the second depth image D1 by interpolating depth values of pixels in said first depth image D0. These values form the set of interpolated depth values.

(53) A distance DIST is then calculated between the depth values in a current image region i of the second depth image D1, denoted R.sub.i.sup.1, and said interpolated depth values.

(54) If the distance DIST is lower than a threshold TH, then the depth coding mode DCM.sub.i for the current image region i is set to “implicit”, i.e. indicates that the depth values in the current image region of the second depth image D1 are not encoded in the bitstream. Otherwise, the depth coding mode DCM.sub.i for the current image region i is set to “explicit”, i.e. indicates that the depth values in the current image region of the second depth image D1 are encoded in the bitstream.

(55) The steps of this embodiment are repeated until each of the I image regions has been considered.

(56) This alternative embodiment of step 120 provides a sub-optimal rate-distortion trade-off because the metric is calculated without encoding/decoding process but decreases the complexity of the selecting process compared to the complexity of the above optimal embodiment of FIG. 2.
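By way of non-limiting illustration, the threshold-based decision may be sketched in Python as follows. The distance is taken here as a sum of squared differences, and the names choose_mode_threshold, d1_region, interpolated and threshold are hypothetical; no encoding/decoding is performed, in line with the reduced complexity of this embodiment.

```python
from typing import Sequence


def choose_mode_threshold(d1_region: Sequence[float],
                          interpolated: Sequence[float],
                          threshold: float) -> str:
    """Return "implicit" when DIST < TH, otherwise "explicit"."""
    if len(d1_region) != len(interpolated):
        raise ValueError("both sets must hold co-located values")
    # DIST between the original D1 values and the values interpolated from D0.
    dist = sum((a - b) ** 2 for a, b in zip(d1_region, interpolated))
    return "implicit" if dist < threshold else "explicit"
```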

(57) According to an embodiment, a distance DIST between two sets of ordered depth values A and B is a distance defined by:

(58) $DIST = \sum_{j=1}^{J} (A_{j} - B_{j})^{2}$

(59) where A.sub.j, respectively B.sub.j, designates the j.sup.th depth value of the ordered set A, respectively B, of J depth values.

(60) Ordering a set of values means that the depth values A.sub.j and B.sub.j represent different depth values of co-located pixels in two distinct depth images.

(61) A distance DIST is not limited to this embodiment and may extend to any other well-known metric for computing a distance between two sets of J values, such as, for example, the sum of absolute differences, an average/maximum/minimum of the differences, etc.
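By way of non-limiting illustration, the sum of squared differences defined above and two of the alternative metrics mentioned may be sketched in Python as follows. The function names are hypothetical, and the maximum is taken here over absolute differences, which is one possible reading of "maximum of the differences".

```python
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    # Both ordered sets must hold the same number J of co-located values.
    if len(a) != len(b):
        raise ValueError("A and B must both contain J values")


def dist_ssd(a: Sequence[float], b: Sequence[float]) -> float:
    """DIST = sum over j of (A_j - B_j)^2."""
    _check(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dist_sad(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences."""
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def dist_max(a: Sequence[float], b: Sequence[float]) -> float:
    """Maximum absolute difference (0 for empty sets)."""
    _check(a, b)
    return max((abs(x - y) for x, y in zip(a, b)), default=0)
```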

(62) According to an embodiment, a distance DIST is computed between at least a part of a reconstructed point cloud and the corresponding part of the original point cloud.

(63) As an example, the distance DIST is defined by ISO/IEC JTC1/SC29/WG11 MPEG2017/N16763, Hobart, April 2017, Annex B.

(64) Said at least part of the point cloud is reconstructed from the decoded first depth image and a second depth image.

(65) According to an embodiment, said at least part of the point cloud is reconstructed from depth values of pixels in an image region.

(66) According to an embodiment, said at least part of the point cloud is reconstructed from depth values of pixels in a current image region and from depth values of pixels in at least one previously considered image region.

(67) For example, according to this embodiment, a “temporary” second depth image is initialized with a constant value. Then, the depth values of pixels of said temporary second depth image are iteratively replaced either by depth values of the encoded/decoded second depth image, when a current image region is encoded explicitly (“explicit” mode), or by padding with the depth value of the nearest neighboring point previously encoded according to the “explicit” mode.

(68) Thus, the reconstructed point cloud that depends on the encoding of depth values of pixels in previously considered image regions becomes similar to the reconstructed point cloud.

(69) Note that in this embodiment, the “temporary” depth image is not encoded in the bitstream. The second depth image is still encoded according to the method of FIG. 1.

(70) According to an embodiment of step 140, if the depth coding mode DCM.sub.i associated with an image region is set to “implicit”, the depth values of pixels in said image region of the second depth image are replaced by a constant value before encoding, at least partially, the second depth image D1.
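By way of non-limiting illustration, replacing the depth values of “implicit” image regions by a constant value before encoding may be sketched in Python as follows. The sketch assumes rectangular image regions given as (x, y, width, height) tuples and a constant of 0; the text does not specify the constant, and the name blank_implicit_regions is hypothetical.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height), ASSUMED layout


def blank_implicit_regions(d1: Sequence[Sequence[int]],
                           regions: Sequence[Region],
                           modes: Sequence[str],
                           fill: int = 0) -> List[List[int]]:
    """Return a copy of d1 where pixels of implicit regions equal `fill`."""
    out = [list(row) for row in d1]  # leave the input image untouched
    for (x0, y0, w, h), mode in zip(regions, modes):
        if mode != "implicit":
            continue
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                out[y][x] = fill
    return out
```

Flat regions of constant value are typically cheap for an image/video encoder to code, which is presumably the motivation for this embodiment.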

(71) According to an embodiment, the depth coding mode DCMi is encoded as a metadata associated with the reconstruction of the point cloud whose geometry is represented by said first and second depth images.

(72) Said metadata may, for example, be associated with each image, be common to the two images, or be associated with each image region, and are used for reconstructing the geometry of a point cloud both at the encoding and decoding side as further explained in relation with FIGS. 5 and 6.

(73) According to an embodiment, the depth coding mode DCMi is encoded as a syntax element of an SEI message, for example, attached to a NAL unit associated with the first depth image D0.

Example of DCM in an SEI Message in HEVC

(74)

Syntax                              Descriptor
dcm_info( payloadSize ) {
    dcm_mode                        u(8)
}

dcm_mode contains an identifying number that is used to identify the depth coding mode. For example, dcm_mode equal to 0 indicates the “explicit” mode and dcm_mode equal to 1 indicates the “implicit” mode.
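By way of non-limiting illustration, writing and reading the single u(8) field of the dcm_info payload may be sketched in Python as follows. Only the payload body is modelled; the SEI payload type and the encapsulation into a NAL unit are not specified in the text and are deliberately left out. The function names write_dcm_info and read_dcm_info are hypothetical, and the 0/1 assignment follows the example values given above.

```python
# Example code assignment: 0 = "explicit", 1 = "implicit".
DCM_CODES = {"explicit": 0, "implicit": 1}
DCM_NAMES = {code: name for name, code in DCM_CODES.items()}


def write_dcm_info(mode: str) -> bytes:
    """Serialize dcm_mode as one unsigned 8-bit value, u(8)."""
    return bytes([DCM_CODES[mode]])


def read_dcm_info(payload: bytes) -> str:
    """Parse dcm_mode from the first byte of the payload."""
    if len(payload) < 1:
        raise ValueError("dcm_info payload is empty")
    code = payload[0]
    if code not in DCM_NAMES:
        raise ValueError(f"unknown dcm_mode value {code}")
    return DCM_NAMES[code]
```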

(75) According to a variant, the depth coding mode could also be carried in an SPS or PPS message.

(76) According to another embodiment, the depth coding mode DCMi is encoded as a watermark embedded in the depth images.

(77) As a variant, the depth coding mode DCMi is embedded as a visible watermark in an empty area of the first depth image D0.

(78) For example, a block of N×N pixels in a pre-defined corner of the first depth image D0 is used: all the pixels of such a block are set to a same binary value, for example, 0 (respectively 1) to indicate that the depth coding mode DCMi is set to “explicit” (respectively “implicit”).

(79) At the decoder, an average value of the block is then calculated and if said average is closer to 0 than to a maximum value (all the pixel values equal to 1) then the decoded block indicates that the “explicit” mode is used, otherwise, it indicates the “implicit” mode is used.
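By way of non-limiting illustration, embedding the depth coding mode as a visible watermark and recovering it at the decoder may be sketched in Python as follows. The sketch assumes that the pre-defined corner is the top-left corner and that the block does not overlap useful depth data; the names embed_mode and read_mode are hypothetical.

```python
from typing import List, Sequence


def embed_mode(image: Sequence[Sequence[float]], mode: str, n: int,
               max_value: float = 1) -> List[List[float]]:
    """Set the top-left n x n block (ASSUMED corner) to 0 or max_value."""
    out = [list(row) for row in image]
    value = 0 if mode == "explicit" else max_value
    for y in range(n):
        for x in range(n):
            out[y][x] = value
    return out


def read_mode(image: Sequence[Sequence[float]], n: int,
              max_value: float = 1) -> str:
    """Average the block; closer to 0 means explicit, otherwise implicit."""
    average = sum(image[y][x] for y in range(n) for x in range(n)) / (n * n)
    return "explicit" if average < max_value / 2 else "implicit"
```

Averaging over the block makes the decision tolerant to moderate coding noise on individual pixels of the decoded first depth image.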

(80) According to another embodiment, the depth coding mode DCMi would be added to the binary information of a metadata associated to the geometry of the point cloud represented by the first and the second depth images, such as the occupancy map as defined in TMC2.

(81) This embodiment is better suited to specify the depth coding mode DCM.sub.i at finer resolution than per image.

(82) The following describes in more detail how this is implemented in TMC2. The top-level syntax of the current version of TMC2 is shown in Table 1 and Table 2. Table 3 provides the syntax of the encapsulation of the geometry (depth) and texture (color) streams. Table 4 and Table 5 describe the detailed syntax for the occupancy map and block to patch index decoding. Table 6 describes the syntax for the arithmetic coding of elementary values.

(83) TABLE 1 Bitstream header

Magic Number                        ReadUint32
Version                             ReadUint32
Total size                          ReadUint64
GroupOfFrames × N                   ReadGroupOfFrames
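By way of non-limiting illustration, reading the three fixed-size fields of the bitstream header of Table 1 may be sketched in Python as follows. The byte order is not stated in the text; little-endian is assumed here, and the names BitstreamHeader and read_header are hypothetical. The groups of frames that follow the header are not parsed.

```python
import struct
from typing import NamedTuple


class BitstreamHeader(NamedTuple):
    magic_number: int  # ReadUint32
    version: int       # ReadUint32
    total_size: int    # ReadUint64


# ASSUMPTION: little-endian fields packed without padding.
HEADER_FORMAT = "<IIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 4 + 8 bytes


def read_header(buf: bytes) -> BitstreamHeader:
    """Parse Magic Number, Version and Total size from the start of buf."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer too short for the bitstream header")
    return BitstreamHeader(*struct.unpack_from(HEADER_FORMAT, buf, 0))
```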

(84) TABLE 2: GroupOfFrames header

    Size                          ReadUint8
    Width                         ReadUint16
    Height                        ReadUint16
    Occupancy resolution          ReadUint8
    radius2Smoothing              ReadUint8
    neighborCountSmoothing        ReadUint8
    radius2BoundaryDetection      ReadUint8
    thresholdSmoothing            ReadUint8
    losslessGeo                   ReadUint8
    losslessTexture               ReadUint8
    noAttributes                  ReadUint8
    Geometric video bitstream     ReadVideo( )
    Occupancy maps × M            ReadOccupancyMap( )
    Texture video bitstream       ReadVideo( )

(85) TABLE 3: ReadVideo( ) function

    Size of the video bit stream  ReadUint32
    Read video bitstream          ReadUint8 × size

(86) TABLE 4: ReadOccupancyMap( ) function

    Patch count                   ReadUint32
    Occupancy precision           ReadUint8
    Max candidate count           ReadUint8
    Bit Count U0                  ReadUint8
    Bit Count V0                  ReadUint8
    Bit Count U1                  ReadUint8
    Bit Count V1                  ReadUint8
    Bit Count D1                  ReadUint8
    Arithmetic bitstream size     ReadUint32
    Arithmetic bitstream          ReadArithmetic( )
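As an illustration of how the fixed-size fields of Table 4 could be read, the following sketch unpacks them from a byte buffer. The byte order (little-endian), the absence of padding and all Python names are assumptions made for the example; the tables above do not specify them.

```python
import struct
from typing import NamedTuple


class OccupancyMapHeader(NamedTuple):
    patch_count: int
    occupancy_precision: int
    max_candidate_count: int
    bit_count_u0: int
    bit_count_v0: int
    bit_count_u1: int
    bit_count_v1: int
    bit_count_d1: int
    arithmetic_size: int


# Assumption: little-endian, no padding ("<"); I = uint32, B = uint8.
_HEADER = struct.Struct("<I7BI")


def read_occupancy_map_header(buf, offset=0):
    """Return the Table 4 fields and the raw arithmetic-coded payload."""
    header = OccupancyMapHeader(*_HEADER.unpack_from(buf, offset))
    start = offset + _HEADER.size
    payload = bytes(buf[start:start + header.arithmetic_size])
    return header, payload
```

The payload returned here would then be passed to the arithmetic decoder of Table 5, which is not sketched.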

(87) TABLE 5: ReadArithmetic( ) function

    For all patches
        U0                            DecodeUInt32(bitCountU0)
        V0                            DecodeUInt32(bitCountV0)
        U1                            DecodeUInt32(bitCountU1)
        V1                            DecodeUInt32(bitCountV1)
        D1                            DecodeUInt32(bitCountD1)
        deltaSizeU0                   DecodeExpGolomb( )
        deltaSizeV0                   DecodeExpGolomb( )
    // Block to patch index decoding
    For all blocks
        If number of candidate patches > 1
            Candidate index           Decode
            If Candidate index == maxCandidateCount
                block to patch index  DecodeUInt32(bitCountPatch)
            Else
                Block to patch index = Candidate index
    // Occupancy map decoding
    For all blocks
        If Block to patch index > 0
            isFull                    decode
            If not Full
                bestTraversalOrderIndex   decode
                runCountMinusTwo          decode
                Occupancy                 decode
                for (size_t r = 0; r < runCountMinusOne; ++r) {
                    runLength             decode
                    for (size_t j = 0; j <= runLength; ++j)
                        Block[ traversalOrder[ i++ ] ] = occupancy;
                    occupancy = !occupancy;
                }
                For all remaining blocks
                    Block[ traversalOrder[ i++ ] ] = occupancy;

(88) The current syntax encodes the per-block metadata in two steps: first coding the block to patch index for all blocks of the patch image, then coding the occupancy map for those blocks belonging to a patch.

(89) The block to patch index defines the index of the patch associated to each block of the texture and depth images, the blocks forming a regular, square grid. The size of the block is given by the “Occupancy resolution” parameter in the header of the group of frames, and typically set to 16 pixels.

(90) The occupancy map, which indicates what pixels from the texture and depth images represent the point cloud to be reconstructed, is also coded per block. In this case, the blocks form a grid within each “Occupancy resolution” block, the grid being of size “Occupancy precision” and typically set to 4 pixels.
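The two grid levels can be illustrated with a short calculation. This is a sketch under assumptions: rounding up at image borders and raster-scan block numbering are choices made for the example, and the function names are hypothetical.

```python
def block_grid(width, height, occupancy_resolution=16, occupancy_precision=4):
    """Return (blocks_x, blocks_y, cells_per_block) for a patch image."""
    # Number of "Occupancy resolution" blocks, rounding up at image borders.
    blocks_x = -(-width // occupancy_resolution)
    blocks_y = -(-height // occupancy_resolution)
    # Each block is split into precision-sized cells for the occupancy map.
    cells_per_side = occupancy_resolution // occupancy_precision
    return blocks_x, blocks_y, cells_per_side * cells_per_side


def block_index(u, v, width, occupancy_resolution=16):
    """Index of the block containing pixel (u, v), in raster-scan order."""
    blocks_x = -(-width // occupancy_resolution)
    return (v // occupancy_resolution) * blocks_x + (u // occupancy_resolution)
```

With the typical values of 16 and 4 pixels, each block carries one block to patch index and 16 occupancy cells.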

Example of DCM Mode Encoded as Metadata

Example of DCM in the Occupancy Map (Per Image (Frame))—Modification of Table 5

(91) TABLE 6: ReadArithmetic( ) function

    For all patches
        U0                            DecodeUInt32(bitCountU0)
        V0                            DecodeUInt32(bitCountV0)
        U1                            DecodeUInt32(bitCountU1)
        V1                            DecodeUInt32(bitCountV1)
        D1                            DecodeUInt32(bitCountD1)
        deltaSizeU0                   DecodeExpGolomb( )
        deltaSizeV0                   DecodeExpGolomb( )
    dcm_mode                          ReadUint8
    // Block to patch index decoding
    For all blocks
        If number of candidate patches > 1
            Candidate index           Decode
            If Candidate index == maxCandidateCount
                block to patch index  DecodeUInt32(bitCountPatch)
            Else
                Block to patch index = Candidate index
    // Occupancy map decoding
    For all blocks
        If Block to patch index > 0
            isFull                    decode
            If not Full
                bestTraversalOrderIndex   decode
                runCountMinusTwo          decode
                Occupancy                 decode
                for (size_t r = 0; r < runCountMinusOne; ++r) {
                    runLength             decode
                    for (size_t j = 0; j <= runLength; ++j)
                        Block[ traversalOrder[ i++ ] ] = occupancy;
                    occupancy = !occupancy;
                }
                For all remaining blocks
                    Block[ traversalOrder[ i++ ] ] = occupancy;

(92) According to an embodiment, the depth coding modes DCMi associated with image regions are binary values of a sequence of binary values, where each binary value indicates the depth coding mode DCMi for one image region. For example, ‘0’ indicates the “implicit” mode and ‘1’ indicates the “explicit” mode.

(93) According to an embodiment, entropy or run-length coding methods may be used to encode the sequence of binary values.
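As an illustration of the run-length option, the sketch below codes a sequence of per-region binary modes as a first value followed by run lengths. The representation and the function names are assumptions made for the example; the text does not prescribe a particular run-length format.

```python
def rle_encode(bits):
    """Return (first_bit, run_lengths) for a non-empty list of 0/1 values."""
    runs = []
    current, length = bits[0], 0
    for bit in bits:
        if bit == current:
            length += 1
        else:
            # The value changed: close the current run and start a new one.
            runs.append(length)
            current, length = bit, 1
    runs.append(length)
    return bits[0], runs


def rle_decode(first_bit, runs):
    """Rebuild the binary sequence from its first value and run lengths."""
    bits, value = [], first_bit
    for length in runs:
        bits.extend([value] * length)
        value = 1 - value  # runs alternate between 0 and 1
    return bits
```

This representation is compact when neighboring image regions tend to share the same depth coding mode, since long runs are then coded as a single length.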

(94) FIG. 4 shows schematically a diagram of the steps of the method 200 for decoding the geometry of a point cloud from a first (D0) and a second (D1) depth image representing different depth values of orthogonally projected points of an original point cloud in accordance with an example of the present principles.

(95) In step 210, a decoded first depth image is obtained by decoding a bitstream B.

(96) In step 220, a depth coding mode DCMi associated with a current image region i of a decoded second depth image is decoded from the bitstream B.

(97) In step 230, if the depth coding mode DCMi indicates that the depth values of pixels in said current image region of the decoded second depth image D1 are encoded in the bitstream B (“explicit” mode), a module decodes at least partially the second depth image D1 from the bitstream B.

(98) Otherwise, in step 240, a module calculates interpolated depth values for the pixels of the image region of the decoded second depth image by interpolating depth values of pixels in the decoded first depth image.

(99) The steps 220-240 are repeated until each of the I image regions has been considered.
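The per-region loop of steps 220 to 240 can be sketched as follows. This is an illustration only: image regions are modelled as plain lists of depth values, and the function and parameter names are hypothetical. The interpolation itself is passed in as a function, since several variants are possible.

```python
def decode_second_depth_image(d0_decoded, modes, explicit_regions, interpolate):
    """Rebuild the regions of the second depth image one by one.

    modes[i] is "explicit" or "implicit" for region i (step 220).
    explicit_regions maps a region index to its decoded D1 values (step 230).
    interpolate(d0_region) derives D1 values from D0 values (step 240).
    """
    d1_decoded = []
    for i, mode in enumerate(modes):
        if mode == "explicit":
            d1_decoded.append(explicit_regions[i])
        else:
            d1_decoded.append(interpolate(d0_decoded[i]))
    return d1_decoded
```

For instance, with two regions where only the first is coded explicitly, the second region is filled entirely from the decoded first depth image.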

(100) The geometry of the point cloud is then reconstructed by deprojecting the decoded first and second depth images as defined, for example, in TMC2.

(101) According to an embodiment of the method, calculating interpolated depth values for the pixels of an image region of a second depth image by interpolating depth values of pixels in a first depth image comprises: determining a co-located pixel in the first depth image for each current pixel of said image region of the second depth image; determining at least one neighboring pixel of said co-located pixel in the first depth image; and calculating an interpolated depth value for each current pixel taking into account said at least one neighboring pixel in the first depth image.

(102) According to an embodiment, the spatial distance between the co-located pixel in the first depth image and said at least one neighboring pixel is below a given threshold.

(103) According to an embodiment, the interpolated depth value of a current pixel in an image region of the second depth image is the depth value of the closest neighboring pixel among said at least one neighboring pixel in the first depth image. According to an embodiment, the interpolated depth value of a current pixel in an image region of the second depth image is the maximum depth value of said at least one neighboring pixel in the first depth image.

(104) According to an embodiment, the interpolated depth value of a current pixel in an image region of the second depth image is the minimum depth value of said at least one neighboring pixel in the first depth image.

(105) According to an embodiment, the interpolated depth value of a current pixel in an image region of the second depth image is the average of the depth values of said at least one neighboring pixel in the first depth image.
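The four interpolation variants above (closest, maximum, minimum, average) can be sketched in one function. Several points are assumptions made for the example: the image is a list of rows, None marks an unoccupied pixel, the neighborhood is a square window of a given radius, and the co-located pixel itself counts as a neighbor when it is occupied.

```python
def interpolate_depth(d0, u, v, radius=1, mode="max"):
    """Interpolate a D1 value at pixel (u, v) from neighbors in D0.

    Neighbors are the occupied pixels within `radius` (square window) of the
    co-located pixel. Returns None when no neighbor is occupied.
    """
    height, width = len(d0), len(d0[0])
    neighbors = []  # (squared distance to the co-located pixel, depth)
    for y in range(max(0, v - radius), min(height, v + radius + 1)):
        for x in range(max(0, u - radius), min(width, u + radius + 1)):
            if d0[y][x] is not None:
                neighbors.append(((x - u) ** 2 + (y - v) ** 2, d0[y][x]))
    if not neighbors:
        return None
    depths = [depth for _, depth in neighbors]
    if mode == "closest":
        # Ties in distance are broken by the smaller depth value.
        return min(neighbors)[1]
    if mode == "max":
        return max(depths)
    if mode == "min":
        return min(depths)
    if mode == "average":
        return sum(depths) / len(depths)
    raise ValueError(f"unknown mode: {mode}")
```

The radius plays the role of the distance threshold on neighboring pixels; a square window is used here only for simplicity.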

(106) FIG. 5 shows schematically the method for encoding the geometry and texture of a point cloud as defined in TMC2.

(107) Basically, the encoder captures the geometry information of an original point cloud PC in a first (D0) and a second (D1) depth image.

(108) As an example, the first and second depth images are obtained as follows in TMC2.

(109) Depth patches (sets of 3D points of the point cloud PC) are obtained by clustering the points of the point cloud PC according to the normal vectors at these points. All the extracted depth patches are then projected onto a 2D grid and packed while trying to minimize the unused space and guaranteeing that every T×T (e.g., 16×16) block of the grid is associated with a unique patch, where T is a user-defined parameter that is signalled in the bitstream.

(110) Depth images are then generated by exploiting the 3D to 2D mapping computed during the packing process, more specifically the packing position and size of the projected area of each patch. More precisely, let H(u,v) be the set of points of the current patch that get projected to the same pixel (u, v). A first layer, also called the nearest layer or the first depth image D0, stores the point of H(u,v) with the smallest depth value. The second layer, referred to as the farthest layer or the second depth image D1, captures the point of H(u,v) with the highest depth value within the interval [D, D+Δ], where D is a depth value of pixels in the first depth image D0 and Δ is a user-defined parameter that describes the surface thickness.
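The construction of the two layers from H(u,v) can be sketched as follows. This is a simplified illustration: the points of a patch are assumed to be already projected and given as (u, v, depth) triples, the layers are stored as dictionaries keyed by pixel, and the function name is hypothetical.

```python
def build_layers(points, surface_thickness):
    """Build the nearest (D0) and farthest (D1) layers of one patch.

    points: iterable of (u, v, depth) triples.
    D0 holds the smallest depth of H(u, v); D1 holds the largest depth of
    H(u, v) lying in [D0, D0 + surface_thickness].
    """
    h = {}
    for u, v, depth in points:
        h.setdefault((u, v), []).append(depth)
    d0, d1 = {}, {}
    for pixel, depths in h.items():
        nearest = min(depths)
        d0[pixel] = nearest
        d1[pixel] = max(d for d in depths if d <= nearest + surface_thickness)
    return d0, d1
```

When a pixel receives a single point, or when no other point falls within the surface thickness, D1 equals D0 at that pixel.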

(111) The packing process then outputs a first depth image D0. A padding process is also used to fill the empty space between patches in order to generate a piecewise smooth first depth image suited for video compression.

(112) The generated depth images/layers D0 and D1 are then stored as video frames and compressed using any legacy video codec such as HEVC.

(113) The encoder also captures the texture information of the original point cloud PC in two texture images by encoding/decoding the first and second depth images and reconstructing the geometry of the point cloud by deprojecting said decoded first and second depth images. Once the geometry is reconstructed, a color is assigned (color transferring) to each point of the reconstructed point cloud from the color information of the original point cloud PC in a manner that minimizes the color information coding error.

(114) According to one embodiment, for each reconstructed point, the color of its nearest point in the original point cloud is assigned as its color to be coded.
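The nearest-point color assignment can be sketched with a brute-force search. This is an illustration only: points are tuples of coordinates, the squared Euclidean distance is an assumed choice of metric, and a practical implementation would typically use a spatial index instead of an exhaustive search.

```python
def transfer_colors(reconstructed, original_points, original_colors):
    """Give each reconstructed point the color of its nearest original point."""
    colors = []
    for p in reconstructed:
        # Index of the original point with the smallest squared distance to p.
        best = min(
            range(len(original_points)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(p, original_points[k])),
        )
        colors.append(original_colors[best])
    return colors
```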

(115) A first and a second texture image T0, T1 are then generated by storing the color information to be coded of each reconstructed point in the same position as in the depth images, i.e. (i,u,v).

(116) FIG. 6 shows schematically an example of use of the methods 100 and 200 in the encoding method of FIG. 5.

(117) According to this example, the encoding of the first depth image and the encoding of the second depth image of FIG. 5 are replaced by the encoding method 100 of FIG. 1, and the decoding of the second depth image of FIG. 5 is replaced by the decoding method 200 of FIG. 4.

(118) FIG. 7 shows schematically the method for decoding the geometry and texture of a point cloud as defined in prior art (TMC2).

(119) A decoded first depth image and a decoded second depth image are obtained by decoding the bitstream B. Possibly metadata are also decoded to reconstruct the geometry of the point cloud.

(120) The geometry of the point cloud is thus reconstructed by deprojecting said decoded first and second depth images and possibly using said metadata.

(121) FIG. 8 shows schematically an example of use of the method 200 in the decoding method of FIG. 7.

(122) According to this example the decoding of the first and second depth images of FIG. 7 is replaced by the decoding method of FIG. 4.

(123) In FIG. 1-8, the modules are functional units, which may or may not correspond to distinguishable physical units. For example, these modules or some of them may be brought together in a single component or circuit, or contribute to functionalities of a piece of software. Conversely, some modules may be composed of separate physical entities. The apparatus compatible with the present principles are implemented using either pure hardware, for example dedicated hardware such as an ASIC («Application Specific Integrated Circuit»), an FPGA («Field-Programmable Gate Array») or a VLSI («Very Large Scale Integration») circuit, or several integrated electronic components embedded in a device, or a blend of hardware and software components.

(124) FIG. 9 represents an exemplary architecture of a device 90 which may be configured to implement a method described in relation with FIG. 1-8.

(125) Device 90 comprises the following elements that are linked together by a data and address bus 91: a microprocessor 92 (or CPU), which is, for example, a DSP (or Digital Signal Processor); a ROM (or Read Only Memory) 93; a RAM (or Random Access Memory) 94; an I/O interface 95 for reception of data to transmit, from an application; and a battery 96.

(126) In accordance with an example, the battery 96 is external to the device. In each of the mentioned memories, the word «register» used in the specification can correspond to an area of small capacity (some bits) or to a very large area (e.g. a whole program or a large amount of received or decoded data). The ROM 93 comprises at least a program and parameters. The ROM 93 may store algorithms and instructions to perform techniques in accordance with the present principles. When switched on, the CPU 92 loads the program into the RAM and executes the corresponding instructions.

(127) RAM 94 comprises, in a register, the program executed by the CPU 92 and uploaded after switch on of the device 90, input data in a register, intermediate data in different states of the method in a register, and other variables used for the execution of the method in a register.

(128) The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.

(129) In accordance with an example of encoding or an encoder, the point cloud PC is obtained from a source. For example, the source belongs to a set comprising: a local memory (93 or 94), e.g. a video memory or a RAM (or Random Access Memory), a flash memory, a ROM (or Read Only Memory), a hard disk; a storage interface (95), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support; a communication interface (95), e.g. a wireline interface (for example a bus interface, a wide area network interface, a local area network interface) or a wireless interface (such as an IEEE 802.11 interface or a Bluetooth® interface); and a picture capturing circuit (e.g. a sensor such as, for example, a CCD (or Charge-Coupled Device) or CMOS (or Complementary Metal-Oxide-Semiconductor)).

(130) In accordance with an example of the decoding or a decoder, the decoded first and/or second depth images or the reconstructed point cloud is (are) sent to a destination; specifically, the destination belongs to a set comprising: a local memory (93 or 94), e.g. a video memory or a RAM, a flash memory, a hard disk; a storage interface (95), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support; a communication interface (95), e.g. a wireline interface (for example a bus interface (e.g. USB (or Universal Serial Bus)), a wide area network interface, a local area network interface, an HDMI (High Definition Multimedia Interface) interface) or a wireless interface (such as an IEEE 802.11 interface, WiFi® or a Bluetooth® interface); and a display.

(131) In accordance with examples of encoding or encoder, the bitstream B is sent to a destination. As an example, the bitstream B is stored in a local or remote memory, e.g. a video memory (94) or a RAM (94), a hard disk (93). In a variant, the bitstream B is sent to a storage interface (95), e.g. an interface with a mass storage, a flash memory, ROM, an optical disc or a magnetic support and/or transmitted over a communication interface (95), e.g. an interface to a point to point link, a communication bus, a point to multipoint link or a broadcast network.

(132) In accordance with examples of decoding or decoder, the bitstream B is obtained from a source. For example, the bitstream is read from a local memory, e.g. a video memory (94), a RAM (94), a ROM (93), a flash memory (93) or a hard disk (93). In a variant, the bitstream is received from a storage interface (95), e.g. an interface with a mass storage, a RAM, a ROM, a flash memory, an optical disc or a magnetic support and/or received from a communication interface (95), e.g. an interface to a point to point link, a bus, a point to multipoint link or a broadcast network.

(133) In accordance with examples, device 90 being configured to implement an encoding method described in relation with FIG. 1-3, or 5-6, belongs to a set comprising: a mobile device; a communication device; a game device; a tablet (or tablet computer); a laptop; a still picture camera; a video camera; an encoding chip; a still picture server; and a video server (e.g. a broadcast server, a video-on-demand server or a web server).

(134) In accordance with examples, device 90 being configured to implement a decoding method described in relation with FIG. 4 or 7-8, belongs to a set comprising: a mobile device; a communication device; a game device; a set top box; a TV set; a tablet (or tablet computer); a laptop; a display and a decoding chip.

(135) According to an example of the present principles, illustrated in FIG. 10, in a transmission context between two remote devices A and B over a communication network NET, the device A comprises a processor in relation with memory RAM and ROM which are configured to implement a method for encoding the geometry of a point cloud as described in relation with FIG. 1-3, or 5-6, and the device B comprises a processor in relation with memory RAM and ROM which are configured to implement a method for decoding a point cloud as described in relation with FIG. 4 or 7-8.

(136) In accordance with an example, the network is a broadcast network, adapted to broadcast still pictures or video pictures from a device A to decoding devices including the device B.

(137) A signal, intended to be transmitted by the device A, carries the bitstream B. The bitstream B comprises an encoded first depth image and possibly at least a part of an encoded second depth image as explained in relation with FIG. 1. This signal further comprises information data representing at least one depth coding mode DCMi. Each depth coding mode indicates if the depth values of pixels in an image region i of the second depth image are encoded in the bitstream B (“explicit” mode) or not (“implicit” mode).

(138) FIG. 11 shows an example of the syntax of such a signal when the data are transmitted over a packet-based transmission protocol. Each transmitted packet P comprises a header H and a payload PAYLOAD. A bit of the header H, for example, is dedicated to represent a depth coding mode DCMi. Thus, at least one bit of the header H is dedicated to represent at least one depth coding mode DCMi.
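Carrying a depth coding mode in one header bit can be sketched as follows. The bit position (least significant bit of the first header byte), the mapping of 1 to the “implicit” mode and the function names are assumptions made for the example; FIG. 11 does not fix them.

```python
# Hypothetical position: least significant bit of the first header byte.
DCM_BIT = 0x01


def build_packet(header, payload, implicit):
    """Return header + payload with the DCM bit set (implicit) or cleared."""
    if implicit:
        first = header[0] | DCM_BIT
    else:
        first = header[0] & ~DCM_BIT & 0xFF
    return bytes([first]) + bytes(header[1:]) + bytes(payload)


def read_mode(packet):
    """Read the depth coding mode from the first header byte of a packet."""
    return "implicit" if packet[0] & DCM_BIT else "explicit"
```

The other header bits and the payload are left untouched, so the mode can be read without parsing the rest of the packet.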

(139) Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and any other device for processing a picture or a video or other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.

(140) Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a computer readable storage medium. A computer readable storage medium can take the form of a computer readable program product embodied in one or more computer readable medium(s) and having computer readable program code embodied thereon that is executable by a computer. A computer readable storage medium as used herein is considered a non-transitory storage medium given the inherent capability to store the information therein as well as the inherent capability to provide retrieval of the information therefrom. A computer readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. It is to be appreciated that the following, while providing more specific examples of computer readable storage mediums to which the present principles can be applied, is merely an illustrative and not exhaustive listing as is readily appreciated by one of ordinary skill in the art: a portable computer diskette; a hard disk; a read-only memory (ROM); an erasable programmable read-only memory (EPROM or Flash memory); a portable compact disc read-only memory (CD-ROM); an optical storage device; a magnetic storage device; or any suitable combination of the foregoing.

(141) The instructions may form an application program tangibly embodied on a processor-readable medium.

(142) Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.

(143) As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described example of the present principles, or to carry as data the actual syntax-values written by a described example of the present principles. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.

(144) A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.


CPC classification: H04N19/147; H04N19/103; H04N19/132; H04N19/20; H04N19/17; G06T2207/10028; H04N19/597

International classification: H04N19/597; H04N19/103; H04N19/147; H04N19/17
