Method and apparatus for encoding and decoding a texture block using depth based block partitioning
11234002 · 2022-01-25
Assignee
Inventors
CPC classification
H04N13/161
ELECTRICITY
H04N19/90
ELECTRICITY
H04N19/119
ELECTRICITY
H04N19/44
ELECTRICITY
H04N19/192
ELECTRICITY
International classification
H04N19/90
ELECTRICITY
H04N13/161
ELECTRICITY
H04N19/597
ELECTRICITY
H04N19/119
ELECTRICITY
H04N19/192
ELECTRICITY
Abstract
The invention relates to an apparatus for decoding an encoded texture block of a texture image, the decoding apparatus comprising: a partitioner (510) adapted to determine a partitioning mask (332) for the encoded texture block (312′) based on depth information (322) associated to the encoded texture block, wherein the partitioning mask (332) is adapted to define a plurality of partitions (P1, P2) and to associate a texture block element of the encoded texture block to a partition of the plurality of partitions of the encoded texture block; and a decoder (720) adapted to decode the partitions of the plurality of partitions of the encoded texture block based on the partitioning mask.
Claims
1. An apparatus for decoding an encoded texture block of a texture image, wherein the apparatus comprises: a processor configured to determine a partitioning mask for the encoded texture block based on depth information associated with the encoded texture block, wherein the encoded texture block comprises a rectangle, a plurality of partitions, and a texture block element, and wherein the partitioning mask is configured to: define the plurality of partitions, wherein the partitions divide the rectangle into a plurality of irregular shapes; and associate the texture block element with a partition of the partitions, wherein the processor is further configured to: decode the partitions based on the partitioning mask; adaptively determine a threshold value based on depth information values for an area associated with the encoded texture block; and associate the texture block element to one partition of the partitions based on a comparison of a depth information value associated to the texture block element with the threshold value.
2. The apparatus of claim 1, wherein the processor is further configured to predetermine a number of partitions forming the partitions or adaptively determine the number of partitions by analyzing the depth information associated to the texture block.
3. The apparatus of claim 1, wherein the processor is further configured to determine the partitioning mask in an iterative manner, wherein in each iteration determining the partition mask comprises further dividing a partition fulfilling predetermined selection criteria into sub-partitions until a predetermined termination criterion is fulfilled or as long as a further-partitioning criterion is fulfilled, and wherein the encoded texture block comprises an initial partition for the iterative partitioning.
4. The apparatus of claim 1, wherein the processor is further configured to: extract encoded depth information from a bitstream and decode the encoded depth information to obtain the depth information associated to the encoded texture block.
5. The apparatus of claim 4, wherein the processor is further configured to: extract from a bitstream coding information for a first partition of the partitions of the encoded texture block separately from coding information for a second partition of the partitions of the texture block; and decode the first partition using the coding information, wherein the coding information comprises one or more of a prediction mode, a predictor index, a prediction direction, a reference picture index, a reference view index, a transform coefficient, a motion vector, or a coding context.
6. The apparatus of claim 1, wherein the irregular shapes do not comprise rectangles.
7. The apparatus of claim 1, wherein the texture image comprises a matrix of chrominance values and luminance values.
8. The apparatus of claim 1, wherein the texture image is part of a three-dimensional visual scene.
9. The apparatus of claim 1, wherein the partitions comprise three or more partitions.
10. The apparatus of claim 1, wherein each of the partitions comprises a different shape.
11. The apparatus of claim 1, wherein the texture image further comprises a plurality of other encoded texture blocks.
12. The apparatus of claim 11, wherein each of the encoded texture blocks and each of the other encoded texture blocks comprises a different number of partitions.
13. The apparatus of claim 11, wherein the processor is further configured to iteratively determine each of the encoded texture blocks and each of the other encoded texture blocks.
14. The apparatus of claim 11, wherein each of the encoded texture blocks and each of the other encoded texture blocks is associated with a different threshold for determining a number of partitions.
15. The apparatus of claim 11, wherein each of the encoded texture blocks and each of the other encoded texture blocks is associated with a different number of thresholds.
16. A method for decoding an encoded texture block of a texture image, comprising: determining a partitioning mask for the encoded texture block based on depth information associated with the encoded texture block, wherein the encoded texture block comprises a rectangle, a plurality of partitions, and a texture block element, and wherein the determining comprises: defining the plurality of partitions, wherein the partitions divide the rectangle into a plurality of irregular shapes; and associating the texture block element to a partition of the partitions of the encoded texture block based on a comparison of a depth information value associated to the texture block element with a determined threshold value; and decoding the partitions based on the partitioning mask, wherein the threshold value is adaptively determined based on depth information values for an area associated with the encoded texture block.
17. A non-transitory computer readable medium comprising a program code that, when executed by a processor, causes an apparatus to be configured to: determine a partitioning mask for an encoded texture block based on depth information associated with the encoded texture block, wherein the encoded texture block comprises a rectangle and a texture block element, and wherein the partitioning mask is configured to: define a plurality of partitions, wherein the partitions divide the rectangle into a plurality of irregular shapes; and associate the texture block element with a partition of the partitions of the encoded texture block based on a comparison of a depth information value associated to the texture block element with a determined threshold value; and decode the partitions based on the partitioning mask, wherein determining the threshold value comprises adaptively determining the threshold value based on depth information values for an area associated with the texture block.
18. An apparatus for encoding a texture block of a texture image, comprising: a processor configured to determine a partitioning mask for the texture block based on depth information associated with the texture block, wherein the texture block comprises a rectangle and a texture block element, and wherein the partitioning mask is configured to: define a plurality of partitions of the texture block, wherein the partitions divide the rectangle into a plurality of irregular shapes; and associate the texture block element to a partition of the partitions, wherein the processor is further configured to: encode the texture block by encoding the partitions based on the partitioning mask; adaptively determine a threshold value based on depth information values for an area associated with the texture block; and associate the texture block element to one of the partitions based on a comparison of a depth information value associated to the texture block element with the threshold value.
19. The apparatus of claim 18, wherein the processor is further configured to determine for a first partition of the partitions separately from a second partition of the partitions coding information that is used to encode the first partition, wherein the coding information comprises one or more of a prediction mode, a predictor index, a prediction direction, a reference picture index, a reference view index, a motion vector, a transform coefficient, or a coding context.
20. A method for encoding a texture block of a texture image, comprising: determining a partitioning mask for the texture block based on depth information associated with the texture block, wherein the texture block comprises a rectangle and a texture block element, and wherein the determining comprises: defining a plurality of partitions of the texture block, wherein the partitions divide the rectangle into a plurality of irregular shapes; and associating the texture block element to a partition of the partitions based on a comparison of a depth information value associated with the texture block element with a determined threshold value; and encoding the texture block by encoding the partitions based on the partitioning mask, wherein the threshold value is adaptively determined based on depth information values for an area associated with the texture block.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Further embodiments of the invention will be described with respect to the following figures, in which:
(13) Equal or equivalent elements are denoted in the following description of the figures by equal or equivalent reference signs.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
(14) For a better understanding of the embodiments of the invention certain terms used for describing the embodiments of the invention will be explained based on
(17) For video coding, the texture image is divided into small parts, called blocks, for example macroblocks or coding units (CU). In the coding process, the encoder decides about the coding mode for each block, including the possibility to divide each block into smaller sub-parts. This process is usually referred to as block partitioning. As a result, each block may consist of one or more partitions. In recent video codecs, usually only partitions of a rectangular shape are allowed. Additionally, for each block or partition a prediction mode is selected. As predictive coding is a very efficient method for encoding video content, for each block to be encoded a reference block, which was already encoded prior to the coded block, is selected. Such a block is set as a reference for the block to be encoded, and only prediction errors with respect to this reference block need to be signaled in the bitstream of the encoded video. A reference block can be selected from blocks of the same picture as the block to be encoded, which is referred to as intra-picture prediction, or from one of the available previously encoded pictures, which is referred to as inter-picture prediction. For intra-picture prediction, also referred to as intra-image prediction or, in short, intra prediction, each partition of the block to be encoded is predicted using one or more selected directional predictors. In inter-picture prediction, also referred to as inter-image prediction or, in short, inter prediction, a method known as motion estimation can be applied, which uses motion vectors to specify the spatial location of the reference block in the reference picture relative to the spatial position of the block to be encoded in the current picture. Additionally, the reference picture needs to be specified, which is typically indicated by a reference picture index. For each partition of the block to be encoded, an independent set of motion vectors and reference picture indices can be selected by the encoder. Consequently, the inter-prediction may be different for each partition. In 3D video, additionally an inter-view prediction can be used, which allows blocks of other views to be used as references.
(18) Finally, the prediction error, also referred to as the residuum, i.e. the difference between the coded block and its prediction derived from the reference block, is encoded and transmitted in the bit-stream.
(20) As mentioned above, texture values of the texture block 114 or texture values of partitions of the texture block 114 can be predicted using reference blocks or reference partitions of reference blocks from the same texture image, from a texture image of a different view for the same time instant, for example T2-1, or from a previously coded texture image of the same view T1, for example T1-1.
(21) The following terms will be used for describing the embodiments of the invention.
(22) The term “image” refers to a two-dimensional representation of data, typically a two-dimensional matrix, and may also be referred to as picture.
(23) The term “visual scene” refers to a real world or synthetic scene that is acquired with a visual system (e.g. single camera or multiple cameras) and represented in form of a still image or video.
(24) The term "3D video frame" refers to a signal comprising information describing the 3D geometry of the scene. In particular, this information can be represented by at least two texture images associated with two different viewpoints of the visual scene (stereo image), or by at least one texture image and a depth/disparity map (texture-plus-depth image). An individual 3D video frame may also be referred to as 3D image or 3D picture.
(25) The term “3D video sequence” refers to a set of subsequent 3D video frames representing a motion picture.
(26) The term "texture image" refers to an image, either a still image or a frame of a video sequence, representing a specific viewpoint and containing information about the color and light intensity of the visual scene with regard to that viewpoint, typically represented in RGB or YUV format (comprising chrominance and luminance values). It is typically a two-dimensional matrix comprising texture information, for example chrominance and luminance values.
(27) The term "depth map" refers to a two-dimensional matrix comprising for each matrix element a corresponding depth value determining the distance of that element of the visual scene to a physical or virtual camera. The depth map can be regarded as a grey scale image in which each grey value corresponds to a depth value or distance. Alternatively, a disparity map may be used for determining the depth aspect of the 3D visual scene. The disparity values of the disparity map are inversely proportional to the depth values of the depth map.
(28) The term “disparity map” refers to a two-dimensional representation of the three-dimensional visual scene wherein a value of each element is inversely proportional to the distance of the 3D world point represented by this element to the camera.
(29) The term “coding block” or “block” is a coding unit, usually of regular, rectangular shape, describing the encoded area of the picture or image using a syntax specified for a coding mode selected for the block.
(30) The term “coding mode” describes a set of means and methods used to code, i.e. encode and/or decode, the coded block.
(31) The term “slice” refers to a structure of a video sequence containing a part of the whole picture or image of the video sequence.
(32) The term “slice header” refers to a set of parameters describing the slice, which is sent at the beginning of the slice.
(33) The term “coding unit” (CU) refers to a basic coding structure of the video sequence of a predefined size, containing a part of a picture (texture or depth), for example, a part comprising 64×64 pixels.
(34) The term "coded block" refers to the area of the image that is encoded, which corresponds to the area represented by the coding unit or is a part of this area.
(35) The term “I-slice” refers to a slice in which all coding units are intra-predicted, so no reference to other pictures is allowed.
(36) The term “random access point” defines a point in the structure of the video sequence (2D and 3D) from which a decoder is able to start decoding the sequence without the knowledge of the previous part of the video stream.
(37) The term “group of pictures” (GOP) refers to one of the basic data structures of a video sequence, containing a predefined number of subsequent pictures (texture or depth or both) that are not necessarily ordered within the GOP in the display order.
(38) The term "sequence parameter set" (SPS) refers to a set of parameters sent in the form of an organized message containing basic information required to properly decode the video stream, which must be signaled at the beginning of every random access point.
(39) The term "picture parameter set" (PPS) refers to a set of parameters sent in the form of an organized message containing basic information required to properly decode a picture in a video sequence.
(40) The term "supplemental enhancement information" (SEI) refers to a message that can be signaled in a stream of a video sequence, containing additional or optional information about the video sequence, coding tools, etc.
(41) The term "reference block" refers to a block (texture block or depth block) of a picture (texture or depth) which is used as a reference for predictive coding (and decoding) of the current block.
(42) In the following, embodiments of the method for encoding a texture block of a texture image using depth based block partitioning will be described based on
(45) The method 200 of depth based block partitioning encoding as shown in
(46) Determining 210 a partitioning mask 332 for the texture block based on depth information 322 associated to the texture block 312, wherein the partitioning mask 332 is adapted to define a plurality of partitions P1, P2 of the texture block and to associate a texture block element of the texture block 312 to a partition of the plurality of partitions.
(47) Encoding 220 the texture block by encoding the partitions P1, P2 of the plurality of partitions of the texture block based on the partitioning mask 332.
(48) In other words, the partitions P1 and P2 determined based on the first depth block 322 are mapped onto the texture block 312 and thus, associate the texture block elements to one of the two partitions P1 or P2.
(49) For the sake of readability, in the following, embodiments of the present invention will be described with reference to the first texture block 312 and the corresponding depth information block 322 and the partitioning mask 332 derived based on the depth block 322 unless otherwise stated. It should be mentioned that this shall not limit embodiments of the invention, which can be also used to partition a texture block into three or more partitions as shown in
(50) The encoding of the partitions of the texture block, which may also be referred to as texture block partitions, may be performed using conventional encoding methods and encoding methods designed especially for the aforementioned depth based block partitioning.
(51) According to an embodiment, the encoding 220 of the partitions of the plurality of partitions of the texture block further comprises the following: determining for a first partition P1 of the plurality of partitions of the texture block 312, separately from a second partition P2 of the plurality of partitions of the texture block 312, the coding information to be used for encoding the first partition P1, the coding information comprising, for example, one or more of the following: a prediction mode, a predictor index, a prediction direction, a reference picture index, a reference view index, a transform coefficient, a motion vector, and a coding context.
(52) For embodiments of the method as shown in
(53) According to the invention, the block is partitioned into a plurality of partitions (at least two), e.g. into a number of N, N>1, partitions P = {P_1, . . . , P_N}, by thresholding the depth information values, e.g. depth or disparity values, associated with the points of the coded texture block using the threshold values T = {T_1, . . . , T_(N-1)}. The points of the texture block may also be referred to as texture elements. For each point p(x,y) of the coded texture block, the following comparison between its associated depth or disparity value d(x,y) and the thresholds T is performed: if d(x,y) ≥ T_(N-1), then p(x,y) → P_N; else, if d(x,y) < T_i, then p(x,y) → P_i, for i ∈ [1, N−1].
(54) The number of thresholds and their values can be predefined or adaptively selected. For determining the number of thresholds, embodiments may include, for example: predefining the number of partitions; or counting a number of peaks detected in a histogram of depth or disparity values calculated for the area associated with the coded texture block.
(55) For determining the values of thresholds, embodiments may include, for example: using predefined values; calculating an average value of the depth or disparity values from the area associated with the coded texture block, and setting the threshold to the calculated average value; calculating a weighted average value of the depth or disparity values from the area associated with the coded texture block, e.g. weights may depend on a distance of the point from center of the texture block, and setting the threshold to the calculated weighted average value; or calculating a median value of the depth or disparity values from the area associated with the coded texture block, and setting the threshold to the calculated median value.
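By way of illustration only, a simplified sketch (Python/NumPy) of the threshold-based partitioning described in paragraphs (53) to (55) could look as follows; the function name dbbp_mask_from_depth and the use of quantiles for more than two partitions are assumptions made for this sketch only:

    import numpy as np

    def dbbp_mask_from_depth(depth_block, num_partitions=2):
        """Sketch: derive a partitioning mask by thresholding co-located depth values.

        depth_block: 2-D array of depth (or disparity) values for the coded texture
        block. Returns an integer mask of the same shape; value k means the texture
        block element belongs to partition P_(k+1).
        """
        if num_partitions == 2:
            thresholds = [depth_block.mean()]                   # simple average value as T_1
        else:
            qs = np.linspace(0.0, 1.0, num_partitions + 1)[1:-1]
            thresholds = list(np.quantile(depth_block, qs))     # ascending thresholds T_1..T_(N-1)
        mask = np.full(depth_block.shape, num_partitions - 1, dtype=np.int32)  # default: P_N
        for i, t in enumerate(thresholds):                      # i = 0..N-2 corresponds to P_1..P_(N-1)
            mask[(depth_block < t) & (mask == num_partitions - 1)] = i
        return mask

For two partitions this reduces to the average-value threshold T_1 mentioned above: every point whose associated depth value is below the average is assigned to P_1, and the remaining points to P_2.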
(56) An exemplary result of such Depth Based Block Partitioning (DBBP) for two partitions, using a simple average value to determine a threshold value T_1, is presented in
(57) A second solution for determining the partitioning based on depth information associated with the coded texture block proposed by the invention is the application of an edge detection algorithm on an image representing the depth information in form of a depth or disparity map. In this embodiment, each detected edge determines the border between partitions.
(58) Another embodiment for determining the partitioning uses a segmentation algorithm on an image representing the depth information in the form of a depth or disparity map. The segmentation is performed by analyzing the intensity values, which represent the depth or disparity values, and merging image points with similar or equal values into a single segment. Each partition is determined by assigning to it all the points belonging to the same segment. Additionally, object-oriented segmentation can be performed, which uses more advanced segmentation methods that take into consideration some prior knowledge about the shape of the objects and/or perform object detection in the analyzed image.
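A minimal sketch of such a segmentation-based partitioning, assuming a coarse quantization of the depth values followed by connected-component labeling (here using scipy.ndimage.label), is given below; real segmentation or object-oriented methods would typically be more elaborate:

    import numpy as np
    from scipy import ndimage

    def dbbp_mask_by_segmentation(depth_block, num_levels=4):
        """Sketch: partition a block by segmenting its depth/disparity values.

        The intensity values are quantized into num_levels bins; connected regions
        of points sharing the same bin form one segment, and every segment becomes
        one partition of the texture block.
        """
        lo, hi = float(depth_block.min()), float(depth_block.max())
        if hi == lo:                                        # flat depth: single partition
            return np.zeros(depth_block.shape, dtype=np.int32)
        bins = np.minimum(((depth_block - lo) / (hi - lo) * num_levels).astype(int),
                          num_levels - 1)
        mask = np.zeros(depth_block.shape, dtype=np.int32)
        next_label = 0
        for b in range(num_levels):
            labels, count = ndimage.label(bins == b)        # connected components per bin
            for c in range(1, count + 1):
                mask[labels == c] = next_label
                next_label += 1
        return mask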
(59) In a further embodiment, the partitioning can be performed on a picture-level, i.e. the partitions are calculated for the whole picture and partitioning of the coded block is done by just assigning the picture-level partitions from the area corresponding to the coded block. In this way, the picture-level partition is assigned for each point of the coded block and all the points assigned to the same picture-level partition form a block-level partition. This approach applies especially for the depth-based partitioning methods such as object-oriented depth or disparity segmentation, depth or disparity image segmentation or depth or disparity image edge detection.
(60) In a further embodiment, the depth based block partitioning is performed in an iterative manner. In each iteration a partition fulfilling predetermined selection criteria is selected and further divided into sub-partitions until a predetermined termination criterion is fulfilled, or as long as a further-partitioning criterion is still fulfilled, wherein the texture block forms the initial partition used for the iterative partitioning.
(62) Selecting 401 a partition to be divided into a predefined number of sub-partitions based on predefined criteria. For the first iteration the texture block as a whole is used as starting partition or initial partition. In the following iterations specified selection criteria are used to select the partition to be divided in step 401.
(63) Dividing 403 the selected partition into the predefined number of sub-partitions using a predefined partitioning method based on depth information associated with the selected partition of the texture block. As partitioning method, any of the above-mentioned methods may be used. For example, the threshold based partitioning methods described above are very efficient for iterative partitioning.
(64) Determining whether the further partitioning of the selected partition into the sub-partitions shall be accepted or kept, based on predefined criteria. If yes, the new partitioning becomes the current partitioning; otherwise the previous partitioning is maintained.
(65) Determining whether the iterative partitioning shall be finished or whether the iterative partitioning shall be continued, based on predefined criteria.
(66) Possible selection criteria for the selection of a partition to be divided may include, for example, alone or in combination: the largest partition; the partition with the largest depth or disparity difference between points within the partition, wherein the difference can be measured, for example, as the difference between the largest and the smallest value, the variance, the standard deviation, or other statistical moments; the partition neighboring an already encoded/decoded neighboring block which contains more than one partition, with the border between these partitions lying on the border of the two blocks; or the partition whose average depth or disparity value of points within the partition differs most from the average values calculated for all or selected neighboring partitions.
(67) Then, the dividing of the selected partition into sub-partitions is performed as described previously (see the non-iterative variants of DBBP).
(68) Next, the partitioning of the selected partition is tested to determine whether the further partitioning should be accepted, using specified criteria. Possible embodiments of the decision function include testing whether (a single criterion or a combination of criteria may be used): the size of the selected partition is large enough (predefined or adaptive threshold, e.g. dependent on the input block size); the depth or disparity difference between points within each sub-partition is small/large enough (predefined or adaptive threshold); or the number of sub-partitions is small or large enough (predefined or adaptive threshold).
(69) Finally, the conditions for finishing the partitioning process are checked. Possible embodiments of the decision function include testing whether (a single criterion or a combination of criteria may be used): the number of partitions equals or exceeds the defined number of partitions (predefined or adaptive threshold); the depth or disparity difference between points within each partition is small/large enough (predefined or adaptive threshold); or the maximum number of iterations was exceeded (predefined or adaptive threshold).
(70) The two above steps can be combined into one, in which both the acceptance of the new partitioning and the conditions for finishing the iterative partitioning are tested.
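For illustration, a simplified sketch of such an iterative partitioning is given below. It assumes, as examples of the criteria discussed above, that the largest current partition is selected, that it is divided by thresholding at its average depth value, that a split is accepted only if both sub-partitions exceed a minimum size, and that the process stops once a target number of partitions or a maximum number of iterations is reached; these concrete choices are assumptions of the sketch, not requirements of the embodiments:

    import numpy as np

    def iterative_dbbp(depth_block, target_partitions=4, min_partition_size=16, max_iters=8):
        """Sketch of iterative depth based block partitioning (see steps 401/403 above)."""
        mask = np.zeros(depth_block.shape, dtype=np.int32)    # initial partition: the whole block
        num_partitions = 1
        for _ in range(max_iters):
            if num_partitions >= target_partitions:           # termination criterion
                break
            sizes = np.bincount(mask.ravel(), minlength=num_partitions)
            sel = int(np.argmax(sizes))                        # selection: largest partition
            sel_points = (mask == sel)
            threshold = depth_block[sel_points].mean()         # dividing: average-value threshold
            new_part = sel_points & (depth_block >= threshold)
            smaller = min(new_part.sum(), sel_points.sum() - new_part.sum())
            if smaller < min_partition_size:                   # acceptance test failed:
                break                                          # keep the previous partitioning
            mask[new_part] = num_partitions                    # accept the new partitioning
            num_partitions += 1
        return mask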
(71) An example of such a partitioning is illustrated in
(72) Further embodiments of the method comprise: adding a coding mode indicator to a bit-stream comprising coding information of the encoded partitions of the plurality of partitions of the texture block, wherein the coding mode indicator indicates whether the partitioning of the texture block was performed using a partitioning mask derived based on depth information associated to the texture block, and/or wherein the coding mode indicator indicates whether a specific partitioning mode of a plurality of different partitioning modes using a partitioning mask derived based on depth information associated to the texture block was used.
(73) In other embodiments, the method further comprises: encoding and decoding depth information associated to the texture block to obtain the depth information used for determining the partitioning mask.
(74) In embodiments of the invention, the depth information associated to the texture block may be depth information comprised in a depth information block associated to the texture block. In embodiments of the invention, the depth information block may be a depth block comprising depth values as depth information or a disparity block comprising disparity values as depth information. In embodiments of the invention, the depth information associated to the texture block is associated to the same area of the image or picture, the same view and/or the same time instant as the texture block. In embodiments of the invention, the depth information may be depth values of a depth map or disparity values of a disparity map. In embodiments of the invention, the texture block element or point may be a picture element or any other spatially larger or smaller element defining a spatial resolution of the texture block.
(75) Further embodiments of the invention can be adapted to use only the depth information associated to the texture block and/or no texture information associated to the texture block for determining the partitioning mask and/or for partitioning the texture block.
(77) The partitioner 510 is adapted to determine a partitioning mask 332 for the texture block 312 based on depth information 322 associated to the texture block, wherein the partitioning mask 332 is adapted to define a plurality of partitions P1, P2 and to associate a texture block element of the texture block to a partition of the plurality of partitions of the texture block.
(78) The encoder 520 is adapted to encode the partitions of the plurality of partitions of the texture block based on the partitioning mask to obtain an encoded texture block 312′.
(79) Embodiments of the partitioner 510 may be adapted to perform any of the method steps related to the determination of the partitioning mask based on the depth information and the dividing of the texture block into the plurality of partitions, as described herein, e.g. based on
(80) The encoder 520 is adapted to perform any of the embodiments of the step of encoding 220 the texture block as described herein, e.g. based on
(82) Determining 210 a partitioning mask 332 for the encoded texture block 312′ based on depth information 322 associated to the encoded texture block 312′, wherein the partitioning mask 332 is adapted to associate a texture block element of the encoded texture block 312′ to a partition of a plurality of partitions P1, P2 of the encoded texture block.
(83) Decoding 720 based on the partitioning mask 332 the partitions of the plurality of partitions of the encoded texture block 312′ to obtain a decoded texture block 312″.
(84) For the decoding step 620 conventional decoding methods and decoding methods especially designed for depth based block partitioning may be used to decode the encoded block.
(86) The partitioner 510 is adapted to determine a partitioning mask 332 for the encoded texture block 312′ based on depth information associated to the encoded texture block 312′, wherein the partitioning mask 332 is adapted to associate a texture block element of the encoded texture block to a partition of a plurality of partitions P1, P2 of the encoded texture block.
(87) The decoder 720 which may also be referred to as texture decoder 720, is adapted to decode, based on the partitioning mask 332, the partitions of the plurality of partitions of the encoded texture block 312′ to obtain the decoded texture block 312″.
(88) The partitioner 510 is adapted to perform any of the steps or functionalities related to the partitioning 210 as described herein, e.g. based on
(89) The decoder 720 is adapted to perform any of the steps or functionalities related to the decoding step 220 as described herein, e.g. based on
(91) Compared to the encoding apparatus 500 shown in
(92) Referring to the encoding apparatus 500, the depth encoder 810 is adapted to receive the depth information, e.g. in form of a depth map 320 and/or the corresponding depth information blocks 322, to encode the depth information to obtain the encoded depth information, e.g. an encoded depth map 320′ and/or the corresponding encoded depth information blocks 322′, and to provide the encoded depth information to the multiplexer 830 and the depth decoder 820. The depth decoder 820 is adapted to perform on the encoded depth information the decoding corresponding to the encoding performed by the depth encoder 810 to obtain decoded depth information, e.g. a decoded depth map 320″ and/or a decoded depth information block 322″. The partitioner 510 is adapted to receive the decoded depth information, e.g. the decoded depth map 320″ and/or the decoded depth information block 322″, and to determine the partitioning mask 332 based on the decoded depth information associated to the texture block 312 to be encoded.
(93) Alternatively, the partitioner 510 may be adapted to receive the original depth information (see broken line arrow in
(94) Using the decoded depth information 322″, which corresponds to the depth information available at the decoder side for the partitioning, models the situation at the decoding apparatus 700 more accurately and thus allows, for example, calculating a residuum which corresponds to the residuum at the decoder side, improving the coding efficiency.
(95) The multiplexer 830 is adapted to receive the encoded depth information and the encoded texture information, e.g. the encoded texture block 312′, and to multiplex these and potentially further information onto a bitstream 890, which is transmitted to the decoding apparatus 700. Alternatively, the bitstream may be stored on a storage medium.
(96) Referring to the decoding apparatus 700, the demultiplexer 860 is adapted to extract the encoded depth information 322′, e.g. the encoded depth map and/or the encoded depth information blocks 322′, and the encoded texture block 312′ from the bitstream 890 and to pass the encoded depth information 322′ to the depth decoder 820. The depth decoder 820 is adapted to decode the encoded depth information 322′ to obtain decoded depth information 322″, e.g. the decoded depth map and/or decoded depth block, which it may output for further processing and which it also forwards to the partitioner 510 for determining the partitioning mask 332. The texture decoder 720 receives the encoded texture block and decodes, based on the partitioning mask 332 received from the partitioner 510, the encoded texture block to obtain a decoded texture block 312″.
(97) Embodiments of the invention may be used in various ways for texture coding using the depth based block partitioning (DBBP) for 3D and texture-plus-depth video coding.
(98) Embodiments can be adapted to use arbitrary shape partitions determined using DBBP to represent coding information of the coded texture block. Each partition may have its own set or subset of coding information, e.g. motion vectors, disparity vectors, reference picture indices, prediction mode, intra predictor, residuum.
(99) Embodiments can be adapted to use DBBP partitions as a replacement for or in addition to conventional partitioning modes of the codec, i.e. DBBP partitions are the only available partitions used by the codec or enrich the originally available set of partitioning modes of the codec with the additional partitioning mode.
(100) Embodiments can be adapted to use DBBP switchable per sequence, per GOP, per Intra-period, per picture, per slice and per coding unit, and usage of DBBP partitions can be enabled or disabled for the specified range.
(101) Embodiments can be adapted to use DBBP in interleaved video coding, wherein DBBP is independently applied to each field of the interleaved video.
(102) Embodiments can be adapted to efficiently signal a DBBP partitioning in HEVC-based codecs (HEVC: High Efficiency Video Coding) by adapting existing coding mode indicators. The selection of the usage of DBBP partitions to represent the coded texture block is, for example, signaled in dependent texture views as a partitioning into two vertical halves (N×2N, i.e. N in width × 2N in height) plus an additional 1-bit dbbp_flag, which is required to distinguish the usage of DBBP from the original N×2N partitioning.
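A non-normative, simplified sketch of this signaling from the decoder's point of view is given below; bitstream.read_flag and the mode constants are placeholders introduced for illustration only and do not reproduce the exact HEVC syntax:

    PART_NX2N = "NX2N"   # two vertical halves (N in width x 2N in height each)
    PART_DBBP = "DBBP"   # depth based block partitioning

    def parse_partitioning(bitstream, part_mode, is_dependent_texture_view):
        """Return the effective partitioning mode of the current coding unit."""
        if is_dependent_texture_view and part_mode == PART_NX2N:
            dbbp_flag = bitstream.read_flag()   # 1-bit flag re-using the N x 2N signaling
            if dbbp_flag:
                return PART_DBBP                # partitions are derived from the depth map
        return part_mode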
(104) Embodiments can be adapted to use DBBP partitions P1, P2 for intra-prediction, wherein the intra-prediction mode is determined for each DBBP partition. The predicted intra-prediction mode is determined for each DBBP partition. The coding costs that are used are calculated per DBBP partition. The coding 520, 720 of each element is done per partition.
(105) Embodiments can be adapted to use DBBP partitions P1, P2 for motion and/or disparity-compensated prediction, wherein the motion and/or disparity vectors, reference picture indices and number of reference pictures are determined for each DBBP partition. Predicted motion and/or disparity vectors, reference picture indices and number of reference pictures are determined for each DBBP partition. The coding costs that are used are calculated per DBBP partition. The coding 520 of each element is done per partition.
(106) Embodiments can be adapted to use DBBP partitions for residuum prediction, wherein the residuum is determined for each DBBP partition. The predicted residuum is determined for each DBBP partition. The coding costs that are used are calculated per DBBP partition. The coding 520, 720 of each element is done per partition.
(107) Embodiments can be adapted to map the arbitrary shape of DBBP partitions onto available regular, e.g. rectangular, partitions for storing the coding information of the coded block (including the partitioning), so that it can be easily referenced (used for prediction) by later encoded/decoded blocks:
(108) In a first exemplary embodiment using such a mapping, the mapping is performed by down-sampling an original, e.g. pixel-wise, partitioning mask onto 2×2, 4×4, 8×8, 16×16, 32×32, 64×64, etc. pixel grids. The lowest-cost partitioning using regular partitions giving the same coarse partitioning is selected as a representative for the DBBP partitioning.
(109) In a second exemplary embodiment using such mapping, which can be used in case of two partitions, the mapping is performed by calculating a correlation with all available regular partitioning modes for the current level in a block-tree, e.g. a quad-tree of HEVC-based codecs, and selecting the most similar one as a representative for the DBBP partitioning. For example, the mapping of the DBBP partitions to one of the 6 available two-segment partitioning modes of HEVC as shown in
(111) The best match can be determined, for example, as follows. For each of the available partitioning modes i∈[0,5] (
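Since the matching criterion is only outlined above, the following sketch assumes a simple sample-agreement score: a binary mask is generated for each of the six regular two-segment modes, the number of samples agreeing with the binary DBBP mask is counted (also allowing swapped partition labels), and the mode with the highest agreement is selected as the representative. The mode names and split positions used here are illustrative assumptions:

    import numpy as np

    def regular_two_segment_masks(h, w):
        """Binary masks (0/1) of six assumed two-segment rectangular partitioning modes:
        2NxN, Nx2N, 2NxnU, 2NxnD, nLx2N, nRx2N."""
        masks = {}
        masks["2NxN"]  = (np.arange(h)[:, None] >= h // 2) * np.ones((1, w), dtype=int)
        masks["Nx2N"]  = np.ones((h, 1), dtype=int) * (np.arange(w)[None, :] >= w // 2)
        masks["2NxnU"] = (np.arange(h)[:, None] >= h // 4) * np.ones((1, w), dtype=int)
        masks["2NxnD"] = (np.arange(h)[:, None] >= 3 * h // 4) * np.ones((1, w), dtype=int)
        masks["nLx2N"] = np.ones((h, 1), dtype=int) * (np.arange(w)[None, :] >= w // 4)
        masks["nRx2N"] = np.ones((h, 1), dtype=int) * (np.arange(w)[None, :] >= 3 * w // 4)
        return masks

    def map_dbbp_to_regular_mode(dbbp_mask):
        """Pick the regular two-segment mode most similar to a binary DBBP mask."""
        h, w = dbbp_mask.shape
        best_mode, best_score = None, -1
        for mode, m in regular_two_segment_masks(h, w).items():
            agree = np.sum(m == dbbp_mask)          # samples with identical labels
            score = max(agree, m.size - agree)      # allow swapped partition labels
            if score > best_score:
                best_mode, best_score = mode, score
        return best_mode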
(113) In this way, all the blocks that are coded, i.e. encoded or decoded, after the block with DBBP partitions can easily interpret and utilize the mapped block partitioning scheme of the DBBP block for prediction and/or context derivation using conventional regular shaped partitioning based approaches. However, the DBBP block is still coded with the DBBP partitioning, which means that the mapping procedure does not influence the encoding or decoding process of the DBBP block.
(114) Embodiments of DBBP partitioning combined with such kind of mapping have the following advantages.
(115) Usage of a smaller number of contexts (in particular CABAC context models). Adding new context models is not required or at least the number of added models can be limited to a very small number.
(116) Easier incorporation into existing codecs. Traditional coding modes can easily treat the DBBP block like one of the traditionally coded blocks; no further modifications of the existing methods of prediction from the reference neighboring blocks, and no development of specific prediction methods from a DBBP reference block, need to be done.
(118) Embodiments can be adapted to calculate the cost used for selecting a coding mode for the block and/or partitions. The cost functions are modified in such a way that for each partition only the pixels belonging to this partition are taken into account to compute the cost.
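As a simple illustration of such a partition-restricted cost, the following sketch computes a sum of absolute differences over only those samples that belong to a given DBBP partition; the function name and interfaces are assumptions of the sketch:

    import numpy as np

    def partition_sad(original, prediction, partition_mask, partition_id):
        """Sum of absolute differences restricted to one DBBP partition."""
        sel = (partition_mask == partition_id)                # only pixels of this partition count
        diff = original[sel].astype(np.int64) - prediction[sel].astype(np.int64)
        return int(np.abs(diff).sum())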
(119) Embodiments can be adapted to calculate a single depth or disparity value representing each partition. The representative value is computed as the average, weighted average, median, minimum, or maximum of the depth or disparity values associated with the coded texture block (e.g. weights may depend on the distance from the center of the block/partition). The resulting value can be used for disparity-compensated prediction, to predict depth or disparity values for the partitions and/or blocks, or as a reference depth or disparity value for coding other blocks and/or partitions.
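A minimal sketch of computing such a representative value per partition, re-using the mask and depth block of the earlier sketches, could be:

    import numpy as np

    def representative_depth(depth_block, partition_mask, partition_id, statistic="mean"):
        """Single depth/disparity value representing one DBBP partition."""
        values = depth_block[partition_mask == partition_id]
        # average, median, minimum or maximum; a weighted average could be added analogously
        return {"mean": np.mean, "median": np.median,
                "min": np.min, "max": np.max}[statistic](values)

Comparing the representative values of the partitions then directly indicates which partition is closer to the camera, as used in the following paragraph.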
(120) Embodiments can be adapted to determine foreground and background partitions using depth or disparity values representing each DBBP partition. The depth or disparity value representing each DBBP partition is used to determine which partition is closer or more distant from the camera.
(121) Embodiments can be adapted to determine disocclusion areas based on foreground and background picture areas computed based on the depth or disparity values representing each DBBP partition. The foreground and background partitions determined based on the depth or disparity value representing each DBBP partition are used to determine disocclusion areas in the picture.
(122) Embodiments can be adapted to improve the coding efficiency by using depth or disparity values computed based on DBBP for disparity-compensated prediction. The depth or disparity value representing each DBBP partition is used as the prediction of disparity vector used for disparity-compensated prediction.
(123) Embodiments can be adapted to improve the coding efficiency by using depth or disparity values computed based on DBBP for adaptive QP (Quantization Parameter) or QD (Quantization Parameter for Depth) selection based on a distance from the camera. The depth or disparity value representing each DBBP partition is used for selecting the QP or QD quantization parameter for each partition based on the distance from the camera (the larger the distance from the camera, the higher QP or QD value is selected).
(124) Embodiments of the invention also provide solutions for minimizing the complexity of video coding when DBBP is utilized, as will be explained in the following.
(125) Embodiments can be adapted to calculate and store intra-predicted, motion- or disparity-compensated and residuum prediction signals for each partition in regular (rectangular) shape blocks. For calculating and storing the abovementioned prediction signals in the memory, regular (rectangular) shape blocks are used for each partition; however, only the pixels belonging to the respective partition are valid in each block. This reduces the number of individual calls to memory and avoids pixel-wise calls to memory, because the whole regular block of memory is copied, read and/or stored. As a result, a regular memory access pattern is provided.
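The following sketch illustrates, under the assumptions of the earlier sketches, how full rectangular per-partition prediction blocks can be merged into the final block prediction using the partitioning mask; from each rectangular block only the samples belonging to the respective partition are kept:

    import numpy as np

    def assemble_block_prediction(partition_predictions, partition_mask):
        """Merge per-partition rectangular prediction blocks into one block prediction.

        partition_predictions: list of 2-D arrays, one full rectangular prediction
        block per partition (only the samples of that partition are meaningful).
        partition_mask: integer mask mapping each sample to a partition index.
        """
        out = np.zeros_like(partition_predictions[0])
        for idx, pred in enumerate(partition_predictions):
            sel = (partition_mask == idx)
            out[sel] = pred[sel]        # copy only the samples valid for this partition
        return out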
(126) Embodiments can be adapted to compute the DBBP partitioning based on sparse depth information—partitioning is computed using a sparse representation of depth information, i.e. non-pixel-wise (e.g. down-sampled depth or disparity map). In this way, the number of depth or disparity points to be analyzed and processed decreases, however, accuracy of the partitioning is slightly lower.
(127) Embodiments can be adapted to compute the DBBP partitioning based on dense, e.g. pixel-wise, depth information and down-sampling the resolution of partitioning mask to a 2×2, 4×4, 8×8, 16×16, etc. grid. In this way, the resolution of data structures that store all the coding information describing the coded partitions can be decreased, saving the amount of memory and number of memory read/write operations.
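A simple sketch of down-sampling a pixel-wise partitioning mask to such a coarse grid, assuming a majority vote within each grid cell and block dimensions divisible by the cell size, could be:

    import numpy as np

    def downsample_mask(mask, cell=4):
        """Down-sample a pixel-wise partitioning mask to a coarse (cell x cell) grid."""
        h, w = mask.shape
        coarse = np.zeros((h // cell, w // cell), dtype=mask.dtype)
        for i in range(0, h, cell):
            for j in range(0, w, cell):
                block = mask[i:i + cell, j:j + cell]
                # majority vote: the partition index occurring most often in the cell
                coarse[i // cell, j // cell] = np.argmax(np.bincount(block.ravel()))
        return coarse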
(128) Embodiments can be adapted to decrease the complexity of the video coding with DBBP applied by turning off the loop filters. The complexity of the video coding process can be decreased by turning off the loop filters, e.g. deblocking, ALF or SAO filters, for the blocks that contain DBBP partitions. As a result, the complexity of video coding is lower, with only a small decrease in coding performance, i.e. rate-distortion ratio.
(129) Embodiments of the invention provide a method, which can be referred to as depth-based block partitioning (DBBP). In embodiments the partitioning of the texture block may be performed using only depth information and no texture information, e.g. only depth information related to the texture block but no texture information of the texture block. Alternative embodiments may combine the depth based block partitioning with other partitioning methods, e.g. based on coarse texture information to keep the complexity low. However, using only the depth information in form of a partitioning mask provides a simple, low complexity but nevertheless efficient way to partition a texture block.
(130) In this way, depth information that is already available at the decoder can be reused to improve compression without the necessity to send any further information about the partitions' shape in the bit-stream.
(131) Summarizing the above, embodiments of the invention provide a coding solution for coding a texture block using at least two partitions of an arbitrary shape, which is determined based on depth information, e.g. in the form of a depth or disparity map associated with the coded texture block. As the shape of the partitions can be well fitted to the object borders within the texture block, an additional flexibility for the coding process is obtained, which spares the encoder from further partitioning the texture block into smaller regular, i.e. rectangular, shaped partitions, saving the bits for signaling these partitions. According to embodiments of the invention, the arbitrary shape of the partitions can be determined at the decoder based on the available depth information associated with the coded texture block. Consequently, the exact shape of the depth-based partitions does not need to be transmitted in the bitstream, reducing the bitrate.