Apparatuses and methods for encoding and decoding a video coding block of a video signal

10735726 · 2020-08-04

Abstract

A decoding apparatus partitions a video coding block based on coding information into two or more segments including a first segment and a second segment. The coding information comprises a first segment motion vector associated with the first segment and a second segment motion vector associated with the second segment. A co-located first segment in a first reference frame is determined based on the first segment motion vector, and a co-located second segment in a second reference frame is determined based on the second segment motion vector. A predicted video coding block is generated based on the co-located first segment and the co-located second segment. A divergence measure is determined based on the first segment motion vector and the second segment motion vector and, depending on the divergence measure, a first or a second filter is applied to the predicted video coding block.

Claims

1. A decoding apparatus for decoding a video coding block of an encoded video signal comprising coding information and a plurality of frames wherein the decoding apparatus comprises: a memory configured to store instructions; and a processor coupled to the memory, wherein the instructions cause the processor to be configured to: decode the video coding block of a current frame of the plurality of frames to provide a residual video coding block, wherein each of the plurality of frames is dividable into a plurality of video coding blocks; partition the video coding block on the basis of the coding information into two or more segments including a first segment and a second segment, wherein the coding information comprises a first segment motion vector and a second segment motion vector, wherein the first segment motion vector is associated with the first segment and the second segment motion vector is associated with the second segment; determine a co-located first segment in a first reference frame on the basis of the first segment motion vector; determine a co-located second segment in a second reference frame on the basis of the second segment motion vector; generate a predicted video coding block on the basis of the co-located first segment and the co-located second segment, wherein the predicted video coding block comprises a predicted first segment and a predicted second segment; determine a divergence measure on the basis of the first segment motion vector and the second segment motion vector, wherein the divergence measure indicates either a divergence of the first segment motion vector and the second segment motion vector or a convergence of the first segment motion vector and the second segment motion vector; apply either a first filter to the predicted video coding block or a second filter to the predicted video coding block based on the divergence measure to obtain a filtered predicted video coding block; and reconstruct the video coding block on the basis of the filtered predicted video coding block and the residual video coding block.

2. The decoding apparatus of claim 1, wherein the processor is configured to apply the first filter to a boundary between the predicted first segment and the predicted second segment when the divergence measure indicates the first segment motion vector is diverging and the second segment motion vector is diverging, wherein the first filter comprises a directional smoothing filter configured to smooth across the boundary between the predicted first segment and the predicted second segment.

3. The decoding apparatus of claim 2, wherein the processor is configured to: adjust a filter property of the first filter on the basis of the divergence measure; adjust the filter property of the first filter on the basis of the first segment motion vector and the second segment motion vector; or adjust the filter property of the first filter on the basis of a difference between pixel values of pixels located at the boundary between the predicted first segment and the predicted second segment, wherein the filter property of the first filter comprises a first filter strength or a first filter size of the first filter.

4. The decoding apparatus of claim 1, wherein the processor is further configured to determine whether the predicted first segment or the predicted second segment is a background segment on the basis of the coding information, wherein the coding information further comprises information indicating whether the predicted first segment or the predicted second segment is a background segment or a foreground segment.

5. The decoding apparatus of claim 4, wherein the processor is configured to apply the second filter to the predicted video coding block when the divergence measure indicates the first segment motion vector is converging and the second segment motion vector is converging, wherein the second filter comprises a feathering filter configured to either feather in a direction of the background segment or feather in an opposite direction to the foreground segment.

6. The decoding apparatus of claim 4, wherein the processor is further configured to adjust a filter property of the second filter on the basis of the divergence measure, wherein the filter property of the second filter comprises either a second filter strength of the second filter or a second filter size of the second filter.

7. The decoding apparatus of claim 1, wherein the first segment motion vector and the second segment motion vector form a vector field, wherein the processor is configured to determine the divergence measure as the divergence of the vector field, wherein the divergence of the vector field being smaller than a first threshold indicates the first segment motion vector and the second segment motion vector are converging, and wherein the divergence of the vector field being larger than the first threshold indicates the first segment motion vector and the second segment motion vector are diverging.

8. A method for decoding a video coding block of an encoded video signal comprising coding information and a plurality of frames, wherein the method comprises: decoding the video coding block of a current frame of the plurality of frames to provide a residual video coding block, wherein each of the plurality of frames is dividable into a plurality of video coding blocks; partitioning the video coding block on the basis of the coding information into two or more segments including a first segment and a second segment, wherein the coding information comprises a first segment motion vector and a second segment motion vector, wherein the first segment motion vector is associated with the first segment and the second segment motion vector is associated with the second segment; determining a co-located first segment in a first reference frame on the basis of the first segment motion vector; determining a co-located second segment in a second reference frame on the basis of the second segment motion vector; generating a predicted video coding block on the basis of the co-located first segment and the co-located second segment, wherein the predicted video coding block comprises a predicted first segment and a predicted second segment; determining a divergence measure on the basis of the first segment motion vector and the second segment motion vector, wherein the divergence measure indicates either a divergence of the first segment motion vector and the second segment motion vector or a convergence of the first segment motion vector and the second segment motion vector; applying either a first filter to the predicted video coding block or a second filter to the predicted video coding block based on the divergence measure to obtain a filtered predicted video coding block; and reconstructing the video coding block on the basis of the filtered predicted video coding block and the residual video coding block.

9. The decoding method of claim 8, wherein the applying comprises applying the first filter to a boundary between the predicted first segment and the predicted second segment of the predicted video coding block when the divergence measure indicates the first segment motion vector is diverging and the second segment motion vector is diverging, wherein the first filter comprises a directional smoothing filter for smoothing across the boundary between the predicted first segment and the predicted second segment.

10. The decoding method of claim 8, wherein the decoding comprises determining, on the basis of the coding information, whether the predicted first segment is a background segment or whether the predicted second segment is a background segment, wherein the coding information further comprises information indicating whether the predicted first segment is a background segment or a foreground segment or indicating whether the predicted second segment is a background segment or a foreground segment.

11. The decoding method of claim 10, wherein the applying comprises applying the second filter to the predicted video coding block when the divergence measure indicates the first segment motion vector is converging and the second segment motion vector is converging, wherein the second filter comprises a feathering filter for either feathering in a direction of the background segment or feathering in an opposite direction to the foreground segment.

12. An encoding apparatus for encoding a video coding block of a video signal comprising a plurality of frames, wherein the encoding apparatus comprises: a memory configured to store instructions; and a processor coupled to the memory, wherein the instructions cause the processor to be configured to: partition the video coding block into two or more segments including a first segment and a second segment; determine a co-located first segment in a first reference frame of the video signal; determine a co-located second segment in a second reference frame of the video signal, wherein each of the first segment and the co-located first segment comprise a first segment motion vector, wherein each of the second segment and the co-located second segment comprise a second segment motion vector; generate a predicted video coding block on the basis of the co-located first segment and the co-located second segment, wherein the predicted video coding block comprises a predicted first segment and a predicted second segment; determine a divergence measure on the basis of the first segment motion vector and the second segment motion vector, wherein the divergence measure indicates either a divergence of the first segment motion vector and the second segment motion vector or a convergence of the first segment motion vector and the second segment motion vector; apply either a first filter to the predicted video coding block or a second filter to the predicted video coding block based on the divergence measure to obtain a filtered predicted video coding block; and generate an encoded video coding block on the basis of the filtered predicted video coding block.

13. The encoding apparatus of claim 12, wherein the processor is further configured to apply the first filter to a boundary between the predicted first segment and the predicted second segment of the predicted video coding block when the divergence measure indicates the first segment motion vector and the second segment motion vector are diverging, wherein the first filter comprises a directional smoothing filter configured to smooth across the boundary between the predicted first segment and the predicted second segment.

14. The encoding apparatus of claim 13, wherein the processor is further configured to: adjust a filter property of the first filter on the basis of the divergence measure; adjust the filter property of the first filter on the basis of the first segment motion vector and the second segment motion vector; or adjust the filter property of the first filter on the basis of a difference between pixel values of pixels located at the boundary between the predicted first segment and the predicted second segment, wherein the filter property of the first filter comprises a first filter strength or a first filter size of the first filter.

15. The encoding apparatus of claim 12, wherein the processor is further configured to determine whether the predicted first segment or the predicted second segment is either a background segment or a foreground segment.

16. The encoding apparatus of claim 15, wherein the processor is further configured to apply the second filter to the predicted video coding block when the divergence measure indicates the first segment motion vector and the second segment motion vector are converging, wherein the second filter comprises a feathering filter configured to either feather in a direction of the background segment or feather in an opposite direction to the foreground segment.

17. The encoding apparatus of claim 15, wherein the processor is further configured to adjust a filter property of the second filter on the basis of the divergence measure, wherein the filter property of the second filter comprises either a second filter strength or a second filter size of the second filter.

18. The encoding apparatus of claim 12, wherein the processor is further configured to: encode information in an encoded video signal indicating whether the predicted first segment or the predicted second segment is the background segment; or encode information in an encoded video signal indicating whether the predicted first segment or the predicted second segment is the foreground segment.

19. The encoding apparatus of claim 12, wherein the first segment motion vector and the second segment motion vector form a vector field, wherein the processor is further configured to determine the divergence measure as the divergence of the vector field, wherein the divergence of the vector field being smaller than a first threshold indicates the first segment motion vector and the second segment motion vector are converging, and wherein the divergence of the vector field being larger than the first threshold indicates the first segment motion vector and the second segment motion vector are diverging.

20. The encoding apparatus of claim 12, wherein the processor is further configured to: shift a boundary between the predicted first segment and the predicted second segment on the basis of a boundary shift vector associated with the boundary; and apply either the first filter or the second filter to the shifted boundary between the predicted first segment and the predicted second segment based on the divergence measure.

21. The encoding apparatus of claim 20, wherein the processor is further configured to determine the boundary shift vector on the basis of a distortion measure between the video coding block and the predicted video coding block.

22. The encoding apparatus of claim 20, wherein the processor is further configured to determine the boundary shift vector from a set of candidate boundary shift vectors, wherein the candidate boundary shift vectors are smaller than or equal to a difference vector between the first segment motion vector and the second segment motion vector.
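
The encoder-side boundary shift search of claims 20-22 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `predict_fn` is a hypothetical callable standing in for the prediction pipeline, the candidate set is the integer grid bounded component-wise by the difference between the two segment motion vectors (claim 22), and sum-of-squared-differences is assumed as the distortion measure of claim 21.

```python
import numpy as np

def select_boundary_shift(block, predict_fn, mv_s0, mv_s1):
    """Pick the boundary shift vector minimising the distortion between
    the original block and the predicted block (claims 20-21). Candidate
    shifts are bounded by |MV_S1 - MV_S0| per component (claim 22)."""
    dmax = np.abs(np.asarray(mv_s1) - np.asarray(mv_s0)).astype(int)
    best, best_cost = (0, 0), float("inf")
    for dx in range(-dmax[0], dmax[0] + 1):
        for dy in range(-dmax[1], dmax[1] + 1):
            pred = predict_fn((dx, dy))  # hypothetical prediction for this shift
            cost = float(np.sum((block.astype(float) - pred) ** 2))  # SSD distortion
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

With a toy `predict_fn` whose error grows with the shift magnitude, the search returns the zero shift, illustrating that only the distortion measure drives the choice.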

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) Further embodiments of the disclosure will be described with respect to the following figures, wherein:

(2) FIG. 1A shows a schematic diagram illustrating an apparatus for encoding a video signal according to an embodiment;

(3) FIG. 1B shows a schematic diagram illustrating a more detailed view of specific components of the encoding apparatus of FIG. 1A;

(4) FIG. 2A shows a schematic diagram illustrating an apparatus for decoding a video signal according to an embodiment;

(5) FIG. 2B shows a schematic diagram illustrating a more detailed view of specific components of the decoding apparatus of FIG. 2A;

(6) FIG. 3 shows a schematic diagram illustrating a method for encoding a video signal according to an embodiment;

(7) FIG. 4 shows a schematic diagram illustrating a method for decoding a video signal according to an embodiment;

(8) FIG. 5 shows a schematic diagram illustrating different aspects implemented in an encoding apparatus and a decoding apparatus according to an embodiment;

(9) FIG. 6 shows a schematic diagram illustrating different aspects implemented in an encoding apparatus and a decoding apparatus according to an embodiment;

(10) FIG. 7 shows a schematic diagram illustrating different aspects implemented in an encoding apparatus and a decoding apparatus according to an embodiment;

(11) FIG. 8 shows a schematic diagram illustrating different aspects implemented in an encoding apparatus and a decoding apparatus according to an embodiment;

(12) FIG. 9 shows a schematic diagram illustrating different aspects implemented in an encoding apparatus and a decoding apparatus according to an embodiment;

(13) FIG. 10 shows a schematic diagram illustrating different aspects implemented in an encoding apparatus and a decoding apparatus according to an embodiment; and

(14) FIG. 11 shows a schematic diagram illustrating different aspects implemented in an encoding apparatus and a decoding apparatus according to an embodiment.

(15) In the various figures, identical reference signs will be used for identical or at least functionally equivalent features.

DETAILED DESCRIPTION OF THE EMBODIMENTS

(16) In the following description, reference is made to the accompanying drawings, which form part of the disclosure, and in which are shown, by way of illustration, specific aspects in which embodiments of the present disclosure may be placed. It is understood that other aspects may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. The following detailed description, therefore, is not to be taken in a limiting sense, as the scope of the present disclosure is defined by the appended claims.

(17) For instance, it is understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a specific method step is described, a corresponding device may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless noted otherwise.

(18) FIG. 1A shows a schematic diagram illustrating an apparatus 100 for encoding a video coding block of a video signal according to an embodiment. The encoding apparatus 100 is configured to encode a video coding block of a video signal comprising a plurality of frames (also referred to as pictures or images herein), wherein each frame is dividable into a plurality of video coding blocks and each video coding block comprises a plurality of pixels. In an embodiment, the video coding blocks could be macro blocks, coding tree units, coding units, prediction units and/or prediction blocks.

(19) In the embodiment shown in FIG. 1A, the encoding apparatus 100 is implemented in the form of a hybrid video coding encoder. In hybrid video coding, an input frame is normally divided into blocks for further processing. The block partitioning is conveyed to the decoder, such as the decoding apparatus 200 shown in FIGS. 2A and 2B. Usually, the first frame of an input video sequence is an intra frame, which is encoded using only intra prediction. To this end, the embodiment of the encoding apparatus 100 shown in FIG. 1A comprises an intra prediction unit 113 for intra prediction. An intra frame can be decoded without information from other frames. The video coding blocks of subsequent frames following the first intra frame can be coded using inter or intra prediction.

(20) In the embodiment shown in FIG. 1A, the encoding apparatus 100 further comprises a segmentation based partitioning unit 121 configured to partition the video coding block into two or more segments including a first segment and a second segment.

(21) In a segmentation based partitioning scheme, the segmentation is obtained through segmentation of reference pictures or frames, which are available at both the encoding side and the decoding side. In order to locate a segmentation matching a currently processed block (which is to be encoded or decoded) in the segmented reference picture, an additional motion vector, called the boundary motion vector MV.sub.B, can be used. The boundary motion vector MV.sub.B can be determined by the encoding apparatus 100 by searching for the segmentation in the segmented reference picture that most closely resembles the segmentation of the currently processed block. Because MV.sub.B is transmitted to the decoding apparatus 200, an additional cost factor limiting the size of MV.sub.B is used. This process is exemplified in FIG. 5, where it can be seen that the segmented reference block closely resembles the segmented current video coding block.
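
A minimal sketch of such a search, assuming a binary segmentation mask, an exhaustive integer-pel search window, and a Lagrangian weight `lam` acting as the cost factor that limits the size of MV.sub.B (the mask representation, window size and cost weighting are all assumptions not fixed by the text):

```python
import numpy as np

def find_boundary_mv(ref_seg, cur_seg, search_range=4, lam=0.1):
    """Search the segmented reference picture for the displacement whose
    segmentation patch best matches the current block's segmentation.
    `ref_seg` is assumed to be padded to shape
    (h + 2*search_range, w + 2*search_range) around the block position.
    The rate term lam*|MV_B| penalises large boundary motion vectors."""
    h, w = cur_seg.shape
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = dy + search_range, dx + search_range
            patch = ref_seg[y:y + h, x:x + w]
            mismatch = int(np.sum(patch != cur_seg))      # segmentation mismatch
            cost = mismatch + lam * (abs(dx) + abs(dy))   # cost factor limits |MV_B|
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

For a reference segmentation containing an exact copy of the current block's segmentation at some offset, the search recovers that offset, since its mismatch term is zero and only the small rate penalty remains.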

(22) Furthermore, the encoding apparatus 100 comprises an inter prediction unit 115. A more detailed view of the inter prediction unit 115 and its environment according to an embodiment is shown in FIG. 1B. Generally, the inter prediction unit 115 can be configured to perform motion estimation, motion compensation for choosing motion data including a selected reference picture, motion vector, mode decision and other information. In the embodiment shown in FIGS. 1A and 1B, the signals input to the inter prediction unit 115 include the input frame S.sub.k and the decoded frame S′.sub.k-1 as output by a frame buffer 119. FIG. 1A shows an embodiment, in which the decoded frame S′.sub.k-1 is first provided from the frame buffer 119 to the segmentation based partitioning unit 121 and from the segmentation based partitioning unit 121 to the inter prediction unit 115. Clearly, this is only a possible configuration and in other embodiments the decoded frame S′.sub.k-1 may also be provided from the frame buffer 119 directly to the inter prediction unit 115. In such an embodiment, a direct connection between the frame buffer 119 and the inter prediction unit 115 would be provided.

(23) Further, the inter prediction unit 115 is configured to determine a co-located first segment in a first reference frame of the video signal and a co-located second segment in a second reference frame of the video signal, wherein the first segment and the co-located first segment define a first segment motion vector MV.sub.S0 and wherein the second segment and the co-located second segment define a second segment motion vector MV.sub.S1. In an embodiment, the first reference frame and the second reference frame can be the same reference frame. In an embodiment, these functions of the inter prediction unit 115 can be provided by a segment motion estimation unit 115a, a segmentation refinement unit 115b and a segment motion compensation unit 115c which will be described in more detail further below.

(24) Moreover, the inter prediction unit 115 is further configured to generate a predicted video coding block on the basis of the co-located first segment and the co-located second segment, wherein the predicted video coding block comprises a predicted first segment and a predicted second segment.

(25) For more details about segmentation based partitioning and generating a predicted video coding block comprising a predicted first segment and a predicted second segment on the basis of a co-located first segment and a co-located second segment, reference is made to WO2008/150113A1, which is fully incorporated herein by reference.

(26) As can be taken from FIGS. 1A and 1B, the encoding apparatus 100 further comprises a motion-dependent filtering unit 116, which is located downstream of the inter prediction unit 115. The motion-dependent filtering unit 116 is configured to determine a divergence measure on the basis of the first segment motion vector MV.sub.S0 and the second segment motion vector MV.sub.S1 and to apply depending on the divergence measure a first filter or a second filter to the predicted video coding block. Further embodiments of the motion-dependent filtering unit 116 will be described in more detail further below.
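
The description leaves the divergence measure abstract; claims 7 and 19 only require that the divergence of the vector field formed by the two segment motion vectors be compared against a first threshold. One possible sketch approximates that divergence by projecting the motion-vector difference onto the axis joining representative segment positions (e.g. centroids); this projection scheme, and the choice of sample points, are assumptions made for illustration:

```python
import numpy as np

def divergence_measure(mv_s0, mv_s1, p0, p1):
    """Approximate divergence of the field formed by MV_S0 at p0 and
    MV_S1 at p1. Positive: the vectors point apart (disocclusion);
    negative: they point together (occlusion)."""
    d = np.asarray(p1, dtype=float) - np.asarray(p0, dtype=float)
    dist = np.linalg.norm(d)
    if dist == 0.0:
        return 0.0
    dv = np.asarray(mv_s1, dtype=float) - np.asarray(mv_s0, dtype=float)
    # Project the motion-vector difference onto the axis joining the segments.
    return float(np.dot(dv, d) / dist)

def classify(measure, threshold=0.0):
    """Claims 7/19: below the threshold -> converging, above -> diverging."""
    return "converging" if measure < threshold else "diverging"
```

Two segments moving away from each other along the line joining them yield a positive measure (the first filter would be selected); swapping the motion vectors yields the negative, converging case.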

(27) Further, the encoding apparatus 100 comprises an encoding unit, which in the embodiment shown in FIG. 1A is provided by an encoding unit 103 and/or an entropy coding unit 105 and which is configured to generate an encoded video coding block on the basis of the filtered predicted video coding block. Further, in the embodiment shown in FIG. 1A the prediction error of the intra/inter picture prediction, which is the difference between the original block and its prediction, is encoded by the encoding unit 103 including such processes as transform, transform skip, scaling, quantization or others. The output of the encoding unit 103 as well as the coding or side information provided by the intra prediction unit 113, the inter prediction unit 115 and a deblocking filter/sample adaptive offset (SAO)/adaptive loop filtering (ALF) unit 117 are further encoded by the entropy coding unit 105.

(28) A hybrid video encoder usually duplicates the decoder processing such that both will generate the same predictions. Thus, in the embodiment shown in FIG. 1A a decoding unit 107 performs the inverse operations of the encoding unit 103 and duplicates the decoded approximation of the prediction error/residual data. The decoded prediction error/residual data is then added to the results of prediction. A reconstruction unit 109 obtains the results of adding the prediction and the residuals. Then, the output of the reconstruction unit 109 might be further processed by one or more filters, summarized by the deblocking filter/SAO/ALF unit 117 shown in FIG. 1A, to smooth the coding artifacts. The final picture is stored in the frame buffer 119 and can be used for the prediction of subsequent frames. As already described above, the segmentation based partitioning unit 121 can perform all possible steps of object boundary based partition including possible pre- and post-processing. The segmentation based partitioning unit 121 can adaptively generate a segmentation for the current block on the basis of one or more reference frames. Segmentation related parameters can be encoded and transmitted as a part of coding or side information to the decoding apparatus 200 shown in FIGS. 2A and 2B.

(29) FIGS. 2A and 2B show respective schematic diagrams illustrating an apparatus 200 for decoding a video signal according to an embodiment, as well as some details thereof. The decoding apparatus 200 is configured to decode a video coding block of a current frame of an encoded video signal, wherein the encoded video signal, which in the embodiment shown in FIG. 2A is provided in the form of a bitstream, comprises coding or side information and a plurality of frames and wherein each frame is divided into a plurality of video coding blocks.

(30) In the embodiment shown in FIG. 2A, the decoding apparatus 200 is implemented as a hybrid decoder. An entropy decoding unit 205 performs entropy decoding of the encoded bitstream, which generally can comprise prediction errors (i.e. residual video coding blocks), motion data and other side information, which are needed, in particular, for an intra prediction unit 213 and an inter prediction unit 215 as well as other components of the decoding apparatus 200, such as a deblocking filter, SAO and ALF unit 217. Generally, the intra prediction unit 213 and the inter prediction unit 215 of the decoding apparatus 200 shown in FIG. 2A perform in the same way as the intra prediction unit 113 and the inter prediction unit 115 of the encoding apparatus 100 shown in FIG. 1A (with the exception that motion estimation is not performed by the decoding apparatus 200) such that identical predictions can be generated by the encoding apparatus 100 and the decoding apparatus 200. Also in the case of the decoding apparatus 200, the signals input to the inter prediction unit 215 include the decoded frame S′.sub.k-1 as output by the frame buffer 219. The schematic block diagram illustrated in FIG. 2A shows a configuration in which the decoded frame is first input from the frame buffer 219 to a segmentation based partitioning unit 221, which will be described in more detail further below, and from the segmentation based partitioning unit 221 to the inter prediction unit 215. Clearly, this is a possible configuration and in other embodiments the decoded frame may also be input from the frame buffer 219 directly to the inter prediction unit 215. In this case a direct connection between the frame buffer 219 and the inter prediction unit 215 would be provided.

(31) The segmentation based partitioning unit 221 of the decoding apparatus 200 is configured to partition the video coding block on the basis of the coding or side information into two or more segments including a first segment and a second segment, wherein the coding information comprises a first segment motion vector MV.sub.S0 associated with the first segment of the video coding block and a second segment motion vector MV.sub.S1 associated with the second segment of the video coding block.

(32) The inter prediction unit 215 of the decoding apparatus 200 is configured to determine on the basis of the first segment motion vector MV.sub.S0 a co-located first segment in a first reference frame and on the basis of the second segment motion vector MV.sub.S1 a co-located second segment in a second reference frame and to generate a predicted video coding block on the basis of the co-located first segment and the co-located second segment, wherein the predicted video coding block comprises a predicted first segment and a predicted second segment. As shown in FIG. 2B, in an embodiment this function of the inter prediction unit 215 or a part thereof can be implemented in a segment motion compensation unit 215c. Further embodiments of the segment motion compensation unit 215c will be described in more detail further below.

(33) A motion-dependent filtering unit 216 of the decoding apparatus 200 is configured to determine a divergence measure on the basis of the first segment motion vector MV.sub.S0 and the second segment motion vector MV.sub.S1 and to apply depending on the divergence measure a first filter or a second filter to the predicted video coding block. Further embodiments of the motion-dependent filtering unit 216 will be described in more detail further below.
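
As a rough illustration of the first filter applied by the motion-dependent filtering units, i.e. the directional smoothing filter of claims 2 and 13 with the adjustable strength and size of claims 3 and 14, the following sketch applies a box average across a vertical segment boundary. The box kernel, the vertical boundary orientation, and the blending by `strength` are assumptions; the claims fix only that the filter smooths across the boundary and that strength/size are adjustable:

```python
import numpy as np

def smooth_across_boundary(block, boundary_cols, strength=0.5, size=3):
    """Directional smoothing across a vertical segment boundary.
    `boundary_cols` lists the columns adjacent to the boundary;
    `strength` and `size` are the adjustable filter properties."""
    out = block.astype(float).copy()
    half = size // 2
    for c in boundary_cols:
        lo = max(c - half, 0)
        hi = min(c + half + 1, block.shape[1])
        # Box average of the horizontal window crossing the boundary.
        smoothed = block[:, lo:hi].astype(float).mean(axis=1)
        # Blend original and smoothed samples according to the strength.
        out[:, c] = (1.0 - strength) * out[:, c] + strength * smoothed
    return out
```

Applied at full strength to a sharp 0/100 step, the two boundary columns move toward the window averages while the columns away from the boundary stay untouched.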

(34) A reconstruction unit 209 of the decoding apparatus 200 is configured to reconstruct the video coding block on the basis of the filtered predicted video coding block and the residual video coding block.

(35) Thus, a decoding apparatus is provided, which is based on segmentation-based partitioning for inter prediction of a video coding block and which provides improved handling of occlusions and disocclusions. In particular, depending on a divergence measure, which indicates whether the first and second segment motion vectors diverge or converge, the decoding apparatus can apply different filters to the segments.
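
The second filter, the feathering filter of claims 5 and 16, could look like the following sketch, in which the foreground prediction is blended out linearly into the background segment over a few columns near the boundary. The linear ramp, the column-aligned boundary, and the blend width are illustrative assumptions; the claims require only feathering in the direction of the background segment (or away from the foreground segment):

```python
import numpy as np

def feather(fg, bg, boundary_col, width=2):
    """Feather the foreground prediction `fg` into the background
    prediction `bg` toward the background segment. Columns left of
    `boundary_col` are pure foreground; over `width` background columns
    the foreground weight decreases linearly."""
    out = bg.astype(float).copy()
    w = out.shape[1]
    out[:, :boundary_col] = fg[:, :boundary_col]   # pure foreground region
    for i in range(width):                         # ramp into the background
        c = boundary_col + i
        if c >= w:
            break
        alpha = 1.0 - (i + 1) / (width + 1)        # decreasing foreground weight
        out[:, c] = alpha * fg[:, c] + (1.0 - alpha) * bg[:, c]
    return out
```

For a flat foreground of 90 next to a flat background of 30, the boundary values step down through intermediate levels instead of jumping, which is the occlusion-masking behaviour the feathering filter is meant to provide.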

(36) FIG. 3 shows a schematic diagram illustrating a method 300 for encoding a video coding block of a current frame of a video signal, wherein the video signal comprises a plurality of frames and wherein each frame is dividable into a plurality of video coding blocks.

(37) The encoding method 300 comprises the steps of: partitioning 301 the video coding block into two or more segments including a first segment and a second segment; determining 303 a co-located first segment in a first reference frame of the video signal and a co-located second segment in a second reference frame of the video signal, wherein the first segment and the co-located first segment define a first segment motion vector and wherein the second segment and the co-located second segment define a second segment motion vector; generating 305 a predicted video coding block on the basis of the co-located first segment and the co-located second segment, wherein the predicted video coding block comprises a predicted first segment and a predicted second segment; determining 307 a divergence measure on the basis of the first segment motion vector and the second segment motion vector; applying 309 depending on the divergence measure a first filter or a second filter to the predicted video coding block; and generating 311 an encoded video coding block on the basis of the filtered predicted video coding block.

(38) FIG. 4 shows a schematic diagram illustrating an embodiment of a method 400 of decoding a video coding block of a current frame of an encoded video signal, wherein the encoded video signal comprises coding information and a plurality of frames and wherein each frame is divided into a plurality of video coding blocks.

(39) The decoding method 400 comprises the steps of: providing 401 a residual video coding block by decoding the video coding block; partitioning 403 the video coding block on the basis of the coding information into two or more segments including a first segment and a second segment, wherein the coding information comprises a first segment motion vector associated with the first segment of the video coding block and a second segment motion vector associated with the second segment of the video coding block; determining 405 on the basis of the first segment motion vector a co-located first segment in a first reference frame and on the basis of the second segment motion vector a co-located second segment in a second reference frame; generating 407 a predicted video coding block on the basis of the co-located first segment and the co-located second segment, wherein the predicted video coding block comprises a predicted first segment and a predicted second segment; determining 409 a divergence measure on the basis of the first segment motion vector and the second segment motion vector; applying 411 depending on the divergence measure a first filter or a second filter to the predicted video coding block; and reconstructing 413 the video coding block on the basis of the filtered predicted video coding block and the residual video coding block.
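The decoding steps 401-413 above can be sketched in a few lines. The following Python fragment is an illustrative sketch only: the array indexing, the identity placeholder standing in for the first/second filter, and all parameter names are assumptions, not the claimed implementation.

```python
import numpy as np

def decode_block(residual, mask, mv_s0, mv_s1, ref0, ref1, x, y, size):
    """Sketch of steps 401-413: motion-compensate each segment, combine
    them with the binary segmentation mask, decide on a filter from the
    motion-vector divergence, and add the residual."""
    # Steps 405/407: fetch the co-located segments from the reference frames.
    pred0 = ref0[y + mv_s0[1]:y + mv_s0[1] + size, x + mv_s0[0]:x + mv_s0[0] + size]
    pred1 = ref1[y + mv_s1[1]:y + mv_s1[1] + size, x + mv_s1[0]:x + mv_s1[0] + size]
    predicted = pred0 * mask + pred1 * (1 - mask)
    # Step 409: divergence measure (cf. paragraph (63)).
    f_div = (mv_s0[0] - mv_s1[0]) + (mv_s0[1] - mv_s1[1])
    # Step 411: a real decoder would apply feathering when f_div < 0 and
    # directional smoothing when f_div > 0; stubbed as identity here.
    filtered = predicted
    # Step 413: reconstruct from the filtered prediction and the residual.
    return filtered + residual
```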

(40) In the following, further embodiments of the disclosure will be described in more detail. It is to be understood that, unless explicitly stated to the contrary, the further embodiments can be implemented in any one of the encoding apparatus 100, the decoding apparatus 200, the encoding method 300 and the decoding method 400.

(41) In an embodiment, the segment motion estimation unit 115a of the inter prediction unit 115 of the encoding apparatus 100 is configured to perform a two-step process.

(42) In a first step, the segment motion estimation unit 115a of the inter prediction unit 115 is configured to perform a segmentation mask matching for the current video coding block, where the best segmentation from a reference frame is chosen according to a cost criterion J.sub.B to be minimized. In an embodiment, the cost criterion J.sub.B can be based on the following equation:
J.sub.B=λ.sub.1D.sub.SAD+λ.sub.2R.sub.MVB,

(43) wherein D.sub.SAD denotes the distortion measured by the sum-of-absolute-differences between the segmentation of the current block and the segmentation of the reference block, R.sub.MVB denotes a rate estimate for the boundary motion vector and λ.sub.1, λ.sub.2 denote quality-dependent Lagrangian multipliers. The boundary motion vector and its associated information can be signaled to the decoding apparatus 200.

(44) In a second step, the segment motion estimation unit 115a is configured to compute, on the basis of the previously estimated complementary segmentation masks M.sub.0, M.sub.1∈{0,1}, the motion vectors for segments S.sub.0 and S.sub.1 by segment-wise motion estimation between the current block C and a reference block R, where

(45) D.sub.SAD.sup.k=Σ.sub.i,j|C(i,j)−R(i,j)|·M.sub.k(i,j)

(46) denotes the segment distortion estimated by the sum-of-absolute-differences (SAD). Block-wise calculation of differences and multiplication by a segmentation mask is a possible implementation of pixel-exact motion estimation. Thus, segment motion vectors can be chosen separately at this point, e.g. by minimizing the residual of each segment according to the minimum of the following cost function:
J.sub.MV.sup.k=D.sub.SAD.sup.k+λ.sub.3R.sub.MV.sup.k.
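The masked SAD of paragraph (45) and the per-segment cost of paragraph (46) can be sketched as follows. The names `lam3` and `rate_mv` are assumed placeholders for the Lagrangian multiplier λ.sub.3 and the motion-vector rate estimate R.sub.MV.sup.k; the value of λ.sub.3 is not specified in the text.

```python
import numpy as np

def masked_sad(current, reference, mask):
    """D_SAD^k = sum_{i,j} |C(i,j) - R(i,j)| * M_k(i,j)  (paragraph (45))."""
    return int(np.sum(np.abs(current.astype(int) - reference.astype(int)) * mask))

def segment_mv_cost(current, reference, mask, rate_mv, lam3=4.0):
    """J_MV^k = D_SAD^k + lambda_3 * R_MV^k  (paragraph (46)).
    lam3 is an assumed quality-dependent Lagrangian multiplier."""
    return masked_sad(current, reference, mask) + lam3 * rate_mv
```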

(47) Once the segment motion vectors have been determined, additional estimation steps may be performed, such as quarter-pixel refinement, testing of bi-prediction, advanced motion vector prediction and testing of motion vector merging. Further details about these further processing steps can be found in other approaches.

(48) After the segment motion estimation performed by the segment motion estimation unit 115a of the inter prediction unit 115, the motion compensated prediction signal generated from both segments might reveal visible errors, for instance, in the case of a disocclusion, as illustrated in FIG. 6. By motion-compensating the background segment S.sub.0, parts of the foreground object from the reference picture may be copied into the prediction signal, which would result in a strong residual in this area. To address this issue, the inter prediction unit 115 of the encoding apparatus 100 can further comprise the segmentation refinement unit 115b shown in FIG. 1B. The purpose of the segmentation refinement unit 115b is to optimize the segmentation mask such that the overall residual block energy measured by D.sub.SATD is minimized. This can be achieved by artificially shifting the segmentation mask, and thereby the boundary between the predicted first segment and the predicted second segment, in the horizontal and vertical direction on the basis of an optimized boundary shift vector. Thus, in an embodiment the segmentation refinement unit 115b is configured to shift the boundary between the predicted first segment and the predicted second segment on the basis of such an optimized boundary shift vector. The motion-dependent filtering unit 116 is configured to apply the first filter or the second filter to the shifted boundary between the predicted first segment and the predicted second segment.

(49) It must be noted that the true object boundaries and the segmentation mask boundaries do not necessarily coincide anymore after this optimization step. Instead of the SAD, the sum-of-absolute-transform-differences (SATD) measure may be used, where H denotes a matrix of Hadamard-transform basis functions:
R.sub.T=H*(C−P.sub.m)*H.sup.T and
D.sub.SATD=Σ.sub.i,j|R.sub.T(i,j)|.

(50) Here, P.sub.m denotes the modified prediction signal generated by shifting or offsetting the complementary masks in the horizontal and vertical directions.
P.sub.m=R(i+k.sub.0,j+l.sub.0)·M.sub.0(i+k.sub.B,j+l.sub.B)+R(i+k.sub.1,j+l.sub.1)·M.sub.1(i+k.sub.B,j+l.sub.B)

(51) MV.sub.Sn=(k.sub.n, l.sub.n), n∈{0,1}, are the segment motion vectors, which remain fixed, and (k.sub.B, l.sub.B) is the shifted boundary motion vector, i.e. the real boundary motion vector plus the optimized boundary shift vector. Thus, in an embodiment the segmentation refinement unit 115b can be configured to determine the optimized boundary shift vector and, thus, an optimized boundary motion vector on the basis of a distortion measure between the video coding block and the predicted video coding block.

(52) The optimization of the boundary shift vector can be performed within a search range that can be inferred from the segment motion vector difference MV.sub.D.
MV.sub.D=MV.sub.S0−MV.sub.S1

(53) where the magnitude of the motion vector difference gives an approximation of the size of, for instance, a disoccluded region. Thus, in an embodiment the segmentation refinement unit 115b is configured to determine the boundary shift vector from a set of candidate boundary shift vectors, wherein the candidate boundary shift vectors are smaller than or equal to a difference vector MV.sub.D between the first segment motion vector MV.sub.S0 and the second segment motion vector MV.sub.S1.

(54) Finally, the optimized boundary shift vector which minimizes the overall distortion D.sub.SATD is chosen; the resulting optimized boundary motion vector is thus the real boundary motion vector plus this optimized boundary shift vector.
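A minimal sketch of the boundary-shift search of paragraphs (49)-(54), assuming a 4×4 block, an exhaustive integer search range bounded by the components of MV.sub.D, and a hypothetical `make_prediction` callback that produces the modified prediction P.sub.m for a given mask shift (the callback and search granularity are assumptions, not part of the disclosed method):

```python
import itertools
import numpy as np

# 4x4 Hadamard matrix for the SATD distortion of paragraph (49).
H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]])

def satd(block):
    """D_SATD = sum |H (C - P_m) H^T| for a 4x4 residual block."""
    return int(np.sum(np.abs(H4 @ block @ H4.T)))

def best_boundary_shift(current, make_prediction, mv_d):
    """Search candidate boundary shifts (k_B, l_B) no larger than the
    segment motion-vector difference MV_D (paragraph (53)) and return
    the shift minimizing the overall distortion D_SATD."""
    best, best_cost = (0, 0), None
    for k, l in itertools.product(range(-abs(mv_d[0]), abs(mv_d[0]) + 1),
                                  range(-abs(mv_d[1]), abs(mv_d[1]) + 1)):
        cost = satd(current - make_prediction(k, l))
        if best_cost is None or cost < best_cost:
            best, best_cost = (k, l), cost
    return best
```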

(55) FIG. 7 shows an example illustrating the advantageous effect provided by the segmentation refinement unit 115b: the refined prediction signal generated by optimizing the boundary motion vector MV.sub.B after motion compensation. By optimizing the boundary motion vector, the disocclusion error has been visibly reduced. Background pixels, which have been included by the segmentation mask of foreground object S.sub.1, are copied into the newly uncovered area.

(56) As already described above, the filtering process implemented in the motion-dependent filtering units 116, 216 is motion dependent and performed during and after the motion compensation process implemented in the segment motion compensation units 115c, 215c at the encoder and decoder side in order to further improve the prediction signal by determining a divergence measure on the basis of the first segment motion vector MV.sub.S0 and the second segment motion vector MV.sub.S1 and applying depending on the divergence measure a first filter or a second filter to the predicted video coding block.

(57) In an embodiment, the motion-dependent filtering unit 116, 216 is configured to apply the first filter to a boundary between the predicted first segment and the predicted second segment of the predicted video coding block, in case the divergence measure indicates that the first segment motion vector MV.sub.S0 and the second segment motion vector MV.sub.S1 are diverging, wherein the first filter comprises a directional smoothing filter for smoothing across the boundary between the predicted first segment and the predicted second segment.

(58) In an embodiment, the motion-dependent filtering unit 116, 216 is further configured to apply the second filter to the predicted video coding block, in case the divergence measure indicates that the first segment motion vector MV.sub.S0 and the second segment motion vector MV.sub.S1 are converging, wherein the second filter comprises a feathering filter for feathering in the direction of a background segment or in the opposite direction of a foreground segment. In addition, the motion-dependent filtering unit 116, 216 can be configured to determine whether the predicted first segment or the predicted second segment is a background segment or to determine whether the predicted first segment or the predicted second segment is a foreground segment.

(59) In an embodiment, the first segment motion vector MV.sub.S0 and the second segment motion vector MV.sub.S1 form a vector field F, and the motion-dependent filtering unit 116 is configured to determine the divergence measure on the basis of the first segment motion vector MV.sub.S0 and the second segment motion vector MV.sub.S1 as the divergence of the vector field F, wherein the divergence of the vector field F being smaller than a first threshold indicates that the first segment motion vector MV.sub.S0 and the second segment motion vector MV.sub.S1 are converging and wherein the divergence of the vector field F being larger than the first threshold indicates that the first segment motion vector MV.sub.S0 and the second segment motion vector MV.sub.S1 are diverging. In an embodiment, the first threshold can be zero, or another value or range set according to the environment in which embodiments of the present disclosure are applied.

(60) Thus, in an embodiment, the filtering decision is inferred from the divergence of the vector field F={MV.sub.S0, MV.sub.S1} according to the following criterion:

(61) ∇·F<0 → occlusion, feathering (i.e. the first segment motion vector and the second segment motion vector are converging.)

(62) ∇·F>0 → disocclusion, directional smoothing filtering (i.e. the first segment motion vector and the second segment motion vector are diverging.)

(63) The vector divergence operator may be approximated by a suitable discrete realization such as finite differences. As two segment motion vectors per block are considered, in an embodiment an approximation of the vector divergence F.sub.Div (also referred to as the divergence of the vector field F) can be obtained from:
F.sub.Div(MV.sub.S0,MV.sub.S1)=(MV.sub.S0,x−MV.sub.S1,x)+(MV.sub.S0,y−MV.sub.S1,y).

(64) Thus, no information controlling the type of filtering needs to be transmitted. In particular, the divergence of the vector field F may be a measure of how much the first and second segment motion vector are converging or diverging. Furthermore, the difference vector may be just the first segment motion vector minus the second segment motion vector.
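The divergence approximation of paragraph (63) and the filter decision of paragraphs (61)-(62) reduce to a few lines; the function names and the zero threshold (the value suggested in paragraph (59)) are illustrative only:

```python
def divergence_measure(mv_s0, mv_s1):
    """F_Div = (MV_S0,x - MV_S1,x) + (MV_S0,y - MV_S1,y)  (paragraph (63))."""
    return (mv_s0[0] - mv_s1[0]) + (mv_s0[1] - mv_s1[1])

def choose_filter(mv_s0, mv_s1, threshold=0):
    """Filter decision of paragraphs (61)-(62): feathering for occlusion
    (converging vectors), directional smoothing for disocclusion
    (diverging vectors). No side information needs to be transmitted."""
    f_div = divergence_measure(mv_s0, mv_s1)
    if f_div < threshold:
        return "feathering"             # occlusion: vectors converge
    if f_div > threshold:
        return "directional_smoothing"  # disocclusion: vectors diverge
    return "none"
```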

(65) For applying the feathering filter, which can also be regarded as a kind of weighted prediction, in case of an occlusion the binary segmentation mask taken from the reference picture can be converted to a multilevel representation. The steep boundary separating the foreground segment and the background segment can be smoothed over a certain distance, where values between 0 and 1 indicate the weighting factor of pixel values between the two segments. By means of this operation, parts of the foreground segment can be blended into the background segment. This is exemplified in FIG. 8. The sum of the two prediction masks M.sub.0 and M.sub.1 should be a matrix of all ones.

(66) The amount of feathering applied in the direction of the background object can be measured by the distance d as indicated in FIG. 9. For strong movements, more feathering may be applied by increasing the value of d. In an embodiment, the value of d can be coupled to the vector divergence F.sub.Div.

(67) In the following, a weighting-matrix based feathering filter is specified as a possible embodiment, which can be implemented using integer arithmetic:

(68) P.sub.m,f(x,y)=⌊(R.sub.c,0·M.sub.0+R.sub.c,1·M.sub.1+s.sub.b/2)/s.sub.b⌋, s.sub.b=2.sup.w.sup.max

(69) wherein R.sub.c,0 and R.sub.c,1 denote the motion-compensated segments, and M.sub.0 and M.sub.1 denote the complementary weighting masks containing integer weights m.sub.x,y depending on the distance d to the boundary as specified in FIG. 9. Thus, a scaling factor s.sub.b can be provided to scale down the weighted sum accordingly, providing the final prediction block P.sub.m,f. For an efficient implementation, it is desirable to choose base-2 scaling such that division operations can be replaced by bit-shifting.
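A sketch of the integer feathering blend of paragraphs (68)-(69). The weight bit depth `W_MAX` is an assumed value, and the complementary mask M.sub.1 is derived so that M.sub.0+M.sub.1 equals s.sub.b everywhere; the division by s.sub.b is realized as a bit-shift, as the text suggests.

```python
import numpy as np

W_MAX = 6          # assumed bit depth of the integer weights
S_B = 1 << W_MAX   # s_b = 2^w_max

def feathering_blend(r0, r1, m0):
    """Integer weighted prediction of paragraph (68): r0, r1 are the
    motion-compensated segments R_c,0 and R_c,1; m0 holds integer
    weights in [0, s_b].  The +s_b/2 term rounds to nearest before the
    final bit-shift replaces the division by s_b."""
    m1 = S_B - m0  # complementary mask so that m0 + m1 == s_b
    return (r0.astype(np.int64) * m0 + r1.astype(np.int64) * m1 + S_B // 2) >> W_MAX
```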

(70) As already described above, in case of a disocclusion, the boundary pixels between the predicted first segment and the predicted second segment can be smoothed or low-pass filtered. This low-pass filter can be implemented as a directional or symmetric filter. To this end, a symmetric window centered at each boundary pixel can be defined, indicating the current region of interest. The low-pass filter may be directional and adapt to the orientation of the boundary within the symmetric window, or it may possess a symmetric kernel (e.g. a 2D-Gaussian) of specified size (e.g. 3×3, 5×5, 7×7, etc. pixels). In an embodiment, the size of the kernel can be inferred from the magnitude of the vector divergence. Additionally, the low-pass filter size and strength may be adapted to the pixel-amplitude difference present along the boundary, where a stronger edge, measured by Δ.sub.B, indicates that more smoothing is needed. This can be realized by comparison against a preset threshold Δ.sub.th, i.e.:
Δ.sub.B=|p.sub.0−p.sub.1|>Δ.sub.th → strong filtering

(71) FIG. 10 exemplifies the operation of a directional smoothing/low-pass filter according to an embodiment, where filtering is performed along the normal of the boundary between the predicted first segment and the predicted second segment. By measuring the difference in pixel intensities at positions p.sub.0 and p.sub.1, the filter strength can be adapted.
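Coupling the kernel size to the magnitude of the vector divergence, as described in paragraph (70), might be sketched as follows; the 3×3/5×5/7×7 mapping and the Gaussian sigma are assumptions for illustration, not values mandated by the text:

```python
import numpy as np

def gaussian_kernel(size, sigma=1.0):
    """Normalized symmetric 2D-Gaussian kernel of the given odd size."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def smooth_boundary_pixel(block, i, j, f_div):
    """Low-pass filter one boundary pixel with a symmetric kernel whose
    size grows with |F_Div| (an assumed 3x3 / 5x5 / 7x7 mapping)."""
    size = min(3 + 2 * (abs(f_div) // 2), 7)
    k = gaussian_kernel(size)
    r = size // 2
    patch = block[i - r:i + r + 1, j - r:j + r + 1]
    return float(np.sum(patch * k))
```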

(72) In an embodiment, the feathering filtering is performed into the direction of the background segment, which can be signaled to the decoding apparatus 200 via an additional flag. As the segmentation process may result in an arbitrary assignment of the foreground object to S.sub.0 or S.sub.1, this ambiguity can be solved by indicating whether S.sub.0 or S.sub.1 is actually the foreground. FIG. 11 shows an exemplary implementation, where this indicator flag 1101a is passed along with the inter-prediction-related coding information or side information, including block-related information such as the segment motion vectors and the segmentation information. The indicator flag 1101a can therefore be part of the block-related information 1101 and can be signaled at the coding-unit level.

(73) Furthermore, signaling of the indicator flag 1101a can be implemented by making use of context adaptation, e.g. choosing a context for a context-adaptive arithmetic encoder which adapts to the shape of segment-based partitioning.

(74) While a particular feature or aspect of the disclosure may have been disclosed with respect to only one of several implementations or embodiments, such feature or aspect may be combined with one or more other features or aspects of the other implementations or embodiments as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms "include," "have," "with," or other variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprise." Also, the terms "exemplary," "for example" and "e.g." are merely meant as an example, rather than the best or optimal. The terms "coupled" and "connected," along with derivatives, may have been used. It should be understood that these terms may have been used to indicate that two elements cooperate or interact with each other regardless of whether they are in direct physical or electrical contact, or they are not in direct contact with each other.

(75) Although specific aspects have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific aspects shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the specific aspects discussed herein.

(76) Although the elements in the following claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.

(77) Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the disclosure beyond those described herein. While embodiments of the present disclosure have been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto without departing from the scope of the present disclosure. It is therefore to be understood that within the scope of the appended claims and their equivalents, the disclosure may be practiced otherwise than as described herein.