Multi-view signal codec
11330242 · 2022-05-10
Assignee
Inventors
- Thomas Wiegand (Berlin, DE)
- Detlev Marpe (Berlin, DE)
- Karsten Mueller (Berlin, DE)
- Philipp Merkle (Erlangen, DE)
- Gerhard Tech (Berlin, DE)
- Hunn Rhee (Berlin, DE)
- Heiko Schwarz (Berlin, DE)
CPC classification
H04N19/159
H04N13/161
H04N19/197
H04N19/139
International classification
H04N13/161
H04N19/159
H04N19/139
H04N19/196
Abstract
Embodiments are described which exploit a finding, according to which a higher compression rate or better rate/distortion ratio may be achieved by adopting or predicting second coding parameters used for encoding a second view of the multi-view signal from first coding parameters used in encoding a first view of the multi-view signal. In other words, the inventors found out that the redundancies between views of a multi-view signal are not restricted to the views themselves, such as the video information thereof, but that the coding parameters used in encoding these views in parallel show similarities which may be exploited in order to further improve the coding rate.
Claims
1. A decoder for decoding an encoded video data stream to generate a multi-view video, the decoder comprising: a view decoder configured for: extracting, from a data stream using a processor, first information associated with a first coding block in a first view of the multi-view video; determining, based on the first information, whether a motion parameter for the first coding block is to be adopted or predicted using one or more coding parameters associated with a second coding block in a second view of the multi-view video; responsive to an affirmation to the determining: obtaining, using the processor, the one or more coding parameters including a motion parameter associated with the second coding block, in case of an indication of prediction, predicting, using the processor, the motion parameter for the first coding block based on the motion parameter associated with the second coding block, and extracting prediction error data associated with the motion parameter for the first coding block, and in case of an indication of adoption, adopting, using the processor, the motion parameter for the first coding block based on the motion parameter associated with the second coding block; generating, using the processor, a prediction of the first coding block based on the motion parameter for the first coding block and in case of the indication of the prediction, the prediction error data; obtaining, from the data stream using the processor, residual data associated with the first coding block; and reconstructing, using the processor, the first coding block using the prediction of the first coding block and the residual data to produce a reconstructed first coding block in the first view of the multi-view video.
2. The decoder of claim 1, wherein each of the first and second views includes different types of information components.
3. The decoder of claim 2, wherein the different types of information components include a video and a depth map corresponding to the video.
4. The decoder of claim 3, wherein the first coding block comprises video data and is reconstructed based on a first subset of the one or more coding parameters associated with the second coding block.
5. The decoder of claim 4, wherein the first coding block comprises depth data and is reconstructed based on a second subset of the one or more coding parameters associated with the second coding block.
6. The decoder of claim 3, wherein the view decoder is further configured for: reconstructing a first depth coding block in the first view based on a first edge associated with the first depth coding block, the decoder further comprising another view decoder configured for: predicting a second edge associated with a second depth coding block of the second view based on the first edge; and reconstructing the second depth coding block of the second view based on the second edge.
7. The decoder of claim 1, wherein the first coding block is reconstructed in accordance with a first spatial resolution and the second coding block is reconstructed in accordance with a second spatial resolution.
8. The decoder of claim 1, wherein the decoder is further configured to generate an intermediate coding block based on the first and the second coding blocks.
9. The decoder of claim 1, wherein the decoder is further configured to generate an intermediate view based on the first and second views.
10. The decoder of claim 1, wherein if the first information is indicative that the motion parameter for the first coding block is to be predicted from a motion parameter associated with a previously-reconstructed portion of the first view, the view decoder is configured for: obtaining, using the processor, the motion parameter associated with the previously-reconstructed portion, and predicting, using the processor, the motion parameter for the first coding block based on the motion parameter associated with the previously-reconstructed portion.
11. The decoder of claim 1, wherein the one or more coding parameters include first parameters related to video of the second coding block and second parameters related to depth of the second coding block in the second view.
12. An encoder configured for encoding, into a data stream, a multi-view video, the encoder comprising: a view encoder configured for encoding, into the data stream using a processor, first information associated with a first coding block in a first view of a multi-view video, wherein the first information is indicative of whether a motion parameter for the first coding block is to be adopted or predicted using one or more coding parameters associated with a second coding block located in a second view of the multi-view video; and encoding, into the data stream using the processor, the one or more coding parameters including a motion parameter associated with the second coding block in the second view and in case of an indication of prediction, prediction error data associated with the motion parameter for the first coding block, if the first information indicates that the motion parameter for the first coding block is to be adopted or predicted using the one or more coding parameters associated with the second coding block in the second view; generating, using the processor, a prediction of the first coding block based at least on the motion parameter for the first coding block; determining, using the processor, residual data associated with the first coding block based on a difference of the first coding block and the prediction of the first coding block; and encoding, into the data stream using the processor, the residual data associated with the first coding block, wherein the first coding block is reconstructed using the prediction of the first coding block, the residual data, and in case of the indication of prediction, the prediction error data to produce a part of a video frame in the first view of the multi-view video.
13. The encoder of claim 12, wherein the first information is further indicative of whether a motion parameter for the first coding block is to be predicted from a motion parameter associated with a previously-reconstructed portion of the first view, and if so, the view encoder is configured for encoding, into the data stream using the processor, the motion parameter associated with the previously-reconstructed portion.
14. The encoder of claim 12, wherein each of the first and second views includes different types of information components.
15. The encoder of claim 14, wherein the different types of information components include a video and a depth map corresponding to the video.
16. The encoder of claim 15, wherein the first coding block comprises video data and is reconstructed based on a first subset of the one or more coding parameters associated with the second coding block.
17. The encoder of claim 16, wherein the first coding block comprises depth data and is reconstructed based on a second subset of the one or more coding parameters associated with the second coding block.
18. The encoder of claim 12, wherein the encoder is further configured to encode an intermediate view based on the first and second views.
19. A non-transitory computer-readable storage medium configured to store information comprising a data stream representing an encoded multi-view video and including: encoded first information associated with a first coding block in a first view of a multi-view video, wherein the first coding block represents a part of a video frame in the first view and the first information is indicative of whether a motion parameter for the first coding block is to be adopted or predicted using one or more coding parameters associated with a second coding block located in a second view of the multi-view video; encoded one or more coding parameters including a motion parameter associated with the second coding block in the second view and in case of an indication of prediction, prediction error data associated with the motion parameter of the first coding block, if the first information indicates that the motion parameter for the first coding block is to be adopted or predicted using the one or more coding parameters associated with the second coding block in the second view; and encoded residual data associated with the first coding block based on a difference of the first coding block and a prediction of the first coding block, wherein the prediction of the first coding block is determined using the motion parameter for the first coding block, wherein the first coding block is reconstructed using the prediction of the first coding block, the residual data, and in case of the indication of prediction, the prediction error data, to produce the part of the video frame in the first view of the multi-view video.
20. The non-transitory computer-readable storage medium of claim 19, wherein the first information is further indicative of whether a motion parameter for the first coding block is to be predicted from a motion parameter associated with a previously-reconstructed portion of the first view, and if so, the data stream further includes an encoded motion parameter associated with the previously-reconstructed portion.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the present application are described below with respect to the figures.
DETAILED DESCRIPTION OF THE INVENTION
(11) The videos 14 of the respective views 12.sub.1 and 12.sub.2 represent a spatio-temporal sampling of a projection of a common scene along different projection/viewing directions. Advantageously, the temporal sampling rates of the videos 14 of the views 12.sub.1 and 12.sub.2 are equal to each other although this constraint does not have to be necessarily fulfilled. As shown in
(12) Similarly, the depth/disparity map data 16 represents a spatio-temporal sampling of the depth of the scene objects of the common scene, measured along the respective viewing direction of views 12.sub.1 and 12.sub.2. The temporal sampling rate of the depth/disparity map data 16 may be equal to the temporal sampling rate of the associated video of the same view as depicted in
(13) In order to compress the multi-view signal 10 effectively, the encoder of
(14) In particular, the encoder of
(15) The coding branch 22.sub.v,1 is for encoding the video 14.sub.1 of the first view 12.sub.1 of the multi-view signal 10, and accordingly branch 22.sub.v,1 has an input for receiving the video 14.sub.1. Beyond this, branch 22.sub.v,1 comprises, connected in series to each other in the order mentioned, a subtracter 24, a quantization/transform module 26, a requantization/inverse-transform module 28, an adder 30, a further processing module 32, a decoded picture buffer 34, two prediction modules 36 and 38 which, in turn, are connected in parallel to each other, and a combiner or selector 40 which is connected between the outputs of the prediction modules 36 and 38 on the one hand and the inverting input of subtracter 24 on the other hand. The output of combiner 40 is also connected to a further input of adder 30. The non-inverting input of subtracter 24 receives the video 14.sub.1.
(16) The elements 24 to 40 of coding branch 22.sub.v,1 cooperate in order to encode video 14.sub.1. The encoding encodes the video 14.sub.1 in units of certain portions. For example, in encoding the video 14.sub.1, the frames v.sub.1,k are segmented into segments such as blocks or other sample groups. The segmentation may be constant over time or may vary in time. Further, the segmentation may be known to encoder and decoder by default or may be signaled within the data stream 18. The segmentation may be a regular segmentation of the frames into blocks such as a non-overlapping arrangement of blocks in rows and columns, or may be a quad-tree based segmentation into blocks of varying size. A currently encoded segment of video 14.sub.1 entering at the non-inverting input of subtracter 24 is called a current portion of video 14.sub.1 in the following description.
(17) Prediction modules 36 and 38 are for predicting the current portion and to this end, prediction modules 36 and 38 have their inputs connected to the decoded picture buffer 34. In effect, both prediction modules 36 and 38 use previously reconstructed portions of video 14.sub.1 residing in the decoded picture buffer 34 in order to predict the current portion/segment entering the non-inverting input of subtracter 24. In this regard, prediction module 36 acts as an intra predictor spatially predicting the current portion of video 14.sub.1 from spatially neighboring, already reconstructed portions of the same frame of the video 14.sub.1, whereas the prediction module 38 acts as an inter predictor temporally predicting the current portion from previously reconstructed frames of the video 14.sub.1. Both modules 36 and 38 perform their predictions in accordance with, or described by, certain prediction parameters. To be more precise, the latter parameters are determined by the encoder 20 in some optimization framework for optimizing some optimization aim such as optimizing a rate/distortion ratio under some, or without any, constraints such as maximum bitrate.
(18) For example, the intra prediction module 36 may determine spatial prediction parameters for the current portion such as a prediction direction along which content of neighboring, already reconstructed portions of the same frame of video 14.sub.1 is expanded/copied into the current portion to predict the latter. The inter prediction module 38 may use motion compensation so as to predict the current portion from previously reconstructed frames and the inter prediction parameters involved therewith may comprise a motion vector, a reference frame index, motion prediction subdivision information regarding the current portion, a hypothesis number or any combination thereof. The combiner 40 may combine one or more of the predictions provided by modules 36 and 38 or select merely one thereof. The combiner or selector 40 forwards the resulting prediction of the current portion to the inverting input of subtracter 24 and the further input of adder 30, respectively.
(19) At the output of subtracter 24, the residual of the prediction of the current portion is output and quantization/transform module 26 is configured to transform this residual signal and to quantize the transform coefficients. The transform may be any spectrally decomposing transform such as a DCT. Due to the quantization, the processing result of the quantization/transform module 26 is irreversible. That is, coding loss results. The output of module 26 is the residual signal 42.sub.1 to be transmitted within the data stream. The residual signal 42.sub.1 is dequantized and inverse transformed in module 28 so as to reconstruct the residual signal as far as possible, i.e. so as to correspond to the residual signal as output by subtracter 24 despite the quantization noise. Adder 30 combines this reconstructed residual signal with the prediction of the current portion by summation. Other combinations would also be feasible. For example, the subtracter 24 could operate as a divider for measuring the residuum in ratios, and the adder could be implemented as a multiplier to reconstruct the current portion, in accordance with an alternative. The output of adder 30, thus, represents a preliminary reconstruction of the current portion. Further processing, however, in module 32 may optionally be used to enhance the reconstruction. Such further processing may, for example, involve deblocking, adaptive filtering and the like. All reconstructions available so far are buffered in the decoded picture buffer 34. Thus, the decoded picture buffer 34 buffers previously reconstructed frames of video 14.sub.1 and previously reconstructed portions of the current frame which the current portion belongs to.
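The cooperation of subtracter 24, quantization/transform module 26, requantization/inverse-transform module 28 and adder 30 can be illustrated with a small numerical sketch. This is a minimal illustration only, assuming an 8x8 block, an orthonormal DCT-II and a fixed quantization step size; none of these choices is prescribed by the embodiments, which admit any spectrally decomposing transform.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis; any spectrally decomposing transform would do."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def encode_block(block, prediction, qstep=16.0):
    """Subtracter 24 -> quantization/transform module 26: residual to quantized levels."""
    C = dct_matrix(block.shape[0])
    residual = block - prediction                  # subtracter 24
    coeffs = C @ residual @ C.T                    # spectral transform
    return np.round(coeffs / qstep)                # quantization (the lossy step)

def reconstruct_block(levels, prediction, qstep=16.0):
    """Modules 28 and 30: dequantize, inverse transform, add the prediction."""
    C = dct_matrix(levels.shape[0])
    coeffs = levels * qstep                        # requantization
    residual = C.T @ coeffs @ C                    # inverse transform
    return prediction + residual                   # adder 30

# toy current portion and an imperfect (e.g. inter) prediction of it
rng = np.random.default_rng(0)
block = rng.integers(0, 255, (8, 8)).astype(float)
prediction = block + rng.normal(0, 4, (8, 8))
levels = encode_block(block, prediction)
recon = reconstruct_block(levels, prediction)
print("max reconstruction error:", np.abs(recon - block).max())
```

The printed value merely confirms that the preliminary reconstruction deviates from the current portion only by quantization noise.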
(20) In order to enable the decoder to reconstruct the multi-view signal from data stream 18, quantization/transform module 26 forwards the residual signal 42.sub.1 to a multiplexer 44 of encoder 20. Concurrently, prediction module 36 forwards intra prediction parameters 46.sub.1 to multiplexer 44, inter prediction module 38 forwards inter prediction parameters 48.sub.1 to multiplexer 44 and further processing module 32 forwards further-processing parameters 50.sub.1 to multiplexer 44 which, in turn, multiplexes or inserts all this information into data stream 18.
(21) As became clear from the above discussion in accordance with the embodiment of
(22) The just-mentioned coding parameters inserted into the data stream 18 by coding branch 22.sub.v,1 may involve one, a combination of, or all of the following: First, the coding parameters for video 14.sub.1 may define/signal the segmentation of the frames of video 14.sub.1 as briefly discussed before. Further, the coding parameters may comprise coding mode information indicating for each segment or current portion, which coding mode is to be used to predict the respective segment such as intra prediction, inter prediction, or a combination thereof. The coding parameters may also comprise the just-mentioned prediction parameters such as intra prediction parameters for portions/segments predicted by intra prediction, and inter prediction parameters for inter predicted portions/segments. The coding parameters may, however, additionally comprise further-processing parameters 50.sub.1 signalling to the decoding side how to further process the already reconstructed portions of video 14.sub.1 before using same for predicting the current or following portions of video 14.sub.1. These further processing parameters 50.sub.1 may comprise indices indexing respective filters, filter coefficients or the like. The prediction parameters 46.sub.1, 48.sub.1 and the further processing parameters 50.sub.1 may even additionally comprise sub-segmentation data in order to define a further sub-segmentation relative to the afore-mentioned segmentation defining the granularity of the mode selection, or defining a completely independent segmentation such as for the application of different adaptive filters for different portions of the frames within the further-processing. Coding parameters may also influence the determination of the residual signal and thus, be part of the residual signal 42.sub.1. For example, spectral transform coefficient levels output by quantization/transform module 26 may be regarded as correction data, whereas the quantization step size may be signalled within the data stream 18 as well, and the quantization step size parameter may be regarded as a coding parameter in the sense of the description brought forward below. The coding parameters may further define prediction parameters defining a second-stage prediction of the prediction residual of the first prediction stage discussed above. Intra/inter prediction may be used in this regard.
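As a purely illustrative way of picturing the various kinds of coding parameters enumerated above, one might group them per segment as in the following sketch; the field names and types are hypothetical and not taken from the embodiments or from any standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SegmentCodingParameters:
    # segmentation: e.g. quad-tree split flags defining this segment's size and position
    split_flags: List[int] = field(default_factory=list)
    # coding mode information: 'intra', 'inter' or a combination thereof
    mode: str = "inter"
    # intra prediction parameters 46: e.g. a spatial prediction direction (in degrees)
    intra_direction: Optional[float] = None
    # inter prediction parameters 48: motion vectors, reference indices, hypothesis count
    motion_vectors: List[Tuple[float, float]] = field(default_factory=list)
    reference_indices: List[int] = field(default_factory=list)
    # further-processing parameters 50: e.g. index of an adaptive loop filter
    filter_index: Optional[int] = None
    # residual-related parameter: quantization step size used for this segment
    quant_step: float = 16.0

params = SegmentCodingParameters(mode="inter", motion_vectors=[(1.5, -0.25)], reference_indices=[0])
print(params)
```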
(23) In order to increase the coding efficiency, encoder 20 comprises a coding information exchange module 52 which receives all coding parameters and further information influencing, or being influenced by, the processing within modules 36, 38 and 32, for example, as illustratively indicated by vertically extending arrows pointing from the respective modules down to coding information exchange module 52. The coding information exchange module 52 is responsible for sharing the coding parameters and optionally further coding information among the coding branches 22 so that the branches may predict or adopt coding parameters from each other. In the embodiment of
(24) As already denoted above, the further coding branches 22 such as coding branch 22.sub.d,1, 22.sub.v,2 and 22.sub.d,2 act similarly to coding branch 22.sub.v,1 in order to encode the respective input 16.sub.1, 14.sub.2 and 16.sub.2, respectively. However, due to the just-mentioned order among the videos and depth/disparity map data of views 12.sub.1 and 12.sub.2, respectively, and the corresponding order defined among the coding branches 22, coding branch 22.sub.d,1 has, for example, additional freedom in predicting coding parameters to be used for encoding current portions of the depth/disparity map data 16.sub.1 of the first view 12.sub.1. This is because of the afore-mentioned order among video and depth/disparity map data of the different views: For example, each of these entities is allowed to be encoded using reconstructed portions of itself as well as entities thereof preceding in the afore-mentioned order among these data entities. Accordingly, in encoding the depth/disparity map data 16.sub.1, the coding branch 22.sub.d,1 is allowed to use information known from previously reconstructed portions of the corresponding video 14.sub.1. How branch 22.sub.d,1 exploits the reconstructed portions of the video in order to predict some property of the depth/disparity map data 16.sub.1, which enables a better compression rate of the compression of the depth/disparity map data 16.sub.1, is described in more detail below. Beyond this, however, coding branch 22.sub.d,1 is able to predict/adopt coding parameters involved in encoding video 14.sub.1 as mentioned above, in order to obtain coding parameters for encoding the depth/disparity map data 16.sub.1. In case of adoption, the signaling of any coding parameters regarding the depth/disparity map data 16.sub.1 within the data stream 18 may be suppressed. In case of prediction, merely the prediction residual/correction data regarding these coding parameters may have to be signaled within the data stream 18. Examples of such prediction/adoption of coding parameters are described further below, too.
(25) Additional prediction capabilities are present for the subsequent data entities, namely the video 14.sub.2 and the depth/disparity map data 16.sub.2 of the second view 12.sub.2. Regarding these coding branches, the inter prediction module thereof is able to not only perform temporal prediction, but also inter-view prediction. The corresponding inter prediction parameters comprise similar information as compared to temporal prediction, namely, per inter-view predicted segment, a disparity vector, a view index, a reference frame index and/or an indication of a number of hypotheses, i.e. the indication of a number of inter predictions participating in forming the inter-view inter prediction by way of summation, for example. Such inter-view prediction is available not only for branch 22.sub.v,2 regarding the video 14.sub.2, but also for the inter prediction module 38 of branch 22.sub.d,2 regarding the depth/disparity map data 16.sub.2. Naturally, these inter-view prediction parameters also represent coding parameters which may serve as a basis for adoption/prediction for subsequent view data of a possible third view which is, however, not shown in
(26) Due to the above measures, the amount of data to be inserted into the data stream 18 by multiplexer 44 is further lowered. In particular, the amount of coding parameters of coding branches 22.sub.d,1, 22.sub.v,2 and 22.sub.d,2 may be greatly reduced by adopting coding parameters of preceding coding branches or merely inserting prediction residuals relative thereto into the data stream 18 via multiplexer 44. Due to the ability to choose between temporal and inter-view prediction, the amount of residual data 42.sub.3 and 42.sub.4 of coding branches 22.sub.v,2 and 22.sub.d,2 may be lowered, too. The reduction in the amount of residual data over-compensates the additional coding effort in differentiating temporal and inter-view prediction modes.
(27) In order to explain the principles of coding parameter adoption/prediction in more detail, reference is made to
(28) In encoding the depth/disparity map d.sub.1,t, the coding branch 22.sub.d,1 may exploit the above-mentioned possibilities in one or more of the below manners exemplified in the following with respect to
(29) For example, in encoding the depth/disparity map d.sub.1,t, coding branch 22.sub.d,1 may adopt the segmentation of video frame v.sub.1,t as used by coding branch 22.sub.v,1. Accordingly, if there are segmentation parameters within the coding parameters for video frame v.sub.1,t, the retransmission thereof for depth/disparity map data d.sub.1,t may be avoided. Alternatively, coding branch 22.sub.d,1 may use the segmentation of video frame v.sub.1,t as a basis/prediction for the segmentation to be used for depth/disparity map d.sub.1,t with signalling the deviation of the segmentation relative to video frame v.sub.1,t via the data stream 18.
(30) Further, coding branch 22.sub.d,1 may adopt or predict the coding modes of the portions 66a, 66b and 66c of the depth/disparity map d.sub.1,t from the coding modes assigned to the respective portion 60a, 60b and 60c in video frame v.sub.1,t. In case of a differing segmentation between video frame v.sub.1,t and depth/disparity map d.sub.1,t, the adoption/prediction of coding modes from video frame v.sub.1,t may be controlled such that the adoption/prediction is obtained from co-located portions of the segmentation of the video frame v.sub.1,t. An appropriate definition of co-location could be as follows. The co-located portion in video frame v.sub.1,t for a current portion in depth/disparity map d.sub.1,t may, for example, be the one comprising the co-located position at the upper left corner of the current portion in the depth/disparity map d.sub.1,t. In case of prediction of the coding modes, coding branch 22.sub.d,1 may signal the coding mode deviations of the portions 66a to 66c of the depth/disparity map d.sub.1,t relative to the coding modes within video frame v.sub.1,t explicitly signalled within the data stream 18.
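The co-location rule and the adopt/predict choice for coding modes described in this paragraph can be sketched as follows; the segment representation, the stream-side flags and the handling of a mode deviation are illustrative assumptions only.

```python
def co_located_segment(video_segments, sample_position):
    """Return the video-frame segment containing the given sample position (here the
    upper-left corner of the current depth/disparity portion)."""
    x, y = sample_position
    for seg in video_segments:
        x0, y0, w, h = seg["rect"]
        if x0 <= x < x0 + w and y0 <= y < y0 + h:
            return seg
    raise ValueError("no co-located segment found")

def derive_depth_mode(video_segments, depth_segment, signalled):
    """Adopt the coding mode of the co-located video portion, or treat it as a prediction
    to which an explicitly signalled deviation is applied."""
    ref = co_located_segment(video_segments, depth_segment["rect"][:2])
    if signalled["adopt"]:
        return ref["mode"]                                    # nothing coded for the depth mode
    return signalled.get("mode_deviation", ref["mode"])       # prediction plus signalled deviation

video_segments = [{"rect": (0, 0, 16, 16), "mode": "inter"},
                  {"rect": (16, 0, 16, 16), "mode": "intra"}]
depth_segment = {"rect": (18, 4, 8, 8)}
print(derive_depth_mode(video_segments, depth_segment, {"adopt": True}))   # -> intra
```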
(31) As far as the prediction parameters are concerned, the coding branch 22.sub.d,1 has the freedom to spatially adopt or predict prediction parameters used to encode neighbouring portions within the same depth/disparity map d.sub.1,t or to adopt/predict same from prediction parameters used to encode co-located portions 60a to 60c of video frame v.sub.1,t. For example,
(32) In terms of coding efficiency, it might be favourable for the coding branch 22.sub.d,1 to have the ability to subdivide segments of the pre-segmentation of the depth/disparity map d.sub.1,t using a so-called wedgelet separation line 70 with signalling the location of this wedgelet separation line 70 to the decoding side within data stream 18. By this measure, in the example of
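A hedged sketch of how a wedgelet separation line might be derived from an edge in an already reconstructed video block is given below; the gradient-based edge detector and the least-squares line fit are assumptions for illustration and not the specific derivation used by the embodiments.

```python
import numpy as np

def fit_wedgelet_line(video_block):
    """Fit a straight separation line through the strongest-gradient samples of a block."""
    gy, gx = np.gradient(video_block.astype(float))
    mag = np.hypot(gx, gy)
    ys, xs = np.where(mag > 0.5 * mag.max())        # strongest edge samples
    if len(xs) < 2:
        return None                                 # no usable edge -> no wedgelet prediction
    slope, intercept = np.polyfit(xs, ys, 1)        # least-squares line through edge samples
    return slope, intercept

def wedgelet_mask(shape, slope, intercept):
    """Binary mask splitting the block into the two wedgelet partitions."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    return yy > slope * xx + intercept

# toy block with a diagonal luminance edge
block = np.fromfunction(lambda y, x: (y > 0.7 * x + 2) * 200.0, (8, 8))
line = fit_wedgelet_line(block)
mask = wedgelet_mask(block.shape, *line)
print(mask.astype(int))
```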
(34) Despite this difference, coding branch 22.sub.v,2 may additionally exploit all of the information available from the encoding of video frame v.sub.1,t and depth/disparity map d.sub.1,t such as, in particular, the coding parameters used in these encodings. Accordingly, coding branch 22.sub.v,2 may adopt or predict the motion parameters including motion vector 78 for a temporally inter predicted portion 74a of video frame v.sub.2,t from any one of, or a combination of, the motion vectors 62a and 68a of co-located portions 60a and 66a of the temporally aligned video frame v.sub.1,t and depth/disparity map d.sub.1,t respectively. If at all, a prediction residual may be signaled with respect to the inter prediction parameters for portion 74a. In this regard, it should be recalled that the motion vector 68a may have already been subject to prediction/adoption from motion vector 62a itself.
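The adoption/prediction of motion parameters across data entities, as described for portion 74a, can be pictured as choosing a predictor from the co-located motion vectors 62a and 68a and either taking it over directly or refining it with a transmitted residual. The candidate ordering and the stream-side fields in the following sketch are hypothetical.

```python
def derive_motion_vector(co_located_video_mv, co_located_depth_mv, signalled):
    """Adopt or predict the motion vector of a portion of v2,t (e.g. 78 for portion 74a)
    from the co-located motion vectors 62a of v1,t and 68a of d1,t."""
    if signalled.get("explicit") is not None:
        return signalled["explicit"]                         # coded anew in the data stream
    candidates = [mv for mv in (co_located_video_mv, co_located_depth_mv) if mv is not None]
    predictor = candidates[signalled.get("candidate_index", 0)]
    if signalled.get("adopt", False):
        return predictor                                     # adoption: no residual transmitted
    dx, dy = signalled["mv_residual"]                        # prediction: predictor plus residual
    return (predictor[0] + dx, predictor[1] + dy)

mv_62a, mv_68a = (3.0, -1.0), (2.5, -1.0)
print(derive_motion_vector(mv_62a, mv_68a, {"adopt": False, "mv_residual": (0.5, 0.0)}))  # (3.5, -1.0)
```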
(35) The other possibilities of adopting/predicting coding parameters for encoding video frame v.sub.2,t as described above with respect to the encoding of depth/disparity map d.sub.1,t, are applicable to the encoding of the video frame v.sub.2,t by coding branch 22.sub.v,2 as well, with the available common data distributed by module 52 being, however, increased because the coding parameters of both the video frame v.sub.1,t and the corresponding depth/disparity map d.sub.1,t are available.
(36) Then, coding branch 22.sub.d,2 encodes the depth/disparity map d.sub.2,t similarly to the encoding of the depth/disparity map d.sub.1,t by coding branch 22.sub.d,1. This is true, for example, with respect to all of the coding parameter adoption/prediction occasions from the video frame v.sub.2,t of the same view 12.sub.2. Additionally, however, coding branch 22.sub.d,2 has the opportunity to also adopt/predict coding parameters from coding parameters having been used for encoding the depth/disparity map d.sub.1,t of the preceding view 12.sub.1. Additionally, coding branch 22.sub.d,2 may use inter-view prediction as explained with respect to the coding branch 22.sub.v,2.
(37) With regard to the coding parameter adoption/prediction, it may be worthwhile to restrict the possibility of the coding branch 22.sub.d,2 to adopt/predict its coding parameters from the coding parameters of previously coded entities of the multi-view signal 10 to the video 14.sub.2 of the same view 12.sub.2 and the depth/disparity map data 16.sub.1 of the neighboring, previously coded view 12.sub.1 in order to reduce the signaling overhead stemming from the necessity to signal to the decoding side within the data stream 18 the source of adoption/prediction for the respective portions of the depth/disparity map d.sub.2,t. For example, the coding branch 22.sub.d,2 may predict the prediction parameters for an inter-view predicted portion 80a of depth/disparity map d.sub.2,t including disparity vector 82 from the disparity vector 76 of the co-located portion 74b of video frame v.sub.2,t. In this case, an indication of the data entity from which the adoption/prediction is conducted, namely video 14.sub.2 in the case of
(38) Regarding the separation lines, the coding branch 22.sub.d,2 has the following options in addition to those already discussed above: For coding the depth/disparity map d.sub.2,t of view 12.sub.2 by using a wedgelet separation line, the corresponding disparity-compensated portions of signal d.sub.1,t can be used, such as by edge detection and implicitly deriving the corresponding wedgelet separation line. Disparity compensation is then used to transfer the detected line in depth/disparity map d.sub.1,t to depth/disparity map d.sub.2,t. For disparity compensation the foreground depth/disparity values along the respective detected edge in depth/disparity map d.sub.1,t may be used. Alternatively, for coding the depth/disparity map d.sub.2,t of view 12.sub.2 by using a wedgelet separation line, the corresponding disparity-compensated portions of signal d.sub.1,t can be used, by using a given wedgelet separation line in the disparity-compensated portion of d.sub.1,t, i.e. using a wedgelet separation line having been used in coding a co-located portion of the signal d.sub.1,t as a predictor or adopting same.
(39) After having described the encoder 20 of
(41) The decoder of
(42) The demultiplexer 104 is for distributing the data stream 18 to the various decoding branches 106. For example, the demultiplexer 104 provides the dequantization/inverse-transform module 28 with the residual data 42.sub.1, the further processing module 32 with the further-processing parameters 50.sub.1, the intra prediction module 36 with the intra prediction parameters 46.sub.1 and the inter prediction module 38 with the inter prediction parameters 48.sub.1. The coding parameter exchanger 110 acts like the corresponding module 52 in
(43) The view extractor 108 receives the multi-view signal as reconstructed by the parallel decoding branches 106 and extracts therefrom one or several views 102 corresponding to the view angles or view directions prescribed by externally provided intermediate view extraction control data 112.
(44) Due to the similar construction of the decoder 100 relative to the corresponding portion of the encoder 20, its functionality up to the interface to the view extractor 108 is easily explained analogously to the above description.
(45) In fact, decoding branches 106.sub.v,1 and 106.sub.d,1 act together to reconstruct the first view 12.sub.1 of the multi-view signal 10 from the data stream 18 by, according to first coding parameters contained in the data stream 18 (such as scaling parameters within 42.sub.1, the parameters 46.sub.1, 48.sub.1, 50.sub.1, and the corresponding non-adopted ones, and prediction residuals, of the coding parameters of the second branch 106.sub.d,1, namely 42.sub.2, parameters 46.sub.2, 48.sub.2, 50.sub.2), predicting a current portion of the first view 12.sub.1 from a previously reconstructed portion of the multi-view signal 10, reconstructed from the data stream 18 prior to the reconstruction of the current portion of the first view 12.sub.1 and correcting a prediction error of the prediction of the current portion of the first view 12.sub.1 using first correction data, i.e. within 42.sub.1 and 42.sub.2, also contained in the data stream 18. While decoding branch 106.sub.v,1 is responsible for decoding the video 14.sub.1, decoding branch 106.sub.d,1 assumes responsibility for reconstructing the depth/disparity map data 16.sub.1. See, for example,
(46) As far as the second decoding branch 106.sub.d,1 is concerned, same has access not only to the residual data 42.sub.2 and the corresponding prediction and filter parameters as signaled within the data stream 18 and distributed to the respective decoding branch 106.sub.d,1 by demultiplexer 104, i.e. the coding parameters not predicted across inter-view boundaries, but also indirectly to the coding parameters and correction data provided via demultiplexer 104 to decoding branch 106.sub.v,1 or any information derivable therefrom, as distributed via coding information exchange module 110. Thus, the decoding branch 106.sub.d,1 determines its coding parameters for reconstructing the depth/disparity map data 16.sub.1 from a portion of the coding parameters forwarded via demultiplexer 104 to the pair of decoding branches 106.sub.v,1 and 106.sub.d,1 for the first view 12.sub.1, which partially overlaps the portion of these coding parameters especially dedicated and forwarded to the decoding branch 106.sub.v,1. For example, decoding branch 106.sub.d,1 determines motion vector 68a from motion vector 62a explicitly transmitted within 48.sub.1, for example, as a motion vector difference to another neighboring portion of frame v.sub.1,t, on the one hand, and a motion vector difference explicitly transmitted within 48.sub.2, on the other hand. Additionally, or alternatively, the decoding branch 106.sub.d,1 may use reconstructed portions of the video 14.sub.1 as described above with respect to the prediction of the wedgelet separation line to predict coding parameters for decoding depth/disparity map data 16.sub.1.
(47) To be even more precise, the decoding branch 106.sub.d,1 reconstructs the depth/disparity map data 16.sub.1 of the first view 12.sub.1 from the data stream by use of coding parameters which are at least partially predicted from the coding parameters used by the decoding branch 106.sub.v,1 (or adopted therefrom) and/or predicted from the reconstructed portions of video 14.sub.1 in the decoded picture buffer 34 of the decoding branch 106.sub.v,1. Prediction residuals of the coding parameters may be obtained via demultiplexer 104 from the data stream 18. Other coding parameters for decoding branch 106.sub.d,1 may be transmitted within data stream 18 in full or with respect to another basis, namely referring to a coding parameter having been used for coding any of the previously reconstructed portions of depth/disparity map data 16.sub.1 itself. Based on these coding parameters, the decoding branch 106.sub.d,1 predicts a current portion of the depth/disparity map data 16.sub.1 from a previously reconstructed portion of the depth/disparity map data 16.sub.1, reconstructed from the data stream 18 by the decoding branch 106.sub.d,1 prior to the reconstruction of the current portion of the depth/disparity map data 16.sub.1, and corrects a prediction error of the prediction of the current portion of the depth/disparity map data 16.sub.1 using the respective correction data 42.sub.2.
(48) Thus, the data stream 18 may comprise for a portion such as portion 66a of the depth/disparity map data 16.sub.1, the following:
(49) an indication as to whether, or as to which part of, the coding parameters for that current portion are to be adopted or predicted from corresponding coding parameters, for example, of a co-located and time-aligned portion of video 14.sub.1 (or from other video 14.sub.1 specific data such as the reconstructed version thereof in order to predict the wedgelet separation line),
(50) if so, in case of prediction, the coding parameter residual,
(51) if not, all coding parameters for the current portion, wherein same may be signaled as prediction residuals compared to coding parameters of previously reconstructed portions of the depth/disparity map data 16.sub.1,
(52) if not all coding parameters are to be predicted/adopted as mentioned above, a remaining part of the coding parameters for the current portion, wherein same may be signaled as prediction residuals compared to coding parameters of previously reconstructed portions of the depth/disparity map data 16.sub.1.
(53) For example, if the current portion is an inter predicted portion such as portion 66a, the motion vector 68a may be signaled within the data stream 18 as being adopted or predicted from motion vector 62a. Further, decoding branch 106.sub.d,1 may predict the location of the wedgelet separation line 70 depending on detected edges 72 in the reconstructed portions of video 14.sub.1 as described above and apply this wedgelet separation line either without any signalization within the data stream 18 or depending on a respective application signalization within the data stream 18. In other words, the application of the wedgelet separation line prediction for a current frame may be suppressed or allowed by way of signalization within the data stream 18. In even other words, the decoding branch 106.sub.d,1 may effectively predict the circumference of the currently reconstructed portion of the depth/disparity map data.
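Read as a decoding recipe, the alternatives listed above could be evaluated for a single depth/disparity portion roughly as in the sketch below; the stream-entry layout and helper names are invented for illustration and do not reproduce an actual bitstream syntax.

```python
def decode_depth_portion_parameters(stream_entry, co_located_video_params, previous_depth_params):
    """Resolve the coding parameters of a depth/disparity portion (e.g. 66a) according to the
    adoption/prediction indication carried in the data stream for that portion."""
    if stream_entry["mode"] == "adopt":                 # adopted from the co-located video portion
        return dict(co_located_video_params)
    if stream_entry["mode"] == "predict":               # predicted, plus a transmitted residual
        params = dict(co_located_video_params)
        mvx, mvy = params["motion_vector"]
        dx, dy = stream_entry["residual"]["motion_vector"]
        params["motion_vector"] = (mvx + dx, mvy + dy)
        return params
    # neither adopted nor predicted across the video/depth boundary: the parameters are coded
    # relative to previously reconstructed portions of the depth/disparity map itself
    params = dict(previous_depth_params)
    params.update(stream_entry["explicit"])
    return params

video_params = {"motion_vector": (4.0, 0.0), "reference_index": 0}   # e.g. of co-located portion 60a
prev_depth = {"motion_vector": (0.0, 0.0), "reference_index": 0}
entry = {"mode": "predict", "residual": {"motion_vector": (-0.25, 0.0)}}
print(decode_depth_portion_parameters(entry, video_params, prev_depth))
```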
(54) The functionality of the pair of decoding branches 106.sub.v,2 and 106.sub.d,2 for the second view 12.sub.2 is, as already described above with respect to encoding, similar as for the first view 12.sub.1. Both branches cooperate to reconstruct the second view 12.sub.2 of the multi-view signal 10 from the data stream 18 by use of own coding parameters. Merely that part of these coding parameters needs to be transmitted and distributed via demultiplexer 104 to any of these two decoding branches 106.sub.v,2 and 106.sub.d,2, which is not adopted/predicted across the view boundary between views 12.sub.1 and 12.sub.2, and, optionally, a residual of the inter-view predicted part. Current portions of the second view 12.sub.2 are predicted from previously reconstructed portions of the multi-view signal 10, reconstructed from the data stream 18 by any of the decoding branches 106 prior to the reconstruction of the respective current portions of the second view 12.sub.2, with the prediction error being corrected accordingly using the correction data, i.e. 42.sub.3 and 42.sub.4, forwarded by the demultiplexer 104 to this pair of decoding branches 106.sub.v,2 and 106.sub.d,2.
(55) The decoding branch 106.sub.v,2 is configured to at least partially adopt or predict its coding parameters from the coding parameters used by any of the decoding branches 106.sub.v,1 and 106.sub.d,1. The following information on coding parameters may be present for a current portion of the video 14.sub.2:
(56) an indication as to whether, or as to which part of, the coding parameters for that current portion are to be adopted or predicted from corresponding coding parameters, for example, of a co-located and time-aligned portion of video 14.sub.1 or depth/disparity data 16.sub.1,
(57) if so, in case of prediction, the coding parameter residual,
(58) if not, all coding parameters for the current portion, wherein same may be signaled as prediction residuals compared to coding parameters of previously reconstructed portions of the video 14.sub.2,
(59) if not all coding parameters are to be predicted/adopted as mentioned above, a remaining part of the coding parameters for the current portion, wherein same may be signaled as prediction residuals compared to coding parameters of previously reconstructed portions of the video 14.sub.2.
(60) a signalization within the data stream 18 may signalize for a current portion 74a whether the corresponding coding parameters for that portion, such as motion vector 78, are to be read from the data stream completely anew, spatially predicted, or predicted from a motion vector of a co-located portion of the video 14.sub.1 or depth/disparity map data 16.sub.1 of the first view 12.sub.1, and the decoding branch 106.sub.v,2 may act accordingly, i.e. by extracting motion vector 78 from the data stream 18 in full, adopting or predicting same with, in the latter case, extracting prediction error data regarding the coding parameters for the current portion 74a from the data stream 18.
(61) Decoding branch 106.sub.d,2 may act similarly. That is, the decoding branch 106.sub.d,2 may determine its coding parameters at least partially by adoption/prediction from coding parameters used by any of decoding branches 106.sub.v,1, 106.sub.d,1 and 106.sub.v,2, from the reconstructed video 14.sub.2 and/or from the reconstructed depth/disparity map data 16.sub.1 of the first view 12.sub.1. For example, the data stream 18 may signal for a current portion 80b of the depth/disparity map data 16.sub.2 as to whether, and as to which part of, the coding parameters for this current portion 80b is to be adopted or predicted from a co-located portion of any of the video 14.sub.1, depth/disparity map data 16.sub.1 and video 14.sub.2 or a proper subset thereof. The part of interest of these coding parameters may involve, for example, a motion vector such as 84, or a disparity vector such as disparity vector 82. Further, other coding parameters, such as regarding the wedgelet separation lines, may be derived by decoding branch 106.sub.d,2 by use of edge detection within video 14.sub.2. Alternatively, edge detection may even be applied to the reconstructed depth/disparity map data 16.sub.1 with applying a predetermined re-projection in order to transfer the location of the detected edge in the depth/disparity map d.sub.1,t to the depth/disparity map d.sub.2,t in order to serve as a basis for a prediction of the location of a wedgelet separation line.
(62) In any case, the reconstructed portions of the multi-view data 10 arrive at the view extractor 108 where the views contained therein are the basis for a view extraction of new views, i.e. the videos associated with these new views, for example. This view extraction may comprise or involve a re-projection of the videos 14.sub.1 and 14.sub.2 by using the depth/disparity map data associated therewith. Frankly speaking, in re-projecting a video into another intermediate view, portions of the video corresponding to scene portions positioned nearer to the viewer are shifted along the disparity direction, i.e. the direction of the viewing direction difference vector, more than portions of the video corresponding to scene portions located farther away from the viewer position. An example for the view extraction performed by view extractor 108 is outlined below with respect to
(63) However, before describing further embodiments below, it should be noted that several amendments may be performed with respect to the embodiments outlined above. For example, the multi-view signal 10 does not have to necessarily comprise the depth/disparity map data for each view. It is even possible that none of the views of the multi-view signal 10 has depth/disparity map data associated therewith. Nevertheless, the coding parameter reuse and sharing among the multiple views as outlined above yields a coding efficiency increase. Further, for some views, the depth/disparity map data may be restricted to be transmitted within the data stream to disocclusion areas, i.e. areas which are to fill disoccluded areas in views re-projected from other views of the multi-view signal, with the maps being set to a don't-care value in their remaining areas.
(64) As already noted above, the views 12.sub.1 and 12.sub.2 of the multi-view signal 10 may have different spatial resolutions. That is, they may be transmitted within the data stream 18 using different resolutions. In even other words, the spatial resolution at which coding branches 22.sub.v,1 and 22.sub.d,1 perform the predictive coding may be higher than the spatial resolution at which coding branches 22.sub.v,2 and 22.sub.d,2 perform the predictive coding of the subsequent view 12.sub.2 following view 12.sub.1 in the above-mentioned order among the views. The inventors of the present invention found out that this measure additionally improves the rate/distortion ratio when considering the quality of the synthesized views 102. For example, the encoder of
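A minimal sketch of the resolution asymmetry just described, assuming the dependent view is down-sampled by a factor of two before predictive coding and up-sampled again after reconstruction; the simple averaging and sample-repetition filters are illustrative assumptions, as the embodiments do not prescribe particular filters.

```python
import numpy as np

def downsample2(frame):
    """Halve the resolution by 2x2 averaging before encoding the dependent view."""
    h, w = frame.shape
    return frame[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(frame):
    """Bring the reconstructed dependent view back to full resolution by sample repetition."""
    return np.repeat(np.repeat(frame, 2, axis=0), 2, axis=1)

view2_frame = np.arange(64, dtype=float).reshape(8, 8)
coded = downsample2(view2_frame)        # what the second-view branches would actually encode
restored = upsample2(coded)             # what the decoder would hand to the view extractor
print(coded.shape, restored.shape)      # (4, 4) (8, 8)
```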
(65) It should also be mentioned that the embodiments may be modified in terms of the internal structure of the coding/decoding branches. For example, the intra-prediction modes may not be present, i.e. no spatial prediction modes may be available. Similarly, either of the inter-view and temporal prediction modes may be omitted. Moreover, all of the further processing options are optional. On the other hand, out-of-loop post-processing modules may be present at the outputs of decoding branches 106 in order to, for example, perform adaptive filtering or other quality enhancing measures and/or the above-mentioned up-sampling. Further, no transformation of the residual may be performed. Rather, the residual may be transmitted in the spatial domain rather than the frequency domain. In a more general sense, the hybrid coding/decoding designs shown in
(66) It should also be mentioned that the decoder does not necessarily comprise the view extractor 108. Rather, view extractor 108 may not be present. In this case, the decoder 100 is merely for reconstructing any of the views 12.sub.1 and 12.sub.2, such as one, several or all of them. In case no depth/disparity data is present for the individual views 12.sub.1 and 12.sub.2, a view extractor 108 may, nevertheless, perform an intermediate view extraction by exploiting the disparity vectors relating corresponding portions of neighboring views to each other. Using these disparity vectors as supporting disparity vectors of a disparity vector field associated with videos of neighboring views, the view extractor 108 may build an intermediate view video from such videos of neighboring views 12.sub.1 and 12.sub.2 by applying this disparity vector field. Imagine, for example, that video frame v.sub.2,t had 50% of its portions/segments inter-view predicted. That is, for 50% of the portions/segments, disparity vectors would exist. For the remaining portions, disparity vectors could be determined by the view extractor 108 by way of interpolation/extrapolation in the spatial sense. Temporal interpolation using disparity vectors for portions/segments of previously reconstructed frames of video 14.sub.2 may also be used. Video frame v.sub.2,t and/or reference video frame v.sub.1,t may then be distorted according to these disparity vectors in order to yield an intermediate view. To this end, the disparity vectors are scaled in accordance with the intermediate view position of the intermediate view between view positions of the first view 12.sub.1 and a second view 12.sub.2. Details regarding this procedure are outlined in more detail below.
(67) A coding efficiency gain is obtained by using the above-mentioned option of determining wedgelet separation lines so as to extend along detected edges in a reconstructed current frame of the video. Thus, as explained above, the wedgelet separation line position prediction may be used for each of the views, i.e. for all of them or merely for a proper subset thereof.
(68) Insofar, the above discussion of
(69) Summarizing some of the above embodiments, these embodiments enable view extraction from commonly decoding multi-view video and supplementary data. The term “supplementary data” is used in the following in order to denote depth/disparity map data. According to these embodiments, the multi-view video and the supplementary data are embedded in one compressed representation. The supplementary data may consist of per-pixel depth maps, disparity data or 3D wire frames. The extracted views 102 can be different from the views 12.sub.1, 12.sub.2 contained in the compressed representation or bitstream 18 in terms of view number and spatial position. The compressed representation 18 has been generated before by an encoder 20, which might use the supplementary data to also improve the coding of the video data.
(70) In contrast to current state-of-the-art methods, a joint decoding is carried out, where the decoding of video and supplementary data may be supported and controlled by common information. Examples are a common set of motion or disparity vectors, which is used to decode the video as well as the supplementary data. Finally, views are extracted from the decoded video data, supplementary data and possible combined data, where the number and position of extracted views is controlled by an extraction control at the receiving device.
(71) Further, the multi-view compression concept described above is useable in connection with disparity-based view synthesis. Disparity-based view synthesis means the following. If scene content is captured with multiple cameras, such as the videos 14.sub.1 and 14.sub.2, a 3D perception of this content can be presented to the viewer. For this, stereo pairs have to be provided with slightly different viewing direction for the left and right eye. The shift of the same content in both views for equal time instances is represented by the disparity vector. Similar to this, the content shift within a sequence between different time instances is the motion vector, as shown in
(72) Usually, disparity is estimated directly or as scene depth, provided externally or recorded with special sensors or cameras. Motion estimation is already carried out by a standard coder. If multiple views are coded together, the temporal and inter-view directions are treated similarly, such that motion estimation is carried out in the temporal as well as the inter-view direction during encoding. This has already been described above with respect to
(73) Consider a pixel p.sub.1(x.sub.1,y.sub.1) in view 1 at position (x.sub.1,y.sub.1) and a pixel p.sub.2(x.sub.2,y.sub.2) in view 2 at position (x.sub.2,y.sub.2), which have identical luminance values. Then,
p.sub.1(x.sub.1,y.sub.1)=p.sub.2(x.sub.2,y.sub.2). (1)
(74) Their positions (x.sub.1,y.sub.1) and (x.sub.2,y.sub.2) are connected by the 2D disparity vector, e.g. from view 2 to view 1, which is d.sub.21(x.sub.2,y.sub.2) with components d.sub.x,21(x.sub.2,y.sub.2) and d.sub.y,21(x.sub.2,y.sub.2). Thus, the following equation holds:
(x.sub.1,y.sub.1)=(x.sub.2+d.sub.x,21(x.sub.2,y.sub.2),y.sub.2+d.sub.y,21(x.sub.2,y.sub.2)). (2)
Combining (1) and (2),
p.sub.1(x.sub.2+d.sub.x,21(x.sub.2,y.sub.2),y.sub.2+d.sub.y,21(x.sub.2,y.sub.2))=p.sub.2(x.sub.2,y.sub.2). (3)
(75) As shown in
(76) Therefore, new intermediate views can be generated with any position between view 1 and view 2.
(77) Beyond this, view extrapolation can also be achieved by using scaling factors K<0 and K>1 for the disparities.
(78) These scaling methods can also be applied in temporal direction, such that new frames can be extracted by scaling the motion vectors, which leads to the generation of higher frame rate video sequences.
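Equations (1) to (3), together with the scaling factor K discussed in the last paragraphs, translate directly into a small forward-warping sketch for intermediate view generation. This is a minimal illustration assuming integer-rounded target positions; hole filling and occlusion handling, which a practical view synthesis stage needs for the disoccluded samples, are omitted.

```python
import numpy as np

def warp_to_intermediate(view2, disparity_21, k=0.5):
    """Forward-warp view 2 to an intermediate position using disparities scaled by k.
    k = 0 reproduces view 2, k = 1 reaches view 1; k < 0 or k > 1 extrapolates."""
    h, w = view2.shape
    intermediate = np.zeros_like(view2)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            dx, dy = disparity_21[y, x]               # d21(x2, y2), cf. eq. (2)
            xt = int(round(x + k * dx))               # scaled disparity, cf. eq. (3)
            yt = int(round(y + k * dy))
            if 0 <= xt < w and 0 <= yt < h:
                intermediate[yt, xt] = view2[y, x]
                filled[yt, xt] = True
    return intermediate, filled                        # unfilled samples mark disocclusions

view2 = np.tile(np.arange(8.0), (8, 1))
disparity = np.zeros((8, 8, 2)); disparity[..., 0] = 2.0   # constant 2-sample horizontal shift
mid_view, filled = warp_to_intermediate(view2, disparity, k=0.5)
print(filled.sum(), "of", filled.size, "samples filled")
```

The same scaling applied to motion vectors instead of disparity vectors yields the temporal interpolation mentioned above for higher frame rate sequences.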
(79) Now, returning to the embodiments described above with respect to
(80) The common information may also be used as a predictor from one decoding branch (e.g. for video) to be refined in the other branch (e.g. supplementary data) and vice versa. This may include e.g. refinement of motion or disparity vectors, initialization of the block structure in supplementary data by the block structure of video data, or extracting a straight line from the luminance or chrominance edge or contour information of a video block and using this line for a wedgelet separation line prediction (with the same angle but possibly a different position in the corresponding depth block). The common information module also transfers partially reconstructed data from one decoding branch to the other. Finally, data from this module may also be handed to the view extraction module, where all necessitated views, e.g. for a display, are extracted (displays can be 2D, stereoscopic with two views, autostereoscopic with N views).
(81) One important aspect concerns the case in which more than a single pair of view and depth/supplementary signal is encoded/decoded using the above described en-/decoding structure. Consider an application scenario where, for each time instant t, a pair of color views v.sub.Color_1(t), v.sub.Color_2(t) has to be transmitted together with the corresponding depth data v.sub.Depth_1(t) and v.sub.Depth_2(t). The above embodiments suggest encoding/decoding first the signal v.sub.Color_1(t), e.g., by using conventional motion-compensated prediction. Then, in a second step, for encoding/decoding of the corresponding depth signal v.sub.Depth_1(t), information from the encoded/decoded signal v.sub.Color_1(t) can be reused, as outlined above. Subsequently, the accumulated information from v.sub.Color_1(t) and v.sub.Depth_1(t) can be further utilized for encoding/decoding of v.sub.Color_2(t) and/or v.sub.Depth_2(t). Thus, by sharing and reusing common information between the different views and/or depths, redundancies can be exploited to a large extent.
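The coding order and the accumulation of shareable information described in this paragraph can be summarized as a small dependency table; the mapping below merely restates the text (including the restriction of paragraph (37) for the second depth signal) in code form and is not a normative specification.

```python
# Sources from which each component may adopt or predict coding parameters, following the
# coding order suggested above; the restricted set for v_Depth_2 mirrors paragraph (37).
ADOPTION_SOURCES = {
    "v_Color_1(t)": [],                                  # only its own previously coded portions
    "v_Depth_1(t)": ["v_Color_1(t)"],
    "v_Color_2(t)": ["v_Color_1(t)", "v_Depth_1(t)"],
    "v_Depth_2(t)": ["v_Color_2(t)", "v_Depth_1(t)"],
}

for component, sources in ADOPTION_SOURCES.items():
    reused = ", ".join(sources) if sources else "no earlier component"
    print(f"{component} may reuse coding information from: {reused}")
```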
(82) The decoding and view extraction structure of
(83) As shown, the structure of the decoder of
(84) The decoding process starts with receiving a common compressed representation or bit stream, which contains video data, supplementary data as well as information, common to both, e.g. motion or disparity vectors, control information, block partitioning information, prediction modes, contour data, etc. from one or more views.
(85) First, an entropy decoding is applied to the bit stream to extract the quantized transform coefficients for video and supplementary data, which are fed into the two separate decoding branches, highlighted by the dotted grey boxes in
(86) Both decoding branches operate similarly after entropy decoding. The received quantized transform coefficients are scaled and an inverse transform is applied to obtain the difference signal. To this, previously decoded data from temporally preceding or neighboring views is added. The type of information to be added is controlled by special control data: In the case of intra coded video or supplementary data, no previous or neighboring information is available, such that intra frame reconstruction is applied. For inter coded video or supplementary data, previously decoded data from temporally preceding or neighboring views is available (current switch setting in
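The switching between intra reconstruction, temporal prediction and inter-view prediction that this paragraph describes can be condensed into the following sketch; the control-data fields, the omission of the inverse transform and the use of a simple cyclic shift as a stand-in for motion/disparity compensation are simplifications for illustration only.

```python
import numpy as np

def reconstruct_portion(levels, qstep, control, intra_pred, temporal_ref, interview_ref):
    """Scale the received coefficient levels (the inverse transform is omitted for brevity)
    and add the prediction selected by the transmitted control data."""
    residual = levels * qstep
    if control["mode"] == "intra":
        prediction = intra_pred                               # no previous/neighboring data used
    elif control["mode"] == "temporal":
        mvx, mvy = control["motion_vector"]                   # motion-compensated prediction
        prediction = np.roll(temporal_ref, (int(mvy), int(mvx)), axis=(0, 1))
    else:                                                     # "inter-view"
        dvx, dvy = control["disparity_vector"]                # disparity-compensated prediction
        prediction = np.roll(interview_ref, (int(dvy), int(dvx)), axis=(0, 1))
    return prediction + residual

ref = np.arange(16.0).reshape(4, 4)
levels = np.zeros((4, 4))
print(reconstruct_portion(levels, 8.0, {"mode": "temporal", "motion_vector": (1, 0)}, ref, ref, ref))
```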
(87) After this improvement stage, the reconstructed data is transferred to the decoded picture buffer. This buffer orders the decoded data and outputs the decoded pictures in the correct temporal order for each time instance. The stored data is also used for the next processing cycle to serve as input to the scalable motion/disparity compensation.
(88) In addition to this separate video and supplementary decoding, the new Common Information Module is used, which processes any data, which is common to video and supplementary data. Examples of common information include shared motion/disparity vectors, block partitioning information, prediction modes, contour data, control data, but also common transformation coefficients or modes, view enhancement data, etc. Any data, which is processed in the individual video and supplementary modules, may also be part of the common module. Therefore, connections to and from the common module to all parts of the individual decoding branches may exist. Also, the common information module may contain enough data that only one separate decoding branch and the common module are necessitated in order to decode all video and supplementary data. An example for this is a compressed representation, where some parts only contain video data and all other parts contain common video and supplementary data. Here, the video data is decoded in the video decoding branch, while all supplementary data is processed in the common module and output to the view synthesis. Thus, in this example, the separate supplementary branch is not used. Also, individual data from modules of the separate decoding branches may send information back to the Common Information Processing module, e.g. in the form of partially decoded data, to be used there or transferred to the other decoding branch. An example is decoded video data, like transform coefficients, motion vectors, modes or settings, which are transferred to the appropriate supplementary decoding modules.
(89) After decoding, the reconstructed video and supplementary data are transferred to the view extraction either from the separate decoding branches or from the Common Information Module. In the View Extraction Module, such as 110 in
(90) As an example, consider the setting in
(91) Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
(92) The inventive encoded multi-view signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
(93) Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
(94) Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
(95) Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
(96) Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
(97) In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
(98) A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
(99) A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
(100) A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
(101) A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
(102) A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
(103) In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
(104) While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which will be apparent to others skilled in the art and which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.