Video codec using template matching prediction
11546630 · 2023-01-03
CPC classification (ELECTRICITY): H04N19/533; H04N19/56; H04N19/105; H04N19/134; H04N19/119; H04N19/46; H04N19/57
International classification (ELECTRICITY): H04N19/46; H04N19/56; H04N19/57; H04N19/105; H04N19/119; H04N19/169
Abstract
Video decoder and/or video encoder, configured to determine a set of search area location candidates in a reference picture of a video; match the set of search area location candidates with a current template area adjacent to a current block of a current picture to obtain a best matching search area location candidate; select, out of a search area positioned in the reference picture at the best matching search area location candidate, a set of one or more predictor blocks by matching the current template area against the search area; and predictively decode/encode the current block from/into a data stream based on the set of one or more predictor blocks.
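The abstract's pipeline — propose candidate search area locations, pick the best one by matching against the current template, then use the template match to find predictor blocks — can be sketched over a 1-D row of samples. This is an illustrative toy under stated assumptions, not the claimed implementation; all names are hypothetical.

```python
# Toy sketch of template matching prediction over 1-D "pictures".
# A template is a run of already-decoded samples adjacent to the current block.

def ssd(a, b):
    """Sum of squared sample differences between two equal-length templates."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_location(reference, template, candidates):
    """Pick the candidate location whose co-shaped reference template
    is most similar (lowest SSD) to the current template area."""
    t = len(template)
    return min(candidates, key=lambda pos: ssd(reference[pos:pos + t], template))

reference = [10, 12, 50, 52, 54, 12, 10, 11]   # reference picture (1-D model)
template = [50, 52, 54]                        # current template area
candidates = [0, 2, 5]                         # search area location candidates
print(best_matching_location(reference, template, candidates))  # → 2
```

A real codec would then position a 2-D search area at the winning location and repeat the template match inside it to collect the set of predictor blocks.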
Claims
1. A video decoder, configured to: determine a set of search area location candidates in a reference picture of a video; match the set of search area location candidates with a current template area adjacent to a current block of a current picture to acquire a best matching search area location candidate; select, out of a search area positioned in the reference picture at the best matching search area location candidate or a predetermined search region within the search area, a set of one or more predictor blocks by matching the current template area against the search area or the predetermined search region within the search area; and predictively decode the current block from a data stream based on the set of one or more predictor blocks, wherein the search area is subdivided into search regions, and the video decoder is configured to select the predetermined search region out of the search regions based on a signalization in the data stream; and restrict the selection of the set of one or more predictor blocks, by matching the current template area against the search area, to the predetermined search region, wherein the search area is subdivided into the search regions so that a first search region is arranged in a middle of the search area and further search regions are arranged in a manner surrounding the first search region, wherein the signalization comprises a search region index indexing the predetermined search region out of the search regions, and wherein the video decoder is configured to decode the search region index from the data stream using a variable length code which assigns a first codeword of a shortest codeword length of the variable length code to the first search region.
2. The video decoder of claim 1, configured to determine the set of search area location candidates using one or more motion vector predictors spatially and/or temporally predicted for the current block.
3. The video decoder of claim 2, configured to round the one or more motion vector predictors to integer-sample positions in order to acquire the set of search area location candidates.
4. The video decoder of claim 1, configured to check whether a predicted search area location in the reference picture, colocated to the current template area of the current block, is contained in the set of search area location candidates, and if not, add the predicted search area location to the set of search area location candidates.
5. The video decoder of claim 1, configured to, in matching the set of search area location candidates with the current template area, for each of the set of search area location candidates, determine a similarity of the reference picture, at one or more positions at and/or around the respective search area location candidate, to the current template area, and appoint, as the best matching search area location candidate, the search area location candidate for which the similarity is highest.
6. The video decoder of claim 5, configured to determine the similarity by way of a sum of squared sample differences.
7. The video decoder of claim 6, configured to determine the similarity at the one or more positions by determining the sum of squared sample differences between the current template area and a coshaped candidate template area at the one or more positions in the reference picture, wherein the best matching search area location candidate is associated with a least sum of squared sample differences out of a set of sums of squared sample differences.
8. The video decoder of claim 1, wherein the search area is subdivided into the search regions so that each of the further search regions extends circumferentially around the first region in an incomplete manner, and wherein the variable length code assigns second codewords to the further search regions which are of mutually equal length.
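A variable length code with the properties recited in claims 1 and 8 — a single shortest codeword for the central search region, and codewords of mutually equal length for the surrounding regions — could, for instance, be built as follows. This is a hedged sketch; `build_region_code` is a hypothetical helper, not language from the patent.

```python
def build_region_code(num_surrounding):
    """Prefix-free code: central region 0 gets the one-bit codeword '0';
    each surrounding region gets '1' plus a fixed-length binary index,
    so all surrounding codewords have equal length."""
    bits = max(1, (num_surrounding - 1).bit_length())
    code = {0: "0"}                       # shortest codeword → central region
    for i in range(num_surrounding):
        code[i + 1] = "1" + format(i, f"0{bits}b")
    return code

print(build_region_code(4))
# {0: '0', 1: '100', 2: '101', 3: '110', 4: '111'}
```

The shortest codeword on the central region reflects the expectation that the best match most often lies near the predicted location, so the most probable index costs the fewest bits.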
9. The video decoder of claim 1, configured to decode the current block by determining a linear combination of the set of one or more predictor blocks.
10. The video decoder of claim 1, configured to decode the current block based on an average, including a normal average, a weighted average, or a combination of both, of the set of one or more predictor blocks, or based on an average of a subset out of the set of one or more predictor blocks with a subset excluding predictor blocks from the set of one or more predictor blocks whose reference template area matches with the current template area more than a predetermined threshold worse than that for a best matching predictor block in the set of one or more predictor blocks.
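The averaging of claim 10 — excluding predictor blocks whose template match is more than a predetermined threshold worse than the best match, then averaging the rest — might look like this toy sketch (hypothetical names; predictor blocks are short 1-D sample lists):

```python
def average_predictors(blocks, costs, threshold):
    """Average only those predictor blocks whose template-matching cost is
    within `threshold` of the best (lowest) cost."""
    best = min(costs)
    kept = [b for b, c in zip(blocks, costs) if c <= best + threshold]
    n = len(kept)
    return [sum(samples) / n for samples in zip(*kept)]

blocks = [[10, 20], [12, 22], [40, 60]]
costs = [5, 7, 90]            # per-predictor template-matching costs (e.g. SSD)
print(average_predictors(blocks, costs, threshold=10))  # → [11.0, 21.0]
```

Here the third predictor's cost (90) is far worse than the best (5), so it is dropped before averaging.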
11. The video decoder of claim 1, configured to, in predictively decoding the current block from the data stream based on the set of one or more predictor blocks, sort and weight the set of one or more predictor blocks based on a similarity of a reference template area of each of the predictor blocks and the current template area, and determine the current block P.sub.final according to
12. The video decoder of claim 1, configured to read a merge flag from the data stream, if the merge flag is in a first state, decode a motion vector from the data stream for the current block and predictively decode the current block using the motion vector by motion compensated prediction, and if the merge flag is in a second state, read a region-based template matching merge flag from the data stream, wherein if the region-based template matching merge flag is in a first state, read a merge index from the data stream, use the merge index to select a merge candidate out of a merge candidate list and predictively decode the current block using motion information associated with the selected merge candidate by motion compensated prediction, and if the region-based template matching merge flag is in a second state, perform the determination of the set of search area location candidates, the matching of the set of search area location candidates with the current template area, the selection of the set of one or more predictor blocks and the predictively decoding of the current block from the data stream based on the set of one or more predictor blocks.
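The signaling cascade of claim 12 can be summarized as a small decision function. `True` models the claim's "first state" and `False` its "second state"; the flag names are hypothetical stand-ins for syntax elements parsed from the data stream:

```python
def decode_block_mode(flags, merge_index=0):
    """Return which prediction path the decoder takes for the current block,
    following the flag cascade of claim 12 (names are illustrative only)."""
    if flags["merge_flag"]:
        return "explicit motion vector"          # decode MV, MC prediction
    if flags["rtm_merge_flag"]:
        return f"merge candidate {merge_index}"  # classic merge via merge index
    return "region-based template matching"      # the template matching mode

print(decode_block_mode({"merge_flag": False, "rtm_merge_flag": False}))
# → region-based template matching
```

Only the last branch triggers the determination of search area location candidates and the template-matching selection of predictor blocks.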
13. A method for video decoding, comprising: determining a set of search area location candidates in a reference picture of a video; matching the set of search area location candidates with a current template area adjacent to a current block of a current picture to acquire a best matching search area location candidate; selecting, out of a search area positioned in the reference picture at the best matching search area location candidate or a predetermined search region within the search area, a set of one or more predictor blocks by matching the current template area against the search area or the predetermined search region within the search area; and predictively decoding the current block from a data stream based on the set of one or more predictor blocks, wherein the search area is subdivided into search regions, and the method comprises selecting the predetermined search region out of the search regions based on a signalization in the data stream; and restricting the selection of the set of one or more predictor blocks, by matching the current template area against the search area, to the predetermined search region, wherein the search area is subdivided into the search regions so that a first search region is arranged in a middle of the search area and further search regions are arranged in a manner surrounding the first search region, wherein the signalization comprises a search region index indexing the predetermined search region out of the search regions, and wherein the method comprises decoding the search region index from the data stream using a variable length code which assigns a first codeword of a shortest codeword length of the variable length code to the first search region.
14. A non-transitory digital storage medium having a computer program stored thereon to perform a method for video decoding, when the computer program is run by a computer, the method comprising: determining a set of search area location candidates in a reference picture of a video; matching the set of search area location candidates with a current template area adjacent to a current block of a current picture to acquire a best matching search area location candidate; selecting, out of a search area positioned in the reference picture at the best matching search area location candidate or a predetermined search region within the search area, a set of one or more predictor blocks by matching the current template area against the search area or the predetermined search region within the search area; and predictively decoding the current block from a data stream based on the set of one or more predictor blocks, wherein the search area is subdivided into search regions, and the method comprises selecting the predetermined search region out of the search regions based on a signalization in the data stream; and restricting the selection of the set of one or more predictor blocks, by matching the current template area against the search area, to the predetermined search region, wherein the search area is subdivided into the search regions so that a first search region is arranged in a middle of the search area and further search regions are arranged in a manner surrounding the first search region, wherein the signalization comprises a search region index indexing the predetermined search region out of the search regions, and wherein the method comprises decoding the search region index from the data stream using a variable length code which assigns a first codeword of a shortest codeword length of the variable length code to the first search region.
15. A video encoder, configured to: determine a set of search area location candidates in a reference picture of a video; match the set of search area location candidates with a current template area adjacent to a current block of a current picture to acquire a best matching search area location candidate; select, out of a search area positioned in the reference picture at the best matching search area location candidate or a predetermined search region within the search area, a set of one or more predictor blocks by matching the current template area against the search area or the predetermined search region within the search area; and predictively encode the current block into a data stream based on the set of one or more predictor blocks, wherein the search area is subdivided into search regions, and the video encoder is configured to select the predetermined search region out of the search regions and signal the selected search region into the data stream; and restrict the selection of the set of one or more predictor blocks, by matching the current template area against the search area, to the predetermined search region, wherein the search area is subdivided into the search regions so that a first search region is arranged in a middle of the search area and further search regions are arranged in a manner surrounding the first search region, wherein the signalization comprises a search region index indexing the predetermined search region out of the search regions, and wherein the video encoder is configured to encode the search region index into the data stream using a variable length code which assigns a first codeword of a shortest codeword length of the variable length code to the first search region.
16. The video encoder of claim 15, configured to determine the set of search area location candidates using one or more motion vector predictors spatially and/or temporally predicted for the current block.
17. The video encoder of claim 16, configured to round the one or more motion vector predictors to integer-sample positions in order to acquire the set of search area location candidates.
18. The video encoder of claim 15, configured to check whether a predicted search area location in the reference picture, colocated to the current template area of the current block, is contained in the set of search area location candidates, and if not, add the predicted search area location to the set of search area location candidates.
19. The video encoder of claim 15, configured to, in matching the set of search area location candidates with the current template area, for each of the set of search area location candidates, determine a similarity of the reference picture, at one or more positions at and/or around the respective search area location candidate, to the current template area, and appoint, as the best matching search area location candidate, the search area location candidate for which the similarity is highest.
20. The video encoder of claim 19, configured to determine the similarity by way of a sum of squared sample differences.
21. The video encoder of claim 20, configured to determine the similarity at the one or more positions by determining differences between the current template area and a coshaped candidate template area at the one or more positions in the reference picture.
22. The video encoder of claim 15, wherein the search area is subdivided into the search regions so that each of the further search regions extends circumferentially around the first region in an incomplete manner and wherein the variable length code assigns second codewords to the further search regions which are of mutually equal length.
23. The video encoder of claim 15, configured to encode the current block by determining a linear combination of the set of one or more predictor blocks.
24. The video encoder of claim 15, configured to encode the current block based on an average, including a normal average, a weighted average, or a combination of both, of the set of one or more predictor blocks, or based on an average of a subset out of the set of one or more predictor blocks with a subset excluding predictor blocks from the set of one or more predictor blocks whose reference template area matches with the current template area more than a predetermined threshold worse than that for a best matching predictor block in the set of one or more predictor blocks.
25. The video encoder of claim 15, configured to, in predictively encoding the current block into a data stream based on the set of one or more predictor blocks, sort and weight the set of one or more predictor blocks based on a similarity of a reference template area of each of the predictor blocks and the current template area, and determine the current block P.sub.final according to
26. The video encoder of claim 15, configured to write a merge flag into the data stream if the merge flag is in a first state, predictively encode the current block into the data stream using a motion vector by motion compensated prediction and encode the motion vector into the data stream for the current block, and if the merge flag is in a second state, write a region-based template matching merge flag into the data stream, wherein if the region-based template matching merge flag is in a first state, select a merge candidate out of a merge candidate list, predictively encode the current block using motion information associated with the selected merge candidate by motion compensated prediction and write a merge index into the data stream, associated with the merge candidate, and if the region-based template matching merge flag is in a second state, perform the determination of the set of search area location candidates, the matching of the set of search area location candidates with the current template area, the selection of the set of one or more predictor blocks and the predictively encoding of the current block into the data stream based on the set of one or more predictor blocks.
27. A method for video encoding, comprising: determining a set of search area location candidates in a reference picture of a video; matching the set of search area location candidates with a current template area adjacent to a current block of a current picture to acquire a best matching search area location candidate; selecting, out of a search area positioned in the reference picture at the best matching search area location candidate or a predetermined search region within the search area, a set of one or more predictor blocks by matching the current template area against the search area or the predetermined search region within the search area; and predictively encoding the current block into a data stream based on the set of one or more predictor blocks, wherein the search area is subdivided into search regions, and the method comprises selecting the predetermined search region out of the search regions and signaling the selected search region into the data stream; and restricting the selection of the set of one or more predictor blocks, by matching the current template area against the search area, to the predetermined search region, wherein the search area is subdivided into the search regions so that a first search region is arranged in a middle of the search area and further search regions are arranged in a manner surrounding the first search region, wherein the signalization comprises a search region index indexing the predetermined search region out of the search regions, and wherein the method comprises encoding the search region index into the data stream using a variable length code which assigns a first codeword of a shortest codeword length of the variable length code to the first search region.
28. A non-transitory digital storage medium having a computer program stored thereon to perform a method for video encoding, when the computer program is run by a computer, the method comprising: determining a set of search area location candidates in a reference picture of a video; matching the set of search area location candidates with a current template area adjacent to a current block of a current picture to acquire a best matching search area location candidate; selecting, out of a search area positioned in the reference picture at the best matching search area location candidate or a predetermined search region within the search area, a set of one or more predictor blocks by matching the current template area against the search area or the predetermined search region within the search area; and predictively encoding the current block into a data stream based on the set of one or more predictor blocks, wherein the search area is subdivided into search regions, and the method comprises selecting the predetermined search region out of the search regions and signaling the selected search region into the data stream; and restricting the selection of the set of one or more predictor blocks, by matching the current template area against the search area, to the predetermined search region, wherein the search area is subdivided into the search regions so that a first search region is arranged in a middle of the search area and further search regions are arranged in a manner surrounding the first search region, wherein the signalization comprises a search region index indexing the predetermined search region out of the search regions, and wherein the method comprises encoding the search region index into the data stream using a variable length code which assigns a first codeword of a shortest codeword length of the variable length code to the first search region.
29. A non-transitory digital storage medium storing a data stream acquired by a method for video encoding, the method comprising: determining a set of search area location candidates in a reference picture of a video; matching the set of search area location candidates with a current template area adjacent to a current block of a current picture to acquire a best matching search area location candidate; selecting, out of a search area positioned in the reference picture at the best matching search area location candidate or a predetermined search region within the search area, a set of one or more predictor blocks by matching the current template area against the search area or the predetermined search region within the search area; and predictively encoding the current block into a data stream based on the set of one or more predictor blocks, wherein the search area is subdivided into search regions, and the method comprises selecting the predetermined search region out of the search regions and signaling the selected search region into the data stream; and restricting the selection of the set of one or more predictor blocks, by matching the current template area against the search area, to the predetermined search region, wherein the search area is subdivided into the search regions so that a first search region is arranged in a middle of the search area and further search regions are arranged in a manner surrounding the first search region, wherein the signalization comprises a search region index indexing the predetermined search region out of the search regions, and wherein the method comprises encoding the search region index into the data stream using a variable length code which assigns a first codeword of a shortest codeword length of the variable length code to the first search region.
30. A video decoder, configured to: determine a set of search area location candidates in a reference picture of a video; match the set of search area location candidates with a current template area adjacent to a current block of a current picture to acquire a best matching search area location candidate; select, out of a search area positioned in the reference picture at the best matching search area location candidate or a predetermined search region within the search area, a set of one or more predictor blocks by matching the current template area against the search area or the predetermined search region within the search area; and predictively decode the current block from a data stream based on the set of one or more predictor blocks, wherein the video decoder is configured to, in predictively decoding the current block from the data stream based on the set of one or more predictor blocks, sort and weight the set of one or more predictor blocks based on a similarity of a reference template area of each of the predictor blocks and the current template area, and determine the current block P.sub.final according to
31. A method for video decoding, comprising: determining a set of search area location candidates in a reference picture of a video; matching the set of search area location candidates with a current template area adjacent to a current block of a current picture to acquire a best matching search area location candidate; selecting, out of a search area positioned in the reference picture at the best matching search area location candidate or a predetermined search region within the search area, a set of one or more predictor blocks by matching the current template area against the search area or the predetermined search region within the search area; and predictively decoding the current block from a data stream based on the set of one or more predictor blocks, wherein the method comprises, in predictively decoding the current block from the data stream based on the set of one or more predictor blocks, sorting and weighting the set of one or more predictor blocks based on a similarity of a reference template area of each of the predictor blocks and the current template area, and determining the current block P.sub.final according to
32. A non-transitory digital storage medium having a computer program stored thereon to perform the method for video decoding according to claim 31.
33. A video encoder, configured to: determine a set of search area location candidates in a reference picture of a video; match the set of search area location candidates with a current template area adjacent to a current block of a current picture to acquire a best matching search area location candidate; select, out of a search area positioned in the reference picture at the best matching search area location candidate or a predetermined search region within the search area, a set of one or more predictor blocks by matching the current template area against the search area or the predetermined search region within the search area; and predictively encode the current block into a data stream based on the set of one or more predictor blocks, wherein the video encoder is configured to, in predictively encoding the current block into a data stream based on the set of one or more predictor blocks, sort and weight the set of one or more predictor blocks based on a similarity of a reference template area of each of the predictor blocks and the current template area, and determine the current block P.sub.final according to
34. A method for video encoding, comprising: determining a set of search area location candidates in a reference picture of a video; matching the set of search area location candidates with a current template area adjacent to a current block of a current picture to acquire a best matching search area location candidate; selecting, out of a search area positioned in the reference picture at the best matching search area location candidate or a predetermined search region within the search area, a set of one or more predictor blocks by matching the current template area against the search area or the predetermined search region within the search area; and predictively encoding the current block into a data stream based on the set of one or more predictor blocks, wherein the method comprises, in predictively encoding the current block into a data stream based on the set of one or more predictor blocks, sorting and weighting the set of one or more predictor blocks based on a similarity of a reference template area of each of the predictor blocks and the current template area, and determining the current block P.sub.final according to
35. A non-transitory digital storage medium having a computer program stored thereon to perform the method for video encoding according to claim 34.
36. A non-transitory digital storage medium storing a data stream acquired by the method for video encoding according to claim 34.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the present invention will be detailed subsequently referring to the appended drawings.
DETAILED DESCRIPTION OF THE INVENTION
(22) Equal or equivalent elements or elements with equal or equivalent functionality are denoted in the following description by equal or equivalent reference numerals even if occurring in different figures.
(23) In the following description, a plurality of details is set forth to provide a more thorough explanation of embodiments of the present invention. However, it will be apparent to those skilled in the art that embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form rather than in detail in order to avoid obscuring embodiments of the present invention. In addition, features of the different embodiments described hereinafter may be combined with each other, unless specifically noted otherwise.
(24) The following description of the figures starts with a presentation of a description of an encoder and a decoder of a block-based predictive codec for coding pictures of a video, in order to form an example for a coding framework into which embodiments of the present invention may be built. The respective encoder and decoder are described with respect to the appended figures.
(26) The encoder 10 is configured to subject the prediction residual signal to spatial-to-spectral transformation and to encode the prediction residual signal, thus obtained, into the data stream 14. Likewise, the decoder 20 is configured to decode the prediction residual signal from the data stream 14 and subject the prediction residual signal, thus obtained, to spectral-to-spatial transformation.
(27) Internally, the encoder 10 may comprise a prediction residual signal former 22 which generates a prediction residual 24 so as to measure a deviation of a prediction signal 26 from the original signal, i.e. from the picture 12, wherein the prediction signal 26 can be interpreted as a linear combination of a set of one or more predictor blocks according to an embodiment of the present invention. The prediction residual signal former 22 may, for instance, be a subtractor which subtracts the prediction signal from the original signal, i.e. from the picture 12. The encoder 10 then further comprises a transformer 28 which subjects the prediction residual signal 24 to a spatial-to-spectral transformation to obtain a spectral-domain prediction residual signal 24′ which is then subject to quantization by a quantizer 32, also comprised by the encoder 10. The thus quantized prediction residual signal 24″ is coded into bitstream 14. To this end, encoder 10 may optionally comprise an entropy coder 34 which entropy codes the prediction residual signal as transformed and quantized into data stream 14.
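The residual coding chain of paragraphs (26) and (27) — form the residual 24, apply a spatial-to-spectral transform (24′), quantize (24″), and invert both steps at the decoder — can be illustrated with a 2-point transform. This is a didactic sketch, not the codec's actual transform or quantizer; the Haar-like butterfly below keeps the arithmetic exact.

```python
def forward(residual):
    """Spatial-to-spectral transform (2-point sum/difference butterfly)."""
    a, b = residual
    return [a + b, a - b]

def inverse(coeffs):
    """Spectral-to-spatial inverse of forward()."""
    s, d = coeffs
    return [(s + d) // 2, (s - d) // 2]

original, prediction = [7, 3], [5, 5]
residual = [o - p for o, p in zip(original, prediction)]        # signal 24
quantized = [c // 2 for c in forward(residual)]                 # 24'' analog
rec_residual = inverse([c * 2 for c in quantized])              # decoder side
decoded = [p + r for p, r in zip(prediction, rec_residual)]
print(decoded)  # → [7, 3]
```

With these small values the quantization step introduces no loss, so the decoded block equals the original; in general, the quantizer makes the chain lossy and the entropy coder then compresses the quantized coefficients.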
(28) The prediction signal 26 is generated by a prediction stage 36 of encoder 10 on the basis of the prediction residual signal 24″ encoded into, and decodable from, data stream 14. To this end, the prediction stage 36 may internally be constructed as shown in the appended figures.
(29) Likewise, decoder 20, as shown in the appended figures, may be constructed correspondingly.
(30) Although not specifically described above, it is readily clear that the encoder 10 may set some coding parameters including, for instance, prediction modes, motion parameters and the like, according to some optimization scheme such as, for instance, in a manner optimizing some rate and distortion related criterion, i.e. coding cost. For example, encoder 10 and decoder 20 and the corresponding modules 44, 58, respectively, may support different prediction modes such as intra-coding modes and inter-coding modes. The granularity at which encoder and decoder switch between these prediction mode types may correspond to a subdivision of picture 12 and 12′, respectively, into coding segments or coding blocks. In units of these coding segments, for instance, the picture may be subdivided into blocks being intra-coded and blocks being inter-coded.
(31) Intra-coded blocks are predicted on the basis of a spatial, already coded/decoded neighborhood (e.g., a current template) of the respective block (e.g., a current block), as is outlined in more detail below. Several intra-coding modes may exist and be selected for a respective intra-coded segment, including directional or angular intra-coding modes according to which the respective segment is filled by extrapolating the sample values of the neighborhood, along a certain direction which is specific for the respective directional intra-coding mode, into the respective intra-coded segment. The intra-coding modes may, for instance, also comprise one or more further modes such as a DC coding mode, according to which the prediction for the respective intra-coded block assigns a DC value to all samples within the respective intra-coded segment, and/or a planar intra-coding mode according to which the prediction of the respective block is approximated or determined to be a spatial distribution of sample values described by a two-dimensional linear function over the sample positions of the respective intra-coded block, with the tilt and offset of the plane defined by the two-dimensional linear function being derived on the basis of the neighboring samples.
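The planar intra mode described above fills a block with a two-dimensional linear function of the sample position. In a real codec the tilt and offset of the plane would be derived from neighboring samples; in this hypothetical sketch they are passed in directly for clarity.

```python
def planar_block(width, height, offset, tilt_x, tilt_y):
    """Fill a width x height block with the plane offset + tilt_x*x + tilt_y*y,
    i.e. the two-dimensional linear function of the planar intra mode."""
    return [[offset + tilt_x * x + tilt_y * y for x in range(width)]
            for y in range(height)]

print(planar_block(3, 2, offset=10, tilt_x=1, tilt_y=2))
# → [[10, 11, 12], [12, 13, 14]]
```

The DC mode is the degenerate case tilt_x = tilt_y = 0, which assigns the same value to every sample of the block.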
(32) Compared thereto, inter-coded blocks may be predicted, for instance, temporally. For inter-coded blocks, motion vectors may be signaled within the data stream 14, the motion vectors indicating the spatial displacement of the portion of a previously coded picture (e. g. a reference picture) of the video to which picture 12 belongs, at which the previously coded/decoded picture is sampled in order to obtain the prediction signal for the respective inter-coded block. This means, in addition to the residual signal coding comprised by data stream 14, such as the entropy-coded transform coefficient levels representing the quantized spectral-domain prediction residual signal 24″, data stream 14 may have encoded thereinto coding mode parameters for assigning the coding modes to the various blocks, prediction parameters for some of the blocks, such as motion parameters for inter-coded segments, and optional further parameters such as parameters for controlling and signaling the subdivision of picture 12 and 12′, respectively, into the segments. The decoder 20 uses these parameters to subdivide the picture in the same manner as the encoder did, to assign the same prediction modes to the segments, and to perform the same prediction to result in the same prediction signal.
(33)
(34) Again, data stream 14 may have an intra-coding mode coded thereinto for intra-coded blocks 80, which assigns one of several supported intra-coding modes to the respective intra-coded block 80. For inter-coded blocks 82, the data stream 14 may have one or more motion parameters coded thereinto. Generally speaking, inter-coded blocks 82 are not restricted to being temporally coded. Alternatively, inter-coded blocks 82 may be any block predicted from previously coded portions beyond the current picture 12 itself, such as previously coded pictures of a video to which picture 12 belongs, or a picture of another view or a hierarchically lower layer in the case of encoder and decoder being scalable encoders and decoders, respectively.
(35) The prediction residual signal 24″″ in
(36)
(37) In
(38) Naturally, while transformer 28 would support all of the forward transform versions of these transforms, the decoder 20 or inverse transformer 54 would support the corresponding backward or inverse versions thereof:
Inverse DCT-II (or inverse DCT-III)
Inverse DST-IV
Inverse DCT-IV
Inverse DST-VII
Identity Transformation (IT)
(39) The subsequent description provides more details on which transforms could be supported by encoder 10 and decoder 20. In any case, it should be noted that the set of supported transforms may comprise merely one transform such as one spectral-to-spatial or spatial-to-spectral transform, but it is also possible that no transform is used by the encoder or decoder at all or for single blocks 80, 82, 84.
(40) As already outlined above,
(41)
(42) The video decoder/video encoder is configured to determine a set of search area location candidates in a reference picture 12″ of a video; match the set of search area location candidates with a current template area 110 adjacent to a current block 120 of a current picture 12′ to obtain a best matching search area location candidate; select, out of a search area positioned in the reference picture 12″ at the best matching search area location candidate or a predetermined search region within the search area, a set of one or more predictor blocks 120′ by matching the current template area 110 against the search area or the predetermined search region within the search area; and predictively decode/encode the current block 120 from/into a data stream based on the set of one or more predictor blocks 120′.
(43) In the following, a plurality of details is set forth to provide a more thorough explanation of embodiments of the present invention. However, it will be apparent to those skilled in the art that specific details of embodiments of the present invention described with respect to the video decoder may be practiced by a video encoder and that specific details of embodiments of the present invention described with respect to video encoder details may be practiced by a video decoder.
(44) Template matching is a texture synthesis method used in digital image processing. According to an aspect of the invention, an advanced and faster version of this technique is developed for the inter prediction of video coding. The proposed method does, for example, not send the motion vectors to the decoder, i.e. the relative displacements of the current block on the one hand and the regions of the reference picture from which the current block is predicted on the other hand. Instead they are derived in a predetermined manner which may also be applied at the decoder. The region-based template matching (RTM) for inter prediction is described in the following:
(45) A template (110, 110′) is referred to as neighboring samples, e. g. two lines of reconstructed samples, present above and left of a block (120, 120′). That is, a template is a set of samples neighboring a certain block, which have a predetermined relative position to the block. They may, for instance, be located to the left and top of the block and cover immediate neighboring samples and, optionally, one or more further lines of samples which are one, two or any other number of samples away from the block's border. These samples of the template are typically already coded/reconstructed. This is true for blocks in a reference picture as the coding order may traverse the pictures of the video sequentially and, accordingly, the reference picture may have been coded/decoded completely in coding order before the current block is up to coding/decoding. This is true for the current block's template as well, as the coding order generally leads in a raster scan order from top to bottom, row-wise from left to right, within the current picture. Alternatives are, naturally, possible with respect to the coding order. For instance, the coding order may be a mixture of a raster scan order with respect to tree-root blocks into which the pictures are regularly partitioned in rows and columns, with traversing the leaf blocks of each recursively partitioned tree root block in a depth first traversal order.
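As a minimal sketch of the template notion described above (the helper name `get_template`, the NumPy array layout, and the two-line template width are assumptions for illustration only, not part of the claims), an L-shaped template of already reconstructed samples could be collected as:

```python
import numpy as np

def get_template(picture, x, y, width, height, lines=2):
    """Collect the L-shaped template of a block whose top-left sample is at
    (x, y): `lines` rows of reconstructed samples above the block (including
    the top-left corner area) and `lines` columns to its left, returned as a
    flat sample vector."""
    top = picture[y - lines:y, x - lines:x + width]   # rows above the block
    left = picture[y:y + height, x - lines:x]         # columns left of the block
    return np.concatenate([top.ravel(), left.ravel()])
```

For a 4×4 block with a two-line template this yields 2·(4+2) + 4·2 = 20 samples.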
(46) Consider a block, e. g. the current block 120, which can comprise a luma block, to be predicted, as shown in
(47) Any metric for measuring the similarity between templates can be used for template matching procedures used herein. An example for an error minimizing metric is a sum of squared differences (SSD). Same can be used to find the best match T.sub.b from the large pool of reference templates T.sub.r 110′ available in the reference pictures 12″. The reference template 110′ that gives the error value associated with highest similarity, such as the least one when using SSD, is considered as the best template match T.sub.b for the current block 120 and the block (e.g. the predictor block 120′) corresponding to the best template match is, for example, the predictor of the current block 120. It should be noted here that the search algorithm is, for example, applied at integer-sample positions.
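The SSD-based matching may be sketched as follows (the function names are illustrative, not from the patent):

```python
import numpy as np

def ssd(a, b):
    """Sum of squared sample differences between two co-shaped templates."""
    d = np.asarray(a, dtype=np.int64) - np.asarray(b, dtype=np.int64)
    return int((d * d).sum())

def best_template_match(current_template, reference_templates):
    """Return index and error of the reference template T_r with the least
    SSD against the current template T_c."""
    errors = [ssd(current_template, t) for t in reference_templates]
    i = int(np.argmin(errors))
    return i, errors[i]
```

Any other error minimizing metric (e.g. SAD) could be substituted for `ssd` without changing the surrounding search logic.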
(48) The search for the best template match T.sub.b in the reference picture 12″ is, for example, restricted or confined to a window (e. g. a search area 130), e.g. square shaped, whose central position is, for example, C. In accordance with the examples set out herein below, C is selected out of several candidate positions for C. In accordance with these embodiments, this selection is also done by template matching. The finally chosen C, i.e. the C of the search area 130 within which the actual template matching to find the one or more reference blocks is performed, can represent the best matching search area location candidate.
(49) In accordance with certain embodiments, which are further described below, the search area 130 is, for example, further partitioned into, for example, n regions in a predetermined manner. According to an embodiment, the search area 130 is partitioned, for instance, into five regions where C is the central position of the search area (see
(50) The central position, C, should be chosen wisely, since this can have a direct impact on the efficiency of the RTM prediction. A badly chosen central position can lead to poor predictors and wastage of time from the search algorithm. In order to handle this issue, the subsequently described embodiments have, for example, a separate design to decide the central position before the start of the actual search algorithm for the RTM predictors (i.e. the set of one or more predictor blocks). In this step, encoder and decoder look into a list 114 of motion vector predictors 115.sub.1 to 115.sub.n for the current block 120, wherein n can be a positive integer like, for example, 7 as illustrated in
(51) To be more precise, according to an embodiment a separate design similar to PMVD (pattern matched motion vector derivation) in Joint Exploration Model (JEM) of JVET (Joint Video Exploration Team) [11] is used to identify C (e.g. a position of C, wherein C can define a search area location candidate) in the reference picture 12″ in order to improve the efficiency of an inter RTM (region-based template matching) mode. In this step, a list of candidates for C, for example, the set of search area location candidates C.sub.1 to C.sub.7 as shown in
(52) The candidates C′.sub.1 to C′.sub.7 are compared to T.sub.c 110 to select the best one. In other words the set of search area location candidates C.sub.1 to C.sub.7 are matched with a current template area 110 adjacent to a current block 120 of a current picture 12′ to obtain the best matching search area location candidate.
(53) According to an embodiment this can mean, that one template area C′.sub.1 to C′.sub.7 at each position candidate C.sub.1 to C.sub.7 is compared to the current template area T.sub.c 110 such as the one (C′.sub.7) depicted in
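A hedged sketch of this per-candidate comparison (the `template_at` accessor is a hypothetical callback returning the co-shaped candidate template C′.sub.i at a given candidate position):

```python
import numpy as np

def ssd(a, b):
    d = np.asarray(a, dtype=np.int64) - np.asarray(b, dtype=np.int64)
    return int((d * d).sum())

def select_center(current_template, candidates, template_at):
    """For each candidate position C_i, fetch the co-shaped candidate
    template C'_i from the reference picture via `template_at` and return
    the position with the least SSD against the current template T_c."""
    return min(candidates, key=lambda c: ssd(current_template, template_at(c)))
```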
(54) According to an alternative embodiment, for each candidate of the set of search area location candidates C.sub.1 to C.sub.7, several templates, like C′.sub.5a to C′.sub.5k as illustrated in
(55) Again, it should be noted that the central positioning of a candidate position C.sub.5 with respect to its associated candidate search area 130′ is merely an example. Instead, the positions C.sub.1 to C.sub.7 to which the motion vector predictor candidates 115.sub.1 to 115.sub.7 point, may be defined to form a top-left-most corner of its candidate search area 130′. The distribution of template areas, e.g. C′.sub.5a to C′.sub.5k, within the corresponding candidate search area 130′, may be predefined and may be equal for each position candidate C.sub.1 to C.sub.7. According to an embodiment the several templates, e.g. C′.sub.5a to C′.sub.5k, can be determined in the candidate search area 130′ by a hexagonal search as illustrated in
(56) In other words the separate design similar to PMVD in JEM of JVET is used to identify the point C of the finally chosen search area 130 in the reference picture 12″ in order to improve the efficiency of the inter RTM mode. In this step an already available predictor list for the motion vector of the current block 120 is used to generate a new list called the RTMpredList. In particular, each motion vector in the list is used to locate a candidate point for C in the reference picture. The candidate points are listed in RTMpredList. This means, that the proposed algorithm at first determines the positions C.sub.1 to C.sub.7 in the reference picture 12″ and collects them in a first list. This is done at decoder and encoder based on the predictor list for the motion vector. Then, encoder and decoder perform template matching to select one of the candidate positions in the new list. For each of the list members, one or more templates are determined so that, for instance, the relative spatial arrangement relative to the respective position candidate C.sub.1 to C.sub.7 is the same among all position candidates. In
(57) In other words the video decoder and encoder can be configured to determine the set of search area location candidates using one or more motion vector predictors spatially and/or temporally predicted for the current block.
(58) The RTMpredList (i. e. the set of search area location candidates C.sub.1 to C.sub.7) is, for example, created from an advanced motion vector prediction (AMVP) [1] candidate list of the current block 120. Already decoded motion vectors (MVs) of spatial neighboring blocks (e. g. in the current picture 12′) and temporal neighboring blocks (e. g. in the reference picture 12″) are, for example, utilized for generating the AMVP list. According to an embodiment the positions in the reference picture pointed to the motion vectors in the AMVP list, mapped to integer-sample position, are used to form the RTMpredList.
(59) In other words video decoder and encoder can be configured to round the one or more motion vector predictors to integer-sample positions in order to obtain the set of search area location candidates C.sub.1 to C.sub.7.
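A possible rounding of fractional motion vector predictors to integer-sample candidate positions might look as follows (the 1/4-sample storage unit and the round-half-up convention are assumptions for illustration; codecs differ in how negative components are rounded):

```python
def mv_to_candidate(block_pos, mv, precision=4):
    """Round a fractional motion vector predictor (stored in 1/`precision`
    sample units, e.g. quarter-pel for precision=4) to the nearest
    integer-sample candidate position in the reference picture."""
    def rnd(v):
        # floor((v + precision/2) / precision): round half up
        return (v + precision // 2) // precision
    bx, by = block_pos
    mvx, mvy = mv
    return (bx + rnd(mvx), by + rnd(mvy))
```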
(60) In accordance with an embodiment, if the position in the reference picture which is co-located to the position of the current template T.sub.c 110, i.e. the position in the reference picture 12″ pointed to by a zero motion vector, is not already present in the position candidate list (i. e. the set of search area location candidates C.sub.1/C′.sub.1), video encoder and decoder add same to the list before performing the selection of the best search area candidate position out of the list. In
(61) In other words the video decoder and/or video encoder can be configured to check whether a predicted search area location C.sub.7/C′.sub.7 in the reference picture 12″, colocated to the current template area 110 of the current block 120, is contained in the set of search area location candidates C.sub.1 to C.sub.7, if not, add the predicted search area location C.sub.7 to the set of search area location candidates C.sub.1 to C.sub.7.
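The check-and-append step could be sketched as follows (the helper name `build_rtm_pred_list` is illustrative):

```python
def build_rtm_pred_list(candidates, colocated):
    """Build the RTMpredList: deduplicate the candidate positions and append
    the position co-located to the current template (i.e. the position
    pointed to by a zero motion vector) if it is not already present."""
    rtm = []
    for c in candidates:
        if c not in rtm:
            rtm.append(c)
    if colocated not in rtm:
        rtm.append(colocated)
    return rtm
```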
(62) Now, the encoder or decoder needs to find the best template, i. e. the best matching search area location candidate, from this list according to an embodiment. The simplest and fastest way is to choose the template (e. g. one of the templates C′.sub.1 to C′.sub.7) that leads to the least error (SSD) with the current template 110 (see e. g.
(63) In the following embodiments details described above are described in other words:
(64) According to an embodiment the video decoder and/or video encoder is configured to, in matching the set of search area location candidates C.sub.1 to C.sub.7, especially, for example, the candidate templates C′.sub.1 to C′.sub.7, with the current template area 110, determine, for each of the set of search area location candidates C.sub.1 to C.sub.7, a similarity of the reference picture 12″, at one or more positions at (e. g. at C.sub.1 to C.sub.7) and/or around (e. g. C′.sub.5a to C′.sub.5k) the respective search area location candidate C.sub.1 to C.sub.7, to the current template area 110, and appoint, as the best matching search area location candidate, the search area location candidate, e. g. one of C.sub.1 to C.sub.7, for which the similarity is highest.
(65) According to an embodiment the video decoder and/or video encoder is configured to determine the similarity by way of a sum of squared sample differences.
(66) According to an embodiment the video decoder and/or video encoder is configured to determine the similarity at the one or more positions (at and/or around the respective search area location candidate C.sub.1 to C.sub.7) by determining the sum of squared sample differences between the current template area 110 and a co-shaped candidate template area C′.sub.1 to C′.sub.7 at the one or more positions C.sub.1 to C.sub.7 in the reference picture 12″, wherein the best matching search area location candidate is associated with a least sum of squared sample differences out of a set of sums of squared differences.
(67) Once the position of C is determined, borders of search regions in the search area can be decided or calculated as, for example, in
M=(2A.sub.1+1)+2A.sub.2+2A.sub.3, (1a) or more general:
M=(2A.sub.1+1)+2A.sub.2+2A.sub.3+2A.sub.4+ . . . +2A.sub.n, (1b) wherein n defines a number of search regions and is a positive integer.
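Equation (1b) can be evaluated directly; for the values used in the experiments reported below (A.sub.1=A.sub.2=A.sub.3=4) it yields M=25:

```python
def search_area_side(widths):
    """Side length M of the square search area from the per-region widths
    A_1..A_n, following eq. (1b): M = (2*A_1 + 1) + 2*A_2 + ... + 2*A_n."""
    a1, rest = widths[0], widths[1:]
    return (2 * a1 + 1) + sum(2 * a for a in rest)
```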
(68) In the following embodiments details described above are described in other words:
(69) An embodiment wherein the search area 130 is subdivided into search regions (Region 1 to Region 5), and the video decoder and/or video encoder is configured to select the predetermined search region out of the search regions (e. g. one of Region 1 to Region 5) based on a signalization in the data stream; and restrict the selection of the set of one or more predictor blocks 120′, by matching the current template area 110 against the search area 130, to the predetermined search region (e. g. one of Region 1 to Region 5).
(70) An embodiment, wherein the search area 130 is subdivided into the search regions (Region 1 to Region 5) so that a first search region (Region 1) is arranged in a middle of the search area 130 and further search regions (Region 2 to Region 5) are arranged in a manner surrounding the first search region (Region 1) (see
(71) An embodiment, wherein the search area 130 is subdivided into the search regions (Region 1 to Region 5) so that each of the further search regions (Region 2 to Region 5) extends circumferentially around the first region (Region 1) in an incomplete manner (see
(72) According to an embodiment the encoder and/or decoder can be configured to search for a best template match, by matching the current template area 110 against the search area 130, wherein a selection of the set of one or more predictor blocks can be based on the best template match. According to an embodiment, the proposed RTM algorithm uses, for example, a linear search method. For example, for each search region (e. g. region 1 to region 5), the search for the best template match starts from the starting position S and progresses towards the outer borders (see
(73) For the other regions, the search may be carried out in the same manner, however only in one direction. For Regions 2 and 4, the search progresses, for example, towards their bottom-right corner (see
(74) For a given region, e. g. region 3 according to
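The per-region linear search with retention of the k best matches might be sketched as follows (again with a hypothetical `template_at` accessor returning the reference template at a given integer-sample position):

```python
import heapq
import numpy as np

def ssd(a, b):
    d = np.asarray(a, dtype=np.int64) - np.asarray(b, dtype=np.int64)
    return int((d * d).sum())

def k_best_in_region(current_template, positions, template_at, k=4):
    """Linearly scan the integer-sample positions of one search region and
    return the k (error, position) pairs with the least SSD, in ascending
    error order."""
    scored = ((ssd(current_template, template_at(p)), p) for p in positions)
    return heapq.nsmallest(k, scored)
```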
(75) According to an embodiment there will be, for example, k and/or 2k predictors (e. g. for a uni-predictive or bi-predictive picture, respectively), i. e. k predictors for a P frame (110′ and/or 120′) and 2k predictors for a B frame (110 and/or 120).
(76) In other words the video decoder and/or video encoder is configured to decode/encode the current block 120 by determining a linear combination of the set of one or more predictor blocks, e. g. the set of one or more predictor blocks 120′ in
(77) As in a typical inter prediction method, the predictors 120′.sub.1 to 120′.sub.3 for the current luma block are found through the inter RTM method. The predictors for the chroma blocks are, for example, obtained by mapping the luma block predictors to that of the chroma blocks based on the chroma sub-sampling ratio. It should be noted here that, according to an embodiment, all search algorithms related to RTM mode are applied at integer-sample positions.
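The luma-to-chroma mapping for a given chroma sub-sampling ratio can be sketched as follows (the exact coordinate mapping of predictor positions is an assumption; only the division by the sub-sampling factors is taken from the text):

```python
def luma_to_chroma(luma_pos, sub_x=2, sub_y=2):
    """Map a luma predictor position to the corresponding chroma position
    for a given chroma sub-sampling ratio (sub_x = sub_y = 2 for 4:2:0)."""
    x, y = luma_pos
    return (x // sub_x, y // sub_y)
```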
(78) The inter RTM mode finds k predictors 120′.sub.1 to 120′.sub.3 from each reference list. The value of k is, for example, determined based on two criteria. First, it should be greater than 1, since a template matching approach using multiple predictors 120′.sub.1 to 120′.sub.3 typically improves the coding performance [14], [15]. Second, the value of k should be a power of 2 for ease of hardware implementation. Thus, the value of k is chosen to be 4 in an embodiment. Nevertheless it is also possible to have 3 predictors 120′.sub.1 to 120′.sub.3 like in
(79) According to an embodiment, let P.sub.mr be a general expression for a predictor 120′.sub.1 to 120′.sub.3, where r is a reference list index with r=0, 1 and m is an index of the predictor 120′.sub.1 to 120′.sub.3 from each list with 1≤m≤k. Optionally, let e.sub.mr be the SSD error associated with P.sub.mr such that e.sub.1r≤e.sub.2r≤e.sub.3r≤e.sub.4r. The proposed method sorts all the predictors 120′.sub.1 to 120′.sub.3 together based on their SSD errors, for example, in ascending order and discards those that have an error greater than a threshold error (see
(80) If P.sub.i is the set of sorted predictors (i. e. the set of one or more predictor blocks), then the prediction signal is given by (3) where w.sub.il is the corresponding weight of the predictors 120′.sub.1 to 120′.sub.3 and they are decided based on the table in
(81)
(82) In the following embodiments details described above are described in other words:
(83) According to an embodiment the video decoder and/or video encoder is configured to decode/encode the current block 120 based on an average, such as a normal average, a weighted average, or a combination of both, of the set of one or more predictor blocks P.sub.mr, or based on an average of a subset P.sub.i out of the set of one or more predictor blocks P.sub.mr, with the subset P.sub.i excluding those predictor blocks P from the set of one or more predictor blocks P.sub.mr whose reference template area 110′ matches the current template area 110 more than a predetermined threshold e.sub.thres worse than that of a best matching predictor block (e. g. P.sub.1 of the sorted predictors) in the set of the one or more predictor blocks P.sub.mr.
(84) According to an embodiment the video decoder and/or video encoder is configured to, in predictively decoding/encoding the current block 120 from/into a data stream based on the set of one or more predictor blocks P.sub.mr, sort and weight the set of the one or more predictor blocks P.sub.mr (e.g. to determine the sorted set of one or more predictor blocks P.sub.1 to P.sub.n and to weight this the sorted set of one or more predictor blocks P.sub.1 to P.sub.n) based on a similarity of a reference template area 110′ of each of the predictor blocks P.sub.i and the current template area 110, and determine the current block P.sub.final 120 according to
(85) P.sub.final=w.sub.1l·P.sub.1+w.sub.2l·P.sub.2+ . . . +w.sub.nl·P.sub.n, (3)
wherein P.sub.i is a predictor block of the sorted set of one or more predictor blocks P.sub.1 to P.sub.n, wherein w.sub.il is a weighting factor applied to the predictor block P.sub.i, wherein i is an index associated with a position of the predictor block P.sub.i in the sorted set of one or more predictor blocks P.sub.1 to P.sub.n, wherein n is an index associated with a total number of predictor blocks P.sub.i in the sorted set of one or more predictor blocks P.sub.1 to P.sub.n, and wherein l is an index associated with the number of predictor blocks P.sub.i in the sorted set of one or more predictor blocks P.sub.1 to P.sub.n, whose reference template area matches with the current template area more than a predetermined threshold e.sub.thres, wherein the predetermined threshold e.sub.thres is based on the highest similarity. According to an embodiment l=n.
(86) In other words let the predictors from a first reference list be P.sub.10, P.sub.20, P.sub.30, . . . , P.sub.k0 with SSD errors e.sub.10, e.sub.20, e.sub.30, . . . , e.sub.k0 respectively, where e.sub.10≤e.sub.20≤e.sub.30≤ . . . ≤e.sub.k0. Similarly, the predictors from a second reference list are, for example, P.sub.11, P.sub.21, P.sub.31, . . . , P.sub.k1 with SSD errors e.sub.11, e.sub.21, e.sub.31, . . . , e.sub.k1 respectively, where e.sub.11≤e.sub.21≤e.sub.31≤ . . . ≤e.sub.k1. Thus according to an embodiment the set of one or more predictors can comprise more than one list of predictors, wherein each list can comprise more than one predictor. According to an embodiment each of t lists of predictors can represent predictors selected in one reference picture out of t numbers of reference pictures. According to an embodiment each of t lists of predictors can represent predictors selected in one search region out of t numbers of search regions in the search area 130 of the reference picture 12″.
(87) The final prediction signal of the current block 120 is, for example, the weighted average of the predictors given by,
(88) P=w.sub.10·P.sub.10+w.sub.20·P.sub.20+ . . . +w.sub.k0·P.sub.k0+w.sub.11·P.sub.11+w.sub.21·P.sub.21+ . . . +w.sub.k1·P.sub.k1, (4)
where w.sub.10, w.sub.20, . . . , w.sub.k0 are the weights associated with the predictors P.sub.10, P.sub.20, . . . , P.sub.k0 respectively and w.sub.11, w.sub.21, . . . , w.sub.k1 are the weights associated with the predictors P.sub.11, P.sub.21, . . . , P.sub.k1 respectively.
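The weighted average of eq. (4) may be sketched as follows (predictors and weights from both reference lists concatenated into flat sequences; helper names are illustrative):

```python
import numpy as np

def weighted_prediction(predictors, weights):
    """Final prediction signal as the weighted sum of the predictor blocks
    from both reference lists, following eq. (4): P = sum_m w_m * P_m."""
    p = np.zeros_like(np.asarray(predictors[0], dtype=np.float64))
    for block, w in zip(predictors, weights):
        p += w * np.asarray(block, dtype=np.float64)
    return p
```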
(89) According to the embodiment of
(90) The proposed method produces, for example, individual prediction signals from each region of the search area. The encoder, for example, decides the best prediction signal based on a rate-distortion optimization algorithm. The region that gives the best prediction is considered as the best region. The information related to the best region is sent to the decoder. According to an aspect of the invention, the decoder reads the information related to the chosen region from the bitstream (i. e. data stream) and repeats the search algorithm only in that region.
(91) The final prediction signal of the inter RTM method is given, for example, by equation (3) or equation (4). The authors have done some investigation on the value of the weights for the particular case of k=4. Based on that an adaptive averaging approach is proposed.
(92) Since k=4, the predictors from the first reference list are P.sub.10, P.sub.20, P.sub.30, P.sub.40 and that from the second reference list are P.sub.11, P.sub.21, P.sub.31, P.sub.41. The predictors from each list are already sorted based on their SSD error values. The proposed method sorts, for example, all the predictors (available from both lists, i. e. the complete set of one or more predictor blocks) based on their SSD error value in ascending order and discards those that have an error value greater than a threshold (see e. g.
(93) If, for example, the sorted predictors are P.sub.1, P.sub.2, P.sub.3, P.sub.4, P.sub.5, P.sub.6, P.sub.7, P.sub.8 and the weights associated with them are w.sub.18, w.sub.28, w.sub.38, w.sub.48, w.sub.58, w.sub.68, w.sub.78, w.sub.88 respectively, then according to eq. (3) or (4) (l=8, for 8 predictors in the set of one or more predictor blocks),
(94) P.sub.final=w.sub.18·P.sub.1+w.sub.28·P.sub.2+ . . . +w.sub.88·P.sub.8
(95) Let the number of sorted predictors with SSD error value less than or equal to e.sub.thres be l. Then the weights are assigned, for example, with values according to the table in
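A sketch of the thresholding step; since the weight table itself is given in a figure not reproduced here, equal weights 1/l over the kept predictors are assumed purely for illustration:

```python
def adaptive_average(sorted_predictors, sorted_errors, e_thres):
    """Keep only the sorted predictors whose SSD error is <= e_thres, then
    average the remaining l predictors with equal weights 1/l (an assumed
    weight choice; the patent assigns weights from a table depending on l)."""
    kept = [p for p, e in zip(sorted_predictors, sorted_errors) if e <= e_thres]
    l = len(kept)
    if l == 0:
        return None
    # Sample-wise average over the kept predictor blocks
    return [sum(vals) / l for vals in zip(*kept)]
```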
(96) For example, DCT-II and DST-VII transforms or inverse-transforms are used on inter RTM residual blocks by the encoder or decoder respectively. The transform choice is, for example, based on the block size irrespective of the channel type as mentioned in the table of
(97) According to an embodiment the region-based template matching method is added to the bitstream, for example, as an alternative merge mode, an RTM mode 400, which the encoder may choose among one or more other modes. In the subsequently explained embodiment, besides the RTM mode 400, these are a normal merge mode 300 and a normal inter-prediction mode 200. Details are described below. In the subsequently explained embodiment, an RTM merge flag 350 is transmitted after the conventional merge flag 250 to distinguish between the available inter-prediction modes. The syntax elements (or the signaling) for the proposed mode are described in
(98) According to an embodiment the encoder calculates the rate-distortion (RD) cost for each region and compares with that of other inter methods. The method that gives the minimum cost will, for example, be finally applied to the current block 120, by an encoder as described herein, and the information related to this mode will be sent to a decoder, as described herein, for reconstruction. The commonly used cost function J is defined as
J=D+λR, (2)
where D is a distortion between an original block and the predicted blocks 120′.sub.1 to 120′.sub.3, R is a number of bits associated with the method and λ is a Lagrange parameter that determines a trade-off between D and R [1]. Alternatively a constrained rate or distortion minimization can be solved to calculate the rate-distortion (RD) cost for each region or any other known algorithm can be applied.
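Equation (2) and the resulting mode decision may be sketched as follows (the helper names are illustrative):

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R, eq. (2)."""
    return distortion + lam * rate_bits

def best_mode(candidates, lam):
    """candidates: iterable of (mode_name, D, R) tuples; return the name of
    the mode with minimum cost J."""
    return min(candidates, key=lambda m: rd_cost(m[1], m[2], lam))[0]
```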
(99) According to an embodiment, if the mode that has the lowest cost is inter RTM, then the index 136.sub.1 to 136.sub.5 of the corresponding region is transmitted in the bitstream/data stream to the decoder. The decoder searches for template matches optionally only in that region.
(100) According to an embodiment the video decoder is configured to read a merge flag 250 from the data stream. If the merge flag 250 is in a first state (e. g. 0 in
(101) If the merge flag is in a second state (e. g. 1 in
(102) If the region-based template matching merge flag (RTM merge flag 350) is in a second state (e. g. 1 in
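The flag-driven mode decision of the three preceding paragraphs may be sketched as follows (assuming, per the example states, that the first state of each flag is 0 and the second state is 1):

```python
def select_inter_mode(merge_flag, rtm_merge_flag=None):
    """Decode the inter mode from the two flags: merge_flag in its first
    state selects the normal inter-prediction mode; otherwise the RTM merge
    flag distinguishes the normal merge mode from the RTM mode."""
    if merge_flag == 0:
        return "inter"   # normal inter-prediction mode 200
    if rtm_merge_flag == 0:
        return "merge"   # normal merge mode 300
    return "rtm"         # RTM mode 400 (region index follows in the stream)
```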
(103) According to an embodiment the proposed inter method can be implemented on a VVC test model (e. g. VTM version 1.0) [17], [18]. Simulations are, for example, carried out for JVET common test conditions [19] and TGM YUV 420 class of HEVC screen content coding common test conditions [20], with random access (RA) configuration (encoder_randomaccess_vtm). The experimental results for RA configurations for A.sub.1=A.sub.2=A.sub.3=4, i. e. M=25, are tabulated in the table in
(104) According to an embodiment the diagram of
(105) According to an embodiment the diagram of
(106) According to an embodiment the decoder run-time can also be seen in terms of decoder complexity. Thus the decoder run-time can be synonymous with the decoder complexity.
(107) According to an embodiment the proposed inter prediction method is implemented on NextSoftware, which is an alternative to JEM [21]. Simulations are carried out for JVET common test conditions, explained in [22], with random access (RA) configuration (encoder_randomaccess_qtbt10). The Quadtree plus Binary Tree (QTBT) tool is, for example, turned on for the tests. The QTBT is a block structuring tool which offers square and rectangle shaped blocks for coding [11]. The experimental results for RA configurations for A.sub.1=A.sub.2=A.sub.3=4, i. e. M=25, are tabulated in the table in
(108) The herein described invention proposes, for example, a decoder-side motion vector derivation method using region-based template matching. The proposed algorithm partitions, for example, the search area 130 into five regions unlike conventional template matching methods. The regions are, for example, clearly defined such that they can give independent prediction signals. Given a region at the decoder, the template matching search algorithm is, for example, carried out only in that region. A linear combination of the blocks (i. e. the set of one or more predictor blocks) found from template matching is, for example, the final prediction signal. For a specific set of region sizes 132.sub.1 to 132.sub.5, the inter region-based template matching method achieves, for example, up to −8.26% or up to −9.57% BD-rate gain with an overall gain (e. g. a Bjøntegaard Delta bit rate gain) of −3.00% or −3.15% respectively for random access configuration. According to an embodiment the inter region-based template matching method achieves, for example, a BD-rate gain up to 5% to 12%, 6% to 11% or 7% to 10% with an overall gain of 1% to 6%, 2% to 5% or 3% to 4% for random access configuration. The decoder and encoder run-time are, for example, 131% or 132% and 184% or 201% respectively. According to an embodiment the decoder run-time is, for example, in a range of 100% to 160%, 120% to 150% or 125% to 135% and the encoder run-time is, for example, in a range of 160% to 220%, 170% to 210% or 180% to 205%. The experimental results indicate that the proposed method can achieve a better trade-off between coding efficiency and decoding time than conventional template matching based decoder-side motion vector derivation methods. Sub-sample refinement for inter RTM will be considered as a subject of further study. The proposed method can be tuned to different combinations of coding gain and run-time by varying the size of the regions.
In other words the region sizes can be varied such that different trade-offs between coding gain and complexity can be obtained.
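The region-based template matching described above can be sketched in code. The following is a minimal, illustrative Python sketch under assumed parameters, not the patent's implementation: the block size, template width, region geometry and the helper names (partition_search_area, template_ssd, rtm_predict) are chosen for clarity only. The search area is split into a middle region surrounded by four further regions, the search runs at integer-pel positions within one signalled region, and the prediction is the average (a simple linear combination) of the k best-matching blocks.

```python
import numpy as np

BW, BH, TW = 4, 4, 2  # block width/height and template width (illustrative)

def partition_search_area(cx, cy, half=8, inner=3):
    """Split a (2*half+1)^2 search area centred at (cx, cy) into five
    regions: region 0 in the middle, regions 1-4 surrounding it."""
    regions = [[] for _ in range(5)]
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            if max(abs(dx), abs(dy)) <= inner:
                r = 0                       # middle region
            elif dy < -inner:
                r = 1                       # top strip
            elif dy > inner:
                r = 2                       # bottom strip
            elif dx < -inner:
                r = 3                       # left strip
            else:
                r = 4                       # right strip
            regions[r].append((cx + dx, cy + dy))
    return regions

def template_ssd(ref, cur_tmpl, x, y):
    """SSD between the current L-shaped template and the template of the
    candidate block whose top-left corner is (x, y) in the reference."""
    top = ref[y - TW:y, x - TW:x + BW]      # samples above the block
    left = ref[y:y + BH, x - TW:x]          # samples left of the block
    cand = np.concatenate([top.ravel(), left.ravel()]).astype(np.float64)
    return float(np.sum((cand - cur_tmpl) ** 2))

def rtm_predict(ref, cur_tmpl, region, k=4):
    """Search only inside the signalled region; the final prediction is
    a linear combination (here: the mean) of the k best-matching blocks."""
    ranked = sorted(region, key=lambda p: template_ssd(ref, cur_tmpl, *p))
    blocks = [ref[y:y + BH, x:x + BW].astype(np.float64)
              for (x, y) in ranked[:k]]
    return np.mean(blocks, axis=0)
```

Varying the half and inner parameters changes the region sizes and thereby the trade-off between coding gain and search complexity mentioned above.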
(109) According to an embodiment the herein described invention can be described by the following index terms:
(110) Video coding, H.265/HEVC, JVET, JEM, VVC, VTM, Inter prediction, Motion compensation, Template matching, Decoder-side motion vector derivation.
(111) The concept of inter RTM prediction is mainly explained herein for the case of n=5. However, the idea holds for any n greater than 0.
(112) The search algorithms (for deciding C and for the best template matches) in the given example are carried out at integer-pel positions. Nevertheless, they can also be applied at sub-pel positions.
(113) The template 110 in the given example has a width of 2 samples. However, this can be any value greater than 0. Besides, the template 110 can be broadly defined as the patch present in the immediate neighborhood of the current block 120, even though in the given example only the samples above and to the left of the block to be predicted are considered.
(114) The experimental results indicate that the regions of inter RTM can be down-sampled individually or all together to decrease the computational complexity, at the cost of some reduction in coding performance.
(115) The inter RTM method applies averaging to its k predictors, where k≥1. However, for the special case of screen content sequences, k=1 (i.e. no averaging) is found to be the best option.
(116) Further, it is found from the experimental results that varying the value of k based on the SSD error associated with P.sub.1 gives better coding performance.
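One plausible way to vary k based on the SSD error of P.sub.1, as suggested above, is sketched below. The function name adaptive_k and the thresholds are illustrative assumptions, not values from the description: the intuition is that a near-exact best match (typical for screen content) makes averaging unnecessary, while a noisy match benefits from averaging more predictors.

```python
def adaptive_k(ssd_best, n_template_samples, k_max=4, tau=2.0):
    """Illustrative rule: choose the number of averaged predictors k from
    the per-sample SSD of the best match P1 (thresholds are assumptions).
    A near-exact match disables averaging entirely."""
    err = ssd_best / n_template_samples
    if err < tau:
        return 1                # P1 already reliable: no averaging
    if err < 8 * tau:
        return k_max // 2       # moderately reliable: average fewer blocks
    return k_max                # noisy match: average the full set
```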
(117) Conventional inter prediction techniques such as sub-pel refinement, sub-block refinement, filtering, etc. can be applied as a post-processing stage of the inter RTM mode.
(118) Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
(119) The inventive encoded video signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
(120) Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
(121) Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
(122) Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
(123) Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
(124) In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
(125) A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
(126) A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
(127) A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
(128) A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
(129) In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
(130) While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
REFERENCES
(131)
[1] V. Sze, M. Budagavi, and G. J. Sullivan, High Efficiency Video Coding (HEVC): Algorithms and Architectures. Springer, 2014.
[2] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, July 2003.
[3] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, September 2012, pp. 1649-1668.
[4] K. Sugimoto, M. Kobayashi, Y. Suzuki, S. Kato, and C. S. Boon, "Inter frame coding with template matching spatio-temporal prediction," in IEEE International Conference on Image Processing (ICIP), Singapore, October 2004.
[5] Y. Suzuki, C. S. Boon, and T. K. Tan, "Inter frame coding with template matching averaging," in IEEE International Conference on Image Processing (ICIP), San Antonio, Tex., USA, October 2007.
[6] R. Wang, L. Huo, S. Ma, and W. Gao, "Combining template matching and block motion compensation for video coding," in International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 2010.
[7] S. Kamp, M. Evertz, and M. Wien, "Decoder side motion vector derivation for inter frame video coding," in IEEE International Conference on Image Processing (ICIP), San Diego, Calif., USA, October 2008.
[8] S. Kamp and M. Wien, "Decoder-side motion vector derivation for hybrid video inter coding," in IEEE International Conference on Multimedia and Expo (ICME), Suntec City, Singapore, July 2010.
[9] S. Kamp and M. Wien, "Decoder-side motion vector derivation for block-based video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, pp. 1732-1745, December 2012.
[10] S. Kamp, B. Bross, and M. Wien, "Fast decoder side motion vector derivation for inter frame video coding," in Picture Coding Symposium (PCS), Chicago, Ill., USA, May 2009.
[11] J. Chen, E. Alshina, G. J. Sullivan, J.-R. Ohm, and J. Boyce, "Algorithm description of joint exploration test model 7 (JEM 7)," in JVET-G1001, Turin, IT, July 2017.
[12] S. Esenlik, Y.-W. Chen, X. Xiu, A. Robert, X. Chen, T.-D. Chuang, B. Choi, J. Kang, and N. Park, "CE9: Summary report on decoder side MV derivation," in JVET-K0029, Ljubljana, SI, July 2018.
[13] L.-Y. Wei and M. Levoy, "Fast texture synthesis using tree-structured vector quantization," in Proc. ACM SIGGRAPH, vol. 34, May 2000.
[14] G. Venugopal, P. Merkle, D. Marpe, and T. Wiegand, "Fast template matching for intra prediction," in IEEE International Conference on Image Processing (ICIP), Beijing, China, 2017, pp. 1692-1696.
[15] T. K. Tan, C. S. Boon, and Y. Suzuki, "Intra prediction by averaged template matching predictors," in Proc. CCNC 2007, Las Vegas, Nev., USA, 2007, pp. 405-409.
[16] D. Marpe, H. Schwarz, and T. Wiegand, "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, July 2003, pp. 620-636.
[17] B. Bross, "Versatile video coding (draft 1)," in JVET-J1001, San Diego, US, April 2018.
[18] J. Chen and E. Alshina, "Algorithm description for versatile video coding and test model 1 (VTM 1)," in JVET-J1002, San Diego, US, April 2018.
[19] F. Bossen, J. Boyce, K. Suehring, X. Li, and V. Seregin, "JVET common test conditions and software reference configurations," in JVET-K1010, Ljubljana, SI, July 2018.
[20] H. Yu, R. Cohen, K. Rapaka, and J. Xu, "Common test conditions for screen content coding," in JCTVC-U1015, Warsaw, PL, June 2015.
[21] M. Albrecht, C. Bartnik, S. Bosse, J. Brandenburg, B. Bross, J. Erfurt, V. George, P. Haase, P. Helle, C. Helmrich, A. Henkel, T. Hinz, S. de Luxan Hernandez, S. Kaltenstadler, H. Kirchhoffer, C. Lehmann, W.-Q. Lim, J. Ma, D. Maniry, D. Marpe, P. Merkle, T. Nguyen, J. Pfaff, J. Rasch, R. Rischke, C. Rudat, M. Schaefer, T. Schierl, H. Schwarz, M. Siekmann, R. Skupin, B. Stallenberger, J. Stegemann, K. Suehring, G. Tech, G. Venugopal, S. Walter, A. Wieckowski, T. Wiegand, and M. Winken, "Description of SDR, HDR, and 360 video coding technology proposal by Fraunhofer HHI," in JVET-J0014-v1, San Diego, US, April 2018.
[22] A. Segall, V. Baroncini, J. Boyce, J. Chen, and T. Suzuki, "Joint call for proposals on video compression with capability beyond HEVC," in JVET-H1002 (v6), Macau, China, October 2017.