Mixed domain collaborative in-loop filter for lossy video coding
10827200 · 2020-11-03
Assignee
Inventors
- Victor Alexeevich Stepin (Moscow, RU)
- Roman Igorevich Chernyak (Moscow, RU)
- Ruslan Faritovich Mullakhmetov (Moscow, RU)
CPC classification
H04N19/126
ELECTRICITY
H04N19/154
ELECTRICITY
International classification
H04N19/126
ELECTRICITY
H04N19/154
ELECTRICITY
H04N19/635
ELECTRICITY
Abstract
A video coding apparatus for encoding or decoding a frame of a video, the video coding apparatus comprising a frame reconstruction unit configured to reconstruct the frame, a parameter determination unit configured to determine one or more filter parameters, based on one or more first parameters which are based on the reconstructed frame and one or more second parameters which are based on codec signaling information, and a mixed-domain filtering unit configured to filter in a frequency domain and a pixel domain the reconstructed frame based on the determined filter parameters to obtain a filtered frame.
Claims
1. A video coding apparatus for encoding or decoding a frame of a video, the video coding apparatus comprising: a computer-readable storage medium storing program code; and a processor, wherein, when executed by the processor, the program code causes the processor to: reconstruct the frame; determine one or more filter parameters based on one or more first parameters which are based on the reconstructed frame and one or more second parameters which are based on codec signaling information; filter in a frequency domain and a pixel domain the reconstructed frame based on the determined one or more filter parameters to obtain a filtered frame; estimate the original frame from the reconstructed frame and determine the one or more first parameters based on the estimated original frame; determine the one or more filter parameters by: partitioning the estimated original frame into blocks; and for each of the blocks: determine a cluster of patches that are similar to the block; 2D-transform the cluster of patches to obtain transformed patches; and determine the one or more first parameters based on the transformed patches; and determine, for each of the blocks, the one or more filter parameters based on the transformed patches by: regroup elements of the transformed patches to obtain a matrix T.sub.i, wherein each row of the matrix T.sub.i comprises frequency components with same spatial frequencies; and transform the matrix T.sub.i to obtain a transformed matrix tf.sub.vw.sup.i, wherein each row of the matrix tf.sub.vw.sup.i is a 1D transform of a corresponding row of matrix T.sub.i.
2. The video coding apparatus of claim 1, wherein the program instructions further cause the processor to store the filtered frame in a decoder picture buffer for next frame prediction and to output the filtered frame.
3. The video coding apparatus of claim 1, wherein the program instructions further cause the processor to determine where filtering should be implemented based on a weighted function of a prediction improvement and an output video degradation.
4. The video coding apparatus of claim 1, wherein the program instructions further cause the processor to: store a plurality of reconstructed frames in a decoded picture buffer; and determine the one or more first parameters based on one or more frames of the decoded picture buffer.
5. The video coding apparatus of claim 1, wherein the program instructions further cause the processor to: determine a quantization noise value from the codec signaling information; and determine the one or more second parameters based on the derived quantization noise value.
6. The video coding apparatus of claim 1, wherein the program instructions further cause the processor to, for each of a set of blocks of the reconstructed frame: determine a set of patches that are similar to the block; 2D-transform the patches into the frequency domain to obtain frequency-domain patches; perform collaborative filtering of the frequency-domain patches in the frequency domain to obtain transformed frequency-domain patches; inverse 2D transform the filtered transformed frequency-domain patches in the frequency domain to obtain filtered patches; and perform collaborative filtering of the filtered patches in the pixel domain along pixel patches from different sets of patches with the same spatial coordinates.
7. The video coding apparatus of claim 6, wherein the program instructions further cause the processor to perform, for each of the blocks, the collaborative filtering based on the transformed patches by: regrouping elements of the transformed patches to obtain a matrix T.sub.i, wherein each row of the matrix T.sub.i comprises frequency components with same spatial frequencies; and transforming the matrix T.sub.i to obtain a transformed matrix, wherein each row of the matrix is a 1D transform of a corresponding row of matrix T.sub.i.
8. The video coding apparatus of claim 1, wherein for 2D-transforming the cluster of patches to obtain transformed patches, the program instructions further cause the processor to at least one of: perform the 2D transforming using a Haar wavelet transform; and perform the 1D transformation using a Hadamard transform.
9. The video coding apparatus of claim 1, wherein the program instructions further cause the processor to at least one of: use an adaptive_filtering_flag flag to indicate that a frame should be filtered; use a frame_level_usage_flag flag to indicate that the entire reconstructed frame should be filtered; use a macroblock size field to indicate a macroblock size which should be used for the filtering; and use a use_filtered_mb_flag flag to indicate whether a filtered macroblock should be used in the method.
10. A system comprising: the video coding apparatus according to claim 1, wherein the video coding apparatus further comprises: a video encoding apparatus; and a video decoding apparatus, wherein: the video encoding apparatus transfers a bitstream to the video decoding apparatus; and the program instructions further cause the processor to determine the one or more filter parameters for the video decoding apparatus in the same way as the one or more filter parameters are determined for the video encoding apparatus.
11. A method for video coding, the method comprising: reconstructing a frame of a video; determining one or more filter parameters based on one or more first parameters which are based on the reconstructed frame and one or more second parameters which are based on codec signaling information; filtering in a frequency domain and in a pixel domain the reconstructed frame based on the determined filter parameters to obtain a filtered frame; estimating the original frame from the reconstructed frame and determining the one or more first parameters based on the estimated original frame; determining the one or more filter parameters by: partitioning the estimated original frame into blocks; and for each of the blocks: determining a cluster of patches that are similar to the block; 2D-transforming the cluster of patches to obtain transformed patches; and determining the one or more first parameters based on the transformed patches; and determining, for each of the blocks, the one or more filter parameters based on the transformed patches by: regrouping elements of the transformed patches to obtain a matrix T.sub.i, wherein each row of the matrix T.sub.i comprises frequency components with same spatial frequencies; and transforming the matrix T.sub.i to obtain a transformed matrix tf.sub.vw.sup.i, wherein each row of the matrix tf.sub.vw.sup.i is a 1D transform of a corresponding row of matrix T.sub.i.
12. A non-transitory computer-readable storage medium storing program code, the program code comprising instructions that, when executed by a processor, cause the processor to carry out a method for video coding comprising the steps of: reconstructing a frame of a video; determining one or more filter parameters based on one or more first parameters which are based on the reconstructed frame and one or more second parameters which are based on codec signaling information; filtering in a frequency domain and in a pixel domain the reconstructed frame based on the determined filter parameters to obtain a filtered frame; estimating the original frame from the reconstructed frame and determining the one or more first parameters based on the estimated original frame; determining the one or more filter parameters by: partitioning the estimated original frame into blocks; and for each of the blocks: determining a cluster of patches that are similar to the block; 2D-transforming the cluster of patches to obtain transformed patches; and determining the one or more first parameters based on the transformed patches; and determining, for each of the blocks, the one or more filter parameters based on the transformed patches by: regrouping elements of the transformed patches to obtain a matrix T.sub.i, wherein each row of the matrix T.sub.i comprises frequency components with same spatial frequencies; and transforming the matrix T.sub.i to obtain a transformed matrix tf.sub.vw.sup.i, wherein each row of the matrix tf.sub.vw.sup.i is a 1D transform of a corresponding row of matrix T.sub.i.
13. The video coding apparatus of claim 1, wherein to determine, for each of the blocks, the one or more filter parameters based on the transformed patches, the program instructions further cause the processor to determine, for each of the blocks, the one or more filter parameters based on the transformed patches by: determining the one or more filter parameters g.sub.v,w.sup.i as: g.sub.v,w.sup.i(Ω)=(tf.sub.vw.sup.i(Ω)).sup.2/((tf.sub.vw.sup.i(Ω)).sup.2+N.sup.2), wherein Ω is a column number in the matrix tf.sub.vw.sup.i and N is the quantization noise value.
14. The video coding apparatus of claim 7, wherein to perform, for each of the blocks, the collaborative filtering based on the transformed patches, the program instructions further cause the processor to perform, for each of the blocks, the collaborative filtering based on the transformed patches by: performing filtering by multiplying each element of matrix tf.sub.vw.sup.i by a filter frequency impulse response g(Ω).sub.vw.sup.i, wherein Ω is a column number in matrix tf.sub.vw.sup.i and the spatial frequencies v,w correspond to the j-th row of matrix tf.sub.vw.sup.i.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) To illustrate the technical features of embodiments of the present invention more clearly, the accompanying drawings provided for describing the embodiments are introduced briefly in the following. The accompanying drawings in the following description show merely some embodiments of the present invention; modifications to these embodiments are possible without departing from the scope of the present invention as defined in the claims.
DETAILED DESCRIPTION
(16) The video coding apparatus comprises a frame reconstruction unit 110, a parameter determination unit 120 and a mixed-domain filtering unit 130.
(17) The frame reconstruction unit 110 is configured to reconstruct the frame.
(18) The parameter determination unit 120 is configured to determine one or more filter parameters, based on one or more first parameters which are based on the reconstructed frame and one or more second parameters which are based on codec signaling information.
(19) The mixed-domain filtering unit 130 is configured to filter in a frequency domain and a pixel domain the reconstructed frame based on the determined filter parameters to obtain a filtered frame.
(20) The video coding apparatus 100 can be an encoder and/or a decoder.
(23) The method comprises a first step of reconstructing 310 a frame of the video.
(24) The method comprises a second step of determining 320 one or more filter parameters based on one or more first parameters which are based on the reconstructed frame and one or more second parameters which are based on codec signaling information.
(25) The method comprises a third step of filtering 330 in a frequency domain and in a pixel domain the reconstructed frame based on the determined filter parameters to obtain a filtered frame.
(26) The method 300 of
(28) Similar to ALF, the parameter estimation block 410 calculates filter parameters. But in contrast to ALF, the filter parameters are calculated without knowledge about the source (original) images. The filter parameters are estimated based on two groups of input parameters. The first group of input parameters is estimated based on the reconstructed frame and the second group of input parameters is derived from service codec parameters which are already transferred from the encoder to the decoder in a general hybrid video codec.
(29) The filter parameters can be estimated in the decoder, and therefore the filter parameters do not need to be transferred from the encoder to the decoder (in contrast to ALF). In ALF, the parameter estimation block calculates the pixel-domain impulse response, but in the loop filter 400 the parameter estimation block estimates the frequency impulse response, because base filtering is performed in the frequency domain. The frequency-domain implementation allows building a more efficient non-linear frequency-domain filter.
(30) In contrast to ALF, which performs local filtering of the reconstructed image in the pixel domain, non-local collaborative filtering of the reconstructed image is performed in a mixed domain (spatial frequency and pixel domain). Such an approach allows more efficient usage of spatial redundancy. Initial filtering is performed in the frequency domain, and the final averaging is performed in the pixel domain.
(31) In contrast to ALF, the loop filter 400 does not perform matching (e.g. correlation) between the filtered and original videos in order to estimate the filter parameters, and therefore the filter 400 can suppress input sensor noise and improve prediction quality. Input noise, however, is a useful signal for the end user. Therefore, the application map block 430 determines, during the RDO process, the areas where filtering should be applied. The application map data are then provided to an entropy coder 440. The improvement in prediction and the removal of quantization noise from the decoded video is a benefit, while the degradation of the filtered decoded image is a drawback. If the benefit is significantly greater than the drawback, then filtering is applied. Otherwise, the reconstructed video is used for prediction and as output for the end user. Benefit and drawback are estimated by a cost function which is a weighted sum of the square error between the original image and the filtered image and the number of bits for compressed image transmission.
(32) If there is no matching (correlation) between the filtered and original signals during the filter parameter estimation, then the filter 400 can suppress sensor noise. On the one hand, if the filter suppresses the sensor noise, it will improve the prediction quality. On the other hand, it will decrease the quality of the decoded signal, because sensor noise is a useful signal for the end user. The application map estimates which effect is bigger: if, in terms of the correlation between the filtered and original image, it is better to improve the prediction, then filtering is performed. Otherwise it is better not to perform filtering.
(34) The noise estimator 510 derives the sensor and quantization noise variance as a function of the hybrid codec quantization parameter (QP).
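The patent does not spell out the QP-to-variance mapping; a minimal sketch under a common assumption is the H.264/HEVC-style quantizer step size Δ = 2^((QP−4)/6) combined with the uniform-quantization noise variance Δ²/12 (both the step-size formula and the function name are illustrative, not the patent's exact model):

```python
def quantization_noise_variance(qp: int) -> float:
    """Estimate quantization noise variance from the codec QP.

    Assumed model: H.264/HEVC-style step size delta = 2**((qp - 4) / 6)
    and uniform-quantization noise variance delta**2 / 12.
    """
    step = 2.0 ** ((qp - 4) / 6.0)
    return step * step / 12.0
```

With this model the variance grows monotonically with QP, which matches the intuition that coarser quantization injects more noise.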
(35) The original image estimator 520 estimates the original video (source video) from the reconstructed (decoded) video. Only the reconstructed video and the noise variance are used for the source video estimation.
(36) The filter parameter estimator 530 estimates the collaborative filter parameters based on the source image estimation and the noise variance derived from the encoder service parameters. In case the mixed domain collaborative filtering block (see
(38) As shown in
(39) The image partitioning block 610 generates a set of macroblocks (macroblock set) which covers the source frame estimate.
(40) Then, for each macroblock from this set, the block matching block 620 finds the k nearest blocks using a minimum-square-error (MSE) metric. The patches found are grouped into clusters. For each patch from each cluster, a 2D transform is performed in the 2D transform block 630.
(41) For each group of frequency domain patches (each frequency domain patch is the 2D spectrum of a pixel domain patch), the collaborative filter parameters are calculated in the parameters calculator block 640. For the particular case of the Wiener collaborative filter, the frequency impulse response is calculated in the parameters calculator block 640.
(43) Each macroblock will serve as a reference macroblock. Then, for each reference macroblock, the k nearest blocks are found using the MSE metric during the block matching procedure P.sub.i=BlockMatch(Ŝ, b.sub.i)={b.sub.i, p.sub.i.sup.0, p.sub.i.sup.1, p.sub.i.sup.2, . . . , p.sub.i.sup.k-1}, where Ŝ is the source frame estimation from the reconstructed frame and p.sub.i.sup.j is the j-th patch corresponding to the reference macroblock b.sub.i.
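The BlockMatch( ) procedure described above can be sketched as follows. This is an illustrative exhaustive search in a fixed window; the function name, window size, and search strategy are assumptions, not the patent's exact algorithm:

```python
import numpy as np

def block_match(frame: np.ndarray, ref_xy, size: int, k: int, search: int = 8):
    """Collect the reference block plus its k nearest patches (MSE metric).

    Sketch of P_i = BlockMatch(frame, b_i): exhaustive search in a
    (2*search+1)^2 window around the reference block at ref_xy.
    """
    y0, x0 = ref_xy
    ref = frame[y0:y0 + size, x0:x0 + size]
    h, w = frame.shape
    candidates = []
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            if (dy, dx) == (0, 0) or y < 0 or x < 0 or y + size > h or x + size > w:
                continue
            patch = frame[y:y + size, x:x + size]
            mse = float(np.mean((patch - ref) ** 2))
            candidates.append((mse, patch))
    candidates.sort(key=lambda c: c[0])  # k nearest blocks by MSE
    # Cluster P_i = {b_i, p_i^0, ..., p_i^{k-1}}
    return [ref] + [p for _, p in candidates[:k]]
```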
(44) In the next stage, a 2D transform is performed for each pixel patch (block of pixels) from the pixel domain cluster P.sub.i. The frequency domain cluster F.sub.i, comprising the 2D spectra of the pixel domain patches from the pixel domain cluster P.sub.i, is used for estimating the collaborative filter parameters (these filter parameters are used for filtering all patches from P.sub.i). In the general case, the filter parameters are a function of the frequency domain cluster and the noise variance, G.sub.i=Func(F.sub.i, N).
(45) For the particular case in which a Wiener collaborative filter is used, the frequency impulse response is determined by the following procedure. In the first step, the StackTransform( ) procedure is performed for each frequency domain cluster F.sub.i.
(47) The following scanning rule is used: each row of matrix T.sub.i consists of frequency components from different patches of the same frequency domain cluster F.sub.i with the same spatial frequencies [v, w]:
(48) T.sub.i[v,w]=[F.sub.i.sup.0(v,w), F.sub.i.sup.1(v,w), . . . , F.sub.i.sup.k-1(v,w)],
where nn is the total number of pixels in the respective macroblock, and k is the total number of patches in the frequency domain cluster F.sub.i. In the last step of the StackTransform( ) procedure, the output matrix TF.sub.i is created. Each row of the output matrix TF.sub.i is a 1D transform 820 of the corresponding row of the matrix T.sub.i.
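The StackTransform( ) procedure can be sketched as below: the k 2D spectra of one cluster are regrouped into T.sub.i (one row per spatial frequency [v, w]), and each row is passed through a 1D Hadamard transform, which claim 8 names as one option for the 1D transformation. The unnormalized Hadamard construction and the requirement that k be a power of two are implementation assumptions:

```python
import numpy as np

def stack_transform(spectra):
    """StackTransform() sketch.

    spectra: list of k arrays of shape (n, n), the 2D spectra of the patches
    in one frequency domain cluster F_i. Row [v*n + w] of T_i holds the
    [v, w] frequency component of every patch; each row of TF_i is the 1D
    Hadamard transform of the matching row of T_i.
    """
    k = len(spectra)
    # T_i: nn rows (spatial frequencies [v, w]), k columns (patches)
    T = np.stack([s.reshape(-1) for s in spectra], axis=1)
    # Unnormalized Hadamard matrix of order k (k must be a power of two)
    H = np.array([[1.0]])
    while H.shape[0] < k:
        H = np.block([[H, H], [H, -H]])
    TF = T @ H.T  # each row of TF_i is the 1D transform of a row of T_i
    return T, TF
```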
(49) Then the frequency impulse response matrix (with dimension nn*k) of the Wiener collaborative filter is calculated based on the elements of the TF.sub.i matrix. The elements g.sub.vw.sup.i(Ω) of the frequency impulse response matrix G.sup.i are determined by the following equation:
(50) g.sub.vw.sup.i(Ω)=(tf.sub.vw.sup.i(Ω)).sup.2/((tf.sub.vw.sup.i(Ω)).sup.2+N.sup.2),
where Ω is the column index of the matrix TF.sub.i and also of the matrix G.sup.i, and wherein the pair [v,w] of spatial frequencies v and w serves as the row index of the matrix TF.sub.i and also of the matrix G.sup.i. Each row of the matrix G.sup.i is an individual frequency impulse response corresponding to the spatial frequencies [v, w]. Given the spatial frequencies [v, w], the column index Ω determines the filter gain for the Ω-th coefficient of the 1D transform along different patches from one cluster.
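The element-wise gain computation can be sketched as follows. The standard Wiener shrinkage rule g = tf²/(tf² + N²) is assumed here; the patent's exact expression may differ in its noise term:

```python
import numpy as np

def wiener_gain(TF: np.ndarray, noise: float) -> np.ndarray:
    """Per-element Wiener gain sketch: g = tf^2 / (tf^2 + N^2).

    TF is the stacked-transform matrix TF_i (rows indexed by spatial
    frequency [v, w], columns by the 1D-transform index Omega); noise is
    the derived quantization noise value N. Gains lie in [0, 1): strong
    coefficients pass almost unchanged, weak ones are attenuated.
    """
    power = TF ** 2
    return power / (power + noise ** 2)
```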
(52) As in the filter parameter estimator block, the image partitioning block 910 creates a set of macroblocks (macroblock set) which covers the reconstructed frame. Then, for each reference macroblock from this set, the k nearest blocks are found by the block matching block 920 using the MSE metric. In the next step, the spatial patches found are combined into a pixel domain cluster corresponding to the reference macroblock.
(53) The 2D transform block 930 applies a 2D transform to each patch in the chosen pixel domain cluster and produces a cluster in the frequency domain which consists of 2D spectra of corresponding pixel domain patches. The collaborative frequency domain filter 940 performs collaborative filtering of the 2D spectrum of the pixel domain patches using filter parameters calculated in the previous step. The inverse 2D transform block 950 transforms the filtered frequency-domain patches back to the pixel domain. The pixel-based collaborative filter 960 then performs final averaging of pixel domain patches corresponding to the reference macroblock.
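The transform/filter/inverse-transform chain of blocks 930-950 can be sketched for one cluster as below. An orthonormal DCT-II is used purely for illustration (the patent names a Haar wavelet transform as one option for the 2D transform), and the per-patch gain arrays stand in for the collaborative gains G.sub.i:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (illustrative transform choice)."""
    C = np.array([[np.cos(np.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
                  for u in range(n)])
    C[0] *= 1.0 / np.sqrt(n)
    C[1:] *= np.sqrt(2.0 / n)
    return C

def filter_cluster(patches, gains):
    """Blocks 930-950 sketch: 2D-transform each patch, scale every spectral
    coefficient by its filter gain, and inverse-transform back to pixels."""
    n = patches[0].shape[0]
    C = dct_matrix(n)
    out = []
    for patch, g in zip(patches, gains):
        spec = C @ patch @ C.T       # forward 2D transform (block 930)
        spec = spec * g              # frequency-domain filtering (block 940)
        out.append(C.T @ spec @ C)   # inverse 2D transform (block 950)
    return out
```

With all-ones gains the chain is lossless, which is a quick sanity check that the transform pair is correctly inverted.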
(55) In the next stage, a 2D transform is performed for each patch from the pixel domain cluster P.sub.i. The frequency domain cluster F.sub.i, comprising the 2D spectra of the pixel domain patches from the pixel domain cluster P.sub.i, is used for collaborative filtering. In the general case, the collaborative filter in the frequency domain performs patch averaging in the frequency domain and produces filtered patches in the frequency domain, R.sub.i=FreqCollaborativeFiltering(F.sub.i, G.sub.i), corresponding to the patches in the pixel domain. The inverse 2D transform returns the filtered patches in the frequency domain R.sub.i to the pixel domain and produces the filtered patches in the pixel domain {tilde over (P)}.sub.i. In the last processing stage, the filtered patches in the pixel domain {tilde over (P)}.sub.0, {tilde over (P)}.sub.1, . . . , {tilde over (P)}.sub.M are averaged in the pixel domain based on the procedure SameBlockAvg( ), which will be described below.
(58) Each macroblock in the reconstructed frame can be the reference for one cluster in the pixel domain and a secondary patch in other clusters in the pixel domain. In each cluster, averaging is performed independently, and so the same patch can be filtered in different clusters in different ways. A collaborative filter in the pixel domain averages the same patches (patches with fixed spatial coordinates) along all clusters which include that patch. This decreases noise while introducing little edge distortion, because filtering in the frequency domain preserves the source signal spectrum.
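The SameBlockAvg( ) pixel-domain averaging can be sketched as below. A per-pixel accumulator is used here, which generalizes the same-coordinate patch averaging described above; the accumulator-based weighting is an implementation assumption:

```python
import numpy as np

def same_block_avg(filtered_clusters, coords_clusters, frame_shape, size):
    """SameBlockAvg() sketch: pixel-domain averaging of co-located patches.

    filtered_clusters: list of clusters, each a list of filtered patches;
    coords_clusters: matching list of (y, x) top-left coordinates per patch.
    Every output pixel is the average over all filtered patches covering it,
    so a patch filtered differently in different clusters is averaged.
    """
    acc = np.zeros(frame_shape)
    cnt = np.zeros(frame_shape)
    for patches, coords in zip(filtered_clusters, coords_clusters):
        for patch, (y, x) in zip(patches, coords):
            acc[y:y + size, x:x + size] += patch
            cnt[y:y + size, x:x + size] += 1.0
    return acc / np.maximum(cnt, 1.0)  # avoid division by zero in uncovered areas
```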
(59) As mentioned above, the mixed-domain filter can suppress not only quantization noise but also input sensor noise, because the filter parameters are estimated without matching to the original signal. But the sensor noise is a useful signal for the end user, so the benefit from prediction signal improvement and noise suppression must be balanced against the decoded signal distortion. An application map module can perform this balancing.
(61) Correspondingly, a switch 1320 chooses the output block to be either the reconstructed block or the filtered block. The decision may be made based on an RDO process and may be encoded into the bitstream by an entropy encoder 1330. If the benefit from both the prediction improvement and the removal of quantization noise from the decoded video is significantly greater than the drawback from the degradation of the filtered decoded image, then filtering is applied. Otherwise, the reconstructed video is used for prediction and as output for the end user. The application map block decisions are encoded and transferred from the encoder to the decoder.
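The per-block switch decision can be sketched with the usual rate-distortion cost J = D + λ·R. The exact weighting the patent's cost function uses is not specified here, so the function and its parameters are illustrative:

```python
def apply_filter_decision(d_filtered: float, d_reconstructed: float,
                          bits_map: float, lmbda: float) -> bool:
    """Application-map decision sketch.

    Compare the RDO cost of the filtered block (distortion against the
    original plus lambda-weighted signaling bits for the application map)
    with the cost of the plain reconstructed block; filter only if the
    filtered cost is lower.
    """
    j_filtered = d_filtered + lmbda * bits_map
    j_reconstructed = d_reconstructed
    return j_filtered < j_reconstructed
```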
(62) Further embodiments of the present invention may include:
1. A method and an apparatus for predictive coding of a video stream of subsequent frames into an encoded video bitstream, comprising: 1) a reconstructed video generator corresponding to the coded video data; 2) an adaptive loop filter in a mixed domain (spatial frequency and pixel domain) applied to the reconstructed video frame for post-filtering (improvement of the decoded signal) and in-loop filtering (prediction improvement), where some of the filter parameters (a first subset of filter parameters) are estimated from the reconstructed video signal and some of the filter parameters (a second subset of filter parameters) are derived from encoder service information which is already encoded into the bitstream and is used for encoded signal reconstruction in codecs without an adaptive loop filter (in other words, there is no additional bit budget for transmitting the second subset of parameters, because these parameters are already transmitted in video codecs without loop filters).
2. Same as previous, where any frame from the decoded picture buffer can be used for filter parameter estimation.
3. Same as previous, where both subsets of the adaptive loop filter parameters can be derived in the decoder and so need not be encoded into the bitstream.
4. Same as previous, where an application map is implemented on the filter output for an optimal tradeoff between sensor and quantization noise suppression and decoded video degradation.
5. Same as previous, where filter parameter estimation is based on an original image estimation from the reconstructed signal and a quantization noise estimation.
6. Same as previous, where the original image estimation is based on the reconstructed image only.
7. Same as previous, where the noise estimation is a function of the encoder quantization parameter (QP).
8. Same as previous, where the collaborative adaptive loop filter in the mixed domain (spatial frequency and pixel domain) comprises the following steps (see
(63) The foregoing descriptions are only implementation manners of the present invention; the scope of the present invention is not limited thereto. Any variations or replacements can easily be made by a person skilled in the art. Therefore, the protection scope of the present invention should be subject to the protection scope of the attached claims.