H04N19/19

Method and apparatus for applying deep learning techniques in video coding, restoration and video quality analysis (VQA)

Video quality analysis may be used in many multimedia transmission and communication applications, such as encoder optimization, stream selection, and/or video reconstruction. An objective VQA metric that accurately reflects the quality of processed video relative to a source unprocessed video may take into account both spatial measures and temporal, motion-based measures when evaluating the processed video. Temporal measures may include differential motion metrics, which indicate the difference between frame differences computed over a plurality of frames of the processed video and the frame differences of the corresponding plurality of frames of the source video. In addition, neural networks and deep learning techniques can be used to develop additional improved VQA metrics that take into account both spatial and temporal aspects of the processed and unprocessed videos.
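The differential motion metric described above can be sketched in a few lines: compute consecutive-frame differences for both videos and compare them. This is an illustrative reading of the abstract, not the patented metric itself; the function name and the use of a plain mean absolute difference are assumptions.

```python
import numpy as np

def differential_motion(src_frames, proc_frames):
    """Hypothetical differential-motion metric: mean absolute difference
    between the frame differences of the processed video and the frame
    differences of the corresponding source frames."""
    src = np.asarray(src_frames, dtype=np.float64)
    proc = np.asarray(proc_frames, dtype=np.float64)
    # Consecutive-frame differences act as a simple proxy for motion.
    src_motion = np.diff(src, axis=0)
    proc_motion = np.diff(proc, axis=0)
    # Compare motion, not pixels: a processed video with the same motion
    # as the source scores 0 even if its pixel values differ by a bias.
    return float(np.mean(np.abs(proc_motion - src_motion)))
```

A processed video whose motion matches the source yields 0; frozen or jerky playback raises the score even when individual frames look clean, which is what distinguishes this temporal measure from purely spatial metrics.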

METHOD AND APPARATUS FOR VARIABLE RATE COMPRESSION WITH A CONDITIONAL AUTOENCODER

A method and apparatus for variable rate compression with a conditional autoencoder is herein provided. According to one embodiment, a method includes training a conditional autoencoder using a Lagrange multiplier and training a neural network that includes the conditional autoencoder with mixed quantization bin sizes.
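The two training ingredients named in the abstract, a Lagrange multiplier and mixed quantization bin sizes, can be illustrated with a minimal sketch. The round-to-nearest quantizer and the function names are assumptions; the patented training procedure is not reproduced here.

```python
import numpy as np

def quantize(latents, bin_size):
    """Uniform quantization with a variable bin size. Training one
    conditional model with mixed bin sizes (assumption: simple
    round-to-nearest) lets a single decoder serve several rate points."""
    return np.round(np.asarray(latents) / bin_size) * bin_size

def rd_loss(distortion, rate, lam):
    """Lagrangian rate-distortion objective J = D + lambda * R.
    Sweeping lambda during training conditions one autoencoder on the
    desired trade-off instead of training one model per bitrate."""
    return distortion + lam * rate
```

A larger bin size coarsens the latents (fewer bits, more distortion); a larger lambda penalizes rate more heavily, so the two knobs together give variable-rate behavior from one model.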

Receptive-field-conforming convolutional models for video coding
10869036 · 2020-12-15

A convolutional neural network (CNN) for determining a partitioning of a block is disclosed. The block is of size N×N and a smallest partition is of size S×S. The CNN includes feature extraction layers; a concatenation layer that receives, from the feature extraction layers, first feature maps of the block, where each first feature map is of size S×S; and classifiers. Each classifier includes classification layers, and each classification layer receives second feature maps having a respective feature dimension. Each classifier is configured to infer partition decisions for sub-blocks of size (αS)×(αS) of the block, wherein α is a power of 2 and α=2, . . . , N/S, by: applying, at some of the successive classification layers, a kernel of size 1×1 to reduce the respective feature dimension in half; and outputting, by a last layer of the classification layers, an output corresponding to an N/(αS)×N/(αS)×1 output map.
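The key operation in the classifier heads, a 1×1 kernel that halves the feature dimension, is just a per-pixel linear map over the channel axis. The sketch below uses random weights as stand-ins for trained ones; everything beyond the 1×1-convolution mechanics is an assumption.

```python
import numpy as np

def conv1x1(feature_maps, weights):
    """A 1x1 convolution over an H x W x C tensor is a matrix multiply
    on the channel dimension, applied independently at every pixel."""
    return feature_maps @ weights

def halve_features(feature_maps, seed=0):
    """Apply a 1x1 kernel that reduces the feature dimension in half,
    as in the successive classification layers described above
    (random weights stand in for trained parameters)."""
    c_in = feature_maps.shape[-1]
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((c_in, c_in // 2))
    return conv1x1(feature_maps, w)
```

Stacking such layers shrinks the channel count step by step until a last layer can emit the N/(αS)×N/(αS)×1 map of per-sub-block partition decisions.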

Rate/distortion/RDcost modeling with machine learning
10848765 · 2020-11-24

A method for encoding a block of a video stream includes generating, using pixel values of the block, block features for the block; for each of a set of candidate encoding modes, generating, using the block features and the candidate encoding mode as inputs to a machine-learning module, a respective encoding cost; selecting, based on the respective encoding costs, a predetermined number of the candidate encoding modes; selecting, from among the selected candidate modes and based on their respective encoding costs, a best mode for encoding the block; and encoding, in a compressed bitstream, the block using the best mode.
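The mode-search flow above reduces to: predict a cost per candidate, shortlist the cheapest few, then pick the winner. In this sketch `cost_model` is a hypothetical stand-in for the patent's machine-learning module, and the final choice reuses the predicted costs rather than a full rate-distortion check.

```python
def select_best_mode(block_features, candidate_modes, cost_model, k=3):
    """Score each candidate mode with a learned cost model, keep the k
    cheapest, and return the best of the shortlist. cost_model and k are
    illustrative assumptions, not elements of the claimed method."""
    costs = {m: cost_model(block_features, m) for m in candidate_modes}
    # Keep only the predetermined number of most promising modes.
    shortlist = sorted(candidate_modes, key=costs.get)[:k]
    # In practice the shortlist might be re-evaluated with a slower,
    # exact RD search; here the lowest predicted cost simply wins.
    return min(shortlist, key=costs.get)
```

The point of the shortlist is speed: the cheap learned predictor prunes the mode space so any expensive evaluation only runs on a handful of candidates.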

Method and apparatus for encoding a picture

A method and an apparatus for encoding a picture are disclosed. The method comprises: determining a rate-distortion cost (41) for a current block of the picture when the current block is not split into sub-blocks, taking into account quantization parameters assigned to the sub-blocks of the current block; determining (42) whether the current block is split or not according at least to the determined rate-distortion cost; and encoding (43) the current block according to the result of that determination.
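The split decision described above is a comparison of Lagrangian rate-distortion costs. The sketch below assumes the conventional cost J = D + λR and that the sub-block costs already reflect their individually assigned quantization parameters; the helper names are invented for illustration.

```python
def rd_cost(distortion, rate_bits, lam):
    """Conventional rate-distortion cost J = D + lambda * R,
    corresponding to step (41) of the abstract."""
    return distortion + lam * rate_bits

def decide_split(whole, subblocks, lam):
    """whole is a (distortion, rate) pair for the unsplit block;
    subblocks is a list of such pairs, each measured under its own
    quantization parameter. Returns True when splitting is cheaper,
    corresponding to the decision in step (42)."""
    j_whole = rd_cost(*whole, lam)
    j_split = sum(rd_cost(d, r, lam) for d, r in subblocks)
    return j_split < j_whole
```

Splitting typically lowers distortion at the price of extra signaling rate, so the comparison only favors a split when the distortion savings outweigh that overhead at the chosen λ.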

Mixed domain collaborative in-loop filter for lossy video coding

A video coding apparatus for encoding or decoding a frame of a video, the video coding apparatus comprising a frame reconstruction unit configured to reconstruct the frame, a parameter determination unit configured to determine one or more filter parameters, based on one or more first parameters which are based on the reconstructed frame and one or more second parameters which are based on codec signaling information, and a mixed-domain filtering unit configured to filter in a frequency domain and a pixel domain the reconstructed frame based on the determined filter parameters to obtain a filtered frame.
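The two filtering stages can be illustrated with a toy mixed-domain filter: a frequency-domain stage that suppresses low-magnitude DFT coefficients, then a pixel-domain smoothing stage. The threshold, the box blur, and the blend weight are all assumptions; the patent derives its filter parameters from the reconstructed frame and from codec signaling information, which this sketch does not model.

```python
import numpy as np

def mixed_domain_filter(frame, freq_threshold, blend):
    """Illustrative mixed-domain in-loop filter: hard-threshold small
    DFT coefficients (frequency domain), then blend with a cross-shaped
    average of neighbors (pixel domain). All parameters hypothetical."""
    # Frequency-domain stage: zero out low-magnitude coefficients,
    # a crude stand-in for collaborative frequency filtering.
    spec = np.fft.fft2(frame)
    spec[np.abs(spec) < freq_threshold] = 0
    freq_filtered = np.fft.ifft2(spec).real
    # Pixel-domain stage: average each pixel with its four neighbors
    # (wrap-around borders, for brevity).
    blur = (freq_filtered
            + np.roll(freq_filtered, 1, axis=0)
            + np.roll(freq_filtered, -1, axis=0)
            + np.roll(freq_filtered, 1, axis=1)
            + np.roll(freq_filtered, -1, axis=1)) / 5.0
    return (1 - blend) * freq_filtered + blend * blur
```

Operating in both domains is the point: the frequency stage removes broadband coding noise that spreads across many pixels, while the pixel stage cleans up localized artifacts the spectral threshold misses.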

Method for inter prediction and device therefor, and method for motion compensation and device therefor

Provided are an inter prediction method and a motion compensation method. The inter prediction method includes: performing inter prediction on a current image by using a long-term reference image stored in a decoded picture buffer; determining residual data and a motion vector of the current image generated via the inter prediction; and determining least significant bit (LSB) information as a long-term reference index indicating the long-term reference image by dividing picture order count (POC) information of the long-term reference image into most significant bit (MSB) information and the LSB information.
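The MSB/LSB split of the picture order count is plain bit arithmetic: the low `lsb_bits` bits are signaled as the long-term reference index and the decoder recombines the two parts. The function names and the specific bit width are assumptions for illustration.

```python
def split_poc(poc, lsb_bits):
    """Split a picture order count (POC) into MSB and LSB parts; the
    LSB part serves as the long-term reference index. lsb_bits (e.g. 4)
    is an assumed codec parameter."""
    lsb = poc & ((1 << lsb_bits) - 1)   # low bits: signaled as the index
    msb = poc >> lsb_bits               # high bits: derived/tracked separately
    return msb, lsb

def restore_poc(msb, lsb, lsb_bits):
    """Recombine the parts to recover the long-term reference's POC."""
    return (msb << lsb_bits) | lsb
```

Signaling only the LSB keeps the index short even though POC values grow without bound over a long sequence; the MSB changes slowly and can be maintained on both sides.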