H04N19/19

Video fidelity measure

A video fidelity measure is determined for a video sequence (1) by determining distorted and original difference pictures (30, 40) as pixel-wise differences between pixels (14, 24) in a distorted picture (10) and corresponding pixels (24) in an original picture (20) and between pixels in a preceding distorted picture (11) and corresponding pixels in a preceding original picture (21). First and second maps representing distortions in pixel values between the distorted and original pictures (10, 20) and between the distorted and original difference pictures (30, 40) are determined. Third and sixth maps are determined as respective aggregations of local variabilities in pixel values in the distorted and original pictures (10, 20) and local variabilities in pixel values in the distorted and original difference pictures (30, 40), respectively. The video fidelity measure is then determined based on the first to third and sixth maps.
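The flow above can be sketched as follows. This is a minimal illustration, not the patented measure: the difference pictures are taken here as temporal differences, the distortion maps as squared errors, the local variability as a 3x3 sliding-window variance, and the final combination as a variability-normalized mean; all of these specific choices are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_variance(img, k=3):
    # aggregate local variability as a k x k sliding-window variance (one simple choice)
    w = sliding_window_view(img, (k, k))
    return w.var(axis=(-1, -2))

def fidelity(dist, dist_prev, orig, orig_prev, eps=1e-6):
    # difference pictures: pixel-wise temporal differences (assumed reading)
    d_diff = dist.astype(float) - dist_prev
    o_diff = orig.astype(float) - orig_prev
    # first/second maps: distortions between pictures and between difference pictures
    m1 = (dist.astype(float) - orig) ** 2
    m2 = (d_diff - o_diff) ** 2
    # third/sixth maps: aggregated local variabilities
    m3 = local_variance(dist) + local_variance(orig)
    m6 = local_variance(d_diff) + local_variance(o_diff)
    # combine: distortion normalized by local variability (illustrative weighting);
    # the [1:-1, 1:-1] crop aligns the full-size maps with the 3x3 windowed maps
    spatial = (m1[1:-1, 1:-1] / (m3 + eps)).mean()
    temporal = (m2[1:-1, 1:-1] / (m6 + eps)).mean()
    return spatial + temporal
```

An undistorted sequence yields a measure of zero; any spatial or temporal deviation increases it.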

METHOD AND APPARATUS FOR APPLYING DEEP LEARNING TECHNIQUES IN VIDEO CODING, RESTORATION AND VIDEO QUALITY ANALYSIS (VQA)
20220239925 · 2022-07-28

Video quality analysis may be used in many multimedia transmission and communication applications, such as encoder optimization, stream selection, and/or video reconstruction. An objective VQA metric that accurately reflects the quality of processed video relative to a source unprocessed video may take into account both spatial measures and temporal, motion-based measures when evaluating the processed video. Temporal measures may include differential motion metrics indicating a difference between the frame difference of a plurality of frames of the processed video and that of a corresponding plurality of frames of the source video. In addition, neural networks and deep learning techniques can be used to develop additional improved VQA metrics that take into account both spatial and temporal aspects of the processed and unprocessed videos.

Methods and devices for coding and decoding a data stream representative of at least one image
11394964 · 2022-07-19

Coding and decoding methods for a coded data stream representing at least one image split into blocks. For a current block, the method determines whether the block is coded according to an intra coding mode or another coding mode; the intra coding mode uses an intra prediction mode selected from a group of intra prediction modes according to an intra prediction mode associated with a neighbouring block of the current block. When the block is coded according to the intra coding mode, an intra prediction mode in the group is determined for the current block according to an intra prediction mode associated with a previously decoded block of the image, and the current block is decoded according to the determined intra prediction mode. When the block is coded according to the other coding mode, the current block is decoded according to that other coding mode.
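The idea of selecting the current block's mode from a group ordered by a neighbour's previously decoded mode can be illustrated with a toy most-probable-mode scheme; the ordering rule and mode count here are hypothetical, not the patented derivation.

```python
def select_intra_mode(neighbour_mode, rank, num_modes=35):
    """Pick the current block's intra prediction mode from a group ordered so the
    previously decoded neighbour's mode comes first (hypothetical ordering rule)."""
    group = [neighbour_mode] + [m for m in range(num_modes) if m != neighbour_mode]
    return group[rank]
```

A decoder reading `rank` from the stream recovers the neighbour's mode at the cheapest index (rank 0), which is why correlated modes compress well.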

MACHINE LEARNING BASED RATE-DISTORTION OPTIMIZER FOR VIDEO COMPRESSION

Systems and techniques are described for data encoding using a machine learning approach to generate a distortion prediction D̂ and a predicted bit rate R̂, and to use D̂ and R̂ to perform rate-distortion optimization (RDO). For example, a video encoder can generate the distortion prediction D̂ and a residual bit-rate prediction based on outputs of one or more neural networks in response to the one or more neural networks receiving a residual portion of a block of a video frame as input. The video encoder can determine a metadata bit-rate prediction based on metadata associated with a mode of compression, and determine R̂ to be the sum of the residual and metadata bit-rate predictions. The video encoder can determine a rate-distortion cost prediction Ĵ as a function of D̂ and R̂, and can determine a prediction mode for compressing the block based on Ĵ.
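The mode decision can be sketched as below, assuming the classic Lagrangian form Ĵ = D̂ + λ·R̂ (the abstract says only that Ĵ is a function of D̂ and R̂, so the Lagrangian form is an assumption; the predictions themselves would come from the neural networks).

```python
def rd_cost(d_hat, r_res_hat, r_meta_hat, lam):
    # R̂ is the sum of the residual and metadata bit-rate predictions
    r_hat = r_res_hat + r_meta_hat
    # Lagrangian rate-distortion cost prediction (assumed form of Ĵ)
    return d_hat + lam * r_hat

def choose_mode(candidates, lam):
    # candidates: (mode, d_hat, r_res_hat, r_meta_hat) tuples, with the
    # predictions supplied by the networks; pick the mode minimizing Ĵ
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], c[3], lam))[0]
```

Note how λ steers the choice: a small λ favours low-distortion modes, a large λ favours low-rate modes.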

Temporal domain rate distortion optimization based on video content characteristic and QP-λ correction

A temporal domain rate distortion optimization based on video content characteristics and QP-λ correction is provided for the new-generation AV1 encoder. Based on the temporal domain dependency relationships previously established under the HEVC-RA coding structure, the features of the AV1 encoder, and the features of the video sequence, the aggregate distortion of a current coding unit and the future coding units it affects is estimated, and the propagation factor of the current coding unit in a temporal domain distortion propagation model is calculated by constructing a temporal domain distortion propagation chain. The Lagrange multiplier is adjusted using this more accurate propagation factor to realize temporal-domain-dependent rate distortion optimization, and the QP-λ relationship is re-corrected and the I frame is adjusted to achieve a better coding effect.
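The propagation-factor idea can be sketched as follows. The geometric decay along the propagation chain and the λ-scaling rule are illustrative assumptions, not the patented model.

```python
def propagation_factor(future_distortions, current_distortion, decay=0.9):
    """Aggregate the current unit's distortion with the distortions of the
    future units it affects, discounted along the propagation chain
    (geometric decay is an illustrative model)."""
    agg = current_distortion + sum(decay ** k * d
                                   for k, d in enumerate(future_distortions, 1))
    return agg / current_distortion if current_distortion else 1.0

def adjust_lambda(lam, factor):
    # a unit whose distortion propagates further gets a smaller lambda,
    # i.e. is coded at higher quality (common temporal-domain RDO practice)
    return lam / factor
```

Units deep in a reference chain thus receive more bits, since their errors would otherwise be copied into every dependent frame.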

METHOD AND APPARATUS FOR VIDEO CODING
20220224883 · 2022-07-14

Aspects of the disclosure include methods, apparatuses, and non-transitory computer-readable storage mediums for video encoding/decoding. An apparatus includes processing circuitry that decodes a video bitstream to obtain a reduced-resolution residual block for a current block. The processing circuitry determines that a block level flag is set to a pre-defined value. The pre-defined value indicates that the current block is coded in reduced-resolution coding. Based on the block level flag, the processing circuitry generates a reduced-resolution prediction block for the current block by down-sampling a full-resolution reference block of the current block. The processing circuitry generates a reduced-resolution reconstruction block for the current block based on the reduced-resolution prediction block and the reduced-resolution residual block. The processing circuitry generates a full-resolution reconstruction block for the current block by up-sampling the reduced-resolution reconstruction block.
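The reduced-resolution reconstruction path can be sketched as below; the 2x2 averaging down-sampler and nearest-neighbour up-sampler are simple stand-ins for whatever filters an actual codec would specify.

```python
import numpy as np

def downsample(block):
    # 2x2 averaging as a simple down-sampling filter (illustrative choice)
    h, w = block.shape
    return block.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(block):
    # nearest-neighbour up-sampling back to full resolution (illustrative choice)
    return np.repeat(np.repeat(block, 2, axis=0), 2, axis=1)

def reconstruct(full_res_reference, reduced_residual, block_level_flag):
    if not block_level_flag:
        raise ValueError("block not coded in reduced-resolution mode")
    # reduced-resolution prediction from the full-resolution reference block
    reduced_pred = downsample(full_res_reference.astype(float))
    # reduced-resolution reconstruction = prediction + decoded residual
    reduced_recon = reduced_pred + reduced_residual
    # full-resolution reconstruction by up-sampling
    return upsample(reduced_recon)
```

The residual is coded at the reduced resolution, so only the final up-sampling step restores the block to its full size.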

MOTION-COMPENSATED COMPRESSION OF DYNAMIC VOXELIZED POINT CLOUDS

Disclosed herein are exemplary embodiments of innovations in the area of point cloud encoding and decoding. Example embodiments can reduce the computational complexity and/or computational resource usage during 3D video encoding by selectively encoding one or more 3D-point-cloud blocks using an inter-frame coding (e.g., motion compensation) technique that allows previously encoded/decoded frames to be used in predicting the current frames being encoded. Alternatively, one or more 3D-point-cloud blocks can be encoded using an intra-frame encoding approach. The selection of which encoding mode to use can be based, for example, on a threshold that is evaluated relative to rate-distortion performance for both intra-frame and inter-frame encoding. Still further, embodiments of the disclosed technology can use one or more voxel-distortion-correction filters to correct distortion errors that may occur during voxel compression. Such filters are uniquely adapted for the particular challenges presented when compressing 3D image data. Corresponding decoding techniques are also disclosed.
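The threshold-based mode selection can be sketched as a one-liner; the multiplicative form of the threshold test is an assumption, as the abstract only says a threshold is evaluated against the two rate-distortion costs.

```python
def select_coding_mode(intra_cost, inter_cost, threshold=1.0):
    """Choose inter-frame (motion-compensated) coding for a 3D-point-cloud block
    when its rate-distortion cost beats intra's by the given threshold
    (multiplicative test is an illustrative choice)."""
    return "inter" if inter_cost < threshold * intra_cost else "intra"
```

A threshold below 1.0 biases the encoder toward intra coding, which avoids the drift that inter prediction can accumulate across frames.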

METHOD FOR INTER PREDICTION AND DEVICE THEREFOR, AND METHOD FOR MOTION COMPENSATION AND DEVICE THEREFOR

Provided are an inter prediction method and a motion compensation method. The inter prediction method includes: performing inter prediction on a current image by using a long-term reference image stored in a decoded picture buffer; determining residual data and a motion vector of the current image generated via the inter prediction; and determining least significant bit (LSB) information as a long-term reference index indicating the long-term reference image by dividing picture order count (POC) information of the long-term reference image into most significant bit (MSB) information and the LSB information.
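The MSB/LSB split of the picture order count can be sketched with plain bit arithmetic; the number of LSB bits is an assumption, since the abstract does not fix it.

```python
def split_poc(poc, lsb_bits=8):
    """Divide POC information into MSB and LSB parts; the LSB part serves as
    the long-term reference index (lsb_bits=8 is an assumed width)."""
    msb = poc >> lsb_bits
    lsb = poc & ((1 << lsb_bits) - 1)
    return msb, lsb

def restore_poc(msb, lsb, lsb_bits=8):
    # recombine the two parts to recover the full picture order count
    return (msb << lsb_bits) | lsb
```

Signalling only the LSB part as the reference index keeps the index short while the MSB part can be carried or inferred separately.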