H04N19/90

Point Cloud Compression
20230099049 · 2023-03-30

A system comprises an encoder configured to compress attribute information for a point cloud and/or a decoder configured to decompress compressed attribute information for the point cloud. A compressed attribute information file includes attribute values for at least one starting point, together with attribute correction values used to correct predicted attribute values. Attribute values are predicted based, at least in part, on the attribute values of neighboring points and on the distances between those neighboring points and the particular point whose attribute value is being predicted. The predicted attribute values are compared to the attribute values of the point cloud prior to compression to determine the attribute correction values. A decoder follows a prediction process similar to the encoder's and corrects the predicted values using the attribute correction values included in the compressed attribute information file.
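
The predict-then-correct loop in this abstract can be illustrated with a minimal sketch. It assumes scalar integer attributes, a fixed coding order, inverse-distance weighting over the k nearest already-coded points, and lossless (unquantized) corrections; the function names and the value of k are hypothetical, not taken from the patent.

```python
import math

def predict(idx, points, values, k=3):
    """Predict the attribute of points[idx] by inverse-distance weighting of the
    k nearest points that precede it in coding order (an assumed scheme)."""
    p = points[idx]
    # Only earlier points are usable, because the decoder has reconstructed only those.
    nearest = sorted(range(idx), key=lambda j: math.dist(p, points[j]))[:k]
    num = den = 0.0
    for j in nearest:
        d = math.dist(p, points[j])
        if d == 0.0:
            return float(values[j])  # coincident point: copy its value
        num += values[j] / d
        den += 1.0 / d
    return num / den

def encode(points, attrs, k=3):
    """Return the starting point's attribute plus one integer correction per remaining point."""
    start = attrs[0]
    corrections = [attrs[i] - round(predict(i, points, attrs, k))
                   for i in range(1, len(points))]
    return start, corrections

def decode(points, start, corrections, k=3):
    """Repeat the encoder's prediction and add each correction to recover the attributes."""
    values = [start]
    for i, corr in enumerate(corrections, start=1):
        values.append(round(predict(i, points, values, k)) + corr)
    return values
```

Because the decoder repeats exactly the same rounded prediction from the same previously reconstructed values, adding the corrections reproduces the original attributes; a practical codec would also quantize the corrections, which this sketch omits.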

ENCODING METHOD AND DEVICE THEREFOR, AND DECODING METHOD AND DEVICE THEREFOR

A video decoding method includes: determining, based on the area of a current block, whether a multi-prediction combination mode, which predicts the current block by combining prediction results obtained according to a plurality of prediction modes, is applied to the current block; when the multi-prediction combination mode is applied, determining the plurality of prediction modes to be applied to the current block; generating a plurality of prediction blocks of the current block according to those prediction modes; and determining a combined prediction block of the current block by combining the plurality of prediction blocks according to respective weights.
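
A minimal sketch of the area check and the weighted combination might look as follows. The 64-sample area threshold, the integer weights, and the rounding of the weighted average are illustrative assumptions; the abstract does not specify them.

```python
def combination_mode_allowed(width, height, min_area=64):
    # Hypothetical rule: the combination mode is considered only when the block area reaches min_area.
    return width * height >= min_area

def combine_predictions(pred_blocks, weights):
    """Per-sample weighted average of several equally sized prediction blocks (lists of rows)."""
    if len(pred_blocks) != len(weights):
        raise ValueError("one weight per prediction block")
    total = sum(weights)
    h, w = len(pred_blocks[0]), len(pred_blocks[0][0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = sum(wt * blk[y][x] for wt, blk in zip(weights, pred_blocks))
            row.append(round(acc / total))  # normalize and round to an integer sample
        out.append(row)
    return out
```

For example, combining an intra prediction of 100 and an inter prediction of 120 with weights 1 and 3 yields 115 at every sample. Real codecs usually implement the normalization with integer shifts rather than division.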

CONTENT-ADAPTIVE ONLINE TRAINING WITH FEATURE SUBSTITUTION IN NEURAL IMAGE COMPRESSION
20220353512 · 2022-11-03

Aspects of the disclosure provide a method and an apparatus for video encoding. The apparatus includes processing circuitry configured to generate an initial feature representation from an input image to be encoded and to perform an iterative update of the values of a plurality of elements in the initial feature representation. The iterative update includes generating a coded representation corresponding to a final feature representation, that is, the feature representation obtained from the initial one after a number of iterations of the iterative update. A reconstructed image corresponding to the final feature representation is generated based on the coded representation. An encoded image corresponding to the final feature representation, with the updated values of the plurality of elements, is generated. Either (i) a rate-distortion loss corresponding to the final feature representation or (ii) the number of iterations of the iterative update satisfies a predetermined condition.
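
The iterative update can be pictured as gradient descent on the latent itself, stopping on either of the two conditions named in the abstract. This is a toy sketch under loud assumptions: the encoder and decoder are identity maps, the sum of squared latent values stands in for the rate term, and lam, lr, tol, and max_iters are hypothetical parameters.

```python
def rd_loss(x, y, lam):
    rate = sum(v * v for v in y)                          # hypothetical rate proxy
    distortion = sum((a - b) ** 2 for a, b in zip(x, y))  # identity "decoder" assumed
    return rate + lam * distortion

def refine_latent(x, lam=4.0, lr=0.05, tol=1e-9, max_iters=100):
    """Update the latent y by gradient descent on rate + lam * distortion.
    Returns (final latent, final loss, iterations used)."""
    y = list(x)  # toy initial feature representation: identity encoder
    prev = rd_loss(x, y, lam)
    for it in range(1, max_iters + 1):
        grad = [2 * b + 2 * lam * (b - a) for a, b in zip(x, y)]
        y = [b - lr * g for b, g in zip(y, grad)]
        loss = rd_loss(x, y, lam)
        if abs(prev - loss) < tol:  # condition (i): the loss has stopped improving
            return y, loss, it
        prev = loss
    return y, prev, max_iters       # condition (ii): iteration budget exhausted
```

For this toy loss the optimum is y = lam * x / (1 + lam), so with lam = 4 the refined latent approaches 0.8 * x. In the actual method the gradient would flow through trained networks rather than a closed-form expression.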

Machine learning video processing systems and methods
11616960 · 2023-03-28

System and method for improving video encoding and/or video decoding. In embodiments, a video encoding pipeline includes a main encoding pipeline that compresses source image data corresponding to an image frame by processing the source image data based at least in part on encoding parameters to generate encoded image data. Additionally, the video encoding pipeline includes a machine learning block communicatively coupled to the main encoding pipeline. When enabled by the encoding parameters, the machine learning block analyzes the content of the image frame by processing the source image data based at least in part on machine learning parameters implemented in the block. The video encoding pipeline then adaptively adjusts the encoding parameters based at least in part on the content expected to be present in the image frame, to improve encoding efficiency.
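
The control flow (an optional analysis block gated by the encoding parameters, whose output feeds back into those parameters) can be sketched as below. Everything here is hypothetical: the parameter names, the use of pixel variance as a stand-in for a trained model, and the QP adjustment policy; only the 0 to 51 QP range follows H.264/HEVC convention.

```python
from statistics import pvariance

def ml_block(pixels):
    """Stand-in for a trained model: returns a content score in [0, 1].
    Here it is only normalized pixel variance (an assumption, not a learned model)."""
    return min(pvariance(pixels) / 1000.0, 1.0)

def adjust_params(params, pixels):
    """Return a copy of the encoding parameters, adjusted when the ML block is enabled."""
    new = dict(params)
    if params.get("ml_enabled", False):
        score = ml_block(pixels)
        # Hypothetical policy: busier content tolerates a coarser quantizer.
        new["qp"] = max(0, min(51, params["qp"] + round(4 * score - 2)))
    return new
```

With QP 30, flat content lowers the QP to 28 and highly textured content raises it to 32; when the block is disabled, the parameters pass through unchanged.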

Spatial-temporal reasoning through pretrained language models for video-grounded dialogues
11487999 · 2022-11-01

A system and method for generating a response in a video-grounded dialogue are provided. A video-grounded dialogue neural network language model receives video input and text input. The text input includes a dialogue history between the model and a human user, as well as the user's current utterance. Encoded video input is generated using video encoding layers, and encoded text input is generated using text encoding layers. The encoded video input and the encoded text input are concatenated into a single input sequence. A generative pre-trained transformer model generates the response to the current utterance from the single input sequence.
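
The concatenation step can be sketched as building one sequence of embedding vectors with per-position bookkeeping. The segment-id scheme (0 for video, 1 for dialogue history, 2 for the current utterance) and the class and function names are assumptions for illustration; the abstract only states that the encodings are concatenated.

```python
from dataclasses import dataclass

@dataclass
class InputSequence:
    embeddings: list    # one vector per position
    segment_ids: list   # 0 = video, 1 = dialogue history, 2 = current utterance (assumed scheme)
    position_ids: list

def build_input(video_enc, history_enc, utterance_enc):
    """Concatenate encoded video and text along the sequence axis."""
    dims = {len(v) for v in video_enc + history_enc + utterance_enc}
    if len(dims) != 1:
        raise ValueError("all encodings must share one hidden size")
    emb, seg = [], []
    for vectors, seg_id in ((video_enc, 0), (history_enc, 1), (utterance_enc, 2)):
        emb.extend(vectors)
        seg.extend([seg_id] * len(vectors))
    return InputSequence(emb, seg, list(range(len(emb))))
```

The transformer would then attend over all positions jointly, which is what lets it relate the utterance to specific video segments.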

METHOD, APPARATUS, AND STORAGE MEDIUM FOR COMPRESSING FEATURE MAP

Disclosed herein are a method, an apparatus and a storage medium for processing a feature map. An encoding method for a feature map includes configuring a feature frame for feature maps, and generating encoded information by performing encoding on the feature frame. A decoding method for a feature map includes reconstructing a feature frame by performing decoding on encoded information, and reconstructing feature maps using the feature frame. A feature frame is configured using feature maps, and compression using a video compression codec or a deep learning-based image compression method is applied to the feature frame.
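
One plausible way to configure a feature frame is to tile the channel maps into a near-square grid so that an ordinary image or video codec can process them as one picture. The grid layout, zero padding, and function names below are assumptions; quantizing the values to the codec's sample bit depth is omitted.

```python
import math

def to_feature_frame(maps):
    """Tile C feature maps (each H x W, as lists of rows) into one 2D frame in a
    row-major grid; unused grid cells are zero-padded."""
    c = len(maps)
    h, w = len(maps[0]), len(maps[0][0])
    cols = math.ceil(math.sqrt(c))
    rows = math.ceil(c / cols)
    frame = [[0] * (cols * w) for _ in range(rows * h)]
    for idx, fmap in enumerate(maps):
        r0, c0 = (idx // cols) * h, (idx % cols) * w
        for y in range(h):
            frame[r0 + y][c0:c0 + w] = fmap[y]
    return frame, (c, h, w, cols)

def from_feature_frame(frame, layout):
    """Cut the tiles back out of a (decoded) feature frame."""
    c, h, w, cols = layout
    maps = []
    for idx in range(c):
        r0, c0 = (idx // cols) * h, (idx % cols) * w
        maps.append([frame[r0 + y][c0:c0 + w] for y in range(h)])
    return maps
```

The layout tuple is side information that the decoder needs to reconstruct the feature maps from the feature frame.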

Deep loop filter by temporal deformable convolution
11601661 · 2023-03-07

A method, apparatus and storage medium for performing video coding are provided. The method includes obtaining a plurality of image frames in a video sequence; determining a feature map for each of the plurality of image frames and determining an offset map based on the feature map; determining an aligned feature map by performing a temporal deformable convolution (TDC) on the feature map and the offset map; and generating a plurality of aligned frames based on the aligned feature map.
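
A heavily simplified, one-dimensional sketch of temporal deformable convolution follows: every output position gathers taps from all frames, each tap shifted by its own fractional offset and sampled with linear interpolation. The offsets and weights are passed in directly here, whereas the described method derives the offset map from the feature map; the 1D layout, zero padding, and names are assumptions.

```python
import math

def sample_linear(signal, pos):
    """Linearly interpolate signal at fractional pos; zero outside the valid range."""
    i0 = math.floor(pos)
    frac = pos - i0
    def at(i):
        return signal[i] if 0 <= i < len(signal) else 0.0
    return (1 - frac) * at(i0) + frac * at(i0 + 1)

def temporal_deformable_conv(frames, offsets, weights):
    """frames: T feature rows of length W.
    offsets[t][k][p]: shift of tap k of frame t at output position p.
    weights[t][k]: kernel weight of tap k of frame t.
    Returns one aligned feature row of length W."""
    w_len = len(frames[0])
    half = len(weights[0]) // 2
    out = []
    for p in range(w_len):
        acc = 0.0
        for t, frame in enumerate(frames):
            for k, wt in enumerate(weights[t]):
                pos = p + (k - half) + offsets[t][k][p]
                acc += wt * sample_linear(frame, pos)
        out.append(acc)
    return out
```

If the second frame is the first shifted right by one sample, an offset of +1 for that frame realigns it before averaging; without the offset the two frames would be blended misaligned.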

Method and system for lossy image or video encoding, transmission and decoding
11599972 · 2023-03-07

There is provided a method for lossy image or video encoding and transmission, including the steps of receiving an input image at a first computer system, encoding the input image using a first trained neural network to produce a latent representation, performing a quantization process on the latent representation to produce a quantized latent, and transmitting the quantized latent to a second computer system.
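
The encode, quantize, and transmit steps can be sketched end to end with a toy stand-in for the trained network. The fixed weight matrix W_ENC, rounding-based uniform quantization, and the length-prefixed int32 payload are hypothetical choices; a real system would use a learned encoder and entropy-code the quantized latent.

```python
import struct

# Hypothetical fixed weights standing in for the first trained neural network.
W_ENC = [[0.5, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5, 0.5]]

def encode_image(image):
    """Map a flattened image to a latent vector with a toy linear layer plus ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, image))) for row in W_ENC]

def quantize(latent, step=1.0):
    # Uniform scalar quantization: round to the nearest integer multiple of step.
    return [round(v / step) for v in latent]

def to_bytes(q):
    # Length-prefixed little-endian int32 payload sent to the second computer system.
    return struct.pack(f"<I{len(q)}i", len(q), *q)

def from_bytes(payload):
    (n,) = struct.unpack_from("<I", payload, 0)
    return list(struct.unpack_from(f"<{n}i", payload, 4))
```

Rounding is the lossy step: the latent [11.0, 3.7] computed from the image [10.0, 12.0, 3.0, 4.4] is transmitted as [11, 4].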