H04N19/31

IMAGE INFORMATION DECODING METHOD, IMAGE DECODING METHOD, AND DEVICE USING SAME

The present invention relates to an image information decoding method. The decoding method includes receiving a bit stream including a Network Abstraction Layer (NAL) unit that includes information related to an encoded image, and parsing a NAL unit header of the NAL unit. The NAL unit header may not include 1-bit flag information indicating whether a picture is a reference picture or a non-reference picture in the entire bit stream during encoding.
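
The following Python sketch shows what a NAL unit header without a reference flag can look like. It parses the two-byte header layout used in the final HEVC (H.265) standard: forbidden_zero_bit (1 bit), nal_unit_type (6 bits), nuh_layer_id (6 bits) and nuh_temporal_id_plus1 (3 bits). Under that layout, reference status is inferred from nal_unit_type rather than signaled by a dedicated flag. The helper names `parse_hevc_nal_header` and `is_sublayer_non_reference` are hypothetical, and the sketch is not presented as the claimed embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnitHeader:
    forbidden_zero_bit: int
    nal_unit_type: int
    nuh_layer_id: int
    nuh_temporal_id_plus1: int


def parse_hevc_nal_header(data: bytes) -> NalUnitHeader:
    """Parse a 2-byte HEVC-style NAL unit header (hypothetical helper).

    There is no nal_ref_flag / nal_ref_idc field in this layout.
    """
    if len(data) < 2:
        raise ValueError("NAL unit header needs at least 2 bytes")
    value = (data[0] << 8) | data[1]
    header = NalUnitHeader(
        forbidden_zero_bit=(value >> 15) & 0x1,
        nal_unit_type=(value >> 9) & 0x3F,
        nuh_layer_id=(value >> 3) & 0x3F,
        nuh_temporal_id_plus1=value & 0x7,
    )
    if header.forbidden_zero_bit != 0:
        raise ValueError("forbidden_zero_bit must be 0")
    if header.nuh_temporal_id_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must be nonzero")
    return header


def is_sublayer_non_reference(nal_unit_type: int) -> bool:
    """In HEVC, even VCL types 0..14 (TRAIL_N, TSA_N, STSA_N, RADL_N, RASL_N,
    RSV_VCL_N10/12/14) mark sub-layer non-reference pictures."""
    return nal_unit_type <= 14 and nal_unit_type % 2 == 0
```

For example, the header bytes 0x40 0x01 decode to nal_unit_type 32 (a video parameter set) with nuh_layer_id 0 and nuh_temporal_id_plus1 1.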

Systems and Methods for Low Resolution Motion Estimation Searches
20230096682 · 2023-03-30 ·

A video encoding system that encodes source image data corresponding to an image includes a low resolution pipeline that receives the source image data corresponding to a first coding block in the image. The low resolution pipeline includes a low resolution motion estimation block programmed to generate a first downscaled coding block by downscaling the resolution of the source image data corresponding to the first coding block. The first downscaled coding block comprises a first downscaled prediction block corresponding to a first prediction block in the first coding block. The low resolution pipeline may also perform several low resolution motion estimation searches to generate motion vector candidates. The video encoding system also includes a main pipeline that receives the source image data and determines encoding parameters to be used to encode the first coding block based at least partially on the motion vector candidates.
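
Here is a minimal sketch of the low resolution search idea. It assumes 2x downscaling by averaging, a sum-of-absolute-differences (SAD) cost, and an exhaustive search window. The abstract does not specify the scale factor, cost metric or search pattern, and the function names and parameters below are hypothetical.

```python
from typing import List, Tuple

Block = List[List[int]]  # rows of luma samples


def downscale_2x(samples: Block) -> Block:
    """Halve resolution using the rounded mean of each 2x2 neighborhood.

    Assumes even height and width.
    """
    h, w = len(samples), len(samples[0])
    return [
        [
            (samples[y][x] + samples[y][x + 1]
             + samples[y + 1][x] + samples[y + 1][x + 1] + 2) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def low_res_mv_candidates(
    cur_block: Block,
    ref_frame: Block,
    block_y: int,
    block_x: int,
    search_range: int = 4,
    num_candidates: int = 2,
) -> List[Tuple[int, int]]:
    """Full SAD search at half resolution; returns (dy, dx) candidates in full-resolution units."""
    small_cur = downscale_2x(cur_block)
    small_ref = downscale_2x(ref_frame)
    bh, bw = len(small_cur), len(small_cur[0])
    rh, rw = len(small_ref), len(small_ref[0])
    by, bx = block_y // 2, block_x // 2
    scored = []
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y0, x0 = by + dy, bx + dx
            if y0 < 0 or x0 < 0 or y0 + bh > rh or x0 + bw > rw:
                continue  # candidate window falls outside the reference frame
            cost = sum(
                abs(small_cur[i][j] - small_ref[y0 + i][x0 + j])
                for i in range(bh)
                for j in range(bw)
            )
            scored.append((cost, dy, dx))
    scored.sort()
    # Scale the coarse vectors back up so a main pipeline could refine them.
    return [(2 * dy, 2 * dx) for _, dy, dx in scored[:num_candidates]]
```

Each SAD evaluation at half resolution touches a quarter of the samples of a full-resolution one. The coarse vectors are then handed on as candidates rather than final decisions.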

Method and System for Encoding a 3D Scene
20220353530 · 2022-11-03 ·

A computer-implemented method for encoding a scene volume includes: (a) identifying features of a scene volume that are within a camera perspective range with respect to a default camera perspective; (b) converting the identified features into rendered features; and (c) sorting the rendered features into a plurality of scene layers, each including corresponding depth, color, and transparency maps for the respective rendered features. Further, (a), (b), and (c) may be repeated, operating on temporally ordered scene volumes, to produce and output a sequence encoding a video. Corresponding systems and non-transitory computer-readable media are disclosed for encoding a 3D scene and for decoding an encoded 3D scene. This can enable efficient compression, transmission, and playback of video describing a 3D scene, including for virtual reality displays that update based on the changing perspective of the viewer for variable-perspective playback.
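
Step (c) can be pictured as binning rendered features into depth slabs. The sketch below makes three assumptions, none of which come from the abstract: fixed depth boundaries, sparse per-pixel maps stored as dictionaries, and a nearest-wins rule when two features in the same layer land on one pixel. The class and function names are hypothetical. Steps (a) and (b) are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]  # (x, y) in the default camera view


@dataclass
class RenderedFeature:
    pixel: Pixel
    depth: float
    color: Tuple[int, int, int]
    alpha: float  # 1.0 = opaque, 0.0 = fully transparent


@dataclass
class SceneLayer:
    near: float
    far: float
    depth_map: Dict[Pixel, float] = field(default_factory=dict)
    color_map: Dict[Pixel, Tuple[int, int, int]] = field(default_factory=dict)
    alpha_map: Dict[Pixel, float] = field(default_factory=dict)


def sort_into_layers(
    features: Sequence[RenderedFeature], boundaries: Sequence[float]
) -> List[SceneLayer]:
    """Bin features into depth slabs [boundaries[k], boundaries[k + 1]).

    Features outside every slab are dropped.
    """
    layers = [SceneLayer(near, far) for near, far in zip(boundaries, boundaries[1:])]
    for f in features:
        for layer in layers:
            if layer.near <= f.depth < layer.far:
                existing = layer.depth_map.get(f.pixel)
                # Within one layer, keep the feature nearest the camera at each pixel.
                if existing is None or f.depth < existing:
                    layer.depth_map[f.pixel] = f.depth
                    layer.color_map[f.pixel] = f.color
                    layer.alpha_map[f.pixel] = f.alpha
                break
    return layers
```

Repeating this per scene volume would yield one set of layer maps per frame, matching the abstract's description of producing a sequence that encodes a video.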

Event-based adaptation of coding parameters for video image encoding

The present disclosure relates to encoding video images using coding parameters that are adapted based on events related to motion within the video image. Image content is captured by a standard image sensor and an event-triggered sensor, the latter providing an event signal that indicates changes in image intensity (e.g., their amount and spatio-temporal location). Objects are detected within the video image based on the event signal, which is used to assess their motion, and their textures are extracted. The spatio-temporal coding parameters of the video image are determined based on the location and strength of the event signal and on the extent to which the detected objects move.
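
A per-block quantization parameter (QP) offset is one plausible way to turn event-signal strength into spatial coding parameters. The sketch assumes events arrive as (x, y, timestamp, polarity) tuples. It counts events per 16x16 block within one frame interval and lowers QP (finer quantization) in proportion to event density. The direction and size of the offset, the block size, and all names are hypothetical choices rather than details taken from the disclosure.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[int, int, float, int]  # (x, y, timestamp in seconds, polarity)


def event_driven_qp_map(
    events: Iterable[Event],
    width: int,
    height: int,
    t_start: float,
    t_end: float,
    block: int = 16,
    base_qp: int = 32,
    max_offset: int = 6,
) -> Dict[Tuple[int, int], int]:
    """Return a QP per (block_row, block_col) for one frame interval."""
    counts: Counter = Counter()
    for x, y, t, _polarity in events:
        # Only events inside the frame interval and image bounds contribute.
        if t_start <= t < t_end and 0 <= x < width and 0 <= y < height:
            counts[(y // block, x // block)] += 1
    peak = max(counts.values(), default=0)
    rows = -(-height // block)  # ceiling division
    cols = -(-width // block)
    qp_map = {}
    for r in range(rows):
        for c in range(cols):
            # Normalized event strength in [0, 1]; blocks without events get 0.
            strength = counts[(r, c)] / peak if peak else 0.0
            qp = base_qp - round(max_offset * strength)
            qp_map[(r, c)] = min(51, max(0, qp))  # clamp to the 8-bit H.264/HEVC QP range
    return qp_map
```

An encoder could equally raise QP in fast-moving regions to exploit motion masking. Which policy suits a given application is a design decision that the abstract leaves open.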

SYSTEMS AND METHODS FOR SIGNALING DECODED PICTURE BUFFER INFORMATION IN VIDEO CODING
20220353535 · 2022-11-03 ·

A device may be configured to signal decoded picture buffer information according to one or more of the techniques described herein.

Spatial-temporal reasoning through pretrained language models for video-grounded dialogues
11487999 · 2022-11-01 ·

A system and method for generating a response in a video-grounded dialogue are provided. A video-grounded dialogue neural network language model receives video input and text input. The text input includes a dialogue history between the model and a human user and a current utterance by the user. Encoded video input is generated using video encoding layers. Encoded text input is generated using text encoding layers. The encoded video input and the encoded text input are concatenated into a single input sequence. A generative pre-trained transformer model generates the response to the current utterance from the single input sequence.
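
The concatenation step can be sketched as follows. The sketch assumes the video and text encoding layers have already produced vectors of a common hidden size. The segment ids (video, dialogue history, current utterance) and position indices are hypothetical additions of the kind transformer inputs commonly carry. The abstract does not say how modalities are distinguished, and no actual transformer model is invoked here.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

# Hypothetical segment ids telling the model which part of the input each position holds.
VIDEO, HISTORY, UTTERANCE = 0, 1, 2


def build_single_input_sequence(
    encoded_video: Sequence[Vector],
    encoded_history: Sequence[Sequence[Vector]],
    encoded_utterance: Sequence[Vector],
) -> Tuple[List[Vector], List[int], List[int]]:
    """Concatenate video and text encodings into one sequence.

    Returns the embeddings, a segment id per position, and position indices.
    """
    sequence: List[Vector] = []
    segments: List[int] = []

    def append(vectors: Sequence[Vector], segment: int) -> None:
        for v in vectors:
            if sequence and len(v) != len(sequence[0]):
                raise ValueError("all encodings must share one hidden size")
            sequence.append(list(v))
            segments.append(segment)

    append(encoded_video, VIDEO)
    for turn in encoded_history:  # oldest turn first
        append(turn, HISTORY)
    append(encoded_utterance, UTTERANCE)
    positions = list(range(len(sequence)))
    return sequence, segments, positions
```

Placing the video first and the current utterance last means a left-to-right generative model conditions on all of the video and history before it begins producing the response.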