H04N19/33

Video client optimization during pause

A system and method for providing quality control in immersive video during pausing of a video streaming session. In one embodiment, a paused video frame may comprise a plurality of mixed-quality video tiles depending on user gaze vector information. Under pause control, the video quality of all tiles of the paused video frame is equalized to a single value, which may be the quality of the tiles presented in the viewport of the client device. The resulting uniform-quality frame is used as a replacement video frame and is presented to the client device player for decoding and display, instead of the mixed-quality video frame, while the streaming session is paused.
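The equalization step described above can be sketched as follows. The `Tile` structure, integer quality levels, and viewport bookkeeping are illustrative assumptions, not details from the abstract.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Tile:
    index: int
    quality: int  # quality-ladder level; higher is assumed better

def build_replacement_frame(tiles: List[Tile], viewport: Set[int]) -> List[Tile]:
    """Return a paused frame whose tiles all share the viewport tiles' quality."""
    # Take the quality of the tiles currently presented in the viewport.
    target = max(t.quality for t in tiles if t.index in viewport)
    # Re-emit every tile at that single quality so the paused frame is uniform.
    return [Tile(t.index, target) for t in tiles]
```

During pause, the client would fetch or decode the non-viewport tiles at the target quality and hand this replacement frame to the player.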

Data Stream Encoder Configuration
20230091776 · 2023-03-23

A media encoder for encoding a stream of media data blocks is provided, having an encoder pipeline comprising a sequence of processing modules for processing the stream of media data blocks, and a pipeline configurator configured to effect a switch in the encoder pipeline from one or more first encode parameters to one or more second encode parameters. The first processing module of the pipeline can be configured to associate a trigger value with at least a first media data block processed at the first processing module in accordance with the second encode parameters, the trigger value passing to subsequent modules so as to cause those modules to adopt the second encode parameters.
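A minimal sketch of the trigger mechanism, with invented `Module` and block structures: the configurator tags the first block to be encoded under the new parameters, and each module in the pipeline switches parameter sets when the tag reaches it.

```python
class Module:
    """One stage of the encoder pipeline (hypothetical structure)."""
    def __init__(self, name: str):
        self.name = name
        self.params = "first"  # currently active encode parameter set

    def process(self, block: dict) -> dict:
        # Adopt the second parameter set when the trigger value arrives.
        if block.get("trigger"):
            self.params = "second"
        block.setdefault("trace", []).append((self.name, self.params))
        return block

def encode(blocks, modules, switch_at: int):
    for i, block in enumerate(blocks):
        if i == switch_at:
            # Pipeline configurator: associate the trigger value with the
            # first block processed under the second encode parameters.
            block["trigger"] = True
        for m in modules:
            m.process(block)
    return blocks
```

Because the trigger travels with the block, each module switches exactly when that block reaches it, rather than all modules switching at once.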

Spatial-temporal reasoning through pretrained language models for video-grounded dialogues
11487999 · 2022-11-01

A system and method for generating a response in a video-grounded dialogue are provided. A video-grounded dialogue neural network language model receives video input and text input. The text input includes a dialogue history between the model and a human user and a current utterance by the user. Encoded video input is generated using video encoding layers. Encoded text input is generated using text encoding layers. The encoded video input and the encoded text input are concatenated into a single input sequence. A generative pre-trained transformer model generates the response to the current utterance from the single input sequence.
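The concatenation step can be sketched simply; the token lists and separator value below are placeholders, not the model's actual vocabulary.

```python
def build_input_sequence(video_tokens, history_tokens, utterance_tokens, sep=-1):
    """Concatenate encoded video and encoded text (dialogue history plus the
    current utterance) into one sequence for the generative transformer."""
    return video_tokens + [sep] + history_tokens + [sep] + utterance_tokens
```

The transformer then attends jointly over video and text positions in this single sequence when generating the response.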

Reference subpicture scaling ratios for subpictures in video coding

A video decoder can be configured to determine that a first subpicture of a current picture has associated scaling parameters; receive the associated scaling parameters for the first subpicture of the current picture in response to determining that the first subpicture of the current picture has the associated scaling parameters; determine motion information, for a block of the first subpicture of the current picture, that identifies a subpicture of a reference picture; locate a prediction block for the block of the first subpicture of the current picture in the subpicture of the reference picture; and scale the prediction block based on the associated scaling parameters for the first subpicture of the current picture.
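The decoder-side control flow in this abstract can be sketched as below; the `bitstream` dictionary and the `locate`/`scale` helpers are hypothetical stand-ins for real syntax parsing and inter prediction.

```python
def decode_subpicture_block(subpic_id, bitstream, ref_subpic, locate, scale):
    """Conditionally read scaling parameters, locate the prediction block in
    the reference subpicture, and scale it (sketch; names invented)."""
    params = None
    if bitstream["subpic_has_scaling"][subpic_id]:
        # Scaling parameters are only received when the flag signals them.
        params = bitstream["scaling_params"][subpic_id]
    motion = bitstream["motion"][subpic_id]  # identifies a reference subpicture
    pred = locate(ref_subpic, motion)
    if params is not None:
        pred = scale(pred, params)
    return pred
```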

IMAGE DISPLAY SYSTEM, MOVING IMAGE DISTRIBUTION SERVER, IMAGE PROCESSING APPARATUS, AND MOVING IMAGE DISTRIBUTION METHOD
20220353555 · 2022-11-03

A server performs a part of a forming process necessary for conversion into formats corresponding to the display modes of a head-mounted display and a flat-plate display connected to an image processing apparatus, and transmits the processing result to the image processing apparatus. At this time, the server switches that part of the process so as to transmit any one of: a pair of a left-eye image and a right-eye image; an image suited for the flat-plate display; or an image constituted by a left-eye image and a right-eye image to each of which distortion for an ocular lens has been applied.
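The server's mode switch amounts to a three-way dispatch; the mode names and return shapes below are invented for illustration.

```python
def select_partial_result(display_mode: str, frame):
    """Choose which partially processed image the server transmits
    (sketch; mode names are placeholders)."""
    if display_mode == "hmd_stereo":
        return ("left_right_pair", frame)
    if display_mode == "flat_panel":
        return ("flat_image", frame)
    if display_mode == "hmd_predistorted":
        # Left/right images with ocular-lens distortion already applied,
        # so the client can skip that step.
        return ("distorted_pair", frame)
    raise ValueError(f"unknown display mode: {display_mode}")
```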

MERGING FRIENDLY FILE FORMAT

Video data for deriving a spatially variable section of a scene, along with corresponding methods and apparatuses for creating such video data and for deriving a spatially variable section of a scene from it. The video data comprises a set of source tracks containing coded video data that represents spatial portions of a video showing the scene; it is formatted in a specific file format and supports merging different spatial portions into a joint bitstream through compressed-domain processing.
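The compressed-domain merge can be sketched as selecting the source tracks that cover the requested section and concatenating their coded payloads without decoding or re-encoding; the track dictionaries below are invented container stand-ins.

```python
def merge_section(source_tracks, wanted_portions):
    """Build a joint bitstream from the coded payloads of the selected
    spatial-portion tracks (sketch; no transcoding is performed)."""
    joint = bytearray()
    for track in source_tracks:
        if track["portion"] in wanted_portions:
            # Compressed-domain processing: copy coded units verbatim.
            joint += track["payload"]
    return bytes(joint)
```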

ENCODING AND DECODING A POINT CLOUD USING PATCHES FOR IN-BETWEEN SAMPLES

At least one embodiment relates to a method of encoding/decoding attributes of orthogonally projected 3D samples and in-between 3D samples, in which information indicates whether at least one first attribute patch of 2D samples, obtained by encoding the attribute of the at least one orthogonally projected 3D sample according to a first attribute coding mode, and at least one second attribute patch of 2D samples of an image, obtained by encoding the attribute of at least one in-between 3D sample, are stored in separate images.
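A decoder's use of the signalled flag can be sketched as below; the patch-stream dictionary and key names are hypothetical.

```python
def attribute_images(patch_streams, separate_images_flag: bool):
    """Return the attribute image layout implied by the signalled flag:
    separate images for projected and in-between patches, or one shared
    image (sketch; names invented)."""
    if separate_images_flag:
        # First patches (projected 3D samples) and second patches
        # (in-between 3D samples) live in separate attribute images.
        return {"projected": patch_streams["projected"],
                "in_between": patch_streams["in_between"]}
    # Otherwise both patch types share a single attribute image.
    return {"shared": patch_streams["projected"] + patch_streams["in_between"]}
```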