H04N19/29

Low-complexity two-dimensional (2D) separable transform design with transpose buffer management
11785224 · 2023-10-10

Methods are provided for reducing the size of a transpose buffer used for computation of a two-dimensional (2D) separable transform. Scaling factors and clip bit widths determined for a particular transpose buffer size and the expected transform sizes are used to reduce the size of the intermediate results of applying the 2D separable transform. The reduced bit widths of the intermediate results may vary across the intermediate results. In some embodiments, the scaling factors and associated clip bit widths may be adapted during encoding.
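The flow this abstract describes can be sketched as a horizontal pass, a scale-and-clip step sized so the intermediates fit a smaller transpose buffer, then a vertical pass. The transform matrix, shift value, and clip width below are illustrative assumptions, not values from the patent:

```python
import numpy as np

def clip_to_bits(x, bits):
    """Clip signed values to the two's-complement range of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return np.clip(x, lo, hi)

def separable_2d_transform(block, tm, shift, clip_bits):
    """Row pass, then scale and clip the intermediate results so they fit a
    reduced transpose buffer, then the column pass."""
    inter = block.astype(np.int64) @ tm.T            # 1-D transform along rows
    inter = (inter + (1 << (shift - 1))) >> shift    # scaling factor applied as a right shift
    inter = clip_to_bits(inter, clip_bits)           # bounded transpose-buffer bit width
    return tm @ inter                                # 1-D transform along columns
```

A larger shift trades precision for a narrower buffer; the abstract notes that the clip bit widths may vary across intermediates and may adapt during encoding.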

Sub-picture Sizing In Video Coding
20230319299 · 2023-10-05

A video coding mechanism is disclosed. The mechanism includes receiving a bitstream comprising one or more sub-pictures partitioned from a picture such that each sub-picture has a width that is an integer multiple of the coding tree unit (CTU) size whenever the sub-picture's right boundary does not coincide with the right boundary of the picture. The bitstream is parsed to obtain the one or more sub-pictures. The one or more sub-pictures are decoded to create a video sequence. The video sequence is forwarded for display.
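The width constraint is straightforward to express as a conformance check; the `(x_offset, width)` layout representation below is an assumed illustration, not the bitstream syntax:

```python
def valid_subpicture_widths(subpics, pic_width, ctu_size):
    """Check the constraint from the abstract: a sub-picture whose right
    boundary does not coincide with the picture's right boundary must have
    a width that is an integer multiple of the CTU size.

    `subpics` is a list of (x_offset, width) pairs describing a horizontal
    layout (hypothetical representation for illustration)."""
    for x, w in subpics:
        right_at_picture_edge = (x + w == pic_width)
        if not right_at_picture_edge and w % ctu_size != 0:
            return False
    return True
```

Only the rightmost sub-pictures are exempt, since the picture itself need not be a CTU multiple wide.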

Adaptive resolution of point cloud and viewpoint prediction for video streaming in computing environments

A mechanism is described for facilitating adaptive resolution and viewpoint prediction for immersive media in computing environments. An apparatus of embodiments, as described herein, includes one or more processors to receive viewing positions associated with a user with respect to a display, and analyze the relevance of media content based on the viewing positions, where the media content includes immersive videos of scenes captured by one or more cameras. The one or more processors are further to predict portions of the media content as relevant portions based on the viewing positions and transmit the relevant portions to be rendered and displayed.
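A minimal sketch of the idea: extrapolate the next viewing position from recent samples, then mark the tiles near it as the relevant portions to transmit at full resolution. Linear extrapolation and the tile-centre representation are stand-in assumptions; the patent does not fix a predictor:

```python
def predict_viewpoint(positions, horizon=1):
    """Linearly extrapolate the next viewing position from the last two
    samples (a minimal stand-in for the viewpoint prediction)."""
    (x0, y0), (x1, y1) = positions[-2], positions[-1]
    return (x1 + horizon * (x1 - x0), y1 + horizon * (y1 - y0))

def relevant_tiles(pred, tile_centres, radius):
    """Select tiles whose centres fall within `radius` of the predicted
    viewpoint; these are the 'relevant portions' to stream at high quality."""
    px, py = pred
    return [i for i, (tx, ty) in enumerate(tile_centres)
            if (tx - px) ** 2 + (ty - py) ** 2 <= radius ** 2]
```

Everything outside the selection would be sent at reduced resolution or skipped, which is where the bandwidth saving comes from.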

Scalable Per-Title Encoding
20230118010 · 2023-04-20

A scalable per-title encoding technique may include detecting scene cuts in an input video received by an encoding network or system, generating segments of the input video, performing per-title encoding of a segment of the input video, training a deep neural network (DNN) for each representation of the segment, thereby generating a trained DNN, compressing the trained DNN, thereby generating a compressed trained DNN, and generating an enhanced bitrate ladder including metadata comprising the compressed trained DNN. In some embodiments, the method also may include generating a base layer bitrate ladder for CPU devices, and providing the enhanced bitrate ladder for GPU-available devices.
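The enhanced ladder described here pairs each representation with metadata carrying its compressed trained DNN. The sketch below uses `zlib` and placeholder weight bytes in place of a real trained super-resolution network; the field names are assumptions:

```python
import zlib

def enhanced_ladder(segment_id, representations, dnn_weights):
    """Assemble an enhanced bitrate-ladder entry for one segment: each
    representation carries metadata holding its compressed trained DNN.

    `representations` is a list of (name, bitrate_kbps) pairs and
    `dnn_weights` maps representation name -> raw weight bytes (a stand-in
    for a serialised trained network)."""
    return {
        "segment": segment_id,
        "ladder": [
            {"repr": name, "bitrate_kbps": kbps,
             "metadata": {"dnn": zlib.compress(dnn_weights[name])}}
            for name, kbps in representations
        ],
    }
```

A GPU-capable client would decompress the per-representation DNN and use it to enhance the decoded segment, while CPU-only devices fall back to the base layer ladder.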

Encoders, methods and display apparatuses incorporating gaze-directed compression ratios
11567567 · 2023-01-31

An encoder for encoding images. The encoder includes a processor configured to: receive, from a display apparatus, information indicative of at least one of a head pose of a user or a gaze direction of the user; identify a gaze location in an input image based on the at least one of the head pose or the gaze direction; divide the input image into a first input portion and a second input portion, wherein the first input portion includes and surrounds the gaze location; and encode the first input portion and the second input portion at a first compression ratio and at least one second compression ratio, respectively, to generate a first encoded portion and a second encoded portion, wherein the at least one second compression ratio is larger than the first compression ratio.
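A minimal sketch of the two-region idea, with quantisation step sizes standing in for the two compression ratios (the patent does not specify a codec, so the step values and region shape are assumptions):

```python
import numpy as np

def encode_gaze_directed(img, gaze, half_size, q_fovea=2, q_periphery=16):
    """Quantise the region around the gaze location finely (first portion,
    lower compression ratio) and everything else coarsely (second portion,
    higher ratio)."""
    gy, gx = gaze
    out = (img // q_periphery) * q_periphery            # second portion: coarse
    y0, y1 = max(gy - half_size, 0), gy + half_size
    x0, x1 = max(gx - half_size, 0), gx + half_size
    out[y0:y1, x0:x1] = (img[y0:y1, x0:x1] // q_fovea) * q_fovea  # first portion: fine
    return out
```

The periphery tolerates heavier quantisation because visual acuity falls off sharply away from the gaze location.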

Medical imaging device for spatially resolved recording of multispectral video data

A medical imaging device configured for spatially resolved recording of multispectral video data of an examination area of a patient, including a light source having multiple optical emitters with different wavelengths in the visible and NIR spectral range. The light source has an emitter whose wavelength lies within ±50% of its half-width around the intersection of the blue and green filter curves or of the green and red filter curves. The exposure control and the data processing means are arranged to separately detect the affected two of the red and green or the green and blue colour signals in an exposure pattern with activation of the emitter at the intersection point, and to evaluate them in the multispectral analysis, with the mutually different wavelengths shifted by the two affected filter curves serving as two supporting-point wavelengths.
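The placement rule for the extra emitter can be illustrated numerically: find where two sampled filter response curves cross, then test the ±50%-of-half-width window. The sampled curves below are made-up values for illustration, not real sensor data:

```python
import numpy as np

def filter_intersection(wavelengths, curve_a, curve_b):
    """Find the wavelength where two colour-filter response curves cross
    (smallest absolute difference on the sampled grid)."""
    return wavelengths[np.argmin(np.abs(np.asarray(curve_a) - np.asarray(curve_b)))]

def emitter_in_range(emitter_wl, emitter_half_width, crossing_wl):
    """The abstract's placement rule: the emitter wavelength must lie within
    +/-50% of its half-width around the crossing point."""
    return abs(emitter_wl - crossing_wl) <= 0.5 * emitter_half_width
```

Because both filter curves respond at the crossing, one exposure with that emitter yields two colour signals whose effective wavelengths are shifted apart by the filters, giving the two supporting points.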

FRAMEWORK FOR VIDEO CONFERENCING BASED ON FACE RESTORATION
20220217371 · 2022-07-07

A method and apparatus are provided, comprising computer code configured to cause one or more processors to perform: obtaining video data; detecting at least one face from at least one frame of the video data; determining a set of facial landmark features of the at least one face from the at least one frame; and coding the video data at least partly by a neural network based on the determined set of facial landmark features.
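The landmark set is the compact signal a neural decoder would restore the face from. The packing sketch below assumes the landmarks are already detected by some off-the-shelf model and only shows how cheaply they serialise for transmission; the 16-bit coordinate format is an assumption:

```python
import struct

def pack_landmarks(landmarks):
    """Pack (x, y) facial-landmark coordinates into a compact payload:
    a 16-bit count followed by 16-bit x/y pairs."""
    payload = struct.pack("<H", len(landmarks))
    for x, y in landmarks:
        payload += struct.pack("<HH", x, y)
    return payload

def unpack_landmarks(payload):
    """Recover the (x, y) landmark list from a packed payload."""
    (n,) = struct.unpack_from("<H", payload, 0)
    return [struct.unpack_from("<HH", payload, 2 + 4 * i) for i in range(n)]
```

A few hundred bytes of landmarks per frame is orders of magnitude smaller than a conventionally coded face region, which is what makes restoration-based conferencing attractive at low bitrates.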

Forward error correction using source blocks with symbols from at least two datastreams with synchronized start symbol identifiers among the datastreams

A forward error correction (FEC) data generator has an input for at least two datastreams for which FEC data shall be generated jointly, each datastream having a plurality of symbols. An FEC data symbol is based on an FEC source block, which may have a subset of the symbols of the at least two datastreams. The FEC data generator further has a signaling information generator configured to generate signaling information for the FEC data symbol indicating which symbols within the at least two datastreams belong to the corresponding source block, by determining pointers to start symbols within a first and a second datastream, respectively, of the at least two datastreams, and a number of symbols within the first and second datastreams, respectively, that belong to the corresponding source block.
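The joint source block and its signaling can be sketched directly from the abstract: per-stream start pointers and symbol counts select the symbols, and an FEC symbol is computed over the gathered block. XOR parity below is a stand-in for whatever code the patent actually uses:

```python
from functools import reduce

def build_source_block(streams, starts, counts):
    """Gather a joint FEC source block: from each datastream take
    `counts[i]` symbols beginning at `starts[i]` (the pointers carried in
    the signaling information)."""
    block = []
    for stream, start, n in zip(streams, starts, counts):
        block.extend(stream[start:start + n])
    return block

def fec_symbol(block):
    """A minimal parity FEC symbol: XOR over all source symbols
    (illustrative; the actual FEC code is unspecified in the abstract)."""
    return reduce(lambda a, b: a ^ b, block)

def signaling_info(starts, counts):
    """Per-stream signaling: start-symbol pointer plus symbol count."""
    return [{"start": s, "count": n} for s, n in zip(starts, counts)]
```

With synchronized start-symbol identifiers across the streams, a receiver can reassemble exactly the same source block and use the FEC symbol to repair losses in either stream.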