Patent classifications
H04N19/29
Layer-based video encoding
A technique for encoding a video signal generates multiple layers and multiple corresponding masks for each of a set of blocks of the video signal. Each of the layers for a given block is a rendition of that block, and each of the masks distinguishes pixels of the respective layer that are relevant in reconstructing the block from pixels that are not. The encoder applies lossy compression to each of the layers and transmits the lossily compressed layers and a set of the masks to a decoder, such that the decoder may reconstruct the respective block from the layers and the mask(s).
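To make the layer-and-mask idea concrete, the sketch below (not the patented encoder) splits a block into two renditions plus binary masks and reconstructs the block by compositing the masked layers; the bright/dark layering rule and the function names are illustrative assumptions, and the per-layer lossy compression step is omitted.

```python
import numpy as np

def split_into_layers(block: np.ndarray, threshold: int = 128):
    """Produce two (layer, mask) pairs; each mask marks the pixels of its
    layer that are relevant for reconstructing the block."""
    mask_hi = block >= threshold
    mask_lo = ~mask_hi
    # Each layer is a full rendition of the block; its irrelevant pixels are
    # filled with the layer mean so they compress smoothly under a lossy codec.
    fill_hi = block[mask_hi].mean() if mask_hi.any() else 0
    fill_lo = block[mask_lo].mean() if mask_lo.any() else 0
    return [(np.where(mask_hi, block, fill_hi), mask_hi),
            (np.where(mask_lo, block, fill_lo), mask_lo)]

def reconstruct(pairs):
    """Composite the layers, taking each pixel from the layer whose mask marks it as relevant."""
    out = np.zeros_like(pairs[0][0])
    for layer, mask in pairs:
        out = np.where(mask, layer, out)
    return out

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8))
pairs = split_into_layers(block)
assert np.array_equal(reconstruct(pairs), block)  # exact here; a real encoder compresses each layer lossily first
```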
VIDEO ANALYTICS USING SCALABLE VIDEO CODING
In one embodiment, a computing device includes processing circuitry and interface circuitry. The processing circuitry receives, via the interface circuitry, an encoded video stream containing video encoded in multiple layers corresponding to different video resolutions, including a base layer and one or more enhancement layers. The base layer encodes the video at a base resolution and the one or more enhancement layers encode the video at one or more enhanced resolutions higher than the base resolution. The processing circuitry extracts the base layer from the encoded video stream and decodes the video at the base resolution from the base layer. The processing circuitry then detects content in the video at the base resolution and generates metadata indicating the detected content.
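The base-layer analytics flow can be sketched as below. There is no real SVC bitstream handling here: the stream container, the extraction and decoding stubs, and the detector are hypothetical stand-ins for a scalable codec and an object detector running at base resolution.

```python
from dataclasses import dataclass, field

@dataclass
class EncodedStream:
    base_layer: bytes             # base-resolution bitstream
    enhancement_layers: list      # higher-resolution refinements, ignored by the analytics node

@dataclass
class Metadata:
    frame_index: int
    detections: list = field(default_factory=list)   # e.g. [("person", 0.92)]

def extract_base_layer(stream: EncodedStream) -> bytes:
    # A real system would strip the enhancement-layer units from the stream
    # and keep only the base-layer ones; here the layers are pre-separated.
    return stream.base_layer

def decode_base_layer(bitstream: bytes) -> dict:
    # Placeholder "decoder": pretend the payload is a base-resolution frame.
    return {"resolution": (320, 180), "pixels": bitstream}

def detect_content(frame: dict) -> list:
    # Placeholder detector; a deployment would run an object detector here.
    return [("person", 0.92)] if frame["pixels"] else []

def analyze(stream: EncodedStream, frame_index: int) -> Metadata:
    frame = decode_base_layer(extract_base_layer(stream))
    return Metadata(frame_index=frame_index, detections=detect_content(frame))

print(analyze(EncodedStream(base_layer=b"\x00\x01", enhancement_layers=[b"\x02"]), frame_index=0))
```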
ADAPTIVE RESOLUTION OF POINT CLOUD AND VIEWPOINT PREDICTION FOR VIDEO STREAMING IN COMPUTING ENVIRONMENTS
A mechanism is described for facilitating adaptive resolution and viewpoint prediction for immersive media in computing environments. An apparatus of embodiments, as described herein, includes one or more processors to receive viewing positions associated with a user with respect to a display, and analyze relevance of media content based on the viewing positions, where the media content includes immersive videos of scenes captured by one or more cameras. The one or more processors are further to predict portions of the media content as relevant portions based on the viewing positions and transmit the relevant portions to be rendered and displayed.
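A rough sketch of viewpoint-based relevance selection follows: recent viewing positions are extrapolated to a predicted direction, and only the tiles of the immersive frame near that direction are marked as relevant for transmission. The tiling, the linear extrapolation, and every name here are illustrative assumptions rather than the patented mechanism.

```python
def predict_viewpoint(history):
    """Linearly extrapolate the next (yaw, pitch) from the last two samples."""
    (y0, p0), (y1, p1) = history[-2], history[-1]
    return (y1 + (y1 - y0), p1 + (p1 - p0))

def relevant_tiles(predicted, tile_deg=30, fov_deg=90):
    """Return (yaw_idx, pitch_idx) tiles whose centers fall inside the predicted field of view."""
    yaw, pitch = predicted
    tiles = []
    for yi in range(360 // tile_deg):
        for pi in range(180 // tile_deg):
            cy = yi * tile_deg + tile_deg / 2          # tile centre, yaw in [0, 360)
            cp = pi * tile_deg + tile_deg / 2 - 90     # tile centre, pitch in [-90, 90)
            dyaw = min(abs(cy - yaw % 360), 360 - abs(cy - yaw % 360))
            if dyaw <= fov_deg / 2 and abs(cp - pitch) <= fov_deg / 2:
                tiles.append((yi, pi))
    return tiles

history = [(80.0, 0.0), (90.0, 5.0)]               # two recent (yaw, pitch) viewing positions
print(relevant_tiles(predict_viewpoint(history)))  # tiles to stream at high resolution
```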
ENCODERS, METHODS AND DISPLAY APPARATUSES INCORPORATING GAZE-DIRECTED COMPRESSION
An encoder for encoding images. The encoder includes a processor configured to: receive, from a display apparatus, information indicative of at least one of a head pose of a user and a gaze direction of the user; identify a gaze location in an input image based on the at least one of the head pose and the gaze direction; divide the input image into a first input portion and a second input portion, wherein the first input portion includes and surrounds the gaze location; and encode the first input portion and the second input portion at a first compression ratio and at least one second compression ratio, respectively, to generate a first encoded portion and a second encoded portion, wherein the at least one second compression ratio is larger than the first compression ratio.
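The split-and-encode step can be illustrated with the sketch below, where simple quantisation stands in for the codec: the portion around the gaze location keeps a small quantisation step (low compression ratio) and the periphery a large one (higher compression ratio). The radius and step sizes are assumed values for illustration only.

```python
import numpy as np

def encode_gaze_directed(image: np.ndarray, gaze_xy, radius=64, fine_step=4, coarse_step=32):
    """Return a quantised image where pixels within `radius` of the gaze point
    keep more precision than peripheral pixels."""
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    gaze_mask = (xx - gaze_xy[0]) ** 2 + (yy - gaze_xy[1]) ** 2 <= radius ** 2
    fine = (image // fine_step) * fine_step        # first portion: low compression ratio
    coarse = (image // coarse_step) * coarse_step  # second portion: higher compression ratio
    return np.where(gaze_mask, fine, coarse), gaze_mask

rng = np.random.default_rng(1)
frame = rng.integers(0, 256, size=(240, 320))
encoded, mask = encode_gaze_directed(frame, gaze_xy=(160, 120))
print("foveal pixels:", int(mask.sum()), "of", mask.size)
```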
Improved Split Rendering for Extended Reality (XR) Applications
A method (20, 40) for reducing undesirable visual effects, such as judder, in a video image having one or more frames is disclosed. A server node (16) generates graphics layers from 3D objects, and augments the layers with the Z-layer information and motion information for the objects. The server node then groups (28, 30, 32) the graphics layers, encodes (34) the groups into a video stream such that every video frame of the stream is a composite video frame of the graphics layers, adds (36) the motion information to the composite video frame, and sends (38) the stream to a client device (18). Upon receipt, the client device extracts (44) the motion information, compensates the positions of the graphics layers using the Z-layer information and the motion information, and applies a selected positional time-warp algorithm to compensate for the translational and rotational movement of the user’s head (50). The resultant layers are then combined into a video image and rendered (52) for the user.
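The client-side compensation step can be sketched roughly as follows: each received graphics layer carries motion and Z information, the client shifts the layer by its motion scaled by the elapsed latency plus a global offset standing in for the positional time-warp, and the layers are composited far-to-near. A real implementation would reproject per pixel; the dictionary layout and all names are assumptions.

```python
import numpy as np

def shift(layer: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Shift a (H, W) layer by integer pixels, padding exposed areas with zeros."""
    out = np.zeros_like(layer)
    h, w = layer.shape
    src = layer[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0], max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def compose(layers, latency_frames=1, head_offset=(0, 0)):
    """layers: list of dicts with 'pixels', 'mask', 'motion' (dx, dy per frame), 'z'.
    Composite far-to-near after motion compensation and a global time-warp offset."""
    ordered = sorted(layers, key=lambda l: l["z"], reverse=True)  # far layers first
    canvas = np.zeros_like(ordered[0]["pixels"])
    for layer in ordered:
        dx = int(layer["motion"][0] * latency_frames) + head_offset[0]
        dy = int(layer["motion"][1] * latency_frames) + head_offset[1]
        warped = shift(layer["pixels"], dx, dy)
        mask = shift(layer["mask"].astype(layer["pixels"].dtype), dx, dy).astype(bool)
        canvas = np.where(mask, warped, canvas)
    return canvas
```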
Image classification and conversion method and device, image processor and training method therefor, and medium
Disclosed are an image classification and conversion method, apparatus, image processor and training method thereof, and medium. The image classification method includes receiving a first input image and a second input image; performing image encoding on the first input image by utilizing n stages of encoding units connected in cascade to produce a first output image, wherein n and m are integers greater than 1, and wherein for 1≤i<n the output of the i-th stage of encoding unit is an input of the (i+1)-th stage of encoding unit; and outputting the first output image, the first output image comprising m^n output sub-images, each of the m^n output sub-images corresponding to an image category.
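A toy version of the cascade is sketched below: each encoding unit maps an input image to m sub-images (m = 2 here, via an illustrative average/difference split), so n cascaded stages yield m^n sub-images and the classifier simply picks the most energetic one. The filters and the decision rule are assumptions; the patent's encoding units are learned.

```python
import numpy as np

def encoding_unit(image: np.ndarray):
    """One stage: produce m = 2 half-width sub-images (column averages and differences)."""
    even, odd = image[:, ::2], image[:, 1::2]
    return [(even + odd) / 2.0, (even - odd) / 2.0]

def cascade(image: np.ndarray, n: int):
    """Apply n stages; stage i+1 consumes every output of stage i, giving 2**n sub-images."""
    outputs = [image.astype(float)]
    for _ in range(n):
        outputs = [sub for img in outputs for sub in encoding_unit(img)]
    return outputs

def classify(image: np.ndarray, n: int = 3) -> int:
    """Return the index of the sub-image (category) with the largest mean magnitude."""
    subs = cascade(image, n)
    return int(np.argmax([np.abs(s).mean() for s in subs]))

rng = np.random.default_rng(2)
print(classify(rng.integers(0, 256, size=(32, 32)), n=3))  # one of 2**3 = 8 categories
```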
Low-complexity two-dimensional (2D) separable transform design with transpose buffer management
Methods are provided for reducing the size of a transpose buffer used for computation of a two-dimensional (2D) separable transform. Scaling factors and clip bit widths determined for a particular transpose buffer size and the expected transform sizes are used to reduce the size of the intermediate results of applying the 2D separable transform. The reduced bit widths of the intermediate results may vary across the intermediate results. In some embodiments, the scaling factors and associated clip bit widths may be adapted during encoding.
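The transpose-buffer reduction can be sketched as follows: the first 1-D pass produces intermediate coefficients that are scaled down and clipped to a reduced bit width before being written to the transpose buffer, and the second 1-D pass then transforms the columns. The DCT used here and the shift and clip values are illustrative stand-ins for the transform and the factors derived in the method.

```python
import numpy as np
from scipy.fft import dct

def clip_to_bits(x: np.ndarray, bits: int) -> np.ndarray:
    """Round and clip values into a signed `bits`-wide integer range."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return np.clip(np.rint(x), lo, hi).astype(np.int32)

def separable_2d_transform(block: np.ndarray, shift: int = 3, clip_bits: int = 16):
    """Row transform -> scale/clip into a narrow transpose buffer -> column transform."""
    rows = dct(block.astype(float), axis=1)                          # first 1-D pass (rows)
    transpose_buffer = clip_to_bits(rows / (1 << shift), clip_bits)  # reduced-width intermediates
    cols = dct(transpose_buffer.astype(float), axis=0)               # second 1-D pass (columns)
    return cols * (1 << shift)                                       # undo the intermediate scaling

block = np.arange(64, dtype=np.int16).reshape(8, 8)
coeffs = separable_2d_transform(block)
print(coeffs.shape, int(np.abs(coeffs).max()))
```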