VIDEO CONFERENCING BASED ON ADAPTIVE FACE RE-ENACTMENT AND FACE RESTORATION
A method and apparatus for adaptive decoding of compressed video for video conferencing may be provided. The method may include receiving compressed video data comprising a plurality of video frames, and determining a selection signal indicating whether at least one of a face restoration technique and a face reenactment technique is to be used. The method may include adaptively selecting and transmitting a single reference frame or a plurality of low-resolution (LR) frames comprising essential facial features, generating one or more recovered facial features and one or more respective decompressed LR extended face areas based on the selection signal and the compressed video data, and decoding a video frame from the plurality of video frames based on the one or more recovered facial features and the one or more respective decompressed LR extended face areas.
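The decoder's core decision is a branch on the selection signal: reenact the face from a single reference frame, or restore it from several LR frames. A minimal Python sketch of that control flow, with hypothetical field and helper names (none of them come from the patent itself):

```python
from dataclasses import dataclass

@dataclass
class CompressedVideo:
    frames: list           # per-frame compressed payloads
    selection_signal: str  # "restoration" or "reenactment" (hypothetical encoding)

def decode_frame(video: CompressedVideo, idx: int) -> dict:
    """Decode one frame, branching on the selection signal.

    The two branches stand in for the face-restoration path (a plurality
    of LR frames carrying essential facial features) and the
    face-reenactment path (a single reference frame); the returned
    feature dicts are placeholders for the recovered facial features and
    decompressed LR extended face areas.
    """
    if video.selection_signal == "restoration":
        # Recover facial features from the plurality of LR frames.
        features = {"mode": "restoration", "reference": "lr_frames"}
    else:
        # Drive a face model from the single transmitted reference frame.
        features = {"mode": "reenactment", "reference": "single_frame"}
    return {"payload": video.frames[idx], "features": features}
```

A real implementation would replace the feature dicts with neural restoration and reenactment models; only the selection-signal branch is intended to mirror the claim.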
Iterative media object compression algorithm optimization using decoupled calibration of perceptual quality algorithms
One or more multi-stage optimization iterations are performed with respect to a compression algorithm. A given iteration comprises a first stage in which hyper-parameters of a perceptual quality algorithm are tuned independently of the compression algorithm. A second stage of the iteration comprises tuning hyper-parameters of the compression algorithm using a set of perceptual quality scores generated by the tuned perceptual quality algorithm. The final stage of the iteration comprises performing a compression quality evaluation test on the tuned compression algorithm.
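The three stages of a single iteration can be sketched with toy stand-ins: a perceptual-quality (PQ) calibration that ignores the compressor, a compressor tuning step driven only by PQ scores, and a final pass/fail quality test. All cost models and thresholds below are invented for illustration:

```python
def tune_perceptual_quality():
    # Stage 1 (toy): calibrate the PQ algorithm's weight independently of
    # the compressor, e.g. against human-rated reference images.
    return {"weight": 0.8}

def pq_score(pq, qp):
    # Toy perceptual score: lower quantization parameter (QP) means higher
    # fidelity, traded off against a bitrate penalty.
    fidelity = 100 - qp
    bitrate_penalty = (51 - qp) * 0.5
    return pq["weight"] * fidelity - (1 - pq["weight"]) * bitrate_penalty

def optimize_compressor(iterations=2):
    qp = 40  # initial compressor hyper-parameter (hypothetical)
    for _ in range(iterations):
        pq = tune_perceptual_quality()                       # stage 1
        scores = {q: pq_score(pq, q) for q in range(20, 51)}
        qp = max(scores, key=scores.get)                     # stage 2: tune via PQ scores
        if pq_score(pq, qp) >= 60:                           # stage 3: quality evaluation test
            break
    return qp
```

The decoupling is the point: stage 1 never looks at the compressor, so the PQ scores used in stage 2 are not biased toward the codec being tuned.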
System and method for image format conversion using 3D lookup table approximation
A system is provided for converting image data from a first image format to a second image format using an approximation of a three-dimensional lookup table. The system includes an image processing operation database that stores image format conversion configurations; an image format conversion selector that selects an image format conversion for converting the image data from the first format to the second format and that accesses, from the database, a corresponding image format conversion configuration; and an image processor that executes processing input operations on RGB components of the image data, a 3×3 matrix, and processing output operations on the respective RGB components output from the 3×3 matrix, such that the image data is converted to the second format, with the processing input and output operations comprising the accessed image format conversion configuration.
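The described pipeline (per-channel input operations, a 3×3 mixing matrix, per-channel output operations) is a classic structure for approximating a full 3D LUT. A minimal per-pixel sketch, with an invented configuration dict standing in for the database entry (the gamma values and identity matrix are assumptions, not from the patent):

```python
def convert_pixel(rgb, config):
    """Approximate a 3D LUT as: input ops -> 3x3 matrix -> output ops."""
    # Processing input operations, applied per RGB component.
    r, g, b = (config["input_op"](c) for c in rgb)
    # 3x3 matrix mixing the three channels.
    m = config["matrix"]
    mixed = [m[i][0] * r + m[i][1] * g + m[i][2] * b for i in range(3)]
    # Processing output operations on the matrix outputs.
    return tuple(config["output_op"](c) for c in mixed)

# Hypothetical configuration: linearize a gamma-encoded signal, mix with an
# identity matrix, then re-apply the gamma. A real entry would carry e.g. a
# BT.709-to-BT.601 matrix and format-specific transfer curves.
config = {
    "input_op": lambda c: c ** 2.2,
    "matrix": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "output_op": lambda c: c ** (1 / 2.2),
}
```

With an identity matrix the round trip is lossless, which makes the structure easy to sanity-check before substituting a real color-space matrix.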
VIDEO COMPRESSION BASED ON LONG RANGE END-TO-END DEEP LEARNING
At least a method and an apparatus are presented for efficiently encoding or decoding video. For example, a plurality of frames is provided to a motion estimator to produce an output comprising estimated motion information. The estimated motion information is provided to an auto-encoder or an auto-decoder to produce an output comprising a reconstructed motion field. The reconstructed motion field and one or more decoded frames of the plurality of frames are provided to a deep neural network to produce an output comprising a refined bi-directional motion field. The video is encoded or decoded based on the refined bi-directional motion field.
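The data flow (motion estimator → auto-encoder/decoder → refinement network → codec) can be sketched on toy 1-D "frames", with each real component replaced by a trivial stand-in (the per-sample difference, the quantizer, and the averaging step are all invented placeholders, not the patent's networks):

```python
def motion_estimator(prev, cur):
    # Toy motion estimate: signed per-sample displacement.
    return [c - p for p, c in zip(prev, cur)]

def auto_encode_decode(motion, step=2):
    # Toy auto-encoder/auto-decoder: quantize the motion field, then
    # reconstruct it (this is where the bitrate saving would come from).
    return [round(m / step) * step for m in motion]

def refine_bidirectional(recon_motion, decoded_frames):
    # Toy stand-in for the deep network: fuse forward and (negated)
    # backward fields. A real network would also condition on the
    # decoded frames passed in here.
    fwd, bwd = recon_motion, [-m for m in recon_motion]
    return [(f - b) / 2 for f, b in zip(fwd, bwd)]

def encode_video(frames):
    motion = motion_estimator(frames[0], frames[1])
    recon = auto_encode_decode(motion)
    return refine_bidirectional(recon, frames[:1])
```

Only the wiring between stages is meant to mirror the abstract; every stage body is a placeholder.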
Method and apparatus for coding unit partitioning
A method for coding unit partitioning in a video encoder is provided that includes performing intra-prediction on each permitted coding unit (CU) in a CU hierarchy of a largest coding unit (LCU) to determine an intra-prediction coding cost for each permitted CU, storing the intra-prediction coding cost for each intra-predicted CU in memory, and performing inter-prediction, prediction mode selection, and CU partition selection on each permitted CU in the CU hierarchy to determine a CU partitioning for encoding the LCU, wherein the stored intra-prediction coding costs for the CUs are used.
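The key trick in the claim is caching: intra-prediction costs are computed once per permitted CU and then reused during the inter-prediction/partition-selection pass instead of being recomputed at every quadtree decision. A toy Python sketch with invented cost models (the `size * k` costs are placeholders, not real rate-distortion costs):

```python
intra_cost_cache = {}

def intra_cost(cu):
    # Compute the intra-prediction cost once per CU and store it, so the
    # later partition-selection pass reuses it instead of re-predicting.
    key = (cu["x"], cu["y"], cu["size"])
    if key not in intra_cost_cache:
        intra_cost_cache[key] = cu["size"] * 10  # toy cost model
    return intra_cost_cache[key]

def inter_cost(cu):
    return cu["size"] * 12  # toy inter cost (worse than intra here)

def best_partition(cu, min_size=8):
    """Choose between coding the CU whole and splitting into four sub-CUs."""
    whole = min(intra_cost(cu), inter_cost(cu))
    if cu["size"] <= min_size:
        return whole
    half = cu["size"] // 2
    split = sum(
        best_partition({"x": cu["x"] + dx, "y": cu["y"] + dy, "size": half})
        for dx in (0, half) for dy in (0, half)
    )
    return min(whole, split)
```

Starting `best_partition` at the LCU walks the whole permitted CU hierarchy; the cache is what the patent's "storing the intra-prediction coding cost ... in memory" buys.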
CONTENT-ADAPTIVE ONLINE TRAINING WITH IMAGE SUBSTITUTION IN NEURAL IMAGE COMPRESSION
Aspects of the disclosure provide a method and an apparatus for video encoding. The apparatus includes processing circuitry configured to perform an iterative update of sample values of a plurality of samples in an initial input image. The iterative update includes generating a coded representation of a final input image based on the final input image by an encoding neural network (NN) and at least one training module. The final input image has been updated from the initial input image by a number of iterations of the iterative update. The iterative update includes generating a reconstructed image of the final input image based on the coded representation of the final input image by a decoding NN. Either the rate-distortion loss for the final input image or the number of iterations of the iterative update satisfies a pre-determined condition. An encoded image corresponding to the final input image is generated.
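The distinctive idea is online training of the *input* rather than the networks: the sample values of the image are perturbed until the rate-distortion loss (measured against the original content) improves, with the encoder/decoder held fixed. A toy sketch using a quantizer as the "encoding NN" and greedy per-sample search in place of gradient updates (both substitutions are mine, not the patent's):

```python
def encode_nn(img, step=4):
    return [round(s / step) for s in img]   # toy "encoding NN": quantize

def decode_nn(code, step=4):
    return [c * step for c in code]         # toy "decoding NN": dequantize

def rd_loss(img, target):
    # Distortion is measured against the ORIGINAL image, rate against the
    # (substituted) input actually fed to the encoder.
    recon = decode_nn(encode_nn(img))
    distortion = sum((r - t) ** 2 for r, t in zip(recon, target))
    rate = len(set(encode_nn(img)))         # toy rate proxy: distinct symbols
    return distortion + 0.1 * rate

def substitute(initial, max_iters=50, tol=0.5):
    """Iteratively update input samples; stop on loss threshold or iteration cap."""
    img = list(initial)
    loss = rd_loss(img, initial)
    for _ in range(max_iters):
        if loss <= tol:                     # pre-determined condition on the loss
            break
        best = img
        for i in range(len(img)):
            for delta in (-1, 1):
                cand = img[:i] + [img[i] + delta] + img[i + 1:]
                if rd_loss(cand, initial) < loss:
                    best, loss = cand, rd_loss(cand, initial)
        if best == img:                     # converged: no substitution helps
            break
        img = best
    return img, loss
```

Even this toy version shows the effect: nudging a sample into a cheaper quantization bin can cut the rate term without changing the reconstruction error much.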
Method and apparatus for enhanced patch boundary identification for point cloud compression
A method and apparatus for decoding a video stream encoded using video point cloud coding, the decoding including obtaining a geometry-reconstructed point cloud based on one or more patches; identifying a first boundary of a patch including a plurality of first boundary points; identifying a second boundary including a plurality of second boundary points inside the first boundary; performing smoothing on the first boundary points and the second boundary points; obtaining a smoothed geometry-reconstructed point cloud based on the smoothed first boundary points and the smoothed second boundary points; and reconstructing a dynamic point cloud using the smoothed geometry-reconstructed point cloud.
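Identifying the two boundary rings is the step that distinguishes this claim from single-boundary smoothing: an outer ring of patch points touching unoccupied space, plus a second ring of points just inside it. A small sketch on a 2-D occupancy set (the 4-neighbour definition of "boundary" is an assumption; smoothing itself is omitted):

```python
def boundary_rings(points):
    """Return (outer, inner) boundary rings of an occupied-pixel set.

    Outer ring: occupied points with at least one unoccupied 4-neighbour.
    Inner ring: remaining occupied points adjacent to the outer ring.
    Both rings would then be passed to the smoothing filter.
    """
    occ = set(points)

    def nbrs(p):
        x, y = p
        return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

    outer = {p for p in occ if any(n not in occ for n in nbrs(p))}
    inner = {p for p in occ - outer if any(n in outer for n in nbrs(p))}
    return outer, inner
```

On a 4×4 patch, for example, the outer ring is the 12 border pixels and the inner ring is the 2×2 interior.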
Systems and methods for rendering and pre-encoded load estimation based encoder hinting
Systems and methods for hinting an encoder are disclosed in which a server monitors for information related to changes in frame rendering, calculates tolerance boundaries, rolling average frame time, and short-term trends in frame time, and uses those calculations to identify a frame time peak. The server then hints a codec (encoder) to modulate the quality settings of frame output in proportion to the size of the frame time peak. In certain embodiments, a renderer records one or more playthroughs in a game environment, sorts a plurality of frames from one or more playthroughs into a plurality of cells on a heatmap, and collects the list of sorted frames. A codec may then encode one or more frames from the list of sorted frames to calculate an average encoded frame size for each cell in the heatmap, and associate each average encoded frame size with a per-cell normalized encoder quality setting.
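The first mechanism (rolling average, tolerance boundary, peak-proportional quality hint) fits in a few lines. A toy sketch where the window size, tolerance multiplier, and quality floor are all invented parameters:

```python
from collections import deque

class EncoderHinter:
    def __init__(self, window=10, tolerance=1.2):
        self.times = deque(maxlen=window)  # rolling window of frame times (ms)
        self.tolerance = tolerance         # peak threshold as a multiple of the average

    def observe(self, frame_time):
        """Return a quality hint in (0, 1]; below 1 tells the codec to drop quality."""
        if self.times:
            avg = sum(self.times) / len(self.times)
            if frame_time > avg * self.tolerance:
                # Frame-time peak detected: hint quality down in proportion
                # to the size of the peak relative to the rolling average.
                peak_ratio = frame_time / avg
                self.times.append(frame_time)
                return max(0.1, 1.0 / peak_ratio)
        self.times.append(frame_time)
        return 1.0
```

The heatmap mechanism in the second half of the abstract would sit alongside this, pre-computing per-cell quality settings from recorded playthroughs instead of reacting to live frame times.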