G06T9/002

CODING SCHEME FOR VIDEO DATA USING DOWN-SAMPLING/UP-SAMPLING AND NON-LINEAR FILTER FOR DEPTH MAP

Methods of encoding and decoding video data are provided. In an encoding method, source video data comprising one or more source views is encoded into a video bitstream. Depth data of at least one of the source views is non-linearly filtered and down-sampled prior to encoding. After decoding, the decoded depth data is up-sampled and non-linearly filtered.
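The encoder/decoder symmetry described above can be sketched in a few lines. This is an illustrative toy, not the patented method: the depth map is a 1-D list, the helper names are hypothetical, and a median filter stands in for the unspecified non-linear filter.

```python
def median_filter_1d(depth, radius=1):
    """Non-linear (median) filter: preserves depth edges better than averaging."""
    n = len(depth)
    out = []
    for i in range(n):
        window = depth[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sorted(window)[len(window) // 2])
    return out

def downsample(depth, factor=2):
    """Keep every factor-th sample prior to encoding."""
    return depth[::factor]

def upsample(depth, factor=2):
    """Nearest-neighbour up-sampling after decoding."""
    out = []
    for v in depth:
        out.extend([v] * factor)
    return out

# Encoder side: non-linear filter, then reduce resolution before encoding.
source = [10, 10, 10, 80, 80, 80, 80, 10]
low_res = downsample(median_filter_1d(source))

# Decoder side: restore resolution, then apply the non-linear filter again.
restored = median_filter_1d(upsample(low_res))
```

Filtering before down-sampling suppresses isolated depth outliers that would otherwise survive into the low-resolution map; filtering again after up-sampling smooths the blocky nearest-neighbour reconstruction while keeping depth discontinuities sharp.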

Generative adversarial neural network assisted video reconstruction

A latent code defined in an input space is processed by the mapping neural network to produce an intermediate latent code defined in an intermediate latent space. The intermediate latent code may be used as an appearance vector that is processed by the synthesis neural network to generate an image. The appearance vector is a compressed encoding of data, such as video frames including a person's face, audio, and other data. Captured images may be converted into appearance vectors at a local device and transmitted to a remote device using much less bandwidth than transmitting the captured images themselves. A synthesis neural network at the remote device reconstructs the images for display.
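A minimal sketch of the data flow above, with the bandwidth claim made concrete. The mapping network here is a toy single linear layer with fixed illustrative weights (all sizes are assumptions, not values from the source); the point is that the appearance vector is orders of magnitude smaller than the frame it represents.

```python
import random

def mapping_network(z, out_dim=8):
    """Toy stand-in for the mapping neural network: one linear layer."""
    random.seed(0)  # fixed illustrative weights, for reproducibility
    weights = [[random.uniform(-1, 1) for _ in z] for _ in range(out_dim)]
    return [sum(w_i * z_i for w_i, z_i in zip(row, z)) for row in weights]

frame_pixels = 256 * 256 * 3   # values in one captured RGB frame (assumed size)
z = [0.1] * 16                 # latent code defined in the input space
w = mapping_network(z)         # intermediate latent code / appearance vector

# Transmitting w instead of the frame is the bandwidth saving described above;
# the remote synthesis network would reconstruct the image from w.
compression_ratio = frame_pixels / len(w)
```

Even with these toy dimensions the per-frame payload drops from 196,608 values to 8, which is why the abstract frames the appearance vector as a compressed encoding for transmission.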

LARGE-SCALE GENERATION OF PHOTOREALISTIC 3D MODELS
20230044644 · 2023-02-09

A system and methods are provided for large-scale generation of photorealistic 3D models, including training texture map and 3D mesh encoder and decoder neural networks, and training a sampler neural network to convert random seeds into input vectors for the texture map and 3D mesh decoder networks. Training the sampler neural network may include feeding random seeds to the sampler neural network, generating training 3D models from the texture map and 3D mesh decoders, rendering 2D images from the training 3D models, and back-propagating the output of a realism classifier and of a uniqueness function of the 2D images to the sampler neural network. The trained sampler neural network may then be provided with additional random seed inputs to generate multiple respective input vectors for the texture map and 3D mesh decoders, which responsively generate multiple new 3D models.
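The training loop described above has a clear structure (seed → input vectors → 3D model → rendered 2D image → realism and uniqueness scores → sampler update) that can be sketched even with every network replaced by a stub. Everything here is a hypothetical stand-in; only the data flow mirrors the abstract.

```python
import random

def sampler(seed):
    """Stub sampler network: random seed -> input vectors for both decoders."""
    random.seed(seed)
    v = [random.random() for _ in range(4)]
    return v, v  # (texture-map vector, 3D-mesh vector)

def decode(tex_vec, mesh_vec):
    """Stub texture-map and 3D-mesh decoders -> a training 3D model."""
    return {"texture": tex_vec, "mesh": mesh_vec}

def render(model):
    """Stub renderer: 3D model -> flat '2D image' feature list."""
    return model["texture"] + model["mesh"]

def realism(image):
    """Stub realism-classifier score."""
    return sum(image) / len(image)

seen = []
def uniqueness(image):
    """Penalize images too close to previously generated ones."""
    score = min((sum(abs(a - b) for a, b in zip(image, prev)) for prev in seen),
                default=1.0)
    seen.append(image)
    return score

losses = []
for seed in range(3):
    tex_vec, mesh_vec = sampler(seed)
    image = render(decode(tex_vec, mesh_vec))
    # In real training, this combined score would be back-propagated
    # through the (frozen) decoders and renderer to the sampler network.
    losses.append(realism(image) + uniqueness(image))
```

The uniqueness term is what pushes the sampler toward multiple distinct new models rather than collapsing every seed onto one high-realism output.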

Designing a 3D modeled object via user-interaction
11556678 · 2023-01-17

A computer-implemented method for designing a 3D modeled object via user-interaction. The method includes obtaining the 3D modeled object and a machine-learnt decoder. The machine-learnt decoder is a differentiable function taking values in a latent space and outputting values in a 3D modeled object space. The method further includes defining a deformation constraint for a part of the 3D modeled object and determining an optimal latent vector. The optimal latent vector minimizes an energy defined over explored latent vectors. The energy comprises a term which penalizes, for each explored latent vector, non-respect of the deformation constraint by the result of applying the decoder to that latent vector. The method further includes applying the decoder to the optimal latent vector. This constitutes an improved method for designing a 3D modeled object via user-interaction.
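The optimization at the heart of the method above can be sketched with a toy differentiable decoder (a fixed linear map) and a deformation constraint that pins one coordinate of the decoded object. All names, shapes, and the finite-difference optimizer are illustrative assumptions, not the patented implementation.

```python
def decoder(z):
    """Toy machine-learnt decoder: latent space (2,) -> 3D-object space (2,)."""
    return [2.0 * z[0] + z[1], z[0] - z[1]]

def energy(z, target0=3.0):
    """Penalizes non-respect of the constraint: decoded coord 0 must hit target."""
    return (decoder(z)[0] - target0) ** 2

def minimize(z, steps=200, lr=0.05, eps=1e-6):
    """Finite-difference gradient descent exploring the latent space."""
    for _ in range(steps):
        grad = []
        for i in range(len(z)):
            zp = list(z)
            zp[i] += eps
            grad.append((energy(zp) - energy(z)) / eps)
        z = [zi - lr * g for zi, g in zip(z, grad)]
    return z

z_opt = minimize([0.0, 0.0])      # optimal latent vector
deformed = decoder(z_opt)         # decoded object now respects the constraint
```

Because the decoder is differentiable, a real implementation would back-propagate through it analytically rather than using finite differences; the key idea is that the search happens in the latent space, so every candidate deformation stays on the decoder's learned manifold of plausible shapes.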

Compressing weight updates for decoder-side neural networks

A method, apparatus, and computer program product are provided for training a neural network, or providing a pre-trained neural network, such that the weight updates are compressible using at least a weight-update compression loss function and/or a task loss function. The weight-update compression loss function can comprise a weight-update vector defined as the latest weight vector minus the initial weight vector before training. A pre-trained neural network can be compressed by pruning one or more small-valued weights. The training of the neural network can take the compressibility of the network into account, for instance using a compression loss function such as a task loss and/or a weight-update compression loss. The compressed neural network can be applied within a decoding loop or a post-processing stage at the encoder side, as well as at the decoder side.
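The weight-update definition and the pruning step above reduce to a few lines. The weights and the pruning threshold below are illustrative values, not from the source.

```python
initial = [0.50, -0.20, 0.10, 0.80]   # weight vector before training
latest  = [0.52, -0.90, 0.11, 0.10]   # weight vector after training

# Weight-update vector: latest weights minus initial weights, as defined above.
update = [l - i for l, i in zip(latest, initial)]

# Prune small-valued entries so the sparse update compresses well.
THRESHOLD = 0.05  # illustrative
pruned = [u if abs(u) >= THRESHOLD else 0.0 for u in update]

# Decoder side: rebuild the new weights from the shared initial weights
# plus the (cheap-to-transmit) sparse update.
reconstructed = [i + u for i, u in zip(initial, pruned)]
```

Since encoder and decoder both hold the initial weights, only the sparse update needs to be signalled; training with a weight-update compression loss biases the network toward updates where most entries fall under the pruning threshold.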

ENCODING AND DECODING A STYLIZED CUSTOM GRAPHIC
20230237706 · 2023-07-27

Disclosed are methods for encoding information in a graphic image. The information may be encoded so as to have a visual appearance that adopts a particular style, so that the encoded information is visually pleasing in the environment in which it is displayed. An encoder and decoder are trained during an integrated training process, where the encoder is tuned to minimize a loss when its encoded images are decoded. Similarly, the decoder is also trained to minimize loss when decoding the encoded images. Both the encoder and decoder may utilize a convolutional neural network in some aspects to analyze data and/or images. Once data is encoded, a style from a sample image is transferred to the encoded data. When decoding, the decoder may largely ignore the style aspects of the encoded data and decode based on a content portion of the data.
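The encode / stylize / decode flow above can be sketched with the graphic reduced to a toy dictionary holding separate content and style portions. The representation and function names are hypothetical; in the actual scheme both encoder and decoder are trained convolutional networks and the style lives in the image's visual appearance, not a separate field.

```python
def encode(payload):
    """Stub encoder: payload -> the content portion of the graphic."""
    return {"content": [ord(c) for c in payload], "style": None}

def apply_style(graphic, style):
    """Transfer a style taken from a sample image onto the encoded graphic."""
    styled = dict(graphic)
    styled["style"] = style
    return styled

def decode(graphic):
    """Stub decoder: largely ignores the style and reads the content portion."""
    return "".join(chr(v) for v in graphic["content"])

stylized = apply_style(encode("hi"), style="watercolor")
recovered = decode(stylized)
```

The joint training described in the abstract is what makes this separation work in practice: the encoder learns to place information where styling will not destroy it, and the decoder learns to read through whatever style was applied.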

Ultra Light Models and Decision Fusion for Fast Video Coding
20230007284 · 2023-01-05

Ultra-light models and decision fusion for increasing the speed of intra-prediction are described. Using a machine-learning (ML) model, an ML intra-prediction mode is obtained for a current block. A most-probable intra-prediction mode is obtained from among the available intra-prediction modes for encoding the current block. Either the ML intra-prediction mode or the most-probable intra-prediction mode is selected as the encoding intra-prediction mode, based on the relative reliabilities of the two modes. The current block is encoded using the encoding intra-prediction mode, and the encoding intra-prediction mode is encoded in a compressed bitstream.
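The decision-fusion step above is a straightforward reliability comparison. The mode names and scores below are illustrative (the abstract does not specify how reliabilities are computed).

```python
def select_encoding_mode(ml_mode, ml_reliability, mp_mode, mp_reliability):
    """Decision fusion: pick whichever of the ML-predicted intra mode or the
    most-probable intra mode is deemed more reliable for the current block."""
    return ml_mode if ml_reliability >= mp_reliability else mp_mode

# Illustrative candidates for one block.
mode = select_encoding_mode("DC_PRED", 0.83, "H_PRED", 0.61)
```

The speed-up comes from skipping the exhaustive rate-distortion search over all intra modes: only the two fused candidates are ever considered for the block.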

IMMERSIVE VIDEO CODING USING OBJECT METADATA
20230007277 · 2023-01-05

Methods, apparatus, systems, and articles of manufacture for video coding using object metadata are disclosed. An example apparatus includes an object separator to separate input views into layers associated with respective objects, generating object layers for the geometry data and texture data of the input views; a pruner to project a first object layer of a first basic view of at least one basic view against the first object layer of a first additional view of at least one additional view, to generate a first pruned view and a first pruning mask; a patch packer to tag a patch corresponding to the first pruning mask with an object identifier of the first object; and an atlas generator to generate at least one atlas, including the patch, to include in the encoded video data.
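The pruning and patch-tagging steps above can be sketched with views reduced to toy 1-D pixel lists, where `None` marks "this pixel does not belong to the object". All names and the mask rule are illustrative stand-ins for the per-object projection described in the abstract.

```python
def prune(basic_layer, additional_layer):
    """Pruning mask is 1 where the additional view contributes object pixels
    that the basic view does not already cover."""
    mask = [1 if a is not None and b is None else 0
            for b, a in zip(basic_layer, additional_layer)]
    pruned_view = [a if m else None for a, m in zip(additional_layer, mask)]
    return pruned_view, mask

# One object's layer in a basic view and in an additional view (toy data).
basic      = [5, 5, None, None]
additional = [5, None, 7, None]
pruned_view, mask = prune(basic, additional)

# The surviving pixels form a patch, tagged with the object's identifier
# before being packed into an atlas.
patch = {"object_id": 1, "pixels": [p for p in pruned_view if p is not None]}
```

Tagging each patch with its object identifier is the "object metadata" of the title: a decoder can then reconstruct, filter, or edit individual objects directly from the atlas.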

UAV video aesthetic quality evaluation method based on multi-modal deep learning
11568637 · 2023-01-31

The present disclosure provides a UAV video aesthetic quality evaluation method based on multi-modal deep learning, which establishes a UAV video aesthetic evaluation data set, analyzes the UAV video through a multi-modal neural network, extracts high-dimensional features, and concatenates the extracted features, thereby achieving aesthetic quality evaluation of the UAV video. The method has four steps: step one, establish a UAV video aesthetic evaluation data set, divided into positive samples and negative samples according to video shooting quality; step two, use SLAM technology to recover the UAV's flight trajectory and reconstruct a sparse 3D structure of the scene; step three, extract features of the input UAV video through a multi-modal neural network on the image branch, motion branch, and structure branch respectively; and step four, concatenate the features from the multiple branches to obtain the final video aesthetic label and video scene type.
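Steps three and four above have a simple fusion structure that can be sketched with each branch replaced by a one-feature stub. Only the branch-then-concatenate shape mirrors the method; the "networks" and inputs below are toy assumptions.

```python
def image_branch(frames):
    """Stub image-branch feature: mean frame intensity."""
    return [sum(frames) / len(frames)]

def motion_branch(trajectory):
    """Stub motion-branch feature: range of the SLAM-recovered trajectory."""
    return [max(trajectory) - min(trajectory)]

def structure_branch(points):
    """Stub structure-branch feature: size of the sparse 3D reconstruction."""
    return [float(len(points))]

frames = [0.2, 0.4, 0.6]                 # toy per-frame intensities
trajectory = [0.0, 1.5, 3.0]             # toy flight trajectory (1-D)
sparse_points = [(0, 0, 1), (1, 2, 3)]   # toy sparse 3D points

# Step four: concatenate the per-branch features into one fused descriptor,
# from which the aesthetic label and scene type would be predicted.
fused = (image_branch(frames)
         + motion_branch(trajectory)
         + structure_branch(sparse_points))
```

In the real method each branch is a deep network producing high-dimensional features, but the fusion is the same late-concatenation pattern shown here.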

Method and electronic device for deblurring blurred image

A method for deblurring a blurred image includes encoding, by at least one processor, a blurred image at a plurality of stages of encoding to obtain an encoded image at each of the plurality of stages; decoding, by the at least one processor, an encoded image obtained from a final stage of the plurality of stages of encoding by using an encoding feedback from each of the plurality of stages and a machine learning (ML) feedback from at least one ML model; and generating, by the at least one processor, a deblurred image in which at least one portion of the blurred image is deblurred based on a result of the decoding.
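The pipeline above, multi-stage encoding that retains per-stage feedback, followed by decoding that consumes the final encoding plus the encoder and ML feedback, can be sketched structurally. Every transform below is an illustrative stub; only the feedback plumbing mirrors the description.

```python
def encode_stage(x, stage):
    """Stub per-stage encoder transform."""
    return [v * 0.5 + stage for v in x]

def decode(x, encoder_feedback, ml_feedback):
    """Stub decoder: consumes the final encoding, the feedback kept from every
    encoding stage (in reverse order), and the ML model's feedback."""
    out = x
    for fb in reversed(encoder_feedback):
        out = [o + f * 0.1 for o, f in zip(out, fb)]
    return [o + m for o, m in zip(out, ml_feedback)]

blurred = [1.0, 2.0]        # toy "blurred image"
feedback = []
encoded = blurred
for stage in range(3):      # plurality of encoding stages
    encoded = encode_stage(encoded, stage)
    feedback.append(encoded)           # encoding feedback from each stage

ml_feedback = [0.0, 0.0]    # stand-in for the ML model's contribution
deblurred = decode(encoded, feedback, ml_feedback)
```

Feeding each stage's output back into the decoder is the skip-connection idea common to U-Net-style restoration models: fine detail lost in the deeper encoding stages re-enters the reconstruction directly.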