Patent classifications
H04N2013/0088
SELF-SUPERVISED TRAINING OF A DEPTH ESTIMATION SYSTEM
A method for training a depth estimation model, and methods for using the trained model, are described. A plurality of images is acquired and input into a depth model, which extracts a depth map for each of the images based on a plurality of parameters of the depth model. The method includes inputting the images into a pose decoder to extract a pose for each image. The method includes generating a plurality of synthetic frames based on the depth map and the pose for each image. The method includes calculating a loss value with an input-scale, occlusion-aware and motion-aware loss function based on a comparison of the synthetic frames with the images. The method includes adjusting the plurality of parameters of the depth model based on the loss value. The trained model can then receive an image of a scene and generate a depth map of the scene from that image.
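The abstract does not give the loss formula. As an illustration only, the sketch below assumes a per-pixel minimum reprojection error over several warped source frames (one common way to handle occlusion) combined with an auto-mask that drops pixels an unwarped source already explains (one common way to handle objects moving with the camera). The function names and the plain L1 photometric error are assumptions, not the patented loss; a real training loop would compute this in a differentiable framework rather than NumPy.

```python
import numpy as np


def photometric_error(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Per-pixel absolute difference averaged over channels: (H, W, C) -> (H, W).
    # Assumption: plain L1; published variants often mix in an SSIM term.
    return np.abs(a - b).mean(axis=-1)


def min_reprojection_loss(target, synthesized, sources):
    """Hypothetical occlusion- and motion-aware photometric loss.

    target:      (H, W, C) frame to be reconstructed
    synthesized: list of (H, W, C) frames warped into the target view
                 using the predicted depth map and pose
    sources:     the same source frames without warping (used for masking)
    """
    # Occlusion handling: per-pixel minimum over source frames, so a pixel
    # hidden in one source can still be explained by another.
    reproj = np.min([photometric_error(target, s) for s in synthesized], axis=0)
    # Motion handling: pixels that an unwarped source already matches at
    # least as well (e.g. objects moving with the camera) are masked out.
    identity = np.min([photometric_error(target, s) for s in sources], axis=0)
    mask = reproj < identity
    loss = float(reproj[mask].mean()) if mask.any() else 0.0
    return loss, mask
```

Taking the minimum rather than the average means a pixel is penalized only if no source frame can reconstruct it, which is why occluded regions stop dominating the gradient.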
Method for processing immersive video and method for producing immersive video
Disclosed herein is an immersive video processing method. The immersive video processing method includes: determining a priority order of pruning for source videos; extracting patches from the source videos based on the priority order of pruning; generating at least one atlas based on the extracted patches; and encoding metadata. Herein, the metadata may include first threshold information that serves as a criterion for distinguishing between a valid pixel and an invalid pixel in the atlas video.
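The abstract only states that a signalled threshold separates valid from invalid atlas pixels. The sketch below shows one plausible way such a threshold could be used, under the assumption (not taken from the source) that invalid pixels are written as 0 in a geometry atlas and valid pixels are kept at or above the threshold, so the decoder can still classify pixels after lossy coding adds small errors. The metadata field name is hypothetical.

```python
import numpy as np


def encode_geometry(depth, valid, threshold):
    # Hypothetical encoder convention: invalid pixels are written as 0 and
    # valid pixels are lifted to at least `threshold`, leaving a guard band.
    return np.where(valid, np.maximum(depth, threshold), 0)


def decode_valid_mask(decoded_geometry, threshold):
    # Decoder side: the signalled threshold separates valid from invalid
    # pixels even after lossy coding has perturbed the sample values.
    return decoded_geometry >= threshold


metadata = {"valid_pixel_threshold": 16}  # hypothetical field name
```

Signalling the threshold, rather than fixing it in the decoder, lets the encoder trade guard-band size against the depth range left for valid pixels.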
APPARATUS AND METHOD FOR GENERATING IMAGES OF A SCENE
An apparatus comprises a store (209) storing a set of anchor poses for a scene and, typically, 3D image data for the scene. A receiver (201) receives viewer poses for a viewer, and a render pose processor (203) determines a render pose in the scene for a current viewer pose of the viewer poses, the render pose being determined relative to a reference anchor pose. A retriever (207) retrieves 3D image data for the reference anchor pose, and a synthesizer (205) synthesizes images for the render pose in response to the 3D image data. A selector selects the reference anchor pose from the set of anchor poses and is arranged to switch the reference anchor pose from a first anchor pose of the set to a second anchor pose of the set in response to the viewer poses.
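The abstract says the reference anchor pose is switched in response to viewer poses but does not give the rule. A minimal sketch, assuming a nearest-anchor rule with a switching margin so the reference does not flicker when the viewer hovers midway between two anchors; the class name, the margin, and the position-only comparison are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]


class AnchorSelector:
    """Hypothetical reference-anchor selector with a switching margin.

    Only anchor positions are compared; orientation is ignored for brevity.
    """

    def __init__(self, anchors: Sequence[Position], margin: float) -> None:
        self.anchors: List[Position] = list(anchors)
        self.margin = margin
        self.current = 0  # index of the current reference anchor pose

    def update(self, viewer: Position) -> int:
        dists = [math.dist(viewer, a) for a in self.anchors]
        best = min(range(len(dists)), key=lambda i: dists[i])
        # Switch only if another anchor is closer by more than `margin`, so
        # small head movements near a midpoint do not cause flicker.
        if dists[best] + self.margin < dists[self.current]:
            self.current = best
        return self.current
```

For example, with anchors at x = 0 and x = 2 and a margin of 0.5, a viewer at x = 1.1 keeps the first anchor, and the switch to the second happens only once the viewer reaches about x = 1.25.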
Image processing apparatus, image processing method, and program for switching between two types of composite images
An apparatus and a method are provided for switching between a color image-based composite image and a black-and-white image-based composite image at an optimum timing, such that it is difficult for an observer to notice the switching. A color image and a black-and-white image captured from different viewpoints are input, and one of two types of composite image is generated: (a) a color image-based composite image, in which the position of the black-and-white image is adjusted to coincide with the position of the color image, or (b) a black-and-white image-based composite image, in which the position of the color image is adjusted to coincide with the position of the black-and-white image. The output switches between the two types of composite image on the basis of a predetermined reference image switching threshold. In this configuration, a hysteresis is set for the reference image switching threshold, and the hysteresis is changed according to the situation. Thus, the reference images can be switched at an optimum timing such that it is difficult for an observer to notice the switching.
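Hysteresis on a switching threshold can be sketched as a two-level comparator: the mode changes only when a score crosses the threshold by more than half the hysteresis width. What the score measures and how the hysteresis is adapted to the situation are not stated in the abstract, so the score input, the field names, and the symmetric band are assumptions; the caller adapts the band simply by changing the hysteresis field.

```python
from dataclasses import dataclass


@dataclass
class ReferenceSwitcher:
    """Hypothetical switch between color- and black-and-white-based output."""

    threshold: float
    hysteresis: float  # width of the dead band around the threshold
    mode: str = "color"  # "color" or "bw"

    def update(self, score: float) -> str:
        # `score` is an assumed scalar that favors the black-and-white
        # reference when high (for example, in low light).
        upper = self.threshold + self.hysteresis / 2
        lower = self.threshold - self.hysteresis / 2
        if self.mode == "color" and score > upper:
            self.mode = "bw"
        elif self.mode == "bw" and score < lower:
            self.mode = "color"
        return self.mode
```

Widening the hysteresis makes switches rarer, which would suit situations where a switch is easy to spot; narrowing it lets the output follow the score more closely.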
Image Processing Method and Device, and Three-Dimensional Imaging System
Disclosed are an image processing method and device, and a three-dimensional imaging system. The method comprises the following steps: acquiring a two-dimensional image to be processed; aligning the two-dimensional image to be processed to a grid template; performing mapping processing on the two-dimensional image to be processed by using a grid mapping table to acquire a first image, wherein the grid mapping table represents the mapping relationship of grid images; mirroring the first image to acquire a second image; and synthesizing the first image and the second image to acquire a superimposed image of the first image and the second image. In this method, the grid template and the grid mapping table are used to map the two-dimensional image to be processed so as to simulate the left-eye and right-eye images seen by human eyes. Because the same two-dimensional image needs to be mapped only once to acquire both the left-eye image and the right-eye image, fewer image processing steps are needed and processing time is shortened, which facilitates the subsequent real-time conversion of the superimposed two-dimensional image into a three-dimensional image.
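The single-mapping idea can be sketched with a look-up table remap followed by a horizontal flip. In the sketch below the mapping table is represented as two integer index arrays (nearest-neighbour sampling), and the superimposition is a weighted blend; both representations are assumptions, since the abstract does not specify the table format or how the two images are combined.

```python
import numpy as np


def map_once(image, map_rows, map_cols):
    # Look-up-table remap: output pixel (i, j) takes input pixel
    # (map_rows[i, j], map_cols[i, j]). Nearest-neighbour only (assumption).
    return image[map_rows, map_cols]


def stereo_superimpose(image, map_rows, map_cols, weight=0.5):
    first = map_once(image, map_rows, map_cols)  # the single mapping pass
    second = np.fliplr(first)  # mirrored copy, no second mapping needed
    # Superimpose by weighted blending (assumption; a display might instead
    # interleave columns or place the views side by side).
    superimposed = weight * first + (1.0 - weight) * second
    return first, second, superimposed
```

With an identity table, for example rows, cols = np.indices(image.shape), the first image equals the input, and the second is its left-right mirror.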
MODIFICATION OF A LIVE-ACTION VIDEO RECORDING USING VOLUMETRIC SCENE RECONSTRUCTION TO REPLACE A DESIGNATED REGION
A main video sequence of a live action scene is captured along with ancillary device data to provide corresponding volumetric information about the scene. The volumetric data can then be used to visually remove or replace objects in the main video sequence. A removed object is replaced by the view that would have been captured by the main video sequence had the removed object not been present in the live action scene at the time of capturing.
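Once a background view has been rendered from the volumetric data, the final replacement step reduces to a per-pixel selection inside the designated region. The sketch assumes such a render already exists as an array aligned with the main camera frame; how it is produced from the ancillary device data is outside the abstract and is not modeled here, and the function name is hypothetical.

```python
import numpy as np


def replace_region(frame, mask, background):
    """Hypothetical compositing step for object removal.

    frame, background: (H, W, 3) arrays; `background` stands in for a render
        of the volumetric reconstruction from the main camera's viewpoint
        with the removed object excluded.
    mask: (H, W) bool array, True where the designated object was.
    """
    # Broadcast the mask over the color channels and take background pixels
    # inside the region, original pixels everywhere else.
    return np.where(mask[..., None], background, frame)
```

A production pipeline would usually feather the mask edge to hide seams; a hard mask keeps the example minimal.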
LAYERED SCENE DECOMPOSITION CODEC WITH HIGHER ORDER LIGHTING
A system and methods are provided for a CODEC that drives a real-time light field display for multi-dimensional video streaming, interactive gaming and other light field display applications by applying a layered scene decomposition strategy. Multi-dimensional scene data, including information on the directions of normals, is divided into a plurality of data layers whose depths increase as the distance between a given layer and the display surface increases. The data layers are sampled using a plenoptic sampling scheme and rendered using hybrid rendering, such as perspective and oblique rendering, to encode the light field corresponding to each data layer. The resulting compressed (layered) core representation of the multi-dimensional scene data is produced at predictable rates, then reconstructed and merged at the light field display in real time by applying view synthesis protocols, including edge-adaptive interpolation, which reconstruct pixel arrays in stages (e.g. columns, then rows) from reference elemental images.
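The layer partition can be illustrated by splitting a depth range into layers that grow thicker farther from the display surface and then binning scene samples by depth. The geometric growth factor and both function names are assumptions made for the sketch; the abstract says only that layer depths increase with distance.

```python
import numpy as np


def layer_bounds(near, far, n_layers, growth=2.0):
    # Layer thicknesses grow geometrically with distance from the display
    # surface (assumption: the abstract only says depths increase).
    widths = growth ** np.arange(n_layers, dtype=float)
    widths *= (far - near) / widths.sum()
    return near + np.concatenate(([0.0], np.cumsum(widths)))


def assign_layers(depths, bounds):
    # Index of the layer containing each depth sample; samples outside the
    # range are clamped into the first or last layer.
    idx = np.searchsorted(bounds, depths, side="right") - 1
    return np.clip(idx, 0, len(bounds) - 2)
```

For a range of 0 to 7 with three layers and a growth factor of 2, the bounds are 0, 1, 3 and 7, so the nearest layer is one unit thick and the farthest is four.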
COMPUTER-GENERATED IMAGE PROCESSING INCLUDING VOLUMETRIC SCENE RECONSTRUCTION
An imagery processing system determines pixel color values for pixels of captured imagery from volumetric data, providing alternative pixel color values. A main imagery capture device, such as a camera, captures main imagery, such as still images and/or video sequences, of a live action scene. Alternative devices capture imagery of the live action scene in some spectrum and form, together with information related to pixel color values at multiple depths of the scene, which can be processed to provide a reconstruction.
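When color information is available at several depths along a pixel's ray, a pixel color can be formed by front-to-back alpha compositing, and an alternative color follows from compositing only a subset of the samples, for instance with a foreground object's samples dropped. Compositing is a standard technique chosen here for illustration; the abstract does not say how the system combines the depth samples, and the keep mask is a hypothetical parameter.

```python
import numpy as np


def composite(colors, alphas, keep=None):
    """Front-to-back alpha compositing of depth samples along one pixel ray.

    colors: (N, 3) RGB samples ordered near to far
    alphas: (N,) opacities in [0, 1]
    keep:   optional (N,) bool mask; dropping samples (e.g. those of a
            removed foreground object) yields an alternative pixel color
    """
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    if keep is not None:
        keep = np.asarray(keep, dtype=bool)
        colors, alphas = colors[keep], alphas[keep]
    out = np.zeros(3)
    transmittance = 1.0  # fraction of light not yet absorbed by nearer samples
    for c, a in zip(colors, alphas):
        out += transmittance * a * c
        transmittance *= 1.0 - a
    return out
```

A half-transparent red sample in front of an opaque green one gives an even red-green mix; dropping the red sample gives pure green, which is the alternative value for that pixel.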
Real-World Object Holographic Transport and Communication Room System
A novel holographic transport and communication room system uses a single red-green-blue-depth (RGB-D) camera to capture the motion of a dynamic target, which is required to rotate around the RGB-D camera, instead of conventionally capturing the three-dimensional volume of the dynamic target with a plurality of multi-angle cameras positioned around it. The captured 3D volume of the dynamic target undergoes relighting, subject depth calculations, geometric extrapolation, and volumetric reconstruction in a machine-learning graphical transformation feedback loop to synthesize a refined real-time hologram. The resulting hologram in one holographic room system is shared with users in other holographic room systems that have similar holographic capabilities, for live bilateral or multilateral holographic visualization and collaboration. Preferably, each holographic room system also integrates a mixed-reality content synthesis table, for real-time collaboration among remote participants in manipulating holographic content, and a life-size (one-to-one scale) holographic display and capture tubular device.
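Capturing a rotating target with one RGB-D camera means each frame sees a different side of the target. A minimal sketch of merging those views, assuming a pinhole camera model, a known rotation angle per frame, and rotation about a vertical axis through a known center: back-project each depth map to 3D points and undo the target's rotation so all points share the target's frame. The intrinsics, the angle source, and all function names are assumptions; the abstract's machine-learning refinement loop is not modeled.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    # Pinhole back-projection of an (H, W) depth map to camera-space points.
    v, u = np.indices(depth.shape)
    z = depth.astype(float)
    pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1)
    pts = pts.reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop pixels with no depth reading


def rotate_about_y(points, angle, center):
    # Rotate points by `angle` radians about a vertical axis through `center`.
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (points - center) @ rot.T + center


def fuse(frames, intrinsics, center):
    # frames: list of (depth_map, target_angle) pairs. Undoing each frame's
    # rotation places every view in the target's own frame before merging.
    clouds = [rotate_about_y(backproject(d, *intrinsics), -ang, center)
              for d, ang in frames]
    return np.concatenate(clouds, axis=0)
```

The fused point cloud would then be the input to whatever volumetric reconstruction and relighting stages follow.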