G06T7/596

ONLINE CALIBRATION OF 3D SCAN DATA FROM MULTIPLE VIEWPOINTS
20220012476 · 2022-01-13

A calibration system and method for online calibration of 3D scan data from multiple viewpoints is provided. The calibration system receives a set of depth scans and a corresponding set of color images of a scene that includes a human-object as part of a foreground of the scene. The calibration system extracts a first three-dimensional (3D) representation of the foreground based on a first depth scan and spatially aligns the extracted first 3D representation with a second 3D representation of the foreground. The first 3D representation and the second 3D representation are associated with a first viewpoint and a second viewpoint, respectively, in a 3D environment. The calibration system updates the spatially aligned first 3D representation based on the set of color images and a set of structural features of the human-object and reconstructs a 3D mesh of the human-object based on the updated first 3D representation of the foreground.
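The core alignment step above can be illustrated with a minimal sketch, assuming NumPy and toy data: a foreground point cloud from a first viewpoint is rigidly aligned onto the same foreground seen from a second viewpoint via a least-squares (Kabsch) fit. All names, thresholds, and data here are illustrative, and known point-to-point correspondences are assumed, which the patented system would have to establish itself.

```python
import numpy as np

def extract_foreground(depth_scan, max_depth=2.0):
    """Keep only points closer than max_depth (an assumed foreground cutoff)."""
    return depth_scan[depth_scan[:, 2] < max_depth]

def rigid_align(src, dst):
    """Least-squares rigid transform (Kabsch algorithm) mapping src onto dst,
    standing in for the spatial-alignment step between two viewpoints."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Toy example: one foreground cloud observed from two viewpoints.
rng = np.random.default_rng(0)
cloud = rng.random((50, 3))                          # first-viewpoint foreground
angle = np.deg2rad(30)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.05])
cloud2 = cloud @ R_true.T + t_true                   # second-viewpoint foreground

R, t = rigid_align(cloud, cloud2)
aligned = cloud @ R.T + t
print(np.abs(aligned - cloud2).max())                # near zero
```

In practice, correspondence-free alignment (e.g., ICP) would iterate this fit with nearest-neighbor matching between the two clouds.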

Object pose estimation in visual data

The pose of an object may be estimated based on fiducial points identified in a visual representation of the object. Each fiducial point may correspond with a component of the object, and may be associated with a first location in an image of the object and a second location in a 3D space. A 3D skeleton of the object may be determined by connecting the locations in the 3D space, and the object's pose may be determined based on the 3D skeleton.
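The skeleton-from-fiducials idea can be sketched as follows; the fiducial names, coordinates, and bone connectivity are purely illustrative, and the single yaw angle computed here is just one simple pose cue derivable from the 3D skeleton.

```python
import numpy as np

# Hypothetical fiducial points: the 3D location of each detected component.
fiducials_3d = {
    "head":       np.array([0.0, 1.7, 0.0]),
    "shoulder_l": np.array([-0.2, 1.5, 0.0]),
    "shoulder_r": np.array([0.2, 1.5, 0.0]),
    "hip_l":      np.array([-0.15, 1.0, 0.0]),
    "hip_r":      np.array([0.15, 1.0, 0.0]),
}

# The 3D skeleton: edges connecting fiducial locations in 3D space.
bones = [("head", "shoulder_l"), ("head", "shoulder_r"),
         ("shoulder_l", "hip_l"), ("shoulder_r", "hip_r"),
         ("hip_l", "hip_r")]
skeleton = [(fiducials_3d[a], fiducials_3d[b]) for a, b in bones]

# One pose cue from the skeleton: yaw of the shoulder line about the
# vertical axis (0 degrees means the shoulders face the camera).
v = fiducials_3d["shoulder_r"] - fiducials_3d["shoulder_l"]
yaw = np.degrees(np.arctan2(v[2], v[0]))
print(f"estimated yaw: {yaw:.1f} degrees")
```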

Point depth estimation from a set of 3D-registered images

Embodiments provide 3D coordinates for points in a scene whose correct physical positions are observed across a series of images. A method may comprise: obtaining a plurality of images, including a base image having at least one annotated point corresponding to a point of an object shown in the base image, and a plurality of side images showing the object from viewpoints different from that of the base image, wherein the side images are provided with camera poses relative to the base image; extracting, from at least some of the side images, image patches showing the annotated point, wherein a plurality of sets of image patches are extracted, each set corresponding to one of a plurality of candidate depth values; classifying each set as having a candidate depth value that is correct or incorrect; and outputting a correct depth value.
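The candidate-depth sweep can be sketched with a toy setup: the annotated base-image pixel is lifted to 3D at each candidate depth, reprojected into a synthetic side image, and scored there. The intensity lookup used for scoring is a deliberately crude stand-in for the learned patch classifier of the method; the intrinsics, poses, and image content are all invented for illustration.

```python
import numpy as np

K = np.array([[100.0, 0.0, 32.0],      # toy pinhole intrinsics (illustrative)
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])

def backproject(pixel, depth):
    """Lift the annotated base-image pixel to 3D at a candidate depth."""
    x = (pixel[0] - K[0, 2]) / K[0, 0]
    y = (pixel[1] - K[1, 2]) / K[1, 1]
    return np.array([x * depth, y * depth, depth])

def project(R, t, X):
    """Project a 3D point into a side image with pose (R, t)."""
    p = K @ (R @ X + t)
    return p[:2] / p[2]

# One side camera, translated 0.5 units to the right of the base camera.
R_side, t_side = np.eye(3), np.array([-0.5, 0.0, 0.0])

# Synthetic side image: a bright spot where the point truly appears.
true_depth, pixel = 2.0, (40.0, 32.0)
side = np.zeros((64, 64))
u, v = project(R_side, t_side, backproject(pixel, true_depth))
side[int(round(v)), int(round(u))] = 1.0

# Sweep candidate depths; score each by the side-image response at the
# reprojected location (stand-in for classifying a set of image patches).
candidates = [1.0, 1.5, 2.0, 2.5, 3.0]
scores = [side[int(round(vv)), int(round(uu))]
          for uu, vv in (project(R_side, t_side, backproject(pixel, d))
                         for d in candidates)]
best = candidates[int(np.argmax(scores))]
print("correct depth:", best)
```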

GENERATION OF A 3D POINT CLOUD OF A SCENE
20230326126 · 2023-10-12

A method for generating a 3D point cloud of a scene is performed by an image processing device. The method obtains digital images depicting the scene, each composed of pixels. The method includes segmenting each of the digital images into digital image segments. The method includes determining a depth vector and a normal vector for each of the digital image segments by applying multi-view stereo (MVS) processing to a subset of the pixels in each digital image segment. The method includes forming a map of depth vectors and normal vectors for each pixel in the digital images by estimating a 3D plane per digital image segment, based on the determined depth and normal vectors of that segment. The method includes generating the 3D point cloud of the scene as a combination of all the maps of depth vectors and normal vectors for each pixel in the digital images.
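The plane-per-segment step can be sketched as follows: each segment's estimated depth and normal define a 3D plane, and every pixel in the segment gets the depth where its viewing ray meets that plane. The intrinsics, segmentation, and plane parameters below are toy values for illustration only.

```python
import numpy as np

K = np.array([[100.0, 0.0, 16.0],      # toy intrinsics (illustrative)
              [0.0, 100.0, 16.0],
              [0.0, 0.0, 1.0]])
Kinv = np.linalg.inv(K)

def segment_plane_depths(segments, seg_point, seg_normal):
    """Fill a per-pixel depth map by intersecting each pixel's viewing ray
    with its segment's estimated 3D plane (anchor point + normal)."""
    h, w = segments.shape
    depth = np.zeros((h, w))
    for v in range(h):
        for u in range(w):
            s = segments[v, u]
            ray = Kinv @ np.array([u, v, 1.0])    # ray direction with z == 1
            n, p = seg_normal[s], seg_point[s]
            depth[v, u] = (n @ p) / (n @ ray)     # ray-plane intersection
    return depth

# Toy scene: left half is a fronto-parallel plane at depth 2,
# right half a plane at depth 3.
segments = np.zeros((32, 32), dtype=int)
segments[:, 16:] = 1
seg_point = {0: np.array([0.0, 0.0, 2.0]), 1: np.array([0.0, 0.0, 3.0])}
seg_normal = {0: np.array([0.0, 0.0, 1.0]), 1: np.array([0.0, 0.0, 1.0])}

depth = segment_plane_depths(segments, seg_point, seg_normal)
print(depth[0, 0], depth[0, 31])                  # 2.0 and 3.0
```

Back-projecting each pixel at its planar depth then yields the 3D points that are combined into the final cloud.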

Systems and Methods for Estimating Depth from Projected Texture using Camera Arrays
20230152087 · 2023-05-18

Systems and methods for estimating depth from projected texture using camera arrays are described. A camera array includes a conventional camera and at least one two-dimensional array of cameras, where the conventional camera has a higher resolution than the cameras in the two-dimensional array, and an illumination system configured to illuminate a scene with a projected texture. An image processing pipeline application directs the processor to: utilize the illumination system controller application to control the illumination system to illuminate a scene with the projected texture; capture a set of images of the scene illuminated with the projected texture; and determine depth estimates for pixel locations in an image from a reference viewpoint using at least a subset of the set of images.
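Why projected texture helps can be shown with a toy stereo sketch: a random pattern "projected" onto an otherwise featureless surface gives block matching a well-defined cost minimum. This simple SSD matcher is a stand-in for the array-based estimator, and all sizes and values are illustrative.

```python
import numpy as np

def block_match_disparity(left, right, u, v, max_disp=10, half=2):
    """SSD block matching at one pixel; the projected texture guarantees
    local contrast, so the cost minimum is well defined even on an
    otherwise texture-less surface."""
    ref = left[v - half:v + half + 1, u - half:u + half + 1]
    costs = [np.sum((ref - right[v - half:v + half + 1,
                                 u - d - half:u - d + half + 1]) ** 2)
             for d in range(max_disp + 1)]
    return int(np.argmin(costs))

# Synthetic "projected texture" on a flat, otherwise featureless surface.
rng = np.random.default_rng(1)
texture = rng.random((40, 60))
true_disp = 4
left = texture
right = np.roll(texture, -true_disp, axis=1)   # right view shifted by disparity

d = block_match_disparity(left, right, u=30, v=20)
print("disparity:", d)
```

With disparity, baseline, and focal length known, depth follows as depth = focal * baseline / disparity.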

Photo-video based spatial-temporal volumetric capture system for dynamic 4D human face and body digitization

The photo-video based spatial-temporal volumetric capture system efficiently produces high-frame-rate, high-resolution 4D dynamic human videos, without the need for two separate 3D and 4D scanner systems, by combining a set of high-frame-rate machine vision video cameras with a set of high-resolution photography cameras. It reduces the need for manual CG work by temporally up-sampling the shape and texture resolution of 4D scanned video data from a temporally sparse set of higher-resolution 3D scanned keyframes that are reconstructed using both the machine vision cameras and the photography cameras. Unlike a typical performance capture system that uses a single static template model at initialization (e.g., an A or T pose), the photo-video based spatial-temporal volumetric capture system stores multiple keyframes of high-resolution 3D template models for robust and dynamic shape and texture refinement of the 4D scanned video sequence. For shape up-sampling, the system can apply mesh-tracking based temporal shape super resolution. For texture up-sampling, the system can apply machine-learning based temporal texture super resolution.
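The keyframe up-sampling idea can be caricatured with a minimal sketch: a video-rate frame is synthesized by blending the two nearest high-resolution keyframe meshes. Linear vertex blending with shared topology is a deliberately crude stand-in for the mesh-tracking based temporal shape super resolution; the function name and data are illustrative.

```python
import numpy as np

def upsample_shape(keyframes, key_times, query_t):
    """Linearly blend the two nearest high-resolution keyframe meshes
    (shared vertex topology assumed) at a video-rate timestamp."""
    i = int(np.clip(np.searchsorted(key_times, query_t),
                    1, len(key_times) - 1))
    t0, t1 = key_times[i - 1], key_times[i]
    a = (query_t - t0) / (t1 - t0)
    return (1 - a) * keyframes[i - 1] + a * keyframes[i]

# Two keyframe "meshes" (vertex arrays) at t=0 and t=1; query a mid-frame.
k0 = np.zeros((5, 3))
k1 = np.ones((5, 3))
mid = upsample_shape([k0, k1], [0.0, 1.0], 0.5)
print(mid[0])                                   # [0.5 0.5 0.5]
```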

Geospatial modeling system providing 3D geospatial model update based upon iterative predictive image registration and related methods

A geospatial modeling system may include a memory and a processor cooperating therewith to: (a) generate a three-dimensional (3D) geospatial model including geospatial voxels based upon a plurality of geospatial images; (b) select an isolated geospatial image from among the plurality of geospatial images; (c) determine a reference geospatial image from the 3D geospatial model using Artificial Intelligence (AI) and based upon the isolated geospatial image; (d) align the isolated geospatial image and the reference geospatial image to generate a predictively registered image; (e) update the 3D geospatial model based upon the predictively registered image; and (f) iteratively repeat (b)-(e) for successive isolated geospatial images.
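The registration at the heart of steps (c)-(e) can be sketched with phase correlation, which recovers the integer shift aligning an isolated image to its reference. This 2D-translation model is a minimal stand-in for the patent's AI-driven predictive registration; the images below are synthetic.

```python
import numpy as np

def register_translation(ref, img):
    """Phase correlation: recover the integer (dy, dx) shift that
    re-aligns img to ref (a toy stand-in for predictive registration)."""
    F = np.fft.fft2(ref) * np.conj(np.fft.fft2(img))
    corr = np.fft.ifft2(F / (np.abs(F) + 1e-12)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    return (int(dy) if dy <= h // 2 else int(dy) - h,
            int(dx) if dx <= w // 2 else int(dx) - w)

rng = np.random.default_rng(2)
reference = rng.random((64, 64))                 # stand-in reference image
isolated = np.roll(reference, (3, -5), axis=(0, 1))  # mis-registered capture

shift = register_translation(reference, isolated)
registered = np.roll(isolated, shift, axis=(0, 1))   # predictively registered
print(shift)
```

In the iterative scheme, each registered image would update the voxel model before the next isolated image is selected.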

Array-based depth estimation

A method includes obtaining at least three input image frames of a scene captured using at least three imaging sensors. The input image frames include a reference image frame and multiple non-reference image frames. The method also includes generating multiple disparity maps using the input image frames. Each disparity map is associated with the reference image frame and a different non-reference image frame. The method further includes generating multiple confidence maps using the input image frames. Each confidence map identifies weights associated with one of the disparity maps. In addition, the method includes generating a depth map of the scene using the disparity maps and the confidence maps. The imaging sensors are arranged to define multiple baseline directions, where each baseline direction extends between the imaging sensor used to capture the reference image frame and the imaging sensor used to capture a different non-reference image frame.
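The final combination step can be sketched as a confidence-weighted fusion: each disparity map is converted to a per-pixel depth vote using its pair's baseline, then the votes are averaged with the confidence maps as weights. This simple weighted mean, and all the toy values below, are illustrative stand-ins for the claimed method.

```python
import numpy as np

def fuse_depth(disparities, confidences, baselines, focal=100.0):
    """Confidence-weighted fusion of per-pair disparity maps into one
    depth map: depth vote = focal * baseline / disparity per pair."""
    votes = [focal * b / np.maximum(d, 1e-6)
             for d, b in zip(disparities, baselines)]
    w = np.stack(confidences)
    return np.sum(np.stack(votes) * w, axis=0) / np.sum(w, axis=0)

# Toy data: two non-reference views along different baseline directions.
h, w_ = 4, 4
true_depth = 2.0
b1, b2 = 0.1, 0.2                                # reference-to-sensor baselines
d1 = np.full((h, w_), 100.0 * b1 / true_depth)   # consistent disparity maps
d2 = np.full((h, w_), 100.0 * b2 / true_depth)
c1 = np.full((h, w_), 0.9)                       # per-pixel confidence weights
c2 = np.full((h, w_), 0.5)

depth = fuse_depth([d1, d2], [c1, c2], [b1, b2])
print(depth[0, 0])                               # 2.0
```

Low-confidence pairs (e.g., where a baseline direction is parallel to an edge) thus contribute less to the fused depth.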