G06V20/647

Image targeting via targetable 3D data

A method can include identifying a geolocation of an object in an image, the method comprising receiving data indicating a pixel coordinate of the image selected by a user, identifying a data point in a targetable three-dimensional (3D) data set corresponding to the selected pixel coordinate, and providing a 3D location of the identified data point.
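As a rough illustration of the claimed lookup, the sketch below assumes the targetable 3D data set is a geolocated point cloud registered to the image, so that each data point carries the pixel it projects to. The names `locate_pixel`, `point_cloud`, and `registration` are hypothetical, not from the patent.

```python
import numpy as np

def locate_pixel(point_cloud, registration, pixel):
    """Return the 3D location of the data point corresponding to a pixel.

    point_cloud  -- (N, 3) array of geolocated 3D points (hypothetical layout)
    registration -- (N, 2) array giving the image pixel each point projects to
    pixel        -- (col, row) coordinate selected by the user
    """
    # Find the data point whose registered pixel is closest to the selection.
    dists = np.linalg.norm(registration - np.asarray(pixel), axis=1)
    return point_cloud[np.argmin(dists)]

# Toy example: three registered points.
cloud = np.array([[10.0, 20.0, 5.0], [11.0, 21.0, 6.0], [12.0, 22.0, 7.0]])
pixels = np.array([[100, 100], [200, 150], [300, 200]])
print(locate_pixel(cloud, pixels, (205, 148)))  # nearest registered point
```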

3D segmentation using space carving and 2D convolutional neural networks

A system for generating a 3D segmentation of a target volume is provided. The system accesses views of an X-ray scan of a target volume. The system applies a 2D CNN to each view to generate a 2D multi-channel feature vector for each view. The system applies a space carver to generate a 3D channel volume for each channel based on the 2D multi-channel feature vectors. The system then applies a linear combining technique to the 3D channel volumes to generate a 3D multi-label map that represents a 3D segmentation of the target volume.
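The pipeline above can be sketched end to end under heavy simplification: the 2D CNN is assumed to have already produced the per-view feature maps, and the space carver is reduced to sweeping each view's 2D features through a cubic volume and min-combining (carving) across views. All names and shapes are illustrative, not from the patent.

```python
import numpy as np

def segment_volume(view_features, weights):
    """view_features: (V, C, H, W) per-view multi-channel feature maps.
    weights: (L, C) linear map from feature channels to labels.
    Returns an (H, W, D) multi-label map (toy cubic volume, D = W)."""
    V, C, H, W = view_features.shape
    D = W
    # Sweep each view's 2D features along the depth axis of the volume.
    volumes = np.broadcast_to(view_features[..., None], (V, C, H, W, D))
    carved = volumes.min(axis=0)             # (C, H, W, D) carved channel volumes
    # Linear combining: channels -> per-label scores, then pick the best label.
    label_scores = np.einsum('lc,chwd->lhwd', weights, carved)
    return label_scores.argmax(axis=0)       # (H, W, D) 3D segmentation
```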

Augmented reality display device and program recording medium
11568608 · 2023-01-31

Provided is an augmented reality display technology capable of better entertaining a user. An augmented reality display device 10 includes an imaging unit 13, a special effect execution unit 11b, and a display unit 14. The imaging unit 13 acquires a background image of the real world. When a plurality of models forming a specific combination are present in a virtual space, the special effect execution unit 11b executes a special effect corresponding to the combination of the models. The display unit 14 displays the models, together with the background image, in accordance with the special effect.
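The combination-triggered special effect can be illustrated with a toy lookup; the effect table and model names below are invented for the example.

```python
# Hypothetical effect table: a set of model names triggers a named effect.
EFFECTS = {
    frozenset({"dragon", "knight"}): "battle_flames",
    frozenset({"cat", "yarn"}): "playful_sparkles",
}

def special_effect(models_in_space):
    """Return the effect for the first fully present model combination, if any."""
    present = set(models_in_space)
    for combo, effect in EFFECTS.items():
        if combo <= present:  # every model of the combination is in the space
            return effect
    return None
```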

Image sensor having on-chip compute circuit

In one example, an apparatus comprises: a first sensor layer, including an array of pixel cells configured to generate pixel data; and one or more semiconductor layers located beneath the first sensor layer with the one or more semiconductor layers being electrically connected to the first sensor layer via interconnects. The one or more semiconductor layers comprises on-chip compute circuits configured to receive the pixel data via the interconnects and process the pixel data, the on-chip compute circuits comprising: a machine learning (ML) model accelerator configured to implement a convolutional neural network (CNN) model to process the pixel data; a first memory to store coefficients of the CNN model and instruction codes; a second memory to store the pixel data of a frame; and a controller configured to execute the codes to control operations of the ML model accelerator, the first memory, and the second memory.
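As a software stand-in for the on-chip dataflow, the sketch below treats the second memory as the `frame` array, the first memory as the `kernel` coefficients, and the ML accelerator as a single valid convolution followed by ReLU; the actual circuit implements a full CNN in hardware, so this is only an analogy.

```python
import numpy as np

def on_chip_process(frame, kernel):
    """Stand-in for the accelerator: one 'valid' 2D convolution over the
    stored frame using the stored coefficients, followed by ReLU."""
    kh, kw = kernel.shape
    H, W = frame.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply-accumulate over the kernel window, as the MAC array would.
            out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)  # ReLU activation
```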

POSE DETECTION OF AN OBJECT IN A VIDEO FRAME

Aspects of the disclosure provide solutions for determining a position of an object in a video frame. Examples include: receiving a segmentation mask of an identified object in a video frame; adjusting a 3D model of a moveable part of the object based on constraints for the moveable part; comparing the 3D model of the object to the segmentation mask of the object; determining that a match between the 3D model of the object and the segmentation mask of the object is above a threshold; and, based on the match being above the threshold, determining a position of the object.
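The comparison and thresholding steps can be sketched as an intersection-over-union test between the rendered model silhouette and the segmentation mask; IoU is one plausible match score for the example, not necessarily the one claimed.

```python
import numpy as np

def match_pose(model_mask, seg_mask, threshold=0.5):
    """Compare a rendered silhouette of the 3D model against the object's
    segmentation mask; a score above the threshold confirms the candidate
    pose (simplified stand-in for the claimed comparison step)."""
    inter = np.logical_and(model_mask, seg_mask).sum()
    union = np.logical_or(model_mask, seg_mask).sum()
    iou = inter / union if union else 0.0
    return iou, iou > threshold
```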

MACHINE-LEARNING TRAINING SERVICE FOR SYNTHETIC DATA
20230229513 · 2023-07-20 ·

Various embodiments, methods, and systems for implementing a distributed computing system machine-learning training service are provided. Initially, a machine learning model is accessed. A plurality of synthetic data assets are accessed, where a synthetic data asset is associated with asset-variation parameters that are programmable for machine-learning. The machine learning model is retrained using the plurality of synthetic data assets. The machine-learning training service is further configured for executing real-time calls to generate an on-the-fly-generated synthetic data asset such that the on-the-fly-generated synthetic data asset is rendered in real-time to preclude pre-rendering and storing the on-the-fly-generated synthetic data asset. The machine-learning training service further supports hybrid-based machine learning training, where the machine learning model is trained based on a combination of the plurality of synthetic data assets, a plurality of non-synthetic data assets, and synthetic data asset metadata associated with the plurality of synthetic data assets.
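The on-the-fly generation and hybrid training described above might look like the following loop, where `generate_asset` is a hypothetical real-time renderer driven by asset-variation parameters and nothing is pre-rendered or stored to disk.

```python
import random

def generate_asset(params):
    """Hypothetical on-the-fly renderer: produces a (features, label) pair
    from programmable asset-variation parameters, never persisted."""
    rng = random.Random(params["seed"])
    features = [rng.uniform(0, params["scale"]) for _ in range(4)]
    return features, params["label"]

def train(model_update, asset_params, real_assets):
    """Hybrid training loop: mixes on-the-fly synthetic assets with
    non-synthetic (real) assets, as the service supports."""
    for params in asset_params:
        model_update(*generate_asset(params))   # rendered in real time
    for features, label in real_assets:
        model_update(features, label)
```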

VOLUMETRIC CAPTURE AND MESH-TRACKING BASED MACHINE LEARNING 4D FACE/BODY DEFORMATION TRAINING
20230230304 · 2023-07-20

Mesh-tracking based dynamic 4D modeling for machine learning deformation training includes: using a volumetric capture system for high-quality 4D scanning; using mesh-tracking to establish temporal correspondences across a 4D-scanned human face and full-body mesh sequence; using mesh registration to establish spatial correspondences between a 4D-scanned human face and full-body mesh and a 3D CG physical simulator; and training surface deformation as a delta from the physical simulator using machine learning. The deformation for natural animation can be predicted and synthesized using the standard MoCAP animation workflow. Machine learning based deformation synthesis and animation using the standard MoCAP animation workflow includes: using single-view or multi-view 2D videos of MoCAP actors as input; solving 3D model parameters (3D solving) for animation (deformation not included); and, given the 3D model parameters solved by 3D solving, predicting 4D surface deformation from the ML training.
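The delta-from-simulator training target can be sketched with arrays; the (T, V, 3) layout of temporally corresponded vertex positions is an assumption made for the example.

```python
import numpy as np

def deformation_deltas(scan_sequence, sim_sequence):
    """Training targets: per-vertex surface deformation expressed as a delta
    between the tracked 4D scan and the physical simulator's mesh, both
    given as (T, V, 3) vertex positions with temporal correspondence."""
    return np.asarray(scan_sequence) - np.asarray(sim_sequence)

def predict_surface(sim_frame, predicted_delta):
    """At animation time, the ML-predicted delta is added back onto the
    simulator output to synthesize the deformed surface."""
    return sim_frame + predicted_delta
```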

Mobile multi-camera multi-view capture

A background scenery portion may be identified in each of a plurality of image sets of an object, where each image set includes images captured simultaneously from different cameras. A correspondence between the image sets may be determined, where the correspondence tracks control points associated with the object and present in multiple images. A multi-view interactive digital media representation of the object that is navigable in one or more dimensions and that includes the image sets may be generated and stored.
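A minimal stand-in for the control-point correspondence step, assuming control points are already detected in each image set and matched by nearest neighbor (the actual tracking is more involved):

```python
import numpy as np

def track_control_points(points_a, points_b):
    """Match each control point in one image set to its nearest control
    point in the next set; returns indices into points_b."""
    points_a, points_b = np.asarray(points_a), np.asarray(points_b)
    # Pairwise distances between all points in the two sets.
    d = np.linalg.norm(points_a[:, None] - points_b[None, :], axis=2)
    return d.argmin(axis=1)
```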

Observed-object recognition system and method

To accurately recognize observed objects, an observed-object recognition system includes an observation region estimation portion, an existence region estimation portion, and an object recognition portion. The observation region estimation portion estimates an observation region that is relatively likely to contain the observation point in at least one first-person image of a first-person video (a video captured from the first-person perspective). Based on the observation region, the existence region estimation portion estimates an existence region of the first-person image in which an observed object is likely to exist. The object recognition portion recognizes an object in the estimated existence region of the first-person image.
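The three portions of the system compose into a simple pipeline; the estimator, expander, and classifier below are placeholders passed in as callables, not the patent's actual components.

```python
def recognize_observed(frame, gaze_estimator, region_expander, classifier):
    """Pipeline sketch matching the three portions of the system:
    estimate where the wearer is looking, expand that to a region where
    the observed object can exist, then recognize the object there."""
    obs_region = gaze_estimator(frame)                    # observation region
    x0, y0, x1, y1 = region_expander(frame, obs_region)   # existence region
    crop = [row[x0:x1] for row in frame[y0:y1]]           # frame as rows of pixels
    return classifier(crop)
```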

Multi-sensor analysis of food

In an embodiment, a method for estimating a composition of food includes: receiving a first three-dimensional (3D) image; identifying food in the first 3D image; determining a volume of the identified food based on the first 3D image; and estimating a composition of the identified food using a millimeter-wave radar.
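A toy version of the two estimation steps, assuming the 3D image is a depth map over a known table plane and that a single radar reflectivity reading maps to a water/solids split; a real estimator would need calibration per food type, and all names here are invented.

```python
import numpy as np

def food_volume(depth_map, food_mask, pixel_area, table_depth):
    """Volume estimate from the 3D (depth) image: sum the height of the
    food above the table plane over every masked pixel."""
    heights = np.maximum(table_depth - depth_map, 0.0)
    return float((heights * food_mask).sum() * pixel_area)

def estimate_composition(radar_reflectivity):
    """Toy mapping from a millimeter-wave reflectivity reading in [0, 1]
    to a water/solids split."""
    water_frac = min(max(radar_reflectivity, 0.0), 1.0)
    return {"water": water_frac, "solids": 1.0 - water_frac}
```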