G06V20/647

Neural network based facial analysis using facial landmarks and associated confidence values

Systems and methods for more accurate and robust determination of subject characteristics from an image of the subject. One or more machine learning models receive as input an image of a subject, and output both facial landmarks and associated confidence values. The confidence values represent the degree to which the portions of the subject's face corresponding to those landmarks are occluded, i.e., the amount of uncertainty in the position of each landmark. These landmark points and their associated confidence values, and/or associated information, may then be input to another set of one or more machine learning models, which may output any facial analysis quantity or quantities, such as the subject's gaze direction, head pose, drowsiness state, cognitive load, or distraction state.

TRAINING USING RENDERED IMAGES

Examples of methods for training using rendered images are described herein. In some examples, a method may include, for a set of iterations, randomly positioning a three-dimensional (3D) object model in a virtual space with random textures. In some examples, the method may include, for the set of iterations, rendering a two-dimensional (2D) image of the 3D object model in the virtual space and a corresponding annotation image. In some examples, the method may include training a machine learning model using the rendered 2D images and corresponding annotation images.
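
The loop described above (random placement, random texture, paired image and annotation) can be sketched as follows. This is only an illustrative sketch, not the method of the application: `sample_scene`, `render_pair`, and `build_dataset` are hypothetical names, and the "renderer" is a stand-in that draws a small square on a 2D grid instead of rendering a real 3D model. The training step itself is left out.

```python
import random

def sample_scene(rng, num_textures=5):
    """Draw a random pose and texture choice for a hypothetical 3D object model."""
    position = tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
    rotation_deg = tuple(rng.uniform(0.0, 360.0) for _ in range(3))
    texture_id = rng.randrange(num_textures)
    return {"position": position, "rotation_deg": rotation_deg, "texture_id": texture_id}

def render_pair(scene, size=16):
    """Stand-in renderer (assumption): returns a 2D image and its annotation mask.

    A real pipeline would call a 3D renderer here; this version only marks a
    small square at the object's projected x/y position.
    """
    x, y, _ = scene["position"]
    # Map the [-1, 1] virtual-space range onto pixel coordinates.
    cx = int((x + 1.0) / 2.0 * (size - 1))
    cy = int((y + 1.0) / 2.0 * (size - 1))
    image = [[0] * size for _ in range(size)]
    mask = [[0] * size for _ in range(size)]
    for r in range(max(0, cy - 1), min(size, cy + 2)):
        for c in range(max(0, cx - 1), min(size, cx + 2)):
            image[r][c] = scene["texture_id"] + 1  # pixel value stands in for the texture
            mask[r][c] = 1                         # annotation: object vs. background
    return image, mask

def build_dataset(iterations, seed=0):
    """Repeat the random placement and rendering for a set of iterations."""
    rng = random.Random(seed)
    return [render_pair(sample_scene(rng)) for _ in range(iterations)]

dataset = build_dataset(4)
```

Each element of `dataset` is an (image, annotation) pair that a training loop could consume; seeding the generator keeps the random scenes reproducible.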

METHODS AND SYSTEMS FOR GENERATING 3D DATASETS TO TRAIN DEEP LEARNING NETWORKS FOR MEASUREMENTS ESTIMATION
20220351378 · 2022-11-03

Disclosed are systems and methods for generating data sets for training deep learning networks for key point annotations and measurements extraction from photos taken using a mobile device camera. The method includes the steps of receiving a 3D scan model of a 3D object or subject captured from a 3D scanner and a 2D photograph of the same 3D object or subject at a virtual workspace. The 3D scan model is rigged with one or more key points. A superimposed image of a pose-adjusted and aligned 3D scan model superimposed over the 2D photograph is captured by a virtual camera in the virtual workspace. Training data for a key point annotation DLN is generated by repeating the steps for a plurality of objects belonging to a plurality of object categories. The key point annotation DLN learns from the training data to produce key point annotations of objects from 2D photographs captured using any mobile device camera.

METHOD, COMPUTER DEVICE AND STORAGE MEDIUM FOR REAL-TIME URBAN SCENE RECONSTRUCTION
20220351463 · 2022-11-03

A method, a device, a computer device and a storage medium for real-time urban scene reconstruction are provided. The method comprises: obtaining a target image frame and an adjacent image frame corresponding to a target urban scene; locating a position of an object in the target image frame according to the target image frame and the adjacent image frame, and obtaining an object point cloud, an object image and a coordinate transformation matrix corresponding to a target object; determining a global characteristic of the target object and parameters of the surfaces to be selected of the target object, the parameters being used to determine a characteristic of each surface to be selected; determining a plane combination matrix of the target object; and reconstructing a three-dimensional scene model of the target urban scene according to the plane combination matrix, the parameters of the surfaces to be selected and the coordinate transformation matrix.

METHOD AND APPARATUS FOR SCENE SEGMENTATION FOR THREE-DIMENSIONAL SCENE RECONSTRUCTION
20230092248 · 2023-03-23

A method includes obtaining, from an image sensor, image data of a real-world scene; obtaining, from a depth sensor, sparse depth data of the real-world scene; and passing the image data to a first neural network to obtain one or more object regions of interest (ROIs) and one or more feature map ROIs. Each object ROI includes at least one detected object. The method also includes passing the image data and sparse depth data to a second neural network to obtain one or more dense depth map ROIs; aligning the one or more object ROIs, one or more feature map ROIs, and one or more dense depth map ROIs; and passing the aligned ROIs to a fully convolutional network to obtain a segmentation of the real-world scene. The segmentation contains one or more pixelwise predictions of one or more objects in the real-world scene.

THREE-DIMENSIONAL TARGET ESTIMATION USING KEYPOINTS
20230087261 · 2023-03-23

Systems and techniques are described for performing object detection and tracking. For example, a tracking object can obtain an image comprising a target object at least partially in contact with a surface. The tracking object can obtain a plurality of two-dimensional (2D) keypoints based on one or more features associated with one or more portions of the target object in contact with the surface in the image. The tracking object can obtain information associated with a contour of the surface. Based on the plurality of 2D keypoints and the information associated with the contour of the surface, the tracking object can determine a three-dimensional (3D) representation associated with the plurality of 2D keypoints.
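
One simple way to picture how 2D keypoints on a contact surface can yield a 3D representation is to back-project each keypoint onto the surface. The sketch below assumes much more than the abstract states: a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), a perfectly flat ground plane, and a camera looking horizontally from a known height. The function name and all parameters are hypothetical, and the application's "contour of the surface" may be far more general than a plane.

```python
def backproject_to_ground(keypoints_2d, fx, fy, cx, cy, camera_height):
    """Intersect the viewing ray of each 2D keypoint with a flat ground plane.

    Camera coordinates (assumption): x right, y down, z forward, so the ground
    is the plane y = camera_height. Returns one (x, y, z) tuple per keypoint,
    or None when the ray points at or above the horizon.
    """
    points_3d = []
    for u, v in keypoints_2d:
        # Ray direction through the pixel, normalized so that its z component is 1.
        dx = (u - cx) / fx
        dy = (v - cy) / fy
        if dy <= 0:
            points_3d.append(None)  # ray never reaches the ground plane
            continue
        t = camera_height / dy      # scale at which the ray reaches y = camera_height
        points_3d.append((t * dx, camera_height, t))
    return points_3d

# Hypothetical keypoints where a target object touches the surface.
contact_points = backproject_to_ground(
    [(640, 510), (840, 510), (640, 300)],
    fx=1000.0, fy=1000.0, cx=640.0, cy=360.0, camera_height=1.5,
)
```

With these made-up numbers the first two keypoints land about 10 units in front of the camera, and the third lies above the horizon, so it has no ground intersection.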

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
20220342427 · 2022-10-27

The present disclosure relates to an information processing device, an information processing method, and a program that cause a high-speed moving body to appropriately plan a trajectory. Feature points are extracted in association with a semantic label, which is an object recognition result obtained by semantic segmentation. Feature points having the same semantic label are connected to form a Delaunay mesh, so that a mesh is formed for each object. This allows the position and distance of each object to be appropriately recognized, after which a trajectory is planned. The present disclosure can be applied to a moving body.
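
The idea of building one Delaunay mesh per semantic label can be illustrated with a small sketch. It is an assumption-laden toy, not the disclosed implementation: the function names are hypothetical, feature points are plain `(x, y, label)` tuples, and the triangulation uses a brute-force empty-circumcircle test that is only practical for a handful of points (a library such as SciPy would normally be used instead).

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None for collinear points."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2

def delaunay(points):
    """Brute-force Delaunay: keep triangles whose circumcircle contains no other point.

    Exactly cocircular point sets produce overlapping triangles here; a real
    implementation would break such ties.
    """
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
               for m, (px, py) in enumerate(points) if m not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles

def mesh_per_label(feature_points):
    """Group (x, y, label) feature points by label and triangulate each group separately."""
    groups = {}
    for x, y, label in feature_points:
        groups.setdefault(label, []).append((x, y))
    return {label: (pts, delaunay(pts)) for label, pts in groups.items()}

# Hypothetical labeled feature points from a segmented image.
meshes = mesh_per_label([
    (0, 0, "car"), (4, 0, "car"), (0, 3, "car"), (5, 4, "car"),
    (0, 10, "road"), (10, 10, "road"), (5, 12, "road"),
])
```

Because points are grouped before triangulation, no triangle ever spans two labels, which matches the idea of forming a separate mesh for each object.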

THREE-DIMENSIONAL RECONSTRUCTION METHOD, THREE-DIMENSIONAL RECONSTRUCTION APPARATUS, DEVICE AND STORAGE MEDIUM
20220343603 · 2022-10-27

A three-dimensional reconstruction method, a three-dimensional reconstruction apparatus, a device, and a storage medium are provided. An implementation of the method may include: determining, based on an initial three-dimensional human body model, a target two-dimensional image corresponding to the three-dimensional human body model; semantically segmenting the target two-dimensional image, and determining semantic labels of pixels in the target two-dimensional image; determining semantic labels of skinned mesh vertices according to corresponding relationships between the skinned mesh vertices in the initial three-dimensional human body model and the pixels in the target two-dimensional image; determining target weights of the skinned mesh vertices according to the semantic labels of the skinned mesh vertices; and determining a target three-dimensional human body model according to the target weights.
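
The label-transfer step (pixel labels to vertex labels to vertex weights) can be pictured with a short sketch. Everything concrete here is assumed for illustration: the vertex-to-pixel correspondence is given as a list of `(row, col)` pairs, and the mapping from a semantic label to skinning weights is a hypothetical lookup table, since the abstract does not say how target weights are derived from labels.

```python
def label_vertices(label_image, vertex_to_pixel):
    """Give each skinned mesh vertex the semantic label of its corresponding pixel."""
    return [label_image[row][col] for row, col in vertex_to_pixel]

def weights_from_labels(vertex_labels, label_to_weights):
    """Look up per-vertex bone weights from a label table (hypothetical rule)."""
    return [dict(label_to_weights[label]) for label in vertex_labels]

# Hypothetical 2x3 segmentation result and a 3-vertex mesh.
label_image = [
    ["arm", "arm", "torso"],
    ["arm", "torso", "torso"],
]
vertex_to_pixel = [(0, 0), (0, 2), (1, 1)]
label_to_weights = {
    "arm": {"arm_bone": 1.0},
    "torso": {"spine_bone": 0.8, "arm_bone": 0.2},
}

vertex_labels = label_vertices(label_image, vertex_to_pixel)
vertex_weights = weights_from_labels(vertex_labels, label_to_weights)
```

The point of the sketch is the data flow only: segmentation is done once in image space, and the correspondence carries the result back onto the mesh.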

METHOD AND DEVICE FOR IDENTIFYING PRESENCE OF THREE-DIMENSIONAL OBJECTS USING IMAGES
20220343661 · 2022-10-27

Provided are a method and apparatus for identifying the presence of a 3D object using an image. Two-dimensional images are used to identify whether a 3D object exists in the images, so that the presence of a 3D object in space can be identified accurately and quickly, leading to higher productivity.

System for detecting surface type of object and artificial neural network-based method for detecting surface type of object
11610390 · 2023-03-21

An artificial neural network-based method for detecting a surface type of an object includes: receiving a plurality of object images, wherein the object images have spectra that differ from one another, each object image having one of the spectra; transforming each object image into a matrix, wherein the matrix has a channel value that represents the spectrum of the corresponding object image; and executing a deep learning program by using the matrices to build a predictive model for identifying a target surface type of the object. Accordingly, the speed of identifying the target surface type of the object is increased, further improving the product yield of the object.
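
One plausible reading of "a matrix with a channel value that represents the spectrum" is a multi-channel array in which each channel holds the image captured at one spectrum. The sketch below follows that reading as an assumption; `stack_by_spectrum` is a hypothetical helper, images are nested lists of intensities, and spectra are identified by made-up wavelengths in nanometers.

```python
def stack_by_spectrum(images_by_wavelength):
    """Stack same-sized grayscale images into a height x width x channels array.

    Channels are ordered by ascending wavelength, so the channel index
    identifies the spectrum of the image it came from.
    """
    wavelengths = sorted(images_by_wavelength)
    first = images_by_wavelength[wavelengths[0]]
    height, width = len(first), len(first[0])
    for wl in wavelengths:
        image = images_by_wavelength[wl]
        if len(image) != height or any(len(row) != width for row in image):
            raise ValueError(f"image at {wl} nm has a different size")
    tensor = [
        [[images_by_wavelength[wl][r][c] for wl in wavelengths] for c in range(width)]
        for r in range(height)
    ]
    return wavelengths, tensor

# Hypothetical 2x2 object images captured under two different spectra.
wavelengths, tensor = stack_by_spectrum({
    650: [[1, 2], [3, 4]],
    450: [[5, 6], [7, 8]],
})
```

Here `tensor[r][c]` lists the intensities of one pixel across all spectra, which is the usual input layout for a convolutional model with one channel per spectrum.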