G06V10/806

Multi-task multi-sensor fusion for three-dimensional object detection
11494937 · 2022-11-08

Provided are systems and methods that perform multi-task and/or multi-sensor fusion for three-dimensional object detection in furtherance of, for example, autonomous vehicle perception and control. In particular, according to one aspect of the present disclosure, example systems and methods described herein exploit simultaneous training of a machine-learned model ensemble relative to multiple related tasks to learn to perform more accurate multi-sensor 3D object detection. For example, the present disclosure provides an end-to-end learnable architecture with multiple machine-learned models that interoperate to reason about 2D and/or 3D object detection as well as one or more auxiliary tasks. According to another aspect of the present disclosure, example systems and methods described herein can perform multi-sensor fusion (e.g., fusing features derived from image data, light detection and ranging (LIDAR) data, and/or other sensor modalities) at both the point-wise and region of interest (ROI)-wise level, resulting in fully fused feature representations.
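
The point-wise plus ROI-wise fusion the abstract describes can be pictured with a minimal PyTorch sketch, shown below; the module sizes, tensor layouts, and gather step are illustrative assumptions, not the disclosed architecture.

```python
# Sketch of two-level multi-sensor fusion (assumed shapes, not the patent's design):
# point-wise: image features gathered at each LIDAR point's projected pixel;
# ROI-wise: pooled per-region features from both modalities concatenated.
import torch
import torch.nn as nn

class TwoLevelFusion(nn.Module):
    def __init__(self, img_dim=64, pt_dim=64, roi_dim=128):
        super().__init__()
        self.point_fuse = nn.Linear(img_dim + pt_dim, pt_dim)  # point-wise level
        self.roi_fuse = nn.Linear(2 * roi_dim, roi_dim)        # ROI-wise level

    def forward(self, point_feats, img_feats, pixel_uv, img_roi, lidar_roi):
        # point_feats: (N, pt_dim); img_feats: (C, H, W); pixel_uv: (N, 2) integer
        # pixel coordinates of each LIDAR point projected into the image.
        sampled = img_feats[:, pixel_uv[:, 1], pixel_uv[:, 0]].t()        # (N, img_dim)
        fused_pts = torch.relu(self.point_fuse(torch.cat([point_feats, sampled], 1)))
        # img_roi / lidar_roi: (R, roi_dim) features pooled over the same regions.
        fused_rois = torch.relu(self.roi_fuse(torch.cat([img_roi, lidar_roi], 1)))
        return fused_pts, fused_rois  # fully fused point and ROI representations
```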

Slot filling with contextual information
11494647 · 2022-11-08

A system, method and non-transitory computer readable medium for editing images with verbal commands are described. Embodiments of the system, method and non-transitory computer readable medium may include an artificial neural network (ANN) comprising a word embedding component configured to convert text input into a set of word vectors, a feature encoder configured to create a combined feature vector for the text input based on the word vectors, a scoring layer configured to compute labeling scores based on the combined feature vectors, wherein the feature encoder, the scoring layer, or both are trained using multi-task learning with a loss function including a first loss value and an additional loss value based on mutual information, context-based prediction, or sentence-based prediction, and a command component configured to identify a set of image editing word labels based on the labeling scores.
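
The claimed pipeline maps onto a small PyTorch sketch, given below; the vocabulary size, the encoder choice (a BiLSTM), and using token reconstruction as the context-based auxiliary loss are assumptions, not the disclosed design.

```python
# Sketch of the slot-filling pipeline: word embedding -> feature encoder ->
# scoring layer, trained with a labeling loss plus an auxiliary loss
# (here, a hypothetical context-based token-prediction head).
import torch
import torch.nn as nn

class SlotFiller(nn.Module):
    def __init__(self, vocab=10000, emb=128, hidden=128, n_labels=20):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)                  # word embedding component
        self.encoder = nn.LSTM(emb, hidden, batch_first=True,
                               bidirectional=True)             # feature encoder
        self.scorer = nn.Linear(2 * hidden, n_labels)          # scoring layer
        self.aux_head = nn.Linear(2 * hidden, vocab)           # context-based prediction

    def forward(self, tokens):
        feats, _ = self.encoder(self.embed(tokens))  # combined feature vectors (B, T, 2H)
        return self.scorer(feats), self.aux_head(feats)

model = SlotFiller()
tokens = torch.randint(0, 10000, (2, 7))
labels = torch.randint(0, 20, (2, 7))
scores, aux = model(tokens)
ce = nn.CrossEntropyLoss()
# First loss value: slot labeling; additional loss value: predict each token
# from its contextual features (multi-task learning).
loss = ce(scores.reshape(-1, 20), labels.reshape(-1)) + \
       0.1 * ce(aux.reshape(-1, 10000), tokens.reshape(-1))
```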

METHOD, SYSTEM AND COMPUTER PROGRAMS FOR TRACEABILITY OF LIVING SPECIMENS
20230096439 · 2023-03-30

A method, system and computer programs for traceability of living specimens are provided. The method comprises executing a first process that performs video tracking of a plurality of living specimens and that determines tracking features thereof; determining a trajectory vector that includes a trajectory followed by each detected living specimen; executing a second process at a certain period of time that determines secondary features of one or more living specimens; matching tracking features of the trajectory vector with the secondary features, providing a reference point of hyperfeatures; determining secondary features of the living specimens for other periods of time, providing other reference points of hyperfeatures; identifying when two reference points are contained within a same digital identifier, and as a result providing a potential trajectory segment; and comparing physical characteristics of said potential trajectory segment and establishing that the potential trajectory segment is valid or invalid depending on whether said comparison is inside or outside a given range.
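
The final validation step can be illustrated with a toy NumPy check; the characteristic vectors, the relative-difference comparison, and the tolerance are hypothetical stand-ins for the claimed "given range".

```python
# Toy sketch: two reference points sharing the same digital identifier define a
# potential trajectory segment; it is accepted only if the physical
# characteristics measured at both points agree within a tolerance (assumed).
import numpy as np

def validate_segment(ref_a, ref_b, tolerance=0.15):
    """ref_a, ref_b: physical-characteristic vectors (e.g. size, shape)
    measured at the two reference points of a potential trajectory segment."""
    # Relative difference per characteristic; valid if all fall inside the range.
    diff = np.abs(ref_a - ref_b) / (np.abs(ref_a) + 1e-9)
    return bool(np.all(diff <= tolerance))

# Two hypothetical reference points with the same digital identifier:
print(validate_segment(np.array([12.0, 3.1]), np.array([11.5, 3.0])))  # True
```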

DETECTING BOXES
20230096840 · 2023-03-30

A method for detecting boxes includes receiving a plurality of image frame pairs for an area of interest including at least one target box. Each image frame pair includes a monocular image frame and a respective depth image frame. For each image frame pair, the method includes determining corners for a rectangle associated with the at least one target box within the respective monocular image frame. Based on the determined corners, the method includes the following: performing edge detection and determining faces within the respective monocular image frame; and extracting planes corresponding to the at least one target box from the respective depth image frame. The method includes matching the determined faces to the extracted planes and generating a box estimation based on the determined corners, the performed edge detection, and the matched faces of the at least one target box.
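
Two of the steps above, extracting a plane from the depth frame and matching faces to planes, admit a short NumPy sketch; the least-squares plane fit and the normal-similarity matching criterion are assumptions, not the disclosed method.

```python
# Sketch: fit a plane to depth points belonging to a candidate box face, then
# match each detected face to the extracted plane with the most similar normal.
import numpy as np

def fit_plane(points):
    """Least-squares plane z = ax + by + c through (N, 3) depth points;
    returns the unit normal of the extracted plane."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    (a, b, c), *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    n = np.array([-a, -b, 1.0])
    return n / np.linalg.norm(n)

def match_face_to_plane(face_normal, plane_normals):
    """Index of the extracted plane whose normal best matches the face normal."""
    sims = [abs(np.dot(face_normal, n)) for n in plane_normals]
    return int(np.argmax(sims))
```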

FACIAL RECOGNITION METHOD AND APPARATUS, DEVICE, AND MEDIUM

This application discloses a facial recognition method and apparatus, a device and a medium, which relates to the field of image processing. The method includes: fusing a color map and a depth map of a facial image to obtain a fused image of the facial image, the fused image including two-dimensional information and depth information of the facial image (202); dividing the fused image into blocks to obtain at least two image blocks of the fused image (204); irreversibly shuffling pixels in the at least two image blocks to obtain a pixel-confused facial image (206); and determining an object identifier corresponding to the facial image according to the pixel-confused facial image (208).
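
Steps (202)-(206) can be sketched in a few lines of NumPy, below; the channel-wise fusion, the block size, and discarding the permutation (which is what makes the shuffle irreversible in practice) are illustrative assumptions.

```python
# Sketch: fuse the color map and depth map channel-wise, divide the fused image
# into blocks, and shuffle the pixels within each block without retaining the
# permutation, yielding a pixel-confused facial image.
import numpy as np

def confuse_pixels(color, depth, block=16, seed=None):
    fused = np.concatenate([color, depth[..., None]], axis=-1)  # (H, W, 4)
    rng = np.random.default_rng(seed)
    h, w, _ = fused.shape
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = fused[y:y+block, x:x+block].reshape(-1, fused.shape[-1])
            rng.shuffle(patch, axis=0)  # permutation is discarded, not stored
            fused[y:y+block, x:x+block] = patch.reshape(block, block, -1)
    return fused

out = confuse_pixels(np.random.rand(64, 64, 3), np.random.rand(64, 64))
```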

IMAGE RECOGNITION METHOD AND APPARATUS, AND STORAGE MEDIUM

Provided is an image recognition method. The method includes determining subject decoded features of a to-be-detected image and an original interaction decoded feature of a subject interactive relationship in the to-be-detected image; determining subject decoded features associated with the original interaction decoded feature, and updating the original interaction decoded feature by using the associated subject decoded features so as to obtain a new interaction decoded feature; and according to the subject decoded features of the to-be-detected image and the new interaction decoded feature, determining at least two subjects to which the subject interactive relationship in the to-be-detected image belongs.
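
The update step can be pictured with a rough PyTorch sketch; the attention-style formulation and the top-k association are assumptions, not the disclosed method.

```python
# Sketch: an interaction decoded feature is scored against all subject decoded
# features; the most associated subjects refine it into a new interaction
# decoded feature and identify the subjects the relationship links.
import torch
import torch.nn.functional as F

def update_interaction_feature(interaction, subjects, top_k=2):
    # interaction: (D,); subjects: (S, D) subject decoded features
    scores = subjects @ interaction                # association scores (S,)
    idx = scores.topk(top_k).indices               # most associated subjects
    weights = F.softmax(scores[idx], dim=0)
    new_interaction = interaction + weights @ subjects[idx]  # refined feature
    return new_interaction, idx                    # idx: subjects of the relation
```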

METHOD AND APPARATUS WITH OBJECT CLASSIFICATION

An object classification method and apparatus are disclosed. The object classification method includes receiving an input image, storing first feature data extracted by a first feature extraction layer of a neural network configured to extract features of the input image, receiving second feature data from a second feature extraction layer which is an upper layer of the first feature extraction layer, generating merged feature data by merging the first feature data and the second feature data, and classifying an object in the input image based on the merged feature data.
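
A minimal PyTorch sketch of the merge, below, illustrates the idea; the layer shapes and the merge operation (upsample then concatenate) are assumptions, not the claimed design.

```python
# Sketch: store first feature data from a lower extraction layer, take second
# feature data from an upper layer, merge the two, and classify on the result.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MergedClassifier(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.layer1 = nn.Conv2d(3, 16, 3, stride=2, padding=1)   # first extraction layer
        self.layer2 = nn.Conv2d(16, 32, 3, stride=2, padding=1)  # upper extraction layer
        self.head = nn.Linear(16 + 32, n_classes)

    def forward(self, x):
        f1 = F.relu(self.layer1(x))                    # stored first feature data
        f2 = F.relu(self.layer2(f1))                   # second feature data
        f2_up = F.interpolate(f2, size=f1.shape[-2:])  # align spatial resolutions
        merged = torch.cat([f1, f2_up], dim=1)         # merged feature data
        return self.head(merged.mean(dim=(2, 3)))     # classify the object
```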

FACE IMAGE PROCESSING METHOD, FACE IMAGE PROCESSING MODEL TRAINING METHOD, APPARATUS, DEVICE, STORAGE MEDIUM, AND PROGRAM PRODUCT

This application discloses a face image processing method performed by an electronic device. The method includes: acquiring a face image of a source face and a face template image of a template face; performing three-dimensional face modeling on the face image and the face template image to obtain a three-dimensional face image feature of the face image and a three-dimensional face template image feature of the face template image; fusing the three-dimensional face image feature and the three-dimensional face template image feature to obtain a three-dimensional fusion feature; performing face replacement feature extraction on the face image based on the face template image to obtain an initial face replacement feature; transforming the initial face replacement feature based on the three-dimensional fusion feature to obtain a target face replacement feature; and replacing the template face with the source face based on the target face replacement feature to obtain a target face image.
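
The transformation of the initial replacement feature by the 3D fusion feature can be sketched at a high level in PyTorch; the learned scale-and-shift modulation and the feature dimensions are assumptions, not the disclosed mechanism.

```python
# Sketch: fuse the 3D face image feature and 3D face template image feature,
# then use the fusion to transform the initial face replacement feature into
# the target face replacement feature.
import torch
import torch.nn as nn

class ReplacementTransform(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)   # fuse the two 3D features
        self.to_scale = nn.Linear(dim, dim)
        self.to_shift = nn.Linear(dim, dim)

    def forward(self, src_3d, tmpl_3d, init_replace):
        fusion = torch.relu(self.fuse(torch.cat([src_3d, tmpl_3d], dim=-1)))
        # Transform the initial replacement feature with the 3D fusion feature.
        return self.to_scale(fusion) * init_replace + self.to_shift(fusion)
```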

FULL BODY POSE ESTIMATION THROUGH FEATURE EXTRACTION FROM MULTIPLE WEARABLE DEVICES
20230101617 · 2023-03-30

Embodiments are disclosed for full body pose estimation using features extracted from multiple wearable devices. In an embodiment, a method comprises: obtaining point of view (POV) video data and inertial sensor data from multiple wearable devices worn at the same time by a user; obtaining depth data capturing the user's full body; extracting two-dimensional (2D) keypoints from the POV video data; reconstructing a full body 2D skeletal model from the 2D keypoints; generating a three-dimensional (3D) mesh model of the user's full body based on the depth data; merging nodes of the 3D mesh model with the inertial sensor data; aligning respective orientations of the 2D skeletal model and the 3D mesh model in a common reference frame; and predicting, using a machine learning model, classification types based on the aligned 2D skeletal model and 3D mesh model.
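
One step, merging the inertial sensor data into the 3D mesh model, admits a simple NumPy sketch; the nearest-node assignment and the feature layout are assumptions, not the disclosed method.

```python
# Sketch: attach each wearable device's inertial reading to the nearest node of
# the 3D mesh model of the user's full body.
import numpy as np

def merge_imu_with_mesh(mesh_nodes, imu_positions, imu_features):
    """mesh_nodes: (M, 3) mesh node coordinates; imu_positions: (K, 3) device
    locations on the body; imu_features: (K, F) inertial readings."""
    node_feats = np.zeros((len(mesh_nodes), imu_features.shape[1]))
    for pos, feat in zip(imu_positions, imu_features):
        nearest = np.argmin(np.linalg.norm(mesh_nodes - pos, axis=1))
        node_feats[nearest] = feat  # merge the reading into the closest node
    return node_feats
```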

SYSTEM AND METHOD FOR PRODUCE DETECTION AND CLASSIFICATION

Systems, methods, and computer-readable storage media are disclosed for object detection and classification, and particularly produce detection and classification. A system configured according to this disclosure can receive, at a processor, an image of an item. The system can then perform, across multiple pre-trained neural networks, feature detection on the image, resulting in feature maps of the image. These feature maps can be concatenated and combined, then input into an additional neural network for feature detection on the combined feature map, resulting in tiered neural network features. The system then classifies, via the processor, the item based on the tiered neural network features.
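
A condensed PyTorch sketch of the tiered pipeline follows; the backbone choices (ResNet-18 and MobileNetV2 from torchvision), dimensions, and class count are assumptions, not the disclosed configuration.

```python
# Sketch: several pre-trained networks produce feature maps, which are
# concatenated and passed through an additional network to yield the tiered
# neural network features used for classification.
import torch
import torch.nn as nn
from torchvision import models

backbones = [models.resnet18(weights=None), models.mobilenet_v2(weights=None)]
extractors = [
    nn.Sequential(*list(backbones[0].children())[:-1]),            # (B, 512, 1, 1)
    nn.Sequential(backbones[1].features, nn.AdaptiveAvgPool2d(1)), # (B, 1280, 1, 1)
]

tier2 = nn.Sequential(nn.Flatten(), nn.Linear(512 + 1280, 256), nn.ReLU())
classifier = nn.Linear(256, 50)  # e.g. 50 hypothetical produce classes

image = torch.rand(1, 3, 224, 224)
feats = torch.cat([e(image) for e in extractors], dim=1)  # concatenated feature maps
tiered = tier2(feats)                                     # tiered features
logits = classifier(tiered)                               # classify the item
```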