Patent classifications
G06V10/803
Multi-channel object matching
A method may include obtaining first sensor data captured by a first sensor system and second sensor data captured by a second sensor system of a different type from the first sensor system. The method may include detecting a first object included in the first sensor data and a second object included in the second sensor data. The method may include assigning a first label to the first object and a second label to the second object after comparing the first and the second sensor data. The first and second labels may indicate degrees to which the first and the second objects match. Responsive to the first and second labels indicating that the first and the second objects match, the method may include designating a matched object representative of the first object and the second object and sending the matched object to a downstream computing system of an autonomous vehicle.
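The labeling-and-matching step described in this abstract could be sketched as follows. All names, the overlap (IoU) criterion, the 0.5 threshold, and the box-averaging fusion are illustrative assumptions, not details taken from the patent:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_objects(first_obj, second_obj, threshold=0.5):
    """Label two detections with their degree of match; fuse them if they agree."""
    degree = iou(first_obj["box"], second_obj["box"])
    label = "match" if degree >= threshold else "no_match"
    if label == "match":
        # Average the two boxes as one simple "matched object" representation
        # to hand to a downstream consumer.
        fused_box = tuple((a + b) / 2
                          for a, b in zip(first_obj["box"], second_obj["box"]))
        return {"box": fused_box, "degree": degree, "label": label}
    return None
```

A real system would match whole lists of detections (e.g. via Hungarian assignment) and project both sensors into a common frame first; this sketch only shows the per-pair label-then-fuse idea.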
Imaging system for detecting human-object interaction and a method for detecting human-object interaction
The present application discloses an imaging system for detecting human-object interaction and a corresponding detection method. The imaging system includes an event sensor, an image sensor, and a controller. The event sensor is configured to obtain an event data set of a targeted scene according to variations of light intensity sensed by pixels of the event sensor when an event occurs in the targeted scene. The image sensor is configured to capture a visual image of the targeted scene. The controller is configured to detect a human according to the event data set, trigger the image sensor to capture the visual image when the human is detected, and detect the human-object interaction in the targeted scene according to the visual image and a series of event data sets obtained by the event sensor during the event.
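The controller's trigger logic might look like the sketch below. The per-timestep event count, the `min_events` threshold, and the stand-in detector are all hypothetical simplifications of whatever detector the patent actually uses:

```python
def detect_human(event_count, min_events=50):
    """Stand-in detector: treat a burst of change events as a human candidate."""
    return event_count >= min_events

def process_scene(event_counts, capture_fn, min_events=50):
    """Scan per-timestep event counts; the first time a human is detected,
    trigger the image sensor (capture_fn) once and collect its frame."""
    frames = []
    triggered = False
    for t, count in enumerate(event_counts):
        if not triggered and detect_human(count, min_events):
            frames.append(capture_fn(t))
            triggered = True
    return frames
```

The point of the design is power: the always-on event sensor is cheap, and the full-frame image sensor only runs when the event stream suggests a human is present.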
Machine learning classification based on separate processing of multiple views
Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for machine learning classification based on separate processing of multiple views. In some implementations, a system obtains image data for multiple images showing different views of an object. A machine learning model is used to generate a separate output based on each of the multiple images individually. The outputs for the respective images are combined to generate a combined output. A predicted characteristic of the object is determined based on the combined output. An indication of the predicted characteristic of the object is provided.
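The separate-then-combine flow in this abstract amounts to late fusion. A minimal sketch, assuming the model returns a per-class score vector for each view and that averaging is the combination step (the patent does not specify the combiner):

```python
import numpy as np

def classify_multiview(views, model):
    """Run the model on each view separately, then combine by averaging."""
    outputs = [model(view) for view in views]  # one score vector per view
    combined = np.mean(outputs, axis=0)        # simple late fusion
    return int(np.argmax(combined))            # index of predicted characteristic
```

Keeping the per-view passes independent lets a single model handle any number of views, at the cost of ignoring cross-view correlations until the final combination.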
DATA CONSTRUCTION AND LEARNING SYSTEM AND METHOD BASED ON METHOD OF SPLITTING AND ARRANGING MULTIPLE IMAGES
The present disclosure relates to a data construction and learning system and method based on a method of splitting and arranging multiple images. The data construction and learning system based on a method of splitting and arranging multiple images includes an input unit configured to receive images captured by a plurality of cameras disposed in a vehicle, a memory in which a program for merging the images into a single image and estimating information on a road situation and an object has been stored, and a processor configured to execute the program. The processor merges and recognizes, as one situation, road situations and objects redundantly included in the images.
UNIFIED PRETRAINING FRAMEWORK FOR DOCUMENT UNDERSTANDING
The technology described includes methods for pretraining a document encoder model based on multimodal self cross-attention. One method includes receiving image data that encodes a set of pretraining documents. A set of sentences is extracted from the image data. A bounding box for each sentence is generated. For each sentence, a set of predicted features is generated by using an encoder machine-learning model. The encoder model performs cross-attention between a set of masked-textual features for the sentence and a set of masked-visual features for the sentence. The set of masked-textual features is based on a masking function and the sentence. The set of masked-visual features is based on the masking function and the corresponding bounding box. A document-encoder model is pretrained based on the set of predicted features for each sentence and pretraining tasks. The pretraining tasks include masked sentence modeling, visual contrastive learning, or visual-language alignment.
3D SURFACE STRUCTURE ESTIMATION USING NEURAL NETWORKS FOR AUTONOMOUS SYSTEMS AND APPLICATIONS
In various examples, to support training a deep neural network (DNN) to predict a dense representation of a 3D surface structure of interest, a training dataset is generated using a simulated environment. For example, a simulation may be run to simulate a virtual world or environment, render frames of virtual sensor data (e.g., images), and generate corresponding depth maps and segmentation masks (identifying a component of the simulated environment such as a road). To generate input training data, 3D structure estimation may be performed on a rendered frame to generate a representation of a 3D surface structure of the road. To generate corresponding ground truth training data, a corresponding depth map and segmentation mask may be used to generate a dense representation of the 3D surface structure.
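The ground-truth generation step, combining a depth map with a segmentation mask to keep only the component of interest (here, the road), could be sketched as follows. The function name, the `road_id` label value, and the use of NaN for non-road pixels are assumptions for illustration:

```python
import numpy as np

def dense_road_ground_truth(depth_map, seg_mask, road_id=1):
    """Keep simulated depth only where the segmentation labels the road;
    mark every other pixel as invalid (NaN)."""
    gt = np.full_like(depth_map, np.nan, dtype=float)
    road = seg_mask == road_id
    gt[road] = depth_map[road]
    return gt
```

Because both the depth map and the mask come from the same rendered simulation frame, the resulting dense surface target is pixel-aligned with the input image by construction, which is the advantage of simulated over real-world training data here.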
METHOD AND SYSTEM FOR IMAGE PROCESSING
A method and system for image processing is provided. The method for image processing may include: obtaining an image data set, wherein the image data set includes a first set of volume data; and determining, by the at least one processor, a target anatomy of interest based on the first set of volume data. The determining of the target anatomy of interest may include: determining an initial anatomy of interest in the first set of volume data; and editing the initial anatomy of interest to obtain the target anatomy of interest. The target anatomy of interest may include at least one region of interest (ROI) or at least one volume of interest (VOI). The initial anatomy of interest may include at least one ROI or at least one VOI.
DATA FUSION SYSTEM FOR A VEHICLE EQUIPPED WITH UNSYNCHRONIZED PERCEPTION SENSORS
A sensor data fusion system for a vehicle with multiple sensors includes a first-sensor, a second-sensor, and a controller-circuit. The first-sensor is configured to output a first-frame of data and a subsequent-frame of data indicative of objects present in a first-field-of-view. The first-frame is characterized by a first-time-stamp, and the subsequent-frame of data is characterized by a subsequent-time-stamp different from the first-time-stamp. The second-sensor is configured to output a second-frame of data indicative of objects present in a second-field-of-view that overlaps the first-field-of-view. The second-frame is characterized by a second-time-stamp temporally located between the first-time-stamp and the subsequent-time-stamp. The controller-circuit is configured to synthesize an interpolated-frame from the first-frame and the subsequent-frame. The interpolated-frame is characterized by an interpolated-time-stamp that corresponds to the second-time-stamp. The controller-circuit fuses the interpolated-frame with the second-frame to provide a fused-frame of data characterized by the interpolated-time-stamp, and operates the vehicle in accordance with the fused-frame.
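The interpolated-frame synthesis could be as simple as per-element linear interpolation between the two bracketing frames, evaluated at the second sensor's time stamp. Linear interpolation is an assumption here; the patent only requires that the synthesized frame correspond to the second-time-stamp:

```python
def interpolate_frame(frame_a, t_a, frame_b, t_b, t_target):
    """Synthesize a frame at t_target by linearly interpolating, element by
    element, between frame_a (at t_a) and frame_b (at t_b), t_a < t_target < t_b."""
    w = (t_target - t_a) / (t_b - t_a)  # 0 at t_a, 1 at t_b
    return [a + w * (b - a) for a, b in zip(frame_a, frame_b)]
```

Aligning the first sensor's data to the second sensor's time stamp, rather than forcing both sensors onto a shared clock, is what lets the system fuse unsynchronized sensors without hardware triggering.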
OPHTHALMOLOGIC APPARATUS, AND METHOD OF CONTROLLING THE SAME
An ophthalmologic apparatus of an embodiment example includes a front image acquiring device, a first search processor, and a second search processor. The front image acquiring device is configured to acquire a front image of a fundus of a subject's eye. The first search processor is configured to search for an interested region corresponding to an interested site of the fundus based on a brightness variation in the front image. The second search processor is configured to search for the interested region by template matching between the front image and a template image in the event that the interested region has not been detected by the first search processor.
Skin detection method and electronic device
A skin detection method includes: dividing a region of interest in a face image into a highlighted region and a non-highlighted region; separately determining a first segmentation threshold of the highlighted region and a second segmentation threshold of the non-highlighted region; obtaining a binary image of the highlighted region based on the first segmentation threshold, and obtaining a binary image of the non-highlighted region based on the second segmentation threshold; fusing the binary image of the highlighted region and the binary image of the non-highlighted region; and identifying, based on a fused image, pores and/or blackheads included in the region of interest.
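The split-threshold-and-fuse portion of this method could be sketched as below, assuming the highlighted/non-highlighted split is already available as a boolean mask and that fusion is simply writing each region's binary result into one image (threshold values here are illustrative):

```python
import numpy as np

def fuse_binary_regions(image, highlight_mask, t_high, t_low):
    """Binarize highlighted and non-highlighted pixels with separate
    thresholds, then fuse the two results into a single binary image."""
    binary = np.zeros_like(image, dtype=bool)
    binary[highlight_mask] = image[highlight_mask] > t_high
    binary[~highlight_mask] = image[~highlight_mask] > t_low
    return binary
```

Using a higher threshold in specularly highlighted skin regions keeps glare from being misread as pores or blackheads, while the lower threshold preserves sensitivity in normally lit regions.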