G06V10/806

METHOD AND SYSTEM FOR PREDICTING A TRAJECTORY OF A TARGET VEHICLE IN AN ENVIRONMENT OF A VEHICLE

A method for predicting a trajectory of a target vehicle in an environment of a vehicle. The method includes the steps of a) capturing states of the target vehicle, capturing states of further vehicle objects in the environment of the vehicle and capturing road markings by a camera-based capture device; b) preprocessing the data obtained in step a), wherein outliers are removed and missing states are calculated; c) calculating an estimated trajectory by a physical model on the basis of the data preprocessed in step b); d) calculating a driver-behavior-based trajectory on the basis of the data preprocessed in step b); and e) combining the trajectories calculated in steps c) and d) to form a predicted trajectory of the target vehicle.
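The blend in steps c) through e) can be illustrated with a minimal sketch. The constant-velocity physical model, the lane-keeping driver model, and the fixed blend weight below are all illustrative assumptions, not details from the abstract.

```python
# Hedged sketch of steps c)-e): a physics-based and a driver-behavior-based
# trajectory are computed separately, then blended into one prediction.

def physics_trajectory(pos, vel, horizon, dt=0.1):
    """Step c): constant-velocity rollout (a minimal physical model)."""
    return [(pos[0] + vel[0] * dt * k, pos[1] + vel[1] * dt * k)
            for k in range(1, horizon + 1)]

def driver_trajectory(pos, vel, lane_center_y, horizon, dt=0.1, gain=0.5):
    """Step d): toy driver model that drifts laterally toward the lane center."""
    traj, x, y = [], pos[0], pos[1]
    for _ in range(horizon):
        x += vel[0] * dt
        y += gain * (lane_center_y - y) * dt  # lane-keeping correction
        traj.append((x, y))
    return traj

def combine(phys, driver, w=0.6):
    """Step e): pointwise weighted blend of the two predicted trajectories."""
    return [(w * px + (1 - w) * dx, w * py + (1 - w) * dy)
            for (px, py), (dx, dy) in zip(phys, driver)]

phys = physics_trajectory((0.0, 0.2), (10.0, 0.0), horizon=5)
drv = driver_trajectory((0.0, 0.2), (10.0, 0.0), lane_center_y=0.0, horizon=5)
pred = combine(phys, drv)
```

A real implementation would choose the blend weight dynamically, e.g. favoring the physical model at short horizons and the driver model at long ones.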

SENSOR FUSION FOR AUTONOMOUS MACHINE APPLICATIONS USING MACHINE LEARNING

In various examples, a multi-sensor fusion machine learning model—such as a deep neural network (DNN)—may be deployed to fuse data from a plurality of individual machine learning models. As such, the multi-sensor fusion network may use outputs from a plurality of machine learning models as input to generate a fused output that represents data from fields of view or sensory fields of each of the sensors supplying the machine learning models, while accounting for learned associations between boundary or overlap regions of the various fields of view of the source sensors. In this way, the fused output may be less likely to include duplicate, inaccurate, or noisy data with respect to objects or features in the environment, as the fusion network may be trained to account for multiple instances of a same object appearing in different input representations.
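The effect of the fusion stage can be illustrated by its input/output contract: per-sensor detections go in, one deduplicated fused list comes out. The patent trains a DNN to learn the cross-sensor associations; in the sketch below a simple IoU-based greedy merge stands in for that learned association, purely to show the data flow.

```python
# Hedged sketch: detections from sensors with overlapping fields of view
# are merged so a single object is not reported twice. A greedy IoU merge
# replaces the learned fusion network for illustration.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def fuse(detections_per_sensor, thresh=0.5):
    """Keep a box only if no already-kept box overlaps it strongly."""
    fused = []
    for dets in detections_per_sensor:
        for box in dets:
            if all(iou(box, kept) < thresh for kept in fused):
                fused.append(box)
    return fused

cam = [(0, 0, 10, 10), (20, 20, 30, 30)]
lidar = [(1, 1, 10, 10)]          # the same object seen by a second sensor
fused = fuse([cam, lidar])
```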

SURGERY SUPPORT SYSTEM AND SURGERY SUPPORT METHOD

A surgery support system according to an embodiment includes processing circuitry. The processing circuitry acquires medical information of a subject under surgery. The processing circuitry detects an event relating to an abnormality, based on the acquired medical information of the subject. The processing circuitry associates a point of time of detecting the event relating to the abnormality with the medical information acquired at the point of time to generate association information.
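A minimal sketch of the described flow follows: detect an abnormality event in acquired medical information, then associate the detection time with the information acquired at that time. The heart-rate threshold and field names are illustrative assumptions.

```python
# Hedged sketch: event detection plus time-to-data association.

def detect_events(samples, hr_limit=120):
    """Flag samples whose heart rate exceeds a threshold as abnormality events."""
    return [s for s in samples if s["heart_rate"] > hr_limit]

def associate(samples, hr_limit=120):
    """Pair each event's point of time with the medical information
    acquired at that point of time (the association information)."""
    return {s["t"]: s for s in detect_events(samples, hr_limit)}

stream = [
    {"t": 0, "heart_rate": 80, "bp": 118},
    {"t": 1, "heart_rate": 135, "bp": 92},   # abnormal sample
    {"t": 2, "heart_rate": 85, "bp": 120},
]
assoc = associate(stream)
```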

Aligning symbols and objects using co-attention for understanding visual content

A method, apparatus and system for understanding visual content includes determining at least one region proposal for an image, attending to at least one symbol of the proposed image region, attending to a portion of the proposed image region using information regarding the attended symbol, extracting appearance features of the attended portion of the proposed image region, fusing the appearance features of the attended image region with features of the attended symbol, projecting the fused features into a semantic embedding space that has been trained using fused attended appearance features and attended symbol features of images having known descriptive messages, computing a similarity measure between the projected, fused features and the fused attended appearance and symbol features embedded in the semantic embedding space that have at least one associated descriptive message, and predicting a descriptive message for an image associated with the projected, fused features.
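The final retrieval step can be sketched as follows: fused appearance and symbol features are compared by similarity to embedded features of images with known descriptive messages, and the closest message is predicted. The trained projection is replaced by an identity mapping here, and all feature values are illustrative.

```python
# Hedged sketch: fuse attended features, then predict the descriptive
# message of the nearest example in the semantic embedding space.
import math

def fuse(appearance, symbol):
    """Concatenate the attended appearance and symbol feature vectors."""
    return appearance + symbol

def cosine(u, v):
    """Cosine similarity as the similarity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def predict_message(query, embedded):
    """Return the descriptive message of the most similar embedded example."""
    return max(embedded, key=lambda m: cosine(query, embedded[m]))

embedded = {
    "a stop sign at a crossing": fuse([0.9, 0.1], [1.0, 0.0]),
    "a dog on a sofa":           fuse([0.1, 0.9], [0.0, 1.0]),
}
query = fuse([0.8, 0.2], [0.9, 0.1])
msg = predict_message(query, embedded)
```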

Multi-modal dense correspondence imaging system

A multi-modal dense correspondence image processing system submits the multi-modal images to a neural network to produce multi-modal features for each pixel of each of the multi-modal images. Each multi-modal image includes an image of a first modality and a corresponding image of a second modality different from the first modality. The neural network includes a first subnetwork trained to extract first features from pixels of the first modality, a second subnetwork trained to extract second features from pixels of the second modality, and a combiner configured to combine the first features and the second features to produce multi-modal features of a multi-modal image. The system compares the multi-modal features of a pair of multi-modal images to estimate a dense correspondence between pixels of the multi-modal images of the pair and outputs the dense correspondence between pixels of the multi-modal images in the pair.
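The data flow through the two subnetworks and the combiner can be sketched on toy one-dimensional "images". The hand-crafted extractors below stand in for the trained subnetworks; the correspondence rule is a simple nearest-feature match.

```python
# Hedged sketch of the architecture: per-pixel features from two
# modality-specific subnetworks are combined, then correspondence is
# estimated by comparing combined features across the image pair.

def first_subnetwork(intensity):          # modality 1, e.g. grayscale
    return [float(v) for v in intensity]

def second_subnetwork(depth):             # modality 2, e.g. depth
    return [float(v) * 0.5 for v in depth]

def combiner(f1, f2):
    """Combine the per-pixel features of both modalities."""
    return list(zip(f1, f2))

def dense_correspondence(feats_a, feats_b):
    """For each pixel of image A, the index of the closest-feature pixel of B."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(feats_b)), key=lambda j: dist(fa, feats_b[j]))
            for fa in feats_a]

img_a = combiner(first_subnetwork([10, 50, 90]), second_subnetwork([2, 4, 6]))
img_b = combiner(first_subnetwork([90, 10, 50]), second_subnetwork([6, 2, 4]))
corr = dense_correspondence(img_a, img_b)
```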

Machine learning based models for object recognition

Machine learning based models recognize objects in images. Specific features of the object are extracted from the image using machine learning based models. The specific features extracted from the image assist deep learning based models in identifying subtypes of a type of object. The system recognizes the objects and collections of objects and determines whether the arrangement of objects violates any predetermined policies. For example, a policy may specify relative positions of different types of objects, height above ground at which certain types of objects are placed, or an expected number of certain types of objects in a collection.
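The policy-checking step can be sketched directly from the example policies named in the abstract (relative counts and placement height). The policy contents and field names below are illustrative assumptions.

```python
# Hedged sketch: recognized objects (type, height above ground) are
# validated against predetermined placement policies.

def check_policies(objects, policies):
    """Return a list of human-readable violations; empty means compliant."""
    violations = []
    for obj_type, rule in policies.items():
        found = [o for o in objects if o["type"] == obj_type]
        if "min_count" in rule and len(found) < rule["min_count"]:
            violations.append(f"{obj_type}: expected >= {rule['min_count']}")
        for o in found:
            if "min_height" in rule and o["height_m"] < rule["min_height"]:
                violations.append(f"{obj_type}: placed too low")
    return violations

objects = [
    {"type": "fire_extinguisher", "height_m": 0.4},
    {"type": "exit_sign", "height_m": 2.1},
]
policies = {
    "fire_extinguisher": {"min_count": 1, "min_height": 0.9},
    "exit_sign": {"min_count": 2, "min_height": 2.0},
}
violations = check_policies(objects, policies)
```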

Item Identification Method, System and Electronic Device
20210397844 · 2021-12-23 ·

An item identification method, system and electronic device are provided. The method includes: acquiring multi-frame images of an item by an image capturing device; processing the multi-frame images to obtain position information and category information of the item in each frame; acquiring auxiliary information of the item by an information capturing device; performing multi-modality fusion on the position information and the auxiliary information to obtain a fusion result; and determining an identification result for the item according to the category information and the fusion result. Through at least some embodiments of the present disclosure, the related-art problem of low accuracy when identifying an item is at least partially solved.
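The fusion and decision steps can be sketched as follows. The choice of a weight sensor as the auxiliary information source, the tolerance, and the majority-vote rule are illustrative assumptions, not details from the abstract.

```python
# Hedged sketch: per-frame category detections are fused with an
# auxiliary weight reading, and the identification is the category that
# survives fusion most often across frames.

def fuse_position_weight(detections, weight_g, catalog):
    """Multi-modality fusion: keep candidates whose catalog weight
    matches the sensed weight within a tolerance."""
    return [d for d in detections
            if abs(catalog[d["category"]] - weight_g) <= 10]

def identify(frames, weight_g, catalog):
    """Majority category over all frames after fusion."""
    votes = {}
    for dets in frames:
        for d in fuse_position_weight(dets, weight_g, catalog):
            votes[d["category"]] = votes.get(d["category"], 0) + 1
    return max(votes, key=votes.get) if votes else None

catalog = {"cola_can": 350, "water_bottle": 500}   # reference weights in grams
frames = [
    [{"category": "cola_can", "pos": (10, 20)}],
    [{"category": "cola_can", "pos": (11, 21)},
     {"category": "water_bottle", "pos": (40, 5)}],
]
result = identify(frames, weight_g=348, catalog=catalog)
```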

SYSTEMS AND METHODS FOR DEEP LEARNING MODEL BASED PRODUCT MATCHING USING MULTI MODAL DATA
20210398183 · 2021-12-23 ·

Methods and systems for generating a plurality of matching items that match a reference item are disclosed. The method includes first determining reference attribute data for the reference item, where the reference attribute data is multimodal. Next, selecting a deep learning multimodal matching model from a plurality of candidate multimodal matching models. The selected deep learning multimodal matching model has a first deep learning neural network (DLNN) for processing data having a first data mode and a second DLNN for processing data having a second data mode. Then, matching a potential matching item to the reference item using the selected deep learning multimodal matching model to generate a match score, where the match score is computed based on the reference attribute data for the reference item and attribute data for the potential matching item. Finally, adding the potential matching item to the plurality of matching items based on the match score.
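The matching step can be sketched with one scorer per data mode standing in for each DLNN. The choice of text and image as the two modes, the stand-in similarity functions, the blend weight, and the threshold are all illustrative assumptions.

```python
# Hedged sketch: two per-modality scorers (stand-ins for the two DLNNs)
# are combined into a match score; candidates above a threshold join
# the plurality of matching items.

def text_score(a, b):
    """Stand-in for the first DLNN: Jaccard overlap of title tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def image_score(a, b):
    """Stand-in for the second DLNN: similarity of image feature vectors."""
    diff = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 / (1.0 + diff)

def match_score(ref, cand, w_text=0.5):
    """Match score over both data modes of the attribute data."""
    return (w_text * text_score(ref["title"], cand["title"])
            + (1 - w_text) * image_score(ref["img"], cand["img"]))

ref = {"title": "red cotton shirt", "img": [0.2, 0.8]}
candidates = [
    {"title": "red cotton shirt slim", "img": [0.25, 0.75]},
    {"title": "blue denim jeans", "img": [0.9, 0.1]},
]
matches = [c for c in candidates if match_score(ref, c) > 0.6]
```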

Selective attention mechanism for improved perception sensor performance in vehicular applications

A vehicle-mounted perception sensor gathers environment perception data from a scene using first and second heterogeneous (different-modality) sensors, at least one of which is directable to a predetermined region of interest. A perception processor receives the environment perception data and performs object recognition to identify objects, each with a computed confidence score. The processor assesses each confidence score against a predetermined threshold and, based on that assessment, generates an attention signal to redirect one of the heterogeneous sensors to a region of interest identified by the other. In this way, information from one sensor primes the other, increasing accuracy, providing deeper knowledge about the scene, and thus improving object tracking in vehicular applications.
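The attention loop can be sketched as follows: detections below the confidence threshold produce attention signals that redirect the other sensor to the uncertain region. Sensor names, field names, and the threshold are illustrative assumptions.

```python
# Hedged sketch of the selective attention mechanism: low-confidence
# regions from one sensor prime the other, directable sensor.

def assess(detections, threshold=0.7):
    """Regions whose recognition confidence falls below the threshold."""
    return [d["roi"] for d in detections if d["confidence"] < threshold]

def attention_signals(camera_dets, lidar_dets, threshold=0.7):
    """Cross-prime the sensors: each is redirected to the regions the
    other sensor is uncertain about."""
    return {
        "redirect_lidar_to": assess(camera_dets, threshold),
        "redirect_camera_to": assess(lidar_dets, threshold),
    }

camera = [{"roi": (12, 40), "confidence": 0.55},   # needs a second look
          {"roi": (80, 10), "confidence": 0.92}]
lidar = [{"roi": (33, 7), "confidence": 0.95}]
signals = attention_signals(camera, lidar)
```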

MODEL GENERATION
20210390667 · 2021-12-16 ·

Embodiments of the present disclosure provide a model generation method, including: constructing a training sample set including a sample image, where the feature information of the sample image is line information and optical flow information; and training on the training sample set to generate a recognition model that takes the line information and optical flow information of an image as input.
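The construction of one training sample can be sketched on toy frames. Both extractors below are placeholders for real line detection and optical flow estimation, and the label is an illustrative assumption.

```python
# Hedged sketch: each training sample pairs line information and
# optical-flow information extracted from a sample image.

def line_info(frame):
    """Toy line map: mark pixels whose value jumps sharply to the right."""
    return [[1 if j + 1 < len(row) and abs(row[j + 1] - row[j]) > 50 else 0
             for j in range(len(row))] for row in frame]

def optical_flow(prev, curr):
    """Toy flow: per-pixel intensity change between consecutive frames."""
    return [[c - p for p, c in zip(pr, cr)] for pr, cr in zip(prev, curr)]

def make_sample(prev, curr, label):
    """One training sample: (line info, optical flow info) -> label."""
    return {"lines": line_info(curr),
            "flow": optical_flow(prev, curr),
            "label": label}

prev = [[0, 0, 100], [0, 0, 100]]
curr = [[0, 0, 100], [10, 10, 100]]
sample = make_sample(prev, curr, label="lane_marking")
```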