G06V10/806

MULTI-MODAL, MULTI-TECHNIQUE VEHICLE SIGNAL DETECTION

A vehicle includes one or more cameras that capture a plurality of two-dimensional images of a three-dimensional object. A light detector and/or a semantic classifier search within those images for lights of the three-dimensional object. A vehicle signal detection module fuses information from the light detector and/or the semantic classifier to produce a semantic meaning for the lights. The vehicle can be controlled based on the semantic meaning. Further, the vehicle can include a depth sensor and an object projector. The object projector can determine regions of interest within the two-dimensional images, based on the depth sensor. The light detector and/or the semantic classifier can use these regions of interest to efficiently perform the search for the lights.
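A minimal sketch of the fusion step described above. The function names, label set, scores, and the weighted-average fusion rule are all illustrative assumptions, not the patent's actual method:

```python
def fuse_signal_evidence(light_scores, classifier_scores, weight=0.5):
    """Combine per-label confidences from a light detector and a
    semantic classifier into one semantic meaning for the lights.
    A simple weighted average stands in for the fusion module here."""
    labels = set(light_scores) | set(classifier_scores)
    fused = {
        label: weight * light_scores.get(label, 0.0)
               + (1.0 - weight) * classifier_scores.get(label, 0.0)
        for label in labels
    }
    # The label with the highest fused confidence becomes the
    # semantic meaning used to control the vehicle.
    return max(fused, key=fused.get), fused

meaning, scores = fuse_signal_evidence(
    {"left_turn": 0.8, "brake": 0.1},
    {"left_turn": 0.6, "brake": 0.3},
)
```

In this sketch the depth-sensor-derived regions of interest would simply restrict where the two detectors look before producing their score dictionaries.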

System for surveillance by integrating radar with a panoramic staring sensor

Described is a system for surveillance that integrates radar with a panoramic staring sensor. The system captures image frames of a field-of-view of a scene using a multi-camera panoramic staring sensor. The field-of-view is scanned with a radar sensor to detect an object of interest. A radar detection is received when the radar sensor detects the object of interest. A radar message indicating the presence of the object of interest is generated. Each image frame is marked with a timestamp. The image frames are stored in a frame storage database. The set of radar-based coordinates from the radar message is converted into a set of multi-camera panoramic sensor coordinates. A video clip comprising a sequence of image frames corresponding in time to the radar message is created. Finally, the video clip is displayed, showing the object of interest.
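A sketch of the coordinate conversion and clip-assembly steps, under assumed simplifications (a linear azimuth-to-column mapping for a 360-degree panorama and a fixed time window around the radar message; the patent's actual mapping and windowing may differ):

```python
def radar_to_panorama(azimuth_deg, image_width):
    """Map a radar azimuth (degrees) to a pixel column in a
    360-degree panoramic image via a hypothetical linear mapping."""
    return int((azimuth_deg % 360.0) / 360.0 * image_width)

def clip_for_detection(frames, t_detect, window=2.0):
    """Select stored (timestamp, frame) pairs within +/- window
    seconds of the radar message time to form the video clip."""
    return [f for (t, f) in frames if abs(t - t_detect) <= window]

frames = [(0.0, "f0"), (1.5, "f1"), (3.0, "f2"), (6.0, "f3")]
clip = clip_for_detection(frames, t_detect=2.0)
col = radar_to_panorama(90.0, image_width=3600)
```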

Method and apparatus for generating vehicle damage information

A method and an apparatus for generating vehicle damage information are provided. The method includes: acquiring a damage area image of a target vehicle; performing image segmentation on the damage area image to obtain a first suspected damage area; inputting the damage area image to a pre-trained detection model to obtain a second suspected damage area, the detection model being configured to detect a location of the suspected damage area in the image; determining a damage image feature based on the first suspected damage area and the second suspected damage area; and inputting the damage image feature to a pre-trained classification model to generate a damage type, the classification model being configured to characterize a corresponding relationship between the image feature and the damage type.
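One plausible way to combine the two suspected damage areas before feature extraction is to intersect their bounding boxes; this rule, and the (x1, y1, x2, y2) box format, are assumptions for illustration, not the patent's stated fusion:

```python
def combine_regions(seg_box, det_box):
    """Intersect the segmentation-based and detection-based suspected
    damage boxes (x1, y1, x2, y2); a hypothetical fusion rule."""
    x1 = max(seg_box[0], det_box[0])
    y1 = max(seg_box[1], det_box[1])
    x2 = min(seg_box[2], det_box[2])
    y2 = min(seg_box[3], det_box[3])
    if x2 <= x1 or y2 <= y1:
        return None  # no overlap: the two methods disagree
    return (x1, y1, x2, y2)

roi = combine_regions((10, 10, 50, 50), (30, 20, 80, 60))
```

The resulting region would then be cropped and passed to the classification model to produce the damage type.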

Generating multi modal image representation for an image

Technologies for generating a multi-modal representation of an image based on the image content are provided. The disclosed techniques include receiving an image, to be classified, that comprises one or more embedded text characters. The one or more embedded text characters are identified from the image and a first machine learning model is used to generate a text vector that represents a numerical representation of the one or more embedded text characters. A second machine learning model is used to generate an image vector that represents a numerical representation of the graphical portion of the image. The text vector and the image vector are used as input to generate a multi-modal vector that contains information from both the text vector and the image vector. The image may be classified into one of a plurality of image classifications based upon the information in the multi-modal vector.
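A toy sketch of the final two steps, assuming concatenation as the multi-modal combination and a nearest-prototype classifier; both choices, and all vectors and class names, are illustrative:

```python
def multimodal_vector(text_vec, image_vec):
    """Concatenate the text and image embeddings; concatenation is one
    common way to build a multi-modal vector and is assumed here."""
    return list(text_vec) + list(image_vec)

def classify(mm_vec, class_prototypes):
    """Assign the image to the class whose prototype vector is closest
    in squared Euclidean distance (illustrative classifier)."""
    def dist(proto):
        return sum((a - b) ** 2 for a, b in zip(mm_vec, proto))
    return min(class_prototypes, key=lambda c: dist(class_prototypes[c]))

mm = multimodal_vector([0.1, 0.2], [0.9])
label = classify(mm, {"meme": [0.0, 0.0, 1.0], "diagram": [1.0, 1.0, 0.0]})
```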

Audio-visual speech enhancement

Example speech enhancement systems include a spatio-temporal residual network configured to receive video data containing a target speaker and extract visual features from the video data, an autoencoder configured to receive input of an audio spectrogram and extract audio features from the audio spectrogram, and a squeeze-excitation fusion block configured to receive input of visual features from a layer of the spatio-temporal residual network and input of audio features from a layer of the autoencoder, and to provide an output to the decoder of the autoencoder. The decoder is configured to output a mask based upon the fusion of audio features and visual features by the squeeze-excitation fusion block. The mask is applied to the audio spectrogram to generate an enhanced magnitude spectrogram, from which an enhanced waveform is reconstructed.
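A structural sketch of squeeze-excitation-style gating on flat feature vectors: each audio channel is scaled by a sigmoid gate derived from the corresponding visual channel. The exact layer wiring in the patent (squeeze, bottleneck, excitation) is more involved; this shows only the gating idea:

```python
import math

def squeeze_excitation_fuse(audio_feats, visual_feats):
    """Gate each audio channel by a sigmoid weight computed from the
    matching visual channel (simplified excitation step)."""
    gates = [1.0 / (1.0 + math.exp(-v)) for v in visual_feats]
    return [a * g for a, g in zip(audio_feats, gates)]

fused = squeeze_excitation_fuse([1.0, 2.0], [0.0, 100.0])
```

A neutral visual feature (0.0) halves its audio channel, while a strongly positive one passes its channel through almost unchanged.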

System and method for detection of objects of interest in imagery

Described is a system for detecting objects of interest in imagery. The system is configured to receive an input video and generate an attention map. The attention map represents features found in the input video that represent potential objects-of-interest (OI). An eye-fixation map is generated based on a subject's eye fixations. The eye-fixation map also represents features found in the input video that are potential OI. A brain-enhanced synergistic attention map is generated by fusing the attention map with the eye-fixation map. The potential OI in the brain-enhanced synergistic attention map are scored, with scores that cross a predetermined threshold being used to designate potential OI as actual or final OI.
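The fusion and thresholding steps can be sketched as follows, assuming element-wise multiplication as the fusion operator (the abstract does not fix the operator) and toy 1x2 maps:

```python
def synergistic_map(attention, fixation):
    """Fuse the model attention map with the eye-fixation map
    element-wise; multiplication is one plausible fusion choice."""
    return [[a * f for a, f in zip(ra, rf)]
            for ra, rf in zip(attention, fixation)]

def final_oi(score_map, threshold):
    """Keep (row, col) cells whose fused score crosses the
    predetermined threshold, designating them final OI."""
    return [(i, j) for i, row in enumerate(score_map)
            for j, s in enumerate(row) if s >= threshold]

fused = synergistic_map([[0.9, 0.2]], [[0.8, 0.9]])
ois = final_oi(fused, threshold=0.5)
```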

IMAGE ANALYSIS METHOD SUPPORTING ILLNESS DEVELOPMENT PREDICTION FOR A NEOPLASM IN A HUMAN OR ANIMAL BODY

The present invention relates to an image analysis method for providing information to support illness development prediction regarding a neoplasm in a human or animal body. The method includes receiving first and second image data for the neoplasm at a first and a second moment in time, and deriving, for a plurality of image features, a first and a second image feature parameter value from the first and second image data. These feature parameter values are quantitative representations of the respective image features. The method further includes calculating, for each image feature, an image feature difference value as the difference between the first and second image feature parameter values, and deriving, based on a prediction model, a predictive value associated with the neoplasm for supporting treatment thereof. The prediction model includes a plurality of multiplier values associated with the image features. To calculate the predictive value, the method multiplies each image feature difference value by its associated multiplier value and combines the multiplied image feature difference values.
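The described calculation is a linear combination of feature differences. A minimal sketch, with made-up feature names, values, and multipliers, and summation assumed as the combining step (an intercept term, if the model has one, is omitted):

```python
def predictive_value(first_params, second_params, multipliers):
    """Multiply each image-feature difference (second minus first
    time point) by its model multiplier and sum the products."""
    return sum(multipliers[k] * (second_params[k] - first_params[k])
               for k in multipliers)

score = predictive_value(
    {"volume": 10.0, "texture": 0.4},   # first moment in time
    {"volume": 12.0, "texture": 0.5},   # second moment in time
    {"volume": 0.3, "texture": 2.0},    # model multipliers
)
```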

SYSTEM AND METHOD FOR AUTOMATED STEREOLOGY OF CANCER

A system and method for applying an ensemble of segmentations to a tissue sample at a blob level and at an image level to determine if the tissue sample is representative of cancerous tissue. The ensemble of segmentations at the image level is used to accept or reject images based upon the segmentation quality of the images and both the blob level segmentation and the image level segmentation are used to calculate a mean nuclear volume to discriminate between cancer and normal classes of tissue samples.
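A sketch of the final quantity described above: mean nuclear volume computed only over images accepted by the image-level segmentation-quality check. The record fields and the quality-score convention are illustrative assumptions:

```python
def mean_nuclear_volume(images, quality_threshold):
    """Average blob-level nuclear volumes over images whose
    segmentation-quality score passes the image-level check."""
    kept = [img for img in images if img["quality"] >= quality_threshold]
    volumes = [v for img in kept for v in img["blob_volumes"]]
    return sum(volumes) / len(volumes)

mnv = mean_nuclear_volume(
    [{"quality": 0.9, "blob_volumes": [100.0, 120.0]},
     {"quality": 0.3, "blob_volumes": [999.0]}],  # rejected image
    quality_threshold=0.5,
)
```

The resulting mean nuclear volume would then feed the cancer-versus-normal discrimination.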

BEAUTY PREDICTION METHOD AND DEVICE BASED ON MULTITASKING AND WEAK SUPERVISION, AND STORAGE MEDIUM
20220309828 · 2022-09-29

A beauty prediction method and device based on multitasking and weak supervision, and a storage medium are disclosed. The method includes the steps of pre-processing inputted facial images; allocating the pre-processed images to multiple tasks; extracting shared image features; and obtaining a plurality of classification results via a plurality of classification networks each including a residual network, a standard neural network and a classifier.
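The shared-feature, multi-head structure can be sketched as below. The patent's heads are residual networks with a standard neural network and a classifier; here each head is reduced to a linear scorer purely to show the structure, with invented task names and weights:

```python
def multitask_predict(shared_features, heads):
    """Run one shared feature vector through several task heads;
    each head is a simple linear scorer in this structural sketch."""
    return {task: sum(w * x for w, x in zip(weights, shared_features))
            for task, weights in heads.items()}

preds = multitask_predict(
    [1.0, 2.0],                                   # shared features
    {"beauty": [0.5, 0.25], "age": [0.0, 1.0]},   # per-task heads
)
```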

System and Method for Audio-Visual Speech Recognition

Disclosed herein is a method of performing speech recognition using audio and visual information, where the visual information provides data related to a person's face. Image preprocessing identifies regions of interest, which are then combined with the audio data before being processed by a speech recognition engine.