G06V10/806

OBJECT HEIGHT ESTIMATION FROM MONOCULAR IMAGES
20200380316 · 2020-12-03 ·

Systems and methods for estimating a height of an object from a monocular image are described herein. Objects are detected in the image, each object being indicated by a region of interest. The image is then cropped for each region of interest and the cropped image scaled to a predetermined size. The cropped and scaled image is then input into a convolutional neural network (CNN), the output of which is an estimated height for the object. The height may be represented by a mean of a probability distribution of possible sizes, a standard deviation, as well as a level of confidence. A location of the object may be determined based on the estimated height and region of interest. A ground truth dataset may be generated for training the CNN by simultaneously capturing a LIDAR sequence with a monocular image sequence.
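
The abstract does not say how a location is derived from the estimated height and the region of interest. One common approach is sketched below, assuming a pinhole camera model; the function names, the focal length in pixels, and the principal point are illustrative assumptions, not details from the application.

```python
def distance_from_height(est_height_m, roi_height_px, focal_length_px):
    """Range along the optical axis under a pinhole model: Z = f * H / h."""
    if roi_height_px <= 0:
        raise ValueError("ROI height must be positive")
    return focal_length_px * est_height_m / roi_height_px


def lateral_offset(roi_center_x_px, principal_x_px, distance_m, focal_length_px):
    """Sideways offset of the object from the optical axis: X = (u - cx) * Z / f."""
    return (roi_center_x_px - principal_x_px) * distance_m / focal_length_px


# Hypothetical values: a 1.7 m object whose ROI is 170 px tall, f = 1000 px.
z = distance_from_height(1.7, 170.0, 1000.0)
x = lateral_offset(740.0, 640.0, z, 1000.0)
print(z, x)
```

Under this assumption, an error in the estimated height translates proportionally into an error in range, which is one reason a standard deviation alongside the mean height is useful.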

METHOD AND APPARATUS FOR SAR IMAGE RECOGNITION BASED ON MULTI-SCALE FEATURES AND BROAD LEARNING

Disclosed are a method and an apparatus for SAR image recognition based on multi-scale features and broad learning. A region of interest is extracted from an original SAR image by centroid localization; the image is rotated and noise is added to augment the data; the image is downsampled; LBP features and LPQ features are extracted; an LBP feature vector X_LBP and an LPQ feature vector X_LPQ are concatenated and reduced in dimension by principal component analysis to obtain fusion feature data X_m; and the fusion feature data X_m is input to a broad learning network, which outputs a recognition result. By fusing the LBP features and the LPQ features, complementary information is fully utilized and redundant information is reduced. The broad learning network is used to improve the training speed and reduce the time cost. As a result, the recognition effect is more stable, robust and reliable.
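
As an illustration of the LBP step only, the sketch below computes the basic 8-neighbour local binary pattern code and a 256-bin histogram. The neighbour ordering, the `>=` comparison, and the function names are assumptions; the abstract does not specify which LBP variant is used, and the LPQ, PCA, and broad learning stages are not shown.

```python
def lbp_code(img, r, c):
    """Basic 3x3 LBP code for the pixel at (r, c); img is a list of rows."""
    center = img[r][c]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


# Hypothetical 3x3 patch with a single interior pixel.
patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
print(code)
```

A histogram of this kind would serve as the LBP feature vector that is concatenated with the LPQ vector before dimension reduction.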

AUTONOMOUS VEHICLE OPERATION USING ACOUSTIC MODALITIES
20200379108 · 2020-12-03 ·

Techniques for autonomous vehicle operation using acoustic modalities include using one or more acoustic sensors of a vehicle to receive acoustic waves from one or more objects. The acoustic waves have multiple wavelengths. The acoustic waves are clustered into one or more acoustic clusters based on the plurality of wavelengths. A particular acoustic cluster of the one or more acoustic clusters is selected based on signal processing of the one or more acoustic clusters. A particular object is associated with the particular acoustic cluster. An acoustic fingerprint of the particular object is generated based on the particular acoustic cluster. Characteristics of the particular object are determined based on the acoustic fingerprint of the particular object. A control circuit of the vehicle is used to operate the vehicle to avoid a collision with the particular object based on the characteristics of the particular object.

DETECTING KEY FRAMES IN VIDEO COMPRESSION IN AN ARTIFICIAL INTELLIGENCE SEMICONDUCTOR SOLUTION

A system for detecting key frames in a video may include a feature extractor configured to extract feature descriptors for each of the multiple image frames in the video. The feature extractor may be an embedded cellular neural network of an artificial intelligence (AI) chip. The system may also include a key frame extractor configured to determine one or more key frames in the multiple image frames based on the corresponding feature descriptors of the image frames. The key frame extractor may determine the key frames based on distance values between a first set of feature descriptors corresponding to a first subset of image frames and a second set of feature descriptors corresponding to a second subset of image frames. The system may output an alert based on determining the key frames and/or display the key frames. The system may also compress the video by removing the non-key frames.
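
A simplified sketch of distance-based key frame selection follows. It assumes one descriptor vector per frame, Euclidean distance, and a fixed threshold, and it compares each frame with the most recent key frame; the abstract describes distances between sets of descriptors for subsets of frames, so this is a reduced illustration rather than the claimed method.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_key_frames(descriptors, threshold):
    """Return indices of frames whose descriptor moved far from the last key frame."""
    if not descriptors:
        return []
    keys = [0]  # the first frame is always kept
    for i in range(1, len(descriptors)):
        if euclidean(descriptors[i], descriptors[keys[-1]]) > threshold:
            keys.append(i)
    return keys


# Hypothetical 2-D descriptors for five frames.
descriptors = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [3.0, 3.0]]
key_frames = select_key_frames(descriptors, threshold=0.5)
print(key_frames)
```

Dropping every index not in `key_frames` corresponds to the compression step mentioned at the end of the abstract.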

System and method for camera radar fusion

A method for camera radar fusion includes receiving, by a processor, radar object detection data for an object and modeling, by the processor, a three dimensional (3D) physical space kinematic model, including updating 3D coordinates of the object to generate updated 3D coordinates of the object in response to receiving the radar object detection data for the object. The method also includes transforming, by the processor, the updated 3D coordinates of the object to updated two dimensional (2D) coordinates of the object based on a 2D-3D calibrated mapping table, and modeling, by the processor, a 2D image plane kinematic model while modeling the 3D physical space kinematic model, where modeling the 2D image plane kinematic model includes updating coordinates of the object based on the updated 2D coordinates of the object.
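
The 3D update and the 3D-to-2D transform can be illustrated as follows. This sketch assumes a constant-velocity kinematic model and substitutes a pinhole projection for the 2D-3D calibrated mapping table named in the abstract; the function names and camera parameters are hypothetical.

```python
def predict_3d(position, velocity, dt):
    """Constant-velocity update of 3D coordinates (x, y, z) over dt seconds."""
    return tuple(p + v * dt for p, v in zip(position, velocity))


def project_to_image(point, focal_px, cx, cy):
    """Pinhole projection of a 3D camera-frame point to 2D pixel coordinates.

    Stands in for the calibrated mapping table, which is not reproduced here.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal_px * x / z + cx, focal_px * y / z + cy)


# Hypothetical object 20 m ahead, closing at 10 m/s, observed 1 s later.
pos = predict_3d((2.0, 0.0, 20.0), (0.0, 0.0, -10.0), 1.0)
uv = project_to_image(pos, 1000.0, 640.0, 360.0)
print(pos, uv)
```

The resulting 2D coordinates are what the image plane kinematic model would consume as its update.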

SYSTEMS AND METHODS FOR QUANTITATIVE PHENOTYPING OF FIBROSIS
20200372640 · 2020-11-26 ·

Systems and methods are provided for computer aided phenotyping of fibrosis-related conditions. A digital image indicates presence of collagens in a biological tissue sample. The image is processed to quantify parameters, each parameter describing a feature of the collagens that is expected to be different for different phenotypes of fibrosis. At least some features are tissue level features that describe macroscopic characteristics of the collagens, morphometric level features that describe morphometric characteristics of the collagens, and texture level features that describe an organization of the collagens. At least some of the plurality of parameters are statistics associated with histograms corresponding to distributions of the associated parameters across at least some of the digital image. At least some of the plurality of parameters are combined to obtain one or more composite scores that quantify a phenotype of fibrosis for the biological tissue sample.
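
To make the histogram-statistics and composite-score idea concrete, the sketch below summarizes per-parameter distributions and combines the means with fixed weights. The parameter names, sample values, and weights are hypothetical, and the weighted sum is only one possible way to combine parameters; the abstract does not specify the combination.

```python
import statistics


def histogram_stats(values):
    """Summary statistics of one parameter's distribution across the image."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),
        "median": statistics.median(values),
    }


def composite_score(stats_by_param, weights):
    """Weighted sum of per-parameter means (an assumed combination rule)."""
    return sum(w * stats_by_param[name]["mean"] for name, w in weights.items())


# Hypothetical measurements for two collagen parameters.
stats = {
    "fiber_length": histogram_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]),
    "fiber_width": histogram_stats([1.0, 1.0, 3.0, 3.0]),
}
score = composite_score(stats, {"fiber_length": 0.5, "fiber_width": 0.25})
print(score)
```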

APPARATUS AND METHOD FOR IMAGE PROCESSING FOR MACHINE LEARNING
20200372280 · 2020-11-26 ·

An image processing apparatus includes a superpixel extractor configured to extract a plurality of superpixels from an input original image, a backbone network including N feature extracting layers (here, N is a natural number of two or more) which divide the input original image into grids including a plurality of regions and generate an output value including a feature value for each of the divided regions, and a superpixel pooling layer configured to generate a superpixel feature value corresponding to each of the plurality of superpixels using a first output value to an Nth output value output from each of the N feature extracting layers.
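
A minimal sketch of superpixel pooling for a single layer is given below. It assumes average pooling, a scalar feature per grid cell, and square cells of a fixed size; the abstract combines outputs from all N layers and does not state the pooling rule, so these are illustrative choices.

```python
def superpixel_pool(labels, grid_features, cell_size):
    """Average the grid-cell feature over the pixels of each superpixel.

    labels: 2D list of superpixel ids, one per pixel.
    grid_features: 2D list with one scalar feature per grid cell.
    cell_size: side length in pixels of each square grid cell.
    """
    sums, counts = {}, {}
    for r, row in enumerate(labels):
        for c, sp in enumerate(row):
            feat = grid_features[r // cell_size][c // cell_size]
            sums[sp] = sums.get(sp, 0.0) + feat
            counts[sp] = counts.get(sp, 0) + 1
    return {sp: sums[sp] / counts[sp] for sp in sums}


# Hypothetical 4x4 image with three superpixels and a 2x2 feature grid.
labels = [[0, 0, 0, 1],
          [0, 0, 1, 1],
          [2, 2, 1, 1],
          [2, 2, 2, 1]]
grid_features = [[1.0, 3.0],
                 [5.0, 7.0]]
pooled = superpixel_pool(labels, grid_features, cell_size=2)
print(pooled)
```

With several layers, the same routine could be run once per layer with that layer's cell size, and the per-superpixel results concatenated.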

Methods and apparatus for autonomous robotic control

Sensory processing of visual, auditory, and other sensor information (e.g., visual imagery, LIDAR, RADAR) is conventionally based on stovepiped, or isolated, processing, with little interaction between modules. Biological systems, on the other hand, fuse multi-sensory information to identify nearby objects of interest more quickly, more efficiently, and with higher signal-to-noise ratios. Similarly, examples of the OpenSense technology disclosed herein use neurally inspired processing to identify and locate objects in a robot's environment. This enables the robot to navigate its environment more quickly and with lower computational and power requirements.

MULTI-MODAL EMOTION RECOGNITION DEVICE, METHOD, AND STORAGE MEDIUM USING ARTIFICIAL INTELLIGENCE
20200364446 · 2020-11-19 ·

A multi-modal emotion recognition system is disclosed. The system includes a data input unit for receiving video data and voice data of a user, a data pre-processing unit including a voice pre-processing unit for generating voice feature data from the voice data and a video pre-processing unit for generating one or more face feature data from the video data, and a preliminary inference unit for generating, based on the video data, situation determination data indicating whether the user's situation changes over a temporal sequence. The system further comprises a main inference unit for generating at least one sub feature map based on the voice feature data or the face feature data, and inferring the user's emotion state based on the sub feature map and the situation determination data.

OBJECT PREDICTION METHOD AND APPARATUS, AND STORAGE MEDIUM
20200364518 · 2020-11-19 ·

The present application relates to an object prediction method and apparatus, an electronic device, and a storage medium. The method is applied to a neural network and includes: performing feature extraction processing on a to-be-predicted object to obtain feature information of the to-be-predicted object; determining multiple intermediate prediction results for the to-be-predicted object according to the feature information; performing fusion processing on the multiple intermediate prediction results to obtain fusion information; and determining multiple target prediction results for the to-be-predicted object according to the fusion information. The method facilitates improving the accuracy of the multiple target prediction results.