Patent classifications
G06V10/806
COMPUTER AIDED DIAGNOSIS SYSTEM FOR DETECTING TISSUE LESION ON MICROSCOPY IMAGES BASED ON MULTI-RESOLUTION FEATURE FUSION
Embodiments of the present disclosure include a method, device and computer readable medium involving receiving image data to detect tissue lesions, passing the image data through at least one first convolutional neural network, segmenting the image data, fusing the segmented image data, and detecting tissue lesions.
SYSTEMS AND METHODS FOR UTILIZING MODELS TO IDENTIFY A VEHICLE ACCIDENT BASED ON VEHICLE SENSOR DATA AND VIDEO DATA CAPTURED BY A VEHICLE DEVICE
A device may receive sensor data and video data associated with a vehicle, and may process the sensor data, with a rule-based detector model, to determine whether a probability of a vehicle accident satisfies a first threshold. The device may preprocess acceleration data of the sensor data to generate calibrated acceleration data, and may process the calibrated acceleration data, with an anomaly detector model, to determine whether the calibrated acceleration data includes anomalies. The device may filter the sensor data to generate filtered sensor data, and may process the filtered sensor data and anomaly data, with a decision model, to determine whether the probability of the vehicle accident satisfies a second threshold. The device may process the filtered sensor data, the anomaly data, and the video data, with a machine learning model, to determine whether the vehicle accident has occurred, and may perform one or more actions.
IMAGE PROCESSING SYSTEM
Disclosed is a multi-modal convolutional neural network (CNN) for fusing image information from a frame-based camera, such as a near infra-red (NIR) camera, and an event camera, for analysing facial characteristics in order to produce classifications such as head pose or eye gaze. The neural network processes image frames acquired from each camera through a plurality of convolutional layers to provide a respective set of one or more intermediate images. The network fuses at least one corresponding pair of intermediate images generated from each of the image frames through an array of fusing cells. Each fusing cell is connected to at least a respective element of each intermediate image and is trained to weight each element from each intermediate image to provide the fused output. The neural network further comprises at least one task network configured to generate one or more task outputs for the region of interest.
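The per-element weighting performed by the fusing cells can be illustrated with a minimal sketch. It assumes two intermediate images of equal size and one weight per element per modality; the function name `fuse` and the weighted-sum form are illustrative assumptions, since the abstract does not state how a fusing cell combines its inputs, and in the disclosed network the weights would be learned rather than supplied by hand.

```python
def fuse(frame_map, event_map, frame_weights, event_weights):
    """Fuse two equally sized 2-D maps element by element.

    Assumption: each fusing cell computes a weighted sum of the two
    elements it is connected to. The weights are hypothetical inputs
    here; a trained network would provide them.
    """
    fused = []
    for f_row, e_row, fw_row, ew_row in zip(
        frame_map, event_map, frame_weights, event_weights
    ):
        fused.append(
            [fw * f + ew * e for f, e, fw, ew in zip(f_row, e_row, fw_row, ew_row)]
        )
    return fused


# One row, two elements: the first cell averages both modalities,
# the second cell relies on the frame-based camera only.
result = fuse([[1.0, 2.0]], [[3.0, 4.0]], [[0.5, 1.0]], [[0.5, 0.0]])
```

Keeping a separate weight for every element lets the fused output favour one camera in some image regions and the other camera elsewhere.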
SPATIAL PARKING PLACE DETECTION METHOD AND DEVICE, STORAGE MEDIUM, AND PROGRAM PRODUCT
The present disclosure provides a spatial parking place detection method and device, a storage medium and a program product, which relate to the field of data processing and, in particular, to the fields of computer vision, autonomous parking and autonomous driving. A specific implementation includes: acquiring ultrasonic data around a vehicle collected by an ultrasonic sensor on the vehicle, and image data around the vehicle collected by an image collection apparatus; determining a first spatial parking place around the vehicle according to the ultrasonic data, and determining a second spatial parking place around the vehicle according to the image data; fusing the first spatial parking place and the second spatial parking place that are located at an identical position to determine a spatial parking place at that position; and checking the availability of the detected spatial parking place, and determining available spatial parking places of the vehicle.
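The fusion step can be sketched as follows. The sketch assumes each detected place is a tuple `(center_x, center_y, width, length)` in metres, treats two places as being at an identical position when their centres lie within a tolerance, and fuses a matched pair by averaging. The representation, the tolerance and the averaging rule are all illustrative assumptions; the abstract does not specify them.

```python
import math


def fuse_places(ultrasonic_places, image_places, tolerance=1.0):
    """Fuse parking places reported by two sensors at the same position.

    Assumptions: places are (center_x, center_y, width, length) tuples,
    "identical position" means centres within `tolerance` metres, and a
    matched pair is fused by averaging each field.
    """
    fused = []
    for u in ultrasonic_places:
        for v in image_places:
            if math.hypot(u[0] - v[0], u[1] - v[1]) <= tolerance:
                fused.append(tuple((a + b) / 2 for a, b in zip(u, v)))
                break  # use at most one image match per ultrasonic place
    return fused


ultrasonic = [(1.0, 2.0, 2.5, 5.0)]
image = [(1.5, 2.0, 2.5, 5.5), (9.0, 2.0, 2.5, 5.0)]
places = fuse_places(ultrasonic, image)
```

Only the first image detection lies within the tolerance of the ultrasonic detection, so `places` holds a single fused place; the availability check described in the abstract would then run on that result.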
DETERMINING VISUALLY SIMILAR PRODUCTS
A computer-implemented method for determining image similarity includes determining, by a first neural network, a first feature value associated with a first characteristic of a first product based on an image of the first product. The method also includes determining, by a second neural network, a second feature value associated with a second characteristic of the first product based on the image of the first product. The method further involves calculating a first vector space distance between the first feature value and a third feature value associated with the first characteristic of a second product, and calculating a second vector space distance between the second feature value and a fourth feature value associated with the second characteristic of the second product. Additionally, the method includes determining a similarity value based on the first vector space distance and the second vector space distance.
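A minimal sketch of the final two steps, assuming Euclidean distance as the vector space distance and a weighted sum of the two distances mapped into (0, 1] as the similarity value. The feature vectors would come from the two neural networks; here they are plain lists, and the names `similarity`, `w1` and `w2` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(first_a, first_b, second_a, second_b, w1=0.5, w2=0.5):
    """Combine two per-characteristic distances into one similarity value.

    Assumption: distances are combined by a weighted sum and mapped to
    (0, 1] with 1 / (1 + d); the abstract does not fix either choice.
    """
    d1 = euclidean(first_a, first_b)    # first characteristic, e.g. shape
    d2 = euclidean(second_a, second_b)  # second characteristic, e.g. colour
    return 1.0 / (1.0 + w1 * d1 + w2 * d2)


same = similarity([1.0, 2.0], [1.0, 2.0], [0.5], [0.5])
different = similarity([0.0, 0.0], [3.0, 4.0], [1.0], [1.0])
```

Identical feature values give a similarity of 1.0, and the value falls toward 0 as either distance grows.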
Method for video recognition capable of encoding spatial and temporal relationships of concepts using contextual features
The proposed invention aims at encoding contextual information for video analysis and understanding, by encoding spatial and temporal relationships of objects and the main agent in a scene. The main target application of the invention is human activity recognition. The encoding of such spatial and temporal relationships may be crucial to distinguish different categories of human activities and may be important to help in the discrimination of different video categories, aiming at video classification, retrieval, categorization and other video analysis applications.
SYSTEM AND METHOD FOR ATTENTION-BASED SURFACE CRACK SEGMENTATION
This disclosure relates to a system and method for attention-based surface crack segmentation. Existing methods do not efficiently handle the sub-problem of data imbalance, and inaccurately predicted pixels are ignored. The present disclosure obtains a binary edge map by passing an m-channel image through an edge detection algorithm and concatenates the obtained binary edge map along the channel dimension to obtain an (m+1)-channel image. Feature maps are extracted from an encoder and a decoder by feeding the obtained (m+1)-channel image into a network, wherein the feature maps are convolved with an attention mask and merged in a fused network. The merged feature maps are upsampled and concatenated to obtain a final fused feature map. The final fused feature map is passed through a sigmoid activation function to obtain a probability map, which is iteratively thresholded to obtain a binary predicted image. The binary image is indicative of crack pixels.
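Two of the steps lend themselves to a short sketch: appending the binary edge map as an extra channel, and turning network outputs into a binary image. The sketch assumes images stored as nested lists (height x width x channels) and applies a single fixed threshold; the iterative thresholding scheme is not described in the abstract, so it is not reproduced, and the function names are hypothetical.

```python
import math


def add_edge_channel(image, edge_map):
    """Append a binary edge map to an m-channel image, giving m+1 channels.

    image: H x W x m nested lists; edge_map: H x W nested lists of 0/1.
    """
    return [
        [pixel + [edge] for pixel, edge in zip(img_row, edge_row)]
        for img_row, edge_row in zip(image, edge_map)
    ]


def binarize(logits, threshold=0.5):
    """Apply a sigmoid, then a single threshold (assumed, not iterative)."""
    return [
        [1 if 1.0 / (1.0 + math.exp(-v)) >= threshold else 0 for v in row]
        for row in logits
    ]


rgb = [[[10, 20, 30], [40, 50, 60]]]          # 1 x 2 image, m = 3
four_channel = add_edge_channel(rgb, [[0, 1]])  # 1 x 2 image, m + 1 = 4
mask = binarize([[-2.0, 0.0, 3.0]])             # 1 marks a crack pixel
```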
All-weather target detection method based on vision and millimeter wave fusion
An all-weather target detection method based on vision and millimeter wave fusion includes: simultaneously acquiring continuous image data and point cloud data using two types of sensors, a vehicle-mounted camera and a millimeter wave radar; pre-processing the image data and point cloud data; fusing the pre-processed image data and point cloud data by using a pre-established fusion model, and outputting a fused feature map; and inputting the fused feature map into a YOLOv5 detection network for detection, and outputting a target detection result by non-maximum suppression. The method fully fuses millimeter wave radar echo intensity and distance information with the vehicle-mounted camera images. It analyzes different features of a millimeter wave radar point cloud and fuses the features with image information by using different feature extraction structures and ways, so that the advantages of the two types of sensor data complement each other.
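The final non-maximum suppression step can be sketched in its common greedy form. The sketch assumes boxes given as `(x1, y1, x2, y2)` with one confidence score each and an IoU threshold of 0.5; the threshold and the exact variant used with the detection network are not stated in the abstract.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = nms(boxes, scores=[0.9, 0.8, 0.7])
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, and `kept` contains the indices of the first and third boxes.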
APPARATUS AND METHOD FOR IDENTIFYING REAL-TIME BIOMETRIC IMAGE
Provided are a computing device and methods for identifying a real-time biometric image. In certain aspects, disclosed is a method including the steps of: extracting first feature information from an nth (n is a natural number) biometric image among biometric images of an object photographed continuously over time, based on a machine learning model; generating fusion data using the first feature information of the nth biometric image and at least one item of sensor data among sensor data temporally corresponding to the (n+1)th or later biometric images; and extracting second feature information of the (n+1)th or later biometric images from the fusion data based on a second machine learning model. The present disclosure is a result developed through the Seoul Industry Promotion Agency's 2021 technology commercialization support project (TB210264), “Improvement and advancement of an explainable artificial intelligence prototype that detects major organs during laparoscopic surgery”.
Supplementing top-down predictions with image features
The described techniques relate to predicting object behavior based on top-down representations of an environment comprising top-down representations of image features in the environment. For example, a top-down representation may comprise a multi-channel image that includes semantic map information along with additional information for a target object and/or other objects in an environment. A top-down image feature representation may also be a multi-channel image that incorporates various tensors for different image features with channels of the multi-channel image, and may be generated directly from an input image. A prediction component can generate predictions of object behavior based at least in part on the top-down image feature representation, and in some cases, can generate predictions based on the top-down image feature representation together with the additional top-down representation.