Patent classifications
G06V10/806
Representation learning from video with spatial audio
A computer system is trained to understand audio-visual spatial correspondence using audio-visual clips having multi-channel audio. The computer system includes an audio subnetwork, video subnetwork, and pretext subnetwork. The audio subnetwork receives the two channels of audio from the audio-visual clips, and the video subnetwork receives the video frames from the audio-visual clips. In a subset of the audio-visual clips the audio-visual spatial relationship is misaligned, causing the audio-visual spatial cues for the audio and video to be incorrect. The audio subnetwork outputs an audio feature vector for each audio-visual clip, and the video subnetwork outputs a video feature vector for each audio-visual clip. The audio and video feature vectors for each audio-visual clip are merged and provided to the pretext subnetwork, which is configured to classify the merged vector as either having a misaligned audio-visual spatial relationship or not. The subnetworks are trained based on the loss calculated from the classification.
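As a rough illustration of how the training examples for such a pretext task might be built, the sketch below swaps the left and right audio channels for a fraction of clips and labels those clips as misaligned. The channel swap is an assumption (the abstract only says the spatial relationship is misaligned), and the names `make_pretext_example`, `merge_features`, and `flip_prob` are hypothetical.

```python
import random

def make_pretext_example(left, right, frames, flip_prob=0.5, rng=random):
    """Return ((audio_a, audio_b), frames, label); label 1 means misaligned.

    Assumption: misalignment is produced by swapping the two audio channels.
    """
    if rng.random() < flip_prob:
        return (right, left), frames, 1   # swapped channels: spatial cues are wrong
    return (left, right), frames, 0       # original channels: spatial cues are correct

def merge_features(audio_vec, video_vec):
    """Merge the two feature vectors by concatenation (one possible merge)."""
    return list(audio_vec) + list(video_vec)

# Toy usage with placeholder data standing in for real audio and video.
audio, frames, label = make_pretext_example([0.1, 0.2], [0.3, 0.4], ["frame0"], flip_prob=1.0)
merged = merge_features([1.0, 2.0], [3.0])
```

The merged vector is what a pretext classifier would receive; the classifier itself and the two subnetworks are not shown here.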
SYSTEM AND METHOD FOR GENERATING DIFFERENTIAL DIAGNOSIS IN A HEALTHCARE ENVIRONMENT
A system for generating a differential diagnosis in a healthcare environment is presented. The system includes a receiver configured to receive one or more user inputs and generate a plurality of input streams. The system further includes a processor including a multi-stream neural network, a training module, and a differential diagnosis generator. The multi-stream neural network includes a plurality of feature extractor sub-networks and a combiner sub-network. The training module includes a feature optimizer configured to train each feature-extractor sub-network individually, and a combiner optimizer configured to train the plurality of feature extractor sub-networks and the combiner sub-network together. The training module is configured to alternate between the feature optimizer and the combiner optimizer until a training loss reaches a defined saturation value. The differential diagnosis generator is configured to generate the differential diagnosis based on a combined feature set generated by the trained multi-stream network.
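The alternating schedule described above can be sketched as a simple loop. This is a minimal illustration, assuming that "reaches a defined saturation value" means the loss falls to or below a threshold; `alternate_training`, `feature_step`, `combiner_step`, and `max_rounds` are hypothetical names, and the step functions stand in for real optimizer updates.

```python
def alternate_training(feature_step, combiner_step, saturation_loss, max_rounds=100):
    """Alternate between two optimizer steps until the loss saturates.

    Each step function performs one training phase and returns the current loss.
    Assumption: saturation means loss <= saturation_loss.
    """
    history = []
    for round_idx in range(max_rounds):
        use_feature = round_idx % 2 == 0
        loss = feature_step() if use_feature else combiner_step()
        history.append(("feature" if use_feature else "combiner", loss))
        if loss <= saturation_loss:
            break
    return history

# Toy usage: each phase simply halves a shared loss value.
state = {"loss": 1.0}

def toy_step():
    state["loss"] *= 0.5
    return state["loss"]

history = alternate_training(toy_step, toy_step, saturation_loss=0.1)
```

In a real system the feature step would update each feature extractor sub-network on its own input stream, while the combiner step would update all sub-networks jointly.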
LEARNING MODEL ARCHITECTURE FOR IMAGE DATA SEMANTIC SEGMENTATION
A learning model may provide a hierarchy of convolutional layers configured to perform convolutions upon image features, each layer other than a topmost layer convolving the image features at a lower resolution to a higher layer, and each layer other than a bottommost layer returning the image features to a lower layer. Each layer fuses the lower resolution image features received from a higher layer with same resolution image features convolved at the layer, so as to combine large-scale and small-scale features of images. The number of layers of the hierarchy may be substantially equal to the number of lateral convolutions at a bottommost convolutional layer. The bottommost convolutional layer ultimately passes the fused features to an attention mapping module, which utilizes two attention mapping pathways in combination to detect non-local dependencies and interactions between large-scale and small-scale features of images without de-emphasizing local interactions.
Reducing false positive detections of malignant lesions using multi-parametric magnetic resonance imaging
Systems and methods for reducing false positive detections of malignant lesions are provided. A candidate malignant lesion is detected in one or more medical images, such as multi-parametric magnetic resonance images. One or more patches associated with the candidate malignant lesion are extracted from the one or more medical images. The candidate malignant lesion is classified as being a true positive detection of a malignant lesion or a false positive detection of the malignant lesion based on the one or more extracted patches using a trained machine learning network. The results of the classification are output.
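The patch extraction and classification steps might look roughly like the following sketch. It assumes a 2D image stored as nested lists, square patches centered on the candidate, and a decision made by thresholding the mean patch score; `extract_patch`, `classify_candidate`, `score_fn`, and `threshold` are hypothetical, and `score_fn` is only a stand-in for the trained machine learning network.

```python
def extract_patch(image, center, half_size):
    """Crop a square patch around center (row, col), clipped to the image bounds."""
    row, col = center
    r0, r1 = max(0, row - half_size), min(len(image), row + half_size + 1)
    c0, c1 = max(0, col - half_size), min(len(image[0]), col + half_size + 1)
    return [line[c0:c1] for line in image[r0:r1]]

def classify_candidate(patches, score_fn, threshold=0.5):
    """Label a candidate from the mean score of its patches.

    Assumption: score_fn returns a malignancy score in [0, 1] per patch.
    """
    scores = [score_fn(patch) for patch in patches]
    mean_score = sum(scores) / len(scores)
    return "true positive" if mean_score >= threshold else "false positive"

# Toy usage: a 5x5 image whose pixel values are row * 5 + col.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
patch = extract_patch(image, center=(2, 2), half_size=1)
result = classify_candidate([patch], score_fn=lambda p: 0.9)
```

With multi-parametric images, one patch per image (or per sequence) could be passed in the `patches` list.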
Compound expression recognition method with few samples of multi-domain adversarial learning
Disclosed is a compound expression recognition method with few samples of multi-domain adversarial learning. To extract diverse and complex compound expression features from few samples, multiple small sample datasets are fused and divided into expression sub-domains, and multi-domain adversarial learning is performed to improve the performance of compound expression recognition. Based on the generative adversarial network framework, the face domain and the contour-independent compound expression domain are fused in the generative network to enhance diversity and complexity, and two discriminators are designed to guide the generator. The face discriminator uses the face domain to guide the generator and to discriminate whether the generator produces expression-independent face identity attributes, so that the generator has identity diversity. The compound expression fusing discriminator fuses the basic expression domain and the contour-related compound expression domain together to guide the generator and to discriminate the complexity of the expressions generated by the generator.
Detecting boxes
A method for detecting boxes includes receiving a plurality of image frame pairs for an area of interest including at least one target box. Each image frame pair includes a monocular image frame and a respective depth image frame. For each image frame pair, the method includes determining corners for a rectangle associated with the at least one target box within the respective monocular image frame. Based on the determined corners, the method includes the following: performing edge detection and determining faces within the respective monocular image frame; and extracting planes corresponding to the at least one target box from the respective depth image frame. The method includes matching the determined faces to the extracted planes and generating a box estimation based on the determined corners, the performed edge detection, and the matched faces of the at least one target box.
Systems and methods for selecting trajectories based on interpretable semantic representations
Systems and methods for generating semantic occupancy maps are provided. In particular, a computing system can obtain map data for a geographic area and sensor data obtained by an autonomous vehicle. The computing system can identify feature data included in the map data and sensor data. For a respective semantic object type from a plurality of semantic object types, the computing system can determine, using the feature data as input to a respective machine-learned model from a plurality of machine-learned models, one or more occupancy maps for one or more timesteps in the future, wherein the respective machine-learned model is trained to determine occupancy for the respective semantic object type. The computing system can select a trajectory for the autonomous vehicle based on a plurality of occupancy maps associated with the plurality of semantic object types.
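One way the final selection step could work is sketched below. The abstract does not say how a trajectory is scored, so this assumes each candidate trajectory is a list of grid cells (one per future timestep) and that its cost is the summed predicted occupancy over all semantic object types; `trajectory_cost` and `select_trajectory` are hypothetical names.

```python
def trajectory_cost(trajectory, occupancy_by_type):
    """Sum predicted occupancy along a trajectory over all object types.

    trajectory: list of (row, col) cells, one per future timestep.
    occupancy_by_type: {type_name: [grid_t0, grid_t1, ...]}, where
    grid[row][col] is an occupancy probability in [0, 1].
    """
    cost = 0.0
    for t, (row, col) in enumerate(trajectory):
        for grids in occupancy_by_type.values():
            cost += grids[t][row][col]
    return cost

def select_trajectory(candidates, occupancy_by_type):
    """Pick the candidate with the lowest summed occupancy (assumed criterion)."""
    return min(candidates, key=lambda traj: trajectory_cost(traj, occupancy_by_type))

# Toy usage: two object types, two timesteps, 2x2 grids.
empty = [[0.0, 0.0], [0.0, 0.0]]
occupancy = {
    "pedestrian": [empty, [[0.75, 0.0], [0.0, 0.0]]],
    "vehicle": [empty, [[0.0, 0.25], [0.0, 0.0]]],
}
traj_a = [(1, 0), (0, 0)]  # ends in the cell likely occupied by a pedestrian
traj_b = [(1, 0), (0, 1)]  # ends in the cell less likely occupied by a vehicle
best = select_trajectory([traj_a, traj_b], occupancy)
```

Keeping one occupancy map per object type makes the cost easy to inspect, since each type's contribution can be read off separately.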
METHOD AND SYSTEM FOR PROCESSING A TASK WITH ROBUSTNESS TO MISSING INPUT INFORMATION
A unit is disclosed for generating combined feature maps in accordance with a processing task to be performed, the unit comprising a feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed; and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit operates in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generating unit down to one numeric value in the at least one corresponding combined feature map.
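A combining operation of this kind can be illustrated with an element-wise mean over whichever feature maps are available. The choice of the mean is an assumption (the abstract only requires that corresponding values be reduced to one value), and `combine_feature_maps` is a hypothetical name; a missing modality is represented here by `None`.

```python
def combine_feature_maps(feature_maps):
    """Reduce corresponding values across feature maps to one value each.

    feature_maps: list of 2D maps (nested lists) of equal shape; None marks a
    missing modality. Assumption: the reduction is an element-wise mean.
    """
    available = [fm for fm in feature_maps if fm is not None]
    if not available:
        raise ValueError("at least one modality is required")
    rows, cols = len(available[0]), len(available[0][0])
    return [
        [sum(fm[r][c] for fm in available) / len(available) for c in range(cols)]
        for r in range(rows)
    ]

# Toy usage: two modalities present, then one modality missing.
both = combine_feature_maps([[[1.0, 2.0]], [[3.0, 4.0]]])
one_missing = combine_feature_maps([[[1.0, 2.0]], None])
```

Because the output shape does not depend on how many modalities are present, later processing stages can stay the same when an input is missing.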
Gesture Recognition Using Multiple Antenna
Various embodiments wirelessly detect micro-gestures using multiple antennas of a gesture sensor device. At times, the gesture sensor device transmits multiple outgoing radio frequency (RF) signals, each outgoing RF signal transmitted via a respective antenna of the gesture sensor device. The outgoing RF signals are configured to help capture information that can be used to identify micro-gestures performed by a hand. The gesture sensor device captures incoming RF signals generated by the outgoing RF signals reflecting off of the hand, and then analyzes the incoming RF signals to identify the micro-gesture.