Patent classifications
G06V10/806
Shape fusion for image analysis
Various types of image analysis benefit from a multi-stream architecture that allows the analysis to consider shape data. A shape stream can process image data in parallel with a primary stream, where data from layers of a network in the primary stream is provided as input to a network of the shape stream. The shape data can be fused with the primary analysis data to produce more accurate output, such as accurate boundary information when the shape data is used with semantic segmentation data produced by the primary stream. A gate structure can be used to connect the intermediate layers of the primary and shape streams, using higher-level activations to gate lower-level activations in the shape stream. Such a gate structure can help focus the shape stream on the relevant information and reduce the additional weight the shape stream adds to the network.
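The gating described above can be pictured with a minimal plain-Python sketch (hypothetical names and additive fusion are assumptions, not the patent's exact formulation): a sigmoid of a higher-level primary-stream activation acts as an attention weight on the corresponding lower-level shape-stream activation, and the gated shape features are then fused with the primary features.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate_activations(shape_acts, primary_acts):
    """Gate lower-level shape-stream activations with higher-level
    primary-stream activations: sigmoid(primary) acts as an
    attention weight on each shape activation."""
    return [s * sigmoid(p) for s, p in zip(shape_acts, primary_acts)]

def fuse(primary_acts, gated_shape_acts):
    """Simple additive fusion of the two streams (one of several
    plausible fusion choices)."""
    return [p + g for p, g in zip(primary_acts, gated_shape_acts)]

shape = [0.5, -1.0, 2.0]
primary = [10.0, 0.0, -10.0]     # strong / neutral / suppressing gates
gated = gate_activations(shape, primary)
fused = fuse(primary, gated)
```

A strongly positive primary activation passes the shape activation almost unchanged, while a strongly negative one suppresses it, which is how the gate focuses the shape stream on relevant regions.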
METHOD AND APPARATUS FOR TRAINING CLASSIFIER
This application relates to artificial intelligence and provides a method for training a classifier. One example method includes: obtaining a first training sample, where the first training sample includes a corresponding semantic tag; obtaining a plurality of second training samples, where each of the second training samples includes a corresponding semantic tag; determining a target sample from the plurality of second training samples based on semantic similarities between the first training sample and the plurality of second training samples; and training the classifier based on the first training sample, the target sample, and a semantic similarity between the first training sample and the target sample.
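The target-sample selection step can be sketched as follows; this is a minimal illustration assuming semantic tags are represented as embedding vectors and cosine similarity is the semantic-similarity measure (both are assumptions, since the abstract does not fix either choice).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two tag-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def select_target_sample(first_embedding, second_embeddings):
    """Pick the second training sample whose semantic-tag embedding
    is most similar to the first sample's; return (index, similarity)
    so the similarity can also weight the subsequent training step."""
    sims = [cosine_similarity(first_embedding, e) for e in second_embeddings]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]

first = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
idx, sim = select_target_sample(first, candidates)
```

The returned similarity can then enter the loss, e.g. as a weight on the target sample's contribution, matching the abstract's use of the similarity during training.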
Image matching method using feature point matching
An image matching method includes: extracting a plurality of feature points from a reference image; selecting a first feature point from the feature points, and selecting a first reference search area comprising the first feature point; setting a first matching candidate search area corresponding to the first reference search area from a target image, and extracting a plurality of feature points from the first matching candidate search area; selecting a second feature point closest to the first feature point in the first reference search area, and selecting a first straight line connecting the first and second feature points; generating a plurality of segments from the feature points extracted from the first matching candidate search area; and determining a first matching straight line matching a length and an angle of the first straight line, from the segments generated from the feature points extracted from the first matching candidate search area.
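The final matching step, finding a segment in the candidate area whose length and angle match the reference straight line, reduces to simple plane geometry. A hedged sketch (tolerance values and the first-match policy are illustrative assumptions):

```python
import math

def length_and_angle(p, q):
    """Length and angle (radians) of the segment from point p to q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def find_matching_segment(reference_segment, candidate_segments,
                          len_tol=0.05, ang_tol=0.05):
    """Return the first candidate whose length (within a relative
    tolerance) and angle (within an absolute tolerance, radians)
    both match the reference segment, or None if nothing matches."""
    ref_len, ref_ang = length_and_angle(*reference_segment)
    for seg in candidate_segments:
        seg_len, seg_ang = length_and_angle(*seg)
        if (abs(seg_len - ref_len) <= len_tol * ref_len
                and abs(seg_ang - ref_ang) <= ang_tol):
            return seg
    return None

ref = ((0.0, 0.0), (3.0, 4.0))          # length 5, angle atan2(4, 3)
cands = [((10.0, 10.0), (13.0, 14.1)),  # close in length and angle
         ((0.0, 0.0), (5.0, 0.0))]      # same length, wrong angle
match = find_matching_segment(ref, cands)
```

In practice the angle comparison would also account for in-plane rotation between the two images; the sketch assumes the images are roughly aligned.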
DEVICES, SYSTEMS, AND METHODS FOR FEATURE ENCODING
Devices, systems, and methods obtain data in a first modality; propagate the data in the first modality through a neural network, thereby generating network outputs, wherein the neural network includes a first-stage neural network and a second-stage neural network, wherein the first-stage neural network includes two or more layers, wherein each layer of the two or more layers of the first-stage neural network includes a plurality of respective nodes, wherein the second-stage neural network includes two or more layers, one of which is an input layer and one of which is an output layer, and wherein each node in each layer of the first-stage neural network is connected to the input layer of the second-stage neural network; calculate a gradient of a loss function based on the network outputs; backpropagate the gradient through the neural network; and update the neural network based on the backpropagation of the gradient.
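The dense connection described, every node in every first-stage layer feeding the second stage's input layer, amounts to concatenating all intermediate activations rather than only the final layer's output. A toy sketch (the single-weight "layer" is a deliberate simplification, not the patent's network):

```python
def first_stage_forward(x, layers):
    """Run x through a stack of toy layers, keeping every layer's
    activations. Each 'layer' is a list of per-node weights, and
    each node computes weight * sum(inputs)."""
    activations = []
    for weights in layers:
        x = [w * sum(x) for w in weights]
        activations.append(x)
    return activations

def second_stage_input(activations):
    """Concatenate the activations of every first-stage node, so each
    node is connected to the second stage's input layer."""
    return [a for layer in activations for a in layer]

acts = first_stage_forward([1.0, 2.0], [[0.5, 1.0, -1.0], [1.0]])
fused_input = second_stage_input(acts)
```

Because the gradient of the second stage's loss reaches every first-stage node directly, backpropagation (as described in the abstract) updates all layers without the signal having to pass through the later first-stage layers first.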
SYNTHETIC APERTURE RADAR (SAR) IMAGE TARGET DETECTION METHOD
The present disclosure provides a synthetic aperture radar (SAR) image target detection method. The present disclosure takes the anchor-free target detection algorithm YOLOX as the basic framework, reconstructs the backbone feature extraction network from the lightweight perspective, and replaces the depthwise separable convolution in MobilenetV2 with one ordinary convolution and one depthwise separable convolution. The number of channels in the feature map is reduced by half through the ordinary convolution, features input from the ordinary convolution are further extracted by the depthwise separable convolution, and the convolutional results from the two convolutions are spliced. The present disclosure highlights the unique strong scattering characteristic of the SAR target through the attention enhancement pyramid attention network (CSEMPAN) by integrating channel and spatial attention mechanisms. In view of the multiple scales and strong sparseness of the SAR target, the present disclosure uses an ESPHead.
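The lightweight motivation behind the convolution restructuring can be seen from parameter-count arithmetic. The sketch below compares a standard convolution, a depthwise separable convolution, and one plausible reading of the described block (a 1×1 ordinary convolution halving the channels, then a depthwise separable convolution on that half, with the two outputs concatenated); the 1×1 kernel for the ordinary convolution is an assumption, as the abstract does not state its size.

```python
def ordinary_conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise
    convolution, as in MobileNet-style blocks."""
    return c_in * k * k + c_in * c_out

def modified_block_params(c_in, c_out, k):
    """One reading of the described block: a 1 x 1 ordinary conv
    produces c_out // 2 channels, a depthwise separable conv further
    processes that half, and the two halves are concatenated."""
    half = c_out // 2
    return (ordinary_conv_params(c_in, half, 1)
            + depthwise_separable_params(half, half, k))

# Example: 32 input channels, 64 output channels, 3 x 3 kernels
standard = ordinary_conv_params(32, 64, 3)
separable = depthwise_separable_params(32, 64, 3)
modified = modified_block_params(32, 64, 3)
```

For these example sizes the standard convolution needs 18,432 parameters while both lightweight variants need about an eighth of that, which is the usual argument for such restructurings.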
APPARATUS AND METHOD WITH IMAGE PROCESSING
A processor-implemented method with image processing includes: generating a feature map of a first image and detecting a target region in the first image based on the feature map; correcting the detected target region; and processing an object corresponding to the target region, based on the corrected target region.
METHODS AND DEVICES FOR GAZE ESTIMATION
Methods and systems for estimating a gaze direction of an individual using a trained neural network. Inputs to the neural network include a face image and an image of a visually significant eye in the face image. Feature representations are extracted for the face image and significant eye image and feature fusion is performed on the feature representations to generate a fused feature representation. The fused feature representation is input into a trained gaze estimator to output a gaze vector including gaze angles, the gaze vector representing a gaze direction. The disclosed network may enable gaze estimation performance on user devices typically having limited hardware and computational resources such as mobile devices.
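The feature-fusion and gaze-estimation steps can be illustrated with a minimal sketch; concatenation as the fusion operation and a linear head producing two gaze angles (yaw, pitch) are assumptions standing in for the trained components the abstract describes.

```python
def fuse_features(face_feat, eye_feat):
    """Fuse face and significant-eye feature vectors by concatenation
    (one common choice of feature fusion)."""
    return face_feat + eye_feat

def gaze_estimator(fused, weights, bias):
    """Toy linear head mapping the fused features to two gaze angles
    (yaw, pitch); weights is a 2 x len(fused) matrix."""
    return [sum(w * f for w, f in zip(row, fused)) + b
            for row, b in zip(weights, bias)]

fused = fuse_features([0.2, 0.4], [0.1])
gaze = gaze_estimator(fused,
                      weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
                      bias=[0.0, 0.0])
# gaze is approximately [0.2, 0.5] for these toy weights
```

Keeping the head this small is consistent with the abstract's aim of running on mobile devices with limited hardware and computational resources.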
Generating reports of three dimensional images
Various techniques are provided for generating reports of three dimensional (3D) images. The techniques include identifying a plurality of volume features in a 3D image using a first machine learning (ML) module trained with annotated 3D images, and identifying a plurality of semantic representations associated with the 3D image using a second ML module trained with the annotated 3D images and reports associated with the annotated 3D images. The techniques further include generating a report of the 3D image based on the volume features and the semantic representations using a third ML module trained with the reports and outputs generated by the first ML module and the second ML module using the annotated 3D images and the reports.
Training apparatus, training method, and non-transitory computer-readable recording medium
An anomaly detection apparatus generates pieces of image data using a generator and trains the generator and a discriminator that discriminates whether image data generated by the generator is real or fake. The anomaly detection apparatus trains the generator such that the generator, in generating the pieces of image data to maximize the discrimination error of the discriminator, generates at least one piece of specified image data to reduce the discrimination error at a fixed rate with respect to the pieces of image data, and trains, based on the pieces of image data and the at least one piece of specified image data, the discriminator to minimize the discrimination error.
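The batch-composition idea, a fixed rate of "specified" samples intended to reduce rather than maximize the discriminator's error, can be sketched as follows; the two generator callables and the batch layout are illustrative assumptions.

```python
import random

def generate_batch(generator, specified_generator, batch_size, specified_rate):
    """Build a training batch in which a fixed fraction of samples
    comes from a 'specified' generator (intended to reduce the
    discriminator's error) and the rest from the adversarial
    generator (intended to maximize it)."""
    n_specified = int(batch_size * specified_rate)
    batch = [specified_generator() for _ in range(n_specified)]
    batch += [generator() for _ in range(batch_size - n_specified)]
    random.shuffle(batch)  # mix the two kinds before training
    return batch, n_specified

# Toy stand-ins for the two image generators
batch, n_specified = generate_batch(lambda: "adversarial",
                                    lambda: "specified",
                                    batch_size=10, specified_rate=0.2)
```

The discriminator is then trained on the whole mixed batch to minimize its error, matching the two-sided objective in the abstract.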
MACHINE-LEARNING TECHNIQUES FOR OXYGEN THERAPY PREDICTION USING MEDICAL IMAGING DATA AND CLINICAL METADATA
Apparatuses, systems, and techniques to train one or more neural networks based, at least in part, on medical imaging data and clinical metadata, or to perform inference using one or more neural networks trained as such. In at least one embodiment, one or more circuits train one or more neural networks to predict a treatment for a patient suspected or confirmed to have COVID-19 based, at least in part, on medical imaging data and clinical metadata.