G06V10/806

IMAGE PROCESSING METHOD, APPARATUS, ELECTRONIC DEVICE AND COMPUTER READABLE STORAGE MEDIUM

An image processing apparatus including: at least one memory; and at least one processor coupled to the at least one memory and configured to implement: an image acquisition module configured to acquire an input image including an object region; a mask image generation module configured to generate a mask image based on the input image; and an image inpainting module configured to extract, using an encoding network, a fusion feature map corresponding to the input image based on the input image and the mask image, and to inpaint the object region in the input image using a decoding network based on the fusion feature map to obtain an inpainting result.
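The abstract does not say how the image and mask are combined before the encoding network, or how the decoded result is merged with the input. A common convention in inpainting, assumed here and not taken from the disclosure, is to blank the object region, append the mask as an extra channel, and afterwards keep the original pixels outside the mask. The helpers `build_encoder_input` and `composite` are hypothetical names for this sketch.

```python
# Hypothetical helpers illustrating common inpainting conventions; the
# disclosure does not specify how the encoder input is formed or how the
# decoder output is merged, so both functions are assumptions.
from typing import List

Image = List[List[List[float]]]  # [channel][row][col]
Mask = List[List[int]]           # 1 = object region to inpaint, 0 = keep


def build_encoder_input(image: Image, mask: Mask) -> Image:
    """Blank the object region in every channel, then append the mask as an extra channel."""
    masked = [
        [[px * (1 - m) for px, m in zip(row, mrow)] for row, mrow in zip(channel, mask)]
        for channel in image
    ]
    mask_channel = [[float(m) for m in mrow] for mrow in mask]
    return masked + [mask_channel]


def composite(image: Image, decoded: Image, mask: Mask) -> Image:
    """Keep original pixels outside the mask and take decoder pixels inside it."""
    return [
        [
            [d if m else x for x, d, m in zip(row, drow, mrow)]
            for row, drow, mrow in zip(channel, dchannel, mask)
        ]
        for channel, dchannel in zip(image, decoded)
    ]


# Usage: one 2x2 channel with the top-right pixel marked as the object region.
img = [[[1.0, 2.0], [3.0, 4.0]]]
msk = [[0, 1], [0, 0]]
enc_in = build_encoder_input(img, msk)   # 2 channels: masked image + mask
result = composite(img, [[[9.0, 9.0], [9.0, 9.0]]], msk)
```

Compositing guarantees that pixels outside the object region are returned unchanged, whatever the decoder produces there.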

RADAR-BASED INDOOR LOCALIZATION AND TRACKING SYSTEM

Embodiments of the present disclosure describe mechanisms for a radar-based indoor localization and tracking system. One example includes a monitoring unit that includes a radar source, a camera unit, and one or more processors coupled to the radar source and the camera unit. The monitoring unit is configured to: generate point cloud data associated with an object; execute Point Cloud Library (PCL) preprocessing based at least on the point cloud data; execute Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering; execute multi-object tracking on the object; and execute an image PCL overlay based on the point cloud data to generate real-time data associated with the object.
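The DBSCAN step groups radar returns that are densely packed and marks isolated returns as noise. The sketch below is a plain textbook DBSCAN in pure Python with a brute-force neighbor search; the `eps` and `min_pts` values in the usage lines are illustrative assumptions, not parameters from the disclosure, and a production system would more likely use a library implementation with a spatial index.

```python
# Minimal DBSCAN sketch over 3-D radar points (pure Python, O(n^2) neighbor search).
# The eps and min_pts values used below are illustrative, not from the disclosure.
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[Optional[int]]:
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep growing
        cluster += 1
    return labels  # every entry is an int once the loop finishes


# Usage: two tight groups of returns and one isolated reflection.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
          (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0),
          (20.0, 20.0, 20.0)]
labels = dbscan(points, eps=0.5, min_pts=2)
```

Each resulting cluster label can then be treated as one candidate object for the multi-object tracking stage.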

Learning method and learning device for training an object detection network by using attention maps and testing method and testing device using the same

A method for training an object detection network by using attention maps is provided. The method includes steps of: (a) an on-device learning device inputting training images into a feature extraction network, inputting outputs of the feature extraction network into an attention network and a concatenation layer, and inputting outputs of the attention network into the concatenation layer; (b) the on-device learning device inputting outputs of the concatenation layer into an RPN and an ROI pooling layer, inputting outputs of the RPN into a binary converter and the ROI pooling layer, and inputting outputs of the ROI pooling layer into a detection network to thereby output object detection data; and (c) the on-device learning device training at least one of the feature extraction network, the detection network, the RPN, and the attention network through backpropagation using object detection losses, RPN losses, and cross-entropy losses.
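The abstract names a binary converter and a cross-entropy loss but does not define either. One plausible reading, which is purely an assumption here, is that the converter rasterizes RPN proposal boxes into a 0/1 mask and that the attention map is supervised against this mask with pixel-wise binary cross-entropy. The functions `boxes_to_binary_mask`, `binary_cross_entropy`, and `total_loss` are hypothetical.

```python
# Hypothetical sketch: the abstract does not define the binary converter or
# how the cross-entropy loss is formed. Here we ASSUME the converter rasterizes
# RPN proposal boxes into a 0/1 mask, and the attention map is trained against
# that mask with pixel-wise binary cross-entropy.
import math
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), end-exclusive pixel coordinates


def boxes_to_binary_mask(boxes: Sequence[Box], height: int, width: int) -> List[List[int]]:
    """Set every pixel covered by at least one proposal box to 1."""
    mask = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                mask[y][x] = 1
    return mask


def binary_cross_entropy(attention: List[List[float]], target: List[List[int]],
                         eps: float = 1e-7) -> float:
    """Mean pixel-wise BCE between attention probabilities and a 0/1 target."""
    total, count = 0.0, 0
    for arow, trow in zip(attention, target):
        for p, t in zip(arow, trow):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1 - p)
            count += 1
    return total / count


def total_loss(det_loss: float, rpn_loss: float, ce_loss: float) -> float:
    # Unweighted sum; any weighting between the three losses is an assumption.
    return det_loss + rpn_loss + ce_loss


# Usage: one proposal covering columns 1-2 of the top two rows of a 3x4 map.
mask = boxes_to_binary_mask([(1, 0, 3, 2)], height=3, width=4)
uniform_loss = binary_cross_entropy([[0.5] * 4 for _ in range(3)], mask)
```

A uniform attention map of 0.5 yields a loss of log 2 per pixel, so training pushes the attention toward the proposal regions.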

Method and System for Controlling Machines Based on Object Recognition
20210142110 · 2021-05-13

A method includes: capturing one or more images of an unorganized collection of items inside a first machine; determining one or more item types of the unorganized collection of items from the one or more images, comprising: dividing a respective image in the one or more images into a respective plurality of sub-regions; performing feature detection on the respective plurality of sub-regions to obtain a respective plurality of regional feature vectors, wherein a regional feature vector for a sub-region indicates characteristics for a plurality of predefined local item features for the sub-region; generating an integrated feature vector by combining the respective plurality of regional feature vectors; and applying a plurality of binary classifiers to the integrated feature vector; and selecting a machine setting for the first machine based on the determined one or more item types in the unorganized collection of items.
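The item-type determination can be sketched as a pipeline: split the image into a grid of sub-regions, compute a small feature vector per sub-region, concatenate the vectors into one integrated vector, and run one binary classifier per item type. The regional features (mean and max intensity), the linear classifiers, and the item-type names below are placeholders chosen for illustration, not the "predefined local item features" of the disclosure.

```python
# Sketch of the pipeline: grid sub-regions -> regional feature vectors ->
# concatenated integrated vector -> one binary classifier per item type.
# The features (mean and max intensity) and linear classifiers are placeholders.
from typing import Dict, List, Tuple

Gray = List[List[float]]
Classifier = Tuple[List[float], float]  # (weights, bias) of a hypothetical linear model


def split_into_subregions(img: Gray, grid_rows: int, grid_cols: int) -> List[Gray]:
    """Divide the image into grid_rows x grid_cols equal tiles, row-major order."""
    h, w = len(img), len(img[0])
    rh, cw = h // grid_rows, w // grid_cols
    regions = []
    for i in range(grid_rows):
        for j in range(grid_cols):
            regions.append([row[j * cw:(j + 1) * cw] for row in img[i * rh:(i + 1) * rh]])
    return regions


def regional_features(region: Gray) -> List[float]:
    pixels = [p for row in region for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]


def integrated_feature_vector(img: Gray, grid_rows: int = 2, grid_cols: int = 2) -> List[float]:
    """Concatenate the regional feature vectors of all sub-regions."""
    vec: List[float] = []
    for region in split_into_subregions(img, grid_rows, grid_cols):
        vec.extend(regional_features(region))
    return vec


def classify(vec: List[float], classifiers: Dict[str, Classifier]) -> List[str]:
    """Apply each binary classifier; return the item types that score above zero."""
    return [name for name, (w, b) in classifiers.items()
            if sum(wi * xi for wi, xi in zip(w, vec)) + b > 0]


# Usage with a 4x4 toy image and two hypothetical item-type classifiers.
img = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 2.0, 2.0], [0.0, 0.0, 2.0, 2.0]]
vec = integrated_feature_vector(img)
classifiers = {"towels": ([1.0] + [0.0] * 7, -0.5),
               "jeans": ([0.0] * 6 + [1.0, 0.0], -5.0)}
detected = classify(vec, classifiers)
```

Using independent binary classifiers lets several item types be reported for the same load, which a single multi-class softmax would not.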

Detecting similarity between images
10977528 · 2021-04-13

Techniques for determining image similarity are described. For example, a computer-implemented method is described that comprises: receiving a request to determine similarity between a first image and at least one other image; determining the similarity between the first image and the at least one other image based upon one or more Gram matrix-based style values and one or more vector distance calculation-based content values, as determined from one or more outputs of layers of a convolutional neural network; and providing an indication of the similarity between the first image and the at least one other image.
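The difference between the two kinds of values can be made concrete. A Gram matrix of channel-by-position features sums over positions, so it captures which channels co-activate (style) while discarding where they activate; a plain vector distance on the same features is position-sensitive (content). The sketch below assumes features are given as `[channel][flattened position]` lists and normalizes the Gram matrix by the number of positions, which is one common choice rather than the disclosed one.

```python
# Gram-matrix style distance vs. vector-distance content distance on CNN
# feature maps given as [channel][flattened spatial position].
# Normalizing the Gram matrix by the number of positions is an assumption.
import math
from typing import List

Features = List[List[float]]


def gram_matrix(feats: Features) -> List[List[float]]:
    """G[i][j] = mean over positions of channel_i * channel_j."""
    n = len(feats[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in feats] for fi in feats]


def style_distance(f1: Features, f2: Features) -> float:
    """Squared Frobenius distance between the two Gram matrices."""
    g1, g2 = gram_matrix(f1), gram_matrix(f2)
    return sum((a - b) ** 2 for r1, r2 in zip(g1, g2) for a, b in zip(r1, r2))


def content_distance(f1: Features, f2: Features) -> float:
    """Euclidean distance between the flattened feature maps."""
    return math.sqrt(sum((a - b) ** 2 for c1, c2 in zip(f1, f2) for a, b in zip(c1, c2)))


# Usage: f2 is f1 with its spatial positions reversed.
f1 = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
f2 = [[2.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
style = style_distance(f1, f2)
content = content_distance(f1, f2)
```

Rearranging positions leaves the style distance at zero while the content distance becomes 2.0, which is why the two values are complementary signals for similarity.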

Framework for Training Machine-Learned Models on Extremely Large Datasets

A MapReduce-based training framework exploits both data parallelism and model parallelism to scale training of complex models. Particular model architectures facilitate and benefit from the use of such a training framework. As one example, a machine-learned model can include a shared feature extraction portion configured to receive and process a data input to produce an intermediate feature representation and a plurality of prediction heads that are configured to receive and process the intermediate feature representation to respectively produce a plurality of predictions. For example, the data input can be a video and the plurality of predictions can be a plurality of classifications for content of the video (e.g., relative to a plurality of classes).
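The data-parallel half of the idea can be expressed directly as map and reduce: each shard's mapper returns a partial gradient sum and an example count, and the reducer adds them up, which reproduces the full-batch gradient exactly. The toy model `y_hat = w * x` with squared loss is an assumption for illustration; model parallelism and the multi-head architecture are not shown.

```python
# Data-parallel gradient computation expressed as map and reduce.
# Model: y_hat = w * x with mean squared error; all values are illustrative.
from functools import reduce
from typing import List, Tuple

Shard = List[Tuple[float, float]]  # (x, y) pairs held by one worker


def map_shard(w: float, shard: Shard) -> Tuple[float, int]:
    """Map step: sum of per-example gradients of (w*x - y)^2, plus the example count."""
    grad_sum = sum(2 * (w * x - y) * x for x, y in shard)
    return grad_sum, len(shard)


def reduce_pair(a: Tuple[float, int], b: Tuple[float, int]) -> Tuple[float, int]:
    """Reduce step: add partial gradient sums and counts."""
    return a[0] + b[0], a[1] + b[1]


def global_gradient(w: float, shards: List[Shard]) -> float:
    total, count = reduce(reduce_pair, (map_shard(w, s) for s in shards))
    return total / count


def train(shards: List[Shard], w: float = 0.0, lr: float = 0.05, steps: int = 200) -> float:
    """Plain gradient descent driven by the map-reduced gradient."""
    for _ in range(steps):
        w -= lr * global_gradient(w, shards)
    return w


# Usage: data from y = 3x split unevenly across two workers.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]]
w_fit = train(shards)
```

Returning sums and counts rather than per-shard means keeps the result correct when shards have different sizes.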

SYSTEM AND METHOD FOR AUTOMATED DIAGNOSIS OF SKIN CANCER TYPES FROM DERMOSCOPIC IMAGES
20210118550 · 2021-04-22

Disclosed are a content-based image retrieval (CBIR) system and related methods that serve as a diagnostic aid for determining whether a dermoscopic image correlates with a skin cancer type. Systems and methods according to aspects of the invention use as a reference a set of images of pathologically confirmed benign or malignant past cases, drawn from a collection of different classes, that are highly similar to the unknown new case in question, along with their diagnostic profiles. Systems and methods according to aspects of the invention predict which class of skin cancer is associated with a particular patient's skin lesion and may be employed as a diagnostic aid by general practitioners and dermatologists.
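A CBIR-based prediction of this kind can be sketched as nearest-neighbor retrieval: rank confirmed reference cases by similarity to the query's feature vector, return the top matches for the clinician to inspect, and take a majority vote over their labels. Cosine similarity, majority voting, `k = 3`, and the two-dimensional toy features are all assumptions; the abstract does not state the similarity measure or decision rule.

```python
# Minimal CBIR-style sketch: rank reference cases by cosine similarity of
# feature vectors, then predict the majority class among the top-k matches.
# Similarity measure, voting rule, and k are assumptions, not the disclosed method.
import math
from collections import Counter
from typing import List, Sequence, Tuple

Case = Tuple[Sequence[float], str]  # (feature vector, confirmed diagnosis label)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query: Sequence[float], references: List[Case], k: int) -> List[Case]:
    """Return the k reference cases most similar to the query."""
    return sorted(references, key=lambda case: cosine_similarity(query, case[0]),
                  reverse=True)[:k]


def predict(query: Sequence[float], references: List[Case], k: int = 3) -> Tuple[str, List[Case]]:
    """Majority label among the top-k matches, plus the matches themselves for review."""
    top = retrieve(query, references, k)
    label = Counter(lbl for _, lbl in top).most_common(1)[0][0]
    return label, top


# Usage with toy two-dimensional feature vectors.
references = [([1.0, 0.0], "benign"), ([0.9, 0.1], "benign"),
              ([0.0, 1.0], "malignant"), ([0.1, 0.9], "malignant"),
              ([0.2, 1.0], "malignant")]
label, matches = predict([0.05, 1.0], references)
```

Returning the retrieved cases alongside the label reflects the diagnostic-aid role: the clinician can compare the new lesion against the similar confirmed cases rather than relying on a bare prediction.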

COMPUTER AIDED DIAGNOSIS SYSTEM FOR DETECTING TISSUE LESION ON MICROSCOPY IMAGES BASED ON MULTI-RESOLUTION FEATURE FUSION
20210133958 · 2021-05-06

Embodiments of the present disclosure include a method, device, and computer readable medium involving receiving image data to detect tissue lesions, passing the image data through at least one first convolutional neural network, segmenting the image data, fusing the segmented image data, and detecting tissue lesions.
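The abstract does not specify how segmentations at different resolutions are fused. A simple assumed rule is to upsample every per-pixel lesion probability map to the finest resolution with nearest-neighbor repetition, average them, and threshold the result. The function names, the 0.5 threshold, and the requirement that each map size divides the target size are assumptions of this sketch.

```python
# Hypothetical multi-resolution fusion: upsample each square probability map
# to the finest resolution, average pixel-wise, then threshold.
# Averaging and the 0.5 threshold are assumptions, not the disclosed rule.
from typing import List, Tuple

Grid = List[List[float]]  # square per-pixel lesion probability map


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Repeat every pixel factor x factor times (nearest-neighbor upsampling)."""
    return [[v for v in row for _ in range(factor)] for row in grid for _ in range(factor)]


def fuse(maps: List[Grid], size: int) -> Grid:
    """Upsample each map to size x size (size must be a multiple of each map's size) and average."""
    resized = [upsample_nearest(m, size // len(m)) for m in maps]
    return [
        [sum(m[i][j] for m in resized) / len(resized) for j in range(size)]
        for i in range(size)
    ]


def detect(fused: Grid, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Pixel coordinates whose fused probability reaches the threshold."""
    return [(i, j) for i, row in enumerate(fused) for j, p in enumerate(row) if p >= threshold]


# Usage: a full-resolution and a half-resolution segmentation of the same field.
fine = [[0.9, 0.1], [0.1, 0.1]]
coarse = [[0.8]]
lesions = detect(fuse([fine, coarse], size=2))
```

Here the coarse map raises confidence everywhere, but only the pixel that both resolutions agree on survives the threshold.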

AUDIO-VISUAL SPEECH ENHANCEMENT

Example speech enhancement systems include a spatio-temporal residual network configured to receive video data containing a target speaker and extract visual features from the video data, an autoencoder configured to receive input of an audio spectrogram and extract audio features from the audio spectrogram, and a squeeze-excitation fusion block configured to receive input of visual features from a layer of the spatio-temporal residual network and input of audio features from a layer of the autoencoder, and to provide an output to the decoder of the autoencoder. The decoder is configured to output a mask based upon the fusion of audio features and visual features by the squeeze-excitation fusion block, and instructions are executable to apply the mask to the audio spectrogram to generate an enhanced magnitude spectrogram and to reconstruct an enhanced waveform from the enhanced magnitude spectrogram.
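A squeeze-and-excitation block, in its standard form, squeezes each channel to one number by global average pooling, passes the resulting vector through a bottleneck of two fully connected layers (ReLU, then sigmoid), and rescales each channel by its gate. The sketch below applies that to the concatenation of audio and visual channels, then applies a predicted mask to a magnitude spectrogram element-wise. The weights `w1` and `w2` are hypothetical stand-ins for learned parameters, and waveform reconstruction (e.g. an inverse STFT) is not shown.

```python
# Squeeze-and-excitation gating over concatenated audio and visual channels,
# followed by element-wise mask application. Weights are hypothetical.
import math
from typing import List

Channels = List[List[float]]  # [channel][flattened position]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def se_fusion(audio: Channels, visual: Channels, w1: Matrix, w2: Matrix) -> Channels:
    """w1 has shape (C // r, C) and w2 has shape (C, C // r), with C the total channel count."""
    fused = audio + visual                                  # channel concatenation
    squeeze = [sum(ch) / len(ch) for ch in fused]           # global average pooling
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeeze))) for row in w1]  # FC + ReLU
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]      # FC + sigmoid
    return [[v * g for v in ch] for ch, g in zip(fused, gates)]  # rescale each channel


def apply_mask(mask: Matrix, spectrogram: Matrix) -> Matrix:
    """Element-wise product of a predicted mask and a magnitude spectrogram."""
    return [[m * s for m, s in zip(mrow, srow)] for mrow, srow in zip(mask, spectrogram)]


# Usage: one audio channel and one visual channel, bottleneck of size 1.
out = se_fusion([[1.0, 1.0]], [[2.0, 2.0]], w1=[[1.0, 1.0]], w2=[[0.0], [10.0]])
enhanced = apply_mask([[0.0, 1.0]], [[3.0, 4.0]])
```

The gates let the network learn how much to trust each modality per channel, for example leaning on visual channels when the audio is heavily corrupted.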

Method and apparatus for SAR image recognition based on multi-scale features and broad learning

Disclosed are a method and an apparatus for SAR image recognition based on multi-scale features and broad learning. A region of interest of an original SAR image is extracted by centroid localization; the image is rotated and noise is added to augment the data volume; the image is downsampled; LBP features and LPQ features are extracted; an LBP feature vector X_LBP and an LPQ feature vector X_LPQ are concatenated and reduced in dimension by principal component analysis to obtain fused feature data X_m; and X_m is input to a broad learning network for image recognition, which outputs a recognition result. By fusing the LBP features and the LPQ features, complementary information is fully utilized and redundant information is reduced. The broad learning network improves training speed and reduces time cost. As a result, recognition is more stable, robust, and reliable.
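Two steps of this pipeline can be sketched with the standard library: the basic 8-neighbor LBP histogram that forms X_LBP, and a PCA projection used for dimension reduction. LPQ needs local Fourier phase information and is replaced here by a hypothetical zero placeholder, PCA is reduced to its leading component found by power iteration, and the broad learning network is not shown. All function names are illustrative.

```python
# Basic 8-neighbor LBP histogram (X_LBP), concatenation with a placeholder
# X_LPQ, and a one-component PCA via power iteration. Illustrative only.
import math
from typing import List, Tuple

Gray = List[List[float]]

# Neighbor offsets clockwise from the top-left pixel; bit k of the code is set
# when neighbor k is >= the center pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Gray, r: int, c: int) -> int:
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: Gray) -> List[float]:
    """Normalized 256-bin histogram of LBP codes over interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist]


def first_principal_component(data: List[List[float]], iters: int = 100) -> Tuple[List[float], List[float]]:
    """Mean and leading eigenvector of the sample covariance, via power iteration."""
    n, d = len(data), len(data[0])
    mean = [sum(row[k] for row in data) / n for k in range(d)]
    centered = [[row[k] - mean[k] for k in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # assumes the start vector is not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(row: List[float], mean: List[float], component: List[float]) -> float:
    """Coordinate of a centered sample along the principal component."""
    return sum((x - m) * c for x, m, c in zip(row, mean, component))


# Usage: X_LBP from a toy 3x3 patch, concatenated with a placeholder X_LPQ.
img = [[1.0, 9.0, 1.0], [9.0, 5.0, 9.0], [1.0, 9.0, 1.0]]
x_lbp = lbp_histogram(img)
x_lpq = [0.0] * 256  # hypothetical placeholder; real LPQ uses local Fourier phase
x_cascaded = x_lbp + x_lpq
mean, pc = first_principal_component([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
```

In practice PCA would be fitted on the concatenated vectors of many training images and keep several components to form X_m; the toy call above only checks the mechanics on data lying along one direction.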