G06V10/806

Item identification method, system and electronic device

An item identification method, system and electronic device are provided. The method includes: acquiring multi-frame images of the item by an image capturing device; processing the multi-frame images of the item to obtain position information and category information of the item in each frame image; acquiring auxiliary information of the item by an information capturing device; performing multi-modality fusion on the position information and the auxiliary information to obtain a fusion result; and determining an identification result of the item according to the category information and the fusion result. Through at least some embodiments of the present disclosure, the problem of low accuracy when identifying an item in the related art is at least partially solved.
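The fusion and identification steps above can be sketched in Python. This is a minimal illustration, not the patented pipeline: the auxiliary information is assumed to be a weight reading from a scale, the per-frame detections and the category weight priors are invented for the example, and the multi-modality fusion is reduced to a Gaussian re-weighting of per-frame visual votes.

```python
import numpy as np

def fuse_and_identify(frame_detections, aux_weight_g, category_priors):
    """Fuse per-frame visual detections with an auxiliary weight reading.

    frame_detections: list of (category, confidence, bbox) over frames.
    aux_weight_g: scalar reading from a hypothetical scale sensor.
    category_priors: {category: (mean_weight_g, std_weight_g)}.
    """
    # Accumulate category votes from the per-frame category information.
    votes = {}
    for cat, conf, _bbox in frame_detections:
        votes[cat] = votes.get(cat, 0.0) + conf
    # Re-weight each category's visual vote by how well the auxiliary
    # weight reading matches that category's expected weight (Gaussian).
    fused = {}
    for cat, vote in votes.items():
        mu, sigma = category_priors[cat]
        likelihood = np.exp(-0.5 * ((aux_weight_g - mu) / sigma) ** 2)
        fused[cat] = vote * likelihood
    return max(fused, key=fused.get)

dets = [("apple", 0.9, (0, 0, 10, 10)),
        ("peach", 0.8, (1, 1, 11, 11)),
        ("apple", 0.7, (0, 1, 10, 11))]
priors = {"apple": (180.0, 20.0), "peach": (150.0, 25.0)}
result = fuse_and_identify(dets, 182.0, priors)  # weight reading favors "apple"
```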

Method, system and electronic device for processing audio-visual data

A method, a system and an electronic device for processing audio-visual data are provided. In the method, a first dataset is obtained, where the first dataset includes several data pairs, and each of the data pairs in the first dataset includes a video frame and an audio clip that match each other. A multi-channel feature extraction network model is established to extract the visual features of each video frame and the auditory features of each audio clip in the first dataset. A contrastive loss function model is established using the extracted visual features and the auditory features to train the multi-channel feature extraction network model. A classifier is established to determine whether an input audio-visual data pair is matched.
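The contrastive objective can be illustrated with a small NumPy sketch. The feature extraction networks are omitted; `vis` and `aud` stand in for already-extracted visual and auditory embeddings of matched pairs, and the InfoNCE-style loss shown here is one common choice of contrastive loss, assumed for illustration rather than taken from the patent.

```python
import numpy as np

def contrastive_loss(vis, aud, temperature=0.1):
    """InfoNCE-style loss over a batch of (visual, auditory) embeddings:
    row i of `vis` should be most similar to row i of `aud`."""
    vis = vis / np.linalg.norm(vis, axis=1, keepdims=True)
    aud = aud / np.linalg.norm(aud, axis=1, keepdims=True)
    logits = vis @ aud.T / temperature           # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matched pairs sit on the diagonal of the similarity matrix.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
vis = rng.normal(size=(4, 8))
aud = vis + 0.01 * rng.normal(size=(4, 8))  # nearly matched pairs -> low loss
mismatched = rng.normal(size=(4, 8))        # unrelated pairs -> high loss
```

Minimizing this loss pulls matched visual/auditory embeddings together and pushes mismatched ones apart, which is what lets a downstream classifier decide whether a pair is matched.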

System and method for produce detection and classification

Systems, methods, and computer-readable storage media for object detection and classification, and particularly produce detection and classification. A system configured according to this disclosure can receive, at a processor, an image of an item. The system can then perform, across multiple pre-trained neural networks, feature detection on the image, resulting in feature maps of the image. These feature maps can be concatenated and combined, then input into an additional neural network for feature detection on the combined feature map, resulting in tiered neural network features. The system then classifies, via the processor, the item based on the tiered neural network features.
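The tiered-feature flow can be sketched as follows. The pre-trained networks are stood in for by fixed random 1×1 projections (real systems would use actual pretrained backbones), so only the data flow — extract per-network feature maps, concatenate, pass through an additional network, classify on the result — matches the abstract.

```python
import numpy as np

def make_extractor(out_channels, seed):
    """Stand-in for one pre-trained network: a fixed random 1x1 conv + ReLU."""
    w = np.random.default_rng(seed).normal(size=(out_channels, 3))
    def extract(img):                        # img: (H, W, 3)
        return np.maximum(img @ w.T, 0.0)    # (H, W, out_channels)
    return extract

extractors = [make_extractor(4, s) for s in range(3)]

def tiered_features(img):
    # Feature maps from each pre-trained network, concatenated per pixel.
    fmap = np.concatenate([ex(img) for ex in extractors], axis=-1)  # (H, W, 12)
    # Additional network on the combined map -> tiered neural network features.
    w2 = np.random.default_rng(99).normal(size=(6, fmap.shape[-1]))
    tiered = np.maximum(fmap @ w2.T, 0.0)
    return tiered.mean(axis=(0, 1))          # global average pool -> (6,)

rng = np.random.default_rng(1)
img = rng.random((8, 8, 3))
feats = tiered_features(img)                 # feature vector for a classifier
```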

Method, apparatus, and device for fusing features applied to small target detection, and storage medium

Embodiments of the present disclosure disclose a method, apparatus, and device for fusing features applied to small target detection, and a storage medium, relating to the field of computer vision technology. A particular embodiment of the method for fusing features applied to small target detection comprises: acquiring feature maps output by convolutional layers in a Backbone network; performing convolution on the feature maps to obtain input feature maps of feature layers, the feature layers representing resolutions of the input feature maps; and fusing, based on a densely connected feature pyramid network, the input feature maps of each feature layer to obtain output feature maps of the feature layer. Since no additional convolutional layer is introduced for feature fusion, the detection performance for small targets may be enhanced without additional parameters, and the detection ability for small targets may be improved under computing resource constraints.
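The parameter-free dense fusion can be sketched as below. This is an assumed minimal reading of the abstract: every output level sums all input levels after a nearest-neighbour resize, and no convolution (hence no new parameters) is introduced in the fusion itself.

```python
import numpy as np

def resize_nn(fmap, size):
    """Nearest-neighbour resize of an (H, W, C) map to (size, size, C)."""
    h, w, _ = fmap.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return fmap[rows][:, cols]

def dense_fpn_fuse(inputs):
    """Densely-connected fusion: each output level averages ALL input
    levels, resized to that level's resolution. No conv layers are
    introduced, so fusion adds no parameters."""
    outputs = []
    for level in inputs:
        size = level.shape[0]
        fused = sum(resize_nn(f, size) for f in inputs)
        outputs.append(fused / len(inputs))
    return outputs

# Three feature layers at decreasing resolution, constant values 1, 2, 3.
levels = [np.ones((s, s, 8)) * i for i, s in enumerate([32, 16, 8], 1)]
outs = dense_fpn_fuse(levels)
```

Because every output level sees every input level, fine high-resolution detail reaches the coarse levels and vice versa, which is the property the abstract credits for better small-target detection.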

IMAGE FUSION METHOD BASED ON FOURIER SPECTRUM EXTRACTION

The present invention discloses an image fusion method based on Fourier spectrum extraction, which comprises: step 1, inputting a plurality of to-be-fused images to a processor of a computer by an input unit of the computer, and performing the following steps by the processor: performing Fourier transform on images at different focus positions; in the transformed frequency domain space, extracting at each spatial frequency the frequency component of whichever image has the maximum amplitude at that frequency; taking that component as the frequency component of the fused image at the corresponding spatial frequency; traversing each frequency to generate the frequency domain representation of the fused image; and finally performing inverse Fourier transform on the frequency domain representation to obtain the fused image; and step 2, outputting the fused image obtained by the processor by an output unit of the computer.
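The frequency-domain selection described above maps directly onto a few lines of NumPy. This is a bare sketch of the stated transform/select/invert steps, without the claimed input/output units or any pre-alignment of the source images.

```python
import numpy as np

def fourier_fuse(images):
    """Fuse multi-focus images: at every spatial frequency keep the
    component from whichever image has the largest spectral amplitude
    there, then invert back to the spatial domain."""
    spectra = np.stack([np.fft.fft2(img) for img in images])  # (N, H, W)
    winner = np.argmax(np.abs(spectra), axis=0)               # (H, W)
    fused_spectrum = np.take_along_axis(spectra, winner[None], axis=0)[0]
    return np.real(np.fft.ifft2(fused_spectrum))

# Two toy "focus" images, each sharp in a different region.
a = np.zeros((16, 16)); a[4:8, 4:8] = 1.0
b = np.zeros((16, 16)); b[10:14, 10:14] = 1.0
fused = fourier_fuse([a, b])
```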

Single-channel and multi-channel source separation enhanced by lip motion
11348253 · 2022-05-31

Methods and systems are provided for implementing source separation techniques, and more specifically performing source separation on mixed source single-channel and multi-channel audio signals enhanced by inputting lip motion information from captured image data, including selecting a target speaker facial image from a plurality of facial images captured over a period of interest; computing a motion vector based on facial features of the target speaker facial image; and separating, based on at least the motion vector, audio corresponding to a constituent source from a mixed source audio signal captured over the period of interest. The mixed source audio signal may be captured from single-channel or multi-channel audio capture devices. Separating audio from the audio signal may be performed by a fusion learning model comprising a plurality of learning sub-models. Separating the audio from the audio signal may be performed by a blind source separation (“BSS”) learning model.
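The motion-vector computation and its use as a separation cue can be sketched as follows. The fusion learning model and BSS model are not reproduced; a crude energy gate over lip-motion magnitude stands in for them, and the landmark layout and frame length are invented for the example.

```python
import numpy as np

def lip_motion_vector(landmarks):
    """Frame-to-frame displacement of lip landmarks.
    landmarks: (T, K, 2) array of K lip points over T video frames."""
    return np.diff(landmarks, axis=0).reshape(landmarks.shape[0] - 1, -1)

def separate_by_motion(mixture, motion, frame_len):
    """Crude separation stand-in: keep audio frames where lip-motion
    energy is high (target speaker likely talking), zero the rest.
    The patent's fusion learning model is replaced by this simple
    gating purely for illustration."""
    energy = np.linalg.norm(motion, axis=1)
    gate = (energy > energy.mean()).astype(float)
    out = mixture.copy()
    for t, g in enumerate(gate):
        out[t * frame_len:(t + 1) * frame_len] *= g
    return out

# 5 video frames of 3 lip points; the lips move only between frames 2 and 3.
landmarks = np.zeros((5, 3, 2)); landmarks[3:] = 1.0
motion = lip_motion_vector(landmarks)        # (4, 6) motion vectors
mixture = np.ones(16)                        # 4 audio frames of 4 samples
separated = separate_by_motion(mixture, motion, frame_len=4)
```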

METHOD FOR PROCESSING AUDIO AND VIDEO INFORMATION, ELECTRONIC DEVICE AND STORAGE MEDIUM

A method for processing audio and video information includes: audio information and video information of an audio and video file are acquired; feature fusion is performed on a spectrum feature of the audio information and a video feature of the video information based on time information of the audio information and time information of the video information to obtain at least one fused feature; it is determined, based on the at least one fused feature, whether the audio information and the video information are synchronous.
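The time-based fusion and the synchrony decision can be sketched as below. The spectrum/video feature extractors are omitted, timestamps are in integer milliseconds, and the trained classifier is replaced by a cosine-similarity check between the two halves of each fused feature; all of these are assumptions for the example.

```python
import numpy as np

def fuse_by_time(audio_feats, audio_ts, video_feats, video_ts):
    """For each video frame, pair it with the audio feature whose
    timestamp is nearest, and concatenate the two feature vectors."""
    fused = []
    for ts, vf in zip(video_ts, video_feats):
        i = int(np.argmin(np.abs(audio_ts - ts)))
        fused.append(np.concatenate([audio_feats[i], vf]))
    return np.stack(fused)

def sync_score(fused, audio_dim):
    """Toy synchrony check: mean cosine similarity between the audio
    and video halves of each fused feature (the patent uses a learned
    decision; this fixed similarity is only illustrative)."""
    a, v = fused[:, :audio_dim], fused[:, audio_dim:]
    num = (a * v).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(v, axis=1)
    return float(np.mean(num / den))

rng = np.random.default_rng(2)
audio_feats = rng.random((10, 4))
audio_ts = np.arange(10) * 100          # audio features every 100 ms
video_feats = audio_feats[::2]          # synchronous: same content at 200 ms
video_ts = np.arange(5) * 200
fused = fuse_by_time(audio_feats, audio_ts, video_feats, video_ts)
```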

Point cloud feature enhancement method and apparatus, computer device and storage medium
11734799 · 2023-08-22

The present disclosure relates to a point cloud feature enhancement method and apparatus, a computer device and a storage medium. The method includes: acquiring a three-dimensional point cloud, the three-dimensional point cloud including a plurality of input points; for each input point, performing feature aggregation on neighborhood point features of the input point to obtain a first feature of the input point; mapping the first feature to an attention point corresponding to the input point; performing feature aggregation on neighborhood point features of the attention point to obtain a second feature of the corresponding input point; and performing feature fusion on the first feature and the second feature of the input point to obtain a corresponding enhanced feature. The method can improve the enhancement effect of point cloud features.
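The two rounds of neighborhood aggregation and the final fusion can be sketched as follows. The learned mapping from first feature to attention point is replaced by the local centroid, and fusion is plain concatenation; both are simplifying assumptions, not the patented formulation.

```python
import numpy as np

def knn_aggregate(points, feats, queries, k=4):
    """Mean-pool the features of each query's k nearest input points."""
    out = []
    for q in queries:
        idx = np.argsort(np.linalg.norm(points - q, axis=1))[:k]
        out.append(feats[idx].mean(axis=0))
    return np.stack(out)

def enhance(points, feats, k=4):
    # First feature: aggregate over each input point's own neighborhood.
    first = knn_aggregate(points, feats, points, k)
    # Map each point to an "attention point" (here simply its local
    # centroid, a stand-in for the learned mapping in the patent).
    attn_pts = np.stack([
        points[np.argsort(np.linalg.norm(points - p, axis=1))[:k]].mean(axis=0)
        for p in points])
    # Second feature: aggregate around the attention point.
    second = knn_aggregate(points, feats, attn_pts, k)
    # Fuse (concatenate) into the enhanced feature.
    return np.concatenate([first, second], axis=1)

rng = np.random.default_rng(3)
pts = rng.random((20, 3))    # 20 input points in 3-D
fts = rng.random((20, 8))    # an 8-D feature per input point
enhanced = enhance(pts, fts)
```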

METHOD FOR GLASS DETECTION IN REAL SCENES
20220148292 · 2022-05-12

The invention discloses a method for glass detection in a real scene, which belongs to the field of object detection. The present invention designs a combination method based on LCFI (large-field contextual feature integration) blocks to effectively integrate context features of different scales. Multiple LCFI combination blocks are embedded into the glass detection network GDNet to obtain large-scale context features of different levels, thereby realizing reliable and accurate glass detection in various scenarios. By fusing context features of different scales, the glass detection network GDNet can effectively predict the true area of glass in different scenes, successfully detect glass of different sizes, and effectively handle glass in different scenes. GDNet adapts well to the various glass area sizes in the glass detection dataset, and achieves the highest accuracy among methods for this type of object detection.
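The idea of integrating context at several scales can be sketched as below. The internals of an LCFI block are not given in the abstract, so mean pooling at increasing window sizes stands in for each branch's large-field context, and the fusion is a plain average.

```python
import numpy as np

def context_at_scale(fmap, k):
    """Local context via k x k mean pooling (stride 1, zero padding):
    a crude stand-in for one LCFI branch's large-field context."""
    h, w = fmap.shape
    pad = k // 2
    padded = np.pad(fmap, pad)
    out = np.empty_like(fmap)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def fuse_contexts(fmap, scales=(3, 7, 11)):
    """Integrate context features of different scales by averaging the
    per-scale context maps with the original features."""
    maps = [fmap] + [context_at_scale(fmap, k) for k in scales]
    return sum(maps) / len(maps)

fmap = np.ones((16, 16))        # toy single-channel feature map
fused = fuse_contexts(fmap)
```

Larger windows give each pixel a view of more surrounding context, which is the cue the abstract relies on for telling glass regions apart from what is seen through them.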

PEDESTRIAN DETECTION METHOD AND APPARATUS, COMPUTER-READABLE STORAGE MEDIUM, AND CHIP

This application relates to the field of artificial intelligence, and specifically, to the field of computer vision; a pedestrian detection method is provided. The method includes: performing feature extraction on an image to obtain a basic feature map of the image; determining a proposal for a region of the image that possibly includes a pedestrian; processing the basic feature map of the image to obtain an object visibility map in which the response to a pedestrian's visible parts is greater than the responses to a pedestrian's occluded parts and to the background; performing weighted summation processing on the object visibility map and the basic feature map to obtain an enhanced feature map of the image; and determining, based on the proposal of the image and the enhanced feature map of the image, a bounding box including a pedestrian in the image and a confidence level of that bounding box.
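The weighted summation of visibility map and basic features can be sketched in one line. The networks producing the two maps are omitted, and the specific weighting form (gated residual with a hypothetical `alpha`) is an assumption for illustration.

```python
import numpy as np

def enhance_with_visibility(basic, visibility, alpha=1.0):
    """Weighted summation of the basic feature map and its
    visibility-gated copy: responses on visible pedestrian parts are
    amplified, while occluded-part and background responses pass
    through unchanged."""
    return basic + alpha * visibility * basic

basic = np.ones((4, 4))                        # toy basic feature map
vis = np.zeros((4, 4)); vis[1:3, 1:3] = 1.0    # visible-part response
enhanced = enhance_with_visibility(basic, vis)
```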