G06V10/806

Method and system for optical and microwave synergistic retrieval of aboveground biomass
11454534 · 2022-09-27

A method of optical and microwave synergistic retrieval of aboveground biomass, the method including: 1) obtaining an observation value of aboveground biomass (AGB) of a sample plot; 2) pre-processing laser radar (LiDAR) data, optical remote sensing data, and microwave remote sensing data covering a research region, to yield canopy height model (CHM) data, surface reflectance data, and backscattering coefficients, respectively; 3) extracting a plurality of LiDAR variables, optical characteristic vegetation indexes, and microwave characteristic variables; 4) establishing a multiple stepwise linear regression model of the biomass; 5) taking the biomass values within the LiDAR data coverage region as training and verification sample sets, and selecting samples for modeling and verification; 6) screening out the optical and microwave characteristic variables; and 7) constructing an optical model, a microwave model, and an optical and microwave synergistic model of AGB retrieval, respectively.
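
A minimal sketch of the modeling in steps 4)–7), assuming forward stepwise selection on p-values and illustrative variable names (a CHM height percentile, NDVI, and HV backscatter); the patent's actual candidate variables and entry criterion are not specified here.

```python
# Hypothetical sketch of steps 4)-7): forward stepwise linear regression of AGB
# on LiDAR, optical, and microwave candidate variables (names are illustrative).
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(X: pd.DataFrame, y, p_enter: float = 0.05):
    """Greedily add the candidate variable with the lowest p-value until
    no remaining variable is significant at p_enter."""
    selected, remaining = [], list(X.columns)
    while remaining:
        pvals = {}
        for var in remaining:
            model = sm.OLS(y, sm.add_constant(X[selected + [var]])).fit()
            pvals[var] = model.pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= p_enter:
            break
        selected.append(best)
        remaining.remove(best)
    return selected, sm.OLS(y, sm.add_constant(X[selected])).fit()

# Toy example with assumed variable names for the three data sources.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "chm_p95": rng.uniform(5, 30, 200),          # LiDAR canopy-height percentile
    "ndvi": rng.uniform(0.2, 0.9, 200),          # optical vegetation index
    "hv_backscatter": rng.uniform(-20, -5, 200)  # microwave backscatter (dB)
})
agb = 8.0 * df["chm_p95"] + 40.0 * df["ndvi"] + rng.normal(0, 10, 200)
vars_kept, fit = forward_stepwise(df, agb)
print(vars_kept, fit.rsquared_adj)
```

The same routine would be run three times, restricting the candidate columns to the optical variables, the microwave variables, and their union, to obtain the optical, microwave, and synergistic retrieval models of step 7).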

AUTOMATIC DEPRESSION DETECTION METHOD BASED ON AUDIO-VIDEO

Disclosed is an automatic depression detection method based on audio-video, including: acquiring original data containing two modalities, a long-term audio file and a long-term video file, from an audio-video file; dividing the long-term audio file into a plurality of audio segments and, in parallel, the long-term video file into a plurality of video segments; inputting each audio segment into an audio feature extraction network and each video segment into a video feature extraction network to obtain deep audio features and deep video features; processing the deep audio features and the deep video features with a multi-head attention mechanism to obtain attention audio features and attention video features; aggregating the attention audio features and the attention video features into audio-video features; and inputting the audio-video features into a decision network to predict a depression level of an individual in the audio-video file.
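
A rough PyTorch sketch of the attention and decision stages described above; the feature dimension, the use of nn.MultiheadAttention for the attention calculation, mean-pooling as the aggregation, and a four-level output are all assumptions for illustration.

```python
# Minimal sketch (assumed dims/pooling): attention over segment features,
# aggregation into audio-video features, and a decision network.
import torch
import torch.nn as nn

class DepressionHead(nn.Module):
    def __init__(self, dim=256, heads=4, levels=4):
        super().__init__()
        self.audio_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.video_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.decision = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(),
                                      nn.Linear(128, levels))

    def forward(self, audio_feats, video_feats):
        # audio_feats: (B, n_audio_segments, dim); video_feats: (B, n_video_segments, dim)
        a, _ = self.audio_attn(audio_feats, audio_feats, audio_feats)  # attention audio features
        v, _ = self.video_attn(video_feats, video_feats, video_feats)  # attention video features
        fused = torch.cat([a.mean(dim=1), v.mean(dim=1)], dim=-1)      # aggregated audio-video features
        return self.decision(fused)                                    # depression-level logits

audio = torch.randn(2, 10, 256)  # deep audio features from the audio extraction network
video = torch.randn(2, 12, 256)  # deep video features from the video extraction network
print(DepressionHead()(audio, video).shape)  # torch.Size([2, 4])
```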

TRACHEAL INTUBATION POSITIONING METHOD AND DEVICE BASED ON DEEP LEARNING, AND STORAGE MEDIUM

The disclosure relates to a tracheal intubation positioning method and device based on deep learning, and a storage medium. The method includes: constructing a YOLOv3 network based on dilated convolution and feature map fusion, and extracting feature information of an image through the trained YOLOv3 network to acquire first target information; determining second target information by utilizing a vectorized positioning mode according to carbon dioxide concentration differences detected by sensors; and fusing the first target information and the second target information to acquire a final target position. According to the disclosure, the tracheal orifice and the esophageal orifice can be rapidly detected in real time.
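
The fusion of the two target estimates could look roughly like the sketch below; the confidence-weighted average and the CO2-weighted centroid used for the vectorized positioning are illustrative stand-ins, not the patent's exact formulas.

```python
# Hypothetical sketch of the fusion step: combine the image-based detection
# (first target information) with the CO2-difference-based estimate
# (second target information) by a confidence-weighted average.
import numpy as np

def co2_vector_position(sensor_xy, co2_values):
    """Toy vectorized positioning: CO2-weighted centroid of sensor locations,
    so sensors reading higher CO2 (closer to the tracheal orifice) pull harder."""
    w = np.asarray(co2_values, float)
    w /= w.sum()
    return w @ np.asarray(sensor_xy, float)

def fuse_targets(box_xy, box_conf, co2_xy, co2_conf):
    """box_xy: target center from the detection network, with confidence box_conf.
    co2_xy: position from vectorized positioning over CO2 concentration differences."""
    w = np.array([box_conf, co2_conf], dtype=float)
    w /= w.sum()
    pts = np.stack([np.asarray(box_xy, float), np.asarray(co2_xy, float)])
    return w @ pts  # fused final target position

sensors = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
co2 = [4.8, 5.6, 3.9]                          # % CO2 at each sensor (illustrative)
second = co2_vector_position(sensors, co2)
print(fuse_targets((0.55, 0.40), 0.9, second, 0.6))
```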

System and method for sensor data processing to determine position of a vehicle

A method and apparatus for processing data, the data including a set of one or more system inputs and a set of one or more system outputs, wherein each system output corresponds to a respective system input. Each system input can include a plurality of data points, such that at least one of the data points is from a different data source than at least one other of the data points. The method includes performing a kernel function on a given system input from the data and a further system input to provide kernelized data, and inferring a value indicative of the significance of data from a particular data source, wherein the inferring includes applying a regression technique to the kernelized data.
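
One way to realize "kernel function plus regression yielding per-source significance" is an ARD-style Gaussian process, where the learned per-dimension length scales of an RBF kernel indicate how much each data source matters; the sketch below assumes that reading and made-up source names, and is not the patent's specific kernel or regression technique.

```python
# Illustrative sketch only: anisotropic (ARD) RBF kernel + Gaussian process
# regression; shorter learned length scales mark more significant data sources.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
# Each system input concatenates points from three assumed sources: GNSS, odometry, IMU.
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.1 * X[:, 2] + rng.normal(scale=0.05, size=200)  # source 0 dominates

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=np.ones(3)),
                               alpha=0.05 ** 2, normalize_y=True)
gpr.fit(X, y)
length_scales = gpr.kernel_.length_scale
significance = 1.0 / length_scales  # shorter length scale -> more influential source
print(dict(zip(["gnss", "odometry", "imu"], np.round(significance, 3))))
```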

TEMPORALLY DISTRIBUTED NEURAL NETWORKS FOR VIDEO SEMANTIC SEGMENTATION

A Video Semantic Segmentation System (VSSS) is disclosed that performs accurate and fast semantic segmentation of videos using a set of temporally distributed neural networks. The VSSS receives as input a video signal comprising a contiguous sequence of temporally-related video frames. The VSSS extracts features from the video frames in the contiguous sequence and, based upon the extracted features, selects, from a set of labels, a label to be associated with each pixel of each video frame in the video signal. In certain embodiments, a set of multiple neural networks is used to extract the features for video segmentation, and the extraction of features is distributed among the multiple neural networks in the set. A strong feature representation covering the entirety of the features is produced for each video frame in the sequence by aggregating the output features extracted by the multiple neural networks.
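
A minimal sketch of the temporal distribution idea, assuming N small sub-networks applied round-robin over frames and channel-wise concatenation as the aggregation; the shapes and the 21-label set are illustrative.

```python
# Minimal sketch (assumed shapes): frame t is processed by sub-network t mod N,
# and the strong representation for the current frame aggregates the features
# extracted by all N sub-networks over the last N frames.
import torch
import torch.nn as nn

N, C, H, W = 4, 3, 64, 64
subnets = nn.ModuleList(
    nn.Sequential(nn.Conv2d(C, 32, 3, padding=1), nn.ReLU()) for _ in range(N)
)
head = nn.Conv2d(32 * N, 21, 1)  # per-pixel logits over an assumed 21-label set

frames = [torch.randn(1, C, H, W) for _ in range(8)]  # contiguous video frames
buffer = []                                           # features from the last N frames
for t, frame in enumerate(frames):
    feat = subnets[t % N](frame)          # distribute extraction over the sub-networks
    buffer.append(feat)
    if len(buffer) > N:
        buffer.pop(0)
    if len(buffer) == N:
        strong = torch.cat(buffer, dim=1)  # aggregated strong feature representation
        labels = head(strong).argmax(dim=1)  # label selected for each pixel
        print(t, labels.shape)               # torch.Size([1, 64, 64])
```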

FACIAL EXPRESSION RECOGNITION
20220269879 · 2022-08-25

Systems and techniques are provided for facial expression recognition. In some examples, a system receives an image frame corresponding to a face of a person. The system also determines, based on a three-dimensional model of the face, landmark feature information associated with landmark features of the face. The system then inputs, to at least one layer of a neural network trained for facial expression recognition, the image frame and the landmark feature information. The system further determines, using the neural network, a facial expression associated with the face.
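
A rough sketch of feeding both the image frame and the landmark feature information into a recognition network; the concatenation-based fusion, 68 three-dimensional landmarks, and seven expression classes are assumptions, not the patent's architecture.

```python
# Hypothetical sketch: landmark features derived from a 3D face model are
# injected into one layer of the expression network alongside image features.
import torch
import torch.nn as nn

class ExpressionNet(nn.Module):
    def __init__(self, n_landmarks=68, n_expressions=7):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.landmark_fc = nn.Linear(n_landmarks * 3, 64)  # 3D landmark coordinates
        self.classifier = nn.Linear(32 + 64, n_expressions)

    def forward(self, image, landmarks):
        img_feat = self.backbone(image)                     # features from the image frame
        lmk_feat = torch.relu(self.landmark_fc(landmarks))  # landmark feature information
        return self.classifier(torch.cat([img_feat, lmk_feat], dim=1))

net = ExpressionNet()
logits = net(torch.randn(2, 3, 112, 112), torch.randn(2, 68 * 3))
print(logits.shape)  # torch.Size([2, 7])
```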

LOW LEVEL SENSOR FUSION BASED ON LIGHTWEIGHT SEMANTIC SEGMENTATION OF 3D POINT CLOUDS
20220269900 · 2022-08-25

A method and a system described herein provide sensor-level data stream processing. In particular, concepts are described for enabling low-level sensor fusion by lightweight semantic segmentation on sensors that generate point clouds, such as LIDAR, radar, cameras, and Time-of-Flight sensors. According to the present disclosure, a computer-implemented method for sensor-level data stream processing comprises receiving a first data stream from a LIDAR sensor, removing the ground from the point cloud, performing clustering on the point cloud, and performing feature processing on the point cloud. The point cloud represents a set of data points in space.
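
A lightweight stand-in for the described pipeline, with a simple height threshold in place of proper ground-plane fitting and DBSCAN for the clustering; the parameter values are illustrative.

```python
# Illustrative sketch: take a LIDAR point cloud, remove the ground, cluster
# the remaining points, and compute simple per-cluster features.
import numpy as np
from sklearn.cluster import DBSCAN

def process_point_cloud(points, ground_z=0.2, eps=0.6, min_samples=10):
    """points: (N, 3) array of x, y, z data points in space."""
    non_ground = points[points[:, 2] > ground_z]  # ground removal (height threshold)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(non_ground)
    features = {}
    for label in set(labels) - {-1}:              # -1 marks noise points
        cluster = non_ground[labels == label]
        features[label] = {
            "centroid": cluster.mean(axis=0),     # per-cluster feature processing
            "extent": cluster.max(axis=0) - cluster.min(axis=0),
            "n_points": len(cluster),
        }
    return features

cloud = np.random.default_rng(0).uniform([-10, -10, 0], [10, 10, 3], size=(5000, 3))
print(len(process_point_cloud(cloud)))
```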

ELECTRONIC DEVICE AND TOOL DETECTING METHOD
20220051394 · 2022-02-17

A method for detecting defects in working CNC tools in real time, implemented in an electronic device, includes acquiring the sounds of a tool during a cutting or other machining process and dividing the acquired cutting sounds into a plurality of audio recordings according to a preset time interval. Time-frequency features of the audio recordings are extracted using multiple feature transformation methods, and a fusion feature image of the cutting sound is formed from the extracted time-frequency features. A tool detection model is generated by training on the fusion feature images, and any defects of the tool, and their types, are detected using the tool detection model.
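
A sketch of the segmentation and fusion-image steps, assuming log-mel, MFCC, and delta-MFCC as the multiple feature transformation methods; the patent does not name the specific transforms, so these are placeholders.

```python
# Rough sketch (assumed feature choices and shapes): split the cutting sound
# into preset intervals, compute several time-frequency features per segment,
# and stack them into a multi-channel fusion feature image.
import numpy as np
import librosa

def fusion_feature_images(audio, sr, segment_s=1.0, n_bins=40):
    seg_len = int(sr * segment_s)
    images = []
    for start in range(0, len(audio) - seg_len + 1, seg_len):
        seg = audio[start:start + seg_len]
        log_mel = librosa.power_to_db(
            librosa.feature.melspectrogram(y=seg, sr=sr, n_mels=n_bins))
        mfcc = librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=n_bins)
        d_mfcc = librosa.feature.delta(mfcc)
        images.append(np.stack([log_mel, mfcc, d_mfcc]))  # (3, n_bins, frames)
    return np.stack(images)  # one fusion feature image per audio recording

sr = 22050
audio = np.random.default_rng(0).normal(size=sr * 5).astype(np.float32)  # stand-in cutting sound
print(fusion_feature_images(audio, sr).shape)  # e.g. (5, 3, 40, 44)
```

A detection model (e.g. a small CNN) would then be trained on these multi-channel images to classify defect presence and type.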

Eye tracking method and user terminal performing same
11250242 · 2022-02-15

A user terminal according to an embodiment of the present invention includes a capturing device for capturing a face image of a user, and an eye tracking unit that, on the basis of a configured rule, acquires from the face image a vector representing the direction the user's face is facing and a pupil image of the user, and performs eye tracking of the user by inputting the face image, the vector, and the pupil image into a configured deep learning model.
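
A hypothetical sketch of the eye tracking unit's deep learning model, taking the face image, the face-direction vector, and the pupil image as inputs and regressing a 2D gaze point; the architecture and sizes are assumptions.

```python
# Hypothetical sketch: face image, face-direction vector, and pupil image are
# fed into one deep model that outputs a gaze point.
import torch
import torch.nn as nn

def small_cnn(out_dim):
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, out_dim),
    )

class EyeTracker(nn.Module):
    def __init__(self):
        super().__init__()
        self.face_net = small_cnn(64)   # encodes the face image
        self.pupil_net = small_cnn(32)  # encodes the pupil image
        self.head = nn.Sequential(nn.Linear(64 + 32 + 3, 64), nn.ReLU(),
                                  nn.Linear(64, 2))  # 2D gaze point

    def forward(self, face_img, direction_vec, pupil_img):
        x = torch.cat([self.face_net(face_img), self.pupil_net(pupil_img),
                       direction_vec], dim=1)
        return self.head(x)

gaze = EyeTracker()(torch.randn(1, 3, 128, 128), torch.randn(1, 3),
                    torch.randn(1, 3, 64, 64))
print(gaze.shape)  # torch.Size([1, 2])
```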

Point of view video processing and curation platform
11250886 · 2022-02-15

Embodiments of the present disclosure may provide methods and systems for performing the following stages: receiving a plurality of content streams; retrieving metadata associated with each of the plurality of content streams; processing the metadata to detect at least one target annotation within at least one target content stream; retrieving telemetry data associated with the at least one target content stream; processing the telemetry data and the metadata associated with a plurality of frames in the at least one target content stream to ascertain vector motion data; and mapping a spatial relationship associated with at least one capturing device associated with at least one target content source.
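
A toy sketch of the telemetry-processing stage, deriving per-frame vector motion data (speed and heading) from GPS telemetry; the field names and the equirectangular approximation are assumptions.

```python
# Toy sketch (assumed telemetry field names): derive vector motion data for a
# target content stream from per-frame GPS telemetry.
import math

def vector_motion(telemetry):
    """telemetry: list of dicts with 't' (s), 'lat', 'lon' per frame.
    Returns per-frame speed (m/s) and heading (deg) between consecutive samples."""
    motion = []
    for prev, cur in zip(telemetry, telemetry[1:]):
        dt = cur["t"] - prev["t"]
        # local equirectangular approximation, adequate over one frame interval
        dy = (cur["lat"] - prev["lat"]) * 111_320.0
        dx = (cur["lon"] - prev["lon"]) * 111_320.0 * math.cos(math.radians(prev["lat"]))
        speed = math.hypot(dx, dy) / dt
        heading = math.degrees(math.atan2(dx, dy)) % 360.0
        motion.append({"t": cur["t"], "speed_mps": speed, "heading_deg": heading})
    return motion

frames = [{"t": 0.0, "lat": 37.7749, "lon": -122.4194},
          {"t": 1.0, "lat": 37.7750, "lon": -122.4192},
          {"t": 2.0, "lat": 37.7752, "lon": -122.4190}]
print(vector_motion(frames))
```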