G06V10/806

Ensemble Deep Learning Method for Identifying Unsafe Behaviors of Operators in Maritime Working Environment

The present invention proposes an ensemble deep learning method for identifying unsafe behaviors of operators in a maritime working environment. Firstly, features of maritime images are extracted with the You Only Look Once (YOLO) V3 model, and the multi-scale detection capability is enhanced by introducing a feature pyramid structure. Secondly, instance-level features and time memory features of the operators and devices in the maritime working environment are obtained with the Joint Learning of Detection and Embedding (JDE) paradigm. Thirdly, spatial-temporal interaction information is transferred into a feature memory pool, and the time memory features are updated with an asynchronous memory updating algorithm. Finally, the interactions between the operators and the devices, and any unsafe behaviors, are identified with an asynchronous interaction aggregation network. The proposed invention can accurately determine the unsafe behaviors of the operators, and thus support operation decisions for maritime management activities.
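The asynchronous memory updating step can be pictured with a toy sketch. This is a hedged illustration, not the patented algorithm: the momentum-style moving average and the `update_memory`/`pool` names are assumptions made here for illustration.

```python
# Minimal sketch (assumption, not the disclosed algorithm): a feature memory
# pool keyed by track ID, updated with an exponential moving average as new
# instance-level features arrive asynchronously.

def update_memory(pool, track_id, feature, momentum=0.9):
    """Blend a new feature vector into the pool entry for one tracked operator."""
    if track_id not in pool:
        pool[track_id] = list(feature)   # first observation: store as-is
    else:
        old = pool[track_id]
        pool[track_id] = [momentum * o + (1 - momentum) * f
                          for o, f in zip(old, feature)]
    return pool[track_id]

pool = {}
update_memory(pool, "operator_1", [1.0, 0.0])
updated = update_memory(pool, "operator_1", [0.0, 1.0])
print(updated)  # ≈ [0.9, 0.1]
```

Keeping the pool keyed by track identity is what lets the update run asynchronously: each new detection only touches its own entry.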

METHOD FOR CONVERTING IMAGE FORMAT, DEVICE, AND STORAGE MEDIUM
20230011823 · 2023-01-12

The present disclosure provides a method and apparatus for converting an image format, an electronic device, a computer-readable storage medium, and a computer program product; it relates to the field of artificial intelligence technology, such as computer vision and deep learning, and can be applied to intelligent-sensing ultra-high-definition scenarios. A specific implementation of the method includes: acquiring a to-be-converted standard dynamic range image; performing a convolution operation on the standard dynamic range image to obtain a local feature; performing a global average pooling operation on the standard dynamic range image to obtain a global feature; and converting the standard dynamic range image into a high dynamic range image according to the local feature and the global feature.
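The local/global split can be sketched as follows. This is a minimal stand-in, not the disclosed network: a fixed 3x3 box filter replaces the learned convolution, and the combination curve in `to_hdr` is a hypothetical choice for illustration.

```python
# Hedged sketch of the local/global feature split for SDR -> HDR conversion.
# The box filter and the blend weights below are assumptions; a real system
# learns both.

def local_feature(img):
    """3x3 box filter as a stand-in for the local convolution (edges clamped)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sum(vals) / 9.0
    return out

def global_feature(img):
    """Global average pooling over the whole image."""
    return sum(sum(row) for row in img) / (len(img) * len(img[0]))

def to_hdr(img, gamma=2.2):
    """Expand each pixel by a curve conditioned on both local and global features."""
    loc, glob = local_feature(img), global_feature(img)
    return [[(0.5 * p + 0.25 * l + 0.25 * glob) ** gamma
             for p, l in zip(prow, lrow)]
            for prow, lrow in zip(img, loc)]

sdr = [[0.2, 0.8], [0.4, 0.6]]
hdr = to_hdr(sdr)
```

The point of the two branches is that the local feature preserves detail while the global feature captures overall exposure; both condition the expansion of each pixel.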

METHOD AND APPARATUS FOR DETECTING OBJECT BASED ON VIDEO, ELECTRONIC DEVICE AND STORAGE MEDIUM

A method for detecting an object based on a video includes: obtaining a plurality of image frames of a video to be detected; obtaining initial feature maps by extracting features of the image frames; for each pair of adjacent image frames, obtaining a target feature map of the latter frame by fusing the sub-feature maps of first target dimensions from the initial feature map of the former frame with the sub-feature maps of second target dimensions from the initial feature map of the latter frame; and performing object detection on the respective target feature map of each image frame.
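The adjacent-frame fusion step can be sketched under simplifying assumptions: here each initial feature map is a list of per-channel 2-D maps, the "first target dimensions" are channel indices taken from the former frame, and the "second target dimensions" are channels kept from the latter frame. The function name and the pick-and-concatenate rule are illustrative assumptions.

```python
# Sketch (assumption): fuse selected channels of the former frame's initial
# feature map with selected channels of the latter frame's initial feature map.

def fuse_adjacent(former, latter, first_dims, second_dims):
    """Build the latter frame's target feature map from both frames' channels."""
    return [former[c] for c in first_dims] + [latter[c] for c in second_dims]

# Two toy frames, each with three 1x2 channels.
f0 = [[[0, 0]], [[1, 1]], [[2, 2]]]
f1 = [[[9, 9]], [[8, 8]], [[7, 7]]]
target = fuse_adjacent(f0, f1, first_dims=[0, 1], second_dims=[2])
print(target)  # [[[0, 0]], [[1, 1]], [[7, 7]]]
```

Carrying some channels over from the former frame is what injects temporal context into the latter frame's detection.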

DRIVER ATTENTION AREA PREDICTION SYSTEM
20230222756 · 2023-07-13

A driver attention area prediction method includes: S1, acquiring an original driving video of a driver attention area and preprocessing the original driving video to obtain a processed driving video sequence; S2, constructing a deep learning model with the Keras deep learning framework and training the model to obtain a trained deep learning model; S3, performing area prediction on the processed driving video sequence with the trained deep learning model to obtain a driver attention area prediction result; and S4, outputting the driver attention area prediction result. Correspondingly, a driver attention area prediction system includes a driving video acquisition and preprocessing module, a model training module, a model application module, and a result output module. Differentiated training can be carried out for driver attention in left-hand-traffic (LHT) and right-hand-traffic (RHT) scenes, so that driver attention can be accurately predicted per scene and condition.

Image feature combination for image-based object recognition
11551329 · 2023-01-10

Methods, systems, and articles of manufacture to improve image recognition searching are disclosed. In some embodiments, a first document image of a known object is used to generate one or more other document images of the same object by applying one or more techniques for synthetically generating images. The synthetically generated images correspond to different variations in conditions under which a potential query image might be captured. Extracted features from an initial image of a known object and features extracted from the one or more synthetically generated images are stored, along with their locations, as part of a common model of the known object. In other embodiments, image recognition search effectiveness is improved by transforming the location of features of multiple images of a same known object into a common coordinate system. This can enhance the accuracy of certain aspects of existing image search/recognition techniques including, for example, geometric verification.
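The common-coordinate-system idea for feature locations can be illustrated with a homography transform. The matrix below is a hypothetical pure translation chosen for clarity; in practice each image's homography would come from registering it against the reference view.

```python
# Sketch: map feature locations from one image of a known object into a
# common coordinate system via a 3x3 homography (homogeneous coordinates).

def apply_homography(H, point):
    """Map a 2-D point through a 3x3 homography matrix."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)

# Hypothetical registration result: a pure translation by (+5, -2).
H = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
common = [apply_homography(H, p) for p in [(0, 0), (10, 10)]]
print(common)  # [(5.0, -2.0), (15.0, 8.0)]
```

Once all images' feature locations live in one coordinate system, geometric verification can compare them directly instead of per-image.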

Method of multi-sensor data fusion
11552778 · 2023-01-10

A method of multi-sensor data fusion includes determining a plurality of first data sets using a plurality of sensors, each of the first data sets being associated with a respective one of a plurality of sensor coordinate systems, and each of the sensor coordinate systems being defined in dependence of a respective one of a plurality of mounting positions for the sensors; transforming the first data sets into a plurality of second data sets using a transformation rule, each of the second data sets being associated with a unified coordinate system, the unified coordinate system being defined in dependence of at least one predetermined reference point; and determining at least one fused data set by fusing the second data sets.
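The transform-then-fuse pipeline can be sketched in a 2-D toy example. Here each mounting position reduces to a yaw angle and an offset, and simple averaging stands in for whatever fusion rule an implementation actually uses; all names are assumptions.

```python
import math

# Sketch: each sensor reports a point in its own coordinate system; the
# transformation rule applies the sensor's mounting rotation and offset to
# express it in the unified coordinate system, and fusion averages the results.

def to_unified(point, yaw, offset):
    """Rotate a 2-D sensor-frame point by the mount yaw, then translate."""
    x, y = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y + offset[0], s * x + c * y + offset[1])

def fuse(points):
    """Fuse unified-frame detections of one object by averaging."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

# Two sensors at different mounting positions seeing the same target.
a = to_unified((10.0, 0.0), yaw=0.0, offset=(1.0, 0.0))           # forward sensor
b = to_unified((0.0, 11.0), yaw=-math.pi / 2, offset=(0.0, 0.0))  # side sensor
fused = fuse([a, b])  # both map to ≈ (11.0, 0.0) in the unified frame
```

The benefit of the unified frame is exactly what the toy shows: two measurements that look unrelated in their sensor frames become directly comparable, so fusion is a simple operation.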

Methods and systems for computer-based determining of presence of objects

A computer-implemented method for processing 3-D point cloud data and associated image data to enrich the 3-D point cloud data with relevant portions of the image data. The method comprises generating a 3-D point cloud data tensor representative of information contained in the 3-D point cloud data and generating an image tensor representative of information contained in the image data; and then analyzing the image tensor to identify a relevant data portion of the image information relevant to at least one object candidate. The method further includes amalgamating the 3-D point cloud data tensor with a relevant portion of the image tensor associated with the relevant data portion of the image information to generate an amalgamated tensor associated with the surrounding area, and storing the amalgamated tensor to be used by a machine learning algorithm (MLA) to determine presence of the object in the surrounding area.
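The amalgamation step can be illustrated under simple assumptions: a hypothetical pinhole projection associates each 3-D point with an image location, and the image feature there is appended to the point's own coordinates. Real systems would use calibrated camera extrinsics and learned features; everything below is an illustrative stand-in.

```python
# Sketch (assumption): enrich each 3-D point with the image feature at its
# projected pixel, forming one row of the amalgamated tensor.

def project(point, f=100.0, cx=50.0, cy=50.0):
    """Pinhole projection of (x, y, z) into pixel coordinates (toy intrinsics)."""
    x, y, z = point
    return int(f * x / z + cx), int(f * y / z + cy)

def amalgamate(points, image_features):
    """Append the relevant image feature to each 3-D point's row."""
    h, w = len(image_features), len(image_features[0])
    rows = []
    for p in points:
        u, v = project(p)
        feat = image_features[v][u] if 0 <= u < w and 0 <= v < h else 0.0
        rows.append(list(p) + [feat])
    return rows

# Toy 100x100 feature image whose value at (u, v) is u + v.
image = [[float(u + v) for u in range(100)] for v in range(100)]
tensor = amalgamate([(0.0, 0.0, 10.0)], image)
print(tensor)  # [[0.0, 0.0, 10.0, 100.0]]
```

The resulting rows carry both geometry and appearance, which is what lets a downstream MLA reason about object presence from a single tensor.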

Method of performing function of electronic device and electronic device using same

An electronic device includes: a camera; a microphone; a display; a memory; and a processor configured to receive an input for activating an intelligent agent service from a user while at least one application is executed, identify context information of the electronic device, control the camera to acquire image information of the user based on the identified context information, detect movement of the user's lips in the acquired image information to recognize a speech of the user, and perform a function corresponding to the recognized speech.

CAMERA-RADAR SENSOR FUSION USING LOCAL ATTENTION MECHANISM
20230213643 · 2023-07-06

Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for processing sensor data. In one aspect, a method includes obtaining image data representing a camera sensor measurement of a scene; obtaining radar data representing a radar sensor measurement of the scene; generating a feature representation of the image data; generating a respective initial depth estimate for each pixel in a subset of the pixels of the image data; generating a feature representation of the radar data, including radar feature vectors for a plurality of radar reflection points; for each pixel in the subset, generating an adjusted depth estimate using the initial depth estimate for the pixel and the radar feature vectors of a corresponding subset of the radar reflection points; generating a fused point cloud that includes a plurality of three-dimensional data points; and processing the fused point cloud to generate an output that characterizes the scene.
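The radar-guided depth adjustment can be sketched with inverse-distance weighting standing in for the learned local attention. The neighborhood radius, the 0.5/0.5 blend, and all names are assumptions for illustration, not the claimed mechanism.

```python
# Sketch (assumption): adjust a pixel's monocular depth estimate using the
# ranges of radar reflection points that project near it, weighted by
# image-plane proximity in place of learned local attention.

def adjust_depth(pixel, initial_depth, radar_points, radius=5.0):
    """radar_points: list of (u, v, range). Blend ranges of points near pixel."""
    u0, v0 = pixel
    weights, weighted = 0.0, 0.0
    for u, v, rng in radar_points:
        d2 = (u - u0) ** 2 + (v - v0) ** 2
        if d2 <= radius ** 2:
            w = 1.0 / (1.0 + d2)   # closer radar points get more weight
            weights += w
            weighted += w * rng
    if weights == 0.0:
        return initial_depth       # no radar support: keep the camera estimate
    radar_depth = weighted / weights
    return 0.5 * initial_depth + 0.5 * radar_depth

depth = adjust_depth((10, 10), initial_depth=20.0,
                     radar_points=[(10, 10, 24.0), (100, 100, 5.0)])
print(depth)  # 22.0
```

Restricting attention to a local neighborhood is what keeps a distant, unrelated radar return (the second point above) from corrupting the pixel's depth.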

Gesture recognition using multiple antennas

Various embodiments wirelessly detect micro-gestures using multiple antennas of a gesture sensor device. At times, the gesture sensor device transmits multiple outgoing radio frequency (RF) signals, each outgoing RF signal transmitted via a respective antenna of the gesture sensor device. The outgoing RF signals are configured to help capture information that can be used to identify micro-gestures performed by a hand. The gesture sensor device captures incoming RF signals generated by the outgoing RF signals reflecting off the hand, and then analyzes the incoming RF signals to identify the micro-gesture.