Patent classifications
G06V10/80
Ensemble Deep Learning Method for Identifying Unsafe Behaviors of Operators in Maritime Working Environment
The present invention proposes an ensemble deep learning method for identifying unsafe behaviors of operators in a maritime working environment. First, features of maritime images are extracted with the You Only Look Once (YOLO) V3 model, and multi-scale detection capability is enhanced by introducing a feature pyramid structure. Second, instance-level features and time-memory features of the operators and devices in the maritime working environment are obtained with the Joint Learning of Detection and Embedding (JDE) paradigm. Third, spatial-temporal interaction information is transferred into a feature memory pool, and the time-memory features are updated with an asynchronous memory updating algorithm. Finally, the interactions between the operators and the devices, as well as the unsafe behaviors, are identified with an asynchronous interaction aggregation network. The proposed invention can accurately determine unsafe behaviors of the operators and thus support operational decisions for maritime management activities.
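As a rough illustration of the memory-pool step only, the sketch below keeps one time-memory feature per tracked operator or device and blends in each new instance-level feature. The class name, the momentum parameter, and the exponential-moving-average update rule are assumptions for illustration, not the patented asynchronous memory updating algorithm.

```python
import numpy as np

class FeatureMemoryPool:
    """Hypothetical per-track memory pool; the EMA update rule is an assumption,
    not the patented asynchronous memory updating algorithm itself."""

    def __init__(self, momentum: float = 0.9):
        self.momentum = momentum
        self.memory = {}  # track_id -> time-memory feature vector

    def update(self, track_id: int, instance_feature: np.ndarray) -> np.ndarray:
        # Asynchronously blend the newly observed instance-level feature
        # into the stored time-memory feature for this operator/device track.
        if track_id not in self.memory:
            self.memory[track_id] = instance_feature.copy()
        else:
            self.memory[track_id] = (
                self.momentum * self.memory[track_id]
                + (1.0 - self.momentum) * instance_feature
            )
        return self.memory[track_id]

pool = FeatureMemoryPool()
pool.update(track_id=7, instance_feature=np.random.rand(256))
```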
Method and Processing Unit for Processing Sensor Data of Several Different Sensors with an Artificial Neural Network in a Vehicle
A method for operating a processing unit of a vehicle to process sensor data from several different sensors with an artificial neural network. A set of volume data cells is provided as a volumetric representation of different volume elements of an environment. When sensor data are generated by the sensors, the sensor data are transferred to the respective volume data cells using an inverse mapping function, wherein each inverse mapping function maps a respective sensor coordinate system of the sensor to an internal volumetric coordinate system corresponding to the world coordinate system. Through this transfer, each volume data cell receives, from each sensor, the sensor data associated with that volume data cell according to the inverse mapping function, and the received sensor data from each sensor are accumulated in the respective volume data cell as combined data.
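A minimal sketch of the transfer-and-accumulate step, assuming the inverse mapping function can be represented by a 4x4 sensor-to-world transform followed by voxel indexing; all function and variable names are illustrative, not from the patent.

```python
import numpy as np

def accumulate_into_volume(points_sensor, values, sensor_to_world, grid_shape,
                           cell_size, grid_origin):
    """Sketch: transfer per-point sensor data into volume data cells.

    `sensor_to_world` is a 4x4 homogeneous transform standing in for the
    inverse mapping function from the sensor coordinate system to the
    internal volumetric (world-aligned) coordinate system.
    """
    volume = np.zeros(grid_shape, dtype=np.float32)   # combined data per cell
    counts = np.zeros(grid_shape, dtype=np.int32)

    # Map sensor-frame points into the world-aligned volumetric frame.
    homogeneous = np.hstack([points_sensor, np.ones((len(points_sensor), 1))])
    points_world = (sensor_to_world @ homogeneous.T).T[:, :3]

    # Assign each measurement to the volume data cell that contains it.
    idx = np.floor((points_world - grid_origin) / cell_size).astype(int)
    valid = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    for (i, j, k), v in zip(idx[valid], values[valid]):
        volume[i, j, k] += v       # accumulate sensor data in the cell
        counts[i, j, k] += 1
    return volume, counts

pts = np.random.rand(100, 3) * 10          # points in the sensor frame
vals = np.ones(100)                        # e.g. occupancy or intensity per point
vol, cnt = accumulate_into_volume(pts, vals, np.eye(4), (20, 20, 20), 0.5, np.zeros(3))
```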
METHOD FOR CONVERTING IMAGE FORMAT, DEVICE, AND STORAGE MEDIUM
The present disclosure provides a method and apparatus for converting an image format, an electronic device, a computer-readable storage medium, and a computer program product, relates to the field of artificial intelligence technology such as computer vision and deep learning, and can be applied to intelligent-sensing ultra-high-definition scenarios. A specific implementation of the method includes: acquiring a to-be-converted standard dynamic range image; performing a convolution operation on the standard dynamic range image to obtain a local feature; performing a global average pooling operation on the standard dynamic range image to obtain a global feature; and converting the standard dynamic range image into a high dynamic range image according to the local feature and the global feature.
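A hedged sketch of the described local/global split, assuming PyTorch and a channel-wise modulation to combine the two branches; layer widths and the fusion rule are not specified by the abstract and are chosen only for illustration.

```python
import torch
import torch.nn as nn

class SDRToHDRSketch(nn.Module):
    """Illustrative sketch only: one convolution branch for local features, a
    global average pooling branch for a global feature, then a fusion that
    predicts the HDR image. Layer sizes and the fusion rule are assumptions."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.global_fc = nn.Sequential(nn.Linear(3, channels), nn.ReLU())
        self.head = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, sdr: torch.Tensor) -> torch.Tensor:
        local_feat = self.local(sdr)                        # local feature via convolution
        global_feat = sdr.mean(dim=(2, 3))                  # global average pooling
        global_feat = self.global_fc(global_feat)[..., None, None]
        fused = local_feat * global_feat                    # modulate local by global
        return self.head(fused)                             # predicted HDR image

hdr = SDRToHDRSketch()(torch.rand(1, 3, 64, 64))
```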
MULTIMODAL DATA PROCESSING
Disclosed are a method for processing multimodal data using a neural network, a device, and a medium, relating to the field of artificial intelligence and, in particular, to multimodal data processing, video classification, and deep learning. The neural network includes: an input subnetwork configured to receive the multimodal data and output respective first features of a plurality of modalities; a plurality of cross-modal feature subnetworks, each of which is configured to receive the respective first features of two corresponding modalities and output a cross-modal feature corresponding to the two modalities; a plurality of cross-modal fusion subnetworks, each of which is configured to receive at least one cross-modal feature corresponding to a corresponding target modality and other modalities and output a second feature of the target modality; and an output subnetwork configured to receive the respective second features of the plurality of modalities and output a processing result of the multimodal data.
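The subnetwork layout can be sketched for the two-modality case as below; the linear layers, ReLU activations, and concatenation-based fusion are assumptions standing in for the unspecified subnetwork internals.

```python
import torch
import torch.nn as nn

class CrossModalSketch(nn.Module):
    """Minimal sketch with two modalities (e.g. video and audio); the disclosed
    network supports a plurality of modalities with pairwise cross-modal
    subnetworks. Dimensions and fusion operators are assumptions."""

    def __init__(self, dim: int = 128, num_classes: int = 10):
        super().__init__()
        self.input_a = nn.LazyLinear(dim)              # input subnetwork, modality A
        self.input_b = nn.LazyLinear(dim)              # input subnetwork, modality B
        self.cross_ab = nn.Linear(2 * dim, dim)        # cross-modal feature subnetwork (A, B)
        self.fuse_a = nn.Linear(2 * dim, dim)          # cross-modal fusion, target modality A
        self.fuse_b = nn.Linear(2 * dim, dim)          # cross-modal fusion, target modality B
        self.output = nn.Linear(2 * dim, num_classes)  # output subnetwork

    def forward(self, x_a, x_b):
        f_a, f_b = self.input_a(x_a), self.input_b(x_b)                # first features
        cross = torch.relu(self.cross_ab(torch.cat([f_a, f_b], -1)))   # cross-modal feature
        s_a = torch.relu(self.fuse_a(torch.cat([f_a, cross], -1)))     # second feature, A
        s_b = torch.relu(self.fuse_b(torch.cat([f_b, cross], -1)))     # second feature, B
        return self.output(torch.cat([s_a, s_b], -1))                  # processing result

logits = CrossModalSketch()(torch.rand(4, 512), torch.rand(4, 256))
```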
METHOD AND APPARATUS FOR DETECTING OBJECT BASED ON VIDEO, ELECTRONIC DEVICE AND STORAGE MEDIUM
A method for detecting an object based on a video includes: obtaining a plurality of image frames of a video to be detected; obtaining initial feature maps by extracting features from the plurality of image frames; for each two adjacent image frames of the plurality of image frames, obtaining a target feature map of the latter image frame of the two adjacent image frames by fusing sub-feature maps of first target dimensions included in the initial feature map of the former image frame of the two adjacent image frames with sub-feature maps of second target dimensions included in the initial feature map of the latter image frame; and performing object detection on the respective target feature map of each image frame.
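A minimal sketch of the adjacent-frame fusion, assuming the "target dimensions" are channel slices of the initial feature maps and that fusion is concatenation; both assumptions are for illustration only.

```python
import torch

def fuse_adjacent_frames(prev_feat: torch.Tensor, curr_feat: torch.Tensor,
                         split: int) -> torch.Tensor:
    """Illustrative sketch: build the target feature map of the latter frame
    from channel slices of the two adjacent frames' initial feature maps."""
    prev_part = prev_feat[:, :split]      # sub-feature maps of first target dimensions
    curr_part = curr_feat[:, split:]      # sub-feature maps of second target dimensions
    return torch.cat([prev_part, curr_part], dim=1)   # fused target feature map

prev_feat = torch.rand(1, 64, 32, 32)   # initial feature map of the former frame
curr_feat = torch.rand(1, 64, 32, 32)   # initial feature map of the latter frame
target = fuse_adjacent_frames(prev_feat, curr_feat, split=16)   # (1, 64, 32, 32)
```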
System and method for implementing reward based strategies for promoting exploration
A system and method for implementing reward-based strategies for promoting exploration. The system and method include receiving data associated with an agent environment of an ego agent and a target agent, and receiving data associated with the dynamic operation of the ego agent and the target agent within the agent environment. The system and method also include implementing a reward function that is associated with exploration of at least one agent state within the agent environment. The system and method further include training a neural network with a novel unexplored agent state.
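Since the abstract does not specify the reward function, the sketch below uses a simple count-based novelty bonus as a stand-in for a reward that promotes exploration of unexplored agent states.

```python
from collections import defaultdict

class ExplorationReward:
    """Hedged sketch of a count-based exploration bonus; the 1/sqrt(count)
    bonus is an assumption, not the patented reward function."""

    def __init__(self, bonus_scale: float = 1.0):
        self.bonus_scale = bonus_scale
        self.visit_counts = defaultdict(int)

    def reward(self, agent_state) -> float:
        # Reward visiting novel, unexplored agent states more strongly.
        key = tuple(agent_state)
        self.visit_counts[key] += 1
        return self.bonus_scale / (self.visit_counts[key] ** 0.5)

r = ExplorationReward()
print(r.reward((3, 5)), r.reward((3, 5)))   # first visit pays more than the second
```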
SENSOR FUSION ARCHITECTURE FOR LOW-LATENCY ACCURATE ROAD USER DETECTION
Aspects described herein provide sensor data stream processing for enabling camera/radar sensor fusion, with application to road user detection in the context of Autonomous Driving/Assisted Driving (ADAS). In particular, a scheme is described to extract Regions of Interest (ROIs) from a high-resolution, high-dimensional radar data cube, which can then be transmitted to a sensor fusion unit. The ROI scheme extracts only the relevant information, thus reducing the latency and the data transmission rate to the sensor fusion module without trading off accuracy or detection rates. The sensor data stream processing comprises receiving a first data stream from a radar sensor, forming a point cloud by extracting 3D points from the 3D radar data cube, performing clustering on the point cloud to identify high-density regions representing one or more ROIs, and extracting one or more 3D bounding boxes from the 3D data cube corresponding to the one or more ROIs and classifying each ROI.
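A sketch of the ROI extraction pipeline under stated assumptions: the data cube is treated as a 3D power grid, points are thresholded out of it, and DBSCAN stands in for the unspecified clustering step; threshold and cluster parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def extract_rois(data_cube: np.ndarray, power_threshold: float,
                 eps: float = 2.0, min_samples: int = 5):
    """Illustrative sketch: threshold the radar data cube into a point cloud,
    cluster the points into high-density regions, and bound each region with
    an axis-aligned 3D box. The actual clustering method is not given in the
    abstract."""
    points = np.argwhere(data_cube > power_threshold)        # point cloud from the cube
    if len(points) == 0:
        return []
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    rois = []
    for cluster_id in set(labels) - {-1}:                    # -1 marks noise points
        cluster = points[labels == cluster_id]
        rois.append((cluster.min(axis=0), cluster.max(axis=0)))  # 3D bounding box corners
    return rois

cube = np.random.rand(64, 32, 8)                  # stand-in range x azimuth x elevation cube
boxes = extract_rois(cube, power_threshold=0.99)
```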
Quantum computing-based video alert system
A quantum computing-based video alert system converts captured video and audio signals, in real time, into a sequence of video qubits and a sequence of audio qubits. An entanglement score is generated based on a comparison of the video qubits to historical video qubits that are verified to show malicious activity. A second entanglement score is generated based on a comparison of the audio qubits to historical audio qubits that are verified to show malicious activity. A probability score is generated for each segment of the video qubit sequence and for each segment of the audio qubit sequence. If the probability score for the video qubit sequence, the probability score for the audio qubit sequence, or a combination of the probability scores for both sequences meets a threshold, an alert is generated to identify possible malicious activity at the location of the CCTV camera capturing the real-time data.
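Only the final thresholding logic lends itself to a short sketch; how the entanglement-based probability scores are derived from the qubit sequences is not reproduced here, and the equal-weight combination of the two scores is an assumption.

```python
def should_alert(video_scores, audio_scores, threshold: float,
                 combine=lambda v, a: 0.5 * v + 0.5 * a) -> bool:
    """Sketch of the alert decision only: per-segment probability scores for
    the video and audio qubit sequences are checked individually and in
    combination against a threshold."""
    for v_score, a_score in zip(video_scores, audio_scores):
        if v_score >= threshold or a_score >= threshold:
            return True                      # either modality alone crosses the threshold
        if combine(v_score, a_score) >= threshold:
            return True                      # combined score crosses the threshold
    return False

print(should_alert([0.2, 0.4], [0.3, 0.9], threshold=0.8))   # True: one audio segment
```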
MULTI-CHANNEL OBJECT MATCHING
A method may include obtaining first sensor data captured by a first sensor system and second sensor data captured by a second sensor system of a different type from the first sensor system. The method may include detecting a first object included in the first sensor data and a second object included in the second sensor data. The method may include assigning a first label to the first object and a second label to the second object after comparing the first and the second sensor data. The first and second labels may indicate degrees to which the first and the second objects match. Responsive to the first and second labels indicating that the first and the second objects match, the method may include designating a matched object representative of the first object and the second object and sending the matched object to a downstream computing system of an autonomous vehicle.
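A hedged sketch of the matching step, assuming 2D bounding-box detections and using intersection-over-union as the degree-of-match measure; the thresholds, labels, and the merge rule for the designated matched object are illustrative assumptions rather than the claimed method.

```python
def iou(box_a, box_b) -> float:
    """2D intersection-over-union for axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_objects(camera_box, lidar_box, high=0.7, low=0.3):
    """Illustrative sketch: overlap-based labels stand in for the degree-of-match
    labels; when the objects match, a single merged object is designated for the
    downstream autonomous-vehicle computing system."""
    score = iou(camera_box, lidar_box)
    label = "match" if score >= high else "partial" if score >= low else "no_match"
    if label == "match":
        # Designate the matched object, here simply as the union of both boxes.
        merged = [min(camera_box[0], lidar_box[0]), min(camera_box[1], lidar_box[1]),
                  max(camera_box[2], lidar_box[2]), max(camera_box[3], lidar_box[3])]
        return label, merged
    return label, None

print(match_objects((0, 0, 10, 10), (1, 1, 11, 11)))
```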