G06V10/806

Video description generation method and apparatus, video playing method and apparatus, and storage medium

The present disclosure discloses a video description generation method and apparatus, a video playing method and apparatus, and a computer-readable storage medium. The method includes: extracting video features, and obtaining a video feature sequence corresponding to video encoding moments in a video stream; encoding the video feature sequence by using a forward recurrent neural network and a backward recurrent neural network, to obtain a forward hidden state sequence and a backward hidden state sequence corresponding to each video encoding moment; and positioning, according to the forward hidden state sequence and the backward hidden state sequence, an event corresponding to each video encoding moment and an interval corresponding to the event at the video encoding moment, thereby predicting a video content description of the event. On the basis of distinguishing overlapping events, the interval corresponding to the event is introduced to predict and generate a word corresponding to the event at the video encoding moment, and events that overlap at the video encoding moment correspond to different intervals, so that the video content descriptions of events at this video encoding moment have a high degree of distinction. By analogy, events in the given video stream can be described more distinctively.
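The bidirectional encoding step can be illustrated with a minimal sketch. It assumes a toy element-wise tanh recurrence with scalar weights (`w_in` and `w_rec` are hypothetical names), not the learned recurrent networks the disclosure refers to; it only shows how one forward and one backward hidden state is obtained for each video encoding moment.

```python
import math


def rnn_pass(features, w_in, w_rec):
    """Run a toy element-wise tanh recurrence over a list of feature vectors."""
    hidden = [0.0] * len(features[0])
    states = []
    for x in features:
        hidden = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, hidden)]
        states.append(hidden)
    return states


def encode_bidirectional(features, w_in=0.5, w_rec=0.5):
    """Return (forward_states, backward_states), both aligned to the input order."""
    forward = rnn_pass(features, w_in, w_rec)
    # The backward pass reads the sequence from the end, then is re-reversed
    # so that index t in both lists refers to the same encoding moment.
    backward = rnn_pass(features[::-1], w_in, w_rec)[::-1]
    return forward, backward
```

At moment t, the forward state summarizes the features up to t and the backward state summarizes the features from t onward, which is what lets a later stage reason about an event interval around that moment.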

WORK ANALYSIS DEVICE, WORK ANALYSIS METHOD AND COMPUTER-READABLE MEDIUM
20220215327 · 2022-07-07

A work analysis device configured to analyze a work step that includes a plurality of processes, the work analysis device including: a reception unit configured to receive a captured image of a work area; a detector unit configured to parse the captured image and detect the position and orientation of a worker working in the work area; a determination unit configured to determine the process being performed by the worker on the basis of the position and orientation of the worker; and a generation unit configured to measure a work time for each of the processes and generate a time chart representing the processes in the work step performed by the worker.
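The time-chart generation can be sketched as follows, assuming the determination unit has already produced a time-ordered list of `(timestamp_seconds, process_name)` observations. The function names and the rule that a segment ends where the next one starts are illustrative assumptions, not details from the abstract.

```python
def build_time_chart(observations):
    """Collapse time-ordered (timestamp, process) observations into segments.

    Returns a list of (process, start, end) tuples. A segment ends where the
    next one starts; the last segment ends at the final timestamp (assumption).
    """
    segments = []
    for timestamp, process in observations:
        if segments and segments[-1][0] == process:
            continue  # still inside the same process
        segments.append([process, timestamp, None])
    for i, segment in enumerate(segments):
        if i + 1 < len(segments):
            segment[2] = segments[i + 1][1]
        else:
            segment[2] = observations[-1][0]
    return [tuple(segment) for segment in segments]


def work_time_per_process(segments):
    """Sum the duration of every segment belonging to each process."""
    totals = {}
    for process, start, end in segments:
        totals[process] = totals.get(process, 0) + (end - start)
    return totals
```

For example, a worker observed in "pick" from 0 s, in "assemble" from 2 s, and back in "pick" from 6 s to 8 s yields three chart segments and 4 s of total work time for each process.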

Map element extraction method and apparatus, and server

This application discloses a map element extraction method and apparatus, and a server. The map element extraction method includes obtaining a laser point cloud and an image of a target scene, the target scene including a map element; performing registration between the laser point cloud and the image to obtain a depth map of the image; performing image segmentation on the depth map of the image to obtain a segmented image of the map element in the depth map; and converting a two-dimensional location of the segmented image in the depth map to a three-dimensional location of the map element in the target scene according to a registration relationship between the laser point cloud and the image.
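The final 2D-to-3D conversion can be illustrated with a minimal sketch. It assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy` (hypothetical parameter names) and a depth map already registered to the image; the abstract itself only says the conversion follows the registration relationship, so this is one plausible instance rather than the claimed method.

```python
def pixel_to_camera_point(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with known depth into camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def map_element_points(mask, depth_map, fx, fy, cx, cy):
    """Collect a 3D point for every pixel the segmentation marked as a map element."""
    points = []
    for v, row in enumerate(mask):
        for u, is_element in enumerate(row):
            depth = depth_map[v][u]
            if is_element and depth > 0:  # skip pixels with no registered depth
                points.append(pixel_to_camera_point(u, v, depth, fx, fy, cx, cy))
    return points
```

The points come out in the camera frame; placing them in the target scene would additionally require the camera-to-scene transform obtained from the registration, which is omitted here.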

Target detection method based on fusion of prior positioning of millimeter-wave radar and visual feature

A target detection method based on the fusion of prior positioning of a millimeter-wave radar and a visual feature includes: simultaneously obtaining, based on the millimeter-wave radar and a vehicle-mounted camera after calibration, point cloud data of the millimeter-wave radar and a camera image; performing spatial 3D coordinate transformation on the point cloud data to project the transformed point cloud data onto a camera plane; generating a plurality of anchor samples based on the projected point cloud data according to a preset anchor strategy, and obtaining a final anchor sample based on a velocity-distance weight of each candidate region; fusing RGB information of the camera image and intensity information of an RCS in the point cloud data to obtain a feature of the final anchor sample; and inputting the feature of the final anchor sample into a detection network to generate category and position information of a target in a scenario.
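The projection of radar points onto the camera plane can be sketched as below. The sketch assumes a rigid radar-to-camera transform (`rotation` as a 3x3 nested list, `translation` as a 3-vector) obtained from calibration, a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, and a camera frame whose z axis points forward; these names and conventions are illustrative and do not come from the abstract.

```python
def project_radar_points(points, rotation, translation, fx, fy, cx, cy):
    """Project 3D radar points into pixel coordinates on the camera plane."""
    pixels = []
    for point in points:
        # Radar frame -> camera frame: p_cam = R * p_radar + t
        xc, yc, zc = (
            sum(r * p for r, p in zip(row, point)) + t
            for row, t in zip(rotation, translation)
        )
        if zc <= 0:
            continue  # point is behind the camera, so it cannot be imaged
        pixels.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return pixels
```

Each projected pixel could then serve as the center around which anchor samples are generated; the anchor strategy and the velocity-distance weighting are not specified in enough detail here to sketch.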

Systems and methods for privacy-enabled biometric processing
11392802 · 2022-07-19

In one embodiment, a set of feature vectors can be derived from any biometric data, and a deep neural network (“DNN”) applied to those one-way homomorphic encryptions (i.e., each biometric's feature vector) can then determine matches or execute searches on encrypted data. Each biometric's feature vector can then be stored and/or used in conjunction with respective classifications, for use in subsequent comparisons without fear of compromising the original biometric data. In various embodiments, the original biometric data is discarded responsive to generating the encrypted values. In another embodiment, the homomorphic encryption enables computations and comparisons on ciphertext without decryption. This improves security over conventional approaches. Searching biometrics in the clear on any system represents a significant security vulnerability. In various examples described herein, only the one-way encrypted biometric data is available on a given device. Various embodiments restrict execution to occur on encrypted biometrics for any matching or searching.

SYSTEM FOR ESTIMATING A USER'S RESPONSE TO A STIMULUS
20220248996 · 2022-08-11

A method is disclosed for training a system that measures or estimates, or both, a user's response to a stimulus and classifies the response. The system is trained using a test stimulus presented to one or more users. One or more images of the users' faces are captured and, simultaneously, EEG signals of the users are captured. One or more emotional features are derived from the facial data of the users. One or more cognitive features and emotional features are derived from the EEG signals of each of the users, and a training dataset is created by correlating the one or more emotional features from the facial data and the one or more cognitive and emotional features from the EEG signals with one or more features associated with the test stimulus. The created training dataset is used for measuring or estimating, or both, a user's response to the stimulus.

System and method for produce detection and classification

Systems, methods, and computer-readable storage media for object detection and classification, and particularly produce detection and classification. A system configured according to this disclosure can receive, at a processor, an image of an item. The system can then perform, across multiple pre-trained neural networks, feature detection on the image, resulting in feature maps of the image. These feature maps can be concatenated and combined, then input into an additional neural network for feature detection on the combined feature map, resulting in tiered neural network features. The system then classifies, via the processor, the item based on the tiered neural network features.

Automated mapping information generation from inter-connected images
11408738 · 2022-08-09

Techniques are described for using computing devices to perform automated operations to generate mapping information using inter-connected images of a defined area, and for using the generated mapping information in further automated manners. In at least some situations, the defined area includes an interior of a multi-room building, and the generated information includes a floor map of the building, such as from an automated analysis of multiple panorama images or other images acquired at various viewing locations within the building—in at least some such situations, the generating is further performed without having detailed information about distances from the images' viewing locations to walls or other objects in the surrounding building. The generated floor map and other mapping-related information may be used in various manners, including for controlling navigation of devices (e.g., autonomous vehicles), for display on one or more client devices in corresponding graphical user interfaces, etc.

Method and apparatus for grounding a target video clip in a video

A method and an apparatus for grounding a target video clip in a video are provided. The method includes: determining a current video clip in the video based on a current position; acquiring descriptive information indicative of a pre-generated target video clip descriptive feature; and executing a target video clip determining step, which includes: determining current state information of the current video clip, wherein the current state information includes information indicative of a feature of the current video clip; and generating a current action policy based on the descriptive information and the current state information, the current action policy being indicative of a position change of the current video clip in the video. The method further includes: in response to reaching a preset condition, using a video clip resulting from executing the current action policy on the current video clip as the target video clip.
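The iterative clip-determining step can be sketched as a loop that repeatedly applies an action to the current clip. Everything specific here is an assumption made for illustration: the action set in `ACTIONS`, the one-unit step size, the `policy` callable standing in for the policy generated from the descriptive information and current state, and the use of a "stop" action or a step limit as the preset condition.

```python
# Hypothetical action set: each action changes (start, end) by one unit.
ACTIONS = {
    "shift_left": (-1, -1),
    "shift_right": (1, 1),
    "expand": (-1, 1),
    "shrink": (1, -1),
}


def apply_action(clip, action, video_length):
    """Move or resize the clip, clamping it to the video bounds."""
    start, end = clip
    d_start, d_end = ACTIONS[action]
    new_start = max(0, start + d_start)
    new_end = min(video_length, end + d_end)
    if new_start >= new_end:
        return clip  # ignore actions that would produce an empty clip
    return (new_start, new_end)


def ground_clip(policy, video_length, initial_clip, max_steps=50):
    """Apply policy-chosen actions until 'stop' is returned or max_steps is hit."""
    clip = initial_clip
    for _ in range(max_steps):
        action = policy(clip)
        if action == "stop":
            break
        clip = apply_action(clip, action, video_length)
    return clip
```

In a real system the policy would depend on the clip's features and the query description rather than on the clip position alone; the callable signature here is deliberately simplified.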

Mobile device navigation system

Location mapping and navigation user interfaces may be generated and presented via mobile computing devices. A mobile device may detect its location and orientation using internal systems, and may capture image data using a device camera. The mobile device also may retrieve map information from a map server corresponding to the current location of the device. Using the image data captured at the device, the current location data, and the corresponding local map information, the mobile device may determine or update a current orientation reading for the device. Location errors and updated location data also may be determined for the device, and a map user interface may be generated and displayed on the mobile device using the updated device orientation and/or location data.