G06F18/253

VIDEO CLIP POSITIONING METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
20230024382 · 2023-01-26 ·

This application discloses a video clip positioning method performed by a computer device. In this application, clip features of video clips in a video are determined from the unit features of the video units within each clip, so that the acquired clip features integrate both the features of the video units and the time-sequence correlation between the video units. The clip features of the video clips are then fused with a text feature of a target text. Because the feature fusion process makes full use of clip-level features and the time-sequence correlation between the video clips, more accurate attention weights can be acquired from the fused features. The attention weights represent the matching degrees between the video clips and the target text, so a target video clip matching the target text can be positioned more accurately.
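As a rough illustration of the pipeline this abstract describes (mean pooling and dot-product similarity are simplified stand-ins chosen here, not the patented aggregation or fusion), clip features could be pooled from unit features and matched against a text feature like so:

```python
import numpy as np

def clip_features(unit_feats, clips):
    """Aggregate per-unit features into per-clip features.
    Mean pooling over each clip's units is an assumed, simplified aggregation."""
    return np.stack([unit_feats[start:end].mean(axis=0) for start, end in clips])

def attention_weights(clip_feats, text_feat):
    """Fuse clip features with the text feature via dot-product similarity,
    then normalize the similarities into attention weights (one per clip)."""
    sims = clip_feats @ text_feat
    w = np.exp(sims - sims.max())   # numerically stable softmax
    return w / w.sum()
```

The clip receiving the highest attention weight would then be positioned as the target clip for the text query.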

Facial beauty prediction method and device based on multi-task migration

Disclosed are a facial beauty prediction method and device based on multi-task migration. The method includes: performing similarity measurement based on a graph structure on a plurality of tasks to obtain an optimal combination of the plurality of tasks; constructing a facial beauty prediction model including a feature sharing layer based on the optimal combination; migrating feature parameters of an existing large-scale facial image network to the feature sharing layer of the facial beauty prediction model; inputting facial images for training to pre-train the facial beauty prediction model; and inputting a facial image to be tested into the trained facial beauty prediction model to obtain a facial beauty prediction result.

FACE DETECTION GUIDED SOUND SOURCE LOCALIZATION PAN ANGLE POST PROCESSING FOR SMART CAMERA TALKER TRACKING AND FRAMING
20230025997 · 2023-01-26 ·

A videoconferencing system includes a camera acquiring image data and a microphone array acquiring audio data. Image data is used in conjunction with sound source localization (SSL) data to locate a talker depicted in the image data. SSL processes the audio data and determines SSL pan angle values indicative of an estimated direction of a sound. Columns of pixels in an image are associated with bins. A bin count is incremented for each SSL pan angle value of the audio data that falls within a given bin. A bounding box in the image data is determined that encompasses a face depicted in the image data. A range of pixels is determined for the bounding box, such as extending from a leftmost column to a rightmost column. The bin with the highest bin count that also overlaps a range of pixels for a bounding box is deemed to contain the talker.
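The binning-and-overlap logic can be sketched as follows. This is a minimal illustration under assumed simplifications: the SSL pan angles are taken to be already projected onto pixel columns (a real system would map angles to columns via the camera's field of view), and a face box is reduced to its leftmost and rightmost columns:

```python
import numpy as np

def locate_talker(ssl_columns, face_boxes, image_width, num_bins=36):
    """Sketch: histogram SSL observations into pixel-column bins, then pick
    the face box overlapping the highest-count bin. `ssl_columns` are SSL pan
    angle values already projected to pixel columns (an assumption here);
    each face box is a (leftmost_column, rightmost_column) pair."""
    bin_edges = np.linspace(0, image_width, num_bins + 1)
    counts, _ = np.histogram(ssl_columns, bins=bin_edges)
    # Visit bins from highest count down; the first bin that overlaps a
    # face bounding box's column range is deemed to contain the talker.
    for b in np.argsort(counts)[::-1]:
        left, right = bin_edges[b], bin_edges[b + 1]
        for box in face_boxes:
            if box[0] < right and box[1] > left:   # interval overlap test
                return box
    return None
```

The histogram step is what accumulates evidence over many audio frames, so a brief off-axis noise burst cannot outvote a sustained talker direction.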

AGRICULTURAL HARVESTING MACHINE WITH PRE-EMERGENCE WEED DETECTION AND MITIGATION SYSTEM

An agricultural harvesting machine includes crop processing functionality configured to engage crop in a field, perform a crop processing operation on the crop, and move the processed crop to a harvested crop repository, and a control system configured to identify a weed seed area indicating presence of weed seeds, and generate a control signal associated with a pre-emergence weed seed treatment operation based on the identified weed seed area.

Detecting boxes

A method for detecting boxes includes receiving a plurality of image frame pairs for an area of interest including at least one target box. Each image frame pair includes a monocular image frame and a respective depth image frame. For each image frame pair, the method includes determining corners for a rectangle associated with the at least one target box within the respective monocular image frame. Based on the determined corners, the method includes the following: performing edge detection and determining faces within the respective monocular image frame; and extracting planes corresponding to the at least one target box from the respective depth image frame. The method includes matching the determined faces to the extracted planes and generating a box estimation based on the determined corners, the performed edge detection, and the matched faces of the at least one target box.
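The face-to-plane matching step might be approximated as below. This is a hypothetical sketch only: greedy nearest-centroid assignment is an assumed matching criterion (the abstract does not specify one), and faces/planes are reduced to 3D centroids:

```python
import numpy as np

def match_faces_to_planes(face_centroids, plane_centroids):
    """Greedily pair each detected face (from the monocular frame) with the
    nearest unused extracted plane (from the depth frame), by centroid
    distance. Returns (face_index, plane_index) pairs."""
    pairs, used = [], set()
    for i, f in enumerate(face_centroids):
        dists = [np.inf if j in used else np.linalg.norm(f - p)
                 for j, p in enumerate(plane_centroids)]
        j = int(np.argmin(dists))
        used.add(j)
        pairs.append((i, j))
    return pairs
```

A production system would likely also compare surface normals and extents before accepting a match, but centroid distance suffices to show the pairing structure.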

Method and apparatus for fusing position information, and non-transitory computer-readable recording medium

A method and an apparatus for fusing position information, and a non-transitory computer-readable recording medium, are provided. In the method, an input sentence is segmented into words to obtain a first sequence of words, and absolute position information of the words in the first sequence is generated. The words in the first sequence are then segmented into subwords to obtain a second sequence of subwords, and position information of the subwords in the second sequence is generated based on the absolute position information of the words to which the respective subwords belong. The position information of the subwords in the second sequence is then fused into a self-attention model to perform model training or model prediction.
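The core idea, each subword inheriting the absolute position of the word it came from, can be sketched in a few lines (the `tokenize` callable is a placeholder for whatever subword segmenter is used; it is not specified by the abstract):

```python
def subword_positions(words, tokenize):
    """For each word at absolute position `word_pos`, emit its subwords and
    assign every subword that same word-level position."""
    subwords, positions = [], []
    for word_pos, word in enumerate(words):
        for piece in tokenize(word):
            subwords.append(piece)
            positions.append(word_pos)   # subword inherits its word's position
    return subwords, positions
```

These word-level positions (rather than raw subword indices) are what would then be embedded and fused into the self-attention computation.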

AUGMENTING AUDIENCE MEMBER EMOTES IN LARGE-SCALE ELECTRONIC PRESENTATION
20230231730 · 2023-07-20 ·

A presentation service generates an audience interface for an electronic presentation. The audience interface may simulate an in-person presentation, including features such as a central presenter and seat locations for audience members. The audience members may select emotes which may be displayed in the audience interface. The emotes may indicate the audience members' opinion of the content being presented. The presentation service may enable chats between multiple audience members, grouping of audience members into private rooms, and other virtual simulations of functions corresponding to in-person presentations.

Method, apparatus, terminal, and storage medium for training model

This application discloses a method for training a model, performed at a computing device. The method includes: acquiring a template image and a test image; invoking a first object recognition model to process a feature of a tracked object in the template image to obtain a first reference response, and invoking a second object recognition model to process the same feature to obtain a second reference response; invoking the first model to process a feature of the tracked object in the test image to obtain a first test response, and the second model to process that feature to obtain a second test response; tracking the first test response to obtain a tracking response of the tracked object; and updating the first object recognition model based on the difference between the first and second reference responses, the difference between the first and second test responses, and the difference between a tracking label and the tracking response.
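The update signal combines three differences. A minimal sketch, assuming mean squared error as the difference measure (the abstract does not name one), looks like this:

```python
import numpy as np

def combined_update_loss(ref1, ref2, test1, test2, track_resp, track_label):
    """Sum the three differences the abstract names into one training loss
    for the first model. MSE is an assumed choice of difference measure."""
    ref_diff = np.mean((ref1 - ref2) ** 2)        # reference-response gap
    test_diff = np.mean((test1 - test2) ** 2)     # test-response gap
    track_diff = np.mean((track_resp - track_label) ** 2)  # tracking error
    return ref_diff + test_diff + track_diff
```

In practice each term would typically carry its own weight hyperparameter; equal weighting here is purely illustrative.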

Classifying time series image data

The present invention extends to methods, systems, and computer program products for classifying time series image data. Aspects of the invention include encoding motion information from video frames in an eccentricity map. An eccentricity map is essentially a static image that aggregates the apparent motion of objects, surfaces, and edges across a plurality of video frames. In general, eccentricity reflects how different a data point is from past readings of the same set of variables. Neural networks can be trained to detect and classify actions in videos from eccentricity maps: an eccentricity map is provided to a neural network as input, and the network's output indicates whether detected motion in a video is classified as an action, such as a hand gesture.

Device and method for detecting clinically important objects in medical images with distance-based decision stratification

A method for performing a computer-aided diagnosis (CAD) includes: acquiring a medical image set; generating a three-dimensional (3D) tumor distance map corresponding to the medical image set, each voxel of the tumor distance map representing a distance from the voxel to a nearest boundary of a primary tumor present in the medical image set; and performing neural-network processing of the medical image set to generate a predicted probability map to predict presence and locations of oncology significant lymph nodes (OSLNs) in the medical image set, wherein voxels in the medical image set are stratified and processed according to the tumor distance map.
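A tumor distance map of this kind can be illustrated with a brute-force, numpy-only sketch (workable only for tiny volumes; a real implementation would use a proper Euclidean distance transform, and voxel spacing is ignored here as a simplifying assumption):

```python
import numpy as np

def tumor_distance_map(tumor_mask):
    """For each voxel of a 3D boolean tumor mask, return the Euclidean
    distance to the nearest tumor-boundary voxel. A boundary voxel is a
    tumor voxel with at least one non-tumor 6-neighbor. Assumes the mask
    contains at least one tumor voxel."""
    padded = np.pad(tumor_mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1, 1:-1] & padded[2:, 1:-1, 1:-1] &
                padded[1:-1, :-2, 1:-1] & padded[1:-1, 2:, 1:-1] &
                padded[1:-1, 1:-1, :-2] & padded[1:-1, 1:-1, 2:])
    boundary = tumor_mask & ~interior
    bz, by, bx = np.nonzero(boundary)
    coords = np.stack(np.meshgrid(*[np.arange(s) for s in tumor_mask.shape],
                                  indexing="ij"), axis=-1)
    # Distance from every voxel to every boundary voxel; keep the minimum.
    diffs = coords[..., None, :] - np.stack([bz, by, bx], axis=-1)
    return np.linalg.norm(diffs, axis=-1).min(axis=-1)
```

Voxels could then be stratified by thresholding this map, for example `dist_map <= t` selecting a tumor-proximal stratum for separate network processing, in the spirit of the distance-based stratification described above.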