G06V20/41

MULTI-VIEW MULTI-TARGET ACTION RECOGNITION

Implementations generally perform robust multi-view multi-target action recognition using reconstructed 3-dimensional (3D) poses. In some implementations, a method includes obtaining a plurality of videos of a plurality of subjects in an environment, where at least one target subject of the plurality of subjects performs one or more actions in the environment. The method further includes tracking the at least one target subject across at least two cameras. The method further includes reconstructing a 3D model of the at least one target subject based on the plurality of videos and the tracking of the at least one target subject. The method further includes recognizing the one or more actions of the at least one target subject based on the reconstructing of the 3D model.
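The claimed pipeline (track a subject across cameras, fuse the views into 3D, then recognize the action from the reconstructed motion) can be sketched with a deliberately simplified toy: two idealized orthographic cameras, where the front camera observes (x, y) of one joint and the side camera observes (z, y). All function names, the camera model, and the jump/stand rule are illustrative assumptions, not the patented implementation.

```python
def reconstruct_3d(front_xy, side_zy):
    """Fuse synchronized 2D observations of one joint into a 3D track."""
    track = []
    for (x, y1), (z, y2) in zip(front_xy, side_zy):
        y = (y1 + y2) / 2.0  # average the vertical coordinate both views share
        track.append((x, y, z))
    return track

def recognize_action(track_3d, jump_thresh=0.5):
    """Label the track 'jump' if the vertical range exceeds a threshold."""
    ys = [p[1] for p in track_3d]
    return "jump" if max(ys) - min(ys) > jump_thresh else "stand"

track = reconstruct_3d([(0.0, 0.0), (0.0, 1.0)], [(2.0, 0.0), (2.0, 1.0)])
print(recognize_action(track))  # vertical range 1.0 > 0.5 -> "jump"
```

A real system would triangulate full skeletons from calibrated perspective cameras and feed the 3D pose sequence to a learned classifier; the sketch only shows the fuse-then-recognize data flow.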

FEW-SHOT ACTION RECOGNITION

Methods and systems of training a neural network include training a feature extractor and a classifier using a first set of training data that includes one or more base cases. The classifier is trained with few-shot adaptation using a second set of training data, smaller than the first set of training data, while keeping parameters of the feature extractor constant.
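The two-stage scheme (train extractor and classifier on base data, then adapt only the classifier on a small second set while the extractor stays frozen) can be sketched as follows, assuming a nearest-class-mean classifier on top of a fixed feature map. The feature function, class names, and data are illustrative stand-ins.

```python
def feature_extractor(x):
    # Stand-in for a network trained on the base classes; frozen afterwards.
    return [x[0] + x[1], x[0] - x[1]]

def fit_classifier(examples):
    """Few-shot adaptation: recompute one prototype (mean feature) per class.

    Only these prototypes change; the feature extractor is never touched.
    """
    protos = {}
    for label, xs in examples.items():
        feats = [feature_extractor(x) for x in xs]
        protos[label] = [sum(c) / len(c) for c in zip(*feats)]
    return protos

def classify(protos, x):
    f = feature_extractor(x)
    return min(protos,
               key=lambda l: sum((a - b) ** 2 for a, b in zip(f, protos[l])))

protos = fit_classifier({"wave": [(1.0, 0.0)], "clap": [(0.0, 1.0)]})
print(classify(protos, (0.9, 0.1)))  # nearest prototype -> "wave"
```

Freezing the extractor is what makes the adaptation few-shot-friendly: only a handful of class prototypes must be estimated from the small second training set.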

Person replacement utilizing deferred neural rendering

Techniques are disclosed for performing video synthesis of audiovisual content. In an example, a computing system may determine first parameters of a face and body of a source person from a first frame in a video shot. The system also determines second parameters of a face and body of a target person. The system determines that the target person is a replacement for the source person in the first frame. The system generates third parameters of the target person based on merging the first parameters with the second parameters. The system then performs deferred neural rendering of the target person based on a neural texture that corresponds to a texture space of the video shot. The system then outputs a second frame that shows the target person as the replacement for the source person.
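The parameter-merging step can be sketched as below, under the assumption that frame-dependent parameters (pose, expression) come from the source person while identity-dependent parameters (identity, shape) come from the target; the key names and split are illustrative, not the patent's actual parameterization, and the deferred neural rendering stage is omitted.

```python
def merge_parameters(source, target):
    """Build the third parameter set: target identity driven by source motion."""
    merged = dict(target)               # start from the target's identity/shape
    for key in ("pose", "expression"):  # copy the frame-dependent parameters
        merged[key] = source[key]
    return merged

src = {"identity": "A", "shape": [1.0], "pose": [0.3], "expression": [0.7]}
tgt = {"identity": "B", "shape": [2.0], "pose": [0.0], "expression": [0.0]}
print(merge_parameters(src, tgt))
```

The merged parameters would then drive a renderer conditioned on a learned neural texture for the shot, producing the output frame with the target person in place of the source person.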

Search results within segmented communication session content

Methods and systems provide for search results within segmented communication session content. In one embodiment, the system receives a transcript and video content of a communication session between participants, the transcript including timestamps for a number of utterances associated with speaking participants; processes the video content to extract textual content visible within the frames of the video content; segments frames of the video content into a number of contiguous topic segments; determines a title for each topic segment; assigns a category label for each topic segment; receives a request from a user to search for specified text within the video content; determines one or more titles or category labels for which a prediction of relatedness with the specified text is present; and presents content from at least one topic segment associated with the one or more titles or category labels for which a prediction of relatedness is present.
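The final search-and-present steps can be sketched as matching the user's query against each segment's title and category label. Token overlap below is a stand-in for the claimed relatedness prediction, and the segment data is illustrative.

```python
def related(query, text):
    """Crude relatedness stand-in: do the query and text share any token?"""
    q, t = set(query.lower().split()), set(text.lower().split())
    return bool(q & t)

def search_segments(segments, query):
    """Return topic segments whose title or category label relates to the query."""
    return [s for s in segments
            if related(query, s["title"]) or related(query, s["label"])]

segments = [
    {"title": "Quarterly budget review", "label": "finance", "start": 0},
    {"title": "Hiring plan", "label": "recruiting", "start": 300},
]
print([s["start"] for s in search_segments(segments, "budget")])  # -> [0]
```

A production system would replace the overlap test with a learned semantic relatedness model and present the matching segments' video content starting at their timestamps.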

Image processing apparatus, image processing method and medium

An object of one embodiment of the present disclosure is to provide a product with high added value to a user by preventing an unnatural character string from being combined, no character string from being combined, and the like in a case where there is no voice, or almost no voice, before or after an image selected from within a moving image. One embodiment of the present disclosure is an image processing apparatus including: a selection unit configured to select, from a moving image including a plurality of frames, a part of the moving image; an extraction unit configured to extract a voice during a predetermined time corresponding to the selected part of the moving image; and a combination unit configured to combine a character string based on the voice extracted by the extraction unit with the part of the moving image selected by the selection unit.
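The guard described above (combine a caption only when the time window around the selected part actually contains speech) can be sketched as follows. The energy-threshold voice detector, the padding, and the transcript lookup are illustrative assumptions.

```python
def has_voice(energy, window, thresh=0.1):
    """Assume per-second audio energy; treat any value above thresh as speech."""
    start, end = window
    return any(e > thresh for e in energy[start:end])

def caption_for(selection_time, energy, transcript, pad=2):
    """Return a caption for the selected part, or None when there is no voice."""
    window = (max(0, selection_time - pad), selection_time + pad)
    if not has_voice(energy, window):
        return None  # avoid combining an unnatural or empty character string
    return transcript.get(selection_time, "")

energy = [0.5, 0.6, 0.0, 0.0, 0.0, 0.0]
print(caption_for(1, energy, {1: "hello"}))   # speech nearby -> "hello"
print(caption_for(5, energy, {5: "unused"}))  # no speech nearby -> None
```

Returning None for the silent case models the stated object: no character string is forced onto a part of the moving image that has no corresponding voice.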

Systems and methods for video archive and data extraction
11580159 · 2023-02-14

Systems and methods for full motion video search are provided. In one aspect, a method includes receiving one or more search terms. The search terms include one or more of a characterization of the amount of man-made features in a video image and a characterization of the amount of natural features in the video image. The method further includes searching a full motion video database based on the one or more search terms.
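A query over the two characterizations named in the claim can be sketched as a filtered scan of the full motion video database. The score fields, thresholding, and records below are assumptions for illustration.

```python
def search_fmv(db, min_man_made=None, min_natural=None):
    """Return clip ids whose man-made/natural scores satisfy the search terms."""
    hits = []
    for clip in db:
        if min_man_made is not None and clip["man_made"] < min_man_made:
            continue
        if min_natural is not None and clip["natural"] < min_natural:
            continue
        hits.append(clip["id"])
    return hits

db = [
    {"id": "urban_01", "man_made": 0.9, "natural": 0.1},
    {"id": "forest_07", "man_made": 0.1, "natural": 0.9},
]
print(search_fmv(db, min_man_made=0.5))  # -> ['urban_01']
```

In practice the per-clip scores would come from an image classifier run at ingest time, so the search itself is a cheap metadata query rather than a scan of the video frames.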

Computer-implemented interfaces for identifying and revealing selected objects from video

A computer-implemented visual interface for identifying and revealing objects from video-based media provides visual cues to enable users to interact with video-based media. Objects in videos are inferred and identified based upon automatic interpretations of the video and/or audio that is associated with the video. The automatic interpretations may be performed by a computer-implemented neural network. The computer-implemented visual interface is integrated with the video to enable users to interact with the identified objects. User interactions with the visual interface may be through either touch or non-touch means. Information is delivered to users that is based upon the identified objects, including in augmented or virtual reality-based form, responsive to user interactions with the computer-implemented visual interface.

Imaging device, video retrieving method, video retrieving program, and information collecting device

A drive recorder according to an embodiment of the present disclosure includes: an imaging unit that is mounted on a vehicle and captures a video of the surroundings of the vehicle; a video recording unit in which the captured video data are recorded; a network connecting unit that receives accident information including a time and date when an accident occurred and a place where the accident occurred; and a video retrieving unit that determines whether any video data captured in a predetermined time period and in a predetermined region are available in the video data recorded in the video recording unit, the predetermined time period including the time and date when the accident occurred, the predetermined region including the place where the accident occurred.
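The retrieving unit's availability check can be sketched as an overlap test against the time window and region around the accident. The record layout, the 60-second pad, and the distance margin are illustrative assumptions.

```python
def clip_matches(clip, t_accident, place, t_pad=60, dist=0.01):
    """True when the clip falls in the predetermined time period and region."""
    in_time = abs(clip["time"] - t_accident) <= t_pad
    dx = clip["lat"] - place[0]
    dy = clip["lon"] - place[1]
    in_region = (dx * dx + dy * dy) ** 0.5 <= dist
    return in_time and in_region

def retrieve(clips, t_accident, place):
    """Return ids of recorded clips relevant to the reported accident."""
    return [c["id"] for c in clips if clip_matches(c, t_accident, place)]

clips = [{"id": 1, "time": 1000, "lat": 35.0, "lon": 139.0},
         {"id": 2, "time": 5000, "lat": 35.0, "lon": 139.0}]
print(retrieve(clips, 1030, (35.0, 139.0)))  # -> [1]
```

An empty result tells the information-collecting side that no matching footage is available on this vehicle, while a non-empty one identifies candidate clips to upload.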

Method and apparatus for locating video playing node, device and storage medium

The disclosure provides a method for locating a video playing node, and relates to the fields of big data and video processing. The method includes: selecting a target video from a plurality of videos; and sending the target video, a plurality of subtitle text segments of the target video, and start time information of each of the plurality of subtitle text segments to a client, to cause the client to display the plurality of subtitle text segments and determine, in response to a trigger operation on any one of the plurality of subtitle text segments, a start playing node of the target video based on the start time information of that subtitle text segment. The disclosure further provides an apparatus for locating a video playing node, an electronic device, and a storage medium.
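The client-side behavior reduces to a lookup: tapping a subtitle segment seeks the player to that segment's recorded start time. The segment data and handler name below are illustrative.

```python
segments = [
    {"text": "Opening remarks", "start": 0.0},
    {"text": "Main topic", "start": 42.5},
    {"text": "Q&A", "start": 310.0},
]

def on_subtitle_tap(segments, index):
    """Return the start playing node for the tapped subtitle segment."""
    return segments[index]["start"]

print(on_subtitle_tap(segments, 1))  # seek the player to 42.5
```

Because the start times are sent alongside the segments, locating the playing node needs no round trip to the server at tap time.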

Detection apparatus, detection method, and computer program product
11580739 · 2023-02-14

A detection apparatus includes one or more processors. The processors set at least one time-period candidate. The processors input, to a first model, a feature acquired from a plurality of time-series images and the time-period candidate; the first model outputs at least one first likelihood, indicating a likelihood of occurrence of at least one action previously determined as a detection target, and correction information for acquiring at least one correction time period resulting from correction of the at least one time-period candidate. The processors acquire the first likelihood and the correction information output from the first model. Based on the at least one correction time period acquired from the correction information and on the first likelihood, the processors detect the action included in the time-series images and a start time and a finish time of a time period of occurrence of the action.
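The detection flow (score each time-period candidate, refine it with the model's correction offsets, then threshold into detections with start/finish times) can be sketched with a toy stand-in for the first model. The per-frame feature, the hand-written "model", and the threshold are illustrative, not the learned model of the disclosure.

```python
def first_model(feature, candidate):
    """Stand-in first model: likelihood plus (start, finish) correction offsets."""
    start, end = candidate
    active = [i for i, f in enumerate(feature) if f > 0.5]
    if not active:
        return 0.0, (0, 0)
    likelihood = sum(feature[start:end]) / max(1, end - start)
    # Offsets that snap the candidate onto the active region.
    return likelihood, (active[0] - start, active[-1] + 1 - end)

def detect(feature, candidates, thresh=0.5):
    """Refine candidates with the correction info and keep likely detections."""
    detections = []
    for cand in candidates:
        likelihood, (d_start, d_end) = first_model(feature, cand)
        if likelihood >= thresh:
            detections.append((cand[0] + d_start, cand[1] + d_end))
    return detections

feature = [0.0, 0.9, 0.9, 0.9, 0.0, 0.0]
print(detect(feature, [(0, 5)]))  # candidate refined to the active span (1, 4)
```

The key idea the sketch preserves is that the model returns both a confidence and a correction for each candidate, so the final start and finish times come from the corrected period rather than from the original candidate.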