G06V20/49

IMAGE RETRIEVAL APPARATUS

An image retrieval apparatus includes a processor, and the processor performs a process including: determining an image in which a first characteristic object is included in a subject to be a first image, and determining an image that is captured after the first image and in which a second characteristic object is included in the subject to be a second image, from among a series of captured images; specifying images as an image group, the images being captured during a period after the first image is captured and before the second image is captured from among the series of captured images; and extracting a representative image from the image group.
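
The grouping step in this abstract can be illustrated with a minimal sketch. The abstract does not say how the characteristic objects are detected or how the representative image is chosen, so the predicates `has_first` and `has_second`, the `score` function, and the "highest score wins" rule are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def find_image_group(
    images: Sequence[T],
    has_first: Callable[[T], bool],
    has_second: Callable[[T], bool],
) -> List[T]:
    """Return the images captured strictly between the first and second image.

    `images` is assumed to be in capture order. `has_first` and `has_second`
    are hypothetical detectors for the two characteristic objects.
    """
    first_idx = next((i for i, img in enumerate(images) if has_first(img)), None)
    if first_idx is None:
        return []
    second_idx = next(
        (i for i in range(first_idx + 1, len(images)) if has_second(images[i])),
        None,
    )
    if second_idx is None:
        return []
    return list(images[first_idx + 1 : second_idx])


def pick_representative(group: Sequence[T], score: Callable[[T], float]) -> Optional[T]:
    """Pick the highest-scoring image (an assumed selection rule), or None."""
    return max(group, key=score) if group else None


images = [
    {"id": 0, "marker": None, "sharpness": 0.2},
    {"id": 1, "marker": "start", "sharpness": 0.5},
    {"id": 2, "marker": None, "sharpness": 0.4},
    {"id": 3, "marker": None, "sharpness": 0.9},
    {"id": 4, "marker": "end", "sharpness": 0.7},
    {"id": 5, "marker": None, "sharpness": 1.0},
]
group = find_image_group(
    images,
    has_first=lambda im: im["marker"] == "start",
    has_second=lambda im: im["marker"] == "end",
)
representative = pick_representative(group, score=lambda im: im["sharpness"])
```

Here the group holds the images with ids 2 and 3, and image 5 is never considered because it falls after the second image.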

VIDEO PROCESSING SYSTEM
20230239428 · 2023-07-27

A video processing system includes: an object movement information acquiring means for detecting a moving object moving in a plurality of segment regions from video data obtained by shooting a monitoring target area, and acquiring movement segment region information as object movement information, the movement segment region information representing segment regions where the detected moving object has moved; an object movement information and video data storing means for storing the object movement information in association with the video data corresponding to the object movement information; a retrieval condition inputting means for inputting a sequence of the segment regions as a retrieval condition; and a video data retrieving means for retrieving the object movement information in accordance with the retrieval condition and outputting video data stored in association with the retrieved object movement information, the object movement information being stored by the object movement information and video data storing means.
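
Retrieval by a sequence of segment regions could look roughly like the following sketch. The record layout (`MovementRecord`), the region labels, and the matching rule (the query must appear as a contiguous run in the stored path) are assumptions; the abstract only states that a sequence of segment regions is the retrieval condition.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MovementRecord:
    """Hypothetical stored record: a video plus the regions an object crossed."""
    video_id: str
    regions: List[str]


def contains_sequence(path: Sequence[str], query: Sequence[str]) -> bool:
    """True if `query` occurs as a contiguous run inside `path` (assumed rule)."""
    m = len(query)
    if m == 0:
        return True
    return any(list(path[i : i + m]) == list(query) for i in range(len(path) - m + 1))


def retrieve(records: Sequence[MovementRecord], query: Sequence[str]) -> List[str]:
    """Return ids of videos whose stored movement path matches the query."""
    return [r.video_id for r in records if contains_sequence(r.regions, query)]


records = [
    MovementRecord("cam1_0900", ["A", "B", "C"]),
    MovementRecord("cam1_0915", ["C", "B", "A"]),
    MovementRecord("cam1_0930", ["D", "A", "B"]),
]
hits = retrieve(records, ["A", "B"])
```

Because the order of regions matters, the second record (which passes through B and then A) is not returned for the query A then B.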

METHODS FOR ARTHROSCOPIC SURGERY VIDEO SEGMENTATION AND DEVICES THEREFOR

Methods, non-transitory computer readable media, and arthroscopic video segmentation apparatuses and systems that facilitate improved, automatic segmentation analysis of videos of arthroscopic procedures are disclosed. With this technology, a video feed of an arthroscopic surgery can be automatically segmented using machine learning models and one or more tags related to the segments can be associated with the video feed. The generated videos can be output in real time to provide segmented information related to the surgical procedure or can be saved with the one or more segments tagged for playback for training or informational purposes.

Cross-media measurement device and method
11570513 · 2023-01-31

A method of identifying media content presented on a display device includes determining a selected input source providing a video signal to the display device, and then selecting a first set of content identification rules when it is determined that the selected input source is a first input source, and selecting a second set of content identification rules when it is determined that the selected input source is a second input source. The method further comprises applying the selected first set or second set of content identification rules to the video signal in order to generate content identification data for the media content presented on the display device. Application of the content identification rules includes waiting for a trigger event and applying an algorithm to one or more frames of the video signal following the trigger event.
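
The rule selection and trigger handling described here can be sketched as follows. The `RuleSet` type, the source names, the string stand-ins for frames, and the `frames_after_trigger` parameter are all hypothetical; the abstract does not specify what the trigger events or identification algorithms are.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class RuleSet:
    """Hypothetical content identification rules for one input source."""
    is_trigger: Callable[[str], bool]   # detects the trigger event
    identify: Callable[[str], str]      # algorithm applied after the trigger
    frames_after_trigger: int = 1       # how many following frames to analyse


def identify_content(
    source: str, frames: Iterable[str], rule_sets: Dict[str, RuleSet]
) -> List[str]:
    """Select the rule set for `source`, wait for a trigger, then analyse frames."""
    rules = rule_sets[source]
    remaining = 0
    results: List[str] = []
    for frame in frames:
        if remaining > 0:
            results.append(rules.identify(frame))
            remaining -= 1
        elif rules.is_trigger(frame):
            remaining = rules.frames_after_trigger
    return results


rule_sets = {
    "hdmi": RuleSet(lambda f: f == "banner", lambda f: "hdmi:" + f),
    "tuner": RuleSet(lambda f: f == "channel_change", lambda f: "tuner:" + f, 2),
}
frames = ["a", "banner", "b", "channel_change", "c", "d", "e"]
hdmi_ids = identify_content("hdmi", frames, rule_sets)
tuner_ids = identify_content("tuner", frames, rule_sets)
```

The same frame stream yields different identification data depending on which input source is selected, which is the point of keeping one rule set per source.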

Learning apparatus and method for creating emotion expression video and apparatus and method for emotion expression video creation

A learning apparatus for creating an emotion expression video according to a disclosed embodiment includes first generative adversarial networks (GANs) that receive text for creating an emotion expression video, extract vector information by performing embedding on the input text, and create an image based on the extracted vector information, and second generative adversarial networks that receive an emotion expression image and a frame of a comparison video, and create a frame of the emotion expression video from the emotion expression image and the frame of the comparison video.

VIDEO CLIP POSITIONING METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
20230024382 · 2023-01-26

This application discloses a video clip positioning method performed by a computer device. In this application, clip features of video clips in a video are determined according to the unit features of video units within the video clips, so that the acquired clip features integrate the features of the video units and the time sequence correlation between the video units; and then the clip features of the video clips and a text feature of a target text are fused. The features of video clip dimensions and the time sequence correlation between the video clips are fully used in the feature fusion process, so that more accurate attention weights can be acquired based on the fused features. The attention weights are used to represent matching degrees between the video clips and the target text, and then a target video clip matching the target text can be positioned more accurately.
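
As a rough illustration of attention weights that express how well each clip matches the text, the sketch below fuses each clip feature with the text feature by element-wise product, sums the fused vector into a score, and normalises the scores with a softmax. This fusion rule is an assumption chosen for brevity; the application's actual fusion network is not described in the abstract.

```python
import math
from typing import List, Sequence


def attention_weights(
    clip_features: Sequence[Sequence[float]], text_feature: Sequence[float]
) -> List[float]:
    """Softmax over fused clip/text scores (an assumed, simplified fusion)."""
    scores = []
    for clip in clip_features:
        fused = [c * t for c, t in zip(clip, text_feature)]  # element-wise fusion
        scores.append(sum(fused))
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


clips = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
text = [1.0, 0.0]
weights = attention_weights(clips, text)
best_clip = max(range(len(weights)), key=weights.__getitem__)
```

The clip with the largest weight is taken as the best match for the target text; here that is the first clip, whose feature points in the same direction as the text feature.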

Method for searching video and equipment with video search function
11709890 · 2023-07-25

A method for searching a video and equipment with a video search function are provided. The method for searching a video includes constructing a video DB by analyzing continuity of a tag given to an appearing object and extracting section information about the tag, and detecting video information. An object may be recognized, a video database may be constructed, and a video may be searched on the basis of analysis based on an artificial intelligence (AI) model through a 5G network.
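
Extracting section information from the continuity of a tag can be sketched as below. The input format (one set of tags per frame) and the output format (tag, first frame, last frame) are assumptions made for illustration, and no tolerance for brief detection gaps is modelled.

```python
from typing import Dict, List, Sequence, Set, Tuple


def tag_sections(frame_tags: Sequence[Set[str]]) -> List[Tuple[str, int, int]]:
    """Turn per-frame tags into (tag, start_frame, end_frame) sections.

    A section ends as soon as the tag is missing from a frame.
    """
    sections: List[Tuple[str, int, int]] = []
    open_start: Dict[str, int] = {}
    for idx, tags in enumerate(frame_tags):
        # Close sections whose tag no longer appears in this frame.
        for tag in list(open_start):
            if tag not in tags:
                sections.append((tag, open_start.pop(tag), idx - 1))
        # Open sections for tags that have just appeared.
        for tag in tags:
            open_start.setdefault(tag, idx)
    last = len(frame_tags) - 1
    for tag, start in open_start.items():
        sections.append((tag, start, last))
    return sorted(sections, key=lambda s: (s[1], s[0]))


frame_tags = [{"dog"}, {"dog", "car"}, {"car"}, set(), {"dog"}]
sections = tag_sections(frame_tags)
```

For this input the result is three sections: "dog" over frames 0 to 1, "car" over frames 1 to 2, and "dog" again at frame 4. A search for an object can then jump straight to the matching sections instead of scanning every frame.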

SEMI-SUPERVISED VIDEO TEMPORAL ACTION RECOGNITION AND SEGMENTATION

Systems, apparatuses, and methods include technology that generates final frame predictions for a first plurality of frames of a video, where the first plurality of frames is associated with unlabeled data. The technology predicts an ordered list of actions for the first plurality of frames based on the final frame predictions, and temporally aligns the ordered list of actions to the final frame predictions to generate labels.
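
One common way to align an ordered list of actions to per-frame predictions is a monotone dynamic program, sketched below. This is a stand-in technique, not necessarily the one used in the application: it assumes each frame has a score per action, that every listed action covers at least one frame, and that the best labelling maximises the summed scores.

```python
from typing import Dict, List, Sequence


def align_actions(
    frame_scores: Sequence[Dict[str, float]], ordered_actions: Sequence[str]
) -> List[str]:
    """Assign one action per frame, keeping the given action order."""
    T, K = len(frame_scores), len(ordered_actions)
    if K == 0 or T < K:
        raise ValueError("need at least one frame per action")
    NEG = float("-inf")
    best = [[NEG] * K for _ in range(T)]    # best[t][k]: frame t labelled action k
    stay = [[True] * K for _ in range(T)]   # True if frame t-1 had the same action
    best[0][0] = frame_scores[0][ordered_actions[0]]
    for t in range(1, T):
        for k in range(K):
            prev = best[t - 1][k]
            advance = best[t - 1][k - 1] if k > 0 else NEG
            if advance > prev:
                prev, stay[t][k] = advance, False
            if prev > NEG:
                best[t][k] = prev + frame_scores[t][ordered_actions[k]]
    # Backtrack from the last action at the last frame.
    labels = [""] * T
    k = K - 1
    for t in range(T - 1, -1, -1):
        labels[t] = ordered_actions[k]
        if t > 0 and not stay[t][k]:
            k -= 1
    return labels


scores = [
    {"pour": 0.9, "stir": 0.1},
    {"pour": 0.8, "stir": 0.2},
    {"pour": 0.3, "stir": 0.7},
    {"pour": 0.4, "stir": 0.6},
]
labels = align_actions(scores, ["pour", "stir"])
```

The resulting per-frame labels can serve as pseudo-labels for frames that had no annotation, which matches the semi-supervised setting named in the title.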

Neural-Symbolic Action Transformers for Video Question Answering
20230027713 · 2023-01-26

Mechanisms are provided for performing artificial intelligence-based video question answering. A video parser parses an input video data sequence to generate situation data structure(s), each situation data structure comprising data elements corresponding to entities, and first relationships between entities, identified by the video parser as present in images of the input video data sequence. First machine learning computer model(s) operate on the situation data structure(s) to predict second relationship(s) between the situation data structure(s). Second machine learning computer model(s) execute on a received input question to predict an executable program to execute to answer the received question. The program is executed on the situation data structure(s) and predicted second relationship(s). An answer to the question is output based on results of executing the program.

ADDING AUGMENTED REALITY TO A SUB-VIEW OF A HIGH RESOLUTION CENTRAL VIDEO FEED

Techniques are disclosed to add augmented reality to a sub-view of a high resolution central video feed. In various embodiments, a central video feed is received from a first camera on a first recurring basis and time-stamped position information is received from a tracking system on a second recurring basis. The central video feed is calibrated against a spatial region encompassed by the central video feed. The received time-stamped position information and a determined plurality of tiles associated with at least one frame of the central video feed are used to define a first sub-view of the central video feed. The first sub-view and a homography defining placement of augmented reality elements on the at least one frame of the central video feed are provided as output to a device configured to use the first sub-view and the homography to display the first sub-view.