G06V20/49

OPTIMIZED VIDEO SEGMENTATION FOR COMPLETING TASKS

A computer-implemented method for segmenting and recombining video segments of an input video based on prerequisites is disclosed. The computer-implemented method includes classifying video segments of an input video with respective activities associated with the video segments. The computer-implemented method further includes determining one or more prerequisites for performing classified activities associated with video segments of the input video. The computer-implemented method further includes determining respective users which satisfy the one or more determined prerequisites for performing the classified activities associated with the video segments of the input video. The computer-implemented method further includes generating a new video for a user based, at least in part, on merging those video segments in which the user satisfies the one or more determined prerequisites for performing a classified activity associated with a video segment.
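
The selection step described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Segment` type, the `segments_for_user` helper, the activity names, and the representation of prerequisites as sets of skill labels are all assumptions made for the example. The actual merging of video data is left out; the sketch only returns the ordered segments that would be merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A classified video segment (hypothetical structure)."""
    start: float   # seconds
    end: float     # seconds
    activity: str  # label assigned by the classifier


def segments_for_user(segments, prerequisites, user_skills):
    """Keep, in original order, the segments whose prerequisites the user satisfies."""
    selected = []
    for seg in segments:
        required = prerequisites.get(seg.activity, set())
        if required <= user_skills:  # every prerequisite is satisfied
            selected.append(seg)
    return selected


segments = [
    Segment(0.0, 10.0, "measure"),
    Segment(10.0, 25.0, "solder"),
    Segment(25.0, 40.0, "assemble"),
]
# Assumed prerequisite table: activity -> set of required skills.
prerequisites = {
    "measure": set(),
    "solder": {"soldering_cert"},
    "assemble": {"toolkit"},
}
playlist = segments_for_user(segments, prerequisites, {"toolkit"})
```

Under these assumptions a user holding only `toolkit` would receive a new video built from the "measure" and "assemble" segments.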

METHOD AND APPARATUS FOR VIDEO RECOGNITION

The present techniques relate to a method and apparatus for video recognition, and in particular to a computer-implemented method for performing video recognition using a transformer-based machine learning (ML) model. Put another way, the present techniques provide new methods of image processing for automatically extracting feature information from a video.

DEEP LEARNING-BASED VIDEO EDITING METHOD, RELATED DEVICE, AND STORAGE MEDIUM

A deep learning-based video editing method can allow for automated editing of a video, reducing or eliminating user input, saving time and labor, and thereby improving video editing efficiency. Attribute recognition is performed on an object in a target video using a deep learning model. A target object is selected that satisfies an editing requirement of the target video. A plurality of groups of pictures associated with the target object are obtained from the target video by editing. An edited video corresponding to the target video is generated using the plurality of groups of pictures.

Search results within segmented communication session content

Methods and systems provide for search results within segmented communication session content. In one embodiment, the system receives a transcript and video content of a communication session between participants, the transcript including timestamps for a number of utterances associated with speaking participants; processes the video content to extract textual content visible within the frames of the video content; segments frames of the video content into a number of contiguous topic segments; determines a title for each topic segment; assigns a category label for each topic segment; receives a request from a user to search for specified text within the video content; determines one or more titles or category labels for which a prediction of relatedness with the specified text is present; and presents content from at least one topic segment associated with the one or more titles or category labels for which a prediction of relatedness is present.
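
The search step over topic segments might look like the sketch below. The abstract does not say how the "prediction of relatedness" is computed, so a simple token-overlap (Jaccard) score stands in for it here. The `relatedness` and `search_segments` functions, the threshold, and the dictionary layout of a segment are all assumptions.

```python
def relatedness(query, label):
    """Stand-in relatedness score: Jaccard overlap of lowercase tokens."""
    q = set(query.lower().split())
    l = set(label.lower().split())
    if not q or not l:
        return 0.0
    return len(q & l) / len(q | l)


def search_segments(query, segments, threshold=0.2):
    """Return topic segments whose title or category label is related to the query."""
    hits = []
    for seg in segments:
        score = max(relatedness(query, seg["title"]),
                    relatedness(query, seg["category"]))
        if score >= threshold:
            hits.append((score, seg))
    # Best match first.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for _, seg in hits]


# Hypothetical topic segments with titles and category labels already assigned.
topic_segments = [
    {"title": "Quarterly budget review", "category": "finance", "start": 0, "end": 300},
    {"title": "Hiring plan", "category": "staffing", "start": 300, "end": 600},
]
results = search_segments("budget review", topic_segments)
```

A production system would more plausibly use a learned text-similarity model in place of `relatedness`; the surrounding control flow would stay the same.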

Video visual relation detection methods and systems

Methods and systems for detecting visual relations in a video are disclosed. A method comprises: decomposing the video sequence into a plurality of segments; for each segment, detecting objects in frames of the segment; tracking the detected objects over the segment to form a set of object tracklets for the segment; for the detected objects, extracting object features; for pairs of object tracklets of the set of object tracklets, extracting relativity features indicative of a relation between the objects corresponding to the pair of object tracklets; forming relation feature vectors for pairs of object tracklets using the object features of objects corresponding to respective pairs of object tracklets and the relativity features of the respective pairs of object tracklets; generating a set of segment relation prediction results from the relation feature vectors; generating a set of visual relation instances for the video sequence by merging the segment prediction results from different segments; and generating a set of visual relation detection results from the set of visual relation instances.
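
The final merging step, in which segment-level predictions are combined into video-level relation instances, can be illustrated with a greedy sketch. It assumes that each segment prediction already carries globally consistent subject and object identifiers and a frame range, and that predictions with the same triplet are merged when their frame ranges touch or overlap. The tuple layout, the `merge_relations` name, and the score averaging are assumptions, not details taken from the abstract.

```python
def merge_relations(segment_predictions):
    """Merge per-segment (start, end, subject, predicate, object, score) tuples.

    Predictions with the same triplet are joined when the next one starts
    no later than the current instance ends; scores are averaged.
    """
    merged = []
    latest = {}  # triplet -> index of its most recent instance in `merged`
    for start, end, subj, pred, obj, score in sorted(segment_predictions):
        key = (subj, pred, obj)
        idx = latest.get(key)
        if idx is not None and start <= merged[idx]["end"]:
            inst = merged[idx]
            inst["end"] = max(inst["end"], end)
            inst["scores"].append(score)
        else:
            latest[key] = len(merged)
            merged.append({"triplet": key, "start": start, "end": end,
                           "scores": [score]})
    return [(m["triplet"], m["start"], m["end"],
             sum(m["scores"]) / len(m["scores"])) for m in merged]


predictions = [
    (0, 30, "person", "ride", "bike", 0.8),
    (15, 45, "person", "ride", "bike", 0.6),
    (60, 90, "person", "ride", "bike", 0.9),
    (0, 30, "dog", "follow", "person", 0.5),
]
instances = merge_relations(predictions)
```

Here the two overlapping "person ride bike" predictions collapse into one instance spanning frames 0 to 45, while the later one at frames 60 to 90 stays separate because of the gap.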

Detection apparatus, detection method, and computer program product
11580739 · 2023-02-14

A detection apparatus includes one or more processors. The processors set at least one time-period candidate. The processors input a feature acquired from a plurality of time-series images, together with the time-period candidate, to a first model. The first model outputs at least one first likelihood, indicating a likelihood of occurrence of at least one action previously determined as a detection target, and correction information for acquiring at least one correction time period resulting from correction of the at least one time-period candidate. The processors acquire the first likelihood and the correction information output from the first model. The processors detect, based on the first likelihood and on the at least one correction time period acquired based on the correction information, the action included in the time-series images and a start time and a finish time of a time period of occurrence of the action.
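
The post-processing of the model outputs can be sketched as below. The abstract does not specify the form of the correction information, so this example assumes it is a pair of offsets added to the start and end of each candidate. The model itself is not implemented; its outputs are supplied as plain data. The `detect_actions` name, the threshold, and the data layout are hypothetical.

```python
def detect_actions(candidates, model_outputs, threshold=0.5):
    """Apply corrections to time-period candidates and keep likely actions.

    candidates:    list of (start, end) in seconds
    model_outputs: list of (likelihoods, (d_start, d_end)), one per candidate,
                   where likelihoods maps action name -> probability
    """
    detections = []
    for (start, end), (likelihoods, (d_start, d_end)) in zip(candidates, model_outputs):
        # Assumed correction form: additive offsets on both boundaries.
        c_start, c_end = start + d_start, end + d_end
        if c_start >= c_end:
            continue  # correction produced an empty period
        for action, p in likelihoods.items():
            if p >= threshold:
                detections.append((action, c_start, c_end, p))
    return detections


candidates = [(2.0, 6.0), (10.0, 12.0)]
model_outputs = [
    ({"wave": 0.9, "jump": 0.1}, (0.5, -1.0)),
    ({"wave": 0.2, "jump": 0.3}, (0.0, 0.0)),
]
detections = detect_actions(candidates, model_outputs)
```

With these illustrative numbers only the first candidate yields a detection, and its corrected period runs from 2.5 s to 5.0 s.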

METHOD OF IDENTIFYING AN ABRIDGED VERSION OF A VIDEO
20230044011 · 2023-02-09

A computer-implemented method of identifying whether a target video comprises an abridged version of a reference video includes evaluating condition a) that the target video does not comprise all shots of the reference video; condition b) that the target video includes groups of consecutive shots also included in the reference video; and condition c) that all shots which are present in both the target video and the reference video are in the same order. The method further includes identifying whether the target video comprises an abridged version of the reference video; and outputting a result of the identifying. The target video is identified as comprising an abridged version of the reference video on condition that conditions a), b) and c) are met. Also provided is a data processing apparatus for performing the method; and a computer program and computer readable storage medium comprising instructions to perform the method.
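
Conditions a), b) and c) can be checked directly once each video is reduced to a sequence of shot identifiers. The sketch below assumes that shots have already been matched across the two videos and given unique identifiers, and it reads "groups of consecutive shots" as at least one run of `min_group` shots that are adjacent in both videos. Both of these, and the `is_abridged` name, are assumptions for illustration.

```python
def is_abridged(target, reference, min_group=2):
    """Return True if `target` looks like an abridged version of `reference`.

    Both arguments are lists of unique shot identifiers in playback order.
    """
    ref_pos = {shot: i for i, shot in enumerate(reference)}

    # a) the target does not contain every shot of the reference
    cond_a = not set(reference) <= set(target)

    # c) shots present in both videos appear in the same order
    shared = [ref_pos[s] for s in target if s in ref_pos]
    cond_c = all(x < y for x, y in zip(shared, shared[1:]))

    # b) the target contains a run of shots that are also consecutive in the reference
    longest = run = 0
    prev = None
    for shot in target:
        pos = ref_pos.get(shot)
        if pos is None:
            run = 0
        elif prev is not None and pos == prev + 1:
            run += 1
        else:
            run = 1
        prev = pos
        longest = max(longest, run)
    cond_b = longest >= min_group

    return cond_a and cond_b and cond_c


reference = ["s1", "s2", "s3", "s4", "s5"]
abridged = is_abridged(["s1", "s2", "s4", "s5"], reference)
reordered = is_abridged(["s4", "s5", "s1", "s2"], reference)
full_copy = is_abridged(list(reference), reference)
```

In this example the first target drops `s3` but keeps order and two runs of consecutive shots, so it qualifies; the reordered target fails condition c), and the full copy fails condition a).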

BEHAVIOR RECOGNITION METHOD AND SYSTEM, ELECTRONIC DEVICE AND COMPUTER-READABLE STORAGE MEDIUM
20230042187 · 2023-02-09

A behavior recognition method and system, including: dividing video data into a plurality of video clips, performing frame extraction processing on each video clip to obtain frame images, and performing optical flow extraction on the frame images to obtain optical flow images; performing feature extraction on the frame images and the optical flow images to obtain feature maps of the frame images and the optical flow images; performing spatio-temporal convolution processing on the feature maps of the frame images and the optical flow images, and determining a spatial prediction result and a temporal prediction result; fusing the spatial prediction results of all the video clips to obtain a spatial fusion result, and fusing the temporal prediction results of all the video clips to obtain a temporal fusion result; and performing two-stream fusion on the spatial fusion result and the temporal fusion result to obtain a behavior recognition result.
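
The two fusion stages at the end of this pipeline can be illustrated with a small sketch. The abstract does not name the fusion operators, so this example assumes that per-clip class scores are averaged within each stream and that the two streams are then combined by a weighted sum. The function names and the `spatial_weight` parameter are hypothetical, and the per-clip scores are given as plain lists rather than produced by a network.

```python
def average(score_lists):
    """Element-wise mean over a list of equal-length score vectors."""
    n = len(score_lists)
    return [sum(column) / n for column in zip(*score_lists)]


def two_stream_fusion(spatial_per_clip, temporal_per_clip, spatial_weight=0.5):
    """Fuse clip-level scores per stream, then fuse the two streams.

    Returns (predicted_class_index, fused_scores).
    """
    spatial = average(spatial_per_clip)    # spatial fusion result
    temporal = average(temporal_per_clip)  # temporal fusion result
    fused = [spatial_weight * s + (1.0 - spatial_weight) * t
             for s, t in zip(spatial, temporal)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Illustrative class scores for two clips and two behavior classes.
spatial_scores = [[0.6, 0.4], [0.8, 0.2]]
temporal_scores = [[0.5, 0.5], [0.7, 0.3]]
label, fused = two_stream_fusion(spatial_scores, temporal_scores)
```

Other choices, such as taking the maximum over clips or learning the stream weights, would fit the same structure.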

Methods and apparatus to detect commercial advertisements associated with media presentations

Methods and apparatus to detect commercial advertisements associated with media presentations are disclosed. An example method involves receiving a video frame and detecting a change in box-formatting between the video frame and a subsequent video frame. A transition between the video frame and the subsequent video frame is indicated as a commercial advertisement transition based on the detected change in box-formatting.
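
One way to detect a change in box-formatting is to measure the dark bars at the top and bottom of each frame and compare them across consecutive frames. The sketch below assumes grayscale frames given as lists of pixel rows, and treats a row as part of a bar when every pixel is at or below a darkness threshold. The function names, the threshold, and the tolerance are assumptions; the abstract does not describe how the formatting is measured.

```python
def bar_rows(frame, dark=16):
    """Count fully dark rows at the top and bottom of a grayscale frame."""
    def is_dark(row):
        return max(row) <= dark

    top = 0
    while top < len(frame) and is_dark(frame[top]):
        top += 1
    bottom = 0
    # Stop before re-counting rows already attributed to the top bar.
    while bottom < len(frame) - top and is_dark(frame[-1 - bottom]):
        bottom += 1
    return top, bottom


def is_ad_transition(frame, next_frame, tolerance=1):
    """Flag a transition when the letterbox bars change by more than `tolerance` rows."""
    top0, bottom0 = bar_rows(frame)
    top1, bottom1 = bar_rows(next_frame)
    return abs(top0 - top1) > tolerance or abs(bottom0 - bottom1) > tolerance


# Synthetic 8x8 frames: one letterboxed, one full-frame.
letterboxed = [[0] * 8] * 2 + [[120] * 8] * 4 + [[0] * 8] * 2
full_frame = [[120] * 8] * 8
changed = is_ad_transition(letterboxed, full_frame)
unchanged = is_ad_transition(letterboxed, letterboxed)
```

The same idea extends to pillarbox bars by scanning columns instead of rows.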

Scene change method and system combining instance segmentation and cycle generative adversarial networks

A scene change method and system combining instance segmentation and cycle generative adversarial networks are provided. The method includes: processing a video of a target scene and then inputting the video into an instance segmentation network to obtain segmented scene components, that is, obtain mask cut images of the target scene; and processing targets in the mask cut images of the target scene by using cycle generative adversarial networks according to the requirements of temporal attributes to generate data in a style-migrated state, and generating style-migrated targets with unfixed spatial attributes into a style-migrated static scene according to a specific spatial trajectory to achieve a scene change effect.