G06V20/47

Generating a video segment of an action from a video

A computer-implemented method includes receiving a video that includes multiple frames. The method further includes identifying a start time and an end time of each action in the video based on application of one or more of an audio classifier, an RGB classifier, and a motion classifier. The method further includes identifying video segments from the video that include frames between the start time and the end time for each action in the video. The method further includes generating a confidence score for each of the video segments based on a probability that a corresponding action corresponds to one or more of a set of predetermined actions. The method further includes selecting a subset of the video segments based on the confidence score for each of the video segments.
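The confidence-scoring and subset-selection steps could be sketched as follows; the segment boundaries, probability values, and the 0.6 threshold are illustrative assumptions, not taken from the patent.

```python
def select_segments(segments, threshold=0.6):
    """Return segments whose confidence score meets the threshold.

    Each segment is (start_time, end_time, {action: probability}),
    where the probabilities are over a set of predetermined actions.
    The confidence score is the highest such probability.
    """
    scored = []
    for start, end, probs in segments:
        confidence = max(probs.values())
        if confidence >= threshold:
            scored.append((start, end, confidence))
    # Highest-confidence segments first
    return sorted(scored, key=lambda s: s[2], reverse=True)

segments = [
    (0.0, 4.2, {"jump": 0.9, "run": 0.05}),
    (5.0, 7.5, {"jump": 0.3, "run": 0.4}),
    (8.0, 12.0, {"run": 0.75, "jump": 0.1}),
]
print(select_segments(segments))
```

A real implementation would obtain the per-action probabilities from the audio, RGB, and motion classifiers named in the abstract; here they are supplied directly.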

Systems and methods for generating music recommendations

Systems, methods, and non-transitory computer-readable media can be configured to determine a video embedding for a video content item based at least in part on a first machine learning model. A set of music embeddings can be determined for a set of music content items based at least in part on a second machine learning model. The set of music content items can be ranked based at least in part on the video embedding and the set of music embeddings.
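The ranking step could be sketched with cosine similarity between the video embedding and each music embedding; the similarity metric, embedding dimensions, and item names are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_music(video_emb, music_embs):
    """Rank music items by similarity of their embedding to the video's.

    music_embs maps item name -> embedding vector.
    """
    ranked = sorted(music_embs.items(),
                    key=lambda kv: cosine(video_emb, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked]
```

In the described system the two embeddings would come from separate machine learning models; the vectors here stand in for their outputs.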

Method and apparatus for data processing, and method and apparatus for video cover generation

Embodiments of the disclosure provide methods and apparatuses for data processing and video cover generation. The method for video cover generation comprises: receiving a video for processing, a processing request for the video, and a video cover generation condition; segmenting the video based on the processing request to obtain a sequence of target clips; scoring the target clips in the sequence based on a pre-configured algorithm; weighting the scores of target clips that satisfy the video cover generation condition, and then weighting the scores of target clips associated with the video's category; and generating, based on the weighting result, a personalized video cover that attracts more user attention.
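The two-stage weighting could be sketched as follows; the clip representation, tag names, and boost factors are illustrative assumptions, not values from the patent.

```python
def pick_cover(clips, condition, category,
               condition_boost=1.5, category_boost=1.2):
    """Weight clip scores and return the id of the best-scoring clip.

    clips: list of dicts with 'id', 'score', and 'tags' (a set).
    condition: tag a clip must carry to receive the cover-condition boost.
    category: the video's category tag; matching clips get a second boost.
    """
    best_id, best_score = None, float("-inf")
    for clip in clips:
        score = clip["score"]
        if condition in clip["tags"]:
            score *= condition_boost   # first weighting stage
        if category in clip["tags"]:
            score *= category_boost    # second weighting stage
        if score > best_score:
            best_id, best_score = clip["id"], score
    return best_id
```

With boosts of 1.5 and 1.2, a clip matching both the condition and the category can outscore a clip with a higher raw score that matches only the condition.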

TEXT-BASED FRAMEWORK FOR VIDEO OBJECT SELECTION

Embodiments are disclosed of a method that receives a user input and an input video comprising multiple frames. The method may include extracting a text feature from the user input. The method may further include extracting a plurality of image features from the frames. The method may further include identifying one or more keyframes, from the frames, that include the object described by the user input. The method may further include clustering the one or more keyframes into one or more groups. The method may further include generating a plurality of segmentation masks for each group. The method may further include determining a set of reference masks corresponding to the user input and the object. The method may further include generating a set of fusion masks by combining the plurality of segmentation masks and the set of reference masks. The method may further include propagating the set of fusion masks and outputting a final set of masks.
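The mask-fusion step could be sketched as a pixel-wise combination of a segmentation mask with a reference mask. Intersection is used here as one simple fusion rule; the patent does not fix a particular rule, and the binary list-of-lists mask representation is an illustrative assumption.

```python
def fuse_masks(seg_mask, ref_mask):
    """Fuse a segmentation mask with a reference mask by pixel-wise
    intersection. Masks are equal-sized 2D lists of 0/1 values."""
    return [[s & r for s, r in zip(srow, rrow)]
            for srow, rrow in zip(seg_mask, ref_mask)]
```

A production system would operate on tensor masks per frame and propagate the fused result across the video, as the abstract describes.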

System and method of generating video from video clips based on moments of interest within the video clips
11468914 · 2022-10-11

Videos may be automatically generated using a set of video clips. Individual moments of interest may be identified within individual video clips of a set of video clips. A moment of interest may correspond to a point in time within a video clip. The point in time may be associated with one or more values of one or more attributes of the video clip. Individual moments of interest may be associated with individual portions of a video. The video may be generated using the set of video clips based on the associations.
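The moment-of-interest selection could be sketched as follows; the per-second attribute samples, the single "interest" attribute, and the 0.8 threshold are illustrative assumptions.

```python
def assemble_video(clips, threshold=0.8):
    """Map each clip to its highest-valued moment of interest.

    clips: list of (clip_name, samples), where samples is a list of
    (timestamp, attribute_value) pairs. Clips with no value at or
    above the threshold are skipped.
    Returns a list of (clip_name, timestamp) portions for the video.
    """
    portions = []
    for name, samples in clips:
        peaks = [(v, t) for t, v in samples if v >= threshold]
        if peaks:
            _, t = max(peaks)   # highest attribute value wins
            portions.append((name, t))
    return portions
```

The returned (clip, timestamp) pairs correspond to the abstract's association of moments of interest with portions of the generated video.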

METHOD OF PROCESSING MULTIMEDIA DATA, DEVICE AND MEDIUM

A method of processing multimedia data, a device, and a medium, relating to the field of artificial intelligence technology, in particular to knowledge graphs and deep learning. The method of processing the multimedia data includes: recognizing the multimedia data to obtain at least one piece of key information of the multimedia data; querying a predetermined knowledge base according to the at least one piece of key information to determine a multimedia name associated with the key information and an association degree between the multimedia name and the key information; and, in response to the association degree being less than a first threshold value, determining a name for the multimedia data from the multimedia name based on a similarity between alternative multimedia data for the multimedia name and the multimedia data.
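The threshold-then-fallback logic could be sketched as follows; the knowledge-base shape, the candidate names, the precomputed similarity scores, and the 0.7 threshold are all illustrative assumptions.

```python
def name_multimedia(key_info, knowledge_base, candidates, similarity,
                    first_threshold=0.7):
    """Pick a name for multimedia data.

    knowledge_base maps key information -> (name, association_degree).
    If the association degree meets the threshold, the associated name
    is used directly; otherwise fall back to the candidate whose
    alternative multimedia data is most similar to the input data.
    similarity maps candidate names -> a precomputed similarity score.
    """
    name, degree = knowledge_base[key_info]
    if degree >= first_threshold:
        return name
    return max(candidates, key=lambda n: similarity[n])
```

In the described method the similarity would be computed between the input data and the alternative multimedia data itself; a score lookup stands in for that computation here.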

Method and apparatus for filtering video

An artificial intelligence (AI) system that simulates functions of the human brain, such as recognition and determination, by using a machine learning algorithm such as deep learning, and an application thereof, are provided. A method of filtering video by a device is provided. The method includes selecting at least one previous frame preceding a current frame being played from among a plurality of frames included in the video, generating metadata regarding the selected at least one previous frame, predicting harmfulness of at least one next frame to be displayed on the device after playback of the current frame based on the generated metadata, and filtering the next frame based on the predicted harmfulness.
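The predict-then-filter step could be sketched as follows. A running average of prior harmfulness scores stands in for the metadata-based prediction model, and the 0.5 threshold and "BLOCKED" placeholder are illustrative assumptions.

```python
def filter_next_frames(prev_harm_scores, next_frames, threshold=0.5):
    """Predict harmfulness of upcoming frames from scores of previously
    played frames (a simple running average here), and filter the
    upcoming frames if the prediction exceeds the threshold."""
    predicted = sum(prev_harm_scores) / len(prev_harm_scores)
    if predicted > threshold:
        return ["BLOCKED" for _ in next_frames]
    return list(next_frames)
```

The described system would derive richer metadata from the previous frames and likely filter per-frame rather than all-or-nothing; this sketch keeps only the predictive-filtering shape.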

On-line video filtering
11468679 · 2022-10-11

Some embodiments relate to a system and method for speeding up a computer's determination of whether a video contains particular content. In some embodiments, the quantity of data in the video is first reduced while preserving the searched-for content. Optionally, the size of the data is reduced by lowering the resolution, for example without searching and/or processing the full data set. Additionally or alternatively, low-quality and/or empty data is removed from the dataset. Additionally or alternatively, redundant data may be detected and/or removed. Optionally, after data reduction, the reduced dataset is analyzed to determine whether it contains the searched-for content. Optionally, an estimate is made of the probability that the full dataset contains the searched-for content.
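The three reduction passes could be sketched over a simplified frame representation; the (id, quality, content-hash) tuples, the every-other-frame downsampling, and the quality cutoff are illustrative assumptions.

```python
def reduce_dataset(frames, scale=2, min_quality=0.2):
    """Shrink a video dataset before content search.

    frames: list of (frame_id, quality, content_hash) tuples.
    Keeps every `scale`-th frame (downsampling), drops low-quality or
    empty frames, and drops exact duplicates by content hash.
    Returns the ids of the surviving frames.
    """
    reduced, seen = [], set()
    for i, (fid, quality, chash) in enumerate(frames):
        if i % scale:              # downsampling pass
            continue
        if quality < min_quality:  # low-quality / empty data pass
            continue
        if chash in seen:          # redundancy pass
            continue
        seen.add(chash)
        reduced.append(fid)
    return reduced
```

Only the reduced set would then be analyzed for the searched-for content, with the hit rate on the reduced set giving an estimate for the full dataset.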

Automated media editing operations in consumer devices

Techniques are disclosed for managing video captured by an imaging device. The disclosed methods capture a video in response to a capture command received at the imaging device. Following video capture, techniques are disclosed for classifying the captured video based on feature(s) extracted therefrom, for marking the captured video based on the classification, and for generating a media item from the captured video according to the marking. Accordingly, the captured video may be classified as representing a static event, and, as a result, a media item of a still image may be generated. Otherwise, the captured video may be classified as representing a dynamic event, and, as a result, a media item of a video may be generated.
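The static-versus-dynamic decision could be sketched as follows; using mean inter-frame difference as the extracted feature, and the 0.1 threshold, are illustrative assumptions.

```python
def media_item_for(frame_diffs, motion_threshold=0.1):
    """Classify a captured video as a static or dynamic event and pick
    the media item to generate.

    frame_diffs: per-pair inter-frame difference values in [0, 1].
    A low mean difference suggests a static event (generate a still
    image); otherwise the event is dynamic (keep it as a video).
    """
    mean_motion = sum(frame_diffs) / len(frame_diffs)
    return "still_image" if mean_motion < motion_threshold else "video"
```

A consumer device would compute the differences from the decoded frames; precomputed values stand in for that step here.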

Systems and methods for generating improved content based on matching mappings
11604827 · 2023-03-14

Systems and methods are disclosed herein for generating content based on matching mappings by implementing deconstruction and reconstruction techniques. The system may retrieve a first content structure that includes a first object with a first mapping that includes a first list of attribute values. The system may then search content structures for a matching content structure having a second object with a second list of attributes and a second mapping including second attribute values corresponding to the second list of attributes. Upon finding a match, the system may generate a new content structure having the first object from the first content structure with the second mapping from the matching content structure. The system may then generate for output a new content segment based on the newly generated content structure.
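The match-and-swap step could be sketched over dictionary-based structures; the field names ("object", "mapping") and the rule that a match means identical attribute lists are illustrative assumptions.

```python
def reconstruct(first_structure, candidates):
    """Find a candidate structure whose attribute list matches the
    first structure's, then build a new structure that pairs the first
    object with the matching structure's attribute values.

    Structures are dicts: {"object": name, "mapping": {attr: value}}.
    Returns None when no candidate's attribute list matches.
    """
    target_attrs = set(first_structure["mapping"].keys())
    for cand in candidates:
        if set(cand["mapping"].keys()) == target_attrs:
            return {"object": first_structure["object"],
                    "mapping": cand["mapping"]}
    return None
```

The new structure, combining the first object with the second mapping, corresponds to the content structure from which the abstract's new content segment would be rendered.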