G06V20/48

Method and system of pushing video viewfinder

The present disclosure describes techniques of pushing information associated with at least one location that is associated with a video. The disclosed techniques comprise obtaining video data, wherein the video data comprise a plurality of frames of a video and information associated with the video; determining at least one location associated with at least one frame among the plurality of frames of the video based on comparing the video data with data included in a database; determining information associated with the at least one location; and pushing the information associated with the at least one location to a first computing device based on a time point of playing the at least one frame among the plurality of frames of the video.
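The flow above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the signature-based matching, the data structures, and all names (`match_locations`, `pushes_for_playback`, the example database entries) are assumptions introduced only to make the steps concrete.

```python
from dataclasses import dataclass

@dataclass
class LocationInfo:
    # Hypothetical record for "information associated with a location".
    name: str
    details: str

def match_locations(frame_signatures, location_db):
    """Compare per-frame signatures against a database of known location
    signatures; return {frame_index: location_name} for the matches."""
    return {
        i: location_db[sig]
        for i, sig in enumerate(frame_signatures)
        if sig in location_db
    }

def pushes_for_playback(frame_locations, fps, info_by_location):
    """Schedule a push of location info at the time point (seconds)
    when each matched frame is played."""
    return [
        (i / fps, info_by_location[loc])
        for i, loc in sorted(frame_locations.items())
    ]

# Example data (entirely made up for illustration).
db = {"sig_a": "Eiffel Tower"}
info = {"Eiffel Tower": LocationInfo("Eiffel Tower", "Paris landmark")}
matched = match_locations(["sig_x", "sig_a"], db)
schedule = pushes_for_playback(matched, fps=25.0, info_by_location=info)
```

Here frame 1 of a 25 fps video matches the database, so its location info would be pushed at the 0.04-second mark of playback.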

Audiovisual source separation and localization using generative adversarial networks

A method (and structure and computer product) for audiovisual source separation processing includes receiving video data showing images of a plurality of sound sources into a video encoder, while concurrently receiving into the video encoder optical flow data of the video data, the optical flow data indicating motions of pixels between frames of the video data. The video encoder encodes the received video data into video localization data comprising information associating pixels in the frames of video data with different channels of sound and encodes the received optical flow data into video separation data comprising information associating motion information in the frames of video data with the different channels of sound.

SYSTEMS AND METHODS FOR PIRACY DETECTION AND PREVENTION
20220358762 · 2022-11-10

Examples of the present disclosure describe systems and methods for detecting and preventing digital media piracy. In example aspects, a machine learning model is trained on a dataset related to digital media content. Input data may then be collected by a data collection engine and provided to a multimedia processor. The multimedia processor may extract multimedia features (e.g., audio, visual, etc.) and recognized patterns from the input data and provide the extracted multimedia features to a trained machine learning model. The trained machine learning model may compare the extracted features to the model, and a confidence value may be generated. The confidence value may be compared to a confidence threshold. If the confidence value equals or exceeds the confidence threshold, the input data may be classified as pirated digital media. Remedial action response(s) may subsequently be deployed to thwart the piracy of the digital media.
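The threshold decision at the end of this pipeline can be sketched in a few lines. The threshold value and the remedial responses below are assumptions; the abstract leaves both unspecified.

```python
def classify_piracy(confidence: float, threshold: float = 0.8) -> str:
    """Classify input data as pirated when the model's confidence value
    equals or exceeds the threshold (0.8 is an assumed default)."""
    return "pirated" if confidence >= threshold else "not_pirated"

def remedial_actions(label: str) -> list:
    """Hypothetical remedial action responses for pirated content;
    the patent does not enumerate specific actions."""
    return ["takedown_notice", "stream_block"] if label == "pirated" else []
```

Note that the comparison is inclusive (`>=`), matching the "equal to or exceeds" condition in the abstract.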

Systems and methods for mixing different videos
11581018 · 2023-02-14

There are provided methods and systems for media processing, comprising: providing, via a network to a client device, at least one media asset source selected from a media asset sources library, the at least one media asset source comprising at least one source video; receiving, via the network from the client device, a media recording comprising a client video recorded by a user of the client device; transcoding the at least one source video and the client video, which includes parsing the client video and the source video, respectively, into a plurality of client video frames and a plurality of source video frames; segmenting one or more frames of the plurality of source video frames into one or more character frames and background frames; detecting one or more face images in one or more frames of the plurality of client video frames and providing face markers; resizing the one or more character frames according to the face markers; compositing the resized character frames with the background frames using one or more blending methods to yield mixed media asset frames; and encoding the mixed media asset frames to yield a mixed media asset video.
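The compositing step can be illustrated with a simple alpha blend. This is only one possible stand-in for the unspecified "blending methods"; the frame representation (rows of RGB tuples) and the alpha value are assumptions.

```python
def blend(character_px, background_px, alpha):
    """Alpha-blend one character pixel over one background pixel;
    each pixel is an (r, g, b) tuple of 0-255 ints."""
    return tuple(
        round(alpha * c + (1 - alpha) * b)
        for c, b in zip(character_px, background_px)
    )

def composite(character_frame, background_frame, alpha=0.7):
    """Composite a resized character frame onto a background frame,
    pixel by pixel (frames are lists of rows of RGB tuples)."""
    return [
        [blend(c, b, alpha) for c, b in zip(crow, brow)]
        for crow, brow in zip(character_frame, background_frame)
    ]
```

A production system would more likely use a per-pixel alpha matte from the segmentation step rather than a single global alpha, but the structure is the same.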

AUDIOVISUAL SOURCE SEPARATION AND LOCALIZATION USING GENERATIVE ADVERSARIAL NETWORKS
20230044635 · 2023-02-09

A method (and structure and computer product) for audiovisual source separation processing, including receiving video data including images of a plurality of sound sources, receiving optical flow data of the video data, the optical flow data indicating motions of pixels between frames of the video data, and encoding the received video data into video localization data comprising information associating pixels in the frames of video data with different channels of sound.

Video processing for embedded information card localization and content extraction
11615621 · 2023-03-28

Metadata for one or more highlights of a video stream may be extracted from one or more card images embedded in the video stream. The highlights may be segments of the video stream, such as a broadcast of a sporting event, that are of particular interest. According to one method, video frames of the video stream are stored. One or more information cards embedded in a decoded video frame may be detected by analyzing one or more predetermined video frame regions. Image segmentation, edge detection, and/or closed contour identification may then be performed on identified video frame region(s). Further processing may include obtaining a minimum rectangular perimeter area enclosing all remaining segments, which may then be further processed to determine precise boundaries of information card(s). The card image(s) may be analyzed to obtain metadata, which may be stored in association with at least one of the video frames.
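The "minimum rectangular perimeter area enclosing all remaining segments" step reduces to a bounding-box union. A minimal sketch, assuming segments are represented as `(x, y, w, h)` boxes (the representation and function name are not from the patent):

```python
def min_enclosing_rect(segments):
    """Return the minimum axis-aligned rectangle enclosing all segment
    boxes, as (x0, y0, x1, y1) corner coordinates.

    Each segment is an (x, y, w, h) box, e.g. a surviving region after
    image segmentation / edge detection / closed-contour filtering."""
    x0 = min(s[0] for s in segments)
    y0 = min(s[1] for s in segments)
    x1 = max(s[0] + s[2] for s in segments)
    y1 = max(s[1] + s[3] for s in segments)
    return (x0, y0, x1, y1)
```

The resulting rectangle would then be refined further to find the precise boundaries of the information card before OCR-style metadata extraction.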

SYSTEM FOR DETECTING AND VISUALIZING DEMOGRAPHICS, DIVERSITY AND DISPARITY IN USER-GENERATED VIDEOS
20230034089 · 2023-02-02

A system for detecting and visualizing demographics, diversity, and disparity in user-generated videos includes a collection of user-generated content, such as videos, having demographic information; a target collection of content having defined diversity information; creator content having diversity information; and a processor capable of performing a diversity analysis on the user-generated content, displaying an indication of disparity between the user-generated content and the target collection of content, and taking an action based on the analysis to modify the collection to more closely reflect the target collection, such as inviting content creators to contribute to the collection content associated with the identified areas of disparity.
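The disparity indication can be sketched as a per-category gap between the collection's demographic shares and the target's. The representation (share dictionaries summing to 1.0) and the gap threshold are assumptions for illustration only.

```python
def disparity(collection_dist, target_dist):
    """Per-category gap between the collection's demographic share and
    the target share; positive means under-represented vs. the target."""
    return {
        cat: round(target_dist.get(cat, 0.0) - collection_dist.get(cat, 0.0), 4)
        for cat in set(collection_dist) | set(target_dist)
    }

def areas_to_invite(gaps, min_gap=0.05):
    """Categories whose shortfall exceeds min_gap: candidate areas of
    disparity where creators could be invited to contribute content."""
    return sorted(cat for cat, g in gaps.items() if g > min_gap)
```

For example, a collection that is 20% category "a" against a 50% target yields a +0.3 gap for "a", flagging it as an area of disparity.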

Systems, methods, and devices for determining an introduction portion in a video program

Systems, methods, and devices relating to determining an introduction portion in a video program are described herein. A method may determine first and second hard-matching pairs of video segments in first and second video content such that video fingerprints of the first hard-matching pair match and video fingerprints of the second hard-matching pair also match. The method may classify a third pair of video segments in the first and second video content, sequentially between the first and second hard-matching pairs, as a soft-matching pair of video segments of an introduction portion. The method may use the classification of the third pair of video segments as a soft-matching pair to determine a model configured to determine that a pair of video segments in two video content items are a soft-matching pair of video segments of an introduction portion.
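The hard-match / soft-match idea can be sketched with exact fingerprint equality standing in for fingerprint matching (a real system would compare fingerprints under a distance threshold). All names and the list-of-fingerprints representation are assumptions.

```python
def find_intro_candidates(fps_1, fps_2):
    """Given per-segment video fingerprints of two video content items,
    return (hard, soft):
      hard: indices where the segments' fingerprints match exactly
            (hard-matching pairs);
      soft: non-matching indices lying sequentially between the first
            and last hard match (soft-match candidates for the
            introduction portion)."""
    n = min(len(fps_1), len(fps_2))
    hard = [i for i in range(n) if fps_1[i] == fps_2[i]]
    if len(hard) < 2:
        return hard, []
    soft = [i for i in range(hard[0] + 1, hard[-1]) if i not in hard]
    return hard, soft
```

In the abstract's terms, the segments labeled soft here are the classified pairs later used to train a model that recognizes soft-matching introduction segments directly.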

SYSTEMS AND METHODS FOR RETRIEVING VIDEOS USING NATURAL LANGUAGE DESCRIPTION
20230086735 · 2023-03-23

Implementations are directed to methods, systems, and computer-readable media for obtaining videos and extracting, from each video, a key frame for the video including a timestamp. For each key frame, a scene graph is generated. Generating the scene graph for the key frame includes identifying objects in the key frame and extracting a relationship feature defining a relationship between a first object and a second, different object of the objects in the key frame. The scene graph for the key frame is generated that includes a set of nodes and a set of edges. A natural language query request for a video is received, including terms defining a relationship between two or more particular objects. A query graph is generated for the natural language query request, and a set of videos corresponding to the set of scene graphs matching the query graph are provided for display on a user device.
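The graph-matching retrieval step can be sketched by representing each scene graph as a set of (subject, relation, object) edges and treating "matches the query graph" as subgraph containment. This is one plausible reading; the matching criterion, the triple representation, and all names below are assumptions.

```python
def build_scene_graph(triples):
    """Scene graph as a set of (subject, relation, object) edges,
    e.g. built from detected objects and extracted relationship
    features of a key frame."""
    return set(triples)

def graph_matches_query(scene_graph, query_graph):
    """A key frame's scene graph matches when it contains every edge
    of the query graph (subset test)."""
    return query_graph <= scene_graph

def retrieve(videos, query_graph):
    """Return ids of videos whose scene graphs match the query graph."""
    return [vid for vid, g in videos.items() if graph_matches_query(g, query_graph)]

# Example: a query like "person riding a horse" parsed into one edge.
videos = {
    "v1": build_scene_graph([("person", "riding", "horse"),
                             ("horse", "on", "beach")]),
    "v2": build_scene_graph([("dog", "chasing", "ball")]),
}
query = {("person", "riding", "horse")}
```

Exact-string edge matching is the simplest case; a real system would also need to handle synonyms and paraphrases produced by the natural-language parse.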

VIDEO LOOP RECOGNITION
20230093746 · 2023-03-23

A method for video loop recognition includes determining a first encoding feature and a second encoding feature from a first video clip pair of a video. The first encoding feature is associated with first modal information, and the second encoding feature is associated with second modal information that is different from the first modal information. A network model that includes a first sequence model associated with the first modal information and a second sequence model associated with the second modal information is acquired. The method includes inputting the first encoding feature to the first sequence model that outputs a first similarity result, inputting the second encoding feature to the second sequence model that outputs a second similarity result, and obtaining a loop comparison result based on a comparison of the first similarity result with the second similarity result. The loop comparison result indicates a video type of the video.
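The final step, combining the two per-modality similarity results into a loop comparison result, can be sketched as follows. The fusion rule (both modalities above a shared threshold) and the threshold value are assumptions; the abstract only states that the two results are compared.

```python
def loop_comparison(sim_modal_1, sim_modal_2, threshold=0.5):
    """Combine the similarity results from the two sequence models
    (e.g. a visual modality and an audio modality) for a clip pair
    into a loop comparison result indicating the video type."""
    both_similar = sim_modal_1 >= threshold and sim_modal_2 >= threshold
    return "looped_video" if both_similar else "non_looped_video"
```

Requiring agreement between modalities makes the decision robust to one modality being uninformative, e.g. a clip pair whose frames repeat but whose audio differs.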