G11B27/28

Masking systems and methods

Term masking is performed by generating a time-alignment value for a plurality of units of sound in vocal audio content contained in a mixed audio track, force-aligning each of the plurality of units of sound to the vocal audio content based on the time-alignment value, thereby generating a plurality of force-aligned identifiable units of sound, identifying from the plurality of force-aligned units of sound a force-aligned unit of sound to be altered, and altering the identified force-aligned unit of sound.
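A minimal sketch of the final "identify and alter" step, assuming a forced aligner has already produced per-word start and end times in seconds. `AlignedUnit`, `mask_units`, and the choice to alter a unit by muting it are hypothetical illustrations, not the patented method. Note that muting the samples silences the whole mixed track for that span, not only the vocal stem.

```python
from dataclasses import dataclass


@dataclass
class AlignedUnit:
    """One force-aligned unit of sound (e.g. a word) with its time span in seconds."""
    text: str
    start: float
    end: float


def mask_units(samples, sample_rate, units, targets):
    """Return a copy of `samples` with every aligned unit whose text is in `targets` silenced."""
    out = list(samples)
    for unit in units:
        if unit.text.lower() in targets:
            # Convert the aligned time span into sample indices, clamped to the track length.
            first = max(0, round(unit.start * sample_rate))
            last = min(len(out), round(unit.end * sample_rate))
            for i in range(first, last):
                out[i] = 0.0  # assumption: "altering" is implemented as muting
    return out
```

Replacing the zeroed samples with a tone, or applying the mute only to a separated vocal stem, would be equally valid ways to alter the identified unit.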

Modification of objects in film

A computer-implemented method of processing video data comprising a first sequence of image frames containing a first instance of an object. The method includes isolating said first instance of the object within the first sequence of image frames; determining, using the isolated first instance of the object, first parameter values for a synthetic model of the object; modifying the first parameter values for the synthetic model of the object; rendering a modified first instance of the object using a trained machine learning model and the modified first parameter values; and replacing at least part of the first instance of the object within the first sequence of image frames with a corresponding at least part of the modified first instance of the object.
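The last step, replacing part of the original instance with the re-rendered one, amounts to per-pixel compositing. This sketch assumes the isolation and rendering stages already yield, for each frame, a rendered image and a boolean mask marking the pixels to replace; `replace_instance` and the nested-list image format are hypothetical simplifications.

```python
def replace_instance(frames, rendered, masks):
    """Composite rendered pixels into each frame wherever the mask is True.

    frames, rendered: lists of 2D lists of pixel values (one per frame).
    masks: lists of 2D lists of booleans with the same shape.
    """
    if not (len(frames) == len(rendered) == len(masks)):
        raise ValueError("frame, render, and mask sequences must have equal length")
    result = []
    for frame, new, mask in zip(frames, rendered, masks):
        result.append([
            # Take the rendered pixel inside the mask, keep the original pixel outside it.
            [n if m else f for f, n, m in zip(frow, nrow, mrow)]
            for frow, nrow, mrow in zip(frame, new, mask)
        ])
    return result
```

A production pipeline would typically blend along the mask boundary rather than switch pixels abruptly.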

Highlight video generated with adaptable multimodal customization

In implementations for highlight video generated with adaptable multimodal customization, a multimodal detection system tracks activities based on poses and faces of persons depicted in video clips of video content. The system determines a pose highlight score and a face highlight score for each of the video clips that depict at least one person, the highlight scores representing a relative level of interest in an activity depicted in a video clip. The system also determines pose-based emotion features for each of the video clips. The system can detect actions based on the activities of the persons depicted in the video clips, and detect emotions exhibited by those persons. The system can receive input selections of actions and emotions, and filter the video clips based on the selected actions and emotions. The system can then generate a highlight video of ranked and filtered video clips.
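The filter-then-rank step can be sketched as follows, assuming each clip already carries its pose and face highlight scores plus detected action and emotion labels. Combining the two scores as a weighted average (`pose_weight`) and returning the `top_k` clips are assumptions for illustration; the abstract does not say how the scores are merged.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    clip_id: str
    pose_score: float
    face_score: float
    actions: set = field(default_factory=set)
    emotions: set = field(default_factory=set)


def build_highlight(clips, selected_actions, selected_emotions, pose_weight=0.5, top_k=3):
    """Filter clips by the user's selections, then rank by a combined highlight score."""
    def keep(c):
        # An empty selection means "no filter" for that modality.
        ok_action = not selected_actions or bool(c.actions & selected_actions)
        ok_emotion = not selected_emotions or bool(c.emotions & selected_emotions)
        return ok_action and ok_emotion

    def combined(c):
        return pose_weight * c.pose_score + (1 - pose_weight) * c.face_score

    ranked = sorted(filter(keep, clips), key=combined, reverse=True)
    return [c.clip_id for c in ranked[:top_k]]
```

The returned clip IDs, in rank order, would then be concatenated into the highlight video.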

SYSTEMS AND METHODS FOR IDENTIFYING CANDIDATE VIDEOS FOR AUDIO EXPERIENCES
20230098356 · 2023-03-30

A computer-implemented method for identifying candidate videos for audio experiences may include (i) identifying a video with audio content that is a candidate for an audio-primary user experience that enables users to consume the video by listening to the audio content without watching visual content of the video, (ii) determining, at least in part by analyzing the video via a machine learning algorithm, that the audio content of the video is suitable for the audio-primary user experience, and (iii) presenting the audio content of the video to at least one user via an interface designed for the audio-primary user experience in response to determining that the audio content of the video is suitable for the audio-primary user experience. Various other methods, systems, and computer-readable media are also disclosed.

Automated Recording Highlights For Conferences

A transcript of a conference (e.g., a video conference, an audio conference, or a telephone call with two or more participants) is processed to extract a conference summary. Scores are determined for strings of the transcript and used to select strings for inclusion in the conference summary. Determining the scores includes determining a sentence vector for each string. A sentence vector has one element per word in the transcript; each element is proportional to the number of occurrences of that word in the string and inversely proportional to the number of occurrences of that word in the transcript. A short video or audio conference summary is then generated using the transcript timestamps associated with the strings (e.g., sentences) selected for inclusion in the conference summary. The short video or audio summary may be presented to users and enables efficient storage and transmission of conference information within a unified communications system.
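A minimal sketch of the scoring described above, assuming the transcript arrives as `(start_s, end_s, text)` segments. Each vector element is the word's count in the string divided by its count in the whole transcript. Reducing the vector to a score by summing its elements is an assumption; the abstract does not specify that step. `summarize` returns the time ranges of the top-`k` strings in chronological order, ready for cutting the short summary.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def sentence_vector(sentence, transcript_counts):
    """Element per word: occurrences in the string / occurrences in the transcript."""
    counts = Counter(tokenize(sentence))
    return {w: counts[w] / transcript_counts[w] for w in counts}


def summarize(segments, k=2):
    """segments: list of (start_s, end_s, text). Returns time ranges of selected strings."""
    transcript_counts = Counter()
    for _, _, text in segments:
        transcript_counts.update(tokenize(text))
    scored = []
    for idx, (_, _, text) in enumerate(segments):
        vec = sentence_vector(text, transcript_counts)
        scored.append((sum(vec.values()), idx))  # assumption: score = sum of elements
    # Keep the k best strings, then restore chronological order for playback.
    chosen = sorted(idx for _, idx in sorted(scored, reverse=True)[:k])
    return [(segments[i][0], segments[i][1]) for i in chosen]
```

Frequent filler words ("ok") get small elements because they recur across the transcript, so strings with distinctive content rise to the top.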

Method and system of clipping a video, computing device, and computer storage medium

Embodiments of the present disclosure describe techniques for clipping a video. The disclosed techniques comprise obtaining a video including a plurality of frames; performing object detection on each frame; identifying objects contained in each frame, wherein a region where each object is located is selected through a detection box; classifying and recognizing the objects identified in each frame using a pre-trained classification model; selecting human body region images; determining a similarity between each human body region image selected from the plurality of frames and a target character image; in response to determining that the similarity between a human body region image and the target character image is greater than a predetermined threshold, identifying the human body region image as a clipping image; and synthesizing the clipping images identified in the plurality of frames in order of time to obtain a clipping video.
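The threshold-and-order step can be sketched as below. The abstract does not say how similarity is measured; this sketch assumes each detected region has a feature embedding and swaps in cosine similarity. The detection tuple layout, the `"person"` label, and `select_clipping_images` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_clipping_images(detections, target_embedding, threshold=0.8):
    """detections: list of (timestamp, label, crop, embedding).

    Keeps human body regions whose similarity to the target exceeds the
    threshold and returns their crops in time order.
    """
    hits = [
        (ts, crop)
        for ts, label, crop, emb in detections
        if label == "person" and cosine_similarity(emb, target_embedding) > threshold
    ]
    hits.sort(key=lambda h: h[0])
    return [crop for _, crop in hits]
```

The ordered crops would then be encoded as the frames of the clipping video.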

Connected interactive content data creation, organization, distribution and analysis

A method for identifying a product that appears in a video stream. The method includes playing the video stream on a video playback device, identifying key scenes in the video stream containing product images, and selecting product images identified by predetermined categories of trained neural-network object identifiers stored in training datasets. Object identifiers of identified product images are stored in a database. Edge detection and masking are then performed based on at least one of shape, color, and perspective of the object identifiers. A polygon annotation of the object identifiers is created using the edge detection and masking. The polygon annotation is further annotated to provide correct object identifier content, an accurate polygon shape, and a title, description, and URL of the object identifier for each identified product image corresponding to the stored object identifiers. Also disclosed is a method for an end user to select and interact with an identified product.
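One plausible way for an end user to select an annotated product is a hit test of the click position against each stored polygon. This sketch assumes polygon vertices are stored in frame pixel coordinates and uses the standard even-odd ray-casting test; `ProductAnnotation` and `product_at` are hypothetical names, not part of the disclosed system.

```python
from dataclasses import dataclass


@dataclass
class ProductAnnotation:
    title: str
    description: str
    url: str
    polygon: list  # [(x, y), ...] vertices in frame pixel coordinates


def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: count edge crossings of a ray cast to the right of (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def product_at(x, y, annotations):
    """Return the first annotation whose polygon contains the click, or None."""
    for ann in annotations:
        if point_in_polygon(x, y, ann.polygon):
            return ann
    return None
```

The returned annotation's `url` and `description` could then drive the interactive overlay.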

Video automatic editing method and system based on machine learning
11615814 · 2023-03-28

Disclosed are a video automatic editing method and system based on machine learning. The video automatic editing system based on machine learning includes at least one processor, and the at least one processor includes a video acquirer configured to acquire input video, a highlight frame extractor configured to extract at least one highlight frame from the input video using a highlight extraction model pre-trained through machine learning, and a highlight video generator configured to generate highlight video from the at least one extracted highlight frame.
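A sketch of how per-frame outputs from a highlight extraction model might be turned into highlight segments. It assumes the model emits one highlight probability per frame; the `threshold` and `min_frames` parameters and `extract_highlight_segments` are hypothetical choices, not details from the disclosure.

```python
def extract_highlight_segments(scores, fps, threshold=0.5, min_frames=2):
    """Group consecutive high-scoring frames into (start_s, end_s) segments.

    scores: per-frame highlight probabilities from a pre-trained model (assumed).
    Runs shorter than min_frames are discarded as noise.
    """
    segments = []
    start = None
    # A -inf sentinel closes any run that is still open at the last frame.
    for i, s in enumerate(list(scores) + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_frames:
                segments.append((start / fps, i / fps))
            start = None
    return segments
```

The highlight video generator would then cut and concatenate these time ranges from the input video.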

Systems and Methods for Intelligent Media Content Segmentation and Analysis

There is provided a system including a non-transitory memory storing executable code and a hardware processor executing the executable code to receive media content including a plurality of frames; divide the media content into a plurality of shots, each of the plurality of shots including a plurality of frames of the media content, based on a first similarity between the plurality of frames; determine a plurality of sequential shots of the plurality of shots to be part of a first sub-scene of a plurality of sub-scenes of a scene based on a timeline continuity of the plurality of sequential shots; and identify each of the plurality of shots of the media content and each of the plurality of sub-scenes with a corresponding beginning time code and a corresponding ending time code.
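The shot-division and time-code steps can be sketched as below, assuming each frame is summarized by a normalized color histogram and that the "first similarity" is histogram intersection; both are assumptions, as is the integer frame rate and the `HH:MM:SS:FF` time-code format. Grouping shots into sub-scenes by timeline continuity is not covered here.

```python
def histogram_similarity(h1, h2):
    """Histogram intersection of two normalized histograms (1.0 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def split_into_shots(histograms, threshold=0.7):
    """Start a new shot wherever consecutive frames are less similar than the threshold.

    Returns (first_frame, last_frame) index pairs, both inclusive.
    """
    shots = []
    start = 0
    for i in range(1, len(histograms)):
        if histogram_similarity(histograms[i - 1], histograms[i]) < threshold:
            shots.append((start, i - 1))
            start = i
    if histograms:
        shots.append((start, len(histograms) - 1))
    return shots


def timecode(frame, fps):
    """Format a frame index as HH:MM:SS:FF for an integer frame rate."""
    seconds, ff = divmod(frame, fps)
    hh, rem = divmod(seconds, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"
```

Each shot's beginning and ending time codes are then `timecode(first, fps)` and `timecode(last, fps)`.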
