H04N21/440236

Identifying and removing restricted information from videos
11587591 · 2023-02-21

A video is provided to viewers using a web-based platform without restricted audio, such as a copyrighted soundtrack. To do so, a video comprising at least two audio layers is received. The audio layers can include separate and distinct audio layers or a mix of audio from separate sources. A restricted audio element is identified in a first audio layer and a speech element is identified in a second audio layer. A stitched text string can be generated by performing speech-to-text on both audio layers and removing the text corresponding to the restricted audio element of the first audio layer. When playing back the video, a portion of the video is muted based on the restricted audio element. A voice synthesizer is employed to generate audible sound during the muted portion using the stitched text string.
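The stitching step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: it assumes speech-to-text has already produced timed segments for each layer, and the `Segment` type, the `stitch` function, and the sample data are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """One timed speech-to-text result (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str
    layer: int    # 1 = layer holding the restricted element, 2 = speech layer


def overlaps(seg: Segment, span: Tuple[float, float]) -> bool:
    """True if the segment intersects the (start, end) span."""
    return seg.start < span[1] and span[0] < seg.end


def stitch(segments: List[Segment], restricted: List[Tuple[float, float]]) -> str:
    """Join text from both layers in time order, dropping layer-1 text
    that falls inside a restricted span."""
    kept = [
        s for s in sorted(segments, key=lambda s: s.start)
        if not (s.layer == 1 and any(overlaps(s, r) for r in restricted))
    ]
    return " ".join(s.text for s in kept)


# Assumed sample data: a sung lyric on layer 1 between two spoken phrases.
segments = [
    Segment(0.0, 2.0, "welcome back", 2),
    Segment(2.0, 5.0, "la la la", 1),
    Segment(5.0, 7.0, "to the show", 2),
]
stitched = stitch(segments, [(2.0, 5.0)])
print(stitched)
```

The resulting string is what a voice synthesizer would be given to fill the muted portion; the synthesis itself is outside this sketch.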

Distribution of Sign Language Enhanced Content
20220358854 · 2022-11-10

A system for distributing sign language enhanced content includes a computing platform having processing hardware and a system memory storing a software code. The processing hardware is configured to execute the software code to receive content including at least one of a sequence of audio frames or a sequence of video frames, perform an analysis of the content, and identify, based on the analysis, a message conveyed by the content. The processing hardware is further configured to execute the software code to generate a sign language translation of the content, the sign language translation including one or more of a gesture, body language, or a facial expression communicating the message conveyed by the content.

CLOSED CAPTION CONTENT GENERATION

A system may include a memory and a processor in communication therewith configured to perform operations. The operations may include receiving an audio file and a text file related to the audio file, analyzing the audio file to produce an analysis, and determining a portion of the audio file is similar to a segment of the text file. The operations may include identifying a first terminal signal and corresponding the first terminal signal to a first terminal tag in the text file such that the first terminal tag is aligned with the first terminal signal; the first terminal signal identifies a first portion terminal end of the portion and the first terminal tag identifies a first segment terminal end of the segment. The operations may include generating a converted text from the analysis and inserting the segment into the converted text.

AUDIO CONTENT RECOGNITION METHOD AND APPARATUS, AND DEVICE AND COMPUTER-READABLE MEDIUM
20230091272 · 2023-03-23

Embodiments of the present disclosure disclose an audio content recognition method and apparatus, an electronic device and a non-transitory computer-readable medium. A specific implementation of the method includes: obtaining a voice fragment collection and a non-voice fragment collection by segmenting audio; determining a type and language information of each voice fragment in the voice fragment collection; and obtaining, for each voice fragment in the voice fragment collection, a first recognition result by performing voice recognition on the voice fragment based on the type and the language information of the voice fragment. In the implementation, speaking and music fragments in the audio are recognized by different models, so that two audio contents may both have better recognition effects. Moreover, audio of different language contents is recognized by using different models, thereby further improving a voice recognition effect.
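One way to picture the per-fragment model selection is a lookup keyed on (type, language). The sketch below is a minimal illustration with assumptions throughout: the recognizers are stubs rather than real speech models, and the type labels, language codes, and function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Recognizer = Callable[[bytes], str]


def make_stub(name: str) -> Recognizer:
    """Stand-in for a trained recognition model (assumption, not a real model)."""
    return lambda audio: f"<{name}:{len(audio)} bytes>"


# Hypothetical registry: one recognizer per (fragment type, language) pair.
MODELS: Dict[Tuple[str, str], Recognizer] = {
    ("speaking", "en"): make_stub("speech-en"),
    ("music", "en"): make_stub("music-en"),
    ("speaking", "zh"): make_stub("speech-zh"),
    ("music", "zh"): make_stub("music-zh"),
}


def recognize_all(fragments: List[dict]) -> List[str]:
    """Route each voice fragment to the model matching its type and language."""
    results = []
    for frag in fragments:
        key = (frag["type"], frag["language"])
        if key not in MODELS:
            raise KeyError(f"no recognizer registered for {key}")
        results.append(MODELS[key](frag["audio"]))
    return results


fragments = [
    {"type": "speaking", "language": "en", "audio": b"abc"},
    {"type": "music", "language": "zh", "audio": b"xy"},
]
results = recognize_all(fragments)
print(results)
```

Keeping the registry as plain data means a new language or fragment type only needs a new entry, not a change to the routing logic.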

MULTI-FORMAT CONTENT REPOSITORY SEARCH

An audio file format of an audio portion of a natural language content is determined. Using a trained audio language identification model, a human language included in the audio portion is identified. Using a trained audio to text model trained on the human language, the audio portion is converted to a corresponding set of text data. The set of text data is indexed. Using the indexed set of text data responsive to a search query, a search result is generated, the search query specifying a search including a non-textual portion of the natural language content.
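The indexing and search steps can be illustrated with a small inverted index over transcripts. This sketch assumes the language identification and audio-to-text conversion have already run; the file names, transcript text, and the `build_index` and `search` helpers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(transcripts: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase token to the set of content items containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in transcripts.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the items whose transcript contains every query token."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Assumed output of the audio-to-text step for two audio files.
transcripts = {
    "ep1.mp3": "quarterly revenue grew",
    "ep2.wav": "revenue fell last quarter",
}
index = build_index(transcripts)
print(search(index, "revenue grew"))
```

A text query can then return audio items, which is the sense in which the search covers a non-textual portion of the content.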

SYSTEMS AND METHODS FOR CONTROLLING TRANSMISSION OF LIVE MEDIA STREAMS
20220345754 · 2022-10-27

A computer-implemented method is disclosed. The method includes: receiving media data of a live media stream; obtaining audience reaction data associated with the live media stream; identifying an event-of-interest in the live media stream based on the audience reaction data, wherein a time of the event-of-interest is prior to a time of the audience reaction data; obtaining a segment of at least one of audio data or video data of the live media stream that is associated with the time of the event-of-interest; generating a digital asset incorporating the segment; and providing the digital asset to at least one viewer of the live media stream.
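Because the event-of-interest precedes the reactions to it, one plausible approach is to find a spike in reactions and step back by an assumed reaction lag. The sketch below is a simple heuristic for illustration only; the window size, threshold, lag, and function names are assumptions and are not taken from the disclosure.

```python
from collections import Counter
from typing import List, Optional, Tuple


def find_event_time(
    reaction_times: List[float],
    window: float = 5.0,      # bucket width in seconds (assumed)
    min_reactions: int = 3,   # spike threshold (assumed)
    lag: float = 4.0,         # assumed delay between event and reactions
) -> Optional[float]:
    """Estimate the event time as the start of the busiest reaction window
    minus the reaction lag, or None if no window reaches the threshold."""
    counts = Counter(int(t // window) for t in reaction_times)
    if not counts:
        return None
    # Prefer the highest count; break ties in favor of the earliest window.
    bucket, n = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    if n < min_reactions:
        return None
    return max(0.0, bucket * window - lag)


def segment_bounds(event_time: float, pre: float = 2.0, post: float = 8.0) -> Tuple[float, float]:
    """Clip boundaries around the estimated event time."""
    return (max(0.0, event_time - pre), event_time + post)


# Assumed reaction timestamps in seconds since the stream started.
reactions = [12.0, 61.0, 61.5, 62.2, 63.9, 64.0, 90.0]
event = find_event_time(reactions)
bounds = segment_bounds(event)
print(event, bounds)
```

The returned bounds would then be used to cut the audio or video segment that goes into the digital asset.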

Modulation of packetized audio signals
11482216 · 2022-10-25

Modulating packetized audio signals in a voice activated data packet based computer network environment is provided. A system can receive audio signals detected by a microphone of a device. The system can parse the audio signals to identify a trigger keyword and a request, and generate a first action data structure. The system can identify a content item object based on the trigger keyword, and generate an output signal comprising a first portion corresponding to the first action data structure and a second portion corresponding to the content item object. The system can apply a modulation to the first or second portion of the output signal, and transmit the modulated output signal to the device.

REAL TIME FEATURE ANALYSIS AND INGESTING CORRELATED ADVERTISEMENTS IN A VIDEO ADVERTISEMENT

A computer-implemented method for displaying advertisements is disclosed. The method includes associating feature metadata with a product video advertisement of a product. The method includes identifying, based on the feature metadata, a feature of the product corresponding with a section of the product video advertisement when a triggering event is detected during playing of the product video advertisement. The method includes displaying a second product video advertisement of a second product that includes the feature of the product.

Method for processing television screenshot, smart television, and storage medium

The present invention relates to the technical field of televisions, and provides a method for processing a television screenshot, a smart television, and a storage medium. To meet the demands of a more intuitive user interface and a seamless user interaction function, multiple sets of optional bars are displayed while displaying current playback content on a display screen in response to an input screenshot operation instruction, wherein optional bars are respectively used for displaying a picture thumbnail of a screenshot, recognizing content-related recommended content on the basis of an image of the screenshot, and/or responding to a user control instruction input interface for an operation associated with the screenshot.

MODULATION OF PACKETIZED AUDIO SIGNALS
20230111040 · 2023-04-13

Modulating packetized audio signals in a voice activated data packet based computer network environment is provided. A system can receive audio signals detected by a microphone of a device. The system can parse the audio signals to identify a trigger keyword and a request, and generate a first action data structure. The system can identify a content item object based on the trigger keyword, and generate an output signal comprising a first portion corresponding to the first action data structure and a second portion corresponding to the content item object. The system can apply a modulation to the first or second portion of the output signal, and transmit the modulated output signal to the device.