H04N21/4884

Auxiliary manifest file to provide timed metadata

A client electronic device to provide custom functionality for video content playback. The client electronic device includes one or more processors and a non-transitory computer-readable medium having stored therein instructions, which when executed by the one or more processors, cause the client electronic device to receive a streaming manifest file and a first auxiliary manifest file, where the streaming manifest file includes references to video segments of a video content, where the first auxiliary manifest file includes timed metadata associated with the video content, where the streaming manifest file and the first auxiliary manifest file refer to a same timeline, provide the streaming manifest file to a core playback module to play the video content according to the streaming manifest file, and provide custom functionality using the timed metadata included in the first auxiliary manifest file that replaces or augments functionality provided by the core playback module.
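As a hedged illustration of how a client might consume such an auxiliary manifest, the sketch below (the `MetadataCue` type and cue payloads are hypothetical, not from the patent) looks up the timed-metadata cues active at the current playback position on the timeline shared with the streaming manifest:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MetadataCue:
    start: float   # seconds on the timeline shared with the streaming manifest
    end: float
    payload: dict  # e.g. {"action": "show_skip_intro_button"}

def active_cues(cues: List[MetadataCue], position: float) -> List[MetadataCue]:
    # Because both manifests refer to the same timeline, the playback
    # position reported by the core playback module can index directly
    # into the auxiliary manifest's cues.
    return [c for c in cues if c.start <= position < c.end]

cues = [
    MetadataCue(0.0, 10.0, {"action": "show_logo"}),
    MetadataCue(8.0, 12.0, {"action": "show_skip_intro_button"}),
]
print([c.payload["action"] for c in active_cues(cues, 9.0)])
```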

Caption modification and augmentation systems and methods for use by hearing assisted user

A system and method for facilitating communication between an assisted user (AU) and a hearing user (HU) includes receiving an HU voice signal as the AU and HU participate in a call using AU and HU communication devices, transcribing HU voice signal segments into verbatim caption segments, processing each verbatim caption segment to identify an intended communication (IC) intended by the HU upon uttering an associated one of the HU voice signal segments, for at least a portion of the HU voice signal segments (i) using an associated IC to generate an enhanced caption different than the associated verbatim caption, (ii) for each of a first subset of the HU voice signal segments, presenting the verbatim captions via the AU communication device display for consumption, and (iii) for each of a second subset of the HU voice signal segments, presenting enhanced captions via the AU communication device display for consumption.
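A minimal sketch of the subset selection described above, assuming each segment record carries a verbatim caption and, where an intended communication (IC) was identified, an IC-derived enhanced caption (field names are hypothetical):

```python
def present_captions(segments):
    """Return (kind, text) pairs: enhanced captions for segments where an
    IC-derived enhancement exists, verbatim captions otherwise."""
    shown = []
    for seg in segments:
        if seg.get("enhanced"):
            shown.append(("enhanced", seg["enhanced"]))
        else:
            shown.append(("verbatim", seg["verbatim"]))
    return shown

segments = [
    {"verbatim": "uh, see you at, um, seven"},
    {"verbatim": "the the meeting moved", "enhanced": "The meeting moved."},
]
print(present_captions(segments))
```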

Method and system for precise presentation of audiovisual content with temporary closed captions

A method, set-top box, and non-transitory computer readable medium are disclosed for presentation of audiovisual content with closed captions. The method includes receiving, via an input device interfaced with the electronic device, an instruction requesting a replay of previously viewed video content with closed captioning; sending, to the display device interfaced with the electronic device, one or more thumbnail images of the previously viewed video content to be displayed on the display device; receiving, via the input device interfaced with the electronic device, one of the one or more thumbnail images of the previously viewed video content being selected for replay of the previously viewed video content; and sending, to the display device interfaced with the electronic device, closed captioning with the previously viewed video content starting at a video frame corresponding to the one of the one or more thumbnails of the previously viewed video content selected for replay.
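The core of the replay flow is mapping the selected thumbnail back to a video frame at which captioned playback resumes. A sketch under assumed record shapes (the `id`/`frame` fields are illustrative):

```python
def replay_start_frame(thumbnails, selected_id):
    """thumbnails: list of {"id", "frame"} records sent to the display device.
    Return the frame at which captioned replay should begin."""
    for thumb in thumbnails:
        if thumb["id"] == selected_id:
            return thumb["frame"]
    raise KeyError(selected_id)

thumbnails = [{"id": "t0", "frame": 0}, {"id": "t1", "frame": 900}]
print(replay_start_frame(thumbnails, "t1"))
```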

System and method to support synchronization, closed captioning and highlight within a text document or a media file
11537781 · 2022-12-27

The present invention relates to a system and method for synchronizing and highlighting a target text and audio associated with a reference document. The system and method may comprise one or more of an input unit, an extracting unit, a mapping unit, a processing unit, and an image resizing unit. The system and method may synchronize the target text and audio in order to provide a user with a read-along experience. The invention further synchronizes and highlights closed captions and audio, helping people with hearing impairment to comprehend better while watching a movie or listening to songs.
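A read-along highlight of this kind reduces to mapping the current audio time to the word to highlight. A minimal sketch, assuming the mapping unit yields a sorted list of per-word start times (hypothetical data, not from the patent):

```python
import bisect

def word_to_highlight(word_start_times, audio_time):
    """word_start_times: sorted start time (seconds) of each word in the
    target text. Return the index of the word being spoken at audio_time."""
    i = bisect.bisect_right(word_start_times, audio_time) - 1
    return max(i, 0)

starts = [0.0, 0.4, 1.1, 1.9]  # hypothetical word-level alignment output
print(word_to_highlight(starts, 1.3))
```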

Language agnostic missing subtitle detection
11538461 · 2022-12-27

Some implementations include methods for detecting missing subtitles associated with a media presentation and may include receiving an audio component and a subtitle component associated with a media presentation, the audio component including an audio sequence, the audio sequence divided into a plurality of audio segments; evaluating the plurality of audio segments using a combination of a recurrent neural network and a convolutional neural network to identify refined speech segments associated with the audio sequence, the recurrent neural network trained based on a plurality of languages, the convolutional neural network trained based on a plurality of categories of sound; determining timestamps associated with the identified refined speech segments; and determining missing subtitles based on the timestamps associated with the identified refined speech segments and timestamps associated with subtitles included in the subtitle component.
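Once the refined speech segments are timestamped, the final step is an interval comparison against the subtitle timestamps. A simplified sketch of that comparison (segment data here is illustrative; the neural-network refinement stage is assumed to have already run):

```python
def find_missing_subtitles(speech_segments, subtitle_segments):
    """Each segment is a (start, end) pair in seconds. Return the speech
    segments that no subtitle interval overlaps, i.e. likely missing subtitles."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]
    return [s for s in speech_segments
            if not any(overlaps(s, sub) for sub in subtitle_segments)]

speech = [(1.0, 3.0), (10.0, 12.5), (20.0, 22.0)]
subs = [(0.8, 3.2), (19.5, 22.5)]
print(find_missing_subtitles(speech, subs))
```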

LIVE COMMENTING PROCESSING METHOD AND SYSTEM
20220408160 · 2022-12-22

The present disclosure describes techniques of processing bullet comments. The techniques comprise acquiring a page of playing a video to output a video stream; acquiring multiple pieces of bullet comment data associated with the video stream; traversing the multiple pieces of bullet comment data and determining whether the multiple pieces of bullet comment data comprise at least one piece of bullet comment data in an expired state; and deleting the at least one piece of bullet comment data in the expired state from the multiple pieces of bullet comment data in response to determining that the multiple pieces of bullet comment data comprise the at least one piece of bullet comment data in the expired state.
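The traverse-and-delete step amounts to filtering out the expired pieces of bullet-comment data. A sketch under an assumed record shape (the `expires_at` field is hypothetical):

```python
def prune_expired(bullet_comments, now):
    """Drop every piece of bullet-comment data whose expiry time has passed."""
    return [c for c in bullet_comments if c["expires_at"] > now]

comments = [
    {"text": "first!", "expires_at": 100.0},
    {"text": "nice scene", "expires_at": 300.0},
]
print(prune_expired(comments, now=200.0))
```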

BULLET COMMENT PRESENTATION METHOD AND SYSTEM
20220408144 · 2022-12-22

The present disclosure describes techniques of presenting bullet comments. The techniques comprise acquiring a page of playing a video; acquiring multiple pieces of original bullet screen data, wherein each piece of original bullet screen data comprises content of a bullet comment and timing information indicating a time of posting the bullet comment in the video; cloning the multiple pieces of original bullet screen data to obtain multiple pieces of bullet screen data corresponding to the multiple pieces of original bullet screen data; acquiring multiple pieces of target bullet screen data from the multiple pieces of bullet screen data based on the timing information associated with each piece of original bullet screen data; and displaying content comprised in at least one of the multiple pieces of target bullet screen data in an area of the page configured to display bullet comments.
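The clone-then-select flow might look like the sketch below: the originals are deep-copied so presentation never mutates the source data, and the target pieces are those whose posting time falls within the current playback window (record fields and the window model are assumptions, not claim language):

```python
import copy

def target_comments(original_comments, playhead, window=1.0):
    """Clone the original bullet-screen data, then select the pieces whose
    posting time falls within the current playback window."""
    clones = copy.deepcopy(original_comments)
    return [c for c in clones if playhead <= c["time"] < playhead + window]

originals = [
    {"content": "lol", "time": 4.2},
    {"content": "plot twist!", "time": 5.1},
    {"content": "bye", "time": 9.0},
]
print([c["content"] for c in target_comments(originals, playhead=4.0, window=2.0)])
```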

Electronic Devices and Corresponding Methods for Redirecting Event Notifications in Multi-Person Content Presentation Environments
20220408159 · 2022-12-22

An electronic device includes a communication device electronically communicating with a content presentation companion device operating as a primary display for the electronic device and at least one augmented reality companion device. One or more sensors detect multiple persons within an environment while the content presentation companion device operates as the primary display. One or more processors redirect an event notification intended for presentation on the primary display to the augmented reality companion device while both the content presentation companion device operates as the primary display for the electronic device and the multiple persons are within the environment of the electronic device. When communicating with two augmented reality companion devices, the one or more processors can direct subtitles associated with a content offering, sometimes in different languages, to at least a first augmented reality companion device and a second augmented reality companion device.
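The routing decision described above can be reduced to a small predicate: redirect to an AR companion device only while the companion display is primary and multiple persons share the environment. A sketch with hypothetical device identifiers:

```python
def route_notification(companion_is_primary, persons_in_environment, ar_devices):
    """Send the event notification to an AR companion device instead of the
    shared primary display whenever multiple persons are present."""
    if companion_is_primary and persons_in_environment > 1 and ar_devices:
        return ar_devices[0]
    return "primary_display"

print(route_notification(True, 3, ["ar_glasses_1", "ar_glasses_2"]))
print(route_notification(True, 1, ["ar_glasses_1"]))
```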

DIARISATION AUGMENTED REALITY AIDE

An image of a real-world environment including one or more users is received from an image capture device. A mask status of a first user is determined by a processor based on the image. A stream of audio including speech from the one or more users is captured from one or more audio transceivers. A first user speech is identified from the stream of audio by the processor. The stream of audio is parsed, by the processor, based on the first user speech and based on an audio processing technique, to create a first user speech element. An augmented view that includes the first user speech element is generated, for a wearable computing device, based on the first user speech and based on the mask status.
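One plausible reading of the mask-status gating is that captions are rendered in the augmented view when a mask prevents lip reading. A sketch under that assumption (the record shapes and status values are illustrative):

```python
def build_augmented_view(speech_elements, mask_status):
    """Caption a speaker's parsed speech elements in the AR view when a mask
    obscures the speaker's lips; otherwise leave the view unannotated."""
    if mask_status == "masked":
        return [{"type": "caption", "speaker": s["speaker"], "text": s["text"]}
                for s in speech_elements]
    return []

elements = [{"speaker": "user_1", "text": "turn left at the lobby"}]
print(build_augmented_view(elements, "masked"))
print(build_augmented_view(elements, "unmasked"))
```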

Methods and systems for dynamic content modification

An example method can comprise receiving content for presentation at a user device. The content can comprise a plurality of sections, and each section can comprise a video portion and an audio portion. The user device can also receive content metadata regarding one or more features of the content, where the features of the content comprise one or more candidate sections of the content for modification. The user device can apply one or more rules to the received content based on the content metadata to modify one or more of the audio portion and the video portion of at least one section of the content, creating modified content, and can cause presentation of the modified content on a display device.
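The rule-application step might be sketched as follows: the content metadata flags candidate sections, and each rule transforms only those sections (the section shape and the `mute_audio` rule are hypothetical examples):

```python
def modify_content(sections, metadata, rules):
    """Apply each rule to the sections that the content metadata flags as
    candidates for modification; other sections pass through unchanged."""
    candidates = set(metadata.get("candidate_sections", []))
    modified = []
    for index, section in enumerate(sections):
        section = dict(section)
        if index in candidates:
            for rule in rules:
                section = rule(section)
        modified.append(section)
    return modified

def mute_audio(section):  # example rule: silence the section's audio portion
    return {**section, "audio": None}

sections = [{"video": "v0", "audio": "a0"}, {"video": "v1", "audio": "a1"}]
metadata = {"candidate_sections": [1]}
print(modify_content(sections, metadata, [mute_audio]))
```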