Patent classifications
H04N21/2335
METHOD AND SYSTEM FOR ENRICHING LIVESTREAMING CONTENT FOR CONTENT VIEWERS
A method and system for enriching livestreaming content for viewers are disclosed. An event occurring within livestreaming content is detected. The livestreaming content is viewed by a plurality of viewers on respective electronic devices. Event data associated with the detected event is determined based on one or more event attributes captured by one or more data sources associated with the livestreaming content. An event type associated with the detected event is determined based on the event data and a plurality of event profiles stored in an event database. At least one applause audio from among a plurality of pre-recorded applause audios stored at an applause repository is identified based on the event type and the event data. Thereafter, the livestreaming content is adapted based on the at least one applause audio and an event status flag associated with the detected event to enrich the livestreaming content for the plurality of viewers.
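The event-classification and applause-selection steps might be sketched as follows. This is an illustrative reading of the abstract, not the patented implementation; the profile structure, field names, and repository contents (`EVENT_PROFILES`, `APPLAUSE_REPOSITORY`, `keywords`, `loudness`) are all assumed.

```python
# Hypothetical sketch: match captured event attributes against stored event
# profiles, then pick pre-recorded applause audio for the matched event type.

EVENT_PROFILES = {
    "goal":   {"keywords": {"goal", "score"}, "min_loudness": 0.7},
    "wicket": {"keywords": {"wicket", "bowled"}, "min_loudness": 0.6},
}

APPLAUSE_REPOSITORY = {
    "goal":   ["stadium_roar.wav", "crowd_cheer.wav"],
    "wicket": ["polite_clap.wav"],
}

def classify_event(event_data):
    """Determine the event type from event data and stored event profiles."""
    for event_type, profile in EVENT_PROFILES.items():
        if (profile["keywords"] & event_data["keywords"]
                and event_data["loudness"] >= profile["min_loudness"]):
            return event_type
    return None

def select_applause(event_data):
    """Identify applause audio(s) for the detected event, if any match."""
    event_type = classify_event(event_data)
    if event_type is None:
        return []
    return APPLAUSE_REPOSITORY.get(event_type, [])

clips = select_applause({"keywords": {"goal"}, "loudness": 0.9})
```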
Artificial intelligence model for predicting playback of media data
A system is provided to predict requested playbacks of media files by users from a media storage system. The system includes a processor and a computer readable medium operably coupled thereto, to perform predictive playback operations which include accessing an AI model and a media file comprising metadata associated with generating the media file, generating a predictive score for a playback of the media file based on the AI model and the metadata, comparing the predictive score to a threshold required to transcode the media file into a playback format prior to the playback, predicting the playback based on the comparing, determining a predicted playback time of the media file based on the metadata for the media file, and transcoding the media file into the playback format prior to the predicted playback time.
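The score-threshold-schedule flow described above can be sketched as below. The model is stood in by a plain callable, and the metadata fields (`recent_plays`, `expected_play_hour`) and the one-hour transcode lead time are assumptions for illustration only.

```python
def predict_and_schedule(model, media_file, threshold=0.5):
    """Score a media file's playback likelihood from its metadata and decide
    whether to pre-transcode it before the predicted playback time."""
    score = model(media_file["metadata"])      # AI model stands in as a callable
    if score < threshold:
        return None                            # no pre-transcode scheduled
    # Predict a playback time from metadata (e.g. a recurring access pattern).
    playback_time = media_file["metadata"]["expected_play_hour"]
    transcode_deadline = playback_time - 1     # finish one hour early (assumed)
    return {"score": score, "transcode_by": transcode_deadline}

# Toy model: weight the recent access count (an assumed metadata field).
toy_model = lambda md: min(1.0, md["recent_plays"] / 10)
plan = predict_and_schedule(
    toy_model,
    {"metadata": {"recent_plays": 8, "expected_play_hour": 9}},
)
```

A file scoring below the threshold simply yields no plan, so transcoding capacity is spent only on likely playbacks.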
Text-driven editor for audio and video editing
The disclosed technology is a system and computer-implemented method for assembling and editing a video program from spoken words or soundbites. The disclosed technology imports source audio/video clips in any of multiple formats. Spoken audio is transcribed into searchable text. The text transcript is synchronized to the video track by timecode markers. Each spoken word corresponds to a timecode marker, which in turn corresponds to a video frame or frames. Using word processing operations and text editing functions, a user selects video segments by selecting corresponding transcribed text segments. By selecting text and arranging that text, a corresponding video program is assembled. The selected video segments are assembled on a timeline display in any chosen order by the user. The sequence of video segments may be reordered and edited, as desired, to produce a finished video program for export.
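The word-to-timecode mapping at the heart of this approach might look like the sketch below: each transcribed word carries start/end timecodes, so a run of selected words resolves to a video in/out point, and rearranged text selections become a rearranged timeline. The data layout and field names are assumptions, not the patent's format.

```python
# Illustrative sketch: transcript words carry timecode markers, and selecting
# a span of text selects the corresponding video time range.

transcript = [
    {"word": "welcome", "start": 0.0, "end": 0.4},
    {"word": "to",      "start": 0.4, "end": 0.5},
    {"word": "the",     "start": 0.5, "end": 0.6},
    {"word": "show",    "start": 0.6, "end": 1.1},
]

def select_segment(words, first_idx, last_idx):
    """Map a selected run of transcribed words to a video in/out point."""
    return (words[first_idx]["start"], words[last_idx]["end"])

def assemble_timeline(words, selections):
    """Arrange text selections in any chosen order to build the edit timeline."""
    return [select_segment(words, a, b) for a, b in selections]

timeline = assemble_timeline(transcript, [(3, 3), (0, 1)])  # reordered soundbites
```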
FRAGMENT-ALIGNED AUDIO CODING
Audio video synchronization and alignment, or alignment of audio to some other external clock, are rendered more effective or easier by treating the fragment grid and the frame grid as independent, but, nevertheless, for each fragment the frame grid is aligned to the respective fragment's beginning. A loss in compression effectiveness may be kept low by appropriately selecting the fragment size. On the other hand, the alignment of the frame grid with respect to the fragments' beginnings allows for an easy and fragment-synchronized way of handling the fragments in connection with, for example, parallel audio video streaming, bitrate adaptive streaming or the like.
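The key idea, restarting the audio frame grid at every fragment boundary rather than running one continuous grid, can be illustrated numerically. The fragment and frame durations below are toy values chosen for the example.

```python
import math

def frame_grid_for_fragment(fragment_start, fragment_duration, frame_duration):
    """Restart the audio frame grid at the fragment's beginning: frame start
    times are laid out from the fragment start, independent of how the frame
    grid fell in earlier fragments."""
    n_frames = math.ceil(fragment_duration / frame_duration)
    return [fragment_start + i * frame_duration for i in range(n_frames)]

# Toy numbers: 2-second fragments, 0.6-second audio frames (assumed values).
# Without per-fragment alignment, frame 4 would start at 2.4 s, straddling the
# fragment boundary; with alignment, fragment 1 starts a fresh frame at 2.0 s.
frag0 = frame_grid_for_fragment(0.0, 2.0, 0.6)
frag1 = frame_grid_for_fragment(2.0, 2.0, 0.6)
```

Because each fragment's last frame may overshoot the boundary slightly, a small amount of padding (and hence compression effectiveness) is traded per fragment, which is why the fragment size matters.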
AUTOMATED VOICE TRANSLATION DUBBING FOR PRERECORDED VIDEO
A method for aligning a translation of original caption data with an audio portion of a video is provided. The method involves identifying original caption data for the video that includes caption character strings, identifying translated language caption data for the video that includes translated character strings associated with the audio portion of the video, and mapping caption sentence fragments generated from the caption character strings to corresponding translated sentence fragments generated from the translated character strings based on timing associated with the original caption data and the translated language caption data. The method further involves estimating time intervals for individual caption sentence fragments using timing information corresponding to individual caption character strings, assigning time intervals to individual translated sentence fragments based on estimated time intervals of the individual caption sentence fragments, generating a set of translated sentences using consecutive translated sentence fragments, and aligning the set of translated sentences with the audio portion of the video using assigned time intervals of individual translated sentence fragments from corresponding translated sentences.
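The interval-estimation and interval-assignment steps might be sketched as below. A caption fragment's interval is estimated from its character strings' timing, then copied onto the translated fragment it maps to. The 1:1 positional mapping and all field names are simplifying assumptions; the patent maps fragments by timing, which can be many-to-many.

```python
# Hedged sketch of assigning caption timing to translated sentence fragments.

def estimate_interval(caption_fragment):
    """Interval spans the first character string's start to the last's end."""
    strings = caption_fragment["strings"]
    return (strings[0]["start"], strings[-1]["end"])

def align_translation(caption_fragments, translated_fragments):
    """Assign each translated fragment the estimated interval of its mapped
    caption fragment (a simple positional mapping, for illustration)."""
    aligned = []
    for cap, trans in zip(caption_fragments, translated_fragments):
        start, end = estimate_interval(cap)
        aligned.append({"text": trans, "start": start, "end": end})
    return aligned

captions = [
    {"strings": [{"text": "Hello", "start": 1.0, "end": 1.5},
                 {"text": "there", "start": 1.5, "end": 2.0}]},
    {"strings": [{"text": "Goodbye", "start": 3.0, "end": 3.8}]},
]
aligned = align_translation(captions, ["Bonjour", "Au revoir"])
```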
Apparatus and method for providing audio description content
A method and apparatus are described. The method includes receiving a set of audio soundtracks associated with media content, determining if one of the audio soundtracks is an audio description soundtrack, and modifying at least one main audio soundtrack to include an indication that an audio description soundtrack is available for the media content if it is determined that one of the audio soundtracks is an audio description soundtrack. The apparatus includes memory that stores a set of audio soundtracks associated with media content. The apparatus also includes an audio processing circuit configured to retrieve the set of audio soundtracks, determine if one of the audio soundtracks is an audio description soundtrack, and modify at least one main audio soundtrack to include an indication that an audio description soundtrack is available for the media content if it is determined that one of the audio soundtracks is an audio description soundtrack.
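As a minimal sketch of the detect-and-flag logic, the soundtrack set can be scanned for a description track and the main track(s) tagged with an availability indication. The `role` / `description_available` keys are invented for illustration; the patent's indication would be embedded in the soundtrack itself (e.g. an audible or metadata cue).

```python
def mark_description_available(soundtracks):
    """If any soundtrack is an audio description track, tag the main
    soundtrack(s) with an availability indication. Keys are assumed."""
    has_ad = any(t.get("role") == "audio_description" for t in soundtracks)
    if has_ad:
        for t in soundtracks:
            if t.get("role") == "main":
                t["description_available"] = True
    return soundtracks

tracks = mark_description_available([
    {"role": "main", "lang": "en"},
    {"role": "audio_description", "lang": "en"},
])
```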
Delivery of synchronised soundtracks for electronic media content
A method and system for streaming a soundtrack from a server to a remote user device for a reader of electronic media content. The soundtrack is defined by multiple audio regions. Each audio region is defined by an audio track for playback in the audio region, a start position in the electronic media content corresponding to where the playback of the audio region is to begin, and a stop position in the electronic media content corresponding to where the playback of the audio region is to cease. The streaming of the soundtrack is based on control data generated by the remote user device.
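The audio-region model lends itself to a short sketch: each region names a track plus start/stop positions in the text, and the reader's current position selects the active region(s). Positions are treated here as character offsets, which is an assumption; track names and numbers are illustrative.

```python
# Sketch of the audio-region model from the abstract.

regions = [
    {"track": "forest_ambience.mp3", "start": 0,   "stop": 500},
    {"track": "battle_theme.mp3",    "start": 450, "stop": 900},
]

def active_tracks(reading_position):
    """Return the tracks whose region spans the reader's current position
    (regions may overlap, so several tracks can play at once)."""
    return [r["track"] for r in regions
            if r["start"] <= reading_position < r["stop"]]

playing = active_tracks(470)   # inside the overlap of both regions
```

The device-generated control data would then tell the server which of these regions to stream and when, rather than streaming the whole soundtrack blindly.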
Text-driven editor for audio and video assembly
The disclosed technology is a system and computer-implemented method for assembling and editing a video program from spoken words or soundbites. The disclosed technology imports source audio/video clips in any of multiple formats. Spoken audio is transcribed into searchable text. The text transcript is synchronized to the video track by timecode markers. Each spoken word corresponds to a timecode marker, which in turn corresponds to a video frame or frames. Using word processing operations and text editing functions, a user selects video segments by selecting corresponding transcribed text segments. By selecting text and arranging that text, a corresponding video program is assembled. The selected video segments are assembled on a timeline display in any chosen order by the user. The sequence of video segments may be reordered and edited, as desired, to produce a finished video program for export.
Class-based intelligent multiplexing over unmanaged networks
Switched digital television programming for video-on-demand and other interactive television services is combined utilizing class-based, multi-dimensional decision logic to simultaneously optimize video quality and audio uniformity while minimizing latency during user interactions with the system over an unmanaged network. For example, a method of adapting content-stream bandwidth includes generating a content stream for transmission over an unmanaged network with varying capacity; sending the content stream, via the unmanaged network, toward a client device; monitoring the capacity of the unmanaged network; determining whether an aggregate bandwidth of an upcoming portion of the content stream fits the capacity, wherein the upcoming portion of the content stream corresponds to a respective frame time and includes video content and user-interface data; and, in response to a determination that the aggregate bandwidth does not fit the capacity, reducing a size of the upcoming portion of the content stream.
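The fit-check-and-reduce step of the example method might be sketched as follows. The degradation policy (shrink the video share first, keep the user-interface data intact) is an assumption for illustration; the patent's class-based logic weighs multiple dimensions.

```python
def fit_portion_to_capacity(video_bytes, ui_bytes, capacity_bytes):
    """Check whether the aggregate size of the upcoming frame-time portion
    (video + user-interface data) fits the measured capacity; if not,
    reduce the video share so the aggregate fits. Policy is illustrative."""
    aggregate = video_bytes + ui_bytes
    if aggregate <= capacity_bytes:
        return video_bytes, ui_bytes       # fits: send unchanged
    # Reduce the video portion first; UI data is kept intact here.
    reduced_video = max(0, capacity_bytes - ui_bytes)
    return reduced_video, ui_bytes

video, ui = fit_portion_to_capacity(video_bytes=900, ui_bytes=200,
                                    capacity_bytes=1000)
```

In a real encoder the reduction would be realized by re-quantizing or dropping enhancement data for that frame time, not by truncating bytes.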
System and Method for Group Stream Broadcasting with Stateless Queuing Feature
A stateless queue system implements and supports a virtual room containing a content playback queue. The system allows multiple clients to listen to queued content in a synchronized manner and to modify the same queue without requiring a broadcasting client to send playback events. The system includes a multiplicity of computing clients that can add content items to the queue by interacting with an application service. The clients are enabled to add content to the queue in a number of ways, including by simple addition to the bottom of the queue, by vote, or by direct modification of the queue structure. Upon client entrance into the playback session, data are provided to the client that represent the queue order at the time of request, the number of votes each item has received if the queue order is determined by vote, the start time for each queued item, and the currently playing item.
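The "stateless" property can be illustrated with a short sketch: because every queued item carries a scheduled start time, a client joining mid-session can compute the currently playing item from the clock alone, with no playback events from the broadcaster. The queue contents and durations below are invented for the example.

```python
# Sketch of the stateless-queue idea: no playback events are exchanged;
# each item's start time lets any client derive the current playback state.

queue = [
    {"item": "song_a", "start": 0,   "duration": 180},
    {"item": "song_b", "start": 180, "duration": 240},
    {"item": "song_c", "start": 420, "duration": 200},
]

def current_item(now_seconds):
    """Find the queued item whose start/end window contains 'now'."""
    for entry in queue:
        if entry["start"] <= now_seconds < entry["start"] + entry["duration"]:
            return entry["item"]
    return None

playing = current_item(200)   # a client joining 200 s into the session
```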