Patent classifications
H04N21/234336
MULTI-FORMAT CONTENT REPOSITORY SEARCH
An audio file format of an audio portion of natural language content is determined. Using a trained audio language identification model, a human language included in the audio portion is identified. Using an audio-to-text model trained on the human language, the audio portion is converted to a corresponding set of text data. The set of text data is indexed. Using the indexed set of text data, a search result is generated responsive to a search query, the search query specifying a search including a non-textual portion of the natural language content.
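As a rough illustration of this pipeline, the sketch below chains language identification, transcription, and indexing behind a keyword search. The `identify_language` and `transcribe` stubs stand in for the trained models the abstract assumes, and the inverted index is a minimal substitute for a production search index.

```python
from collections import defaultdict

# Hypothetical model hooks -- the abstract assumes trained models;
# these stand-ins are assumptions for illustration only.
def identify_language(audio_bytes: bytes) -> str:
    """Trained audio language-identification model (placeholder)."""
    return "en"

def transcribe(audio_bytes: bytes, language: str) -> str:
    """Trained audio-to-text model for the identified language (placeholder)."""
    return "example transcript of the audio portion"

class TranscriptIndex:
    """Minimal inverted index over transcribed audio portions."""
    def __init__(self):
        self._postings = defaultdict(set)  # term -> {content_id}

    def add(self, content_id: str, audio_bytes: bytes) -> None:
        language = identify_language(audio_bytes)
        text = transcribe(audio_bytes, language)
        for term in text.lower().split():
            self._postings[term].add(content_id)

    def search(self, query: str) -> set:
        # Intersect postings so every query term must match.
        terms = query.lower().split()
        if not terms:
            return set()
        results = self._postings[terms[0]].copy()
        for term in terms[1:]:
            results &= self._postings[term]
        return results

index = TranscriptIndex()
index.add("clip-42", b"...audio bytes...")
print(index.search("audio portion"))  # {'clip-42'}
```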
Method and system for real-time transcoding of MPEG-DASH on-demand media segments while in transit from content host to DASH client
A system, method, and computer program product for a real-time post-processing system that transforms MPEG-DASH on-demand media streams, including a DASH media player device; an intercepting media server device; an MPEG-DASH content origin server device; and a proxy media client device coupled to the DASH media player device and the intercepting media server device and configured to intercept MPEG-DASH HTTP requests from the DASH media player device and forward the intercepted requests to the intercepting media server device instead of the MPEG-DASH content origin server device. The intercepting media server device is configured to act as an HTTP proxy, forward the intercepted requests to the MPEG-DASH content origin server, and, for each corresponding MPEG-DASH media subsegment acquired, analyze the video media content within the subsegment and apply selective transcoding.
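A minimal sketch of the intercepting proxy path, using only the Python standard library: the handler forwards each intercepted DASH request to an assumed origin URL, then analyzes and optionally transcodes the returned subsegment before relaying it. The origin URL, `needs_transcoding`, and `transcode` are placeholders for the content analysis and selective transcoding the abstract describes.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

ORIGIN = "http://origin.example.com"  # MPEG-DASH content origin (assumed URL)

def needs_transcoding(segment: bytes) -> bool:
    """Placeholder analysis of the video content within a subsegment."""
    return False

def transcode(segment: bytes) -> bytes:
    """Placeholder selective transcoding step."""
    return segment

class InterceptingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Forward the intercepted DASH request to the origin server.
        with urlopen(ORIGIN + self.path) as upstream:
            body = upstream.read()
        # Analyze and selectively transcode the media subsegment in transit.
        if needs_transcoding(body):
            body = transcode(body)
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), InterceptingProxy).serve_forever()
```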
METHOD, APPARATUS, DEVICE AND MEDIUM FOR GENERATING VIDEO IN TEXT MODE
In the present disclosure, methods, apparatuses, devices, and media are provided for generating a video in a text mode in an information sharing application. In a method, a request for generating the video is received from a user of the information sharing application. An initial page for generating the video is displayed in the information sharing application, the initial page comprising an indication for entering text. Text input is obtained from the user in response to detection of a touch by the user in the area where the initial page is located. A video to be published in the application is generated based on the text input. In some examples, within the information sharing application, the user may directly generate a corresponding video based on a text input. In this way, the complexity of user operation may be reduced, and the user may be provided with richer publishing content.
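One way to picture the text-to-video step is to render the obtained text input onto a sequence of frames. The sketch below uses Pillow (an assumption; the disclosure names no library) and writes numbered PNG frames rather than a muxed video file to keep the example dependency-light.

```python
from PIL import Image, ImageDraw  # Pillow

def frames_from_text(text: str, size=(720, 1280), seconds=3, fps=24):
    """Render the user's text input onto a sequence of video frames.
    A real implementation would encode these with a video muxer;
    saving numbered PNGs keeps the sketch self-contained."""
    frame = Image.new("RGB", size, color=(20, 20, 20))
    draw = ImageDraw.Draw(frame)
    draw.text((40, size[1] // 2), text, fill=(255, 255, 255))
    for i in range(seconds * fps):
        frame.save(f"frame_{i:04d}.png")

frames_from_text("hello from text mode")
```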
SYSTEM AND METHOD OF AUTOMATIC MEDIA GENERATION ON A DIGITAL PLATFORM
A system and method of automatic media generation on a digital platform. The method encompasses extracting, from a plurality of product contents, a first set of attributes and a second set of attributes. The method thereafter comprises creating at least one pair of media content(s) and corresponding correlated text content(s) based at least on a successful matching of the first set of attributes with the second set of attributes. Further, the method comprises generating one or more frames based on the at least one created pair of the media content(s) and the corresponding correlated text content(s). The method thereafter encompasses ranking the one or more frames based on one or more attributes extracted from at least one of a user profile and a product search query. The method further comprises automatically generating at least one target media on the digital platform based on the one or more ranked frames.
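The attribute matching and ranking can be sketched with plain set operations; the toy attribute sets, the 0.5 Jaccard cutoff, and the ranking key below are illustrative assumptions, not the patented scoring.

```python
def jaccard(a: set, b: set) -> float:
    """Set-overlap score used here as a stand-in for attribute matching."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Toy product contents: (media_id, attributes) and (text_id, attributes).
media = [("img1", {"shoe", "red", "running"}), ("img2", {"watch", "steel"})]
texts = [("t1", {"red", "running", "shoe", "sale"}), ("t2", {"steel", "watch"})]

# Pair media with correlated text when attribute matching succeeds.
pairs = [(m, t) for m in media for t in texts if jaccard(m[1], t[1]) >= 0.5]

# Rank candidate frames against attributes from a user profile / search query.
query_attrs = {"running", "shoe"}
frames = sorted(pairs,
                key=lambda p: len((p[0][1] | p[1][1]) & query_attrs),
                reverse=True)
print([f"{m[0]}+{t[0]}" for m, t in frames])  # ['img1+t1', 'img2+t2']
```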
CAPTION MODIFICATION AND AUGMENTATION SYSTEMS AND METHODS FOR USE BY HEARING ASSISTED USER
A system and method for facilitating communication between an assisted user (AU) and a hearing user (HU) includes receiving an HU voice signal as the AU and HU participate in a call using AU and HU communication devices, transcribing HU voice signal segments into verbatim caption segments, processing each verbatim caption segment to identify an intended communication (IC) intended by the HU upon uttering the associated HU voice signal segment, and, for at least a portion of the HU voice signal segments, (i) using an associated IC to generate an enhanced caption different from the associated verbatim caption, (ii) for each of a first subset of the HU voice signal segments, presenting the verbatim captions via the AU communication device display for consumption, and (iii) for each of a second subset of the HU voice signal segments, presenting enhanced captions via the AU communication device display for consumption.
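A toy version of the verbatim/enhanced split: here the IC-inference step is reduced to filler-word removal (a stand-in for whatever language understanding the system actually uses), and a caller-supplied predicate decides which subset each segment falls into.

```python
FILLERS = {"uh", "um", "like"}

def intended_communication(verbatim: str) -> str:
    """Placeholder IC inference: here, just drop filler words."""
    words = [w for w in verbatim.split()
             if w.strip(",.").lower() not in FILLERS]
    return " ".join(words)

def caption_stream(voice_segments, enhance_when):
    """Yield either a verbatim or an enhanced caption per segment."""
    for verbatim in voice_segments:
        ic = intended_communication(verbatim)
        if enhance_when(verbatim, ic):
            yield ("enhanced", ic)        # second subset: enhanced caption
        else:
            yield ("verbatim", verbatim)  # first subset: verbatim caption

segments = ["so, uh, the meeting is at three", "see you then"]
for kind, text in caption_stream(segments, lambda v, ic: v != ic):
    print(kind, "->", text)
```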
Methods and systems for selective playback and attenuation of audio based on user preference
Systems and methods are presented for filtering unwanted sounds from a media asset. Voice profiles of a first character and a second character are generated based on a first voice signal and a second voice signal received from the media device during a presentation. The user provides a selection to avoid a certain sound or voice associated with the second character. During a presentation of the media asset, a second audio segment is analyzed to determine, based on the voice profile of the second character, whether the second voice signal includes the voice of the second character. If so, the output characteristics of the second voice signal are adjusted to reduce the sound.
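The sketch below approximates voice-profile matching with a normalized FFT-magnitude "embedding" and a dot-product similarity, then scales the signal's gain when the blocked character's profile matches. A real system would use a trained speaker-verification model; the threshold and gain values here are arbitrary assumptions.

```python
import numpy as np

def voice_embedding(signal: np.ndarray) -> np.ndarray:
    """Placeholder voice-profile embedding (a real system would use
    a speaker-verification model)."""
    spectrum = np.abs(np.fft.rfft(signal))
    return spectrum / (np.linalg.norm(spectrum) + 1e-9)

def attenuate_if_blocked(segment: np.ndarray, blocked_profile: np.ndarray,
                         threshold: float = 0.9, gain: float = 0.1) -> np.ndarray:
    """Reduce output level when the segment matches the blocked
    character's voice profile."""
    similarity = float(voice_embedding(segment) @ blocked_profile)
    return segment * gain if similarity >= threshold else segment

# Enroll the second character's profile, then filter a live segment.
profile = voice_embedding(np.random.randn(1024))
out = attenuate_if_blocked(np.random.randn(1024), profile)
```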
Methods and systems for detecting audio output of associated device
Systems and methods for determining whether a first electronic device detects a media item that is to be output by a second electronic device are described herein. In some embodiments, an individual may request, using a first electronic device, that a media item be played on a second electronic device. A backend system may send first audio data representing a first response to the first electronic device, along with instructions to delay outputting the first response and to continue sending audio data of additional audio captured thereby. The backend system may also send second audio data representing a second response to the second electronic device, along with the media item. Text data may be generated representing the captured audio, which may then be compared with text data representing the second response to determine whether or not they match.
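The final matching step can be approximated with a fuzzy string comparison between the text recognized from the captured audio and the text of the second response; `difflib` and the 0.8 threshold are illustrative choices, not the patented comparison.

```python
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparison."""
    return " ".join(text.lower().split())

def responses_match(captured_text: str, expected_response: str,
                    threshold: float = 0.8) -> bool:
    """Compare text generated from the captured audio against the text
    of the second response sent to the second device."""
    ratio = SequenceMatcher(None, normalize(captured_text),
                            normalize(expected_response)).ratio()
    return ratio >= threshold

print(responses_match("Now playing  your playlist",
                      "now playing your playlist"))  # True
```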
Video reader with music word learning feature
Reading material on video gives the reader a seamless reading experience by displaying, on a device of the reader's choice, a series of segments containing letters, words, phrases, sentences, and/or paragraphs on a background of the drafter's choice. One segment flows into the next until the reading material is completed. These sequential segments are set to be viewed seamlessly with audio accompaniment. Words, sentences, or paragraphs are set to music, where recognizable features of the music are played at the appearance of a certain word or the beginning of a sentence or paragraph. The appearance of a word, sentence, or paragraph may be accompanied by the appearance of an image representing it, along with a recognizable designated musical element.
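A display timeline like the one sketched below is one way to pair segment appearances with designated musical elements; the words-per-minute pacing, the cue name, and the sentence-start cue rule are assumptions for illustration.

```python
def schedule_segments(segments, wpm=180):
    """Build a display timeline where each segment's appearance can
    trigger a designated musical element (cue names are illustrative)."""
    t = 0.0
    timeline = []
    for seg in segments:
        # Cue a musical element when the segment begins a sentence.
        cue = "phrase-motif" if seg[:1].isupper() else None
        timeline.append({"t": round(t, 2), "text": seg, "music_cue": cue})
        t += len(seg.split()) * 60.0 / wpm  # seconds to read the segment
    return timeline

for entry in schedule_segments(["Once upon a time,",
                                "a reader pressed play.",
                                "The music swelled."]):
    print(entry)
```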
INTERACTIVE VIEWING EXPERIENCES BY DETECTING ON-SCREEN TEXT
Systems, methods, and devices for an interactive viewing experience by detecting on-screen data are disclosed. One or more frames of video data are analyzed to detect regions in the visual video content that contain text. A character recognition operation can be performed on the regions to generate textual data. Based on the textual data and the regions, a graphical user interface (GUI) definition can be generated. The GUI definition can be used to generate a corresponding GUI superimposed onto the visual video content to present users with controls and functionality with which to interact with the text or enhance the video content. Context metadata can be determined from external sources or by analyzing the continuity of audio and visual aspects of the video data. The context metadata can then be used to improve the character recognition or inform the generation of the GUI.
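In outline, detected text regions become controls in a GUI definition. In the sketch below, the `detect_text_regions` stub stands in for the region detection and character recognition steps (e.g., an OCR engine), and the action-selection rule is a toy example.

```python
def detect_text_regions(frame):
    """Placeholder for region detection + character recognition
    (an OCR engine such as Tesseract would fill this in)."""
    return [{"box": (40, 600, 300, 40), "text": "www.example.com"}]

def build_gui_definition(frame, context=None):
    """Turn recognized on-screen text into a GUI definition whose
    controls are superimposed on the video at the detected regions."""
    controls = []
    for region in detect_text_regions(frame):
        # Toy rule: URLs get a link action, other text gets a copy action.
        action = "open_url" if region["text"].startswith("www.") else "copy_text"
        controls.append({"bounds": region["box"],
                         "label": region["text"],
                         "action": action,
                         "context": context or {}})
    return {"controls": controls}

print(build_gui_definition(frame=None, context={"scene": "news-lower-third"}))
```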
Text-driven editor for audio and video editing
The disclosed technology is a system and computer-implemented method for assembling and editing a video program from spoken words or soundbites. The disclosed technology imports source audio/video clips in any of multiple formats. Spoken audio is transcribed into searchable text. The text transcript is synchronized to the video track by timecode markers. Each spoken word corresponds to a timecode marker, which in turn corresponds to a video frame or frames. Using word processing operations and text editing functions, a user selects video segments by selecting the corresponding transcribed text segments. By selecting and arranging text, a corresponding video program is assembled. The selected video segments are assembled on a timeline display in any order chosen by the user. The sequence of video segments may be reordered and edited, as desired, to produce a finished video program for export.
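The core data structure is the word-to-timecode alignment. The sketch below shows how a text selection maps straight to video in/out points; the transcript timings and the 0.3 s tail padding are illustrative assumptions.

```python
# Transcript words aligned to timecodes (seconds); in a real system
# the transcription step supplies these word-level timings.
transcript = [("welcome", 0.0), ("to", 0.4), ("the", 0.55), ("show", 0.7),
              ("today", 2.1), ("we", 2.5), ("build", 2.7), ("videos", 3.0)]

def clip_for(start_word: int, end_word: int, tail: float = 0.3):
    """Map a selected span of transcript words to a video in/out range,
    using each word's timecode marker."""
    t_in = transcript[start_word][1]
    t_out = transcript[end_word][1] + tail
    return (t_in, t_out)

# The user edits text; the timeline is assembled from the chosen
# spans, in whatever order the user arranges them.
timeline = [clip_for(4, 7), clip_for(0, 3)]
print(timeline)  # [(2.1, 3.3), (0.0, 1.0)]
```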