Patent classifications
G06F16/685
System and method to support synchronization, closed captioning and highlight within a text document or a media file
The present invention relates to a system and method for synchronizing and highlighting a target text and audio associated with a reference document. The system and method may comprise one or more of an input unit, an extracting unit, a mapping unit, a processing unit, and an image resizing unit. The system and method may synchronize the target text and audio in order to provide a user with a read-along experience. The invention further synchronizes and highlights closed captions and audio, helping people with hearing impairments better comprehend a movie or song.
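The core of a read-along or closed-caption highlight is mapping the current playback time to the word being spoken. A minimal sketch, assuming the mapping unit has already produced word-level (start, end) timestamps (the patent does not specify this data structure):

```python
from bisect import bisect_right

def word_to_highlight(word_timings, playback_time):
    """Return the index of the word whose time span covers playback_time.

    word_timings: list of (start_sec, end_sec) pairs, one per word,
    sorted by start time -- a hypothetical output of the mapping unit.
    """
    starts = [start for start, _ in word_timings]
    i = bisect_right(starts, playback_time) - 1
    if i >= 0 and word_timings[i][1] >= playback_time:
        return i
    return None  # before the first word, or in a gap between words

# Illustrative timings for a three-word caption.
timings = [(0.0, 0.4), (0.5, 0.9), (1.0, 1.6)]
```

Binary search keeps each lookup logarithmic, which matters when the highlight is refreshed many times per second during playback.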
Using Video Clips as Dictionary Usage Examples
Implementations are provided for automatically mining corpus(es) of electronic video files for video clips that contain spoken utterances that are suitable usage examples to accompany or complement dictionary definitions. These video clips may then be associated with target n-grams in a searchable database, such as a database underlying an online dictionary. In various implementations, a set of candidate video clips in which a target n-gram is uttered in a target context may be identified from a corpus of electronic video files. For each candidate video clip of the set, pre-existing manual subtitles associated with the candidate video clip may be compared to text generated based on speech recognition processing of an audio portion of the candidate video clip. Based at least in part on the comparing, a measure of suitability as a dictionary usage example may be calculated for the candidate video clip.
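The suitability measure rests on comparing manual subtitles against speech-recognition output. The abstract does not fix the comparison metric, so the sketch below stands in a normalized sequence-similarity score over word tokens:

```python
from difflib import SequenceMatcher

def suitability(manual_subtitle: str, asr_text: str) -> float:
    """Score a candidate clip by agreement between its manual subtitles
    and ASR output. High agreement suggests clear, intelligible speech,
    which makes the clip a better dictionary usage example.

    The 0..1 ratio here is an illustrative stand-in for the patent's
    unspecified measure of suitability."""
    a = manual_subtitle.lower().split()
    b = asr_text.lower().split()
    return SequenceMatcher(None, a, b).ratio()
```

In practice one would combine this with other signals (audio quality, context match for the target n-gram) before ranking candidates.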
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM
An information processing apparatus according to the present disclosure includes: a storage unit that stores a plurality of pieces of music feature information in which a plurality of types of feature amounts extracted from music information is associated with predetermined identification information, the music feature information being used as learning data in composition processing using machine learning; a reception unit that receives instruction information transmitted from a terminal apparatus; an extraction unit that extracts the music feature information from the storage unit according to the instruction information; and an output unit that outputs presentation information of the music feature information extracted by the extraction unit.
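The storage, reception, and extraction units can be sketched as a keyed feature store queried by instruction information. All field names and the threshold-style instruction format below are illustrative assumptions, not the disclosure's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class MusicFeatureStore:
    """Minimal stand-in for the storage and extraction units: feature
    amounts keyed by identification information, filtered on request."""
    features: dict = field(default_factory=dict)  # id -> {feature: amount}

    def store(self, track_id, feature_amounts):
        self.features[track_id] = feature_amounts

    def extract(self, instruction):
        """instruction: dict of minimum feature amounts, e.g. {"tempo": 120};
        returns the music feature information matching every condition."""
        return {tid: f for tid, f in self.features.items()
                if all(f.get(k, 0) >= v for k, v in instruction.items())}

store = MusicFeatureStore()
store.store("song_a", {"tempo": 128, "energy": 0.8})
store.store("song_b", {"tempo": 90, "energy": 0.4})
hits = store.extract({"tempo": 120})
```

The output unit of the apparatus would then format `hits` as presentation information for the requesting terminal.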
Tagging an image with audio-related metadata
In one aspect, an example method to be performed by a computing device includes (a) receiving a request to use a camera of the computing device; (b) in response to receiving the request, (i) using a microphone of the computing device to capture audio content and (ii) using the camera of the computing device to capture an image; (c) identifying reference audio content that has at least a threshold extent of similarity with the captured audio content; and (d) outputting an indication of the identified reference audio content while displaying the captured image.
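Step (c), identifying reference audio with at least a threshold extent of similarity, is typically implemented over audio fingerprints. A minimal sketch, assuming fingerprints are already available as hash sets (a real fingerprinting backend is outside the abstract's scope):

```python
def audio_similarity(captured, reference):
    """Fraction of the captured fingerprint hashes that also appear in
    the reference -- a stand-in for a real audio-fingerprint match score."""
    captured, reference = set(captured), set(reference)
    if not captured:
        return 0.0
    return len(captured & reference) / len(captured)

def identify(captured_prints, catalog, threshold=0.6):
    """Return the catalog entry most similar to the captured audio,
    provided similarity meets the threshold; otherwise None."""
    best_id, best_score = None, 0.0
    for ref_id, ref_prints in catalog.items():
        score = audio_similarity(captured_prints, ref_prints)
        if score > best_score:
            best_id, best_score = ref_id, score
    return best_id if best_score >= threshold else None

# Illustrative catalog of reference fingerprints.
catalog = {"song": {1, 2, 3, 4}, "other": {9, 10}}
```

The identified reference ID would then be attached to the captured image as audio-related metadata, per step (d).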
System and method for selecting media content
Methods, systems, and computer programs for generating a playlist of media content items without explicit content. A vector space is created that represents explicit and non-explicit tracks appearing in the same playlists created by other users; tracks are then filtered based on the cosine distance between the "seed tracks" and all tracks in those playlists. The explicit tracks are filtered out, and the remaining tracks are sorted based on the affinity of the user to the artist.
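The filtering step can be sketched directly: compute cosine distance from a seed track's vector, drop explicit tracks, and keep only nearby candidates. For a self-contained example, the final sort uses distance in place of the user-artist affinity signal, which the abstract leaves unspecified:

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def clean_playlist(seed_vec, tracks, max_distance=0.5):
    """tracks: list of (track_id, vector, is_explicit) tuples.
    Keep non-explicit tracks within max_distance of the seed,
    nearest first."""
    kept = [(cosine_distance(seed_vec, vec), tid)
            for tid, vec, explicit in tracks if not explicit]
    kept = [(d, t) for d, t in kept if d <= max_distance]
    return [t for d, t in sorted(kept)]

# Illustrative 2-d embeddings; real systems use learned co-occurrence vectors.
tracks = [("a", [1, 0.1], False),  # close to seed, clean -> kept
          ("b", [1, 0.0], True),   # explicit -> dropped
          ("c", [0, 1.0], False)]  # orthogonal to seed -> too far
```

Track "b" is removed despite being closest, which is the point: the vector space lets a clean track stand in for an explicit one that co-occurs in the same user playlists.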
System and Method for Generating Cue Sheets
A system and method for generating a cue sheet for submission to a performing rights organization. The system comprising a processor for executing the steps of: receiving an electronic file associated with a broadcast production comprising a list of cues used in the broadcast production and associated use data; extracting irrelevant data from the electronic file by determining and deleting irrelevant cues, determining and deleting unnecessary and unrelated use data from each cue, and determining and deleting duplicative cues, so that relevant cues and associated use data remain after extracting; searching a music catalog database to retrieve identification data for each relevant cue; storing each relevant cue with its associated use data and identification data in a memory; and generating a cue sheet comprising relevant cues with associated use data and identification data for each relevant cue.
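The processing steps form a filter-dedupe-lookup pipeline. A minimal sketch, where the cue schema, field names, and catalog format are all illustrative assumptions (the patent does not fix them):

```python
def generate_cue_sheet(cues, catalog, relevant_fields=("title", "duration")):
    """cues: list of dicts, each with a 'title' key plus arbitrary use data.
    catalog: title -> identification data (stand-in for the music
    catalog database search)."""
    sheet, seen = [], set()
    for cue in cues:
        title = cue.get("title")
        if not title or title in seen:  # irrelevant or duplicative cue
            continue
        seen.add(title)
        # Delete unnecessary and unrelated use data by keeping only
        # the relevant fields.
        entry = {k: v for k, v in cue.items() if k in relevant_fields}
        entry["identification"] = catalog.get(title)
        sheet.append(entry)
    return sheet

cues = [
    {"title": "Main Theme", "duration": 30, "scratch_notes": "take 2"},
    {"title": "Main Theme", "duration": 30},  # duplicative cue
    {"note": "no title"},                     # irrelevant cue
]
catalog = {"Main Theme": "T-123456789-0"}     # hypothetical work identifier
sheet = generate_cue_sheet(cues, catalog)
```

The resulting `sheet` contains only relevant cues with their use and identification data, ready for formatting into a submission for a performing rights organization.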
ANNOTATION OF MEDIA FILES WITH CONVENIENT PAUSE POINTS
A computer-implemented method, a computer system and a computer program product annotate media files with convenient pause points. The method includes acquiring a text file version of an audio narration file. The text file version includes a pause point history of a plurality of prior users. The method also includes generating a list of pause points based on the pause point history. In addition, the method includes determining a tone of voice being used by a speaker at each pause point using natural language processing algorithms. The method further includes determining a set of convenient pause points based on the list of pause points and the determined tone of voice. Lastly, the method includes inserting the determined set of convenient pause points into the audio narration file.
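Selecting the convenient pause points combines two signals: how many prior users paused at a timestamp, and the speaker's tone there. A sketch under the assumption that tone analysis has already labeled each candidate point (the NLP step itself is out of scope):

```python
from collections import Counter

def convenient_pause_points(pause_history, tone_at, min_users=2,
                            calm_tones=("neutral", "calm")):
    """pause_history: timestamps (seconds) where prior users paused.
    tone_at: timestamp -> tone label from a hypothetical NLP tone
    analyzer. Keep points paused at by enough users where the
    speaker's tone suggests a natural break."""
    counts = Counter(pause_history)
    return sorted(t for t, n in counts.items()
                  if n >= min_users and tone_at.get(t) in calm_tones)

history = [10, 10, 25, 40, 40, 40]
tones = {10: "neutral", 25: "neutral", 40: "tense"}
points = convenient_pause_points(history, tones)
```

Note that the most-paused point (40 s) is rejected because the tense tone suggests mid-scene suspense, while 25 s fails the popularity threshold; only 10 s satisfies both criteria.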
VIDEO GENERATION METHOD, APPARATUS, ELECTRONIC DEVICE, STORAGE MEDIUM AND PROGRAM PRODUCT
This disclosure provides a video generation method, a video generation apparatus, an electronic device, a storage medium and a program product, and relates to the field of artificial intelligence technology, and in particular to the field of computer vision technology and deep learning technology. A specific implementation includes: obtaining document content information of a document; extracting, from the document content information, populating information for multiple scenes in a preset video template; populating the populating information for the multiple scenes into corresponding scenes in the preset video template, respectively, to obtain image information of the multiple scenes; generating audio information of the multiple scenes according to the populating information for the multiple scenes; generating a video of the document based on the image information and audio information of the multiple scenes.
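The populating step maps extracted document fields into the slots of each preset scene. A minimal sketch, where the slot and field names are illustrative and the audio/rendering stages are omitted:

```python
def populate_template(template_scenes, doc_fields):
    """template_scenes: list of scenes, each mapping a scene slot to the
    name of a document field that should fill it. doc_fields: the
    populating information extracted from the document's content."""
    scenes = []
    for scene in template_scenes:
        filled = {slot: doc_fields.get(name, "") for slot, name in scene.items()}
        scenes.append(filled)
    return scenes

# Hypothetical two-scene template and extracted document content.
template = [{"headline": "title"}, {"body": "summary"}]
doc = {"title": "Q3 results", "summary": "Revenue grew 12%."}
scenes = populate_template(template, doc)
```

Each filled scene would then drive both the image rendering and the text-to-speech audio for that scene before the scenes are concatenated into the final video.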
MUSIC ANALYSIS AND RECOMMENDATION ENGINE
A music analysis and recommendation system ("the system") is configured to receive and analyze data associated with a song performed by a performer. The system also accesses a current contextual information repository to identify a current cultural paradigm and maps the current cultural paradigm to a historical contextual information repository to identify one or more historical periods that have a cultural paradigm matching the current cultural paradigm. The system then identifies one or more hit songs during the one or more historical periods and retrieves data associated with the one or more hit songs. The data associated with the song performed by the performer is compared with data associated with each of the hit songs to determine a similarity. Based upon the determined similarities, the system determines a likelihood of the song becoming a hit song and/or a likelihood of the performer becoming a hit song performer.
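The final comparison step can be sketched as averaging a similarity score between the candidate song and the hit songs retrieved from the matched historical periods. The feature names and the distance-based similarity below are illustrative; the abstract does not specify the comparison function:

```python
def hit_likelihood(song_features, hit_song_features):
    """song_features: {feature: amount in 0..1} for the candidate song.
    hit_song_features: the same for each historical hit song.
    Returns the mean similarity, used as a proxy for hit likelihood."""
    def similarity(a, b):
        keys = set(a) | set(b)
        # 1 minus mean absolute feature difference (a simple stand-in).
        return 1.0 - sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys) / len(keys)
    if not hit_song_features:
        return 0.0
    return sum(similarity(song_features, h)
               for h in hit_song_features) / len(hit_song_features)

song = {"tempo": 0.8, "energy": 0.6}
```

A production system would weight features and calibrate the score against historical chart outcomes rather than report raw similarity.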
AUDIO GENERATION METHOD, RELATED APPARATUS, AND STORAGE MEDIUM
Embodiments of this application provide an audio generation method, a related apparatus, and a storage medium, to provide a better audio generation solution for a user. In embodiments of this application, a text is obtained, a song clip corresponding to the text is obtained through matching, and the song clip is used as audio corresponding to the text. In this way, the text can be expressed in a manner of the song clip.
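The matching between input text and a song clip can be sketched with a simple lexical-overlap ranking; the embodiments' actual matching model is unspecified, so word overlap stands in here:

```python
def match_song_clip(text, clips):
    """clips: list of (clip_id, lyric) pairs. Return the id of the clip
    whose lyric shares the most words with the input text, so the clip
    can serve as the audio expressing that text."""
    words = set(text.lower().split())
    best = max(clips,
               key=lambda clip: len(words & set(clip[1].lower().split())),
               default=None)
    return best[0] if best else None

# Hypothetical clip catalog with one lyric line per clip.
clips = [("clip1", "rainy day blues"), ("clip2", "sunny morning song")]
```

A deployed system would likely match on semantic embeddings rather than exact words, but the selection structure is the same: score every clip against the text and return the best match as the generated audio.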