Patent classifications
G10H2210/041
Real-Time Speech To Singing Conversion
A method of converting a frame of a voice sample to a singing frame includes obtaining a pitch value of the frame; obtaining formant information of the frame using the pitch value; obtaining aperiodicity information of the frame using the pitch value; obtaining a tonic pitch and chord pitches; using the formant information, the aperiodicity information, the tonic pitch, and the chord pitches to obtain the singing frame; and outputting or saving the singing frame.
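The abstract does not say how the tonic and chord pitches are combined with the frame's own pitch. One plausible reading, sketched below purely as an illustration, is to move the frame pitch to the nearest chord tone. The function names, the MIDI-note representation, and the default major-triad offsets are all assumptions, not details from the patent.

```python
import math

def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def midi_to_hz(note):
    """Convert a MIDI note number to a frequency in Hz."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)

def snap_pitch(frame_pitch_hz, tonic_midi, chord_offsets=(0, 4, 7)):
    """Return the chord tone (in Hz, any octave) nearest to the frame pitch.

    chord_offsets are semitone offsets from the tonic; (0, 4, 7) is a
    major triad and is only an illustrative default.
    """
    frame_midi = hz_to_midi(frame_pitch_hz)
    best = None
    for offset in chord_offsets:
        pitch_class = (tonic_midi + offset) % 12
        # Nearest MIDI note that shares this pitch class.
        octave = round((frame_midi - pitch_class) / 12.0)
        candidate = pitch_class + 12 * octave
        if best is None or abs(candidate - frame_midi) < abs(best - frame_midi):
            best = candidate
    return midi_to_hz(best)
```

For example, with a tonic of middle C (MIDI note 60), a spoken frame at 450 Hz would be mapped to the G below it. A full converter would still need the formant and aperiodicity information to resynthesize the frame at the new pitch.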
IDENTIFYING LANGUAGE IN MUSIC
The present disclosure describes techniques for identifying languages associated with music. Training data may be received, wherein the training data comprises information indicative of audio data representative of a plurality of music samples and metadata associated with the plurality of music samples. The training data further comprises information indicating a language corresponding to each of the plurality of music samples. A machine learning model may be trained to identify a language associated with a piece of music by applying the training data to the machine learning model until the model reaches a predetermined recognition accuracy. A language associated with the piece of music may be determined using the trained machine learning model.
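The "train until a predetermined recognition accuracy" stopping rule can be sketched with a toy classifier. The perceptron, the two-dimensional feature vectors, and the 0/1 language labels below are stand-ins chosen for brevity. The abstract does not name a model type or a feature set, and here accuracy is measured on the training samples themselves.

```python
# Hypothetical (features, language_label) pairs; real inputs would be
# audio-derived features plus metadata, as the abstract describes.
TOY_SAMPLES = [
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.3), 0),
    ((0.8, 0.9), 1), ((0.9, 0.7), 1), ((0.7, 0.8), 1),
]

def predict(weights, bias, features):
    """Linear threshold prediction: 1 if the score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def accuracy(weights, bias, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for x, y in samples if predict(weights, bias, x) == y)
    return correct / len(samples)

def train_until_accurate(samples, target_accuracy=0.95, max_epochs=1000, lr=0.1):
    """Run perceptron updates until the target accuracy is reached.

    Returns (weights, bias, accuracy, epochs_used). Training also stops
    after max_epochs so the loop always terminates.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        for x, y in samples:
            error = y - predict(weights, bias, x)
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        acc = accuracy(weights, bias, samples)
        if acc >= target_accuracy:
            return weights, bias, acc, epoch
    return weights, bias, accuracy(weights, bias, samples), max_epochs
```

In practice the accuracy check would normally use held-out samples rather than the training set, so that the threshold reflects recognition of unseen music.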
System and Method for Evaluating Semantic Closeness of Data Files
The invention provides for the evaluation of semantic closeness of a source data file relative to candidate data files. The system includes an artificial neural network (ANN) and processing intelligence that derives a property vector from extractable measurable properties of a data file. The property vector is mapped to related semantic properties for that same data file such that, during ANN training, pairwise similarity/dissimilarity in property space is mapped towards corresponding pairwise semantic similarity/dissimilarity in semantic space to preserve semantic relationships. Based on comparisons between generated property vectors in continuous multi-dimensional property space, the system and method assess, rank, and then recommend and/or filter semantically close or semantically disparate candidate files from a query from a user that includes the data file. Applications of the categorization and recommendation system apply to search tools, including identification of illicit materials or logically progressive associations between disparate files.
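The assess-and-rank step can be illustrated as a distance sort over property vectors. This is a minimal sketch: the use of Euclidean distance, the function names, and the dictionary layout are assumptions, since the abstract only says that vectors are compared in a continuous multi-dimensional space.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_candidates(query_vector, candidates, closest_first=True):
    """Order candidate file names by distance from the query vector.

    candidates maps a file name to its property vector. Set
    closest_first=False to surface the most disparate files instead.
    """
    return sorted(
        candidates,
        key=lambda name: euclidean(query_vector, candidates[name]),
        reverse=not closest_first,
    )
```

A call such as `rank_candidates([0.9, 1.0], {"a": [0, 0], "b": [1, 1], "c": [5, 5]})` returns `["b", "a", "c"]`. Passing `closest_first=False` reverses the order, which corresponds to filtering for semantically disparate files.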
Processing System for Generating a Playlist from Candidate Files and Method for Generating a Playlist
The invention provides for the evaluation of semantic closeness of a source data file relative to candidate data files. The system includes an artificial neural network (ANN) and processing intelligence that derives a property vector from extractable measurable properties of a data file. The property vector is mapped to related semantic properties for that same data file such that, during ANN training, pairwise similarity/dissimilarity in property space is mapped towards corresponding pairwise semantic similarity/dissimilarity in semantic space to preserve semantic relationships. Based on comparisons between generated property vectors in continuous multi-dimensional property space, the system and method assess, rank, and then recommend and/or filter semantically close or semantically disparate candidate files from a query from a user that includes the data file. Applications apply to search and compilation tools and particularly to recommendation tools that provide a succession of logical progressive associations that link between disparate file content in source and destination files.
System and Method for Recommending Semantically Relevant Content
A property vector derived from extractable measurable properties of a data file is mapped to semantic properties for that data file. The property vector is an output from a trained artificial neural network (ANN) that, following pairwise training of the ANN using pairs of files that map pairwise similarity/dissimilarity in property space towards corresponding pairwise semantic similarity/dissimilarity in semantic space, both preserves and is representative of semantic properties of the data file. Based on comparisons between generated property vectors, the system and method assess, rank, and then recommend and/or filter semantically close or semantically disparate candidate files in a database from a query from a user that includes the data file. Applications of the categorization and recommendation system and method apply to media or search tools and social media platforms, including media in the form of music, video, image data and/or text files.
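One way to read the pairwise training objective is as a penalty on the gap between a pair's distance in property space and the same pair's distance in semantic space. The loss below is an assumed formulation for illustration only. The abstract does not give a loss function, and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise_distance_loss(prop_a, prop_b, sem_a, sem_b):
    """Squared gap between property-space and semantic-space distances.

    The loss is zero when the two files are exactly as far apart in
    property space as they are in semantic space, and grows as the
    two distances diverge.
    """
    d_property = euclidean(prop_a, prop_b)
    d_semantic = euclidean(sem_a, sem_b)
    return (d_property - d_semantic) ** 2
```

Minimizing such a term over many file pairs would pull semantically similar files together in property space and push dissimilar ones apart, which matches the stated goal of preserving semantic relationships.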
Systems and methods for capturing and interpreting audio
A device is provided for capturing vibrations produced by an object, such as a musical instrument (for example, a cymbal of a drum kit). The device comprises a detectable element, such as a ferromagnetic element (for example, a metal shim), and a sensor spaced apart from and located relative to the musical instrument. The detectable element is located between the sensor and the musical instrument. When the musical instrument vibrates, the sensor remains stationary and the detectable element is vibrated relative to the sensor by the musical instrument.
Media content identification on mobile devices
A mobile device responds in real time to media content presented on a media device, such as a television. The mobile device captures temporal fragments of audio-video content on its microphone, camera, or both and generates corresponding audio-video query fingerprints. The query fingerprints are transmitted to a search server located remotely or used with a search function on the mobile device for content search and identification. Audio features are extracted and audio signal global onset detection is used for input audio frame alignment. Additional audio feature signatures are generated from local audio frame onsets, audio frame frequency domain entropy, and maximum change in the spectral coefficients. Video frames are analyzed to find a television screen in the frames, and a detected active television quadrilateral is used to generate video fingerprints to be combined with audio fingerprints for more reliable content identification.
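Of the audio features listed, frequency-domain entropy is the easiest to illustrate in isolation. The sketch below computes the Shannon entropy of a frame's normalized power spectrum using a direct DFT. The function names, the base-2 logarithm, and the lack of windowing are assumptions, since the abstract does not define the feature precisely.

```python
import cmath
import math

def power_spectrum(frame):
    """Power of each non-negative frequency bin of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def spectral_entropy(frame):
    """Shannon entropy (bits) of the frame's normalized power spectrum.

    Low values mean the energy sits in a few bins (tonal frames);
    high values mean it is spread out (noisy or transient frames).
    """
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0  # a silent frame carries no spectral information
    entropy = 0.0
    for p in power:
        prob = p / total
        if prob > 0.0:
            entropy -= prob * math.log2(prob)
    return entropy
```

A pure sinusoid that falls exactly on a DFT bin gives an entropy close to zero, while a single-sample impulse spreads energy evenly across all bins and gives the maximum value for that frame length. The direct DFT is quadratic in frame length, so production code would use an FFT instead.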
Audio matching with semantic audio recognition and report generation
Example articles of manufacture and apparatus for producing supplemental information for audio signature data are disclosed herein. An example apparatus includes memory including computer readable instructions. The example apparatus also includes a processor to execute the instructions to at least obtain first audio signature data associated with a first time period of media, obtain first semantic signature data associated with the first time period of the media and second semantic signature data associated with a second time period of the media, and when second audio signature data associated with the second time period of the media is unavailable, identify the media based on the first audio signature data associated with the first time period of media when the second semantic signature data associated with the second time period matches the first semantic signature data associated with the first time period of the media.
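The fallback rule in this abstract can be written as a small decision function. This is a sketch under stated assumptions: signatures are treated as hashable values, a semantic "match" is simplified to equality, and `reference_db` is a hypothetical mapping from audio signatures to media identifiers.

```python
def identify_media(first_audio_sig, second_audio_sig,
                   first_semantic_sig, second_semantic_sig, reference_db):
    """Identify the media playing during the second time period.

    If the second period has its own audio signature, look it up
    directly. Otherwise reuse the first period's audio signature, but
    only when the semantic signatures of the two periods match.
    Returns a media identifier, or None if no identification is possible.
    """
    if second_audio_sig is not None:
        return reference_db.get(second_audio_sig)
    if second_semantic_sig == first_semantic_sig:
        return reference_db.get(first_audio_sig)
    return None
```

The semantic check acts as a guard: it lets an earlier identification carry forward across a gap in audio signatures only while the content still appears to be the same.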
LEARNING SINGING FROM SPEECH
A method, computer program, and computer system are provided for converting a singing voice of a first person associated with a first speaker to a singing voice of a second person using a speaking voice of the second person associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of the first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of the first person is converted to a sample corresponding to the singing voice of the second person using the generated mel-spectrogram features.
SINGING VOICE CONVERSION
A method, computer program, and computer system are provided for converting a first singing voice associated with a first speaker to a second singing voice associated with a second speaker. A context associated with one or more phonemes corresponding to the first singing voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a sample corresponding to the first singing voice is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.