Patent classifications
G10L21/01
Automatic conversion of speech into song, rap or other audible expression having target meter or rhythm
Captured vocals may be automatically transformed using advanced digital signal processing techniques, enabling captivating applications, and even purpose-built devices, in which novice user-musicians may generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing track, and pitch-corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In other cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.
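As a rough, non-authoritative sketch of the segmentation and temporal-alignment steps this abstract describes (not the patented implementation; `segment_by_energy`, `align_to_grid`, and all thresholds and grid parameters are hypothetical stand-ins):

```python
import numpy as np

def segment_by_energy(signal, frame_len, threshold):
    """Split a mono signal into voiced segments using a simple
    short-time energy gate (a stand-in for the segmentation step)."""
    n_frames = len(signal) // frame_len
    energy = np.array([
        np.mean(signal[i * frame_len:(i + 1) * frame_len] ** 2)
        for i in range(n_frames)
    ])
    active = energy > threshold
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i * frame_len          # segment onset (samples)
        elif not a and start is not None:
            segments.append((start, i * frame_len))
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments

def align_to_grid(segments, beat_interval):
    """Snap each segment onset to the nearest beat of a target meter,
    returning (aligned onset, duration) pairs."""
    return [(round(s / beat_interval) * beat_interval, e - s)
            for s, e in segments]

# Toy example: two bursts of "speech" in silence, snapped to a beat grid.
sr = 1000
sig = np.zeros(sr)
sig[100:220] = 0.5
sig[600:750] = 0.5
segs = segment_by_energy(sig, frame_len=50, threshold=0.01)
grid = align_to_grid(segs, beat_interval=250)
```

A real speech-to-song pipeline would add cross-fades at segment boundaries and pitch correction toward a note sequence; this sketch covers only onset detection and grid quantization.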
MEASURING AND COMPENSATING FOR JITTER ON SYSTEMS RUNNING LATENCY-SENSITIVE AUDIO SIGNAL PROCESSING
A system and method receives one or more captured signals through a captured audio path and produces one or more playback signals through a playback audio path. The system and method executes one or more signal processing functions and measures the delays within the playback audio path and captured audio path during operation of the one or more signal processing functions. The system and method stores the measured delays in a memory and compensates the one or more signal processing functions for the playback delay and the capture delay.
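A common way to obtain the measured delays this abstract refers to is a loopback test: play a known signal, capture it, and cross-correlate. A minimal sketch under that assumption (function names and the compensation convention are hypothetical, not taken from the patent):

```python
import numpy as np

def measure_delay(reference, captured, sample_rate):
    """Estimate round-trip delay in seconds by cross-correlating a
    known test signal with its captured loopback."""
    corr = np.correlate(captured, reference, mode="full")
    # Peak index minus (len(reference) - 1) gives the lag in samples.
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return max(lag, 0) / sample_rate

def compensate(timestamps, playback_delay, capture_delay):
    """Shift event timestamps so downstream signal processing sees the
    signals as if both audio paths were delay-free."""
    return [t - playback_delay - capture_delay for t in timestamps]

sr = 48_000
ref = np.random.default_rng(0).standard_normal(1024)
delay_samples = 480                      # simulated 10 ms round trip
cap = np.concatenate([np.zeros(delay_samples), ref])
d = measure_delay(ref, cap, sr)
```

Storing `d` (as the abstract's memory step does) lets the compensation be reapplied without re-measuring on every run.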
Audio processor and method for processing an audio signal using horizontal phase correction
An audio processor for processing an audio signal includes an audio signal phase measure calculator configured for calculating a phase measure of an audio signal for a time frame, a target phase measure determiner for determining a target phase measure for the time frame, and a phase corrector configured for correcting phases of the audio signal for the time frame using the calculated phase measure and the target phase measure to obtain a processed audio signal.
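One plausible reading of "horizontal" phase correction is correcting the phase evolution of a subband across time frames: measure the per-frame phase advance, compare it to a target advance, and rebuild the phase track. A minimal sketch under that assumption (all names hypothetical; the patented phase measure and target determination are more elaborate):

```python
import numpy as np

def phase_derivative_over_time(phases):
    """Phase measure: per-frame phase advance, wrapped to [-pi, pi)."""
    d = np.diff(phases)
    return (d + np.pi) % (2 * np.pi) - np.pi

def correct_phases(phases, target_advance):
    """Rebuild the phase track so every frame advances by the target
    amount, keeping the first frame's phase as the anchor."""
    out = np.empty_like(phases)
    out[0] = phases[0]
    for i in range(1, len(phases)):
        out[i] = out[i - 1] + target_advance
    return out

# A noisy phase track that should advance pi/4 per frame.
true_adv = np.pi / 4
noisy = np.cumsum(np.full(8, true_adv) + 0.05 * np.sin(np.arange(8)))
fixed = correct_phases(noisy, true_adv)
```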
AUDIO PROCESSING METHOD AND AUDIO PROCESSING DEVICE
An audio processing device includes a feature extraction unit and a signal generating unit. The feature extraction unit is configured to extract a feature quantity of a first audio signal for each of a plurality of periods. The signal generating unit is configured to generate a second audio signal by time-axis expanding/compressing either a section of the first audio signal in which the feature quantity is steadily maintained for a period of time, or a section of the first audio signal in which a fluctuation of the feature quantity is repeated, while excluding from the time-axis expanding/compressing a section of the first audio signal in which the fluctuation of the feature quantity is not similar to that of other sections of the first audio signal.
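The core idea — stretch only sections where the extracted feature is steady, leave the rest untouched — can be illustrated with a deliberately crude sketch that uses frame RMS as the feature and frame duplication as the "expansion" (real systems would use an overlap-add or phase-vocoder stretch; all names and thresholds here are hypothetical):

```python
import numpy as np

def rms_per_frame(signal, frame_len):
    """Feature extraction: RMS of each non-overlapping frame."""
    n = len(signal) // frame_len
    return np.array([
        np.sqrt(np.mean(signal[i * frame_len:(i + 1) * frame_len] ** 2))
        for i in range(n)
    ])

def stretch_steady_frames(signal, frame_len, factor, tol=1e-3):
    """Duplicate frames whose feature barely changes from the previous
    frame (a crude stand-in for time-axis expansion of steady sections),
    leaving transient frames untouched."""
    feat = rms_per_frame(signal, frame_len)
    out = []
    for i in range(len(feat)):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        steady = i > 0 and abs(feat[i] - feat[i - 1]) < tol
        out.append(np.tile(frame, factor) if steady else frame)
    return np.concatenate(out)

# 100 steady samples followed by a 100-sample decaying transient:
# only the second steady frame gets duplicated.
sig = np.concatenate([np.ones(100), np.linspace(1.0, 0.0, 100)])
out = stretch_steady_frames(sig, frame_len=50, factor=2)
```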
Decoder and method for decoding an audio signal, encoder and method for encoding an audio signal
A decoder for decoding an audio signal includes a first target spectrum generator for generating a target spectrum for a first time frame of a subband signal of the audio signal using first correction data. A first phase corrector corrects a phase of the subband signal in the first time frame, determined with a phase correction algorithm, the correction being performed by reducing a difference between a measure of the subband signal in the first time frame and the target spectrum. An audio subband signal calculator calculates the audio subband signal for the first time frame using the corrected phase, and calculates audio subband signals for a second time frame, different from the first time frame, using the measure of the subband signal in the second time frame or using a corrected phase calculated in accordance with a further phase correction algorithm different from the first phase correction algorithm.
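The "reducing a difference" step — rotating a subband's phase toward a target rather than necessarily replacing it outright — can be sketched as follows (a hypothetical illustration with an invented `strength` parameter, not the patent's correction data or algorithm):

```python
import numpy as np

def correct_toward_target(subband, target_phase, strength=1.0):
    """Rotate complex subband samples so their phase moves toward a
    target phase, reducing (not necessarily zeroing) the difference.
    strength=1.0 snaps to the target; smaller values correct partially."""
    mag = np.abs(subband)
    phase = np.angle(subband)
    # Wrapped phase error in [-pi, pi), so correction takes the short way.
    err = (target_phase - phase + np.pi) % (2 * np.pi) - np.pi
    return mag * np.exp(1j * (phase + strength * err))

# A unit-magnitude sample at phase 0, corrected toward pi/2.
x = np.array([1.0 + 0.0j])
rotated = correct_toward_target(x, np.pi / 2)
```

Note the magnitude is preserved; only the phase is adjusted, mirroring the abstract's separation of the phase correction from the subband signal itself.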
SYSTEM AND METHOD FOR AUTOMATIC ALIGNMENT OF PHONETIC CONTENT FOR REAL-TIME ACCENT CONVERSION
The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent, extracted from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by minimizing the cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent. The disclosed technology transforms phonetic characteristics of a source accent to more closely match the target accent for efficient and seamless accent conversion in real-time applications.
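Aligning two sets of embeddings by cosine similarity (equivalently, minimizing cosine distance) can be illustrated with a greedy nearest-neighbor match — a hypothetical stand-in; the patented alignment procedure, and any monotonicity constraints it imposes, may differ:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def align_by_cosine(source, target):
    """For each target-accent embedding, pick the index of the most
    similar source-accent embedding (greedy; real aligners typically
    enforce monotonic ordering, e.g. via dynamic time warping)."""
    return [int(np.argmax([cosine_sim(t, s) for s in source]))
            for t in target]

# Toy 2-D "phonetic embeddings": each target vector maps to its
# most similar source vector.
src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
tgt = np.array([[0.9, 0.1], [0.5, 0.5]])
idx = align_by_cosine(src, tgt)
```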