G10H7/105

VOICE SYNTHESIS METHOD, VOICE SYNTHESIS APPARATUS, AND RECORDING MEDIUM
20200294486 · 2020-09-17

A voice synthesis method includes: supplying a first trained model with control data including phonetic identifier data to generate a series of frequency spectra of harmonic components; supplying a second trained model with the control data to generate a waveform signal representative of non-harmonic components; and generating a voice signal including the harmonic components and the non-harmonic components based on the series of frequency spectra of the harmonic components generated by the first trained model and the waveform signal representative of the non-harmonic components generated by the second trained model.
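The two-model pipeline in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: `harmonic_model` and `nonharmonic_model` are hypothetical stand-ins for the two trained models, the control data fields (`phoneme`, `f0`), the sample rate, and the frame length are all assumed, and the series of spectra is converted to a waveform by simple additive synthesis rather than by whatever transform the patent actually uses.

```python
import math
import random

SAMPLE_RATE = 16000   # assumed sample rate in Hz
FRAME_LEN = 160       # assumed frame length (10 ms at 16 kHz)


def harmonic_model(control):
    """Hypothetical stand-in for the first trained model.

    Returns one spectrum per control frame; each spectrum is a list of
    (frequency_hz, amplitude) pairs for the harmonic components.
    """
    return [[(frame["f0"] * k, 0.5 / k) for k in range(1, 6)] for frame in control]


def nonharmonic_model(control, seed=0):
    """Hypothetical stand-in for the second trained model.

    Returns a time-domain waveform (low-level noise) directly.
    """
    rng = random.Random(seed)
    return [0.01 * rng.uniform(-1.0, 1.0) for _ in range(len(control) * FRAME_LEN)]


def spectra_to_waveform(spectra):
    """Turn the spectrum series into a waveform by additive synthesis.

    One running phase is kept per harmonic index so that the sinusoids
    stay continuous across frame boundaries.
    """
    phases = [0.0] * max(len(s) for s in spectra)
    out = []
    for spectrum in spectra:
        for _ in range(FRAME_LEN):
            sample = 0.0
            for k, (freq, amp) in enumerate(spectrum):
                phases[k] += 2.0 * math.pi * freq / SAMPLE_RATE
                sample += amp * math.sin(phases[k])
            out.append(sample)
    return out


def synthesize(control):
    """Mix the harmonic waveform with the non-harmonic waveform."""
    harmonic = spectra_to_waveform(harmonic_model(control))
    noise = nonharmonic_model(control)
    return [h + n for h, n in zip(harmonic, noise)]


# Ten frames of assumed control data: a phonetic identifier plus a pitch.
control = [{"phoneme": "a", "f0": 220.0} for _ in range(10)]
voice = synthesize(control)
```

The split mirrors the abstract: the harmonic part is produced in the frequency domain and converted afterwards, while the non-harmonic part is produced directly as a waveform, and the two are summed sample by sample.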

Smart voice enhancement architecture for tempo tracking among music, speech, and noise
10762887 · 2020-09-01

Audio data describing an audio signal may be received and used to determine a set of frames of the audio signal. A plurality of note onsets in the set of frames may be identified based on spectral energy of the audio signal in the set of frames. One or more tempos may be computed based on the identified plurality of note onsets. The one or more tempos may be validated based on a tempo validation condition. One or more music states of the audio signal may be determined based on the validated one or more tempos. Audio enhancement of the audio signal may be modified based on the one or more determined states of the audio signal.
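The steps in this abstract (framing, onset detection, tempo computation, validation, state decision) can be sketched roughly as below. Everything specific here is an assumption for illustration: the frame size, the energy thresholds, the 40 to 240 BPM validation range, the minimum onset count, and the state labels are hypothetical and do not come from the patent. Per-frame time-domain energy is used in place of spectral energy; by Parseval's theorem the two agree up to the DFT normalization, which keeps the sketch free of an FFT.

```python
import statistics

SAMPLE_RATE = 8000   # assumed sample rate in Hz
FRAME_LEN = 200      # assumed frame length (25 ms)


def frame_energies(signal):
    """Split the signal into non-overlapping frames and sum x^2 per frame."""
    last_start = len(signal) - FRAME_LEN
    return [
        sum(x * x for x in signal[i:i + FRAME_LEN])
        for i in range(0, last_start + 1, FRAME_LEN)
    ]


def find_onsets(energies, floor=1.0, rise=2.0):
    """Mark a frame as an onset when its energy exceeds a floor and
    jumps by more than `rise` times the previous frame's energy."""
    onsets = []
    prev = 0.0
    for idx, energy in enumerate(energies):
        if energy > floor and energy > rise * prev:
            onsets.append(idx)
        prev = energy
    return onsets


def estimate_tempo(onsets):
    """Tempo in BPM from the median inter-onset interval, or None."""
    if len(onsets) < 2:
        return None
    intervals = [
        (b - a) * FRAME_LEN / SAMPLE_RATE for a, b in zip(onsets, onsets[1:])
    ]
    return 60.0 / statistics.median(intervals)


def is_valid_tempo(tempo, onsets, lo=40.0, hi=240.0, min_onsets=4):
    """Hypothetical validation condition: enough onsets, plausible BPM."""
    return tempo is not None and len(onsets) >= min_onsets and lo <= tempo <= hi


def classify(signal):
    """Return (tempo, state); the state would then steer the enhancement."""
    onsets = find_onsets(frame_energies(signal))
    tempo = estimate_tempo(onsets)
    state = "music" if is_valid_tempo(tempo, onsets) else "non-music"
    return tempo, state


# Test signal: four seconds of silence with a short click every 0.5 s.
signal = [0.0] * (4 * SAMPLE_RATE)
for start in range(0, len(signal), SAMPLE_RATE // 2):
    for i in range(start, start + 100):
        signal[i] = 1.0

tempo, state = classify(signal)
```

A downstream enhancer could, for example, apply gentler noise suppression when the state is "music" so that rhythmic content is not treated as noise; that policy is also an assumption, not a detail taken from the abstract.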

Differentiable wavetable synthesizer using plurality of machine learning models to reduce computational complexity of audio synthesis

The present disclosure describes techniques for a differentiable wavetable synthesizer. The techniques comprise: extracting features from a dataset of sounds, wherein the features comprise at least a timbre embedding; inputting the features to a first machine learning model, wherein the first machine learning model is configured to extract a set of N×L learnable parameters, where N represents a number of wavetables and L represents a wavetable length; and outputting a plurality of wavetables, wherein each of the plurality of wavetables comprises a waveform associated with a unique timbre, the plurality of wavetables form a dictionary, and the plurality of wavetables are portable to perform audio-related tasks. The wavetables are then used to initialize another machine learning model, which helps reduce the computational complexity of the audio synthesis that the other model produces as output.
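The wavetable dictionary described here can be illustrated with a small playback sketch. It shows only the synthesis side under stated assumptions: the dictionary holds N tables of length L (N×L parameters in total), the tables are initialized to sine harmonics instead of being learned, and the output is a weighted sum of table lookups with linear interpolation. The learning step, the timbre embedding, and the function names are not taken from the patent; they are hypothetical.

```python
import math

N = 4                 # assumed number of wavetables in the dictionary
L = 512               # assumed wavetable length in samples
SAMPLE_RATE = 16000   # assumed sample rate in Hz


def init_wavetables(n, length):
    """Build an n-by-length dictionary; table k holds one cycle of harmonic k+1.

    In a trained system these n * length values would be learnable parameters.
    """
    return [
        [math.sin(2.0 * math.pi * (k + 1) * i / length) for i in range(length)]
        for k in range(n)
    ]


def read_table(table, phase):
    """Read a table at a normalized phase in [0, 1) with linear interpolation."""
    pos = phase * len(table)
    i = int(pos) % len(table)
    frac = pos - int(pos)
    j = (i + 1) % len(table)   # wrap around so the table is periodic
    return (1.0 - frac) * table[i] + frac * table[j]


def synthesize(tables, weights, f0, n_samples):
    """Weighted sum of all tables, read with a shared phase accumulator."""
    phase = 0.0
    out = []
    for _ in range(n_samples):
        out.append(sum(w * read_table(t, phase) for w, t in zip(weights, tables)))
        phase = (phase + f0 / SAMPLE_RATE) % 1.0
    return out


tables = init_wavetables(N, L)
n_params = sum(len(t) for t in tables)   # N * L values in the dictionary

# Select only the first table: the output should be close to a 220 Hz sine.
audio = synthesize(tables, [1.0, 0.0, 0.0, 0.0], f0=220.0, n_samples=1600)
```

Because every operation is a lookup, a multiply, and an add, the per-sample cost grows with N rather than with the number of sinusoids being modelled, which is one plausible reading of the complexity reduction the abstract mentions.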