Patent classifications
G10H7/10
AUDIO PROCESSING METHOD, AUDIO PROCESSING SYSTEM, AND RECORDING MEDIUM
An audio processing method that, for each time step of a plurality of time steps on a time axis: acquires encoded data that reflects musical features of a tune for the current time step and musical features of the tune for time steps succeeding the current time step; acquires control data according to a real-time instruction provided by a user; and generates acoustic feature data representative of acoustic features of a synthesis sound in accordance with first input data including the acquired encoded data and the acquired control data.
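The per-step combination of encoded data and real-time control data can be sketched as below. The dimensions, the random projection standing in for the trained generative model, and the function names are all illustrative assumptions, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions, not from the abstract).
ENC_DIM, CTRL_DIM, FEAT_DIM = 8, 2, 4

# Random projection standing in for the trained generative model.
W = rng.standard_normal((ENC_DIM + CTRL_DIM, FEAT_DIM))

def generate_acoustic_features(encoded_seq, control_seq):
    """For each time step, form first input data by concatenating the
    encoded data (which already reflects current and succeeding musical
    features) with the real-time control data, then map it to acoustic
    feature data."""
    features = []
    for enc, ctrl in zip(encoded_seq, control_seq):
        first_input = np.concatenate([enc, ctrl])  # encoded + control data
        features.append(first_input @ W)           # stand-in for the model
    return np.stack(features)

T = 5
encoded = rng.standard_normal((T, ENC_DIM))   # per-step encoded data
control = rng.standard_normal((T, CTRL_DIM))  # per-step user control data
acoustic = generate_acoustic_features(encoded, control)
print(acoustic.shape)  # (5, 4)
```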
Learning singing from speech
A method, computer program, and computer system are provided for converting a singing voice of a first person, associated with a first speaker, to a singing voice of a second person, using a speaking voice of the second person associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of the first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of the first person is converted to a sample corresponding to the singing voice of the second person using the generated mel-spectrogram features.
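The distinctive step here is conditioning the recursive mel-spectrogram generation on a sample of the target speaker's *speaking* voice. A toy sketch, where the speaker encoder, the recursive cell, and all dimensions are hypothetical stand-ins for the learned models:

```python
import numpy as np

rng = np.random.default_rng(1)
MEL_DIM, EMB_DIM = 4, 3

def speaker_embedding(speech_sample):
    # Stand-in for a learned speaker encoder: summarize the target
    # speaker's speaking voice into a fixed-length embedding.
    return np.array([speech_sample.mean(),
                     speech_sample.std(),
                     speech_sample.max()])

def generate_mels(aligned_phonemes, speech_sample, n_frames):
    emb = speaker_embedding(speech_sample)
    Wp = rng.standard_normal((aligned_phonemes.shape[1], MEL_DIM))
    We = rng.standard_normal((EMB_DIM, MEL_DIM))
    Wm = rng.standard_normal((MEL_DIM, MEL_DIM))
    prev = np.zeros(MEL_DIM)
    mels = []
    for t in range(n_frames):
        # Recursive generation: each frame depends on the previous frame,
        # the aligned phoneme features, and the speaker embedding.
        frame = np.tanh(aligned_phonemes[t] @ Wp + emb @ We + prev @ Wm)
        mels.append(frame)
        prev = frame
    return np.stack(mels)

phones = rng.standard_normal((6, 5))  # aligned phoneme features per frame
speech = rng.standard_normal(100)     # target speaker's speaking voice
mel = generate_mels(phones, speech, 6)
print(mel.shape)  # (6, 4)
```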
SINGING VOICE CONVERSION
A method, computer program, and computer system are provided for converting a first singing voice associated with a first speaker to a second singing voice associated with a second speaker. A context associated with one or more phonemes corresponding to the first singing voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a sample corresponding to the first singing voice is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.
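The context-encoding and phoneme-to-frame alignment steps can be sketched as below. The triphone-style context window and the hard-coded durations are illustrative assumptions; in practice both would come from learned models.

```python
# Sketch of the "encode context, then align phonemes to target acoustic
# frames" steps. The context encoder and durations are hypothetical.

def encode_context(phonemes):
    # Stand-in for a learned context encoder: pair each phoneme with its
    # neighbors so alignment can use surrounding context.
    padded = ["<s>"] + phonemes + ["</s>"]
    return [(padded[i], padded[i + 1], padded[i + 2])
            for i in range(len(phonemes))]

def align_to_frames(encoded, durations):
    # Repeat each encoded phoneme for its predicted number of acoustic frames.
    frames = []
    for ctx, dur in zip(encoded, durations):
        frames.extend([ctx] * dur)
    return frames

phonemes = ["HH", "EH", "L", "OW"]
encoded = encode_context(phonemes)
frames = align_to_frames(encoded, durations=[2, 3, 1, 4])
print(len(frames))   # 10
print(frames[0][1])  # 'HH' -- centre phoneme of the first frame's context
```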
Transducer apparatus for an edge-blown aerophone and an edge-blown aerophone having the transducer apparatus
This disclosure provides a transducer apparatus for an edge-blown aerophone, the edge-blown aerophone having an aerophone embouchure hole. An aerophone speaker delivers sound to a resonant chamber of the aerophone via the aerophone embouchure hole, and an aerophone microphone receives, via the aerophone embouchure hole, sound in the resonant chamber. A housing provides a lip plate with a housing embouchure hole independent and separate from the aerophone embouchure hole. Breath sensors sense breath applied across the housing embouchure hole and provide signals indicative of breath strength. An electronic processor, connected to the speaker, receives signals from the microphone and the breath sensors and uses the received signals to determine the musical note that a player of the aerophone wishes to play. The electronic processor generates an excitation signal, which the aerophone speaker delivers as an acoustic excitation signal to the resonant chamber.
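The sensing-to-excitation loop might look like the following minimal sketch. The breath threshold, the note table, and the assumption that fingering alone selects the pitch are all illustrative; the disclosure does not specify them.

```python
import math

# Hypothetical mapping from sensed breath strength to an excitation signal
# delivered to the resonant chamber. Thresholds, amplitudes, and the note
# table are illustrative, not from the disclosure.

NOTE_FREQS = {"C4": 261.63, "E4": 329.63, "G4": 392.00}

def desired_note(breath_strength, fingering):
    # The processor combines sensor readings to pick the note; here a
    # simple threshold gates sound on breath strength.
    if breath_strength < 0.1:
        return None
    return fingering  # assume fingering alone selects the pitch

def excitation(note, breath_strength, sr=8000, n=80):
    # Breath strength scales the excitation amplitude.
    f = NOTE_FREQS[note]
    return [breath_strength * math.sin(2 * math.pi * f * t / sr)
            for t in range(n)]

note = desired_note(0.8, "C4")
sig = excitation(note, 0.8)
print(len(sig))  # 80
```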
SOUND SIGNAL SYNTHESIS METHOD, GENERATIVE MODEL TRAINING METHOD, SOUND SIGNAL SYNTHESIS SYSTEM, AND RECORDING MEDIUM
A computer-implemented sound signal synthesis method includes: generating, based on first control data representative of a plurality of conditions of a sound signal to be generated, (i) first data representative of a sound source spectrum of the sound signal, and (ii) second data representative of a spectral envelope of the sound signal; and synthesizing the sound signal based on the sound source spectrum indicated by the first data and the spectral envelope indicated by the second data.
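This is a source-filter decomposition: the sound-source spectrum and the spectral envelope are generated separately, then combined to synthesize the signal. A toy frame-level sketch, where the harmonic source, the exponential envelope, and the inverse-FFT synthesis are illustrative assumptions:

```python
import numpy as np

# Source-filter sketch: combine a sound-source spectrum (first data) with a
# spectral envelope (second data), then synthesize one frame by inverse FFT.

N, SR, F0 = 256, 8000, 250.0
freqs = np.fft.rfftfreq(N, d=1.0 / SR)

# First data: harmonic source spectrum (peaks at multiples of F0).
source = np.zeros_like(freqs)
for k in range(1, 8):
    source[np.argmin(np.abs(freqs - k * F0))] = 1.0

# Second data: spectral envelope (a smooth, decaying weighting).
envelope = np.exp(-freqs / 1500.0)

# Synthesis: multiply in the frequency domain, invert to a time-domain frame.
frame = np.fft.irfft(source * envelope, n=N)
print(frame.shape)  # (256,)
```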
SOUND SIGNAL SYNTHESIS METHOD, NEURAL NETWORK TRAINING METHOD, AND SOUND SYNTHESIZER
A sound signal synthesis method includes: generating first data representing a deterministic component of a sound signal based on second control data representing conditions of the sound signal; generating, using a first generation model, second data representing a stochastic component of the sound signal based on the first data and on first control data representing conditions of the sound signal; and combining the deterministic component represented by the first data with the stochastic component represented by the second data, thereby generating the sound signal.
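The deterministic-plus-stochastic decomposition can be sketched as below. The sinusoidal deterministic component and the energy-shaped noise standing in for the first generation model are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Sketch of the two-stage decomposition: a deterministic (e.g. harmonic)
# component generated from control data, plus a stochastic residual from a
# generation model conditioned on that component. All stand-ins are toys.

def deterministic_component(control, n=200, sr=8000):
    f0, amp = control
    t = np.arange(n) / sr
    return amp * np.sin(2 * np.pi * f0 * t)  # first data

def stochastic_component(first_data, noise_level=0.05):
    # Stand-in for the first generation model: noise shaped by the
    # deterministic component's local amplitude.
    return noise_level * np.abs(first_data) * rng.standard_normal(first_data.shape)

det = deterministic_component((220.0, 0.8))
sto = stochastic_component(det)
signal = det + sto  # combine the two components into the sound signal
print(signal.shape)  # (200,)
```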
SOUND SIGNAL SYNTHESIS METHOD, GENERATIVE MODEL TRAINING METHOD, SOUND SIGNAL SYNTHESIS SYSTEM, AND RECORDING MEDIUM
A computer-implemented sound signal synthesis method generates control data including pitch notation data indicative of a pitch name of a pitch of a sound signal to be synthesized and octave data indicative of an octave of the pitch of the sound signal to be synthesized; and estimates output data indicative of the sound signal to be synthesized by inputting the generated control data into a generative model that has learned a relationship between (i) training control data including training pitch notation data indicative of a pitch name of a pitch of a reference signal and training octave data indicative of an octave of the pitch of the reference signal, and (ii) training output data indicative of the reference signal.
SOUND SIGNAL SYNTHESIS METHOD, GENERATIVE MODEL TRAINING METHOD, SOUND SIGNAL SYNTHESIS SYSTEM, AND RECORDING MEDIUM
A method generates first pitch data indicating a pitch of a first sound signal to be synthesized; and uses a generative model to estimate output data indicative of the first sound signal based on the generated first pitch data. The generative model has been trained to learn a relationship between second pitch data indicating a pitch of a second sound signal and the second sound signal. The first pitch data includes a first plurality of pieces of pitch notation data corresponding to pitch names, and is generated by setting, from among the first plurality of pieces of pitch notation data, a first piece of pitch notation data that corresponds to the pitch of the first sound signal as a hot value based on a difference between a reference pitch of a pitch name corresponding to the first piece of pitch notation data and the pitch of the first sound signal.
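One way to read the "hot value" idea is that the entry for the matching pitch name carries the deviation of the actual pitch from that name's reference pitch, rather than a plain 1.0. The deviation-to-value mapping below is an assumption for illustration.

```python
import math

def hot_value_encoding(freq_hz):
    # Nearest MIDI note and the deviation from its reference pitch,
    # measured in semitones (A4 = 440 Hz = MIDI 69).
    midi_float = 69 + 12 * math.log2(freq_hz / 440.0)
    nearest = round(midi_float)
    deviation = midi_float - nearest        # in (-0.5, 0.5]
    vec = [0.0] * 12
    # Hot value: 1.0 for an exact match, shrinking with the deviation.
    vec[nearest % 12] = 1.0 - abs(deviation)
    return vec, deviation

vec, dev = hot_value_encoding(440.0)        # exactly A4
print(vec[9])  # 1.0
```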