Patent classifications
G10L19/08
VOICE PROCESSING METHOD, APPARATUS, AND DEVICE AND STORAGE MEDIUM
A voice processing method includes: determining a historical voice frame corresponding to a target voice frame; determining a frequency-domain characteristic of the historical voice frame; invoking a network model to predict, from the frequency-domain characteristic of the historical voice frame, a parameter set of the target voice frame, the parameter set including a plurality of types of parameters, the network model including a plurality of neural networks (NNs), and the number of parameter types in the parameter set being determined according to the number of NNs; and reconstructing the target voice frame according to the parameter set.
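A minimal NumPy sketch of the claimed flow, assuming a log-magnitude spectrum as the frequency-domain characteristic and one tiny linear network per parameter type; all names, dimensions, and the choice of three parameter types are hypothetical, not taken from the patent:

```python
import numpy as np

def frequency_features(frame, n_fft=256):
    """Log-magnitude spectrum as the frequency-domain characteristic
    (the exact feature is not specified in the abstract)."""
    return np.log1p(np.abs(np.fft.rfft(frame, n=n_fft)))

class TinyNN:
    """Stand-in for one of the plural neural networks (a single tanh layer)."""
    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((out_dim, in_dim)) * 0.01
        self.b = np.zeros(out_dim)

    def __call__(self, x):
        return np.tanh(self.W @ x + self.b)

def predict_parameter_set(feature, nns):
    """One parameter type per NN, so the number of types equals len(nns)."""
    return {f"param_type_{i}": nn(feature) for i, nn in enumerate(nns)}

# 10 ms "historical" frame at 16 kHz (synthetic sine for illustration)
frame = np.sin(2 * np.pi * 200 * np.arange(160) / 16000)
feature = frequency_features(frame)
nns = [TinyNN(feature.size, 8, seed=i) for i in range(3)]
params = predict_parameter_set(feature, nns)
```

A real reconstruction step would then map these predicted parameters (e.g. gain, pitch, envelope) back to a waveform; that mapping is omitted here.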
ELECTRONIC DEVICE AND CONTROL METHOD THEREOF
The disclosure relates to an electronic device and a control method thereof. The electronic device includes a memory and a processor configured to: obtain first feature data for estimating a waveform by inputting acoustic data of a first quality to a first encoder model; and obtain waveform data of a second quality, higher than the first quality, by inputting the first feature data to a decoder model.
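The encoder-to-decoder flow can be sketched with plain linear maps standing in for the neural models; every dimension below is a hypothetical placeholder (the abstract gives none), and the point is only that the decoder output has more samples than the low-quality input:

```python
import numpy as np

class LinearModel:
    """Stand-in for the encoder / decoder neural models (one linear map)."""
    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((out_dim, in_dim)) * 0.05

    def __call__(self, x):
        return self.W @ x

# Low-quality acoustic data, e.g. one coarse spectral frame (hypothetical size)
acoustic = np.random.default_rng(1).standard_normal(80)
encoder = LinearModel(80, 64, seed=2)    # "first encoder model"
decoder = LinearModel(64, 256, seed=3)   # "decoder model"
features = encoder(acoustic)             # first feature data for waveform estimation
waveform = decoder(features)             # second-quality (higher) waveform data
```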
Artificial intelligence based audio coding
Techniques are described for coding audio signals. For example, using a neural network, a residual signal is generated for a sample of an audio signal based on inputs to the neural network. The residual signal is configured to excite a long-term prediction filter and/or a short-term prediction filter. Using the long-term prediction filter and/or the short-term prediction filter, a sample of a reconstructed audio signal is determined. The sample of the reconstructed audio signal is determined based on the residual signal generated using the neural network for the sample of the audio signal.
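The excitation-plus-filters structure is classical linear-predictive synthesis; a sketch follows, with random noise standing in for the NN-generated residual and hand-picked LPC coefficients, pitch lag, and gain (none of these values come from the patent):

```python
import numpy as np

def long_term_filter(excitation, pitch_lag, gain):
    """Long-term (pitch) prediction: add the pitch-lagged past recursively."""
    out = np.copy(excitation)
    for n in range(pitch_lag, len(out)):
        out[n] += gain * out[n - pitch_lag]
    return out

def short_term_filter(excitation, lpc_coeffs):
    """Short-term all-pole filter: s[n] = e[n] + sum_k a_k * s[n-1-k]."""
    out = np.zeros_like(excitation)
    for n in range(len(out)):
        pred = sum(lpc_coeffs[k] * out[n - 1 - k]
                   for k in range(len(lpc_coeffs)) if n - 1 - k >= 0)
        out[n] = excitation[n] + pred
    return out

# A neural network would generate this residual; noise stands in here.
residual = np.random.default_rng(0).standard_normal(160)
excited = long_term_filter(residual, pitch_lag=40, gain=0.5)
reconstructed = short_term_filter(excited, lpc_coeffs=[0.9, -0.2])
```

The patent allows either filter alone or both; the sketch chains both, long-term first, as in conventional CELP-style decoders.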
PARAMETER ENCODING AND DECODING
Several examples of encoding and decoding techniques are disclosed. In particular, an audio synthesizer for generating a synthesis signal from a downmix signal includes: an input interface for receiving the downmix signal, the downmix signal having a number of downmix channels, and side information, the side information including channel level and correlation information of an original signal, the original signal having a number of original channels; and a synthesis processor for generating, according to at least one mixing rule, the synthesis signal using: the channel level and correlation information of the original signal; and covariance information associated with the downmix signal.
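One possible shape of such a synthesis processor is sketched below: each original channel is rebuilt by weighting the downmix channels with its correlation side information and scaling by its level. The mixing rule here is invented for illustration and is not the one claimed (the actual rule also uses downmix covariance, omitted for brevity):

```python
import numpy as np

def synthesize(downmix, levels, correlations):
    """Hypothetical mixing rule: for original channel i, blend the downmix
    channels by normalized correlation weights, then apply its level."""
    n_orig = len(levels)
    out = np.zeros((n_orig, downmix.shape[1]))
    for i in range(n_orig):
        w = correlations[i] / (np.sum(correlations[i]) + 1e-9)
        out[i] = levels[i] * (w @ downmix)
    return out

# 2 downmix channels, upmixed to 3 original channels (sizes are illustrative)
t = np.linspace(0.0, 2.0 * np.pi, 100)
dmx = np.vstack([np.sin(t), np.cos(t)])
levels = [1.0, 0.7, 0.4]                              # channel level side info
corr = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])  # correlation side info
orig = synthesize(dmx, levels, corr)
```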
Processing method of audio signal using spectral envelope signal and excitation signal and electronic device including a plurality of microphones supporting the same
According to an embodiment, the specification discloses an electronic device comprising at least one processor configured to: receive a first audio signal and a second audio signal; detect a spectral envelope signal from the first audio signal and extract a feature point from the second audio signal; extend a high band of the second audio signal, based on the spectral envelope signal and the feature point, to generate a high-band extension signal; and mix the high-band extension signal and the first audio signal to produce a synthesized signal.
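The envelope-guided bandwidth extension can be sketched as follows, assuming a smoothed magnitude spectrum as the envelope and simple spectral-fold replication shaped by that envelope; the FFT sizes, smoothing width, and test tones are all hypothetical, and the feature-point extraction step is omitted:

```python
import numpy as np

def spectral_envelope(signal, n_fft=512, smooth=8):
    """Smoothed magnitude spectrum as a crude spectral envelope."""
    mag = np.abs(np.fft.rfft(signal, n=n_fft))
    return np.convolve(mag, np.ones(smooth) / smooth, mode="same")

def extend_high_band(narrow, envelope, n_fft=512):
    """Copy the low band into the empty high band, reshaped by the
    envelope ratio between the two regions (illustrative rule only)."""
    spec = np.fft.rfft(narrow, n=n_fft)
    half = len(spec) // 2
    lo = len(spec) - half  # matching low-band width
    ratio = envelope[half:] / (envelope[:lo] + 1e-9)
    spec[half:] = spec[:lo] * ratio
    return np.fft.irfft(spec, n=n_fft)

fs = 16000
t = np.arange(512) / fs
first = np.sin(2 * np.pi * 300 * t) + 0.3 * np.sin(2 * np.pi * 5000 * t)
second = np.sin(2 * np.pi * 300 * t)        # band-limited second signal
env = spectral_envelope(first)              # envelope from the first signal
ext = extend_high_band(second, env)         # high-band extension signal
mixed = 0.5 * (ext + first)                 # the synthesized signal
```

In the patented device the two signals would come from different microphones; here synthetic tones make the example self-contained.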