Patent classifications
G10L21/038
Apparatus and Method for End-to-End Adversarial Blind Bandwidth Extension with one or more Convolutional and/or Recurrent Networks
An apparatus for processing a narrowband speech input signal by conducting bandwidth extension of the narrowband speech input signal to obtain a wideband speech output signal according to an embodiment is provided. The apparatus includes a signal envelope extrapolator including a first neural network, wherein the first neural network is configured to receive as input values of the first neural network a plurality of samples of a signal envelope of the narrowband speech input signal, and configured to determine as output values of the first neural network a plurality of extrapolated signal envelope samples. Moreover, the apparatus includes an excitation signal extrapolator configured to receive a plurality of samples of an excitation signal of the narrowband speech input signal, and configured to determine a plurality of extrapolated excitation signal samples. Furthermore, the apparatus includes a combiner configured to generate the wideband speech output signal such that the wideband speech output signal is bandwidth extended with respect to the narrowband speech input signal depending on the plurality of extrapolated signal envelope samples and depending on the plurality of extrapolated excitation signal samples.
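The envelope/excitation split and recombination described above can be sketched as follows. This is a minimal illustration, not the patented method: the "first neural network" envelope extrapolator is replaced by a simple log-domain slope continuation, and the excitation extrapolator by classic spectral folding; all function names are hypothetical.

```python
import numpy as np

def bandwidth_extend(mag_nb, n_high):
    """Extend a narrowband magnitude spectrum by n_high bins."""
    # Coarse spectral envelope: moving average of the magnitude spectrum.
    kernel = np.ones(8) / 8
    env_nb = np.convolve(mag_nb, kernel, mode="same") + 1e-9
    exc_nb = mag_nb / env_nb                 # flat excitation residual

    # Envelope extrapolator (stand-in for the first neural network):
    # continue the log-envelope with its recent average slope.
    log_env = np.log(env_nb)
    slope = (log_env[-1] - log_env[-16]) / 16
    env_hi = np.exp(log_env[-1] + slope * np.arange(1, n_high + 1))

    # Excitation extrapolator: spectral folding (mirror the top bins up).
    exc_hi = exc_nb[-n_high:][::-1]

    # Combiner: wideband = [narrowband | extrapolated envelope * excitation].
    return np.concatenate([mag_nb, env_hi * exc_hi])

mag_nb = np.abs(np.random.default_rng(0).normal(size=160)) + 0.1
mag_wb = bandwidth_extend(mag_nb, 160)
```

The key structural point matches the claim: envelope and excitation are extrapolated separately, then combined to produce the bandwidth-extended output.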
GENERATING AUDIO WAVEFORMS USING ENCODER AND DECODER NEURAL NETWORKS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing an input audio waveform using a generator neural network to generate an output audio waveform. In one aspect, a method comprises: receiving an input audio waveform; processing the input audio waveform using an encoder neural network to generate a set of feature vectors representing the input audio waveform; and processing the set of feature vectors representing the input audio waveform using a decoder neural network to generate an output audio waveform that comprises a respective output audio sample for each of a plurality of output time steps.
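The encoder/decoder waveform pipeline can be illustrated with a toy stand-in in which learned convolutions are replaced by fixed random projections; this shows only the data flow (waveform → feature vectors → one output sample per output time step), not the actual trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(wave, hop=4, dim=8):
    # Frame the waveform with stride `hop` and project each frame to a
    # feature vector (random projection stands in for learned weights).
    frames = wave.reshape(-1, hop)
    W_enc = rng.normal(size=(hop, dim))
    return np.tanh(frames @ W_enc)           # (n_frames, dim) feature vectors

def decoder(feats, hop=4):
    # Map each feature vector back to `hop` output samples and flatten,
    # giving one output audio sample per output time step.
    W_dec = rng.normal(size=(feats.shape[1], hop))
    return (feats @ W_dec).reshape(-1)

wave_in = rng.normal(size=64)
feats = encoder(wave_in)
wave_out = decoder(feats)
```

Note that the encoder's stride makes the feature-vector sequence shorter than the waveform, and the decoder restores the original temporal resolution.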
Filling of non-coded sub-vectors in transform coded audio signals
A spectrum filler for filling non-coded residual sub-vectors of a transform coded audio signal includes a sub-vector compressor configured to compress actually coded residual sub-vectors. A sub-vector rejecter is configured to reject compressed residual sub-vectors that do not fulfill a predetermined sparseness criterion. A sub-vector collector is configured to concatenate the remaining compressed residual sub-vectors to form a first virtual codebook. A coefficient combiner is configured to combine pairs of coefficients of the first virtual codebook to form a second virtual codebook. A sub-vector filler is configured to fill non-coded residual sub-vectors below a predetermined frequency with coefficients from the first virtual codebook, and to fill non-coded residual sub-vectors above the predetermined frequency with coefficients from the second virtual codebook.
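The two-codebook filling scheme can be sketched as below. The compression law, sparseness test, and pair-combination rule here are illustrative stand-ins (the patent leaves these codec-specific), but the structure follows the abstract: compress, reject sparse sub-vectors, concatenate into a first virtual codebook, pair-combine into a second, then fill low-frequency gaps from the first and high-frequency gaps from the second.

```python
import numpy as np

def build_virtual_codebooks(coded_subvectors, sparse_thresh=0.5):
    # Compress: scale each coded sub-vector to unit peak (stand-in law).
    compressed = [sv / (np.max(np.abs(sv)) + 1e-9) for sv in coded_subvectors]
    # Reject sub-vectors with too few significant coefficients.
    kept = [sv for sv in compressed
            if np.mean(np.abs(sv) > 0.1) >= sparse_thresh]
    vc1 = np.concatenate(kept)               # first virtual codebook
    # Combine coefficient pairs (signed mean magnitude, hypothetical rule)
    # to form the smoother second virtual codebook.
    pairs = vc1[: len(vc1) // 2 * 2].reshape(-1, 2)
    vc2 = np.sign(pairs[:, 0]) * np.mean(np.abs(pairs), axis=1)
    return vc1, vc2

def fill(noncoded_starts, subvec_len, cutoff_bin, vc1, vc2):
    # Fill below the cutoff from vc1, above it from vc2 (wrapping indices).
    out = {}
    for start in noncoded_starts:
        vc = vc1 if start < cutoff_bin else vc2
        out[start] = vc[np.arange(start, start + subvec_len) % len(vc)]
    return out

rng = np.random.default_rng(0)
coded = [rng.normal(size=8) for _ in range(4)]
vc1, vc2 = build_virtual_codebooks(coded)
filled = fill([0, 96], 8, 64, vc1, vc2)
```

Using the pair-combined codebook above the cutoff gives the filled high band a less peaky, noise-like character, which is the usual motivation for such a second codebook.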
ACOUSTIC OUTPUT APPARATUS
The present disclosure relates to an acoustic output apparatus. The acoustic output apparatus comprises: at least one low-frequency acoustic driver that outputs sound from at least two first sound guiding holes; at least one high-frequency acoustic driver that outputs sound from at least two second sound guiding holes; and a controller configured to cause the low-frequency acoustic driver to output sound in a first frequency range, and cause the high-frequency acoustic driver to output sound in a second frequency range, wherein the second frequency range includes frequencies higher than the first frequency range.
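The controller's band-splitting behavior amounts to a crossover: one branch of the signal feeds the low-frequency driver and the complementary branch feeds the high-frequency driver. A minimal sketch, assuming an ideal FFT-mask crossover rather than whatever filter topology the apparatus actually uses:

```python
import numpy as np

def crossover(signal, fs, f_cross=1000.0):
    # Split the signal at f_cross via FFT masking: the low branch feeds
    # the low-frequency driver, the high branch the high-frequency driver.
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    low = np.fft.irfft(np.where(freqs <= f_cross, spec, 0), len(signal))
    high = np.fft.irfft(np.where(freqs > f_cross, spec, 0), len(signal))
    return low, high

fs = 8000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 3000 * t)
low, high = crossover(sig, fs)
```

Because the two masks are disjoint and cover the whole spectrum, the two driver signals sum back to the original input exactly.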
Deep learning based noise reduction method using both bone-conduction sensor and microphone signals
A deep-learning speech extraction and noise reduction method that fuses the signals of a bone vibration sensor and a microphone comprises the steps of: collecting audio signals with a bone vibration sensor and a microphone to obtain a bone vibration sensor audio signal and a microphone audio signal, respectively; inputting the bone vibration sensor audio signal into a high-pass filter module and performing high-pass filtering; and inputting the high-pass-filtered bone vibration sensor audio signal, or a frequency-band-broadened version of it, together with the microphone audio signal into a DNN module, whose predicted outputs are then fused and noise-reduced. By combining the signals of a bone vibration sensor and a conventional microphone, the invention uses DNN modeling to achieve faithful vocal reproduction and noise suppression. A signal obtained by frequency-band broadening of the bone vibration sensor audio signal is used as an output.
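The high-pass-then-fuse pipeline can be sketched as below. The DNN is replaced by a fixed weighted fusion purely for illustration; the real system would predict per-band fusion weights from both inputs, and the filter here is a simple one-pole high-pass rather than the patent's filter module.

```python
import numpy as np

def highpass(x, alpha=0.95):
    # One-pole high-pass filter for the bone-conduction signal
    # (attenuates the low-frequency body-conducted rumble).
    y = np.zeros_like(x)
    for n in range(1, len(x)):
        y[n] = alpha * (y[n - 1] + x[n] - x[n - 1])
    return y

def fuse(bone_hp, mic, w_bone=0.6):
    # Stand-in for the DNN fusion: a fixed weighted sum. The bone signal
    # is noise-robust but band-limited; the mic signal is full-band but
    # noisy, which is why the two are combined.
    return w_bone * bone_hp + (1 - w_bone) * mic

rng = np.random.default_rng(0)
bone = rng.normal(size=256)
mic = rng.normal(size=256)
enhanced = fuse(highpass(bone), mic)
```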
SYSTEMS, METHODS, AND APPARATUSES FOR RESTORING DEGRADED SPEECH VIA A MODIFIED DIFFUSION MODEL
Systems, methods, and apparatuses to restore degraded speech via a modified diffusion model are described. An exemplary system is specially configured to train a diffusion-based vocoder containing an upsampler, based on paired samples of original speech x and degraded-speech mel-spectrum m_T; and to train a deep convolutional neural network (CNN) upsampler based on a mean absolute error loss to match the estimated original speech x̂′ output by the diffusion-based vocoder, by extracting the upsampler, generating a reference conditioner, and generating a weighted altered conditioner ĉ′_T.
Apparatus and method for encoding an audio signal using compensation values between three spectral bands
An apparatus for encoding an audio signal includes: a core encoder for core encoding first audio data in a first spectral band; a parametric coder for parametrically coding second audio data in a second spectral band being different from the first spectral band, wherein the parametric coder includes: an analyzer for analyzing first audio data in the first spectral band to obtain a first analysis result and for analyzing second audio data in the second spectral band to obtain a second analysis result; a compensator for calculating a compensation value using the first analysis result and the second analysis result; and a parameter calculator for calculating a parameter from the second audio data in the second spectral band using the compensation value.
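The analyzer → compensator → parameter-calculator chain can be sketched as follows. The specific analysis features (energy and a peak-to-mean tonality measure) and the compensation rule are hypothetical stand-ins; the structure matches the abstract: both bands are analyzed, the two analysis results yield a compensation value, and that value adjusts the parameter computed for the second band.

```python
import numpy as np

def analyze(band):
    # Analysis result per band: energy and a simple tonality measure
    # (peak-to-mean ratio); stand-ins for the analyzer's actual features.
    energy = np.mean(band ** 2)
    tonality = np.max(np.abs(band)) / (np.mean(np.abs(band)) + 1e-9)
    return energy, tonality

def compensation_value(res1, res2):
    # Compare tonality across bands: if the second band is noisier than
    # the core band, attenuate the parametric gain (hypothetical rule).
    return min(res1[1] / (res2[1] + 1e-9), 1.0)

def gain_parameter(band2, comp):
    # Parametric gain for the second band, scaled by the compensation.
    return comp * np.sqrt(np.mean(band2 ** 2))

rng = np.random.default_rng(0)
band1, band2 = rng.normal(size=64), 0.5 * rng.normal(size=64)
r1, r2 = analyze(band1), analyze(band2)
g = gain_parameter(band2, compensation_value(r1, r2))
```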