Patent classifications
G10L19/035
Apparatus and method for stereo filling in multichannel coding
An apparatus for decoding an encoded multichannel signal of a current frame to obtain three or more current audio output channels is provided. A multichannel processor is adapted to select two decoded channels from three or more decoded channels depending on first multichannel parameters. Moreover, the multichannel processor is adapted to generate a first group of two or more processed channels based on the selected channels. A noise filling module is adapted to identify, for at least one of the selected channels, one or more frequency bands within which all spectral lines are quantized to zero; to generate a mixing channel, depending on side information, from a proper subset of the three or more previous audio output channels that have been decoded; and to fill the spectral lines of those zero-quantized frequency bands with noise generated using spectral lines of the mixing channel.
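The noise-filling step can be illustrated with a minimal sketch. The abstract does not say how the mixing channel is formed or how the filled lines are scaled, so the following are assumptions: the side information is modeled as a list of previous-channel indices, the mixing channel is their plain average, and each empty band is filled with the mixing-channel lines scaled to a hypothetical per-band `target_energy`. All function names are illustrative, not taken from the patent or any codec.

```python
from typing import List, Sequence


def zero_bands(spectrum: Sequence[float], band_offsets: Sequence[int]) -> List[int]:
    """Return the indices of bands whose spectral lines are all quantized to zero."""
    bands = []
    for b, (start, stop) in enumerate(zip(band_offsets, band_offsets[1:])):
        if all(x == 0 for x in spectrum[start:stop]):
            bands.append(b)
    return bands


def mixing_channel(prev_outputs: Sequence[Sequence[float]],
                   selected: Sequence[int]) -> List[float]:
    """Downmix a proper subset of previous-frame output spectra.

    Hypothetical: side information = list of channel indices; downmix = plain average.
    """
    if not 0 < len(selected) < len(prev_outputs):
        raise ValueError("selected must be a non-empty proper subset of the channels")
    n = len(prev_outputs[selected[0]])
    return [sum(prev_outputs[c][k] for c in selected) / len(selected) for k in range(n)]


def stereo_fill(spectrum: Sequence[float], band_offsets: Sequence[int],
                mix: Sequence[float], target_energy: float) -> List[float]:
    """Fill all-zero bands with mixing-channel lines scaled to target_energy (hypothetical)."""
    out = list(spectrum)
    for b in zero_bands(spectrum, band_offsets):
        start, stop = band_offsets[b], band_offsets[b + 1]
        src = mix[start:stop]
        energy = sum(x * x for x in src)
        if energy == 0.0:
            continue  # nothing to copy; a real decoder might fall back to random noise
        gain = (target_energy / energy) ** 0.5
        out[start:stop] = [gain * x for x in src]
    return out


if __name__ == "__main__":
    prev = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
    mix = mixing_channel(prev, [0, 1])                      # [2.0, 2.0, 2.0, 2.0]
    print(stereo_fill([5.0, 0.0, 0.0, 0.0], [0, 2, 4], mix, target_energy=2.0))
    # prints [5.0, 0.0, 1.0, 1.0]
```

Using decoded channels of the previous frame, rather than pure random noise, lets the filled bands inherit the spectral shape of related channels; the energy scaling keeps the filled band at the level the bitstream signals.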
SIGNAL PROCESSING DEVICE, SIGNAL PROCESSING METHOD, AND PROGRAM
The present technology relates to a signal processing device, a signal processing method, and a program which are capable of improving encoding efficiency.
The signal processing device includes a correction unit configured to correct an audio signal of an audio object based on a gain value included in metadata of the audio object, and a quantization unit configured to calculate auditory psychological parameters based on a signal obtained by the correction and to quantize the audio signal. The present technology can be applied to an encoding device.
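One way to see why the metadata gain matters for the auditory psychological (psychoacoustic) parameters is the following sketch. The psychoacoustic model here is a deliberately crude, hypothetical stand-in: the allowed noise power per line is the band's mean power lowered by `smr_db`, but never below an absolute hearing threshold `abs_threshold` expressed in the playback (gain-corrected) domain. Step sizes follow from the uniform-quantizer noise approximation step²/12 and are mapped back to the uncorrected signal before quantizing it. None of these names or formulas come from the abstract.

```python
import math
from typing import List, Sequence


def step_sizes(signal: Sequence[float], band_offsets: Sequence[int], gain_db: float,
               smr_db: float, abs_threshold: float) -> List[float]:
    """Per-band quantizer step sizes computed on a gain-corrected copy of the signal.

    Hypothetical model: allowed noise per line = max(mean band power lowered by
    smr_db, abs_threshold); abs_threshold must be > 0.
    """
    g = 10.0 ** (gain_db / 20.0)
    corrected = [g * x for x in signal]              # correction with the metadata gain
    steps = []
    for start, stop in zip(band_offsets, band_offsets[1:]):
        band = corrected[start:stop]
        mean_power = sum(x * x for x in band) / len(band)
        allowed = max(mean_power * 10.0 ** (-smr_db / 10.0), abs_threshold)
        step_corrected = math.sqrt(12.0 * allowed)   # uniform quantizer noise ~ step**2 / 12
        steps.append(step_corrected / g)             # map back to the uncorrected signal
    return steps


def quantize(signal: Sequence[float], band_offsets: Sequence[int],
             steps: Sequence[float]) -> List[int]:
    """Round-half-away-from-zero quantization of the uncorrected signal, band by band."""
    q = []
    for (start, stop), step in zip(zip(band_offsets, band_offsets[1:]), steps):
        for x in signal[start:stop]:
            q.append(int(math.copysign(math.floor(abs(x) / step + 0.5), x)))
    return q
```

Under this model, an object whose metadata gain makes it quiet at playback hits the absolute threshold sooner, so it receives larger step sizes and fewer bits; when the threshold never applies, the gain cancels out and the steps are unchanged.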
ENCODING DEVICE, DECODING DEVICE, ENCODING METHOD, DECODING METHOD, AND NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM
An encoding device according to the disclosure includes: a first encoder, which in operation, encodes a low-band signal of a voice or audio input signal to generate a first encoded signal; a decoder, which in operation, decodes the first encoded signal to generate a low-band decoded signal; a second encoder, which in operation, encodes, on the basis of the low-band decoded signal, a high-band signal comprising a band of the input signal that is higher than that of the low-band signal, to generate a high-band encoded signal; an energy calculator, which in operation, calculates the energy of the input signal in each of a plurality of subbands, quantizes each calculated subband energy, and outputs the resulting quantized band energies; and a multiplexer, which in operation, multiplexes the quantized band energies, the first encoded signal, and the high-band encoded signal to generate and output an encoded signal.
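The energy calculator's job (per-subband energy, then quantization) can be sketched as below. The abstract does not specify the quantizer, so this assumes a common choice: uniform quantization in the log2 domain with a hypothetical step of 0.5 (about 1.5 dB) and a small floor to avoid taking the log of zero.

```python
import math
from typing import List, Sequence


def subband_energies(x: Sequence[float], band_offsets: Sequence[int]) -> List[float]:
    """Sum of squared samples (or spectral lines) in each subband."""
    return [sum(v * v for v in x[a:b]) for a, b in zip(band_offsets, band_offsets[1:])]


def quantize_energies(energies: Sequence[float], step_log2: float = 0.5,
                      floor: float = 1e-10) -> List[int]:
    """Uniform quantization of log2 energies (hypothetical step size and floor)."""
    return [round(math.log2(max(e, floor)) / step_log2) for e in energies]


def dequantize_energies(indices: Sequence[int], step_log2: float = 0.5) -> List[float]:
    """Decoder-side reconstruction of the band energies from their indices."""
    return [2.0 ** (i * step_log2) for i in indices]


if __name__ == "__main__":
    e = subband_energies([1.0, 1.0, 2.0, 2.0, 0.0, 0.0], [0, 2, 4, 6])   # [2.0, 8.0, 0.0]
    idx = quantize_energies(e)                                         # [2, 6, -66]
    print(idx, dequantize_energies(idx[:2]))                          # [2, 6, -66] [2.0, 8.0]
```

Quantizing in the log domain keeps the relative error roughly constant across loud and quiet subbands, which is why band energies are usually coded this way.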
ENCODING METHOD, ENCODING DEVICE, DECODING METHOD, AND DECODING DEVICE USING SCALAR QUANTIZATION AND VECTOR QUANTIZATION
Provided are an encoding method, an encoding device, a decoding method, and a decoding device using scalar quantization and vector quantization. The encoding method includes converting a time-domain input signal into the frequency domain, generating a first residual signal from the frequency-domain input signal by using a scale factor, performing scalar quantization of the first residual signal, generating a second residual signal from the scalar-quantized first residual signal, performing lossless encoding of the scalar-quantized first residual signal, performing vector quantization of the second residual signal, and transmitting a bitstream including the lossless-encoded first residual signal and the vector-quantized second residual signal.
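The two-stage chain can be sketched as follows, starting from a spectrum that is already in the frequency domain (the transform is omitted). Several details are assumptions not stated in the abstract: the second residual is taken as the scalar quantization error, run-length coding stands in for the unspecified lossless coder, and the 2-dimensional codebook `CODEBOOK` is hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical 2-D codebook for the residual of a round-to-nearest scalar stage.
CODEBOOK = [(0.0, 0.0), (0.25, 0.25), (-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25)]


def scalar_stage(spectrum: Sequence[float], scale: float) -> Tuple[List[int], List[float]]:
    """First residual = spectrum / scale factor; scalar-quantize it; keep the error."""
    r1 = [x / scale for x in spectrum]
    q1 = [round(v) for v in r1]
    r2 = [v - q for v, q in zip(r1, q1)]          # second residual (assumed definition)
    return q1, r2


def run_length(q: Sequence[int]) -> List[Tuple[int, int]]:
    """Stand-in lossless coder: (value, run length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in q:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def vector_quantize(r2: Sequence[float], codebook=CODEBOOK) -> List[int]:
    """Nearest-codeword (squared error) index for each pair of residual values."""
    if len(r2) % 2:
        raise ValueError("residual length must be even for 2-D vectors")
    indices = []
    for k in range(0, len(r2), 2):
        a, b = r2[k], r2[k + 1]
        indices.append(min(range(len(codebook)),
                           key=lambda i: (a - codebook[i][0]) ** 2 + (b - codebook[i][1]) ** 2))
    return indices


def reconstruct(q1: Sequence[int], vq: Sequence[int], scale: float, codebook=CODEBOOK) -> List[float]:
    """Decoder: scalar part plus codeword, rescaled by the scale factor."""
    fine = [c for i in vq for c in codebook[i]]
    return [scale * (q + f) for q, f in zip(q1, fine)]
```

For example, `scalar_stage([5.2, 4.8, -1.6, 0.0], 2.0)` yields `q1 = [3, 2, -1, 0]`, the residual pairs map to codewords `[4, 0]`, and `reconstruct` returns `[5.5, 4.5, -2.0, 0.0]`, closer to the input than the scalar stage alone.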
4-bit Conformer with Accurate Quantization Training for Speech Recognition
A method for training a model includes obtaining a plurality of training samples. Each respective training sample of the plurality of training samples includes a respective speech utterance and a respective textual utterance representing a transcription of the respective speech utterance. The method includes training, using quantization aware training with native integer operations, an automatic speech recognition (ASR) model on the plurality of training samples. The method also includes quantizing the trained ASR model to an integer target fixed-bit width. The quantized trained ASR model includes a plurality of weights. Each weight of the plurality of weights includes an integer with the target fixed-bit width. The method includes providing the quantized trained ASR model to a user device.
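Quantizing trained weights to an integer target fixed-bit width can be sketched as below. The abstract does not give the quantization scheme, so this assumes symmetric per-tensor quantization with round-to-nearest and a narrow signed range (for 4 bits, [-7, 7]). The `fake_quant` function shows the quantize/dequantize step typically inserted in the forward pass during quantization-aware training; the gradient handling (commonly a straight-through estimator) is not modeled.

```python
from typing import List, Sequence, Tuple


def quantize_weights(weights: Sequence[float], bits: int = 4) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to signed integers of a fixed bit width."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit (narrow range [-7, 7])
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def fake_quant(weights: Sequence[float], bits: int = 4) -> List[float]:
    """Quantize then dequantize, as used in a QAT forward pass (gradients not modeled)."""
    q, scale = quantize_weights(weights, bits)
    return [v * scale for v in q]


def int_dot(q_a: Sequence[int], q_b: Sequence[int]) -> int:
    """Integer-only dot product, as an integer matmul kernel would accumulate it."""
    return sum(a * b for a, b in zip(q_a, q_b))
```

With weights `[1.75, -0.75, 0.25, 0.0]` and 4 bits, the scale is 0.25 and the integer weights are `[7, -3, 1, 0]`; a real-valued result is recovered by multiplying the integer accumulator by the product of the operands' scales.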
LOW-COMPLEXITY TONALITY-ADAPTIVE AUDIO SIGNAL QUANTIZATION
The invention provides an audio encoder for encoding an audio signal so as to produce therefrom an encoded signal, the audio encoder including: a framing device configured to extract frames from the audio signal; a quantizer configured to map spectral lines of a spectrum signal derived from the frame of the audio signal to quantization indices, wherein the quantizer has a dead-zone, in which the input spectral lines are mapped to quantization index zero; and a control device configured to modify the dead-zone; wherein the control device includes a tonality calculating device configured to calculate at least one tonality indicating value for at least one spectral line or for at least one group of spectral lines, wherein the control device is configured to modify the dead-zone for the at least one spectral line or the at least one group of spectral lines depending on the respective tonality indicating value.
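A minimal sketch of a tonality-adaptive dead-zone quantizer follows. The abstract leaves the tonality measure and the dead-zone mapping open, so these are assumptions: tonality is taken as one minus the spectral flatness (geometric over arithmetic mean of line powers) of each group, and the dead zone is controlled through a rounding offset, where 0.5 is plain rounding and smaller offsets widen the dead zone. The mapping from tonality to offset (`min_offset`, `max_offset`) is hypothetical; here tonal groups get the narrower dead zone so that tonal peaks are less likely to be zeroed.

```python
import math
from typing import List, Sequence


def spectral_flatness(lines: Sequence[float], eps: float = 1e-12) -> float:
    """Geometric mean / arithmetic mean of line powers (1 = flat, near 0 = peaky)."""
    powers = [x * x + eps for x in lines]
    geo = math.exp(sum(math.log(p) for p in powers) / len(powers))
    return geo / (sum(powers) / len(powers))


def tonality(lines: Sequence[float]) -> float:
    """Hypothetical tonality indicating value in [0, 1]."""
    return 1.0 - spectral_flatness(lines)


def dead_zone_quantize(lines: Sequence[float], step: float, rounding_offset: float) -> List[int]:
    """q = sign(x) * floor(|x| / step + offset); a smaller offset widens the dead zone."""
    return [int(math.copysign(math.floor(abs(x) / step + rounding_offset), x)) for x in lines]


def tonality_adaptive_quantize(lines: Sequence[float], group_size: int, step: float,
                               min_offset: float = 0.2, max_offset: float = 0.5) -> List[int]:
    """Per group of lines: more tonal -> larger offset -> narrower dead zone."""
    out: List[int] = []
    for start in range(0, len(lines), group_size):
        group = lines[start:start + group_size]
        offset = min_offset + (max_offset - min_offset) * tonality(group)
        out.extend(dead_zone_quantize(group, step, offset))
    return out
```

For instance, with `step=1.0` and groups of four, the noise-like group `[0.7, -0.7, 0.7, -0.7]` is zeroed entirely, while the peak in the tonal group `[0.7, 0.0, 0.0, 0.0]` survives as index 1.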