Patent classifications
G10L19/0208
Methods for improving high frequency reconstruction
The present invention proposes a new method and a new apparatus for enhancement of audio source coding systems utilising high frequency reconstruction (HFR). It utilises a detection mechanism on the encoder side to assess what parts of the spectrum will not be correctly reproduced by the HFR method in the decoder. Information on this is efficiently coded and sent to the decoder, where it is combined with the output of the HFR unit.
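The encoder-side detection can be illustrated with a minimal sketch. It assumes the encoder can run a local copy of the decoder's HFR to obtain per-band energies of the reconstructed high band, and it flags bands where the original energy exceeds that estimate by more than a threshold. The function names, the 6 dB threshold and the one-bit-per-band coding are hypothetical choices, not taken from the patent.

```python
import math


def detect_missing_bands(original_energy, hfr_energy, threshold_db=6.0):
    """Flag bands the decoder's HFR is expected to under-reproduce (hypothetical helper).

    Compares per-band energies of the original high band with those of a
    local HFR simulation and returns the indices of bands where the original
    exceeds the HFR estimate by more than threshold_db.
    """
    eps = 1e-12  # guards against log10(0) and division by zero
    flagged = []
    for k, (orig, hfr) in enumerate(zip(original_energy, hfr_energy)):
        ratio_db = 10.0 * math.log10((orig + eps) / (hfr + eps))
        if ratio_db > threshold_db:
            flagged.append(k)
    return flagged


def encode_flags(flagged, num_bands):
    """Pack the flagged indices as one bit per band (an assumed, simple coding)."""
    return [1 if k in flagged else 0 for k in range(num_bands)]


flags = detect_missing_bands([1.0, 4.0, 0.5, 10.0], [1.0, 0.5, 0.5, 1.0])
print(flags, encode_flags(flags, 4))  # [1, 3] [0, 1, 0, 1]
```

A real encoder would entropy-code such side information and typically also signal what the decoder should add (for example a sinusoid or noise level) in each flagged band.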
SYSTEM AND METHOD FOR ENHANCEMENT OF A DEGRADED AUDIO SIGNAL
The present disclosure relates to the field of audio enhancement, and in particular to methods, devices and software for supervised training of a machine learning model (MLM) that is trained to enhance a degraded audio signal by calculating gains to be applied to frequency bands of the degraded audio signal. The present disclosure further relates to methods, devices and software for use of such a trained MLM.
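One way to picture the per-band gain approach is the sketch below. It assumes that the supervised training target for each band is the ideal gain sqrt(E_clean / E_degraded), capped at a maximum, that the MLM is trained with a mean-squared-error loss against those targets, and that the trained MLM's gains are applied by scaling every bin of a band. The band layout, the cap, the loss and all helper names are hypothetical, not the disclosure's specified method.

```python
import math


def band_energies(spectrum, band_edges):
    """Energy sum of |X[k]|^2 over each band [band_edges[b], band_edges[b+1])."""
    return [sum(abs(x) ** 2 for x in spectrum[lo:hi])
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]


def target_gains(clean, degraded, band_edges, max_gain=1.0):
    """Ideal per-band gains used as training targets (assumed definition)."""
    eps = 1e-12
    e_clean = band_energies(clean, band_edges)
    e_degr = band_energies(degraded, band_edges)
    return [min(max_gain, math.sqrt(c / (d + eps))) for c, d in zip(e_clean, e_degr)]


def gain_loss(predicted, target):
    """Mean squared error between predicted and target gains (assumed loss)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def apply_gains(spectrum, band_edges, gains):
    """Multiply every bin of band b by gains[b]."""
    out = list(spectrum)
    for (lo, hi), g in zip(zip(band_edges[:-1], band_edges[1:]), gains):
        for k in range(lo, hi):
            out[k] = spectrum[k] * g
    return out


edges = [0, 2, 4]
clean, degraded = [1.0, 1.0, 0.0, 0.0], [2.0, 2.0, 1.0, 1.0]
targets = target_gains(clean, degraded, edges)   # approximately [0.5, 0.0]
print(apply_gains(degraded, edges, [0.5, 0.0]))  # [1.0, 1.0, 0.0, 0.0]
```

Capping the gain at 1.0 restricts the model to attenuation, a common but not universal choice for suppression-style enhancement.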
Apparatus and method for decoding or encoding an audio signal using energy information values for a reconstruction band
An apparatus for decoding an encoded audio signal having an encoded representation of a first set of first spectral portions and an encoded representation of parametric data indicating spectral energies for a second set of second spectral portions comprises: an audio decoder for decoding the encoded representation of the first set of first spectral portions to obtain a first set of first spectral portions, and for decoding the encoded representation of the parametric data to obtain decoded parametric data for the second set of second spectral portions indicating individual energies for individual reconstruction bands; and a frequency regenerator for reconstructing spectral values in a reconstruction band having a second spectral portion, using a first spectral portion of the first set of first spectral portions and an individual energy for the reconstruction band, the reconstruction band having a first spectral portion and the second spectral portion.
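The energy-based regeneration can be sketched in the style of intelligent gap filling. Because the transmitted energy covers the whole reconstruction band, which also contains surviving first spectral portions, the energy of those surviving lines is subtracted before the copied source values are scaled into the empty bins. This is a minimal sketch under those assumptions; treating zero-valued bins as the second spectral portion and the helper name are illustrative, not the patent's specified procedure.

```python
import math


def regenerate_band(decoded, lo, hi, source, band_energy):
    """Fill the zero-valued bins of band [lo, hi) from a copied source tile.

    decoded:     decoded spectrum; nonzero bins are surviving first spectral
                 portions, zero bins form the second portion to be filled.
    source:      values copied from the core band, one per bin of the band.
    band_energy: transmitted energy for the whole reconstruction band.
    """
    band = decoded[lo:hi]
    survived = sum(x * x for x in band if x != 0.0)      # energy already present
    holes = [i for i, x in enumerate(band) if x == 0.0]  # bins to regenerate
    source_energy = sum(source[i] ** 2 for i in holes)
    residual = max(0.0, band_energy - survived)          # energy left for the holes
    gain = math.sqrt(residual / source_energy) if source_energy > 0.0 else 0.0
    out = list(decoded)
    for i in holes:
        out[lo + i] = gain * source[i]
    return out


spec = regenerate_band([5.0, 0.0, 3.0, 0.0, 0.0], 1, 5, [1.0, 1.0, 1.0, 1.0], 21.0)
print(spec)  # [5.0, 2.0, 3.0, 2.0, 2.0]; band energy 4 + 9 + 4 + 4 = 21
```

Subtracting the surviving energy keeps the band at its transmitted energy instead of double-counting the tonal lines the core coder kept.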
Method and apparatus for detecting a voice activity in an input audio signal
The disclosure provides a method and an apparatus for detecting a voice activity in an input audio signal composed of frames. A noise attribute of the input signal is determined based on a received frame of the input audio signal. A voice activity detection (VAD) parameter is derived based on the noise attribute of the input audio signal using an adaptive function. The derived VAD parameter is compared with a threshold value to provide a voice activity detection decision. The input audio signal is processed according to the voice activity detection decision.
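A minimal sketch of this decision chain follows. It assumes the noise attribute is a recursively smoothed noise-energy estimate updated only in frames judged inactive, and that the adaptive function scales the frame SNR by a weight that shrinks as the noise level rises above a floor. The class name, smoothing factor, weighting curve and threshold are all hypothetical; the final processing step driven by the decision is not shown.

```python
import math


class AdaptiveVad:
    """Frame-based VAD sketch; all constants are hypothetical."""

    def __init__(self, threshold=1.0, alpha=0.95, noise_floor_db=-40.0):
        self.noise = None                # noise-energy estimate (the noise attribute)
        self.threshold = threshold       # decision threshold on the VAD parameter
        self.alpha = alpha               # smoothing factor for the noise update
        self.noise_floor_db = noise_floor_db

    def vad_parameter(self, energy):
        """Adaptive function: frame SNR weighted down as the noise level rises."""
        snr_db = 10.0 * math.log10(energy / self.noise)
        noise_db = 10.0 * math.log10(self.noise)
        weight = 1.0 / (1.0 + max(0.0, noise_db - self.noise_floor_db) / 20.0)
        return weight * snr_db

    def process(self, frame):
        """Return True for a voice-active frame."""
        energy = sum(x * x for x in frame) / len(frame) + 1e-12
        if self.noise is None:           # first frame initialises the estimate
            self.noise = energy
        active = self.vad_parameter(energy) > self.threshold
        if not active:                   # update the noise estimate only in inactive frames
            self.noise = self.alpha * self.noise + (1.0 - self.alpha) * energy
        return active


vad = AdaptiveVad()
print([vad.process(f) for f in ([0.01] * 4, [1.0] * 4, [0.01] * 4)])  # [False, True, False]
```

With this weighting, the parameter equals the SNR at a -40 dB noise level and half of it at -20 dB, so a louder background demands a larger SNR before a frame counts as speech.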
BANDWIDTH EXTENSION METHOD AND APPARATUS, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM
Embodiments of this application disclose a bandwidth extension (BWE) method and apparatus. The method is performed by an electronic device, and includes: performing a time-frequency transform on a to-be-processed narrowband signal to obtain a corresponding initial low-frequency spectrum; obtaining a correlation parameter of a high-frequency portion and a low-frequency portion of a target broadband spectrum based on the initial low-frequency spectrum by using a neural network model; obtaining an initial high-frequency spectrum based on the correlation parameter and the initial low-frequency spectrum; and obtaining a broadband signal according to a target low-frequency spectrum and a target high-frequency spectrum.
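The mapping from a correlation parameter to an initial high-frequency spectrum can be illustrated with a copy-up sketch. The neural network is replaced by a hypothetical stub that returns one gain per high-frequency subband, standing in for the correlation parameter, and the high band is formed by copying low-frequency magnitudes upward and scaling each subband. The time-frequency transform, the derivation of the target spectra and the inverse transform are omitted; all names are illustrative.

```python
def predict_params(low_spectrum, num_bands):
    """Stand-in for the neural network (hypothetical): one gain per
    high-frequency subband, here a fixed halving (about -6 dB) per band."""
    return [0.5 ** (b + 1) for b in range(num_bands)]


def initial_high_spectrum(low_spectrum, params):
    """Copy low-band magnitudes upward, one source block per subband, scaled by its gain."""
    band_size = len(low_spectrum) // len(params)
    high = []
    for b, gain in enumerate(params):
        source = low_spectrum[b * band_size:(b + 1) * band_size]
        high.extend(gain * x for x in source)
    return high


def broadband_spectrum(low_spectrum, high_spectrum):
    """Concatenate low and high parts into a broadband magnitude spectrum."""
    return list(low_spectrum) + list(high_spectrum)


low = [4.0, 4.0, 2.0, 2.0]
high = initial_high_spectrum(low, predict_params(low, 2))
print(broadband_spectrum(low, high))  # [4.0, 4.0, 2.0, 2.0, 2.0, 2.0, 0.5, 0.5]
```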
SPEECH SIGNAL ENCODING AND DECODING METHODS AND APPARATUSES, ELECTRONIC DEVICE, AND STORAGE MEDIUM
Embodiments of the disclosure provide speech signal encoding and decoding methods and apparatuses, an electronic device, and a computer-readable storage medium. The decoding method includes: obtaining a bitstream including a first substream, the first substream being obtained by encoding a low-frequency subband signal of an original speech signal using a first encoding method; and, when the bitstream includes no second substream, performing signal reconstruction based on the first substream to obtain a reconstructed speech signal, or, when the bitstream includes at least one second substream, performing signal reconstruction based on the first substream and the at least one second substream to obtain a reconstructed speech signal. Each second substream is obtained by encoding a high-frequency subband signal of the original speech signal using a second encoding method corresponding to that second substream.
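The decoder's branching on the presence of a second substream can be sketched with a two-band split. This assumes a Haar-type two-band filterbank (the abstract does not name one), identity "codecs" so that substreams carry subband samples directly, and at most one second substream; a missing high band is replaced by zeros. The bitstream layout and function names are hypothetical.

```python
def analysis(x):
    """Two-band Haar split into low and high subbands (assumed filterbank)."""
    low = [(x[2 * n] + x[2 * n + 1]) / 2 for n in range(len(x) // 2)]
    high = [(x[2 * n] - x[2 * n + 1]) / 2 for n in range(len(x) // 2)]
    return low, high


def synthesis(low, high):
    """Exact inverse of analysis(): x[2n] = l + h, x[2n+1] = l - h."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


def encode(x, include_high=True):
    """Identity 'codecs': the first substream holds the low band; a second
    substream (the high band) is included only when requested."""
    low, high = analysis(x)
    return {"first": low, "second": [high]} if include_high else {"first": low}


def decode(bitstream):
    """Reconstruct from the first substream, adding the high band if present."""
    low = bitstream["first"]
    seconds = bitstream.get("second", [])
    high = seconds[0] if seconds else [0.0] * len(low)
    return synthesis(low, high)


x = [1.0, 3.0, 2.0, 2.0]
print(decode(encode(x)))                      # [1.0, 3.0, 2.0, 2.0]
print(decode(encode(x, include_high=False)))  # [2.0, 2.0, 2.0, 2.0]
```

Dropping the second substream still yields a valid, lower-bandwidth signal, which is the property that lets a decoder handle bitstreams with or without high-band substreams.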
MACHINE LEARNING-BASED AUDIO CODEC SWITCHING
Described herein are techniques, devices, and systems for selectively using a music-capable audio codec on-demand during a communication session. A user equipment (UE) may adaptively transition between using a first audio codec that provides a first audio bandwidth and a second audio codec (e.g., the EVS-FB codec) that provides a second audio bandwidth that is greater than the first audio bandwidth. The transition to the second audio codec may occur in response to determining that sound in the environment of the UE includes frequencies outside of a range of frequencies associated with a human voice, such as by determining that music is being played in the environment of the UE, which allows for selectively using a music-capable audio codec when it would be beneficial to do so.
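A minimal sketch of the switching decision measures the share of frame energy outside an assumed voice band and moves to the wider-band codec when that share exceeds a threshold. The 300 to 3400 Hz band, the 0.3 threshold and the naive DFT are illustrative assumptions; the title refers to a machine-learning approach, which this simple energy heuristic does not reproduce.

```python
import cmath
import math

VOICE_BAND_HZ = (300.0, 3400.0)  # assumed voice range (telephony band)


def out_of_band_ratio(frame, sample_rate):
    """Share of frame energy outside VOICE_BAND_HZ, via a naive DFT (DC and Nyquist skipped)."""
    n = len(frame)
    total = outside = 0.0
    for k in range(1, n // 2):
        xk = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(xk) ** 2
        freq = k * sample_rate / n
        total += power
        if not (VOICE_BAND_HZ[0] <= freq <= VOICE_BAND_HZ[1]):
            outside += power
    return outside / total if total > 0.0 else 0.0


def choose_codec(frame, sample_rate, threshold=0.3):
    """Return 'second' (wider band, e.g. EVS-FB) when out-of-band energy dominates."""
    return "second" if out_of_band_ratio(frame, sample_rate) > threshold else "first"


fs, n = 16000, 64
tone_1k = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]   # bin 4 = 1000 Hz
tone_5k = [math.sin(2 * math.pi * 20 * t / n) for t in range(n)]  # bin 20 = 5000 Hz
print(choose_codec(tone_1k, fs), choose_codec(tone_5k, fs))       # first second
```

In practice the decision would be smoothed over many frames to avoid toggling codecs on short transients.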
Cross product enhanced subband block based harmonic transposition
The invention provides an efficient implementation of cross-product enhanced high-frequency reconstruction (HFR), wherein a new component at frequency $Q\Omega + r\Omega_0$ is generated on the basis of existing components at $\Omega$ and $\Omega + \Omega_0$. The invention provides a block-based harmonic transposition, wherein a time block of complex subband samples is processed with a common phase modification. Superposition of several modified samples has the net effect of limiting undesirable intermodulation products, thereby enabling a coarser frequency resolution and/or lower degree of oversampling to be used. In one embodiment, the invention further includes a window function suitable for use with block-based cross-product enhanced HFR. A hardware embodiment of the invention may include an analysis filter bank, a subband processing unit configurable by control data and a synthesis filter bank.
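The frequency arithmetic behind the cross product can be checked numerically. If the new sample takes the phase $(Q-r)\arg u_1 + r\arg u_2$ from components $u_1$ at $\Omega$ and $u_2$ at $\Omega + \Omega_0$, its frequency is $(Q-r)\Omega + r(\Omega + \Omega_0) = Q\Omega + r\Omega_0$. The weighted geometric-mean magnitude below is an assumed, common choice; the patent's block-based processing and window function are not shown.

```python
import cmath


def cross_product_sample(u1, u2, Q, r):
    """Cross-product term from two complex subband samples.

    Phase: (Q - r) * arg(u1) + r * arg(u2). Magnitude: weighted geometric
    mean |u1|^((Q-r)/Q) * |u2|^(r/Q) (an assumed, common choice).
    """
    magnitude = abs(u1) ** ((Q - r) / Q) * abs(u2) ** (r / Q)
    phase = (Q - r) * cmath.phase(u1) + r * cmath.phase(u2)
    return cmath.rect(magnitude, phase)


omega, omega0, Q, r = 0.1, 0.05, 2, 1
u1 = [cmath.exp(1j * omega * n) for n in range(8)]             # component at omega
u2 = [cmath.exp(1j * (omega + omega0) * n) for n in range(8)]  # component at omega + omega0
y = [cross_product_sample(a, b, Q, r) for a, b in zip(u1, u2)]
step = cmath.phase(y[1] / y[0])  # phase advance per sample = output frequency
print(round(step, 6), round(Q * omega + r * omega0, 6))
```

Because $Q$ and $r$ are integers, any $2\pi$ wrap in `cmath.phase` adds an integer multiple of $2\pi$ to the combined phase and leaves the output sample unchanged.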
System and method for non-destructively normalizing loudness of audio signals within portable devices
Many portable playback devices cannot decode and play back encoded audio content having wide bandwidth and wide dynamic range with consistent loudness and intelligibility unless the encoded audio content has been specially prepared for these devices. This problem can be overcome by including with the encoded content metadata that specifies a suitable dynamic range compression profile, either by absolute values or by differential values relative to another known compression profile. A playback device may also adaptively apply gain and limiting to the playback audio. Implementations in encoders, transcoders and decoders are disclosed.
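Resolving the transmitted profile metadata, followed by playback gain and limiting, can be sketched as below. The reference profile values, the metadata dictionary layout and the hard-clip limiter are hypothetical; practical limiters use look-ahead and smoothed gain changes rather than clipping.

```python
REFERENCE_PROFILE_DB = [0.0, 0.0, -3.0, -6.0, -10.0]  # hypothetical known profile (dB)


def resolve_profile(metadata):
    """Turn absolute or differential profile metadata into gains in dB."""
    if metadata["mode"] == "absolute":
        return list(metadata["gains_db"])
    if metadata["mode"] == "differential":
        # Each value is an offset from the corresponding reference entry.
        return [ref + d for ref, d in zip(REFERENCE_PROFILE_DB, metadata["gains_db"])]
    raise ValueError("unknown profile mode: " + str(metadata["mode"]))


def apply_gain_and_limit(samples, gain_db, ceiling=1.0):
    """Apply a playback gain, then hard-limit to +/- ceiling (crude limiter)."""
    g = 10.0 ** (gain_db / 20.0)
    return [max(-ceiling, min(ceiling, g * s)) for s in samples]


print(resolve_profile({"mode": "differential", "gains_db": [0, 1, 1, 0, -2]}))
# [0.0, 1.0, -2.0, -6.0, -12.0]
print(apply_gain_and_limit([0.5, -0.9], 6.0))  # second sample is limited to -1.0
```

Differential coding pays off when the desired profile is close to the known one, because small offsets need fewer bits than full absolute values.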
Digital filterbank for spectral envelope adjustment
An apparatus and method are disclosed for processing an audio signal. The apparatus includes an input interface, a digital filterbank having an analysis part and a synthesis part, a first phase shifter, a spectral envelope adjuster, a second phase shifter, and an output interface. The first phase shifter and the second phase shifter reduce the complexity of the digital filterbank, whose analysis and synthesis filters are complex-exponential modulated versions of a prototype filter.
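The complex-exponential modulation of a prototype filter can be sketched as follows. The modulation phase reference, the Hann prototype and the band count are assumptions for illustration; the sketch shows only how each subband filter is a frequency-shifted copy of the prototype and does not reproduce how the patent's phase shifters reduce complexity.

```python
import cmath
import math


def modulated_filters(prototype, M):
    """Complex-exponential modulated versions of a real low-pass prototype.

    h_k[n] = p[n] * exp(i * pi / M * (k + 0.5) * (n - (N - 1) / 2)), k = 0..M-1.
    The (N - 1) / 2 phase reference is an assumption; filterbank designs differ.
    """
    N = len(prototype)
    return [[prototype[n] * cmath.exp(1j * math.pi / M * (k + 0.5) * (n - (N - 1) / 2))
             for n in range(N)]
            for k in range(M)]


def magnitude_response(h, omega):
    """|H(e^{i omega})| of a finite impulse response h."""
    return abs(sum(h[n] * cmath.exp(-1j * omega * n) for n in range(len(h))))


N, M = 32, 4
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]  # illustrative prototype
filters = modulated_filters(hann, M)
# Filter k is centred on omega_k = (k + 0.5) * pi / M; its peak equals sum(hann).
print([round(magnitude_response(filters[1], (k + 0.5) * math.pi / M), 3) for k in range(M)])
```

Since every filter shares one prototype, an implementation can compute all subbands with a single prototype convolution plus a fast modulation transform, which is the usual source of efficiency in such filterbanks.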