Patent classifications
G10L19/0212
ENCODING APPARATUS FOR PROCESSING AN INPUT SIGNAL AND DECODING APPARATUS FOR PROCESSING AN ENCODED SIGNAL
Disclosed is an apparatus for processing an input signal, having a perceptual weighter and a quantizer. The perceptual weighter has a model provider and a model applicator. The model provider provides a perceptually weighted model based on the input signal. The model applicator provides a perceptually weighted spectrum by applying the perceptually weighted model to a spectrum based on the input signal. The quantizer is configured to quantize the perceptually weighted spectrum and to provide a bitstream. The quantizer has a random matrix applicator and a sign function calculator. The random matrix applicator is configured to apply a random matrix to the perceptually weighted spectrum in order to provide a transformed spectrum. The sign function calculator is configured to calculate a sign function of components of the transformed spectrum in order to provide the bitstream. The invention further relates to an apparatus for processing an encoded signal and to corresponding methods.
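As a minimal sketch of the quantizer path described in this abstract, the random projection and sign step might look as follows in Python. The abstract does not say how the random matrix is drawn, so the Gaussian matrix generated from a shared seed is an assumption, and the names `make_random_matrix`, `apply_weights`, and `sign_quantize` are hypothetical.

```python
import random


def make_random_matrix(rows, cols, seed):
    """Draw a rows x cols Gaussian matrix (assumed distribution) from a seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def apply_weights(spectrum, weights):
    """Apply a per-bin perceptual weight (weights are an assumed input)."""
    return [w * x for w, x in zip(weights, spectrum)]


def sign_quantize(weighted_spectrum, matrix):
    """Project onto each matrix row and keep one sign bit per projection."""
    bits = []
    for row in matrix:
        projection = sum(p * x for p, x in zip(row, weighted_spectrum))
        bits.append(1 if projection >= 0.0 else 0)
    return bits
```

Because only signs are kept, the bits are unchanged when the weighted spectrum is multiplied by a positive gain, so in a scheme like this the overall level would have to be conveyed separately.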
System and method for increasing transmission bandwidth efficiency (“EBT2”)
Systems and methods for increasing transmission bandwidth efficiency by the analysis and synthesis of the ultimate components of transmitted content are presented. To implement such a system, a dictionary or database of elemental codewords can be generated from a set of audio clips. Using such a database, a given arbitrary song or other audio file can be expressed as a series of such codewords, where each given codeword in the series is a compressed audio packet that can be used as is, or, for example, can be tagged to be modified to better match the corresponding portion of the original audio file. Each codeword in the database has an index number or unique identifier. With a relatively small number of bits per unique ID, e.g., 27 to 30, several hundred million codewords can be uniquely identified. By providing the database of codewords to receivers of a broadcast or content delivery system in advance, instead of broadcasting or streaming the actual compressed audio signal, all that need be transmitted is the series of identifiers along with any modification instructions to the identified codewords. After reception, intelligence on the receiver, having access to a locally stored copy of the dictionary, can reconstruct the original audio clip by accessing the codewords via the received IDs, modifying them as instructed by the modification instructions, further modifying the codewords either individually or in groups using the audio profile of the original audio file (also sent by the encoder), and playing back a generated sequence of phase corrected codewords and modified codewords as instructed. In exemplary embodiments of the present invention, such modification can extend into neighboring codewords, and can utilize either or both (i) cross correlation based time alignment and (ii) phase continuity between harmonics, to achieve higher fidelity to the original audio clip.
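The identifier arithmetic in this abstract can be checked with a short Python sketch. The fixed-width packing below is only an illustrative assumption about how a series of IDs could be serialized; the abstract does not describe a bitstream layout, and the function names are hypothetical.

```python
def addressable_codewords(id_bits):
    """Number of distinct codewords a fixed-width ID can address."""
    return 2 ** id_bits


def pack_ids(ids, id_bits):
    """Concatenate fixed-width IDs into one integer (assumed layout)."""
    packed = 0
    for codeword_id in ids:
        if not 0 <= codeword_id < (1 << id_bits):
            raise ValueError("ID does not fit in the given width")
        packed = (packed << id_bits) | codeword_id
    return packed


def unpack_ids(packed, id_bits, count):
    """Recover `count` fixed-width IDs in their original order."""
    mask = (1 << id_bits) - 1
    ids = []
    for _ in range(count):
        ids.append(packed & mask)
        packed >>= id_bits
    ids.reverse()
    return ids
```

With 27 bits the ID space holds 134,217,728 entries, and with 30 bits it holds 1,073,741,824.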
MDCT-based complex prediction stereo coding
The invention provides methods and devices for stereo encoding and decoding using complex prediction in the frequency domain. In one embodiment, a decoding method, for obtaining an output stereo signal from an input stereo signal encoded by complex prediction coding and comprising first frequency-domain representations of two input channels, comprises the upmixing steps of: (i) computing a second frequency-domain representation of a first input channel; and (ii) computing an output channel on the basis of the first and second frequency-domain representations of the first input channel, the first frequency-domain representation of the second input channel and a complex prediction coefficient. The method comprises applying independent band-width limits for the input channels.
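The upmixing steps can be illustrated with a small Python sketch. It assumes a mid/side-style downmix, takes the second frequency-domain representation (here called `downmix_mdst`) as a given input rather than estimating it, and picks one possible sign convention for the prediction; none of these details are stated in the abstract, and the function names are hypothetical.

```python
def predict_side(downmix_mdct, downmix_mdst, alpha):
    """Real part of alpha times the complex spectrum (assumed convention)."""
    return [(alpha * complex(m, m_hat)).real
            for m, m_hat in zip(downmix_mdct, downmix_mdst)]


def encode_complex_prediction(left, right, downmix_mdst, alpha):
    """Form the downmix and the prediction residual of the side signal."""
    downmix = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    predicted = predict_side(downmix, downmix_mdst, alpha)
    residual = [s - p for s, p in zip(side, predicted)]
    return downmix, residual


def upmix_complex_prediction(downmix, downmix_mdst, residual, alpha):
    """Rebuild the side signal from the residual, then left and right."""
    predicted = predict_side(downmix, downmix_mdst, alpha)
    side = [e + p for e, p in zip(residual, predicted)]
    left = [m + s for m, s in zip(downmix, side)]
    right = [m - s for m, s in zip(downmix, side)]
    return left, right
```

As long as encoder and decoder use the same `downmix_mdst` estimate and the same coefficient `alpha`, the prediction cancels and the upmix returns the original channel spectra.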
BANDWIDTH EXTENSION METHOD AND APPARATUS, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM
Embodiments of this application disclose a bandwidth extension (BWE) method and apparatus. The method is performed by an electronic device, and includes: performing a time-frequency transform on a to-be-processed narrowband signal to obtain a corresponding initial low-frequency spectrum; obtaining a correlation parameter of a high-frequency portion and a low-frequency portion of a target broadband spectrum based on the initial low-frequency spectrum by using a neural network model; obtaining an initial high-frequency spectrum based on the correlation parameter and the initial low-frequency spectrum; and obtaining a broadband signal according to a target low-frequency spectrum and a target high-frequency spectrum.
DETERMINATION OF SPATIAL AUDIO PARAMETER ENCODING AND ASSOCIATED DECODING
An apparatus comprising means for: receiving values for sub-bands of a frame of an audio signal, the values comprising at least one azimuth value, at least one elevation value, at least one energy ratio value and at least one spread and/or surround coherence value for each sub-band; determining a codebook for encoding at least one spread and/or surround coherence value for each sub-band based on the at least one energy ratio value and at least one azimuth value for each sub-band for a frame; discrete cosine transforming at least one vector, the at least one vector comprising the at least one spread and/or surround coherence value for a sub-band for the frame; and encoding a first number of components of the discrete cosine transformed vector based on the determined codebook.
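The transform step can be sketched in Python as follows. This covers only the discrete cosine transform of a coherence vector and the selection of its first components; the codebook selection from energy ratio and azimuth is not shown. The orthonormal DCT-II scaling and the function names are assumptions made for illustration.

```python
import math


def dct_ii(vector):
    """Orthonormal DCT-II of a real vector (assumed normalization)."""
    n = len(vector)
    coefficients = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(vector))
        coefficients.append(scale * total)
    return coefficients


def inverse_dct(coefficients):
    """Inverse of dct_ii (orthonormal DCT-III)."""
    n = len(coefficients)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coefficients):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples


def keep_first_components(vector, kept):
    """Transform the vector and keep only its first `kept` components."""
    return dct_ii(vector)[:kept]


def reconstruct(components, length):
    """Zero-pad the kept components and invert the transform."""
    padded = list(components) + [0.0] * (length - len(components))
    return inverse_dct(padded)
```

When coherence values vary slowly across sub-bands, most of the energy lands in the first few components, which is the usual motivation for encoding only those.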
End Node Spectrogram Compression For Machine Learning Speech Recognition
A system and method of recording and transmitting compressed audio signals over a network is disclosed. The end node device first converts the audio signal to a spectrogram, which is commonly used by machine learning algorithms to perform speech recognition. The end node device then compresses the spectrogram prior to transmission. In certain embodiments, the compression is performed using Discrete Cosine Transforms (DCT). Furthermore, in some embodiments, the DCT is performed on the difference between two columns of the spectrogram. Further, in some embodiments, a function that replaces values below a predetermined threshold with zeroes in the Encoded Spectrogram is utilized. These functions may be performed in hardware or software.
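One way to picture the embodiments combining column differences, a DCT, and thresholding is the Python sketch below. It assumes that the first column is differenced against an all-zero column and that the encoder differences against the original previous column rather than the reconstructed one; the abstract specifies neither, and all function names are hypothetical.

```python
import math


def dct_ii(vector):
    """Orthonormal DCT-II of a real vector."""
    n = len(vector)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(vector)))
    return out


def inverse_dct(coefficients):
    """Inverse of dct_ii."""
    n = len(coefficients)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coefficients):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def encode_spectrogram(columns, threshold):
    """DCT the difference between consecutive columns, zero small values."""
    encoded = []
    previous = [0.0] * len(columns[0])  # assumed reference for column 0
    for column in columns:
        delta = [c - p for c, p in zip(column, previous)]
        coefficients = dct_ii(delta)
        encoded.append([c if abs(c) >= threshold else 0.0
                        for c in coefficients])
        previous = column
    return encoded


def decode_spectrogram(encoded):
    """Invert each DCT and accumulate the column differences."""
    decoded = []
    previous = [0.0] * len(encoded[0])
    for coefficients in encoded:
        delta = inverse_dct(coefficients)
        column = [p + d for p, d in zip(previous, delta)]
        decoded.append(column)
        previous = column
    return decoded
```

Columns that repeat the previous one encode to all zeros, which is what makes the result cheap to compress further. With a large threshold, the open-loop differencing used here lets reconstruction error accumulate from column to column.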
Harmonic Transposition in an Audio Coding Method and System
The present invention relates to transposing signals in time and/or frequency and in particular to coding of audio signals. More particularly, the present invention relates to high frequency reconstruction (HFR) methods including a frequency domain harmonic transposer. A method and system for generating a transposed output signal from an input signal using a transposition factor T is described. The system comprises an analysis window of length L_a, extracting a frame of the input signal, and an analysis transformation unit of order M transforming the samples into M complex coefficients. M is a function of the transposition factor T. The system further comprises a nonlinear processing unit altering the phase of the complex coefficients by using the transposition factor T, a synthesis transformation unit of order M transforming the altered coefficients into M altered samples, and a synthesis window of length L_s, generating a frame of the output signal.
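The nonlinear processing step can be sketched in Python under one common assumption: that "altering the phase by using the transposition factor T" means multiplying each coefficient's phase by T while keeping its magnitude. The abstract does not confirm this exact rule, and `transpose_phases` is a hypothetical name. Windowing and the analysis and synthesis transforms are omitted.

```python
import cmath


def transpose_phases(coefficients, transposition_factor):
    """Keep each magnitude and multiply each phase by the factor T."""
    altered = []
    for coefficient in coefficients:
        magnitude = abs(coefficient)
        phase = cmath.phase(coefficient)
        altered.append(cmath.rect(magnitude, transposition_factor * phase))
    return altered
```

Multiplying the phase by T makes the phase advance between successive frames T times faster, which is how a sinusoid at one frequency can be mapped to T times that frequency after synthesis.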
Downmixer and Method of Downmixing
A downmixer for downmixing a multi-channel signal having at least two channels, includes: a weighting value estimator for estimating band-wise weighting values for the at least two channels; a spectral weighter for weighting spectral domain representations of the at least two channels using the band-wise weighting values; a converter for converting weighted spectral domain representations of the at least two channels into time representations of the at least two channels; and a mixer for mixing the time representations of the at least two channels to obtain a downmix signal.
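The signal flow of this downmixer (weight per band, convert to time, then mix) can be sketched in Python. The weighting value estimator is not modeled: the band-wise weights are taken as given inputs. A plain DFT stands in for the unspecified spectral representation, and `band_edges` and the function names are hypothetical.

```python
import cmath


def dft(samples):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]


def downmix(channel_spectra, band_edges, band_weights):
    """Weight each channel per band, go to time domain, then sum."""
    time_signals = []
    for spectrum, weights in zip(channel_spectra, band_weights):
        weighted = list(spectrum)
        for band, weight in enumerate(weights):
            for k in range(band_edges[band], band_edges[band + 1]):
                weighted[k] = weight * spectrum[k]
        time_signals.append(idft(weighted))
    # Mixing happens on the time representations, as in the abstract.
    return [sum(samples) for samples in zip(*time_signals)]
```

With all weights set to one, the result is the plain sum of the channels. Setting a channel's weights to zero removes it from the downmix.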
MDCT-BASED COMPLEX PREDICTION STEREO CODING
The invention provides methods and devices for stereo encoding and decoding using complex prediction in the frequency domain. In one embodiment, a decoding method, for obtaining an output stereo signal from an input stereo signal encoded by complex prediction coding and comprising first frequency-domain representations of two input channels, comprises the upmixing steps of: (i) computing a second frequency-domain representation of a first input channel; and (ii) computing an output channel on the basis of the first and second frequency-domain representations of the first input channel, the first frequency-domain representation of the second input channel and a complex prediction coefficient. The upmixing can be suspended responsive to control data.
TIME-VARYING TIME-FREQUENCY TILINGS USING NON-UNIFORM ORTHOGONAL FILTERBANKS BASED ON MDCT ANALYSIS/SYNTHESIS AND TDAR
Embodiments provide a method for processing an audio signal, including: performing a cascaded lapped critically sampled transform on two partially overlapping blocks of samples of the audio signal, to obtain sets of subband samples; identifying one or more sets of subband samples that in combination represent the same region of the time-frequency plane; performing time-frequency transforms on the identified one or more sets of subband samples, to obtain one or more time-frequency transformed subband samples, each of which represents the same region in the time-frequency plane; and performing a weighted combination of two corresponding sets of subband samples or time-frequency transformed versions thereof, to obtain aliasing reduced subband representations of the audio signal.