G10L19/0017

COMPRESSING AUDIO WAVEFORMS USING NEURAL NETWORKS AND VECTOR QUANTIZERS
20230019128 · 2023-01-19 ·

Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
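
The abstract does not say how the code vectors from the several quantizers combine. A minimal sketch, assuming a residual arrangement in which each quantizer encodes what the previous ones left over and the quantized representation is the sum of the selected code vectors, might look like this. The names `rvq_encode` and `rvq_decode` and the toy codebooks are hypothetical, not taken from the patent.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_index(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the code vector closest to vec (squared Euclidean distance)."""
    def dist(code: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(feature: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Pick one code vector per codebook; each stage quantizes the residual (assumption)."""
    residual = list(feature)
    indices = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> List[float]:
    """Rebuild the quantized feature vector as the sum of the selected code vectors."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy two-stage example with hand-written codebooks (illustrative values only).
coarse = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
fine = [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]]
codes = rvq_encode([1.3, 0.9], [coarse, fine])
quantized = rvq_decode(codes, [coarse, fine])
```

Here the coded representation of the feature vector is just the list of indices, one per codebook; a later entropy-coding stage would compress those indices further.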

Method and system for streaming a multichannel audio signal to a binaural hearing system
20220408202 · 2022-12-22 ·

There is provided a method for streaming a multichannel audio signal comprising a first channel (L) and a second channel (R) from an audio source device to a binaural hearing system comprising a first hearing device worn at a first ear of a user and a second hearing device worn at a second ear of the user.

Audio coding method based on spectral recovery scheme

The inventive concept relates to an audio coding method to which CNN-based frequency spectrum recovery is applied. An encoder transmits only a part of the frequency spectral coefficients generated in transform coding to a decoder, and the decoder recovers the frequency spectral coefficients that were not transmitted. Furthermore, the signs of the frequency spectral coefficients are transmitted from the encoder to the decoder according to a sign transmission rule.

Encoder, decoder, encoding method, decoding method, program, and recording medium

The present invention aims to encode and decode a sequence of integer values while effectively assigning a fractional (non-integer) number of bits per sample. An integer converter 11 selects M selected integer values from a set of L input integer values and obtains J-value selection information that specifies which of the L input integer values the M selected integer values are. Furthermore, the integer converter 11 obtains one converted integer value by reversibly converting the M selected integer values and an integer value corresponding to the J-value selection information. An integer encoder 12 encodes the converted integer value to obtain a code.

TRUNCATEABLE PREDICTIVE CODING

A method, system, and computer program to encode and decode a channel coherence parameter applied on a frequency band basis, where the coherence parameters of each frequency band form a coherence vector. The coherence vector is encoded and decoded using a predictive scheme followed by a variable bit rate entropy coding.
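
As a rough illustration of "a predictive scheme followed by a variable bit rate entropy coding", the sketch below predicts each band's quantized coherence index from the previous band and writes the prediction residual with a unary code. Both choices (previous-band prediction and a zigzag-mapped unary code) are assumptions made for this example; the abstract does not specify the predictor or the entropy code.

```python
from typing import List


def zigzag(v: int) -> int:
    """Map a signed residual to a non-negative integer (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...)."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u: int) -> int:
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def encode_coherence(indices: List[int]) -> str:
    """Encode a coherence vector (one quantized index per band) as a bit string."""
    bits = []
    prev = 0  # hypothetical predictor state: the previous band's index
    for idx in indices:
        residual = idx - prev
        bits.append("1" * zigzag(residual) + "0")  # unary code: small residuals cost few bits
        prev = idx
    return "".join(bits)


def decode_coherence(bits: str, n_bands: int) -> List[int]:
    """Decode n_bands indices from a bit string produced by encode_coherence."""
    out, prev, pos = [], 0, 0
    for _ in range(n_bands):
        run = 0
        while bits[pos] == "1":
            run += 1
            pos += 1
        pos += 1  # skip the terminating "0"
        prev += unzigzag(run)
        out.append(prev)
    return out


vector = [5, 5, 6, 4]
stream = encode_coherence(vector)
restored = decode_coherence(stream, len(vector))
```

Because neighbouring bands tend to have similar coherence, the residuals are small and the code length varies with the content, which is what makes the bit rate variable.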

Audio decoder, audio encoder, method for providing at least four audio channel signals on the basis of an encoded representation, method for providing an encoded representation on the basis of at least four audio channel signals and computer program using a bandwidth extension

An audio decoder for providing at least four bandwidth-extended channel signals on the basis of an encoded representation provides first and second downmix signals on the basis of a jointly encoded representation of the first and second downmix signals using a multi-channel decoding and provides at least first and second audio channel signals on the basis of the first downmix signal using a multi-channel decoding, and provides at least third and fourth audio channel signals on the basis of the second downmix signal using a multi-channel decoding. It performs a multi-channel bandwidth extension on the basis of the first and third audio channel signals, to obtain first and third bandwidth-extended channel signals, and performs a multi-channel bandwidth extension on the basis of the second and fourth audio channel signals, to obtain second and fourth bandwidth extended channel signals. An audio encoder uses a related concept.

PYRAMID VECTOR QUANTIZER SHAPE SEARCH
20230086320 · 2023-03-23 ·

An encoder and a method therein for Pyramid Vector Quantizer (PVQ) shape search, the PVQ taking a target vector x as input and deriving a vector y by iteratively adding unit pulses in an inner dimension search loop. The method comprises, before entering a next inner dimension search loop for unit pulse addition, determining, based on the maximum pulse amplitude, maxamp_y, of a current vector y, whether more than a current bit word length is needed to represent enloop_y in a lossless manner in the upcoming inner dimension loop. The variable enloop_y is related to an accumulated energy of the vector y. Performing this method enables the encoder to keep the complexity of the search at a reasonable level.
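
The search loop described above can be sketched as a greedy pulse allocation. This is an illustrative sketch, not the patented procedure: it assumes enloop_y is the energy y would have after the candidate pulse is added, and it uses a simple hypothetical bound derived from maxamp_y to decide whether a 16-bit word is still enough. The function name `pvq_shape_search` and the `word_bits` parameter are invented for the example.

```python
from typing import List, Tuple


def pvq_shape_search(x: List[float], k: int, word_bits: int = 16) -> Tuple[List[int], List[int]]:
    """Greedily place k unit pulses to match the shape of x.

    Returns (y, widths); widths[j] is the word length judged sufficient
    for the energy term before the j-th inner loop.
    """
    n = len(x)
    abs_x = [abs(v) for v in x]
    y = [0] * n        # pulse magnitudes; signs are restored at the end
    corr = 0.0         # running correlation sum(|x_i| * y_i)
    energy = 0         # running energy sum(y_i ** 2)
    limit = (1 << (word_bits - 1)) - 1   # largest signed value in word_bits
    widths: List[int] = []

    for pulses in range(k):
        maxamp_y = max(y)
        # Assumed bound: energy(y) <= maxamp_y * pulses, and one more pulse
        # adds at most 2 * maxamp_y + 1 to the energy.
        worst_enloop_y = maxamp_y * pulses + 2 * maxamp_y + 1
        widths.append(word_bits if worst_enloop_y <= limit else 2 * word_bits)

        # Inner dimension loop: pick the position maximizing corr^2 / energy.
        best_i, best_c_sq, best_en = 0, -1.0, 1
        for i in range(n):
            c = corr + abs_x[i]
            enloop_y = energy + 2 * y[i] + 1
            # Cross-multiplied comparison avoids a division per candidate.
            if c * c * best_en > best_c_sq * enloop_y:
                best_i, best_c_sq, best_en = i, c * c, enloop_y

        y[best_i] += 1
        corr += abs_x[best_i]
        energy = best_en

    signed = [-v if xv < 0 else v for v, xv in zip(y, x)]
    return signed, widths


shape, widths = pvq_shape_search([0.9, -0.3, 0.1], k=4)
```

In a fixed-point implementation the point of the pre-check is that the cheaper narrow-word loop can be used for as long as the bound holds, with the wider word reserved for the iterations that actually need it.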

DETERMINATION OF SPATIAL AUDIO PARAMETER ENCODING AND ASSOCIATED DECODING
20220343928 · 2022-10-27 ·

An apparatus comprising means configured to: generate spatial audio signal directional metadata parameters for a block of time-frequencies; generate encoded spatial audio signal directional metadata parameters (108) for a block of time-frequencies based on a first quantization resolution (203); compare a number of bits used for the encoded spatial audio signal directional parameters (108) for the block of time-frequencies based on the first quantization resolution against a determined number of bits; output or store the encoded spatial audio signal directional metadata parameters for a block of time-frequencies (108) based on a first quantization resolution when the number of bits used for the encoded spatial audio signal directional parameters for the block of time-frequencies (108) based on the first quantization resolution is less than a determined number of bits (217); generate encoded spatial audio signal directional metadata parameters (108) for the block of time-frequencies based on a second quantization resolution when the number of bits used for the encoded spatial audio signal directional parameters for the block of time-frequencies (108) based on the first quantization resolution is more than the determined number of bits and a difference between the determined number of bits and the number of bits used for the encoded spatial audio signal directional parameters (108) for the block of time-frequencies based on the first quantization resolution is within a determined threshold (217); generate encoded spatial audio signal directional metadata parameters (108) for the block of time-frequencies based on a third quantization resolution when the number of bits used for the encoded spatial audio signal directional parameters (108) for the block of time-frequencies based on the first quantization resolution is more than the determined number of bits and the difference between the determined number of bits and the number of bits used for the encoded spatial audio signal directional parameters (108) for the block of time-frequencies based on the first quantization resolution is greater than the determined threshold, wherein the third quantization resolution is determined such that a number of bits used for the encoded spatial audio signal directional parameters for the block of time-frequencies based on the third quantization resolution is always equal to or less than the determined number of bits (217).
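
The three-way decision in this abstract reduces to a comparison of the bits used at the first resolution against a budget and a threshold. The sketch below shows only that selection logic; the function name, the string labels, and the treatment of the exactly-equal case (counted here as fitting the budget) are assumptions, since the abstract leaves the boundary case open.

```python
def choose_resolution(bits_first: int, budget: int, threshold: int) -> str:
    """Select which quantization resolution to use for a block of time-frequencies.

    bits_first: bits used when encoding at the first quantization resolution
    budget:     the determined number of bits available for the block
    threshold:  the determined threshold on how far over budget is tolerable
    """
    if bits_first <= budget:  # assumption: equality counts as fitting
        return "first"
    if bits_first - budget <= threshold:
        # Slightly over budget: re-encode at the second resolution.
        return "second"
    # Far over budget: use the third resolution, which is chosen so that
    # its bit count never exceeds the budget.
    return "third"


decisions = [choose_resolution(b, budget=100, threshold=8) for b in (90, 105, 120)]
```

The design keeps the expensive re-encoding paths off the common case: only blocks that overshoot the budget are encoded a second time.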

SYSTEM AND METHOD FOR ENCODING AUDIO DATA
20220343925 · 2022-10-27 ·

Methods and systems are provided for encoding audio data from an audio file, wherein the audio data comprises audio samples. Audio data is segmented from the audio file in order to obtain at least one segment. Each segment comprises a time interval of the audio data, and each segment also comprises a plurality of audio samples grouped in frames. A segment index and a description stream containing the segment index are then generated. The segment index comprises the position of the segments within the audio file. A segment stream containing the audio data of one particular segment is then generated. At least part of the audio data is encrypted during the generation of the segment stream with an encryption key.
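
A segment index of the kind described could be built as follows. This is a sketch under assumptions: positions are expressed as sample offsets, segments have a fixed number of fixed-size frames, and the dictionary field names are invented for illustration. The encryption step is deliberately left out, because the abstract does not name a cipher or a key format.

```python
from typing import Dict, List


def build_segment_index(num_samples: int, frame_size: int, frames_per_segment: int) -> List[Dict[str, int]]:
    """Split an audio file of num_samples samples into segments and record their positions."""
    segment_len = frame_size * frames_per_segment
    index: List[Dict[str, int]] = []
    for start in range(0, num_samples, segment_len):
        end = min(start + segment_len, num_samples)
        # Ceiling division: a partially filled last frame still counts as a frame.
        frames = -(-(end - start) // frame_size)
        index.append({"segment": len(index), "start": start, "end": end, "frames": frames})
    return index


segment_index = build_segment_index(num_samples=10000, frame_size=1024, frames_per_segment=4)
```

With such an index in the description stream, a reader can locate and request one particular segment stream without scanning the whole file.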

SUPPORT FOR GENERATION OF COMFORT NOISE, AND GENERATION OF COMFORT NOISE

A method for generation of comfort noise for at least two audio channels. The method comprises determining a spatial coherence between audio signals on the respective audio channels, wherein at least one spatial coherence value per frame and frequency band is determined to form a vector of spatial coherence values. A vector of predicted spatial coherence values is formed by a weighted combination of a first coherence prediction and a second coherence prediction that are combined using a weight factor α. The method comprises signaling information about the weight factor α to the receiving node, for enabling the generation of the comfort noise for the at least two audio channels at the receiving node.
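
The weighted combination can be illustrated with a short sketch. It assumes the two predictions are blended as α times the first plus (1 − α) times the second, band by band; the abstract says only that a weight factor α is used, so the exact form of the combination and the function name are assumptions.

```python
from typing import List, Sequence


def combine_predictions(first: Sequence[float], second: Sequence[float], alpha: float) -> List[float]:
    """Blend two coherence predictions per frequency band using weight factor alpha."""
    if len(first) != len(second):
        raise ValueError("both predictions need one value per frequency band")
    # Assumed convex combination: alpha weights the first prediction,
    # (1 - alpha) weights the second.
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(first, second)]


predicted = combine_predictions([0.75, 0.5], [0.25, 0.0], alpha=0.5)
```

Since the receiving node can form both predictions itself, signaling information about α is enough for it to reproduce the same predicted coherence vector.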