G10L2019/0005

COMPRESSING AUDIO WAVEFORMS USING NEURAL NETWORKS AND VECTOR QUANTIZERS
20230019128 · 2023-01-19

Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
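The last step of this abstract, compressing the coded representations, can be illustrated with a minimal sketch. The abstract does not say how the compression is done; fixed-width bit packing of the code indices is used here as the simplest stand-in (an entropy coder would be another option). The function names `pack_indices` and `unpack_indices`, the 10-bit index width, and the example indices are all assumptions made for illustration.

```python
def pack_indices(indices, bits_per_index):
    """Pack non-negative code indices into bytes, fixed width, most significant bit first."""
    acc = 0
    for idx in indices:
        if not 0 <= idx < (1 << bits_per_index):
            raise ValueError("index does not fit in bits_per_index")
        acc = (acc << bits_per_index) | idx
    total_bits = bits_per_index * len(indices)
    pad = (-total_bits) % 8  # zero bits appended to reach a whole number of bytes
    return (acc << pad).to_bytes((total_bits + pad) // 8, "big")


def unpack_indices(data, bits_per_index, count):
    """Inverse of pack_indices: recover `count` indices from the packed bytes."""
    total_bits = bits_per_index * count
    pad = (-total_bits) % 8
    acc = int.from_bytes(data, "big") >> pad
    mask = (1 << bits_per_index) - 1
    return [(acc >> (bits_per_index * (count - 1 - i))) & mask for i in range(count)]


# Hypothetical coded representation: two feature vectors, each identified by
# one index per quantizer (three quantizers, codebooks of 1024 entries = 10 bits).
coded = [(5, 1023, 0), (17, 256, 900)]
flat = [idx for frame in coded for idx in frame]
packed = pack_indices(flat, 10)
restored = unpack_indices(packed, 10, len(flat))
```

With these assumed sizes, each feature vector costs 30 bits, so the bitrate is set by the number of quantizers, the codebook size, and the feature-vector rate.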

AUDIO SIGNAL ENCODING AND DECODING METHOD USING NEURAL NETWORK MODEL, AND ENCODER AND DECODER FOR PERFORMING THE SAME

An audio signal encoding and decoding method using a neural network model, and an encoder and decoder for performing the same, are disclosed. A method of encoding an audio signal using a neural network model may include identifying an input signal, generating a quantized latent vector by inputting the input signal into a neural network model that encodes the input signal, and generating a bitstream corresponding to the quantized latent vector, wherein the neural network model may include i) a feature extraction layer that generates a latent vector by extracting a feature of the input signal, ii) a plurality of downsampling blocks that downsample the latent vector, and iii) a plurality of quantization blocks that quantize a downsampled latent vector.

Compressing audio waveforms using neural networks and vector quantizers

Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.

AUDIO DECODER, AUDIO ENCODER, AND RELATED METHODS USING JOINT CODING OF SCALE PARAMETERS FOR CHANNELS OF A MULTI-CHANNEL AUDIO SIGNAL

Audio decoder for decoding an encoded audio signal having multi-channel audio data having data for two or more audio channels, and information on jointly encoded scale parameters, having: a scale parameter decoder for decoding the information on the jointly encoded scale parameters to obtain a first and a second set of scale parameters for a first channel and a second channel, respectively, of a decoded audio signal; and a signal processor for applying the first and second sets of scale parameters to a first and second channel representation, respectively, derived from the multi-channel audio data to obtain the first and second channels of the decoded audio signal, wherein the jointly encoded scale parameters have information on a first group and on a second group of jointly encoded scale parameters, and wherein the scale parameter decoder is configured to combine a jointly encoded scale parameter of the first group and one of the second group using a first and a second combination rule, respectively, to obtain a scale parameter of the first and second sets of scale parameters.
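A minimal sketch of how two combination rules could recover per-channel scale parameters from two jointly encoded groups. The abstract does not state the rules; this sketch assumes a mid/side arrangement in which the first rule is addition and the second is subtraction. The function names and the example values are hypothetical.

```python
def encode_joint(scale_first, scale_second):
    """Form two groups of jointly encoded scale parameters (assumed mid/side form)."""
    mid = [(a + b) / 2 for a, b in zip(scale_first, scale_second)]
    side = [(a - b) / 2 for a, b in zip(scale_first, scale_second)]
    return mid, side


def decode_joint(mid, side):
    """Combine one parameter from each group with two different rules."""
    first = [m + s for m, s in zip(mid, side)]   # assumed first combination rule
    second = [m - s for m, s in zip(mid, side)]  # assumed second combination rule
    return first, second


# Hypothetical scale parameters for two channels.
first_channel = [4.0, 6.0, 1.5]
second_channel = [2.0, 6.0, 0.5]
mid, side = encode_joint(first_channel, second_channel)
decoded_first, decoded_second = decode_joint(mid, side)
```

When the two channels have similar scale parameters, the second group is close to zero, which is what makes joint coding cheaper than coding each channel separately under this assumption.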

Method for speech coding, method for speech decoding and their apparatuses
09852740 · 2017-12-26

High-quality speech is reproduced with a small amount of data in speech coding and decoding, which compress a speech signal into a digital signal and decode it. In a speech coding method based on code-excited linear prediction (CELP), the noise level of the speech in the coding period of interest is evaluated using a code or coding result of at least one of spectrum information, power information, and pitch information, and different excitation codebooks are used depending on the evaluation result.

AUDIO QUANTIZER AND AUDIO DEQUANTIZER AND RELATED METHODS

An audio quantizer for quantizing a plurality of audio information items has: a first stage vector quantizer for quantizing the plurality of audio information items to determine a first stage vector quantization result and a plurality of intermediate quantized items corresponding to the first stage vector quantization result; a residual item determiner for calculating a plurality of residual items from the plurality of intermediate quantized items and the plurality of audio information items; and a second stage vector quantizer for quantizing the plurality of residual items to obtain a second stage vector quantization result, wherein the first stage vector quantization result and the second stage vector quantization result are a quantized representation of the plurality of audio information items.
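The two-stage scheme in this abstract can be sketched as follows: the first stage picks the nearest code vector, the residual is the difference between the input items and that intermediate quantized vector, and the second stage quantizes the residual. Nearest-neighbour search under squared Euclidean distance, the function names, and the tiny example codebooks are assumptions; the abstract does not specify them.

```python
def nearest(codebook, vector):
    """Index of the code vector closest to `vector` (squared Euclidean distance)."""
    def dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vector))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def quantize_two_stage(items, stage1, stage2):
    """Return the first-stage and second-stage quantization results as indices."""
    i1 = nearest(stage1, items)
    intermediate = stage1[i1]                              # intermediate quantized items
    residual = [x - q for x, q in zip(items, intermediate)]  # residual items
    i2 = nearest(stage2, residual)
    return i1, i2


def dequantize_two_stage(i1, i2, stage1, stage2):
    """Reconstruct the items as first-stage code vector plus second-stage residual."""
    return [a + b for a, b in zip(stage1[i1], stage2[i2])]


# Hypothetical codebooks and input.
stage1 = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
stage2 = [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]]
items = [1.2, 0.8]
i1, i2 = quantize_two_stage(items, stage1, stage2)
reconstructed = dequantize_two_stage(i1, i2, stage1, stage2)
```

The pair of indices is the quantized representation; the second stage reduces the error left by the first without enlarging the first codebook.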

COMPRESSING AUDIO WAVEFORMS USING NEURAL NETWORKS AND VECTOR QUANTIZERS
20230186927 · 2023-06-15

Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.

Frequency envelope vector quantization method and apparatus
09805732 · 2017-10-31

Embodiments of the present application propose a frequency envelope vector quantization method and apparatus, where the method includes: dividing N frequency envelopes in one frame into N1 vectors; quantizing a first vector in the N1 vectors by using a first codebook, to obtain a code word corresponding to the quantized first vector, where the first codebook is divided into 2^B1 portions; determining, according to the code word corresponding to the quantized first vector, the i-th portion of the first codebook with which the quantized first vector is associated; determining a second codebook according to the codebook of the i-th portion; and quantizing a second vector in the N1 vectors based on the second codebook. In the embodiments of the present application, vector quantization can be performed on frequency envelope vectors by using a codebook with a smaller quantity of bits. Therefore, the complexity of vector quantization can be reduced while its quality is preserved.
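A minimal sketch of the portion-based codebook selection, under several assumptions the abstract does not confirm: the portions are equal-sized contiguous blocks of the first codebook, the code word is the index of the nearest code vector, and the second codebook is simply the portion that the first code word falls in. The function names, the parameter `b1` (the exponent giving the number of portions), and the example data are hypothetical.

```python
def nearest(codebook, vector):
    """Index of the code vector closest to `vector` (squared Euclidean distance)."""
    def dist(code):
        return sum((c - v) ** 2 for c, v in zip(code, vector))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def quantize_pair(first_vec, second_vec, first_codebook, b1):
    """Quantize two envelope vectors, restricting the second search to one portion."""
    portion_size = len(first_codebook) >> b1   # codebook split into 2**b1 portions
    word1 = nearest(first_codebook, first_vec)
    i = word1 // portion_size                  # portion containing the first code word
    # Assumption: the second codebook is exactly the i-th portion.
    second_codebook = first_codebook[i * portion_size:(i + 1) * portion_size]
    word2 = nearest(second_codebook, second_vec)
    return word1, i, word2


# Hypothetical 8-entry first codebook split into 2 portions (b1 = 1).
first_codebook = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],   # portion 0: low envelopes
    [4.0, 4.0], [4.0, 5.0], [5.0, 4.0], [5.0, 5.0],   # portion 1: high envelopes
]
word1, portion, word2 = quantize_pair([4.2, 4.9], [5.1, 3.8], first_codebook, b1=1)
```

In this example the second code word indexes a 4-entry codebook and so needs 2 bits rather than 3, which matches the abstract's point about using a codebook with fewer bits.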

Method for Speaker Diarization

Disclosed is a speaker diarization process for determining which speaker is speaking at what time during the course of a conversation. The process has five main parts: segmentation, where speech/non-speech decisions are made; frame feature extraction, where useful information is obtained from the frames; segment modeling, where the information from the frame feature extraction is combined with segment start and end time information to create segment-specific features; speaker decisions, where the segments are clustered to create speaker models; and corrections, where frame-level corrections are applied to the information extracted.

Efficient storage of multiple structured codebooks
11176953 · 2021-11-16

Disclosed, inter alia, is an apparatus comprising: a table comprising a plurality of subvectors, wherein each entry of the table is a subvector and each subvector has vector components that are the same as vector components of one or more basis code vectors; and a further table, wherein an entry of the further table comprises a first pointer pointing to a subvector in the table and a second pointer pointing to a subvector in the table, wherein the first pointer and the second pointer are arranged in the further table such that, when vector components of the subvector pointed to by the first pointer are combined with vector components of the subvector pointed to by the second pointer, a basis code vector is formed.
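The two-table layout can be sketched as below. The abstract says only that the components of the two subvectors are "combined"; this sketch assumes the combination is concatenation, with the first pointer supplying the leading components. The table contents and the name `basis_code_vector` are hypothetical.

```python
# Table of subvectors; each may be shared by several basis code vectors.
subvectors = [(1, 0), (0, 1), (1, 1), (0, 0)]

# Further table: each entry holds two pointers (indices) into `subvectors`.
pointer_table = [(0, 1), (2, 3), (1, 1)]


def basis_code_vector(entry):
    """Form a basis code vector by concatenating the two pointed-to subvectors."""
    first_ptr, second_ptr = pointer_table[entry]
    return subvectors[first_ptr] + subvectors[second_ptr]


vectors = [basis_code_vector(e) for e in range(len(pointer_table))]
```

The saving comes from reuse: subvector `(0, 1)` is stored once but appears in two of the three basis code vectors here.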