Patent classifications
G10L19/028
Filling of non-coded sub-vectors in transform coded audio signals
A spectrum filler for filling non-coded residual sub-vectors of a transform coded audio signal includes a sub-vector compressor configured to compress actually coded residual sub-vectors. A sub-vector rejecter is configured to reject compressed residual sub-vectors that do not fulfill a predetermined sparseness criterion. A sub-vector collector is configured to concatenate the remaining compressed residual sub-vectors to form a first virtual codebook. A coefficient combiner is configured to combine pairs of coefficients of the first virtual codebook to form a second virtual codebook. A sub-vector filler is configured to fill non-coded residual sub-vectors below a predetermined frequency with coefficients from the first virtual codebook, and to fill non-coded residual sub-vectors above the predetermined frequency with coefficients from the second virtual codebook.
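The abstract does not give the compression scheme, the sparseness criterion, or the pair-combination rule, so the sketch below assumes plausible stand-ins for each (magnitude clipping, a non-zero-coefficient ratio, and pairwise summation); only the overall pipeline — compress, reject sparse, concatenate into a first virtual codebook, pair-combine into a second, then fill below/above a cutoff frequency from the respective codebook — follows the text.

```python
import numpy as np

def build_virtual_codebooks(coded_subvectors, sparseness_ratio=0.5):
    """Compress coded residual sub-vectors, drop overly sparse ones,
    and form the two virtual codebooks. All numeric choices here are
    illustrative assumptions, not values from the patent."""
    kept = []
    for sv in coded_subvectors:
        # Assumed "compression": clip coefficient magnitudes to +/-1.
        compressed = np.clip(sv, -1.0, 1.0)
        # Assumed sparseness criterion: enough non-zero coefficients.
        if np.count_nonzero(compressed) / len(compressed) >= sparseness_ratio:
            kept.append(compressed)
    vc1 = np.concatenate(kept) if kept else np.zeros(1)
    # Second codebook: combine coefficient pairs (assumed: their sum).
    n = len(vc1) - len(vc1) % 2
    vc2 = vc1[:n:2] + vc1[1:n:2]
    return vc1, vc2

def fill_subvector(subvector_freq, cutoff_freq, length, vc1, vc2, rng):
    """Fill one non-coded sub-vector from VC1 below the cutoff
    frequency and from VC2 above it."""
    vc = vc1 if subvector_freq < cutoff_freq else vc2
    start = rng.integers(0, len(vc))
    idx = (start + np.arange(length)) % len(vc)  # wrap around the codebook
    return vc[idx]
```

Reading filler coefficients from a codebook built out of the actually coded spectrum (rather than injecting plain noise) keeps the filled bands statistically similar to the coded ones.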
Burst frame error handling
Mechanisms for frame loss concealment are provided. A method is performed by a receiving entity. The method comprises adding, in association with constructing a substitution frame for a lost frame, a noise component to the substitution frame. The noise component has a frequency characteristic corresponding to a low-resolution spectral representation of a signal in a previously received frame.
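One way to realize a "low-resolution spectral representation" is a banded magnitude envelope of the last good frame; the sketch below shapes random-phase noise with such an envelope. The band count and the use of averaged per-band magnitudes are assumptions, not details from the abstract.

```python
import numpy as np

def shaped_noise_component(prev_frame, n_bands=8, rng=None):
    """Generate a noise component whose coarse spectral shape follows
    the previously received frame (assumed banded-envelope scheme)."""
    rng = rng or np.random.default_rng()
    mag = np.abs(np.fft.rfft(prev_frame))
    # Low-resolution envelope: average magnitude within each band.
    env = np.empty(len(mag))
    for band in np.array_split(np.arange(len(mag)), n_bands):
        env[band] = mag[band].mean()
    # Random-phase noise carrying that coarse envelope.
    phase = rng.uniform(0.0, 2.0 * np.pi, len(mag))
    return np.fft.irfft(env * np.exp(1j * phase), n=len(prev_frame))
```

Because only the coarse envelope of the old frame is reused, repeated substitution frames in a burst do not replay fine spectral detail, which would otherwise sound metallic.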
SYSTEMS, METHODS, AND APPARATUSES FOR RESTORING DEGRADED SPEECH VIA A MODIFIED DIFFUSION MODEL
Systems, methods, and apparatuses to restore degraded speech via a modified diffusion model are described. An exemplary system is specially configured to: train a diffusion-based vocoder containing an upsampler, based on paired original speech x and degraded speech mel-spectrum m_T samples; and train a deep convolutional neural network (CNN) upsampler, based on a mean absolute error loss, to match the estimated original speech x̂′ output by the diffusion-based vocoder, by extracting the upsampler, generating a reference conditioner, and generating a weighted altered conditioner ĉ′_T.
Method and device for decoding signal
An audio signal decoding device includes a non-transitory memory storage that stores audio data in the form of a bitstream, and an audio decoder by which: a first spectral coefficient of a first sub-band of a current frame of an audio signal is obtained by decoding the bitstream; a first average quantity of allocated bits per spectral coefficient of the first sub-band is obtained; a first noise filling gain for the first sub-band is obtained when the first average quantity is less than a threshold; a second spectral coefficient is reconstructed according to the first noise filling gain; a frequency domain audio signal is obtained according to the first spectral coefficient and the second spectral coefficient; and a time domain audio signal is generated according to the frequency domain audio signal.
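The decision logic — noise-fill a sub-band only when its average bit allocation falls below a threshold — can be sketched as below. The threshold value, the noise-gain value, and the choice to replace only zero-valued coefficients with scaled Gaussian noise are assumptions for illustration.

```python
import numpy as np

def noise_fill_subband(decoded_coeffs, bits_allocated, threshold=1.5,
                       noise_gain=0.3, rng=None):
    """Sketch of the noise-filling step (assumed details): when the
    average number of bits per coefficient in a sub-band is below the
    threshold, zero-valued coefficients are replaced by scaled noise;
    otherwise the decoded coefficients pass through unchanged."""
    rng = rng or np.random.default_rng()
    avg_bits = bits_allocated / len(decoded_coeffs)
    out = decoded_coeffs.astype(float)
    if avg_bits < threshold:
        zeros = out == 0.0
        out[zeros] = noise_gain * rng.standard_normal(np.count_nonzero(zeros))
    return out
```

Sub-bands that received ample bits are left untouched, so noise filling only repairs the spectral holes left by coarse quantization.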
Channel-compensated low-level features for speaker recognition
A system for generating channel-compensated features of a speech signal includes a channel noise simulator that degrades the speech signal, a feed forward convolutional neural network (CNN) that generates channel-compensated features of the degraded speech signal, and a loss function that computes a difference between the channel-compensated features and handcrafted features for the same raw speech signal. Each loss result may be used to update connection weights of the CNN until a predetermined threshold loss is satisfied, and the CNN may be used as a front-end for a deep neural network (DNN) for speaker recognition/verification. The DNN may include convolutional layers, a bottleneck features layer, multiple fully-connected layers and an output layer. The bottleneck features may be used to update connection weights of the convolutional layers, and dropout may be applied to the convolutional layers.
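The training objective and stopping rule described above can be sketched as follows. The use of mean squared error as the "difference" between CNN features and handcrafted features is an assumption (the abstract does not name the distance), and `step_fn` is a hypothetical callable standing in for one CNN weight update.

```python
import numpy as np

def feature_matching_loss(cnn_features, handcrafted_features):
    """Difference between the CNN's channel-compensated features and
    handcrafted features of the same raw signal (MSE is an assumed
    choice of distance)."""
    return np.mean((cnn_features - handcrafted_features) ** 2)

def train_until_threshold(step_fn, loss_threshold, max_steps=1000):
    """Update weights until the loss satisfies a predetermined
    threshold, mirroring the stopping rule in the abstract. step_fn
    is assumed to perform one update and return the current loss."""
    loss = float("inf")
    for step in range(max_steps):
        loss = step_fn()
        if loss <= loss_threshold:
            return step, loss
    return max_steps, loss
```

Once training stops, the converged CNN can be frozen and used as the front-end feeding the speaker-recognition DNN described above.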