Patent classifications
G10L2019/0013
VECTOR QUANTIZER
Vector Quantizer and method therein for vector quantization, e.g. in a transform audio codec. The method comprises comparing an input target vector with four centroids C_0, C_1, C_0,flip and C_1,flip, wherein centroid C_0,flip is a flipped version of centroid C_0 and centroid C_1,flip is a flipped version of centroid C_1, each centroid representing a respective class of codevectors. A starting point for a search related to the input target vector in the codebook is determined based on the comparison. A search is performed in the codebook, starting at the determined starting point, and a codevector is identified to represent the input target vector. The number of input target vectors per block or time segment is variable, and the search space is dynamically adjusted to that number. The codevectors are sorted according to a distortion measure reflecting the distance between each codevector and the centroids C_0 and C_1.
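As a rough illustration, the centroid comparison that selects the search starting point could look like the following sketch (NumPy; the function name, the squared-error distortion, and the reversal used for "flipping" are assumptions, not taken from the patent):

```python
import numpy as np

def nearest_centroid_class(x, c0, c1):
    """Compare the input target vector x with the four centroids
    C_0, C_1 and their flipped (here: reversed) versions, and return
    the index of the closest candidate. The winning class would
    determine the starting point of the codebook search."""
    candidates = [c0, c1, c0[::-1], c1[::-1]]
    distortions = [float(np.sum((x - c) ** 2)) for c in candidates]
    return int(np.argmin(distortions))
```

The returned index (0, 1, 2 or 3) identifies the class of codevectors, sorted by distortion against C_0 and C_1, in which the search begins.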
PYRAMID VECTOR QUANTIZER SHAPE SEARCH
An encoder and a method therein for Pyramid Vector Quantizer (PVQ) shape search, the PVQ taking a target vector x as input and deriving a vector y by iteratively adding unit pulses in an inner dimension search loop. The method comprises, before entering the next inner dimension search loop for unit pulse addition, determining, based on the maximum pulse amplitude maxamp_y of the current vector y, whether more than the current bit word length is needed to represent enloop_y in a lossless manner in the upcoming inner dimension loop. The variable enloop_y is related to an accumulated energy of the vector y. Performing this method enables the encoder to keep the complexity of the search at a reasonable level.
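A minimal sketch of such a pre-loop check, assuming integer energies and assuming that the worst-case energy growth comes from adding one unit pulse at the position of the current maximum amplitude (function and parameter names are illustrative):

```python
def needs_longer_word(enloop_y, maxamp_y, word_bits=16):
    """Adding one unit pulse at the position of maxamp_y raises the
    accumulated energy by at most (maxamp_y + 1)**2 - maxamp_y**2
    = 2*maxamp_y + 1. If that worst case no longer fits in the current
    bit word length, a longer word is needed before entering the
    inner dimension loop."""
    worst_case_energy = enloop_y + 2 * maxamp_y + 1
    return worst_case_energy.bit_length() > word_bits
```

When the check returns True, the encoder would switch to a longer word (or rescale) before the next inner dimension loop, keeping the per-iteration arithmetic lossless without always paying for wide words.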
PACKET LOSS CONCEALMENT IN AN AUDIO DECODER
A method of performing packet loss concealment in a neural audio encoder/decoder (codec) system. The method includes: receiving an indication of a lost audio packet at the receive side of a neural network audio codec system that includes an audio encoder and an audio decoder, wherein the lost audio packet comprises an index of a codeword representative of a portion of speech audio presented to the audio encoder; predicting the index of the codeword in the lost packet to obtain a predicted index; deriving a predicted embedding vector from the predicted index; and decoding, by the audio decoder, the predicted embedding vector to generate an audio output.
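The receive-side flow can be sketched as follows; a trivial repeat-last rule stands in for the neural index predictor, and all names are illustrative rather than taken from the patent:

```python
def conceal_lost_packet(index_history, embedding_codebook):
    """On a packet-loss indication: predict the missing codeword index
    from recently received indices (placeholder predictor), then look up
    the predicted embedding vector that the audio decoder would
    synthesize output audio from."""
    predicted_index = index_history[-1]  # stand-in for a learned predictor
    predicted_embedding = embedding_codebook[predicted_index]
    return predicted_index, predicted_embedding
```

In a real system the predictor would be a trained model conditioned on the index history, and the returned embedding would be fed to the neural audio decoder in place of the lost packet's embedding.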
METHOD AND DEVICE FOR GENERATING SPEECH, STORAGE MEDIUM, AND ELECTRONIC DEVICE
A method for generating speech includes: acquiring a voice feature vector of a voice to be processed and inputting the voice feature vector into a voice generation model to obtain a language unit vector; acquiring a text feature vector and determining, according to the text feature vector and the language unit vector, a feature vector to be processed; inputting the feature vector to be processed into a sequence-to-sequence model to obtain an acoustic feature vector; and inputting the acoustic feature vector into a vocoder to obtain a target voice corresponding to the voice to be processed or the text feature vector.
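The stages above compose into a single pipeline, sketched here with every model passed in as a callable (all parameter names are placeholders, not the patent's terminology):

```python
def generate_target_voice(voice_feature_vec, text_feature_vec,
                          voice_gen_model, combine, seq2seq, vocoder):
    """Sketch of the described pipeline: voice features -> language unit
    vector -> fused feature vector -> acoustic features -> waveform."""
    language_unit_vec = voice_gen_model(voice_feature_vec)  # voice generation model
    fused = combine(text_feature_vec, language_unit_vec)    # feature vector to be processed
    acoustic_vec = seq2seq(fused)                           # sequence-to-sequence model
    return vocoder(acoustic_vec)                            # target voice
```

Passing the stages as callables keeps the sketch independent of any particular model architecture.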
METHODS OF FIXED CODEBOOK SEARCHING FOR AUDIO CODECS
Methods and systems are described for encoding speech. A method may include: receiving, by an audio encoder, an audio signal comprising a plurality of subframes; determining, for a first subframe of the plurality of subframes, a number of fixed codebook (FCB) pulses according to a rate-distortion criterion; selecting, in the subframe, a first set of one or more FCB pulses across the time domain and according to the determined number of FCB pulses; and generating an FCB signal based on the selected first set of FCB pulses.
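A greedy stand-in for the pulse selection step might look like this (NumPy; a real codec searches against a weighted synthesis target rather than the raw signal, and the function name is hypothetical):

```python
import numpy as np

def select_fcb_pulses(target, num_pulses):
    """Place the determined number of unit pulses at the time-domain
    positions where the target has the largest magnitude, with each
    pulse sign matching the target sign, and return the FCB signal."""
    positions = np.argsort(-np.abs(target))[:num_pulses]
    fcb = np.zeros_like(target)
    fcb[positions] = np.sign(target[positions])
    return fcb
```

The number of pulses per subframe would come from the rate-distortion decision in the previous step, so subframes with more perceptually important content receive more pulses.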
Speech encoding and decoding methods and apparatuses, computer device, and storage medium
This application relates to a speech encoding method performed by a computer device, the method including: performing subband decomposition on a target speech signal to obtain a plurality of subband excitation signals; obtaining an auditory perception representational value corresponding to each subband excitation signal; determining at least one first subband excitation signal and at least one second subband excitation signal from the plurality of subband excitation signals; obtaining a gain of each first subband excitation signal relative to a preset reference excitation signal as the encoding parameter corresponding to that first subband excitation signal; obtaining, for each second subband excitation signal, a corresponding encoding parameter by quantizing the signal; and encoding each subband excitation signal based on its corresponding encoding parameter.
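For the first-type subbands, the gain relative to the preset reference excitation could be computed as a simple energy ratio, sketched below (NumPy; the patent's exact gain definition may differ and the names are placeholders):

```python
import numpy as np

def subband_gain(subband_excitation, reference_excitation):
    """Gain of a first-type subband excitation relative to the preset
    reference excitation, here defined as the square root of the
    energy ratio between the two signals."""
    sub_energy = float(np.sum(subband_excitation ** 2))
    ref_energy = float(np.sum(reference_excitation ** 2))
    return (sub_energy / ref_energy) ** 0.5
```

Encoding only a scalar gain for these subbands, while fully quantizing the second-type subbands, is what lets the scheme spend bits according to auditory importance.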