Patent classifications
G10L2019/0001
Audio encoder and decoder using a frequency domain processor, a time domain processor, and a cross processing for continuous initialization
An audio encoder for encoding an audio signal includes: a first encoding processor for encoding a first audio signal portion in a frequency domain, wherein the first encoding processor includes: a time frequency converter for converting the first audio signal portion into a frequency domain representation having spectral lines up to a maximum frequency of the first audio signal portion; a spectral encoder for encoding the frequency domain representation; a second encoding processor for encoding a second different audio signal portion in the time domain; a cross-processor for calculating, from the encoded spectral representation of the first audio signal portion, initialization data of the second encoding processor, so that the second encoding processor is initialized to encode the second audio signal portion immediately following the first audio signal portion in time in the audio signal.
AUDIO ENCODING AND DECODING METHOD AND RELATED PRODUCT
An audio decoding method performed by a computer device includes obtaining encoding vectors of audio frames in an audio frame sequence, and performing, in response to a current audio frame in the audio frame sequence being to be decoded, up-sampling on an encoding vector of a historical audio frame to obtain an up-sampling feature value describing the historical audio frame. The historical audio frame includes one or more audio frames decoded before the current audio frame in the audio frame sequence. The method further includes performing, based on the up-sampling feature value, up-sampling on an encoding vector of the current audio frame to obtain decoded data of the current audio frame.
METHOD AND APPARATUS FOR CALCULATING DOWNMIXED SIGNAL AND RESIDUAL SIGNAL
An audio signal encoding method is provided. According to the method, if a current frame is a switching frame, a to-be-encoded downmixed signal and a to-be-encoded residual signal of the subband corresponding to the preset frequency band in the current frame are obtained based on a switch fade-in/fade-out factor of a previous frame, an initial downmixed signal and an initial residual signal of the preset frequency band of the current frame.
VOICE AUDIO COMPRESSION USING NEURAL NETWORKS
Embodiments are disclosed for training an audio processing system to perform high-quality speech audio encoding and decoding using neural networks. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an audio sequence, the audio sequence including speech audio, generating pitch data representing detected pitch within the audio sequence, passing the audio sequence through an audio encoder to generate a vector representation of the audio sequence, generating, by a vector quantizer, an encoded vector representation of the audio sequence using the vector representation of the audio sequence and a codebook of discrete vectors, and reconstructing, by an audio decoder, the audio sequence using the pitch data and the encoded vector representation of the audio sequence.
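The vector quantizer step in the abstract above maps an encoder output onto an entry of a codebook of discrete vectors. A minimal sketch of that lookup follows, assuming a nearest-neighbour rule under squared Euclidean distance; the abstract does not state the distance measure, and the function name `quantize` and the toy codebook are hypothetical.

```python
def quantize(vector, codebook):
    """Return (index, codeword) of the codebook entry closest to `vector`.

    Assumption: closeness is squared Euclidean distance. The abstract only
    says that a codebook of discrete vectors is used.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)),
                key=lambda k: squared_distance(vector, codebook[k]))
    return index, codebook[index]


# Toy codebook with three 2-dimensional entries (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
index, codeword = quantize([0.9, 1.2], codebook)
```

Only `index` would need to be transmitted in such a scheme; the decoder recovers `codeword` from its own copy of the codebook.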
ARTIFICIAL INTELLIGENCE SYSTEM FOR MEDIA ITEM CLASSIFICATION USING TRANSFER LEARNING AND ACTIVE LEARNING
At an artificial intelligence system, training iterations of a first machine learning model are implemented. In a particular iteration, a group of data items are selected from an item collection using active learning, and respective labels selected from a set of tags are obtained for at least some of the items of the group. Using feature processing elements of a different machine learning model, a respective feature set corresponding to individual labeled items is generated in the iteration, and the feature sets are included in a training set used to train the first machine learning model. A trained version of the first machine learning model is stored after a training completion criterion is met.
Speech coding system and method using silence enhancement
Various techniques for speech coding and decoding are disclosed. For example, speech data generated from a speech signal may be decoded by receiving the speech data in a format that has at least one main pulse in a subframe of the speech data, and generating a first predicted pulse that has a lower gain than the main pulse. A second predicted pulse may also be generated as a mirror image of the first predicted pulse on a reverse time scale, on the other side of the main pulse in the subframe of the speech data. The speech signal may be reconstructed using the first predicted pulse and the second predicted pulse.
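The pulse layout described above can be illustrated with a small sketch. It assumes that the first predicted pulse sits a fixed number of samples after the main pulse and is scaled by an attenuation factor below 1, and that the second predicted pulse sits the same distance before the main pulse with the same gain. The parameters `offset` and `attenuation` and the function name are hypothetical; the abstract does not say how the pulse position or gain is derived.

```python
def add_predicted_pulses(subframe_len, main_pos, main_gain, offset, attenuation):
    """Place a main pulse plus two mirrored, lower-gain predicted pulses.

    Assumptions (not from the abstract): the predicted pulses lie `offset`
    samples on either side of the main pulse, and their gain is
    main_gain * attenuation with 0 < attenuation < 1.
    """
    subframe = [0.0] * subframe_len
    subframe[main_pos] = main_gain
    predicted_gain = main_gain * attenuation
    # First predicted pulse after the main pulse, second mirrored before it.
    for pos in (main_pos + offset, main_pos - offset):
        if 0 <= pos < subframe_len:  # drop pulses that fall outside the subframe
            subframe[pos] += predicted_gain
    return subframe


pulses = add_predicted_pulses(subframe_len=8, main_pos=4, main_gain=1.0,
                              offset=2, attenuation=0.5)
```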
Signal codec device and method in communication system
The present invention relates to a codec device and method for encoding/decoding voice and audio signals in a communication system, wherein: a fixed codebook excited signal is generated by using a pulse index for a voice signal; a first adaptive codebook excited signal is generated by using a pitch index for the voice signal; a fixed codebook signal is generated by multiplying the fixed codebook excited signal by a fixed codebook gain; a first adaptive codebook signal is generated by multiplying the first adaptive codebook excited signal by a first adaptive codebook gain; and a synthesized filter excited signal is generated by adding the fixed codebook signal and the first adaptive codebook signal.
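The excitation construction in the abstract above is a sum of two gain-scaled codebook contributions. The sketch below shows that arithmetic, assuming that the pulse index has already been decoded into pulse positions and signs and that the pitch index has been decoded into an integer lag. Those decoding steps, and all function names, are hypothetical and not taken from the abstract.

```python
def fixed_codebook_excitation(pulse_positions, pulse_signs, length):
    """Sparse excitation with signed unit pulses (assumed index decoding)."""
    excitation = [0.0] * length
    for pos, sign in zip(pulse_positions, pulse_signs):
        excitation[pos] += sign
    return excitation


def adaptive_codebook_excitation(past_excitation, pitch_lag, length):
    """Repeat the past excitation at the pitch lag (integer lag assumed)."""
    out = []
    for n in range(length):
        if n < pitch_lag:
            out.append(past_excitation[len(past_excitation) - pitch_lag + n])
        else:
            out.append(out[n - pitch_lag])
    return out


def synthesis_filter_excitation(fixed_exc, fixed_gain, adaptive_exc, adaptive_gain):
    """Add the gain-scaled fixed and adaptive codebook signals sample by sample."""
    return [fixed_gain * c + adaptive_gain * v
            for c, v in zip(fixed_exc, adaptive_exc)]


fixed = fixed_codebook_excitation([0, 3], [1.0, -1.0], length=4)
adaptive = adaptive_codebook_excitation([1.0, 2.0, 3.0, 4.0], pitch_lag=2, length=4)
excitation = synthesis_filter_excitation(fixed, 0.5, adaptive, 0.25)
```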
Apparatus for encoding a speech signal employing ACELP in the autocorrelation domain
An apparatus for encoding a speech signal by determining a codebook vector of a speech coding algorithm is provided. The apparatus includes a matrix determiner for determining an autocorrelation matrix R, and a codebook vector determiner for determining the codebook vector depending on the autocorrelation matrix R. The matrix determiner is configured to determine the autocorrelation matrix R by determining vector coefficients of a vector r, wherein the autocorrelation matrix R includes a plurality of rows and a plurality of columns, wherein the vector r indicates one of the columns or one of the rows of the autocorrelation matrix R, wherein R(i, j) = r(|i - j|), wherein R(i, j) indicates the coefficients of the autocorrelation matrix R, wherein i is a first index indicating one of a plurality of rows of the autocorrelation matrix R, and wherein j is a second index indicating one of the plurality of columns of the autocorrelation matrix R.
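The relation R(i, j) = r(|i - j|) means the whole matrix is determined by a single vector r, giving a symmetric Toeplitz matrix. The sketch below builds R from r. The helper `autocorrelation`, which derives r from a short impulse response h, is an assumption added for illustration; the abstract does not say how r is computed.

```python
def autocorrelation(h, n_lags):
    """r[k] = sum over n of h[n] * h[n + k] (assumed definition, illustrative)."""
    return [sum(h[n] * h[n + k] for n in range(len(h) - k))
            for k in range(n_lags)]


def toeplitz_autocorrelation(r):
    """Build the symmetric Toeplitz matrix with R[i][j] = r[|i - j|]."""
    size = len(r)
    return [[r[abs(i - j)] for j in range(size)] for i in range(size)]


h = [1.0, 0.5, 0.25]          # toy impulse response, illustrative values
r = autocorrelation(h, len(h))
R = toeplitz_autocorrelation(r)
```

Because every diagonal of R is constant, only the vector r needs to be stored, which is the property the abstract relies on.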
COMPRESSION OF DECOMPOSED REPRESENTATIONS OF A SOUND FIELD
In general, techniques are described for compressing decomposed representations of a sound field. A device comprising a memory and processing circuitry may be configured to perform the techniques. The memory may be configured to store a bitstream representative of scene-based audio data, the scene-based audio data comprising ambisonic coefficients representative of a soundfield. The processing circuitry may be configured to process the bitstream to extract foreground components and corresponding foreground directional information, dequantize the corresponding foreground directional information to obtain corresponding dequantized directional information, and obtain, based on the foreground components and the corresponding dequantized foreground directional information, a reconstructed version of the scene-based audio data.
METHODS AND DEVICES FOR VECTOR SEGMENTATION FOR CODING
A method for partitioning of input vectors for coding is presented. The method comprises obtaining an input vector. The input vector is segmented, in a non-recursive manner, into an integer number, N^SEG, of input vector segments. A representation of a respective relative energy difference between parts of the input vector on each side of each boundary between the input vector segments is determined, in a recursive manner. The input vector segments and the representations of the relative energy differences are provided for individual coding. Partitioning units and computer programs for partitioning of input vectors for coding, as well as positional encoders, are presented.
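One way to read the two-stage procedure above is sketched below. It rests on several assumptions that the abstract does not confirm: segments are contiguous and near-equal in length, the recursion always splits the current range of segments at its middle boundary, and the relative energy difference is represented as the left part's share of the total energy. All function names are hypothetical.

```python
def split_segments(vector, n_seg):
    """Non-recursive split into n_seg contiguous, near-equal segments."""
    n = len(vector)
    bounds = [(k * n) // n_seg for k in range(n_seg + 1)]
    return [vector[bounds[k]:bounds[k + 1]] for k in range(n_seg)]


def energy(samples):
    return sum(x * x for x in samples)


def energy_splits(segments, first=0, last=None, out=None):
    """Recursively record, per boundary, the left part's share of the energy.

    Keys of the returned dict are boundary indices: boundary b lies between
    segments b - 1 and b. The share in [0, 1] is an assumed representation.
    """
    if last is None:
        last = len(segments)
    if out is None:
        out = {}
    if last - first < 2:
        return out  # a single segment has no internal boundary
    mid = (first + last) // 2
    left = sum(energy(s) for s in segments[first:mid])
    right = sum(energy(s) for s in segments[mid:last])
    total = left + right
    out[mid] = left / total if total > 0 else 0.5
    energy_splits(segments, first, mid, out)
    energy_splits(segments, mid, last, out)
    return out


segments = split_segments([1.0, 1.0, 2.0, 0.0, 0.0, 2.0, 3.0, 1.0], 4)
splits = energy_splits(segments)
```

In this sketch a vector with N^SEG segments always yields N^SEG - 1 boundary values, one per boundary, which can then be coded separately from the segments themselves.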