Patent classifications
G10L2019/0004
Methods, encoder and decoder for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates
Methods, an encoder and a decoder are configured for transition between frames with different internal sampling rates. Linear predictive (LP) filter parameters are converted from a sampling rate S1 to a sampling rate S2. A power spectrum of a LP synthesis filter is computed, at the sampling rate S1, using the LP filter parameters. The power spectrum of the LP synthesis filter is modified to convert it from the sampling rate S1 to the sampling rate S2. The modified power spectrum of the LP synthesis filter is inverse transformed to determine autocorrelations of the LP synthesis filter at the sampling rate S2. The autocorrelations are used to compute the LP filter parameters at the sampling rate S2.
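The conversion chain in this abstract (power spectrum at S1, spectrum modification, inverse transform to autocorrelations, LP parameters at S2) can be illustrated with a short stdlib-only sketch. It assumes direct-form LP coefficients with A(z) = 1 + a1 z^-1 + ... + ap z^-p, a half-spectrum sampled on a uniform grid, truncation of the grid when S2 < S1, and, as a loudly flagged assumption, filling the missing band with the value at S1/2 when S2 > S1. Levinson-Durbin is used as a standard way to turn autocorrelations back into LP coefficients; none of the function names come from the patent.

```python
import math


def lp_power_spectrum(a, num_bins):
    """Sample 1/|A(e^{jw})|^2 at w_k = pi*k/num_bins, k = 0..num_bins."""
    spectrum = []
    for k in range(num_bins + 1):
        w = math.pi * k / num_bins
        re = sum(c * math.cos(w * n) for n, c in enumerate(a))
        im = -sum(c * math.sin(w * n) for n, c in enumerate(a))
        spectrum.append(1.0 / (re * re + im * im))
    return spectrum


def convert_spectrum(spectrum, s1, s2):
    """Keep the bin spacing in Hz, but make the last bin sit at s2/2."""
    k1 = len(spectrum) - 1
    k2 = round(k1 * s2 / s1)
    if k2 <= k1:
        return spectrum[:k2 + 1]  # S2 < S1: drop bins above s2/2
    # S2 > S1: ASSUMPTION, extend with the spectrum value at s1/2.
    return spectrum + [spectrum[-1]] * (k2 - k1)


def spectrum_to_autocorr(spectrum, order):
    """Inverse DFT of a real, even power spectrum given as a half spectrum."""
    k = len(spectrum) - 1
    r = []
    for i in range(order + 1):
        acc = spectrum[0] + (-1) ** i * spectrum[k]
        acc += 2.0 * sum(spectrum[m] * math.cos(math.pi * i * m / k)
                         for m in range(1, k))
        r.append(acc / (2 * k))
    return r


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) from autocorrelations r[0..order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        refl = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + refl * prev[i - j]
        a[i] = refl
        err *= 1.0 - refl * refl
    return a


def convert_lp_filter(a, s1, s2, num_bins=256):
    """Convert LP coefficients from sampling rate s1 to sampling rate s2."""
    order = len(a) - 1
    p_s1 = lp_power_spectrum(a, num_bins)
    p_s2 = convert_spectrum(p_s1, s1, s2)
    r = spectrum_to_autocorr(p_s2, order)
    return levinson_durbin(r, order)
```

With S1 = S2 the chain should return the original coefficients up to time-domain aliasing from the finite spectral grid, which is a convenient sanity check; for example, `convert_lp_filter([1.0, -1.2, 0.5], 12800, 16000)` converts a second-order filter between two example internal rates. Choosing `num_bins` so that `num_bins * s2 / s1` is an integer keeps the grid exact.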
Spatial audio parameter encoding and associated decoding
An apparatus comprising means configured to: obtain at least one direction parameter value for a time-frequency part of at least one audio signal; obtain at least one energy ratio for the time-frequency part, wherein each energy ratio is associated with a respective direction parameter value; generate respective at least one modified energy ratio from the at least one energy ratio for the time-frequency part; determine a quantization spatial resolution for encoding the at least one obtained direction parameter value based on the at least one modified energy ratio; and encode the obtained direction parameter values based on the quantization spatial resolution.
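The flow described here (energy ratio, modified ratio, quantization resolution, direction encoding) can be sketched as follows. Everything specific is hypothetical: the ratio modification (each later ratio expressed relative to the energy left after earlier directions), the linear ratio-to-bits rule, and quantizing azimuth only on a uniform 2**bits grid. The abstract does not state any of these rules; the sketch only shows the idea that a larger modified energy ratio buys a finer spatial resolution.

```python
def modify_energy_ratios(ratios):
    """HYPOTHETICAL modification: each ratio relative to the energy not
    already claimed by earlier directions, clamped to [0, 1]."""
    modified, remaining = [], 1.0
    for r in ratios:
        m = r / remaining if remaining > 1e-9 else 0.0
        modified.append(min(max(m, 0.0), 1.0))
        remaining -= r
    return modified


def bits_for_ratio(ratio, min_bits=3, max_bits=9):
    """HYPOTHETICAL resolution rule: more directional energy, more bits."""
    return min_bits + round((max_bits - min_bits) * ratio)


def quantize_azimuth(azimuth_deg, bits):
    """Index on a uniform azimuth grid with 2**bits points over 360 degrees."""
    points = 1 << bits
    step = 360.0 / points
    return round((azimuth_deg % 360.0) / step) % points


def encode_directions(azimuths_deg, ratios):
    """Return (index, bits) for each direction of one time-frequency tile."""
    encoded = []
    for az, m in zip(azimuths_deg, modify_energy_ratios(ratios)):
        bits = bits_for_ratio(m)
        encoded.append((quantize_azimuth(az, bits), bits))
    return encoded
```

For a single fully directional source, `encode_directions([90.0], [1.0])` spends the maximum 9 bits; a diffuse tile with a small ratio would fall back to a coarse 3-bit grid.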
Error resilient tools for audio encoding/decoding
There are provided examples of audio signal representation encoders, audio encoders, audio signal representation decoders, and audio decoders, in particular using error resilient tools, e.g. for learnable applications.
In one example, there is provided an audio signal representation decoder configured to decode an audio signal representation from a bitstream, the bitstream being divided into a sequence of packets, the audio signal representation decoder comprising: a bitstream reader, configured to sequentially read the sequence of packets; a packet loss controller, configured to check whether a current packet is well received or is to be considered as lost; a quantization index converter, configured, in case the packet loss controller has determined that the current packet is well received, to convert at least one index extracted from the current packet into at least one current code from at least one codebook, thereby forming at least one portion of the audio signal representation; and wherein the audio signal representation decoder is configured, in case the packet loss controller has determined that the current packet is to be considered as lost, to generate, through at least one learnable predictor layer, at least one current code by prediction from at least one preceding code or index, thereby forming at least one portion of the audio signal representation.
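The decoder's two branches (codebook lookup for a well-received packet, prediction from preceding codes for a lost one) can be sketched as below. This is a minimal illustration under several assumptions: each packet carries exactly one codebook index, a lost packet is represented as `None`, and the learnable predictor layer is replaced by a hypothetical `LinearCodePredictor` with fixed weights standing in for trained ones.

```python
from typing import List, Optional, Sequence

Code = List[float]


class LinearCodePredictor:
    """HYPOTHETICAL stand-in for the learnable predictor layer: a weighted
    sum of the most recent codes (a real system would learn the weights)."""

    def __init__(self, weights: Sequence[float]):
        self.weights = list(weights)  # weights[0] multiplies the newest code

    def predict(self, history: List[Code]) -> Code:
        dim = len(history[-1])
        out = [0.0] * dim
        for w, code in zip(self.weights, reversed(history)):
            for d in range(dim):
                out[d] += w * code[d]
        return out


def decode_packets(packets: Sequence[Optional[int]],
                   codebook: Sequence[Code],
                   predictor: LinearCodePredictor) -> List[Code]:
    """packets[i] is a codebook index, or None when the packet is lost."""
    history: List[Code] = []
    for index in packets:
        if index is not None:        # well received: index -> codebook code
            code = list(codebook[index])
        elif history:                # lost: predict from preceding codes
            code = predictor.predict(history)
        else:                        # lost before any history: zero code
            code = [0.0] * len(codebook[0])
        history.append(code)
    return history
```

Predicted codes are appended to the history, so a burst of consecutive losses keeps extrapolating from the most recent (possibly predicted) codes.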
Personalized bandwidth extension
A method for personalized bandwidth extension in an audio device. The method comprises obtaining an input microphone signal with a first bandwidth, obtaining a first user parameter indicative of one or more characteristics of a user of the audio device, determining, based on the first user parameter, a bandwidth extension model, and generating an output signal with a second bandwidth by applying the determined bandwidth extension model to the input microphone signal.
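The method's three steps (obtain a user parameter, choose a bandwidth extension model from it, apply the model to widen the bandwidth) can be sketched with a deliberately simple model. The assumptions are loud: the user parameter is a hypothetical voice-type label, the "model" is just a high-band gain looked up in a hypothetical table, and the extension itself is classic spectral folding (zero-insertion upsampling by 2, then 3-tap low-pass and high-pass filters) rather than anything the abstract specifies.

```python
HIGH_BAND_GAIN = {"low_voice": 0.3, "default": 0.5, "high_voice": 0.7}  # hypothetical


def upsample_zero_insert(x):
    """Double the sampling rate; the spectrum above the old Nyquist is a mirror image."""
    u = []
    for s in x:
        u.extend([s, 0.0])
    return u


def fir3(u, taps):
    """Centered 3-tap FIR filter with zero padding at both edges."""
    out = []
    for i in range(len(u)):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i + 1 < len(u) else 0.0
        out.append(taps[0] * left + taps[1] * u[i] + taps[2] * right)
    return out


def select_bwe_model(user_parameter):
    """Map the (hypothetical) user parameter to a high-band gain."""
    return HIGH_BAND_GAIN.get(user_parameter, HIGH_BAND_GAIN["default"])


def personalized_bwe(x, user_parameter):
    gain = select_bwe_model(user_parameter)
    u = upsample_zero_insert(x)
    low = fir3(u, (0.5, 1.0, 0.5))     # linear interpolation keeps the original band
    high = fir3(u, (-0.5, 1.0, -0.5))  # extracts the folded (mirrored) high band
    return [lo + gain * hi for lo, hi in zip(low, high)]
```

With a gain of 0 the output is plain linear interpolation (no new bandwidth), and with a gain of 1 the full folded image is kept, so the per-user gain controls how much synthetic high band is added. For example, `personalized_bwe([1.0, 1.0], "unknown")` falls back to the default gain of 0.5.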
Attention-based video token generation
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a video output using an autoregressive token generation neural network model. In one aspect, a system performs operations comprising obtaining a model input; processing the model input to generate an input sequence of embeddings that represents the model input; autoregressively generating a plurality of output sequences of tokens, wherein each output sequence of tokens corresponds to a respective output modality from a set of a plurality of modalities that includes a video modality and one or more other modalities; and generating a model output that includes a video output of the video modality by decoding the output sequence of tokens for the video modality.
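The autoregressive loop in this abstract (embed the input, then repeatedly predict the next token and feed it back, once per output modality) can be sketched with a toy model. The network is a hypothetical `ToyAttentionModel` (one causal attention head over random, seeded embeddings plus a linear readout), modalities are assumed to own disjoint slices of one vocabulary, modalities are generated one after another, and decoding video tokens into frames is left out. It illustrates the control flow only, not a trained model.

```python
import math
import random


class ToyAttentionModel:
    """HYPOTHETICAL stand-in network: single causal attention head + readout."""

    def __init__(self, vocab_size, dim=8, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.embed = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]
        self.readout = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

    def next_logits(self, seq):
        # The last position attends to itself and every earlier position.
        query = seq[-1]
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(self.dim) for key in seq]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        total = sum(weights)
        context = [sum(w * e[d] for w, e in zip(weights, seq)) / total for d in range(self.dim)]
        return [sum(c * r for c, r in zip(context, row)) for row in self.readout]


def generate(model, prompt_tokens, modality_ranges, steps):
    """Greedily emit `steps` tokens per modality; each modality owns a
    (start, end) slice of the vocabulary (slices are hypothetical)."""
    seq = [model.embed[t] for t in prompt_tokens]  # input sequence of embeddings
    outputs = {}
    for modality, (start, end) in modality_ranges.items():
        tokens = []
        for _ in range(steps):
            logits = model.next_logits(seq)
            token = max(range(start, end), key=lambda t: logits[t])
            tokens.append(token)
            seq.append(model.embed[token])  # feed the token back: autoregressive
        outputs[modality] = tokens
    return outputs
```

A call such as `generate(ToyAttentionModel(12, seed=1), [0, 3, 5], {"video": (0, 8), "audio": (8, 12)}, steps=4)` returns four video tokens and four audio tokens; a real system would then pass the video tokens to a video decoder.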