Patent classifications
G10L19/06
METHODS AND APPARATUS TO PERFORM AUDIO WATERMARKING AND WATERMARK DETECTION AND EXTRACTION
Methods and apparatus to perform audio watermarking and watermark detection and extraction are disclosed. Example apparatus disclosed herein select frequency components to be used to represent a code, with different sets of frequency components representing different information. Each frequency component in a set is located in a respective code band; there are multiple code bands, and the spacing between adjacent code bands is equal to or less than the spacing between adjacent frequency components within a code band. The disclosed example apparatus also synthesize the frequency components used to represent the code, combine the synthesized frequency components with an audio block of an audio signal, and output the audio signal and a video signal associated with the audio signal.
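The band and component layout described above can be illustrated with a minimal sketch. Everything numeric here is an assumption for illustration, not taken from the patent: the sample rate, block size, number of code bands, components per band, and the hypothetical helpers `band_indices`, `select_components`, `synthesize`, and `embed`. Each symbol picks one frequency index per code band, the bands are laid out so the gap between adjacent bands equals the in-band component spacing, and the resulting sinusoids are added at low amplitude to one audio block.

```python
import math

# Hypothetical parameters chosen for illustration only.
SAMPLE_RATE = 48_000        # Hz
BLOCK_SIZE = 512            # samples per audio block
NUM_BANDS = 4               # number of code bands
COMPONENTS_PER_BAND = 4     # candidate frequency indices per band
COMPONENT_SPACING = 4       # index spacing between components inside a band
FIRST_INDEX = 40            # index of the first component in band 0


def index_to_hz(k: int) -> float:
    """Frequency of index k when one block spans BLOCK_SIZE samples."""
    return k * SAMPLE_RATE / BLOCK_SIZE


def band_indices(band: int) -> list[int]:
    """Frequency indices in one code band.

    Bands are COMPONENTS_PER_BAND * COMPONENT_SPACING wide, so the gap from the
    last component of one band to the first of the next equals COMPONENT_SPACING,
    i.e. band spacing <= component spacing.
    """
    start = FIRST_INDEX + band * COMPONENTS_PER_BAND * COMPONENT_SPACING
    return [start + i * COMPONENT_SPACING for i in range(COMPONENTS_PER_BAND)]


def select_components(symbol: int) -> list[int]:
    """One index per code band; different symbols yield disjoint sets."""
    if not 0 <= symbol < COMPONENTS_PER_BAND:
        raise ValueError("symbol out of range")
    return [band_indices(b)[symbol] for b in range(NUM_BANDS)]


def synthesize(indices: list[int], amplitude: float) -> list[float]:
    """Sum of equal-amplitude sinusoids at the given frequency indices."""
    return [
        amplitude * sum(math.sin(2 * math.pi * k * n / BLOCK_SIZE) for k in indices)
        for n in range(BLOCK_SIZE)
    ]


def embed(audio_block: list[float], symbol: int, amplitude: float = 0.01) -> list[float]:
    """Add the synthesized code components for `symbol` to one audio block."""
    if len(audio_block) != BLOCK_SIZE:
        raise ValueError("audio block has the wrong length")
    tone = synthesize(select_components(symbol), amplitude)
    return [a + t for a, t in zip(audio_block, tone)]
```

For example, `embed([0.0] * BLOCK_SIZE, 0)` returns the pure code signal for symbol 0, whose components sit at indices 40, 56, 72 and 88 (3750 Hz and up under these assumed parameters). A real embedder would also scale the amplitude to stay masked by the host audio, which this sketch does not attempt.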
Pitch emphasis apparatus, method and program for the same
Provided is pitch enhancement processing that introduces little unnaturalness even in time segments containing consonants, and that causes listeners little unnaturalness from discontinuities even when consonant time segments and other time segments alternate frequently. As the pitch enhancement processing, a pitch emphasis apparatus does the following for a time segment in which the spectral envelope of the signal has been determined to be flat: for each time in the segment, it obtains an output signal that includes the sum of (1) the signal at the time T₀ samples earlier, where T₀ is the number of samples corresponding to the pitch period of the segment, multiplied by the pitch gain σ₀ of the segment, a predetermined constant B₀, and a value greater than 0 and less than 1, and (2) the signal at the current time.
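Read literally, the flat-envelope case above amounts to y(n) = x(n) + c·B₀·σ₀·x(n − T₀) with 0 < c < 1. The sketch below implements only that term. It assumes the delayed sample is taken from the input signal (not from earlier output), that samples before the start of the signal are zero, and that `c`, `b0`, and the function name are hypothetical placeholders; how the patent chooses c or handles non-flat segments is not shown.

```python
def pitch_emphasize_flat_segment(x, start, end, t0, sigma0, b0, c):
    """Enhance x[start:end], a segment whose spectral envelope was judged flat.

    Computes y[n] = x[n] + c * b0 * sigma0 * x[n - t0] with 0 < c < 1.
    Assumption: samples before index 0 of x are treated as zero.
    """
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie strictly between 0 and 1")
    if t0 <= 0:
        raise ValueError("pitch period t0 must be a positive sample count")
    gain = c * b0 * sigma0          # combined weight on the pitch-delayed sample
    out = []
    for n in range(start, end):
        past = x[n - t0] if n - t0 >= 0 else 0.0
        out.append(x[n] + gain * past)
    return out
```

With an impulse `[1.0, 0, 0, 0, 0, 0]`, `t0=2`, `sigma0=0.8`, `b0=0.5` and `c=0.5`, the combined gain is 0.2, so a single attenuated echo appears two samples after the impulse. Keeping c strictly below 1 bounds the enhancement, which is consistent with the abstract's aim of avoiding audible artifacts in consonant segments.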
Audio or video encoder, audio or video decoder and related methods for processing multi-channel audio or video signals using a variable prediction direction
An encoder/decoder is based on combining two audio or video channels to obtain a first combination signal as a mid signal, and a prediction residual signal derived using a side signal predicted from the mid signal. A decoder uses the prediction residual signal, the first combination signal, a prediction direction indicator and prediction information to derive decoded first channel and second channel signals. A real-to-imaginary transform can be applied to estimate the imaginary part of the spectrum of the first combination signal. To compute the prediction signal used in deriving the prediction residual signal, the real-valued first combination signal is multiplied by the real part of the complex prediction information, and the estimated imaginary part of the first combination signal is multiplied by the imaginary part of the complex prediction information.
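The complex prediction and the prediction direction indicator can be sketched on spectral coefficient lists. Assumptions: mid and side are defined as (L + R)/2 and (L − R)/2; the boolean `side_from_mid` stands in for the prediction direction indicator; the complex prediction information is a single Python `complex` value `alpha`; and `estimate_imag` is a crude hypothetical stand-in for the real-to-imaginary transform, not the filter any codec actually uses. Because the decoder recomputes the same imaginary estimate from the transmitted first combination signal, the round trip is exact up to floating-point rounding.

```python
def estimate_imag(real):
    """Crude hypothetical stand-in for a real-to-imaginary (e.g. MDCT-to-MDST) estimate."""
    n = len(real)
    return [((real[k + 1] if k + 1 < n else 0.0) - (real[k - 1] if k > 0 else 0.0)) / 2
            for k in range(n)]


def predict(first, alpha):
    """Prediction = Re(alpha) * first + Im(alpha) * estimated imaginary part of first."""
    imag = estimate_imag(first)
    return [alpha.real * f + alpha.imag * i for f, i in zip(first, imag)]


def encode(left, right, alpha, side_from_mid=True):
    """Return (first combination signal, prediction residual)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    # The direction flag decides which signal is sent and which is predicted.
    first, target = (mid, side) if side_from_mid else (side, mid)
    residual = [t - p for t, p in zip(target, predict(first, alpha))]
    return first, residual


def decode(first, residual, alpha, side_from_mid=True):
    """Rebuild the predicted signal, then the left and right channels."""
    target = [r + p for r, p in zip(residual, predict(first, alpha))]
    mid, side = (first, target) if side_from_mid else (target, first)
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

An encoder would choose `alpha` and the direction per band to minimize residual energy. This sketch takes both as given, since the abstract does not specify how they are estimated.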
Audio or video encoder, audio or video decoder and related methods for processing multi-channel audio or video signals using a variable prediction direction
An encoder/decoder is based on a combination of two audio or video channels to obtain a first combination signal as a mid-signal and a residual signal derivable using a predicted side signal derived from the mid-signal. A decoder uses the prediction residual signal, the first combination signal, a prediction direction indicator and prediction information to derive decoded first channel and second channel signals. A real-to-imaginary transform can be applied for estimating the imaginary part of the spectrum of the first combination signal. The prediction signal used in the derivation of the prediction residual signal, the real-valued first combination signal is multiplied by a real portion of the complex prediction information and the estimated imaginary part of the first combination signal is multiplied by an imaginary portion of the complex prediction information.
AUDIO SIGNAL ENCODING METHOD AND APPARATUS, AND AUDIO SIGNAL DECODING METHOD AND APPARATUS
An audio signal encoding method and apparatus, and an audio signal decoding method and apparatus, are described. The encoding method includes obtaining a target frequency-domain coefficient of a current frame and a reference target frequency-domain coefficient of the current frame. The encoding method further includes calculating a cost function based on the target frequency-domain coefficient and the reference target frequency-domain coefficient of the current frame, where the cost function is for determining whether to perform long-term prediction (LTP) processing on the current frame during encoding of the target frequency-domain coefficient of the current frame. Additionally, the method includes encoding the target frequency-domain coefficient of the current frame based on the cost function.
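The abstract does not define the cost function, so the following is only one plausible choice, labeled hypothetical: the energy of the residual left after optimally scaling the reference coefficients, divided by the energy of the target coefficients. LTP is enabled when this ratio falls below an assumed threshold of 0.5. The names `ltp_cost` and `encode_frame` and the threshold are illustrative; quantization and entropy coding of the returned coefficients are not shown.

```python
def ltp_cost(target, reference):
    """Hypothetical cost: residual energy / target energy after optimal scalar prediction.

    Returns (cost, gain). A cost near 0 means the reference predicts the target well.
    """
    energy_t = sum(t * t for t in target)
    energy_r = sum(r * r for r in reference)
    if energy_t == 0.0 or energy_r == 0.0:
        return 1.0, 0.0                      # nothing to predict, or nothing to predict from
    gain = sum(t * r for t, r in zip(target, reference)) / energy_r
    residual_energy = sum((t - gain * r) ** 2 for t, r in zip(target, reference))
    return residual_energy / energy_t, gain


def encode_frame(target, reference, threshold=0.5):
    """Decide on LTP for the current frame and return the coefficients to be coded."""
    cost, gain = ltp_cost(target, reference)
    use_ltp = cost < threshold
    if use_ltp:
        coeffs = [t - gain * r for t, r in zip(target, reference)]
    else:
        coeffs = list(target)
    return use_ltp, (gain if use_ltp else 0.0), coeffs
```

When the reference matches the target exactly, the cost is 0 and only a zero residual remains to be coded; when they are orthogonal, the cost is 1 and the frame is coded without LTP.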
AUDIO SIGNAL ENCODING METHOD AND APPARATUS, AND AUDIO SIGNAL DECODING METHOD AND APPARATUS
An audio signal encoding method and apparatus, and an audio signal decoding method and apparatus, are described. The encoding method includes obtaining a target frequency-domain coefficient of a current frame and a reference target frequency-domain coefficient of the current frame. The encoding method further includes calculating a cost function based on the target frequency-domain coefficient and the reference target frequency-domain coefficient of the current frame, where the cost function is for determining whether to perform long-term prediction (LTP) processing on the current frame during encoding of the target frequency-domain coefficient of the current frame. Additionally, the method includes encoding the target frequency-domain coefficient of the current frame based on the cost function.
Audio processing for voice encoding and decoding using spectral shaper model
The present disclosure relates to an audio encoding and decoding (codec) system for voice encoding/decoding using a spectral shaper model. In an embodiment, a method of audio signal decoding comprises: receiving a bit stream associated with an audio signal, the bit stream including encoded transform coefficients, spectral envelope data and one or more parameters of a spectral shaper model, the spectral shaper model indicative of a fundamental frequency of a multi-sinusoidal signal model, where the fundamental frequency corresponds to a time domain delay; decoding the encoded transform coefficients; adjusting the decoded transform coefficients using the spectral envelope data and the spectral shaper model; reconstructing transform coefficients of the audio signal using the adjusted, decoded transform coefficients; and transforming the reconstructed transform coefficients into a time domain audio signal.
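The link between a time-domain delay and a fundamental frequency is f₀ = f_s / T, with T in samples, and the harmonics of f₀ fall at regular bin intervals in a transform domain. The sketch below assumes a hypothetical comb-shaped shaper, 1 + depth · cos(π · b · T / N) for bin b of an N-bin transform spanning 0 to f_s/2 (bin b approximated as lying at b · f_s / (2N)). It peaks at bins 2Nk/T, which are the harmonics k · f₀. The shaper form, the `depth` parameter, and the function names are assumptions, not the disclosed model; the sketch covers only the "adjust the decoded coefficients" step.

```python
import math


def f0_from_delay(delay_samples, sample_rate):
    """Fundamental frequency (Hz) whose period equals the time-domain delay."""
    return sample_rate / delay_samples


def shaper_gains(num_bins, delay_samples, depth):
    """Hypothetical comb-shaped gains peaking at harmonics of f0 = fs / delay.

    Bin b is approximated as lying at b * fs / (2 * num_bins), so the peaks fall
    at b = 2 * num_bins * k / delay_samples.
    """
    if not 0.0 <= depth < 1.0:
        raise ValueError("depth must be in [0, 1) to keep gains positive")
    return [1.0 + depth * math.cos(math.pi * b * delay_samples / num_bins)
            for b in range(num_bins)]


def adjust(decoded, envelope, delay_samples, depth):
    """Scale decoded coefficients by the spectral envelope and the shaper gains."""
    gains = shaper_gains(len(decoded), delay_samples, depth)
    return [c * e * g for c, e, g in zip(decoded, envelope, gains)]
```

For instance, at 16 kHz a delay of 160 samples corresponds to f₀ = 100 Hz; with 320 bins (25 Hz each), the harmonics land on every fourth bin, which is where the gains reach 1 + depth.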
Audio processing for voice encoding and decoding using spectral shaper model
The present disclosure relates to an audio encoding and decoding (codec) system for voice encoding/decoding using a spectral shaper model. In an embodiment, a method of audio signal decoding comprises: receiving a bit stream associated with an audio signal, the bit stream including encoded transform coefficients, spectral envelope data and one or more parameters of a spectral shaper model, the spectral shaper model indicative of a fundamental frequency of a multi-sinusoidal signal model, where the fundamental frequency corresponds to a time domain delay; decoding the encoded transform coefficients; adjusting the decoded transform coefficients using the spectral envelope data and the spectral shaper model; reconstructing transform coefficients of the audio signal using the adjusted, decoded transform coefficients; and transforming the reconstructed transform coefficients into a time domain audio signal.
Methods and apparatus for rate quality scalable coding with generative models
Described herein is a method of decoding an audio or speech signal, the method including the steps of: (a) receiving, by a decoder, a coded bitstream including the audio or speech signal and conditioning information; (b) providing, by a bitstream decoder, decoded conditioning information in a format associated with a first bitrate; (c) converting, by a converter, the decoded conditioning information from the format associated with the first bitrate to a format associated with a second bitrate; and (d) providing, by a generative neural network, a reconstruction of the audio or speech signal according to a probabilistic model conditioned by the conditioning information in the format associated with the second bitrate. Also described are an apparatus for decoding an audio or speech signal, a corresponding encoder, a system comprising the encoder and the decoding apparatus, and a corresponding computer program product.
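Steps (b) and (c) can be illustrated with a hypothetical conditioning layout, since the abstract does not describe the actual formats. The sketch assumes that a format is characterized by a feature count and a quantization step, that step (b) dequantizes integer indices, and that the converter in step (c) zero-fills or truncates to the dimensionality the generative network was trained on. `ConditioningFormat`, `decode_conditioning`, and `convert` are illustrative names. The network of step (d) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ConditioningFormat:
    """Hypothetical description of the conditioning layout at one bitrate."""
    num_features: int   # e.g. spectral features per frame
    step: float         # quantization step size used at this bitrate


def decode_conditioning(indices, fmt):
    """Step (b): dequantize integer indices into features in the first-bitrate format."""
    if len(indices) != fmt.num_features:
        raise ValueError("index count does not match the format")
    return [i * fmt.step for i in indices]


def convert(features, target):
    """Step (c): map features to the layout expected at the second bitrate.

    Missing trailing features are zero-filled; surplus ones are dropped.
    """
    n = target.num_features
    return (list(features) + [0.0] * n)[:n]
```

For example, three features decoded at a low bitrate can be padded to five so that a single network trained on the five-feature layout can serve several bitrates. This is one way to read "rate quality scalable"; the abstract does not confirm it.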
Methods and apparatus for rate quality scalable coding with generative models
Described herein is a method of decoding an audio or speech signal, the method including the steps of: (a) receiving, by a decoder, a coded bitstream including the audio or speech signal and conditioning information; (b) providing, by a bitstream decoder, decoded conditioning information in a format associated with a first bitrate; (c) converting, by a converter, the decoded conditioning information from the format associated with the first bitrate to a format associated with a second bitrate; and (d) providing, by a generative neural network, a reconstruction of the audio or speech signal according to a probabilistic model conditioned by the conditioning information in the format associated with the second bitrate. Described are further an apparatus for decoding an audio or speech signal, a respective encoder, a system of the encoder and the apparatus for decoding an audio or speech signal as well as a respective computer program product.