Patent classifications
G10L19/26
Oversampling in a combined transposer filter bank
The present invention relates to coding of audio signals, and in particular to high frequency reconstruction methods including a frequency domain harmonic transposer. A system and method for generating a high frequency component of a signal from a low frequency component of the signal is described. The system comprises an analysis filter bank (501) comprising an analysis transformation unit (601) having a frequency resolution of Δf; and an analysis window (611) having a duration of D.sub.A; the analysis filter bank (501) being configured to provide a set of analysis subband signals from the low frequency component of the signal; a nonlinear processing unit (502, 650) configured to determine a set of synthesis subband signals based on a portion of the set of analysis subband signals, wherein the portion of the set of analysis subband signals is phase shifted by a transposition order T; and a synthesis filter bank (504) comprising a synthesis transformation unit (602) having a frequency resolution of QΔf; and a synthesis window (612) having a duration of D.sub.S; the synthesis filter bank (504) being configured to generate the high frequency component of the signal from the set of synthesis subband signals; wherein Q is a frequency resolution factor with Q≥1 and smaller than the transposition order T; and wherein the value of the product of the frequency resolution Δf and the duration D.sub.A of the analysis filter bank is selected based on the frequency resolution factor Q.
Encoded audio metadata-based equalization
A system for producing an encoded digital audio recording has an audio encoder that encodes a digital audio recording having a number of audio channels or audio objects. An equalization (EQ) value generator produces a sequence of EQ values which define EQ filtering that is to be applied when decoding the encoded digital audio recording, wherein the EQ filtering is to be applied to a group of one or more of the audio channels or audio objects of the recording independent of any downmix. A bitstream multiplexer combines the encoded digital audio recording with the sequence of EQ values, the latter as metadata associated with the encoded digital audio recording. Other embodiments are also described including a system for decoding the encoded audio recording.
Encoded audio metadata-based equalization
A system for producing an encoded digital audio recording has an audio encoder that encodes a digital audio recording having a number of audio channels or audio objects. An equalization (EQ) value generator produces a sequence of EQ values which define EQ filtering that is to be applied when decoding the encoded digital audio recording, wherein the EQ filtering is to be applied to a group of one or more of the audio channels or audio objects of the recording independent of any downmix. A bitstream multiplexer combines the encoded digital audio recording with the sequence of EQ values, the latter as metadata associated with the encoded digital audio recording. Other embodiments are also described including a system for decoding the encoded audio recording.
BACKWARD-COMPATIBLE INTEGRATION OF HIGH FREQUENCY RECONSTRUCTION TECHNIQUES FOR AUDIO SIGNALS
A method for decoding an encoded audio bitstream is disclosed. The method includes receiving the encoded audio bitstream and decoding the audio data to generate a decoded lowband audio signal. The method further includes extracting high frequency reconstruction metadata and filtering the decoded lowband audio signal with an analysis filterbank to generate a filtered lowband audio signal. The method also includes extracting a flag indicating whether either spectral translation or harmonic transposition is to be performed on the audio data and regenerating a highband portion of the audio signal using the filtered lowband audio signal and the high frequency reconstruction metadata in accordance with the flag.
BACKWARD-COMPATIBLE INTEGRATION OF HIGH FREQUENCY RECONSTRUCTION TECHNIQUES FOR AUDIO SIGNALS
A method for decoding an encoded audio bitstream is disclosed. The method includes receiving the encoded audio bitstream and decoding the audio data to generate a decoded lowband audio signal. The method further includes extracting high frequency reconstruction metadata and filtering the decoded lowband audio signal with an analysis filterbank to generate a filtered lowband audio signal. The method also includes extracting a flag indicating whether either spectral translation or harmonic transposition is to be performed on the audio data and regenerating a highband portion of the audio signal using the filtered lowband audio signal and the high frequency reconstruction metadata in accordance with the flag.
Determining corrections to be applied to a multichannel audio signal, associated coding and decoding
A method and device for determining a set of corrections to be made to a multichannel sound signal, in which the set of corrections is determined on the basis of an item of information representative of a spatial image of an original multichannel signal and an item of information representative of a spatial image of the original multichannel signal that has been coded and then decoded.
METHOD AND APPARATUS FOR ACQUIRING SEMANTIC INFORMATION, ELECTRONIC DEVICE AND STORAGE MEDIUM
A method and an apparatus for acquiring semantic information, an electronic device and a storage medium are provided. The method includes: collecting an echo signal of vibrations of a throat; performing a Fourier transform on a waveform of each period of the echo signal to obtain a spectrogram of each period, wherein the spectrograms of M periods form a spectrogram set, the spectrogram set includes M spectrograms, and the spectrograms are arranged in sequence from first to last according to a return time sequence of the corresponding echo signal; extracting a characteristic waveform of the vibrations of the throat from the spectrogram set; segmenting the characteristic waveform to obtain characteristic segments containing the semantic information; and inputting the characteristic segments into a semantic acquisition model to acquire the semantic information.
METHOD AND APPARATUS FOR ACQUIRING SEMANTIC INFORMATION, ELECTRONIC DEVICE AND STORAGE MEDIUM
A method and an apparatus for acquiring semantic information, an electronic device and a storage medium are provided. The method includes: collecting an echo signal of vibrations of a throat; performing a Fourier transform on a waveform of each period of the echo signal to obtain a spectrogram of each period, wherein the spectrograms of M periods form a spectrogram set, the spectrogram set includes M spectrograms, and the spectrograms are arranged in sequence from first to last according to a return time sequence of the corresponding echo signal; extracting a characteristic waveform of the vibrations of the throat from the spectrogram set; segmenting the characteristic waveform to obtain characteristic segments containing the semantic information; and inputting the characteristic segments into a semantic acquisition model to acquire the semantic information.
SYSTEMS AND METHODS FOR GENERATING VIDEO-ADAPTED SURROUND-SOUND
Audiovisual presentations, such as film recordings, may have been originally created having an audio soundtrack with multiple audio tracks mixed for a surround sound system that includes a set of speakers physically surrounding a user. The present disclosure presents systems and methods to remix these soundtracks into 3D audio that when presented to the ears of a user can be perceived as a virtual surround sound system that mimics the physical system. What is more, the disclosed systems and methods can enhance the virtual surround sound system by adjusting virtual speakers of the virtual surround sound system according to video content of the audiovisual presentation. Further enhancement may be possible by adjusting the virtual speakers of the virtual surround sound system according to a sensed position of a user.
SYSTEMS AND METHODS FOR GENERATING VIDEO-ADAPTED SURROUND-SOUND
Audiovisual presentations, such as film recordings, may have been originally created having an audio soundtrack with multiple audio tracks mixed for a surround sound system that includes a set of speakers physically surrounding a user. The present disclosure presents systems and methods to remix these soundtracks into 3D audio that when presented to the ears of a user can be perceived as a virtual surround sound system that mimics the physical system. What is more, the disclosed systems and methods can enhance the virtual surround sound system by adjusting virtual speakers of the virtual surround sound system according to video content of the audiovisual presentation. Further enhancement may be possible by adjusting the virtual speakers of the virtual surround sound system according to a sensed position of a user.