G10L19/022

GENERATING AUDIO WAVEFORMS USING ENCODER AND DECODER NEURAL NETWORKS

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing an input audio waveform using a generator neural network to generate an output audio waveform. In one aspect, a method comprises: receiving an input audio waveform; processing the input audio waveform using an encoder neural network to generate a set of feature vectors representing the input audio waveform; and processing the set of feature vectors representing the input audio waveform using a decoder neural network to generate an output audio waveform that comprises a respective output audio sample for each of a plurality of output time steps.
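The data flow in this abstract (waveform, then a set of feature vectors, then one output sample per output time step) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the frame-wise linear layers, the ReLU, the random untrained weights, and the names `make_linear`, `encode`, and `decode` are hypothetical and are not taken from the patent.

```python
import random

def make_linear(n_in, n_out, rng):
    """Random, untrained weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def apply_linear(weights, vec):
    """Matrix-vector product using plain Python lists."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def encode(waveform, enc_w, frame_size):
    """Hypothetical encoder: one ReLU feature vector per non-overlapping frame."""
    frames = [waveform[i:i + frame_size]
              for i in range(0, len(waveform) - frame_size + 1, frame_size)]
    return [[max(0.0, a) for a in apply_linear(enc_w, frame)] for frame in frames]

def decode(features, dec_w):
    """Hypothetical decoder: maps each feature vector back to a frame of samples."""
    output = []
    for feature_vector in features:
        output.extend(apply_linear(dec_w, feature_vector))
    return output

rng = random.Random(0)
enc_w = make_linear(4, 8, rng)   # frame of 4 samples -> 8 features
dec_w = make_linear(8, 4, rng)   # 8 features -> frame of 4 samples
wave = [0.1 * i for i in range(16)]
feats = encode(wave, enc_w, frame_size=4)
out = decode(feats, dec_w)       # one output sample per output time step
```

With untrained weights the output is not meaningful audio; the sketch only shows the shapes involved.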

METHOD AND APPARATUS FOR RECONSTRUCTING VOICE CONVERSATION
20230223032 · 2023-07-13

A voice conversation reconstruction method performed by a voice conversation reconstruction apparatus is disclosed. The method includes acquiring speaker-specific voice recognition data about voice conversation, dividing the speaker-specific voice recognition data into a plurality of blocks using a boundary between tokens according to a predefined division criterion, arranging the plurality of blocks in chronological order irrespective of a speaker, merging blocks from continuous utterance of the same speaker among the arranged plurality of blocks, and reconstructing the plurality of blocks subjected to the merging in a conversation format in chronological order and based on a speaker.
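The steps listed in this abstract (divide, arrange, merge, reconstruct) can be sketched as follows. The abstract does not state the division criterion, so the sketch assumes a pause longer than `max_gap` seconds between consecutive tokens starts a new block. The `Token` type and the function names are likewise hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Token:
    speaker: str
    start: float  # token start time in seconds
    text: str

def split_into_blocks(tokens, max_gap=1.0):
    """Divide one speaker's tokens into blocks at token boundaries.

    Assumed criterion: a gap above max_gap seconds starts a new block.
    """
    blocks = []
    for tok in tokens:
        if blocks and tok.start - blocks[-1][-1].start <= max_gap:
            blocks[-1].append(tok)
        else:
            blocks.append([tok])
    return blocks

def reconstruct(per_speaker, max_gap=1.0):
    """Return the conversation as 'speaker: text' lines in time order."""
    blocks = []
    for tokens in per_speaker.values():
        ordered = sorted(tokens, key=lambda t: t.start)
        blocks.extend(split_into_blocks(ordered, max_gap))
    # Arrange all blocks chronologically, irrespective of speaker.
    blocks.sort(key=lambda block: block[0].start)
    # Merge consecutive blocks that belong to the same speaker.
    merged = []
    for block in blocks:
        speaker = block[0].speaker
        words = [t.text for t in block]
        if merged and merged[-1][0] == speaker:
            merged[-1][1].extend(words)
        else:
            merged.append((speaker, words))
    return [f"{speaker}: {' '.join(words)}" for speaker, words in merged]

data = {
    "A": [Token("A", 0.0, "hello"), Token("A", 0.5, "there"),
          Token("A", 2.0, "friend"), Token("A", 4.0, "bye")],
    "B": [Token("B", 3.0, "hi")],
}
lines = reconstruct(data)
```

Here speaker A's first two blocks are adjacent in time and are merged, while the block after B's turn stays separate.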

Amplitude-independent window sizes in audio encoding
11532314 · 2022-12-20

A computer-implemented method can include receiving a first signal corresponding to a first flow of acoustic energy, applying a transform to the received first signal using at least a first amplitude-independent window size at a first frequency and a second amplitude-independent window size at a second frequency, the second amplitude-independent window size improving a temporal response at the second frequency, wherein the second frequency is subject to amplitude reduction due to a resonance phenomenon associated with the first frequency, and storing a first encoded signal, the first encoded signal based on applying the transform to the received first signal.
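One way to picture a transform whose window size depends on frequency but not on signal amplitude is a per-frequency windowed correlation. This is a minimal sketch under assumptions: the Hann window, the single-frequency correlation, the specific window sizes, and the names `windowed_bin` and `analyze` are illustrative choices, not the patented method.

```python
import cmath
import math

def windowed_bin(signal, sample_rate, freq_hz, window_size, start=0):
    """Correlate a Hann-windowed frame with a complex sinusoid at freq_hz."""
    frame = signal[start:start + window_size]
    acc = 0j
    for n, x in enumerate(frame):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / window_size)
        acc += w * x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
    return acc / window_size

def analyze(signal, sample_rate, window_sizes):
    """window_sizes maps frequency (Hz) to a fixed window length in samples.

    The lengths are chosen per frequency ahead of time and never depend
    on the amplitude of the signal being analyzed.
    """
    return {freq: windowed_bin(signal, sample_rate, freq, size)
            for freq, size in window_sizes.items()}

sr = 8000
sig = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(256)]
# Longer window at 1000 Hz, shorter window (better time resolution) at 2000 Hz.
spec = analyze(sig, sr, {1000.0: 256, 2000.0: 64})
```

A shorter window at the second frequency trades frequency resolution for a faster temporal response, which is the effect the abstract attributes to the second window size.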

ADAPTIVE COEFFICIENTS AND SAMPLES ELIMINATION FOR CIRCULAR CONVOLUTION

Technologies are disclosed for improving the efficiency of real-time audio processing, and specifically for improving the efficiency of continuously modifying a real-time audio signal. Efficiency is improved by reducing memory bandwidth requirements and by reducing the amount of processing used to modify the real-time audio signal. In some configurations, memory bandwidth requirements are reduced by selectively transferring only active samples in the frequency domain, e.g., by avoiding the transfer of samples with amplitudes of zero or near-zero. This is particularly important when specialized hardware retrieves samples from main memory in real time. In some configurations, the amount of processing needed to modify the audio signal is reduced by omitting operations that do not meaningfully affect the output audio signal. For example, a multiplication of samples may be avoided when at least one of the samples has an amplitude of zero or near-zero.
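Circular convolution in the time domain corresponds to a bin-by-bin product in the frequency domain, so the skipped-multiplication idea can be sketched as below. The threshold `eps`, the function names, and the `(index, value)` layout for active samples are assumptions made for illustration.

```python
def active_bins(bins, eps=1e-9):
    """Keep only (index, value) pairs whose magnitude exceeds eps.

    Transferring this compact list instead of every bin is one way
    to picture the reduced memory traffic.
    """
    return [(k, value) for k, value in enumerate(bins) if abs(value) > eps]

def sparse_spectral_multiply(signal_bins, filter_bins, eps=1e-9):
    """Bin-wise product that skips pairs where either factor is near zero."""
    out = [0j] * len(signal_bins)
    multiplies = 0
    for k, (x, h) in enumerate(zip(signal_bins, filter_bins)):
        if abs(x) <= eps or abs(h) <= eps:
            continue  # the product would be (near) zero, so skip the multiply
        out[k] = x * h
        multiplies += 1
    return out, multiplies

X = [1 + 0j, 0j, 2j, 1e-12 + 0j]
H = [2 + 0j, 3 + 0j, 0j, 5 + 0j]
Y, count = sparse_spectral_multiply(X, H)
```

In this example only one of the four bin products is actually computed.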

Oversampling in a combined transposer filter bank
11591657 · 2023-02-28

The present invention relates to coding of audio signals, and in particular to high frequency reconstruction methods including a frequency domain harmonic transposer. A system and method for generating a high frequency component of a signal from a low frequency component of the signal is described. The system comprises an analysis filter bank (501) comprising an analysis transformation unit (601) having a frequency resolution of Δf; and an analysis window (611) having a duration of D_A; the analysis filter bank (501) being configured to provide a set of analysis subband signals from the low frequency component of the signal; a nonlinear processing unit (502, 650) configured to determine a set of synthesis subband signals based on a portion of the set of analysis subband signals, wherein the portion of the set of analysis subband signals is phase shifted by a transposition order T; and a synthesis filter bank (504) comprising a synthesis transformation unit (602) having a frequency resolution of QΔf; and a synthesis window (612) having a duration of D_S; the synthesis filter bank (504) being configured to generate the high frequency component of the signal from the set of synthesis subband signals; wherein Q is a frequency resolution factor with Q≥1 and smaller than the transposition order T; and wherein the value of the product of the frequency resolution Δf and the duration D_A of the analysis filter bank is selected based on the frequency resolution factor Q.

Harmonic transposition in an audio coding method and system
11594234 · 2023-02-28

The present invention relates to transposing signals in time and/or frequency and in particular to coding of audio signals. More particularly, the present invention relates to high frequency reconstruction (HFR) methods including a frequency domain harmonic transposer. A method and system for generating a transposed output signal from an input signal using a transposition factor T is described. The system comprises an analysis window of length L_a, extracting a frame of the input signal, and an analysis transformation unit of order M transforming the samples into M complex coefficients. M is a function of the transposition factor T. The system further comprises a nonlinear processing unit altering the phase of the complex coefficients by using the transposition factor T, a synthesis transformation unit of order M transforming the altered coefficients into M altered samples, and a synthesis window of length L_s, generating a frame of the output signal.
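The nonlinear processing step can be illustrated in isolation. The abstract says only that the phase is altered using T; the sketch below assumes the common phase-vocoder choice of multiplying each coefficient's phase by T while keeping its magnitude. The name `transpose_phases` is hypothetical, and the windowing and transform stages are omitted.

```python
import cmath

def transpose_phases(coefficients, T):
    """Multiply the phase of each complex coefficient by T, keeping magnitude.

    Assumption: phase multiplication by T stands in for the 'altering the
    phase' step; the abstract does not give the exact rule.
    """
    return [cmath.rect(abs(c), T * cmath.phase(c)) for c in coefficients]

# A coefficient with phase pi/2 ends up with phase pi when T = 2.
altered = transpose_phases([1j, 0j, 2 + 0j], T=2)
```

Multiplying the phase by T makes the phase advance T times faster from frame to frame, which is what shifts a sinusoid at frequency f toward T·f after synthesis.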