G10L21/007

Audio Encoder for Encoding an Audio Signal, Method for Encoding an Audio Signal and Computer Program under Consideration of a Detected Peak Spectral Region in an Upper Frequency Band

An audio encoder for encoding an audio signal having a lower frequency band and an upper frequency band includes: a detector for detecting a peak spectral region in the upper frequency band of the audio signal; a shaper for shaping the lower frequency band using shaping information for the lower band and for shaping the upper frequency band using at least a portion of the shaping information for the lower band, wherein the shaper is configured to additionally attenuate spectral values in the detected peak spectral region in the upper frequency band; and a quantizer and coder stage for quantizing a shaped lower frequency band and a shaped upper frequency band and for entropy coding quantized spectral values from the shaped lower frequency band and the shaped upper frequency band.
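
The detect, shape and quantize chain described above can be sketched as follows. This is a minimal illustration, not the claimed implementation. The abstract does not state the detection criterion, so the `peak_ratio` threshold, the reuse of the last lower-band gain for the upper band, and the function names are all assumptions. Entropy coding is omitted.

```python
def shape_with_peak_attenuation(spectrum, split, low_gains, peak_ratio=2.0):
    """Shape both bands and attenuate detected peaks in the upper band.

    spectrum   : list of spectral values
    split      : index at which the upper band starts
    low_gains  : shaping gains for the lower band (len == split)
    peak_ratio : hypothetical threshold relative to the lower-band maximum
    """
    # Shape the lower band with its own shaping information.
    lower = [x * g for x, g in zip(spectrum[:split], low_gains)]

    # Shape the upper band with a portion of the lower-band information
    # (assumption: reuse the gain of the highest lower-band bin).
    upper_gain = low_gains[-1]
    upper = [x * upper_gain for x in spectrum[split:]]

    # Detector (assumption): an upper-band value is a peak if its magnitude
    # exceeds peak_ratio times the largest shaped lower-band magnitude.
    limit = peak_ratio * max(abs(x) for x in lower)

    shaped_upper = []
    for x in upper:
        if abs(x) > limit:
            x = x * limit / abs(x)  # additional attenuation down to the limit
        shaped_upper.append(x)
    return lower + shaped_upper


def quantize(values, step):
    """Uniform scalar quantizer; an entropy coder would follow this stage."""
    return [round(v / step) for v in values]


shaped = shape_with_peak_attenuation([1.0, -2.0, 0.5, 10.0, 0.2], 3, [1.0, 1.0, 1.0])
indices = quantize(shaped, 0.5)
```

In this toy input the upper-band value 10.0 is reduced to the limit before quantization, so it no longer dominates the bit budget.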

POST FILTER FOR AUDIO SIGNALS

In some embodiments, a pitch filter for filtering a preliminary audio signal generated from an audio bitstream is disclosed. The pitch filter has an operating mode selected from: (i) an active mode, in which the preliminary audio signal is filtered using filtering information to obtain a filtered audio signal, and (ii) an inactive mode, in which the pitch filter is disabled. The preliminary audio signal is generated in an audio encoder or audio decoder having a coding mode selected from at least two distinct coding modes, and the pitch filter can be selectively operated in either the active mode or the inactive mode, based on control information, while operating in the coding mode.
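
The mode switching described above can be sketched as follows. The abstract does not give the filter structure, so the feed-forward comb filter and the `lag` and `gain` parameters, which stand in for the filtering information, are assumptions. The `active` flag stands in for the control information.

```python
def pitch_post_filter(signal, active, lag=0, gain=0.0):
    """Apply a simple pitch (comb) filter, or bypass it.

    signal : list of samples of the preliminary audio signal
    active : control information selecting active or inactive mode
    lag    : pitch lag in samples (assumed filtering information)
    gain   : filter gain (assumed filtering information)
    """
    if not active:
        # Inactive mode: the filter is disabled and the signal passes through.
        return list(signal)

    out = []
    for n, x in enumerate(signal):
        # Samples before the start of the signal are treated as zero.
        past = signal[n - lag] if n >= lag else 0.0
        # Reinforce the component one pitch period back, normalized so a
        # signal that is periodic with the lag keeps its level.
        out.append((x + gain * past) / (1.0 + gain))
    return out


frame = [1.0, 0.0, 1.0, 0.0]
bypassed = pitch_post_filter(frame, active=False)
filtered = pitch_post_filter(frame, active=True, lag=2, gain=1.0)
```

Because the switch is an argument independent of the coding mode, the same decoder path can run with the filter on for one frame and off for the next.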

AUDIO TRANSLATOR
20230282200 · 2023-09-07

An audio translation system includes a feature extractor and a style transfer machine learning model. The feature extractor generates, for each of a plurality of source voice files, one or more source voice parameters encoded as a collection of source feature vectors, and generates, for each of a plurality of target voice files, one or more target voice parameters encoded as a collection of target feature vectors. The style transfer machine learning model is trained on the collection of source feature vectors for the plurality of source voice files and the collection of target feature vectors for the plurality of target voice files to generate a style transformed feature vector.

METHOD AND SYSTEM TO MODIFY SPEECH IMPAIRED MESSAGES UTILIZING NEURAL NETWORK AUDIO FILTERS
20230136822 · 2023-05-04

A computer implemented method, system and computer program product are provided that implement a neural network (NN) audio filter. The method, system and computer program product obtain an electronic audio signal comprising a speech impaired message and apply the audio signal to the NN audio filter to modify the speech impaired message to form an unimpaired message. The method, system and computer program product output the unimpaired message.

SYSTEMS AND METHODS FOR MODIFYING MODULATED SIGNALS FOR TRANSMISSION
20230025339 · 2023-01-26

Systems and methods are disclosed herein for modifying modulated signals for transmission. The system receives a modulated signal comprising a speech signal and a carrier wave and generates first and second spectral signals by converting the modulation signal and carrier wave from the time domain to the frequency domain respectively. The system then determines spectral bands for the first and second spectral signals. For each spectral band, the system calculates a weighted spectral band value based on a magnitude of the first spectral signal within the spectral band and generates a modified spectral signal by modifying the second spectral signal with the weighted spectral band value. The system then converts the modified spectral signal from the frequency domain to the time domain and transmits the converted modified spectral signal to a server.
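
The per-band weighting step can be sketched as follows, starting from spectra that are already in the frequency domain. The abstract does not define how the weighted value is computed or applied. Taking the mean magnitude per band and multiplying the second spectrum by it are assumptions, as are the function names and the `band_edges` layout.

```python
def band_weights(first_spec, band_edges):
    """Mean magnitude of first_spec in each band [edges[i], edges[i+1])."""
    weights = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        mags = [abs(v) for v in first_spec[lo:hi]]
        weights.append(sum(mags) / len(mags))
    return weights


def apply_band_weights(second_spec, band_edges, weights):
    """Scale every bin of second_spec by the weight of the band it falls in."""
    out = list(second_spec)
    bands = zip(band_edges[:-1], band_edges[1:])
    for (lo, hi), w in zip(bands, weights):
        for k in range(lo, hi):
            out[k] = second_spec[k] * w
    return out


first = [1 + 0j, 3 + 0j, 0j, 2j]   # spectrum derived from the speech signal
second = [1.0, 1.0, 1.0, 1.0]      # spectrum derived from the carrier wave
edges = [0, 2, 4]                  # two bands: bins 0-1 and bins 2-3
weights = band_weights(first, edges)
modified = apply_band_weights(second, edges, weights)
```

An inverse transform of `modified` would then give the time-domain signal that the system transmits.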

Method, apparatus, and non-transitory computer readable medium for processing audio of virtual meeting room

A method for processing audio generated in a virtual meeting room (VMR) includes: setting a quantity of mesh vertexes according to seats in the VMR; obtaining first voiceprint information of a presenter, the first voiceprint information comprising a frequency, an amplitude, and a phase difference of an audio signal; adjusting the frequency or amplitude of the first voiceprint information according to the quantity of the mesh vertexes to obtain second voiceprint information; and determining a seat of the presenter in the VMR according to the second voiceprint information. An apparatus and a non-transitory computer readable medium for processing audio as above are also disclosed.

VOICE CONVERSION DEVICE, VOICE CONVERSION METHOD, AND VOICE CONVERSION PROGRAM

A voice conversion device, method, and program are provided that use spectral differentials to achieve both high voice quality and real-time operation. The voice conversion device 10 includes: an acquisition unit 11 that acquires signals of a voice of a subject; a filter calculation unit 12 that transforms features representing the timbre of the voice using a trained transformer model and subjects the transformed features to liftering by a trained lifter, thereby calculating a spectrum of a filter; a shortened filter calculation unit 13 that performs an inverse Fourier transform of the spectrum of the filter and applies a predetermined window function, thereby calculating a shortened filter; and a generating unit 14 that applies a spectrum, obtained by Fourier transform of the shortened filter, to the spectrum of the signals and performs an inverse Fourier transform, thereby generating a synthesized voice.
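
The shortened-filter path (inverse transform, window, forward transform, apply, inverse transform) can be sketched as follows. The trained models are outside the scope of this sketch, so the filter spectrum is taken as given. The half-cosine taper standing in for the predetermined window function, the `taps` parameter and the function names are assumptions, and a plain DFT is used so the example needs only the standard library.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def shorten_filter(filter_spectrum, taps):
    """Inverse-transform the filter spectrum and window it to `taps` samples."""
    impulse = dft(filter_spectrum, inverse=True)
    # Assumed window: half-cosine taper over the first `taps` samples, zero after.
    window = [
        0.5 * (1.0 + math.cos(math.pi * t / taps)) if t < taps else 0.0
        for t in range(len(impulse))
    ]
    return [h * w for h, w in zip(impulse, window)]


def convert_frame(signal_frame, filter_spectrum, taps):
    """Apply the spectrum of the shortened filter to one frame of the signal."""
    short_spec = dft(shorten_filter(filter_spectrum, taps))
    sig_spec = dft(signal_frame)
    # Multiplying spectra corresponds to circular convolution within the frame.
    out = dft([s * f for s, f in zip(sig_spec, short_spec)], inverse=True)
    return [v.real for v in out]


frame = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0, 0.25, 0.75]
flat_filter = [1.0] * 8  # an all-pass filter spectrum leaves the frame unchanged
result = convert_frame(frame, flat_filter, taps=4)
```

Truncating the impulse response bounds the filter length, which is what keeps the per-frame cost and latency low. A practical system would use an FFT and overlap-add rather than the per-frame circular convolution shown here.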