G10L21/0224

METHOD AND SYSTEM FOR PROTECTING USER PRIVACY DURING AUDIO CONTENT PROCESSING
20220375458 · 2022-11-24

A method and system for protecting user privacy in audio content are disclosed. Audio content including private information related to at least one user is received. The audio content is segmented to generate a plurality of audio blocks. Each audio block is associated with a sequence number based on its chronological position in the audio content. A random key of predefined length is generated for each audio block. The audio blocks are randomly distributed to a plurality of agents for audio-to-text transcription. The random distribution scrambles the data context, protecting the privacy of the at least one user during the audio-to-text transcription. A textual transcript corresponding to the audio content is generated based on the audio-to-text transcription, the sequence number, and the random key generated for each audio block.
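
The segment, tag, scatter, and reassemble steps in this abstract can be sketched as follows. This is a minimal illustration and not the patented implementation: the function names, the 16-byte key length, and the choice to hand agents only the key (keeping the key-to-sequence table with the coordinator) are all assumptions.

```python
import random
import secrets

def segment(samples, block_len):
    """Split a sample list into consecutive blocks of at most block_len samples."""
    return [samples[i:i + block_len] for i in range(0, len(samples), block_len)]

def distribute(blocks, agents, key_bytes=16, rng=None):
    """Tag each block with a sequence number and a random key, then scatter blocks.

    Returns (key_to_seq, assignments). key_to_seq stays with the coordinator;
    assignments maps each agent to (key, block) pairs without sequence numbers
    (an assumed design, so that an agent cannot see chronological order).
    """
    rng = rng or random.SystemRandom()
    key_to_seq = {}
    assignments = {agent: [] for agent in agents}
    for seq, block in enumerate(blocks):
        key = secrets.token_hex(key_bytes)  # random key of predefined length
        key_to_seq[key] = seq
        assignments[rng.choice(agents)].append((key, block))
    for work in assignments.values():
        rng.shuffle(work)  # hide arrival order within each agent's queue
    return key_to_seq, assignments

def reassemble(key_to_seq, transcribed):
    """Order (key, text) pairs by the sequence number behind each key and join them."""
    ordered = sorted(transcribed, key=lambda pair: key_to_seq[pair[0]])
    return " ".join(text for _, text in ordered)
```

A coordinator would call `segment`, pass the result to `distribute`, collect `(key, text)` pairs from the agents, and call `reassemble` to restore chronological order.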

SELF-ACTIVATED SPEECH ENHANCEMENT
20220358948 · 2022-11-10

Disclosed are an audio input configured to input an audio stream and a noise reduction module configured to process the audio stream to emphasize speech content. A monophonic detector is configured to determine whether the audio stream is monophonic. A decision module is configured to receive an input from the monophonic detector and to output a decision to bypass the noise reduction when the audio stream is not monophonic.
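
The detector and bypass decision can be sketched as below. The abstract does not say how monophonic content is detected; comparing the energy of the left/right difference against the total energy, the `tolerance` value, and the function names are assumptions made for illustration.

```python
def is_monophonic(left, right, tolerance=1e-4):
    """Treat a two-channel stream as monophonic when both channels match closely."""
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    diff_energy = sum((l - r) ** 2 for l, r in zip(left, right))
    total_energy = sum(l * l + r * r for l, r in zip(left, right))
    if total_energy == 0.0:
        return True  # silence (or empty input) carries no stereo information
    return diff_energy / total_energy < tolerance

def process(left, right, noise_reduction):
    """Apply noise_reduction to monophonic input; bypass it otherwise."""
    if is_monophonic(left, right):
        cleaned = noise_reduction(left)
        return cleaned, list(cleaned)
    return left, right  # not monophonic: pass the stream through untouched
```

Here `noise_reduction` is any callable that maps a list of samples to a list of samples; the sketch only decides whether it runs.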

METHOD AND APPARATUS FOR PROCESSING SIGNAL, COMPUTER READABLE MEDIUM
20220358951 · 2022-11-10

A method and apparatus for processing a signal are disclosed. An implementation of the method includes: acquiring a reference signal of a to-be-tested voice, the reference signal being a signal output to a voice output device, where the voice output device outputs the to-be-tested voice after obtaining the reference signal; receiving, from a voice input device, an echo signal of the to-be-tested voice, the echo signal being a signal of the to-be-tested voice collected by the voice input device; performing signal preprocessing on the reference signal and the echo signal respectively; and inputting the processed reference signal and the processed echo signal into a pre-trained time delay estimation model to obtain, as the model's output, a time difference between the reference signal and the echo signal.
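
To make the quantity being estimated concrete, the sketch below computes a time difference between a reference signal and its echo using plain cross-correlation. This is a classical baseline swapped in for illustration, not the pre-trained model the abstract describes; the function names and the `max_lag` search bound are assumptions.

```python
def estimate_delay(reference, echo, max_lag):
    """Return the lag in samples (0..max_lag) where echo best matches reference."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the reference with the echo shifted back by `lag` samples.
        score = sum(r * e for r, e in zip(reference, echo[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_seconds(reference, echo, max_lag, sample_rate):
    """Convert the estimated sample lag into a time difference in seconds."""
    return estimate_delay(reference, echo, max_lag) / sample_rate
```

A learned estimator would replace `estimate_delay` while keeping the same inputs (preprocessed reference and echo) and the same output (a time difference).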

UNIFIED DEEP NEURAL NETWORK MODEL FOR ACOUSTIC ECHO CANCELLATION AND RESIDUAL ECHO SUPPRESSION
20230096876 · 2023-03-30

A method, computer program, and computer system are provided for an all-deep-learning-based acoustic echo cancellation (AEC) system using recurrent neural networks. The model consists of two stages: an echo estimation stage and an echo suppression stage. Two different schemes for echo estimation are presented: linear echo estimation by multi-tap filtering on the far-end reference signal, and non-linear echo estimation by single-tap masking on the microphone signal. A microphone signal waveform and a far-end reference signal waveform are received. An echo signal waveform is estimated based on the microphone signal waveform and the far-end reference signal waveform. A near-end speech signal waveform is output based on subtracting the estimated echo signal waveform from the microphone signal waveform, and echoes are suppressed within the near-end speech signal waveform.
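
The linear scheme (multi-tap filtering of the far-end reference, followed by subtraction from the microphone signal) can be sketched as below. In the abstract the estimate comes from recurrent neural networks; here the filter taps are simply given as input, so this is an assumed stand-in showing only the signal arithmetic, with hypothetical function names.

```python
def estimate_echo(far_end, taps):
    """Multi-tap FIR filtering: echo[n] = sum over k of taps[k] * far_end[n - k]."""
    echo = []
    for n in range(len(far_end)):
        acc = 0.0
        for k, weight in enumerate(taps):
            if n - k >= 0:  # samples before the start of the signal count as zero
                acc += weight * far_end[n - k]
        echo.append(acc)
    return echo

def cancel_echo(mic, far_end, taps):
    """Subtract the estimated echo from the microphone signal, sample by sample."""
    echo = estimate_echo(far_end, taps)
    return [m - e for m, e in zip(mic, echo)]
```

The output of `cancel_echo` corresponds to the near-end speech estimate before the second (suppression) stage, which this sketch does not cover.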

Real-time assessment of call quality

Disclosed embodiments provide techniques for improved call quality during telephony sessions. The speech quality of an active voice session is periodically evaluated using multiple noise reduction algorithms. When the speech quality obtained with the currently active noise reduction algorithm is below the quality obtained with another noise reduction algorithm, the telephony system may switch to that other algorithm as the active noise reduction algorithm in order to improve call quality during the active voice session.
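
The switching rule can be sketched as a small selection function run at each evaluation interval. The score scale (higher is better), the optional `margin` that guards against rapid back-and-forth switching, and the function name are assumptions; the abstract does not specify how speech quality is scored.

```python
def pick_algorithm(active, scores, margin=0.0):
    """Return the noise reduction algorithm to use for the next interval.

    scores maps algorithm name -> speech quality score (higher is better).
    A switch happens only when another algorithm beats the active one by
    more than `margin`.
    """
    best = max(scores, key=scores.get)
    if best != active and scores[best] > scores[active] + margin:
        return best
    return active
```

For example, with scores `{"spectral": 3.1, "neural": 3.8}` and `"spectral"` active, the function selects `"neural"`; raising `margin` to 1.0 keeps `"spectral"`.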

Audio signal processing method and device, and storage medium

An audio signal processing method includes: acquiring audio signals from at least two sound sources respectively through at least two microphones (MICs) to obtain respective original noisy signals of the at least two MICs in a time domain; for each frame in the time domain, using a first asymmetric window to perform a windowing operation on the respective original noisy signals of the at least two MICs to acquire windowed noisy signals; performing time-frequency conversion on the windowed noisy signals to acquire respective frequency-domain noisy signals of the at least two sound sources; acquiring frequency-domain estimated signals of the at least two sound sources according to the frequency-domain noisy signals; and obtaining audio signals produced respectively by the at least two sound sources according to the frequency-domain estimated signals.
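
The per-frame windowing step can be sketched as below. The abstract does not give the shape of the asymmetric window; a long rising half-Hann followed by a short falling half-Hann is one common construction and is used here purely as an assumption, as are the function names and lengths.

```python
import math

def asymmetric_window(rise, fall):
    """Build a window with a slow rise (rise samples) and a fast decay (fall samples)."""
    up = [0.5 - 0.5 * math.cos(math.pi * n / rise) for n in range(rise)]
    down = [0.5 + 0.5 * math.cos(math.pi * n / fall) for n in range(fall)]
    return up + down

def window_frame(frame, window):
    """Multiply one time-domain frame by the window, sample by sample."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have equal length")
    return [s * w for s, w in zip(frame, window)]

def window_all_mics(frames, window):
    """Apply the same window to the current frame of every microphone."""
    return [window_frame(frame, window) for frame in frames]
```

The windowed frames would then go through a time-frequency transform (for example an FFT) before the source separation steps, which are outside this sketch.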