G10L21/0224

SOUND PICK-UP DEVICE, SOUND PICK-UP METHOD AND NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM RECORDING SOUND PICK-UP PROGRAM

A sound pick-up device includes: an adaptive filter configured to generate, from a reference signal, an estimated noise signal indicating a component of a noise signal contained in an input signal; a noise elimination signal generator configured to generate a noise elimination signal by subtracting the estimated noise signal from the input signal; a filter coefficient update unit configured to update a filter coefficient of the adaptive filter using the noise elimination signal; and a sample position identification unit configured to identify at least one signal sample position among a plurality of signal sample positions ranging from the position where the noise elimination signal has its maximum absolute value to the position where its absolute value ranks at a predetermined order from the largest. The filter coefficient update unit updates the filter coefficient at the at least one identified signal sample position.
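
The selective update described in this abstract can be illustrated with a minimal sketch. It assumes a block-wise NLMS adaptive filter, which the abstract does not specify; the function name `selective_nlms` and the parameters `num_taps`, `block`, `top_k`, `mu` and `eps` are hypothetical choices made only for illustration.

```python
import numpy as np

def selective_nlms(d, x, num_taps=8, block=64, top_k=8, mu=0.5, eps=1e-8):
    """Block-wise noise canceller that adapts only at the top_k sample
    positions with the largest |noise elimination signal| in each block.
    NLMS is an assumption; the abstract does not name an update rule."""
    w = np.zeros(num_taps)
    # Pad the reference so every sample has a full tap-delay vector.
    xp = np.concatenate([np.zeros(num_taps - 1), x])
    out = np.zeros(len(d))
    for start in range(0, len(d), block):
        idx = np.arange(start, min(start + block, len(d)))
        # Row i is [x[n], x[n-1], ..., x[n-num_taps+1]] for n = idx[i].
        X = np.stack([xp[n:n + num_taps][::-1] for n in idx])
        # Input minus estimated noise: the noise elimination signal.
        e = d[idx] - X @ w
        out[idx] = e
        # Positions from the maximum |e| down to the top_k-th largest |e|.
        k = min(top_k, len(idx))
        picked = np.argsort(-np.abs(e))[:k]
        for p in picked:
            # Recompute the error with the current coefficients, then
            # take one normalized LMS step at this position only.
            err = d[idx[p]] - X[p] @ w
            w += mu * err * X[p] / (X[p] @ X[p] + eps)
    return out, w
```

With `top_k` equal to `block` this reduces to adapting at every sample of the block, only in a different order; smaller values restrict adaptation to the positions where the residual is largest.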

AUDIO SIGNAL PROCESSING METHOD, TRAINING METHOD, APPARATUS AND STORAGE MEDIUM
20230197096 · 2023-06-22

Provided are an audio signal processing method, a training method, an apparatus and a storage medium, relating to the field of data processing and, in particular, to the field of voice. The audio signal processing method includes: eliminating at least part of a linear echo signal from a mixed voice signal to obtain an intermediate processing signal, where the mixed voice signal is obtained by mixing a target voice signal with an echo signal, and the echo signal is generated in an environment where the target voice signal is located and includes the linear echo signal and a nonlinear echo signal; and removing the nonlinear echo signal and a residual part of the linear echo signal from the intermediate processing signal by using a target full convolution neural network model to obtain an approximate target voice signal, the target full convolution neural network model including at least two convolution layers.

NOISE DETECTION AND REMOVAL SYSTEMS, AND RELATED METHODS

Systems and techniques for removing non-stationary and/or colored noise can include one or more of the three following innovative aspects: (1) detection of an unwanted target signal, or component thereof, within an observed signal; (2) removal of the target (component) from the observed signal; and (3) filling of a gap in the observed signal generated by removal of the unwanted target (component). Removal regions, frequency bands, and/or regions of the observed signal used to train the gap filler can be adapted in correspondence with local characteristics of the observed signal and/or the target signal (component). Related aspects also are described. For example, disclosed noise detection and/or removal methods can include converting an incoming acoustic signal to a corresponding machine-readable form. And, a corrected signal in machine-readable form can be converted to a human-perceivable form, and/or to a modulated signal form conveyed over a communication connection.

SYSTEM AND METHOD FOR REMOVING NOISE AND ECHO FOR MULTI-PARTY VIDEO CONFERENCE OR VIDEO EDUCATION
20230197098 · 2023-06-22

Disclosed are a system and method for removing noise and echo for multi-party video conference or video education. The system for removing noise and echo includes: a sound reception module that preprocesses analog sounds received through a microphone into digital sounds that a deep learning model can learn from and run inference on; a deep learning module that learns the digital sounds preprocessed by the sound reception module through a plurality of deep learning models and infers a user voice using a real-time service model obtained by light-weighting a specific deep learning model of the plurality of deep learning models; and a sound output module that outputs only a digital sound inferred as the user voice by the real-time service model to an external speaker or a virtual audio device.

Speech Identification and Extraction from Noise Using Extended High Frequency Information

Improved systems and methods are provided herein for extracting target speech from audio signals that can contain masking speech or other unwanted noise content. These systems and methods include detection of target speech in an input signal by detecting elevated frequency content in the signal above a threshold frequency. Portions of the signal determined to contain such elevated high frequency content are then used to generate audio filters to extract target speech from subsequently-obtained audio signals. This can include performing non-negative matrix factorization to determine a set of basis vectors to represent noise content in the spectral domain and then using the set of basis vectors to decompose subsequently-obtained audio signals into noise signals that can then be removed from the audio signals.
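
The non-negative matrix factorization step mentioned in this abstract can be sketched as follows. This covers only the factorization and removal steps, not the high-frequency detection. It assumes magnitude spectrograms stored as frequency-by-time arrays and uses the standard multiplicative updates for the Euclidean cost, which the abstract does not specify; the function names `nmf`, `activations` and `remove_noise` are hypothetical.

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (freq x time) as W @ H using
    multiplicative updates for the Euclidean cost (an assumed choice)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def activations(V, W, iters=200, seed=0, eps=1e-12):
    """Fit activations H for a fixed basis W so that W @ H approximates V."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def remove_noise(V, W_noise):
    """Decompose V onto the noise basis and subtract the noise estimate,
    clamping at zero so the result stays a valid magnitude spectrogram."""
    noise = W_noise @ activations(V, W_noise)
    return np.maximum(V - noise, 0.0)
```

In this sketch the columns of `W` returned by `nmf` play the role of the basis vectors, and `remove_noise` applies them to a subsequently obtained spectrogram.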

NOISE SUPPRESSION DEVICE AND NOISE SUPPRESSION METHOD
20170345440 · 2017-11-30

A noise suppression device includes: an adaptive filter unit that suppresses, using an adaptive filter, a noise component contained in a voice signal generated from a voice captured by a voice input unit to generate a corrected voice signal; a noise generation detection unit that detects timing of generation of the noise component in the voice signal; and a period suppression unit that suppresses the corrected voice signal during a predetermined period of time after the timing of the generation of the noise component.
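
The period suppression stage can be pictured with a short sketch. It assumes the noise generation timings are already available as sample indices and that suppression means applying a fixed gain; the function name `suppress_after_onsets` and the parameters `hold` and `gain` are hypothetical and do not come from the abstract.

```python
import numpy as np

def suppress_after_onsets(corrected, onsets, hold, gain=0.0):
    """Attenuate the corrected voice signal for `hold` samples after each
    detected noise onset. Overlapping periods are attenuated only once."""
    out = np.asarray(corrected, dtype=float).copy()
    mask = np.zeros(len(out), dtype=bool)
    for t in onsets:
        # Mark the predetermined period that follows this onset.
        mask[t:t + hold] = True
    out[mask] *= gain
    return out
```

A gain of 0.0 mutes the marked periods entirely, while a value between 0 and 1 only attenuates them.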

VOICE RECEIVING METHOD AND DEVICE
20170345437 · 2017-11-30

A voice receiving device configured for accurate listening includes a microphone array, a camera, a capturing module, a determining module, a time module, a calculating module, and a de-noising module. The microphone array captures a first voice signal and a second voice signal, and the camera captures mouth pictures of a user. The determining module determines whether the first voice signal is synchronized with the mouth pictures and, if so, compares the first voice signal to a preset model voice signal of the user to determine a target voice signal. The time module obtains the time delay difference of a voice reaching different microphones. The calculating module calculates the position of the sound source of the target voice signal. According to the position of the sound source, the de-noising module de-noises by reference to the second voice signal. The disclosure further provides a voice receiving method.
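
The time delay difference between microphones can be estimated in several ways. The sketch below uses plain cross-correlation between two microphone signals and a far-field, two-microphone geometry to turn the delay into an arrival angle. Both choices are assumptions, since the abstract does not state how the delay or the position is computed, and the names `estimate_delay` and `arrival_angle` are hypothetical.

```python
import numpy as np

def estimate_delay(sig_a, sig_b, fs):
    """Delay of sig_b relative to sig_a in seconds (positive: b lags a),
    taken from the peak of the full cross-correlation."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    # Index 0 of the full correlation corresponds to lag -(len(sig_a) - 1).
    lag = np.argmax(corr) - (len(sig_a) - 1)
    return lag / fs

def arrival_angle(delay, mic_distance, speed_of_sound=343.0):
    """Far-field angle in degrees from broadside for a two-microphone pair.
    The ratio is clipped so measurement noise cannot leave arcsin's domain."""
    ratio = np.clip(speed_of_sound * delay / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))
```

A single microphone pair only gives a direction. Locating a source position would need delays from additional pairs in the array.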