Patent classifications
G10L21/0216
Deep multi-channel acoustic modeling using multiple microphone array geometries
Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., a multi-geometry/multi-channel DNN) that is trained using a plurality of microphone array geometries. Thus, the first model may receive a variable number of microphone channels, generate multiple outputs using multiple microphone array geometries, and select the best output as a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., a feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower-dimensional representation. A third model (e.g., a classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in the number of microphones.
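A minimal sketch of the three-stage front-end this abstract describes. The layer sizes, the number of geometries, and the energy-based selection criterion are all hypothetical stand-ins (the abstract does not specify them), and random matrices stand in for trained DNN weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical dimensions; the patent does not specify any of these.
N_CHANNELS, N_FREQ = 4, 128      # microphone channels x frequency bins
N_GEOMETRIES = 3                 # array geometries the first model was trained on
FEAT1, FEAT2, N_UNITS = 256, 64, 40

# Randomly initialised weights stand in for trained DNN parameters.
W_geo = rng.normal(size=(N_GEOMETRIES, FEAT1, N_CHANNELS * N_FREQ)) * 0.01
W_extract = rng.normal(size=(FEAT2, FEAT1)) * 0.01
W_classify = rng.normal(size=(N_UNITS, FEAT2)) * 0.01

def front_end(multi_channel_frame):
    """Three-stage DNN front-end sketched from the abstract."""
    x = multi_channel_frame.reshape(-1)
    # 1) Multi-geometry model: one candidate feature vector per geometry...
    candidates = [relu(W @ x) for W in W_geo]
    # ...then select the "best" output (largest energy here, a stand-in
    # for whatever selection criterion the trained model actually uses).
    first = max(candidates, key=lambda v: float(v @ v))
    # 2) Feature-extraction DNN: lower-dimensional representation.
    second = relu(W_extract @ first)
    # 3) Classification DNN: scores over acoustic units (softmax).
    logits = W_classify @ second
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

posterior = front_end(rng.normal(size=(N_CHANNELS, N_FREQ)))
```

The point of the sketch is only the data flow: many channels in, one selected feature vector, dimensionality reduction, then acoustic-unit posteriors.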
Detection and removal of wind noise
An electronic device includes one or more microphones that generate audio signals and a wind noise detection subsystem. The electronic device may also include a wind noise reduction subsystem. The wind noise detection subsystem applies multiple wind noise detection techniques to the set of audio signals to generate corresponding indications of whether wind noise is present. The wind noise detection subsystem determines whether wind noise is present based on the indications generated by each detection technique and generates an overall indication of whether wind noise is present. The wind noise reduction subsystem applies one or more wind noise reduction techniques to the audio signal if wind noise is detected. The wind noise detection and reduction techniques may work in multiple domains (e.g., the time, spatial, and frequency domains).
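A sketch of the multi-technique detection described above, with one detector per domain the abstract mentions (frequency, spatial, time) and a majority vote as the overall indication. The specific cues and thresholds are illustrative assumptions, not the patent's:

```python
import numpy as np

def low_freq_ratio_detector(sig, fs, cutoff=200.0, thresh=0.6):
    """Frequency-domain cue: wind concentrates energy below a few hundred Hz."""
    spec = np.abs(np.fft.rfft(sig)) ** 2
    freqs = np.fft.rfftfreq(len(sig), 1.0 / fs)
    ratio = spec[freqs < cutoff].sum() / (spec.sum() + 1e-12)
    return ratio > thresh

def coherence_detector(sig_a, sig_b, thresh=0.5):
    """Spatial cue: turbulent wind noise is largely uncorrelated across mics."""
    r = np.corrcoef(sig_a, sig_b)[0, 1]
    return abs(r) < thresh

def zero_crossing_detector(sig, thresh=0.1):
    """Time-domain cue: wind rumble has a low zero-crossing rate."""
    zcr = np.mean(np.abs(np.diff(np.sign(sig))) > 0)
    return zcr < thresh

def wind_present(sig_a, sig_b, fs):
    votes = [low_freq_ratio_detector(sig_a, fs),
             coherence_detector(sig_a, sig_b),
             zero_crossing_detector(sig_a)]
    # Overall indication: simple majority vote over the individual detectors;
    # the patent leaves the combination rule open.
    return sum(votes) >= 2
```

A fused decision like this is more robust than any single cue, since each detector fails in different conditions (e.g., the coherence cue alone also fires on diffuse background noise).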
Audio signal processing for noise reduction
A headphone, headphone system, and speech-enhancing method are provided to enhance speech pick-up from the user of a headphone. The method includes receiving a plurality of signals from a set of microphones and generating a primary signal by array processing the microphone signals to steer a beam toward the user's mouth. A noise reference signal is also derived from one or more of the microphones via a delay-and-sum technique, and a voice estimate signal is generated by filtering the primary signal to remove components that are correlated with the noise reference signal.
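The two stages above can be sketched as a delay-and-sum beamformer followed by an adaptive noise canceller. The abstract does not name the adaptive filter; a normalized LMS (NLMS) filter is used here as one plausible choice, with integer sample delays and tap count as illustrative assumptions:

```python
import numpy as np

def delay_and_sum(mics, delays):
    """Align each microphone by its integer sample delay and average
    (steering the beam toward the assumed mouth direction)."""
    out = np.zeros(len(mics[0]))
    for sig, d in zip(mics, delays):
        out += np.roll(sig, -d)
    return out / len(mics)

def nlms_cancel(primary, noise_ref, taps=16, mu=0.1):
    """Filter `primary` to remove components correlated with `noise_ref`.
    The running error e is the voice estimate signal."""
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps - 1, len(primary)):
        x = noise_ref[n - taps + 1:n + 1][::-1]   # x[0] = most recent sample
        e = primary[n] - w @ x                    # voice estimate sample
        w += mu * e * x / (x @ x + 1e-8)          # NLMS weight update
        out[n] = e
    return out
```

Because the user's speech is (ideally) absent from the noise reference, the adaptive filter converges on the correlated noise only and the speech passes through in the error signal.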
Apparatus, method or computer program for estimating an inter-channel time difference
An apparatus for estimating an inter-channel time difference between a first channel signal and a second channel signal, includes a signal analyzer for estimating a signal characteristic of the first channel signal or the second channel signal or both signals or a signal derived from the first channel signal or the second channel signal; a calculator for calculating a cross-correlation spectrum for a time block from the first channel signal in the time block and the second channel signal in the time block; a weighter for weighting a smoothed or non-smoothed cross-correlation spectrum to obtain a weighted cross-correlation spectrum using a first weighting procedure or using a second weighting procedure depending on a signal characteristic estimated by the signal analyzer, wherein the first weighting procedure is different from the second weighting procedure; and a processor for processing the weighted cross-correlation spectrum to obtain the inter-channel time difference.
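A sketch of the analyzer/weighter/processor chain using generalized cross-correlation. The two weighting procedures here (full PHAT whitening vs. a milder partial whitening) and the boolean `high_noise` flag standing in for the signal analyzer's decision are illustrative choices; the patent only requires that two different weightings exist and are selected by a signal characteristic:

```python
import numpy as np

def estimate_itd(x1, x2, fs, high_noise):
    """GCC-style inter-channel time difference with a weighting chosen
    by a signal characteristic (here just a boolean flag)."""
    n = len(x1)
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    cross = np.conj(X1) * X2               # cross-correlation spectrum
    mag = np.abs(cross)
    if high_noise:
        weighted = cross / (mag + 1e-12)           # PHAT: keep phase only
    else:
        weighted = cross / (np.sqrt(mag) + 1e-12)  # milder partial whitening
    corr = np.fft.irfft(weighted, n)
    lag = int(np.argmax(corr))
    if lag > n // 2:                       # map wrap-around to negative lags
        lag -= n
    return lag / fs                        # positive: channel 2 lags channel 1
```

Smoothing of the cross-correlation spectrum across time blocks (also claimed in the abstract) is omitted here for brevity.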
Modeling and Reduction of Drone Propulsion System Noise
In some embodiments, a method, apparatus, and computer program reduce noise in an audio signal captured by a drone (e.g., cancel the drone's noise signature from the audio signal) using a model of the noise emitted by the drone's propulsion system set. The propulsion system set includes one or more propulsion systems, each including an electric motor, and the noise reduction is performed in response to voltage data indicative of the instantaneous voltage supplied to each electric motor of the propulsion system set. In some other embodiments, a method, apparatus, and computer program generate a noise model by determining the noise signature of at least one drone based upon a database of noise signals corresponding to at least one propulsion system, and cancel the drone's noise signature in an audio signal based upon the noise model.
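One way to read this is a voltage-indexed lookup of per-motor noise signatures followed by spectral subtraction. The database contents, the interpolation, and the subtraction rule below are all hypothetical; the abstract only says the reduction is driven by the instantaneous motor voltages and a noise-signal database:

```python
import numpy as np

FRAME_LEN = 256

# Hypothetical lookup table: per-motor supply voltage -> magnitude-spectrum
# template of the propulsion noise (e.g., measured on a test stand).
NOISE_DB = {
    11.0: np.full(FRAME_LEN // 2 + 1, 0.2),
    12.0: np.full(FRAME_LEN // 2 + 1, 0.3),
}

def noise_template(voltage):
    """Linearly interpolate the stored signature at the instantaneous voltage."""
    volts = sorted(NOISE_DB)
    v = float(np.clip(voltage, volts[0], volts[-1]))
    lo = max(u for u in volts if u <= v)
    hi = min(u for u in volts if u >= v)
    if lo == hi:
        return NOISE_DB[lo]
    t = (v - lo) / (hi - lo)
    return (1 - t) * NOISE_DB[lo] + t * NOISE_DB[hi]

def cancel_drone_noise(frame, voltages):
    """Spectral subtraction of the summed per-motor noise signatures."""
    spec = np.fft.rfft(frame)
    total = sum(noise_template(v) for v in voltages)
    mag = np.maximum(np.abs(spec) - total, 0.0)   # floor at zero
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), len(frame))
```

Indexing the model by voltage rather than by commanded RPM is the abstract's key idea: supply voltage tracks the motor's actual instantaneous operating point.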
MICROPHONE ARRAY SPEECH ENHANCEMENT
Speech received from a microphone array is enhanced. In one example, a noise filtering system receives audio from the plurality of microphones, determines a beamformer output from the received audio, applies a first auto-regressive moving average smoothing filter to the beamformer output, determines noise estimates from the received audio, applies a second auto-regressive moving average smoothing filter to the noise estimates, and combines the first and second smoothing filter outputs to produce a power spectral density output of the received audio with reduced noise.
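The smoothing stage can be sketched as a first-order recursive (leaky-integrator) filter applied per frequency bin across frames. The abstract does not specify how the two smoothed outputs are combined; the noise-subtraction rule below is a plausible stand-in, and the smoothing constant is an assumption:

```python
import numpy as np

def arma_smooth(frames, alpha=0.9):
    """First-order auto-regressive moving-average smoother across frames:
    s[t] = alpha * s[t-1] + (1 - alpha) * x[t], applied per frequency bin."""
    out = np.empty_like(frames)
    s = frames[0]
    for t, x in enumerate(frames):
        s = alpha * s + (1 - alpha) * x
        out[t] = s
    return out

def denoised_psd(beam_psd_frames, noise_psd_frames, alpha=0.9):
    """Smooth the beamformer-output PSD and the noise-estimate PSD, then
    combine them (here: floored subtraction) into a reduced-noise PSD."""
    smooth_beam = arma_smooth(beam_psd_frames, alpha)
    smooth_noise = arma_smooth(noise_psd_frames, alpha)
    return np.maximum(smooth_beam - smooth_noise, 0.0)
```

Smoothing both estimates before combining them suppresses frame-to-frame variance in the PSDs, which is what makes the subsequent subtraction stable (it avoids musical-noise artifacts from fluctuating estimates).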
MICROPHONE ARRAY NOISE SUPPRESSION USING NOISE FIELD ISOTROPY ESTIMATION
Noise is suppressed from a microphone array by estimating a noise field isotropy. In some examples audio is received from a plurality of microphones. A power spectral density of a beamformer output is determined and a power spectral density of microphone noise differences is determined. A noise power spectral density is determined using a transfer function and the noise power spectral density is applied to the beamformer output power spectral density to produce a power spectral density output of the received audio with reduced noise.
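For an isotropic (diffuse) field, the coherence between two omnidirectional microphones spaced d apart is the well-known sinc(2fd/c), and the PSD of the microphone difference signal relates to the noise PSD through a transfer function built from that coherence. A sketch under those standard diffuse-field assumptions (the specific subtraction at the end is an illustrative combination rule):

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def diffuse_coherence(freqs, d):
    """Coherence of an isotropic noise field between two mics spaced d metres
    apart: sin(2*pi*f*d/c) / (2*pi*f*d/c), i.e. np.sinc(2*f*d/c)."""
    return np.sinc(2.0 * freqs * d / C)

def noise_psd_from_difference(diff_psd, freqs, d):
    """For diffuse noise, PSD of (x1 - x2) = 2 * Phi_N * (1 - Gamma(f)),
    so invert that transfer function to recover the noise PSD Phi_N."""
    gamma = diffuse_coherence(freqs, d)
    return diff_psd / np.maximum(2.0 * (1.0 - gamma), 1e-6)

def suppressed_psd(beam_psd, diff_psd, freqs, d):
    """Apply the recovered noise PSD to the beamformer-output PSD."""
    phi_n = noise_psd_from_difference(diff_psd, freqs, d)
    return np.maximum(beam_psd - phi_n, 0.0)
```

The attraction of this estimator is that the difference signal contains little of the desired (coherent, on-axis) speech, so the recovered noise PSD can track non-stationary diffuse noise without a voice-activity detector.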
METHOD, APPARATUS FOR ELIMINATING POPPING SOUNDS AT THE BEGINNING OF AUDIO, AND STORAGE MEDIUM
A method and an apparatus for eliminating popping sounds at the beginning of audio are provided. The method includes: examining audio frames within a pre-set time period at the beginning of the audio to determine a popping residing section; applying popping elimination to the audio frames in the popping residing section; calculating an average value of amplitudes of M audio frames preceding the popping residing section and an average value of amplitudes of K audio frames succeeding the popping residing section; setting the amplitudes of the audio frames in the popping residing section to zero in response to a determination that the two average values are both smaller than a pre-set sound reduction threshold; and weakening the amplitudes of the audio frames in the popping residing section in response to a determination that the two average values are not both smaller than the pre-set sound reduction threshold, where M and K are integers larger than one.
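The zero-vs-weaken decision above can be sketched directly. How the popping section is located in the first place is not shown here, and the attenuation factor is an assumption (the abstract only says the amplitudes are "weakened"):

```python
import numpy as np

def remove_pops(frames, pop_start, pop_end, m, k, threshold, attenuation=0.25):
    """frames: 2-D array (n_frames, frame_len); the popping residing section
    is frames[pop_start:pop_end]. m/k frames before/after give the context."""
    frames = frames.copy()
    before = float(np.mean(np.abs(frames[pop_start - m:pop_start])))
    after = float(np.mean(np.abs(frames[pop_end:pop_end + k])))
    if before < threshold and after < threshold:
        # Surroundings are quiet: silence the pop outright.
        frames[pop_start:pop_end] = 0.0
    else:
        # Surroundings carry signal: only weaken the pop to avoid a dropout.
        frames[pop_start:pop_end] *= attenuation
    return frames
```

The context check is the point of the claim: zeroing a pop inside otherwise-loud audio would itself be an audible artifact, so full silencing is reserved for pops surrounded by near-silence.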