G10L21/0232

Audio-based detection and tracking of emergency vehicles

Techniques are provided for audio-based detection and tracking of an acoustic source. A methodology implementing the techniques according to an embodiment includes generating acoustic signal spectra from signals provided by a microphone array, and performing beamforming on the acoustic signal spectra, using time-frequency masks to reduce noise, to generate beam signal spectra. The method also includes detecting, with a deep neural network (DNN) classifier, an acoustic event associated with the acoustic source in the beam signal spectra. The DNN is trained on acoustic features associated with the acoustic event. The method further includes performing pattern extraction, in response to the detection, to identify the time-frequency bins of the acoustic signal spectra that are associated with the acoustic event, and estimating a motion direction of the source relative to the microphone array based on the Doppler frequency shift of the acoustic event, calculated from the time-frequency bins of the extracted pattern.
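
The final step, Doppler-based motion direction, can be illustrated with a short Python sketch. It assumes a stationary array, a speed of sound of 343 m/s, and a known nominal siren frequency `f_emitted`; the helper names `pattern_frequency`, `radial_velocity`, and `motion_direction` are hypothetical and do not come from the source. The observed frequency is taken as the magnitude-weighted mean of the time-frequency bins in the extracted pattern, and the radial speed follows from inverting the moving-source Doppler relation f_obs = f_emit * c / (c - v).

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def pattern_frequency(spec_mag, freqs, mask):
    """Magnitude-weighted mean frequency (Hz) of the masked T-F bins.

    spec_mag: (n_freq, n_frames) magnitude spectrogram
    freqs:    (n_freq,) bin center frequencies in Hz
    mask:     (n_freq, n_frames) boolean pattern from the extraction step
    """
    weights = spec_mag * mask
    return float((weights * freqs[:, None]).sum() / weights.sum())


def radial_velocity(f_observed, f_emitted, c=SPEED_OF_SOUND):
    """Source speed toward a stationary array, from f_obs = f_emit * c / (c - v).

    Positive means approaching, negative means receding.
    """
    return c * (1.0 - f_emitted / f_observed)


def motion_direction(f_observed, f_emitted, tol=0.5):
    """Classify motion from the sign of the radial velocity (tol in m/s)."""
    v = radial_velocity(f_observed, f_emitted)
    if v > tol:
        return "approaching"
    if v < -tol:
        return "receding"
    return "no significant radial motion"
```

For example, a 1000 Hz siren observed at about 1030 Hz gives a radial velocity of roughly 10 m/s and is classified as approaching. Real siren frequencies vary, so a practical system would more likely track how the pattern frequency changes over time; this sketch only shows the basic Doppler arithmetic.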

Sound Processing Apparatus and Sound Processing Method
20230238013 · 2023-07-27

A sound processing apparatus includes sound collection circuitry that collects a sound and generates a first sound signal, and processing circuitry that estimates noise, controls a gain of the first sound signal based on the estimated noise and outputs a second sound signal, and performs filter processing to reduce a component of a predetermined frequency band of the second sound signal based at least in part on the estimated noise.
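
A minimal Python sketch of the three stages (noise estimation, noise-dependent gain control, noise-dependent band filtering) follows. The minimum-tracking noise estimator, the mapping from noise to gain and cut depth, the 0 to 300 Hz band, and all parameter values are assumptions chosen for illustration; the abstract does not specify them, and `update_noise` and `process_block` are hypothetical names.

```python
import numpy as np


def update_noise(noise_power, frame_power, rise=0.02):
    """Minimum-tracking noise estimate: drops immediately, rises slowly (assumed rule)."""
    if frame_power < noise_power:
        return frame_power
    return (1.0 - rise) * noise_power + rise * frame_power


def process_block(x, fs, noise_power, band=(0.0, 300.0),
                  max_gain_db=12.0, max_cut_db=18.0, noise_ref=1e-3):
    """Noise-dependent gain control followed by a noise-dependent band cut."""
    # Map the noise estimate to [0, 1): 0 = quiet, close to 1 = very noisy.
    noise_ratio = noise_power / (noise_power + noise_ref)
    # Gain control: apply less boost as noise rises, so noise is not amplified.
    gain = 10.0 ** (max_gain_db * (1.0 - noise_ratio) / 20.0)
    y = gain * np.asarray(x, dtype=float)  # the "second sound signal"
    # Filter processing: attenuate the predetermined band, more deeply when noisy.
    spec = np.fft.rfft(y)
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    spec[in_band] *= 10.0 ** (-max_cut_db * noise_ratio / 20.0)
    return np.fft.irfft(spec, n=len(y))
```

With `noise_power=0.0` the block is simply boosted by 12 dB; as the noise estimate grows, the boost shrinks and the low band is cut by up to 18 dB.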

METHOD AND SYSTEM FOR SPEECH DETECTION AND SPEECH ENHANCEMENT
20230005469 · 2023-01-05

A method of speech detection and speech enhancement in a speech detection and speech enhancement unit of a Multipoint Conferencing Node (MCN), and a method of training the same, are provided. The method comprises receiving input audio segments; determining an acoustic environment based on auxiliary information of the input audio; extracting T-F domain features from the received input audio segments; determining whether each received input audio segment is speech by inputting the T-F domain features into a speech detection classifier trained for the determined acoustic environment; when a received input audio segment is speech, determining whether it is noisy speech by inputting the T-F domain features into a noise classifier that uses a statistical generative model, trained for the determined acoustic environment, representing the probability distributions of the T-F domain features of noisy speech; and applying a noise reduction mask to the received input audio segments according to the determination of whether the received audio segment is noisy speech.
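
The per-segment decision flow can be sketched in Python as below. Everything concrete here is an assumption: a mean log-power threshold stands in for the trained speech detection classifier, a pair of diagonal Gaussians stands in for the statistical generative model (a segment is declared noisy speech when the noisy-speech model explains its features better than a clean-speech model), and a spectral-subtraction style mask stands in for the noise reduction mask. The environment key and the `models_by_env` dictionary are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class DiagGaussian:
    """Diagonal Gaussian over a segment's feature vector (stand-in generative model)."""
    mean: np.ndarray
    var: np.ndarray

    def log_likelihood(self, x):
        return float(-0.5 * np.sum(np.log(2 * np.pi * self.var)
                                   + (x - self.mean) ** 2 / self.var))


@dataclass
class EnvironmentModels:
    speech_threshold: float  # stand-in for a trained speech detection classifier
    noisy: DiagGaussian      # model of noisy-speech features
    clean: DiagGaussian      # model of clean-speech features (assumed addition)


def enhance_segment(log_power, env, models_by_env, noise_power, mask_floor=0.1):
    """Return (features, label) for one segment of log-power T-F features."""
    models = models_by_env[env]  # pick the models trained for this environment
    if log_power.mean() < models.speech_threshold:
        return log_power, "non-speech"
    if models.noisy.log_likelihood(log_power) <= models.clean.log_likelihood(log_power):
        return log_power, "clean speech"
    # Noisy speech: apply a spectral-subtraction style noise reduction mask.
    power = np.exp(log_power)
    mask = np.clip(1.0 - noise_power / power, mask_floor, 1.0)
    return np.log(power * mask), "noisy speech"
```

Keeping one `EnvironmentModels` entry per acoustic environment mirrors the abstract's point that both classifiers are trained for the environment determined from the auxiliary information.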

SOUND PROCESSING METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM
20230007393 · 2023-01-05

A sound processing method includes: determining a vector of a first residual signal according to a first signal vector and a second signal vector, the first signal vector including a first voice signal and a first noise signal input into the first microphone, the second signal vector including a second voice signal and a second noise signal input into the second microphone, and the first residual signal including the second noise signal and a residual voice signal; determining a gain function of a current frame according to the vector of the first residual signal and the first signal vector; and determining a first voice signal of the current frame according to the first signal vector and the gain function of the current frame.
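
One plausible reading of the two-microphone scheme, sketched per STFT frame in Python: the residual is formed by cancelling the voice at microphone 2 using an assumed, already-known relative transfer function `rtf`, and the gain function is a Wiener-like ratio of residual power to microphone-1 power. Both the residual construction and the gain rule are assumptions, not the claimed method, and `dual_mic_frame` is a hypothetical name.

```python
import numpy as np


def dual_mic_frame(X1, X2, rtf, g_min=0.1):
    """One STFT frame of a two-microphone residual-based voice estimate.

    X1, X2: complex spectra of the current frame at microphones 1 and 2
    rtf:    assumed voice relative transfer function (mic 2 / mic 1) per bin
    """
    # Residual: subtracting the voice predicted from mic 1 leaves mic-2 noise
    # (minus the RTF-scaled mic-1 noise) plus any voice the RTF fails to cancel.
    residual = X2 - rtf * X1
    p_res = np.abs(residual) ** 2
    p_x1 = np.abs(X1) ** 2 + 1e-12  # guard against division by zero
    # Gain function of the current frame from the residual and the first signal.
    gain = np.clip(1.0 - p_res / p_x1, g_min, 1.0)
    # Voice estimate for the current frame.
    return gain * X1, gain
```

If the second channel contains only the voice (`X2 == rtf * X1`), the residual vanishes and the frame passes unchanged; a strong residual drives the gain down to `g_min`.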

Encoding parameter adjustment method and apparatus, device, and storage medium

An encoding parameter adjustment method is performed at a computer device. The method includes: obtaining a first audio signal, and determining a psychoacoustic masking threshold for each frequency within a service frequency band in the first audio signal; obtaining a second audio signal, and determining a background environmental noise estimation value for each frequency within the service frequency band in the second audio signal; determining a masking tag for each frequency within the service frequency band according to the psychoacoustic masking threshold of the first audio signal and the background environmental noise estimation value of the second audio signal; determining a masking rate of the service frequency band according to the masking tags corresponding to the frequencies within the service frequency band; determining a first reference bit rate according to the masking rate of the service frequency band; and configuring an encoding bit rate of an audio encoder based on the first reference bit rate.
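
The chain from masking tags to an encoder bit rate can be sketched in Python. The tagging rule (a frequency counts as masked when the background noise estimate reaches the masking threshold), the linear map from masking rate to bit rate, the 16 to 64 kbps range, and the configuration dictionary are all hypothetical; the abstract does not state them. The intuition assumed here is that the more of the band is masked, the less detail the listener can perceive, so a lower bit rate suffices.

```python
import numpy as np


def masking_tags(masking_threshold_db, noise_estimate_db):
    """Tag a frequency as masked when the background noise estimate reaches
    the psychoacoustic masking threshold (assumed tagging rule)."""
    return np.asarray(noise_estimate_db) >= np.asarray(masking_threshold_db)


def reference_bitrate(tags, min_kbps=16.0, max_kbps=64.0):
    """Map the band's masking rate to a reference bit rate (assumed linear map)."""
    masking_rate = float(np.mean(tags))
    return max_kbps - masking_rate * (max_kbps - min_kbps), masking_rate


def configure_encoder(threshold_db, noise_db):
    """Return a hypothetical encoder configuration dict."""
    bitrate, rate = reference_bitrate(masking_tags(threshold_db, noise_db))
    return {"bitrate_kbps": bitrate, "masking_rate": rate}
```

For example, thresholds of [10, 20, 30, 40] dB against noise estimates of [15, 15, 35, 35] dB tag half the band as masked, giving a 40 kbps reference rate.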

ANALOG SYSTEMS AND METHODS FOR AUDIO FEATURE EXTRACTION AND NATURAL LANGUAGE PROCESSING
20230238014 · 2023-07-27

An all-analog natural language processing system is provided. Analog audio input is processed directly by an all-analog signal pathway in which audio activity detection, voice activity detection, feature extraction, and neural network processing are all performed in the analog domain. Audio/voice detection and feature extraction are performed by a bandpass filter bank having a plurality of individual bandpass filters. Each bandpass filter includes an array of individual capacitively coupled current conveyor second-order sections having a charge-trap transistor as a programmable element for tuning the passband of the filter. Compared to typical digital natural language processing systems, the all-analog system can achieve comparable accuracy while consuming up to two orders of magnitude less energy.
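
The system itself is analog, but the idea of a tunable bank of second-order bandpass sections producing per-channel energy features can be simulated digitally. The Python sketch below uses the RBJ audio-EQ cookbook bandpass biquad as a stand-in for the current-conveyor second-order sections, and models programming the charge-trap transistor simply as choosing each channel's center frequency; the Q value, center frequencies, log-energy feature, and function names are assumptions.

```python
import numpy as np


def bandpass_coeffs(f0, fs, q):
    """Second-order bandpass (RBJ audio-EQ cookbook, 0 dB peak gain)."""
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([alpha, 0.0, -alpha])
    a = np.array([1.0 + alpha, -2.0 * np.cos(w0), 1.0 - alpha])
    return b / a[0], a / a[0]  # normalize so a[0] == 1


def biquad(x, b, a):
    """Direct-form I difference equation for one second-order section."""
    y = np.zeros_like(x)
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


def filter_bank_features(x, fs, centers_hz, q=2.0):
    """Log energy per channel; 'tuning' a channel means choosing its center."""
    x = np.asarray(x, dtype=float)
    feats = []
    for f0 in centers_hz:
        b, a = bandpass_coeffs(f0, fs, q)
        y = biquad(x, b, a)
        feats.append(np.log(np.mean(y ** 2) + 1e-12))
    return np.array(feats)
```

For a 1 kHz tone, the 1 kHz channel yields the largest feature among channels centered at 250 Hz, 1 kHz, and 4 kHz.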