G10L21/0208

Audio data processing method, apparatus and storage medium for detecting wake-up words based on multi-path audio from microphone array

An audio data processing method is provided. The method includes: obtaining multi-path audio data in an environmental space, obtaining a speech data set based on the multi-path audio data, and separately generating, in a plurality of enhancement directions, enhanced speech information corresponding to the speech data set; matching a speech hidden feature in the enhanced speech information with a target matching word, and determining an enhancement direction corresponding to the enhanced speech information having a highest degree of matching with the target matching word as a target audio direction; obtaining speech spectrum features in the enhanced speech information, and obtaining, from the speech spectrum features, a speech spectrum feature in the target audio direction; and performing speech authentication on the speech hidden feature and the speech spectrum feature that are in the target audio direction based on the target matching word, to obtain a target authentication result.
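The direction-selection step described above can be sketched as follows. This is only a minimal illustration, assuming each enhancement direction has already been scored against the target matching word; the function name `pick_target_direction` and the score dictionary are hypothetical and do not come from the abstract.

```python
def pick_target_direction(match_scores):
    """Return the enhancement direction whose enhanced speech best matches
    the target matching word.

    match_scores: dict mapping a direction label (here, degrees) to a
    matching degree. Both the labels and the scores are assumed inputs.
    """
    if not match_scores:
        raise ValueError("at least one enhancement direction is required")
    # The direction with the highest matching degree becomes the target.
    return max(match_scores, key=match_scores.get)


# Hypothetical scores for four beam directions.
scores = {0: 0.12, 90: 0.87, 180: 0.33, 270: 0.05}
target = pick_target_direction(scores)
```

Only the features from the selected direction would then be passed on to the authentication stage.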

MICROPHONE NOISE SUPPRESSION FOR COMPUTING DEVICE
20180012585 · 2018-01-11

A computing device with a microphone system is disclosed. The computing device includes a microphone system with an environment microphone and a noise microphone. The environment microphone picks up an environment microphone signal which includes (1) a desired signal component based on desired sound and (2) a noise component based on noise from a noise source. The noise microphone picks up a noise microphone signal based on the noise, and is configured such that contributions to the noise microphone signal from the desired sound, if present, are attenuated relative to the environment microphone. A controller receives and processes time samples from the noise microphone signal to yield a noise estimation of the noise component. The estimation is subtracted from the environment microphone signal to yield an end-user output.
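One common way to realize the estimate-and-subtract step is an adaptive filter. The sketch below uses a least-mean-squares (LMS) filter, which is a technique swapped in for illustration; the abstract does not say which estimator the controller uses. The function name, tap count, and step size are assumptions.

```python
def lms_noise_cancel(env_signal, noise_signal, taps=4, mu=0.05):
    """Subtract an adaptively estimated noise component from env_signal.

    env_signal:   samples from the environment microphone (desired + noise)
    noise_signal: samples from the noise microphone (mostly noise)
    taps, mu:     filter length and step size (illustrative values)
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent noise samples, newest first
    output = []
    for env_sample, noise_sample in zip(env_signal, noise_signal):
        history = [noise_sample] + history[:-1]
        # Noise estimation from the time samples of the noise microphone.
        estimate = sum(w * h for w, h in zip(weights, history))
        # Subtracting the estimate yields the end-user output sample.
        residual = env_sample - estimate
        # LMS update: nudge the weights to reduce the remaining noise.
        weights = [w + mu * residual * h for w, h in zip(weights, history)]
        output.append(residual)
    return output
```

The step size trades convergence speed against residual error; a real device would tune it to the noise source and sample rate.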

Customized automated audio tuning

An example method of operation may include identifying, in a particular room environment, a number of speakers and one or more microphones on a network controlled by a controller and amplifier, providing test signals to play sequentially from each amplifier channel of the amplifier and the speakers, monitoring the test signals from the one or more microphones simultaneously to detect operational speakers and amplifier channels, providing additional test signals to the speakers to determine tuning parameters, detecting the additional test signals at the one or more microphones controlled by the controller, and automatically establishing a background noise level and noise spectrum of the room environment based on the detected additional test signals.

Audio-based detection and tracking of emergency vehicles

Techniques are provided for audio-based detection and tracking of an acoustic source. A methodology implementing the techniques according to an embodiment includes generating acoustic signal spectra from signals provided by a microphone array, and performing beamforming on the acoustic signal spectra to generate beam signal spectra, using time-frequency masks to reduce noise. The method also includes detecting, by a deep neural network (DNN) classifier, an acoustic event, associated with the acoustic source, in the beam signal spectra. The DNN is trained on acoustic features associated with the acoustic event. The method further includes performing pattern extraction, in response to the detection, to identify time-frequency bins of the acoustic signal spectra that are associated with the acoustic event, and estimating a motion direction of the source relative to the array of microphones based on Doppler frequency shift of the acoustic event calculated from the time-frequency bins of the extracted pattern.
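The Doppler-based motion estimate can be illustrated with the standard relation for a moving source and a stationary observer, f_observed = f_emitted * c / (c - v), where v is the radial speed toward the array. The sketch below assumes the emitted frequency is known (for example a nominal siren tone), which the abstract does not state; the function names and the tolerance are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C


def radial_speed(f_observed, f_emitted, c=SPEED_OF_SOUND):
    """Radial speed of the source in m/s; positive means approaching.

    Solves f_observed = f_emitted * c / (c - v) for v.
    """
    return c * (1.0 - f_emitted / f_observed)


def motion_direction(f_observed, f_emitted, tol_hz=1.0):
    """Classify the motion relative to the microphone array."""
    shift = f_observed - f_emitted
    if shift > tol_hz:
        return "approaching"
    if shift < -tol_hz:
        return "receding"
    return "stationary"


# A 1000 Hz tone observed at about 1061.9 Hz implies roughly 20 m/s approach.
speed = radial_speed(1000.0 * 343.0 / (343.0 - 20.0), 1000.0)
```

In the described method the observed frequency would come from the time-frequency bins of the extracted pattern rather than from a single known tone.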

METHOD FOR GENERATING CUSTOMIZED SPATIAL AUDIO WITH HEAD TRACKING

A headphone for spatial audio rendering includes a first database having an impulse response pair corresponding to a reference speaker location. A head sensor provides head orientation information to a second database having rotation filters, the filters corresponding to different azimuth and elevation positions relative to the reference speaker location. A digital signal processor combines the rotation filters with the impulse response pair to generate an output binaural audio signal to transducers of the headphone. Efficiencies in creating impulse response or HRTF databases are achieved by sampling the impulse response less frequently than in conventional methods. This sampling at coarser intervals reduces the number of data measurements required to generate a spherical grid and reduces the time involved in capturing the impulse responses. Impulse responses for data points falling between the sampled data points are generated by interpolating in the frequency domain.
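Frequency-domain interpolation between two sampled impulse responses might look like the sketch below. It interpolates magnitude and phase per frequency bin, which is one possible scheme and an assumption here; the abstract does not specify the interpolation rule. All function names are hypothetical, and a naive DFT is used to keep the example dependency-free.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (fine for short responses)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse DFT, keeping the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]


def wrap_phase(angle):
    """Wrap an angle into the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def interpolate_ir(ir_a, ir_b, weight):
    """Estimate a response between two measured ones (same length).

    weight = 0 returns ir_a and weight = 1 returns ir_b. Magnitude and
    phase are interpolated separately in each bin (an assumed scheme).
    """
    blended = []
    for a, b in zip(dft(ir_a), dft(ir_b)):
        magnitude = (1 - weight) * abs(a) + weight * abs(b)
        phase_a = cmath.phase(a)
        # Move along the shorter arc between the two phases.
        phase = phase_a + weight * wrap_phase(cmath.phase(b) - phase_a)
        blended.append(cmath.rect(magnitude, phase))
    return idft(blended)
```

Calling `interpolate_ir(ir_30deg, ir_60deg, 0.5)` with two measured responses (hypothetical names) would approximate the unmeasured 45-degree position.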

SIGNAL PROCESSING DEVICE, SIGNAL PROCESSING METHOD AND PROGRAM
20230238002 · 2023-07-27

For example, the accuracy of voice recognition is improved.

A signal processing device includes: a single speech detection unit that detects whether one channel of an input voice signal is a speech of a single speaker; a cluster information updating unit that updates cluster information based on a voice feature quantity when the input voice signal is a speech of a single speaker; a voice segment detection unit that detects a speech segment of a target speaker based on the cluster information; and a voice extraction unit that extracts only the voice signal of the target speaker from a mixed voice signal containing the voice of the target speaker.

ACOUSTIC OUTPUT APPARATUS

The present disclosure discloses an acoustic output apparatus including at least one acoustic driver, a controller, and a supporting structure. The at least one acoustic driver may be configured to output sounds through at least two sound guiding holes. The at least two sound guiding holes may include a first sound guiding hole and a second sound guiding hole. The controller may be configured to control a phase and an amplitude of the sounds generated by the at least one acoustic driver using a control signal such that the sounds output by the at least one acoustic driver through the first and second sound guiding holes have opposite phases. The supporting structure may be provided with a baffle and configured to support the at least one acoustic driver such that the first and second sound guiding holes are located on both sides of the baffle.