H04R3/005

MICRO-ELECTRO MECHANICAL DEVICE

A micro-electro mechanical device includes a casing, a vibration sensor, a vibration membrane assembly, and a micro-electro mechanical microphone. The casing has a sound-receiving hole, and the vibration sensor is disposed in the casing. The vibration membrane assembly is disposed in the casing and corresponds to the vibration sensor. The micro-electro mechanical microphone is disposed in the casing and corresponds to the sound-receiving hole, and a back cavity of the micro-electro mechanical microphone is formed in the casing. The back cavity at least partially overlaps with areas corresponding to a vertical projection of the vibration membrane assembly.

System and method for acoustically identifying gunshots fired indoors

A system and method for acoustically detecting the firing of gunshots indoors employ multiple microphones (15, 20) which are utilized individually and in combination to detect sounds inside a building or other structure and, upon sensing a loud impulsive sound which is indicative of a gunshot, process signals from both microphones (15, 20) to determine if the sound is that of a gunshot. The system and method rely on the acoustic signature of the noise as collected, with the acoustic signature being analyzed to arrive at values which are then compared to adjustable levels that signify a gunshot.
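The two-stage check described above (a loud impulsive trigger, then signature values compared to adjustable levels) might be sketched as follows. This is only an illustration: the abstract does not say which signature values are computed, so the crest factor (peak divided by RMS) and both default levels are assumptions, and `is_gunshot` is a hypothetical name.

```python
import math


def peak(signal):
    """Largest absolute sample value."""
    return max(abs(s) for s in signal)


def rms(signal):
    """Root-mean-square level of the signal."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def is_gunshot(mic_a, mic_b, trigger_level=0.8, min_crest_factor=4.0):
    """Two-stage test on signals from two microphones.

    Stage 1: at least one microphone must see a loud peak.
    Stage 2: both microphones must show an impulsive signature.
    Both levels are adjustable and purely illustrative (assumptions).
    """
    if max(peak(mic_a), peak(mic_b)) < trigger_level:
        return False  # not loud enough to be worth analyzing
    for signal in (mic_a, mic_b):
        level = rms(signal)
        if level == 0.0 or peak(signal) / level < min_crest_factor:
            return False  # sustained sound, not a short impulse
    return True
```

A single loud spike in otherwise quiet audio passes both stages, while a loud steady tone passes the trigger but fails the crest-factor comparison.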

System and method for differentially locating and modifying audio sources
11531518 · 2022-12-20

A system and method for differentially locating and modifying audio sources that includes receiving multiple audio inputs from a set of distinct locations; determining a multi-dimensional audio map from the audio inputs; acquiring a set of positional audio control inputs applied to the audio map, each audio control input comprising a location and audio processing property; and generating an audio output according to the audio control inputs and the audio inputs. The audio control inputs are capable of configuration through manual, automatic, computer vision analysis, and other configuration modes.

Detecting a trigger of a digital assistant

Systems and processes for operating an intelligent automated assistant are provided. In accordance with one example, a method includes, at an electronic device with one or more processors, memory, and a plurality of microphones, sampling, at each of the plurality of microphones of the electronic device, an audio signal to obtain a plurality of audio signals; processing the plurality of audio signals to obtain a plurality of audio streams; and determining, based on the plurality of audio streams, whether any of the plurality of audio signals corresponds to a spoken trigger. The method further includes, in accordance with a determination that the plurality of audio signals corresponds to the spoken trigger, initiating a session of the digital assistant; and in accordance with a determination that the plurality of audio signals does not correspond to the spoken trigger, forgoing initiating a session of the digital assistant.

Pre-voice separation/recognition synchronization of time-based voice collections based on device clockcycle differentials

Methods and devices conduct, based on a clock difference among a plurality of voice collection devices, a synchronization process on voice information collected by those devices. After the synchronization process, a voice separation and recognition process is conducted on the synchronized voice information.
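The synchronization step could look roughly like the sketch below, which maps each device's timestamps onto a shared reference clock before any separation or recognition runs. The data layout (timestamped frames per device, offsets in milliseconds relative to a reference device) and the name `synchronize` are assumptions made for illustration; the abstract does not specify them.

```python
def synchronize(collections, clock_offsets_ms):
    """Shift every device's timestamps onto the reference clock.

    collections: {device_id: [(timestamp_ms, frame), ...]}
    clock_offsets_ms: {device_id: device clock minus reference clock}
    Returns one timeline of (timestamp_ms, device_id, frame), sorted by time.
    """
    timeline = []
    for device_id, frames in collections.items():
        offset = clock_offsets_ms[device_id]
        for timestamp_ms, frame in frames:
            timeline.append((timestamp_ms - offset, device_id, frame))
    # Sort on time and device only, so frames never need to be comparable.
    timeline.sort(key=lambda entry: (entry[0], entry[1]))
    return timeline
```

The merged timeline is what a downstream separation and recognition stage would consume; that stage is outside the scope of this sketch.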

Beamformer enhanced direction of arrival estimation in a reverberant environment with directional noise

An estimator of direction of arrival (DOA) of speech from a far-field talker to a device in the presence of room reverberation and directional noise includes audio inputs received from multiple microphones and one or more beamformer outputs generated by processing the microphone inputs. A first DOA estimate is obtained by performing generalized cross-correlation between two or more of the microphone inputs. A second DOA estimate is obtained by performing generalized cross-correlation between one of the one or more beamformer outputs and one or more of: the microphone inputs and other of the one or more beamformer outputs. A selector selects the first or second DOA estimate based on an SNR estimate at the microphone inputs and a noise reduction amount estimate at the beamformer outputs. The SNR and noise reduction estimates may be obtained based on the detection of a keyword spoken by a desired talker.
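As a rough illustration of the two-estimate scheme, the sketch below finds the delay between two signals by plain time-domain cross-correlation (the unweighted special case of generalized cross-correlation, swapped in here for simplicity), converts it to an angle for a two-microphone far-field geometry, and picks between the two estimates. The selection rule, its thresholds, and all function names are assumptions; the abstract only says the choice depends on the SNR and noise reduction estimates.

```python
import math


def estimate_delay(x, y, max_lag):
    """Lag in samples by which y trails x, maximizing the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Far-field arrival angle in degrees relative to broadside."""
    ratio = (lag / sample_rate) * speed_of_sound / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard asin against rounding
    return math.degrees(math.asin(ratio))


def select_doa(mic_doa, beam_doa, snr_db, noise_reduction_db,
               snr_threshold_db=10.0, nr_threshold_db=3.0):
    """Hypothetical rule: trust the beamformer-based estimate only when the
    microphone SNR is poor and the beamformer removed a useful amount of noise."""
    if snr_db < snr_threshold_db and noise_reduction_db >= nr_threshold_db:
        return beam_doa
    return mic_doa
```

`estimate_delay` applies equally to a pair of microphone inputs (first estimate) or to a beamformer output paired with a microphone input (second estimate).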

Wearable audio device with enhanced voice pick-up
11533555 · 2022-12-20

Various implementations include systems for processing microphone audio signals for a wearable audio device. In particular implementations, a method for processing signals includes: capturing an internal signal with an inner microphone configured to be acoustically coupled to an environment inside an ear canal of a user; extracting a low frequency audio signal from the internal signal; capturing an external signal with an external microphone configured to be acoustically coupled to an environment outside the ear canal of the user; extracting a high frequency audio signal from the external signal; and mixing the high frequency audio signal with the low frequency audio signal.
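One simple way to picture the extract-and-mix steps is a crossover: low-pass the inner-microphone signal, high-pass the external-microphone signal, and sum the two. The sketch below uses a one-pole low-pass filter and its complement as the high-pass; the filter type, the 1 kHz crossover, and the function names are assumptions, not details from the abstract.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """First-order recursive low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for sample in signal:
        state += alpha * (sample - state)
        out.append(state)
    return out


def mix_voice(internal, external, cutoff_hz=1000.0, sample_rate=16000.0):
    """Low band from the inner microphone plus high band from the external one."""
    if len(internal) != len(external):
        raise ValueError("signals must have the same length")
    low = one_pole_lowpass(internal, cutoff_hz, sample_rate)
    # Complementary high-pass: the external signal minus its own low band.
    external_low = one_pole_lowpass(external, cutoff_hz, sample_rate)
    high = [e - l for e, l in zip(external, external_low)]
    return [l + h for l, h in zip(low, high)]
```

Because the two filters are complementary, feeding the same signal to both inputs returns that signal unchanged, which is a convenient sanity check.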

METHOD FOR SELECTING OUTPUT WAVE BEAM OF MICROPHONE ARRAY
20220399028 · 2022-12-15

A method for selecting an output wave beam of a microphone array, comprising: (a) receiving a plurality of voice signals from the microphone array comprising a plurality of microphones, and performing beamforming on the voice signals to obtain a plurality of wave beams and corresponding wave beam output signals (102); (b) performing the following operation on each wave beam: converting the wave beam output signal of a current wave beam to frequency domain from time domain to obtain a frequency spectrum vector and a power spectrum vector of the current wave beam (104); on the basis of the frequency spectrum vector and the power spectrum vector of the current wave beam, calculating comprehensive voice signal energy of the current wave beam, wherein the comprehensive voice signal energy is the product of comprehensive energy of the current wave beam and a comprehensive voice existence probability, the comprehensive energy indicates the energy level of the wave beam output signal of the current wave beam, the comprehensive voice existence probability indicates an existence probability of voice in the wave beam output signal of the current wave beam, and the comprehensive voice existence probability and the comprehensive energy are scalar quantities (106); and (c) selecting the wave beam with a maximal comprehensive voice signal energy value as the output wave beam (110).
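Step (c) reduces to an argmax over one scalar per beam. A minimal sketch, assuming each beam already has a power spectrum vector and a per-bin voice existence probability: the abstract does not say how the two scalars are derived from those vectors, so the averaging below and both function names are assumptions.

```python
from typing import List, Sequence, Tuple


def comprehensive_voice_energy(power_spectrum: Sequence[float],
                               voice_probability: Sequence[float]) -> float:
    """Scalar score for one beam: comprehensive energy times comprehensive
    voice existence probability (both taken as plain means, an assumption)."""
    if not power_spectrum or len(power_spectrum) != len(voice_probability):
        raise ValueError("expected two non-empty vectors of equal length")
    energy = sum(power_spectrum) / len(power_spectrum)
    probability = sum(voice_probability) / len(voice_probability)
    return energy * probability


def select_output_beam(
        beams: List[Tuple[Sequence[float], Sequence[float]]]) -> int:
    """Index of the beam with the largest comprehensive voice signal energy."""
    scores = [comprehensive_voice_energy(p, q) for p, q in beams]
    return max(range(len(scores)), key=scores.__getitem__)
```

Multiplying the two scalars means a loud beam dominated by noise can lose to a quieter beam that is more likely to contain speech.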

System and Method for Self-attention-based Combining of Multichannel Signals for Speech Processing

A method, computer program product, and computing system for receiving a plurality of signals from a plurality of microphones, thus defining a plurality of channels. A weighted multichannel representation of the plurality of channels may be generated. A plurality of weights for each channel of the plurality of channels may be generated based upon, at least in part, the weighted multichannel representation of the plurality of channels. A single channel representation of the plurality of channels may be generated based upon, at least in part, the weighted multichannel representation of the plurality of channels and the plurality of weights generated for each channel of the plurality of channels.
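The weight-then-combine flow can be illustrated with a toy attention step: score each channel, turn the scores into weights with a softmax, and sum the weighted channels into one. In this sketch the score is a scaled dot product between a channel and the cross-channel mean, which stands in for the learned self-attention the title refers to; that scoring rule and the function names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_channels(channels):
    """Collapse equal-length channels into one using attention-style weights.

    Returns (combined_signal, weights), one weight per channel.
    """
    length = len(channels[0])
    mean = [sum(ch[i] for ch in channels) / len(channels) for i in range(length)]
    # Scaled dot-product score of each channel against the channel mean.
    scores = [
        sum(a * b for a, b in zip(ch, mean)) / math.sqrt(length)
        for ch in channels
    ]
    weights = softmax(scores)
    combined = [
        sum(w * ch[i] for w, ch in zip(weights, channels))
        for i in range(length)
    ]
    return combined, weights
```

Channels that agree with the consensus receive larger weights, so a single silent or outlying channel contributes less to the combined signal.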

Distributed audio capturing techniques for virtual reality (VR), augmented reality (AR), and mixed reality (MR) systems

Systems, devices, and methods for capturing audio which can be used in applications such as virtual reality, augmented reality, and mixed reality systems. Some systems can include a plurality of distributed monitoring devices. Each monitoring device can include a microphone and a location tracking unit. The monitoring devices can capture audio signals in an environment, as well as location tracking signals which respectively indicate the locations of the monitoring devices over time during capture of the audio signals. The system can also include a processor to receive the audio signals and the location tracking signals. The processor can determine one or more acoustic properties of the environment based on the audio signals and the location tracking signals.