G10L2021/02166

METHOD FOR SELECTING OUTPUT WAVE BEAM OF MICROPHONE ARRAY
20220399028 · 2022-12-15

A method for selecting an output wave beam of a microphone array, comprising: (a) receiving a plurality of voice signals from the microphone array comprising a plurality of microphones, and performing beamforming on the voice signals to obtain a plurality of wave beams and corresponding wave beam output signals (102); (b) performing the following operation on each wave beam: converting the wave beam output signal of a current wave beam to frequency domain from time domain to obtain a frequency spectrum vector and a power spectrum vector of the current wave beam (104); on the basis of the frequency spectrum vector and the power spectrum vector of the current wave beam, calculating comprehensive voice signal energy of the current wave beam, wherein the comprehensive voice signal energy is the product of comprehensive energy of the current wave beam and a comprehensive voice existence probability, the comprehensive energy indicates the energy level of the wave beam output signal of the current wave beam, the comprehensive voice existence probability indicates an existence probability of voice in the wave beam output signal of the current wave beam, and the comprehensive voice existence probability and the comprehensive energy are scalar quantities (106); and (c) selecting the wave beam with a maximal comprehensive voice signal energy value as the output wave beam (110).
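The selection rule in steps (b) and (c) can be sketched as follows. This is a minimal illustration rather than the patented computation: the abstract does not say how the scalar energy or the scalar voice existence probability is derived, so `comprehensive_energy` (mean of the power spectrum), `speech_presence_probability` (a simple SNR-style ratio averaged over bins), and the `noise_floor` parameter are all assumptions introduced here.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of a real frame; returns bins 0..N/2 (the frequency spectrum vector)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def comprehensive_energy(power):
    # Assumption: the scalar energy level is the mean of the power spectrum.
    return sum(power) / len(power)


def speech_presence_probability(power, noise_floor):
    # Assumption: per-bin ratio p / (p + noise_floor), averaged to one scalar.
    probs = [p / (p + noise_floor) for p in power]
    return sum(probs) / len(probs)


def select_beam(beam_frames, noise_floor=1e-3):
    """Return (index of selected beam, per-beam scores).

    beam_frames: one time-domain frame (list of floats) per beam.
    """
    scores = []
    for frame in beam_frames:
        spectrum = dft(frame)                      # time domain -> frequency domain
        power = [abs(c) ** 2 for c in spectrum]    # power spectrum vector
        energy = comprehensive_energy(power)
        probability = speech_presence_probability(power, noise_floor)
        scores.append(energy * probability)        # comprehensive voice signal energy
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

Because both factors are scalars, step (c) reduces to a single argmax over the beams.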

System and Method for Self-attention-based Combining of Multichannel Signals for Speech Processing

A method, computer program product, and computing system for receiving a plurality of signals from a plurality of microphones, thus defining a plurality of channels. A weighted multichannel representation of the plurality of channels may be generated. A plurality of weights for each channel of the plurality of channels may be generated based upon, at least in part, the weighted multichannel representation of the plurality of channels. A single channel representation of the plurality of channels may be generated based upon, at least in part, the weighted multichannel representation of the plurality of channels and the plurality of weights generated for each channel of the plurality of channels.
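One way to read the three stages above is sketched below, using scaled dot-product self-attention across channels. The abstract does not specify the attention form, the feature representation, or how the per-channel weights are computed, so the function name `combine_channels`, the use of raw feature vectors as both queries and keys, and the mean-based channel score are assumptions made only for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def combine_channels(features):
    """features: list of C per-channel feature vectors, each of length D.

    Returns (single-channel representation, per-channel weights).
    """
    d = len(features[0])
    scale = math.sqrt(d)

    # Stage 1: self-attention across channels gives the weighted
    # multichannel representation (one attended vector per channel).
    weighted = []
    for query in features:
        attn = softmax([dot(query, key) / scale for key in features])
        weighted.append(
            [sum(a * f[i] for a, f in zip(attn, features)) for i in range(d)]
        )

    # Stage 2 (assumption): one scalar score per channel, taken as the mean
    # of its attended features and normalised with softmax.
    weights = softmax([sum(row) / d for row in weighted])

    # Stage 3: weighted sum over channels gives the single-channel representation.
    single = [sum(w * row[i] for w, row in zip(weights, weighted)) for i in range(d)]
    return single, weights
```

When all channels carry the same features, the attention and the channel weights are uniform and the output equals the common input, which is a useful sanity check for any variant of this scheme.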

Vehicular apparatus, vehicle, operation method of vehicular apparatus, and storage medium
11521615 · 2022-12-06

A vehicular apparatus having at least one of a voice calling function and a voice recognition function, the apparatus comprising a voice input unit including a plurality of microphones, the voice input unit being disposed between a driver's seat and a passenger seat with respect to a vehicle width direction; and a control unit configured to control a directionality direction and a gain level of each of the plurality of microphones, wherein the control unit controls the directionality directions of the plurality of microphones in two directions, the two directions being a driver's seat side and a passenger seat side, and controls a gain level on the passenger seat side to be lower than a gain level on the driver's seat side.

Method and apparatus for using a test audio pattern to generate an audio signal transform for use in performing acoustic echo cancellation
11521636 · 2022-12-06

A test audio pattern is sent to the speaker of the participant computer for outputting by the speaker. A computer receives a microphone input signal from the participant computer that includes the test audio pattern outputted by the speaker of the participant computer, and any ambient noise picked up by the speaker of the participant computer. Ambient noise suppression is performed to cancel out any ambient noise in the microphone input signal picked up by the speaker of the participant computer. The test audio pattern sent to the speaker of the participant computer is compared with the noise-suppressed microphone input signal which includes the test audio pattern outputted by the speaker of the participant computer. An audio signal transform is generated from the comparison. The generated audio signal transform is subsequently used for performing acoustic echo cancellation of streaming audio received from the microphone input signal when the participant computer receives streaming audio and the participants engage in remote audio communications with each other.
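The compare-then-cancel idea can be sketched as below. The abstract does not state how the comparison is performed or what form the transform takes; this sketch assumes a per-bin frequency-domain ratio between the recorded and the emitted test pattern, a circular (periodic) signal model, and that ambient noise suppression has already been applied. The names `estimate_transform` and `cancel_echo` are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive full-length DFT (or inverse DFT) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * math.pi * k * t / n) for t, v in enumerate(x))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def estimate_transform(test_pattern, mic_signal, eps=1e-12):
    """Assumed comparison: per-bin ratio Y[k] / X[k], regularised by eps."""
    X = dft(test_pattern)
    Y = dft(mic_signal)
    return [y * x.conjugate() / (abs(x) ** 2 + eps) for x, y in zip(X, Y)]


def cancel_echo(far_end, mic_signal, transform):
    """Predict the echo of far_end through the transform and subtract it."""
    F = dft(far_end)
    echo = dft([h * f for h, f in zip(transform, F)], inverse=True)
    return [m - e.real for m, e in zip(mic_signal, echo)]
```

The transform is estimated once from the test pattern and then reused on later streaming audio, which mirrors the two phases described in the abstract.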

Spatially informed audio signal processing for user speech

A device implementing a system for processing speech in an audio signal includes at least one processor configured to receive an audio signal corresponding to at least one microphone of a device, and to determine, using a first model, a first probability that a speech source is present in the audio signal. The at least one processor is further configured to determine, using a second model, a second probability that an estimated location of a source of the audio signal corresponds to an expected position of a user of the device, and to determine a likelihood that the audio signal corresponds to the user of the device based on the first and second probabilities.
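A minimal sketch of fusing the two probabilities follows. The abstract does not describe either model or the combination rule; here the second probability is assumed to fall off as a Gaussian in the angular difference between estimated and expected direction, and the two probabilities are assumed independent so that their product serves as the likelihood. The function names and `sigma_deg` are hypothetical.

```python
import math


def location_match_probability(estimated_deg, expected_deg, sigma_deg=15.0):
    """Assumed second model: Gaussian fall-off in wrapped angular difference."""
    diff = (estimated_deg - expected_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return math.exp(-0.5 * (diff / sigma_deg) ** 2)


def user_speech_likelihood(p_speech, p_location):
    """Assumed fusion rule: product of the two probabilities (independence)."""
    return p_speech * p_location
```

For example, a confident speech detection from a direction far from the expected user position yields a low combined likelihood, which is the behaviour the abstract aims for.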

Low power mode for speech capture devices

A system configured to enable a Wi-Fi processor to enter a low power mode (LPM) for short periods of time without compromising functionality is provided. A device reduces power consumption by enabling the Wi-Fi processor to enter LPM with scheduled wakeup events to enable specific functionality. In some examples, the Wi-Fi processor toggles between LPM and an active mode based on a first duty cycle to enable new device provisioning. The first duty cycle corresponds to a time required to scan a plurality of wireless channels, waking the Wi-Fi processor at a first frequency to monitor for incoming probe requests. In other examples, the Wi-Fi processor uses a second duty cycle chosen to maintain time synchronicity between a time master device and time follower devices. The device sets the second duty cycle to wake the Wi-Fi processor at a second frequency to exchange data packets with synchronized devices.

Splitting frequency-domain processing between multiple DSP cores

An audio processing system may split frequency-domain processing between multiple DSP cores. Processing multi-channel audio data—e.g., from devices with multiple speakers—may require more computing power than available on a single DSP core. Such processing typically occurs in the frequency domain; DSP cores, however, typically communicate via ports configured for transferring data in the time-domain. Converting frequency-domain data into the time domain for transfer requires additional resources and introduces lag. Furthermore, transferring frequency-domain data may result in scheduling issues due to a mismatch between buffer size, bit rate, and the size of the frequency-domain data chunks transferred. However, the buffer size and bit rate may be artificially configured to transfer a chunk of frequency-domain data corresponding to a delay in the communication mechanism used by the DSP cores. In this manner, frequency-domain data can be transferred with a proper periodicity.

CONFERENCE ROOM SYSTEM AND AUDIO PROCESSING METHOD
20220375486 · 2022-11-24

An audio processing method includes the following steps: capturing audio data with a microphone array and computing frequency array data from the audio data; computing a power sequence of degrees from the frequency array data; and computing the difference between the maximum value and the minimum value of the power sequence of degrees to determine whether the degree corresponding to the maximum value is a source degree relative to the microphone array.
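The final decision step can be sketched as follows, taking the power sequence of degrees as given. The abstract does not say how the difference is used; this sketch assumes it is compared with a threshold, and both `detect_source_degree` and `threshold` are hypothetical names.

```python
def detect_source_degree(power_by_degree, threshold):
    """power_by_degree: dict mapping a candidate angle in degrees to its power.

    Returns the angle with maximum power if the max-min spread reaches the
    (assumed) threshold, otherwise None.
    """
    max_degree = max(power_by_degree, key=power_by_degree.get)
    spread = power_by_degree[max_degree] - min(power_by_degree.values())
    return max_degree if spread >= threshold else None
```

A nearly flat power sequence gives a small spread, so no direction is reported; a pronounced peak gives a large spread and its angle is accepted as the source degree.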

Acoustic output device and buttons thereof

The present disclosure relates to an acoustic output device including an earphone core, a controller, a Bluetooth module, and a button module. The earphone core may include at least one low-frequency acoustic driver configured to output sounds from at least two first guiding holes and at least one high-frequency acoustic driver configured to output sounds from at least two second guiding holes. The controller may be configured to direct the at least one low-frequency acoustic driver to output the sounds in a first frequency range and direct the at least one high-frequency acoustic driver to output the sounds in a second frequency range. The Bluetooth module may be configured to connect the acoustic output device with at least one terminal device. The button module may be configured to implement an interaction between a user of the acoustic output device and the acoustic output device.

Speech translation device, speech translation method, and recording medium

A speech translation device, for conversation between a first speaker making an utterance in a first language and a second speaker making an utterance in a second language different from the first language, includes: a speech detector that detects, from sounds that are input, a speech segment in which the first speaker or the second speaker made an utterance; a display that, after speech recognition is performed on the utterance, displays a translation result obtained by translating the utterance from the first language to the second language or from the second language to the first language; and an utterance instructor that outputs, in the second language via the display, a message prompting the second speaker to make an utterance after a first speaker's utterance or outputs, in the first language via the display, a message prompting the first speaker to make an utterance after a second speaker's utterance.