G10L21/034

MICROPHONE BOARD FOR FAR FIELD AUTOMATIC SPEECH RECOGNITION

Systems and techniques for a microphone board for far field automatic speech recognition are described herein. The microphone board may include a first plurality of microphones disposed along a circumference of a circle on a surface and a second plurality of microphones disposed along a line on the surface. First connections to the first plurality of microphones may be grouped together, and second connections to the second plurality of microphones may be grouped together. The first connections and the second connections may be provided to an entity external to the surface via a connector.

Methods and system for wideband signal processing in communication network
09847092 · 2017-12-19

The embodiments herein disclose a device and a method for controlling noise in a wideband communication system. In one embodiment, multiple microphones are provided for receiving wideband audio signals. A processor is configured to analyze the wideband audio signal received by each microphone. A unique signal pattern is then generated for each microphone from its analyzed wideband signal, and the signal patterns are compared to detect any that are identical. A controller is also provided for controlling the gains of those microphones that are detected to be receiving wideband audio signals with identical signal patterns.
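
The abstract does not say how a signal pattern is computed or how the gains are changed, so the following is only a minimal sketch under stated assumptions. The pattern here is a hypothetical quantized level envelope, samples are assumed to lie in [-1, 1], and the assumed policy is to leave one microphone of each identical group at full gain and attenuate the rest. The names `signal_pattern`, `control_gains`, and `duplicate_gain` are illustrative and do not come from the patent.

```python
from collections import defaultdict


def signal_pattern(samples, frame=4, levels=8):
    """Hypothetical pattern: quantized mean absolute level of each frame."""
    pattern = []
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        mean_abs = sum(abs(x) for x in chunk) / frame
        # Map the level (assumed in [0, 1]) onto a small number of bins.
        pattern.append(min(levels - 1, int(mean_abs * levels)))
    return tuple(pattern)


def control_gains(mic_signals, duplicate_gain=0.25):
    """Return a gain per microphone, attenuating mics with identical patterns.

    Assumed policy: within each group of identical patterns, the first
    microphone (by sorted id) keeps gain 1.0 and the others are attenuated.
    """
    groups = defaultdict(list)
    for mic_id, samples in mic_signals.items():
        groups[signal_pattern(samples)].append(mic_id)

    gains = {mic_id: 1.0 for mic_id in mic_signals}
    for mic_ids in groups.values():
        for mic_id in sorted(mic_ids)[1:]:
            gains[mic_id] = duplicate_gain
    return gains


mics = {"a": [0.5] * 8, "b": [0.5] * 8, "c": [0.1] * 8}
gains = control_gains(mics)
```

In this example microphones `a` and `b` produce the same pattern, so only `b` is attenuated, while `c` keeps full gain because its pattern differs.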

Automatic selective gain control of audio data for speech recognition
09842608 · 2017-12-12

This specification describes, among other things, a computer-implemented method. The method can include receiving a stream of audio data at a computing device. The stream of audio data can be segmented into a plurality of audio segments. Respective intensity levels are determined for each of the plurality of audio segments. For each of the plurality of audio segments and based on the respective intensity levels, a determination can be made as to whether the audio segment includes a speech signal. Selective gain control can be performed on the stream of audio data by automatically adjusting a gain of particular ones of the plurality of audio segments that are determined to include a speech signal.
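
The steps in this abstract (segment, measure intensity, classify, adjust gain) can be sketched roughly as follows. This is an illustration under assumptions, not the patented method: intensity is taken to be RMS, a segment is treated as speech when its RMS meets a fixed threshold, and speech segments are scaled toward a target RMS. The names `selective_gain`, `speech_threshold`, and `target_rms` are hypothetical.

```python
import math


def rms(segment):
    """Root-mean-square level of one audio segment."""
    return math.sqrt(sum(x * x for x in segment) / len(segment))


def selective_gain(samples, segment_len=4, speech_threshold=0.05, target_rms=0.5):
    """Apply gain only to segments judged (by an assumed RMS rule) to hold speech."""
    out = []
    for start in range(0, len(samples), segment_len):
        segment = samples[start:start + segment_len]
        level = rms(segment)
        if level >= speech_threshold:
            # Speech segment: scale it so its RMS reaches the target level.
            gain = target_rms / level
            out.extend(x * gain for x in segment)
        else:
            # Non-speech segment: pass through unchanged.
            out.extend(segment)
    return out


quiet = [0.01, -0.01, 0.01, -0.01]
speech = [0.1, -0.1, 0.1, -0.1]
adjusted = selective_gain(quiet + speech)
```

Leaving the quiet segment untouched is the point of the selective step: background noise is not boosted along with the speech.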

AUTOMATIC AUDIO ATTENUATION ON IMMERSIVE DISPLAY DEVICES
20170351485 · 2017-12-07

Examples disclosed herein relate to controlling volume on an immersive display device. One example provides a near-eye display device comprising a sensor subsystem, a logic subsystem, and a storage subsystem storing instructions executable by the logic subsystem to receive image sensor data from the sensor subsystem, present content comprising a visual component and an auditory component, while presenting the content, detect via the image sensor data that speech is likely being directed at a wearer of the near-eye display device, and in response to detecting that speech is likely being directed at the wearer, attenuate an aspect of the auditory component.

Ambient sound rendering for online meetings
09837100 · 2017-12-05

Techniques of conducting an online meeting involve outputting ambient sound to a participant of an online meeting. Along these lines, in an online meeting during which a participant wears headphones, the participant's computer receives microphone input that contains both speech from the participant and ambient sound that the participant may wish to hear. In response to receiving the microphone input, the participant's computer separates low-volume sounds from high-volume sounds. However, instead of suppressing this low-volume sound from the microphone input, the participant's computer renders this low-volume sound. In most cases, this low-volume sound represents ambient sound generated in the vicinity of the meeting participant. The participant's computer then mixes the low-volume sound with speech received from other conference participants to form output in such a way that the participant may distinguish this sound from the received speech. The participant's computer then provides the output to the participant's headphones.
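
A rough sketch of the flow described above, under assumptions the abstract does not state: low-volume and high-volume sound are separated frame by frame using a peak threshold, and the ambient sound is made distinguishable from the received speech simply by mixing it in at a reduced gain. The function names, the threshold, and `ambient_gain` are hypothetical.

```python
def extract_ambient(mic, frame=4, threshold=0.2):
    """Keep frames whose peak is below the threshold; silence the louder ones."""
    ambient = []
    for start in range(0, len(mic), frame):
        chunk = mic[start:start + frame]
        peak = max(abs(x) for x in chunk)
        if peak < threshold:
            ambient.extend(chunk)                # low-volume: treat as ambient sound
        else:
            ambient.extend([0.0] * len(chunk))   # high-volume: participant's own speech
    return ambient


def mix_for_headphones(mic, remote_speech, ambient_gain=0.5):
    """Mix the rendered ambient sound with speech received from other participants."""
    ambient = extract_ambient(mic)
    n = min(len(ambient), len(remote_speech))
    return [remote_speech[i] + ambient_gain * ambient[i] for i in range(n)]


mic_input = [0.1] * 4 + [0.8] * 4   # a quiet ambient frame, then a loud speech frame
remote = [0.2] * 8
headphone_out = mix_for_headphones(mic_input, remote)
```

Only the first frame of the microphone input reaches the headphones; the loud frame, assumed to be the participant's own speech, is left out of the mix.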

Time domain level adjustment for audio signal decoding or encoding

An audio signal decoder for providing a decoded audio signal representation on the basis of an encoded audio signal representation has a decoder preprocessing stage for obtaining a plurality of frequency band signals from the encoded audio signal representation, a clipping estimator, a level shifter, a frequency-to-time-domain converter, and a level shift compensator. The clipping estimator analyzes the encoded audio signal representation and/or side information relative to a gain of the frequency band signals in order to determine a current level shift factor. The level shifter shifts levels of the frequency band signals according to the level shift factor. The frequency-to-time-domain converter converts the level shifted frequency band signals into a time-domain representation. The level shift compensator acts on the time-domain representation for at least partly compensating a corresponding level shift and for obtaining a substantially compensated time-domain representation.
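
The chain of clipping estimator, level shifter, frequency-to-time-domain converter, and level shift compensator can be illustrated with a toy model. Everything specific here is an assumption: the estimator uses the sum of absolute band gains as a worst-case peak, the converter is a plain sum of cosines rather than a real codec filterbank, and the headroom of 1.0 stands in for the limited range of the converter. The names `estimate_level_shift`, `synthesize`, and `decode` are hypothetical.

```python
import math


def estimate_level_shift(band_gains, headroom=1.0):
    """Assumed clipping estimator: scale so the worst-case sum fits the headroom."""
    worst_case = sum(abs(g) for g in band_gains)
    if worst_case <= headroom:
        return 1.0
    return headroom / worst_case


def synthesize(band_coeffs, n_samples):
    """Toy frequency-to-time converter: one cosine per frequency band."""
    return [
        sum(c * math.cos(2 * math.pi * (k + 1) * n / n_samples)
            for k, c in enumerate(band_coeffs))
        for n in range(n_samples)
    ]


def decode(band_coeffs, n_samples=8):
    shift = estimate_level_shift(band_coeffs)        # clipping estimator
    shifted = [c * shift for c in band_coeffs]       # level shifter
    time_signal = synthesize(shifted, n_samples)     # frequency-to-time conversion
    peak_inside = max(abs(x) for x in time_signal)   # peak seen by the converter
    compensated = [x / shift for x in time_signal]   # level shift compensator
    return shift, peak_inside, compensated


shift, peak_inside, decoded = decode([0.8, 0.6, 0.6])
```

With these coefficients the unshifted signal would peak near 2.0, so the estimator picks a shift of about 0.5. The converter then sees a peak of about 1.0, and the compensator restores the original level in the time domain.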
