G10L2021/02087

RECORDING MEETING AUDIO VIA MULTIPLE INDIVIDUAL SMARTPHONES
20220377458 · 2022-11-24

A method of providing audio information from a meeting includes receiving a first audio stream from a first input audio device and a second audio stream from a second input audio device during the meeting, identifying a first audio fragment from the first audio stream, and identifying a second audio fragment from the second audio stream. The method also includes compiling the audio fragments from the first and second audio streams into an audio file that includes at least the first audio fragment and the second audio fragment. The method further includes providing the audio file to one or more recipients. The audio file identifies the first audio fragment as corresponding to a first participant of the meeting and the second audio fragment as corresponding to a second participant of the meeting.
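
The compiling step described above can be illustrated with a minimal sketch. It assumes each device stream has already been cut into fragments carrying a start time and a participant label; the `Fragment` type, its field names, and `compile_fragments` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical fragment record; field names are illustrative only."""
    start: float       # seconds from the start of the meeting
    end: float
    participant: str   # label of the participant tied to the source device
    samples: tuple     # placeholder for the fragment's audio samples


def compile_fragments(*streams):
    """Merge fragments from several device streams into one time-ordered list."""
    merged = [frag for stream in streams for frag in stream]
    merged.sort(key=lambda frag: (frag.start, frag.participant))
    return merged


# Two smartphones, each assumed to belong to one participant.
stream_a = [Fragment(0.0, 2.5, "participant_1", (0.1, 0.2)),
            Fragment(6.0, 8.0, "participant_1", (0.3,))]
stream_b = [Fragment(2.5, 6.0, "participant_2", (0.4, 0.5))]

audio_file = compile_fragments(stream_a, stream_b)
speaker_order = [frag.participant for frag in audio_file]
```

Keeping the participant label on every fragment is what lets the compiled result identify who is speaking in each segment.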

Ambient sound rendering for online meetings
09837100 · 2017-12-05

Techniques of conducting an online meeting involve outputting ambient sound to a participant of an online meeting. Along these lines, in an online meeting during which a participant wears headphones, the participant's computer receives microphone input that contains both speech from the participant and ambient sound that the participant may wish to hear. In response to receiving the microphone input, the participant's computer separates low-volume sounds from high-volume sounds. However, instead of suppressing this low-volume sound from the microphone input, the participant's computer renders this low-volume sound. In most cases, this low-volume sound represents ambient sound generated in the vicinity of the meeting participant. The participant's computer then mixes the low-volume sound with speech received from other conference participants to form output in such a way that the participant may distinguish this sound from the received speech. The participant's computer then provides the output to the participant's headphones.
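
A rough sketch of the separate-then-mix idea follows. It assumes the microphone input arrives as short frames of float samples and that a per-frame RMS threshold is enough to tell low-volume from high-volume sound. The threshold, the `ambient_gain` parameter, and all function names are assumptions made for illustration, not details from the patent.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def split_by_volume(frames, threshold):
    """Return (low, high) frame lists; the other class is replaced by silence."""
    low, high = [], []
    for frame in frames:
        silence = [0.0] * len(frame)
        if frame_rms(frame) < threshold:
            low.append(list(frame))
            high.append(silence)
        else:
            low.append(silence)
            high.append(list(frame))
    return low, high


def mix_for_headphones(ambient, remote, ambient_gain=0.5):
    """Add attenuated ambient sound to the speech received from other participants."""
    return [[r + ambient_gain * a for a, r in zip(amb, rem)]
            for amb, rem in zip(ambient, remote)]


mic_frames = [[0.01, -0.01], [0.8, -0.8]]   # quiet ambient frame, loud speech frame
remote_frames = [[0.2, 0.2], [0.1, 0.1]]    # speech from other participants
low, high = split_by_volume(mic_frames, threshold=0.1)
headphone_out = mix_for_headphones(low, remote_frames)
```

A fixed gain is only one way to keep the ambient sound distinguishable from the received speech; the abstract does not say how the distinction is achieved.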

NOISE SUPPRESSION DEVICE AND NOISE SUPPRESSION METHOD
20170345440 · 2017-11-30

A noise suppression device includes: an adaptive filter unit that suppresses, using an adaptive filter, a noise component contained in a voice signal generated from a voice captured by a voice input unit to generate a corrected voice signal; a noise generation detection unit that detects timing of generation of the noise component in the voice signal; and a period suppression unit that suppresses the corrected voice signal during a predetermined period of time after the timing of the generation of the noise component.
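
The period suppression unit can be sketched as follows, assuming the noise generation detection unit reports onset positions as sample indices and the predetermined period is given as a sample count. The function name, the `hold_samples` parameter, and the default gain of zero are illustrative assumptions. The adaptive filter stage is not modeled here.

```python
def suppress_after_noise(corrected, noise_onsets, hold_samples, gain=0.0):
    """Attenuate the corrected signal for hold_samples after each noise onset."""
    out = list(corrected)
    for onset in noise_onsets:
        stop = min(onset + hold_samples, len(out))
        for i in range(onset, stop):
            out[i] = corrected[i] * gain
    return out


corrected_signal = [1.0] * 6
suppressed = suppress_after_noise(corrected_signal, noise_onsets=[2], hold_samples=3)
```

Muting for a fixed window covers the interval in which an adaptive filter may not yet have converged on a newly appearing noise component, which is one plausible reading of why the window follows the detection timing.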

Conversational Service

An apparatus including circuitry configured to: enable a conversational service between a first user of the apparatus and a second user of a remote apparatus wherein the conversational service is a duplex service including simultaneous voice communication from the first user to the second user and voice communication from the second user to the first user; and enable synchronization of a switch to using an active noise cancellation mode at the apparatus for the conversational service and at the remote apparatus for the conversational service, wherein the switch to using the noise cancellation mode is synchronized between the first and second users.

Apparatus and method for acoustic echo cancellation with occluded voice sensor
11670318 · 2023-06-06

An apparatus and method for acoustic echo cancellation utilizing an occluded voice sensor. The technology as disclosed and claimed herein, in its various implementations and embodiments, improves Acoustic Echo Cancellation (AEC) system performance by increasing cancellation quality and speech SNR. It does so by replacing the lower-frequency portion of the microphone signal with the signal of an Occluded Voice Sensor (OVS) and never including that spectral band of the microphone signal in the transmission. This frequency band replacement excludes the reinjection of that band from the speaker.

System and/or method for enhancing hearing using a camera module, processor and/or audio input and/or output devices
09807492 · 2017-10-31

An apparatus comprising a camera, a microphone, a speaker and a processor. The camera may be mounted to a first portion of a pair of eyeglasses. The microphone may be mounted to a second portion of the pair of eyeglasses. The speaker may be mounted to the pair of eyeglasses. The processor may be electronically connected to the camera, the microphone and the speaker. The processor may be configured to (i) associate a visual display of the movement of the lips of a target received from the camera with an audio portion of the target received from the microphone, (ii) filter out sounds not related to the target, and (iii) play, through the speaker, the sounds that were not filtered out.

Vehicle in cabin sound processing system
09800983 · 2017-10-24

A sound system of a vehicle includes a plurality of microphones disposed in a cabin of the vehicle, a plurality of speakers disposed in the cabin of the vehicle, and a sound processor operable to process microphone output signals of the microphones to determine a voice signal of a speaking occupant in the vehicle at or near one of the microphones. The sound processor generates a processor output signal that is provided to at least some of the speakers. Responsive to the processor output signal, the at least some of the speakers generate sound representative of the voice signal of the speaking occupant to direct the sound towards other occupants in the vehicle, while one or more speakers at or near the seat occupied by the speaking occupant do not generate sound representative of the voice signal of the speaking occupant.
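
The routing rule in this abstract can be sketched in a few lines. The sketch assumes one microphone level per seat and a known seat for every speaker, and it picks the talker as the seat with the highest microphone level. That selection rule, the seat names, and the `route_voice` function are assumptions for illustration only.

```python
def route_voice(mic_levels, speaker_seats):
    """Pick the talker's seat and the speakers that should reproduce the voice.

    mic_levels:    mapping of seat name -> microphone signal level
    speaker_seats: mapping of speaker id -> seat the speaker is at or near
    """
    talker_seat = max(mic_levels, key=mic_levels.get)
    active = [spk for spk, seat in speaker_seats.items() if seat != talker_seat]
    return talker_seat, active


mic_levels = {"driver": 0.9, "front_passenger": 0.1, "rear_left": 0.05}
speaker_seats = {"spk_fl": "driver", "spk_fr": "front_passenger", "spk_rl": "rear_left"}
talker, active_speakers = route_voice(mic_levels, speaker_seats)
```

Excluding the speaker at the talker's own seat keeps the occupant from hearing a delayed copy of their own voice.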

PERMUTATION INVARIANT TRAINING FOR TALKER-INDEPENDENT MULTI-TALKER SPEECH SEPARATION
20170337924 · 2017-11-23

The techniques described herein improve methods to equip a computing device to conduct automatic speech recognition (“ASR”) in talker-independent multi-talker scenarios. In some examples, permutation invariant training of deep learning models can be used for talker-independent multi-talker scenarios. In some examples, the techniques can determine a permutation-considered assignment between a model's estimate of a source signal and the source signal. In some examples, the techniques can include training the model generating the estimate to minimize a deviation of the permutation-considered assignment. These techniques can be implemented into a neural network's structure itself, solving the label permutation problem that prevented making progress on deep learning based techniques for speech separation. The techniques discussed herein can also include source tracing to trace streams originating from a same source through the frames of a mixed signal.
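
The permutation-considered assignment can be illustrated with a small, framework-free sketch: compute the error for every pairing of model outputs with reference sources and keep the smallest. Mean squared error is used here as the deviation measure, and the function names are hypothetical. The application's actual models and loss details are not reproduced.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, sources):
    """Return (loss, assignment) for the output-to-source pairing with least error.

    assignment[i] is the index of the source matched to estimate i.
    """
    best_loss, best_perm = None, None
    for perm in permutations(range(len(sources))):
        loss = sum(mse(estimates[i], sources[p]) for i, p in enumerate(perm))
        loss /= len(sources)
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


sources = [[1.0, 0.0], [0.0, 1.0]]
estimates = [[0.0, 1.0], [1.0, 0.0]]   # correct signals, but in swapped order
loss, assignment = pit_loss(estimates, sources)
```

Because the loss is taken over the best pairing, a model is not penalized for emitting the talkers in a different order than the labels, which is the label permutation problem the abstract refers to. The exhaustive search grows factorially with the number of talkers, so this form is practical only for small talker counts.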

Participant-tuned filtering using deep neural network dynamic spectral masking for conversation isolation and security in noisy environments

Isolating and amplifying a conversation between selected participants is provided. A plurality of spectral masks is received. Each spectral mask in the plurality corresponds to a respective participant in a selected group of participants included in a conversation. A composite spectral mask is generated by additive superposition of the plurality of spectral masks. The composite spectral mask is applied to sound captured by a microphone to filter out sounds that do not match the composite spectral mask and to amplify the remaining sounds that match it.
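
A minimal sketch of the mask arithmetic follows, assuming each spectral mask is a list of per-frequency-bin weights in [0, 1] and the captured sound is given as per-bin magnitudes. Clipping the summed mask at 1.0, the `floor` cutoff, and the `gain` value are assumptions added for illustration and are not stated in the abstract.

```python
def composite_mask(masks):
    """Additive superposition of per-participant masks, clipped to 1.0 (assumed)."""
    return [min(1.0, sum(bins)) for bins in zip(*masks)]


def apply_mask(magnitudes, mask, gain=2.0, floor=0.5):
    """Zero bins the mask does not cover; amplify the bins it does cover."""
    return [m * gain if w >= floor else 0.0 for m, w in zip(magnitudes, mask)]


mask_alice = [1.0, 0.0, 0.0, 0.0]   # hypothetical participant masks
mask_bob = [0.0, 0.0, 1.0, 0.0]
combined = composite_mask([mask_alice, mask_bob])
filtered = apply_mask([0.5, 0.5, 0.5, 0.5], combined)
```

Summing the masks means a frequency bin is kept if any selected participant occupies it, so the conversation as a whole passes through while unrelated sound is removed.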

Detecting self-generated wake expressions

A speech-based audio device may be configured to detect a user-uttered wake expression and to respond by interpreting subsequent words or phrases as commands. In order to distinguish between utterance of the wake expression by the user and generation of the wake expression by the device itself, directional audio signals may be analyzed to detect whether the wake expression has been received from multiple directions. If the wake expression has been received from many directions, it is declared as being generated by the audio device and ignored. Otherwise, if the wake expression is received from a single direction or a limited number of directions, the wake expression is declared as being uttered by the user and subsequent words or phrases are interpreted and acted upon by the audio device.
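
The decision rule can be sketched as a count over directional detections. The sketch assumes a wake-expression detector has already produced one boolean per directional audio signal. The `max_user_directions` threshold and the `classify_wake` function are hypothetical, since the abstract only distinguishes "many" directions from "a single direction or a limited number".

```python
def classify_wake(detections, max_user_directions=1):
    """Classify a wake expression from per-direction detection flags.

    Returns "none" if no direction detected it, "user" if it came from a
    limited number of directions, and "self" if it came from many directions.
    """
    count = sum(1 for detected in detections if detected)
    if count == 0:
        return "none"
    return "user" if count <= max_user_directions else "self"


from_user = classify_wake([True, False, False, False])
from_device = classify_wake([True, True, True, True])
```

The rule relies on the device's own loudspeaker output reaching the microphones from many directions through reflections, while a person speaking occupies one position in the room.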