G10L2021/02087

VOICE REINFORCEMENT IN MULTIPLE SOUND ZONE ENVIRONMENTS

A microphone signal is received from at least one microphone. Acoustic echo cancellation (AEC) produces an echo-cancelled microphone signal using first adaptive filters to estimate and cancel feedback that is a result of the environment. Acoustic feedback cancellation (AFC) produces a processed microphone signal using second adaptive filters to estimate and cancel feedback resulting from application of the reinforced voice signal within the environment. The uttered speech is reinforced in the processed microphone signal to produce the reinforced voice signal. The reinforced voice signal and the audio signal are applied to the loudspeakers. The adaptation step size of the second adaptive filters may be increased in response to detection of reverberation in the microphone signal. The reverberation used to control the step size of the second adaptive filters may be added artificially. This may provide multiple benefits, including faster adaptation of the second adaptive filters and an improved sound impression of the voice.
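The step-size control described above can be illustrated with a minimal sketch. It assumes a normalized LMS (NLMS) update for the second adaptive filters, which the abstract does not specify. The function names, the two step-size values, and the boolean reverberation flag are hypothetical.

```python
def nlms_step(weights, ref_block, mic_sample, step_size, eps=1e-8):
    """One normalized-LMS update; returns (new_weights, error)."""
    estimate = sum(w * x for w, x in zip(weights, ref_block))
    error = mic_sample - estimate  # residual after cancelling the estimate
    power = sum(x * x for x in ref_block) + eps  # eps avoids division by zero
    new_weights = [w + step_size * error * x / power
                   for w, x in zip(weights, ref_block)]
    return new_weights, error


def choose_step_size(reverb_detected, base=0.05, boosted=0.5):
    """Assumed policy: use a larger step size while reverberation is detected."""
    return boosted if reverb_detected else base


def identify_path(true_path, samples, step_size):
    """Adapt a filter toward a known feedback path (noise-free simulation)."""
    taps = len(true_path)
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference sample first
    for x in samples:
        history = [x] + history[:-1]
        mic = sum(h * s for h, s in zip(true_path, history))
        weights, _ = nlms_step(weights, history, mic, step_size)
    return weights
```

In this noise-free simulation the larger step size brings the filter closer to the true path within the same number of samples, which is the adaptation benefit the abstract refers to. A real system would also have to handle noise and double talk, which this sketch ignores.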

Multi-stream target-speech detection and channel fusion

Audio processing systems and methods include an audio sensor array configured to receive a multichannel audio input and generate a corresponding multichannel audio signal, target-speech detection logic, and an automatic speech recognition engine or VoIP application. An audio processing device includes a target speech enhancement engine configured to analyze a multichannel audio input signal and generate a plurality of enhanced target streams, a multi-stream target-speech detection generator comprising a plurality of target-speech detector engines each configured to determine a probability of detecting a specific target-speech of interest in the stream, wherein the multi-stream target-speech detection generator is configured to determine a plurality of weights associated with the enhanced target streams, and a fusion subsystem configured to apply the plurality of weights to the enhanced target streams to generate an enhancement output signal.
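The fusion step can be sketched as follows. The abstract does not say how the weights are derived from the detection probabilities, so this sketch assumes they are simply the probabilities normalized to sum to one. Both function names are hypothetical.

```python
def fusion_weights(detection_probs, eps=1e-12):
    """Assumed rule: normalize per-stream detection probabilities into weights."""
    total = sum(detection_probs)
    if total < eps:
        # No stream shows the target speech; fall back to equal weights.
        return [1.0 / len(detection_probs)] * len(detection_probs)
    return [p / total for p in detection_probs]


def fuse_streams(streams, weights):
    """Weighted sum of equally long enhanced streams, sample by sample."""
    length = len(streams[0])
    return [sum(w * s[i] for w, s in zip(weights, streams))
            for i in range(length)]
```

For example, `fuse_streams(streams, fusion_weights(probs))` lets the stream whose detector reports the highest probability dominate the enhancement output.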

Audio device and method of audio processing with improved talker discrimination

An audio device for improved talker discrimination is provided. To improve suppression of close talker interference, the audio device comprises at least a first and a second audio input to receive a first and second voice input signal; a first filter bank, configured to provide a plurality of first sub-band signals; a second filter bank, configured to provide a plurality of second sub-band signals; a correlator, configured to determine at least one signal correlation between at least a group of the first sub-band signals and at least a group of the second sub-band signals; and an attenuator, arranged to receive at least the group of the first sub-band signals and configured to conduct signal attenuation on the group of the first sub-band signals to provide gain-controlled sub-band signals, wherein the signal attenuation is based on the determined at least one signal correlation.
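A minimal sketch of correlation-based sub-band attenuation follows. It assumes the filter banks have already produced one short frame per sub-band for each input. The gain rule (one minus the absolute normalized correlation, limited by a floor) is an illustrative assumption, not the rule claimed in the abstract, and all names are hypothetical.

```python
def normalized_correlation(a, b, eps=1e-12):
    """Zero-lag normalized cross-correlation of two equally long frames."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return num / (den + eps)


def correlation_gain(corr, floor=0.1):
    """Assumed rule: higher correlation gives more attenuation, down to floor."""
    return max(floor, 1.0 - abs(corr))


def attenuate_subbands(first_bands, second_bands, floor=0.1):
    """Apply a per-band gain to the first input's sub-band frames."""
    out = []
    for a, b in zip(first_bands, second_bands):
        gain = correlation_gain(normalized_correlation(a, b), floor)
        out.append([gain * x for x in a])
    return out
```

Sub-bands in which both inputs carry the same content are reduced to the floor gain, while uncorrelated sub-bands pass unchanged.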

Online target-speech extraction method based on auxiliary function for robust automatic speech recognition

A target speech signal extraction method for robust speech recognition includes: initializing a steering vector for a target speech source and an adaptive vector, setting a real output channel of the target speech source as an output by the adaptive vector, initializing adaptive vectors for a noise and setting a dummy channel as an output by the adaptive vectors for the noise; setting a cost function for minimizing dependency between a real output for the target speech source and a dummy output for the noise; setting an auxiliary function to the cost function, and updating the adaptive vector for the target speech source and the adaptive vectors for the noise by using the auxiliary function and the steering vector; estimating the target speech signal by using the adaptive vector thereby extracting the target speech signal from the input signals; and updating the steering vector for the target speech source.

ACOUSTIC CROSSTALK SUPPRESSION DEVICE AND ACOUSTIC CROSSTALK SUPPRESSION METHOD

An acoustic crosstalk suppression device includes a speaker estimation unit configured to estimate a main speaker based on voice signals collected by n microphones corresponding to n persons (n: an integer equal to or larger than 3); n filter update units, each of which is configured to update a parameter of a filter configured to generate a suppression signal of a crosstalk component included in a voice signal of the main speaker; and a crosstalk suppression unit configured to suppress the crosstalk component by using a synthesis suppression signal generated by at most (n-1) filter update units corresponding to reference signals collected by at most (n-1) microphones.
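The suppression path can be sketched under strong simplifying assumptions: the main speaker is taken to be the microphone with the highest frame energy, and each crosstalk filter is a fixed single-tap gain rather than an adaptively updated filter. Neither assumption comes from the abstract, and the function names are hypothetical.

```python
def estimate_main_speaker(frames):
    """Assumed rule: the microphone with the highest frame energy is the main speaker."""
    energies = [sum(x * x for x in frame) for frame in frames]
    return max(range(len(frames)), key=energies.__getitem__)


def suppress_crosstalk(frames, main, gains):
    """Subtract a synthesized suppression signal built from the other microphones.

    gains maps a reference microphone index to a hypothetical single-tap
    filter gain toward the main speaker's microphone.
    """
    refs = [j for j in range(len(frames)) if j != main]  # at most n-1 references
    out = []
    for i, sample in enumerate(frames[main]):
        synthesized = sum(gains[j] * frames[j][i] for j in refs)
        out.append(sample - synthesized)
    return out
```

With n microphones, the synthesized suppression signal combines contributions from at most n-1 reference microphones, matching the structure described above.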

SYSTEMS, METHODS, AND DEVICES FOR AUDIO CORRECTION
20220417659 · 2022-12-29

Systems, methods, and devices relating to audio correction are described. A first portion of content including first spoken audio content indicating first word(s) may be determined. Background audio content of the first portion of the content may be determined. A voice profile may be determined based on the first spoken audio content. Based on the voice profile, second spoken audio content indicating second word(s) to replace the first word(s) may be generated. Based on mixing the background audio content and the second spoken audio content, a second portion of content may be determined. In the content, the first portion of the content may be replaced with the generated second portion of content.

AUDIO SIGNAL PROCESSING DEVICE, AUDIO SIGNAL PROCESSING METHOD, AND STORAGE MEDIUM
20220392472 · 2022-12-08

An audio signal processing device comprises: a determination unit that determines a first voice segment for a target speaker linked to a host device on the basis of an externally acquired first audio signal; a sharing unit that transmits the first audio signal and the first voice segment to another device linked to a non-target speaker and receives a second audio signal and a second voice segment associated with the non-target speaker from the other device; an estimation unit that estimates the voice of the non-target speaker mixed in the first audio signal on the basis of the second audio signal and the second voice segment that are received and an estimation parameter associated with the target speaker that is acquired; and a removal unit that removes the voice of the non-target speaker from the first audio signal.

SPEECH ENHANCEMENT TECHNIQUES THAT MAINTAIN SPEECH OF NEAR-FIELD SPEAKERS

An endpoint selectively enhances a captured audio signal based on an operating mode. The endpoint obtains an audio input signal of multiple users in a physical location. The audio input signal is captured by a microphone. The endpoint separates voice signals from the audio input signal and determines an operating mode for an audio output signal. The endpoint selectively adjusts each of the voice signals based on the operating mode to generate the audio output signal.

SYSTEM AND METHOD FOR AUGMENTING VEHICLE PHONE AUDIO WITH BACKGROUND SOUNDS
20220383893 · 2022-12-01

A vehicle infotainment system adds background sounds to an outgoing call on a mobile device. The infotainment system comprises: i) a database of selectable augmenting audio signals; and ii) audio processing circuitry configured to receive at a first input an uplink signal from the infotainment system and receive at a second input a selected augmenting audio signal. The audio processing circuitry adapts a spectrum of the selected augmenting audio signal to prevent the selected augmenting audio signal from masking the uplink signal and combines the adapted selected augmenting audio signal and the uplink signal to produce an augmented uplink signal at an output.
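One way to picture the spectral adaptation is a per-band gain that keeps the augmenting signal a fixed margin below the uplink signal. The sketch below assumes per-band power estimates are already available. The 10 dB margin, the gain rule, and the function name are illustrative assumptions, not details from the abstract.

```python
def masking_safe_gains(uplink_power, background_power, margin_db=10.0, eps=1e-12):
    """Per-band amplitude gains for the augmenting (background) signal.

    Assumed rule: in each band the background power after scaling stays at
    least margin_db below the uplink power; bands already below that limit
    are left untouched.
    """
    ratio = 10.0 ** (-margin_db / 10.0)  # power ratio for the margin
    gains = []
    for up, bg in zip(uplink_power, background_power):
        allowed = up * ratio
        if bg <= allowed:
            gains.append(1.0)
        else:
            # Amplitude gain is the square root of the power ratio.
            gains.append((allowed / (bg + eps)) ** 0.5)
    return gains
```

Note that under this rule a band with no uplink energy has its background fully removed; a practical design might instead allow a small absolute background level there.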

Personal hearing device, external acoustic processing device and associated computer program product

Disclosed are a personal hearing device, an external acoustic processing device, and an associated computer program product. The personal hearing device includes: a microphone, for receiving an input acoustic signal, wherein the input acoustic signal is a mixture of sounds coming from a first acoustic source and from other acoustic source(s); a speaker; and an acoustic processing circuit, for automatically distinguishing within the input acoustic signal the sound of the first acoustic source from the sound of other acoustic source(s); wherein the acoustic processing circuit further processes the input acoustic signal by applying different modifications to the sound of the first acoustic source and to the sound of other acoustic source(s), whereby the acoustic processing circuit produces an output acoustic signal to be played on the speaker.