G10L21/0364

SPEECH ENHANCEMENT TECHNIQUES THAT MAINTAIN SPEECH OF NEAR-FIELD SPEAKERS

An endpoint selectively enhances a captured audio signal based on an operating mode. The endpoint obtains an audio input signal of multiple users in a physical location. The audio input signal is captured by a microphone. The endpoint separates voice signals from the audio input signal and determines an operating mode for an audio output signal. The endpoint selectively adjusts each of the voice signals based on the operating mode to generate the audio output signal.
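The mode-dependent adjustment described above could be sketched as follows. The separation step is assumed to have already produced one signal per talker; the mode names (`near_field`, `all`), the RMS-energy distance proxy, and the 0.1 attenuation factor are illustrative assumptions, not the patent's method.

```python
import numpy as np

def adjust_voices(voice_signals, mode):
    """Selectively gain-adjust separated voice signals by operating mode.

    voice_signals: list of 1-D numpy arrays (one per separated talker)
    mode: 'near_field' keeps the loudest (presumed closest) talker
          dominant; 'all' passes every talker through unchanged.
    """
    if mode == "all":
        gains = [1.0] * len(voice_signals)
    elif mode == "near_field":
        # Crude proxy for talker distance: the highest-RMS voice is
        # treated as the near-field speaker; others are attenuated.
        rms = [np.sqrt(np.mean(v ** 2)) for v in voice_signals]
        loudest = int(np.argmax(rms))
        gains = [1.0 if i == loudest else 0.1
                 for i in range(len(voice_signals))]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Mix the gain-adjusted voices into the audio output signal
    return sum(g * v for g, v in zip(gains, voice_signals))
```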

Method for processing an acoustic speech input signal and audio processing device
11523228 · 2022-12-06

The invention relates to a method for processing an acoustic input signal, preferably a speech signal, said method comprising the following steps: a) receiving a digital representation (S_in) of an acoustic input signal, b) calculating at least one statistical parameter (P) of the digital representation (S_in) of the acoustic input signal, c) calculating a compression ratio function (CR_f) based on a prescribed constant compression ratio (CR_pr), said prescribed constant compression ratio (CR_pr) uniformly mapping acoustic input signals of a selected magnitude to acoustic output signals of a selected magnitude, and on at least one statistical parameter (P) calculated in step b), and d) applying the non-uniform compression ratio function (CR_f) according to step c) to the digital representation (S_in) of the acoustic input signal, delivering a digital representation (S_out) of an enhanced acoustic output signal.
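A minimal numeric sketch of steps b) through d): the statistical parameter P is taken here to be the block's crest factor, and the interpolation from 1:1 toward CR_pr is an assumed parametrisation that the abstract does not fix.

```python
import numpy as np

def adaptive_compress(s_in, cr_pr=4.0, threshold_db=-20.0):
    """Level-dependent compression sketch (hypothetical parametrisation).

    cr_pr: prescribed constant compression ratio CR_pr.
    P is the crest factor of the block; the compression ratio function
    CR_f bends from 1:1 (no compression) toward CR_pr as P grows.
    """
    eps = 1e-12
    rms = np.sqrt(np.mean(s_in ** 2)) + eps
    peak = np.max(np.abs(s_in)) + eps
    p = peak / rms                                 # statistical parameter P
    cr_f = 1.0 + (cr_pr - 1.0) * min(p / 4.0, 1.0)  # CR_f from CR_pr and P
    # Static gain computer: attenuate samples above threshold by CR_f
    level_db = 20.0 * np.log10(np.abs(s_in) + eps)
    over = np.maximum(level_db - threshold_db, 0.0)
    gain_db = -over * (1.0 - 1.0 / cr_f)
    return s_in * 10.0 ** (gain_db / 20.0)         # S_out
```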

SYSTEM AND METHOD FOR AUGMENTING VEHICLE PHONE AUDIO WITH BACKGROUND SOUNDS
20220383893 · 2022-12-01

A vehicle infotainment system that adds background sounds to an outgoing call on a mobile device. The infotainment system comprises: i) a database of selectable augmenting audio signals; and ii) audio processing circuitry configured to receive at a first input an uplink signal from the infotainment system and receive at a second input a selected augmenting audio signal. The audio processing circuitry adapts the spectrum of the selected augmenting audio signal to prevent it from masking the uplink signal, and combines the adapted augmenting audio signal with the uplink signal to produce an augmented uplink signal at an output.
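One plausible reading of the spectral adaptation step, sketched as a per-frequency-bin ducking rule; the FFT size, the attenuation floor, and the weighting formula are illustrative assumptions rather than the claimed circuitry.

```python
import numpy as np

def augment_uplink(uplink, augment, n_fft=256, floor=0.2):
    """Duck the augmenting (background) signal wherever the uplink
    speech carries spectral energy, so the background cannot mask the
    speech, then combine the two into an augmented uplink signal.
    """
    u = np.fft.rfft(uplink, n_fft)
    a = np.fft.rfft(augment, n_fft)
    u_mag = np.abs(u)
    # Per-bin weight: full level where speech is absent, down to
    # `floor` where the speech energy peaks.
    weight = 1.0 - (1.0 - floor) * (u_mag / (u_mag.max() + 1e-12))
    adapted = np.fft.irfft(a * weight, n_fft)[: len(augment)]
    return adapted + uplink[: len(adapted)]
```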

ANALYSIS FILTER BANK AND COMPUTING PROCEDURE THEREOF, AUDIO FREQUENCY SHIFTING SYSTEM, AND AUDIO FREQUENCY SHIFTING PROCEDURE
20220383892 · 2022-12-01

An analysis filter bank corresponding to a plurality of sub-bands, comprising: multiple sub-filters with different center frequencies which perform multiple complex-type first-order infinite impulse response filtering operations on an audio input signal to generate multiple sub-filter signals; a first set of binomial combiners, each of which performs a weighted-sum operation on a first number of the sub-filter signals with a first set of binomial weights to generate one of multiple sub-band signals; a second set of binomial combiners, each of which performs a weighted-sum operation on a second number of the sub-filter signals with a second set of binomial weights to generate one of multiple lower sub-band-edge signals or one of multiple higher sub-band-edge signals; and multiple envelope detection with decimation devices, which perform multiple envelope detection with decimation operations on the sub-band signals, the lower sub-band-edge signals, and the higher sub-band-edge signals to generate multiple fine spectrums.
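The sub-filter and envelope-detection-with-decimation stages might look roughly like this; the binomial-combiner stages are omitted, and the bandwidth parametrisation and decimation factor are assumptions for illustration only.

```python
import numpy as np

def complex_one_pole_bank(x, fs, center_freqs, bw=100.0, decim=8):
    """One complex first-order IIR resonator per center frequency,
    followed by magnitude envelope detection and decimation.
    """
    r = np.exp(-2 * np.pi * bw / fs)          # pole radius from bandwidth
    envelopes = []
    for fc in center_freqs:
        pole = r * np.exp(2j * np.pi * fc / fs)
        y = np.zeros(len(x), dtype=complex)
        state = 0j
        for n, xn in enumerate(x):
            state = pole * state + (1 - r) * xn  # complex 1st-order IIR
            y[n] = state
        env = np.abs(y)                        # envelope detection
        envelopes.append(env[::decim])         # decimation
    return np.array(envelopes)
```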

Dynamic creation and insertion of content

In an aspect, during a presentation, viewers of the presentation material can be monitored. Based on the monitoring, new content can be determined for insertion into the presentation material. The new content can be automatically inserted into the presentation material in real time. In another aspect, during the presentation, the presenter can be monitored. The presenter's speech can be intercepted and analyzed to detect a level of confidence. Based on the detected level of confidence, the presenter's speech can be adjusted, and the adjusted speech can be played back automatically, for example, in lieu of the presenter's intercepted original speech.

Using a predictive model to automatically enhance audio having various audio quality issues

Operations of a method include receiving a request to enhance a new source audio. Responsive to the request, the new source audio is input into a prediction model that was previously trained. Training the prediction model includes providing a generative adversarial network including the prediction model and a discriminator. Training data is obtained including tuples of source audios and target audios, each tuple including a source audio and a corresponding target audio. During training, the prediction model generates predicted audios based on the source audios. Training further includes applying a loss function to the predicted audios and the target audios, where the loss function incorporates a combination of a spectrogram loss and an adversarial loss. The prediction model is updated to optimize that loss function. After training, based on the new source audio, the prediction model generates a new predicted audio as an enhanced version of the new source audio.
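The combined generator objective could be sketched as follows. The L1 magnitude-spectrogram distance, the non-saturating adversarial form, and the weighting `lam` are assumed choices the abstract does not specify.

```python
import numpy as np

def spectrogram_loss(pred, target, n_fft=256, hop=128):
    """L1 distance between magnitude spectrograms (one common choice)."""
    def mag_spec(x):
        frames = [x[i:i + n_fft] for i in range(0, len(x) - n_fft + 1, hop)]
        return np.abs(np.array([np.fft.rfft(f * np.hanning(n_fft))
                                for f in frames]))
    return np.mean(np.abs(mag_spec(pred) - mag_spec(target)))

def generator_loss(pred, target, disc_score_on_pred, lam=10.0):
    """Adversarial term (disc_score_on_pred is the discriminator's
    probability that the predicted audio is real) plus a weighted
    spectrogram term; the model is updated to minimize this sum.
    """
    adv = -np.log(disc_score_on_pred + 1e-12)
    return adv + lam * spectrogram_loss(pred, target)
```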