G10L2021/02168

AUDIO SIGNAL PROCESSING DEVICE, AUDIO SIGNAL PROCESSING METHOD, AND RECORDING MEDIUM STORING A PROGRAM
20170092299 · 2017-03-30

An audio signal processing device that includes: a processor configured to execute a procedure, the procedure comprising: detecting a speech segment of an audio signal; suppressing noise in the audio signal; and adjusting an amount of suppression of noise such that the amount of suppression during a specific period, which starts from a position based on a terminal end of the detected speech segment and is a period shorter than a period spanning from the terminal end of the detected speech segment to a starting end of a next speech segment, becomes greater than in other segments, and a memory configured to store audio signals before and after noise suppression and the amount of suppression before and after adjustment.
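
The adjustment described above can be illustrated with a minimal sketch. It assumes frame-indexed speech segments given as (start, end) pairs with an exclusive end, and suppression amounts in dB; the function name `adjust_suppression` and the parameters `base_db`, `boosted_db`, `offset` and `max_boost_frames` are hypothetical and do not come from the abstract.

```python
def adjust_suppression(num_frames, speech_segments, base_db=6.0,
                       boosted_db=12.0, offset=0, max_boost_frames=5):
    """Return a per-frame suppression amount in dB.

    speech_segments: sorted list of (start, end) frame indices, end exclusive.
    All parameter names and default values are illustrative assumptions.
    """
    amounts = [base_db] * num_frames
    for i, (_, end) in enumerate(speech_segments):
        # Start of the next speech segment, or end of signal for the last one.
        if i + 1 < len(speech_segments):
            next_start = speech_segments[i + 1][0]
        else:
            next_start = num_frames
        gap = next_start - end
        # The boosted period starts at a position based on the terminal end
        # and is kept strictly shorter than the gap to the next segment.
        boost_len = max(min(max_boost_frames, gap - 1 - offset), 0)
        for t in range(end + offset, end + offset + boost_len):
            amounts[t] = boosted_db
    return amounts


amounts = adjust_suppression(20, [(2, 6), (10, 14)])
```

In this sketch the stronger suppression never reaches the start of the next speech segment, which mirrors the requirement that the specific period be shorter than the gap between segments.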

Method and apparatus for determining periods of excessive noise for receiving smart speaker voice commands

Methods and systems for determining periods of excessive noise for smart speaker voice commands. An electronic timeline of volume levels of currently playing content is made available to a smart speaker. From this timeline, periods of high content volume are determined, and the smart speaker alerts users during periods of high volume, requesting that they wait until the high-volume period has passed before issuing voice commands. In this manner, the smart speaker helps prevent voice commands that may not be detected, or may be detected inaccurately, due to the noise of the content currently being played.
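
As a rough illustration of deriving high-volume periods from a volume timeline, the sketch below assumes the timeline is a time-sorted list of (time in seconds, volume in dB) samples and that "high volume" means meeting a fixed threshold. The function names and the threshold rule are assumptions made for the example, not details from the abstract.

```python
def high_volume_periods(timeline, threshold_db):
    """Return (start, end) periods where volume is at or above threshold_db.

    timeline: list of (time_seconds, volume_db) tuples sorted by time.
    """
    periods = []
    start = None
    for t, volume in timeline:
        if volume >= threshold_db and start is None:
            start = t                      # a loud period begins
        elif volume < threshold_db and start is not None:
            periods.append((start, t))     # the loud period ends
            start = None
    if start is not None:
        # Timeline ended while still loud: close the period at the last sample.
        periods.append((start, timeline[-1][0]))
    return periods


def should_ask_user_to_wait(periods, now):
    """True if the current playback time falls inside a loud period."""
    return any(start <= now < end for start, end in periods)


timeline = [(0, 50), (1, 80), (2, 85), (3, 55), (4, 90), (5, 90)]
periods = high_volume_periods(timeline, threshold_db=75)
```

A smart speaker could call `should_ask_user_to_wait` with the current playback position before prompting the user.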

Electronic processing device and processing method, associated acoustic apparatus and computer program

An electronic processing device is provided for an acoustic apparatus that includes a first, air conduction microphone and a second, bone conduction microphone. The device is configured to be connected to the first and second microphones, to receive as inputs the first and second analog signals from the first and second microphones respectively, and to deliver a corrected signal as output. The processing device comprises: a hybridization module configured to calculate a hybrid signal from the first and second analog signals; an estimation module configured to estimate noise in the hybrid signal; and a noise reduction module configured to calculate the corrected signal by applying a generalized spectral subtraction algorithm to the hybrid signal according to the estimated noise.
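
The abstract does not state which form of generalized spectral subtraction is used. The sketch below shows one common formulation, applied per frequency bin to magnitude spectra, with an over-subtraction factor `alpha`, a spectral floor `beta` and an exponent `gamma`. These parameter names and default values are assumptions for illustration only.

```python
def generalized_spectral_subtraction(hybrid_mag, noise_mag,
                                     alpha=2.0, beta=0.01, gamma=2.0):
    """Subtract an estimated noise spectrum from a signal spectrum.

    hybrid_mag, noise_mag: per-bin magnitudes of the hybrid signal and of
    the estimated noise. Per bin, the output is
        max(|Y|^gamma - alpha * |N|^gamma, beta * |N|^gamma) ^ (1 / gamma)
    """
    corrected = []
    for y, n in zip(hybrid_mag, noise_mag):
        subtracted = y ** gamma - alpha * n ** gamma
        floor = beta * n ** gamma          # keeps the result non-negative
        corrected.append(max(subtracted, floor) ** (1.0 / gamma))
    return corrected


corrected = generalized_spectral_subtraction([3.0, 1.0], [1.0, 1.0])
```

With `gamma=2.0` this reduces to power spectral subtraction; `gamma=1.0` gives magnitude subtraction. The phase of the hybrid signal would typically be reused when converting back to the time domain.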

Method and system for multiple time resolution audio processing

Aspects of the present disclosure provide a method for voice control that includes transforming, using a short-time Fourier transform (STFT) applied to data in each window aligned across each input channel of the multichannel audio stream, the multichannel audio stream into a complex-valued frequency-domain representation. For a current window, the method further includes: updating a first complex-valued covariance matrix corresponding to a slowly-adapting beamformer and forming a single-channel denoised estimate for each frequency band in the STFT; calculating a voice activity detection (VAD) estimate for each frequency band in the STFT by comparing a magnitude of the single-channel denoised estimate to a magnitude of each input channel of the multichannel audio stream; and selectively updating or refraining from updating, responsive to the VAD estimate respectively indicating a presence or an absence of speech, a second complex-valued covariance matrix corresponding to a quickly-adapting beamformer.
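
The gated update of the second covariance matrix can be sketched for a single frequency band as follows. The VAD rule (ratio of the denoised magnitude to the mean channel magnitude against a fixed threshold) and the exponential forgetting update are assumptions chosen for illustration; the abstract only says that magnitudes are compared and that the update is conditional on the VAD estimate.

```python
def band_vad(denoised, channels, ratio_threshold=0.5):
    """Hypothetical per-band VAD: speech is assumed present when the
    denoised estimate keeps a large share of the mean channel magnitude."""
    mean_mag = sum(abs(x) for x in channels) / len(channels)
    if mean_mag == 0.0:
        return False
    return abs(denoised) / mean_mag >= ratio_threshold


def update_covariance(cov, channels, forgetting=0.9):
    """Recursive update R <- f * R + (1 - f) * x x^H for one band."""
    n = len(channels)
    return [[forgetting * cov[i][j]
             + (1 - forgetting) * channels[i] * channels[j].conjugate()
             for j in range(n)]
            for i in range(n)]


def step_fast_covariance(cov, denoised, channels):
    """Update the quickly-adapting covariance only when speech is detected."""
    if band_vad(denoised, channels):
        return update_covariance(cov, channels)
    return cov  # absence of speech: refrain from updating


cov = [[0j, 0j], [0j, 0j]]
channels = [1 + 0j, 0 + 1j]            # STFT coefficients of two channels
updated = step_fast_covariance(cov, 0.9 + 0j, channels)
unchanged = step_fast_covariance(cov, 0.1 + 0j, channels)
```

In a full system this step would run for every frequency band of every STFT window, after the slowly-adapting beamformer has produced the denoised estimate.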

System and method for detecting deep fake audio
20260031090 · 2026-01-29

A system for analyzing audio includes a memory configured to store known digital audio representations containing known fraudulent audio streams and a processor operably coupled to the memory. The processor receives a portion of an audio stream from an external device and produces a transcript of the portion of the audio stream. The processor then determines a timing score, an emotional score, a background score, and a content score by analyzing the portion of the audio stream and the corresponding transcript and comparing them to the known digital audio representations and transcripts. The processor then determines if the audio stream is malicious by combining the timing score, emotional score, background score, and content score to produce a combined score and comparing the combined score to a threshold. The processor notifies a user that the call may be fraudulent when the combined score is greater than the threshold.
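
The final decision step can be sketched as below. The abstract does not say how the four scores are combined, so the weighted sum, the equal weights, the 0-to-1 score range and the threshold of 0.5 are all assumptions made for the example.

```python
def combined_score(timing, emotional, background, content,
                   weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine the four scores with a weighted sum (assumed rule)."""
    scores = (timing, emotional, background, content)
    return sum(w * s for w, s in zip(weights, scores))


def is_malicious(timing, emotional, background, content, threshold=0.5):
    """Flag the stream when the combined score is greater than the threshold."""
    score = combined_score(timing, emotional, background, content)
    return score > threshold


if is_malicious(0.8, 0.6, 0.4, 1.0):
    print("Warning: this call may be fraudulent.")
```

The comparison is strict, matching the wording that the user is notified when the combined score is greater than the threshold.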

Audio device with microphone and media mixing
12592675 · 2026-03-31

An audio device comprising an interface, a memory, and one or more processors is disclosed, wherein the one or more processors are configured to: obtain a microphone input signal; obtain a media input signal; process the microphone input signal for provision of a microphone output signal; and provide an audio output signal based on the microphone output signal, wherein to process the microphone input signal comprises to: determine a microphone gain; and apply the microphone gain to the microphone input signal for provision of the microphone output signal, and wherein to determine the microphone gain comprises: estimate a first loudness of the microphone input signal; determine a first primary average based on the first loudness; estimate a second loudness of the media input signal; determine a second average based on the second loudness; determine a first gain based on the first primary average and the second average; determine a first secondary average based on the first loudness; determine a second gain based on the first secondary average and the second average; and determine the microphone gain based on the first gain and the second gain.
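
The gain determination chain can be sketched as follows. The abstract does not specify how the averages or gains are computed, so this example assumes loudness values in dB, exponential moving averages with a fast coefficient for the first primary average and a slow coefficient for the first secondary and second averages, gains that place the microphone a fixed offset above the media level, and a minimum to combine the two gains. Every name and constant here is hypothetical.

```python
def ema(previous, value, coeff):
    """One step of an exponential moving average."""
    return coeff * previous + (1 - coeff) * value


def microphone_gain_db(mic_loudness_db, media_loudness_db,
                       fast=0.5, slow=0.9, offset_db=6.0):
    """Return a microphone gain in dB from per-frame loudness estimates."""
    mic_primary = mic_secondary = mic_loudness_db[0]
    media_avg = media_loudness_db[0]
    frames = zip(mic_loudness_db[1:], media_loudness_db[1:])
    for mic, media in frames:
        mic_primary = ema(mic_primary, mic, fast)      # first primary average
        mic_secondary = ema(mic_secondary, mic, slow)  # first secondary average
        media_avg = ema(media_avg, media, slow)        # second average
    # Each gain would lift the microphone to offset_db above the media level.
    first_gain = media_avg + offset_db - mic_primary
    second_gain = media_avg + offset_db - mic_secondary
    # Assumed combination rule: take the more conservative of the two gains.
    return min(first_gain, second_gain)


gain = microphone_gain_db([-30.0, -10.0], [-20.0, -20.0])
```

Using two averages of the same microphone loudness with different time constants lets the fast one react to sudden loud speech while the slow one tracks the long-term level.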