G10L2025/786

A COMPUTER IMPLEMENTED METHOD AND AN APPARATUS FOR SILENCE DETECTION IN SPEECH RECOGNITION
20230326481 · 2023-10-12

A computer implemented method for speech recognition from an audio signal includes: obtaining initial values for silence detection parameters including a lead period, a threshold amplitude, and a terminal period; detecting an amplitude of the audio signal at a first time T1 of the audio signal; optionally adjusting the threshold amplitude based on the detected amplitude; starting the speech recognition from a second time T2 of the audio signal; and starting silence detection from the audio signal when the lead period has elapsed after the second time T2. The silence detection includes, responsive to detecting an amplitude below the threshold amplitude for a duration of the terminal period, terminating the speech recognition and the silence detection at a third time T3 of the audio signal and adjusting the silence detection parameters based on the detected amplitude changes of the audio signal between the first time T1 and the third time T3.
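The lead period, threshold amplitude, and terminal period described in this abstract can be illustrated with a minimal sketch. The names `SilenceParams` and `find_end_of_speech` are hypothetical, all periods are assumed to be counted in samples, and the parameter adjustment after T3 is left out; this is not the claimed implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SilenceParams:
    # All three fields are illustrative; units are samples (an assumption).
    lead_period: int            # samples to wait after T2 before silence detection starts
    threshold_amplitude: float  # amplitudes below this count as silence
    terminal_period: int        # consecutive silent samples that end recognition


def find_end_of_speech(amplitudes: Sequence[float], t2: int,
                       params: SilenceParams) -> Optional[int]:
    """Return the sample index T3 where recognition would stop, or None."""
    quiet_run = 0
    # Silence detection only starts once the lead period has elapsed after T2.
    for i in range(t2 + params.lead_period, len(amplitudes)):
        if abs(amplitudes[i]) < params.threshold_amplitude:
            quiet_run += 1
            if quiet_run >= params.terminal_period:
                return i
        else:
            quiet_run = 0  # any loud sample restarts the terminal period
    return None
```

With an initial pause before the speaker starts, a lead period of zero would end recognition inside that pause, while a lead period that covers the pause lets detection run until the trailing silence.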

User voice activity detection using dynamic classifier

A device includes a memory configured to store instructions and one or more processors configured to execute the instructions. The one or more processors are configured to execute the instructions to receive audio data including first audio data corresponding to a first output of a first microphone and second audio data corresponding to a second output of a second microphone. The one or more processors are also configured to execute the instructions to provide the audio data to a dynamic classifier. The dynamic classifier is configured to generate a classification output corresponding to the audio data. The one or more processors are further configured to execute the instructions to determine, at least partially based on the classification output, whether the audio data corresponds to user voice activity.

Method of Detecting Speech Using an in Ear Audio Sensor

The present disclosure provides a method for detecting voice using an in-ear audio sensor, including performing the following processing on each frame of input signals collected by the in-ear audio sensor: calculating a count change value based on at least one feature of an input signal of a current frame, wherein the at least one feature includes at least one of an estimated signal-to-noise ratio, a spectral centroid, a spectral flux, a spectral flux difference value, spectral flatness, energy distribution, and spectral correlations between adjacent frames; adding the calculated count change value to a previous count value of a previous frame to obtain a current count value; comparing the obtained current count value with a count threshold; and determining the category of the input signal of the current frame based on the comparison result and feature attributes, wherein the category includes noise, voiced sound, or unvoiced sound.
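The per-frame counting scheme in this abstract can be sketched as follows. The choice of features (estimated SNR and spectral centroid), the cutoff values, the +1/-1 count change, and the clamp at zero are all assumptions made for illustration; the abstract does not specify them, and `classify_frames` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def classify_frames(features: Iterable[Tuple[float, float]],
                    count_threshold: int = 2,
                    snr_db_cutoff: float = 6.0,
                    centroid_hz_cutoff: float = 3000.0) -> List[str]:
    """Label each frame, given (estimated SNR in dB, spectral centroid in Hz)."""
    labels = []
    count = 0
    for snr_db, centroid_hz in features:
        # Count change value derived from one feature (assumed rule).
        change = 1 if snr_db > snr_db_cutoff else -1
        # Current count = previous count + change, clamped at zero (assumed).
        count = max(0, count + change)
        if count < count_threshold:
            labels.append("noise")
        elif centroid_hz < centroid_hz_cutoff:
            labels.append("voiced")    # low centroid taken as voiced (assumed)
        else:
            labels.append("unvoiced")  # high centroid taken as unvoiced (assumed)
    return labels
```

Because the count accumulates across frames, a single low-SNR frame after sustained speech does not immediately flip the label back to noise.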

Robot
11654575 · 2023-05-23

A robot includes a microphone configured to receive sound signals, and one or more controllers configured to determine a reference sound pressure level of background noise based on a sound signal received at a first time point via the microphone, detect occurrence of a sound event based on the reference sound pressure level and a sound pressure level of a sound signal received at a second time point via the microphone, recognize an event corresponding to the detected sound event, and control an operation of the robot based on the recognized event.
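As a rough illustration of detecting a sound event against a background reference level, the sketch below compares the sound pressure level at two time points. The 10 dB margin and the function names are assumptions; samples are assumed to be calibrated pressures in pascals, and event recognition and robot control are not covered.

```python
import math
from typing import Sequence

REF_PRESSURE_PA = 20e-6  # conventional reference pressure for dB SPL in air


def spl_db(samples_pa: Sequence[float]) -> float:
    """Sound pressure level of a non-empty block of pressure samples."""
    if not samples_pa:
        raise ValueError("need at least one sample")
    rms = math.sqrt(sum(s * s for s in samples_pa) / len(samples_pa))
    # Floor the RMS so that digital silence does not produce log10(0).
    return 20.0 * math.log10(max(rms, 1e-12) / REF_PRESSURE_PA)


def sound_event_detected(background_pa: Sequence[float],
                         current_pa: Sequence[float],
                         margin_db: float = 10.0) -> bool:
    """True when the current level exceeds the background reference by margin_db."""
    return spl_db(current_pa) - spl_db(background_pa) >= margin_db
```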

PIEZOELECTRIC MEMS DEVICE WITH AN ADAPTIVE THRESHOLD FOR DETECTION OF AN ACOUSTIC STIMULUS
20220394384 · 2022-12-08

A device that includes an adaptive acoustic detection circuit and an acoustic sensor device such as a microphone is described. The device includes in addition to the sensor a circuit configured to detect when an input stimulus to the sensor satisfies an adaptive threshold, and further configured to produce a signal upon detection that causes adjustment of performance of the device, wherein the adaptive threshold is a threshold value that varies over time in accordance with detected changes to sound of an environment in which the device is located.
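One way to picture a threshold that tracks the surrounding sound level is an exponential moving average of the ambient level with a fixed margin on top. This is a software sketch under stated assumptions, not the circuit described: the class name `AdaptiveThreshold`, the multiplicative margin, and the smoothing rule are all hypothetical.

```python
class AdaptiveThreshold:
    """Threshold that follows the ambient level (illustrative only)."""

    def __init__(self, initial_level: float, margin: float = 2.0,
                 smoothing: float = 0.1) -> None:
        self.ambient = initial_level  # running estimate of the ambient level
        self.margin = margin          # threshold = ambient * margin (assumed)
        self.smoothing = smoothing    # weight of each new measurement

    @property
    def threshold(self) -> float:
        return self.ambient * self.margin

    def update(self, level: float) -> bool:
        """Return True if `level` exceeds the current threshold."""
        triggered = level > self.threshold
        if not triggered:
            # Only quiet measurements update the ambient estimate, so a
            # detected stimulus does not raise the threshold against itself.
            self.ambient += self.smoothing * (level - self.ambient)
        return triggered
```

In a quiet room a given level triggers detection; after the ambient estimate has risen in a noisier environment, the same level no longer does.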

Speaker adaptive end of speech detection for conversational AI applications

In various examples, end of speech (EOS) for an audio signal is determined based at least in part on a rate of speech for a speaker. For a segment of the audio signal, EOS is indicated based at least in part on an EOS threshold determined based at least in part on the rate of speech for the speaker.
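A minimal sketch of a speaker-adaptive EOS threshold follows. The abstract only states that the threshold depends on the rate of speech; the inverse scaling, the reference rate, the clamping bounds, and both function names are assumptions chosen for illustration.

```python
def eos_threshold_s(words_per_second: float,
                    base_threshold_s: float = 0.8,
                    reference_rate: float = 2.5,
                    min_s: float = 0.3,
                    max_s: float = 2.0) -> float:
    """Silence duration (seconds) required before declaring end of speech."""
    if words_per_second <= 0:
        return max_s  # no rate estimate yet: use the most patient setting
    # Slower speakers get a longer allowed pause (assumed inverse scaling).
    scaled = base_threshold_s * reference_rate / words_per_second
    return min(max_s, max(min_s, scaled))


def is_end_of_speech(trailing_silence_s: float, words_per_second: float) -> bool:
    """Indicate EOS when the trailing silence reaches the speaker's threshold."""
    return trailing_silence_s >= eos_threshold_s(words_per_second)
```

Under these assumptions, one second of trailing silence ends the turn for a fast speaker (5 words per second) but not for a slow one (1.25 words per second).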

Sensitivity mode for an audio spotting system
11823707 · 2023-11-21

An audio spotting system configured for various operating modes including a regular mode and sensitivity mode is described. An example cascade audio spotting system may include a high-power subsystem including a high-power trigger and a transfer module. This high-power trigger includes one or more detection models used to detect whether a target sound activity is included in the one or more audio streams. The one or more detection models are associated with a first set of hyperparameters when the cascade audio spotting system is in a regular mode, and the one or more detection models are associated with a second set of hyperparameters when the cascade audio spotting system is in a sensitivity mode. The transfer module provides at least one of one or more processed audio streams for further processing in response to the high-power trigger detecting the target sound activity in the one or more audio streams.

Computer implemented method and an apparatus for silence detection in speech recognition
11830522 · 2023-11-28

A computer implemented method for speech recognition from an audio signal includes: obtaining initial values for silence detection parameters including a lead period, a threshold amplitude, and a terminal period; detecting an amplitude of the audio signal at a first time T1 of the audio signal; optionally adjusting the threshold amplitude based on the detected amplitude; starting the speech recognition from a second time T2 of the audio signal; and starting silence detection from the audio signal when the lead period has elapsed after the second time T2. The silence detection includes, responsive to detecting an amplitude below the threshold amplitude for a duration of the terminal period, terminating the speech recognition and the silence detection at a third time T3 of the audio signal and adjusting the silence detection parameters based on the detected amplitude changes of the audio signal between the first time T1 and the third time T3.

Audio systems and methods for voice activity detection

Audio systems, methods, and processor instructions are provided that detect voice activity of a user and provide an output voice signal. The systems, methods, and instructions receive a plurality of microphone signals and combine the plurality of microphone signals according to a first combination and a second combination. The first combination produces a primary signal having enhanced response in the direction of the user's mouth, and the second combination produces a reference signal having reduced response in the direction of the user's mouth. The primary signal and the reference signal are added and subtracted to produce a summation signal and a difference signal, respectively. The summation signal and the difference signal are compared and an output voice signal is provided based upon the comparison.
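The sum and difference comparison can be sketched with one possible decision rule. The energy-ratio test, its threshold, and the function names are assumptions, as is the premise that ambient noise appears at similar level and phase in the primary and reference signals; the abstract does not say how the two signals are compared.

```python
from typing import Sequence


def energy(signal: Sequence[float]) -> float:
    return sum(v * v for v in signal)


def voice_active(primary: Sequence[float], reference: Sequence[float],
                 ratio_threshold: float = 0.5) -> bool:
    """Compare difference-signal energy to summation-signal energy."""
    summation = [p + r for p, r in zip(primary, reference)]
    difference = [p - r for p, r in zip(primary, reference)]
    e_sum = energy(summation)
    if e_sum == 0.0:
        return False  # silence in both signals
    # When the user speaks, the primary dominates, so the sum and the
    # difference have similar energy. When only noise is present in both
    # signals alike, the difference largely cancels (assumed behavior).
    return energy(difference) / e_sum >= ratio_threshold
```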

METHOD AND APPARATUS FOR DETECTING VALID VOICE SIGNAL AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM

A method and apparatus for detecting a valid voice signal and a non-transitory computer readable storage medium are provided. A first audio signal including at least one audio frame signal is obtained. Multiple wavelet decomposition signals respectively corresponding to the at least one audio frame signal are obtained. A wavelet signal sequence is obtained by combining the multiple wavelet decomposition signals. A maximum value and a minimum value among audio intensity values of all sample points are obtained, and a first audio intensity threshold is determined according to the maximum value and the minimum value. Sample points each having an audio intensity value greater than the first audio intensity threshold in the wavelet signal sequence are obtained, and a signal of sample points in the first audio signal corresponding to the sample points each having an audio intensity value greater than the first audio intensity threshold is determined as the valid voice signal.
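The thresholding step at the end of this abstract can be sketched on its own. The sketch assumes the wavelet signal sequence has already been computed and is index-aligned with the first audio signal, and it places the threshold a fixed fraction `alpha` of the way from the minimum to the maximum intensity; both `alpha` and the name `valid_voice_indices` are hypothetical, since the abstract only says the threshold is determined from the two extremes.

```python
from typing import List, Sequence


def valid_voice_indices(intensities: Sequence[float], alpha: float = 0.3) -> List[int]:
    """Indices of sample points whose intensity exceeds the derived threshold."""
    if not intensities:
        return []
    lowest, highest = min(intensities), max(intensities)
    # Threshold between the minimum and maximum intensity (assumed formula).
    threshold = lowest + alpha * (highest - lowest)
    return [i for i, value in enumerate(intensities) if value > threshold]
```

The returned indices would then select the corresponding sample points of the first audio signal as the valid voice signal.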