G10L2025/786

Hearing Assistance Device for Informing About State of Wearer
20180007475 · 2018-01-04

A hearing assistance device for informing about the state of a wearer includes: an input part configured to receive a selection input for either an ambient listening function or a music listening function; at least one microphone configured to pick up ambient sound; a speaker configured to send the ambient sound to the wearer; a communication part configured to perform wired or wireless communication with an external electronic communication device; an indication part configured to indicate that the ambient listening function or the music listening function is being performed; and a controller configured to perform the ambient listening function to pick up ambient sound from the microphone according to a selection input from the input part and send the ambient sound to the speaker, or perform the music listening function to play stored music or music received from the communication part and send the music to the speaker.

Voice detection using ear-based devices

This disclosure describes techniques for detecting voice commands from a user of an ear-based device. The ear-based device may include an in-ear facing microphone to capture sound emitted in an ear of the user, and an exterior facing microphone to capture sound emitted in an exterior environment of the user. The in-ear microphone may generate an inner audio signal representing the sound emitted in the ear, and the exterior microphone may generate an outer audio signal representing sound from the exterior environment. The ear-based device may compute a ratio of the power of the inner audio signal to the power of the outer audio signal and may compare this ratio to a threshold. If the ratio is larger than the threshold, the ear-based device may detect the voice of the user. Further, the ear-based device may set a value of the threshold based on a level of acoustic seal of the ear-based device.
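The power-ratio test in this abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function names, the mean-square power estimate, the threshold values, and the assumption that a tighter seal maps to a higher threshold are all hypothetical choices made for illustration.

```python
def mean_power(samples):
    """Mean squared amplitude of one frame of audio samples."""
    return sum(s * s for s in samples) / len(samples)


def threshold_for_seal(seal_level, loose=2.0, tight=8.0):
    """Hypothetical mapping from acoustic seal level in [0, 1] to a ratio threshold.

    Assumption: a tighter seal gets a higher threshold. The abstract only says
    the threshold depends on the seal level, not in which direction.
    """
    seal_level = min(max(seal_level, 0.0), 1.0)
    return loose + (tight - loose) * seal_level


def wearer_is_speaking(inner, outer, seal_level, eps=1e-12):
    """Compare the inner/outer power ratio against the seal-dependent threshold."""
    ratio = mean_power(inner) / (mean_power(outer) + eps)
    return ratio > threshold_for_seal(seal_level)
```

With a fully sealed device (`seal_level=1.0`), an inner frame at amplitude 0.5 against an outer frame at amplitude 0.1 gives a power ratio of about 25 and is treated as the wearer's voice, while equal inner and outer levels are not.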

APPROACH FOR DETECTING ALERT SIGNALS IN CHANGING ENVIRONMENTS
20180014112 · 2018-01-11

In an audio system, an audio signal is preprocessed to provide an input signal to a fast detector and a slow detector, the input signal comprising alert signals and ambient sounds. The slow detector determines the ambient sound level of the input signal which is output to an alert signal detector. The alert signal detector uses the ambient sound level to compute an adaptive threshold level using an adaptive threshold function. The fast detector determines the envelope level of the input signal which is output to the alert signal detector. The alert signal detector compares the envelope level to the adaptive threshold level to determine if an alert signal is present in the input signal. The adaptive threshold level varies depending on the ambient sound level of the input signal and the alert signal detection of the audio system automatically adapts to changing acoustic environments having different ambient sound levels.
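As a rough illustration of the fast/slow detector arrangement described above, the following Python sketch uses two one-pole envelope followers and a threshold proportional to the ambient level. The smoothing coefficients, the `margin` multiplier, and the `floor` value are assumptions; the abstract does not specify the adaptive threshold function.

```python
def envelope_follower(samples, attack, release):
    """One-pole envelope follower; attack and release are coefficients in (0, 1]."""
    level = 0.0
    out = []
    for s in samples:
        x = abs(s)
        coeff = attack if x > level else release
        level += coeff * (x - level)
        out.append(level)
    return out


def adaptive_threshold(ambient_level, margin=3.0, floor=0.05):
    """Hypothetical adaptive threshold: a multiple of the ambient level, with a floor."""
    return max(floor, margin * ambient_level)


def detect_alert(samples):
    """Per-sample flags: True where the fast envelope exceeds the adaptive threshold."""
    fast = envelope_follower(samples, attack=0.5, release=0.1)      # fast detector
    slow = envelope_follower(samples, attack=0.001, release=0.001)  # slow detector
    return [f > adaptive_threshold(s) for f, s in zip(fast, slow)]
```

In this sketch a full-scale burst after a long quiet stretch is flagged, while the same burst after a long loud stretch is not, because the slow detector has raised the threshold in the meantime.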

SYSTEMS AND METHODS FOR DISTINGUISHING VALID VOICE COMMANDS FROM FALSE VOICE COMMANDS IN AN INTERACTIVE MEDIA GUIDANCE APPLICATION
20230223020 · 2023-07-13

Systems and methods for distinguishing valid voice commands from false voice commands in an interactive media guidance application. In some aspects, the interactive media guidance application receives, at a user device, a signature sound sequence. The interactive media guidance application determines, using control circuitry, based on the signature sound sequence, a threshold gain for the current location of the user device. The interactive media guidance application receives, at the user device, a voice command. The interactive media guidance application determines, using the control circuitry, based on the voice command, a gain for the voice command. The interactive media guidance application determines, using the control circuitry, whether the gain for the voice command is different from the threshold gain. Based on determining that the gain for the voice command is different from the threshold gain, the interactive media guidance application executes, using the control circuitry, the voice command.
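The comparison step in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how a gain is measured, so the sketch takes it to be an RMS level in decibels, and the `tolerance_db` parameter that decides when two gains count as "different" is hypothetical.

```python
import math


def rms_db(samples, ref=1.0):
    """RMS level of a frame in dB relative to `ref` (assumed stand-in for 'gain')."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12) / ref)


def should_execute(command_samples, threshold_gain_db, tolerance_db=3.0):
    """Execute only when the command's gain differs from the threshold gain.

    `tolerance_db` is a hypothetical margin for deciding that two gains differ.
    """
    return abs(rms_db(command_samples) - threshold_gain_db) > tolerance_db


# Threshold gain derived from a signature sound sequence captured at the device.
signature = [0.1] * 100
threshold_gain_db = rms_db(signature)
```

A command captured at the same level as the signature sequence is rejected, while one captured at a clearly different level is executed.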

SENSITIVITY MODE FOR AN AUDIO SPOTTING SYSTEM
20230223042 · 2023-07-13

An audio spotting system configured for various operating modes including a regular mode and sensitivity mode is described. An example cascade audio spotting system may include a high-power subsystem including a high-power trigger and a transfer module. This high-power trigger includes one or more detection models used to detect whether a target sound activity is included in the one or more audio streams. The one or more detection models are associated with a first set of hyperparameters when the cascade audio spotting system is in a regular mode, and the one or more detection models are associated with a second set of hyperparameters when the cascade audio spotting system is in a sensitivity mode. The transfer module provides at least one of one or more processed audio streams for further processing in response to the high-power trigger detecting the target sound activity in the one or more audio streams.

CASCADE AUDIO SPOTTING SYSTEM
20230223041 · 2023-07-13

Systems and methods for identifying audio events in one or more audio streams include the use of a cascade audio spotting system (such as a cascade keyword spotting system (KWS)) to reduce power consumption while maintaining a desired performance. An example cascade audio spotting system may include a first module and a high-power subsystem. The first module is to receive an audio stream from one or more audio streams, process the audio stream to detect a first target sound activity in the audio stream, and provide a first signal in response to detecting the first target sound activity in the audio stream. The high-power subsystem is to (in response to the first signal being provided by the first module) receive the one or more audio streams and process the one or more audio streams to detect a second target sound activity in the one or more audio streams.
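The gating idea behind the cascade can be sketched in Python as below. The class name, the two stage functions, and their thresholds are hypothetical stand-ins; a real system would presumably use trained detection models rather than the simple energy and peak checks shown here.

```python
class CascadeSpotter:
    """Two-stage cascade: a cheap first stage gates a costlier second stage."""

    def __init__(self, first_stage, second_stage):
        self.first_stage = first_stage    # runs on a single stream
        self.second_stage = second_stage  # runs on all streams, only when gated in
        self.second_stage_runs = 0

    def process(self, streams):
        # The first module inspects one stream and signals on target activity.
        if not self.first_stage(streams[0]):
            return False
        # Only then does the high-power subsystem receive and process all streams.
        self.second_stage_runs += 1
        return self.second_stage(streams)


def energy_gate(stream, min_power=0.01):
    """Hypothetical low-power detector: mean power above a fixed level."""
    return sum(s * s for s in stream) / len(stream) > min_power


def peak_detector(streams, min_peak=0.8):
    """Hypothetical placeholder for the high-power detection model."""
    return any(max(abs(s) for s in stream) >= min_peak for stream in streams)
```

The `second_stage_runs` counter makes the power-saving intent visible: quiet input never reaches the second stage.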

Adaptive energy limiting for transient noise suppression
11694706 · 2023-07-04

The present disclosure describes aspects of adaptive energy limiting for transient noise suppression. In some aspects, an adaptive energy limiter sets a limiter ceiling for an audio signal to full scale and receives a portion of the audio signal. For the portion of the audio signal, the adaptive energy limiter determines a maximum amplitude and evaluates the portion with a neural network to provide a voice likelihood estimate. Based on the maximum amplitude and the voice likelihood estimate, the adaptive energy limiter determines that the portion of the audio signal includes noise. In response to determining that the portion of the audio signal includes noise, the adaptive energy limiter decreases the limiter ceiling and provides the limiter ceiling to a limiter module effective to limit an amount of energy of the audio signal. This may be effective to prevent audio signals from carrying full energy transient noise into conference audio.
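The control loop described above might look roughly like the following Python sketch. The neural network is replaced by a caller-supplied `voice_likelihood` function, and the decision rule (high peak with low voice likelihood means noise), the thresholds, and the multiplicative `step` are assumptions introduced for illustration.

```python
FULL_SCALE = 1.0


class AdaptiveEnergyLimiter:
    """Lowers a limiter ceiling when a portion of audio looks like loud non-voice noise."""

    def __init__(self, voice_likelihood, amp_threshold=0.5,
                 voice_threshold=0.5, step=0.5, min_ceiling=0.1):
        self.voice_likelihood = voice_likelihood  # stand-in for the neural network
        self.amp_threshold = amp_threshold
        self.voice_threshold = voice_threshold
        self.step = step
        self.min_ceiling = min_ceiling
        self.ceiling = FULL_SCALE  # the ceiling starts at full scale

    def process(self, portion):
        peak = max(abs(s) for s in portion)
        likelihood = self.voice_likelihood(portion)
        # Assumed rule: a loud portion that is unlikely to be voice is noise.
        if peak > self.amp_threshold and likelihood < self.voice_threshold:
            self.ceiling = max(self.min_ceiling, self.ceiling * self.step)
        # Limiter module: clamp each sample to the current ceiling.
        return [max(-self.ceiling, min(self.ceiling, s)) for s in portion]
```

With a likelihood function that always returns 0.0, a portion peaking at 0.9 halves the ceiling to 0.5 and is clamped accordingly; with one that always returns 1.0, the same portion passes through unchanged.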

HEARING DEVICE COMPRISING A SPEECH INTELLIGIBILITY ESTIMATOR
20220400349 · 2022-12-15

A hearing device, e.g. a hearing aid, comprises a) an input unit configured to provide at least one time-variant electric input signal representing sound, the at least one electric input signal comprising target signal components and optionally noise signal components, the target signal components originating from a target sound source; b) a signal processing unit for processing the at least one electric input signal and providing a processed signal; c) an output unit for creating output stimuli configured to be perceivable by the user as sound based on the processed signal from the signal processing unit; d) a speech presence probability prediction unit for repeatedly providing a measure of a predicted speech presence probability of the at least one electric input signal, or of a signal originating therefrom; and e) a speech intelligibility prediction unit for repeatedly providing a current measure of a predicted speech intelligibility of the at least one electric input signal, or of a signal originating therefrom. The speech intelligibility prediction unit is configured to determine said current measure of the predicted speech intelligibility in dependence on said measure of the predicted speech presence probability. A method of operating a hearing device is further disclosed. The invention may, e.g., be used in hearing aids, headsets, earpieces (ear buds), etc.

Apparatus and method for voice event detection

A voice event detection apparatus is disclosed. The apparatus comprises a vibration-to-digital converter and a computing unit. The vibration-to-digital converter is configured to convert an input audio signal into vibration data. The computing unit is configured to trigger a downstream module according to a sum of vibration counts of the vibration data for a number X of frames. In an embodiment, the voice event detection apparatus is capable of correctly distinguishing a wake phoneme from the input vibration data so as to trigger a downstream module of a computing system. Thus, the power consumption of the computing system is reduced.
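A minimal Python sketch of the trigger rule is given below. The abstract does not define how a vibration count is obtained, so counting upward crossings of a fixed level is an assumption, as are the function names and the default values of `x`, `min_sum`, and `level`.

```python
from collections import deque


def vibration_count(frame, level=0.1):
    """Count upward crossings of `level` in one frame (hypothetical definition)."""
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        if prev < level <= cur:
            count += 1
    return count


def trigger_frames(frames, x=3, min_sum=6, level=0.1):
    """Return indices of frames at which the downstream module would be triggered.

    A trigger fires when the vibration counts of the last `x` frames sum to at
    least `min_sum`.
    """
    window = deque(maxlen=x)  # keeps only the most recent x counts
    triggered = []
    for i, frame in enumerate(frames):
        window.append(vibration_count(frame, level))
        if len(window) == x and sum(window) >= min_sum:
            triggered.append(i)
    return triggered
```

Summing over several frames, rather than reacting to a single frame, means one isolated active frame does not wake the downstream module in this sketch.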