Patent classifications
G10L25/09
Voice detection method and apparatus, and storage medium
Embodiments of the present disclosure provide a voice detection method. An audio signal can be divided into a plurality of audio segments. Audio characteristics can be extracted from each of the plurality of audio segments. The audio characteristics of the respective audio segment include a time domain characteristic and a frequency domain characteristic of the respective audio segment. At least one target voice segment can be detected from the plurality of audio segments according to the audio characteristics of the plurality of audio segments.
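The segment-then-classify flow in this abstract can be illustrated with a minimal sketch. The abstract does not name the specific characteristics or the decision rule, so everything concrete here is an assumption: short-time energy as the time domain characteristic, spectral centroid (via a naive DFT) as the frequency domain characteristic, and the function names, segment size, and thresholds are all hypothetical.

```python
import math

def split_segments(signal, size):
    """Divide a signal into consecutive non-overlapping segments of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def short_time_energy(segment):
    """Assumed time domain characteristic: mean squared amplitude."""
    return sum(x * x for x in segment) / len(segment)

def spectral_centroid(segment, sample_rate):
    """Assumed frequency domain characteristic: magnitude-weighted mean frequency."""
    n = len(segment)
    total = weighted = 0.0
    for k in range(1, n // 2):  # naive DFT over positive-frequency bins
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(segment))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(segment))
        mag = math.hypot(re, im)
        total += mag
        weighted += mag * k * sample_rate / n
    return weighted / total if total > 0 else 0.0

def detect_voice_segments(signal, sample_rate, size=64,
                          min_energy=0.01, band=(80.0, 1000.0)):
    """Return indices of segments whose features pass both (hypothetical) tests."""
    hits = []
    for idx, seg in enumerate(split_segments(signal, size)):
        energy = short_time_energy(seg)
        centroid = spectral_centroid(seg, sample_rate)
        if energy >= min_energy and band[0] <= centroid <= band[1]:
            hits.append(idx)
    return hits
```

A real detector would likely use an FFT and a trained classifier rather than fixed thresholds; the sketch only shows how time and frequency features of each segment can be combined into a per-segment decision.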
User programmable voice command recognition based on sparse features
A low power sound recognition sensor is configured to receive an analog signal that may contain a signature sound. Sparse sound parameter information is extracted from the analog signal. The extracted sparse sound parameter information is processed using a speaker dependent sound signature database stored in the sound recognition sensor to identify sounds or speech contained in the analog signal. The sound signature database may include several user enrollments for a sound command each representing an entire word or multiword phrase. The extracted sparse sound parameter information may be compared to the multiple user enrolled signatures using cosine distance, Euclidean distance, correlation distance, etc., for example.
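The comparison step named in this abstract (cosine distance against multiple user enrollments) might look roughly like the following sketch. The feature vectors, the `enrollments` layout, the `match_command` name, and the acceptance threshold are hypothetical; the abstract does not specify how the sparse features are extracted or how a match is accepted.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0
    return 1.0 - dot / norm

def match_command(features, enrollments, max_distance=0.2):
    """Return the command whose closest enrolled signature lies within
    max_distance of the extracted features, or None if nothing is close enough.

    enrollments maps a command name to a list of enrolled signature vectors
    (several enrollments per command, as the abstract describes).
    """
    best_command, best_distance = None, max_distance
    for command, signatures in enrollments.items():
        for signature in signatures:
            d = cosine_distance(features, signature)
            if d <= best_distance:
                best_command, best_distance = command, d
    return best_command
```

Euclidean or correlation distance, which the abstract also mentions, could be swapped in by replacing `cosine_distance` and adjusting the threshold.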
Speech signal cascade processing method, terminal, and computer-readable storage medium
A method for improving speech signal intelligibility is performed at a device. A speech signal is obtained. A correspondence between the speech signal and a respective user group among different user groups having distinct voice characteristics is identified. Pre-encoding signal augmentation is performed on the speech signal with a respective pre-augmentation filtering coefficient that corresponds to the respective user group to obtain a group-specific pre-augmented speech signal. The device encodes the pre-augmented speech signal for subsequent transmission through the voice communication channel. An encoded version of the pre-augmented speech signal has reduced loss of signal quality as compared to an encoded version of the speech signal that is obtained without the pre-encoding signal augmentation.
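As a rough illustration of group-specific filtering before encoding, the sketch below applies a first-order pre-emphasis filter whose coefficient depends on the user group. This is an assumption throughout: the abstract only says that a pre-augmentation filtering coefficient corresponds to each user group, and does not give the filter form, the group names, or the coefficient values used here. How a speech signal is assigned to a group is also not covered, so the group is passed in directly.

```python
# Hypothetical per-group coefficients; the abstract gives no actual values.
PRE_AUGMENT_COEFF = {"group_a": 0.90, "group_b": 0.97}

def pre_augment(samples, group):
    """Apply y[n] = x[n] - a * x[n-1] with a group-specific coefficient a.

    This first-order high-pass form is an assumed stand-in for the
    pre-encoding augmentation filter, not the patented filter itself.
    """
    a = PRE_AUGMENT_COEFF[group]
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - a * prev)
        prev = x
    return out
```

The filtered samples would then be handed to the speech encoder in place of the original signal.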
METHOD ENABLING THE DETECTION OF THE SPEECH SIGNAL ACTIVITY REGIONS
The invention relates to a method for detecting the active regions of a speech signal. In particular, it relates to a signal encoding method that determines voice activity detection (VAD) regions for different input noise signal levels, in which the maximum average energy levels are maintained and are least affected by increasing variance.
Methods and apparatus for low cost voice activity detector
In described examples, a method for detecting voice activity includes: receiving a first input signal containing noise; sampling the first input signal to form noise samples; determining a first value corresponding to the noise samples; subsequently receiving a second input signal; sampling the second input signal to form second signal samples; determining a second value corresponding to the second signal samples; forming a ratio of the second value to the first value; comparing the ratio to a predetermined threshold value; and responsive to the comparing, indicating whether voice activity is detected in the second input signal.
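The steps listed in this abstract (a value from noise samples, a value from later samples, their ratio, a threshold comparison) can be sketched as follows. The abstract does not say what the "value" is, so mean energy is an assumption here, as are the function names and the default threshold.

```python
def mean_energy(samples):
    """Assumed 'value' for a block of samples: mean squared amplitude."""
    return sum(s * s for s in samples) / len(samples)

def voice_detected(noise_samples, signal_samples, threshold=4.0):
    """Indicate voice activity when the signal-to-noise value ratio exceeds threshold."""
    noise_value = mean_energy(noise_samples)     # first value (noise floor)
    signal_value = mean_energy(signal_samples)   # second value (later input)
    if noise_value == 0.0:
        # Guard against division by zero: any energy counts over a silent floor.
        return signal_value > 0.0
    return signal_value / noise_value > threshold
```

For example, with a noise block of amplitude 0.01 and a later block of amplitude 0.5, the energy ratio is 2500 and the function reports activity; a block of amplitude 0.012 gives a ratio of 1.44 and does not.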
SOUND RECOGNITION APPARATUS
A sound recognition apparatus (100) comprises a microphone (110) for capturing a posterior sound signal; and a processing circuit comprising a processor (180). The processing circuit is configured to process the posterior sound signal to derive posterior data, generate, using the processor (180), amalgamated data from the posterior data and anterior data derived from a previously captured anterior signal, determine, by the processor (180), whether there are correlations between the amalgamated data, the posterior data, and the anterior data that indicate that the posterior data matches the anterior data by comparing the posterior data and the amalgamated data, and the anterior data and the amalgamated data, and upon the posterior data matching the anterior data, output, by the processor (180), an indication that the posterior data matches the anterior data.
ELECTRONIC DEVICE AND METHOD OF RECOGNIZING AUDIO SCENE
An electronic device and method of recognizing an audio scene are provided. The method of recognizing an audio scene includes: separating, according to a predetermined criterion, an input audio signal into channels; recognizing, according to each of the separated channels, at least one audio scene from the input audio signal by using a plurality of neural networks trained to recognize an audio scene; and determining, based on a result of the recognizing of the at least one audio scene, at least one audio scene included in audio content by using a neural network trained to combine audio scene recognition results for respective channels, wherein the plurality of neural networks includes: a first neural network trained to recognize the audio scene based on a time-frequency shape of an audio signal, a second neural network trained to recognize the audio scene based on a shape of a spectral envelope of the audio signal, and a third neural network trained to recognize the audio scene based on a feature vector extracted from the audio signal.
METHODS AND SYSTEM FOR CUE DETECTION FROM AUDIO INPUT, LOW-POWER DATA PROCESSING AND RELATED ARRANGEMENTS
Methods and arrangements involving electronic devices, such as smartphones, tablet computers, wearable devices, etc., are disclosed. One arrangement involves a low-power processing technique for discerning cues from audio input. Another involves a technique for detecting audio activity based on the Kullback-Leibler divergence (KLD) (or a modified version thereof) of the audio input. Still other arrangements concern techniques for managing the manner in which policies are embodied on an electronic device. Others relate to distributed computing techniques. A great variety of other features are also detailed.
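One way a divergence-based activity test could work is sketched below. The abstract does not say which distributions are compared, so this assumes per-band energies of the current frame are normalized into a distribution and compared against a background distribution; the function names, the smoothing constant, and the threshold are hypothetical.

```python
import math

def normalize(energies, eps=1e-12):
    """Turn non-negative band energies into a probability distribution.

    eps keeps every bin strictly positive so the logarithm below is defined.
    """
    total = sum(energies) + eps * len(energies)
    return [(e + eps) / total for e in energies]

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def audio_activity(frame_energies, background_energies, threshold=0.1):
    """Flag activity when the frame's band distribution diverges from background."""
    p = normalize(frame_energies)
    q = normalize(background_energies)
    return kl_divergence(p, q) > threshold
```

A frame whose energy is spread like the background yields a divergence near zero, while a frame with energy concentrated in one band diverges strongly and is flagged.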