Patent classifications
G10L2025/783
AUTOMATIC SMOOTHED CAPTIONING OF NON-SPEECH SOUNDS FROM AUDIO
A content server accesses an audio stream and inputs portions of the audio stream into one or more non-speech classifiers for classification. For each portion of the audio stream, the non-speech classifiers generate a set of raw scores representing the likelihood that the portion includes an occurrence of the particular class of non-speech sounds associated with each classifier. The content server generates binary scores for the sets of raw scores, each set of binary scores being based on a smoothing of the respective set of raw scores. The content server then applies sets of non-speech captions to portions of the audio stream in time, each set of non-speech captions being based on a different one of the sets of binary scores for the corresponding portion of the audio stream.
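A minimal Python sketch of the score-to-caption pipeline for one classifier, assuming a centered moving average as the smoothing step, a fixed threshold for binarization, and fixed-length frames. The abstract does not specify the smoothing method. `smooth_scores`, `binarize`, `caption_spans`, the window size, the 0.5 threshold, and caption labels such as `[MUSIC]` are all hypothetical; with several classifiers, each one's raw scores would be processed independently in the same way.

```python
from typing import List, Tuple

Span = Tuple[float, float, str]


def smooth_scores(raw: List[float], window: int = 3) -> List[float]:
    """Centered moving average over the raw classifier scores (assumed smoothing)."""
    half = window // 2
    smoothed = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half), min(len(raw), i + half + 1)
        smoothed.append(sum(raw[lo:hi]) / (hi - lo))
    return smoothed


def binarize(smoothed: List[float], threshold: float = 0.5) -> List[int]:
    """Map each smoothed score to 1 (sound present) or 0 (absent)."""
    return [1 if s >= threshold else 0 for s in smoothed]


def caption_spans(binary: List[int], label: str, frame_sec: float = 1.0) -> List[Span]:
    """Merge runs of consecutive 1s into (start_sec, end_sec, label) captions."""
    spans: List[Span] = []
    start = None
    for i, b in enumerate(binary):
        if b and start is None:
            start = i
        elif not b and start is not None:
            spans.append((start * frame_sec, i * frame_sec, label))
            start = None
    if start is not None:
        spans.append((start * frame_sec, len(binary) * frame_sec, label))
    return spans


raw = [0.0, 0.2, 0.9, 0.8, 0.9, 0.1, 0.0, 0.0]
print(caption_spans(binarize(smooth_scores(raw)), "[MUSIC]"))  # [(2.0, 5.0, '[MUSIC]')]
```

Smoothing before thresholding keeps an isolated high score (for example raw scores `[0.0, 0.9, 0.0, 0.0]`) from producing a one-frame caption, which is the usual motivation for smoothing caption decisions.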
Sound classification system for hearing aids
A hearing aid includes a sound classification module to classify environmental sound sensed by a microphone. The sound classification module executes an advanced sound classification algorithm. The hearing aid then processes the sound according to the classification.
ANCHORED SPEECH DETECTION AND SPEECH RECOGNITION
A system configured to process speech commands may classify incoming audio as desired speech, undesired speech, or non-speech. Desired speech is speech from the same speaker as the reference speech. The reference speech may be obtained from a configuration session or from a first portion of input speech that includes a wakeword. The reference speech may be encoded using a recurrent neural network (RNN) encoder to create a reference feature vector. The reference feature vector and incoming audio data may be processed by a trained neural network classifier to label the incoming audio data (for example, frame by frame) according to whether each frame is spoken by the same speaker as the reference speech. The labels may be passed to an automatic speech recognition (ASR) component, allowing the ASR component to focus its processing on the desired speech.
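The data flow can be illustrated with a deliberately simplified Python sketch. It swaps in two stand-ins for the components named in the abstract: the mean of the reference frames' feature vectors replaces the RNN encoder, and a cosine-similarity threshold replaces the trained neural network classifier, with a frame-energy threshold marking non-speech. All function names, the two-dimensional features, and the thresholds are hypothetical.

```python
import math
from typing import List, Sequence


def encode_reference(frames: List[Sequence[float]]) -> List[float]:
    """Stand-in for the RNN encoder: mean feature vector of the reference frames."""
    n, dim = len(frames), len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def label_frames(frames: List[Sequence[float]], reference: Sequence[float],
                 energy_threshold: float = 0.1, similarity_threshold: float = 0.8) -> List[str]:
    """Stand-in for the trained classifier: label each frame."""
    labels = []
    for f in frames:
        if math.sqrt(sum(x * x for x in f)) < energy_threshold:
            labels.append("non-speech")  # too little energy to be speech
        elif cosine(f, reference) >= similarity_threshold:
            labels.append("desired")     # close to the reference speaker
        else:
            labels.append("undesired")   # speech from someone else
    return labels


reference = encode_reference([[1.0, 0.0], [0.8, 0.2]])  # e.g. frames of the wakeword portion
print(label_frames([[0.9, 0.1], [0.0, 1.0], [0.01, 0.0]], reference))
# ['desired', 'undesired', 'non-speech']
```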
Speech recognition processing device, speech recognition processing method and display device
The voice recognition processing apparatus includes a voice acquirer, a first voice recognizer, a storage device, and a recognition result determiner. The voice acquirer acquires a user's voice and outputs voice information. The first voice recognizer converts the voice information into first information. The storage device stores, in advance, a dictionary in which an exclusion vocabulary is registered. The recognition result determiner compares the first information with the exclusion vocabulary to determine whether the first information includes a word that matches a word in the exclusion vocabulary. If it does, the recognition result determiner determines that the first information is to be rejected; if it does not, the recognition result determiner determines that the first information is to be executed.
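The recognition result determiner's decision rule reduces to a set-membership test. A minimal Python sketch, assuming the first information is a text string that can be split on whitespace with surrounding punctuation stripped; the function names and the example exclusion vocabulary are hypothetical.

```python
import string
from typing import Iterable, List


def tokenize(first_information: str) -> List[str]:
    """Lower-case, split on whitespace, and strip surrounding punctuation."""
    return [w.strip(string.punctuation) for w in first_information.lower().split()]


def determine_result(first_information: str, exclusion_vocabulary: Iterable[str]) -> str:
    """Return "reject" if any recognized word is in the exclusion vocabulary, else "execute"."""
    excluded = {w.lower() for w in exclusion_vocabulary}
    if any(word in excluded for word in tokenize(first_information)):
        return "reject"
    return "execute"


EXCLUSION_VOCABULARY = {"uh", "hmm"}  # hypothetical registered words
print(determine_result("Uh, channel five", EXCLUSION_VOCABULARY))  # reject
print(determine_result("channel five", EXCLUSION_VOCABULARY))      # execute
```

Whitespace tokenization is an assumption of this sketch; text in languages written without spaces between words would need a proper tokenizer before the comparison.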
SYSTEM AND METHOD FOR PERFORMING AUTOMATIC GAIN CONTROL USING AN ACCELEROMETER IN A HEADSET
A method of performing automatic gain control (AGC) using an accelerometer in a headset starts with an accelerometer-based voice activity detector (VADa) generating a VADa output based on (i) acoustic signals received from at least one microphone included in a pair of earbuds and (ii) data output by at least one accelerometer included in the pair of earbuds. The at least one accelerometer detects vibration of the user's vocal cords. The headset includes the pair of earbuds. An AGC controller then performs AGC on the acoustic signals from the at least one microphone based on the VADa output. Other embodiments are also described.
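A minimal Python sketch of VAD-gated AGC, assuming frame-based processing. The `vada` function is a hypothetical stand-in that flags speech only when both the microphone and the accelerometer frame energies exceed thresholds. The AGC moves its gain part-way toward a target RMS level on frames flagged as speech and holds it otherwise, so background noise does not drive the gain up. The target level, smoothing factor, gain limit, and all names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def vada(mic_frame: Sequence[float], accel_frame: Sequence[float],
         mic_thresh: float = 0.01, accel_thresh: float = 0.01) -> bool:
    """Hypothetical VADa: speech only if the mic is active AND the accelerometer
    picks up vocal vibration, so external talkers or noise alone are rejected."""
    return rms(mic_frame) > mic_thresh and rms(accel_frame) > accel_thresh


def vad_gated_agc(mic_frames, accel_frames, target_rms: float = 0.1,
                  alpha: float = 0.5, max_gain: float = 10.0) -> Tuple[List[List[float]], List[float]]:
    """Apply AGC whose gain adapts only on frames the VADa flags as speech."""
    gain = 1.0
    outputs: List[List[float]] = []
    gains: List[float] = []
    for mic, accel in zip(mic_frames, accel_frames):
        level = rms(mic)
        if vada(mic, accel) and level > 0.0:
            desired = min(target_rms / level, max_gain)
            gain = (1 - alpha) * gain + alpha * desired  # move part-way toward target
        # On non-speech frames the gain is held, not adapted to the noise.
        outputs.append([gain * x for x in mic])
        gains.append(gain)
    return outputs, gains


mic = [[0.05, -0.05]] * 3
accel = [[0.02, 0.02], [0.02, 0.02], [0.0, 0.0]]  # wearer stops talking in frame 3
_, gains = vad_gated_agc(mic, accel)
print([round(g, 3) for g in gains])  # [1.5, 1.75, 1.75]
```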
ANALOG VOICE ACTIVITY DETECTION
According to some embodiments, an analog processing portion may receive an audio signal from a microphone. The analog processing portion may then convert the audio signal into sub-band signals and estimate an energy statistic value, such as a Signal-to-Noise Ratio (“SNR”) value, for each sub-band signal. A classification element may classify the estimated energy statistic values with analog processing such that a wakeup signal is generated when voice activity is detected. The wakeup signal may be associated with, for example, a battery-powered, always-listening audio application.
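A digital Python sketch of the same decision logic; the abstract describes analog circuitry, so this is only a behavioral model. It assumes per-frame sub-band energies are already available, estimates each band's noise floor with a slow exponential average that is updated only during non-voice frames, computes a per-band SNR in dB, and raises the wakeup signal when at least `min_bands` bands exceed an SNR threshold. The class name, thresholds, and noise-tracking rule are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


class SubbandVAD:
    """Behavioral (digital) model of per-sub-band SNR voice activity detection."""

    def __init__(self, n_bands: int, snr_threshold_db: float = 6.0,
                 min_bands: int = 2, noise_alpha: float = 0.05, eps: float = 1e-12):
        self.noise: List[Optional[float]] = [None] * n_bands
        self.snr_threshold_db = snr_threshold_db
        self.min_bands = min_bands
        self.noise_alpha = noise_alpha
        self.eps = eps

    def process(self, band_energies: Sequence[float]) -> Tuple[bool, List[float]]:
        """Return (wakeup, per-band SNR in dB) for one frame of sub-band energies."""
        snrs = []
        for i, e in enumerate(band_energies):
            if self.noise[i] is None:
                self.noise[i] = e  # initialise the noise floor from the first frame
            snrs.append(10.0 * math.log10((e + self.eps) / (self.noise[i] + self.eps)))
        wakeup = sum(s > self.snr_threshold_db for s in snrs) >= self.min_bands
        if not wakeup:
            # Track the noise floor only while no voice is detected.
            for i, e in enumerate(band_energies):
                self.noise[i] = (1 - self.noise_alpha) * self.noise[i] + self.noise_alpha * e
        return wakeup, snrs


vad = SubbandVAD(n_bands=3)
for frame in ([1.0, 1.0, 1.0], [10.0, 10.0, 1.0], [1.0, 1.0, 1.0]):
    print(vad.process(frame)[0])  # False, then True, then False
```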
COMPUTERIZED SYSTEM AND METHOD FOR EVALUATING A PSYCHOLOGICAL STATE BASED ON VOICE ANALYSIS
A computerized method and system for analyzing and evaluating the psychological state of at least one user by operating a voice analysis system that comprises a voice analyzer processor. The method comprises: tuning the user's voice by playing at least one beep sound followed by silence; receiving and recording at least one voice measurement from the user; automatically analyzing the voice measurements with the voice analyzer processor to evaluate the psychological state of the user, wherein the voice analysis is based on harmonic analysis of the voice measurements; and providing the user with feedback related to the user's psychological state by means of a visual indication on a display.
Conversational Software Agent
Voice input is received from a user. An ASR system generates in memory a set of words it has identified in the voice input, and updates the set each time it identifies a new word in the voice input by adding the new word to the set. A condition indicative of speech inactivity in the voice input is detected. In response to detecting the speech inactivity condition, a response for output to the user is generated based on the set of identified words. The generated response is output to the user after a time interval, commencing with the detection of the speech inactivity condition, has ended, and only if the ASR system has identified no further words in the voice input during that interval.
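The timing rule can be sketched as a small event-driven state machine in Python. It assumes the ASR system reports each new word and the inactivity condition as events with timestamps in seconds, and that a caller polls `tick` with the current time; the class, its method names, and the `respond` callback are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


class ResponseScheduler:
    """Holds back a generated response until speech inactivity has lasted `interval` seconds."""

    def __init__(self, respond: Callable[[List[str]], str], interval: float = 1.0):
        self.words: List[str] = []  # the identified words, in arrival order
        self.respond = respond
        self.interval = interval
        self._pending: Optional[Tuple[float, str]] = None  # (deadline, response)

    def on_word(self, word: str, t: float) -> None:
        """ASR identified a new word at time t: add it and cancel any pending response."""
        self.words.append(word)
        self._pending = None

    def on_inactivity(self, t: float) -> None:
        """Speech inactivity detected at time t: generate a response now, output it later."""
        self._pending = (t + self.interval, self.respond(list(self.words)))

    def tick(self, now: float) -> Optional[str]:
        """Return the response once the interval has elapsed with no new words."""
        if self._pending is not None and now >= self._pending[0]:
            response = self._pending[1]
            self._pending = None
            return response
        return None


agent = ResponseScheduler(lambda ws: "You said: " + " ".join(ws), interval=1.0)
agent.on_word("play", 0.0)
agent.on_inactivity(0.5)    # response scheduled for t = 1.5
agent.on_word("jazz", 1.0)  # a new word before 1.5 cancels it
print(agent.tick(1.5))      # None
agent.on_inactivity(1.25)
print(agent.tick(2.25))     # You said: play jazz
```

In this sketch a new word cancels the pending response rather than merely delaying it, so the response is regenerated from the updated word set at the next inactivity event.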
HEARING DEVICE COMPRISING A NOISE REDUCTION SYSTEM
A hearing device adapted for being worn at or in an ear of a user comprises a) an input unit comprising at least two input transducers, each for converting sound around said hearing device to an electric input signal representing said sound, thereby providing at least two electric input signals; b) a beamformer filter comprising a minimum processing beamformer defined by optimized beamformer weights, the beamformer filter being configured to provide a filtered signal in dependence on said at least two electric input signals and said optimized beamformer weights; c) a reference signal representing sound around said hearing device; and d) a performance criterion for said minimum processing beamformer. The minimum processing beamformer is a beamformer that provides the filtered signal with as little modification as possible, in terms of a selected distance measure, compared to said reference signal, while still fulfilling said performance criterion. The optimized beamformer weights are adaptively determined in dependence on said at least two electric input signals, said reference signal, said distance measure, and said performance criterion. A method of operating a hearing device is further disclosed. The invention may, e.g., be used in hearing aids or headsets.
ADAPTIVE MANAGEMENT OF CASTING REQUESTS AND/OR USER INPUTS AT A RECHARGEABLE DEVICE
Implementations set forth herein relate to management of casting requests and user inputs at a rechargeable device, which provides access to an automated assistant and is capable of rendering data that is cast from a separate device. Casting requests can be handled by the rechargeable device even while a device system on a chip (SoC) of the rechargeable device is operating in a sleep mode. Furthermore, spoken utterances provided by a user for invoking the automated assistant can also be adaptively managed by the rechargeable device in order to mitigate idle power consumption by the device SoC. Such spoken utterances can be initially processed by a digital signal processor (DSP), and, based on one or more features of the spoken utterance (e.g., voice characteristics, conformity to a particular invocation phrase, etc.), the device SoC can be initialized for an amount of time that is selected based on those features.
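The final sentence describes a policy that maps DSP-extracted utterance features to an SoC wake duration. A hypothetical Python sketch of such a policy: the feature fields, thresholds, and durations below are invented for illustration and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class UtteranceFeatures:
    invocation_score: float  # hypothetical DSP confidence that the invocation phrase was spoken (0..1)
    voice_match: bool        # hypothetical: voice characteristic matches an enrolled user


def soc_wake_seconds(f: UtteranceFeatures, invoke_threshold: float = 0.5,
                     base_s: float = 2.0, extended_s: float = 8.0) -> float:
    """Choose how long to initialize the SoC for, based on the utterance features."""
    if f.invocation_score < invoke_threshold:
        return 0.0          # likely not an invocation: leave the SoC asleep
    if f.voice_match and f.invocation_score >= 0.9:
        return extended_s   # confident invocation by a recognized user: keep the SoC up longer
    return base_s           # plausible invocation: brief wake so the SoC can verify it


print(soc_wake_seconds(UtteranceFeatures(0.3, False)))  # 0.0
print(soc_wake_seconds(UtteranceFeatures(0.95, True)))  # 8.0
print(soc_wake_seconds(UtteranceFeatures(0.7, True)))   # 2.0
```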