Patent classifications
G10L2025/786
Method for robust directed source separation
An apparatus includes an interface for microphones, a separated source processor configured to analyze channels from the microphones, and a voice activity detector (VAD) circuit. The VAD circuit is configured to generate a voice estimate (VE) value. The VE value is to indicate a likelihood of human speech received by the microphones. Generating the VE value includes adjusting the VE value based upon a delay between two of the microphones. The VAD circuit is configured to provide the VE value to the separated source processor.
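The abstract above does not say how the delay is measured or how it changes the VE value, so the following is only a minimal sketch under stated assumptions: the delay is estimated by brute-force cross-correlation, and the VE value is nudged up when the delay matches an expected talker direction and down otherwise. The names `estimate_delay`, `adjust_ve`, `expected_delay`, `tolerance`, and `step` are hypothetical and do not come from the patent.

```python
def estimate_delay(a, b, max_lag):
    """Return the lag (in samples) at which channel b best matches channel a.

    Brute-force cross-correlation; a positive result means b lags behind a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(a)):
            j = i + lag
            if 0 <= j < len(b):
                score += a[i] * b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def adjust_ve(ve, delay, expected_delay=0, tolerance=1, step=0.1):
    """Adjust a voice estimate in [0, 1] based on the inter-microphone delay.

    Assumption (not from the abstract): a delay close to expected_delay
    suggests sound from the talker's direction, so VE is raised; any other
    delay lowers it. The result is clamped to [0, 1].
    """
    if abs(delay - expected_delay) <= tolerance:
        ve += step
    else:
        ve -= step
    return min(1.0, max(0.0, ve))
```

For example, two channels that carry the same pulse two samples apart yield `estimate_delay(...) == 2`, and that delay can then be passed to `adjust_ve` with `expected_delay=2`.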
AUDIO SYSTEMS AND METHODS FOR VOICE ACTIVITY DETECTION
Audio systems, methods, and processor instructions are provided that detect voice activity of a user and provide an output voice signal. The systems, methods, and instructions receive a plurality of microphone signals and combine the plurality of microphone signals according to a first combination and a second combination. The first combination produces a primary signal having enhanced response in the direction of the user's mouth, and the second combination produces a reference signal having reduced response in the direction of the user's mouth. The primary signal and the reference signal are added and subtracted to produce a voice-enhanced signal and a voice-reduced signal, respectively. The voice-enhanced signal and the voice-reduced signal are compared and an output voice signal is provided based upon the comparison.
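The add, subtract, and compare steps described above can be sketched as follows. The abstract does not specify how the two signals are compared, so this sketch assumes a per-frame energy ratio. The function names and `ratio_threshold` are hypothetical, and the beamforming that produces the primary and reference signals is taken as given.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    return sum(v * v for v in frame) / len(frame) if frame else 0.0


def detect_voice(primary, reference, ratio_threshold=2.0):
    """Return True when the voice-enhanced frame dominates the voice-reduced one.

    primary and reference are equal-length frames of samples.
    ratio_threshold is an assumed tuning value, not taken from the abstract.
    """
    if len(primary) != len(reference):
        raise ValueError("primary and reference frames must have equal length")
    # Sum and difference of the two combined signals.
    voice_enhanced = [p + r for p, r in zip(primary, reference)]
    voice_reduced = [p - r for p, r in zip(primary, reference)]
    # Assumed comparison: ratio of frame energies.
    return frame_energy(voice_enhanced) > ratio_threshold * frame_energy(voice_reduced)
```

Under this assumed rule the decision depends on how strongly the primary and reference frames correlate, because the energy difference between the sum and the difference equals four times their mean product.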
METHOD AND SYSTEM FOR TIME AND FEATURE MODIFICATION OF SIGNALS
The application relates to a computer-implemented method and system for modifying at least one feature of an input audio signal based on features in a guide audio signal. The method comprises: determining matchable and unmatchable sections of the guide and input audio signals; generating a time-alignment path for modifying the at least one feature of the input audio signal in the matchable sections of the input audio signal based on corresponding features in the matchable sections of the guide audio signal; and, based on the time-alignment path, modifying the at least one feature in the matchable sections of the input audio signal.
METHOD FOR PROCESSING AN AUDIO SIGNAL, METHOD FOR CONTROLLING AN APPARATUS AND ASSOCIATED SYSTEM
In a method for processing an audio signal, the audio signal is continuously analyzed substantially in real time from a recognized beginning of the speech input to provide a speech analysis result. The speech analysis result is used to dynamically define an end of the speech input. A speech data stream is provided based on the audio signal between the beginning and the end. The speech data stream may be further analyzed to identify one or more speech commands.
Apparatuses and methods for enhanced speech recognition in variable environments
Systems, apparatuses, and methods are described to increase a signal-to-noise ratio difference between a main channel and reference channel. The increased signal-to-noise ratio difference is accomplished with an adaptive threshold for a desired voice activity detector (DVAD) and shaping filters. The DVAD includes averaging an output signal of a reference microphone channel to provide an estimated average background noise level. A threshold value is selected from a plurality of threshold values based on the estimated average background noise level. The threshold value is used to detect desired voice activity on a main microphone channel.
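The adaptive-threshold selection described above can be illustrated with a small lookup. This is a minimal sketch, assuming the noise level is the mean absolute amplitude of a reference-channel frame and that the thresholds are keyed by noise-level breakpoints. The table values and the names `NOISE_BREAKPOINTS`, `THRESHOLDS`, `select_threshold`, and `desired_voice_active` are hypothetical, and the shaping filters are omitted.

```python
from bisect import bisect_right

# Hypothetical table: two noise-level breakpoints split the range into
# quiet / moderate / loud, each with its own detection threshold.
NOISE_BREAKPOINTS = [0.01, 0.1]
THRESHOLDS = [0.05, 0.2, 0.6]


def average_level(frame):
    """Mean absolute amplitude of one frame."""
    return sum(abs(v) for v in frame) / len(frame)


def select_threshold(noise_level):
    """Pick one threshold from the table based on the estimated noise level."""
    return THRESHOLDS[bisect_right(NOISE_BREAKPOINTS, noise_level)]


def desired_voice_active(main_frame, reference_frame):
    """Detect desired voice on the main channel with a noise-dependent threshold."""
    noise_level = average_level(reference_frame)  # estimated background noise
    threshold = select_threshold(noise_level)
    return average_level(main_frame) > threshold
```

With these assumed values, the same main-channel frame can count as voice in a quiet environment and be rejected in a loud one, because a louder reference channel selects a higher threshold.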
Automatic Leveling of Speech Content
Embodiments are disclosed for automatic leveling of speech content. In an embodiment, a method comprises: receiving, using one or more processors, frames of an audio recording including speech and non-speech content; for each frame: determining, using the one or more processors, a speech probability; analyzing, using the one or more processors, a perceptual loudness of the frame; obtaining, using the one or more processors, a target loudness range for the frame; computing, using the one or more processors, gains to apply to the frame based on the target loudness range and the perceptual loudness analysis, where the gains include dynamic gains that change frame-by-frame and that are scaled based on the speech probability; and applying the gains to the frame so that a resulting loudness range of the speech content in the audio recording fits within the target loudness range.
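The per-frame gain computation described above might look like the following sketch. It rests on several assumptions: RMS level in dB stands in for the perceptual loudness analysis, the target range is fixed rather than obtained per frame, and only the dynamic gain is modeled. The function names and the default range of -26 dB to -20 dB are hypothetical.

```python
import math


def rms_db(frame):
    """Frame level in dB relative to full scale (a stand-in for perceptual loudness)."""
    rms = math.sqrt(sum(v * v for v in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-10))


def dynamic_gain_db(loudness_db, target_lo_db, target_hi_db, speech_prob):
    """Gain that moves the frame into the target range, scaled by speech probability."""
    if loudness_db < target_lo_db:
        raw_gain = target_lo_db - loudness_db   # boost quiet frames
    elif loudness_db > target_hi_db:
        raw_gain = target_hi_db - loudness_db   # attenuate loud frames
    else:
        raw_gain = 0.0                          # already inside the range
    return speech_prob * raw_gain


def level_frame(frame, speech_prob, target_lo_db=-26.0, target_hi_db=-20.0):
    """Apply the speech-weighted dynamic gain to one frame of samples."""
    gain_db = dynamic_gain_db(rms_db(frame), target_lo_db, target_hi_db, speech_prob)
    scale = 10.0 ** (gain_db / 20.0)
    return [v * scale for v in frame]
```

Scaling the gain by the speech probability means frames judged to be non-speech (probability near 0) pass through almost unchanged, so background noise is not boosted along with quiet speech.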
Electronic device and operating method thereof
An electronic device includes a memory storing one or more instructions; and a processor configured to execute the one or more instructions stored in the memory to receive audio data corresponding to a user's utterance, to identify the user's utterance characteristics based on the received audio data, to determine a parameter for performing voice activity detection, by using the identified user's utterance characteristics, and to perform voice activity detection on the received audio data with respect to the user's utterance by using the determined parameter.
Information processing system, and information processing method
The present disclosure provides an information processing system and an information processing method capable of auditing the utterance data of an agent more flexibly. In one example, an information processing system includes: a storage section that stores utterance data of an agent; a communication section that receives request information transmitted from a client terminal and requesting utterance data of a specific agent from a user; and a control section that, when the request information is received through the communication section, replies to the client terminal with corresponding utterance data, and in accordance with feedback from the user with respect to the utterance data, updates an utterance probability level expressing a probability that the specific agent will utter utterance content indicated by the utterance data, and records the updated utterance probability level in association with the specific agent and the utterance content in the storage section.
Audio systems and methods for voice activity detection
Audio systems, methods, and processor instructions are provided that detect voice activity of a user and provide an output voice signal. The systems, methods, and instructions receive a plurality of microphone signals and combine the plurality of microphone signals according to a first combination and a second combination. The first combination produces a primary signal having enhanced response in the direction of the user's mouth, and the second combination produces a reference signal having reduced response in the direction of the user's mouth. The primary signal and the reference signal are added and subtracted to produce a voice-enhanced signal and a voice-reduced signal, respectively. The voice-enhanced signal and the voice-reduced signal are compared and an output voice signal is provided based upon the comparison.
Voice processing apparatus and voice processing method
A voice processing apparatus calculates a phase difference between first and second frequency signals obtained by transforming first and second voice signals generated by two voice input units for each frequency, calculates, for each extension range set outside or inside a reference range, a presence ratio based on the number of frequencies with the phase difference between the first and second frequency signals falling within the extension range, the reference range representing a range of the phase difference between the first and second voice signals for each frequency and corresponding to a direction in which a target sound source is assumed to be located, and sets, as a non-suppression range, a first extension range having the presence ratio higher than a predetermined value and a second extension range closer to the phase difference at the center of the reference range than the first extension range is within the reference range.