Patent classifications
G10L25/45
COMPANDING SYSTEM AND METHOD TO REDUCE QUANTIZATION NOISE USING ADVANCED SPECTRAL EXTENSION
Embodiments are directed to a companding method and system for reducing coding noise in an audio codec. A compression process reduces the original dynamic range of an initial audio signal by dividing the initial audio signal into a plurality of segments using a defined window shape, calculating a wideband gain in the frequency domain using a non-energy-based average of frequency-domain samples of the initial audio signal, and applying individual gain values to amplify segments of relatively low intensity and attenuate segments of relatively high intensity. The compressed audio signal is then expanded back to substantially the original dynamic range by an expansion process that applies inverse gain values to amplify segments of relatively high intensity and attenuate segments of relatively low intensity. A QMF filterbank is used to analyze the initial audio signal to obtain a frequency-domain representation.
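The compress/expand pair described above can be illustrated with a minimal sketch. It assumes each segment is already a list of frequency-domain samples (the windowing and QMF analysis are omitted), uses the mean absolute value as the non-energy-based average, and picks an exponent gamma = 1/3; the function names, the choice of average, and the exponent are illustrative assumptions, not details taken from the abstract.

```python
def mean_magnitude(bins):
    """Non-energy average: mean of |x| rather than mean of |x|**2."""
    return sum(abs(b) for b in bins) / len(bins)


def compress(segments, gamma=1.0 / 3.0):
    """Apply one wideband gain per segment (gamma is an assumed exponent)."""
    compressed, gains = [], []
    for seg in segments:
        m = mean_magnitude(seg)
        # Gain > 1 for quiet segments (m < 1), gain < 1 for loud ones (m > 1).
        g = m ** (gamma - 1.0) if m > 0 else 1.0
        gains.append(g)
        compressed.append([g * b for b in seg])
    return compressed, gains


def expand(compressed, gamma=1.0 / 3.0):
    """Derive the inverse gain from the compressed segment itself."""
    expanded = []
    for seg in compressed:
        m_c = mean_magnitude(seg)  # equals m ** gamma after compression
        g_inv = m_c ** ((1.0 - gamma) / gamma) if m_c > 0 else 1.0
        expanded.append([g_inv * b for b in seg])
    return expanded
```

In this sketch the expander needs no side information: the compressed segment's mean magnitude is m ** gamma, which is enough to recover the inverse gain m ** (1 - gamma).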
SYSTEMS AND METHODS FOR GENERATING A DYNAMIC LIST OF HINT WORDS FOR AUTOMATED SPEECH RECOGNITION
Systems and methods are provided for determining hint words that improve the accuracy of automated speech recognition (ASR) systems. Hint words are typically determined in the context of a user issuing voice commands in connection with a voice interface system; however, a voice interface system may also capture terms from overheard content and/or conversations. A system may determine a sliding window of hint words using a set of qualifier rules. The system may capture audio, e.g., from a conversation or played-back content, as a first input and decipher a plurality of words including a qualifying first term, which is added to the hint words. The voice interface system may capture more audio as a second input and decipher a second plurality of words including a qualifying second term. The first term may be removed from the set of hint words, e.g., when the second term is added or after an expiration time.
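A sliding window of hint words with both eviction rules (removal when a newer term arrives, and removal after an expiration time) might be sketched as follows. The class name, the capacity and time-to-live parameters, and the default qualifier rule are all hypothetical; the abstract does not specify them.

```python
from collections import OrderedDict


class HintWordWindow:
    """Keeps the most recent qualifying terms, oldest first."""

    def __init__(self, max_terms=2, ttl_seconds=60.0, qualifies=None):
        self.max_terms = max_terms
        self.ttl_seconds = ttl_seconds
        # Assumed default qualifier rule: keep longer words only.
        self.qualifies = qualifies or (lambda term: len(term) > 5)
        self._terms = OrderedDict()  # term -> time it was added

    def expire(self, now):
        """Drop terms whose age has reached the expiration time."""
        stale = [t for t, added in self._terms.items()
                 if now - added >= self.ttl_seconds]
        for term in stale:
            del self._terms[term]

    def add_transcript(self, words, now):
        """Add qualifying terms from deciphered words; evict the oldest."""
        self.expire(now)
        for word in words:
            term = word.lower()
            if not self.qualifies(term):
                continue
            self._terms.pop(term, None)  # re-adding refreshes the term
            self._terms[term] = now
            while len(self._terms) > self.max_terms:
                self._terms.popitem(last=False)

    def hints(self, now):
        self.expire(now)
        return list(self._terms)
```

The timestamps are passed in explicitly rather than read from a clock, which keeps the window easy to exercise with simulated inputs.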
Audio signal processing method and device, and storage medium
An audio signal processing method includes: acquiring audio signals from at least two sound sources respectively through at least two microphones (MICs) to obtain respective original noisy signals of the at least two MICs in a time domain; for each frame in the time domain, using a first asymmetric window to perform a windowing operation on the respective original noisy signals of the at least two MICs to acquire windowed noisy signals; performing time-frequency conversion on the windowed noisy signals to acquire respective frequency-domain noisy signals of the at least two sound sources; acquiring frequency-domain estimated signals of the at least two sound sources according to the frequency-domain noisy signals; and obtaining audio signals produced respectively by the at least two sound sources according to the frequency-domain estimated signals.
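The abstract does not say what shape the first asymmetric window takes. As a rough illustration only, the sketch below builds one from a long rising half-Hann and a short falling half-Hann, then applies it to one frame per microphone; the function names and the window shape are assumptions.

```python
import math


def asymmetric_window(length, rise):
    """Rising half-Hann over `rise` samples, falling half-Hann over the rest."""
    if not 0 < rise < length:
        raise ValueError("rise must satisfy 0 < rise < length")
    fall = length - rise
    window = []
    for n in range(length):
        if n < rise:
            window.append(0.5 * (1.0 - math.cos(math.pi * n / rise)))
        else:
            window.append(0.5 * (1.0 + math.cos(math.pi * (n - rise) / fall)))
    return window


def window_frames(mic_frames, window):
    """Apply the same window to the current frame of every microphone."""
    return [[s * w for s, w in zip(frame, window)] for frame in mic_frames]
```

With `rise` larger than half of `length`, the peak sits toward the end of the frame, so the most recent samples are weighted more heavily than they would be under a symmetric window.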
Estimating lung volume by speech analysis
Described embodiments include an apparatus that includes a network interface and a processor. The processor is configured to receive, via the network interface, a speech signal that represents speech uttered by a subject, the speech including one or more speech segments, divide the speech signal into multiple frames, such that one or more sequences of the frames represent the speech segments, respectively, compute respective estimated total volumes of air exhaled by the subject while the speech segments were uttered, by, for each of the sequences, computing respective estimated flow rates of air exhaled by the subject during the frames belonging to the sequence and, based on the estimated flow rates, computing a respective one of the estimated total volumes of air, and, in response to the estimated total volumes of air, generate an alert. Other embodiments are also described.
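The step from per-frame flow rates to a per-segment air volume is a simple integration, sketched below. How the flow rates are estimated from the speech frames is not given in the abstract, so they are treated as inputs here; the function names, the units, and the baseline-deviation alert rule are illustrative assumptions.

```python
def estimated_volume(flow_rates, frame_seconds):
    """Integrate per-frame flow rates (litres/second) over one segment."""
    return sum(flow_rates) * frame_seconds


def volumes_for_segments(segments, frame_seconds):
    """One estimated total volume per speech segment (sequence of frames)."""
    return [estimated_volume(rates, frame_seconds) for rates in segments]


def should_alert(volumes, baseline_litres, tolerance=0.2):
    """Assumed rule: alert when the mean volume strays from a baseline."""
    mean_volume = sum(volumes) / len(volumes)
    return abs(mean_volume - baseline_litres) / baseline_litres > tolerance
```

For example, a segment with flow rates of 0.2, 0.3 and 0.25 litres per second over 0.5-second frames yields an estimated 0.375 litres.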
METHOD FOR PROCESSING AN AUDIO SIGNAL, METHOD FOR CONTROLLING AN APPARATUS AND ASSOCIATED SYSTEM
In a method for processing an audio signal, the audio signal is continuously analyzed substantially in real time from a recognized beginning of the speech input to provide a speech analysis result. The speech analysis result is used to dynamically define an end of the speech input. A speech data stream is provided based on the audio signal between the beginning and the end. The speech data stream may be further analyzed to identify one or more speech commands.
Stereo signal encoding method and encoding apparatus
A stereo signal encoding method includes determining a window length of an attenuation window based on an inter-channel time difference; determining a modified linear prediction analysis window based on the window length of the attenuation window, where values of at least some points from a point (L−sub_window_len) to a point (L−1) in the modified linear prediction analysis window are less than values of corresponding points from a point (L−sub_window_len) to a point (L−1) in an initial linear prediction analysis window, and the window length of the modified linear prediction analysis window is equal to a window length of the initial linear prediction analysis window; and performing linear prediction analysis on a to-be-processed sound channel signal based on the modified linear prediction analysis window.
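The window modification can be sketched as follows. The abstract only requires that points from (L−sub_window_len) to (L−1) be reduced and that the overall length stay the same; the Hamming initial window, the mapping from inter-channel time difference to sub_window_len, and the linear attenuation ramp used here are all assumptions made for illustration.

```python
import math


def hamming_window(L):
    """Assumed initial linear prediction analysis window."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (L - 1))
            for n in range(L)]


def modified_lp_window(initial, itd_samples):
    """Attenuate the last sub_window_len points of the initial window."""
    L = len(initial)
    # Assumption: attenuation window length equals |ITD| in samples, capped at L.
    sub_window_len = min(abs(itd_samples), L)
    modified = list(initial)
    for i in range(sub_window_len):
        n = L - sub_window_len + i
        # Linear ramp strictly below 1, so every touched point is reduced.
        modified[n] = initial[n] * (1.0 - (i + 1) / (sub_window_len + 1))
    return modified
```

Because the ramp factor is strictly less than 1 and the Hamming window is strictly positive, every point in the attenuated tail is smaller than its counterpart in the initial window, while the window length is unchanged.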
SYSTEMS AND METHODS TO IMPROVE TIMESTAMP TRANSITION RESOLUTION
Example methods and apparatus to improve timestamp transition resolution of watermarks are disclosed. A disclosed example apparatus is to determine an initial resolution for timestamp transitions based on a first number of time units between first ones of watermarks detected in media, and determine an updated resolution for the timestamp transitions based on a predicted timestamp transition window and a second number of time units between second ones of the watermarks detected in the media, the second ones of the watermarks to be subsequent to the first ones of the watermarks in the media, the predicted timestamp transition window associated with the initial resolution for timestamp transitions, the updated resolution for the timestamp transitions corresponding to a third number of time units, the third number of time units less than the second number of time units.