Patent classifications
G10L2015/022
TRAINING ACOUSTIC MODELS USING CONNECTIONIST TEMPORAL CLASSIFICATION
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models and using the trained acoustic models. A connectionist temporal classification (CTC) acoustic model is accessed, the CTC acoustic model having been trained using a context-dependent state inventory generated from approximate phonetic alignments determined by another CTC acoustic model trained without fixed alignment targets. Audio data for a portion of an utterance is received. Input data corresponding to the received audio data is provided to the accessed CTC acoustic model. Data indicating a transcription for the utterance is generated based on output that the accessed CTC acoustic model produced in response to the input data. The data indicating the transcription is provided as output of an automated speech recognition service.
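The abstract above rests on CTC's label-collapsing rule: the acoustic model emits one label (or a blank) per audio frame, and a transcription is recovered by merging repeated labels and dropping blanks. A minimal sketch of that rule (the blank symbol and frame labels below are illustrative, not from the patent):

```python
BLANK = "_"  # illustrative blank symbol; real systems use a reserved index

def ctc_collapse(frame_labels):
    """Apply the CTC 'B' mapping: merge consecutive duplicates, remove blanks."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev:          # a new run of symbols starts here
            if lab != BLANK:     # blanks separate runs but emit nothing
                out.append(lab)
            prev = lab
    return out

# Frame-level output "hh_eee_l_ll_oo" collapses to the label string h e l l o;
# note the blank between the two l runs is what preserves the double letter.
print(ctc_collapse(list("hh_eee_l_ll_oo")))
```

The blank between the two `l` runs is the design point: without it, merging repeats would collapse genuine double labels.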
SPEECH RECOGNITION SYSTEM WITH FINE-GRAINED DECODING
Provided is a speech recognition system including an acoustic model, a decoding graph module, a history buffer, and a decoder. The acoustic model is configured to receive an acoustic input from an input module, divide the acoustic input into audio clips, and return scores evaluated for the audio clips. The decoding graph module is configured to store a decoding graph having at least one possible path of a keyword. The history buffer is configured to store history information corresponding to the possible path in the decoding graph module. The decoder is connected to the acoustic model, the decoding graph module, and the history buffer, and configured to receive the scores from the acoustic model, look up the possible path in the decoding graph module, and predict an output keyword.
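A hedged sketch of the decoder loop this abstract describes, assuming a single left-to-right keyword path: for each audio clip, the acoustic model's scores advance tokens along the path in the decoding graph, with the history buffer holding the best accumulated score per graph state. Phone names, log-scores, and the threshold are illustrative.

```python
def spot_keyword(path, clip_scores, threshold):
    """Predict whether the keyword's path is matched by the stream of clip scores."""
    NEG = float("-inf")
    history = [NEG] * len(path)       # history buffer: best score per graph state
    for scores in clip_scores:        # one log-score dict per audio clip
        new = []
        for j, phone in enumerate(path):
            # a token may enter state j from the previous state, or restart at j == 0
            entered = 0.0 if j == 0 else history[j - 1]
            # stay in state j (self-loop) or advance, then add this clip's score
            new.append(max(history[j], entered) + scores.get(phone, NEG))
        history = new
    return history[-1] >= threshold   # keyword predicted if the final state scores high

# Two clips that clearly match the path ["h", "ey"]:
clips = [{"h": -0.1, "ey": -5.0}, {"h": -4.0, "ey": -0.2}]
print(spot_keyword(["h", "ey"], clips, -1.0))
```

The restart at `j == 0` is what makes this spotting rather than full decoding: the keyword may begin at any clip.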
SPEECH ANALYSIS ALGORITHMIC SYSTEM AND METHOD FOR OBJECTIVE EVALUATION AND/OR DISEASE DETECTION
Systems and methods use patient speech samples as inputs, use subjective multi-point ratings by speech-language pathologists of multiple perceptual dimensions of patient speech samples as further inputs, and extract laboratory-implemented features from the patient speech samples. A predictive software model learns the relationship between speech acoustics and the subjective ratings of such speech obtained from speech-language pathologists, and is configured to apply this information to evaluate new speech samples. Outputs may include objective evaluation of the multiple perceptual dimensions for new speech samples and/or evaluation of disease onset, disease progression, or disease treatment efficacy for a condition involving dysarthria as a symptom, utilizing the new speech samples.
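In its simplest form, the predictive model described here maps an acoustic feature to a clinician rating. A deliberately minimal sketch, assuming a single feature and ordinary least squares (the feature values and 1-to-7 ratings below are fabricated for illustration; the patent's actual model and features are not specified here):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, one feature, pure Python."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Fabricated training pairs: acoustic feature value -> SLP severity rating.
a, b = fit_line([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(a * 5.0 + b)  # predicted rating for a new sample's feature value -> 10.0
```

A production system would use many features and a richer regressor, but the learn-from-ratings, score-new-samples loop is the same.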
Acoustic model training method, speech recognition method, apparatus, device and medium
An acoustic model training method, a speech recognition method, an apparatus, a device and a medium. The acoustic model training method comprises: performing feature extraction on a training speech signal to obtain an audio feature sequence; training the audio feature sequence with a phoneme Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) to obtain a phoneme feature sequence; and training the phoneme feature sequence with a Deep Neural Network-Hidden Markov Model (DNN-HMM) sequence training model to obtain a target acoustic model. The acoustic model training method can effectively reduce the time required for acoustic model training, improve training efficiency, and ensure recognition efficiency.
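The GMM-HMM stage of this pipeline rests on the HMM forward algorithm, which scores how well a feature sequence fits a phoneme model. A toy sketch with two states; for brevity the per-state Gaussian mixtures are replaced by fixed discrete emission probabilities, and all numbers are illustrative:

```python
def hmm_forward(obs, start, trans, emit):
    """Forward algorithm: total probability of the observation sequence."""
    n = len(start)
    # initialize: start in each state and emit the first observation
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        # sum over predecessor states, then emit the next observation
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)

start = [0.6, 0.4]                       # initial state probabilities
trans = [[0.7, 0.3], [0.4, 0.6]]         # state transition matrix
emit = [{"a": 0.9, "b": 0.1},            # stand-in for GMM likelihoods
        {"a": 0.2, "b": 0.8}]
print(hmm_forward(["a", "b"], start, trans, emit))
```

In a real GMM-HMM, `emit[s][o]` would be a Gaussian-mixture density evaluated at a feature vector, and training re-estimates all three parameter sets.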
SPEECH RECOGNITION METHOD AND APPARATUS
A speech recognition method comprises: generating, based on a preset speech knowledge source, a search space that comprises preset client information and is used for decoding a speech signal; extracting a characteristic vector sequence of a to-be-recognized speech signal; calculating the probability that each characteristic vector corresponds to each basic unit of the search space; and executing a decoding operation in the search space, using the probabilities as input, to obtain a word sequence corresponding to the characteristic vector sequence.
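The steps above can be sketched end to end: per-frame probabilities are computed for each basic unit (phones here), then a decoding pass scores each unit path in a toy search space and returns the best-matching word. The vocabulary, phone labels, and probabilities are all made up for illustration.

```python
import math

def align_score(units, frames):
    """Best monotonic alignment log-score of a unit path against the frames."""
    NEG = float("-inf")
    best = [NEG] * len(units)
    best[0] = math.log(frames[0].get(units[0], 1e-12))   # must start in first unit
    for f in frames[1:]:
        new = []
        for j, u in enumerate(units):
            # stay in unit j or advance from unit j-1
            prev = best[j] if j == 0 else max(best[j], best[j - 1])
            new.append(prev + math.log(f.get(u, 1e-12)))
        best = new
    return best[-1]                                      # must end in last unit

def decode(search_space, frames):
    """Return the word whose unit path best explains the frame probabilities."""
    return max(search_space, key=lambda w: align_score(search_space[w], frames))

# Toy search space: word -> basic-unit path; frames: unit -> probability.
space = {"hi": ["h", "ay"], "bye": ["b", "ay"]}
frames = [{"h": 0.8, "b": 0.1, "ay": 0.1},
          {"ay": 0.9, "h": 0.05, "b": 0.05}]
print(decode(space, frames))
```

A real decoder searches a weighted graph (with the client information compiled in) rather than scoring each word independently, but the probabilities-in, word-sequence-out contract is the same.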