Patent classifications
G10L2015/0633
Speech recognition method and apparatus
A speech recognition method comprises: generating, based on a preset speech knowledge source, a search space that comprises preset client information and is used for decoding a speech signal; extracting a characteristic vector sequence of a to-be-recognized speech signal; calculating the probability that each characteristic vector corresponds to each basic unit of the search space; and executing a decoding operation in the search space, using the probabilities as input, to obtain a word sequence corresponding to the characteristic vector sequence.
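The decoding step described above can be sketched roughly as follows. The toy search space, unit ids, and whole-word scoring are illustrative assumptions, not the patent's actual decoder:

```python
import numpy as np

# Hypothetical toy search space: each word maps to a sequence of basic
# units (phoneme-like states). Structure and ids are illustrative only.
SEARCH_SPACE = {
    "yes": [0, 1],   # word "yes" -> unit ids 0 then 1
    "no":  [2, 3],   # word "no"  -> unit ids 2 then 3
}

def decode(frame_probs):
    """Score each word by the log-probability that its unit sequence
    matches the per-frame unit probabilities, and return the best word
    (a greedy, whole-word sketch of the decoding operation)."""
    best_word, best_score = None, -np.inf
    for word, units in SEARCH_SPACE.items():
        if len(units) != len(frame_probs):
            continue
        score = sum(np.log(frame_probs[t][u]) for t, u in enumerate(units))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Two frames of probabilities over the four basic units, standing in
# for the "probability that the characteristic vector corresponds to
# each basic unit" step.
frames = [
    [0.7, 0.1, 0.1, 0.1],  # frame 0: unit 0 most likely
    [0.1, 0.6, 0.1, 0.2],  # frame 1: unit 1 most likely
]
print(decode(frames))  # -> yes
```

A real decoder would run a Viterbi-style search over a weighted graph rather than scoring whole words independently.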
METHOD AND DEVICE FOR PERFORMING VOICE RECOGNITION USING GRAMMAR MODEL
A method of updating speech recognition data including a language model used for speech recognition, the method including obtaining language data including at least one word; detecting a word that does not exist in the language model from among the at least one word; obtaining at least one phoneme sequence regarding the detected word; obtaining components constituting the at least one phoneme sequence by dividing the at least one phoneme sequence into predetermined unit components; determining information regarding probabilities that the respective components constituting each of the at least one phoneme sequence appear during speech recognition; and updating the language model based on the determined probability information.
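The update flow in this abstract can be sketched as follows. The lexicon, the choice of phoneme bigrams as the "predetermined unit components," and count-based probabilities are all illustrative assumptions:

```python
from collections import Counter

# Toy language model: a vocabulary plus probabilities for unit
# components. All names and values are illustrative.
language_model = {"vocab": {"hello", "world"}, "unit_probs": {}}

# Hypothetical lexicon giving phoneme sequences for new words.
PHONEME_LEXICON = {"chatbot": ["CH", "AE", "T", "B", "AA", "T"]}

def update_language_model(lm, words):
    # 1. Detect words that do not exist in the language model.
    oov = [w for w in words if w not in lm["vocab"]]
    for word in oov:
        # 2. Obtain a phoneme sequence for the detected word.
        phonemes = PHONEME_LEXICON.get(word)
        if phonemes is None:
            continue
        # 3. Divide the phoneme sequence into predetermined unit
        #    components (bigrams in this sketch).
        components = [tuple(phonemes[i:i + 2]) for i in range(len(phonemes) - 1)]
        # 4. Determine appearance probabilities from component counts.
        counts = Counter(components)
        total = sum(counts.values())
        for comp, n in counts.items():
            lm["unit_probs"][comp] = n / total
        # 5. Update the language model with the new word.
        lm["vocab"].add(word)
    return lm

update_language_model(language_model, ["hello", "chatbot"])
print("chatbot" in language_model["vocab"])  # -> True
```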
Identification of candidate training utterances from human conversations with an intelligent interactive assistant
A method for creating binary classification models and using the binary classification models to select candidate training utterances from a plurality of live utterances is provided. The method may include receiving a plurality of intents and associated training utterances. The method may include creating, from the training utterances, a binary classification model for each intent. The binary classification model may include a vector representation of a line of demarcation between utterances associated with the intent and utterances disassociated from the intent. The method may also include receiving live utterances. An intent may be determined for each live utterance. The method may include creating a vector representation of the live utterance. The method may include selecting candidate training utterances based on a comparison between the vector representation of the live utterance and the vector representation included in the binary classification model of the intent determined for the live utterance.
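A minimal sketch of the selection method described above, assuming bag-of-words vectors and a centroid-difference hyperplane as the "line of demarcation"; the vocabulary, margin, and utterances are invented for illustration:

```python
import numpy as np

# Tiny illustrative vocabulary for bag-of-words vectors.
VOCAB = ["balance", "check", "transfer", "money", "send"]

def vectorize(utterance):
    """Create a vector representation of an utterance."""
    words = utterance.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

def train_binary_model(positives, negatives):
    """Sketch the 'line of demarcation' as the hyperplane separating
    the centroid of in-intent vectors from out-of-intent vectors."""
    pos = np.mean([vectorize(u) for u in positives], axis=0)
    neg = np.mean([vectorize(u) for u in negatives], axis=0)
    w = pos - neg
    b = -w.dot((pos + neg) / 2)   # boundary midway between centroids
    return w, b

def select_candidates(model, live_utterances, margin=0.5):
    """Select live utterances whose vectors fall near the boundary --
    the comparison step described in the abstract."""
    w, b = model
    return [u for u in live_utterances
            if abs(w.dot(vectorize(u)) + b) < margin]

intent_model = train_binary_model(
    ["check balance", "check my balance"],   # associated with the intent
    ["send money", "transfer money"],        # disassociated from it
)
print(select_candidates(intent_model, ["check balance now", "check transfer"]))
# -> ['check transfer']
```

Utterances near the boundary are the ambiguous ones, which is why they make useful candidate training data.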
Systems and methods for generating disambiguated terms in automatically generated transcriptions including instructions within a particular knowledge domain
A system and method for generating disambiguated terms in automatically generated transcripts, and for employing the system, are disclosed. Exemplary implementations may: obtain a set of transcripts representing various speech from users; obtain indications of correlated correct and incorrect transcripts of spoken terms; use a vector generation model to generate vectors for individual instances of the correctly transcribed terms and individual instances of the incorrectly transcribed terms based on text and contexts of the individual transcribed terms; and train the vector generation model to reduce spatial separation of the vectors generated for the spoken terms in the correlated correct transcripts and the incorrect transcripts.
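The training objective above can be sketched as pulling the vectors of each correlated correct/incorrect pair toward each other. The embedding table, pairs, and plain gradient step are illustrative stand-ins for the patent's vector generation model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pairs of (correct, incorrect) transcriptions of the
# same spoken term within a particular knowledge domain.
PAIRS = [("statin", "satin"), ("aspirin", "asprin")]

# Toy embedding table standing in for the vector generation model.
vocab = sorted({t for pair in PAIRS for t in pair})
emb = {t: rng.normal(size=8) for t in vocab}

def distance(a, b):
    """Spatial separation between two term vectors."""
    return float(np.linalg.norm(emb[a] - emb[b]))

def train(pairs, steps=50, lr=0.1):
    """Gradient steps that pull each pair's vectors together,
    reducing their spatial separation as the abstract describes."""
    for _ in range(steps):
        for correct, wrong in pairs:
            diff = emb[correct] - emb[wrong]
            emb[correct] -= lr * diff   # move the two vectors
            emb[wrong] += lr * diff     # toward each other

before = distance("statin", "satin")
train(PAIRS)
after = distance("statin", "satin")
print(after < before)  # -> True
```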
METHOD AND SYSTEM FOR DETECTING VOICE ACTIVITY IN NOISY CONDITIONS
A voice activity detection method includes: training one or more computerized neural networks having a denoising autoencoder and a classifier, wherein the training is performed utilizing one or more models including Mel-frequency cepstral coefficient (MFCC) features, pitch features, and other acoustic features, each model being recorded at one or more differing associated predetermined signal-to-noise ratios; recording a raw audio waveform and transmitting the raw audio waveform to the computerized neural network; denoising the raw audio waveform utilizing the denoising autoencoder; determining whether the raw audio waveform contains human speech; and extracting any human speech from the raw audio waveform.
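The denoise/classify/extract pipeline can be sketched with simple stand-ins: a moving-average smoother in place of the denoising autoencoder, and a frame-energy threshold in place of the neural classifier. All parameters here are illustrative assumptions:

```python
import numpy as np

def denoise(waveform, kernel=5):
    """Stand-in for the denoising autoencoder: a moving-average
    smoother (illustrative only, not a neural network)."""
    pad = kernel // 2
    padded = np.pad(waveform, pad, mode="edge")
    return np.convolve(padded, np.ones(kernel) / kernel, mode="valid")

def contains_speech(waveform, threshold=0.01):
    """Stand-in for the classifier: per-frame energy voice activity
    test over 160-sample frames."""
    frames = waveform[: len(waveform) // 160 * 160].reshape(-1, 160)
    energy = (frames ** 2).mean(axis=1)
    return bool((energy > threshold).any())

def extract_speech(waveform, threshold=0.01):
    """Return only the frames judged to contain speech."""
    frames = waveform[: len(waveform) // 160 * 160].reshape(-1, 160)
    voiced = frames[(frames ** 2).mean(axis=1) > threshold]
    return voiced.ravel()

rng = np.random.default_rng(1)
noise = 0.01 * rng.normal(size=1600)             # quiet noise floor
tone = 0.5 * np.sin(np.linspace(0, 100, 1600))   # loud "speech" burst
raw = np.concatenate([noise, tone])

clean = denoise(raw)
print(contains_speech(clean))  # -> True
speech = extract_speech(clean)
```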
User-customizable and domain-specific responses for a virtual assistant for multi-dwelling units
The present disclosure provides systems, methods, and computer-readable storage devices for enabling user management and control of responses of a virtual assistant for use in responding to questions related to multi-dwelling units without requiring reprogramming of the virtual assistant. To illustrate, a question to be answered by a virtual assistant may be received, and one or more responses to the question may be retrieved from a response database for the virtual assistant. A user interface (UI) may be provided that indicates the one or more responses. A user-selected response may be received via the UI, the user-selected response including a selected response from the one or more responses or a user-created response. An entry in the response database may be updated based on the user-selected response and a priority associated with the entry may be set, such as by increasing the priority based on user selection of the response.
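The retrieve/select/update flow above can be sketched with a small in-memory database. The schema and the priority-increment rule are illustrative assumptions about one way such a response store could behave:

```python
# Minimal sketch of a response database for a multi-dwelling-unit
# virtual assistant. Questions, responses, and priorities are invented.
response_db = {
    "what are the pool hours?": [
        {"text": "The pool is open 8am-10pm.", "priority": 1},
        {"text": "Pool hours are posted in the lobby.", "priority": 1},
    ],
}

def get_responses(question):
    """Retrieve candidate responses, highest priority first."""
    entries = response_db.get(question, [])
    return sorted(entries, key=lambda e: e["priority"], reverse=True)

def record_user_selection(question, selected_text):
    """Update the entry for a user-selected response: bump the
    priority of an existing response, or add a user-created one."""
    entries = response_db.setdefault(question, [])
    for entry in entries:
        if entry["text"] == selected_text:
            entry["priority"] += 1   # increase priority on selection
            return entry
    entry = {"text": selected_text, "priority": 1}  # user-created response
    entries.append(entry)
    return entry

record_user_selection("what are the pool hours?", "The pool is open 8am-10pm.")
top = get_responses("what are the pool hours?")[0]
print(top["text"])  # -> The pool is open 8am-10pm.
```

This captures the key property from the abstract: user selections reshape the ranking without any reprogramming of the assistant itself.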
Noise data augmentation for natural language processing
Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof irrelevant of original text within the utterances of the training set of utterances, and incorporating the noise text within the utterances relative to the original text in the utterances of the training set of utterances at a predefined augmentation ratio to generate augmented utterances.
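The augmentation step can be sketched as follows, assuming a word list as the noise source and a simple "noise words per original word" interpretation of the augmentation ratio; the word list and ratio are illustrative:

```python
import random

# Illustrative noise word list, irrelevant to the original training
# text as the abstract requires (it could equally come from a corpus,
# publication, or dictionary).
NOISE_WORDS = ["umbrella", "quartz", "meadow", "lantern"]

def augment(utterances, ratio=0.3, seed=7):
    """Insert noise words into each utterance at roughly the given
    augmentation ratio (noise words per original word)."""
    rng = random.Random(seed)
    augmented = []
    for utt in utterances:
        words = utt.split()
        n_noise = max(1, round(len(words) * ratio))
        for _ in range(n_noise):
            pos = rng.randrange(len(words) + 1)  # random insertion point
            words.insert(pos, rng.choice(NOISE_WORDS))
        augmented.append(" ".join(words))
    return augmented

train_set = ["check my account balance", "transfer money to savings"]
for utt in augment(train_set):
    print(utt)
```

Training an intent classifier on such utterances can make it less sensitive to irrelevant filler in real user input.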