G10L2015/0633

Vocal recognition using generally available speech-to-text systems and user-defined vocal training

Techniques for augmenting the output of generally available speech-to-text systems using local profiles are presented. An example method includes receiving an audio recording of a natural language command. The received audio recording of the natural language command is transmitted to a speech-to-text system, and a text string generated from the audio recording is received from the speech-to-text system. The text string is corrected based on a local profile mapping incorrectly transcribed words from the speech-to-text system to corrected words. A function in a software application is invoked based on the corrected text string.
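
The correction step described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the profile contents, the `correct_transcript` and `dispatch` helpers, and the `send_mail` command are all hypothetical, and the call to the remote speech-to-text system is replaced by a hard-coded transcript.

```python
import re

# Hypothetical local profile: words the speech-to-text system tends to
# transcribe incorrectly for this user, mapped to the intended words.
LOCAL_PROFILE = {
    "male": "mail",
    "sand": "send",
}

def correct_transcript(text, profile):
    """Replace every whole word found in the profile with its correction."""
    def fix(match):
        word = match.group(0)
        return profile.get(word.lower(), word)
    return re.sub(r"[A-Za-z']+", fix, text)

# Hypothetical application function invoked by the corrected command.
def send_mail(recipient):
    return f"sending mail to {recipient}"

def dispatch(command):
    """Map a corrected text string to an application function (assumed grammar)."""
    match = re.fullmatch(r"send mail to (\w+)", command.lower())
    if match is None:
        raise ValueError(f"unrecognized command: {command!r}")
    return send_mail(match.group(1))

# Stand-in for the text string returned by the speech-to-text system.
raw = "sand male to alice"
corrected = correct_transcript(raw, LOCAL_PROFILE)
result = dispatch(corrected)
```

Keeping the profile as a plain word-to-word mapping means it can be stored and edited locally without retraining the remote recognizer, which is the arrangement the abstract describes.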

Low-power automatic speech recognition device

A decoder comprises a feature extraction circuit for calculating one or more feature vectors; an acoustic model circuit coupled to receive one or more feature vectors from said feature extraction circuit and assign one or more likelihood values to the one or more feature vectors; a memory for storing states of transition of the decoder; and a search circuit for receiving an input from said acoustic model circuit corresponding to the one or more likelihood values based upon the one or more feature vectors, and for choosing states of transition from the memory based on the input from said acoustic model.
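
The three stages named in this abstract (feature extraction, acoustic model, search over transition states) can be mimicked in software. The sketch below is not the claimed circuit: it assumes a toy one-dimensional feature, Gaussian state scores, and a Viterbi search, none of which the abstract specifies, and all names are hypothetical.

```python
import math

def extract_features(samples, frame_size=4):
    """Toy feature: mean absolute amplitude of each non-overlapping frame."""
    return [sum(abs(s) for s in samples[i:i + frame_size]) / frame_size
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def acoustic_log_likelihoods(feature, state_means, variance=0.05):
    """Log-likelihood (up to a constant) of one feature under each state."""
    return {s: -((feature - m) ** 2) / (2 * variance)
            for s, m in state_means.items()}

def viterbi(features, state_means, log_trans, start_state):
    """Choose the best state sequence given likelihoods and stored transitions."""
    scores = {start_state: 0.0}
    backpointers = []
    for f in features:
        ll = acoustic_log_likelihoods(f, state_means)
        new_scores, pointers = {}, {}
        for prev, prev_score in scores.items():
            for nxt, lt in log_trans[prev].items():
                cand = prev_score + lt + ll[nxt]
                if nxt not in new_scores or cand > new_scores[nxt]:
                    new_scores[nxt], pointers[nxt] = cand, prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state to recover the path.
    state = max(scores, key=scores.get)
    path = [state]
    for pointers in reversed(backpointers[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Assumed two-state model: silence and speech.
STATE_MEANS = {"sil": 0.0, "speech": 1.0}
LOG_TRANS = {
    "sil": {"sil": math.log(0.5), "speech": math.log(0.5)},
    "speech": {"sil": math.log(0.5), "speech": math.log(0.5)},
}

samples = [0, 0, 0, 0, 1, -1, 1, -1, 1, 1, -1, -1, 0, 0, 0, 0]
features = extract_features(samples)
path = viterbi(features, STATE_MEANS, LOG_TRANS, start_state="sil")
```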

SYSTEM AND METHOD FOR CREATING DATA TO TRAIN A CONVERSATIONAL BOT

A system and method for creating input data to be used to train a conversational bot may include: receiving a set of conversations, each conversation including sentences; classifying each sentence into a dialog act taken from a number of dialog acts; for each set of sentences classified into a dialog act, clustering the set of sentences into clusters based on the content (e.g., text) of the sentences, each cluster having a cluster name or label; and generating a language model based on the cluster labels. Slots may be identified in the sentences based in part on the dialog act classifications. A bot may be trained using data such as the slots, language model, and clusters.

NOISE DATA AUGMENTATION FOR NATURAL LANGUAGE PROCESSING

Techniques are disclosed for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining noise text that is irrelevant to the original text of the utterances in the training set from a list of words, a text corpus, a publication, a dictionary, or any combination thereof; and incorporating the noise text within the utterances, relative to the original text, at a predefined augmentation ratio to generate augmented utterances.
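
A minimal sketch of the augmentation step follows. It assumes that the "augmentation ratio" means the number of noisy utterances added per original utterance, and that noise is placed before or after the original text; the abstract does not fix either choice. The function name, noise list, and intents are hypothetical.

```python
import random

def augment_with_noise(utterances, noise_words, ratio, noise_len=3, seed=0):
    """Return the original (text, intent) pairs plus noisy copies.

    ratio: assumed to be augmented utterances per original utterance,
    e.g. 0.5 adds one noisy copy for every two originals.
    """
    rng = random.Random(seed)
    n_aug = int(len(utterances) * ratio)
    augmented = []
    for _ in range(n_aug):
        text, intent = rng.choice(utterances)
        noise = " ".join(rng.choice(noise_words) for _ in range(noise_len))
        # The original text is kept intact; noise goes before or after it.
        noisy = f"{noise} {text}" if rng.random() < 0.5 else f"{text} {noise}"
        augmented.append((noisy, intent))
    return utterances + augmented

# Hypothetical training set and noise source unrelated to the utterances.
training_set = [
    ("check my balance", "balance"),
    ("transfer money to savings", "transfer"),
    ("what is my balance", "balance"),
    ("send fifty dollars to bob", "transfer"),
]
noise_source = ["weather", "banana", "yesterday", "purple", "guitar"]

augmented_set = augment_with_noise(training_set, noise_source, ratio=0.5)
```

The augmented set would then be passed to whatever intent classifier is being trained; the intent label of each noisy copy is unchanged, which is what teaches the classifier to ignore the irrelevant text.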

SYLLABLE BASED AUTOMATIC SPEECH RECOGNITION
20210193117 · 2021-06-24

Systems, methods, and computer programs are described which utilize the structure of syllables as an organizing element of automated speech recognition processing to overcome variations in pronunciation, to efficiently resolve confusable aspects, to exploit context, and to map the speech to orthography.

Method and device for performing voice recognition using grammar model

A method of updating speech recognition data including a language model used for speech recognition, the method including obtaining language data including at least one word; detecting a word that does not exist in the language model from among the at least one word; obtaining at least one phoneme sequence regarding the detected word; obtaining components constituting the at least one phoneme sequence by dividing the at least one phoneme sequence into predetermined unit components; determining information regarding probabilities that the respective components constituting each of the at least one phoneme sequence appear during speech recognition; and updating the language model based on the determined probability information.
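
The update procedure in this abstract can be illustrated as follows. The sketch assumes fixed-size phoneme chunks as the "predetermined unit components" and a simple unigram count model as the language model; the pronunciation table stands in for a grapheme-to-phoneme step. All names and values are hypothetical.

```python
from collections import Counter

# Hypothetical pronunciation lookup for newly seen words.
PRONUNCIATIONS = {"kakao": ["k", "a", "k", "a", "o"]}

def find_oov(words, vocabulary):
    """Detect words that do not exist in the language model's vocabulary."""
    return [w for w in words if w not in vocabulary]

def split_into_components(phonemes, size=2):
    """Divide a phoneme sequence into fixed-size unit components (assumed unit)."""
    return ["".join(phonemes[i:i + size]) for i in range(0, len(phonemes), size)]

def update_language_model(component_counts, components):
    """Add the new components and return their relative-frequency probabilities."""
    component_counts.update(components)
    total = sum(component_counts.values())
    return {c: n / total for c, n in component_counts.items()}

vocabulary = {"open", "the"}
component_counts = Counter({"ka": 1, "lo": 2})  # assumed existing model state

oov = find_oov(["open", "kakao"], vocabulary)
components = split_into_components(PRONUNCIATIONS[oov[0]])
probabilities = update_language_model(component_counts, components)
```

Modeling sub-word components rather than whole words lets the recognizer assign a probability to a new word without rebuilding the full word-level model.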

Methods and systems for cockpit speech recognition acoustic model training with multi-level corpus data augmentation

A method is provided for initializing a device for performing acoustic speech recognition (ASR) using an ASR model, the method being performed by a computer system including at least one processor and a system memory element. The method includes obtaining a plurality of voice data articulations of predetermined phrases, by the at least one processor via a user interface. The plurality of voice data articulations includes a first quantity of audio samples of actual articulated voice data, and each of the plurality of voice data articulations includes one of the audio samples including acoustic frequency components. The method further includes performing a plurality of augmentations to the plurality of voice data articulations of predetermined phrases, to generate a corpus audio data set that includes the first quantity of audio samples and a second quantity of audio samples including augmented versions of the first quantity of audio samples.
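
The corpus construction described here, original samples plus augmented versions of them, can be sketched as below. The two augmentations shown (additive noise and a speed change by nearest-neighbor resampling) are assumptions chosen for illustration; the abstract does not say which augmentations are used, and a synthetic tone stands in for recorded speech.

```python
import math
import random

def add_noise(samples, noise_level, rng):
    """Add uniform random noise to every sample."""
    return [s + rng.uniform(-noise_level, noise_level) for s in samples]

def change_speed(samples, factor):
    """Resample by nearest-neighbor lookup; factor > 1 shortens the audio."""
    n_out = int(len(samples) / factor)
    last = len(samples) - 1
    return [samples[min(int(i * factor), last)] for i in range(n_out)]

def build_corpus(recordings, seed=0):
    """Return the original recordings followed by their augmented versions."""
    rng = random.Random(seed)
    augmented = []
    for samples in recordings:
        augmented.append(add_noise(samples, 0.05, rng))
        augmented.append(change_speed(samples, 1.1))
    return recordings + augmented

# Stand-in for one recorded phrase: 0.1 s of a 440 Hz tone at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
corpus = build_corpus([tone])
```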

Method and device for performing voice recognition using grammar model

A method of updating speech recognition data including a language model used for speech recognition, the method including obtaining language data including at least one word; detecting a word that does not exist in the language model from among the at least one word; obtaining at least one phoneme sequence regarding the detected word; obtaining components constituting the at least one phoneme sequence by dividing the at least one phoneme sequence into predetermined unit components; determining information regarding probabilities that the respective components constituting each of the at least one phoneme sequence appear during speech recognition; and updating the language model based on the determined probability information.

Syllable based automatic speech recognition
10916235 · 2021-02-09

Systems, methods, and computer programs are described which utilize the structure of syllables as an organizing element of automated speech recognition processing to overcome variations in pronunciation, to efficiently resolve confusable aspects, to exploit context, and to map the speech to orthography.

INFORMATION PROCESSING APPARATUS, KEYWORD DETECTING APPARATUS, AND INFORMATION PROCESSING METHOD
20210065684 · 2021-03-04

According to one embodiment, an information processing apparatus includes an acquisition unit, a training unit, an extraction unit, and an adaptation processing unit. The acquisition unit acquires first training data including a combination of a voice feature quantity and a correct phoneme label of the voice feature quantity. The training unit trains an acoustic model using the first training data so that the model outputs the correct phoneme label in response to input of the voice feature quantity. The extraction unit extracts, from the first training data, second training data including voice feature quantities of at least one of a keyword, a sub-word, a syllable, or a phoneme included in the keyword. The adaptation processing unit adapts the trained acoustic model to a keyword detection model using the second training data.