G10L2015/0633

FINE-TUNING LANGUAGE MODELS FOR SUPERVISED LEARNING TASKS VIA DATASET PREPROCESSING

This application provides systems and methods for training a language model to perform one or more specific natural language processing tasks. The embodiments described herein fine-tune language models for downstream tasks solely by pre-processing the training data set. Rather than fine-tuning via architecture changes (e.g., addition of classification layers on top of a language model), the embodiments described herein fine-tune language model(s) via dataset pre-processing alone. This is much simpler for the practitioner. Furthermore, it allows iterative additions of functionality to the language model without a complete restructure of the architecture. This is possible because of the general nature of the language-modelling task, which essentially consists of predicting what comes next in a sequence given some context. If training data can be framed in this manner, a language model can be used to solve that task directly without architecture modifications.

Systems and methods for machine learning-based multi-intent segmentation and classification

Systems and methods for synthesizing training data for multi-intent utterance segmentation include identifying a first corpus of utterances comprising a plurality of distinct single-intent in-domain utterances; identifying a second corpus of utterances comprising a plurality of distinct single-intent out-of-domain utterances; identifying a third corpus comprising a plurality of distinct conjunction terms; forming a multi-intent training corpus comprising synthetic multi-intent utterances, wherein forming each distinct multi-intent utterance includes: selecting a first distinct in-domain utterance from the first corpus of utterances; probabilistically selecting one of a first out-of-domain utterance from the second corpus and a second in-domain utterance from the first corpus; probabilistically selecting or not selecting a distinct conjunction term from the third corpus; and forming a synthetic multi-intent utterance including appending the first in-domain utterance with one of the first out-of-domain utterance from the second corpus of utterances and the second in-domain utterance from the first corpus of utterances.

SYSTEM AND METHOD FOR CONTROLLING AN APPLICATION USING NATURAL LANGUAGE COMMUNICATION

A system and method are disclosed for setting up a communication link between a device or application and a system with a controller. The controller can collect and send information to the application. A user interfaces with the controller to access the functionality of the application through providing commands to the controller. The system allows the user to interface with multiple applications.

VOCAL RECOGNITION USING GENERALLY AVAILABLE SPEECH-TO-TEXT SYSTEMS AND USER-DEFINED VOCAL TRAINING
20200335100 · 2020-10-22 ·

Techniques for augmenting the output of generally available speech-to-text systems using local profiles are presented. An example method includes receiving an audio recording of a natural language command. The received audio recording of the natural language command is transmitted to a speech-to-text system, and a text string generated from the audio recording is received from the speech-to-text system. The text string is corrected based on a local profile mapping incorrectly transcribed words from the speech-to-text system to corrected words. A function in a software application is invoked based on the corrected text string.

METHODS AND SYSTEMS FOR COCKPIT SPEECH RECOGNITION ACOUSTIC MODEL TRAINING WITH MULTI-LEVEL CORPUS DATA AUGMENTATION
20200335084 · 2020-10-22 · ·

A method for initializing a device for performing acoustic speech recognition (ASR) using an ASR model, by a computer system including at least one processor and a system memory element. The method includes obtaining a plurality of voice data articulations of predetermined phrases, by the at least one processor via a user interface. The plurality of voice data articulations includes a first quantity of audio samples of actual articulated voice data, and each of the plurality of voice data articulations includes one of the audio samples including acoustic frequency components. The method further includes performing a plurality of augmentations to the plurality of voice data articulations of predetermined phrases, to generate a corpus audio data set that includes the first quantity of audio samples and a second quantity of audio samples including augmented versions of the first quantity of audio samples.

SPEECH RECOGNITION METHOD AND APPARATUS
20200320977 · 2020-10-08 ·

A speech recognition method comprises: generating, based on a preset speech knowledge source, a search space comprising preset client information and for decoding a speech signal; extracting a characteristic vector sequence of a to-be-recognized speech signal; calculating a probability at which the characteristic vector corresponds to each basic unit of the search space; and executing a decoding operation in the search space by using the probability as an input to obtain a word sequence corresponding to the characteristic vector sequence.

METHOD, DEVICE AND STORAGE MEDIUM FOR SPEECH RECOGNITION
20200294488 · 2020-09-17 ·

Disclosed are a method, device and readable storage medium for speech recognition. The method includes: determining speech features of the speech data by feature extraction on the speech data; determining syllable data corresponding to each of the speech features based on a plurality of feature extraction layers and a softmax function layer included in an acoustic model, where the acoustic model is configured to convert the speech feature into the syllable data; determining text data corresponding to the speech data based on a language model, a pronouncing dictionary and the syllable data, where the pronouncing dictionary is configured to convert the syllable data into the text data, and the language model is configured to evaluate the text data; and outputting the text data.

Artificial intelligence based method and apparatus for classifying voice-recognized text

Embodiments of the present disclosure disclose an artificial intelligence based method and apparatus for classifying a voice-recognized text. A specific embodiment of the method includes: acquiring a current interactive text of a voice query from a user; analyzing the current interactive text using a lexical analyzer to obtain a current lexical structure; determining whether the current lexical structure matches a template of a category in a classifier; and classifying, if the current lexical structure matches the template of the category in the classifier, the current interactive text corresponding to the current lexical structure into the category belonging to the matched template. The embodiment can fast classify texts, effectively reduce the magnitude of manually annotated texts, and improve the annotation efficiency in intelligent voice interaction services.

SYSTEMS AND METHODS FOR MACHINE LEARNING BASED MULTI INTENT SEGMENTATION AND CLASSIFICATION

Systems and methods for synthesizing training data for multi-intent utterance segmentation include identifying a first corpus of utterances comprising a plurality of distinct single-intent in-domain utterances; identifying a second corpus of utterances comprising a plurality of distinct single-intent out-of-domain utterances; identifying a third corpus comprising a plurality of distinct conjunction terms; forming a multi-intent training corpus comprising synthetic multi-intent utterances, wherein forming each distinct multi-intent utterance includes: selecting a first distinct in-domain utterance from the first corpus of utterances; probabilistically selecting one of a first out-of-domain utterance from the second corpus and a second in-domain utterance from the first corpus; probabilistically selecting or not selecting a distinct conjunction term from the third corpus; and forming a synthetic multi-intent utterance including appending the first in-domain utterance with one of the first out-of-domain utterance from the second corpus of utterances and the second in-domain utterance from the first corpus of utterances.

SYSTEMS AND METHODS FOR MACHINE LEARNING-BASED MULTI-INTENT SEGMENTATION AND CLASSIFICATION

Systems and methods for synthesizing training data for multi-intent utterance segmentation include identifying a first corpus of utterances comprising a plurality of distinct single-intent in-domain utterances; identifying a second corpus of utterances comprising a plurality of distinct single-intent out-of-domain utterances; identifying a third corpus comprising a plurality of distinct conjunction terms; forming a multi-intent training corpus comprising synthetic multi-intent utterances, wherein forming each distinct multi-intent utterance includes: selecting a first distinct in-domain utterance from the first corpus of utterances; probabilistically selecting one of a first out-of-domain utterance from the second corpus and a second in-domain utterance from the first corpus; probabilistically selecting or not selecting a distinct conjunction term from the third corpus; and forming a synthetic multi-intent utterance including appending the first in-domain utterance with one of the first out-of-domain utterance from the second corpus of utterances and the second in-domain utterance from the first corpus of utterances.