G10L2015/0633

System and method for creating data to train a conversational bot

A system and method for creating input data to be used to train a conversational bot may include: receiving a set of conversations, each conversation including sentences; classifying each sentence into one of a number of dialog acts; for each set of sentences classified into a given dialog act, clustering those sentences into clusters based on their content (e.g. text), each cluster having a cluster name or label; and generating a language model based on the cluster labels. Slots may be identified in the sentences based in part on the dialog act classifications. A bot may be trained using data such as the slots, the language model, and the clusters.
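The pipeline above (classify by dialog act, cluster within each act, label clusters, build a model over labels) can be illustrated with a toy sketch. It assumes the dialog-act labels are already provided by an upstream classifier, swaps in greedy Jaccard-overlap clustering as the clustering step, names each cluster after its most frequent token, and uses label-bigram counts as a stand-in "language model"; none of these choices come from the abstract, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def jaccard(a, b):
    """Overlap between two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cluster_by_dialog_act(labeled, threshold=0.5):
    """Greedily cluster sentences separately within each dialog act.

    labeled: list of (sentence, dialog_act) pairs from an upstream classifier.
    Returns {dialog_act: [[sentence, ...], ...]}.
    """
    by_act = defaultdict(list)
    for sentence, act in labeled:
        by_act[act].append(sentence)
    clusters = {}
    for act, sentences in by_act.items():
        groups = []  # (token set of the cluster's first sentence, members)
        for s in sentences:
            tokens = set(s.lower().split())
            for rep, members in groups:
                if jaccard(tokens, rep) >= threshold:
                    members.append(s)
                    break
            else:
                groups.append((tokens, [s]))
        clusters[act] = [members for _, members in groups]
    return clusters

def cluster_label(act, members):
    """Name a cluster after its dialog act and its most frequent token."""
    counts = Counter(w for s in members for w in s.lower().split())
    return f"{act}:{counts.most_common(1)[0][0]}"

def label_bigram_model(conversations, clusters):
    """Count cluster-label bigrams per conversation: a toy LM over cluster labels."""
    label_of = {s: cluster_label(act, members)
                for act, groups in clusters.items()
                for members in groups for s in members}
    bigrams = Counter()
    for conv in conversations:
        labels = ["<s>"] + [label_of[s] for s in conv] + ["</s>"]
        bigrams.update(zip(labels, labels[1:]))
    return bigrams

conversations = [["hi there", "book a flight to boston"],
                 ["hi", "book a flight to paris"]]
labeled = [("hi there", "greet"), ("book a flight to boston", "request"),
           ("hi", "greet"), ("book a flight to paris", "request")]
clusters = cluster_by_dialog_act(labeled)
model = label_bigram_model(conversations, clusters)
```

Here both greetings fall into one "greet:hi" cluster and both flight requests into one "request:book" cluster, so the model records that a greeting is followed by a booking request in both conversations.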

NOISE DATA AUGMENTATION FOR NATURAL LANGUAGE PROCESSING

Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof, independently of the original text within the utterances of the training set, and incorporating the noise text into the utterances, relative to their original text, at a predefined augmentation ratio to generate augmented utterances.
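A minimal sketch of the augmentation step, assuming the "augmentation ratio" means noise words inserted per original word (rounded, at least one) and that noise is drawn from a fixed word list unrelated to the utterances. The function name, word list, and intent labels are hypothetical; the classifier training itself is not shown.

```python
import random

def augment_with_noise(utterances, noise_words, ratio=0.2, seed=0):
    """Insert randomly chosen noise words into each utterance.

    ratio: assumed to be noise words added per original word (at least one).
    Original words keep their relative order; noise is chosen without
    looking at the utterance text.
    """
    rng = random.Random(seed)
    augmented = []
    for text in utterances:
        words = text.split()
        n_noise = max(1, round(len(words) * ratio))
        for _ in range(n_noise):
            pos = rng.randint(0, len(words))  # inclusive bounds: may insert at the end
            words.insert(pos, rng.choice(noise_words))
        augmented.append(" ".join(words))
    return augmented

train = [("what is my balance", "check_balance"),
         ("transfer money to savings", "transfer")]
noise = ["banana", "oxygen", "violin", "granite"]  # unrelated to the banking domain
aug_texts = augment_with_noise([u for u, _ in train], noise, ratio=0.25)
# Augmented utterances keep the intent label of the utterance they came from.
augmented_set = train + list(zip(aug_texts, [label for _, label in train]))
```

Because the labels are unchanged, the classifier is pushed to rely on the original words rather than on incidental tokens.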

SYSTEM AND METHOD FOR SPEECH RECOGNITION USING MACHINE TRANSLITERATION AND TRANSFER LEARNING
20230169282 · 2023-06-01

A method for converting speech in one of a plurality of input languages into text using machine transliteration and transfer learning is disclosed. The method includes a training stage, which includes: receiving a training set of a plurality of audio files and input text corresponding to the audio input in any input language using the speech recognition engine; transliterating the training set to transform the input text into transliterated text that includes characters of a base language; and training an acoustic model with the plurality of audio files and the corresponding transliterated text using transfer learning. The method further includes an inference stage, which includes decoding the output of the trained acoustic model to generate text that includes characters of the base language, and transliterating the generated text, using reverse transliteration, into output text that includes characters of the input language.
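The transliterate/reverse-transliterate round trip around the acoustic model can be sketched with a toy character table. This assumes a hypothetical one-to-one mapping from a few Greek letters (input language) to Latin letters (base language); real transliteration schemes are larger and often context-dependent, and the transfer-learning step is only marked by a comment.

```python
# Hypothetical toy table: a few Greek letters (input language) mapped
# one-to-one onto Latin letters (base language). Keeping it one-to-one is
# what makes exact reverse transliteration possible here.
TO_BASE = {"α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "κ": "k",
           "λ": "l", "μ": "m", "ο": "o", "τ": "t"}
FROM_BASE = {latin: greek for greek, latin in TO_BASE.items()}

def transliterate(text, table):
    """Map character by character; unmapped characters pass through unchanged."""
    return "".join(table.get(ch, ch) for ch in text)

# Training stage: pair each (hypothetical) audio file with base-language text.
training_set = [("clip_001.wav", "μετα"), ("clip_002.wav", "δεκα")]
base_pairs = [(audio, transliterate(text, TO_BASE)) for audio, text in training_set]
# ... fine-tune a pretrained acoustic model on base_pairs (not shown) ...

# Inference stage: reverse-transliterate the decoder's base-language output.
decoded = "deka"  # stand-in for real decoder output
output_text = transliterate(decoded, FROM_BASE)
```

Training on base-language characters lets one acoustic model, pretrained on the base language, serve several input languages; only the tables change per language.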

System and method for speech-enabled access to media content by a ranked normalized weighted graph using speech recognition

Disclosed herein are systems, methods, and computer-readable storage media for generating a speech recognition model for a media content retrieval system. The method causes a computing device to retrieve information describing media available in a media content retrieval system, construct a graph that models how the media are interconnected based on the retrieved information, rank the information describing the media based on the graph, and generate a speech recognition model based on the ranked information. The information can be a list of actors, directors, composers, titles, and/or locations. The graph that models how the media are interconnected can further model pieces of common information between two or more media. The method can further cause the computing device to weight the graph based on the retrieved information, wherein the weighted graph is further normalized to yield a normalized weighted graph to help with speech query searching of media content using speech recognition. The graph can further model relative popularity information in the list. The method can rank information based on a PageRank algorithm.
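The graph construction, normalization, and PageRank ranking described above can be sketched as follows. This assumes nodes are titles and names, edges link items that share a title, edge weights count shared titles, and normalization divides each node's outgoing weights by their sum; the catalog data and function names are hypothetical, and turning the ranks into a speech recognition model (e.g. weighting grammar entries) is not shown.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(media):
    """media: {title: [related names]}; weight = number of titles two items share."""
    weights = defaultdict(lambda: defaultdict(float))
    for title, names in media.items():
        for a, b in combinations([title] + names, 2):
            weights[a][b] += 1.0
            weights[b][a] += 1.0
    return weights

def normalize(weights):
    """Row-normalize so each node's outgoing edge weights sum to 1."""
    out = {}
    for a, nbrs in weights.items():
        total = sum(nbrs.values())
        out[a] = {b: w / total for b, w in nbrs.items()}
    return out

def pagerank(graph, damping=0.85, iters=50):
    """Power iteration; edges are symmetric, so no node is dangling."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for a in nodes:
            for b, w in graph[a].items():
                new[b] += damping * rank[a] * w
        rank = new
    return rank

media = {"Movie A": ["Actor X", "Director Y"],
         "Movie B": ["Actor X", "Actor Z"],
         "Movie C": ["Actor X"]}
rank = pagerank(normalize(build_graph(media)))
ranked_names = sorted(rank, key=rank.get, reverse=True)
```

"Actor X" appears in every title and therefore ranks first, which is the kind of popularity signal the ranked list is meant to carry into the recognition model.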

System and method for building diverse language models

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls, such as via a crawler operating on a computing device, a set of documents in a network of interconnected devices according to a visitation policy. The visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles, by crawling documents whose vocabulary is considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model.
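One crawling cycle of a perplexity-driven visitation policy might look like the sketch below. It assumes an add-one-smoothed unigram model as the "current language model" and a simple rule of visiting the k highest-perplexity frontier documents; the class, function, and example texts are hypothetical, and real fetching of documents is not shown.

```python
import math
from collections import Counter

class UnigramLM:
    """Add-one-smoothed unigram model; all unseen words share one UNK slot."""
    def __init__(self, texts):
        self.counts = Counter(w for t in texts for w in t.lower().split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # +1 for the shared UNK slot

    def perplexity(self, text):
        """Per-word perplexity; assumes non-empty text."""
        words = text.lower().split()
        log_prob = sum(math.log((self.counts[w] + 1) / (self.total + self.vocab))
                       for w in words)
        return math.exp(-log_prob / len(words))

def crawl_cycle(corpus, frontier, k=1):
    """Visit the k frontier documents the current model finds most surprising."""
    lm = UnigramLM(corpus)
    chosen = sorted(frontier, key=lm.perplexity, reverse=True)[:k]
    return chosen, corpus + chosen  # enlarged corpus seeds the next cycle's model

corpus = ["the weather is sunny today", "the weather is rainy today"]
frontier = ["the weather is sunny", "quantum chromodynamics lattice gauge"]
chosen, next_corpus = crawl_cycle(corpus, frontier)
```

The off-topic document is entirely out of vocabulary, so it has the highest perplexity and is visited first; rebuilding the model from `next_corpus` is how one cycle's model guides the next.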

Name recognition system
09721563 · 2017-08-01

A speech recognition system uses, in one embodiment, an extended phonetic dictionary that is obtained by processing words in a user's set of databases, such as a user's contacts database, with a set of pronunciation guessers. The speech recognition system can use a conventional phonetic dictionary and the extended phonetic dictionary to recognize speech inputs that are user requests to use the contacts database, for example, to make a phone call, etc. The extended phonetic dictionary can be updated in response to changes in the contacts database, and the set of pronunciation guessers can include pronunciation guessers for a plurality of locales, each locale having its own pronunciation guesser.
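The extended phonetic dictionary can be pictured as a map from each contact name to one guessed pronunciation per locale, rebuilt incrementally when contacts change. The two guessers below are deliberately crude, hypothetical rules (real pronunciation guessers are trained grapheme-to-phoneme models), and the lookup simply unions conventional and extended entries.

```python
def guess_en_us(name):
    """Hypothetical crude guesser: English-style 'j'."""
    return name.lower().replace("j", "dʒ")

def guess_es_es(name):
    """Hypothetical crude guesser: Spanish-style 'j' and 'll'."""
    return name.lower().replace("j", "x").replace("ll", "j")

GUESSERS = {"en_US": guess_en_us, "es_ES": guess_es_es}

def build_extended_dictionary(contacts, guessers):
    """One guessed pronunciation per locale for every contact name."""
    return {name: {loc: g(name) for loc, g in guessers.items()} for name in contacts}

def update_extended_dictionary(ext, added, removed, guessers):
    """Apply contact-database changes without rebuilding everything."""
    for name in removed:
        ext.pop(name, None)
    ext.update(build_extended_dictionary(added, guessers))

def pronunciations(word, conventional, ext):
    """Union of conventional-dictionary and extended-dictionary entries."""
    prons = set(conventional.get(word, []))
    prons.update(ext.get(word, {}).values())
    return prons

conventional = {"call": ["kɔl"]}
ext = build_extended_dictionary(["Jorge", "Guillermo"], GUESSERS)
jorge_prons = pronunciations("Jorge", conventional, ext)
update_extended_dictionary(ext, added=["Ana"], removed=["Jorge"], guessers=GUESSERS)
```

Keeping a guess per locale lets a request like "call Jorge" match whether the user says the name with an English or a Spanish pronunciation.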

Language modeling based on spoken and unspeakable corpuses

A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
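The filtering and data-selection steps can be sketched with bag-of-words count vectors and cosine similarity standing in for the feature vectors, and a nearest-centroid rule standing in for the trained classifier. These substitutions, the threshold value, and the example texts are assumptions for illustration only.

```python
import math
from collections import Counter

def features(text):
    """Bag-of-words count vector (stand-in for the feature vectors)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def unspeakable_corpus(spoken, typed, threshold=0.5):
    """Drop typed items within the similarity threshold of any spoken item."""
    spoken_vecs = [features(s) for s in spoken]
    return [t for t in typed
            if all(cosine(features(t), s) < threshold for s in spoken_vecs)]

def centroid(vecs):
    total = Counter()
    for v in vecs:
        total.update(v)
    return Counter({k: c / len(vecs) for k, c in total.items()})

def train_selector(spoken, unspeakable):
    """Nearest-centroid rule: True if a text looks more spoken than unspeakable."""
    sp = centroid([features(s) for s in spoken])
    un = centroid([features(u) for u in unspeakable])
    return lambda text: cosine(features(text), sp) >= cosine(features(text), un)

spoken = ["call mom on her cell", "what is the weather like today"]
typed = ["what is the weather like", "http www example com index html", "lol brb ttyl"]
unspeakable = unspeakable_corpus(spoken, typed)
select = train_selector(spoken, unspeakable)
```

The typed question close to a transcribed one is filtered out, leaving URL-like and chat-abbreviation text as the unspeakable corpus; the selector then keeps speech-like candidates for language-model training.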

INFORMATION PROCESSING METHOD, NON-TRANSITORY RECORDING MEDIUM, INFORMATION PROCESSING APPARATUS, AND INFORMATION PROCESSING SYSTEM
20230260505 · 2023-08-17

An information processing method includes obtaining speech data based on a distance between a sound collection device and a speaker, obtaining text data input in a service for exchanging messages, and outputting first learning data that is based on the speech data and second learning data that includes the text data.

DIFFERENCE EXTRACTION DEVICE, METHOD AND PROGRAM

According to one embodiment, a difference extraction device includes processing circuitry. The processing circuitry acquires a text in which an input notation string is described. The processing circuitry converts the input notation string into a pronunciation string. The processing circuitry executes a pronunciation string conversion process in which the pronunciation string is converted into an output notation string. The processing circuitry extracts a difference by comparing the input notation string and the output notation string with each other.
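The notation → pronunciation → notation round trip and the comparison step can be sketched with word-level lexicons and Python's `difflib.SequenceMatcher`. The lexicons are hypothetical: homophones ("their"/"there") share one pronunciation, and the reverse converter always picks a single spelling, so that is where differences surface.

```python
import difflib

# Hypothetical lexicons: spelling -> pronunciation, and pronunciation -> the
# one spelling the reverse converter chooses. Homophones are where they diverge.
TO_PRON = {"their": "DH EH R", "there": "DH EH R", "cat": "K AE T",
           "is": "IH Z", "here": "HH IY R"}
FROM_PRON = {"DH EH R": "there", "K AE T": "cat", "IH Z": "is", "HH IY R": "here"}

def round_trip(words):
    pron = [TO_PRON[w] for w in words]   # input notation -> pronunciation string
    return [FROM_PRON[p] for p in pron]  # pronunciation -> output notation

def extract_differences(text):
    """Return (input span, output span) pairs where the two notations differ."""
    src = text.split()
    out = round_trip(src)
    matcher = difflib.SequenceMatcher(a=src, b=out)
    return [(src[i1:i2], out[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

diffs = extract_differences("their cat is here")
```

Here `diffs` pinpoints "their" becoming "there", the kind of notation ambiguity a pronunciation-based converter cannot resolve on its own.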