G10L15/197

FLEXIBLE-FORMAT VOICE COMMAND

A voice-based system is configured to process commands in a flexible format, for example, in which a wake word does not necessarily have to occur at the beginning of an utterance. As in natural speech, the system being addressed may be named within or at the end of a spoken utterance rather than at the beginning, or depending on the context, may not be named at all.
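As a rough illustration of the flexible format described above, the sketch below accepts a wake word at any position in the utterance and, when a dialog context is already active, accepts an utterance with no wake word at all. The wake word `computer`, the function name `extract_command`, and the `context_active` flag are hypothetical; the abstract does not specify how the system is detected as the addressee.

```python
import re

WAKE_WORD = "computer"  # hypothetical wake word, not taken from the source


def extract_command(utterance, wake_word=WAKE_WORD, context_active=False):
    """Return the command text if the utterance addresses the system, else None."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    if wake_word in tokens:
        # The wake word may appear at the start, middle, or end of the utterance.
        return " ".join(t for t in tokens if t != wake_word)
    if context_active:
        # Assumption: an ongoing exchange lets the user omit the wake word.
        return " ".join(tokens)
    return None
```

For example, `extract_command("Turn on the lights, computer")` yields the command without the trailing wake word, while the same call on an utterance that never names the system returns `None` unless `context_active` is set.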

Generating contextually relevant text transcripts of voice recordings within a message thread

The present disclosure relates to systems, non-transitory computer-readable media, and methods for generating contextually relevant transcripts of voice recordings based on social networking data. For instance, the disclosed systems receive a voice recording from a user corresponding to a message thread including the user and one or more co-users. The disclosed systems analyze acoustic features of the voice recording to generate transcription-text probabilities. The disclosed systems generate term weights for terms corresponding to objects associated with the user within a social networking system by analyzing user social networking data. Using the contextually aware term weights, the disclosed systems adjust the transcription-text probabilities. Based on the adjusted transcription-text probabilities, the disclosed systems generate a transcript of the voice recording for display within the message thread.
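The adjustment step can be pictured with a minimal sketch, assuming the transcription-text probabilities are available per candidate transcript and the term weights are simple multiplicative boosts. The function name `rescore`, the multiplicative scheme, and the sample data are illustrative assumptions; the abstract does not state how the weights are combined with the probabilities.

```python
def rescore(candidates, term_weights):
    """Scale each candidate's probability by the weights of its terms, then renormalize.

    candidates: dict mapping transcript text to an acoustic probability.
    term_weights: dict mapping a term to a boost (1.0 means no change).
    """
    adjusted = {}
    for text, prob in candidates.items():
        boost = 1.0
        for word in text.lower().split():
            boost *= term_weights.get(word, 1.0)
        adjusted[text] = prob * boost
    total = sum(adjusted.values())
    if total == 0:
        return adjusted  # nothing to normalize
    return {text: score / total for text, score in adjusted.items()}


# Hypothetical data: "joes" is weighted up because it matches an object
# (for example, a place) associated with the user.
candidates = {"meet me at joes": 0.4, "meet me at jos": 0.6}
term_weights = {"joes": 3.0}
adjusted = rescore(candidates, term_weights)
best = max(adjusted, key=adjusted.get)
```

Here the acoustically less likely candidate wins after adjustment because it contains a term that is contextually relevant to the user.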

Question Answering using trained generative adversarial network based modeling of text

Mechanisms are provided for implementing a Question Answering (QA) system utilizing a trained generator of a generative adversarial network (GAN) that generates a bag-of-ngrams (BoN) output representing unlabeled data for performing a natural language processing operation. The QA system obtains a plurality of candidate answers to a natural language question, where each candidate answer comprises one or more ngrams. For each candidate answer, a confidence score is generated based on a comparison of the one or more ngrams in the candidate answer to ngrams in the BoN output of the generator neural network of the GAN. A final answer to the input natural language question is selected from the plurality of candidate answers based on the confidence scores associated with the candidate answers, and is output.
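One way to picture the scoring step is the sketch below. It assumes the generator's BoN output is available as a dictionary from n-gram to weight and scores a candidate by the average weight of its unigrams and bigrams. Both the dictionary representation and the averaging rule are assumptions made for illustration; the abstract only says the score is based on a comparison of n-grams.

```python
def ngrams(text, n_max=2):
    """Return the set of word n-grams of length 1..n_max in the text."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def confidence(candidate, bon):
    """Average BoN weight over the candidate's n-grams (0.0 for unseen n-grams)."""
    grams = ngrams(candidate)
    if not grams:
        return 0.0
    return sum(bon.get(g, 0.0) for g in grams) / len(grams)


def select_answer(candidates, bon):
    """Pick the candidate answer with the highest confidence score."""
    return max(candidates, key=lambda c: confidence(c, bon))
```

With a hypothetical BoN such as `{"paris": 0.9, "london": 0.1}`, `select_answer(["paris", "london"], bon)` would return the candidate whose n-grams carry more weight in the generator output.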

ELECTRONIC DEVICE AND SPEECH PROCESSING METHOD THEREOF
20230085539 · 2023-03-16

According to various example embodiments, an electronic device includes a microphone configured to receive an audio signal including speech of a user, a processor, and a memory configured to store instructions executable by the processor and personal information of the user, in which the processor is configured to extract a plurality of speech recognition candidates by analyzing a feature of the speech of the user, extract a keyword based on the plurality of speech recognition candidates, search for replacement data, based on the keyword and the personal information, and generate a recognition result corresponding to the speech of the user, based on the replacement data.
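The pipeline in this abstract can be sketched under several assumptions: the recognition candidates are plain strings, a "keyword" is any word position where the candidates disagree, and the personal information is a flat list of terms (for example, contact names) searched by fuzzy string matching with the standard library's `difflib.get_close_matches`. None of these choices comes from the source; they only make the four steps concrete.

```python
import difflib


def recognize(candidates, personal_terms, cutoff=0.6):
    """Build a recognition result from several candidates and the user's personal terms."""
    token_lists = [c.lower().split() for c in candidates]
    result = list(token_lists[0])  # start from the top-ranked candidate
    for i, column in enumerate(zip(*token_lists)):
        if len(set(column)) <= 1:
            continue  # all candidates agree, so this word is not treated as a keyword
        for variant in column:
            # Search the personal information for replacement data.
            match = difflib.get_close_matches(variant, personal_terms, n=1, cutoff=cutoff)
            if match:
                result[i] = match[0]
                break
    return " ".join(result)
```

Positions where no personal term is close enough keep the word from the top-ranked candidate, so the sketch degrades to ordinary recognition when the personal information is not relevant.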

Hypothesis generation and selection for inverse text normalization for search

Techniques for speech-to-text hypothesis generation and hypothesis selection are described. A text input representing at least part of a voice recording is received from a speech-to-text component. A first text alternative is generated using a finite state transducer based at least in part on the text input. A hypothesis from a hypothesis set is selected using a language model that includes probabilities for sequences of words, the hypothesis set including the text input and the first text alternative. A selected hypothesis text associated with the selected hypothesis is sent to a search engine.
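The generate-then-select flow can be sketched as follows. Note the substitutions: a small table of string rewrites stands in for the finite state transducer, and a toy bigram table with a fixed penalty for unseen pairs stands in for the language model. The rule table, the log-probabilities, and the function names are all hypothetical.

```python
# Hypothetical rewrite rules standing in for a finite state transducer.
REWRITES = {"five hundred": "500", "twenty twenty": "2020"}

# Hypothetical bigram log-probabilities standing in for a language model.
BIGRAM_LOGPROB = {
    ("500", "days"): -1.0,
    ("days", "of"): -0.5,
    ("of", "summer"): -0.5,
}
UNSEEN_LOGPROB = -5.0  # assumed penalty for a bigram missing from the table


def generate_alternatives(text):
    """Return written-form alternatives for the spoken-form text input."""
    return [text.replace(spoken, written)
            for spoken, written in REWRITES.items() if spoken in text]


def lm_score(text):
    """Sum bigram log-probabilities over the word sequence, with a start symbol."""
    tokens = ["<s>"] + text.split()
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB)
               for pair in zip(tokens, tokens[1:]))


def select_hypothesis(text):
    """Choose between the text input and its alternatives using the language model."""
    hypotheses = [text] + generate_alternatives(text)
    return max(hypotheses, key=lm_score)
```

The hypothesis set always contains the original text input, so when no rewrite applies the input is passed through unchanged to the search engine.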

Dialog device, dialog method, and dialog computer program

The dialog device according to the present invention includes a prediction unit 254 configured to predict an utterance length attribute of a user utterance in response to a machine utterance, a selection unit 256 configured to use the utterance length attribute to select, as a feature model for usage in an end determination of the user utterance, at least one of an acoustic feature model or a lexical feature model, and an estimation unit 258 configured to estimate an end point in the user utterance using the selected model. By using this dialog device, it is possible to shorten the waiting time until a response is output to a user utterance by a machine, and to realize a more natural conversation between a user and a machine.
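The three units can be pictured with a toy sketch. It assumes that yes/no questions invite short replies, that short replies are handled by the acoustic model and long replies by the lexical model, that the acoustic model detects a run of low-energy frames, and that the lexical model looks for a closing word. Every one of these rules, along with all names and thresholds, is a hypothetical stand-in; the abstract does not describe how the units work internally.

```python
YES_NO_STARTERS = {"is", "are", "do", "does", "did", "can", "will", "would", "should"}


def predict_length_attribute(machine_utterance):
    """Prediction unit (toy rule): yes/no questions are assumed to invite short replies."""
    words = machine_utterance.lower().split()
    return "short" if words and words[0] in YES_NO_STARTERS else "long"


def select_model(length_attribute):
    """Selection unit (assumed mapping): short -> acoustic, long -> lexical."""
    return "acoustic" if length_attribute == "short" else "lexical"


def acoustic_end_point(energies, threshold=0.1, min_silent_frames=3):
    """Return the first frame of the first sufficiently long run of low-energy frames."""
    run = 0
    for i, energy in enumerate(energies):
        run = run + 1 if energy < threshold else 0
        if run == min_silent_frames:
            return i - min_silent_frames + 1
    return None


def lexical_end_point(timed_words, closing_words=frozenset({"please", "thanks"})):
    """Return the end frame of the first word that typically closes a request."""
    for word, end_frame in timed_words:
        if word.lower() in closing_words:
            return end_frame
    return None


def estimate_end_point(machine_utterance, energies, timed_words):
    """Estimation unit: apply whichever model the selection unit chose."""
    model = select_model(predict_length_attribute(machine_utterance))
    if model == "acoustic":
        return model, acoustic_end_point(energies)
    return model, lexical_end_point(timed_words)
```

Choosing the model before the user starts speaking is what allows the waiting time to shrink: the device does not have to run both models, or wait for a long fixed silence, before deciding that the user has finished.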
