Patent classifications
G10L2015/0638
PROVIDING PRE-COMPUTED HOTWORD MODELS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, for each of multiple words or sub-words, audio data corresponding to multiple users speaking the word or sub-word; training, for each of the multiple words or sub-words, a pre-computed hotword model for the word or sub-word based on the audio data for the word or sub-word; receiving a candidate hotword from a computing device; identifying one or more pre-computed hotword models that correspond to the candidate hotword; and providing the identified, pre-computed hotword models to the computing device.
ACOUSTIC MODEL TRAINING USING CORRECTED TERMS
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving first audio data corresponding to an utterance; obtaining a first transcription of the first audio data; receiving data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more of replacement terms; determining that one or more of the replacement terms are classified as a correction of one or more of the selected terms; in response to determining that the one or more of the replacement terms are classified as a correction of the one or more of the selected terms, obtaining a first portion of the first audio data that corresponds to one or more terms of the first transcription; and using the first portion of the first audio data that is associated with the one or more terms of the first transcription to train an acoustic model for recognizing the one or more of the replacement terms.
Determining hotword suitability
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining hotword suitability. In one aspect, a method includes receiving speech data that encodes a candidate hotword spoken by a user, evaluating the speech data or a transcription of the candidate hotword, using one or more predetermined criteria, generating a hotword suitability score for the candidate hotword based on evaluating the speech data or a transcription of the candidate hotword, using one or more predetermined criteria, and providing a representation of the hotword suitability score for display to the user.
Method and system for generating dynamic text responses for display after a search
A system and method for operating the same includes a language processing module generating a search request text signal and determining identified data from the search request text signal. A search module generates search results in response to the search request text signal. A dialog manager classifies the search request text signal into a response classification associated with a plurality of templates, selects a first template from the plurality of templates in response to the response classification and corrects search results in response to the identified data and the template to form a corrected response signal. A device receives and displays the corrected response signal.
SYSTEM AND METHOD FOR IMPROVING AIR TRAFFIC COMMUNICATION (ATC) TRANSCRIPTION ACCURACY BY INPUT OF PILOT RUN-TIME EDITS
Systems and methods are provided for training of an Automatic Speech Recognition (ASR) model during runtime of a transcription system, the system includes a background processor configured to operate with the transcription system to display a speech-to-text sample of an audio segment of a cockpit communication with an identifier which is converted using an ASR model wherein the background processor receives a response by a user during runtime of the transcription system and display of the speech-to-text sample and causes a change to the identifier to either a positive or negative attribute upon a determination of the correctness of a conversion process of the speech-to-text sample using the ASR model by review of a display of the content of the speech-to-text sample; and to train the ASR model based on information associated with the content of the speech-to-text sample in accordance with the response by the user.
Building a text-to-speech system from a small amount of speech data
A method of building a text-to-speech (TTS) system from a small amount of speech data includes receiving a first plurality of recorded speech samples from an assortment of speakers and a second plurality of recorded speech samples from a target speaker where the assortment of speakers does not include the target speaker. The method further includes training a TTS model using the first plurality of recorded speech samples from the assortment of speakers. Here, the trained TTS model is configured to output synthetic speech as an audible representation of a text input. The method also includes re-training the trained TTS model using the second plurality of recorded speech samples from the target speaker combined with the first plurality of recorded speech samples from the assortment of speakers. Here, the re-trained TTS model is configured to output synthetic speech resembling speaking characteristics of the target speaker.
Task-oriented dialog system and method through feedback
An automatic agent may be improved through feedback. A user input may be received through a user interface. A plurality of current utterance variables may be obtained by tokenizing the user input. The automatic agent may execute a machine learning policy to generate a reply to the user input based on the plurality of current utterance variables. A different reply may be obtained in response to an indication that the reply will lead to a breakdown, wherein the breakdown comprises an unhuman response from the automatic agent according to the machine learning policy. The machine learning policy may be adjusted based on the plurality of current utterance variables and the different reply.
DETERMINING HOTWORD SUITABILITY
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining hotword suitability. In one aspect, a method includes receiving speech data that encodes a candidate hotword spoken by a user, evaluating the speech data or a transcription of the candidate hotword, using one or more predetermined criteria, generating a hotword suitability score for the candidate hotword based on evaluating the speech data or a transcription of the candidate hotword, using one or more predetermined criteria, and providing a representation of the hotword suitability score for display to the user.
SYSTEMS AND METHODS FOR TRAINING VOICE QUERY MODELS
Methods for automatically evaluating ASR outputs and providing annotations, including corrections, on the transcriptions—in order to improve recognition—may be based on an analysis of sessions of user voice queries, utilizing time-ordered ASR transcriptions of user voice queries (i.e., user utterances). This utterance-based approach may involve extracting both session-level and query-level characteristics from a voice query sessions and identifying patterns of query reformulation in order to detect erroneous transcriptions and automatically determine an appropriate correction. Alternative, or in addition, ASR outputs may be evaluated based on user behavior. The outcomes may be classified as positive or negative. An ASR transcription may be labeled using the description of the outcome. The labeled transcription may be used as training data to train a model to output improved transcriptions of voice queries.
Determining hotword suitability
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining hotword suitability. In one aspect, a method includes receiving speech data that encodes a candidate hotword spoken by a user, evaluating the speech data or a transcription of the candidate hotword, using one or more predetermined criteria, generating a hotword suitability score for the candidate hotword based on evaluating the speech data or a transcription of the candidate hotword, using one or more predetermined criteria, and providing a representation of the hotword suitability score for display to the user.