Patent classification: G10L2015/0638
Providing an automated summary
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for summarizing a call. One of the methods includes generating text by processing, with natural language processing logic, audio produced during an interaction between two participants. The method includes identifying one or more topics by providing the generated text to a machine learning system, the machine learning system trained to identify topics based on text. The method also includes generating a summary of the interaction based on the one or more topics and the text.
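The claimed pipeline (transcribe, identify topics, summarize) can be sketched as follows; `transcribe`, `TopicModel`, and `summarize_call` are illustrative names, and the keyword table stands in for a trained topic classifier:

```python
# Hypothetical sketch of the pipeline: transcribe audio, identify topics
# with a (stubbed) trained model, then summarize from topics plus text.

def transcribe(audio_frames):
    # Stand-in for natural language processing logic that turns audio into text.
    return " ".join(audio_frames)

class TopicModel:
    """Stub for a machine learning system trained to identify topics from text."""
    KEYWORDS = {"billing": "billing", "refund": "refunds", "password": "account access"}

    def predict(self, text):
        return sorted({topic for kw, topic in self.KEYWORDS.items() if kw in text.lower()})

def summarize_call(audio_frames, topic_model):
    text = transcribe(audio_frames)
    topics = topic_model.predict(text)
    # The summary is generated from both the identified topics and the text itself.
    return {"topics": topics,
            "summary": f"Call about {', '.join(topics) or 'general matters'} "
                       f"({len(text.split())} words)."}

result = summarize_call(["I need a refund", "my billing looks wrong"], TopicModel())
```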
Systems and Methods for Handling Customer Conversations at a Contact Center
A contact center server receives an utterance from a customer device as part of a conversation. The contact center server identifies utterance parameters of the utterance, generates a first response to the utterance based on the utterance parameters, and outputs the utterance parameters and the first response to an agent device. The contact center server receives, from the agent device, agent-identified information corresponding to the utterance or agent-modified information corresponding to the utterance parameters. Subsequently, the contact center server generates a second response to the utterance based on the agent-identified information or the agent-modified information and outputs the second response to the agent device.
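A minimal sketch of this two-pass flow, with the parameter extraction stubbed as keyword matching; the function names and the "VIP customer" note are assumptions, not the patent's terminology:

```python
# First pass: auto-generate a response from utterance parameters.
# Second pass: regenerate it after folding in agent-supplied information.

def identify_parameters(utterance):
    # Toy "utterance parameters": an intent guessed from keywords, plus length.
    intent = "cancel_order" if "cancel" in utterance.lower() else "unknown"
    return {"intent": intent, "length": len(utterance.split())}

def generate_response(utterance, params, agent_info=None):
    if agent_info:
        return f"Using agent note '{agent_info}': handling {params['intent']}."
    return f"Auto-reply for intent {params['intent']}."

utterance = "Please cancel my order"
params = identify_parameters(utterance)
first = generate_response(utterance, params)                   # sent to agent device
second = generate_response(utterance, params, "VIP customer")  # after agent input
```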
Providing pre-computed hotword models
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, for each of multiple words or sub-words, audio data corresponding to multiple users speaking the word or sub-word; training, for each of the multiple words or sub-words, a pre-computed hotword model for the word or sub-word based on the audio data for the word or sub-word; receiving a candidate hotword from a computing device; identifying one or more pre-computed hotword models that correspond to the candidate hotword; and providing the identified, pre-computed hotword models to the computing device.
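The serving side of this claim can be sketched as a keyed store of per-word models matched against an incoming candidate hotword; the store contents and matching granularity (whole words rather than sub-words) are simplifications for illustration:

```python
# Hedged sketch: pre-computed hotword models are stored keyed by word, and a
# candidate hotword received from a device is matched word-by-word against
# that store. Real systems would also match on sub-word units.

PRECOMPUTED = {"ok": "model_ok", "computer": "model_computer", "hey": "model_hey"}

def models_for_candidate(candidate):
    """Identify the pre-computed models that correspond to a candidate hotword."""
    return [PRECOMPUTED[w] for w in candidate.lower().split() if w in PRECOMPUTED]

models = models_for_candidate("OK Computer")
```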
Acoustic model training using corrected terms
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving first audio data corresponding to an utterance; obtaining a first transcription of the first audio data; receiving data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more replacement terms; determining that one or more of the replacement terms are classified as a correction of one or more of the selected terms; in response to determining that the one or more of the replacement terms are classified as a correction of the one or more of the selected terms, obtaining a first portion of the first audio data that corresponds to one or more terms of the first transcription; and using the first portion of the first audio data that is associated with the one or more terms of the first transcription to train an acoustic model for recognizing the one or more of the replacement terms.
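The core idea, mining corrections for acoustic training data, can be sketched as below. The correction classifier and the one-segment-per-word alignment are toy placeholders; real systems use edit-distance heuristics and recognizer time alignments:

```python
# When the user's replacement is classified as a correction, the audio
# aligned to the originally selected term becomes a training example
# labeled with the replacement term.

def is_correction(selected, replacement):
    # Placeholder classifier: differing words of equal token count.
    return selected != replacement and len(selected.split()) == len(replacement.split())

def training_example(audio_segments, transcript_words, selected, replacement):
    if not is_correction(selected, replacement):
        return None
    i = transcript_words.index(selected)
    # Assumes one audio segment per transcript word (illustrative only).
    return {"audio": audio_segments[i], "label": replacement}

audio_segments = ["seg0", "seg1", "seg2"]
example = training_example(audio_segments, ["call", "jon", "now"], "jon", "john")
```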
INCREMENTAL POST-EDITING AND LEARNING IN SPEECH TRANSCRIPTION AND TRANSLATION SERVICES
Computer systems and computer-implemented methods provide for interactive and incremental post-editing of real-time speech transcription and translation. A first component is automatic identification of potentially problematic regions in the output (e.g., transcription or translation) that are either likely to be technically processed badly or risky in terms of their content or expression. A second component is intelligent, efficient interfaces that permit multiple editors to correct system output concurrently, collaboratively, and efficiently, so that corrections can be seamlessly inserted and become part of a running presentation. A third component is incremental learning and adaptation that allows the system to use the human corrective feedback to deliver instantaneous improvement of system behavior downstream. A fourth component is transfer learning to transfer short-term learning into long-term learning if the modifications warrant long-term retention.
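The first component, flagging problematic regions, could look like the sketch below; the confidence scores and threshold are invented for illustration (content-risk checks would need a separate classifier):

```python
# Minimal sketch: flag transcript regions whose recognizer confidence is low,
# so human editors can prioritize them.

def flag_regions(scored_words, threshold=0.8):
    """Return (word, index) pairs whose confidence falls below the threshold."""
    return [(w, i) for i, (w, conf) in enumerate(scored_words) if conf < threshold]

hypothesis = [("the", 0.99), ("pattent", 0.41), ("office", 0.95)]
flags = flag_regions(hypothesis)
```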
Multi-user configuration
Examples of multi-user configuration are disclosed. An example method includes, at an electronic device: receiving a request including a voice input; and in response to the request: if the voice input does not match a voice profile associated with an account associated with the electronic device: causing output of first information based on the request using a first account associated with the electronic device; if a setting of the electronic device has a first state, causing update of account data of the first account based on the request; and if the setting has a second state, forgoing causing update of the account data; and if the voice input matches a voice profile associated with an account associated with the electronic device: causing output of the first information using the account associated with the matching voice profile; and causing update of account data of the account based on the request.
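The branching above can be condensed into a small decision function; profile matching is stubbed as a dictionary lookup, and all names are assumptions:

```python
# Illustrative decision flow: matched voice profiles get their own account
# and an account-data update; unmatched voices fall back to a first account,
# whose data is updated only when the device setting is in the first state.

def handle_request(voice_input, profiles, first_account, update_setting_on):
    account = profiles.get(voice_input)  # stub for voice-profile matching
    if account is not None:
        return {"output_account": account, "updated": account}
    return {"output_account": first_account,
            "updated": first_account if update_setting_on else None}

res_match = handle_request("alice_voice", {"alice_voice": "alice"}, "default", False)
res_nomatch = handle_request("guest_voice", {"alice_voice": "alice"}, "default", True)
```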
Training an artificial intelligence of a voice response system based on non-verbal feedback
Methods, systems, and computer program products for training an artificial intelligence (AI) of a voice response system. Aspects include receiving, by the voice response system from a user, a voice command to perform a requested action and interpreting, by an AI model, the voice command. Aspects also include performing an action based on the interpretation of the voice command and receiving non-verbal feedback from the user. Aspects further include updating the AI model based on a determination that the non-verbal feedback indicates that the user is not satisfied with the action performed.
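A toy version of this feedback loop is sketched below; the "model" is just a keyword table, and the feedback signals and names are illustrative assumptions:

```python
# Perform the interpreted action, observe non-verbal feedback (e.g. the user
# sighing, repeating the command, or overriding manually), and update the
# model only when the feedback indicates dissatisfaction.

model = {"lights": "turn_on_lights"}

def interpret(command):
    return next((action for kw, action in model.items() if kw in command), "unknown")

def update_on_feedback(command, correct_action, feedback):
    dissatisfied = feedback in {"sigh", "repeated_command", "manual_override"}
    if dissatisfied:
        model[command.split()[-1]] = correct_action  # crude retraining step
    return dissatisfied

action = interpret("dim the lamp")
updated = update_on_feedback("dim the lamp", "dim_lamp", "manual_override")
```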
Display apparatus and method for registration of user command
A display apparatus includes an input unit configured to receive a user command; an output unit configured to output a registration suitability determination result for the user command; and a processor configured to generate phonetic symbols for the user command, analyze the generated phonetic symbols to determine registration suitability for the user command, and control the output unit to output the registration suitability determination result for the user command. Therefore, the display apparatus may register a user command which is resistant to misrecognition and guarantees a high recognition rate among user commands defined by a user.
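One way to picture the suitability check is to derive a phonetic key for the new command and reject commands that collide with already registered ones. The vowel-stripped key below is a crude stand-in for real grapheme-to-phoneme output:

```python
# Sketch of registration suitability: commands whose phonetic skeletons
# collide with an existing command are flagged as misrecognition risks.

def phonetic_key(command):
    # Placeholder for generated phonetic symbols: consonant skeleton only.
    return "".join(c for c in command.lower() if c.isalpha() and c not in "aeiou")

def suitability(command, registered):
    key = phonetic_key(command)
    if any(phonetic_key(r) == key for r in registered):
        return "unsuitable: likely misrecognition"
    return "suitable"

verdict = suitability("mute", ["meet"])  # "mute" and "meet" share skeleton "mt"
```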
Compounding corrective actions and learning in mixed mode dictation
Techniques performed by a data processing system for processing voice content received from a user herein include receiving a first audio input from the user comprising a mixed-mode dictation, analyzing, using one or more machine learning (ML) models, the first audio input to obtain a first interpretation of the mixed-mode dictation, presenting the first interpretation to the user in an application on the data processing system, receiving a second audio input from the user comprising a corrective command, analyzing the second audio input to obtain a second interpretation of the restatement of the mixed-mode dictation, presenting the second interpretation to the user, receiving an indication from the user that the second interpretation is a correct interpretation of the mixed-mode dictation, and modifying the operating parameters of the one or more machine learning models to interpret subsequent instances of the mixed-mode dictation based on the second interpretation.
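The corrective loop can be sketched with an override table standing in for the model's operating parameters; the example dictation ("period" as command vs. literal text) and all names are illustrative:

```python
# First interpretation treats "period" as a punctuation command; after the
# user's confirmed correction, an override persists so subsequent instances
# of the same dictation get the corrected interpretation.

overrides = {}

def interpret(dictation):
    return overrides.get(dictation, dictation.replace("period", "."))

def confirm_correction(dictation, corrected):
    # Stand-in for modifying model operating parameters.
    overrides[dictation] = corrected

first = interpret("insert period here")
confirm_correction("insert period here", "insert period here (literal)")
second = interpret("insert period here")
```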
NOISE DATA AUGMENTATION FOR NATURAL LANGUAGE PROCESSING
Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof, the noise text being irrelevant to the original text within the utterances of the training set of utterances, and incorporating the noise text within the utterances relative to the original text in the utterances of the training set of utterances at a predefined augmentation ratio to generate augmented utterances.
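A minimal sketch of noise-text augmentation at a fixed augmentation ratio; the noise word list, the ratio, and the choice to augment the first fraction of the set are illustrative assumptions:

```python
import random

# Words irrelevant to any training intent, standing in for noise text drawn
# from a word list, corpus, publication, or dictionary.
NOISE_WORDS = ["umbrella", "granite", "walrus"]

def augment(utterances, ratio=0.5, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    out = []
    n_augment = int(len(utterances) * ratio)  # predefined augmentation ratio
    for i, utt in enumerate(utterances):
        words = utt.split()
        if i < n_augment:
            # Insert one noise word at a random position relative to the
            # original text of the utterance.
            words.insert(rng.randrange(len(words) + 1), rng.choice(NOISE_WORDS))
        out.append(" ".join(words))
    return out

train = ["book a flight", "cancel my booking", "check order status", "reset password"]
augmented = augment(train)
```

Training the intent classifier on `augmented` instead of `train` is intended to make it robust to irrelevant tokens at inference time.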