Patent classifications
G10L2015/0638
TWO-STAGE TRAINING OF A SPOKEN DIALOGUE SYSTEM
Described herein are systems and methods for two-stage training of a spoken dialogue system. The first stage trains a policy network using external data to produce a semi-trained policy network. The external data includes one or more known fixed dialogues. The second stage trains the semi-trained policy network through interaction to produce a trained policy network. The interaction may be with a user simulator.
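A minimal sketch of the two stages, using a toy tabular policy and a hand-rolled user simulator; the patent describes a neural policy network, and every name, data structure, and reward here is illustrative only:

import random

# Stage 1: seed the policy from external data (known fixed dialogues of
# state/action pairs). This yields the "semi-trained" policy.
FIXED_DIALOGUES = [  # hypothetical external data
    [("greet", "ask_name"), ("got_name", "ask_goal"), ("got_goal", "confirm")],
]

def pretrain(policy, dialogues):
    for dialogue in dialogues:
        for state, action in dialogue:
            scores = policy.setdefault(state, {})
            scores[action] = scores.get(action, 0.0) + 1.0

# Stage 2: refine the semi-trained policy through interaction with a
# user simulator that returns a reward for each chosen action.
def simulate_user(state, action):
    preferred = {"greet": "ask_name", "got_name": "ask_goal", "got_goal": "confirm"}
    return 1.0 if preferred.get(state) == action else -0.1

def interact_train(policy, episodes=200, eps=0.2, lr=0.5):
    actions = ["ask_name", "ask_goal", "confirm"]
    for _ in range(episodes):
        for state in ("greet", "got_name", "got_goal"):
            scores = policy.setdefault(state, {a: 0.0 for a in actions})
            if random.random() < eps:
                action = random.choice(actions)   # explore
            else:
                action = max(scores, key=scores.get)  # exploit
            scores[action] = scores.get(action, 0.0) + lr * simulate_user(state, action)

policy = {}
pretrain(policy, FIXED_DIALOGUES)   # stage 1: semi-trained policy
interact_train(policy)              # stage 2: trained policy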
MULTI-USER CONFIGURATION
Examples of multi-user configuration are disclosed. An example method includes, at an electronic device: receiving a request including a voice input; and in response to the request: if the voice input does not match a voice profile associated with an account associated with the electronic device: causing output of first information based on the request using a first account associated with the electronic device; if a setting of the electronic device has a first state, causing update of account data of the first account based on the request; and if the setting has a second state, forgoing causing update of the account data; and if the voice input matches a voice profile associated with an account associated with the electronic device: causing output of the first information using the account associated with the matching voice profile; and causing update of account data of the account based on the request.
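A condensed sketch of the branching logic, assuming a device record that carries its accounts, per-account voice profiles, and the update setting; all field names are hypothetical:

def respond(account, request):
    print(f"[{account['name']}] responding to: {request}")

def update_account_data(account, request):
    account.setdefault("history", []).append(request)

def handle_request(device, request, speaker_profile):
    # Find an account whose voice profile matches the voice input, if any.
    matched = next((a for a in device["accounts"]
                    if a.get("voice_profile") == speaker_profile), None)
    if matched is None:
        first = device["accounts"][0]            # the first account
        respond(first, request)                  # output first information
        if device["setting"] == "first_state":   # first state: update
            update_account_data(first, request)
        # second state: forgo causing the update
    else:
        respond(matched, request)                # use the matching account
        update_account_data(matched, request)

device = {"setting": "first_state",
          "accounts": [{"name": "shared"},
                       {"name": "alice", "voice_profile": "alice"}]}
handle_request(device, "play my playlist", "alice")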
MULTI-TIER RULE AND AI PROCESSING FOR HIGH-SPEED CONVERSATION SCORING
Apparatus and methods for leveraging machine learning and artificial intelligence to assess a sentiment of an utterance expressed by a user during an interaction between an interactive response system and the user are provided. The methods may include a natural language processor processing the utterance to output an utterance intent. The methods may also include a signal extractor processing the utterance, the utterance intent and previous utterance data to output utterance signals. The methods may additionally include an utterance sentiment classifier using a hierarchy of rules to extract, from a database, a label, the extracting being based on the utterance signals. The methods may further include a sequential neural network classifier using a trained algorithm to process the label and a sequence of historical labels to output a sentiment score.
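An illustrative end-to-end pass through the described tiers, with trivial stand-ins for the natural language processor, signal extractor, rule hierarchy, and sequential classifier; the rules and weighting scheme are assumptions, not the patented algorithm:

# Tier 1: natural language processor -> utterance intent (stand-in).
def extract_intent(utterance):
    return "complaint" if "not working" in utterance else "inquiry"

# Tier 2: signal extractor -> utterance signals.
def extract_signals(utterance, intent, previous_utterance):
    return {"intent": intent,
            "negation": "not" in utterance,
            "repeat": utterance == previous_utterance}

# Tier 3: utterance sentiment classifier; the first matching rule in the
# hierarchy selects the label.
RULES = [(lambda s: s["repeat"] and s["negation"], "frustrated"),
         (lambda s: s["negation"], "negative"),
         (lambda s: True, "neutral")]

def classify(signals):
    return next(label for cond, label in RULES if cond(signals))

# Tier 4: stand-in for the sequential neural network classifier; scores
# the label sequence, weighting recent labels more heavily.
WEIGHTS = {"frustrated": -2.0, "negative": -1.0, "neutral": 0.0}

def sentiment_score(labels):
    return sum(WEIGHTS[l] * (i + 1) for i, l in enumerate(labels)) / len(labels)

labels, previous = [], ""
for utt in ["my card is not working", "my card is not working"]:
    labels.append(classify(extract_signals(utt, extract_intent(utt), previous)))
    previous = utt
print(sentiment_score(labels))   # grows more negative as frustration repeats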
Background voice recognition trainer
A method of operating a speech recognition system includes converting a spoken utterance by a user into an electrical voice signal by use of a local microphone associated with a local electronic device. The electrical voice signal is transmitted to a remote voice recognizer. The remote voice recognizer is used to transcribe the electrical voice signal and to produce a confidence score. The confidence score indicates a level of confidence that the transcription of the electrical voice signal substantially matches the words of the spoken utterance. The transcription of the electrical voice signal and the confidence score are transmitted from the remote voice recognizer to the local electronic device. The electrical voice signal, the transcription of the electrical voice signal, and the confidence score are used at the local device to train a local voice recognizer.
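A small sketch of the training loop; the confidence threshold is an assumption (the patent only says the score is used in training), and both recognizers are stand-ins:

def remote_recognize(voice_signal):
    # Stand-in for the remote voice recognizer: returns a transcription
    # and a confidence score for that transcription.
    return ("turn on the lights", 0.93)

class LocalRecognizer:
    def __init__(self):
        self.training_examples = []

    def train(self, voice_signal, transcription, confidence):
        # Keep the signal/transcription pair, tagged with its confidence.
        self.training_examples.append((voice_signal, transcription, confidence))

def background_train(local, voice_signal, threshold=0.8):
    transcription, confidence = remote_recognize(voice_signal)
    if confidence >= threshold:          # hypothetical gating on confidence
        local.train(voice_signal, transcription, confidence)
    return transcription

local = LocalRecognizer()
print(background_train(local, b"...raw microphone samples..."))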
METHOD OF ACCESSING A DIAL-UP SERVICE
A method of accessing a dial-up service is disclosed. An example method of providing access to a service includes receiving a first speech signal from a user to form a first utterance; recognizing the first utterance using speaker-independent speaker recognition; requesting the user to enter a personal identification number; and when the personal identification number is valid, receiving a second speech signal to form a second utterance and providing access to the service.
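The claimed flow reduces to a simple gate; this sketch stands in for the recognizer with a trivial function and uses a hypothetical PIN store:

def recognize_speaker_independent(speech):
    # Stand-in for speaker-independent recognition of an utterance.
    return speech.lower().strip()

def access_service(get_speech, get_pin, valid_pins=frozenset({"1234"})):
    first_utterance = recognize_speaker_independent(get_speech())
    pin = get_pin()                       # request a personal identification number
    if pin not in valid_pins:
        return "access denied"
    second_utterance = recognize_speaker_independent(get_speech())  # second utterance
    return f"access granted after {first_utterance!r} / {second_utterance!r}"

speech_inputs = iter(["Call Home", "Dial Voicemail"])
print(access_service(lambda: next(speech_inputs), lambda: "1234"))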
Compounding Corrective Actions and Learning in Mixed Mode Dictation
Techniques performed by a data processing system for processing voice content received from a user include receiving a first audio input from the user comprising a mixed-mode dictation; analyzing, using one or more machine learning (ML) models, the first audio input to obtain a first interpretation of the mixed-mode dictation; presenting the first interpretation to the user in an application on the data processing system; receiving a second audio input from the user comprising a corrective command that restates the mixed-mode dictation; analyzing the second audio input to obtain a second interpretation of the restated mixed-mode dictation; presenting the second interpretation to the user; receiving an indication from the user that the second interpretation is a correct interpretation of the mixed-mode dictation; and modifying the operating parameters of the one or more ML models to interpret subsequent instances of the mixed-mode dictation based on the second interpretation.
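A toy version of the correction loop; the lookup table below stands in for "modifying the operating parameters" of real ML models, which the patent leaves to the implementation:

class DictationModel:
    def __init__(self):
        self.corrections = {}   # learned: raw dictation -> confirmed interpretation

    def interpret(self, dictation):
        # Default interpretation, unless the user has corrected this phrase.
        return self.corrections.get(dictation, dictation.replace(" comma ", ", "))

    def learn(self, dictation, confirmed):
        # Stand-in for a parameter update: remember the confirmed reading.
        self.corrections[dictation] = confirmed

model = DictationModel()
dictation = "send one hundred dollars comma today"
print(model.interpret(dictation))             # first interpretation, shown to user
model.learn(dictation, "send $100, today")    # user confirms second interpretation
print(model.interpret(dictation))             # subsequent instances use correction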
Method and apparatus for voice interaction control of image material movement
A voice interaction method and apparatus are described. The method includes setting a correspondence between an image material movement command and an interaction keyword. The method also includes displaying an image material, recording a user voice file and parsing the user voice file so as to obtain an interaction keyword. The method also includes determining, according to the interaction keyword obtained by parsing, an image material movement command corresponding to the interaction keyword, and controlling movement of the image material based on the determined image material movement command.
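The correspondence table and movement step could look like the following sketch; the keywords, pixel offsets, and coordinates are all invented for illustration:

# Correspondence between interaction keywords and image material
# movement commands, here expressed as (dx, dy) pixel offsets.
COMMANDS = {"left": (-10, 0), "right": (10, 0), "up": (0, -10), "down": (0, 10)}

def parse_keyword(recognized_text):
    # Stand-in for parsing the recorded user voice file.
    return next((w for w in recognized_text.split() if w in COMMANDS), None)

def move_material(position, recognized_text):
    keyword = parse_keyword(recognized_text)
    if keyword is None:
        return position                   # no movement command recognized
    dx, dy = COMMANDS[keyword]
    return (position[0] + dx, position[1] + dy)

print(move_material((100, 100), "move the picture left"))   # -> (90, 100)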
Method, apparatus, and system for conflict detection and resolution for competing intent classifiers in modular conversation system
A method, apparatus, and system are provided for resolving conflicts between independent training data sets by retrieving the independent training data sets, each comprising a plurality of intents and end-user utterances for use in training one or more classifiers to recognize a corresponding intent from one or more of the end-user utterances; providing a first test end-user utterance, associated with a first intent from a first independent training data set, to the one or more classifiers to select an output intent generated by the one or more classifiers; identifying a first conflict when the first intent does not match the output intent; and automatically generating, by the system, one or more conflict resolution recommendations for display and selection by an end user to resolve the first conflict.
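A compact sketch of the conflict check: run each labeled test utterance through the classifier and flag mismatches, then turn them into human-readable recommendations (the recommendation wording and toy classifier are invented):

def detect_conflicts(classifier, test_set):
    # test_set: (utterance, expected_intent) pairs from one independent
    # training data set.
    return [(utt, expected, classifier(utt))
            for utt, expected in test_set
            if classifier(utt) != expected]

def recommendations(conflicts):
    return [f"relabel or split {utt!r}: trained as {expected!r}, "
            f"classified as {got!r}"
            for utt, expected, got in conflicts]

training_set = [("book a flight", "travel"), ("book a table", "dining")]
toy_classifier = lambda utt: "travel" if "book" in utt else "other"
for rec in recommendations(detect_conflicts(toy_classifier, training_set)):
    print(rec)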
Communication system
Systems and methods for responding to spoken language input or multi-modal input are described herein. More specifically, one or more user intents are determined or inferred from the spoken language input or multi-modal input, and a dialogue belief tracking system uses those intents to identify one or more user goals. The systems and methods disclosed herein utilize the dialogue belief tracking system to perform actions based on the determined user goals and allow a device to engage in human-like conversation with a user over multiple turns. Preventing the user from having to explicitly state each intent and desired goal, while still receiving the desired goal from the device, improves the user's ability to accomplish tasks, perform commands, and get desired products and/or services. Additionally, the improved response to spoken language inputs improves user interactions with the device.
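A toy slot-filling tracker makes the multi-turn idea concrete; real belief tracking is probabilistic, and everything below (slot names, turns, completion test) is illustrative only:

class BeliefTracker:
    def __init__(self):
        self.belief = {}                    # accumulated slot -> value beliefs

    def update(self, turn_slots):
        self.belief.update(turn_slots)      # later turns refine earlier ones

    def goal_complete(self, required_slots):
        return all(slot in self.belief for slot in required_slots)

tracker = BeliefTracker()
tracker.update({"intent": "book_table"})    # turn 1: "I need a table"
tracker.update({"party_size": 4})           # turn 2: "for four people"
tracker.update({"time": "19:00"})           # turn 3: "around seven"
if tracker.goal_complete(["intent", "party_size", "time"]):
    print("acting on inferred goal:", tracker.belief)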