Patent classifications
G10L2015/0636
RECOGNITION OF USER INTENTS AND ASSOCIATED ENTITIES USING A NEURAL NETWORK IN AN INTERACTION ENVIRONMENT
Systems and methods determine that an intent of a received voice input corresponds to an intent label and determine an entity for the intent label. The entity may be responsive to a formulation associated with the intent. A value for the entity may be determined and populated to provide the entity as part of a command to one or more interaction environments. The interaction environment may execute commands responsive to a user input based on the value associated with the entity.
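As an illustrative sketch only (not from the patent), the intent-to-entity-to-command flow might look like the following; the intent labels, trigger words, and entity names are all invented for the example:

```python
# Hypothetical sketch of mapping an utterance to an intent label, then
# populating the entity value that the intent's formulation expects.

def recognize(utterance: str) -> dict:
    """Map an utterance to an intent label and fill its expected entity."""
    # Illustrative formulations: each intent expects one named entity.
    formulations = {
        "set_volume": {"trigger": "volume", "entity": "level"},
        "play_music": {"trigger": "play", "entity": "track"},
    }
    for intent, spec in formulations.items():
        if spec["trigger"] in utterance.lower():
            # Populate the entity value from the rest of the utterance.
            value = utterance.lower().split(spec["trigger"], 1)[1].strip()
            return {"intent": intent, "entity": spec["entity"], "value": value}
    return {"intent": "unknown", "entity": None, "value": None}

cmd = recognize("Set volume to 7")
```

The returned dictionary stands in for the command handed to an interaction environment.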
USER-SPECIFIC ACOUSTIC MODELS
Systems and processes for providing user-specific acoustic models are provided. In accordance with one example, a method includes, at an electronic device having one or more processors, receiving a plurality of speech inputs, each of the speech inputs associated with a same user of the electronic device; providing each of the plurality of speech inputs to a user-independent acoustic model, the user-independent acoustic model providing a plurality of speech results based on the plurality of speech inputs; initiating a user-specific acoustic model on the electronic device; and adjusting the user-specific acoustic model based on the plurality of speech inputs and the plurality of speech results.
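A minimal sketch of the adaptation loop described above, under the assumption that the user-independent model's outputs supervise a running adjustment of the user-specific model; the feature vectors and interpolation weight are illustrative, not from the patent:

```python
# Hypothetical adaptation loop: a user-independent model labels each speech
# input, and the (input, result) pairs nudge a user-specific model.

def user_independent_model(features):
    # Stand-in recognizer: returns a speech result for the given features.
    return {"label": "hello", "features": features}

def adapt(user_model, speech_inputs, alpha=0.1):
    """Shift the user-specific model's mean toward the user's observed features."""
    for feats in speech_inputs:
        result = user_independent_model(feats)
        for i, f in enumerate(result["features"]):
            user_model["mean"][i] = (1 - alpha) * user_model["mean"][i] + alpha * f
    return user_model

model = adapt({"mean": [0.0, 0.0]}, [[1.0, 2.0], [1.0, 2.0]])
```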
Automated collection of machine learning training data
Methods and systems for automatically generating training data for use in machine learning are disclosed. The methods can involve the use of environmental data derived from first and second environmental sensors for a single event. The environmental data types derived from each environmental sensor are different. The event is detected based on first environmental data derived from the first environmental sensor, and a portion of second environmental data derived from the second environmental sensor is selected to generate training data for the detected event. The resulting training data can be employed to train machine learning models.
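A hypothetical sketch of the two-sensor pairing the abstract describes: an event is detected in the first sensor's stream, and a window of the second sensor's data around that time is selected as a training example. The threshold and window size are invented for illustration:

```python
def collect_training_data(sensor_a, sensor_b, threshold=0.5, window=2):
    """Detect events in sensor A; pair each with a window of sensor B samples."""
    examples = []
    for t, value in enumerate(sensor_a):
        if value > threshold:  # event detected in the first sensor's data
            lo, hi = max(0, t - window), t + window + 1
            examples.append({"event_time": t, "sensor_b_window": sensor_b[lo:hi]})
    return examples

data = collect_training_data([0.1, 0.9, 0.2], [10, 20, 30])
```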
LEARNING PERSONALIZED ENTITY PRONUNCIATIONS
Methods, systems, and apparatus, including computer programs encoded on computer storage medium, for implementing a pronunciation dictionary that stores entity name pronunciations. In one aspect, a method includes actions of receiving audio data corresponding to an utterance that includes a command and an entity name. Additional actions may include generating, by an automated speech recognizer, an initial transcription for a portion of the audio data that is associated with the entity name, receiving a corrected transcription for the portion of the utterance that is associated with the entity name, obtaining a phonetic pronunciation that is associated with the portion of the audio data that is associated with the entity name, updating a pronunciation dictionary to associate the phonetic pronunciation with the entity name, receiving a subsequent utterance that includes the entity name, and transcribing the subsequent utterance based at least in part on the updated pronunciation dictionary.
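The dictionary-update step might be sketched as follows; the phoneme strings and matching rule are illustrative stand-ins for the phonetic pronunciations the abstract refers to:

```python
# Hypothetical pronunciation dictionary: after a user corrects a transcription,
# the corrected entity name is stored against the pronunciation heard in the
# original audio, so later utterances transcribe correctly.

class PronunciationDictionary:
    def __init__(self):
        self.entries = {}

    def update(self, entity_name, phonetic):
        # Associate the corrected entity name with the observed pronunciation.
        self.entries[entity_name] = phonetic

    def transcribe(self, phonetic, fallback):
        # Prefer a learned entity name whose stored pronunciation matches.
        for name, stored in self.entries.items():
            if stored == phonetic:
                return name
        return fallback

d = PronunciationDictionary()
d.update("Szymon", "SH IH M OH N")
```

A subsequent utterance with the same pronunciation would then transcribe to the learned name rather than the recognizer's default.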
VOICE COACHING SYSTEM AND RELATED METHODS
A voice coaching system, a voice coaching device, and related methods are disclosed, in particular a method of operating a voice coaching system comprising a voice coaching device, the method comprising obtaining audio data representative of one or more voices, the audio data including first audio data of a first voice; obtaining first voice data based on the first audio data; determining whether the first voice data satisfies a first training criterion; in accordance with determining that the first voice data satisfies the first training criterion, determining a first training session; and outputting, via an interface of the voice coaching device, first training information indicative of the first training session.
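The criterion-then-session selection could be sketched as below; the pitch-variance criterion and session names are purely illustrative assumptions, not features named in the patent:

```python
def plan_training(first_voice_data, criterion_min_pitch_var=10.0):
    """Check a (hypothetical) training criterion and pick a session if it fires."""
    if first_voice_data["pitch_variance"] < criterion_min_pitch_var:
        return {"session": "intonation_drills",
                "info": "Practice varying pitch across sentences."}
    return None

session = plan_training({"pitch_variance": 4.2})
```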
UTTERANCE EVALUATION METHOD AND UTTERANCE EVALUATION DEVICE
An utterance evaluation method evaluates an utterance of a speaker based on a plurality of evaluation items. The utterance evaluation method is performed by a terminal device. The utterance evaluation method includes acquiring utterance voice data of the speaker and a subjective evaluation result provided by a listener, learning a weighting factor corresponding to each of the plurality of evaluation items based on the subjective evaluation result so as to calculate a new weighting factor, and evaluating each of the plurality of evaluation items based on the utterance voice data and the calculated new weighting factor and outputting a comprehensive evaluation result of the utterance of the speaker.
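One plausible reading of the weight-learning step, sketched with an iterative update that pulls the weighted evaluation toward the listener's subjective score; the per-item scores, learning rate, and update rule are assumptions for illustration:

```python
def learn_weights(item_scores, subjective, weights, lr=0.1, steps=200):
    """Nudge per-item weights so the weighted sum tracks the subjective score."""
    for _ in range(steps):
        pred = sum(w * s for w, s in zip(weights, item_scores))
        err = subjective - pred
        weights = [w + lr * err * s for w, s in zip(weights, item_scores)]
    return weights

def evaluate(item_scores, weights):
    # Comprehensive evaluation: weighted sum over the evaluation items.
    return sum(w * s for w, s in zip(weights, item_scores))

w = learn_weights([0.8, 0.6], subjective=0.9, weights=[0.5, 0.5])
score = evaluate([0.8, 0.6], w)
```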
Detecting and suppressing voice queries
A computing system receives requests from client devices to process voice queries that have been detected in local environments of the client devices. The system identifies that a value based on the number of requests to process voice queries received by the system during a specified time interval satisfies one or more criteria. In response, the system triggers analysis of at least some of the requests received during the specified time interval to determine a set of requests that each identify a common voice query. The system can generate an electronic fingerprint that indicates a distinctive model of the common voice query. The fingerprint can then be used to detect an illegitimate voice query identified in a request from a client device at a later time.
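A toy sketch of the detect-and-fingerprint flow: if the request count in an interval crosses a threshold, find the query common to many requests and fingerprint it for later suppression. A real system would fingerprint audio; the hash over normalized text below is an illustrative stand-in:

```python
import hashlib

def fingerprint(query: str) -> str:
    # Stand-in for an acoustic fingerprint: a hash of the normalized query.
    return hashlib.sha256(query.strip().lower().encode()).hexdigest()

def detect_suspicious(requests, rate_threshold=3):
    """If the request rate crosses the threshold, return the fingerprint of
    the common voice query (if any query repeats), else None."""
    if len(requests) < rate_threshold:
        return None
    counts = {}
    for q in requests:
        fp = fingerprint(q)
        counts[fp] = counts.get(fp, 0) + 1
    common = max(counts, key=counts.get)
    return common if counts[common] > 1 else None

fp = detect_suspicious(["ok tv buy now", "OK TV buy now", "weather today"])
```

Later requests whose fingerprint matches can then be treated as illegitimate (e.g. a query broadcast to many devices at once).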
Question answering method and apparatus
A question answering method includes obtaining target question information; determining a candidate question and answer pair based on the target question information; calculating a confidence of answer information in the candidate question and answer pair, where the confidence indicates a probability that question information in the candidate question and answer pair belongs to an answer database or an adversarial database; determining whether the confidence is less than a first preset threshold; and, when the confidence is less than the first preset threshold, outputting information indicating that the question cannot be answered.
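A minimal sketch of the confidence-gated answering step; the word-overlap confidence below is an invented stand-in for the patent's database-membership probability:

```python
def answer(target_question, qa_pairs, threshold=0.5):
    """Pick the best candidate pair, then gate the answer on its confidence."""
    best, best_conf = None, 0.0
    target_words = set(target_question.lower().split())
    for question, ans in qa_pairs:
        # Stand-in confidence: word overlap with the stored question.
        overlap = len(target_words & set(question.lower().split()))
        conf = overlap / max(len(target_words), 1)
        if conf > best_conf:
            best, best_conf = ans, conf
    if best_conf < threshold:
        return "unable to answer"
    return best

result = answer("what is the capital of France", [("capital of France", "Paris")])
```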
Automated word correction in speech recognition systems
Systems and methods for correcting recognition errors in speech recognition systems are disclosed herein. Natural conversational variations are identified to determine whether a query intends to correct a speech recognition error or whether the query is a new command. When the query intends to correct a speech recognition error, the system identifies a location of the error and performs the correction. The corrected query can be presented to the user or be acted upon as a command for the system.
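As an illustration of the correction-versus-new-command decision, the following sketch recognizes one invented conversational pattern ("no, I said ...") and patches the prior transcription; the pattern and the last-word error-localization rule are assumptions, not claims from the patent:

```python
def handle_query(previous, query):
    """Decide whether the query corrects the last transcription or is new."""
    marker = "no, i said "  # one illustrative conversational correction pattern
    if query.lower().startswith(marker):
        corrected_word = query[len(marker):].strip()
        # Simplistic error localization: assume the last word was misrecognized.
        words = previous.split()
        words[-1] = corrected_word
        return {"type": "correction", "text": " ".join(words)}
    return {"type": "new_command", "text": query}

out = handle_query("play songs by Stink", "No, I said Sting")
```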
SYSTEM TO DETECT AND REDUCE UNDERSTANDING BIAS IN INTELLIGENT VIRTUAL ASSISTANTS
Disclosed is a system and method for detecting and addressing bias in training data prior to building language models based on the training data. The system and method detect bias in training data for Intelligent Virtual Assistant (IVA) understanding and highlight any bias found; suggestions for reducing or eliminating it may be provided. This detection may be done for each model within the Natural Language Understanding (NLU) component. For example, the language model, as well as any sentiment or other metadata models used by the NLU, can introduce understanding bias. For each model deployed, training data is automatically analyzed for bias and corrections suggested.
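One simple form of bias such an analysis might flag is label imbalance in the training data. The sketch below is an illustrative check, not the patent's method; the ratio threshold is an assumption:

```python
from collections import Counter

def detect_label_bias(training_data, max_ratio=3.0):
    """Flag labels whose frequency exceeds max_ratio times the rarest label's."""
    counts = Counter(label for _, label in training_data)
    rarest = min(counts.values())
    return sorted(label for label, n in counts.items() if n / rarest > max_ratio)

biased = detect_label_bias(
    [("hi", "greet")] * 8 + [("bye", "farewell")] * 2
)
```

Flagged labels could then be surfaced to the model builder with a suggestion to rebalance before training.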