Patent classifications
G10L15/075
SYSTEMS AND METHODS FOR LEARNING FOR DOMAIN ADAPTATION
A method for training parameters of a first domain adaptation model is provided. The method includes evaluating a cycle consistency objective using a first task specific model associated with a first domain and a second task specific model associated with a second domain, and evaluating one or more first discriminator models to generate a first discriminator objective using the second task specific model. The one or more first discriminator models include a plurality of discriminators corresponding to a plurality of bands that correspond to domain variable ranges of the first and second domains, respectively. The method further includes updating, based on the cycle consistency objective and the first discriminator objective, one or more parameters of the first domain adaptation model for adapting representations from the first domain to the second domain.
METHOD AND APPARATUS FOR EVALUATING USER INTENTION UNDERSTANDING SATISFACTION, ELECTRONIC DEVICE AND STORAGE MEDIUM
A method and apparatus for generating a user intention understanding satisfaction evaluation model, a method and apparatus for evaluating user intention understanding satisfaction, an electronic device and a storage medium are provided, relating to intelligent voice recognition and knowledge graphs. The method for generating the user intention understanding satisfaction evaluation model includes: acquiring a plurality of sets of intention understanding data, at least one set of which comprises a plurality of sequences corresponding to multi-round behaviors of an intelligent device in multi-round man-machine interactions; and learning the plurality of sets of intention understanding data through a first machine learning model to obtain the user intention understanding satisfaction evaluation model. The resulting model is configured to evaluate the user intention understanding satisfaction of the intelligent device in the multi-round man-machine interactions according to the plurality of sequences corresponding to those interactions.
Personalization of conversational agents through macro recording
A computer-implemented conversational agent engages in a natural language conversation with a user, interpreting the natural language conversation by parsing and tokenizing utterances in the natural language conversation. Based on interpreting, a set of utterances in the natural language conversation to be recorded as a macro is determined. The macro is stored in a database with an associated macro identifier. Replaying of the macro executes a function specified in the set of utterances.
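The record-and-replay flow in this abstract can be illustrated with a minimal sketch. It assumes an in-memory dictionary in place of the database and a naive whitespace tokenizer in place of the agent's parser. The names `MacroStore`, `record`, `replay`, and `tokenize` are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, List


def tokenize(utterance: str) -> List[str]:
    # Naive whitespace tokenizer (assumption); a real agent would use an NLU pipeline.
    return utterance.lower().split()


class MacroStore:
    """In-memory stand-in for the macro database (hypothetical)."""

    def __init__(self) -> None:
        self._macros: Dict[str, List[str]] = {}

    def record(self, macro_id: str, utterances: List[str]) -> None:
        # Store a copy of the utterances under the macro identifier.
        self._macros[macro_id] = list(utterances)

    def replay(self, macro_id: str, execute: Callable[[str], str]) -> List[str]:
        # Re-run each recorded utterance through the supplied executor, in order.
        return [execute(u) for u in self._macros[macro_id]]


store = MacroStore()
store.record("morning", ["Turn on the lights", "Play the news"])
results = store.replay("morning", lambda u: "done: " + " ".join(tokenize(u)))
```

Here the executor is a placeholder lambda; in practice it would be whatever function the agent uses to act on an interpreted utterance.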
Streaming Action Fulfillment Based on Partial Hypotheses
A method for streaming action fulfillment receives audio data corresponding to an utterance where the utterance includes a query to perform an action that requires performance of a sequence of sub-actions in order to fulfill the action. While receiving the audio data, but before receiving an end of speech condition, the method processes the audio data to generate intermediate automated speech recognition (ASR) results, performs partial query interpretation on the intermediate ASR results to determine whether the intermediate ASR results identify an application type needed to perform the action and, when the intermediate ASR results identify a particular application type, performs a first sub-action in the sequence of sub-actions by launching a first application to execute on the user device where the first application is associated with the particular application type. The method, in response to receiving an end of speech condition, fulfills performance of the action.
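As a rough illustration of acting on partial hypotheses, the sketch below consumes a sequence of intermediate ASR strings, launches an application as soon as one of them reveals an application type, and fulfills the full action only after the last result. The keyword table `APP_KEYWORDS` and the functions `detect_app_type` and `stream_fulfill` are assumptions made for this example; the abstract does not specify how partial query interpretation is implemented.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical keyword-to-application-type table (assumption).
APP_KEYWORDS: Dict[str, str] = {
    "play": "media_player",
    "navigate": "maps",
    "text": "messaging",
}


def detect_app_type(partial: str) -> Optional[str]:
    # Partial query interpretation, reduced here to a keyword lookup.
    for word in partial.lower().split():
        if word in APP_KEYWORDS:
            return APP_KEYWORDS[word]
    return None


def stream_fulfill(partials: Iterable[str]) -> List[str]:
    """Process intermediate ASR results; the last one is treated as final."""
    log: List[str] = []
    launched: Optional[str] = None
    final = ""
    for partial in partials:
        final = partial
        if launched is None:
            app = detect_app_type(partial)
            if app is not None:
                # First sub-action: launch the application before speech ends.
                launched = app
                log.append(f"launch:{app}")
    # End-of-speech condition: the iterator is exhausted, so fulfill the action.
    log.append(f"fulfill:{final}")
    return log


events = stream_fulfill(["play", "play some", "play some jazz"])
```

The launch is logged once, on the first partial result that identifies an application type, so later partial results do not relaunch it.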
SPEAKER AWARENESS USING SPEAKER DEPENDENT SPEECH MODEL(S)
Techniques disclosed herein enable training and/or utilizing speaker dependent (SD) speech models which are personalizable to any user of a client device. Various implementations include personalizing a SD speech model for a target user by processing, using the SD speech model, a speaker embedding corresponding to the target user along with an instance of audio data. The SD speech model can be personalized for an additional target user by processing, using the SD speech model, an additional speaker embedding, corresponding to the additional target user, along with another instance of audio data. Additional or alternative implementations include training the SD speech model based on a speaker independent speech model using teacher student learning.
METHODS AND SYSTEMS FOR PREDICTING NON-DEFAULT ACTIONS AGAINST UNSTRUCTURED UTTERANCES
A method to adaptively predict non-default actions against unstructured utterances by an automated assistant operating in a computing system is provided. The method includes extracting voice features based on receiving an input utterance from at least one speaker by an automatic speech recognition (ASR) device, identifying the input utterance as an unstructured utterance based on the extracted voice features and a mapping between the input utterance and one or more default actions as drawn by the ASR device, and obtaining at least one probable action to be performed in response to the unstructured utterance through a dynamic Bayesian network (DBN). The method further includes providing the at least one probable action obtained by the DBN to the speaker in order of the posterior probability of each action.
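The final step, presenting candidate actions in order of posterior probability, can be sketched with a single Bayes update. This is a deliberate simplification and not a dynamic Bayesian network: it has no temporal structure, and the priors, likelihoods, and the function `rank_actions` are invented for illustration.

```python
from typing import Dict, List, Tuple


def rank_actions(
    prior: Dict[str, float], likelihood: Dict[str, float]
) -> List[Tuple[str, float]]:
    # Unnormalized posterior: P(action) * P(evidence | action).
    joint = {a: prior[a] * likelihood.get(a, 0.0) for a in prior}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("evidence has zero probability under every action")
    # Normalize so the posteriors sum to one, then sort from most to least probable.
    posterior = {a: p / total for a, p in joint.items()}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)


# Made-up numbers for three candidate actions.
ranked = rank_actions(
    prior={"call": 0.5, "text": 0.3, "navigate": 0.2},
    likelihood={"call": 0.1, "text": 0.6, "navigate": 0.3},
)
```

With these example numbers the ordering is text, navigate, call, because the evidence outweighs the higher prior on "call".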
MEMORY DETERIORATION DETECTION AND AMELIORATION
Memory deterioration detection and evaluation includes capturing a plurality of human utterances with a voice interface and generating, for a user, a human utterances corpus that comprises human utterances selected from the plurality of human utterances based on meanings of the human utterances as determined by natural language processing by a computer processor. Based on data generated in response to signals sensed by one or more sensing devices operatively coupled with the computer processor, contextual information corresponding to one or more human utterances of the corpus is determined. Patterns among the corpus of human utterances are recognized based on pattern recognition performed by the computer processor using one or more machine learning models. Based on the pattern recognition, a change in memory functioning of the user is identified. The identified change is classified, based on the contextual information, as to whether the change is likely due to memory impairment of the user.
METHODS AND APPARATUS FOR CORRECTING FAILURES IN AUTOMATED SPEECH RECOGNITION SYSTEMS
Systems and methods are disclosed for correcting errors in ASR transcriptions. For an incorrect transcription, different words or phrases from the transcription, and/or related words or phrases, are submitted as hint words to the ASR system, and the voice query is resubmitted to determine new transcriptions. This process is repeated with different transcription terms until a different, more accurate transcription is generated. This increases the accuracy of ASR systems.
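The retry loop described here can be sketched as follows. The ASR system is modeled as a callable that takes audio and a list of hint words, and the acceptance check and related-term lookup are likewise passed in as callables. All of these interfaces, and the name `correct_transcription`, are assumptions for illustration rather than the API of any real ASR system.

```python
from typing import Callable, Iterable, List, Optional


def correct_transcription(
    audio: bytes,
    asr: Callable[[bytes, List[str]], str],
    is_acceptable: Callable[[str], bool],
    related_terms: Callable[[str], Iterable[str]],
    max_attempts: int = 10,
) -> Optional[str]:
    transcript = asr(audio, [])
    if is_acceptable(transcript):
        return transcript
    # Candidate hints: words from the incorrect transcription plus related terms.
    candidates: List[str] = []
    for word in transcript.split():
        candidates.append(word)
        candidates.extend(related_terms(word))
    # Resubmit the same audio with one hint at a time until the result improves.
    for hint in candidates[:max_attempts]:
        retry = asr(audio, [hint])
        if retry != transcript and is_acceptable(retry):
            return retry
    return None


# Toy stand-ins (hypothetical) that mimic a homophone error.
def fake_asr(audio: bytes, hints: List[str]) -> str:
    return "play the beatles" if "beatles" in hints else "play the beetles"


fixed = correct_transcription(
    b"",
    fake_asr,
    is_acceptable=lambda t: "beatles" in t,
    related_terms=lambda w: {"beetles": ["beatles"]}.get(w, []),
)
```

The function returns `None` when no hint yields an acceptable transcription within `max_attempts`, leaving the fallback behavior to the caller.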
SPEAKER ADAPTATION FOR ATTENTION-BASED ENCODER-DECODER
Embodiments are associated with a speaker-independent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-independent attention-based encoder-decoder model associated with a first output distribution, and a speaker-dependent attention-based encoder-decoder model to classify output tokens based on input speech frames, the speaker-dependent attention-based encoder-decoder model associated with a second output distribution. The speaker-dependent attention-based encoder-decoder model is trained to classify output tokens based on input speech frames of a target speaker and simultaneously trained to maintain a similarity between the first output distribution and the second output distribution.
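One common way to train a model on a task while keeping its output distribution close to a reference model is to add a Kullback-Leibler divergence penalty to the classification loss. The sketch below shows that idea for a single output token. Using KL divergence as the similarity measure, the weight of 0.5, and the function names are assumptions; the abstract does not state which similarity measure is used.

```python
import math
from typing import Sequence

EPS = 1e-12  # Guards against log(0).


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q) over a discrete token distribution.
    return sum(
        pi * math.log((pi + EPS) / (qi + EPS)) for pi, qi in zip(p, q) if pi > 0
    )


def adaptation_loss(
    si_probs: Sequence[float],
    sd_probs: Sequence[float],
    target_index: int,
    weight: float = 0.5,
) -> float:
    # Cross-entropy of the speaker-dependent model on the target speaker's token.
    cross_entropy = -math.log(sd_probs[target_index] + EPS)
    # Penalty that keeps the speaker-dependent output near the speaker-independent one.
    return cross_entropy + weight * kl_divergence(si_probs, sd_probs)


loss = adaptation_loss(si_probs=[0.5, 0.5], sd_probs=[0.9, 0.1], target_index=0)
```

A larger `weight` keeps the adapted model closer to the speaker-independent model, which limits overfitting when little target-speaker data is available.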
Cross domain personalized vocabulary learning in intelligent assistants
A method includes determining, by an electronic device, a skill from a first natural language (NL) input. Upon successful determination of the skill, the first NL input is transmitted to a custom skill parser for determination of a skill intent. The custom skill parser is trained based on data including at least a custom training data set. Upon unsuccessful determination of the skill, the first NL input is transmitted to a generic parser for determination of a general intent of the first NL input.
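The routing decision in this abstract can be sketched as a small dispatch function. Skill detection and both parsers are passed in as plain callables; `route`, the toy detector, and the parser lambdas are hypothetical stand-ins, since the abstract does not describe how the parsers are implemented.

```python
from typing import Callable, Dict, Optional, Tuple

Parser = Callable[[str], str]


def route(
    nl_input: str,
    detect_skill: Callable[[str], Optional[str]],
    custom_parsers: Dict[str, Parser],
    generic_parser: Parser,
) -> Tuple[str, str]:
    skill = detect_skill(nl_input)
    if skill is not None and skill in custom_parsers:
        # Skill determined: the custom skill parser resolves the skill intent.
        return ("skill_intent", custom_parsers[skill](nl_input))
    # No skill determined: fall back to the generic parser for a general intent.
    return ("general_intent", generic_parser(nl_input))


# Toy components (assumptions) for demonstration only.
def detect(text: str) -> Optional[str]:
    return "pizza" if "pizza" in text.lower() else None


parsers: Dict[str, Parser] = {"pizza": lambda t: "order_pizza"}
hit = route("Order a pizza", detect, parsers, lambda t: "unknown")
miss = route("What time is it", detect, parsers, lambda t: "ask_time")
```

The returned tuple records which parser handled the input, so a caller can tell a skill intent from a general intent.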