Patent classifications
G10L2015/0636
System to detect and reduce understanding bias in intelligent virtual assistants
Disclosed is a system and method for detecting and addressing bias in training data prior to building language models based on the training data. Accordingly, the system and method detect bias in training data for Intelligent Virtual Assistant (IVA) understanding and highlight any bias found. Suggestions for reducing or eliminating it may be provided. This detection may be done for each model within the Natural Language Understanding (NLU) component. For example, the language model, as well as any sentiment or other metadata models used by the NLU, can introduce understanding bias. For each model deployed, training data is automatically analyzed for bias and corrections are suggested.
Trial-based calibration for audio-based identification, recognition, and detection system
The disclosed technologies include methods for generating a calibration model using data that is selected to match the conditions of a particular trial that involves an automated comparison of data samples, such as a comparison-based trial performed by an audio-based recognition, identification, or detection system. The disclosed technologies also include improved methods for selecting candidate data used to build the calibration model. The disclosed technologies further include methods for evaluating the performance of the calibration model and for rejecting a trial when not enough matched candidate data is available to build the calibration model. The disclosed technologies additionally include the use of regularization and automated data generation techniques to further improve the robustness of the calibration model.
SYSTEM AND METHOD FOR DETECTING UNHANDLED APPLICATIONS IN CONTRASTIVE SIAMESE NETWORK TRAINING
A method includes determining, using at least one processing device of an electronic device, a target embedding vector for each class of a plurality of classes. The method also includes generating, using the at least one processing device, an utterance embedding vector using a pre-trained language model, where the utterance embedding vector represents an input utterance associated with an expected class. The method further includes obtaining, using the at least one processing device, a predicted class associated with the input utterance based on distances of the utterance embedding vector to spatial parameters representing the plurality of classes, where the spatial parameter of each class is based on the target embedding vector associated with that class. In addition, the method includes updating, using the at least one processing device, parameters of the language model based on a difference between the predicted class and the expected class.
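The prediction step in this abstract can be illustrated with a minimal sketch. It assumes that each class's spatial parameter is simply its target embedding vector and that distance is Euclidean; neither choice is stated in the abstract. The class names and 2-D vectors are hypothetical, and the language-model update itself is not shown.

```python
import math

def predict_class(utterance_vec, class_targets):
    """Return the class whose target embedding is nearest to the utterance.

    Assumption: Euclidean distance to the target embedding stands in for the
    "distances ... to spatial parameters" described in the abstract.
    """
    return min(class_targets, key=lambda c: math.dist(utterance_vec, class_targets[c]))

# Hypothetical 2-D target embeddings, one per class.
class_targets = {"weather": (1.0, 0.0), "music": (0.0, 1.0)}

# Stand-in for the utterance embedding a pre-trained language model would produce.
utterance_vec = (0.9, 0.2)

predicted = predict_class(utterance_vec, class_targets)
expected = "weather"

# A mismatch between predicted and expected class is the signal that would
# drive the parameter update during training (update step omitted here).
mismatch = predicted != expected
```

In a real training loop the mismatch would feed a loss function rather than a boolean flag; the flag only marks where that signal comes from.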
DETECTING AND SUPPRESSING VOICE QUERIES
A computing system receives requests from client devices to process voice queries that have been detected in local environments of the client devices. The system identifies that a value that is based on a number of requests to process voice queries received by the system during a specified time interval satisfies one or more criteria. In response, the system triggers analysis of at least some of the requests received during the specified time interval to determine a set of requests that each identify a common voice query. The system can generate an electronic fingerprint that indicates a distinctive model of the common voice query. The fingerprint can then be used to detect an illegitimate voice query identified in a request from a client device at a later time.
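The flow described above (volume check, common-query analysis, fingerprinting, later detection) might be sketched as follows. This is an illustration under strong assumptions: queries are represented as transcribed text rather than audio, the criterion is a simple count threshold, and the fingerprint is a SHA-256 hash of the normalized text. All function names and sample values are hypothetical.

```python
import hashlib
from collections import Counter

def normalize(query):
    """Lower-case and trim a query so trivially different copies compare equal."""
    return query.strip().lower()

def find_common_query(requests, start, end, max_requests):
    """Return the most frequent query in [start, end) if volume exceeds max_requests.

    `requests` is a list of (timestamp, query_text) pairs. Returns None when
    the request count does not satisfy the (assumed) threshold criterion.
    """
    window = [normalize(q) for (t, q) in requests if start <= t < end]
    if len(window) <= max_requests:
        return None
    query, _count = Counter(window).most_common(1)[0]
    return query

def fingerprint(query):
    """Assumed fingerprint: a hash of the normalized query text."""
    return hashlib.sha256(normalize(query).encode("utf-8")).hexdigest()

# Hypothetical request log: (timestamp in seconds, transcribed query).
requests = [
    (1, "Turn up the volume"),
    (2, "turn up the volume"),
    (3, "what time is it"),
    (4, "turn up the volume"),
    (30, "play jazz"),
]

common = find_common_query(requests, start=0, end=10, max_requests=3)
blocked = {fingerprint(common)} if common is not None else set()

def is_illegitimate(query):
    """Check a later request against the stored fingerprints."""
    return fingerprint(query) in blocked
```

A text hash only matches exact repeats after normalization. A system working on audio would need a fingerprint that tolerates noise and playback variation, which this sketch does not attempt.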
Artificial intelligence based system and method for controlling virtual agent task flow
The present system and method may generally include organizing the task flow of a virtual agent in a way that is controlled by a set of rules and set of conditional probability distributions. The system and method may include receiving a user utterance including a first task, identifying the first task from the user utterance, and obtaining a set of rules related to the plurality of tasks. The set of rules may determine whether pre-tasks and/or pre-conditions are to be executed before executing the first task. The set of rules may also determine whether post-tasks and/or post-conditions are to be executed after executing the first task. The system and method may include executing the task; running a probabilistic graphical model on the plurality of tasks to determine a second task based on the first task; suggesting to the user the second task; and updating the probabilistic graphical model after a threshold number of runs.
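One way to picture the rule-controlled task flow is the sketch below. It assumes the rules are a plain lookup table of pre-tasks and post-tasks, and it substitutes a conditional frequency table for the probabilistic graphical model, which is a much simpler technique than the abstract describes. The task names and counts are hypothetical.

```python
# Hypothetical rules: tasks that must run before and after a given task.
RULES = {
    "book_flight": {"pre": ["verify_identity"], "post": ["send_receipt"]},
}

# Hypothetical counts of which task followed which in past sessions.
# Normalizing a row gives an estimate of P(second task | first task).
NEXT_TASK_COUNTS = {
    "book_flight": {"book_hotel": 3, "rent_car": 1},
}

def plan(task):
    """Return the execution order: pre-tasks, the task itself, then post-tasks."""
    rule = RULES.get(task, {})
    return rule.get("pre", []) + [task] + rule.get("post", [])

def suggest_next(task):
    """Suggest the most probable follow-up task, or None if there is no history."""
    counts = NEXT_TASK_COUNTS.get(task)
    if not counts:
        return None
    total = sum(counts.values())
    probs = {t: n / total for t, n in counts.items()}
    return max(probs, key=probs.get)

def record_follow_up(first, second):
    """Update the counts after the user actually performs a follow-up task."""
    NEXT_TASK_COUNTS.setdefault(first, {})
    NEXT_TASK_COUNTS[first][second] = NEXT_TASK_COUNTS[first].get(second, 0) + 1

order = plan("book_flight")
suggestion = suggest_next("book_flight")
```

The abstract describes updating the model after a threshold number of runs; `record_follow_up` updates on every call for brevity.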
System to characterize vocal presentation
A device with a microphone acquires audio data of a user's speech. That speech comprises utterances that together comprise a session. The audio data is processed to determine sentiment data indicative of perceived emotional content of the speech as conveyed by individual utterances of the user. That information is then used to determine the emotional content of the session. For example, the information may include several words describing the overall and outlying emotions of the session. Numeric metrics may also be determined, such as activation and valence. A user interface may present the words and metrics to the user. The user may use this information to assess their state of mind, facilitate interactions with others, and so forth.
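As a rough sketch of how per-utterance sentiment might roll up into a session summary, the example below assumes each utterance already carries an emotion word plus numeric valence and activation scores. The aggregation rules (most frequent word as the overall emotion, words occurring once as outliers, arithmetic means for the metrics) are illustrative assumptions, not the method in the abstract.

```python
from collections import Counter
from statistics import mean

def summarize_session(utterances):
    """Aggregate per-utterance sentiment into session-level words and metrics."""
    labels = Counter(u["label"] for u in utterances)
    overall = labels.most_common(1)[0][0]
    # Assumption: an emotion word that appears only once counts as outlying.
    outlying = sorted(l for l, n in labels.items() if n == 1 and l != overall)
    return {
        "overall": overall,
        "outlying": outlying,
        "valence": mean(u["valence"] for u in utterances),
        "activation": mean(u["activation"] for u in utterances),
    }

# Hypothetical sentiment data for three utterances in one session.
session = [
    {"label": "calm", "valence": 0.5, "activation": 0.2},
    {"label": "calm", "valence": 0.7, "activation": 0.3},
    {"label": "frustrated", "valence": -0.6, "activation": 0.7},
]

summary = summarize_session(session)
```

The resulting dictionary holds the kind of words and metrics a user interface could present.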
User-specific acoustic models
Systems and processes for providing user-specific acoustic models are provided. In accordance with one example, a method includes, at an electronic device having one or more processors, receiving a plurality of speech inputs, each of the speech inputs associated with a same user of the electronic device; providing each of the plurality of speech inputs to a user-independent acoustic model, the user-independent acoustic model providing a plurality of speech results based on the plurality of speech inputs; initiating a user-specific acoustic model on the electronic device; and adjusting the user-specific acoustic model based on the plurality of speech inputs and the plurality of speech results.
Information processing apparatus, information processing system, and information processing method, and program
Implemented are an apparatus and a method that enable highly accurate intent estimation of a user utterance. An utterance learning adaptive processing unit analyzes a plurality of user utterances input from a user, generates learning data in which entity information included in a user utterance with an unclear intent is associated with a correct intent, and stores the generated learning data in a storage unit. The utterance learning adaptive processing unit generates learning data in which an intent, acquired from a response utterance from the user to an apparatus utterance after input of a first user utterance with an unclear intent, is recorded in association with entity information included in the first user utterance. The learning data is recorded to include superordinate semantic concept information of the entity information. At the time of estimating an intent for a new user utterance, learning data with similar superordinate semantic concept information is used.
USER-SPECIFIC ACOUSTIC MODELS
Systems and processes for providing user-specific acoustic models are provided. In accordance with one example, a method includes, at an electronic device having one or more processors, receiving a plurality of speech inputs, each of the speech inputs associated with a same user of the electronic device; providing each of the plurality of speech inputs to a user-independent acoustic model, the user-independent acoustic model providing a plurality of speech results based on the plurality of speech inputs; initiating a user-specific acoustic model on the electronic device; and adjusting the user-specific acoustic model based on the plurality of speech inputs and the plurality of speech results.
AUDIO-BASED LINK GENERATION
First and second speech data can be received from respective first and second devices. The first and second speech data can be determined to be from a same dialog. A link can be generated based on the dialog.