Patent classifications
G10L2015/225
COMPUTATIONALLY CUSTOMIZING INSTRUCTIONAL CONTENT
A computing system causes instructional media to be played on a device to a user. An instructor in the instructional media provides guidance as to how to perform an activity when the instructional media is played on the device. The computing system obtains user data pertaining to performance of the activity by the user. The computing system generates a user-customized portion of the instructional media based upon the user data and a computer-implemented model. The computing system causes the user-customized portion to be played on the device to the user, where the device emits audible words reproduced in a voice of the instructor, where the audible words are based upon the user data, and further where the device displays generated images of the instructor depicting the instructor speaking the audible words as the device emits the audible words.
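The flow described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the user-data fields, the template-based stand-in for the "computer-implemented model", and the function names are all hypothetical, and the real system would feed the generated script to voice-cloning and video-generation models.

```python
# Minimal sketch: collect user performance data, run it through a
# model, and produce the script for a user-customized media portion.
# All names and fields here are illustrative assumptions.

def generate_customized_portion(user_data, model):
    """Generate script text to be synthesized in the instructor's voice."""
    return model(user_data)

# Stand-in "computer-implemented model": a simple template. The patent
# contemplates a learned model producing voice and imagery as well.
def simple_model(user_data):
    return (f"Nice work on {user_data['activity']}. "
            f"Try to improve your {user_data['weakest_metric']}.")

user_data = {"activity": "squats", "weakest_metric": "depth"}
portion = generate_customized_portion(user_data, simple_model)
print(portion)  # Nice work on squats. Try to improve your depth.
```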
SIRI INTEGRATION WITH GUEST VOICES
Systems and processes for operating an intelligent automated assistant are provided. In one example process, a natural-language user input is received at an electronic device and a user intent determined. Where an audio recording corresponding to the intent is available, a digital assistant of the electronic device provides a first spoken output introducing the audio recording, the audio recording itself, and a second spoken output indicating the end of the audio recording.
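The three-part response pattern described here (introduction, recording, closing) can be sketched as a simple dispatch. The output tuples, recording store, and fallback message are illustrative assumptions, not Siri's actual API.

```python
# Sketch of the described response pattern: when an audio recording
# matching the user intent is available, bracket it with introductory
# and closing spoken outputs. The ("speak"/"play", payload) steps are
# hypothetical stand-ins for real playback primitives.

def respond_with_recording(intent, recordings):
    if intent in recordings:
        return [
            ("speak", f"Here is a recording about {intent}."),
            ("play", recordings[intent]),
            ("speak", "That was the end of the recording."),
        ]
    return [("speak", f"Sorry, I have no recording for {intent}.")]

plan = respond_with_recording("jazz history", {"jazz history": "clip_042.mp3"})
for step in plan:
    print(step)
```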
Assisting Users with Efficient Information Sharing among Social Connections
In one embodiment, a method includes receiving a user input from a first user at the first client system, determining that the user input is a sharing request to share content, determining multiple second users the sharing request is directed to, determining, for each second user, modalities associated with the respective second user based on the content, a user profile associated with the respective second user, and modalities supported by a second client system the respective second user is currently engaged with, the respective second user being associated with two or more second client systems, and sending, to one or more second client systems currently associated with the second users, instructions for accessing the content based on the determined modalities for each second user.
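The per-recipient modality determination can be read as an intersection of three constraint sets, as in this hedged sketch; the preference order and set-based representation are assumptions for illustration.

```python
# Sketch: choose a delivery modality for each recipient from the
# intersection of (a) modalities suited to the content, (b) modalities
# allowed by the recipient's profile, and (c) modalities supported by
# the client device they are currently engaged with.
# The preference order is an illustrative assumption.

PREFERENCE = ["video", "audio", "text"]

def choose_modality(content_modalities, profile_modalities, device_modalities):
    candidates = set(content_modalities) & set(profile_modalities) & set(device_modalities)
    for modality in PREFERENCE:
        if modality in candidates:
            return modality
    return None  # no common modality; content cannot be delivered here

m = choose_modality({"video", "text"}, {"audio", "text"}, {"text"})
print(m)  # falls back to "text", the only modality common to all three
```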
Handling calls on a shared speech-enabled device
In some implementations, a determination that a first party has spoken a query for a voice-enabled virtual assistant during a voice call between the first party and a second party is made, in response to the determination that the first party has spoken the query for the voice-enabled virtual assistant during the voice call between the first party and the second party, the voice call between the first party and the second party is placed on hold, a determination that the voice-enabled virtual assistant has resolved the query is made, and in response to the determination that the voice-enabled virtual assistant has resolved the query, the voice call between the first party and the second party is resumed from hold.
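The hold/resume behavior described is essentially a two-state machine, sketched below. The class, state names, and event methods are hypothetical; a real implementation would sit in the device's telephony stack.

```python
# Toy state machine for the described call flow: a query addressed to
# the assistant during an active call places the call on hold, and the
# call is resumed once the assistant resolves the query.

class SharedDeviceCall:
    def __init__(self):
        self.state = "active"

    def on_assistant_query(self):
        """First party speaks a query for the virtual assistant."""
        if self.state == "active":
            self.state = "on_hold"

    def on_query_resolved(self):
        """Assistant has resolved the query; resume the call."""
        if self.state == "on_hold":
            self.state = "active"

call = SharedDeviceCall()
call.on_assistant_query()
print(call.state)  # on_hold
call.on_query_resolved()
print(call.state)  # active
```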
Voice user interface for intervening in conversation of at least one user by adjusting two different thresholds
An electronic device is provided. The electronic device includes a memory configured to store at least one instruction, and at least one processor, where the at least one processor is configured to execute the instruction to obtain voice data from a conversation of at least one user, convert the voice data to text data, determine at least one parameter indicating a characteristic of the conversation based on at least one of the voice data or the text data, adjust a condition for triggering intervention in the conversation based on the determined at least one parameter, and output feedback based on the text data when the adjusted condition is satisfied, wherein the adjustment of the condition includes adjusting a first threshold and a second threshold based on a change of the at least one parameter.
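One way to picture the two-threshold adjustment is sketched below. The choice of conversation parameter (speech tempo), the scaling rule, and the trigger condition are all illustrative assumptions; the abstract only requires that two thresholds shift with a conversation-derived parameter.

```python
# Sketch of the two-threshold trigger: a parameter derived from the
# conversation (here, speech tempo in words per minute) shifts both
# thresholds, and the device intervenes only when the adjusted
# condition is satisfied.

def adjust_thresholds(base_silence_s, base_confidence, tempo_wpm):
    # Faster conversations get a shorter silence threshold and a
    # higher required confidence before the assistant intervenes.
    scale = tempo_wpm / 120.0
    return base_silence_s / scale, base_confidence * min(scale, 1.5)

def should_intervene(silence_s, confidence, thresholds):
    silence_thr, conf_thr = thresholds
    return silence_s >= silence_thr and confidence >= conf_thr

thr = adjust_thresholds(2.0, 0.6, tempo_wpm=180)
print(should_intervene(silence_s=1.5, confidence=0.95, thresholds=thr))  # True
```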
Development of voice and other interaction applications
Among other things, a developer of an interaction application for an enterprise can create items of content to be provided to an assistant platform for use in responses to requests of end-users. The developer can deploy the interaction application using defined items of content and an available general interaction model including intents and sample utterances having slots. The developer can deploy the interaction application without requiring the developer to formulate any of the intents, sample utterances, or slots of the general interaction model.
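The separation described (developer-supplied content on top of a prebuilt general interaction model) can be sketched as below. The dictionary schema and the `build_app` helper are illustrative assumptions, not any real assistant platform's format.

```python
# Sketch: the developer supplies only content items; the intents,
# sample utterances, and slots come from a prebuilt general interaction
# model the developer never has to author.

general_model = {
    "intents": [
        {"name": "GetHours",
         "utterances": ["when are you open {day}"],
         "slots": ["day"]},
    ],
}

developer_content = {"GetHours": "We are open 9am to 5pm on {day}."}

def build_app(model, content):
    # Combine the prebuilt model with developer content; deployment
    # requires no intent, utterance, or slot authoring.
    return {"model": model, "responses": content}

app = build_app(general_model, developer_content)
print(sorted(app["responses"]))  # ['GetHours']
```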
Sentiment aware voice user interface
Described herein is a system for responding to a frustrated user with a response determined based on spoken language understanding (SLU) processing of a user input. The system detects user frustration and responds to a repeated user input by confirming an action to be performed or presenting an alternative action, instead of performing the action responsive to the user input. The system also detects poor audio quality of the captured user input, and responds by requesting the user to repeat the user input. The system processes sentiment data and signal quality data to respond to user inputs.
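The described dispatch over sentiment and signal-quality data can be sketched as a small decision function. The score ranges, thresholds, and response strings are illustrative assumptions.

```python
# Sketch of the described behavior: on poor audio, ask the user to
# repeat; on a repeated input from a frustrated user, confirm the
# action (or offer an alternative) instead of performing it; otherwise
# perform the action. Thresholds are illustrative.

def respond(sentiment_score, signal_quality, is_repeat, action):
    if signal_quality < 0.4:
        return "Sorry, I didn't catch that. Could you repeat?"
    if sentiment_score < -0.5 and is_repeat:
        return f"Did you want me to {action}, or something else?"
    return f"Performing: {action}"

print(respond(-0.8, 0.9, True, "play jazz"))
# Did you want me to play jazz, or something else?
```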
SERVER-PROVIDED VISUAL OUTPUT AT A VOICE INTERFACE DEVICE
A method at an electronic device with an array of indicator lights includes: obtaining first visual output instructions stored at the electronic device, where the first visual output instructions control operation of the array of indicator lights based on operating state of the electronic device; receiving a voice input; obtaining from a remote system a response to the voice input and second visual output instructions, where the second visual output instructions are provided by the remote system along with the response in accordance with a determination that the voice input satisfies one or more criteria; executing the response; and displaying visual output on the array of indicator lights in accordance with the second visual output instructions, where otherwise in absence of the second visual output instructions the electronic device displays visual output on the array of indicator lights in accordance with the first visual output instructions.
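The precedence rule at the heart of this method is simple: server-provided (second) instructions, when present, override the locally stored (first) state-based instructions. A hedged sketch, where representing an instruction set as a list of LED colors is purely an assumption:

```python
# Sketch of the described precedence: second (server-provided) visual
# output instructions, sent along with the response when the voice
# input meets the criteria, override the first (locally stored,
# state-based) instructions for the indicator-light array.

def pick_led_pattern(local_pattern, server_pattern=None):
    return server_pattern if server_pattern is not None else local_pattern

local = ["white"] * 4                      # state-based default
server = ["red", "green", "red", "green"]  # sent with a server response
print(pick_led_pattern(local))             # local default applies
print(pick_led_pattern(local, server))     # server instructions override
```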
Systems and methods to translate a spoken command to a selection sequence
Systems and methods to translate a spoken command to a selection sequence are disclosed. Exemplary implementations may: obtain audio information representing sounds captured by a client computing platform; analyze the sounds to determine spoken terms; determine whether the spoken terms include one or more of the terms that are correlated with the commands; responsive to determining that the spoken terms are terms that are correlated with a particular command stored in the electronic storage, perform a set of operations that correspond to the particular command; responsive to determining that the spoken terms are not terms correlated with the commands stored in the electronic storage, determine a selection sequence that causes a result subsequent to the analysis of the sounds; correlate the spoken terms with the selection sequence; store the correlation of the spoken terms with the selection sequence; and perform the selection sequence to cause the result.
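The lookup-or-learn loop described can be sketched as below. The in-memory dictionary store and the `tap:`/`toggle:` operation strings are illustrative assumptions standing in for electronic storage and real UI operations.

```python
# Sketch of the described loop: known spoken terms map to stored
# operations; unknown terms are correlated with the user's subsequent
# selection sequence, and that correlation is stored for next time.

command_store = {"open settings": ["tap:menu", "tap:settings"]}

def handle_utterance(terms, observed_sequence=None):
    if terms in command_store:
        return command_store[terms]              # perform stored operations
    if observed_sequence is not None:
        command_store[terms] = observed_sequence  # learn the correlation
        return observed_sequence                  # perform it to cause the result
    return []

print(handle_utterance("open settings"))
# A new command: learned from the observed selection sequence...
handle_utterance("mute alerts", ["tap:menu", "tap:sounds", "toggle:mute"])
# ...and recognized directly on the next utterance.
print(handle_utterance("mute alerts"))
```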
CORRECTING SPEECH MISRECOGNITION OF SPOKEN UTTERANCES
Implementations can receive audio data corresponding to a spoken utterance of a user, process the audio data to generate a plurality of speech hypotheses, determine an action to be performed by an automated assistant based on the speech hypotheses, and cause the computing device to render an indication of the action. In response to the computing device rendering the indication, implementations can receive additional audio data corresponding to an additional spoken utterance of the user, process the additional audio data to determine that a portion of the spoken utterance is similar to an additional portion of the additional spoken utterance, supplant the action with an alternate action, and cause the automated assistant to initiate performance of the alternate action. Some implementations can determine whether to render the indication of the action based on a confidence level associated with the action.
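The correction flow can be pictured as follows. Using `difflib.SequenceMatcher` for the utterance-similarity test, and the 0.6/0.95 thresholds, are illustrative assumptions; the abstract only requires detecting that a portion of the follow-up utterance is similar to the original and then supplanting the action with an alternate.

```python
# Sketch of the correction flow: if a follow-up utterance largely
# overlaps the original one, treat it as a correction of a
# misrecognition and supplant the chosen action with the alternate
# speech hypothesis, unless the original was recognized with high
# confidence.

from difflib import SequenceMatcher

def maybe_correct(original, followup, hypotheses, confidence):
    similarity = SequenceMatcher(None, original, followup).ratio()
    if confidence < 0.95 and similarity > 0.6 and len(hypotheses) > 1:
        return hypotheses[1]   # supplant with the alternate action
    return hypotheses[0]       # keep the original action

hyps = ["play Beethoven", "play Bee Gees"]
# User repeats the request after seeing the wrong action indicated:
print(maybe_correct("play Beethoven", "play Beethoven", hyps, 0.9))
# play Bee Gees
```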