Patent classifications
G10L2015/223
Method and apparatus for determining periods of excessive noise for receiving smart speaker voice commands
Methods and systems for determining periods of excessive noise for smart speaker voice commands. An electronic timeline of volume levels of currently playing content is made available to a smart speaker. From this timeline, periods of high content volume are determined, and the smart speaker alerts users during periods of high volume, requesting that they wait until the high-volume period has passed before issuing voice commands. In this manner, the smart speaker helps prevent voice commands that may not be detected, or may be detected inaccurately, due to the noise of the content currently being played.
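The volume-timeline scan this abstract describes can be sketched in a few lines. The tuple format, threshold, and function names below are illustrative assumptions, not the patent's disclosed implementation:

```python
def high_volume_periods(timeline, threshold):
    """Scan a (timestamp, volume) timeline and return [start, end) spans
    where the content volume meets or exceeds the threshold."""
    periods = []
    start = None
    for ts, vol in timeline:
        if vol >= threshold and start is None:
            start = ts                    # a high-volume span begins
        elif vol < threshold and start is not None:
            periods.append((start, ts))   # span ends at the first quiet sample
            start = None
    if start is not None:                 # span ran to the end of the timeline
        periods.append((start, timeline[-1][0]))
    return periods

def should_defer(now, periods):
    """A smart speaker could defer command capture while 'now' falls
    inside any high-volume span, alerting the user to wait."""
    return any(start <= now < end for start, end in periods)
```

On the sample timeline below, volumes of 80–90 between timestamps 5–15 and 20–25 would be flagged, and a command attempted at timestamp 12 would be deferred.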
Machine learning dataset generation using a natural language processing technique
A server can receive a plurality of records at a database such that each record is associated with a phone call and includes at least one request generated based on a transcript of the phone call. The server can generate a training dataset based on the plurality of records. The server can further train a binary classification model using the training dataset. Next, the server can receive a live transcript of a phone call in progress. The server can generate at least one live request based on the live transcript using a natural language processing module of the server. The server can provide the at least one live request to the binary classification model as input to generate a prediction. Lastly, the server can transmit the prediction to an entity receiving the phone call in progress. The prediction can cause a transfer of the call to a chatbot.
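The record-to-dataset and train-then-predict pipeline might look as follows. The record layout, the word-count "classifier" (a toy stand-in for whatever binary model the patent contemplates), and the label semantics are all assumptions for illustration:

```python
from collections import Counter

def build_training_dataset(records):
    """Flatten call records into (request_text, label) pairs. Each record
    is assumed to hold NLP-extracted requests from one call's transcript
    plus a binary label (here, 1 = the call should go to a chatbot)."""
    return [(req, rec["label"]) for rec in records for req in rec["requests"]]

def train_binary_classifier(dataset):
    """Toy binary model: count how often each word appears under each label."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in dataset:
        counts[label].update(text.lower().split())
    return counts

def predict(model, live_request):
    """Score a live request by which label's vocabulary it matches more."""
    words = live_request.lower().split()
    score = lambda label: sum(model[label][w] for w in words)
    return 1 if score(1) > score(0) else 0
```

A real system would use a proper learned model; the point here is only the data flow from stored records to a live prediction.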
Contextualized speech to text conversion
Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: determining, in performance of an interactive voice response (IVR) session, prompting data for presenting to a user, and storing text based data defining the prompting data into a data repository; presenting the prompting data to the user; receiving return voice string data from the user in response to the prompting data; generating a plurality of candidate text strings associated to the return voice string of the user; examining the text based data defining the prompting data; augmenting the plurality of candidate text strings in dependence on a result of the examining to provide a plurality of augmented candidate text strings associated to the return voice string data; evaluating respective ones of the plurality of augmented candidate text strings associated to the return voice string data; and selecting one of the augmented candidate text strings as a returned transcription associated to the return voice string data.
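One simple way to realize the augment-and-select steps is to boost each candidate transcription's score by its vocabulary overlap with the IVR prompt. The scoring scheme and the 0.1-per-word boost below are assumptions, not the patent's method:

```python
def select_transcription(candidates, prompt_text):
    """Augment ASR candidate scores using the words of the stored IVR
    prompt, then select the best candidate. `candidates` is a list of
    (text, acoustic_score) pairs; candidates sharing vocabulary with
    the prompt are boosted (assumed 0.1 per shared word)."""
    prompt_words = set(prompt_text.lower().split())
    augmented = []
    for text, score in candidates:
        overlap = len(prompt_words & set(text.lower().split()))
        augmented.append((text, score + 0.1 * overlap))
    return max(augmented, key=lambda pair: pair[1])[0]
```

With a prompt such as "Would you like to pay your bill", the contextually consistent candidate "pay my bill" wins over an acoustically higher-scored "play my bell".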
VOICE COMMAND-DRIVEN DATABASE
A voice command-driven system and computer-implemented method are disclosed for selecting a data item in a list of text-based data items stored in a database using a simple affirmative voice command input without utilizing a connection to a network. The text-based data items in the list are converted to speech using an embedded text-to-speech engine and an audio output of a first converted data item is provided. A listening state is entered into for a predefined pause time to await receipt of the simple affirmative voice command input. If the simple affirmative voice command input is received during the predefined pause time, the first converted data item is selected for processing. If the simple affirmative voice command input is not received during the predefined pause time, an audio output of a next converted data item in the list is provided.
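The speak-pause-listen loop at the heart of this abstract can be sketched with the TTS and listening engines injected as callables (stand-ins for the embedded, offline engines the patent requires):

```python
def select_item(items, listen, speak, pause_time=3.0):
    """Walk a list of text-based data items: speak each one via the
    (injected) text-to-speech engine, then enter a listening state for
    pause_time seconds awaiting a simple affirmative. Returns the
    selected item, or None if the list is exhausted without one."""
    for item in items:
        speak(item)                        # audio output of the converted item
        reply = listen(timeout=pause_time) # listening state for the pause window
        if reply == "yes":                 # affirmative received in time
            return item
    return None
```

In the usage below, silence on the first item advances the loop, and the affirmative during the second item's pause selects it.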
DIALOG MANAGEMENT WITH MULTIPLE APPLICATIONS
Features are disclosed for performing functions in response to user requests based on contextual data regarding prior user requests. Users may engage in conversations with a computing device in order to initiate some function or obtain some information. A dialog manager may manage the conversations and store contextual data regarding one or more of the conversations. Processing and responding to subsequent conversations may benefit from the previously stored contextual data by, e.g., reducing the amount of information that a user must provide if the user has already provided the information in the context of a prior conversation. Additional information associated with performing functions responsive to user requests may be shared among applications, further improving efficiency and enhancing the user experience.
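The context-reuse idea (don't re-prompt for information supplied in a prior conversation) can be sketched as a slot store carried across requests. The class name, slot-dictionary shape, and prompt/fulfill return convention are illustrative assumptions:

```python
class DialogManager:
    """Minimal sketch of cross-conversation context: slots the user has
    already supplied are remembered, so later requests prompt only for
    what is still missing."""
    def __init__(self):
        self.context = {}                     # slot name -> stored value

    def handle(self, provided_slots, required_slots):
        self.context.update(provided_slots)   # remember newly provided slots
        missing = [s for s in required_slots if s not in self.context]
        if missing:
            return ("prompt", missing)        # ask only for absent information
        return ("fulfill", {s: self.context[s] for s in required_slots})
```

A second application asking for an already-stored slot gets it for free, which is the efficiency gain the abstract describes.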
SIMPLE AFFIRMATIVE RESPONSE OPERATING SYSTEM
A simple affirmative response operating system for selecting a data item from a list of options using a unique affirmative action. Text-based labels in a listing of content are converted to speech using an embedded text-to-speech engine and an audio output of a first converted label is provided. A listening state is entered into for a predefined pause time to await receipt of the simple affirmative action. If the simple affirmative action is performed during the predefined pause time, an associated content item is selected for output. If the simple affirmative action is not performed during the predefined pause time, an audio output of a next converted label in the list is provided. This protocol may be used to control a variety of computing devices safely and efficiently while a user is distracted or disabled from using traditional input methods.
VOICE CONTROL AND TELECOMMUNICATIONS SERVICE INTEGRATION
This disclosure describes techniques that facilitate selectively interacting with a computing resource based on receipt of an incoming voice command. Particularly, a voice control integration system may parse content of an incoming voice command to authenticate an identity of the client and to determine an intended meaning of the incoming voice command. In doing so, the voice control integration system may interact with a computing resource to perform an action that fulfills a client request. Computing resources may be associated with service providers or client devices of a client. Further, the voice control integration system may authenticate a client identity based on a one- or two-factor authentication protocol, of which one factor may correspond to a biometric analysis of the incoming voice command. The second factor of the two-factor authentication protocol may be implemented via a voice interaction device, another client device accessible to the client, or a combination of both.
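The two-factor gate can be outlined as below. The similarity function, threshold, and boolean second factor are hypothetical placeholders; real voice biometrics and device-based confirmation are far more involved:

```python
def authenticate(voice_sample, enrolled_print, second_factor_ok,
                 match_fn, threshold=0.8):
    """Hypothetical two-factor check: factor one compares the incoming
    voice command against an enrolled voiceprint (match_fn returns a
    similarity in [0, 1]); factor two is a yes/no confirmation relayed
    by a voice interaction device or another client device."""
    voice_ok = match_fn(voice_sample, enrolled_print) >= threshold
    return voice_ok and second_factor_ok
```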
Privacy device for smart speakers
Systems, apparatuses, and methods are described for a privacy blocking device configured to prevent receipt, by a listening device, of video and/or audio data until a trigger occurs. A blocker may be configured to prevent receipt of video and/or audio data by one or more microphones and/or one or more cameras of a listening device. The blocker may use the one or more microphones, the one or more cameras, and/or one or more second microphones and/or one or more second cameras to monitor for a trigger. The blocker may process the data. Upon detecting the trigger, the blocker may transmit data to the listening device. For example, the blocker may transmit all or a part of a spoken phrase to the listening device.
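The blocker's monitor-then-forward behavior can be sketched over a stream of audio chunks (text strings stand in for audio frames here; the trigger phrase is an assumption):

```python
def blocker_filter(audio_frames, trigger="hey speaker"):
    """Sketch of the privacy blocker: frames are monitored locally and
    nothing reaches the listening device until the trigger phrase is
    heard; thereafter frames are forwarded, starting with the frame
    that contains the trigger (part of the spoken phrase)."""
    forwarded = []
    triggered = False
    for frame in audio_frames:
        if not triggered and trigger in frame.lower():
            triggered = True               # trigger detected: stop blocking
        if triggered:
            forwarded.append(frame)        # pass through to the listening device
    return forwarded
```

Everything before the trigger stays private; the command itself and what follows are released to the listening device.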
Removal of identifying traits of a user in a virtual environment
A virtual environment platform may receive, from a user device, a request to access a virtual reality (VR) environment and may verify, based on the request, a user of the user device to allow the user device access to the VR environment. The virtual environment platform may receive, after verifying the user of the user device, user voice input and user handwritten input from the user device. The virtual environment platform may generate processed user speech by processing the user voice input, wherein a characteristic of the processed user speech and a corresponding characteristic of the user voice input are different and may generate formatted user text by processing the user handwritten input, wherein the formatted user text is machine-encoded text. The virtual environment platform may cause the processed user speech to be audibly presented and the formatted user text to be visually presented in the VR environment.
Phonetic comparison for virtual assistants
In an approach for optimizing an intelligent virtual assistant by using phonetic comparison to find a response stored in a local database, a processor receives an audio input on a computing device. A processor transcribes the audio input to text. A processor compares the text to a set of user queries and commands in a local database of the computing device using a phonetic algorithm. A processor determines whether a user query or command of the set of user queries and commands meets a pre-defined threshold of similarity. Responsive to determining that the user query or command meets the pre-defined threshold of similarity, a processor identifies an intention of a set of intentions stored in the local database corresponding to the user query or command. A processor identifies a response of a set of responses in the local database corresponding to the intention. A processor outputs the response audibly.
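As a concrete example of a phonetic algorithm usable for this local lookup, the classic Soundex code can match misspelled or mis-transcribed words to stored queries. Exact code equality below stands in for the patent's similarity threshold, and the database shape is an assumption:

```python
def soundex(word):
    """Classic Soundex: first letter plus up to three digit classes,
    padded with zeros to four characters."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    result = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code           # new consonant class: emit its digit
        if ch not in "hw":           # h/w do not break a duplicate run
            prev = code
    return (result + "000")[:4]

def phonetic_lookup(transcribed_text, local_db):
    """Match a transcription against locally stored queries by comparing
    word-by-word Soundex codes; return the stored response on a match."""
    key = tuple(soundex(w) for w in transcribed_text.split())
    for query, response in local_db.items():
        if tuple(soundex(w) for w in query.split()) == key:
            return response
    return None
```

A transcription error such as "plae musik" still resolves to the stored query "play music" because the per-word phonetic codes agree.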