Patent classifications
G10L15/32
MULTI-TIER SPEECH PROCESSING AND CONTENT OPERATIONS
A multi-tier architecture is provided for processing user voice queries and making routing decisions for generating responses, including responses to book browsing requests and other content requests. When an utterance is associated with multiple applications in a given domain, the applications may be organized into a subdomain and a tier of routing decisions may be added to the inter-domain and intra-domain routing decision system. The system uses contextual signals to make subdomain routing decisions, including signals regarding content items that are already in a user's content catalog, consumption status of individual content items in the user's catalog, and the like.
System and method for using multimedia content as search queries
A method is provided for searching a plurality of information sources using a multimedia element. The method may include: receiving at least one multimedia element; generating, by a signature generator, at least one signature for the at least one multimedia element, the signature being unidirectional and yielding compression; generating at least one textual search query using the at least one signature, wherein generating the textual search query comprises (a) searching for at least one matching stored signature that matches one or more of the at least one signature, and (b) using a mapping between stored signatures and textual search queries, selecting at least one textual search query mapped to the at least one matching stored signature; searching the plurality of information sources using the at least one textual search query; and causing a display of search results retrieved from the plurality of information sources.
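The flow in this abstract (signature, stored-signature match, mapped text query, search) can be sketched as follows. This is only an illustration under stated assumptions: a truncated SHA-256 digest stands in for the signature generator because it is one-way and shorter than its input, matching is by exact equality, and the names `make_signature`, `signature_to_query`, and `search_sources` are hypothetical rather than taken from the patent.

```python
import hashlib
from typing import Dict, List, Optional


def make_signature(media_bytes: bytes, length: int = 16) -> str:
    """Assumed stand-in for the signature generator: one-way and compact."""
    return hashlib.sha256(media_bytes).hexdigest()[:length]


def signature_to_query(signature: str, query_index: Dict[str, str]) -> Optional[str]:
    """Look up the textual query mapped to a matching stored signature.

    Exact-match lookup is an assumption made for this sketch.
    """
    return query_index.get(signature)


def search_sources(query: str, sources: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per source, the documents containing every term of the query."""
    terms = query.lower().split()
    return {
        name: [doc for doc in docs if all(term in doc.lower() for term in terms)]
        for name, docs in sources.items()
    }
```

A caller would compute the signature of an incoming multimedia element, resolve it to a query with `signature_to_query`, and pass that query to `search_sources` before displaying the results.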
Task redirection by a voice assistant
Disclosed are various aspects of postponing or migrating tasks from a first assistant device to another assistant device. In some examples, an assistant device can facilitate task completion. Tasks can be recommended for postponement based upon the complexity of the task, a historical user profile, or the location of the assistant device.
COMPUTING RESOURCE-SAVING VOICE ASSISTANT
A voice assistant includes an electronic processor unit connected to at least one microphone and to remote equipment. The electronic processor unit includes both single detection modules for detecting respective single keywords from an audio signal supplied by the microphone, and also a control unit connected to the single detection modules to select predetermined actions as a function of the detected keywords and to perform those actions. The control unit is also arranged to detect whether actions are doable and to activate or deactivate the single detection modules as a function of the doability of the actions.
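The resource-saving idea, running a keyword detector only while its action is currently doable, can be sketched as below. This is a minimal illustration, not the patented implementation: text strings stand in for the audio signal, and the names `KeywordDetector`, `ControlUnit`, `register`, `refresh`, and `handle` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class KeywordDetector:
    """Stand-in for one single-keyword detection module."""
    keyword: str
    active: bool = True

    def detect(self, text: str) -> bool:
        # A deactivated detector is skipped entirely, which is where the saving comes from.
        return self.active and self.keyword in text.lower()


class ControlUnit:
    """Maps detected keywords to actions and gates detectors on doability."""

    def __init__(self) -> None:
        self.detectors: Dict[str, KeywordDetector] = {}
        self.actions: Dict[str, Callable[[], str]] = {}
        self.doable: Dict[str, Callable[[], bool]] = {}

    def register(self, keyword: str, action: Callable[[], str],
                 is_doable: Callable[[], bool]) -> None:
        self.detectors[keyword] = KeywordDetector(keyword)
        self.actions[keyword] = action
        self.doable[keyword] = is_doable

    def refresh(self) -> None:
        # Activate or deactivate each detector according to whether its action is doable.
        for keyword, detector in self.detectors.items():
            detector.active = self.doable[keyword]()

    def handle(self, text: str) -> List[str]:
        self.refresh()
        return [self.actions[k]() for k, d in self.detectors.items() if d.detect(text)]
```

For example, a "pause" detector could be registered with a doability check that returns true only while the remote equipment is playing, so the detector stays off the rest of the time.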
ELECTRONIC APPARATUS AND METHOD OF CONTROLLING THE SAME
An electronic device includes a processor configured to: receive a user voice input, identify a state of the electronic device corresponding to at least one item related to the electronic device, select a voice recognition engine corresponding to the identified state, from among a plurality of voice recognition engines, based on correlations between the plurality of voice recognition engines and a plurality of states, and perform an operation corresponding to the user voice input based on the selected voice recognition engine.
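One way to picture the engine selection step is a lookup over a table of correlations between engines and device states. The sketch below assumes that a correlation is a numeric score and that the highest score wins; the abstract states neither, and the function name `select_engine` and the table layout are hypothetical.

```python
from typing import Dict


def select_engine(state: str, correlations: Dict[str, Dict[str, float]]) -> str:
    """Return the engine with the highest assumed correlation to the given state.

    `correlations` maps engine name -> {state name -> score}. A state missing
    from an engine's table is treated as a score of 0.0.
    """
    return max(correlations, key=lambda engine: correlations[engine].get(state, 0.0))
```

With a table such as `{"on_device": {"offline": 0.9}, "cloud": {"online": 0.8}}`, an identified state of `"offline"` would pick the on-device engine, and the operation for the user voice input would then be performed with that engine.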
VOCAL COMMAND RECOGNITION
A method to detect a vocal command, the method including: receiving audio data from a transducer configured to convert audio into an electric signal, and analyzing the audio data using a first neural network on an edge device. The method also includes detecting a keyword from the audio data using the first neural network, the first neural network being trained to recognize the keyword. The method further includes activating a second neural network after the keyword is identified by the first neural network and analyzing the audio data using the second neural network, the second neural network being trained to recognize a set of vocal commands. The method may also include detecting the vocal command using the second neural network.
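The two-stage cascade can be sketched with plain callables standing in for the two neural networks. This is an illustration under assumptions: the class name `TwoStageDetector` is hypothetical, the second stage is applied to frames that arrive after the keyword, and the stage is switched off again once a command is recognized, none of which is specified in the abstract.

```python
from typing import Callable, Optional, Sequence

AudioFrame = Sequence[float]


class TwoStageDetector:
    """Runs a small keyword model continuously and a command model only on demand."""

    def __init__(self,
                 keyword_model: Callable[[AudioFrame], bool],
                 command_model: Callable[[AudioFrame], Optional[str]]) -> None:
        self.keyword_model = keyword_model    # stand-in for the first neural network
        self.command_model = command_model    # stand-in for the second neural network
        self.command_stage_active = False

    def process(self, frame: AudioFrame) -> Optional[str]:
        if not self.command_stage_active:
            # Only the first model runs until the keyword is heard.
            if self.keyword_model(frame):
                self.command_stage_active = True
            return None
        command = self.command_model(frame)
        if command is not None:
            # Assumed policy: return to keyword listening after one command.
            self.command_stage_active = False
        return command
```

Keeping the larger command model idle until the keyword fires is what lets the always-on part stay small enough for an edge device.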
HYBRID VOICE COMMAND PROCESSING
A digitized audio command is decoded to generate audio features. An in-domain confidence score is calculated for a model trained on a limited set of peripheral device commands. An out-domain confidence score is calculated for a model trained without the peripheral device commands. The best score determines whether to process the audio locally or at a remote server. In some embodiments, a likelihood ratio (LR) of the in-domain and out-domain confidence scores is calculated. Based on the likelihood ratio, a locally decoded audio command is performed, or the audio features are sent to a remote server for processing to determine the audio command.
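The likelihood-ratio decision can be sketched as a comparison of the two confidence scores in the log domain. The sketch assumes that both scores are probabilities in (0, 1] and that a fixed threshold separates local from remote handling; the names `RoutingDecision` and `route_command` and the default threshold of 0.0 are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    target: str                   # "local" or "remote"
    log_likelihood_ratio: float


def route_command(in_domain_score: float, out_domain_score: float,
                  threshold: float = 0.0) -> RoutingDecision:
    """Route by the log of the ratio in_domain_score / out_domain_score.

    Both scores are assumed to be probabilities in (0, 1]. A ratio above the
    threshold means the peripheral-command model explains the audio better,
    so the locally decoded command is used.
    """
    llr = math.log(in_domain_score) - math.log(out_domain_score)
    target = "local" if llr > threshold else "remote"
    return RoutingDecision(target, llr)
```

Raising the threshold sends more borderline utterances to the remote server, trading latency for recognition coverage.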