Patent classifications
G10L2015/228
Electronic device and method for controlling the electronic device
Disclosed are an electronic device capable of efficiently performing speech recognition and natural language understanding and a method for controlling the same. The electronic device includes: a microphone; a non-volatile memory configured to store virtual assistant model data comprising data that is classified according to a plurality of domains and data that is commonly used for the plurality of domains; a volatile memory; and a processor configured to: based on receiving, through the microphone, a trigger input to perform speech recognition for a user speech, initiate loading the virtual assistant model data from the non-volatile memory into the volatile memory, load, into the volatile memory, first data from among the data classified according to the plurality of domains and, while loading the first data into the volatile memory, load at least a part of the data commonly used for the plurality of domains into the volatile memory.
Agent device, agent device control method, and storage medium
An agent device includes an agent functional controller configured to provide a service including causing an output device to output a response of voice in response to an utterance of an occupant of a vehicle, and a controller configured to permit an operation of a power window of the vehicle when a speed of the vehicle is less than a first threshold value and limit the operation of the power window of the vehicle when the speed of the vehicle is equal to or greater than the first threshold value when the agent functional controller is activated.
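The speed-gated rule in this abstract can be illustrated with a minimal sketch. The function name, the km/h unit, the default threshold of 10.0, and the behavior when the agent is not activated are all assumptions made for illustration; the abstract specifies only the comparison against a first threshold value while the agent functional controller is activated.

```python
def window_policy(speed_kmh: float, agent_active: bool,
                  first_threshold_kmh: float = 10.0) -> str:
    """Return "permit" or "limit" for power window operation.

    Hypothetical helper: the threshold value and unit are illustrative only.
    """
    if not agent_active:
        # Assumption: the abstract does not cover the inactive case,
        # so this sketch simply permits operation.
        return "permit"
    # Permit below the first threshold; limit at or above it.
    return "permit" if speed_kmh < first_threshold_kmh else "limit"


if __name__ == "__main__":
    for speed in (5.0, 10.0, 40.0):
        print(speed, window_policy(speed, agent_active=True))
```

Note that the boundary case (speed equal to the threshold) falls on the "limit" side, matching the "equal to or greater than" wording of the abstract.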
ELECTRONIC DEVICE AND METHOD FOR PROCESSING USER INPUT
An electronic device and method are disclosed herein. The electronic device includes a communication circuit, a processor, and a memory. The processor implements the method, including: receiving, from each of one or more external devices receiving a voice signal of a user, via the communication circuit, a first probability value based on usage frequency, and a second probability value based on signal-to-noise ratio (SNR) magnitude, calculating final probability values for each of the one or more external devices, based on respective first and second probability values of each of the one or more external devices, and selecting an external device from among the one or more external devices having a highest final probability value from among the calculated final probability values.
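The selection step described in this abstract can be sketched as follows. The abstract does not state how the two probability values are combined, so the weighted average used here is purely an assumption, as are the names `DeviceReport`, `final_probability`, `select_device`, and the `usage_weight` parameter.

```python
from dataclasses import dataclass


@dataclass
class DeviceReport:
    device_id: str
    usage_prob: float  # first probability value (based on usage frequency)
    snr_prob: float    # second probability value (based on SNR magnitude)


def final_probability(report: DeviceReport, usage_weight: float = 0.5) -> float:
    # Assumption: a weighted average; the abstract leaves the formula open.
    return usage_weight * report.usage_prob + (1.0 - usage_weight) * report.snr_prob


def select_device(reports: list[DeviceReport], usage_weight: float = 0.5) -> DeviceReport:
    """Pick the device whose final probability value is highest."""
    if not reports:
        raise ValueError("at least one device report is required")
    return max(reports, key=lambda r: final_probability(r, usage_weight))


if __name__ == "__main__":
    reports = [
        DeviceReport("speaker", usage_prob=0.8, snr_prob=0.4),
        DeviceReport("phone", usage_prob=0.5, snr_prob=0.9),
    ]
    print(select_device(reports).device_id)
```

With equal weights the phone wins on its stronger SNR score; raising `usage_weight` toward 1.0 shifts the choice to the more frequently used speaker.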
EARLY INVOCATION FOR CONTEXTUAL DATA PROCESSING
A speech processing system uses contextual data to determine the specific domains, subdomains, and applications appropriate for taking action in response to spoken commands and other utterances. The system can use signals and other contextual data associated with an utterance, such as location signals, content catalog data, data regarding historical usage patterns, data regarding content visually presented on a display screen of a computing device when an utterance was made, other data, or some combination thereof.
DYNAMIC CONTEXT-BASED ROUTING OF SPEECH PROCESSING
A speech processing system uses contextual data to determine the specific domains, subdomains, and applications appropriate for taking action in response to spoken commands and other utterances. The system can use signals and other contextual data associated with an utterance, such as location signals, content catalog data, data regarding historical usage patterns, data regarding content visually presented on a display screen of a computing device when an utterance was made, other data, or some combination thereof.
MULTI-TIER SPEECH PROCESSING AND CONTENT OPERATIONS
A multi-tier architecture is provided for processing user voice queries and making routing decisions for generating responses, including responses to book browsing requests and other content requests. When an utterance is associated with multiple applications in a given domain, the applications may be organized into a subdomain and a tier of routing decisions may be added to the inter-domain and intra-domain routing decision system. The system uses contextual signals to make subdomain routing decisions, including signals regarding content items that are already in a user's content catalog, consumption status of individual content items in the user's catalog, and the like.
CONTEXTUAL SPELLING CORRECTION (CSC) FOR AUTOMATIC SPEECH RECOGNITION (ASR)
Novel solutions for speech recognition provide contextual spelling correction (CSC) for automatic speech recognition (ASR). Disclosed examples include receiving an audio stream; performing an ASR process on the audio stream to produce an ASR hypothesis; receiving a context list; and, based on at least the ASR hypothesis and the context list, performing spelling correction to produce an output text sequence. A contextual spelling correction (CSC) model is used on top of an ASR model, precluding the need for changing the original ASR model. This permits run-time user customization based on contextual data, even for large-size context lists. Some examples include filtering ASR hypotheses for the audio stream and, based on at least the ASR hypotheses filtering, determining whether to trigger spelling correction for the ASR hypothesis. Some examples include generating text to speech (TTS) audio using preprocessed transcriptions with context phrases to train the CSC model.
ELECTRONIC DEVICE MOUNTED IN VEHICLE, AND METHOD OF OPERATING THE SAME
A method and electronic device for a vehicle are disclosed herein. The electronic device is mounted in the vehicle and includes a display, a memory storing voice commands, and a processor. The processor implements the method, including: obtaining at least one of vehicle driving information, occupant information, or display output information, generating one or more short commands by shortening one or more of the voice commands, based on the obtained at least one of the vehicle driving information, the occupant information, or the display output information, and controlling the display to display one or more voice command guidance user interfaces (UIs) displaying the one or more short commands.
MULTI-DOMAIN INTENT HANDLING WITH CROSS-DOMAIN CONTEXTUAL SIGNALS
A multi-tier domain is provided for processing user voice queries and making routing decisions for generating responses, including for user voice queries that include multi-domain trigger words or phrases. When an utterance is recognized as different intents in different domains, a routing system for a domain may consider contextual signals, including those associated with other domains, to determine whether the domain is the proper one to handle the request. This determination can be performed with a statistical model specifically trained to make such determinations using the available contextual data.
Voice recognition grammar selection based on context
The subject matter of this specification can be embodied in, among other things, a method that includes receiving geographical information derived from a non-verbal user action associated with a first computing device. The non-verbal user action implies an interest of a user in a geographic location. The method also includes identifying a grammar associated with the geographic location using the derived geographical information and outputting a grammar indicator for use in selecting the identified grammar for voice recognition processing of vocal input from the user.