G10L2015/228

Device-directed utterance detection

A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device-directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device-directed classifier, the device may reject the interrupt event and increase a volume of the output audio, or may accept the interrupt event, ending the output audio and performing speech processing on the audio data.
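
A minimal sketch of how the two-stage flow above might be orchestrated on a device is shown below; the detector and classifier interfaces, the thresholds, and the volume values are illustrative assumptions, not details taken from the abstract.

```python
# Hypothetical sketch of the two-stage interrupt flow described above.
# The detector/classifier logic and thresholds are illustrative only.

from dataclasses import dataclass

@dataclass
class Utterance:
    audio: bytes      # captured audio frames
    semantics: dict   # semantic information derived for the utterance

class Player:
    def __init__(self, volume: float = 1.0):
        self.volume = volume
    def duck(self):    self.volume = 0.2   # quickly lower output audio
    def restore(self): self.volume = 1.0
    def stop(self):    self.volume = 0.0

def interrupt_detected(frame: bytes) -> bool:
    """Low-latency interrupt detector: cheap check for possible device-directed speech."""
    return len(frame) > 0  # placeholder decision

def device_directed(utt: Utterance) -> bool:
    """High-accuracy classifier run over the whole utterance plus semantic info."""
    return utt.semantics.get("directed_score", 0.0) > 0.5  # placeholder threshold

def handle_frame(frame: bytes, utt: Utterance, player: Player, send_to_cloud):
    if not interrupt_detected(frame):
        return
    player.duck()                      # react with low latency to a potential command
    if device_directed(utt):           # accept: end output audio, process the speech
        player.stop()
        send_to_cloud(utt.audio)
    else:                              # reject: restore the output volume
        player.restore()
```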

Platform selection for performing requested actions in audio-based computing environments
11694688 · 2023-07-04

Systems and methods of selecting digital platforms for execution of voice-based commands are provided. The system receives an application that performs an action associated with a service via digital platforms. The system debugs the application to validate parameters of the action on at least two platforms of the digital platforms. The system receives data packets comprising an input audio signal detected by a sensor of a client device, and parses the input audio signal to identify the action and the service. The system selects a first platform from the digital platforms to perform the action. The system initiates, responsive to selection of the first platform, an interactive data exchange to populate parameters of an action data structure corresponding to the action. The system executes the action via the selected platform using the action data structure.
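
The abstract describes a parse / select / populate / execute pipeline; the sketch below illustrates that flow under assumed data shapes. The function names, the `platforms` mapping of platform name to actions validated during debugging, and the example action are all hypothetical.

```python
# Illustrative sketch of the flow described above (parse, select a platform,
# populate an action data structure, execute). All names are hypothetical.

def parse_request(transcript: str) -> tuple[str, str]:
    """Identify the requested action and the service from the recognized text,
    e.g. "order a ride with ExampleCab" -> ("order a ride", "ExampleCab")."""
    action, _, service = transcript.partition(" with ")
    return action.strip(), service.strip()

def select_platform(action: str, platforms: dict[str, set[str]]) -> str:
    """Pick the first platform whose validated (debugged) parameters cover the action."""
    for name, validated_actions in platforms.items():
        if action in validated_actions:
            return name
    raise ValueError(f"no platform validated for action {action!r}")

def populate_parameters(action: str, ask_user) -> dict:
    """Interactive data exchange: prompt for any parameters still missing."""
    required = {"order a ride": ["pickup", "destination"]}.get(action, [])
    return {param: ask_user(f"What is the {param}?") for param in required}

def execute(transcript: str, platforms: dict[str, set[str]], ask_user) -> dict:
    action, service = parse_request(transcript)
    platform = select_platform(action, platforms)
    data_structure = {"action": action, "service": service,
                      "parameters": populate_parameters(action, ask_user)}
    # hand the populated action data structure to the selected platform here
    return {"platform": platform, **data_structure}
```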

Intelligent automated order-based customer dialogue system
11694039 · 2023-07-04

Based on a detection that a customer has arrived at an enterprise location to pick up a previously placed order, an intelligent automated customer dialogue system generates an interface via which an intelligent customer dialogue application dialogues with the customer. The application generates and initially offers, at the interface using natural language, content which is contextual to one or more items of the order, e.g., by using a specially trained intelligent dialogue machine learning model. The application may intelligently respond to the customer's natural language responses and/or requests to refine, augment, or redirect subsequently offered content and/or dialogue, e.g., by using the model. Offered content (e.g., product information, services, coupons, suggestions, recommendations, etc.) generally provides added value to the customer as well as maintaining customer engagement. The system may be implemented at least partially by using a chatbot upon curbside pick-up, for example, as well as through other electronic customer-facing channels.

Triggering voice control disambiguation

In various embodiments, a voice command is associated with a plurality of processing steps to be performed. The plurality of processing steps may include analysis of audio data using automatic speech recognition, generating and selecting a search query from the utterance text, and conducting a search of a database of items using the search query. The plurality of processing steps may include additional or different steps, depending on the type of the request. In performing one or more of these processing steps, an error or ambiguity may be detected. An error or ambiguity may either halt the processing step or create more than one path of actions. A model may be used to determine if and how to request additional user input to attempt to resolve the error or ambiguity. The voice-enabled device or a second client device is then caused to output a request for the additional user input.
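
A rough sketch of the search-step case is given below. The toy catalog search, the stand-in ambiguity policy (in place of the model mentioned above), and the prompt wording are assumptions for illustration only.

```python
# Illustrative sketch of the disambiguation decision described above.
# The catalog, scoring, and "should we ask the user?" policy are assumptions.

def search_catalog(query: str, catalog: list[str]) -> list[str]:
    """Toy search: return catalog items containing the query text."""
    return [item for item in catalog if query.lower() in item.lower()]

def needs_clarification(results: list[str]) -> bool:
    """Stand-in for the model mentioned above: ask the user when there are
    no results or several plausible matches."""
    return len(results) == 0 or len(results) > 1

def handle_voice_command(transcript: str, catalog: list[str], prompt_user):
    query = transcript.removeprefix("find ").strip()   # trivial query generation
    results = search_catalog(query, catalog)
    if needs_clarification(results):
        # Output a request for additional input on this or a second client device.
        choice = prompt_user(f"I found {len(results)} matches for {query!r}. "
                             "Which one did you mean?")
        results = search_catalog(choice, catalog)
    return results

# Example:
# handle_voice_command("find batteries", ["AA batteries", "AAA batteries"], input)
```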

Context configurable keywords

A system incorporating configurable keywords. The system can detect a keyword in audio data and execute one function for the keyword if a first application is operating, but a second function for the keyword if a second application is operating. Each keyword may be associated with multiple different functions. If a keyword is recognized during keyword detection, a function associated with that keyword is determined based on another application running on the system. Thus, detection of the same keyword may result in a different function depending on system context.
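
One plausible way to realize this keyword-to-function dispatch is a nested mapping keyed first by keyword and then by the running application, as sketched below; the keywords, applications, and actions are invented examples.

```python
# A minimal sketch of context-configurable keywords: the same keyword maps to
# different functions depending on which application is active.

KEYWORD_FUNCTIONS = {
    # keyword -> {application -> function to execute}
    "next": {
        "music_player": lambda: print("skip to next track"),
        "photo_viewer": lambda: print("show next photo"),
    },
    "stop": {
        "music_player": lambda: print("pause playback"),
        "timer":        lambda: print("cancel timer"),
    },
}

def on_keyword_detected(keyword: str, running_app: str) -> None:
    """Resolve the detected keyword against the current system context."""
    functions = KEYWORD_FUNCTIONS.get(keyword, {})
    action = functions.get(running_app)
    if action is not None:
        action()
    else:
        print(f"keyword {keyword!r} has no function while {running_app!r} is active")

on_keyword_detected("next", "music_player")   # -> skip to next track
on_keyword_detected("next", "photo_viewer")   # -> show next photo
```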

Techniques for language independent wake-up word detection
11545146 · 2023-01-03

A user device configured to perform wake-up word detection in a target language. The user device comprises at least one microphone (430) configured to obtain acoustic information from the environment of the user device, and at least one computer-readable medium (435) storing an acoustic model (150) trained on a corpus of training data (105) in a source language different from the target language, as well as a first sequence of speech units obtained by providing, to the acoustic model (150), acoustic features (110) derived from audio of the user speaking a wake-up word in the target language. At least one processor (415, 425) coupled to the at least one computer-readable medium (435) is programmed to: receive, from the at least one microphone (430), acoustic input from the user speaking in the target language while the user device is operating in a low-power mode; apply acoustic features derived from the acoustic input to the acoustic model (150) to obtain a second sequence of speech units corresponding to the acoustic input; determine whether the user spoke the wake-up word at least in part by comparing the first sequence of speech units to the second sequence of speech units; and exit the low-power mode if it is determined that the user spoke the wake-up word.
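
The sketch below illustrates the comparison step in isolation: an enrolled speech-unit sequence for the wake-up word is matched against the sequence decoded from new acoustic input. Using edit distance with a tolerance is an assumption about how the comparison might be performed, and the unit sequences are made up.

```python
# Hedged sketch of comparing an enrolled wake-word speech-unit sequence with
# the sequence decoded from live audio. Edit distance is an illustrative choice.

def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two sequences of speech units (e.g. phones)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def spoke_wake_word(enrolled: list[str], decoded: list[str], tolerance: int = 2) -> bool:
    """Decide whether to exit low-power mode based on sequence similarity."""
    return edit_distance(enrolled, decoded) <= tolerance

# Units produced by the source-language acoustic model for the enrolled wake word
# vs. units decoded from new acoustic input (both sequences invented here):
enrolled_units = ["h", "e", "l", "ow", "k", "ao", "m", "p"]
decoded_units  = ["h", "e", "l", "ow", "k", "o",  "m", "p"]
print(spoke_wake_word(enrolled_units, decoded_units))  # True
```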

Dynamically delaying execution of automated assistant actions and/or background application requests

Implementations set forth herein allow a user to access a first application in a foreground of a graphical interface, and simultaneously employ an automated assistant to respond to notifications arising from a second application. The user can provide an input, such as a spoken utterance, while viewing the first application in the foreground in order to respond to notifications from the second application without performing certain intervening steps that can arise under certain circumstances. Such intervening steps can include providing a user confirmation, which can be bypassed and/or time-limited according to a timer displayed in response to the user providing a responsive input directed at the notification. A period for the timer can be set according to one or more characteristics that are associated with the notification, the user, and/or any other information that can be associated with the user receiving the notification.
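
A small sketch of the timer behavior is shown below, assuming notification characteristics are available as a simple dictionary; the particular characteristics, weights, and default period are invented for illustration.

```python
# Illustrative sketch of setting a delay timer from notification characteristics.
# The characteristics and weights are assumptions.

import threading

def delay_for(notification: dict) -> float:
    """Choose a countdown period (seconds) from characteristics of the notification."""
    base = 5.0
    if notification.get("priority") == "high":
        base -= 2.0                       # act sooner on urgent notifications
    if notification.get("user_is_driving"):
        base += 5.0                       # give a distracted user more time to cancel
    return max(base, 1.0)

def respond_to_notification(notification: dict, reply_action) -> threading.Timer:
    """Schedule the assistant action; the user can cancel it while the timer runs."""
    timer = threading.Timer(delay_for(notification), reply_action)
    timer.start()
    return timer          # call timer.cancel() if the user withdraws the request

# Example:
# t = respond_to_notification({"priority": "high"}, lambda: print("reply sent"))
# t.cancel()  # user changed their mind before the countdown elapsed
```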

Information processing device, information processing system, and information processing method, and program
11545153 · 2023-01-03

Provided are a device and a method that allow a remote terminal to perform a process on the basis of a user utterance made at a local terminal. There are a local terminal and a remote terminal. The local terminal performs a semantic analysis of a user utterance input into the local terminal. On the basis of a result of the semantic analysis, the local terminal determines whether or not the user utterance is a request for the remote terminal to perform a process. In a case where the user utterance is such a request, the local terminal transmits the result of the semantic analysis produced by its semantic-analysis part to the remote terminal. The remote terminal receives the result of the semantic analysis of the local-terminal-side user utterance and performs a process based on the received result.
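
The sketch below shows the routing decision under assumed interfaces: a toy semantic analysis, a test for whether the utterance is addressed to the remote terminal, and a callable that carries the analysis result to the remote side. None of these details come from the abstract.

```python
# A minimal sketch of the local/remote routing described above. The analysis
# result format and the addressee test are assumptions for illustration.

def semantic_analysis(utterance: str) -> dict:
    """Toy semantic analysis: extract an intent and an addressee from the text."""
    addressee = "remote" if utterance.lower().startswith("tell grandma") else "local"
    return {"intent": "show_message", "text": utterance, "addressee": addressee}

def handle_local_utterance(utterance: str, send_to_remote, run_locally) -> None:
    result = semantic_analysis(utterance)
    if result["addressee"] == "remote":
        # Request directed at the remote terminal: forward the analysis result,
        # not the raw audio, so the remote terminal can act on it directly.
        send_to_remote(result)
    else:
        run_locally(result)

def remote_terminal_receive(result: dict) -> None:
    """Remote side: perform a process based on the received analysis result."""
    print(f"remote terminal handling intent {result['intent']!r}: {result['text']}")

# Example:
# handle_local_utterance("tell grandma dinner is ready", remote_terminal_receive, print)
```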

Artificial intelligence device
11544602 · 2023-01-03

An artificial intelligence device according to an embodiment of the present disclosure may receive voice data corresponding to viewing information and a search command from a display device, convert the received voice data into text data, obtain a first query indicating the intention of the converted text data, convert the first query into a second query based on the viewing information, obtain a search result corresponding to the converted second query, and transmit the obtained search result to the display device.
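
A hedged sketch of the query conversion step follows, assuming the viewing information is a simple dictionary with the current title; the query structure and the context-resolution rule are illustrative only.

```python
# Illustrative sketch of rewriting a query using viewing information.
# The structure of the queries and the viewing info is assumed.

def first_query(text: str) -> dict:
    """Derive an initial query expressing the intention of the recognized text."""
    return {"intent": "search", "terms": text}

def second_query(query: dict, viewing_info: dict) -> dict:
    """Resolve context-dependent terms (e.g. 'this movie') using what is
    currently being watched on the display device."""
    terms = query["terms"].replace("this movie", viewing_info.get("title", "this movie"))
    return {"intent": query["intent"], "terms": terms}

def handle_search_command(recognized_text: str, viewing_info: dict, search) -> dict:
    q1 = first_query(recognized_text)
    q2 = second_query(q1, viewing_info)
    return search(q2)          # result is then sent back to the display device

# Example:
# handle_search_command("who directed this movie",
#                       {"title": "Metropolis"},
#                       lambda q: {"query": q["terms"], "results": ["Fritz Lang"]})
```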

Using context information with end-to-end models for speech recognition

A method includes receiving audio data encoding an utterance, processing, using a speech recognition model, the audio data to generate speech recognition scores for speech elements, and determining context scores for the speech elements based on context data indicating a context for the utterance. The method also includes executing, using the speech recognition scores and the context scores, a beam search decoding process to determine one or more candidate transcriptions for the utterance. The method also includes selecting a transcription for the utterance from the one or more candidate transcriptions.
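
One common way to combine such scores is shallow-fusion-style rescoring inside the beam search, sketched below; the context weight, bonus function, and toy vocabulary are assumptions rather than the claimed method.

```python
# Hedged sketch of combining speech recognition scores with context scores
# during beam search decoding. Scores are in the log domain.

import math
import heapq

def beam_search(steps, context_bonus, beam_width=3, context_weight=0.5):
    """steps: list of {token: log_prob} dicts from the speech recognition model.
    context_bonus: maps a token to a log-domain boost when it matches context data."""
    beams = [(0.0, [])]                               # (total score, token sequence)
    for frame_scores in steps:
        candidates = []
        for score, seq in beams:
            for token, asr_logp in frame_scores.items():
                total = score + asr_logp + context_weight * context_bonus(token)
                candidates.append((total, seq + [token]))
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beams                                       # candidate transcriptions

# Toy example: context data indicates the user's contact list contains "joan".
contacts = {"joan"}
bonus = lambda tok: math.log(4.0) if tok in contacts else 0.0
steps = [{"call": math.log(0.9), "tall": math.log(0.1)},
         {"john": math.log(0.6), "joan": math.log(0.4)}]
best_score, best_tokens = beam_search(steps, bonus)[0]
print(best_tokens)   # ['call', 'joan'] once the context bonus is applied
```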