G10L25/00

Voice controlled system

A distributed voice controlled system has a primary assistant and at least one secondary assistant. The primary assistant has a housing to hold one or more microphones, one or more speakers, and various computing components. The secondary assistant is similar in structure, but is void of speakers. The voice controlled assistants perform transactions and other functions primarily based on verbal interactions with a user. The assistants within the system are coordinated and synchronized to perform acoustic echo cancellation, selection of a best audio input from among the assistants, and distributed processing.

Expediting interaction with a digital assistant by predicting user responses

A computer-implemented technique is described herein for expediting a user's interaction with a digital assistant. In one implementation, the technique involves receiving a system prompt generated by a digital assistant in response to an input command provided by a user via an input device. The technique then generates a predicted response based on linguistic content of the system prompt, together with contextual features pertaining to a circumstance in which the system prompt was issued. The predicted response corresponds to a prediction of how the user will respond to the system prompt. The technique then selects one or more dialogue actions from a plurality of dialogue actions, based on a confidence value associated with the predicted response. The technique expedites the user's interaction with the digital assistant by reducing the number of system prompts that the user is asked to respond to.
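The confidence-based selection step can be illustrated with a minimal sketch. The abstract does not name the dialogue actions or give thresholds, so the action names ("auto_answer", "confirm_with_user", "ask_user") and the cutoff values below are hypothetical placeholders, not the patented method.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    response: str      # predicted user response to the system prompt
    confidence: float  # confidence value in [0, 1]


def select_action(pred: Prediction, high: float = 0.9, low: float = 0.5) -> str:
    """Pick a dialogue action from the confidence of the predicted response.

    The thresholds and action names are assumptions for illustration only.
    """
    if pred.confidence >= high:
        # Confident enough to skip the prompt and use the prediction directly.
        return "auto_answer"
    if pred.confidence >= low:
        # Offer the prediction and let the user confirm or correct it.
        return "confirm_with_user"
    # Fall back to issuing the original system prompt.
    return "ask_user"
```

Under these assumptions, a high-confidence prediction removes one prompt from the exchange, which is the sense in which the interaction is expedited.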

Virtual Assistant For Generating Personalized Responses Within A Communication Session
20200394366 · 2020-12-17

Intelligent agents (IA) for automatically generating responses to content within a communication session (CS) are disclosed. An IA is trained to target the responses to a user and the user's context within the CS. An IA receives CS content that includes natural language expressions encoding users' conversations and determines content features based on natural language models. The content features indicate intended semantics of the expressions. The IA identifies content that is likely relevant to the targeted user and that warrants a response. Identifying such content includes determining a relevance of the content based on content features, a context of the CS, a user-interest model, and a content-relevance model. Identifying the likely-relevant content to respond to is based on the determined relevance of the content and relevance thresholds. Various responses to the identified portions of the content are automatically generated and provided based on a natural language response-generation model targeted to the user.

User voice detection based on acoustic near field
10867619 · 2020-12-15

Processing sound received by a device can include receiving a first signal from a first microphone of the device and a second signal from a second microphone of the device, where the first and second microphones capture sounds from a sound field. A ratio between the acoustic pressure and the particle velocity of the sound field can be calculated. In response to the ratio exceeding a threshold, speech signal processing is performed on one or more of the microphone signals. Other aspects are also described and claimed.
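One way to picture the ratio test is the sketch below. It is not taken from the patent: it assumes the common two-microphone finite-difference estimate, in which pressure is approximated by the average of the two signals and particle velocity by the time integral of their difference. The direction of the ratio (velocity over pressure), the air density constant, and the threshold parameter are all assumptions for illustration.

```python
import math

RHO_AIR = 1.21  # approximate air density in kg/m^3 (assumed constant)


def rms(samples):
    """Root-mean-square of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def velocity_to_pressure_ratio(sig1, sig2, spacing_m, sample_rate_hz):
    """Estimate the RMS particle velocity divided by the RMS pressure.

    Pressure: mean of the two microphone signals.
    Velocity: -(1 / (rho * d)) * integral of (p2 - p1) dt, integrated
    here with a simple running sum (an assumption, not the patent's method).
    """
    pressure = [(a + b) / 2.0 for a, b in zip(sig1, sig2)]
    velocity = []
    integral = 0.0
    for a, b in zip(sig1, sig2):
        integral += (b - a) / sample_rate_hz
        velocity.append(-integral / (RHO_AIR * spacing_m))
    p_rms = rms(pressure)
    return rms(velocity) / p_rms if p_rms > 0 else 0.0


def should_process_speech(sig1, sig2, spacing_m, sample_rate_hz, threshold):
    """Run speech processing only when the ratio exceeds the threshold."""
    ratio = velocity_to_pressure_ratio(sig1, sig2, spacing_m, sample_rate_hz)
    return ratio > threshold
```

A source close to the microphones produces a larger pressure difference between them than a distant one, so this ratio tends to rise for a nearby talker; any real threshold would have to be tuned for the device.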

Dialogue processing apparatus, a vehicle having same, and a dialogue processing method

A dialogue processing apparatus and method monitor an intensity of an acoustic signal that is input in real time and determine that speech recognition has started, when the intensity of the input acoustic signal is equal to or greater than a reference value, allowing a user to start speech recognition by an utterance without an additional trigger. A vehicle can include the apparatus and method. The apparatus includes: a monitor to compare an input signal level with a reference level in real time and to determine that speech is input when the input signal level is greater than the reference level; a speech recognizer to output a text utterance by performing speech recognition on the input signal when it is determined that the speech is input; a natural language processor to extract a domain and a keyword based on the utterance; and a dialogue manager to determine whether a previous context is maintained based on the domain and the keyword.
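The monitoring step can be sketched as a level comparison per audio frame. This is only an illustration: the frame representation (lists of samples in the range -1 to 1), the decibel scale, and the default reference value of -30 dB are assumptions that do not appear in the abstract.

```python
import math


def frame_level_db(frame):
    """Mean-square level of a non-empty frame in dB relative to full scale."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def speech_start_index(frames, reference_db=-30.0):
    """Return the index of the first frame at or above the reference level.

    Returns None when no frame reaches the reference, meaning speech
    recognition is never triggered. The reference value is hypothetical.
    """
    for index, frame in enumerate(frames):
        if frame_level_db(frame) >= reference_db:
            return index
    return None
```

In this sketch, the frame at the returned index is where a speech recognizer would begin consuming audio, with no separate wake word or button press.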

Domain specific endpointing

An automatic speech recognition (ASR) system detects an endpoint of an utterance based on a domain of the utterance. The ASR system processes a first portion of the utterance to determine the domain and then determines an endpoint of the remainder of the utterance depending on the domain.
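A minimal sketch of the idea follows, assuming the endpoint is declared when the pause after a word reaches a domain-dependent limit. The keyword lookup standing in for domain classification, the domain names, and the pause limits are all hypothetical; the abstract does not specify them.

```python
# Hypothetical trailing-pause limits per domain, in milliseconds.
PAUSE_MS_BY_DOMAIN = {"music": 400, "shopping": 1200}
DEFAULT_PAUSE_MS = 800

# Toy keyword lookup standing in for a real domain classifier.
DOMAIN_KEYWORDS = {"play": "music", "buy": "shopping"}


def classify_domain(first_words):
    """Return a domain for the first portion of the utterance, or None."""
    for word in first_words:
        domain = DOMAIN_KEYWORDS.get(word.lower())
        if domain:
            return domain
    return None


def find_endpoint(words, prefix_len=2):
    """Return the end time (ms) of the utterance.

    words is a non-empty list of (text, start_ms, end_ms) tuples. The first
    prefix_len words decide the domain; pauses in the remainder are then
    compared against that domain's limit.
    """
    domain = classify_domain([w[0] for w in words[:prefix_len]])
    limit = PAUSE_MS_BY_DOMAIN.get(domain, DEFAULT_PAUSE_MS)
    rest = words[max(prefix_len - 1, 0):]
    for (_, _, end_ms), (_, next_start_ms, _) in zip(rest, rest[1:]):
        if next_start_ms - end_ms >= limit:
            return end_ms
    return words[-1][2]
```

With these made-up limits, a 500 ms pause ends a "music" request but not a "shopping" request, which is the kind of domain-dependent behavior the abstract describes.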

Voice input processing method and electronic device for supporting the same

An electronic device is provided. The electronic device includes a microphone, communication circuitry, an indicator configured to provide at least one visual indication, a processor electrically connected with the microphone, the communication circuitry, and the indicator, and a memory. The memory stores instructions that, when executed, cause the processor to: receive a first voice input through the microphone; perform a first voice recognition for the first voice input; if a first specified word for waking up the electronic device is included in a result of the first voice recognition, display a first visual indication through the indicator; receive a second voice input through the microphone; perform a second voice recognition for the second voice input; and, if a second specified word corresponding to the first visual indication is included in a result of the second voice recognition, wake up the electronic device.

Lighting devices having wireless communication and built-in artificial intelligence bot
10834562 · 2020-11-10

Methods and systems are described. One system is a light switch having a housing configured for a circuit for a switch and a wireless chip for wireless communication with an end node. The light switch further includes a processor chip and memory for executing instructions and interfacing with the wireless chip and the switch. A microphone is integrated with the housing and a speaker is integrated with the housing. The instructions are processed by the processor chip in response to voice commands received by the microphone, and the processing of instructions is further configured to send data to the end node and receive data from the end node. The data received from the end node is used to provide an audible voice reply to one or more of the voice commands received by the microphone of the housing of the light switch. The voice commands are handled by the end node for artificial intelligence processing that includes accessing one or more external data sources and applying one or more learning algorithms for outputting the audible voice reply via the speaker of the light switch.

Method and system for ordering content using a voice menu system
10827066 · 2020-11-03

A method and system for ordering content includes a voice menu system and a phone device communicating a phone signal to the voice menu system. The voice menu system determines the phone number associated with the phone device through the phone signal and generates a voice prompt for recording a content selection from the voice menu system. The phone device selects a recording content option. The voice menu system generates prompts for determining a content title. The phone device selects a content title by communicating a selection signal to the voice menu system. The voice menu system enables a content recording at a recording device in response to the selection signal.