G10L2015/226

IN-VEHICLE USER POSITIONING METHOD, IN-VEHICLE INTERACTION METHOD, VEHICLE-MOUNTED APPARATUS, AND VEHICLE
20230038039 · 2023-02-09

This application provides an in-vehicle user positioning method, an in-vehicle interaction method, a vehicle-mounted apparatus, and a vehicle. In an example, the in-vehicle user positioning method includes: obtaining a sound signal collected by an in-vehicle microphone; in response to recognizing a first voice command from the sound signal, determining a first user who sent the first voice command; and determining an in-vehicle location of the first user based on a mapping relationship between an in-vehicle user and an in-vehicle location.
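
A minimal Python sketch of the lookup flow described above, assuming hypothetical `recognize_command` and `identify_speaker` stubs and a prebuilt user-to-seat mapping; it only illustrates the command-then-lookup idea, not the patented implementation.

```python
from typing import Optional

# Hypothetical mapping between enrolled in-vehicle users and seats,
# built elsewhere (e.g. during enrollment or from seat sensors).
USER_TO_SEAT = {"user_a": "driver_seat", "user_b": "rear_left"}

def recognize_command(audio: bytes) -> Optional[str]:
    # Stub standing in for keyword spotting / ASR on the microphone signal.
    return "open_window" if audio else None

def identify_speaker(audio: bytes) -> Optional[str]:
    # Stub standing in for voiceprint matching against enrolled users.
    return "user_a" if audio else None

def locate_command_speaker(audio: bytes) -> Optional[str]:
    """Return the in-vehicle location of the user who issued the voice command."""
    command = recognize_command(audio)
    if command is None:
        return None
    first_user = identify_speaker(audio)            # the user who sent the command
    return USER_TO_SEAT.get(first_user)             # map user -> in-vehicle location

print(locate_command_speaker(b"\x01"))              # -> driver_seat
```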

SECOND TRIGGER PHRASE USE FOR DIGITAL ASSISTANT BASED ON NAME OF PERSON AND/OR TOPIC OF DISCUSSION

In one aspect, a device may include at least one processor and storage accessible to the at least one processor. The storage includes instructions executable by the at least one processor to correlate a first trigger phrase for a digital assistant to a name of a person in proximity to the device and/or a topic of discussion. Based on the correlation, the instructions are executable to set the digital assistant to decline to monitor for utterance of the first trigger phrase and instead monitor for utterance of a second trigger phrase that is different from the first trigger phrase.
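
A hedged sketch of the trigger-phrase switch; the phrases, the example names, and the simple substring check are assumptions, not the patent's correlation method.

```python
FIRST_TRIGGER = "alexa"
SECOND_TRIGGER = "computer"

def active_trigger_phrase(nearby_names: list[str], topic_of_discussion: str) -> str:
    """Pick which trigger phrase the assistant should currently monitor for."""
    names = {n.lower() for n in nearby_names}
    # If the first trigger phrase correlates with a nearby person's name or the
    # topic of discussion, decline to monitor for it and use the second phrase.
    if FIRST_TRIGGER in names or FIRST_TRIGGER in topic_of_discussion.lower():
        return SECOND_TRIGGER
    return FIRST_TRIGGER

print(active_trigger_phrase(["Alexa", "Sam"], "weekend plans"))   # -> computer
print(active_trigger_phrase(["Sam"], "weekend plans"))            # -> alexa
```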

Virtual Conversational Agent
20230026945 · 2023-01-26

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating and operating voice conversing virtual agents with pre-modeled and inherited human behavior across use cases and domains. One of the methods includes: using a first non-domain specific neural network based model to predict a non-domain specific conversational situation, the first neural network based model trained with labelled parts of conversations from more than one domain; forwarding the non-domain specific conversational situation to a second domain specific neural network based model; using the second domain specific neural network based model to predict a conversational situation and to provide a system intent, the second domain specific neural network based model trained with labelled parts of conversation from a specified domain; and generating a response based at least in part on the predicted conversational situation and system intent.
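
A toy sketch of the two-stage pipeline described above; both "models" are trivial placeholders rather than trained neural networks, and the situation and intent labels are assumptions.

```python
class NonDomainSpecificModel:
    # In the patent, trained on labelled parts of conversations from many domains.
    def predict_situation(self, turns: list[str]) -> str:
        return "question" if turns and turns[-1].endswith("?") else "statement"

class DomainSpecificModel:
    # In the patent, trained on labelled parts of conversation from one domain.
    def predict(self, situation: str, turns: list[str]) -> tuple[str, str]:
        intent = "provide_answer" if situation == "question" else "acknowledge"
        return situation, intent

def generate_response(turns: list[str]) -> str:
    situation = NonDomainSpecificModel().predict_situation(turns)         # stage 1
    situation, intent = DomainSpecificModel().predict(situation, turns)   # stage 2
    return f"[{intent}] reply for a '{situation}' situation"

print(generate_response(["Can I change my booking?"]))
```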

Always listening and active voice assistant and vehicle operation
11704533 · 2023-07-18

A vehicle includes a controller configured to select one of a group of topics for generating an answer to a question embedded within the group based on an operating parameter of the vehicle and a syntax of a phrase. The selection is responsive to input originating from utterances including a preceding topic and a following topic having a moniker therebetween that is associated with only one of the topics through the syntax. The vehicle may operate an interface to output the answer.
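
An illustrative sketch only; the topics, the moniker table, and the tie-break on vehicle speed are assumptions standing in for the claimed syntax- and parameter-based selection.

```python
TOPIC_MONIKERS = {"fuel economy": "coach", "navigation": "pilot"}

def select_topic(preceding: str, following: str, moniker: str, speed_kmh: float):
    # Keep the topic the moniker is associated with through the syntax of the phrase.
    candidates = [t for t in (preceding, following)
                  if TOPIC_MONIKERS.get(t) == moniker]
    if not candidates:
        return None
    if len(candidates) > 1:
        # Fall back to an operating parameter of the vehicle to disambiguate.
        return "navigation" if speed_kmh > 0 else candidates[0]
    return candidates[0]

print(select_topic("fuel economy", "navigation", "pilot", speed_kmh=50))  # -> navigation
```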

Language models using domain-specific model components
11557289 · 2023-01-17

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
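
A toy scoring sketch: a domain component is chosen from the non-linguistic context and interpolated with a domain-independent baseline. The vocabularies, smoothing constant, and interpolation weight are assumptions, not the patent's models.

```python
import math

BASELINE = {"play": 0.2, "music": 0.1, "call": 0.1}            # domain-independent
DOMAIN_COMPONENTS = {
    "driving": {"navigate": 0.3, "music": 0.2},
    "cooking": {"timer": 0.4, "recipe": 0.2},
}

def score_candidate(words, non_linguistic_context, lam=0.5):
    """Score a candidate transcription with the baseline plus a selected domain component."""
    domain = DOMAIN_COMPONENTS.get(non_linguistic_context, {})  # selected by context
    score = 0.0
    for w in words:
        p = lam * domain.get(w, 1e-6) + (1 - lam) * BASELINE.get(w, 1e-6)
        score += math.log(p)
    return score

# Higher score -> preferred transcription for this context.
print(score_candidate(["play", "music"], "driving"))
```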

System and method to interpret natural language requests and handle natural language responses in conversation

A system and method to interpret natural language requests and handle natural language responses in conversation is disclosed. The system includes an intent creation subsystem to receive one or more predefined intents and create one or more corresponding intent databases; a natural language message handling subsystem to receive a plurality of natural language messages from a user, identify one or more intents, match the one or more identified intents associated with the plurality of received natural language messages with the one or more predefined intents, and handle the one or more identified intents using a first message handling scheme when a similar match is found and a second message handling scheme when a dissimilar match is found; and a natural language response handling subsystem to extract information from the plurality of received natural language messages and rectify them in order to handle a structured natural language response.
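
A minimal sketch of the similar/dissimilar branching, assuming a toy intent database, a string-ratio similarity measure, and an arbitrary threshold; the two handling schemes are represented only as placeholder strings.

```python
from difflib import SequenceMatcher

PREDEFINED_INTENTS = {
    "book_flight": "book a flight",
    "cancel_order": "cancel my order",
}

def handle_message(message: str, threshold: float = 0.7) -> str:
    # Match the identified intent against the predefined intent database.
    best_intent, best_score = None, 0.0
    for intent, example in PREDEFINED_INTENTS.items():
        score = SequenceMatcher(None, message.lower(), example).ratio()
        if score > best_score:
            best_intent, best_score = intent, score
    if best_score >= threshold:
        return f"first scheme: execute intent '{best_intent}'"    # similar match
    return "second scheme: ask the user to clarify"               # dissimilar match

print(handle_message("please cancel my order"))
```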

Digital assistant and a corresponding method for voice-based interactive communication based on detected user gaze indicating attention

A method for voice-based interactive communication using a digital assistant, wherein the method comprises: an attention detection step, in which the digital assistant detects a user attention and as a result is set into a listening mode; a speaker detection step, in which the digital assistant detects the user as a current speaker; a speech sound detection step, in which the digital assistant detects and records speech uttered by the current speaker, and which further comprises a lip movement detection step, in which the digital assistant detects a lip movement of the current speaker; a speech analysis step, in which the digital assistant parses said recorded speech and extracts speech-based verbal informational content from it; and a subsequent response step, in which the digital assistant provides feedback to the user based on said recorded speech.
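
A sketch of the step sequence as a simple pipeline; the gaze, speaker, lip-movement, and transcription functions are stubs standing in for the detectors, not real models.

```python
def detect_user_attention(camera_frame) -> bool:
    return True                                    # e.g. gaze directed at the device

def detect_current_speaker(camera_frame) -> str:
    return "user_1"

def lips_are_moving(camera_frame) -> bool:
    return True

def transcribe(audio) -> str:
    return "what's on my calendar"

def run_interaction(camera_frame, audio):
    if not detect_user_attention(camera_frame):
        return None                                # not set into listening mode
    speaker = detect_current_speaker(camera_frame)
    if not lips_are_moving(camera_frame):
        return None                                # ignore sound not from the speaker
    text = transcribe(audio)                       # extract verbal informational content
    return f"feedback to {speaker}: answer for '{text}'"

print(run_interaction(camera_frame=object(), audio=b"..."))
```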

NETWORKED MICROPHONE DEVICES, SYSTEMS, & METHODS OF LOCALIZED ARBITRATION
20230215424 · 2023-07-06

A first playback device is configured to perform functions comprising: detecting sound; identifying a wake word based on the sound as detected by the first device; receiving an indication that a second playback device has also detected the sound and identified the wake word based on the sound as detected by the second device; after receiving the indication, evaluating which of the first and second devices is to extract sound data representing the sound, and thereby determining that the extraction of the sound data is to be performed by the second device over the first device; in response to the determining, foregoing extraction of the sound data; receiving VAS response data indicative of a given VAS response corresponding to a given voice input identified in the sound data extracted by the second device; and, based on the VAS response data, outputting the given VAS response.
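
A sketch of the arbitration from the first device's point of view; the confidence-based comparison and the message fields are assumptions about how the devices might decide which one extracts the sound data.

```python
def should_peer_extract(own_confidence: float, peer_confidence: float) -> bool:
    """Evaluate which device is to extract the sound data for the VAS."""
    return peer_confidence >= own_confidence

def on_wake_word(own_confidence: float, peer_indication: dict):
    if should_peer_extract(own_confidence, peer_indication["confidence"]):
        # Forgo local extraction; later, output the VAS response the peer obtained.
        def play(vas_response_data: dict) -> str:
            return f"outputting: {vas_response_data['response']}"
        return play
    return "extract sound data locally"

play = on_wake_word(0.6, {"device": "second", "confidence": 0.8})
print(play({"response": "Playing your playlist."}))
```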

Method of performing function of electronic device and electronic device using same

An electronic device includes: a camera; a microphone; a display; a memory; and a processor configured to receive an input for activating an intelligent agent service from a user while at least one application is executed, identify context information of the electronic device, acquire image information of the user through the camera based on the identified context information, detect movement of the user's lips in the acquired image information to recognize a speech of the user, and perform a function corresponding to the recognized speech.
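
A hedged sketch of one way the context check could choose between camera and microphone; the noise threshold and the lip-reading and transcription stubs are assumptions, not the device's actual logic.

```python
def ambient_noise_level(audio) -> float:
    return 0.9                               # pretend the environment is noisy

def lipread(image) -> str:
    return "next track"                      # stand-in for a visual speech model

def transcribe(audio) -> str:
    return "next track"

def on_agent_activated(image, audio, noise_threshold: float = 0.7) -> str:
    # Identify context information; rely on the camera when audio is unreliable.
    if ambient_noise_level(audio) > noise_threshold:
        speech = lipread(image)              # recognize speech from lip movement
    else:
        speech = transcribe(audio)
    return f"perform function for: '{speech}'"

print(on_agent_activated(image=object(), audio=b"..."))
```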

Electronic device and method for controlling the electronic device thereof

An electronic device and a method for controlling the same are provided. The electronic device includes a microphone, a memory storing at least one instruction and dialogue history information, and a processor configured to be connected to the microphone and the memory and control the electronic device, in which the processor, by executing the at least one instruction, is configured to, based on a user's voice being input via the microphone, obtain response information for generating a response sentence to the user's voice, select at least one template phrase for generating the response sentence to the user's voice based on the stored dialogue history information, generate the response sentence using the response information and the at least one template phrase, and output the generated response sentence.
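
A minimal sketch of combining response information with a template phrase chosen from dialogue history; the template store, selection rule, and slot names are illustrative assumptions.

```python
TEMPLATES = [
    "The weather in {city} will be {condition}.",
    "As mentioned, expect {condition} in {city}.",
]

def select_template(dialogue_history: list[str]) -> str:
    # Vary the template phrase based on the stored dialogue history.
    repeated = any("weather" in turn for turn in dialogue_history)
    return TEMPLATES[1] if repeated else TEMPLATES[0]

def respond(user_voice: str, dialogue_history: list[str], response_info: dict) -> str:
    template = select_template(dialogue_history)
    sentence = template.format(**response_info)    # response information fills the slots
    dialogue_history.append(user_voice)
    return sentence

history = ["what's the weather in Oslo?"]
print(respond("and tomorrow?", history, {"city": "Oslo", "condition": "rain"}))
```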