G10L2015/228

DIGITAL ASSISTANT REFERENCE RESOLUTION

Systems and processes for operating a digital assistant are provided. An example process for performing a task includes, at an electronic device having one or more processors and memory, receiving a spoken input including a request, receiving an image input including a plurality of objects, selecting a reference resolution module of a plurality of reference resolution modules based on the request and the image input, determining, with the selected reference resolution module, whether the request references a first object of the plurality of objects based on at least the spoken input, and in accordance with a determination that the request references the first object of the plurality of objects, determining a response to the request including information about the first object.
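A minimal sketch of the claimed flow, assuming a simple dispatcher over resolver functions: pick a reference resolution strategy, test whether the request references one of the detected objects, and build a response about that object. All function and field names here are illustrative, not from the patent.

```python
# Toy reference resolution over a spoken request and detected objects.
# Objects are dicts like {"label": "rose"}; real systems would use
# vision and language models rather than word matching.

def resolve_by_description(request, objects):
    # Match a spoken noun against object labels.
    words = set(request.lower().split())
    for obj in objects:
        if obj["label"] in words:
            return obj
    return None

def resolve_by_deixis(request, objects):
    # Resolve demonstratives ("this", "that") to the most salient object.
    if any(w in request.lower().split() for w in ("this", "that")):
        return objects[0] if objects else None
    return None

# The "plurality of reference resolution modules" to select among.
RESOLVERS = [resolve_by_description, resolve_by_deixis]

def answer(request, objects):
    for resolver in RESOLVERS:
        referenced = resolver(request, objects)
        if referenced is not None:  # the request references this object
            return f"That is a {referenced['label']}."
    return "I'm not sure what you are referring to."
```

Note the ordering: descriptive matching is tried before deixis, so an explicit label wins over "this"/"that".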

Task resumption in a natural understanding system

A speech-processing system may provide access to one or more skills via spoken commands and/or responses in the form of synthesized speech. The system may be capable of keeping one or more skills active in the background while a user interacts with a skill running in the foreground (e.g., provides inputs to and/or receives outputs from it). A background skill may receive some trigger data and determine to request that the system return it to the foreground, for example to request a user input regarding an action previously requested by the user. In some cases, the user may invoke a background skill to continue a previous interaction. The system may return the background skill to the foreground. The resumed skill may continue a previous interaction to, for example, query the user for instructions, provide an update or alert, or continue a previous output.
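A rough sketch of the foreground/background bookkeeping described above, assuming one foreground skill, a list of parked background skills, and a resume path that can surface a pending prompt. Class and method names are assumptions, not from the patent.

```python
# Toy skill manager: starting a new skill parks the current foreground
# skill in the background; resuming moves a background skill back to
# the foreground and returns any pending prompt it wants to deliver.

class SkillManager:
    def __init__(self):
        self.foreground = None
        self.background = []

    def start(self, skill):
        if self.foreground is not None:
            self.background.append(self.foreground)  # park current skill
        self.foreground = skill

    def resume(self, skill_name):
        # Triggered by user invocation or by the background skill's own
        # request; swaps the named skill into the foreground.
        for i, skill in enumerate(self.background):
            if skill["name"] == skill_name:
                self.background.pop(i)
                self.background.append(self.foreground)
                self.foreground = skill
                return skill.get("pending_prompt")
        return None
```

A resumed skill's `pending_prompt` stands in for the "query the user / provide an update" step in the abstract.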

Electronic device configured to perform action using speech recognition function and method for providing notification related to action using same

A method includes receiving a designated event related to a second application while an execution screen of a first application is displayed on a display. The method also includes executing an artificial intelligent application in response to the designated event. The method further includes transmitting data related to the designated event to an external server, based on the executed artificial intelligent application. Additionally, the method includes sensing a user utterance related to the designated event for a designated period of time. The method also includes transmitting the user utterance to the external server. The method further includes receiving an action order for performing a function related to the user utterance from the external server. The method also includes executing the second application at least based on the received action order. The method further includes outputting a result of performing the function by using the second application.
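The event-to-action pipeline above can be walked through with a stubbed server; the "server" returning an action order is simulated, and every name here is hypothetical.

```python
# Stubbed version of: designated event -> assistant launched -> event
# data sent to server -> user utterance forwarded -> action order
# returned -> second application executes the ordered steps.

def fake_server(event, utterance):
    # Returns an ordered list of steps (the "action order") for the
    # second application; a real server would run NLU here.
    if event == "incoming_message" and "reply" in utterance:
        return ["open_messages_app", "compose_reply", "send"]
    return []

def handle_event(event, utterance):
    # The designated period for sensing the utterance is elided; this
    # sketch assumes the utterance arrived within the window.
    action_order = fake_server(event, utterance)
    return [f"executed:{step}" for step in action_order]
```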

Electronic apparatus and control method thereof
11580964 · 2023-02-14

An electronic apparatus is provided. The electronic apparatus includes a microphone, a memory configured to store a plurality of keyword recognition models, and a processor, which is coupled with the microphone and the memory, configured to control the electronic apparatus, wherein the processor is configured to selectively execute at least one keyword recognition model among the plurality of keyword recognition models based on operating state information of the electronic apparatus, based on a first user voice being input through the microphone, identify whether at least one keyword corresponding to the executed keyword recognition model is included in the first user voice by using the executed keyword recognition model, and based on at least one keyword identified as being included in the first user voice, perform an operation of the electronic apparatus corresponding to the at least one keyword.
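A minimal sketch of the selection step, assuming each operating state maps to its own keyword set; real keyword recognition models are acoustic, so plain substring matching stands in here, and all state and keyword names are invented.

```python
# Select a keyword "model" from the device's operating state, then
# check whether the utterance contains one of that model's keywords.

KEYWORD_MODELS = {
    "music_playing": {"pause", "skip", "volume up"},
    "alarm_ringing": {"stop", "snooze"},
    "idle": {"hello"},
}

def spot_keyword(operating_state, utterance):
    model = KEYWORD_MODELS.get(operating_state, set())  # state-based selection
    text = utterance.lower()
    for keyword in model:
        if keyword in text:
            return keyword  # device performs the operation for this keyword
    return None
```

The point of the state-based selection is that "snooze" is only listened for while an alarm is ringing, not while music plays.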

Systems and methods for response selection in multi-party conversations with dynamic topic tracking

Embodiments described herein provide a dynamic topic tracking mechanism that tracks how conversation topics change from one utterance to another and uses the tracking information to rank candidate responses. A pre-trained language model may be used for response selection in multi-party conversations. The approach consists of two steps: (1) topic-based pre-training that embeds topic information into the language model with self-supervised learning, and (2) multi-task learning on the pre-trained model that jointly trains response selection with dynamic topic prediction and disentanglement tasks.
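A toy version of topic-aware response ranking: track a bag-of-words "topic" over the conversation history and rank candidates by overlap with it. The patent's mechanism is a pre-trained language model; this overlap score is only a stand-in to show the ranking shape.

```python
# Rank candidate responses by overlap with the running conversation
# topic, represented here as a word-frequency Counter.

from collections import Counter

def topic_vector(utterances):
    words = Counter()
    for u in utterances:
        words.update(u.lower().split())
    return words

def rank_responses(history, candidates):
    topic = topic_vector(history)
    def score(response):
        # Sum the topic frequency of each word in the candidate.
        return sum(topic[w] for w in response.lower().split())
    return sorted(candidates, key=score, reverse=True)
```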

System and method for context-enriched attentive memory network with global and local encoding for dialogue breakdown detection

A method, an electronic device and computer readable medium for dialogue breakdown detection are provided. The method includes obtaining a verbal input from an audio sensor. The method also includes generating a reply to the verbal input. The method additionally includes identifying a local context from the verbal input and a global context from the verbal input, additional verbal inputs previously received by the audio sensor, and previous replies generated in response to the additional verbal inputs. The method further includes identifying a dialogue breakdown in response to determining that the reply does not correspond to the local context and the global context. In addition, the method includes generating sound corresponding to the reply through a speaker when the dialogue breakdown is not identified.
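The breakdown test above is concrete enough to sketch: a reply must relate to both the local context (the latest input) and the global context (the whole exchange). Word overlap stands in for the patent's attentive memory network; all names are illustrative.

```python
# Flag a dialogue breakdown when the generated reply matches neither
# the local context nor the global context, per the described method.

def overlaps(reply, context_words):
    return bool(set(reply.lower().split()) & context_words)

def breakdown_detected(reply, latest_input, full_history):
    local_ctx = set(latest_input.lower().split())
    global_ctx = set(w for turn in full_history for w in turn.lower().split())
    # No breakdown only if the reply corresponds to both contexts;
    # otherwise the reply is withheld rather than spoken.
    return not (overlaps(reply, local_ctx) and overlaps(reply, global_ctx))
```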

SYSTEM AND METHOD FOR CONTROLLING A PLURALITY OF DEVICES

Provided is a system and method for controlling a plurality of devices. The method includes generating a command script by processing a text string with at least one model, the text string including a natural language input by a user, modifying the command script based on contextual data, the command script including a configuration for at least one device, generating at least one command signal based on the command script, and controlling at least one device based on the at least one command signal.
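The pipeline shape described above can be sketched end to end; the "model" here is a trivial keyword parse, and the script fields, context keys, and signal format are all assumptions.

```python
# text string -> command script -> contextual modification -> signals.

def generate_script(text):
    # "generating a command script by processing a text string"
    script = {"device": None, "action": None}
    words = text.lower().split()
    if "lights" in words:
        script["device"] = "lights"
    if "on" in words:
        script["action"] = "turn_on"
    return script

def apply_context(script, context):
    # "modifying the command script based on contextual data",
    # e.g. scoping a generic device to the room the user is in.
    if script["device"] == "lights" and "room" in context:
        script["device"] = f"{context['room']}_lights"
    return script

def to_signals(script):
    # "generating at least one command signal based on the command script"
    return [f"{script['device']}:{script['action']}"]
```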

Dynamically assigning multi-modality circumstantial data to assistant action requests for correlating with subsequent requests

Implementations set forth herein relate to an automated assistant that uses circumstantial condition data, generated based on circumstantial conditions of an input, to determine whether the input should affect an action that has been initialized by a particular user. The automated assistant can allow each user to manipulate their respective ongoing action without necessitating interruptions for soliciting explicit user authentication. For example, when an individual in a group of persons interacts with the automated assistant to initialize or affect a particular ongoing action, the automated assistant can generate data that correlates that individual to the particular ongoing action. The data can be generated using a variety of different input modalities, which can be dynamically selected based on changing circumstances of the individual. Therefore, different sets of input modalities can be processed each time a user provides an input for modifying an ongoing action and/or initializing another action.
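A sketch of the correlation bookkeeping, assuming each ongoing action stores the circumstantial condition data (modality signals) captured when it was started, and a later input may affect the action only if its own signals match. The matching rule and field names are illustrative.

```python
# Correlate ongoing actions with the circumstantial conditions of the
# user who started them, so a later input is checked against whichever
# modalities (voice signature, device, location, ...) both records share.

class ActionRegistry:
    def __init__(self):
        self.actions = {}  # action_id -> circumstantial condition data

    def start_action(self, action_id, conditions):
        self.actions[action_id] = conditions

    def owner_matches(self, action_id, input_conditions):
        stored = self.actions.get(action_id, {})
        shared = set(stored) & set(input_conditions)
        # Dynamically selected modalities: only the shared ones are compared.
        return bool(shared) and all(stored[k] == input_conditions[k]
                                    for k in shared)
```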

Methods and systems for generating domain-specific text summarizations

Embodiments provide methods and systems for generating a domain-specific text summary. A method performed by a processor includes receiving a request to generate a text summary of textual content from a user device of a user and applying a pre-trained language generation model over the textual content to encode it into word embedding vectors. The method includes predicting the current word of the text summary by iteratively performing: generating a first probability distribution over a first set of words using a first decoder based on the word embedding vectors, generating a second probability distribution over a second set of words using a second decoder based on the word embedding vectors, and ensembling the first and second probability distributions using a configurable weight parameter to determine the current word. The first probability distribution indicates the selection probability of each word being selected as the current word. The method includes providing a custom reward score as feedback to the second decoder based on a custom reward model and modifying the second probability distribution of words for the text summary based on the feedback.
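The ensembling step is concrete enough to show numerically: blend the two decoders' word distributions with a configurable weight and take the argmax as the current word. The distributions below are hand-written stand-ins for decoder outputs.

```python
# Weighted ensemble of two decoders' next-word distributions:
# p(w) = weight * p_first(w) + (1 - weight) * p_second(w),
# with the current word chosen as the argmax of the blend.

def ensemble_next_word(p_first, p_second, weight=0.5):
    vocab = set(p_first) | set(p_second)
    blended = {w: weight * p_first.get(w, 0.0)
                  + (1 - weight) * p_second.get(w, 0.0)
               for w in vocab}
    return max(blended, key=blended.get)
```

Raising `weight` shifts the choice toward the first decoder, which is how the configurable weight parameter trades the two decoders off.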

Enabling speech interactions on web-based user interfaces

Web content with a speech interaction user interface capability is provided. Interactable elements of the web content are identified. For each of the interactable elements, one or more associated identifiers are determined and associated with a corresponding interactable element of the identified interactable elements in a data structure. A speech input is received from a user. Using the data structure, one of the interactable elements is matched to the received speech input. An action is automatically performed on the matched interactable element.
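A sketch of the identifier table and matching step: each interactable element gets one or more spoken identifiers in a data structure, and a speech input is matched against it to pick the element to act on. Extracting identifiers from real DOM elements is out of scope; the table contents are invented.

```python
# Data structure mapping interactable elements to their spoken
# identifiers, plus the match step over a received speech input.

ELEMENT_IDENTIFIERS = {
    "btn-checkout": ["checkout", "buy now"],
    "link-home": ["home", "homepage"],
    "input-search": ["search", "search box"],
}

def match_element(speech_input):
    text = speech_input.lower()
    for element_id, identifiers in ELEMENT_IDENTIFIERS.items():
        if any(ident in text for ident in identifiers):
            return element_id  # the action is performed on this element
    return None
```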