
Processing Multimodal User Input for Assistant Systems
20230222605 · 2023-07-13

In one embodiment, a method includes receiving at a head-mounted device a speech input from a user and a visual input captured by cameras of the head-mounted device, wherein the visual input comprises subjects and attributes associated with the subjects, and wherein the speech input comprises a co-reference to one or more of the subjects, resolving entities corresponding to the subjects associated with the co-reference based on the attributes and the co-reference, and presenting a communication content responsive to the speech input and the visual input at the head-mounted device, wherein the communication content comprises information associated with executing results of tasks corresponding to the resolved entities.
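
The entity-resolution step can be sketched as a simple attribute-overlap match between the words of the spoken co-reference and the attributes detected for each visual subject. This is an illustrative assumption, not the patented implementation; the subject dictionaries, attribute sets, and scoring are all made up.

```python
def resolve_coreference(coreference_words, subjects):
    """Return entities whose attributes overlap the co-reference words.

    subjects: list of dicts like {"entity": "mug-1", "attributes": {"red", "mug"}}
    """
    words = set(coreference_words)
    matches = [s for s in subjects if words & s["attributes"]]
    # Prefer the subject sharing the most attribute words with the utterance.
    matches.sort(key=lambda s: len(words & s["attributes"]), reverse=True)
    return [s["entity"] for s in matches]

subjects = [
    {"entity": "mug-1", "attributes": {"red", "mug"}},
    {"entity": "mug-2", "attributes": {"blue", "mug"}},
]
print(resolve_coreference(["red", "one"], subjects))  # ['mug-1']
```

A real resolver would also use gaze, gesture, and dialog history; attribute overlap only illustrates the interface between detected subjects and the spoken reference.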

Auto-completion for gesture-input in assistant systems

In one embodiment, a method includes receiving, from a client system associated with a first user, an initial input in a first modality from the first user, determining one or more intents corresponding to the initial input by an intent-understanding module, generating one or more candidate continuation-inputs based on the one or more intents, wherein the one or more candidate continuation-inputs are in one or more candidate modalities, respectively, and wherein the candidate modalities are different from the first modality, and sending instructions for presenting one or more suggested inputs corresponding to one or more of the candidate continuation-inputs to the client system.
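
The cross-modality constraint can be sketched as a lookup that maps each inferred intent to candidate continuations and keeps only those in a modality other than the initial one. The intent names and catalog entries below are invented for illustration.

```python
def suggest_continuations(first_modality, intents, catalog):
    """catalog: intent -> list of (modality, suggested input)."""
    out = []
    for intent in intents:
        for modality, text in catalog.get(intent, []):
            if modality != first_modality:  # continuation must differ in modality
                out.append((modality, text))
    return out

catalog = {
    "set_reminder": [("speech", "Remind me at 5 pm"), ("gesture", "thumbs-up")],
}
print(suggest_continuations("gesture", ["set_reminder"], catalog))
# [('speech', 'Remind me at 5 pm')]
```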

Assisting Users with Efficient Information Sharing among Social Connections
20220374460 · 2022-11-24

In one embodiment, a method includes receiving a user input from a first user at a first client system, determining that the user input is a sharing request to share content, determining multiple second users the sharing request is directed to, determining, for each second user, modalities associated with the respective second user based on the content, a user profile associated with the respective second user, and modalities supported by a second client system the respective second user is currently engaged with, the respective second user being associated with two or more second client systems, and sending, to one or more second client systems currently associated with the second users, instructions for accessing the content based on the determined modalities for each second user.
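
The per-recipient modality choice can be sketched as intersecting what the content requires, what the recipient prefers, and what the recipient's active device supports. The field names and content-type table are assumptions for illustration.

```python
def pick_modality(content_type, profile, active_device_modalities):
    # Content constrains the candidates: e.g. a photo needs a visual modality.
    candidates = {"photo": ["visual"], "message": ["visual", "audio"]}[content_type]
    # Honor the recipient's preference order, limited to the active device.
    for m in profile.get("preferred", []) + candidates:
        if m in candidates and m in active_device_modalities:
            return m
    return None  # no supported modality on the currently engaged device

profile = {"preferred": ["audio"]}
print(pick_modality("message", profile, {"audio", "visual"}))  # 'audio'
print(pick_modality("photo", profile, {"audio"}))              # None
```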

Intent Identification for Agent Matching by Assistant Systems

In one embodiment, a method includes receiving a user request from a first user at a client system, wherein the user request is associated with a semantic-intent, identifying dialog-intents associated with the user request by the client system based on the semantic-intent and context information associated with the user request, wherein each dialog-intent is a sub-intent of the semantic-intent, determining agents for executing tasks associated with the dialog-intents by the client system, and presenting information returned from the agents responsive to executing the tasks at the client system.
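
A minimal sketch of the decomposition-and-matching flow: a semantic-intent is split into dialog-intents, each routed to an agent. The intent and agent tables below are invented; a production system would learn these mappings rather than hard-code them.

```python
SUB_INTENTS = {
    "plan_dinner": ["find_restaurant", "book_table"],  # dialog-intents (sub-intents)
}
AGENT_FOR = {
    "find_restaurant": "search_agent",
    "book_table": "reservation_agent",
}

def match_agents(semantic_intent):
    """Return (dialog_intent, agent) pairs for a semantic-intent."""
    dialog_intents = SUB_INTENTS.get(semantic_intent, [])
    return [(di, AGENT_FOR[di]) for di in dialog_intents]

print(match_agents("plan_dinner"))
# [('find_restaurant', 'search_agent'), ('book_table', 'reservation_agent')]
```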

Generating Multi-Perspective Responses by Assistant Systems

In one embodiment, a method includes receiving a user query inputted on a head-mounted device from the head-mounted device, wherein the user query corresponds to multiple dialog-intents, executing multiple tasks corresponding to the multiple dialog-intents, generating a multi-perspective response by a stitching model based on two or more of execution results of the multiple tasks, wherein the stitching model combines the two or more of the execution results based on natural language processing, and wherein the multi-perspective response comprises a natural-language response combining the two or more execution results, and sending instructions to the head-mounted device for presenting the multi-perspective response on the head-mounted device.
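
The abstract's stitching model combines execution results via natural language processing; a plain template join can only illustrate the interface, so treat the sketch below as a stand-in, not the patented model.

```python
def stitch(results):
    """results: list of short natural-language fragments, one per executed task."""
    if not results:
        return ""
    if len(results) == 1:
        return results[0]
    # Join all but the last fragment, then connect the last with ", and ".
    return ", and ".join([", ".join(results[:-1]), results[-1]])

print(stitch(["It is 72°F in Menlo Park", "rain is expected after 6 pm"]))
# It is 72°F in Menlo Park, and rain is expected after 6 pm
```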

Processing multimodal user input for assistant systems

In one embodiment, a method includes receiving a user input based on a plurality of modalities at a client system, wherein at least one of the modalities of the user input is a visual modality, determining one or more subjects and one or more attributes associated with the one or more subjects, respectively, based on the visual modality of the user input, resolving one or more entities corresponding to the one or more subjects based on the determined one or more attributes, and presenting a communication content at the client system responsive to the user input, wherein the communication content comprises information associated with executing results of one or more tasks corresponding to the one or more resolved entities.

Predictive injection of conversation fillers for assistant systems

In one embodiment, a method includes, by one or more computing systems, receiving, from a client system associated with a first user, a first user input from the first user, identifying one or more entities referenced by the first user input, determining a classification of the first user input based on a machine-learning classifier model, generating a plurality of candidate conversational fillers based on the classification of the first user input and the one or more identified entities, wherein each candidate conversational filler references at least one of the one or more identified entities, ranking the candidate conversational fillers based on a relevancy of the candidate conversational filler to the first user input and a decay-model hysteresis, and sending instructions for presenting a top-ranked candidate conversational filler as an initial response to the first user.
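
The ranking step can be sketched as relevancy damped by recency of use, a crude stand-in for the abstract's decay-model hysteresis. The decay formula and all numbers below are assumptions.

```python
def rank_fillers(candidates, last_used_turns_ago, decay=0.5):
    """candidates: list of (filler_text, relevancy).

    Fillers used recently are damped so the assistant does not repeat itself.
    """
    def score(item):
        text, relevancy = item
        turns = last_used_turns_ago.get(text, 100)  # 100 ~= "not used recently"
        return relevancy * (1 - decay ** turns)
    return sorted(candidates, key=score, reverse=True)

candidates = [("Checking that for you...", 0.9), ("One moment...", 0.6)]
recency = {"Checking that for you...": 1}  # used on the previous turn
print(rank_fillers(candidates, recency)[0][0])  # 'One moment...'
```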

Assisting users with efficient information sharing among social connections

In one embodiment, a method includes receiving a sharing request to share content generated during a current dialog session from a client system associated with a first user, identifying one or more content objects associated with the sharing request based on a natural-language understanding module, wherein the one or more content objects were previously generated during the current dialog session, determining one or more second users the sharing request is directed to based on a user profile associated with the first user, and sending instructions for accessing one or more of the identified content objects to one or more client systems associated with the one or more second users.
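
Identifying which session content a sharing request refers to can be sketched as keyword overlap between the request and each previously generated content object; this is an illustrative stand-in for the natural-language understanding module, with made-up object IDs and keywords.

```python
def find_shareable(request_text, session_objects):
    """session_objects: dicts with 'id' and 'keywords', generated this session."""
    words = set(request_text.lower().split())
    return [o["id"] for o in session_objects if words & o["keywords"]]

session = [
    {"id": "photo-42", "keywords": {"photo", "sunset"}},
    {"id": "list-7", "keywords": {"grocery", "list"}},
]
print(find_shareable("share that sunset photo with Mom", session))  # ['photo-42']
```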

Method and apparatus for pre-fetching place page data for subsequent display on a mobile computing device
09813521 · 2017-11-07

A computer-implemented method and system for pre-fetching place page data from a remote mapping system for display on a client computing device is disclosed. User preference data collected from various data sources including applications executing on the client device, online or local user profiles, and other sources may be analyzed to generate a request for place page data from the remote mapping system. The user preference data may indicate a map feature such as a place of business, park, or historic landmark having the characteristics of both a user's preferred geographic location and the user's personal interests. For example, where the user indicates a geographic preference for “Boston” and a personal interest for “home brewing,” the system and method may request place page data for all home brewing or craft beer-related map features near Boston.
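
The "Boston" plus "home brewing" example can be sketched as combining a geographic preference with personal interests into a pre-fetch request; the request shape and field names below are assumptions, not the claimed wire format.

```python
def build_prefetch_request(preferences):
    """Combine a geographic preference and interests into one pre-fetch request."""
    geo = preferences.get("geographic")
    interests = preferences.get("interests", [])
    if not geo or not interests:
        return None  # nothing to pre-fetch without both signals
    return {"near": geo, "features": interests}

prefs = {"geographic": "Boston", "interests": ["home brewing", "craft beer"]}
print(build_prefetch_request(prefs))
# {'near': 'Boston', 'features': ['home brewing', 'craft beer']}
```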

Personalized gesture recognition for user interaction with assistant systems

In one embodiment, a method includes receiving a user request from a first user at a client system associated with the first user, wherein the user request comprises a gesture-input from the first user and a speech-input from the first user, determining an intent corresponding to the user request based on the gesture-input by a personalized gesture-classification model associated with the first user, executing one or more tasks based on the determined intent and the speech-input, and sending instructions for presenting execution results of the one or more tasks to the client system responsive to the user request.
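
A toy sketch of a personalized gesture-classification model: a per-user nearest-centroid classifier over gesture feature vectors. The feature vectors, intent labels, and squared-distance metric are all illustrative assumptions, not the patented model.

```python
def classify_gesture(features, user_centroids):
    """user_centroids: intent -> feature vector learned for this user."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Pick the intent whose learned centroid is closest to the observed gesture.
    return min(user_centroids, key=lambda intent: dist(features, user_centroids[intent]))

centroids = {"confirm": (1.0, 0.0), "dismiss": (0.0, 1.0)}  # e.g. nod vs. wave
print(classify_gesture((0.9, 0.1), centroids))  # 'confirm'
```

Per-user centroids capture the personalization in the abstract: the same raw gesture can map to different intents for different users.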