G10L2015/227

AUTOMATED ASSISTANTS THAT ACCOMMODATE MULTIPLE AGE GROUPS AND/OR VOCABULARY LEVELS

Techniques are described herein for enabling an automated assistant to adjust its behavior depending on a detected age range and/or “vocabulary level” of a user who is engaging with the automated assistant. In various implementations, data indicative of a user's utterance may be used to estimate one or more of the user's age range and/or vocabulary level. The estimated age range/vocabulary level may be used to influence various aspects of a data processing pipeline employed by an automated assistant. In various implementations, aspects of the data processing pipeline that may be influenced by the user's age range/vocabulary level may include one or more of automated assistant invocation, speech-to-text (“STT”) processing, intent matching, intent resolution (or fulfillment), natural language generation, and/or text-to-speech (“TTS”) processing. In some implementations, one or more tolerance thresholds associated with one or more of these aspects, such as grammatical tolerances, vocabularic tolerances, etc., may be adjusted.
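
The threshold adjustment described in this abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the publication: the `Tolerances` type, the age-range labels, the numeric values, and the rule that a higher tolerance lowers the score an intent match must reach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerances:
    grammar: float     # higher = more forgiving of ungrammatical input
    vocabulary: float  # higher = more forgiving of unknown or misused words


# Assumed, illustrative values; the abstract gives no numbers.
TOLERANCES_BY_AGE_RANGE = {
    "child": Tolerances(grammar=0.8, vocabulary=0.8),
    "adult": Tolerances(grammar=0.3, vocabulary=0.3),
}


def tolerances_for(age_range: str) -> Tolerances:
    # Fall back to the stricter adult profile when the estimate is unknown.
    return TOLERANCES_BY_AGE_RANGE.get(age_range, TOLERANCES_BY_AGE_RANGE["adult"])


def accept_intent_match(match_score: float, age_range: str) -> bool:
    # Assumed rule: the required score drops as grammatical tolerance rises.
    required = 1.0 - tolerances_for(age_range).grammar
    return match_score >= required
```

Under these assumed values, a loosely phrased utterance scoring 0.5 would be accepted for an estimated child speaker and rejected for an adult.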

System and Method for Automated Digital Twin Behavior Modeling for Multimodal Conversations
20230099393 · 2023-03-30

Methods and systems for a multimodal conversational system are described. A method for interactive multimodal conversation includes parsing a multimodal conversation from a physical human for content, recognizing and sensing one or more items of multimodal content from the parsed content, identifying verbal and non-verbal behavior of the physical human from the multimodal content, generating learned patterns from the identified verbal and non-verbal behavior of the physical human, training a multimodal dialog manager using the learned patterns to provide responses to end-user multimodal conversations and queries, and training a virtual human clone of the physical human with interactive verbal and non-verbal behaviors of the physical human, wherein appropriate interactive verbal and non-verbal behaviors are provided by the virtual human clone when providing the responses to the end-user multimodal conversations and queries.

DYNAMIC ADAPTATION OF GRAPHICAL USER INTERFACE ELEMENTS BY AN AUTOMATED ASSISTANT AS A USER ITERATIVELY PROVIDES A SPOKEN UTTERANCE, OR SEQUENCE OF SPOKEN UTTERANCES
20230035713 · 2023-02-02

Implementations described herein relate to an automated assistant that iteratively renders various GUI elements as a user iteratively provides a spoken utterance, or sequence of spoken utterances, corresponding to a request directed to the automated assistant. These various GUI elements can be dynamically adapted as the user iteratively provides the spoken utterance to assist the user with efficiently completing the request. In some implementations, a generic container graphical element associated with candidate intent(s) can be initially rendered at a display interface of a computing device and dynamically adapted with tailored container graphical elements as a particular intent is determined while the user iteratively provides the spoken utterance. In additional or alternative implementations, the tailored container graphical elements can include a current status of one or more settings associated with the computing device or additional computing device(s) such that the user can view the current status while completing the spoken utterance.

ELECTRONIC DEVICE AND METHOD FOR CONTROLLING THE SAME

An electronic device is provided. The electronic device includes a microphone to receive audio, a communicator, a memory configured to store computer-executable instructions, and a processor configured to execute the computer-executable instructions. The processor is configured to: determine whether the received audio includes a predetermined trigger word; based on determining that the predetermined trigger word is included in the received audio, activate a speech recognition function of the electronic device; detect a movement of a user while the speech recognition function is activated; and, based on detecting the movement of the user, transmit a control signal to a second electronic device to activate a speech recognition function of the second electronic device.
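
The sequence in this abstract (trigger word, activation, movement, handoff) can be sketched as a small state model. The `Device` class, its method names, and the default trigger word are hypothetical; setting a flag on the second device stands in for transmitting the control signal through the communicator.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    trigger_word: str = "hello"  # assumed trigger word, for illustration only
    recognition_active: bool = False

    def on_audio(self, transcript: str) -> None:
        # Activate speech recognition only if the trigger word was heard.
        if self.trigger_word in transcript.lower().split():
            self.recognition_active = True

    def on_user_movement(self, second_device: "Device") -> bool:
        # Movement matters only while speech recognition is active here.
        if not self.recognition_active:
            return False
        # Stand-in for the control signal sent to the second device.
        second_device.recognition_active = True
        return True
```

In this sketch, movement detected before the trigger word has no effect, which mirrors the condition that movement is detected while the speech recognition function is activated.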

CALL ROUTING BASED ON TECHNICAL SKILLS OF USERS

Aspects of call routing based on technical skills of users are discussed. Responses to a set of questions posed to a user are received to assess a technical skill level of the user. The user may be categorized into one of a plurality of categories based on the technical skill level, and a decision may be provided to route a call from the user to either a human agent or a virtual agent based on the categorization.
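
The assess, categorize, and route steps can be sketched as follows. The scoring rule (fraction of correct answers), the category names, the cut-offs, and the mapping from category to agent type are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import Mapping


def skill_level(responses: Mapping[str, bool]) -> float:
    # Assumed scoring: fraction of questions answered correctly.
    if not responses:
        return 0.0
    return sum(responses.values()) / len(responses)


def categorize(level: float) -> str:
    # Assumed cut-offs for three hypothetical categories.
    if level >= 0.75:
        return "advanced"
    if level >= 0.4:
        return "intermediate"
    return "novice"


# Assumed policy: less technical callers are sent to a human agent.
ROUTE_BY_CATEGORY = {
    "advanced": "virtual_agent",
    "intermediate": "virtual_agent",
    "novice": "human_agent",
}


def route_call(responses: Mapping[str, bool]) -> str:
    return ROUTE_BY_CATEGORY[categorize(skill_level(responses))]
```

Keeping the policy in a lookup table means the routing decision can be changed without touching the scoring or categorization logic.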

METHOD AND SYSTEM PROVIDING SERVICE BASED ON USER VOICE
20220351730 · 2022-11-03

A method for providing a service based on a user's voice includes steps of extracting a voice of a first user, generating text information or voice waveform information based on the voice of the first user, analyzing a disposition of the first user based on the text information and the voice waveform information, and then selecting a second user corresponding to the disposition of the first user based on the analysis result, providing the first user with a conversation connection service with the second user and acquiring information on a change in an emotional state of the first user based on conversation information between the first user and the second user, and re-selecting the second user corresponding to the disposition of the first user based on the acquired information on the change in the emotional state of the first user.

CONFERENCE GALLERY VIEW INTELLIGENCE SYSTEM
20220353465 · 2022-11-03

A conference gallery view intelligence system determines regions of interest for display within views of conferencing software based on input streams received from devices within a conference room during a conference. Conference participants are detected in the conference room based on an input video stream received from a video capture device. A direction of audio from the conference participants is determined based on an input audio stream received from a multi-directional audio capture device. A conversational context within the conference room is then determined based on the direction of the audio and locations of the one or more conference participants in the conference room. A region of interest to output within conferencing software is determined based on the conversational context, and the region of interest is output for display within a view of the conferencing software.

Natural human-computer interaction for virtual personal assistant systems
11609631 · 2023-03-21

Technologies for natural language interactions with virtual personal assistant systems include a computing device configured to capture audio input, distort the audio input to produce a number of distorted audio variations, and perform speech recognition on the audio input and the distorted audio variants. The computing device selects a result from a large number of potential speech recognition results based on contextual information. The computing device may measure a user's engagement level by using an eye tracking sensor to determine whether the user is visually focused on an avatar rendered by the virtual personal assistant. The avatar may be rendered in a disengaged state, a ready state, or an engaged state based on the user engagement level. The avatar may be rendered as semitransparent in the disengaged state, and the transparency may be reduced in the ready state or the engaged state. Other embodiments are described and claimed.
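
The avatar rendering states described here can be sketched as a mapping from an engagement measure to a state and a transparency value. The gaze-duration thresholds and the transparency numbers are assumptions; the abstract states only that the avatar is semitransparent when disengaged and less transparent when ready or engaged.

```python
from enum import Enum


class AvatarState(Enum):
    DISENGAGED = "disengaged"
    READY = "ready"
    ENGAGED = "engaged"


# Assumed transparency values (0.0 = fully opaque).
TRANSPARENCY = {
    AvatarState.DISENGAGED: 0.5,
    AvatarState.READY: 0.25,
    AvatarState.ENGAGED: 0.0,
}


def avatar_state(gaze_on_avatar_seconds: float,
                 ready_after: float = 0.5,
                 engaged_after: float = 2.0) -> AvatarState:
    # Assumed rule: longer visual focus on the avatar means higher engagement.
    if gaze_on_avatar_seconds >= engaged_after:
        return AvatarState.ENGAGED
    if gaze_on_avatar_seconds >= ready_after:
        return AvatarState.READY
    return AvatarState.DISENGAGED
```

A renderer could look up `TRANSPARENCY[avatar_state(t)]` on each frame, where `t` is the time the eye tracking sensor reports the user has been focused on the avatar.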

Providing audio information with a digital assistant

In an exemplary technique for providing audio information, an input is received, and audio information responsive to the received input is provided using a speaker. While providing the audio information, an external sound is detected. If it is determined that the external sound is a communication of a first type, then the provision of the audio information is stopped. If it is determined that the external sound is a communication of a second type, then the provision of the audio information continues.
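
The stop-or-continue decision can be sketched with a simple playback loop. The `CommunicationType` enum, the chunked playback model, and the `detected` mapping (chunk index to the type of external sound heard during that chunk) are hypothetical; how a sound is classified into the first or second type is outside this sketch.

```python
from enum import Enum, auto
from typing import Dict, List


class CommunicationType(Enum):
    FIRST = auto()   # per the abstract, stops the audio information
    SECOND = auto()  # per the abstract, lets the audio information continue


def provide_audio(chunks: List[str],
                  detected: Dict[int, CommunicationType]) -> List[str]:
    """Return the chunks actually played before any first-type interruption."""
    played = []
    for index, chunk in enumerate(chunks):
        played.append(chunk)
        # An external sound detected while this chunk was being provided.
        if detected.get(index) is CommunicationType.FIRST:
            break
    return played
```

For example, a first-type communication detected during the second of three chunks ends playback after that chunk, while a second-type communication at the same point leaves all three chunks played.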

Tele-health networking, interaction, and care matching tool and methods of use

An integrated tele-health networking, interaction, and care-matching tool for tele-health services may include a tele-health operations server in communication with patient, provider, matching, and interaction databases. A management engine running on the server may execute a database management module, a rule module, and a GUI module configured to display a GUI having a plurality of preconfigured screens at a plurality of user terminals. In operation, the management engine implements a referral-based care network that connects a patient with healthcare providers, non-medical-professional caregivers, advocates, and friends or family via the user terminals. The management engine also receives rating and personality assessment information from the patient and records interaction variables and emotional reaction information from care interactions and uses that information to respectively match the patient with an optimal healthcare provider(s) selected from the network and to determine an empathy meter score for the care interaction. Other embodiments are disclosed.