G10L15/1815

Virtual personal agent leveraging natural language processing and machine learning

Inter-virtual agent communication between communication devices owned by different users is provided. A first communication channel and a second communication channel are established with a remote data processing system. A virtual agent-to-virtual agent handshake is performed during establishment of the first communication channel. Virtual agent commands are exchanged with a remote virtual agent located on the remote data processing system via the first communication channel. An action corresponding to a virtual agent command received from the remote virtual agent is performed while a human conversation is conducted via the second communication channel.

INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD
20230012053 · 2023-01-12

An information processing device according to the present disclosure includes: an acquisition unit that acquires outline information indicating an outline of a user who makes a body motion; and a specification unit that specifies, among body parts, a main part corresponding to the body motion and a related part, which is to be a target of correction processing of motion information corresponding to the body motion, on the basis of the outline information acquired by the acquisition unit.

DETECTING AN IN-FIELD EVENT
20230010941 · 2023-01-12

Examples are disclosed that relate to methods, computing devices, and systems for detecting an in-field event. One example provides a method comprising, during a training phase, receiving one or more training data streams. The training data stream(s) include an audio input comprising a semantic indicator. The audio input is processed to recognize the semantic indicator. A subset of data is selected and used to train a machine learning model to detect the in-field event, and the method further comprises outputting the trained machine learning model. During a run-time phase, the method comprises receiving one or more run-time input data streams. The trained machine learning model is used to detect a second instance of the in-field event in the one or more run-time input data streams. The method further comprises outputting an indication of the second instance of the in-field event.

Natural language processing routing

Devices and techniques are generally described for a speech processing routing architecture. In various examples, first data comprising a first feature definition is received. The first feature definition may include a first indication of first source data and first instructions for generating feature data using the first source data. In various examples, the feature data may be generated according to the first feature definition. In some examples, a speech processing system may receive a first request to process a first utterance. The feature data may be retrieved from a non-transitory computer-readable memory. The speech processing system may determine a first skill for processing the first utterance based at least in part on the feature data.
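The routing flow in this abstract can be illustrated with a minimal sketch. Everything named here (`FeatureDefinition`, `SOURCE_DATA`, `FEATURE_STORE`, the `music_affinity` feature, and the routing rule) is a hypothetical stand-in chosen for illustration; the abstract does not specify any of these details.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FeatureDefinition:
    name: str                              # name under which the feature is stored
    source: str                            # indication of the source data
    compute: Callable[[List[str]], float]  # instructions for generating the feature


# Hypothetical source data: skills a user invoked recently.
SOURCE_DATA: Dict[str, List[str]] = {"recent_skills": ["music", "music", "weather"]}

# Stand-in for the memory from which feature data is later retrieved.
FEATURE_STORE: Dict[str, float] = {}


def generate_feature(definition: FeatureDefinition) -> None:
    """Generate feature data according to the definition and store it."""
    FEATURE_STORE[definition.name] = definition.compute(SOURCE_DATA[definition.source])


def route(utterance: str, candidates: List[str]) -> str:
    """Pick a skill for the utterance, based in part on stored feature data."""
    affinity = FEATURE_STORE.get("music_affinity", 0.0)
    if "play" in utterance.lower() and affinity > 0.5 and "music" in candidates:
        return "music"
    return candidates[0]  # assumed fallback: first candidate skill


generate_feature(
    FeatureDefinition(
        name="music_affinity",
        source="recent_skills",
        compute=lambda skills: skills.count("music") / len(skills),
    )
)
```

With the sample data above, `route("Play something", ["weather", "music"])` selects the music skill because the stored affinity is 2/3, while an utterance without "play" falls back to the first candidate.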

Multimodal sentiment classification

Sentiment classification can be implemented by an entity-level multimodal sentiment classification neural network. The neural network can include left, right, and target entity subnetworks. The neural network can further include an image network that generates representation data that is combined and weighted with data output by the left, right, and target entity subnetworks to output a sentiment classification for an entity included in a network post.

Method and apparatus for predicting customer satisfaction from a conversation
11553085 · 2023-01-10

A method and an apparatus for predicting satisfaction of a customer pursuant to a call between the customer and an agent, in which the method comprises receiving a transcribed text of the call, dividing the transcribed text into a plurality of phases of a conversation, extracting at least one call feature for each of the plurality of phases, receiving call metadata, extracting metadata features from the call metadata, combining the call features and the metadata features, and generating an output, using a trained machine learning (ML) model, based on the combined features, indicating whether the customer is satisfied or not. The ML model is trained to generate an output indicating whether the customer is satisfied or not, based on an input of the combined features.
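The sequence of steps in this abstract (phase division, per-phase call features, metadata features, combination, model output) can be sketched as follows. The negative-word lexicon, the equal-size phase split, the metadata keys `duration_s` and `transfers`, and `stand_in_model` are all assumptions made for illustration; a real system would use a trained ML model rather than the hand-written rule shown here.

```python
# Illustrative lexicon; the abstract does not say which call features are extracted.
NEGATIVE_WORDS = {"angry", "cancel", "terrible", "refund"}


def divide_into_phases(turns, n_phases=3):
    """Split transcript turns into at most n_phases roughly equal phases."""
    size = max(1, -(-len(turns) // n_phases))  # ceiling division
    return [turns[i:i + size] for i in range(0, len(turns), size)]


def call_features(phases):
    """One feature per phase: the number of negative words spoken in it."""
    return [
        sum(
            word.strip(".,!?").lower() in NEGATIVE_WORDS
            for turn in phase
            for word in turn.split()
        )
        for phase in phases
    ]


def metadata_features(metadata):
    """Two assumed metadata features: call length in minutes and transfer count."""
    return [metadata["duration_s"] / 60.0, float(metadata["transfers"])]


def stand_in_model(features):
    """Placeholder for a trained ML model: a fixed rule over the combined features."""
    negatives = sum(features[:-2])  # all per-phase counts
    transfers = features[-1]
    return negatives < 2 and transfers < 2


turns = [
    "Hello I need help",
    "Sure what is wrong",
    "My bill is terrible",
    "Let me fix that",
    "Thank you",
    "You are welcome",
]
phases = divide_into_phases(turns)
combined = call_features(phases) + metadata_features({"duration_s": 300, "transfers": 1})
satisfied = stand_in_model(combined)
```

Keeping the per-phase features separate, rather than summing them over the whole call, preserves where in the conversation a signal occurred, which is the point of dividing the transcript into phases.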

Systems and methods to automatically perform actions based on media content

Systems and methods are provided for automatically responding to network connectivity issues in a media stream. One example method includes transmitting, from a first computing device, a media stream to one or more secondary computing devices. A network connectivity issue between the first computing device and one or more of the secondary computing devices is detected. If a network connectivity issue is detected, a notification is transmitted to one or more of the secondary computing devices.

Determining topics and action items from conversations

Embodiments are directed to organizing conversation information. Two or more machine learning (ML) models and a plurality of sentences provided from a conversation may be employed to generate an insight score for each sentence such that each insight score correlates to a probability that its sentence includes one or more of an action or a question. In response to one or more sentences having insight scores that exceed a threshold value, an information score and a definiteness score may be determined for the one or more sentences. One or more insights associated with the conversation may then be generated based on the one or more sentences. A report may be generated that associates the one or more insights with the one or more portions of the conversation that include the associated sentences.
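The thresholding step described in this abstract can be sketched in a few lines. The two toy scoring functions stand in for the ML models, combining their outputs with `max` is an assumption, and the threshold of 0.5 is arbitrary; the information and definiteness scores are not modeled here.

```python
ACTION_WORDS = {"will", "please", "must"}  # illustrative cue words


def action_model(sentence):
    """Toy stand-in for a model estimating whether a sentence contains an action."""
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return 0.9 if words & ACTION_WORDS else 0.1


def question_model(sentence):
    """Toy stand-in for a model estimating whether a sentence is a question."""
    return 0.9 if sentence.strip().endswith("?") else 0.1


def insight_score(sentence, models):
    """Combine model outputs; taking the maximum is an assumption of this sketch."""
    return max(model(sentence) for model in models)


def select_candidates(sentences, models, threshold=0.5):
    """Keep the sentences whose insight score exceeds the threshold."""
    return [s for s in sentences if insight_score(s, models) > threshold]


conversation = [
    "I will send the report tomorrow.",
    "Nice weather today.",
    "Can you review the draft?",
]
candidates = select_candidates(conversation, [action_model, question_model])
```

Only the sentences that pass this filter would go on to the further scoring and insight generation that the abstract describes.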

Systems and methods for parsing multiple intents in natural language speech

A system for parsing separate intents in natural language speech is configured to (i) receive, from a user computer device, a verbal statement of a user including a plurality of words; (ii) translate the verbal statement into text; (iii) label each of the plurality of words in the verbal statement; (iv) detect one or more potential splits in the verbal statement; (v) divide the verbal statement into a plurality of intents based upon the one or more potential splits; and (vi) generate a response based upon the plurality of intents.
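Steps (iii) through (v) can be sketched on already-transcribed text. The marker set `SPLIT_MARKERS` and the two-label scheme are assumptions made for illustration; the abstract does not state how words are labeled or how potential splits are detected.

```python
import re

# Hypothetical split markers; not taken from the abstract.
SPLIT_MARKERS = {"and", "then", "also"}


def label_words(statement):
    """Label each word as 'SPLIT' or 'WORD' (an illustrative labeling scheme)."""
    words = re.findall(r"[A-Za-z']+", statement.lower())
    return [(w, "SPLIT" if w in SPLIT_MARKERS else "WORD") for w in words]


def split_intents(statement):
    """Divide a statement into intent phrases at each potential split."""
    intents, current = [], []
    for word, label in label_words(statement):
        if label == "SPLIT":
            if current:  # close the phrase collected so far
                intents.append(" ".join(current))
            current = []
        else:
            current.append(word)
    if current:
        intents.append(" ".join(current))
    return intents


intents = split_intents("Pay my bill and then check my balance")
```

For the sample statement, the consecutive markers "and" and "then" produce a single split, giving the two phrases "pay my bill" and "check my balance". A keyword rule like this would also split phrases such as "salt and pepper", which is why a practical system would need a more careful split detector.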

STATE MACHINE BASED CONTEXT-SENSITIVE SYSTEM FOR MANAGING MULTI-ROUND DIALOG
20180004729 · 2018-01-04

The present invention discloses a state machine based context-sensitive multi-round dialog management system, comprising: an input module, for receiving multi-modal input information from a user; an intention identification engine module, for identifying intention information in the multi-modal input information; an intention module, for bringing multiple intention information identified by the intention identification engine module into one-to-one correspondence with multiple intention sub-modules at back ends; a state machine module, comprising a plurality of state machines for managing a relevant context in the dialog management system and providing support for an output result; an instruction parsing engine module, comprising a plurality of instruction parsing engine sub-modules for parsing corresponding intention information and acquiring the parsed multiple intention information; and an output module, for acquiring policy information according to the results from the parsing engine module and the intention identification module, and transmitting the policy information to the state machine module.
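How a state machine can hold dialog context across rounds is sketched below. The states, intents, and transition table are hypothetical examples, and the sketch covers only a single state machine, not the full set of modules listed in the abstract.

```python
class DialogStateMachine:
    """A single state machine tracking one multi-round dialog (illustrative only)."""

    # Hypothetical transition table: (current state, intent) -> next state.
    TRANSITIONS = {
        ("idle", "book_flight"): "awaiting_destination",
        ("awaiting_destination", "provide_city"): "awaiting_date",
        ("awaiting_date", "provide_date"): "confirmed",
    }

    def __init__(self):
        self.state = "idle"
        self.context = {}  # values gathered over earlier rounds

    def handle(self, intent, value=None):
        """Apply an identified intent; unexpected intents leave the state unchanged."""
        next_state = self.TRANSITIONS.get((self.state, intent))
        if next_state is None:
            return f"unexpected intent '{intent}' in state '{self.state}'"
        if value is not None:
            self.context[intent] = value
        self.state = next_state
        return f"moved to {next_state}"


machine = DialogStateMachine()
machine.handle("book_flight")
rejected = machine.handle("provide_date", "Friday")  # out of order for this state
machine.handle("provide_city", "Paris")
```

Because the same intent is accepted or rejected depending on the current state, the interpretation of each user input becomes context-sensitive: here the date is rejected until a destination has been supplied.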