G10L15/16

Method and apparatus for evaluating user intention understanding satisfaction, electronic device and storage medium

A method and apparatus for generating a user intention understanding satisfaction evaluation model, a method and apparatus for evaluating a user intention understanding satisfaction, an electronic device and a storage medium are provided, relating to intelligent voice recognition and knowledge graphs. The method for generating a user intention understanding satisfaction evaluation model is: acquiring a plurality of sets of intention understanding data, at least one set of which comprises a plurality of sequences corresponding to multi-round behaviors of an intelligent device in multi-round man-machine interactions; and learning the plurality of sets of intention understanding data through a first machine learning model, to obtain the user intention understanding satisfaction evaluation model after the learning, wherein the user intention understanding satisfaction evaluation model is configured to evaluate user intention understanding satisfactions of the intelligent device in the multi-round man-machine interactions according to the plurality of sequences corresponding to the multi-round man-machine interactions.

Method and apparatus for evaluating user intention understanding satisfaction, electronic device and storage medium

A method and apparatus for generating a user intention understanding satisfaction evaluation model, a method and apparatus for evaluating a user intention understanding satisfaction, an electronic device and a storage medium are provided, relating to intelligent voice recognition and knowledge graphs. The method for generating a user intention understanding satisfaction evaluation model is: acquiring a plurality of sets of intention understanding data, at least one set of which comprises a plurality of sequences corresponding to multi-round behaviors of an intelligent device in multi-round man-machine interactions; and learning the plurality of sets of intention understanding data through a first machine learning model, to obtain the user intention understanding satisfaction evaluation model after the learning, wherein the user intention understanding satisfaction evaluation model is configured to evaluate user intention understanding satisfactions of the intelligent device in the multi-round man-machine interactions according to the plurality of sequences corresponding to the multi-round man-machine interactions.

Learning apparatus, generation apparatus, classification apparatus, learning method, and non-transitory computer readable storage medium
11580362 · 2023-02-14 · ·

According to one aspect of an embodiment a learning apparatus includes a first acquiring unit that acquires first output information that is output by an output layer when predetermined input information is input to a model that includes an input layer, a plurality of intermediate layers, and the output layer. The learning apparatus includes a second acquiring unit that acquires intermediate output information that is based on pieces of intermediate information that are output by the plurality of intermediate layers when the input information is input to the model. The learning apparatus includes a learning unit that learns the model based on the first output information and the intermediate output information.

Systems and methods for response selection in multi-party conversations with dynamic topic tracking

Embodiments described herein provide a dynamic topic tracking mechanism that tracks how the conversation topics change from one utterance to another and use the tracking information to rank candidate responses. A pre-trained language model may be used for response selection in the multi-party conversations, which consists of two steps: (1) a topic-based pre-training to embed topic information into the language model with self-supervised learning, and (2) a multi-task learning on the pretrained model by jointly training response selection and dynamic topic prediction and disentanglement tasks.

Systems and methods for response selection in multi-party conversations with dynamic topic tracking

Embodiments described herein provide a dynamic topic tracking mechanism that tracks how the conversation topics change from one utterance to another and use the tracking information to rank candidate responses. A pre-trained language model may be used for response selection in the multi-party conversations, which consists of two steps: (1) a topic-based pre-training to embed topic information into the language model with self-supervised learning, and (2) a multi-task learning on the pretrained model by jointly training response selection and dynamic topic prediction and disentanglement tasks.

Emitting word timings with end-to-end models

A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

Robot and method for recognizing wake-up word thereof
11577379 · 2023-02-14 · ·

Provided is a robot including a microphone configured to acquire a sound signal corresponding to a sound generated near the robot, a camera, an output interface including at least one of a display configured to output a wake-up screen or a speaker configured to output a wake-up sound when the robot wakes up, and a processor configured to recognize whether the acquired sound includes a voice of a person, activate the camera when the sound includes a voice of a person, recognize whether a person is present in an image acquired by the activated camera, set a wake-up word recognition sensitivity based on a recognition result as to whether a person is present, and recognize whether a wake-up word is included voice data of a user acquired through the microphone based on the set wake-up word recognition sensitivity.

Machine learning for interpretation of subvocalizations

Provided is an in-ear device and associated computational support system that leverages machine learning to interpret sensor data descriptive of one or more in-ear phenomena during subvocalization by the user. An electronic device can receive sensor data generated by at least one sensor at least partially positioned within an ear of a user, wherein the sensor data was generated by the at least one sensor concurrently with the user subvocalizing a subvocalized utterance. The electronic device can then process the sensor data with a machine-learned subvocalization interpretation model to generate an interpretation of the subvocalized utterance as an output of the machine-learned subvocalization interpretation model.

Configurable conversation engine for executing customizable chatbots

A conversation engine performs conversations with users using chatbots customized for performing a set of tasks that can be performed using an online system. The conversation engine loads a chatbot configuration that specifies the behavior of a chatbot including the tasks that can be performed by the chatbot, the types of entities relevant to each task, and so on. The conversation may be voice based and use natural language. The conversation engine may load different chatbot configurations to implement different chatbots. The conversation engine receives a conversation engine configuration that specifies the behavior of the conversation engine across chatbots. The system may be a multi-tenant system that allows customization of the chatbots for each tenant.

Configurable conversation engine for executing customizable chatbots

A conversation engine performs conversations with users using chatbots customized for performing a set of tasks that can be performed using an online system. The conversation engine loads a chatbot configuration that specifies the behavior of a chatbot including the tasks that can be performed by the chatbot, the types of entities relevant to each task, and so on. The conversation may be voice based and use natural language. The conversation engine may load different chatbot configurations to implement different chatbots. The conversation engine receives a conversation engine configuration that specifies the behavior of the conversation engine across chatbots. The system may be a multi-tenant system that allows customization of the chatbots for each tenant.