G10L15/16

EMOTIONALLY-AWARE CONVERSATIONAL RESPONSE GENERATION METHOD AND APPARATUS

Techniques for generating conversational responses for a conversational user interface are disclosed. In one embodiment, a method is disclosed comprising obtaining user input from a user via a conversational user interface, using the user input to obtain a user emotion and a user intent, obtaining candidate probabilities for a fragment of a response to the user input using the obtained user emotion, the obtained user intent and the user input, generating the response to the user input using the candidate probabilities obtained for the fragment to select a candidate for the fragment of the response, and communicating the response to the user via the conversational user interface.
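The claimed pipeline (input → emotion and intent → per-fragment candidate probabilities → candidate selection) can be sketched as follows. This is a minimal illustration, not the patented implementation: the classifiers and the probability table are hypothetical stand-ins for the trained models the abstract describes.

```python
# Hypothetical stand-ins for trained emotion/intent classifiers.
def detect_emotion(text):
    return "frustrated" if "!" in text or "not" in text.lower() else "neutral"

def detect_intent(text):
    return "complaint" if "refund" in text.lower() else "question"

# Candidate probabilities for one response fragment, conditioned on
# (emotion, intent). In the patent these come from a model that also
# consumes the raw user input; here a fixed lookup suffices to illustrate.
FRAGMENT_CANDIDATES = {
    ("frustrated", "complaint"): {"I'm sorry to hear that": 0.7,
                                  "Sure": 0.1, "Let me check": 0.2},
    ("neutral", "question"): {"I'm sorry to hear that": 0.1,
                              "Sure": 0.5, "Let me check": 0.4},
}

def generate_fragment(user_input):
    emotion = detect_emotion(user_input)
    intent = detect_intent(user_input)
    probs = FRAGMENT_CANDIDATES.get((emotion, intent), {"Let me check": 1.0})
    # Select the highest-probability candidate for this response fragment.
    return max(probs, key=probs.get)
```

A full response would be assembled by repeating this selection for each fragment and concatenating the chosen candidates.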

CONVERSATION FACILITATING METHOD AND ELECTRONIC DEVICE USING THE SAME
20230041272 · 2023-02-09 ·

A method for facilitating a multiparty conversation is disclosed. An electronic device using the method may facilitate a multiparty conversation by identifying participants of a conversation, localizing relative positions of the participants, detecting speeches of the conversation, matching one of the participants to each of the detected speeches according to the relative positions of the participants, counting participations of the matched participant in the conversation, identifying a passive subject from all the participants according to the participations of all the participants in the conversation, finding a topic of the conversation between the participants, and engaging the passive subject by addressing the passive subject and speaking a sentence related to the topic.
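The core loop (localize participants, match each detected speech to the nearest participant, count participations, pick the least active participant as the passive subject) can be sketched like this. Representing positions as single angles and the nearest-angle matching rule are simplifying assumptions for illustration.

```python
from collections import Counter

def match_speaker(speech_angle, participant_angles):
    # Match a detected speech to the participant whose localized
    # position (here, a bearing angle) is closest.
    return min(participant_angles,
               key=lambda p: abs(participant_angles[p] - speech_angle))

def find_passive_subject(participants, speech_angles, participant_angles):
    counts = Counter({p: 0 for p in participants})
    for angle in speech_angles:
        counts[match_speaker(angle, participant_angles)] += 1
    # The passive subject is the participant with the fewest matched speeches.
    return min(counts, key=counts.get)
```

The device would then address the returned participant with a sentence related to the detected topic.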

Satisfaction estimation model learning apparatus, satisfaction estimating apparatus, satisfaction estimation model learning method, satisfaction estimation method, and program

The estimation accuracy of both conversation satisfaction and speech satisfaction is improved. A learning data storage unit (10) stores learning data including a conversation voice containing a conversation of a plurality of speeches, a correct-answer value of the conversation satisfaction for the conversation, and a correct-answer value of the speech satisfaction for each speech in the conversation. A model learning unit (13) learns a satisfaction estimation model using the feature quantity of each speech extracted from the conversation voice, the correct-answer values of the speech satisfactions, and the correct-answer value of the conversation satisfaction. The satisfaction estimation model is configured by connecting a speech satisfaction estimation model part, which receives the feature quantity of each speech and estimates that speech's satisfaction, with a conversation satisfaction estimation model part, which receives at least the speech satisfaction of each speech and estimates the conversation satisfaction.
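The two connected model parts can be sketched as follows. This is a deliberately simplified stand-in: the speech-level part is a linear scorer and the conversation-level part a mean, whereas the patent trains both parts jointly against the two kinds of correct-answer values.

```python
def speech_satisfaction(features, w, b):
    # Speech-level part: maps a per-speech feature vector to a
    # satisfaction score (linear model as an illustrative stand-in).
    return sum(wi * xi for wi, xi in zip(w, features)) + b

def conversation_satisfaction(speech_scores):
    # Conversation-level part: receives the per-speech satisfactions
    # and estimates the conversation satisfaction (mean, for illustration).
    return sum(speech_scores) / len(speech_scores)

def estimate(conversation_features, w, b):
    scores = [speech_satisfaction(f, w, b) for f in conversation_features]
    return scores, conversation_satisfaction(scores)
```

Joint training of the connected model lets the conversation-level supervision propagate down into the speech-level part, which is the claimed source of the accuracy improvement.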

Confusion network distributed representation generation apparatus, confusion network classification apparatus, confusion network distributed representation generation method, confusion network classification method and program

There is provided a technique for transforming a confusion network into a representation that can be used as input for machine learning. The apparatus includes a confusion network distributed representation sequence generating part that generates a confusion network distributed representation sequence, which is a vector sequence, from the arc word set sequence and the arc weight set sequence constituting the confusion network. The generating part comprises: an arc word distributed representation set sequence transforming part that transforms each arc word in an arc word set into a word distributed representation, thereby obtaining an arc word distributed representation set and generating an arc word distributed representation set sequence; and an arc word distributed representation set weighting/integrating part that generates the confusion network distributed representation sequence from the arc word distributed representation set sequence and the arc weight set sequence.
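The two parts (embed each arc word, then integrate the embeddings using the arc weights) can be sketched as below. The embedding table and the choice of a weighted sum as the integration are illustrative assumptions; the patent leaves the integration method more general.

```python
# Toy word distributed representations; in practice these come from a
# trained embedding table (the words and vectors here are illustrative).
EMBED = {"hello": [1.0, 0.0], "hollow": [0.0, 1.0], "world": [0.5, 0.5]}

def confusion_network_representation(arc_word_sets, arc_weight_sets):
    """For each confusion set, sum the word distributed representations
    weighted by the arc weights, yielding one vector per set."""
    sequence = []
    dim = len(next(iter(EMBED.values())))
    for words, weights in zip(arc_word_sets, arc_weight_sets):
        vec = [0.0] * dim
        for word, weight in zip(words, weights):
            for i, x in enumerate(EMBED[word]):
                vec[i] += weight * x
        sequence.append(vec)
    return sequence
```

The resulting vector sequence preserves the competing hypotheses of each confusion set (weighted by their posteriors) in a form a downstream classifier can consume.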

End-to-end streaming keyword spotting
11557282 · 2023-01-17

A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.
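The two-stage SVDF neuron (stage 1 filters each frame's features and writes a value into the memory; stage 2 filters across everything residing in the memory) can be sketched as a single neuron. The filter weights and memory size are illustrative; a real layer stacks many such neurons.

```python
from collections import deque

class SVDFNeuron:
    """One SVDF neuron: stage 1 filters each input frame's audio features
    and pushes the result into a fixed-size memory; stage 2 filters across
    the memorized stage-1 outputs in time."""
    def __init__(self, feature_filter, time_filter):
        self.alpha = feature_filter   # stage-1 weights (over feature dim)
        self.beta = time_filter       # stage-2 weights (over memory size)
        self.memory = deque([0.0] * len(time_filter), maxlen=len(time_filter))

    def step(self, frame):
        # Stage 1: 1-D filtering over this frame's features, individually.
        self.memory.append(sum(a * x for a, x in zip(self.alpha, frame)))
        # Stage 2: 1-D filtering over all values residing in the memory.
        return sum(b * m for b, m in zip(self.beta, self.memory))
```

Because the memory is bounded and updated one frame at a time, the network runs in streaming fashion; the final layer's output is the probability score compared against the hotword detection threshold.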

Training multiple neural networks with different accuracy
11556793 · 2023-01-17

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a deep neural network. One of the methods includes generating a plurality of feature vectors that each model a different portion of an audio waveform, generating a first posterior probability vector for a first feature vector using a first neural network, determining whether one of the scores in the first posterior probability vector satisfies a first threshold value, generating a second posterior probability vector for each subsequent feature vector using a second neural network, wherein the second neural network is trained to identify the same key words and key phrases as the first neural network and includes more inner layer nodes than the first neural network, and determining whether one of the scores in the second posterior probability vector satisfies a second threshold value.
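The cascade described above (a small, cheap network gates a larger, more accurate one) can be sketched as follows. The networks are passed in as plain callables returning posterior probability vectors; the function names and signature are illustrative, not the patent's API.

```python
def cascade_detect(feature_vectors, small_net, large_net, t1, t2):
    """Run the small network on the first feature vector; only if one of
    its posterior scores satisfies the first threshold, run the larger
    network on each subsequent feature vector against the second threshold."""
    if max(small_net(feature_vectors[0])) < t1:
        return False
    return any(max(large_net(fv)) >= t2 for fv in feature_vectors[1:])
```

The design choice is the usual accuracy/compute trade-off: the small network screens most audio cheaply, and the larger network (more inner-layer nodes, higher accuracy) is paid for only on candidates that pass the first threshold.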
