G10L25/63

Determination of transcription accuracy
11699043 · 2023-07-11

A method may include obtaining audio of a communication session between a first device of a first user and a second device of a second user. The method may further include obtaining a transcription of second speech of the second user. The method may also include identifying one or more first sound characteristics of first speech of the first user. The method may also include identifying one or more first words indicating a lack of understanding in the first speech. The method may further include determining an experienced emotion of the first user based on the one or more first sound characteristics. The method may also include determining an accuracy of the transcription of the second speech based on the experienced emotion and the one or more first words.
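
The abstract combines two signals from the first user, an experienced emotion and words indicating a lack of understanding, into an accuracy estimate for the transcription of the second user's speech. The following is a minimal sketch of one way such a combination could look, not the patented method. The phrase list, the per-emotion penalty table, the linear scoring rule, and the function names `count_confusion_cues` and `estimate_accuracy` are all assumptions made for illustration. The emotion label is assumed to come from a separate classifier over the sound characteristics.

```python
import re


def count_confusion_cues(first_speech_text, phrases):
    """Count occurrences of phrases that suggest a lack of understanding.

    `phrases` is a hypothetical, caller-supplied list; the abstract does not
    enumerate the words it looks for.
    """
    text = first_speech_text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(phrase.lower()) + r"\b", text))
        for phrase in phrases
    )


def estimate_accuracy(emotion, cue_count, emotion_penalty, cue_penalty=0.1):
    """Map an emotion label and a cue count to a score in [0, 1].

    The linear penalty is an illustrative assumption, not taken from the source.
    Emotions missing from `emotion_penalty` contribute no penalty.
    """
    score = 1.0 - emotion_penalty.get(emotion, 0.0) - cue_penalty * cue_count
    return max(0.0, min(1.0, score))


if __name__ == "__main__":
    cues = count_confusion_cues(
        "Sorry, what? I don't understand. Say that again.",
        ["what", "i don't understand", "say that again"],
    )
    print(cues, estimate_accuracy("frustrated", cues, {"frustrated": 0.3}))
```

A lower score here would flag the transcription of the second user's speech as likely inaccurate; how the score is used downstream is left open.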

PSYCHOLOGY COUNSELING DEVICE AND METHOD THEREOF

A psychology counseling device is provided. The device includes a user interface configured to receive an input from a user and provide information; a microphone configured to collect a voice of the user; a speaker configured to convey auditory information to the user; a processor configured to control the user interface, the microphone, and the speaker; and a memory accessible by the processor and configured to store executable instructions. The memory is further configured to store texts to be provided to the user and voice data received from the user. The executable instructions, when executed by the processor, cause the processor to perform: recognizing an emotional state of the user based on the user's input; providing texts including different contents to the user according to the emotional state of the user; receiving a voice in which the user articulates the texts and storing the voice in the memory as the voice data; obtaining a plurality of modulated voices by converting the voice data; and providing at least two among the plurality of modulated voices to the user.

DETERMINING MENTAL STATES BASED ON BIOMETRIC DATA

Various embodiments of an apparatus, methods, systems and computer program products described herein are directed to an Analytics Engine that receives one or more signal files that include neural signal data of a user based on voltages detected by one or more electrodes on a set of headphones worn by the user. The Analytics Engine preprocesses the data, extracts features from the received data, and feeds the extracted features into one or more machine learning models to generate determined output that corresponds to at least one of a current mental state of the user and a type of facial gesture performed by the user. The Analytics Engine sends the determined output to a computing device to perform an action based on the determined output.

Label generation device, model learning device, emotion recognition apparatus, methods therefor, program, and recording medium

With correct emotion classes selected as correct values of an emotion of an utterer of a first utterance from among a plurality of emotion classes $C_1, \ldots, C_K$ by listeners who have listened to the first utterance, as an input, the numbers of times $n_i$ that emotion classes $C_i$ have been selected as the correct emotion classes are obtained, and rates of the numbers of times $n_k$ to a sum total of the numbers of times $n_1, \ldots, n_K$ or smoothed values of the rates are obtained as correct emotion soft labels $t_k^{(s)}$ corresponding to the first utterance.
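
The soft-label computation described above can be sketched as follows. The unsmoothed label for each class is the share of listener votes it received: its count divided by the total number of votes. The abstract does not say how the "smoothed values" are obtained, so additive smoothing with a parameter `alpha` is an assumption here, as are the function name `soft_labels` and the example class names.

```python
def soft_labels(votes, classes, alpha=0.0):
    """Turn listener votes into soft labels over `classes`.

    votes   -- list of class names chosen by listeners for one utterance
    classes -- the K emotion classes
    alpha   -- additive smoothing constant (an assumption; alpha=0 gives
               the plain rate of each count to the total count)
    """
    counts = [sum(1 for v in votes if v == c) for c in classes]
    total = sum(counts) + alpha * len(classes)
    if total == 0:
        raise ValueError("no votes and no smoothing: labels are undefined")
    return {c: (n + alpha) / total for c, n in zip(classes, counts)}


if __name__ == "__main__":
    classes = ["joy", "anger", "sad", "neutral"]
    votes = ["joy", "joy", "anger", "neutral"]
    print(soft_labels(votes, classes))            # plain rates
    print(soft_labels(votes, classes, alpha=1.0))  # smoothed rates
```

With smoothing, a class that no listener chose still receives a small nonzero label, which keeps a model trained on these targets from treating it as impossible.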

SYSTEMS AND METHODS FOR AUTOMATED REAL-TIME GENERATION OF AN INTERACTIVE AVATAR UTILIZING SHORT-TERM AND LONG-TERM COMPUTER MEMORY STRUCTURES

Systems and methods enabling rendering an avatar attuned to a user. The systems and methods include receiving audio-visual data of user communications of a user. Using the audio-visual data, the systems and methods may determine vocal characteristics of the user, facial action units representative of facial features of the user, and speech of the user based on a speech recognition model and/or natural language understanding model. Based on the vocal characteristics, an acoustic emotion metric can be determined. Based on the speech recognition data, a speech emotion metric may be determined. Based on the facial action units, a facial emotion metric may be determined. An emotional complex signature may be determined to represent an emotional state of the user for rendering the avatar attuned to the emotional state based on a combination of the acoustic emotion metric, the speech emotion metric and the facial emotion metric.
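
The abstract says the emotional complex signature is based on "a combination" of the acoustic, speech, and facial emotion metrics, without specifying the combination. As a hedged illustration only, the sketch below uses a weighted sum per emotion. The representation of each metric as a dictionary from emotion name to score, the weights, and the function name `combine_emotion_metrics` are assumptions.

```python
def combine_emotion_metrics(acoustic, speech, facial, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Fuse three per-emotion score dictionaries into one signature.

    Each input maps an emotion name to a score; an emotion missing from a
    modality is treated as 0.0 for that modality. The weighted sum is an
    illustrative choice, not the method stated in the source.
    """
    w_acoustic, w_speech, w_facial = weights
    emotions = sorted(set(acoustic) | set(speech) | set(facial))
    return {
        e: w_acoustic * acoustic.get(e, 0.0)
        + w_speech * speech.get(e, 0.0)
        + w_facial * facial.get(e, 0.0)
        for e in emotions
    }


if __name__ == "__main__":
    signature = combine_emotion_metrics(
        acoustic={"joy": 0.8, "anger": 0.2},
        speech={"joy": 0.4},
        facial={"joy": 1.0, "anger": 0.0},
        weights=(0.5, 0.25, 0.25),
    )
    print(signature)
```

Keeping the per-emotion scores rather than collapsing to a single label leaves the avatar renderer free to respond to mixed states.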

VOICE-BASED CONTROL OF SEXUAL STIMULATION DEVICES
20230210716 · 2023-07-06

A system and method for voice-based control of sexual stimulation devices. In some configurations, the system and method involve receiving voice data, analyzing the voice data to detect spoken commands, and generating control signals based on the commands. In some configurations, the system and method involve receiving voice data, analyzing the voice data for non-speech vocalizations, detecting voice stress patterns, and generating control signals based on the detected patterns. In some configurations, the analyses of the voice data are performed by machine learning algorithms which may be trained on associations between speech and non-speech vocalizations of a user while the user engages in one or more voice-based training tasks, associating speech and non-speech vocalizations with controls of the sexual stimulation device. In some configurations, machine learning algorithms are used to make the associations. In some configurations, data from other biometric sensors is included in the associations.

Dynamic system response configuration

A natural language processing system may use system response configuration data to determine customized output data forms when outputting data for a user. The system response configuration data may represent various output attributes the system may use when creating output data. The system may have a certain number of existing profiles where a profile is associated with certain settings for the system response configuration data/attributes. The system may also use various data such as context data, sentiment data, or the like to customize system response configuration data during a dialog. Other components, such as natural language generation (NLG), text-to-speech (TTS), or the like, may use the customized system response configuration data to determine the form, timing, etc. of output data to be presented to a user.