G10L13/033

METHOD AND SYSTEM FOR GENERATING AN INTELLIGENT VOICE ASSISTANT RESPONSE
20230223007 · 2023-07-13 ·

A method and a system for generating an intelligent voice assistant response are provided. The method includes receiving a preliminary voice assistant response to a user command and determining a subjective polarity score of the preliminary voice assistant response and a dynamic polarity score indicative of an instant user reaction to the preliminary voice assistant response, once the preliminary voice assistant response is delivered. The method thereafter determines a sentiment score of the preliminary voice assistant response based on the subjective polarity score and the dynamic polarity score. The method identifies emotionally uplifting information for the user that is to be combined with the preliminary voice assistant response. The method further includes generating a personalized note to be combined with the preliminary voice assistant response and generating the intelligent voice assistant response by combining the preliminary voice assistant response with the emotionally uplifting information and the personalized note.
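The scoring-and-composition flow described in this abstract might be sketched as follows; the weights, threshold, and function names are illustrative assumptions, not taken from the patent:

```python
# Hypothetical sketch of the sentiment-scoring and response-composition
# steps described above. Weights and thresholds are assumptions.

def sentiment_score(subjective_polarity: float,
                    dynamic_polarity: float,
                    w_subjective: float = 0.4,
                    w_dynamic: float = 0.6) -> float:
    """Combine the response's own polarity with the user's instant
    reaction into a single score in [-1, 1]."""
    return w_subjective * subjective_polarity + w_dynamic * dynamic_polarity

def compose_response(preliminary: str,
                     score: float,
                     uplifting: str,
                     note: str,
                     threshold: float = 0.0) -> str:
    """Append uplifting information and a personalized note when the
    combined sentiment falls below a threshold."""
    if score < threshold:
        return f"{preliminary} {uplifting} {note}"
    return preliminary
```

A negative combined score would trigger the augmented response, while a neutral or positive score would leave the preliminary response unchanged.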

Using speech to text data in training text to speech models

A system and method for providing a text-to-speech output by receiving user audio data, determining a user region-specific-pronunciation classification according to the audio data, determining text for a response to the user according to the audio data, identifying a portion of the text that appears in a region-specific pronunciation dictionary, and using a phoneme string from the dictionary, selected according to the user region-specific-pronunciation classification, for that portion in a text-to-speech output to the user.
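The dictionary-lookup step could look roughly like this; the dictionary structure, region labels, and phoneme strings are illustrative assumptions:

```python
# Hypothetical illustration of the region-specific pronunciation
# dictionary lookup described above. Entries and labels are assumptions.

PRONUNCIATION_DICT = {
    "tomato": {"us": "T AH M EY T OW", "uk": "T AH M AA T OW"},
    "schedule": {"us": "S K EH JH UH L", "uk": "SH EH D Y UW L"},
}

def phonemes_for_response(text: str, region: str) -> list:
    """Substitute the region-specific phoneme string for dictionary
    words; pass other words through for default grapheme-to-phoneme
    handling by the TTS engine."""
    out = []
    for word in text.lower().split():
        entry = PRONUNCIATION_DICT.get(word)
        out.append(entry[region] if entry and region in entry else word)
    return out
```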

Audio Processing Apparatus
20230213349 · 2023-07-06 ·

An apparatus configured to: determine, with a position sensor, position information; determine at least one keyword within at least one audio signal, wherein at least the at least one keyword is configured to be spatially processed; obtain at least one spatial processing parameter based at least partially, on the position information, wherein the at least one spatial processing parameter is configured to be used to spatially process at least the at least one keyword to be perceived from a direction during rendering, wherein the direction indicates a navigation direction; generate at least one processed audio signal, comprising processing at least the at least one keyword based on the at least one spatial processing parameter; and provide the at least one processed audio signal, comprising the at least one processed keyword, for generation of a virtual audio image.
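One way to read "spatially process the keyword to be perceived from a navigation direction" is as deriving rendering gains from an azimuth. The constant-power panning law below is an assumption for illustration; the patent does not specify a rendering method:

```python
import math

# Hypothetical sketch: derive stereo panning gains from a navigation
# direction so a keyword is perceived from that direction during
# rendering. Constant-power panning is an assumption.

def panning_gains(azimuth_deg: float) -> tuple:
    """Map an azimuth in [-90, 90] degrees (negative = left) to
    constant-power (left, right) gains."""
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)

def spatialize_keyword(samples: list, azimuth_deg: float) -> list:
    """Apply the gains to a mono keyword signal, yielding (L, R) pairs."""
    gl, gr = panning_gains(azimuth_deg)
    return [(s * gl, s * gr) for s in samples]
```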

Terminal and Operating Method Thereof
20230215418 · 2023-07-06 ·

A terminal may include a display that, when a real-time broadcast in which the user of the terminal is the host starts through a broadcasting channel, is divided into at least two areas, one of which is allocated to the host; an input/output interface that receives a voice of the host; a communication interface that receives, from a terminal of a certain guest among at least one or more guests who entered the broadcasting channel, one item selected from at least one or more items and a certain text; and a processor that generates a voice message by converting the certain text into the voice of the host or a voice of the certain guest.

Multi-Purpose Protective Face Mask
20230210201 · 2023-07-06 ·

A protective face mask implemented with a pocket located on a front surface of the mask, and a removable amplifier unit configured to be placed into the pocket, the removable amplifier unit comprising: a micro-processor configured to process voice data; a rechargeable battery coupled to the micro-processor; a Bluetooth device coupled to the micro-processor; a microphone coupled to the micro-processor and configured to provide the voice data to the micro-processor; and a speaker unit configured to output the voice data processed by the micro-processor.

PSYCHOLOGY COUNSELING DEVICE AND METHOD THEREOF

A psychology counseling device is provided. The device includes a user interface configured to receive an input from a user and provide information; a microphone configured to collect a voice of the user; a speaker configured to convey auditory information to the user; a processor configured to control the user interface, the microphone, and the speaker; and a memory accessible by the processor and configured to store executable instructions. The memory is configured to further store texts to be provided to the user and voice data received from the user. The executable instructions, when executed by the processor, cause the processor to perform: recognizing an emotional state of the user based on the user's input; providing texts including different contents to the user according to the emotional state of the user; receiving a voice in which the user articulates the texts and storing the voice in the memory as the voice data; obtaining a plurality of modulated voices by converting the voice data; and providing at least two among the plurality of modulated voices to the user.
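The "plurality of modulated voices" step might be sketched as below; resampling-based pitch shifting is an assumption for illustration, since the patent does not specify the modulation method:

```python
# Hypothetical sketch of producing several modulated voices from one
# recording, as described above. The modulation technique (naive
# resampling) and the factor values are assumptions.

def pitch_shift(samples: list, factor: float) -> list:
    """Naive resample: factor > 1 raises pitch (and shortens the clip)."""
    n = int(len(samples) / factor)
    return [samples[int(i * factor)] for i in range(n)]

def modulated_voices(samples: list, factors=(0.8, 1.0, 1.25)) -> list:
    """Return a set of modulated variants for presentation to the user."""
    return [pitch_shift(samples, f) for f in factors]
```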

SYSTEMS AND METHODS FOR AUTOMATED REAL-TIME GENERATION OF AN INTERACTIVE AVATAR UTILIZING SHORT-TERM AND LONG-TERM COMPUTER MEMORY STRUCTURES

Systems and methods for rendering an avatar attuned to a user. The systems and methods include receiving audio-visual data of user communications of a user. Using the audio-visual data, the systems and methods may determine vocal characteristics of the user, facial action units representative of facial features of the user, and speech of the user based on a speech recognition model and/or natural language understanding model. Based on the vocal characteristics, an acoustic emotion metric can be determined. Based on the speech recognition data, a speech emotion metric may be determined. Based on the facial action units, a facial emotion metric may be determined. An emotional complex signature may be determined to represent an emotional state of the user for rendering the avatar attuned to the emotional state, based on a combination of the acoustic emotion metric, the speech emotion metric, and the facial emotion metric.
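The fusion of the three modality metrics into an emotional complex signature could be sketched as a weighted per-emotion combination; the emotion categories and weights below are illustrative assumptions:

```python
# Hypothetical sketch of fusing acoustic, speech, and facial emotion
# metrics into an "emotional complex signature". Categories and
# weights are assumptions, not from the source.

EMOTIONS = ("joy", "sadness", "anger", "neutral")

def fuse_metrics(acoustic: dict, speech: dict, facial: dict,
                 weights=(0.3, 0.3, 0.4)) -> dict:
    """Weighted per-emotion combination of the three modality scores."""
    wa, ws, wf = weights
    return {e: wa * acoustic.get(e, 0.0)
               + ws * speech.get(e, 0.0)
               + wf * facial.get(e, 0.0)
            for e in EMOTIONS}

def dominant_emotion(signature: dict) -> str:
    """The emotional state used to attune the rendered avatar."""
    return max(signature, key=signature.get)
```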