G10L15/1807

Handsfree Communication System and Method
20230036791 · 2023-02-02

A method, computer program product, and computing system for interfacing a generic virtual assistant with a medical management system and monitoring the diction of a medical specialist using the generic virtual assistant.

Extracting content from speech prosody

A prosodic speech recognition engine configured to identify prosodic features and patterns in a speech continuum for the extraction of linguistic content including para-syntactic content, discourse function, information structure, meaning, and speaker sentiment.

Systems and methods for determining whether to trigger a voice capable device based on speaking cadence

Systems and methods are described for determining whether to activate a voice activated device based on a speaking cadence of the user. When the user speaks with a first cadence, the system may determine that the user does not intend to activate the device and may accordingly not trigger the voice activated device. When the user speaks with a second cadence, the system may determine that the user does wish to trigger the device and may accordingly trigger the voice activated device.
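The abstract does not say how cadence is measured or which cadence signals intent. As a minimal sketch only, assume cadence is words per second computed from word timestamps, and assume (purely for illustration) that a slower, deliberate cadence is the one that triggers the device. The names `Word`, `speaking_cadence`, `should_trigger`, and the threshold value are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """A recognized word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


def speaking_cadence(words: List[Word]) -> float:
    """Return words per second over the span of the utterance."""
    if not words:
        return 0.0
    duration = words[-1].end - words[0].start
    return len(words) / duration if duration > 0 else 0.0


def should_trigger(words: List[Word], max_trigger_cadence: float = 2.5) -> bool:
    """Trigger only when the cadence is at or below the threshold.

    Assumption: slow, deliberate speech is the "second cadence" that
    indicates intent; the 2.5 words/second threshold is illustrative.
    """
    cadence = speaking_cadence(words)
    return 0.0 < cadence <= max_trigger_cadence


# A slow utterance (3 words over 2 s) versus a fast one (3 words over 0.6 s).
slow = [Word("turn", 0.0, 0.5), Word("lights", 0.8, 1.3), Word("on", 1.6, 2.0)]
fast = [Word("turn", 0.0, 0.2), Word("lights", 0.2, 0.4), Word("on", 0.4, 0.6)]
```

With these assumptions, `should_trigger(slow)` is true and `should_trigger(fast)` is false. A real system would likely calibrate the threshold per user rather than fix it.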

System and method for cross-speaker style transfer in text-to-speech and training data generation

Systems are configured for generating spectrogram data characterized by a voice timbre of a target speaker and a prosody style of a source speaker by converting a waveform of source speaker data to phonetic posteriorgram (PPG) data, extracting additional prosody features from the source speaker data, and generating a spectrogram based on the PPG data and the extracted prosody features. The systems are configured to utilize/train a machine learning model for generating spectrogram data and for training a neural text-to-speech model with the generated spectrogram data.

MULTIMODAL SENTIMENT CLASSIFICATION

Sentiment classification can be implemented by an entity-level multimodal sentiment classification neural network. The neural network can include left, right, and target entity subnetworks. The neural network can further include an image network that generates representation data that is combined and weighted with data output by the left, right, and target entity subnetworks to output a sentiment classification for an entity included in a network post.

SYSTEMS AND METHODS FOR CLASSIFICATION AND RATING OF CALLS BASED ON VOICE AND TEXT ANALYSIS
20230066797 · 2023-03-02

Methods and systems include sending recording data of a call to a first server and a second server, wherein the recording data includes a first voice of a first participant of the call and a second voice of a second participant of the call; receiving, from the first server, a first emotion score representing a degree of a first emotion associated with the first voice, and a second emotion score representing a degree of a second emotion associated with the first voice; receiving, from the second server, a first sentiment score, a second sentiment score, and a third sentiment score; determining a quality score and classification data for the recording data based on the first emotion score, the second emotion score, the first sentiment score, the second sentiment score, and the third sentiment score; and outputting the quality score and the classification data for visualization of the recording data.
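The abstract names the five inputs to the quality score but not how they are combined. The sketch below assumes every score is already normalized to the range 0 to 1 with higher meaning better, and uses a weighted average with a fixed cutoff for the classification. The function name `rate_call`, the weight, the cutoff, and the labels are all hypothetical.

```python
from typing import Sequence, Tuple


def rate_call(
    emotion_scores: Sequence[float],
    sentiment_scores: Sequence[float],
    emotion_weight: float = 0.5,
    cutoff: float = 0.5,
) -> Tuple[float, str]:
    """Combine two emotion scores and three sentiment scores into one rating.

    Assumption: all inputs lie in [0, 1] and higher is better. The weighted
    average and the two labels are illustrative, not taken from the source.
    """
    if len(emotion_scores) != 2 or len(sentiment_scores) != 3:
        raise ValueError("expected 2 emotion scores and 3 sentiment scores")
    emotion_mean = sum(emotion_scores) / len(emotion_scores)
    sentiment_mean = sum(sentiment_scores) / len(sentiment_scores)
    quality = emotion_weight * emotion_mean + (1 - emotion_weight) * sentiment_mean
    label = "good" if quality >= cutoff else "needs_review"
    return quality, label
```

For example, `rate_call((0.2, 0.1), (0.3, 0.2, 0.1))` falls below the cutoff and is labeled `needs_review`.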

Pervasive advisor for major expenditures

A pervasive advisor for major purchases and other expenditures may detect that a customer is contemplating a major purchase (e.g., through active listening). The advisor may assist the customer with the timing and manner of making the purchase in a way that is financially sensible in view of the customer's financial situation. A customer may be provided with dynamically-updated information in response to recent actions that may affect an approved loan amount and/or interest rate. Underwriting of a loan may be triggered based on the geo-location of the user. Financial advice may be provided to customers to help them meet their goals using information obtained from third party sources, such as purchase options based on particular goals. The pervasive advisor may thus intervene to assist with budgeting, financing, and timing of major expenditures based on the customer's location and on the customer's unique and changing circumstances.

AUTOMATED CALL REQUESTS WITH STATUS UPDATES
20230164268 · 2023-05-25

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to synthetic call status updates. In some implementations, a method includes determining, by a task manager module, that a triggering event has occurred to provide a current status of a user call request. The method may then determine, by the task manager module, the current status of the user call request. A representation of the current status of the user call request is generated. Then, the generated representation of the current status of the user call request is provided to the user.

INFORMATION PROCESSING DEVICE AND PROGRAM
20230116808 · 2023-04-13

An information processing device that determines a psychological state of a member in a meeting includes: a voice processor that extracts a predetermined feature amount from a voice of a part corresponding to a speech of the member among voices collected during the meeting by a microphone; an estimator that receives an input of the feature amount and estimates a likelihood that the member is in a predetermined psychological state; and a determiner that determines a likelihood that the member is in the predetermined psychological state during the meeting on a basis of the likelihood estimated by the estimator and an index value according to a purpose of the meeting.
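The determiner step can be illustrated as follows. The abstract does not specify how the index value adjusts the estimated likelihood, so this sketch assumes a simple multiplicative adjustment clipped to 1. The table `PURPOSE_INDEX`, its values, and the function `determine_likelihood` are hypothetical.

```python
# Hypothetical index values per meeting purpose (not from the source).
PURPOSE_INDEX = {
    "brainstorming": 1.2,
    "status_report": 0.8,
}


def determine_likelihood(
    estimated: float, purpose: str, default_index: float = 1.0
) -> float:
    """Adjust the estimator's likelihood by a purpose-specific index value.

    Assumption: the adjustment is multiplicative and the result is clipped
    to at most 1.0; unknown purposes fall back to a neutral index.
    """
    if not 0.0 <= estimated <= 1.0:
        raise ValueError("estimated likelihood must be in [0, 1]")
    index = PURPOSE_INDEX.get(purpose, default_index)
    return min(1.0, estimated * index)
```

Under these assumptions, the same estimated likelihood counts for more in a meeting whose purpose has a higher index value.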

Recognizing accented speech
11651765 · 2023-05-16

Techniques and apparatuses for recognizing accented speech are described. In some embodiments, an accent module recognizes accented speech using an accent library based on device data, uses different speech recognition correction levels based on an application field into which recognized words are set to be provided, or updates an accent library based on corrections made to incorrectly recognized speech.