G10L2015/027

SPEECH ANALYSIS ALGORITHMIC SYSTEM AND METHOD FOR OBJECTIVE EVALUATION AND/OR DISEASE DETECTION
20210193173 · 2021-06-24

Systems and methods use patient speech samples as inputs, use subjective multi-point ratings by speech-language pathologists of multiple perceptual dimensions of patient speech samples as further inputs, and extract laboratory-implemented features from the patient speech samples. A predictive software model learns the relationship between speech acoustics and the subjective ratings of such speech obtained from speech-language pathologists, and is configured to apply this information to evaluate new speech samples. Outputs may include objective evaluation of the plurality of perceptual dimensions for new speech samples and/or evaluation of disease onset, disease progression, or disease treatment efficacy for a condition involving dysarthria as a symptom, utilizing the new speech samples.

Foreign language reading and displaying device and a method thereof, motion learning device based on foreign language rhythm detection sensor and motion learning method, electronic recording medium, and learning material
10978045 · 2021-04-13

A foreign language reading and displaying device and method, and a motion learning device and method based on a foreign language rhythm detection sensor, are disclosed. The method includes converting each of the separated foreign language phonemes that corresponds to a syllable into a native language phoneme, chosen from among consonants and vowels, in accordance with predetermined pronunciation rules; combining the generated native language phonemes in accordance with foreign language combination rules to generate and display native language syllables, words, and sentences; displaying any separated foreign language phonemes that do not correspond to a syllable of a foreign language word as foreign language phonemes according to a predetermined foreign language pronunciation rule; and displaying at least one of the native language sentence and the inputted foreign language sentence on a screen.

PHONEME SOUND BASED CONTROLLER
20210104225 · 2021-04-08

Disclosed herein is a phoneme sound based controller apparatus including: a sound input for receiving a sound signal; a phoneme sound detection module connected to the sound input to determine if at least one phoneme is detected in the sound signal; a dictionary containing at least one word, the word including at least one syllable, the syllable including the at least one phoneme; a grammar containing at least one rule, the at least one rule containing the at least one word, the at least one rule further containing at least one control action. At least one control action is taken if the at least one phoneme is detected in the sound input signal by the phoneme sound detection module. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
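
The dictionary and grammar structure described above can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the application: the phoneme symbols, the example words, the action names, and the matching policy (a rule fires only when every phoneme of each of its words appears as a contiguous run in the detected phoneme stream, which is stricter than the "at least one phoneme" wording of the abstract).

```python
from typing import Dict, List

Phonemes = List[str]

# Hypothetical dictionary: word -> list of syllables -> list of phonemes.
DICTIONARY: Dict[str, List[Phonemes]] = {
    "stop": [["S", "T", "AA", "P"]],
    "go": [["G", "OW"]],
}

# Hypothetical grammar: each rule names its words and one control action.
GRAMMAR = [
    {"words": ["stop"], "action": "halt_motor"},
    {"words": ["go"], "action": "start_motor"},
]


def word_phonemes(word: str) -> Phonemes:
    """Flatten a word's syllables into a single phoneme sequence."""
    return [p for syllable in DICTIONARY[word] for p in syllable]


def contains_run(detected: Phonemes, target: Phonemes) -> bool:
    """True if `target` occurs as a contiguous run inside `detected`."""
    n = len(target)
    return any(detected[i:i + n] == target for i in range(len(detected) - n + 1))


def actions_for(detected: Phonemes) -> List[str]:
    """Return the actions of every rule whose words were all detected."""
    return [
        rule["action"]
        for rule in GRAMMAR
        if all(contains_run(detected, word_phonemes(w)) for w in rule["words"])
    ]
```

Calling `actions_for` on the output of a phoneme detector yields the control actions to execute, in grammar order. A real controller would also need timing and confidence handling, which this sketch leaves out.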

Artificial intelligence-based wakeup word detection method and apparatus, device, and medium

This application discloses an artificial intelligence-based (AI-based) wakeup word detection method performed by a computing device. The method includes: constructing, by using a preset pronunciation dictionary, at least one syllable combination sequence for self-defined wakeup word text inputted by a user; obtaining to-be-recognized speech data, and extracting speech features of speech frames in the speech data; inputting the speech features into a pre-constructed deep neural network (DNN) model, to output posterior probability vectors of the speech features corresponding to syllable identifiers; determining a target probability vector from the posterior probability vectors according to the syllable combination sequence; and calculating a confidence according to the target probability vector, and determining that the speech frames include the wakeup word text when the confidence is greater than or equal to a threshold.
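
The confidence step can be sketched in Python under stated assumptions. The abstract does not say how the confidence is computed, so this sketch substitutes a common keyword-spotting heuristic: take the peak posterior of each target syllable across frames and use their geometric mean. The dictionary layout, the function names, and the default threshold are hypothetical, the DNN is replaced by precomputed posterior vectors, and syllable order is ignored for brevity.

```python
import math
from typing import Dict, List


def build_syllable_sequence(text: str, pronunciation_dict: Dict[str, List[int]]) -> List[int]:
    """Map each word of the wakeup word text to its syllable identifiers."""
    sequence: List[int] = []
    for word in text.lower().split():
        sequence.extend(pronunciation_dict[word])
    return sequence


def wakeword_confidence(posteriors: List[List[float]], syllable_ids: List[int]) -> float:
    """Geometric mean of each target syllable's peak posterior over all frames.

    This scoring rule is an assumption, not the method of the application.
    """
    if not posteriors or not syllable_ids:
        return 0.0
    peaks = [max(frame[s] for frame in posteriors) for s in syllable_ids]
    if min(peaks) <= 0.0:
        return 0.0
    return math.exp(sum(math.log(p) for p in peaks) / len(peaks))


def contains_wakeword(posteriors: List[List[float]], syllable_ids: List[int],
                      threshold: float = 0.5) -> bool:
    """Accept when the confidence is greater than or equal to the threshold."""
    return wakeword_confidence(posteriors, syllable_ids) >= threshold
```

Each inner list in `posteriors` stands for one frame's posterior probability vector over syllable identifiers. An order-aware variant would restrict each syllable's peak search to frames after the previous syllable's peak.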

Speech to text conversion engine for non-standard speech

A computing device is used to convert verbal communications, including non-standard speech, to text. The computing device receives an audio recording of voice and generates a standard text log. A standard word dictionary is retrieved. Non-standard words not found in the word dictionary are determined. Portions of the audio recording corresponding to the non-standard words are retrieved. The portions of the audio recording corresponding to the non-standard words are input into a natural language understanding model. The computing device utilizes the results of the natural language understanding model to determine a best-match non-standard dictionary. One or more portions of the audio recording are used to generate a non-standard text log. The standard text log and non-standard text log are merged.
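
The split-and-merge flow can be sketched in Python as follows. This is an illustration under loud assumptions, not the claimed engine: recognized tokens are given as hypothetical `(word, start_time)` pairs, the natural language understanding model is replaced by a simple coverage count over candidate dictionaries, and the sketch looks up the recognized word text where the abstract works from the audio portions themselves.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, float]   # (word, start time in seconds), hypothetical layout
Entry = Tuple[float, str]   # (start time in seconds, text)


def split_logs(tokens: List[Token], standard_words: Set[str]) -> Tuple[List[Entry], List[Entry]]:
    """Separate tokens found in the standard dictionary from the rest."""
    standard: List[Entry] = []
    unknown: List[Entry] = []
    for word, start in tokens:
        target = standard if word.lower() in standard_words else unknown
        target.append((start, word))
    return standard, unknown


def best_dictionary(unknown: List[Entry], dictionaries: Dict[str, Dict[str, str]]) -> str:
    """Stand-in for the NLU model: pick the dictionary covering most unknown words."""
    def coverage(name: str) -> int:
        return sum(1 for _, w in unknown if w.lower() in dictionaries[name])
    return max(dictionaries, key=coverage)


def transcribe(tokens: List[Token], standard_words: Set[str],
               dictionaries: Dict[str, Dict[str, str]]) -> str:
    """Build both logs, then merge them back together by start time."""
    standard, unknown = split_logs(tokens, standard_words)
    non_standard: List[Entry] = []
    if unknown:
        chosen = dictionaries[best_dictionary(unknown, dictionaries)]
        non_standard = [(t, chosen.get(w.lower(), w)) for t, w in unknown]
    return " ".join(text for _, text in sorted(standard + non_standard))
```

Merging by start time keeps the original word order even though the two logs are produced separately.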

Stimuli for symptom detection

Embodiments are disclosed for health assessment and diagnosis implemented in an artificial intelligence (AI) system. In an embodiment, a method comprises: obtaining, using one or more processors of a device, a speech sample from a user uttering a first sentence; processing the speech sample through a neural network to predict a first set of one or more disease-related symptoms of the user; and generating, using the one or more processors, a second sentence to predict a second set of one or more disease-related symptoms or confirm the first set of disease-related symptoms.

METHOD AND SYSTEM FOR SPEECH EMOTION RECOGNITION

A method for speech emotion recognition for enriching speech to text communications between users in speech chat sessions including: implementing a speech emotion recognition model to enable converting observed emotions in speech samples to enrich text with visual emotion content by: generating a data set of speech samples with labels of a plurality of emotion classes; extracting a set of acoustic features from each of the emotion classes; generating a machine learning (ML) model based on the acoustic features and data set; training the ML model from acoustic features from speech samples during speech chat sessions; predicting emotion content based on a trained ML model in the observed speech; generating enriched text based on predicted emotion content of the trained ML model; and presenting the enriched text in speech to text communications between users in the chat session for visual notice of an observed emotion in the speech sample.

Syllable based automatic speech recognition
10916235 · 2021-02-09

Systems, methods, and computer programs are described which utilize the structure of syllables as an organizing element of automated speech recognition processing to overcome variations in pronunciation, to efficiently resolve confusable aspects, to exploit context, and to map the speech to orthography.

Quantification of bulbar function

System, method and media for quantifying bulbar function of a subject. At a high level, embodiments of the invention measure and quantify bulbar function of a test subject based on video data, audio data, or other sensor data of a subject performing a test of bulbar function, such as speech, swallowing, and orofacial movements. This sensor data is then analyzed to identify key events such as syllable enunciations. Based on one or more characteristics of these key events (such as, for example, their rate, count, assessed accuracy, or trends over time), the bulbar function of the subject can accurately, reliably, and objectively be quantified.
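
As an illustration of turning key events into numbers, the following Python sketch computes a syllable count, a syllable rate, and a trend over time from syllable onset times. The inputs, the function names, and the choice of a least-squares slope over inter-syllable intervals as the trend measure are assumptions for this example; the abstract does not specify how these characteristics are calculated.

```python
from typing import Dict, List


def slope(values: List[float]) -> float:
    """Least-squares slope of `values` against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def syllable_metrics(onsets_s: List[float], duration_s: float) -> Dict[str, float]:
    """Summarize detected syllable onsets (in seconds) for one test recording."""
    count = len(onsets_s)
    rate = count / duration_s if duration_s > 0 else 0.0
    # Gaps between consecutive syllables; a positive slope means the
    # subject is slowing down over the course of the test.
    intervals = [b - a for a, b in zip(onsets_s, onsets_s[1:])]
    return {
        "count": count,
        "rate_per_s": rate,
        "interval_trend_s": slope(intervals),
    }
```

For example, `syllable_metrics(onsets, 2.0)` summarizes a two-second repetition task given the onset times produced by whatever event detector is in use.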

Information processing device, in-vehicle device, and storage medium
10916246 · 2021-02-09

An information processing device enables a user to register a wake-up-word for activating a predetermined function by voice recognition. The information processing device includes a receiving unit configured to receive, from a user, an input word for registering a wake-up-word, and a determination unit configured to determine whether the input word received by the receiving unit satisfies conditions for an accuracy of voice recognition.