G10L25/75

Systems and methods for variably paced real-time translation between the written and spoken forms of a word
11581006 · 2023-02-14

An enunciation system (ES) enables users to gain acquaintance with, understanding of, and mastery of the relationship between letters and sounds in the context of an alphabetic writing system. The ES lets the user experience the action of sounding out a word before their own phonics knowledge enables them to sound it out independently. Its continuous, unbroken speech output or input avoids the common confusions that arise from analyzing words by breaking them into discrete sounds; its user-controlled pacing allows the user to slow enunciation at specific points of difficulty within the word; its real-time touch control allows the written word to be “played” like a musical instrument, with expressive and aesthetic possibilities; and its highlighting of the letter cluster responsible for the recognized phoneme, as the user enunciates it, allows the user to more easily associate the letters with the sounds.
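A minimal sketch of the alignment such a system might keep between letter clusters and phonemes, with user-controlled pacing selecting which cluster to highlight. The word segmentations, IPA symbols, and function names below are illustrative assumptions, not the patent's implementation.

```python
# Each word is stored as an ordered list of (letter_cluster, phoneme) pairs;
# a user-controlled position along the word picks the cluster to highlight
# while its phoneme is enunciated. Segmentations are illustrative only.
WORD_MAP = {
    "ship": [("sh", "ʃ"), ("i", "ɪ"), ("p", "p")],
    "light": [("l", "l"), ("igh", "aɪ"), ("t", "t")],
}

def cluster_at(word, fraction):
    """Return the (cluster, phoneme) active at a playback fraction in [0, 1).

    Pacing is user controlled: the caller maps touch position or elapsed
    time to `fraction`, so slowing down simply holds `fraction` steady.
    """
    pairs = WORD_MAP[word]
    index = min(int(fraction * len(pairs)), len(pairs) - 1)
    return pairs[index]

# Highlight "igh" while the /aɪ/ vowel of "light" is being enunciated.
print(cluster_at("light", 0.5))   # ("igh", "aɪ")
```

Because the position, not a timer, drives the lookup, the same function serves both slowed-down study and expressive real-time "playing" of the word.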

DIFFICULT AIRWAY EVALUATION METHOD AND DEVICE BASED ON MACHINE LEARNING VOICE TECHNOLOGY

The present disclosure relates to a difficult airway evaluation method and device based on a machine learning voice technology. The method includes the following steps: acquiring voice data of a patient; carrying out feature extraction on the voice data, obtaining a pitch period of pronunciations, and acquiring voiced and unvoiced sound features based on the pitch period; and constructing a difficult airway evaluation classifier based on the machine learning voice technology, analyzing the received voiced and unvoiced sound features with the trained classifier, and scoring the severity of the difficult airway to obtain an evaluation result.
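The pitch-period and voiced/unvoiced extraction step can be sketched with a plain autocorrelation detector; this is a generic technique standing in for the patent's unspecified feature extraction, and the lag range and threshold are assumptions.

```python
import math

# Estimate a pitch period by autocorrelation and classify a frame as voiced
# when its normalized autocorrelation peak is strong (i.e., the frame is
# clearly periodic). A downstream classifier would consume such features.

def autocorr_pitch_period(frame, min_lag=20, max_lag=200):
    """Return (best_lag, normalized_peak) for a mono sample frame."""
    energy = sum(x * x for x in frame) or 1e-12
    best_lag, best_val = min_lag, -1.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1)):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val / energy

def is_voiced(frame, threshold=0.3):
    """A frame with a strong autocorrelation peak is treated as voiced."""
    _, peak = autocorr_pitch_period(frame)
    return peak > threshold

# A sine wave with a 100-sample period is strongly periodic, hence voiced.
voiced = [math.sin(2 * math.pi * i / 100) for i in range(800)]
lag, peak = autocorr_pitch_period(voiced)
print(lag, round(peak, 2))
```

The recovered lag (here near 100 samples) is the pitch period; the normalized peak doubles as the voiced/unvoiced decision statistic.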

ORAL FUNCTION VISUALIZATION SYSTEM, ORAL FUNCTION VISUALIZATION METHOD, AND RECORDING MEDIUM

An oral function visualization system includes: an outputter that outputs information for prompting a user to utter a predetermined voice; an obtainer that obtains an uttered voice of the user uttered in accordance with the output; an analyzer that analyzes the uttered voice obtained by the obtainer; and an estimator that estimates a state of oral organs of the user from a result of analysis of the uttered voice by the analyzer. The outputter outputs, based on the state of the oral organs of the user estimated by the estimator, information for the user to achieve a state of the oral organs suitable for utterance of the predetermined voice.
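The four-part prompt/obtain/analyze/estimate loop can be sketched as below. The feature used (mean amplitude as a stand-in for mouth opening) and the guidance strings are assumptions purely for illustration; the abstract does not specify the analysis.

```python
# Prompt the user, analyze the obtained utterance, estimate an oral-organ
# state, and feed guidance back through the outputter.

def prompt(target_sound):
    """Outputter: information prompting the user to utter a given voice."""
    return f"Please say '{target_sound}'."

def analyze(samples):
    """Analyzer: a crude loudness feature of the uttered voice (assumed)."""
    return sum(abs(s) for s in samples) / len(samples)

def estimate_oral_state(loudness, open_threshold=0.5):
    """Estimator: map the analysis result to a coarse oral-organ state."""
    return "mouth wide open" if loudness >= open_threshold else "mouth nearly closed"

def feedback(state, target_sound):
    """Outputter again: guidance toward the state suited to the sound."""
    if state == "mouth nearly closed":
        return f"Open your mouth wider when saying '{target_sound}'."
    return f"Good articulation of '{target_sound}'."

utterance = [0.1, -0.2, 0.15, -0.1]          # a quiet (closed-mouth) utterance
state = estimate_oral_state(analyze(utterance))
print(feedback(state, "a"))
```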

Urgency level estimation apparatus, urgency level estimation method, and program

An urgency level estimation technique is provided that estimates the urgency level of a speaker from freely uttered speech, without requiring specific words. An urgency level estimation apparatus includes a feature amount extracting part configured to extract a feature amount from uttered speech, and an urgency level estimating part configured to estimate the urgency level of the speaker from that feature amount, based on a predetermined relationship between feature amounts extracted from uttered speech and speaker urgency levels. The feature amount includes at least one of: a feature indicating the speaking speed of the uttered speech, a feature indicating its voice pitch, and a feature indicating its power level.
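An illustrative sketch, not the patented model: the three feature families the abstract names (speaking speed, voice pitch, power) are combined through a fixed linear relationship "determined in advance". The normalizing constants and weights are assumptions.

```python
# Normalize each feature to roughly [0, 1], then take a weighted sum.

def extract_features(syllables_per_sec, mean_pitch_hz, mean_power_db):
    return {
        "speed": syllables_per_sec / 8.0,    # ~8 syl/s assumed as a fast ceiling
        "pitch": mean_pitch_hz / 400.0,      # ~400 Hz assumed as a high-pitch ceiling
        "power": (mean_power_db + 60) / 60,  # -60..0 dBFS mapped to 0..1
    }

def urgency_level(features, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of normalized features, clipped to [0, 1]."""
    score = (weights[0] * features["speed"]
             + weights[1] * features["pitch"]
             + weights[2] * features["power"])
    return max(0.0, min(1.0, score))

# Fast, high-pitched, loud speech scores as more urgent than slow, low, quiet speech.
calm = urgency_level(extract_features(3.0, 120.0, -30.0))
urgent = urgency_level(extract_features(7.0, 300.0, -8.0))
print(round(calm, 2), round(urgent, 2))
```

Because no lexical content enters the score, the estimate works on free utterances, matching the abstract's "does not require a specific word" claim.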

METHOD AND SYSTEM FOR AUTOMATIC DETECTION AND CORRECTION OF SOUND DISTORTION

A computer-implemented method for correcting muffled speech caused by facial coverings is disclosed. The computer-implemented method includes monitoring a user's speech for speech distortion. The computer-implemented method further includes determining that the user's speech is distorted. The computer-implemented method further includes determining that a cause of the user's speech distortion is based, at least in part, on a presence of a particular type of facial covering. The computer-implemented method further includes automatically correcting the speech distortion of the user based, at least in part, on the particular type of facial covering causing the speech distortion.
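A hedged sketch of the disclosed flow: monitor for distortion, attribute it to a facial-covering type, and apply a covering-specific correction. The attenuation profiles and the simple band-energy detector below are invented for illustration; the patent does not disclose specific gains.

```python
# Facial coverings muffle speech mainly by attenuating high frequencies, so
# a weak high-band/low-band energy ratio serves as the distortion monitor,
# and correction boosts the high band by a per-covering gain (assumed values).

COVERING_BOOST_DB = {
    "cloth mask": 4.0,
    "n95": 6.0,
    "face shield": 8.0,
}

def is_distorted(low_band_energy, high_band_energy, ratio_threshold=0.25):
    """Muffled speech shows disproportionately weak high-band energy."""
    return high_band_energy / low_band_energy < ratio_threshold

def correct(high_band_energy, covering):
    """Boost the high band by the covering-specific gain (in dB)."""
    gain_db = COVERING_BOOST_DB[covering]
    return high_band_energy * 10 ** (gain_db / 20)

low, high = 1.0, 0.2
if is_distorted(low, high):
    high = correct(high, "n95")
print(round(high, 3))
```

Determining *which* covering is present (e.g. from a camera or a classifier) is the step the method leaves to the detection stage; here it is simply passed in.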

Audio translator
11605369 · 2023-03-14

An audio translation system includes a feature extractor and a style transfer machine learning model. The feature extractor generates, for each of a plurality of source voice files, one or more source voice parameters encoded as a collection of source feature vectors, and generates, for each of a plurality of target voice files, one or more target voice parameters encoded as a collection of target feature vectors. The style transfer machine learning model is trained on the collection of source feature vectors for the plurality of source voice files and the collection of target feature vectors for the plurality of target voice files to generate a style transformed feature vector.
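A minimal sketch of the two components the abstract names, with the style transfer model stood in by a simple shift between the source and target collection means; the real system would use a trained machine learning model, and the two-dimensional features here are an assumption.

```python
# Feature extractor plus a stand-in "style transfer": move a source feature
# vector by the difference between the target and source collection means.

def extract_features(voice_file):
    """Stand-in extractor: a voice 'file' is just a list of samples here."""
    mean = sum(voice_file) / len(voice_file)
    energy = sum(x * x for x in voice_file) / len(voice_file)
    return [mean, energy]

def style_transform(vec, source_vecs, target_vecs):
    """Shift a source vector by the difference of collection means."""
    dims = len(vec)
    src_mean = [sum(v[d] for v in source_vecs) / len(source_vecs) for d in range(dims)]
    tgt_mean = [sum(v[d] for v in target_vecs) / len(target_vecs) for d in range(dims)]
    return [vec[d] - src_mean[d] + tgt_mean[d] for d in range(dims)]

source_vecs = [extract_features(f) for f in ([0.0, 0.2], [0.1, 0.3])]
target_vecs = [extract_features(f) for f in ([0.5, 0.7], [0.6, 0.8])]
print(style_transform(source_vecs[0], source_vecs, target_vecs))
```

The point of the two-collection design survives even in this toy: the transform is learned (here, computed) from *collections* of source and target vectors, then applied per vector.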

Estimating Clean Speech Features Using Manifold Modeling
20170316790 · 2017-11-02

The technology described in this document can be embodied in a computer-implemented method that includes receiving, at one or more processing devices, a portion of an input signal representing noisy speech, and extracting, from the portion of the input signal, one or more frequency domain features of the noisy speech. The method also includes generating a set of projected features by projecting each of the one or more frequency domain features on a manifold that represents a model of frequency domain features for clean speech. The method further includes using the set of projected features for at least one of: a) generating synthesized speech that represents a noise-reduced version of the noisy speech, b) performing speaker recognition, or c) performing speech recognition.
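A sketch of the projection step under a strong simplifying assumption: the clean-speech manifold is represented by a finite set of sample feature vectors, and "projecting" a noisy feature means snapping it to the nearest manifold sample. The real method models a continuous manifold; the toy features below are invented.

```python
# Nearest-neighbor stand-in for projecting a noisy frequency-domain feature
# onto a manifold of clean-speech features.

def project_onto_manifold(noisy_vec, manifold_samples):
    """Return the manifold sample closest (in squared distance) to noisy_vec."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(manifold_samples, key=lambda m: sq_dist(noisy_vec, m))

# Toy 2-D "frequency domain features": clean speech lies near the line y = x.
clean_manifold = [[i / 10, i / 10] for i in range(11)]
noisy = [0.32, 0.48]                     # a clean feature plus noise
print(project_onto_manifold(noisy, clean_manifold))
```

The projected feature then feeds any of the three listed uses: speech synthesis of a denoised signal, speaker recognition, or speech recognition.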

TECHNOLOGIES FOR AUTOMATIC SPEECH RECOGNITION USING ARTICULATORY PARAMETERS
20170278517 · 2017-09-28

Technologies for automatic speech recognition using articulatory parameters are disclosed. An automatic speech recognition device may capture speech data from a speaker and also capture an image of the speaker. The device may determine one or more articulatory parameters based on the image, such as a jaw angle, a lip protrusion, or a lip height, and compare those parameters with the articulatory parameters of training users. After selecting training users with articulatory parameters similar to the speaker's, the device may select training data associated with the selected training users, including parameters to use for an automatic speech recognition algorithm. By using parameters already optimized for training users with similar articulatory parameters, the device may quickly adapt an automatic speech recognition algorithm to the speaker.
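The adaptation step can be sketched as a nearest-neighbor match over the named articulatory parameters (jaw angle, lip protrusion, lip height). The training-user values, the stored ASR settings, and the `lm_weight` key are invented for illustration.

```python
# Match a speaker's image-derived articulatory parameters to the closest
# training user and reuse that user's ASR settings as the starting point.

TRAINING_USERS = {
    # name: ((jaw_deg, protrusion_mm, lip_height_mm), asr_settings)
    "user_a": ((18.0, 2.0, 9.0), {"lm_weight": 0.8}),
    "user_b": ((25.0, 4.5, 12.0), {"lm_weight": 1.1}),
    "user_c": ((30.0, 6.0, 14.0), {"lm_weight": 1.3}),
}

def nearest_training_user(params):
    """Pick the training user with the closest articulatory parameters."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(TRAINING_USERS, key=lambda u: sq_dist(params, TRAINING_USERS[u][0]))

speaker = (24.0, 4.0, 11.5)              # measured from the speaker's image
match = nearest_training_user(speaker)
print(match, TRAINING_USERS[match][1])
```

A production system would weight or normalize each parameter before comparing, since degrees and millimeters are not directly commensurable; the unweighted distance here is a simplification.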