G10L2021/0575

PERSONALIZED VOICE CONVERSION SYSTEM

A personalized voice conversion system includes a cloud server and an intelligent device that communicates with the cloud server. The intelligent device upstreams an original voice signal to the cloud server. The cloud server converts the original voice signal into an intelligible voice signal based on an intelligible voice conversion model. The intelligent device downloads and plays the intelligible voice signal. Based on the original voice signal and the corresponding intelligible voice signal, the cloud server and the intelligent device train an off-line voice conversion model provided to the intelligent device. When the intelligent device stops communicating with the cloud server, the intelligent device converts a new original voice signal into a new intelligible voice signal based on the off-line voice conversion model and plays the new intelligible voice signal.
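The cloud/off-line split described in the abstract can be sketched as a simple control flow. This is an illustrative outline only, not the patented implementation: `VoiceConversionDevice`, the callable models, and the per-sample-gain "training" are all placeholder assumptions.

```python
# Hypothetical sketch of the cloud/off-line voice conversion flow.
# A "model" here is any callable mapping an original signal (a list of
# samples) to an intelligible one.

class VoiceConversionDevice:
    def __init__(self, cloud_model):
        self.cloud_model = cloud_model      # used while connected
        self.offline_model = None           # trained from cloud pairs
        self.training_pairs = []            # (original, intelligible)
        self.connected = True

    def convert(self, original):
        if self.connected:
            # Upstream to the cloud, download the converted signal,
            # and keep the pair for off-line model training.
            intelligible = self.cloud_model(original)
            self.training_pairs.append((original, intelligible))
            return intelligible
        if self.offline_model is None:
            raise RuntimeError("no off-line model trained yet")
        return self.offline_model(original)

    def train_offline_model(self):
        # Stand-in for real training: learn an average per-sample gain
        # from the collected (original, intelligible) pairs.
        gains = [i / o for orig, intel in self.training_pairs
                 for o, i in zip(orig, intel) if o != 0]
        gain = sum(gains) / len(gains)
        self.offline_model = lambda sig: [s * gain for s in sig]
```

Once the device stops communicating with the cloud (`connected = False`), `convert` falls back to the off-line model trained from the collected pairs.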

HEARING DEVICE COMPRISING AN ADAPTIVE FILTER BANK

A hearing device comprises a) at least one input transducer configured to pick up sound from an acoustic environment around the user when the user is wearing the hearing device, the at least one input transducer providing at least one electric input signal representative of said sound, b) at least one analysis filter bank configured to provide said at least one electric input signal as a multitude of frequency sub-band signals, the at least one analysis filter bank comprising b1) a plurality of M first filters h.sub.m(n), whose impulse responses are modulated from a first prototype filter h(n), where m=0, 1, . . . , M−1 is a frequency band index, and n is a time index, c) a processor for processing said at least one electric input signal provided by said at least one analysis filter bank, or a signal originating therefrom, and providing a processed signal, d) an output transducer configured to provide stimuli perceivable as sound to the user in dependence of said processed signal, and e) a controller for controlling said analysis filter bank by applying a different first prototype filter to said at least one filter bank in dependence of said current acoustic environment. A method of operating a hearing device is further disclosed.
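The modulated filter bank of element b1) — M first filters h.sub.m(n) whose impulse responses are modulated from a first prototype filter h(n) — can be illustrated with a complex (DFT-style) modulation of a single prototype low-pass filter. The specific modulation is an assumption for this sketch, not the disclosed design.

```python
import numpy as np

def modulated_filter_bank(prototype, M):
    """Build M band filters h_m(n) by complex modulation of one
    prototype low-pass filter h(n): h_m(n) = h(n) * exp(j*2*pi*m*n/M),
    m = 0, 1, ..., M-1 (frequency band index), n a time index."""
    n = np.arange(len(prototype))
    return [prototype * np.exp(2j * np.pi * m * n / M) for m in range(M)]

def analyze(signal, filters):
    """Split an electric input signal into frequency sub-band signals."""
    return [np.convolve(signal, h, mode="same") for h in filters]
```

A controller in the sense of element e) would then rebuild the bank by calling `modulated_filter_bank` with a different prototype filter when the current acoustic environment changes.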

TRAINING APPARATUS, METHOD OF THE SAME AND PROGRAM

A training device changes feedback formant frequencies, which are formant frequencies of a picked-up speech signal; applies a lowpass filter to convert the picked-up speech signal; adds high-pass noise to the converted speech signal; and feeds the converted speech signal with the high-pass noise added back to a subject. The device calculates a compensatory response vector from two sets of pickup formant frequencies: the formant frequencies of a speech signal acquired by picking up an utterance made by the subject while feeding back a speech signal converted with the feedback formant frequencies changed, and those acquired while feeding back a speech signal converted without that change. The device then determines an evaluation based on the compensatory response vector and a correct compensatory response vector.
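As a sketch, the compensatory response vector can be read as the per-formant difference between what the subject produced under shifted versus unshifted feedback, with the evaluation comparing it to a correct response vector. The distance-based score below is an illustrative assumption; the abstract does not specify the metric.

```python
import math

def compensatory_response(formants_shifted, formants_unshifted):
    """Compensatory response vector: per-formant difference between the
    pickup formants under changed vs. unchanged feedback formants."""
    return [a - b for a, b in zip(formants_shifted, formants_unshifted)]

def evaluate(response, correct_response):
    """Illustrative score: negated Euclidean distance to the correct
    compensatory response vector (0.0 is a perfect match)."""
    return -math.sqrt(sum((r - c) ** 2
                          for r, c in zip(response, correct_response)))
```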

AUDIO OUTPUT MODULE FOR USE IN ARTIFICIAL VOICE SYSTEMS

The invention disclosed is an improved audio output module for use with an artificial voice generation device, having a housing separated into a sound system chamber, an interface chamber, and a power source chamber. The interface and power source chambers may be combined. The sound chamber is isolated from external air by the housing, the cover plate, and a separating wall, which separates it from the other chambers of the module. Volumetric parameters based on speaker characteristics and design requirements can thus be implemented independently of the choice of interface type. The module is configurable to be mounted to an external structure or to a speech generating system. It may likewise be detachable from a quick release cradle and receive wireless audio signals from the speech generating system.

Synthesized data augmentation using voice conversion and speech recognition models

A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.
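The three stages of the method (adapt the TTS model, synthesize speech for unspoken texts, train the conversion model) can be sketched as a plain pipeline. The callables below stand in for the real TTS adaptation and conversion-model training, which the abstract does not detail.

```python
def augment_and_train(spoken_utterances, unspoken_texts,
                      adapt_tts, train_conversion):
    """spoken_utterances: (transcription, non-synthetic speech) pairs
    from the target speaker with atypical speech; adapt_tts and
    train_conversion are placeholder callables for the model-training
    steps."""
    # 1. Adapt the TTS model so it synthesizes speech in the target
    #    speaker's voice, capturing the atypical speech.
    tts = adapt_tts(spoken_utterances)
    # 2. Generate a synthetic speech representation for every unspoken
    #    training text utterance.
    synthetic = [tts(text) for text in unspoken_texts]
    # 3. Train the speech conversion model on the synthetic
    #    representations.
    return train_conversion(synthetic)
```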

Device and method for generating synchronous corpus

A device and a method for generating a synchronous corpus are disclosed. Firstly, script data and a dysarthria voice signal having a dysarthria consonant signal are received, and the position of the dysarthria consonant signal is detected, wherein the script data have text corresponding to the dysarthria voice signal. Then, normal phoneme data corresponding to the text are searched, and the text is converted into a normal voice signal, containing a normal consonant signal, based on the normal phoneme data. The dysarthria consonant signal is replaced with the normal consonant signal based on the positions of the two consonant signals, thereby synchronously converting the dysarthria voice signal into a synthesized voice signal. The synthesized voice signal and the dysarthria voice signal are provided to train a voice conversion model, retaining the timbre of the dysarthric voice and improving communication.
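The splice step — replacing the dysarthric consonant segment with the normal one at its detected position — can be sketched on raw sample lists. The span indices are assumed to come from the detection and alignment stages described above.

```python
def replace_consonant(dysarthria_signal, normal_signal,
                      dys_span, norm_span):
    """Splice the normal consonant segment into the dysarthric signal,
    keeping everything outside the detected consonant span unchanged.
    Spans are (start, end) sample indices, assumed already aligned."""
    d_start, d_end = dys_span
    n_start, n_end = norm_span
    return (dysarthria_signal[:d_start]
            + normal_signal[n_start:n_end]
            + dysarthria_signal[d_end:])
```

Everything outside the consonant span is left untouched, which is what preserves the speaker's own timbre in the synthesized signal.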

INTRA-ORAL DEVICE FOR FACILITATING COMMUNICATION
20220300083 · 2022-09-22

A method comprising determining, by a processing system, based on a first oral gesture detected by an intra-oral device located in a mouth of a user, an intended communication partner from among a plurality of available communication partners; determining, by the processing system, a message based on a series of one or more second oral gestures detected by the intra-oral device; and sending, by the processing system, the message to a communication device associated with the intended communication partner.
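The two-stage decoding — the first oral gesture selects the intended communication partner, the series of second gestures is decoded into the message — can be sketched with lookup tables. The gesture vocabularies here are hypothetical placeholders, not part of the disclosure.

```python
def route_message(first_gesture, gesture_sequence,
                  partner_by_gesture, symbol_by_gesture):
    """Map the first oral gesture to an intended partner from the
    available set, decode the remaining gestures into a message, and
    return (partner, message) for sending to the partner's device."""
    partner = partner_by_gesture[first_gesture]
    message = "".join(symbol_by_gesture[g] for g in gesture_sequence)
    return partner, message
```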
