G10L21/057

SPEAKING RHYTHM TRANSFORMATION APPARATUS, MODEL LEARNING APPARATUS, METHODS THEREFOR, AND PROGRAM

It is intended to accurately convert a speech rhythm. A model storage unit (10) stores a speech rhythm conversion model which is a neural network that receives, as an input thereto, a first feature value vector including information related to a speech rhythm of at least a phoneme extracted from a first speech signal resulting from a speech uttered by a speaker in a first group, converts the speech rhythm of the first speech signal to a speech rhythm of a speaker in a second group, and outputs the speech rhythm of the speaker in the second group. A feature value extraction unit (11) extracts, from the input speech signal resulting from the speech uttered by the speaker in the first group, information related to a vocal tract spectrum and information related to the speech rhythm. A conversion unit (12) inputs the first feature value vector including the information related to the speech rhythm extracted from the input speech signal to the speech rhythm conversion model and obtains the post-conversion speech rhythm. A speech synthesis unit (13) uses the post-conversion speech rhythm and the information related to the vocal tract spectrum extracted from the input speech signal to generate an output speech signal.
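The pipeline the abstract describes (feature extraction, rhythm conversion, resynthesis) can be illustrated with a toy sketch. Every component below is an invented placeholder: the "features," the scaling stand-in for the neural rhythm-conversion model, and the synthesis rule are not the patent's actual methods.

```python
# Hypothetical sketch of the described pipeline: extract vocal-tract and
# rhythm information from an input signal, convert the rhythm with a model,
# then resynthesize. All components are toy placeholders.

def extract_features(signal):
    """Split a toy 'signal' into vocal-tract info and rhythm info."""
    spectrum = [s * 0.5 for s in signal]          # stand-in spectral envelope
    durations = [abs(s) % 3 + 1 for s in signal]  # stand-in phoneme durations
    return spectrum, durations

def convert_rhythm(durations, scale=2.0):
    """Stand-in for the neural rhythm-conversion model (here: a scaling)."""
    return [d * scale for d in durations]

def synthesize(spectrum, durations):
    """Toy synthesis: repeat each spectral value by its (rounded) duration."""
    out = []
    for s, d in zip(spectrum, durations):
        out.extend([s] * round(d))
    return out

signal = [1, -2, 4]
spectrum, rhythm = extract_features(signal)
output = synthesize(spectrum, convert_rhythm(rhythm))
```

The key structural point mirrored here is that the spectral information passes through unchanged while only the durations are transformed, which is how the abstract separates "vocal tract spectrum" from "speech rhythm."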

Synthesized Data Augmentation Using Voice Conversion and Speech Recognition Models

A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.
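The augmentation recipe (adapt a TTS on the target speaker's paired data, synthesize the unspoken text, train on the result) can be sketched as follows. The "TTS," the voice marker, and the trainer below are illustrative stubs, not the patent's models.

```python
# Minimal sketch of the data-augmentation recipe: adapt a TTS stand-in on
# the target speaker's spoken utterances, synthesize speech for unspoken
# text, and train a conversion model on the synthetic output.

def adapt_tts(spoken_utterances):
    """'Adapt' a TTS: here, learn a crude per-speaker voice marker
    from the paired (transcription, speech_representation) data."""
    style = spoken_utterances[0][1].split(":")[0]
    return lambda text: f"{style}:{text}"

def train_conversion_model(synthetic_speech):
    """Stand-in trainer: just record how many examples were used."""
    return {"num_examples": len(synthetic_speech)}

spoken = [("hello", "spk7:hello"), ("thanks", "spk7:thanks")]
unspoken_text = ["good morning", "see you"]

tts = adapt_tts(spoken)                    # adaptation step
synthetic = [tts(t) for t in unspoken_text]  # one synthetic utterance per text
model = train_conversion_model(synthetic)
```

The point of the structure is that the unspoken text never needs a recording from the speaker; the adapted TTS supplies the speaker-specific (and, in the patent, atypical) acoustics.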

Audio output module for use in artificial voice systems

The invention disclosed is an improved audio output module for use with an artificial voice generation device, having a housing separated into a sound system chamber, an interface chamber, and a power source chamber. The interface and power source chambers may be combined. The sound chamber is isolated from external air by the housing, the cover plate, and a separating wall that divides it from the other chambers of the module. Volumetric parameters based on speaker characteristics and design requirements can thus be implemented independently of the choice of interface type. The module is configurable to be mounted to an external structure or to a speech generating system. It may likewise be detachably held in a quick-release cradle and receive wireless audio signals from the speech generating system.

ELECTRONIC APPARATUS AND CONTROLLING METHOD THEREOF

Provided are an electronic apparatus and a controlling method thereof. The electronic apparatus includes an inputter and a processor configured to, based on receiving an audio signal through the inputter, obtain a speech intelligibility for the audio signal, and modify the audio signal so that the speech intelligibility becomes a target intelligibility that is set based on scene information regarding a type of audio included in the audio signal, and the type of audio includes at least one of a sound effect, shouting, music, or a speech.
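The control loop described here (scene type → target intelligibility → signal modification) can be sketched with a toy metric. The intelligibility measure, the per-scene targets, and the proportional gain rule are all invented placeholders.

```python
# Toy sketch of the described control loop: pick a target intelligibility
# from scene information, then adjust the signal toward that target.

TARGETS = {"sound_effect": 0.4, "shouting": 0.5, "music": 0.6, "speech": 0.9}

def intelligibility(signal):
    """Stand-in metric: mean absolute level, clipped to [0, 1]."""
    level = sum(abs(s) for s in signal) / len(signal)
    return min(level, 1.0)

def modify(signal, scene):
    """Scale the signal so the stand-in metric hits the scene's target."""
    target = TARGETS[scene]
    current = intelligibility(signal)
    if current == 0:
        return signal
    gain = target / current  # simple proportional correction
    return [s * gain for s in signal]

audio = [0.2, -0.1, 0.3]
out = modify(audio, "speech")
```

A dialogue-heavy scene gets a high target while music or effects get lower ones, which is the scene-dependent behavior the abstract claims.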


DIAGNOSING AND TREATMENT OF SPEECH PATHOLOGIES USING ANALYSIS BY SYNTHESIS TECHNOLOGY
20210158834 · 2021-05-27 ·

Provided herein are a method and system for creating a speech/language pathologies classifier, the method comprising: producing a pathological speech repository of pathological speech samples covering multiple impairments; computing speech qualities/pathologies based on data received from the pathological speech repository; producing a text repository comprising multiple known text passages; converting each passage in a selection from the multiple known text passages to a speech segment while introducing to the speech segment one or more of the computed speech pathologies, thereby creating multiple synthetic impaired speech segments; and training a classifier with the multiple synthetic impaired speech segments, thereby creating a speech/language pathologies classifier.
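The claimed flow (known text → injected pathology effects → synthetic impaired segments → classifier training) can be sketched as below. The two text-level "pathologies" and the one-example-per-label "classifier" are illustrative stubs only.

```python
# Rough sketch of the claimed flow: take known text passages, inject
# computed pathology "effects" to create synthetic impaired segments,
# and train a classifier on them. All components are toy stand-ins.

PATHOLOGIES = {
    "stutter": lambda t: t[0] + "-" + t,       # repeat the first sound
    "lisp": lambda t: t.replace("s", "th"),    # crude lisp substitution
}

def make_synthetic(passages):
    """Create one labeled impaired segment per (passage, pathology) pair."""
    data = []
    for text in passages:
        for label, distort in PATHOLOGIES.items():
            data.append((distort(text), label))
    return data

def train_classifier(data):
    """Toy 'classifier': remember one example per label."""
    return {label: sample for sample, label in data}

passages = ["say something"]
dataset = make_synthetic(passages)
clf = train_classifier(dataset)
```

The structural idea mirrored here is that labels come for free: because each impairment is injected synthetically, every training segment is labeled by construction.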

Method of operating a hearing device and a hearing device providing speech enhancement based on an algorithm optimized with a speech intelligibility prediction algorithm

A method of training an algorithm for optimizing intelligibility of speech components of a sound signal in hearing aids, headsets, etc., comprises a) providing a first database comprising a multitude of predefined time segments of first electric input signals representing sound and corresponding measured speech intelligibilities; b) determining optimized first parameters of a first algorithm by optimizing it with said predefined time segments and said corresponding measured speech intelligibilities, the first algorithm providing corresponding predicted speech intelligibilities; c) providing a second database comprising a multitude of time segments of second electric input signals representing sound, d) determining optimized second parameters of a second algorithm by optimizing it with said multitude of time segments, said second algorithm being configured to provide processed second electric input signals exhibiting respective predicted speech intelligibilities estimated by said first algorithm, said optimizing being conducted under a constraint of maximizing said predicted speech intelligibility.
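The two-stage scheme in steps a)–d) can be condensed into a toy sketch: first fit a predictor of measured intelligibility, then tune an enhancement parameter to maximize the predictor's score. Both one-parameter "models" below are invented stand-ins for the patent's algorithms.

```python
# Schematic of the two-stage optimization: stage 1 fits an intelligibility
# predictor on measured data; stage 2 tunes an enhancement gain to maximize
# the predictor's output. Both models are one-parameter toys.

def fit_predictor(segments, measured):
    """Stage 1: least-squares slope so that predict(x) ~= measured."""
    num = sum(x * y for x, y in zip(segments, measured))
    den = sum(x * x for x in segments)
    slope = num / den
    return lambda x: slope * x

def fit_enhancer(segments, predictor, gains=(0.5, 1.0, 1.5, 2.0)):
    """Stage 2: choose the gain maximizing predicted intelligibility,
    capped at 1.0 so 'louder' cannot score above perfect."""
    def score(g):
        return sum(min(predictor(g * x), 1.0) for x in segments)
    return max(gains, key=score)

levels = [0.2, 0.4, 0.6]
measured = [0.1, 0.2, 0.3]        # measured intelligibilities (first database)
predict = fit_predictor(levels, measured)
best_gain = fit_enhancer(levels, predict)  # second database reused here
```

The essential dependency is the same as in the abstract: stage 2 never sees measured intelligibility directly, only the stage-1 predictor's estimate of it.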

Devices for Real-time Speech Output with Improved Intelligibility
20240005944 · 2024-01-04 ·

Devices providing real-time speech output with improved intelligibility are described. One example embodiment includes a device. The device includes a microphone configured to capture one or more frames of unintelligible speech from a user. The device also includes an analog-to-digital converter (ADC) configured to convert the one or more captured frames of unintelligible speech into a digital representation. Additionally, the device includes a computing device. The computing device is configured to receive the digital representation from the ADC. The computing device is also configured to apply a machine-learned model to the digital representation to generate one or more frames with improved intelligibility. Further, the computing device is configured to output the one or more frames with improved intelligibility. In addition, the device includes a digital-to-analog converter (DAC) configured to convert the one or more frames with improved intelligibility into an analog form. Yet further, the device includes a speaker.
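The device's frame-by-frame chain (ADC → model → DAC) can be sketched numerically. The quantizer, the 3-tap smoothing filter standing in for the machine-learned model, and the level mapping are all invented placeholders.

```python
# Frame-by-frame sketch of the described chain: ADC -> model -> DAC, with
# each stage reduced to a simple numeric stand-in.

def adc(analog_frame, levels=256):
    """Toy ADC: quantize samples in [-1, 1] to integer codes."""
    return [int((s + 1) / 2 * (levels - 1)) for s in analog_frame]

def enhance(digital_frame):
    """Stand-in for the machine-learned model: 3-tap moving average."""
    padded = [digital_frame[0]] + digital_frame + [digital_frame[-1]]
    return [sum(padded[i:i + 3]) // 3 for i in range(len(digital_frame))]

def dac(digital_frame, levels=256):
    """Toy DAC: map integer codes back to [-1, 1]."""
    return [d / (levels - 1) * 2 - 1 for d in digital_frame]

frame = [0.0, 0.5, -0.5, 0.0]
out = dac(enhance(adc(frame)))  # one processed frame, ready for the speaker
```

Processing one fixed-size frame at a time, rather than a whole utterance, is what makes the chain compatible with the real-time operation the abstract emphasizes.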
