G10L21/057

Determining a Playback Rate of Media for a Requester
20170238026 · 2017-08-17

A method, a system, and a computer program product for providing media to a requester at a particular playback rate associated with the requester. The method includes receiving a request from a requester for a playback session of media that includes time-varying content. In response to receiving the request, a profile associated with the requester is accessed to determine a playback rate of the media for the requester. Once the playback rate has been determined, the media is provided to the requester at that rate. The method further includes monitoring the playback session of the media for playback changes by the requester and dynamically adapting the playback rate associated with the requester based on the type and frequency of playback changes.
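The adaptation step in this abstract can be illustrated with a minimal sketch. The abstract does not say how the type and frequency of playback changes map to a new rate, so everything here is an assumption made for illustration: the `RequesterProfile` class, the event names (`speed_up`, `slow_down`, `rewind`), the step size, and the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RequesterProfile:
    """Hypothetical per-requester profile holding a stored playback rate."""
    playback_rate: float = 1.0
    min_rate: float = 0.5
    max_rate: float = 3.0


def adapt_rate(profile, events, step=0.05, threshold=3):
    """Nudge the stored rate using the type and frequency of playback changes.

    `events` is a list of change types observed during one session.
    The event names and the counting rule are assumptions, not the patent's.
    """
    faster = events.count("speed_up")
    # Treat rewinds as a sign the content went by too quickly (assumption).
    slower = events.count("slow_down") + events.count("rewind")
    if faster - slower >= threshold:
        profile.playback_rate += step
    elif slower - faster >= threshold:
        profile.playback_rate -= step
    # Keep the stored rate inside the allowed range.
    profile.playback_rate = min(profile.max_rate,
                                max(profile.min_rate, profile.playback_rate))
    return profile.playback_rate
```

In this sketch a session with three or more net "faster" signals raises the stored rate by one step, so the next request from the same requester would start at the adapted rate.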

Playback apparatus, setting apparatus, playback method, and program
09728201 · 2017-08-08

A playback apparatus includes: an acquiring unit that acquires auditory language data including data to be played back as a spoken voice; an analyzing unit that analyzes the auditory language data to output an analysis result; a setting unit that sets at least a portion of the auditory language data to a control portion to be played back at a set playback speed, based on the analysis result; and a voice playback unit that plays back the control portion as a spoken voice at the set playback speed.
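As a rough illustration of the analyze-then-set flow described here, the sketch below marks some sentences as control portions and assigns them a set playback speed. The abstract does not specify what the analysis looks for, so the rule used (sentences containing digits are slowed down), the function name `set_control_portions`, and the speed values are all hypothetical.

```python
import re


def set_control_portions(sentences, slow_speed=0.8, normal_speed=1.0):
    """Return (sentence, speed) pairs for a list of sentences.

    Assumption for illustration only: a sentence containing a digit
    (a phone number, a date, a price) is treated as a control portion
    and is assigned the slower set playback speed.
    """
    plan = []
    for sentence in sentences:
        is_control = bool(re.search(r"\d", sentence))
        plan.append((sentence, slow_speed if is_control else normal_speed))
    return plan
```

A voice playback stage could then pass each pair to a speech synthesizer that accepts a rate parameter; that stage is outside the scope of this sketch.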

METHOD AND ELECTRONIC UNIT FOR ADJUSTING PLAYBACK SPEED OF MEDIA FILES
20170322766 · 2017-11-09

A method for adjusting speed of playback of at least a segment of a media file, comprising generating a text file by speech-to-text conversion of the media file; and determining a speed measure for the media file, including determining a plurality of speech elements in the text file, and associating a time stamp with each speech element of the generated text file. The method may further include determining a degree of comprehensibility of the media file; and adjusting a current speed of playback of the media file based on the determined speed measure and the determined degree of comprehensibility.
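One way to read the speed measure is as a speech rate computed from the time-stamped speech elements. The sketch below assumes each element is a word with a start time in seconds and that comprehensibility is a score between 0 and 1; the blending rule, the target rate, and the function names are hypothetical choices, since the abstract does not define how the two quantities are combined.

```python
def speech_rate(elements):
    """Return speech elements per second from (text, start_seconds) pairs.

    With N start times there are N - 1 intervals, so the rate is
    (N - 1) divided by the time between the first and last element.
    """
    if len(elements) < 2:
        return 0.0
    duration = elements[-1][1] - elements[0][1]
    if duration <= 0:
        return 0.0
    return (len(elements) - 1) / duration


def adjust_speed(current_speed, measured_rate, target_rate,
                 comprehensibility, lo=0.5, hi=2.0):
    """Move the playback speed toward the one that yields `target_rate`.

    Assumption: `comprehensibility` in [0, 1] scales how far to move;
    0 keeps the current speed, 1 applies the full correction.
    """
    if measured_rate <= 0:
        return current_speed
    proposed = current_speed * target_rate / measured_rate
    blended = current_speed + comprehensibility * (proposed - current_speed)
    return min(hi, max(lo, blended))
```

For example, five words starting at 0.0, 0.5, 1.0, 1.5, and 2.0 seconds give a measured rate of 2 words per second; with a hypothetical target of 3 words per second and full comprehensibility, the sketch proposes a playback speed of 1.5.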

Voice processing method for processing voice signal representing voice, voice processing device for processing voice signal representing voice, and recording medium storing program for processing voice signal representing voice
11348596 · 2022-05-31

A voice processing method realized by a computer includes compressing forward a first steady period of a plurality of steady periods in a voice signal representing voice, and extending forward a transition period between the first steady period and a second steady period of the plurality of steady periods in the voice signal. Each of the plurality of steady periods is a period in which acoustic characteristics are temporally stable. The second steady period is a period immediately after the first steady period and has a pitch that is different from a pitch of the first steady period.

Personalized voice conversion system

A personalized voice conversion system includes a cloud server and an intelligent device that communicates with the cloud server. The intelligent device uploads an original voice signal to the cloud server. The cloud server converts the original voice signal into an intelligible voice signal based on an intelligible voice conversion model. The intelligent device downloads and plays the intelligible voice signal. Based on the original voice signal and the corresponding intelligible voice signal, the cloud server and the intelligent device train an off-line voice conversion model provided to the intelligent device. When the intelligent device stops communicating with the cloud server, the intelligent device converts a new original voice signal into a new intelligible voice signal based on the off-line voice conversion model and plays the new intelligible voice signal.

Synthesized data augmentation using voice conversion and speech recognition models

A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.

COMPUTER-READABLE RECORDING MEDIUM HAVING STORED THEREIN PROGRAM FOR GENERATING MODEL, INFORMATION PROCESSING APPARATUS, AND METHOD FOR GENERATING MODEL
20230306984 · 2023-09-28

A computer-readable recording medium has stored therein a program for causing a computer to execute a process including: generating a voice processing model by executing machine learning using training data, the training data associating first training voice data obtained with a first microphone, second training voice data obtained with a second microphone different from the first microphone, and clarified training voice data with one another, the clarified training voice data being obtained by a clarifying process on voice contained in at least one of the first training voice data and the second training voice data, the voice processing model generating clarified voice data in response to input of first inference voice data and second inference voice data.