IPIQ

G10L21/043

SPEECH SYNTHESIS METHOD AND APPARATUS, AND READABLE STORAGE MEDIUM

20230075891 · 2023-03-09 ·

A speech synthesis method includes: converting a text input sequence into a text feature representation sequence; inputting the text feature representation sequence into an encoder including N encoding layers; the N encoding layers including an encoding layer E_i and an encoding layer E_{i+1}; the encoding layer E_{i+1} including a first multi-head self-attention network; acquiring a first attention matrix and a historical text encoded sequence outputted by the encoding layer E_i, and generating a second attention matrix of the encoding layer E_{i+1} according to residual connection between the first attention matrix and the first multi-head self-attention network and the historical text encoded sequence; and generating a target text encoded sequence of the encoding layer E_{i+1} according to the second attention matrix and the historical text encoded sequence, and generating synthesized speech data matched with the text input sequence based on the target text encoded sequence.
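As a rough illustration of the residual attention idea in this abstract, the sketch below computes self-attention scores for encoding layer i+1 and adds the score matrix handed over by layer i before normalizing. The abstract does not say where the residual is added, how many heads are used, or how queries and keys are projected, so the pre-softmax addition, the single head, the identity projections, and every function name here are assumptions made only for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(matrix):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in matrix:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def residual_attention_layer(hidden, prev_scores=None):
    """One simplified encoding layer (hypothetical, single head).

    hidden      -- T x d encoded sequence output by the previous layer
    prev_scores -- T x T attention matrix of the previous layer, or None
    Returns (new_hidden, scores) so the next layer can reuse `scores`.
    """
    d = len(hidden[0])
    # Assumption: identity projections, so Q = K = V = hidden.
    keys_t = [list(col) for col in zip(*hidden)]
    scores = [[v / math.sqrt(d) for v in row] for row in matmul(hidden, keys_t)]
    if prev_scores is not None:
        # Residual connection on the attention matrix itself (assumed pre-softmax).
        scores = [[s + p for s, p in zip(s_row, p_row)]
                  for s_row, p_row in zip(scores, prev_scores)]
    weights = softmax_rows(scores)
    return matmul(weights, hidden), scores

# Two stacked layers: the second one reuses the first layer's score matrix.
tokens = [[1.0, 0.0], [0.0, 1.0]]
out1, scores1 = residual_attention_layer(tokens)
out2, scores2 = residual_attention_layer(out1, scores1)
```

Returning the raw scores alongside the new hidden states is what lets each layer pass its attention matrix forward; a real encoder would add learned projections, multiple heads, and feed-forward sublayers.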

SYSTEMS AND METHODS FOR PROVIDING AUDIO-FILE LOOP-PLAYBACK FUNCTIONALITY

20230133084 · 2023-05-04 ·

Systems and methods for providing audio-file loop-playback functionality are provided. The system includes a processor that performs a method including setting a playback loop start-point based on a first selection of a button; setting a loop end-point, associating a loop with an audio file, and entering into the loop based on a second selection of the button; and exiting the loop based on a third selection of the button. Associating the loop with the audio file includes adding metadata to the audio file. The metadata associates the loop with a button. The method includes reentering the loop based on a fourth selection of the button and exiting the loop based on a fifth selection of the button.
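The button sequence described in this abstract can be read as a small state machine. The sketch below is one possible reading, not the claimed implementation: the `LoopButton` class, the `loops` metadata key, and the use of a plain dict in place of real audio-file metadata are all hypothetical.

```python
class LoopButton:
    """Hypothetical single-button loop controller."""

    def __init__(self, button_id, metadata):
        self.button_id = button_id
        self.metadata = metadata  # dict standing in for the audio file's metadata
        self.start = None         # loop start-point, set by the first selection
        self.looping = False

    def _loop(self):
        return self.metadata.get("loops", {}).get(self.button_id)

    def press(self, position):
        """Handle one selection of the button at the given playback position."""
        loop = self._loop()
        if loop is None and self.start is None:
            # First selection: set the loop start-point.
            self.start = position
        elif loop is None:
            # Second selection: set the end-point, store the loop in the
            # metadata under this button, and enter the loop.
            self.metadata.setdefault("loops", {})[self.button_id] = {
                "start": self.start,
                "end": position,
            }
            self.looping = True
        else:
            # Third, fourth, fifth, ... selections: exit or re-enter the loop.
            self.looping = not self.looping
        return self.looping

    def next_position(self, position):
        """Return where playback should continue from after reaching `position`."""
        loop = self._loop()
        if self.looping and loop is not None and position >= loop["end"]:
            return loop["start"]
        return position

metadata = {}
button = LoopButton("A", metadata)
states = [button.press(p) for p in (10.0, 20.0, 25.0, 30.0, 35.0)]
```

Because the loop lives in the metadata keyed by button, a later session that loads the same metadata could re-enter the loop with a single press. The sketch does not validate that the end-point comes after the start-point.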

COMMUNICATION APPARATUS MOUNTED WITH SPEECH SPEED CONVERSION DEVICE

20170345444 · 2017-11-30 ·

Toshimichi Tokuda

In a communication apparatus, an encoder compresses telephone call voice transmitted from another communication apparatus. A voice accumulator stores the compressed telephone call voice as a message. A decoder expands the telephone call voice stored in the voice accumulator. A signal memory temporarily holds the telephone call voice expanded by the decoder. A speech speed convertor performs speech speed conversion on the telephone call voice read from the signal memory and outputs the resulting voice from a speaker. A memory monitor temporarily stops the decoder from expanding the telephone call voice when the memory monitor determines that the idle capacity of the signal memory is approaching a predetermined lower limit value.
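The memory-monitor behaviour can be pictured as back-pressure between a fast decoder and a slower speech speed convertor. The following is a minimal simulation under assumed numbers: the `decode` stand-in, the per-tick rates, and the frame-count capacity model are hypothetical and are not taken from the patent.

```python
from collections import deque

def decode(frame):
    """Hypothetical stand-in for expanding one compressed frame."""
    return f"pcm:{frame}"

def simulate(compressed_frames, capacity, lower_limit,
             decode_per_tick=2, read_per_tick=1):
    """Play all frames while pausing the decoder when the buffer is nearly full.

    Returns (played_frames, peak_buffer_fill).
    """
    if lower_limit >= capacity or decode_per_tick < 1 or read_per_tick < 1:
        raise ValueError("need lower_limit < capacity and positive rates")
    pending = deque(compressed_frames)  # voice accumulator contents
    memory = deque()                    # signal memory
    played = []
    peak = 0
    while pending or memory:
        for _ in range(decode_per_tick):
            idle = capacity - len(memory)
            # Memory monitor: stop expanding once idle capacity reaches the limit.
            if not pending or idle <= lower_limit:
                break
            memory.append(decode(pending.popleft()))
        peak = max(peak, len(memory))
        # Speech speed convertor: slowed playback drains fewer frames per tick.
        for _ in range(read_per_tick):
            if memory:
                played.append(memory.popleft())
    return played, peak

played, peak = simulate(list(range(10)), capacity=4, lower_limit=1)
```

With these assumed settings the buffer never holds more than `capacity - lower_limit` frames, and every frame is still played in order once the convertor catches up.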

Fast playback in media files with reduced impact to speech quality

11488620 · 2022-11-01 ·

International Business Machines Corporation

Deepa Jain

The present invention is a computer program product and method for increasing the playback speed of audio or other media files. The computer program product and method identify pedagogic media files and add a flag to the metadata of each such media file. The flag represents the number and type of pauses or silent sections in the pedagogic media file. Based on the flag, the computer program product and method may fast forward through or remove a portion of the pauses and silent sections to provide a new playback speed.
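One way to picture the pause flag and the shortened playback is the sketch below, which scans a list of amplitude samples for silent runs, records how many were found, and trims each run. The amplitude threshold, the minimum run length, the `pause_count` key, and both function names are assumptions for illustration; the abstract does not define how pauses are detected or how the flag is encoded.

```python
def find_pauses(samples, threshold, min_len):
    """Return (start, end) index pairs of silent runs at least `min_len` long."""
    pauses = []
    start = None
    for i, value in enumerate(samples):
        if abs(value) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                pauses.append((start, i))
            start = None
    if start is not None and len(samples) - start >= min_len:
        pauses.append((start, len(samples)))
    return pauses

def shorten_pauses(samples, pauses, keep):
    """Keep only the first `keep` samples of each pause (needs keep <= min_len)."""
    out = []
    cursor = 0
    for start, end in pauses:
        out.extend(samples[cursor:start + keep])
        cursor = end
    out.extend(samples[cursor:])
    return out

samples = [5, 5, 0, 0, 0, 0, 5, 0, 5, 0, 0, 0]
pauses = find_pauses(samples, threshold=1, min_len=3)
metadata = {"pause_count": len(pauses)}  # hypothetical flag stored with the file
shortened = shorten_pauses(samples, pauses, keep=1)
```

Keeping a short remnant of each pause rather than deleting it outright is a design choice in this sketch, intended to preserve word boundaries; the single-sample silence at index 7 is left untouched because it is shorter than `min_len`.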

Transcription of audio

11488604 · 2022-11-01 ·

Sorenson IP Holdings, LLC

David Thomson

A method may include obtaining first features of first audio data that includes speech and obtaining second features of second audio data that is a revoicing of the first audio data. The method may further include providing the first features and the second features to an automatic speech recognition system and obtaining a single transcription generated by the automatic speech recognition system using the first features and the second features.

TRANSCRIPTION OF AUDIO

20220059094 · 2022-02-24 ·

David Thomson

AUDIO DRIVEN ACCELERATED BINGE WATCH

20170309296 · 2017-10-26 ·

Kai Sun

Example embodiments provide systems and methods for accelerating digital content playback based on speech. A content acceleration system electronically accesses digital content. The system analyzes the digital content to detect at least one audio portion within the digital content, each of the at least one audio portion comprising speech. The system creates at least one digital content segment from the digital content based on the at least one audio portion, whereby a beginning of each digital content segment of the at least one digital content segment coincides with a beginning of a corresponding audio portion of the at least one audio portion. The system then accelerates playback of the digital content by fast forwarding through parts of the at least one digital content segment where speech is absent.
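The segmentation step can be sketched as follows, assuming speech detection has already produced a list of (start, end) times in seconds. The dictionary layout, the function names, and the fixed fast-forward rate are hypothetical; the abstract only states that each segment begins where a speech portion begins and that non-speech parts are fast-forwarded.

```python
def build_segments(speech_portions, duration):
    """Create one segment per speech portion, starting where the speech starts.

    Each segment runs until the next speech portion begins (or the content ends).
    Content before the first speech portion is left out of this sketch.
    """
    portions = sorted(speech_portions)
    segments = []
    for idx, (start, end) in enumerate(portions):
        seg_end = portions[idx + 1][0] if idx + 1 < len(portions) else duration
        segments.append({"start": start, "speech_end": end, "end": seg_end})
    return segments

def accelerated_duration(segments, ff_rate):
    """Playback time when speech plays at 1x and the rest at `ff_rate`x."""
    total = 0.0
    for seg in segments:
        total += seg["speech_end"] - seg["start"]               # speech, normal speed
        total += (seg["end"] - seg["speech_end"]) / ff_rate     # no speech, fast forward
    return total

segments = build_segments([(10, 20), (40, 50)], duration=60)
watch_time = accelerated_duration(segments, ff_rate=4.0)
```

In this example the 50 seconds covered by the two segments play back in 27.5 seconds, since 30 seconds without speech are traversed at four times normal speed.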
