Patent classifications
G10L21/043
Creating a Printed Publication, an E-Book, and an Audio Book from a Single File
As an example, a server may receive, from a computing device, a submission created by an author. The submission includes book data associated with a book and author data associated with the author. The author data includes incarceration data indicating whether the author was incarcerated. The server may determine, based on the author data and the book data, that the submission is publishable. The server may create, based on the book data, a printable book, an e-book, and an audio book and make one or more of the printable book, the e-book, and the audio book available for acquisition.
METHODS AND SYSTEMS FOR TRANSCRIPTION PLAYBACK WITH VARIABLE EMPHASIS
Methods and systems are provided for assisting operation of a vehicle using speech recognition and transcription, with text-to-speech used for transcription playback with variable emphasis. One method involves analyzing a transcription of an audio communication with respect to the vehicle to identify an operational term pertaining to a current operational context of the vehicle within the transcription, creating an indicator identifying the operational term within the transcription for emphasis when the operational term pertains to the current operational context of the vehicle, identifying a user-configured playback rate, and generating an audio reproduction of the transcription of the audio communication in accordance with the user-configured playback rate, wherein the operational term is selectively emphasized within the audio reproduction based on the indicator.
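The marking step described above can be illustrated with a minimal sketch. It assumes the indicator is expressed as SSML markup (`<emphasis>` for the operational term, `<prosody rate>` for the user-configured playback rate); the abstract does not name a markup format, and the function name `mark_emphasis`, the term list, and the aviation-style sample phrase are hypothetical.

```python
import re
from xml.sax.saxutils import escape


def mark_emphasis(transcription, operational_terms, playback_rate=1.0):
    """Wrap context-relevant terms in SSML emphasis tags and set the rate.

    Assumption: SSML is used as the indicator format; the source abstract
    does not specify how the indicator is represented.
    """
    # Longest terms first so a longer phrase wins over a shorter prefix.
    terms = sorted((t for t in operational_terms if t), key=len, reverse=True)
    if terms:
        pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
        parts, last = [], 0
        for match in pattern.finditer(transcription):
            parts.append(escape(transcription[last:match.start()]))
            parts.append(
                '<emphasis level="strong">' + escape(match.group(0)) + "</emphasis>"
            )
            last = match.end()
        parts.append(escape(transcription[last:]))
        body = "".join(parts)
    else:
        body = escape(transcription)
    rate = f"{round(playback_rate * 100)}%"
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


ssml = mark_emphasis("Cleared to land runway 27 left", {"runway 27 left"}, 1.5)
```

The resulting string could be handed to any text-to-speech engine that accepts SSML; selecting which terms pertain to the current operational context is left outside this sketch.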
SPEECH RECOGNITION APPARATUS AND METHOD
According to one embodiment, a speech recognition apparatus includes processing circuitry. The processing circuitry generates a plurality of augmented speech data based on input speech data, generates a plurality of acoustic scores based on the plurality of augmented speech data and an acoustic model, generates a plurality of adjusted acoustic scores by resampling the acoustic scores, generates an integrated acoustic score by integrating the adjusted acoustic scores, generates an integrated lattice based on the integrated acoustic score, a pronunciation dictionary, and a language model, and searches the integrated lattice for a speech recognition result with the highest likelihood.
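The resampling and integration steps can be sketched as follows. This is only an illustration under stated assumptions: each acoustic score is modeled as a frames-by-classes matrix, "resampling" is taken to mean linear interpolation along the time axis to a common frame count, and "integrating" is taken to mean an unweighted mean. The abstract specifies none of these choices, and the function names are hypothetical.

```python
def resample_scores(scores, target_frames):
    """Linearly interpolate a [frames][classes] matrix to target_frames rows."""
    n = len(scores)
    if n == 1 or target_frames == 1:
        return [list(scores[0]) for _ in range(target_frames)]
    out = []
    for t in range(target_frames):
        pos = t * (n - 1) / (target_frames - 1)  # fractional source frame index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(scores[lo], scores[hi])])
    return out


def integrate_scores(score_sets, target_frames):
    """Resample every score matrix to a common length, then average them."""
    adjusted = [resample_scores(s, target_frames) for s in score_sets]
    count = len(adjusted)
    classes = len(adjusted[0][0])
    return [
        [sum(m[t][c] for m in adjusted) / count for c in range(classes)]
        for t in range(target_frames)
    ]


# Two score matrices of different lengths, e.g. from speed-perturbed inputs.
short = [[0.0, 1.0], [1.0, 0.0]]
full = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
integrated = integrate_scores([short, full], target_frames=3)
```

Lattice generation and the search for the highest-likelihood result would consume `integrated` in place of a single acoustic score; they are not modeled here.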
CONTROLLING PLAYBACK OF AUDIO DATA
Playback of audio data is controlled by receiving a speech signal to be conveyed to a user simultaneously with playback of the audio data. The volume and/or spectral appearance of selected elements of the audio data are then modified to obtain adjusted audio data, and the adjusted audio data is played back. The received speech signal may then be played back simultaneously with the adjusted audio data.
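A minimal sketch of the volume branch of this idea follows. It assumes mono audio held as lists of float samples, a fixed attenuation applied to the whole span that the speech overlaps, and simple additive mixing; spectral modification and the selection of individual elements of the audio data are not modeled. The names `duck_audio`, `mix`, and `duck_gain` are hypothetical.

```python
def duck_audio(audio, speech_start, speech_len, duck_gain=0.25):
    """Attenuate the audio samples that overlap the speech signal."""
    end = min(len(audio), speech_start + speech_len)
    return [
        sample * duck_gain if speech_start <= i < end else sample
        for i, sample in enumerate(audio)
    ]


def mix(adjusted, speech, speech_start):
    """Add the speech samples on top of the adjusted audio."""
    out = list(adjusted)
    for i, sample in enumerate(speech):
        j = speech_start + i
        if j < len(out):
            out[j] += sample
    return out


audio = [1.0] * 6
speech = [0.5, 0.5]
adjusted = duck_audio(audio, speech_start=2, speech_len=len(speech))
played = mix(adjusted, speech, speech_start=2)
```

A practical system would likely ramp the gain in and out to avoid audible clicks; the hard switch here only keeps the sketch short.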
AI-BASED DJ SYSTEM AND METHOD FOR DECOMPOSING, MIXING AND PLAYING OF AUDIO DATA
The present invention relates to a method for processing and playing audio data comprising the steps of receiving mixed input data and playing recombined output data. Furthermore, the invention relates to a device 10 for processing and playing audio data, preferably DJ equipment, comprising an audio input unit for receiving a mixed input signal, a recombination unit 32 and a playing unit 34 for playing recombined output data. In addition, the present invention relates to a method and a device for representing audio data, i.e. on a display.
AUDIO SIGNAL PROCESSING METHOD, DEVICE AND STORAGE MEDIUM
An audio signal processing method, device and storage medium are provided. The method includes performing sub-band filtering on a to-be-processed audio signal to obtain a plurality of sub-band signals, wherein the number of the sub-band signals is determined according to a lowest frequency of a band-pass filter and a cut-off frequency of an audio apparatus, and the sub-band signals comprise sub-band band-pass signals; and obtaining a target audio signal according to each of the sub-band band-pass signals and a processing algorithm for a virtual bass enhancement signal.
COMPUTER IMPLEMENTED METHOD, DEVICE AND COMPUTER PROGRAM PRODUCT FOR SETTING A PLAYBACK SPEED OF MEDIA CONTENT COMPRISING AUDIO
A computer implemented method for setting a playback speed of media content comprising audio, the media content having a defined normal playback speed, the method comprising: receiving an indication that the media content is to be played at a speed different from the normal playback speed of the media content; analysing the audio to determine a type of audio; determining a playback speed different from the normal playback speed depending on the determined type of audio; and setting the playback speed of the media content to the determined playback speed.
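The selection step can be sketched as a lookup keyed by audio type. Everything concrete here is an assumption for illustration: the type labels, the per-type speed limits in `SPEED_LIMITS`, the fallback limit, and the rule of capping a requested fast-forward speed are not taken from the abstract, and the audio analysis that produces the type label is not modeled.

```python
# Hypothetical upper limits on fast-forward speed per audio type.
SPEED_LIMITS = {"speech": 2.0, "music": 1.25, "silence": 4.0}
DEFAULT_LIMIT = 1.5  # used when the audio type is not recognised


def choose_playback_speed(requested, audio_type, normal=1.0):
    """Pick a speed above normal, capped by a limit for the audio type."""
    if requested <= normal:
        # Slow-down requests are passed through unchanged in this sketch.
        return requested
    limit = SPEED_LIMITS.get(audio_type, DEFAULT_LIMIT)
    return min(requested, limit)


music_speed = choose_playback_speed(2.0, "music")
speech_speed = choose_playback_speed(2.0, "speech")
```

With these assumed limits, the same user request yields a gentler speed-up for music than for speech, which is one way the determined speed can depend on the type of audio.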
SPEECH SYNTHESIS METHOD AND APPARATUS, AND READABLE STORAGE MEDIUM
A speech synthesis method includes: converting a text input sequence into a text feature representation sequence; inputting the text feature representation sequence into an encoder including N encoding layers; the N encoding layers including an encoding layer E_i and an encoding layer E_{i+1}; the encoding layer E_{i+1} including a first multi-head self-attention network; acquiring a first attention matrix and a historical text encoded sequence outputted by the encoding layer E_i, and generating a second attention matrix of the encoding layer E_{i+1} according to residual connection between the first attention matrix and the first multi-head self-attention network and the historical text encoded sequence; and generating a target text encoded sequence of the encoding layer E_{i+1} according to the second attention matrix and the historical text encoded sequence, and generating synthesized speech data matched with the text input sequence based on the target text encoded sequence.
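One possible reading of the residual connection between attention matrices is sketched below for a single attention head. It assumes the first attention matrix is a matrix of pre-softmax scores that is added to the scaled dot-product scores computed by the next layer, with the sum serving as the second attention matrix. The abstract does not state where the residual is applied, and the helper names and the weight matrices `wq`, `wk`, `wv` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(matrix):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in matrix:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def residual_attention(hidden, prev_scores, wq, wk, wv):
    """Single-head attention whose scores add the previous layer's scores.

    Assumption: the residual is applied to pre-softmax scores.
    Returns (second attention matrix, encoded sequence for this layer).
    """
    q, k, v = matmul(hidden, wq), matmul(hidden, wk), matmul(hidden, wv)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    raw = matmul(q, k_t)
    scores = [
        [r / scale + p for r, p in zip(raw_row, prev_row)]
        for raw_row, prev_row in zip(raw, prev_scores)
    ]
    return scores, matmul(softmax_rows(scores), v)


identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = [[1.0, 0.0], [0.0, 1.0]]       # stands in for the historical text encoded sequence
prev_scores = [[10.0, 0.0], [10.0, 0.0]]  # stands in for the first attention matrix
scores, encoded = residual_attention(hidden, prev_scores, identity, identity, identity)
```

In this toy input the previous layer's scores strongly favour the first token, so both rows of `encoded` end up close to the first token's value vector, showing how attention from layer E_i carries into layer E_{i+1}.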