Patent classifications
G10L21/055
Audiovisual capture and sharing framework with coordinated, user-selectable audio and video effects filters
Coordinated audio and video filter pairs are applied to enhance artistic and emotional content of audiovisual performances. Such filter pairs, when applied in audio and video processing pipelines of an audiovisual application hosted on a portable computing device (such as a mobile phone or media player, a computing pad or tablet, a game controller or a personal digital assistant or book reader) can allow user selection of effects that enhance both audio and video coordinated therewith. Coordinated audio and video are captured, filtered and rendered at the portable computing device using camera and microphone interfaces, using digital signal processing software executable on a processor and using storage, speaker and display devices of, or interoperable with, the device. By providing audiovisual capture and personalization on an intimate handheld device, social interactions and postings of a type made popular by modern social networking platforms can now be extended to audiovisual content.
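The abstract above pairs one user-selectable effect with coordinated audio and video filters. A minimal sketch of that pairing follows; the filter functions, the effect name, and the numeric coefficients are all illustrative assumptions, not details from the patent.

```python
def sepia_video(frame):
    # Hypothetical video filter: tint each (r, g, b) pixel toward sepia.
    return [(min(255, int(r * 1.07)), int(g * 0.74), int(b * 0.43))
            for (r, g, b) in frame]

def vintage_audio(samples):
    # Hypothetical audio filter: attenuate and soft-clip each sample.
    return [max(-0.8, min(0.8, s * 0.9)) for s in samples]

# One user-selectable effect name maps to a coordinated (audio, video)
# filter pair, so a single selection drives both processing pipelines.
EFFECT_PAIRS = {
    "vintage": (vintage_audio, sepia_video),
}

def apply_effect(name, samples, frame):
    # Apply the coordinated pair to captured audio samples and one video frame.
    audio_fx, video_fx = EFFECT_PAIRS[name]
    return audio_fx(samples), video_fx(frame)
```

The single lookup table is the design point: because audio and video filters are stored as a pair, the two pipelines cannot drift apart when the user changes effects.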
Real-time speech to singing conversion
A method of converting a frame of a voice sample to a singing frame includes obtaining a pitch value of the frame; obtaining formant information of the frame using the pitch value; obtaining aperiodicity information of the frame using the pitch value; obtaining a tonic pitch and chord pitches; using the formant information, the aperiodicity information, the tonic pitch, and the chord pitches to obtain the singing frame; and outputting or saving the singing frame.
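The claimed method resynthesizes each voice frame against a tonic and chord pitches. A heavily simplified sketch of the pitch-snapping and frame-conversion steps is below; the triad ratios, the resampling-based shift, and all function names are assumptions for illustration (a real implementation would use the formant and aperiodicity information to resynthesize the frame rather than crudely resample it).

```python
import math

def nearest_chord_pitch(pitch_hz, tonic_hz, chord_ratios=(1.0, 1.25, 1.5)):
    # Snap the detected pitch to the nearest pitch of a major triad built
    # on the tonic, searched over a few octaves (ratios are illustrative).
    candidates = [tonic_hz * r * 2 ** k
                  for r in chord_ratios for k in range(-2, 3)]
    return min(candidates, key=lambda c: abs(math.log2(c / pitch_hz)))

def pitch_shift_frame(frame, ratio):
    # Crude linear-interpolation resampling: reading the frame at a step
    # of `ratio` raises (or lowers) the pitch by that factor.
    n = len(frame)
    out = []
    for i in range(n):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        a = frame[min(j, n - 1)]
        b = frame[min(j + 1, n - 1)]
        out.append(a * (1 - frac) + b * frac)
    return out

def convert_frame(frame, pitch_hz, tonic_hz):
    # Obtain the target (singing) pitch from the tonic/chord, then shift.
    target = nearest_chord_pitch(pitch_hz, tonic_hz)
    return pitch_shift_frame(frame, target / pitch_hz)
```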
METHOD AND SYSTEM FOR SYNCHRONIZING PRESENTATION SLIDE CONTENT WITH SOUNDTRACK
A method for synchronizing a plurality of presentation slide content with a soundtrack comprises obtaining the plurality of presentation slide content and the soundtrack including a plurality of audio samples. The presentation slide content comprises a video or an animation in the presentation slide. Each presentation slide content is associated with metadata, and each audio sample is indexed with a corresponding timecode. The method comprises detecting a triggering event that identifies a current audio sample of the soundtrack as the audio sample at which to transition from a first presentation slide content to a second presentation slide content; obtaining the timecode indexed with the identified audio sample; associating the timecode with the metadata of the second presentation slide content to link the second presentation slide content with the identified audio sample; and generating a synchronized presentation multimedia file containing the second presentation slide content linked with the identified audio sample.
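The linking step above can be sketched in a few lines. This assumes the soundtrack is a list of (timecode, sample) pairs and each slide carries a metadata dict; the field name `start_timecode` and both function names are illustrative, not from the patent.

```python
def link_slide_to_sample(soundtrack, trigger_index, slide_metadata):
    # The triggering event identifies the current audio sample; its
    # indexed timecode is copied into the next slide's metadata.
    timecode, _sample = soundtrack[trigger_index]
    slide_metadata["start_timecode"] = timecode
    return slide_metadata

def build_sync_file(slides):
    # Generate the synchronized presentation as an ordered list of
    # (timecode, slide content) entries; unlinked slides are skipped.
    return sorted(
        (meta["start_timecode"], content)
        for content, meta in slides
        if "start_timecode" in meta
    )
```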
AUTOMATIC DETERMINATION OF TIMING WINDOWS FOR SPEECH CAPTIONS IN AN AUDIO STREAM
A content system accesses an audio stream. The content system inputs segments of the audio stream into a speech classifier for classification, the speech classifier generating, for each segment of the audio stream, a raw score representing the likelihood that the segment includes an occurrence of a speech sound. The content system generates binary scores for the audio stream based on the set of raw scores, each binary score generated based on an aggregation of raw scores from a consecutive series of segments of the audio stream. The content system generates one or more timing windows for the speech sounds in the audio stream based on the binary scores, each timing window indicating estimates of the beginning and ending timestamps of one or more speech sounds in the audio stream.
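The raw-score-to-timing-window pipeline can be sketched as follows. The aggregation (a trailing mean over a fixed window), the threshold, and the segment duration are illustrative assumptions; the patent does not specify these values.

```python
def binarize(raw_scores, window=3, threshold=0.5):
    # Aggregate raw scores over a trailing window of consecutive
    # segments and threshold the mean to get a binary score per segment.
    binary = []
    for i in range(len(raw_scores)):
        lo = max(0, i - window + 1)
        chunk = raw_scores[lo:i + 1]
        binary.append(1 if sum(chunk) / len(chunk) >= threshold else 0)
    return binary

def timing_windows(binary, segment_duration=0.1):
    # Convert runs of 1s into (start, end) timestamp estimates, with
    # each segment assumed to span `segment_duration` seconds.
    windows, start = [], None
    for i, b in enumerate(binary):
        if b and start is None:
            start = i
        elif not b and start is not None:
            windows.append((start * segment_duration, i * segment_duration))
            start = None
    if start is not None:
        windows.append((start * segment_duration,
                        len(binary) * segment_duration))
    return windows
```

Aggregating before thresholding is what smooths out single noisy classifier scores, so a one-segment dip in the middle of speech does not split the caption window.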
Fast playback in media files with reduced impact to speech quality
The present invention is a computer program product and method for increasing the playback speed of audio or other media files. The computer program product and method identify pedagogic media files and add a flag to the metadata of each media file. The flag represents the number and type of pauses or silent sections in the pedagogic media file. Based on the flag, the computer program product and method may fast-forward through or remove a portion of the pauses and silent sections to provide a new playback speed.
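A minimal sketch of the two steps described above follows: detecting silent sections (standing in for the metadata flag) and shortening them on playback. The silence threshold, minimum run length, and `keep_fraction` parameter are illustrative assumptions.

```python
def flag_pauses(samples, silence_threshold=0.01, min_run=4):
    # Scan for runs of near-silent samples and record (start, length)
    # pairs, standing in for the flag stored in the file's metadata.
    pauses, run_start = [], None
    for i, s in enumerate(samples + [1.0]):  # sentinel closes a trailing run
        if abs(s) < silence_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_run:
                pauses.append((run_start, i - run_start))
            run_start = None
    return pauses

def speed_up(samples, pauses, keep_fraction=0.25):
    # Shorten each flagged pause to a fraction of its original length,
    # leaving the speech itself untouched (hence "reduced impact").
    out, cursor = [], 0
    for start, length in pauses:
        out.extend(samples[cursor:start])
        out.extend(samples[start:start + int(length * keep_fraction)])
        cursor = start + length
    out.extend(samples[cursor:])
    return out
```

Because only silence is removed, the speech segments keep their original rate, which is the claimed advantage over uniformly resampling the whole file.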
SAMPLING RATE PROCESSING METHOD, APPARATUS, AND SYSTEM, STORAGE MEDIUM, AND COMPUTER DEVICE
A sampling rate processing method performed by a computer device is disclosed. The method includes: obtaining a first audio signal recorded by a transmitting device, the first audio signal being recorded according to an initial sampling rate of the transmitting device; obtaining a second audio signal recorded by a receiving device during playing of the first audio signal, the second audio signal being recorded according to the initial sampling rate; determining a frequency response gain value of the receiving device according to a power spectrum of the first audio signal and a power spectrum of the second audio signal; determining a target sampling rate of the transmitting device according to the initial sampling rate and the frequency response gain value; and configuring the transmitting device to record audio signals according to the target sampling rate.
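The gain-to-rate step can be sketched as below: per-bin gains are computed from the two power spectra, and the target rate is set to twice the highest frequency the receiving device still reproduces usably (the Nyquist criterion). The −30 dB cutoff and uniform bin layout are illustrative assumptions, not values from the patent.

```python
import math

def response_gain_db(tx_power, rx_power, eps=1e-12):
    # Per-bin gain in dB of the received signal relative to the
    # transmitted one, from the two power spectra (eps avoids log(0)).
    return [10 * math.log10((r + eps) / (t + eps))
            for t, r in zip(tx_power, rx_power)]

def target_sampling_rate(initial_rate, gains_db, cutoff_db=-30.0):
    # Bins are assumed to span 0..Nyquist uniformly. Find the highest
    # bin whose gain is still above the cutoff; the target rate is twice
    # that frequency, so bandwidth the receiver cannot reproduce is not
    # recorded in the first place.
    nyquist = initial_rate / 2
    usable = [i for i, g in enumerate(gains_db) if g >= cutoff_db]
    if not usable:
        return initial_rate
    highest_hz = (usable[-1] + 1) / len(gains_db) * nyquist
    return int(2 * highest_hz)
```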
INFORMATION OUTPUT APPARATUS
With the occurrence of an event as a trigger, output of a notification sound of auditory information is started. The start of a character guide of visual information is delayed relative to the timing of the trigger so as to be synchronized with the start of a voice guide of auditory information. The output of the notification sound reliably makes the user aware that the guides are about to start. After the user moves his or her eyes to the screen for the character guide, the character guide and the voice guide are executed. Since no time lag arises between the character guide and the voice guide, a sense of incongruity is prevented.
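The scheduling described above can be expressed compactly: the notification sound fires at the trigger, while the character guide is deliberately delayed to start together with the voice guide. The lead time and all names are illustrative assumptions.

```python
def schedule_outputs(trigger_time, voice_lead=1.2):
    # Notification sound at the trigger; the character (text) guide is
    # delayed by the same lead time as the voice guide, so the two
    # guides start simultaneously and no time lag arises between them.
    return {
        "notification_sound": trigger_time,
        "voice_guide": trigger_time + voice_lead,
        "character_guide": trigger_time + voice_lead,  # synchronized start
    }
```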