Patent classifications
G10L21/013
Contact center of celebrities
Customers can become bored during an interaction with an agent. By presenting speech and/or images of a celebrity that disguise the agent's own speech and/or image, a contact center can make customers appear to interact with a particular celebrity. As a result, customers are more likely to stay engaged and have a positive experience. The celebrity, or a particular persona of a celebrity, may be selected based on customer preferences and/or the purpose of the call. For example, one of a movie star's roles may supply a persona, such as a “heavy,” suitable for collection calls (audio or audio-video), whereas a scientific or technical innovator may be selected for technical support calls.
Evaluation apparatus, training apparatus, methods and programs for the same
An evaluation device converts a picked-up speech signal by applying a lowpass filter whose cutoff frequency is either a first predetermined value or a second predetermined value greater than the first, with or without changing the feedback formant frequencies (the formant frequencies of the picked-up speech signal), and feeds the converted speech signal back to a subject. The device includes an evaluation unit that calculates a compensatory response vector from two sets of pickup formant frequencies: those of a speech signal picked up from an utterance made while the subject receives feedback converted with the change of the feedback formant frequencies, and those of a speech signal picked up from an utterance made while the subject receives feedback converted without that change. The evaluation unit determines an evaluation based on the compensatory response vector for each cutoff frequency.
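The abstract above does not define how the compensatory response vector is computed. As a minimal sketch, assuming it is the difference between mean formant frequencies (F1, F2) produced under perturbed feedback and under unperturbed feedback, the per-cutoff calculation might look like the following. The function names, the two-formant representation, and the cutoff keys are all hypothetical and are not taken from the patent.

```python
from statistics import fmean

def mean_formants(frames):
    """Average (F1, F2) in Hz over a list of per-frame formant pairs."""
    f1 = fmean(frame[0] for frame in frames)
    f2 = fmean(frame[1] for frame in frames)
    return (f1, f2)

def compensatory_response(perturbed_frames, unperturbed_frames):
    """Assumed definition: mean formants spoken under perturbed feedback
    minus mean formants spoken under unperturbed feedback."""
    p = mean_formants(perturbed_frames)
    u = mean_formants(unperturbed_frames)
    return (p[0] - u[0], p[1] - u[1])

def responses_by_cutoff(recordings):
    """recordings maps a lowpass cutoff (Hz) to a pair
    (perturbed_frames, unperturbed_frames); returns one vector per cutoff."""
    return {
        cutoff: compensatory_response(perturbed, unperturbed)
        for cutoff, (perturbed, unperturbed) in recordings.items()
    }
```

An evaluation step would then compare the vectors obtained at the two cutoff frequencies; how that comparison is scored is left open here because the abstract does not specify it.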
Neural pitch-shifting and time-stretching
Methods for modifying audio data include operations for accessing audio data having a first prosody, receiving a target prosody differing from the first prosody, and computing acoustic features representing samples. Computing respective acoustic features for a sample includes computing a pitch feature as a quantized pitch value of the sample by assigning a pitch value, of the target prosody or the audio data, to at least one of a set of pitch bins having equal widths in cents. Computing the respective acoustic features further includes computing a periodicity feature from the audio data. The respective acoustic features for the sample include the pitch feature, the periodicity feature, and other acoustic features. A neural vocoder is applied to the acoustic features to pitch-shift and time-stretch the audio data from the first prosody toward the target prosody.
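The pitch feature described above relies on bins of equal width in cents, where an interval in cents is 1200 times the base-2 logarithm of a frequency ratio. A minimal sketch of such a quantizer follows. The pitch range (50 Hz to 800 Hz) and the bin count (48) are illustrative assumptions chosen so that each bin is exactly 100 cents wide; they are not values from the patent.

```python
import math

def cents_above(freq_hz: float, ref_hz: float) -> float:
    """Interval from ref_hz up to freq_hz, in cents."""
    return 1200.0 * math.log2(freq_hz / ref_hz)

def quantize_pitch(freq_hz: float, fmin_hz: float = 50.0,
                   fmax_hz: float = 800.0, n_bins: int = 48) -> int:
    """Index of the equal-width (in cents) bin that contains freq_hz.
    Pitches outside [fmin_hz, fmax_hz] are clamped to the range."""
    if freq_hz <= 0:
        raise ValueError("pitch must be positive")
    clamped = min(max(freq_hz, fmin_hz), fmax_hz)
    bin_width = cents_above(fmax_hz, fmin_hz) / n_bins
    index = int(cents_above(clamped, fmin_hz) // bin_width)
    return min(index, n_bins - 1)  # fmax_hz itself falls in the last bin

def bin_center_hz(index: int, fmin_hz: float = 50.0,
                  fmax_hz: float = 800.0, n_bins: int = 48) -> float:
    """Frequency at the center of a bin, useful for decoding an index."""
    bin_width = cents_above(fmax_hz, fmin_hz) / n_bins
    return fmin_hz * 2.0 ** ((index + 0.5) * bin_width / 1200.0)
```

Because the bins are uniform in cents rather than in hertz, moving a pitch by a fixed number of bins corresponds to the same musical interval anywhere in the range. With the assumed defaults, 12 bins span one octave.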
Social music system and method with continuous, real-time pitch correction of vocal performance and dry vocal capture for subsequent re-rendering based on selectively applicable vocal effect(s) schedule(s)
The described embodiments provide a method for mixing vocal performances from different vocalists. A vocal score temporally synchronized with a corresponding backing track and lyrics is retrieved via a communications interface of a portable computing device. A first vocal performance of a user is captured via a microphone interface of the portable computing device, in correspondence with the backing track. An open call indication is transmitted to solicit, from a second vocalist, a second vocal performance to be mixed for audible rendering with the first vocal performance. A mix is provided to either the user or the second vocalist, and it is selected, according to its recipient, from alternative mixes that each feature a different prominent vocal performance.
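The recipient-dependent mix selection described above can be sketched as follows. This is a simplified illustration: it assumes that each recipient hears their own vocal as the prominent one, that prominence is a plain gain difference, and that tracks are equal-length lists of samples. The function names and gain values are hypothetical.

```python
def mix_tracks(backing, lead, support, lead_gain=1.0, support_gain=0.4):
    """Sum a backing track with a prominent (lead) vocal and a quieter one."""
    return [
        b + lead_gain * l + support_gain * s
        for b, l, s in zip(backing, lead, support)
    ]

def build_alternative_mixes(backing, user_vocal, guest_vocal):
    """One mix per recipient, each with a different prominent vocal."""
    return {
        "user": mix_tracks(backing, lead=user_vocal, support=guest_vocal),
        "guest": mix_tracks(backing, lead=guest_vocal, support=user_vocal),
    }

def select_mix(recipient, mixes):
    """Pick the alternative mix prepared for this recipient."""
    return mixes[recipient]
```

Preparing the alternatives once and selecting by recipient keeps the choice a simple lookup, so the same pair of captured performances can serve both vocalists.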
Audiovisual capture and sharing framework with coordinated, user-selectable audio and video effects filters
Coordinated audio and video filter pairs are applied to enhance artistic and emotional content of audiovisual performances. Such filter pairs, when applied in audio and video processing pipelines of an audiovisual application hosted on a portable computing device (such as a mobile phone or media player, a computing pad or tablet, a game controller or a personal digital assistant or book reader) can allow user selection of effects that enhance both audio and video coordinated therewith. Coordinated audio and video are captured, filtered and rendered at the portable computing device using camera and microphone interfaces, using digital signal processing software executable on a processor and using storage, speaker and display devices of, or interoperable with, the device. By providing audiovisual capture and personalization on an intimate handheld device, social interactions and postings of a type made popular by modern social networking platforms can now be extended to audiovisual content.
Real-time speech to singing conversion
A method of converting a frame of a voice sample to a singing frame includes obtaining a pitch value of the frame; obtaining formant information of the frame using the pitch value; obtaining aperiodicity information of the frame using the pitch value; obtaining a tonic pitch and chord pitches; using the formant information, the aperiodicity information, the tonic pitch, and the chord pitches to obtain the singing frame; and outputting or saving the singing frame.
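The abstract above does not say how the tonic pitch and chord pitches determine the pitch of the singing frame. One plausible reading, offered only as an assumption, is that each frame is moved to whichever candidate pitch is closest on a logarithmic (cents) scale. The sketch below shows that selection step alone; resynthesis from formant and aperiodicity information is not covered, and the function names are hypothetical.

```python
import math

def nearest_target_pitch(frame_pitch_hz, tonic_hz, chord_hz):
    """Return the tonic or chord pitch closest to the frame pitch,
    measuring distance in cents (assumed selection rule)."""
    candidates = [tonic_hz, *chord_hz]
    return min(
        candidates,
        key=lambda c: abs(1200.0 * math.log2(c / frame_pitch_hz)),
    )

def pitch_shift_ratio(frame_pitch_hz, target_hz):
    """Frequency ratio a resynthesis stage would apply to reach the target."""
    return target_hz / frame_pitch_hz
```

For example, with a tonic of 220 Hz and chord pitches of 261.63 Hz and 329.63 Hz, a spoken frame at 230 Hz would map to the tonic, while a frame at 300 Hz would map to 329.63 Hz.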