Patent classifications
G10L21/013
Coordinating and mixing vocals captured from geographically distributed performers
Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. Based on the techniques described herein, even mere amateurs are encouraged to share with friends and family or to collaborate and contribute vocal performances as part of virtual “glee clubs.” In some implementations, these interactions are facilitated through social network- and/or eMail-mediated sharing of performances and invitations to join in a group performance. Using uploaded vocals captured at clients such as a mobile device, a content server (or service) can mediate such virtual glee clubs by manipulating and mixing the uploaded vocal performances of multiple contributing vocalists.
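As a rough illustration of the two operations this abstract names, pitch correction and mixing against a backing track, the following Python sketch snaps a detected vocal frequency to the nearest equal-tempered semitone and sums gain-scaled vocal tracks with a backing track. The function names, the A4 = 440 Hz reference, the snap-to-semitone target, and the gain defaults are all assumptions made for illustration; the abstract does not say how target pitches or mix levels are chosen.

```python
import math
from typing import List, Sequence

A4_HZ = 440.0  # assumed tuning reference, not taken from the abstract


def snap_to_semitone(freq_hz: float) -> float:
    """Return the equal-tempered note frequency closest to freq_hz."""
    # Convert to a (fractional) MIDI note number, round, and convert back.
    midi = 69 + 12 * math.log2(freq_hz / A4_HZ)
    return A4_HZ * 2 ** ((round(midi) - 69) / 12)


def mix(vocal_tracks: Sequence[Sequence[float]],
        backing: Sequence[float],
        vocal_gain: float = 0.5,
        backing_gain: float = 0.5) -> List[float]:
    """Sum gain-scaled vocals with the backing track, clipping to [-1, 1]."""
    # Only mix over the span that every track covers.
    n = min([len(backing)] + [len(track) for track in vocal_tracks])
    mixed = []
    for i in range(n):
        sample = backing_gain * backing[i]
        sample += vocal_gain * sum(track[i] for track in vocal_tracks)
        mixed.append(max(-1.0, min(1.0, sample)))  # hard clip
    return mixed
```

A server-side mixer along these lines would call `mix` once per group performance, passing one sample list per contributing vocalist; a production system would likely replace the hard clip with a limiter.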
OPTIMIZING INTERACTION RESULTS USING AI-GUIDED MANIPULATED VIDEO
Real-time modification of human audio allows the expression of a subject human to be changed. Customer service agents may have more successful interactions with customers if their speech carries appropriate vocalization attributes, such as those conveying a particular emotional state. By determining an appropriate vocalization attribute, and any deviation from a customer service agent's current vocalization attribute, a modification to the audio of the customer service agent's speech may be determined and applied. As a result, when an agent's vocalization attribute is not the one best suited to successfully resolving the purpose of the interaction, the audio may be altered so that the customer is presented with the customer service agent's speech having the best-suited vocalization attribute.
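The deviation-then-modification step described in this abstract can be sketched as a per-attribute difference between a target and a current measurement. Everything concrete here is hypothetical: the abstract does not define how a vocalization attribute is represented, so the attribute names (`pitch_hz`, `rate_wpm`), the dictionary layout, and the tolerance mechanism are assumptions for illustration only.

```python
from typing import Dict


def plan_modification(current: Dict[str, float],
                      target: Dict[str, float],
                      tolerance: Dict[str, float]) -> Dict[str, float]:
    """Return the adjustment (target - current) for each attribute whose
    deviation exceeds its tolerance; attributes within tolerance are omitted."""
    plan = {}
    for name, wanted in target.items():
        # An attribute with no current measurement is treated as on target.
        deviation = wanted - current.get(name, wanted)
        if abs(deviation) > tolerance.get(name, 0.0):
            plan[name] = deviation
    return plan
```

An empty result would mean the agent's speech is already close enough to the target and can be passed through unmodified.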
Sound playback system and output sound adjusting method thereof
A sound playback system and an output sound adjusting method thereof are disclosed. The method includes the following steps: receiving an input sound signal from a user, wherein the input sound signal includes a voice signal indicating the age of the user; transmitting the input sound signal to a remote voice system; performing a voice recognition process according to the voice signal of the input sound signal to obtain a voice recognition result; adjusting a gain value of each frequency band of an output sound signal according to the voice recognition result; and transmitting the output sound signal to a near-end electronic device to output the output sound signal to be heard by the user.
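The gain-adjustment step can be pictured as a lookup from the recognition result to a per-band gain profile, followed by scaling each band. This is a minimal sketch under stated assumptions: the band centers, the age-group labels, and every gain value below are invented for illustration, and the remote voice recognition step itself is not modeled.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical band layout and gain profiles (dB); none of these values
# come from the abstract.
BAND_CENTERS_HZ: Tuple[int, ...] = (250, 1000, 4000, 8000)
GAIN_PROFILES_DB: Dict[str, Tuple[float, ...]] = {
    "young": (0.0, 0.0, 0.0, 0.0),
    "middle": (0.0, 0.0, 3.0, 6.0),
    "senior": (0.0, 3.0, 6.0, 12.0),
}


def gains_for(age_group: str) -> Tuple[float, ...]:
    """Look up per-band gains, falling back to the flat profile."""
    return GAIN_PROFILES_DB.get(age_group, GAIN_PROFILES_DB["young"])


def apply_band_gains(band_amplitudes: Sequence[float],
                     gains_db: Sequence[float]) -> List[float]:
    """Scale each band's linear amplitude by its gain in decibels."""
    if len(band_amplitudes) != len(gains_db):
        raise ValueError("one gain per band is required")
    # Amplitude ratio for a gain of g dB is 10 ** (g / 20).
    return [a * 10 ** (g / 20) for a, g in zip(band_amplitudes, gains_db)]
```

For example, `apply_band_gains(levels, gains_for("senior"))` would boost the upper bands of `levels` while leaving the lowest band unchanged.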
System and method for voice morphing
A system and method for masking an identity of a speaker of natural language speech, such as speech clips to be labeled by humans in a system generating voice transcriptions for training an automatic speech recognition model. The natural language speech is morphed prior to being presented to the human for labeling. In one embodiment, morphing comprises pitch shifting the speech randomly either up or down, then frequency shifting the speech, then pitch shifting the speech in a direction opposite the first pitch shift.
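The ordering of the three morphing steps in that embodiment can be sketched as follows. This shows only the sequencing: the actual pitch-shift and frequency-shift routines are passed in as callables, because the abstract does not say how they are implemented. The function name, the parameter names, and the default shift amounts are assumptions, as is the choice to let the two pitch shifts have independent magnitudes.

```python
import random
from typing import Callable, List

Signal = List[float]
# Hypothetical signatures: (samples, amount) -> samples
PitchShift = Callable[[Signal, float], Signal]
FreqShift = Callable[[Signal, float], Signal]


def morph_speech(samples: Signal,
                 pitch_shift: PitchShift,
                 freq_shift: FreqShift,
                 rng: random.Random,
                 first_semitones: float = 2.0,
                 shift_hz: float = 40.0,
                 second_semitones: float = 2.0) -> Signal:
    """Pitch shift in a random direction, frequency shift, then pitch
    shift in the opposite direction."""
    direction = rng.choice((-1, 1))  # randomly down or up
    shifted = pitch_shift(samples, direction * first_semitones)
    shifted = freq_shift(shifted, shift_hz)
    return pitch_shift(shifted, -direction * second_semitones)
```

Passing a seeded `random.Random` makes the chosen direction reproducible for testing, while a fresh generator per clip keeps the direction unpredictable to the human labeler.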
SOUND SIGNAL GENERATION METHOD, GENERATIVE MODEL TRAINING METHOD, SOUND SIGNAL GENERATION SYSTEM, AND RECORDING MEDIUM
A computer-implemented sound signal generation method includes: obtaining a first sound source spectrum of a sound signal to be generated; obtaining a first spectral envelope of the sound signal; and estimating fragment data representative of samples of the sound signal based on the obtained first sound source spectrum and the obtained first spectral envelope.
Systems, methods and computer program products for generating script elements and call to action components therefor
A call to action processor receives an entity datapoint containing data related to an entity, a campaign objective datapoint containing data associated with a campaign objective, at least one definite script element based on the campaign objective, and entity metadata containing data associated with the entity. The call to action processor further generates at least one variable script element based on the entity metadata and presents to a device the at least one definite script element and the at least one variable script element.