Patent classifications
G10L21/013
System and method for context aware audio enhancement
Contact centers strive to provide a positive and productive customer-agent interaction to successfully resolve the issue for a call. While on-hold audio content, such as music or messages, is commonplace, selecting audio enhancements to be inserted into, and played concurrently with, the customer-agent interaction provides the customer and/or agent with cues and motivations to promote the successful completion of the call. Cues may be provided to announce the arrival or departure of an agent, virtually take a customer from one location to another for a different portion of the interaction, add excitement and anticipation to an upcoming event by providing an audio experience foreshadowing the actual event, calm frayed nerves, or serve another purpose.
METHOD OF CONVERTING VOICE FEATURE OF VOICE
A method and apparatus for converting a voice of a first speaker into a voice of a second speaker by using a plurality of trained artificial neural networks are provided. The method of converting a voice feature of a voice comprises (i) generating a first audio vector corresponding to a first voice by using a first artificial neural network, (ii) generating a first text feature value corresponding to a first text by using a second artificial neural network, (iii) generating a second audio vector by removing a voice feature value of the first voice from the first audio vector by using the first text feature value and a third artificial neural network, and (iv) generating, by using the second audio vector and a voice feature value of a target voice, a second voice in which a feature of the target voice is reflected.
Method for modifying a style of an audio object, and corresponding electronic device, computer readable program products and computer readable storage medium
The disclosure relates to a method for processing an input audio signal. According to an embodiment, the method includes obtaining a base audio signal that is a copy of the input audio signal and generating an output audio signal from the base signal, the output audio signal having style features obtained by modifying the base signal so that a distance between base style features representative of a style of the base signal and a reference style feature decreases. The disclosure also relates to a corresponding electronic device, computer readable program product and computer readable storage medium.
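The idea of modifying a copy of the input until its style features move closer to a reference can be illustrated with a small sketch. Everything concrete here is an assumption for illustration and is not taken from the disclosure: the toy style descriptor (mean and RMS of the samples), the Euclidean distance, the gradient-style update, and the names `style_features`, `distance`, and `restyle`.

```python
import math


def style_features(signal):
    """Toy style descriptor (hypothetical): mean and RMS of the samples."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    return (mean, rms)


def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def restyle(input_signal, reference_features, steps=200, lr=0.5):
    """Nudge a copy of the input so its features approach the reference."""
    base = list(input_signal)  # base signal: a copy of the input
    for _ in range(steps):
        mean, rms = style_features(base)
        d_mean = mean - reference_features[0]
        d_rms = rms - reference_features[1]
        # Scaled gradient of 0.5 * distance**2 with respect to each sample:
        # d(mean)/dx_i is proportional to 1, d(rms)/dx_i to x_i / rms.
        base = [
            x - lr * (d_mean + d_rms * (x / rms if rms > 0 else 0.0))
            for x in base
        ]
    return base


signal = [0.0, 1.0, 0.0, -1.0]
reference = (0.1, 0.3)
output = restyle(signal, reference)
print(distance(style_features(signal), reference))
print(distance(style_features(output), reference))
```

The input list is left untouched, and the second printed distance is smaller than the first. A real system would presumably use learned style features rather than these two statistics.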
EMBEDDED PLUG-IN PRESENTATION AND CONTROL OF TIME-BASED MEDIA DOCUMENTS
A software plug-in module that interfaces to a media editing host application generates and embeds information about a media composition being edited directly within portions of the user interface generated by the host application. The information may include a custom representation of media data of a time-based element of the media composition that replaces, augments, or overlays a timeline representation of the element generated by the host application. Media editing functionality provided by the plug-in may be accessed by an operator based on viewing or interacting with the custom representation. Results of analysis of the media composition by the plug-in may be displayed within the host-generated timeline and used by an operator as a basis for performing edit operations with standard host tools or with plug-in generated tools. Plug-ins may embed their interfaces within user interfaces of host digital audio workstations, non-linear video editing systems, and music notation applications.
Methods and apparatus for reducing stuttering
A feedback system may play back, to a user, an altered version of the user's voice in real time, in order to reduce stuttering by the user. The system may operate in different feedback modes at different times. For instance, the system may detect when the severity of a user's stuttering increases, which is indicative of the user habituating to the current feedback mode. The system may then switch to a different feedback mode. In some cases, the feedback modes include at least a Whisper mode, a Reverb mode, and a Harmony mode. In Whisper mode, the user's voice may be transformed to sound as if it were whispering in the user's ears. In Harmony mode, the user's voice may be altered as if the user were harmonizing with himself or herself. In Reverb mode, the user's voice may be altered so that it reverberates.
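The mode-switching behavior described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the severity score in the range 0 to 1, the moving-average window, the rise threshold, the round-robin order of modes, and the class name `FeedbackModeSwitcher` are all hypothetical and do not come from the abstract.

```python
from collections import deque

# Mode names follow the abstract; the rotation order is an assumption.
MODES = ["whisper", "reverb", "harmony"]


class FeedbackModeSwitcher:
    """Switch feedback mode when average stuttering severity rises."""

    def __init__(self, modes=MODES, window=3, rise_threshold=0.25):
        self.modes = list(modes)
        self.index = 0
        self.window = window
        self.rise_threshold = rise_threshold
        self.baseline = None  # lowest windowed average seen in this mode
        self.recent = deque(maxlen=window)

    @property
    def mode(self):
        return self.modes[self.index]

    def update(self, severity):
        """Record one severity score (assumed 0..1) and return the mode."""
        self.recent.append(severity)
        if len(self.recent) < self.window:
            return self.mode  # not enough data yet
        average = sum(self.recent) / len(self.recent)
        if self.baseline is None:
            self.baseline = average
        elif average - self.baseline > self.rise_threshold:
            # Severity rose: treat as habituation and move to the next mode.
            self.index = (self.index + 1) % len(self.modes)
            self.recent.clear()
            self.baseline = None
        else:
            self.baseline = min(self.baseline, average)
        return self.mode


switcher = FeedbackModeSwitcher()
history = [switcher.update(s) for s in [0.2, 0.2, 0.2, 0.8, 0.8]]
print(history)
```

In this run the mode stays on "whisper" for the first four scores and changes to "reverb" on the fifth, once the windowed average exceeds the baseline by more than the threshold.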
END-TO-END MODULAR SPEECH SYNTHESIS SYSTEMS AND METHODS
A method for speech synthesis using prosody capture and transfer includes receiving a first speech in a target prosody and receiving a second speech in a target voice; extracting prosodic features from a first speech segment in the target prosody; and generating a synthetic speech segment in the target voice with the target prosody by transferring the prosodic features, per phoneme, from the first speech segment to a second speech segment.
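Per-phoneme prosody transfer can be pictured with a minimal sketch. The choices below are assumptions made for illustration, not details from the abstract: the `Phoneme` record, the choice of duration, relative pitch, and energy as the prosodic features, and a one-to-one alignment between the phonemes of the two segments.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Phoneme:
    """Hypothetical per-phoneme record for a speech segment."""
    symbol: str
    duration_ms: float
    pitch_hz: float
    energy: float


def mean_pitch(segment):
    return sum(p.pitch_hz for p in segment) / len(segment)


def extract_prosody(segment):
    """Return (duration, pitch relative to segment mean, energy) per phoneme."""
    base = mean_pitch(segment)
    return [(p.duration_ms, p.pitch_hz / base, p.energy) for p in segment]


def transfer_prosody(prosody, target_segment):
    """Apply extracted prosody to a target-voice segment, phoneme by phoneme.

    Pitch is rescaled around the target speaker's own mean so that the
    contour is transferred while the speaker's register is kept.
    """
    if len(prosody) != len(target_segment):
        raise ValueError("segments must align one phoneme to one phoneme")
    base = mean_pitch(target_segment)
    return [
        replace(p, duration_ms=d, pitch_hz=base * ratio, energy=e)
        for p, (d, ratio, e) in zip(target_segment, prosody)
    ]


first = [Phoneme("HH", 60.0, 100.0, 0.4), Phoneme("AY", 180.0, 200.0, 0.9)]
second = [Phoneme("HH", 80.0, 200.0, 0.5), Phoneme("AY", 120.0, 200.0, 0.5)]
result = transfer_prosody(extract_prosody(first), second)
for p in result:
    print(p)
```

The resulting segment keeps the phoneme symbols and mean pitch of the second speech while taking durations, pitch contour, and energy from the first.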