Patent classifications
G10H2240/325
Enhanced virtual singers generation by incorporating singing dynamics to personalized text-to-speech-to-singing
A technique to enhance the quality of Text-to-Speech (TTS) based Singing Voice generation is disclosed. The present invention efficiently preserves the speaker identity and improves sound quality by incorporating speaker-independent natural singing information into TTS-based Speech-to-Singing (STS). The Template-based Text-to-Singing (TTTS) system merges qualities of a singing voice generated from a TTS system with qualities of a singing voice generated from an actual voice singing the song. The qualities are represented in terms of Mel-generalized cepstrum (MGC) coefficients. In particular, low-order MGC coefficients from the TTS-based singing voice are combined with high-order MGC coefficients from the voice of an actual singer.
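The coefficient merge described in this abstract can be illustrated with a minimal sketch. It assumes the two voices have already been analyzed into time-aligned frames of equal MGC order. The function name `merge_mgc` and the `split_order` parameter are hypothetical, since the abstract does not say where the low-order/high-order boundary lies.

```python
def merge_mgc(tts_frames, singer_frames, split_order):
    """Combine low-order MGC coefficients from the TTS-based singing voice
    with high-order coefficients from an actual singer, frame by frame.

    Assumptions (not from the abstract): frames are time-aligned lists of
    floats of equal length, and split_order is the index where "low order"
    ends.
    """
    if len(tts_frames) != len(singer_frames):
        raise ValueError("frame sequences must be time-aligned (same length)")

    merged = []
    for tts, singer in zip(tts_frames, singer_frames):
        if len(tts) != len(singer):
            raise ValueError("frames must have the same MGC order")
        # Low-order part from the TTS voice, high-order part from the singer.
        merged.append(list(tts[:split_order]) + list(singer[split_order:]))
    return merged


if __name__ == "__main__":
    tts = [[1.0, 2.0, 3.0, 4.0]]
    singer = [[5.0, 6.0, 7.0, 8.0]]
    print(merge_mgc(tts, singer, split_order=2))
```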
Systems and methods for generating audible versions of text sentences from audio snippets
A method is performed at a server system of a media-providing service. The server system has one or more processors and memory storing instructions for execution by the one or more processors. The method includes receiving a text sentence including a plurality of words from a device of a first user and extracting a plurality of audio snippets from one or more audio tracks. A respective audio snippet in the plurality of audio snippets corresponds to one or more words in the plurality of words of the text sentence. The method also includes assembling the plurality of audio snippets in a first order to produce an audible version of the text sentence. The method further includes providing, for playback at the device of the first user, the audible version of the text sentence including the plurality of audio snippets in the first order.
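As a rough illustration of the assembly step in this abstract, the sketch below concatenates snippets in the order of the words in the sentence. It simplifies by assuming one snippet per word, whereas the abstract allows a snippet to cover several words. The `snippet_index` mapping and the function name `assemble_sentence` are hypothetical, and snippets are modeled as plain lists of samples rather than audio extracted from tracks.

```python
def assemble_sentence(sentence, snippet_index):
    """Return one sample list that plays the snippets in sentence order.

    snippet_index is a hypothetical mapping from a lowercase word to the
    samples of a snippet in which that word is heard.
    """
    audio = []
    for word in sentence.lower().split():
        if word not in snippet_index:
            raise KeyError(f"no audio snippet available for word: {word!r}")
        # Append this word's snippet after the snippets of the earlier words.
        audio.extend(snippet_index[word])
    return audio


if __name__ == "__main__":
    index = {"hello": [0.1, 0.2], "world": [0.3]}
    print(assemble_sentence("Hello world", index))
```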
Method and device for processing, playing and/or visualizing audio data, preferably based on AI, in particular decomposing and recombining of audio data in real-time
The present invention relates to a method for processing and playing audio data comprising the steps of receiving mixed input data and playing recombined output data. Furthermore, the invention relates to a device for processing and playing audio data, preferably DJ equipment, comprising an audio input unit for receiving a mixed input signal, a recombination unit and a playing unit for playing recombined output data. In addition, the present invention relates to a method and a device for representing audio data, i.e. on a display.
Methods and systems for vocalist part mapping
Systems and methods for mapping parts in a digital sheet music file for a harmony. The method may include receiving a selection of a music segment for part mapping, receiving a digital sheet music representation of the selected music segment, and determining a plurality of plausible part mappings for the digital sheet music representation. A part mapping identifies one or more distinct musical parts in the digital sheet music representation, each of said one or more distinct musical parts corresponding to a performer of the harmony. The method may also include analyzing one or more features of the plurality of plausible part mappings to identify a highest probability part mapping based on previously stored information, and outputting the highest probability part mapping.
Network-based processing and distribution of multimedia content of a live musical performance
Methods, systems, and computer program products for network-based processing and distribution of multimedia content of a live performance are disclosed. In some implementations, recording devices can be configured to record a multimedia event (e.g., a musical performance). The recording devices can provide the recordings to a server while the event is ongoing. The server automatically synchronizes, mixes and masters the recordings. The server performs the automatic mixing and mastering using reference audio data previously captured during a rehearsal. The server streams the mastered recording to multiple end users through the Internet or other public or private network. The streaming can be live streaming.
System and method for generating an audio file
A system and method for synchronizing an audio or MIDI file with a video file are provided. The method includes receiving a first audio or MIDI file, receiving a video file, and operating an audio synchronization module to perform steps of synchronizing the first audio or MIDI file with the video file, marking an event in the video file at a point on a timeline, detecting a first musical key for the event, retrieving a musical stinger or swell from a library, in which the musical stinger or swell is a second audio or MIDI file and is tagged with a second musical key, and the second musical key is relevant to the first musical key, and placing the musical stinger or swell at the point of the timeline marked for the event.
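The abstract does not define when the second musical key is "relevant" to the first. The sketch below assumes one possible reading: the keys are identical or are relative major/minor keys. The key encoding (pitch class 0 to 11 plus a mode string), the library layout, and the function names are all hypothetical.

```python
def relative_key(key):
    """Return the relative major/minor of a (pitch_class, mode) key."""
    pitch_class, mode = key
    if mode == "major":
        return ((pitch_class + 9) % 12, "minor")  # C major -> A minor
    return ((pitch_class + 3) % 12, "major")      # A minor -> C major


def is_relevant(event_key, stinger_key):
    """Assumed relevance rule: same key or relative key."""
    return stinger_key == event_key or stinger_key == relative_key(event_key)


def place_stinger(event_time, event_key, library):
    """Pick the first library entry with a relevant key and pair it with
    the timeline position marked for the event. Returns None if no entry
    qualifies."""
    for name, stinger_key in library:
        if is_relevant(event_key, stinger_key):
            return (event_time, name)
    return None


if __name__ == "__main__":
    library = [("swell_g_major", (7, "major")), ("stinger_a_minor", (9, "minor"))]
    print(place_stinger(12.5, (0, "major"), library))
```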
Server side crossfading for progressive download media
In exemplary embodiments of the present invention, systems and methods are provided to implement and facilitate cross-fading, interstitials and other effects/processing of two or more media elements in a personalized media delivery service so that each client or user has a consistent high quality experience. The effects or crossfade processing can occur on the broadcast, publisher or server-side, but can still be personalized to each individual user, in a manner where the processing burden is minimized on the downstream side or client device. This approach enables a consistent user experience, independent of client device capabilities, both static and dynamic. The cross-fade can be implemented after decoding the relevant chunks of each component clip, processing, recoding and rechunking, or, in a preferred embodiment, the cross-fade or other effect can be implemented on the chunks relevant to the effect in the compressed domain, thus obviating any loss of quality by re-encoding. A large scale personalized content delivery service can be implemented by limiting the processing to essentially the first and last chunks of any file, since there is no need to process the full clip. In exemplary embodiments of the present invention, this type of processing can easily be accommodated in cloud computing technology, where the first and last files may be conveniently extracted and processed within the cloud to meet the required load. Processing may also be done locally, for example, by the broadcaster, with sufficient processing power to manage peak load.
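To make the "first and last chunks only" idea concrete, here is a minimal sketch of the decoded-domain variant; the compressed-domain embodiment the abstract prefers is not shown. Clips are modeled as lists of decoded samples. The linear fade ramps, the function names, and the `chunk` length are assumptions for illustration.

```python
def crossfade(tail, head):
    """Linearly fade out `tail` while fading in `head` (equal lengths)."""
    if len(tail) != len(head):
        raise ValueError("tail and head chunks must have the same length")
    n = len(tail)
    if n == 0:
        return []
    if n == 1:
        return [0.5 * tail[0] + 0.5 * head[0]]
    mixed = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the start of the overlap, 1.0 at the end
        mixed.append(tail[i] * (1.0 - t) + head[i] * t)
    return mixed


def crossfade_clips(clip_a, clip_b, chunk):
    """Join two clips, processing only the last chunk of clip_a and the
    first chunk of clip_b; the remaining samples are copied untouched."""
    if chunk <= 0 or chunk > len(clip_a) or chunk > len(clip_b):
        raise ValueError("chunk must be positive and fit inside both clips")
    body_a = clip_a[:-chunk]
    body_b = clip_b[chunk:]
    return body_a + crossfade(clip_a[-chunk:], clip_b[:chunk]) + body_b


if __name__ == "__main__":
    print(crossfade_clips([1.0] * 5, [0.0] * 5, chunk=3))
```

Because only the boundary chunks are touched, the work per transition in this sketch depends on the chunk length rather than on the length of either clip.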
Arbitrary signal insertion method and arbitrary signal insertion system
An arbitrary signal insertion method and an arbitrary signal insertion system are provided that are capable of inserting a transmittable arbitrary signal (insertion information M) at a predetermined insertion timing into an acoustic sound played in real time. An insertion timing is previously associated with a predetermined time code with master rhythm information. An acoustic sound into which insertion information will be inserted is music sound generated by a real-time performance unit and is accompanied by a second rhythm. The insertion information is inserted into the music sound generated by the real-time performance unit at the insertion timing after the rhythm of master rhythm information and the rhythm of the music sound generated by the real-time performance unit are synchronized. The synchronization is achieved by a rhythm transmitter which notifies a player of a rhythm section musical instrument of the rhythm of the master rhythm information with sound or light.
Systems and methods for coordinating movement of a group of people
Described herein are methods for coordinating movement of a group of people, comprising: receiving numerical coordinates for each count of a drill for each person in a group of people, wherein the numerical coordinates indicate a location on a field for each person; receiving a starting count for each of one or more sets and a number of counts in each of the one or more sets for each person; combining the numerical coordinates with the starting count and the number of counts for each of the one or more sets; deriving human readable coordinates for each count of the drill for each person, wherein the human readable coordinates indicate the location on the field for each person; and transmitting the human readable coordinates to one or more receiving devices each associated with one person in the group of people.
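The abstract does not specify the coordinate system or the wording of the human readable coordinates. The sketch below assumes a marching-band convention on an American football field: positions are measured in steps from the 50 yard line, with 8 steps per 5 yards, negative values on "side 1" and positive values on "side 2". All names and conventions here are hypothetical.

```python
STEPS_PER_FIVE_YARDS = 8  # assumed "8 to 5" step size


def describe_x(x_steps):
    """Turn a signed step offset from the 50 yard line into a phrase such
    as 'side 1: 2 steps outside the 45 yard line'."""
    distance = abs(x_steps)
    # Index of the nearest yard line, counted outward from the 50.
    line_index = round(distance / STEPS_PER_FIVE_YARDS)
    yard_line = 50 - 5 * line_index
    offset = distance - line_index * STEPS_PER_FIVE_YARDS

    if offset == 0:
        position = f"on the {yard_line} yard line"
    else:
        # "outside" means farther from the 50 than the yard line itself.
        direction = "outside" if offset > 0 else "inside"
        position = f"{abs(offset):g} steps {direction} the {yard_line} yard line"

    if x_steps == 0:
        return position
    side = "side 1" if x_steps < 0 else "side 2"
    return f"{side}: {position}"


if __name__ == "__main__":
    for x in (-10.0, 16, 6.0, 0):
        print(x, "->", describe_x(x))
```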
Fade device
A fade device includes a processor that couples a crossfader to a mixer, the processor using crossfader input to apply an acceleration or a latency to an audio signal passing through the mixer.