Patent classifications
G10H2250/571
SERVER SIDE CROSSFADING FOR PROGRESSIVE DOWNLOAD MEDIA
In exemplary embodiments of the present invention, systems and methods are provided to implement and facilitate cross-fading, interstitials and other effects/processing of two or more media elements in a personalized media delivery service so that each client or user has a consistent, high-quality experience. The effects or crossfade processing can occur on the broadcast, publisher or server side, yet can still be personalized to a specific user, thus preserving a personalized experience for each individual user while minimizing the processing burden on the downstream side or client device. This approach enables a consistent user experience, independent of client device capabilities, both static and dynamic. The cross-fade can be implemented by decoding the relevant chunks of each component clip, then processing, recoding and rechunking them, or, in a preferred embodiment, the cross-fade or other effect can be implemented in the compressed domain on the chunks relevant to the effect, thus obviating any loss of quality from re-encoding. A large-scale personalized content delivery service can be implemented by limiting the processing to essentially the first and last chunks of any file, since there is no need to process the full clip. In exemplary embodiments of the present invention, this type of processing can easily be accommodated in cloud computing technology, where the first and last files may be conveniently extracted and processed within the cloud to meet the required load. Processing may also be done locally, for example, by the broadcaster, with sufficient processing power to manage peak load.
Server side crossfading for progressive download media
Systems and methods are provided to implement and facilitate cross-fading, interstitials and other effects/processing of two or more media elements in a personalized media delivery service so that each user experiences consistent high quality. The effects or crossfade processing may occur on the broadcast/publisher/server side, but may be personalized to a specific user, allowing a personalized experience for each user while the processing burden on the downstream side/client device is minimized. This approach enables a consistent user experience, independent of client device capabilities. A large-scale personalized content delivery service may be implemented by limiting the processing to the first and last chunks of any file. In exemplary embodiments, this type of processing may easily be accommodated in cloud computing technology, where first and last files are extracted and processed within the cloud to meet the required load. Processing may also be done locally, by the broadcaster, with sufficient processing power to manage peak load.
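The idea of limiting processing to the first and last chunks of each file can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `split_for_crossfade` and `processed_fraction`, and the representation of a clip as a list of chunk identifiers, are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def split_for_crossfade(
    clip_a: Sequence[str], clip_b: Sequence[str]
) -> Tuple[List[str], Tuple[str, str], List[str]]:
    """Separate the chunks that need effect processing from those that do not.

    Returns (untouched chunks of A, (last chunk of A, first chunk of B),
    untouched chunks of B). Only the middle pair would be handed to the
    crossfade processor; everything else can be served as stored.
    """
    if not clip_a or not clip_b:
        raise ValueError("both clips need at least one chunk")
    return list(clip_a[:-1]), (clip_a[-1], clip_b[0]), list(clip_b[1:])


def processed_fraction(clip_a: Sequence[str], clip_b: Sequence[str]) -> float:
    """Fraction of all chunks touched by one transition (always two chunks)."""
    if not clip_a or not clip_b:
        raise ValueError("both clips need at least one chunk")
    return 2 / (len(clip_a) + len(clip_b))


# Hypothetical chunk names for two clips.
song = ["a0", "a1", "a2"]
next_song = ["b0", "b1", "b2", "b3", "b4"]
keep_a, to_process, keep_b = split_for_crossfade(song, next_song)
```

In this hypothetical example only `a2` and `b0` would be processed per user, which is why the per-transition load stays constant regardless of clip length.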
SYSTEMS AND METHODS FOR IMPLEMENTING EFFICIENT CROSS-FADING BETWEEN COMPRESSED AUDIO STREAMS
Systems and methods are presented for efficient cross-fading of compressed domain information streams on a user/client device. Exemplary systems may provide cross-fade between AAC/Enhanced AAC Plus information streams, between MP3 information streams, or between information streams of unmatched formats. These systems are distinguished in that the cross-fade is applied directly to compressed bitstreams, so that a single decode operation is performed on the resulting bitstream. Thus, a set of frames from each input stream associated with the time interval in which a cross fade occurs is decoded, combined, and recoded, with the cross fade or other effect now in the compressed bitstream. Once the bitstream is sent through the client device's decoder, the user hears the transitional effect. The only input data that is decoded and processed is that associated with the portion of each stream used in the crossfade, blend or other interstitial, and thus the vast majority of the input streams is left compressed.
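The combination step for the overlapping interval can be sketched as follows. This is a simplified illustration, not the method of the abstract: it works on already decoded sample values rather than on AAC or MP3 frames, it uses a linear gain ramp chosen for simplicity, and the names `crossfade` and `join_with_crossfade` are hypothetical.

```python
from typing import List


def crossfade(tail: List[float], head: List[float]) -> List[float]:
    """Mix two equal-length sample runs with complementary linear gains."""
    if len(tail) != len(head):
        raise ValueError("overlap regions must have equal length")
    n = len(tail)
    mixed = []
    for i in range(n):
        fade_in = (i + 0.5) / n   # gain of the incoming stream, rises toward 1
        fade_out = 1.0 - fade_in  # gain of the outgoing stream, falls toward 0
        mixed.append(tail[i] * fade_out + head[i] * fade_in)
    return mixed


def join_with_crossfade(a: List[float], b: List[float], overlap: int) -> List[float]:
    """Join two streams, touching only the `overlap` samples at the seam."""
    if not 0 < overlap <= min(len(a), len(b)):
        raise ValueError("overlap must be positive and fit inside both streams")
    return a[:-overlap] + crossfade(a[-overlap:], b[:overlap]) + b[overlap:]


# Assumed toy data: a constant stream fading into silence.
joined = join_with_crossfade([1.0] * 10, [0.0] * 10, overlap=4)
```

Samples outside the seam are copied through unchanged, mirroring the point that most of each input stream never needs to be processed.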
MULTI-STRUCTURAL, MULTI-LEVEL INFORMATION FORMALIZATION AND STRUCTURING METHOD, AND ASSOCIATED APPARATUS
Systems and methods for structuring information include determining information quantity (IQ) and information value (IV) in an original digital information file (ODIF). An initial manipulation process applied to the ODIF forms a first resulting DIF (FRDIF), and a subsequent manipulation process applied to the FRDIF forms a second resulting DIF, wherein each manipulation process removes at least one element of the processed DIF and/or represents an element combination with a representative element and a first indicia of an interrelationship between the representative element and one or more elements in the combination, to reduce the IQ of the processed DIF, while retaining the IV thereof within a threshold. Manipulation processes are successively applied to the previously resulting DIF until successive applications do not achieve a threshold reduction in IQ. The last resulting DIF has a primary structure with a reduced IQ and an IV within the threshold of the original IV.
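The successive-application loop described above can be sketched in Python. The abstract does not define how IQ or IV are computed, so the measures used here (list length for IQ, number of distinct elements for IV), the sample manipulation `collapse_repeats`, and the function name `reduce_until_converged` are all assumptions for illustration.

```python
from typing import Callable, List, Sequence

Dif = List[str]  # assumed stand-in for a digital information file


def reduce_until_converged(
    dif: Dif,
    manipulations: Sequence[Callable[[Dif], Dif]],
    iq: Callable[[Dif], float],
    iv: Callable[[Dif], float],
    iv_tolerance: float,
    min_iq_gain: float,
) -> Dif:
    """Apply manipulations repeatedly until a pass no longer reduces IQ enough."""
    if min_iq_gain <= 0:
        raise ValueError("min_iq_gain must be positive so the loop terminates")
    original_iv = iv(dif)
    current = dif
    while True:
        iq_before = iq(current)
        for step in manipulations:
            candidate = step(current)
            keeps_value = abs(iv(candidate) - original_iv) <= iv_tolerance
            if keeps_value and iq(candidate) < iq(current):
                current = candidate  # accept: less IQ, IV within threshold
        if iq_before - iq(current) < min_iq_gain:
            return current


def collapse_repeats(dif: Dif) -> Dif:
    """Assumed manipulation: drop consecutive repeated elements."""
    out: Dif = []
    for item in dif:
        if not out or out[-1] != item:
            out.append(item)
    return out


result = reduce_until_converged(
    ["a", "a", "b", "b", "b", "c"],
    [collapse_repeats],
    iq=len,
    iv=lambda d: len(set(d)),
    iv_tolerance=0,
    min_iq_gain=1,
)
```

A manipulation that would push IV outside the tolerance is simply rejected, so the last resulting file keeps its value within the threshold of the original.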
AUDIO GENERATION USING GENERATIVE ARTIFICIAL INTELLIGENCE MODEL
A method. The method including receiving a prompt describing desired characteristics of audio. The method further including generating, using a set of machine learning models and based on the prompt, a latent space representation of the audio at a latent rate less than 40 Hz. The method further including generating, using the set of machine learning models and the latent space representation of the audio, an audio file at an output rate greater than the latent rate. The audio file including the audio based on the latent space representation of the audio. The audio having a length greater than 90 seconds.
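The relationship between the latent rate and the output rate can be made concrete with a small arithmetic sketch. The specific values below (a 20 Hz latent rate, a 44,100 Hz output rate, and a 95 second clip) are assumptions chosen to satisfy the stated bounds; they are not taken from the abstract, and `frame_counts` is a hypothetical helper.

```python
import math
from typing import Tuple


def frame_counts(
    duration_s: float, latent_rate_hz: float, output_rate_hz: float
) -> Tuple[int, int, float]:
    """Return (latent frames, output samples, upsampling ratio) for a clip."""
    if latent_rate_hz >= output_rate_hz:
        raise ValueError("output rate must exceed the latent rate")
    latent_frames = math.ceil(duration_s * latent_rate_hz)
    output_samples = round(duration_s * output_rate_hz)
    return latent_frames, output_samples, output_rate_hz / latent_rate_hz


# Assumed example: 95 s of audio, 20 Hz latents, 44.1 kHz output.
latent_frames, output_samples, ratio = frame_counts(95, 20, 44100)
```

Under these assumed numbers the generative stage works on 1,900 latent frames, while the decoder expands them to about 4.2 million samples, which suggests why a low latent rate makes clips longer than 90 seconds tractable.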