Server side crossfading for progressive download media
11823714 · 2023-11-21
Cpc classification
G10L19/00
PHYSICS
H04H20/38
ELECTRICITY
G10H1/06
PHYSICS
G11B27/031
PHYSICS
G10H2240/075
PHYSICS
G06F3/165
PHYSICS
G10H2210/125
PHYSICS
G10H2240/305
PHYSICS
G06F3/04847
PHYSICS
G10H1/0025
PHYSICS
G10H2250/035
PHYSICS
G10H2240/125
PHYSICS
G10H2250/571
PHYSICS
G06F3/0481
PHYSICS
G06F9/5066
PHYSICS
International classification
H04B1/00
ELECTRICITY
G06F3/0481
PHYSICS
G06F3/04847
PHYSICS
G06F9/50
PHYSICS
G10H1/06
PHYSICS
G10L19/00
PHYSICS
G11B27/031
PHYSICS
G11B27/10
PHYSICS
G11B27/32
PHYSICS
H04H20/38
ELECTRICITY
Abstract
In exemplary embodiments of the present invention, systems and methods are provided to implement and facilitate cross-fading, interstitials and other effects/processing of two or more media elements in a personalized media delivery service so that each client or user has a consistent high quality experience. The effects or crossfade processing can occur on the broadcast, publisher or server-side, but can still be personalized to a specific user, thus still allowing a personalized experience for each individual user, in a manner where the processing burden is minimized on the downstream side or client device. This approach enables a consistent user experience, independent of client device capabilities, both static and dynamic. The cross-fade can be implemented after decoding the relevant chunks of each component clip, processing, recoding and rechunking, or, in a preferred embodiment, the cross-fade or other effect can be implemented on the chunks relevant to the effect in the compressed domain, thus obviating any loss of quality by re-encoding. A large scale personalized content delivery service can be implemented by limiting the processing to essentially the first and last chunks of any file, since there is no need to process the full clip. In exemplary embodiments of the present invention, this type of processing can easily be accommodated in cloud computing technology, where the first and last files may be conveniently extracted and processed within the cloud to meet the required load. Processing may also be done locally, for example, by the broadcaster, with sufficient processing power to manage peak load.
Claims
1. A system comprising: at least one processor; and memory containing instructions that, when executed, cause the at least one processor to: identify a plurality of media clips to use in an effect; determine, for each of the plurality of clips, multiple chunks to process for the effect; receive the multiple chunks at a compressed crossfade module; and combine compressed data from each of the multiple chunks to generate a revised chunk containing the compressed data; wherein the revised chunk is equal in quality to the plurality of media clips.
2. The system of claim 1, wherein the processing further includes creating a new index file for the revised chunk.
3. The system of claim 1, wherein the determining is performed via an algorithm that assesses the number of chunks to process for the effect.
4. The system of claim 3, wherein the algorithm inputs at least one of the following parameters: (i) intro and outro information of each of the plurality of clips, (ii) type of the effect, and (iii) audio trajectory of the effect.
5. A system for implementing server side crossfade or other transitional effect using chunked media, the system comprising: at least one processor; and memory containing instructions that, when executed, cause the at least one processor to: identify a plurality of media clips to use in the effect; determine, for each of the plurality of clips, beginning and end chunks to process for the effect and core chunks to be excluded from the effect; process the beginning and end chunks of a first one of the plurality of clips in compressed format by time aligning and combining the beginning and end chunks of the first clip with one or more chunks of at least one other clip of the plurality of clips; and transmit core chunks and the processed beginning and end chunks of the first clip to a plurality of client devices, each corresponding to a respective user, wherein the processing is personalized such that the beginning and end chunks of the first clip are processed differently for each respective user.
6. The system of claim 5, wherein the processing further includes creating a new index file for the first clip.
7. The system of claim 5, wherein the determining is performed via an algorithm that assesses the number of chunks to process for the effect.
8. The system of claim 7, wherein the algorithm inputs at least one of the following parameters: (i) intro and outro information of each of the plurality of clips, (ii) type of the effect, and (iii) audio trajectory of the effect.
9. The system of claim 5, wherein the processing is personalized such that the custom chunks of the first clip are processed differently for each respective user, and the core chunks of the first clip are provided to each respective user without additional modifications.
10. The system of claim 5, wherein the processing further includes receiving data from the client device used to generate inferences from behavior of the respective user on the client device, wherein the processing is personalized based on the received data.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
SUMMARY OF THE INVENTION
(6) Systems, methods and computer readable media are provided to implement and facilitate cross-fading, interstitials and other effects/processing of two or more media elements in a personalized media delivery service so that each client or user has a consistent high quality experience. The various effects or crossfade processing can occur on the broadcast, music service, publisher or other server-side, but can still be personalized to a specific user, thus still allowing for a personalized experience for each individual user, in a manner where the processing burden is minimized on the downstream side, or client device. This approach enables a consistent user experience, independent of client device capabilities, both static and dynamic. The cross-fade, for example, can be implemented after decoding the relevant chunks of each component clip, and then processing, recoding and rechunking. Alternatively, in a preferred embodiment, the cross-fade or other effect can be implemented on the relevant chunks to that effect in the compressed domain, thus obviating any loss of quality by re-encoding. A large scale personalized content delivery service can be implemented by limiting the processing to essentially the first and last chunks of any file, since there is no need to process the full clip. This type of processing can easily be accommodated in cloud computing technology, where the first and last files may be conveniently extracted and processed within the cloud to meet the required load. Processing may also be done locally, for example, by the broadcaster, with sufficient processing power to manage peak load.
DETAILED DESCRIPTION OF THE INVENTION
(7) In exemplary embodiments of the present invention, systems and methods are provided to implement and facilitate cross-fading, interstitials and other effects/processing of two or more media elements in a personalized media delivery service so that each client or user has a consistent high quality experience.
(8) Thus, in exemplary embodiments of the present invention, the desired effects, for example, crossfade processing, can occur on the broadcast, publisher, music service, or other server-side, but may still be personalized to a specific user, thus allowing a personalized experience for each individual user in a manner where the processing burden is minimized on the downstream side, or client device. This approach enables a consistent user experience, independent of client device capabilities, both static and dynamic.
(9) One contemporary method of media delivery is called “Progressive Download”. According to this method, compressed audio or video bitstreams are encoded at the broadcast side and divided into chunks, for example, of mostly equal lengths. This chunking produces many small files which can easily be distributed over a low-cost infrastructure such as, for example, a web server. In addition, streams may be encoded using multiple bit rates, and a client device can switch between the streams, at file boundaries, to optimize playback quality at any given delivery rate. Examples of this kind of technique include Apple's HLS and Microsoft's Smooth Streaming. Players for this type of decoding and playback are able to seamlessly splice the discrete file chunks, and provide a smooth, uninterrupted playback of audio and video. In exemplary embodiments of the present invention, the chunking feature of Progressive Download can be leveraged to implement cross fading, blends and other interstitial processing/effects on the server side, as next described.
(10) Exemplary Server Side Cross Fade: Sub-Optimal Method
(11)
(12) In the following illustrative description, with reference to
(13) Returning to
(14) It is noted that in the example of
(15) Additionally, unless the last chunk of Clip1—here Clip1:Chunk 4—or the first chunk of Clip2—here Clip2:Chunk 1, is equal to the crossfade length, the resultant chunk made from the last chunk of the prior clip, the first chunk of the subsequent clip and the overlap between them, will often be longer than the chunk length limit. In such cases there may, for example, be two “revised” chunks taking the place of, for example, original Clip1:Chunk 4 and Clip2:Chunk 1. For example, a last chunk of 3 second duration with a 2 second crossfade could either (a) produce a new 1 second chunk followed by a 20 second chunk which includes the crossfade, or (b) produce a 21 second chunk including the crossfade. In exemplary embodiments of the present invention, either option is valid.
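The arithmetic of this example can be sketched as follows. This is a minimal illustration rather than the described implementation: the function name, the 20 second chunk length limit (taken from the examples above) and the assumption that the first chunk of the subsequent clip is a full 20 seconds long are all illustrative.

```python
def revised_chunk_layout(last_chunk_s, first_chunk_s, crossfade_s,
                         split=True, limit_s=20.0):
    """Durations of the revised chunk(s) that replace the last chunk of the
    prior clip and the first chunk of the subsequent clip.

    Hypothetical helper: the names and the default limit are assumptions.
    """
    if crossfade_s > min(last_chunk_s, first_chunk_s):
        raise ValueError("crossfade spans more than one chunk on a side")

    # The two chunks overlap by the crossfade duration once time aligned.
    merged_s = last_chunk_s + first_chunk_s - crossfade_s

    if not split or merged_s <= limit_s:
        # Option (b): a single revised chunk, possibly longer than the limit.
        return [merged_s]

    # Option (a): a short leading chunk, then a full-length chunk that
    # contains the crossfade.
    return [merged_s - limit_s, limit_s]


# The example from the text: 3 s last chunk, 2 s crossfade, 20 s first chunk.
print(revised_chunk_layout(3, 20, 2, split=True))
print(revised_chunk_layout(3, 20, 2, split=False))
```

Either layout covers the same 21 seconds of audio; the choice only affects how the revised material is packaged into files.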
(16) The particular embodiment shown in
(17) Exemplary Server Side Cross Fade: Optimal Method
(18)
(19) Core Chunks and Processed Chunks—Optimizing Personalized Service Delivery
(20)
(21) Thus,
(22) As noted, if the desired effect(s) or crossfade duration is less than that of the last chunk in the file, then only the last chunk needs to be retrieved and processed. As also noted, if the required crossfade duration is longer than the last chunk N, then chunk N and second to the last chunk N−1 must be retrieved for processing. Similar considerations apply to the first chunk in the subsequent file. For example, as noted, if the crossfade duration is longer than 20 seconds as considered in the examples provided above, then both Chunk 1 and Chunk 2 of Clip2 must be retrieved for processing.
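The retrieval rule in this paragraph can be sketched as a small helper that walks outward from the clip boundary until the effect duration is covered. The function names and the list-of-durations representation are assumptions made for illustration; chunk numbers are 1-based to match the text.

```python
def tail_chunks_for_effect(chunk_durations_s, effect_s):
    """1-based numbers of the trailing chunks overlapped by an effect of
    effect_s seconds at the end of a clip (hypothetical helper)."""
    needed = []
    covered = 0.0
    for number in range(len(chunk_durations_s), 0, -1):
        if covered >= effect_s:
            break
        needed.append(number)
        covered += chunk_durations_s[number - 1]
    return sorted(needed)


def head_chunks_for_effect(chunk_durations_s, effect_s):
    """1-based numbers of the leading chunks overlapped by an effect of
    effect_s seconds at the start of a clip (hypothetical helper)."""
    needed = []
    covered = 0.0
    for number, duration in enumerate(chunk_durations_s, start=1):
        if covered >= effect_s:
            break
        needed.append(number)
        covered += duration
    return needed


clip1 = [20, 20, 20, 3]   # Clip1 with a short 3 s last chunk
clip2 = [20, 20, 20]      # Clip2 with 20 s chunks
print(tail_chunks_for_effect(clip1, 2))    # crossfade shorter than last chunk
print(tail_chunks_for_effect(clip1, 5))    # crossfade longer than last chunk
print(head_chunks_for_effect(clip2, 25))   # crossfade longer than 20 s
```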
(23) As can thus readily be seen, by limiting the processing to essentially the first and last chunks of a file, techniques according to exemplary embodiments of the present invention can be successfully scaled to many hundreds of thousands, or even millions, of simultaneous users, since it does not require processing the full clip—as to any of the many clips in a given library. In exemplary embodiments of the present invention, this type of processing can easily be accommodated in cloud computing technology, where the first and last files may be conveniently extracted and processed within the cloud to meet the required load. Alternatively, processing may also be done locally, for example, by the broadcaster, or music service provider, with sufficient processing power to manage peak load.
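One way to picture why this scales is to look at the chunk list each user receives: the core chunks are the same shared files for everyone, and only the boundary chunks are user specific. The sketch below assumes a hypothetical path naming scheme and a hypothetical function name; neither is taken from the description.

```python
def build_user_chunk_list(clip_id, n_chunks, user_id, n_head=1, n_tail=1):
    """Chunk file names for one user and one clip.

    Boundary chunks (the first n_head and last n_tail) point at per-user
    processed files; all other chunks point at shared core files.
    The path layout is an illustrative assumption.
    """
    names = []
    for number in range(1, n_chunks + 1):
        is_boundary = number <= n_head or number > n_chunks - n_tail
        if is_boundary:
            names.append(f"{clip_id}/user-{user_id}/chunk{number}")
        else:
            names.append(f"{clip_id}/core/chunk{number}")
    return names


alice = build_user_chunk_list("clip1", 5, "alice")
bob = build_user_chunk_list("clip1", 5, "bob")
print(alice)
print(bob)
```

Under these assumptions, only two of the five entries differ between users, so per-user processing and storage grow with the number of boundary chunks rather than with clip length.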
(24) Encryption of Content Clips
(25) For security reasons, in exemplary embodiments of the present invention compressed audio chunks may be encrypted. If this is the case, decryption will be required for the first and last chunk (or, for example, additional chunks if the effect's duration exceeds a chunk length, as noted above) prior to processing. An alternate approach would be to allow the first and last chunks (or additional chunks, as noted) to remain unencrypted, which would alleviate the requirement for decryption, but not sacrifice security in any material way.
(26) Server Side Crossfade System
(27) Exemplary generation of server-side crossfade segments is illustrated in
(28) Exemplary Cross Fade (Client)
To illustrate a simple crossfade involving two sequential songs using server side generated crossfade clips, the following example is provided.
(29) An exemplary crossfade on an Android device was accomplished using two Media Players and setVolume(). Crossfade using the transition clips works by playing the from-clip followed by the transition clip (constructed using techniques as described in PCT/2013/158804 for compressed domain crossfades, or by adding together the PCM (Pulse Code Modulation) values of the from-clip and to-clip over the transition interval for non-compressed domain crossfades).
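For the non-compressed case, adding PCM values over the transition interval can be sketched as below. This is a minimal illustration, assuming 16-bit signed mono samples held in plain Python lists and a linear gain ramp; the function name, the sample format and the choice of a linear ramp are assumptions, and the compressed domain technique referenced above is not shown.

```python
def pcm_crossfade(from_tail, to_head):
    """Build a transition clip by summing gain-weighted PCM samples.

    from_tail: last samples of the from-clip (16-bit signed ints)
    to_head:   first samples of the to-clip, same length
    Linear gains are an illustrative assumption.
    """
    n = len(from_tail)
    if n != len(to_head):
        raise ValueError("both inputs must cover the same interval")

    out = []
    for i in range(n):
        gain_in = i / (n - 1) if n > 1 else 1.0   # to-clip fades in
        gain_out = 1.0 - gain_in                  # from-clip fades out
        mixed = round(from_tail[i] * gain_out + to_head[i] * gain_in)
        # Clamp to the 16-bit range in case other gain curves overshoot.
        out.append(max(-32768, min(32767, mixed)))
    return out


print(pcm_crossfade([1000] * 5, [0] * 5))
```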
(30)
(31) As an example, a Playlist Type can have four distinct functions that can be used for differing channels (or playlists), including, for example, a logarithmic function, a linear function and two sigmoid functions. The Playlist Type can also have parameters, which can be constants that adjust the trajectory of the function over the interval, as described below. Table I below provides exemplary values for such functions and parameters for four such Playlist Types, namely Heavy Metal, Easy Listening, Country and Rock.
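(32) TABLE I: Example Playlist Types

Playlist Type  | Outro Time (sec) | Outro Type  | Outro ParamA | Intro Time (sec) | Intro Type  | Intro ParamA
---------------|------------------|-------------|--------------|------------------|-------------|-------------
Heavy Metal    | 1                | Linear      |              |                  | Linear      |
Easy Listening | 4                | Arctan      | 1            | 4                | Arctan      | 1
Country        | 2                | Logarithmic |              | 2                | Logarithmic |
Rock           | 2                | Tanh        |              | 2                | Tanh        |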
(33) Similarly, Table II below provides exemplary TransitionTypes that can be applied in exemplary embodiments of the present invention.
(34) TABLE II: Example Transition Types

Transition Type  | Time (sec) | Algorithm | ParamA | ParamB
-----------------|------------|-----------|--------|-------
LinearFadeInSlow | 2          | Linear    |        |
LinearFadeInFast | 1          | Linear    |        |
SmoothFadeInSlow | 3          | ArcTan    | 1      |
SmoothFadeInFast | 2          | ArcTan    | 2      |
QuickFadeOutSlow | 2          | ArcTan    | 1      |
QuickFadeOutFast | 2          | ArcTan    | 2      |
ZeroFadeIn       | 0          | Linear    |        |
ZeroFadeOut      | 0          | Linear    |        |
(35) Where the mathematical functions follow (i) some form of logarithm (as is popular in the broadcast community), (ii) a sigmoid function or (iii) some other monotonically increasing function, the parameters “ParamA” and “ParamB”, as described in the two tables provided above can be, for example, constants which can adjust the slope of the function. For example, when using the tanh function, a parameter ParamA can be used such that tanh(Ax) is the actual value. It is noted that FIG. 3A of U.S. patent application Ser. No. 14/358,919, entitled SYSTEMS AND METHODS FOR IMPLEMENTING CROSS-FADING, INTERSTITIALS AND OTHER EFFECTS DOWNSTREAM, referenced above, provides a comparison of some exemplary Sigmoid functions. In that
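The role of ParamA as a slope constant can be sketched with a few fade-in gain curves. This is an illustration under stated assumptions: progress x runs from 0 to 1 over the fade, each curve is normalized so the gain reaches 1 at x = 1 (the text only gives tanh(Ax) as the value, so the division by tanh(A) is an added assumption), and the particular logarithmic form is likewise assumed. The function name is hypothetical.

```python
import math


def fade_in_gain(algorithm, x, param_a=1.0):
    """Monotonically increasing fade-in gain for progress x in [0, 1].

    Normalization and the exact logarithmic form are assumptions.
    """
    if param_a <= 0:
        raise ValueError("param_a must be positive")
    x = min(1.0, max(0.0, x))
    name = algorithm.lower()
    if name == "linear":
        return x
    if name == "logarithmic":
        return math.log10(1.0 + 9.0 * x)          # 0 at x=0, 1 at x=1
    if name == "tanh":
        return math.tanh(param_a * x) / math.tanh(param_a)
    if name == "arctan":
        return math.atan(param_a * x) / math.atan(param_a)
    raise ValueError(f"unknown algorithm: {algorithm}")


# A larger ParamA makes the curve rise faster early in the fade.
for a in (1.0, 2.0):
    print(a, [round(fade_in_gain("ArcTan", x / 4, a), 3) for x in range(5)])
```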
(36) The interaction between the Playlist Type (which defines an overall experience for a particular type of Playlist, such as a Channel) and the TransitionType (which defines an experience between two successive clips independent of the Channel) is one of priority. Thus, in exemplary embodiments of the present invention, if there is no TransitionType defined between two adjacent clips then a standard Playlist Type transition for that Channel can be used. If, on the other hand, a TransitionType is defined for those clips, then the defined Transition Type can be used instead of a default Playlist Type transition.
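The priority rule can be sketched as a simple lookup with a fallback. The dictionary layout and function name are assumptions for illustration; the values are the intro settings from Table I and two entries from Table II.

```python
# Default (intro) transitions per Playlist Type, from Table I.
PLAYLIST_DEFAULTS = {
    "Easy Listening": {"time_s": 4, "algorithm": "Arctan", "param_a": 1},
    "Country": {"time_s": 2, "algorithm": "Logarithmic", "param_a": None},
    "Rock": {"time_s": 2, "algorithm": "Tanh", "param_a": None},
}

# Clip-to-clip TransitionTypes, from Table II.
TRANSITION_TYPES = {
    "SmoothFadeInSlow": {"time_s": 3, "algorithm": "ArcTan", "param_a": 1},
    "LinearFadeInFast": {"time_s": 1, "algorithm": "Linear", "param_a": None},
}


def resolve_transition(playlist_type, transition_type=None):
    """A TransitionType defined for the clip pair takes priority;
    otherwise the Playlist Type default for the Channel is used."""
    if transition_type is not None:
        return TRANSITION_TYPES[transition_type]
    return PLAYLIST_DEFAULTS[playlist_type]


print(resolve_transition("Rock"))
print(resolve_transition("Rock", "SmoothFadeInSlow"))
```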
(37) Continuing with reference to
(38) Information stored in Device Profile repository 560 can then, for example, be used by Content Scheduler 540 to schedule content for distribution and client device management. An example Device Profile table, Table III, with two sample entries, is provided below for illustrative purposes. In exemplary embodiments of the present invention the information provided in such a table allows an exemplary Content Scheduler 540 to optimize what content can be downloaded and played on a given device, and at what times.
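(39) TABLE III: Exemplary Device Profiles

DeviceType  | Hardware Decoders     | HW Frame Decode Time (30 ms packet) | SW Frame Decode Time (30 ms packet)
------------|-----------------------|-------------------------------------|------------------------------------
SmartPhone  | AAC+, MPEG 1, Layer 2 | 10 msec                             | 25 msec
LowCosPhone | N/A                   |                                     | 25 msec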
(40) In exemplary embodiments of the present invention, a system can further contain a Content Repository 550 which can be used, for example, to store actual audio clips in compressed form. In the exemplary system of
(41) Coordination of the delivery of content to a client device can, for example, be accomplished by Play Control instructions issuing from Content Scheduler 540, and/or Client Device Control instructions issuing from Decoder Management 590 to particular client devices. Further, for example, Content Scheduler 540 may provide message communication regarding the availability of the playlists authored by Playlist Editor 510, and can also, for example, be responsible for allowing a client to obtain profile information regarding both devices and user preferences. Decoder Management 590 (also known as “Client Device Management” 590) can, for example, provide message communication regarding low level interactions between the service provider and the client with respect to delivery of clip elements to a particular user that are to be merged together. In general, such messages will reflect a “taxonomy” of variables, parameters, and data fields defined by the content distribution system as needed to adequately manage a client device downstream component to perform the various transition effects for any cross-fade.
(42) Exemplary Implementations
(43) Any suitable programming language can be used to implement the routines of particular exemplary embodiments including, but not limited to, the following: C, C++, Java, JavaScript, Python, Ruby, CoffeeScript, assembly language, etc. Different programming techniques can be employed such as procedural or object oriented. The routines can execute on a single processing device or multiple processors. Although the steps, operations, or computations may be presented in a specific order, this order may be changed in different particular embodiments. In some particular embodiments, multiple steps shown as sequential in this specification can be performed at the same time.
(44) Particular embodiments may be implemented in a computer-readable storage device or non-transitory computer readable medium for use by or in connection with an instruction execution system, apparatus, or device. Particular embodiments can be implemented in the form of control logic in software or hardware or a combination of both. The control logic, when executed by one or more processors, may be operable to perform that which is described in particular embodiments.
(45) Particular embodiments may be implemented by using a programmed general purpose digital computer, or by using application specific integrated circuits, programmable logic devices, or field programmable gate arrays; optical, chemical, biological, quantum or nanoengineered systems, components and mechanisms may also be used. In general, the functions of particular embodiments can be achieved by any means as is known in the art. Distributed, networked systems, components, and/or circuits can be used. Communication, or transfer, of data may be wired, wireless, or by any other means.
(46) It will also be appreciated that one or more of the elements depicted in the drawings/figures can also be implemented in a more separated or integrated manner, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application. It is also within the spirit and scope to implement a program or code that can be stored in a machine-readable medium, such as a storage device, to permit a computer to perform any of the methods described above.
(47) As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
(48) While there have been described methods for implementing efficient cross-fading between compressed audio streams, it is to be understood that many changes may be made therein without departing from the spirit and scope of the invention. Insubstantial changes from the claimed subject matter as viewed by a person with ordinary skill in the art, now known or later devised, are expressly contemplated as being equivalently within the scope of the claims. Therefore, obvious substitutions now or later known to one with ordinary skill in the art are defined to be within the scope of the defined elements. The described embodiments of the invention are presented for the purpose of illustration and not of limitation.
(49) The above-presented description and accompanying figures are intended by way of example only and are not intended to limit the present invention in any way except as set forth in the following claims. It is particularly noted that persons skilled in the art can readily combine the various technical aspects of the various exemplary embodiments described into a wide variety of techniques, systems and methods, all being encompassed within the present invention. For the sake of clarity, it is noted that the term “crossfade” includes any transition, blend or interstitial effect implemented on or near a boundary between two successive content clips or files provided in a content delivery service or method.