AI-BASED DJ SYSTEM AND METHOD FOR DECOMPOSING, MIXING AND PLAYING OF AUDIO DATA
20230089356 · 2023-03-23
Inventors
CPC classification
G10H2210/056
PHYSICS
H04S2400/15
ELECTRICITY
G10H2210/081
PHYSICS
G06F3/04886
PHYSICS
H04R2430/03
ELECTRICITY
G06F3/165
PHYSICS
G10H2210/125
PHYSICS
G10H2220/106
PHYSICS
G10H2240/325
PHYSICS
G10H2220/101
PHYSICS
H04R5/04
ELECTRICITY
H04R2420/01
ELECTRICITY
G10H2230/015
PHYSICS
G10H2210/241
PHYSICS
H04S2400/13
ELECTRICITY
H04S2420/07
ELECTRICITY
G06F3/04847
PHYSICS
G10H2250/641
PHYSICS
H04R2227/005
ELECTRICITY
G10H2250/311
PHYSICS
G10H2250/035
PHYSICS
H04R2430/01
ELECTRICITY
International classification
Abstract
The present invention relates to a method for processing and playing audio data comprising the steps of receiving mixed input data and playing recombined output data. Furthermore, the invention relates to a device 10 for processing and playing audio data, preferably DJ equipment, comprising an audio input unit for receiving a mixed input signal, a recombination unit 32, and a playing unit 34 for playing recombined output data. In addition, the present invention relates to a method and a device for representing audio data, e.g. on a display.
Claims
1-37. (canceled)
38. A method for processing and playing audio data, comprising: receiving mixed input data, said mixed input data being a sum signal obtained from mixing at least one first source track with at least one second source track; decomposing the mixed input data to obtain at least a first decomposed track resembling the at least one first source track; generating output data based on the first decomposed track; playing the output data through an audio output; and responsive to receiving input of a user, performing a scratching effect or skipping to different positions in a song during playback of the output data.
39. The method of claim 38, wherein decomposing the mixed input data is carried out segment-wise, wherein decomposing is carried out based on a first segment of the mixed input data to obtain a first segment of output data, and wherein decomposing of a second segment of the mixed input data is performed while playing the first segment of output data.
40. The method of claim 38, wherein the method steps are performed in a continuous process.
41. The method of claim 38, wherein the mixed input data are received via streaming from a remote server.
42. The method of claim 41, wherein streaming from the remote server comprises streaming through the internet.
43. The method of claim 38, wherein playback of output audio data can be started within a time period smaller than two (2) seconds from a receipt of the mixed input data.
44. The method of claim 38, wherein playback of output audio data can be started within a time period smaller than one hundred fifty (150) milliseconds from a receipt of the mixed input data.
45. The method of claim 38, wherein playback of output audio data can be started within a time period smaller than fifty (50) milliseconds.
46. The method of claim 38, wherein decomposing the mixed input data includes processing the mixed input data by an artificial intelligence (AI) system.
47. The method of claim 46, further comprising training the AI system using a plurality of sets of training audio data, wherein each set of training audio data at least includes a first training source track and a mixed track being a sum signal obtained from mixing at least the first training source track or a track that resembles the first training source track, with a second training source track.
48. The method of claim 38, wherein the mixed input data comprises first mixed input data based on a periodic beat structure, the method further comprising: receiving second mixed input data different from the first mixed input data and having a periodic beat signal, performing a tempo matching processing, wherein the tempo matching processing comprises: receiving first input data obtained from the first mixed input data and second input data obtained from the second mixed input data, time stretching or resampling of at least one of the first input data and the second input data, and outputting first output data and second output data which have mutually matching tempos.
49. The method of claim 38, wherein the mixed input data comprises first mixed input data based on a periodic beat structure, the method further comprising: receiving second mixed input data different from the first mixed input data and having a periodic beat signal, performing a key matching processing, wherein the key matching processing comprises: receiving first input data obtained from the first mixed input data and second input data obtained from the second mixed input data, pitch shifting of at least one of the first input data and the second input data, and outputting first output data and second output data which have mutually matching keys.
50. A method for processing and playing audio data, comprising: receiving mixed input data, the mixed input data being a sum signal obtained from mixing at least one first source track with at least one second source track; decomposing the mixed input data to obtain at least a first decomposed track resembling the at least one first source track; generating output data based on the first decomposed track; playing the output data through an audio output; reading a control input from the user, the control input representing a desired setting of a first volume level of the first decomposed track and a second volume level of a second track, wherein the second track is an independent track; recombining at least the first decomposed track at the first volume level with the second track at the second volume level to generate recombined output data; and playing the recombined output data.
51. The method of claim 50, further comprising: receiving second mixed input data, said second mixed input data being a sum signal obtained from mixing at least one third source track with at least one fourth source track; decomposing the second mixed input data to obtain a third decomposed track resembling the at least one third source track, and a fourth decomposed track resembling the at least one fourth source track; wherein the control input further represents a desired setting of a third volume level of the third decomposed track and a fourth volume level of the fourth decomposed track; and wherein generating the recombined output data comprises recombining the first decomposed track at the first volume level, the second decomposed track at the second volume level, the third decomposed track at the third volume level and the fourth decomposed track at the fourth volume level.
52. A device for processing and playing audio data, comprising: an audio input unit for receiving mixed input data, the mixed input data being a sum signal obtained from mixing at least a first source track with at least a second source track, a decomposing unit connected to the audio input unit for decomposing the mixed input data to obtain at least a first decomposed track resembling the first source track; a playing unit for playing output data based on the first decomposed track; and input means for receiving an input of a user for performing scratching effects during live performance or skipping to different time positions in a song during playback of the output data.
53. The device of claim 52, wherein the input means comprise: a DJ deck displayed on a display, or a Jog wheel.
54. A device for processing and playing audio data, comprising: an audio input unit for receiving mixed input data, the mixed input data being a sum signal obtained from mixing at least a first source track with at least a second source track; a decomposing unit connected to the audio input unit for decomposing the mixed input data to obtain at least a first decomposed track resembling the first source track; a playing unit for playing output data based on the first decomposed track; and a recombination unit for recombining at least the first decomposed track with a second track to generate the output data for the playing unit, wherein the second track is an independent track.
55. The device of claim 54, wherein the device comprises disk jockey (DJ) equipment.
56. The device of claim 54, further comprising a recompose controlling section adapted to generate, responsive to an input of a user, a control input representing a desired setting of a first volume level of the first decomposed track and a second volume level of the second track, wherein the recombination unit is configured to recombine at least the first decomposed track at the first volume level with the second track at the second volume level to generate the output data.
57. The device of claim 56, wherein the recompose controlling section comprises at least one single recompose control element which is operable by the user in a single control operation for controlling the first volume level and the second volume level.
58. The device of claim 57, wherein the single control operation comprises: (a) increasing the first volume level while at the same time decreasing the second volume level; or (b) increasing the second volume level while at the same time decreasing the first volume level.
59. The device of claim 54, wherein the decomposing unit comprises an artificial intelligence (AI) system for processing the mixed input data.
60. The device of claim 54, wherein the audio input unit comprises a first audio input unit for receiving first mixed input data based on a periodic beat structure, and wherein the device further comprises: a second audio input unit for receiving second mixed input data different from the first mixed input data and based on a periodic beat signal; and a tempo matching unit and a key matching unit, wherein the tempo matching unit is arranged to receive first input data obtained from the first mixed input data and second input data obtained from the second mixed input data, and wherein the tempo matching unit comprises a time stretching unit adapted to time stretch at least one of the first input data and the second input data, and to output first output data and second output data which have mutually matching tempos.
61. The device of claim 54, wherein the audio input unit comprises a first audio input unit for receiving first mixed input data based on a periodic beat structure, and wherein the device further comprises: a second audio input unit for receiving second mixed input data different from the first mixed input data and based on a periodic beat signal; and a key matching unit, wherein the key matching unit is arranged to receive first input data obtained from the first mixed input data and second input data obtained from the second mixed input data, and wherein the key matching unit comprises a pitch shifting unit adapted to pitch shift at least one of the first input data and the second input data, and to output first output data and second output data which have mutually matching keys.
Description
[0084] The present invention will now be further described based on specific examples shown in the drawings.
[0085]
[0086]
[0087]
[0088]
[0089]
[0090]
[0091]
[0092]
[0093] With reference to
[0094] Device 10 further comprises a processing section 14, preferably including a RAM storage 16, a ROM storage 18, a persistent storage 19 (such as a hard drive or flash drive), a microprocessor 20, and at least one artificial intelligence system 22, for example first to fourth AI systems 22-1, . . . , 22-4 which are connected to the microprocessor 20. The processing section 14 is connected to the input section 12 to receive audio data of audio files A and B.
[0095] Device 10 further comprises a recompose controlling section 24 including at least one recompose control element 26, for example a first control element 26-1, a second recompose control element 26-2 and a mix control element 28. Recompose controlling section 24 may further comprise a first play control element 30-1 and a second play control element 30-2 for starting or stopping playback of audio signals originating from the first or second mixed input data, respectively.
[0096] In addition, device 10 may include a recombination unit 32 connected to the recompose controlling section 24 for recombining audio data based on the settings of the control elements. Recombination may be carried out by multiplying different channels of audio data with scalar values based on the settings of the control elements and then adding the channels together sample by sample. Furthermore, an audio interface 34 (for example a sound card) having a digital-to-analog-converter is preferably connected to the recombination unit 32 to receive recombined output data and to convert the digitally recombined output data into an analog audio signal. The analog audio signal may be provided at an audio output 36 which may feature conventional audio connectors to connect audio cables such as line connectors or XLR connectors or wireless output (e.g. Bluetooth), which allow the audio output 36 to be connected to a PA system or speakers or headphones etc. (not illustrated). The PA system may include an amplifier connected to speakers to output the audio signal. As an alternative, internal speakers of the device such as tablet speakers or computer speakers or headphones might be used to output the analog audio signal.
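The sample-by-sample recombination described in [0096] can be sketched as follows; the function and variable names are illustrative, not part of the disclosed device:

```python
def recombine(tracks, gains):
    """Recombine decomposed tracks sample by sample, as in [0096]:
    each track (a list of samples) is multiplied by a scalar gain
    derived from a control element setting, then the tracks are
    added together sample by sample."""
    assert len(tracks) == len(gains)
    n = len(tracks[0])
    return [sum(gain * track[i] for track, gain in zip(tracks, gains))
            for i in range(n)]
```

A digital-to-analog converter in the audio interface 34 would then convert the resulting sample stream into an analog output signal.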
[0097] Some or all components and features described above with respect to the first embodiment may be provided by an electronic control unit (ECU), such as a computer, in particular a tablet computer 35 running a software application that is programmed to operate the ECU to allow input, decomposition, recombining and output of audio data as described above with respect to
[0098] Further details of the internal components and the signal flow within the device 10 are explained in the following with respect to
[0099] The first segmentation unit 42 of the first decomposition unit 38 receives the first input audio file A and is adapted to partition the audio file into a number of consecutive segments. Preferably, the complete input audio file A is partitioned into segments that correspond to time intervals in the audio signal that is playable from the audio file. Preferably, the starting segment is defined such that the starting point of the starting segment corresponds to the beginning of the audio file (playing position 0:00) on the time scale and the end point of the starting segment corresponds to the end of a first time interval at the beginning of the audio file. The second and each subsequent segment are then defined by consecutive time intervals of the same length, such that the starting points of the time intervals increase from one time interval to the next.
[0100] More particularly, consider an audio file as a digital representation of an analogue audio signal that is sampled with a predetermined sampling rate fs given by the number of samples per second. Sampling may be carried out during recording through an analog-to-digital-converter, such as an audio interface, for example. In case of digitally produced audio data (for example from digital synthesizers, drum computers etc.), the samples, and in particular the audio data represented by each sample, are computer generated values. Each sample represents the signal value (e.g. a measured average value) within a sampling period T, wherein fs=1/T. For audio files, fs may be 44.1 kHz or 48 kHz, for example. One sample is also referred to as one frame. Now, in the present embodiment, a starting frame of the first segment may be the very first frame of the audio data in the audio file at a time position 0, the starting frame of the second segment may be the frame immediately following the end frame of the first segment, the starting frame of the third segment may be the frame immediately following the end frame of the second segment, and so on. The segments may all have the same size with respect to the time scale of the playable audio signal or may have the same number of frames, except for the last segment, which may have an end point defined by the last frame of the (decoded) audio file or the end point of the playable audio signal on the time scale.
[0101] In fact, in methods and devices of the present invention, processing and in particular decomposition is preferably carried out on the basis of segments exactly defined by and/or corresponding to the frames of the input audio file, which ensures frame-accurate positioning within the tracks, in particular within the decomposed tracks during recombining or playback, and direct translation of audio positions in the mixed input signal to audio positions in the decomposed track. A decomposed track obtained in this manner may therefore have exactly the same time scale as the mixed input track and can be further processed, for example by applying effects, resampling, time stretching, and seeking, e.g. for tempo and beat matching, without shift or loss of accuracy on the time scale. Preferably, a decomposed segment contains exactly the same number of frames as the original input audio data corresponding to the segment.
[0102] Preferably, the size of the segments is chosen such that the length of the corresponding time intervals is smaller than 60 seconds and larger than one second. This ensures sufficient segmentation of the input audio file to achieve a remarkable acceleration of the processing necessary to start playing from any given position. More preferably, the segments have a size corresponding to time intervals between 5 seconds and 20 seconds in length. This provides sufficient audio data for the AI systems 44 to achieve satisfying decomposition results on the one hand, while on the other hand keeping the audio data to be decomposed in one segment small enough to make the decomposed audio data virtually immediately available, allowing the device to be used in a live performance situation.
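A minimal sketch of the frame-accurate segmentation described in [0099] to [0102] follows; the 10-second default is one value from the preferred 5-20 second range, and all names are illustrative:

```python
def segment_bounds(total_frames, fs, seg_seconds=10):
    """Partition an audio file of total_frames samples (sampling rate fs)
    into consecutive, frame-accurate (start, end) pairs of seg_seconds
    each; the last segment simply ends at the last frame of the file,
    as described in [0100]."""
    seg_len = int(seg_seconds * fs)
    bounds, start = [], 0
    while start < total_frames:
        bounds.append((start, min(start + seg_len, total_frames)))
        start += seg_len
    return bounds
```

Because the bounds are expressed in frames, a position in the mixed input signal translates directly to the same position in the decomposed track.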
[0103] At the output of the first segmentation unit 42, a segment of the input audio file A is provided for transmission to the at least one AI system 44. Preferably, the segment is doubled or copied so as to be transmitted to the first AI system 44-1 and, at the same time, i.e. in parallel, to the second AI system 44-2. One and the same segment of the input audio file A can therefore be processed simultaneously in the first AI system 44-1 and in the second AI system 44-2.
[0104] Each of the AI systems used in the embodiments of the present invention may be trained artificial neural networks (trained ANN) as described above in this disclosure. In particular, a trained ANN as described by Prétet et al. could be used which is able to extract a first decomposed track representing a vocal track or a singing voice track from the mixed audio data. In particular, the AI systems 44 may calculate a Fourier transformation of the audio data (i.e. of the audio data contained in a segment of the input audio file) such as to obtain a spectrum of the frequencies contained in the audio data, wherein the spectrum is then introduced into the convolutional neural network which filters parts of the spectrum recognized as belonging to a certain source track or the sum of certain source tracks, for example belonging to the vocal part of the mix. The filtered spectrum is then retransformed into a waveform signal or audio signal which, when played back, contains only the filtered part of the original audio signal, for example the vocal part.
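The filter-and-resynthesize step described in [0104] can be sketched as below; the mask that the trained network would produce is replaced here by a caller-supplied placeholder function, and numpy's FFT stands in for the Fourier transformation (both are illustrative assumptions):

```python
import numpy as np

def apply_spectral_mask(segment, mask_fn):
    """Sketch of the filtering analysis of [0104]: transform a segment
    to the frequency domain, attenuate the bins recognized as belonging
    to a source track (here via a placeholder mask function returning
    per-bin factors in [0, 1]), and retransform the filtered spectrum
    back into a waveform signal."""
    spectrum = np.fft.rfft(segment)                 # frequency-domain view
    masked = spectrum * mask_fn(np.abs(spectrum))   # keep only masked part
    return np.fft.irfft(masked, n=len(segment))     # back to a waveform
```

In the disclosed system, the mask would be produced by the convolutional neural network from the magnitude spectrum, e.g. isolating the vocal part.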
[0105] To be capable of this filtering analysis, an AI system such as an ANN may be used as described by Prétet et al. for example, which was trained by data sets containing large numbers of professionally recorded or produced songs from different genres, for example Hip Hop, Pop, Rock, Country, Electronic Dance Music etc., wherein said data sets do not only include the finished songs but also the respective vocal and instrumental tracks as separate recordings.
[0106] Stored within the first decomposition unit 38 of device 10 of the first embodiment (preferably within a RAM memory thereof, especially the internal RAM of the computer 35) may be two separate and fully trained instances of AI systems (different or equal AI systems) of the above-mentioned type, such as to be operable simultaneously and independently from one another to generate a first decomposed track and a second decomposed track, respectively. Preferably, the first and second decomposed tracks are complements, which means that the sum of the first decomposed track and the second decomposed track, when recombined at normal volume levels (i.e. each at 100 percent), resembles the original mixed input data. For example, the first decomposed track may resemble the complete vocal part of the mixed input data, whereas the second decomposed track may resemble the complete remainder of the mixed input data, in particular the sum of all instrumental tracks, such that recombining both decomposed tracks at appropriate volume levels results in an audio signal that, in terms of its acoustic perception, very closely resembles or cannot even be distinguished from the original mixed input data.
[0108] Preferably, the first and/or second decomposed track are each stereo tracks containing a left-channel signal portion and a right-channel signal portion, respectively. Alternatively they may each or both be mono tracks or multi-channel tracks with more than two channels (such as 5.1 surround tracks, for example).
[0109] The second decomposition unit 40 may be configured in a manner similar or corresponding to that of the first decomposition unit 38, thus including the second segmentation unit 46 which partitions the second input audio file B into a number of segments of fixed starting points and end points, transmitting the segments consecutively to both a third AI system and a fourth AI system for parallel processing and decomposition to obtain a third decomposed track and a fourth decomposed track (each of which may be mono tracks, stereo tracks, or multi-channel tracks with more than two channels (such as 5.1 surround tracks, for example)).
[0110] The decomposed tracks from the first and second decomposition units 38 and 40 are then transmitted to the recombination unit 32 which is configured to recombine at least two of the decomposed tracks at specified and controllable volume levels and to generate recombined output data. The volume levels of the decomposed tracks may be controlled by a user by virtue of at least one control element. For example, a first control element 26-1 may be provided which allows a user to control a ratio between a first volume level of the first decomposed track and a second volume level of the second decomposed track, whereas, alternatively or in addition, a second control element 26-2 may be provided which allows a user to control a ratio between a third volume level of the third decomposed track and a fourth volume level of the fourth decomposed track.
[0111] In the recombination unit 32 the first and second decomposed tracks are then recombined with one another in a first recombination stage 32-1 based on the volume levels set by the first control element 26-1 to obtain a recombination A′ from the first input audio file A. Further, the third and fourth decomposed tracks may be recombined in a second recombination stage 32-2 of the recombination unit 32 according to the third and fourth volume levels set by the second control element 26-2 such as to obtain a second recombination B′ from the second input audio file B. Furthermore, recombination A′ and recombination B′ may be introduced into a mixing stage 48 which mixes the first recombination A′ and second recombination B′ according to the setting of the mix control element 28 controllable by the user. The mix control element 28 may be adapted to control a ratio between the volume levels of the first and second recombinations A′ and B′.
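The mixing stage 48 described in [0111] can be sketched as a blend of the two recombinations A′ and B′; the linear crossfade law below is an assumption, since the description only states that the mix control element 28 controls a ratio between the two volume levels:

```python
def mix_stage(recomb_a, recomb_b, crossfader):
    """Sketch of mixing stage 48: blend recombination A' and
    recombination B' per the mix control element 28 setting
    (crossfader in [0, 1]; 0 = only A', 1 = only B').
    A linear law is assumed for illustration."""
    return [(1.0 - crossfader) * a + crossfader * b
            for a, b in zip(recomb_a, recomb_b)]
```

At the center position both recombinations contribute equally; real DJ crossfaders often use configurable non-linear curves instead.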
[0112] The recombined output data generated by the recombination unit 32 is then transmitted to a playing unit which may include audio interface 34 connected to audio output 36.
[0113] As can be seen in
[0114] Processing of the audio data within device 10 of the first embodiment of the invention is further illustrated with respect to
[0115] Output of the buffer may be connected to the recombination unit 32 which generates a recombined track according to the setting of the first control element 26-1.
[0116] If the device 10 includes one or more audio effect chains to apply audio effects to the signals, such as delay effects, reverb effects, equalizer effects, key or tempo changing effects, for example achieved by pitch-shifting, resampling and/or time stretching effects, etc. as conventionally known as such for DJ equipment, such effect chains could be inserted at different positions in the signal flow. For example, the decomposed tracks (segments) output by the buffer may each be routed through audio effect chains 51-1 and 51-2, respectively, such as to apply effects individually to the respective decomposed track as desired. The output of the audio effect chains 51-1, 51-2 may then be connected to the recombination unit 32. In addition or as an alternative, an effect chain 51-3 could be arranged at a position with respect to the signal flow at which the first and second decomposed tracks are recombined in accordance with the first and second volume levels set by the first control element 26-1, in particular at a position after the recombination unit 32 or after the first recombination stage 32-1 of recombination unit 32. The advantage of this arrangement is that the number of channels to be submitted to the audio effect chain 51-3 is reduced within the recombination process to at least one half of the number of channels before the first recombination stage and is in particular equal to the number of channels of the first mixed input data (one channel for a mono signal, two channels for a stereo signal, more than two channels for other formats such as surround signals). Thus, the additional functionality of the decomposition units of the present embodiment will not bring about any increased complexity or performance overload of the audio effect chain 51-3 as compared to the conventional processing of the mixed input data. The same audio effect chains as for conventional DJ equipment may even be used.
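The effect chains 51-1 to 51-3 described in [0116] can be modeled as an ordered list of per-track signal transformations; representing an effect as a plain function over a sample list is an illustrative assumption:

```python
def apply_effect_chain(samples, effects):
    """Route a track (list of samples) through an ordered effect chain,
    e.g. delay, reverb or equalizer stages as in [0116], each modeled
    as a function mapping a sample list to a sample list."""
    for effect in effects:
        samples = effect(samples)
    return samples
```

Placing one such chain after the first recombination stage 32-1 (position 51-3) means it processes only as many channels as the mixed input signal itself, as the description points out.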
[0117] With reference to
[0118] In the second embodiment illustrated in
[0119] As a further feature of the second embodiment, which may be provided independent from (in addition or alternatively to) the DJ decks 50a, 50b, the first control element 26-1, and preferably also the second control element 26-2, may be embodied as sliders, either as hardware sliders mechanically movable by a user, or by virtual sliders presented on a touch screen or on a computer screen movable by a touch gesture or by a pointer, a computer mouse or any other user input. The slider of the first control element 26-1 allows continuous variation of the ratio between the first volume level of the first decomposed track and the second volume level of the second decomposed track in a range from one end position at which the first volume level is set to 100% and the second volume level is set to 0% to another end position at which the first volume level is set to 0% and the second volume level is set to 100%. Between the end positions, when moving the slider in one direction, one of the first and second volume levels is increased, while the other one of the first and second volume levels is decreased at the same proportion.
[0120] As a preferred default setting, at a center position of control element 26-1, both the first and second volume levels are set to full/normal volume=100%, i.e. the recombination corresponds to the original first mixed input data. The volume adjustment curve may, however, be made user-configurable if needed. By default, the volume levels may be calculated as follows:
first volume level=MIN(1.0, sliderValue*2.0),
second volume level=MIN(1.0, (1.0−sliderValue)*2.0),
wherein “MIN(value1, value2)” represents the minimum of value1 and value2, and “sliderValue” represents the setting of control element 26-1, running from 0 (left end value) to 1.0 (right end value). Increasing and decreasing of the volume levels is reversed when moving the slider in the other direction. The user is thus able to smoothly crossfade between the first decomposed track and the second decomposed track, or to adjust a desired recombination of both decomposed tracks, by a single continuous movement with only one hand or even only one finger. Preferably, the second control element 26-2 is operable in the same manner as the first control element 26-1 to control the third and fourth volume levels of the third and fourth decomposed tracks, respectively.
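The default volume curve given in [0120] translates directly into code; the function name is illustrative:

```python
def recompose_levels(slider_value):
    """Default volume curve of control element 26-1 from [0120]:
    both decomposed tracks play at full volume at the center position
    (slider_value = 0.5), crossfading toward either end of the slider."""
    first_level = min(1.0, slider_value * 2.0)
    second_level = min(1.0, (1.0 - slider_value) * 2.0)
    return first_level, second_level
```

At the center position this returns (1.0, 1.0), reproducing the original mixed input data; at either end position one track is fully muted.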
[0121] Preferably, the mix control element 28 is also realized as a slider and may be positioned between the first and second control elements 26-1, 26-2 for intuitive operation of the device. As in the first embodiment, the mix control element 28 may be a crossfader and/or may be adapted to control a ratio between the volume levels of the first and second recombinations A′ and B′, wherein recombination A′ is obtained from recombining the first decomposed track and the second decomposed track, and recombination B′ is obtained from recombining the third decomposed track and the fourth decomposed track.
[0122] Device 10 may further be configured to display a first waveform section 52-1 in which waveforms representing the first and second decomposed tracks or a recombination thereof are displayed. First and second decomposed tracks may be visualized in an overlaid manner such as to share a common baseline/time axis, but using different signal axes and/or different drawing styles so as to be visually distinguishable from one another. In the example shown in
[0123] Likewise, device 10 may be configured to display a second waveform section 52-2 in which waveforms representing the third and fourth decomposed tracks are displayed in the same manner as described above for the first waveform section 52-1 and the first and second decomposed tracks, in particular by means of a zoom-in version 53-2 and a zoom-out version 55-2.
[0124] First and/or second waveform sections 52-1, 52-2 may be configured to receive user input commands such as touch gestures or mouse/pointer input commands in order to change the current playing position and to jump to a desired position within the audio data, for example by simple clicking or touching the desired position on the baseline in the zoom-out version 55-1/55-2.
[0125] In the example of
[0126] Likewise, the first and second decomposed tracks of the zoom-out version 55-1 of the first waveform section 52-1 are displayed using different drawing styles. In particular, only an upper half of the waveform of the first decomposed track and only a lower half of the waveform of the second decomposed track are displayed. Furthermore, the waveform of the first decomposed track may be displayed with a drawing style using a dark color, whereas the waveform of the second decomposed track may be displayed with a drawing style using a lighter color. Of course, all these drawing styles could be interchanged or modified and/or applied to the waveforms of the second waveform section 52-2.
[0127] The overlaid representations of the decomposed tracks in the first and second waveform sections 52-1, 52-2 may be provided by a method according to an embodiment of the invention, which will be described in more detail below with respect to
[0128] Furthermore, settings of the control elements 26-1, 26-2, 28 and 30-1, 30-2 may be reflected in the visualization of the decomposed tracks in the first and second waveform sections 52-1, 52-2 through respective signal amplitude changes of the individual waveforms displayed. In particular, the signal axes of the waveforms of the decomposed tracks as displayed in the first and second waveform sections 52-1, 52-2 are scaled depending on the current settings of the volume levels of the respective decomposed tracks as set by the user through the control elements 26-1, 26-2, 28 and 30-1, 30-2. This allows direct and preferably immediate visual feedback of the volume settings to the user.
[0129] Device 10 may have a first cue control element 31-1 and/or a second cue control element 31-2, associated with the first and second mixed input files (songs A and B), respectively, which can be operated by a user to store the current playing position and to retrieve it and jump back to it at any later point in time as desired.
[0130] In the third embodiment illustrated in
[0131]
[0132] The third control element 26-3 and/or fourth control element 26-4 may be implemented as sliders (hardware slider or software user interface, e.g. virtual touch screen sliders) or as rotatable knobs (likewise as hardware knobs or virtual knobs on a touch screen, computer screen or any other display device).
[0133] In the first to fourth embodiments described above, device 10 was preferably realized as an all-in-one device including the input section 12, the processing section 14, the recombination unit 32 and the playing unit (in particular audio interface 34 (e.g. a sound card) and audio output 36) in one single housing or, alternatively, as complete virtual equipment realized as software running on an electronic control unit (ECU), with the control elements being visualized on a display of the ECU and the electronic components of the processing section 14 being provided by the integrated electronic components of the ECU. Such an ECU may be a standard personal computer, a multi-purpose computing device, a laptop computer, a tablet computer, a smartphone or an integrated, standalone DJ controller.
[0134] As a further alternative, according to a fifth embodiment shown in
[0135] A sixth embodiment of the present invention as shown in
[0136] In a seventh embodiment shown in
[0137] Preferably, upon operation of one of the buttons 26-5 to 26-8 by the user, the respective decomposed track is not switched ON or OFF immediately; instead, the device is controlled to continuously or stepwise increase or decrease the volume of the respective track within a certain time period, preferably of more than 5 milliseconds or even more than 50 milliseconds, so as to avoid acoustic artefacts arising from instantaneous signal transitions.
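The gradual ON/OFF transition of paragraph [0137] can be sketched as a per-sample gain ramp. A simple linear ramp is shown here purely for illustration; the ramp shape and function name are assumptions, and the embodiment only requires that the transition span more than roughly 5 to 50 milliseconds:

```python
def fade_gains(turn_on, duration_ms=50.0, sample_rate=44100):
    """Generate per-sample gain values for a linear fade-in
    (turn_on=True) or fade-out (turn_on=False) over duration_ms,
    avoiding the click artefacts of an instantaneous transition."""
    n = max(2, int(sample_rate * duration_ms / 1000.0))
    ramp = [i / (n - 1) for i in range(n)]   # 0.0 ... 1.0
    if turn_on:
        return ramp
    return [1.0 - g for g in ramp]           # 1.0 ... 0.0
```

Multiplying the decomposed track's samples by these gains produces the continuous volume change during the transition period.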
[0138] In an eighth embodiment shown in
[0139] Now, in the eighth embodiment, the device 10 may further include a vocal swap button 26-9 controllable by the user, in particular through one single operation such as simply pushing the button, to route the decomposed vocal track of song A to the second recombination stage and to route the decomposed vocal track of song B to the first recombination stage. In other words, operation of the vocal swap button 26-9 swaps the two decomposed vocal tracks of songs A and B before they enter the first and second recombination stages, respectively. Repeated operation of the vocal swap button 26-9 may again swap the two decomposed vocal tracks and so on.
[0140] In addition or alternatively, the device 10 may include an instrumental swap button 26-10 controllable by the user, in particular through one single operation such as simply pushing the button, to route the decomposed instrumental track of song A to the second recombination stage and to route the decomposed instrumental track of song B to the first recombination stage. In other words, operation of the instrumental swap button 26-10 swaps the two decomposed instrumental tracks of songs A and B before they enter the first and second recombination stages, respectively. Repeated operation of the instrumental swap button 26-10 may again swap the two decomposed instrumental tracks and so on.
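The routing performed by the vocal swap button 26-9 and the instrumental swap button 26-10 (paragraphs [0139] and [0140]) can be sketched as follows. The function name and tuple representation of the recombination stages are illustrative assumptions:

```python
def route_tracks(vocal_a, instr_a, vocal_b, instr_b,
                 vocal_swap=False, instrumental_swap=False):
    """Route the decomposed tracks of songs A and B into the first
    and second recombination stages.  Activating a swap flag crosses
    the corresponding pair of tracks between the two stages; a second
    activation restores the original routing."""
    if vocal_swap:
        vocal_a, vocal_b = vocal_b, vocal_a
    if instrumental_swap:
        instr_a, instr_b = instr_b, instr_a
    stage_1 = (vocal_a, instr_a)   # first recombination stage (song A side)
    stage_2 = (vocal_b, instr_b)   # second recombination stage (song B side)
    return stage_1, stage_2
```

Calling the function twice with the same swap flag toggled models the repeated operation of the button described above.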
[0141] Preferably, upon operation of one of the buttons 26-9 or 26-10 by the user, the respective swapping of the tracks will not be immediate; instead, the device is controlled to continuously or stepwise increase or decrease the respective volumes of the tracks within a certain time period, preferably of more than 5 milliseconds or even more than 50 milliseconds, so as to avoid acoustic artefacts arising from instantaneous signal transitions.
[0142] Alternatively, the vocal swap button 26-9 can be controlled by the user to achieve a similar remix/mashup by obtaining a first recombination A′ from recombining the decomposed vocal track of song A at normal volume (in particular maximum volume) with the muted decomposed instrumental track of song A, and by obtaining a second recombination B′ from recombining the muted decomposed vocal track of song B with the decomposed instrumental track of song B at normal volume (in particular maximum volume), while setting the mix control element 28 to its center position such that recombinations A′ and B′ are both audible at the same volume level and at the same time.
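The behavior of a mix control element at its center position can be sketched with a crossfader gain law. The equal-power (cosine/sine) law used here is one common choice and is an assumption of this sketch, not something prescribed by the embodiment:

```python
import math

def crossfader_gains(position):
    """Equal-power crossfader law: position 0.0 plays only deck A,
    1.0 plays only deck B, and 0.5 (the center position) plays both
    decks at equal gain, as required for the A'/B' mashup above."""
    gain_a = math.cos(position * math.pi / 2.0)
    gain_b = math.sin(position * math.pi / 2.0)
    return gain_a, gain_b
```

At the center position both gains equal cos(π/4) ≈ 0.707, keeping the combined acoustic power roughly constant across the fader's travel.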
[0143]
[0144] As a result, device 10 prepares a modified decomposed track 3′ which matches track A as regards tempo, beat phase and key such that it can be seamlessly recombined with decomposed track 2 of track A. If the swap button is activated, as can be seen in
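The matching of tempo and key that produces the modified decomposed track 3′ can be sketched by computing the required time-stretch ratio and pitch shift. The function name, the semitone key encoding (C=0 ... B=11) and the choice of the smaller interval are assumptions of this illustration:

```python
def match_parameters(bpm_src, bpm_dst, key_src, key_dst):
    """Compute the time-stretch ratio and pitch shift (in semitones)
    needed to conform a decomposed track of song B to the tempo and
    key of track A before seamless recombination."""
    stretch = bpm_dst / bpm_src           # playback-speed factor
    shift = (key_dst - key_src) % 12      # semitones upward
    if shift > 6:                         # prefer the smaller interval
        shift -= 12                       # e.g. +7 becomes -5
    return stretch, shift
```

The resulting ratio and shift would then be fed to a time-stretching/pitch-shifting stage (not shown) that produces track 3′; beat-phase alignment would additionally offset the track's start position.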
[0145] Optionally, one or more audio effect chains may be inserted in the signal flow of any of the tracks, for example between the swapping step and the recombination stage such as to be applied to the respective decomposed tracks 1, 2 or 3′, for example.
[0146]
[0147] ECU, in particular on a computer screen, on an integrated display of a separate peripheral device connected to a computer or configured as a standalone device, or on a tablet, smartphone or similar device. The graphical representation may be generated by suitable software which runs on the ECU (i.e. the computer, the standalone device, the tablet, the smartphone etc.) and which may be part of the software that carries out a method according to the present invention as described in the claims or in the embodiments above. The software may operate a graphics interface, such as a graphics card.
[0148] According to the embodiment, audio data are visualized as waveforms. Waveforms in this sense are representations having a linear time axis t, which represents the playback time (usually a horizontal axis), and a signal axis (orthogonal to the time axis t, preferably a vertical axis), which represents an average signal strength or a signal amplitude of the audio data at each specific playback time. A playhead 58 may be provided which indicates the current playing position. During playback of the audio data, the playhead 58 moves relative to the waveform along the time axis t, by visually moving either the waveform, the playhead, or both.
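Reducing audio samples to the average-signal-strength display points described in paragraph [0148] can be sketched as follows; the function name and the mean-absolute-value envelope are assumptions of this illustration:

```python
def waveform_envelope(samples, bins):
    """Reduce a list of audio samples to `bins` display points, each
    holding the average absolute signal strength of its time slice,
    suitable for drawing against a linear time axis."""
    n = len(samples)
    env = []
    for b in range(bins):
        lo = b * n // bins
        hi = max(lo + 1, (b + 1) * n // bins)  # at least one sample per bin
        chunk = samples[lo:hi]
        env.append(sum(abs(s) for s in chunk) / len(chunk))
    return env
```

One such envelope per decomposed track supplies the data for the overlaid waveform representations discussed in this section.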
[0149]
[0150] What is actually displayed is then an overlay waveform 64, which is an overlaid representation of the first and second decomposed tracks 61-1, 61-2 using one single baseline for the waveforms of both decomposed tracks, meaning that the time axes t of both waveforms do not run parallel to each other at a distance but are identical, forming one common line. In order to allow a differentiation between both waveforms, they are displayed using different drawing styles. For example, one of the two waveforms of the decomposed tracks may be displayed in a different color than the other waveform. In the example shown in
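The half-waveform overlay on one common baseline (upper half for one decomposed track, lower half for the other, as in paragraph [0126]) can be sketched as follows; the function name and the tuple-per-time-point output are assumptions of this illustration:

```python
def overlay_halves(env_track_1, env_track_2):
    """Build an overlaid waveform on one common baseline: track 1
    occupies the upper half (positive values), track 2 the lower
    half (negative values), so both remain distinguishable while
    sharing a single time axis."""
    assert len(env_track_1) == len(env_track_2), "equal-length envelopes expected"
    upper = [abs(v) for v in env_track_1]    # drawn above the baseline
    lower = [-abs(v) for v in env_track_2]   # drawn below the baseline
    return list(zip(upper, lower))
```

Different colors or drawing styles would then be applied to the upper and lower halves when rendering, as described above.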
[0151] In another example shown in
[0152] Waveforms of decomposed tracks are preferably displayed such as to represent the settings of the control elements of the recompose controlling section and/or the settings of the recombination unit, so as to provide feedback to the user about the signal volumes assigned to the respective decomposed tracks. Preferably, at the same time as a user is manipulating one of the control elements to increase or decrease the volume of at least one decomposed track, the associated waveform of this decomposed track is displayed with an increasing or decreasing size with regard to its signal axis, or is visually faded in or out. This graphical feedback is preferably immediate, i.e. with a delay time which is not disturbing, or even not recognizable, to the user, in particular a delay time below 500 milliseconds, preferably below 35 milliseconds, such that it is not noticeable to the eye at a frame rate of 30 frames per second. Such a display greatly assists operation of the device during live performances.
[0153]
[0154] Through the control element 26-13 the user may control playback of a song such as to hear only the decomposed vocal track or only the decomposed instrumental track or a recombination of both tracks. Such configuration might be useful for a karaoke application or a play-along application, for example. Preferably, device 10 is a computer or a mobile device, such as a smartphone or tablet, which runs a suitable software application to realize the above-described functionalities.
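The single control element 26-13 for blending between the decomposed vocal track and the decomposed instrumental track (e.g. for karaoke or play-along use) can be sketched as a one-parameter mixer. The function name, the balance range of -1 to +1, and the constant-sum gain law are assumptions of this illustration:

```python
def karaoke_mix(vocal, instrumental, balance):
    """Single-control blend: balance -1.0 plays only the instrumental
    track (karaoke), +1.0 only the vocal track, and 0.0 the full
    recombination of both decomposed tracks."""
    if not -1.0 <= balance <= 1.0:
        raise ValueError("balance must be in [-1.0, 1.0]")
    g_vocal = min(1.0, 1.0 + balance)   # attenuated only for balance < 0
    g_instr = min(1.0, 1.0 - balance)   # attenuated only for balance > 0
    return [g_vocal * v + g_instr * i for v, i in zip(vocal, instrumental)]
```

Sweeping the balance parameter from one extreme to the other reproduces the continuum between karaoke, full playback and vocals-only playback described above.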
[0155]
[0156]