METHOD AND DEVICE FOR DECOMPOSING, RECOMBINING AND PLAYING AUDIO DATA

Abstract

A system processes mixed input data using a neural network trained to separate audio data of predetermined timbres from mixed audio data and to obtain a group of decomposed tracks comprising at least first, second, and third decomposed audio tracks representing audio signals of first, second, and third predetermined timbres, respectively. The system reads a control input representing a setting of a first volume level and of a second volume level. The system recombines at least a first selected track and a second selected track selected from the group of decomposed tracks to generate a first recombined track. The system recombines the first recombined track at the first volume level with at least a third track selected from the group of decomposed tracks, at the second volume level, to obtain a second recombined track. The system plays the audio data based on the second recombined track.

Claims

1. A method for processing and playing audio data, comprising: providing mixed input data, the mixed input data being obtained from mixing a plurality of source tracks; processing the mixed input data by an artificial intelligence (AI) system comprising a neural network trained to separate audio data of predetermined timbres from mixed audio data, wherein the mixed input data are processed by the AI system to obtain a group of decomposed tracks comprising at least a first decomposed track representing audio signals of a first predetermined timbre, a second decomposed track representing audio signals of a second predetermined timbre different from the first predetermined timbre, and a third decomposed track representing audio signals of a third predetermined timbre different from the first predetermined timbre and the second predetermined timbre; reading a control input, the control input representing a setting of a first volume level and of a second volume level; recombining at least a first selected track and a second selected track selected from the group of decomposed tracks to generate a first recombined track; recombining the first recombined track at the first volume level with at least a third track selected from the group of decomposed tracks, at the second volume level, to obtain a second recombined track; and playing the audio data based on the second recombined track.

2. The method of claim 1, wherein one or more of the first predetermined timbre, second predetermined timbre, or the third predetermined timbre is selected from a group comprising: a drum timbre, a vocal timbre, and a tonal timbre defining a harmony, a key, or a melody of the mixed input data.

3. The method of claim 1, wherein one or more of the first predetermined timbre, the second predetermined timbre, or the third predetermined timbre is a complement timbre, wherein a mixture of all decomposed tracks resembles the mixed input data.

4. A device for processing and playing audio data, comprising: an audio input unit for providing mixed input data, wherein the mixed input data are obtained from mixing a plurality of source tracks; an artificial intelligence (AI) system comprising a neural network trained to separate audio data of predetermined timbres from mixed audio data, wherein the AI system is configured to receive and process the mixed input data and to generate a group of decomposed tracks comprising at least a first decomposed track representing audio signals of a first predetermined timbre, a second decomposed track representing audio signals of a second predetermined timbre different from the first predetermined timbre, and a third decomposed track representing audio signals of a third predetermined timbre different from the first predetermined timbre and the second predetermined timbre; a controlling section configured to generate a control input representing a setting of a first volume level and a second volume level; a recombination unit configured to: recombine at least the first selected track and the second selected track selected from the group of decomposed tracks to generate a first recombined track; and recombine the first recombined track at the first volume level with at least a third track selected from the group of decomposed tracks, at the second volume level, to obtain a second recombined track; and a playing unit configured to play audio data based on the second recombined track.

5. The device of claim 4, wherein one or more of the first predetermined timbre, the second predetermined timbre, or the third predetermined timbre is selected from a group comprising: a drum timbre, a vocal timbre, and a tonal timbre defining a harmony, a key, or a melody of the mixed input data.

6. The device of claim 4, wherein one or more of the first predetermined timbre, the second predetermined timbre, or the third predetermined timbre is a complement timbre, wherein a mixture of all decomposed tracks resembles the mixed input data.

7. The device of claim 4, wherein the controlling section comprises at least one single control element which is operable in a single control operation for controlling the first volume level and the second volume level.

8. The device of claim 4, comprising a mode control unit configured to change an operational mode of the device at least between a first operational mode and a second operational mode, wherein: in the first operational mode, the recombination unit is configured to recombine a first set of selected tracks selected from the group of decomposed tracks to generate the first recombined track; and in the second operational mode, the recombination unit is configured to recombine a second set of selected tracks selected from the group of decomposed tracks to generate the first recombined track, the second set of selected tracks being different from the first set of selected tracks.

9. The device of claim 8, wherein the mode control unit comprises a mode control element operable to selectively set the device to the first operational mode or the second operational mode.

10. The device of claim 4, wherein the audio input unit comprises a first input section configured to receive first mixed input data and a second input section configured to receive second mixed input data different from the first mixed input data, and wherein the recombination unit is configured to recombine audio data originating from the first mixed input data with audio data originating from the second mixed input data.

11. The device of claim 10, further comprising a tempo matching unit arranged to receive first input data obtained from the first mixed input data and second input data obtained from the second mixed input data, wherein the tempo matching unit comprises a time stretching unit configured to: time stretch or resample at least one of the first input data and the second input data, and output first output data and second output data, the first output data and the second output data having mutually matching tempos.

12. The device of claim 10, further comprising a key matching unit arranged to receive first input data obtained from the first mixed input data and second input data obtained from the second mixed input data, wherein the key matching unit comprises a pitch shifting unit configured to: pitch shift at least one of the first input data and the second input data; and output first output data and second output data, wherein the first output data and the second output data have mutually matching keys.

13. A method for processing and playing audio data, comprising: receiving mixed input data, the mixed input data being a sum signal obtained from mixing at least one first source track with at least one second source track; decomposing the mixed input data to obtain at least a first decomposed track resembling the at least one first source track; generating output data based on the first decomposed track; playing the output data through an audio output; reading a control input, the control input representing a setting of a first volume level of the first decomposed track and a second volume level of a second track, wherein the second track is an independent track obtained from second mixed input data; recombining at least the first decomposed track at the first volume level with the second track at the second volume level to generate recombined output data; and playing the recombined output data.

14. The method of claim 13, wherein decomposing the mixed input data is carried out segment-wise, wherein decomposing is carried out based on a first segment of the mixed input data to obtain a first segment of output data, and wherein decomposing of a second segment of the mixed input data is carried out while playing the first segment of output data.

15. The method of claim 13, wherein the steps of receiving the mixed input data, decomposing the mixed input data, generating the output data, and playing the output data are carried out in a continuous process.

16. The method of claim 13, wherein the mixed input data are received via streaming from a remote server.

17. The method of claim 13, wherein in the step of receiving the mixed input data, an input audio file having a predetermined file size and a predetermined playback duration is received, which contains audio data to play the mixed input data, and a first segment is extracted from the input audio file, which contains audio data to play the mixed input data within a first time interval smaller than the predetermined playback duration, wherein in the step of decomposing the mixed input data, the first segment of the input audio file is decomposed to obtain a first segment of the first decomposed track and optionally a first segment of the second decomposed track, wherein in the step of generating the output data, a first segment of the output data is generated from the first segment of the first decomposed track by recombining at least the first segment of the first decomposed track at the first volume level with the first segment of the second decomposed track at the second volume level, and further comprising: extracting a second segment from the input audio file, wherein the second segment is different from the first segment and which contains audio data to play the mixed input data within a second time interval smaller than the predetermined playback duration of the input audio file and shifted in time with respect to the first time interval; and decomposing the second segment of the input audio file to obtain a second segment of the first decomposed track.

18. The method of claim 14, wherein the size of the first segment or the length of the first time interval is set such that the time required for decomposing the first segment is smaller than two (2) seconds.

19. The method of claim 13, wherein the mixed input data are first mixed input data being a sum signal obtained from mixing at least a first source track with a second source track, and further comprising: receiving second mixed input data, the second mixed input data being a sum signal obtained from mixing at least one third source track with at least one fourth source track; and decomposing the second mixed input data to obtain a third decomposed track resembling the at least one third source track and a fourth decomposed track resembling the at least one fourth source track, wherein in the step of reading the control input the control input represents a setting of the first volume level of the first decomposed track, the second volume level of the second decomposed track, a third volume level of the third decomposed track, and a fourth volume level of the fourth decomposed track, and wherein, in the step of recombining, the recombined output data is generated by recombining the first decomposed track at the first volume level, the second decomposed track at the second volume level, the third decomposed track at the third volume level, and the fourth decomposed track at the fourth volume level.

20. The method of claim 13, wherein decomposing the mixed input data includes processing the mixed input data by an artificial intelligence (AI) system, the AI system trained by a plurality of sets of training audio data, wherein each set of training audio data at least includes a first source track and a mixed track, the mixed track being a sum signal obtained from mixing at least the first source track or a track that resembles the first source track with a second source track.

21. The method of claim 13, wherein the mixed input data comprise first mixed input data based on a periodic beat structure, and further comprising: receiving second mixed input data different from the first mixed input data and having a periodic beat signal; and performing at least one of a tempo matching processing and a key matching processing, wherein the tempo matching processing comprises: receiving first input data obtained from the first mixed input data and second input data obtained from the second mixed input data; time stretching or resampling of at least one of the first input data and the second input data; and outputting first output data and second output data, wherein the first output data and the second output data have mutually matching tempos, and wherein the key matching processing comprises: receiving first input data obtained from the first mixed input data and second input data obtained from the second mixed input data; pitch shifting of at least one of the first input data and the second input data; and outputting first output data and second output data, wherein the first output data and the second output data have mutually matching keys.

22. A device for processing and playing audio data, comprising: an audio input unit for receiving mixed input data, the mixed input data being a sum signal obtained from mixing at least a first source track with at least a second source track; a decomposing unit connected to the audio input unit for decomposing the mixed input data to obtain at least a first decomposed track resembling the first source track; a playing unit for playing output data based on the first decomposed track; and a recombination unit for recombining at least the first decomposed track with a second track to generate the output data for the playing unit, wherein the second track is an independent track obtained from second mixed input data.

23. The device of claim 22, further comprising a recompose controlling section configured to be controlled to generate a control input representing a setting of a first volume level of the first decomposed track and a second volume level of the second track, wherein the recombination unit is configured to recombine at least the first decomposed track at the first volume level with the second track at the second volume level to generate the output data.

24. The device of claim 22, wherein the audio input unit is a first audio input unit for receiving first mixed input data, the first mixed input audio data being a sum signal obtained from mixing at least a first source track with a second source track, wherein the decomposing unit is a first decomposing unit, and wherein the device further comprises: a second audio input unit for receiving second mixed input data, the second mixed input data being a sum signal obtained from mixing at least a third source track with a fourth source track; and a second decomposing unit connected to the second audio input unit for decomposing the second mixed input data (B) to obtain a third decomposed track resembling the third source track and a fourth decomposed track resembling the fourth source track, wherein the recompose controlling section is configured to be controlled to generate a control input representing a setting of the first volume level of the first decomposed track, the second volume level of the second decomposed track, a third volume level of the third decomposed track, and a fourth volume level of the fourth decomposed track, and wherein the recombination unit is configured to generate the recombined output data by recombining the first decomposed track at the first volume level, the second decomposed track at the second volume level, the third decomposed track at the third volume level, and the fourth decomposed track at the fourth volume level.

25. The device of claim 23, wherein the recompose controlling section comprises at least one single recompose control element which is operable in a single control operation for controlling the first volume level and the second volume level.

26. The device of claim 22, wherein the recompose controlling section comprises: a first single recompose control element operable in a single control operation for controlling the first volume level and the second volume level; and a single mix control element operable in a single control operation for controlling a first sum signal and a second sum signal, the first sum signal being a sum of the first decomposed track at the first volume level and the second decomposed track at the second volume level and the second sum signal being a sum of the third decomposed track at the third volume level and the fourth decomposed track at the fourth volume level.

27. The device of claim 22, wherein the audio input unit is a first audio input unit for receiving first mixed input data based on a periodic beat structure, and further comprising: a second audio input unit for receiving second mixed input data different from the first mixed input data and based on a periodic beat signal; and at least one of a tempo matching unit and a key matching unit, wherein the tempo matching unit is configured to receive a first input data obtained from the first mixed input data and second input data obtained from the second mixed input data, and wherein the tempo matching unit comprises a time stretching unit configured to time stretch at least one of the first input data and the second input audio data, and to output first output data and second output data, wherein the first output data and the second output data have mutually matching tempos, or wherein the key matching unit is configured to receive a first input data obtained from the first mixed input data and second input data obtained from the second mixed input data, and wherein the key matching unit comprises a pitch shifting unit configured to pitch shift at least one of the first input data and the second input audio data, and to output first output data and second output data, wherein the first output data and the second output data have mutually matching keys.

28. The device of claim 7, wherein controlling the first volume level and the second volume level comprises changing a ratio between the first volume level and the second volume level from at least a value smaller than one to at least a value greater than one, or from at least a value greater than one to at least a value smaller than one.

29. The method of claim 17, wherein the second segment of the input audio file is further decomposed to obtain a second segment of the second decomposed track.

30. The method of claim 29, further comprising recombining at least the second segment of the first decomposed track at the first volume level with the second segment of the second decomposed track at the second volume level to generate a second segment of the recombined output data, wherein at least one of the following steps is performed while playing the first segment of the output data: extracting the second segment from the input audio file, decomposing the second segment of the input audio file, or recombining at least the second segment of the first decomposed track at the first volume level with the second segment of the second decomposed track at the second volume level, and wherein generation of the second segment of the output data is completed before playing the first segment of the output data is completed.

31. The method of claim 18, wherein the time required for decomposing the first segment is smaller than fifty (50) milliseconds.

32. The method of claim 18, wherein the time required for decomposing the first segment is smaller than one hundred fifty (150) milliseconds.

33. The device of claim 25, wherein controlling the first volume level and the second volume level comprises increasing one of the first volume level and the second volume level, while at the same time decreasing the other one of the first volume level and the second volume level.

34. The device of claim 26, wherein controlling the first volume level and the second volume level comprises increasing one of the first volume level and the second volume level, while at the same time decreasing the other one of the first volume level and the second volume level, and wherein controlling the first sum signal and the second sum signal comprises increasing one of the first sum signal and the second sum signal, while at the same time decreasing the other one of the first sum signal and the second sum signal.

35. The device of claim 26, wherein controlling the third volume level and the fourth volume level comprises increasing one of the third volume level and the fourth volume level, while at the same time decreasing the other one of the third volume level and the fourth volume level.

Description

[0101] The present invention will now be further described based on specific examples shown in the drawings.

[0102] FIG. 1 shows a schematic view of the components of the device for processing and playing an audio signal according to a first embodiment of the present invention.

[0103] FIG. 2 shows a functional diagram of the elements and signal flows in the device according to the first embodiment.

[0104] FIG. 3 shows a further functional diagram illustrating a signal flow in the device of the first embodiment.

[0105] FIGS. 4 to 10 show second to eighth embodiments of the present invention which are each modifications of the first embodiment.

[0106] FIG. 11 shows a diagram illustrating a swap process applicable in a device of the eighth embodiment of the invention.

[0107] FIGS. 12 and 13 show graphical representations of waveforms according to embodiments of the invention.

[0108] FIG. 14 shows an audio player according to a ninth embodiment of the invention.

[0109] FIGS. 15 and 16 show tenth and eleventh embodiments of the present invention which are each modifications of the first embodiment.

[0110] FIGS. 17 and 18 show a twelfth embodiment of the present invention, which is a modification of the previous embodiments.

[0111] With reference to FIG. 1, the first embodiment of the present invention is a device 10, preferably a DJ device. Device 10 comprises an input section 12 capable of loading a first input audio file A such as a first song A, and a second input audio file B such as a second song B. Both input audio files A, B may contain audio data in a common audio file format such as MP3, WAV or AIFF, and they have a fixed file size and playback duration (in particular song length in seconds) as conventionally known to be input into DJ equipment or other playback devices. Audio files A and B may be provided, downloaded or streamed from a remote server via Internet or other network connection, or may be provided by a local computer or a storage device integrated in the device 10 itself. Input section 12 may include suitable user interface means allowing a user to select one of a plurality of available audio files as input audio file A and another one of the plurality of audio files as input audio file B.

[0112] Device 10 further comprises a processing section 14, preferably including a RAM storage 16, a ROM storage 18, a persistent storage 19 (such as a hard drive or flash drive), a microprocessor 20, and at least one artificial intelligence system 22, for example first to fourth AI systems 22-1, . . . , 22-4 which are connected to the microprocessor 20. The processing section 14 is connected to the input section 12 to receive audio data of audio files A and B.

[0113] Device 10 further comprises a recompose controlling section 24 including at least one recompose control element 26, for example a first recompose control element 26-1, a second recompose control element 26-2 and a mix control element 28. Recompose controlling section 24 may further comprise a first play control element 30-1 and a second play control element 30-2 for starting or stopping playback of audio signals originating from the first or second mixed input data, respectively.

[0114] In addition, device 10 may include a recombination unit 32 connected to the recompose controlling section 24 for recombining audio data based on the settings of the control elements. Recombination may be carried out by multiplying different channels of audio data with scalar values based on the settings of the control elements and then adding the channels together sample by sample. Furthermore, an audio interface 34 (for example a sound card) having a digital-to-analog-converter is preferably connected to the recombination unit 32 to receive recombined output data and to convert the digitally recombined output data into an analog audio signal. The analog audio signal may be provided at an audio output 36 which may feature conventional audio connectors to connect audio cables such as line connectors or XLR connectors or wireless output (e.g. Bluetooth), which allow the audio output 36 to be connected to a PA system or speakers or headphones etc. (not illustrated). The PA system may include an amplifier connected to speakers to output the audio signal. As an alternative, internal speakers of the device such as tablet speakers or computer speakers or headphones might be used to output the analog audio signal.
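The recombination described in the preceding paragraph (scaling each channel by a scalar and adding the channels sample by sample) can be illustrated by the following minimal sketch. The function name `recombine`, the track names, and the gain values are assumptions chosen for illustration only and do not appear in the embodiments.

```python
from typing import List, Sequence


def recombine(tracks: Sequence[Sequence[float]], gains: Sequence[float]) -> List[float]:
    """Scale each track by its scalar gain and add the tracks sample by sample."""
    if len(tracks) != len(gains):
        raise ValueError("one gain per track is required")
    if not tracks:
        return []
    length = len(tracks[0])
    if any(len(track) != length for track in tracks):
        raise ValueError("all tracks must have the same number of samples")
    return [
        sum(gain * track[i] for track, gain in zip(tracks, gains))
        for i in range(length)
    ]


# Hypothetical decomposed tracks (mono, four samples each).
vocals = [0.5, -0.25, 0.0, 0.25]
instrumental = [0.25, 0.25, -0.5, 0.0]

# Vocals at full volume, instrumental at half volume.
mix = recombine([vocals, instrumental], [1.0, 0.5])
```

In a device such as the one described, the gains would presumably be derived from the settings of the control elements; here they are fixed constants.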

[0115] Some or all components and features described above with respect to the first embodiment may be provided by an electronic control unit (ECU), such as a computer, in particular a tablet computer 35 running a software application that is programmed to operate the ECU to allow input, decomposition, recombining and output of audio data as described above with respect to FIG. 1, and to receive control input from a user, for example via a touchscreen 37 that displays the control elements of the recompose controlling section 24.

[0116] Further details of the internal components and the signal flow within the device 10 are explained in the following with respect to FIG. 2. Within input section 12, first and second input audio files A and B are obtained as described above. Input audio files A, B are then transmitted to processing section 14, which contains at least a first decomposition unit 38 and a second decomposition unit 40. First decomposition unit 38 includes a first segmentation unit 42 and at least one AI system, preferably a first AI system 44-1 and a second AI system 44-2. The second decomposition unit 40 may likewise include a second segmentation unit 46 and at least one AI system, preferably a third AI system 44-3 and a fourth AI system 44-4.

[0117] The first segmentation unit 42 of the first decomposition unit 38 receives the first input audio file A and is adapted to partition the audio file into a number of consecutive segments. Preferably, the complete input audio file A is partitioned into segments that correspond to time intervals in the audio signal that is playable from the audio file. Preferably, the starting segment is defined such that the starting point of the starting segment corresponds to the beginning of the audio file (playing position 0:00) on the time scale and the end point of the starting segment corresponds to the end of a first time interval at the beginning of the audio file. The second and each subsequent segment are then defined by consecutive time intervals of the same length, such that the starting points of the time intervals increase from one time interval to the next time interval.

[0118] More particularly, consider an audio file as a digital representation of an analog audio signal that is sampled with a predetermined sampling rate fs given by the number of samples per second. Sampling may be carried out during recording through an analog-to-digital-converter, such as an audio interface, for example. In case of digitally produced audio data (for example from digital synthesizers, drum computers etc.), the samples and in particular the audio data represented by each sample, are computer generated values. Each sample represents the signal value (e.g. a measured average value) within a sampling period T, wherein fs=1/T. For audio files, fs may be 44.1 kHz or 48 kHz, for example. One sample is also referred to as one frame. Now, in the present embodiment, a starting frame of the first segment may be the very first frame of the audio data in the audio file at a time position 0, the starting frame of the second segment may be the frame immediately following the end frame of the first segment, the starting frame of the third segment may be the frame immediately following the end frame of the second segment and so on. The segments may all have the same size with respect to their time scale of the playable audio signal or may have the same number of frames, except for the last segment, which may have an end point defined by the end point or the last frame of the (decoded) audio file or the end point of the playable audio signal on the time scale.
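The frame-based segmentation described above may be sketched as follows. This is an illustrative sketch only: the function name `segment_bounds` and its parameters are assumptions, and each end index is exclusive, so that the start of a segment equals the end index of the preceding segment (i.e. the frame immediately following its last frame).

```python
from typing import List, Tuple


def segment_bounds(total_frames: int, sample_rate: int,
                   segment_seconds: float) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs covering the whole file.

    end_frame is exclusive. All segments have the same number of frames
    except possibly the last one, which ends at the last frame of the file.
    """
    frames_per_segment = int(round(segment_seconds * sample_rate))
    if frames_per_segment <= 0:
        raise ValueError("segment length must cover at least one frame")
    bounds = []
    start = 0
    while start < total_frames:
        end = min(start + frames_per_segment, total_frames)
        bounds.append((start, end))
        start = end
    return bounds


# A 25-second file sampled at fs = 44.1 kHz, cut into 10-second segments.
bounds = segment_bounds(total_frames=25 * 44100, sample_rate=44100,
                        segment_seconds=10.0)
```

For this example the first two segments each contain 441,000 frames and the last segment contains the remaining 220,500 frames.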

[0119] In fact, in methods and devices of the present invention, processing and in particular decomposition is preferably carried out on the basis of segments exactly defined by and/or corresponding to the frames of the input audio file, which ensures frame accurate positioning within the tracks, in particular within the decomposed tracks during recombining or playback, and direct translation of audio positions in the mixed input signal to audio positions in the decomposed track. A decomposed track obtained in this manner may therefore have exactly the same time scale as the mixed input track and can be further processed, for example by applying effects, resampling, time stretching, and seeking, e.g. for tempo and beat matching, without shift or loss in accuracy on the time scale. Preferably, a decomposed segment contains exactly the same number of frames as the original input audio data corresponding to the segment.
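Because decomposed segments keep the frame count of the input, a playback position can be translated directly into a segment index and a frame offset within that segment. The following sketch illustrates this under the assumption of equally sized segments; the function name `locate_frame` and the chosen values are hypothetical.

```python
from typing import Tuple


def locate_frame(position_seconds: float, sample_rate: int,
                 frames_per_segment: int) -> Tuple[int, int]:
    """Map a playback position to (segment_index, frame_offset_in_segment)."""
    frame = int(position_seconds * sample_rate)  # absolute frame in the file
    return divmod(frame, frames_per_segment)


# Position 12.5 s at 44.1 kHz with 10-second segments (441,000 frames each).
segment_index, offset = locate_frame(12.5, 44100, 441000)
```

The same pair of values addresses the corresponding frame in the mixed input track and in every decomposed track, since all of them share one time scale.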

[0120] Preferably, the size of the segments is chosen such that the length of the corresponding time intervals is smaller than 60 seconds and larger than one second. This ensures sufficient segmentation of the input audio file to considerably accelerate the processing necessary to start playing from any given position. More preferably, the segments have a size corresponding to time intervals having a length which is between 5 seconds and 20 seconds. On the one hand, this provides sufficient audio data for the AI systems 44 to achieve satisfactory decomposition results; on the other hand, it keeps the audio data to be decomposed in one segment small enough that the decomposed audio data is available virtually immediately, which allows the device to be used in a live performance situation.

[0121] In the output of the first segmentation unit 42 a segment of the input audio file A is provided to be transmitted to the at least one AI system 44. Preferably, the segment is doubled or copied to be transmitted to the first AI system 44-1 and, at the same time, i.e. in parallel, to the second AI system 44-2. One and the same segment of the input audio file A can therefore be processed at the same time in the first AI system 44-1 as well as in the second AI system 44-2.
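The parallel processing of one and the same segment by two AI systems may be sketched as follows. The two "models" below are trivial hypothetical stand-ins for trained networks, and the use of a thread pool is an assumption made for illustration; an actual implementation might instead run inference in native code, on separate processes, or on dedicated hardware.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Separator = Callable[[Sequence[float]], List[float]]


def fake_vocal_model(segment: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a first AI system (not a real separator)."""
    return [0.75 * sample for sample in segment]


def fake_instrumental_model(segment: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a second AI system (not a real separator)."""
    return [0.25 * sample for sample in segment]


def decompose_in_parallel(segment: Sequence[float],
                          separators: Sequence[Separator]) -> List[List[float]]:
    """Give each separator its own copy of the segment and run them concurrently."""
    with ThreadPoolExecutor(max_workers=len(separators)) as pool:
        futures = [pool.submit(sep, list(segment)) for sep in separators]
        # Results are collected in the order of the separators.
        return [future.result() for future in futures]


segment = [1.0, -1.0, 0.5]
vocal_part, instrumental_part = decompose_in_parallel(
    segment, [fake_vocal_model, fake_instrumental_model]
)
```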

[0122] Each of the AI systems used in the embodiments of the present invention may be a trained artificial neural network (trained ANN) as described above in this disclosure. In particular, a trained ANN as described by Prétet et al. could be used which is able to extract a first decomposed track representing a vocal track or a singing voice track from the mixed audio data. In particular, the AI systems 44 may calculate a Fourier transformation of the audio data (i.e. of the audio data contained in a segment of the input audio file) so as to obtain a spectrum of the frequencies contained in the audio data, wherein the spectrum is then introduced into a convolutional neural network which filters parts of the spectrum recognized as belonging to a certain source track or the sum of certain source tracks, for example belonging to the vocal part of the mix. The filtered spectrum is then retransformed into a waveform signal or audio signal which, when played back, contains only the filtered part of the original audio signal, for example the vocal part.
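The transform, filter, and retransform sequence described above may be illustrated by the following simplified sketch. It is not the network of Prétet et al.: the trained neural network is replaced by a hypothetical fixed mask function (`keep_low_bins`) that merely keeps low-frequency bins, and a single naive discrete Fourier transform over the whole segment is used for brevity, whereas a practical system would more likely operate on short-time spectra.

```python
import cmath
import math
from typing import Callable, List, Sequence


def dft(signal: Sequence[float]) -> List[complex]:
    """Naive discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse transform back to a real-valued waveform."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def filter_segment(segment: Sequence[float],
                   mask_fn: Callable[[Sequence[complex]], List[float]]) -> List[float]:
    """Transform the segment, weight each frequency bin by a mask, transform back."""
    spectrum = dft(segment)
    mask = mask_fn(spectrum)  # one gain between 0 and 1 per frequency bin
    return inverse_dft([m * x for m, x in zip(mask, spectrum)])


def keep_low_bins(spectrum: Sequence[complex], cutoff: int = 4) -> List[float]:
    """Hypothetical stand-in for the trained network: keep bins below the cutoff."""
    n = len(spectrum)
    # min(k, n - k) treats each bin and its mirrored counterpart alike.
    return [1.0 if min(k, n - k) < cutoff else 0.0 for k in range(n)]


n = 32
low = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]          # "source 1"
high = [0.5 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]  # "source 2"
mixed = [a + b for a, b in zip(low, high)]

extracted = filter_segment(mixed, keep_low_bins)  # approximately equal to `low`
```

In this toy case the two sources occupy disjoint frequency bins, so a fixed mask suffices; separating real vocals from instruments requires a mask that a trained model estimates from the spectrum itself.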

[0123] To be capable of this filtering analysis, an AI system such as an ANN may be used as described by Prétet et al. for example, which was trained by data sets containing large numbers of professionally recorded or produced songs from different genres, for example Hip Hop, Pop, Rock, Country, Electronic Dance Music etc., wherein said data sets do not only include the finished songs but also the respective vocal and instrumental tracks as separate recordings.

[0124] Stored within the first decomposition unit 38 of device 10 of the first embodiment (preferably within a RAM memory thereof, especially the internal RAM of the computer 35) may be two separate and fully trained instances of AI systems (different or equal AI systems) of the above-mentioned type such as to be operable simultaneously and independent from one another to generate a first decomposed track and a second decomposed track, respectively. Preferably, first and second decomposed tracks are complements, which means that the sum of the first decomposed track and the second decomposed track, when recombined with normal volume levels (i.e. each at 100 percent), resembles the original mixed input data. For example, the first decomposed track may resemble the complete vocal part of the mixed input data, whereas the second decomposed track may resemble the complete remainder of the mixed input data, in particular the sum of all instrumental tracks, such that recombining both decomposed tracks at appropriate volume levels results in an audio signal that, in terms of its acoustic perception, very closely resembles or cannot even be distinguished from the original mixed input data.

[0125] Preferably, the first and/or second decomposed track are each stereo tracks containing a left-channel signal portion and a right-channel signal portion, respectively. Alternatively they may each or both be mono tracks or multi-channel tracks with more than two channels (such as 5.1 surround tracks, for example).

[0126] The second decomposition unit 40 may be configured in a manner similar or corresponding to that of the first decomposition unit 38, thus including the second segmentation unit 46 which partitions the second input audio file B into a number of segments with fixed starting points and end points, transmitting the segments consecutively to both a third AI system and a fourth AI system for parallel processing and decomposition to obtain a third decomposed track and a fourth decomposed track (each of which may be mono tracks, stereo tracks, or multi-channel tracks with more than two channels (such as 5.1 surround tracks, for example)).

[0127] The decomposed tracks from the first and second decomposition units 38 and 40 are then transmitted to the recombination unit 32 which is configured to recombine at least two of the decomposed tracks at specified and controllable volume levels and to generate recombined output data. The volume levels of the decomposed tracks may be controlled by a user by virtue of at least one control element. For example, a first control element 26-1 may be provided which allows a user to control a ratio between a first volume level of the first decomposed track and a second volume level of the second decomposed track, whereas, alternatively or in addition, a second control element 26-2 may be provided which allows a user to control a ratio between a third volume level of the third decomposed track and a fourth volume level of the fourth decomposed track.

[0128] In the recombination unit 32 the first and second decomposed tracks are then recombined with one another in a first recombination stage 32-1 based on the volume levels set by the first control element 26-1 to obtain a recombination A′ from the first input audio file A. Further, the third and fourth decomposed tracks may be recombined in a second recombination stage 32-2 of the recombination unit 32 according to the third and fourth volume levels set by the second control element 26-2 such as to obtain a second recombination B′ from the second input audio file B. Furthermore, recombination A′ and recombination B′ may be introduced into a mixing stage 48 which mixes the first recombination A′ and second recombination B′ according to the setting of the mix control element 28 controllable by the user. The mix control element 28 may be adapted to control a ratio between the volume levels of the first and second recombinations A′ and B′.
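
As a minimal sketch of this signal flow, the two recombination stages and the mixing stage may be expressed as weighted sample-wise sums. The linear mixing law and the parameter mix_ratio (0.0 for recombination A′ only, 1.0 for recombination B′ only) are assumptions for illustration; other control curves are equally possible.

```python
def recombine(track_a, track_b, level_a, level_b):
    """Weighted sample-wise sum of two equally long tracks."""
    return [level_a * a + level_b * b for a, b in zip(track_a, track_b)]


def mix_output(t1, t2, t3, t4, levels, mix_ratio):
    """levels = (v1, v2, v3, v4) are the volume levels of tracks 1 to 4."""
    v1, v2, v3, v4 = levels
    rec_a = recombine(t1, t2, v1, v2)  # first recombination stage 32-1
    rec_b = recombine(t3, t4, v3, v4)  # second recombination stage 32-2
    # Mixing stage 48, assuming a simple linear ratio between A' and B'.
    return recombine(rec_a, rec_b, 1.0 - mix_ratio, mix_ratio)
```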

[0129] The recombined output data generated by the recombination unit 32 is then transmitted to a playing unit which may include audio interface 34 connected to audio output 36.

[0130] As can be seen in FIG. 2, the first and second decomposed tracks as output by the first decomposition unit 38 may be input into a first visualization unit 49-1. In addition, the third and fourth decomposed tracks as output by the second decomposition unit 40 may be input into a second visualization unit 49-2. Moreover, first and/or second visualization units 49-1 and 49-2 may be connected to the recombination unit 32 to obtain information about the current settings of the control elements 26-1, 26-2, for example. First and/or second visualization units 49-1 and 49-2 are preferably configured to display an overlay waveform of recombination A′ and recombination B′, respectively, as will be explained in more detail later on.

[0131] Processing of the audio data within device 10 of the first embodiment of the invention is further illustrated with respect to FIG. 3, which shows the processing of only the first input audio file A as an example, which can be applied to the processing of the second input audio file B, or any additional other input audio file, in the same manner. As can be seen in FIG. 3, after the decomposition process in the processing section 14, segments of the first and second decomposed tracks are stored in an audio buffer (for example a ring buffer) for immediate further processing and in particular for playback, preferably real time playback. The audio buffer has multiple data arrays in order to store audio data from the current segment of the first decomposed track as well as audio data from the current segment of the second decomposed track, each with the given number of channels (Mono, Stereo, Surround, etc.). For example, if both decomposed tracks represent stereo signals, a four-array buffer may be used in order to store left and right channel portions of the first and the second decomposed track segments, respectively.
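
A four-array buffer of this kind may, for example, be sketched as a ring buffer holding one array per channel of each decomposed track. The class name, the frame layout and the error handling are assumptions for illustration only.

```python
class StemRingBuffer:
    """Fixed-capacity ring buffer with one array per channel of each track."""

    def __init__(self, capacity, tracks=2, channels=2):
        self.capacity = capacity
        # Two stereo tracks give four arrays: track 1 L/R and track 2 L/R.
        self.arrays = [[0.0] * capacity for _ in range(tracks * channels)]
        self.write_pos = 0
        self.read_pos = 0
        self.filled = 0

    def write(self, frame):
        """Store one sample per array, e.g. (vocal L, vocal R, instr L, instr R)."""
        if self.filled == self.capacity:
            raise BufferError("buffer full")
        for array, sample in zip(self.arrays, frame):
            array[self.write_pos] = sample
        self.write_pos = (self.write_pos + 1) % self.capacity
        self.filled += 1

    def read(self):
        """Return the oldest stored frame as a tuple, one sample per array."""
        if self.filled == 0:
            raise BufferError("buffer empty")
        frame = tuple(array[self.read_pos] for array in self.arrays)
        self.read_pos = (self.read_pos + 1) % self.capacity
        self.filled -= 1
        return frame
```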

[0132] Output of the buffer may be connected to the recombination unit 32 which generates a recombined track according to the setting of the first control element 26-1.

[0133] If the device 10 includes one or more audio effect chains to apply audio effects to the signals, such as delay effects, reverb effects, equalizer effects, key or tempo changing effects, for example achieved by pitch-shifting, resampling and/or time stretching effects, etc. as conventionally known for DJ equipment, such effect chains could be inserted at different positions in the signal flow. For example, the decomposed tracks (segments) output by the buffer may each be routed through audio effect chains 51-1 and 51-2, respectively, such as to apply effects individually to the respective decomposed track as desired. The output of the audio effect chains 51-1, 51-2 may then be connected to the recombination unit 32. In addition or as an alternative, an effect chain 51-3 could be arranged at a position with respect to the signal flow at which the first and second decomposed tracks are recombined in accordance with the first and second volume levels set by the first control element 26-1, in particular at a position after the recombination unit 32 or after the first recombination stage 32-1 of recombination unit 32. The advantage of this arrangement is that the number of channels to be submitted to the audio effect chain 51-3 is reduced within the recombination process to one half or less of the number of channels before the first recombination stage and is in particular equal to the number of channels of the first mixed input data (one channel for a mono signal, two channels for a stereo signal, more than two channels for other formats such as surround signals). Thus, the additional functionality of the decomposition units of the present embodiment will not bring about any increased complexity or performance overload of the audio effect chain 51-3 as compared to the conventional processing of the mixed input data. The same audio effect chains as for conventional DJ equipment may even be used.

[0134] With reference to FIGS. 4 to 10, second to eighth embodiments are explained below. Each embodiment is a modification of the first embodiment described above with respect to FIGS. 1 to 3 and all features and functions described above for the first embodiment are preferably included in the same corresponding manner in each of the second to eighth embodiments unless described differently in the following. These same or corresponding features or functions will not be described again.

[0135] In the second embodiment illustrated in FIG. 4, a first DJ deck 50a and a second DJ deck 50b are displayed on a display, in particular a touch display which allows a user to operate them by means of gestures or movements corresponding to the operation of physical DJ decks. The second embodiment may in particular be advantageous to allow a user, in particular a DJ, to perform scratching effects during live performance or to skip to different time positions in a song.

[0136] As a further feature of the second embodiment, which may be provided independent from (in addition or alternatively to) the DJ decks 50a, 50b, the first control element 26-1, and preferably also the second control element 26-2, may be embodied as sliders, either as hardware sliders mechanically movable by a user, or by virtual sliders presented on a touch screen or on a computer screen movable by a touch gesture or by a pointer, a computer mouse or any other user input. The slider of the first control element 26-1 allows continuous variation of the ratio between the first volume level of the first decomposed track and the second volume level of the second decomposed track in a range from one end position at which the first volume level is set to 100% and the second volume level is set to 0% to another end position at which the first volume level is set to 0% and the second volume level is set to 100%. Between the end positions, when moving the slider in one direction, one of the first and second volume levels is increased, while the other one of the first and second volume levels is decreased at the same proportion.

[0137] As a preferred default setting, at a center position of control element 26-1, both first and second volume levels are set to full/normal volume=100%, i.e. the recombination corresponds to the original first mixed input data. The volume adjustment curve can be user configurable though if needed. By default the volume levels may be calculated as follows:

first volume level=MIN(1.0,sliderValue*2.0),

second volume level=MIN(1.0,(1.0−sliderValue)*2.0),

wherein “MIN(value1, value2)” represents the minimum of value1 and value2, and “sliderValue” represents a setting of control element 26-1 running from 0 (left end value) to 1.0 (right end value). Increasing and decreasing of the volume levels is reversed when moving the slider in the other direction. The user will thus be able to smoothly crossfade between the first decomposed track and the second decomposed track or adjust a desired recombination between both decomposed tracks by a single continuous movement with only one hand or even only one finger. Preferably, the second control element 26-2 is operable in the same manner as the first control element 26-1 to control the third and fourth volume levels of the third and fourth decomposed tracks, respectively.
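
The default volume calculation given above may be written, as a short sketch, in the following form. The function name and the range check are assumptions for illustration.

```python
def volume_levels(slider_value):
    """Map a slider setting in [0.0, 1.0] to (first, second) volume levels."""
    if not 0.0 <= slider_value <= 1.0:
        raise ValueError("slider value must lie between 0.0 and 1.0")
    first = min(1.0, slider_value * 2.0)
    second = min(1.0, (1.0 - slider_value) * 2.0)
    return first, second
```

With these formulas both levels equal 1.0 at the center position (slider value 0.5), while at the left end only the second decomposed track and at the right end only the first decomposed track remains audible.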

[0138] Preferably, the mix control element 28 is also realized as a slider and may be positioned between the first and second control elements 26-1, 26-2 for intuitive operation of the device. As in the first embodiment, the mix control element 28 may be a crossfader and/or may be adapted to control a ratio between the volume levels of the first and second recombinations A′ and B′, wherein recombination A′ is obtained from recombining the first decomposed track and the second decomposed track, and recombination B′ is obtained from recombining the third decomposed track and the fourth decomposed track.

[0139] Device 10 may further be configured to display a first waveform section 52-1 in which waveforms representing the first and second decomposed tracks or a recombination thereof are displayed. First and second decomposed tracks may be visualized in an overlaid manner such as to share a common baseline/time axis, but using different signal axes and/or different drawing styles so as to be visually distinguishable from one another. In the example shown in FIG. 4, the first waveform section 52-1 displays a zoom-in version 53-1 of the first and second waveforms, in which first and second waveforms are displayed in an overlaid manner using a common baseline that is scaled to view a time interval containing the current play position and preferably having a size between 1 second and 60 seconds, more preferably between 3 seconds and 10 seconds. The zoom-in version 53-1 may scroll along with the playback such as to maintain a current playing position visible, in particular at a fixed position on the display. In addition or alternatively, the first waveform section 52-1 may display a zoom-out version 55-1 of the first and second waveforms, in which first and second waveforms are displayed in an overlaid manner using a common baseline that is scaled to view a time interval containing the current play position and preferably having a size corresponding to the length of an input audio file, for example the whole song A and/or a size between 60 seconds and 20 minutes. Preferably, the zoom-out version 55-1 does not move with respect to the time axis, but rather shows a play head 58 representing the current playing position, which moves along the time axis.

[0140] Likewise, device 10 may be configured to display a second waveform section 52-2 in which waveforms representing the third and fourth decomposed tracks are displayed in the same manner as described above for the first waveform section 52-1 and the first and second decomposed tracks, in particular by means of a zoom-in version 53-2 and a zoom-out version 55-2.

[0141] First and/or second waveform sections 52-1, 52-2 may be configured to receive user input commands such as touch gestures or mouse/pointer input commands in order to change the current playing position and to jump to a desired position within the audio data, for example by simple clicking or touching the desired position on the baseline in the zoom-out version 55-1/55-2.

[0142] In the example of FIG. 4, the first and second decomposed tracks of the zoom-in version 53-1 of the first waveform section 52-1 are displayed using different signal axes and different drawing styles. In particular, the signal axis of the first decomposed track, for example the decomposed vocal track, is scaled significantly smaller than that of the second decomposed track, for example the decomposed instrumental track, such that the first decomposed track is visualized as lying within the second decomposed track and thus being visually distinguishable. Furthermore the waveform of the first decomposed track is displayed with a drawing style using a dark color, whereas the waveform of the second decomposed track is displayed with a drawing style using a lighter color.

[0143] Likewise the first and second decomposed tracks of the zoom-out version 55-1 of the first waveform section 52-1 are displayed using different drawing styles. In particular, only an upper half of the waveform of the first decomposed track and only a lower half of the waveform of the second decomposed track are displayed. Furthermore the waveform of the first decomposed track may be displayed with a drawing style using a dark color, whereas the waveform of the second decomposed track may be displayed with a drawing style using a lighter color. Of course, all these drawing styles could be interchanged or modified and/or applied to the waveforms of the second waveform section 52-2.

[0144] The overlaid representations of the decomposed tracks in the first and second waveform sections 52-1, 52-2 may be provided by a method according to an embodiment of the invention, which will be described in more detail below with respect to FIGS. 12 and 13.

[0145] Furthermore, settings of the control elements 26-1, 26-2, 28 and 30-1, 30-2 may be reflected in the visualization of the decomposed tracks in the first and second waveform sections 52-1, 52-2 through respective signal amplitude changes of the individual waveforms displayed. In particular, the signal axes of the waveforms of the decomposed tracks as displayed in the first and second waveform sections 52-1, 52-2 are scaled depending on the current settings of the volume levels of the respective decomposed tracks as set by the user through the control elements 26-1, 26-2, 28 and 30-1, 30-2. This allows direct and preferably immediate visual feedback of the volume settings to the user.

[0146] Device 10 may have a first cue control element 31-1 and/or a second cue control element 31-2, associated to the first and second mixed input files (songs A and B), respectively, which can be operated by a user to store a current playing position and to retrieve and jump to it at any point in time later as desired.

[0147] In the third embodiment illustrated in FIG. 5, first and second control elements 26-1, 26-2 are similar in function to the respective control elements in the second embodiment except that they are rotatable knobs instead of sliders. However, the knobs can also be rotated between two end positions in which one of the first and second volume levels is set to 100% whereas the other one of the first and second volume levels is set to 0%. Again, the user may crossfade between the first and second decomposed tracks by means of a single continuous movement using only one hand or only one finger. The same configuration may be implemented for the second control element 26-2.

[0148] FIG. 6 illustrates a fourth embodiment of the present invention which uses a different controlling section to control the recombination unit. In particular, instead of or in addition to the first and second control elements 26-1, 26-2 as described for the first to third embodiments, in the fourth embodiment there is provided a third control element 26-3 which controls a ratio between the first volume level of the first decomposed track and the third volume level of the third decomposed track, in other words, volume levels of decomposed tracks of different decomposition units 38, 40. It furthermore may comprise a fourth control element 26-4 which allows a user to control a ratio between the second volume level of the second decomposed track and the fourth volume level of the fourth decomposed track. By means of these control elements 26-3, 26-4 it will be possible, for example to easily and directly control, by means of a single movement with one hand or one finger, a ratio between a vocal part of the first audio file and a vocal part of the second audio file by manipulating the third control element 26-3. Likewise, by manipulating the fourth control element 26-4 in a single movement by only one hand or only one finger, a user may control a ratio between the volume level of the instrumental part of the first audio file and the instrumental part of the second audio file. This allows a DJ for example to make an even more seamless transition by first cross fading the vocal track from song A to song B and subsequently cross fading the instrumental track from song A to song B, thus achieving a more continuous flow of the music.
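
The two-step transition described for the fourth embodiment may be sketched as follows, assuming a simple linear crossfade law for each of the control elements 26-3 and 26-4. The function and parameter names are hypothetical; vocal_fade and instr_fade run from 0.0 (song A only) to 1.0 (song B only).

```python
def cross_song_mix(vocal_a, instr_a, vocal_b, instr_b, vocal_fade, instr_fade):
    """Mix four decomposed tracks with separate vocal and instrumental fades."""
    return [
        (1.0 - vocal_fade) * va + vocal_fade * vb
        + (1.0 - instr_fade) * ia + instr_fade * ib
        for va, ia, vb, ib in zip(vocal_a, instr_a, vocal_b, instr_b)
    ]
```

Setting vocal_fade to 1.0 while instr_fade remains at 0.0 corresponds to the intermediate state of such a transition, in which the vocal part of song B is heard over the instrumental part of song A.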

[0149] The third control element 26-3 and/or fourth control element 26-4 may be implemented as sliders (hardware slider or software user interface, e.g. virtual touch screen sliders) or as rotatable knobs (likewise as hardware knobs or virtual knobs on a touch screen, computer screen or any other display device).

[0150] In the first to fourth embodiments described above, device 10 was preferably realized as an all-in-one device including input section 12, processing section 14, recombination unit 32, playing unit (in particular audio interface 34 (e.g. sound card) and audio output 36), in one single housing or, alternatively, as a complete virtual equipment realized as software running on an electronic control unit (ECU) with the control elements being visualized on a display of the ECU and the electronic components of the processing section 14 being provided by the integrated electronic components of the ECU. Such an ECU may be a standard personal computer, a multi-purpose computing device, a laptop computer, a tablet computer, a smartphone or an integrated, standalone DJ controller.

[0151] As a further alternative, according to a fifth embodiment shown in FIG. 7, device 10 may be implemented as a combination of a computer 54 (personal computer, laptop computer, tablet or smartphone or other multi-purpose computing device) and a periphery device 56 which is an external hardware component that can be connected to the computer by cable (such as USB connection, MIDI connection, HID connection, FireWire connection, LAN connection etc.) or by any wireless connection using the usual wireless protocols (Wi-Fi, GSM, Bluetooth etc.). Preferably, the periphery device 56 includes the recompose controlling section 24 with the control elements such as control elements 26-1, 26-2 and 28. Furthermore, the periphery device 56 may include jog wheels 50a, 50b or other features known from conventional DJ equipment. The conventional hardware of the computer 54 may be used as the processing section 14, in particular to store and run the AI systems and the segmentation units in the RAM memory of the computer 54. Furthermore, a processor/CPU may also be included in the periphery device 56 to perform some or all of the tasks of the processing section 14.

[0152] A sixth embodiment of the present invention as shown in FIG. 8 is a slight modification of the fifth embodiment, wherein the periphery device 56 of the sixth embodiment is relatively compact and just includes the recompose controlling section and the control elements in order to reduce the additional hardware required to carry out the present invention to a minimum and still provide for mechanical control elements.

[0153] In a seventh embodiment shown in FIG. 9, the device 10 comprises a song-A instrumental button 26-5 controllable by the user to switch ON or OFF the decomposed instrumental track of song A, and/or a song-A vocal button 26-6 controllable by the user to switch ON or OFF the decomposed vocal track of song A, and/or a song-B instrumental button 26-7 controllable by the user to switch ON or OFF the decomposed instrumental track of song B, and/or a song-B vocal button 26-8 controllable by the user to switch ON or OFF the decomposed vocal track of song B. By realizing some or all of these buttons 26-5 to 26-8 as separate buttons, the user can individually and by only one single operation (one tap with the finger) switch ON or OFF a selected one of the decomposed tracks. Note that in the present description, switching ON and OFF a track refers to unmuting and muting the track, respectively.

[0154] Preferably, upon operation of one of the buttons 26-5 to 26-8 by the user, the respective decomposed track is not switched ON or OFF immediately, but the device is controlled to continuously or stepwise increase or decrease the volume of the respective track within a certain time period of preferably more than 5 milliseconds or even more than 50 milliseconds, such as to avoid acoustic artefacts arising from instant signal transitions.
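
Such a gradual volume change may be illustrated by a linear gain ramp applied sample by sample. The linear shape, the default sample rate of 44100 Hz and the default duration of 50 milliseconds are assumptions for illustration; a stepwise or differently shaped ramp would serve the same purpose.

```python
def fade_gains(start_level, end_level, sample_rate=44100, fade_ms=50.0):
    """Per-sample gains moving linearly from start_level to end_level."""
    steps = max(1, int(sample_rate * fade_ms / 1000.0))
    return [
        start_level + (end_level - start_level) * (i + 1) / steps
        for i in range(steps)
    ]


def apply_fade(samples, gains):
    """Multiply the samples by the ramp; samples past the ramp keep the final gain."""
    final = gains[-1]
    return [
        s * (gains[i] if i < len(gains) else final)
        for i, s in enumerate(samples)
    ]
```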

[0155] In an eighth embodiment shown in FIG. 10, the device 10 may comprise a first recombination stage configured to obtain a first recombination A′ by recombining the decomposed vocal track of song A with the decomposed instrumental track of song A, and a second recombination stage configured to obtain a second recombination B′ by recombining the decomposed vocal track of song B with the decomposed instrumental track of song B. Furthermore, device 10 may comprise a mix control element 28 configured such as to be operable by a user in a first direction to increase a volume level of the first recombination A′ or in a second direction to increase a volume level of the second recombination B′. In addition, there is preferably provided a mixing stage which mixes the first and second recombinations A′ and B′ with one another according to their respective volume levels to obtain the recombined output track. Such signal flow is similar to that explained above with reference to FIG. 2.

[0156] Now, in the eighth embodiment, the device 10 may further include a vocal swap button 26-9 controllable by the user, in particular through one single operation such as simply pushing the button, to route the decomposed vocal track of song A to the second recombination stage and to route the decomposed vocal track of song B to the first recombination stage. In other words, operation of the vocal swap button 26-9 swaps the two decomposed vocal tracks of songs A and B before they enter the first and second recombination stages, respectively. Repeated operation of the vocal swap button 26-9 may again swap the two decomposed vocal tracks and so on.

[0157] In addition or alternatively, the device 10 may include an instrumental swap button 26-10 controllable by the user, in particular through one single operation such as simply pushing the button, to route the decomposed instrumental track of song A to the second recombination stage and to route the decomposed instrumental track of song B to the first recombination stage. In other words, operation of the instrumental swap button 26-10 swaps the two decomposed instrumental tracks of songs A and B before they enter the first and second recombination stages, respectively. Repeated operation of the instrumental swap button 26-10 may again swap the two decomposed instrumental tracks and so on.
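
The routing effected by the swap buttons 26-9 and 26-10 may be sketched as follows. The dictionary keys and the function name are hypothetical and serve only to illustrate which decomposed track enters which recombination stage.

```python
def route_stems(stems, vocal_swapped=False, instrumental_swapped=False):
    """Return the (vocal, instrumental) inputs of stage 1 and of stage 2.

    stems is a dict with the keys 'vocal_a', 'instr_a', 'vocal_b', 'instr_b'.
    """
    vocal_1, vocal_2 = stems["vocal_a"], stems["vocal_b"]
    instr_1, instr_2 = stems["instr_a"], stems["instr_b"]
    if vocal_swapped:
        vocal_1, vocal_2 = vocal_2, vocal_1
    if instrumental_swapped:
        instr_1, instr_2 = instr_2, instr_1
    return (vocal_1, instr_1), (vocal_2, instr_2)
```

Each button press would toggle the corresponding flag, so that a repeated operation restores the original routing.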

[0158] Preferably, upon operation of one of the buttons 26-9 or 26-10 by the user, the respective swapping of the tracks will not be immediate, but the device is controlled to continuously or stepwise increase or decrease the respective volumes of the tracks within a certain time period of preferably more than 5 milliseconds or even more than 50 milliseconds, such as to avoid acoustic artefacts arising from instant signal transitions.

[0159] Alternatively the vocal swap button 26-9 can be controlled by the user to achieve a similar remix/mashup by obtaining a first recombination A′ by recombining the decomposed vocal track of song A at normal volume (in particular maximum volume) with the muted decomposed instrumental track of song A, and by obtaining a second recombination B′ by recombining the muted decomposed vocal track of song B with the decomposed instrumental track of song B at normal volume (in particular maximum volume), while setting the mix control element 28 to its center position such as to have recombinations A′ and B′ both audible at the same volume levels and at the same time.

[0160] FIG. 11 shows a modification of the method of the eighth embodiment, especially as regards the operation of the swap buttons, for example the vocal swap button 26-9. Device 10 receives a track A (song A) as a master track and track B (song B) as a slave track. Track A is decomposed as described above to obtain decomposed tracks 1 and 2, whereas track B is decomposed as described above to obtain decomposed tracks 3 and 4, respectively. In order to prepare decomposed track 3 for the swap, its key, tempo and beat phase will be matched to those of the master track A. In particular, the device 10 determines a tempo (e.g. a BPM value (beats per minute)) of track A and track B and, if they do not match, decomposed track 3 will be resampled or time-stretched such as to match the tempo of the master track A. In addition, key matching will be carried out and the key of decomposed track 3 will be changed, if necessary, such as to match that of the master track A. Moreover, after tempo matching of decomposed track 3, the beat phase of decomposed track 3 is shifted in a synchronization step as necessary, such as to match the beat phase of track A.

[0161] As a result, device 10 prepares a modified decomposed track 3′ which matches track A as regards tempo, beat phase and key such that it can be seamlessly recombined with decomposed track 2 of track A. If the swap button is activated, as can be seen in FIG. 11, in the following processing of track A, decomposed track 3′ will be used instead of decomposed track 1 and will be routed to the recombination stage for recombination with decomposed track 2 and audio output.
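
The quantities needed to prepare the modified decomposed track 3′ may, as a rough sketch, be computed as follows. The function names are hypothetical, the beat positions are assumed to be known in seconds (the slave position being measured after tempo matching), and keys are represented simply as pitch classes 0 to 11; the actual time-stretching and pitch-shifting operations are not shown.

```python
def tempo_ratio(master_bpm, slave_bpm):
    """Factor by which the slave tempo must be multiplied to match the master."""
    return master_bpm / slave_bpm


def beat_phase_shift(master_first_beat, slave_first_beat, master_bpm):
    """Smallest shift in seconds that aligns the slave beat grid with the master."""
    beat = 60.0 / master_bpm
    offset = (master_first_beat - slave_first_beat) % beat
    return offset if offset <= beat / 2 else offset - beat


def key_shift_semitones(master_pitch_class, slave_pitch_class):
    """Smallest transposition in semitones from the slave key to the master key."""
    shift = (master_pitch_class - slave_pitch_class) % 12
    return shift if shift <= 6 else shift - 12
```

For example, a slave track at 100 BPM would be sped up by a factor of 1.2 to match a master track at 120 BPM.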

[0162] Optionally, one or more audio effect chains may be inserted in the signal flow of any of the tracks, for example between the swapping step and the recombination stage such as to be applied to the respective decomposed tracks 1, 2 or 3′, for example.

[0163] FIGS. 12 and 13 show graphical representations of audio data which may be displayed on a display device in a method or device of an embodiment of the present invention, in particular in a device according to one of the first to eighth embodiments described above, during operation of the device. In particular, the graphical representation could be displayed on a display of the ECU, in particular a computer screen or on an integrated display of a separate peripheral device connected to a computer or as a standalone device, on a tablet, smartphone or a similar device. The graphical representation may be generated by suitable software which runs on the ECU (i.e. the computer, the standalone device, the tablet, the smartphone etc.) and which may be part of the software that carries out a method according to the present invention as described in the claims or in the embodiments above. The software may operate a graphics interface, such as a graphics card.

[0164] According to the embodiment, audio data are visualized as waveforms. Waveforms in this sense are representations having a linear time axis t which represents the playback time (usually a horizontal axis), and a signal axis (orthogonal to the time axis t, preferably a vertical axis), which represents an average signal strength or a signal amplitude of the audio data at each specific playback time. A playhead 58 may be provided which indicates the current playing position. During playback of the audio data, the playhead 58 is moving with respect to the waveform along the time axis t by visually moving either the waveform or the playhead or both.

[0165] FIG. 12 schematically shows the processing steps to arrive at the novel graphical representation of the invention. Mixed input data 60, for example song A, is received and decomposed to obtain first decomposed track 61-1, for example a decomposed vocal track, and second decomposed track 61-2, for example a decomposed instrumental track. First and second decomposed tracks 61-1 and 61-2 may be complementary tracks such that their sum corresponds to the mixed input data 60.

[0166] Actually displayed is then an overlay waveform 64 which is an overlaid representation of the first and second decomposed tracks 61-1, 61-2 using one single baseline for the waveforms of both decomposed tracks, which means that the time axes t of both waveforms are not running parallel to each other in a distance but are identical to form one common line. In order to allow a differentiation between both waveforms, they are displayed using different drawing styles. For example one of the two waveforms of the decomposed tracks may be displayed in a different color than the other waveform. In the example shown in FIG. 12, for one of the waveforms of the decomposed tracks, here the decomposed vocal track 61-1, only positive signal portions are displayed, while negative signal portions are left out, whereas for the waveform of the other of the decomposed tracks, here the decomposed instrumental track 61-2, only negative signal portions are displayed, while positive signal portions are left out. Alternatively the waveforms could be drawn using differently scaled signal axes or by using different drawing styles such as to allow the waveforms to be distinguished from one another. As an example of different drawing styles, one of the waveforms could be drawn as a dashed or a dotted line, or of different color, or of different opacity or transparency or any combination thereof.
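
The overlay with positive-only and negative-only signal portions may be sketched as follows. For each display column the sketch returns the positive peak of the vocal track (drawn above the common baseline) and the negative peak of the instrumental track (drawn below it); the peak-per-column reduction and the function name are assumptions for illustration.

```python
def overlay_columns(vocal, instrumental, columns):
    """Return one (top, bottom) pair per display column.

    top    >= 0: positive peak of the vocal track in that column
    bottom <= 0: negative peak of the instrumental track in that column
    """
    n = min(len(vocal), len(instrumental))
    result = []
    for c in range(columns):
        lo = c * n // columns
        hi = max(lo + 1, (c + 1) * n // columns)
        top = max(0.0, max(vocal[lo:hi]))
        bottom = min(0.0, min(instrumental[lo:hi]))
        result.append((top, bottom))
    return result
```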

[0167] In another example shown in FIG. 13, one of the waveforms of the decomposed tracks, here the waveform of the decomposed vocal track 61-1, is displayed with a signal axis scaled differently, here smaller, than that of the waveform of the other decomposed track, here the decomposed instrumental track 61-2. In addition, the waveforms may be displayed with different colors.

[0168] Waveforms of decomposed tracks are preferably displayed such as to represent the settings of the control elements of the recompose controlling section and/or the settings of the recombination unit such as to provide a feedback to the user about the signal volumes assigned to the respective decomposed tracks. Preferably, at the same time as a user is manipulating one of the control elements to increase or decrease the volume of at least one decomposed track, the associated waveform of this decomposed track is displayed with an increasing or decreasing size with regard to its signal axis, or visually faded in or out. This graphical feedback is preferably immediate, thus with a delay time which is not disturbing or even not recognizable for the user, in particular a delay time below 500 milliseconds, preferably below 35 milliseconds such that it is not noticeable to the eye at a frame rate of 30 frames per second. Such display greatly assists operation of the device during live performance.
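As a rough illustration of the volume-dependent display described above, the following sketch scales the drawn height of a waveform column by the current volume level and relates a feedback delay to the frame period at 30 frames per second. The names `scaled_height`, `frames_of_delay`, and `FRAME_PERIOD_MS`, as well as the linear scaling, are assumptions and not part of the embodiment.

```python
FRAME_PERIOD_MS = 1000.0 / 30.0  # about 33.3 ms per frame at 30 frames per second


def scaled_height(peak: float, volume: float, max_pixels: int) -> int:
    """Pixel height of a waveform column for a peak in [0, 1].

    The height is scaled linearly by the volume level in [0, 1]
    (linear scaling is an illustrative assumption).
    """
    volume = min(1.0, max(0.0, volume))
    return round(abs(peak) * volume * max_pixels)


def frames_of_delay(delay_ms: float) -> float:
    """Express a feedback delay as a number of frames at 30 frames per second."""
    return delay_ms / FRAME_PERIOD_MS


print(scaled_height(0.5, 0.5, 100))  # half peak at half volume
print(frames_of_delay(35.0))         # a 35 ms delay is roughly one frame
```

Under these assumptions, a delay of 35 milliseconds corresponds to approximately one frame period, whereas a delay of 500 milliseconds spans about fifteen frames.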

[0169] FIG. 14 shows a ninth embodiment of a device 10 of the present invention, which is an audio player including a recompose controlling section 24 having a control element 26-13 for controlling the first and second volume levels of respective first and second decomposed tracks (here decomposed vocal track and decomposed instrumental track) obtained from one audio file, and optionally a display region 66 displaying an overlaid representation of the first and second decomposed tracks. The device 10 of FIG. 14 may be adapted to play audio files one after another, for example from a playlist or based on individual user selection, and might have an input unit for receiving audio files via streaming from an audio streaming service, and may thus be adapted to play only one audio file at a time for most of the time (apart from optional crossfading effects at a transition from the end of one song to the beginning of the next song). The user can start or stop playback by operation of a play control element 30 and/or can change the playback position by moving the playhead along the time axis.

[0170] Through the control element 26-13 the user may control playback of a song such as to hear only the decomposed vocal track or only the decomposed instrumental track or a recombination of both tracks. Such configuration might be useful for a karaoke application or a play-along application, for example. Preferably, device 10 is a computer or a mobile device, such as a smartphone or tablet, which runs a suitable software application to realize the above-described functionalities.

[0171] FIG. 15 shows a tenth embodiment of the present invention which comprises separate ON-OFF buttons 26-14 to 26-17 for each of the first to fourth decomposed tracks, in particular the first decomposed vocal track, the first decomposed instrumental track, the second decomposed vocal track and the second decomposed instrumental track, respectively. By operating one of the buttons, the volume of the respective decomposed track is switched between 0 and 100 percent or vice versa.

[0172] FIG. 16 shows an eleventh embodiment of the present invention which comprises separate faders 26-18 to 26-21 for each of the first to fourth decomposed tracks, in particular the first decomposed vocal track, the first decomposed instrumental track, the second decomposed vocal track and the second decomposed instrumental track, respectively. By operating one of the faders, the volume of the respective decomposed track is continuously changed between 0 and 100 percent or vice versa.
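The per-track controls of FIGS. 15 and 16 can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `button_gain`, `fader_gain`, and `recombine` are hypothetical, tracks are modeled as plain lists of samples, and the recombination is assumed to be a gain-weighted sample-wise sum.

```python
from typing import List, Sequence


def button_gain(is_on: bool) -> float:
    """ON-OFF button: the track volume is either 100 percent or 0 percent."""
    return 1.0 if is_on else 0.0


def fader_gain(percent: float) -> float:
    """Fader: the track volume varies continuously between 0 and 100 percent."""
    return min(100.0, max(0.0, percent)) / 100.0


def recombine(tracks: Sequence[Sequence[float]],
              gains: Sequence[float]) -> List[float]:
    """Sum the tracks sample by sample, each weighted by its gain (assumed model)."""
    if len(tracks) != len(gains):
        raise ValueError("one gain per track is required")
    if not tracks:
        return []
    length = min(len(t) for t in tracks)
    return [sum(g * t[i] for t, g in zip(tracks, gains)) for i in range(length)]


# Four decomposed tracks: vocal A, instrumental A, vocal B, instrumental B.
tracks = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0], [1000.0, 2000.0]]
gains = [button_gain(True), button_gain(False), fader_gain(50), fader_gain(150)]
print(recombine(tracks, gains))
```

In this sketch, a fader value outside the range of 0 to 100 percent is clamped, so that a button behaves like a fader restricted to its two end positions.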

[0173] A twelfth embodiment of the present invention will be described in the following with reference to FIGS. 17 and 18. The twelfth embodiment is a modification of the first to eleventh embodiments and may therefore comprise any or all of the above-mentioned features and advantages of any of the first to eleventh embodiments unless otherwise described in the following.

[0174] A device 110 of the twelfth embodiment comprises an input unit with a first input section to receive and/or provide an input audio file A, for example a first song A, and preferably a second input section for receiving or providing a second input audio file B, for example a second song B. The first input audio file may be decoded or decompressed if provided in encoded or compressed format, and may be partitioned into segments in a first segmentation unit 142 in a same or corresponding manner as described above for the first embodiment.

[0175] The input audio file A (or its segments) is then transferred to a first AI system 144 capable of separating the audio data into at least four decomposed tracks, i.e. a drum track D1, a bass track D2, a vocal track D3, and a complement track D4. The drum track D1 contains components of the input audio file A which have a drum timbre, the bass track D2 contains components of the input audio file A that have a bass timbre, the vocal track D3 contains components of the input audio file A that have a vocal timbre and the complement track D4 is a remainder of the input audio file A, which means that a mixture of the drum track D1, the bass track D2, the vocal track D3, and the complement track D4 will result in an audio signal substantially equal to that of the input audio file A. In modifications of this embodiment, the AI system 144 may be configured and trained to separate from the input audio file A decomposed tracks D1 to D3 of any other timbres.
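The relationship between the complement track D4 and the other decomposed tracks can be illustrated by the following sketch. The embodiment does not state how D4 is computed; obtaining it by subtracting D1 to D3 from the mixed input is an assumption made here for illustration only, and the function name `complement_track` is hypothetical.

```python
from typing import List, Sequence


def complement_track(mixed: Sequence[float],
                     separated: Sequence[Sequence[float]]) -> List[float]:
    """Return the remainder of the mixed signal after removing the separated tracks.

    Assumption: the complement is the sample-wise residual, so that the sum
    of all decomposed tracks reproduces the mixed input.
    """
    length = min([len(mixed)] + [len(t) for t in separated])
    return [mixed[i] - sum(t[i] for t in separated) for i in range(length)]


mixed = [1.0, 0.5, -0.25]    # input audio file A
drums = [0.5, 0.0, 0.0]      # D1
bass = [0.25, 0.25, 0.0]     # D2
vocals = [0.0, 0.25, -0.25]  # D3
d4 = complement_track(mixed, [drums, bass, vocals])
print(d4)

# Mixing D1 to D4 again yields the input signal.
remixed = [sum(s) for s in zip(drums, bass, vocals, d4)]
print(remixed == mixed)
```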

[0176] Decomposed tracks D1 to D4 are routed to recombination unit 132 which is configured to recombine selected tracks out of the decomposed tracks D1 to D4 according to user settings and/or a user control input. In particular, recombination unit 132 may comprise a first recombination section 132a which receives the individual decomposed tracks D1 to D4 as an input and outputs two tracks S1, obtained from passing through one of the decomposed tracks D1 to D4, and S2, obtained from grouping selected tracks out of D1 to D4. The selection of decomposed tracks and the respective grouping of the decomposed tracks D1 to D4 may be controlled by a mode control unit 145.

[0177] In the example shown in FIGS. 17 and 18, mode control unit 145 may selectively be set into a first operational mode shown in FIG. 17 or a second operational mode shown in FIG. 18. In the first operational mode, the first recombination section 132a is configured such that the drum track D1 is routed to the first track S1, i.e. S1 equals D1, whereas bass track D2, vocal track D3, and complement track D4 are selected and recombined into a single track, i. e. the second track S2. In other words, in the first operational mode D2, D3, and D4 are grouped to form a single track S2 and D1 is passed through such as to form track S1. On the other hand, in the second operational mode shown in FIG. 18, the first recombination section 132a is configured such that drum track D1, bass track D2, and complement track D4 are selected for a recombination, i. e. grouped to form a single track S2, whereas vocal track D3 is routed to track S1 alone.
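The routing performed by the first recombination section 132a in the two operational modes can be sketched as follows. The names `Mode`, `PASS_THROUGH`, and `first_recombination` are hypothetical, and grouping is assumed to be a plain sample-wise sum of the selected tracks at equal gain.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Mode(Enum):
    FIRST = 1   # FIG. 17: drum track D1 is passed through to S1
    SECOND = 2  # FIG. 18: vocal track D3 is passed through to S1


# Which decomposed track is routed alone to S1 in each operational mode.
PASS_THROUGH = {Mode.FIRST: "D1", Mode.SECOND: "D3"}


def first_recombination(tracks: Dict[str, List[float]],
                        mode: Mode) -> Tuple[List[float], List[float]]:
    """Return (S1, S2): S1 is one track passed through, S2 groups the others.

    Grouping as an equal-gain sum is an illustrative assumption.
    """
    solo = PASS_THROUGH[mode]
    s1 = list(tracks[solo])
    grouped = [t for name, t in tracks.items() if name != solo]
    s2 = [sum(samples) for samples in zip(*grouped)]
    return s1, s2


tracks = {"D1": [1.0, 1.0], "D2": [2.0, 2.0], "D3": [4.0, 4.0], "D4": [8.0, 8.0]}
print(first_recombination(tracks, Mode.FIRST))   # S1 = D1, S2 = D2 + D3 + D4
print(first_recombination(tracks, Mode.SECOND))  # S1 = D3, S2 = D1 + D2 + D4
```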

[0178] Mode control unit 145 may comprise a mode control element (such as a genre button or genre switch) to be operated by a user to selectively switch between the first operational mode and the second operational mode. The first operational mode may for example be used primarily for electronic music (i.e. usually without vocals), while the second operational mode may be used for music usually containing vocals such as Hip-hop or Pop.

[0179] Tracks S1 and S2 are then routed to a second recombination section 132b which contains a single control element 126-1 controllable by a user to control a first volume level to be associated with the first track S1 and a second volume level to be associated with the second track S2. Preferably, control element 126-1 is operable by a user in a single control operation, for example as a crossfader between the first volume level and the second volume level, i.e. such as to change a ratio between the first and second volume levels. In particular, the single control element 126-1 may be configured to have a control range, wherein at least in part of the control range volume changes of the first and second volume levels are performed simultaneously, for example by increasing one of the first and second volume levels, and/or decreasing the other of the first and second volume levels. Preferably, the single control element 126-1 may have a control range extending from a first end point at which the first volume level has a maximum value and the second volume level has a minimum value, to a second end point at which the first volume level has a minimum value and the second volume level has a maximum value. In the middle region of the control range the first and second volume levels may both have a maximum value.

[0180] Preferably, the single control element 126-1 is a single rotatable knob or a single fader element. Based on the settings of the first and second volume levels as input by the user through the control element 126-1, the second recombination section 132b recombines the first track S1 and the second track S2 in order to obtain a second recombined track A′ routed towards an audio interface 134 for playback.
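One possible mapping from the position of the single control element 126-1 to the first and second volume levels is sketched below. The piecewise-linear curve, the normalization of the position to the range 0 to 1, and the names `crossfade_levels` and `second_recombination` are assumptions; the embodiment only specifies the behavior at the end points and in the middle region.

```python
from typing import List, Sequence, Tuple


def crossfade_levels(position: float) -> Tuple[float, float]:
    """Map a control position in [0, 1] to (first_level, second_level).

    position 0.0: first level maximal, second level minimal.
    position 0.5: both levels maximal (middle of the control range).
    position 1.0: first level minimal, second level maximal.
    The linear ramps in between are an illustrative assumption.
    """
    p = min(1.0, max(0.0, position))
    first = min(1.0, 2.0 * (1.0 - p))
    second = min(1.0, 2.0 * p)
    return first, second


def second_recombination(s1: Sequence[float], s2: Sequence[float],
                         position: float) -> List[float]:
    """Recombine S1 and S2 at the levels set by the single control element."""
    first, second = crossfade_levels(position)
    return [first * a + second * b for a, b in zip(s1, s2)]


for pos in (0.0, 0.25, 0.5, 1.0):
    print(pos, crossfade_levels(pos))
```

With this curve, moving the control element from the middle towards either end point lowers only one of the two volume levels, so that either S1 or S2 can be faded out in a single control operation.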

[0181] Tracks S1 and S2 may further be routed to a visualization unit 149-1 for visualization of their waveforms on a display or the like, as described above for the visualization units 49-1 and 49-2 in the previous embodiments.

[0182] The second input audio file B may be processed in a similar manner as the first input audio file A, for example in a second decomposition unit 140 which may comprise a second AI system. Decomposed tracks obtained from the second decomposition unit 140 may then be routed through the recombination unit 132 and recombined therein in groups or individually in the same or corresponding manner as described above for the first input audio file A. A recombined track B′ obtained in this manner from the second input audio file B may then be recombined/mixed with the recombined track A′ obtained from the first input audio file A, in particular within a further mixing stage controlled by a mix control element 128 in the manner described in more detail above for the first to eleventh embodiments. The output of this mixing stage may then be routed to audio interface 134 for playback.

[0183] Aspects and embodiments of the present invention can further be described by the following items: [0184] Item 1: Method for processing and playing audio data, comprising the steps of: [0185] a) receiving mixed input data, said mixed input data being a sum signal obtained from mixing at least one first source track with at least one second source track, [0186] b) decomposing the mixed input data to obtain at least a first decomposed track resembling the at least one first source track, [0187] c) generating output data based on the first decomposed track, [0188] d) playing the output data through an audio output. [0189] Item 2: Method of item 1, further comprising the following steps: [0190] reading a control input from a user, said control input representing a desired setting of a first volume level of the first decomposed track and a second volume level of a second track, and [0191] recombining at least the first decomposed track at the first volume level with the second track at the second volume level to generate recombined output data, [0192] playing the recombined output data. [0193] Item 3: Method of item 2, [0194] wherein the second track is obtained in the step of decomposing the mixed input data and forms a second decomposed track resembling the at least one second source track. [0195] Item 4: Method of at least one of the preceding items, wherein decomposing the mixed input data is carried out segment-wise, wherein decomposing is carried out based on a first segment of the mixed input data such as to obtain a first segment of output data, and wherein decomposing of a second segment of the mixed input data is carried out while playing the first segment of output data. [0196] Item 5: Method of at least one of the preceding items, wherein the method steps, in particular steps (a) to (d), are carried out in a continuous process. 
[0197] Item 6: Method of at least one of the preceding items, wherein the mixed input data are received via streaming from a remote server, preferably through the internet. [0198] Item 7: Method of at least one of the preceding items, [0199] wherein in step (a) an input audio file having a predetermined file size and a predetermined playback duration is received, which contains audio data to play the mixed input data, and a first segment is extracted from the input audio file, which contains audio data to play the mixed input data within a first time interval smaller than the predetermined playback duration, [0200] wherein in step (b) the first segment of the input audio file is decomposed to obtain a first segment of the first decomposed track and optionally a first segment of the second decomposed track, [0201] wherein in step (c) a first segment of the output data is generated from the first segment of the first decomposed track, preferably by recombining at least the first segment of the first decomposed track at the first volume level with the first segment of the second decomposed track at the second volume level, and [0202] wherein the method further comprises the steps of: [0203] a2. extracting a second segment from the input audio file, which is different from the first segment and which contains audio data to play the mixed input data within a second time interval smaller than the predetermined playback duration of the input audio file and shifted in time with respect to the first time interval, [0204] b2. decomposing the second segment of the input audio file to obtain a second segment of the first decomposed track and optionally a second segment of the second decomposed track, [0205] optionally c2. 
recombining at least the second segment of the first decomposed track at the first volume level with the second segment of the second decomposed track at the second volume level to generate a second segment of the recombined output data, wherein at least one of the steps (a2), (b2) and (c2) is performed while playing the first segment of the output data, and wherein generation of the second segment of the output data is completed before playing the first segment of the output data is completed. [0206] Item 8: Method of at least one of items 4 to 7, wherein the size of the first segment or the length of the first time interval is set such that the time required for decomposing the first segment is smaller than 2 seconds, preferably smaller than 150 milliseconds, most preferably smaller than 50 milliseconds. [0207] Item 9: Method of at least one of the preceding items, comprising the steps of [0208] receiving an input audio file having a predetermined file size and a predetermined playback duration, which contains audio data to play the mixed input data, [0209] partitioning the input audio file into a plurality of segments in succession, which contain audio data to play the mixed input data within a plurality of time intervals following each other, [0210] receiving a play position command from a user representing a user's command to play the input audio file from a certain start play position, [0211] identifying a first segment out of the plurality of segments such that the start play position is within the time interval which corresponds to the first segment, [0212] decomposing the first segment of the input audio file to obtain a first segment of the first decomposed track and optionally a first segment of the second decomposed track, [0213] generating a first segment of the output data based on the first segment of the first decomposed track, preferably by recombining at least the first segment of the first decomposed track at the first volume level with the first 
segment of the second decomposed track at the second volume level, and [0214] playing the first segment of the output data starting at the start play position, which is a play position later than or equal to the start of the time interval of the first segment. [0215] Item 10: Method of at least one of the preceding items, wherein the mixed input data are first mixed input data being a sum signal obtained from mixing at least a first source track with a second source track and wherein the method further comprises the steps of [0216] receiving second mixed input data, said second mixed input data being a sum signal obtained from mixing at least one third source track with at least one fourth source track, [0217] decomposing the second mixed input data to obtain a third decomposed track resembling the at least one third source track, and a fourth decomposed track resembling the at least one fourth source track, [0218] wherein in the step of reading the control input from a user, said control input represents a desired setting of the first volume level of the first decomposed track, the second volume level of the second decomposed track, a third volume level of the third decomposed track, and a fourth volume level of the fourth decomposed track, and [0219] wherein, in the step of recombining, the recombined output data is generated by recombining the first decomposed track at the first volume level, the second decomposed track at the second volume level, the third decomposed track at the third volume level and the fourth decomposed track at the fourth volume level. [0220] Item 11: Method of at least one of the preceding items, wherein at least one, preferably all of the mixed input data and the decomposed track signals represent stereo signals, each comprising a left-channel signal portion and a right-channel signal portion, respectively. 
[0221] Item 12: Method of at least one of the preceding items, wherein decomposing the mixed input data includes processing the mixed input data by an AI system, said AI system preferably being trained by a plurality of sets of training audio data, wherein each set of training audio data at least includes a first source track and a mixed track being a sum signal obtained from mixing at least the first source track or a track that resembles the first source track, with a second source track. [0222] Item 13: Method of at least one of the preceding items, wherein the mixed input data are processed within a first AI system and a second AI system separate from the first AI system, wherein the first AI system processes the mixed input data to obtain only the first decomposed track and the second AI system processes the mixed input data to obtain only the/a second decomposed track, [0223] wherein the method is preferably processing the mixed input data as a first mixed input data and is further processing a second mixed input data within a third AI system separate from the first and the second AI system, and within a fourth AI system separate from each of the first to third AI systems, wherein the third AI system processes the second mixed input data to obtain only the third decomposed track and the fourth AI system processes the second mixed input data to obtain only the fourth decomposed track. 
[0224] Item 14: Method of at least one of the preceding items, wherein said mixed input data are first mixed input data based on a periodic beat structure and wherein the method further comprises: [0225] receiving second mixed input data different from the first mixed input data and having a periodic beat signal, [0226] performing at least one of a tempo matching processing and a key matching processing, [0227] wherein the tempo matching processing comprises: receiving first input data obtained from the first mixed input data and second input data obtained from the second mixed input data, time stretching or resampling of at least one of the first input data and the second input data, and outputting first output data and second output data which have mutually matching tempos, [0228] wherein the key matching processing comprises: receiving first input data obtained from the first mixed input data and second input data obtained from the second mixed input data, pitch shifting of at least one of the first input data and the second input audio data, and outputting first output data and second output data which have mutually matching keys. [0229] Item 15: Device (10) for processing and playing audio data, preferably DJ equipment, comprising [0230] an audio input unit for receiving mixed input data (A, B), said mixed input data being a sum signal obtained from mixing at least a first source track with at least a second source track, [0231] a decomposing unit (38, 40) connected to the audio input unit for decomposing the mixed input data to obtain at least a first decomposed track resembling the first source track, and [0232] a playing unit (34, 36) for playing output data based on the first decomposed track. [0233] Item 16: Device (10) of item 15, further comprising a recombination unit (32) for recombining at least the first decomposed track with a second track to generate the output data for the playing unit. 
[0234] Item 17: Device (10) of item 15 or item 16, further comprising a recompose controlling section (24) adapted to be controlled by a user to generate a control input representing a desired setting of a first volume level of the first decomposed track and a second volume level of the second track, wherein the recombination unit (32) is configured to recombine at least the first decomposed track at the first volume level with the second track at the second volume level to generate the output data. [0235] Item 18: Device (10) of at least one of items 15 to 17, wherein the audio input unit is a first audio input unit for receiving first mixed input data (A) being a sum signal obtained from mixing at least a first source track with a second source track, and the decomposing unit is a first decomposing unit (38), and [0236] wherein the device further comprises: [0237] a second audio input unit for receiving second mixed input data (B), said second mixed input data being a sum signal obtained from mixing at least a third source track with a fourth source track, [0238] a second decomposing unit (40) connected to the second audio input unit for decomposing the second mixed input data (B) to obtain a third decomposed track resembling the third source track and a fourth decomposed track resembling the fourth source track, [0239] wherein the recompose controlling section (24) is adapted to be controlled by a user to generate a control input representing a desired setting of the first volume level of the first decomposed track, the second volume level of the second decomposed track, a third volume level of the third decomposed track, and a fourth volume level of the fourth decomposed track, and [0240] wherein the recombination unit (32) is adapted to generate the recombined output data by recombining the first decomposed track at the first volume level, the second decomposed track at the second volume level, the third decomposed track at the third volume level and the 
fourth decomposed track at the fourth volume level. [0241] Item 19: Device (10) of at least one of items 15 to 18, wherein the recompose controlling section (24) comprises at least one single recompose control element (26-1, 26-2) which is operable by a user in a single control operation for controlling the first volume level and the second volume level, preferably (1) increasing one of the first volume level and the second volume level, while at the same time decreasing the other one of the first volume level and the second volume level, or (2) changing a ratio between the first volume level and the second volume level from at least a value smaller than 1 to at least a value greater than 1 or vice versa. [0242] Item 20: Device (10) of at least one of items 15 to 19, wherein the recompose controlling section (24) comprises [0243] a first single recompose control element (26-1) which is operable by a user in a single control operation for controlling the first volume level and the second volume level, preferably (1) increasing one of the first volume level and the second volume level, while at the same time decreasing the other one of the first volume level and the second volume level, or (2) changing a ratio between the first volume level and the second volume level from at least a value smaller than 1 to at least a value greater than 1 or vice versa, and [0244] a single mix control element (28), which is operable by a user in a single control operation for controlling a first sum signal and a second sum signal, preferably (1) increasing one of the first sum signal and the second sum signal, while at the same time decreasing the other one of the first sum signal and the second sum signal, or (2) changing a ratio between a volume level of the first sum signal and a volume level of the second sum signal from at least a value smaller than 1 to at least a value greater than 1 or vice versa, [0245] the first sum signal being a sum of the first decomposed track at the 
first volume level and the second decomposed track at the second volume level and the second sum signal being a sum of the third decomposed track at the third volume level and the fourth decomposed track at the fourth volume level, and [0246] preferably a second single recompose control element (26-2) which is operable by a user in a single control operation for controlling the third volume level and the fourth volume level, preferably (1) increasing one of the third volume level and the fourth volume level, and/or decreasing the other one of the third volume level and the fourth volume level, or (2) changing a ratio between the third volume level and the fourth volume level from at least a value smaller than 1 to at least a value greater than 1 or vice versa. [0247] Item 21: Device (10) of at least one of items 15 to 20, further comprising [0248] an input audio file buffer for loading therein segments of an input audio file having a predetermined file size and a predetermined playback duration, which contains audio data to play the mixed input data, [0249] a first segment buffer connected to the decomposing unit to receive and store a segment of the first decomposed track obtained from a segment of the input audio file, [0250] a second segment buffer connected to the decomposing unit to receive and store a segment of the second decomposed track obtained from the same segment of the input audio file, [0251] wherein the playing unit (34, 36) comprises an audio interface having a digital-to-analog converter to generate an analog audio signal from the recombined output data, said audio interface having an audio buffer for buffering portions of the output data for playing, [0252] wherein the size of the first segment buffer and/or the second segment buffer is larger than the size of the audio buffer of the audio interface, but smaller than the input audio file data or the predetermined file size of the input audio file. 
[0253] Item 22: Device (10) of at least one of items 15 to 21, wherein the audio input unit is a first audio input unit for receiving first mixed input data based on a periodic beat structure, and wherein the device further comprises: [0254] a second audio input unit for receiving second mixed input data (B) different from the first mixed input data (A) and based on a periodic beat signal, [0255] at least one of a tempo matching unit and a key matching unit, [0256] wherein the tempo matching unit is arranged to receive a first input data obtained from the first mixed input data and second input data obtained from the second mixed input data, and wherein the tempo matching unit comprises a time stretching unit adapted to time stretch at least one of the first input data and the second input audio data, and to output first output data and second output data which have mutually matching tempos, and/or [0257] wherein the key matching unit is arranged to receive a first input data obtained from the first mixed input data and second input data obtained from the second mixed input data, and wherein the key matching unit comprises a pitch shifting unit adapted to pitch shift at least one of the first input data and the second input audio data, and to output first output data and second output data which have mutually matching keys. [0258] Item 23: Device (10) of at least one of items 15 to 22, wherein all of its components, in particular the audio input unit, the decomposing unit and the playing unit, are integrated within a single unit or within a number of local units connected to one another via a local network or via peripheral cable connections or via nearfield wireless connection. 
[0259] Item 24: Method for representing audio data, said audio data comprising at least a first track and a second track, which are components of a joint audio mix, said method comprising displaying a first waveform representative of the first track and displaying a second waveform representative of the second track, wherein the first waveform and the second waveform are displayed in an overlaid manner using one single baseline, and wherein the waveforms are displayed using different signal axes and/or different drawing styles such as to be visually distinguishable from one another. [0260] Item 25: Method of item 24, wherein the first waveform is displayed using a first drawing style which draws signal portions of the first waveform primarily or exclusively in a positive region relative to a baseline, and the second waveform is displayed using a second drawing style which draws signal portions of the second waveform primarily or exclusively in a negative region relative to the same baseline, wherein, preferably, the first waveform is displayed using a first drawing style which draws primarily or exclusively a positive signal portion of the first track, and the second waveform is displayed using a second drawing style which draws primarily or exclusively a negative signal portion of the second track. [0261] Item 26: Method of item 24 or item 25, wherein the first and second waveforms are displayed using first and second drawing styles which both draw primarily or exclusively the positive signal portion or which both draw primarily or exclusively the negative signal portion, and wherein the first waveform is displayed using a first signal axis and the second waveform is displayed using a second signal axis that runs opposite to the first signal axis. 
[0262] Item 27: Method of at least one of items 24 to 26, wherein the first waveform and/or the second waveform is displayed by rendering the waveform in a predetermined time interval with a color that depends on a frequency information of the respective track within the predetermined time interval, said frequency information preferably being indicative of a dominant frequency of the audio data over the predetermined time interval, which is preferably obtained from a frequency analysis of an audio signal derived from the audio data of the respective track within the predetermined time interval.
[0263] Item 28: Method of at least one of items 24 to 27 and preferably one of items 1 to 14, comprising the steps of
[0264] receiving mixed input data, said mixed input data being a sum signal obtained from mixing at least one first source track with at least one second source track,
[0265] decomposing the mixed input data to obtain at least a first decomposed track resembling the at least one first source track, and a second decomposed track resembling the at least one second source track,
[0266] reading a control input from a user, said control input representing a desired setting of a first volume level of the first decomposed track and a second volume level of the second decomposed track,
[0267] displaying a first waveform representative of the first decomposed track and displaying a second waveform representative of the second decomposed track, wherein the first waveform and the second waveform are displayed in an overlaid manner using one single baseline, and wherein the waveforms are displayed using different signal axes and/or different drawing styles such as to be visually distinguishable from one another,
[0268] wherein the first waveform is displayed with its signal axis being scaled or its appearance being modified depending on the first volume level, wherein the second waveform is displayed with its signal axis being scaled or its appearance being modified depending on the second volume level.
[0269] Item 29: Method of item 28, wherein the first and second waveforms are displayed with their signal axes being scaled on the basis of current values of the first and second volume levels within a time period of not more than 2 seconds, preferably not more than 100 milliseconds, more preferably not more than 35 milliseconds.
[0270] Item 30: Device (10) for processing and playing audio data, preferably DJ equipment, comprising
[0271] a processing unit for processing audio data of at least a first track and a second track,
[0272] a controlling section adapted to be controlled by a user to generate a control input representing a desired setting of a first volume level of the first track and a second volume level of the second track,
[0273] a recombination/mixing unit configured to combine the first track at a first volume level with the second track at a second volume level to generate output data,
[0274] a visualization unit configured to generate waveform data for visualizing at least one waveform based on the first track, the second track and the control input,
[0275] a playing unit (34, 36) for playing audio data based on the output data, and
[0276] optionally a display unit for displaying the waveform data.
[0277] Item 31: Device of item 30, wherein the visualization unit is configured to generate a first waveform based on the first track, wherein a scaling of a signal axis or a drawing style of the first waveform is set depending on the first volume level, and/or to generate a second waveform based on the second track, wherein a scaling of a signal axis or a drawing style of the second waveform is set depending on the second volume level.
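As a hedged illustration of Items 27, 28 and 31, the following sketch derives one display column per time interval of a track: the column height is the peak amplitude scaled by the track's volume level, and the column color depends on the dominant frequency found by a frequency analysis of that interval. The naive discrete Fourier transform, the logarithmic frequency-to-hue mapping, and all function names are assumptions chosen for brevity; a practical implementation would likely use an FFT library.

```python
import cmath
import colorsys
import math
from typing import List, Sequence, Tuple

Rgb = Tuple[float, float, float]


def dominant_frequency(window: Sequence[float], sample_rate: float) -> float:
    """Frequency in Hz of the strongest non-DC bin, using a naive O(N^2) DFT."""
    n = len(window)
    if n < 2:
        return 0.0
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(window))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def frequency_to_rgb(freq_hz: float, lo: float = 20.0, hi: float = 20000.0) -> Rgb:
    """Assumed mapping: log-scaled frequency to hue, low = red, high = violet."""
    f = min(max(freq_hz, lo), hi)
    position = math.log(f / lo) / math.log(hi / lo)  # 0.0 at lo, 1.0 at hi
    return colorsys.hsv_to_rgb(0.75 * position, 1.0, 1.0)


def render_columns(track: Sequence[float], volume: float,
                   sample_rate: float, window: int) -> List[Tuple[float, Rgb]]:
    """Return (height, color) per full window; height is peak amplitude * volume."""
    if window <= 0:
        return []
    columns = []
    for start in range(0, len(track) - window + 1, window):
        chunk = track[start:start + window]
        height = volume * max(abs(x) for x in chunk)
        color = frequency_to_rgb(dominant_frequency(chunk, sample_rate))
        columns.append((height, color))
    return columns
```

Because the volume level enters only as a multiplication of the column height, the columns can be recomputed cheaply whenever the control input changes, which is one way the short update intervals of Item 29 might be approached.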
[0278] Item 32: Device of item 30 or item 31, wherein the visualization unit is configured to calculate a combination track representing a combination of at least the first track at the first volume level and the second track at the second volume level, and to generate the waveform data such as to visualize the waveform of the combination track.
[0279] Item 33: Device of at least one of items 30 to 32, wherein the device is configured to allow waveform data based on a particular control input to be generated and displayed on the display within a time period of not more than 2 seconds, preferably not more than 100 milliseconds, more preferably not more than 35 milliseconds, after the particular control input is generated by the user.
[0280] Item 34: Device of at least one of items 30 to 33, wherein the visualization unit is configured to generate waveform data for visualizing a first waveform based on the first track and the control input, and a second waveform based on the second track and the control input, and wherein the waveform data are generated such as to display the first waveform and the second waveform in an overlaid manner using one single baseline, but different signal axes and/or different drawing styles such as to be visually distinguishable from one another.
[0281] Item 35: Device of at least one of items 30 to 34, wherein the device is adapted to carry out the method of at least one of items 1 to 14, and/or is a device according to at least one of items 15 to 23, wherein the first track is preferably the first decomposed track and/or the second track is preferably the second decomposed track.
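The combination track of Item 32 can be illustrated, purely as a sketch, by a sample-wise weighted sum of the two tracks followed by a peak reduction for display. It is assumed here that both tracks are time-aligned sequences of samples at the same sample rate; `combine_tracks` and `waveform_peaks` are hypothetical names.

```python
from typing import List, Sequence


def combine_tracks(first: Sequence[float], second: Sequence[float],
                   first_volume: float, second_volume: float) -> List[float]:
    """Sample-wise sum of the first track at first_volume and the second at second_volume."""
    return [first_volume * a + second_volume * b for a, b in zip(first, second)]


def waveform_peaks(samples: Sequence[float], bins: int) -> List[float]:
    """Peak absolute amplitude in each of up to `bins` equal time slots.

    Samples left over after the last full slot are ignored.
    """
    if bins <= 0 or not samples:
        return []
    size = max(1, len(samples) // bins)
    stop = min(len(samples), size * bins)
    return [max(abs(x) for x in samples[i:i + size]) for i in range(0, stop, size)]


def combination_waveform(first: Sequence[float], second: Sequence[float],
                         first_volume: float, second_volume: float,
                         bins: int) -> List[float]:
    """Waveform data visualizing the combination track at the current volume levels."""
    return waveform_peaks(
        combine_tracks(first, second, first_volume, second_volume), bins)
```

In this sketch the displayed waveform reflects what is actually heard: lowering one volume level reduces that track's contribution to the drawn peaks.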
[0282] Item 36: Device for processing and playing audio data, preferably according to at least one of the items 15 to 23 and 30 to 35 and/or device configured to carry out a method of at least one of items 1 to 14 and 24 to 29, the device comprising:
[0283] an audio input unit for receiving a first track and a second track, said first track being a component of an audio mix track,
[0284] a controlling section (24) adapted to be controlled by a user to generate a control input representing a desired setting of a first volume level of the first track and a second volume level of the second track,
[0285] a playing unit (34, 36) for playing output data based on the first track at the first volume level and the second track at the second volume level,
[0286] wherein the controlling section comprises at least one single control element (26-1, 26-2) which is operable by a user in a single control operation for controlling the first volume level and the second volume level, in particular changing a ratio between the first volume level and the second volume level from at least a value smaller than 1 to at least a value greater than 1 or vice versa.
[0287] Item 37: Device of item 36, wherein the first track and the second track are components of the same audio mix track, wherein preferably the first track is a vocal track and the second track is a corresponding instrumental track.
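A minimal sketch of the single control element of Item 36 is given below, assuming the element reports a position between 0 and 1 and that a linear law maps this position to the two volume levels. The linear law and the names `volumes_from_control` and `mix` are assumptions; other laws, such as an equal-power curve, would serve equally well.

```python
from typing import List, Sequence, Tuple


def volumes_from_control(position: float) -> Tuple[float, float]:
    """Map one control position in [0, 1] to (first_volume, second_volume).

    Assumed linear law: 0.0 plays only the first track, 1.0 only the second.
    Positions outside [0, 1] are clamped.
    """
    p = min(max(position, 0.0), 1.0)
    return 1.0 - p, p


def mix(first: Sequence[float], second: Sequence[float],
        position: float) -> List[float]:
    """Output data: first track at the first volume plus second track at the second."""
    first_volume, second_volume = volumes_from_control(position)
    return [first_volume * a + second_volume * b for a, b in zip(first, second)]
```

Moving the control from 0.25 to 0.75 changes the ratio of the first volume level to the second from 3 to 1/3, so a single control operation carries the ratio from a value greater than 1 to a value smaller than 1. With a vocal track and its instrumental track as inputs, as in Item 37, one movement fades from vocals to instrumental.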

CPC classification (Section G, Physics): G10H2210/056; G06F1/165; G10H1/0025; G10H2250/311; G10H1/08; G10H2210/125
International classification (Section G, Physics): G10H1/00; G10H1/08; G06F1/16
