Real-time jamming assistance for groups of musicians
10504498 · 2019-12-10
Assignee
Inventors
CPC classification
G10H2210/061; G10H1/383; G10H1/0025; G10H1/368; G10H2210/141; G10H2210/066; G10H2210/105 (PHYSICS)
International classification
Abstract
Real-time jamming is automatically assisted for musicians. A real-time audio signal of played music that is played by at least one person is received. The beat of the played music is tracked from the real-time audio signal, and a time of a next beat is accordingly predicted. At least one of chords; notes; and drum sounds is recognized from the real-time audio signal, and repetitions in the played music are accordingly detected. A next development in the played music is predicted based on the detected repetitions, comprising at least one of chords; notes; and drum sounds that will be played next, and respective timing based on the predicted time of the next beat. A real-time output is produced based on the predicted next development in the played music.
Claims
1. A method comprising: automatically receiving a real-time audio signal of played music that is played by at least one person; automatically tracking beat of the played music from the real-time audio signal and accordingly automatically predicting a time of a next beat; automatically recognising from the real-time audio signal at least one of chords; notes; and drum sounds and accordingly automatically detecting repetitions in the played music; automatically predicting a next development in the played music, based on the detected repetitions, comprising at least one of chords; notes; and drum sounds that will be played next, and respective timing based on the predicted time of the next beat; and automatically producing a real-time output based on the predicted next development in the played music wherein: the predicting of the at least one of chords; notes; and drum sounds is performed by detecting self-similarity in the played music; and the user is allowed to select a desired musical style and the predicting of the next development is performed accordingly.
2. The method of claim 1, wherein a pre-defined library of musical patterns is used to assist in the predicting of the next development in the played music.
3. The method of claim 2, wherein the library contains any one or more musical patterns selected from a group consisting of: popular chord progressions; musical rules about note progressions; and popular drum sound patterns.
4. A method comprising: automatically receiving a real-time audio signal of played music that is played by at least one person; automatically tracking beat of the played music from the real-time audio signal and accordingly automatically predicting a time of a next beat; automatically recognising from the real-time audio signal at least one of chords; notes; and drum sounds and accordingly automatically detecting repetitions in the played music; automatically predicting a next development in the played music, based on the detected repetitions, comprising at least one of chords; notes; and drum sounds that will be played next, and respective timing based on the predicted time of the next beat; and automatically producing a real-time output based on the predicted next development in the played music; wherein the predicting of the at least one of the chords; notes; and drum sounds is performed by detecting self-similarity in the played music; the method further comprising: computing a self-similarity matrix; and, when a new frame is formed from the real-time audio signal, updating the matrix by comparing the new frame against all the previously buffered frames.
5. A method comprising: automatically receiving a real-time audio signal of played music that is played by at least one person; automatically tracking beat of the played music from the real-time audio signal and accordingly automatically predicting a time of a next beat; automatically recognising from the real-time audio signal at least one of chords; notes; and drum sounds and accordingly automatically detecting repetitions in the played music; automatically predicting a next development in the played music, based on the detected repetitions, comprising at least one of chords; notes; and drum sounds that will be played next, and respective timing based on the predicted time of the next beat; and automatically producing a real-time output based on the predicted next development in the played music; wherein: the predicting of the at least one of chords; notes; and drum sounds is performed by detecting self-similarity in the played music; wherein hashing is used to enable using longer periods of the received audio signal; and locality sensitive hashing (LSH) is used to detect a sequence of past frames of the received audio signal that matches the latest sequence of frames.
6. The method of claim 5, wherein the tracking of the beat of the played music from the real-time audio signal adapts to fluctuation of the tempo of the played music.
7. The method of claim 5, wherein: the self-similarity is calculated using analysing of the received real-time audio signal so as to extract an internal representation for the played music; and the internal representation comprises: a sequence of feature vectors that represent the musical contents of short segments of the received audio signal; or a sequence of high-level descriptors of the received audio signal, wherein the high-level descriptors comprise any one or more of chords; notes; and drum sound notes.
8. The method of claim 5, wherein the producing of the real-time output comprises displaying any one or more of: musical notation; chords; drum notes; given fret indication; an instrument key indication; and a drum specific indication.
9. The method of claim 5, wherein the producing of the real-time output comprises displaying a timeline with indication of events placed on the timeline such that the timeline comprises several rows on the screen.
10. The method of claim 5, wherein the producing of the real-time output comprises visualising repeating sequences.
11. The method of claim 5, wherein the user is allowed to select one or more recorded songs and the recorded songs are processed as if previously received in the real-time audio signal.
12. The method of claim 5, wherein the producing of the real-time output comprises performing one or more instruments along with the played music.
13. A non-transitory memory medium comprising computer executable program code which when executed by at least one processor causes an apparatus at least to perform the method of claim 1.
14. A method comprising: automatically receiving a real-time audio signal of played music that is played by at least one person; automatically tracking beat of the played music from the real-time audio signal and accordingly automatically predicting a time of a next beat; automatically recognising from the real-time audio signal at least one of chords; notes; and drum sounds and accordingly automatically detecting repetitions in the played music; automatically predicting a next development in the played music, based on the detected repetitions, comprising at least one of chords; notes; and drum sounds that will be played next, and respective timing based on the predicted time of the next beat; and automatically producing a real-time output based on the predicted next development in the played music wherein: the predicting of the at least one of chords; notes; and drum sounds is performed by detecting self-similarity in the played music; and a musical key of the played music is shown to the user.
15. A non-transitory memory medium comprising computer executable program code which when executed by at least one processor causes an apparatus at least to perform the method of claim 14.
16. An apparatus comprising a processor and computer program code configured to cause the apparatus to automatically perform, on executing by the processor of the computer program code: receiving a real-time audio signal of played music that is played by at least one person; tracking beat of the played music from the real-time audio signal and accordingly predicting a time of a next beat; recognising from the real-time audio signal at least one of chords; notes; and drum sounds and accordingly detecting repetitions in the played music; predicting a next development in the played music, based on the detected repetitions, comprising at least one of chords; notes; and drum sounds that will be played next, and respective timing based on the predicted time of the next beat; and producing a real-time output based on the predicted next development in the played music; wherein the processor and computer program code are configured to cause the apparatus to perform, on executing by the processor: the predicting of the at least one of chords; notes; and drum sounds by detecting self-similarity in the played music; and allowing the user to select a desired musical style and performing accordingly the predicting of the next development.
17. The apparatus of claim 16, wherein the processor and computer program code are configured to cause the apparatus to perform, on executing by the processor, the tracking of the beat of the played music from the real-time audio signal adapting to fluctuation of the tempo of the played music.
18. An apparatus comprising a processor and computer program code configured to cause the apparatus to automatically perform, on executing by the processor of the computer program code: receiving a real-time audio signal of played music that is played by at least one person; tracking beat of the played music from the real-time audio signal and accordingly predicting a time of next beat; recognising from the real-time audio signal at least one of chords; notes; and drum sounds and accordingly detecting repetitions in the played music; predicting a next development in the played music, based on the detected repetitions, comprising at least one of chords; notes; and drum sounds that will be played next, and respective timing based on the predicted time of the next beat; and producing a real-time output based on the predicted next development in the played music; wherein the processor and computer program code are further configured to cause the apparatus to: perform the predicting of the at least one of chords; notes; and drum sounds by detecting self-similarity in the played music; and show a musical key of the played music to the user.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Some example embodiments of the present disclosure will be described with reference to the accompanying drawings.
DETAILED DESCRIPTION
(7) In the following description, like reference signs denote like elements or steps.
(10) receiving 210 a real-time audio signal of played music that is played by at least one person;
(11) tracking beat 220 of the played music from the real-time audio signal and accordingly estimating a time of a next beat;
(12) recognising 230 from the real-time audio signal at least one of chords; notes; and drum sounds and accordingly detecting repetitions in the played music;
(13) predicting 240 a next development in the played music, based on the detected repetitions, comprising at least one of chords; notes; and drum sounds that will be played next, and respective timing based on the estimated time of the next beat; and
(14) producing 250 a real-time output based on the predicted next development in the played music.
(15) In an embodiment, signals of a plurality of the instruments 110 are combined into the received audio signal. The combining is performed e.g. acoustically, by capturing with one microphone the sound produced by plural instruments 110, and/or electrically, by combining electric signals representing the outputs of the different instruments 110.
(16) The real-time audio signal of the played music is received e.g. using the internal microphone 122, external microphone 130 and/or an instrument input such as MIDI or electric guitar input.
(17) The tracking 220 adapts, in an embodiment, to fluctuation of the tempo of the played music.
(18) In an embodiment, the tracking of the beat comprises detecting a temporal regularity in the music. The tracking of the beat may simulate tapping the foot to the music by musicians.
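The description does not mandate a particular beat-tracking algorithm. As an illustrative sketch only (all names hypothetical), the next beat time can be extrapolated from the intervals between detected onsets, simulating the foot-tapping behaviour described above:

```python
import numpy as np

def predict_next_beat(onset_times, now):
    """Estimate the beat period from recent onset times and predict the
    time of the next beat.

    A minimal sketch, not the patented method: the beat period is taken
    as the median inter-onset interval, and the next beat is the first
    whole multiple of that period after the current time.
    """
    intervals = np.diff(onset_times)
    period = float(np.median(intervals))      # estimated beat period (s)
    last_beat = onset_times[-1]
    # Step forward in whole periods until we pass the current time.
    n = int(np.ceil((now - last_beat) / period))
    return last_beat + max(n, 1) * period
```

Because the period is re-estimated from recent onsets on every call, such a predictor also adapts to a slowly fluctuating tempo, as in paragraph (17).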
(19) The predicting of the at least one of chords; notes; and drum sounds can be performed by detecting self-similarity in the played music. Certain chord, note and drum sound progressions tend to be repeated and varied within a song. That allows a competent musician to start playing along with a previously-unheard song after listening to it for a while, once they detect a part that they have heard earlier in the song. The jamming assistant 120 is provided to help less experienced players in this respect as well.
(20) In order to calculate self-similarity, the received real-time audio signal can be analysed and an internal representation for the played music can be extracted, such as a sequence of feature vectors and/or a sequence of high-level descriptors of the received audio signal.
(21) The feature vectors can be numeric. Each feature vector may represent a short segment of music represented by the audio signal, such as frames of 10 ms to 200 ms of the audio signal. A sequence of successive frames represents longer segments of the received audio signal. The sequence may comprise at least 20, 50, 100, 200, 500, 1 000, 10 000, 20 000, 50 000, 100 000, 200 000, 500 000, 1 000 000, or 2 000 000, frames.
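One concrete (hypothetical) choice for such a numeric feature vector is a chroma vector, which folds the spectral energy of a frame onto the twelve pitch classes. The sketch below illustrates this assumption; the description itself does not prescribe any specific feature:

```python
import numpy as np

def chroma_frame(samples, sr=22050):
    """Map one audio frame to a 12-dimensional chroma feature vector.

    Illustrative only: spectral magnitudes are folded onto pitch classes
    relative to A4 = 440 Hz, then L2-normalised.
    """
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    freqs = np.fft.rfftfreq(len(samples), 1.0 / sr)
    chroma = np.zeros(12)
    for f, mag in zip(freqs[1:], spectrum[1:]):   # skip the DC bin
        pitch_class = int(round(12 * np.log2(f / 440.0))) % 12
        chroma[pitch_class] += mag
    norm = np.linalg.norm(chroma)
    return chroma / norm if norm > 0 else chroma
```

A 10 ms to 200 ms frame, as in the paragraph above, would correspond to a few hundred to a few thousand samples at common sampling rates.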
(22) The high-level descriptors comprise, for example, chords, notes, and/or drum sounds or notes (in a human-readable form).
(23) Let us denote the internal representation by R and the latest frame by T, so that R(T) refers to the internal representation of the latest frame. R(T-1) then refers to the second-latest frame. Let us further assume that a total of N frames are buffered or kept in a memory of the jamming assistant, for example. R(T-N+1) will then refer to the oldest frame that is buffered. N can be chosen to cover the real-time audio signal for a period from half a minute up to several days. The buffer (of frames) is maintained in one embodiment from one music or jamming session to another, possibly regardless of whether an apparatus running the method is shut down or software implementing the method is closed.
(24) In an embodiment, a self-similarity matrix is computed in order to detect repetitions in the played music.
(25) The self-similarity matrix is computed, for example, by comparing a plurality of frames (e.g. every frame) in the memory against a plurality of other frames (e.g. every other frame). When a new frame is formed from the real-time audio signal, the matrix can be updated by comparing the frame against all the previously buffered frames. The matrix can so be formed to contain similarity estimates between all pairs of the buffered frames. The similarity estimates can be calculated using a similarity metric between the internal representations R for the frames being compared. An inverse of the cosine (or Euclidean) distance between feature vectors may be used.
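The incremental update described above can be sketched as follows, using cosine similarity between feature vectors (one of the metrics mentioned); the class and method names are illustrative, not part of the disclosure:

```python
import numpy as np

class SelfSimilarity:
    """Incrementally maintained self-similarity matrix: each new frame is
    compared against all previously buffered frames, so the matrix holds
    similarity estimates between all pairs of buffered frames."""

    def __init__(self):
        self.frames = []              # buffered vectors R(T-N+1) .. R(T)
        self.matrix = np.zeros((0, 0))

    def add_frame(self, vec):
        vec = np.asarray(vec, dtype=float)
        sims = [self._cosine(vec, old) for old in self.frames]
        n = len(self.frames)
        grown = np.zeros((n + 1, n + 1))
        grown[:n, :n] = self.matrix   # keep all earlier comparisons
        grown[n, :n] = sims           # new frame vs. every older frame
        grown[:n, n] = sims           # keep the matrix symmetric
        grown[n, n] = 1.0             # a frame is identical to itself
        self.matrix = grown
        self.frames.append(vec)

    @staticmethod
    def _cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom > 0 else 0.0
```

In practice the matrix could be preallocated as a ring buffer of fixed size N rather than regrown per frame; the sketch favours clarity over efficiency.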
(26) In an embodiment, hashing is used to enable using longer periods of the received audio signal. For example, in the case of extremely long memory lengths N (for example several days), buffering the entire similarity matrix may be undesirable, as the required buffer size grows proportionally to the square of N. In this embodiment, only the internal representation itself is kept for frames that are older than a certain threshold. For those frames, a hashing technique such as locality sensitive hashing (LSH) is then used to detect a sequence of frames that matches the latest sequence of frames. LSH as a technique differs from the use of the self-similarity matrix, but may serve the same purpose of detecting an earlier sequence of frames that is similar to the latest sequence of frames. Generally, LSH reduces the dimensionality of high-dimensional data by hashing input items such that similar items map to the same buckets with high probability. The number of buckets is much smaller than the universe of possible input items, which saves processing cost.
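Random-hyperplane hashing is one standard LSH family for cosine similarity and can serve as an illustration; the bit width, plane matrix and function names below are assumptions for the sketch, not details from the disclosure:

```python
import numpy as np

def lsh_signature(vec, planes):
    """Random-hyperplane LSH: vectors with small cosine distance map to
    the same bit signature with high probability. Each bit records on
    which side of one random hyperplane the vector lies."""
    return tuple((planes @ vec > 0).astype(int))

rng = np.random.default_rng(0)
planes = rng.standard_normal((8, 12))   # 8 hash bits for 12-dim features

buckets = {}                            # signature -> list of frame ids

def index_frame(frame_id, vec):
    buckets.setdefault(lsh_signature(vec, planes), []).append(frame_id)

def candidate_matches(vec):
    """Return ids of buffered frames in the same bucket as `vec`."""
    return buckets.get(lsh_signature(vec, planes), [])
```

Looking up candidates then costs one hash computation instead of a comparison against every buffered frame, which is what makes memory lengths of several days tractable.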
(27) In an embodiment, the detecting of the repetitions in the played music comprises detecting that the latest L frames are very similar to a sequence of frames that occurred X seconds earlier. That two sequences of frames are very similar (i.e. sufficiently similar to indicate repetition in the played music) can be determined e.g. by comparing their similarity (e.g. the inverse of the Euclidean distance) to a set threshold. For example, repetition may be detected if the similarity is above a given threshold for the pair of representations R at times T and T-X, for the pair at times T-1 and T-X-1, and so forth until the pair at times T-L and T-X-L. When repetition is detected, the next development in the played music can be predicted for coming frames from the current time T onwards.
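The pairwise check above can be sketched directly against a self-similarity matrix; in this sketch the lag X is counted in frames rather than seconds, and the threshold value is an illustrative assumption:

```python
def detect_repetition(matrix, L, X, threshold=0.9):
    """Return True if the latest L frames repeat the sequence X frames
    earlier: every pair (T-i, T-X-i) for i = 0..L-1 must exceed the
    similarity threshold. `matrix` is a self-similarity matrix whose
    last row/column corresponds to the latest frame T."""
    T = len(matrix) - 1              # index of the latest frame
    if T - X - (L - 1) < 0:
        return False                 # not enough history buffered yet
    return all(matrix[T - i][T - X - i] >= threshold for i in range(L))
```

If the check succeeds, the frames that followed the earlier occurrence (indices T-X+1 onwards) become the prediction for the frames that will follow the current time T.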
(28) The user can be allowed to select a desired musical style (such as rock, jazz, or bossa nova for example). The predicting of the next development can then be performed accordingly i.e. based on the selected style.
(29) In step 240, the respective timing based on the estimated time of the next beat need not be limited to placing the predicted event exactly on the next beat. Instead, the next time to play the predicted development may be timed at an offset of some fraction of the time between beats from the next beat. The offset may be, for example, k/8, k/16 or k/32 beats, wherein k is an integer greater than or equal to 0. For example, the offset could be 5/8 or 66/16 beats, i.e. possibly more than one beat ahead, and not necessarily with the same beat division as the base beat. Yet the timing would be based on the next beat.
(30) In an example embodiment, the real-time outputting comprises displaying any one or more of: musical notation such as notes, chords, drum notes and/or activating given fret, instrument key or drum specific indicators. The displaying may be performed using a display screen or projector.
(32) In an embodiment, the producing of the real-time output comprises visualising repeating sequences. When the latest L events indicate a repetition of a previously-seen sequence, the previously seen matching sequence(s) can be visually highlighted on the device screen as illustrated in the accompanying drawings.
(33) A pre-defined library of musical patterns is used in an embodiment to assist in the predicting of the next development in the played music. The library contains, for example, any one or more musical patterns selected from a group consisting of: popular chord progressions; musical rules about note progressions; and popular drum sound patterns. A user can select one or more recorded songs, and the recorded songs can then be processed as if previously received in the real-time audio signal. Subsequently, when the user is performing in real time, the latest sequence of frames can be compared also against the internal representation formed based on the recorded songs; it can thus be detected whether the user is performing one of the recorded songs, or playing something sufficiently similar, and that song can be used in the predicting of the next development in the played music. In an embodiment, the musical key of the recorded songs is detected during their processing, and the comparison of similarity is performed with a further step of converting the musical key of the recorded songs to match that of the currently played music. In this embodiment, the jamming assistant can propose a next development based on a recorded song that would suit the played music except for its musical key, so that a broader selection of useful reference material can be used. Furthermore, the jamming assistant can simplify transposition of the played music to better suit the singer or singers (e.g. players of the instruments or pure vocalists).
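Assuming chroma-style pitch-class features as the internal representation (an assumption of this sketch, not a requirement of the description), the key-matching step reduces to a circular shift of the feature vector before the similarity comparison:

```python
import numpy as np

def transpose_chroma(chroma, semitones):
    """Transpose a 12-bin pitch-class (chroma) vector by circularly
    shifting it, so that a recorded song's representation can be brought
    into the key of the currently played music before comparison."""
    return np.roll(np.asarray(chroma), semitones % 12)
```

Trying all twelve shifts and keeping the best-matching one is a simple way to both match keys and estimate the interval between the recorded song's key and that of the live performance.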
(34) By using recorded songs, it is possible to learn possible patterns from them while the user is still allowed to play with a rhythm, musical key (with free transposition to another key) and style of her own preference, freely deviating from those of the recorded songs, as in a jamming session with other musicians.
(35) A musical key of the played music can be shown to the user.
(36) In an embodiment, the producing of the real-time output comprises performing one or more instruments along with the played music. For example, the jamming assistant can be configured to produce a corresponding MIDI signal to be interpreted and played by a synthesizer with an instrument sound chosen by the user or selected by the jamming assistant (e.g. based on the recorded songs or pre-set rules; for instance, bass or drum parts are less universally transferable from one instrument to another than e.g. flute, piano and violin parts).
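At the wire level, such a MIDI signal consists of short channel voice messages. The status bytes below (0x90 for Note On, 0x80 for Note Off) are defined by the MIDI 1.0 standard; the helper functions themselves are illustrative:

```python
def note_on(channel, pitch, velocity):
    """Build a raw MIDI Note On message: status byte 0x90 | channel,
    followed by 7-bit pitch and 7-bit velocity."""
    return bytes([0x90 | (channel & 0x0F), pitch & 0x7F, velocity & 0x7F])

def note_off(channel, pitch):
    """Build a raw MIDI Note Off message (status 0x80), velocity 0."""
    return bytes([0x80 | (channel & 0x0F), pitch & 0x7F, 0])
```

A predicted chord could thus be rendered by emitting one Note On per chord tone at the predicted event time, with percussion conventionally sent on channel 10 (index 9) in General MIDI.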
(38) Various embodiments have been presented. It should be appreciated that in this document, the words "comprise", "include" and "contain" are each used as open-ended expressions with no intended exclusivity.
(39) The foregoing description has provided by way of non-limiting examples of particular implementations and embodiments of the present disclosure a full and informative description of the best mode presently contemplated by the inventors for carrying out the disclosed embodiments. It is however clear to a person skilled in the art that the present disclosure is not restricted to details of the embodiments presented in the foregoing, but that it can be implemented in other embodiments using equivalent means or in different combinations of embodiments without deviating from the characteristics of the present disclosure.
(40) Furthermore, some of the features of the afore-disclosed embodiments of the present disclosure may be used to advantage without the corresponding use of other features. As such, the foregoing description shall be considered as merely illustrative of the principles of the present disclosure, and not in limitation thereof. Hence, the scope of the present disclosure is only restricted by the appended patent claims.