AUTOMATIC MUSIC DOCUMENT DISPLAYING ON PERFORMING MUSIC

20230067175 · 2023-03-02

Abstract

A user interface presents structural musical information in a score in a way where both the start and end points of each jump in the score are visible simultaneously. Each jump is presented in a manner that allows the user to select, during performance, which one of different alternatives to choose when approaching a decision point such as a repeat in the song.

Claims

1. A computer implemented method comprising maintaining a music document indicating what should be performed in a piece of music; and repeatedly: displaying on a display in a main view a first part of the music document when a user performs the piece of music; identifying a source portion of the music document that is being performed by the user; identifying in the music document a plurality of potential jump targets including at least any positions that are structurally associated with the source portion; displaying on the display one or more of the potential jump targets in respective one or more target views together with the main view; identifying a destination portion of the music document for performing after the source portion; and updating the main view to display a second part of the music document at the destination portion.

2. The method of claim 1, further comprising in the identifying of the destination portion, receiving a user indication of a chosen jump target from the user or another person.

3. The method of claim 2, further comprising receiving the user indication using any one or more of: a hard key; a soft key; a touch screen; a button.

4. The method of claim 2, further comprising receiving the user indication using a gaze detection and positions of the main view and the target views on the display.

5. The method of claim 2, further comprising receiving the user indication using an audio signal produced by the user.

6. The method of claim 5, the identifying of the destination portion comprising detecting which part of the piece of music the user is performing.

7. The method of claim 5, the identifying of the destination portion comprising recognizing what the user is singing, using lyrics of the music document.

8. The method of claim 5, the identifying of the destination portion comprising using speech recognition and determining a verbal indication of the destination portion.

9. The method of claim 5, the identifying of the destination portion comprising using symbolic information of the music document, which symbolic information describes the piece of music.

10. The method of claim 9, further comprising adding supplementary symbolic information to the music document allowing the user to indicate the destination portion by performing accordingly.

11. The method of claim 10, wherein the symbolic information comprises an indication of notes or chords, and/or an indication of rhythm.

12. The method of claim 5, wherein the source portion is the first part of the music document or a sub-part of the first part.

13. The method of claim 5, further comprising identifying in the music document a plurality of structurally associated jump sources and jump targets; and identifying one or more of the structurally associated jump sources in the source portion, and responsively identifying and displaying in respective target views one or more associated jump targets.

14. The method of claim 5, further comprising in the identifying of the destination portion, receiving a user indication of the chosen jump target from another person.

15. The method of claim 5, further comprising in the identifying of the destination portion, receiving a plurality of indications, and probabilistically determining the destination portion as the one that is most likely based on the plurality of indications.

16. The method of claim 15, further comprising hierarchically prioritizing some of the plurality of indications.

17. The method of claim 16, further comprising in the prioritizing, weighing indications that are more reliable more heavily than indications that are less reliable.

18. The method of claim 5, further comprising displaying the main view and the target views in an order corresponding to that of an original musical document.

19. The method of claim 5, wherein the music document comprises music sheets having lines and notes marked therein, the method further comprising adapting the music sheets by removing staves not needed for performing by the user.

20. An apparatus comprising at least one memory and at least one processor collectively configured to cause performance of the method of claim 1.

Description

BRIEF DESCRIPTION OF THE FIGURES

[0067] Some example embodiments will be described with reference to the accompanying figures, in which:

[0068] FIG. 1 schematically shows an apparatus according to an example embodiment;

[0069] FIG. 2 shows a block diagram of an apparatus according to an example embodiment;

[0070] FIG. 3 shows an example screen shot with notes and lyrics masked;

[0071] FIG. 4 shows an example screen shot with notes and lyrics masked;

[0072] FIG. 5 shows an example screen shot with notes and lyrics masked; and

[0073] FIGS. 6a to 6d show a flow chart of a process of an example embodiment.

DETAILED DESCRIPTION

[0074] In the following description, like reference signs denote like elements or steps.

[0075] FIG. 1 schematically shows an apparatus 100 of an example embodiment. The apparatus 100 has a user interface comprising a display 110 and some soft or hard keys 120, 130, 140. The display may be a touch display.

[0076] In an embodiment, the user interface presents structural musical information in a score in a way where both the start and end points of each jump in the score are visible simultaneously, and each jump is presented in a manner that allows the user to select, during performance, which of the alternatives to choose when approaching a decision point such as a repeat in the song.

[0077] The display can be seen to comprise two different kinds of views:

[0078] Main view: The primary area of the proposed user interface that shows the part of the music the user is currently performing.

[0079] Target view(s): An auxiliary area of the user interface that shows the end point of a jump in score, where the start point of the jump is currently visible in the main view. In an example embodiment, the target view is shown and hidden contextually based on where the user is in the score and can be activated for the user to navigate the main view to the corresponding part.
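The relationship between the main view, the target views, and the structural jumps can be modeled with a small data structure. The sketch below is purely illustrative and not part of the claimed subject matter; the names `Jump` and `ScoreView`, and the score-line granularity, are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Jump:
    """A structural jump in the score, e.g., a D.S. or 'al coda' mark."""
    source_line: int  # line (system) index where the jump starts
    target_line: int  # line (system) index where performance continues

@dataclass
class ScoreView:
    """Minimal model of the main-view/target-view layout."""
    jumps: list                  # all structural jumps in the piece
    main_first_line: int = 0     # first score line shown in the main view
    main_visible_lines: int = 4  # number of lines the main view shows

    def visible_jump_targets(self):
        """Lines to offer as target views: the targets of every jump
        whose start point is currently visible in the main view."""
        lo = self.main_first_line
        hi = lo + self.main_visible_lines
        return [j.target_line for j in self.jumps if lo <= j.source_line < hi]
```

With, say, a coda jump from line 3 to line 7, scrolling the main view then determines whether the coda target view is offered at all.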

[0080] FIG. 2 shows a block diagram of an apparatus 100 according to an example embodiment. The apparatus 100 comprises a communication interface 210; a processor 220; a user interface 230; and a memory 240.

[0081] The communication interface 210 comprises in an embodiment a wired and/or wireless communication circuitry, such as Ethernet; Wireless LAN; Bluetooth; GSM; CDMA; WCDMA; LTE; and/or 5G circuitry. The communication interface can be integrated in the apparatus 100 or provided as a part of an adapter, card, or the like, that is attachable to the apparatus 100. The communication interface 210 may support one or more different communication technologies. The apparatus 100 may also or alternatively comprise more than one of the communication interfaces 210.

[0082] In this document, a processor may refer to a central processing unit (CPU); a microprocessor; a digital signal processor (DSP); a graphics processing unit; an application specific integrated circuit (ASIC); a field programmable gate array; a microcontroller; or a combination of such elements.

[0083] The user interface may comprise circuitry for receiving input from a user of the apparatus 100, e.g., via a keyboard; a graphical user interface shown on the display of the apparatus 100; speech recognition circuitry; or an accessory device, such as a headset; and for providing output to the user via, e.g., a graphical user interface or a loudspeaker.

[0084] The memory 240 comprises a work memory 242 and a persistent memory 244 configured to store computer program code 246 and data 248. The memory 240 may comprise any one or more of: a read-only memory (ROM); a programmable read-only memory (PROM); an erasable programmable read-only memory (EPROM); a random-access memory (RAM); a flash memory; a data disk; an optical storage; a magnetic storage; a smart card; a solid-state drive (SSD); or the like. The apparatus 100 may comprise a plurality of the memories 240. The memory 240 may be constructed as a part of the apparatus 100 or as an attachment to be inserted into a slot; port; or the like of the apparatus 100 by a user or by another person or by a robot. The memory 240 may serve the sole purpose of storing data or be constructed as a part of an apparatus 100 serving other purposes, such as processing data.

[0085] A skilled person appreciates that in addition to the elements shown in FIG. 2, the apparatus 100 may comprise other elements, such as microphones; displays; as well as additional circuitry such as input/output (I/O) circuitry; memory chips; application-specific integrated circuits (ASIC); processing circuitry for specific purposes such as source coding/decoding circuitry; channel coding/decoding circuitry; ciphering/deciphering circuitry; and the like. Additionally, the apparatus 100 may comprise a disposable or rechargeable battery (not shown) for powering the apparatus 100 if an external power supply is not available.

[0086] FIG. 3 shows a screen shot of an example embodiment. Here, the display 110 is divided into two parts. The top part shows the main view, which takes up most of the space on the display and shows the section of the music the user is currently performing. When a coda marking 310 (highlighted above a fourth system of staves) is visible in the main view, or the apparatus 100 otherwise knows the user is approaching this potential jump start point, the apparatus 100 shows a corresponding jump target point 320 of the jump in a target view at the bottom of the display. The target view is indicated to be perceivably discontinuous or separated from the main view, in this case by a line and a shadow effect 330 at the bottom of the main view.

[0087] FIG. 4 shows a similar case, but with a jump from the D.S. con rep. al coda marking 410 to a beginning 420 of the verse that has a similar marking. In this case, the target view is shown at the top of the display, again separated by a line and a shadow 330 from the main view. For a jump backward in the original score, it may be advantageous to present the target view at the top of the page, whereas a jump forward may be better presented at the bottom. In other words, the main view and the target view(s) may be ordered by progression of the piece of music, although this need not necessarily be the case. Generally speaking, none of the location, size, scaling, and shape of the target view is essential to the present invention; rather, these may be set specifically for each implementation and/or context.

[0088] Simultaneously visible target views need not be limited to one. FIG. 5 shows an example screenshot with two different target views: one at the top and another at the bottom of the display. When there are multiple jump start points visible in the main view, there can also be multiple different target views visible. Again, the number, positions, and other parameters of the target views in FIG. 5 are presented purely to illustrate an example and are not intended to limit this invention.

[0089] There are also many different alternatives for the presentation of the main view. The main view can be a continuously scrolling view, like the one depicted in FIG. 5, or a page-based presentation, as in FIG. 4, for example. The scaling and size of the main view can also depend on the application and context. For example, it is possible to reduce the size and/or scaling of the main view while one or more target views are visible. It is also possible that two or more target views reside on one side of the main view in terms of the normal progression of music in the piece of music. In such a case, the target views may lead or trail the main view in an order such that the user can easily perceive how the target views relate to the main view with relation to the piece of music.

[0090] In an example embodiment, the user can select to jump to the target view. In the simplest implementations, selecting the jump is a manual action by the user. The selection action could be based on, e.g., the user tapping on a touch screen, selecting from a keyboard, or using a suitable remote-control device. There are also more automated approaches discussed in the following.

[0091] The target view serves the purpose of allowing the user to follow the music through the transition, because they can see all the necessary information in the main and target views before and after proceeding to the jump target. In an example embodiment, when the user selects the jump, a transition to the target view is provided in a way that expands the target view visually to ensure continuity. The target view may expand to cover the whole display 110, thus becoming, or merging with, the main view.

[0092] While it is possible to base visibility of the target view(s) purely on the structural elements visible on the display 110, it is advantageous to estimate where the user is in the song. In the situation of FIG. 3, it is not necessary to display a jump target (coda) target view before the user is approaching the jump origin 310 (al coda mark), e.g., before the user starts performing the fourth system (line) of the score. Similarly, in the case of FIG. 5, the bottom and top target views only need to be shown when the user is on the second or the third system of the main view, respectively. This helps declutter the combination of different views.
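The decluttering rule just described, showing a target view only once the performer approaches the jump origin, can be sketched as a simple predicate. The `lookahead` parameter, measured in score lines, is an assumed tuning knob, not something prescribed by the document:

```python
def should_show_target_view(current_line, jump_source_line, lookahead=1):
    """Show the target view only while the estimated performing position
    is within `lookahead` lines before (or at) the jump origin; hide it
    otherwise to declutter the display (cf. FIG. 3)."""
    return jump_source_line - lookahead <= current_line <= jump_source_line
```

In the FIG. 3 situation, with the al coda mark on line 3, the coda target view would appear only as the performer reaches lines 2 and 3.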

[0093] Additionally, the contents of the main view can be made to reflect the current position in score, e.g., by scrolling the current position approximately to the center of the display 110. It is also possible to move a position pointer about the score while the user is performing and perform the scrolling in larger chunks. Moreover, in an embodiment, the music document is manually moved in the main view and the current position is known to be somewhere within the main view. In a further embodiment, the current position within the main view is estimated.

[0094] In a simple case, the source portion in the score is estimated simply based on the time it would take to perform each system of the score. In the case of FIG. 3, performing the first three visible systems at a typical tempo would take about 36 seconds. If we know when the user started from the repeat mark at the top of the display 110, the coda target view could be shown only after that time. For this automation to be useful, it does not need to be particularly accurate, as performing each line in a typical score can take several seconds. For enhanced performance, however, the user may be allowed to change the timings either directly or through a tempo setting or similar.
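The time-based estimate can be sketched as follows. The bar counts, 4/4 time, and 80 BPM tempo are illustrative assumptions; with three systems of four bars each, they happen to reproduce the roughly 36-second figure mentioned above.

```python
def seconds_per_system(bars, beats_per_bar=4, tempo_bpm=80):
    """Rough time to perform one system, from its bar count and the tempo."""
    return bars * beats_per_bar * 60.0 / tempo_bpm

def estimated_current_system(elapsed_s, systems_bars, beats_per_bar=4, tempo_bpm=80):
    """Index of the system the performer is most likely on, given the time
    elapsed since a known anchor point (e.g., the top repeat mark)."""
    t = 0.0
    for i, bars in enumerate(systems_bars):
        t += seconds_per_system(bars, beats_per_bar, tempo_bpm)
        if elapsed_s < t:
            return i
    return len(systems_bars) - 1  # past the last system
```

Here each four-bar system at 80 BPM in 4/4 takes 12 seconds, so three systems take 36 seconds; the estimate only needs to be accurate to within a line or so, matching the coarseness the text allows.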

[0095] As pointed out in the background section, matching performing to the position in score can also be based on more sophisticated methods, such as analyzing the audio signal. For displaying the target views, even a fairly coarse position estimate may be useful.

[0096] Alternatively, estimating the position in the score can be based on, e.g., using a gaze-tracking (eye-tracking) device to determine where in the score the user is looking. The accuracy of many eye-tracking systems suffices for this application, and such systems are increasingly common in regular mobile devices. A combination of different methods can also be used for the estimate.

[0097] Both the audio analysis and the gaze-tracking method of position estimation can also be used to signal user selection to the apparatus 100. In the case of an audio signal analysis system, the different paths through the song will differ at some point after the jump possibility, and the analysis system could maintain two or more hypotheses and determine the user selection once there is a significant difference in the likelihood of these options. For instance, in FIG. 5, the al coda mark creates two potential continuations, represented by the last line in the main view and the line visible in the lower target view. These continuations start differing slightly in the second bar, and more significantly in the third bar, at which point the apparatus 100 can determine whether the user selected the coda instead of continuing in the verse. In the case of a gaze-tracking device, user selection can be confirmed as soon as the apparatus 100 has reliably detected where the user continues to focus on the display 110. A time-based position estimate would only be able to signify the target view selection in the case where the jump is the only way forward in the music. For example, in FIG. 4, the only way forward from the segno is to the top of the display 110; the coda does not make sense as a continuation. That said, on some occasions the user may wish to repeat some part in deviation from the normal continuation or progression of the piece of music.
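The multi-hypothesis idea can be sketched as a sequential likelihood comparison. In practice the per-bar match scores would come from an audio analysis front end; here they are abstract inputs, and the decision threshold is an assumed tuning parameter rather than anything specified by the document:

```python
import math

def decide_continuation(match_scores, threshold=3.0):
    """Accumulate one log-likelihood per continuation hypothesis and decide
    once one hypothesis is clearly more likely (cf. the coda-vs-verse
    example of FIG. 5).  `match_scores` is a sequence of per-bar pairs:
    (likelihood under hypothesis A, likelihood under hypothesis B)."""
    log_a = log_b = 0.0
    for p_a, p_b in match_scores:
        log_a += math.log(p_a)
        log_b += math.log(p_b)
        if abs(log_a - log_b) >= threshold:
            return "A" if log_a > log_b else "B"
    return None  # evidence still inconclusive; keep both views visible
```

This mirrors the FIG. 5 narrative: identical evidence in the first bar, a slight difference in the second, and a decisive difference in the third.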

[0098] In an example embodiment, the apparatus 100 is provided with structural information about the current score to be presented. In an example embodiment, the apparatus 100 is based on processing musical data purely in symbolic form so that the apparatus can render that information in the user interface as required. This approach can produce very flexible results, as many parameters of presentation can be easily controlled to fit the current context. In another example embodiment, the apparatus 100 is based on a purely visual representation of the score (such as a digital image), with just enough auxiliary information to make the apparatus 100 functional.

[0099] The minimum information needed for this user interface to function as required by one example embodiment consists of the locations and relationships of start and end points for backward and forward jumps in the score. This allows basic target views to be created and shown when the corresponding start point or jump origin is visible in the main view. For better visual results, the apparatus 100 may have further information about at least the height of each line in the score, such that each target view can be more appropriately sized and scaled. A gaze-tracked version of the interface will particularly benefit from understanding the locations and sizes of all displayed score lines. This information, combined with the direction of the user's gaze, allows tracking the performing (focus) position in the score. An implementation based on analyzing audio signals, and updating the interface based on them, can operate on more symbolic information about the music itself (e.g., notes, chords, transposition) and how it would be represented in the audio signal.
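A purely visual score thus needs only a thin layer of auxiliary data. One possible shape for that annotation, with entirely illustrative field names and pixel coordinates, might be:

```python
# Hypothetical minimal annotation for a visual (image-based) score page.
# Units are pixels in the page image; all names and values are illustrative.
annotation = {
    "lines": [  # vertical extent of each score line, for sizing target views
        {"index": 0, "top": 40,  "height": 90},
        {"index": 1, "top": 150, "height": 90},
        {"index": 2, "top": 260, "height": 90},
        {"index": 3, "top": 370, "height": 90},
    ],
    "jumps": [  # start/end relationships of backward and forward jumps
        {"source_line": 3, "target_line": 0},
    ],
}

def line_rows(annotation, index):
    """Top and bottom rows of a score line, for cropping it into a target view."""
    line = next(l for l in annotation["lines"] if l["index"] == index)
    return line["top"], line["top"] + line["height"]
```

The jump list alone supports the basic target views; the per-line geometry is the optional extra that enables proper sizing, scaling, and gaze mapping.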

[0100] If the display 110 of the visual score is not based on symbolic information alone, any visual representation may be augmented with the necessary symbolic information. In an example embodiment, this information is attached to the visual representation by the end-user herself through either the apparatus 100 or some other equipment, optionally using the same application that is used to provide automatic scrolling in the music document. In an example embodiment, this work may be performed by someone else and, for example, stored on a server and delivered to the user through a network service. In an example embodiment, an image recognition system is used to automatically analyze the visual representation, recognize the necessary symbols from it, and augment the visual representation with the necessary symbolic data. Such a recognition system may be implemented either locally (in the same device) or (in part) as a remote service through a network. Irrespective of how the symbolic information is first created, the apparatus 100 can be arranged to allow the user to edit any and all information used for constructing the interface.

[0101] It shall also be appreciated that the apparatus 100 need not perform all processing locally. The apparatus 100 may operate as a (dumb) terminal only acting as a man-machine interface, while processing is performed partially or entirely in a remote location, possibly in a computer cloud.

[0102] Various embodiments may further function well with regular scores designed for a traditional print-out format. The symbolic structural information needed for the interface may also allow the apparatus 100 to unroll the score, either automatically or by the user. Unrolling can make jumps in the score less frequently used (the score would typically be performed in the predetermined order), but can, on the other hand, increase the number of jump possibilities, for example, if the song has many verses. If this is in some cases the preferred presentation, it may be necessary for the apparatus 100 to limit the number of target view options simultaneously available in the interface.

[0103] While the discussion and the examples above mostly relate to regular western musical notation, it is possible to apply the same techniques to other visual representations of musical information. The score could consist of, e.g., only lyrics and chords, or be a tablature for stringed instruments. The apparatus 100 is also most useful in cases where the score can fit multiple lines in the main view of the interface. For example, full orchestral scores may have very large systems of staves, which do not easily lend themselves to this presentation. The example presentations could still be adapted for such use cases by scaling, shaping, partial presentations in the main view or target views, and other modifications.

[0104] Some embodiments allow for compact, flexible presentation and usage of musical information on electronic devices with limited display space. With the presented user interface, users may easily and flexibly jump to different parts of the musical score, without losing sight of the notation itself both before and after the jump. When the presentation is augmented with automation techniques such as audio signal analysis or gaze-tracking, the user may maintain musical flexibility and achieve mostly or entirely hands-free operation.

[0105] In this document, a staff (br. stave) may refer to a notation including five parallel horizontal lines that, with a clef, indicate the pitch of musical notes.

[0106] A system of staves may refer to a collection of staves connected vertically to be performed simultaneously (e.g., piano left and right hands).

[0107] A Line (in score) may refer to an individual staff or a system of staves that can be used as a basis for structural navigation in the score. In case music is presented as a lyrics-and-chords representation, a line may refer to a single line of lyrics, associated with the chords to be performed.

[0108] A main view may refer to a primary area of the display 110 showing the part of the music the user is currently performing. A target view may refer to an auxiliary area of the user interface that shows the end point of a jump in score. A target view may be shown and hidden contextually and can be activated (selected) by the user to navigate the main view to the corresponding part.

[0109] FIGS. 6a to 6d show a flow chart according to an example embodiment. FIGS. 6a to 6d illustrate a process comprising various possible steps, including some optional steps, while further steps can also be included and/or some of the steps can be performed more than once:

[0110] 600. maintaining a music document indicating what should be performed in a piece of music; and repeatedly:

[0111] 601. displaying on a display in a main view a first part of the music document when a user performs the piece of music;

[0112] 602. identifying a source portion of the music document that is being performed by the user;

[0113] 603. identifying in the music document a plurality of potential jump targets including at least any positions that are structurally associated with the source portion;

[0114] 604. displaying on the display one or more of the potential jump targets in respective one or more target views together with the main view;

[0115] 605. identifying a destination portion of the music document for performing after the source portion; and

[0116] 606. updating the main view to display a second part of the music document at the destination portion.

[0117] The process may be entirely or partially automatic.

[0118] The process may further comprise any one or more of the following further options:

[0119] 607. identifying the source portion as the first part of the music document;

[0120] 608. identifying the source portion as a sub-part of the first part;

[0121] 609. in the identifying of the destination portion, receiving a user indication of a chosen jump target;

[0122] 610. receiving the user indication using one or more keys or buttons;

[0123] 611. receiving the user indication using a touch screen;

[0124] 612. receiving the user indication using a gaze detector configured to identify the chosen jump target based on the gaze of the user and the order of the main view and the target views on the display;

[0125] 613. determining the user indication from an audio signal produced by the user;

[0126] 614. determining the user indication from an audio signal produced by the user by detecting which part of the piece of music the user is performing;

[0127] 615. maintaining in the music document music sheets having lines and notes marked therein and adapting the music sheets by removing staves not needed for performing by the user;

[0128] 616. detecting which part of the piece of music the user is performing at least in part by recognizing what the user is singing, using lyrics of the music document;

[0129] 617. detecting which part of the piece of music the user is performing at least in part using speech recognition and determining a verbal indication of the destination portion;

[0130] 618. in the determining of the user indication from the audio signal produced by the user, employing symbolic information of the music document, which symbolic information describes the piece of music, wherein the symbolic information may comprise an indication of notes or chords, and/or an indication of rhythm;

[0131] 619. adding supplementary symbolic information to the music document allowing the user to indicate the destination portion by performing accordingly, such as extraneous notes to be performed for providing the user indication, optionally comprising an indication of notes or chords, and/or rhythm, wherein the supplementary symbolic information may be made unique within the music document, such as given otherwise unused notes, chords, or sequences of notes or chords;

[0132] 620. identifying in the music document a plurality of structurally associated jump sources and jump targets;

[0133] 621. identifying one or more of the structurally associated jump sources in the source portion, and responsively identifying and displaying in respective target views one or more associated jump targets;

[0134] 622. in the identifying of the destination portion, receiving a user indication of the destination portion from another person, such as a leader of a band or orchestra;

[0135] 623. in the identifying of the destination portion, receiving a plurality of indications;

[0136] 624. probabilistically determining the destination portion as the one that is most likely based on the plurality of indications;

[0137] 625. hierarchically prioritizing some of the plurality of indications;

[0138] 626. in the prioritizing, weighing indications that are more reliable more heavily than indications that are less reliable, wherein the determination of reliability optionally is predetermined or assessed on forming one or more of the plurality of indications;

[0139] 627. displaying the main view and the target views in an order corresponding to that of an original musical document;

[0140] 628. in the displaying of the main view and the one or more target views, user-perceivably indicating discontinuations of displayed parts of the music document;

[0141] 629. in the user-perceivably indicating, displaying a border;

[0142] 630. in the displaying of the main view and the one or more target views, adjusting the main view;

[0143] 631. in the adjusting of the main view, shrinking the main view in the direction of a potential jump target to free display area;

[0144] 632. adding the respective target view using the freed display area;

[0145] 633. performing the shrinking of the main view gradually to free display area by degree;

[0146] 634. performing the adding of the respective target view gradually using the freed display area;

[0147] 635. vertically dividing the display area of the display into the main view and the one or more target views;

[0148] 636. vertically dividing the display area of the display into the main view and the one or more target views so that at least one stave system height is provided for each of the main view and the one or more target views;

[0149] 637. scrolling the music document in the main view on performing the piece of music;

[0150] 638. expanding the target view of the chosen jump target to the main view on the updating of the display;

[0151] 639. unrolling the music document to contain fewer jumps when performed according to the music document;

[0152] 640. limiting the number of potential target views presented to the user.
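Options 623 to 626, fusing several indications probabilistically with reliability-based weighting, can be sketched as a weighted vote. The confidence and weight values below are illustrative assumptions, as is the simple product scoring:

```python
def combine_indications(indications):
    """Fuse (destination, confidence, reliability_weight) triples from
    different sources (e.g., gaze, audio analysis, a timer) and return
    the destination with the highest weighted score."""
    scores = {}
    for destination, confidence, weight in indications:
        scores[destination] = scores.get(destination, 0.0) + confidence * weight
    return max(scores, key=scores.get)
```

A reliable source such as gaze tracking can thus outvote a higher-confidence but less reliable source such as a coarse time-based estimate.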

[0153] Any of the afore-described methods, method steps, or combinations thereof may be controlled or performed using hardware; software; firmware; or any combination thereof. The software and/or hardware may be local; distributed; centralized; virtualized; or any combination thereof. Moreover, any form of computing, including computational intelligence, may be used for controlling or performing any of the afore-described methods, method steps, or combinations thereof. Computational intelligence may refer to, for example, any of artificial intelligence; neural networks; fuzzy logics; machine learning; genetic algorithms; evolutionary computation; or any combination thereof.

[0154] Various embodiments have been presented. It should be appreciated that in this document, the words "comprise", "include", and "contain" are each used as open-ended expressions with no intended exclusivity.

[0155] The foregoing description has provided by way of non-limiting examples of particular implementations and embodiments a full and informative description of the best mode presently contemplated by the inventors for carrying out the invention. It is however clear to a person skilled in the art that the invention is not restricted to details of the embodiments presented in the foregoing, but that it can be implemented in other embodiments using equivalent means or in different combinations of embodiments without deviating from the characteristics of the invention.

[0156] Furthermore, some of the features of the afore-disclosed example embodiments may be used to advantage without the corresponding use of other features. As such, the foregoing description shall be considered as merely illustrative of the principles of the present invention, and not in limitation thereof. Hence, the scope of the invention is only restricted by the appended patent claims.