VOCAL PROCESSING WITH ACCOMPANIMENT MUSIC INPUT
20170221466 · 2017-08-03
Assignee
Inventors
Cpc classification
G10H1/383
PHYSICS
G10H2210/335
PHYSICS
G10H2210/081
PHYSICS
G10H2210/066
PHYSICS
G10H1/366
PHYSICS
G10H2210/245
PHYSICS
G10H2210/331
PHYSICS
G10H2220/211
PHYSICS
G10H1/361
PHYSICS
G10H2210/261
PHYSICS
International classification
Abstract
Systems, including methods and apparatus, for generating audio effects based on accompaniment audio produced by live or pre-recorded accompaniment instruments, in combination with melody audio produced by a singer. Audible broadcast of the accompaniment audio may be delayed by a predetermined time, such as the time required to determine chord information contained in the accompaniment signal. As a result, audio effects that require the chord information may be substantially synchronized with the audible broadcast of the accompaniment audio. The present teachings may be especially suitable for use in karaoke systems, to correct and add sound effects to a singer's voice that sings along with a pre-recorded accompaniment track.
Claims
1-20. (canceled)
21. A method of generating a musical harmony signal, comprising: scanning a pre-recorded accompaniment track stored in digital form on an electronic device; analyzing the pre-recorded accompaniment track to determine musical key data for the pre-recorded accompaniment track; broadcasting the pre-recorded accompaniment track to a user; receiving melody notes from the user; generating a harmony signal harmonized to a musical key determined by the musical key data of the pre-recorded accompaniment track and a melody note received from the user; transmitting the harmony signal to an output mechanism to produce output harmony audio; streaming an accompaniment audio signal corresponding to the pre-recorded accompaniment track to the output mechanism to produce accompaniment audio synchronized with the output harmony audio.
22. The method of claim 21, wherein analyzing the pre-recorded accompaniment track to determine musical key data includes detecting chord changes in the accompaniment track, and evaluating each chord change to determine whether to use the chord change to generate the harmony signal.
23. The method of claim 22, wherein evaluating each chord change includes determining if a duration of the change is less than a predetermined threshold, and wherein generating the harmony signal ignores chord changes having durations less than the predetermined threshold.
24. The method of claim 23, wherein the predetermined threshold is in the range of one to three seconds.
25. The method of claim 21, further comprising entering a pre-recorded accompaniment mode.
26. The method of claim 25, further comprising evaluating the accompaniment track to determine whether the accompaniment track is pre-recorded, before entering the pre-recorded accompaniment mode.
27. The method of claim 26, wherein evaluating the accompaniment track to determine whether the accompaniment track is pre-recorded includes recognizing a drum beat.
28. The method of claim 21, wherein streaming the accompaniment audio signal to the output mechanism is delayed by a time sufficient to synchronize the accompaniment audio with the output harmony audio.
29. The method of claim 21, further comprising transmitting the melody notes to the output mechanism to produce melody audio synchronized with the output harmony audio.
30. The method of claim 21, further comprising correcting a pitch of at least one of the melody notes to create pitch-corrected melody notes, and transmitting the pitch-corrected melody notes to the output mechanism to produce melody audio synchronized with the output harmony audio.
31. A harmony generating method, comprising: causing a digital signal processor to: (i) analyze a pre-recorded accompaniment track stored in digital form on an electronic device, to determine chord information for the pre-recorded accompaniment track; (ii) after determining chord information for the pre-recorded accompaniment track, broadcast the pre-recorded accompaniment track to a user; (iii) receive a melody audio signal produced by the user's voice; (iv) generate harmony notes based on the chord information and the melody audio signal; (v) transmit harmony notes to an audio output mechanism; and (vi) transmit an accompaniment audio signal corresponding to the pre-recorded accompaniment track to the audio output mechanism to produce accompaniment audio synchronized with the harmony notes.
32. The method of claim 31, wherein determining chord information includes detecting chord changes and evaluating each chord change to determine if a duration of the change is less than a predetermined threshold, and wherein generating harmony notes ignores chord changes having durations less than the predetermined threshold.
33. The method of claim 32, wherein the predetermined threshold is less than three seconds.
34. The method of claim 31, further comprising causing the digital signal processor to transmit the melody audio signal to the audio output mechanism to produce melody audio synchronized with the accompaniment audio and the harmony notes.
35. The method of claim 31, further comprising causing the digital signal processor to correct a pitch of at least one melody note of the melody audio signal to create a pitch-corrected melody audio signal, and to transmit the pitch-corrected melody audio signal to the audio output mechanism to produce pitch-corrected melody audio.
36. The method of claim 35, wherein the pitch-corrected melody audio is synchronized with the accompaniment audio and the harmony notes.
37. A method of generating audio signals with a digital signal processor, comprising: with a digital signal processor, analyzing a musical accompaniment track to determine musical key information associated with the track; after determining musical key information associated with all of the accompaniment track, broadcasting the accompaniment track to a user with the digital signal processor; with the digital signal processor, receiving melody notes produced by the user; with the digital signal processor, correcting a pitch of at least one of the melody notes to create pitch-corrected melody notes harmonized to the musical key information associated with the accompaniment track; with the digital signal processor, transmitting the pitch-corrected melody notes and the accompaniment track to an output mechanism, wherein the pitch-corrected melody notes and the accompaniment track are synchronized when produced by the output mechanism.
38. The method of claim 37, wherein the key information includes key changes, and wherein the digital signal processor is configured to ignore key changes lasting less than a predetermined threshold duration.
39. The method of claim 37, further comprising, with the digital signal processor, generating a synthesized harmony signal harmonized to the pitch-corrected melody notes and the musical key information associated with the accompaniment track, and transmitting the synthesized harmony signal to the output mechanism to produce synthesized harmony audio, wherein the pitch-corrected melody notes, the accompaniment track and the synthesized harmony audio are synchronized when produced by the output mechanism.
40. The method of claim 37, wherein the digital signal processor is configured to create pitch-corrected melody notes corresponding to melody notes which are not in a key determined by the musical key information associated with the accompaniment track.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0012]
[0013]
[0014]
[0015]
[0016]
DETAILED DESCRIPTION
[0017] To overcome the issues described above, among others, the present teachings disclose improvements to existing methods and apparatus for vocal processing, including live harmony and pitch correction effects. Specifically, the present teachings disclose (1) a new method of pre-recorded accompaniment track analysis, (2) delaying the audible output of a pre-recorded track for at least the time required to accurately synchronize harmony and pitch-corrected voices to a spectrally detected chord in an associated pre-recorded accompaniment track, (3) using the synchronization buffer or delay, or a longer one, to reduce or eliminate harmony generation and pitch correction responses to briefly detected harmonics that are inconsistent with the playing pre-recorded accompaniment track and with recorded track structure, statistics and theory, (4) scanning libraries of songs on a device or service and storing the scale and key information associated with each song, (5) using the data scanned in advance to further inform the user about the detected key and scale information, and (6) presenting the detected key(s) and scale(s) to the user for confirmation and for selection of preferred settings among the key and scale information detected by the advance scanning.
I. Distinguishing Live Input vs. Pre-Recorded Processing
[0018] According to one aspect of the present teachings, two distinct types of musical inputs are identified separately. Live and pre-recorded accompaniment may be processed in a different manner for purposes of generating more accurate harmony notes and pitch correction. Live performance input, such as a live guitar player's guitar input, will continue to require the current standard of low latency and generally non-interpreted spectral processing response for accompaniment data. That data is typically a single instrument musical input source, such as a guitarist playing a live guitar and singing with live harmony and pitch correction from the device.
[0019] According to one aspect of the present teachings, accompaniment music received at a signal processor may not be immediately amplified and played through a loudspeaker, but rather amplification may be delayed for at least the time it takes for the spectral content of the received signal to be analyzed and harmony notes and pitch correction to be generated. As a result, harmony notes may be produced that are essentially fully synchronized with the amplified accompaniment and with the melody notes or pitch-corrected notes, even after a chord change.
[0020] In the new approach, pre-recorded accompaniment music is distinguished from live accompaniment as a different species of musical accompaniment input driving the vocal processing algorithm. Pre-recorded song accompaniment can also be spectrally processed differently for lead notes, chords, keys, and the like by analyzing the music before it is played to the performer. Any spectral data that is musically inconsistent, based on commercial song structure and other factors, can then be filtered and potentially rejected, producing highly accurate and musically correct pitch and harmony generation data before the audio is audibly played to the user. In other words, buffering or delaying the accompaniment audio (e.g., analyzing the future accompaniment signal and comparing it to the dominant spectral data) provides more accurate harmonization and pitch correction for pre-recorded songs than previous minimally interpretive live methods. In the live accompaniment analysis process, the detection and processing of the key and scale information of the musical source will be less accurate, because the window of time available to analyze and produce a result must be very narrow to achieve as close to zero latency as possible for live performance.
[0021] In some cases, with a sonically complex multi-instrument recorded accompaniment, a momentary incorrect lead note, scale, or chord change can occur as the result of the system incorrectly detecting a momentary sonic combination of instruments and track vocals, noise, fidelity and other variables. That could result in the system changing the entire key of pitch correction and harmony voices to an incorrect key. With the proposed advance song accompaniment processing method, incorrect brief, repeated and/or sudden detections of lead note, scale or key changes that resolve quickly to the previous or dominant key, note and scale data can potentially be filtered and ignored, whereby the current dominant key, scale or lead note remains uninterrupted, resulting in significantly fewer unwanted, harmonically dissonant system-generated tones and harmonies.
[0022] In a further extension of the present teachings, scanning up to an entire pre-recorded accompaniment track or library of accompaniment tracks on a device and deriving note, key and scale data may be implemented. The extent and duration of this pre-scanning can have any desired time scale to suit a particular application. For example, it can be short in duration, such as 100-200 milliseconds, or it can be one second, three seconds or much longer, including pre-scanning the entire track to produce a data result. Any amount of advance track scanning or delay can improve the accuracy of harmony, pitch correction and time synchronization processing relative to the music accompaniment. Pre-scanning, buffering or delaying a song track before it is played to the performer allows a larger future data segment to be used to determine the most accurate spectral information for pre-recorded song accompaniment. This includes omitting frequent brief or lengthy harmonic anomalies found during spectral analysis that are statistically inconsistent with standard multi-instrument and vocal songs, such as rapid key changes or musically dissonant chord data.
II. Audio Signal Delay for Pre-Recorded Accompaniment Music
[0023] As mentioned above, determining the current chord or other spectral data in an accompaniment signal takes a signal processor and harmony generator a finite amount of time, typically around 200 milliseconds. In preexisting harmony generation systems used with live music sources, that processing time is a source of inherent lack of synchronization of the generated harmony notes with the original melody and the accompaniment track. While this problem will always be present with live instrument accompaniment such as a guitar input, the present teachings overcome this problem for pre-recorded accompaniment by playing the track and delaying that musical output.
[0024] More specifically, harmony voices create a chord with the original melody voice. When chords in the pre-recorded accompaniment music change, the chords created by the melody and harmony voices ideally should change at the same time, rather than at some later time. However, in current live harmony generation systems, the input accompaniment signal is typically amplified immediately, whereas the harmony notes are determined and amplified later and are asynchronous. Therefore, in existing systems, synthesized harmony notes are often not synchronized with the detected chords in the original musical accompaniment signal. This can result in a certain discordant sound in the combined amplified output for a finite time after a chord change in the accompaniment audio.
[0025]
[0026] According to the present teachings, the amplified output accompaniment signal 18, including both the original accompaniment audio and any synthesized harmony notes, may be delayed relative to the input audio signal by a predetermined time, as depicted in
[0027] The block diagram of
[0028] The singer then sings in conjunction with the delayed loudspeaker output, so that the singer's melody signal 62 will be highly synchronized with the latest accompaniment chord that has already been analyzed. The singer's current melody note may be used in conjunction with the analyzed chord to generate harmony notes and/or pitch-corrected melody notes, collectively indicated at 64, with a digital signal processor 66 virtually immediately, resulting in essentially synchronized amplification of the singer's melody note or pitch corrected note, the accompaniment chord or notes, and processor generated harmony notes generated using the present melody and accompaniment data.
[0029] In other words, the presently described system provides a sufficient delay or buffer of the pre-recorded accompaniment song so that the singer's output and the accompaniment output is synchronized. The additional buffer window further provides the accompaniment spectral algorithm significantly more time to accurately interpret and process complex multi-instrument music. Although two separate digital signal processors 54 and 66 are shown in
III. Spectral Analysis Techniques for Pre-Recorded Accompaniment Music
[0030]
[0031] Method 100 allows for a comparatively longer analysis of spectral (i.e., musical note) information, which can even include future accompaniment spectral data and lead notes. Controlling harmony generation and pitch correction with the standard live method using pre-recorded accompaniment of any playable multi-instrument commercial song produces serious inaccuracies because this music source type is the most spectrally complex to analyze accurately in real time. Brief and quickly alternating spectral and harmonic interpretation errors occur due to the complex harmonics of a given music track or for other reasons. These errors are amplified immediately causing incorrect pitch correction and harmony generation. Unlike live performance and live music structure, these events in a pre-recorded song are highly likely to be incorrect data or noise and need to be buffered and filtered for a period of time while the system, for example, maintains the previous and musically correct consistent data. Therefore, in conjunction with the novel delay feature for harmony synchronization, further new methods of controlling and potentially limiting harmony and pitch correction responsiveness are required to greatly improve accuracy. Live instrument methods are insufficient.
[0032] This new method draws on statistical data about commercial song structure, such as the fact that commercial songs generally stay in one key from the detected song start point. When most commercial songs change key, the key is maintained for a significant period of time. Incorrect musical spectral interpretation occurs frequently with pre-recorded songs, when inadvertent notes or other types of noise are incorrectly interpreted as a key change. The harmony and pitch algorithm in the new method analyzes the future segment of the audible track to omit these errors, relying on the consistency of pre-recorded music structure. Since a novice user can select any pre-recorded song to sing along with and to serve as the source that controls the harmony and pitch correction, the new method directs the pitch correction and harmony note response to buffer sudden, inconsistent accompaniment data in accordance with known commercial music standards.
[0033] Furthermore, sonically complex pre-recorded accompaniment songs can be spectrally analyzed in a manner whereby moments of musically inconsistent spectral analysis data (errors) are expected by the control algorithm, and the pitch correction and/or harmony generation can be controlled to ignore these brief errors and maintain the current and future (scanned in advance) dominant musical features.
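By way of illustration only, the notion of a dominant key can be sketched as a duration-weighted vote over key estimates obtained by scanning the track in advance. The function name `dominant_key` and the (label, duration) segment representation are hypothetical and are not part of the present teachings; the sketch assumes that a separate analysis stage has already labeled each segment.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Segment = Tuple[str, float]  # (detected key label, duration in seconds); assumed format


def dominant_key(segments: List[Segment]) -> str:
    """Return the key label that accounts for the most total time in the scan."""
    if not segments:
        raise ValueError("at least one segment is required")
    totals: Dict[str, float] = defaultdict(float)
    for label, duration in segments:
        totals[label] += duration
    # The label with the largest accumulated duration is treated as dominant.
    return max(totals, key=lambda label: totals[label])


scan = [("G", 40.0), ("E", 0.5), ("G", 30.0), ("A", 20.0)]
print(dominant_key(scan))
```

In this sketch, a half-second detection of "E" contributes almost nothing to the vote, which is consistent with treating such moments as likely errors rather than key changes.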
[0034] At step 102, an accompaniment track or library of accompaniment tracks is provided. At step 104, a desired accompaniment track or set of provided accompaniment tracks is scanned and analyzed by a signal processor to determine its spectral information. Because there is no urgency to accomplish this in order to synchronize with live playing of accompaniment instruments, time is provided to confirm accurate spectral information and filter potentially erroneous and musically incorrect spectral data. In the case of a detected and potentially erroneous harmonic data point, both pitch correction and harmony generation can be maintained to the previous data point, or only the pitch or scale correction can be maintained to the previous data point while the harmony generation is allowed to follow the potentially erroneous chord data point, balancing the risk that at least one of the two will be musically correct. Moreover, with the additional time that can be spent on spectral analysis, confirming a song key or chord change can be performed accurately and consistently.
[0035] At step 106, melody notes are received, typically produced by a karaoke singer's voice, and harmony notes and pitch corrected notes are generated based on the melody notes in conjunction with the recently analyzed accompaniment music. The system maintains output of current key/scale and chord during the buffer period. Also, if a singer is detected as holding a note for a duration of time determined to be a held or sustained note, the algorithm can maintain at least the initial pitch corrected note steady and in some cases the harmony notes can also be maintained, briefly ignoring other conflicting spectral information.
[0036] More specifically, according to the present teachings, the performer's held note data may be interpreted by the effects processing algorithm as strongly intending to hold that distinct note, and possibly also to hold the current harmony combination, temporarily overriding any conflict with the key and chord data. The algorithm can resume processing after the held note is released. Rapidly adjusting or pitch correcting a held or sustained note and potentially an associated harmony drastically to another note in the scale or a different key would confuse the performer who obviously intended to maintain those notes and harmonies. Also during this time, additional techniques may be applied to avoid unpleasant harmony or pitch generation, such as by maintaining the output of the current or dominant scale, key and chord data.
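The held-note behavior described above might be sketched as follows, purely as an illustration. The function name `hold_corrected_notes`, the per-frame (sung note, proposed correction) representation, and the frame-count threshold are all assumptions introduced here; notes are given as MIDI note numbers for convenience.

```python
from typing import Iterable, List, Optional, Tuple


def hold_corrected_notes(
    frames: Iterable[Tuple[int, int]], hold_frames: int = 3
) -> List[int]:
    """Keep the initial corrected note steady once a sung note counts as held.

    Each frame is (sung_note, proposed_correction). After the same sung note
    has persisted for `hold_frames` frames, later proposals are ignored until
    the singer moves to a different note.
    """
    output: List[int] = []
    last_sung: Optional[int] = None
    run_length = 0
    initial_correction = 0
    for sung, proposed in frames:
        if sung != last_sung:
            # A new note starts: remember the correction chosen at its onset.
            last_sung, run_length, initial_correction = sung, 0, proposed
        run_length += 1
        held = run_length >= hold_frames
        output.append(initial_correction if held else proposed)
    return output


frames = [(60, 60), (60, 60), (60, 60), (60, 61), (60, 61), (62, 62)]
print(hold_corrected_notes(frames))
```

Here the fourth and fifth frames propose moving the corrected note to 61, for example because of a conflicting chord detection, but the output stays at 60 until the singer releases the note.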
[0037] At step 108, an evaluation is performed to determine if the current key and scale of the melody notes should be maintained, or if they should be adjusted, and any adjustment is performed. For example, step 108 may include determining if a current melody note is musically complementary with the current accompaniment note, i.e., falls within the same key. In addition, step 108 may include determining if the key of the current accompaniment note is a reliable indication of the accompaniment key, or if it is an anomaly based on a mistake or inadvertent key change in the accompaniment music. This can be accomplished by evaluating the duration of the accompaniment key and ignoring key changes of sufficiently short duration. Because the accompaniment music may be analyzed in advance, evaluating the duration of the accompaniment key can also be done in advance. It need not be done at the instant a particular melody note is sung and detected.
[0038] For example, key changes or detected dissonant chord detection anomalies in the accompaniment music of fewer than three seconds, fewer than two seconds, or under any other desired time threshold may be ignored for purposes of performing corrections to the current melody note and/or harmony notes. If, however, an accompaniment key change is determined to be an actual, intentional key change in the music, then the melody note can be adjusted into the proper key if necessary. Furthermore, if it is determined that the melody note is already in the proper key but is off-pitch (i.e., sharp or flat), the melody note also may be shifted to correct its sound. Pitch shifting of melody notes may be accomplished, for example, using the well-known technique of pitch synchronous overlap and add (PSOLA). A description of this technique is found, for instance, in U.S. Patent Application Publication No. 2008/0255830, which is hereby incorporated by reference for all purposes. Additional pitch shifting methods are disclosed, for example, in U.S. Pat. No. 5,973,252, which is also hereby incorporated by reference for all purposes.
[0039] At step 110, the generated harmony notes and the melody, including any pitch correction, are synchronized with the accompaniment track. Finally, at step 112, the accompaniment track, the vocal harmonies, and the originally sung melody notes with possible pitch correction and/or other chosen sound effects are all output, for instance through an output jack or directly from a speaker integrated with a harmony generating karaoke device.
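As a non-limiting illustration of choosing the target pitch for such a correction, the following sketch snaps a detected melody frequency to the nearest note of a given key. It computes only the target frequency and does not perform the pitch shifting itself (for example by PSOLA). The function names, the use of MIDI note numbers with A4 = 440 Hz, and the restriction to a major scale are assumptions made for the example.

```python
import math
from typing import Sequence

MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets from the tonic


def hz_to_midi(freq_hz: float) -> float:
    """Convert a frequency to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def midi_to_hz(midi: float) -> float:
    """Convert a MIDI note number back to a frequency in hertz."""
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)


def snap_to_key(
    freq_hz: float, tonic_pc: int, scale: Sequence[int] = MAJOR_SCALE
) -> float:
    """Return the frequency of the in-key note nearest to freq_hz.

    tonic_pc is the pitch class of the tonic (0 = C, 1 = C#, ..., 11 = B).
    """
    midi = hz_to_midi(freq_hz)
    allowed = {(tonic_pc + step) % 12 for step in scale}
    center = round(midi)
    # A major scale has no gap wider than two semitones, so a window of
    # two semitones either side always contains at least one in-key note.
    candidates = [n for n in range(center - 2, center + 3) if n % 12 in allowed]
    target = min(candidates, key=lambda n: abs(n - midi))
    return midi_to_hz(target)


# A slightly sharp A4 in the key of C major is pulled back to 440 Hz.
print(snap_to_key(450.0, tonic_pc=0))
```

The same function covers both cases described above: a note that is in key but sharp or flat is moved to its exact pitch, and a note that falls between scale degrees is moved to the nearest one.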
IV. Additional Examples
[0040]
[0041] At step 216, the user selects a single accompaniment track for an immediate performance. At step 218, the track accompaniment begins to play but is not audible to the user. Instead, at step 220, a delay buffer stores the track in memory for at least the time required to synchronize the harmony and pitch correction output with the latest detected chord accompaniment, and perhaps longer. During this time, at step 222, the spectral analysis algorithm of the effects processor attempts to determine the current key, scale and chord in the accompaniment song. Special pre-recorded song based filters and algorithms are enabled for this purpose, which are different from live guitar input algorithms. At step 224, the accompaniment is broadcast audibly to the user, for example through a loudspeaker, and at step 226, the processor receives melody notes sung by the user.
[0042] At step 228, the processor detects a key, chord, or lead note change in the accompaniment audio and/or in the melody notes, and evaluates the change to determine whether to accept the change for purposes of harmony generation and/or pitch correction. If the duration of the change is less than a predetermined threshold duration, such as three seconds, two seconds, one second, or any other desired threshold, the algorithm ignores the change and maintains the current or dominant key, chord or lead note data. On the other hand, if a change is detected for a consistent duration past the threshold, the algorithm may accept the change for purposes of harmony generation and pitch correction.
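The threshold evaluation of step 228 might be sketched as shown below, assuming that the advance analysis yields a list of (label, duration) segments for the detected key, chord, or lead note. The function name `filter_short_changes` and the segment format are hypothetical; the two-second default is one of the example thresholds mentioned above.

```python
from typing import List, Tuple

Segment = Tuple[str, float]  # (detected label, duration in seconds); assumed format


def filter_short_changes(
    segments: List[Segment], min_duration: float = 2.0
) -> List[Segment]:
    """Ignore changes shorter than min_duration by keeping the current label."""
    result: List[Segment] = []
    for label, duration in segments:
        if result and (duration < min_duration or label == result[-1][0]):
            # Too brief to accept, or no change at all: extend the current segment.
            prev_label, prev_duration = result[-1]
            result[-1] = (prev_label, prev_duration + duration)
        else:
            # The first segment, or a change that persists past the threshold.
            result.append((label, duration))
    return result


detected = [("C", 8.0), ("F#", 0.5), ("C", 6.0), ("G", 5.0)]
print(filter_short_changes(detected))
```

Because the accompaniment is analyzed before it is heard, the duration of each detected change is already known when the decision is made, which is what allows a brief "F#" detection to be discarded without first reacting to it.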
[0043] At step 230, the processor generates harmony notes and makes any pitch correction deemed necessary. Since the buffered delay of the audible audio is at least the time to spectrally analyze the accompaniment track and generate the harmony notes and pitch corrected notes, the harmony notes and accompaniment chords are synchronized. When the track accompaniment ends, at step 232 a duration of silence can be detected by the spectral algorithm. At step 234, the processor then can potentially reset or remove any previous spectral history. Upon recognition of a starting track from a period of silence, a new spectral history for that song can begin to be stored, returning to step 210 of the method.
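The reset of spectral history after a period of silence (steps 232 and 234) could be sketched as follows. The class name `SpectralHistory`, the RMS-based silence test, and the specific threshold values are illustrative assumptions only; the present teachings do not specify how silence is measured.

```python
import math
from typing import List, Sequence


def rms(block: Sequence[float]) -> float:
    """Root-mean-square level of one block of audio samples."""
    if not block:
        return 0.0
    return math.sqrt(sum(x * x for x in block) / len(block))


class SpectralHistory:
    """Stores per-block analysis labels and clears them after sustained silence."""

    def __init__(self, silence_blocks_to_reset: int = 4, threshold: float = 1e-3) -> None:
        self.silence_blocks_to_reset = silence_blocks_to_reset
        self.threshold = threshold
        self.labels: List[str] = []
        self._silent_run = 0

    def update(self, block: Sequence[float], label: str) -> None:
        if rms(block) < self.threshold:
            self._silent_run += 1
            if self._silent_run >= self.silence_blocks_to_reset:
                # Enough consecutive silent blocks: treat the track as ended.
                self.labels.clear()
        else:
            # Audible audio: any silent run is over, so record the new label.
            self._silent_run = 0
            self.labels.append(label)


history = SpectralHistory(silence_blocks_to_reset=2)
history.update([0.5, -0.5], "C")
history.update([0.0, 0.0], "N")
history.update([0.0, 0.0], "N")
print(history.labels)  # history has been cleared after two silent blocks
```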
[0044]
[0045] System 300 includes a chord detection circuit 302, which also may be referred to simply as a chord detector, a harmony processing circuit 304, which may be referred to more generally as a note generator, and a delay circuit 306, which also may be referred to as a delay unit. In some cases, chord detection circuit 302, harmony processing circuit 304 and delay circuit 306 all may be portions of a digital signal processor, as indicated at 308. Furthermore, digital signal processor 308 may be integrated into a karaoke machine 310, along with other components such as an amplifier 312, a loudspeaker 314 and/or a microphone 316.
[0046] Chord detection circuit 302 is configured to receive and analyze an accompaniment audio signal, and to determine chord information corresponding to a chord of the accompaniment audio signal. In other words, the chord detector is configured to receive an accompaniment audio signal, to analyze the accompaniment audio signal to determine chords contained within the accompaniment audio signal, and to produce chord information corresponding to the chords that have been determined. This process generally takes a particular duration of time, which is typically on the order of hundreds of milliseconds, such as 200 ms.
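For illustration, one simple way a chord detector could map spectral content to chord information is template matching on a 12-bin pitch-class energy vector (a chroma vector). This technique is introduced here only as an example and is not asserted to be the method used by chord detection circuit 302; the function name `detect_chord`, the restriction to major and minor triads, and the assumption that a chroma vector has already been computed are all hypothetical.

```python
from typing import Sequence, Tuple

NOTE_NAMES = ("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")
TRIADS = {"maj": (0, 4, 7), "min": (0, 3, 7)}  # semitone offsets from the root


def detect_chord(chroma: Sequence[float]) -> Tuple[str, float]:
    """Return (chord name, score) for the best-matching major or minor triad.

    chroma holds the energy of each of the 12 pitch classes, starting at C.
    The score is the fraction of total energy that falls on the chord tones.
    """
    if len(chroma) != 12:
        raise ValueError("chroma must have exactly 12 entries")
    total = sum(chroma)
    if total <= 0:
        return ("N", 0.0)  # no energy: report "no chord"
    best_name, best_score = "N", 0.0
    for root in range(12):
        for quality, intervals in TRIADS.items():
            score = sum(chroma[(root + i) % 12] for i in intervals) / total
            if score > best_score:
                best_name, best_score = NOTE_NAMES[root] + quality, score
    return best_name, best_score


# Energy concentrated on C, E and G matches a C major triad.
print(detect_chord([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]))
```

The score returned by this sketch could also serve as a confidence value, so that weakly matched detections are candidates for being ignored by later stages.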
[0047] Harmony processing circuit or note generator 304 is configured to receive and analyze the chord information produced by the chord detector along with melody notes received from a singer, and to produce a synthesized harmony signal corresponding to each detected chord and melody note. The harmony signal will be harmonized to the chord of the accompaniment audio signal and the melody note, and the harmony processing circuit is typically configured to transmit the harmony signal to a loudspeaker to produce harmony audio.
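A minimal sketch of how harmony notes might be selected from a detected chord and a melody note is given below. Choosing the nearest chord tones above the melody is only one of many possible voicing rules and is an assumption made for this example, as are the function name `harmony_notes` and the use of MIDI note numbers.

```python
from typing import List, Sequence


def harmony_notes(
    melody_midi: int, chord_pcs: Sequence[int], voices: int = 2
) -> List[int]:
    """Pick the nearest chord tones above the melody note as harmony voices.

    chord_pcs lists the pitch classes of the current chord (0 = C ... 11 = B).
    """
    chord = {pc % 12 for pc in chord_pcs}
    if not chord:
        return []  # no chord information: generate no harmony
    notes: List[int] = []
    candidate = melody_midi + 1
    while len(notes) < voices:
        if candidate % 12 in chord:
            notes.append(candidate)
        candidate += 1
    return notes


# Melody on C4 (60) over a C major chord gives E4 and G4.
print(harmony_notes(60, [0, 4, 7]))
```

Because the harmony voices are drawn only from chord tones, they change when the accepted chord changes, which is why the timing of the chord information relative to the audible accompaniment matters.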
[0048] Delay circuit or unit 306 is configured to receive the accompaniment audio signal, and to store the accompaniment audio signal in memory for a predetermined delay time until the chord detector produces the chord information. The delay circuit is further configured to stream the accompaniment audio signal to the loudspeaker after the predetermined delay time has lapsed to produce accompaniment audio. In some cases, the predetermined delay time approximates the duration of time required for the chord detector to extract chord information from the accompaniment audio signal. In other cases, the delay time may be longer, and may allow for additional analysis of the accompaniment audio.
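The behavior of such a delay unit can be illustrated with a simple sample-based delay line. The class name `DelayLine`, the helper `delay_in_samples`, and the 44.1 kHz sample rate used in the usage lines are assumptions for the example and are not specified by the present teachings.

```python
from collections import deque
from typing import Deque


def delay_in_samples(delay_ms: float, sample_rate_hz: int) -> int:
    """Convert a delay time in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate_hz / 1000)


class DelayLine:
    """Holds each input sample in memory and releases it delay_samples later."""

    def __init__(self, delay_samples: int) -> None:
        if delay_samples < 0:
            raise ValueError("delay_samples must be non-negative")
        # Pre-fill with silence so the first outputs are zeros.
        self._buffer: Deque[float] = deque([0.0] * delay_samples)

    def process(self, sample: float) -> float:
        """Store one accompaniment sample and return the delayed output sample."""
        self._buffer.append(sample)
        return self._buffer.popleft()


# A 200 ms delay at an assumed 44.1 kHz sample rate.
print(delay_in_samples(200, 44100))

line = DelayLine(3)
print([line.process(x) for x in [1.0, 2.0, 3.0, 4.0, 5.0]])
```

In such an arrangement the undelayed samples would be passed to the chord detector while the delayed samples are streamed to the loudspeaker, so that the chord information is available by the time the corresponding audio is heard.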
[0049] When system 300 or portions thereof are integrated into a karaoke machine such as machine 310, the accompaniment audio signal will typically be pre-recorded, and the melody notes will be received in real time from a karaoke singer using microphone 316. In this case, system 300 will be configured to generate harmony notes as quickly as possible after receiving each melody note, i.e., the system may be configured to produce the harmony signal substantially in real time with receiving and amplifying the melody note. To accomplish this, the harmony processing circuit may be further configured to transmit the melody note to the loudspeaker, along with the harmony notes and the accompaniment signal. Accordingly, system 300 may be configured to broadcast the accompaniment audio signal, the melody audio signal and any generated harmony notes through the loudspeaker substantially simultaneously.
[0050] Digital signal processor 308 also may be configured to perform other functions. For example, the digital signal processor may be configured to determine a musical key of the accompaniment audio signal and to create a pitch-corrected melody note by shifting the melody note received from the singer into the musical key of the accompaniment audio signal, and to transmit the pitch-corrected melody note to the loudspeaker. In other words, the digital signal processor (or a portion thereof, such as the note generator) may be configured to determine a pitch of the melody note and to generate a pitch-corrected melody note if the pitch of the melody note is musically inconsistent with the chord information. When pitch-shifted melody notes are generated, they may be broadcast through the loudspeaker in place of the corresponding original melody notes, which have presumably been determined to contain a pitch error. In some cases, however, the system may be configured to amplify and audibly produce both the original melody notes and the pitch-shifted notes, for instance as a method of allowing a karaoke singer to hear the correction.
[0051] In some cases, the note generator may be configured to generate a pitch-corrected melody note only based on chord information representing chord changes lasting longer than a predetermined threshold duration. That is, the note generator may be configured to ignore short-term chord changes that have a high probability of misrepresenting the overall pattern or intent of the accompaniment music. Similarly, the harmony generator may be configured to ignore such short-term chord changes. Generally speaking, short-term chord changes may be ignored for purposes of generating harmony notes, generating pitch-shifted melody notes, or both.
[0052] In addition to possibly ignoring chord changes that occur for less than a predetermined duration, signal processor 308 may be configured to ignore other types of chord information, such as chord information that is determined to represent sounds produced by percussion instruments or by other sources that are unlikely to embody a musician's intent to change chords. As in the case of short-term chord changes, such source specific chord information can be ignored for purposes of generating harmony notes, generating pitch-shifted melody notes, or both.