Method and installation for processing a sequence of signals for polyphonic note recognition

10068558 · 2018-09-04

Abstract

A method and installation are disclosed in which a time-domain digital audio signal is split into a plurality of narrow-band time-domain digital audio signals confined to specific frequency bands, short-term segments of which are temporarily stored in memory. The method uses signal processing algorithms to extract multiple signal features from said short-term segments, either in a fixed sequence or upon request from a decision-making algorithm. On the basis of the extracted features, said decision-making algorithm makes tentative or final decisions about the type of occupancy of the frequency bands. Said decision-making algorithm may request further specific feature extractions from specific short-term segments from said signal processing algorithms, and make further tentative or final decisions about the type of occupancy of frequency bands on the basis of the requested features. Said decision-making algorithm stores its tentative and final decisions about band occupancy for processing together with results from later short-term segments. Finally, said decision-making algorithm outputs final decisions derived from current and past short-term segments in the form of a set of notes having been played over some recent time interval, together with information as to the timing of each note from the set.

Claims

1. A method for processing an original time-domain digital audio signal wherein said signal is split into a plurality of narrow-band time-domain digital audio signals confined to specific frequency bands, short-term segments of which are temporarily stored in memory, the method comprising: using signal processing algorithms, extracting from said segments of said narrow-band time-domain signals, in a fixed sequence or upon request from a decision-making algorithm, one or more narrow-band time-domain features selected from a group of narrow-band time-domain features comprising instantaneous frequency or characteristics derived therefrom, instantaneous period or characteristics derived therefrom, instantaneous envelope or characteristics derived therefrom, and the time-domain positions of zero-crossings derived from sample values, directly or by interpolation, or characteristics derived therefrom; using said decision-making algorithm, making tentative or final decisions about a type of occupancy of frequency bands resulting from said narrow-band time-domain features; using said decision-making algorithm, requesting from said signal processing algorithms further specific feature extractions from specific short-term segments and making tentative or final decisions about the type of occupancy of frequency bands resulting from the requested features; using said decision-making algorithm, storing the tentative and final decisions about band occupancy for processing together with results from later short-term segments; and using said decision-making algorithm, outputting final decisions derived from current and past short-term segments in the form of a set of notes having been played over some recent time interval, together with information relating to the timing of each note from the set.

2. The method according to claim 1, wherein said decision making also takes into account the short-term power of said original time-domain digital audio signal.

3. The method according to claim 1, wherein said decision making also takes into account restrictions on band occupancy patterns based on a priori knowledge that said time-domain digital audio signal originates from a specific musical instrument with specific physical restrictions in the simultaneous playing of specific sets of notes.

4. The method according to claim 1, wherein said decision making includes, in addition to identifying the frequency bands in which the fundamental frequencies of notes are detected, continuous segment-wise estimations of the actual fundamental frequencies of the notes that have been detected, the translation of such continuous segment-wise estimations of the actual fundamental frequencies into single-note tuning information, and the ability to output this single-note tuning information.

5. The method according to claim 1, wherein said decision making includes a specific recognition of note onsets, the extraction of onset-related timing information, the calculation of deviations in timing with respect to the timing of individual notes of a pre-defined reference sequence of single or multiple notes, and the ability to output such timing information and timing deviations.

6. The method according to claim 1, wherein said decision making also includes extracting, from single-note tuning information and a priori knowledge that said time-domain digital audio signal originates from a specific musical instrument, additional information on the tuning behavior of said instrument.

7. The method according to claim 1, wherein said decision making also includes extracting information for the purpose of adaptively improving the performance of the decision making algorithm.

8. An apparatus for processing a sequence of signals wherein an original time-domain digital audio signal is split into a plurality of narrow-band time-domain digital audio signals confined to specific frequency bands, short-term segments of which are temporarily stored, with physical elements including at least a processor and a memory allowing use of signal processing algorithms for: extracting from said short-term segments one or more narrow-band time-domain features selected from a group of narrow-band time-domain features comprising instantaneous frequency or characteristics derived therefrom, instantaneous period or characteristics derived therefrom, instantaneous envelope or characteristics derived therefrom, and the time-domain positions of zero-crossings derived from sample values, directly or by interpolation, or characteristics derived therefrom, said extraction of said features taking place in a fixed sequence or upon request from a decision-making algorithm, then having said decision-making algorithm make tentative or final decisions about the type of occupancy of frequency bands resulting from said narrow-band time-domain features, then having said decision-making algorithm request from said signal processing algorithms further specific narrow-band time-domain features from specific short-term segments and make tentative or final decisions about the type of occupancy of frequency bands resulting from said requested features, said decision-making algorithm storing its tentative and final decisions about band occupancy in said memory for processing together with results from later short-term segments, and said processor further having said decision-making algorithm output final decisions derived from current and past short-term segments in the form of a set of notes having been played over some recent time interval, together with information as to the timing of each note from the set.

9. The apparatus according to claim 8, further comprising a microphone as the source of the original time-domain digital audio signal.

10. The apparatus according to claim 8, further comprising a display, and having said display visually represent the set of notes having been played over some recent time interval, together with information as to the timing of each note from the set.

Description

(1) In the following, the method is explained and described by way of examples with reference to the following figures, which show:

(2) FIG. 1 Individual oscillations represented by spectral lines;

(3) FIG. 2 Beats which can be observed within one specific narrow band occupied by two spectral lines;

(4) FIG. 3 The steps of Fourier-transform processing from signals to notes;

(5) FIG. 4 Signal processing from signals to notes using a bank of narrow-band band-pass filters;

(6) FIG. 5 An improved method for processing signals to notes using individual time sequences of signals confined to each individual band, which are stored temporarily in order for a single feature or a plurality of features to be extracted from the signals being stored in memory, either in a fixed sequence or upon request from a decision-making algorithm;

(7) FIG. 6 A specific implementation of the mechanism according to FIG. 5, in which a short segment of the time-domain output of a given frequency band is processed in order to approximate its signal envelope and to extract a frequency measurement from the signal segment's zero crossings;

(8) FIG. 7 The overall logical structure of a processor for implementing the invention.

BRIEF DESCRIPTION OF THE FIGURES

(9) FIG. 1 describes a situation in which a first note being played is represented by the sum of a fundamental oscillation and a number of harmonic oscillations, and a second note being played simultaneously is also represented by the sum of another fundamental oscillation and a number of harmonic oscillations. The individual oscillations are represented by spectral lines, and some frequency bands can be occupied by spectral lines originating from both the first and the second note.
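The band sharing described for FIG. 1 can be checked numerically. The sketch below is a minimal illustration, assuming ideal harmonic partials and a hypothetical band layout of one band per equal-tempered semitone referenced to A4 = 440 Hz (the description does not fix the band layout); `harmonics`, `band_index` and `shared_bands` are illustrative names, not part of the described method.

```python
import math

def harmonics(f0, count):
    """First `count` partials of an ideal harmonic note: f0, 2*f0, 3*f0, ..."""
    return [k * f0 for k in range(1, count + 1)]

def band_index(freq, ref=440.0, bands_per_octave=12):
    """Hypothetical band layout: one band per semitone, band 0 centred on A4."""
    return round(bands_per_octave * math.log2(freq / ref))

def shared_bands(f0_a, f0_b, count=8):
    """Bands that contain partials of both notes."""
    bands_a = {band_index(f) for f in harmonics(f0_a, count)}
    bands_b = {band_index(f) for f in harmonics(f0_b, count)}
    return sorted(bands_a & bands_b)
```

For C4 (261.63 Hz) and G4 (392.00 Hz), the third harmonic of C4 (about 784.9 Hz) and the second harmonic of G4 (784 Hz) fall into the same band, as do the sixth harmonic of C4 and the fourth harmonic of G4; `shared_bands(261.63, 392.0)` returns `[10, 22]`. Counting energy alone in those bands cannot tell which note contributed.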

(10) FIG. 2 describes the phenomenon of beats which can be observed within one specific narrow band occupied by two spectral lines with a small difference in frequency (consistent with the narrow bandwidth of the frequency band) and with approximately similar amplitudes.
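The beat envelope of FIG. 2 follows from a standard trigonometric identity. For two spectral lines at frequencies f1 and f2 (zero initial phases assumed for simplicity), first with equal amplitudes A and then with unequal amplitudes A1 and A2:

```latex
\begin{aligned}
x(t) &= A\sin(2\pi f_1 t) + A\sin(2\pi f_2 t)
      = 2A\cos\big(\pi (f_1 - f_2)\,t\big)\,\sin\big(\pi (f_1 + f_2)\,t\big), \\
e(t) &= \sqrt{A_1^2 + A_2^2 + 2A_1A_2\cos\big(2\pi (f_1 - f_2)\,t\big)}
      \quad\text{(envelope for amplitudes } A_1, A_2\text{)}.
\end{aligned}
```

The envelope swings between |A1 - A2| and A1 + A2 and repeats every 1/|f1 - f2| seconds. Because both lines lie in the same narrow band of bandwidth B, |f1 - f2| is at most B, so the beat period is at least 1/B and may be longer than a short-term segment; a single segment then typically sees a rising or falling portion of the envelope rather than a complete beat.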

(11) FIG. 3 describes the mechanism by which taking the Fourier transform (windowed or not) of a finite-length segment of a digital audio signal, then of the following adjacent finite-length segment, and so on, yields in each band one single number per segment, representing the power of all contributions of the input signal to that particular band. Performing the Fourier transform on contiguous segments and characterizing the conditions within a given band by one single number therefore entails a significant reduction of information. Deciding for each band, once per segment, whether it can be defined as a peak or not, and then processing only the positions in the frequency domain of the set of peaks so defined, amounts to a very significant reduction in the amount of information available about a given band for decision-making.
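To make the information reduction of FIG. 3 concrete, the following sketch computes, for contiguous non-overlapping segments, one power value per band from a plain DFT. It is a minimal illustration assuming a rectangular window, an O(N²) DFT written out for clarity rather than speed, and arbitrary band edges; the function names are hypothetical.

```python
import cmath
import math

def dft_band_powers(segment, sample_rate, band_edges):
    """Plain DFT of one segment, then sum |X[k]|^2 over the bins falling in each band."""
    n = len(segment)
    powers = [0.0] * (len(band_edges) - 1)
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        x_k = sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freq = k * sample_rate / n
        for b in range(len(band_edges) - 1):
            if band_edges[b] <= freq < band_edges[b + 1]:
                powers[b] += abs(x_k) ** 2
    return powers

def framewise_band_powers(signal, sample_rate, frame_len, band_edges):
    """Contiguous, non-overlapping frames -> one power value per band per frame."""
    return [dft_band_powers(signal[i:i + frame_len], sample_rate, band_edges)
            for i in range(0, len(signal) - frame_len + 1, frame_len)]
```

With 8 kHz sampling, 256-sample segments and five bands, each segment of 256 samples is reduced to five numbers. A 1 kHz tone shows up only as a large value in the 900 to 1100 Hz band, with no indication of whether one or several spectral lines produced it.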

(12) FIG. 4 describes the mechanism by which an input signal occupying a wide band of frequencies is split by a bank of band-pass filters, generating at its outputs individual time sequences of signals confined to each individual band. It is common practice, in such implementations, to measure the signal energy present in each band over a given time interval, to characterize each band as a peak or non-peak exclusively on the basis of the energy measurement, and to base the decision-making process solely on the positions in the frequency domain of the set of peaks so defined, which again amounts to a very significant reduction in the amount of information available for decision-making.
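The conventional filter-bank scheme of FIG. 4 can be sketched in the same spirit. The band-pass filters below are second-order sections with coefficients taken from Robert Bristow-Johnson's widely used "Audio EQ Cookbook" (constant 0 dB peak gain variant); the choice of filter, the Q value, the centre frequencies and the threshold are illustrative assumptions, not taken from the description.

```python
import math

def bandpass_biquad(signal, sample_rate, center, q):
    """Band-pass biquad (cookbook form, 0 dB gain at `center`), direct form I."""
    w0 = 2 * math.pi * center / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x                        # shift input history
        y2, y1 = y1, y                        # shift output history
    return out

def filter_bank(signal, sample_rate, centers, q=30.0):
    """One narrow-band time-domain output per centre frequency."""
    return [bandpass_biquad(signal, sample_rate, fc, q) for fc in centers]

def energy_peak_flags(band_signals, threshold):
    """Conventional scheme: one mean-energy number per band, then a binary peak decision."""
    energies = [sum(v * v for v in band) / len(band) for band in band_signals]
    return [e > threshold for e in energies], energies
```

Each band's output is collapsed to one energy value and a binary peak flag; the envelope shape and zero-crossing timing carried by the band signal are discarded before any decision is made.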

(13) FIG. 5 describes the fundamental mechanism by which an input signal occupying a wide band of frequencies is split by a bank of band-pass filters, generating at its outputs individual time sequences of signals confined to each individual band, which are stored temporarily in order for a single feature or a plurality of features to be extracted from the signals being stored in memory, either in a fixed sequence or upon request from a decision-making algorithm. While accumulated energy in each band can obviously be calculated with such a scheme, it is equally possible to extract information-rich band-signal characteristics such as average values, variances, maximal and minimal values, local maxima and minima, signal envelopes, parameters of polynomial approximations, interpolated values, statistics of distances between observed or calculated zero crossings, etc.
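A minimal sketch of the FIG. 5 idea, assuming band segments are kept in an in-memory dictionary and that features are plain functions looked up by name, so that only the features actually requested are computed. `BandSegmentMemory`, `FEATURES` and the feature names are hypothetical; zero crossings are located by linear interpolation between adjacent samples, which is one of the options the description mentions.

```python
import math
import statistics

def zero_crossings(seg):
    """Zero-crossing positions in samples, linearly interpolated between neighbours."""
    positions = []
    for n in range(1, len(seg)):
        a, b = seg[n - 1], seg[n]
        if a < 0 <= b or b < 0 <= a:  # sign change between samples n-1 and n
            positions.append(n - 1 + a / (a - b))
    return positions

def zc_intervals(seg):
    """Distances between consecutive zero crossings (half-periods for a sinusoid)."""
    zc = zero_crossings(seg)
    return [b - a for a, b in zip(zc, zc[1:])]

def energy(seg):
    return sum(v * v for v in seg) / len(seg)

# Hypothetical feature registry: name -> function of one stored band segment.
FEATURES = {
    "energy": energy,
    "rms": lambda s: math.sqrt(energy(s)),
    "mean": statistics.fmean,
    "variance": statistics.pvariance,
    "max": max,
    "min": min,
    "zc_interval_mean": lambda s: statistics.fmean(zc_intervals(s)),
    "zc_interval_stdev": lambda s: statistics.pstdev(zc_intervals(s)),
}

class BandSegmentMemory:
    """Temporary store of short-term band segments, keyed by (band, segment index)."""

    def __init__(self):
        self._segments = {}

    def store(self, band, index, samples):
        self._segments[(band, index)] = list(samples)

    def extract(self, band, index, names):
        """Compute only the requested features for one stored segment."""
        seg = self._segments[(band, index)]
        return {name: FEATURES[name](seg) for name in names}
```

Because the raw band samples stay in memory, a later request can ask for a feature that was not computed the first time, which is not possible once a band has been reduced to a single energy value.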

(14) FIG. 6 describes a specific implementation of this mechanism in which a short segment of the time-domain output of a given frequency band is processed in order to approximate its signal envelope and to extract a frequency measurement from the signal segment's zero crossings. In the case of a single spectral component with quasi-stationary behaviour, the envelope will be flat, apart from a possible small fluctuation caused by noise. In the case where two spectral components are present in the band, the envelope will generally show a distinct and measurable slope, because the two components beat against each other as described for FIG. 2. In other words, detecting an envelope segment whose slope is too large to have been caused by noise is a strong indication that more than one spectral line is present. On the other hand, an essentially flat envelope indicates either the presence of a single spectral component, or that of two or more spectral components whose sum is momentarily at a maximum. Further information can be extracted from the statistics of the distances measured between zero crossings. Combining information from the envelope and from a frequency measurement can contribute to a more accurate estimation of the spectral component or components present within the band over the observation segment. The observation of subsequent segments will yield additional information, for example when the sum of two or more spectral components starts to depart increasingly from its previous maximum. This simple and often very clear-cut distinction between the presence of one and that of several spectral components is not possible when peaks are defined only by the total energy present within a given band.
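A minimal sketch of the FIG. 6 test, under stated assumptions: the envelope is approximated by the local maxima of |x| (roughly one per half-cycle), a least-squares line is fitted through them, and the fitted change across the segment, relative to the mean envelope level, is compared against a hypothetical noise tolerance of 0.1. The frequency is estimated from linearly interpolated zero crossings. These choices and all function names are illustrative; the description does not prescribe a particular envelope estimator.

```python
import math

def local_peaks(seg):
    """(position, value) of local maxima of |x|: a crude envelope, about one point per half-cycle."""
    mags = [abs(v) for v in seg]
    return [(n, mags[n]) for n in range(1, len(seg) - 1)
            if mags[n] >= mags[n - 1] and mags[n] > mags[n + 1]]

def fit_slope(points):
    """Least-squares slope of y against x, plus the mean of y. Needs at least two points."""
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx, my

def zero_crossing_frequency(seg, sample_rate):
    """Average frequency from interpolated zero crossings (two crossings per period)."""
    zc = []
    for n in range(1, len(seg)):
        a, b = seg[n - 1], seg[n]
        if a < 0 <= b or b < 0 <= a:
            zc.append(n - 1 + a / (a - b))
    half_period = (zc[-1] - zc[0]) / (len(zc) - 1)
    return sample_rate / (2 * half_period)

def analyse_band_segment(seg, sample_rate, slope_tolerance=0.1):
    """Hypothetical rule: a clearly sloped envelope suggests more than one spectral line."""
    slope, mean_peak = fit_slope(local_peaks(seg))
    relative_change = slope * len(seg) / mean_peak   # fitted change across segment / mean level
    verdict = "multiple" if abs(relative_change) > slope_tolerance else "single_or_beat_maximum"
    return verdict, relative_change, zero_crossing_frequency(seg, sample_rate)
```

For a 1000 Hz tone sampled at 8 kHz, the fitted envelope is flat and the zero-crossing estimate is 1000 Hz. For equal-amplitude lines at 1000 Hz and 1010 Hz observed over samples 100 to 299, the envelope falls steeply (the beat period is 100 ms) while the zero crossings follow the 1005 Hz carrier; the frequency measurement alone would look like a single line, and it is the envelope slope that reveals the second one.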

(15) FIG. 7 describes the overall logical structure of a processor for implementing the invention. The input signal is split into narrow bands, and short-term segments are entered in a band segment signal memory. An algorithmic block for feature extraction can read the segments from memory and execute commands from a decision-making algorithmic block requesting specific features. The segment decision-making algorithmic block processes features from several simultaneous short-term segments of several bands. Features and decisions are stored short-term in a segment decision memory. A higher-level decision-making algorithmic block processes results from several short-term segments and several bands, and outputs information on notes, their timing, and chords.
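The data flow of FIG. 7 can be written as a small skeleton in which each algorithmic block is a pluggable function and the two memories are plain containers. This is a hypothetical structure for illustration only; `NoteProcessor` and the trivial energy feature, occupancy rule and onset rule used in the demo are placeholders for the richer blocks described in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Segment = List[float]
SegmentDecision = Dict[int, str]  # band -> decision label

@dataclass
class NoteProcessor:
    """Hypothetical wiring of the FIG. 7 blocks."""
    extract_feature: Callable[[Segment, str], float]                           # feature-extraction block
    decide_segment: Callable[[Dict[int, Dict[str, float]]], SegmentDecision]   # segment decision block
    decide_notes: Callable[[List[SegmentDecision]], List[Tuple[int, int]]]     # higher-level block
    requested: Tuple[str, ...] = ("energy",)
    band_memory: Dict[Tuple[int, int], Segment] = field(default_factory=dict)  # band segment signal memory
    decision_memory: List[SegmentDecision] = field(default_factory=list)       # segment decision memory

    def push(self, seg_index: int, band_segments: Dict[int, Segment]) -> List[Tuple[int, int]]:
        for band, seg in band_segments.items():
            self.band_memory[(band, seg_index)] = seg
        features = {band: {name: self.extract_feature(self.band_memory[(band, seg_index)], name)
                           for name in self.requested}
                    for band in band_segments}
        self.decision_memory.append(self.decide_segment(features))
        return self.decide_notes(self.decision_memory)

# Placeholder blocks for the demo.
def energy_feature(seg: Segment, name: str) -> float:
    return sum(v * v for v in seg) / len(seg)  # only "energy" is implemented here

def occupancy_rule(features: Dict[int, Dict[str, float]]) -> SegmentDecision:
    return {b: "occupied" if f["energy"] > 0.1 else "empty" for b, f in features.items()}

def onset_rule(history: List[SegmentDecision]) -> List[Tuple[int, int]]:
    """Report (band, segment index) wherever a band becomes occupied."""
    return [(b, i) for i, d in enumerate(history) for b, s in d.items()
            if s == "occupied" and (i == 0 or history[i - 1].get(b) != "occupied")]

proc = NoteProcessor(energy_feature, occupancy_rule, onset_rule)
```

Feeding two segments in which band 1 is active from the start and band 0 becomes active in the second segment yields the onsets `[(1, 0), (0, 1)]`.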

DETAILED DESCRIPTION OF THE INVENTION

(16) In the present invention, a set of narrow-band, time-domain signals is generated from the input signal via a band-pass filter bank, which itself can be implemented, as is well known to persons skilled in the art, either by implementing the individual filters directly, or by performing at least one part of the processing via Fourier transformation. The resulting time-domain signals are temporarily stored, thus allowing for a pre-defined or a decision-dependent extraction of relevant features from the individual narrow-band time-domain signals. No early peak/non-peak decision based on an average-energy measurement is performed.

(17) Digital signal processing algorithms are installed which can extract specific features from the individual, narrow-band time-domain signals, such as, for illustration and not as an exhaustive list, by processing short-term statistics, signal envelopes, envelope-derived signal parameter estimates, and frequency measurements and their statistics.

(18) The results of such signal processing allow a decision-making algorithm to reach tentative or final partial decisions concerning the non-occupancy, the ambiguous occupancy, and the single or multiple occupancy of individual frequency bands by spectral components, and also to represent the corresponding segments of band signals in terms of sets of parameters from signal models.
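The four occupancy categories named above can be sketched as an enumeration and a rule set. The rules, the margin of four times the noise energy and the slope tolerance are hypothetical; the parameter set returned for a single component (amplitude and frequency of one sinusoid, with amplitude equal to the square root of twice the mean energy) is one simple example of representing a band segment by a signal model.

```python
import math
from enum import Enum

class Occupancy(Enum):
    EMPTY = "non-occupied"
    AMBIGUOUS = "ambiguous"
    SINGLE = "single"
    MULTIPLE = "multiple"

def classify_band(energy, envelope_change, frequency, noise_energy,
                  snr_margin=4.0, slope_tolerance=0.1):
    """Hypothetical tentative decision for one band segment.

    Returns (Occupancy, model parameters or None). Thresholds are illustrative.
    """
    if energy < noise_energy:
        return Occupancy.EMPTY, None
    if energy < snr_margin * noise_energy:
        return Occupancy.AMBIGUOUS, None       # too close to noise to trust the envelope
    if abs(envelope_change) > slope_tolerance:
        return Occupancy.MULTIPLE, None        # sloped envelope: beating lines
    # Flat envelope: model as one sinusoid (tentative, could also be a beat maximum).
    return Occupancy.SINGLE, {"amplitude": math.sqrt(2 * energy), "frequency": frequency}
```

A band with mean energy 0.5, a flat envelope and a 1000 Hz zero-crossing estimate is tentatively classified as `SINGLE` with the model `{"amplitude": 1.0, "frequency": 1000.0}`.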

(19) The decision-making algorithm requests a first set of features to be extracted from a set of time-domain band signals. Upon reception and processing of such features, the decision-making algorithm may require further features to be selectively extracted from some time-domain band signals, and the process of requesting features, processing the results, and possibly requesting further features can be repeated a number of times depending on the signal properties and the complexity of decision making.
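The request, process and re-request cycle can be sketched as a loop over rounds. In this hypothetical version, `extract(band, feature)` stands in for the signal processing block; the first round requests only energy for every band, and later rounds request a costlier envelope feature only for bands whose decision is still open. The feature names, the 0.1 tolerance and the round limit are assumptions.

```python
def decide_with_requests(extract, bands, noise_energy, max_rounds=3):
    """Hypothetical request/refine loop; returns (decisions, number of extractions)."""
    cache = {}

    def get(band, feature):
        # Each feature is extracted at most once per band segment.
        if (band, feature) not in cache:
            cache[(band, feature)] = extract(band, feature)
        return cache[(band, feature)]

    decisions = {}
    pending = {band: "energy" for band in bands}       # first request set
    for _ in range(max_rounds):
        if not pending:
            break
        next_requests = {}
        for band, feature in pending.items():
            value = get(band, feature)
            if feature == "energy":
                if value < noise_energy:
                    decisions[band] = "empty"
                else:
                    next_requests[band] = "envelope_change"  # needs a closer look
            elif feature == "envelope_change":
                decisions[band] = "multiple" if abs(value) > 0.1 else "single"
        pending = next_requests
    for band in pending:                                # requests left after the last round
        decisions.setdefault(band, "unresolved")
    return decisions, len(cache)
```

With one silent band and two active bands, only five extractions are needed: three energies and two envelope features, since the silent band is settled in the first round.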

(20) It is clear to a person skilled in the art that the time signals belonging to one particular decision interval can be stored only for the duration of that decision interval, but can also be stored over several consecutive decision intervals in order to confirm or refute tentative decisions made over short periods of time. Similarly, it is also possible to store extracted features over several consecutive decision intervals.
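One simple way to confirm or refute tentative decisions across consecutive decision intervals is to keep a short history per band and finalize a decision only once it has persisted. The persistence rule and `DecisionConfirmer` below are hypothetical; bands absent from an interval's tentative decisions simply keep their previous history.

```python
from collections import deque

class DecisionConfirmer:
    """Hypothetical rule: a tentative band decision becomes final once it has
    been observed in `required` consecutive decision intervals."""

    def __init__(self, required=3):
        self.required = required
        self.history = {}  # band -> deque of the last `required` tentative decisions
        self.final = {}    # band -> last confirmed decision

    def update(self, tentative):
        for band, decision in tentative.items():
            h = self.history.setdefault(band, deque(maxlen=self.required))
            h.append(decision)
            if len(h) == self.required and len(set(h)) == 1:
                self.final[band] = decision
        return dict(self.final)
```

A band that is tentatively judged "single" once and then "multiple" three times in a row is finalized as "multiple" only at the fourth interval; the isolated "single" is never confirmed.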

(21) It is also clear to a person skilled in the art that, while the invention has been described in the context of detecting notes on the basis of fundamentals and harmonics, it can equally be applied to the task of detecting multiple sounds which are not characterized by simple harmonic models, to the task of reliably detecting the onset of musical notes, and to the task of extracting ongoing information about the tuning of the instrument.
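Tuning and timing information of the kind mentioned above can be expressed with standard formulas: the deviation of a measured fundamental f from a nominal pitch f_ref is 1200 * log2(f / f_ref) cents, and the equal-tempered nominal pitch of MIDI note number m is 440 * 2^((m - 69) / 12) Hz. The sketch assumes A4 = 440 Hz and pairs each reference onset with the nearest detected onset; both are illustrative simplifications, not taken from the description.

```python
import math

def cents_deviation(measured_hz, nominal_hz):
    """Deviation of a measured fundamental from its nominal pitch, in cents."""
    return 1200 * math.log2(measured_hz / nominal_hz)

def equal_tempered_hz(midi_note, a4_hz=440.0):
    """Nominal equal-tempered frequency of a MIDI note number (A4 = 69)."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)

def timing_deviations(onsets_s, reference_s):
    """Signed offset (s) of the nearest detected onset for each reference onset."""
    return [min(onsets_s, key=lambda t: abs(t - r)) - r for r in reference_s]
```

A string measured at 442 Hz against a nominal A4 of 440 Hz is about 7.85 cents sharp, and detected onsets at 0.0, 0.51 and 0.98 s against a reference grid of 0.0, 0.5 and 1.0 s give deviations of 0, +10 and -20 ms.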

(22) It is further clear to a person skilled in the art that the method of signal processing described in this invention can be implemented either offline or in real time, and run on a general-purpose stationary or portable computer of sufficient processing power with the necessary built-in or external peripherals (for example a desktop computer or a notebook), a special-purpose stationary or portable device of sufficient processing power with the necessary built-in or external peripherals (for example a tablet or a smartphone), or a dedicated electronic device of sufficient processing power with the necessary built-in or external peripherals.

(23) It is further clear to a person skilled in the art that the individual functional blocks mentioned in this invention can be implemented in a plurality of ways, for example, and not as an exhaustive list, within separate signal processors or within a common one, using separate memory devices or common ones, and with code that is either stored in fixed form, retrieved from an external code repository, or compiled locally on demand.