Methods and apparatus for obtaining enhanced mass spectrometric data

Abstract

A method comprising decomposing mass spectrometry data, especially of ion species that undergo multiple direction changes in a periodic manner, the data comprising signal and noise measured over time, into a sum of K harmonic component signals and a noise component, wherein the harmonic component signals and their number K are derived from the data and a determined quantity representative of the noise. The harmonic component signals and their number K may be determined iteratively on the basis of: using an initial value of K to calculate a minimised non-negative measure of difference R.sup.(K) between the measured and model data comprising data sets of K-harmonic component signals, and if R.sup.(K) does not lie within a noise range based on the quantity representative of the noise, changing the value of K and recalculating R.sup.(K) until R.sup.(K) lies within the noise range. Mass spectral information may be derived from the model data set.

Claims

1. A method of operating a mass spectrometer comprising an Orbital Trap mass analyzer or a Fourier Transform Ion Cyclotron Resonance (FT-ICR) mass analyzer, the method comprising: generating a plurality of species of ions using an ion source of the mass spectrometer, the species being within a range of mass-to-charge ratios; introducing the plurality of species into the Orbital Trap or FT-ICR mass analyzer such that the plurality of species undergo periodic motion within a range of frequencies within the Orbital Trap or FT-ICR mass analyzer; acquiring transient data comprising signal and noise measured over time that has been obtained from detection of an image current generated by the periodic motions of the plurality of species of ions within the Orbital Trap or FT-ICR mass analyzer; determining a quantity representative of the noise from the acquired transient data; determining a noise range based upon the quantity representative of the noise; determining a model data set of K-harmonic component signals from the acquired transient data, each component signal comprising a complex-valued amplitude that includes phase information and a complex-valued frequency that includes time decay information, wherein the harmonic component signals and their number K are determined iteratively on the basis of: using an initial value of K to calculate a minimized nonnegative measure of difference R.sup.(K) between the acquired transient data and model data comprising data sets of K-harmonic component signals; and if R.sup.(K) does not lie within the noise range, changing the value of K and recalculating R.sup.(K) as many times as necessary until R.sup.(K) does lie within the noise range; generating mass spectral information about the ion species from the model data set, the mass spectral information comprising, for each of the K harmonic component signals, a mass-to-charge ratio and an estimate of a number of ions of a respective corresponding species of ions; and controlling subsequent acquisitions of the mass analyzer using the generated mass spectral information, wherein either a resolution of the mass-to-charge ratios of the generated mass spectral information utilized for controlling the subsequent acquisitions is greater than a resolution of mass-to-charge ratios calculated by a Fast Fourier Transform method, or the mass-to-charge ratios of the generated mass spectral information have fewer artifacts relative to mass spectral information generated using a Filter Diagonalization method.

2. The method according to claim 1 wherein the measure of difference R.sup.(K) comprises a minimized normalized sum of residuals between the acquired transient data and the model data at a plurality of data points.

3. The method according to claim 1 wherein R.sup.(K) is recalculated for increasing values of K starting from an initial value of 0.

4. The method according to claim 1 wherein R.sup.(K) is recalculated for decreasing values of K starting from an initial value.

5. The method according to claim 1 wherein an initial value for K is determined from a number of peaks in the frequency domain spectrum of the acquired transient data.

6. The method according to claim 1 wherein the value of K is changed and R.sup.(K) is recalculated until the value of K is the minimum value of K for which R.sup.(K) is less than, or is equal to, the quantity representative of the noise.

7. The method according to claim 1 wherein the value of K is changed and R.sup.(K) is recalculated until R.sup.(K) becomes the closest value to the quantity representative of the noise.

8. The method according to claim 1 wherein the quantity representative of the noise comprises a noise power and the noise power is determined by a method comprising one or more of: evaluating the noise power from the acquired transient data; evaluating the noise power from a previous or another set of data acquired from the mass analyzer; measuring characteristics of preamplifiers used in the data measuring apparatus of the mass analyzer; setting a noise power on the basis of prior knowledge of the mass analyzer.

9. The method according to claim 1 wherein the model data set comprises a harmonic signal which may be described by a sum of K complex exponential terms each multiplied by complex amplitudes, and the K harmonic signals are derived assuming the harmonic signal possesses autocorrelative properties.

10. The method according to claim 1 wherein the measure of difference R.sup.(K) is described by a term or terms involving: $\min \frac{1}{N} \sum_{n = 0}^{N - 1} {\left| c_{n} - c_{n}^{*} \right|}^{2},$ where c.sub.n is acquired data at each of N data points, and c*.sub.n is the K-harmonic signal at each of N data points in the model data set.

11. The method according to claim 1, wherein the generating of the mass-to-charge ratios of the species of ions includes determining the mass-to-charge ratios of the K species of ions by performing the steps of: deriving a set of autocorrelation coefficients, a, relating terms c*.sub.n according to a.sub.0c*.sub.n+a.sub.1c*.sub.n+1+ . . . +a.sub.Kc*.sub.n+K=0, where c*.sub.n is the K-harmonic signal at acquired data points in the model data set; combining the autocorrelation coefficients, a, in a polynomial equation of the form a.sub.0+a.sub.1λ+a.sub.2λ.sup.2+ . . . +a.sub.Kλ.sup.K=0 where λ is a complex number; deriving the frequencies of the K harmonic signals from the roots, λ.sub.k, of the polynomial equation; and translating each of the K frequencies of the K harmonic signals from the frequency to the mass-to-charge domain.

12. The method according to claim 11, wherein the generating of mass spectral information about the ion species includes determining an estimate of the number of ions of each species within the Orbital Trap or FT-ICR mass analyzer, wherein the number of ions of each species is determined from the amplitudes of the K-harmonic signals, the amplitudes being found by minimization of the residual R, where R is of the form $R = \frac{1}{N} \sum_{n} {\left| c_{n} - \sum_{k} A_{k} {(\lambda_{k})}^{n} \right|}^{2},$ and c.sub.n is acquired data at each of N data points.

13. The method according to claim 1 wherein the acquired transient data corresponds to periodic motions of ions of a limited range of mass-to-charge ratios selected from a larger range of mass-to-charge ratios, said larger range of mass-to-charge ratios corresponding to a larger transient data set.

14. The method of claim 13 wherein the transient data corresponding to the restricted range of mass-to-charge ratios is selected from the larger transient data set by a method comprising: obtaining a frequency spectrum of the larger transient data set to form a transformed data set; selecting a range of frequencies in the frequency domain spectrum of the transformed data set to form a transformed data subset, and; transforming the transformed data subset back into the time domain to form the acquired transient data.

15. The method according to claim 1, wherein generating of mass spectral information about the ion species from the model data set includes deriving a mass spectrum from the model data set comprising a set of K harmonic component signals.

16. A mass spectrometer system including an Orbital Trap or FT-ICR mass analyzer comprising: an ion source that, in operation, generates a plurality of species of ions within a range of mass-to-charge ratios, each species having a different mass-to-charge ratio, wherein the Orbital Trap or FT-ICR mass analyzer receives the generated plurality of species of ions; and a computer electronically coupled to the Orbital Trap or FT-ICR mass analyzer so as to receive, from the Orbital Trap or FT-ICR mass analyzer, acquired transient data comprising signal and noise measured over a time duration and acquired from an image current generated by the Orbital Trap or FT-ICR mass analyzer as a result of periodic motions of the plurality of species of ions within a range of frequencies within the mass analyzer over the time duration, wherein the computer is configured to: determine a quantity representative of the noise from the acquired transient data; determine a noise range based upon the quantity representative of the noise; determine a model data set of K-harmonic component signals from the acquired transient data, each component signal comprising a complex-valued amplitude that includes phase information and a complex-valued frequency that includes time decay information, wherein the harmonic component signals and their number K are determined iteratively by: using an initial value of K to calculate a minimized non-negative measure of difference R.sup.(K) between the acquired transient data and model data comprising data sets of K-harmonic component signals, and if R.sup.(K) does not lie within the noise range, changing the value of K and recalculating as many times as necessary until R.sup.(K) does lie within the noise range; generate mass spectral information about the ion species from the model data set, the mass spectral information comprising, for each of the K harmonic component signals, a mass-to-charge ratio and an estimate of a number of ions of a respective corresponding species of ions; and control subsequent acquisitions of the mass analyzer using the generated mass spectral information, wherein either a resolution of the mass-to-charge ratios of the generated mass spectral information utilized for controlling the subsequent acquisitions is greater than a resolution of mass-to-charge ratios calculated by a Fast Fourier Transform method, or the mass-to-charge ratios of the generated mass spectral information have fewer artifacts relative to mass spectral information generated from a Filter Diagonalization method.

Description

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

(1) FIG. 1 shows a schematic block diagram depicting an example of the method of the present invention.

(2) FIG. 2A shows a schematic block diagram depicting an example of the method of the present invention in further detail.

(3) FIG. 2B shows in further detail a step in the method of the present invention, the step being the calculation of the minimum residual.

(4) FIG. 3 shows a schematic representation of part of a process of selecting a frequency window of interest from the Fourier transform of the measured data and performing an inverse FT.

(5) FIGS. 4A and 4B show a schematic block diagram depicting a workflow of an algorithm for performing the invention. FIG. 4A shows steps 2-50 of the algorithm; FIG. 4B shows steps 60-100 of the algorithm.

(6) FIG. 5A shows a noiseless test signal that comprises synthetic test data of three sine waves plotted on an amplitude vs. frequency bin scale.

(7) FIG. 5B shows the conventional FT power spectrum which would be generated for the three sine waves of FIG. 5A.

(8) FIG. 6A is a plot of the resultant values calculated using the methods of the invention of amplitude vs. frequency on the test signal of FIG. 5A for 1000 different random noise cases each of which has noise σ=0.001.

(9) FIG. 6B shows the FT power spectrum of the noise-free signal together with error bars indicating the uncertainty which would be produced by the noise.

(10) FIG. 7A is a similar plot to that of FIG. 6A but with RMS noise deviations such that σ=0.01.

(11) FIG. 7B is a similar plot to that of FIG. 6B but with RMS noise deviations such that σ=0.01.

(12) FIG. 8A is a similar plot to that of FIG. 6A but with RMS noise deviations such that σ=0.1.

(13) FIG. 8B is a similar plot to that of FIG. 6B but with RMS noise deviations such that σ=0.1.

(14) FIG. 9A-FIG. 9D are plots of the probability distribution of the number of K-harmonic signals found using the methods of the invention.

(15) FIG. 10A-FIG. 10D are plots of the reduced residual, R.sup.(K)/σ.sup.2, vs. the number of harmonics, K.

(16) FIG. 11 schematically shows how the methods of the invention effectively find the optimum location in the space of K-harmonic signals ℋ.sup.K for each value of K so as to minimise the residual R.sup.(K).

(17) FIG. 12A shows a plot of peak amplitudes and frequencies found, using the FDM approach, in 1000 different generations of a three-sine-wave test signal including random noise at a signal-to-noise ratio (SNR) of 100000.

(18) FIG. 12B is a plot similar to FIG. 12A for which the peak amplitudes and frequencies were found using the methods of the present invention.

(19) FIG. 12C is a plot of peak amplitudes and frequencies found, using the FDM approach, in 1000 different generations of the three-sine-wave test signal including random noise at SNR of 1000.

(20) FIG. 12D is a plot similar to FIG. 12C for which the peak amplitudes and frequencies were found using the methods of the present invention.

(21) FIG. 12E is a plot of peak amplitudes and frequencies found, using the FDM approach, in 1000 different generations of the three-sine-wave test signal including random noise at SNR of 100.

(22) FIG. 12F is a plot similar to FIG. 12E for which the peak amplitudes and frequencies were found using the methods of the present invention.

(23) FIG. 12G is a plot of peak amplitudes and frequencies found, using the FDM approach, in 1000 different generations of the three-sine-wave test signal including random noise at SNR of 10.

(24) FIG. 12H is a plot similar to FIG. 12G for which the peak amplitudes and frequencies were found using the methods of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

(25) FIG. 1 shows a schematic block diagram depicting an example of the method of the present invention. In FIG. 1, a method is represented for determining harmonic component signals and their number, K, representative of different species of ions that are or were present within a mass analyser and within a range of mass-to-charge ratios, each species having a different mass-to-charge ratio. A quantity representative of the noise is determined, 10, which is representative of the noise in measured data comprising signal and noise which has been measured over time. An initial value of K is chosen, 20. A measure of difference R.sup.(K) between the measured data and model data comprising data sets of K-harmonic component signals is calculated, 30. The measure of difference R.sup.(K) is compared to a noise range based upon the quantity representative of noise, 40, and if R.sup.(K) does not lie within the noise range, a new value of K is chosen, 50, and the measure of difference R.sup.(K) between the measured data and model data comprising data sets of K-harmonic component signals is again calculated, 30. When and if R.sup.(K) does lie within the noise range, K and the harmonic component signals have been determined, and the method terminates, 100, in respect of finding K. Subsequent steps can then be performed as herein described, such as finding frequencies and amplitudes of the K harmonic signals and determining a mass spectrum. Where the method is performed using calculating apparatus, K and information about the harmonic component signals are output immediately prior to termination.

(26) The measured data may have been measured immediately preceding the application of the method, or it may have been measured at any preceding time. The measured data may have been measured at the location at which the method is performed, or it may have been measured at some distant location. Consequently the method may be applied to data measured before the present invention had been made and it may be applied to data taken using a mass analyser at any remote location. Accordingly the method of the present invention does not necessarily include the step of measuring the measured data since the measured data may have been acquired earlier and/or elsewhere.

(27) It will be appreciated that step 10, determining a quantity representative of the noise, may be performed before or after step 20 (choosing an initial value of K), and before or after step 30 (calculating R.sup.(K)).

(28) FIG. 2A is a schematic block diagram depicting an example of the method of the present invention in further detail. Similar steps to those depicted in FIG. 1 have the same identifiers.

(29) The measured data comprises N data points c.sub.n, where c.sub.n further comprises noise components δ.sub.n. δ.sub.n represents additive noise with spectral noise power v(f)=v.sub.0 over the frequency window corresponding to the range of mass-to-charge ratios, the RMS deviations of which are {square root over (⟨|δ.sub.n|.sup.2⟩)}=σ.sub.n={square root over (v.sub.0N/T)}. In this example the quantity representative of the noise is the noise power σ.sup.2. The spectral noise power v(f)=v.sub.0 is determined, 10, and the noise power σ.sup.2 is determined from it; the additive noise components δ.sub.n are unknown. Preferably in the methods of the present invention the spectral noise power is substantially constant, and in the present embodiment is assumed to be constant over the frequency window corresponding to the range of mass-to-charge ratios.

(30) The quantity representative of the noise may be determined by one or more of: measuring the quantity representative of the noise from the measured data; measuring the quantity representative of the noise from a previous set of measured data derived from the mass analyser; measuring characteristics of preamplifiers used in the data measuring apparatus of the mass analyser; setting a quantity representative of the noise on the basis of prior knowledge of the mass analyser.

(31) One preferred method of measuring the noise power from a previous set of measured data derived from the mass analyser comprises calculating the L2 norm of a calibration transient that is detected and digitized with no ions directed into the mass analyser.

(32) A preferred method of measuring the quantity representative of the noise from the measured data itself, which may be performed on the frequency spectrum (i.e. after an FFT has been performed), comprises the steps of:

(33) (a) calculating an average intensity of all the measured data;

(34) (b) calculating the standard deviation of the intensity of all the measured data;

(35) (c) calculating a first noise threshold on the basis of the average (avg) and standard deviation (sigma) calculated, preferably as avg+0.3·sigma;

(36) (d) selecting a first set of points from the measured data on the basis that they have lower intensities than the first noise threshold;

(37) (e) calculating the average intensity of the first set of points (avg1);

(38) (f) calculating the standard deviation of the intensity of the first set of points (sigma1);

(39) (g) calculating a second noise threshold on the basis of the average (avg1) and standard deviation (sigma1) calculated, preferably as avg1+0.3·sigma1;

(40) (h) selecting a second set of peaks from the measured data on the basis that they have lower intensities than the second noise threshold.

(41) The second set of peaks comprises noise. Having thus been separated from the peaks which are considered to be signal, the second set of peaks may be used to calculate the quantity representative of the noise.
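Steps (a) to (h) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of the invention: the function names are hypothetical, the population standard deviation is an assumed choice (the text does not say which estimator is used), and taking the mean square of the selected points as the quantity representative of the noise is likewise an assumption.

```python
import statistics


def select_noise_points(intensities, factor=0.3):
    """Two-pass thresholding following steps (a)-(h); returns (points, threshold)."""
    # (a)-(c): first threshold from the average and deviation of all points.
    avg = statistics.fmean(intensities)
    sigma = statistics.pstdev(intensities)  # assumed: population deviation
    first_threshold = avg + factor * sigma

    # (d): points lying below the first threshold.
    first_set = [x for x in intensities if x < first_threshold]
    if not first_set:
        return [], first_threshold

    # (e)-(g): second threshold from the statistics of the first set.
    avg1 = statistics.fmean(first_set)
    sigma1 = statistics.pstdev(first_set)
    second_threshold = avg1 + factor * sigma1

    # (h): points lying below the second threshold are treated as noise.
    second_set = [x for x in intensities if x < second_threshold]
    return second_set, second_threshold


def mean_square(points):
    """Assumed noise measure: mean squared intensity of the noise points."""
    return statistics.fmean([x * x for x in points])
```

With a spectrum consisting of values near 1 and a single large peak, the peak is rejected on the first pass and the second pass trims the upper tail of the remaining points.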

(42) In this example, a noise range is determined, 11, based upon the quantity representative of the noise.

(43) Steps 10 and 11, determining a quantity representative of the noise, and determining a noise range, may be performed at any stage prior to 40, the comparison between the measure of difference R.sup.(K) and the quantity representative of the noise.

(44) The model data comprising data sets of noiseless K-harmonic component signals forms a total model data set c*.sub.n

(45) $\begin{matrix} c_{n}^{*} = \sum_{k = 1}^{K} A_{k} \exp (2 \pi i f_{k} t_{n}) & (1) \end{matrix}$
where t.sub.n=N.sup.−1nT, n=0 . . . N−1, such that c.sub.n=c*.sub.n+δ.sub.n.

(46) The number of harmonics K, complex-valued amplitudes A.sub.k and complex-valued frequencies f.sub.k are to be determined, and may be obtained by methods of the present invention as will be further described. The sought amplitudes and frequencies are complex and therefore include phase and decay information, respectively. The measured data is preferably recorded at substantially constant time periods, T/N.
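As an illustration of the model data set, the following Python sketch synthesizes a noiseless K-harmonic signal sampled at t.sub.n=nT/N, reading the exponent of equation (1) as 2πi f.sub.k t.sub.n. The function name and argument names are hypothetical. With this reading, a positive imaginary part of f.sub.k produces a decaying component.

```python
import cmath


def k_harmonic_signal(amplitudes, frequencies, n_points, duration):
    """Return c*_n = sum_k A_k exp(2*pi*i*f_k*t_n) with t_n = n*T/N."""
    signal = []
    for n in range(n_points):
        t_n = n * duration / n_points
        # Complex A_k carries phase; complex f_k carries decay.
        value = sum(a * cmath.exp(2j * cmath.pi * f * t_n)
                    for a, f in zip(amplitudes, frequencies))
        signal.append(value)
    return signal
```

For example, `k_harmonic_signal([1.0], [1.0], 4, 1.0)` samples one full period of a unit-amplitude harmonic at four points.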

(47) FIG. 2B describes in more detail the calculation of the minimum residual R.sup.(K), that is shown as position 30 in FIG. 1. A value for K is supplied and the initial values for the autocorrelation coefficients vector are initialized with a normalized complex value K+1 dimensional vector 31. Iterations 32 in accordance with formulas

(48) $\begin{matrix} a^{(i + \frac{1}{2})} = {H (a^{(i)})}^{- 1} a^{(i)}, \\ a^{(i + 1)} = {\| a^{(i + \frac{1}{2})} \|}^{- 1} a^{(i + \frac{1}{2})} \end{matrix}$
are performed a number of times until the difference of the autocorrelation coefficients on two subsequent iterations is smaller than a certain value of ε.sub.1, 33. The value of residual R.sub.i and its gradient ∇R.sub.i are then calculated, 34, and the residual values on two subsequent iterations are compared, 35. The quasi-Newton iterations, 36, are performed in accordance with the formulas

(49) $\begin{matrix} a^{(i + \frac{1}{2})} = a^{(i)} - {H (a^{(i)})}^{- 1} \nabla R_{i}, \\ a^{(i + 1)} = {\| a^{(i + \frac{1}{2})} \|}^{- 1} a^{(i + \frac{1}{2})} \end{matrix}$
until the minimum of R.sup.(K) is approached with a certain accuracy ε.sub.2. The value of R.sup.(K) is assumed equal to the residual norm on the last iteration, 37.

(50) It may be convenient to perform the methods of the present invention using measured data comprising data relating to a limited range of mass-to-charge ratios in order to reduce computational complexity, and in order to ensure that the spectral noise power is substantially constant over the range of mass-to-charge ratios. Therefore optionally, the range of mass-to-charge ratios may be limited and selected from a larger data set. Preferably the range of mass-to-charge ratios is limited and is selected from a larger data set by a method comprising: obtaining a frequency spectrum of the larger measured data set to form a transformed data set, by for example, taking a Fourier transform of the larger measured data set; selecting a range of frequencies in the frequency domain spectrum of the transformed data set to form a transformed data subset, and; transforming the transformed data subset back into the time domain to form the measured data. A schematic representation of an example of part of this process is shown in FIG. 3. Step (I) consists of selecting a frequency window of interest from the Fourier transform of the larger measured data spectrum 201, and shifting the frequency window of interest by a negative offset, so that it is centered around zero within the Nyquist frequency band to produce spectrum 202. Step (II) involves creating an inverse Fourier transform image of spectrum 202 to obtain the windowed measured data 203. This process is depicted at step 12 in FIG. 2A. Where the methods of the present invention are performed using measured data comprising a limited range of mass-to-charge ratios, preferably a relatively narrow spectral window of complex values F.sub.s, (s=s.sub.0 . . . s.sub.0+N−1) is taken from a fast Fourier transformation (FFT) of FTMS measured data FF.sub.s that contains N Fourier transform bins. The reversed Fourier image in the time domain is:

(51) $\begin{matrix} \begin{matrix} c_{n} = \frac{1}{N} \sum_{s = s_{0}}^{s_{0} + N - 1} F_{s} \exp \left[ 2 \pi i \left( s - s_{0} - \frac{N}{2} \right) \frac{t_{n}}{T} \right], \\ t_{n} = \frac{nT}{N}, \\ n = 0 \ldots N - 1 \end{matrix} & (2) \end{matrix}$

(52) The frequency content of which is limited to the measured frequencies f∈[s.sub.0/T . . . (s.sub.0+N)/T], which are shifted by a constant negative offset Δf=−(s.sub.0+N/2)/T to fit the Nyquist frequency band f′=f+Δf∈[−N/2T . . . N/2T].
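The window selection and the reversed Fourier image of equation (2) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, a naive discrete Fourier transform stands in for the FFT so that the example needs no external library, and the forward transform is assumed to use the exp(−2πi s j/M) sign convention.

```python
import cmath


def dft(x):
    """Naive O(M^2) forward transform; a stand-in for an FFT in this small demo."""
    m = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * s * j / m) for j in range(m))
            for s in range(m)]


def windowed_transient(spectrum, s0, n_bins):
    """Equation (2): inverse transform of bins s0..s0+N-1, shifted by -(s0+N/2)."""
    window = spectrum[s0:s0 + n_bins]
    transient = []
    for n in range(n_bins):
        # Here k = s - s0 and t_n / T = n / N.
        acc = sum(f_s * cmath.exp(2j * cmath.pi * (k - n_bins / 2) * n / n_bins)
                  for k, f_s in enumerate(window))
        transient.append(acc / n_bins)
    return transient
```

A single tone falling in bin s of the larger spectrum reappears in the windowed transient at the shifted bin index s−s.sub.0−N/2, which is the frequency offset described above.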

(53) The following detailed example of the methods of the present invention will utilize this optional windowed data set, i.e. the measured data comprises a range of mass-to-charge ratios which has been limited and selected from a larger data set. It will be appreciated that whilst this option has been chosen in order to give an example in which a limited data set has been chosen from a larger data set, the principles that follow apply equally if the whole data set had been utilized, as long as the spectral power of the noise remains substantially constant over the data set used.

(54) Accordingly, the windowed measured data c.sub.n is assumed to be the K-harmonic data set c*.sub.n corrupted by white Gaussian noise δ.sub.n: c.sub.n=c*.sub.n+δ.sub.n, where

(55) $\begin{matrix} c_{n}^{*} = \sum_{k = 1}^{K} A_{k} \exp (i \omega_{k} t_{n}), \quad A_{k} \in \mathbb{C}, \quad \omega_{k} = \omega_{k}^{\prime} + i \omega_{k}^{\prime\prime} & (3) \end{matrix}$
with complex amplitudes A.sub.k and frequencies ω.sub.k. The real parts of the frequencies are restricted within the band −πN/T<ω′.sub.k≤πN/T thus eliminating the Nyquist uncertainty. (The space of K-harmonic signals is further denoted as ℋ.sup.K.)

(56) Optionally, a step 14 in FIG. 2A may be performed to determine whether the measured data, which may be windowed measured data, contains any signal peaks. If it does not contain peaks, then there is no need to determine the harmonic component signals and their number K. If it contains peaks, then the method can be performed to find the harmonic component signals and their number K. Determining whether peaks are contained in a spectrum is a known technique and such techniques may be employed here, e.g. determining whether peaks are contained by finding peaks which exceed a threshold level of intensity, or by other peak picking routines. Whether peaks are contained is preferably determined by calculating a norm: ∥c.sub.n∥=N.sup.−1σ.sup.−2Σ|c.sub.n|.sup.2, and, if the norm is less than 1, taking this to indicate that there are no peaks within the measured data which are above the noise, in which case the procedure terminates, 100. The method may then move to the next window of measured data and begin the procedure again.
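The optional peak check can be sketched as below, reading the norm as the mean squared magnitude of the data divided by the noise power. The function name and return convention are hypothetical.

```python
def contains_signal(data, noise_power):
    """Return (has_peaks, norm), where norm = sum |c_n|^2 / (N * noise_power)."""
    norm = sum(abs(c) ** 2 for c in data) / (len(data) * noise_power)
    # A norm below 1 is taken to mean there is nothing above the noise.
    return norm >= 1.0, norm
```

A window for which the first returned value is false would be skipped, and processing would move to the next window of measured data.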

(57) A measure of difference R.sup.(K) between the measured data and model data comprising data sets of K-harmonic component signals may be represented as:

(58) $\begin{matrix} \begin{matrix} R^{(K)} = \min \| \delta_{n} \| \\ = \min \frac{1}{N} \sum_{n = 0}^{N - 1} {| \delta_{n} |}^{2} \\ = \min \frac{1}{N} \sum_{n = 0}^{N - 1} {| c_{n} - c_{n}^{*} |}^{2}, \quad (c_{n}^{*}) \in \mathcal{H}^{K} \end{matrix} & (4) \end{matrix}$
where K is the number of K-harmonic signals in the model data set, and K is the measure of how many different species of ions are or were present within a mass analyser when the measured data was acquired and within a range of mass-to-charge ratios, each species having a different mass-to-charge ratio. Other forms of difference R.sup.(K) may be used but preferably the form of equation (4) is used and will be used in this example, being a minimised normalized sum of residuals between the measured data and the model data at a plurality of data points. The measure of difference R.sup.(K) is preferably minimised, i.e. is the minimum value, for each given value of K as described in more detail below. In other words, for each K, R.sup.(K) is determined as the minimum norm of the difference between the signal c.sub.n and any possible K-harmonic signal c*.sub.n.
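For a single candidate model, the quantity being minimised in equation (4) is the mean squared magnitude of the difference between measured and model data. The sketch below evaluates that quantity only; it does not perform the minimisation over all K-harmonic signals, and the function name is hypothetical.

```python
def residual(measured, model):
    """Mean squared magnitude of (c_n - c*_n) over the N data points."""
    if len(measured) != len(model):
        raise ValueError("measured and model data must have the same length")
    n_points = len(measured)
    return sum(abs(c - m) ** 2 for c, m in zip(measured, model)) / n_points
```

R.sup.(K) is then the smallest value this function takes over all model data sets with K harmonics.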

(59) Accurate estimation of the number of mass peaks plays a vital role in the performance of this or any other method operating under noisy conditions typically found in FTMS data. Since the number of harmonics cannot be determined exactly for noisy measured data this method evaluates the statistically most probable value of K. On increasing K, R.sup.(K) tends to zero, as more and more harmonic signals are added to the model data set and the model data set may more closely match the measured data. Indeed when K=N the model data set can equal the measured data, including the noise components within the measured data, i.e. such that the difference R.sup.(K)=0 when K=N. It may be shown that when K=N/2 a combination of K-harmonic signals may also be made to equal the measured data, as in the prior art FDM. However unlike prior art methods, in the present invention K is restricted so that the K-harmonic model data set does not model significantly more than the signal component of the measured data, and in this way it distinguishes signal from noise. Methods of the invention use an initial value of K to calculate a value for R.sup.(K) and this value is compared to a noise range based upon the determined quantity representative of the noise. If R.sup.(K) does not lie within the noise range, the value of K is changed and R.sup.(K) recalculated. This process is repeated as many times as necessary until R.sup.(K) does lie within the noise range, thereby finding the most probable value of K and the harmonic component signals. This process ensures that the K-harmonic data set thus formed will substantially only model the signal component of the measured data.

(60) Hence an initial value for K is chosen, 20 in FIG. 2A, and the measure of difference R.sup.(K) is calculated, 30. The value R.sup.(K) is compared to the noise range which is based upon the determined quantity representative of the noise, preferably the noise power, 40. If R.sup.(K) does not lie within the noise range, the value of K is changed, 50, and R.sup.(K) recalculated, 30. This process is repeated as many times as necessary until R.sup.(K) does lie within the noise range. K and the harmonic component signals are thereby determined when that condition is met.

(61) If the information within the measured data set is previously unprocessed, it is computationally efficient to start the process with an initial value for K of zero, and increase K from that value, as the data from mass analysers when operating at high resolving power is sparse. Alternatively, if the measured data is first processed, an initial value for K may be determined from a number of peaks in the frequency domain spectrum of the measured data. If the measured data is first processed by, for example, taking a Fourier transform of the measured data, then an initial value for K may be determined from a number of peaks in the frequency domain spectrum of the transformed data thus found. It will be appreciated that K may, alternatively, be decreased from an initial value (where the initial value is greater than zero). K may be decreased from an initial value which is less than N/4 as K will usually be very much smaller than N in the method of the present invention, because of the sparse nature of the data from mass analysers operating at high resolving powers, and, significantly, because the method of the invention does not seek to fit harmonics to noise, and then subsequently distinguish noise results from valid ion signals as is a feature of some prior art methods. Rather the method seeks to fit just enough K-harmonic signals to the data so as to avoid the noise and this approach thereby preferably finds just enough K-harmonics as there are ion species within the range of mass-to-charge ratios in the sample of measured data.

(62) Accordingly the value of K is changed and R.sup.(K) is recalculated, preferably until R.sup.(K) falls within the noise range; and/or until R.sup.(K) is just less than, or is equal to, the quantity representative of the noise; and/or until R.sup.(K) becomes the closest value to the quantity representative of the noise. In practice this can be achieved simply by determining K as the minimum value for which the measure of difference R.sup.(K) equals or preferably becomes less than the quantity representative of the noise, which typically means just less than the quantity representative of the noise.
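
The stopping rule just described can be sketched in a few lines of Python. This is an illustration only: the function name `select_k`, the `residual` callback and the toy residual values are assumptions introduced here, and the callback stands in for whatever procedure supplies the minimised measure of difference R.sup.(K).

```python
from typing import Callable, Tuple


def select_k(residual: Callable[[int], float], noise_power: float,
             k_start: int = 0, k_max: int = 64) -> Tuple[int, float]:
    """Increase K from k_start until R^(K) first drops to or below noise_power."""
    for k in range(k_start, k_max + 1):
        r_k = residual(k)
        if r_k <= noise_power:
            return k, r_k
    raise RuntimeError("R^(K) did not reach the noise level for K <= k_max")


# Toy residuals (hypothetical values): a sharp drop once K matches the signal.
toy_residuals = [3.0, 2.1, 1.2, 0.009, 0.008]
k, r = select_k(lambda trial_k: toy_residuals[trial_k], noise_power=0.01, k_max=4)
print(k, r)  # 3 0.009
```

Starting from `k_start=0` mirrors the preferred embodiment for sparse data; a decreasing search would test the opposite inequality on R.sup.(K−1).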

(63) Direct numerical evaluation of equation (4) with respect to complex frequencies and amplitudes A.sub.k, ω.sub.k is an essentially nonlinear problem which has no robust solution, given the non-convexity of the norm and the large number of local minima resulting from the oscillating nature of the fitting function of equation (3). However, methods of the present invention preferably utilize the property that the signal component of the measured data (i.e. not including the noise) possesses autocorrelative properties, which means that each successive value of intensity in the signal component of a measured data set (such as a transient) can formally be represented as a linear combination of the signal component of the measured data at previous time-points. Accordingly the K-harmonic signal may be written:
$\begin{matrix} c_{n}^{*} = \alpha_{0} c_{n - K}^{*} + \alpha_{1} c_{n - K + 1}^{*} + \ldots + \alpha_{K - 1} c_{n - 1}^{*} & (5) \end{matrix}$
where α.sub.0, . . . , α.sub.K−1 are autocorrelation coefficients.

(64) It will be appreciated that the autocorrelative properties also mean that a preceding value of intensity in a measured data set can also formally be represented as a linear combination of the measured data at succeeding time-points, and a similar equation could be written to express that without departing from the present invention. In this example, equation (5) will be used.

(65) Preferably the model data set comprises an harmonic signal which may be described by a sum of K complex exponential terms each multiplied by complex amplitudes, such as is set out in equations (1) and (3), and the K harmonic signals are derived assuming the harmonic signal possesses autocorrelative properties as described by equation (5).

(66) As mentioned, the method of the invention effectively involves a form of estimating probabilities for different numbers K of individual harmonics with non-zero amplitudes in the model data set and finds the most probable number. Both real and imaginary parts of the noise component ε.sub.n are taken to be independent random values, normally distributed with mean-square deviations σ/√2. Therefore R=‖c.sub.n−c*.sub.n‖=‖ε.sub.n‖ is statistically distributed as χ.sup.2 with 2N degrees of freedom, having the probability density function

(67) $\begin{matrix} p (R) = \frac{N^{N + 1}}{N! \, \sigma^{2 N}} R^{N - 1} \exp \left( - \frac{N R}{\sigma^{2}} \right), \quad R \geq 0 & (6) \end{matrix}$
and a corresponding cumulative probability can be expressed through the incomplete Euler gamma function as

(68) $\begin{matrix} P (R) = \int_{R}^{\infty} p (R') \, dR' = \frac{\Gamma (N, N R / \sigma^{2})}{(N - 1)!} & (7) \end{matrix}$

(69) The value P(R.sup.(K)) gives the probability for the number of harmonics in the noise-free model data {c*.sub.n} to assume a value less than or equal to K. The probability for the number of harmonics to take the value K exactly is p.sub.K=P(R.sup.(K))−P(R.sup.(K−1)). The most probable value of K, which provides the highest fidelity approximation of the actual number of harmonics in the signal, corresponds to the maximum of the distribution (6), R*=(1−1/N)σ.sup.2≈σ.sup.2, lying between R.sup.(K) and R.sup.(K−1). As already described, a practical way to estimate the most probable K consists in repeatedly increasing the value of K starting from an initial value (preferably zero) until the residual norm R.sup.(K) drops below R*≈σ.sup.2. In an alternative embodiment, in which the trial value of K is repeatedly decreased, the method is stopped (K−1 is found and therefore K is found) when R.sup.(K−1) just exceeds the quantity representative of the noise, σ.sup.2. Considering both cases (increasing or decreasing K), the stop condition can be formulated as the double inequality: R.sup.(K)≤σ.sup.2<R.sup.(K−1).
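
As a hedged numerical illustration of equation (7) and of the double inequality, the cumulative probability can be evaluated for integer N as a finite sum, since the regularised upper incomplete gamma function of integer order equals exp(−x) multiplied by the sum of x.sup.j/j! for j from 0 to N−1. The function names below are assumptions introduced for this sketch.

```python
import math


def cumulative_probability(r: float, n_points: int, sigma2: float) -> float:
    """P(R) = Gamma(N, N*R/sigma^2) / (N-1)! for integer N >= 1, as in equation (7)."""
    x = n_points * r / sigma2
    if x <= 0.0:
        return 1.0
    log_x = math.log(x)
    # Each term is exp(-x) * x^j / j!, evaluated in log space to avoid overflow.
    terms = (math.exp(-x + j * log_x - math.lgamma(j + 1)) for j in range(n_points))
    return min(1.0, math.fsum(terms))


def probability_of_exactly_k(r_k: float, r_k_minus_1: float,
                             n_points: int, sigma2: float) -> float:
    """p_K = P(R^(K)) - P(R^(K-1))."""
    return (cumulative_probability(r_k, n_points, sigma2)
            - cumulative_probability(r_k_minus_1, n_points, sigma2))


def stop_condition(r_k: float, sigma2: float, r_k_minus_1: float) -> bool:
    """Double inequality R^(K) <= sigma^2 < R^(K-1)."""
    return r_k <= sigma2 < r_k_minus_1
```

For large N the function P(R) falls steeply from about 1 to about 0 around R≈σ.sup.2, which is why comparing R.sup.(K) with σ.sup.2 is a practical substitute for evaluating p.sub.K explicitly.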

(70) Calculation of R.sup.(K) is performed as the numerical minimization of the residual norm with respect to the 2K complex-value parameters of the sought noise-free model data {c*.sub.n}, namely the frequencies ω.sub.k and amplitudes A.sub.k. Since any K+1 subsets (c*.sub.p, . . . , c*.sub.p+K) of K+1 subsequent elements are not linearly independent, and the matrix having K+1 such subsets as its rows is degenerate, there exists a non-zero complex-value vector a=(a.sub.0 . . . a.sub.K), which is referred to as the set of autocorrelation coefficients for the signal c*.sub.n, such that
$\begin{matrix} a_{0} c_{n}^{*} + a_{1} c_{n + 1}^{*} + \ldots + a_{K} c_{n + K}^{*} = 0, \quad n \in 0 \ldots N - K - 1 & (8) \end{matrix}$
i.e. it is possible to associate any K-harmonic signal with a set of K+1 complex autocorrelation coefficients a.sub.0, . . . , a.sub.K with the use of N−K linear conditions. The coefficients a.sub.k are, for the purpose of computational feasibility, normalized as Σ|a.sub.k|.sup.2=1.
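
The linear conditions (8) can be checked numerically on a small noise-free example. The sketch below is an illustration under stated assumptions: the three harmonics at 0.5, 1.0 and 1.5 frequency bins with unit amplitudes are borrowed from the test signal discussed later in this description, the window length N=32 is arbitrary, and the helper name `poly_from_roots` is hypothetical. The coefficients are built as those of the polynomial whose roots are the per-sample factors of the harmonics.

```python
import cmath


def poly_from_roots(roots):
    """Coefficients a_0..a_K (ascending powers) of the product of (x - root)."""
    coeffs = [1 + 0j]
    for root in roots:
        shifted = [0j] + coeffs                        # x * p(x)
        scaled = [-root * c for c in coeffs] + [0j]    # -root * p(x)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs


N = 32
roots = [cmath.exp(2j * cmath.pi * f / N) for f in (0.5, 1.0, 1.5)]
amplitudes = [1.0, 1.0, 1.0]
signal = [sum(A * lam ** n for A, lam in zip(amplitudes, roots)) for n in range(N)]

a = poly_from_roots(roots)
scale = sum(abs(x) ** 2 for x in a) ** 0.5
a = [x / scale for x in a]                             # sum |a_k|^2 = 1

K = len(roots)
# Left-hand side of equation (8) for n = 0 .. N-K-1; all values should vanish.
worst = max(abs(sum(a[j] * signal[n + j] for j in range(K + 1)))
            for n in range(N - K))
print(worst)
```

The printed value is zero to within rounding error, confirming that a K-harmonic signal satisfies N−K linear conditions with only K+1 coefficients.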

(71) Preferably, equation (8) is a strict condition imposed on the initially unknown noiseless model data set c*.sub.n as proposed in Osborne, M. R. and Smyth, G. K. (1991). A modified Prony algorithm for fitting functions defined by difference equations. SIAM Journal on Scientific and Statistical Computing, 12, 362-382. The formulae (8) are then treated as extra conditions to be satisfied in the minimization procedure for the residual norm of equation (4). Methods of the present invention preferably find a set of autocorrelation coefficients a.sub.0, . . . , a.sub.K that define the K-harmonic signal being nearest to c.sub.n in the sense of the residual norm of equation (4), under N−K conditions imposed on c*.sub.n. In matrix form these conditions read:

(72) $\begin{matrix} \mathbb{A} (a) \left( \begin{matrix} c_{0}^{*} \\ c_{1}^{*} \\ \vdots \\ c_{N - 1}^{*} \end{matrix} \right) = 0, \quad \mathbb{A} (a) = \left[ \begin{matrix} a_{0} & a_{1} & \cdots & a_{K} & 0 & 0 & 0 \\ 0 & a_{0} & a_{1} & \cdots & a_{K} & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & a_{0} & a_{1} & \cdots & a_{K} \end{matrix} \right] & (9) \end{matrix}$
where $\mathbb{A}$ is a (K+1)-diagonal rectangular (N−K)×N Toeplitz matrix. The Lagrange method is preferably used to express the minimal residual norm for the difference between the windowed measured data c.sub.n and any K-harmonic signal which satisfies the conditions (8) with given autocorrelation coefficients. Therefore, the norm in matrix notation becomes
$\begin{matrix} R (a) = N^{- 1} \overline{(C a)}^{T} B (a)^{- 1} (C a) = N^{- 1} \overline{a}^{T} H (a) a & (10) \end{matrix}$
where $H (a) = \overline{C}^{T} B (a)^{- 1} C$, $B (a) = \mathbb{A} (a) \overline{\mathbb{A}} (a)^{T}$, and C is a shifted measured data matrix

(73) $\begin{matrix} C = \left[ \begin{matrix} c_{0} & c_{1} & \cdots & c_{K} \\ c_{1} & c_{2} & \cdots & c_{K + 1} \\ \vdots & \vdots & & \vdots \\ c_{N - K - 1} & c_{N - K} & \cdots & c_{N - 1} \end{matrix} \right] & (12) \end{matrix}$

(74) Although minimization of R(a) with respect to a is a nonlinear problem, it is robust and practically realizable for arbitrary initial values of a.

(75) The problem of finding a minimal residual norm with respect to any possible K-harmonic signal is thus reduced to minimization of (10) with respect to all possible normalized sets of a.sub.k, that is R.sup.(K)=min R(a). The matrix H(a) is parametrically dependent on a.sub.k, which makes the said minimization problem nonlinear; nevertheless the function R(a) is smooth and generally has only one local minimum, which represents the global minimum (its degeneracy with respect to the common phase of a.sub.k is not critical). Any known iterative method of numerical minimization used to find R.sup.(K) and the minimizing set of a.sub.k, e.g. the method of gradient descent or the method of conjugate gradients, gives a robust algorithm which is practically independent of the choice of initial values.
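
To make the objective function of the minimization concrete, the following sketch evaluates the residual norm for a given coefficient vector a, using the identity that the Toeplitz matrix applied to the data vector equals the shifted data matrix C applied to a. It is a plain-Python illustration rather than an optimised implementation; the function names, the two-harmonic test signal and the window length of 16 points are assumptions, and the minimization over a itself is not shown.

```python
import cmath


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0j] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def residual_norm(a, c):
    """R(a) = N^-1 (C a)^H B^-1 (C a), where B is the Toeplitz matrix of a times its conjugate transpose."""
    n_points, k = len(c), len(a) - 1
    rows = n_points - k
    # v = C a: row p of C is (c_p, ..., c_{p+K}).
    v = [sum(a[j] * c[p + j] for j in range(k + 1)) for p in range(rows)]
    # B[p][q] = sum_j a_j * conj(a_{j+p-q}) over the indices that exist.
    b = [[sum(a[j] * a[j + p - q].conjugate()
              for j in range(k + 1) if 0 <= j + p - q <= k)
          for q in range(rows)] for p in range(rows)]
    w = solve_linear(b, v)
    return sum(vp.conjugate() * wp for vp, wp in zip(v, w)).real / n_points


lam = [cmath.exp(0.3j), cmath.exp(1.1j)]               # two non-decaying harmonics
c = [lam[0] ** n + 0.5 * lam[1] ** n for n in range(16)]
a_true = [lam[0] * lam[1], -(lam[0] + lam[1]), 1.0]    # polynomial with roots lam
r_true = residual_norm(a_true, c)                      # vanishes for the exact coefficients
r_wrong = residual_norm([1.0, 0.0, 0.0], c)            # clearly positive for a poor guess
```

The value returned does not depend on the overall scale of a, so the normalization Σ|a.sub.k|.sup.2=1 only removes a redundant degree of freedom from the search.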

(76) Preferably, therefore, a method of harmonic inversion is used to find the set of autocorrelation coefficients. By this process, the most probable value for K is found, where the most probable value is taken to be when R.sup.(K) becomes substantially equal to the quantity representative of the noise.

(77) Having determined the harmonic component signals and their number K, and in doing so, determined the autocorrelation coefficients (the complex-value vector a=(a.sub.0 . . . a.sub.K)), the frequencies of the K-harmonic signals are determined by finding the roots λ.sub.k of the K-order polynomial equation
$\begin{matrix} a_{0} + a_{1} \lambda + a_{2} \lambda^{2} + \ldots + a_{K} \lambda^{K} = 0 & (13) \end{matrix}$

(78) The set of autocorrelation coefficients unambiguously defines the set of frequencies ω.sub.k in the windowed measured data and corresponding real-value frequencies f.sub.k in the larger data set, where λ.sub.k are the roots of equation (13), with formulas

(79) $\begin{matrix} \begin{matrix} \omega_{k} = - i \frac{N}{T} \ln \lambda_{k}, \\ f_{k} = \frac{s_{0} + N / 2}{T} + \frac{\mathrm{Im} \, \omega_{k}}{2 \pi} \end{matrix} & (14) \end{matrix}$

(80) The fact that the signal is K-harmonic and cannot be reduced to (K−1) harmonics, with autocorrelation coefficients a.sub.0 and a.sub.K non-zero and all frequencies ω.sub.k unique within the Nyquist band, ensures that all roots λ.sub.k are non-zero and unique.
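
A minimal sketch of the root-finding step follows. It uses the Durand-Kerner iteration, which is a choice made for this illustration and is not taken from the description; any polynomial root finder could be substituted. The function name, the starting points and the two test roots are assumptions. The sketch reports only the phase advance and the decay per sample point of each root, under the assumed convention that a root equals exp(i·phase − decay); conversion to frequencies in the larger data set additionally needs s.sub.0, N and T.

```python
import cmath


def polynomial_roots(a, iterations=200, tol=1e-13):
    """Roots of a[0] + a[1]*x + ... + a[K]*x^K by the Durand-Kerner iteration."""
    k = len(a) - 1
    monic = [coef / a[k] for coef in a]
    roots = [(0.4 + 0.9j) ** m for m in range(k)]      # conventional starting points
    for _ in range(iterations):
        largest_step = 0.0
        for i in range(k):
            value = 0j
            for coef in reversed(monic):               # Horner evaluation
                value = value * roots[i] + coef
            denom = 1 + 0j
            for j in range(k):
                if j != i:
                    denom *= roots[i] - roots[j]
            step = value / denom
            roots[i] -= step
            largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return roots


# Two slightly decaying harmonics (hypothetical values).
lam_true = [0.99 * cmath.exp(0.3j), 0.98 * cmath.exp(1.2j)]
a = [lam_true[0] * lam_true[1], -(lam_true[0] + lam_true[1]), 1.0]

found = sorted(polynomial_roots(a), key=cmath.phase)
phases = [cmath.phase(lam) for lam in found]           # radians per sample point
decays = [-cmath.log(abs(lam)).real for lam in found]  # decay rate per sample point
```

Because all roots are non-zero and unique, as stated above, each root yields exactly one harmonic and the logarithm is well defined.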

(81) If the harmonic signals are non-decaying, a particular case occurs. By setting additional constraints |a.sub.K−k/a.sub.k|=1, the following parameterization

(82) $\begin{matrix} \begin{matrix} a_{t} = \frac{b_{t} + i b_{K - t}}{\sqrt{2}}, \\ a_{K - t} = \frac{b_{t} - i b_{K - t}}{\sqrt{2}}, \\ a_{K / 2} = b_{K / 2} \end{matrix} & (15) \end{matrix}$
where index t runs in the range 0≤t<K/2 and (b.sub.0, . . . , b.sub.K) is a real-value vector that obeys the normalization condition |b|.sup.2=Σ.sub.k=0.sup.Kb.sub.k.sup.2=1, allows (10) to be rearticulated as
$\begin{matrix} R^{(K)} (b) = N^{- 1} b^{T} H (b) b, \quad H (b) = \mathrm{Re} \left\{ \overline{C'}^{T} \left[ \mathbb{A} (a (b)) \overline{\mathbb{A}} (a (b))^{T} \right]^{- 1} C' \right\} & (16) \end{matrix}$
where $\mathbb{A}$ is the Toeplitz matrix of equation (9) and the elements of the (N−K)×(K+1) matrix C′ are

(83) $\begin{matrix} \begin{matrix} C'_{p, t} = \frac{c_{p + t} + c_{p + K - t}}{\sqrt{2}}, \\ C'_{p, K - t} = i \frac{c_{p + t} - c_{p + K - t}}{\sqrt{2}}, \\ C'_{p, K / 2} = c_{p + K / 2} \end{matrix} & (17) \end{matrix}$
with 0≤t<K/2 (the last formula is only for even K).

(84) The amplitudes are found by minimization of the original norm of the residual expressed as R=(1/N)Σ.sub.n|c.sub.n−Σ.sub.kA.sub.k(λ.sub.k).sup.n|.sup.2, where λ.sub.k are the roots of the polynomial equation (13). Having determined K and λ.sub.k, the noise-free signal can be reconstructed as

(85) $\begin{matrix} c^{*} = \left( \begin{matrix} c_{0}^{*} \\ \vdots \\ c_{N - 1}^{*} \end{matrix} \right) = \Lambda A^{*}, \quad \Lambda = \left[ \begin{matrix} 1 & 1 & \cdots & 1 \\ \lambda_{0} & \lambda_{1} & \cdots & \lambda_{K - 1} \\ \lambda_{0}^{2} & \lambda_{1}^{2} & \cdots & \lambda_{K - 1}^{2} \\ \vdots & \vdots & & \vdots \\ \lambda_{0}^{N - 1} & \lambda_{1}^{N - 1} & \cdots & \lambda_{K - 1}^{N - 1} \end{matrix} \right] & (18) \end{matrix}$
where A* is the vector of amplitudes. A* is determined by minimizing the norm of residual (4):

(86) $\begin{matrix} R = \| c_{n} - c_{n}^{*} \| = \overline{A}^{T} M A - \overline{A}^{T} G - \overline{G}^{T} A + \| c_{n} \| & (19) \\ \text{where} \quad M = \frac{1}{N} \overline{\Lambda}^{T} \Lambda, \quad G = \frac{1}{N} \overline{\Lambda}^{T} \left( \begin{matrix} c_{0} \\ \vdots \\ c_{N - 1} \end{matrix} \right) & (20) \end{matrix}$

(87) The set of amplitudes that delivers the minimum to (19) appears as a solution to the system of linear equations ∂R/∂Ā=MA−G=0:
$\begin{matrix} A^{*} = M^{- 1} G, \quad R (A^{*}) = R^{(K)} = \| c_{n} \| - \overline{G}^{T} M^{- 1} G & (21) \end{matrix}$
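
Because the amplitudes enter the model linearly once the roots are known, they follow from a small K×K linear system. The sketch below forms M and G as the normal equations of the least-squares problem and solves them by Gaussian elimination. The helper names and the two-harmonic test signal are assumptions made for illustration; for strongly overlapping harmonics a better conditioned solver would be advisable.

```python
import cmath


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0j] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_amplitudes(signal, roots):
    """Return (A*, residual) for the model c*_n = sum_k A_k * roots[k]**n."""
    n_points, k = len(signal), len(roots)
    powers = [[lam ** n for lam in roots] for n in range(n_points)]   # N x K matrix
    m = [[sum(powers[n][i].conjugate() * powers[n][j] for n in range(n_points)) / n_points
          for j in range(k)] for i in range(k)]
    g = [sum(powers[n][i].conjugate() * signal[n] for n in range(n_points)) / n_points
         for i in range(k)]
    amplitudes = solve_linear(m, g)
    norm_c = sum(abs(x) ** 2 for x in signal) / n_points
    # Residual at the minimum: ||c|| - conj(G)^T M^-1 G.
    residual = norm_c - sum(gi.conjugate() * ai for gi, ai in zip(g, amplitudes)).real
    return amplitudes, residual


roots = [cmath.exp(0.3j), 0.99 * cmath.exp(1.1j)]
true_amplitudes = [1.0, 0.5j]
signal = [sum(A * lam ** n for A, lam in zip(true_amplitudes, roots)) for n in range(64)]
fitted, residual = fit_amplitudes(signal, roots)
```

For noise-free input the fitted amplitudes reproduce the true ones and the residual vanishes to rounding error; with noisy input the residual is the value R.sup.(K) that is compared against the noise range.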

(88) Fixing the phase is achieved by introducing an additional constraint on the amplitudes, A=B·e.sup.iφ, where B is real-valued and φ is the common phase.

(89) The method also allows for statistical assessment of the determined values for the amplitudes and frequencies, which provides qualitatively better data for further analysis of the mass spectrum. Reporting fidelity criteria for detected mass-to-charge peaks (such as confidence intervals for mass-to-charge ratios and abundances) significantly increases both specificity and selectivity of the informatics approaches that rely on mass spectra as an input.

(90) In a similar way to the evaluation of the number of harmonics in windowed measured data, the norm of the residual is treated as a random variable which parametrically depends on deviations. Any deviation δA.sub.k of the amplitude from its most probable value A*.sub.k necessarily increases the residual norm by the value δR≈r.sub.k|δA.sub.k|.sup.2>0; and any deviation δf.sub.k of the frequency from its most probable value f*.sub.k necessarily increases the residual norm by the value δR≈q.sub.k|δf.sub.k|.sup.2>0.

(91) The conditional cumulative probability function (under the condition that the number of harmonics is exactly K) is

(92) $\begin{matrix} \begin{matrix} P^{(K)} (\delta R) = \frac{\Gamma (N, N (R^{(K)} + \delta R) / \sigma^{2})}{\Gamma (N, N R^{(K)} / \sigma^{2})} \\ \approx 1 - \mathrm{erf} \left( \sqrt{\frac{N}{2}} \frac{\delta R}{\sigma^{2}} \right), \quad N \gg 1 \end{matrix} & (22) \end{matrix}$
and gives the probability that R exceeds R.sup.(K) by a value greater than δR. Correspondingly, the probability that δR>1.96σ.sup.2/√N is less than 0.05. The confidence intervals in which the harmonic's amplitude or frequency are found with 95% fidelity are estimated as

(93) $\begin{matrix} | \delta A_{k} | \leq \frac{\sigma}{N^{1 / 4}} \sqrt{\frac{1.96}{r_{k}}} \quad \text{and} \quad | \delta f_{k} | \leq \frac{\sigma}{N^{1 / 4}} \sqrt{\frac{1.96}{q_{k}}} & (23) \end{matrix}$
correspondingly. The coefficients r.sub.k and q.sub.k are found either analytically or numerically.
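
The arithmetic behind the 95% intervals can be checked with a short sketch. The function names are assumptions, and the `curvature` argument stands for r.sub.k or q.sub.k, which must be supplied from elsewhere since their evaluation is not detailed here.

```python
import math


def tail_probability(delta_r: float, n_points: int, sigma2: float) -> float:
    """Large-N approximation of (22): probability that R exceeds R^(K) by more than delta_r."""
    return 1.0 - math.erf(math.sqrt(n_points / 2.0) * delta_r / sigma2)


def confidence_half_width(sigma: float, n_points: int, curvature: float,
                          z: float = 1.96) -> float:
    """Half-width of the interval in (23): sigma / N^(1/4) * sqrt(z / curvature)."""
    return sigma / n_points ** 0.25 * math.sqrt(z / curvature)


n_points, sigma = 10000, 0.01
threshold = 1.96 * sigma ** 2 / math.sqrt(n_points)     # delta_R at the 95% level
tail = tail_probability(threshold, n_points, sigma ** 2)
half_width = confidence_half_width(sigma, n_points, curvature=1.0)
```

The tail probability evaluates to 1−erf(1.96/√2), which is just under 0.05 regardless of N and σ, and the half-width shrinks only as N.sup.−1/4, so quadrupling the precision requires a 256-fold longer window at fixed curvature.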

(94) In view of the above description, embodiments of the invention may be performed according to the schematic block diagram depicted in FIG. 4, which shows a workflow of an algorithm for performing the invention. FIG. 4A shows steps 2-50 of the algorithm and FIG. 4B shows steps 60-100 of the same algorithm. An FFT is taken of the transient measured data, 2. A first window is selected, 4a, with s.sub.0=0, the window being selected from the first N points of the Fourier transformed data.

(95) Windowed transient measured data

(96) $c_{n} = \frac{1}{N} \sum_{s = s_{0}}^{s_{0} + N - 1} F_{s} \exp \left[ 2 \pi i \left( s - s_{0} - \frac{N}{2} \right) \frac{t_{n}}{T} \right]$
is then first compiled from the larger measured data set, 6. The norm ‖c.sub.n‖=N.sup.−1σ.sup.−2Σ|c.sub.n|.sup.2 is calculated, 14a, and a decision is made, 14b: if the norm is greater than or equal to 1, it is taken that peaks are present in the windowed measured data and the procedure continues to 20; if not, the procedure passes to step 85, which will be described below.
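
Steps 6, 14a and 14b can be sketched as follows. The function names are hypothetical, and the choice t.sub.n=nT/N for the time points of the windowed data is an assumption of this sketch, since the sampling times are not restated at this point of the description.

```python
import cmath


def windowed_data(F, s0, n_points, T=1.0):
    """Compile c_n from Fourier coefficients F[s0 : s0 + n_points] (step 6)."""
    out = []
    for n in range(n_points):
        t_n = n * T / n_points                           # assumed time grid
        total = sum(F[s] * cmath.exp(2j * cmath.pi * (s - s0 - n_points / 2) * t_n / T)
                    for s in range(s0, s0 + n_points))
        out.append(total / n_points)
    return out


def normalised_power(c, sigma2):
    """Norm of step 14a: N^-1 * sigma^-2 * sum |c_n|^2."""
    return sum(abs(x) ** 2 for x in c) / (len(c) * sigma2)


# A single non-zero coefficient at the centre of the window (toy values).
F = [0j] * 16
F[6] = 8 + 0j
c = windowed_data(F, s0=2, n_points=8)
peaks_present = normalised_power(c, sigma2=0.5) >= 1.0   # decision of step 14b
```

In this toy case the coefficient sits at s=s.sub.0+N/2, so the windowed signal is constant and equal to 1, the norm is 2 and the window is passed on to step 20.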

(97) At step 20, a first value of K is selected which in the first iteration is, in this example, K:=1. The vector of autocorrelation coefficients a.sub.k is then calculated, 30, the values of a.sub.k being found which minimize the difference between the measured data and the model data set for the particular value of K, to give the measure of difference

(98) R.sup.(K)=‖c.sub.n−c*.sub.n‖=min(N.sup.−1ā.sup.TH(a)a). The measure of difference R.sup.(K) is then compared to the determined noise range, 40, and if the measure of difference R.sup.(K) lies within the noise range, the harmonic component signals and their number K, the measure of how many peaks are present in the windowed measured data, have been found, and the value of K is stored for future output, the procedure passing to step 60. If R.sup.(K) does not lie within the noise range, then a new value of K is selected, 50, in this case K:=K+1, and the new value of K is used in 30 to calculate a new value for R.sup.(K), and so on until the measure of difference R.sup.(K) does lie within the noise range and K has been found.

(99) At 60, peak frequencies are found by solving the K.sup.th-order polynomial a.sub.0+a.sub.1λ+a.sub.2λ.sup.2+ . . . +a.sub.Kλ.sup.K=0 for its roots λ.sub.k, giving frequencies in the larger measured data set

(100) $f_{k} = T^{- 1} \left( s_{0} + \frac{N}{2} - i N \ln \lambda_{k} \right) .$
The frequencies are stored for future output and the procedure passes to 70, whereupon the peak amplitudes A.sub.k are found to minimize the norm of the residual:

(101) $\begin{matrix} R^{(K)} = \| c_{n} - c_{n}^{*} \| \\ = \min \left\| c_{n} - \sum_{k = 1}^{K} A_{k} \exp (i 2 \pi f_{k} t_{n}) \right\| . \end{matrix}$

(102) The amplitudes thus found are again stored for future output and the procedure passes to 80, finding the confidence intervals for the frequencies and amplitudes just obtained at 60 and 70. The confidence intervals are stored for future output, and the procedure passes to 85, where a decision is taken as to whether or not the windowed measured data just analysed, or partly analysed if the step is arrived at from 14b, was the last windowed measured data set from the larger measured data set. If not the procedure passes to 4b, at which the next windowed measured data is chosen, s.sub.0:=s.sub.k and the procedure then passes to 6 once again.

(103) Once all windowed measured data sets have been processed in the above way, the procedure passes to 90 whereupon the previously stored data comprising K, the list of frequencies, amplitudes and their confidence intervals is output, and the procedure terminates, 100. It will be appreciated that the output may be of various formats. For example, the frequencies are typically translated into mass-to-charge ratios and the amplitudes into ion abundances for output. The preferred output comprises a measure of the number of different species of ions, mass-to-charge ratios of ions together with their abundances and terms indicating confidences in those values, e.g. as a mass spectrum.

(104) To further illustrate methods of the invention, synthetic test data of three sine waves together with added noise is analysed, and the results illustrated in FIGS. 5, 6, 7 and 8, at a succession of increasing noise powers, σ.sup.2, to enable comparison of the methods as the signal to noise alters. For each noise power, 1000 different random noise signals of the given noise power are used and in each of the 1000 cases the methods are applied to produce determined values of K, and the frequencies and amplitudes for each found K. This approach shows how the method performs for different random noise contributions at each signal to noise ratio.
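
Test data of this kind can be generated along the following lines. The sketch is an assumption-laden illustration: it uses complex exponentials rather than real sine waves so as to match the complex-valued model, places the harmonics at 0.5, 1.0 and 1.5 frequency bins with unit amplitudes, and draws the real and imaginary noise parts independently with deviation σ/√2 so that the noise power is σ.sup.2. The function name, window length and seed handling are hypothetical.

```python
import cmath
import random


def synthetic_transient(n_points, bins, amplitudes, sigma, seed=0):
    """Sum of non-decaying harmonics plus complex Gaussian noise of power sigma^2."""
    rng = random.Random(seed)
    scale = sigma / 2 ** 0.5
    data = []
    for n in range(n_points):
        clean = sum(A * cmath.exp(2j * cmath.pi * f * n / n_points)
                    for f, A in zip(bins, amplitudes))
        noise = complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale))
        data.append(clean + noise)
    return data


bins, amplitudes = (0.5, 1.0, 1.5), (1.0, 1.0, 1.0)
clean = synthetic_transient(4096, bins, amplitudes, sigma=0.0)
noisy = synthetic_transient(4096, bins, amplitudes, sigma=0.1, seed=1)
noise_power = sum(abs(x - y) ** 2 for x, y in zip(noisy, clean)) / len(clean)
```

Repeating the call with 1000 different seeds at each value of σ would reproduce the structure of the study described above, with each realisation passed through the determination of K, frequencies and amplitudes.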

(105) FIG. 5A shows the noiseless test signal plotted on an amplitude vs. frequency bin scale, which consists of three non-decaying harmonics separated by 0.5 FT bins each with amplitudes 1. FIG. 5B shows the conventional FT power spectrum which would be generated for the three sine waves which have differences in frequency below the Nyquist limit. The components are clearly not resolved in the FT spectrum.

(106) FIG. 6A is a plot of the resultant values calculated using the methods of the invention of amplitude vs. frequency for the 1000 different random noise cases, each of which has σ=0.001 (the lowest noise case). It can be seen that the method has distinguished three peaks, each having amplitudes close to 1.0, located at frequencies close to 0.5, 1.0 and 1.5, and though the figure cannot show this, for each of the 1000 cases three peaks were distinguished. The spread in the points plotted indicates the range of variation of the calculated amplitudes and frequencies due to the presence of the noise, slightly different values for amplitude and frequency being determined by the methods for different random noise cases, though all noise distributions had the same RMS noise deviations such that σ=0.001. FIG. 6B again shows the FT power spectrum of the noise-free signal together with error bars indicating the uncertainty which would be produced by the noise (hardly discernible).

(107) FIG. 7A and FIG. 8A are similar plots to that of FIG. 6A, and FIG. 7B and FIG. 8B are similar plots to that of FIG. 6B, but now the RMS noise deviations are such that σ=0.01 in FIGS. 7A-7B and σ=0.1 in FIGS. 8A-8B. As the noise increases relative to the signal, the method produces a wider range of amplitudes and frequencies as solutions. For σ=0.01 typically three peaks are still found by the method, but at σ=0.1 only two peaks are found in some of the 1000 runs. The range of amplitudes increases on increasing the noise, and at σ=0.1 is tending to be larger than 1.0. However the error bars on FIG. 8B indicate that the FT power spectrum would be extremely noisy under these conditions and, of course, would still only have detected one peak.

(108) FIG. 9A-FIG. 9D provide plots of the probability distribution of the number of K-harmonic signals found using the methods of the invention. For various different noise levels (σ ranging from 0.001 to 1 as indicated), but only one particular noise distribution out of the 1000 used above in relation to FIGS. 5-8, the value p(K), the probability of the number of harmonics, is plotted against K, the number of harmonics. As described earlier, p(K)=P(R.sup.(K))−P(R.sup.(K−1)). These graphs illustrate that whilst there are non-zero probabilities for an incorrect value of K to be found at all noise levels, the invented method, finding the most probable value for K, correctly finds K for σ=0.001 and σ=0.01. As the noise increases there is an increasing probability that a lower value of K will be found.

(109) FIG. 10A-FIG. 10D provide plots of the reduced residual vs. the number of harmonics. Here, the reduced residual is R.sup.(K)/σ.sup.2, hence when the reduced residual equals 1 the residual R.sup.(K)=σ.sup.2. FIG. 10 shows R.sup.(K)/σ.sup.2 vs. K for various values of σ, and it can be seen that for increasing noise, the difference between the residual and the noise is reduced over the range of K, indicating that it is more difficult to distinguish the difference between R.sup.(K) and the determined noise. FIG. 10 also shows graphically how the criterion for finding K is used. In a preferred method as already described, K is increased from 0 and R.sup.(K) is calculated repeatedly for different K until R.sup.(K) lies within the noise range. In a preferred method this is when R.sup.(K) first drops below the determined noise level. For σ=0.001 and σ=0.01, K is correctly found as R.sup.(K) first drops below the determined noise level at K=3.

(110) FIG. 11 schematically illustrates how the invented methods, by minimizing the norm of equation (4), effectively find the optimum location in the space of K-harmonic signals for each value of K so as to minimise the residual R.sup.(K). It can be seen how the residual R.sup.(K) is closest to the noise level (dotted circle) for the space of K-harmonic signals where K=3.

(111) A comparison of the methods of the invention with the FDM is shown in FIGS. 12A-12H under the additional constraint that the model signals have zero decay. A test signal again consisting of three sine waves having frequencies that differ by one half of a frequency bin was used, with added noise at four different signal-to-noise ratios (SNR). The test signals have frequencies at 0.5, 1.0 and 1.5 on the frequency bin scale (x), and each has amplitude 1.0 on the intensity scale (y). Once again, for each of the signal-to-noise ratios, 1000 different random noise data sets were added to the test signal, and the methods of the invention and the FDM were used to determine K, f and A in each case. FIGS. 12A, 12C, 12E and 12G show the results for the FDM, whilst FIGS. 12B, 12D, 12F and 12H show the results for the methods of the present invention.

(112) FIGS. 12A and 12B compare the different methods for SNR=100000. It can be seen from FIG. 12B that the invented method finds three peaks (i.e. it determines that K=3), and that those peaks have, to a high degree of precision and accuracy, amplitudes of 1.0 and are at frequency bins 0.5, 1.0 and 1.5. No extraneous peaks appear to have been detected. In contrast, even at this very high SNR, the results in FIG. 12A show that the FDM attributes a large number of extraneous peaks across the full frequency range plotted (−5.0 to +5.0), albeit with low amplitudes. At this SNR a further treatment of the FDM results is required to separate peaks which are due to signal from the spurious results of low amplitude. The FDM also finds three peaks, with amplitudes 1.0 and at frequency bin locations of 0.5, 1.0 and 1.5; however, the spread in values of amplitude and frequency is a little larger, i.e. the precision is lower, than that produced by the invented methods.

(113) At SNR=1000, a marked difference between the methods is very apparent. The FDM results, in FIG. 12C, again indicate a large number of extraneous peaks across the full frequency range, of low amplitude. Now, however, the uncertainty in the attribution of the three signal peaks has increased very significantly. For some of the 1000 different noise cases (all at SNR=1000) the FDM has found three peaks but with amplitudes ranging from near zero to 2.5. In some cases the FDM has only found two peaks that might be attributed to signal and in some cases only one. The precision is very poor compared to the results of the invented methods shown in FIG. 12D. The invented methods find K=3 at all times and that each signal peak has amplitude 1 and appears at frequency bins 0.5, 1.0 and 1.5 with vastly greater precision than is found by the FDM.

(114) As the SNR reduces further to SNR=100, FIG. 12E shows that the FDM predicts increasing amplitudes for the extraneous peaks spread across the frequency spectrum, and that there are, from visual inspection, just two peaks that might be attributed to signal, those peaks having amplitudes ranging from 1 to 2 and being at frequencies 0.5 and 1.5, i.e. the central signal peak is not predicted in almost all the 1000 runs. The results of the invented methods, in FIG. 12F, show three peaks are again found, having amplitudes ranging from 0.5 to 1.5 and being located close to the correct frequency bin locations. The precision has worsened. On rare occasions an extraneous peak of very low amplitude appears close to frequency bin 2. Visually, it would appear that the results in FIG. 12F are more helpful to the mass spectroscopist than those from the FDM even at a SNR which is ten times better.

(115) At the very poor SNR of 10, the FDM predicts, in FIG. 12G, a wide range of spurious peaks right across the frequency range which have a wide range of amplitudes, some greatly exceeding 1. Two signal peaks might be identified in some cases, but the amplitudes range from near zero to 8, and the frequency locations range from 0 to 2. The invented methods produce results plotted in FIG. 12H which suggest only two peaks at frequency bins 0.5 and 1.5, of amplitudes 1.2 to 2.2.

(116) It can be seen that the FDM produces many extraneous peaks at all SNR. To use the FDM in practical situations these must be distinguished from signal peaks, but it can be seen from FIG. 12 that the amplitudes of the spurious peaks vary significantly with the SNR and the range of amplitudes also varies, making the process of distinguishing more difficult. The results plotted in FIG. 12 show how the invented methods, by finding the most probable K taking into account a determined noise, do not thereby suffer the problem of spurious peaks which must in some way be later distinguished from signal peaks.

(117) Methods of the present invention may also be applied to data from other types of spectroscopic analysis such as, for example, nuclear magnetic resonance (NMR) and infrared spectroscopy. In NMR the relaxation of the spins of atomic nuclei after excitation with electromagnetic pulses is recorded, and the relaxation signals and their frequencies depend upon the surroundings of the nuclei, including the molecular structure. The observed spectroscopic frequencies (also called lines or peaks) are, for example, influenced by the coupling between adjacent nuclei, leading to frequency shifts and/or line splitting. Observed nuclear spins are typically those of hydrogen (1H), 13C and the less common 15N, 31P, 19F. The details of the method and the common methods of data evaluation are set out in various text books, including D. H. Williams and I. Fleming: Spectroscopic methods in organic chemistry, 4th ed., London 1989 (which additionally contains chapters on UV, visible and infrared spectroscopy).

(118) Whilst, as already described, in mass spectrometry such as FT-MS the detected frequencies are usually representative of the mass-to-charge of ions, the ions following periodic motion within the mass analyser, and frequency differences correspond to mass-to-charge differences, in NMR the frequencies are representative of spin relaxation frequencies (i.e. the difference between the various, possibly split, excited state and ground state nuclear spin energy levels) and the differences are indicative of nuclear spin coupling energies and various other effects of the surroundings of a nucleus that have influence on the energy levels.

(119) The frequency range of NMR signals, following heterodyning, is relatively small compared to the range encountered in mass spectrometry, and the methods of the present invention are well suited for decomposing spectral harmonic signals from such data, to directly provide chemical shifts and line broadening information.

(120) As with Fourier Transform and FDM, methods of the present invention may also be extended into higher dimensions, such as, for example, two dimensional NMR. While in the most simplistic way the chemical shifts and broadenings (i.e. frequencies and line-widths) may be determined on a per-spectrum basis and the data simply stacked in the additional dimension, preferably the data is directly handled in multiple dimensions, as it is for example in synthetic aperture radar (SAR) applications (Carrara et al. Spotlight Synthetic Aperture Radar, Boston 1995), or in conventional 2-D FT-NMR applications (see e.g. Peter Güntert, Volker Dötsch, Gerhard Wider and Kurt Wüthrich: Processing of multi dimensional NMR data with the new software PROSA; Journal of Biomolecular NMR, 2 (1992) 619-629). Examples of the extension to additional dimensions using the FDM are provided by Vladimir A. Mandelshtam, Howard S. Taylor, and A. J. Shaka: Application of the Filter Diagonalization Method to One- and Two-Dimensional NMR Spectra; Journal of Magnetic Resonance 133, 304-312 (1998), article no. MN981476. Such direct multidimensional processing has, inter alia, the advantage of better localization of the frequencies in the multiple dimensions and correct abstraction from or interpolation between the separate spectra. A further advantage is improved signal-to-noise ratio.

(121) Pre-processing using conventional fast Fourier transform methods may be used to guide the sectioning of the two-dimensional data for optimum processing by the method of the invention, e.g. aid the selection of rectangles in frequency and time to be processed together. It may also be used to control subsequent acquisitions within the same experiment, e.g. data dependent ion selection and/or fragmentation in mass spectrometry or e.g. re-acquisitions with adjusted settings (pulse sequences) in 2D-NMR spectrometry.

(122) Both the basic method of the invention and its extension to multiple dimensions may, in addition to the improved determination of frequencies (i.e. masses, IR spectral lines, chemical shifts, radar objects etc.) and intensities, be used as a means for optimal data compression in recorded data by retaining only the K identified frequency/intensity datasets, preferably together with aggregate information on the noise/background, such as the sigma used during the determination of K.

(123) Accordingly in another aspect the present invention provides a method of spectrometry comprising:

(124) providing measured data comprising a combination of periodic signals and noise measured over time using a spectrometer;

(125) determining a quantity representative of the noise in the measured data, and

(126) determining a model data set of K-harmonic component signals from the measured data;

(127) wherein the harmonic component signals and their number K are determined iteratively on the basis of: (i) using an initial value of K to calculate a minimised non-negative measure of difference R.sup.(K) between the measured data and model data comprising data sets of K-harmonic component signals, and (ii) if R.sup.(K) does not lie within a noise range based upon the quantity representative of noise, changing the value of K and recalculating R.sup.(K) as many times as necessary until R.sup.(K) does lie within the noise range;

(128) and deriving spectroscopic information from the model data set, the spectroscopic information comprising one or more of: a measure of the number of harmonic component signals; a measure of the frequencies of the harmonic component signals; a measure of the signal intensity of the harmonic component signals.
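
Purely as an illustrative sketch of steps (i) and (ii) above, the iteration over K may be written as below. Several points are assumptions of this sketch rather than features stated here: a Hankel-SVD shift-invariance estimator stands in for the minimisation of R.sup.(K); the data are complex-valued with white noise of per-sample variance sigma squared; K is only ever increased, starting from 1; and the noise range is taken as zero up to sigma squared times (N + 3 sqrt(N)). The names `fit_k_harmonics`, `select_order` and `synth_transient` are hypothetical.

```python
import numpy as np

def fit_k_harmonics(y, K):
    """Fit K damped complex exponentials; return poles z, amplitudes a, residual R."""
    n_samples = len(y)
    L = n_samples // 2
    # Hankel matrix H[i, j] = y[i + j]; its column space is shift-invariant.
    H = np.array([y[i:i + L + 1] for i in range(n_samples - L)])
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Uk = U[:, :K]
    # Eigenvalues of the shift operator are the poles z_k = exp(-decay + 2j*pi*f).
    z = np.linalg.eigvals(np.linalg.pinv(Uk[:-1]) @ Uk[1:])
    # Complex amplitudes by linear least squares on the Vandermonde matrix.
    V = z[np.newaxis, :] ** np.arange(n_samples)[:, np.newaxis]
    a, *_ = np.linalg.lstsq(V, y, rcond=None)
    R = float(np.sum(np.abs(y - V @ a) ** 2))  # non-negative measure of difference
    return z, a, R

def select_order(y, sigma, k_max=20):
    """Increase K until R^(K) falls within the assumed noise range."""
    n_samples = len(y)
    upper = sigma ** 2 * (n_samples + 3.0 * np.sqrt(n_samples))
    for K in range(1, k_max + 1):
        z, a, R = fit_k_harmonics(y, K)
        if R <= upper:
            return K, z, a, R
    raise ValueError("R^(K) did not reach the noise range for K <= k_max")

def synth_transient(freqs, decays, amps, n_samples, sigma, seed=0):
    """Synthetic test transient: damped complex harmonics plus white noise."""
    rng = np.random.default_rng(seed)
    n = np.arange(n_samples)
    z = np.exp(-np.asarray(decays) + 2j * np.pi * np.asarray(freqs))
    clean = (z[np.newaxis, :] ** n[:, np.newaxis]) @ np.asarray(amps, dtype=complex)
    # Real and imaginary parts each carry half of the noise variance.
    noise = rng.normal(scale=sigma / np.sqrt(2), size=(n_samples, 2)) @ np.array([1, 1j])
    return clean + noise
```

For example, `select_order(synth_transient([0.10, 0.13, 0.31], [0.002, 0.004, 0.003], [1.0, 0.5, 0.8], 256, 0.05), 0.05)` returns the selected K together with the poles and amplitudes, from which frequencies follow as `np.angle(z) / (2 * np.pi)` and decay rates as `-np.log(np.abs(z))`.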

(129) The method of spectrometry may comprise a method of mass spectrometry, a method of NMR spectroscopy, or a method of infrared spectroscopy. The spectroscopic information from the model data set in NMR spectroscopy methods may further comprise resonance frequencies, chemical shifts and intensity (abundance) information concerning the nuclei. The spectroscopic information from the model data set in infrared spectroscopy methods may further comprise absorption frequencies and intensity (abundance) information concerning chemical groups.

(130) The present invention still further provides a method of data compression comprising decomposing measured data comprising signal and noise measured over time using a spectrometer into a sum of K harmonic component signals and a noise component, wherein the harmonic component signals and their number K are derived from the measured data and a determined quantity representative of the noise in the measured data.
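
A minimal sketch of such compression is given below, assuming that the K components have already been determined as complex poles `z` (one per component, encoding frequency and decay) and complex amplitudes `a`. The record layout and the function names `compress`, `reconstruct` and `compression_ratio` are hypothetical, and the ratio is counted in stored real numbers against a complex-valued transient.

```python
import numpy as np

def compress(z, a, sigma, n_samples):
    """Keep only the K harmonic components plus aggregate noise information."""
    return {
        "n_samples": int(n_samples),
        "sigma": float(sigma),            # noise level used when determining K
        "z": np.asarray(z, dtype=complex),
        "a": np.asarray(a, dtype=complex),
    }

def reconstruct(record):
    """Rebuild the noise-free model transient sum_k a_k * z_k**n."""
    n = np.arange(record["n_samples"])
    V = record["z"][np.newaxis, :] ** n[:, np.newaxis]
    return V @ record["a"]

def compression_ratio(record):
    """Real numbers in a complex transient divided by real numbers stored."""
    K = len(record["z"])
    return 2 * record["n_samples"] / (4 * K + 1)
```

The noise itself is not recoverable from such a record; only its aggregate level `sigma` is retained alongside the model.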

(131) As used herein, including in the claims, unless the context indicates otherwise, singular forms of the terms herein are to be construed as including the plural form and vice versa.

(132) Throughout the description and claims of this specification, the words "comprise", "including", "having" and "contain" and variations of the words, for example "comprising" and "comprises" etc., mean "including but not limited to", and are not intended to (and do not) exclude other components.

(133) It will be appreciated that variations to the foregoing embodiments of the invention can be made while still falling within the scope of the invention. Each feature disclosed in this specification, unless stated otherwise, may be replaced by alternative features serving the same, equivalent or similar purpose. Thus, unless stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.

(134) The use of any and all examples, or exemplary language ("for instance", "such as", "for example" and like language) provided herein, is intended merely to better illustrate the invention and does not indicate a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.

(135) Any steps described in this specification may be performed in any order or simultaneously unless stated or the context requires otherwise.

(136) All of the features disclosed in this specification may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. In particular, the preferred features of the invention are applicable to all aspects of the invention and may be used in any combination. Likewise, features described in non-essential combinations may be used separately (not in combination).

CPC classification: H01J49/0036 (ELECTRICITY); G01R33/4625 (PHYSICS); H01J49/38 (ELECTRICITY); G06F2218/10 (PHYSICS)

International classification: G01N15/00 (PHYSICS); H01J49/38 (ELECTRICITY); G01N15/10 (PHYSICS); H01J49/00 (ELECTRICITY); G01N15/06 (PHYSICS); G06K9/00 (PHYSICS); G01N15/02 (PHYSICS)
