Methods and apparatus for obtaining enhanced mass spectrometric data
10840073 · 2020-11-17
Assignee
Inventors
CPC classification
H01J49/0036
ELECTRICITY
G01R33/4625
PHYSICS
International classification
G01N15/00
PHYSICS
Abstract
A method comprising decomposing mass spectrometry data, especially of ion species that undergo multiple direction changes in a periodic manner, the data comprising signal and noise measured over time, into a sum of K harmonic component signals and a noise component, wherein the harmonic component signals and their number K are derived from the data and a determined quantity representative of the noise. The harmonic component signals and their number K may be determined iteratively on the basis of: using an initial value of K to calculate a minimised non-negative measure of difference R.sup.(K) between the measured and model data comprising data sets of K-harmonic component signals, and if R.sup.(K) does not lie within a noise range based on the quantity representative of the noise, changing the value of K and recalculating R.sup.(K) until R.sup.(K) lies within the noise range. Mass spectral information may be derived from the model data set.
Claims
1. A method of operating a mass spectrometer comprising an Orbital Trap mass analyzer or a Fourier Transform Ion Cyclotron Resonance (FT-ICR) mass analyzer, the method comprising: generating a plurality of species of ions using an ion source of the mass spectrometer, the species being within a range of mass-to-charge ratios; introducing the plurality of species into the Orbital Trap or FT-ICR mass analyzer such that the plurality of species undergo periodic motion within a range of frequencies within the Orbital Trap or FT-ICR mass analyzer; acquiring transient data comprising signal and noise measured over time that has been obtained from detection of an image current generated by the periodic motions of the plurality of species of ions within the Orbital Trap or FT-ICR mass analyzer; determining a quantity representative of the noise from the acquired transient data; determining a noise range based upon the quantity representative of the noise; determining a model data set of K-harmonic component signals from the acquired transient data, each component signal comprising a complex-valued amplitude that includes phase information and a complex-valued frequency that includes time decay information, wherein the harmonic component signals and their number K are determined iteratively on the basis of: using an initial value of K to calculate a minimized nonnegative measure of difference R.sup.(K) between the acquired transient data and model data comprising data sets of K-harmonic component signals; and if R.sup.(K) does not lie within the noise range, changing the value of K and recalculating R.sup.(K) as many times as necessary until R.sup.(K) does lie within the noise range; generating mass spectral information about the ion species from the model data set, the mass spectral information comprising, for each of the K harmonic component signals, a mass-to-charge ratio and an estimate of a number of ions of a respective corresponding species of ions; and controlling 
subsequent acquisitions of the mass analyzer using the generated mass spectral information, wherein either a resolution of the mass-to-charge ratios of the generated mass spectral information utilized for controlling the subsequent acquisitions is greater than a resolution of mass-to-charge ratios calculated by a Fast Fourier Transform method, or the mass-to-charge ratios of the generated mass spectral information have fewer artifacts relative to mass spectral information generated using a Filter Diagonalization method.
2. The method according to claim 1 wherein the measure of difference R.sup.(K) comprises a minimized normalized sum of residuals between the acquired transient data and the model data at a plurality of data points.
3. The method according to claim 1 wherein R.sup.(K) is recalculated for increasing values of K starting from an initial value of 0.
4. The method according to claim 1 wherein R.sup.(K) is recalculated for decreasing values of K starting from an initial value.
5. The method according to claim 1 wherein an initial value for K is determined from a number of peaks in the frequency domain spectrum of the acquired transient data.
6. The method according to claim 1 wherein the value of K is changed and R.sup.(K) is recalculated until the value of K is the minimum value of K for which R.sup.(K) is less than, or is equal to, the quantity representative of the noise.
7. The method according to claim 1 wherein the value of K is changed and R.sup.(K) is recalculated until R.sup.(K) becomes the closest value to the quantity representative of the noise.
8. The method according to claim 1 wherein the quantity representative of the noise comprises a noise power and the noise power is determined by a method comprising one or more of: evaluating the noise power from the acquired transient data; evaluating the noise power from a previous or another set of data acquired from the mass analyzer; measuring characteristics of preamplifiers used in the data measuring apparatus of the mass analyzer; setting a noise power on the basis of prior knowledge of the mass analyzer.
9. The method according to claim 1 wherein the model data set comprises a harmonic signal which may be described by a sum of K complex exponential terms each multiplied by complex amplitudes, and the K harmonic signals are derived assuming the harmonic signal possesses autocorrelative properties.
10. The method according to claim 1 wherein the measure of difference R.sup.(K) is described by a term or terms involving:
11. The method according to claim 1, wherein the generating of the mass-to-charge ratios of the species of ions includes determining the mass-to-charge ratios of the K species of ions by performing the steps of: deriving a set of autocorrelation coefficients, a, relating terms $c^*_n$ according to $a_0 c^*_n + a_1 c^*_{n+1} + \ldots + a_K c^*_{n+K} = 0$, where $c^*_n$ is the K-harmonic signal at acquired data points in the model data set; combining the autocorrelation coefficients, a, in a polynomial equation of the form $a_0 + a_1\lambda + a_2\lambda^2 + \ldots + a_K\lambda^K = 0$, where $\lambda$ is a complex number; deriving the frequencies of the K harmonic signals from the roots, $\lambda_k$, of the polynomial equation; and translating each of the K frequencies of the K harmonic signals from the frequency to the mass-to-charge domain.
12. The method according to claim 11, wherein the generating of mass spectral information about the ion species includes determining an estimate of the number of ions of each species within the Orbital Trap or FT-ICR mass analyzer, wherein the number of ions of each species is determined from the amplitudes of the K-harmonic signals, the amplitudes being found by minimization of the residual R, where R is of the form $R = (1/N)\sum_n |c_n - \sum_k A_k (\lambda_k)^n|^2$.
13. The method according to claim 1 wherein the acquired transient data corresponds to periodic motions of ions of a limited range of mass-to-charge ratios selected from a larger range of mass-to-charge ratios, said larger range of mass-to-charge ratios corresponding to a larger transient data set.
14. The method of claim 13 wherein the transient data corresponding to the restricted range of mass-to-charge ratios is selected from the larger transient data set by a method comprising: obtaining a frequency spectrum of the larger transient data set to form a transformed data set; selecting a range of frequencies in the frequency domain spectrum of the transformed data set to form a transformed data subset, and; transforming the transformed data subset back into the time domain to form the acquired transient data.
15. The method according to claim 1, wherein generating of mass spectral information about the ion species from the model data set includes deriving a mass spectrum from the model data set comprising a set of K harmonic component signals.
16. A mass spectrometer system including an Orbital Trap or FT-ICR mass analyzer comprising: an ion source that, in operation, generates a plurality of species of ions within a range of mass-to-charge ratios, each species having a different mass-to-charge ratio, wherein the Orbital Trap or FT-ICR mass analyzer receives the generated plurality of species of ions; and a computer electronically coupled to the Orbital Trap or FT-ICR mass analyzer so as to receive, from the Orbital Trap or FT-ICR mass analyzer, acquired transient data comprising signal and noise measured over a time duration and acquired from an image current generated by the Orbital Trap or FT-ICR mass analyzer as a result of periodic motions of the plurality of species of ions within a range of frequencies within the mass analyzer over the time duration, wherein the computer is configured to: determine a quantity representative of the noise from the acquired transient data; determine a noise range based upon the quantity representative of the noise; determine a model data set of K-harmonic component signals from the acquired transient data, each component signal comprising a complex-valued amplitude that includes phase information and a complex-valued frequency that includes time decay information, wherein the harmonic component signals and their number K are determined iteratively by: using an initial value of K to calculate a minimized non-negative measure of difference R.sup.(K) between the acquired transient data and model data comprising data sets of K-harmonic component signals, and if R.sup.(K) does not lie within the noise range, changing the value of K and recalculating as many times as necessary until R.sup.(K) does lie within the noise range; generate mass spectral information about the ion species from the model data set, the mass spectral information comprising, for each of the K harmonic component signals, a mass-to-charge ratio and an estimate of a number ions of a respective 
corresponding species of ions; and control subsequent acquisitions of the mass analyzer using the generated mass spectral information, wherein either a resolution of the mass to charge ratios of the generated mass spectral information utilized for controlling the subsequent acquisitions is greater than a resolution of mass-to-charge ratios calculated by a Fast Fourier Transform method, or the mass-to-charge ratios of the generated mass spectral information have fewer artifacts relative to mass spectral information generated from a Filter Diagonalization method.
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
(16) .sup.K for each value of K so as to minimise the residual R.sup.(K).
DETAILED DESCRIPTION OF THE INVENTION
(26) The measured data may have been measured immediately preceding the application of the method, or it may have been measured at any preceding time. The measured data may have been measured at the location at which the method is performed, or it may have been measured at some distant location. Consequently the method may be applied to data measured before the present invention had been made and it may be applied to data taken using a mass analyser at any remote location. Accordingly the method of the present invention does not necessarily include the step of measuring the measured data since the measured data may have been acquired earlier and/or elsewhere.
(27) It will be appreciated that step 10, determining a quantity representative of the noise, may be performed before or after step 20 (choosing an initial value of K), and before or after step 30 (calculating R.sup.(K)).
(29) The measured data comprises N data points $c_n$, where $c_n$ further comprises noise components $\xi_n$. $\xi_n$ represents additive noise with spectral noise power $v(f) = v_0$ over the frequency window corresponding to the range of mass-to-charge ratios, the RMS deviations of which are $\sqrt{\langle|\xi_n|^2\rangle} = \sigma = \sqrt{v_0 N/T}$. In this example the quantity representative of the noise is the noise power $\sigma^2$. The spectral noise power $v(f) = v_0$ is determined, 10, and the noise power $\sigma^2$ is determined from it; the additive noise components $\xi_n$ are unknown. Preferably in the methods of the present invention the spectral noise power is substantially constant, and in the present embodiment is assumed to be constant over the frequency window corresponding to the range of mass-to-charge ratios.
(30) The quantity representative of the noise may be determined by one or more of: measuring the quantity representative of the noise from the measured data; measuring the quantity representative of the noise from a previous set of measured data derived from the mass analyser; measuring characteristics of preamplifiers used in the data measuring apparatus of the mass analyser; setting a quantity representative of the noise on the basis of prior knowledge of the mass analyser.
(31) One preferred method of measuring the noise power from a previous set of measured data derived from the mass analyser comprises calculating the L2 norm of a calibration transient that is detected and digitized with no ions directed into the mass analyser.
(32) A preferred method of measuring the quantity representative of the noise from the measured data itself, which may be performed on the frequency spectrum (i.e. after an FFT has been performed), comprises the steps of:
(33) (a) calculating an average intensity of all the measured data;
(34) (b) calculating the standard deviation of the intensity of all the measured data;
(35) (c) calculating a first noise threshold on the basis of the average (avg) and standard deviation (sigma) calculated, preferably as avg + 0.3·sigma;
(36) (d) selecting a first set of points from the measured data on the basis that they have lower intensities than the first noise threshold;
(37) (e) calculating the average intensity of the first set of points (avg1);
(38) (f) calculating the standard deviation of the intensity of the first set of points (sigma1);
(39) (g) calculating a second noise threshold on the basis of the average (avg1) and standard deviation (sigma1) calculated, preferably as avg1 + 0.3·sigma1;
(40) (h) selecting a second set of peaks from the measured data on the basis that they have lower intensities than the second noise threshold.
(41) The second set of peaks comprise noise and having been thus separated from peaks which are considered to be signal, the quantity representative of noise may be calculated from the second set of peaks.
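Steps (a) to (h) can be sketched in Python as follows. This is only an illustrative sketch: the function names, the use of the population standard deviation, the synthetic test spectrum, and the definition of the noise power as the mean squared intensity of the retained points are assumptions made for the example and are not taken from the description.

```python
import random
import statistics


def estimate_noise_points(intensities, factor=0.3):
    """Two-pass thresholding that follows steps (a)-(h); `factor` is the 0.3 multiplier."""
    avg = statistics.fmean(intensities)                      # (a)
    sigma = statistics.pstdev(intensities)                   # (b), population form assumed
    threshold1 = avg + factor * sigma                        # (c)
    first_set = [x for x in intensities if x < threshold1]   # (d)
    if not first_set:
        return []  # degenerate input, e.g. all intensities equal
    avg1 = statistics.fmean(first_set)                       # (e)
    sigma1 = statistics.pstdev(first_set)                    # (f)
    threshold2 = avg1 + factor * sigma1                      # (g)
    return [x for x in intensities if x < threshold2]        # (h)


def noise_power(noise_points):
    """Mean squared intensity of the points classified as noise (assumed definition)."""
    return sum(x * x for x in noise_points) / len(noise_points)


# Synthetic magnitude spectrum: 1000 noise bins plus five strong signal peaks.
random.seed(0)
spectrum = [abs(random.gauss(0.0, 1.0)) for _ in range(1000)] + [50.0] * 5
noise_points = estimate_noise_points(spectrum)
estimated_power = noise_power(noise_points)
```

Because only the lowest-intensity points survive both thresholds, an estimate formed this way is biased low relative to the true noise power; how the description compensates for that, if at all, is not stated.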
(42) In this example, a noise range is determined, 11, based upon the quantity representative of the noise.
(43) Steps 10 and 11, determining a quantity representative of the noise, and determining a noise range, may be performed at any stage prior to 40, the comparison between the measure of difference R.sup.(K) and the quantity representative of the noise.
(44) The model data comprising data sets of noiseless K-harmonic component signals forms a total model data set $c^*_n$
(45) $c^*_n = \sum_{k=1}^{K} A_k \exp(2\pi i f_k t_n)$ (1)
where $t_n = N^{-1} n T$, $n = 0 \ldots N-1$, such that $c_n = c^*_n + \xi_n$.
(46) The number of harmonics K, complex value amplitudes A.sub.k and complex-value frequencies f.sub.k are to be determined, and may be obtained by methods of the present invention as will be further described. The sought amplitudes and frequencies are complex and include, therefore, phase and decay information correspondingly. The measured data is preferably recorded at substantially constant time periods, T/N.
(47)
(48)
are performed a number of times until the difference of the autocorrelation coefficients on two subsequent iterations is smaller than a certain value $\varepsilon_1$, 33. The value of the residual $R_i$ and its gradient $\nabla R_i$ are then calculated, 34, and the residual values on two subsequent iterations are compared, 35. The quasi-Newton iterations, 36, are performed in accordance with the formulas
(49)
until the minimum of $R^{(K)}$ is approached with a certain accuracy $\varepsilon_2$. The value of $R^{(K)}$ is assumed equal to the residual norm on the last iteration, 37.
(50) It may be convenient to perform the methods of the present invention using measured data comprising data relating to a limited range of mass-to-charge ratios, both to reduce computational complexity and to ensure that the spectral noise power is substantially constant over the range of mass-to-charge ratios. Therefore, optionally, the range of mass-to-charge ratios may be limited and selected from a larger data set. Preferably the range of mass-to-charge ratios is limited and is selected from a larger data set by a method comprising: obtaining a frequency spectrum of the larger measured data set to form a transformed data set, for example by taking a Fourier transform of the larger measured data set; selecting a range of frequencies in the frequency domain spectrum of the transformed data set to form a transformed data subset; and transforming the transformed data subset back into the time domain to form the measured data. A schematic representation of an example of part of this process is shown in the drawings. A window ($s = s_0 \ldots s_0+N-1$) that contains N Fourier transform bins is taken from a fast Fourier transformation (FFT) $FF_s$ of the FTMS measured data. The reversed Fourier image in the time domain is:
(51)
(52) The frequency content of which is limited to the measured frequencies $f \in [s_0/T \ldots (s_0+N)/T]$, which are shifted by a constant negative offset $\Delta f = -(s_0+N/2)/T$ to fit the Nyquist frequency band $f' = f + \Delta f \in [-N/2T \ldots N/2T]$.
(53) The following detailed example of the methods of the present invention will utilize this optional windowed data set, i.e. the measured data comprises a range of mass-to-charge ratios which has been limited and selected from a larger data set. It will be appreciated that whilst this option has been chosen in order to give an example in which a limited data set has been chosen from a larger data set, the principles that follow apply equally if the whole data set had been utilized, as long as the spectral power of the noise remains substantially constant over the data set used.
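The windowing procedure (transform, select a block of frequency bins, transform back with a frequency shift) can be illustrated with a small Python sketch. The naive transform, the sign convention of the forward and inverse kernels, and the normalisation by the length of the larger data set are assumptions chosen for the example; the names `dft` and `window_transient` are hypothetical.

```python
import cmath


def dft(samples, sign):
    """Naive discrete Fourier transform; sign=-1 is used here as the forward kernel."""
    n = len(samples)
    return [
        sum(x * cmath.exp(sign * 2j * cmath.pi * s * m / n) for m, x in enumerate(samples))
        for s in range(n)
    ]


def window_transient(transient, s0, n_bins):
    """Keep bins s0..s0+n_bins-1 and return the shorter, frequency-shifted transient.

    Bin s of the larger data set is mapped to index (s - s0 - n_bins/2), which
    corresponds to the constant negative offset -(s0 + N/2)/T in frequency.
    """
    n_full = len(transient)
    spectrum = dft(transient, sign=-1)
    window = spectrum[s0:s0 + n_bins]
    half = n_bins // 2
    return [
        sum(w * cmath.exp(2j * cmath.pi * (j - half) * n / n_bins)
            for j, w in enumerate(window)) / n_full
        for n in range(n_bins)
    ]


# A unit-amplitude tone sitting exactly on bin 19 of a 64-point transient.
n_full = 64
transient = [cmath.exp(2j * cmath.pi * 19 * m / n_full) for m in range(n_full)]
windowed = window_transient(transient, s0=16, n_bins=8)
# Bin 19 maps to shifted index 19 - 16 - 4 = -1 inside the 8-point window.
expected = [cmath.exp(-2j * cmath.pi * n / 8) for n in range(8)]
```

In this toy case the 64-point transient is reduced to 8 points, and the tone reappears inside the Nyquist band of the shorter record at the shifted frequency.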
(54) Accordingly, the windowed measured data $c_n$ is assumed to be the K-harmonic data set $c^*_n$ corrupted by white Gaussian noise $\xi_n$: $c_n = c^*_n + \xi_n$, where
(55) $c^*_n = \sum_{k=1}^{K} A_k \exp(2\pi i\,\omega_k t_n)$ (3)
with complex amplitudes $A_k$ and frequencies $\omega_k$. The real parts of the frequencies are restricted within the band $-N/2T < \mathrm{Re}\,\omega_k \le N/2T$, thus eliminating the Nyquist uncertainty. (The space of K-harmonic signals is further denoted as $\mathcal{H}^K$.)
(56) Optionally, a step 14 in
(57) A measure of difference R.sup.(K) between the measured data and model data comprising data sets of K-harmonic component signals may be represented as:
(58) $R^{(K)} = \min_{c^*_n} \frac{1}{N} \sum_{n=0}^{N-1} |c_n - c^*_n|^2$ (4)
where K is the number of K-harmonic signals in the model data set, and K is the measure of how many different species of ions are or were present within a mass analyser when the measured data was acquired and within a range of mass-to-charge ratios, each species having a different mass-to-charge ratio. Other forms of difference R.sup.(K) may be used but preferably the form of equation (4) is used and will be used in this example, being a minimised normalized sum of residuals between the measured data and the model data at a plurality of data points. The measure of difference R.sup.(K) is preferably minimised, i.e. is the minimum value, for each given value of K as described in more detail below. In other words, for each K, R.sup.(K) is determined as the minimum norm of the difference between the signal c.sub.n and any possible K-harmonic signal c*.sub.n.
(59) Accurate estimation of the number of mass peaks plays a vital role in the performance of this or any other method operating under noisy conditions typically found in FTMS data. Since the number of harmonics cannot be determined exactly for noisy measured data, this method evaluates the statistically most probable value of K. On increasing K, R.sup.(K) tends to zero, as more and more harmonic signals are added to the model data set and the model data set may more closely match the measured data. Indeed when K=N the model data set can equal the measured data, including the noise components within the measured data, i.e. such that the difference R.sup.(K)=0 when K=N. It may be shown that when K=N/2 a combination of K-harmonic signals may also be made to equal the measured data, as in the prior art FDM. However, unlike prior art methods, in the present invention K is restricted so that the K-harmonic model data set does not model significantly more than the signal component of the measured data, and in this way it distinguishes signal from noise. Methods of the invention use an initial value of K to calculate a value for R.sup.(K) and this value is compared to a noise range based upon the determined quantity representative of the noise. If R.sup.(K) does not lie within the noise range, the value of K is changed and R.sup.(K) recalculated. This process is repeated as many times as necessary until R.sup.(K) does lie within the noise range, thereby finding the most probable value of K and the harmonic component signals. This process ensures that the K-harmonic data set thus formed will substantially only model the signal component of the measured data.
(60) Hence an initial value for K is chosen, 20 in
(61) If the information within the measured data set is previously unprocessed, it is computationally efficient to start the process with an initial value for K of zero, and increase K from that value, as the data from mass analysers when operating at high resolving power is sparse. Alternatively, if the measured data is first processed, an initial value for K may be determined from a number of peaks in the frequency domain spectrum of the measured data. If the measured data is first processed by, for example, taking a Fourier transform of the measured data, then an initial value for K may be determined from a number of peaks in the frequency domain spectrum of the transformed data thus found. It will be appreciated that K may, alternatively, be decreased from an initial value (where the initial value is greater than zero). K may be decreased from an initial value which is less than N/4 as K will usually be very much smaller than N in the method of the present invention, because of the sparse nature of the data from mass analysers operating at high resolving powers, and, significantly, because the method of the invention does not seek to fit harmonics to noise, and then subsequently distinguish noise results from valid ion signals as is a feature of some prior art methods. Rather the method seeks to fit just enough K-harmonic signals to the data so as to avoid the noise and this approach thereby preferably finds just enough K-harmonics as there are ion species within the range of mass-to-charge ratios in the sample of measured data.
(62) Accordingly the value of K is changed and R.sup.(K) is recalculated, preferably until R.sup.(K) falls within the noise range; and/or until R.sup.(K) is just less than, or is equal to, the quantity representative of the noise; and/or until R.sup.(K) becomes the closest value to the quantity representative of the noise. In practice this can be achieved simply by determining K as the minimum value for which the measure of difference R.sup.(K) equals or preferably becomes less than the quantity representative of the noise, which typically means just less than the quantity representative of the noise.
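The search for K described above, in its increasing-K form, can be sketched as a simple loop. In this sketch the function `find_k`, its parameters, and the table of residual values are hypothetical; in particular, `residual(k)` stands in for whatever procedure returns the minimised measure of difference for a given K, and the numbers in the table are made up purely to exercise the loop.

```python
from typing import Callable, Tuple


def find_k(residual: Callable[[int], float], noise_power: float,
           k_max: int, k_start: int = 0) -> Tuple[int, float]:
    """Increase K from k_start until the minimised residual R(K) is <= noise_power.

    Returns the smallest such K together with its residual.
    """
    for k in range(k_start, k_max + 1):
        r_k = residual(k)
        if r_k <= noise_power:
            return k, r_k
    raise ValueError("no K up to k_max brings R(K) within the noise range")


# Made-up residuals for K = 0..4; they decrease as harmonics are added.
residual_table = {0: 9.0, 1: 4.1, 2: 1.3, 3: 0.95, 4: 0.90}
best_k, best_residual = find_k(residual_table.__getitem__, noise_power=1.0, k_max=4)
```

With a noise power of 1.0 the loop stops at K=3, the first value whose residual no longer exceeds the noise; K=4 would fit part of the noise and is never evaluated.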
(63) Direct numerical evaluation of equation (4) with respect to the complex amplitudes and frequencies $A_k$, $\omega_k$ is an essentially nonlinear problem which has no robust solution, given the non-convexity of the norm and the large number of local minima resulting from the oscillating nature of the fitting function, equation (3). However, methods of the present invention preferably utilize the property that the signal component of the measured data (i.e. not including the noise) possesses autocorrelative properties, which means that each successive value of intensity in the signal component of a measured data set (such as a transient) can formally be represented as a linear combination of the signal component of the measured data at previous time-points. Accordingly the K-harmonic signal may be written:
$c^*_n = \alpha_0 c^*_{n-K} + \alpha_1 c^*_{n-K+1} + \ldots + \alpha_{K-1} c^*_{n-1}$ (5)
where $\alpha_0, \ldots, \alpha_{K-1}$ are autocorrelation coefficients.
(64) It will be appreciated that the autocorrelative properties also mean that a preceding value of intensity in a measured data set can also formally be represented as a linear combination of the measured data at succeeding time-points, and a similar equation could be written to express that without departing from the present invention. In this example, equation (5) will be used.
(65) Preferably the model data set comprises an harmonic signal which may be described by a sum of K complex exponential terms each multiplied by complex amplitudes, such as is set out in equations (1) and (3), and the K harmonic signals are derived assuming the harmonic signal possesses autocorrelative properties as described by equation (5).
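The autocorrelative property can be checked numerically for a small case. The sketch below builds a two-harmonic signal from two complex exponentials and verifies that every sample is a fixed linear combination of the two preceding samples. The specific amplitudes and decay rates are arbitrary, and the closed-form coefficients (minus the product and plus the sum of the per-sample factors) are the standard result for K=2 rather than something stated in the description.

```python
import cmath


def harmonic_signal(amplitudes, factors, n_points):
    """Sum of complex exponentials: c_n = sum_k A_k * factor_k ** n."""
    return [
        sum(a * lam ** n for a, lam in zip(amplitudes, factors))
        for n in range(n_points)
    ]


def prediction_coefficients(factors):
    """Coefficients (alpha_0, alpha_1) of c_n = alpha_0*c_{n-2} + alpha_1*c_{n-1} for K = 2."""
    lam1, lam2 = factors
    return -lam1 * lam2, lam1 + lam2


# Two decaying harmonics; each factor is exp(-decay + i*phase_step) per sample.
factors = (cmath.exp(complex(-0.01, 0.9)), cmath.exp(complex(-0.02, 1.7)))
amplitudes = (1.0 + 0.5j, 0.3 - 0.2j)
signal = harmonic_signal(amplitudes, factors, 32)

alpha0, alpha1 = prediction_coefficients(factors)
max_error = max(
    abs(signal[n] - (alpha0 * signal[n - 2] + alpha1 * signal[n - 1]))
    for n in range(2, len(signal))
)
```

The coefficients depend only on the frequencies and decay rates, not on the amplitudes, which is why the recurrence can be used to separate the nonlinear frequency search from the linear amplitude fit.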
(66) As mentioned, the method of the invention effectively involves a form of estimating probabilities for different numbers K of individual harmonics with non-zero amplitudes in the model data set and finds the most probable number. Both real and imaginary parts of $\xi_n$ are taken to be independent values, normally distributed with mean-square deviations $\sigma/\sqrt{2}$. Therefore $R = \lVert c_n - c^*_n \rVert^2 = \lVert \xi_n \rVert^2$ is statistically distributed as $\chi^2$ with 2N degrees of freedom, having the probability density function
(67) $p(R) = \frac{N^N R^{N-1}}{\Gamma(N)\,\sigma^{2N}} \exp\left(-\frac{N R}{\sigma^2}\right)$ (6)
and a corresponding cumulative probability can be expressed through the incomplete Euler gamma function as
(68)
(69) The value $P(R^{(K)})$ gives the probability for the number of harmonics in the noise-free model data $\{c^*_n\}$ to assume a value less than or equal to K. The probability for the number of harmonics to take the value K exactly is $p_K = P(R^{(K)}) - P(R^{(K-1)})$. The most probable value of K, which provides the highest fidelity approximation of the actual number of harmonics in the signal, corresponds to the maximum of the distribution (6), $R^* = (1 - 1/N)\sigma^2 \approx \sigma^2$, lying between $R^{(K)}$ and $R^{(K-1)}$. As already described, a practical way to estimate the most probable K consists in repeatedly increasing the value of K starting from an initial value (preferably zero) until the residual norm $R^{(K)}$ drops below $R^* \approx \sigma^2$. In an alternative embodiment, in which the trial value of K is repeatedly decreased, the method is stopped (K−1 is found and therefore K is found) when $R^{(K-1)}$ just exceeds the quantity representative of the noise, $\sigma^2$. Considering both cases (increasing or decreasing K), the stop condition can be formulated as the double inequality $R^{(K)} \le \sigma^2 < R^{(K-1)}$.
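The location of the most probable residual and the stop condition can be checked with a short sketch. It assumes that the residual is the mean of N squared complex noise samples, so that 2NR divided by the noise power follows a chi-squared distribution with 2N degrees of freedom; under that assumption R has a gamma density with shape N and scale equal to the noise power divided by N. The function names are hypothetical.

```python
import math


def log_density(r, n_points, noise_power):
    """Log of the gamma density (shape N, scale noise_power/N) of the normalised residual."""
    rate = n_points / noise_power
    return (n_points * math.log(rate) + (n_points - 1) * math.log(r)
            - rate * r - math.lgamma(n_points))


def most_probable_residual(n_points, noise_power):
    """Mode of the density: (1 - 1/N) * noise_power."""
    return (1.0 - 1.0 / n_points) * noise_power


def stop_condition(r_k, r_k_minus_1, noise_power):
    """Double inequality R(K) <= noise_power < R(K-1)."""
    return r_k <= noise_power < r_k_minus_1


n_points, noise_power = 64, 2.0
r_star = most_probable_residual(n_points, noise_power)
peak = log_density(r_star, n_points, noise_power)
```

For N=64 the mode differs from the noise power by less than 2%, which is why comparing the residual directly with the noise power is an adequate practical test.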
(70) Calculation of $R^{(K)}$ is performed as the numerical minimization of the residual norm with respect to the 2K complex-value parameters of the sought noise-free model data $\{c^*_n\}$: the frequencies $\omega_k$ and amplitudes $A_k$. Since any K+1 subsets $\mathbf{c}_p = (c^*_p, \ldots, c^*_{p+K})$ of K+1 subsequent elements are not linearly independent and the matrix with rows $\mathbf{c}_0 \ldots \mathbf{c}_K$ is degenerate, there exists a non-zero complex-value vector $a = (a_0 \ldots a_K)$, which is referred to as the set of autocorrelation coefficients for the signal $c^*_n$, such that
$a_0 c^*_n + a_1 c^*_{n+1} + \ldots + a_K c^*_{n+K} = 0$, $n \in 0 \ldots N-K-1$ (8)
i.e. it is possible to associate any K-harmonic signal with a set of K+1 complex autocorrelation coefficients $a_0, \ldots, a_K$ with the use of N−K linear conditions. The coefficients $a_k$ are, for the purpose of computational feasibility, normalized as $\sum_k |a_k|^2 = 1$.
(71) Preferably, equation (8) is a strict condition imposed on the initially unknown noiseless model data set $c^*_n$, as proposed in Osborne, M. R. and Smyth, G. K. (1991). A modified Prony algorithm for fitting functions defined by difference equations. SIAM Journal on Scientific and Statistical Computing, 12, 362-382. The formulae (8) are then treated as extra conditions to be satisfied in the minimization procedure for the residual norm of equation (4). Methods of the present invention preferably find a set of autocorrelation coefficients $a_0, \ldots, a_K$ that define the K-harmonic signal nearest to $c_n$ in the sense of the residual norm of equation (4), under N−K conditions imposed on $c^*_n$. In matrix form these conditions read:
(72)
where the matrix of coefficients is a (K+1)-diagonal rectangular (N−K)×N Toeplitz matrix. The Lagrange method is preferably used to express the minimal residual norm for the difference between the windowed measured data $c_n$ and any K-harmonic signal which satisfies the conditions (8) with given autocorrelation coefficients. Therefore, the norm in matrix notation becomes
R(a)=N.sup.1
where H(a)=(a)
(73)
(74) Although minimization of R(a) with respect to a is a nonlinear problem, it is robust and practically realizable for arbitrary initial values of a.
(75) The problem of finding a minimal residual norm with respect to any possible K-harmonic signal is thus reduced to minimization of (10) with respect to all possible normalized sets of $a_k$, that is $R^{(K)} = \min R(a)$. The matrix H(a) is parametrically dependent on $a_k$, which makes the said minimization problem nonlinear; nevertheless the function R(a) is smooth and generally has only one local minimum, which represents the global minimum (its degeneracy with respect to the common phase of $a_k$ is not critical). Any known iterative method of numerical minimization used to find $R^{(K)}$ and the minimizing set of $a_k$, e.g. the method of gradient descent or the method of conjugate gradients, gives a robust algorithm which is practically independent of the choice of initial values.
(76) Preferably, therefore, a method of harmonic inversion is used to find the set of autocorrelation coefficients. By this process, the most probable value for K is found, where the most probable value is taken to be when R.sup.(K) becomes substantially equal to the quantity representative of the noise.
(77) Having determined the harmonic component signals and their number K, and in doing so, determined the autocorrelation coefficients (the complex-value vector a=(a.sub.0 . . . a.sub.K)), the frequencies of the K-harmonic signals are determined by finding the roots of the K-order polynomial equation
$a_0 + a_1\lambda + a_2\lambda^2 + \ldots + a_K\lambda^K = 0$ (13)
(78) The set of autocorrelation coefficients unambiguously defines the set of frequencies $\omega_k$ in the windowed measured data and the corresponding real-value frequencies $f_k$ in the larger data set with formulas
(79)
(80) The fact that the signal is K-harmonic and cannot be reduced to (K−1) harmonics, with autocorrelation coefficients $a_0$ and $a_K$ non-zero and all frequencies $\omega_k$ unique within the Nyquist band, ensures that all roots $\lambda_k$ are non-zero and unique.
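For K=2 the step from autocorrelation coefficients to frequencies can be sketched with the quadratic formula. The mapping from a root to a complex frequency used here, root = exp(2πi·ω·T/N), is an assumption made for the example, since the formulas relating roots and frequencies are not reproduced in this text; the function names and the test frequencies are likewise hypothetical.

```python
import cmath


def quadratic_roots(a0, a1, a2):
    """Roots of a0 + a1*z + a2*z**2 = 0 for complex coefficients."""
    disc = cmath.sqrt(a1 * a1 - 4 * a2 * a0)
    return (-a1 + disc) / (2 * a2), (-a1 - disc) / (2 * a2)


def root_to_frequency(root, n_points, duration):
    """Complex frequency under the assumed convention root = exp(2*pi*i*omega*T/N).

    The real part lies in the Nyquist band; a positive imaginary part means decay.
    """
    return cmath.log(root) * n_points / (2j * cmath.pi * duration)


n_points, duration = 64, 1.0
true_omegas = (3.0 + 0.05j, -7.5 + 0.02j)
true_roots = [cmath.exp(2j * cmath.pi * w * duration / n_points) for w in true_omegas]

# Polynomial with these roots, rescaled so that the sum of |a_k|^2 equals 1.
coeffs = (true_roots[0] * true_roots[1], -(true_roots[0] + true_roots[1]), 1.0)
scale = sum(abs(a) ** 2 for a in coeffs) ** 0.5
a0, a1, a2 = (a / scale for a in coeffs)

recovered = sorted(
    (root_to_frequency(r, n_points, duration) for r in quadratic_roots(a0, a1, a2)),
    key=lambda w: w.real,
)
```

Rescaling the coefficients does not move the roots, so the normalisation of the coefficient vector has no effect on the recovered frequencies. For larger K a general polynomial root finder would replace the quadratic formula.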
(81) If the harmonic signals are non-decaying, a particular case occurs. By setting additional constraints a.sub.Kk/
(82)
where index t runs in the range $0 \le t < K/2$ and $(b_0, \ldots, b_K)$ is a real-value vector that obeys the normalization condition $|b|^2 = \sum_{k=0}^{K} b_k^2 = 1$, allows (10) to be rearticulated as
R.sup.(K)(b)=N.sup.1b.sup.TH(b)b, H(b)=Re{(a(b))
where elements of the (N−K)×(K+1) matrix C are
(83)
with $0 \le t < K/2$ (the last formula is only for even K).
(84) The amplitudes are found by minimization of the original norm of the residual expressed as $R = (1/N)\sum_n |c_n - \sum_k A_k (\lambda_k)^n|^2$. Having determined K and $\lambda_k$, the noise-free signal can be reconstructed as
(85) $c^*_n = \sum_{k=1}^{K} A^*_k (\lambda_k)^n$
where A* is the vector of amplitudes. A* is determined by minimizing the norm of residual (4):
(86)
(87) The set of amplitudes that delivers the minimum to (19) appears as the solution to the system of linear equations ∂R/∂Ā=MA−G=0
A*=M.sup.−1G, R(A*)=R.sup.(K)=c.sub.n
(88) Fixing the phase is achieved by introducing an additional constraint on the amplitudes, A=B·e.sup.iφ, where B∈Re and φ is the common phase.
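By way of illustration only, the linear amplitude fit may be sketched as follows. The sketch assumes that the roots are written λ.sub.k and that M and G are the normal-equation quantities M.sub.jk=Σ.sub.n conj(λ.sub.j.sup.n)λ.sub.k.sup.n and G.sub.j=Σ.sub.n conj(λ.sub.j.sup.n)c.sub.n, which is one standard way of minimizing R=(1/N)Σ.sub.n|c.sub.n−Σ.sub.kA.sub.k(λ.sub.k).sup.n|.sup.2; the function names are hypothetical and no phase constraint is applied.

```python
def solve_linear(matrix, rhs):
    """Solve a small complex linear system by Gaussian elimination
    with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    solution = [0j] * n
    for row in range(n - 1, -1, -1):
        acc = aug[row][n]
        for j in range(row + 1, n):
            acc -= aug[row][j] * solution[j]
        solution[row] = acc / aug[row][row]
    return solution


def fit_amplitudes(samples, roots):
    """Least-squares amplitudes A_k for the model c_n ~ sum_k A_k * root_k**n.

    Returns the amplitudes and the residual (1/N) * sum_n |c_n - model_n|^2.
    """
    n_samples = len(samples)
    # powers[k][n] = root_k ** n
    powers = [[root ** n for n in range(n_samples)] for root in roots]
    # Assumed normal equations M A = G.
    m = [[sum(pj[n].conjugate() * pk[n] for n in range(n_samples))
          for pk in powers] for pj in powers]
    g = [sum(pj[n].conjugate() * samples[n] for n in range(n_samples))
         for pj in powers]
    amplitudes = solve_linear(m, g)
    residual = sum(
        abs(samples[n] - sum(a * p[n] for a, p in zip(amplitudes, powers))) ** 2
        for n in range(n_samples)
    ) / n_samples
    return amplitudes, residual
```

Because the frequencies are already fixed at this stage, the fit is linear in the amplitudes and needs no iteration.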
(89) The method also allows for statistical assessment of the determined values for the amplitudes and frequencies, which provides qualitatively better data for further analysis of the mass spectrum. Reporting fidelity criteria for detected mass-to-charge peaks (such as confidence intervals for mass-to-charge ratios and abundances) significantly increases both specificity and selectivity of the informatics approaches that rely on mass spectra as an input.
(90) In a similar way to the evaluation of the number of harmonics in windowed measured data, the norm of the residual is treated as a random variable which depends parametrically on deviations. Any deviation ΔA.sub.k of the amplitude from its most probable value A*.sub.k necessarily increases the residual norm by the value ΔR≈r.sub.k|ΔA.sub.k|.sup.2>0; and any deviation Δf.sub.k of the frequency from its most probable value f*.sub.k necessarily increases the residual norm by the value ΔR≈q.sub.k|Δf.sub.k|.sup.2>0.
(91) The conditional cumulative probability function (under the condition that the number of harmonics is exactly K) is
(92)
and gives the probability that R exceeds R.sup.(K) by a value greater than ΔR. Correspondingly, the probability that ΔR>1.96σ.sup.2/√N is less than 0.05. The confidence intervals in which the harmonic's amplitude or frequency is found with 95% confidence are estimated as
(93)
respectively. The coefficients r.sub.k and q.sub.k are found either analytically or numerically.
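As a worked illustration, and not necessarily the exact form of the estimate referred to above, bounds of the following kind follow if the two relations stated in the text are combined. Here ΔR denotes the increase of the residual norm, ΔA.sub.k and Δf.sub.k the deviations of the amplitude and frequency from their most probable values, and σ.sup.2 the noise level; the quadratic dependence of ΔR on each deviation and the 95% bound on ΔR are taken from the text, whereas treating each deviation in isolation is an assumption of this sketch.

```latex
\Delta R \approx r_k\,|\Delta A_k|^2 \le \frac{1.96\,\sigma^2}{\sqrt{N}}
\;\Longrightarrow\;
|\Delta A_k| \le \sqrt{\frac{1.96\,\sigma^2}{r_k\sqrt{N}}},
\qquad
\Delta R \approx q_k\,|\Delta f_k|^2 \le \frac{1.96\,\sigma^2}{\sqrt{N}}
\;\Longrightarrow\;
|\Delta f_k| \le \sqrt{\frac{1.96\,\sigma^2}{q_k\sqrt{N}}}.
```

Under these assumptions both intervals narrow in proportion to N.sup.−1/4 as the number of samples N grows, for fixed r.sub.k and q.sub.k.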
(94) In view of the above description, embodiments of the invention may be performed according to the schematic block diagram depicted in
(95) Windowed transient measured data
(96)
is then first compiled from the larger measured data set, 6. The norm ‖c.sub.n‖=N.sup.−1σ.sup.−2Σ.sub.n|c.sub.n|.sup.2 is calculated, 14a, and a decision is made, 14b: if the norm is greater than or equal to 1, it is taken that peaks are present in the windowed measured data and the procedure continues to 20; if not, the procedure passes to step 85 which will be described below.
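By way of illustration only, the decision at 14a and 14b may be sketched as follows, assuming that the norm is the mean squared magnitude of the windowed samples divided by the noise variance σ.sup.2 (one reading of the formula above); the function names are hypothetical.

```python
def normalised_norm(samples, sigma):
    """Mean squared magnitude of the samples in units of the noise variance."""
    n = len(samples)
    return sum(abs(c) ** 2 for c in samples) / (n * sigma ** 2)


def peaks_present(samples, sigma):
    """Decision 14b: continue to step 20 only if the norm is at least 1."""
    return normalised_norm(samples, sigma) >= 1.0
```

For example, four samples of magnitude 1 with sigma=0.5 give a norm of 4, so the window would be passed on to step 20.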
(97) At step 20, a first value of K is selected which in the first iteration is, in this example, K:=1. The vector of autocorrelation coefficients a.sub.k is then calculated, 30, the values of a.sub.k being found which minimize the difference between the measured data and the model data set for the particular value of K, to give the measure of difference
(98) R.sup.(K)=‖c.sub.n−c*.sub.n‖=min(N.sup.−1ā.sup.TH(a)a). The measure of difference R.sup.(K) is then compared to the determined noise range, 40. If the measure of difference R.sup.(K) lies within the noise range, the harmonic component signals and their number K (the measure of how many peaks are present in the windowed measured data) have been found; the value of K is stored for future output and the procedure passes to step 60. If R.sup.(K) does not lie within the noise range, then a new value of K is selected, 50, in this case K:=K+1, and the new value of K is used in 30 to calculate a new value for R.sup.(K), and so on until the measure of difference R.sup.(K) does lie within the noise range and K has been found.
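By way of illustration only, the loop over steps 20 to 50 may be sketched as follows. The minimization that yields R.sup.(K) is deliberately left to a caller-supplied function, and the names `select_model_order`, `residual_for_k` and `k_max`, together with the upper limit on K itself, are assumptions of this sketch.

```python
def select_model_order(residual_for_k, noise_low, noise_high, k_max):
    """Increase K from 1 until the minimised residual R^(K) lies within
    the noise range [noise_low, noise_high].

    residual_for_k(k) must return the minimised residual for k harmonics.
    Returns (K, R^(K)), or (None, None) if no K up to k_max qualifies.
    """
    k = 1                                  # step 20: initial value K := 1
    while k <= k_max:
        residual = residual_for_k(k)       # step 30: minimise for this K
        if noise_low <= residual <= noise_high:
            return k, residual             # step 40: within the noise range
        k += 1                             # step 50: K := K + 1
    return None, None


# Usage with hypothetical, precomputed residuals for K = 1..4.
table = {1: 9.0, 2: 3.5, 3: 1.03, 4: 1.0}
order, value = select_model_order(lambda k: table[k], 0.9, 1.1, 4)
```

With these illustrative numbers the loop stops at the first K whose residual falls inside the range, so later values of K are never evaluated.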
(99) At 60, peak frequencies are found by solving the K.sup.th-order polynomial equation a.sub.0+a.sub.1λ+a.sub.2λ.sup.2+ . . . +a.sub.Kλ.sup.K=0, giving frequencies in the larger measured data set
(100)
The frequencies are stored for future output and the procedure passes to 70, whereupon the peak amplitudes A.sub.k are found to minimize the norm of the residual:
(101)
(102) The amplitudes thus found are again stored for future output and the procedure passes to 80, finding the confidence intervals for the frequencies and amplitudes just obtained at 60 and 70. The confidence intervals are stored for future output, and the procedure passes to 85, where a decision is taken as to whether or not the windowed measured data just analysed, or partly analysed if the step is arrived at from 14b, was the last windowed measured data set from the larger measured data set. If not, the procedure passes to 4b, at which the next windowed measured data is chosen, s.sub.0:=s.sub.k, and the procedure then passes to 6 once again.
(103) Once all windowed measured data sets have been processed in the above way, the procedure passes to 90 whereupon the previously stored data comprising K, the list of frequencies, amplitudes and their confidence intervals is output, and the procedure terminates, 100. It will be appreciated that the output may be of various formats. For example, the frequencies are typically translated into mass-to-charge ratios and the amplitudes into ion abundances for output. The preferred output comprises a measure of the number of different species of ions, mass-to-charge ratios of ions together with their abundances and terms indicating confidences in those values, e.g. as a mass spectrum.
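By way of illustration only, the translation of frequencies into mass-to-charge ratios may be sketched as follows. The sketch relies on the usual physical relations, namely that the axial oscillation frequency in an Orbital Trap scales as the inverse square root of m/z and that the cyclotron frequency in FT-ICR is f=zeB/(2πm); the function names and the calibration constant `calib` are hypothetical, and real instruments apply further calibration terms that are omitted here.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # coulombs
DALTON = 1.66053906660e-27            # kilograms


def mz_orbital_trap(freq_hz, calib):
    """m/z from an axial frequency, assuming m/z = calib / f**2.

    calib is a hypothetical instrument calibration constant, e.g. obtained
    from a reference ion as calib = mz_ref * f_ref**2.
    """
    return calib / freq_hz ** 2


def mz_ft_icr(freq_hz, b_tesla):
    """m/z (daltons per elementary charge) from an unperturbed cyclotron
    frequency, using f = z*e*B / (2*pi*m)."""
    return ELEMENTARY_CHARGE * b_tesla / (2 * math.pi * freq_hz * DALTON)
```

In this sketch halving the detected frequency corresponds to a fourfold increase in m/z for the Orbital Trap relation, and to a twofold increase for the FT-ICR relation.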
(104) To further illustrate methods of the invention, synthetic test data of three sine waves together with added noise is analysed, and the results illustrated in
(105)
(106)
(107)
(108)
(109)
(110) .sup.K for each value of K so as to minimise the residual R.sup.(K). It can be seen how the residual R.sup.(K) is closest to the noise level (dotted circle) for the space of K-harmonic signals
.sup.(K) where K=3.
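By way of illustration only, synthetic test data of the kind described above (three sine waves together with added noise) may be generated as follows. The frequencies, amplitudes, phases, noise level and seed shown are hypothetical values chosen for this sketch and are not those used to produce the illustrated results.

```python
import math
import random


def synthetic_transient(n_samples, dt, components, sigma, seed=0):
    """Sum of sine waves plus Gaussian noise of standard deviation sigma.

    components is a list of (amplitude, frequency, phase) tuples.
    """
    rng = random.Random(seed)  # seeded so that the data are reproducible
    data = []
    for n in range(n_samples):
        t = n * dt
        clean = sum(a * math.sin(2 * math.pi * f * t + p)
                    for a, f, p in components)
        data.append(clean + rng.gauss(0.0, sigma))
    return data


# Three hypothetical sine waves: (amplitude, frequency, phase).
three_sines = [(1.0, 0.05, 0.0), (0.5, 0.12, 0.3), (0.8, 0.31, 1.0)]
noisy = synthetic_transient(256, 1.0, three_sines, sigma=0.1)
```

Data generated in this way have a known number of harmonics, which allows the value of K found by the method to be checked against the true value.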
(111) A comparison of the methods of the invention with the FDM is shown in
(112)
(113) At SNR=1000, a marked difference between the methods is very apparent. The FDM results, in
(114) As the SNR reduces further to SNR=100,
(115) At the very poor SNR of 10, the FDM predicts, in
(116) It can be seen that the FDM produces many extraneous peaks at all SNR. To use the FDM in practical situations these must be distinguished from signal peaks, but it can be seen from
(117) Methods of the present invention may also be applied to data from other types of spectroscopic analysis such as, for example, nuclear magnetic resonance (NMR) and infrared spectroscopy. In NMR the relaxation of the spins of atomic nuclei after excitation with electromagnetic pulses is recorded, and the relaxation signals and their frequencies depend upon the surroundings of the nuclei, including the molecular structure. The observed spectroscopic frequencies (also called lines or peaks) are for example influenced by the coupling between adjacent nuclei, leading to frequency shifts and/or line splitting. Observed nuclear spins are typically those of hydrogen (1H), 13C and the less common 15N, 31P, 19F. The details of the method and the common methods of data evaluation are set out in various text books, including D. H. Williams and I. Fleming: Spectroscopic methods in organic chemistry, 4th ed., London 1989 (which additionally contains chapters on UV, visible and infrared spectroscopy).
(118) Whilst, as already described, in mass spectrometry such as FT-MS the detected frequencies are usually representative of the mass-to-charge of ions, the ions following periodic motion within the mass analyser, and frequency differences correspond to mass-to-charge differences, in NMR the frequencies are representative of spin relaxation frequencies (i.e. the difference between the various, possibly split, excited state and ground state nuclear spin energy levels) and the differences are indicative of nuclear spin coupling energies and various other effects of the surroundings of a nucleus that have influence on the energy levels.
(119) The frequency range of NMR signals, following heterodyning, is relatively small compared to the range encountered in mass spectrometry, and the methods of the present invention are well suited for decomposing spectral harmonic signals from such data, to directly provide chemical shifts and line broadening information.
(120) As with Fourier Transform and FDM, methods of the present invention may also be extended into higher dimensions, such as, for example, two dimensional NMR. While in the most simplistic way the chemical shifts and broadenings (i.e. frequencies and line-widths) may be determined on a per-spectrum basis and the data simply stacked in the additional dimension, preferably the data is directly handled in multiple dimensions, as it is for example in synthetic aperture radar (SAR) applications (Carrara et al. Spotlight Synthetic Aperture Radar, Boston 1995), or in conventional 2-D FT-NMR applications (see e.g. Peter Güntert, Volker Dötsch, Gerhard Wider and Kurt Wüthrich: Processing of multi dimensional NMR data with the new software PROSA; Journal of Biomolecular NMR, 2 (1992) 619-629). Examples of the extension to additional dimensions using the FDM are provided by Vladimir A. Mandelshtam, Howard S. Taylor, and A. J. Shaka: Application of the Filter Diagonalization Method to One- and Two-Dimensional NMR Spectra; Journal of Magnetic Resonance 133, 304-312 (1998), article no. MN981476. Such direct multidimensional processing has, inter alia, the advantage of better localization of the frequencies in the multiple dimensions and correct abstraction from or interpolation between the separate spectra. A further advantage is improved signal-to-noise ratio.
(121) Pre-processing using conventional fast Fourier transform methods may be used to guide the sectioning of the two-dimensional data for optimum processing by the method of the invention, e.g. aid the selection of rectangles in frequency and time to be processed together. It may also be used to control subsequent acquisitions within the same experiment, e.g. data dependent ion selection and/or fragmentation in mass spectrometry or e.g. re-acquisitions with adjusted settings (pulse sequences) in 2D-NMR spectrometry.
(122) Both the basic method of the invention and its extension to multiple dimensions may, in addition to the improved determination of frequencies (i.e. masses, IR spectral lines, chemical shifts, radar objects etc.) and intensities, be used as a means for optimal data compression in recorded data by retaining only the K identified frequency/intensity datasets, preferably together with aggregate information on the noise/background, such as the sigma used during the determination of K.
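By way of illustration only, such data compression may be sketched as follows. The sketch assumes that each retained component is stored as a complex root λ.sub.k and a complex amplitude A.sub.k, so that sample n of the noise-free signal is Σ.sub.kA.sub.k(λ.sub.k).sup.n; the record layout and the function names are hypothetical.

```python
def compress(roots, amplitudes, sigma, n_samples):
    """Keep only the K (root, amplitude) pairs plus aggregate noise data."""
    return {
        "n_samples": n_samples,
        "sigma": sigma,
        "components": list(zip(roots, amplitudes)),
    }


def reconstruct(record):
    """Rebuild the noise-free signal sum_k A_k * root_k**n from the record."""
    return [
        sum(amp * root ** n for root, amp in record["components"])
        for n in range(record["n_samples"])
    ]


def compression_ratio(record):
    """Raw sample count divided by the count of stored values
    (two per component plus sigma; other metadata is ignored)."""
    stored = 2 * len(record["components"]) + 1
    return record["n_samples"] / stored
```

In this sketch a transient of 1000 samples described by two components is reduced to five stored values; the noise itself is not recoverable, only its level sigma.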
(123) Accordingly in another aspect the present invention provides a method of spectrometry comprising:
(124) providing measured data comprising a combination of periodic signals and noise measured over time using a spectrometer;
(125) determining a quantity representative of the noise in the measured data, and
(126) determining a model data set of K-harmonic component signals from the measured data;
(127) wherein the harmonic component signals and their number K are determined iteratively on the basis of: (i) using an initial value of K to calculate a minimised non-negative measure of difference R.sup.(K) between the measured data and model data comprising data sets of K-harmonic component signals, and (ii) if R.sup.(K) does not lie within a noise range based upon the quantity representative of noise, changing the value of K and recalculating R.sup.(K) as many times as necessary until R.sup.(K) does lie within the noise range;
(128) and deriving spectroscopic information from the model data set, the spectroscopic information comprising one or more of: a measure of the number of harmonic component signals; a measure of the frequencies of the harmonic component signals; a measure of the signal intensity of the harmonic component signals.
(129) The method of spectrometry may comprise a method of mass spectrometry, a method of NMR spectroscopy, or a method of infrared spectroscopy. The spectroscopic information from the model data set in NMR spectroscopy methods may further comprise resonance frequencies, chemical shifts and intensity (abundance) information concerning the nuclei. The spectroscopic information from the model data set in infrared spectroscopy methods may further comprise absorption frequencies and intensity (abundance) information concerning chemical groups.
(130) The present invention still further provides a method of data compression comprising decomposing measured data comprising signal and noise measured over time using a spectrometer into a sum of K harmonic component signals and a noise component, wherein the harmonic component signals and their number K are derived from the measured data and a determined quantity representative of the noise in the measured data.
(131) As used herein, including in the claims, unless the context indicates otherwise, singular forms of the terms herein are to be construed as including the plural form and vice versa.
(132) Throughout the description and claims of this specification, the words "comprise", "including", "having" and "contain" and variations of the words, for example "comprising" and "comprises" etc., mean "including but not limited to", and are not intended to (and do not) exclude other components.
(133) It will be appreciated that variations to the foregoing embodiments of the invention can be made while still falling within the scope of the invention. Each feature disclosed in this specification, unless stated otherwise, may be replaced by alternative features serving the same, equivalent or similar purpose. Thus, unless stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
(134) The use of any and all examples, or exemplary language ("for instance", "such as", "for example" and like language) provided herein, is intended merely to better illustrate the invention and does not indicate a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.
(135) Any steps described in this specification may be performed in any order or simultaneously unless stated or the context requires otherwise.
(136) All of the features disclosed in this specification may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. In particular, the preferred features of the invention are applicable to all aspects of the invention and may be used in any combination. Likewise, features described in non-essential combinations may be used separately (not in combination).