METHOD FOR EVALUATING DATA FROM MASS SPECTROMETRY, MASS SPECTROMETRY METHOD, AND MALDI-TOF MASS SPECTROMETER
20230160905 · 2023-05-25
Inventors
Cpc classification
H01J49/0036
ELECTRICITY
International classification
Abstract
The invention relates to a method to evaluate mass spectrometry data for the analysis of peptides from biological samples, particularly MALDI-TOF mass spectrometry data, comprising the steps of: providing expected mass defects; determining measured mass defects, i.e. the mass defects resulting from the mass spectrometry data; and comparing the measured mass defects with the expected mass defects.
Claims
1. Method of evaluating mass spectrometry data for the analysis of molecules, which contain the five chemical elements carbon, hydrogen, oxygen, nitrogen and sulfur, from biological samples, comprising the following steps: a) providing expected mass defects; b) determining measured mass defects, i.e., the mass defects resulting from the mass spectrometry data; c) comparing the measured mass defects with the expected mass defects; and creating a mass defect diagram from an average spectrum formed over a plurality of spectra of a mass spectrometric measurement, which employs an ionization mechanism that substantially generates molecules with a single positive charge, wherein creating the mass defect diagram includes: determining a list of local maxima and their respective m/z values, wherein for each m/z value a deviation from a respective nearest mass corresponding to a theoretical molecule mass model is determined, determining for every m/z value $m$ a nominal mass $m_N$ for which a modulus of the deviation between $m$ and the mass expected according to the theoretical molecule mass model is minimized, wherein a minimum deviation can assume values from −0.5 to 0.5, and entering positions of local maxima into the diagram whose horizontal axis corresponds to the mass or the m/z value, and on whose vertical axis the deviation from the theoretical molecule mass model determined above is plotted.
2. The method according to claim 1, wherein the molecules are peptides and the expected mass defects are calculated from $m_N\,r_P$, where $m_N$ designates the nominal mass of a peptide and $r_P$ is a scalar between $10^{-3}$ and $10^{-4}$.
3. The method according to claim 1, wherein a mass defect for a measured mass m is calculated from
4. The method according to claim 3, wherein a discrepancy δ.sub.P between a measured and an expected mass defect is calculated directly from said measured mass m as
5. The method according to claim 1, wherein the median of the measured mass defects is formed and compared with the expected mass defects in order to compare the measured mass defects with the expected mass defects over subintervals of a mass axis.
6. The method according to claim 1, wherein the measured values are corrected when the data are to be used further, i.e., depending on the deviation of the measured mass defects from the expected mass defects.
7. The method according to claim 1, wherein the measured mass defects are calculated for local maxima of the spectral intensities.
8. Mass spectrometric method for the analysis of molecules, which contain the five chemical elements carbon, hydrogen, oxygen, nitrogen and sulfur, from biological samples, comprising the following steps: a) carrying out one or more mass spectrometric analyses on the biological sample and providing data which result from the mass spectrometric analyses; and b) carrying out the method according to claim 1.
9. The method according to claim 8, wherein the mass spectrometry data are MALDI-TOF mass spectrometry data.
10. A mass spectrometer, having an ionization mechanism that substantially generates molecules with a single positive charge, and a control unit for the analysis of molecules, which contain the five chemical elements carbon, hydrogen, oxygen, nitrogen and sulfur, from biological samples using a method according to claim 1.
11. The mass spectrometer according to claim 10, including a time-of-flight (TOF) analyzer.
12. The method according to claim 1, further comprising using at least one of calculatory and visual means in order to assess a quality of the data on the basis of the comparison in step c).
13. The method according to claim 1, wherein the method is used for one or more of (i) quality control and (ii) signal correction of the mass spectrometry data.
14. The method according to claim 1, wherein the mass defect diagram is visualized and displayed on a screen.
15. The method according to claim 1, wherein m/z values or molecular mass are used in daltons (Da) as a multiple of the atomic mass unit (amu).
16. The method according to claim 1, wherein the molecules are metabolites which are metabolic products, including but not limited to: lipids, carbohydrates, and breakdown products from substances taken up from food or the environment.
17. The method according to claim 1, wherein the biological samples are tissue samples.
18. The method according to claim 17, wherein the tissue samples (i) are tissue sections or (ii) comprise tissue cells.
19. The method according to claim 1, wherein the mass spectrometry data are MALDI-TOF mass spectrometry data.
20. The method according to claim 1, wherein the molecules are biological macromolecules.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0061] Further features of the invention result from the rest of the description and the Claims. Advantageous example embodiments of the invention are explained below in more detail with the aid of drawings.
[0062]-[0071] [Descriptions of the individual figures are not reproduced in this text.]
DETAILED DESCRIPTION
[0072] The molecules recorded in a MALDI measurement comprise metabolites and peptides in particular. Metabolites are metabolic products and can have various chemical forms, e.g. lipids, carbohydrates or breakdown products from substances taken up from food or the environment. Their masses are typically less than 1,000 Da. Peptides, in contrast, are chains of amino acids with masses of up to 5,000 Da and more.
[0073] All 23 amino acids which occur in proteins, and thus all peptides, consist of the five chemical elements carbon, hydrogen, oxygen, nitrogen and sulfur. For all peptides, the relative proportion of these elements is approximately the same, irrespective of their total mass, so the mass defect of a peptide is essentially determined by its nominal mass. The result is an almost linear relationship between the mass $m$ of a peptide and its nominal mass $m_N$:
$m \approx m_P(m_N) = (1 + r_P)\,m_N$, where $r_P \approx 4.95 \times 10^{-4}$.
[0074] The variance of the true masses around the theoretical average $m_P$ is relatively small; its standard deviation can be estimated using
$\sigma_P(m_N) = \sigma_0 + s_P\,m_N$, where $\sigma_0 \approx 0.02$ and $s_P \approx 2.0 \times 10^{-5}$.
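The two model relations above can be evaluated directly. The following is a minimal sketch that uses only the constants stated in the text; the function names are chosen here for illustration and do not appear in the original.

```python
R_P = 4.95e-4    # slope r_P of the peptide mass model (value from the text)
SIGMA_0 = 0.02   # offset sigma_0 of the spread model (value from the text)
S_P = 2.0e-5     # slope s_P of the spread model (value from the text)


def expected_peptide_mass(nominal_mass: int) -> float:
    """Model mass m_P(m_N) = (1 + r_P) * m_N."""
    return (1.0 + R_P) * nominal_mass


def expected_mass_defect(nominal_mass: int) -> float:
    """Expected mass defect m_P(m_N) - m_N = r_P * m_N."""
    return R_P * nominal_mass


def peptide_mass_sigma(nominal_mass: int) -> float:
    """Estimated standard deviation sigma_P(m_N) = sigma_0 + s_P * m_N."""
    return SIGMA_0 + S_P * nominal_mass


for m_n in (1000, 2000, 3000):
    print(m_n, expected_peptide_mass(m_n), expected_mass_defect(m_n),
          peptide_mass_sigma(m_n))
```

According to this model, the expected mass defect grows by roughly 0.5 Da per 1,000 Da of nominal mass, while the spread around it grows far more slowly.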
[0075] The large number of different proteins and the peptides resulting therefrom in biological tissue cells means a typical MALDI spectrum has signal intensities at practically all $m_P(m_N)$ for a broad range of nominal masses $m_N$. When a sum or average spectrum is formed from several spectra obtained during the measurement of a tissue sample, see
[0076] The determination of the mass of a molecule is subject to an error which originates mainly from two causes: on the one hand, the time of flight of a molecule can only be measured with a certain accuracy and in discrete intervals, which results in a discretization of the mass axis, i.e. a subdivision into successive intervals (m/z bins). The width of the m/z bins is usually not constant, but increases toward higher masses.
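Why the bin width grows with mass can be illustrated with an idealized time-of-flight relation. The sketch below assumes $m = a\,t^2$ for singly charged ions and a fixed sampling interval; this relation, the constant `A` and the interval `DT` are illustrative assumptions, not values from the text.

```python
import math


def tof_bin_width(mass: float, a: float, dt: float) -> float:
    """Width of the m/z bin that starts at `mass`, assuming m = a * t**2."""
    t = math.sqrt(mass / a)  # flight time at the lower edge of the bin
    return a * ((t + dt) ** 2 - t ** 2)


A = 1.0e-6  # hypothetical instrument constant in Da/ns^2
DT = 1.0    # hypothetical sampling interval in ns

for mass in (1000.0, 2000.0, 4000.0):
    print(mass, tof_bin_width(mass, A, DT))
```

Under this assumption the bin width scales approximately with the square root of the mass, so quadrupling the mass roughly doubles the bin width.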
[0077] On the other hand, the time of flight of the molecule depends not only on its mass, but also on its original state within the ion cloud at the start of the acceleration. This original state, in particular the speed and direction of motion of the molecule, is largely unknown and leads to a significant measurement error, which is usually corrected by a calibration after the measurement.
[0078] The commonly used calibration methods include external calibration and statistical peptide calibration. External calibration involves placing several drops of a solution with defined constituents next to the tissue sample before the measurement. The spectra measured therein are compared with the expected masses of the known constituents after the measurement and a calibration curve is determined for the m/z axis of a spectrum. For peptide calibration, the aforementioned relationship between the true mass of a peptide and its nominal mass is utilized to shift the peak positions which are presumed to belong to a peptide to the theoretically expected m/z values, see Wool A, Smilansky Z: Precalibration of matrix-assisted laser desorption/ionization-time of flight spectra for peptide mass fingerprinting. Proteomics 2002, 2, 1365-1373.
[0079] Neither calibration method can completely correct the errors in the m/z values. External calibration, moreover, requires a manual interaction, while peptide calibration requires a large amount of computation and is time-consuming.
[0080] Since external calibration compensates the mass errors globally for all spectra of a measurement, differing errors in the individual spectra of a data set cannot be corrected in this way. As an alternative, an internal calibration is therefore also used, wherein the calibration solution is distributed over the tissue sample being analyzed, thus enabling an individual correction for each spectrum of the measurement.
[0081] For practical reasons, a calibration solution can only contain a small number of known substances. This limits the number of reference points from which the calibration curve is determined, and thus the accuracy of the calibration. This form of calibration moreover requires a manual user interaction.
[0082] In contrast, methods of statistical peptide calibration (see Wool A, Smilansky Z: Precalibration of matrix-assisted laser desorption/ionization-time of flight spectra for peptide mass fingerprinting. Proteomics 2002, 2, 1365-1373; Wolski W E, Lalowski M, Jungblut P, and Reinert K. Calibration of mass spectrometric peptide mass fingerprint data without specific external or internal calibrants. BMC bioinformatics, 6(1):203, 2005) are fully automatic and do not need a calibration solution. With these methods, the correction is performed by comparing the masses measured in the tissue with a theoretical peptide mass model (see above) and a peptide database. These methods require a prior peak picking, i.e. an identification of relevant peaks in a spectrum, are very time-consuming, and can lead to defective results due to an incorrect assignment between peak and peptide database.
[0083] To visualize the mass defects observed in a spectrum, the m/z values of the peaks found in a spectrum are plotted in a diagram whose horizontal axis corresponds to the mass $m$ (or m/z value), while the fractional parts $m - \operatorname{floor}(m)$ are plotted on the vertical axis, see
[0084] Moreover, peptides can specifically be chemically modified in such a way that they exhibit a mass defect which differs significantly from the peptide mass model (also known as an averagine model) and can be distinguished from unmodified peptides with the aid of this deviation, cf. Chen X, Savickas P, Vestal M. Methods and systems for mass defect filtering of mass spectrometry data. U.S. Pat. No. 7,634,364, filed 2006 Jun. 23, granted 2009 Dec. 15; Yao X, Diego P, Ramos A A, Shi Y. Averagine-scaling analysis and fragment ion mass defect labeling in peptide mass spectrometry. Anal. Chem. 2008 Oct. 1; 80(19):7383-91. doi: 10.1021/ac801096e; Sleno L. The use of mass defect in modern mass spectrometry. J. Mass. Spectrom. 2012, 47: 226-236. doi:10.1002/jms.2953.
[0085] With this method of mass defect filtering, the mass defect determined for a spectral peak is used to chemically characterize the corresponding molecule in more detail. A precondition for this method is thus that the accuracy of the mass determination is sufficiently high.
[0086] A representation which differs from the usual mass scale is occasionally used to graphically visualize the mass defect filtering. In this representation the deviation from the nearest mass corresponding to the averagine model in each case is plotted in the vertical direction instead of the mass defect, cf. Yao X, Diego P, Ramos A A, Shi Y. Averagine-scaling analysis and fragment ion mass defect labeling in peptide mass spectrometry, Anal. Chem. 2008 Oct. 1; 80(19):7383-91. doi: 10.1021/ac801096e.
[0087] In the particular context, these diagrams serve merely to illustrate the method with the aid of exemplary, synthetically computed peptide masses. No application of this form of representation to actually measured data is known.
[0088] A diagram known as a peptide mass defect diagram (PMD) can be created from an average spectrum formed over several spectra of a MALDI measurement. To this end, a list of all local maxima and their respective m/z values is determined, and for each m/z value the deviation from the respective nearest mass corresponding to the theoretical peptide mass model is determined. Assuming that the measured signals are attributable to peptides, for every m/z value $m$ the nominal mass $m_N$ is determined for which the modulus of the deviation between $m$ and the mass $m_P(m_N)$ expected according to the theoretical peptide mass model (averagine model, see above) is minimized (see below). The minimum deviation $\delta_P(m)$, which can assume values from −0.5 to 0.5, is known as the peptide model distance. The peptide model distance corresponds to the above-described discrepancy between measured and expected mass defect.
[0089] The positions of all local maxima are now entered into a diagram whose horizontal axis corresponds again to the mass or the m/z value, and on whose vertical axis the deviation from the peptide mass model determined above is plotted.
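The construction of the PMD points described in the two preceding paragraphs can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function names are hypothetical, a local maximum is taken here to be a strict maximum within a window of `radius` samples on either side, and the peptide model distance is computed by rounding $m/(1+r_P)$ to the nearest integer.

```python
import math

R_P = 4.95e-4  # slope of the peptide mass model (value from the text)


def peptide_model_distance(m: float) -> float:
    """Signed distance of m from the nearest model mass, scaled to [-0.5, 0.5)."""
    x = m / (1.0 + R_P)
    return x - math.floor(x + 0.5)


def local_maxima(intensities, radius=1):
    """Indices whose intensity is the strict maximum within +/- radius samples."""
    n = len(intensities)
    peaks = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        if all(intensities[i] > intensities[j] for j in range(lo, hi) if j != i):
            peaks.append(i)
    return peaks


def pmd_points(mz, intensities, radius=1):
    """(m/z, peptide model distance) pairs for all local maxima of a spectrum."""
    return [(mz[i], peptide_model_distance(mz[i]))
            for i in local_maxima(intensities, radius)]


# Tiny synthetic example: one peak slightly above the model mass for m_N = 1000.
mz = [1000.3, 1000.4, 1000.5, 1000.6, 1000.7]
intensities = [1.0, 3.0, 9.0, 4.0, 2.0]
print(pmd_points(mz, intensities))
```

Plotting the first component of each pair on the horizontal axis and the second on the vertical axis gives the diagram; peptide signals are then expected to cluster around the zero line.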
[0090] Compared to the familiar mass defect diagram, the PMD is therefore generated by means of a transformation which reproduces the positions of the theoretically expected peptide mass defects onto the reference line 22, which is a horizontal zero line. Moreover, the PMD differs from the known representations by virtue of the fact that no prior specific signal analysis is carried out, in particular no identification of significant peptide peaks (peak picking). Rather, the PMD essentially reflects statistical characteristics of the spectral background signal (cf.
[0091] The following quality characteristics of a spectrum can easily be read off from a PMD:
[0092] 1. Mass range with peptide signals: A clearly recognizable band (“peptide band 21”) close to the reference line 22 points to the presence of peptide signals in the mass range in question. Where the band structure is lost in an unstructured point cloud (typically recognizable at the top end of the mass axis, to the right of an upper limit 30), the peptide signal is lost in the noise (
[0096] Compared to visualization with the aid of conventional mass defect diagrams, the above-stated quality characteristics of a spectrum can be recognized much more clearly in a PMD. In particular, even smaller mass shifts, or those restricted to subsections of the mass axis, can be detected more easily as deviations from the horizontal reference line 22.
[0097] A PMD can also be formed for an individual spectrum or for the maximum spectrum over several individual spectra (so-called skyline spectrum), rather than for an average spectrum. This representation does not provide as much information, however.
[0098] In addition to the pure visualization, the information shown in a PMD can also be evaluated quantitatively as follows (see also mathematical formulation further below):
[0099] 1. Determination of the discrepancy between measured and expected peptide mass defect as a function of mass. To this end, the median of the mass defects over subintervals of the mass axis is formed and compared with the expected value.
[0100] 2. Determination of the variance of the mass defects about their average. To this end, the interquartile range of the mass defects is determined over subintervals and converted into a specified multiple of the standard deviation of an assumed normal distribution.
[0101] 3. Determination of the mass range with recognizable peptide signal. To this end, the variance of the mass defects determined from the data (range between lines 26, 27) is compared with the width of the reference interval (range between lines 23, 24) and the range is determined in which the deviation remains within a selected tolerance.
[0102] This quantitative information can be displayed in the PMD and also presented numerically or processed further to evaluate the quality of the measurement.
$S = (s_j, m_j)_{j=1,\dots,n}$, where $n \in \mathbb{N}$, $0 < m_1 < \dots < m_n$,
describes an (individual, average or skyline) spectrum which consists of the $n$ intensities $s_1, \dots, s_n$ for the m/z values $m_1, \dots, m_n$.
$\operatorname{floor}(x)$ for $x > 0$
designates the integer part of a positive number $x$, and
$\varphi(x) = x - \operatorname{floor}(x)$ for $x > 0$
designates the part of a positive number $x$ after the decimal point.
[0103] The PMD of the local maxima of $S$ consists of the graphical representation of the points
$(m_i, \delta_P(m_i))$, $i \in L$,
where $L$ is the index set of the local maxima of $S$, $\mu \in \mathbb{N}$ designates the radius of the local environment over which the local maxima are formed, and the function
$\delta_P(m) = \varphi\!\left(\frac{m}{1 + r_P} + \frac{1}{2}\right) - \frac{1}{2}$
describes the signed discrepancy between the mass defect expected for a peptide and the one actually measured. The above representation of the discrepancy $\delta_P(m)$ is derived as follows:
[0104] The theoretically expected mass defect of a peptide with nominal mass $m_N$ is
$m_P - m_N = (1 + r_P)\,m_N - m_N = r_P\,m_N$.
For an actually measured mass $m$ of a peptide, its nominal mass is assumed to be the integer mass $m_N$ for which the absolute difference
$|m - (1 + r_P)\,m_N|$
is minimized. This leads to
$m_N = \operatorname{floor}\!\left(\frac{m}{1 + r_P} + \frac{1}{2}\right)$.
The discrepancy $\delta_P(m)$ results from the difference between measured and expected mass defect,
$\delta_P(m) = \frac{(m - m_N) - r_P\,m_N}{1 + r_P} = \frac{m}{1 + r_P} - m_N = \varphi\!\left(\frac{m}{1 + r_P} + \frac{1}{2}\right) - \frac{1}{2}$.
Weighting the difference in the mass defects with $1/(1 + r_P)$ serves to normalize $\delta_P(m)$ to the range of values $[-0.5, 0.5]$.
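The derivation can be checked numerically. The sketch below compares a closed-form expression, $\varphi(m/(1+r_P) + 1/2) - 1/2$, with a direct search for the nearest nominal mass followed by the normalized difference of measured and expected mass defect. Both function names are hypothetical, and the closed form is the one that follows from the minimization described in the text.

```python
import math

R_P = 4.95e-4  # slope of the peptide mass model (value from the text)


def phi(x: float) -> float:
    """Part of a positive number after the decimal point."""
    return x - math.floor(x)


def delta_closed_form(m: float) -> float:
    """phi(m / (1 + r_P) + 1/2) - 1/2."""
    return phi(m / (1.0 + R_P) + 0.5) - 0.5


def delta_by_search(m: float) -> float:
    """Find the nearest nominal mass explicitly, then normalize the defect gap."""
    center = int(m / (1.0 + R_P))
    candidates = range(center - 1, center + 2)
    m_n = min(candidates, key=lambda k: abs(m - (1.0 + R_P) * k))
    measured_defect = m - m_n
    expected_defect = R_P * m_n
    return (measured_defect - expected_defect) / (1.0 + R_P)


for m in (800.37, 1500.9, 2500.1, 4200.6):
    print(m, delta_closed_form(m), delta_by_search(m))
```

The two computations agree except at exact ties, where a mass lies precisely halfway between two model masses and the sign of the result depends on which neighbor is chosen.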
[0105] The reference line 22 of the theoretically expected average mass defects of peptides is described by the zero line $\delta_P = 0$. To determine the reference interval (lines 23, 24), the expected variance $v(m)$ of the positions of the local maxima is considered as a function of the mass, which can be estimated by the sum of the variance of the true peptide masses $\sigma_P^2$ and the variance originating from the discretization of the mass axis,
Here $\Delta m(m)$ designates the width of the m/z bins at mass position $m$. The reference interval is formed by the limiting lines 23, 24 or
$d_P^{1,2}(m) = \pm\mu\sqrt{v(m)}$,
where the scaling factor $\mu > 0$ gives the width of the interval as a multiple of a standard deviation (typically $\mu = 2$).
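A possible way to compute the reference interval is sketched below. The discretization term is an assumption: the variance of a value distributed uniformly within a bin of width $\Delta m$, that is $\Delta m^2/12$, is used here, and $\sigma_P$ is evaluated at the measured mass instead of the nominal mass. The function names are hypothetical.

```python
import math

SIGMA_0 = 0.02  # value from the text
S_P = 2.0e-5    # value from the text


def expected_variance(mass: float, bin_width: float) -> float:
    """v(m): peptide mass variance plus an assumed discretization variance."""
    sigma_p = SIGMA_0 + S_P * mass          # sigma_P, evaluated at m (assumption)
    return sigma_p ** 2 + bin_width ** 2 / 12.0  # uniform-bin term (assumption)


def reference_interval(mass: float, bin_width: float, scale: float = 2.0):
    """Lower and upper limiting line -/+ scale * sqrt(v(m))."""
    half_width = scale * math.sqrt(expected_variance(mass, bin_width))
    return -half_width, half_width


print(reference_interval(1000.0, 0.06))
```

With the typical scaling factor of 2, the interval would cover roughly 95 % of the positions if they were normally distributed with variance $v(m)$.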
[0106] For the spectrum $S$, a partitioning $I$ of the mass axis into pairwise disjoint intervals $I_k$ shall be given:
$I = (I_k)_{k=1,\dots,K}$, where $K \in \mathbb{N}$, $\bigcup_k I_k = [m_1, m_n]$.
For a PMD, in which the points $(m_i, \delta_P(m_i))$, $i \in L$, are shown, the discrepancy
$E_k = \operatorname{median}\{\delta_P(m_i) : i \in L \cap I_k\}$
is formed to determine the mass discrepancy E(m) for the subintervals I.sub.k. The E.sub.k are shown as points above the respective midpoints of the corresponding subintervals I.sub.k, and a suitable interpolation is carried out in between (e.g. linear). The variance e(m) of the mass defects is similarly formed from the interquartile ranges (IQR),
where the scaling factor μ>0 again gives the width of the interval as a multiple of a standard deviation, typically μ=2, and erf designates the Gaussian error function. The mass range with recognizable peptide signal is determined to be that part of the mass axis for which the ratio of observed (lines 26, 27) and expected variance (lines 28, 29) remains below a specified tolerance threshold t:
A typical tolerance value is t=1.2. The positions of the outer edges of M.sub.P can be drawn in the PMD as vertical lines.
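The quantitative evaluation can be sketched as follows, under stated assumptions. The conversion from interquartile range to standard deviation uses the factor $2\sqrt{2}\,\operatorname{erf}^{-1}(1/2) \approx 1.349$ of a normal distribution, obtained here from the standard library; the exact formulas of the original are not reproduced in this text, so the function names, the half-open interval convention and the form of the tolerance test are illustrative.

```python
from bisect import bisect_right
from statistics import NormalDist, median, quantiles

# IQR of a normal distribution in units of sigma (about 1.349).
IQR_PER_SIGMA = 2.0 * NormalDist().inv_cdf(0.75)


def evaluate_pmd(points, edges, scale=2.0):
    """Per subinterval: (midpoint, median distance E_k, scaled spread e_k)."""
    groups = [[] for _ in range(len(edges) - 1)]
    for mass, delta in points:
        k = bisect_right(edges, mass) - 1  # half-open intervals [edge_k, edge_k+1)
        if 0 <= k < len(groups):
            groups[k].append(delta)
    profile = []
    for k, deltas in enumerate(groups):
        if len(deltas) < 2:
            continue  # too few local maxima for a meaningful IQR
        q1, _, q3 = quantiles(deltas, n=4, method="inclusive")
        spread = scale * (q3 - q1) / IQR_PER_SIGMA
        midpoint = 0.5 * (edges[k] + edges[k + 1])
        profile.append((midpoint, median(deltas), spread))
    return profile


def intervals_with_signal(profile, expected_halfwidth, tolerance=1.2):
    """Midpoints where observed spread / expected spread stays within tolerance."""
    return [mid for mid, _, spread in profile
            if spread <= tolerance * expected_halfwidth(mid)]


points = [(1010, -0.02), (1020, 0.0), (1030, 0.01), (1040, 0.03),
          (1050, 0.05), (1150, 0.1)]
profile = evaluate_pmd(points, [1000, 1100, 1200])
print(profile)
print(intervals_with_signal(profile, lambda m: 0.05))
```

The callable `expected_halfwidth` stands for the half-width of the reference interval at a given mass; a constant is used in the example purely for brevity.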
[0107] The above-described representation of a spectrum in a PMD can be applied, in principle, for both average spectra and individual spectra. It requires the identification of local maxima in the relevant spectrum, however, and thus a sufficiently high signal-to-noise ratio, which typically does not exist for individual spectra.
[0108] This disadvantage can be circumvented by representing the spectra in a peptide mass defect histogram (PMH). This is created by presenting all spectral intensities for all m/z bins of a spectrum in a 2D histogram, in which the horizontal axis again corresponds to the mass axis, and the vertical axis represents the peptide model distance to the relevant mass (see below). Both axes are uniformly subdivided into pre-selected numbers of subintervals (typically 20-50, can be different for each axis), thus partitioning the diagram area into rectangular tiles.
[0109] The spectrum under analysis is now interpolated to an m/z resolution which corresponds to the selected subdivision of the mass defect axis. All those intensity values of the interpolated spectrum whose masses and mass defects fall within the relevant subintervals of the horizontal or vertical axis are then summed for each tile.
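The tiling and summation can be sketched in a few lines. For brevity this sketch skips the interpolation step and sums the discrete intensities as measured; the function names and the orientation of the result (rows for distance subintervals, columns for mass subintervals) are choices made here for illustration.

```python
import math

R_P = 4.95e-4  # slope of the peptide mass model (value from the text)


def peptide_model_distance(m: float) -> float:
    """Signed distance of m from the nearest model mass, scaled to [-0.5, 0.5)."""
    x = m / (1.0 + R_P)
    return x - math.floor(x + 0.5)


def peptide_mass_defect_histogram(mz, intensities, mass_bins=20, distance_bins=20):
    """Sum intensities per tile; hist[row][col] with row = distance, col = mass."""
    m_lo, m_hi = mz[0], mz[-1]
    hist = [[0.0] * mass_bins for _ in range(distance_bins)]
    for m, s in zip(mz, intensities):
        col = min(int((m - m_lo) / (m_hi - m_lo) * mass_bins), mass_bins - 1)
        row = min(int((peptide_model_distance(m) + 0.5) * distance_bins),
                  distance_bins - 1)
        hist[row][col] += s
    return hist


mz = [1000.495, 1001.0, 1002.0]
intensities = [5.0, 1.0, 2.0]
hist = peptide_mass_defect_histogram(mz, intensities, mass_bins=2, distance_bins=5)
for row in hist:
    print(row)
```

Because every intensity value is assigned to exactly one tile, the sum over all tiles equals the total intensity of the spectrum, which is a convenient sanity check.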
[0110] For the graphical illustration, all the tiles can finally be visualized using a suitably selected gray scale or color scale corresponding to the summed intensities. As in the PMD, the reference line 31 and the reference intervals 32, 33 are additionally drawn in (
[0111] In analogy with the quantitative evaluation of a PMD, characteristic quantities of a spectrum can also be calculated from a PMH, and thus for individual spectra as well. To this end, an evaluation of the vertically arranged summed intensity values is carried out for each subinterval of the horizontal mass axis to determine cluster points and variance values.
[0112] It must be noted here that the top and bottom edge lines of a PMH, i.e. the points associated with the extreme distance values +0.5 and −0.5, can be considered to be identical. Circular statistics are therefore suitable to describe the distribution of the summed intensity values in the vertical direction. The first circular moment $Z$ in particular can be used as the (complex-valued) statistic (for the mathematical formulation, see below). The circular moments $Z$ for all the subintervals of the mass axis taken together are called the mass shift profile of the spectrum considered. The complex argument of $Z$ corresponds (apart from a factor $2\pi/(1 + r_P)$) to the discrepancy between measured and expected masses. The modulus of $Z$ provides a reciprocal measure for the variance of the measured peptide model distances: the value $Z = 0$ corresponds to a maximum variance of all the measurements over the interval $[-0.5, 0.5]$, while in the extreme case of a minimum variance where all distance values are identical, $Z$ assumes a value with modulus 1.
[0113] For the actual calculation of the mass shift profile Z, the two steps to form the 2D histogram and the computation of the circular moments can be combined and expressed as Fourier integrals of the spectrum over the subintervals of the mass axis (see below). These integrals can be numerically approximated with the aid of suitable integration rules (for example the trapezoidal rule or Simpson's rule). It is also possible here to forgo a finer discretization and interpolation of the spectrum and to calculate directly with the discrete spectral intensities in the resolution originally available.
[0114] The mass shift profile provides an estimate of the measurement errors of the measured masses occurring in a spectrum with respect to the true masses. In practice it is often desirable to correct these shifts and thus achieve a higher accuracy for the measured masses of a spectrum.
[0115] On the other hand, the mass shift profile is obtained by comparing the measured data with the relatively simple, linear averagine model (see above). The estimation of the mass errors through the mass shift profile cannot therefore be more accurate than the accuracy of the model itself, which is not sufficiently high for many applications, at least in the lower mass range up to approx. 1,000 Da. A correction of the measured masses by the estimated measurement error can therefore lead to parts of the measurement becoming less accurate.
[0116] For many applications, however, it is not absolute mass accuracy which is decisive, but rather the best possible comparability between individual spectra from one and the same or from several measurements. The absolute measurement error of the measured masses of a spectrum is less relevant in these cases than the differences of the measurement errors within an ensemble of spectra.
[0117] The method of mass shift normalization consists in initially determining the respective mass shift profile for each spectrum in an ensemble (see above), forming a common, average reference profile from all the individual mass shift profiles, and finally modifying each spectrum in such a way that the mass shift profile of the modified spectrum corresponds to the reference profile. The relative deviation between the signal peaks of the individual spectra belonging to one and the same peptide is reduced, and the comparability of the spectra is enhanced (
[0118] The reference profile is determined by forming the arithmetic mean element by element (see below). To normalize a single spectrum to the reference profile, relative shift values are determined for the individual subintervals for which the mass shift profiles were calculated, and these values are interpolated over the whole mass axis. The measured mass values of the spectrum are then corrected by these interpolated shift values.
[0119] By applying these shifts, each individual spectrum is given its own mass axis. For a joint evaluation of an ensemble of spectra, it is usually desirable for all spectra to be defined on a common mass axis. This can be achieved by forming a common mass axis (for example by averaging over all the individual mass axes or by selecting an arbitrary mass axis as the reference mass axis) and subsequently interpolating each normalized spectrum to the common mass axis.
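The normalization steps can be sketched as follows, taking already computed mass shift profiles (lists of complex circular moments, one per subinterval) as input. The function names are hypothetical, and the sign convention of the relative shift, chosen so that the corrected spectrum moves toward the reference profile, is an assumption.

```python
import cmath
import math

R_P = 4.95e-4  # slope of the peptide mass model (value from the text)


def reference_profile(profiles):
    """Element-wise arithmetic mean of the individual mass shift profiles."""
    count = len(profiles)
    return [sum(column) / count for column in zip(*profiles)]


def relative_shifts(profile, reference):
    """Shift per subinterval that moves `profile` onto `reference`."""
    return [(1.0 + R_P) * cmath.phase(ref * z.conjugate()) / (2.0 * math.pi)
            for z, ref in zip(profile, reference)]


def interpolate(x, xp, fp):
    """Piecewise-linear interpolation, held constant outside [xp[0], xp[-1]]."""
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    for k in range(len(xp) - 1):
        if xp[k] <= x <= xp[k + 1]:
            t = (x - xp[k]) / (xp[k + 1] - xp[k])
            return fp[k] + t * (fp[k + 1] - fp[k])


def normalize_mass_axis(mz, profile, reference, midpoints):
    """Corrected m/z values m + interpolated shift for one spectrum."""
    shifts = relative_shifts(profile, reference)
    return [m + interpolate(m, midpoints, shifts) for m in mz]


# Two synthetic profiles, shifted by +0.1 Da and -0.1 Da in both subintervals.
angle = 2.0 * math.pi * 0.1 / (1.0 + R_P)
z_plus = [cmath.exp(1j * angle)] * 2
z_minus = [cmath.exp(-1j * angle)] * 2
ref = reference_profile([z_plus, z_minus])
print(normalize_mass_axis([1000.6, 1500.6], z_plus, ref, [1000.0, 2000.0]))
```

Each corrected spectrum would afterwards be interpolated onto the common mass axis, for example with the same piecewise-linear scheme applied to the intensities.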
[0120] Peptide mass defect histogram:
$S = (s_j, m_j)_{j=1,\dots,n}$, where $n \in \mathbb{N}$, $0 < m_1 < \dots < m_n$,
designates, as above, an (individual, average or skyline) spectrum consisting of the $n$ intensities $s_1, \dots, s_n$ for the m/z values $m_1, \dots, m_n$.
[0121] As above, for a spectrum $S$, there shall be a partitioning $I$ of the mass axis as well as a further partitioning $J$ of the interval $[-0.5, 0.5]$ into subintervals $J_l$.
For the partitionings $I$ and $J$,
$\Gamma_{k,l} = \{m \in I_k : \delta_P(m) \in J_l\}$
provides a finer partitioning of the mass axis, where the subintervals $\Gamma_{k,l}$ are assigned to the individual tiles of the 2D histogram. Furthermore, an interpolant of the spectrum $S$ shall be given by a continuous function $\tilde{S}(m)$,
$\tilde{S} : [m_1, m_n] \to \mathbb{R}$, with $\tilde{S}(m_j) = s_j$, $j = 1, \dots, n$.
[0122] The matrix $H(S) = (h_{k,l})$ shall be defined by
For numerical calculation of $h_{k,l}$, one can select $\tilde{S}$ as a linear interpolant of $S$, for example. The integrals can then be evaluated exactly. To form the PMH, the matrix $H(S)$ is depicted as a gray scale or false color image.
[0123] Mass Shift Profile:
With the designations and definitions of the previous section, the first circular moment of the columns of H is given by the complex quantities
where
For the specific numerical calculation of these integrals, a suitable integration formula (e.g. the trapezoidal rule or Simpson's rule) can be selected, with the discrete measurement points $(m_j)_{j=1,\dots,n}$ in particular as reference points of $\tilde{S}$. Because $\tilde{S}(m_j) = s_j$, it is then not necessary to explicitly interpolate the spectrum $S$.
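A discrete approximation of the mass shift profile can be sketched as follows. Since the formulas are not reproduced in this text, the sketch assumes the intensity-weighted first circular moment $Z_k = \sum_j s_j\,e^{2\pi i\,\delta_P(m_j)} / \sum_j s_j$ over the measurement points in $I_k$, which corresponds to a simple rectangle rule that ignores varying bin widths. The function names are hypothetical.

```python
import cmath
import math
from bisect import bisect_right

R_P = 4.95e-4  # slope of the peptide mass model (value from the text)


def mass_shift_profile(mz, intensities, edges):
    """Intensity-weighted first circular moment Z_k for each mass subinterval."""
    sums = [0j] * (len(edges) - 1)
    weights = [0.0] * (len(edges) - 1)
    for m, s in zip(mz, intensities):
        k = bisect_right(edges, m) - 1
        if 0 <= k < len(sums):
            # exp(2*pi*i*delta_P(m)) equals exp(2*pi*i*m/(1+r_P)), because
            # delta_P(m) differs from m/(1+r_P) only by an integer.
            sums[k] += s * cmath.exp(2j * math.pi * m / (1.0 + R_P))
            weights[k] += s
    return [z / w if w > 0 else 0j for z, w in zip(sums, weights)]


def estimated_shift(z: complex) -> float:
    """Mass discrepancy in Da recovered from the argument of Z."""
    return (1.0 + R_P) * cmath.phase(z) / (2.0 * math.pi)


# Synthetic check: ideal peptide masses, all measured 0.1 Da too high.
mz = [(1.0 + R_P) * n + 0.1 for n in range(1000, 1010)]
intensities = [1.0] * len(mz)
profile = mass_shift_profile(mz, intensities, [1000.0, 1020.0])
print(abs(profile[0]), estimated_shift(profile[0]))
```

In the synthetic example all distances are identical, so the modulus of $Z$ is close to 1 and the recovered shift is close to the 0.1 Da that was imposed.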
[0124] Mass Shift Normalization:
There shall be an ensemble of $N$ spectra $S^i$ ($i = 1, \dots, N$), which have a common mass axis $(m_j)_{j=1,\dots,n}$. Furthermore, there shall be a partitioning $I$ of the mass axis into $K$ subintervals, as above. The mass shift profiles computed for this partitioning for the individual spectra $S^i$ shall be designated by $Z^i$:
$Z^i = (Z^i_k)_{k=1,\dots,K}$ for $i = 1, \dots, N$.
Now $S = (s_j, m_j)_{j=1,\dots,n}$ shall be an arbitrary spectrum defined over the same mass axis with mass shift profile $Z = (Z_k)_{k=1,\dots,K}$. For each subinterval of the partitioning $I$, a relative displacement
is now determined, where $\arg(z) \in (-\pi, \pi]$ designates the complex argument function. The individual shifts $\Delta_k$ are assigned to the midpoints of the subintervals $I_k$ and interpolated over the complete mass axis (typically by means of linear interpolation). A shift vector $\Delta^* = (\Delta^*_j)_{j=1,\dots,n}$ is thus obtained. The normalized spectrum $S^*$ is obtained by applying the shift values to the m/z values of the spectrum $S$,
$S^* = (s_j, m^*_j)_{j=1,\dots,n}$, where $m^*_j = m_j + \Delta^*_j$.