Evaluation of complex mass spectrometry data from biological samples
11062891 · 2021-07-13
Inventors
Cpc classification
G01N33/5008
PHYSICS
H01J49/0036
ELECTRICITY
G01N33/53
PHYSICS
G16B40/10
PHYSICS
G01N2496/25
PHYSICS
G01N33/4833
PHYSICS
G01N24/087
PHYSICS
International classification
G01N33/53
PHYSICS
G01N33/50
PHYSICS
G16B40/10
PHYSICS
Abstract
The disclosure relates to a method which is suitable for the quality control and signal correction of mass spectrometry data of biological tissue samples and is based on the analysis of the chemical background signal observed in a spectrum. It exploits the fact that the chemical background signal contains components from a plurality of polymer molecules, whose chemical structure has strong regularities. These regularities mean that the observed masses are subject to certain statistical distributions, which are each characteristic of the class of molecule. By analyzing these statistical properties, it is possible to detect and correct any mass shifts which may be present.
Claims
1. A method for the evaluation of mass spectrometric measurement data for the analysis of tissue samples, comprising the steps of: (a) providing a tissue sample which contains polymers having varying linkages of characteristic molecules; (b) processing the tissue sample in order to prepare at least two types of polymers and molecules derived therefrom and render them accessible for a subsequent mass-spectrometric measurement; (c) acquiring mass spectra of the processed tissue sample; (d) determining the mass shift of the polymer mass signals imaged in the mass spectra, where a mass shift represents the deviation of a measured mass signal from the adjacent molecular mass signal to be expected on a mass scale, and the totality of the molecular mass signals to be expected is calculated using a theoretical model of at least one type of polymer; (e) evaluating the mass shifts determined; and (f) assessing a quality of the mass spectra according to the evaluation.
2. The method according to claim 1, wherein, in order to create a theoretical model of at least one type of polymer, the mass is assumed to be approximately proportional to the nominal mass, and all natural numbers in a suitably selected range are taken into consideration as possible nominal masses for this type of polymer.
3. The method according to claim 1, wherein the polymers comprise biopolymers of the type of proteins, peptides, N-glycans and/or lipids.
4. The method according to claim 1, wherein the determination of the mass shifts comprises the calculation of the quantity
5. The method according to claim 4, wherein the scaling factor for N-glycans is λ.sub.G=1+3.5·10.sup.−4, and for proteins and peptides λ.sub.P=1+4.95·10.sup.−4.
6. The method according to claim 4, wherein the Kendrick profile for a mass spectrum is estimated by: (i) determining the positions of the local maxima of the mass spectrum, plotting them as a point cloud in the Kendrick diagram, and estimating a distribution function by means of standard methods of density estimation; or (ii) forming a two-dimensional histogram from the spectral intensities of a mass spectrum in the plane of the Kendrick diagram so that the intensities occurring in each histogram tile are summed up and, after normalization in the vertical direction, each column corresponds to a numerical approximation of the Kendrick profile for the relevant mass interval.
7. The method according to claim 4, wherein the evaluation comprises: (i) the calculation of circular moments greater than the first circular moment in order to detect more than one cluster point of Kendrick shifts; and/or (ii) a Hough-type transform for recognizing the structure.
8. The method according to claim 7, wherein the steps to calculate the Kendrick shifts and to calculate circular moments are combined and the nth circular moment is expressed as per the equation
9. The method according to claim 8, wherein the deviation in the range of the subinterval I.sub.k of the mass scale is calculated from at least two circular moments for different n as per the equation
10. The method according to claim 9, wherein the exponent is a non-negative real number.
11. The method according to claim 10, wherein one of the values 0, 1 or 2 is chosen for the exponent.
12. The method according to claim 7, wherein the Hough-type transform is carried out on the mass values m.sub.i determined from an acquired spectrum, associated Kendrick shifts .sub.i and appropriately selected density values p.sub.i as per the equation
13. The method according to claim 12, wherein r is discretized in the interval [−10.sup.−3 . . . 10.sup.−3].
14. The method according to claim 12, wherein the density values p.sub.i correspond to the spectral intensities of an acquired spectrum, and the m.sub.i to the respectively associated mass values.
15. The method according to claim 12, wherein the m.sub.i correspond to the mass values for local maxima of an acquired spectrum and the p.sub.i are chosen to be unity.
16. The method according to claim 1, wherein the evaluation and the assessment of the quality of the mass spectra encompasses determining a deviation between the mass scale used for the acquisition of the mass spectra and a mass scale derived from the mass shifts.
17. The method according to claim 16, wherein the evaluation comprises: (i) an absolute mass calibration of individual mass spectra by identifying a dominant component and its characteristic for an individual mass spectrum and, from this, determining a correction function for a mass calibration; and/or (ii) a relative correction of the mass scales of an ensemble of mass spectra with respect to each other so that the relative shifts between the mass spectra are minimized.
18. The method according to claim 16, wherein the deviation is determined as per one of the equations: (i) (m)={circumflex over (r)}m+{circumflex over (d)}; or (ii) (m)=({circumflex over (r)}m+{circumflex over (d)}+), where ({circumflex over (r)}, {circumflex over (d)}) describes a local maximum of a Hough transform H(r, d).
19. The method according to claim 18, wherein the local maximum ({circumflex over (r)}, {circumflex over (d)}) corresponds to the global maximum of the Hough transform H(r, d).
20. The method according to claim 1, wherein the processing of the tissue sample includes the action of agents and/or reagents.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The invention can be better understood by referring to the following illustrations. The elements in the illustrations are not necessarily to scale, but are primarily intended to illustrate the principles of the invention (largely schematically). In the illustrations, the same reference characters designate corresponding elements in the different views.
DETAILED DESCRIPTION
(16) While the invention has been described and explained with reference to a number of embodiments, those skilled in the art will recognize that various changes in form and detail can be made hereto without departing from the scope of the technical teaching defined in the enclosed claims.
(17) The method outlined in the introduction is conditional on there being no significant signal components from other classes of molecules in the spectrum in addition to the class of molecule investigated. When this condition is not fulfilled, further structures appear in the Kendrick diagram in addition to the structure that is actually of relevance, and these may also be superimposed on the former. This situation can even occur with MALDI imaging data obtained from proteins and peptides in the form of MALDI matrix-substance cluster ions (see
(18) An extension of the method is described below which can also be applied when signal components of several classes of molecules superimpose on each other in the spectra. A variety of methods are described which allow both an absolute calibration of individual spectra as well as a relative correction of an ensemble of spectra.
(19) The method is suitable in principle for spectral data in which signal components from one or more classes of polymer molecules are present. The term "polymers" is used here for molecules which consist mainly of a combination of several identical or chemically similar groups of atoms. In relation to biopolymers, this includes not only the aforementioned proteins and peptides but also, in particular, glycans, which consist of a tree-like arrangement of monosaccharides (single sugar molecules), or lipids, which contain one or more hydrocarbon chains as key constituents. Matrix substances such as sinapic acid, 2,5-dihydroxybenzoic acid, α-cyano-4-hydroxycinnamic acid or 2,4,6-trihydroxyacetophenone can also polymerize to clusters and appear in mass spectra as a correspondingly regular background signal.
(20) Hereinafter, particular consideration is given to the class of molecules known as N-glycans, since these are very important for biomedical applications and, at the same time, can be detected with a high signal quality in MALDI imaging data given suitable sample preparation, e.g. a digest with enzymes such as an endoglycosidase (e.g. peptide N-glycosidase F, PNGase F) (see Drake, R. R., T. W. Powers, E. E. Jones, E. Bruner, A. S. Mehta and P. M. Angel (2017): MALDI Mass Spectrometry Imaging of N-Linked Glycans in Cancer Tissues. Advances in Cancer Research 134: 85-116).
(21) The peptide mass model describes the statistical distribution of the exact masses of peptides as a function of their nominal masses. Converting exact masses m to Kendrick mass shifts κ (relative to a fixed scaling factor λ), one obtains a theoretical, probabilistic model of Kendrick shifts. This can be described mathematically by a set of probability density functions P=P(κ; m.sub.N), which is parameterized by the nominal mass m.sub.N. Since the range of values of the Kendrick shift, the interval [−½ . . . +½], is identified with the circle, these are circular distributions. The probabilistic model P is called the Kendrick profile for the scaling factor λ.
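The conversion from exact mass to Kendrick shift can be illustrated with a minimal Python sketch. The defining equation is not reproduced in this text, so the convention used here (divide the mass by the scaling factor, then take the signed distance to the nearest integer) is an assumption, chosen because it yields shifts near zero for masses that are proportional to their nominal mass. The function name `kendrick_shift`, the constant names and the sample masses are hypothetical; the two scaling factors are the values quoted for proteins/peptides and N-glycans.

```python
import numpy as np

# Scaling factors quoted in the text (peptides/proteins and N-glycans).
LAMBDA_PEPTIDE = 1 + 4.95e-4
LAMBDA_GLYCAN = 1 + 3.5e-4


def kendrick_shift(mass, scaling):
    """Signed distance of mass/scaling to the nearest integer.

    Assumed convention, not taken from the source: the result lies in
    [-0.5, 0.5], and both interval ends denote the same point on the circle.
    """
    scaled = np.asarray(mass, dtype=float) / scaling
    return scaled - np.round(scaled)


# Hypothetical example masses: the first two are exact multiples of the
# peptide scaling factor, the third is deliberately off the model line.
masses = np.array([1000.495, 1500.7425, 2001.2])
shifts = kendrick_shift(masses, LAMBDA_PEPTIDE)
print(shifts)
```

Masses that follow the proportional model land close to a shift of zero, whereas the third mass shows a clearly non-zero shift.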
(22) When peptides are considered, the theoretical Kendrick profile consists of normal distributions with a mean of zero and a standard deviation which increases with mass (see Wolski, W. E., M. Farrow, A. K. Emde, H. Lehrach, M. Lalowski and K. Reinert (2006): Analytical model of peptide mass cluster centres with applications. Proteome Sci 4: 18). In general, such a model results when the expected mass defect for a class of molecule is approximately proportional to the nominal mass. This is the case for peptides (see introduction) as well as N-glycans, the associated scaling factor for the latter being λ.sub.G=1+3.5·10.sup.−4 (see
(23) For other classes of molecules, the relationship can be more complex, for example a linear dependence with non-zero axis offset. Multi-modal distributions can also appear, with the result that the distribution of the possible mass defects for a given nominal mass has two or more cluster points. More complex relationships can nevertheless also be described by a mathematical model.
(24) The Kendrick profile is an abstract, mathematical model for the relationship between nominal mass and Kendrick shift. A measured spectrum, however, provides only a list of intensities and the corresponding masses. Representing a spectrum in a Kendrick diagram allows an initial, visual, qualitative assessment of this relationship. For a more detailed analysis, the underlying Kendrick profile can be estimated from a measured spectrum with the aid of the methods described below. The distortions present in the spectrum (mass shifts, noise) mean that the estimated Kendrick profile will not agree with the theoretical Kendrick profile for the class of molecules considered. The estimated Kendrick profile is therefore called an empirical Kendrick profile hereinafter.
(25) To estimate an empirical Kendrick profile for a spectrum, the positions of the local maxima of the spectrum can be determined, plotted as a point cloud in the Kendrick diagram, and a distribution function estimated using standard methods of density estimation. This is conditional on there being a sufficiently high signal-to-noise ratio in the measured spectrum, which can be the case with average spectra, but is typically not present with individual spectra.
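The point-cloud estimate can be sketched as follows. This is only an illustration under stated assumptions: the Kendrick shift convention (mass divided by the scaling factor, signed distance to the nearest integer), the relative height threshold, the wrapped Gaussian kernel and its bandwidth are choices made here, and the helper names `local_maxima` and `circular_kde` are hypothetical. The spectrum is synthetic.

```python
import numpy as np

SCALING = 1 + 4.95e-4  # peptide scaling factor quoted in the text


def local_maxima(mass, intensity, min_rel_height=0.1):
    """Masses of strict local maxima above a relative intensity threshold."""
    y = np.asarray(intensity, dtype=float)
    inner = y[1:-1]
    is_peak = (inner > y[:-2]) & (inner > y[2:]) & (inner >= min_rel_height * y.max())
    return np.asarray(mass, dtype=float)[1:-1][is_peak]


def circular_kde(shifts, grid, bandwidth=0.03):
    """Gaussian kernel density on the circle; distances wrap at +/-0.5."""
    diff = grid[:, None] - shifts[None, :]
    diff -= np.round(diff)  # wrap differences into [-0.5, 0.5]
    density = np.exp(-0.5 * (diff / bandwidth) ** 2).sum(axis=1)
    return density / (density.sum() * (grid[1] - grid[0]))


# Synthetic spectrum: Gaussian peaks centred at SCALING * nominal mass.
mass = np.arange(999.0, 1021.0, 0.01)
centres = SCALING * np.arange(1001, 1021)
intensity = np.exp(-0.5 * ((mass[:, None] - centres[None, :]) / 0.05) ** 2).sum(axis=1)

peaks = local_maxima(mass, intensity)
shifts = peaks / SCALING - np.round(peaks / SCALING)  # point cloud ordinate
grid = np.arange(-0.5, 0.5, 0.005)
density = circular_kde(shifts, grid)
print(len(peaks), grid[np.argmax(density)])
```

In practice the density would be estimated separately for successive mass windows, so that the profile remains a function of the nominal mass.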
(26) Alternatively, a two-dimensional histogram can be formed in the plane of the Kendrick diagram from the spectral intensities of a spectrum so that the intensities occurring within each histogram tile are summed (see
(27) The information contained in a Kendrick profile can be used in two ways: On the one hand, for an ensemble of measured spectra, corrections for each individual spectrum can be determined by comparing the associated profiles so that the relative shifts between the spectra are minimized. On the other hand, the dominant component and its characteristic can be determined for an individual spectrum and used in turn to determine a correction function for a mass calibration. Both of these methods are illustrated below.
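The two-dimensional histogram estimate of the empirical Kendrick profile described above can be sketched in a few lines. The sketch assumes the same Kendrick shift convention as elsewhere in these examples (mass divided by the scaling factor, signed distance to the nearest integer); the function name `kendrick_histogram`, the bin counts and the synthetic spectrum are hypothetical. Note that the array is laid out with one row per mass interval, so the normalization "in the vertical direction" of the diagram becomes a normalization along axis 1 here.

```python
import numpy as np


def kendrick_histogram(mass, intensity, scaling, mass_edges, n_shift_bins=51):
    """Sum intensities per (mass interval, shift bin) tile and normalize
    each mass interval so that its shift bins sum to one."""
    mass = np.asarray(mass, dtype=float)
    scaled = mass / scaling
    shift = scaled - np.round(scaled)
    shift_edges = np.linspace(-0.5, 0.5, n_shift_bins + 1)
    hist, _, _ = np.histogram2d(
        mass, shift, bins=[mass_edges, shift_edges], weights=intensity
    )
    totals = hist.sum(axis=1, keepdims=True)
    profile = np.divide(hist, totals, out=np.zeros_like(hist), where=totals > 0)
    return profile, shift_edges


# Synthetic spectrum with peaks on the model line (shift close to zero).
scaling = 1 + 4.95e-4
mass = np.arange(1000.0, 1020.0, 0.002)
centres = scaling * np.arange(1001, 1020)
intensity = np.exp(-0.5 * ((mass[:, None] - centres[None, :]) / 0.02) ** 2).sum(axis=1)

profile, shift_edges = kendrick_histogram(
    mass, intensity, scaling, np.arange(1000.0, 1021.0, 5.0)
)
print(profile.shape, profile.argmax(axis=1))
```

An odd number of shift bins is used so that a shift of zero falls in the middle of a bin rather than on a bin edge.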
(28) With unimodal, symmetric distributions, the first circular moment can be used to determine the position of the cluster point of the distribution. This is not possible with asymmetric or multimodal distributions. It is, however, possible to determine the relative shift by comparing the moments of two different Kendrick profiles at the same mass position. This comparison becomes numerically unstable and loses accuracy, though, if the respective moments are close to zero in absolute terms, which is often the case with multimodal distributions.
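A short sketch may help to show why the first circular moment locates the cluster point of a unimodal distribution but can fail for a multimodal one. It uses the standard definition of circular moments for a quantity with period one; the function names and the simulated shift values are hypothetical.

```python
import numpy as np


def circular_moment(shifts, order=1, weights=None):
    """Weighted n-th circular moment of shifts with period one."""
    shifts = np.asarray(shifts, dtype=float)
    w = np.ones_like(shifts) if weights is None else np.asarray(weights, dtype=float)
    return np.sum(w * np.exp(2j * np.pi * order * shifts)) / np.sum(w)


def cluster_position(shifts, weights=None):
    """Position of the cluster point, from the phase of the first moment."""
    return np.angle(circular_moment(shifts, 1, weights)) / (2 * np.pi)


rng = np.random.default_rng(0)

# Unimodal cluster close to the wrap-around point +0.5 / -0.5.
raw = 0.45 + 0.03 * rng.standard_normal(1000)
unimodal = raw - np.round(raw)
position = cluster_position(unimodal)

# Two equally strong clusters at +0.25 and -0.25: the first moment nearly
# cancels, while the second moment stays large.
bimodal = np.concatenate([
    0.25 + 0.02 * rng.standard_normal(1000),
    -0.25 + 0.02 * rng.standard_normal(1000),
])
m1 = circular_moment(bimodal, 1)
m2 = circular_moment(bimodal, 2)
print(position, abs(m1), abs(m2))
```

The unimodal case recovers the cluster position even though part of the cluster wraps around the interval boundary. In the bimodal case the phase of the first moment is dominated by noise, which is the numerical instability referred to above.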
(29) To overcome this, higher circular moments are considered in addition to the first, and the relative shift between two profiles is determined via a weighted averaging. The mass axis is divided up into subintervals I.sub.k and a spectrum S is considered, which is interpolated by a continuous function {tilde over (S)}(m). The nth circular moment for the Kendrick profile in the interval I.sub.k is calculated with
(30)
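The equation itself is not reproduced in this text. As a hedged illustration only, the following sketch computes intensity-weighted circular moments per subinterval directly from the sampled spectrum. It assumes that the Kendrick shift is the signed distance of m divided by the scaling factor to the nearest integer, in which case the rounding step drops out of the complex exponential and the shift calculation and the moment calculation merge into one expression. A plain sum over the sampled points stands in for an integral over the interpolated spectrum. The function name `interval_moments` and the synthetic data are hypothetical.

```python
import numpy as np


def interval_moments(mass, intensity, scaling, edges, orders=(1, 2, 3)):
    """Intensity-weighted circular moments for each mass subinterval.

    exp(2*pi*i*n*shift) equals exp(2*pi*i*n*mass/scaling) because the
    integer part removed by rounding does not change the phase.
    """
    mass = np.asarray(mass, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    interval = np.digitize(mass, edges) - 1
    moments = np.zeros((len(edges) - 1, len(orders)), dtype=complex)
    for col, n in enumerate(orders):
        phasor = intensity * np.exp(2j * np.pi * n * mass / scaling)
        for k in range(len(edges) - 1):
            moments[k, col] = phasor[interval == k].sum()
    return moments


# Synthetic spectrum whose peaks all carry a Kendrick shift of about 0.1.
scaling = 1 + 4.95e-4
mass = np.arange(1000.0, 1020.0, 0.002)
centres = scaling * (np.arange(1001, 1020) + 0.1)
intensity = np.exp(-0.5 * ((mass[:, None] - centres[None, :]) / 0.02) ** 2).sum(axis=1)
edges = np.arange(1000.0, 1021.0, 5.0)

moments = interval_moments(mass, intensity, scaling, edges)
positions = np.angle(moments[:, 0]) / (2 * np.pi)
print(positions)
```

The phase of the first moment in each subinterval recovers the common shift of the synthetic peaks.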
(31) For an ensemble of N spectra S.sup.j (j=1 . . . N) with a common mass axis, the corresponding moments .sub.k,n.sup.j are calculated in this way, as are the average moments
(32)
which describe the reference profile. The exponent, a non-negative real number, describes the extent to which the absolute values of the individual moments enter into the weighting; typical values are 0, 1 and 2.
(33) For the jth spectrum, the relative mass shift in the subinterval I.sub.k is now calculated with respect to the reference profile with
(34)
where z* is called the complex conjugate of an arbitrary complex number z (here ).
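Since the equation is not reproduced in this text, the following is only a sketch of one way a relative shift could be obtained from several circular moments by weighted averaging. The specific form is an assumption: each order n contributes the phase of the product of the spectrum's moment with the complex conjugate of the reference moment, divided by 2πn, and is weighted by the absolute value of that product raised to an exponent. The names `relative_shift` and `alpha` are hypothetical placeholders.

```python
import numpy as np


def relative_shift(moments, reference, orders, alpha=1.0):
    """Weighted average of per-order phase differences (assumed form).

    Each order n determines the shift only modulo 1/n, so the estimate is
    meaningful for shifts smaller than 1/(2*max(orders)).
    """
    moments = np.asarray(moments, dtype=complex)
    reference = np.asarray(reference, dtype=complex)
    orders = np.asarray(orders, dtype=float)
    cross = moments * np.conj(reference)
    per_order = np.angle(cross) / (2 * np.pi * orders)
    weights = np.abs(cross) ** alpha
    return np.sum(weights * per_order) / np.sum(weights)


def profile_moments(shifts, orders):
    """Unweighted circular moments of a point cloud of Kendrick shifts."""
    return np.array([np.mean(np.exp(2j * np.pi * n * shifts)) for n in orders])


rng = np.random.default_rng(1)
orders = (1, 2, 3)

# Bimodal reference profile and a copy displaced by a known shift of 0.02.
reference_shifts = np.concatenate([
    -0.2 + 0.02 * rng.standard_normal(600),
    0.1 + 0.02 * rng.standard_normal(400),
])
displaced_shifts = reference_shifts + 0.02

estimate = relative_shift(
    profile_moments(displaced_shifts, orders),
    profile_moments(reference_shifts, orders),
    orders,
)
print(estimate)
```

With an exponent of zero all orders are weighted equally; larger exponents favor the orders whose moments are far from zero and therefore carry a reliable phase.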
(35) The individual shifts .sub.k.sup.j are assigned to the midpoints of the subintervals I.sub.k and interpolated over the whole mass axis (typically by means of linear interpolation). A shift vector is thus obtained for the transform of the mass axis of the jth spectrum.
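The interpolation step can be sketched with linear interpolation between the subinterval midpoints. The conversion of the interpolated Kendrick shift into a corrected mass axis shown at the end is an assumption: it presumes that the shift is expressed in units of mass divided by the scaling factor and that a positive shift means the measured mass is too high. The function name and all numerical values are hypothetical.

```python
import numpy as np


def shift_vector(mass_axis, edges, interval_shifts):
    """Linearly interpolate per-interval shifts over the whole mass axis.

    Shifts are assigned to the interval midpoints; outside the first and
    last midpoint, np.interp holds the nearest value constant.
    """
    edges = np.asarray(edges, dtype=float)
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    return np.interp(mass_axis, midpoints, interval_shifts)


scaling = 1 + 4.95e-4
edges = np.array([1000.0, 1010.0, 1020.0])
interval_shifts = np.array([0.01, 0.03])  # hypothetical per-interval results
mass_axis = np.array([1000.0, 1005.0, 1010.0, 1015.0, 1020.0])

shifts = shift_vector(mass_axis, edges, interval_shifts)

# Assumed sign and unit convention for applying the correction.
corrected_axis = mass_axis - scaling * shifts
print(shifts, corrected_axis)
```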
(36) For a more accurate normalization, or a correction, of absolute mass shifts, it is helpful to identify the distribution component associated with a given class of molecule in a Kendrick profile. In many cases, these are approximately linear structures which can be found with the aid of the Hough transform known from image processing, see U.S. Pat. No. 3,069,654. Known uses of Hough-type transforms in the field of mass spectrometry are confined to those which locate morphologies or textures in two-dimensional mass spectrometric images; see for example the publication US 2017/0221687 A1. Since the topology of the Kendrick profile corresponds to that of a cylinder, because its upper and lower edges are identified with each other, an appropriately adapted transform is used hereinafter.
(37) The conventional Hough transform operates on images which are defined in a planar, two-dimensional space. The Hough transform H of such an image is a mapping which indicates, for every point in a likewise two-dimensional parameter space, the intensity with which the straight line parameterized by that point can be found in the original image. Variants of the Hough transform are known for other parametrizable geometric figures, such as circles or ellipses.
(38) In the case considered, the empirical Kendrick profile, which is defined not in a two-dimensional space, but in a cylindrical space, replaces the original image. The linear objects sought in this space are parametrized by a gradient r and an offset d, and are described by equations of the type (m)=p (rm+d+). The cylindrical Hough transform for locating linear structures in a Kendrick profile is now defined as follows.
(39) Let an empirical Kendrick profile as described above be represented by a 2D histogram. For the ith histogram tile, m.sub.i and .sub.i are the interpolation points on the mass axis or Kendrick shift axis respectively, and p.sub.i the corresponding density value of the Kendrick profile. The cylindrical Hough transform H is then given by
(40)
(41) Here the gradient r describes the deviation from the scaling factor λ, and d the constant offset of the mass shift in the range [−½ . . . +½]. Appropriate discretizations are chosen for both quantities; a further parameter of the transform corresponds to the discretization width for d.
(42) If the Kendrick profile is described by means of a point cloud, i.e. as a sequence of Kendrick shifts .sub.i and corresponding masses m.sub.i, the cylindrical Hough transform is calculated in the same way by assuming that the p.sub.i are unity.
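The defining equation of the cylindrical Hough transform is not reproduced in this text, so the following is only a sketch of an accumulator with the properties described: for each candidate gradient r, every point votes with its density value for the offset d that would place it on the line, with the offset wrapped onto the circle of period one. The binning rule (nearest offset bin, bin width equal to the discretization width for d), the function name `cylindrical_hough` and the synthetic data are assumptions. The gradient is discretized between −10⁻³ and 10⁻³.

```python
import numpy as np


def cylindrical_hough(mass, shift, density, r_grid, d_grid):
    """Accumulate density values over (gradient, offset) pairs.

    d_grid must be uniformly spaced and cover one full period, e.g.
    np.arange(-0.5, 0.5, width); offsets wrap around at +/-0.5.
    """
    mass = np.asarray(mass, dtype=float)
    shift = np.asarray(shift, dtype=float)
    density = np.asarray(density, dtype=float)
    width = d_grid[1] - d_grid[0]
    hough = np.zeros((len(r_grid), len(d_grid)))
    for a, r in enumerate(r_grid):
        offset = shift - r * mass           # offset needed for each point
        offset -= np.round(offset)          # wrap onto [-0.5, 0.5]
        idx = np.round((offset - d_grid[0]) / width).astype(int) % len(d_grid)
        np.add.at(hough[a], idx, density)   # unbuffered in-place accumulation
    return hough


# Synthetic point cloud on a wrapped line with known gradient and offset.
r_true, d_true = 3e-4, 0.1
mass = np.linspace(1000.0, 3000.0, 400)
line = r_true * mass + d_true
shift = line - np.round(line)
density = np.ones_like(mass)                # point cloud: unit densities

r_grid = np.linspace(-1e-3, 1e-3, 41)
d_grid = np.arange(-0.5, 0.5, 0.01)
hough = cylindrical_hough(mass, shift, density, r_grid, d_grid)

a_hat, b_hat = np.unravel_index(np.argmax(hough), hough.shape)
r_hat, d_hat = r_grid[a_hat], d_grid[b_hat]
print(r_hat, d_hat, hough.max())
```

For a histogram representation, the tile centers and tile densities would be passed in place of the point cloud and the unit densities. The position of the largest accumulator value then gives the gradient and offset of the dominant linear structure.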
(43) Linear structures in the Kendrick profile can now be localized via local maxima in the Hough transform (see
(m)=({circumflex over (r)}m+{circumflex over (d)}+).
(44) The absolute maximum of the Hough transform can thus be assigned to the dominant class of molecules (see
(45)
(46) The invention has been described above with reference to different, specific example embodiments. It is to be understood, however, that various aspects or details of the embodiments described can be modified without deviating from the scope of the invention. In particular, features and measures disclosed in connection with different embodiments can be combined as desired if this appears feasible to a person skilled in the art. Moreover, the above description serves only as an illustration of the invention and not as a limitation of the scope of protection, which is exclusively defined by the appended claims, taking into account any equivalents which may possibly exist.