MASS SPECTROMETRIC METHOD FOR DETERMINING THE PRESENCE OR ABSENCE OF A CHEMICAL ELEMENT IN AN ANALYTE

20200243315 · 2020-07-30

    Abstract

    The present invention relates to a mass spectrometric method for determining (predicting) the presence or absence of a chemical element in an analyte, which provides valuable information for reducing the complexity of annotating a chemical formula to the analyte. The method is based on representing a measured isotopic pattern of an analyte as a feature vector and assigning the feature vector to a presence or absence class using a machine learning algorithm, such as a support vector machine (SVM) or an artificial neural network (NN).

    Claims

    1. A mass spectrometric method for determining the presence or absence of a chemical element in an analyte, the method comprising: (a) generating analyte ions from the analyte; (b) measuring an isotopic pattern of the analyte ions by mass spectrometry, wherein the isotopic pattern comprises multiple isotopic peaks each characterized by a mass value and an intensity value; (c) representing the isotopic pattern as a feature vector v; and (d) applying the feature vector v to a supervised element classifier that assigns the feature vector v to a first class (chemical element present) or to a second class (chemical element absent), wherein the supervised element classifier is trained on a set of feature vectors v_t which represent isotopic patterns of compounds with known elemental composition and wherein the chemical element is present in a proper subset of the compounds.

    2. The method according to claim 1, wherein each of the feature vector v and the feature vectors of the set v_t representing a corresponding isotopic pattern comprises mass values and normalized intensity values of the isotopic peaks of its respective isotopic pattern.

    3. The method according to claim 1, wherein each of the feature vector v and the feature vectors of the set v_t representing a corresponding isotopic pattern comprises a mass value of a monoisotopic peak, mass differences between the monoisotopic peak and other isotopic peaks and normalized intensity values of the isotopic peaks of its respective isotopic pattern.

    4. The method according to claim 3, wherein each of the feature vector v and the feature vectors of the set v_t further comprises a mass difference between the monoisotopic peak and a nominal mass.

    5. The method according to claim 4, wherein each of the feature vector v and the feature vectors of the set v_t is arranged as follows: [m_0, s_0, d(m_0, m_i), s_i, d(m_0, M_0)] with i = 1 … N, wherein m_0 is the mass value of the monoisotopic peak, s_0 is a normalized intensity value of the monoisotopic peak, d(m_0, m_i) is a mass difference between the monoisotopic peak and the ith isotopic peak, s_i is a normalized intensity value of the ith isotopic peak, and d(m_0, M_0) is a difference between the mass value of the monoisotopic peak and the nominal mass M_0.

    6. The method according to claim 2, wherein the normalized intensity values ŝ_i of a feature vector are calculated from the intensity values s_i of the corresponding isotopic peaks by using a p-norm:
    ŝ_i = s_i / s with s = (Σ_i |s_i|^p)^(1/p) and p ≥ 1.

    7. The method according to claim 1, wherein each of the feature vector v and the feature vectors of the set v_t representing a corresponding isotopic pattern comprises mass values and transformed intensity values of the isotopic peaks of its respective isotopic pattern.

    8. The method according to claim 7, wherein the intensity values of the isotopic peaks of said corresponding isotopic pattern are transformed by a centered log-ratio (clr) transformation or by an isometric log-ratio (ilr) transformation.

    9. The method according to claim 8, wherein each of the feature vector v and the feature vectors of the set v_t is arranged as follows: [m_0, clr_0, d(m_0, m_i), clr_i, d(m_0, M_0)] with i = 1 … N, wherein m_0 is the mass value of a monoisotopic peak, clr_0 is a clr-transformed intensity value of the monoisotopic peak, d(m_0, m_i) is a mass difference between the monoisotopic peak and the ith isotopic peak, clr_i is a clr-transformed intensity value of the ith isotopic peak, and d(m_0, M_0) is a difference between the mass value of the monoisotopic peak and a nominal mass, and wherein the clr transformation is defined by: clr_i = log(s_i / (s_0 · s_1 · … · s_N)^(1/(N+1))) with s_i, i = 0 … N, being the intensity values of the isotopic peaks.

    10. The method according to claim 8, wherein each of the feature vector v and the feature vectors of the set v_t is arranged as follows: [m_0, ilr_0, d(m_0, m_i), ilr_i, d(m_0, m_N), d(m_0, M_0)] with i = 1 … N−1, wherein m_0 is the mass value of a monoisotopic peak, ilr_i are the ilr-transformed intensity values of the isotopic peaks, d(m_0, m_i) is a mass difference between the monoisotopic peak and the ith isotopic peak, and d(m_0, M_0) is a difference between the mass value of the monoisotopic peak and a nominal mass, and wherein the ilr transformation is defined by: ilr = clr · B with ilr = (ilr_i), i = 0 … N−1, clr = (clr_i), i = 0 … N, and a balance matrix B of reduced dimension dim(B) = (N+1) × N with Bᵀ · B = I_N.

    11. The method according to claim 1, wherein the supervised element classifier is one of a support vector machine (SVM), an artificial neural network (NN) and a random forest (RF, random decision forest) classifier.

    12. The method according to claim 11, wherein the inherent parameters of the supervised element classifier (hyperparameters) are optimized during the training of the supervised element classifier.

    13. The method according to claim 1, wherein the representation of the isotopic pattern as a feature vector is optimized during the training of the supervised element classifier.

    14. The method according to claim 13, wherein a selection of features or estimation of feature importance is performed during the training of the supervised element classifier.

    15. The method according to claim 1, wherein the chemical element is one of Br, Cl, S, I, F, P, K, Na and Pt.

    16. The method according to claim 15, wherein, in step (d), the first class corresponds to the presence of two or more of the chemical elements and the second class corresponds to the absence of said two or more of the chemical elements, and wherein the supervised element classifier is trained on a set of feature vectors v_t which represent isotopic patterns of compounds with known elemental composition and wherein said two or more of the chemical elements are present in a proper subset of the compounds.

    17. The method according to claim 1, wherein the isotopic patterns of compounds used for training the supervised element classifier are theoretically derived.

    18. The method according to claim 1, wherein the isotopic patterns of compounds used for training the supervised element classifier are experimentally measured.

    19. The method according to claim 18, wherein the isotopic patterns of the compounds used for training the supervised element classifier and the isotopic pattern of the analyte ions are measured on the same mass spectrometric system.

    20. The method according to claim 1, wherein the determination of the presence or absence of the chemical element is used for reducing or enhancing the number of chemical elements considered during annotation of a chemical formula to the analyte.

    Description

    DESCRIPTION OF THE DRAWINGS

    [0036] FIG. 1 shows the number of chemical formulas within a mass tolerance of 5 mDa in the m/z range between 100 and 600 Da for three sets of chemical elements ({C, H, N, O}, {C, H, N, O, P, S, Na, K, Cl} and {C, H, N, O, P, S, Na, K, Cl, Br, F, I}).

    [0037] FIG. 2 shows a flow chart of a method according to the present invention.

    [0038] FIG. 3 shows the number of experimentally measured compounds (positive and negative) for the chemical elements of interest prepared in equal amounts to be used for training and validation. The data set is split 80%/20% for training and validation of the supervised element classifiers.

    [0039] FIG. 4 shows results for a soft-margin RBF-kernel SVM trained on the experimental data and optimized by particle swarm optimization. The measured intensity values of the isotopic patterns are normalized by p-norm with p=1 (closure). The results comprise the accuracy of correct classification, sensitivity, specificity and the complete confusion matrix.

    [0040] FIG. 5 shows results for a soft-margin RBF-kernel SVM trained on the experimental data and optimized by particle swarm optimization. The measured intensity values of the isotopic patterns are transformed by a centered log-ratio (clr) transformation. The results comprise the accuracy of correct classification, sensitivity, specificity and the complete confusion matrix.

    [0041] FIG. 6 shows results for a soft-margin RBF-kernel SVM trained on the experimental data and optimized by particle swarm optimization. The measured intensity values of the isotopic patterns are transformed by an isometric log-ratio (ilr) transformation. The results comprise the accuracy of correct classification, sensitivity, specificity and the complete confusion matrix.

    [0042] FIG. 7 shows a schematic of a dense, feed-forward neural network with biases. Numbers in the neurons depict the index of the neurons and do not represent their values.

    [0043] FIG. 8 shows results for a dense, feed-forward artificial neural network trained on the experimental data and optimized by an evolutionary algorithm. The measured intensity values of the isotopic patterns are normalized by p-norm with p=1 (closure). The results comprise accuracy of correct classification, sensitivity, specificity and the complete confusion matrix.

    [0044] FIG. 9 shows results for a dense, feed-forward artificial neural network trained on the experimental data and optimized by an evolutionary algorithm. The measured intensity values of the isotopic patterns are transformed by a centered-log ratio (clr) transformation. The results comprise accuracy of correct classification, sensitivity, specificity and the complete confusion matrix.

    [0045] FIG. 10 shows results for a dense, feed-forward artificial neural network trained on the experimental data and optimized by an evolutionary algorithm. The measured intensity values of the isotopic patterns are transformed by an isometric log-ratio (ilr) transformation. The results comprise accuracy of correct classification, sensitivity, specificity and the complete confusion matrix.

    DETAILED DESCRIPTION OF THE INVENTION

    [0046] While the invention has been shown and described with reference to a number of different embodiments thereof, it will be recognized by those skilled in the art that various changes in form and detail may be made herein without departing from the scope of the invention as defined by the appended claims.

    [0047] Elemental composition is at the core of the combinatorial problem of generating possible chemical formulas for a given m/z value. It is a goal of the present invention to predict, from the measured isotopic pattern of an analyte, which chemical elements the analyte contains, and thus to constrain the elemental composition used for a subsequent generation of possible chemical formulas. Including or excluding certain chemical elements reduces the number of possible chemical formulas to be calculated and compared. According to the present invention, machine learning using a supervised classifier provides a way to solve this problem.

    [0048] In addition to reducing the complexity of the annotation procedure, the method according to the present invention makes it possible to select and examine only certain isotopic patterns, and thus only compounds of interest, based on the presence of specific chemical elements.

    Definitions

    [0049] The term mass value is used here interchangeably for the mass-to-charge ratio (m/z value) of a molecular ion as well as for the molecular mass of the corresponding compound. The mass-to-charge ratio of a molecular ion can be converted to the molecular mass of the corresponding compound, e.g. by charge deconvolution.

    [0050] The nominal mass for a chemical element is the mass number of its most abundant naturally occurring stable isotope. For a molecular ion or molecule, the nominal mass is the sum of the nominal masses of the constituent atoms. For example, carbon has two stable isotopes 12C at 98.9% natural abundance and 13C at 1.1% natural abundance, thus the nominal mass of carbon is 12.

    [0051] The mass of the monoisotopic peak is the sum of the masses of the atoms in a molecule using the mass of the principal (most abundant) isotope for each chemical element. The difference between the nominal mass and the monoisotopic mass is termed mass defect.
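
    For illustration, these three quantities can be computed from a small isotope table, as in the Python sketch below. The table entries are standard isotope masses given for a few elements only; a real implementation would use a complete, authoritative isotope table. The example molecule (chloroform) is chosen purely for illustration.

        # Nominal mass, monoisotopic mass and mass defect of a molecule given
        # as an element-count dictionary. Each entry: (mass number of the most
        # abundant isotope, exact mass of that isotope in Da).
        ISOTOPES = {
            "C": (12, 12.000000),
            "H": (1, 1.007825),
            "N": (14, 14.003074),
            "O": (16, 15.994915),
            "Cl": (35, 34.968853),
        }

        def nominal_mass(formula):
            """Sum of the mass numbers of the most abundant isotopes."""
            return sum(ISOTOPES[el][0] * n for el, n in formula.items())

        def monoisotopic_mass(formula):
            """Sum of the exact masses of the most abundant isotopes."""
            return sum(ISOTOPES[el][1] * n for el, n in formula.items())

        def mass_defect(formula):
            """Difference between nominal and monoisotopic mass, as defined above."""
            return nominal_mass(formula) - monoisotopic_mass(formula)

        # Example: chloroform, CHCl3
        chcl3 = {"C": 1, "H": 1, "Cl": 3}
        print(nominal_mass(chcl3))       # 118
        print(monoisotopic_mass(chcl3))  # ~117.9144
        print(mass_defect(chcl3))        # ~0.0856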

    [0052] A confusion matrix is a table that allows visualization of the performance of a classifier, typically a supervised classifier. Each row of the confusion matrix represents the instances in a predicted class while each column represents the instances in an actual class:

    TABLE 1
                               Positive Condition     Negative Condition
      Positive Prediction      TP (True Positive)     FP (False Positive)    Positive predictive value: PPV = TP / (TP + FP)
      Negative Prediction      FN (False Negative)    TN (True Negative)     Negative predictive value: NPV = TN / (TN + FN)
                               Sensitivity =          Specificity =
                               TP / (TP + FN)         TN / (TN + FP)
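
    These four metrics are straightforward to compute from the raw counts; a short sketch follows, with purely hypothetical counts for one element classifier.

        def confusion_metrics(tp, fp, fn, tn):
            """PPV, NPV, sensitivity and specificity from confusion-matrix counts."""
            return {
                "PPV": tp / (tp + fp),          # positive predictive value
                "NPV": tn / (tn + fn),          # negative predictive value
                "sensitivity": tp / (tp + fn),  # true positive rate
                "specificity": tn / (tn + fp),  # true negative rate
            }

        # Hypothetical counts for one element classifier:
        print(confusion_metrics(tp=420, fp=30, fn=25, tn=425))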

    Support-Vector Machine (SVM):

    [0053] A support-vector machine (SVM) is a supervised machine learning method which can be used for classification. During training, an SVM constructs a hyperplane in the high-dimensional data space which separates labeled training data points with respect to their class labels. The parameters of the hyperplane are optimized such that the distance to the nearest training data points of any class (the so-called margin) is maximized. An important consequence of this geometric description is that the max-margin hyperplane is completely determined by those data points that lie nearest to it. These data points are called support vectors. After training, unlabeled data points are classified by determining on which side of the hyperplane they are located. Once trained properly, an SVM can assign unlabeled data points to a class quickly and with low computational effort.

    [0054] The SVM can be extended to cases in which the data are not linearly separable, for example by introducing a so-called soft margin. The soft margin allows some training data points to lie on the wrong side of the margin. An internal untrained parameter (hyperparameter) of the SVM determines the trade-off between increasing the margin and ensuring that all training data points lie on the correct side of it.

    [0055] The SVM can further be generalized by applying the so-called kernel trick, by which the data points of the input space are mapped into a transformed feature space. The transformation allows fitting a maximum-margin hyperplane in the transformed feature space. The transformation can be nonlinear, and the transformed feature space can be higher-dimensional than the input space. Although the classifier is based on a separating hyperplane in the transformed feature space, it may be nonlinear in the original input space. The kernel function can comprise additional hyperparameters (untrained, predetermined parameters). Common kernel functions include, for example, polynomials (homogeneous or inhomogeneous), radial basis functions (RBF) and hyperbolic tangent functions.
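
    A minimal soft-margin, RBF-kernel SVM can be set up, for example, with scikit-learn as sketched below. The parameter C controls the soft-margin trade-off and gamma parameterizes the RBF kernel; the data arrays are random placeholders standing in for feature vectors and present/absent labels, not the patent's data.

        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(200, 21))    # placeholder 21-component feature vectors
        y_train = rng.integers(0, 2, size=200)  # 1 = element present, 0 = absent

        # C: soft-margin trade-off; gamma: RBF kernel hyperparameter
        clf = SVC(kernel="rbf", C=1.0, gamma="scale")
        clf.fit(X_train, y_train)

        X_new = rng.normal(size=(5, 21))
        print(clf.predict(X_new))               # class assignment for unlabeled vectors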

    Artificial Neural Network (ANN)

    [0056] An artificial neural network (ANN) is a system inspired by biological neural networks. An ANN is generally based on a collection of connected nodes (artificial neurons). Each connection (edge) between artificial neurons, like the synapses in a biological neural network, can transmit a signal from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. The output of each artificial neuron is computed by some non-linear function (activation function) of the sum of its inputs. Artificial neurons may have a threshold such that the signal is sent only if the sum of the inputs is above that threshold.

    [0057] Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer), to the last layer (the output layer), possibly after traversing multiple hidden layers.

    [0058] The connections between artificial neurons typically have weights that are adjusted during training. The weight increases or decreases the strength of the signal at a connection. Numerous algorithms are available for training neural network models. Most of them can be viewed as an optimization employing some form of gradient descent and using backpropagation to compute the actual gradients.

    [0059] An artificial neural network generally comprises multiple hyperparameters, in particular more hyperparameters than an SVM. Hyperparameters of an artificial neural network can relate to the structure of the network itself, e.g. the number of hidden layers, the number of nodes, and the biases of nodes or layers, as well as to parameters of the activation function of the nodes and to a regularizing parameter which penalizes overly complex decision boundaries in case of overfitting.
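
    As an illustration of the structure just described, the following sketch performs a forward pass through a small dense feed-forward network with biases. The layer sizes and random weights are placeholders; in practice the weights would be learned by backpropagation.

        import numpy as np

        def relu(x):
            """Rectified linear unit, a common activation function."""
            return np.maximum(x, 0.0)

        def forward(x, layers):
            """Pass x through (weight, bias) pairs; hidden layers use ReLU,
            the output layer is left linear and can be thresholded for a class."""
            for W, b in layers[:-1]:
                x = relu(x @ W + b)
            W, b = layers[-1]
            return x @ W + b

        rng = np.random.default_rng(0)
        sizes = [21, 16, 8, 2]   # input layer, two hidden layers, two classes
        layers = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
                  for n_in, n_out in zip(sizes[:-1], sizes[1:])]

        x = rng.normal(size=(1, 21))   # one placeholder feature vector
        print(forward(x, layers))      # raw scores for present/absent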

    Example 1

    [0060] Here, the supervised element classifier is a support vector machine (SVM) using a soft margin and an RBF kernel. The hyperparameters are related to the soft margin and the RBF kernel, and are optimized during the training by particle swarm optimization. The isotopic patterns used for training and validating the SVM are experimentally measured.
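
    A minimal particle swarm optimization of the two hyperparameters could look as follows; the hyperparameters are searched in log10 space and cross-validated accuracy serves as the fitness. Swarm settings and search ranges are illustrative assumptions, and X_train/y_train are placeholders as in the earlier SVM sketch.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        def fitness(pos, X, y):
            """Cross-validated accuracy of an SVM with C, gamma = 10**pos."""
            C, gamma = 10.0 ** pos
            return cross_val_score(SVC(kernel="rbf", C=C, gamma=gamma), X, y, cv=3).mean()

        def pso(X, y, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
            rng = np.random.default_rng(seed)
            # particle positions: (log10 C, log10 gamma)
            pos = rng.uniform([-2.0, -4.0], [3.0, 1.0], size=(n_particles, 2))
            vel = np.zeros_like(pos)
            best_pos = pos.copy()
            best_fit = np.array([fitness(p, X, y) for p in pos])
            g_best = best_pos[best_fit.argmax()].copy()
            for _ in range(n_iter):
                r1, r2 = rng.random((2, n_particles, 2))
                # inertia + pull to personal best + pull to global best
                vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g_best - pos)
                pos = pos + vel
                fit = np.array([fitness(p, X, y) for p in pos])
                improved = fit > best_fit
                best_pos[improved], best_fit[improved] = pos[improved], fit[improved]
                g_best = best_pos[best_fit.argmax()].copy()
            return 10.0 ** g_best   # best (C, gamma) found

        # usage: C_opt, gamma_opt = pso(X_train, y_train)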

    [0061] The experimental data are obtained from measurements on an OTOF mass spectrometer with an electrospray source which is coupled to a liquid chromatograph. The compounds with known elemental composition belong to different compound classes: coffee metabolomics, synthetic molecules, pesticides and toxic substances.

    [0062] The element determination is applied only to compounds with a molecular mass below 600 Da. The training data set is balanced, with equal numbers of compounds containing an element (positive) and not containing that element (negative). The chemical elements of interest are: Br, Cl, S, I, F, P, K and Na. The elements C, H, N and O are almost always present and are therefore not part of the classification. The choice of elements of interest is based on their occurrence in the experimental data and in the vast majority of biomolecules. FIG. 3 shows the number of compounds (positive and negative) for the chemical elements of interest to be used for training and validation of the SVM. The data set is split into training (80%) and validation (20%) parts. The numbers of compounds used for validation are:

    Element        Na      K      P      S      F      Cl     Br     I
    Compounds      1204    384    68     1110   338    900    284    48
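
    For illustration, such a stratified 80/20 split could be implemented as sketched below; X and y are placeholder arrays, since the patent does not specify the software used.

        import numpy as np
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 21))     # placeholder feature vectors
        y = rng.integers(0, 2, size=1000)   # 1 = element present, 0 = absent

        # stratify keeps the positive/negative classes balanced in both parts
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=0)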

    [0063] The isotopic patterns are represented in three different ways: by a p-normalization with p = 1 (closure), by a centered log-ratio (clr) transformation and by an isometric log-ratio (ilr) transformation. For the closure and clr representations, the feature vectors are arranged as follows: [m_0, Int_0, m_i − m_0, Int_i, mDef] with i = 1 … 9, wherein m_0 and m_i are the mass values of the isotopic peaks, mDef is the mass defect and Int_0 and Int_i are the normalized or transformed intensity values calculated from the measured intensity values s_i. For the ilr representation, the feature vector does not comprise an Int_9 component. The length of the feature vectors is 21 (closure and clr) and 20 (ilr). The hyperparameters of the SVM are optimized separately for each representation.
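
    A compact Python sketch of the three intensity representations and of the closure/clr feature-vector layout follows. The balance matrix is a transposed Helmert matrix, one standard choice satisfying the orthonormality condition of claim 10; the example pattern is a placeholder, not measured data, and the sketch follows the closure/clr arrangement (the ilr arrangement additionally keeps the last mass difference).

        import numpy as np

        def closure(s):
            """p-normalization with p = 1: intensities scaled to sum to 1."""
            return s / np.abs(s).sum()

        def clr(s):
            """Centered log-ratio: log of each intensity over the geometric mean."""
            log_s = np.log(s)
            return log_s - log_s.mean()

        def balance_matrix(n):
            """(n, n-1) transposed Helmert matrix B whose columns are orthonormal
            and orthogonal to the constant vector, so that B^T B = I_(n-1)."""
            B = np.zeros((n, n - 1))
            for i in range(1, n):
                B[:i, i - 1] = 1.0 / np.sqrt(i * (i + 1))
                B[i, i - 1] = -i / np.sqrt(i * (i + 1))
            return B

        def ilr(s):
            """Isometric log-ratio: clr vector multiplied by the balance matrix."""
            return clr(s) @ balance_matrix(len(s))

        def feature_vector(m, s, nominal_mass, transform=clr):
            """[m_0, Int_0, m_i - m_0, Int_i, ..., mDef] for closure/clr."""
            intensities = transform(np.asarray(s, dtype=float))
            parts = [m[0], intensities[0]]
            for m_i, int_i in zip(m[1:], intensities[1:]):
                parts += [m_i - m[0], int_i]
            parts.append(m[0] - nominal_mass)   # mass-defect component d(m_0, M_0)
            return np.array(parts)

        # Placeholder pattern: monoisotopic peak plus three isotope peaks.
        m = [301.9860, 302.9891, 303.9832, 304.9859]
        s = [100.0, 12.5, 97.0, 11.9]
        print(feature_vector(m, s, nominal_mass=302, transform=closure))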

    [0064] FIGS. 4 to 6 show results for the soft-margin RBF-kernel SVM trained on the experimental data and optimized by particle swarm optimization. The results comprise the accuracy of correct classification, sensitivity, specificity and the complete confusion matrix. In FIG. 4, the measured intensity values of the isotopic patterns are normalized by p-norm with p=1 (closure). In FIG. 5, the measured intensity values of the isotopic patterns are transformed by a centered log-ratio (clr) transformation. In FIG. 6, the measured intensity values of the isotopic patterns are transformed by an isometric log-ratio (ilr) transformation.

    Example 2

    [0065] Here, the supervised element classifier is a dense, feed-forward artificial neural network (ANN) with biases, as shown in FIG. 7. In a dense network, each layer is fully connected to the following layer. The activation function of the ANN is a rectified linear unit:

    ReLU(x) = x if x > 0, and ReLU(x) = 0 if x ≤ 0.

    The predictions for the validation data set are made by a feed-forward pass through the ANN.

    [0066] The isotopic patterns used for training and validating the ANN are experimentally measured. The experimental data and the representation of the isotopic pattern are the same as in Example 1.

    [0067] During training, the feature vectors are submitted to the ANN in batches. A batch is a subset of all feature vectors used for training the ANN. Once a batch has been passed through the ANN, a back-propagation step takes place: the error of the current prediction is propagated back through the ANN, and the weights are updated in small steps along the negative gradient of the error. The weights are adjusted for a given set of hyperparameters.

    [0068] The hyperparameters of the ANN are a regularizing parameter, the number of hidden layers and the number of artificial neurons in the hidden layers. An evolutionary algorithm is used to optimize the hyperparameters of the ANN.
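
    A minimal sketch of such an evolutionary hyperparameter search is shown below, using scikit-learn's MLPClassifier as a stand-in for the dense feed-forward network. For brevity only the neuron counts and the regularizing parameter alpha are mutated (the layer count stays fixed); population size, mutation rules and parameter ranges are illustrative assumptions, not the settings used here.

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPClassifier

        def fitness(genome, X_tr, y_tr, X_va, y_va):
            """Validation accuracy of an MLP built from a (layers, alpha) genome."""
            layers, alpha = genome
            net = MLPClassifier(hidden_layer_sizes=layers, alpha=alpha,
                                activation="relu", max_iter=500)
            net.fit(X_tr, y_tr)
            return net.score(X_va, y_va)

        def mutate(genome, rng):
            """Perturb the neuron counts and the regularizing parameter alpha."""
            layers, alpha = genome
            layers = tuple(max(2, n + int(rng.integers(-4, 5))) for n in layers)
            return layers, float(alpha * 10.0 ** rng.normal(scale=0.3))

        def evolve(X, y, generations=5, pop_size=6, seed=0):
            rng = np.random.default_rng(seed)
            X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2,
                                                      random_state=seed)
            pop = [mutate(((16, 8), 1e-4), rng) for _ in range(pop_size)]
            for _ in range(generations):
                ranked = sorted(pop, key=lambda g: fitness(g, X_tr, y_tr, X_va, y_va),
                                reverse=True)
                parents = ranked[: pop_size // 2]                   # selection
                pop = parents + [mutate(g, rng) for g in parents]   # offspring
            return max(pop, key=lambda g: fitness(g, X_tr, y_tr, X_va, y_va))

        # usage with placeholder data: best_layers, best_alpha = evolve(X_train, y_train)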

    [0069] FIGS. 8 to 10 show results for the ANN. The results comprise accuracy of correct classification, sensitivity, specificity and the complete confusion matrix. In FIG. 8, the measured intensity values of the isotopic patterns are normalized by p-norm with p=1 (closure). In FIG. 9, the measured intensity values of the isotopic patterns are transformed by a centered-log ratio transformation (clr). In FIG. 10, the measured intensity values of the isotopic patterns are transformed by an isometric log-ratio transformation (ilr).

    [0070] The results of both examples show that the machine learning algorithms used achieve good results for predicting chemical elements from mass spectrometric signals. The SVM performs better than the ANN. The prediction for polyisotopic chemical elements is generally more accurate than the prediction for monoisotopic chemical elements.

    [0071] Considering the use case of reducing the number of chemical elements during the annotation of a chemical formula to a measured analyte, an element can be removed from consideration if the classifier predicts its absence. However, removing an element from consideration that is actually present in the underlying analyte must be prevented; otherwise a correct match cannot be found. For this use case, the negative predictive value (NPV) of a classifier is important: it is the fraction of negative predictions that are correct, NPV = TN / (TN + FN).

    [0072] The SVM classifier shows an NPV of 89-100% for the polyisotopic chemical elements. The NPV for the ANNs is generally worse.

    [0073] For the reverse use case of suggesting elements during the annotation of chemical formulas to a measured analyte, the positive predictive value (PPV) is important: it is the fraction of positive predictions that are correct, PPV = TP / (TP + FP). Suggesting a chemical element that is not part of the underlying analyte adds false-positive chemical formulas and increases the overall complexity. Therefore a classifier for this use case needs a high PPV.

    [0074] The SVM classifier shows a PPV of at least 89% for the polyisotopic chemical elements. The PPV for the ANNs is generally worse.

    [0075] The invention has been shown and described above with reference to a number of different embodiments thereof. It will be understood, however, by a person skilled in the art that various aspects or details of the invention may be changed, or various aspects or details of different embodiments may be arbitrarily combined, if practicable, without departing from the scope of the invention. Generally, the foregoing description is for the purpose of illustration only, and not for the purpose of limiting the invention which is defined solely by the appended claims, including any equivalent implementations, as the case may be.