Bioinformatics platform for high-throughput identification and quantification of O-glycopeptide

Abstract

The present invention relates to a bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide using high resolution mass spectrum. Particularly, according to the bioinformation processing analysis method of the present invention, the quantitative changes of O-linked glycopeptide containing non-informed sugar chains included in various samples can be efficiently and accurately analyzed; the prediction or diagnosis of disease including cancer can be made easy by using a high resolution mass spectrometer; or the investigation of O-linked glycopeptide structure of a therapeutic glycoprotein can be efficiently achieved.

Claims

1. A bioinformatic analysis method for the identification and quantification of O-linked glycopeptides comprising the following steps:
1) obtaining a mass spectrum by a high resolution mass spectrometer for a glycopeptide obtained by enzyme hydrolysis of a glycoprotein in a sample;
2) converting the mass spectrum results obtained in step 1) into MS and tandem (MS/MS) spectra;
3) calculating an M-score of each tandem spectrum by using oxonium ion peaks composed of m/z of an HCD-MS/MS individual spectrum peak converted in step 2) of 126.05, 129.06, 138.06, 144.06, 145.05, 147.07, 163.06, 168.07, 186.08, 204.08, 274.09, 292.10, 350.15, 366.14, 454.16, 528.19, and 657.24, wherein the M-score is calculated by mathematical formula 1 below:
[Mathematical Formula 1]
$M_{score} = \frac{n}{N} \times \frac{\sqrt{\sum_{i = 1}^{n} O_{i}}}{(n - 1)}$, where $O_{i} = \frac{I_{mi} \times I_{\max (\leq 700\,\mathrm{Da})}^{-1} \times C}{\left| \mathrm{MassError} \right| + 1.0}$
wherein, N is the number of confirmable oxonium ion peaks, n is the number of confirmed oxonium ion peaks, $I_{mi}$ is the matched i-th peak intensity, $I_{\max}$ is the intensity of the base peak in the spectrum, and C is a constant value;
4) determining a value for separating a glycopeptide and a polypeptide using Gaussian fitting method in the M-score distribution calculated in step 3), and selecting a glycopeptide spectrum;
5) selecting an O-linked glycopeptide spectrum by using O-linked and N-linked sorting factors (O/N sorting factors) from the glycopeptide spectrum selected in step 4);
6) obtaining an isotope distribution in MS of the O-linked glycopeptide spectrum selected in step 5) and then determining an O-linked glycopeptide existing in a database using an S-score calculated by comparing with the database, wherein the S-score is calculated by mathematical formula 2 below:
[Mathematical Formula 2]
$S_{score} = \left( \frac{1.0}{1.0 + \sum_{i = 1}^{n} (X1 - Y1)_{i}^{2}} \times C1 \right) + \left( \frac{n \left( \sum X2\,Y2 \right) - \left( \sum X2 \right) \left( \sum Y2 \right)}{\sqrt{\left( n \sum X2^{2} - \left( \sum X2 \right)^{2} \right) \times \left( n \sum Y2^{2} - \left( \sum Y2 \right)^{2} \right)}} \times C2 \right)$
wherein, X1 is the mass of the n-th peak among the theoretical isotope peaks, Y1 is the mass of the n-th peak among the experimental isotope peaks, X2 is the relative intensity of the n-th peak among the theoretical isotope peaks, Y2 is the relative intensity of the n-th peak among the experimental isotope peaks, C1 and C2 are constant values;
7) evaluating the O-linked glycopeptide existing in the database determined in step 6) by using a Y-score of the tandem spectrum, wherein the Y-score is calculated by mathematical formula 3 below:
[Mathematical Formula 3]
$Y_{score} = HCD_{match} \times C1 + CID_{match} \times C2$
$HCD_{match} = \sum_{i = 1}^{n} \frac{I_{mi}}{I_{\max}} / \sum_{i = 1}^{n} \frac{I_{si}}{I_{\max}} \times 100.0$
$CID_{match} = \sum_{i = 1}^{n} \frac{I_{mi}}{I_{\max}} / \sum_{i = 1}^{n} \frac{I_{si}}{I_{\max}} \times 100.0$
wherein, $I_{\max}$ is the intensity of the base peak in the spectrum, $I_{mi}$ is the matched i-th peak intensity, $I_{si}$ is the i-th peak intensity, and C1 and C2 are constant values;
8) determining an O-linked glycosylation site by calculating a P-score for the O-linked glycopeptide existing in the database evaluated in step 7) and then performing quantitative analysis of the O-linked glycopeptide included in the database evaluated above, wherein the P-score is calculated by mathematical formula 4 below:
[Mathematical Formula 4]
$P_{score} = \frac{n}{N} \times \sum_{i = 1}^{n} \frac{I_{mi}}{I_{\max}}$
wherein, N is the number of peptide fragments (c, z ion) of the confirmable glycopeptide, n is the number of peptide fragments of the confirmed glycopeptide, $I_{\max}$ is the intensity of the base peak in the spectrum, and $I_{mi}$ is the matched i-th peak intensity;
9) selecting the O-linked glycopeptide that does not exist in the database by using similarity calculated by mathematical formula 6 below with the spectrum of the O-linked glycopeptide (root/seed) existing in the database quantitatively analyzed in step 8) and a spectrum of a glycopeptide that does not exist in the database;
[Mathematical Formula 6]
$SS = \frac{\sum_{i = 1}^{n} S_{i} \times S_{i}^{\prime}}{\sqrt{\sum_{i = 1}^{n} S_{i}^{2} \times \sum_{i = 1}^{n} S_{i}^{\prime 2}}}$
wherein, SS is similarity of two different tandem mass spectrometry spectrum peaks and mass similarity, $S_{i}$ is an (x, y) matrix, x is the relative intensity of the n-th peak, y is the mass of the n-th peak, and $S_{i}^{\prime}$ is an (x′, y′) matrix, x′ is the relative intensity of the n-th peak, y′ is the mass of the n-th peak;
10) evaluating the O-linked glycopeptide that does not exist in the database by using a Y-score of the tandem spectrum obtained from the O-linked glycopeptide confirmed not to exist in the database selected in step 9); and
11) determining an O-linked glycosylation site by calculating a P-score for the O-linked glycopeptide that does not exist in the database evaluated in step 10) and then performing quantitative analysis of the O-linked glycopeptide that does not exist in the database evaluated above.

2. The bioinformatic analysis method for the identification and quantification of O-linked glycopeptides according to claim 1, wherein the tandem spectrum of step 3) is a CID or HCD-MS/MS spectrum.

3. The bioinformatic analysis method for the identification and quantification of O-linked glycopeptides according to claim 1.

4. The bioinformatic analysis method for the identification and quantification of O-linked glycopeptides according to claim 3, wherein when the S-score is calculated, similarity is measured using Pearson correlation analysis with Euclidean distance and intensity distribution using mass distribution of isotopes.

5. The bioinformatic analysis method for the identification and quantification of O-linked glycopeptides according to claim 1, wherein the database is constructed by using a theoretical isotope distribution of the glycopeptide from the glycoprotein in step 6) and then the database is used for the identification and quantification of O-linked glycopeptide.

6. The bioinformatic analysis method for the identification and quantification of O-linked glycopeptides according to claim 1, wherein the tandem spectrum of step 7) or step 10) is a CID or HCD-MS/MS spectrum.

7. The bioinformatic analysis method for the identification and quantification of O-linked glycopeptides according to claim 1, wherein the tandem spectrum of step 8) or step 11) is a CID or HCD-MS/MS spectrum.

8. The bioinformatic analysis method for the identification and quantification of O-linked glycopeptides according to claim 1, wherein the hydrolysis is performed with an enzyme selected from the group consisting of trypsin, Arg-C, Asp-N, Glu-C, Lys-C, chymotrypsin, and proteinase K.

9. The bioinformatic analysis method for the identification and quantification of O-linked glycopeptides according to claim 1, wherein the mass spectrometer has a mass resolution of more than 10,000 and a mass accuracy of less than 50 ppm.

10. The bioinformatic analysis method for the identification and quantification of O-linked glycopeptides according to claim 1, wherein the mass spectrometer is Orbitrap Fusion Lumos, Orbitrap Elite, or Q Exactive mass spectrometer.

11. The bioinformatic analysis method for the identification and quantification of O-linked glycopeptides according to claim 1, wherein the quantitative analysis in step 8) or step 11) is performed by using the S-score of MS spectra.

12. The bioinformatic analysis method for the identification and quantification of O-linked glycopeptides according to claim 1, wherein the quantitative analysis in step 8) or step 11) is performed by summing intensities of three peaks from the theoretical maximum intensity among the isotope peaks showing the intensity of the selected glycopeptide in MS spectra.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The application of the preferred embodiments of the present invention is best understood with reference to the accompanying drawings, wherein:

(2) FIG. 1 is a flowchart illustrating the bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide according to the present invention.

(3) FIG. 2 is a diagram illustrating the M-score distribution of a general peptide and a glycopeptide obtained in step 3) of the bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide of the present invention using a standard glycoprotein.

(4) FIG. 3 is a diagram illustrating the classification of O-linked and N-linked peptides from the glycopeptides selected in step 4) by using O-linked and N-linked sorting factors in step 5) of the bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide of the present invention using a standard glycoprotein.

(5) FIG. 4 is a diagram illustrating the Y-score distribution in step 7) of the bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide of the present invention using a standard glycoprotein.

(6) FIG. 5 is a diagram illustrating the O-family search method in step 10) of the bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide of the present invention using a standard glycoprotein.

(7) FIGS. 6a and 6b are diagrams illustrating that the spectrum similarity in the O-family search method in step 10) of the bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide of the present invention using a standard glycoprotein is 0.97.

(8) FIG. 7 is a diagram illustrating the Y-score distribution in step 11) of the bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide of the present invention using a standard glycoprotein.

(9) FIGS. 8a and 8b are diagrams illustrating the representative EThcD spectrum of T(HexNAc-Hex)PLPPT(HexNAc-Hex)SAHGNVAEGETKPDPDVTER, the O-linked glycopeptide, selected by the bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide of the present invention using a standard glycoprotein.

(10) FIGS. 9a and 9b are diagrams illustrating the representative EThcD spectrum of T(2HexNAc-2Hex)PLPPTSAHGNVAEGETKPDPDVTER, the O-linked glycopeptide, selected by the bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide of the present invention using a standard glycoprotein.

(11) FIG. 10a is a diagram illustrating the results of the bioinformation processing analysis for the identification and quantification of O-linked glycopeptide of the present invention repeated three times by using human serum samples in a preferred embodiment of the invention.

(12) FIG. 10b is a graph illustrating the results of the bioinformation processing analysis for the identification and quantification of O-linked glycopeptide of the present invention repeated three times by using human serum samples in a preferred embodiment of the invention.

(13) FIG. 11a is a Venn diagram illustrating the results of qualitative analysis of glycoprotein according to the bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide of the present invention using human serum samples in a preferred embodiment of the invention.

(14) FIG. 11b is a Venn diagram illustrating the results of qualitative analysis of O-linked glycopeptide according to the bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide of the present invention using human serum samples in a preferred embodiment of the invention.

(15) FIG. 12 is a diagram illustrating the heat map showing the results of quantitative analysis of O-linked glycopeptide according to the bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide of the present invention using human serum samples in a preferred embodiment of the invention.

(16) FIG. 13a is a diagram illustrating the representative HCD (high energy collision dissociation) spectrum of TPLPPTSAHGNVAEGETKPDPDVTER(HexNAc-Hex-Neu5Ac) (SEQ. ID. NO: 2), the O-linked glycopeptide selected based on the results of the bioinformation processing analysis for the identification and quantification of O-linked glycopeptide of the present invention using human serum samples in a preferred embodiment of the invention.

(17) FIG. 13b is a diagram illustrating the representative CID (collision-induced dissociation) spectrum of TPLPPTSAHGNVAEGETKPDPDVTER (HexNAc-Hex-Neu5Ac) (SEQ. ID. NO: 2), the O-linked glycopeptide selected based on the results of the bioinformation processing analysis for the identification and quantification of O-linked glycopeptide of the present invention using human serum samples in a preferred embodiment of the invention.

(18) FIG. 13c is a diagram illustrating the representative ETD (electron transfer dissociation) spectrum of TPLPPTSAHGNVAEGETKPDPDVTER (HexNAc-Hex-Neu5Ac) (SEQ. ID. NO: 2), the O-linked glycopeptide selected based on the results of the bioinformation processing analysis for the identification and quantification of O-linked glycopeptide of the present invention using human serum samples in a preferred embodiment of the invention.

(19) FIG. 13d is a diagram illustrating the representative HCD spectrum of T(HexNAc-Hex-Neu5Ac)PLPPTSAHGNVAEGETKPDPDVTER, the O-linked glycopeptide selected based on the results of the bioinformation processing analysis for the identification and quantification of O-linked glycopeptide of the present invention using a standard glycoprotein in a preferred embodiment of the invention.

(20) FIG. 13e is a diagram illustrating the representative CID spectrum of T(HexNAc-Hex-Neu5Ac)PLPPTSAHGNVAEGETKPDPDVTER, the O-linked glycopeptide selected based on the results of the bioinformation processing analysis for the identification and quantification of O-linked glycopeptide of the present invention using a standard glycoprotein in a preferred embodiment of the invention.

(21) FIG. 13f is a diagram illustrating the representative EThcD (electron-transfer/higher-energy collision dissociation) spectrum of T(HexNAc-Hex-Neu5Ac)PLPPTSAHGNVAEGETKPDPDVTER, the O-linked glycopeptide selected based on the results of the bioinformation processing analysis for the identification and quantification of O-linked glycopeptide of the present invention using a standard glycoprotein in a preferred embodiment of the invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

(22) Hereinafter, the present invention is described in detail.

(23) The present invention provides a bioinformation processing analysis method for the identification and quantification of O-linked glycopeptide comprising the following steps:
1) obtaining a mass spectrum by analyzing the polypeptide obtained by hydrolyzing the glycoprotein in a sample with a high resolution mass spectrometer;
2) converting the mass spectrum results obtained in step 1) into MS and tandem (MS/MS) spectra;
3) calculating the M-score of each tandem spectrum by using oxonium ion peaks composed of m/z of the HCD-MS/MS individual spectrum peak converted in step 2) of 126.05, 129.06, 138.06, 144.06, 145.05, 147.07, 163.06, 168.07, 186.08, 204.08, 274.09, 292.10, 350.15, 366.14, 454.16, 528.19, and 657.24;
4) determining a value for separating the glycopeptide and the peptide using the Gaussian fitting method in the M-score distribution calculated in step 3), and selecting the glycopeptide spectrum;
5) selecting the O-linked glycopeptide spectrum by using O-linked and N-linked sorting factors (O/N sorting factors) from the glycopeptide spectrum selected in step 4);
6) obtaining the isotope distribution in MS of the O-linked glycopeptide spectrum selected in step 5) and then determining the O-linked glycopeptide existing in the database using the S-score calculated by comparing with the database;
7) evaluating the O-linked glycopeptide existing in the database determined in step 6) by using the Y-score of the tandem spectrum;
8) determining the O-linked sugar site by calculating the P-score for the O-linked glycopeptide existing in the database evaluated in step 7) and then performing quantitative analysis of the O-linked glycopeptide included in the database evaluated above;
9) selecting the O-linked glycopeptide that does not exist in the database by using the O-family search method based on the O-linked glycopeptide (root/seed) existing in the database quantitatively analyzed in step 8);
10) evaluating the O-linked glycopeptide that does not exist in the database by using the Y-score of the tandem spectrum obtained from the O-linked glycopeptide confirmed not to exist in the database selected in step 9); and
11) determining the O-linked sugar site by calculating the P-score for the O-linked glycopeptide that does not exist in the database evaluated in step 10) and then performing quantitative analysis of the O-linked glycopeptide that does not exist in the database evaluated above.

(24) The term “hydrolysis” used herein refers to the process of separating only sugars from glycoproteins. The hydrolysis herein can be performed by any method which is well known to those in the art. In particular, the hydrolysis can be performed with a hydrolase specifically selected from the group consisting of trypsin, arginine-C (Arg-C), aspartic acid-N (Asp-N), glutamic acid-C (Glu-C), lysine-C (Lys-C), chymotrypsin, and proteinase K.

(25) The term “tandem spectrum (MS/MS)” used herein refers to the spectrum of the target ions of interest, or of the ions having a relatively high sensitivity, selected from the total mass spectrum (MS). Tandem mass analysis can be performed by analyzing the masses in the tandem spectrum. Based on the O-linked glycopeptides (root/seed) present in the database determined by the tandem spectrum, O-linked glycopeptides that do not exist in the database can be screened by performing the O-family search method. The tandem spectrum above can be a CID or HCD-MS/MS spectrum.

(26) According to the analysis method of the invention, a mass spectrometer can be used to perform qualitative and quantitative analysis of O-linked glycopeptides which are more complicated than general peptides and have high diversity and are present in a low concentration in a sample. From the results obtained by using the mass spectrometer, O-linked glycopeptide can be identified using M-score, S-score, Y-score, and P-score and quantitative analysis of the identified glycopeptide can be performed. The mass spectrometer may have a mass resolution of more than 10,000 and a mass accuracy of less than 50 ppm. Particularly, the mass spectrometer herein can be Orbitrap Fusion Lumos, Orbitrap Elite, or Q Exactive.

(27) The M-score above is used to classify the general peptide spectrum and the glycopeptide spectrum efficiently. The M-score can be calculated by mathematical formula 1 below:

(28) [Mathematical Formula 1]
$M_{score} = \frac{n}{N} \times \frac{\sqrt{\sum_{i = 1}^{n} O_{i}}}{(n - 1)}$, where $O_{i} = \frac{I_{mi} \times I_{\max (\leq 700\,\mathrm{Da})}^{-1} \times C}{\left| \mathrm{MassError} \right| + 1.0}$

(29) (N is the number of confirmable sugar peaks,

(30) n is the number of confirmed sugar peaks,

(31) $I_{mi}$ is the matched i-th peak intensity,

(32) $I_{\max}$ is the intensity of the base peak in the spectrum, and

(33) C is a constant value).
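
The M-score calculation of mathematical formula 1 can be sketched in Python as follows. This is only an illustrative sketch: the function name `m_score`, the matching tolerance `tol` (in Da), the value of the constant `c`, and the choice to express MassError in Da are assumptions that the text above does not specify.

```python
import math

# Oxonium ion m/z values listed in step 3) of the method.
OXONIUM_MZ = [126.05, 129.06, 138.06, 144.06, 145.05, 147.07, 163.06, 168.07,
              186.08, 204.08, 274.09, 292.10, 350.15, 366.14, 454.16, 528.19,
              657.24]


def m_score(peaks, tol=0.02, c=100.0):
    """Return the M-score of one HCD-MS/MS spectrum.

    peaks: list of (mz, intensity) tuples.
    tol:   matching tolerance in Da (assumed value).
    c:     the constant C of formula 1 (assumed value).
    """
    # Only peaks at or below 700 Da are used, as in I_max(<= 700 Da).
    low = [(mz, inten) for mz, inten in peaks if mz <= 700.0]
    if not low:
        return 0.0
    i_max = max(inten for _, inten in low)

    o_values = []
    for target in OXONIUM_MZ:
        hits = [(abs(mz - target), inten) for mz, inten in low
                if abs(mz - target) <= tol]
        if hits:
            mass_error, inten = min(hits)  # closest peak to the target m/z
            o_values.append((inten / i_max) * c / (mass_error + 1.0))

    n, big_n = len(o_values), len(OXONIUM_MZ)
    if n < 2:
        # Formula 1 divides by (n - 1); fewer than two matches is scored 0 here.
        return 0.0
    return (n / big_n) * math.sqrt(sum(o_values)) / (n - 1)
```

A spectrum without matched oxonium ions receives a score of 0 in this sketch, which is consistent with using the M-score to separate general peptide spectra from glycopeptide spectra.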

(34) An M-score distribution map can be prepared from the HCD-MS/MS spectra of the glycoprotein standard sample (hemopexin) according to the method above (FIG. 2). General peptides and glycopeptides can then be classified by automatically applying Gaussian fitting to this distribution map.

(35) In step 5) of the analysis method of the present invention, the O-linked glycopeptide spectra showing an O/N sorting factor of up to 4.0 can be determined from the glycopeptide spectra selected by using the M-score distribution map prepared in step 4) (FIG. 3). The O/N sorting factor can be calculated by mathematical formula 5 below:
[Mathematical Formula 5]
$\mathrm{O/N\ Sorting\ Factor} = \frac{O_{i}(138) + O_{i}(168)}{O_{i}(126) + I_{i}(144)}$
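
As a minimal sketch of how the O/N sorting factor might be applied, the following Python functions compute the ratio and compare it with the 4.0 cutoff mentioned above. The function names are hypothetical, the four inputs are assumed to be the values for the oxonium ions at m/z 138, 168, 126, and 144 on a common scale, and reading formula 5 as (138 + 168) divided by (126 + 144) is itself an assumption about the intended grouping.

```python
def on_sorting_factor(o138, o168, o126, o144):
    """O/N sorting factor: (138 + 168) over (126 + 144).

    The four arguments are the values for the oxonium ions at m/z 138,
    168, 126 and 144; using one common scale for all of them is an
    assumption of this sketch.
    """
    denominator = o126 + o144
    if denominator == 0:
        # No 126/144 signal: the ratio is unbounded, so report infinity.
        return float("inf")
    return (o138 + o168) / denominator


def is_o_linked(factor, cutoff=4.0):
    """Treat a spectrum as O-linked when the factor is at most the cutoff."""
    return factor <= cutoff
```

For example, values of 30 and 20 in the numerator and 15 and 10 in the denominator give a factor of 2.0, which falls within the range this sketch treats as O-linked.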

(36) In the analysis method of the present invention, the S-score is used to identify glycopeptides by comparing the isotope distribution obtained by MS of the glycopeptide spectrum in step 6) with the theoretical database. This value can be calculated by mathematical formula 2 below:

(37) [Mathematical Formula 2]
$S_{score} = \left( \frac{1.0}{1.0 + \sum_{i = 1}^{n} (X1 - Y1)_{i}^{2}} \times C1 \right) + \left( \frac{n \left( \sum X2\,Y2 \right) - \left( \sum X2 \right) \left( \sum Y2 \right)}{\sqrt{\left( n \sum X2^{2} - \left( \sum X2 \right)^{2} \right) \times \left( n \sum Y2^{2} - \left( \sum Y2 \right)^{2} \right)}} \times C2 \right)$

(38) (X1 is the mass of the n-th peak among the theoretical isotope peaks,

(39) Y1 is the mass of the n-th peak among the experimental isotope peaks,

(40) X2 is the relative intensity of the n-th peak among the theoretical isotope peaks,

(41) Y2 is the relative intensity of the n-th peak among the experimental isotope peaks,

(42) C1 and C2 are constant values).

(43) When the S-score is calculated, the similarity can be measured using the Euclidean distance for the mass distribution of the isotopes and Pearson correlation analysis for the intensity distribution. In step 6), a database is constructed by using the theoretical isotope distribution of the glycopeptides obtained from glycoproteins, and the database can then be used for the identification and quantitative analysis of O-linked glycopeptides. The term “isotope” used herein refers to atoms of a chemical element that have the same atomic number but different atomic masses.
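
The two terms of mathematical formula 2 can be illustrated with the short Python sketch below. The function name `s_score` and the weights `c1 = c2 = 50.0` are assumptions chosen only so that a perfect match scores 100; the text does not state the values of C1 and C2.

```python
import math


def s_score(theoretical, experimental, c1=50.0, c2=50.0):
    """S-score of an experimental isotope pattern against a theoretical one.

    Both arguments are equal-length lists of (mass, relative_intensity)
    tuples in the same isotope order. c1 and c2 are assumed weights.
    """
    n = len(theoretical)
    if n != len(experimental) or n < 2:
        raise ValueError("need two isotope patterns of equal length >= 2")

    # First term: squared Euclidean distance between the isotope masses.
    squared_distance = sum((x1 - y1) ** 2
                           for (x1, _), (y1, _) in zip(theoretical, experimental))
    mass_term = 1.0 / (1.0 + squared_distance)

    # Second term: Pearson correlation between the relative intensities.
    x2 = [inten for _, inten in theoretical]
    y2 = [inten for _, inten in experimental]
    sum_x, sum_y = sum(x2), sum(y2)
    sum_xy = sum(a * b for a, b in zip(x2, y2))
    sum_xx = sum(a * a for a in x2)
    sum_yy = sum(b * b for b in y2)
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    pearson = (n * sum_xy - sum_x * sum_y) / denominator if denominator > 0 else 0.0

    return mass_term * c1 + pearson * c2
```

With these assumed weights, an experimental pattern identical to the theoretical one scores 100, and both mass deviations and intensity mismatches lower the score.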

(44) In steps 7) and 10) of the analysis method of the present invention, the O-linked glycopeptide candidates, whether determined from the database or selected by the O-family search method, are evaluated using the Y-score (FIG. 4). The Y-score expresses the degree to which the theoretically expected glycopeptide fragmentation is observed in the tandem spectra (CID or HCD) (FIGS. 8 and 9). The Y-score can be calculated by mathematical formula 3 below as the weighted sum of $HCD_{match}$ and $CID_{match}$:

(45) [Mathematical Formula 3]
$Y_{score} = HCD_{match} \times C1 + CID_{match} \times C2$
$HCD_{match} = \sum_{i = 1}^{n} \frac{I_{mi}}{I_{\max}} / \sum_{i = 1}^{n} \frac{I_{si}}{I_{\max}} \times 100.0$
$CID_{match} = \sum_{i = 1}^{n} \frac{I_{mi}}{I_{\max}} / \sum_{i = 1}^{n} \frac{I_{si}}{I_{\max}} \times 100.0$

(46) ($I_{\max}$ is the intensity of the base peak in the spectrum,

(47) $I_{mi}$ is the matched i-th peak intensity,

(48) $I_{si}$ is the i-th peak intensity, and

(49) C1 and C2 are constant values).
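
A minimal Python sketch of mathematical formula 3 is given below. It assumes that the denominator sums over every peak of the spectrum, that theoretical fragment m/z values are supplied by the caller, and that C1 = C2 = 0.5; the names `match_percent`, `y_score`, and `tol` are hypothetical.

```python
def match_percent(spectrum, theoretical_mz, tol=0.02):
    """Percentage of base-peak-normalized intensity explained by fragments.

    spectrum:       list of (mz, intensity) tuples.
    theoretical_mz: expected fragment m/z values for the candidate.
    tol:            matching tolerance in Da (assumed value).
    """
    if not spectrum:
        return 0.0
    i_max = max(inten for _, inten in spectrum)
    total = sum(inten / i_max for _, inten in spectrum)
    matched = sum(inten / i_max for mz, inten in spectrum
                  if any(abs(mz - t) <= tol for t in theoretical_mz))
    return matched / total * 100.0


def y_score(hcd, cid, theo_hcd, theo_cid, c1=0.5, c2=0.5, tol=0.02):
    """Weighted sum of the HCD and CID match percentages (assumed weights)."""
    return (match_percent(hcd, theo_hcd, tol) * c1
            + match_percent(cid, theo_cid, tol) * c2)
```

With equal weights that sum to one, the Y-score of this sketch stays on a 0 to 100 scale.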

(50) In the analysis method of the present invention, the P-score in step 8) or step 11) is used to confirm and evaluate the degree to which the theoretically expected glycopeptide fragmentation is observed in the tandem spectrum (ETD/EThcD). The P-score can be used to specify the position of the glycosylation site, and can be calculated by mathematical formula 4 below:

(51) [Mathematical Formula 4]
$P_{score} = \frac{n}{N} \times \sum_{i = 1}^{n} \frac{I_{mi}}{I_{\max}}$

(52) (N is the number of peptide fragments (c, z ion) of the confirmable glycopeptide,

(53) n is the number of peptide fragments of the confirmed glycopeptide,

(54) $I_{\max}$ is the intensity of the base peak in the spectrum, and

(55) $I_{mi}$ is the matched i-th peak intensity).
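
The following Python sketch illustrates one way the P-score of mathematical formula 4 could be used to rank candidate glycosylation sites. It assumes that the theoretical c and z fragment m/z values for each candidate site have already been computed elsewhere; the names `p_score`, `best_site`, and `tol` are hypothetical.

```python
def p_score(spectrum, fragment_mz, tol=0.02):
    """P-score of one site candidate against an ETD/EThcD spectrum.

    spectrum:    list of (mz, intensity) tuples.
    fragment_mz: theoretical c/z fragment m/z values for the candidate
                 (N of formula 4 is the length of this list).
    tol:         matching tolerance in Da (assumed value).
    """
    if not spectrum or not fragment_mz:
        return 0.0
    i_max = max(inten for _, inten in spectrum)
    matched = []
    for target in fragment_mz:
        hits = [inten for mz, inten in spectrum if abs(mz - target) <= tol]
        if hits:
            matched.append(max(hits))  # most intense peak within tolerance
    return len(matched) / len(fragment_mz) * sum(i / i_max for i in matched)


def best_site(spectrum, candidates, tol=0.02):
    """Return the candidate site whose fragments give the highest P-score.

    candidates: dict mapping a site label to its theoretical fragment m/z list.
    """
    return max(candidates, key=lambda site: p_score(spectrum, candidates[site], tol))
```

In this sketch the site with the highest P-score is reported, which is one simple reading of using the P-score to specify the glycosylated position.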

(56) In the method above, the quantitative analysis in step 8) or step 11) can be performed by using S-score in MS spectra. This can be expressed by summing the intensities of the three peaks from the theoretical maximum intensity among the isotope peaks showing the intensity of the selected glycopeptide in MS spectra.
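
The three-peak summation can be sketched as follows. The wording above leaves some room for interpretation; this sketch assumes that the three isotope peaks with the highest theoretical intensity are chosen and that their observed intensities are summed. The function name `three_peak_intensity` is hypothetical.

```python
def three_peak_intensity(theoretical, observed):
    """Sum the observed intensities of the three theoretically strongest peaks.

    theoretical: theoretical isotope intensities in isotope order (M, M+1, ...).
    observed:    observed intensities in the same order.
    Selecting the top three by theoretical intensity is an assumption.
    """
    if len(theoretical) != len(observed):
        raise ValueError("isotope lists must have the same length")
    # Indices of the three peaks with the highest theoretical intensity.
    top = sorted(range(len(theoretical)),
                 key=lambda k: theoretical[k], reverse=True)[:3]
    return sum(observed[k] for k in top)
```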

(57) In the analysis method of the present invention, O-linked glycopeptides that are not present in the database can be selected by using O-family search method based on the O-linked glycopeptides (root/seed) present in the database determined using the tandem spectrum of step 9) (FIG. 5). At this time, the selection can be calculated by mathematical formula 6 below. The spectral similarity (SS) can be used to find similar spectra and the value may be greater than or equal to 0.9 (FIG. 6):

(58) [Mathematical Formula 6]
$SS = \frac{\sum_{i = 1}^{n} S_{i} \times S_{i}^{\prime}}{\sqrt{\sum_{i = 1}^{n} S_{i}^{2} \times \sum_{i = 1}^{n} S_{i}^{\prime 2}}}$

(59) (SS: the similarity of the peaks and masses of two different tandem mass spectra,

(60) $S_{i}$: (x, y) matrix, where x is the relative intensity of the n-th peak and y is the mass of the n-th peak, and

(61) $S_{i}^{\prime}$: (x′, y′) matrix, where x′ is the relative intensity of the n-th peak and y′ is the mass of the n-th peak).
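
Mathematical formula 6 has the form of a cosine similarity. The Python sketch below shows one way to compute it for two peak lists; aligning peaks by binning their m/z values, the bin width, and the function name `spectral_similarity` are assumptions, since the text does not describe how peaks of the two spectra are paired.

```python
import math


def spectral_similarity(spec_a, spec_b, bin_width=0.05):
    """Cosine similarity between two tandem spectra.

    spec_a, spec_b: lists of (mz, intensity) tuples.
    bin_width:      m/z bin width in Da used to pair peaks (assumed value).
    """
    def binned(spec):
        if not spec:
            return {}
        top = max(inten for _, inten in spec)
        vector = {}
        for mz, inten in spec:
            key = round(mz / bin_width)  # peaks in the same bin are paired
            vector[key] = vector.get(key, 0.0) + inten / top
        return vector

    a, b = binned(spec_a), binned(spec_b)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

A candidate spectrum could then be kept as a member of the O-family when its similarity to the root/seed spectrum is at least 0.9, the threshold stated above. Fixed-width binning can split peaks that lie near a bin edge, so a tolerance-based pairing may be preferable in practice.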

(62) Table 1 below presents the results of the analysis up to step 8), before the O-family search, performed by the present inventors using the HCD, CID, and EThcD (electron-transfer/higher-energy collision dissociation) spectra obtained from the glycopeptides of the standard glycoprotein sample (hemopexin) with an Orbitrap high resolution mass spectrometer.

(63) TABLE 1

| O-Glycopeptides | Charge | m/z | RT | MS scan | MS/MS scan (HCD-MS/MS) | O/N S. Factor |
|---|---|---|---|---|---|---|
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc) | 3 | 973.473 | 38.8 | 8993 | 8994 | 1.6 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc) | 4 | 730.357 | 35.7 | 7487 | 7488 | 2.0 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Neu5Ac) | 3 | 1070.505 | 42.4 | 10672 | 10674 | 2.1 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Neu5Ac) | 4 | 803.131 | 42.5 | 10737 | 10756 | 1.4 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex) | 3 | 1027.490 | 39.5 | 9299 | 9303 | 2.2 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex) | 4 | 770.870 | 39.1 | 9118 | 9128 | 2.1 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex-Neu5Ac) | 3 | 1124.521 | 43.3 | 11137 | 11138 | 2.3 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex-Neu5Ac) | 4 | 843.644 | 43.1 | 11038 | 11039 | 2.0 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-2Neu5Ac) | 3 | 1221.553 | 42.9 | 10935 | 10955 | 1.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-2Neu5Ac) | 4 | 916.419 | 47.9 | 13119 | 13133 | 2.0 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-Hex) | 4 | 821.643 | 33.6 | 6484 | 6486 | 2.3 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex) | 3 | 1149.201 | 34.1 | 6703 | 6704 | 2.2 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex) | 4 | 862.149 | 36.8 | 8011 | 8027 | 2.3 |

| O-Glycopeptides | M-score | S-score | Y-score | MS RT | Three TIQ |
|---|---|---|---|---|---|
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc) | 1.6 | 99.8 | 79.8 | 38.5 | 231629754 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc) | 2.0 | 100.0 | 75.8 | 38.4 | 114832679 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Neu5Ac) | 2.1 | 98.0 | 73.8 | 42.6 | 3196957 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Neu5Ac) | 1.4 | 99.6 | 74.3 | 42.5 | 3821353 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex) | 2.2 | 98.6 | 85.2 | 35.3 | 192574557 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex) | 2.1 | 98.7 | 75.4 | 35.3 | 2437221120 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex-Neu5Ac) | 2.3 | 98.7 | 84.1 | 38.5 | 2198194752 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex-Neu5Ac) | 2.0 | 99.8 | 76.2 | 38.5 | 15466424704 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-2Neu5Ac) | 1.7 | 99.8 | 82.6 | 42.6 | 118725937 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-2Neu5Ac) | 2.0 | 99.8 | 75.8 | 42.6 | 1600530448 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-Hex) | 2.3 | 99.7 | 74.4 | 33.8 | 5601061 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex) | 2.2 | 99.9 | 84.9 | 33.8 | 27053351 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex) | 2.3 | 98.6 | 76.4 | 33.8 | 431081292 |

(64) The O-linked glycopeptides existing in the database, qualitatively analyzed using the Y-score distribution map of hemopexin, the standard glycoprotein sample, were then quantitatively analyzed. The quantitative analysis was accomplished by summing the three-point TIQ values of each glycopeptide ion chromatogram. Each data point used to calculate the TIQ was the sum of the peak intensities of three isotopes based on the strongest peak.

(65) In the method of the present invention, O-linked glycopeptides that are not present in the database were selected by using the O-family search method of step 9) (FIGS. 5 and 6). The O-linked glycopeptides selected in this way were then identified and quantified using the Y-score distribution in the same manner as described above. The results are shown in Table 2.

(66) TABLE 2

| O-Glycopeptides | Charge | m/z | RT | MS scan | MS/MS scan (HCD-MS/MS) | O/N S. Factor |
|---|---|---|---|---|---|---|
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc) | 3 | 973.472 | 39.4 | 9255 | 9256 | 0.6 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex) | 3 | 1027.491 | 38.8 | 9007 | 9009 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex) | 4 | 770.870 | 43.6 | 11262 | 11264 | 0.6 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex-Neu5Ac) | 3 | 1124.522 | 36.8 | 8011 | 8028 | 1.6 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex-Neu5Ac) | 4 | 843.643 | 37.4 | 8295 | 8316 | 0.8 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex) | 3 | 1149.202 | 34.7 | 6983 | 6998 | 1.8 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex) | 4 | 862.153 | 37.0 | 8127 | 8129 | 0.5 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-2Fuc-Neu5Ac) | 4 | 1007.948 | 43.2 | 11083 | 11096 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-2Neu5Ac) | 3 | 1343.264 | 41.9 | 10457 | 10469 | 0.9 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-2Neu5Ac) | 4 | 1007.704 | 41.5 | 10272 | 10292 | 0.6 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-2Neu5Ac) | 5 | 806.362 | 40.5 | 9764 | 9778 | 0.6 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-3Neu5Ac) | 4 | 1080.476 | 45.4 | 12123 | 12124 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-3Neu5Ac) | 5 | 864.582 | 43.5 | 11212 | 11229 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-4Neu5Ac) | 4 | 1153.250 | 48.9 | 13435 | 13437 | 1.6 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-4Neu5Ac) | 5 | 922.801 | 47.5 | 12983 | 12987 | 0.9 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-Neu5Ac) | 3 | 1246.232 | 38.0 | 8616 | 8619 | 1.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-Neu5Ac) | 4 | 934.927 | 37.9 | 8589 | 8590 | 1.1 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-Neu5Ac) | 5 | 748.144 | 37.3 | 8250 | 8252 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-Hex-2Neu5Ac) | 4 | 967.186 | 44.3 | 11591 | 11599 | 0.6 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-Hex-Fuc) | 4 | 858.146 | 41.5 | 10248 | 10255 | 0.8 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-Hex-Fuc-Neu5Ac) | 4 | 930.922 | 44.2 | 11545 | 11560 | 0.9 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-Hex-Neu5Ac) | 3 | 1192.216 | 41.2 | 10110 | 10118 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-Hex-Neu5Ac) | 4 | 894.418 | 40.2 | 9653 | 9662 | 0.6 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-2Hex-2Neu5Ac) | 4 | 1058.470 | 43.5 | 11186 | 11209 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex) | 4 | 953.437 | 32.9 | 6146 | 6147 | 1.0 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-2Fuc-2Neu5Ac) | 3 | 1562.340 | 42.6 | 10816 | 10818 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-2Fuc-2Neu5Ac) | 4 | 1171.758 | 45.5 | 12123 | 12138 | 1.1 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-2Neu5Ac) | 3 | 1464.969 | 38.2 | 8715 | 8736 | 0.9 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-2Neu5Ac) | 4 | 1098.981 | 38.6 | 8894 | 8896 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-2Neu5Ac) | 5 | 879.389 | 38.2 | 8692 | 8706 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-3Neu5Ac) | 3 | 1562.008 | 43.2 | 11083 | 11097 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-3Neu5Ac) | 4 | 1171.759 | 43.7 | 11306 | 11314 | 0.8 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-3Neu5Ac) | 5 | 937.607 | 42.7 | 10840 | 10854 | 0.8 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-4Neu5Ac) | 4 | 1244.533 | 47.2 | 12842 | 12844 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-4Neu5Ac) | 5 | 995.829 | 46.8 | 12688 | 12689 | 0.9 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-5Neu5Ac) | 4 | 1317.307 | 51.8 | 14344 | 14350 | 2.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-Fuc-3Neu5Ac) | 4 | 1208.272 | 43.5 | 11212 | 11228 | 1.8 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-Fuc-3Neu5Ac) | 5 | 966.820 | 43.5 | 11212 | 11220 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-Neu5Ac) | 3 | 1367.944 | 35.1 | 7196 | 7215 | 1.6 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-Neu5Ac) | 4 | 1026.210 | 35.1 | 7196 | 7198 | 0.7 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(4HexNAc-4Hex-4Neu5Ac) | 4 | 1335.815 | 46.3 | 12473 | 12487 | 2.5 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(4HexNAc-4Hex-4Neu5Ac) | 5 | 1068.852 | 44.6 | 11755 | 11764 | 1.5 |

| O-Glycopeptides | M-score | S-score | Y-score | MS RT | Three TIQ |
|---|---|---|---|---|---|
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc) | 1.9 | N/A | 77.9 | 38.5 | 231629754 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex) | 2.2 | N/A | 82.4 | 35.3 | 192574557 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex) | 2.2 | N/A | 78.5 | 35.3 | 2437221120 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex-Neu5Ac) | 2.3 | N/A | 79.9 | 38.5 | 2198194752 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(HexNAc-Hex-Neu5Ac) | 2.2 | N/A | 74.6 | 38.5 | 15466424704 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex) | 2.2 | N/A | 87.2 | 33.8 | 27053351 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex) | 2.3 | N/A | 77.2 | 33.8 | 431081292 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-2Fuc-Neu5Ac) | 2.1 | N/A | 71.7 | 42.0 | 3954063568 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-2Neu5Ac) | 2.3 | N/A | 83.6 | 41.0 | 573427140 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-2Neu5Ac) | 2.1 | N/A | 77.2 | 41.0 | 6377947360 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-2Neu5Ac) | 1.9 | N/A | 72.4 | 40.9 | 192706316 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-3Neu5Ac) | 1.9 | N/A | 75.0 | 44.0 | 639218652 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-3Neu5Ac) | 1.9 | N/A | 73.3 | 44.0 | 56253145 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-4Neu5Ac) | 1.7 | N/A | 75.2 | 47.5 | 37355083 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-4Neu5Ac) | 1.6 | N/A | 72.4 | 47.5 | 7263807 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-Neu5Ac) | 2.4 | N/A | 85.6 | 37.1 | 50840529 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-Neu5Ac) | 2.4 | N/A | 77.3 | 37.1 | 772775072 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-2Hex-Neu5Ac) | 2.3 | N/A | 69.5 | 37.0 | 5568842 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-Hex-2Neu5Ac) | 2.2 | N/A | 75.5 | 43.9 | 5861588 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-Hex-Fuc) | 2.0 | N/A | 64.2 | 38.6 | 1828656880 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-Hex-Fuc-Neu5Ac) | 1.7 | N/A | 59.4 | 42.7 | 313889802 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-Hex-Neu5Ac) | 2.4 | N/A | 81.6 | 40.9 | 17610932 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(2HexNAc-Hex-Neu5Ac) | 2.4 | N/A | 77.3 | 40.9 | 49380856 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-2Hex-2Neu5Ac) | 2.4 | N/A | 78.5 | 42.8 | 10095801 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex) | 2.3 | N/A | 78.5 | 32.1 | 45935472 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-2Fuc-2Neu5Ac) | 2.2 | N/A | 72.9 | 45.3 | 368324 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-2Fuc-2Neu5Ac) | 2.3 | N/A | 62.6 | 45.3 | 10605150 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-2Neu5Ac) | 2.3 | N/A | 76.5 | 38.1 | 1514594 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-2Neu5Ac) | 2.3 | N/A | 77.6 | 38.4 | 192391058 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-2Neu5Ac) | 2.2 | N/A | 63.8 | 38.2 | 2444823 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-3Neu5Ac) | 2.3 | N/A | 77.0 | 43.2 | 22270832 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-3Neu5Ac) | 2.1 | N/A | 75.4 | 42.8 | 861390400 |
| TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-3Neu5Ac) | 1.7 | 
N/A 71.9 42.8 178344960 TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-4Neu5Ac) 1.9 N/A 72.7 47.2 37061631 TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-4Neu5Ac) 1.6 N/A 65.6 47.1 12279873 TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-5Neu5Ac) 2.3 N/A 61.0 52.0 1637785 TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-Fuc-3Neu5Ac) 2.1 N/A 69.0 43.3 4669822 TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-Fuc-3Neu5Ac) 2.5 N/A 73.0 43.4 639563 TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-Neu5Ac) 2.3 N/A 77.8 35.3 8242920 TPLPPTSAHGNVAEGETKPDPDVTER_(3HexNAc-3Hex-Neu5Ac) 2.2 N/A 77.7 35.2 109264811 TPLPPTSAHGNVAEGETKPDPDVTER_(4HexNAc-4Hex-4Neu5Ac) 2.3 N/A 61.5 44.9 8489955 TPLPPTSAHGNVAEGETKPDPDVTER_(4HexNAc-4Hex-4Neu5Ac) 2.0 N/A 65.0 44.8 4235244

(67) These results showed qualitative and quantitative distributions similar to those reported in the reference [Miloslav Sanda, et al., Increased sialylation of site specific O-glycoforms of hemopexin in liver disease. Clin Proteom, 2016, 13:24]. The analysis of hemopexin, the standard glycoprotein sample, also confirmed that TPLPPTSAHGNVAEGETKPDPDVTER(2HexNAc-2Hex) (SEQ. ID. NO: 1), an O-linked glycopeptide of a single molecular weight, could be identified in two forms in which the sugar chains are bound to different O-glycosylation sites: A) T(HexNAc-Hex)PLPPT(HexNAc-Hex)SAHGNVAEGETKPDPDVTER or B) T(2HexNAc-2Hex)PLPPTSAHGNVAEGETKPDPDVTER (FIGS. 8 and 9).
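The point that forms A and B share one molecular weight can be checked with a short calculation. The following is a minimal sketch, not part of the claimed method: it assumes standard monoisotopic residue masses (none of which are stated in this description), and the helper names `parse_composition`, `total_composition`, and `glycopeptide_mz` are hypothetical. It sums the sugar chains attached at each site and computes the precursor m/z from the total composition.

```python
import re
from collections import Counter

# Standard monoisotopic residue masses in Da (assumed values, not from this description).
AMINO_ACID = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "H": 137.05891,
    "R": 156.10111,
}
SUGAR = {"HexNAc": 203.07937, "Hex": 162.05282, "Fuc": 146.05791, "Neu5Ac": 291.09542}
WATER = 18.01056
PROTON = 1.007276

PEPTIDE = "TPLPPTSAHGNVAEGETKPDPDVTER"


def parse_composition(text):
    """Turn a label such as '2HexNAc-2Hex' into Counter({'HexNAc': 2, 'Hex': 2})."""
    counts = Counter()
    for part in text.split("-"):
        match = re.fullmatch(r"(\d*)(HexNAc|Hex|Fuc|Neu5Ac)", part)
        if match is None:
            raise ValueError(f"unrecognised sugar term: {part!r}")
        counts[match.group(2)] += int(match.group(1) or 1)
    return counts


def total_composition(site_glycans):
    """Add up the sugar chains attached at each glycosylation site."""
    total = Counter()
    for glycan in site_glycans:
        total += parse_composition(glycan)
    return total


def glycopeptide_mz(peptide, composition, charge):
    """Monoisotopic m/z of the protonated glycopeptide at the given charge."""
    peptide_mass = sum(AMINO_ACID[aa] for aa in peptide) + WATER
    glycan_mass = sum(SUGAR[name] * n for name, n in composition.items())
    return (peptide_mass + glycan_mass + charge * PROTON) / charge


# Form A carries HexNAc-Hex on two sites; form B carries 2HexNAc-2Hex on one site.
FORM_A = ["HexNAc-Hex", "HexNAc-Hex"]
FORM_B = ["2HexNAc-2Hex"]

mz_a = glycopeptide_mz(PEPTIDE, total_composition(FORM_A), 3)
mz_b = glycopeptide_mz(PEPTIDE, total_composition(FORM_B), 3)
print(round(mz_a, 3), round(mz_b, 3))
```

Because both forms reduce to the same total composition, their precursor m/z values are identical, so MS alone cannot separate them and the site assignment has to come from the tandem spectra. As a sanity check on the assumed masses, the same function applied to 3HexNAc-3Hex at charge 4 lands within about 0.01 of the 953.437 listed in the table above.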

(68) In conclusion, the method of the present invention is useful for specifically identifying O-linked glycopeptides in a standard glycoprotein sample and for quantifying them. The method can be applied broadly to studies involving glycoprotein analysis, including the analysis of biosimilars.

(69) Practical and presently preferred embodiments of the present invention are illustrative as shown in the following Examples.

(70) However, it will be appreciated that those skilled in the art, on consideration of this disclosure, may make modifications and improvements within the spirit and scope of the present invention.

Example 1: Preparation of Human Serum Sample

(71) The human serum sample was purchased from Sigma Aldrich. Trypsin was added to the serum sample, followed by hydrolysis overnight at 37° C. The hydrolyzed sample was then concentrated using a ZIC-HILIC column.

Example 2: Qualitative and Quantitative Analysis of O-Linked Glycopeptide

(72) LC/ESI-MS/MS analysis of the polypeptides in the sample prepared in Example 1 was performed with an Orbitrap Fusion Lumos (Orbitrap Fusion™), a high resolution mass spectrometer. The analysis was repeated three times to obtain reproducible results. The mass analysis result file (RAW) was converted into ms1 (MS) and ms2 (MS/MS) files using RAWConverter v1.1 (The Scripps Research Institute, USA), a freeware program. O-linked glycopeptides were identified and quantified by the bioinformation processing analysis method of the invention. The results are shown in FIGS. 10-13.
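To illustrate how the converted ms2 files might be read before scoring, the following is a minimal sketch that assumes the common MS2 text layout (H header lines, S scan lines, I and Z annotation lines, then one "m/z intensity" pair per line). The `Spectrum` class, the function names, the 0.02 tolerance, and the example lines are all hypothetical; the m/z 204.08 oxonium ion is one of the peaks listed in claim 1.

```python
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    scan: int
    precursor_mz: float
    charges: list = field(default_factory=list)
    peaks: list = field(default_factory=list)  # (m/z, intensity) pairs


def parse_ms2(lines):
    """Collect spectra from MS2-style text lines (assumed layout, see lead-in)."""
    spectra = []
    current = None
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue
        tag = parts[0]
        if tag == "S":
            # S <first scan> <last scan> <precursor m/z>
            current = Spectrum(scan=int(parts[1]), precursor_mz=float(parts[3]))
            spectra.append(current)
        elif tag == "Z" and current is not None:
            # Z <charge> <singly protonated mass>
            current.charges.append(int(parts[1]))
        elif tag in ("H", "I", "D"):
            continue  # header and annotation lines are skipped in this sketch
        elif current is not None:
            current.peaks.append((float(parts[0]), float(parts[1])))
    return spectra


def has_oxonium(spectrum, target=204.08, tolerance=0.02):
    """True if any fragment peak lies within the tolerance of the target m/z."""
    return any(abs(mz - target) <= tolerance for mz, _ in spectrum.peaks)


# Illustrative lines only; these are not measured data.
EXAMPLE = """H\tComment\tillustrative file
S\t6146\t6146\t953.437
I\tRetTime\t32.9
Z\t4\t3810.72
204.0867 1500.0
366.1395 820.0
500.2500 90.0
"""

spectra = parse_ms2(EXAMPLE.splitlines())
print(len(spectra), has_oxonium(spectra[0]))
```

A real pipeline would go on to compute the M-score from all listed oxonium ions rather than test a single peak; the single-peak check here only shows where the parsed peak list would be consumed.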

(73) FIG. 10 shows the results of the qualitative analysis of O-linked glycopeptides, repeated three times using the human serum sample (FIG. 10a), together with a graph of those results (FIG. 10b). FIG. 11 presents the glycoproteins (FIG. 11a) and O-linked glycopeptides (FIG. 11b) identified in the analysis as Venn diagrams. FIG. 12 presents the quantification of O-linked glycopeptides, repeated three times using the human serum sample, as a heatmap. In addition, as shown in FIG. 13, the representative O-linked glycopeptide of the human serum sample analyzed by the method of the present invention displayed HCD, CID, and ETD spectra similar to those of T(HexNAc-Hex-Neu5Ac)PLPPTSAHGNVAEGETKPDPDVTER, the representative O-linked glycopeptide of the standard protein sample.
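The Venn-diagram comparison of the three replicate analyses can be reproduced with plain set arithmetic. The following is a minimal sketch under stated assumptions: the function name `venn_regions` and the run labels are hypothetical, and the identifications listed per run are invented for illustration rather than taken from FIG. 11.

```python
from itertools import combinations


def venn_regions(runs):
    """Count identifications found in exactly each combination of runs."""
    names = list(runs)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Present in every run of the combination ...
            inside = set.intersection(*(runs[name] for name in combo))
            # ... and absent from every run outside it.
            outside = set().union(*(runs[name] for name in names if name not in combo))
            regions[combo] = len(inside - outside)
    return regions


# Hypothetical identifications per replicate (glycoform labels only).
runs = {
    "run1": {"HexNAc", "HexNAc-Hex", "HexNAc-Hex-Neu5Ac", "2HexNAc-2Hex"},
    "run2": {"HexNAc-Hex", "HexNAc-Hex-Neu5Ac", "2HexNAc-2Hex", "3HexNAc-3Hex"},
    "run3": {"HexNAc-Hex", "HexNAc-Hex-Neu5Ac", "3HexNAc-3Hex"},
}

regions = venn_regions(runs)
for combo, count in regions.items():
    print(" & ".join(combo), count)
```

The region counts add up to the size of the union of all runs, which is a convenient check that no identification has been counted twice.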

(74) Those skilled in the art will appreciate that the conceptions and specific embodiments disclosed in the foregoing description may be readily utilized as a basis for modifying or designing other embodiments for carrying out the same purposes of the present invention. Those skilled in the art will also appreciate that such equivalent embodiments do not depart from the spirit and scope of the invention as set forth in the appended Claims.

CPC classification

G16B50/00, G01N33/6848, G16B40/10, G01N2560/00, G16B20/00 (PHYSICS)

International classification

G01N33/68, G16B50/00 (PHYSICS)
