Method for high-throughput screening of non-target biomarkers based on metabolic perturbation caused by pollutants
11226318 · 2022-01-18
Cpc classification
H01J49/0036
ELECTRICITY
G01N30/7233
PHYSICS
G16B40/10
PHYSICS
G01N33/5308
PHYSICS
G16B5/00
PHYSICS
Abstract
Disclosed is a method for high-throughput screening of non-target biomarkers based on metabolic disturbance caused by pollutants, belonging to the field of environmental exposure and health. The method includes the following steps: (1) extracting to obtain extracts to be tested; (2) performing chromatographic analysis to obtain a spectrum containing chromatographic peaks; (3) identifying and labeling features of pollutants, taking chromatographic peaks other than the features of the pollutants as features of potential metabolites, and performing non-target labeling of the features of the potential metabolites; (4) establishing a linear regression model by taking the peak areas of the features of the potential metabolites as dependent variables and the peak areas of the features of the pollutants as independent variables; (5) operating the model, and performing non-target screening of the biomarkers to preliminarily obtain related biomarkers; (6) identifying the MS spectra and MS/MS spectra of the preliminarily obtained biomarkers, and identifying biomarkers related to pollutant exposure. The disclosed method improves the accuracy of biomarker screening and the throughput of biomarker screening.
Claims
1. A method for screening of non-target biomarkers based on a metabolic disturbance caused by pollutants, comprising the following steps: (1) performing sample treatment, and extracting pollutants and metabolites from biological samples to obtain extracts to be tested; (2) performing scan analysis and detection of the extracts to be tested through a high performance liquid chromatography—time-of-flight mass spectrometer to obtain a spectrum containing chromatographic peaks; (3) identifying and labeling features of pollutants according to the spectrum, taking chromatographic peaks other than the features of the pollutants as features of the potential metabolites, and performing non-target labeling of the features of the potential metabolites; (4) establishing a linear regression model by taking the peak areas of the features of the potential metabolites as dependent variables and the peak areas of the features of the pollutants as independent variables; (5) operating the model to perform non-target screening of the biomarkers, obtaining related biomarkers by preliminary screening; and (6) identifying the mass spectrometry (MS) spectra and Tandem mass spectrometry (MS/MS) spectra of the biomarkers obtained in step (5), and identifying biomarkers related to pollutant exposure.
2. The method for screening of non-target biomarkers based on a metabolic disturbance caused by pollutants according to claim 1, wherein the method further comprises step (7): using a correction method to correct the model, operating the corrected model, and repeating steps (5)-(6).
3. The method for screening of non-target biomarkers based on a metabolic disturbance caused by pollutants according to claim 2, wherein the correction method comprises false discovery rate (FDR) correction and interference factor correction; in the process of the FDR correction, threshold p<0.05 is corrected to be FDR<20%; and in the process of the interference factor correction, interference factors existing in samples are added to the model as covariates for correction.
4. The method for screening of non-target biomarkers based on a metabolic disturbance caused by pollutants according to claim 2, wherein when the extracts to-be-tested contain multiple pollutants, the correction method comprises a co-exposure correction method comprised of taking multiple pollutants as potential independent variables to perform multiple stepwise regression and model correction.
5. The method for screening of non-target biomarkers based on a metabolic disturbance caused by pollutants according to claim 4, wherein in the process of identifying the features of the pollutants, the spectrum is converted into a data file, and the peaks in the data file are imported into a Mass Spec data interrogation software and aligned for analysis to identify the pollutants.
6. The method for screening of non-target biomarkers based on a metabolic disturbance caused by pollutants according to claim 5, wherein in the process of non-target labeling of the features of the potential metabolites, the spectrum is converted into a backup file, the peaks in the backup file are imported into a software program for untargeted metabolomics and aligned, and the features with detection rate greater than 80% are retained as the features of the potential metabolites.
7. The method for screening of non-target biomarkers based on a metabolic disturbance caused by pollutants according to claim 4, wherein in step (4), the model with significance p<0.05 after operation is taken as an effective model to implement the operation process of step (5).
8. The method for screening of non-target biomarkers based on a metabolic disturbance caused by pollutants according to claim 7, wherein in step (6), a software program for untargeted metabolomics and an internet-based metabolite identification platform are combined to identify the biomarkers.
9. The method for screening of non-target biomarkers based on a metabolic disturbance caused by pollutants according to claim 8, wherein in step (2), the adopted detection conditions are as follows: high performance liquid chromatographic instrument: Infinity 1260; chromatographic column: C18 column: 2.1 mm×50 mm, 2.5 μm; column temperature: 40° C.; flow rate: 0.4 mL/min; mobile phase: phase A in positive ion mode: 0.1% formic acid-aqueous solution; phase A in negative ion mode: 2 mM ammonium acetate aqueous solution; and phase B: methanol; the gradient elution conditions are as follows:

Time (min)   A %   B %
1.00         95    5
11.00        75    25
19.00        50    50
25.00        25    75
29.00        0     100
32.00        0     100
32.01        95    5
36.00        100   0

full scan mode: data dependence mode; ion source: positive and negative electrospray ionization source; full scan mass range: MS 50-1250 Da, MS/MS 30-1000 Da; collision energy: ±40 eV; collision energy spread: 20 eV; ion source temperature: 550° C.
10. The method for screening of non-target biomarkers based on a metabolic disturbance caused by pollutants according to claim 2, wherein the method further comprises the step of metabolic pathway enrichment of the biomarkers, and in this step, the identified biomarkers are enriched into metabolic pathways to obtain metabolic pathways disturbed by pollutants.
11. The method for screening of non-target biomarkers based on a metabolic disturbance caused by pollutants according to claim 1, wherein the method further comprises the step of metabolic pathway enrichment of the biomarkers, and in this step, the identified biomarkers are enriched into metabolic pathways to obtain metabolic pathways disturbed by pollutants.
Description
DETAILED DESCRIPTION
(6) The present invention is further described below with reference to specific examples.
Example 1
(7) This example relates to a method for high-throughput screening of non-target biomarkers based on a metabolic disturbance caused by perfluorinated pollutants in the blood medium. The method includes the following steps:
(8) (1) 0.5 mL of serum sample is placed into a 15 mL centrifuge tube, 0.26-0.28 g of magnesium sulfate-sodium chloride mixture and 1.5 mL of acetonitrile are added, and the sample is swirled immediately. At this point, the sample is a suspension. The sample is extracted ultrasonically for 30 min and centrifuged, and the supernatant is transferred. The residue is extracted twice with 95% acetonitrile-aqueous solution, and the extracts are combined. The combined extracts are blown with nitrogen until nearly dry, transferred to a chromatographic sample bottle and made up to 100 μL with acetonitrile.
(9) (2) Instrument analysis: the extracted samples are subjected to full scan analysis and detection through a high-performance liquid chromatography—time-of-flight mass spectrometer. Parameters are as follows:
(10) high performance liquid chromatographic instrument: Infinity 1260;
(11) chromatographic column: C18 column: (2.1 mm×50 mm, 2.5 μm);
(12) column temperature: 40° C.;
(13) flow rate: 0.4 mL/min;
(14) mobile phase: 0.1% formic acid-aqueous solution (phase A in positive ion mode), 2 mM ammonium acetate aqueous solution (phase A in negative ion mode) and methanol (phase B).
(15) Table 2 shows the gradient elution conditions.
(16) TABLE 2 Gradient elution conditions

Time (min)   A %   B %
1.00         95    5
11.00        75    25
19.00        50    50
25.00        25    75
29.00        0     100
32.00        0     100
32.01        95    5
36.00        100   0
(17) mass spectrometer instrument: Triple TOF 4600;
(18) full scan mode: data dependence mode;
(19) ion source: positive and negative electrospray ionization source;
(20) full scan mass range: MS 50-1250 Da, MS/MS 30-1000 Da;
(21) collision energy: ±40 eV;
(22) collision energy spread: 20 eV;
(23) ion source temperature: 550° C.
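The gradient program of Table 2 can be read as a piecewise-linear schedule. The following is a minimal sketch, assuming linear ramps between the listed time points and a hold at the initial composition before 1.00 min; neither assumption is stated in the source, and the function name `percent_b` is hypothetical.

```python
from bisect import bisect_right

# (time in min, %B) breakpoints copied from Table 2; %A is 100 - %B.
GRADIENT = [
    (1.00, 5.0), (11.00, 25.0), (19.00, 50.0), (25.00, 75.0),
    (29.00, 100.0), (32.00, 100.0), (32.01, 5.0), (36.00, 0.0),
]

def percent_b(t_min: float) -> float:
    """Return %B at time t_min, assuming linear ramps between breakpoints."""
    times = [t for t, _ in GRADIENT]
    if t_min <= times[0]:
        # Assumption: the initial composition is held before the first breakpoint.
        return GRADIENT[0][1]
    if t_min >= times[-1]:
        return GRADIENT[-1][1]
    i = bisect_right(times, t_min)          # index of the first breakpoint after t_min
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

if __name__ == "__main__":
    for t in (0.0, 6.0, 22.0, 30.0):
        print(f"{t:5.2f} min: A = {100 - percent_b(t):5.1f} %, B = {percent_b(t):5.1f} %")
```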
(24) (3) Labeling and identification of features of pollutants: the spectrum obtained after instrumental analysis is converted into a WIFF file, an AB SCIEX Windows Interchange File Format data file, the peaks in the WIFF file are imported into PEAKVIEW® Software, a Mass Spec data interrogation software and aligned for analysis. The chromatogram of the sample is obtained, as shown in
(25) The pollutants of interest in this example are perfluorinated compounds, which are novel pollutants widely existing in the environment and organisms. Legacy perfluorocarboxylic acid and perfluorosulfonic acid are identified by matching retention time and mass spectrum fragments with those of standard samples. The structure of the novel perfluorinated substances without standard samples can be calculated by analyzing the fragments of mass spectra using Formula Finder function. Parameters are as follows:
(26) peak picking mass range: 50-1250 Da;
(27) peak picking mass error: 0.01 Da;
(28) alignment retention time error: 2 min;
(29) alignment mass error: 0.01 Da;
(30) identification mass error: MS 0.01 Da, MS/MS 0.005 Da.
(31) The perfluorinated pollutants identified by the above steps are shown in Table 3.
(32) TABLE 3 Identified perfluorinated pollutants

Name                             Abbreviation    Mass number (Da)   Retention time (min)
Perfluorooctanoate               PFOA            412.9664           19.94
Perfluorononanoate               PFNA            462.9632           21.35
Perfluorodecanoate               PFDA            512.9600           22.54
Perfluoroundecanoate             PFUnDA          562.9568           23.63
Perfluorohexanesulfonate         PFHxS           398.9366           18.09
Perfluoroheptanesulfonate        PFHpS           448.9334           20.30
Perfluorooctanesulfonate         PFOS            498.9302           21.75
6:2 chloro ether sulfonic acid   6:2 Cl-PFESA    530.8956           22.40
8:2 chloro ether sulfonic acid   8:2 Cl-PFESA    630.8892           24.07
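The mass and retention-time tolerances listed above can be illustrated with a simple lookup against the Table 3 values. This is only a sketch: the example itself relies on PEAKVIEW® and on matching mass spectrum fragments against standards, which is not reproduced here. The function name `match_pollutant` is hypothetical, and reusing the 2 min alignment retention time error as the matching window is an assumption.

```python
# Mass numbers (Da) and retention times (min) copied from Table 3.
PFAS = {
    "PFOA": (412.9664, 19.94),
    "PFNA": (462.9632, 21.35),
    "PFDA": (512.9600, 22.54),
    "PFUnDA": (562.9568, 23.63),
    "PFHxS": (398.9366, 18.09),
    "PFHpS": (448.9334, 20.30),
    "PFOS": (498.9302, 21.75),
    "6:2 Cl-PFESA": (530.8956, 22.40),
    "8:2 Cl-PFESA": (630.8892, 24.07),
}

MASS_TOL_DA = 0.01   # identification mass error for MS, from the parameter list
RT_TOL_MIN = 2.0     # assumption: reuse the alignment retention time error

def match_pollutant(mz: float, rt_min: float) -> list[str]:
    """Return abbreviations whose mass and retention time both fall within tolerance."""
    return [
        name
        for name, (mass, rt) in PFAS.items()
        if abs(mz - mass) <= MASS_TOL_DA and abs(rt_min - rt) <= RT_TOL_MIN
    ]

if __name__ == "__main__":
    print(match_pollutant(412.9660, 19.90))  # candidate peak close to PFOA
    print(match_pollutant(300.0000, 10.00))  # no Table 3 compound in range
```

A peak that passes this filter would still need fragment-level confirmation, as the example describes.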
(33) (4) Non-target labeling of features of metabolites: the spectra obtained after instrumental analysis are converted into ABF (Analysis Base File) format, the input file format read by MSDIAL. The peaks in the ABF file are imported into MSDIAL software, a universal software program for untargeted metabolomics that supports multiple MS instruments and MS vendors, and aligned. Features with a detection rate greater than 80% are retained, and the peak area and mass spectrum corresponding to each peak are listed in a chromatographic peak table. Parameters are as follows:
(34) peak picking mass range: 30-1250 Da;
(35) peak picking mass error: 0.01 Da;
(36) alignment retention time error: 0.5 min;
(37) alignment mass error: 0.015 Da.
(38) After 84 samples were analyzed, 3798 metabolites were finally labeled, and a sample metabolite matrix of 84×3798 was obtained.
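The 80% detection-rate filter that yields this matrix can be sketched as follows. The data layout (a dictionary of feature identifiers mapped to per-sample peak areas, with `None` or 0 meaning "not detected") and the function name are assumptions made for illustration; in the example the filtering is done within MSDIAL.

```python
from typing import Optional

def filter_by_detection_rate(
    peak_table: dict[str, list[Optional[float]]],
    min_rate: float = 0.8,
) -> dict[str, list[Optional[float]]]:
    """Keep features detected in strictly more than min_rate of the samples."""
    kept = {}
    for feature_id, areas in peak_table.items():
        detected = sum(1 for a in areas if a is not None and a > 0)
        if detected / len(areas) > min_rate:   # "greater than 80%" is a strict bound
            kept[feature_id] = areas
    return kept

if __name__ == "__main__":
    # Toy table with 5 samples per feature (values are made up).
    table = {
        "f1": [10.0, 12.0, 9.0, 11.0, 10.5],   # 5/5 detected: kept
        "f2": [10.0, 12.0, None, 11.0, 10.5],  # 4/5 = 80%: not greater than 80%, dropped
        "f3": [None, 0.0, 9.0, None, 10.5],    # 2/5 detected: dropped
    }
    print(list(filter_by_detection_rate(table)))
```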
(39) (5) Non-target screening of biomarkers: a linear regression model is established in SPSS® software, a software package used for interactive or batched statistical analysis. The dependent variables are the peak areas of the features of the potential metabolites subjected to non-target labeling in (4), and the independent variables are the peak areas of the features of the perfluorinated pollutants identified in (3), taken one pollutant at a time. A model with significance p<0.05 after model operation is regarded as an effective model, which gives the number of biomarkers related to exposure to each perfluorinated compound. Table 4 shows the number of biomarkers corresponding to the nine perfluorinated compounds obtained in this step.
(40) TABLE 4 Number of biomarkers corresponding to the nine perfluorinated compounds

Name                             Abbreviation    Biomarker quantity
Perfluorooctanoate               PFOA            1583
Perfluorononanoate               PFNA            506
Perfluorodecanoate               PFDA            664
Perfluoroundecanoate             PFUnDA          427
Perfluorohexanesulfonate         PFHxS           1639
Perfluoroheptanesulfonate        PFHpS           381
Perfluorooctanesulfonate         PFOS            140
6:2 chloro ether sulfonic acid   6:2 Cl-PFESA    53
8:2 chloro ether sulfonic acid   8:2 Cl-PFESA    76
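The screening rule of step (5) amounts to fitting one simple linear regression per metabolite feature and keeping features whose slope is significant at p<0.05. The example uses SPSS®; the following standard-library sketch is a stand-in, not that software's procedure. All function names are hypothetical, the p-value is obtained by numerically integrating the Student's t density, and the toy data are made up.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value: integrate the t density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def slope_and_p(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares of y on x; returns the slope and its two-sided p-value."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    if sse == 0.0:                     # perfect fit: treat as maximally significant
        return slope, 0.0
    std_err = math.sqrt(sse / df / sxx)
    return slope, two_sided_p(slope / std_err, df)

def screen(pollutant_areas: list[float],
           metabolite_table: dict[str, list[float]],
           alpha: float = 0.05) -> list[str]:
    """Return metabolite features whose regression on the pollutant has p < alpha."""
    return [fid for fid, areas in metabolite_table.items()
            if slope_and_p(pollutant_areas, areas)[1] < alpha]

if __name__ == "__main__":
    pollutant = [1.0, 2.0, 3.0, 4.0, 5.0]
    metabolites = {"m1": [2.1, 3.9, 6.2, 7.8, 10.1],   # tracks the pollutant
                   "m2": [1.0, 3.0, 2.0, 3.0, 1.0]}    # no linear trend
    print(screen(pollutant, metabolites))
```

Run once per pollutant, such a loop would produce counts of the kind reported in Table 4.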
(41) (6) High-throughput identification of biomarkers: the MS spectra and MS/MS spectra of the biomarkers obtained in step (5) are subjected to programmed identification by multi-platform combination. Firstly, MSP files (text-format mass spectral library files) in positive and negative ion modes are loaded separately in MSDIAL software, a universal software program for untargeted metabolomics that supports multiple MS instruments and MS vendors, for metabolite library alignment. Unmatched metabolic characteristic peaks are then uploaded for further identification to the MetDNA platform, an internet-based platform with URL http://metdna.zhulab.cn/ that implements a metabolic reaction network (MRN) based recursive algorithm for metabolite identification and supports data from different LC systems and MS platforms. Identification results are classified by confidence level according to the standard recommended by the Metabolomics Standards Initiative (MSI): the metabolites identified by MSDIAL are at level 2; the “seed” metabolites identified by MetDNA are at level 2, and other metabolites identified by MetDNA are at level 3. MSDIAL parameters are as follows:
(42) mass error: MS 0.01 Da, MS/MS 0.05 Da.
(43) threshold of score: 80 points.
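The confidence-level rule stated in step (6) is a small lookup, sketched below. The function name and its string arguments are hypothetical; only the mapping itself (MSDIAL at level 2, MetDNA "seed" metabolites at level 2, other MetDNA metabolites at level 3) comes from the description.

```python
def msi_level(platform: str, is_seed: bool = False) -> int:
    """Map an identification to the MSI confidence level used in step (6)."""
    if platform == "MSDIAL":
        return 2                      # library match in MSDIAL
    if platform == "MetDNA":
        return 2 if is_seed else 3    # "seed" metabolites vs. other MetDNA annotations
    raise ValueError(f"unknown identification platform: {platform!r}")

if __name__ == "__main__":
    print(msi_level("MSDIAL"))
    print(msi_level("MetDNA", is_seed=True))
    print(msi_level("MetDNA"))
```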
(44) Table 5 shows the number of biomarkers corresponding to nine perfluorinated compounds identified in this step.
(45) TABLE 5 Number of biomarkers corresponding to the nine identified perfluorinated compounds

Name                             Abbreviation    Biomarker quantity
Perfluorooctanoate               PFOA            235
Perfluorononanoate               PFNA            63
Perfluorodecanoate               PFDA            99
Perfluoroundecanoate             PFUnDA          60
Perfluorohexanesulfonate         PFHxS           243
Perfluoroheptanesulfonate        PFHpS           55
Perfluorooctane sulfonate        PFOS            17
6:2 chloro ether sulfonic acid   6:2 Cl-PFESA    9
8:2 chloro ether sulfonic acid   8:2 Cl-PFESA    21
(46) (7) Correction of a biomarker non-target screening model: false positive results are reduced through multiple corrections, which are as follows, respectively:
(47) false discovery rate (FDR) correction: R software is used to correct the threshold p<0.05 to FDR<20% through a qvalue command;
(48) interference factor correction: interference factors (age, weight, residence) existing in samples are added to the regression model as covariates for correction; and
(49) co-exposure correction: when the model of one perfluorinated compound is analyzed, a multiple regression model is operated by taking the biomarker as the dependent variable and the 8 other perfluorinated compounds as candidate independent variables, and a “stepwise” method is selected to retain the significant perfluorinated compounds as independent variables and delete the non-significant perfluorinated compounds. After operation of the model, a regression model with the following three components is obtained: 1) dependent variable: the biomarker; 2) independent variable 1: the specific perfluorinated compound analyzed; 3) independent variable 2: the significant perfluorinated compounds among the 8 other perfluorinated compounds. In this case, the significance of item 2) (independent variable 1: the perfluorinated compound analyzed) is the significance p value after correction, and a metabolite whose p value is still less than 0.05 is taken as a final biomarker corresponding to this perfluorinated compound. The high-throughput identification of biomarkers is then implemented as in step (6).
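The FDR<20% threshold can be illustrated with a short sketch. The example uses the qvalue command in R; the code below instead applies the Benjamini-Hochberg adjustment, a simpler related procedure that is swapped in here for illustration and is not the estimator used by qvalue. The function name and the p-values are made up.

```python
def bh_adjust(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    p_values = [0.01, 0.04, 0.03, 0.5]          # hypothetical per-metabolite p-values
    adj = bh_adjust(p_values)
    selected = [i for i, q in enumerate(adj) if q < 0.20]   # FDR < 20% threshold
    print(adj)
    print(selected)
```

With a raw threshold of p<0.05 the same toy input would also select the first three features; the two rules diverge as the number of tested features grows, which is the situation in a 3798-feature screen.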
(50) Table 6 shows the number of biomarkers corresponding to the nine identified perfluorinated compounds by using the corrected model.
(51) TABLE 6 Number of identified biomarkers corresponding to the nine perfluorinated compounds after correction

Name                             Abbreviation    Biomarker quantity
Perfluorooctanoate               PFOA            64
Perfluorononanoate               PFNA            21
Perfluorodecanoate               PFDA            33
Perfluoroundecanoate             PFUnDA          14
Perfluorohexanesulfonate         PFHxS           147
Perfluoroheptanesulfonate        PFHpS           3
Perfluorooctane sulfonate        PFOS            0
6:2 chloro ether sulfonic acid   6:2 Cl-PFESA    1
8:2 chloro ether sulfonic acid   8:2 Cl-PFESA    1
(52) (8) Metabolic pathway enrichment of the biomarkers: the biomarkers are enriched into metabolic pathways. The metabolic pathways disturbed by perfluorinated compounds include steroid hormone biosynthesis, arachidonic acid metabolism, α-linolenic acid metabolism, linoleic acid metabolism and retinol metabolism.
Comparative Example 1
(53) The procedures of this comparative example are basically the same as those of Example 1, except that in step (7), only FDR correction and interference factor correction are adopted, and the model is not corrected by co-exposure correction. Table 7 shows the number of biomarkers corresponding to the nine perfluorinated compounds finally identified.
(54) TABLE 7 Number of identified biomarkers corresponding to the nine perfluorinated compounds

Name                             Abbreviation    Biomarker quantity
Perfluorooctanoate               PFOA            190
Perfluorononanoate               PFNA            27
Perfluorodecanoate               PFDA            48
Perfluoroundecanoate             PFUnDA          14
Perfluorohexanesulfonate         PFHxS           218
Perfluoroheptanesulfonate        PFHpS           3
Perfluorooctane sulfonate        PFOS            0
6:2 chloro ether sulfonic acid   6:2 Cl-PFESA    1
8:2 chloro ether sulfonic acid   8:2 Cl-PFESA    1
(55) Comparison with the results obtained in Example 1 shows that correcting the model without co-exposure correction increases the number of false positives in the results to a certain extent.
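The size of this effect can be read directly from Tables 6 and 7. The short calculation below uses only the counts from those two tables; treating the difference as the number of biomarkers removed by co-exposure correction follows the interpretation given in this comparative example.

```python
# Biomarker counts copied from Table 6 (with co-exposure correction)
# and Table 7 (without co-exposure correction).
WITH_COEXPOSURE = {
    "PFOA": 64, "PFNA": 21, "PFDA": 33, "PFUnDA": 14, "PFHxS": 147,
    "PFHpS": 3, "PFOS": 0, "6:2 Cl-PFESA": 1, "8:2 Cl-PFESA": 1,
}
WITHOUT_COEXPOSURE = {
    "PFOA": 190, "PFNA": 27, "PFDA": 48, "PFUnDA": 14, "PFHxS": 218,
    "PFHpS": 3, "PFOS": 0, "6:2 Cl-PFESA": 1, "8:2 Cl-PFESA": 1,
}

# Biomarkers removed per compound when co-exposure correction is applied.
removed = {name: WITHOUT_COEXPOSURE[name] - WITH_COEXPOSURE[name]
           for name in WITH_COEXPOSURE}

if __name__ == "__main__":
    for name, n in removed.items():
        print(f"{name:>13}: {n}")
    print("total without correction:", sum(WITHOUT_COEXPOSURE.values()))
    print("total with correction:   ", sum(WITH_COEXPOSURE.values()))
```

Only PFOA, PFNA, PFDA and PFHxS change between the two tables; the counts for the other five compounds are identical.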
Example 2
(56) The procedures of this example are basically the same as those of Example 1, except that the step of docking perfluorooctanoate (PFOA) and perfluorooctane sulfonate (PFOS) which are the two most widely studied perfluorinated compounds is added before step (5) to predict their metabolic disturbance capabilities.
(57) Molecules of PFOA and PFOS are subjected to structural optimization in the SYBYL® software, a molecular modeling program that covers the workflow from sequence through lead optimization and has capabilities for small molecule modeling and simulation, macromolecular modeling and simulation, and cheminformatics and lead identification. The Tripos force field, Gasteiger-Hückel charges and the Powell gradient method are applied until the termination gradient drops below 0.001 kcal/(mol·Å).
(58) After the structure of human serum albumin is downloaded, the ligand is extracted in the SYBYL® software to form a docking pocket, the crystal water is removed, and the structure is protonated. The optimized perfluorinated ligand and the protein pocket are docked in the SYBYL® software, and the optimum conformation is selected as the docking result. The greater the total score of the docking result, the stronger the docking ability and the smaller the amount of free pollutant; conversely, a lower total score means more free pollutant, which may lead to stronger transient metabolic disorder and more biomarkers. The conformation obtained by docking is shown in
(59) TABLE 8 Results of docking of the perfluorinated ligand and the protein pocket

Name                        Abbreviation   Total score   Kd
Perfluorooctanoate          PFOA           3.9573        1.10 × 10^-4
Perfluorooctane sulfonate   PFOS           4.3431        4.54 × 10^-5
(60) The results of the subsequent biomarker screening in Example 1 show that PFOA indeed causes stronger transient metabolic disturbance than PFOS. When a large number of pollutants are screened, the docking step of Example 2 can be used to focus on pollutants with strong metabolic disturbance, which reduces workload and improves efficiency.
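The two columns of Table 8 appear to be related by total score = -log10(Kd). This relation is an assumption that is not stated in the description, although it reproduces both Kd values in Table 8 to the precision given. A minimal sketch, with a hypothetical function name:

```python
def kd_from_total_score(total_score: float) -> float:
    """Convert a docking total score to Kd, assuming total score = -log10(Kd)."""
    return 10.0 ** (-total_score)

# Total scores copied from Table 8.
TOTAL_SCORES = {"PFOA": 3.9573, "PFOS": 4.3431}

if __name__ == "__main__":
    for name, score in TOTAL_SCORES.items():
        print(f"{name}: total score {score:.4f}, Kd {kd_from_total_score(score):.2e}")
    # A lower total score (larger Kd) means weaker binding and more free pollutant.
    weaker_binder = min(TOTAL_SCORES, key=TOTAL_SCORES.get)
    print("weaker binder:", weaker_binder)
```

Under this reading PFOA is the weaker binder of the two, which matches the ranking used in this example to predict the stronger metabolic disturbance.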