IDENTIFICATION SYSTEM OF CIRCULATING BIOMARKERS FOR CANCER DETECTION, DEVELOPMENT METHOD OF CIRCULATING BIOMARKERS FOR CANCER DETECTION, CANCER DETECTION METHOD AND KIT
20230235410 · 2023-07-27
Assignee
Inventors
- Jian-Hao Li (Hsinchu City, TW)
- Hui-Chu Hsieh (Changhua County, TW)
- Po-Chang Chen (Hsinchu City, TW)
- Pei-Shin Jiang (Hsinchu City, TW)
- Chih-Lung Lin (Taichung City, TW)
Cpc classification
C12Q1/6809
CHEMISTRY; METALLURGY
International classification
Abstract
An identification system of circulating biomarkers for cancer detection, a development method of circulating biomarkers for cancer detection, a cancer detection method and a kit are provided in the present disclosure, and the development method includes the following steps. Expression levels of multiple genes in normal tissue samples and tumor tissue samples are identified, and genes with high expression levels in the tumor tissue samples are selected. Afterwards, a weight of each human tissue’s contribution to plasma exosomes is calculated using tissue-specific genes and group-enriched genes. Next, expression levels of plasma exosome genes of healthy people and cancer patients are compared by an overlapping index, and circulating biomarkers and combinations thereof suitable for detection and evaluation of plasma exosomes are selected.
Claims
1. A development method of circulating biomarkers for cancer detection, comprising: identifying expression levels of multiple genes in normal tissue samples and tumor tissue samples, and selecting genes with high expression levels in the tumor tissue samples; using tissue-specific genes and group-enriched genes to calculate a weight of each human tissue’s contribution to plasma exosomes; and comparing expression levels of plasma exosome genes of healthy people and cancer patients by an overlapping index, and selecting circulating biomarkers and combinations thereof suitable for detection and evaluation of the plasma exosomes.
2. The development method according to claim 1, wherein a statistical analysis method is used to select the genes with high expression level in the tumor tissue samples, and the statistical analysis method includes a null hypothesis test and a fold change threshold.
3. The development method according to claim 2, wherein the null hypothesis test includes: using Welch’s t-test to calculate a p value; adjusting the p value by using a permutation test to increase a test validity; and performing a screening by using a false discovery rate as a criterion to reduce a probability of selecting false high-expression genes.
4. The development method according to claim 1, further comprising comparing an exosome database and subcellular locations to see if the circulating biomarkers are expressed on the surface and/or inside exosome before calculating the weight of each human tissue’s contribution to plasma exosomes.
5. The development method according to claim 1, further comprising using the calculated weight to simulate a plasma exosome expression level distribution of circulating biomarker in the healthy people and the cancer patients after calculating the weight of each human tissue’s contribution to plasma exosomes.
6. The development method according to claim 5, wherein an intersection area of probability density functions of plasma exosome expression levels of the healthy people and the cancer patients are calculated according to the simulated plasma exosome expression level distributions, and the intersection area is the overlapping index.
7. The development method according to claim 1, when the overlapping index of the plasma exosome gene is equal to 0.70 or less than 0.70, the plasma exosome gene is listed as a potential selection target.
8. An identification system of circulating biomarkers for cancer detection, using the development method according to claim 1.
9. An identification system of circulating biomarkers for cancer detection, comprising: a) identification module, for identifying expression levels of multiple genes in normal tissue samples and tumor tissue samples, and selecting genes with high expression levels in the tumor tissue samples; b) computing module, using tissue-specific genes and group-enriched genes to calculate a weight of each human tissue’s contribution to plasma exosomes; and c) evaluation module, comparing expression levels of plasma exosome genes of healthy people and cancer patients by an overlapping index, and selecting circulating biomarkers and combinations thereof suitable for detection and evaluation of the plasma exosomes.
10. The identification system according to claim 9, wherein a statistical analysis method is used to select the genes with high expression level in the tumor tissue samples, and the statistical analysis method includes a null hypothesis test and a fold change threshold.
11. The identification system according to claim 10, wherein the null hypothesis test includes: using Welch’s t-test to calculate a p value; adjusting the p value by using a permutation test to increase a test validity; and performing a screening by using a false discovery rate as a criterion to reduce a probability of selecting false high-expression genes.
12. The identification system according to claim 9, further comprising comparing an exosome database and subcellular locations to see if the circulating biomarkers are expressed on the surface and/or inside exosome before calculating the weight of each human tissue’s contribution to plasma exosomes.
13. The identification system according to claim 9, further comprising using the calculated weight to simulate a plasma exosome expression level distribution of circulating biomarker in the healthy people and the cancer patients after calculating the weight of each human tissue’s contribution to plasma exosomes.
14. The identification system according to claim 13, wherein an intersection area of a probability density function of plasma exosome expression levels of the healthy people and the cancer patients are calculated according to the simulated plasma exosome expression level distributions, and the intersection area is the overlapping index.
15. The identification system according to claim 9, when the overlapping index of the plasma exosome gene is equal to 0.70 or less than 0.70, the plasma exosome gene is listed as a potential selection target.
16. A cancer detection method, using circulating biomarkers developed by the identification system according to claim 9, and the circulating biomarkers include BIRC5 and ART3.
17. The cancer detection method according to claim 16, which is used for triple-negative breast cancer detection.
18. A kit, using circulating biomarkers developed by the identification system according to claim 9, and the circulating biomarkers include BIRC5 and ART3.
19. The kit according to claim 18, which is used for triple-negative breast cancer detection.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0015]
[0016]
[0017]
[0018]
[0019]
DETAILED DESCRIPTION OF DISCLOSED EMBODIMENTS
[0020] The following examples are described in detail in conjunction with the accompanying drawings, but the provided examples are not intended to limit the scope of the present disclosure. Moreover, terms such as “include”, “comprise”, “have”, etc. used in the text are all open-ended terms, that is, “including but not limited to”.
[0021] The disclosure provides an identification system of circulating biomarkers for cancer detection and a development method of circulating biomarkers for cancer detection. The identification system of circulating biomarkers for cancer detection of the disclosure uses the development method of circulating biomarkers for cancer detection of the disclosure. Therefore, for the purpose of succinct description, the following mainly illustrates with the identification system of circulating biomarkers for cancer detection. The details of the development method of circulating biomarkers for cancer detection of the disclosure are basically repeated with the identification system of circulating biomarkers for cancer detection of the disclosure, so it will not be described in detail below.
[0022] The identification system of circulating biomarkers for cancer detection of the embodiment in the disclosure includes a) an identification module, b) a computing module and c) an evaluation module, wherein the identification module is used to select tumor tissue-upregulated gene markers, the computing module is used to calculate tissue weights, and the evaluation module is used to evaluate differences between healthy people and patients. In the following, the a) identification module, the b) computing module, and the c) evaluation module will be used to describe the identification system of the circulating biomarker for cancer detection according to an embodiment of the disclosure.
[0023] In terms of definition explanation, regarding the identification system of the circulating biomarker for cancer detection in the disclosure, wherein “identification system” includes hardware operating platforms (personal computers, supercomputers, etc.) and software (application programming interfaces, data processing algorithms, etc.), “module” can be a block, area, part, application area, or operation area in the identification system, but the disclosure is not limited thereto.
A) Identification Module
[0024] In the identification system of circulating biomarkers for cancer detection of the disclosure, the a) identification module compares normal tissue samples and tumor tissue samples across the multiple genes covered by exon-level RNA-seq, or across their products such as protein and mRNA expression levels, so as to select genes with high expression levels in the tumor tissue samples. Although the embodiment is mainly described with transcriptomics as an example, the disclosure is not limited thereto and can also be applied to other omics data such as proteomics. Note that quality control (QC) of the omics data is performed before the genes with high expression levels in the tumor tissue samples are selected.
[0025] In the present embodiment, the genes with high expression levels in tumor tissue samples are selected using a statistical analysis method that includes a null hypothesis test and a fold change threshold. The null hypothesis test and the fold change threshold are explained in detail below.
Null Hypothesis Test
[0026] In the present embodiment, the null hypothesis test is used to examine whether the average expression level of each gene in tumor tissue samples is significantly higher than that in normal samples. The null hypothesis test includes Welch’s t-test, permutation test and false discovery rate (FDR). In more detail, the Welch’s t-test is used to calculate a p value, the permutation test is used to adjust the p value, and then the false discovery rate is used as a standard for screening to reduce the probability of selecting false high-expression genes. In the following, the Welch’s t-test, the permutation test and the false discovery rate will be explained in detail.
Welch’s T-Test
[0027] In this embodiment, Welch’s t-test allows tumor and normal tissue data variance to be different when testing whether the average expression level of each gene in tumor samples is significantly higher than that in normal samples. In more detail, the applied formula is as follows:
[0028] The test examines whether the average expression level (mean) of a gene in the tumor samples is statistically significantly higher than the average expression level of the gene in the normal samples. In its standard form, the Welch statistic is

$$t = \frac{\bar{x}_T - \bar{x}_N}{\sqrt{s_T^2/n_T + s_N^2/n_N}}$$

wherein $\bar{x}_T$ and $\bar{x}_N$ are the sample means, $n_T$ and $n_N$ are the sample sizes, and $s_T$ and $s_N$ are respectively the sample standard deviations of cancer and normal samples. The null hypothesis ($H_0$) here is: the average gene expression level of tumor samples ≤ the average gene expression level of normal samples, which makes this a one-tailed test. The probability threshold is set to 0.5%, that is, if the probability of observing the current data under $H_0$ is less than 0.5% (p value < 0.005), then $H_0$ is rejected.
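As an illustration, the one-tailed Welch comparison can be sketched in plain Python. The function name `welch_t` and the example values are assumptions made for this sketch and are not part of the disclosure. The sketch returns the t statistic and the Welch-Satterthwaite degrees of freedom; the one-tailed p value would then be read from the upper tail of a Student t distribution with that many degrees of freedom.

```python
import math
from statistics import mean, variance


def welch_t(tumor, normal):
    """Return (t, df) for Welch's unequal-variance t-test (tumor vs. normal)."""
    n_t, n_n = len(tumor), len(normal)
    # Squared standard error of each group mean: sample variance / sample size.
    se2_t = variance(tumor) / n_t
    se2_n = variance(normal) / n_n
    t = (mean(tumor) - mean(normal)) / math.sqrt(se2_t + se2_n)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_t + se2_n) ** 2 / (se2_t ** 2 / (n_t - 1) + se2_n ** 2 / (n_n - 1))
    return t, df


# Hypothetical expression levels for a single gene.
t_stat, dof = welch_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
```

A large positive `t_stat` supports rejecting the null hypothesis that the tumor mean is not higher than the normal mean.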
Permutation Test
[0029] When samples are limited, the resampling-based permutation test can be an effective statistical test. In more detail, the applied formula is as follows:
wherein $N_{pm}$ is the number of random permutations in the permutation test, the counted quantity is the cumulative number of the $N_{pm}$ random permutations in which the p value ≤ the p value before permutation, and $p_{pm}$ is the p value calculated by the permutation test. In this embodiment, $N_{pm} = 10^5$, and $p_{pm}$ can be regarded as a correction to the p value of Welch’s t-test.
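A minimal sketch of such a resampling scheme follows, under two labeled assumptions: the permutation p value is taken as the plain ratio of the count to the number of permutations (the formula itself is not reproduced in the text), and "a permuted p value at most as large as the original" is approximated by "a permuted Welch statistic at least as large as the observed one". The names `welch_stat` and `permutation_p` are hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_stat(a, b):
    """Welch t statistic for group a versus group b (groups must not be constant)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def permutation_p(tumor, normal, n_pm=10_000, seed=0):
    """One-tailed permutation p value: share of label shuffles at least as extreme."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = welch_stat(tumor, normal)
    pooled = list(tumor) + list(normal)
    k = len(tumor)
    hits = 0
    for _ in range(n_pm):
        rng.shuffle(pooled)  # random reassignment of tumor/normal labels
        if welch_stat(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    return hits / n_pm  # assumed form: count divided by number of permutations


# Hypothetical data; the embodiment uses 10**5 permutations, fewer are used here for speed.
p_pm = permutation_p([10, 11, 12, 13, 14, 15], [1, 2, 3, 4, 5, 6], n_pm=2000)
```

Because the permuted groups are drawn from the pooled data, the result does not rely on the distributional assumptions behind the t distribution, which is why it is useful when samples are limited.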
False Discovery Rate
[0030] In this embodiment, when screening from a large number of genes, in order to reduce the incidence of false positives, the false discovery rate q≤0.005 is used as the standard. In more detail, the applied formula is as follows:
[0031] When a large number of genes are screened at the same time, the p values obtained from the null hypothesis test can be sorted from smallest to largest to reduce the probability of false positives, and the false discovery rate standard can then be used to screen genes, wherein $N_{gene}$ is the total number of screened genes and $p_n$ is the p value of the n-th gene (the genes having been sorted from smallest to largest according to the p value obtained by the null hypothesis test). After the maximum n value ($n_{max}$) that satisfies q ≤ 0.005 is calculated, the first $n_{max}$ genes are the genes selected based on the false discovery rate.
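The screening rule reads like a Benjamini-Hochberg style step-up procedure. The sketch below assumes that q for the n-th sorted gene is computed as the p value times the total gene count divided by n; this form is an assumption, since the formula is not reproduced in the text. The function name `select_by_fdr` is hypothetical.

```python
def select_by_fdr(p_values, q_max=0.005):
    """Return indices of the genes kept by a Benjamini-Hochberg style cut."""
    n_gene = len(p_values)
    # Gene indices ordered by ascending p value.
    order = sorted(range(n_gene), key=lambda i: p_values[i])
    n_max = 0
    for rank, idx in enumerate(order, start=1):
        q = p_values[idx] * n_gene / rank  # assumed q definition
        if q <= q_max:
            n_max = rank  # remember the largest rank that still passes
    return order[:n_max]


# Hypothetical p values for four genes.
kept = select_by_fdr([0.0001, 0.5, 0.002, 0.9])
```

Keeping the first `n_max` genes, rather than only those whose own q passes, matches the description of selecting everything up to the largest passing rank.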
Fold Change
[0032] Considering the interpretability of detection instrument results, this disclosure also sets an appropriate fold change threshold to exclude genes whose fold change is too small. The definition of fold change (FC) is:

$$\mathrm{FC} = \frac{\text{average gene expression level of tumor samples}}{\text{average gene expression level of normal samples}}$$

[0033] That is, the ratio of the average gene expression level of tumor samples to that of normal samples. Requiring the average gene expression level of tumor tissue to be higher than that of normal tissue already excludes a large number of genes (the exact number depends on the range of genes covered by each data set; for the data set used in this disclosure, 40% to 50% of genes are excluded). Screening with the additional condition FC > 2 leaves less than 5% of the original number of genes.
[0034] According to an embodiment of the present disclosure, exemplary operations are as follows. The triple-negative breast cancer RNA-seq gene expression level dataset GSE118527 in the Gene Expression Omnibus (GEO) database was analyzed, and the data of 88 cases with tumors and normal tissues around the tumors were compared, covering a total of 45,308 genes. The filter conditions are, for example:
[0035] (1) Genes listed in the ExoCarta, Vesiclepedia, or EVmiRNA exosome data sets, that is, genes that have been confirmed to appear in exosomes.
[0036] (2) The average gene expression level of tumor samples is higher than that of normal samples.
[0037] (3) Tumor tissue vs. normal tissue satisfies q value < 0.005.
[0038] (4) Fold change > 2.
Among the 45,308 genes, only 607 genes meet the above conditions, which greatly reduces the number of candidate genes.
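The four filter conditions can be combined into a single predicate, as in the following sketch. The record type `GeneStats`, its field names, and the example genes are hypothetical; only the thresholds (q value < 0.005, fold change > 2) come from the text.

```python
from dataclasses import dataclass


@dataclass
class GeneStats:
    name: str
    mean_tumor: float     # average expression level in tumor samples
    mean_normal: float    # average expression level in normal samples
    q_value: float        # false discovery rate from the null hypothesis test
    in_exosome_db: bool   # listed in ExoCarta, Vesiclepedia, or EVmiRNA


def passes_filters(gene, q_max=0.005, fc_min=2.0):
    """Apply conditions (1) to (4) to one gene record."""
    if not gene.in_exosome_db:                  # (1) confirmed exosome gene
        return False
    if gene.mean_normal <= 0:                   # fold change undefined, reject
        return False
    if gene.mean_tumor <= gene.mean_normal:     # (2) higher in tumor
        return False
    if gene.q_value >= q_max:                   # (3) q value < 0.005
        return False
    return gene.mean_tumor / gene.mean_normal > fc_min  # (4) fold change > 2


# Made-up records for illustration only.
genes = [
    GeneStats("GENE_A", 30.0, 10.0, 1e-6, True),
    GeneStats("GENE_B", 15.0, 10.0, 1e-6, True),
    GeneStats("GENE_C", 30.0, 10.0, 1e-6, False),
]
candidates = [g.name for g in genes if passes_filters(g)]
```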
[0039] Before b) computing module calculates the weight of each human tissue for plasma exosome contribution, the identification system of the circulating biomarker for cancer detection in this disclosure refers to subcellular location information of exosome database to see if the circulating biomarkers are expressed on the surface and/or inside exosome. Circulating biomarkers expressed on the exosome surface can be further used for antibody binding.
B) Computing Module
[0040] In the identification system of circulating biomarker for cancer detection disclosed in the disclosure, b) computing module uses tissue-specific genes and group-enriched genes to calculate the weight of each human tissue for plasma exosome contribution. In more detail, the applied formula is as follows:
[0041] Plasma exosomes are the sum of the exosomes secreted by various tissues/organs/blood cells in the blood. Therefore, the gene expression level of a gene (gn) on plasma exosomes can be expressed by the above formula. According to an embodiment of the disclosure, a total of 69 types of tissues, organs, and blood cells for which large human omics databases such as HPA, FANTOM5, and GTEx provide detection data are expected to cover all sources of exosomes in the blood (as shown in Table 1 below). To calculate the ($C_{gn} \times W_{gn}$) weight of each tissue, according to an embodiment of the disclosure, the human tissue and organ gene expression level data provided by online databases such as HPA, FANTOM5, and GTEx are used. First, several tissue-specific genes of each tissue are selected, the plasma exosome expression level of such a gene is estimated as a contribution from the highly expressing tissue only, and the ($C_{gn} \times W_{gn}$) weight of that tissue is then calculated. If a tissue has no tissue-specific genes, several group-enriched genes are selected, that is, genes with significantly increased expression levels in this tissue and in other tissues, so as to jointly determine the ($C_{gn} \times W_{gn}$) weight of each tissue.
TABLE 1
Tissue/Organ: breast, adipose tissue, skin, bone marrow, lymph node, lung, kidney, liver, gallbladder, spleen, stomach, duodenum, small intestine, rectum, colon, appendix, tongue, esophagus, smooth muscle, heart muscle, skeletal muscle, urinary bladder, retina, placenta, vagina, fallopian tube, cervix, endometrium, ovary, medulla oblongata, pons, thalamus, white matter, amygdala, hippocampal formation, midbrain, spinal cord, cerebellum, basal ganglia, choroid plexus, cerebral cortex, pituitary gland, hypothalamus, thyroid gland, parathyroid gland, tonsil, thymus, adrenal gland, pancreas, salivary gland
Blood cell: platelet, NK-cell, naive CD8 T-cell, memory CD8 T-cell, naive CD4 T-cell, memory CD4 T-cell, T-reg, gdT-cell, MAIT T-cell, naive B-cell, memory B-cell, neutrophil, basophil, eosinophil, classical monocyte, non-classical monocyte, intermediate monocyte, myeloid DC, plasmacytoid DC
[0042] According to an embodiment of the disclosure, when calculating the weight, for each tissue-specific gene and group-enriched gene, HPA, FANTOM5 and GTEx human tissue expression level data are used to calculate the expression level probability density function of each tissue, with a lognormal distribution best fitted to the expression level data of each tissue. After the expression level distribution of a tissue is obtained, it is used to calculate the gene expression level distribution of the exosomes released into the blood by the tissue under different ($C_{gn} \times W_{gn}$) weights.
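The lognormal fitting step can be illustrated with a short sketch. It assumes a maximum-likelihood fit (mean and standard deviation of the log-transformed values) and assumes that the tissue weight acts as a multiplicative scale factor on expression level, in which case the scaled distribution stays lognormal with a shifted log-mean. Both assumptions, the function names, and the sample values are illustrative and not taken from the disclosure.

```python
import math
from statistics import fmean, pstdev


def fit_lognormal(values):
    """Maximum-likelihood lognormal fit: returns (mu, sigma) of ln(values)."""
    logs = [math.log(v) for v in values]  # values must be strictly positive
    return fmean(logs), pstdev(logs)


def scale_lognormal(mu, sigma, weight):
    """Distribution of weight * X when X is lognormal(mu, sigma)."""
    # Multiplying by a constant shifts the log-mean and leaves sigma unchanged.
    return mu + math.log(weight), sigma


# Hypothetical tissue expression levels for one tissue-specific gene.
mu_tissue, sigma_tissue = fit_lognormal([math.e, math.e ** 2, math.e ** 3])
mu_plasma, sigma_plasma = scale_lognormal(mu_tissue, sigma_tissue, 1e-5)
```

Under these assumptions, trying a different weight only requires recomputing the shift, so many candidate weights can be compared cheaply.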
[0043] According to an embodiment of the disclosure, the exosome gene expression level data of 149 healthy people are assembled, and for the selected tissue-specific genes and group-enriched genes, an in-house algorithm is used to adjust and test the weights of several tissues at the same time and to find the ($C_{gn} \times W_{gn}$) weights that best reproduce the plasma exosome expression level distributions of all tissue-specific and group-enriched genes. The order of magnitude of tissue weight obtained by the simulation is as follows:
[0044] Fat: ~1e-5
[0045] Breast: ~1e-5
[0046] Liver: ~1e-7
[0047] Lung: ~1e-3
[0048] Pancreas: ~1e-6
[0049] Skin: ~1e-5
[0050] Basophil: ~1e-5
[0051] Platelet: ~1e-1
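The in-house algorithm is not described, so the following is only a toy stand-in for the single-tissue case. It assumes that, for tissue-specific genes, the plasma exosome level equals the tissue level multiplied by one weight; on a log10 scale the weight then appears as the average offset between plasma and tissue log-means. The function name and all numbers are hypothetical.

```python
import math
from statistics import fmean


def estimate_log10_weight(tissue_levels_by_gene, plasma_levels_by_gene):
    """Average log10 offset between plasma and tissue levels across genes."""
    offsets = []
    for gene, tissue_levels in tissue_levels_by_gene.items():
        mu_tissue = fmean(math.log10(v) for v in tissue_levels)
        mu_plasma = fmean(math.log10(v) for v in plasma_levels_by_gene[gene])
        offsets.append(mu_plasma - mu_tissue)
    return fmean(offsets)


# Made-up levels for two tissue-specific genes of a single tissue.
tissue = {"G1": [100.0, 1000.0], "G2": [10.0, 10.0]}
plasma = {"G1": [1e-3, 1e-2], "G2": [1e-4, 1e-4]}
log10_w = estimate_log10_weight(tissue, plasma)  # about -5, i.e. weight ~1e-5
```

A joint fit over several tissues and group-enriched genes, as the text describes, would need an optimizer rather than this closed-form offset.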
[0052] Please refer to
C) Evaluation Module
[0053] In the identification system of circulating biomarker for cancer detection disclosed in the present disclosure, the c) evaluation module compares the gene expression levels of the plasma exosomes of healthy people and cancer patients by using an overlapping index, and selects circulating biomarkers and combinations thereof suitable for detection and evaluation of the plasma exosomes.
[0054] According to an embodiment of the present disclosure, the calculated weight is used to simulate a plasma exosome expression level distribution of circulating biomarker in the healthy people and the cancer patients after calculating the weight of each human tissue’s contribution to plasma exosomes in the b) computing module. An intersection area of probability density functions of plasma exosome expression levels of the healthy people and the cancer patients are calculated according to the simulated plasma exosome expression level distributions, and the intersection area is the overlapping index. The smaller the intersection area (overlapping index), the better it is expected to be able to distinguish healthy and cancer statuses by plasma exosome detection. When the overlapping index ≤ 0.70, it is listed as a potential selection target. For example, the aforementioned overlapping index may be 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15 or 0.10, etc., but the present disclosure is not limited thereto. Furthermore, in addition to the overlapping index calculation, the biomarker selection also comprehensively considers the known characteristics of the gene, such as the gene function known in the literature, the subcellular locations of the gene product (protein) in the cell, and the plasma membrane confidence, etc., and finally the circulating biomarkers and combinations thereof suitable for detection and evaluation of the plasma exosomes are selected.
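The overlapping index can be approximated numerically. The sketch below assumes both simulated distributions are lognormal and integrates the pointwise minimum of the two densities over u = ln(x), where each lognormal density becomes a normal density and the intersection area is unchanged. Function names and parameter values are illustrative assumptions.

```python
import math


def normal_pdf(u, mu, sigma):
    """Normal density, used for log-transformed expression levels."""
    z = (u - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def overlapping_index(mu_h, sigma_h, mu_c, sigma_c, steps=20000):
    """Intersection area of two lognormal PDFs (healthy vs. cancer)."""
    width = 8.0 * max(sigma_h, sigma_c)  # covers essentially all probability mass
    lo = min(mu_h, mu_c) - width
    hi = max(mu_h, mu_c) + width
    du = (hi - lo) / steps
    area = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * du  # midpoint rule
        area += min(normal_pdf(u, mu_h, sigma_h), normal_pdf(u, mu_c, sigma_c))
    return area * du


def is_candidate(ovl, threshold=0.70):
    """Potential selection target when the overlapping index is at most 0.70."""
    return ovl <= threshold


# Hypothetical log-scale parameters for one gene.
ovl = overlapping_index(0.0, 1.0, 2.0, 1.0)
```

Identical distributions give an index close to 1 and well-separated distributions give an index close to 0, which is why a smaller index indicates a more discriminating marker.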
[0055] According to an embodiment of the present disclosure, after calculating the weight of the contribution of each tissue/organ/blood cell to the plasma exosome expression level, it is used to simulate the plasma exosome expression level distribution of healthy people in genes that have high expression level in the tumor tissue as identified in previous embodiment. Then, based on the gene expression level data of 88 cases of triple-negative breast cancer tissues from the GSE118527 data set and considering the phenomenon that cancer tissue cells release more exosomes than normal cells, the expression level distribution of individual genes in triple-negative breast cancer patients in plasma exosomes is simulated. According to the simulated plasma exosome expression level distributions of healthy and diseased people, the intersection area of plasma exosome expression level distributions of healthy and disease people for each individual gene can be calculated, and the overlapping index can be obtained.
[0056] The smaller the overlapping index of a gene, the smaller the overlap of plasma exosome expression levels between breast cancer and breast cancer-free states of the gene, and thus potentially a better biomarker for distinguishing breast cancer and breast cancer-free states through exosome detection in the person to be examined.
Validation of Breast Cancer Exosomal Biomarkers
[0057] According to the results of the null hypothesis test and the overlapping index analysis, the subcellular locations of these gene products (proteins) in the cell are further compared. The proteins noted as expressed on the membrane in the HPA (Human Protein Atlas) database are selected, or genes with plasma membrane confidence > 3 and extracellular confidence > 3 in the COMPARTMENTS subcellular localization database are selected. Genes having a fold change (FC) greater than 1.5 are sorted from low to high overlapping index, and ART3, BIRC5, CD274 and PTK7 are taken as examples for verification as exosome protein biomarkers. According to the annotations of HPA and COMPARTMENTS, ART3, BIRC5, CD274 and PTK7 may all be exosome surface proteins. In the triple-negative breast cancer tissue RNA-seq study (GSE118527), the analysis by the a) identification module of this disclosure gives the following fold change and q values: ART3: FC = 2.7, q = 2.3×10⁻⁹; BIRC5: FC = 8.3, q = 6.64×10⁻⁴⁵; CD274: FC = 1.6, q = 2.2×10⁻⁹; and PTK7: FC = 1.9, q = 2.4×10⁻¹². In addition, according to the circulatory system simulation results of the b) computing module of this disclosure, the amount and distribution of ART3 and BIRC5 in normal human plasma exosomes are significantly lower than those of CD274 and PTK7. In the verification with cell line exosomes, ultra-high-speed centrifugation is first carried out to separate the cell line exosomes. After quantification by Nanoparticle Tracking Analysis (NTA), an equal number of exosomes are taken, and the expression of these proteins is compared by immunoassay among exosomes from a normal breast epithelial cell line (HMEC), triple-negative breast cancer cell lines (MDA-MB-231, MDA-MB-468 and HCC1806) and normal human plasma.
[0058] According to an embodiment of the present disclosure, the immunoassay includes the following steps. First, 96-well round-bottom white plates carrying magnetic beads conjugated with ART3, BIRC5, CD274 and PTK7 antibodies are prepared. 100 µL of each exosome sample isolated from the different cell lines and from plasma is added to the wells at a concentration of 5×10⁸ particles/mL. The reaction is performed on a shaker at 900 rpm at 37° C. for 60 minutes under non-lysing conditions. After the magnetic beads are washed with 0.1% Tween-PBST, 100 µL of 0.5 µg/mL biotin-conjugated anti-CD81 antibody is added to each well and reacted for 60 minutes. After the magnetic beads are washed again with 0.1% Tween-PBST, 100 µL of streptavidin-HRP enzyme is added to each well and reacted for another 60 minutes. After the magnetic beads are washed with 0.1% Tween-PBST, the luminescent HRP substrate is added and reacted on the shaker for one minute, and the luminescence signal is read.
[0059]
Plasma Interference Test
[0060] In this disclosure, the detection sensitivity for breast cancer cell exosomes is evaluated by analyzing plasma exosome samples spiked with various concentrations of breast cancer cell exosomes. According to an embodiment of the present disclosure, exosomes from HCC1806, a triple-negative breast cancer cell line, are added to 100 µL of plasma exosomes processed by size exclusion chromatography (SEC) so that the final concentrations of HCC1806 exosomes are 1×10⁹, 2×10⁸, 4×10⁷, and 8×10⁶ particles/mL, respectively. The exosome surface proteins are then detected according to the above-mentioned magnetic bead immunoassay.
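The four spike concentrations are spaced by a factor of five. The short sketch below reproduces that series and converts a concentration into a particle count for a given volume; the helper names are hypothetical, and applying the count to a 100 µL volume is an assumption made only for the worked arithmetic.

```python
def serial_dilution(start, factor, steps):
    """Concentrations obtained by repeatedly dividing by a fixed factor."""
    series = [start]
    for _ in range(steps - 1):
        series.append(series[-1] / factor)
    return series


def particles_in_volume(conc_per_ml, volume_ul):
    """Number of particles contained in a volume given in microliters."""
    return conc_per_ml * volume_ul / 1000.0


# Final HCC1806 exosome concentrations in particles/mL.
spike_levels = serial_dilution(1e9, 5, 4)
# Particle counts if each level is considered in a 100 µL volume.
per_100_ul = [particles_in_volume(c, 100) for c in spike_levels]
```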
Selection of Capture-Detection Combination for Exosome Detection
[0061] Exosomes are vesicles secreted by cells, which can carry molecules such as proteins, mRNA or microRNA of their cells of origin. Exosomes of specific subgroups, such as tumor exosomes, can be enriched by affinity purification that recognizes their surface proteins, and the biomarkers they carry are then analyzed, so as to increase the specificity of detection. In the present disclosure, an optimized combination of exosome biomarkers can be developed by calculating the overlapping index of a capture-detection (C-D) pair, which consists of a capture protein for enrichment and a biomarker for detection. In an embodiment of the present disclosure, BIRC5 and PTK7 are each used as the surface protein for affinity purification, and ART3 is used as the biomarker for detection. First, the expression level distribution of the enrichment biomarker (BIRC5 or PTK7) is used to simulate how the proportions of exosomes from different tissue sources are redistributed after the enrichment step. Then the probability density functions of the expression level of the detection biomarker (ART3) in healthy and diseased people, and the associated overlapping index, are calculated.
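One simple way to picture the proportion redistribution is sketched below. It assumes that the share of exosomes retained from each tissue source is proportional to that source's pre-enrichment share multiplied by its level of the capture protein; the disclosure does not spell out this rule, so it is an assumption, and the function name and numbers are hypothetical.

```python
def redistribute_after_capture(tissue_share, capture_level):
    """Renormalized tissue shares after enrichment on a capture protein."""
    # Weight each source by how strongly it carries the capture protein.
    weighted = {
        tissue: share * capture_level.get(tissue, 0.0)
        for tissue, share in tissue_share.items()
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("capture protein absent from every tissue source")
    return {tissue: w / total for tissue, w in weighted.items()}


# Made-up pre-enrichment shares and relative capture protein levels.
shares = {"platelet": 0.9, "breast": 0.1}
capture = {"platelet": 1.0, "breast": 9.0}
enriched = redistribute_after_capture(shares, capture)
```

In this toy case a minor source that carries much more of the capture protein ends up contributing as much as the dominant source, which is the effect that enrichment is meant to achieve before the detection biomarker is evaluated.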
[0062] In one embodiment of the present disclosure, the overlapping index of the biomarker combination is verified by evaluating its performance in immunodetection of tumor cell exosomes added to plasma. 800 µL of plasma exosomes separated by size exclusion chromatography (SEC) is taken, and exosomes from MDA-MB-231 and MDA-MB-468, which are triple-negative breast cancer cell lines, are added so that the exosome concentrations are 1×10⁹ particles/mL for MDA-MB-231 and 3×10⁸ particles/mL for MDA-MB-468, respectively. Next, the exosome detection performance of two C-D pairs, BIRC5-ART3 and PTK7-ART3, is compared according to the above-mentioned magnetic bead based immunoassay, wherein BIRC5 and PTK7 antibodies are used to capture exosomes, and ART3 antibodies are used as detection antibodies. Please refer to (B) of
[0063] This disclosure also provides a cancer detection method, using the circulating biomarker developed by the identification system of circulating biomarker for cancer detection described above. The circulating biomarkers include BIRC5 and ART3, which can be used to detect triple-negative breast cancer. For example, BIRC5 or ART3 antibodies are immobilized on carriers (such as magnetic beads or antibody-absorbable reaction disks) to capture exosomes in samples such as plasma, urine, and spinal fluid, and then antibodies which recognize BIRC5 or ART3 or other proteins are used for immunodetection. During the detection, enzymes such as horseradish peroxidase (HRP) and their substrates or fluorophore reagents can be used to generate signals for the detection of exosome biomarkers.
[0064] This disclosure also provides a kit, using the circulating biomarker developed by the identification system of circulating biomarker for cancer detection described above. The circulating biomarkers include BIRC5 and ART3, which can be used to detect triple-negative breast cancer. The kit contains a BIRC5 or ART3 antibody, or a solid support carrying a BIRC5 or ART3 antibody, such as magnetic beads or a reaction plate that can adsorb antibodies. This antibody reagent may be combined with antibodies that recognize BIRC5, ART3 or other proteins for exosome detection, with or without reagents such as enzymes (including HRP) and their substrates, or fluorophores.
[0065] In summary, the present disclosure provides an identification system of circulating biomarkers for cancer detection, a development method of circulating biomarkers for cancer detection, a cancer detection method and a kit. Using a null hypothesis test, computational deconvolution, an overlapping index and other methods on expression data of proteins and nucleic acids, the identification system and development method identify genes whose expression level in tumor tissue is significantly higher than that in normal tissue, while also considering the fold change of the expression level in tumor tissue relative to normal tissue and other screening conditions. After that, the exosome expression level distributions of these genes in the blood of healthy and diseased people are simulated and combined with the exosome expression level distributions of healthy and diseased people after the enrichment step, so as to sort out candidate protein and nucleic acid biomarkers, which are used as priority references for subsequent clinical specimen verification.