Systems and methods for detection of multiple cancer types
11530453 · 2022-12-20
Inventors
- Marko Bitenc (Ljubljana, SI)
- Kristi Kruusmaa (Ljubljana, SI)
- Juan Martinez-Barea (Seville, ES)
- Christian Hense (Seville, ES)
- Pol Sola de los Santos (Seville, ES)
- Pol Canal Noguer (Seville, ES)
- Marko Chersicola (Ljubljana, SI)
- Primož Knap (Ljubljana, SI)
Abstract
The present disclosure provides, among other things, methods for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer detection (e.g., screening) and compositions related thereto. In various embodiments, the present disclosure provides methods for screening that include analysis of methylation status of one or more methylation biomarkers, and compositions related thereto. In various embodiments, the present disclosure provides methods for detection (e.g., screening) that include detecting (e.g., screening) methylation status of one or more methylation biomarkers in cfDNA, e.g., in ctDNA. In various embodiments, the present disclosure provides methods for screening that include detecting (e.g., screening) methylation status of one or more methylation biomarkers in cfDNA, e.g., in ctDNA, using MSRE-qPCR and/or using massively parallel sequencing (e.g., next-generation sequencing).
Claims
1. A method of detecting methylation markers in a human subject suspected of having cancer, the method comprising: determining a methylation status of each of at least three screening markers identified in a sample obtained from the human subject suspected of having cancer, wherein the sample comprises cell-free DNA that is isolated from blood or plasma of the human subject, wherein a first of the at least three screening markers comprises a methylation locus comprising at least a portion of chr19:12867716-12867820 (SEQ ID NO: 82), a second of the at least three screening markers comprises a methylation locus comprising at least a portion of chr20:48828337-48828448 (SEQ ID NO: 86), and a third of the at least three screening markers comprises a methylation locus comprising at least a portion of chr20:23049354-23049500 (SEQ ID NO: 107).
2. The method of claim 1, wherein the subject is suspected of having at least one of the cancers selected from the group consisting of: (i) colorectal cancer, (ii) breast cancer, (iii) lung cancer, and (iv) pancreatic cancer.
3. The method of claim 1, wherein each methylation locus comprises at least one CpG dinucleotide.
4. The method of claim 1, further comprising, determining a methylation status of each of at least three cancer-differentiating markers selected from the DMRs of Table 15, said at least three cancer-differentiating markers identified in a sample obtained from the subject.
5. The method of claim 1, further comprising, determining a methylation status of each of at least three cancer-differentiating markers selected from the DMRs of Table 13, said at least three cancer-differentiating markers identified in a sample obtained from the subject.
6. The method of claim 1, wherein methylation status is determined using quantitative polymerase chain reaction (qPCR).
7. The method of claim 1, wherein methylation status is determined using methylation sensitive restriction enzyme quantitative polymerase chain reaction (MSRE-qPCR).
8. The method of claim 1, wherein methylation status is determined using massively parallel sequencing.
9. The method of claim 1, wherein each methylation locus is equal to or less than 5000 bp in length.
10. The method of claim 1, comprising determining the methylation status of each of the one or more screening markers using next generation sequencing (NGS).
11. The method of claim 10, comprising using one or more oligonucleotide capture baits that enrich for a target region to capture one or more corresponding methylation locus/loci.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The foregoing and other objects, aspects, features, and advantages of the present disclosure will become more apparent and better understood by referring to the following description taken in conjunction with the accompanying drawings, in which:
DETAILED DESCRIPTION
(4) It is contemplated that systems, architectures, devices, methods, and processes of the claimed invention encompass variations and adaptations developed using information from the embodiments described herein. Adaptation and/or modification of the systems, architectures, devices, methods, and processes described herein may be performed, as contemplated by this description.
(5) Throughout the description, where articles, devices, systems, and architectures are described as having, including, or comprising specific components, or where processes and methods are described as having, including, or comprising specific steps, it is contemplated that, additionally, there are articles, devices, systems, and architectures of the present invention that consist essentially of, or consist of, the recited components, and that there are processes and methods according to the present invention that consist essentially of, or consist of, the recited processing steps.
(6) It should be understood that the order of steps or order for performing certain action is immaterial so long as the invention remains operable. Moreover, two or more steps or actions may be conducted simultaneously.
(7) The mention herein of any publication, for example, in the Background section, is not an admission that the publication serves as prior art with respect to any of the claims presented herein. The Background section is presented for purposes of clarity and is not meant as a description of prior art with respect to any claim.
(8) Documents are incorporated herein by reference as noted. Where there is any discrepancy in the meaning of a particular term, the meaning provided in the Definition section above is controlling.
(9) Headers are provided for the convenience of the reader—the presence and/or placement of a header is not intended to limit the scope of the subject matter described herein.
(10) Screening for Multiple Cancer Types
(11) There is a need for improved methods of detecting (e.g., screening for) colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer. This includes a need for screening for early-stage cancer. Colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer screening programs are often ineffective or unsatisfactory. Improved screening for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer improves diagnosis and reduces colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer mortality.
(12) DNA methylation (e.g., hypermethylation or hypomethylation) can activate or inactivate genes, including genes that impact development of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer. Thus, for example, hypermethylation can inactivate one or more genes that typically act to suppress cancer, causing or contributing to development of cancer in a sample or subject.
(13) The present disclosure includes the discovery that determination of the methylation status of one or more methylation loci provided herein, and/or the methylation status of one or more DMRs provided herein, provides for detection of (e.g., screening for) colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer. In certain embodiments, screening can classify a subject as having or not having one or more of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer with a high degree of sensitivity and/or specificity. The present disclosure provides compositions and methods including or relating to colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer methylation biomarkers that, individually or in various panels comprising two or more methylation biomarkers, provide for screening of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer.
(14) In various embodiments, a methylation biomarker of the present disclosure used for detection of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer is selected from a methylation locus that is or includes at least a portion of a DMR listed in Tables 1-4, 7-11, 13 and 15. Tables 1-4, 7-11, 13 and 15 list the region of DNA on which the DMR is found, which includes the chromosome number (chr), the start and end positions of the DMR on the chromosome, and genes (if any) that are known to be associated with the region. If no genes are currently known to be associated with the region, the term “NA” is listed in the Genes column.
(15) TABLE 1 List of DMRs found to have significantly altered methylation pattern(s) in the blood and/or tissue of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer patients compared to controls.
Gene          chr  start      end        width  genome
HOXA7         7    27155916   27156027   112    hg38
MIR196B       7    27169630   27169719   90     hg38
TNFRSF10D     8    23163995   23164099   105    hg38
DNM3          1    171841774  171841857  84     hg38
C1orf230      1    151721583  151721679  97     hg38
ZNF568        19   36916252   36916371   120    hg38
C9orf50       9    129620787  129620870  84     hg38
NA            5    73436635   73436710   76     hg38
DLX6AS        7    97014186   97014266   81     hg38
DIO3;MIR1247  14   101561395  101561505  111    hg38
GSG1L         16   28063861   28063964   104    hg38
LONRF2        2    100322387  100322463  77     hg38
PCDH9         13   67231171   67231265   95     hg38
GFPT2         5    180353729  180353815  87     hg38
(16) TABLE 2 List of DMRs found to have significantly altered methylation pattern allowing for distinguishing between colorectal cancer, breast cancer, lung cancer and pancreatic cancer patients
Gene          chr  start      end        width  genome
MAST1         19   12867716   12867820   105    hg38
KLK10         19   51019613   51019705   93     hg38
HOXD8         2    176129896  176130001  106    hg38
C2orf88       2    190180554  190180682  129    hg38
PREX1         20   48828337   48828448   112    hg38
AMOTL2        3    134364403  134364512  110    hg38
SOX2OT        3    181719490  181719596  107    hg38
SCGN          6    25652114   25652232   119    hg38
HOXA7         7    27155916   27156027   112    hg38
HOXA7         7    27156273   27156352   80     hg38
HOXA7         7    27156291   27156403   113    hg38
MIR196B       7    27169630   27169719   90     hg38
TNFRSF10D     8    23163949   23164031   83     hg38
TNFRSF10D     8    23163995   23164099   105    hg38
NA            10   100830555  100830658  104    hg38
ME3           11   86672189   86672296   108    hg38
ME3           11   86672338   86672429   92     hg38
TXNRD1        12   104215675  104215784  110    hg38
HAPLN3        15   88895676   88895781   106    hg38
CYBA          16   88651139   88651205   67     hg38
ZNF568        19   36916284   36916453   170    hg38
ZNF582        19   56393606   56393725   120    hg38
ZNF471        19   56507527   56507675   149    hg38
ZNF471        19   56507558   56507675   118    hg38
ZNF471        19   56507662   56507750   89     hg38
THBD          20   23049354   23049500   147    hg38
JAM2          21   25640320   25640399   80     hg38
SDC2          8    96493985   96494062   78     hg38
FGF14         13   102394577  102394651  75     hg38
CDKN2A        9    21970919   21971017   99     hg38
(17) TABLE 3 Primer sequences for use with MSRE-qPCR method for DMRs in Table 1 (DMRs with general cancer detection power) (Table 3 discloses SEQ ID NOs 1-28, respectively, in order of columns)
annotations      chr  start      end        primer_F                                   primer_R
HOXA7            7    27155916   27156027   TCGAACCCATTAATTGGGCCATA (SEQ ID NO: 1)     CGGCGCAGCCTTTCTGGTTT (SEQ ID NO: 15)
MIR196B          7    27169630   27169719   CCAAGGAGAGAACCCTGCCATCG (SEQ ID NO: 2)     GCCTGGGGCACTCTGTTGCACT (SEQ ID NO: 16)
TNFRSF10D        8    23163995   23164099   TTGTGCGCGTGCAAAGGTTC (SEQ ID NO: 3)        GCGGGAAGGGAGTACAACTGACC (SEQ ID NO: 17)
DNM3             1    171841774  171841857  CAGAGCGCCGGCAAGAGC (SEQ ID NO: 4)          CCCCACTGCCGCATCCTTAC (SEQ ID NO: 18)
C1orf230         1    151721583  151721679  TTAGCGCAGCGCAGCTGGAG (SEQ ID NO: 5)        CCCAGTCCTGGGGCAGCTACA (SEQ ID NO: 19)
ZNF568           19   36916252   36916371   GCCCAAGCCTCACCCTCACACAG (SEQ ID NO: 6)     CGAACCATCCCTCCGCGCCA (SEQ ID NO: 20)
C9orf50          9    129620787  129620870  AGAGTAGCCAACTTTGGGGGTTGCT (SEQ ID NO: 7)   GGCACTGTACCGAGCTTGCTGTTCT (SEQ ID NO: 21)
NA               5    73436635   73436710   CAGGCTGGGCGGTCTTTGAC (SEQ ID NO: 8)        GGGGTAGCGGGTGCTTCCAG (SEQ ID NO: 22)
DLX6-AS1         7    97014186   97014266   CAAGACCTGGCGCATCTTTGC (SEQ ID NO: 9)       TTGCAGGCTGGATTAGGATGC (SEQ ID NO: 23)
DIO3OS, MIR1247  14   101561395  101561505  TCCGGGCTCAAGTTGCAAGG (SEQ ID NO: 10)       GCGAGGCATCTGGGCTTCAG (SEQ ID NO: 24)
GSG1L            16   28063861   28063964   CCGAAAGAAATCCGAGCCAGGGTGA (SEQ ID NO: 11)  GGTTTTGTTGCCCCACGTCC (SEQ ID NO: 25)
LONRF2           2    100322387  100322463  CTCTCAGTCCCGCCGGCTTAGGTA (SEQ ID NO: 12)   GCAAGAGACGCGGACCTGGAGC (SEQ ID NO: 26)
PCDH9            13   67231171   67231265   GCGTGCGAAGTCTCCTCTAGCGGA (SEQ ID NO: 13)   CTCAGGTTTCCAGGCGCGGCT (SEQ ID NO: 27)
GFPT2            5    180353729  180353815  CGTAAGGGGCAGAGCGAGGGGT (SEQ ID NO: 14)     CTCAGATGGGAGCGCGGCAGGAA (SEQ ID NO: 28)
(18) TABLE 4 Primer sequences for use with MSRE-qPCR method for DMRs in Table 2 (DMRs with cancer location specific methylation signal) (Table 4 discloses SEQ ID NOS 29-36, 1, 37-38, 2, 39, 3, 40-63, 15, 64-65, 16, 66, 17, 67-75, 75 and 76-81, respectively, in order of columns)
Gene       chr  start      end        primer_F                                   primer_R
MAST1      19   12867716   12867820   CCCCCTCCATGCAGCAAGC (SEQ ID NO: 29)        CTCCAGCAGCGCCGAGAAAC (SEQ ID NO: 56)
KLK10      19   51019613   51019705   GCAGGTAGCTTCACCTGGGAGTCG (SEQ ID NO: 30)   AGAGGATACCAGCGGCAGACCACA (SEQ ID NO: 57)
HOXD8      2    176129896  176130001  GGGGTTTGTAAACCGAGGCCAGAG (SEQ ID NO: 31)   CGCTGCCTCCACTGTTTCCTCTCA (SEQ ID NO: 58)
C2orf88    2    190180554  190180682  GCCGGCAGCTGCTTGGTAGTTG (SEQ ID NO: 32)     GAGGGTTGCCTCGATACTTCCTCA (SEQ ID NO: 59)
PREX1      20   48828337   48828448   TCACCGCGGGCTACGCCACT (SEQ ID NO: 33)       GCTTTCCCAGCCCGGTGTTT (SEQ ID NO: 60)
AMOTL2     3    134364403  134364512  CCATGGCTTCCTTTCTTTGGCAGA (SEQ ID NO: 34)   TTCCCTGAGGAGTCTGGGGAGGAG (SEQ ID NO: 61)
SOX2OT     3    181719490  181719596  GGCAAATTGAGGCCGAGCTG (SEQ ID NO: 35)       CGAGCCCCACACAGCACCTT (SEQ ID NO: 62)
SCGN       6    25652114   25652232   TCCCCAAAGCGCAGAGACAGA (SEQ ID NO: 36)      CGTCCCTCAGCCCTCAGCAA (SEQ ID NO: 63)
HOXA7      7    27155916   27156027   TCGAACCCATTAATTGGGCCATA (SEQ ID NO: 1)     CGGCGCAGCCTTTCTGGTTT (SEQ ID NO: 15)
HOXA7      7    27156273   27156352   CCCGGGGATGTTTTGGTCGT (SEQ ID NO: 37)       CCTTTGCGTCCGGCTACGG (SEQ ID NO: 64)
HOXA7      7    27156291   27156403   GTAGGAGGCGCAGGGCAGGT (SEQ ID NO: 38)       CCTCGACCGTTCCGGGCTTA (SEQ ID NO: 65)
MIR196B    7    27169630   27169719   CCAAGGAGAGAACCCTGCCATCG (SEQ ID NO: 2)     GCCTGGGGCACTCTGTTGCACT (SEQ ID NO: 16)
TNFRSF10D  8    23163949   23164031   GGTGGATCGAAAGCGCCAAA (SEQ ID NO: 39)       TGGCAGTGTAGCTGCGAGAACC (SEQ ID NO: 66)
TNFRSF10D  8    23163995   23164099   TTGTGCGCGTGCAAAGGTTC (SEQ ID NO: 3)        GCGGGAAGGGAGTACAACTGACC (SEQ ID NO: 17)
NA         10   100830555  100830658  AGCGGCTGAAATTGGTGCGCC (SEQ ID NO: 40)      CTATGCAGGAACCCGCCGACCG (SEQ ID NO: 67)
ME3        11   86672189   86672296   AGATCCGGTGCGGGTGACAG (SEQ ID NO: 41)       CAAGCCACCACGCGGGATTA (SEQ ID NO: 68)
ME3        11   86672338   86672429   GAGCTGAGGTCTACGCGGTCCC (SEQ ID NO: 42)     CCAGCGCGGTCCACCCATTG (SEQ ID NO: 69)
TXNRD1     12   104215675  104215784  GGCTATGACTTCGCTGTTGTCACC (SEQ ID NO: 43)   GGAACTGACGGAGCCGAAGGA (SEQ ID NO: 70)
HAPLN3     15   88895676   88895781   GTCTCCAGACTCGCTGGGAACCAC (SEQ ID NO: 44)   CCCCTCCCCAAACTCTCCTATTCCA (SEQ ID NO: 71)
CYBA       16   88651139   88651205   CATCTGTAGGGTGCAGGGCTGTCC (SEQ ID NO: 45)   TATGCCTCGGCGTGGCTAGAGAGG (SEQ ID NO: 72)
ZNF568     19   36916284   36916453   TGTGTTCTGGCCGGAAGTTGAGTG (SEQ ID NO: 46)   CGAATGTTCATCCCGCGCGCAGTT (SEQ ID NO: 73)
ZNF582     19   56393606   56393725   TCCGGGAAACATAGTCTTTAGGCGT (SEQ ID NO: 47)  GAACAGCACTCCTCCGCGCACTG (SEQ ID NO: 74)
ZNF471     19   56507527   56507675   CCCCACGCGTACTCACACCGAAG (SEQ ID NO: 48)    GCGGGTAAGAGCAGGAGTGTG (SEQ ID NO: 75)
ZNF471     19   56507558   56507675   GTCGCGCGTTTCCCTCCCAG (SEQ ID NO: 49)       GCGGGTAAGAGCAGGAGTGTG (SEQ ID NO: 75)
ZNF471     19   56507662   56507750   CTGCTCTTACCCGCCGGAACCCTG (SEQ ID NO: 50)   GAGGGACCTTAGAGCAGAGCGGGC (SEQ ID NO: 76)
THBD       20   23049354   23049500   TCTGACTGGCATTGAGGAAGGTCG (SEQ ID NO: 51)   TTGGGGTCCTGGTCCTTGGCGC (SEQ ID NO: 77)
JAM2       21   25640320   25640399   CCGCGTGGTCTGGGCTCTGTAG (SEQ ID NO: 52)     GAATTCCCTCCACCTCCGCCCCAC (SEQ ID NO: 78)
SDC2       8    96493985   96494062   CTTCAGAGAGCAGCCTTCCCGG (SEQ ID NO: 53)     GAACGCGGCGCCCTCTCACTT (SEQ ID NO: 79)
FGF14      13   102394577  102394651  CAACGGAAACTTCCCGCGCTAC (SEQ ID NO: 54)     CTCGCCGGGGGCTTCGCTAC (SEQ ID NO: 80)
CDKN2A     9    21970919   21971017   GCATCTATGCGGGCATGGTTACTG (SEQ ID NO: 55)   CGTGGACCTGGCTGAGGAGCTG (SEQ ID NO: 81)
(19) For the avoidance of any doubt, any methylation biomarker provided herein can be, or be included in, among other things, a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer marker. Additionally, any methylation biomarker herein can be, or be included in, a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer methylation biomarker.
(20) In some embodiments, a methylation biomarker can be or include a single methylation locus. In some embodiments, a methylation biomarker can be or include two or more methylation loci. In some embodiments, a methylation biomarker can be or include a single differentially methylated region (DMR) (e.g., (i) a DMR selected from those listed in Tables 1-2, 7-11, 13 and 15, (ii) a DMR that encompasses a DMR selected from those listed in Tables 1-4, 7-11, 13 and 15, (iii) a DMR that overlaps with one or more DMRs selected from those listed in Table 1, or (iv) a DMR that is a portion of a DMR selected from those listed in Tables 1-4, 7-11, 13 and 15). In some embodiments, a methylation locus can be or include two or more DMRs (e.g., two, three, four, or more DMRs selected from those listed in Table 1, or two, three, four, or more DMRs, each of which overlaps with and/or encompasses a DMR selected from those listed in Tables 1-4, 7-11, 13 and 15). In some embodiments, a methylation biomarker can be or include a single methylation site. In other embodiments, a methylation biomarker can be or include two or more methylation sites. In some embodiments, a methylation locus can include two or more DMRs and further include DNA regions adjacent to one or more of the included DMRs.
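By way of non-limiting illustration, the relationships recited above between a candidate DMR and a listed DMR (identical, encompassing, overlapping, or a portion) can be sketched as interval comparisons. The `Region` class and `relation` function are hypothetical names introduced only for this example, and the coordinates are assumed to be 1-based and inclusive, which is consistent with the width column of Tables 1 and 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A genomic interval; coordinates assumed 1-based and inclusive."""
    chrom: str
    start: int
    end: int


def relation(candidate: Region, listed: Region) -> str:
    """Classify how a candidate DMR relates to a listed DMR."""
    if (candidate.chrom != listed.chrom
            or candidate.end < listed.start
            or candidate.start > listed.end):
        return "disjoint"
    if candidate == listed:
        return "identical"
    if candidate.start <= listed.start and candidate.end >= listed.end:
        return "encompasses"
    if candidate.start >= listed.start and candidate.end <= listed.end:
        return "portion"
    return "overlaps"


# HOXA7 row of Table 1 (chr 7, 27155916-27156027).
hoxa7 = Region("7", 27155916, 27156027)

print(relation(Region("7", 27155900, 27156100), hoxa7))  # encompasses
print(relation(Region("7", 27155950, 27156000), hoxa7))  # portion
print(relation(Region("7", 27156000, 27156100), hoxa7))  # overlaps
```

The candidate coordinates in the usage lines are made up for illustration; only the HOXA7 coordinates are taken from Table 1.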
(21) In some instances, a methylation locus is or includes a gene, such as a gene provided in Tables 1-4, 7-11, 13 and 15. In some instances a methylation locus is or includes a portion of a gene, e.g., a portion of a gene provided in Tables 1-4, 7-11, 13 and 15. In some instances, a methylation locus includes but is not limited to identified nucleic acid boundaries of a gene.
(22) In some instances, a methylation locus is or includes a coding region of a gene, such as a coding region of a gene provided in Tables 1-4, 7-11, 13 and 15. In some instances a methylation locus is or includes a portion of the coding region of a gene, e.g., a portion of the coding region of a gene provided in Tables 1-4, 7-11, 13 and 15. In some instances, a methylation locus includes but is not limited to identified nucleic acid boundaries of a coding region of a gene.
(23) In some instances, a methylation locus is or includes a promoter and/or other regulatory region of a gene, such as a promoter and/or other regulatory region of a gene provided in Tables 1-4, 7-11, 13 and 15. In some instances, a methylation locus is or includes a portion of the promoter and/or regulatory region of a gene, e.g., a portion of a promoter and/or regulatory region of a gene provided in Tables 1-4, 7-11, 13 and 15. In some instances, a methylation locus includes but is not limited to identified nucleic acid boundaries of a promoter and/or other regulatory region of a gene. In some embodiments a methylation locus is or includes a high CpG density promoter, or a portion thereof.
(24) In some embodiments, a methylation locus is or includes non-coding sequence. In some embodiments, a methylation locus is or includes one or more exons, and/or one or more introns.
(25) In some embodiments, a methylation locus includes a DNA region extending a predetermined number of nucleotides upstream of a coding sequence, and/or a DNA region extending a predetermined number of nucleotides downstream of a coding sequence. In various instances, a predetermined number of nucleotides upstream and/or downstream can be or include, e.g., 500 bp, 1 kb, 2 kb, 3 kb, 4 kb, 5 kb, 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, 75 kb, or 100 kb. Those of skill in the art will appreciate that methylation biomarkers capable of impacting expression of a coding sequence may typically be within any of these distances of the coding sequence, upstream and/or downstream.
(26) Those of skill in the art will appreciate that a methylation locus identified as a methylation biomarker need not necessarily be assayed in a single experiment, reaction, or amplicon. A single methylation locus identified as a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer methylation biomarker can be assayed, e.g., in a method including separate amplification of (or providing oligonucleotide primers and conditions sufficient for amplification of) one or more distinct or overlapping DNA regions within a methylation locus, e.g., one or more distinct or overlapping DMRs. Those of skill in the art will further appreciate that a methylation locus identified as a methylation biomarker need not be analyzed for methylation status of each nucleotide, nor each CpG, present within the methylation locus. Rather, a methylation locus that is a methylation biomarker may be analyzed, e.g., by analysis of a single DNA region within the methylation locus, e.g., by analysis of a single DMR within the methylation locus.
(27) DMRs of the present disclosure can be a methylation locus or include a portion of a methylation locus. In some instances, a DMR is a DNA region with a methylation locus that is, e.g., 1 to 5,000 bp in length. In various embodiments, a DMR is a DNA region with a methylation locus that is equal to or less than 5,000 bp, 4,000 bp, 3,000 bp, 2,000 bp, 1,000 bp, 950 bp, 900 bp, 850 bp, 800 bp, 750 bp, 700 bp, 650 bp, 600 bp, 550 bp, 500 bp, 450 bp, 400 bp, 350 bp, 300 bp, 250 bp, 200 bp, 150 bp, 100 bp, 50 bp, 40 bp, 30 bp, 20 bp, or 10 bp in length. In some instances, e.g., as set forth herein, a methylation locus has a length of at least 10, at least 15, at least 20, at least 30, at least 50, or at least 75 base pairs.
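As a non-limiting illustration of the length limits above, the width of a DMR can be computed from its start and end positions and compared against a chosen bound. The sketch assumes inclusive coordinates (width = end - start + 1), which reproduces the width column of Table 2; the function names and the particular bounds selected (10 bp and 5,000 bp) are choices made only for this example.

```python
MIN_LOCUS_BP = 10     # one of the lower bounds recited above
MAX_LOCUS_BP = 5000   # the largest upper bound recited above


def dmr_width(start: int, end: int) -> int:
    """Width in base pairs, assuming inclusive start and end positions."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def within_length_bounds(start: int, end: int,
                         min_bp: int = MIN_LOCUS_BP,
                         max_bp: int = MAX_LOCUS_BP) -> bool:
    """True when the region width falls inside the chosen bounds."""
    return min_bp <= dmr_width(start, end) <= max_bp


# (gene, chr, start, end, width as listed) for three rows of Table 2.
table2_rows = [
    ("MAST1", "19", 12867716, 12867820, 105),
    ("PREX1", "20", 48828337, 48828448, 112),
    ("THBD", "20", 23049354, 23049500, 147),
]

for gene, chrom, start, end, listed_width in table2_rows:
    width = dmr_width(start, end)
    print(gene, width, width == listed_width, within_length_bounds(start, end))
```

Each of the three rows yields a computed width equal to the listed width and falls within the chosen bounds.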
(28) Methylation biomarkers, including without limitation methylation loci and DMRs provided herein, can include at least one methylation site that is a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer biomarker.
(29) For clarity, those of skill in the art will appreciate that the term methylation biomarker is used broadly, such that a methylation locus can be a methylation biomarker that includes one or more DMRs, each of which DMRs is also itself a methylation biomarker, and each of which DMRs can include one or more methylation sites, each of which methylation sites is also itself a methylation biomarker. Moreover, a methylation biomarker can include two or more methylation loci. Accordingly, status as a methylation biomarker does not turn on the contiguousness of nucleic acids included in a biomarker, but rather on the existence of a change in methylation status for included DNA region(s) between a first state and a second state, such as between colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer, and controls.
(30) As provided herein, a methylation locus can be any of one or more methylation loci each of which methylation loci is, includes, or is a portion of a gene (or specific DMR) identified in Table 1. In some embodiments, a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer methylation biomarker includes a single methylation locus that is, includes, or is a portion of a gene identified in Tables 1-4, 7-11, 13 and 15.
(31) In some embodiments, a methylation biomarker includes two or more methylation loci, each of which is, includes, or is a portion of a gene identified in Tables 1-4, 7-11, 13 and 15. In some embodiments, a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer methylation biomarker includes a plurality of methylation loci, each of which is, includes, or is a portion of a gene identified in Tables 1-4, 7-11, 13 and 15.
(32) In various embodiments, a methylation biomarker can be or include one or more individual nucleotides (e.g., a single individual cytosine residue in the context of CpG) or a plurality of individual cytosine residues (e.g., of a plurality of CpGs) present within one or more methylation loci (e.g., one or more DMRs) provided herein. Thus, in certain embodiments a methylation biomarker is or includes methylation status of a plurality of individual methylation sites.
(33) In various embodiments, a methylation biomarker is, includes, or is characterized by change in methylation status that is a change in the methylation of one or more methylation sites within one or more methylation loci (e.g., one or more DMRs). In various embodiments, a methylation biomarker is or includes a change in methylation status that is a change in the number of methylated sites within one or more methylation loci (e.g., one or more DMRs). In various embodiments, a methylation biomarker is or includes a change in methylation status that is a change in the frequency of methylation sites within one or more methylation loci (e.g., one or more DMRs). In various embodiments, a methylation biomarker is or includes a change in methylation status that is a change in the pattern of methylation sites within one or more methylation loci (e.g., one or more DMRs).
(34) In various embodiments, methylation status of one or more methylation loci (e.g., one or more DMRs) is expressed as a fraction or percentage of the one or more methylation loci (e.g., the one or more DMRs) present in a sample that are methylated, e.g., as a fraction of the number of individual DNA strands in a sample that are methylated at one or more particular methylation loci (e.g., one or more particular DMRs). Those of skill in the art will appreciate that, in some instances, the fraction or percentage of methylation can be calculated from the ratio of methylated DMRs to unmethylated DMRs for one or more analyzed DMRs, e.g., within a sample.
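The arithmetic relating a methylated-to-unmethylated ratio to a methylated fraction can be sketched as follows. If M strands are methylated and U are unmethylated at a DMR, the ratio is r = M/U and the methylated fraction is M/(M + U) = r/(1 + r). The function names and the example counts are hypothetical and serve only to illustrate the calculation.

```python
import math


def fraction_from_counts(methylated: int, unmethylated: int) -> float:
    """Methylated fraction M / (M + U) from strand counts."""
    total = methylated + unmethylated
    if methylated < 0 or unmethylated < 0 or total == 0:
        raise ValueError("counts must be non-negative and not both zero")
    return methylated / total


def fraction_from_ratio(ratio: float) -> float:
    """Methylated fraction r / (1 + r) from the ratio r = M / U."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return ratio / (1.0 + ratio)


# Hypothetical counts: 30 methylated and 90 unmethylated strands at one DMR.
by_counts = fraction_from_counts(30, 90)
by_ratio = fraction_from_ratio(30 / 90)

print(f"{by_counts:.1%}")                 # 25.0%
print(math.isclose(by_counts, by_ratio))  # True
```

Both routes give the same value, so either the counts or their ratio suffices to report a percentage of methylation.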
(35) In various embodiments, methylation status of one or more methylation loci (e.g., one or more DMRs) is compared to a reference methylation status value and/or to methylation status of the one or more methylation loci (e.g., one or more DMRs) in a reference sample. In certain instances, a reference is a non-contemporaneous sample from the same source, e.g., a prior sample from the same source, e.g., from the same subject. In certain instances, a reference for the methylation status of one or more methylation loci (e.g., one or more DMRs) is the methylation status of the one or more methylation loci (e.g., one or more DMRs) in a sample (e.g., a sample from a subject), or a plurality of samples, known to represent a particular state (e.g., a cancer state or a non-cancer state). Thus, a reference can be or include one or more predetermined thresholds, which thresholds can be quantitative (e.g., a methylation value) or qualitative. Those of skill in the art will appreciate that a reference measurement is typically produced by measurement using a methodology identical to, similar to, or comparable to that by which the non-reference measurement was taken.
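One way a quantitative threshold might be derived from reference samples and applied to a test sample is sketched below. The rule used here (mean of the reference values plus k sample standard deviations) is an assumption made purely for illustration and is not stated in the present disclosure; the function names and all numeric values are likewise hypothetical.

```python
from statistics import mean, stdev


def reference_threshold(reference_values, k: float = 3.0) -> float:
    """Hypothetical rule: mean + k * sample standard deviation."""
    return mean(reference_values) + k * stdev(reference_values)


def exceeds_reference(sample_value: float, threshold: float) -> bool:
    """True when the sample's methylated fraction is above the threshold."""
    return sample_value > threshold


# Made-up methylated fractions for one DMR in non-cancer reference samples.
non_cancer_reference = [0.02, 0.03, 0.01, 0.04, 0.02]
threshold = reference_threshold(non_cancer_reference)

print(exceeds_reference(0.25, threshold))  # True
print(exceeds_reference(0.03, threshold))  # False
```

As noted above, a reference value is meaningful only when the reference and test measurements are produced by identical, similar, or comparable methodology.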
(36) Cancers
(37) In certain embodiments, methods and compositions of the present disclosure are useful for screening for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer. In other embodiments, the methods and compositions are useful for screening for other cancers. In general, examples of different types of cancers include, for example, colorectal cancer, hematopoietic cancers including leukemias, lymphomas (Hodgkin's and non-Hodgkin's), myelomas and myeloproliferative disorders, sarcomas, melanomas, adenomas, carcinomas of solid tissue, squamous cell carcinomas of the mouth, throat, larynx, and lung, liver cancer, genitourinary cancers such as prostate, cervical, bladder, uterine, and endometrial cancer and renal cell carcinomas, bone cancer, pancreatic cancer, skin cancer, cutaneous or intraocular melanoma, cancer of the endocrine system, cancer of the thyroid gland, cancer of the parathyroid gland, head and neck cancers, breast cancer, gastro-intestinal cancers and nervous system cancers, benign lesions such as papillomas, and the like. Cancers include cancers at any of the various possible stages known in the art, including, e.g., Stage 0, Stage I, Stage II, Stage III, and/or Stage IV. In certain instances, the present disclosure includes screening of early stage cancer.
(38) Subjects and Samples
(39) A sample analyzed using methods and compositions provided herein can be any biological sample and/or any sample including nucleic acids. In various particular embodiments, a sample analyzed using methods and compositions provided herein can be a sample from a mammal. In various particular embodiments, a sample analyzed using methods and compositions provided herein can be a sample from a human subject. In various particular embodiments, a sample analyzed using methods and compositions provided herein can be a sample from a mouse, rat, pig, horse, chicken, or cow.
(40) In various instances, a human subject is a subject diagnosed or seeking diagnosis as having, diagnosed as or seeking diagnosis as at risk of having, and/or diagnosed as or seeking diagnosis as at immediate risk of having, colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer. In various instances, a human subject is a subject identified as a subject in need of screening for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer. In certain instances, a human subject is a subject identified as in need of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer screening by a medical practitioner. In various instances, a human subject is identified as in need of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer screening due to age, e.g., due to an age equal to or greater than 45 years, e.g., an age equal to or greater than 45, 50, 55, 60, 65, 70, 75, 80, 85, or 90 years, though in some instances a subject 18 years old or older may be identified as at risk and/or in need of screening for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer. In various instances, a human subject is identified as being high risk and/or in need of screening for a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer based on, without limitation, familial history, prior diagnoses, and/or an evaluation by a medical practitioner. In various instances, a human subject is a subject not diagnosed as having, not at risk of having, not at immediate risk of having, and/or not seeking diagnosis for a cancer such as a colorectal cancer, breast cancer, lung cancer, pancreatic cancer, or any combination thereof.
(41) A sample from a subject, e.g., a human or other mammalian subject, can be a sample of, e.g., blood, blood component (e.g., plasma, buffy coat), cfDNA (cell free DNA), ctDNA (circulating tumor DNA), stool, or advanced adenoma and/or colorectal tissue. In some particular embodiments, a sample is an excretion or bodily fluid of a subject (e.g., stool, blood, plasma, lymph, or urine of a subject) or a tissue sample of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer. A sample from a subject can be a cell or tissue sample, e.g., a cell or tissue sample that is of a cancer or includes cancer cells, e.g., of a tumor or of a metastatic tissue. In various embodiments, a sample from a subject, e.g., a human or other mammalian subject, can be obtained by biopsy (e.g., colonoscopy resection, fine needle aspiration or tissue biopsy) or surgery.
(42) In various particular embodiments, a sample is a sample of cell-free DNA (cfDNA). cfDNA is typically found in biological fluids (e.g., plasma, serum, or urine) in short, double-stranded fragments. The concentration of cfDNA is typically low, but can significantly increase under particular conditions, including without limitation pregnancy, autoimmune disorder, myocardial infarction, and cancer. Circulating tumor DNA (ctDNA) is the component of circulating DNA specifically derived from cancer cells. ctDNA can be present in human fluids. For example, in some instances, ctDNA can be found bound to and/or associated with leukocytes and erythrocytes. In some instances, ctDNA can be found not bound to and/or associated with leukocytes and erythrocytes. Various tests for detection of tumor-derived cfDNA are based on detection of genetic or epigenetic modifications that are characteristic of cancer (e.g., of a relevant cancer). Genetic or epigenetic modifications characteristic of cancer can include, without limitation, oncogenic or cancer-associated mutations in tumor-suppressor genes, activated oncogenes, hypermethylation, and/or chromosomal disorders. Detection of genetic or epigenetic modifications characteristic of cancer or pre-cancer can confirm that detected cfDNA is ctDNA.
(43) cfDNA and ctDNA provide a real-time or nearly real-time metric of the methylation status of a source tissue. cfDNA and ctDNA have a half-life in blood of about 2 hours, such that a sample taken at a given time provides a relatively timely reflection of the status of a source tissue.
(44) Various methods of isolating nucleic acids from a sample (e.g., of isolating cfDNA from blood or plasma) are known in the art. Nucleic acids can be isolated by, e.g., without limitation, standard DNA purification techniques or direct gene capture (e.g., by clarification of a sample to remove assay-inhibiting agents and capturing a target nucleic acid, if present, from the clarified sample with a capture agent to produce a capture complex, and isolating the capture complex to recover the target nucleic acid).
(45) Methods of Measuring Methylation Status
(46) Methylation status can be measured by a variety of methods known in the art and/or by methods provided herein. Those of skill in the art will appreciate that a method for measuring methylation status can generally be applied to samples from any source and of any kind, and will further be aware of processing steps available to modify a sample into a form suitable for measurement by a given methodology. Methods of measuring methylation status include, without limitation, methods including whole genome bisulfite sequencing, targeted bisulfite sequencing, targeted enzymatic methylation sequencing, methylation-status-specific polymerase chain reaction (PCR), methods including mass spectrometry, methylation arrays, methods including methylation-specific nucleases, methods including mass-based separation, methods including target-specific capture, and methods including methylation-specific oligonucleotide primers. Certain particular assays for methylation utilize a bisulfite reagent (e.g., hydrogen sulfite ions) or enzymatic conversion reagents (e.g., Tet methylcytosine dioxygenase 2).
(47) Bisulfite reagents can include, among other things, bisulfite, disulfite, hydrogen sulfite, or combinations thereof, which reagents can be useful in distinguishing methylated and unmethylated nucleic acids. Bisulfite interacts differently with cytosine and 5-methylcytosine. In typical bisulfite-based methods, contacting of DNA with bisulfite deaminates unmethylated cytosine to uracil, while methylated cytosine remains unaffected; methylated cytosines, but not unmethylated cytosines, are selectively retained. Thus, in a bisulfite processed sample, uracil residues stand in place of, and thus provide an identifying signal for, unmethylated cytosine residues, while remaining (methylated) cytosine residues thus provide an identifying signal for methylated cytosine residues. Bisulfite processed samples can be analyzed, e.g., by next generation sequencing (NGS).
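The conversion logic described above can be illustrated with a minimal in silico sketch. The function names, the toy reference sequence, and the choice to report converted uracil as T (as it would typically be read after amplification and sequencing) are assumptions made for illustration and are not part of the disclosed methods.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment of a single DNA strand.

    Unmethylated cytosines are deaminated to uracil, which is reported
    here as T (assumption: the read is taken after amplification).
    Methylated cytosines are left unchanged.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted_read):
    """Compare a converted read with its reference, position by position.

    Returns {position: True/False} for each reference cytosine, where
    True means the cytosine was retained (methylated) and False means
    it reads as T (unmethylated).
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(
        zip(reference.upper(), converted_read.upper())
    ):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True
        elif read_base == "T":
            calls[i] = False
    return calls


# Hypothetical 8-base reference with one methylated cytosine at index 1.
reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
calls = call_methylation(reference, read)
```

In this toy example only the cytosine at index 1 survives conversion, so it is the only position called as methylated.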
(48) Enzymatic conversion reagents can include Tet methylcytosine dioxygenase 2 (TET2). TET2 oxidizes 5-methylcytosine and thus protects it from subsequent deamination by APOBEC. APOBEC deaminates unmethylated cytosine to uracil, while oxidized 5-methylcytosine remains unaffected. Thus, in a TET2 processed sample, uracil residues stand in place of, and thus provide an identifying signal for, unmethylated cytosine residues, while remaining (methylated) cytosine residues thus provide an identifying signal for methylated cytosine residues. TET2 processed samples can be analyzed, e.g., by next generation sequencing (NGS).
(49) Methods of measuring methylation status can include, without limitation, massively parallel sequencing (e.g., next-generation sequencing) to determine methylation state, e.g., sequencing-by-synthesis, real-time (e.g., single-molecule) sequencing, bead emulsion sequencing, nanopore sequencing, or other sequencing techniques known in the art. In some embodiments, a method of measuring methylation status can include whole-genome sequencing, e.g., measuring whole genome methylation status from bisulfite or enzymatically treated material with base-pair resolution.
(50) In some embodiments, methods of measuring methylation status include, without limitation, targeted bisulfite sequencing, targeted enzymatic methylation sequencing, and reduced representation bisulfite sequencing, e.g., using restriction enzymes to measure methylation status of high CpG content regions from bisulfite or enzymatically treated material with base-pair resolution.
(51) In some embodiments, a method of measuring methylation status can include targeted sequencing, e.g., measuring methylation status of pre-selected genomic locations from bisulfite or enzymatically treated material with base-pair resolution.
(52) In some embodiments, the pre-selection (capture) of regions of interest can be done by complementary in vitro synthesized oligonucleotide sequences (either baits, primers or probes).
(53) In some embodiments, a method for measuring methylation status can include Illumina Methylation Assays, e.g., measuring over 850,000 methylation sites quantitatively across a genome at single-nucleotide resolution.
(54) Various methylation assay procedures can be used in conjunction with bisulfite treatment to determine methylation status of a target sequence such as a DMR. Such assays can include, among others, Methylation-Specific Restriction Enzyme qPCR, sequencing of bisulfite-treated nucleic acid, PCR (e.g., with sequence-specific amplification), Methylation Specific Nuclease-assisted Minor-allele Enrichment PCR, and Methylation-Sensitive High Resolution Melting. In some embodiments, DMRs are amplified from a bisulfite-treated DNA sample and a DNA sequencing library is prepared for sequencing according to, e.g., an Illumina protocol or transposase-based Nextera XT protocol. In certain embodiments, high-throughput and/or next-generation sequencing techniques are used to achieve base-pair level resolution of DNA sequence, permitting analysis of methylation status.
(55) Another method that can be used for methylation detection includes PCR amplification with methylation-specific oligonucleotide primers (MSP methods), e.g., as applied to a bisulfite-treated sample (see, e.g., Herman 1996 Proc. Natl. Acad. Sci. USA 93: 9821-9826, which is herein incorporated by reference with respect to methods of determining methylation status). Use of methylation-status-specific oligonucleotide primers for amplification of bisulfite-treated DNA allows differentiation between methylated and unmethylated nucleic acids. Oligonucleotide primer pairs for use in MSP methods include at least one oligonucleotide primer capable of hybridizing with sequence that includes a methylation site, e.g., a CpG. An oligonucleotide primer that includes a T residue at a position complementary to a cytosine residue will selectively hybridize to templates in which the cytosine was unmethylated prior to bisulfite treatment, while an oligonucleotide primer that includes a G residue at a position complementary to a cytosine residue will selectively hybridize to templates in which the cytosine was methylated cytosine prior to bisulfite treatment. MSP results can be obtained with or without sequencing amplicons, e.g., using gel electrophoresis. MSP (methylation-specific PCR) allows for highly sensitive detection (detection level of 0.1% of the alleles, with full specificity) of locus-specific DNA methylation, using PCR amplification of bisulfite-converted DNA.
(56) Another method that can be used to determine methylation status after bisulfite treatment of a sample is Methylation-Sensitive High Resolution Melting (MS-HRM) PCR (see, e.g., Hussmann 2018 Methods Mol Biol. 1708:551-571, which is herein incorporated by reference with respect to methods of determining methylation status). MS-HRM is an in-tube, PCR-based method to detect methylation levels at specific loci of interest based on hybridization melting. Bisulfite treatment of the DNA prior to performing MS-HRM ensures a different base composition between methylated and unmethylated DNA, which is used to separate the resulting amplicons by high resolution melting. A unique primer design facilitates a high sensitivity of the assays enabling detection of down to 0.1-1% methylated alleles in an unmethylated background. Oligonucleotide primers for MS-HRM assays are designed to be complementary to the methylated allele, and a specific annealing temperature enables these primers to anneal both to the methylated and the unmethylated alleles thereby increasing the sensitivity of the assays.
(57) Another method that can be used to determine methylation status after bisulfite treatment of a sample is Quantitative Multiplex Methylation-Specific PCR (QM-MSP). QM-MSP uses methylation specific primers for sensitive quantification of DNA methylation (see, e.g., Fackler 2018 Methods Mol Biol. 1708:473-496, which is herein incorporated by reference with respect to methods of determining methylation status). QM-MSP is a two-step PCR approach, where in the first step, one pair of gene-specific primers (forward and reverse) amplifies the methylated and unmethylated copies of the same gene simultaneously and in multiplex, in one PCR reaction. This methylation-independent amplification step produces amplicons of up to 10^9 copies per μL after 36 cycles of PCR. In the second step, the amplicons of the first reaction are quantified with a standard curve using real-time PCR and two independent fluorophores to detect methylated/unmethylated DNA of each gene in the same well (e.g., 6FAM and VIC). One methylated copy is detectable in 100,000 reference gene copies.
(58) Another method that can be used to determine methylation status after bisulfite treatment of a sample is Methylation Specific Nuclease-assisted Minor-allele Enrichment (MS-NaME) (see, e.g., Liu 2017 Nucleic Acids Res. 45(6):e39, which is herein incorporated by reference with respect to methods of determining methylation status). MS-NaME is based on selective hybridization of probes to target sequences in the presence of a DNA nuclease specific to double-stranded (ds) DNA (DSN), such that hybridization results in regions of double-stranded DNA that are subsequently digested by the DSN. Thus, oligonucleotide probes targeting unmethylated sequences generate local double-stranded regions resulting in digestion of unmethylated targets, leaving methylated targets intact; conversely, oligonucleotide probes capable of hybridizing to methylated sequences generate local double-stranded regions that result in digestion of methylated targets, leaving unmethylated targets intact. Moreover, oligonucleotide probes can direct DSN activity to multiple targets in bisulfite-treated DNA simultaneously. Subsequent amplification can enrich non-digested sequences. MS-NaME can be used either independently or in combination with other techniques provided herein.
(59) Another method that can be used to determine methylation status after bisulfite treatment of a sample is Methylation-sensitive Single Nucleotide Primer Extension (Ms-SNuPE™) (see, e.g., Gonzalgo 2007 Nat Protoc. 2(8):1931-6, which is herein incorporated by reference with respect to methods of determining methylation status). In Ms-SNuPE, strand-specific PCR is performed to generate a DNA template for quantitative methylation analysis. SNuPE is then performed with oligonucleotide(s) designed to hybridize immediately upstream of the CpG site(s) being interrogated. Reaction products can be electrophoresed on polyacrylamide gels for visualization and quantitation by phosphor-image analysis. Amplicons can also carry a directly or indirectly detectable label such as a fluorescent label, radionuclide, or a detachable molecule fragment or other entity having a mass that can be distinguished by mass spectrometry. Detection may be carried out and/or visualized by means of, e.g., matrix assisted laser desorption/ionization mass spectrometry (MALDI) or electrospray ionization mass spectrometry (ESI).
(60) Certain methods that can be used to determine methylation status after bisulfite treatment of a sample utilize a first oligonucleotide primer, a second oligonucleotide primer, and an oligonucleotide probe in an amplification-based method. For instance, the oligonucleotide primers and probe can be used in a method of real-time polymerase chain reaction (PCR) or droplet digital PCR (ddPCR). In various instances, the first oligonucleotide primer, the second oligonucleotide primer, and/or the oligonucleotide probe selectively hybridize methylated DNA and/or unmethylated DNA, such that amplification or probe signal indicates methylation status of a sample.
(61) Other bisulfite-based methods for detecting methylation status (e.g., the presence or level of 5-methylcytosine) are disclosed, e.g., in Frommer (1992 Proc Natl Acad Sci USA. 89(5):1827-31, which is herein incorporated by reference with respect to methods of determining methylation status).
(62) In certain MSRE-qPCR embodiments, the amount of total DNA is measured in an aliquot of sample in native (e.g., undigested) form using, e.g., real-time PCR or digital PCR.
(63) Various amplification technologies can be used alone or in conjunction with other techniques described herein for detection of methylation status. Those of skill in the art, having reviewed the present specification, will understand how to combine various amplification technologies known in the art and/or described herein together with various other technologies for methylation status determination known in the art and/or provided herein. Amplification technologies include, without limitation, PCR, e.g., quantitative PCR (qPCR), real-time PCR, and/or digital PCR. Those of skill in the art will appreciate that polymerase amplification can be multiplexed to amplify multiple targets in a single reaction. PCR amplicons are typically 100 to 2000 base pairs in length. In various instances, an amplification technology is sufficient to determine methylation status.
(64) Digital PCR (dPCR) based methods involve dividing and distributing a sample across wells of a plate with 96-, 384-, or more wells, or in individual emulsion droplets (ddPCR) e.g., using a microfluidic device, such that some wells include one or more copies of template and others include no copies of template. Thus, the average number of template molecules per well is less than one prior to amplification. The number of wells in which amplification of template occurs provides a measure of template concentration. If the sample has been contacted with MSRE, the number of wells in which amplification of template occurs provides a measure of the concentration of methylated template.
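As an illustration of how a count of positive wells or droplets can be turned into a concentration estimate, the following sketch applies the Poisson correction that is commonly used in digital PCR. The Poisson model, the function names, and the idea of comparing an MSRE-digested aliquot against an undigested aliquot with equal partition counts are assumptions made for this example; they are not prescribed by the present disclosure.

```python
import math


def copies_per_partition(positive, total):
    """Estimate the mean number of template copies per partition.

    Assumes templates are distributed across partitions according to a
    Poisson distribution, so that the fraction of negative partitions
    equals exp(-lambda).
    """
    if total <= 0 or not 0 <= positive <= total:
        raise ValueError("need 0 <= positive <= total and total > 0")
    if positive == total:
        raise ValueError("all partitions positive: estimate is unbounded")
    return -math.log(1.0 - positive / total)


def methylated_fraction(pos_digested, pos_undigested, total):
    """Ratio of template surviving MSRE digestion to total template.

    Hypothetical helper: both aliquots are assumed to have been split
    into the same number of partitions.
    """
    lam_total = copies_per_partition(pos_undigested, total)
    if lam_total == 0:
        raise ValueError("no template detected in the undigested aliquot")
    return copies_per_partition(pos_digested, total) / lam_total


# Example with made-up counts: 48 of 96 wells positive.
lam = copies_per_partition(48, 96)
```

With half of the partitions positive, the estimate is ln 2, or roughly 0.69 copies per partition, which is higher than the naive value of 0.5 because some positive partitions hold more than one copy.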
(65) In various embodiments a fluorescence-based real-time PCR assay, such as MethyLight™, can be used to measure methylation status (see, e.g., Campan 2018 Methods Mol Biol. 1708:497-513, which is herein incorporated by reference with respect to methods of determining methylation status). MethyLight is a quantitative, fluorescence-based, real-time PCR method to sensitively detect and quantify DNA methylation of candidate regions of the genome. MethyLight is uniquely suited for detecting low-frequency methylated DNA regions against a high background of unmethylated DNA, as it combines methylation-specific priming with methylation-specific fluorescent probing. Additionally, MethyLight can be combined with Digital PCR, for the highly sensitive detection of individual methylated molecules, with use in disease detection and screening.
(66) Real-time PCR-based methods for use in determining methylation status typically include a step of generating a standard curve for unmethylated DNA based on analysis of external standards. A standard curve can be constructed from at least two points and can permit comparison of a real-time Ct value for digested DNA and/or a real-time Ct value for undigested DNA to known quantitative standards. In particular instances, sample Ct values can be determined for MSRE-digested and/or undigested samples or sample aliquots, and the genomic equivalents of DNA can be calculated from the standard curve. Ct values of MSRE-digested and undigested DNA can be evaluated to identify amplicons digested (e.g., efficiently digested; e.g., yielding a Ct value of 45). Amplicons not amplified under either digested or undigested conditions can also be identified. Corrected Ct values for amplicons of interest can then be directly compared across conditions to establish relative differences in methylation status between conditions. Alternatively or additionally, delta-difference between the Ct values of digested and undigested DNA can be used to establish relative differences in methylation status between conditions.
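The standard-curve and delta-Ct comparisons described above can be sketched as follows. The log-linear model (Ct as a linear function of log10 quantity), the least-squares fit, the default assumption of perfect doubling per cycle, and all names and numbers below are illustrative assumptions rather than parameters taken from the present disclosure.

```python
import math


def fit_standard_curve(standards):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept.

    `standards` is a list of (known_quantity, measured_ct) pairs and
    must contain at least two distinct quantities.
    """
    xs = [math.log10(q) for q, _ in standards]
    ys = [ct for _, ct in standards]
    n = len(xs)
    if n < 2:
        raise ValueError("a standard curve needs at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one quantity")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct, slope, intercept):
    """Read a quantity (e.g., genomic equivalents) off the curve."""
    return 10 ** ((ct - intercept) / slope)


def fraction_remaining(ct_digested, ct_undigested, efficiency=1.0):
    """Delta-Ct estimate of the template fraction surviving digestion.

    Assumes each cycle multiplies the template by (1 + efficiency);
    efficiency=1.0 corresponds to perfect doubling.
    """
    return (1.0 + efficiency) ** -(ct_digested - ct_undigested)


# Hypothetical external standards: (genomic equivalents, Ct).
slope, intercept = fit_standard_curve([(10, 33.0), (100, 30.0), (1000, 27.0)])
undigested_qty = quantity_from_ct(30.0, slope, intercept)
surviving = fraction_remaining(ct_digested=32.0, ct_undigested=30.0)
```

Under these assumptions a two-cycle shift between digested and undigested aliquots corresponds to one quarter of the template surviving digestion.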
(67) In certain particular embodiments, whole genome bisulfite sequencing, among other techniques, can be used to determine the methylation status of a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer methylation biomarker that is or includes a single methylation locus. In certain particular embodiments, whole genome bisulfite sequencing, among other techniques, can be used to determine the methylation status of a methylation biomarker that is or includes two or more methylation loci.
(68) Those of skill in the art will further appreciate that methods, reagents, and protocols for whole genome bisulfite sequencing are well-known in the art. Unlike traditional whole genome sequencing, whole genome bisulfite sequencing is able to detect the methylation status of the cytosine nucleotide, due to deamination treatment with bisulfite reagent.
(69) Those of skill in the art will appreciate that in embodiments in which a plurality of methylation loci (e.g., a plurality of DMRs) are analyzed for methylation status in a method of screening for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer provided herein, methylation status of each methylation locus can be measured or represented in any of a variety of forms, and the methylation statuses of a plurality of methylation loci (preferably each measured and/or represented in a same, similar, or comparable manner) can be together or cumulatively analyzed or represented in any of a variety of forms. In various embodiments, methylation status of each methylation locus can be measured as a methylation proportion. In various embodiments, methylation status of each methylation locus can be represented as the percentage of methylated reads among total sequencing reads, compared against a reference sample. In various embodiments, methylation status of each methylation locus can be represented as a qualitative comparison to a reference, e.g., by identification of each methylation locus as hypermethylated or hypomethylated.
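By way of illustration, the quantitative and qualitative representations described above can be sketched as follows. The read counts, the reference value, and the `margin` tolerance used to decide when a locus differs from the reference are hypothetical and are chosen only to make the example concrete.

```python
def percent_methylated(methylated_reads, total_reads):
    """Percentage of sequencing reads at a locus that are methylated."""
    if total_reads <= 0 or not 0 <= methylated_reads <= total_reads:
        raise ValueError("need 0 <= methylated_reads <= total_reads, total > 0")
    return 100.0 * methylated_reads / total_reads


def classify_locus(sample_pct, reference_pct, margin=5.0):
    """Qualitative comparison of one locus against a reference sample.

    `margin` (in percentage points) is an assumed tolerance; the
    disclosure does not specify a particular cutoff.
    """
    if sample_pct > reference_pct + margin:
        return "hypermethylated"
    if sample_pct < reference_pct - margin:
        return "hypomethylated"
    return "unchanged"


# Made-up counts for a single locus: 30 methylated reads out of 120.
sample_pct = percent_methylated(30, 120)
status = classify_locus(sample_pct, reference_pct=2.0)
```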
(70) In some embodiments in which a single methylation locus is analyzed, hypermethylation of the single methylation locus constitutes a diagnosis that a subject is suffering from or possibly suffering from a condition (e.g., colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer), while absence of hypermethylation of the single methylation locus constitutes a diagnosis that the subject is likely not suffering from a condition. In some embodiments, hypermethylation of a single methylation locus (e.g., a single DMR) of a plurality of analyzed methylation loci constitutes a diagnosis that a subject is suffering from or possibly suffering from the condition, while the absence of hypermethylation at any methylation locus of a plurality of analyzed methylation loci constitutes a diagnosis that a subject is likely not suffering from the condition. In some embodiments, hypermethylation of a determined percentage (e.g., a predetermined percentage) of methylation loci (e.g., at least 10% (e.g., at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or 100%)) of a plurality of analyzed methylation loci constitutes a diagnosis that a subject is suffering from or possibly suffering from the condition, while the absence of hypermethylation of a determined percentage (e.g., a predetermined percentage) of methylation loci (e.g., at least 10% (e.g., at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or 100%)) of a plurality of analyzed methylation loci constitutes a diagnosis that a subject is not likely suffering from the condition. 
In some embodiments, hypermethylation of a determined number (e.g., a predetermined number) of methylation loci (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, or 54 DMRs) of a plurality of analyzed methylation loci (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, or 54 DMRs) constitutes a diagnosis that a subject is suffering from or possibly suffering from the condition, while the absence of hypermethylation of a determined number (e.g., a predetermined number) of methylation loci (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, or 54 DMRs) of a plurality of analyzed methylation loci (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, or 54 DMRs) constitutes a diagnosis that a subject is not likely suffering from the condition.
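The number-based and percentage-based criteria described above can be expressed as simple decision rules. In the sketch below, the locus labels, the per-locus calls, and the threshold values are hypothetical; the thresholds actually used in any embodiment would be predetermined as described above.

```python
def count_hypermethylated(calls):
    """Number of analyzed loci called hypermethylated."""
    return sum(1 for is_hyper in calls.values() if is_hyper)


def positive_by_count(calls, min_count):
    """True if at least `min_count` loci are hypermethylated."""
    return count_hypermethylated(calls) >= min_count


def positive_by_fraction(calls, min_fraction):
    """True if at least `min_fraction` (0..1) of loci are hypermethylated."""
    if not calls:
        raise ValueError("no loci were analyzed")
    return count_hypermethylated(calls) / len(calls) >= min_fraction


# Hypothetical per-locus calls for four analyzed DMRs.
calls = {"locus_A": True, "locus_B": False, "locus_C": True, "locus_D": False}
```

For these made-up calls, a rule requiring at least two hypermethylated loci is satisfied, as is a rule requiring at least 50% of loci, while a rule requiring three loci is not.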
(71) In some embodiments, methylation status of a plurality of methylation loci (e.g., a plurality of DMRs) is measured qualitatively or quantitatively and the measurements for each of the plurality of methylation loci are combined to provide a diagnosis. In some embodiments, the qualitatively or quantitatively measured methylation status of each of a plurality of methylation loci is individually weighted, and weighted values are combined to provide a single value that can be compared to a reference in order to provide a diagnosis.
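One simple way to combine individually weighted measurements into a single value is a weighted sum, sketched below. The linear form of the combination, the locus labels, the weights, the measured values, and the reference cutoff are all assumptions made for illustration; the present disclosure does not limit the combination to this form.

```python
def weighted_score(values, weights):
    """Combine per-locus methylation values into one weighted sum.

    `values` and `weights` are dicts keyed by locus label and must
    cover the same loci.
    """
    if set(values) != set(weights):
        raise ValueError("values and weights must cover the same loci")
    return sum(weights[locus] * values[locus] for locus in values)


def exceeds_reference(score, reference):
    """Compare the combined value with a reference cutoff."""
    return score > reference


# Hypothetical percent-methylation values and weights for three loci.
values = {"locus_A": 40.0, "locus_B": 10.0, "locus_C": 2.0}
weights = {"locus_A": 0.5, "locus_B": 0.25, "locus_C": 0.25}
score = weighted_score(values, weights)
flagged = exceeds_reference(score, reference=15.0)
```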
(72) Applications
(73) Methods and compositions of the present disclosure can be used in any of a variety of applications. For example, methods and compositions of the present disclosure can be used to screen, or aid in screening for a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer. In various instances, screening using methods and compositions of the present disclosure can detect any stage of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer, including without limitation early-stage cancer. In some embodiments, screening using methods and compositions of the present disclosure is applied to individuals 45 years of age or older, e.g., 45, 50, 55, 60, 65, 70, 75, 80, 85, or 90 years or older. In some embodiments, screening using methods and compositions of the present disclosure is applied to individuals 20 years of age or older, e.g., 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, or 90 years or older. In some embodiments, screening using methods and compositions of the present disclosure is applied to individuals 20 to 50 years of age, e.g., 20 to 30 years of age, 20 to 40 years of age, 20 to 50 years of age, 30 to 40 years of age, 30 to 50 years of age, or 40 to 50 years of age. In various embodiments, screening using methods and compositions of the present disclosure is applied to individuals experiencing abdominal pain or discomfort, e.g., experiencing undiagnosed or incompletely diagnosed abdominal pain or discomfort. In various embodiments, screening using methods and compositions of the present disclosure is applied to individuals experiencing no symptoms likely to be associated with a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer. Thus, in certain embodiments, screening using methods and compositions of the present disclosure is fully or partially preventative or prophylactic, at least with respect to later or non-early stages of cancer.
(74) In various embodiments, colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer screening using methods and compositions of the present disclosure can be applied to an asymptomatic human subject. As used herein, a subject can be referred to as “asymptomatic” if the subject does not report, and/or demonstrate by non-invasively observable indicia (e.g., without one, several, or all of device-based probing, tissue sample analysis, bodily fluid analysis, surgery, or colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer screening), sufficient characteristics of the condition to support a medically reasonable suspicion that the subject is likely suffering from the condition. Detection of a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer is particularly likely in asymptomatic individuals screened in accordance with methods and compositions of the present disclosure.
(75) Those of skill in the art will appreciate that regular, preventative, and/or prophylactic screening for a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer improves diagnosis. As noted above, early stage cancers include, according to at least one system of cancer staging, Stages 0 to II C of cancer. Thus, the present disclosure provides, among other things, methods and compositions particularly useful for the diagnosis and treatment of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer. Generally, and particularly in embodiments in which screening in accordance with the present disclosure is carried out annually, and/or in which a subject is asymptomatic at time of screening, methods and compositions of the present invention are especially likely to detect early stage cancer.
(76) In various embodiments, colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer screening in accordance with the present disclosure is performed once for a given subject or multiple times for a given subject. In various embodiments, colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer screening in accordance with the present disclosure is performed on a regular basis, e.g., every six months, annually, every two years, every three years, every four years, every five years, or every ten years.
(77) In various embodiments, screening using methods and compositions disclosed herein will provide a diagnosis of a condition (e.g., a type or class of a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer). In other instances, screening for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer using methods and compositions disclosed herein will be indicative of having one or more conditions, but not definitive for diagnosis of a particular condition. For example, screening may be used to classify a subject as having one or more conditions or combination of conditions including, but not limited to, colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer. Screening may also be used to classify a subject as having a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer without identifying which condition the subject has. In various instances, screening using methods and compositions of the present disclosure can be followed by a further diagnosis-confirmatory assay, which further assay can confirm, support, undermine, or reject a diagnosis resulting from prior screening, e.g., screening in accordance with the present disclosure.
(78) As used herein, a diagnosis-confirmatory assay can be a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer assay that provides a diagnosis recognized as definitive by medical practitioners, e.g., a colonoscopy-based diagnosis, or a colorectal cancer assay that substantially increases or decreases the likelihood that a prior diagnosis was correct, e.g., a diagnosis resulting from screening in accordance with the present disclosure. Diagnosis-confirmatory assays could include existing screening technologies, which are generally in need of improvement with respect to one or more of sensitivity, specificity, and non-invasiveness, particularly in the detection of early stage cancers.
(79) In some instances, a diagnosis-confirmatory assay is a test that is or includes a visual or structural inspection of subject tissues, e.g., by colonoscopy. In some embodiments, colonoscopy includes or is followed by histological analysis. Visual and/or structural assays for cancer can include inspection of the structure of the colon and/or rectum for any abnormal tissues and/or structures. Visual and/or structural inspection can be conducted, for example, by use of a scope via the rectum or by CT-scan. In some instances, a diagnosis-confirmatory assay is a colonoscopy, e.g., including or followed by histological analysis. According to some reports, colonoscopy is currently the predominant and/or most relied upon diagnosis-confirmatory assay.
(80) Another visual and/or structural diagnosis-confirmatory assay based on computed tomography (CT) is CT colonography, sometimes referred to as virtual colonoscopy. A CT scan utilizes numerous x-ray images of the colon and/or rectum to produce dimensional representations of the colon. Although useful as a diagnosis-confirmatory assay, some reports suggest that CT colonography is not sufficient for replacement of colonoscopy, at least in part because a medical practitioner has not physically accessed the subject's colon to obtain tissue for histological analysis.
(81) Another diagnosis-confirmatory assay can be a sigmoidoscopy. In sigmoidoscopy, a sigmoidoscope is used via the rectum to image portions of the colon and/or rectum. According to some reports, sigmoidoscopy is not widely used.
(82) In some instances, a diagnosis-confirmatory assay is a stool-based assay. Typically, stool-based assays, when used in place of visual or structural inspection, are recommended to be utilized at a greater frequency than would be required if using visual or structural inspection. In some instances, a diagnosis-confirmatory assay is a guaiac-based fecal occult blood test or a fecal immunochemical test (gFOBTs/FITs) (see, e.g., Navarro 2017 World J Gastroenterol. 23(20):3632-3642, which is herein incorporated by reference with respect to colorectal cancer assays). FOBTs and FITs are sometimes used for diagnosis of colorectal cancer (see, e.g., Nakamura 2010 J Diabetes Investig. October 19; 1(5):208-11, which is herein incorporated by reference with respect to colorectal cancer assays). FIT is based on detection of occult blood in stool, the presence of which is often indicative of colorectal cancer but is often not in sufficient volume to permit identification by the unaided eye. For example, a typical FIT utilizes a hemoglobin-specific reagent to test for occult blood in a stool sample. In various instances, FIT kits are suitable for use by individuals in their own homes. When used in the absence of other diagnosis-confirmatory assays, FIT may be recommended for use on an annual basis. FIT is generally not relied upon to provide sufficient diagnostic information for conclusive diagnosis of colorectal cancer.
(83) Diagnosis-confirmatory assays also include gFOBT, which is designed to detect occult blood in stool by chemical reaction. Like FIT, when used in the absence of other diagnosis-confirmatory assays, gFOBT may be recommended for use on an annual basis. gFOBT is generally not relied upon to provide sufficient diagnostic information for conclusive diagnosis of colorectal cancer.
(84) Diagnosis-confirmatory assays can also include stool DNA testing. Stool DNA testing for colorectal cancer can be designed to identify DNA sequences characteristic of cancer in stool samples. When used in the absence of other diagnosis-confirmatory assays, stool DNA testing may be recommended for use every three years. Stool DNA testing is generally not relied upon to provide sufficient diagnostic information for conclusive diagnosis of colorectal cancer.
(85) One particular screening technology is a stool-based screening test (Cologuard®, Exact Sciences Corporation, Madison, Wis., United States), which combines an FIT assay with analysis of DNA for abnormal modifications, such as mutation and methylation. The Cologuard® test demonstrates improved sensitivity as compared to FIT assay alone, but can be clinically impracticable or ineffective due to low compliance rates, which are at least in part due to subject dislike of using stool-based assays (see, e.g., doi: 10.1056/NEJMc1405215 (e.g., 2014 N Engl J Med. 371(2):184-188)). The Cologuard® test appears to leave almost half of the eligible population out of the screening programs (see, e.g., van der Vlugt 2017 Br J Cancer. 116(1):44-49). Use of screening as provided herein, e.g., by a blood-based analysis, would increase the number of individuals electing to screen for colorectal cancer (see, e.g., Adler 2014 BMC Gastroenterol. 14:183; Liles 2017 Cancer Treatment and Research Communications 10: 27-31). To present knowledge, only one existing screening technology for colorectal cancer, Epiprocolon, is FDA-approved and CE-IVD marked and is blood-based. Epiprocolon is based on hypermethylation of the SEPT9 gene. The Epiprocolon test suffers from low accuracy for colorectal cancer detection, with sensitivity of 68% and advanced adenoma sensitivity of only 22% (see, e.g., Potter 2014 Clin Chem. 60(9):1183-91). There is a need in the art for, among other things, a non-invasive colorectal cancer screen that will likely achieve high subject adherence with high and/or improved specificity and/or sensitivity.
(86) In various embodiments, screening in accordance with methods and compositions of the present disclosure reduces colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer mortality, e.g., by early colorectal cancer diagnosis. Data supports that colorectal cancer screening reduces colorectal cancer mortality, which effect persisted for over 30 years (see, e.g., Shaukat 2013 N Engl J Med. 369(12):1106-14). Moreover, colorectal cancer is particularly difficult to treat at least in part because colorectal cancer, absent timely screening, may not be detected until cancer is past early stages. For at least this reason, treatment of colorectal cancer is often unsuccessful. To maximize population-wide improvement of colorectal cancer outcomes, utilization of screening in accordance with the present disclosure can be paired with, e.g., recruitment of eligible subjects to ensure widespread screening.
(87) In various embodiments, screening of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer including one or more methods and/or compositions disclosed herein is followed by treatment of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer, e.g., treatment of early stage cancer. In various embodiments, treatment of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer, e.g., early stage cancer, includes administration of a therapeutic regimen including one or more of surgery, radiation therapy, and chemotherapy. In various embodiments, treatment of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer, e.g., early stage cancer, includes administration of a therapeutic regimen.
(88) In various embodiments, treatment of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer includes treatment of early stage cancer by surgical removal of cancerous tissue.
(89) In various embodiments, treatment of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer includes treatment of early stage cancer by one or more of surgical removal of cancerous tissue, surgery to remove lymph nodes near to identified colorectal cancer tissue, and chemotherapy (e.g., administration of one or more of 5-FU and leucovorin, oxaliplatin, or capecitabine).
(90) In various embodiments, treatment of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer includes treatment by one or more of surgical removal of cancerous tissue, surgical removal of lymph nodes near to identified colorectal cancer tissue, chemotherapy (e.g., administration of one or more of 5-FU, leucovorin, oxaliplatin, capecitabine, e.g., in a combination of (i) 5-FU and leucovorin, (ii) 5-FU, leucovorin, and oxaliplatin (e.g., FOLFOX), or (iii) capecitabine and oxaliplatin (e.g., CAPEOX)), and radiation therapy.
(91) In various embodiments, treatment of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer includes one or more of surgical removal of cancerous tissue, surgical removal of lymph nodes near to identified colorectal cancer tissue, surgical removal of metastases, chemotherapy (e.g., administration of one or more of 5-FU, leucovorin, oxaliplatin, capecitabine, irinotecan, VEGF-targeted therapeutic agent (e.g., bevacizumab, ziv-aflibercept, or ramucirumab), EGFR-targeted therapeutic agent (e.g., cetuximab or panitumumab), Regorafenib, trifluridine, and tipiracil, e.g., in a combination of or including (i) 5-FU and leucovorin, (ii) 5-FU, leucovorin, and oxaliplatin (e.g., FOLFOX), (iii) capecitabine and oxaliplatin (e.g., CAPEOX), (iv) leucovorin, 5-FU, oxaliplatin, and irinotecan (FOLFOXIRI), and (v) trifluridine and tipiracil (Lonsurf)), radiation therapy, hepatic artery infusion (e.g., if cancer has metastasized to liver), ablation of tumors, embolization of tumors, colon stent, colorectomy, colostomy (e.g., diverting colostomy), and immunotherapy (e.g., pembrolizumab).
(92) Those of skill in the art understand that treatments of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer provided herein can be utilized, e.g., as determined by a medical practitioner, alone or in any combination, in any order, regimen, and/or therapeutic program. Those of skill in the art will further appreciate that advanced treatment options may be appropriate for earlier stage cancers in subjects previously having suffered a cancer.
(93) In some embodiments, methods and compositions for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer screening provided herein can inform treatment and/or payment (e.g., reimbursement for or reduction of cost of medical care, such as screening or treatment) decisions and/or actions, e.g., by individuals, healthcare facilities, healthcare practitioners, health insurance providers, governmental bodies, or other parties interested in healthcare cost.
(94) In some embodiments, methods and compositions for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer screening provided herein can inform decision making relating to whether health insurance providers reimburse a healthcare cost payer or recipient (or not), e.g., for (1) screening itself (e.g., reimbursement for screening otherwise unavailable, available only for periodic/regular screening, or available only for temporally- and/or incidentally-motivated screening); and/or for (2) treatment, including initiating, maintaining, and/or altering therapy, e.g., based on screening results. For example, in some embodiments, methods and compositions for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer screening provided herein are used as the basis for, to contribute to, or support a determination as to whether a reimbursement or cost reduction will be provided to a healthcare cost payer or recipient. In some instances, a party seeking reimbursement or cost reduction can provide results of a screen conducted in accordance with the present specification together with a request for such reimbursement or cost reduction of a healthcare cost. In some instances, a party making a determination as to whether or not to provide a reimbursement or cost reduction of a healthcare cost will reach a determination based in whole or in part upon receipt and/or review of results of a screen conducted in accordance with the present specification.
(95) For the avoidance of any doubt, those of skill in the art will appreciate from the present disclosure that methods and compositions for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer diagnosis of the present specification are at least for in vitro use. Accordingly, all aspects and embodiments of the present disclosure can be performed and/or used at least in vitro.
(96) Kits
(97) The present disclosure includes, among other things, kits including one or more compositions for use in screening as provided herein, optionally in combination with instructions for use thereof in screening (e.g., screening for a colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer (e.g., early-stage cancer)). In various embodiments, a kit for screening for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer can include one or more oligonucleotide capture baits (e.g., one or more biotinylated oligonucleotide probes). In certain embodiments, the kit for screening optionally includes one or more bisulfite reagents as disclosed herein. In certain embodiments, the kit for screening optionally includes one or more enzymatic conversion reagents as disclosed herein.
(98) Oligonucleotide capture baits are useful in next generation sequencing (NGS) techniques to target particular regions of interest of DNA. In certain embodiments, one or more capture baits are targeted to capture a region of interest of the DNA corresponding to one or more methylation loci (e.g., methylation loci comprising at least a portion of one or more DMRs, e.g., as found in Tables 1-4, 7-11, 13 and 15). Oligonucleotide capture baits are intended to enrich the target DNA region and aid in preparation of a DNA library. The enriched target region will then be sequenced using, for example, an NGS sequencing technique as discussed herein.
(99) In various embodiments, a kit for screening can include one or more of: one or more oligonucleotide primers (e.g., one or more oligonucleotide primer pairs), one or more MSREs, one or more reagents for qPCR (e.g., reagents sufficient for a complete qPCR reaction mixture, including without limitation dNTP and polymerase), and instructions for use of one or more components of the kit for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer screening. In various embodiments, a kit for screening of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer can include one or more of: one or more oligonucleotide primers (e.g., one or more oligonucleotide primer pairs e.g., as provided in Tables 3 and 4), one or more bisulfite reagents, one or more reagents for qPCR (e.g., reagents sufficient for a complete qPCR reaction mixture, including without limitation dNTP and polymerase), and instructions for use of one or more components of the kit for colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer screening.
(100) In certain embodiments, a kit of the present disclosure includes at least one oligonucleotide primer pair (e.g., as provided in Tables 3 and 4) for amplification of a methylation locus and/or DMR as disclosed herein (e.g., in Tables 1-4, 7-11, 13 and 15).
(101) In some instances, a kit of the present disclosure includes one or more oligonucleotide primer pairs for amplification of one or more methylation loci of the present disclosure. In some instances, a kit of the present disclosure includes one or more oligonucleotide primer pairs for amplification of one or more methylation loci that are or include all or a portion of one or more genes identified in Tables 1-4, 7-11, 13 and 15. In some particular instances, a kit of the present disclosure includes oligonucleotide primer pairs for a plurality of methylation loci that each are or include all or a portion of a gene identified in Table 1, the plurality of methylation loci including, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, or 54 methylation loci, e.g., as provided in Tables 1-4, 7-11, 13 and 15.
(102) In some instances, a kit of the present disclosure includes one or more oligonucleotide primer pairs for amplification of one or more DMRs of the present disclosure. In some instances, a kit of the present disclosure includes one or more oligonucleotide primer pairs for amplification of one or more DMRs that are, include all or a portion of, or are within a gene identified in Tables 1-4, 7-11, 13 and 15. In some instances, a kit of the present disclosure includes one or more oligonucleotide primer pairs for amplification of one or more DMRs that are not associated with a presently known gene. In some particular embodiments, a kit of the present disclosure includes oligonucleotide primer pairs for a plurality of DMRs, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, or 54 DMRs, e.g., as provided in Tables 1-4, 7-11, 13 and 15.
(103) A kit of the present disclosure can further include one or more MSREs individually or in a single solution. In various embodiments, one or more MSREs are selected from the set of MSREs including AciI, Hin6I, HpyCH4IV, and HpaII (e.g., such that the kit includes AciI, Hin6I, and HpyCH4IV, either individually or in a single solution). In certain embodiments, a kit of the present disclosure includes one or more reagents for qPCR (e.g., reagents sufficient for a complete qPCR reaction mixture, including without limitation dNTP and polymerase).
EXAMPLES
(104) The present example includes identification of markers relevant to diagnosis of and/or classification of colorectal cancer, breast cancer, lung cancer and/or pancreatic cancer.
(105) Biomarker Discovery
(106) The present Example includes identification of CpG loci that are hypermethylated in one or more of colorectal cancer, breast cancer, lung cancer, and pancreatic cancer as compared to healthy control samples. An initial discovery study included data from The Cancer Genome Atlas (TCGA) Research Network (http://cancergenome.nih.gov/) for colorectal cancer (TCGA-COAD, TCGA-READ), breast cancer (TCGA-BRCA), lung cancer (TCGA-LUAD, TCGA-LUSC), pancreatic cancer (TCGA-PAAD), leukocytes and normal tissue.
(107) For significant marker selection, DNA methylation data from the histologically normal tissues and leukocytes were used to identify CpG methylation sites that lacked tissue-specific DNA methylation (mean b-value <0.25 and b-value >0.3 in no more than five samples across the entire set). From the resulting list of sites, only sites that had a mean b-value difference >0.1 were included. The sites were sorted to determine which sites contributed significantly to separation between normal and cancer samples. The sites were then filtered further based on sorting. As a result of all these filtering steps, 75 breast cancer specific CpG sites, 95 lung cancer specific CpG sites, and 35 pancreatic cancer sites were obtained.
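The filtering thresholds described above can be illustrated with a minimal sketch. It assumes that the "mean b-value difference" is the cancer mean minus the normal mean (consistent with the search for hypermethylated loci); the function name, the site identifiers and the toy values are hypothetical, and the subsequent sorting step is not shown.

```python
from statistics import mean

def select_sites(normal_betas, cancer_betas, max_mean=0.25,
                 high=0.3, max_high_samples=5, min_diff=0.1):
    """Return CpG sites passing the thresholds named in the text.

    normal_betas / cancer_betas: dict mapping site id -> list of b-values.
    Assumption: the difference is taken as cancer mean minus normal mean.
    """
    selected = []
    for site, normal in normal_betas.items():
        normal_mean = mean(normal)
        # Number of normal/leukocyte samples with b-value above 0.3
        n_high = sum(1 for b in normal if b > high)
        if normal_mean >= max_mean or n_high > max_high_samples:
            continue  # site shows methylation in normal tissue
        if mean(cancer_betas[site]) - normal_mean > min_diff:
            selected.append(site)
    return selected

# Toy, made-up values for three hypothetical sites
normal = {"cg_a": [0.05, 0.10, 0.08],
          "cg_b": [0.40, 0.35, 0.20],
          "cg_c": [0.05, 0.06, 0.07]}
cancer = {"cg_a": [0.50, 0.45, 0.60],
          "cg_b": [0.90, 0.80, 0.85],
          "cg_c": [0.10, 0.08, 0.12]}
kept = select_sites(normal, cancer)
```

In this toy input only `cg_a` is kept: `cg_b` is methylated in normal tissue, and `cg_c` differs too little between the groups.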
(108) The individual CpG methylation sites serve as markers for cancers as indicated herein. The individual CpG methylation sites are found within the DMRs and loci as described herein.
(109) MSRE-qPCR Validation of Selected Regions
(110) The present Example describes an assay for determining the methylation status of colorectal cancer, breast cancer, lung cancer, and pancreatic cancer methylation biomarkers based on circulating cell free DNA (cfDNA). cfDNA is incomplete and fragmented, and the mechanism by which the cfDNA is transmitted from cancer cells to blood (as a portion called circulating tumor DNA) is unknown.
(111) For screening purposes, it is important to allow diagnostic marker detection from a readily obtainable biospecimen, such as blood, urine or stool. Confirming that tissue markers exist in blood, however, is challenging due to low concentration of circulating tumor-derived DNA (0.1-1%) as compared to non-tumor cfDNA background. For blood-based confirmatory testing, Methylation-Sensitive Restriction Enzyme (MSRE)-qPCR technology was utilized. MSRE-qPCR enables detection of <10 copies of targets in highly multiplexed format, making it suitable for use in low tumor derived circulating DNA context. Design of MSRE-qPCR assays is usually less complicated than the design of bisulfite-based assays, as “native” DNA is targeted with no prior chemical alterations required. However, primer selection requires covering a target region (e.g., a DMR) that presents at least one MSRE-cut-site (i.e., an MSRE-cut-site that covers at least one cancer methylation biomarker site, such that cleavage of the MSRE-cut-site is permitted in nucleic acid molecules where all of the at least one cancer methylation biomarker sites are unmethylated and blocked in nucleic acid molecules where at least one of the at least one cancer methylation biomarker sites is methylated).
(112) CpG-rich regions, which are also candidate regions for methylation differences (e.g., as found in, for example, DMRs), are preferred targets for MSRE-qPCR assay design, as they typically contain a large number of MSRE cut-sites and thus normally a high assay development success-rate is expected. Furthermore, the use of more than one MSRE is preferred. In certain embodiments, restriction enzymes including AciI, Hin6I, HpyCH4IV, and HpaII provide a high coverage of CpG-rich sequences. Assays developed to cover tissue-derived CpG targets were then evaluated for their utility for plasma-based marker detection and clinical prediction by using DNA extracted from plasma of patients found to have colorectal cancer, breast cancer, lung cancer, pancreatic cancer and control patients known not to have any cancer-related symptoms.
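As an illustration of why CpG-rich targets suit MSRE-qPCR design, the following sketch counts candidate cut-sites in a target sequence. The recognition motifs (AciI: CCGC/GCGG, Hin6I: GCGC, HpyCH4IV: ACGT, HpaII: CCGG) are the commonly documented ones and are not stated in the present text, so they should be checked against the enzyme supplier's documentation; the function name and the toy sequence are hypothetical.

```python
# Assumed recognition motifs (not taken from the present text).
# AciI is non-palindromic, so both strand orientations are listed.
MSRE_SITES = {
    "AciI": ("CCGC", "GCGG"),
    "Hin6I": ("GCGC",),
    "HpyCH4IV": ("ACGT",),
    "HpaII": ("CCGG",),
}

def count_cut_sites(sequence, enzymes=("AciI", "Hin6I", "HpyCH4IV")):
    """Count (possibly overlapping) recognition motifs per enzyme."""
    seq = sequence.upper().replace(" ", "")
    counts = {}
    for enzyme in enzymes:
        total = 0
        for motif in MSRE_SITES[enzyme]:
            start = seq.find(motif)
            while start != -1:
                total += 1
                start = seq.find(motif, start + 1)
        counts[enzyme] = total
    return counts

# Toy sequence, not one of the disclosed amplicons
toy_counts = count_cut_sites("ACGTCCGCGGGCGC")
```

A target with zero counts for every enzyme in the chosen set would not be usable in this assay format, since digestion could not distinguish its methylated from its unmethylated form.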
(113) An exemplary workflow is described herein. As performed in the present example, 4 ml of plasma was collected from plasma samples of 101 patients with colorectal cancer (N=20), breast cancer (N=29), lung cancer (N=37), or pancreatic cancer (N=15) and of 71 age/gender-matched non-cancer controls, for analysis using a methylation-sensitive restriction enzyme qPCR approach. Cancer patients who had received curative treatment prior to blood collection were excluded from the study. The non-cancer patients had no clinical symptoms of cancer at the time of recruitment. The sample cohort is further described in Table 5.
(114) cfDNA was extracted from the plasma samples with the QIAamp MinElute ccfDNA Kit for manual isolation, following the protocol defined by the manufacturer (QIAamp MinElute ccfDNA Handbook 08/2018, Qiagen).
(115) One-third of the eluted cfDNA volume was directly used for PCR amplification of the target regions and consecutive qPCR analysis. This reaction functions as a quality control, showing whether a target of interest is detectable and quantifiable from plasma in its native DNA format. The remaining two-thirds of the initially eluted cfDNA volume was used for digestion with methylation-sensitive restriction enzymes. MSRE-qPCR assays of the present Examples utilize the MSREs AciI, Hin6I, and HpyCH4IV, which together were found to provide sufficient coverage.
(116) MSRE-qPCR oligonucleotide primer pairs as described herein were successfully developed for amplification of DMRs. DMRs included 1-15 MSRE cut-sites per target to enrich for the methylation-derived signal. Methylation-sensitive restriction enzymes target unmethylated DNA regions. Application of the MSREs resulted in the digestion and elimination of the unmethylated DNA strand(s) from the sample. The remaining methylated regions of DNA (e.g., methylated loci including DMRs) were left intact and quantifiable.
(117) TABLE 5 Characteristics for sample cohort used in this study, indicating samples used in (i) the Pilot cohort for initial marker evaluation and prediction model development and (ii) the Validation cohort samples that were used for prediction algorithm validation

| Characteristics | Control (n = 71) | Colorectal Cancer (n = 20) | Pancreatic Cancer (n = 15) | Lung Cancer (n = 37) | Breast Cancer (n = 29) |
|---|---|---|---|---|---|
| Age (years, average (IQR)) | 60 (32-78) | 59 (28-77) | 60 (50-71) | 59 (42-75) | 57 (41-77) |
| Gender (n (%)): Female | 36 (51%) | 10 (50%) | 7 (47%) | 12 (32%) | 29 (100%) |
| Gender (n (%)): Male | 35 (49%) | 10 (50%) | 8 (53%) | 25 (68%) | |
| Stage I | | 6 | | 15 | 7 |
| Stage II | | 9 | | 4 | 8 |
| Stage III | | 5 | 15 | 17 | 13 |
| Stage unknown | | | | 1 | 1 |
(118) The qPCR cycle threshold (CT) values were used for the consecutive data analyses. Data normalization was performed by calculating the delta-CT (dCT) value for each marker, i.e., by subtracting the CT value of the undigested reaction from the CT value of the digested reaction. The R version 3.3.2 software was used for data analysis.
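The dCT normalization can be sketched as follows. The analysis in the present Example was performed in R; Python is used here only for illustration, and the function name and CT values are hypothetical.

```python
def delta_ct(ct_digested, ct_undigested):
    """dCT per marker = CT(digested reaction) - CT(undigested reaction)."""
    return {marker: ct_digested[marker] - ct_undigested[marker]
            for marker in ct_digested if marker in ct_undigested}

# Made-up CT values for two markers from the panels
digested = {"THBD": 31.5, "JAM2": 36.0}
undigested = {"THBD": 30.0, "JAM2": 29.5}
dct = delta_ct(digested, undigested)
# dct == {"THBD": 1.5, "JAM2": 6.5}
```

Because digestion removes unmethylated template, a small dCT suggests that most copies of the target survived digestion (i.e., were methylated), whereas a large dCT suggests a mostly unmethylated target.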
(119) The pan-cancer detection assay was built using a 2-step classification tree method. In detail, a first level of decision making was implemented to separate control patients from cancer patients, regardless of the cancer location. In the current Example, control patients were used as a reference point. For the first-level decision-making process, the potential of the methylation markers was evaluated by first ranking the features using a random forest (RF) classification algorithm with Monte-Carlo cross-validation over 50 runs on the cancer vs control groups. Features with variance of importance (VIP) >2 were then used for classification model building and testing of the algorithm.
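The resampling part of Monte-Carlo cross-validation can be sketched as repeated random train/test splits. The 30% test fraction, the per-class stratification and the seed are assumptions made for illustration, since the text only states that 50 runs were used; the random forest fitting and feature ranking performed in each run are not shown.

```python
import random

def monte_carlo_splits(labels, n_runs=50, test_fraction=0.3, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits.

    Assumptions: 30% of each class is held out per run, and splits are
    drawn independently (with replacement across runs).
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for _ in range(n_runs):
        test = []
        for indices in by_class.values():
            k = max(1, round(len(indices) * test_fraction))
            test.extend(rng.sample(indices, k))
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, sorted(test)

# Cohort sizes from the Example: 101 cancer patients, 71 controls
labels = ["cancer"] * 101 + ["control"] * 71
splits = list(monte_carlo_splits(labels))
```

In a full analysis, a classifier would be fitted on each training set and scored on the matching test set, and feature importances would be averaged over the 50 runs.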
(120) The second level of decision making was intended to establish from which pre-defined tissue the detected cancer could have originated. At this level, random forest decision tree building was utilized to compare the 4 cancer types against each other. Decision trees were built in a multi-group setting where each cancer type was compared against the other 3 (e.g., colorectal cancer vs combined lung, breast and pancreatic cancer, or lung cancer vs colorectal, breast and pancreatic cancer). 500 random forest trees were run to find the marker set that gives the lowest out-of-bag (OOB) error, which in turn means the highest separation between the multiple groups, i.e., the different cancer types. Accuracy for correct cancer type identification was defined as the fraction of samples of each cancer type falling under the correct, pre-defined cancer type.
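The one-versus-rest grouping and the per-type accuracy definition given above can be sketched as follows. The helper names and the toy predictions are hypothetical, and the random forest itself is not shown.

```python
def one_vs_rest_labels(sample_types, target):
    """Label samples 1 for the target cancer type, 0 for the other types."""
    return [1 if t == target else 0 for t in sample_types]

def per_type_accuracy(true_types, predicted_types):
    """Fraction of samples of each type assigned to their own type."""
    totals, correct = {}, {}
    for truth, predicted in zip(true_types, predicted_types):
        totals[truth] = totals.get(truth, 0) + 1
        correct[truth] = correct.get(truth, 0) + (truth == predicted)
    return {t: correct[t] / totals[t] for t in totals}

# Toy example with made-up predictions
true_types = ["lung", "lung", "breast", "colorectal"]
predicted = ["lung", "breast", "breast", "colorectal"]
lung_vs_rest = one_vs_rest_labels(true_types, "lung")
accuracy = per_type_accuracy(true_types, predicted)
```

Here one of the two lung samples is assigned to the wrong tissue, so the lung accuracy is 0.5 while the other two types score 1.0.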
(121) Results
(122) As shown in the below table, accuracy was defined as the fraction of correct calls (e.g., the fraction correctly identified as cancer). Specificity indicates an ability to separate control (e.g., non-cancer) patients correctly from all cancer patients. Sensitivity indicates an ability to separate cancer patients from control (e.g., non-cancer) patients.
(123) Random forest cross-validation analysis revealed that with 3 markers an area under curve (AUC) of 76% can be achieved. Increasing the number of markers in a panel to 30 methylation markers (e.g., Table 11) showed pan-cancer detection potential with AUC of 90%, where sensitivity of detecting cancer of any origin was 78% at 87% specificity (Table 6), with 100% of the pancreatic cancer (15/15), 95% of the CRC (19/20), 72% of the breast cancer (21/29) and 65% of lung cancer (24/37) correctly identified as cancer patients. Sensitivity was 61% (17/28) for stage I cancers, 67% (14/21) for stage II cancers and 92% (46/50) for stage III cancers.
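The per-type and overall sensitivities quoted above follow directly from the reported counts, as the short check below shows. The counts are taken from the preceding paragraph; the variable names are illustrative only.

```python
# (correctly identified as cancer, total) per cancer type, from the text
DETECTED = {
    "pancreatic": (15, 15),
    "colorectal": (19, 20),
    "breast": (21, 29),
    "lung": (24, 37),
}

# Sensitivity in percent, rounded to the nearest integer
per_type = {name: round(100 * hit / total)
            for name, (hit, total) in DETECTED.items()}

total_hit = sum(hit for hit, _ in DETECTED.values())
total_n = sum(total for _, total in DETECTED.values())
overall = round(100 * total_hit / total_n)
```

The pooled figure corresponds to 79 of 101 cancer samples, which rounds to the 78% sensitivity reported for the 30-marker panel.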
(124) TABLE 6 Prediction algorithm accuracy estimates according to different marker-combinations for Cancer (colorectal + breast + lung + pancreatic) vs control group

| Number of markers | 3 | 5 | 9 | 17 | 30 |
|---|---|---|---|---|---|
| AUC | 0.76 | 0.80 | 0.85 | 0.88 | 0.90 |
| AUC_CI_LOW | 0.74 | 0.78 | 0.84 | 0.87 | 0.89 |
| AUC_CI_HIGH | 0.77 | 0.81 | 0.86 | 0.89 | 0.91 |
| Sensitivity | 0.68 | 0.71 | 0.73 | 0.75 | 0.78 |
| Specificity | 0.71 | 0.74 | 0.81 | 0.85 | 0.87 |
| Accuracy | 0.69 | 0.72 | 0.76 | 0.78 | 0.79 |
| Kappa | 0.36 | 0.41 | 0.49 | 0.55 | 0.57 |
(125) Marker panels showing the best predictive performances can be seen in Tables 7-11.
(126) TABLE 7 3-marker combination

| Gene | chr | start | end | width | annotations | Sequence |
|---|---|---|---|---|---|---|
| HOXA7 | 7 | 27156273 | 27156352 | 80 | HOXA7 | CCCGGGGATGTTTTGGTCGTAGGAGGCGCAGGGCAGGTTGCCGTAGGCGTCGGCGCCCAGGCCGTAGCCGGACGCAAAGG (SEQ ID NO: 91) |
| NA | 10 | 100830555 | 100830658 | 104 | NA | AGCGGCTGAAATTGGTGCGCCTTGTGCTGTGGTCTGGGTGTGTCCCGGAGAGGGCGCGCAGGCGCCTATGTCTGTCGCGGGGCGGTCGGCGGGTTCCTGCATAG (SEQ ID NO: 96) |
| JAM2 | 21 | 25640320 | 25640399 | 80 | JAM2 | CCGCGTGGTCTGGGCTCTGTAGCGTCCCAGCTGAGCCGGCGATATGCAGCGCACTTGTGGGGCGGAGGTGGAGGGAATTC (SEQ ID NO: 108) |
(127) TABLE 8 5-marker combination

| Gene | chr | start | end | width | Sequence |
|---|---|---|---|---|---|
| KLK10 | 19 | 51019613 | 51019705 | 93 | GCAGGTAGCTTCACCTGGGAGTCGCCGATAGGAAGGAGGGAGGGGACCCAGACGTGCCTCTGCCCTGCCTGTGGTCTGCCGCTGGTATCCTCT (SEQ ID NO: 83) |
| HOXA7 | 7 | 27156291 | 27156403 | 113 | GTAGGAGGCGCAGGGCAGGTTGCCGTAGGCGTCGGCGCCCAGGCCGTAGCCGGACGCAAAGGGGCTCTGATAAAGGGGGCTGTTGACATTGTATAAGCCCGGAACGGTCGAGG (SEQ ID NO: 92) |
| NA | 10 | 100830555 | 100830658 | 104 | AGCGGCTGAAATTGGTGCGCCTTGTGCTGTGGTCTGGGTGTGTCCCGGAGAGGGCGCGCAGGCGCCTATGTCTGTCGCGGGGCGGTCGGCGGGTTCCTGCATAG (SEQ ID NO: 96) |
| TXNRD1 | 12 | 104215675 | 104215784 | 110 | GGCTATGACTTCGCTGTTGTCACCGAGCGCCCCGCCCACCGCGTTCTCCGACCCGCGGCCGGCAGGGGGCTCGCGGCCTCCGCCAGGCGTCCTTCGGCTCCGTCAGTTCC (SEQ ID NO: 99) |
| THBD | 20 | 23049354 | 23049500 | 147 | TCTGACTGGCATTGAGGAAGGTCGCGGGGCCCGGGTAGAGCGCGAAGCAGTCGTGCTCGACGCACTGGCTGCCACCCGGCTGCGGCTCTGCGGGTGCGGGGAACCCCAGGCCGGCCAGGGCCAGCGCGCCAAGGACCAGGACCCAA (SEQ ID NO: 107) |
(128) TABLE-US-00009 TABLE 9 9-marker combination Gene chr start end width Sequence HOXD8 2 176129896 176130001 106 GGGGTTTGTAAACCGAGG CCAGAGTGTCCCCGTGGG CCGAGCGCACTTTTTTCTT GTCCGGGTGCGCTCAGTC ACTGGTGCCTGAGAGGAA ACAGTGGAGGCAGCG (SEQ ID NO: 84) SCGN 6 25652114 25652232 119 CCCCAAAGCGCAGAGACA GACAGGATCTGCCAGGAC AGCGCGCAGGGCGGGGCG GGGACAGGCGCGCCAGGA GCGGGGCGGGCTTCCAGC CGCTGGTTTTGCTGAGGGC TGAGGGACG (SEQ ID NO: 89) HOXA7 7 27156291 27156403 113 GTAGGAGGCGCAGGGCAG GTTGCCGTAGGCGTCGGC GCCCAGGCCGTAGCCGGA CGCAAAGGGGCTCTGATA AAGGGGGCTGTTGACATT GTATAAGCCCGGAACGGT CGAGG (SEQ ID NO: 92) MIR196B 7 27169630 27169719 90 CCAAGGAGAGAACCCTGC CATCGCGCCTGGCCCGGC CCAGCCCAGCCCCTAGGC AACCTGCGCCCGCCAGTG CAACAGAGTGCCCCAGGC (SEQ ID NO: 93) NA 10 100830555 100830658 104 AGCGGCTGAAATTGGTGC GCCTTGTGCTGTGGTCTGG GTGTGTCCCGGAGAGGGC GCGCAGGCGCCTATGTCT GTCGCGGGGCGGTCGGCG GGTTCCTGCATAG (SEQ ID NO: 96) TXNRD1 12 104215675 104215784 110 GGCTATGACTTCGCTGTTG TCACCGAGCGCCCCGCCC ACCGCGTTCTCCGACCCGC GGCCGGCAGGGGGCTCGC GGCCTCCGCCAGGCGTCCT TCGGCTCCGTCAGTTCC (SEQ ID NO: 99) HAPLN3 15 88895676 88895781 106 GTCTCCAGACTCGCTGGG AACCACCGCAAAGAGGGT GTGCAAGAGTTGAGGCCC TCACGTCTTGGGAAAGGA GAGTAGGGGTGGAATAGG AGAGTTTGGGGAGGGG (SEQ ID NO: 100) THBD 20 23049354 23049500 147 TCTGACTGGCATTGAGGA AGGTCGCGGGGCCCGGGT AGAGCGCGAAGCAGTCGT GCTCGACGACTGGCTGC CACCCGGCTGCGGCTCTGC GGGTGCGGGGAACCCCAG GCCGGCCAGGGCCAGCGC GCCAAGGACCAGGACCCC AA (SEQ ID NO: 107) SDC2 8 96493985 96494062 78 CTTCAGAGCAGCCTTCC CGGAGCACCAACTCCGTG TCGGGAGTGCAGAAACCA ACAAGTGAGAGGGCGCCG CGTTC (SEQ ID NO: 109)
(129) TABLE-US-00010 TABLE 10 17-marker combination Gene chr start end width Sequence MAST1 19 12867716 12867820 105 CCCCCTCCATGCAGCA AGCGATTCTCCGCGTC CGAGGCCACTTTTCCTG GAGGGAGAGGCCAGTC CCCCTTTGGGCGCCCG CCGCCGTTTCTCGGCGC TGCTGGAG (SEQ ID NO: 82) HOXD8 2 176129896 176130001 106 GGGGTTTGTAAACCGA GGCCAGAGTGTCCCCG TGGGCCGAGCGCACTT TTTTCTTGTCCGGGTGC GCTCAGTCACTGGTGC CTGAGAGGAAACAGTG GAGGCAGCG (SEQ. ID NO: 84) PREX1 20 48828337 48828448 112 TCACCGCGGGCTACGC CACTCCCACCCGGCAC ACGCGACACCCGCCGC GCGCAGGCTCCTGCTT GCAGGTCCGGCCGCTG CTCGGGCCAAGTAAAC ACCGGGCTGGGAAAGC (SEQ ID NO: 86) AMOTL2 3 134364403 134364512 110 CCATGGCTTCCTTTCTT TGGCAGAGTCAGGCTC CAGAAGTCCGCCTTCC TCCACAGGCACCC TAATCTGCCGTGCCCTT GCAGCTTCTCCTCCCCA GACTCCTCAGGGAA (SEQ ID NO: 87) SCGN 6 25652114 25652232 119 CCCCAAAGCGCAGAGA CAGACAGGATCTGCCA GGACAGCGCGCAGGGC GGGGCGGGGACAGGCG CGCCAGGAGCGGGGCG GGCTTCCAGCCGCTGG TTTTGCTGAGGGCTGA GGGACG (SEQ ID NO: 89) HOXA7 7 27155916 27156027 112 TCGAACCCATTAATTG GGCCATAAAAAGTTTT ATGAGCCTCATTTACAT ACAATGCTATGGGCTC CACGCAATGGCGCCTC CGCTCCAATTAAAACC AGAAAGGCTGCGCCG (SEQ ID NO: 90) MIR196B 7 27169630 27169719 90 CCAAGGAGAGAACCCT GCCATCGCGCCTGGCC CGGCCCAGCCCAGCCC CTAGGCAACCTGCGCC CGCCAGTGCAACAGAG TGCCCCAGGC (SEQ ID NO: 93) TNFRSF10D 8 23163949 23164031 83 GGTGGATCGAAAGCGC CAAAAATCAATCAGAA ATCGTCCCCGTAGTTTG TGCGCGTGCAAAGGTT CTCGCAGCTACACTGC CA (SEQ ID NO: 94) NA 10 100830555 100830658 104 AGCGGCTGAAATTGGT GCGCCTTGTGCTGTGGT CTGGGTGTGTCCCGGA GAGGGCGCGCAGGCGC CTATGTCTGTCGCGGG GCGGTCGGCGGGTTCC TGCATAG (SEQ ID NO: 96) TXNRDI 12 104215675 104215784 110 GGCTATGACTTCGCTGT TGTCACCGAGCGCCCC GCCCACCGCGTTCTCC GACCCGCGGCCGGCAG GGGGCTCGCGGCCTCC GCCAGGCGTCCTTCGG CTCCGTCAGTTCC (SEQ ID NO: 99) HAPLN3 15 88895676 88895781 106 GTCTCCAGACTCGCTG GGAACCACCGCAAAGA GGGTGTGCAAGAGTTG AGGCCCTCACGTCTTG GGAAAGGAGAGTAGG GGTGGAATAGGAGAGT TTGGGGAGGGG (SEQ ID NO: 100) CYBA 16 88651139 88651205 67 CATCTGTAGGGTGCAG GGCTGTCCCGGAGCCT TCTGCCCCCGCCCTCTC TAGCCACGCCGAGGCA TA (SEQ ID NO: 101) ZNF582 19 56393606 56393725 120 
TCCGGGAACATAGTC TTTAGGCGTAAAGGCA GCAGCCCGGCCTTGAA GCCGGATCTCGCGATG TTTCAGGGTGAGCCGG ACGCAGGCGTGCCTGC GCAGTGCGCGGAGGAG TGCTGTTC (SEQ ID NO: 103) THBD 20 23049354 23049500 147 TCTGACTGGCATTGAG GAAGGTCGCGGGGCCC GGGTAGAGCGCGAAGC AGTCGTGCTCGACGCA CTGGCTGCCACCCGGC TGCGGCTCTGCGGGTG CGGGGAACCCCAGGCC GGCCAGGGCCAGCGCG CCAAGGACCAGGACCC CAA (SEQ ID NO: 107) SDC2 8 96493985 96494062 78 CTTCAGAGAGCAGCCT TCCCGGAGCACCAACT CCGTGTCGGGAGTGCA GAAACCAACAAGTGAG AGGGCGCCGCGTTC (SEQ ID NO: 109) FGF14 13 102394577 102394651 75 CAACGGAAACTTCCCG CGCTACGGCGGCTCCA ACGGGCCGCTTCCGCC GCATTGCGTAGCGAAG CCCCCGGCGAG (SEQ ID NO: 110) CDKN2A 9 21970919 21971017 99 GCATCTATGCGGGCAT GGTTACTGCCTCTGGTG CCCCCCGCAGCCGCGC GCAGGTACCGTGCGAC ATCGCGATGGCCCAGC TCCTCAGCCAGGTCCA CG SEQ ID NO: 111)
(130) TABLE-US-00011 TABLE 11 30-marker combination Gene chr start end width Sequence MAST1 19 12867716 12867820 105 CCCCCTCCATGCAGCA AGCGATTCTCCGCGTC CGAGGCCAGTTTCCTG GAGGGAGAGGCCAGT CCCCCTTTGGGCGCCC GCCGCCGTTTCTCGGC GCTGCTGGAG (SEQ ID NO: 82) KLK10 19 51019613 51019705 93 GCAGGTAGCTTCACCT GGGAGTCGCCGATAG GAAGGAGGGAGGGGA CCCAGACGTGCCTCTG CCCTGCCTGTGGTCTG CCGCTGGTATCCTCT (SEQ ID NO: 83) HOXD8 2 176129896 176130001 106 GGGGTTTGTAAACCGA GGCCAGAGTGTCCCCG TGGGCCGAGCGCACTT TTTTCTTGTCCGGGTG CGCTCAGTCACTGGTG CCTGAGAGGAAACAG TGGAGGCAGCG (SEQ ID NO: 84) C2orf88 2 190180554 190180682 129 GCCGGCAGCTGCTTGG TAGTTGCGGGGGGCGT GAGGGCGGTGGCCCA GACCAACCGGCTGGC AGCCCAGCTCCGCTCC GCCCGCCCCTGCCTCG GACCCTGCGCCTGAGG AAGTATCGAGGCAAC CCTC (SEQ ID NO: 85) PREX1 20 48828337 48828448 112 TCACCGCGGGCTACGC CACTCCCACCCGGCAC ACGCGACACCCGCCGC GCGCAGGCTCCTGCTT GCAGGTCCGGCCGCTG CTCGGGCCAAGTAAAC ACCGGGCTGGGAAAG C (SEQ ID NO: 86) AMOTL2 3 134364403 134364512 110 CCATGGCTTCCTTTCTT TGGCAGAGTCAGGCTC CAGAAGTCCGCCTTCC TCCACAGGCACCC TAATCTGCCGTGCCCT TGCAGCTTCTCCTCCC CAGACTCCTCAGGGAA (SEQ ID NO: 87) SOX2OT 3 181719490 181719596 107 GGCAAATTGAGGCCG AGCTGACGAGCTCCGG CGGGTGGACCTGACGT CACCGCGGCCCGGGTC ACCTCACCCATGGGGC TCCCCAAGAAGGTGCT GTGTGGGGCTCG (SEQ ID NO: 88) SCGN 6 25652114 25652232 119 CCCCAAAGCGCAGAG ACAGACAGGATCTGCC AGGACAGCGCGCAGG GCGGGGCGGGGACAG GCGCGCCAGGAGCGG GGCGGGCTTCCAGCCG CTGGTTTTGCTGAGGG CTGAGGGACG (SEQ ID NO: 89) HOXA7 7 27155916 27156027 112 TCGAACCCATTAATTG GGCCATAAAAAGTTTT ATGAGCCTCATTTACA TACAATGCTATGGGCT CCACGCAATGGCGCCT CCGCTCCAATTAAAAC CAGAAAGGCTGCGCC G (SEQ ID NO: 90) HOXA7 7 27156273 27156352 80 CCCGGGGATGTTTTGG TCGTAGGAGGCGCAG GGCAGGTTGCCGTAGG CGTCGGCGCCCAGGCC GTAGCCGGACGCAAA GG (SEQ ID NO: 91) HOXA7 7 27156291 27156403 113 GTAGGAGGCGCAGGG CAGGTTGCCGTAGGCG TCGGCGCCCAGGCCGT AGCCGGACGCAAAGG GGCTCTGATAAAGGG GGCTGTTGACATTGTA TAAGCCCGGAACGGTC GAGG (SEQ ID NO: 92) MIR196B 7 27169630 27169719 90 CCAAGGAGAGAACCC TGCCATCGCGCCTGGC CCGGCCCAGCCCAGCC CCTAGGCAACCTGCGC CCGCCAGTGCAACAG AGTGCCCCAGGC 
(SEQ ID NO: 93) TNFRSF10D 8 23163949 23164031 83 GGTGGATCGAAAGCG CCAAAAATCAATCAG AAATCGTCCCCGTAGT TTGTGCGCGTGCAAAG GTTCTCGCAGCTACAC TGCCA (SEQ ID NO: 94) TNFRSF10D 8 23163995 23164099 105 TTGTGCGCGTGCAAAG GTTCTCGCAGCTACAC TGCCAGAATAGAACGT GCTCCTCCGCTTTTAT ACCCCGGAAAAAAGG CGTGGTCAGTTGTACT CCCTTCCCGC (SEQ ID NO: 95) NA 10 100830555 100830658 104 AGCGGCTGAAATTGGT GCGCCTTGTGCTGTGG TCTGGGTGTGTCCCGG AGAGGGCGCGCAGGC GCCTATGTCTGTCGCG GGGCGGTCGGCGGGTT CCTGCATAG (SEQ ID NO: 96) ME3 11 86672189 86672296 108 AGATCCGGTGCGGGTG ACAGCCGGCGCCACCC CTGCCCCCATCCCTGT GAAAAAGAGGCGACT GCGCGGCGAGGGGTC CCCGTACCCCTAATCC CGCGTGGTGGCTTG (SEQ ID NO: 97) ME3 11 86672338 86672429 92 GAGCTGAGGTCTACGC GGTCCCGCTGCGGAGC AGGCGGGGTGAGGAG CTGCGGTCTGTGAGTC CTCTCCCGCCAATGGG TGGACCGCGCTGG (SEQ ID NO: 98) TXNRD1 12 104215675 104215784 110 GGCTATGACTTCGCTG TTGTCACCGAGCGCCC CGCCCACCGCGTTCTC CGACCCGCGGCCGGC AGGGGGCTCGCGGCCT CCGCCAGGCGTCCTTC GGCTCCGTCAGTTCC (SEQ ID NO: 99) HAPLN3 15 88895676 88895781 106 GTCTCCAGACTCGCTG GGAACCACCGCAAAG AGGGTGTGCAAGAGTT GAGGCCCTCACGTCTT GGGAAAGGAGAGTAG GGGTGGAATAGGAGA GGTTTGGGGAGGGG (SEQ ID NO: 100) CYBA 16 88651139 88651205 67 CATCTGTAGGGTGCAG GGCTGTCCCGGAGCCT TCTGCCCCCGCCCTCT CTAGCCACGCCGAGGC ATA (SEQ ID NO: 101) ZNF568 19 36916284 36916453 170 TGTGTTCTGGCCGGAA GTTGAGTGGGGCCGCG GGGCCTGCTGGGAGGT GTTGTCCTCGGAAACG TCGCTGGCGCGGAGG GATGGTTCGGCGCTTT AGGCGTCTGTCACAGA CCTATCTGCGGGTCGC CTTCACCC AGCATCTCAGAAACTG CGCGCGGGATGAACA TTCG (SEQ ID NO: 102) ZNF582 19 56393606 56393725 120 TCCGGGAAACATAGTC TTTAGGCGTAAAGGCA GCAGCCCGGCCTTGAA GCCGGATCTCGCGATG TTTCAGGGTGAGCCGG ACGCAGGCGTGCCTGC GCAGTGCGCGGAGGA GTGCTGTTC (SEQ ID NO: 103) ZNF471 19 56507527 56507675 149 CCCCACGCGTACTCAC ACCGAAGGCTCAGCC GTCGCGCGTTTCCCTC CCAGGCCCCAGGAACT AGTAACTAGGGACGCT TCTGGTCTCTAGGCGA GGAGAGGGGGAGAGC GCAATCTTTGCGCCTG CGCACACTCCTGCTCT TACCCGC (SEQ ID NO: 104) ZNF471 19 56507558 56507675 118 GTCGCGCGTTTCCCTC CCAGGCCCCAGGAACT AGTAACTAGGGACGCT TCTGGTCTCTAGGCGA GGAGAGGGGGAGAGC GCAATCTTTGCGCCTG CGCACACTCCTGCTCT TACCCGC 
(SEQ ID NO: 105) ZNF471 19 56507662 56507750 89 CTGCTCTTACCCGCCG GAACCCTGGGCCACGC CCGGCTCGCGTAATCA CGCACTGCGCAGGCAC CGCCCGCTCTGCTCTA AGGTCCCTC (SEQ ID NO: 106) THBD 20 23049354 23049500 147 TCTGACTGGCATTGAG GAAGGTCGCGGGGCC CGGGTAGAGCGCGAA GCAGTCGTGCTCGACG CACTGGCTGCCACCCG GCTGCGGCTCTGCGGG TGCGGGGAACCCCAG GCCGGCCAGGGCCAG CGCGCCAAGGACCAG GACCCCAA (SEQ ID NO: 107) JAM2 21 25640320 25640399 80 CCGCGTGGTCTGGGCT CTGTAGCGTCCCAGCT GAGCCGGCGATATGC AGCGCACTTGTGGGGC GGAGGTGGAGGGAAT TC (SEQ ID NO: 108) SDC2 8 96493985 96494062 78 CTTCAGAGAGCAGCCT TCCCGGAGCACCAACT CCGTGTCGGGAGTGCA GAAACCAACAAGTGA GAGGGCGCCGCGTTC (SEQ ID NO: 109) FGF14 13 102394577 102394651 75 CAACGGAAACTTCCCG CGCTACGGCGGCTCCA ACGGGCCGCTTCCGCC GCATTGCGTAGCGAAG CCCCCGGCGAG (SEQ ID NO: 110) CDKN2A 9 21970919 21971017 99 GCATCTATGCGGGCAT GGTTACTGCCTCTGGT GCCCCCCGCAGCCGCG CGCAGGTACCGTGCGA CATCGCGATGGCCCAG CTCCTCAGCCAGGTCC ACG (SEQ ID NO: 111)
(131) Individual marker accuracy values are represented in
(132) Cancer Location Analysis
(133) Cases that were correctly separated from the control group were further evaluated to identify their tissue of origin. A 7-marker methylation panel allowed correct assignment of the tissue of origin to 58% of colorectal cancer (11/19), 71% of lung cancer (17/24), 53% of pancreatic cancer (8/15), and 62% of breast cancer (13/21) cases (Table 12).
(134) TABLE-US-00012 TABLE 12 Out-of-bag (OOB) matrix indicating sample classification according to a 7-marker panel. Entries on the diagonal indicate correctly classified samples. For example, 13 breast cancer samples were correctly identified as originating from breast cancer tissue samples. Off-diagonal entries in rows indicate false identifications to other cancer types. For example, two breast cancer tissue samples were incorrectly identified as lung cancer samples. BC: breast cancer; CRC: colorectal cancer; LC: lung cancer; PC: pancreatic cancer.
       BC    CRC   LC    PC    class.error   Correct (%)
BC     13    1     4     3     0.381         62
CRC    2     11    8     0     0.421         58
LC     1     4     17    2     0.292         71
PC     4     1     2     8     0.467         53
(135) Individual markers contributing to the best-performing panel are listed in Table 13.
(136) TABLE-US-00013 TABLE 13 7-marker combination
annotations  chr  start  end  Width  sequence
HOXA7  7  27155916  27156027  112  TCGAACCCATTAATTGGGCCATAAAAAGTTTTATGAGCCTCATTTACATACAATGCTATGGGCTCCACGCAATGGCGCCTCCGCTCCAATTAAAACCAGAAAGGCTGCGCCG (SEQ ID NO: 90)
MIR196B  7  27169630  27169719  90  CCAAGGAGAGAACCCTGCCATCGCGCCTGGCCCGGCCCAGCCCAGCCCCTAGGCAACCTGCGCCCGCCAGTGCAACAGAGTGCCCCAGGC (SEQ ID NO: 93)
TNFRSF10D  8  23163995  23164099  105  TTGTGCGCGTGCAAAGGTTCTCGCAGCTACACTGCCAGAATAGAACGTGCTCCTCCGCTTTTATACCCCGGAAAAAAGGCGTGGTCAGTTGTACTCCCTTCCCGC (SEQ ID NO: 95)
DNM3  1  171841774  171841857  84  CAGAGCGCCGGCAAGAGCTCGGTGCTCGAGAACTTCGTGGGCAGGTAAGCGCGCAGGGCGCGGAGTAAGGATGCGGCAGTGGGG (SEQ ID NO: 112)
C1orf230  1  151721583  151721679  97  TTAGCGCAGCGCAGCTGGAGCAGCTGCGAAAATTCAAGGTGGGTGCGCCCGCGCCCCCATCCAGCGTCCACCAAAGTGTAGCTGCCCCAGGACTGGG (SEQ ID NO: 115)
ZNF568  19  36916252  36916371  120  GCCCAAGCCTCACCCTCACACAGGAAAGCAGATGTGTTCTGGCCGGAAGTTGAGTGGGGCCGCGGGGCCTGCTGGGAGGTGTTGTCCTCGGAAACGTCGCTGGCGCGGAGGGATGGTTCG (SEQ ID NO: 118)
C9orf50  9  129620787  129620870  84  AGAGTAGCCAACTTTGGGGGTTGCTGTGACGTTTAAATGAGCAAGTACATGCCAGTCTTAGAACAGCAAGCTCGGTACAGTGCC (SEQ ID NO: 119)
(137) A 14-marker methylation panel (Table 15) allowed correct assignment of the tissue of origin to 74% of colorectal cancer (14/19), 79% of lung cancer (19/24), 60% of pancreatic cancer (9/15), and 52% of breast cancer (11/21) cases (Table 14).
(138) TABLE-US-00014 TABLE 14 Out-of-bag (OOB) matrix indicating sample classification according to a 14-marker panel. Entries on the diagonal indicate correct classifications, and off-diagonal entries in rows indicate false identifications to other cancer types. BC: breast cancer; CRC: colorectal cancer; LC: lung cancer; PC: pancreatic cancer.
       BC    CRC   LC    PC    class.error   Correct (%)
BC     11    0     6     4     0.476         52
CRC    0     14    3     2     0.263         74
LC     0     3     19    2     0.208         79
PC     4     1     1     9     0.4           60
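The class.error and percent-correct columns of Table 14 can be reproduced from the matrix counts alone. The following is a minimal sketch, assuming that rows hold the true cancer type and columns the assigned type (an inference from the row sums matching the cohort sizes of 21, 19, 24 and 15); the function and variable names are hypothetical and not part of the disclosure.

```python
# Table 14 counts. Assumption: rows are true classes, columns are assigned
# classes, both in the order BC, CRC, LC, PC.
labels = ["BC", "CRC", "LC", "PC"]
oob_matrix = [
    [11, 0, 6, 4],
    [0, 14, 3, 2],
    [0, 3, 19, 2],
    [4, 1, 1, 9],
]


def per_class_summary(matrix, names):
    """Return sample count, class error and percent correct for each row."""
    summary = {}
    for i, name in enumerate(names):
        total = sum(matrix[i])      # all samples of this true class
        correct = matrix[i][i]      # diagonal entry = correctly assigned
        summary[name] = {
            "n": total,
            "class_error": round(1 - correct / total, 3),
            "percent_correct": round(100 * correct / total),
        }
    return summary


summary = per_class_summary(oob_matrix, labels)
for name, stats in summary.items():
    print(name, stats)
```

Under these assumptions the computed values agree with the class.error and Correct columns of Table 14, including 60% for pancreatic cancer (9/15).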
(139) Individual markers contributing to the 14-marker methylation panel are listed in Table 15.
(140) TABLE-US-00015 TABLE 15 14-marker combination
annotations  chr  start  end  Width  sequence
DNM3  1  171841774  171841857  84  CAGAGCGCCGGCAAGAGCTCGGTGCTCGAGAACTTCGTGGGCAGGTAAGCGCGCAGGGCGCGGAGTAAGGATGCGGCAGTGGGG (SEQ ID NO: 112)
NA  5  73436635  73436710  76  CAGGCTGGGCGGTCTTTGACCCCCCGCGCCTCCCGCCCACAGCCGGAGCCCGGCAGCTGGAAGCACCCGCTACCCC (SEQ ID NO: 113)
HOXA7  7  27155916  27156027  112  TCGAACCCATTAATTGGGCCATAAAAAGTTTTATGAGCCTCATTTACATACAATGCTATGGGCTCCACGCAATGGCGCCTCCGCTCCAATTAAAACCAGAAAGGCTGCGCCG (SEQ ID NO: 90)
MIR196B  7  27169630  27169719  90  CCAAGGAGAGAACCCTGCCATCGCGCCTGGCCCGGCCCAGCCCAGCCCCTAGGCAACCTGCGCCCGCCAGTGCAACAGAGTGCCCCAGGC (SEQ ID NO: 93)
DLX6-AS1  7  97014186  97014266  81  CAAGACCTGGCGCATCTTTGCAAATTACAGATAATTGTAAACGTCCAGATTATGATAATAGCATCCTAATCCAGCCTGCAA (SEQ ID NO: 114)
TNFRSF10D  8  23163995  23164099  105  TTGTGCGCGTGCAAAGGTTCTCGCAGCTACACTGCCAGAATAGAACGTGCTCCTCCGCTTTTATACCCCGGAAAAAAGGCGTGGTCAGTTGTACTCCCTTCCCGC (SEQ ID NO: 95)
C1orf230  1  151721583  151721679  97  TTAGCGCAGCGCAGCTGGAGCAGCTGCGAAAATTCAAGGTGGGTGCGCCCGCGCCCCCATCCAGCGTCCACCAAAGTGTAGCTGCCCCAGGACTGGG (SEQ ID NO: 115)
DIO3OS, MIR1247  14  101561395  101561505  111  TCCGGGCTCAAGTTGCAAGGGGGCGGGCCGGGCCGGAGGTGGAGTCTCCCGCCAATTGAAGCCTCCGCTATAAATTGAACTCCCTGCACTGCTGAAGCCCAGATGCCTCGC (SEQ ID NO: 116)
GSG1L  16  28063861  28063964  104  CCGAAAGAAATCCGAGCCAGGGTGAGGGTCTGAGACGCAAGGAGAATCCCAGGCAAGGCGCTCCTGAGAAAAGATCCCCACGGCGGACGTGGGGCAACAAAACC (SEQ ID NO: 117)
ZNF568  19  36916252  36916371  120  GCCCAAGCCTCACCCTCACACAGGAAAGCAGATGTGTTCTGGCCGGAAGTTGAGTGGGGCCGCGGGGCCTGCTGGGAGGTGTTGTCCTCGGAAACGTCGCTGGCGCGGAGGGATGGTTCG (SEQ ID NO: 118)
C9orf50  9  129620787  129620870  84  AGAGTAGCCAACTTTGGGGGTTGCTGTGACGTTTAAATGAGCAAGTACATGCCAGTCTTAGAACAGCAAGCTCGGTACAGTGCC (SEQ ID NO: 119)
LONRF2  2  100322387  100322463  77  CTCTCAGTCCCGCCGGCTTAGGTAACCCAGGTCGCTGCGGTAACGCAGTGACCGCGCTCCAGGTCCGCGTCTCTTGC (SEQ ID NO: 120)
PCDH9  13  67231171  67231265  95  GCGTGCGAAGTCTCCTCTAGCGGAGCGGGACCGGCCGCGGCGGTGGATCGTGGCGGTCCCTGCACTTCTGCTCCAGCCGCGCCTGGAAACCTGAG (SEQ ID NO: 121)
GFPT2  5  180353729  180353815  87  CGTAAGGGGCAGAGCGAGGGGTCCGGCATCACTCGCGCGCTCCGGAAACCCGCGTGAGCCGCTGTTCCTGCCGCGCTCCCATCTGAG (SEQ ID NO: 122)
(141)
(142) Primer pairs used for the best-performing assays are indicated in Tables 3-4. These primer pairs are designed to amplify the DMRs of interest under the condition that each amplicon covers at least one cut site for methylation-specific restriction enzymes. In most cases, 3-15 individual cut sites are covered.
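The cut-site condition described above can be checked computationally for any candidate amplicon. The following is a minimal sketch, assuming an illustrative enzyme panel (HpaII, HhaI, AciI and BstUI with their standard recognition sites); the choice of enzymes, the function names and the default threshold are assumptions for illustration and are not taken from this section. The example amplicon is the SDC2 sequence of SEQ ID NO: 109.

```python
import re

# Hypothetical enzyme panel for illustration only. The recognition sites are
# the standard ones for these enzymes; the selection itself is an assumption.
MSRE_SITES = {
    "HpaII": ["CCGG"],
    "HhaI": ["GCGC"],
    "AciI": ["CCGC", "GCGG"],  # non-palindromic site: both orientations listed
    "BstUI": ["CGCG"],
}


def count_cut_sites(amplicon, enzyme_sites=MSRE_SITES):
    """Count recognition sites per enzyme, allowing overlapping matches."""
    counts = {}
    for enzyme, motifs in enzyme_sites.items():
        counts[enzyme] = sum(
            len(re.findall(f"(?={motif})", amplicon)) for motif in motifs
        )
    return counts


def covers_cut_site(amplicon, minimum=1):
    """True if the amplicon contains at least `minimum` cut sites in total."""
    return sum(count_cut_sites(amplicon).values()) >= minimum


# SDC2 amplicon (SEQ ID NO: 109), 78 bp.
SDC2_AMPLICON = (
    "CTTCAGAGAGCAGCCTTCCCGGAGCACCAACTCCGTGTCGGGAGTGCAGAAA"
    "CCAACAAGTGAGAGGGCGCCGCGTTC"
)
site_counts = count_cut_sites(SDC2_AMPLICON)
print(site_counts, covers_cut_site(SDC2_AMPLICON))
```

A lookahead pattern is used so that overlapping sites in CpG-dense regions are each counted; a design pipeline would substitute the enzymes actually used in the assay.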
OTHER EMBODIMENTS
(143) While we have described a number of embodiments, it is apparent that our basic disclosure and examples may provide other embodiments that utilize or are encompassed by the compositions and methods described herein. Therefore, it will be appreciated that the scope is to be defined by that which may be understood from the disclosure and the appended claims rather than by the specific embodiments that have been represented by way of example.
(144) All references cited herein are hereby incorporated by reference.
(145) TABLE-US-00016 SEQUENCE TABLE 1 uid Gene chr start end primer_F primer_R width genome Sequence 103_1B MAST1 19 12867716 12867820 CCCCCT CTCCA 105 hg38 MAST1 CCCCCTCCATG CCATG GCAGC CAGCAAGCGAT CAGCA GCCGA TCTCCGCGTCC AGC GAAAC GAGGCCAGTTT (SEQ ID (SEQ ID CCTGGAGGGAG NO: 29) NO: 56) AGGCCAGTCCC CCTTTGGGCGC CCGCCGCCGTT TCTCGGCGCTG CTGGAG (SEQ ID NO: 82) 123_1B KLK10 19 51019613 51019705 GCAGG AGAGG 93 hg38 KLK10; GCAGGTAGCTT TAGCTT ATACC KLK10; CACCTGGGAGT CACCT AGCGG KLK10; CGCCGATAGGA GGGAG CAGAC KLK10 AGGAGGGAGG TCG CACA GGACCCAGACG (SEQ ID (SEQ ID TGCCTCTGCCC NO: 30) NO: 57) TGCCTGTGGTC TGCCGCTGGTA TCCTCT (SEQ ID NO: 83) 141_1B HOXD8 2 176129896 176130001 GGGGT CGCTG 106 hg38 HOXD8; GGGGTTTGTAA TTGTA CCTCC HOXD8 ACCGAGGCCAG AACCG ACTGTT AGTGTCCCCGT AGGCC TCCTCT GGGCCGAGCGC AGAG CA (SEQ ACTTTTTTCTTG (SEQ ID ID NO: TCCGGGTGCGC NO: 31) 58) TCAGTCACTGG TGCCTGAGAGG AAACAGTGGA GGCAGCG (SEQ ID NO: 84) 149_1B C2orf88 2 190180554 190180682 GCCGG GAGGG 129 hg38 C2orf88; GCCGGCAGCTG CAGCT TTGCCT C2orf88; CTTGGTAGTTG GCTTG CGATA C2orf88; CGGGGGGCGTG GTAGT CTTCCT C2orf88 AGGGCGGTGGC TG (SEQ CA (SEQ CCAGACCAACC ID NO: ID NO: GGCTGGCAGCC 32) 59) CAGCTCCGCTC CGCCCGCCCCT GCCTCGGACCC TGCGCCTGAGG AAGTATCGAGG CAACCCTC (SEQ ID NO: 85) 175_1B PREX1 20 48828337 48828448 TCACC GCTTTC 112 hg38 PREX1 TCACCGCGGGC GCGGG CCAGC TACGCCACTCC CTACG CCGGT CACCCGGCACA CCACT GTTT CGCGACACCCG (SEQ ID (SEQ ID CCGCGCGCAGG NO: 33) NO: 60) CTCCTGCTTGC AGGTCCGGCCG CTGCTCGGGCC AAGTAAACACC GGGCTGGGAA AGC (SEQ ID NO: 86) 215_1B AMOTL2 3 134364403 134364512 CCATG TTCCCT 110 hg38 AMOTL2 CCATGGCTTCC GCTTCC GAGGA TTTCTTTGGCA TTTCTT GTCTG GAGTCAGGCTC TGGCA GGGAG CAGAAGTCCGC GA (SEQ GAG CTTCCTCCACA ID NO: (SEQ ID GGCACCC 34) NO: 61) TAATCTGCCGT GCCCTTGCAGC TTCTCCTCCCC AGACTCCTCAG GGAA (SEQ ID NO: 87) 220_1B SOX2OT 3 181719490 181719596 GGCAA CGAGC 107 hg38 SOX2OT GGCAAATTGAG ATTGA CCCAC GCCGAGCTGAC GGCCG ACAGC GAGCTCCGGCG AGCTG ACCTT GGTGGACCTGA (SEQ ID (SEQ ID CGTCACCGCGG NO: 35) NO: 62) CCCGGGTCACC 
TCACCCATGGG GCTCCCCAAGA AGGTGCTGTGT GGGGCTCG (SEQ ID NO: 88) 236_1B SCGN 6 25652114 25652232 TCCCC CGTCC 119 hg38 SCGN CCCCAAAGCGC AAAGC CTCAG AGAGACAGAC GCAGA CCCTC AGGATCTGCCA GACAG AGCAA GGACAGCGCGC A (SEQ (SEQ ID AGGGCGGGGC ID NO: NO: 63) GGGGACAGGC 36) GCGCCAGGAGC GGGGCGGGCTT CCAGCCGCTGG TTTTGCTGAGG GCTGAGGGACG (SEQ ID NO: 89) 243_1B HOXA7 7 27155916 27156027 TCGAA CGGCG 112 hg38 HOXA7 TCGAACCCATT CCCATT CAGCC AATTGGGCCAT AATTG TTTCTG AAAAAGTTTTA GGCCA GTTT TGAGCCTCATT TA (SEQ (SEQ ID TACATACAATG ID NO: NO: 15) CTATGGGCTCC 1) ACGCAATGGCG CCTCCGCTCCA ATTAAAACCAG AAAGGCTGCGC CG (SEQ ID NO: 90) 244_1B HOXA7 7 27156273 27156352 CCCGG CCTTTG 80 hg38 HOXA7 CCCGGGGATGT GGATG CGTCC TTTGGTCGTAG TTTTGG GGCTA GAGGCGCAGG TCGT CGG GCAGGTTGCCG (SEQ ID (SEQ ID TAGGCGTCGGC NO: 37) NO: 64) GCCCAGGCCGT AGCCGGACGCA AAGG (SEQ ID NO: 91) 245_1B HOXA7 7 27156291 27156403 GTAGG CCTCG 113 hg38 HOXA7 GTAGGAGGCGC AGGCG ACCGT AGGGCAGGTTG CAGGG TCCGG CCGTAGGCGTC CAGGT GCTTA GGCGCCCAGGC (SEQ ID (SEQ ID CGTAGCCGGAC NO: 38) NO: 65) GCAAAGGGGCT CTGATAAAGGG GGCTGTTGACA TTGTATAAGCC CGGAACGGTCG AGG (SEQ ID NO: 92) 253_1B MIR196B 7 27169630 27169719 CCAAG GCCTG 90 hg38 MIR196B CCAAGGAGAG GAGAG GGGCA AACCCTGCCAT AACCC CTCTGT CGCGCCTGGCC TGCCA TGCAC CGGCCCAGCCC TCG T (SEQ AGCCCCTAGGC (SEQ ID ID NO: AACCTGCGCCC NO: 2) 16) GCCAGTGCAAC AGAGTGCCCCA GGC (SEQ ID NO: 93) 279_1B TNFRSF10D 8 23163949 23164031 GGTGG TGGCA 83 hg38 TNFRSF10D; GGTGGATCGAA ATCGA GTGTA TNFRSF10D AGCGCCAAAA AAGCG GCTGC ATCAATCAGAA CCAAA GAGAA ATCGTCCCCGT (SEQ ID CC (SEQ AGTTTGTGCGC NO: 39) ID NO: GTGCAAAGGTT 66) CTCGCAGCTAC ACTGCCA (SEQ ID NO: 94) 281_1B TNFRSF10D 8 23163995 23164099 TTGTGC GCGGG 105 hg38 TNFRSF10D TTGTGCGCGTG GCGTG AAGGG CAAAGGTTCTC CAAAG AGTAC GCAGCTACACT GTTC AACTG GCCAGAATAGA (SEQ ID ACC ACGTGCTCCTC NO: 3) (SEQ ID CGCTTTTATAC NO: 17) CCCGGAAAAA AGGCGTGGTCA GTTGTACTCCC TTCCCGC (SEQ ID NO: 95) 29_1B NA 10 100830555 100830658 AGCGG CTATG 104 hg38 NA AGCGGCTGAAA CTGAA CAGGA TTGGTGCGCCT ATTGG ACCCG TGTGCTGTGGT 
TGCGC CCGAC CTGGGTGTGTC C (SEQ CG (SEQ CCGGAGAGGG ID NO: ID NO: CGCGCAGGCGC 40) 67) CTATGTCTGTC GCGGGGCGGTC GGCGGGTTCCT GCATAG (SEQ ID NO: 96) 44_1B ME3 11 86672189 86672296 AGATC CAAGC 108 hg38 ME3; AGATCCGGTGC CGGTG CACCA ME3; GGGTGACAGCC CGGGT CGCGG ME3 GGCGCCACCCC GACAG GATTA TGCCCCCATCC (SEQ ID (SEQ ID CTGTGAAAAAG NO: 41) NO: 68) AGGCGACTGCG CGGCGAGGGGT CCCCGTACCCC TAATCCCGCGT GGTGGCTTG (SEQ ID NO: 97) 51_1B ME3 11 86672338 86672429 GAGCT CCAGC 92 hg38 ME3; GAGCTGAGGTC GAGGT GCGGT ME3; TACGCGGTCCC CTACG CCACC ME3; GCTGCGGAGCA CGGTC CATTG ME3 GGCGGGGTGA CC (SEQ (SEQ ID GGAGCTGCGGT ID NO: NO: 69) CTGTGAGTCCT 42) CTCCCGCCAAT GGGTGGACCGC GCTGG (SEQ ID NO: 98) 69_1B TXNRD1 12 104215675 104215784 GGCTA GGAAC 110 hg38 TXNRD1 GGCTATGACTT TGACTT TGACG CGCTGTTGTCA CGCTG GAGCC CCGAGCGCCCC TTGTCA GAAGG GCCCACCGCGT CC (SEQ A (SEQ TCTCCGACCCG ID NO: ID NO: CGGCCGGCAGG 43) 70) GGGCTCGCGGC CTCCGCCAGGC GTCCTTCGGCT CCGTCAGTTCC (SEQ ID NO: 99) 87_1B HAPLN3 15 88895676 8889581 GTCTCC CCCCTC 106 hg38 HAPLN3 GTCTCCAGACT AGACT CCCAA CGCTGGGAACC CGCTG ACTCTC ACCGCAAAGA GGAAC CTATTC GGGTGTGCAAG CAC CA (SEQ AGTTGAGGCCC (SEQ ID ID NO: TCACGTCTTGG NO: 44) 71) GAAAGGAGAG TAGGGGTGGAA TAGGAGAGTTT GGGGAGGGG (SEQ ID NO: 100) 93_1B CYBA 16 88651139 88651205 CATCT TATGC 67 hg38 CYBA CATCTGTAGGG GTAGG CTCGG TGCAGGGCTGT GTGCA CGTGG CCCGGAGCCTT GGGCT CTAGA CTGCCCCCGCC GTCC GAGG CTCTCTAGCCA (SEQ ID (SEQ ID CGCCGAGGCAT NO: 45) NO: 72) A (SEQ ID NO: 101) Norm_R1_A133 ZNF568 19 36916284 36916453 TGTGTT CGAAT 170 hg38 37407284 TGTGTTCTGGC CTGGC GTTCAT CGGAAGTTGAG CGGAA CCCGC TGGGGCCGCGG GTTGA GCGCA GGCCTGCTGGG GTG GTT AGGTGTTGTCC (SEQ ID (SEQ ID TCGGAAACGTC NO: 46) NO: 73) GCTGGCGCGGA GGGATGGTTCG GCGCTTTAGGC GTCTGTCACAG ACCTATCTGCG GGTCGCCTTCA CCC AGCATCTCAGA AACTGCGCGCG GGATGAACATT CG (SEQ ID NO: 102) Norm_R1_A151 ZNF582 19 56393606 56393725 TCCGG GAACA 12 hg38 ZNF582 TCCGGGAAACA GAAAC GCACT TAGTCTTTAGG ATAGT CCTCC CGTAAAGGCAG CTTTAG GCGCA CAGCCCGGCCT GCGT CTG TGAAGCCGGAT (SEQ ID (SEQ ID CTCGCGATGTT 
NO: 47) NO: 74) TCAGGGTGAGC CGGACGCAGGC GTGCCTGCGCA GTGCGCGGAGG AGTGCTGTTC (SEQ ID NO: 103) Norm_R1_A156 ZNF471 19 56507527 56507675 CCCCA GCGGG 149 hg38 ZNF471 CCCCACGCGTA CGCGT TAAGA CTCACACCGAA ACTCA GCAGG GGCTCAGCCGT CACCG AGTGT CGCGCGTTTCC AAG G (SEQ CTCCCAGGCCC (SEQ ID ID NO: CAGGAACTAGT NO: 48) 75) AACTAGGGACG CTTCTGGTCTC TAGGCGAGGA GAGGGGGAGA GCGCAATCTTT GCGCCTGCGCA CACTCCTGCTC TTACCCGC (SEQ ID NO: 104) Norm_R1_A157 ZNF471 19 56507558 56507675 GTCGC GCGGG 118 hg38 ZNF471 GTCGCGCGTTT GCGTTT TAAGA CCCTCCCAGGC CCCTCC GCAGG CCCAGGAACTA CAG AGTGT GTAACTAGGGA (SEQ ID G (SEQ CGCTTCTGGTC NO: 49) ID NO: TCTAGGCGAGG 75) AGAGGGGGAG AGCGCAATCTT TGCGCCTGCGC ACACTCCTGCT CTTACCCGC (SEQ ID NO: 105) Norm_R1_A158 ZNF471 19 56507662 56507750 CTGCTC GAGGG 89 hg38 ZNF471 CTGCTCTTACC TTACCC ACCTT CGCCGGAACCC GCCGG AGAGC TGGGCCACGCC AACCC AGAGC CGGCTCGCGTA TG (SEQ GGGC ATCACGCACTG ID NO: (SEQ ID CGCAGGCACCG 50) NO: 76) CCCGCTCTGCT CTAAGGTCCCT C (SEQ ID NO: 106) Norm_R1_A171 THBD 20 23049354 23049500 TCTGA TTGGG 147 hg38 THBD TCTGACTGGCA CTGGC GTCCT TTGAGGAAGGT ATTGA GGTCC CGCGGGGCCCG GGAAG TTGGC GGTAGAGCGCG GTCG GC (SEQ AAGCAGTCGTG (SEQ ID ID NO: CTCGACGCACT NO: 51) 77) GGCTGCCACCC GGCTGCGGCTC TGCGGGTGCGG GGAACCCCAGG CCGGCCAGGGC CAGCGCGCCAA GGACCAGGACC CCAA (SEQ ID NO: 107) Norm_R1_A187 JAM2 21 25640320 25640399 CCGCG GAATT 80 hg38 JAM2 CCGCGTGGTCT TGGTCT CCCTCC GGGCTCTGTAG GGGCT ACCTC CGTCCCAGCTG CTGTA CGCCC AGCCGGCGATA G (SEQ CAC TGCAGCGCAGT ID NO: (SEQ ID TGTGGGGCGGA 52) NO: 78) GGTGGAGGGA ATTC (SEQ ID NO: 108) Top50_Assay44 SDC2 8 96493985 96494062 CTTCA GAACG 78 hg38 SDC2 CTTCAGAGAGC GAGAG CGGCG AGCCTTCCCGG CAGCC CCCTCT AGCACCAACTC TTCCCG CACTT CGTGTCGGGAG G (SEQ (SEQ ID TGCAGAAACCA ID NO: NO: 79) ACAAGTGAGA 53) GGGCGCCGCGT TC (SEQ ID NO: 109) Top50_Assay65 FGF14 13 102394577 102394651 CAACG CTCGC 75 hg38 FGF14 CAACGGAAACT GAAAC CGGGG TCCCGCGCTAC TTCCCG GCTTC GGCGGCTCCAA CGCTA GCTAC CGGGCCGCTTC C (SEQ (SEQ ID CGCCGCATTGC ID NO: NO: 80) GTAGCGAAGCC 54) CCCGGCGAG (SEQ ID NO: 
110) UDX_230 CDKN2A 9 21970919 21971017 GCATC CGTGG 99 hg38 CDKN2A GCATCTATGCG TATGC ACCTG GGCATGGTTAC GGGCA GCTGA TGCCTCTGGTG TGGTT GGAGC CCCCCCGCAGC ACTG TG (SEQ CGCGCGCAGGT (SEQ ID ID NO: ACCGTGCGACA NO: 55) 81) TCGCGATGGCC CAGCTCCTCAG CCAGGTCCACG (SEQ ID NO: 111)
(146) TABLE-US-00017 SEQUENCE TABLE 2
Gene  chr  start  end  primer_F  primer_R  width  genome  annotations  Sequence
HOXA7  7  27156273  27156352  CCCGGGGATGTTTTGGTCGT (SEQ ID NO: 37)  CCTTTGCGTCCGGCTACGG (SEQ ID NO: 64)  80  hg38  HOXA7  CCCGGGGATGTTTTGGTCGTAGGAGGCGCAGGGCAGGTTGCCGTAGGCGTCGGCGCCCAGGCCGTAGCCGGACGCAAAGG (SEQ ID NO: 91)
NA  10  100830555  100830658  AGCGGCTGAAATTGGTGCGCC (SEQ ID NO: 40)  CTATGCAGGAACCCGCCGACCG (SEQ ID NO: 67)  104  hg38  NA  AGCGGCTGAAATTGGTGCGCCTTGTGCTGTGGTCTGGGTGTGTCCCGGAGAGGGCGCGCAGGCGCCTATGTCTGTCGCGGGGCGGTCGGCGGGTTCCTGCATAG (SEQ ID NO: 96)
JAM2  21  25640320  25640399  CCGCGTGGTCTGGGCTCTGTAG (SEQ ID NO: 52)  GAATTCCCTCCACCTCCGCCCCAC (SEQ ID NO: 78)  80  hg38  JAM2  CCGCGTGGTCTGGGCTCTGTAGCGTCCCAGCTGAGCCGGCGATATGCAGCGCACTTGTGGGGCGGAGGTGGAGGGAATTC (SEQ ID NO: 108)
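Each row of the sequence tables carries redundant information: the width should equal end minus start plus one and should also match the length of the listed amplicon, and each amplicon should begin with its forward primer. The following is a minimal sketch of such a consistency check applied to the three rows of Sequence Table 2; the record layout and function name are hypothetical and serve only to illustrate the check.

```python
# (gene, start, end, width, primer_F, amplicon) taken from Sequence Table 2.
ROWS = [
    ("HOXA7", 27156273, 27156352, 80, "CCCGGGGATGTTTTGGTCGT",
     "CCCGGGGATGTTTTGGTCGTAGGAGGCGCAGGGCAGGTTGCCGTAGGCGTCGG"
     "CGCCCAGGCCGTAGCCGGACGCAAAGG"),
    ("NA", 100830555, 100830658, 104, "AGCGGCTGAAATTGGTGCGCC",
     "AGCGGCTGAAATTGGTGCGCCTTGTGCTGTGGTCTGGGTGTGTCCCGGAGAGGG"
     "CGCGCAGGCGCCTATGTCTGTCGCGGGGCGGTCGGCGGGTTCCTGCATAG"),
    ("JAM2", 25640320, 25640399, 80, "CCGCGTGGTCTGGGCTCTGTAG",
     "CCGCGTGGTCTGGGCTCTGTAGCGTCCCAGCTGAGCCGGCGATATGCAGCGCAC"
     "TTGTGGGGCGGAGGTGGAGGGAATTC"),
]


def check_row(row):
    """Return a dict of consistency flags for one table row."""
    gene, start, end, width, primer_f, amplicon = row
    return {
        "gene": gene,
        # Coordinates are treated as 1-based and inclusive.
        "width_matches_coordinates": end - start + 1 == width,
        "width_matches_sequence": len(amplicon) == width,
        "starts_with_forward_primer": amplicon.startswith(primer_f),
    }


results = [check_row(row) for row in ROWS]
for result in results:
    print(result)
```

A check of this kind flags transcription problems such as a dropped digit in a coordinate or a truncated sequence.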
(147) TABLE-US-00018 SEQUENCE TABLE 3
Gene  chr  start  end  primer_F  primer_R  width  genome  annotations  Sequence
KLK10  19  51019613  51019705  GCAGGTAGCTTCACCTGGGAGTCG (SEQ ID NO: 30)  AGAGGATACCAGCGGCAGACCACA (SEQ ID NO: 57)  93  hg38  KLK10; KLK10; KLK10; KLK10  GCAGGTAGCTTCACCTGGGAGTCGCCGATAGGAAGGAGGGAGGGGACCCAGACGTGCCTCTGCCCTGCCTGTGGTCTGCCGCTGGTATCCTCT (SEQ ID NO: 83)
HOXA7  7  27156291  27156403  GTAGGAGGCGCAGGGCAGGT (SEQ ID NO: 38)  CCTCGACCGTTCCGGGCTTA (SEQ ID NO: 65)  113  hg38  HOXA7  GTAGGAGGCGCAGGGCAGGTTGCCGTAGGCGTCGGCGCCCAGGCCGTAGCCGGACGCAAAGGGGCTCTGATAAAGGGGGCTGTTGACATTGTATAAGCCCGGAACGGTCGAGG (SEQ ID NO: 92)
NA  10  100830555  100830658  AGCGGCTGAAATTGGTGCGCC (SEQ ID NO: 40)  CTATGCAGGAACCCGCCGACCG (SEQ ID NO: 67)  104  hg38  NA  AGCGGCTGAAATTGGTGCGCCTTGTGCTGTGGTCTGGGTGTGTCCCGGAGAGGGCGCGCAGGCGCCTATGTCTGTCGCGGGGCGGTCGGCGGGTTCCTGCATAG (SEQ ID NO: 96)
TXNRD1  12  104215675  104215784  GGCTATGACTTCGCTGTTGTCACC (SEQ ID NO: 43)  GGAACTGACGGAGCCGAAGGA (SEQ ID NO: 70)  110  hg38  TXNRD1  GGCTATGACTTCGCTGTTGTCACCGAGCGCCCCGCCCACCGCGTTCTCCGACCCGCGGCCGGCAGGGGGCTCGCGGCCTCCGCCAGGCGTCCTTCGGCTCCGTCAGTTCC (SEQ ID NO: 99)
THBD  20  23049354  23049500  TCTGACTGGCATTGAGGAAGGTCG (SEQ ID NO: 51)  TTGGGGTCCTGGTCCTTGGCGC (SEQ ID NO: 77)  147  hg38  THBD  TCTGACTGGCATTGAGGAAGGTCGCGGGGCCCGGGTAGAGCGCGAAGCAGTCGTGCTCGACGCACTGGCTGCCACCCGGCTGCGGCTCTGCGGGTGCGGGGAACCCCAGGCCGGCCAGGGCCAGCGCGCCAAGGACCAGGACCCCAA (SEQ ID NO: 107)
(148) TABLE-US-00019 SEQUENCE TABLE 4
Gene  chr  start  end  primer_F  primer_R  width  genome  annotations  Sequence
HOXD8  2  176129896  176130001  GGGGTTTGTAAACCGAGGCCAGAG (SEQ ID NO: 31)  CGCTGCCTCCACTGTTTCCTCTCA (SEQ ID NO: 58)  106  hg38  HOXD8; HOXD8  GGGGTTTGTAAACCGAGGCCAGAGTGTCCCCGTGGGCCGAGCGCACTTTTTTCTTGTCCGGGTGCGCTCAGTCACTGGTGCCTGAGAGGAAACAGTGGAGGCAGCG (SEQ ID NO: 84)
SCGN  6  25652114  25652232  TCCCCAAAGCGCAGAGACAGA (SEQ ID NO: 36)  CGTCCCTCAGCCCTCAGCAA (SEQ ID NO: 63)  119  hg38  SCGN  CCCCAAAGCGCAGAGACAGACAGGATCTGCCAGGACAGCGCGCAGGGCGGGGCGGGGACAGGCGCGCCAGGAGCGGGGCGGGCTTCCAGCCGCTGGTTTTGCTGAGGGCTGAGGGACG (SEQ ID NO: 89)
HOXA7  7  27156291  27156403  GTAGGAGGCGCAGGGCAGGT (SEQ ID NO: 38)  CCTCGACCGTTCCGGGCTTA (SEQ ID NO: 65)  113  hg38  HOXA7  GTAGGAGGCGCAGGGCAGGTTGCCGTAGGCGTCGGCGCCCAGGCCGTAGCCGGACGCAAAGGGGCTCTGATAAAGGGGGCTGTTGACATTGTATAAGCCCGGAACGGTCGAGG (SEQ ID NO: 92)
MIR196B  7  27169630  27169719  CCAAGGAGAGAACCCTGCCATCG (SEQ ID NO: 2)  GCCTGGGGCACTCTGTTGCACT (SEQ ID NO: 16)  90  hg38  MIR196B  CCAAGGAGAGAACCCTGCCATCGCGCCTGGCCCGGCCCAGCCCAGCCCCTAGGCAACCTGCGCCCGCCAGTGCAACAGAGTGCCCCAGGC (SEQ ID NO: 93)
NA  10  100830555  100830658  AGCGGCTGAAATTGGTGCGCC (SEQ ID NO: 40)  CTATGCAGGAACCCGCCGACCG (SEQ ID NO: 67)  104  hg38  NA  AGCGGCTGAAATTGGTGCGCCTTGTGCTGTGGTCTGGGTGTGTCCCGGAGAGGGCGCGCAGGCGCCTATGTCTGTCGCGGGGCGGTCGGCGGGTTCCTGCATAG (SEQ ID NO: 96)
TXNRD1  12  104215675  104215784  GGCTATGACTTCGCTGTTGTCACC (SEQ ID NO: 43)  GGAACTGACGGAGCCGAAGGA (SEQ ID NO: 70)  110  hg38  TXNRD1  GGCTATGACTTCGCTGTTGTCACCGAGCGCCCCGCCCACCGCGTTCTCCGACCCGCGGCCGGCAGGGGGCTCGCGGCCTCCGCCAGGCGTCCTTCGGCTCCGTCAGTTCC (SEQ ID NO: 99)
HAPLN3  15  88895676  88895781  GTCTCCAGACTCGCTGGGAACCAC (SEQ ID NO: 44)  CCCCTCCCCAAACTCTCCTATTCCA (SEQ ID NO: 71)  106  hg38  HAPLN3  GTCTCCAGACTCGCTGGGAACCACCGCAAAGAGGGTGTGCAAGAGTTGAGGCCCTCACGTCTTGGGAAAGGAGAGTAGGGGTGGAATAGGAGAGTTTGGGGAGGGG (SEQ ID NO: 100)
THBD  20  23049354  23049500  TCTGACTGGCATTGAGGAAGGTCG (SEQ ID NO: 51)  TTGGGGTCCTGGTCCTTGGCGC (SEQ ID NO: 77)  147  hg38  THBD  TCTGACTGGCATTGAGGAAGGTCGCGGGGCCCGGGTAGAGCGCGAAGCAGTCGTGCTCGACGCACTGGCTGCCACCCGGCTGCGGCTCTGCGGGTGCGGGGAACCCCAGGCCGGCCAGGGCCAGCGCGCCAAGGACCAGGACCCCAA (SEQ ID NO: 107)
SDC2  8  96493985  96494062  CTTCAGAGAGCAGCCTTCCCGG (SEQ ID NO: 53)  GAACGCGGCGCCCTCTCACTT (SEQ ID NO: 79)  78  hg38  SDC2  CTTCAGAGAGCAGCCTTCCCGGAGCACCAACTCCGTGTCGGGAGTGCAGAAACCAACAAGTGAGAGGGCGCCGCGTTC (SEQ ID NO: 109)
(149) TABLE-US-00020 SEQUENCE TABLE 5
Gene  chr  start  end  primer_F  primer_R  width  genome  annotations  Sequence
MAST1  19  12867716  12867820  CCCCCTCCATGCAGCAAGC (SEQ ID NO: 29)  CTCCAGCAGCGCCGAGAAAC (SEQ ID NO: 56)  105  hg38  MAST1  CCCCCTCCATGCAGCAAGCGATTCTCCGCGTCCGAGGCCAGTTTCCTGGAGGGAGAGGCCAGTCCCCCTTTGGGCGCCCGCCGCCGTTTCTCGGCGCTGCTGGAG (SEQ ID NO: 82)
HOXD8  2  176129896  176130001  GGGGTTTGTAAACCGAGGCCAGAG (SEQ ID NO: 31)  CGCTGCCTCCACTGTTTCCTCTCA (SEQ ID NO: 58)  106  hg38  HOXD8; HOXD8  GGGGTTTGTAAACCGAGGCCAGAGTGTCCCCGTGGGCCGAGCGCACTTTTTTCTTGTCCGGGTGCGCTCAGTCACTGGTGCCTGAGAGGAAACAGTGGAGGCAGCG (SEQ ID NO: 84)
PREX1  20  48828337  48828448  TCACCGCGGGCTACGCCACT (SEQ ID NO: 33)  GCTTTCCCAGCCCGGTGTTT (SEQ ID NO: 60)  112  hg38  PREX1  TCACCGCGGGCTACGCCACTCCCACCCGGCACACGCGACACCCGCCGCGCGCAGGCTCCTGCTTGCAGGTCCGGCCGCTGCTCGGGCCAAGTAAACACCGGGCTGGGAAAGC (SEQ ID NO: 86)
AMOTL2  3  134364403  134364512  CCATGGCTTCCTTTCTTTGGCAGA (SEQ ID NO: 34)  TTCCCTGAGGAGTCTGGGGAGGAG (SEQ ID NO: 61)  110  hg38  AMOTL2  CCATGGCTTCCTTTCTTTGGCAGAGTCAGGCTCCAGAAGTCCGCCTTCCTCCACAGGCACCCTAATCTGCCGTGCCCTTGCAGCTTCTCCTCCCCAGACTCCTCAGGGAA (SEQ ID NO: 87)
SCGN  6  25652114  25652232  TCCCCAAAGCGCAGAGACAGA (SEQ ID NO: 36)  CGTCCCTCAGCCCTCAGCAA (SEQ ID NO: 63)  119  hg38  SCGN  CCCCAAAGCGCAGAGACAGACAGGATCTGCCAGGACAGCGCGCAGGGCGGGGCGGGGACAGGCGCGCCAGGAGCGGGGCGGGCTTCCAGCCGCTGGTTTTGCTGAGGGCTGAGGGACG (SEQ ID NO: 89)
HOXA7  7  27155916  27156027  TCGAACCCATTAATTGGGCCATA (SEQ ID NO: 1)  CGGCGCAGCCTTTCTGGTTT (SEQ ID NO: 15)  112  hg38  HOXA7  TCGAACCCATTAATTGGGCCATAAAAAGTTTTATGAGCCTCATTTACATACAATGCTATGGGCTCCACGCAATGGCGCCTCCGCTCCAATTAAAACCAGAAAGGCTGCGCCG (SEQ ID NO: 90)
MIR196B  7  27169630  27169719  CCAAGGAGAGAACCCTGCCATCG (SEQ ID NO: 2)  GCCTGGGGCACTCTGTTGCACT (SEQ ID NO: 16)  90  hg38  MIR196B  CCAAGGAGAGAACCCTGCCATCGCGCCTGGCCCGGCCCAGCCCAGCCCCTAGGCAACCTGCGCCCGCCAGTGCAACAGAGTGCCCCAGGC (SEQ ID NO: 93)
TNFRSF10D  8  23163949  23164031  GGTGGATCGAAAGCGCCAAA (SEQ ID NO: 39)  TGGCAGTGTAGCTGCGAGAACC (SEQ ID NO: 66)  83  hg38  TNFRSF10D; TNFRSF10D  GGTGGATCGAAAGCGCCAAAAATCAATCAGAAATCGTCCCCGTAGTTTGTGCGCGTGCAAAGGTTCTCGCAGCTACACTGCCA (SEQ ID NO: 94)
NA  10  100830555  100830658  AGCGGCTGAAATTGGTGCGCC (SEQ ID NO: 40)  CTATGCAGGAACCCGCCGACCG (SEQ ID NO: 67)  104  hg38  NA  AGCGGCTGAAATTGGTGCGCCTTGTGCTGTGGTCTGGGTGTGTCCCGGAGAGGGCGCGCAGGCGCCTATGTCTGTCGCGGGGCGGTCGGCGGGTTCCTGCATAG (SEQ ID NO: 96)
TXNRD1  12  104215675  104215784  GGCTATGACTTCGCTGTTGTCACC (SEQ ID NO: 43)  GGAACTGACGGAGCCGAAGGA (SEQ ID NO: 70)  110  hg38  TXNRD1  GGCTATGACTTCGCTGTTGTCACCGAGCGCCCCGCCCACCGCGTTCTCCGACCCGCGGCCGGCAGGGGGCTCGCGGCCTCCGCCAGGCGTCCTTCGGCTCCGTCAGTTCC (SEQ ID NO: 99)
HAPLN3  15  88895676  88895781  GTCTCCAGACTCGCTGGGAACCAC (SEQ ID NO: 44)  CCCCTCCCCAAACTCTCCTATTCCA (SEQ ID NO: 71)  106  hg38  HAPLN3  GTCTCCAGACTCGCTGGGAACCACCGCAAAGAGGGTGTGCAAGAGTTGAGGCCCTCACGTCTTGGGAAAGGAGAGTAGGGGTGGAATAGGAGAGTTTGGGGAGGGG (SEQ ID NO: 100)
CYBA  16  88651139  88651205  CATCTGTAGGGTGCAGGGCTGTCC (SEQ ID NO: 45)  TATGCCTCGGCGTGGCTAGAGAGG (SEQ ID NO: 72)  67  hg38  CYBA  CATCTGTAGGGTGCAGGGCTGTCCCGGAGCCTTCTGCCCCCGCCCTCTCTAGCCACGCCGAGGCATA (SEQ ID NO: 101)
ZNF582  19  56393606  56393725  TCCGGGAAACATAGTCTTTAGGCGT (SEQ ID NO: 47)  GAACAGCACTCCTCCGCGCACTG (SEQ ID NO: 74)  120  hg38  ZNF582  TCCGGGAAACATAGTCTTTAGGCGTAAAGGCAGCAGCCCGGCCTTGAAGCCGGATCTCGCGATGTTTCAGGGTGAGCCGGACGCAGGCGTGCCTGCGCAGTGCGCGGAGGAGTGCTGTTC (SEQ ID NO: 103)
THBD  20  23049354  23049500  TCTGACTGGCATTGAGGAAGGTCG (SEQ ID NO: 51)  TTGGGGTCCTGGTCCTTGGCGC (SEQ ID NO: 77)  147  hg38  THBD  TCTGACTGGCATTGAGGAAGGTCGCGGGGCCCGGGTAGAGCGCGAAGCAGTCGTGCTCGACGCACTGGCTGCCACCCGGCTGCGGCTCTGCGGGTGCGGGGAACCCCAGGCCGGCCAGGGCCAGCGCGCCAAGGACCAGGACCCCAA (SEQ ID NO: 107)
SDC2  8  96493985  96494062  CTTCAGAGAGCAGCCTTCCCGG (SEQ ID NO: 53)  GAACGCGGCGCCCTCTCACTT (SEQ ID NO: 79)  78  hg38  SDC2  CTTCAGAGAGCAGCCTTCCCGGAGCACCAACTCCGTGTCGGGAGTGCAGAAACCAACAAGTGAGAGGGCGCCGCGTTC (SEQ ID NO: 109)
FGF14  13  102394577  102394651  CAACGGAAACTTCCCGCGCTAC (SEQ ID NO: 54)  CTCGCCGGGGGCTTCGCTAC (SEQ ID NO: 80)  75  hg38  FGF14  CAACGGAAACTTCCCGCGCTACGGCGGCTCCAACGGGCCGCTTCCGCCGCATTGCGTAGCGAAGCCCCCGGCGAG (SEQ ID NO: 110)
CDKN2A  9  21970919  21971017  GCATCTATGCGGGCATGGTTACTG (SEQ ID NO: 55)  CGTGGACCTGGCTGAGGAGCTG (SEQ ID NO: 81)  99  hg38  CDKN2A  GCATCTATGCGGGCATGGTTACTGCCTCTGGTGCCCCCCGCAGCCGCGCGCAGGTACCGTGCGACATCGCGATGGCCCAGCTCCTCAGCCAGGTCCACG (SEQ ID NO: 111)
(150) TABLE-US-00021 SEQUENCE TABLE 6
annotations  chr  start  end  Width  genome  primer_F  primer_R  annotations  sequence
DNM3  1  171841774  171841857  84  hg38  CAGAGCGCCGGCAAGAGC (SEQ ID NO: 4)  CCCCACTGCCGCATCCTTAC (SEQ ID NO: 18)  DNM3  CAGAGCGCCGGCAAGAGCTCGGTGCTCGAGAACTTCGTGGGCAGGTAAGCGCGCAGGGCGCGGAGTAAGGATGCGGCAGTGGGG (SEQ ID NO: 112)
NA  5  73436635  73436710  76  hg38  CAGGCTGGGCGGTCTTTGAC (SEQ ID NO: 8)  GGGGTAGCGGGTGCTTCCAG (SEQ ID NO: 22)  NA  CAGGCTGGGCGGTCTTTGACCCCCCGCGCCTCCCGCCCACAGCCGGAGCCCGGCAGCTGGAAGCACCCGCTACCCC (SEQ ID NO: 113)
HOXA7  7  27155916  27156027  112  hg38  TCGAACCCATTAATTGGGCCATA (SEQ ID NO: 1)  CGGCGCAGCCTTTCTGGTTT (SEQ ID NO: 15)  HOXA7  TCGAACCCATTAATTGGGCCATAAAAAGTTTTATGAGCCTCATTTACATACAATGCTATGGGCTCCACGCAATGGCGCCTCCGCTCCAATTAAAACCAGAAAGGCTGCGCCG (SEQ ID NO: 90)
MIR196B  7  27169630  27169719  90  hg38  CCAAGGAGAGAACCCTGCCATCG (SEQ ID NO: 2)  GCCTGGGGCACTCTGTTGCACT (SEQ ID NO: 16)  MIR196B  CCAAGGAGAGAACCCTGCCATCGCGCCTGGCCCGGCCCAGCCCAGCCCCTAGGCAACCTGCGCCCGCCAGTGCAACAGAGTGCCCCAGGC (SEQ ID NO: 93)
DLX6-AS1  7  97014186  97014266  81  hg38  CAAGACCTGGCGCATCTTTGC (SEQ ID NO: 9)  TTGCAGGCTGGATTAGGATGC (SEQ ID NO: 23)  DLX6-AS1  CAAGACCTGGCGCATCTTTGCAAATTACAGATAATTGTAAACGTCCAGATTATGATAATAGCATCCTAATCCAGCCTGCAA (SEQ ID NO: 114)
TNFRSF10D  8  23163995  23164099  105  hg38  TTGTGCGCGTGCAAAGGTTC (SEQ ID NO: 3)  GCGGGAAGGGAGTACAACTGACC (SEQ ID NO: 17)  TNFRSF10D  TTGTGCGCGTGCAAAGGTTCTCGCAGCTACACTGCCAGAATAGAACGTGCTCCTCCGCTTTTATACCCCGGAAAAAAGGCGTGGTCAGTTGTACTCCCTTCCCGC (SEQ ID NO: 95)
C1orf230  1  151721583  151721679  97  hg38  TTAGCGCAGCGCAGCTGGAG (SEQ ID NO: 5)  CCCAGTCCTGGGGCAGCTACA (SEQ ID NO: 19)  RIIAD1, CELF3  TTAGCGCAGCGCAGCTGGAGCAGCTGCGAAAATTCAAGGTGGGTGCGCCCGCGCCCCCATCCAGCGTCCACCAAAGTGTAGCTGCCCCAGGACTGGG (SEQ ID NO: 115)
DIO3OS, MIR1247  14  101561395  101561505  111  hg38  TCCGGGCTCAAGTTGCAAGG (SEQ ID NO: 10)  GCGAGGCATCTGGGCTTCAG (SEQ ID NO: 24)  DIO3OS, MIR1247  TCCGGGCTCAAGTTGCAAGGGGGCGGGCCGGGCCGGAGGTGGAGTCTCCCGCCAATTGAAGCCTCCGCTATAAATTGAACTCCCTGCACTGCTGAAGCCCAGATGCCTCGC (SEQ ID NO: 116)
GSG1L  16  28063861  28063964  104  hg38  CCGAAAGAAATCCGAGCCAGGGTGA (SEQ ID NO: 11)  GGTTTTGTTGCCCCACGTCC (SEQ ID NO: 25)  GSG1L  CCGAAAGAAATCCGAGCCAGGGTGAGGGTCTGAGACGCAAGGAGAATCCCAGGCAAGGCGCTCCTGAGAAAAGATCCCCACGGCGGACGTGGGGCAACAAAACC (SEQ ID NO: 117)
ZNF568  19  36916252  36916371  120  hg38  GCCCAAGCCTCACCCTCACACAG (SEQ ID NO: 6)  CGAACCATCCCTCCGCGCCA (SEQ ID NO: 20)  ZNF568  GCCCAAGCCTCACCCTCACACAGGAAAGCAGATGTGTTCTGGCCGGAAGTTGAGTGGGGCCGCGGGGCCTGCTGGGAGGTGTTGTCCTCGGAAACGTCGCTGGCGCGGAGGGATGGTTCG (SEQ ID NO: 118)
C9orf50  9  129620787  129620870  84  hg38  AGAGTAGCCAACTTTGGGGGTTGCT (SEQ ID NO: 7)  GGCACTGTACCGAGCTTGCTGTTCT (SEQ ID NO: 21)  C9orf50  AGAGTAGCCAACTTTGGGGGTTGCTGTGACGTTTAAATGAGCAAGTACATGCCAGTCTTAGAACAGCAAGCTCGGTACAGTGCC (SEQ ID NO: 119)
LONRF2  2  100322387  100322463  77  hg38  CTCTCAGTCCCGCCGGCTTAGGTA (SEQ ID NO: 12)  GCAAGAGACGCGGACCTGGAGC (SEQ ID NO: 26)  LONRF2  CTCTCAGTCCCGCCGGCTTAGGTAACCCAGGTCGCTGCGGTAACGCAGTGACCGCGCTCCAGGTCCGCGTCTCTTGC (SEQ ID NO: 120)
PCDH9  13  67231171  67231265  95  hg38  GCGTGCGAAGTCTCCTCTAGCGGA (SEQ ID NO: 13)  CTCAGGTTTCCAGGCGCGGCT (SEQ ID NO: 27)  PCDH9  GCGTGCGAAGTCTCCTCTAGCGGAGCGGGACCGGCCGCGGCGGTGGATCGTGGCGGTCCCTGCACTTCTGCTCCAGCCGCGCCTGGAAACCTGAG (SEQ ID NO: 121)
GFPT2  5  180353729  180353815  87  hg38  CGTAAGGGGCAGAGCGAGGGGT (SEQ ID NO: 14)  CTCAGATGGGAGCGCGGCAGGAA (SEQ ID NO: 28)  GFPT2  CGTAAGGGGCAGAGCGAGGGGTCCGGCATCACTCGCGCGCTCCGGAAACCCGCGTGAGCCGCTGTTCCTGCCGCGCTCCCATCTGAG (SEQ ID NO: 122)
(151) TABLE-US-00022 SEQUENCE TABLE 7
annotations  chr  start  end  Width  genome  primer_F  primer_R  annotations  sequence
HOXA7  7  27155916  27156027  112  hg38  TCGAACCCATTAATTGGGCCATA (SEQ ID NO: 1)  CGGCGCAGCCTTTCTGGTTT (SEQ ID NO: 15)  HOXA7  TCGAACCCATTAATTGGGCCATAAAAAGTTTTATGAGCCTCATTTACATACAATGCTATGGGCTCCACGCAATGGCGCCTCCGCTCCAATTAAAACCAGAAAGGCTGCGCCG (SEQ ID NO: 90)
MIR196B  7  27169630  27169719  90  hg38  CCAAGGAGAGAACCCTGCCATCG (SEQ ID NO: 2)  GCCTGGGGCACTCTGTTGCACT (SEQ ID NO: 16)  MIR196B  CCAAGGAGAGAACCCTGCCATCGCGCCTGGCCCGGCCCAGCCCAGCCCCTAGGCAACCTGCGCCCGCCAGTGCAACAGAGTGCCCCAGGC (SEQ ID NO: 93)
TNFRSF10D  8  23163995  23164099  105  hg38  TTGTGCGCGTGCAAAGGTTC (SEQ ID NO: 3)  GCGGGAAGGGAGTACAACTGACC (SEQ ID NO: 17)  TNFRSF10D  TTGTGCGCGTGCAAAGGTTCTCGCAGCTACACTGCCAGAATAGAACGTGCTCCTCCGCTTTTATACCCCGGAAAAAAGGCGTGGTCAGTTGTACTCCCTTCCCGC (SEQ ID NO: 95)
DNM3  1  171841774  171841857  84  hg38  CAGAGCGCCGGCAAGAGC (SEQ ID NO: 4)  CCCCACTGCCGCATCCTTAC (SEQ ID NO: 18)  DNM3  CAGAGCGCCGGCAAGAGCTCGGTGCTCGAGAACTTCGTGGGCAGGTAAGCGCGCAGGGCGCGGAGTAAGGATGCGGCAGTGGGG (SEQ ID NO: 112)
C1orf230  1  151721583  151721679  97  hg38  TTAGCGCAGCGCAGCTGGAG (SEQ ID NO: 5)  CCCAGTCCTGGGGCAGCTACA (SEQ ID NO: 19)  RIIAD1, CELF3  TTAGCGCAGCGCAGCTGGAGCAGCTGCGAAAATTCAAGGTGGGTGCGCCCGCGCCCCCATCCAGCGTCCACCAAAGTGTAGCTGCCCCAGGACTGGG (SEQ ID NO: 115)
ZNF568  19  36916252  36916371  120  hg38  GCCCAAGCCTCACCCTCACACAG (SEQ ID NO: 6)  CGAACCATCCCTCCGCGCCA (SEQ ID NO: 20)  ZNF568  GCCCAAGCCTCACCCTCACACAGGAAAGCAGATGTGTTCTGGCCGGAAGTTGAGTGGGGCCGCGGGGCCTGCTGGGAGGTGTTGTCCTCGGAAACGTCGCTGGCGCGGAGGGATGGTTCG (SEQ ID NO: 118)
C9orf50  9  129620787  129620870  84  hg38  AGAGTAGCCAACTTTGGGGGTTGCT (SEQ ID NO: 7)  GGCACTGTACCGAGCTTGCTGTTCT (SEQ ID NO: 21)  C9orf50  AGAGTAGCCAACTTTGGGGGTTGCTGTGACGTTTAAATGAGCAAGTACATGCCAGTCTTAGAACAGCAAGCTCGGTACAGTGCC (SEQ ID NO: 119)