DIAGNOSTIC ASSAY FOR URINE MONITORING OF BLADDER CANCER
20230040907 · 2023-02-09
Inventors
- Trevor LEVIN (South San Francisco, CA, US)
- Carly KING (South San Francisco, CA, US)
- Kevin PHILLIPS (South San Francisco, CA, US)
- Martha EVANS-HOLM (South San Francisco, CA, US)
- Yueyao LI (South San Francisco, CA, US)
Cpc classification
G16B25/10
PHYSICS
G16B25/00
PHYSICS
G16B25/20
PHYSICS
International classification
G16B25/00
PHYSICS
G16B25/10
PHYSICS
Abstract
An improved diagnostic assay and methods relating to the same that are directed to mutation focused disease diagnosis and surveillance biomarker panels wherein potential genomic regions are selected based on their ability to encompass the genomic diversity of a patient population and maximize the number of unique markers monitored within each patient, while balancing these factors with empirical sequencing performance, geographic clustering of events within a region across diverse patients, and size and cost associated with measuring the respective genomic region. The methods also include quality control steps to reduce noise and
Claims
1. A method of diagnosing and/or monitoring bladder cancer and/or bladder cancer recurrence in a subject comprising: (a) contacting a urine sample from the subject with a preservation buffer, (b) extracting total nucleic acid from the buffered sample, (c) performing nucleic acid fragmentation of the extracted nucleic acid, (d) ligating sequencing adapters to the fragmented nucleic acid, (e) separating the fragmented adapter-ligated nucleic acid based on fragment size, (f) amplifying the fragmented adapter-ligated nucleic acid, (g) sequencing the amplified nucleic acid to obtain nucleic acid sequence data, and (h) detecting the presence or absence of at least one mutation or epigenetic alteration in the MML2 gene and optionally at least one other gene associated with bladder cancer in the nucleic acid sequence data, thereby diagnosing and/or monitoring bladder cancer and/or bladder cancer recurrence in a subject.
2. The method of claim 1, wherein the at least one other gene associated with bladder cancer is selected from the group consisting of KDM6A, TSC1, NOTCH2, PTEN, TP53, NOTCH1, CDKN2A, RB1, ATM, ERBB2, PIK3CA, FGFR3, EGFR, FGFR1, CREBBP, LRP1B, MYC, ARID1A, MLL3, BIRC3, WWOX, PALB2, SOX4, YAP1, CCND1, BCL2L1, MYCL1, MDM4, FGF3, MDM2, CCNE1, ZNF703, PRKCI, NCOR1, YWHAZ, PPARG, TBL1XR1, PDE4D, IKZF2, SPAG1, E2F3, NIT1, BEND3, GDI2, PVLR4, CCSER1, TERT Promoter, SPTAN1, HRAS, CTNNB1, FBXW7, EP300, RHOA, CCND3, NOS1AP, ELF3, PTPRD, STAG2, ERBB3, CDKN1A, NFE2L2, AIRE, BTG2, TTC28, IKZF3, FHIT, SHANK2, ERCC2, TPTE, KLF5, FOXA1, PON3, RXRA, ZFP36L1, GPC5, PCSK5, CTIF, FOXQ1, TIMM9, CX3CL1, TXNIP, RHOB, PAIP1, PHACTR1, CDKAL1, TACC3, ASXL2, HORMAD1, PHLDA3, MILPOL1, ZFR2, PIGH, WRB, MRO, STYX, MDFIC, ERMN, RND3 and a combination thereof.
3. The method of claim 1, further comprising: (i) using adapter sequences from step (d) to identify molecular clonal families within a diverse population of adapter ligated amplified nucleic acid molecules, and (ii) distinguishing amplification errors and sequencing errors from mutations or epigenetic alterations present in a gene, wherein in a clonal family a predominant base call at a location is defined as a true base call and a base call not present in a majority of the amplified nucleic acid molecules of a clonal family is replaced by the predominant call, wherein a comparison of the base call in a clonal family to a reference indicates the presence of a mutation or epigenetic alteration, and wherein a base call not present in a majority of the amplified nucleic acid molecules of a clonal family corresponds to an amplification and/or sequencing error.
4. The method of claim 1, wherein nucleic acid fragmentation is performed by a mechanical fragmentation technique such as ultrasonication, enzyme-based fragmentation, a restriction enzyme, and/or a cocktail of restriction enzymes.
5. The method of claim 1, wherein nucleic acid fragmentation is used to fragment nucleic acids that are greater than 1,000 bp.
6. The method of claim 1, wherein nucleic acid fragmentation is used to fragment nucleic acids that are greater than 1,000 bp to create fragments in the 500-600 bp range.
7. The method of claim 1, wherein nucleic acid fragmentation is used to fragment nucleic acids that are in the 5,000-10,000 bp range to create fragments in the 500-600 bp range.
8. The method of claim 1, wherein the adapter of step (d) comprises an 8-base pair sample barcode.
9. The method of claim 1, wherein the adapter of step (d) comprises one or more 6-10 nucleotide length sequences that are either degenerate or random or are a uniquely defined sequence.
10. The method of claim 1, wherein separating the fragmented adapter-ligated nucleic acid based on fragment size comprises passing the fragmented adapter-ligated nucleic acid over a size selection column, treatment of the fragmented adapter-ligated nucleic acid with carboxylated para-magnetic beads, capillary gel electrophoresis, gel electrophoresis and anion exchange.
11. A method of preparing nucleic acid from a urine sample for nucleic acid analysis comprising: (a) contacting the urine sample with a urine preservation buffer, (b) extracting total nucleic acid from the buffered sample, (c) performing nucleic acid fragmentation, (d) ligating sequencing adapters to the fragmented nucleic acid, (e) separating the fragmented adapter-ligated nucleic acid based on fragment size, (f) amplifying the fragmented adapter-ligated nucleic acid, and (g) sequencing the amplified nucleic acid, thereby preparing nucleic acid from a urine sample for nucleic acid analysis.
12. The method of claim 11, further comprising: (i) using adapter sequences from step (d) to identify molecular clonal families within a diverse population of adapter ligated amplified nucleic acid molecules, and (ii) distinguishing amplification errors and sequencing errors from mutations or epigenetic alterations present in a gene, wherein in a clonal family a predominant base call at a location is defined as a true base call and a base call not present in a majority of the amplified nucleic acid molecules of a clonal family is replaced by the predominant call, wherein a comparison of the base call in a clonal family to a reference indicates the presence of a mutation or epigenetic alteration, and wherein a base call not present in a majority of the amplified nucleic acid molecules of a clonal family corresponds to an amplification and/or sequencing error.
13. The method of claim 11, wherein said nucleic acid fragmentation is performed by a mechanical fragmentation technique such as ultrasonication, enzyme-based fragmentation, a restriction enzyme, and/or a cocktail of restriction enzymes.
14. The method of claim 11, wherein said nucleic acid fragmentation is used to fragment nucleic acids that are greater than 1,000 bp.
15. The method of claim 11, wherein said nucleic acid fragmentation is used to fragment nucleic acids that are greater than 1,000 bp to create fragments in the 500-600 bp range.
16. The method of claim 11, wherein said nucleic acid fragmentation is used to fragment nucleic acids that are in the 5,000-10,000 bp range.
17. The method of claim 11, wherein said nucleic acid fragmentation is used to fragment nucleic acids that are in the 5,000-10,000 bp range to create fragments in the 500-600 bp range.
18. The method of claim 11, wherein the adapter of step (d) comprises an 8-base pair sample barcode.
19. The method of claim 11, wherein the adapter of step (d) comprises one or more 6-10 nucleotide length sequences that are either degenerate or random or are a uniquely defined sequence.
20. The method of claim 11, wherein separating the fragmented adapter-ligated nucleic acid based on fragment size comprises passing the fragmented adapter-ligated nucleic acid over a size selection column, treatment of the fragmented adapter-ligated nucleic acid with carboxylated para-magnetic beads, capillary gel electrophoresis, gel electrophoresis and anion exchange.
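For illustration only, the clonal-family error correction recited in claims 3 and 12 can be sketched as below. This is a minimal sketch under stated assumptions, not the applicants' implementation: reads are assumed to be already aligned and of equal length within a family, the adapter-derived family tag is assumed to be available as a string, and the use of "N" where no base reaches a majority is an assumption introduced here. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_family(reads):
    """Collapse reads into one consensus sequence per clonal family.

    reads: iterable of (family_tag, sequence) pairs. Sequences sharing a
    tag are assumed to be aligned and of equal length (an assumption).
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        calls = []
        for position in range(len(seqs[0])):
            counts = Counter(seq[position] for seq in seqs)
            base, count = counts.most_common(1)[0]
            # Keep the predominant call only if it is present in a strict
            # majority of the family; otherwise emit "N" (an assumption).
            calls.append(base if 2 * count > len(seqs) else "N")
        consensus[tag] = "".join(calls)
    return consensus


def differences_from_reference(consensus_seq, reference):
    """List (position, reference_base, consensus_base) mismatches."""
    return [
        (i, ref, obs)
        for i, (ref, obs) in enumerate(zip(reference, consensus_seq))
        if obs != ref and obs != "N"
    ]


# Hypothetical reads: the third read of family AACCGGTT carries a minority
# call at position 1, which is replaced by the predominant call.
example_reads = [
    ("AACCGGTT", "ACGT"),
    ("AACCGGTT", "ACGT"),
    ("AACCGGTT", "AGGT"),
    ("TTGGCCAA", "ATGT"),
    ("TTGGCCAA", "ATGT"),
]
example_consensus = consensus_by_family(example_reads)
```

In this sketch a minority call inside a family is treated as an amplification or sequencing error, while a call shared by the majority of a family and differing from the reference is reported as a candidate alteration.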
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0030] These and other features, aspects, and advantages of the present embodiments will become better understood with reference to the following description and appended claims, and accompanying drawings.
DETAILED DESCRIPTION OF THE INVENTION
[0041] In the description that follows, a number of terms are extensively utilized. In order to provide a clearer and consistent understanding of the specification and claims, including the scope to be given such terms, the following definitions are provided.
[0042] The use of the word “a” or “an” when used in conjunction with the term “comprising,” “including,” “having” or “containing,” or other tenses thereof, in the claims and/or the specification may mean “one,” but is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one” or “a plurality.”
[0043] Throughout the written description hereof (which includes the claims), the term “about” is used to indicate that a value includes the standard deviation of error for the device or method being employed to determine the value.
[0044] The use of the term “or” in the claims is used to mean either “and” or “or” (“and/or”) unless explicitly indicated to refer to alternatives only or if alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and “and/or.”
[0045] As used in this specification and claim(s), the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited elements or method steps.
[0046] It also is specifically understood that any numerical value recited herein includes all values from the lower value to the upper value, inclusive of such values, and that all possible combinations of numerical values between the lowest value and the highest value enumerated are to be considered to be expressly stated in this written description (which includes the claims). For example, if a range is stated as 1% to 50%, it is intended that values such as 2% to 40%, 10% to 30%, or 1% to 3%, etc., are expressly enumerated in the specification and claims.
[0047] “Contacting” refers to the process of bringing into contact at least two distinct species such that they can react. It should be appreciated, however, that the resulting reaction product can be produced directly from a reaction between the added reagents or from an intermediate from one or more of the added reagents which can be produced in the reaction mixture.
[0048] A “Single Nucleotide Polymorphism” or “SNP” is a DNA sequence variation occurring when a single nucleotide at a specific location in the genome differs between members of a species or between paired chromosomes in an individual. Most SNP polymorphisms have two alleles. Each individual is in this instance either homozygous for one allele of the polymorphism (i.e., both chromosomal copies of the individual have the same nucleotide at the SNP location), or the individual is heterozygous (i.e., the two homologous chromosomes of the individual contain different nucleotides). The SNP nomenclature as reported herein refers to the official Reference SNP (rs) ID identification tag as assigned to each unique SNP by the National Center for Biotechnology Information (NCBI).
[0049] “Nucleic acid,” “oligonucleotide,” and “polynucleotide” refer to deoxyribonucleic acids (DNA) or ribonucleic acids (RNA) and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. The term nucleic acid is used interchangeably with gene, cDNA, and mRNA encoded by a gene.
[0050] The term “susceptibility”, as described herein, refers to the proneness of an individual towards the development of a certain state (e.g., a certain trait, phenotype or disease, e.g., bladder cancer), or towards being less able to resist a particular state than the average individual. The term encompasses both increased susceptibility and decreased susceptibility. Thus, particular mutations in certain genes of certain embodiments as described herein may be characteristic of increased susceptibility (i.e., increased risk) of bladder cancer, as characterized by a relative risk (RR) or odds ratio (OR) of greater than one for the particular mutation, allele or haplotype. Alternatively, the mutations or combinations thereof of certain embodiments are characteristic of decreased susceptibility (i.e., decreased risk) of bladder cancer, as characterized by a relative risk of less than one.
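The relative risk and odds ratio referred to above can be computed from a two-by-two table of carrier status against disease status. The following is a minimal sketch; the function names and the counts are hypothetical and are not data from this specification.

```python
def relative_risk(carriers_affected, carriers_unaffected,
                  noncarriers_affected, noncarriers_unaffected):
    """Risk of disease among carriers divided by risk among non-carriers."""
    risk_carriers = carriers_affected / (carriers_affected + carriers_unaffected)
    risk_noncarriers = noncarriers_affected / (
        noncarriers_affected + noncarriers_unaffected
    )
    return risk_carriers / risk_noncarriers


def odds_ratio(carriers_affected, carriers_unaffected,
               noncarriers_affected, noncarriers_unaffected):
    """Odds of disease among carriers divided by odds among non-carriers."""
    return (carriers_affected * noncarriers_unaffected) / (
        carriers_unaffected * noncarriers_affected
    )


# Hypothetical counts: 30 of 100 carriers and 10 of 100 non-carriers affected.
rr = relative_risk(30, 70, 10, 90)   # about 3.0
odds = odds_ratio(30, 70, 10, 90)    # about 3.86

# A value greater than one corresponds to increased susceptibility,
# a value less than one to decreased susceptibility.
increased_susceptibility = rr > 1 and odds > 1
```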
[0051] An “indel” is a common form of polymorphism comprising a small insertion or deletion that is typically only a few nucleotides long.
[0052] A “computer-readable medium” is an information storage medium that can be accessed by a computer using a commercially available or custom-made interface. Exemplary computer-readable media include memory (e.g., RAM, ROM, flash memory, etc.), optical storage media (e.g., CD-ROM), magnetic storage media (e.g., computer hard drives, floppy disks, etc.), punch cards, or other commercially available media. Information may be transferred between a system of interest and a medium, between computers, or between computers and the computer-readable medium for storage or access of stored information. Such transmission can be electrical, or by other available methods, such as IR links, wireless connections, etc.
[0053] The word “subject” includes human, animal, avian, e.g., horse, donkey, pig, mouse, hamster, monkey, chicken, sheep, cattle, goat, buffalo.
[0054] Reference to “neoplasm” or “cancer” should be understood as a reference to a lesion, tumor or other encapsulated or unencapsulated mass or other form of growth which comprises neoplastic or cancer cells. A “cancer cell” should be understood as a reference to a cell exhibiting abnormal growth. The term “growth” should be understood in its broadest sense and includes reference to proliferation. In this regard, an example of abnormal cell growth is the uncontrolled proliferation of a cell. Another example is failed apoptosis in a cell, thus prolonging its usual life span. The neoplastic cell may be a benign cell or a malignant cell. In a certain embodiment, the subject neoplasm is a bladder tumor.
[0055] Reference to “DNA region” should be understood as a reference to a specific section of genomic DNA. These DNA regions are specified either by reference to a gene name or a set of chromosomal coordinates. Both the gene names and the chromosomal coordinates would be well known to, and understood by, the person of skill in the art. The chromosomal coordinates presented herein correspond to the Hg19 version of the genome. In general, a gene can be routinely identified by reference to its name, via which both its sequences and chromosomal location can be routinely obtained, or by reference to its chromosomal coordinates, via which both the gene name and its sequence can also be routinely obtained.
[0056] In reference to genes/DNA, the following should be noted as well. Reference to each of the genes/DNA regions detailed herein is understood as a reference to all forms of these molecules and to fragments or variants thereof. As would be appreciated by the person of skill in the art, some genes are known to exhibit allelic variation between individuals or single nucleotide polymorphisms. SNPs encompass insertions and deletions of varying size and simple sequence repeats, such as dinucleotide and trinucleotide repeats. Variants include nucleic acid sequences from the same region sharing at least 90%, 95%, 98%, or 99% sequence identity, i.e., having one or more deletions, additions, substitutions, inverted sequences, etc. relative to the DNA regions described herein. Accordingly, certain present embodiments should be understood to extend to such variants which, in terms of the present diagnostic applications, achieve the same outcome despite the fact that minor genetic variations between the actual nucleic acid sequences may exist between individuals. The present embodiments should therefore be understood to extend to all forms of DNA which arise from any other mutation, polymorphic or allelic variation.
[0057] Cancer diagnosis as described herein refers to determining or classifying the nature of the cancer state, e.g., the mutational or genetic phenotype of a cancer or tumor, the clinical stage of a cancer associated with its progression, and/or the metastatic nature of the cancer. Cancer diagnosis based on genetic phenotyping can help guide proper therapeutic intervention as described herein.
[0058] Cancer prognosis as described herein includes determining the probable progression and course of the cancerous condition, and determining the chances of recovery and survival of a subject with the cancer, e.g., a favorable prognosis indicates an increased probability of recovery and/or survival for the cancer patient, while an unfavorable prognosis indicates a decreased probability of recovery and/or survival for the cancer patient. A subject's prognosis can be determined by the availability of a suitable treatment (i.e., a treatment that will increase the probability of recovery and survival of the subject with cancer). This aspect of certain present embodiments may further include selecting a suitable cancer therapeutic based on the determined prognosis and administering the selected therapeutic to the subject.
[0059] Prognosis also encompasses the metastatic potential of a cancer. For example, a favorable prognosis based on the presence or absence of a genetic phenotype can indicate that the cancer is a type of cancer having low metastatic potential, and the patient has an increased probability of long term recovery and/or survival. Alternatively, an unfavorable prognosis, based on the presence or absence of a genetic phenotype, can indicate that the cancer is a type of cancer having a high metastatic potential, and the patient has a decreased probability of long term recovery and/or survival. Prognosis is in part assessed by pathologic grade and stage. Grade is defined as papilloma, low grade, or high grade based on standards set by the American Joint Committee on Cancer. Stage is defined by the Tumor, Node, Metastasis (TNM) staging system. For example, tumor stage may be defined as T, T0, Ta, Tis, T1, T2, T2a, T2b, T3, T3a, T3b, T4a, or T4b; node stage may be defined as NX, N0, N1, N2, or N3; and metastasis stage may be defined as M0 or M1. In one embodiment, genomic phenotypes or combinations of one or more mutations or epigenetic alterations may be compared to a database containing genomic phenotypes and staging information, wherein this comparison approximates tumor stage and grade by computational measurement of urine genomic phenotypic similarity to other tumors with known stage, grade, and patient outcomes in the database.
[0060] Another aspect of certain present embodiments is directed at identification of the type of bladder cancer present. Bladder cancer can be defined as transitional cell type or urothelial cancer, squamous cell bladder cancer, adenocarcinoma of the bladder, sarcoma of the bladder, or small cell cancer of the bladder. In one aspect of the present embodiments, genomic phenotypes or combinations of one or more mutations or epigenetic alterations may be compared to a database containing genomic phenotypes and defining cancer cell type information, wherein this comparison approximates tumor cell type by computational measurement of urine genomic phenotypic similarity to other tumors with known cell type in the database. In another aspect of the present embodiments, genomic phenotypes or combinations of one or more epigenetic alterations may be used to generate in silico models approximating the tumor microenvironment and relative abundance of non-cancerous cells which may modulate the activity and biology of cancer cells.
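One simple way to picture the database comparison described above is a nearest-neighbor lookup over sets of altered genes. The sketch below is only an illustration under stated assumptions: the specification does not name a similarity measure, so the Jaccard index is an assumption, and the database entries, their stage and grade labels, and the function names are all hypothetical.

```python
def jaccard_similarity(genes_a, genes_b):
    """Shared altered genes divided by all altered genes in either profile."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_entry(urine_profile, database):
    """Return the database entry whose altered-gene set is most similar."""
    return max(
        database,
        key=lambda entry: jaccard_similarity(urine_profile, entry["altered_genes"]),
    )


# Hypothetical reference database; labels are illustrative, not clinical data.
reference_database = [
    {"altered_genes": {"FGFR3", "KDM6A", "PIK3CA"}, "stage": "Ta", "grade": "low grade"},
    {"altered_genes": {"TP53", "ERBB2", "ATM"}, "stage": "T2", "grade": "high grade"},
]

urine_profile = {"FGFR3", "KDM6A"}
best_match = most_similar_entry(urine_profile, reference_database)
approximate_stage = best_match["stage"]
approximate_grade = best_match["grade"]
```

A practical system would likely weight alterations, use more than one neighbor, and report uncertainty; none of that is shown here.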
[0061] Another aspect of certain present embodiments is directed to a method of monitoring cancer progression in a subject that involves obtaining first and second urine samples containing nucleic acid, at different points in time, from the subject having cancer. The nucleic acid in the samples is contacted with one or more reagents suitable for detecting the presence or absence of one or more mutations and/or epigenetic alterations in one or more genes associated with bladder cancer, and the presence or absence of the one or more mutations and/or epigenetic alterations in the one or more genes associated with bladder cancer is detected. The method further involves comparing the presence or absence of the one or more mutations and/or epigenetic alterations detected in the first urine sample nucleic acid to the presence or absence of the one or more mutations and/or epigenetic alterations detected in the second urine sample nucleic acid and monitoring cancer progression in the subject based on the comparison.
[0062] A change in the mutational and/or epigenetic alterations status of one or more genes associated with bladder cancer, for example, detecting the presence of a mutation and/or epigenetic alterations in the second urine sample whereas no mutation and/or epigenetic alteration was detected in the first urine sample, indicates that a change in the cancer phenotype has occurred with disease progression. This change may have therapeutic implications, i.e., it may signal the need to change the subject's course of treatment. The change can also be indicative of the progression of the cancer to a metastatic phenotype. Therefore, periodic monitoring of urine nucleic acid mutational and/or epigenetic status provides a means for detecting primary tumor progression, metastasis, and facilitating optimal targeted or personalized treatment of the cancerous condition.
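The comparison between a first and a second urine sample can be expressed as set operations on the alterations detected at each time point. This is a minimal sketch; the alteration labels and the function name are hypothetical, and real data would also carry allele fractions and quality metrics.

```python
def compare_time_points(first_sample, second_sample):
    """Classify alterations as gained, lost, or persistent between samples."""
    first, second = set(first_sample), set(second_sample)
    return {
        "gained": sorted(second - first),      # detected only in the later sample
        "lost": sorted(first - second),        # detected only in the earlier sample
        "persistent": sorted(first & second),  # detected in both samples
    }


# Hypothetical alteration labels of the form "GENE:alteration".
first_sample = {"FGFR3:variant_A", "KDM6A:variant_B"}
second_sample = {"FGFR3:variant_A", "TP53:variant_C"}

status_change = compare_time_points(first_sample, second_sample)
# An alteration in "gained" corresponds to the case described above in which
# a mutation is present in the second sample but was not detected in the first.
```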
[0063] The time between obtaining a first urine nucleic acid sample and a second, or any additional subsequent urine nucleic acid samples can be any desired period of time, for example, weeks, months, or years, as determined suitable by a physician and based on the characteristics of the primary tumor (tumor type, stage, location, etc.). In one embodiment of this aspect, the first sample is obtained before treatment and the second sample is obtained after treatment. Alternatively, both samples can be obtained after one or more treatments, with the second sample obtained at some point in time later than the first sample. Alternatively, one or more samples can be obtained before presence of disease.
[0064] Mutations and/or epigenetic alterations in several genes have been shown to be associated with bladder cancer. Table 1 shows a list of genes from which to choose for assaying mutations and/or epigenetic alterations related to bladder cancer. Mutations can include insertions, deletions, duplications, amplifications, and translocations. Epigenetic features can include methylation of cytosine nucleotides. Other genes found to be associated with bladder cancer can also be used in a present embodiment based on empirical validation. Using individually synthesized DNA or RNA hybridization probes allows for modularity of hybrid capture libraries and iterative optimization (removal/addition of probes) based on empirical validation. Specificity of capture probes can be addressed computationally during the design of probes but also during sequencing validation. In a CLIA lab setting, an exemplary approach for validating inconclusive or unexpected results has been to complement hybrid capture with a secondary PCR amplicon based enrichment approach to provide coverage of regions not amenable to hybrid capture and to confirm novel results. Massively parallel amplification systems such as RainDance, AmpliSeq, and Wafergen provide high efficiency and uniformity for amplicon library preparation.
[0065] Any known methods for isolating cells from urine, or for isolating cell-free nucleic acid in urine as well as nucleic acid from cells found in urine, are incorporated herein in their respective entireties. A urine preservation buffer may contain the following classes of reagents: microbial static agents such as EDTA, Isothiazolinone and/or its derivatives such as Methylisothiazolinone, antibiotics, pH buffering reagents such as Tris salt, DNAse/RNAse inhibitors such as EDTA and Aurintricarboxylic acid, and modifiers of nucleic acid hydration including chaotropic salts such as Guanidinium thiocyanate, Ammonium Acetate, Sodium Acetate, Sodium Dodecyl Sulfate. In one aspect, results indicate that a urine preservation buffer preserves DNA for at least 1 week at room temperature. Other buffers can be used per the knowledge and skill in the art. In one embodiment, the buffers and reagents are optimized to avoid co-precipitation of salts which inhibit many enzyme-based reactions such as PCR or ligation while simultaneously maximizing yield from the sample.
[0066] Cancer markers can be identified in both cell-associated and cell-free nucleic acids within urine. As shown in
[0067] According to embodiments of the invention, these respective patient profiles are determined using nucleic acid data, categorized and then compared to control groups of both healthy patients and previously determined patient profiles characterized by having bladder cancer.
[0068]
[0069] Now referring to
[0070] When library preparation efficiency is poor (poor ligation efficiency due to size of DNA, overloading of DNA, or presence of end-repair, A-tailing, and ligase enzyme inhibitors, or presence of single stranded DNA which is measured but cannot ligate) or when hybrid capture efficiency is poor (due to non-human nucleic acid) samples that perform like shown in
[0071] The nature and extent of quality control features depend in part on the nature of the sample. For example, nitrates in urine indicate that there may be high bacterial levels. When bacterial DNA is abundant in nucleic acid extracted from urine, it can disrupt the efficiency of hybrid capture. This disruption is in part due to the fact that most nucleic acid quantification technologies (UV absorbance, fluorimetry, and capillary electrophoresis) do not distinguish between human and non-human nucleic acid. Efficient hybrid capture designed to enrich for human genes is dependent upon accurate up-front DNA input into the reaction, where this defined input is of human origin. Positive nitrate results can act as a flag in lab protocols and indicate that additional quality controls are necessary, in which PCR is used to quantify the abundance of human DNA relative to non-human DNA so that sufficient human DNA can be loaded into the library preparation reaction. In some cases, the non-human DNA may reach a level of abundance at which, despite human/non-human normalization, it begins to overload or actively inhibit library preparation (the end-repair, A-tailing, ligation, or hybrid capture reactions). In this case, steps are taken to actively destroy or deplete non-human sequences prior to library preparation. This may be performed by treatment with restriction enzymes targeting bacterial specific sequence motifs, by exploiting differential nucleic acid methylation patterns, e.g., with methyl-CpG binding domains, as described in dx.doi.org/10.1371/journal.pone.0076096, or by treatment with non-ionic surfactants such as saponin 0.025%, as described in jcm.asm.org/content/early/2016/01/07/JCM.03050-15.
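The human/non-human normalization described above amounts to simple arithmetic once PCR has estimated the human fraction of the extracted nucleic acid. The sketch below is an illustration under stated assumptions: the target input, the overload limit, the function name, and the example fractions are hypothetical values chosen for the example and are not taken from this specification.

```python
def total_input_for_human_target(target_human_ng, human_fraction, max_total_ng):
    """Return (total_ng_to_load, overloaded) for a desired human DNA input.

    human_fraction: PCR-estimated share of human DNA in the extract (0 to 1].
    max_total_ng: hypothetical upper limit on total DNA the library
    preparation reaction is assumed to tolerate.
    """
    if not 0 < human_fraction <= 1:
        raise ValueError("human_fraction must be in the interval (0, 1]")
    total_ng = target_human_ng / human_fraction
    # If the required total exceeds the limit, normalization alone is not
    # enough and depletion of non-human sequences would be considered.
    return total_ng, total_ng > max_total_ng


# Hypothetical example: 50 ng of human DNA is wanted and 500 ng is the limit.
moderate_load, moderate_overloaded = total_input_for_human_target(50, 0.25, 500)
heavy_load, heavy_overloaded = total_input_for_human_target(50, 0.05, 500)
```

With a human fraction of 0.25 the total load stays under the assumed limit; with a human fraction of 0.05 it does not, which corresponds to the case in which non-human sequences are depleted before library preparation.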
[0072] As such, quality control measures for both the impact of non-human sequence on a library and enrichment efficiency and other subtle decreases in efficiency even when a sample was purported to be negative for urinary tract infection or was negative for nitrates by urine chemistry can improve assay performance. In this regard, even in “healthy” and “normal” urine samples bacterial and yeast levels from a normal microbiome can be sufficient to impact sequencing efficiency (See
[0073] Additional quality control steps relate to urine chemistry, including levels of pH, hemoglobin, myoglobin, ketones, urobilinogen, and specific gravity. These markers are tested for and then used for normalization of mutation calling algorithms. These analytes may modify the chemical structure of nucleic acids in ways that introduce errors in sequencing. One aspect of the empirical reference library (denoted in the algorithm flow diagram) is to use sequencing data from many samples with these abnormalities to build sequencing error pattern profiles for different analytical ranges of these analytes. These error models can then be used to reduce sequencing errors and correct for potential false-positive signals within sequencing results.
[0074] Leukocyte esterase is a marker for white blood cells (WBCs) in urine. In urine samples with high levels of white blood cells, a tumor signature may be diluted by the normal DNA present in these cells. Embodiments of the invention involve two approaches to correct for high WBCs: (1) active depletion prior to urine extraction (examples of methods including separation through differential centrifugation or exposure to solute gradients, differential lysis through treatment with salt solutions, and use of cell surface markers to deplete by antibody pull down or column), and/or (2) the adjustment of the algorithm thresholds to account for elevated levels of non-cancer DNA.
[0075] Specific gravity and creatinine values can serve as surrogates for kidney function and urine dilution. In some cases, these markers can approximate the levels of systemic (trans-renal) nucleic acid relative to urologic tract nucleic acid. These markers may also inform how size distributions correlate to systemic vs. urologic tract nucleic acid. In embodiments, the values are tested, a reference library is created, and the algorithm can be appropriately adjusted. Specific gravity and pH values may correlate to the levels of double stranded DNA vs. single stranded DNA present in a urine sample.
[0076] In an embodiment, a method of total nucleic acid processing and extraction from urine comprises:
[0077] (i) a step of incubation of urine in a lysis solution. Such a solution can optionally contain a detergent, a salt, e.g. 5M NaCl, chaotropic salts (e.g. Guanidinium thiocyanate, Sodium Acetate), protein digesting enzymes such as Proteinase K, and isopropyl alcohol, or ethanol;
[0078] (ii) a step of addition of a nucleic acid binding substrate, such as a silica resin slurry (Norgen Urine DNA kit), or magnetic negatively charged nucleic acid binding beads (such as Invitrogen MagMax total nucleic acid kit) or a siliconized column (such as Qiagen QIAprep Spin Miniprep Kit);
[0079] (iii) a step of washing of the bound DNA with lysis solution;
[0080] (iv) a step of elution of the DNA in a buffered solution, e.g. containing Tris and EDTA; and
[0081] (v) an optional step of conversion and tagging/barcoding of RNA into cDNA.
[0082] This final optional step can be done by any method known in the art, for example, using ClonTech's SMARTer (Switching Mechanism at 5′ End of RNA Template) cDNA conversion kit. This technology allows the efficient incorporation of known sequences at both ends of cDNA during first strand synthesis, without adaptor ligation. The presence of these known sequences is crucial for downstream applications where DNA and RNA-derived cDNA (generated by the SMARTer kit) are prepared in the same library and sequenced together within a single sequencing run. Inclusion of both DNA and RNA within a single library permits genomic translocations to be identified from the RNA/cDNA while mutations and epigenetic alterations can be identified from DNA or RNA. The known sequences incorporated by the SMARTer kit allow downstream informatic deconvolution of DNA and RNA unique signals.
[0083] In one embodiment, the extracted nucleic acid is DNA. In another embodiment, the extracted nucleic acid is RNA. RNAs are in certain embodiments reverse-transcribed into complementary DNAs. Such reverse transcription may be performed alone or in combination with an amplification step, e.g., using reverse transcription polymerase chain reaction (RT-PCR), which may be further modified to be quantitative, e.g., quantitative RT-PCR as described in U.S. Pat. No. 5,639,606, which is hereby incorporated by reference in its entirety.
[0084] In one embodiment, the extracted nucleic acids, including DNA and/or RNA, are analyzed directly without an amplification step. Direct analysis may be performed with different methods including, but not limited to, NanoString technology. NanoString technology enables identification and quantification of individual target molecules in a biological sample by attaching a color coded fluorescent reporter to each target molecule. This approach is similar to the concept of measuring inventory by scanning barcodes. Reporters can be made with hundreds or even thousands of different codes allowing for highly multiplexed analysis. The technology is described in a publication by Geiss et al. “Direct Multiplexed Measurement of Gene Expression with Color-Coded Probe Pairs,” Nat Biotechnol 26(3): 317-25 (2008), which is hereby incorporated by reference in its entirety.
[0085] In another embodiment, it may be beneficial or otherwise desirable to amplify the nucleic acid for enrichment of known bladder cancer genes prior to analyzing it. Methods of nucleic acid amplification are commonly used and generally known in the art. If desired, the amplification can be performed such that it is quantitative. Quantitative amplification will allow quantitative determination of relative amounts of the various nucleic acids. Enrichment of bladder cancer genes can occur by PCR, emulsion PCR, massively multiplexed PCR, allele specific PCR, molecular inversion probes, fragmentation and binding of site specific probes followed by circularization, or hybrid capture. A certain embodiment uses hybrid capture in which adapter ligated DNA libraries are incubated with (1) an oligonucleotide complementary to the adapter sequence (blocking oligo), (2) a buffer optimized for DNA hybridization (Illumina Nextera), and (3) a set of biotinylated custom synthesized oligonucleotides complementary to genomic regions of interest (Nextera Custom Capture, or IDT xGen Lockdown probes).
[0086] Nucleic acid amplification methods include, without limitation, polymerase chain reaction (PCR) (U.S. Pat. No. 5,219,727, which is hereby incorporated by reference in its entirety) and its variants such as in situ polymerase chain reaction (U.S. Pat. No. 5,538,871, which is hereby incorporated by reference in its entirety), quantitative polymerase chain reaction (U.S. Pat. No. 5,219,727, which is hereby incorporated by reference in its entirety), nested polymerase chain reaction (U.S. Pat. No. 5,556,773), self-sustained sequence replication and its variants (Guatelli et al. “Isothermal, In vitro Amplification of Nucleic Acids by a Multienzyme Reaction Modeled after Retroviral Replication,” Proc Natl Acad Sci USA 87(5): 1874-8 (1990), which is hereby incorporated by reference in its entirety), transcriptional amplification system and its variants (Kwoh et al. “Transcription-based Amplification System and Detection of Amplified Human Immunodeficiency Virus type 1 with a Bead-Based Sandwich Hybridization Format,” Proc Natl Acad Sci USA 86(4): 1173-7 (1989), which is hereby incorporated by reference in its entirety), Qβ replicase and its variants (Miele et al. “Autocatalytic Replication of a Recombinant RNA.” J Mol Biol 171(3): 281-95 (1983), which is hereby incorporated by reference in its entirety), cold-PCR (Li et al. “Replacing PCR with COLD-PCR Enriches Variant DNA Sequences and Redefines the Sensitivity of Genetic Testing.” Nat Med 14(5): 579-84 (2008), which is hereby incorporated by reference in its entirety) or any other nucleic acid amplification methods, followed by the detection of the amplified molecules using techniques known to those of skill in the art. Especially useful are those detection schemes designed for the detection of nucleic acid molecules if such molecules are present in very low numbers.
[0087] Detecting the presence or absence of one or more mutations and/or epigenetic alterations in bladder cancer genes in a tumor or urine-derived nucleic acid sample from a subject can be carried out using methods that are well known in the art.
[0088] In one embodiment, the one or more mutations in the one or more identified genes are detected using a hybridization assay. In a hybridization assay, the presence or absence of a gene mutation is determined based on the hybridization of one or more allele-specific oligonucleotide probes to one or more nucleic acid molecules in the DNA sample from the subject. The oligonucleotide probe or probes comprise a nucleotide sequence that is complementary to at least the region of the gene that contains the mutation of interest. The oligonucleotide probes are designed to be complementary to the wildtype, non-mutant nucleotide sequence and/or the mutant nucleotide sequence of the one or more genes to effectuate the detection of the presence or the absence of the mutation in the sample from the subject upon contacting the sample with the oligonucleotide probes. A variety of hybridization assays that are known in the art are suitable for use in the methods of the present embodiments. These methods include, without limitation, direct hybridization assays, such as northern blot or Southern blot (see e.g., Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, NY (1991)).
[0089] Alternatively, direct hybridization can be carried out using an array based method where a series of oligonucleotide probes designed to be complementary to a particular non-mutant or mutant gene region are affixed to a solid support (glass, silicon, nylon membranes). A labeled DNA or cDNA sample from the subject is contacted with the array containing the oligonucleotide probes, and hybridization of nucleic acid molecules from the sample to their complementary oligonucleotide probes on the array surface is detected. Examples of direct hybridization array platforms include, without limitation, the Affymetrix GeneChip or SNP arrays and Illumina's Bead Array.
[0090] In another embodiment, a sample is bound to a solid support (often DNA or PCR amplified DNA) and labeled with oligonucleotides in solution (either allele specific or short so as to allow sequencing by hybridization).
[0091] Detecting specific mutations can be accomplished by methods known in the art for detecting sequences at specific sites, for example, fluorescence-based techniques (Chen, X. et al., Genome Res. 9(5): 492-98 (1999)) utilizing PCR, LCR, nested PCR and other techniques for nucleic acid amplification. Specific commercial methodologies available include, but are not limited to, TaqMan genotyping assays and SNPlex platforms (Applied Biosystems), gel electrophoresis (Applied Biosystems), mass spectrometry (e.g., MassARRAY system from Sequenom), minisequencing methods, real-time PCR, Bio-Plex system (BioRad), CEQ and SNPstream systems (Beckman), array hybridization technology (e.g., Affymetrix GeneChip; Perlegen), BeadArray Technologies (e.g., Illumina GoldenGate and Infinium assays), array tag technology (e.g., ParAllele), and endonuclease-based fluorescence hybridization technology (Invader; Third Wave). Some of the available array platforms, including Affymetrix SNP Array 6.0 and Illumina CNV370-Duo and 1M BeadChips, include SNPs that tag certain CNVs. This allows detection of copy number variations (CNVs) via surrogate SNPs included in these platforms. Thus, by use of these or other methods available to the person skilled in the art, one or more mutations and/or epigenetic alterations can be identified.
[0092] In certain embodiments, a mutation in a gene is detected by sequencing technologies. Obtaining sequence information about an individual identifies particular nucleotides in the context of a sequence. For SNPs, sequence information about a single unique sequence site is sufficient to identify alleles at that particular SNP. For markers comprising more than one nucleotide, sequence information about the nucleotides of the individual that contain the polymorphic site identifies the alleles of the individual for the particular site. The sequence information can be obtained from a nucleic acid sample from the urine of the subject or individual.
[0093] Various methods for obtaining nucleic acid sequence are known to the skilled person, and all such methods are useful for practicing the embodiments. Sanger sequencing is a well-known method for generating nucleic acid sequence information. Recent methods for obtaining large amounts of sequence data have been developed, and such methods are also contemplated to be useful for obtaining sequence information. These include pyrosequencing technology (Ronaghi, M. et al. Anal Biochem 267:65-71 (1999); Ronaghi, et al., Biotechniques 25:876-878 (1998)), e.g. 454 pyrosequencing (Nyren, P., et al. Anal Biochem 208:171-175 (1993)), Illumina/Solexa sequencing technology (www.illumina.com; see also Strausberg, R L, et al. Drug Disc Today 13:569-577 (2008)), and Supported Oligonucleotide Ligation and Detection Platform (SOLiD) technology (Applied Biosystems, www.appliedbiosystems.com); Strausberg, R L, et al. Drug Disc Today 13:569-577 (2008). The foregoing are incorporated by reference in their respective entireties.
[0094] Other common genotyping methods include, but are not limited to, restriction fragment length polymorphism assays; amplification based assays such as molecular beacon assays, nucleic acid arrays, high resolution melting curve analysis (Reed and Wittwer, “Sensitivity and Specificity of Single-Nucleotide Polymorphism Scanning by High Resolution Melting Analysis,” Clinical Chem 50(10): 1748-54 (2004), which is hereby incorporated by reference in its entirety); allele-specific PCR (Gaudet et al., “Allele-Specific PCR in SNP Genotyping,” Methods Mol Biol 578: 415-24 (2009), which is hereby incorporated by reference in its entirety); primer extension assays, such as allele-specific primer extension (e.g., Illumina™ Infinium™ assay), arrayed primer extension (see Krjutskov et al., “Development of a Single Tube 640-plex Genotyping Method for Detection of Nucleic Acid Variations on Microarrays,” Nucleic Acids Res. 36(12) e75 (2008), which is hereby incorporated by reference in its entirety), homogeneous primer extension assays, primer extension with detection by mass spectrometry (e.g., Sequenom™ iPLEX SNP genotyping assay) (see Zheng et al., “Cumulative Association of Five Genetic Variants with Prostate Cancer,” N. Eng. J. Med. 358(9):910-919 (2008), which is hereby incorporated by reference in its entirety), multiplex primer extension sorted on genetic arrays; flap endonuclease assays (e.g., the Invader™ assay) (see Olivier M., “The Invader Assay for SNP Genotyping,” Mutat. Res. 573 (1-2) 103-10 (2005), which is hereby incorporated by reference in its entirety); 5′ nuclease assays, such as the TaqMan™ assay (see U.S. Pat. No. 5,210,015 to Gelfand et al. and U.S. Pat. No. 5,538,848 to Livak et al., which are hereby incorporated by reference in their entirety); and oligonucleotide ligation assays, such as ligation with rolling circle amplification, homogeneous ligation, OLA (see U.S. Pat. No. 4,988,617 to Landegren et al., which is hereby incorporated by reference in its entirety), multiplex ligation reactions followed by PCR, wherein zipcodes are incorporated into ligation reaction probes, and amplified PCR products are determined by electrophoretic or universal zipcode array readout (see U.S. Pat. Nos. 7,429,453 and 7,312,039 to Barany et al., which are hereby incorporated by reference in their entirety). Such methods may be used in combination with detection mechanisms such as, for example, luminescence or chemiluminescence detection, fluorescence detection, time-resolved fluorescence detection, fluorescence resonance energy transfer, fluorescence polarization, mass spectrometry, and electrical detection. In general, the methods for analyzing genetic aberrations are reported in numerous publications, not limited to those cited herein, and are available to those skilled in the art. The appropriate method of analysis will depend upon the specific goals of the analysis, the condition/history of the patient, and the specific cancer(s), diseases or other medical conditions to be detected, monitored or treated.
[0095] Alternatively, the presence or absence of one or more mutations identified supra can be detected by direct sequencing of the genes, or in one embodiment particular gene regions comprising the one or more identified mutations, from the patient sample. Direct sequencing assays typically involve isolating a DNA sample from the subject using any suitable method known in the art, and cloning the region of interest to be sequenced into a suitable vector for amplification by growth in a host cell (e.g. bacteria) or direct amplification by PCR or other amplification assay. Following amplification, the DNA can be sequenced using any suitable method. Certain sequencing methods involve high-throughput next generation sequencing (NGS) to identify genetic variation. Various NGS sequencing chemistries are available and suitable for use in carrying out the embodiments, including pyrosequencing (Roche™ 454), sequencing by reversible dye terminators (Illumina™ HiSeq, Genome Analyzer and MiSeq systems), sequencing by sequential ligation of oligonucleotide probes (Life Technologies™ SOLiD), and hydrogen ion semiconductor sequencing (Life Technologies™, Ion Torrent™). Alternatively, classic sequencing methods, such as the Sanger chain termination method or Maxam-Gilbert sequencing, which are well known to those of skill in the art, can be used to carry out the methods of the present embodiments.
[0096] Certain present embodiments also provide kits which are useful for carrying out the disclosures set forth herein. The present kits comprise one or more container means containing the above-described assay components. The kit also comprises other container means containing solutions necessary or convenient for carrying out the embodiments. The container means can be made of glass, plastic or foil and can be a vial, bottle, pouch, tube, bag, etc. The kit may also contain written information, such as procedures for carrying out certain present embodiments or analytical information, such as the amount of reagent contained in the first container means. The container means may be in another container means, e.g. a box or a bag, along with the written information.
[0097] The following examples are included to demonstrate certain embodiments hereof. It should be appreciated by those of skill in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventors and thought to function well in the practice of the embodiments, and thus can be considered to constitute certain modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of what is described.
[0098] All documents cited herein are hereby incorporated in their entirety by reference thereto.
[0099] The following materials and methods were used in the Examples below.
Example 1
DNA Repair and Sequencing Adapter Ligation
[0100] 1. Repair of DNA strand nicks or gaps by treatment with one or more of the following enzymes: Taq DNA Ligase, Endonuclease IV, Bst DNA Polymerase, Fpg, Uracil-DNA Glycosylase (UDG), T4 PDG (T4 Endonuclease V) and Endonuclease VIII, polynucleotide kinase, mammalian DNA polymerase β and/or DNA ligase I
[0101] 2. Repair and A-tailing of DNA ends by treatment of DNA with one or more of the following enzymes: T4 DNA Polymerase and Klenow Fragment
[0102] 3. T4-ligation of a sequencing adapter and nucleic acid insert where the adapter is an Illumina TruSeq style adapter or equivalent. In an embodiment, the adapter contains an 8-base pair sample barcode in the double stranded stem portion of the adapter and the same barcode is present on both the p5 and p7 ends. In such embodiments, matched dual index barcodes are used to avoid low frequency adapter contamination or adapter swapping/jumping between pooled samples. The adapter may also contain a diverse library of defined or random sequences in either the stem or y-portion of the adapter, in which case these defined or random sequences are used in part to tag an individual molecule prior to library amplification.
[0103] 4. Or alternatively in place of steps 2 & 3: nucleic acid inserts are consecutively ligated to single strand adapter molecules as described in Nature Protocols 8, 737-748 (2013). Briefly, DNA is treated with a phosphatase to remove residual phosphate groups from the 5′ and 3′ ends of the DNA strands. A 5′-phosphorylated adapter oligonucleotide with a long 3′-biotinylated spacer arm is ligated to the 3′ ends of the DNA strands using CircLigase II. The adapter-ligated molecules, as well as excess adapter molecules, are immobilized on streptavidin beads, and a primer complementary to the adapter is used to copy the template strand. This reaction is performed using Bst polymerase 2.0. After removal of 3′ overhangs using T4 DNA polymerase, a second adapter is joined to the newly synthesized strands by blunt-end ligation with T4 DNA ligase. To prevent ligation between adapters, only one adapter strand is ligatable, whereas the other is blocked by a 3′-terminal dideoxy modification. After washing away excess adapter, the library molecules are released from the beads by heat denaturation.
[0104] 5. Design of the adapter sequences (used in steps 3 or 4) to include a specific number of DNA bases positioned within the adapter sequence (between 6-10 nucleotides in length) which are a degenerate or random sequence, or in which the 6-10 nucleotide sequence is one of many (50-200 unique) defined sequences. Adapters with these divergently defined or degenerate sequences are present within the same mixture so as to create a diverse library of unique adapter sequences. These unique sequences are subsequently used (in combination with other variables such as DNA insert start and stop site) to uniquely identify the clonal origin of an insert molecule following PCR amplification of a diverse population of adapter ligated insert molecules.
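By way of non-limiting illustration, the tagging capacity implied by step 5 can be estimated with simple arithmetic. The sketch below is not part of the disclosed method; the function names are hypothetical, and it assumes that each insert carries one tag at each end and that both ends draw from the same pool of tags.

```python
def degenerate_tag_count(length: int) -> int:
    """Number of distinct sequences for a fully degenerate tag of `length` bases."""
    return 4 ** length


def paired_tag_count(tags_per_end: int) -> int:
    """Number of ordered (5' tag, 3' tag) pairs when both ends use the same pool."""
    return tags_per_end ** 2


if __name__ == "__main__":
    # Degenerate tags at the two ends of the stated 6-10 nucleotide range.
    for n in (6, 10):
        print(n, degenerate_tag_count(n))
    # Defined tag pools at the two ends of the stated 50-200 range.
    for k in (50, 200):
        print(k, paired_tag_count(k))
```

Under these assumptions a 6-nucleotide degenerate tag gives 4,096 possible sequences, and a pool of 50 defined tags gives 2,500 possible tag pairs before insert start and stop sites are taken into account.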
Example 2
Enrichment of Known Bladder Cancer Genes
[0105] Enrichment of bladder cancer genes can occur by PCR, emulsion PCR, massively multiplexed PCR, allele specific PCR, Molecular inversion probes, fragmentation and binding of site specific probes followed by circularization, or hybrid capture.
[0106] An embodiment uses hybrid capture in which adapter ligated DNA libraries are incubated with (1) an oligonucleotide complementary to the adapter sequence (blocking oligo), (2) a buffer optimized for DNA hybridization (Illumina Nextera), and (3) a set of biotinylated custom synthesized oligonucleotides complementary to genomic regions of interest (Nextera Custom Capture, or IDT xGen Lockdown probes).
[0107] A series of incubations at various temperatures to promote hybridization of oligos to their target sequences.
[0108] Incubation of the hybridization reaction with streptavidin beads to enrich bound oligos from the solution. Washing and elution of the bound oligos from the beads.
[0109] A second repeated hybrid capture reaction with enriched fraction and custom oligos to further enrich for targets of interest.
[0110] Capture of bound oligos with streptavidin beads, wash and elution from the beads.
[0111] Load enriched sample onto sequencing machine.
Example 3
Data Analysis Methods and Utilization and Interpretation of Results
[0112] 1. Deconvolution of DNA and cDNA sequences based on known sequences.
[0113] 2. Mapping of DNA and cDNA reads to a reference genome.
[0114] 3. Identification of molecular clonal families using unique pairs of degenerate or defined adapter sequences (both on the 5-prime and 3-prime ends of the molecule) and start/stop sites of DNA inserts.
[0115] 4. Within clonal families, comparison of sequencing reads for base-pair call discrepancies.
[0116] 5. Filtering or correction of discrepancies within an individual clone through a voting process in which the predominant base call at a particular location wins and is defined as the true base call, and those base calls not present in a majority of molecules from the same clonal origin lose and are replaced with the predominant base call within that individual family.
[0117] 6. Counting of the number of unique molecular/clonal families identified for a particular gene and comparing these counts to a set of reference genes within the same sample and also comparing these counts to an empirical distribution of counts for that gene across multiple samples. Copy number loss or copy number gains are identified when unique counts for a gene vary above a defined threshold relative to reference genes and/or empirical distributions.
[0118] 7. Analysis of cDNA sequences for translocations or fusions of specific genes by reading through the break site on a sequence read.
[0119] 8. Comparison of mutations and copy number counts between DNA and cDNA for confirmation of called mutational events.
[0120] 9. Utilization of quantitative abundance of urine based mutational, and/or copy number changes, and/or translocations to determine the presence or absence of bladder cancer in a patient previously treated for bladder cancer.
[0121] 10. Utilization of quantitative abundance of urine based mutational, and/or copy number changes, and/or translocations to determine the prognosis or risk of disease progression in a patient diagnosed with bladder cancer.
[0122] 11. Utilization of quantitative abundance of urine based mutational, and/or copy number changes, and/or translocations to diagnose bladder cancer in patients presenting with blood in their urine.
[0123] 12. Utilization of quantitative abundance of urine based mutational, and/or copy number changes, and/or translocations to screen for risk of bladder cancer or other cancers in asymptomatic or otherwise believed to be healthy individuals and/or high risk populations such as cigarette smokers, individuals with histories of occupational carcinogen exposures, individuals with histories of drinking water from wells or ground water contaminated with arsenic or other suspected carcinogens, or individuals living within geographical cancer hotspots.
[0124] 13. Utilization of quantitative abundance of urine based mutational, and/or copy number changes, and/or translocations to perform short term individual screening for genotoxic stress induced by an external stimulus (testing in the hours to days to weeks following exposures) such as assessing potential genotoxicity when testing a new pharmaceutical product in mammals, or stratifying an individual's cancer risk from exposures to environmental or recreational carcinogens such as smog or products of combustion, alcohol, tobacco, UV radiation. Changes in mutational burden may be transient or persistent, and these genomic changes may be tracked longitudinally over time.
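By way of non-limiting illustration, steps 3 through 6 above can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed pipeline: the `Read` record, all function names, and the gene labels are hypothetical; reads within a family are assumed to be aligned to equal length; and a position with no strict majority is reported as `N`, which is a choice made here because the text does not specify how ties are handled.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    gene: str
    umi5: str   # tag read from the 5' adapter
    umi3: str   # tag read from the 3' adapter
    start: int  # mapped insert start
    end: int    # mapped insert end
    seq: str    # aligned bases (assumed equal length within a family)


def group_families(reads):
    """Step 3: reads sharing both tags and both insert ends form one clonal family."""
    families = defaultdict(list)
    for r in reads:
        families[(r.gene, r.umi5, r.umi3, r.start, r.end)].append(r.seq)
    return families


def consensus(seqs):
    """Steps 4-5: per-position vote; the base held by a strict majority wins."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count * 2 > len(column) else "N")  # tie handling is an assumption
    return "".join(out)


def family_counts(families):
    """Step 6: number of unique clonal families per gene."""
    counts = Counter()
    for key in families:
        counts[key[0]] += 1
    return counts


def copy_ratio(counts, gene, reference_genes):
    """Step 6: unique-family count for `gene` relative to the mean of reference genes."""
    ref_mean = sum(counts[g] for g in reference_genes) / len(reference_genes)
    return counts[gene] / ref_mean


if __name__ == "__main__":
    reads = [
        Read("GENE_A", "ACGTAC", "TTGACA", 100, 104, "ACGT"),
        Read("GENE_A", "ACGTAC", "TTGACA", 100, 104, "ACGT"),
        Read("GENE_A", "ACGTAC", "TTGACA", 100, 104, "ACTT"),  # discrepant third base
        Read("GENE_A", "GGATCC", "TTGACA", 100, 104, "ACGT"),
        Read("REF_1", "AAAAAA", "CCCCCC", 5, 9, "GGCC"),
        Read("REF_1", "AAAAAT", "CCCCCC", 5, 9, "GGCC"),
    ]
    fams = group_families(reads)
    counts = family_counts(fams)
    print(consensus(fams[("GENE_A", "ACGTAC", "TTGACA", 100, 104)]))
    print(counts["GENE_A"], copy_ratio(counts, "GENE_A", ["REF_1"]))
```

In this toy input the discrepant base in the three-read family is outvoted, and both genes yield two unique families, so the ratio to the reference gene is 1.0. Comparison against an empirical distribution across samples, and the threshold for calling a gain or loss, are left out of the sketch.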
Example 4
[0125] DNA is abundant in urine and can be optimally extracted for measurement of bladder cancer genomic biomarkers.
[0126] In order to improve upon previous attempts to minimally detect bladder cancer in urine, embodiments focus on urine DNA as an analyte because of technical advances in next generation DNA sequencing that permit massively multiplexed analysis of tens to thousands of genes in a single sequencing reaction. DNA also has the advantage of being relatively stable and undergoes unique changes during tumor formation that are highly specific to cancer.
[0127] To assess the viability of utilizing urine DNA, DNA extraction from 20-100 ml of urine is performed and optimized, using multiple extraction approaches. Total DNA yield is measured using a fluorescent double strand DNA binding dye assay (Quant-iT, Life Technologies), capillary electrophoresis, and real-time PCR. PCR amplification efficiency is measured using quantitative real-time PCR amplification of the RNaseP gene from multiple urine samples. Subsequent analysis demonstrates superior yield and enhanced PCR amplification (lower threshold cycle (Ct)) when DNA is extracted using a functionalized magnetic bead approach. In embodiments, positively charged functionalized magnetic beads provide advantageous extraction yields when used in low volume, low concentration, or degraded samples.
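By way of non-limiting illustration, a difference in threshold cycle between two extraction approaches can be converted to an approximate fold difference in amplifiable template. The sketch below uses the standard exponential model of PCR; the function name and the Ct values in the usage note are hypothetical, and it assumes the same amplification efficiency in both samples.

```python
def fold_difference(ct_a: float, ct_b: float, efficiency: float = 1.0) -> float:
    """Approximate amplifiable template in sample A relative to sample B.

    Each cycle multiplies product by (1 + efficiency), so a Ct lower by one
    cycle corresponds to (1 + efficiency)-fold more starting template.
    An efficiency of 1.0 means perfect doubling per cycle.
    """
    return (1.0 + efficiency) ** (ct_b - ct_a)


if __name__ == "__main__":
    # Hypothetical Ct values for the same target from two extraction methods.
    print(fold_difference(ct_a=25.0, ct_b=28.0))
```

With perfect doubling, a Ct that is three cycles lower corresponds to roughly eight-fold more amplifiable template.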
Example 5
[0128] To further validate the types of urine DNA as effective disease biomarkers, cell pellet associated DNA and urine cell free DNA are analyzed. These two populations, and various size fractionations thereof, are compared to each other to determine where the most abundant disease signals exist, as defined by a prior analysis of matched tumor tissue. Further, the differences in disease marker abundance within these populations are compared to urine chemistry, urine cytology, nucleic acid fragmentation patterns, and clinical correlates, and these correlations are used to develop algorithms that predict, for future patients, which nucleic acid population will contain the most abundant level of disease specific biomarkers.
Example 6
Development of a Biomarker Panel which Encompasses the Genomic Diversity of Bladder Cancer
[0129] Significant developments in nucleic acid sequencing capacity, speed, sensitivity, and declines in cost have led to rapid adoption of cancer DNA sequencing in clinical molecular pathology labs. One significant shortcoming in previous FDA approved assays to monitor bladder cancer is that the biomarkers used have not been specific (detecting hematuria or inflammation) or they do not fully encompass the proteomic or genomic diversity of the disease. In order to improve upon prior art bladder cancer tests, and in particular their low sensitivity, specific embodiments of the present inventions are directed to a panel of multiple DNA bladder cancer biomarkers which better encompass the genomic diversity of bladder cancer. In order to assess the efficacy of using NGS for monitoring bladder cancer mutational burden, a panel of multiplexed amplicon based library enrichment reagents that focus on 12 recurrently mutated or amplified genes in bladder cancer (
[0130] The plot inlaid to the right of the matrix represents the abundance and type of mutation variants associated with a particular gene across this population. The top inlaid bar graph above the matrix represents the number and type of unique events on a per patient basis. Based on this analysis, 127 patients (94.8%) contain one or more abnormalities in our biomarker panel, with an average of 2.2 SNVs per patient. This panel was developed to create a minimally informative DNA based disease signature that encompasses the genomic diversity of the disease but also allows economical high depth sequencing, enrichment of fragmented DNA, and multiplexed sample analysis in a single sequencing run. Our preliminary embodiment of the panel amplifies 68 kb of genomic material using 690 PCR amplicons, and provides 93% coverage of the target genes and very high (>99%) predicted on-target gene enrichment by BLAST alignment.
Example 7
Sensitive and Specific Detection of Bladder Cancer Burden
[0131] To validate our assay disclosed herein, we analyze 11 control bladder cancer cell lines which have been previously sequenced by whole exome sequencing ((J. Barretina, G. Caponigro, N. Stransky, K. Venkatesan, A. A. Margolin, S. Kim, C. J. Wilson, J. Lehar, G. V. Kryukov, D. Sonkin, A. Reddy, M. Liu, L. Murray, M. F. Berger, J. E. Monahan, P. Morais, J. Meltzer, A. Korejwa, J. Jane-Valbuena, F. A. Mapa, J. Thibault, E. Bric-Furlong, P. Raman, A. Shipway, I. H. Engels, J. Cheng, G. K. Yu, J. Yu, P. Aspesi, M. de Silva, K. Jagtap, M. D. Jones, L. Wang, C. Hatton, E. Palescandolo, S. Gupta, S. Mahan, C. Sougnez, R. C. Onofrio, T. Liefeld, L. MacConaill, W. Winckler, M. Reich, N. Li, J. P. Mesirov, S. B. Gabriel, G. Getz, K. Ardlie, V. Chan, V. E. Myer, B. L. Weber, J. Porter, M. Warmuth, P. Finan, J. L. Harris, M. Meyerson, T. R. Golub, M. P. Morrissey, W. R. Sellers, R. Schlegel, and L. A. Garraway, “The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity,” Nature, vol. 483, no. 7391, pp. 603-607, March 2012); and (S. A. Forbes, D. Beare, P. Gunasekaran, K. Leung, N. Bindal, H. Boutselakis, M. Ding, S. Bamford, C. Cole, S. Ward, C. Y. Kok, M. Jia, T. De, J. W. Teague, M. R. Stratton, U. McDermott, and P. J. Campbell, “COSMIC: exploring the world's knowledge of somatic mutations in human cancer,” Nucleic Acids Res., p. gku1075, October 2014)). We chose these cell lines for their dynamic range within our panel, with some lines containing no mutations and other lines containing multiple mutations. This analysis allows us to identify and mask out recurrent false-positive calls due to recurrent mapping errors or due to redundant (homopolymer) sequence context. Sensitivity of our pipeline is optimized by establishing multiple sequencing quality thresholds for alignment, base call and mutation call quality scores, loci specific read depth, and variant allele frequencies.
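By way of non-limiting illustration, the combination of a mask for recurrent false-positive sites with multiple quality thresholds can be sketched as follows. The record layout, function name, coordinates, and every threshold value below are hypothetical placeholders chosen for the example; none of them is taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    chrom: str
    pos: int
    alt: str
    mapq: float     # alignment quality
    baseq: float    # base call quality of supporting reads
    callq: float    # mutation call quality
    depth: int      # locus-specific read depth
    alt_reads: int  # reads supporting the variant


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values for illustration only.
    min_mapq: float = 30.0
    min_baseq: float = 20.0
    min_callq: float = 30.0
    min_depth: int = 500
    min_vaf: float = 0.01


def passes(call: Call, t: Thresholds, masked_sites: set) -> bool:
    """Return True when a call clears the mask and every quality threshold."""
    if (call.chrom, call.pos, call.alt) in masked_sites:
        return False  # recurrent false positive identified in control samples
    if call.depth < max(t.min_depth, 1):
        return False  # also guards against division by zero below
    vaf = call.alt_reads / call.depth
    return (
        call.mapq >= t.min_mapq
        and call.baseq >= t.min_baseq
        and call.callq >= t.min_callq
        and vaf >= t.min_vaf
    )


if __name__ == "__main__":
    mask = {("chr1", 100, "T")}
    call = Call("chr1", 100, "T", 60.0, 35.0, 50.0, 1000, 30)
    print(passes(call, Thresholds(), set()), passes(call, Thresholds(), mask))
```

Keeping the mask separate from the numeric thresholds lets the list of masked sites grow as more control samples are analyzed without retuning the thresholds.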
[0132] Using this refined mutation calling pipeline, an analysis on 14 cancer patients with diverse tumor stage, grade and clinical subtype (analysis of blood, tumor, and pre-surgery urines) was performed. Expansion of the panel to include additional genomic regions frequently mutated in bladder carcinoma in situ and other clinical subtypes of bladder cancer is supported. Expansion of the panel is conceived to further benefit assay sensitivity as performance increases with increasing numbers of mutations that can be monitored in a patient.
[0133] To assess the specificity of this type of approach, embodiments of the invention have validated the panel on 7 non-cancer controls (blood and urine). This cohort included patients with diverse urologic conditions including benign prostatic hyperplasia, urinary retention, kidney stones, an individual seeking fertility consult and healthy controls. Among these, 2 patients were cigarette smokers with 10 & 60 pack years of smoking history. Future studies will expand the non-cancer control cohort to include further analysis of smokers and individuals with chronic urologic inflammatory disease, as some of these patients may contain panel mutations in the absence of clinically detectable bladder cancer.
Example 8
Longitudinal Analysis of Urine DNA can Predict Future Disease Recurrence
[0134] To assess the ability of this approach to predict longitudinal disease recurrence, a further embodiment of the invention involves the analysis of two patients with known recurrence and long term longitudinal follow up including urine samples collected between trans-urethral resections of primary and recurrent tumors.
[0135] Using PCR amplicon based library enrichment, a lower limit of allele detection ranging from ˜1-5% allele fraction depending on sequencing depth and amplicon performance was determined. An analysis pipeline was iteratively improved with increased data collection, including the recalibration of base quality scores, application of thresholds and modification of mutation calling algorithms to filter out recurrent panel specific mapping errors and analytical noise.
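By way of non-limiting illustration, the dependence of the lower limit of allele detection on sequencing depth can be explored with a simple binomial sampling model. This is a sketch under stated assumptions and not the method used to obtain the range above: the function name and the minimum supporting-read requirement are hypothetical, and the model ignores sequencing error, amplicon bias, and duplicate reads.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant reads) under Binomial(depth, allele_fraction)."""
    p_below = sum(
        comb(depth, k) * allele_fraction**k * (1.0 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Same 1% allele fraction and 5-read requirement at two hypothetical depths.
    for depth in (100, 1000):
        print(depth, detection_probability(depth, 0.01, 5))
```

Under this model, a variant at a fixed allele fraction is more likely to reach a given number of supporting reads as depth increases, which is consistent with the limit of detection depending on sequencing depth.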
Example 9
Design of an Enhanced Genomic Panel which Encompasses the Diversity of Bladder Cancer
[0136] Adoption of hybrid capture based library enrichment methodologies, deeper sequencing, and interrogation of a more diverse and encompassing set of biomarkers has the ability in an exemplary embodiment to enhance the sensitivity of the UriSeq recurrence assay by up to 2 orders of magnitude. We chose to focus exclusively on mutations (single nucleotide variants) as opposed to SNV and copy number alterations. Current algorithms for detection of SNVs are more sensitive at lower sequencing coverage than algorithms for detection of copy number variation and provide a good compromise between sensitivity and sequencing cost. To expand the panel of biomarkers assessed, we established a set of ranking criteria to prioritize recurrently mutated genes for inclusion in an enhanced panel. These criteria include: 1. Prevalence of recurrent mutations. 2. Prioritization of known oncogenes. 3. The size of the gene and its marginal cost of analysis (accounting for limitations in the number of probes which can be pooled into a single reaction). 4. Mutual exclusivity of mutations and the number of unique patients captured by addition of a gene or exon to the panel. 5. Differential prevalence of a mutated gene in unique clinical subtypes (e.g. enrichment in CIS, low grade or high grade lesions).
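The trade-off between patients captured and region size (criteria 1, 3 and 4 above) can be sketched as a greedy selection. This is a hypothetical illustration and not the ranking procedure actually used: the region names, sizes, patient data and the probe budget are invented placeholders, and criteria 2 and 5 are not modelled.

```python
def new_patients(region, uncovered, patient_regions):
    """Count not-yet-covered patients who carry a mutation in `region`."""
    return sum(1 for p in uncovered if region in patient_regions[p])


def select_regions(patient_regions, region_size_bp, max_panel_bp):
    """Greedily add the region capturing the most new patients per base pair."""
    uncovered = set(patient_regions)
    candidates = set(region_size_bp)
    panel, used_bp = [], 0
    while uncovered and candidates:
        # sorted() makes tie-breaking deterministic.
        best = max(
            sorted(candidates),
            key=lambda r: new_patients(r, uncovered, patient_regions) / region_size_bp[r],
        )
        candidates.discard(best)
        if new_patients(best, uncovered, patient_regions) == 0:
            break  # no remaining region adds a new patient
        if used_bp + region_size_bp[best] > max_panel_bp:
            continue  # too large for the remaining budget; try the next region
        panel.append(best)
        used_bp += region_size_bp[best]
        uncovered = {p for p in uncovered if best not in patient_regions[p]}
    return panel, used_bp


# Hypothetical toy data: which regions are mutated in which patient, and region sizes.
TOY_PATIENTS = {
    "P1": {"TERT_promoter", "FGFR3_ex7"},
    "P2": {"TP53_ex5"},
    "P3": {"TERT_promoter"},
    "P4": {"KDM6A_ex17", "TP53_ex5"},
}
TOY_SIZES_BP = {"TERT_promoter": 300, "FGFR3_ex7": 200, "TP53_ex5": 250, "KDM6A_ex17": 400}

if __name__ == "__main__":
    print(select_regions(TOY_PATIENTS, TOY_SIZES_BP, max_panel_bp=600))
```

A greedy heuristic is chosen here only because it is short; it favors mutually exclusive regions because a region whose patients are already covered contributes no gain.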
[0137] Based on these criteria, an embodiment directed to an enhanced panel targeting 750 exons in 23 genes for inclusion in the recurrence assay is provided. The comprehensive nature of this revised gene panel was validated computationally using the COSMIC database and 2 other publicly available bladder cancer data sets, summarized in Table 2 ((The Cancer Genome Atlas Research Network, “Comprehensive molecular characterization of urothelial bladder carcinoma,” Nature, vol. 507, no. 7492, pp. 315-322, March 2014); (S. A. Forbes, D. Beare, P. Gunasekaran, K. Leung, N. Bindal, H. Boutselakis, M. Ding, S. Bamford, C. Cole, S. Ward, C. Y. Kok, M. Jia, T. De, J. W. Teague, M. R. Stratton, U. McDermott, and P. J. Campbell, “COSMIC: exploring the world's knowledge of somatic mutations in human cancer,” Nucleic Acids Res., p. gku1075, October 2014); and (P. H. Kim, E. K. Cha, J. P. Sfakianos, G. Iyer, E. C. Zabor, S. N. Scott, I. Ostrovnaya, R. Ramirez, A. Sun, R. Shah, A. M. Yee, V. E. Reuter, D. F. Bajorin, J. E. Rosenberg, N. Schultz, M. F. Berger, H. A. Al-Ahmadie, D. B. Solit, and B. H. Bochner, “Genomic Predictors of Survival in Patients with High-grade Urothelial Carcinoma of the Bladder,” Eur. Urol., August 2014)).
[0138] This design increases the percent of patients covered by the assay and increases the average number of SNVs per patient.
TABLE-US-00001 TABLE 2 Summary of studies used for design and in silico validation of an enhanced gene panel
Study | Study size (# patients) | Patients encompassed (%) | Average # events per patient
Kim P H, et al. 2014 | 109 | 98 | 3.5
TCGA, 2014 | 134 | 96 | 3.3
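The two summary statistics reported in Table 2 can be computed from per-patient mutation lists as sketched below. This is an illustrative sketch under stated assumptions: the input layout, the gene and variant labels, and the choice to average events over all patients (rather than only covered patients) are assumptions, since the source does not specify them.

```python
def panel_coverage(patient_events, panel_genes):
    """Return (% of patients with >= 1 panel event, mean panel events per patient).

    `patient_events` maps a patient ID to a list of (gene, variant_label) tuples.
    Assumption: the mean is taken over all patients, covered or not.
    """
    panel = set(panel_genes)
    if not patient_events:
        raise ValueError("at least one patient is required")
    # Count, per patient, the mutation events that fall in a panel gene.
    counts = [
        sum(1 for gene, _ in events if gene in panel)
        for events in patient_events.values()
    ]
    n = len(counts)
    percent_encompassed = 100.0 * sum(1 for c in counts if c > 0) / n
    mean_events = sum(counts) / n
    return percent_encompassed, mean_events


if __name__ == "__main__":
    # Hypothetical toy cohort; variant labels are placeholders, not real calls.
    cohort = {
        "P1": [("TP53", "var1"), ("FGFR3", "var2")],
        "P2": [("KDM6A", "var3")],
        "P3": [("GENE_NOT_ON_PANEL", "var4")],
    }
    print(panel_coverage(cohort, ["TP53", "FGFR3", "KDM6A"]))
```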
[0139] In silico validation based on these previous studies may underestimate the percent of patients that will be encompassed by this biomarker panel. To date, large scale (exome) sequencing studies in bladder cancer have focused on late stage muscle invasive disease. As part of our efforts to increase the comprehensive nature of our panel across clinical subtypes, we include TERT promoter, FGFR3 and STAG2 mutations, all of which are significantly more prevalent in low grade disease. Previous exome sequencing studies do not capture TERT promoter mutations, a highly prevalent biomarker present in 70-80% of bladder cancer patients ((C. D. Hurst, F. M. Platt, and M. A. Knowles, “Comprehensive Mutation Analysis of the TERT Promoter in Bladder Cancer and Detection of Mutations in Voided Urine,” Eur. Urol); (P. J. Killela, Z. J. Reitman, Y. Jiao, C. Bettegowda, N. Agrawal, L. A. Diaz, A. H. Friedman, H. Friedman, G. L. Gallia, B. C. Giovanella, A. P. Grollman, T.-C. He, Y. He, R. H. Hruban, G. I. Jallo, N. Mandahl, A. K. Meeker, F. Mertens, G. J. Netto, B. A. Rasheed, G. J. Riggins, T. A. Rosenquist, M. Schiffman, I.-M. Shih, D. Theodorescu, M. S. Torbenson, V. E. Velculescu, T.-L. Wang, N. Wentzensen, L. D. Wood, M. Zhang, R. E. McLendon, D. D. Bigner, K. W. Kinzler, B. Vogelstein, N. Papadopoulos, and H. Yan, “TERT promoter mutations occur frequently in gliomas and a subset of tumors derived from cells with low rates of self-renewal,” Proc. Natl. Acad. Sci., vol. 110, no. 15, pp. 6021-6026, April 2013); (X. Liu, G. Wu, Y. Shan, C. Hartmann, A. von Deimling, and M. Xing, “Highly prevalent TERT promoter mutations in bladder cancer and glioblastoma,” Cell Cycle, vol. 12, no. 10, pp. 1637-1638, May 2013); and (I. Kinde, E. Munari, S. F. Faraj, R. H. Hruban, M. Schoenberg, T. Bivalacqua, M. Allaf, S. Springer, Y. Wang, L. A. Diaz, K. W. Kinzler, B. Vogelstein, N. Papadopoulos, and G. J. Netto, “TERT promoter mutations occur early in urothelial neoplasia and are biomarkers of early disease and disease recurrence in urine,” Cancer Res., vol. 73, no. 24, pp. 7162-7167, December 2013)). In addition to an expansion and optimization of panel design, in certain embodiments we transition from amplicon sequencing to a hybrid capture library preparation approach. Hybrid capture reagents provide more uniform coverage across our targets, enhanced genomic complexity in our library, greater ability to computationally mark duplicates, fewer PCR cycles and reduced polymerase-introduced error, and reduced library preparation costs allowing affordable deeper sequencing, with any one or more of these advantages contributing to enhanced assay sensitivity.
Example 10
Development of Error Suppression Methodologies to Permit Sensitive and Specific Urine Based Genome Monitoring
[0140] Traditional NGS methods produce substantial noise which limits detection of allele variants below 1-5%. In
Certain Computer Processor Based Embodiments
[0141] In certain embodiments, the steps described and/or performed hereinabove can be implemented by and in numerous ways, including without limitation, as one or more systems or apparatuses; one or a plurality of processes; a composition of matter; a series of instructions resident or non-resident to one, or a plurality of hardware devices coupled and/or in communications together; one or a plurality of computer program products being tangibly embodied on a computer readable storage medium and operable on one or more processors; any one or more processors configured to execute instructions provided by a memory coupled to the processor; and any technologies known to skilled persons involving the reading and/or execution of instructions by machines. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term “processor” refers without limitation to one or more devices, circuits, processing cores or other instructions being executed by one, or a plurality of machines communicatively coupled together, and may be configured to process resident or non-resident data in any form.
[0142] Now referring to
[0143] In certain embodiments, the methods of any one embodiment herein, or of a combination of multiple embodiments herein, are instructed by a computer-readable medium having stored thereon computer-readable instructions for carrying out such methods.
[0144] Turning back to the drawings,
[0145] The above-described execution of instructions, in any and all of the foregoing manners of execution, is employed in reference to: analysis algorithms implemented in the improved assay that may allow, for example, longitudinal monitoring of urine DNA following initial assessment of a patient's primary tumor or following longitudinal analysis of multiple urine DNA samples; developing an enhanced targeted panel of biomarkers that, for example, is capable of encompassing the genomic and clinical diversity of bladder cancer, and in certain embodiments hematuria; in certain embodiments, providing high technical performance while simultaneously achieving clinically feasible assay costs and processing times; monitoring the urine of bladder cancer patients in a manner that yields high sensitivity and specificity; detecting mutations in one or more genes associated with bladder cancer; isolating nucleic acid, DNA or RNA, from a urine sample from a subject, and analyzing the nucleic acid to obtain nucleic acid sequence data suitable to detect the presence or absence of one or more mutations in one or more genes associated with bladder cancer; isolating nucleic acid that is cell-free nucleic acid and/or nucleic acid isolated from cells in a urine sample; and/or performing one or more of the methods cited herein in relation to an individual or group of individuals for detection, prognosis, diagnosis and treatment of bladder cancer in accordance with the embodiments, including without limitation via use of genetic biomarkers and methodologies in gene sequencing.
[0146] In exemplary embodiments, a sequence or other data is input to a processor or other computer hardware component. Here, the processor is coupled to or otherwise in communication with a sequencing device that reads and/or analyzes sequences of nucleic acids from samples. The sequences are provided from processing tools or from sequence storage sources. One or more memory devices buffer or store the sequences. The memory can also store reads, tags, fragments, phase information and islands, etc., for various chromosomes or genomes, and can store instructions for analyzing and presenting the sequence or aligned data.
[0147] In certain embodiments, the methods also include collecting data regarding a plurality of nucleotide sequences. Examples include reads, tags and/or reference chromosome sequences. The data can be sent to a processing device, hardware system or other computational system. In an exemplary embodiment, the processor is connected to laboratory equipment. Such equipment can include a nucleotide amplification means, a sample collection means, a nucleotide sequencing means and/or a hybridization means.
[0148] The processor can then collect applicable data gathered by the laboratory equipment. In exemplary embodiments, not to be taken as an exhaustive list, the data is stored in resident or non-resident storage means of a machine or other processing apparatus; the data is collected in real time, prior to, during, or in conjunction with the transmission of the data; the data is stored on a computer-readable medium that is extractable from the processor; the data is transmitted to a remote location via any means of coupling or communications, including without limitation, via a computer bus, via a local area network, via a wide area network, over an Intranet or the Internet, via wireline, wireless or satellite signals, and over any known form or media of transmission; and the data is processed and operated upon at the remote location.
[0149] Now referring to
[0150] As reflected by
[0151] According to an embodiment of a method of the invention, urine is collected from morning void urine samples to aid enrichment of urologic tract signals, and the urine is processed according to one or more techniques disclosed herein. Next, the nucleic acid markers and/or other markers are analyzed. Data regarding the time of the collection is collected, as well as other patient data including identity, age, weight, gender, medications, diseases, clinical and other personal data, and such data is tracked with the sample and entered into a database. In embodiments, the resulting data is compared with data from the same patient at the same collection time and at different collection times. In other embodiments, the data is compared with data from other patients. Depending on the results of capillary electrophoresis and/or real-time PCR quality control, quantification and normalization of the data are performed, and a DNA library is constructed using size profile information.
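The tracking of sample metadata and the comparison of the same patient across collection times can be sketched as a small record type with a grouping helper. This is a hypothetical data layout for illustration only: the field names, the use of a median fragment size as the tracked measurement, and the in-memory grouping are assumptions and do not describe the database of the embodiments.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UrineSample:
    """One collected sample with a subset of the metadata listed above (hypothetical fields)."""
    patient_id: str
    collected_at: datetime
    first_morning_void: bool
    median_fragment_bp: float


def group_by_patient(samples):
    """Group samples per patient, ordered by collection time."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample.patient_id].append(sample)
    for rows in grouped.values():
        rows.sort(key=lambda s: s.collected_at)
    return dict(grouped)


def fragment_size_change(samples):
    """Change in median fragment size between each patient's first and last sample."""
    return {
        pid: rows[-1].median_fragment_bp - rows[0].median_fragment_bp
        for pid, rows in group_by_patient(samples).items()
    }


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the helpers.
    demo = [
        UrineSample("P1", datetime(2015, 1, 2, 7, 0), True, 180.0),
        UrineSample("P1", datetime(2015, 1, 1, 7, 0), True, 150.0),
        UrineSample("P2", datetime(2015, 1, 1, 15, 0), False, 90.0),
    ]
    print(fragment_size_change(demo))
```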
[0152] The variability in DNA fragmentation profiles may be caused by diverse physiology and storage conditions. In addition to the time of day collected, certain individuals appear to have natural biases to one urine profile type over another. In individuals with a predominantly small/trans-renal profile, it is of further importance to collect samples when urine incubation within the bladder has been maximized (early morning) and, in other cases, to void urine immediately into a preservation buffer that inhibits nuclease activity to prevent degradation of nucleic acid.
[0153] The heterogeneity of nucleic acid size in urine is described across a representative sampling of people of various age, gender, and disease or wellness states by using capillary electrophoresis and/or analysis of sequencing read start and stop sites. Using this data, it is possible to assess if and how urine nucleic acid size and fragmentation profiles change within an individual over the course of time (hours, days, weeks, or months) and in response to physiologic perturbations such as disease, circadian rhythm, diet, and hydration. In a version of this embodiment, nucleic acid size and fragmentation patterns within urine are used as one component of a disease classifying algorithm.
[0154] Additionally, it has been determined that sample handling, preservation and storage conditions will influence molecules of various size or the heterogeneity observed within a particular urine sample. In addition, nucleic acid extraction methods also influence size variability in samples, an effect that had not been previously characterized in connection with such urine analysis. As discussed below, the particular patients sampled also manifest different nucleic acid size profiles. While the ultimate impact of various size profiles on sequencing library preparation efficiency and sequencing performance has not been completely characterized, embodiments of the present invention involve the characterization of each of these variables; the data collected is then used to create a database of patient profiles and ultimately to improve both the diagnosis and prognosis of bladder cancer. In embodiments, data collected relating to the nucleic acid size is associated with one or more of the various correlating factors discussed above. In an embodiment, samples from patients with predetermined profiles are normalized according to their profile and assessed by sequencing to measure unique fragmentation (sequencing start/stop) sites, and this sequence context fragmentation pattern is integrated as one aspect of a disease diagnosis algorithm.
[0155] It has been determined that urine nucleic acid has substantial heterogeneity in its size distribution across individuals. Further, the size distribution of nucleic acids in samples is not uniform within individuals throughout the day. In addition, nucleic acid degradation in samples can substantially reduce the size of nucleic acid molecules when urine is left at room temperature for minutes to hours to days. Degradation may occur in various ways. In one example, higher molecular weight DNA degrades, becoming smaller in size and increasing the abundance of small molecular weight DNA within the urine, referred to as low molecular weight pooling. In another example, high molecular weight DNA completely degrades beyond detection and does not accumulate within a low molecular weight pool. Additionally, the process of freezing and defrosting urine has substantial impact on nucleic acid size and damage. In one embodiment, degradation of DNA due to handling damage can be distinguished from DNA fragmented due to biologic processes such as apoptosis and necrosis through analysis of sequence context around read start and stop bases, and this information is used to create a sample quality ratio to normalize sequencing data.
[0156] According to an aspect of the invention, a database is developed that includes various nucleic acid size profiles and fragmentation sequence context analysis across thousands of unique urine samples and across hundreds of unique physiologies, pathologies, and treatment conditions. This data is then correlated with the sampling data and the records are compared to provide outputs that relate to the underlying causes for variable urine nucleic acid size.
[0157] Embodiments of the present invention involve the steps of (1) iterative optimization of sequencing methodologies to various size profiles, (2) optimizing sample collection and storage techniques to maintain integrity of nucleic acid size, (3) the implementation of quality controls to filter out samples of poor quality, and (4) the normalization of final sequencing data back to unique features of nucleic acid size profiles. Taken together, these steps and multiple iterations on sequencing methods have led to a high quality urine based genomics analysis.
[0158] The diagnostic sensitivity for detection of diseases within the urologic tract is influenced by the size distribution of nucleic acids in the sample. Based on this understanding, we have defined the following parameters/combinations to enhance assay performance.
[0159] In an embodiment, an analysis includes the targeting and enrichment of nucleic acids in the 120-5000 bp range and/or in the 5,000-10,000 bp range depending on sample profile, wherein molecules in these size ranges may be fragmented to a population of molecules that are 500-600 bp in size through (1) mechanical fragmentation techniques such as ultrasonication disclosed by Covaris, (2) enzyme-based fragmentation, such as that performed by Kapa Hyper-plus, (3) a restriction enzyme, or (4) a cocktail of various restriction enzymes, and wherein the fragmented molecules are then placed into a library preparation reaction.
[0160] In an embodiment, an analysis is conducted of urine samples that are collected from an individual during the first or second morning void, thereby maximizing the time the urine has spent in contact with bladder epithelium. After sample processing using fragmentation techniques, the analysis of the genome is performed to determine if a plurality of marker DNA or RNA segments are present in the sample.
[0161] In an embodiment, an analysis is conducted of urine samples collected prior to consuming a meal or drinking fluids, thereby minimizing physiologic activity of the kidney, wherein said analysis comprises processing to determine if a plurality of marker DNA or RNA segments are present in the sample.
[0162] In an embodiment, the sample is normalized to the size of its nucleic acids: (1) to develop a urine sequencing diagnostic that analyzes signals in the urologic tract, it is favorable to enrich for and analyze nucleic acid that is greater than 100-150 bp in size; and (2) to develop a urine sequencing diagnostic that analyzes nucleic acid signals from systemic circulation, it is favorable to enrich nucleic acid that is smaller than 100 base pairs in size, which specifically may range from 20-100 base pairs depending on kidney function/health. Common DNA measurements such as UV absorption and fluorimetry do not provide size information and may cause over- or under-loading of DNA into a library preparation reaction if used in isolation (see
[0163]
[0164]
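The size-based distinction drawn in paragraph [0162] between urologic tract signal and systemic (trans-renal) signal can be sketched as a simple binning of fragment lengths. This is an illustrative sketch only: the function name and output keys are hypothetical, the 150 bp cutoff is one point chosen from the stated 100-150 bp range, and fragment lengths are assumed to come from capillary electrophoresis or read start/stop coordinates.

```python
def size_profile(fragment_lengths_bp, systemic_range=(20, 100), urologic_min_bp=150):
    """Fraction of fragments falling in the systemic and urologic size windows.

    Assumptions: `systemic_range` is half-open [low, high); fragments longer than
    `urologic_min_bp` are treated as urologic tract signal; lengths in between
    are left unassigned.
    """
    n = len(fragment_lengths_bp)
    if n == 0:
        raise ValueError("no fragment lengths supplied")
    low, high = systemic_range
    systemic = sum(1 for x in fragment_lengths_bp if low <= x < high)
    urologic = sum(1 for x in fragment_lengths_bp if x > urologic_min_bp)
    return {
        "systemic_fraction": systemic / n,
        "urologic_fraction": urologic / n,
    }


if __name__ == "__main__":
    # Placeholder lengths in base pairs.
    print(size_profile([50, 80, 120, 200, 3000]))
```

A profile of this kind carries the size information that, as paragraph [0162] notes, UV absorption and fluorimetry alone do not provide.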
CONCLUSION
[0165] It should be noted that the depicted order and labeled operations herein are indicative of one or more exemplary embodiments of certain presented methods. Other operations and methods can be conceived by skilled persons that are equivalent in function, logic, or effect to one or more operations, or portions thereof, of the illustrated methods. Although the operations of the methods herein are shown and described in a particular order, the order of the operations of each method may be altered so that certain operations may be performed in an inverse order or so that certain operations may be performed, at least in part, concurrently with other operations. In other embodiments, instructions or sub-operations of distinct operations may be implemented in an intermittent and/or alternating manner.
[0166] Lastly, while various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present embodiments should not be limited by any of the above description.
LITERATURE CITED
[0167] A. M. Newman, S. V. Bratman, J. To, J. F. Wynne, N. C. W. Eclov, L. A. Modlin, C. L. Liu, J. W. Neal, H. A. Wakelee, R. E. Merritt, J. B. Shrager, B. W. Loo Jr, A. A. Alizadeh, and M. Diehn, “An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage,” Nat. Med., vol. advance online publication, April 2014.
[0168] S. R. Kennedy, M. W. Schmitt, E. J. Fox, B. F. Kohrn, J. J. Salk, E. H. Ahn, M. J. Prindle, K. J. Kuong, J.-C. Shen, R.-A. Risques, and L. A. Loeb, “Detecting ultralow-frequency mutations by Duplex Sequencing,” Nat. Protoc., vol. 9, no. 11, pp. 2586-2606, November 2014.
[0169] M. W. Schmitt, S. R. Kennedy, J. J. Salk, E. J. Fox, J. B. Hiatt, and L. A. Loeb, “Detection of ultra-rare mutations by next-generation sequencing,” Proc. Natl. Acad. Sci. U.S.A, vol. 109, no. 36, pp. 14508-14513, September 2012.
[0170] E. Crowley, F. Di Nicolantonio, F. Loupakis, and A. Bardelli, “Liquid biopsy: monitoring cancer-genetics in the blood,” Nat. Rev. Clin. Oncol., vol. 10, no. 8, pp. 472-484, August 2013.
[0171] M. Murtaza, S.-J. Dawson, D. W. Y. Tsui, D. Gale, T. Forshew, A. M. Piskorz, C. Parkinson, S.-F. Chin, Z. Kingsbury, A. S. C. Wong, F. Marass, S. Humphray, J. Hadfield, D. Bentley, T. M. Chin, J. D. Brenton, C. Caldas, and N. Rosenfeld, “Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA,” Nature, vol. 497, no. 7447, pp. 108-112, May 2013.
[0172] T. Forshew, M. Murtaza, C. Parkinson, D. Gale, D. W. Y. Tsui, F. Kaper, S.-J. Dawson, A. M. Piskorz, M. Jimenez-Linan, D. Bentley, J. Hadfield, A. P. May, C. Caldas, J. D. Brenton, and N. Rosenfeld, “Noninvasive Identification and Monitoring of Cancer Mutations by Targeted Deep Sequencing of Plasma DNA,” Sci. Transl. Med., vol. 4, no. 136, pp. 136ra68-136ra68, May 2012.
[0173] G. Sozzi, D. Conte, M. Leon, R. Ciricione, L. Roz, C. Ratcliffe, E. Roz, N. Cirenei, M. Bellomi, G. Pelosi, M. A. Pierotti, and U. Pastorino, “Quantification of free circulating DNA as a diagnostic marker in lung cancer,” J. Clin. Oncol. Off J. Am. Soc. Clin. Oncol., vol. 21, no. 21, pp. 3902-3908, November 2003.
[0174] C. Fernandez, Shore, and A. Shuber, “Noninvasive multianalyte diagnostic assay for monitoring bladder cancer recurrence,” Res. Rep. Urol., p. 49, October 2012.
[0175] C. Fernandez, Millholland, Li, and A. Shuber, “Detection of low frequency FGFR3 mutations in the urine of bladder cancer patients using next-generation deep sequencing,” Res. Rep. Urol., p. 33, June 2012.
[0176] W. Ranasinghe and R. Pers, “The Changing Incidence of Carcinoma In-Situ of the Bladder Worldwide,” in Advances in the Scientific Evaluation of Bladder Cancer and Molecular Basis for Diagnosis and Treatment, R. Persad, Ed. InTech, 2013.
[0177] S. Myllykangas, J. D. Buenrostro, G. Natsoulis, J. M. Bell, and H. P. Ji, “Efficient targeted resequencing of human germline and cancer genomes by oligonucleotide-selective sequencing,” Nat. Biotechnol., vol. 29, no. 11, pp. 1024-1027, November 2011.
[0178] H. Lee, B. T. Lau, and H. P. Ji, “Targeted Sequencing Strategies in Cancer Research,” in Next Generation Sequencing in Cancer Research, W. Wu and H. Choudhry, Eds. Springer New York, 2013, pp. 137-163.
[0179] “Press Announcements - FDA allows marketing of four ‘next generation’ gene sequencing devices.” [Online] Available: www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm375742.htm. [Accessed: 2, Dec. 2014].
[0180] K. Bijwaard, J. S. Dickey, K. Kelm, and Z. Teak, “The first FDA marketing authorizations of next-generation sequencing technology and tests: challenges, solutions and impact for future assays,” Expert Rev. Mol. Diagn., pp. 1-8, November 2014.
[0181] F. S. Collins and M. A. Hamburg, “First FDA Authorization for Next-Generation Sequencer,” N. Engl. J. Med., vol. 369, no. 25, pp. 2369-2371, November 2013.
[0182] D. C. Koboldt, Q. Zhang, D. E. Larson, D. Shen, M. D. McLellan, L. Lin, C. A. Miller, E. R. Mardis, L. Ding, and R. K. Wilson, “VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing,” Genome Res., vol. 22, no. 3, pp. 568-576, March 2012.
[0183] A. Wilm, P. P. K. Aw, D. Bertrand, G. H. T. Yeo, S. H. Ong, C. H. Wong, C. C. Khor, R. Petric, M. L. Hibberd, and N. Nagarajan, “LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets,” Nucleic Acids Res., vol. 40, no. 22, pp. 11189-11201, December 2012.
[0184] Z. Wei, W. Wang, P. Hu, G. J. Lyon, and H. Hakonarson, “SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data,” Nucleic Acids Res., vol. 39, no. 19, p. e132, October 2011.
[0185] K. Cibulskis, M. S. Lawrence, S. L. Carter, A. Sivachenko, D. Jaffe, C. Sougnez, S. Gabriel, M. Meyerson, E. S. Lander, and G. Getz, “Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples,” Nat. Biotechnol., vol. 31, no. 3, pp. 213-219, March 2013.
[0186] J. Reading, R. R. Hall, and M. K. Parmar, “The application of a prognostic factor analysis for Ta.T1 bladder cancer in routine urological practice,” Br. J. Urol., vol. 75, no. 5, pp. 604-607, May 1995.
TABLE-US-00002 TABLE 1
Gene Symbol | Synonym Symbol | Full Gene Name | Chromosome | HG19 Basepair Start Site | HG19 Basepair Stop Site
KDM6A | | lysine (K)-specific demethylase 6A | X | 44732423 | 44971845
MLL2 | KMT2D | lysine (K)-specific methyltransferase 2D | 12 | 49412758 | 49449107
TSC1 | | tuberous sclerosis 1 | 9 | 135766735 | 135820020
NOTCH2 | | notch 2 | 1 | 120454176 | 120612317
PTEN | | phosphatase and tensin homolog | 10 | 89623195 | 89728532
TP53 | | tumor protein p53 | 17 | 7571720 | 7590868
NOTCH1 | | notch 1 | 9 | 139388896 | 139440238
CDKN2A | | cyclin-dependent kinase inhibitor 2A | 9 | 21967751 | 21994490
RB1 | | retinoblastoma 1 | 13 | 48877883 | 49056026
ATM | | ATM serine/threonine kinase | 11 | 108093559 | 108239826
ERBB2 | | erb-b2 receptor tyrosine kinase 2 | 17 | 37844393 | 37884915
PIK3CA | | phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha | 3 | 178866311 | 178952497
FGFR3 | | fibroblast growth factor receptor 3 | 4 | 1795039 | 1810599
EGFR | | epidermal growth factor receptor | 7 | 55086725 | 55275031
FGFR1 | | fibroblast growth factor receptor 1 | 8 | 38268656 | 38326352
CREBBP | | CREB binding protein | 16 | 3775056 | 3930121
LRP1B | | low density lipoprotein receptor-related protein 1B | 2 | 140988996 | 142889270
MYC | | v-myc avian myelocytomatosis viral oncogene homolog | 8 | 128748315 | 128753680
ARID1A | | AT rich interactive domain 1A (SWI-like) | 1 | 27022522 | 27108601
MLL3 | KMT2C | lysine (K)-specific methyltransferase 2C | 7 | 151832010 | 152133090
BIRC3 | | baculoviral IAP repeat containing 3 | 11 | 102188181 | 102210135
WWOX | | WW domain containing oxidoreductase | 16 | 78133327 | 79246564
PALB2 | | partner and localizer of BRCA2 | 16 | 23614483 | 23652678
SOX4 | | SRY (sex determining region Y)-box 4 | 6 | 21593972 | 21598849
YAP1 | | Yes-associated protein 1 | 11 | 101981192 | 102104154
CCND1 | | cyclin D1 | 11 | 69455873 | 69469242
BCL2L1 | | BCL2-like 1 | 20 | 30252261 | 30310656
MYCL1 | | v-myc avian myelocytomatosis viral oncogene lung carcinoma derived homolog | 1 | 40361096 | 40367687
MDM4 | | MDM4, p53 regulator | 1 | 204485507 | 204527248
FGF3 | | fibroblast growth factor 3 | 11 | 69624736 | 69634192
MDM2 | | MDM2 proto-oncogene, E3 ubiquitin protein ligase | 12 | 69201971 | 69239320
CCNE1 | | cyclin E1 | 19 | 30302901 | 30315215
ZNF703 | | zinc finger protein 703 | 8 | 37553301 | 37556396
PRKCI | | protein kinase C, iota | 3 | 169940220 | 170023770
NCOR1 | | nuclear receptor corepressor 1 | 17 | 15933408 | 16118874
YWHAZ | | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta | 8 | 101930804 | 101965623
PPARG | | peroxisome proliferator-activated receptor gamma | 3 | 12329349 | 12475855
TBL1XR1 | | transducin (beta)-like 1 X-linked receptor 1 | 3 | 176738542 | 176915048
PDE4D | | phosphodiesterase 4D, cAMP-specific | 5 | 58264866 | 59783925
IKZF2 | | IKAROS family zinc finger 2 (Helios) | 2 | 213864411 | 214016333
SPAG1 | | sperm associated antigen 1 | 8 | 101170263 | 101254132
E2F3 | | E2F transcription factor 3 | 6 | 20402137 | 20493945
NIT1 | | nitrilase 1 | 1 | 161087862 | 161095235
BEND3 | | BEN domain containing 3 | 6 | 107386385 | 107435636
GDI2 | | GDP dissociation inhibitor 2 | 10 | 5807186 | 5855512
PVRL4 | | poliovirus receptor-related 4 | 1 | 161040781 | 161059385
CCSER1 | | coiled-coil serine-rich protein 1 | 4 | 91048684 | 92523370
TERT Promoter | | telomerase reverse transcriptase promoter region | 5 | 1253287 | 1295162
SPTAN1 | | spectrin, alpha, non-erythrocytic 1 | 9 | 131314837 | 131395944
HRAS | | Harvey rat sarcoma viral oncogene homolog | 11 | 532242 | 535550
CTNNB1 | | catenin (cadherin-associated protein), beta 1, 88 kDa | 3 | 41240942 | 41281939
FBXW7 | | F-box and WD repeat domain containing 7, E3 ubiquitin protein ligase | 4 | 153242410 | 153456393
EP300 | | E1A binding protein p300 | 22 | 41488614 | 41576081
RHOA | | ras homolog family member A | 3 | 49396579 | 49449526
CCND3 | | cyclin D3 | 6 | 41902671 | 42016610
NOS1AP | | nitric oxide synthase 1 (neuronal) adaptor protein | 1 | 162039581 | 162339813
ELF3 | | E74-like factor 3 (ets domain transcription factor, epithelial-specific) | 1 | 201979690 | 201986315
PTPRD | | protein tyrosine phosphatase, receptor type, D | 9 | 8314246 | 10612723
STAG2 | | stromal antigen 2 | X | 123094475 | 123236505
ERBB3 | | erb-b2 receptor tyrosine kinase 3 | 12 | 56473809 | 56497291
CDKN1A | | cyclin-dependent kinase inhibitor 1A (p21, Cip1) | 6 | 36644237 | 36655116
NFE2L2 | | nuclear factor, erythroid 2-like 2 | 2 | 178095031 | 178129859
AIRE | | autoimmune regulator | 21 | 45705721 | 45718102
BTG2 | | BTG family, member 2 | 1 | 203274664 | 203278729
TTC28 | | tetratricopeptide repeat domain 28 | 22 | 28374002 | 29075853
IKZF3 | | IKAROS family zinc finger 3 (Aiolos) | 17 | 37913968 | 38020441
FHIT | | fragile histidine triad | 3 | 59735036 | 61237133
SHANK2 | | SH3 and multiple ankyrin repeat domains 2 | 11 | 70313961 | 70935808
ERCC2 | | excision repair cross-complementation group 2 | 19 | 45854649 | 45873845
TPTE | | transmembrane phosphatase with tensin homology | 21 | 10906743 | 10990920
KLF5 | | Kruppel-like factor 5 (intestinal) | 13 | 73633142 | 73651676
FOXA1 | | forkhead box A1 | 14 | 38058757 | 38064325
PON3 | | paraoxonase 3 | 7 | 94989184 | 95025687
RXRA | | retinoid X receptor, alpha | 9 | 137218316 | 137332431
ZFP36L1 | | ZFP36 ring finger protein-like 1 | 14 | 69254372 | 69262960
GPC5 | | glypican 5 | 13 | 92050935 | 93519487
PCSK5 | | proprotein convertase subtilisin/kexin type 5 | 9 | 78505560 | 78977255
CTIF | | CBP80/20-dependent translation initiation factor | 18 | 46065427 | 46389586
FOXQ1 | | forkhead box Q1 | 6 | 1312675 | 1314993
TIMM9 | | translocase of inner mitochondrial membrane 9 homolog (yeast) | 14 | 58875370 | 58894232
CX3CL1 | | chemokine (C-X3-C motif) ligand 1 | 16 | 57406414 | 57418956
TXNIP | | thioredoxin interacting protein | 1 | 145438462 | 145442628
RHOB | | ras homolog family member B | 2 | 20646835 | 20649201
PAIP1 | | poly(A) binding protein interacting protein 1 | 5 | 43526370 | 43557521
PHACTR1 | | phosphatase and actin regulator 1 | 6 | 12717037 | 13288075
CDKAL1 | | CDK5 regulatory subunit associated protein 1-like 1 | 6 | 20534688 | 21232634
TACC3 | | transforming, acidic coiled-coil containing protein 3 | 4 | 1723217 | 1746905
ASXL2 | | additional sex combs like transcriptional regulator 2 | 2 | 25962253 | 26101312
HORMAD1 | | HORMA domain containing 1 | 1 | 150670535 | 150693364
PHLDA3 | | pleckstrin homology-like domain, family A, member 3 | 1 | 201434607 | 201438299
MIPOL1 | | mirror-image polydactyly 1 | 14 | 37667118 | 38020464
ZFR2 | | zinc finger RNA binding protein 2 | 19 | 3804022 | 3869027
PIGH | | phosphatidylinositol glycan anchor biosynthesis, class H | 14 | 68056023 | 68067017
WRB | | tryptophan rich basic protein | 21 | 40752213 | 40769815
MRO | | Maestro | 18 | 48321490 | 48351754
STYX | | serine/threonine/tyrosine interacting protein | 14 | 53196883 | 53241705
MDFIC | | MyoD family inhibitor domain containing | 7 | 114562209 | 114659970
ERMN | | ermin, ERM-like protein | 2 | 158175125 | 158184146
RND3 | | Rho family GTPase 3 | 2 | 151324707 | 151344209