Methods of quantifying RNA and DNA variants through sequencing employing phosphorothioates
11802311 · 2023-10-31
Assignee
Inventors
- Bo Cao (Cambridge, MA, US)
- Peter C. DEDON (Boston, MA, US)
- Jennifer F. Hu (Cambridge, MA, US)
- Michael S. DeMott (Newton, MA, US)
Cpc classification
C12Q2535/101
CHEMISTRY; METALLURGY
C12N15/1096
CHEMISTRY; METALLURGY
International classification
C12Q1/6874
CHEMISTRY; METALLURGY
C12N15/10
CHEMISTRY; METALLURGY
Abstract
This disclosure provides methods and compositions for analyzing nucleic acids such as DNA and RNA, and including determination of absolute numbers of such nucleic acids and/or detection and localization of lesions or other modifications on such nucleic acids.
Claims
1. A method for analyzing modifications in naturally occurring nucleic acids comprising, (a) incubating a naturally occurring nucleic acid with a first polymerase and a ddNTP under conditions sufficient to fill in one or more single-stranded nicks in the nucleic acid, (b) treating the nucleic acid to convert a nucleic acid modification into a single-stranded nick, wherein the nucleic acid modification is a phosphorothioate modification, a methyl5C modification, or a DNA damage modification that is 8-oxoguanine, thereby generating a nicked nucleic acid, (c) incubating the nicked nucleic acid with a second polymerase and alpha-thio-dNTPs under conditions sufficient to generate a phosphorothioate-labeled nucleic acid fragment, and (d) mapping the phosphorothioate-labeled nucleic acid fragment onto a genomic map corresponding to a source of the naturally occurring nucleic acid to determine location of modifications in the naturally occurring nucleic acid, wherein the naturally occurring nucleic acid is DNA.
2. The method of claim 1, wherein the ddNTP is dideoxycytidine.
3. The method of claim 1, wherein the treating of step (b) is enzymatically, chemically and/or mechanically treating.
4. The method of claim 1, wherein the first polymerase and/or the second polymerase is DNA polymerase I.
5. The method of claim 1, wherein the nucleic acid modification is a phosphorothioate modification, wherein said nucleic acid modification is converted into a single-stranded nick using iodine.
6. The method of claim 1, wherein the nucleic acid modification is a methyl5C modification, wherein said nucleic acid modification is converted into a single-stranded nick using TET or TDG enzyme that converts a methyl5C to an abasic site and an AP endonuclease that converts abasic sites to single-stranded nicks capable of nick translation.
7. The method of claim 1, wherein the nucleic acid modification is a DNA damage modification that is 8-oxoguanine, wherein said nucleic acid modification is converted into a single-stranded nick using FAPY glycosylase.
8. The method of claim 1, wherein the phosphorothioate-labeled nucleic acid fragment is 100-500 nucleotides in length.
9. A method for detecting and mapping one or more modifications in a naturally occurring DNA sample comprising (a) incubating a DNA sample with DNA polymerase I and dideoxycytidine under conditions sufficient to fill in and/or block existing single-stranded nicks in the naturally occurring DNA sample, (b) treating the DNA sample to convert existing DNA modifications into single-stranded nicks, wherein said treating is enzymatically, chemically or mechanically treating, thereby generating nicked DNA, wherein the DNA modification is a phosphorothioate modification, a methyl5C modification, or a DNA damage modification that is 8-oxoguanine, (c) incubating the nicked DNA with alpha-thio-dNTPs and DNA polymerase I under conditions sufficient to generate phosphorothioate-labeled DNA fragments through a process of nick translation/strand displacement, wherein said fragments are at least 100-500 nucleotides in length, (d) incubating the DNA sample with nuclease P1 or an endo- or exo-nuclease that does not cleave phosphorothioate-labeled DNA fragments, (e) isolating the phosphorothioate-labeled DNA fragments, (f) amplifying and sequencing the phosphorothioate-labeled DNA fragments to generate sequencing reads, and (g) mapping the sequencing reads onto a genomic map of the source of the naturally occurring DNA sample to determine location of the DNA modifications in the naturally occurring DNA sample.
10. The method of claim 9, wherein the DNA modification is a phosphorothioate modification, wherein said nucleic acid modification is converted into a single-stranded nick using iodine.
11. The method of claim 9, wherein the DNA modification is a methyl5C modification, wherein said nucleic acid modification is converted into a single-stranded nick using TET or TDG enzyme that converts a methyl5C to an abasic site and an AP endonuclease that converts abasic sites to single-stranded nicks capable of nick translation.
12. The method of claim 9, wherein the DNA modification is a DNA damage modification that is 8-oxoguanine, wherein said nucleic acid modification is converted into a single-stranded nick using FAPY glycosylase.
13. A method for detecting and mapping one or more nucleic acid lesions in a naturally occurring nucleic acid sample comprising (a) incubating a nucleic acid sample with a polymerase and alpha-thio-dNTPs under conditions sufficient to generate a phosphorothioate-labeled nucleic acid fragment, (b) removing unlabeled nucleic acids under conditions that specifically degrade said unlabeled nucleic acids and do not degrade the phosphorothioate-labeled nucleic acid fragment, and (c) mapping the phosphorothioate-labeled nucleic acid fragment onto a genomic map corresponding to a source of the naturally occurring nucleic acid sample to determine location of the one or more nucleic acid lesions.
14. The method of claim 13, wherein the polymerase is DNA polymerase I.
15. A kit comprising alpha-thio-dNTPs, ddNTP, wherein the ddNTP is or comprises dideoxycytidine, a polymerase, and a buffer(s), iodine, FAPY glycosylase, TET or TDG enzyme capable of converting a methyl5C to an abasic site, an enzyme capable of converting a DNA damage lesion to a single-stranded nick, an enzyme capable of removing a sugar residue from a nucleic acid, hydroxyl radicals, or a chemical capable of generating hydroxyl radicals, and instructions directing use according to the method of claim 1.
16. A method for measuring RNA in a sample comprising (a) dephosphorylating RNA in a sample, thereby generating dephosphorylated RNA, (b) ligating, to the dephosphorylated RNA, a ddNTP-ended oligodeoxynucleotide linker having two or more randomized nucleotides at its 5′-end (Linker 1), thereby generating a linker-ligated RNA, (c) optionally treating the linker-ligated RNA conjugate with an AlkB enzyme capable of reducing level of RNA modification, (d) removing excess Linker 1 by treating with deadenylase to remove a ligase-mediated intermediate and then degrading Linker 1 with the 2′-deoxyribonuclease Rec J, (e) reverse transcribing the linker-ligated RNA into cDNA using a primer complementary to Linker 1 and reverse transcriptase, (f) degrading residual RNA, (g) ligating a hairpin/splint oligodeoxynucleotide linker comprising a double-stranded stem region, a single-stranded loop region, a random nucleotide sequence region capable of hybridizing to the cDNA, and a single-stranded 3′ end (Linker 2) to the cDNA, (h) removing excess Linker 2 by treating with deadenylase to remove a ligase-mediated intermediate and then degrading Linker 2 with the 2′-deoxyribonuclease Rec J, and (i) amplifying the linker-ligated cDNA using primers that comprise reverse complements of sequences in Linkers 1 and 2 (Primer 1 and Primer 2).
17. A kit comprising DNA oligonucleotides, a ddNTP-ended oligodeoxynucleotide linker having two or more randomized nucleotides at its 5′-end (Linker 1), a hairpin/splint oligodeoxynucleotide linker comprising a double-stranded stem region, a single-stranded loop region, a random nucleotide sequence region capable of hybridizing to a cDNA, and a single-stranded 3′ end (Linker 2), a reverse transcription (RT) primer, and primers that comprise reverse complements of sequences in Linkers 1 and 2 (Primer 1 and Primer 2).
18. The method of claim 1, further comprising removing unlabeled nucleic acids under conditions that specifically degrade said unlabeled nucleic acids and do not degrade the phosphorothioate-labeled nucleic acid fragment prior to step (d).
19. The method of claim 1, further comprising isolating or purifying the phosphorothioate-labeled nucleic acid fragment prior to step (d).
20. The method of claim 1, further comprising amplifying and/or sequencing the phosphorothioate-labeled nucleic acid fragment prior to step (d).
21. The method of claim 9, wherein step (e) is conducted by ethanol precipitation or column chromatography.
22. The method of claim 16, wherein: (i) in step (a) dephosphorylating RNA in a sample comprises using alkaline phosphatase, (ii) in step (b) the ddNTP-ended oligodeoxynucleotide linker is dideoxycytidine-ended oligodeoxynucleotide linker, (iii) in step (c) the AlkB enzyme is a mutant AlkB enzyme, (iv) in step (f) degrading residual RNA includes degrading RNA that is not linker-ligated, (v) in step (f) degrading residual RNA includes degrading RNA using alkaline hydrolysis, and/or (vi) in step (g) ligating the hairpin/splint oligodeoxynucleotide linker (Linker 2) to the cDNA comprises using T4 DNA ligase.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1)
(2)
(3)
(4)
(5) (SEQ ID NO: 3) 5′-G_{PT}T_{PT}C_{PT}C_{PT}T_{PT}T_{PT}GGTGCCCGAGTG-OH-3′.
(6)
(7)
(8) (SEQ ID NO: 6) 5′-AATGATACGGCGACCACCGAGATCTACA-C-XXXXXX-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-TGAACAGCGACTAGGCTCTTCA-3′,
and primer 2 as the sequence of
(9) (SEQ ID NO: 7) 5′-CAAGCAGAAGACGGCATACGAGAT-XXXXXX-CGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT-GTCCTTGGTGCCCGAGTG-3′.
(10)
(11)
(12)
(13)
(14)
(15)
(16)
(17) A color version of the Figures is being filed along with a gray-scale version. Reference may be made to the color version where color is used to distinguish and/or highlight information in the Figures.
DETAILED DESCRIPTION
(18) This disclosure provides various methods and products relating to detection, quantification and sequencing of nucleic acids such as but not limited to RNA. This disclosure further provides various methods and products relating to universal detection of nucleic acid features (e.g., mutations, modifications, etc.). The various aspects and embodiments of this disclosure are discussed in greater detail below.
(19) RNA Sequencing Methods for Absolute Quantification of RNA Molecules
(20) This disclosure provides, in part, methods that enable one skilled in the art to perform RNA sequencing in which the abundances (i.e., copy numbers) of different RNA molecules can be compared directly in the same sample. Unlike other RNA sequencing methods, which allow only relative quantification of changes in the levels of RNA molecules between different samples, the method described here enables a direct, linear correlation between the sequencing read counts and the number of copies of all RNA molecules within a single sample. This allows quantitative definition of the landscape of RNA molecules in a cell or tissue at any given moment. This process can be applied to RNA from any source, with multiplexing to accommodate many samples, and it can be used to investigate gene expression, RNA metabolism, RNA stability, RNA therapeutics, and any other problems related to quantitative analysis of RNA molecules.
(21) As shown in
(22) These methods cannot, however, be used for absolute quantification of RNA molecules, in which the levels of different types of RNA transcripts are compared in the same sample of RNA. For example, there are ˜30-55 different types of transfer RNAs (tRNAs) in most types of prokaryotic cells such as bacteria. Eukaryotic cells, including yeast and human cells, can have up to hundreds of different transfer RNAs. tRNAs represent the adaptor molecules that read the genetic code in messenger RNAs (mRNAs) and carry the corresponding amino acid for synthesis of the protein encoded by the mRNA. The level of each type of tRNA is thought to reflect the translational needs of the cell at any given moment, with some specific types of tRNA occurring at very low levels while other types are potentially present at orders-of-magnitude higher levels.
(23) There are two reasons why current RNA sequencing techniques cannot be used for absolute quantification of RNA molecules. First, the attachment of oligonucleotide linkers to each end of an RNA molecule before reverse transcription results in the loss of information about some RNA molecules when the reverse transcriptase falls off the RNA due to an error in processivity of the enzyme or due to an encounter with some types of modified nucleosides. Regarding the latter, the cells in each type of organism contain 25-50 or more chemical modifications of the canonical A, G, C and U nucleotides in RNA.¹,² Some of these modifications block the polymerase activity during reverse transcription, so that the enzyme falls off the RNA molecule before completely copying the molecule through to the other end. Such failure sequences do not possess the second PCR linker, so they cannot be amplified and thus fail to appear in the final sequencing results (
(24) A second problem is that the ligase enzymes used for RNA sequencing vary in their efficiency by more than 10³-fold due to differences in the last two nucleotides at each end of the RNA molecule.³⁻⁵ This variation in linker ligation efficiency manifests as 10⁶-fold variation in the read counts from RNA sequencing applied to tRNA molecules.⁶ There is thus no predictable or direct correlation between sequencing read counts and the number of copies of an RNA molecule in a sample analyzed by current RNA sequencing methods.
(25) As described herein, to satisfy the need for absolute quantification in RNA sequencing analyses, an RNA sequencing method has been developed that enables a direct, linear correlation between the sequencing read counts and the number of copies of all RNA molecules within the same sample. The method is detailed in
(26) This process may involve the following steps: (1) dephosphorylate purified RNA, in some instances consisting of all RNA molecules including for example all RNA molecules less than 200 nt in length (“small RNA”), which includes tRNAs; (2) ligate a dideoxycytidine-ended oligodeoxynucleotide linker with two randomized nucleotides at the 5′-end (Linker 1) to the dephosphorylated RNA using for example T4 RNA ligase, which results in >91% ligation efficiency; (3) reduce the levels of RNA modifications using an AlkB enzyme which may be a mutant AlkB enzyme or it may be a mixture of AlkB enzymes having differing fidelities and/or susceptibilities to modifications; (4) remove excess Linker 1 by treating the sample with for example deadenylase to remove a ligase-mediated intermediate and then degrading Linker 1 for example with the 2′-deoxyribonuclease Rec J; (5) reverse transcribe the linker-ligated RNA into cDNA using a primer complementary to Linker 1 and reverse transcriptase; (6) degrade the RNA template for example by alkaline hydrolysis; (7) ligate a uniquely-designed hairpin/splint oligodeoxynucleotide linker (Linker 2) to the cDNA molecules using for example T4 DNA ligase; (8) remove excess Linker 2 for example by deadenylation and Rec J treatment as described in step #4; and (9) ligate standard NGS sequencing linkers by PCR followed by sequencing using standard NGS platforms.
(27) As a method for absolute quantification of RNA molecules in the same sample, which is not possible for existing methods, this RNA sequencing method involves a new and nonobvious combination of RNA- and DNA-manipulating enzymes and uniquely structured oligodeoxynucleotide linkers to process a mixture of RNA molecules (i.e., prepare an RNA library) for subsequent sequencing by standard NGS methods. The inventive features of these methods include: (1) uniquely designed oligodeoxynucleotide linkers and optimized reaction conditions that enhance the efficiency of the RNA and DNA ligase enzymes to >91%; (2) unique combinations of enzymes (deadenylase, Rec J) that allow removal of excess linkers without harming the RNA template or cDNA product, thus enhancing the efficiency of subsequent enzymatic reactions; and (3) the ligation of the 5′ linker (Linker 2) after the reverse transcription step, which avoids loss of RNA molecules by fall-off of the reverse transcriptase. This novel and nonobvious combination of reagents and conditions allows deep-sequencing analysis of the RNA molecules such that the sequencing read count for each type of RNA is directly and linearly correlated with the number of copies of that RNA sequence. This method can be applied by researchers in the form of a kit in many fields of academic, regulatory or industrial science using any type of synthetic or natural RNA from any organism, such as viruses, bacteria, parasites, yeast, and mammalian and human cells and tissues.
(28) This RNA sequencing method has been reduced to practice in at least three applications: (1) with standard mixtures containing 5 RNA oligos of varying lengths and abundances to determine the extent of length-dependent biases and confirm the linearity of the sequencing method for RNAs between 25 and 80 nucleotides, (2) with an equimolar mixture of microRNA standards to determine the extent of sequence-dependent biases on quantification, and (3) with RNA from Mycobacterium bovis BCG bacteria to demonstrate the RNA landscape and how that landscape changes when the cells are subjected to the stress of nutrient deprivation.
(29) The present disclosure provides a widely applicable methodology to quantify expressed intracellular RNA species, including tRNA isoacceptors and tRNA fragments, using next generation sequencing (NGS). This novel method for NGS library preparation can efficiently capture small RNA sequences without bias for length or sequence and quantitatively convert these sequences into cDNA by reverse transcription. The resulting cDNA is then PCR amplified and sequenced using paired-end high-throughput sequencing. Aligned output reads can be used to determine the absolute abundance of expressed small RNAs including both full-length and fragment tRNA isoacceptors.
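By way of a non-limiting illustration, the conversion of aligned read counts into absolute abundances may be sketched as follows, assuming that read counts are strictly proportional to copy number as described above. The function names, the standard amounts, and the read counts are all hypothetical values invented for the sketch and are not measured data; the through-origin least-squares fit is one possible choice, not a fitting procedure specified in this disclosure.

```python
def fit_slope_through_origin(known_copies, read_counts):
    """Least-squares slope k for the model reads = k * copies (no intercept)."""
    numerator = sum(x * y for x, y in zip(known_copies, read_counts))
    denominator = sum(x * x for x in known_copies)
    return numerator / denominator


def reads_to_copies(read_counts, slope):
    """Convert per-species read counts to estimated copy numbers."""
    return {name: reads / slope for name, reads in read_counts.items()}


# Invented standards, exactly linear by construction (not measured data).
standard_copies = [1e3, 1e4, 1e5, 1e6, 1e7]
standard_reads = [2, 20, 200, 2000, 20000]
slope = fit_slope_through_origin(standard_copies, standard_reads)

# Invented read counts for two hypothetical RNA species in the same sample.
sample_reads = {"tRNA_A": 500, "tRNA_B": 5}
estimated_copies = reads_to_copies(sample_reads, slope)
```

Because the model has no intercept, the ratio of estimated copies between two species in the same sample equals the ratio of their read counts, which is the within-sample comparison that the method is intended to support.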
(30) Existing methods for quantifying RNAs by sequencing have mainly focused on mRNA (transcriptional profiling) using either total RNA or enriched mRNA as starting materials. mRNA is considerably easier to sequence by NGS methods because relative to tRNA, it is less structured and it contains fewer RNA modifications, both of which hamper cDNA synthesis from tRNA. For example, existing methods for mRNA sequencing involve simultaneous ligation of 5′ and 3′ adapters followed by reverse transcription and PCR amplification. In contrast, in the present disclosure, only the 3′ linker is ligated to the RNA starting material prior to reverse transcription. This is done to reduce loss of templates that form truncated cDNA due to the presence of polymerase-blocking modifications or secondary structures (
(31) An example of an alternate approach for tRNA sequencing has been reported. It incorporates an enzymatic tRNA demethylation step prior to cDNA generation to minimize the effect of polymerase-blocking modifications, uses a template-switching reverse transcriptase in order to obviate the need for linker ligation, and employs a cDNA circularization strategy prior to amplification.⁸ This reported method, however, does not take as rigorous an approach to quantitation as the methods disclosed herein. In both the reverse transcription and circularization steps, the reported method does not address known sequence-dependent biases in the activities and efficiencies of the enzymes.⁹ For example, the circularization efficiency of CircLigaseII, the commercially available ssDNA ligase used in the prior method, is known to vary depending on the identity of the two terminal nucleotides.⁹ In the context of measuring the composition of the expressed tRNA pool, such sequence-dependent biases will skew the capture of isoacceptors carrying “preferred” sequences and cannot be relied upon to provide accurate quantitation of the full tRNA or small RNA landscape.
(32) In the present disclosure, by contrast, every step and parameter of the method, including adapter sequences, stoichiometries, and enzymatic reaction conditions, has been designed, tested, and optimized to either quantitatively capture all sequences or be free from sequence-dependent biases. To start, the 3′-Linker 1 is designed with two randomized nucleotides at the 5′-end to minimize ligation differences between varied sample sequences. Indeed, using this approach, Bioanalyzer analysis demonstrates that >91% of starting sequences are ligated to a 3′-end Linker 1. Reverse transcription proceeds after the first ligation step and demethylation. To avoid sequence biases reported with the template-switching reverse transcriptase enzyme used by others,⁸ a high-processivity, high-accuracy, thermostable, commercially-available reverse transcriptase was selected, although it was also found that a mixture of enzymes with slightly differing fidelities and susceptibilities to modifications could also be used in this step. The 5′-Linker 2 is designed to have a hairpin with a six-nucleotide NNNNNN overhang that is complementary with the 3′-end of the cDNA. This structure brings the 5′-end of the Linker in close proximity to the 3′-end of the target cDNA to maximize ligation efficiency. With these optimizations, there is nearly complete conversion of the cDNA to cDNA+5′ adapter, as assessed by Bioanalyzer.
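By way of a non-limiting illustration, the effect of the randomized positions in the linkers may be sketched by enumerating the sequence variants they produce. The sketch assumes that each randomized position (N) is an equal mixture of the four bases. The linker body `LINKER1_BODY` and the cDNA end `cdna_3prime_end` are hypothetical placeholder sequences, not sequences from this disclosure.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def expand_degenerate(template):
    """Expand every 'N' in the template into all four bases."""
    pools = [BASES if base == "N" else base for base in template]
    return ["".join(combo) for combo in product(*pools)]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Hypothetical placeholder body; two randomized nucleotides at the 5'-end.
LINKER1_BODY = "ACGTACGTACGT"
linker1_pool = expand_degenerate("NN" + LINKER1_BODY)  # 4**2 = 16 variants

# Six-nucleotide randomized overhang of Linker 2: 4**6 = 4096 variants.
overhang_pool = set(expand_degenerate("NNNNNN"))

# For any six-nucleotide cDNA 3'-end (hypothetical example), the pool
# contains the overhang that base-pairs with it.
cdna_3prime_end = "GATTAC"
matching_overhang = reverse_complement(cdna_3prime_end)
```

Since the overhang pool contains every possible six-nucleotide sequence, a complementary overhang exists for any cDNA 3′-end, which is consistent with the stated aim of avoiding sequence-dependent ligation differences.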
(33) An unbiased RNA sequencing method has many commercial applications in basic and applied research, biomedical diagnostics, drug development, and any other biological or biomedical application requiring knowledge of RNA levels. All commercial applications would derive from a basic kit containing DNA oligos, buffers, and enzymes to allow high-throughput quantification of RNA and RNA fragments in any size range.
(34) Examples of applications of the RNA sequencing method of this disclosure and variations thereof include:
RNA quantification: The basic methodology is applicable to generating RNA profiles in samples of cultured cells and tissues, including bodily fluids and excretions (urine, saliva, blood, feces). These RNA profiles can be used in a number of research applications related to diagnosis of infectious disease, quantification of gene expression, quantification of microRNAs or non-coding regulatory RNAs, measurement of RNA stability, analysis of RNA processing, and any other application requiring quantitative analysis of RNA.
Identification and quantification of RNA modifications: There is increasing evidence that RNA modifications have important functions in RNA processing, RNA stability, and the regulation of translation. This methodology would find use in identifying and quantifying RNA modification sites as reverse transcriptase “fall-off” sites when the AlkB treatment (e.g., mutant AlkB treatment) is omitted or as bypass mutagenesis sites when a modification induces a misincorporation. There is great value in defining maps of the locations of modified nucleotides in different RNA molecules, all in the same sample, as well as changes in the maps as a function of cell state and stress.
RNA fragmentation analysis: Recent studies have revealed novel mechanisms of control of gene expression and RNA stability by small RNA fragments generated from larger functional RNAs such as tRNA, mRNA and rRNAs. The present methodology is able to quantify all RNA fragments in a sample, to quantify the fragments in relation to the parent RNA species, and to define the location of the fragmentation reaction. This would provide invaluable insights into the biological and medical impact of small regulatory RNA species.
Study of the biogenesis of functional RNA and RNA decay: Primary RNA transcripts undergo various processing events including splicing, joining of different units, trimming, and circularization prior to maturation. The present methodology is able to quantify intermediary RNA fragments in a sample, to quantify these intermediates in relation to the mature RNA species, and to define the localization of intermediates. This would provide invaluable insights into the biological and medical impact of small regulatory RNA species.
RNA biomarker discovery: This methodology can generate the landscape of all RNA molecules in a cell, which can be compared against a known landscape from a specific disease or condition, thus informing a biomarker of this disease or condition. This biomarker can be used to predict prognosis and/or to diagnose the disease or condition.
Mapping DNA Modifications and DNA Damage at Single-Nucleotide Resolution Across Genomes
(35) This disclosure further provides widely applicable methods for quantitative profiling, localizing or mapping of nucleic acid (e.g., DNA) modifications, damage or structures at single-nucleotide resolution in any type of nucleic acid (e.g., DNA) and across entire genomes from any organism. These maps can be related to other DNA structures and genome architecture, and also provide a means to identify the biological source of the DNA. This process can be applied to any DNA modification or DNA structure that exists as a single-strand break or that can be converted into a single-strand break, including, among other examples, (1) DNA nicks arising during natural DNA metabolism, such as damage, repair, modification, replication, transcription and other processes, and (2) intentional conversion of these and other DNA and chromatin features into DNA nicks by chemical, mechanical or enzymatic means. The resulting profile or map of the DNA nicks across a genome provides information about the genomic location of the feature, the frequency of a feature at any specific site in the genome, and changes in the locations and quantities of DNA features as a function of cell stress, cell type, disease state or any other situation. The method will find wide application in many fields of biological and biomedical research and development in academia and in the clinic.
(36) The ability to localize DNA damage and DNA repair processes throughout an entire genome, and to define regions that are hotspots for damage or that show different rates of repair, has many applications, since such regions may be strongly associated with the frequency of mutations that cause diseases such as cancer, diabetes and many others. DNA sequencing is being used to localize sites of DNA damage and repair in the genome. For example, following fragmentation of genomic DNA, antibodies against specific kinds of DNA damage can be used to affinity purify DNA fragments containing the damage, with the fragments subjected to standard DNA sequencing to crudely localize the damage in the genome. However, this is imprecise at best.
(37) DNA sequencing technology has also been used to define the locations of the m5C modifications—the methylome—in specific genes in a genome and the patterns of modification that correlate with gene expression patterns in different cells and tissues. One method for mapping m5C in genomes involves the selective conversion of C but not m5C to uracil (U) by reaction with bisulfite. Subsequent sequencing then reveals the location of all m5Cs as a normal C, while U's arising from unmodified C's are sequenced as thymidine (T).
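By way of a non-limiting illustration, the logic of bisulfite-based m5C mapping described above may be sketched as follows. The sketch assumes complete conversion of unmodified C, a single strand, and error-free sequencing; the function names, the reference sequence, and the methylated positions are hypothetical.

```python
def bisulfite_read(sequence, methylated_positions):
    """Simulate the read obtained after bisulfite treatment.

    Unmodified C is converted to U and sequenced as T; m5C is read as C.
    Positions are 0-based indices into the sequence.
    """
    return "".join(
        "T" if base == "C" and index not in methylated_positions else base
        for index, base in enumerate(sequence)
    )


def call_m5c(reference, read):
    """Return reference positions where a C is still read as C (inferred m5C)."""
    return [
        index
        for index, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGTCCGA"   # hypothetical reference sequence
methylated = {1, 5}      # hypothetical m5C positions
read = bisulfite_read(reference, methylated)
called = call_m5c(reference, read)
```

The sketch also makes the limitation discussed in this section concrete: the readout depends on a chemistry specific to cytosine, so the same comparison cannot reveal other modifications or damage products.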
(38) The problem with the current use of DNA sequencing technologies for mapping DNA features such as damage and modifications across a genome, however, is that each method is uniquely designed for only one feature. For example, bisulfite sequencing can only be applied to methylome mapping. There are no universal methods for mapping different types of DNA modifications or damage products.
(39) This disclosure provides such universal methods.
(40) The Nick-seq™ sequencing method has been developed for single-nucleotide-resolution, genome-wide localization of any kind of DNA feature that can be converted to a DNA strand-break (i.e., nick). The method is illustrated in
(41) Specifically, the novel process may involve the following steps for DNA with modifications and features that need to be converted into nicks (for DNA already containing nicks of interest, proceed to step 3): (1) treat the purified DNA with DNA polymerase I and ddNTP, such as dideoxycytidine, to block existing DNA nicks; (2) treat the DNA samples (e.g., enzymatically, chemically, mechanically) to create DNA nicks at features of interest (e.g., DNA modifications, DNA damage); (3) label the new DNA nicks by nick translation (i.e., DNA strand displacement with α-thio-dNTPs) using DNA polymerase I to create phosphorothioate-containing DNA fragments (PT-DNA) starting at nick sites and extending several hundred nucleotides; (4) remove the original, unlabeled DNA by digestion with nucleases, such as nuclease P1, a combination of RecJ and Exonuclease III, or other exo- or endo-nuclease(s), which do not cleave PT-containing DNA; (5) purify the PT-DNA fragments for example by ethanol precipitation or column chromatography; (6) amplify and optionally sequence the DNA for example using standard deep-sequencing techniques; and optionally (7) map the deep-sequencing reads onto the original DNA or genome by standard informatics methods. The sites of the original DNA nicks will be evident as the 5′-most nucleotides of the sequenced PT-DNA fragments.
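By way of a non-limiting illustration, the final mapping step may be sketched as follows: each aligned PT-DNA fragment is reduced to its 5′-most reference position, and fragments sharing a position are counted. The sketch assumes 0-based, half-open alignment coordinates and assumes that the reported strand is the strand of the labeled fragment; a real library preparation may report the complementary strand or paired-end reads, which would require adjusting `nick_site`. All names and coordinates are hypothetical.

```python
from collections import Counter


def nick_site(start, end, strand):
    """Return the reference position of the 5'-most nucleotide of a fragment.

    The alignment covers [start, end) on the reference. On the '+' strand
    the 5' end is the leftmost base; on the '-' strand it is the rightmost.
    """
    return start if strand == "+" else end - 1


def nick_pileup(alignments):
    """Count fragments whose 5' ends coincide, keyed by (strand, position)."""
    return Counter(
        (strand, nick_site(start, end, strand))
        for start, end, strand in alignments
    )


# Hypothetical alignments: (start, end, strand).
alignments = [
    (100, 350, "+"),
    (100, 420, "+"),
    (500, 800, "-"),
    (650, 800, "-"),
    (900, 1100, "+"),
]
pileup = nick_pileup(alignments)
```

Fragments of different lengths that begin at the same nick collapse onto one position, so the count at each position gives a simple measure of how often that site was labeled.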
(42) As a method for mapping DNA features at ultra-high (i.e., single-nucleotide) resolution, this Nick-seq™ sequencing method relates to the ability to extend (or translate) DNA nicks with DNA polymerase I. Nick translation has been used to label sites containing DNA nicks. The methods provided herein differ from classical nick translation, at least in part, by transforming the DNA sites of interest into nuclease-resistant DNA fragments (i.e., PT-DNA). This transformation allows deep-sequencing analysis of the DNA fragments and enhances the signal-to-noise ratio of the sequencing by destroying the bulk of the unlabeled genomic DNA. This method can be applied by researchers in the form of a kit in many fields of academic, regulatory or industrial science using any type of DNA or organism containing DNA, such as viruses, bacteria, parasites, yeast, mammalian cells, and human cells. Further, the method can be applied to any kind of DNA modification, DNA damage, enzymatic cleavage site in DNA, or any other DNA-related feature that can be converted to a DNA nick. The methodology provides unprecedented access to information about the genomic locations of DNA features, as well as a means to identify the source of the nick-containing DNA, such as organisms in complex environments and the microbiome. This has not been possible heretofore with existing methods.
(43) The specificity and sensitivity of the Nick-seq™ method has been demonstrated using synthesized DNA oligos (
(44) The specificity and sensitivity of the Nick-seq™ method has also been demonstrated in a bacterial genome (
(45) The specificity and sensitivity of the Nick-seq™ method for mapping PT modification in a bacterial genome (
(46) The present disclosure provides a widely applicable method to identify the genomic locations of DNA modifications, damage and structures that can be converted to strand-breaks. The methodology labels these sites with nuclease-resistant modifications that allow destruction of the unlabeled genome with a nuclease and subsequent deep-sequencing of the protected fragments, in which the 5′-end maps the location of the nick of interest. This process allows one to map any kind of DNA feature that can be converted to a nick, in DNA of virtually any size, from oligonucleotides to genomes, and in DNA from any source. Existing methods to map DNA damage and modifications across genomes, such as bisulfite sequencing to locate methylation modifications, are limited to a specific modification and cannot be applied to other modifications. Unlike the present method, which allows single-nucleotide resolution and high sensitivity, existing methods for mapping strand-breaks, such as 3′-terminal labeling with biotin or fluorescent molecules by terminal transferase (TdT) or DNA polymerase, are highly insensitive, and most do not provide information about the precise location in the genome, with resolution limited to very crude estimates of the position in large DNA molecules or regions of the genome. Those high-resolution genome mapping methods that do provide single-nucleotide resolution of DNA features suffer from reliance on computational predictions or specialized immunoprecipitation steps that limit the methods to specific types of DNA features.
For example, a computational approach has been developed to map DNA structures across genomes by comparing DNA sequence-based structural predictions to a computed likelihood of DNA cleavage by hydroxyl radicals (ORChID: •OH Radical Cleavage Intensity Database) based on an empirical collection of DNA cleavage patterns from small DNA fragments..sup.10 While this method can help our understanding of how DNA sequence determines the locations of protein-binding sites and other biologically important structural features of DNA, this approach identifies predicted structures and not true structures. In contrast, the Nick-seq™ method would allow one to generate genome-wide maps of hydroxyl radical cleavage patterns that would reveal the true structures present in any genome.
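The statement that the 5′-end of each protected fragment maps the location of the nick can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the analysis pipeline of this disclosure: it assumes that reads have already been aligned and reduced to hypothetical (chromosome, start, end, strand) tuples with 0-based, half-open coordinates, and the function names and the `min_reads` cutoff are invented for the example.

```python
from collections import Counter

def five_prime_position(start, end, strand):
    """Return the 0-based coordinate of a read's 5' terminal base.

    Assumes 0-based, half-open [start, end) alignment coordinates,
    a convention chosen for this sketch and not stated in the source.
    """
    return start if strand == "+" else end - 1

def nick_site_counts(alignments):
    """Count reads whose 5' end falls on each (chrom, position, strand)."""
    counts = Counter()
    for chrom, start, end, strand in alignments:
        site = (chrom, five_prime_position(start, end, strand), strand)
        counts[site] += 1
    return counts

def candidate_nick_sites(alignments, min_reads=2):
    """Keep sites supported by at least min_reads 5' ends (hypothetical cutoff)."""
    counts = nick_site_counts(alignments)
    return sorted(site for site, n in counts.items() if n >= min_reads)

# Toy alignments, invented for illustration only.
toy_alignments = [
    ("chr1", 100, 150, "+"),
    ("chr1", 100, 140, "+"),
    ("chr1", 60, 111, "-"),
    ("chr1", 70, 111, "-"),
    ("chr1", 300, 350, "+"),
]
sites = candidate_nick_sites(toy_alignments)
```

In this toy input, two forward-strand reads share a 5′-end at position 100 and two reverse-strand reads share a 5′-end at position 110, so both sites pass the cutoff, whereas the single read starting at position 300 does not.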
(47) Another method to map sites of DNA repair across a genome has been recently developed. This method is intended to map sites of nucleotide excision repair (NER) in the human genome, as a tool to better understand mechanisms governing NER and to correlate defects in NER with mutations that cause disease..sup.11 However, it requires two immunoprecipitation steps to enrich for the repair sites prior to deep-sequencing, so the method is applicable only to NER-related studies. The Nick-seq™ method would be immediately and directly applicable to the analysis of NER sites, without the need for immunoprecipitation steps.
(48) Another example of a method to map DNA damage across a genome involves single-strand DNA (ssDNA)-associated protein immunoprecipitation followed by sequencing (SPI-seq)..sup.12 Here, sites of single-stranded DNA are enriched by immunoprecipitation of a DNA single-strand-binding protein. The sites of ssDNA are then defined by deep-sequencing. Again, the Nick-seq™ method can be applied to map regions of single-strand DNA in a genome without the need for the immunoprecipitation step that adds sequence noise as well as time and expense to the method.
(49) Thus, certain methods of this disclosure may be carried out in the absence of immunoprecipitation or other enrichment steps, in contrast to various existing methods.
(50) For DNA modifications that cannot be converted to strand-breaks, there are existing methods that allow, in some cases, mapping of the modifications across genomes, such as single-molecule, real-time (SMRT) sequencing technology..sup.13 SMRT sequencing has been applied for direct mapping of phosphorothioate DNA modifications across bacterial genomes..sup.13 However, SMRT sequencing requires specific instrumentation as well as highly specialized software programming skills to optimize the sequencing signal for a specific modification, with many modifications not revealed by SMRT. Phosphorothioate modifications represent an example of a modification that can be converted to a strand break site-specifically,.sup.13 with the single-strand breaks amenable to Nick-seq™ sequencing.
(51) In summary, the nick translation sequencing methodology provides a universal method for mapping single-strand breaks across genomes, with significant advantages in cost, time, resolution and sensitivity over existing methods.
(52) The commercial applications of the Nick-seq™ sequencing methodology are many. All commercial applications would derive from a basic kit containing α-thio-dNTPs, a ddNTP such as ddCTP, buffers, and enzymes to allow blocking of existing nicks (if needed) and nick translation of nicks of interest. The PT-labeled DNA product of nick translation would then be subjected to deep-sequencing using any platform available to the user. For specialized applications that require processing of DNA features into strand-breaks, the kit would be accompanied by accessory kits containing specific enzymes and buffers, and/or detailed instructions for converting the feature into a nick-translatable strand break. For example, phosphorothioate modifications can be converted to single-strand breaks by treatment with iodine, with the resulting nicks readily mapped by Nick-seq™ sequencing..sup.13 As another example, an accessory kit with FAPY glycosylase (widely commercially available) would allow mapping of 8-oxoguanine and other purine DNA lesions across a genome by nick translation.
(53) Examples of applications of the Nick-seq™ kit include:
m5C methylation: an accessory kit containing TET or TDG enzyme that specifically converts m5C to an abasic site along with the AP endonuclease that converts abasic sites to nick translation-amenable strand breaks.
Other DNA modifications: accessory kits or instructions for chemical or enzymatic conversion of specific DNA modifications into single-strand breaks.
DNA lesions: accessory kits containing specific enzymes that convert DNA damage products to single-strand breaks, as noted earlier for 8-oxoguanine.
DNA repair mechanisms: as noted earlier, no accessory enzymes would be needed in many cases and the single-strand gaps and nicks resulting from the repair process could be mapped directly using the Nick-seq™ methodology. Accessory kits or instructions for cleaning up repair-induced single-strand gaps or nicks are described below.
DNA single-strand breaks: in most cases, no accessory kit is needed. However, some types of DNA-damaging agents cause single-strand breaks that contain sugar residues that could block nick translation. Here an accessory kit could be provided containing enzymes that remove the sugar residues (widely commercially available), or instructions for using these enzymes to “clean up” the strand breaks.
DNA structures: as described earlier, many DNA secondary structures can be converted to single-strand breaks by chemicals such as hydroxyl radicals or ionizing radiation. These nicks can then be directly nick translated using the basic kit.
Chromatin structures: proteins binding to DNA in the nucleus of eukaryotic cells or in bacteria protect the DNA from nicking by many types of DNA-nicking enzymes, with the resulting nick sites in unprotected DNA providing a substrate for the Nick-seq™ kit. The genomic map of nick sites would reveal stretches of DNA lacking nicks, which represent the binding sites of proteins. Immunoprecipitation of specific proteins bound to DNA would reveal which proteins are associated with the nick-free “footprints” revealed by nick translation. However, the immunoprecipitation results would not provide the level of resolution needed to define the precise binding site, which is the power of the Nick-seq™ method.
Identifying an organism whose genome contains a specific modification: studies of DNA modifications in individual microbes in complex mixtures of microorganisms, such as the gut microbiome with thousands of bacteria and viruses, are facilitated by the Nick-seq™ methodology. For example, a large portion of the bacteria in the human gut microbiome contain phosphorothioate modifications of their genomes. However, there is no technology available to identify the organisms actually containing the modification. The Nick-seq™ sequencing methodology allows labeling of single-strand breaks resulting from iodine treatment of the modified DNA, with subsequent sequences of the labeled DNA useful for defining the genus, species and strain of the bacterium by overlaying the sequences on the organism's genome.
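The chromatin-structure application above, in which stretches of DNA lacking nicks are read as protein “footprints,” can be sketched as a simple gap search over mapped nick positions. The following Python fragment is illustrative only: the function name, the 0-based half-open coordinates, and the `min_length` threshold are assumptions made for the example and are not taken from this disclosure.

```python
def nick_free_footprints(nick_positions, region_start, region_end, min_length=10):
    """Return [start, end) intervals inside a region that contain no nick.

    nick_positions: iterable of 0-based nick coordinates (hypothetical input).
    min_length: shortest nick-free stretch reported (hypothetical threshold).
    """
    footprints = []
    next_free = region_start  # first base not yet assigned to a nick or gap
    for pos in sorted(set(nick_positions)):
        if pos < region_start or pos >= region_end:
            continue  # ignore nicks outside the region of interest
        if pos - next_free >= min_length:
            footprints.append((next_free, pos))
        next_free = pos + 1
    # Handle the stretch between the last nick and the end of the region.
    if region_end - next_free >= min_length:
        footprints.append((next_free, region_end))
    return footprints

# Toy nick positions, invented for illustration only.
toy_nicks = [5, 8, 12, 40, 43, 90]
footprints = nick_free_footprints(toy_nicks, 0, 100, min_length=20)
```

With these toy positions, the nick-free stretches of at least 20 bases are the intervals from 13 to 40 and from 44 to 90. In practice the threshold would presumably need to be chosen relative to the expected nick density, which this sketch does not attempt to model.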
(54) The kits are applicable to a wide variety of genomic studies in diverse areas of biology, biomedical research and biotechnology, including genetics, genomics, molecular and cell biology, microbiology, biotechnology, medicine, and clinical research, toxicology, pharmacology and other areas. Research and development in nearly every type of human disease involves genome-wide analyses of DNA damage and repair, modifications and chromatin structures and would benefit from the methodologies provided herein.
EXAMPLES
Example 1. Analysis of Standard Mixtures Containing Oligos of Varying Lengths and Abundances
(55) In the first example, five mixtures consisting of five RNA oligos between 25 and 80 nucleotides in length spiked at different abundances were created. Each mixture was then split into three technical replicates. All the samples were further spiked with a defined quantity of an 80-mer synthetic RNA oligonucleotide as an internal standard.
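One way a spiked-in internal standard of defined quantity could be used to convert read counts into estimated amounts is sketched below in Python. This is a hedged illustration, not the calculation used in this disclosure: it assumes that read count scales linearly with input amount for every oligo, and the function name, identifiers and numerical values are hypothetical.

```python
def estimate_amounts(read_counts, standard_id, standard_pmol):
    """Scale read counts to amounts using a single spiked-in standard.

    Assumes reads are proportional to input for all species, which is
    an assumption of this sketch rather than a result from the source.
    """
    standard_reads = read_counts[standard_id]
    if standard_reads <= 0:
        raise ValueError("internal standard has no reads; cannot normalize")
    pmol_per_read = standard_pmol / standard_reads
    return {
        name: count * pmol_per_read
        for name, count in read_counts.items()
        if name != standard_id
    }

# Toy read counts, invented for illustration only.
toy_counts = {"oligo_A": 500, "oligo_B": 2000, "internal_standard": 1000}
amounts = estimate_amounts(toy_counts, "internal_standard", standard_pmol=0.5)
```

Here an oligo with half as many reads as the standard is assigned half the standard's amount (0.25 pmol), and one with twice as many reads is assigned twice the amount (1.0 pmol).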
(56) The RNA samples were then subjected to the series of reactions shown in
(57) Step #1, Dephosphorylate the RNA
(58) A mixture of tRNA (40 ng; ˜2 pmol), 50-mer RNA internal standard, NEB T4 RNA ligase buffer (0.5 μL; New England Biolabs), shrimp alkaline phosphatase (1 μL, New England Biolabs) and water in 5 μL was incubated at 37° C. for 30 minutes and the reaction stopped by heat inactivation at 65° C. for 5 minutes. RNA denaturation was maintained by holding the samples on ice for the next step.
(59) Step #2, Ligate Linker 1 to the RNA
(60) To the 10 μL dephosphorylation mixture from Step #1 was added 10 μL of a master mix of 1 μL of Linker 1 (100 pmol/μL; sequence in
(61) Step #3, Reduce the Level of RNA Modifications
(62) The 20 μL of linker-ligated tRNA from Step #2 was mixed with 50 μL of freshly prepared, 2×-concentrated optimized AlkB buffer mixture (150 μM 2-ketoglutarate, 4 mM L-ascorbic acid, 150 μM (NH4)2Fe(SO4)2·6H2O, 100 μg/mL BSA, 100 mM HEPES, pH 8; slight purple color that turns brown over time), 2 μL of AlkB enzyme (ArrayStar), 1 μL RNase Inhibitor (NEB) and 27 μL of water. The reaction was incubated at ambient temperature for 2 h, followed by denaturation of AlkB by heating at 65° C. for 5 minutes. AlkB protein was removed by extraction with 100 μL of phenol:chloroform:isoamyl alcohol 25:24:1, pH 5.2. The aqueous layer (˜90 μL) was washed once with 100 μL of chloroform. The RNA in the aqueous layer (˜75 μL) was purified using the Zymo kit noted earlier, with the RNA eluted into 16-20 μL of water.
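The volumes listed for Step #3 sum to 100 μL, so the 2×-concentrated AlkB buffer is diluted two-fold in the final reaction. The short Python calculation below works through that arithmetic using only the volumes and concentrations given above; the dictionary keys are labels chosen for the example.

```python
# Component volumes for the Step #3 reaction, in microliters (from the text).
volumes_ul = {
    "linker-ligated tRNA": 20,
    "2x AlkB buffer": 50,
    "AlkB enzyme": 2,
    "RNase inhibitor": 1,
    "water": 27,
}

# Concentrations in the 2x buffer stock (units are part of each label).
buffer_2x = {
    "2-ketoglutarate (uM)": 150,
    "L-ascorbic acid (mM)": 4,
    "(NH4)2Fe(SO4)2 (uM)": 150,
    "BSA (ug/mL)": 100,
    "HEPES (mM)": 100,
}

total_ul = sum(volumes_ul.values())
dilution = volumes_ul["2x AlkB buffer"] / total_ul  # fraction of final volume

# Final concentration of each buffer component in the assembled reaction.
final_conc = {name: conc * dilution for name, conc in buffer_2x.items()}

for name, conc in final_conc.items():
    print(f"{name}: {conc:g}")
```

The final reaction therefore contains 75 μM 2-ketoglutarate, 2 mM L-ascorbic acid, 75 μM ferrous ammonium sulfate, 50 μg/mL BSA and 50 mM HEPES, ignoring any contribution from the carried-over ligation buffer.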
(63) Step #4, Remove Excess Linker 1
(64) The purified RNA from Step #3 was first treated with deadenylase to remove the 5′-adenylpyrophosphoryl group remaining on Linker 1 as an intermediate from the ligase reaction. Removal of the 5′-adenylation is necessary for RecJ-mediated degradation of Linker 1. The deadenylation reaction consisted of the 16-20 μL of linker-ligated RNA product from Step #3, 2 μL of NEB Buffer #2 and 2 μL of 5′-deadenylase (NEB). The reaction was allowed to proceed at 30° C. for 1 hour. Linker 1 in this mixture was then hydrolyzed by adding 2 μL of RecJ enzyme (30 U/μL; NEB), incubating at 37° C. for 30 minutes, adding another 2 μL of RecJ and incubating again at 37° C. for 30 minutes. RNA was purified using a DyeEx Kit (Qiagen) according to the manufacturer's instructions, with the final eluted RNA reduced to 24 μL under vacuum.
(65) Step #5, Reverse Transcription of the Linker-Ligated RNA
(66) The purified linker-ligated RNA from Step #4 was next subjected to reverse transcription to create a cDNA copy. The key feature here is the use of an oligodeoxynucleotide primer (RT primer, reverse complementary to Linker 1) possessing 6 phosphorothioate linkages at the 5′-end (
(67) Step #6, Remove the RNA by Alkaline Hydrolysis and Purify cDNA
(68) The cDNA was purified by first hydrolyzing the RNA by adding 1 μL of 5 M NaOH to 25 μL of the mixture from Step #5, with heating to 90-95° C. for 3 minutes. After cooling to ambient temperature, the pH was adjusted to 7 by adding 1 μL of 5 M HCl. The cDNA was then purified using the Zymo kit noted earlier, with the eluted cDNA evaporated to dryness under vacuum. Stopping point: the cDNA can be stored at −20° C. for at least one week.
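The hydrolysis and neutralization volumes in Step #6 can be checked with a brief calculation. The Python sketch below uses only the volumes and stock concentrations stated above and treats the mixture as unbuffered, which is a simplifying assumption of the example; the function and variable names are hypothetical.

```python
def final_molar(stock_molar, added_ul, sample_ul):
    """Concentration of a stock after adding added_ul to sample_ul of sample."""
    return stock_molar * added_ul / (sample_ul + added_ul)

# 1 uL of 5 M NaOH added to 25 uL of the Step #5 mixture.
naoh_final_molar = final_molar(5.0, 1.0, 25.0)

# Molarity (mol/L) times volume (uL) gives micromoles.
naoh_umol = 5.0 * 1.0
hcl_umol = 5.0 * 1.0
net_base_umol = naoh_umol - hcl_umol  # zero when acid and base are equimolar

print(f"NaOH during hydrolysis: {naoh_final_molar:.3f} M")
print(f"Net base after adding HCl: {net_base_umol:g} umol")
```

The RNA is thus hydrolyzed in roughly 0.19 M NaOH, and the equal volume of 5 M HCl supplies the same 5 μmol of acid, which is consistent with the stated return to approximately neutral pH.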
(69) Step #7, Ligate Linker 2 to the cDNA
(70) The design of Linker 2 is shown in
(71) Step #8, Remove Excess Linker 2
(72) The remainder of Linker 2 was removed by deadenylation and RecJ treatment as noted earlier, to reduce ligation artifacts during Step #9. The reaction starts with deadenylation of 5′-adenylated Linker 2 (16 μL of ligation mixture from Step #7, 2 μL of NEB Buffer 2, and 2 μL of 5′-deadenylase; add separately, no master mix). Following 1 hour of incubation at 30° C., 2 μL of RecJ (30 U/μL) was added and the sample incubated for 30 minutes at 37° C.; another 2 μL of RecJ was then added, with an additional 30 minutes of incubation at 37° C. If Clontech polymerase is used in Step #9, then the sample (24 μL) can proceed directly to the PCR reaction. If a Q5 PCR kit is used in Step #9, then the buffer conditions are changed by DyeEx kit purification of the ligated DNA, with the eluted DNA evaporated to dryness and resuspended in 17 μL of water.
(73) Step #9, PCR Attachment of Standard Illumina Primers
(74) The final step involves attachment of standard Illumina PCR primers as shown in
(75) Following this series of reactions in Steps #1-#9, the sample was sequenced on the Illumina platform and the data mined using a standard alignment workflow. The read counts for each RNA oligo standard and for the 80-mer internal standard were quantified. As shown in
Example 2. Analysis of Equimolar microRNA Standards
(76) The second example involves using a commercially available mixture of synthetically derived microRNA standards to determine the extent of sequence-dependent biases on quantification. The Miltenyi miRXplore universal reference contains 963 microRNA sequences that range from 16 to 29 nucleotides in length. The oligos are mixed together in equimolar amounts. This sample represents a highly diverse pool of RNA sequences, and the abundance of each sequence relative to any other should be 1. After applying the RNA sequencing method (Steps #1-#9), the ratio of the normalized read count to the expected read count was calculated for each standard.
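The ratio of observed to expected read count for an equimolar pool can be illustrated as follows. This Python sketch assumes that the expected count for each sequence is simply the total number of mapped reads divided by the number of sequences in the pool; the exact normalization used in this example of the disclosure is not specified here, and the function name and toy counts are hypothetical.

```python
import math

def bias_ratios(read_counts):
    """Observed/expected read-count ratio for each member of an equimolar pool.

    read_counts must include every sequence in the pool, with a count of
    zero for sequences that were not detected.
    """
    total_reads = sum(read_counts.values())
    expected = total_reads / len(read_counts)  # equal share per sequence
    return {name: count / expected for name, count in read_counts.items()}

# Toy counts for a three-member equimolar pool, invented for illustration.
toy_counts = {"miR-a": 100, "miR-b": 200, "miR-c": 300}
ratios = bias_ratios(toy_counts)

# Log2 ratios are a common way to display over- and under-representation.
log2_ratios = {name: math.log2(r) for name, r in ratios.items() if r > 0}
```

A ratio of 1 indicates that a sequence is recovered in proportion to its input, while ratios below or above 1 indicate under- or over-representation, respectively. In this toy pool the ratios are 0.5, 1.0 and 1.5.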
Example 3. Analysis of Starvation-Induced Changes in Small RNA Species in M. bovis BCG
(77) The third example of a reduction to practice for the RNA sequencing method of this disclosure involves analysis of the behavior of all small RNA species (<200 nt) in M. bovis BCG, a surrogate for the tuberculosis-causing M. tuberculosis, subjected to the stress of nutrient deprivation. Samples of small RNA species were isolated from BCG during growth in nutrient-rich medium (S0), on days 4, 10 and 20 after growth in nutrient-free phosphate-buffered saline (S4-S20), and on day 6 after returning the bacteria to nutrient-rich medium (resuscitation, R6). The RNA was processed for the RNA sequencing method in Steps #1-#9 described earlier and the resulting linker-ligated cDNA subjected to Illumina sequencing. As shown in
REFERENCES
(78) 1. Gu C, Begley T J, Dedon P C. tRNA modifications regulate translation during cellular stress. FEBS Lett. 2014; 588(23):4287-96. PMCID: PMC4403629.
2. Phizicky E M, Hopper A K. tRNA biology charges to the front. Genes Dev. 2010; 24(17):1832-60. PMCID: PMC2932967.
3. Zhang Z, Lee J E, Riemondy K, Anderson E M, Yi R. High-efficiency RNA cloning enables accurate quantification of miRNA expression by deep sequencing. Genome Biol. 2013; 14(10):R109. PMCID: PMC3983620.
4. Hafner M, Renwick N, Brown M, Mihailovic A, Holoch D, Lin C, Pena J T, Nusbaum J D, Morozov P, Ludwig J, Ojo T, Luo S, Schroth G, Tuschl T. RNA-ligase-dependent biases in miRNA representation in deep-sequenced small RNA cDNA libraries. RNA. 2011; 17(9):1697-712. PMCID: PMC3162335.
5. Linsen S E, de Wit E, Janssens G, Heater S, Chapman L, Parkin R K, Fritz B, Wyman S K, de Bruijn E, Voest E E, Kuersten S, Tewari M, Cuppen E. Limitations and possibilities of small RNA digital gene expression profiling. Nat Methods. 2009; 6(7):474-6.
6. Pang Y L, Abo R, Levine S S, Dedon P C. Diverse cell stresses induce unique patterns of tRNA up- and down-regulation: tRNA-seq for quantifying changes in tRNA copy number. Nucleic Acids Res. 2014; 42(22):e170. PMCID: PMC4267671.
7. Cai W M, Chionh Y H, Hia F, Gu C, Kellner S, McBee M E, Ng C S, Pang Y L, Prestwich E G, Lim K S, Babu I R, Begley T J, Dedon P C. A Platform for Discovery and Quantification of Modified Ribonucleosides in RNA: Application to Stress-Induced Reprogramming of tRNA Modifications. Methods Enzymol. 2015; 560:29-71. PMCID: PMC4774897.
8. Zheng G, Qin Y, Clark W C, Dai Q, Yi C, He C, Lambowitz A M, Pan T. Efficient and quantitative high-throughput tRNA sequencing. Nat Methods. 2015; 12:835-7.
9. Tate C M, Nunez A N, Goldstein C A, Gomes I, Robertson J M, Kavlick M F, Budowle B. Evaluation of circular DNA substrates for whole genome amplification prior to forensic analysis. Forensic Sci Int Genet. 2012; 6(2):185-90.
10. Chiu T-P, Yang L, Zhou T, Main B J, Parker S C J, Nuzhdin S V, Tullius T D, Rohs R. GBshape: a genome browser database for DNA shape annotations. Nucleic Acids Res. 2014.
11. Li W, Hu J, Adebali O, Adar S, Yang Y, Chiou Y Y, Sancar A. Human genome-wide repair map of DNA damage caused by the cigarette smoke carcinogen benzo[a]pyrene. Proc Natl Acad Sci USA. 2017; 114(26):6752-7. PMCID: PMC5495276.
12. Zhou Z X, Zhang M J, Peng X, Takayama Y, Xu X Y, Huang L Z, Du L L. Mapping genomic hotspots of DNA damage by a single-strand-DNA-compatible and strand-specific ChIP-seq method. Genome Res. 2013; 23(4):705-15. PMCID: PMC3613587.
13. Cao B, Chen C, DeMott M S, Cheng Q, Clark T A, Xiong X, Zheng X, Butty V, Levine S S, Yuan G, Boitano M, Luong K, Song Y, Zhou X, Deng Z, Turner S W, Korlach J, You D, Wang L, Chen S, Dedon P C. Genomic mapping of phosphorothioates reveals partial modification of short consensus sequences. Nat Commun. 2014; 5:3951. PMCID: PMC4322921.
EQUIVALENTS
(79) While several inventive embodiments have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the inventive embodiments described herein. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the inventive teachings is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific inventive embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, inventive embodiments may be practiced otherwise than as specifically described and claimed. Inventive embodiments of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the inventive scope of the present disclosure.
(80) All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms.
(81) All references, patents and patent applications disclosed herein are incorporated by reference with respect to the subject matter for which each is cited, which in some cases may encompass the entirety of the document.
(82) The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.”
(83) The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
(84) As used herein in the specification and in the claims, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e. “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of,” “only one of,” or “exactly one of.” “Consisting essentially of,” when used in the claims, shall have its ordinary meaning as used in the field of patent law.
(85) As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
(86) It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are recited.
(87) In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedure, Section 2111.03.