HIGHLY SENSITIVE IN VITRO ASSAYS TO DEFINE SUBSTRATE PREFERENCES AND SITES OF NUCLEIC-ACID BINDING, MODIFYING, AND CLEAVING AGENTS
20210155984 · 2021-05-27
Inventors
- J. Keith Joung (Winchester, MA)
- Vikram Pattanayak (Wellesley, MA, US)
- Karl Petri (Cambridge, MA, US)
- Kanae Esther Sasaki (Somerville, MA, US)
Cpc classification
C12N9/22
CHEMISTRY; METALLURGY
C12N15/1058
CHEMISTRY; METALLURGY
C12Q2565/525
CHEMISTRY; METALLURGY
C12Q2565/531
CHEMISTRY; METALLURGY
C12Q1/6806
CHEMISTRY; METALLURGY
C12Q1/6874
CHEMISTRY; METALLURGY
International classification
Abstract
Methods and compositions for performing highly sensitive in vitro assays to define substrate preferences and off-target sites of nucleic-acid binding, modifying, and cleaving agents.
Claims
1.-15. (canceled)
16. A method of selecting for double stranded DNA sequence(s) that are bound by a DNA-binding domain, the method comprising: (i) providing a plurality of linear dsDNA oligonucleotides of known sequences; (ii) incubating the plurality of linear dsDNA oligonucleotides in the presence of the DNA binding domain affinity-tagged with an affinity tag that can be bound to a substrate molecule under conditions sufficient for binding of the affinity-tagged DNA-binding domain and one or more of the plurality of linear dsDNA oligonucleotide(s) to occur, thereby creating bound linear dsDNA oligonucleotide(s); and (iii) selecting for bound linear dsDNA oligonucleotide(s), thereby creating selected linear dsDNA oligonucleotide(s).
17. A method of identifying double stranded DNA sequence(s) that are bound by a DNA-binding domain, the method comprising determining the sequence(s) of the selected linear dsDNA oligonucleotide(s) produced by the method of claim 16, thereby identifying double stranded DNA sequence(s) that are bound by the DNA-binding domain.
18. A method of enriching for double stranded DNA sequence(s) that are bound by a DNA-binding domain, the method comprising: (i) incubating the plurality of selected linear dsDNA oligonucleotides produced by the method of claim 16 in the presence of the DNA binding domain affinity-tagged with an affinity tag that can be bound to a substrate molecule under conditions sufficient for binding of the affinity-tagged DNA-binding domain and one or more of the plurality of selected linear dsDNA oligonucleotide(s) to occur, thereby creating bound selected linear dsDNA oligonucleotide(s); and (ii) selecting for bound selected linear dsDNA oligonucleotide(s), thereby creating enriched linear dsDNA oligonucleotide(s).
19. A method of identifying double stranded DNA sequence(s) that are bound by a DNA-binding domain, the method comprising determining the sequence(s) of the enriched linear dsDNA oligonucleotide(s) produced by the method of claim 16, thereby identifying double stranded DNA sequence(s) that are bound by the DNA-binding domain.
20. The method of claim 16, wherein the affinity-tagged DNA-binding domain is a Cas9 protein complexed with a sgRNA, a variant of a Cas9 protein complexed with a sgRNA, a Cas9 fusion protein complexed with a sgRNA, or a variant of a Cas9 fusion protein complexed with a sgRNA.
21. The method of claim 20, wherein the sgRNA targets a site selected from the group consisting of EMX1, FANCF, HBB, HEK2, HEK3, HEK4, RNF2, ABE14, ABE16, ABE18, and VEGFA3.
22. The method of claim 16, wherein the affinity-tagged DNA-binding domain is inactivated Cas9 (dCas9) complexed with a sgRNA.
23. The method of claim 16, wherein the affinity-tagged DNA-binding domain is an engineered zinc finger array.
24. The method of claim 16, wherein the affinity-tagged DNA-binding domain is an engineered TALE repeat array.
25. The method of claim 16, wherein the substrate molecule is a magnetic bead carrying a molecule that binds to the affinity tag.
26. The method of claim 16, wherein the affinity tag is a molecule that can be covalently bound to benzylguanine and the substrate molecule is a benzylguanine-carrying substrate molecule.
27. The method of claim 16, wherein said selecting comprises: (i) incubating the bound linear dsDNA oligonucleotide(s) under conditions sufficient for binding of the affinity tag to the substrate molecule, thereby creating substrate bound linear dsDNA oligonucleotide(s); (ii) separating the substrate bound linear dsDNA oligonucleotide(s) from unbound linear dsDNA oligonucleotide(s); and (iii) eluting the substrate bound linear dsDNA oligonucleotide(s) in either (a) an appropriate buffer to promote dissociation of the substrate bound linear dsDNA oligonucleotide(s) or (b) a buffer containing a protease under conditions effective to degrade bead-bound protein and release substrate bound linear dsDNA oligonucleotide(s), thereby creating selected linear dsDNA oligonucleotides.
28. The method of claim 27, wherein the protease is proteinase K.
29. The method of claim 18, wherein said enriching comprises: (i) incubating the bound selected linear dsDNA oligonucleotide(s) under conditions sufficient for binding of the affinity tag to the substrate molecule, thereby creating substrate bound selected linear dsDNA oligonucleotide(s); (ii) separating the substrate bound selected linear dsDNA oligonucleotide(s) from unbound linear dsDNA oligonucleotide(s); and (iii) eluting the substrate bound selected linear dsDNA oligonucleotide(s) in either (a) an appropriate buffer to promote dissociation of the substrate bound linear dsDNA oligonucleotide(s) or (b) a buffer containing a protease under conditions effective to degrade bead-bound protein and release substrate bound linear dsDNA oligonucleotide(s), thereby creating enriched linear dsDNA oligonucleotides.
30. The method of claim 16, wherein the linear dsDNA oligonucleotides comprise 16 to 10.sup.8 different sequences.
31. The method of claim 16, wherein the linear dsDNA oligonucleotides comprise sequences that are 50 to 500 bp long.
32. The method of claim 16, wherein the linear dsDNA oligonucleotides comprise potential DNA substrate sequences comprising: (i) a set of all potential off-target sequences for the cytidine deaminase base editing enzyme in a reference genome bearing up to a certain number of substitutions, single base pair deletions, and/or single base pair insertions relative to an identified on-target site for the cytidine deaminase base editing enzyme; (ii) a comprehensive set of all potential off-target sequences for the cytidine deaminase base editing enzyme bearing up to a certain number of substitutions, single base pair deletions, and/or single base pair insertions relative to an identified on-target site for the cytidine deaminase base editing enzyme; (iii) a set of potential off-target sequences for the cytidine deaminase base editing enzyme present in a set of variant genomes from defined populations bearing up to a certain number of substitutions, single base pair deletions, and/or single base pair insertions relative to an identified on-target site for the cytidine deaminase base editing enzyme; (iv) a set of all potential off-target sequences for the cytidine deaminase base editing enzyme in the coding sequence of a reference genome bearing up to six mismatches relative to an identified on-target site for the cytidine deaminase base editing enzyme; or (v) a set of all potential off-target sequences for the cytidine deaminase base editing enzyme in the sequence of an oncogene hotspot and/or tumor suppressor gene of a reference genome bearing up to a certain number of substitutions, single base pair deletions, and/or single base pair insertions relative to an identified on-target site for the cytidine deaminase base editing enzyme.
Description
DETAILED DESCRIPTION
[0047] In vitro/biochemical strategies to understanding on- and off-target activity of DNA binding domains generally fall into two types (
[0048] Both types of strategies to study off-target activity have limitations that affect their ability to identify bona fide off-target sites. In genome-wide selections, the tens to hundreds of cleaved off-target sites must be enriched from a background of billions of other sites that are not cleaved (the human genome has a length of ˜3 billion base pairs and therefore contains ˜6 billion sites to be assayed). For example, due to noise in the enrichment method and in the sequencing results, the CIRCLE-seq method is limited to detection of sites that have no more than six mismatches relative to the on-target site, which represents only ˜0.002% of the genomic material present in the assay. While some methods, such as Digenome-seq, rely on massive oversequencing of nuclease-treated DNA libraries, methods like CIRCLE-seq and GUIDE-seq typically incorporate an enrichment step for edited sequences. This enrichment step can be performed in cells (GUIDE-seq) or in vitro (CIRCLE-seq). Although it is substantially more sensitive than other methods for off-target screening, the CIRCLE-seq method requires a very large input of genomic DNA (25 μg) for each experimental sample.
[0049] In vitro selections on unbiased base substitution libraries are limited by library size (the set of sequences that can practically be assayed). For example, an SpCas9 target site contains 22 potentially-specified base pairs (20 from hybridization to the guide RNA and two from the PAM sequence). To assay all potential target sites bearing all possible combinations of base substitutions at all positions, at least 4.sup.22˜10.sup.13 unique molecules of DNA would need to be generated and interrogated, neither of which is possible using current technologies. For example, library construction methodologies are currently limited to producing 10.sup.11-10.sup.12 unique molecules of DNA. Furthermore, even if library construction methodologies were improved, it is not feasible to sequence 10.sup.12 molecules of DNA. To overcome this restriction, doped oligonucleotide synthesis is traditionally used to create a library of sites bearing base substitutions that follow a binomial distribution, such that the on-target site is present in more copies than each variant site bearing a single mutation, each of which is present in more copies than each variant site bearing two mutations, and so forth. Therefore, selections performed with these random base substitution libraries are limited by the fact that 1) it is not possible to create a completely unbiased library (i.e., they are heavily biased towards the intended on-target site sequence) and 2) it is not possible to create a library that uniformly represents the potential sequence space. Furthermore, using the outputs from defined-library assays to predict or identify off-target sites in genomic sequences often requires extrapolation (Sander et al. Nucleic Acids Res. 41: e181 (2013)), because not all relevant genomic sequences are guaranteed to be covered in pre-selection libraries (limited to 10.sup.12 sequences, which corresponds to six or seven substitutions) or post-selection libraries (limited to 10.sup.7-8 sequences by sequencing capacity).
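The library-size argument in this paragraph can be checked with a short calculation. The following is a minimal sketch, not part of the described methods: the function names are hypothetical, and the per-position doping rate `p` is an illustrative parameter that the text does not specify.

```python
from math import comb

def total_sites(length):
    """Number of distinct sequences of the given length (4 bases per position)."""
    return 4 ** length

def sites_with_up_to(length, max_subs):
    """Number of distinct sequences within max_subs substitutions of one target."""
    return sum(comb(length, k) * 3 ** k for k in range(max_subs + 1))

def doped_fraction(length, k, p):
    """Fraction of molecules in a doped library carrying exactly k substitutions,
    assuming each position is independently substituted with probability p
    (p is a hypothetical doping rate, not a value from the text)."""
    return comb(length, k) * p ** k * (1 - p) ** (length - k)

if __name__ == "__main__":
    print(total_sites(22))           # full sequence space for a 22 bp site
    print(sites_with_up_to(22, 3))   # sites within three substitutions
    print(doped_fraction(22, 0, 0.1), doped_fraction(22, 1, 0.1))
```

Because the molecules bearing exactly k substitutions are split among `comb(length, k) * 3**k` distinct sequences, each individual variant becomes rarer as k grows, which is the bias toward the on-target site noted above.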
Methods of Identifying DNA Binding, Modification, or Cleavage Sites
[0050] Herein, we provide improved methods (
[0051] The pre-enriched linear DNA library members are initially synthesized on high-density oligonucleotide arrays as individual single-stranded DNA sequences, each bearing a unique identifier/barcode, which is present/duplicated on both sides of the oligonucleotide (
[0052] The ability to define identical barcodes flanking a defined recognition site represents a significant advance over previous in vitro profiling methods (U.S. Pat. Nos. 9,322,006, 9,163,284), because the sequence of the library member is encoded in at least three locations on each individual member of a DNA pool. This redundancy of information is particularly advantageous when seeking to define DNA modifying activity (such as base editing) where the target sequence is modified. The original sequence information can be obtained from the information content contained in a flanking barcode, even if the actual DNA sequence of the library member itself is modified. The redundancy of information in two barcodes and a recognition site also allows for an endonuclease cleavage selection (or paired base modification+cleavage selection) to be performed on potential cleavage sites that are present in a single copy per library member, as opposed to multiple copies (U.S. Pat. Nos. 9,322,006, 9,163,284). Without the present barcoding strategy, sequences of library members that get cleaved within a recognition site cannot be reassembled, since the cut separates in space the two sides of the cut site (above figure, bottom right, blue region).
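As an illustration of the redundancy described above, the following minimal sketch shows how a flanking barcode could be used to recover the designed sequence of a library member and to report positions where the observed site differs. The barcodes, the `DESIGN` table, and the function names are hypothetical and are not taken from the described libraries.

```python
# Hypothetical design table: barcode -> designed (pre-selection) site.
DESIGN = {
    "ACGTACGTACGTA": "GAGTCCGAGCAGAAGAAGAAGGG",
    "TGCATGCATGCAT": "GAGTTAGAGCAGAAGAAGAAAGG",
}

def original_site(barcode):
    """Return the designed site for a barcode, or None if the barcode is unknown."""
    return DESIGN.get(barcode)

def observed_edits(barcode, observed_site):
    """List (position, designed_base, observed_base) for each difference (1-based).

    Works even when the site itself was modified, because the designed
    sequence is looked up from the barcode rather than read from the site.
    """
    designed = DESIGN[barcode]
    if len(designed) != len(observed_site):
        raise ValueError("observed site length differs from the designed site")
    return [
        (i + 1, d, o)
        for i, (d, o) in enumerate(zip(designed, observed_site))
        if d != o
    ]

if __name__ == "__main__":
    # A C-to-T change at position 6 is still attributed to the right member.
    print(observed_edits("ACGTACGTACGTA", "GAGTCTGAGCAGAAGAAGAAGGG"))
```

The same lookup applies to each half of a cleaved member, since each half retains one copy of the barcode.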
EXAMPLES
[0053] The invention is further described in the following examples, which do not limit the scope of the invention described in the claims.
[0054] Target sites used in following examples:
TABLE-US-00001
target name        Sequence (5′ -> 3′)        SEQ ID NO:
EMX1               GAGTCCGAGCAGAAGAAGAAGGG    22
RNF2               GTCATCTTAGTCATTACCTGAGG    42
FANCF              GGAATCCCTTCTGCAGCACCTGG    43
HBB                TTGCCCCACAGGGCAGTAACGG     44
HEK2 (HEK293_2)    GAACACAAAGCATAGACTGCGGG    45
HEK3 (HEK293_3)    GGCCCAGACTGAGCACGTGATGG    46
HEK4 (HEK293_4)    GGCACTGCGGCTGGAGGTGGGGG    47
ABE14 (ABE_14)     GGCTAAAGACCATAGACTGTGGG    48
ABE16 (ABE_16)     GGGAATAAATCATAGAATCCTGG    49
ABE18 (ABE_18)     ACACACACACTTAGAATCTGTGG    50
VEGFA3 (VEGFA_3)   GGTGAGTGAGTGTGTGCGTGTGG    51
Example 1: DNA Cleavage Selection with SpCas9 and SpCas9-HF1
[0055] In this example, a random base substitution library designed for an SpCas9 nuclease programmed with a guide RNA (gRNA) designed against an on-target site in the human EMX1 gene (hereafter referred to as the EMX1 gRNA and EMX1 target site) and a library of potential EMX1 gRNA off-target sites from the human reference genome were selected for cleavage with SpCas9 or SpCas9-HF1.
[0056] In this example (
[0057] A selection performed using strategy 1 on the random base substitution library with a 1:1:1 ratio of SpCas9:sgRNA:DNA library (EMX1 target sites) demonstrated enrichment of sequences that could be cleaved (
[0058] A screen performed using strategy 2 with the random base substitution library yielded similar results (
[0059] Genomic libraries are generally composed of all potential off-target sites in the hg19 reference human genome that had zero to six mismatches relative to the on-target sequence, up to four mismatches in combination with a DNA bulge of one or two nucleotides, and up to four mismatches with an RNA bulge of one nucleotide, and up to three mismatches with an RNA bulge of two nucleotides (
[0060] Selections were performed using strategy 1 (referred to as ONE-seq) with genomic DNA-inspired libraries for six non-promiscuous guide RNAs (HBB, RNF2, HEK2, HEK3, FANCF, and EMX1) with relatively few expected off-target sequences. On target sequences (
[0061] In addition, this method can be generalized to any library/defined set of nucleic acid sequences. For example, using publicly available data from the 1000 genomes project, ONE-seq selections were performed on an EMX1 genomic off-target site library that accounts for naturally occurring sequence variation on a population scale. In this example library, all sequences from the reference hg19 human genome assembly that were in the original EMX1 library (
Example 2: Base Editor Screen with BE1
[0062] In this example (
[0063] A base editor screen following the protocol above with BE1 was applied to an EMX1 target site and the substrate profiling library yielded enrichment of an expected profile of tolerated off-target sites (
Example 3: Base Editor Selection with BE3
[0064] In this example (
[0065] Using this approach, we examined BE3 targeting with genomic DNA-inspired libraries for eight target sites, including all seven BE3 targets tested previously by Digenome-seq (Kim et al. Nat. Biotech. 35:475, 2017). ONE-seq selection results revealed enrichment of the intended target sites to the top 13 of tens of thousands of library members for all eight selections (
Example 4: Base Editor Selection with ABE
[0066] In this example (
[0067] Selection of the base substitution library demonstrates enrichment of substrates with an NGG. In addition, as expected, this experiment (
TABLE-US-00002
TABLE 1. Enrichment of sequences with an A in position 5 in the ABE selection.
First five nucleotides of                   Number of times observed out of the top 100
post-selection library member               most enriched post-selection library members
GAGTA                                       83
AAGTA                                       12
GAGTC (canonical first five nucleotides)    3
GAAGT                                       1
GGAGT                                       1
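A tally of this kind can be produced from a ranked list of post-selection sequences. The following is a minimal sketch under the assumption that the sequences are already sorted from most to least enriched; the function name and the example list are hypothetical.

```python
from collections import Counter

def prefix_tally(ranked_sequences, top_n=100, prefix_len=5):
    """Count the first prefix_len nucleotides among the top_n ranked sequences."""
    top = ranked_sequences[:top_n]
    return Counter(seq[:prefix_len] for seq in top).most_common()

if __name__ == "__main__":
    # Hypothetical ranked list, for illustration only.
    ranked = ["GAGTAAGAGC", "GAGTATGAGC", "AAGTACGAGC", "GAGTCCGAGC"]
    for prefix, count in prefix_tally(ranked):
        print(prefix, count)
```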
[0068] We have also performed the above selection on the EMX1 genomic DNA library (Table 2), which demonstrates enrichment of the EMX1 on-target site (highlighted; 96.sup.th most abundant post-selection library sequence) and the EMX1 off-target site with the highest off-target recognition (bold and asterisk; 9.sup.th most abundant post-selection).
TABLE-US-00003
TABLE 2. Top 96 most-enriched sites in the post-selection library for an ABE selection on a genomic DNA library of potential EMX1 off-target sites.
chromosome   location     target                      SEQ ID NO:
chr4         33321459     GTACAGGAGCAGGAGAAGAATGG     52
chr17        72740376     CAAACGGAGCAGAAGAAGAAAGG     53
chr10        58848711     GAGCACGAGCAAGAGAAGAAGGG     54
chr10        128080178    GAGTACAAGCAGATGAAAAACGG     55
chr6         99699155     GAGTTAGAGCAGAGGAAGAGAGG     56
chr7         141972555    AAGTCCGGGCAAAAGAGGAAAGG     57
chr19        24250496     GAGTCCAAGCAGTAGAGGAAGGG     58
chr11        111680799    CAGTAGTGAGCAGAAGAAGATAGG    59
chr5         45359060*    GAGTTAGAGCAGAAGAAGAAAGG     60
chr7         17446431     GTCCAAGAGCAGGAGAAGAAGGG     61
chr12        106646073    AAGTCCATGCAGAAGAGGAAGGG     62
chr15        22366604     GGAGTAGAGCAGAGGAAGAAGGG     63
chr10        109561613    GGAACTGAGCAAAAGAAGATAGG     64
chr11        62365266     GAATCCAAGCAGAAGAAGAGAAG     65
chr2         21489994     GCGACAGAGCAGAAGAAGAAGGG     66
chr1         234492858    GAAGTAGAGCAGAAGAAGAAGCG     67
chr2         218378101    GAGTCTAAGCAGGAGAATAAAGG     68
chr18        32722283     TGTCCAGAGCAGATGAAGAATGG     69
chr22        22762518     GAACATGAGCAGAAGAAGAGGAG     70
chr11        34538379     AGGCCAGAGCAAAAGAAGAGAGG     71
chr11        106142352    GTACAAGAGCAGGAGAAGAAGGG     72
chr15        91761953     GAGTCAGGGCAGAAGAAGAAAAT     73
chr4         87256685     GAGTAAGAGAAGAAGAAGAAGGG     74
chr4         21141327     AAGCCCGAGCAGAAGAAGTTGAG     75
chr8         128801241    GAGTCCTAGCAGGAGAAGAAGAG     76
chr7         106584579    GAGGGGAGCAAAAGAAGGAGGG      77
chr1         117139004    CAGGGAGAGCAAAAGAAGAGAGG     78
chr1         231750724    GAGTCAGAGCAAAAGAAGTAGTG     79
chr15        44109746     GAGTCTAAGCAGAAGAAGAAGAG     80
chr21        23586410     CAGGGAGAAGAAGAAGAAGGG       81
chr7         2127682      GAGTTAGAGAAGAAGAAGACTGG     82
chr10        98718174     ACAATCGAGCAGCAGAAGAATGG     83
chr1         221020698    GAGTAGGAGCAGATGAAGAGAGG     84
chr9         115729750    CAGTATGAGCAAAAGAAGAAAGA     85
chr11        102753237    GAGTCCATACAGAGGAAGAAAAG     86
chr1         48581991     GAATGAGCAAAAGAAGAAAGC       87
chr12        73504668     GAGTTAGAGCAGAAAAAAAATGG     88
chr1         184236226    AATACAGAGCAGAAGAAGAATGG     89
chr11        119322554    TAGTGAGCAGAAGAAGAGAGA       90
chr1         151027591    TTCTCCAAGCAGAAGAAGAAGAG     91
chr11        68772640     GAGTCCATACAGGAGAAGAAAGA     92
chr2         9821536      AGGTGGGAGCAGAAGAAGAAGGG     93
chr2         54284994     AAGGCAGAGCAGAGGAAGAGAGG     94
chr1         99102020     GAGGCACAAGCAAAAGAAGAAAAG    95
chr19        1438808      GAAGTAGAGCAGAAGAAGAAGCG     96
chr2         73160981     GAGTCCGAGCAGAAGAAGAAGGG     22
[0069] The two highlighted sequences are the most active cleavage off-target site (chr5: 45359060, asterisked) and the on-target site (chr2: 73160981). The off-target site is expected to be more enriched in the selection because it has an A in a more favorable position in the editing window.
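The comparison between these two sites can be made explicit with a short sketch. The two sequences are copied from Table 2; the function names are hypothetical, and the window of positions 4 to 8 is only an illustrative assumption, since the text does not define the boundaries of the editing window.

```python
ON_TARGET = "GAGTCCGAGCAGAAGAAGAAGGG"   # EMX1 on-target site, SEQ ID NO: 22
OFF_TARGET = "GAGTTAGAGCAGAAGAAGAAAGG"  # chr5: 45359060, SEQ ID NO: 60

def mismatch_positions(a, b):
    """1-based positions at which two equal-length sites differ."""
    if len(a) != len(b):
        raise ValueError("sites must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]

def adenine_positions(site, window=(4, 8)):
    """1-based positions of A within an inclusive window.

    The default window is a placeholder assumption, not a value from the text.
    """
    start, end = window
    return [p for p in range(start, end + 1) if site[p - 1] == "A"]

if __name__ == "__main__":
    print(mismatch_positions(ON_TARGET, OFF_TARGET))
    print(adenine_positions(ON_TARGET), adenine_positions(OFF_TARGET))
```

Under the assumed window, the off-target site carries an additional A (at position 6) that the on-target site lacks.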
[0070] We have additionally performed the above selection on genomic DNA libraries designed to identify off-target sequences of six guide RNAs (
Example 5: Base Editor Selections with ABE or BE3 Using an Enzyme that Creates a Double-Strand Break at Positions that have been Modified
[0071] In this example, modified library members containing a deoxyinosine could be made to have blunt, double-stranded ends through the action of the TkoEndoMS protein (Ishino et al, Nucleic Acids Res. 44: 2977 (2016)). TkoEndoMS can be used to create double-strand breaks at the dI:dT base pairs that result from dA->dI editing by ABE. DNA with a double-strand break is then subjected to the same downstream steps as in Example 1, with ligation of adapters to phosphorylated, blunt-ended DNA if a base editing enzyme without nicking activity is used. If a base editing enzyme with nicking activity is used, end polishing with a blunt-end-creating DNA polymerase (such as T4 or Phusion), such as in Examples 4 and 5, is used to allow for enrichment of both sides of a cut library member.
[0072] We have demonstrated that TkoEndoMS can also create double-strand breaks at dG:dU mismatched base pairs that result from dC->dU editing (in this example by BE1), demonstrating its additional applicability to BE1, BE3 and other enzymes that cause dC->dU changes after DNA binding (see U.S. Ser. No. 62/571,222 and
Example 6: Enrichment of DNA Binding Sites by Pulldown
[0073] SELEX (systematic evolution of ligands by exponential enrichment) has been used to define the DNA-binding specificity of DNA-binding domains (originally by Oliphant et al., Mol Cell Biol. 9: 2944, 1989). In the SELEX method, libraries of randomized DNA sequences are subjected to multiple rounds of pulldown and enrichment with an immobilized DNA-binding domain of interest to identify the sequences in the initial pool that can bind to the DNA-binding domain of interest. The SELEX method has been applied to the zinc finger and TALE moieties of ZFNs (Perez et al., Nat Biotech. 26: 808 (2008)) and TALENs (Miller et al. Nat Biotech. 29: 143 (2011)). However, there are no reports of SELEX studies on Cas9 proteins. We speculated that SELEX studies on Cas9 proteins are difficult due to the need to selectively enrich a 22 base pair target site from a large library, which would have to contain >10.sup.13 unique molecules, or at minimum 10.sup.12 molecules, corresponding to a 20 base pair target site, if an NGG PAM is fixed.
[0074] In this example, we took advantage of pre-enriching our pre-selection libraries for sites that are most likely to be bound by a given Cas9:sgRNA complex (or other DNA-binding domain with predictable binding motifs). We assessed Cas9 DNA binding preferences and specificity by performing sequential rounds of DNA pull down experiments on the pre-enriched libraries (
Example 7: Homing Endonuclease Selections
[0075] Homing endonucleases, such as I-PpoI, represent a group of naturally occurring nucleases that have longer base recognition motifs than the majority of restriction enzymes. Though homing endonucleases (also called meganucleases) do not have specificities that can be easily reprogrammed, if they target a genomic sequence of interest, they could be of research, commercial, or clinical use. Here, we show that we could adapt our in vitro selection to analyze the specificity profile of the I-PpoI homing endonuclease. We created an unbiased library of potential I-PpoI off-targets including all sites with up to 3 mismatches and single DNA/RNA bulges. The I-PpoI library contained 15533 members. I-PpoI selections enriched 501 of the 15533 library members (Table 3) while the intended, on-target site was ranked close to the top of the selection (28 out of 15533). Sequences with one mismatch or one insertion were the most enriched library members. Analysis of mismatch positions among top scoring I-PpoI off-target candidates revealed that certain positions within the recognition motif were more important for I-PpoI cleavage than others (
TABLE-US-00004
TABLE 3. Top 30 most-enriched sites in the post-selection library for I-PpoI on an unbiased DNA library.
Alignment   Found target        #     Found seqs_cleaved   seqs_cleaved_rmv
1_0_1       CTATCTTAAGGTAGTC    97    1507                 1459
1_0_1       ACTCTCTTAAGGTAGC    98    1329                 1294
1_0_1       CTATCTTAAGGTAGCC    99    1264                 1235
3_0_0       CTACCTTAAGGTAGT     100   1100                 1071
3_0_0       CTACCTTAAGGGAGC     101   1017                 989
2_0_0       CTATCTTAAGGGAGC     102   967                  951
2_0_0       CTCCCTTAAGGGAGC     103   960                  923
1_0_1       CTATCTTAAGGTAGGC    104   947                  919
1_0_1       CTCTCTTAAGGGAGCC    105   920                  896
1_0_1       CTCTCTTAAGGTAGCT    106   913                  883
2_0_0       CTCCCTTAAGGTAGT     107   885                  866
0_0_1       CTCTCTTAAGGTAGTC    108   865                  842
1_0_1       CTCTCTTAAGATAGCC    109   858                  836
2_0_0       CTACCTTAAGGTAGC     110   829                  799
1_0_1       CTCCCTTAAGGTAGTC    111   781                  765
1_0_1       CTCTCATAAGGTAGTC    112   744                  724
1_0_1       CTCTCATAAGGTAGCC    113   744                  722
1_0_1       CTCTGTTAAGGTAGTC    114   729                  710
3_0_0       CTCCCTTAAGAGAGC     115   732                  702
1_0_1       CTCCCTTAAGGTAGCC    116   713                  694
1_0_1       CTCCCTTAAGGTAGAC    117   709                  689
1_0_0       CTCTCTTAAGGTAGT     118   679                  670
2_0_0       CTATCTTAAGGTAGT     119   687                  670
1_0_0       CTCTCTTAAGGGAGC     120   673                  653
3_0_0       CTATCTTAAGGGAGT     121   651                  639
1_0_1       CTCTGTTAAGGTAGCC    122   650                  636
0_0_1       CTCTCTTAAGGTAGGC    123   650                  633
0_0_0       CTCTCTTAAGGTAGC     124   643                  628
1_0_0       CTATCTTAAGGTAGC     125   629                  620
2_0_0       CTCTCTTAAGAGAGC     126   634                  620
#: SEQ ID NO
[0076] Most closely matched off-target candidates were enriched to the top of the selection. However, the selections demonstrated that I-PpoI off-target candidates exist in abundance.
Methods: Library Generation
[0077] Oligonucleotide libraries synthesized on high-density chip arrays were purchased from Agilent.
Substrate Profiling Library:
[0078] 1) An oligonucleotide backbone was developed that had 50% GC content and no potential canonical PAM sequences (NGG for S. pyogenes Cas9).
2) 13-14 base pair barcodes were generated that were at least two substitutions away from all other barcodes, were 40-60% GC, and did not contain any canonical PAM sequences for the minimally unbiased libraries.
3) potential off-target sites were generated for all possible combinations of substitutions, insertions, and deletions for an SpCas9 target site (this can be variable):
TABLE-US-00005
substitutions   single base pair deletions   single base pair insertions
<=3             0                            0
<=1             1                            0
0               2                            0
<=1             0                            1
4) barcodes/potential off-target sites for all i off-target sites (i˜50,000) were combined into the backbone:
TABLE-US-00006 (SEQ ID NO: 127)
GACGTTCTCACAGCAATTCGTACAGTCGACGTCGATTCGTGCT(barcode.sub.i)TTTGACATTCTGCAATTGCACACAGCGT(potential_off_target_site.sub.i)TGCAGACTGTAAGTATGTATGCTTCGCGCAGTGCGACTTCGCAGCGCATCACTTCA(barcode.sub.i)AGTAGCTGCGAGTCTTACAGCATTGC
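Steps 2 and 4 above can be sketched in a few lines. This is an illustrative sketch only, not the procedure actually used: the greedy random sampling, the function names, and the fixed seed are assumptions, and the PAM filter is a conservative simplification that rejects any barcode containing GG (or CC, for the opposite strand). The constant segments are copied from SEQ ID NO: 127 with the wrapped lines joined.

```python
import random

# Constant backbone segments from SEQ ID NO: 127.
LEFT = "GACGTTCTCACAGCAATTCGTACAGTCGACGTCGATTCGTGCT"
MID_LEFT = "TTTGACATTCTGCAATTGCACACAGCGT"
MID_RIGHT = "TGCAGACTGTAAGTATGTATGCTTCGCGCAGTGCGACTTCGCAGCGCATCACTTCA"
RIGHT = "AGTAGCTGCGAGTCTTACAGCATTGC"

def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def hamming(a, b):
    """Number of substitutions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def make_barcodes(n, length=13, min_dist=2, seed=0):
    """Greedily sample n barcodes meeting the GC, PAM, and distance constraints."""
    rng = random.Random(seed)
    barcodes = []
    while len(barcodes) < n:
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if not 0.4 <= gc_fraction(cand) <= 0.6:
            continue
        if "GG" in cand or "CC" in cand:  # conservative NGG / CCN filter
            continue
        if all(hamming(cand, b) >= min_dist for b in barcodes):
            barcodes.append(cand)
    return barcodes

def assemble_member(barcode, site):
    """Place the same barcode on both sides of the potential off-target site."""
    return LEFT + barcode + MID_LEFT + site + MID_RIGHT + barcode + RIGHT

if __name__ == "__main__":
    codes = make_barcodes(5)
    print(assemble_member(codes[0], "GAGTCCGAGCAGAAGAAGAAGGG"))
```

A greedy sampler is adequate for a small illustration; for a library of roughly 50,000 members, the pairwise distance check would likely need a more efficient data structure.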
Genome-Inspired Library:
[0079] 1) Potential off-target sites were generated with CasOffFinder according to the table below (these parameters can vary) and 20-113 bp (this can be variable) of genomic flanking sequence was added.
TABLE-US-00007
substitutions   single base pair deletions   single base pair insertions
<=6             0                            0
<=4             <=2                          0
<=3             0                            <=2
4               0                            1
[0080] For an EMX1 site, here is an example of the number of sequences present given the above parameters.
TABLE-US-00008
# of mismatches   insertion (DNA bulge) length   deletion (RNA bulge length)   sequences
0                 0                              0                             1
2                 0                              0                             1
3                 0                              0                             25
4                 0                              0                             378
5                 0                              0                             3903
6                 0                              0                             30213
1                 0                              2                             1
2                 1                              0                             6
2                 2                              0                             7
2                 0                              1                             17
2                 0                              2                             161
3                 1                              0                             130
3                 2                              0                             126
3                 0                              1                             566
3                 0                              2                             7579
4                 1                              0                             2214
4                 2                              0                             1942
4                 0                              1                             8279
Total                                                                          55549
2) barcodes/potential off-target sites for all i off target sites (i˜50,000) were combined into the backbone as for the minimally unbiased library with maximal genomic flanking context:
TABLE-US-00009 (SEQ ID NO: 128)
GACGTTCTCACAGCAATTCGT(barcode.sub.i)(flanking_genomic_context.sub.i)(potential_off_target_site.sub.i)(flanking_genomic_context)(barcode.sub.i)TGCGAGTCTTACAGCATTGC
[0081] Constant backbone sequence can be increased as the flanking genomic context is varied.
[0082] For example, with 10 bp genomic flanking sequence on both sides:
TABLE-US-00010 (SEQ ID NO: 129)
GACGTTCTCACAGCAATTCGTACAGTCGACGTCGATTCGTGCT(barcode.sub.i)TTTGACATTCTGCAATGT(flanking_genomic_context.sub.i)(potential_off_target_site.sub.i)(flanking_genomic_context)AAGTATGTATGCTTCGCGCAGTGCGACTTCGCAGCGCATCACTTCA(barcode.sub.i)AGTAGCTGCGAGTCTTACAGCATTGC
Other Library Generation Strategies:
[0083] incorporate population-based SNPs into genomic sequences
[0084] generate libraries based on only coding DNA sequences
[0085] generate libraries of sites that are oncogene hotspots or tumor suppressor genes
[0086] The following are examples of methods using off-target libraries constructed using the above principles.
Method for In Vitro Selection of Cleaved Library Members
[0087] 1. Library Amplification
[0088] We amplify the oligonucleotide libraries using primers that bind to the constant flanking regions that are found in all library members. These primers contain 5′ overhangs that introduce additional length and a unique molecular identifier. The libraries are amplified with the following protocol, using 2 μl of 5 nM input library.
TABLE-US-00011
SV (2 μl of 5 nM input library)   2
Thermopol buffer                  5
Taq Polym.                        0.25
dNTP 10 mM                        1
KP_extension_new_fw*              1
KP_extension_new_rev*             1
H2O                               39.75
RV                                50

PCR program
cycles   12
ID       95   30
D        95   20
A        50   15
E        68   1
FE       68   30 min

SV: Samples Volume; RV: Reaction Volume
*KP_extension_new_fw, Primer Sequence: GCTGACTAGACACTGCTATCACACTCTCTCANNNNNNNNAGACGTTCTCACAGCAATTCG (SEQ ID NO: 130)
*KP_extension_new_rev, Primer Sequence: GCGTAATCACTGATGCTTCGTAAATGAGACANNNNNNNNTGCAATGCTGTAAGACTCGCA (SEQ ID NO: 131)
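The volumes in the reaction table can be checked, or scaled to several reactions, with a short calculation. This is a minimal sketch; the function names are hypothetical, the component volumes are those of the table above, and the unit (μl) is assumed from the sample volume entry.

```python
# Component volumes per reaction, taken from the table above (assumed to be in μl).
COMPONENTS = {
    "input library (5 nM)": 2.0,
    "Thermopol buffer": 5.0,
    "Taq polymerase": 0.25,
    "dNTP 10 mM": 1.0,
    "KP_extension_new_fw": 1.0,
    "KP_extension_new_rev": 1.0,
}

def water_volume(components, reaction_volume):
    """Volume of water needed to bring the listed components to the reaction volume."""
    used = sum(components.values())
    if used > reaction_volume:
        raise ValueError("components exceed the reaction volume")
    return reaction_volume - used

def scale_mix(components, n_reactions):
    """Scale every component volume to n_reactions (no pipetting excess added)."""
    return {name: vol * n_reactions for name, vol in components.items()}

if __name__ == "__main__":
    print(water_volume(COMPONENTS, 50.0))
    print(scale_mix(COMPONENTS, 8))
```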
[0089] 2. DNA Purification:
[0090] DNA purification with AMPure magnetic beads at a sample:bead ratio of 0.9× according to manufacturer's protocol.
[0091] 3. Enzymatic Incubation:
[0092] Incubation of 300 ng of the chip-synthesized library with protein of interest at varying enzyme concentrations and incubation times. In most cases (Cas9, Cas9HF, BE3, ABE) it is sufficient to perform a 1-2 h incubation of the enzyme in activity buffer on 300 ng of oligonucleotide library at a molar ratio of 10:10:1 for protein, sgRNA and DNA substrate, respectively. Depending on the specific protein function, these parameters may need to be optimized.
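To set up the 10:10:1 molar ratio, the 300 ng of library has to be converted to a molar amount. The following is a rough sketch under two stated assumptions that do not come from the text: an average mass of about 650 g/mol per base pair of dsDNA, and a hypothetical library member length of 200 bp. The function names are also hypothetical.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA; approximate, assumed value

def dsdna_pmol(mass_ng, length_bp):
    """Convert a mass of dsDNA (ng) to picomoles for a given fragment length."""
    return mass_ng * 1e3 / (length_bp * AVG_BP_MASS)

def reaction_amounts(library_ng, length_bp, ratio=(10, 10, 1)):
    """Return pmol of protein, sgRNA, and DNA for a protein:sgRNA:DNA molar ratio."""
    protein, sgrna, dna = ratio
    dna_pmol = dsdna_pmol(library_ng, length_bp)
    return {
        "protein_pmol": dna_pmol * protein / dna,
        "sgRNA_pmol": dna_pmol * sgrna / dna,
        "DNA_pmol": dna_pmol,
    }

if __name__ == "__main__":
    # 200 bp is a hypothetical member length used only for illustration.
    print(reaction_amounts(300, 200))
```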
[0093] 4. Optional DNA Nicking:
[0094] Depending on the analyzed protein, enzymatic incubation may not result in the creation of a DNA double strand break (DSB). In the case of BE3 and ABE, both enzymes merely nick one strand of DNA while base editing the other. By employing USER enzyme or Endonuclease V for BE3 and ABE, respectively, it is possible to convert this DNA nick into a staggered DSB (see
[0095] 5. DNA Purification:
[0096] DNA purification with AMPure magnetic beads at a sample:bead ratio of 1.5× according to manufacturer's protocol.
[0097] 6. Optional DNA Blunting:
[0098] If an additional nicking step (4) was required, the staggered DSB is blunted by incubation with Phusion Polymerase for 20 min at 72° C., followed by cooling to 4° C.
[0099] 7. DNA Purification:
[0100] DNA purification with AMPure magnetic beads at a sample:bead ratio of 1.5× according to manufacturer's protocol.
[0101] 8. Adapter Ligation:
[0102] Next, half-functional Y-shaped adapters are ligated to the blunted DNA from step 7. To achieve this, we supply adapter in 10-fold molar excess over library fragments and ligate using the NEB quick ligation kit, incubating the reaction at 25° C. for 10 min.
[0103] 9. Gel Purification:
[0104] Next, we perform a gel purification of the ligation reaction on a 2.5% agarose gel. Electrophoresis is performed at 120 V for 1 hour. The sample-containing lanes are then excised at around 180 bp fragment size, and DNA is extracted using the Qiagen gel extraction kit according to the manufacturer's protocol.
[0105] 10. PCR-Amplification:
[0106] The eluate from step 9 is subsequently used as input for two PCR reactions that amplify the protospacer-adjacent and PAM-adjacent sides of cut library members. The primers used in this PCR contain 5′ overhangs that can be subsequently used to append Illumina sequencing barcodes. Optionally, qPCR can be performed to determine the minimum number of PCR cycles required. The PCRs are performed using the following parameters:
TABLE-US-00012
Sample Volume                     6
Phusion High Fidelity Buffer 5X   10
Phusion Polymerase                0.5
dNTP 10 mM                        1
PrimerA                           2.5
PrimerB                           2.5
H2O                               27.5

PCR program
cycles   25-35
ID       98   30
D        98   10
A        65   20
E        72   5
FE       72   5 min
[0107] 11. DNA Purification:
[0108] DNA purification with AMPure magnetic beads at a sample:bead ratio of 1.5× according to manufacturer's protocol.
[0109] 12. Quality Control Using Capillary Electrophoresis:
[0110] Quality control is performed by examining the PCR products via capillary electrophoresis.
[0111] 13. PCR-Based NGS Library Preparation:
[0112] Sequencing adapters are appended to the PCR products from step 12 by performing a PCR with primers containing Illumina sequencing adapters. The PCRs are performed using the following parameters:
TABLE-US-00013
Sample Volume                     50 ng total
Phusion High Fidelity Buffer 5X   10
Phusion Polymerase                0.5
dNTP 10 mM                        1
IndexPrimerA                      2.5
IndexPrimerB                      2.5
H2O                               Ad 50

PCR program
cycles   10
ID       98   30
D        98   10
A        65   30
E        72   35
FE       72   10 min
[0113] 14. DNA Purification:
[0114] DNA purification with AMPure magnetic beads at a sample:bead ratio of 1.5× according to manufacturer's protocol.
[0115] 15. Next Generation Sequencing on Illumina Sequencers:
[0116] The DNA libraries from step 14 are quantified via droplet digital PCR and sequenced on Illumina sequencers according to the manufacturer's protocol.
Method for Enrichment of DNA Binding Sites by Pulldown
[0117] 1) Resuspend Snap Capture Beads (NEB)
[0118] 2) Pipette 80 uL of the beads to a new 1.5 mL Eppendorf tube
[0119] 3) Place tube in a magnetic particle separator and discard the supernatant
[0120] 4) Add 1 mL of Immobilization Buffer (20 mM HEPES, 150 mM NaCl, 0.5% Tween20, 1 mM DTT, pH 6.5) and vortex gently
[0121] 5) Place tube in a magnetic particle separator and discard the supernatant
[0122] 6) Prepare the protein: Add EnGen Spy dCas9 (SNAP-tag) (NEB) (4.5 uL of 20 uM per pull down reaction) to 500 uL of Immobilization Buffer
[0123] 7) Add the diluted protein to the beads and mix well via pipetting
[0124] 8) Incubate for 1 hour shaking at room temperature
[0125] 9) Place tube in a magnetic particle separator and discard the supernatant
[0126] 10) Wash the beads. Add 1 mL of Immobilization Buffer, pipette mix well, and then place the tube in a magnetic particle separator and discard the supernatant
[0127] 11) Repeat step 10 twice more for a total of 3 washes. Perform the last wash with Immobilization Buffer with 10 ug/mL Heparin
[0128] 12) Resuspend the beads in 45 uL of Immobilization Buffer per pull down
[0129] 13) Mix the following:
TABLE-US-00014
Component                                          Amount for 1 Pull Down Reaction
Water                                              Add enough to make the final volume 60 uL after adding everything including 0.9 pmol of Library
10X Immobilization Buffer + 100 ug/mL Heparin      6 uL
gRNA                                               3500 ng
EnGen Spy dCas9 (SNAP-tag) + Magnet Beads          45 uL
[0130] 14) Incubate at 25 deg C. for 10 min
[0131] 15) Add 0.9 pmol of library
[0132] 16) Incubate at 37 deg C. for 30 min
[0133] 17) Place the tube on a magnetic bead separator and discard the supernatant
[0134] 18) Wash the beads 5 times with 200 uL of Immobilization Buffer with 10 ug/mL Heparin
[0135] 19) Add 50 uL of water and 2 uL of Proteinase K and incubate at room temperature for 10 min while shaking
[0136] 20) Clean up the pulled down product with DNA purification beads (for example, Ampure) and elute in 10 uL of 0.1× Buffer EB (QIAgen)
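The amounts in this protocol imply a large molar excess of dCas9 over library, which a short calculation makes explicit. This is a minimal sketch: the function names are hypothetical, and the gRNA and library volumes passed to `water_ul` are placeholder values, since the text gives those components as amounts (3500 ng and 0.9 pmol) rather than volumes.

```python
def amount_pmol(volume_ul, conc_um):
    """uL multiplied by uM gives pmol."""
    return volume_ul * conc_um

def molar_excess(protein_pmol, library_pmol):
    """Ratio of protein to library molecules in one pull down reaction."""
    return protein_pmol / library_pmol

def water_ul(grna_ul, library_ul, final_ul=60.0, buffer_ul=6.0, beads_ul=45.0):
    """Water needed to reach the final volume given the other component volumes.

    grna_ul and library_ul depend on stock concentrations not stated in the
    text; the caller must supply them.
    """
    water = final_ul - (buffer_ul + beads_ul + grna_ul + library_ul)
    if water < 0:
        raise ValueError("components exceed the final volume")
    return water

if __name__ == "__main__":
    dcas9 = amount_pmol(4.5, 20)            # 4.5 uL of 20 uM per pull down
    print(dcas9, molar_excess(dcas9, 0.9))  # protein amount and excess over library
    print(water_ul(grna_ul=3.0, library_ul=2.0))  # placeholder volumes
```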
Other Embodiments
[0137] It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.