Engineered CRISPR-Cas9 nucleases with Altered PAM Specificity
20220145275 · 2022-05-12
Inventors
Cpc classification
C12N2310/20
CHEMISTRY; METALLURGY
C12N15/111
CHEMISTRY; METALLURGY
C12N9/22
CHEMISTRY; METALLURGY
C07K2319/80
CHEMISTRY; METALLURGY
C12N2800/22
CHEMISTRY; METALLURGY
C07K2319/71
CHEMISTRY; METALLURGY
C12N2800/80
CHEMISTRY; METALLURGY
C12N15/90
CHEMISTRY; METALLURGY
C12N15/63
CHEMISTRY; METALLURGY
International classification
C12N9/22
CHEMISTRY; METALLURGY
C12N15/10
CHEMISTRY; METALLURGY
C12N15/11
CHEMISTRY; METALLURGY
C12N15/63
CHEMISTRY; METALLURGY
Abstract
Engineered CRISPR-Cas9 nucleases with altered and improved PAM specificities and their use in genomic engineering, epigenomic engineering, and genome targeting.
Claims
1.-30. (canceled)
31. A complex comprising: a catalytically inactive Streptococcus pyogenes Cas9 (SpCas9) protein, comprising an amino acid sequence that has at least 90% sequence identity to SEQ ID NO: 1, with a mutation at D1135 and optionally at one or more of the following positions: G1104, S1109, L1111, S1136, G1218, N1317, R1335, T1337, and a guide RNA having a region complementary to a selected portion of the genome of the cell, wherein the complex can interact with a guide RNA and a target DNA.
32. The complex of claim 31, wherein: the mutation at D1135 is selected from the group consisting of: D1135V; D1135E; D1135N; and D1135Y.
33. The complex of claim 31, wherein the mutations are: (i) D1135E (D1135E variant); (ii) D1135V/R1335Q/T1337R (VQR variant); (iii) D1135V/G1218R/R1335Q/T1337R (VRQR variant); (iv) D1135E/R1335Q/T1337R (EQR variant); (v) D1135N/G1218R/R1335Q/T1337R (NRQR variant); (vi) D1135Y/G1218R/R1335Q/T1337R (YRQR variant); (vii) G1104K/D1135V/G1218R/R1335Q/T1337R (KVRQR variant); (viii) S1109T/D1135V/G1218R/R1335Q/T1337R (TVRQR variant); (ix) L1111H/D1135V/G1218R/R1335Q/T1337R (HVRQR variant); (x) D1135V/S1136N/G1218R/R1335Q/T1337R (VNRQR variant); (xi) D1135V/G1218R/N1317K/R1335Q/T1337R (VRKQR variant); or (xii) D1135V/G1218R/R1335E/T1337R (VRER variant).
34. The complex of claim 31, further comprising one or more mutations that decrease nuclease activity selected from the group consisting of mutations at D10, E762, D839, H983, or D986; and at H840 or N863.
35. The complex of claim 34, wherein the mutations are: (i) D10A or D10N, and (ii) H840A, H840N, or H840Y.
36. The complex of claim 31, comprising an amino acid sequence that has at least 95% sequence identity to the amino acid sequence of SEQ ID NO:1, with a mutation at D1135 and optionally at one or more of the following positions: G1104, S1109, L1111, S1136, G1218, N1317, R1335, T1337.
37. The complex of claim 31, comprising the amino acid sequence of SEQ ID NO:1, with a mutation at D1135, G1104, S1109, L1111, S1136, G1218, N1317, R1335 and T1337.
38. The complex of claim 31, wherein the complex comprises one or more mutations at G1104, S1109, L1111, S1136, G1218, N1317, R1335 or T1337.
39. The complex of claim 38, wherein the complex comprises one or more mutations selected from the group consisting of: G1104K; S1109T; L1111H; S1136N; G1218R; N1317K; R1335E; R1335Q and T1337R.
40. The complex of claim 31, wherein the mutations are D1135V, G1218R, R1335E and T1337R (VRER variant).
41. The complex of claim 31, wherein the mutations are D1135V, R1335Q and T1337R (VQR variant).
42. The complex of claim 31, wherein the mutations are D1135E, R1335Q and T1337R (EQR variant).
43. The complex of claim 31, wherein the SpCas9 is fused to a heterologous functional domain, with an optional intervening linker, wherein the linker does not interfere with activity of the fusion protein.
44. The complex of claim 43, wherein the heterologous functional domain is a transcriptional activation domain.
45. The complex of claim 44, wherein the transcriptional activation domain is from VP64 or NF-κB p65.
46. The complex of claim 43, wherein the heterologous functional domain is a transcriptional silencer or transcriptional repression domain.
47. The complex of claim 46, wherein the transcriptional repression domain is a Krueppel-associated box (KRAB) domain, ERF repressor domain (ERD), or mSin3A interaction domain (SID).
48. The complex of claim 46, wherein the transcriptional silencer is Heterochromatin Protein 1 (HP1).
49. The complex of claim 43, wherein the heterologous functional domain is an enzyme that modifies the methylation state of DNA.
50. The complex of claim 49, wherein the enzyme that modifies the methylation state of DNA is a DNA methyltransferase (DNMT) or a ten-eleven translocation (TET) protein.
51. The complex of claim 50, wherein the TET protein is ten-eleven translocation 1 (TET1).
52. The complex of claim 43, wherein the heterologous functional domain is an enzyme that modifies a histone subunit.
53. The complex of claim 43, wherein the enzyme that modifies a histone subunit is a histone acetyltransferase (HAT), histone deacetylase (HDAC), histone methyltransferase (HMT), or histone demethylase.
54. The complex of claim 43, wherein the heterologous functional domain is a biological tether.
55. The complex of claim 54, wherein the biological tether is MS2, Csy4 or lambda N protein.
56. The complex of claim 43, wherein the heterologous functional domain is FokI.
Description
DESCRIPTION OF DRAWINGS
TABLE-US-00001
FIG | Name | SEQ ID NO | Description
5A | BPK764 | 7 | T7-humanSpCas9-NLS-3xFLAG-T7-BsaIcassette-SpgRNA. T7 promoters: nts 1-17 and 4360-4376; human codon optimized S. pyogenes Cas9 88-4224; Nuclear Localization Signal (NLS) (CCCAAGAAGAAGAGGAAAGTC (SEQ ID NO: 650)) at nts 4198-4218, 3xFLAG tag 4225-4290, BsaI sites 4379-4384 and 4427-4432, gRNA (GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 651)) 4434-4509, T7 terminator 4252-4572 of SEQ ID NO: 7
 | MSP712 | 8 | T7-humanSpdCas9(D10A/H840A)-T7-BsaIcassette-SpgRNA. T7 promoters at nts 1-17 and 4360-4376, human codon optimized S. pyogenes Cas9 88-4293, modified codons at 115-117 and 2605-2607, bold and underlined, NLS (CCCAAGAAGAAGAGGAAAGTC (SEQ ID NO: 650)) at nts 4198-4218, 3xFLAG tag 4225-4290, BsaI sites 4379-4384 and 4427-4432, gRNA (GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC (SEQ ID NO: 651)) at nts 4434-4509, T7 terminator 4252-4572 of SEQ ID NO: 8
5B | BPK2169 | 9 | T7-humanSt1Cas9-NLS-T7-BspMIcassette-St1gRNA. T7 promoters at 1-17 and 3555-3571, human codon optimized S. thermophilus1 Cas9 at 88-3489, NLS at 3454 to 3486; BspMI sites at 3577-3582 and 3625-3630, gRNA at 3635-3763, T7 terminator 3778-3825 of SEQ ID NO: 9.
5C | BPK2101 | 10 | T7-humanSaCas9-NLS-3xFLAG-T7-BsaIcassette-SagRNA. T7 promoters at 1-17 and 3418-3434, human codon optimized S. aureus Cas9 at 88-3352, NLS at 3256-3276, 3xFLAG tag at 3283-3348, BsaI sites at 3437-3442 and 3485-3490, gRNA at 3492-3616, T7 terminator at 3627-2674 of SEQ ID NO: 10.
5D | p11-lacY-wtx1.sup.17 | - | BAD-ccDB-Amp.sup.R-AraC-lacY(A177C)
5E | JDS246 | 11 | CMV-T7-humanSpCas9-NLS-3xFLAG. ADDGENE ID: 43861. Human codon optimized S. pyogenes Cas9 1-4206, NLS at 4111-4131, 3xFLAG tag at 4138-4203 of SEQ ID NO: 11.
 | MSP469 | 12 | CMV-T7-humanSpCas9(D1135V/R1335Q/T1337R)-NLS-3xFLAG (VQR variant). Human codon optimized S. pyogenes Cas9 1-4206, modified codons at 3403-3405, 4003-4005, and 4009-4011, NLS at 411-4131, 3xFLAG tag 4138-4203 of SEQ ID NO: 12.
 | MSP680 | 13 | CMV-T7-humanSpCas9(D1135E/R1335Q/T1337R)-NLS-3xFLAG (EQR variant). Human codon optimized S. pyogenes Cas9 1-4206, modified codons at 3403-3405, 3652-3654, 4003-4005, and 4009-4011, NLS at 411-4131, 3xFLAG tag 4138-4203 of SEQ ID NO: 13.
 | MSP1101 | 14 | CMV-T7-humanSpCas9(D1135V/G1218R/R1335E/T1337R)-NLS-3xFLAG (VRER variant). Human codon optimized S. pyogenes Cas9 1-4206, modified codons at 3403-3405, 4003-4005, and 4009-4011, NLS at 411-4131, 3xFLAG tag 4138-4203 of SEQ ID NO: 14
 | MSP977 | 15 | CMV-T7-humanSpCas9(D1135E)-NLS-3xFLAG. Human codon optimized S. pyogenes Cas9 1-4206, modified codons at 3403-3405, NLS at 411-4131, 3xFLAG tag 4138-4203 of SEQ ID NO: 15.
5F | MSP1393 | 16 | CAG-humanSt1Cas9-NLS. Human codon optimized S. thermophilus1 Cas9 1-3402, NLS at 3367-3399 of SEQ ID NO: 16.
5G | BPK2139 | 17 | CAG-humanSaCas9-NLS-3xFLAG. Human codon optimized S. aureus Cas9 1-3195, NLS 3169-3189, 3xFLAG tag 3196-3261 of SEQ ID NO: 17.
5H | BPK1520 | 18 | U6-BsmBIcassette-SpgRNA. U6 promoter at 1-318, BsmBI sites at 320-325 and 333-338, S. pyogenes gRNA 339-422, U6 terminator 416-422 of SEQ ID NO: 18.
5I | BPK2301 | 19 | U6-BsmBIcassette-St1gRNA. U6 promoter 1-318, BsmBI sites at 320-325 and 333-338, S. thermophilus1 gRNA 340-471, U6 terminator 464-471 of SEQ ID NO: 19.
5J | VVT1 | 20 | U6-BsmBIcassette-SagRNA. U6 promoter 1-318, BsmBI sites at 320-325 and 333-338, S. aureus gRNA 340-466, U6 terminator 459-466 of SEQ ID NO: 20.
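The modified-codon coordinates listed in the table follow from simple codon arithmetic: if a coding sequence starts at nucleotide s, residue n occupies nucleotides s + 3(n - 1) through s + 3(n - 1) + 2. The following Python sketch illustrates this under stated assumptions; the function name `codon_span` is hypothetical and is not part of the disclosure.

```python
def codon_span(residue: int, cds_start: int = 1) -> tuple[int, int]:
    """Return the 1-based (first, last) nucleotide positions of a codon.

    residue   -- 1-based amino acid position in the protein
    cds_start -- 1-based nucleotide position at which the coding sequence begins
    """
    if residue < 1 or cds_start < 1:
        raise ValueError("positions are 1-based and must be positive")
    first = cds_start + 3 * (residue - 1)
    return first, first + 2


# Coding sequence starting at nucleotide 1 (the CMV-T7 constructs).
for res in (1135, 1335, 1337):
    print(res, codon_span(res))

# Coding sequence starting at nucleotide 88 (MSP712, D10A/H840A).
for res in (10, 840):
    print(res, codon_span(res, cds_start=88))
```

For D1135, R1335, and T1337 the sketch yields 3403-3405, 4003-4005, and 4009-4011, and for D10 and H840 in MSP712 it yields 115-117 and 2605-2607, in agreement with the coordinates given in the table.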
DETAILED DESCRIPTION
[0059] Although CRISPR-Cas9 nucleases are widely used for genome editing.sup.1-4, the range of sequences that Cas9 can cleave is constrained by the need for a specific protospacer adjacent motif (PAM) in the target site.sup.5, 6. For example, SpCas9, the most robust and widely used Cas9 to date, primarily recognizes NGG PAMs. As a result, it can often be difficult to target double-stranded breaks (DSBs) with the precision that is necessary for various genome editing applications. In addition, imperfect PAM recognition by Cas9 can lead to the creation of unwanted off-target mutations.sup.7,8. The ability to evolve Cas9 derivatives with purposefully altered or improved PAM specificities would address these limitations but, to the present inventors' knowledge, no such Cas9 variants have been described.
[0060] A potential strategy for improving the targeting range of orthogonal Cas9s that recognize extended PAMs is to alter their PAM recognition specificities. As described herein, PAM recognition specificity of SpCas9 can be altered using a combination of structure-guided design and directed evolution performed with a bacterial cell-based selection system; see Examples 1 and 2. Also described herein are variants that have been evolved to have relaxed or partially relaxed specificities for certain positions within the PAM; see Example 3. These variants expand the utility of Cas9 orthologues that specify longer PAM sequences.
[0061] Engineered Cas9 Variants with Altered PAM Specificity
[0062] The SpCas9 variants engineered in this study greatly increase the sites accessible by wild-type SpCas9, further enhancing the opportunities to use the CRISPR-Cas9 platform to practice efficient HDR, to target NHEJ-mediated indels to small genetic elements, and to exploit the requirement for a PAM to distinguish between two different alleles in the same cell. The altered PAM specificity SpCas9 variants can efficiently disrupt endogenous gene sites that are not currently targetable by SpCas9 in both zebrafish embryos and human cells, suggesting that they will work in a variety of different cell types and organisms. Importantly, GUIDE-seq experiments show that the global profiles of the VQR and VRER SpCas9 variants are similar to or better than those observed with wild-type SpCas9. In addition, the improved specificity D1135E variant that we identified and characterized provides a superior alternative to the widely used wild-type SpCas9. D1135E has similar activity to wild-type SpCas9 on sites with canonical NGG PAMs but reduces genome-wide cleavage of off-target sites bearing mismatched spacer sequences and either canonical or non-canonical PAMs.
[0063] All of the SpCas9 and SaCas9 variants described herein can be rapidly incorporated into existing and widely used vectors, e.g., by simple site-directed mutagenesis, and because they require only a small number of mutations contained within the PAM-interacting domain, the variants should also work with other previously described improvements to the SpCas9 platform (e.g., truncated sgRNAs (Tsai et al., Nat Biotechnol 33, 187-197 (2015); Fu et al., Nat Biotechnol 32, 279-284 (2014)), nickase mutations (Mali et al., Nat Biotechnol 31, 833-838 (2013); Ran et al., Cell 154, 1380-1389 (2013)), dimeric FokI-dCas9 fusions (Guilinger et al., Nat Biotechnol 32, 577-582 (2014); Tsai et al., Nat Biotechnol 32, 569-576 (2014))).
[0064] Beyond the mutations to R1335 that presumably contact the 3.sup.rd PAM base position, the SpCas9 variants evolved in this study bear amino acid substitutions at D1135, G1218, and T1337, all of which are located near or adjacent to residues that make direct or indirect contacts to the 3.sup.rd PAM position in the SpCas9-PAM structure but do not themselves mediate contacts with the PAM bases (Anders et al., Nature 513, 569-573 (2014)).
[0065] The present results clearly establish the feasibility of engineering Cas9 nucleases with altered PAM specificities. Characterization of additional Cas9 orthologues (Esvelt et al., Nat Methods 10, 1116-1121 (2013); Fonfara et al., Nucleic Acids Res 42, 2577-2590 (2014)) or generation of domain-swapped Cas9 chimeras (Nishimasu et al., Cell. 156(5):935-49 (2014)) as previously described also provide potential avenues for targeting different PAMs. The engineering strategy delineated herein can also be performed with such orthologues or synthetic hybrid Cas9s to further diversify the range of targetable PAMs. St1Cas9 and SaCas9 make particularly attractive frameworks for future engineering efforts given their smaller sizes relative to SpCas9 and our demonstration of their robust genome editing activities in our bacterial selection systems and in human cells.
[0066] Our results strongly suggested that R1015 in wild-type SaCas9 contacts the G in the third PAM position. Without wishing to be bound by theory, the R1015H substitution may remove this contact and relax specificity at the third position; however, loss of the R1015 to G contact could also conceivably reduce the energy associated with target site binding, which may explain why the R1015H mutation alone is not sufficient for robust activity at NNNRRT sites in human cells. Because the E782K and N968K substitutions both add positive charge, it is possible that they may make non-specific interactions with the DNA phosphate backbone to compensate energetically for the loss of the R1015 to guanine contact.
[0067] The genetic approach described here does not require structural information and therefore should be applicable to many other Cas9 orthologues. The only requirement to evolve Cas9 nucleases with broadened PAM specificities is that they function in a bacterial-based selection. While previous studies demonstrated that PAM recognition can be altered by swapping the PAM-interacting domains of highly related Cas9 orthologues (Nishimasu et al., Cell (2014)), it remains to be determined whether this strategy is generalizable or effective when using more divergent orthologues. By contrast, the evolution strategies we have described herein can be used to engineer PAM recognition specificities beyond those encoded within naturally occurring Cas9 orthologues. This overall strategy can be employed to expand the targeting range and extend the utility of the numerous Cas9 orthologues that exist in nature.
[0068] SpCas9 Variants with Altered Specificity
[0069] Thus, provided herein are SpCas9 variants. The SpCas9 wild type sequence is as follows:
TABLE-US-00002 (SEQ ID NO: 1)
        10         20         30         40         50         60
MDKKYSIGLD IGTNSVGWAV ITDEYKVPSK KFKVLGNTDR HSIKKNLIGA LLFDSGETAE
        70         80         90        100        110        120
ATRLKRTARR RYTRRKNRIC YLQEIFSNEM AKVDDSFFHR LEESFLVEED KKHERHPIFG
       130        140        150        160        170        180
NIVDEVAYHE KYPTIYHLRK KLVDSTDKAD LRLIYLALAH MIKFRGHFLI EGDLNPDNSD
       190        200        210        220        230        240
VDKLFIQLVQ TYNQLFEENP INASGVDAKA ILSARLSKSR RLENLIAQLP GEKKNGLFGN
       250        260        270        280        290        300
LIALSLGLTP NFKSNFDLAE DAKLQLSKDT YDDDLDNLLA QIGDQYADLF LAAKNLSDAI
       310        320        330        340        350        360
LLSDILRVNT EITKAPLSAS MIKRYDEHHQ DLTLLKALVR QQLPEKYKEI FFDQSKNGYA
       370        380        390        400        410        420
GYIDGGASQE EFYKFIKPIL EKMDGTEELL VKLNREDLLR KQRTFDNGSI PHQIHLGELH
       430        440        450        460        470        480
AILRRQEDFY PFLKDNREKI EKILTFRIPY YVGPLARGNS RFAWMTRKSE ETITPWNFEE
       490        500        510        520        530        540
VVDKGASAQS FIERMTNFDK NLPNEKVLPK HSLLYEYFTV YNELTKVKYV TEGMRKPAFL
       550        560        570        580        590        600
SGEQKKAIVD LLFKTNRKVT VKQLKEDYFK KIECFDSVEI SGVEDRFNAS LGTYHDLLKI
       610        620        630        640        650        660
IKDKDFLDNE ENEDILEDIV LTLTLFEDRE MIEERLKTYA HLFDDKVMKQ LKRRRYTGWG
       670        680        690        700        710        720
RLSRKLINGI RDKQSGKTIL DFLKSDGFAN RNFMQLIHDD SLTFKEDIQK AQVSGQGDSL
       730        740        750        760        770        780
HEHIANLAGS PAIKKGILQT VKVVDELVKV MGRHKPENIV IEMARENQTT QKGQKNSRER
       790        800        810        820        830        840
MKRIEEGIKE LGSQILKEHP VENTQLQNEK LYLYYLQNGR DMYVDQELDI NRLSDYDVDH
       850        860        870        880        890        900
IVPQSFLKDD SIDNKVLTRS DKNRGKSDNV PSEEVVKKMK NYWRQLLNAK LITQRKFDNL
       910        920        930        940        950        960
TKAERGGLSE LDKAGFIKRQ LVETRQITKH VAQILDSRMN TKYDENDKLI REVKVITLKS
       970        980        990       1000       1010       1020
KLVSDFRKDF QFYKVREINN YHHAHDAYLN AVVGTALIKK YPKLESEFVY GDYKVYDVRK
      1030       1040       1050       1060       1070       1080
MIAKSEQEIG KATAKYFFYS NIMNFFKTEI TLANGEIRKR PLIETNGETG EIVWDKGRDF
      1090       1100       1110       1120       1130       1140
ATVRKVLSMP QVNIVKKTEV QTGGFSKESI LPKRNSDKLI ARKKDWDPKK YGGFDSPTVA
      1150       1160       1170       1180       1190       1200
YSVLVVAKVE KGKSKKLKSV KELLGITIME RSSFEKNPID FLEAKGYKEV KKDLIIKLPK
      1210       1220       1230       1240       1250       1260
YSLFELENGR KRMLASAGEL QKGNELALPS KYVNFLYLAS HYEKLKGSPE DNEQKQLFVE
      1270       1280       1290       1300       1310       1320
QHKHYLDEII EQISEFSKRV ILADANLDKV LSAYNKHRDK PIREQAENII HLFTLTNLGA
      1330       1340       1350       1360
PAAFKYFDTT IDRKRYTSTK EVLDATLIHQ SITGLYETRI DLSQLGGD
[0070] The SpCas9 variants described herein can include mutations at one or more of the following positions: D1135, G1218, R1335, T1337 (or at positions analogous thereto). In some embodiments, the SpCas9 variants include one or more of the following mutations: D1135V; D1135E; G1218R; R1335E; R1335Q; and T1337R. In some embodiments, the SpCas9 variants are at least 80%, e.g., at least 85%, 90%, or 95% identical to the amino acid sequence of SEQ ID NO:1, e.g., have up to 5%, 10%, 15%, or 20% of the residues of SEQ ID NO:1 replaced, e.g., with conservative mutations. In preferred embodiments, the variant retains desired activity of the parent, e.g., the nuclease activity (except where the parent is a nickase or a dead Cas9), and/or the ability to interact with a guide RNA and target DNA.
[0071] To determine the percent identity of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). The length of a reference sequence aligned for comparison purposes is at least 80% of the length of the reference sequence, and in some embodiments is at least 90% or 100%. The nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position (as used herein nucleic acid “identity” is equivalent to nucleic acid “homology”). The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences. Percent identity between two polypeptides or nucleic acid sequences is determined in various ways that are within the skill in the art, for instance, using publicly available computer software such as Smith Waterman Alignment (Smith, T. F. and M. S. Waterman (1981) J Mol Biol 147:195-7); “BestFit” (Smith and Waterman, Advances in Applied Mathematics, 482-489 (1981)) as incorporated into GeneMatcher Plus™, Schwartz and Dayhoff (1979) Atlas of Protein Sequence and Structure, Dayhoff, M. O., Ed, pp 353-358; BLAST program (Basic Local Alignment Search Tool; Altschul, S. F., W. Gish, et al. (1990) J Mol Biol 215: 403-10), BLAST-2, BLAST-P, BLAST-N, BLAST-X, WU-BLAST-2, ALIGN, ALIGN-2, CLUSTAL, or Megalign (DNASTAR) software. In addition, those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the length of the sequences being compared. In general, for proteins or nucleic acids, the length of comparison can be any length, up to and including full length (e.g., 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 100%). For purposes of the present compositions and methods, at least 80% of the full length of the sequence is aligned using the BLAST algorithm and the default parameters.
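As an illustration of the comparison step described above, the following Python sketch computes percent identity from two sequences that have already been aligned (for example, by one of the programs named above). It is a minimal sketch under stated assumptions, not the method of any particular program: the function name `percent_identity` is hypothetical, gaps are assumed to be written as "-", and columns containing a gap in one sequence are assumed to count as non-identical compared positions, which is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: a column with a gap in exactly one sequence counts as a
    compared, non-identical position; a column with gaps in both is ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # gap in both rows carries no information
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared


# First ten residues of SEQ ID NO: 1 against a copy with one substitution.
print(percent_identity("MDKKYSIGLD", "MDKKYSVGLD"))
```

With one substitution in ten aligned residues, nine of the ten compared positions are identical, so the sketch reports 90% identity for this toy pair.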
[0072] For purposes of the present invention, the comparison of sequences and determination of percent identity between two sequences can be accomplished using a BLOSUM 62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
[0073] Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine.
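The groups listed above can be encoded directly to check whether a given substitution is conservative in the sense of this paragraph. The sketch below is illustrative only; the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and the one-letter codes are the standard amino acid abbreviations.

```python
# Conservative substitution groups, transcribed from the paragraph above
# using standard one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    {"G", "A"},            # glycine, alanine
    {"V", "I", "L"},       # valine, isoleucine, leucine
    {"D", "E", "N", "Q"},  # aspartic acid, glutamic acid, asparagine, glutamine
    {"S", "T"},            # serine, threonine
    {"K", "R"},            # lysine, arginine
    {"F", "Y"},            # phenylalanine, tyrosine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues differ and belong to the same group."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


for wild_type, new in (("D", "E"), ("D", "V"), ("R", "Q"), ("T", "R")):
    print(f"{wild_type}->{new}", is_conservative(wild_type, new))
```

Under these groups, the D to E change found in the D1135E variant is conservative, whereas D to V, R to Q, and T to R are not.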
[0074] In some embodiments, the SpCas9 variants include one of the following sets of mutations: D1135V/R1335Q/T1337R (VQR variant); D1135V/G1218R/R1335Q/T1337R (VRQR variant); D1135E/R1335Q/T1337R (EQR variant); or D1135V/G1218R/R1335E/T1337R (VRER variant).
[0075] In some embodiments, the SpCas9 variants also include one of the following mutations, which reduce or destroy the nuclease activity of the Cas9: D10, E762, D839, H983, or D986 and H840 or N863, e.g., D10A/D10N and H840A/H840N/H840Y, to render the nuclease portion of the protein catalytically inactive; substitutions at these positions could be alanine (as they are in Nishimasu et al., Cell 156, 935-949 (2014)), or other residues, e.g., glutamine, asparagine, tyrosine, serine, or aspartate, e.g., E762Q, H983N, H983Y, D986N, N863D, N863S, or N863H (see WO 2014/152432). In some embodiments, the variant includes mutations at D10A or H840A (which creates a single-strand nickase), or mutations at D10A and H840A (which abrogates nuclease activity; this mutant is known as dead Cas9 or dCas9).
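The mutation notation used above (wild-type residue, 1-based position, replacement residue, with sets joined by "/", e.g., D1135V/R1335Q/T1337R or D10A/H840A) can be applied to a protein sequence programmatically. The following Python sketch is illustrative only; the function name `apply_mutations` is hypothetical, and the example uses just the first ten residues of SEQ ID NO: 1 to stay short.

```python
import re

# One token such as "D1135V": wild-type residue, position, replacement.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(sequence: str, mutations: str) -> str:
    """Apply a '/'-separated mutation set to a protein sequence.

    Raises ValueError if a token cannot be parsed, a position is out of
    range, or the stated wild-type residue does not match the sequence.
    """
    residues = list(sequence)
    for token in mutations.split("/"):
        match = MUTATION.match(token.strip())
        if match is None:
            raise ValueError(f"cannot parse mutation {token!r}")
        wild_type = match.group(1)
        position = int(match.group(2))
        replacement = match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# First ten residues of SEQ ID NO: 1; D10A is one of the mutations above.
print(apply_mutations("MDKKYSIGLD", "D10A"))
```

Checking the stated wild-type residue against the sequence guards against off-by-one numbering errors, which is useful when positions are taken from a reference such as SEQ ID NO: 1 but applied to a tagged or truncated construct.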
[0076] Also provided herein are SaCas9 variants. The SaCas9 wild type sequence is as follows:
TABLE-US-00003 (SEQ ID NO: 2)
        10         20         30         40         50
MKRNYILGLD IGITSVGYGI IDYETRDVID AGVRLFKEAN VENNEGRRSK
        60         70         80         90        100
RGARRLKRRR RHRIQRVKKL LFDYNLLTDH SELSGINPYE ARVKGLSQKL
       110        120        130        140        150
SEEEFSAALL HLAKRRGVHN VNEVEEDTGN ELSTKEQISR NSKALEEKYV
       160        170        180        190        200
AELQLERLKK DGEVRGSINR FKTSDYVKEA KQLLKVQKAY HQLDQSFIDT
       210        220        230        240        250
YIDLLETRRT YYEGPGEGSP FGWKDIKEWY EMLMGHCTYF PEELRSVKYA
       260        270        280        290        300
YNADLYNALN DLNNLVITRD ENEKLEYYEK FQIIENVFKQ KKKPTLKQIA
       310        320        330        340        350
KEILVNEEDI KGYRVTSTGK PEFTNLKVYH DIKDITARKE IIENAELLDQ
       360        370        380        390        400
IAKILTIYQS SEDIQEELTN LNSELTQEEI EQISNLKGYT GTHNLSLKAI
       410        420        430        440        450
NLILDELWHT NDNQIAIFNR LKLVPKKVDL SQQKEIPTTL VDDFILSPVV
       460        470        480        490        500
KRSFIQSIKV INAIIKKYGL PNDIIIELAR EKNSKDAQKM INEMQKRNRQ
       510        520        530        540        550
TNERIEEIIR TTGKENAKYL IEKIKLHDMQ EGKCLYSLEA IPLEDLLNNP
       560        570        580        590        600
FNYEVDHIIP RSVSFDNSFN NKVLVKQEEN SKKGNRTPFQ YLSSSDSKIS
       610        620        630        640        650
YETFKKHILN LAKGKGRISK TKKEYLLEER DINRFSVQKD FINRNLVDTR
       660        670        680        690        700
YATRGLMNLL RSYFRVNNLD VKVKSINGGF TSFLRRKWKF KKERNKGYKH
       710        720        730        740        750
HAEDALIIAN ADFIFKEWKK LDKAKKVMEN QMFEEKQAES MPEIETEQEY
       760        770        780        790        800
KEIFITPHQI KHIKDFKDYK YSHRVDKKPN RELINDTLYS TRKDDKGNTL
       810        820        830        840        850
IVNNLNGLYD KDNDKLKKLI NKSPEKLLMY HHDPQTYQKL KLIMEQYGDE
       860        870        880        890        900
KNPLYKYYEE TGNYLTKYSK KDNGPVIKKI KYYGNKLNAH LDITDDYPNS
       910        920        930        940        950
RNKVVKLSLK PYRFDVYLDN GVYKFVTVKN LDVIKKENYY EVNSKCYEEA
       960        970        980        990       1000
KKLKKISNQA EFIASFYNND LIKINGELYR VIGVNNDLLN RIEVNMIDIT
      1010       1020       1030       1040       1050
YREYLENMND KRPPRIIKTI ASKTQSIKKY STDILGNLYE VKSKKHPQII
KKG
[0077] The SaCas9 variants described herein include mutations at one or more of the following positions: E782, N968, and/or R1015 (or at positions analogous thereto). In some embodiments, the variants include one or more of the following mutations: R1015Q, R1015H, E782K, N968K, E735K, K929R, A1021T, K1044N. In some embodiments, the SaCas9 variants include mutations E782K, K929R, N968K, and R1015X, wherein X is any amino acid other than R. In some embodiments, the SaCas9 variants are at least 80%, e.g., at least 85%, 90%, or 95% identical to the amino acid sequence of SEQ ID NO:2, e.g., have up to 5%, 10%, 15%, or 20% of the residues of SEQ ID NO:2 replaced, e.g., with conservative mutations. In preferred embodiments, the variant retains desired activity of the parent, e.g., the nuclease activity (except where the parent is a nickase or a dead Cas9), and/or the ability to interact with a guide RNA and target DNA.
[0078] In some embodiments, the SaCas9 variants also include one of the following mutations, which may reduce or destroy the nuclease activity of the SaCas9: D10A, D556A, H557A, N580A, e.g., D10A/H557A and/or D10A/D556A/H557A/N580A, to render the nuclease portion of the protein catalytically inactive; substitutions at these positions could be alanine (as they are in Nishimasu et al., Cell 156, 935-949 (2014)), or other residues, e.g., glutamine, asparagine, tyrosine, serine, or aspartate. In some embodiments, the variant includes mutations at D10A, D556A, H557A, or N580A (which may create a single-strand nickase), or mutations at D10A/H557A and/or D10A/D556A/H557A/N580A (which may abrogate nuclease activity by analogy to SpCas9; these are referred to as dead Cas9 or dCas9).
[0079] Also provided herein are isolated nucleic acids encoding the SpCas9 and/or SaCas9 variants, vectors comprising the isolated nucleic acids, optionally operably linked to one or more regulatory domains for expressing the variant proteins, and host cells, e.g., mammalian host cells, comprising the nucleic acids, and optionally expressing the variant proteins.
[0080] The variants described herein can be used for altering the genome of a cell; the methods generally include expressing the variant proteins in the cells, along with a guide RNA having a region complementary to a selected portion of the genome of the cell. Methods for selectively altering the genome of a cell are known in the art, see, e.g., U.S. Pat. No. 8,697,359; US2010/0076057; US2011/0189776; US2011/0223638; US2013/0130248; WO/2008/108989; WO/2010/054108; WO/2012/164565; WO/2013/098244; WO/2013/176772; US20150050699; US20150045546; US20150031134; US20150024500; US20140377868; US20140357530; US20140349400; US20140335620; US20140335063; US20140315985; US20140310830; US20140310828; US20140309487; US20140304853; US20140298547; US20140295556; US20140294773; US20140287938; US20140273234; US20140273232; US20140273231; US20140273230; US20140271987; US20140256046; US20140248702; US20140242702; US20140242700; US20140242699; US20140242664; US20140234972; US20140227787; US20140212869; US20140201857; US20140199767; US20140189896; US20140186958; US20140186919; US20140186843; US20140179770; US20140179006; US20140170753; Makarova et al., “Evolution and classification of the CRISPR-Cas systems” 9(6) Nature Reviews Microbiology 467-477 (1-23) (June 2011); Wiedenheft et al., “RNA-guided genetic silencing systems in bacteria and archaea” 482 Nature 331-338 (Feb. 16, 2012); Gasiunas et al., “Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria” 109(39) Proceedings of the National Academy of Sciences USA E2579-E2586 (Sep. 4, 2012); Jinek et al., “A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity” 337 Science 816-821 (Aug. 17, 2012); Carroll, “A CRISPR Approach to Gene Targeting” 20(9) Molecular Therapy 1658-1660 (September 2012); U.S. Appl. No. 
61/652,086, filed May 25, 2012; Al-Attar et al., Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs): The Hallmark of an Ingenious Antiviral Defense Mechanism in Prokaryotes, Biol Chem. (2011) vol. 392, Issue 4, pp. 277-289; Hale et al., Essential Features and Rational Design of CRISPR RNAs That Function With the Cas RAMP Module Complex to Cleave RNAs, Molecular Cell, (2012) vol. 45, Issue 3, 292-302.
[0081] The variant proteins described herein can be used in place of the SpCas9 proteins described in the foregoing references with guide RNAs that target sequences that have PAM sequences according to the following Table 4.
TABLE-US-00004 TABLE 4
Variant protein | Stronger PAM | Weaker PAM
SpCas9-D1135E | NGG | NAG, NGA, and NNGG
SpCas9-VQR | NGAN and NGCG | NGGG, NGTG, and NAAG
SpCas9-VRQR | NGAN |
SpCas9-EQR | NGAG | NGAT, NGAA, and NGCG
SpCas9-VRER | NGCG | NGCA, NGCC, and NGCT
SaCas9-KKH | NNNRRT |
SaCas9-KKQ | NNRRRT (SEQ ID NO: 45) | NNNRRT
SaCas9-KKE | NNCRRT (SEQ ID NO: 47) | NNNRRT
SaCas9-(KKL or KKM) | NNTRRT (SEQ ID NO: 48) | NNNRRT
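The PAM entries in Table 4 use IUPAC nucleotide codes, in which N denotes any base and R denotes A or G. The following Python sketch shows one way to look up which variants list a given PAM as a stronger PAM. It is illustrative only: the names `IUPAC`, `STRONGER_PAMS`, `matches`, and `variants_for` are hypothetical, only a subset of the Table 4 rows is transcribed, and matching is done against the leading bases of the supplied site.

```python
# IUPAC codes needed for the PAM patterns in Table 4.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}

# "Stronger PAM" column of Table 4 (subset of rows, for illustration).
STRONGER_PAMS = {
    "SpCas9-D1135E": ["NGG"],
    "SpCas9-VQR": ["NGAN", "NGCG"],
    "SpCas9-VRQR": ["NGAN"],
    "SpCas9-EQR": ["NGAG"],
    "SpCas9-VRER": ["NGCG"],
    "SaCas9-KKH": ["NNNRRT"],
}


def matches(pattern: str, site: str) -> bool:
    """True if the first len(pattern) bases of site fit the IUPAC pattern."""
    site = site.upper()
    if len(site) < len(pattern):
        return False
    # zip stops at the end of the pattern, so extra site bases are ignored.
    return all(base in IUPAC[code] for code, base in zip(pattern, site))


def variants_for(site: str) -> list[str]:
    """Variants whose stronger-PAM patterns match the given PAM sequence."""
    return [
        name
        for name, patterns in STRONGER_PAMS.items()
        if any(matches(p, site) for p in patterns)
    ]


print(variants_for("TGAG"))
print(variants_for("AGCG"))
```

A lookup of this kind only reflects the PAM preferences tabulated above; it says nothing about spacer sequence, chromatin context, or the activity observed at any particular site.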
[0082] In addition, the variants described herein can be used in fusion proteins in place of the wild-type Cas9 or other Cas9 mutations (such as the dCas9 or Cas9 nickase described above) as known in the art, e.g., a fusion protein with a heterologous functional domain as described in WO 2014/124284. For example, the variants, preferably comprising one or more nuclease-reducing or killing mutations, can be fused on the N or C terminus of the Cas9 to a transcriptional activation domain. Other heterologous functional domains as are known in the art can also be used, e.g., transcriptional repressors (e.g., KRAB, ERD, SID, and others, e.g., amino acids 473-530 of the ets2 repressor factor (ERF) repressor domain (ERD), amino acids 1-97 of the KRAB domain of KOX1, or amino acids 1-36 of the Mad mSIN3 interaction domain (SID); see Beerli et al., PNAS USA 95:14628-14633 (1998)) or silencers such as Heterochromatin Protein 1 (HP1, also known as swi6), e.g., HP1α or HP1β; proteins or peptides that could recruit long non-coding RNAs (lncRNAs) fused to a fixed RNA binding sequence such as those bound by the MS2 coat protein, endoribonuclease Csy4, or the lambda N protein; enzymes that modify the methylation state of DNA (e.g., DNA methyltransferase (DNMT) or TET proteins); or enzymes that modify histone subunits (e.g., histone acetyltransferases (HAT), histone deacetylases (HDAC), histone methyltransferases (e.g., for methylation of lysine or arginine residues) or histone demethylases (e.g., for demethylation of lysine or arginine residues)). A number of sequences for such domains are known in the art, e.g., a domain that catalyzes hydroxylation of methylated cytosines in DNA. Exemplary proteins include the Ten-Eleven-Translocation (TET)1-3 family, enzymes that convert 5-methylcytosine (5-mC) to 5-hydroxymethylcytosine (5-hmC) in DNA.
[0083] Sequences for human TET1-3 are known in the art and are shown in the following table:
TABLE-US-00005 GenBank Accession Nos. Gene Amino Acid Nucleic Acid TET1 NP_085128.2 NM_030625.2 TET2* NP_001120680.1 (var 1) NM_001127208.2 NP_060098.3 (var 2) NM_017628.4 TET3 NP_659430.1 NM_144993.1 *Variant (1) represents the longer transcript and encodes the longer isoform (a). Variant (2) differs in the 5′ UTR and in the 3′ UTR and coding sequence compared to variant 1. The resulting isoform (b) is shorter and has a distinct C-terminus compared to isoform a.
[0084] In some embodiments, all or part of the full-length sequence of the catalytic domain can be included, e.g., a catalytic module comprising the cysteine-rich extension and the 2OGFeDO domain encoded by 7 highly conserved exons, e.g., the Tet1 catalytic domain comprising amino acids 1580-2052, Tet2 comprising amino acids 1290-1905 and Tet3 comprising amino acids 966-1678. See, e.g., FIG. 1 of Iyer et al., Cell Cycle. 2009 Jun. 1; 8(11):1698-710. Epub 2009 Jun. 27, for an alignment illustrating the key catalytic residues in all three Tet proteins, and the supplementary materials thereof for full length sequences (see, e.g., seq 2c); in some embodiments, the sequence includes amino acids 1418-2136 of Tet1 or the corresponding region in Tet2/3.
[0085] Other catalytic modules can be from the proteins identified in Iyer et al., 2009. In some embodiments, the heterologous functional domain is a biological tether, and comprises all or part of (e.g., DNA binding domain from) the MS2 coat protein, endoribonuclease Csy4, or the lambda N protein. These proteins can be used to recruit RNA molecules containing a specific stem-loop structure to a locale specified by the dCas9 gRNA targeting sequences. For example, a dCas9 variant fused to MS2 coat protein, endoribonuclease Csy4, or lambda N can be used to recruit a long non-coding RNA (lncRNA) such as XIST or HOTAIR; see, e.g., Keryer-Bibens et al., Biol. Cell 100:125-138 (2008), that is linked to the Csy4, MS2 or lambda N binding sequence. Alternatively, the Csy4, MS2 or lambda N protein binding sequence can be linked to another protein, e.g., as described in Keryer-Bibens et al., supra, and the protein can be targeted to the dCas9 variant binding site using the methods and compositions described herein. In some embodiments, the Csy4 is catalytically inactive. In some embodiments, the Cas9 variant, preferably a dCas9 variant, is fused to FokI as described in WO 2014/204578.
[0086] In some embodiments, the fusion proteins include a linker between the dCas9 variant and the heterologous functional domains. Linkers that can be used in these fusion proteins (or between fusion proteins in a concatenated structure) can include any sequence that does not interfere with the function of the fusion proteins. In preferred embodiments, the linkers are short, e.g., 2-20 amino acids, and are typically flexible (i.e., comprising amino acids with a high degree of freedom such as glycine, alanine, and serine). In some embodiments, the linker comprises one or more units consisting of GGGS (SEQ ID NO:188) or GGGGS (SEQ ID NO:189), e.g., two, three, four, or more repeats of the GGGS (SEQ ID NO:188) or GGGGS (SEQ ID NO:189) unit. Other linker sequences can also be used.
[0087] Expression Systems
[0088] To use the Cas9 variants described herein, it may be desirable to express them from a nucleic acid that encodes them. This can be performed in a variety of ways. For example, the nucleic acid encoding the Cas9 variant can be cloned into an intermediate vector for transformation into prokaryotic or eukaryotic cells for replication and/or expression. Intermediate vectors are typically prokaryote vectors, e.g., plasmids, or shuttle vectors, or insect vectors, for storage or manipulation of the nucleic acid encoding the Cas9 variant for production of the Cas9 variant. The nucleic acid encoding the Cas9 variant can also be cloned into an expression vector, for administration to a plant cell, animal cell, preferably a mammalian cell or a human cell, fungal cell, bacterial cell, or protozoan cell.
[0089] To obtain expression, a sequence encoding a Cas9 variant is typically subcloned into an expression vector that contains a promoter to direct transcription. Suitable bacterial and eukaryotic promoters are well known in the art and described, e.g., in Sambrook et al., Molecular Cloning, A Laboratory Manual (3d ed. 2001); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 2010). Bacterial expression systems for expressing the engineered protein are available in, e.g., E. coli, Bacillus sp., and Salmonella (Palva et al., 1983, Gene 22:229-235). Kits for such expression systems are commercially available. Eukaryotic expression systems for mammalian cells, yeast, and insect cells are well known in the art and are also commercially available.
[0090] The promoter used to direct expression of a nucleic acid depends on the particular application. For example, a strong constitutive promoter is typically used for expression and purification of fusion proteins. In contrast, when the Cas9 variant is to be administered in vivo for gene regulation, either a constitutive or an inducible promoter can be used, depending on the particular use of the Cas9 variant. In addition, a preferred promoter for administration of the Cas9 variant can be a weak promoter, such as HSV TK or a promoter having similar activity. The promoter can also include elements that are responsive to transactivation, e.g., hypoxia response elements, Gal4 response elements, lac repressor response element, and small molecule control systems such as tetracycline-regulated systems and the RU-486 system (see, e.g., Gossen & Bujard, 1992, Proc. Natl. Acad. Sci. USA, 89:5547; Oligino et al., 1998, Gene Ther., 5:491-496; Wang et al., 1997, Gene Ther., 4:432-441; Neering et al., 1996, Blood, 88:1147-55; and Rendahl et al., 1998, Nat. Biotechnol., 16:757-761).
[0091] In addition to the promoter, the expression vector typically contains a transcription unit or expression cassette that contains all the additional elements required for the expression of the nucleic acid in host cells, either prokaryotic or eukaryotic. A typical expression cassette thus contains a promoter operably linked, e.g., to the nucleic acid sequence encoding the Cas9 variant, and any signals required, e.g., for efficient polyadenylation of the transcript, transcriptional termination, ribosome binding sites, or translation termination. Additional elements of the cassette may include, e.g., enhancers, and heterologous spliced intronic signals.
[0092] The particular expression vector used to transport the genetic information into the cell is selected with regard to the intended use of the Cas9 variant, e.g., expression in plants, animals, bacteria, fungus, protozoa, etc. Standard bacterial expression vectors include plasmids such as pBR322 based plasmids, pSKF, pET23D, and commercially available tag-fusion expression systems such as GST and LacZ.
[0093] Expression vectors containing regulatory elements from eukaryotic viruses are often used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the SV40 early promoter, SV40 late promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.
[0094] The vectors for expressing the Cas9 variants can include RNA Pol III promoters to drive expression of the guide RNAs, e.g., the H1, U6 or 7SK promoters. These human promoters allow for expression of Cas9 variants in mammalian cells following plasmid transfection.
[0095] Some expression systems have markers for selection of stably transfected cell lines such as thymidine kinase, hygromycin B phosphotransferase, and dihydrofolate reductase. High yield expression systems are also suitable, such as using a baculovirus vector in insect cells, with the gRNA encoding sequence under the direction of the polyhedrin promoter or other strong baculovirus promoters.
[0096] The elements that are typically included in expression vectors also include a replicon that functions in E. coli, a gene encoding antibiotic resistance to permit selection of bacteria that harbor recombinant plasmids, and unique restriction sites in nonessential regions of the plasmid to allow insertion of recombinant sequences.
[0097] Standard transfection methods are used to produce bacterial, mammalian, yeast or insect cell lines that express large quantities of protein, which are then purified using standard techniques (see, e.g., Colley et al., 1989, J. Biol. Chem., 264:17619-22; Guide to Protein Purification, in Methods in Enzymology, vol. 182 (Deutscher, ed., 1990)). Transformation of eukaryotic and prokaryotic cells is performed according to standard techniques (see, e.g., Morrison, 1977, J. Bacteriol. 132:349-351; Clark-Curtiss & Curtiss, Methods in Enzymology 101:347-362 (Wu et al., eds, 1983)).
[0098] Any of the known procedures for introducing foreign nucleotide sequences into host cells may be used. These include the use of calcium phosphate transfection, polybrene, protoplast fusion, electroporation, nucleofection, liposomes, microinjection, naked DNA, plasmid vectors, viral vectors, both episomal and integrative, and any of the other well-known methods for introducing cloned genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see, e.g., Sambrook et al., supra). It is only necessary that the particular genetic engineering procedure used be capable of successfully introducing at least one gene into the host cell capable of expressing the Cas9 variant.
[0099] The present invention includes the vectors and cells comprising the vectors.
EXAMPLES
[0100] The invention is further described in the following examples, which do not limit the scope of the invention described in the claims.
[0101] Methods
[0102] The following materials and methods were used in Examples 1 and 2.
[0103] Plasmids and Oligonucleotides
[0104] Schematic maps and DNA sequences for parent constructs used in this study can be found in
TABLE-US-00006 TABLE 1
sequence | description | SEQ ID NO:
Oligos used to generate positive and negative selection plasmids
CtagaGGGCACGGGCAGCTTGCCGGTGGgcatg | top oligo to clone site 1 into the positive selection vector (XbaI/SphI cut p11-lacY-wtx1) | 190
CCCACCGGCAAGCTGCCCGTGCCCt | bottom oligo to clone site 1 into the positive selection vector | 191
CtagaGGTCGCCCTCGAACTTCACCTCGGgcatg | top oligo to clone site 2 into the positive selection vector (XbaI/SphI cut p11-lacY-wtx1) | 192
CCCGAGGTGAAGTTCGAGGGCGACCt | bottom oligo to clone site 2 into the positive selection vector | 193
aattCGGGCACGGGCAGCTTGCCGGTGGgcatg | top oligo to clone site 1 into the negative selection vector (EcoRI/SphI cut p11-lacY-wtx1) | 194
CCCACCGGCAAGCTGCCCGTGCCCg | bottom oligo to clone site 1 into the negative selection vector | 195
aattCGGTCGCCCTCGAACTTCACCTCGGgcatg | top oligo to clone site 2 into the negative selection vector (EcoRI/SphI cut p11-lacY-wtx1) | 196
CCCGAGGTGAAGTTCGAGGGCGACCg | bottom oligo to clone site 2 into the negative selection vector | 197
Oligos used to generate positive and negative selection plasmids
GCAGgaattcGGGCACGGGCAGCTTGCCGGNNNNNNCTNNNGCGCAGGTCACGAGGCATG | top strand oligo for site 1 PAM library, cut with EcoRI once filled in | 198
GCAGgaattCGTCGCCCTCGAACTTCACCTNNNNNNCTNNNGCGCAGGTCACGAGGCATG | top strand oligo for site 2 PAM library, cut with EcoRI once filled in | 199
/5Phos/CCTCGTGACCTGCGC | reverse primer to fill in library oligos | 200
Primers used to amplify site-depletion libraries for sequencing
GATACCGCTCGCCGCAGC | forward primer | 201
CTGCGTTCTGATTTAATCTGTATCAGGC | reverse primer | 202
Primers used for T7E1 experiments
GGAGATGTAAATCACCTCCATCTGA | forward primer targeted to th1 in zebrafish | 203
ATGTTAGCCTACCTCGAAAACCTTC | reverse primer targeted to th1 in zebrafish | 204
CCTGTGCTCTCCTGTTTTTAGGTAT | forward primer targeted to tia1L in zebrafish | 205
AACATGGTAAGAAGCGTGAGTGTTT | reverse primer targeted to tia1L in zebrafish | 206
CAGGCTGTTGAACCGTAGATTTAGT | forward primer targeted to fh in zebrafish | 207
TCCACATGTTTTGAGTTTGAGAGTC | reverse primer targeted to fh in zebrafish | 208
GGAGCAGCTGGTCAGAGGGG | forward primer targeted to EMX1 in U2OS human cells | 209
CCATAGGGAAGGGGGACACTGG | reverse primer targeted to EMX1 in U2OS human cells | 210
GGGCCGGGAAAGAGTTGCTG | forward primer targeted to FANCF in U2OS human cells | 211
GCCCTACATCTGCTCTCCCTCC | reverse primer targeted to FANCF in U2OS human cells | 212
CCAGCACAACTTACTCGCACTTGAC | forward primer targeted to RUNX1 in U2OS human cells | 213
CATCACCAACCCACAGCCAAGG | reverse primer targeted to RUNX1 in U2OS human cells | 214
GATGAGGGCTCCAGATGGCAC | forward primer targeted to VEGFA in U2OS human cells | 215
GAGGAGGGAGCAGGAAAGTGAGG | reverse primer targeted to VEGFA in U2OS human cells | 216
TABLE-US-00007 TABLE 2 Se- quence With Spacer SEQ extend- SEQ Prep length Se- ID ed ID Name Name (nt) quence NO: PAM NO: S. pyogenes gRNAs EGFP NXX gRNAs FYF1320 NGG 20 GGGC 217 GGGC 218 1-20 ACGG ACGG GCAG GCAG CTTG CTTG CCGG CCGG TGGT BPK1345 NGG 20 GTCG 219 GTCG 220 2-20 CCCT CCCT CGAA CGAA CTTC CTTC ACCT ACCT CGGC MSP792 NGG 20 GGTC 221 GGTC 222 3-20 GCCA GCCA CCAT CCAT GGTG GGTG AGCA AGCA AGGG MSP795 NGG 20 GGTC 223 GGTC 224 4-20 AGGG AGGG TGGT TGGT CACG CACG AGGG AGGG TGGG FYF1328 NGG 20 GGTG 225 GGTG 226 5-20 GTGC GTGC AGAT AGAT GAAC GAAC TTCA TTCA GGGT MSP160 NAG 20 GGGT 227 GGGT 228 1-20 GGTG GGTG CCCA CCCA TCCT TCCT GGTC GGTC GAGC MSP161 NAG 20 GACG 229 GACG 230 2-20 TAAA TAAA CGGC CGGC CACA CACA AGTT AGTT CAGC MSP162 NAG 20 GTGC 231 GTGC 232 3-20 AGAT AGAT GAAC GAAC TTCA TTCA GGGT GGGT CAGC MSP163 NAG 20 GGGT 233 GGGT 234 4-20 GGTC GGTC ACGA ACGA GGGT GGGT GGGC GGGC CAGG MSP164 NAA 20 GGTC 235 GGTC 236 1-20 GAGC GAGC TGGA TGGA CGGC CGGC GACG GACG TAAA MSP165 NAA 20 GTCG 237 GTCG 238 2-20 AGCT AGCT GGAC GGAC GGCG GGCG ACGT ACGT AAAC MSP168 NGA 20 GGGG 239 GGGG 240 1-20 TGGT TGGT GCCC GCCC ATCC ATCC TGGT TGGT CGAG MSP366 NGA 20 GCCA 241 GCCA 242 2-20 CCAT CCAT GGTG GGTG AGCA AGCA AGGG AGGG CGAG MSP171 NGA 20 GTCG 243 GTCG 244 3-20 CCGT CCGT CCAG CCAG CTCG CTCG ACCA ACCA GGAT NGXX gRNAs BPK1466 NGA 20 GCAT 245 GCAT 246 4-20 CGCC CGCC CTCG CTCG CCCT CCCT CGCC CGCC GGAC BPK1468 NGA 20 GTTC 247 GTTC 248 5-20 GAGG GAGG GCGA GCGA CACC CACC CTGG CTGG TGAA BPK1468 NGAA 20 GTTC 249. GTTC 250. 1-20 GAGG GAGG GCGA GCGA CACC CACC CTGG CTGG TGAA MSP807 NGAA 20 GTTC 251. GTTC 252. 2-20 ACCA ACCA GGGT GGGT GTCG GTCG CCCT CCCT CGAA BPK1469 NGAA 20 GCAG 253. GCAG 254. 3-20 AAGA AAGA ACGG ACGG CATC CATC AAGG AAGG TGAA MSP787 NGAA 17 GAAG 255. GAAG 256. 3-17 AACG AACG GCAT GCAT CAAG CAAG G GTGA A MSP170 NGAC 20 GCCC 257. GCCC 258. 1-20 ACCC ACCC TCGT TCGT GACC GACC ACCC ACCC TGAC MSP790 NGAC 20 GCCC 259. GCCC 260. 
2-20 TTGC TTGC TCAC TCAC CATG CATG GTGG GTGG CGAC MSP171 NGAT 20 GTCG 261. GTCG 262. 1-20 CCGT CCGT CCAG CCAG CTCG CTCG ACCA ACCA GGAT BPK1979 NGAT 17 GCCG 263. GCCG 264. 1-17 TCCA TCCA GCTC GCTC GACC GACC A AGGA T MSP169 NGAT 20 GTGT 265. GTGT 266. 2-20 CCGG CCGG CGAG CGAG GGCG GGCG AGGG AGGG CGAT BPK1464 NGAT 20 GGGC 267. GGGC 268. 3-20 AGCT AGCT TGCC TGCC GGTG GGTG GTGC GTGC AGAT MSP788 NGAT 19 GGCA 269. GGCA 270. 3-19 GCTT GCTT GCCG GCCG GTGG GTGG TGC TGCA GAT MSP789 NGAT 18 GCAG 271. GCAG 272. 3-18 CTTG CTTG CCGG CCGG TGGT TGGT GC GCAG AT MSP168 NGAG 20 GGGG 273. GGGG 274. 1-20 TGGT TGGT GCCC GCCC ATCC ATCC TGGT TGGT CGAG MSP783 NGAG 19 GGGT 275. GGGT 276. 1-19 GGTG GGTG CCCA CCCA TCCT TCCT GGT GGTC GAG MSP784 NGAG 18 GGTG 277. GGTG 278. 1-18 GTGC GTGC CCAT CCAT CCTG CCTG GT GTCG AG MSP785 NGAG 17 GTGG 279. GTGG 280. 1-17 TGCC TGCC CATC CATC CTGG CTGG T TCGA G MSP366 NGAG 20 GCCA 281. GCCA 282. 2-20 CCAT CCAT GGTG GGTG AGCA AGCA AGGG AGGG CGAG MSP368 NGAG 20 GCCG 283. GCCG 284. 3-20 TAGG TAGG TCAG TCAG GGTG GGTG GTCA GTCA CGAG BPK1974 NGAG 17 GTAG 285. GTAG 286. 3-17 GTCA GTCA GGGT GGGT GGTC GGTC A ACGA G MSP376 NGAG 20 GCTG 287. GCTG 288. 4-20 CCCG CCCG ACAA ACAA CCAC CCAC TACC TACC TGAG BPK1978 NGAG 17 GCCC 289. GCCC 290. 4-17 GACA GACA ACCA ACCA CTAC CTAC C CTGA G MSP1028 NGCA 20 GCGA 291. GCGA 292. 1-20 GGGC GGGC GATG GATG CCAC CCAC CTAC CTAC GGCA MSP1030 NGCA 20 GTGG 293. GTGG 294. 2-20 TCGG TCGG GGTA GGTA GCGG GCGG CTGA CTGA AGCA MSP1032 NGCC 20 GGAG 295. GGAG 296. 1-20 CTGT CTGT TCAC TCAC CGGG CGGG GTGG GTGG TGCC MSP1033 NGCC 20 GAAC 297. GAAC 298. 2-20 TTGT TTGT GGCC GGCC GTTT GTTT ACGT ACGT CGCC MSP1036 NGCT 20 GGTG 299. GGTG 300. 1-20 AACA AACA GCTC GCTC CTCG CTCG CCCT CCCT TGCT MSP1037 NGCT 20 GGTG 301. GGTG 302. 2-20 GTGC GTGC CCAT CCAT CCTG CCTG GTCG GTCG AGCT MSP800 NGCG 20 GCCA 303. GGTG 304. 1-20 CAAG GTGC TTCA CCAT GCGT CCTG GTCC GTCG AGCT MSP801 NGCG 20 GCGT 305. GCCA 306. 
2-20 GTCC CAAG GGCG TTCA AGGG GCGT CGAG GTCC GGCG MSP1360 NGCG 18 GTGT 307. GTGT 308. 2-18 CCGG CCGG CGAG CGAG GGCG GGCG AG AGGG CG MSP802 NGCG 20 GCCC 309. GCCC 310. 3-20 GAAG GAAG GCTA GCTA CGTC CGTC CAGG CAGG AGCG MSP803 NGCG 20 GTCG 311. GTCG 312. 4-20 TCCT TCCT TGAA TGAA GAAG GAAG ATGG ATGG TGCG MSP1366 NGCG 17 GTCC 313. GTCC 314. 4-17 TTGA TTGA AGAA AGAA GATG GATG G GTGC G MSP792 NGGG 20 GGTC 315. GGTC 316. 1-20 GCCA GCCA CCAT CCAT GGTG GGTG AGCA AGCA AGGG MSP794 NGGG 20 GGTG 317. GGTG 318. 2-20 GTCA GTCA CGAG CGAG GGTG GGTG GGCC GGCC AGGG MSP796 NGTG 20 GATC 319. GATC 320. 1-20 CACC CACC GGTC GGTC GCCA GCCA CCAT CCAT GGTG MSP799 NGTG 20 GTAA 321. GTAA 322. 2-20 ACGG ACGG CCAC CCAC AAGT AAGT TCAG TCAG CGTG Endogenous genes EMX1 FYF1548 NGG 20 GAGT 323. GAGT 324. 1-20 CCGA CCGA GCAG GCAG AAGA AAGA AGAA AGAA GGGC MSP809 NGG 20 GTCA 325. GTCA 326. 2-20 CCTC CCTC CAAT CAAT GACT GACT AGGG AGGG TGGG MSP811 NGA 20 GAGG 327. GAGG 328. 1-20 AGGA AGGA AGGG AGGG CCTG CCTG AGTC AGTC CGAG MSP812 NGA 20 GGTT 329. GGTT 330. 2-20 GCCC GCCC ACCC ACCC TAGT TAGT CATT CATT GGAG MSP813 NGA 20 GCTG 331. GCTG 332. 3-20 AGCT AGCT GAGA GAGA GCCT GCCT GATG GATG GGAA MSP814 NGA 20 GCCA 333. GCCA 334. 4-20 CGAA CGAA GCAG GCAG GCCA GCCA ATGG ATGG GGAG FAAWF DR348 NGG 20 GGAA 335. GGAA 336. 1-20 TCCC TCCC TTCT TTCT GCAG GCAG CACC CACC TGGA MSP815 NGG 20 GCTG 337. GCTG 338. 2-20 CAGA CAGA AGGG AGGG ATTC ATTC CATG CATG AGGT MSP818 NGA 20 GAAT 339. GAAT 340. 1-20 CCCT CCCT TCTG TCTG CAGC CAGC ACCT ACCT GGAT MSP819 NGA 20 GTGC 341. GTGC 342. 2-20 TGCA TGCA GAAG GAAG GGAT GGAT TCCA TCCA TGAG MSP820 NGA 20 GCGG 343. GCGG 344. 3-20 CGGC CGGC TGCA TGCA CAAC CAAC CAGT CAGT GGAG MSP885 NGA 20 GGTT 345. GGTT 346. 4-20 GTGC GTGC AGCC AGCC GCCG GCCG CTCC CTCC AGAG MSP1060 NGCG 20 GAGG 347. GAGG 348. 1-20 CAAG CAAG AGGG AGGG CGGC CGGC TTTG TTTG GGCG MSP1061 NGCG 19 GGGG 349. GGGG 350. 2-19 TCCA TCCA GTTC GTTC CGGG CGGG ATT ATTA GCG MSP1062 NGCG 20 GCAG 351. GCAG 352. 
3-20 AAGG AAGG GATT GATT CCAT CCAT GAGG GAGG TGCG MSP1063 NGCG 19 GAAG 353. GAAG 354. 4-19 GGAT GGAT TCCA TCCA TGAG TGAG GTG GTGC GCG RIALK1 MSP823 NGG 20 GCTG 355. GCTG 356. 1-20 AAAC AAAC AGTG AGTG ACCT ACCT GTCT GTCT TGGT MSP824 NGG 20 GATG 357. GATG 358. 2-20 TAGG TAGG GCTA GCTA GAGG GAGG GGTG GGTG AGGC MSP826 NGA 20 GGTG 359. GGTG 360. 1-20 CATT CATT TTCA TTCA GGAG GGAG GAAG GAAG CGAT MSP827 NGA 20 GTTT 361. GTTT 362. 2-20 TCGC TCGC TCCG TCCG AAGG AAGG TAAA TAAA AGAA MSP828 NGA 20 GAGA 363. GAGA 364. 3-20 TGTA TGTA GGGC GGGC TAGA TAGA GGGG GGGG TGAG MSP829 NGA 20 GCAG 365. GCAG 366. 4-20 AGGG AGGG GAGA GAGA AGAA AGAA AGAG AGAG AGAT MSP1068 NGCG 19 GGGT 367. GGGT 368. 1-19 GCAT GCAT TTTC TTTC AGGA AGGA GGA GGAA GCG VEGFA VC228 NGG 20 GGTG 369. GGTG 370. 1-20 AGTG AGTG AGTG AGTG TGTG TGTG CGTG CGTG TGGG MSP830 NGG 20 GTTG 371. GTTG 372. 2-20 GAGC GAGC GGGG GGGG AGAA AGAA GGCC GGCC AGGG BPK1846 NGA 20 GCGA 373. GCGA 374. 1-20 GCAG GCAG CGTC CGTC TTCG TTCG AGAG AGAG TGAG BPK1848 NGA 20 GACG 375. GACG 376. 2-20 TGTG TGTG TGTC TGTC TGTG TGTG TGGG TGGG TGAG BPK1850 NGA 20 GGTT 377. GGTT 378. 3-20 GAGG GAGG GCGT GCGT TGGA TGGA GCGG GCGG GGAG MSP831 NGA 20 GCTT 379. GCTT 380. 4-20 TGGA TGGA AAGG AAGG GGGT GGGT GGGG GGGG GGAG MSP1074 NGCG 20 GCAG 381. GCAG 382. 1-20 ACGG ACGG CAGT CAGT CACT CACT AGGG AGGG GGCG MSP1075 NGCG 20 GCTG 383. GCTG 384. 2-20 GGTG GGTG AATG AATG GAGC GAGC GAGC GAGC AGCG MSP1076 NGCG 19 GTGT 385. GTGT 386. 3-19 GGGT GGGT GAGT GAGT GAGT GAGT GTG GTGT GCG MSP1077 NGCG 19 GTGT 387. GTGT 388. 4-19 GCGT GCGT GTGG GTGG GGTT GGTT GAG GAGG GCG S. aureus gRNAs M5P1395 Site 20 GTCG 389. GTCG 390. 1-20 TGCT TGCT GCTT GCTT CATG CATG TGGT TGGT CGGG GT M5P1405 Site 23 GAAG 391. GAAG 392. 1-23 TCGT TCGT GCTG GCTG CTTC CTTC ATGT ATGT GGT GGTC GGGG T M5P1396 Site 21 GCCG 393. GCCG 394. 2-21 GTGG GTGG TGCA TGCA GATG GATG AACT AACT T TCAG GGT M5P1397 Site 21 GCCG 395. GCCG 396. 3-21 TAGG TAGG TCAG TCAG GGTG GGTG GTCA GTCA C CGAG GGT M5P1400 Site 21 GCAA 397. 
GCAA 398. 4-21 CATC CATC CTGG CTGG GGCA GGCA CAAG CAAG C CTGG AGT M5P1404 Site 22 GGCA 399. GGCA 400. 4-22 ACAT ACAT CCTG CCTG GGGC GGGC ACAA ACAA GC GCTG GAGT M5P1398 Site 21 GAAG 401. GAAG 402. 5-21 CACT CACT GCAC GCAC GCCG GCCG TAGG TAGG T TCAG GGT M5P1408 Site 24 GCTG 403. GCTG 404. 5-24 AAGC AAGC ACTG ACTG CACG CACG CCGT CCGT AGGT AGGT CAGG GT M5P1428 Site 21 GCCC 405. GCCC 406. 6-21 TCGA TCGA ACTT ACTT CACC CACC TCGG TCGG C CGCG GGT M5P1409 Site 24 GTCG 407. GTCG 408. 6-24 CCCT CCCT CGAA CGAA CTTC CTTC ACCT ACCT CGGC CGGC GCGG GT M5P1403 Site 22 GCAA 409. GCAA 410. 7-22 GGGC GGGC GAGG GAGG AGCT AGCT GTTC GTTC AC ACCG GGGT M5P1406 Site 24 GAGC 411. GAGC 412. 7-24 AAGG AAGG GCGA GCGA GGAG GGAG CTGT CTGT TCAC TCAC CGGG GT MSP1410 Site 24 GCCC 413. GCCC 414. 8-24 TTCA TTCA GCTC GCTC GATG GATG CGGT CGGT TCAC TCAC CAGG GT S. thermophiLus1 gRNAs EGFP M5P1412 Site 20 GTCT 415. GTCT 416. 1-20 ATAT ATAT CATG CATG GCCG GCCG ACAA ACAA GCAG AA M5P1414 Site 21 GCAG 417. GCAG 418. 2-21 CTCG CTCG CCGA CCGA CCAC CCAC TACC TACC A AGCA GAA M5P1417 Site 23 GTGC 419. GTGC 420. 2-23 AGCT AGCT CGCC CGCC GACC GACC ACTA ACTA CCA CCAG CAGA A M5P1413 Site 21 GCCT 421. GCCT 422. 3-21 TCGG TCGG GCAT GCAT GGCG GGCG GACT GACT T TGAA GAA M5P1418 Site 24 GTAG 423. GTAG 424. 3-24 CCTT CCTT CGGG CGGG CATG CATG GCGG GCGG ACTT ACTT GAAG AA M5P1416 Site 23 GTCT 425. GTCT 426. 4-23 ATAT ATAT CATG CATG GCCG GCCG ACAA ACAA GCA GCAG AAGA A M5P1415 Site 23 GTCT 427. GTCT 428. 5-23 TGTA TGTA GTTG GTTG CCGT CCGT CGTC CGTC CTT CTTG AAGA A M5P1419 Site 24 GGTC 429. GGTC 430. 5-24 TTGT TTGT AGTT AGTT GCCG GCCG TCGT TCGT CCTT CCTT GAAG AA
[0105] Bacterial Cas9/sgRNA expression plasmids were constructed with two T7 promoters to separately express Cas9 and the sgRNA. These plasmids encode human codon optimized versions of Cas9 for S. pyogenes (BPK764, SpCas9 sequence subcloned from JDS246.sup.7), S. thermophilus Cas9 from CRISPR locus 1 (MSP1673, St1Cas9 sequence modified from a previously published description.sup.20), and S. aureus (BPK2101, SaCas9 sequence codon optimized from Uniprot J7RUA5). Previously described sgRNA sequences were utilized for SpCas9.sup.34, 35 and St1Cas9.sup.20, while the SaCas9 sgRNA sequence was determined by searching the European Nucleotide Archive sequence HE980450 for crRNA repeats using CRISPRfinder and identifying the tracrRNA using a bioinformatic approach similar to one previously described.sup.36. Annealed oligos to complete the spacer complementarity region of the sgRNA were ligated into BsaI cut BPK764 and BPK2101, or BspMI cut MSP1673 (append 5′-ATAG to the spacer to generate the top oligo and append 5′-AAAC to the reverse complement of the spacer sequence to generate the bottom oligo).
[0106] Residues 1097-1368 of SpCas9 were randomly mutagenized using Mutazyme II (Agilent Technologies) at a rate of ˜5.2 substitutions/kilobase to generate mutagenized PAM-interacting (PI) domain libraries. The theoretical complexity of each PI domain library was estimated to be greater than 10.sup.7 clones based on the number of transformants obtained. Positive and negative selection plasmids were generated by ligating annealed target site oligos into XbaI/SphI or EcoRI/SphI cut p11-lacY-wtx1.sup.17, respectively.
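As a rough worked example of what the stated mutagenesis rate implies per clone, the arithmetic can be sketched in Python. This is an illustration only: it assumes the mutagenized region is exactly the coding sequence for residues 1097-1368 (any flanking amplicon sequence is ignored), and the variable names are hypothetical.

```python
# Assumption: the mutagenized region is exactly codons 1097-1368 (inclusive).
first_residue, last_residue = 1097, 1368
rate_per_kb = 5.2  # approximate substitutions per kilobase, from the text

n_residues = last_residue - first_residue + 1   # residues in the PI domain window
n_bases = 3 * n_residues                        # three nucleotides per codon
expected_subs = rate_per_kb * n_bases / 1000.0  # mean nucleotide substitutions per clone

print(n_residues, n_bases, round(expected_subs, 2))
```

Under these assumptions the window spans 272 residues (816 bp), giving roughly four nucleotide substitutions per clone on average.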
[0107] Two randomized PAM libraries (each with a different protospacer sequence) were constructed using Klenow(-exo) to fill-in the bottom strand of oligos that contained six randomized nucleotides directly adjacent to the 3′ end of the protospacer (see Table 1). The double-stranded product was cut with EcoRI to leave EcoRI/SphI ends for ligation into cut p11-lacY-wtx1. The theoretical complexity of each randomized PAM library was estimated to be greater than 10.sup.6 based on the number of transformants obtained.
[0108] SpCas9 and SpCas9 variants were expressed in human cells from vectors derived from JDS246.sup.16. For St1Cas9 and SaCas9, the Cas9 ORFs from MSP1673 and BPK2101 were subcloned into a CAG promoter vector to generate MSP1594 and BPK2139, respectively. Plasmids for U6 expression of sgRNAs (into which desired spacer oligos can be cloned) were generated using the sgRNA sequences described above for the SpCas9 sgRNA (BPK1520), the St1Cas9 sgRNA (BPK2301), and the SaCas9 gRNA (VVT1). Annealed oligos to complete the spacer complementarity region of the sgRNA were ligated into the BsmBI overhangs of these vectors (append 5′-CACC to the spacer to generate the top oligo and append 5′-AAAC to the reverse complement of the spacer sequence to generate the bottom oligo).
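The oligo design rule stated in the parentheses can be sketched in Python as follows. This is a minimal illustration rather than the tooling used in the study; the function names `reverse_complement` and `spacer_oligos` are hypothetical, and the example spacer is the EGFP site 1 sequence listed in Table 2.

```python
# Hypothetical helper names; only the overhang rule itself comes from the text.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def spacer_oligos(spacer, top_prefix="CACC", bottom_prefix="AAAC"):
    """Build the two oligos that are annealed and ligated into the cut vector.

    top oligo    = top_prefix + spacer
    bottom oligo = bottom_prefix + reverse complement of the spacer
    """
    top = top_prefix + spacer
    bottom = bottom_prefix + reverse_complement(spacer)
    return top, bottom


# EGFP site 1 spacer (20 nt) with the prefixes given for the U6 sgRNA vectors.
top, bottom = spacer_oligos("GGGCACGGGCAGCTTGCCGG")
print(top)
print(bottom)
```

For the bacterial expression plasmids described in [0105], the same sketch would be called with `top_prefix="ATAG"`.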
[0109] Bacterial-Based Positive Selection Assay for Evolving SpCas9 Variants
[0110] Competent E. coli BW25141(λDE3).sup.23 containing a positive selection plasmid (with embedded target site) were transformed with Cas9/sgRNA-encoding plasmids. Following a 60 minute recovery in SOB media, transformations were plated on LB plates containing either chloramphenicol (non-selective) or chloramphenicol+10 mM arabinose (selective). Cleavage of the positive selection plasmid was estimated by calculating the survival frequency: colonies on selective plates/colonies on non-selective plates (see also
[0111] To select for SpCas9 variants that can cleave novel PAMs, PI-domain mutagenized Cas9/sgRNA plasmid libraries were electroporated into E. coli BW25141(λDE3) cells containing a positive selection plasmid that encodes a target site+PAM of interest. Generally 50,000 clones were screened to obtain between 50-100 survivors. The PI domains of surviving clones were subcloned into fresh backbone plasmid and re-tested in the positive selection. Clones that had greater than 10% survival in this secondary screen for activity were sequenced. Mutations observed in the sequenced clones were chosen for further assessment based on their frequency in surviving clones, type of substitution, proximity to the PAM bases in the SpCas9/sgRNA crystal structure (PDB:4UN3).sup.14, and (in some cases) activities in a human cell-based EGFP disruption assay.
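The survival frequency defined in [0110] and the 10% cut-off used in the secondary screen can be expressed as a short Python sketch. The function names and the colony counts are hypothetical and are shown only to make the calculation concrete.

```python
def survival_frequency(selective_colonies, nonselective_colonies):
    """Colonies on selective plates divided by colonies on non-selective plates."""
    if nonselective_colonies <= 0:
        raise ValueError("need at least one colony on the non-selective plate")
    return selective_colonies / nonselective_colonies


def passes_secondary_screen(frequency, threshold=0.10):
    """True when survival is greater than the 10% cut-off applied to re-tested clones."""
    return frequency > threshold


# Hypothetical colony counts for one re-tested clone.
freq = survival_frequency(selective_colonies=42, nonselective_colonies=300)
print(round(freq, 3), passes_secondary_screen(freq))
```

By the same calculation, a primary screen yielding 50-100 survivors from 50,000 clones corresponds to a survival frequency of about 0.1-0.2%.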
[0112] Bacterial-Based Site-Depletion Assay for Profiling Cas9 PAM Specificities
[0113] Competent E. coli BW25141(λDE3) containing a Cas9/sgRNA expression plasmid were transformed with negative selection plasmids harboring cleavable or non-cleavable target sites. Following a 60 minute recovery in SOB media, transformations were plated on LB plates containing chloramphenicol+carbenicillin. Cleavage of the negative selection plasmid was estimated by calculating the colony forming units per μg of DNA transformed (see also
[0114] The negative selection was adapted to determine PAM specificity profiles of Cas9 nucleases by electroporating each randomized PAM library into E. coli BW25141(λDE3) cells that already harbored an appropriate Cas9/sgRNA plasmid. Between 80,000-100,000 colonies were plated at a low density spread on LB+chloramphenicol+carbenicillin plates. Surviving colonies containing negative selection plasmids refractory to cleavage by Cas9 were harvested and plasmid DNA isolated by maxi-prep (Qiagen). The resulting plasmid library was amplified by PCR using Phusion Hot-start Flex DNA Polymerase (New England BioLabs) followed by an Agencourt Ampure XP cleanup step (Beckman Coulter Genomics). Dual-indexed Tru-Seq Illumina deep-sequencing libraries were prepared using the KAPA HTP library preparation kit (KAPA BioSystems) from ˜500 ng of clean PCR product for each site-depletion experiment. The Dana-Farber Cancer Institute Molecular Biology Core performed 150-bp paired-end sequencing on an Illumina MiSeq Sequencer.
[0115] The raw FASTQ files output from each MiSeq run were analyzed with a Python program to determine relative PAM depletion. The program (see Methods) operates as follows: First, a file dialog is presented to the user from which all FASTQ read files for a given experiment can be selected. For these files, each FASTQ entry is scanned for the fixed spacer region on both strands. If the spacer region is found, then the six variable nucleotides flanking the spacer region are captured and added to a counter. From this set of detected variable regions, the count and frequency of each window of length 2-6 nt at each possible position are tabulated. The site-depletion data for both randomized PAM libraries were analyzed by calculating the post-selection PAM depletion value (PPDV): the post-selection frequency of a PAM in the selected population divided by the pre-selection library frequency of that PAM. PPDV analyses were performed for each experiment across all possible 2-6 length windows in the 6 bp randomized region. The windows we used to visualize PAM preferences were: the 3 nt window representing the 2.sup.nd, 3.sup.rd, and 4.sup.th PAM positions for wild-type and variant SpCas9 experiments, and the 4 nt window representing the 3.sup.rd, 4.sup.th, 5.sup.th, 6.sup.th PAM positions for St1Cas9 and SaCas9.
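The PPDV calculation for a single window can be sketched as below. This is a simplified illustration and not the PAM_depletion.py program itself: it assumes the six-nucleotide variable regions have already been extracted from the reads, the function names are hypothetical, and the input lists are toy data invented for the example.

```python
from collections import Counter


def window_frequencies(variable_regions, start, length):
    """Frequency of each sub-window region[start:start+length] across all regions."""
    counts = Counter(region[start:start + length] for region in variable_regions)
    total = sum(counts.values())
    return {window: count / total for window, count in counts.items()}


def ppdv(pre_selection, post_selection, start, length):
    """Post-selection frequency divided by pre-selection frequency, per window.

    Only windows present in the pre-selection library are reported, so the
    division is always defined; a window absent after selection gets 0.0.
    """
    pre = window_frequencies(pre_selection, start, length)
    post = window_frequencies(post_selection, start, length)
    return {window: post.get(window, 0.0) / freq for window, freq in pre.items()}


# Toy 6-nt variable regions (invented); positions 2-4 form the 3-nt window.
pre_reads = ["AGGTAC", "AGGTAC", "ACCTAC", "ACCTAC"]
post_reads = ["ACCTAC", "ACCTAC", "ACCTAC", "AGGTAC"]
values = ppdv(pre_reads, post_reads, start=1, length=3)
print(values)
```

A PPDV below 1 indicates that plasmids bearing that PAM were depleted by Cas9 cleavage, while a value near or above 1 indicates a PAM that was not efficiently cleaved.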
[0116] Two significance thresholds for the PPDVs were determined based on: 1) a statistical significance threshold based on the distribution of dCas9 versus pre-selection library log read count ratios (see
[0117] Human Cell Culture and Transfection
[0118] U2OS.EGFP cells harboring a single integrated copy of a constitutively expressed EGFP-PEST reporter gene.sup.15 were cultured in Advanced DMEM media (Life Technologies) supplemented with 10% FBS, 2 mM GlutaMax (Life Technologies), penicillin/streptomycin, and 400 μg/ml of G418 at 37° C. with 5% CO.sub.2. Cells were co-transfected with 750 ng of Cas9 plasmid and 250 ng of sgRNA plasmid (unless otherwise noted) using the DN-100 program of a Lonza 4D-nucleofector according to the manufacturer's protocols. Cas9 plasmid transfected together with an empty U6 promoter plasmid was used as a negative control for all human cell experiments. Target sites for endogenous gene experiments were selected within 200 bp of NGG sites cleavable by wild-type SpCas9 (see
[0119] Zebrafish Care and Injections
[0120] Zebrafish care and use was approved by the Massachusetts General Hospital Subcommittee on Research Animal Care. Cas9 mRNA was transcribed with PmeI-digested JDS246 (wild-type SpCas9) or MSP469 (VQR variant) using the mMESSAGE mMACHINE T7 ULTRA Kit (Life Technologies) as previously described.sup.21. All sgRNAs in this study were prepared according to the cloning-independent sgRNA generation method.sup.24. sgRNAs were transcribed by the MEGAscript SP6 Transcription Kit (Life Technologies), purified by RNA Clean & Concentrator-5 (Zymo Research), and eluted with RNase-free water.
[0121] sgRNA- and Cas9-encoding mRNA were co-injected into one-cell stage zebrafish embryos. Each embryo was injected with ˜2-4.5 nL of solution containing 30 ng/μL gRNA and 300 ng/μL Cas9 mRNA. The next day, injected embryos were inspected under a stereoscope for normal morphological development, and genomic DNA was extracted from 5 to 9 embryos.
[0122] Human Cell EGFP Disruption Assay
[0123] EGFP disruption experiments were performed as previously described.sup.16. Transfected cells were analyzed for EGFP expression ˜52 hours post-transfection using a Fortessa flow cytometer (BD Biosciences). Background EGFP loss was gated at approximately 2.5% for all experiments (graphically represented as a dashed line).
[0124] T7E1 Assay, Targeted Deep-Sequencing, and GUIDE-Seq to Quantify Nuclease-Induced Mutation Rates
[0125] T7E1 assays were performed as previously described for human cells.sup.15 and zebrafish.sup.21. For U2OS.EGFP human cells, genomic DNA was extracted from transfected cells ˜72 hours post-transfection using the Agencourt DNAdvance Genomic DNA Isolation Kit (Beckman Coulter Genomics). Target loci from zebrafish or human cell genomic DNA were amplified using the primers listed in Table 1. Roughly 200 ng of purified PCR product was denatured, annealed, and digested with T7E1 (New England BioLabs). Mutagenesis frequencies were quantified using a Qiaxcel capillary electrophoresis instrument (Qiagen), as previously described for human cells.sup.15 and zebrafish.sup.21.
[0126] For targeted deep-sequencing, previously characterized on- and off-target sites (Tsai et al., Nat Biotechnol 33, 187-197 (2015); Fu et al., Nat Biotechnol 31, 822-826 (2013); Fu et al., Nat Biotechnol 32, 279-284 (2014)) were amplified using Phusion Hot-start Flex with the primers listed in Table 1. Genomic loci were amplified for a control condition (empty sgRNA), wild-type, and D1135E SpCas9. An Agencourt Ampure XP cleanup step (Beckman Coulter Genomics) was performed prior to pooling ˜500 ng of DNA from each condition for library preparation. Dual-indexed Tru-Seq Illumina deep-sequencing libraries were generated using the KAPA HTP library preparation kit (KAPA BioSystems). The Dana-Farber Cancer Institute Molecular Biology Core performed 150-bp paired-end sequencing on an Illumina MiSeq Sequencer. Mutation analysis of targeted deep-sequencing data was performed as previously described (Tsai et al., Nat Biotechnol 32, 569-576 (2014)). Briefly, Illumina MiSeq paired end read data was mapped to human genome reference GRCh37 using bwa (Li et al., Bioinformatics 25, 1754-1760 (2009)). High-quality reads (quality score >= 30) were assessed for indel mutations that overlapped the target or off-target sites. 1-bp indel mutations were excluded from the analysis unless they occurred within 1-bp of the predicted breakpoint. Changes in activity at on- and off-target sites comparing D1135E versus wild-type SpCas9 were calculated by comparing the indel frequencies from both conditions (for rates above background control amplicon indel levels).
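The indel filtering rule in this paragraph can be illustrated with a short Python sketch. It is not the published analysis pipeline: the coordinate convention (inclusive reference coordinates, with an insertion represented by start equal to end), the assumption of at most one indel call per read, the function names, and all numbers in the example are hypothetical.

```python
def counts_as_mutation(start, end, size, site_start, site_end, breakpoint):
    """Decide whether one indel call is counted as a nuclease-induced mutation.

    start, end : inclusive reference coordinates spanned by the indel
                 (assumed convention; an insertion has start == end)
    size       : indel length in bp
    """
    if end < site_start or start > site_end:
        return False  # indel does not overlap the target or off-target site
    if size == 1:
        # 1-bp indels are kept only when within 1 bp of the predicted breakpoint
        if start <= breakpoint <= end:
            distance = 0
        else:
            distance = min(abs(start - breakpoint), abs(end - breakpoint))
        return distance <= 1
    return True


def indel_frequency(indels, n_reads, site_start, site_end, breakpoint):
    """Fraction of high-quality reads carrying a counted indel (one call per read assumed)."""
    kept = sum(
        1 for (start, end, size) in indels
        if counts_as_mutation(start, end, size, site_start, site_end, breakpoint)
    )
    return kept / n_reads


# Invented calls for a site spanning 100-122 with a predicted breakpoint at 117.
calls = [(117, 117, 1), (105, 105, 1), (103, 110, 8), (200, 205, 6)]
freq = indel_frequency(calls, n_reads=100, site_start=100, site_end=122, breakpoint=117)
print(freq)
```

In this toy example the 1-bp indel at the breakpoint and the 8-bp deletion are counted, while the distant 1-bp indel and the indel outside the site are excluded.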
[0127] GUIDE-seq experiments were performed as previously described (Tsai et al., Nat Biotechnol 33, 187-197 (2015)). Briefly, phosphorylated, phosphorothioate-modified double-stranded oligodeoxynucleotides (dsODNs) were transfected into U2OS cells along with Cas9 and sgRNA expression plasmids, as described above. dsODN-specific amplification, high-throughput sequencing, and mapping were performed to identify genomic intervals containing DSB activity. For wild-type versus D1135E experiments, off-target read counts were normalized to the on-target read counts to correct for sequencing depth differences between samples. The normalized ratios for wild-type and D1135E SpCas9 were then compared to calculate the fold-change in activity at off-target sites. To determine whether wild-type and D1135E samples for GUIDE-seq had similar oligo tag integration rates at the intended target site, restriction fragment length polymorphism (RFLP) assays were performed by amplifying the intended target loci with Phusion Hot-Start Flex from 100 ng of genomic DNA (isolated as described above) using primers listed in Table 1. Roughly 150 ng of PCR product was digested with 20 U of NdeI (New England BioLabs) for 3 hours at 37° C. prior to clean-up using the Agencourt Ampure XP kit. RFLP results were quantified using a QIAxcel capillary electrophoresis instrument (Qiagen) to approximate oligo tag integration rates. T7E1 assays were performed for a similar purpose, as described above.
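The read-count normalization described for the wild-type versus D1135E comparison can be sketched as below. This is a minimal illustration under stated assumptions: the function names and the dictionary layout (site label mapped to GUIDE-seq read count) are hypothetical, and the fold-change is expressed here as the variant ratio divided by the wild-type ratio, with sites undetected in the variant sample treated as zero reads.

```python
def normalized_ratios(read_counts, on_target):
    """Divide each off-target read count by the on-target read count."""
    on_reads = read_counts[on_target]
    if on_reads <= 0:
        raise ValueError("on-target site has no reads")
    return {site: count / float(on_reads)
            for site, count in read_counts.items()
            if site != on_target}


def off_target_fold_change(wild_type_counts, variant_counts, on_target):
    """Fold-change (variant / wild-type) of normalized off-target activity."""
    wt = normalized_ratios(wild_type_counts, on_target)
    var = normalized_ratios(variant_counts, on_target)
    # Only sites detected with wild-type SpCas9 can serve as a denominator.
    return {site: var.get(site, 0.0) / ratio
            for site, ratio in wt.items() if ratio > 0}
```

Normalizing to the on-target count before taking the ratio means that a sample sequenced to half the depth is not mistaken for one with half the off-target activity.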
[0128] Software for Analyzing PAM Depletion MiSeq Data
TABLE-US-00008
Run in the command prompt (in the directory containing the file) using the command "python PAM_depletion.py"
------------------------------------------------------------------------
import numpy as np
import pandas as pd
import glob
import fnmatch
import os
from collections import Counter
from Bio.Seq import Seq
from Bio import SeqIO
import itertools
import re
from pandas import ExcelWriter
import Tkinter, tkFileDialog

__author__ = "Ved V. Topkar"
__version__ = "1.0"

"""
IUPAC_notation_regex describes a mapping between certain base characters
and the relevant regex string
(Useful for parsing out ambiguous base strings)
"""
IUPAC_notation_regex = {
    'N': '[ATCG]',
    'Y': '[CT]',
    'R': '[AG]',
    'W': '[AT]',
    'S': '[CG]',
    'A': 'A',
    'T': 'T',
    'C': 'C',
    'G': 'G'
}


def ambiguous_PAMs(length):
    """
    Given an inputted length, return a list of strings describing
    all possible PAM sequences
    NOTE: Returned strings include ambiguous base characters
    """
    permutations = itertools.product(['N', 'A', 'T', 'C', 'G'], repeat=length)
    PAMs = []
    for item in permutations:
        PAMs.append(''.join(item))
    return PAMs


def unambiguous_PAMs(length):
    permutations = itertools.product(['A', 'T', 'C', 'G'], repeat=length)
    PAMs = []
    for item in permutations:
        PAMs.append(''.join(item))
    return PAMs


def regex_from_seq(seq):
    """
    Given a sequence with ambiguous base characters, returns a regex
    that matches for the explicit (unambiguous) base characters
    """
    regex = ''
    for c in seq:
        regex += IUPAC_notation_regex[c]
    return regex


def regex_match_count(regex, list_of_counts):
    """
    Given a list of strings and a regex, return the number of strings
    in the list that the regex matches.
    """
    c = 0
    for item in list_of_counts:
        if re.search(regex, item):
            c += 1
    return c


def tabulate_substring_frequencies(pams, indices):
    """
    Given a list of raw pams and substring indices, tabulates the
    frequency of each substring
    RETURNS a Pandas DataFrame
    """
    base_PAMs = unambiguous_PAMs(indices[1] - indices[0])
    tmp_PAMs = Counter([pam[indices[0]:indices[1]] for pam in pams])
    c = Counter()
    for base_PAM in base_PAMs:
        c[base_PAM] = tmp_PAMs[base_PAM]
    PAMs = pd.Series(c)
    PAMs.sort(ascending=False)
    excel_PAMs = pd.DataFrame()
    excel_PAMs['PAM'] = PAMs.index
    excel_PAMs['Count'] = PAMs.values
    excel_PAMs['Frequency'] = PAMs.values.astype(float) / sum(PAMs.values)
    return excel_PAMs


def generate_raw_PAM_counts(filepaths, targetsites, PAM_length):
    """
    Here, we get all of our relevant PAM sequences from the inputted files
    by searching for the targetsites and looking at the flanking region
    """
    reverse_target_sequences = {
        targetsite: str(Seq(targetsites[targetsite]).reverse_complement())
        for targetsite in targetsites}
    all_pams = {targetsite: [] for targetsite in targetsites}

    # Iterate through each file and collect the PAMs of each sequence
    # Checks both forward and reverse reads
    for filepath in filepaths:
        print 'Scanning file: ' + os.path.basename(filepath)
        pams = []
        records = SeqIO.parse(filepath, filepath.split('.')[-1])
        for record in records:
            seq = str(record.seq)
            for targetsite in targetsites:
                target_seq = targetsites[targetsite]
                target = seq.find(targetsites[targetsite])
                if target > -1:
                    # Forward read: the PAM directly follows the target site
                    index = target + len(target_seq)
                    all_pams[targetsite].append(seq[index:index + PAM_length])
                else:
                    target = seq.find(reverse_target_sequences[targetsite])
                    if target > -1:
                        # Reverse read: the PAM precedes the reverse-complemented site
                        index = target
                        all_pams[targetsite].append(
                            str(Seq(seq[index - PAM_length:index]).reverse_complement()))
    return all_pams


def analyze_PAM_depletion_data(filepaths, targetsites, PAM_length=3):
    """
    Given a directory that contains a given file extension and a
    target sequence, do the entire PAM depletion analysis
    """
    # Make sure that dirnames and target sequences are inputted
    if filepaths is None:
        raise Exception('Please specify a directory name')
    if targetsites is None:
        raise Exception('Please specify a target sequence')
    if PAM_length is None or PAM_length < 3:
        raise Exception('Please enter a valid PAM length')

    all_pams = generate_raw_PAM_counts(filepaths, targetsites, PAM_length)
    letters = ['A', 'T', 'C', 'G']
    all_counters = {targetsite: Counter(all_pams[targetsite]) for targetsite in targetsites}

    for targetsite in targetsites:
        pams = all_pams[targetsite]

        # Count each base at each position of the randomized region
        base_counters = [Counter() for x in range(PAM_length)]
        for pam in pams:
            for i, c in enumerate(pam):
                base_counters[i][c] += 1

        raw_PAM_counts = pd.Series(all_counters[targetsite])
        raw_PAM_counts.sort(ascending=False)
        raw_counts_df = pd.DataFrame()
        raw_counts_df['PAM'] = raw_PAM_counts.index
        raw_counts_df['Count'] = raw_PAM_counts.values

        single_base_counts = pd.DataFrame(base_counters)
        single_base_frequencies = single_base_counts.divide(single_base_counts.sum(axis=1).ix[0])

        # Prepare substring counts and frequencies
        # The output workbook is named after the first input file and the target site
        writer = ExcelWriter('out/' + os.path.basename(filepaths[0]).split('.')[0] + '_' + targetsite + '.xlsx')
        single_base_counts.to_excel(writer, 'Single Base Counts')
        single_base_frequencies.to_excel(writer, 'Single Base Frequencies')
        raw_counts_df.to_excel(writer, 'Raw PAM Counts')

        # Designate which windows should be analyzed and name them
        settings = {
            'XXXNNN': [0, 3],
            'NXXXNN': [1, 4],
            'NNXXXN': [2, 5],
            'NNNXXX': [3, 6],
            'XXXXNN': [0, 4],
            'NXXXXN': [1, 5],
            'NNXXXX': [2, 6],
            'XXNNNN': [0, 2],
            'NXXNNN': [1, 3],
            'NNXXNN': [2, 4],
            'NNNXXN': [3, 5],
            'NNNNXX': [4, 6],
            'XXXXXN': [0, 5],
            'NXXXXX': [1, 6],
            'XXXXXX': [0, 6],
        }
        for item in settings:
            df = tabulate_substring_frequencies(pams, settings[item])
            df.to_excel(writer, item)
        writer.save()
        print 'Saved excel output for ' + targetsite


if __name__ == "__main__":
    # Display the filepicker, accepting only FASTQ files
    root = Tkinter.Tk()
    root.withdraw()
    file_paths = tkFileDialog.askopenfilenames(parent=root, title='Choose FASTQ files',
                                               filetypes=[("FastQ files", "*.fastq")])

    # Describe the targetsite(s) to search for
    targetsites = {'EUP site 1': 'GTCGCCCTCGAACTTCACCT'}

    # Run the analysis on the inputted filepaths and targetsite
    # for a given variable nucleotide region length
    analyze_PAM_depletion_data(file_paths, targetsites, PAM_length=6)
------------------------------------------------------------------------
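The core step of the listing above, locating the target site in each read on either strand and collecting the adjacent randomized bases, can also be illustrated in isolation. The following is a minimal Python 3 sketch that uses only the standard library; it is not part of the listing, the function names are hypothetical, and, as an assumption made here for clarity, reads in which the randomized region is truncated are skipped.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_pam(read, target, pam_length):
    """Return the bases immediately 3' of `target`, checking both strands.

    Returns None if the target is absent or the flanking region is shorter
    than `pam_length`.
    """
    pos = read.find(target)
    if pos > -1:
        start = pos + len(target)
        pam = read[start:start + pam_length]
    else:
        pos = read.find(reverse_complement(target))
        if pos < pam_length:  # not found (-1) or too little upstream sequence
            return None
        pam = reverse_complement(read[pos - pam_length:pos])
    return pam if len(pam) == pam_length else None


def pam_frequencies(reads, target, pam_length):
    """Fraction of target-containing reads carrying each PAM."""
    pams = (extract_pam(read, target, pam_length) for read in reads)
    counts = Counter(pam for pam in pams if pam is not None)
    total = sum(counts.values())
    return {pam: n / total for pam, n in counts.items()} if total else {}
```

Comparing the frequencies obtained this way for a selected library against those of an unselected control library indicates which PAMs were depleted.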
Example 1
[0129] One potential solution to address targeting range limitations would be to engineer Cas9 variants with novel PAM specificities. A previous attempt to alter PAM specificity utilized structural information about base-specific SpCas9-PAM interactions to mutate arginine residues (R1333 and R1335) that contact guanine nucleotides at the second and third PAM positions, respectively (Anders et al., Nature 513, 569-573 (2014)). Substitution of both arginines with glutamines (whose side-chains might be expected to interact with adenines) failed to yield SpCas9 variants that could cleave targets harboring the expected NAA PAM in vitro (Anders et al., Nature 513, 569-573 (2014)). Using a human cell-based U2OS EGFP reporter gene disruption assay in which nuclease-induced indels lead to loss of fluorescence (Reyon et al., Nat Biotechnol 30, 460-465 (2012); Fu et al., Nat Biotechnol 31, 822-826 (2013)), we confirmed that an R1333Q/R1335Q SpCas9 variant failed to efficiently cleave target sites with NAA PAMs (
[0130] To identify additional positions that might be critical for modifying PAM specificity, we adapted a bacterial selection system previously used to study properties of homing endonucleases (hereafter referred to as the positive selection) (Chen & Zhao, Nucleic Acids Res 33, e154 (2005); Doyon et al., J Am Chem Soc 128, 2477-2484 (2006)). In our adaptation of this system, Cas9-mediated cleavage of a positive selection plasmid encoding an inducible toxic gene enables cell survival, due to subsequent degradation and loss of the linearized plasmid (
[0131] To assess the global PAM specificity profiles of our novel SpCas9 variants, we used a bacterial-based negative selection system (
[0132] Using the site-depletion assay, we obtained PAM specificity profiles for the VQR and EQR SpCas9 variants using the two randomized PAM libraries. The VQR variant strongly depleted sites bearing NGAN and NGCG PAMs, and more weakly NGGG, NGTG, and NAAG PAMs (
[0133] We next sought to extend the generalizability of our engineering strategy by attempting to identify SpCas9 variants capable of recognizing an NGC PAM. We first designed Cas9 mutants bearing amino acid substitutions of R1335 that might be expected to interact with a cytosine (D, E, S, or T) and found no activity on an NGC PAM site using the positive selection system. We then randomly mutagenized the PAM-interacting domain of each of these singly substituted SpCas9 variants but still failed to obtain surviving colonies in positive selections. Because the T1337R mutation had increased the activities of our VQR and EQR SpCas9 variants (
[0134] To demonstrate directly that our VQR and VRER SpCas9 variants can enable targeting of sites not currently modifiable by wild-type SpCas9, we tested their activities on endogenous genes in zebrafish embryos and human cells. In single cell zebrafish embryos, we found that the VQR variant could efficiently modify endogenous gene sites bearing NGAG PAMs with mean mutagenesis frequencies of 20 to 43% (
[0135] To determine the genomewide specificity of our VQR and VRER SpCas9 nucleases, we used the recently described GUIDE-seq (Genome-wide Unbiased Identification of Double-stranded breaks Enabled by sequencing) method.sup.10 to profile off-target cleavage events of these SpCas9 variants in human cells. We profiled the genome-wide activities of the VQR and VRER SpCas9 variants using a total of 13 different sgRNAs (eight for VQR and five for VRER from
[0136] Previous studies have shown that imperfect PAM recognition by SpCas9 can lead to recognition of unwanted sites that contain non-canonical NAG, NGA, and other PAMs in human cells (Hsu et al., Nat Biotechnol 31, 827-832 (2013); Tsai et al., Nat Biotechnol 33, 187-197 (2015); Jiang et al., Nat Biotechnol 31, 233-239 (2013); Mali et al., Nat Biotechnol 31, 833-838 (2013); Zhang et al., Sci Rep 4, 5405 (2014)). Therefore, we were interested in exploring if mutations at or near residues that mediate PAM-interaction might improve SpCas9 PAM specificity. While engineering the VQR variant we had noticed that a D1135E SpCas9 mutant appeared to better discriminate between a canonical NGG PAM and a non-canonical NGA PAM compared to wild-type SpCas9 (
[0137] We next tested whether the improved PAM specificity of D1135E SpCas9 also could be observed in human cells. In direct comparisons of wild-type and D1135E SpCas9 on eight target sites with non-canonical NAG or NGA PAMs, we observed that these sites were consistently less efficiently cleaved by D1135E than by wild-type SpCas9 in the EGFP disruption assay (
[0138] To more directly assess whether the introduction of D1135E could reduce off-target cleavage effects of SpCas9, we used deep-sequencing to compare mutation rates induced by wild-type and D1135E SpCas9 on 25 previously known off-target sites of three different sgRNAs (Hsu et al., Nat Biotechnol 31, 827-832 (2013); Tsai et al., Nat Biotechnol 33, 187-197 (2015); Fu et al., Nat Biotechnol 31, 822-826 (2013)). These 25 sites included off-target sites with various mismatches in the spacer sequence and both canonical NGG and non-canonical PAMs (
[0139] Although all of the experiments described above were performed with SpCas9, there are many Cas9 orthologues from other bacteria that could make attractive candidates for characterizing and engineering Cas9s with novel PAM specificities (Fonfara et al., Nucleic Acids Res 42, 2577-2590 (2014); Ran et al., Nature 520, 186-191 (2015)). To explore the feasibility of doing this, we determined whether two smaller-size orthologues, Streptococcus thermophilus Cas9 from the CRISPR1 locus (St1Cas9) (Deveau et al., J Bacteriol 190, 1390-1400 (2008); Horvath et al., J Bacteriol 190, 1401-1412 (2008)) and Staphylococcus aureus Cas9 (SaCas9) (Hsu et al., Cell 157, 1262-1278 (2014); Ran et al., Nature 520, 186-191 (2015)), might also function in our bacterial selection assays. While the PAM of St1Cas9 has previously been characterized as NNAGAA (SEQ ID NO:3) (Esvelt et al., Nat Methods 10, 1116-1121 (2013); Fonfara et al., Nucleic Acids Res 42, 2577-2590 (2014); Deveau et al., J Bacteriol 190, 1390-1400 (2008); Horvath et al., J Bacteriol 190, 1401-1412 (2008)), our attempts to bioinformatically derive the SaCas9 PAM using a previously described approach (Fonfara et al., Nucleic Acids Res 42, 2577-2590 (2014)) failed to yield a consensus sequence (data not shown). Therefore, we used our site-depletion assay to determine the PAM for SaCas9 and, as a positive control, for St1Cas9. These experiments were performed using the two different protospacers and sgRNAs with two different complementarity lengths for each protospacer, resulting in four selections for each Cas9. For St1Cas9, we identified two novel PAMs in addition to the six PAMs that had been previously described (Esvelt et al., Nat Methods 10, 1116-1121 (2013); Fonfara et al., Nucleic Acids Res 42, 2577-2590 (2014); Horvath et al., J Bacteriol 190, 1401-1412 (2008)) (
[0140] Because not all Cas9 orthologues function efficiently outside of their native context (Esvelt et al., Nat Methods 10, 1116-1121 (2013)), we tested whether St1Cas9 and SaCas9 can robustly cleave target sites in human cells. St1Cas9 has been previously shown to function as a nuclease in human cells but on only a few sites (Esvelt et al., 2013; Cong et al., Science 339, 819-823 (2013)). We assessed St1Cas9 activity on sites harboring NNAGAA (SEQ ID NO:3) PAMs using sgRNAs with variable-length complementarity regions and found high activity at three of the five target sites (
TABLE-US-00009 TABLE 3 SEQ ID NO: Wild-type SpCas9 sequence from K1097-D1368 of SEQ ID NO: 1 KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI aa MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS 1097- HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF 1368 TLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD of SEQ ID NO: 1 Selected mutant clones for VQR and EQR variant, sequence from K1097-D1368 KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 431. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 432. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKRYISTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKFKKLKSVKELLGITI 433. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 434. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 435. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQRGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 436. 
MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDNPIREQAENIIHL FTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 437. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTPIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 438. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 439. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTKLGAPAAIKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 440. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLEATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 441. MERSSFEKNPMDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLA SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHL FTLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVEAKVEKGKSKKLKSVKELLGITI 442. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFESPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 443. 
MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPFKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPSAFKYFDTTIDRKQYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESIFPKRNSDKLIARKKDWDPKKYGGLYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 444. MERSSFEKNPIDFLEAKGYKEIKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASH YEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFT LTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 445. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDKEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 446. MERSSFEKNPIDFLEAKGYKEVKEDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPICEQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVENRKSKKLKSVKELLGITI 447. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDATIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVANVEKGKSKKLKSVKELLGITI 448. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILTDANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 449. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSLEDNEQKQLFVEQHRHYLDEIIEQISEFSKRVILADANLDKVLSAYNKYRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKVWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 450. 
MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSILVVAKVEKGKSKKLKSVKELLGITI 451. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGRFSKESILPKRNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 452. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 453. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIHEQAENIIHL FTLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 454. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 455. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRNKPIREQAENIIHL FTLTNLGAPAAFKYFDTMIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLVGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSRKLKSVKELLGITI 456. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 457. 
MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSLLGGD KTEVQTGGFSKESILPNRNSDKLIARKKDWDPKKYGGFDSPIVAYSVLVVAKVEKGKSKKLKSVKELLGITI 458. MERSSFEKKPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPIVAYSVLVVAKVKKGKSKKLKSVKELLGITI 459. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 460. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD ETEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPIVAFSVLVVAKVEKGKSKKLKSVKELLGITI 461. MERSFFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKDLLGITI 462. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILVDANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGIT 463. NMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAEELQKGNELALPSKYVNFLYLA SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHL FTLTNLGAPAAFKYFDATIDRKQYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 464. 
MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIYRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVANVEKGKSKKLKSVKELLGITI 465. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTDLGAPAAFKYFDTTIDRKQYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD Selected mutant clones for VRER variant, sequence from K1097-D1368 KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 466. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQFFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHL FTLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEMQTGGFSKESVLPKRNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKSKRLKSVKELLGI 467. TIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGVLQKGNELALPSKYVNFLYLA SHYEKLKGSPDDNEQKRLFVEQHKHYLDEIIEQISEFSKRVILADANRDKVLSAYNKHRDKPIREQAENIIH LFTLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 468. MERSSFEENPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPKDNEQKQLFVEKHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVANVEKGKSKKLKSVKELLGITI 469. MERSSYEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGVLQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 470. 
MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYRSTKEVLDATLIHQSFTGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 471. MERSSFEKNPIDFLEAKGYKEVKKDLLIKLPKYSLFELENGRKRMLASARELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKDEKGKSKKLKSVKELLGITI 472. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPICEQAENIIHLF TLTKLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKVIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 473. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPIVAYSVLVVAKVEKGKSKKLKSVKELLGITI 474. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGVLQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 475. MERSSFEKNPIDFLEAKGYKEIKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASH YEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKTIREQAENTIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 476. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHHSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 477. 
MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADGNLDKVLSAYNKHRDKPIREQAENIIHL FTLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 478. MERSSFEKNPIDFLEAKGYKEVKKDLLIKLPKYNLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 479. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPDYNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPKVAYSVLVVAKVEKGKSKKLKSVKELLGITI 480. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKQYRSTKEVLDATLIHQSITGLYETRIDMSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 481. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLIAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEEQTGGFSKESIHPKRNSDKLIARKKDWDPKKYGGFHSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 482. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNMHRDKPIREQAENIIHL FTLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLEGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 483. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQMQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHL FTLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGRFSKESILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 484. 
MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKVKSVKELLGITI 485. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGVLQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 486. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HFEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTMIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 487. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQPKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 488. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNEIALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTKIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 489. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 490. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGVLQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 491. 
MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 492. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 493. MERSSFEKNPFDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADPNLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFLSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 494. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 495. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVIELLGITI 496. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 497. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKTKKLKSVKELLGITI 498. 
MERSSFEKNPIDFLEAKGYKEVIKDFIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASH YEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFT LTNLGAPAAFKYFDTTIDRKQYRSPKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 499. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVPVVAKVEKGKSKKLKSVKELLGITI 500. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELESGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKEYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 501. MERSSFEKNPIDFLEAKGYKEVNKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKTYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 502. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIHEQAENIIHL FTLTNLGAPAAFKYFDTTIDRKTYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 503. MERSSFEKNPIDFLEAKGYKEVKKDLMIKLPKYSLFELKNGRKRMLASAGELQKGNELALPSKYVNFLYLA SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHL FTLTKLGAPAAFKYFDTTIDRKTYRSTKEVLDATLIHHSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 504. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKTYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGLYSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 505. 
MERSSFEKNPIDFLEAKGYKEVKRDLIIKLPKYSLFELKNGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKTYRSTKEVVDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESIHPKRNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 506. MERSSFEKNPIDFLEAKGYKEVKKDLIITLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADSNLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKTYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKGNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 507. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKTYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKDLLGITI 508. MERSSFEKNPIDFLEAKGYKEVKKDLMIKLPKYSLFELENGRKRMLASARELQKGNELALPSKYVNFLYLA SHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIVH LFTLTNLGAPAAFKYFDTTIDRKTYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFNSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 509. MERSSFEKNPIDFLEAKGYKEIKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASH YEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADVNLDKVLSAYNKHRDKPIREQAENIIHLFT LTNLGAPAAFKYFDTTIDRKTYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD KTEVQTGGFSKESIHPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITI 510. MERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARELQKGNELALPSKYVNFLYLAS HYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKDRDKPIREQAENIIHLF TLTNLGAPAAFKYFDTTIDRKTYRSTKEVLDATLIHQSITGLYETRIDLSQLGGD
Example 2. Engineering the PAM Specificity of Staphylococcus aureus Cas9
[0141] Because we knew which residues of Streptococcus pyogenes Cas9 (SpCas9) were important for PAM recognition (R1333 and R1335), we generated an alignment of Cas9 orthologues to look for homologous residues in the PAM-interacting domain (PI domain) of Staphylococcus aureus Cas9 (SaCas9) (see
[0142] We generated alanine (A) and glutamine (Q) substitutions at these five positions to determine if the mutant clones could still cleave a site containing the canonical NNGRRT PAM (SEQ ID NO:46), or possibly cleave the previously non-targetable PAM of NNARRT (SEQ ID NO:43) (
[0143] We then randomly mutagenized either wild-type SaCas9 or the R1015Q variant and selected for clones with altered PAM specificity against sites containing NNAAGT (SEQ ID NO:41) or NNAGGT (SEQ ID NO:42) PAMs (as previously described for SpCas9). We identified, re-screened, and sequenced a number of mutant clones that could target these PAMs, with their amino acid sequences shown in
[0144] After identifying the positions and mutations essential for altering the PAM specificity of SaCas9 to NNARRT (SEQ ID NO:43), we assessed the contributions of the most abundant mutations to the specificity change by making single, double, and triple mutant combinations (Table 5). When testing these mutations against various PAMs in our positive selection (as previously described), we observed that a number of mutations allowed activity on both a canonical NNGAGT (SEQ ID NO:5) PAM and the non-canonical NNAAGT (SEQ ID NO:41) or NNAGGT (SEQ ID NO:42) PAMs, whereas the wild-type SaCas9 enzyme had very low activity on the non-canonical PAMs. Specifically, the triple mutations appeared to enable a relaxed specificity at the third position of the PAM (KKQ, KKH, GKQ, GKH; named based on mutations to positions E782/N968/R1015), leading to a consensus PAM motif of NNRRRT (SEQ ID NO:45) versus the canonical NNGRRT (SEQ ID NO:46). This relaxation of the PAM requirement theoretically doubles the targeting range of SaCas9. Henceforth, variants will be named based on their identities at positions 782, 968, and 1015. For example, E782K/N968K/R1015H would be named the SaCas9 KKH variant.
TABLE-US-00010 TABLE 5 SaCas9 mutant activity in the bacterial screen (% activity)

Mutation(s) at          NNGAGT            NNAAGT             NNAGGT
E782/N968/R1015         (SEQ ID NO: 5)    (SEQ ID NO: 41)    (SEQ ID NO: 42)
(none)                  100.0             21.4               15.7
Q                       0.0               4.3                0.0
H                       100.0             100.0              57.1
K                       85.7              61.4               57.1
G                       85.7              57.1               57.1
K                       100.0             57.1               57.1
K Q                     85.7              92.9               85.7
K H                     100.0             100.0              85.7
G Q                     71.4              85.7               71.4
G H                     100.0             85.7               85.7
K Q                     85.7              85.7               85.7
K H                     85.7              92.9               92.9
K K                     71.4              71.4               71.4
G K                     85.7              71.4               71.4
K K Q                   100.0             100.0              100.0
K K H                   92.9              100.0              100.0
G K Q                   92.9              92.9               100.0
G K H                   100.0             100.0              100.0
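The naming convention in paragraph [0144] (one letter for the residue identity at each of positions 782, 968, and 1015) can be illustrated with a small helper. This is only a sketch: the function name `variant_name` and the slash-separated mutation string format are assumptions made for illustration, and the handling of variants that keep a wild-type residue (which would simply retain the letter E, N, or R) is one reading of "named based on their identities", not something the text spells out.

```python
import re

# Wild-type SaCas9 residues at the three named positions (from the text).
WILD_TYPE = {782: "E", 968: "N", 1015: "R"}


def variant_name(mutations: str) -> str:
    """Return the three-letter name for a mutation string such as
    'E782K/N968K/R1015H' (hypothetical input format)."""
    residues = dict(WILD_TYPE)
    for token in filter(None, (t.strip() for t in mutations.split("/"))):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        if match is None:
            raise ValueError(f"cannot parse mutation: {token!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if position not in WILD_TYPE:
            raise ValueError(f"position {position} is not part of the name")
        if WILD_TYPE[position] != wild_type:
            raise ValueError(f"unexpected wild-type residue in {token!r}")
        residues[position] = new
    # Concatenate identities in the fixed order 782, 968, 1015.
    return "".join(residues[p] for p in (782, 968, 1015))


print(variant_name("E782K/N968K/R1015H"))
```

The first call reproduces the example given in the text, where E782K/N968K/R1015H is the KKH variant.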
[0145] We next assessed two of the triple mutants in the human cell EGFP disruption assay (as previously described) to determine whether the engineered variants could target non-canonical PAMs in a human cell context (
[0146] Overall, we've identified mutations in SaCas9 (KKQ or KKH variants) that appear to relax the preference of the wild-type enzyme at the third position of the PAM from a G to an R (A or G). This effectively relaxes the targeting of SaCas9 from an NNGRRT (SEQ ID NO:46) PAM constraint to an NNRRRT (SEQ ID NO:45) PAM.
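The claim that relaxing NNGRRT to NNRRRT doubles the theoretical targeting range can be checked by enumerating the concrete sequences each motif covers. The sketch below uses the standard IUPAC codes (R = A or G, N = any base); the helper name `expand_pam` is an assumption for illustration, and the count treats all bases as equally frequent, which real genomes are not.

```python
from itertools import product

# Subset of the IUPAC nucleotide codes needed for these PAM motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}


def expand_pam(motif: str) -> set[str]:
    """Enumerate every concrete sequence matched by a degenerate PAM motif."""
    return {"".join(bases) for bases in product(*(IUPAC[code] for code in motif))}


canonical = expand_pam("NNGRRT")   # wild-type SaCas9 PAM
relaxed = expand_pam("NNRRRT")     # consensus PAM of the triple mutants

print(len(canonical), len(relaxed))
# The sequences gained by the relaxation are exactly the NNARRT PAMs.
print(relaxed - canonical == expand_pam("NNARRT"))
```

Because the third position goes from one allowed base (G) to two (A or G) while every other position is unchanged, the relaxed motif covers twice as many six-nucleotide sequences as the canonical one.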
[0147] Because we had successfully derived variants that could target NNARRT (SEQ ID NO:43) PAMs in human cells, we next asked whether we could engineer variants with specificity for NNCRRT (SEQ ID NO:47) or NNTRRT (SEQ ID NO:48). To do so, we first mutated R1015 to E (to specify a C at the 3rd position of the PAM) and to L or M (to specify a T at the 3rd position of the PAM), and tested these against their expected PAMs in our bacterial positive selection assay (previously described) (
[0148] For the SaCas9 variants evolved against NNARRT (SEQ ID NO:43) PAMs, the E782K and N968K mutations were essential along with R1015(H/Q). To test whether these mutations would increase the activity of the R1015(E/L/M) variants against their expected PAMs, we generated the KKE, KKL, and KKM variants. As shown in
[0149] We were also curious as to whether the KKQ, KKH, KKE, KKL, or KKM variants had relaxed specificity against any nucleotide at the 3rd position of the PAM, so we interrogated a number of sites in our bacterial positive selection assay containing NNNRRT PAMs. As shown in
[0150] Thus, the KKH variant (and some of the other variants in
TABLE-US-00011 TABLE 6 SEQ ID residues A652-G1053 of SaCas9 NO: Wild Type SaCas9 ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN Aa ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK 652- YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY 1053 of HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA SEQ ID HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE NO: 2. AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPRIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG Sequences of selected clones of SaCas9 variants ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 53. ADFIFKEWKRLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVKSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSNKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 54. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKELINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNMVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFMASFYKNDLIKFNGELYRVIGVNNDLLNRIEVNMIDITYREYLENMN DKRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDIKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIANA 55. DFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKY SHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYH HDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHL NITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAK KLKKISNQAEFISSFYSNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRP PHIIKRIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 56. 
ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIVITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPEIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 57. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIRINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSIKGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 58. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDFK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDNYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYRGYLENMND KRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 59. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMKD KRPPHIIKTIASKTQSIIKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 60. 
ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEHEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSNKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 61. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHINDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKRPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 62. ADFIFKEWKKLDKAKKLMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVRNLDVIKKENHYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 63. ADFIFKEWKKLDKAKKVMENQMFEEKQAESKPEIETEQEYKEIFITPHQIKHIKDFKDYKY SHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYH HDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHL DITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNTKCYEEAK KLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKR PPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIVKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 64. 
ADFIFKEWKKLDRAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGYTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTITSKTQSIKKYSTDILGNLYEVKSKKQPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 65. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLILEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAH LDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVIVKNLDVIKKENYYEVNSKCYEEA KKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDK RPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 66. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGIYKFVTVKNMDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYIENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 67. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVRNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQVIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 68. 
ADFIFKEWKRLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDDKNPLYKYYEETGNYLIKYSKKDNGPVIKKIKYYGNKLNAH LDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVRNLDVIKKENYYEVNSKCYEEA KKLKKISNQAEFIAYFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDK RPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIK*G ATRGLMNLLRSYFRVNNLDVKVKSINGGFTRFLRRKWKFKKERNKGYKHHAEDALIIAN 69. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQITKKG ATRGLMNLLRNYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 70. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDQQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKNENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQNIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQII*KG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 71. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGYYLTKYSKKDNGPVIKKIKYYGNKINAH LDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEA KKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDK RPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIVKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 72. 
ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 73. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYKNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 74. ADFIFKEWKKLDKAKKLMENQMFEEKQAESMPEIETEQEYKEIFMTPHQIKHIKDFKDY KYSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLM YHHDPQTYQKLKLIMEQYGDEKNPLYKYHEETGNYLTKYSKKDNGPVIKKIKYYGNKLN AHLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYE EAKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMN DKRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 75. and ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK 966 YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDIGPVIKKIKYYGNKLNAH LDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEA KKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNYK RPPQIIKTIASKTQSIKKYSSDILGNLYEVKSKKHP*IIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 76. 
ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIINTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 77. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKELFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYKNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKCSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 78. ADFIFKEWKKLDKAKKVMENQMFEKKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRGLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEDTGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDLIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENVNDK RPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 79. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGIYKFVTVKNLDVIKKENYYEVNSKCYEKA KKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDK RPPQIIKTIASKTQSIKKYSTDILGNLYEVKSNKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 80. 
ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFIIPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 81. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETQQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASYYNNDLIKINGELYRVIGVNNDLLNRIEVKMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPHIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 82. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNILNGLYDKDNDKLKKLINKSPEKLLMYH HDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIRYYGNKLNAHL DITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAK KLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKR PPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 83. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYFENMNV KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 84. 
ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKHNRELVNDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLM YHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTITSKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 85. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIMDFKDY KYSHRVDKKPNRELINDTLYSTRKDEKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLM YHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVRNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 86. ADFIFKEWKKLDKAKKVMENQMFEEKQAVSMPEIETEQEYKEIFINPHQIKHIKDFKDY KYSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKYNDKLKKLINKSPEKLLM YHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSRKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYRENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 87. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKNENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENINGK RPPQIIKTITSKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNNGYKHHAEDALIIAN 88. 
ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKEFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGIYLTKYSKKDNGPVIKKIKYYGNKLNAH LDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYGEA KKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLEIMNDKR PPQIIKTIASKTQSIKKYSTDILGNLYEVKSNKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 89. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKVIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYIYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAH LDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEA KKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDK RPPQIIKTIASKTQSIKKYSTDILGNVYEVKSKKHPQIIIKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 90. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETGQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLKSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 91. ADFIFKEWKKLDKSKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKHNRKLINDTLYSTRKDDKGNTLIVNNINGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNTIDITYREYLENMNDK RPPQIIKTIASKTQSIKKYSTDILGNLYEVKPKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 92. 
and ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK 966 YSHRVDKKPNRKLINDTLYSTREDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDISDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIYITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHP*IIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 93. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPYQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVRNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 94. ADFIFKEWKKLDNAKKVMENQMFEEKRAESMPEIETEQEYKEIFITPHQIKHIKDFKDFK YSHMVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLIY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 95. and ADFIFKEWIRLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK 966 YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHP*IIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIVAN 96. 
and ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK 966 YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVINLNGLYDKDNDKLKKLINKSPEKLLMYH HDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHL DITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAK KLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKR PPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHP*IIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 97. and ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK 966 YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHP*IIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 98. and ADFIFKEWKKLDKAKKVMENQMFEEKQAMSMPEIETEQEYKEIFITPHQIKHIKDFKDY 966 KYSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLM YHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPDSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSQKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 99. ADFIFKEWKKLDKAKKVMENQMFEEKQAGSMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNRLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLESMNDK RPPQIIKTIASKTQTIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYYRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 100. 
ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLTNKSPGKLLM YHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA YLDITDDYPNSRNNVVKLSLKPYRFDVYLDNGVYKFVIVKNLDVIEKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPHIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 101. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKFKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVIVKNLDVIKKDNYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIATKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRTYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 102. ADFIFKEWKKLDKAKKVMENQMFEEKHAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLIDKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIK*G ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 103. ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYNEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLYVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG ATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIAN 104. 
ADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYK YSHRVDKKPNRKLINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMY HHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNA HLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEE AKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMND KRPPQIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG
Methods for Example 3
[0151] The following materials and methods were used in Example 3.
Plasmids and Oligonucleotides
[0152] Oligonucleotides are listed in Table 11, sgRNA target sites are listed in Table 12, and plasmids used in this study are listed in Table 10.
[0153] Bacterial Cas9/sgRNA expression plasmids were used to express both a human codon optimized version of SaCas9 and the sgRNA, each expressed under a separate T7 promoter. Bacterial expression plasmids used in the selections were derived from BPK2101 (see Examples 1-2) while those used in the site-depletion assay were modified to express a sgRNA with a shortened repeat:anti-repeat sequence (see below). All sgRNAs in these bacterial expression plasmids included two guanines at the 5′ end of the spacer sequence for proper expression from the T7 promoter.
[0154] To generate libraries of SaCas9 variants, amino acids M657-G1053 of SaCas9 were randomly mutagenized using Mutazyme II (Agilent Technologies) at a frequency of ˜5.5 mutations/kilobase. Both wild-type and R1015Q SaCas9 were used as starting template for mutagenesis, resulting in two libraries with estimated complexities of greater than 6×10.sup.6 clones.
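As a rough illustration of what the stated mutation frequency implies per clone, the following sketch computes the expected number of nucleotide substitutions across the mutagenized region. It assumes the mutagenized amplicon corresponds exactly to the codons for residues M657 through G1053 and that the rate applies uniformly; the function name is hypothetical and not part of the original methods.

```python
def expected_mutations(first_residue: int, last_residue: int,
                       mutations_per_kb: float) -> float:
    """Expected nucleotide substitutions over an inclusive residue range.

    Assumption: the mutagenized DNA spans exactly the codons for the
    residue range, with three bases per codon and a uniform rate.
    """
    n_codons = last_residue - first_residue + 1
    n_bases = 3 * n_codons
    return mutations_per_kb * n_bases / 1000.0


# Residues M657-G1053 at ~5.5 mutations/kilobase (values from the text).
region_mutations = expected_mutations(657, 1053, 5.5)
print(round(region_mutations, 2))
```

Under these assumptions the region covers 397 codons (1191 bases), so each clone would carry on the order of six to seven nucleotide changes on average.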
[0155] Positive selection plasmids were assembled by ligating oligonucleotide duplexes encoding target sites into XbaI/SphI-digested p11-lacY-wtx1 (Chen, Z. & Zhao, H. A highly sensitive selection method for directed evolution of homing endonucleases. Nucleic Acids Res 33, e154 (2005)). For the site-depletion experiments, two separate libraries containing different spacer sequences were generated. For each library, an oligonucleotide containing 8 randomized nucleotides adjacent to the spacer sequence (in place of the PAM) was complexed with a bottom strand primer and filled in using Klenow(-exo) (refer to Table 11). The resulting product was digested with EcoRI and ligated into EcoRI/SphI-digested p11-lacY-wtx1. Estimated complexities of the two site-depletion libraries were greater than 4×10.sup.6 clones.
[0156] For human cell experiments, human codon-optimized wild-type and variant SaCas9s were expressed from a plasmid containing a CAG promoter (Table 10). sgRNA expression plasmids (containing a U6 promoter) were generated by ligating oligonucleotide duplexes encoding the spacer sequence into BsmBI-digested VVT1 (see Examples 1-2) or BPK2660 (containing the full-length 120 nt crRNA:tracrRNA sgRNA or an 84 nt shortened repeat:anti-repeat version, respectively). All sgRNAs used in this study for human expression included one guanine at the 5′ end of the spacer to ensure proper expression from the U6 promoter, and also used a shortened sgRNA (
Bioinformatic Analysis of Cas9 Orthologue Sequences
[0157] Similar to alignments performed in previous studies (Fonfara, I. et al. Phylogeny of Cas9 determines functional exchangeability of dual-RNA and Cas9 among orthologous type II CRISPR-Cas systems. Nucleic Acids Res 42, 2577-2590 (2014); Ran, F. A. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186-191 (2015); Anders, C., Niewoehner, O., Duerst, A. & Jinek, M. Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature 513, 569-573 (2014)), Cas9 orthologues similar to both SpCas9 and SaCas9 were aligned using ClustalW2 (ebi.ac.uk/Tools/msa/clustalw2/). The resulting phylogenetic tree and protein alignment were visualized using Geneious version 8.1.6 and ESPript (espript.ibcp.fr/ESPript/ESPript/).
Bacterial-Based Positive Selection Assay
[0158] The bacterial positive selection assays were performed as previously described (See Examples 1-2). Briefly, Cas9/sgRNA plasmids were transformed into E. coli BW25141(λDE3) (Kleinstiver et al., Nucleic Acids Res 38, 2411-2427 (2010)) containing a positive selection plasmid. Transformations were plated on both non-selective (chloramphenicol) and selective (chloramphenicol+10 mM arabinose) conditions. Cas9 cleavage of the selection plasmid was estimated by calculating the percent survival: (# of colonies on selective plates/# of colonies on non-selective plates)×100. To select for SaCas9 variants capable of recognizing alternative PAMs, the wild-type and R1015Q libraries with mutagenized PI domains were transformed into competent E. coli BW25141(λDE3) containing positive selection plasmids with NNAAGT (SEQ ID NO:41), NNAGGT (SEQ ID NO:42), NNCAGT (SEQ ID NO:511), NNCGGT (SEQ ID NO:512), NNTAGT (SEQ ID NO:513), or NNTGGT (SEQ ID NO:514) PAMs. Approximately 1×10.sup.5 clones were screened by plating on selective conditions, and surviving colonies containing SaCas9 variants presumed to cleave the selection plasmid were mini-prepped (MGH DNA Core). All variants were re-screened individually in the positive selection assay, and those with greater than ˜20% survival were sequenced to determine the mutations required for recognition of the alternate PAM.
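The percent-survival calculation and the re-screening threshold described above can be sketched as follows. This is a minimal illustration only; the function names are hypothetical, and treating the approximately 20% threshold as a strict "greater than" comparison is an assumption based on the wording of the paragraph.

```python
def percent_survival(selective_colonies: int, nonselective_colonies: int) -> float:
    """(# colonies on selective plates / # colonies on non-selective plates) x 100."""
    if nonselective_colonies <= 0:
        raise ValueError("non-selective plate must have at least one colony")
    return 100.0 * selective_colonies / nonselective_colonies


def passes_rescreen(survival_pct: float, threshold_pct: float = 20.0) -> bool:
    """True if a re-screened variant exceeds the ~20% survival threshold.

    Assumption: the cutoff is applied as a strict inequality.
    """
    return survival_pct > threshold_pct


# Illustrative colony counts (not data from the study).
survival = percent_survival(45, 150)
print(survival, passes_rescreen(survival))
```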
Bacterial-Based Site-Depletion Assay
[0159] The site-depletion experiments were performed as previously described (See Examples 1-2). Briefly, the randomized PAM libraries were electroporated into competent E. coli BW25141(λDE3) containing either wild-type, catalytically inactive (D10A/H557A), or KKH variant SaCas9/sgRNA plasmids. Greater than 1×10.sup.5 colonies were plated on chloramphenicol/carbenicillin plates, and selection plasmids with PAMs resistant to Cas9 targeting contained within the surviving colonies were isolated by maxiprep (Qiagen). The region of the plasmid containing the spacer sequence and PAM was PCR-amplified using the primers listed in Table 11. The KAPA HTP library preparation kit (KAPA BioSystems) was used to generate a dual-indexed Tru-seq Illumina sequencing library using ˜500 ng purified PCR product from each site-depletion condition prior to an Illumina MiSeq high-throughput sequencing run at the Dana-Farber Cancer Institute Molecular Biology Core. The data from the site-depletion experiments was analyzed as previously described (See Examples 1-2), with the exception that the script was modified to analyze 8 randomized nucleotides. Cas9 ability to recognize PAMs was determined by calculating the post-selection PAM depletion value (PPDV) of any given PAM: the ratio of the post-selection frequency of that PAM to the pre-selection library frequency. A control experiment using catalytically inactive SaCas9 was used to establish that a PPDV of 0.794 represents statistically significant depletion relative to the input library.
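The PPDV definition above (post-selection frequency of a PAM divided by its pre-selection library frequency) can be sketched as shown below. This is not the analysis script referred to in the text; the function names are hypothetical, the inputs are assumed to be lists of PAM strings already extracted from sequencing reads, and treating values at or below 0.794 as depleted is an assumption about how the cutoff is applied.

```python
from collections import Counter


def pam_frequencies(pams):
    """Fraction of reads carrying each PAM sequence."""
    counts = Counter(pams)
    total = sum(counts.values())
    return {pam: n / total for pam, n in counts.items()}


def ppdv(pre_selection_pams, post_selection_pams):
    """Post-selection PAM depletion value for every PAM seen pre-selection.

    PAMs absent from the pre-selection library are skipped, because the
    ratio is undefined for them.
    """
    pre = pam_frequencies(pre_selection_pams)
    post = pam_frequencies(post_selection_pams)
    return {pam: post.get(pam, 0.0) / freq for pam, freq in pre.items()}


def is_depleted(value: float, cutoff: float = 0.794) -> bool:
    """Assumption: a PPDV at or below the cutoff counts as depleted."""
    return value <= cutoff


# Toy read lists (illustrative only, not data from the study).
pre_reads = ["AAGAGT"] * 2 + ["AACAGT"] * 2
post_reads = ["AAGAGT"] * 1 + ["AACAGT"] * 3
values = ppdv(pre_reads, post_reads)
print({pam: (v, is_depleted(v)) for pam, v in values.items()})
```

A low PPDV indicates that plasmids bearing that PAM were cleaved and lost during selection, so lower values correspond to better PAM recognition.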
Human Cell Culture and Transfection
[0160] U2OS cells obtained from our collaborator T. Cathomen (Freiburg) and U2OS.EGFP cells harboring a single integrated copy of an EGFP-PEST reporter gene (Reyon, D. et al. FLASH assembly of TALENs for high-throughput genome editing. Nat Biotechnol 30, 460-465 (2012)) were cultured in Advanced DMEM medium (Life Technologies) with 10% FBS, penicillin/streptomycin, and 2 mM GlutaMAX (Life Technologies) at 37° C. with 5% CO.sub.2. Cell line identities were validated by STR profiling (ATCC) and deep sequencing, and cells were tested bi-weekly for mycoplasma contamination. U2OS.EGFP culture medium was additionally supplemented with 400 μg/mL G418. Cells were co-transfected with 750 ng Cas9 plasmid and 250 ng sgRNA plasmid using the DN-100 program of a Lonza 4D-nucleofector following the manufacturer's instructions.
Human Cell EGFP Disruption Assay
[0161] EGFP disruption experiments were performed as previously described (Fu, Y. et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat Biotechnol 31, 822-826 (2013); Reyon, D. et al. FLASH assembly of TALENs for high-throughput genome editing. Nat Biotechnol 30, 460-465 (2012)). Approximately 52 hours post-transfection, a Fortessa flow cytometer (BD Biosciences) was used to measure EGFP fluorescence in transfected U2OS.EGFP cells. Negative control transfections of Cas9 and empty U6 promoter plasmids were used to establish background EGFP loss at ˜2.5% for all experiments (represented as a dashed line in the figures).
T7E1 assay
[0162] T7E1 assays were performed as previously described (Reyon, D. et al. FLASH assembly of TALENs for high-throughput genome editing. Nat Biotechnol 30, 460-465 (2012)) to quantify Cas9-induced mutagenesis at endogenous loci in human cells. Approximately 72 hours post-transfection, genomic DNA was isolated using the Agencourt DNAdvance Genomic DNA Isolation Kit (Beckman Coulter Genomics). Target loci were PCR-amplified from ˜100 ng of genomic DNA using the primers listed in Table 11. Following an Agencourt Ampure XP clean-up step (Beckman Coulter Genomics), ˜200 ng purified PCR product was denatured and hybridized prior to digestion with T7E1 (New England Biolabs). Following a second clean-up step, mutagenesis frequencies were quantified using a Qiaxcel capillary electrophoresis instrument (Qiagen).
GUIDE-Seq Experiments
[0163] GUIDE-seq experiments were performed and analyzed as previously described (Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat Biotechnol 33, 187-197 (2015)). Briefly, U2OS cells were transfected as described above with Cas9 and sgRNA plasmids, as well as 100 pmol of a phosphorylated, phosphorothioate-modified double-stranded oligodeoxynucleotide (dsODN) with an embedded NdeI site. Restriction fragment length polymorphism (RFLP) analyses were performed to determine dsODN-tag integration frequencies (See Examples 1-2; Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat Biotechnol 33, 187-197 (2015)), and T7E1 assays were performed to quantify on-target Cas9 mutagenesis frequencies. dsODN tag-specific amplification and library preparation (Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat Biotechnol 33, 187-197 (2015)) were performed prior to high-throughput sequencing using an Illumina MiSeq Sequencer. When mapping potential off-target sites, the cut-off for alignment to the on-target spacer sequence was set at 8 mismatches for 21 nucleotide spacers, 9 mismatches for 22 nucleotide spacers, and 10 mismatches for 23 nucleotide spacers. Off-target sites with potential DNA- or RNA-bulges (Lin, Y. et al. CRISPR/Cas9 systems have off-target activity with insertions or deletions between target DNA and guide RNA sequences. Nucleic Acids Res 42, 7473-7485 (2014)) were identified by manual alignment.
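The spacer-length-dependent mismatch cut-offs stated above can be expressed as a simple lookup. The sketch below is not the GUIDE-seq analysis software; the names are hypothetical, it compares ungapped sequences of equal length only (bulged sites were handled by manual alignment in the text), and treating the cut-off as inclusive is an assumption.

```python
# Maximum mismatches allowed per spacer length (values from the text).
MAX_MISMATCHES = {21: 8, 22: 9, 23: 10}


def count_mismatches(spacer: str, site: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(spacer) != len(site):
        raise ValueError("spacer and site must have the same length")
    return sum(a != b for a, b in zip(spacer.upper(), site.upper()))


def within_cutoff(spacer: str, site: str) -> bool:
    """True if the site is within the mismatch cut-off for this spacer length.

    Assumption: the cut-off is inclusive (e.g. 8 mismatches is allowed
    for a 21 nucleotide spacer).
    """
    limit = MAX_MISMATCHES.get(len(spacer))
    if limit is None:
        raise ValueError("no cut-off defined for this spacer length")
    return count_mismatches(spacer, site) <= limit


# 21 nt spacer for EMX1 site 2 from Table 12; the mismatched site is synthetic.
spacer = "GCAGGCTCTCCGAGGAGAAGG"
swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
site_8_mismatches = "".join(swap[b] for b in spacer[:8]) + spacer[8:]
print(within_cutoff(spacer, site_8_mismatches))
```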
TABLE-US-00012 TABLE 10 Plasmids used in Example 3
Name (SEQ ID NO:): Description

BPK2101 (SEQ ID NO: 10): T7-humanSaCas9-NLS-3xFLAG-T7-BsaIcassette-Sa-sgRNA(120). Addgene ID: 65770. T7 promoters at 1-17 and 3418-3434, human codon optimized S. aureus Cas9 at 88-3352, NLS at 3256-3276, 3xFLAG tag at 3283-3348, BsaI sites at 3437-3442 and 3485-3490, gRNA at 3492-3616, T7 terminator at 3627-2674 of SEQ ID NO: 10.

MSP2283 (SEQ ID NO: 21): T7-humanSaCas9-NLS-3xFLAG-T7-site1-Sa-sgRNA(84). T7 promoters at nts 1-17 and 3418-3434, human codon optimized S. aureus Cas9 at 88-3351, NLS at 3256-3276, 3xFLAG tag at 3243-33348, site 1 spacer at 3435-3455, sgRNA(84) at 3456-3539, T7 terminator at 3562-3609 of SEQ ID NO: 21.

MSP2262 (SEQ ID NO: 22): T7-humanSadCas9(D10A, H557A)-NLS-3xFLAG-T7-site1-Sa-sgRNA(84). T7 promoters at nts 1-17 and 3418-3434, human codon optimized S. aureus Cas9 at 88-3351, modified codons at 118-120 and 1759-1761, NLS at 3256-3276, 3xFLAG tag at 3243-33348, site 1 spacer at 3435-3455, sgRNA(84) at 3456-3539, T7 terminator at 3562-3609 of SEQ ID NO: 22.

MSP2253 (SEQ ID NO: 23): T7-humanSaCas9(E782K, N968K, R1015H)-NLS-3xFLAG-T7-site1-Sa-sgRNA(84). T7 promoters at nts 1-17 and 3418-3434, human codon optimized S. aureus Cas9 at 88-3351, modified codons at 2434-2436, 2992-2994, and 3133-3135, NLS at 3256-3276, 3xFLAG tag at 3243-33348, site 1 spacer at 3435-3455, sgRNA(84) at 3456-3539, T7 terminator at 3562-3609 of SEQ ID NO: 23.

MSP2266 (SEQ ID NO: 24): T7-humanSaCas9-NLS-3xFLAG-T7-site2-Sa-sgRNA(84). T7 promoters at 1-17 and 3419-3434, human codon optimized S. aureus Cas9 at 88-3351, NLS at 3256-3276, 3xFLAG tag at 3283-3348, site 2 spacer at 3435-3455, sgRNA(84) at 3456-3539, T7 terminator at 3562-3609 of SEQ ID NO: 24.

MSP2279 (SEQ ID NO: 25): T7-humanSadCas9(D10A, H557A)-NLS-3xFLAG-T7-site2-Sa-sgRNA(84). T7 promoters at 1-17 and 3419-3434, human codon optimized S. aureus Cas9 at 88-3351, modified codons at 118-120 and 1759-1761, NLS at 3256-3276, 3xFLAG tag at 3283-3348, site 2 spacer at 3435-3455, sgRNA(84) at 3456-3539, T7 terminator at 3562-3609 of SEQ ID NO: 25.

MSP2292 (SEQ ID NO: 26): T7-humanSaCas9(E782K, N968K, R1015H)-NLS-3xFLAG-T7-site2-Sa-sgRNA(84). T7 promoters at 1-17 and 3419-3434, human codon optimized S. aureus Cas9 at 88-3351, modified codons at 2434-2436, 2992-2994, and 3133-3135, NLS at 3256-3276, 3xFLAG tag at 3283-3348, site 2 spacer at 3435-3455, sgRNA(84) at 3456-3539, T7 terminator at 3562-3609 of SEQ ID NO: 26.

p11-lacY-wtx1 (SEQ ID NO: —): BAD-ccDB-Amp.sup.R-AraC-lacY(A177C) (Chen et al, 2005).

BPK2139 (SEQ ID NO: 17): CAG-humanSaCas9-NLS-3xFLAG. Addgene ID: 65776. Human codon optimized S. aureus Cas9 1-3195, NLS 3169-3189, 3xFLAG tag 3196-3261 of SEQ ID NO: 17.

MSP1830 (SEQ ID NO: 27): CAG-humanSaCas9(E782K, N968K, R1015H)-NLS-3xFLAG (KKH variant). Human codon optimized S. aureus Cas9 1-3264, NLS 3169-3189, modified codons at 2347-2349, 2905-2907, and 3046-3048, 3xFLAG tag 3196-3261 of SEQ ID NO: 27.

VVT1 (SEQ ID NO: 20): U6-BsmBIcassette-Sa-sgRNA(120). Addgene ID: 65779. U6 promoter 1-318, BsmBI sites at 320-325 and 333-338, S. aureus gRNA 340-466, U6 terminator 459-466 of SEQ ID NO: 20.

BPK2660 (SEQ ID NO: 28): U6-BsmBIcassette-Sa-sgRNA(84). U6 promoter 1-318, BsmBI sites at 320-325 and 333-338, S. aureus gRNA 340-423, U6 terminator 424-430 of SEQ ID NO: 28.
TABLE-US-00013 TABLE 11 Oligonucleotides used in Example 3
Columns: SEQ ID NO; Sequence; Description

Oligos used to generate positive selection plasmids
SEQ ID NO: 515. Sequence: ctagaGGGtGGGcGGGaGGGTCGCCCTCGAACTTCACCTtgGAGTgcatg. Description: top oligo to clone site 2 with an NNGAGT (SEQ ID NO: 5) PAM into the positive selection vector (XbaI/SphI cut p11-lacY-wtx1)
SEQ ID NO: 516. Sequence: cACTCcaAGGTGAAGTTCGAGGGCGACCCtCCCgCCCaCCCt. Description: bottom oligo to clone site2 into the positive selection vector

Oligos used to generate libraries for site-depletion experiments
SEQ ID NO: 517. Sequence: GCAGgaattcGGGAGGGGCACGGGCAGCTTGCCGGNNNNNNNNCTNNNGCGCAGGTCACGAGGCATG. Description: top strand oligo for site 1 PAM library, cut with EcoRI once filled in
SEQ ID NO: 518. Sequence: GCAGgaattcGGAGGGTCGCCCTCGAACTTCACCTNNNNNNNNCTNNNGCGCAGGTCACGAGGCATG. Description: top strand oligo for site 2 PAM library, cut with EcoRI once filled in
SEQ ID NO: 200. Sequence: /5Phos/CCTCGTGACCTGCGC. Description: reverse primer to fill in library oligos

Primers used to amplify site-depletion libraries for sequencing
SEQ ID NO: 201. Sequence: GATACCGCTCGCCGCAGC. Description: forward primer
SEQ ID NO: 202. Sequence: CTGCGTTCTGATTTAATCTGTATCAGGC. Description: reverse primer

Primers used for T7E1 and RFLP experiments
SEQ ID NO: 209. Sequence: GGAGCAGCTGGTCAGAGGGG. Description: forward primer targeted to EMX1 in U2OS human cells
SEQ ID NO: 210. Sequence: CCATAGGGAAGGGGGACACTGG. Description: reverse primer targeted to EMX1 in U2OS human cells
SEQ ID NO: 211. Sequence: GGGCCGGGAAAGAGTTGCTG. Description: forward primer targeted to FANCF in U2OS human cells
SEQ ID NO: 212. Sequence: GCCCTACATCTGCTCTCCCTCC. Description: reverse primer targeted to FANCF in U2OS human cells
SEQ ID NO: 213. Sequence: CCAGCACAACTTACTCGCACTTGAC. Description: forward primer targeted to RUNX1 in U2OS human cells
SEQ ID NO: 214. Sequence: CATCACCAACCCACAGCCAAGG. Description: reverse primer targeted to RUNX1 in U2OS human cells
SEQ ID NO: 652. Sequence: TCCAGATGGCACATTGTCAG. Description: forward primer targeted to VEGFA in U2OS human cells
SEQ ID NO: 653. Sequence: AGGGAGCAGGAAAGTGAGGT. Description: reverse primer targeted to VEGFA in U2OS human cells
TABLE-US-00014 TABLE 12 sgRNA target sites for Example 3 Spacer SEQ SEQ Prep length ID ID Name Name (nt) Spacer Sequence NO:: Sequence with PAM NO: In VVT1 (120) EGFP MSP1428 NNGRRT 21 GCCCTCGAACTTCACCTCGGC 405 GCCCTCGAACTTCACCTCGGCG 406 1 CGGGT (SEQ ID NO: 46) MSP1400 NNGRRT 21 GCAACATCCTGGGGCACAAGC 397 GCAACATCCTGGGGCACAAGCT 398 2 GGAGT (SEQ ID NO: 46) MSP1401 NNGRRT 21 GTTGTACTCCAGCTTGTGCCC 519 GTTGTACTCCAGCTTGTGCCCC 520 3 AGGAT (SEQ ID NO: 46) MSP1403 NNGRRT 22 GCAAGGGCGAGGAGCTGTTCAC 409 GCAAGGGCGAGGAGCTGTTCAC 410 4 CGGGGT (SEQ ID NO: 46) MSP1748 NNARRT 20 GGACGGCGACGTAAACGGCC 521 GGACGGCGACGTAAACGGCCAC 522 1 AAGT (SEQ ID NO: 43) MSP1754 NNARRT 21 GAACTTCAGGGTCAGCTTGCC 523 GAACTTCAGGGTCAGCTTGCCG 524 5 TAGGT (SEQ ID NO: 43) MSP2030 NNCRRT 20 GTCGATGCCCTTCAGCTCGA 525 GTCGATGCCCTTCAGCTCGATG 526 2 CGGT (SEQ ID NO: 47) MSP2034 NNCRRT 22 GTGACCACCCTGACCTACGGCG 527 GTGACCACCCTGACCTACGGCG 528 4 TGCAGT (SEQ ID NO: 47) MSP2040 NNTRRT 20 GATATAGACGTTGTGGCTGT 529 GATATAGACGTTGTGGCTGTTG 530 1 TAGT (SEQ ID NO: 48) MSP2045 NNTRRT 21 GGTGAAGTTCGAGGGCGACAC 531 GGTGAAGTTCGAGGGCGACACC 532 3 CTGGT (SEQ ID NO: 48) In BPK2660 (84) EGFP MSP2149* NNARRT 20 GGACGGCGACGTAAACGGCC 521 GGACGGCGACGTAAACGGCCAC 522 1 AAGT (SEQ ID NO: 43) MSP2152 NNARRT 21 GTAGTTGCCGTCGTCCTTGAA 654 GTAGTTGCCGTCGTCCTTGAAG 655 2 AAGAT (SEQ ID NO: 43) MSP2153 NNARRT 22 GCCACCTACGGCAAGCTGACCC 656 GCCACCTACGGCAAGCTGACCC 657 3 TGAAGT (SEQ ID NO: 43) MSP2154 NNARRT 23 GACGGCAACTACAAGACCCGCGC 658 GACGGCAACTACAAGACCCGCG 659 4 CCGAGGT (SEQ ID NO: 43) MSP2150* NNARRT 21 GAACTTCAGGGTCAGCTTGCC 523 GAACTTCAGGGTCAGCTTGCCG 524 5 TAGGT (SEQ ID NO: 43) MSP2155 NNCRRT 20 GCGTGTCCGGCGAGGGCGAG 305 GCGTGTCCGGCGAGGGCGAGGG 533 1 CGAT (SEQ ID NO: 47) MSP2156* NNCRRT 20 GTCGATGCCCTTCAGCTCGA 525 GTCGATGCCCTTCAGCTCGATG 526 2 CGGT (SEQ ID NO: 47) MSP2158 NNCRRT 22 GCTCGACCAGGATGGGCACCAC 534 GCTCGACCAGGATGGGCACCAC 535 3 CCCGGT (SEQ ID NO: 47) MSP2159* NNCRRT 22 GTGACCACCCTGACCTACGGCG 527 GTGACCACCCTGACCTACGGCG 528 4 TGCAGT (SEQ ID NO: 47) 
MSP2145* NNGRRT 21 GCCCTCGAACTTCACCTCGGC 405 GCCCTCGAACTTCACCTCGGCG 406 1 CGGGT (SEQ ID NO: 46) MSP2146* NNGRRT 21 GCAACATCCTGGGGCACAAGC 397 GCAACATCCTGGGGCACAAGCT 398 2 GGAGT (SEQ ID NO: 46) MSP2147 NNGRRT 21 GTTGTACTCCAGCTTGTGCCC 519 GTTGTACTCCAGCTTGTGCCCC 520 3 AGGAT (SEQ ID NO: 46) MSP2148 NNGRRT 22 GCAAGGGCGAGGAGCTGTTCAC 409 GCAAGGGCGAGGAGCTGTTCAC 410 4 CGGGGT (SEQ ID NO: 46) MSP2161* NNTRRT 20 GATATAGACGTTGTGGCTGT 529 GATATAGACGTTGTGGCTGTT 530 1 GTAGT (SEQ ID NO: 48) MSP2162 NNTRRT 21 GGGCGAGGAGCTGTTCACCGG 536 GGGCGAGGAGCTGTTCACCGG 537 2 GGTGGT (SEQ ID NO: 48) MSP2164* NNTRRT 21 GGTGAAGTTCGAGGGCGACAC 531 GGTGAAGTTCGAGGGCGACAC 532 3 CCTGGT (SEQ ID NO: 48) MSP2163 NNTRRT 21 GCACTGCACGCCGTAGGTCAG 538 GCACTGCACGCCGTAGGTCAG 539 4 GGTGGT (SEQ ID NO: 48) Endo- genous genes EMX1 MSP2184** EMX1 1 22 GTGTGGTTCCAGAACCGGAGGA 540 GTGTGGTTCCAGAACCGGAGGA 541 CAAAGT MSP2185 EMX1 2 21 GCAGGCTCTCCGAGGAGAAGG 542 GCAGGCTCTCCGAGGAGAAGGC 543 CAAGT MSP2183 EMX1 3 23 GCCCCTCCCTCCCTGGCCCAGGT 544. GCCCCTCCCTCCCTGGCCCAGG 545. TGAAGGT MSP2199** EMX1 4 21 GCTCAGCCTGAGTGTTGAGGC 546. GCTCAGCCTGAGTGTTGAGGCC 547. CCAGT MSP2202 EMX1 5 21 GCCTGCTTCGTGGCAATGCGCC 548. GCCTGCTTCGTGGCAATGCGCC 549. ACCGGT MSP2168** EMX1 6 21 GCAACCACAAACCCACGAGGG 550. GCAACCACAAACCCACGAGGGC 551. AGAGT MSP2169 EMX1 7 21 GGCCTCCCCAAAGCCTGGCCA 552. GGCCTCCCCAAAGCCTGGCCAG 553. GGAGT MSP2170 EMX1 8 23 GCAGAAGCTGGAGGAGGAAGGGC 554. GCAGAAGCTGGAGGAGGAAGGG 555. CCTGAGT MSP2201 EMX1 9 21 GCTTCGTGGCAATGCGCCACCG 556. GCTTCGTGGCAATGCGCCACCG 557. GTTGAT MSP2200** EMX1 10 22 GGCTCTCCGAGGAGAAGGCCA 558. GGCTCTCCGAGGAGAAGGCCAA 559. GTGGT FANCF MSP2189 FANCF 1 22 GCCTCTCTGCAATGCTATTGGT 560. GCCTCTCTGCAATGCTATTGGT 561. CGAAAT MSP2190 FANCF 2 21 GCGTACTGATTGGAACATCCG 562. GCGTACTGATTGGAACATCCGC 563. GAAAT MSP2186 FANCF 3 23 GACGTCACAGTGACCGAGGGCCT 564. GACGTCACAGTGACCGAGGGCC 565. TGGAAGT MSP2187 FANCF 4 23 GCCCGGCGCACGGTGGCGGGGTC 566. GCCCGGCGCACGGTGGCGGGGT 567. CCCAGGT MSP2188 FANCF 5 21 GGCGGGGTCCCAGGTGCTGAC 568. 
GGCGGGGTCCCAGGTGCTGACG 569. TAGGT MSP2205 FANCF 6 21 GGCGTATCATTTCGCGGATGT 570. GGCGTATCATTTCGCGGATGTT 571. CCAAT MSP2208 FANCF 7 22 GAGACCGCCAGAAGCTCGGAAA 572. GAGACCGCCAGAAGCTCGGAAA 573. AGCGAT MSP2204 FANCF 8 21 GGATCGCTTTTCCGAGCTTCT 574. GGATCGCTTTTCCGAGCTTCTG 575. GCGGT MSP2207** FANCF 9 22 GCGCCCACTGCAAGGCCCGGCG 576. GCGCCCACTGCAAGGCCCGGCG 577. CACGGT MSP2172** FANCF 10 21 GTAGGGCCTTCGCGCACCTCA 578. GTAGGGCCTTCGCGCACCTCAT 579. GGAAT MSP2174 FANCF 11 22 GCAGCCGCCGCTCCAGAGCCGT 580. GCAGCCGCCGCTCCAGAGCCGT 581. GCGAAT MSP2332 FANCF 12 22 GGCCATGCCGACCAAAGCGCCG 582. GGCCATGCCGACCAAAGCGCCG 583. ATGGAT MSP2171** FANCF 13 21 GCAAGGCCCGGCGCACGGTGG 584. GCAAGGCCCGGCGCACGGTGGC 585. GGGGT MSP2173 FANCF 14 22 GAGGCAAGAGGGCGGCTTTGGG 586. GAGGCAAGAGGGCGGCTTTGGG 587. CGGGGT MSP2206 FANCF 15 22 GTGACCGAGGGCCTGGAAGTTC 588. GTGACCGAGGGCCTGGAAGTTC 589. GCTAAT MSP2203** FANCF 16 21 GGGGTCCCAGGTGCTGACGTA 590. GGGGTCCCAGGTGCTGACGTAG 591. GTAGT MSP2209 FANCF 17 22 GTACTGATTGGAACATCCGCGA 592. GTACTGATTGGAACATCCGCGA 593. AATGAT RUNX1 MSP2192 RUNX1 1 23 GTCTGAAGCCATCGCTTCCTCCT 594. GTCTGAAGCCATCGCTTCCTCC 595. TGAAAAT MSP2193 RUNX1 2 21 GGTTTTCGCTCCGAAGGTAAA 596. GGTTTTCGCTCCGAAGGTAAAA 597. GAAAT MSP2195 RUNX1 3 21 GGGACTCCCCAAGCCCTATTA 598. GGGACTCCCCAAGCCCTATTAA 599. AAAAT MSP2235 RUNX1 4 22 GCAGCTTGTTTCACCTCGGTGC 600. GCAGCTTGTTTCACCTCGGTGC 601. AGAGAT MSP2194 RUNX1 5 22 GACCTGTCTTGGTTTTCGCTCC 602. GACCTGTCTTGGTTTTCGCTCC 603. GAAGGT MSP2216 RUNX1 6 23 GCTTCCATCTGATTAGTAAGTAA 604. GCTTCCATCTGATTAGTAAGTA 605. ATCCAAT MSP2214 RUNX1 7 22 GTGCAGAGATGCCTCGGTGCCT 606. GTGCAGAGATGCCTCGGTGCCT 607. GCCAGT MSP2211 RUNX1 8 21 GAGGGTGCATTTTCAGGAGGA 608. GAGGGTGCATTTTCAGGAGGAA 609. GCGAT MSP2217 RUNX1 9 23 GTTTCACCTCGGTGCAGAGATGC 610. GTTTCACCTCGGTGCAGAGATG 611. CCTCGGT MSP2176 RUNX1 22 GCGATGGCTTCAGACAGCATAT 612. GCGATGGCTTCAGACAGCATAT 613. 10 TTGAGT MSP2177 RUNX1 22 GCTCCGAAGGTAAAAGAAATCA 614. GCTCCGAAGGTAAAAGAAATCA 615. 11 TTGAGT MSP2334 RUNX1 22 GAGGCATATGATTACAAGTCTA 616. 
GAGGCATATGATTACAAGTCTA 617. 12 TTGGAT MSP2175** RUNX1 21 GAAAGAGAGATGTAGGGCTAG 618. GAAAGAGAGATGTAGGGCTAGA 619. 13 GGGGT MSP2178** RUNX1 23 GTACTCACCTCTCATGAAGCACT 620. GTACTCACCTCTCATGAAGCAC 621. 14 TGTGGGT MSP2210 RUNX1 21 GAGGTGAGTACATGCTGGTCT 622. GAGGTGAGTACATGCTGGTCTT 623. 15 GTAAT MSP2213 RUNX1 22 GAGAGGAATTCAAACTGAGGCA 624. GAGAGGAATTCAAACTGAGGCA 625. 16 TATGAT MSP2212 RUNX1 21 GAGGCTGAAACAGTGACCTGT 626. GAGGCTGAAACAGTGACCTGTC 627. 17 TTGGT VEGFA MSP2196 VEGFA 1 21 GTACATGAAGCAACTCCAGTC 628. GTACATGAAGCAACTCCAGTCC 629. CAAAT MSP2198 VEGFA 2 21 GACGGGTGGGGAGAGGGACAC 630. GACGGGTGGGGAGAGGGACACA 631. CAGAT MSP2197 VEGFA 3 22 GTCCCAAATATGTAGCTGTTTG 632. GTCCCAAATATGTAGCTGTTTG 633. GGAGGT MSP2219 VEGFA 4 21 GGCCAGGGGTCACTCCAGGAT 634. GGCCAGGGGTCACTCCAGGATT 635. CCAAT MSP2220 VEGFA 5 22 GCCAGAGCCGGGGTGTGCAGAC 636. GCCAGAGCCGGGGTGTGCAGAC 637. GGCAGT MSP2181 VEGFA 6 22 GAGGACGTGTGTGTCTGTGTGG 638. GAGGACGTGTGTGTCTGTGTGG 639. GTGAGT MSP2336 VEGFA 7 22 GGGAGAAGGCCAGGGGTCACTC 640. GGGAGAAGGCCAGGGGTCACTC 641. CAGGAT MSP2179** VEGFA 8 21 GGGTGAGTGAGTGTGTGCGTG 642. GGGTGAGTGAGTGTGTGCGTGT 643. GGGGT MSP2180 VEGFA 9 22 GAGTGAGGACGTGTGTGTCTGT 644. GAGTGAGGACGTGTGTGTCTGT 645. GTGGGT MSP2182 VEGFA 22 GCGTTGGAGCGGGGAGAAGGCC 646. GCGTTGGAGCGGGGAGAAGGCC 647. 10 AGGGGT MSP2218 VEGFA 21 GCTCCATTCACCCAGCTTCCC 648. GCTCCATTCACCCAGCTTCCCT 649. 11 GTGGT *Used in FIGS. 1C and 1E, FIG. 32 **Used for GUIDE-seq experiments in FIG. 3, FIGS. 36A-B
Example 3. Engineering the PAM Specificity of Staphylococcus aureus Cas9
[0164] Site-specific DNA cleavage by CRISPR-Cas9 nucleases is primarily guided by RNA-DNA interactions, but also requires Cas9-mediated recognition of a protospacer adjacent motif (PAM). Although the commonly used Streptococcus pyogenes Cas9 specifies only two nucleotides within its NGG PAM, other Cas9 orthologues with desirable properties recognize longer PAMs. While potentially advantageous from the perspective of specificity, extended PAM sequences can limit the targeting range of Cas9 orthologues for genome editing applications. One possible strategy to broaden the range of sequences targetable by such Cas9 orthologues might be to evolve variants with relaxed specificity for certain positions within the PAM. Here we used molecular evolution to modify the NNGRRT (SEQ ID NO:46) PAM specificity of Staphylococcus aureus Cas9 (SaCas9), a smaller orthologue that is useful for applications requiring viral delivery. One variant we identified, referred to as KKH SaCas9, shows robust genome editing activities at endogenous human target sites with NNNRRT PAMs. Importantly, using the GUIDE-seq method, we showed that both wild-type and KKH SaCas9 induce comparable numbers of off-target effects in human cells. KKH SaCas9 increased the targeting range of SaCas9 by nearly two- to four-fold, enabling targeting of sequences that cannot be altered with the wild-type nuclease. More generally, these results demonstrate the feasibility of relaxing PAM specificity to broaden the targeting range of Cas9 orthologues. Our molecular evolution strategy does not require structural information or a priori knowledge of specific residues that contact the PAM, and therefore should be applicable to a wide range of Cas9 orthologues.
Results
[0165] We devised an unbiased genetic approach for engineering Cas9 variants with relaxed PAM recognition specificities that does not require structural information. We tested this strategy using SaCas9, for which no structural data was available at the time we initiated these studies. In an initial step, we sought to conservatively estimate the PAM-interacting domain for SaCas9 by sequence comparisons with the structurally well-characterized SpCas9 (Jiang et al., Science 348, 1477-1481 (2015); Anders et al., Nature 513, 569-573 (2014); Jinek et al., Science (2014); Nishimasu et al., Cell (2014)). Although SpCas9 and SaCas9 differ substantially at the primary sequence level (
[0166] Because the guanine at the third position in the SaCas9 PAM is the most strictly specified base (Ran et al., Nature 520, 186-191 (2015)), we randomly mutagenized the predicted PI domain and used our previously described bacterial cell-based method (see Examples 1-2) to attempt to select for mutants capable of cleaving sites with each of the three other possible nucleotides at the 3.sup.rd PAM position (i.e., NN[A/C/T]RRT PAMs (NNHRRT (SEQ ID NO:44));
[0167] Our bacterial-based selection results also suggested that the R1015H mutation might at least partially relax the specificity of SaCas9 at the third position of the PAM. However, we found that the R1015H single mutant had suboptimal activity in our previously described human cell-based EGFP disruption assay (Fu et al., Nat Biotechnol 31, 822-826 (2013); Reyon et al., Nat Biotechnol 30, 460-465 (2012)) when tested against sites with any nucleotide at the 3.sup.rd position of NNNRRT PAMs (
[0168] Combined with the selection results from wild-type SaCas9, the most frequent missense mutations identified across all selections were E782K, K929R, N968K, and R1015H (
[0169] To more comprehensively define the PAM specificities of KKH and wild-type SaCas9, we used our previously described bacterial cell-based site-depletion assay (See Examples 1-2) (
[0170] To assess the robustness of the KKH SaCas9 variant in human cells, we tested its activity on 55 different endogenous gene target sites containing a variety of NNNRRT PAMs (
[0171] To demonstrate that the KKH variant enables modification of PAMs that cannot be targeted by wild-type SaCas9, we performed direct comparisons of these nucleases in human cells on sites bearing various NNNRRT PAMs. Assessment of 16 sites using our EGFP disruption assay and 16 endogenous human gene targets (
[0172] To assess the impact of the KKH mutations on the genome-wide specificity of SaCas9, we used the GUIDE-seq (Genome-wide Unbiased Identification of DSBs Enabled by sequencing) method (Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat Biotechnol 33, 187-197 (2015)) to directly compare the off-target profiles of wild-type and KKH SaCas9 with the same sgRNAs. When tested with sgRNAs targeted to six endogenous human gene sites containing NNGRRT (SEQ ID NO:46) PAMs, we observed that wild-type and KKH SaCas9 induced nearly identical GUIDE-seq tag integration rates and on-target cleavage frequencies for all six sites (
[0173] To further examine the genome-wide specificity of KKH SaCas9, we tested five additional sgRNAs targeted to sites containing NNHRRT (SEQ ID NO:44) PAMs (
[0174] Although wild-type SaCas9 remains the optimal choice for targeting NNGRRT (SEQ ID NO:46) PAMs, the KKH SaCas9 variant we describe here can robustly target sites with NNARRT (SEQ ID NO:43) and NNCRRT (SEQ ID NO:47) PAMs and has a reasonable success rate for sites with NNTRRT (SEQ ID NO:48) PAMs. Thus, we conservatively estimate that the KKH variant increases the targeting range of SaCas9 by nearly two- to four-fold in random DNA sequence, thereby improving the prospects for more broadly utilizing SaCas9 in a variety of different applications that require highly precise targeting. Using GUIDE-seq, we demonstrated that KKH SaCas9 induces similar numbers of off-target mutations as wild-type SaCas9 when targeted to the same sites that contain NNGRRT (SEQ ID NO:46) PAMs. Also, KKH SaCas9 induces only a small number of off-target mutations when targeted to sites bearing NNHRRT (SEQ ID NO:44) PAMs. Although KKH SaCas9 recognizes a modified PAM sequence relative to wild-type SaCas9, our findings are not entirely surprising given that the total combined length of the protospacer and PAM is still long enough with the KKH variant (24 to 26 bps) to be reasonably orthogonal to the human genome. Furthermore, it is possible that modifying PAM recognition can improve specificity by altering the energetics of Cas9/sgRNA interaction with its target site (similar to the previously proposed mechanisms for improved specificities of truncated sgRNAs (Fu, Y., Sander, J. D., Reyon, D., Cascio, V. M. & Joung, J. K. Improving CRISPR-Cas nuclease specificity using truncated guide RNAs. Nat Biotechnol 32, 279-284 (2014)) or the D1135E SpCas9 mutant (See Examples 1-2)).
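The two- to four-fold estimate can be illustrated with a back-of-the-envelope calculation of how often a degenerate PAM occurs at a given position in random DNA. The sketch below assumes independent, equiprobable bases and considers one strand only; the function name is hypothetical, and using NNRRRT (third position A or G) as a stand-in for the lower bound is an illustrative assumption rather than a PAM class named in the text.

```python
from fractions import Fraction

# Number of bases matched by each IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def pam_probability(pam: str) -> Fraction:
    """Probability that a random site matches the PAM (uniform, independent bases)."""
    p = Fraction(1)
    for code in pam.upper():
        p *= Fraction(IUPAC_DEGENERACY[code], 4)
    return p


wild_type = pam_probability("NNGRRT")      # third position fixed to G
fully_relaxed = pam_probability("NNNRRT")  # any base at the third position
one_extra_base = pam_probability("NNRRRT") # illustrative: G plus one other base

print(fully_relaxed / wild_type, one_extra_base / wild_type)
```

Under these assumptions, full relaxation of the third PAM position gives a four-fold increase in matching sites, and tolerating a single additional base gives a two-fold increase, which brackets the range stated in the text.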
Example 4. Improving the Activity of the SpCas9-VQR Variant
[0175] Because the SpCas9-VQR variant prefers NGAN PAMs in the order NGAG>NGAA=NGAT>NGAC, we sought to select for derivative variants that had improved activity against NGAH PAMs (where H=A, C, or T). Selections with the R1335Q library (with PI domain randomly mutagenized) against cells that contain target sites with either an NGAA, NGAT, or NGAC PAM enabled us to sequence additional clones that contained mutations that confer an altered PAM specificity. The sequences of these clones revealed additional mutations that might be important for altering PAM specificity towards NGAA, NGAT, or NGAC PAMs.
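To make the degenerate PAM notation concrete, the following sketch scans a DNA sequence for candidate target sites followed by a PAM written in IUPAC codes such as NGAH. It is an illustration only, not a tool used in the study; the function name is hypothetical, only the codes needed here are mapped, and only the given strand is searched.

```python
import re

# Regular-expression fragments for the IUPAC codes used in this document.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "H": "[ACT]", "N": "[ACGT]",
}


def find_sites(sequence: str, spacer_len: int, pam: str):
    """Return (start, protospacer, pam) for every site on the given strand.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    pam_re = "".join(IUPAC_REGEX[code] for code in pam.upper())
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (spacer_len, pam_re))
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(sequence.upper())]


# Synthetic example: a 20 nt protospacer followed by a TGAT PAM.
example = "GTTGTACTCCAGCTTGTGCC" + "TGAT" + "CC"
print(find_sites(example, 20, "NGAH"))
print(find_sites(example, 20, "NGAG"))
```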
[0176] Based on the results of these selections, the VQR variant and 24 other derivative variants were tested against NGAG, NGAA, NGAT, and NGAC PAM sites in bacteria. A number of these derivative variants survived better than the VQR variant on NGAH PAM sites, most of which contained the G1218R mutation (Table 7 and
TABLE-US-00015 TABLE 7 Table of variants and their corresponding amino acid changes.
      variant  D1135  G1218  E1219  R1335  T1337
A1    VRQ      V      R      —      Q      —
A2    NRQ      N      R      —      Q      —
A3    YRQ      Y      R      —      Q      —
A4    VRQL     V      R      —      Q      L
A5    VRQM     V      R      —      Q      M
A6    VRQR     V      R      —      Q      R
A7    VRQE     V      R      —      Q      E
A8    VRQQ     V      R      —      Q      Q
A9    NRQL     N      R      —      Q      L
A10   NRQM     N      R      —      Q      M
A11   NRQR     N      R      —      Q      R
A12   NRQE     N      R      —      Q      E
B1    NRQQ     N      R      —      Q      Q
B2    YRQL     Y      R      —      Q      L
B3    YRQM     Y      R      —      Q      M
B4    YRQR     Y      R      —      Q      R
B5    YRQE     Y      R      —      Q      E
B6    YRQQ     Y      R      —      Q      Q
B7    VRVQE    V      R      V      Q      E
B8    NRVQE    N      R      V      Q      E
B9    YRVQE    Y      R      V      Q      E
B10   VVQE     V      —      V      Q      E
B11   NVQE     N      —      V      Q      E
B12   YVQE     Y      —      V      Q      E
C1    VQR      V      —      —      Q      R
[0177] Given that the results from the bacterial screen demonstrated that some of these additional mutations improved activity against NGAH PAM sites, we tested some of the best candidates in human cells in the EGFP disruption assay. We observed that a number of these variants outperformed the VQR variant at targeting NGAH sites, including the VRQR, NRQR, and YRQR variants (Table 8 and
TABLE-US-00016 TABLE 8 Table of SpCas9-VQR derivatives and their corresponding amino acid changes
variant  D1135  G1218  R1335  T1337
VQR      V      —      Q      R
YRQ      Y      R      Q      —
VRQR     V      R      Q      R
VRQQ     V      R      Q      Q
NRQR     N      R      Q      R
NRQQ     N      R      Q      Q
YRQR     Y      R      Q      R
YRQQ     Y      R      Q      Q
[0178] Because the VRQR variant appeared to be the most robust of those tested, we compared its activity to that of the VQR against 9 different endogenous sites in human cells (2 sites for each NGAA, NGAC, NGAT, and NGAG PAMs, and 1 site for an NGCG PAM). This data reveals that the VRQR variant outperforms the VQR variant at all sites tested in human cells (
[0179] After demonstrating that the VRQR variant has improved activity relative to the VQR variant, we sought to determine whether additional substitutions could further improve activity. Because the selections yielded additional mutations in close proximity to the PAM-interacting pocket of SpCas9, a subset of these mutations was added to the VQR and VRQR variants and screened in bacteria against sites containing NGAG, NGAA, NGAT, and NGAC PAMs (Table 9 and
TABLE 9 Table of variants and their corresponding amino acid changes.

     variant         mutations
1    VQR + L1111H    L1111H/D1135V/R1335Q/T1337R
2    VRQR + L1111H   L1111H/D1135V/G1218R/R1335Q/T1337R
3    VQR + E1219K    D1135V/E1219K/R1335Q/T1337R
4    VQR + E1219V    D1135V/E1219V/R1335Q/T1337R
5    VQR + N1317K    D1135V/N1317K/R1335Q/T1337R
6    VRQR + N1317K   D1135V/G1218R/N1317K/R1335Q/T1337R
7    VQR + G1104K    G1104R/D1135V/R1335Q/T1337R
8    VRQR + G1104K   G1104R/D1135V/G1218R/R1335Q/T1337R
9    VQR + S1109T    S1109T/D1135V/R1335Q/T1337R
10   VRQR + S1109T   S1109T/D1135V/G1218R/R1335Q/T1337R
11   NQR + S1136N    D1135N/S1136N/R1335Q/T1337R
12   NRQR + S1136N   D1135N/S1136N/G1218R/R1335Q/T1337R
13   VQR             D1135V/R1335Q/T1337R
14   VRQR            D1135V/G1218R/R1335Q/T1337R
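The mutation strings in Table 9 follow the usual wild-type residue, position, substituted residue notation, separated by slashes. As a minimal sketch (the helper names `parse_mutations` and `label_matches` are hypothetical and not part of the source), such strings could be parsed and each row's label checked against its mutation list:

```python
import re

# One substitution: wild-type residue, position, substituted residue (e.g. D1135V).
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(text):
    """Split 'D1135V/R1335Q/T1337R' into (wild_type, position, substitution) tuples."""
    parsed = []
    for token in text.replace(" ", "").split("/"):
        match = MUTATION.match(token)
        if match is None:
            raise ValueError("unrecognised mutation: %r" % token)
        wild_type, position, substitution = match.groups()
        parsed.append((wild_type, int(position), substitution))
    return parsed


def label_matches(label, mutations):
    """Check that the substitution named after '+' in a label is in the mutation string."""
    if "+" not in label:
        return True  # base variants such as 'VQR' carry no added substitution
    added = label.split("+", 1)[1].strip()
    return parse_mutations(added)[0] in parse_mutations(mutations)


# Three rows copied from Table 9 for illustration.
rows = [
    ("VQR + L1111H", "L1111H/D1135V/R1335Q/T1337R"),
    ("VQR + G1104K", "G1104R/D1135V/R1335Q/T1337R"),
    ("VRQR", "D1135V/G1218R/R1335Q/T1337R"),
]
for label, mutations in rows:
    print(label, label_matches(label, mutations))
```

A check of this kind returns False for rows 7 and 8, where the label reads G1104K but the mutation string reads G1104R. The table itself does not indicate which of the two is intended.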
[0180] Taken together, these results suggest that including additional mutations in the SpCas9-VQR variant can improve activity against sites that contain NGAN PAMs, specifically sites that contain NGAH PAMs.
REFERENCES
[0181] 1. Sander, J. D. & Joung, J. K. CRISPR-Cas systems for editing, regulating and targeting genomes. Nat Biotechnol 32, 347-355 (2014).
[0182] 2. Hsu, P. D., Lander, E. S. & Zhang, F. Development and applications of CRISPR-Cas9 for genome engineering. Cell 157, 1262-1278 (2014).
[0183] 3. Doudna, J. A. & Charpentier, E. Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096 (2014).
[0184] 4. Barrangou, R. & May, A. P. Unraveling the potential of CRISPR-Cas9 for gene therapy. Expert Opin Biol Ther 15, 311-314 (2015).
[0185] 5. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821 (2012).
[0186] 6. Sternberg, S. H., Redding, S., Jinek, M., Greene, E. C. & Doudna, J. A. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature 507, 62-67 (2014).
[0187] 7. Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827-832 (2013).
[0189] 8. Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat Biotechnol 33, 187-197 (2015).
[0190] 9. Hou, Z. et al. Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria meningitidis. Proc Natl Acad Sci USA (2013).
[0191] 10. Fonfara, I. et al. Phylogeny of Cas9 determines functional exchangeability of dual-RNA and Cas9 among orthologous type II CRISPR-Cas systems. Nucleic Acids Res 42, 2577-2590 (2014).
[0192] 11. Esvelt, K. M. et al. Orthogonal Cas9 proteins for RNA-guided gene regulation and editing. Nat Methods 10, 1116-1121 (2013).
[0193] 12. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013).
[0195] 13. Horvath, P. et al. Diversity, activity, and evolution of CRISPR loci in Streptococcus thermophilus. J Bacteriol 190, 1401-1412 (2008).
[0196] 14. Anders, C., Niewoehner, O., Duerst, A. & Jinek, M. Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature 513, 569-573 (2014).
[0197] 15. Reyon, D. et al. FLASH assembly of TALENs for high-throughput genome editing. Nat Biotechnol 30, 460-465 (2012).
[0198] 16. Fu, Y. et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat Biotechnol 31, 822-826 (2013).
[0199] 17. Chen, Z. & Zhao, H. A highly sensitive selection method for directed evolution of homing endonucleases. Nucleic Acids Res 33, e154 (2005).
[0200] 18. Doyon, J. B., Pattanayak, V., Meyer, C. B. & Liu, D. R. Directed evolution and substrate specificity profile of homing endonuclease I-SceI. J Am Chem Soc 128, 2477-2484 (2006).
[0201] 19. Jiang, W., Bikard, D., Cox, D., Zhang, F. & Marraffini, L. A. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nat Biotechnol 31, 233-239 (2013).
[0202] 20. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823-826 (2013).
[0203] 21. Hwang, W. Y. et al. Efficient genome editing in zebrafish using a CRISPR-Cas system. Nat Biotechnol 31, 227-229 (2013).
[0204] 22. Chylinski, K., Le Rhun, A. & Charpentier, E. The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems. RNA Biol 10, 726-737 (2013).
[0205] 23. Kleinstiver, B. P., Fernandes, A. D., Gloor, G. B. & Edgell, D. R. A unified genetic, computational and experimental framework identifies functionally relevant residues of the homing endonuclease I-BmoI. Nucleic Acids Res 38, 2411-2427 (2010).
[0206] 24. Gagnon, J. A. et al. Efficient mutagenesis by Cas9 protein-mediated oligonucleotide insertion and large-scale assessment of single-guide RNAs. PLoS One 9, e98186 (2014).
OTHER EMBODIMENTS
[0207] It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.