Applications of engineered Streptococcus canis Cas9 variants on single-base PAM targets
11697808 · 2023-07-11
Assignee
Inventors
- Pranam Chatterjee (Cambridge, MA, US)
- Noah Michael Jakimo (Boston, MA, US)
- Joseph M. Jacobson (Newton, MA)
Cpc classification
C12N2310/20
CHEMISTRY; METALLURGY
C12N9/22
CHEMISTRY; METALLURGY
C12N2800/80
CHEMISTRY; METALLURGY
C12N15/11
CHEMISTRY; METALLURGY
International classification
C12N9/22
CHEMISTRY; METALLURGY
C12N15/11
CHEMISTRY; METALLURGY
Abstract
Engineered Streptococcus canis Cas9 (ScCas9) variants include an ScCas9 protein with its PID being the PID amino acid composition of Streptococcus pyogenes Cas9 (SpCas9)-NG, an ScCas9 protein having a threonine-to-lysine substitution mutation at position 1227 in its amino acid sequence (Sc+), and an ScCas9 protein having a threonine-to-lysine substitution mutation at position 1227 and a substitution of residues ADKKLRKRSGKLATE [SEQ ID No. 4] in position 365-379 in the ScCas9 open reading frame (Sc++). Also included are CRISPR-associated DNA endonucleases with a PAM specificity of 5′-NG-3′ or 5′-NNG-3′ and a method of altering expression of a gene product by utilizing the engineered ScCas9 variants.
Claims
1. An isolated, engineered Streptococcus canis Cas9 (ScCas9) protein comprising SEQ ID NO: 27, wherein said ScCas9 is modified with a Protospacer Adjacent Motif (PAM) interacting domain (PID) of Streptococcus pyogenes Cas9 (SpCas9)-NG, which replaces the ScCas9 PID.
2. The ScCas9 protein of claim 1, further comprising the substitution of amino acids 365-379 in ScCas9 (SEQ ID NO: 27) with amino acids ADKKLRKRSGKLATE (SEQ ID No: 4).
3. An isolated, engineered Streptococcus canis Cas9 (ScCas9) protein comprising SEQ ID NO: 27, wherein said ScCas9 is modified with a substitution of amino acids ADKKLRKRSGKLATE (SEQ ID No: 4) for amino acids 365-379 in ScCas9.
4. A method of altering expression of at least one gene product, comprising: introducing, into a eukaryotic cell containing and expressing a DNA molecule having a target sequence and encoding the gene product, an engineered, non-naturally occurring Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-CRISPR-associated (Cas) (CRISPR-Cas) system comprising one or more vectors comprising: (a) a first regulatory element, operable in a eukaryotic cell, operably linked to at least one nucleotide sequence encoding a CRISPR system guide RNA that hybridizes with the target sequence; and (b) a second regulatory element, operable in a eukaryotic cell, operably linked to a nucleotide sequence encoding an engineered Streptococcus canis Cas9 (ScCas9) protein comprising SEQ ID No: 27, wherein said engineered ScCas9 protein further comprises: (i) the ScCas9 PID domain of SEQ ID No: 27 substituted with the PID domain from Streptococcus pyogenes Cas9 (SpCas9)-NG, and/or (ii) the substitution of amino acid positions 365-379 of SEQ ID NO: 27 with amino acids ADKKLRKRSGKLATE (SEQ ID No: 4), and/or (iii) a threonine-to-lysine substitution at position 1227 of SEQ ID No: 27, and, wherein components (a) and (b) are located on the same or different vectors of the system, whereby the guide RNA targets the target sequence and one or more of the proteins cleave the DNA molecule, whereby expression of the at least one gene product is altered, and wherein the proteins and the guide RNA do not naturally occur together.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Other aspects, advantages and novel features of the invention will become more apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings.
DETAILED DESCRIPTION
(31) In one aspect, the invention is an addition to the family of CRISPR-Cas9 systems repurposed for genome engineering and regulation applications. Specifically, the invention comprises the usage of Streptococcus canis Cas9 (ScCas9) endonuclease in complex with guide RNA, consisting of an identical non-target-specific sequence to that of the SpCas9 guide RNA, for specific recognition and activity on a DNA target immediately upstream of either an “NNGT” or “NNNGT” PAM sequence, promoting new flexibility in target selection. In a further aspect, the invention is a novel DNA-interacting loop domain within ScCas9, and other Cas9 orthologs, such as those from Streptococcus gordonii (UniProt A0A134D9V8) and Streptococcus anginosus (UniProt F5U0T2), that may facilitate a divergent PAM sequence from the canonical “NGG” PAM of SpCas9.
(32) As previously described, the application of CRISPR-Cas9 has been hampered by the inaccessibility of genomic sequences, largely due to the PAM restriction. The recent discoveries of ScCas9, xCas9-3.7, and SpCas9-NG, all reported to possess single G PAM specificity, significantly increased the targetable space, potentially allowing for expanded base editing activities, more efficient homology-directed repair, and denser screening platforms. All have been shown to possess limitations, however, including inefficient targeting of certain single G PAM sequences. The present invention addresses this problem by engineering ScCas9 to possess increased efficiency and broader targeting capabilities, utilizing sequence information from engineered Cas9 variants and uncharacterized Streptococcus Cas9 orthologs. Sc+ and Sc++ nucleases outperform SpCas9, xCas9-3.7, SpCas9-NG, and ScCas9 as genome editing tools, and can thus be harnessed for various applications, including base editing. Furthermore, due to high sequence homology of ScCas9 and SpCas9, previous modifications made to SpCas9, such as high-fidelity mutations [C. A. Vakulskas, D. P. Dever, G. R. Rettig, R. Turk, A. M. Jacobi, et al., “A high-fidelity Cas9 mutant delivered as a ribonucleoprotein complex enables efficient gene editing in human hematopoietic stem and progenitor cells”, Nat. Medicine 24, 1216-1224 (2018)], can be ported into these engineered variants for improved functionality. Sc+ and Sc++, with their broad targeting range and high genome editing efficiency, will hopefully serve as platforms toward the goals of versatile genome engineering and eventual access to every sequence in the entire genome.
(33) Identification of SpCas9 Homologs
(34) While numerous Cas9 homologs have been sequenced, only a handful of Streptococcus orthologs have been characterized or functionally validated. To explore this space, all Streptococcus Cas9 protein sequences from UniProt [The UniProt Consortium, “UniProt: the universal protein knowledgebase”, Nucleic Acids Res. 45, D158-D169 (2017)] were curated, global pairwise alignments using the BLOSUM62 scoring matrix [S. Henikoff, J. G. Henikoff, “Amino acid substitution matrices from protein blocks”, Proc. Natl. Acad. Sci. 89, 10915-10919 (1992)] were performed, and percent sequence homology to SpCas9 was calculated.
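The alignment step can be illustrated with a minimal sketch. It is not the workflow used for the reported figures: it assumes a toy match/mismatch/gap score in place of the BLOSUM62 matrix, reports percent identity rather than BLOSUM-based similarity, and the function names and short test sequences are hypothetical.

```python
def _pair_score(x, y, match, mismatch):
    """Toy substitution score (assumption: stands in for a BLOSUM62 lookup)."""
    return match if x == y else mismatch


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == (
            score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch)
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of alignment length."""
    matches = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)


# Hypothetical short peptides, for illustration only.
aligned = global_align("ACDEFG", "ACDFG")
print(aligned, round(percent_identity(*aligned), 1))
```

For full-length Cas9 sequences, a library aligner configured with a BLOSUM62 matrix and affine gap penalties would normally replace the toy scoring used here.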
(35) As shown in Table 1, a bioinformatics workflow to identify the PAM specificity of ScCas9 in silico involves the alignment of the spacer sequences within the CRISPR cassette of Streptococcus canis with potential protospacers found within the phage and/or other genome databases. As the PAM lies immediately adjacent to the protospacer sequence, these sequences can be conglomerated and weighted based on the number of mismatches to infer bases that are overrepresented at each position [Ran, F. A. et al., “In vivo genome editing using Staphylococcus aureus Cas9”, Nature 520, 186-191 (2015); Crooks, G. E. et al. “WebLogo: a sequence logo generator”, Genome Res. 14, 1188-1190 (2004)].
(36) TABLE 1
S. canis Spacer (5′ to 3′) | Protospacer Source | Adjacent Motif (5′ to 3′)
CCGCTGACAACATTGTTGGC [SEQ ID No: 1] | Streptococcus pyogenes MGAS2096 (phage protein) | CAGTTAAT
TTTCAATGGTAAGATCATTC [SEQ ID No: 2] | Streptococcus phage P9 | ATGTTGAA
GTTTACGCTCATCAGATAGA [SEQ ID No: 3] | Streptococcus phage P9 | AAGTCTAA
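The inference of overrepresented bases from protospacer-adjacent sequences can be sketched as follows. The three 8-nt motifs are copied from Table 1; the function names and the optional per-motif weights (standing in for mismatch-based weighting) are assumptions made for illustration, not the published workflow.

```python
from collections import Counter

# 8-nt protospacer-adjacent motifs copied from Table 1 (5' to 3').
ADJACENT_MOTIFS = ["CAGTTAAT", "ATGTTGAA", "AAGTCTAA"]


def position_frequencies(motifs, weights=None):
    """Return one {base: weighted fraction} dict per motif position.

    `weights` is a hypothetical per-motif weight, e.g. lower for
    protospacers that align to the spacer with more mismatches.
    """
    if weights is None:
        weights = [1.0] * len(motifs)
    total = sum(weights)
    freqs = []
    for pos in range(len(motifs[0])):
        counts = Counter()
        for motif, weight in zip(motifs, weights):
            counts[motif[pos]] += weight
        freqs.append({base: counts[base] / total for base in "ACGT"})
    return freqs


def invariant_positions(freqs):
    """Map 1-based positions to the base shared by every motif."""
    return {
        i + 1: max(f, key=f.get)
        for i, f in enumerate(freqs)
        if max(f.values()) == 1.0
    }


print(invariant_positions(position_frequencies(ADJACENT_MOTIFS)))
```

With only three motifs, any apparently invariant position is a hypothesis to be tested experimentally rather than an established PAM requirement.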
(37) An orthologous Cas9 protein from Streptococcus canis, ScCas9 (UniProt I7QXF2), was found to possess 89.2% sequence similarity to SpCas9. Despite such homology, ScCas9 prefers a more minimal 5′-NNG-3′ PAM. To explain this divergence, two significant insertions were identified within its open reading frame (ORF) that differentiate ScCas9 from SpCas9 and contribute to its PAM-recognition flexibility. ScCas9 can efficiently and accurately edit genomic DNA in mammalian cells.
(38) From the calculations, the Cas9 from Streptococcus canis (ScCas9) stood out, not only due to its remarkable sequence homology (89.2%) to SpCas9, but also because of the positively charged insertion of 10 amino acids within the highly conserved REC3 domain, in positions 367-376.
(39) Exploiting both of these properties, the insertion was modeled within the corresponding domain of PDB 4OO8 [H. Nishimasu, F. A. Ran, P. D. Hsu, S. Konermann, S. I. Shehata, et al., “Crystal structure of Cas9 in complex with guide RNA and target DNA”, Cell 156, 935-949 (2014)] and, when viewed in PyMOL, it formed a “loop”-like structure, in which several positively charged residues come into close proximity with the target DNA near the PAM.
(40) An additional insertion of two amino acids (KQ) was identified immediately upstream of the two critical arginine residues necessary for PAM binding [C. Anders, K. Bargsten, M. Jinek, “Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease Cas9”, Mol. Cell 61, 895-902 (2016)], in positions 1337-1338.
(41) Analysis suggested a 5′-NNGTT-3′ PAM.
(42) Determination of PAM Sequences Recognized by ScCas9
(43) Due to the relatively low number of protospacer targets, the PAM binding sequence of ScCas9 was validated utilizing an existing positive selection bacterial screen based on GFP expression conditioned on PAM binding, termed PAM-SCANR [R. T. Leenay, K. R. Maksimchuk, R. A. Slotkowski, R. N. Agrawal, A. A. Gomaa, et al., “Identifying and Visualizing Functional PAM Diversity across CRISPR-Cas Systems”, Mol. Cell 62, 137-147 (2016)]. A plasmid library containing the target sequence followed by a randomized 5′-NNNNNNNN-3′ (8N) PAM sequence was bound by a nuclease-deficient ScCas9 (and dSpCas9 as a control) and an sgRNA both specific to the target sequence and general for SpCas9 and ScCas9, allowing for the repression of lacI and expression of GFP. Plasmid DNA from FACS-sorted GFP-positive cells and pre-sorted cells was extracted and amplified, and enriched PAM sequences were identified by Sanger sequencing and visualized utilizing DNA chromatograms. The results provided initial evidence that ScCas9 can bind to the minimal 5′-NNG-3′ PAM, distinct from SpCas9's 5′-NGG-3′.
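The practical effect of a 5′-NNG-3′ versus a 5′-NGG-3′ PAM on target selection can be illustrated by scanning a sequence for candidate sites. This is a minimal sketch under stated assumptions: only the forward strand is scanned, the spacer length defaults to 20 nt, and the function names and test sequence are hypothetical.

```python
# Subset of IUPAC nucleotide codes sufficient for the PAMs discussed here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "R": "AG", "Y": "CT"}


def pam_matches(pam_pattern, seq):
    """True if `seq` satisfies the IUPAC pattern, e.g. 'NNG' or 'NGG'."""
    return len(seq) == len(pam_pattern) and all(
        base in IUPAC[code] for code, base in zip(pam_pattern, seq)
    )


def find_targets(dna, pam_pattern, spacer_len=20):
    """List (start, protospacer, pam) for forward-strand sites whose
    3'-adjacent sequence matches the PAM pattern."""
    dna = dna.upper()
    hits = []
    for start in range(len(dna) - spacer_len - len(pam_pattern) + 1):
        pam_start = start + spacer_len
        pam = dna[pam_start:pam_start + len(pam_pattern)]
        if pam_matches(pam_pattern, pam):
            hits.append((start, dna[start:pam_start], pam))
    return hits


# Hypothetical 24-nt sequence: one site is followed by TCG, which satisfies
# NNG but not NGG.
example = "A" * 20 + "TCGA"
print(len(find_targets(example, "NNG")), len(find_targets(example, "NGG")))
```

A complete site finder would also scan the reverse complement, since either strand can carry the protospacer.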
(45) The previously described insertions may contribute to the flexibility permitting ScCas9 to bind to the minimal 5′-NNG-3′ PAM, distinct from SpCas9's 5′-NGG-3′. ScCas9 was engineered to remove either insertion or both, and these variants were subjected to the same screen. Only removing the loop (ScCas9 Δ367-376 or ScCas9 ΔLoop) extended the PAM of ScCas9 to 5′-NAG-3′, with reduced specificity for C and G at position 2, while only removing the KQ insertion (ScCas9 Δ1337-1338 or ScCas9 ΔKQ) reverted its specificity to a more 5′-NGG-3′-like PAM, with reduced specificity for A at position 2.
(46) To confirm the results of the library assay and to rule out limiting downstream requirements, the minimal PAM requirements of ScCas9 were elucidated by utilizing fixed PAM sequences. The PAM library was replaced with individual PAM sequences, which were varied at positions 2, 4, and 5 to test each possible base. The results demonstrate that while ScCas9 exhibits no clear additional base dependence, with activity for all base iterations at each position, ScCas9 ΔLoop ΔKQ demonstrates significant binding at 5′-NGG-3′ PAM sequences and at some, but not all, 5′-NNGNN-3′ motifs, indicating an intermediate PAM specificity between that of SpCas9 and ScCas9.
(48) To confirm an expected PAM sequence of “NNGT”, a bacterial assay based upon lacI promoter repression of GFP expression, employing 4-nucleotide libraries of PAM sequences upstream of lacI, was utilized [Leenay, R. T. et al., “Identifying and visualizing functional PAM diversity across CRISPR-Cas systems”, Mol. Cell 62, 137-147 (2016)]. The library-containing plasmids were co-electroporated with a gRNA plasmid and a nuclease-activity deficient ScCas9 (dScCas9) plasmid, all expressing different antibiotic resistance cassettes. Transformants were plated on triple antibiotic-containing LB agar plates, and GFP-positive colonies were subsequently selected and screened.
(49) Sequencing results confirmed that ScCas9 prefers an “NNGT” PAM, but can also tolerate a “NNNGT” PAM, indicating both potential conformational flexibility and strict sequence constraints of the ScCas9 PAM interacting domain (PID). No preference for A was observed at position 7. While various length PAMs with diverse sequences have either been discovered or engineered, this invention, with a PAM specificity of “NNGT” or “NNNGT”, different than any known Cas9 variant [Karvelis, T. et al., “Methods for decoding Cas9 protospacer adjacent motif (PAM) sequences: A brief overview”, Methods 121-122, 3-6 (2017)] and unable to be engineered from wild-type SpCas9 [Kleinstiver, B. P. et al., “Engineered CRISPR-Cas9 nucleases with altered specificities”, Nature 523, 481-485 (2015)] or Cpf1 [Gao, L., et al., “Engineered Cpf1 variants with altered specificities”, Nature Biotechnology 35, 789-792 (2017)], augments the list of potential genomic sites that can be targeted by the CRISPR system with high specificity and fidelity in a variety of cell types.
(50) Additionally, there is a two amino acid insertion (KQ) at positions 1328 and 1329, immediately upstream of the two arginine (R) residues critical for PAM binding of Cas9. It is likely that this insertion shifts the length and alters the specificity of the PAM adjacent to the target sequence. A preferred embodiment of this invention enables both the insertion of the KQ motif one amino acid upstream of the first critical arginine residue in SpCas9 to alter its PAM specificity, as well as the removal of the KQ motif in ScCas9 for a similar purpose. Sufficient sequence, and potentially structural, differences from SpCas9 in its PAM interacting domain (PID) further enable exploration of a directed evolution phase space that SpCas9 may not be able to access, through random mutagenesis or rational design, which may also lead to expanded PAM specificities for ScCas9. These engineered PIDs of ScCas9 can be swapped with the PID of SpCas9 to further augment and alter its PAM specificities as well.
(51) Further, due to the high degree of homology between SpCas9 and ScCas9, the propensity to cleave similar, but mismatched, sequences to the intended target is expected to be very similar for both wild-type endonucleases. Much work has been done to characterize and engineer mutations that destabilize strand displacement at mismatched substrates by weakening sequence dependent interactions between Cas9 and DNA (K848A, K1003A, R1060A [Slaymaker, I., et al., “Rationally engineered Cas9 Nucleases with improved specificity”, Science 351, 84-88 (2016)] or N497A, R661A, Q695A, Q926A [Kleinstiver, B. P., et al., “High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects”, Nature 529, 490-495 (2016)]), and govern mismatch sensing in non-catalytic domains of Cas9 (N692A, M694A, Q695A, H698A) [Chen, J. S. et al. “Enhanced proofreading governs CRISPR-Cas9 targeting accuracy”, bioRxiv (2017)]. In a preferred embodiment of this invention, these residue-specific mutations that decrease off-target activity while maintaining robust on-target nuclease activity can be applied to the ORF of ScCas9 to generate a hyper-accurate ScCas9 endonuclease.
(52) For in vitro and in vivo applications, the invention is compatible with existing delivery methods used for other CRISPR-Cas9 systems including, but not limited to, electroporation, lipofection, viral infection, and nanoparticle injection. Embodiments can co-deliver the invention as a coding nucleic acid or protein, along with a gRNA. Components can also be stably expressed in cells.
(53) Assessment of ScCas9 PAM Specificity in Human Cells
(54) The PAM specificity of ScCas9 was compared to SpCas9 in human cells by co-transfecting HEK293T cells with plasmids expressing these variants along with sgRNAs directed to a native genomic locus (VEGFA) with varying PAM sequences. Editing efficiency was first tested at a site containing an overlapping PAM (5′-GGGT-3′). At 48 hours post-transfection, gene modification rates, as detected by the T7E1 assay, demonstrated comparable editing activities of SpCas9, ScCas9, and ScCas9 ΔLoop ΔKQ. Additionally, sgRNAs to sites with various non-overlapping 5′-NNGN-3′ PAM sequences were constructed. SpCas9's cleavage activity was impaired at other non-5′-NGG-3′ sequences.
(56) Consistent with the bacterial data, ScCas9 ΔLoop ΔKQ was able to cleave at the 5′-NGG-3′ target, along with significant activity on the 5′-NNGA-3′ target, with reduced gene modification levels at all other 5′-NNGN-3′ targets.
(57) The PAM specificity of ScCas9 base editors was assessed by using a synthetic Traffic Light Reporter (TLR) [M. T. Certo, B. Y. Ryu, J. E. Annis, M. Garibov, J. Jarjour, et al., “Tracking genome engineering outcome at individual DNA breakpoints”, Nat. Methods 8, 671-676 (2011)] plasmid, containing an early stop codon upstream of a GFP ORF and downstream of an mCherry ORF. Successful A→G base editing using the ABE(7.10) architecture, as described in Gaudelli, et al. [N. M. Gaudelli, A. C. Komor, H. A. Rees, M. S. Packer, A. H. Badran, et al., “Programmable base editing of AT to GC in genomic DNA without DNA cleavage”, Nature 551, 464-471 (2017)], converts an early, in-frame TAG stop codon to a TGG tryptophan codon, thus restoring GFP expression. After gating cells based on mCherry expression, significant base editing efficiency was observed at all 5′-NNGN-3′ target PAM sequences for ScCas9-ABE(7.10), as compared to the SpCas9-ABE(7.10) architecture, which only demonstrates significant A→G conversion on the standard 5′-NGG-3′ and tolerated 5′-NAG-3′ motifs in this assay.
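The logic of the reporter, in which an A→G edit turns an in-frame TAG stop codon into a TGG tryptophan codon, can be shown with a short sketch. The 12-nt reading frame below is hypothetical and is not the reporter sequence; the function names are likewise assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_base_edit(seq, index, from_base, to_base):
    """Replace the base at a 0-based index, checking the expected original base."""
    if seq[index] != from_base:
        raise ValueError(f"expected {from_base} at index {index}, found {seq[index]}")
    return seq[:index] + to_base + seq[index + 1:]


def first_stop(orf):
    """Return the 0-based start of the first in-frame stop codon, or None."""
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical reading frame with an early TAG stop as its second codon.
reporter = "ATGTAGGTGAGC"
edited = apply_base_edit(reporter, 4, "A", "G")  # TAG -> TGG
print(first_stop(reporter), first_stop(edited))
```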
(58) Off-Target Analysis of ScCas9
(59) The accuracy of this enzyme was evaluated in comparison to SpCas9. Previous genome-wide analysis of SpCas9 targeting accuracy was utilized to select three genomic targets (VEGFA site 3, FANCF site 2, and DNMT1 site 4) that possess multiple off-target sites on which SpCas9 demonstrates activity [S. Q. Tsai, Z. Zheng, N. T. Nguyen, M. Liebers, V. V. Topkar, et al., “GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases”, Nat. Biotechnol. 33, 187-197 (2015)]. Each of these three sites additionally possesses a single off-target that has been particularly difficult to mediate via engineering of high-fidelity Cas9 variants [I. M. Slaymaker, L. Gao, B. Zetsche, D. A. Scott, W. X. Yan, et al., “Rationally engineered Cas9 Nucleases with improved specificity”, Science 351, 84-88 (2016); B. P. Kleinstiver, V. Pattanayak, M. S. Prew, S. Q. Tsai, N. T. Nguyen, et al., “High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects”, Nature 529, 490-495 (2016); J. S. Chen, Y. S. Dagdas, B. P. Kleinstiver, M. M. Welch, A. A. Sousa, et al., “Enhanced proofreading governs CRISPR-Cas9 targeting accuracy”, Nature 550, 407-410 (2017)]. ScCas9's activity was analyzed on these off-targets. After co-transfection of sgRNAs to the three aforementioned sites alongside both SpCas9 and ScCas9, genomic DNA flanking both the on-target and difficult off-target sequences was amplified to assess their genome modification activities.
(60) Consistent with previously-reported data [J. S. Chen, Y. S. Dagdas, B. P. Kleinstiver, M. M. Welch, A. A. Sousa, et al., “Enhanced proofreading governs CRISPR-Cas9 targeting accuracy”, Nature 550, 407-410 (2017)], SpCas9 demonstrated high off-to-on targeting on all three examined targets. ScCas9 demonstrated comparable on-target activities for the three targets, but exhibited negligible activity on the VEGFA site 3 and DNMT1 site 4 off-targets, and a nearly 1.5-fold decrease in off-to-on target ratio for FANCF site 2, suggesting improved accuracy over SpCas9 on overlapping 5′-NGG-3′ targets.
(61) To examine ScCas9's accuracy across its wider PAM targeting range, a mismatch tolerance assay [J. S. Chen, Y. S. Dagdas, B. P. Kleinstiver, M. M. Welch, A. A. Sousa, et al., “Enhanced proofreading governs CRISPR-Cas9 targeting accuracy”, Nature 550, 407-410 (2017)] was utilized on target sequences with 5′-NAG-3′, 5′-NCG-3′, 5′-NGG-3′, and 5′-NTG-3′ PAMs. sgRNAs containing both single and adjacent double mismatches at every other base along each of the four on-target crRNA sequences were generated, and subsequently the genome modification efficiencies were measured for these mismatched sgRNAs. The results demonstrate that ScCas9 generally tolerates single mismatches better than double mismatches for each analyzed spacer position, and is similarly less likely to tolerate mismatches within the seed region of the crRNA, though with greater sensitivity than SpCas9.
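The construction of the mismatched sgRNA panel can be sketched as below. The source does not state which base replaces the original at a mismatched position, so substituting the complementary base is an assumption made here; the function name and the example spacer are hypothetical.

```python
# Assumption: each mismatch is introduced by swapping in the complementary base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatched_spacers(spacer, step=2):
    """Return {(kind, position): variant} with single and adjacent double
    mismatches starting at every `step`-th base (1-based, from the 5' end)."""
    variants = {}
    for i in range(0, len(spacer), step):
        first = COMPLEMENT[spacer[i]]
        variants[("single", i + 1)] = spacer[:i] + first + spacer[i + 1:]
        if i + 1 < len(spacer):
            second = COMPLEMENT[spacer[i + 1]]
            variants[("double", i + 1)] = spacer[:i] + first + second + spacer[i + 2:]
    return variants


# Hypothetical 20-nt spacer, for illustration only.
panel = mismatched_spacers("ACGT" * 5)
print(len(panel), panel[("single", 1)], panel[("double", 19)])
```

For a 20-nt spacer this yields ten single-mismatch and ten double-mismatch guides per target.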
(63) ScCas9 Genome Editing Capabilities. The ability of ScCas9 to modify a variety of gene targets for a handful of different PAM sequences was evaluated. sgRNAs to 24 targets within 9 endogenous genes in HEK293T cells were constructed, and on-target gene modification was evaluated utilizing the T7E1 assay. The results demonstrate that ScCas9 maintains comparable efficiencies to that of SpCas9 on 5′-NGG-3′ sequences, as well as on selected 5′-NNG-3′ PAM targets, supporting the previous findings.
(65) The efficacy of ScCas9 integrated within the BE3 [A. C. Komor, Y. B. Kim, M. S. Packer, J. A. Zuris, D. R. Liu, “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage”, Nature 533, 420-424 (2016)] and ABE(7.10) base editing architectures on endogenous genomic loci was subsequently measured. To evaluate the efficiency of base editing activities, a simple, easy-to-use Python program, termed the Base Editing Evaluation Program (BEEP), was developed, which takes as input both a negative control ab1 Sanger sequencing file and the edited sample ab1 file and outputs the efficiency of an indicated base conversion at a specific position (read 5′ to 3′) along the target sequence.
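The kind of calculation such a program performs can be sketched as follows. This is not the BEEP implementation: it assumes the per-base Sanger peak heights at the position of interest have already been extracted from the control and edited ab1 traces, and it assumes that efficiency is the product-base signal fraction in the edited sample minus the background fraction in the control. All names and numbers are hypothetical.

```python
def base_fraction(peaks, base):
    """Fraction of total peak height at one position contributed by `base`.

    `peaks` maps each of A, C, G, T to its trace height at that position.
    """
    total = sum(peaks.values())
    if total == 0:
        raise ValueError("no signal at this position")
    return peaks[base] / total


def conversion_efficiency(control_peaks, edited_peaks, to_base):
    """Percent conversion to `to_base`, background-corrected with the control.

    Assumption: background is subtracted directly; negative values are
    clipped to zero.
    """
    background = base_fraction(control_peaks, to_base)
    signal = base_fraction(edited_peaks, to_base)
    return max(0.0, signal - background) * 100.0


# Hypothetical peak heights at an A -> G target position.
control = {"A": 980, "C": 0, "G": 20, "T": 0}
edited = {"A": 600, "C": 0, "G": 400, "T": 0}
print(round(conversion_efficiency(control, edited, "G"), 1))
```

Extracting the peak heights from ab1 files, and locating the target position within each read, would require a trace-parsing library and is outside this sketch.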
(66) BEEP analysis on ab1 files, following transfection of ScCas9 base editors, genomic amplification, and subsequent Sanger sequencing, demonstrates that ScCas9 is capable of mediating C→T and A→G base conversion at both overlapping 5′-NGG-3′ and nonoverlapping 5′-NNG-3′ PAM sequences.
(67) Investigation of Sequence Conservation Between S. canis and Other Streptococcus Cas9 Orthologs
(68) To further investigate the distinguishing motif insertions in ScCas9, the loop (SpCas9::Loop), the KQ motif (SpCas9::KQ), or both (SpCas9::Loop::KQ) were inserted into the SpCas9 ORF and binding on the 8N library was analyzed using PAM-SCANR. Of these variants, only SpCas9::KQ showed target binding affinity in the PAM-SCANR assay. Sequencing on enriched GFP-expressing cells demonstrated an unaffected preference for 5′-NGG-3′. FACS analysis on a fixed 5′-TGG-3′ PAM confirmed these binding profiles, with SpCas9::KQ yielding half the fraction of GFP-positive cells compared to SpCas9. These data, in conjunction with the binding profiles of ScCas9 variants, suggest that while these insertions within ScCas9 do distinguish its PAM preference from SpCas9, other sequence features of ScCas9 also contribute to its divergence.
(69) S. canis has been reported to infect dogs, cats, cows, and humans, and has been implicated as an adjacent evolutionary neighbor of S. pyogenes, as evidenced by various phylogenetic analyses [T. Lefébure, V. P. Richards, P. Lang, P. Pavinski-Bitar, M. J. Stanhope, “Gene Repertoire Evolution of Streptococcus pyogenes Inferred from Phylogenomic Analysis with Streptococcus canis and Streptococcus dysgalactiae”, PLOS ONE 7, e37607 (2012); V. P. Richards, R. N. Zadoks, P. D. Pavinski Bitar, T. Lefébure, P. Lang, et al., “Genome characterization and population genetic structure of the zoonotic pathogen, Streptococcus canis”, BMC Microbiol. 12, 293 (2012); V. P. Richards, S. R. Palmer, P. D. Pavinski Bitar, X. Qin, G. M. Weinstock, et al., “Phylogenomics and the Dynamic Genome Evolution of the Genus Streptococcus”, Genome Biol. Evol. 6, 741-753 (2014)]. In addition to sharing common hosts, S. canis CRISPR spacers that map to phage lysogens in S. pyogenes genomes were identified, which suggests they are overlapping viral hosts as well. This close evolutionary relationship has manifested itself in the sequence homology of ScCas9 and SpCas9, amongst other orthologous genes, predicted to be a result of lateral gene transfer (LGT). Nonetheless, from the alignment of SpCas9 and ScCas9, the first 1240 positions score with 93.5% similarity and the last 144 positions score with 52.8%. To account for the exceptional divergence in the PAM-interacting domain (PID) at the C-terminus of ScCas9 as well as the positively charged inserted loop, focus was placed on alignment of the distinguishing sequences of ScCas9 to other Streptococcus Cas9 orthologs. Notably, the loop motif is present in certain orthologs, such as those from S. gordonii, S. anginosus, and S. intermedius, while the ScCas9 PID is mostly composed of disjoint sequences from other orthologs, such as those from S. phocae, S. varani, and S. equinus. Additional LGT events between these orthologs, as opposed to isolated divergence, more likely explain the differences between ScCas9 and SpCas9. The demonstration that two insertion motifs in ScCas9 alter PAM preferences, yet do not abolish PAM binding when removed, suggests other functional evolutionary intermediates in the formation of effective PAM preferences.
(70) Genus-Wide Prediction of Divergent Streptococcus Cas9 PAMs
(71) Demonstrations of efficient genome editing by Cas9 nucleases with distinct PAM specificity from several Streptococcus species, including S. canis, motivated development of a bioinformatics pipeline for discovering additional Cas9 proteins with novel PAM requirements in the Streptococcus genus. This method was termed the Search for PAMs by ALignment Of Targets (SPAMALOT). Briefly, a 20 nt portion of spacers flanked by known Streptococcus repeat sequences was mapped to candidate protospacers that align with no more than two mismatches in phages associated with the genus [S. A. Shmakov, V. Sitnik, K. S. Makarova, Y. I. Wolf, K. V. Severinov, et al., “The CRISPR Spacer Space Is Dominated by Sequences from Species-Specific Mobilomes”, mBio 8, e01397-17 (2017)]. 12 nt protospacer 3′-adjacent sequences from each alignment were grouped by genome and CRISPR repeat, and then group WebLogos were generated to compute presumed PAM features.
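The mapping step described above can be sketched in a few lines. This is not the SPAMALOT code: it assumes a brute-force sliding-window comparison on the forward strand only, whereas a real pipeline would use a short-read aligner and search both strands. The function names and the toy spacer and phage fragment are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def adjacent_sequences(spacer, genome, max_mismatches=2, flank=12):
    """Return the 3'-adjacent flank of every genome window that matches the
    spacer with at most `max_mismatches` mismatches (forward strand only)."""
    k = len(spacer)
    flanks = []
    for start in range(len(genome) - k - flank + 1):
        window = genome[start:start + k]
        if hamming(spacer, window) <= max_mismatches:
            flanks.append(genome[start + k:start + k + flank])
    return flanks


# Hypothetical 20-nt spacer and phage fragment; the protospacer carries one
# mismatch and is followed by a 12-nt flank.
spacer = "ACGTACGTACGTACGTACGT"
phage = "TT" + "ACGTACGTACCTACGTACGT" + "AAGTCTAATTGC" + "GG"
print(adjacent_sequences(spacer, phage))
```

The collected flanks, grouped by genome and CRISPR repeat, would then be passed to a sequence-logo tool to visualize position-specific base enrichment.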
(74) As the growth and development of CRISPR technologies continue, the range of targetable sequences remains limited by the requirement for a PAM sequence flanking a given target site. While significant discovery and engineering efforts have been undertaken to expand this range, there are still only a handful of CRISPR endonucleases with minimal specificity requirements. Here, an analogous platform for genome editing has been developed using the Cas9 from Streptococcus canis, a highly similar SpCas9 ortholog with affinity for minimal 5′-NNG-3′ PAM sequences.
(75) Established PAM engineering methods, such as random mutagenesis and directed evolution, can only generate substitution mutations in protein coding sequences. In fact, another group utilized phage-assisted continuous evolution (PACE) [K. M. Esvelt, J. C. Carlson, D. R. Liu, “A system for the continuous directed evolution of biomolecules”, Nature 472, 499-503 (2011)] to evolve an SpCas9 variant, xCas9(3.7), with preference for various 5′-NG-3′ PAM sequences [J. H. Hu, S. M. Miller, M. H. Geurts, W. Tang, L. Chen, et al., “Evolved Cas9 variants with broad PAM compatibility and high DNA specificity”, Nature 556, 57-63 (2018)]. An alternative approach consists of inserting or removing motifs with specific properties, which may provide a sequence search space that more common mutagenic techniques cannot directly access. Here, an evolutionary example of this method is demonstrated with ScCas9, whose sequence disparities with SpCas9 include two divergent motifs that contribute to its minimal PAM sequence. Engineered variants lacking these motifs exhibit more stringent PAM specificities in PAM determination assays, and the removal of both motifs reverts its PAM specificity back to a more 5′-NGG-3′-like preference. While minimal inconsistencies in PAM preference between the utilized assays may arise from PAM-dependent allosteric changes that drive DNA cleavage [C. Anders, K. Bargsten, M. Jinek, “Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease Cas9”, Mol. Cell 61, 895-902 (2016)], the PAM flexibility of ScCas9, as compared to SpCas9, remains consistent in all tested contexts.
(76) To date, there are limited open-source tools or platforms specifically for the prediction of PAM sequences, though prior studies have conducted internal bioinformatics-based characterizations prior to experimental validation. Here, SPAMALOT is established as an accessible resource that is shared with the community for application to CRISPR cassettes from other genera. Future development will include broadening the scope of candidate targets beyond genus-associated phage to capture additional sequences that could be beneficial targets, such as lysogens in species that host the same phage. It is hoped that this pipeline can be utilized to more efficiently validate and engineer PAM specificities that expand the targeting range of CRISPR, especially for strictly PAM-constrained technologies such as base editing and homology repair induction.
(77) Because ScCas9 does not require any alterations to the sgRNA of SpCas9, and due to its significant sequence homology with SpCas9, identical modifications from previous studies [I. M. Slaymaker, L. Gao, B. Zetsche, D. A. Scott, W. X. Yan, et al., “Rationally engineered Cas9 Nucleases with improved specificity”, Science 351, 84-88 (2016); B. P. Kleinstiver, V. Pattanayak, M. S. Prew, S. Q. Tsai, N. T. Nguyen, et al., “High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects”, Nature 529, 490-495 (2016); J. S. Chen, Y. S. Dagdas, B. P. Kleinstiver, M. M. Welch, A. A. Sousa, et al., “Enhanced proofreading governs CRISPR-Cas9 targeting accuracy”, Nature 550, 407-410 (2017)] can be made to increase the accuracy and efficiency of the endonuclease and its variants, although it already demonstrates potential improved on-to-off activity as compared to the standard SpCas9 on 5′-NGG-3′ targets. Additionally, while the PAM specificity of ScCas9 on multiple targets in a variety of genome editing contexts has been exhaustively evaluated, the possibility remains that there may exist untested 5′-NNG-3′ genomic targets on which ScCas9 does not possess significant activity. Used together with SpCas9 and xCas9(3.7), however, ScCas9 expands the target range of currently-used Cas9 enzymes for genome editing purposes. With further development, this broadened Streptococcus Cas9 toolkit, containing both ScCas9 and additional, uncharacterized orthologs with expanded targeting range, will enhance the current set of CRISPR technologies.
(78) Applications of Engineered Streptococcus canis Cas9 Variants on Single Base PAM Targets.
(79) Specifically, the claimed invention comprises use of either the ScCas9 endonuclease with a T1227K substitution (ScCas9+) or the PAM-interacting domain of SpCas9-NG grafted onto the N-terminal domain of ScCas9 (ScCas9-NG), in complex with guide RNA, to enable specific recognition and activity on a DNA target immediately upstream of either a 5′-NG-3′ or 5′-NNG-3′ PAM sequence, promoting improved flexibility in target selection.
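By way of non-limiting illustration, the notion of a target "immediately upstream" of a 5′-NG-3′ or 5′-NNG-3′ PAM can be sketched in Python as a simple sequence scan. The function name `find_targets`, the 20-nucleotide default spacer length, and the restriction to the forward strand are assumptions made for this sketch and are not part of the described methods.

```python
import re

# Only the IUPAC symbols needed for the PAMs discussed here (N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

def find_targets(seq, pam="NNG", spacer_len=20):
    """Return (start, spacer, pam) for each forward-strand site whose spacer
    is immediately followed on its 3' side by a match to the PAM pattern."""
    seq = seq.upper()
    pam_re = "".join(IUPAC[base] for base in pam)
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (spacer_len, pam_re))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# First PVALB entry of Table 2: spacer followed by its listed PAM context.
context = "GGAGGGTGGCGAGAGGGGCC" + "GAGATTG"
nng_sites = find_targets(context, pam="NNG")
ng_sites = find_targets(context, pam="NG")
```

In this short context the 5′-NNG-3′ scan reports the listed spacer at offset 0 with PAM GAG, whereas the 5′-NG-3′ scan reports sites offset by one base, which reflects the one-base difference in where the required G sits.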
(80) To validate the predicted minimal G-rich PAM sequence of the described variants, a bacterial assay based upon lacI promoter repression of GFP expression, employing a fully randomized 8-nucleotide library of PAM sequences upstream of lacI, was utilized [Leenay, R. T. et al., “Identifying and visualizing functional PAM diversity across CRISPR-Cas systems”, Mol. Cell 62, 137-147 (2016)]. The library-containing plasmids were co-electroporated with a gRNA plasmid and a nuclease-activity deficient SpMacCas9 (dSpMacCas9) plasmid, each expressing a different antibiotic resistance cassette (Kanamycin, Ampicillin, Chloramphenicol, respectively). Transformants were collected in 5 ml of triple antibiotic-containing Luria Broth (LB) media. Overnight cultures were diluted to an ABS600 of 0.01 and cultured to an OD600 of 0.2. Cultures were analyzed and sorted on a FACSAria machine (Becton Dickinson). Events were gated based on forward scatter and side scatter, and fluorescence was measured in the FITC channel (488 nm laser for excitation, 530/30 filter for detection), with at least 30,000 gated events for data analysis. Sorted GFP-positive cells were grown to sufficient density, plasmids from the pre-sorted and sorted populations were then isolated, and the region flanking the nucleotide library was PCR amplified and submitted for Sanger sequencing (Genewiz).
(81) Histograms of the fluorescein isothiocyanate (FITC) channel demonstrate a significant increase of GFP-positive cells for both ScCas9-NG as well as ScCas9+, as compared to SpCas9, ScCas9, and SpCas9-NG (
(82) In some implementations, the invention includes the application of ScCas9-NG and ScCas9+ as tools for genome engineering in human cells. Briefly, the coding sequences of the described Cas9 variants are transiently transfected, using standard lipofection reagents (e.g., Lipofectamine 2000), as plasmids under the control of an Elongation Factor 1-alpha (EF1-α) promoter into HEK293T cells, along with guide RNA vectors under the control of a U6 promoter containing spacer sequences targeting various 5′-NG-3′ and 5′-NNG-3′ PAM sequences at the standard VEGFA locus. Five days post-transfection, individual cells are harvested for genomic extraction to allow an approximately one kilobase (kb) window around the target to be amplified via polymerase chain reaction (PCR). Indel formation can be further verified from Sanger sequencing results utilizing the TIDE algorithm or ICE (Synthego). The invention further includes utilizing the described variants for applications such as, but not limited to, specific base conversions and gene regulation applications, such as transcriptional activation and repression.
(83) For in vitro and in vivo applications, the invention is compatible with additional delivery methods used for other CRISPR-Cas9 systems including, but not limited to, electroporation, viral infection, and nanoparticle injection. Embodiments can co-deliver the invention as a coding nucleic acid or protein, along with a gRNA. Components can also be stably expressed in cells.
(84) Engineering and PAM Determination of ScCas9++ Variant
(85) SpCas9-NG and xCas9-3.7 both harbor various substitutions in their open reading frames (ORFs) that allow reduced specificity from the canonical 5′-NGG-3′ to the more minimal 5′-NGN-3′ PAM. Specifically, positions 1218-1219 for both enzymes have been shown to be the most consequential in terms of PAM recognition [H. Nishimasu, X. Shi, S. Ishiguro, L. Gao, S. Hirano, et al., “Engineered CRISPR-Cas9 nuclease with expanded targeting space”, Science 361, 1259-1262 (2018); M. Guo, K. Ren, Y. Zhu, Z. Tang, Y. Wang, et al., “Structural insights into a high fidelity variant of SpCas9”, Cell Research 29, 183-192 (2019)]. To engineer ScCas9 to possess improved PAM targeting capabilities, global pairwise alignments were performed using the BLOSUM62 scoring matrix [S. Henikoff, J. G. Henikoff, “Amino acid substitution matrices from protein blocks”, Proc. Natl. Acad. Sci. 89, 10915-10919 (1992)] of various Streptococcus Cas9 orthologs to SpCas9, xCas9-3.7, and SpCas9-NG at these critical residues. The sequence alignment isolated a positively charged lysine residue, derived from the S. gordonii Cas9 ORF. Substituting positively charged residues into the PAM-interacting domain (PID) of Cas enzymes has been suggested to allow for the formation of novel PAM-proximal DNA contacts [B. P. Kleinstiver, A. A. Sousa, R. T. Walton, Y. E. Tak, J. T. Hsu, et al., “Engineered CRISPR-Cas12a variants with increased activities and improved targeting ranges for gene, epigenetic and base editing”, Nat. Biotechnol. 37, 276-282 (2019)]. Motivated by this finding, the corresponding T1227K mutation was substituted into the ORF of ScCas9, generating ScCas9+ (Sc+).
(86) One of the defining characteristics of ScCas9's PAM flexibility is its employment of a positively charged loop, in positions 367 to 376 of its ORF, which does not exist in SpCas9 or its engineered variants [P. Chatterjee, N. Jakimo, J. M. Jacobson, “Minimal PAM specificity of a highly similar SpCas9 ortholog”, Science Advances 4:10, eaau0766 (2018)]. The obtained sequence alignments identified a divergent insertion from S. anginosus, which not only maintains the positive charge of the ScCas9 loop by compensating an extra lysine residue for a histidine, but also possesses an “SG” motif, a flexible sequence of residues used for linker design in protein engineering [X. Chen, J. Zaro, W. C. Shen, “Fusion Protein Linkers: Property, Design and Functionality”, Adv. Drug. Deliv. Rev. 65, 1357-1369 (2012)]. It was hypothesized that this novel loop may improve the targeting capabilities and efficiency of ScCas9 by allowing for more flexible protein-phosphate backbone contacts with the PAM sequence. Thus, the loop sequence from S. anginosus was substituted into the Sc+ ORF to generate ScCas9++ (Sc++), as illustrated in
(87)
(88) Determination of PAM Sequences Recognized by Engineered ScCas9 Variants
(89) To comprehensively profile the PAM specificity of Sc+ and Sc++, in comparison to SpCas9, xCas9-3.7, and SpCas9-NG, as well as the wild-type ScCas9, a previously-developed positive selection bacterial screen based on green fluorescent protein (GFP) expression conditioned on PAM binding, termed PAM-SCANR [R. T. Leenay, K. R. Maksimchuk, R. A. Slotkowski, R. N. Agrawal, A. A. Gomaa, et al., “Identifying and Visualizing Functional PAM Diversity across CRISPR-Cas Systems”, Mol. Cell 62, 137-147 (2016)], was utilized. Following transformation of the PAM-SCANR plasmid, harboring a randomized 5′-NNNNNNNN-3′ (8N) PAM library, an sgRNA plasmid targeting the fixed PAM-SCANR protospacer, and a corresponding dCas9 plasmid, FACS analysis was conducted to first determine the percent of GFP-positive cells in each population, a relative proxy for the percent of total PAM sequences being bound.
(90) The results demonstrated that both dSc+ and dSc++ bind to a greater percentage of PAM sequences, and dSc++ exhibits a shifted GFP-positive population, suggesting stronger binding capabilities and improved efficiency, as seen in
(91) Plasmid DNA from FACS-sorted GFP-positive cells and presorted cells was then extracted and amplified, and enriched PAM sequences were identified by Sanger sequencing and visualized utilizing DNA chromatograms. Sequencing results indicate that the ScCas9 variants possess improved PAM specificity, as compared to xCas9-3.7, which demonstrates notable dependence on bases in downstream positions, and SpCas9-NG, which may require additional G nucleotides in positions 3 or 4 for efficient binding.
(92) Genome Editing Capability of Engineered ScCas9 Variants
(93) The PAM specificities and nucleolytic capabilities of Sc+ and Sc++ were compared to SpCas9, xCas9-3.7, SpCas9-NG, and ScCas9 by transfecting HEK293T cells with plasmids expressing each variant individually alongside one of 16 sgRNAs, together directed to four genomic loci with diverse PAM sequences, collectively representing every base at each position in the PAM window (Table 2). The sgRNA sequences were shifted by one base for xCas9-3.7 and SpCas9-NG to account for their reported 5′-NGN-3′ PAM preferences, so as to equivalently compare these enzymes to ScCas9 variants with 5′-NNG-3′ specificities.
(94) Table 2 summarizes the relevant sequence information for genome editing in human cells. Spacer and PAM sequences indicated are for use with ScCas9 variants and the standard SpCas9. All sequences for xCas9-3.7 and SpCas9-NG are shifted one base in the 3′ direction for equivalent comparison purposes, due to their reported 5′-NGN-3′ PAM sequences.
(95) TABLE 2
5′-Spacer-3′          5′-PAM-3′  Gene    Editing Context        SEQ ID No.
GGAGGGTGGCGAGAGGGGCC  GAGATTG    PVALB   Nuclease               7
TCTGACAATAGTCCTGTCTG  GTGCATT    PVALB   Nuclease               8
AAATGAATGAATGAGCAGAT  GAGTGAA    PVALB   Nuclease               9
CCAGAAGAATGGTGTCATTA  GAGGGCC    PVALB   Nuclease               10
ATTTCATTACAGGCAAAGCT  GAGCAAA    RUNX1   Nuclease/Base Editing  11
GAAAATGCACCCTCTTCTGA  AGGCGGG    RUNX1   Nuclease               12
GCTGAAACAGTGACCTGTCT  TGGTTTT    RUNX1   Nuclease               13
AAACACCATGTACCACACAT  GTGAACG    DNMT1   Nuclease               14
GGATTCCTGGTGCCAGAAAC  AGGGGTG    DNMT1   Nuclease               15
GTTAACAGCTGACCCAATAA  GTGGCAG    DNMT1   Nuclease               16
ATGTGAACGGACAGATTGAC  ATGTTAA    DNMT1   Nuclease               17
GGTCTAGAACCCTCTGGGGA  CCGTTTG    DNMT1   Nuclease/Mismatch      18
GCACCAGCGGACCCACACGG  GCGAGAA    ZSCAN2  Nuclease               19
CATTCTGGTCATGCACCAGA  GAGCCCA    ZSCAN2  Nuclease               20
ACAGGGGAGAAACCCTACGA  GTGCCTG    ZSCAN2  Nuclease               21
GATGTGTGATAAAGTTAGAG  CTGTTGC    ZSCAN2  Nuclease               22
GCCAGTCTCGATCCGCCCCG  TCGTTCC    AAVS2   Base Editing           23
GCGGATCGAGACTGGCAACG  GGGAAGG    AAVS2   Base Editing           24
GCTCGGCCACCACAGGGAAG  CTGGGTG    VEGF    Base Editing           25
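As a hedged illustration of the one-base shift applied for xCas9-3.7 and SpCas9-NG, the following Python sketch slides the spacer/PAM boundary by one position in the 3′ direction. The helper name `shift_target` is hypothetical, and the sketch reflects one reading of the shift described above rather than the exact procedure used to design the sgRNAs.

```python
def shift_target(spacer, pam_context, shift=1):
    """Slide the spacer/PAM boundary `shift` bases in the 3' direction.

    The spacer keeps its length; the bases it gains at its 3' end are taken
    from the start of the PAM context.
    """
    joined = spacer + pam_context
    new_spacer = joined[shift:shift + len(spacer)]
    new_pam_context = joined[shift + len(spacer):]
    return new_spacer, new_pam_context

# First PVALB entry of Table 2 (listed for ScCas9 variants and SpCas9).
spacer, pam = "GGAGGGTGGCGAGAGGGGCC", "GAGATTG"
shifted_spacer, shifted_pam = shift_target(spacer, pam)
# shifted_spacer == "GAGGGTGGCGAGAGGGGCCG", shifted_pam == "AGATTG"
```

After the shift, the G that occupied the third PAM position (5′-NNG-3′) occupies the second position (5′-NGN-3′), so the same genomic G is interrogated by both classes of enzyme.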
(96) Five days post-transfection, indel formation was quantified from Sanger sequencing ab1 files using the TIDE algorithm [E. K. Brinkman, T. Chen, M. Amendola, B. V. Steensel, “Easy quantitative assessment of genome editing by sequence trace decomposition”, Nucleic Acids Res. 42, e168 (2014)] following PCR amplification of the target genomic region. The results demonstrate that Sc+ and Sc++ can effectively edit across the various genomic loci, and demonstrate improved indel formation percentages for a majority of the targets tested. SpCas9, xCas9-3.7, and SpCas9-NG all edit on “GG” PAM targets, and maintain activity on various 5′-AGN-3′ PAM sequences. While xCas9-3.7 and SpCas9-NG additionally edit a few sites that harbor 5′-CGN-3′ and 5′-TGN-3′ sequences, they performed poorly on all tested 5′-NGC-3′ PAM targets, consistent with previously reported data [J. H. Hu, S. M. Miller, M. H. Geurts, W. Tang, L. Chen, et al., “Evolved Cas9 variants with broad PAM compatibility and high DNA specificity”, Nature 556, 57-63 (2018); H. Nishimasu, X. Shi, S. Ishiguro, L. Gao, S. Hirano, et al., “Engineered CRISPR-Cas9 nuclease with expanded targeting space”, Science 361, 1259-1262 (2018); K. Hua, X. Tao, P. Han, R. Wang, J. K. Zhu, “Genome engineering in rice using Cas9 variants that recognize NG PAM sequences”, Mol. Plant (2019); Z. Zhong, S. Stretenovic, Q. Ren, L. Yang, Y. Bao, et al. “Improving plant genome editing with high-fidelity xCas9 and non-canonical PAM-targeting Cas9-NG”, Mol. Plant (2019); M. Guo, K. Ren, Y. Zhu, Z. Tang, Y. Wang, et al., “Structural insights into a high fidelity variant of SpCas9”, Cell Research 29, 183-192 (2019)].
(97) In contrast, Sc+ and Sc++ improve greatly upon the editing capabilities of the wild-type ScCas9 enzyme, demonstrating nearly 3-fold improvement in indel formation efficiency on certain 5′-NNGC-3′ targets, and even editing sites at which ScCas9, xCas9-3.7, and SpCas9-NG have negligible activity.
(98) The D10A nickase version of ScCas9+ was subsequently incorporated into the BE3 base editing architecture to examine whether the engineered ScCas9 variants may enable successful C→T base conversion. Following transfection of the ScCas9+ BE3 plasmid and plasmids encoding sgRNAs directed at 4 genomic sites with PAM sequences representing each base at both flanking positions (Table 2), evident C→T base editing activities in the 5-nucleotide editing window were observed, in comparison to the unedited control, demonstrating that the engineered variants can be further utilized for base editing purposes. Together, these data suggest that Sc+ and Sc++ are efficient, broad-targeting enzymes that can be harnessed for diverse genome editing applications.
(99) Mismatch Tolerance Profile of a High-Fidelity Sc++ Nuclease
(100) To assess the off-target propensity of the engineered nucleases, a mismatch tolerance assay [J. S. Chen, Y. S. Dagdas, B. P. Kleinstiver, M. M. Welch, A. A. Sousa, et al., “Enhanced proofreading governs CRISPR-Cas9 targeting accuracy”, Nature 550, 407-410 (2017)] was conducted, employing sgRNAs harboring double or single mismatches to a fixed protospacer in the endogenous DNMT1 gene with a non-canonical 5′-CCGT-3′ PAM sequence (Table 2). Following TIDE analysis, it was observed that ScCas9 and Sc++ share similar mismatch tolerance profiles across the spacer sequence, as shown in
(101) Overall, double mismatches are tolerated less than single mismatches, and mismatches within the PAM-distal region of the spacer generally allow higher editing rates. As Sc++ possesses higher efficiency overall, however, the magnitude of activity for mismatched spacer sequences is greater. Thus, to ameliorate the mismatch tolerance of Sc++, a high-fidelity variant harboring the R701A mutation was engineered, which was previously isolated via high-throughput bacterial selection for SpCas9 to maintain high on-target activity while reducing off-target editing [C. A. Vakulskas, D. P. Dever, G. R. Rettig, R. Turk, A. M. Jacobi, et al., “A high-fidelity Cas9 mutant delivered as a ribonucleoprotein complex enables efficient gene editing in human hematopoietic stem and progenitor cells”, Nat. Medicine 24, 1216-1224 (2018)]. The engineered variant demonstrated a slight reduction in on-target editing from that of Sc++, but exhibited reduced activity on mismatched sequences. Overall, these results motivate the usage of this high-fidelity Sc++ for broad and efficient genome editing with reduced mismatch tolerance.
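For illustration only, the set of mismatched sgRNA spacers used in an assay of this kind can be enumerated as in the Python sketch below. The substitution rule (swapping each base for its complement) and the enumeration of every pair of positions for double mismatches are assumptions of the sketch; the description above does not state which substitutions or position pairs were tested.

```python
from itertools import combinations

# Illustrative substitution rule: each mismatched base becomes its complement.
SWAP = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatched_spacers(spacer, n_mismatches=1):
    """Yield (positions, variant) for every variant of `spacer` carrying exactly
    `n_mismatches` substituted bases. Positions are 1-indexed from the 5'
    (PAM-distal) end of the spacer."""
    for positions in combinations(range(len(spacer)), n_mismatches):
        bases = list(spacer)
        for p in positions:
            bases[p] = SWAP[bases[p]]
        yield tuple(p + 1 for p in positions), "".join(bases)

# DNMT1 mismatch target from Table 2 (5'-CCGT-3' PAM context).
dnmt1_spacer = "GGTCTAGAACCCTCTGGGGA"
singles = list(mismatched_spacers(dnmt1_spacer, 1))   # 20 variants
doubles = list(mismatched_spacers(dnmt1_spacer, 2))   # 190 variants
```

Because positions are counted from the PAM-distal end, low position numbers correspond to the region in which mismatches were generally better tolerated.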
(102) Materials and Methods
(103) Identification of Cas9 Homologs and Generation of Plasmids. The UniProt database [The UniProt Consortium, “UniProt: the universal protein knowledgebase”, Nucleic Acids Res. 45, D158-D169 (2017)] was mined for all Streptococcus Cas9 protein sequences, which were used as inputs to either the BioPython pairwise2 module or Geneious to conduct global pairwise alignments with SpCas9, using the BLOSUM62 scoring matrix [S. Henikoff, J. G. Henikoff, “Amino acid substitution matrices from protein blocks”, Proc. Natl. Acad. Sci. 89, 10915-10919 (1992)], and subsequently calculate percent homology. The Cas9 from Streptococcus canis was codon optimized for E. coli, ordered as multiple gBlocks from Integrated DNA Technologies (IDT), and assembled using Golden Gate Assembly. The pSF-EF1-Alpha-Cas9WT-EMCV-Puro (OG3569) plasmid for human expression of SpCas9 was purchased from Oxford Genetics, and the ORFs of Cas9 variants were individually amplified by PCR to generate 35 bp extensions for subsequent Gibson Assembly into the OG3569 backbone. The pX330-SpCas9-NG (Addgene Plasmid #117919) and xCas9 3.7 (Addgene Plasmid #108379) were gifts from Osamu Nureki and David Liu, respectively. The Cas9 from S. canis was codon optimized for human cell expression, ordered as multiple gBlocks from Integrated DNA Technologies (IDT), and assembled using Gibson Assembly into a mammalian expression backbone harboring an EF1α promoter and coexpressing GFP.
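The global pairwise alignment step can be sketched, without any external library, as a minimal Needleman-Wunsch implementation in Python. This sketch deliberately substitutes a simple match/mismatch/linear-gap score for the BLOSUM62 matrix used in the described work, and reports percent identity over aligned columns as a stand-in for the percent homology calculation; the function names and scoring values are assumptions.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of the alignment length."""
    same = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a)

# Toy peptides (not Cas9 sequences): one residue is missing from the second.
aln = global_align("MKRSG", "MKSG")      # ("MKRSG", "MK-SG")
identity = percent_identity(*aln)        # 80.0
```

For full-length Cas9 orthologs, a substitution matrix and affine gap penalties, as provided by established alignment packages, would be the more appropriate choice.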
(104) Engineering of the coding sequence of ScCas9 to generate the T1227K, S. anginosus loop, and R701A substitutions was conducted using the KLD Enzyme Mix (NEB) following PCR amplification with mutagenic primers (Genewiz). Engineering of the coding sequence of ScCas9 and SpCas9 for removal or insertion of motifs was conducted using either the Q5 Site-Directed Mutagenesis Kit (NEB) or Gibson Assembly.
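At the protein level, the substitutions described herein (T1227K, and replacement of positions 365-379 with ADKKLRKRSGKLATE [SEQ ID No: 4]) can be sketched as simple string operations in Python. The helper names are hypothetical, and because the ScCas9 sequence is not reproduced here, the example operates on a placeholder string of the wrong composition that merely carries a threonine at position 1227.

```python
def substitute_residue(protein, position, expected, new):
    """Replace the residue at a 1-indexed position after confirming that the
    expected wild-type residue is present there."""
    found = protein[position - 1]
    if found != expected:
        raise ValueError(f"expected {expected} at position {position}, found {found}")
    return protein[:position - 1] + new + protein[position:]

def replace_region(protein, start, end, replacement):
    """Replace 1-indexed residues start..end (inclusive) with `replacement`."""
    return protein[:start - 1] + replacement + protein[end:]

LOOP = "ADKKLRKRSGKLATE"  # SEQ ID No: 4, 15 residues

# Placeholder only: NOT the ScCas9 sequence, just a string with T at 1227.
placeholder = "G" * 1226 + "T" + "G" * 148

sc_plus = substitute_residue(placeholder, 1227, "T", "K")   # T1227K (Sc+)
sc_plus_plus = replace_region(sc_plus, 365, 379, LOOP)      # loop swap (Sc++)
```

Since positions 365-379 span 15 residues and the replacement is also 15 residues long, the loop substitution leaves downstream numbering, including position 1227, unchanged.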
(105) To assemble ScCas9 base editing plasmids, pCMV-ABE(7.10) (Addgene plasmid #102919) and pCMV-BE3 (Addgene plasmid #73021) were received as gifts from David Liu. Similarly, the ORF of the ScCas9 D10A nickase was amplified by PCR to generate 35 bp extensions for subsequent Gibson Assembly into each base editing architecture backbone. sgRNA plasmids were constructed by annealing oligonucleotides coding for crRNA sequences as well as 4 bp overhangs, and subsequently performing a T4 DNA Ligase-mediated ligation reaction into a plasmid backbone immediately downstream of the human U6 promoter sequence. Assembled constructs were transformed into 50 μL NEB Turbo Competent E. coli cells, and plated onto LB agar supplemented with the appropriate antibiotic for subsequent sequence verification of colonies and plasmid purification.
(106) PAM-SCANR Assay. Plasmids for the SpCas9 sgRNA and PAM-SCANR genetic circuit, as well as BW25113 ΔlacI cells, were generously provided by the Beisel Lab (North Carolina State University). Plasmid libraries containing the target sequence followed by either a fully-randomized 8-bp 5′-NNNNNNNN-3′ library or fixed PAM sequences were constructed by conducting site-directed mutagenesis, utilizing the KLD enzyme mix (NEB) after plasmid amplification, on the PAM-SCANR plasmid flanking the protospacer sequence (5′-CGAAAGGTTTTGCACTCGAC-3′) [SEQ ID No: 5]. Nuclease-deficient mutations (D10A and H850A) were introduced to the ScCas9 variants using Gibson Assembly. The provided BW25113 cells were made electrocompetent using standard glycerol wash and resuspension protocols. The PAM library and sgRNA plasmids, with resistance to kanamycin (Kan) and carbenicillin (Crb) respectively, were co-electroporated into the electrocompetent cells at 2.4 kV, outgrown, and recovered in Kan+Crb Luria Broth (LB) media overnight. The outgrowth was diluted 1:100, grown to ABS600 of 0.6 in Kan+Crb LB liquid media, and made electrocompetent. Indicated dCas9 plasmids, with resistance to chloramphenicol (Chl), were electroporated in duplicates into the electrocompetent cells harboring both the PAM library and sgRNA plasmids, outgrown, and collected in 5 mL Kan+Crb+Chl LB media. Overnight cultures were diluted to an ABS600 of 0.01 and cultured to an OD600 of 0.2. Cultures were analyzed and sorted on a FACSAria machine (Becton Dickinson).
(107) Events were gated based on forward scatter and side scatter, and fluorescence was measured in the FITC channel (488 nm laser for excitation, 530/30 filter for detection), with at least 30,000 gated events for data analysis. Sorted GFP-positive cells were grown to sufficient density, plasmids from the pre-sorted and sorted populations were then isolated, and the region flanking the nucleotide library was PCR amplified and submitted for Sanger sequencing (Genewiz). Bacteria harboring non-library PAM plasmids, performed in duplicates, were analyzed by FACS following electroporation and overnight incubation, and represented as the percent of GFP-positive cells in the population, utilizing standard deviation to calculate error bars. Additional details on the PAM-SCANR assay can be found in Leenay, et al. [R. T. Leenay, K. R. Maksimchuk, R. A. Slotkowski, R. N. Agrawal, A. A. Gomaa, et al., “Identifying and Visualizing Functional PAM Diversity across CRISPR-Cas Systems”, Mol. Cell 62, 137-147 (2016)].
(108) Cell Culture and Gene Modification Analysis.
(109) HEK293T cells were maintained in DMEM supplemented with 100 units/ml penicillin, 100 mg/ml streptomycin, and 10% fetal bovine serum (FBS). For the initial ScCas9+ experiments, sgRNA plasmids (500 ng) and effector (nuclease, BE3, or ABE(7.10)) plasmid (500 ng) were transfected into cells as duplicates (2×10⁵/well in a 24-well plate) with Lipofectamine 2000 (Invitrogen) in Opti-MEM (Gibco). At 48 hours post-transfection, genomic DNA was extracted using QuickExtract Solution (Epicentre), and genomic loci were amplified by PCR utilizing the KAPA HiFi HotStart ReadyMix (Kapa Biosystems).
(110) For base editing analysis, amplicons were purified and submitted for Sanger sequencing (Genewiz). For indel analysis, the T7E1 reaction was conducted according to the manufacturer's instructions and equal volumes of products were analyzed on a 2% agarose gel stained with SYBR Safe (Thermo Fisher Scientific). Unprocessed gel image files were analyzed in Fiji [J. Schindelin, I. Arganda-Carreras, E. Frise, V. Kaynig, M. Longair, et al., “Fiji: an open-source platform for biological-image analysis”, Nat. Methods 9, 676-682 (2012)]. The cleaved bands of interest were isolated using the rectangle tool, and the areas under the corresponding peaks were measured and calculated as the fraction cleaved of the total product. Percent gene modification was calculated as follows [D. Y. Guschin, A. J. Waite, G. E. Katibah, J. C. Miller, M. C. Holmes, et al., “A Rapid and General Assay for Monitoring Endogenous Gene Modification”, Methods Mol. Biol. 649, 247-256 (2010)]:
$$\%\ \text{gene modification} = 100 \times \left(1 - \left(1 - \text{fraction cleaved}\right)^{1/2}\right)$$
All samples were performed in duplicates and percent gene modifications were averaged. Standard deviation was used to calculate error bars.
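As a worked illustration of the calculation above, the Python sketch below converts a measured fraction cleaved into percent gene modification and averages duplicate samples. It assumes the exponent in the formula is 1/2, as in the cited Guschin et al. method; the function name and the example band fractions are hypothetical and are not measured data.

```python
from math import sqrt
from statistics import mean, stdev

def percent_gene_modification(fraction_cleaved):
    """100 x (1 - (1 - fraction cleaved)^(1/2)), for a fraction in [0, 1]."""
    if not 0.0 <= fraction_cleaved <= 1.0:
        raise ValueError("fraction cleaved must lie between 0 and 1")
    return 100.0 * (1.0 - sqrt(1.0 - fraction_cleaved))

# Hypothetical duplicate measurements of the cleaved fraction from gel analysis.
duplicates = [0.36, 0.40]
values = [percent_gene_modification(f) for f in duplicates]
average = mean(values)       # reported value
error_bar = stdev(values)    # sample standard deviation across the duplicates
```

For example, a fraction cleaved of 0.36 corresponds to approximately 20% gene modification, since the square root of 0.64 is 0.8.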
(111) For follow-on and ScCas9++ experiments, sgRNA plasmids (100 ng) and effector (nuclease and BE3) plasmids (100 ng) were transfected into cells as duplicates (2×10⁴/well in a 96-well plate) with Lipofectamine 3000 (Invitrogen) in Opti-MEM (Gibco). Five days post-transfection, genomic DNA was extracted using QuickExtract Solution (Epicentre), and genomic loci were amplified by PCR utilizing the Phusion Hot Start Flex DNA Polymerase (NEB). Amplicons were enzymatically purified and submitted for Sanger sequencing (Genewiz). Sanger sequencing ab1 files were analyzed either using the TIDE algorithm (tide.deskgen.com) in comparison to an unedited control to calculate indel frequencies, or by the internally-developed BEEP software for base editing analysis. All samples were performed in duplicates and modification values were averaged. Standard deviation was used to calculate error bars.
(112) Base editing analysis with Traffic Light Reporter. HEK293T cells were maintained as previously described, and transfected with the corresponding sgRNA plasmids (333 ng), ABE7.10 plasmids (333 ng), and synthetically constructed TLR plasmids (333 ng) as duplicates (2×10⁵/well in a 24-well plate) with Lipofectamine 2000 (Invitrogen) in Opti-MEM (Gibco). Five days post-transfection, cells were harvested and analyzed on a FACSCelesta machine (Becton Dickinson) for mCherry (561 nm laser excitation, 610/20 filter for detection) and GFP (488 nm laser excitation, 530/30 filter for detection) fluorescence. Cells expressing mCherry were gated, and the percentage of GFP-positive cells within this subset was calculated. All samples were performed in duplicates and percentage values were averaged. Standard deviation was used to calculate error bars. The TLR spacer sequence is 5′-TTCTGTAGTCGACGGTACCG-3′ [SEQ ID No: 6].
(113) Base Editing Evaluation Program. The Base Editing Evaluation Program (BEEP) was written in Python, employing the pandas data manipulation library and BioPython package. As inputs, the program requires a sample ab1 file, a negative control ab1 file, a target sequence, as well as the position of the specified base conversion, either handled as a .csv file for multiple sample analysis or for individual samples on the command line. Briefly, the provided target sequences are aligned to the base-calls of each input ab1 file to determine the absolute position of the target within the file. Subsequently, the peak values for each base at the indicated position in the spacer are obtained, and the editing percentage of the specified base conversion is calculated. Finally, a separate function normalizes the editing percentage to that of the negative control ab1 file to account for background signals of each base. The final base conversion percentage is outputted to the same .csv file for downstream analysis.
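The steps described for BEEP can be sketched in Python as follows. This is not the BEEP source code: trace parsing is omitted, peak values are passed in as plain dictionaries, the target is located by exact string search rather than alignment, and normalization is assumed to mean subtracting the negative-control percentage. All function names and peak values are hypothetical.

```python
def locate_target(basecalls, target):
    """Return the 0-indexed offset of the target within the base-calls."""
    offset = basecalls.find(target)
    if offset < 0:
        raise ValueError("target sequence not found in base-calls")
    return offset

def conversion_percent(peaks, to_base):
    """Share of the total peak signal at one position held by `to_base`."""
    total = sum(peaks.values())
    if total == 0:
        raise ValueError("no signal at this position")
    return 100.0 * peaks[to_base] / total

def normalized_conversion(sample_peaks, control_peaks, to_base):
    """Sample conversion percentage minus the control background, floored at 0."""
    raw = conversion_percent(sample_peaks, to_base)
    background = conversion_percent(control_peaks, to_base)
    return max(0.0, raw - background)

# Hypothetical peak heights at the edited position for a C-to-T conversion.
sample = {"A": 10, "C": 600, "G": 5, "T": 385}
control = {"A": 10, "C": 970, "G": 10, "T": 10}
editing = normalized_conversion(sample, control, "T")   # 38.5 - 1.0 = 37.5
```

In practice the per-base peak values at the indicated spacer position would be read from the sample and negative-control ab1 files after the target has been located in each.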
(114) SPAMALOT Pipeline. All 11,440 Streptococcus bacterial and 53 Streptococcus-associated phage genomes were downloaded from NCBI. CRISPR repeats catalogued for the genus were downloaded from CRISPRdb hosted by University of Paris-Sud [I. Grissa, G. Vergnaud, C. Pourcel, “The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats”, BMC Bioinform. 8, 172 (2007)]. For each genome, spacers upstream of a specific repeat sequence were collected with a toolchain built around the fast and memory-efficient Bowtie 2 aligner [B. Langmead, S. L. Salzberg, “Fast gapped-read alignment with Bowtie 2”, Nat. Methods 9, 357-359 (2012)]. Each genome- and repeat-type-specific collection of spacers was then matched to all phage genomes using the original Bowtie short-sequence alignment tool [B. Langmead, C. Trapnell, M. Pop, S. L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome”, Genome Biol. 10, R25 (2009)] to identify candidate protospacers allowing zero, one, or two mismatches. Unique candidates were input into the WebLogo 3 [Crooks, G. E. et al. “WebLogo: a sequence logo generator”, Genome Res. 14, 1188-1190 (2004)] command line tool for prediction of PAM features.
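The matching and PAM-prediction steps of the pipeline can be illustrated with the simplified Python sketch below. It is not the SPAMALOT implementation: a naive Hamming-distance scan of the forward strand stands in for the Bowtie aligners, per-position base counts stand in for the WebLogo 3 output, and the function names and toy sequences are assumptions.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def candidate_pams(spacers, phage, max_mismatches=1, pam_len=8):
    """Collect the bases immediately 3' of every phage site that matches a
    spacer with at most `max_mismatches` mismatches (forward strand only).
    Candidates are made unique by their position in the phage genome."""
    found = set()
    for spacer in spacers:
        k = len(spacer)
        for i in range(len(phage) - k - pam_len + 1):
            if hamming(spacer, phage[i:i + k]) <= max_mismatches:
                found.add((i, phage[i + k:i + k + pam_len]))
    return [pam for _, pam in sorted(found)]

def position_counts(pams):
    """Base counts at each PAM position, the raw material for a sequence logo."""
    return [Counter(column) for column in zip(*pams)]

# Toy data: one exact protospacer and one carrying a single mismatch.
spacer = "ACGTACGTAC"
phage = "TT" + "ACGTACGTAC" + "TGG" + "CC" + "ACGTACGTAA" + "AGG" + "TT"
pams = candidate_pams([spacer], phage, max_mismatches=1, pam_len=3)
counts = position_counts(pams)   # G is conserved at positions 2 and 3
```

A conserved base at a given position of the collected flanking sequences, such as the G at positions 2 and 3 of this toy example, is the kind of feature a sequence logo would highlight as a candidate PAM determinant.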
(115) Statistical analysis. Data are shown as mean±s.d., unless stated otherwise. Statistical analysis was performed using the two-tailed Student's t-test, utilizing the SciPy software package. Calculated p-values, as compared to the negative control, are represented as follows: *P≤0.05, **P≤0.01, ***P≤0.001, and ****P≤0.0001. Data were plotted using Matplotlib.
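The p-value notation above can be expressed as a small Python helper, shown here only as a sketch. The function name is hypothetical, and returning an empty string for P>0.05 is an assumption, since no symbol for non-significant comparisons is specified.

```python
def significance_stars(p_value):
    """Map a p-value to the asterisk notation used for the figures."""
    # Thresholds are checked from the most to the least stringent.
    for threshold, stars in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p_value <= threshold:
            return stars
    return ""

labels = [significance_stars(p) for p in (0.2, 0.03, 0.004, 0.0005, 0.00001)]
# labels == ["", "*", "**", "***", "****"]
```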
(116) The present invention demonstrates the natural PAM plasticity of a highly similar, yet previously uncharacterized, Cas9 from Streptococcus canis (ScCas9) through rational manipulation of distinguishing motif insertions. Affinity to minimal 5′-NNG-3′ PAM sequences and the accurate editing capabilities of the ortholog in both bacterial and human cells have been demonstrated. In one aspect of the invention, an automated bioinformatics pipeline, the Search for PAMs by ALignment Of Targets (SPAMALOT) further explores the microbial PAM diversity of otherwise-overlooked Streptococcus Cas9 orthologs. The results establish that ScCas9 can be utilized both as an alternative genome editing tool and as a functional platform to discover novel Streptococcus PAM specificities.
(117) At least the following aspects, implementations, modifications, and applications of the described technology are contemplated by the inventors and are considered to be aspects of the presently claimed invention:
(118) (1) An isolated, engineered Streptococcus canis Cas9 (ScCas9) protein with its PID being the PID amino acid composition of SpCas9-NG.
(119) (2) An isolated, engineered ScCas9 protein having a threonine-to-lysine substitution mutation at position 1227 in its amino acid sequence.
(120) (3) An isolated, engineered ScCas9 protein having a threonine-to-lysine substitution mutation at position 1227 in its amino acid sequence and a substitution of residues ADKKLRKRSGKLATE [SEQ ID No: 4] in position 365-379 in the ScCas9 open reading frame (Sc++).
(121) (4) CRISPR-associated DNA endonucleases with a PAM specificity of 5′-NG-3′ or 5′-NNG-3′.
(122) (5) A method of altering expression of at least one gene product, comprising steps of introducing, into a eukaryotic cell containing and expressing a DNA molecule having a target sequence and encoding the gene product, an engineered, non-naturally occurring Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-CRISPR-associated (Cas) (CRISPR-Cas) system comprising one or more vectors comprising:
(123) (a) a regulatory element operable in a eukaryotic cell operably linked to at least one nucleotide sequence encoding a CRISPR system guide RNA that hybridizes with the target sequence, and
(124) (b) a second regulatory element operable in a eukaryotic cell operably linked to a nucleotide sequence encoding at least one protein selected from the group comprising an isolated, engineered Streptococcus canis Cas9 (ScCas9) protein with its PID as the PID amino acid composition of SpCas9-NG and an isolated, engineered ScCas9 protein harboring a threonine-to-lysine substitution mutation at position 1227 in its amino acid sequence, wherein components (a) and (b) are located on the same or different vectors of the system, whereby the guide RNA targets the target sequence and one or more of the proteins cleave the DNA molecule, whereby expression of the at least one gene product is altered, and wherein the proteins and the guide RNA do not naturally occur together.
(125) While preferred embodiments of the invention are disclosed herein, many other implementations will occur to one of ordinary skill in the art and are all within the scope of the invention. Each of the various embodiments described above may be combined with other described embodiments in order to provide multiple features. Furthermore, while the foregoing describes a number of separate embodiments of the apparatus and method of the present invention, what has been described herein is merely illustrative of the application of the principles of the present invention. Other arrangements, methods, modifications, and substitutions by one of ordinary skill in the art are therefore also considered to be within the scope of the present invention.