HIGH-PRECISION BASE EDITORS
20220154163 · 2022-05-19
CPC classification
C12N2310/20
CHEMISTRY; METALLURGY
C12N9/22
CHEMISTRY; METALLURGY
A61K31/7088
HUMAN NECESSITIES
A61K38/465
HUMAN NECESSITIES
C12N2800/80
CHEMISTRY; METALLURGY
C12N2710/16143
CHEMISTRY; METALLURGY
C12N15/11
CHEMISTRY; METALLURGY
A61K38/50
HUMAN NECESSITIES
C12N9/78
CHEMISTRY; METALLURGY
International classification
C12N9/78
CHEMISTRY; METALLURGY
A61K31/7088
HUMAN NECESSITIES
A61K38/50
HUMAN NECESSITIES
C12N15/11
CHEMISTRY; METALLURGY
C12N15/86
CHEMISTRY; METALLURGY
C12N15/90
CHEMISTRY; METALLURGY
Abstract
The present invention relates to a base editing compound comprising or consisting of (a) a Cas protein, and, covalently connected therewith; (b) a nucleobase-modifying enzyme, wherein the covalent connection of (a) and (b) is (i) direct; (ii) provided by a peptide comprising at least one Pro residue, said peptide having a length between 1 and 20, preferably between 1 and 15 amino acids; or (iii) provided by a non-peptidic linker, said non-peptidic linker being a small organic molecule comprising one or more double bonds, one or more triple bonds, and/or one or more aromatic rings.
Claims
1. A base editing compound comprising or consisting of (a) a Cas protein, and, covalently connected therewith; (b) a nucleobase-modifying enzyme, wherein the covalent connection of (a) and (b) is (i) direct; (ii) provided by a peptide comprising at least one Pro residue, said peptide having a length between 1 and 20, preferably between 1 and 15 amino acids; or (iii) provided by a non-peptidic linker, said non-peptidic linker being a small organic molecule comprising one or more double bonds, one or more triple bonds, and/or one or more aromatic rings.
2. The compound of claim 1, wherein (a) said Cas protein is a Cas nickase or dead Cas, said Cas nickase preferably being Cas9 or Cas12, and said dead Cas preferably being dead Cas9 or dead Cas12; and/or (b) said nucleobase-modifying enzyme is selected from a deaminase, a nucleoside synthase, a DNA methyl transferase and a DNA demethylase, said deaminase preferably being selected from the APOBEC, CDA1 or Tad/ADAR families, APOBEC3A being particularly preferred.
3. The compound of claim 1 or 2, wherein said deaminase is truncated at the N- and/or C-terminus, wherein in case of APOBEC deaminases C-terminal truncation is preferred, an APOBEC3A with a C-terminal truncation of 17 amino acids (A3AΔ182) being particularly preferred, and in case of CDA1 deaminases, truncations from residue 188 or residue 198 onwards being preferred.
4. The compound of claim 3, wherein the truncated residues are not essential for catalytic activity of said deaminase.
5. The compound of any one of claims 1 to 4, wherein said compound comprises said peptide and wherein said peptide (i) consists of an amino acid sequence comprising 1 to 10 Pro residues and 1 to 10 small amino acid residues, said small amino acid residues preferably being selected from Ala, Gly and Ser, said amino acid sequence preferably being the sequence of SEQ ID NO: 130 (XTEN); (ii) has a length of 5, 6 or 7 amino acids; and/or (iii) consists of the sequence A.sub.m(PA).sub.nP.sub.p, wherein m and p are independently 0 or 1 and n is 1, 2, 3, 4, 5, 6 or 7, for example of SEQ ID NO: 154 or 162.
6. The compound of any of the preceding claims, said compound comprising or further consisting of one or both of (a) an inhibitor of base excision repair, preferably a uracil DNA glycosylase inhibitor (UGI), more preferably the sequence of SEQ ID NO: 149; and (b) a nuclear localization signal (NLS), preferably the sequence of SEQ ID NO: 135; wherein (a) and/or (b) are preferably connected to each other and/or to said Cas protein with a peptidic linker consisting of 1 to 10 amino acids, said linker preferably consisting of the sequence of SEQ ID NO: 132 or 148.
7. The compound of any one of the preceding claims, wherein (a) said deaminase is APOBEC3A (A3A; SEQ ID NO: 183), wherein preferably said A3A is truncated at the C-terminus; (b) said deaminase is A3AΔ182 (SEQ ID NO: 205) and is fused to the N-terminus of said Cas protein; (c) said deaminase is APOBEC1, preferably consisting of the sequence of SEQ ID NO: 129, and is fused to the N-terminus of said Cas protein; or (d) said deaminase is CDA1, preferably consisting of the sequence of SEQ ID NO: 137; and is fused to the N-terminus or the C-terminus, preferably to the N-terminus of said Cas protein, wherein preferably the C-terminus of said CDA1 is truncated, preferably either from position 198 onwards or from any of positions 188 to 194 onwards.
8. The compound of any one of the preceding claims, wherein (a) said Cas protein consists of the amino acid sequence of SEQ ID NO: 1 or a sequence with at least 80% identity thereto and preferably providing nickase activity, or is encoded by the nucleic acid sequence of SEQ ID NO: 2 or a sequence with at least 80% identity thereto and preferably encoding a protein with nickase activity, and is preferably selected from VQR-Cas9 (amino acid sequence of SEQ ID NO: 121), VRER-Cas9 (amino acid sequence of SEQ ID NO: 122), xCas9 (amino acid sequence of SEQ ID NO: 123), and Cas9-NG (amino acid sequence of SEQ ID NO: 124) or encoded by any one of SEQ ID NOs: 23, 24, 25 or 26; and/or (b) said deaminase consists of the amino acid sequence of any one of SEQ ID NOs: 205, 129, 137, 169, 176, 183, 198, 212, 219, 3 and 5, a sequence with at least 80% identity and providing deaminase activity or a truncated version of any such sequence, or is encoded by the nucleic acid sequence of any one of SEQ ID NOs: 107, 31, 55, 71, 78, 85, 100, 114, 220, 4 and 6, a sequence with at least 80% identity and encoding a protein with deaminase activity or a truncated version of any such sequence.
9. The compound of any one of the preceding claims, wherein said compound is a single polypeptide and comprises or consists of an amino acid sequence selected from SEQ ID NOs: 204, 7, 9, 11, 13, 136, 218, 190, 144, 168, 182, 128, 152, 204 and 211.
10. A nucleic acid encoding the compound of any one of the preceding claims, to the extent said compound is a single polypeptide.
11. A method of base editing, said method comprising introducing into a cell a nucleic acid of claim 10 or a compound of any one of claims 1 to 9.
12. The method of claim 11, further comprising introducing into said cell a guide nucleic acid for said nickase.
13. The method of claim 11 or 12, wherein said method is performed in vitro or ex vivo.
14. A pharmaceutical composition comprising or consisting of (a) the compound of any one of claims 1 to 9; and/or (b) the nucleic acid of claim 10.
15. The pharmaceutical composition of claim 14, further comprising or further consisting of a guide nucleic acid for said nickase, wherein said guide nucleic acid comprises a sequence which is homologous to a subsequence of a target gene, wherein said target gene is associated with a genetic disorder.
16. A compound of any one of claims 1 to 9 or a nucleic acid of claim 10, and a guide nucleic acid for said nickase for use in a method of treating, alleviating or preventing a disorder, wherein said guide nucleic acid comprises a sequence which is homologous to a subsequence of a target gene, wherein said disorder is associated with a point mutation or an SNP in said target gene.
17. A kit comprising or consisting of (a) (i) one or more compounds of any one of claims 1 to 9; and/or (ii) one or more nucleic acids of claim 10.
18. The kit of claim 17, furthermore comprising or further consisting of (b) one or more guide nucleic acids for the nickase comprised in said compound, wherein each of said guide nucleic acids comprises a sequence which is identical to a subsequence of a given target gene; and/or (c) a manual comprising instructions for performing the method of any one of claims 11 to 13.
19. The kit of claim 17 or 18, wherein said kit comprises a plurality of said compounds and/or a plurality of said nucleic acids, wherein at least two of said compounds of (a)(i) or at least two of the compounds encoded by said nucleic acids of (a)(ii) differ with regard to their base editing profile.
20. Use of a peptide as defined in any one of the preceding claims or of a non-peptidic linker as defined in claims 1 and 9 for covalently connecting a Cas protein such as a Cas nickase (nCas) or a dead Cas (dCas) and a deaminase (DA) to provide a base editing compound.
21. The use of claim 20, wherein said deaminase is truncated at the N- or C-terminus.
Description
[0154] The examples illustrate the invention.
Example 1
Methods
[0155] Yeast Strains and Growth Conditions.
[0156] Saccharomyces cerevisiae BY4743 (diploid, MAT a/α, his3Δ1/his3Δ1, leu2Δ0/leu2Δ0, LYS2/lys2Δ0, met15Δ0/MET15, ura3Δ0/ura3Δ0) was used as the host strain for genome editing. Cells were grown non-selectively in YPAD medium (2% Bacto peptone, 1% Bacto yeast extract, 2% glucose, 0.003% adenine hemisulfate). For culture in Petri dishes, the medium was solidified with 2% agar. Selection of yeast transformants based on the URA3 and LEU2 markers was done on synthetic complete (SC) medium (6.7 g/L Difco Yeast Nitrogen Base, 20 g/L glucose) supplemented with a mixture of appropriate amino acids and lacking uracil and leucine (SC-U-L). Yeast strains were cultivated at 28° C. on a rotary shaker.
[0157] DNA Methods.
[0158] PCR was performed with Phusion High-Fidelity DNA Polymerase (ThermoFisher) according to the manufacturer's instructions. Cloning and amplification of plasmids were carried out in the E. coli strain DH5α. Plasmids harboring the Streptococcus pyogenes cas9 gene (p415-GalL-Cas9-CYC1t) and a chimeric guide RNA construct (p426-SNR52p-gRNA.CAN1.Y-SUP4t) were provided by the laboratory of Dr. George Church and obtained from Addgene (Cambridge, Mass., USA).
[0159] To generate APOBEC1 base editors, the APOBEC1 reading frame and the partial cas9 sequence were PCR-amplified using oligonucleotides with overlapping linker sequences. The two fragments were cloned into the SpeI/SbfI-digested p415-GalL-Cas9-CYC1t with the help of the In-Fusion HD Cloning Kit (Clontech, CA, USA). The D10A point mutation was introduced into cas9 with primers harboring the desired mutation by amplification of the entire plasmid template followed by DpnI digestion to remove the parental template. The UGI gene was codon-optimized for yeast and synthesized (Eurofins Genomics, Ebersberg, Germany), followed by insertion into the AscI/MluI-digested vector p415-GalL-Cas9-CYC1t. To generate CDA1 base editors, the reading frame encoding pmCDA1 was PCR-amplified to replace the APOBEC1 fragment within BE3, thus generating nCDA1-BE3. To produce a fusion of CDA1 to the C-terminus of Cas9, plasmid pRS315e_pGal-nCas9 (D10A)-PmCDA1 (provided by the laboratory of Akihiko Kondo, Hyogo, Japan, and obtained from Addgene) was modified. First, the amplified UGI sequence was introduced into the XbaI site, and the resulting vector was then digested with AscI and SphI. Subsequently, two PCR fragments (overlapping by the XTEN linker sequence) were inserted to generate cCDA1-BE3. Insertion of three PCR fragments (covering XTEN and APOBEC1) produced base editor cBE3. The CDA1 protein truncations were generated by PCR amplification, and cloned into SpeI/SbfI-digested BE3 or AscI/SphI-digested cBE3 vectors to produce the ΔCDA1-Cas and Cas-ΔCDA1 vector series, respectively. To produce YEE-BE3, the mutated APOBEC1 from plasmid pCMV-dCpf1-BE-YEE (provided by the laboratory of Jia Chen, Shanghai, China, and obtained from Addgene) was PCR amplified and cloned into SpeI/SbfI-digested BE3.
[0160] To generate CDA1-BE3 variants with VQR-Cas9, the three required point mutations (D1135V/R1335Q/T1337R) were introduced into the cas9 gene by PCR with primers harboring the desired mutations, and the resulting three PCR products were cloned into the NruI/NcoI-digested BE3 to obtain VQR-BE3 with the help of the In-Fusion HD Cloning Kit (Clontech, Mountain View, Calif., USA). The mutated fragment was then released by digesting VQR-BE3 with NruI and MluI, followed by ligation into the similarly digested CDA1 BE plasmid (21). To construct VRER-BE3 variants, three fragments containing the four mutations (D1135V/G1218R/R1335E/T1337R) were PCR-amplified followed by cloning into the NruI/MluI-digested VQR-BE3. The mutated fragment was then excised by digesting VRER-BE3 with NruI and MluI, and ligated into the CDA1 BE construct cut with the same enzyme combination. For the generation of SpCas9-NG BE3 variants, four fragments containing the seven mutations (R1335V/L1111R/D1135V/G1218R/E1219F/A1322R/T1337R) were PCR-amplified followed by cloning into the NruI/MluI-digested vector VQR-BE3. The mutated fragment was released by digesting SpCas9-NG-BE3 with NruI and MluI and cloned into the similarly cut CDA1 BE plasmid. For the construction of xCas9 variants, plasmid xCas9 (3.7)-BE3 (obtained from Addgene) was digested with the restriction enzymes SbfI and AscI. The resulting 3.7 kb fragment was then inserted into the CDA1 BE construct digested with SbfI and AscI. To obtain cCDA1-BE3 variants, the mutated fragments were PCR-amplified using the corresponding BE3 variant as template and cloned into the NruI/SphI-digested cCDA1-BE3 plasmid (21).
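The substitution notation used above (for example D1135V: wild-type residue, 1-based position, replacement residue) can be checked in silico before primers are designed. The following is a minimal sketch only; the function name apply_point_mutations and the five-residue toy sequence are hypothetical and are not taken from the described constructs, which were generated by PCR.

```python
import re

# One substitution such as "D1135V": wild-type residue, 1-based position, new residue.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_point_mutations(protein, mutations):
    """Return a copy of `protein` with the given substitutions applied.

    Raises ValueError if a mutation is malformed, out of range, or if the
    wild-type residue named in the mutation is not found at that position.
    """
    residues = list(protein)
    for mutation in mutations:
        match = MUTATION_PATTERN.match(mutation)
        if match is None:
            raise ValueError(f"malformed mutation: {mutation!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {mutation!r}")
        if residues[position - 1] != wild_type:
            raise ValueError(f"wild-type residue mismatch: {mutation!r}")
        residues[position - 1] = replacement
    return "".join(residues)


# The VQR mutation set named in the text, split into single substitutions.
VQR_MUTATIONS = "D1135V/R1335Q/T1337R".split("/")

# Toy illustration on a hypothetical five-residue sequence (not Cas9).
toy_mutant = apply_point_mutations("MDKRT", ["D2V", "R4Q", "T5R"])
```

Applying VQR_MUTATIONS would require the full-length Cas9 amino acid sequence as input; the wild-type check then guards against numbering offsets between sequence versions.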
[0161] To generate hA3A, hA3B, hA3G, hAID, mAID, cAICDA and truncated hA3A base editors, the deaminase genes were PCR-amplified from plasmid clones (provided by the laboratory of Dr. Jia Chen, Shanghai, China, and obtained from Addgene) together with part of the cas9 sequence, and then ligated into the SpeI/SbfI-digested BE3 vector. To produce A3A(R128A)-BE3, A3A(Y130F)-BE3 as well as eA3A-BE3, the point mutations (R128A, Y130F and N57G) were introduced into A3A with primers containing the appropriate mutations.
[0162] To generate plasmids expressing sgRNAs that target specific sites, the protospacer sequences were introduced by PCR amplification, and the resulting PCR products were cloned into the ClaI/KpnI-digested vector p426-SNR52p-gRNA.CAN1.Y-SUP4t with the In-Fusion HD Cloning Kit (Clontech, CA, USA).
[0163] Yeast Transformation and Genomic DNA Extraction.
[0164] Yeast cells were transformed with the LiAc/SS carrier DNA/PEG method using 0.5-1 μg plasmid DNA (Gietz, R. D. & Schiestl, R. H. Quick and easy yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat. Protoc. 2, 35-37 (2007)). Transgenic clones were selected on SC-U-L media and confirmed by PCR analyses. Yeast genomic DNA was extracted according to a published protocol (Lõoke, M., Kristjuhan, K. & Kristjuhan, A. Extraction of genomic DNA from yeasts for PCR-based applications. Biotechniques 50, 325-328 (2011)). PCR products were purified (PCR Purification kit; Macherey-Nagel) and then sequenced.
[0165] CAN1 Mutagenesis.
[0166] Yeast colonies were picked, suspended in 3 mL SC medium with 2% glucose and without leucine and uracil, and grown to stationary phase. The cells were then pelleted, washed twice in sterile water, and resuspended in SC induction medium with 2% galactose and 1% raffinose, but without leucine and uracil, to an OD600 of 0.3. The cells were incubated for 20 h prior to plating on YPAD rich medium plates or on SC medium plates without arginine but with 60 mg/mL L-canavanine (Sigma). After incubation for 3 days, the colony number on each plate was counted. The C-to-T mutation frequency in CAN1 was determined as the ratio of the colony count on canavanine-containing plates to the colony count on YPAD rich medium plates. Each experiment was performed at least three times on different days. To determine the mutation spectrum, colonies were randomly picked and suspended in sterile water, followed by PCR amplification of the relevant CAN1 fragment and DNA sequencing. Control cultures (not treated with base editors) did not produce canavanine-resistant colonies.
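The frequency calculation described in this paragraph is a simple ratio of colony counts. A minimal sketch follows; the colony counts are hypothetical placeholders rather than measured data, and the sketch assumes that equal culture volumes at equal dilution were plated on both plate types, which the text does not state.

```python
def mutation_frequency(canavanine_colonies, nonselective_colonies):
    """Ratio of colonies on canavanine plates to colonies on non-selective plates."""
    if nonselective_colonies <= 0:
        raise ValueError("the non-selective plate must contain at least one colony")
    return canavanine_colonies / nonselective_colonies


# Hypothetical counts from three independent experiments:
# (colonies on canavanine plate, colonies on YPAD plate).
replicates = [(42, 210), (35, 175), (50, 200)]

frequencies = [mutation_frequency(can, ypad) for can, ypad in replicates]
mean_frequency = sum(frequencies) / len(frequencies)
```

If different dilutions are plated on selective and non-selective media, each count would first have to be scaled by its dilution factor.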
[0167] Next-Generation Sequencing.
[0168] Yeast colonies harboring plasmids expressing base editors and sgRNAs were picked from SC-L-U plates, suspended in 3 mL SC-L-U medium with 2% glucose, and grown to stationary phase. The cultures were then washed twice to remove residual glucose, resuspended in 5 mL SC-L-U medium with 2% galactose and 1% raffinose to an OD600 of 0.3, and incubated for 20 h at 28° C. on a rotary shaker. Genomic DNA was extracted from culture samples of 0.5 mL volume, and the regions targeted by base editing were amplified by PCR with primer pairs containing index tags for sample multiplexing. PCR amplification was performed with the Phusion High-Fidelity DNA Polymerase (ThermoFisher) according to the manufacturer's protocol, followed by product purification with the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel). The purified index-labeled PCR products were pooled at equal molar ratios. PCR-free library construction and NGS sequencing, demultiplexing by assigning reads to samples, and data filtering (including removal of adaptor sequences, contaminations and low-quality reads from raw reads) were done commercially (BGI, Hong Kong). Sequencing was performed on an Illumina HiSeq 4000 platform in paired-end mode to obtain a read length of 150 bp from each end and, on average, more than 100,000 reads per sample.
[0169] Data Analysis.
[0170] The clean FASTQ files obtained after data filtering were further analyzed with Python scripts (available at https://github.com/zfcarpe/Cas9Sequencing). Briefly, the “pattern_extract.py” script was first applied to scan all sequencing reads and extract the reads with the fixed length of the editing region (and exactly matching the two flanking sequences). This procedure excluded indel-containing and imperfectly matching reads, and allowed each base call to be summarized in an alignment-like manner. Subsequent application of the “result_stat.py” script scanned each base within the editing region and calculated the frequency of each base converted to one of the other three bases by dividing the respective read number by the total number of sequencing reads, yielding the percentage of C-to-T editing and the percentage of edited reads with the C converted to any of the other bases. In addition, the script calculated the frequencies of all edited products by scanning each aligned read for conversion of the potential target cytidines. For the analysis of indel frequencies, the sequencing reads were scanned for two exactly matching 10-bp sequences that flank both sides of the region of interest (i.e., the sequence containing the editing sites). Reads without exact matches were excluded from further analysis. Reads in which the length of the flanked region exactly matched that of the reference sequence were classified as not containing an indel; all other reads were classified as harboring an indel. A shell script “Cas9Sequencing.sh” combined the processes.
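The filtering and counting logic described in this paragraph can be illustrated with a short Python sketch. It is not the code of the cited repository: the function names extract_region and summarize_reads are hypothetical, reads are assumed to be plain strings already parsed from the FASTQ files, and, as a simplifying assumption, conversion frequencies are computed relative to the reads retained after length filtering.

```python
from collections import Counter


def extract_region(read, left_flank, right_flank):
    """Return the sequence between two exactly matching flanks, or None if a flank is missing."""
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]


def summarize_reads(reads, reference_region, left_flank, right_flank):
    """Return (indel_fraction, conversions) for a list of reads.

    indel_fraction: share of flank-matched reads whose region length differs
    from the reference length. conversions: one dict per reference position,
    mapping each non-reference base to its frequency among length-matched reads.
    """
    regions = [extract_region(read, left_flank, right_flank) for read in reads]
    regions = [region for region in regions if region is not None]
    same_length = [region for region in regions if len(region) == len(reference_region)]
    indel_fraction = 1 - len(same_length) / len(regions) if regions else 0.0

    total = len(same_length)
    conversions = []
    for index, reference_base in enumerate(reference_region):
        counts = Counter(region[index] for region in same_length)
        conversions.append({
            base: (counts[base] / total if total else 0.0)
            for base in "ACGT" if base != reference_base
        })
    return indel_fraction, conversions
```

For a cytidine at index i of the reference region, conversions[i]["T"] then corresponds to the C-to-T editing frequency at that position, and the sum of the three values in conversions[i] to the frequency of conversion to any other base.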
Example 2
Results
[0171] Rigid Linkers Improve Precision of APOBEC1-Based Editors.
[0172] We hypothesized that the positioning on the target sequence of the Cas9 protein relative to the deaminase domain (i.e., their physical distance) and the rigidity of the connection between these two domains of the base editor determine the width of the editing window, and hence the precision of the base editor. In previous studies, a 16 amino acid (aa) flexible linker (XTEN) has been identified as the best compromise between editing efficiency and specificity (Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016)). Using L-canavanine selection in yeast (Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, 1248 (2016)), we first investigated the effects of length and rigidity of the linker between APOBEC1 and nCas9 (Cas9 nickase) on base editing precision and efficiency when targeting several sites in the Can1 gene.
[0173] It was reported that mutations in the APOBEC1 domain of BE3 can also narrow the base editing width. We, therefore, compared the base editing outcome of BE3, YEE-BE3 (the optimal BE3 variant; Kim, Y. B. et al. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotechnol. 35, 371-376 (2017)), and BE-PAPAPAP when targeting the Can1 sites. We found that YEE-BE3, although mainly editing C.sub.−15 or C.sub.−16, suffered from strongly reduced editing activity at these sites. Although it will be important to confirm this deficit for additional sequence contexts, this finding is consistent with a recent study that also reported low editing efficiency of the YEE-BE3 base editor (Gehrke, J. M. et al. An APOBEC3A-Cas9 base editor with minimized bystander and off-target activities. Nat. Biotechnol. 36, 977-982 (2018)).
[0174] Previous work has mostly investigated the activity of base editors in favorable sequence contexts, with relatively few C targets within the protospacer sequence. To develop a more rigorous (and Can1-independent) assay for base editor specificity, we also investigated the worst-case scenario, in which all nucleotides within the BE3 activity window are Cs (i.e., a nonacytidine motif from −13 to −21). Analysis of editing products by deep sequencing revealed that base editors with 5-7 aa rigid linkers mainly edited at positions C.sub.−14 to C.sub.−16.
[0175] These editors showed greatly improved site selectivity and a narrowed editing window, while retaining up to 90% of the editing efficiency of the original BE3.
[0176] Importantly, when the distribution of editing products was analyzed, BE3-treated sequences mostly contained four simultaneously edited bases, whereas base editors containing short rigid linkers predominantly generated products with one to three edited bases, thus providing further evidence that short rigid linkers lead to more precise editing.
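The rigid linkers discussed in this section follow the general pattern A.sub.m(PA).sub.nP.sub.p recited in claim 5 (m and p independently 0 or 1; n from 1 to 7), of which PAPAPAP is the instance with m=0, n=3 and p=1. As an illustration only, the following sketch enumerates the sequences of this pattern that have a length of 5, 6 or 7 amino acids. The function names are hypothetical, and the enumeration does not imply that every listed sequence was tested.

```python
from itertools import product


def rigid_linker(m, n, p):
    """Build the linker A_m (PA)_n P_p as a one-letter amino acid string."""
    if m not in (0, 1) or p not in (0, 1) or not 1 <= n <= 7:
        raise ValueError("m and p must be 0 or 1; n must be between 1 and 7")
    return "A" * m + "PA" * n + "P" * p


def linkers_of_length(lengths):
    """Return all pattern members whose length is in `lengths`, sorted alphabetically."""
    found = set()
    for m, n, p in product((0, 1), range(1, 8), (0, 1)):
        sequence = rigid_linker(m, n, p)
        if len(sequence) in lengths:
            found.add(sequence)
    return sorted(found)


short_rigid_linkers = linkers_of_length({5, 6, 7})
```

Because the length of a pattern member is m + 2n + p, only n = 2 and n = 3 can yield linkers of 5 to 7 residues, which gives six candidate sequences.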
[0177] Engineering of Improved CDA1-Based Editors.
[0178] To test whether other base editors can also be improved by engineering the linker region connecting the nucleoside deaminase domain with the nCas9 domain, we next applied a similar strategy to CDA1, the AID homolog of sea lamprey (Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, 1248 (2016)) that has been reported to exhibit superior performance to APOBEC1 in certain sequence contexts (Komor, A. C. et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci. Adv. 3, eaao4774 (2017)).
[0179] When fused to nCas9 with flexible linkers up to 100 aa long (Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, 1248 (2016)), CDA1 conducts C-to-T conversion in a window of approximately −16 to −19. To better understand what influences the width of the activity window, we generated four constructs for direct comparison of N- and C-terminal fusions of APOBEC1 and CDA1 to nCas9, initially using the XTEN linker.
[0180] Comparative assessment of the specificity of previously generated base editors and our base editors on several genomic target sequences showed that, in many cases, some level of discrimination between adjacent Cs is possible, but the achievable precision depends on the sequence context and on the base editor used. In general, the nCDA1-BE3 and cCDA1-BE3 editors display less dependence on the neighboring nucleotides and can edit target Cs efficiently even when located immediately after an A, a context that is only very inefficiently edited by APOBEC1-based editors. Moreover, CDA1-based editors enhance product purity, as reported previously (Komor, A. C. et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci. Adv. 3, eaao4774 (2017)).
[0181] In an attempt to further narrow the activity window of CDA1 editors, we removed the linker between CDA1 and Cas9, generating versions nCDA1-NL-BE3 and cCDA1-NL-BE3. Surprisingly, both linkerless fusions showed an unaltered activity window with largely unchanged editing efficiency at each C within it. This result suggests that the termini of CDA1 are inherently flexible and may act as linker-like sequences. We, therefore, tested the impact of N- and C-terminal truncations (removing potential linker-like fragments) on base editing.
[0182] A nuclear export signal (NES) was reported to reside in the C-terminus of the CDA1 homolog AID (Patenaude, A. M. et al. Active nuclear import and cytoplasmic retention of activation-induced deaminase. Nat. Struct. Mol. Biol. 16, 517-527 (2009)), and its location corresponds to residues 199 to 208 in CDA1.
[0183] Tests on oligo(C) motifs represent the most stringent assays for site selectivity of base editors. However, such long C stretches would only rarely be targets of genome editing with base editors in vivo. To assess whether base editors with C-terminally truncated CDA1 domains also show superior performance in more natural (heteropolymeric) genomic sequence contexts, we targeted four sites in the Can1 gene, each of which contains at least one additional C directly adjacent or close to position C.sub.−18. When the base editing outcomes of nCDA1-BE3, cCDA1-BE3 and our base editors with truncated CDA1 domains were compared, our base editors displayed editing with much higher precision.
[0184] Finally, we also determined the base editing outcome in individual colonies obtained by the canavanine selection method. While nCDA1-BE3 and cCDA1-BE3 yielded only 1 and 6 colonies (out of a total of 24 randomly picked colonies), respectively, that carried the specifically C.sub.−18 edited Can1 gene biallelically (i.e., in a homozygous fashion), the base editors with truncated CDA1 domains yielded 18-24 colonies that were homozygous for the allele only edited at position C.sub.−18. Importantly, two of the base editors produced 100% precisely edited homozygous clones.
TABLE 1 Base editors with CDA1 truncations exhibit many more homozygous C.sub.−19T.sub.−18 colonies than nCDA1-BE3 and cCDA1-BE3*. For each base editor, 24 canavanine-resistant colonies were randomly picked from the selection plate followed by sequencing of the Can1 locus. The major types of edited products are listed in the first column of the table, and the colony numbers representing each product type are given. For nCDA1-BE3, the genotype of the remaining colony is C.sub.−19T.sub.−18/T.sub.−19C.sub.−18; for nCDA1Δ194-BE3, the remaining two colonies are C.sub.−19T.sub.−18/T.sub.−19C.sub.−18 and T.sub.−19T.sub.−18/T.sub.−19C.sub.−18, respectively.

Edited product                                        nCDA1-BE3      cCDA1-BE3      nCDA1Δ194-BE3  nCDA1Δ193-BE3  nCDA1Δ192-BE3  nCDA1Δ190-BE3  nCDA1Δ184-BE3  nCDA1Δ176-BE3
C.sub.−19T.sub.−18 (homozygous)                       1/24           6/24           18/24          21/24          22/24          24/24          24/24          20/24
C.sub.−19T.sub.−18/T.sub.−19T.sub.−18 (heterozygous)  0/24           11/24          2/24           2/24           1/24           0/24           0/24           2/24
T.sub.−19C.sub.−18 (homozygous)                       22/24          7/24           2/24           1/24           1/24           0/24           0/24           2/24
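The proportions discussed for Table 1 can be recomputed directly from the colony counts. The sketch below copies the counts of homozygous C.sub.−19T.sub.−18 colonies from the table; the variable and function names are illustrative only.

```python
COLONIES_PICKED = 24  # colonies sequenced per base editor (Table 1)

# Number of colonies homozygous for the allele edited only at C-18 (Table 1, first row).
homozygous_precise = {
    "nCDA1-BE3": 1,
    "cCDA1-BE3": 6,
    "nCDA1Δ194-BE3": 18,
    "nCDA1Δ193-BE3": 21,
    "nCDA1Δ192-BE3": 22,
    "nCDA1Δ190-BE3": 24,
    "nCDA1Δ184-BE3": 24,
    "nCDA1Δ176-BE3": 20,
}


def percent_precise(count, total=COLONIES_PICKED):
    """Percentage of picked colonies carrying the precise homozygous edit."""
    return 100.0 * count / total


percentages = {editor: percent_precise(count) for editor, count in homozygous_precise.items()}
fully_precise = [editor for editor, count in homozygous_precise.items() if count == COLONIES_PICKED]
```

With these counts, fully_precise contains the two editors nCDA1Δ190-BE3 and nCDA1Δ184-BE3, in agreement with the statement that two of the base editors produced 100% precisely edited homozygous clones.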
[0185] Expanding Precision Base Editing to Non-NGG PAM Sequences
[0186] Recently, several Cas9 variants have been described that recognize non-NGG PAM sequences (Nishimasu, H. et al. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361, 1259-1262 (2018); Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57-63 (2018); Kleinstiver, B. P. et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481-485 (2015)). To test whether Cas9 variants with expanded PAM compatibility can be used in our high-precision BEs to extend their DNA targeting scope, we replaced the nCas9 sequence with that of four different nCas9 variants recognizing four different non-NGG PAMs.
[0187] For each set of BEs, we tested target sites that contain a stretch of consecutive cytidines within the activity window upstream of the PAM. PolyC motifs were used to provide the most rigorous test for editing precision, in that specific editing of a single C would require maximum discriminatory power. Editing efficiency and precision were first assessed by dideoxy chain termination sequencing of amplified PCR products, and the two best-performing BEs were then further characterized by high-throughput next-generation sequencing.
[0188] The VQR-Cas9 variant recognizes the PAM sequence NGA.
[0189] The VRER-Cas9 variant recognizes the PAM sequence NGCG.
[0190] Recently, two Cas9 variants, designated xCas9 and SpCas9-NG, were developed that show greatly relaxed PAM recognition specificity and, instead of NGG, recognize the minimal PAM sequence NG (Nishimasu, H. et al. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361, 1259-1262 (2018); Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57-63 (2018)). When tested on three non-NGG target sites (PolyC-1-NGA, PolyC-5-NGC and PolyC-6-NGT), xCas9-derived BEs displayed detectable activity only on one of the three sites (PolyC-5-NGC).
[0191] BEs constructed with SpCas9-NG edited all three non-NGG target sites.
[0192] Taken together, our findings indicate that BEs with truncated CDA1 sequences tolerate replacement of Cas9 with variants that recognize alternative PAMs, including PAMs with greatly relaxed specificity such as NG. The high efficiency and accuracy of these new editors greatly expand the editing scope of high-precision BEs.
[0193] Engineering of A3A-Based Precision BEs
[0194] In an attempt to develop additional high-precision BEs that selectively edit nucleotide positions other than C.sub.−18, we generated fusions of several deaminases to nCas9 by omitting a linker sequence between the two proteins. This approach was taken to investigate the possibility that these deaminases inherently harbor a linker-like fragment at their C-terminus.
[0195] Six different deaminases were tested by fusing nCas9 directly to their C-terminus. The fusion proteins were then assayed for their base editing efficiency on two polyC-containing target sites. The BE based on the human cytidine deaminase APOBEC3A (A3A; Gehrke, J. M. et al. An APOBEC3A-Cas9 base editor with minimized bystander and off-target activities. Nat. Biotechnol. 36, 977-982 (2018)), referred to as hA3A-NL-BE3, displayed the best performance in that it conferred the highest editing efficiency on both target sequences. We therefore chose A3A for further optimization.
[0196] For comparison, we also generated an A3A-BE3 editor with the standard XTEN linker (Rees, H. A. et al. Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery. Nat. Commun. 8, 15790 (2017)). Surprisingly, we observed that hA3A-NL-BE3 (for brevity subsequently referred to as A3A-NL-BE3) showed a slightly broader editing window than A3A-BE3 and also caused a shift in the most strongly edited (central) positions, despite the shorter connection between the cytidine deaminase domain (A3A) and the nCas9 domain of the fusion protein. This may be attributable to linker removal slightly altering the spatial structure of the fusion protein (and, in this way, affecting positioning of the deaminase domain on the target sequence), and would be consistent with the variable effects of linker engineering seen in previous studies (Kim, Y. B., et al. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotechnol. 35, 371-376 (2017); Tan, J., et al. Engineering of high-precision base editors for site-specific single nucleotide replacement. Nat. Commun. 10, 439 (2019)). The editing efficiency of both BEs was similar at both tested sites.
[0197] A3A-based BEs were reported to exhibit a lower dependence on the sequence context, reduced sensitivity to DNA methylation and a wider editing window (Zong, Y. et al. Efficient C-to-T base editing in plants using a fusion of nCas9 and human APOBEC3A. Nat. Biotechnol. 36, 950-953 (2018); Wang, X., et al. Efficient base editing in methylated regions with a human APOBEC3A-Cas9 fusion. Nat. Biotechnol. 36, 946-949 (2018); Gehrke, J. M. et al. An APOBEC3A-Cas9 base editor with minimized bystander and off-target activities. Nat. Biotechnol. 36, 977-982 (2018)). To test if the precision of these BEs can be improved by narrowing the activity window, we constructed a series of truncations at the C-terminus of A3A and determined their impact on base editing (
[0198] To confirm the superior precision of the truncated editors A3AΔ190-BE3, A3AΔ186-BE3 and A3AΔ182-BE3, we compared the base editing outcomes when targeting different cytidines within the yeast Can1 gene (Tan, J., et al. Engineering of high-precision base editors for site-specific single nucleotide replacement. Nat. Commun. 10, 439 (2019)). Each of the five tested sites contains one or two target Cs at different distances from the PAM, ranging from position C.sub.−19 to position C.sub.−11 (
[0199] It was recently reported that mutations in A3A (N57G mutation in an A3A variant dubbed eA3A) can reduce bystander editing frequency by enhancing the preference of the editor for the TCR motif (with R being A or G; Gehrke, J. M. et al. An APOBEC3A-Cas9 base editor with minimized bystander and off-target activities. Nat. Biotechnol. 36, 977-982 (2018)). We therefore generated an eA3A-BE3 editor and compared it with our best-performing truncated A3A BEs. We found that eA3A, although mainly editing C.sub.−15 or C.sub.−16, suffered from reduced editing activity (
[0200] It has been reported that A3A-derived BEs can induce significant transcriptome-wide off-target editing at the RNA level. Specific amino acid substitutions (R128A or Y130F) in A3A largely eliminate these off-target activities (Zhou, C., et al. Off-target RNA mutation induced by DNA base editing and its elimination by mutagenesis. Nature 571, 275-278 (2019); Grünewald, J., et al. Transcriptome-wide off-target RNA editing induced by CRISPR-guided DNA base editors. Nature 569, 433-437 (2019)). We therefore investigated the effect of each of these two mutations on the width of the base editing window and the BE activity when combined with suitable A3A truncations. Introduction of either of the two mutations into A3A-BE3 neither reduced the base editing efficiency, consistent with previous findings (Zhou, C., et al. Off-target RNA mutation induced by DNA base editing and its elimination by mutagenesis. Nature 571, 275-278 (2019)), nor affected the base editing window. When we combined these mutations with the two optimal A3A truncations (A3AΔ186 and A3AΔ182), we found that Y130F, but not R128A, in combination with the A3A version truncated at residue 186 (i.e., BE variant A3A(Y130F)Δ186-BE3) displays a base editing window and an editing efficiency similar to those of A3AΔ186-BE3, and thus should be used to suppress off-target RNA editing.
[0201] Together, these data demonstrate that the A3A deaminase can be engineered to obtain high-precision base editors that predominantly edit position C.sub.−15 or C.sub.−16, while retaining high editing efficiency.
[0202] Analysis of Genome-Wide Off-Target Editing by Whole Genome Sequencing
[0203] Recently, cytosine base editors were reported to produce substantial genome-wide off-target effects that are largely independent of the sgRNA (Jin, S., et al. Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. Science 364, 292-295 (2019); Zuo, E., et al. Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos. Science 364, 289-292 (2019)). Since a narrower editing window means fewer target nucleotides, we envisioned that our narrow-window base editors could also reduce off-target DNA editing. We therefore investigated off-target editing in yeast cells treated with nCDA1-BE3, cCDA1-BE3, nCDA1Δ190-BE3 and a no-BE control, in combination with an sgRNA targeting a Can1 site. Canavanine selection was used to isolate colonies harboring on-target editing events. The truncated CDA1 version Δ190 was chosen for this experiment because we had previously shown that this version displays high editing precision as well as high editing efficiency for most tested sites (Tan, J., et al. Engineering of high-precision base editors for site-specific single nucleotide replacement. Nat. Commun. 10, 439 (2019)). For all constructs, cultures grown from three different transformed colonies were mixed, followed by genomic DNA isolation and whole-genome sequencing. The three BE variants showed numbers of indels comparable to those of the no-BE control (
[0204] Guidelines for the Choice of the Optimal Cytidine BE
[0205] Three different cytidine deaminases (APOBEC1, CDA1 and APOBEC3A) have been engineered to produce efficient cytosine BEs, modify PAM specificities, and alter position and width of the editing window. BE variants with different properties have been obtained that differ in their suitability for (i) different target sequences and (ii) different positions of the C to be edited within the protospacer.
[0206] There is now sufficient information available to define some guidelines for the choice of the best BE depending on the position of the C, the sequence context and the presence or absence of bystander Cs (see Table 1, presented further above).
[0207] If the target C is located at position C.sub.−19 relative to the PAM and no bystander C is present, three BEs can be recommended: nCDA1-BE3, nCDA1Δ198-BE3 and A3A-NL-BE3. If the target C is in the same position (C.sub.−19), but has a bystander C directly upstream (CCDDD motif, with D being any nucleotide but C), cCDA1-BE3 would be the best choice (Tan, J., et al. Engineering of high-precision base editors for site-specific single nucleotide replacement. Nat. Commun. 10, 439 (2019)).
[0208] If the target C is located at C.sub.−18 and has a bystander C in its vicinity (NCN motif, with N being any nucleotide, including a possible bystander C), BEs with C-terminal truncations of CDA1 (Δ194 to Δ188) are recommended (
[0209] For editing at C.sub.−16 with a 5′ bystander C (NCD context) or editing at C.sub.−15 with a 3′ bystander C (DCN), A3AΔ182-BE3 and A3A(Y130F)Δ186-BE3 are the editors of choice (
[0210] With our set of narrow-window BEs, many disease-causing T-to-C and A-to-G mutations can now potentially be corrected in a precise manner. For example, a T-to-C mutation at position 497 of the coding region of the human gene encoding presenilin-1 (PSEN1-L166P mutation) is associated with early-onset Alzheimer's disease (Moehlmann, T., et al. Presenilin-1 mutations of leucine 166 equally affect the generation of the Notch and APP intracellular domains independent of their effect on Abeta 42 production. Proc. Natl. Acad. Sci. USA 99, 8025-8030 (2002)). This mutation can be corrected by a BE that has this C within its predicted editing window at position −18 relative to the PAM sequence NG. Precision is important here, because an additional C is present immediately adjacent to the target C (at position 496), which also lies within the editing window (−19 relative to the PAM). Using precision BEs with CDA1 truncations, this C can now be targeted much more accurately (Table 1). Similarly, an A-to-G mutation at position 980 of the coding region of the tyrosinase-encoding gene (representing a T-to-C mutation in the complementary strand) causes oculocutaneous albinism (TYR-Y327C mutation; 8). The target C is in a TCAC motif and located at position −15 relative to the PAM sequence AGG. Therefore, this mutation can be precisely corrected with the BEs A3AΔ182-BE3 or A3A(Y130F)Δ186-BE3 (Table 1).
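The selection guidelines of paragraphs [0207] to [0209] and the coordinate arithmetic behind the two examples in paragraph [0210] can be illustrated with a short sketch. It is not part of the described method: the function names, the dictionary layout and the bystander labels ("none", "upstream", "vicinity", "5prime", "3prime") are hypothetical and were chosen only for this illustration, whereas the editor names and target positions are transcribed from the text above. Combinations not covered by the guidelines return an empty list.

```python
# Illustrative lookup only. Keys and labels are hypothetical; the
# recommendations are transcribed from paragraphs [0207]-[0209].
# Positions are counted relative to the PAM (e.g. -18 for C.sub.-18).
GUIDELINES = {
    (-19, "none"): ["nCDA1-BE3", "nCDA1Δ198-BE3", "A3A-NL-BE3"],
    (-19, "upstream"): ["cCDA1-BE3"],
    (-18, "vicinity"): ["CDA1 C-terminal truncations (Δ194 to Δ188)"],
    (-16, "5prime"): ["A3AΔ182-BE3", "A3A(Y130F)Δ186-BE3"],
    (-15, "3prime"): ["A3AΔ182-BE3", "A3A(Y130F)Δ186-BE3"],
}


def recommend_editors(position, bystander):
    """Return the editors recommended for a target C, or [] if not covered."""
    return list(GUIDELINES.get((position, bystander), []))


def codon_of(cds_position):
    """Map a 1-based coding-region position to (codon number, base in codon)."""
    codon, offset = divmod(cds_position - 1, 3)
    return codon + 1, offset + 1


# PSEN1-L166P: target C at -18 with a bystander C at -19.
print(recommend_editors(-18, "vicinity"))
# Coding positions 497 and 980 fall in codons 166 and 327, respectively.
print(codon_of(497), codon_of(980))
```

The codon calculation confirms that coding positions 497 and 980 are the second bases of codons 166 and 327, in agreement with the L166P and Y327C designations used above.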