HIGH-PRECISION BASE EDITORS

20220154163 · 2022-05-19

    Abstract

    The present invention relates to a base editing compound comprising or consisting of (a) a Cas protein, and, covalently connected therewith; (b) a nucleobase-modifying enzyme, wherein the covalent connection of (a) and (b) is (i) direct; (ii) provided by a peptide comprising at least one Pro residue, said peptide having a length between 1 and 20, preferably between 1 and 15 amino acids; or (iii) provided by a non-peptidic linker, said non-peptidic linker being a small organic molecule comprising one or more double bonds, one or more triple bonds, and/or one or more aromatic rings.

    Claims

    1. A base editing compound comprising or consisting of (a) a Cas protein, and, covalently connected therewith; (b) a nucleobase-modifying enzyme, wherein the covalent connection of (a) and (b) is (i) direct; (ii) provided by a peptide comprising at least one Pro residue, said peptide having a length between 1 and 20, preferably between 1 and 15 amino acids; or (iii) provided by a non-peptidic linker, said non-peptidic linker being a small organic molecule comprising one or more double bonds, one or more triple bonds, and/or one or more aromatic rings.

    2. The compound of claim 1, wherein (a) said Cas protein is a Cas nickase or dead Cas, said Cas nickase preferably being Cas9 or Cas12, and said dead Cas preferably being dead Cas9 or dead Cas12; and/or (b) said nucleobase-modifying enzyme is selected from a deaminase, a nucleoside synthase, a DNA methyl transferase and a DNA demethylase, said deaminase preferably being selected from the APOBEC, CDA1 or Tad/ADAR families, APOBEC3A being particularly preferred.

    3. The compound of claim 1 or 2, wherein said deaminase is truncated at the N- and/or C-terminus, wherein in case of APOBEC deaminases C-terminal truncation is preferred, an APOBEC3A with a C-terminal truncation of 17 amino acids (A3AΔ182) being particularly preferred, and in case of CDA1 deaminases, truncations from residue 188 or residue 198 onwards being preferred.

    4. The compound of claim 3, wherein the truncated residues are not essential for catalytic activity of said deaminase.

    5. The compound of any one of claims 1 to 4, wherein said compound comprises said peptide and wherein said peptide (i) consists of an amino acid sequence comprising 1 to 10 Pro residues and 1 to 10 small amino acid residues, said small amino acid residues preferably being selected from Ala, Gly and Ser, said amino acid sequence preferably being the sequence of SEQ ID NO: 130 (XTEN); (ii) has a length of 5, 6 or 7 amino acids; and/or (iii) consists of the sequence A.sub.m(PA).sub.nP.sub.p, wherein m and p are independently 0 or 1 and n is 1, 2, 3, 4, 5, 6 or 7, for example of SEQ ID NO: 154 or 162.

    6. The compound of any of the preceding claims, said compound comprising or further consisting of one or both of (a) an inhibitor of base excision repair, preferably an uracil DNA glycosylase inhibitor (UGI), more preferably the sequence of SEQ ID NO: 149; and (b) a nuclear localization signal (NLS), preferably the sequence of SEQ ID NO: 135; wherein (a) and/or (b) are preferably connected to each other and/or to said Cas protein with a peptidic linker consisting of 1 to 10 amino acids, said linker preferably consisting of the sequence of SEQ ID NO. 132 or 148.

    7. The compound of any one of the preceding claims, wherein (a) said deaminase is APOBEC3A (A3A; SEQ ID NO: 183), wherein preferably said A3A is truncated at the C-terminus; (b) said deaminase is A3AΔ182 (SEQ ID NO: 205) and is fused to the N-terminus of said Cas protein; (c) said deaminase is APOBEC1, preferably consisting of the sequence of SEQ ID NO: 129, and is fused to the N-terminus of said Cas protein; or (d) said deaminase is CDA1, preferably consisting of the sequence of SEQ ID NO: 137; and is fused to the N-terminus or the C-terminus, preferably to the N-terminus of said Cas protein, wherein preferably the C-terminus of said CDA1 is truncated, preferably either from position 198 onwards or from any of positions 188 to 194 onwards.

    8. The compound of any one of the preceding claims, wherein (a) said Cas protein consists of the amino acid sequence of SEQ ID NO: 1 or a sequence with at least 80% identity thereto and preferably providing nickase activity, or is encoded by the nucleic acid sequence of SEQ ID NO: 2 or a sequence with at least 80% identity thereto and preferably encoding a protein with nickase activity, and is preferably selected from VQR-Cas9 (amino acid sequence of SEQ ID NO: 121), VRER-Cas9 (amino acid sequence of SEQ ID NO: 122), xCas9 (amino acid sequence of SEQ ID NO: 123), and Cas9-NG (amino acid sequence of SEQ ID NO: 124) or encoded by any one of SEQ ID NOs: 23, 24, 25 or 26; and/or (b) said deaminase consists of the amino acid sequence of any one of SEQ ID NOs: 205, 129, 137, 169, 176, 183, 198, 212, 219, 3 and 5, a sequence with at least 80% identity and providing deaminase activity or a truncated version of any such sequence, or is encoded by the nucleic acid sequence of any one of SEQ ID NOs: 107, 31, 55, 71, 78, 85, 100, 114, 220, 4 and 6, a sequence with at least 80% identity and encoding a protein with deaminase activity or a truncated version of any such sequence.

    9. The compound of any one of the preceding claims, wherein said compound is a single polypeptide and comprises or consists of an amino acid sequence selected from SEQ ID NOs: 204, 7, 9, 11, 13, 136, 218, 190, 144, 168, 182, 128, 152, 204 and 211.

    10. A nucleic acid encoding the compound of any one of the preceding claims, to the extent said compound is a single polypeptide.

    11. A method of base editing, said method comprising introducing into a cell a nucleic acid of claim 10 or a compound of any one of claims 1 to 9.

    12. The method of claim 11, further comprising introducing into said cell a guide nucleic acid for said nickase.

    13. The method of claim 11 or 12, wherein said method is performed in vitro or ex vivo.

    14. A pharmaceutical composition comprising or consisting of (a) the compound of any one of claims 1 to 9; and/or (b) the nucleic acid of claim 10.

    15. The pharmaceutical composition of claim 14, further comprising or further consisting of a guide nucleic acid for said nickase, wherein said guide nucleic acid comprises a sequence which is homologous to a subsequence of a target gene, wherein said target gene is associated with a genetic disorder.

    16. A compound of any one of claims 1 to 9 or a nucleic acid of claim 10, and a guide nucleic acid for said nickase for use in a method of treating, alleviating or preventing a disorder, wherein said guide nucleic acid comprises a sequence which is homologous to a subsequence of a target gene, wherein said disorder is associated with a point mutation or an SNP in said target gene.

    17. A kit comprising or consisting of (a) (i) one or more compounds of any one of claims 1 to 9; and/or (ii) one or more nucleic acids of claim 10.

    18. The kit of claim 17, furthermore comprising or further consisting of (b) one or more guide nucleic acids for the nickase comprised in said compound, wherein each of said guide nucleic acids comprises a sequence which is identical to a subsequence of a given target gene; and/or (c) a manual comprising instructions for performing the method of any one of claims 11 to 13.

    19. The kit of claim 17 or 18, wherein said kit comprises a plurality of said compounds and/or a plurality of said nucleic acids, wherein at least two of said compounds of (a)(i) or at least two of the compounds encoded by said nucleic acids of (a)(ii) differ with regard to their base editing profile.

    20. Use of a peptide as defined in any one of the preceding claims or of a non-peptidic linker as defined in claims 1 and 9 for covalently connecting a Cas protein such as a Cas nickase (nCas) or a dead Cas (dCas) and a deaminase (DA) to provide a base editing compound.

    21. The use of claim 20, wherein said deaminase is truncated at the N- or C-terminus.

    Description

    [0143] The figures show:

    [0144] FIG. 1. Rigid linkers narrow the width of the editing window of BE3. a Protospacers and PAM (blue; C-terminal 3 nt) sequences of the genomic loci tested, with the target Cs shown in red (with subscripts indicating the respective position). Subscript numbers indicate the positions of the cytidines relative to the PAM. C-to-T editing at any of the indicated Cs inactivates the Can1 transporter and thus causes resistance to canavanine (Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, 1248 (2016)). b Editing efficiency and specificity of the base editors tested as determined by canavanine selection. The x-axis represents the target Cs within the protospacers. The y-axis shows their C-to-T editing frequency (see Example 1). Values and error bars represent the mean and standard deviation of three independent biological replicates.

    [0145] FIG. 2. Comparison of N- and C-terminal deaminase fusions to nCas9. a Structure of nBE3 (=BE3; (Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016))), cBE3, nCDA1-BE3, and cCDA1-BE3 driven by the GalL inducible promoter. In all constructs, the XTEN linker separates the nucleoside deaminase domain from the nCas9 domain. nSpCas9: Streptococcus pyogenes Cas9 nickase. b Base editors with the deaminase at the N-terminus show broadened base editing windows. The sequence of the target (C).sub.9 motif is shown with the numbers representing the position of possible editing targets (grey, in the middle of the sequences) relative to the PAM (grey, at the end of the sequences). % of C-to-T editing represents the percentage of total sequencing reads with the target C converted to T. c Base editing outcome of nBE3, cBE3, nCDA1-BE3, and cCDA1-BE3 targeting several sites containing target Cs at different positions (indicated on the x-axis) in the Can1 gene. Values and error bars represent the mean and standard deviation of three independent biological replicates. Order in the legend (top to bottom) corresponds to the order of the bars in the figure (left to right).

    [0146] FIG. 3. Design of base editors with truncated CDA1 domains. a Amino acid sequence alignment of CDA1 and human AID. The catalytic domain HxE-PCxxC and the nuclear export signal (NES) are indicated by black horizontal lines. The alignment was created by CLUSTALW (Larkin, M. A. et al. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947-2948 (2007); https://www.genome.jp/tools-bin/clustalw) and graphically formatted with the help of the ESPript 3.0 server (Robert, X. & Gouet, P. Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Res. 42, W320-W324 (2014).) (http://espript.ibcp.fr/ESPript/ESPript/). Identical amino acid residues are shaded in red (dark grey), similar residues in yellow (light grey). b Schematic representation of base editors with C-terminal CDA1 truncations (named after the last CDA1 residue included).

    [0147] FIG. 4. Effects of C-terminal truncations of the CDA1 domain on the width of the editing window of nCDA1-BE3 base editors. All base editor variants were tested on both (C).sub.8 (a) and (C).sub.9 (b) motifs (see Methods). Cs within each target region are shown in red (grey, in the middle of the sequences), with the number below indicating their distance from the PAM (blue; grey, at the end of the sequences). The C-to-T conversion efficiencies are plotted for all Cs within the protospacer, and shown in comparison to the nCDA1-BE3 base editor with the full-length CDA1 (light grey bars). Values and error bars represent the mean and standard deviation of three biological replicates.

    [0148] FIG. 5. Base editors with C-terminally truncated CDA1 domains edit position C.sub.−18 with high precision. nCDA1-BE3, cCDA1-BE3, and selected base editors with C-terminally truncated CDA1 domains are compared. a Editing of genomic loci containing multiple cytidines directly adjacent or in close proximity to C.sub.−18. Cytidines representing possible editing targets are shown in red (grey where reproduced in greyscale; with subscripts indicating the respective position) with the subscript number representing their position relative to the PAM (CGG). b, c Base editors with truncated CDA1 domains greatly improve editing product distribution and produce predominantly singly C.sub.−18-modified products. % of edited reads represents the percentage of total sequencing reads containing the products shown. Values and error bars represent the mean and standard deviation of three biological replicates.

    [0149] FIG. 6. Analysis of base editing patterns and efficiencies in single yeast colonies selected for canavanine resistance. A comparison of base editing frequencies for nCDA1-BE3, cCDA1-BE3, and selected base editors with truncated CDA1 domains is shown. Yeast cells were transformed with plasmids expressing the base editor and an sgRNA targeting the Can1-5 site. The target sequence is shown with the cytidines that can potentially undergo editing in red (grey, in the middle of the sequences) and the PAM in blue (grey, at the end of the sequences). If C-to-T conversion occurs at position −18 or −19 or both, the Can1 gene will be inactivated and the cell becomes resistant to canavanine. Values and error bars reflect the mean and standard deviation of three biological replicates. See also Table 1.

    [0150] FIG. 7. High-precision base editing at target sites containing non-NGG PAMs. a Structure of nCDA1-BE3 in comparison to base editors harboring CDA1 truncations (ΔCDA1). nSpCas9: Streptococcus pyogenes Cas9 nickase; XTEN: synthetic linker sequence (13); UGI: uracil DNA glycosylase inhibitor; NLS: nuclear localization signal. b Cas9 variants with altered PAM specificities. c-g BE variants with CDA1 truncations mediate high-precision base editing at target sites comprised of multiple cytidines (polyC targets). The x-axis shows the Cs in the target sequence with their position relative to the PAM indicated. The y-axis (C-to-T editing in %) represents the percentage of total sequencing reads with the target C converted to T. Values and error bars represent the mean and standard deviation of three independent biological replicates. c Analysis of base editing precision of VQR-Cas9 BEs fused to selected C-terminally truncated versions of CDA1. For comparison, the BE carrying the full-length CDA1 and the nCDA1-BE3 editor are also included. d Analysis of base editing precision of VRER-Cas9 BEs fused to C-terminally truncated CDA1 versions. e Analysis of base editing precision of xCas9 BEs fused to C-terminally truncated CDA1 versions. f,g Analysis of base editing precision of SpCas9-NG BEs fused to C-terminally truncated CDA1 versions.

    [0151] FIG. 8. Base editors with C-terminally truncated A3A sequences exhibit narrowed editing windows. a Structure of A3A-BE3 and BEs with A3A truncations (A3AΔ-BE3 variants). b, c Effects of C-terminal truncations of the A3A domain on the width of the editing window of A3AΔ-BE3s. All base editor variants were tested on both the polyC-7 (b) and polyC-8 (c) sites (see Methods). Cs within each target region are indicated in grey (in the middle of the sequences), with the number below indicating their distance from the PAM (grey, at the end of the sequences). The C-to-T conversion efficiencies are plotted for all Cs within the protospacer, and shown in comparison to the A3A-BE3 base editor with the full-length A3A (light grey bars). Values and error bars represent the mean and standard deviation of three biological replicates.

    [0152] FIG. 9. Base editing outcomes of A3A-BE3, truncated A3AΔ-BE3 variants and the recently optimized editor eA3A-BE3 (20) when targeting specific sites in the yeast Can1 gene. a Sequences of the five target sites (containing Cs at different positions). Target Cs are indicated in grey (in the middle of the sequences) and numbered relative to the PAM (grey, at the end of the sequences). Edited clones were identified by using the canavanine selection strategy (see Methods). b Base editing efficiency and precision. The x-axis represents the target Cs within the protospacers (with the order of the bars from left to right corresponding to the Cs in the legend from top to bottom). The y-axis shows their C-to-T mutation frequency (see Methods). Values and error bars represent the mean and standard deviation of three independent biological replicates.

    [0153] FIG. 10. Analysis of off-target editing. Genetic changes that occurred in strains harboring nCDA1-BE3, cCDA1-BE3, nCDA1Δ190-BE3 or a control plasmid without a BE construct were identified by whole genome sequencing. a-b Comparison of the total number of detected indels (a) and SNVs (b). c The mutation frequency of different types of SNVs in cells treated by the three base editors and the control. The order of the bars from left to right corresponds to the BEs listed in the legend from top to bottom. The sgRNA was designed to target site Can1-4. Values and error bars represent the mean and standard deviation of three independent biological replicates.

    [0154] The examples illustrate the invention.

    Example 1

    Methods

    [0155] Yeast Strains and Growth Conditions.

    [0156] Saccharomyces cerevisiae BY4743 (diploid, MAT a/α, his3Δ1/his3Δ1, leu2Δ0/leu2Δ0, LYS2/lys2Δ0, met15Δ0/MET15, ura3Δ0/ura3Δ0) was used as host strain for genome editing. Cells were grown non-selectively in YPAD medium (2% Bacto peptone, 1% Bacto yeast extract, 2% glucose, 0.003% adenine hemisulfate). For culture in Petri dishes, the medium was solidified with 2% agar. Selection of yeast transformants based on the URA3 and LEU2 markers was done on synthetic complete (SC) medium (6.7 g/L of Difco Yeast Nitrogen Base, 20 g/L glucose) supplemented with a mixture of appropriate amino acids but lacking uracil and leucine (SC-U-L). Yeast strains were cultivated at 28° C. on a rotary shaker.

    [0157] DNA Methods.

    [0158] PCR was performed with Phusion High-Fidelity DNA Polymerase (ThermoFisher) according to the manufacturer's instructions. Cloning and amplification of plasmids were carried out in the E. coli strain DH5α. Plasmids harboring the Streptococcus pyogenes cas9 gene (p415-GalL-Cas9-CYC1t) and a chimeric guide RNA construct (p426-SNR52p-gRNA.CAN1.Y-SUP4t) were provided by the laboratory of Dr. George Church and obtained from Addgene (Cambridge, Mass., USA).

    [0159] To generate APOBEC1 base editors, the APOBEC1 reading frame and the partial cas9 sequence were PCR-amplified using oligonucleotides with overlapping linker sequences. The two fragments were cloned into the SpeI/SbfI-digested p415-GalL-Cas9-CYC1t with the help of the In-Fusion HD Cloning Kit (Clontech, CA, USA). The D10A point mutation was introduced into cas9 with primers harboring the desired mutation by amplification of the entire plasmid template followed by DpnI digestion to remove the parental template. The UGI gene was codon-optimized for yeast and synthesized (Eurofins Genomics, Ebersberg, Germany), followed by insertion into the AscI/MluI-digested vector p415-GalL-Cas9-CYC1t. To generate CDA1 base editors, the reading frame encoding pmCDA1 was PCR-amplified to replace the APOBEC1 fragment within BE3, thus generating nCDA1-BE3. To produce a fusion of CDA1 to the C-terminus of Cas9, plasmid pRS315e_pGal-nCas9(D10A)-PmCDA1 (provided by the laboratory of Akihiko Kondo, Hyogo, Japan, and obtained from Addgene) was modified. First, the amplified UGI sequence was introduced into the XbaI site, and the resulting vector was then digested with AscI and SphI. Subsequently, two PCR fragments (overlapping by the XTEN linker sequence) were inserted to generate cCDA1-BE3. Insertion of three PCR fragments (covering XTEN and APOBEC1) produced base editor cBE3. The CDA1 protein truncations were generated by PCR amplification, and cloned into SpeI/SbfI-digested BE3 or AscI/SphI-digested cBE3 vectors to produce the ΔCDA1-Cas and Cas-ΔCDA1 vector series, respectively. To produce YEE-BE3, the mutated APOBEC1 from plasmid pCMV-dCpf1-BE-YEE (provided by the laboratory of Jia Chen, Shanghai, China, and obtained from Addgene) was PCR-amplified and cloned into SpeI/SbfI-digested BE3.

    [0160] To generate CDA1-BE3 variants with VQR-Cas9, the three required point mutations (D1135V/R1335Q/T1337R) were introduced into the cas9 gene by PCR with primers harboring the desired mutations, and the resulting three PCR products were cloned into the NruI/NcoI-digested BE3 to obtain VQR-BE3 with the help of the In-Fusion HD Cloning Kit (Clontech, Mountain View, Calif., USA). The mutated fragment was then released by digesting VQR-BE3 with NruI and MluI, followed by ligation into the similarly digested CDA1 BE plasmid (21). To construct VRER-BE3 variants, three fragments containing the four mutations (D1135V/G1218R/R1335E/T1337R) were PCR-amplified followed by cloning into the NruI/MluI-digested VQR-BE3. The mutated fragment was then excised by digesting VRER-BE3 with NruI and MluI, and ligated into the CDA1 BE construct cut with the same enzyme combination. For the generation of SpCas9-NG BE3 variants, four fragments containing the seven mutations (R1335V/L1111R/D1135V/G1218R/E1219F/A1322R/T1337R) were PCR-amplified followed by cloning into the NruI/MluI-digested vector VQR-BE3. The mutated fragment was released by digesting SpCas9-NG-BE3 with NruI and MluI and cloned into the similarly cut CDA1 BE plasmid. For the construction of xCas9 variants, plasmid xCas9(3.7)-BE3 (obtained from Addgene) was digested with the restriction enzymes SbfI and AscI. The resulting 3.7 kb fragment was then inserted into the CDA1 BE construct digested with SbfI and AscI. To obtain cCDA1-BE3 variants, the mutated fragments were PCR-amplified using the corresponding BE3 variant as template and cloned into the NruI/SphI-digested cCDA1-BE3 plasmid (21).

    [0161] To generate hA3A, hA3B, hA3G, hAID, mAID, cAICDA and truncated hA3A base editors, the deaminase genes were PCR-amplified from plasmid clones (provided by the laboratory of Dr. Jia Chen, Shanghai, China, and obtained from Addgene) together with part of the cas9 sequence, and then ligated into the SpeI/SbfI-digested BE3 vector. To produce A3A(R128A)-BE3, A3A(Y130F)-BE3 as well as eA3A-BE3, the point mutations (R128A, Y130F and N57G) were introduced into A3A with primers containing the appropriate mutations.

    [0162] To generate plasmids expressing sgRNAs that target specific sites, the protospacer sequences were introduced by PCR amplification, and the resulting PCR products were cloned into the ClaI/KpnI-digested vector p426-SNR52p-gRNA.CAN1.Y-SUP4t with the In-Fusion HD Cloning Kit (Clontech, CA, USA).

    [0163] Yeast Transformation and Genomic DNA Extraction.

    [0164] Yeast cells were transformed with the LiAc/SS carrier DNA/PEG method using 0.5-1 μg plasmid DNA (Gietz, R. D. & Schiestl, R. H. Quick and easy yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat. Protoc. 2, 35-37 (2007)). Transgenic clones were selected on SC-U-L media and confirmed by PCR analyses. Yeast genomic DNA was extracted according to a published protocol (Lõoke, M., Kristjuhan, K. & Kristjuhan, A. Extraction of genomic DNA from yeasts for PCR-based applications. Biotechniques 50, 325-328 (2011)). PCR products were purified (PCR Purification kit; Macherey-Nagel) and then sequenced.

    [0165] CAN1 Mutagenesis.

    [0166] Yeast colonies were picked, suspended in 3 mL SC medium with 2% glucose and without leucine and uracil, and grown to a stationary phase. The cells were then pelleted, washed twice in sterile water, and then resuspended in SC induction medium with 2% galactose and 1% raffinose, but without leucine and uracil, to an OD600 of 0.3. The cells were incubated for 20 h prior to plating on YPAD rich or SC media plates without arginine but with 60 mg/mL L-canavanine (Sigma). After incubation for 3 days, the colony number on each plate was counted. The C-to-T mutation frequency in CAN1 was determined as the ratio of the colony count on canavanine-containing plates to the colony count on YPAD-rich media plates. Each experiment was performed at least three times on different days. To determine the mutation spectrum, colonies were randomly picked and suspended in sterile water, followed by PCR amplification of the relevant CAN1 fragment and DNA sequencing. Control cultures (not treated with base editors) did not produce canavanine-resistant colonies.
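
    The frequency calculation described above can be sketched in a few lines of Python. This is a minimal illustration only and not part of the original methods: the function names and the colony counts are hypothetical, and the use of the sample standard deviation across replicates is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def mutation_frequency(canavanine_colonies, rich_medium_colonies):
    """Ratio of colonies on canavanine plates to colonies on non-selective plates."""
    if rich_medium_colonies <= 0:
        raise ValueError("non-selective plate must carry at least one colony")
    return canavanine_colonies / rich_medium_colonies


def summarize_replicates(replicate_counts):
    """Return (mean, sample SD) of the frequency across replicates.

    replicate_counts: list of (canavanine_colonies, rich_medium_colonies) pairs,
    one pair per independent experiment (at least two pairs are required).
    """
    freqs = [mutation_frequency(c, r) for c, r in replicate_counts]
    return mean(freqs), stdev(freqs)


# Hypothetical colony counts for three replicates (illustrative values only).
example_counts = [(45, 300), (60, 300), (75, 300)]
avg, sd = summarize_replicates(example_counts)
print(f"mean frequency = {avg:.3f}, SD = {sd:.3f}")
```

    If the two plate types are plated at different dilutions, the counts would have to be scaled to a common volume before taking the ratio; that correction is not shown here because the text does not describe it.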

    [0167] Next-Generation Sequencing.

    [0168] Yeast colonies harboring plasmids expressing base editors and sgRNAs were picked from SC-L-U plates, suspended in 3 mL SC-L-U medium with 2% glucose, and grown to stationary phase. The cultures were then washed twice to remove residual glucose, resuspended in 5 mL SC-L-U medium with 2% galactose and 1% raffinose to an OD600 of 0.3, and incubated for 20 h at 28° C. on a rotary shaker. Genomic DNA was extracted from culture samples of 0.5 mL volume, and the regions targeted by base editing were amplified by PCR with primer pairs containing index tags for sample multiplexing. PCR amplification was performed with the Phusion High-Fidelity DNA Polymerase (ThermoFisher) according to the manufacturer's protocol, followed by product purification with the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel). The purified index-labeled PCR products were pooled at equal molar ratios. PCR-free library construction and NGS sequencing, demultiplexing by assigning reads to samples, and data filtering (including removal of adaptor sequences, contaminations and low-quality reads from raw reads) were done commercially (BGI, Hong Kong). Sequencing was performed on an Illumina MiSeq 4000 platform in paired-end mode to obtain a read length of 150 bp for each side and, on average, more than 100,000 reads per sample.

    [0169] Data Analysis.

    [0170] The clean FASTQ files obtained after data filtering were further analyzed with Python scripts (available at https://github.com/zfcarpe/Cas9Sequencing). Briefly, the “pattern_extract.py” script was first applied to scan all sequencing reads and extract the reads with the fixed length of the editing region (and exactly matching the two flanking sequences). This procedure excluded indel-containing and imperfectly matching reads, and allowed each base call to be summarized in an alignment-like manner. Subsequent application of the “result_stat.py” script scanned each base within the editing region and calculated the frequency of each base converted to one of the other three bases by dividing the respective read number by the total number of sequencing reads to obtain the percentage of C-to-T editing and the percentage of edited reads with the C converted to any of the other bases. In addition, the script calculated the frequencies of all edited products by scanning each aligned read for conversion of the potential target cytidines. For the analysis of indel frequencies, the sequencing reads were scanned for two exactly matching 10-bp sequences that flank both sides of the region of interest (i.e., the sequence containing the editing sites). Reads without exact matches were excluded from further analysis. The length of the region between the two flanks was then calculated: reads in which it exactly matched the length of the reference sequence were classified as not containing an indel, and all other reads were classified as harboring an indel. A shell script “Cas9Sequencing.sh” combined the processes.
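
    The flank-based filtering and per-position counting described above can be illustrated as follows. This sketch is not the code of the cited scripts; it is a hypothetical re-implementation written only from the description in this paragraph. All function names, the short toy flanks and the toy reads are assumptions, and the denominator used for the C-to-T percentage (reads that match both flanks and carry no indel) is one possible reading of "total number of sequencing reads".

```python
from collections import Counter


def extract_region(read, left_flank, right_flank):
    """Return the sequence between the flanks, or None without exact flank matches."""
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]


def analyze_reads(reads, reference_region, left_flank, right_flank):
    """Classify reads and tally base calls at each position of the editing region."""
    region_len = len(reference_region)
    position_counts = [Counter() for _ in range(region_len)]
    n_matched = n_indel = n_clean = 0
    for read in reads:
        region = extract_region(read, left_flank, right_flank)
        if region is None:
            continue  # no exact match to both flanks: excluded from analysis
        n_matched += 1
        if len(region) != region_len:
            n_indel += 1  # region length differs from the reference: indel read
            continue
        n_clean += 1
        for i, base in enumerate(region):
            position_counts[i][base] += 1
    return position_counts, n_matched, n_indel, n_clean


def c_to_t_percent(position_counts, reference_region, n_clean):
    """Percentage of indel-free reads showing T at each reference C (0-based index)."""
    if n_clean == 0:
        return {}
    return {
        i: 100.0 * position_counts[i]["T"] / n_clean
        for i, ref_base in enumerate(reference_region)
        if ref_base == "C"
    }


# Toy data (hypothetical): 10-nt flanks around a 4-nt editing region.
LEFT, RIGHT, REF = "GGATCCGGTA", "TTGAGCTCGG", "ACCA"
toy_reads = [
    "TT" + LEFT + "ACCA" + RIGHT + "AA",  # unedited
    "TT" + LEFT + "ATCA" + RIGHT + "AA",  # first C converted to T
    "TT" + LEFT + "ATTA" + RIGHT + "AA",  # both Cs converted to T
    "TT" + LEFT + "ACA" + RIGHT + "AA",   # one base missing: indel
    "NNNNNNNNNNNNNNNNNNNN",               # flanks absent: excluded
]
counts, matched, indels, clean = analyze_reads(toy_reads, REF, LEFT, RIGHT)
print("indel fraction:", indels / matched)
print("C-to-T % per reference C:", c_to_t_percent(counts, REF, clean))
```

    In this sketch the positions are 0-based indices within the editing region; converting them to the PAM-relative numbering used in the figures would require the protospacer coordinates, which are not modeled here.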

    Example 2

    Results

    [0171] Rigid Linkers Improve Precision of APOBEC1-Based Editors.

    [0172] We hypothesized that the positioning on the target sequence of the Cas9 protein relative to the deaminase domain (i.e., their physical distance) and the rigidity of the connection between these two domains of the base editor determine the width of the editing window, and hence the precision of the base editor. In previous studies, a 16 amino acid (aa) flexible linker (XTEN) has been identified as the best compromise between editing efficiency and specificity (Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016)). Using L-canavanine selection in yeast (Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, 1248 (2016)), we first investigated the effects of length and rigidity of the linker between APOBEC1 and nCas9 (Cas9 nickase) on base editing precision and efficiency when targeting several sites in the Can1 gene (FIG. 1) that contain Cs within the activity window of the base editor BE3 (Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016)). L-Canavanine is a highly toxic analog of the proteinogenic amino acid arginine, and mutations inactivating the uptake protein Can1 confer resistance to canavanine. We used an inducible base editor construct, determined the optimal induction time, and then tested 10 different rigid linker sequences (containing the amino acid proline that, due to its secondary amine, confers conformational rigidity) in comparison to the commonly used XTEN flexible linker. Consistent with previous reports (Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016)), the base editor BE3 (containing the XTEN linker) allowed editing at all Cs within a window of nine nucleotides (FIG. 1). Omission of the linker sequence or use of a very short rigid linker (i.e., the 3 aa linker PAP) abolished editing nearly completely. Interestingly, rigid linkers of 5-7 aa made editing substantially more precise, with the seven aa linker PAPAPAP largely restricting editing to positions −15 and −16 (FIG. 1). Longer linkers resulted in reduced editing accuracy, suggesting that a seven aa rigid linker is optimal.

    [0173] It was reported that mutations in the APOBEC1 domain of BE3 can also narrow the base editing width. We, therefore, compared the base editing outcome of BE3, YEE-BE3 (the optimal BE3 variant (Kim, Y. B. et al. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotechnol. 35, 371-376 (2017))), and BE-PAPAPAP when targeting the Can1 sites. We found that YEE-BE3, although mainly editing C.sub.−15 or C.sub.−16, suffered from strongly reduced editing activity at these sites. Although it will be important to confirm this deficit for additional sequence contexts, this finding is consistent with a recent study that also reported low editing efficiency of the YEE-BE3 base editor (Gehrke, J. M. et al. An APOBEC3A-Cas9 base editor with minimized bystander and off-target activities. Nat. Biotechnol. 36, 977-982 (2018)).

    [0174] Previous work has mostly investigated the activity of base editors in favorable sequence contexts, with relatively few C targets within the protospacer sequence. To develop a more rigorous (and Can1-independent) assay for base editor specificity, we also investigated the worst-case scenario, in which all nucleotides within the BE3 activity window are Cs (i.e., a nonacytidine motif from −13 to −21). Analysis of editing products by deep sequencing revealed that base editors with 5-7 aa rigid linkers mainly edited at positions C.sub.−14 to C.sub.−16.

    [0175] These editors showed greatly improved site selectivity and a narrowed editing window, while retaining up to 90% of the editing efficiency of the original BE3.

    [0176] Importantly, when editing product distribution was analyzed, BE3-treated sequences mostly contained four simultaneously edited bases, whereas short rigid linker-containing base editors predominantly generate products with one to three edited bases, thus providing further evidence for short rigid linkers leading to more precise editing.

    [0177] Engineering of Improved CDA1-Based Editors.

    [0178] To test whether other base editors can also be improved by engineering the linker region connecting the nucleoside deaminase domain with the nCas9 domain, we next applied a similar strategy to CDA1, the AID homolog of sea lamprey (Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, 1248 (2016)) that has been reported to exhibit superior performance to APOBEC1 in certain sequence contexts (Komor, A. C. et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci. Adv. 3, eaao4774 (2017)).

    [0179] When fused to nCas9 with flexible linkers up to 100 aa long (Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, 1248 (2016)), CDA1 conducts C-to-T conversion in a window of approximately −16 to −19. To better understand what influences the width of the activity window, we generated four constructs for direct comparison of N- and C-terminal fusions of APOBEC1 and CDA1 to nCas9, initially using the XTEN linker (FIG. 2a). When the APOBEC1 domain was fused to the C-terminus of nCas9 (cBE3), the editing activity was very low (FIG. 2b, c), consistent with previous observations (Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420-424 (2016)). By contrast, when CDA1 was fused to either the N-terminus or the C-terminus of nCas9, both fusions exhibited high editing efficiency. However, there was a remarkable difference in the width of the editing window, in that the N-terminal CDA1 fusion (nCDA1-BE3) triggered editing in a much broader window when tested on either an oligo(C) substrate or target sites in the Can1 gene (FIG. 2b, c). The C-terminal fusion showed a more specific editing activity, peaking from C.sub.−16 to C.sub.−19, consistent with previous reports (Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, 1248 (2016)).

    [0180] Comparative assessment of the specificity of previously generated base editors and our base editors on several genomic target sequences showed that, in many cases, some level of discrimination between adjacent Cs is possible, but the achievable precision depends on the sequence context and on the base editor used. In general, the nCDA1-BE3 and cCDA1-BE3 editors display less dependence on the neighboring nucleotides and can edit target Cs efficiently even when located immediately after an A, a context that is only very inefficiently edited by APOBEC1-based editors. Moreover, CDA1-based editors enhance product purity, as reported previously (Komor, A. C. et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci. Adv. 3, eaao4774 (2017)).

    [0181] In an attempt to further narrow the activity window of CDA1 editors, we removed the linker between CDA1 and Cas9, generating versions nCDA1-NL-BE3 and cCDA1-NL-BE3. Surprisingly, both linkerless fusions showed an unaltered activity window with largely unchanged editing efficiency at each C within it. This result suggests that the termini of CDA1 are inherently flexible and may act as linker-like sequences. We, therefore, tested the impact of N- and C-terminal truncations (removing potential linker-like fragments) on base editing.

    [0182] A nuclear export signal (NES) was reported to reside in the C-terminus of the CDA1 homolog AID (Patenaude, A. M. et al. Active nuclear import and cytoplasmic retention of activation-induced deaminase. Nat. Struct. Mol. Biol. 16, 517-527 (2009)), and its location corresponds to residues 199 to 208 in CDA1 (FIG. 3a). Deletion of the NES from AID increased the deamination efficiency of the enzyme (Yang, L. et al. Engineering and optimising deaminase fusions for genome editing. Nat. Commun. 7, 1038 (2016); Ma, Y. et al. Targeted AID-mediated mutagenesis (TAM) enables efficient genomic diversification in mammalian cells. Nat. Methods 13, 1029-1035 (2016); Hess, G. T. et al. Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells. Nat. Methods 13, 1036-1042 (2016)). We generated a series of 22 base editors with C-terminally truncated CDA1 versions fused to nCas9 (FIG. 3b) and tested them on two oligo(C) motifs (FIG. 4). While removal of the NES had only small effects on editing efficiency and specificity (nCDA1Δ198-BE3), larger deletions made editing more precise and substantially narrowed the activity window of the base editors (FIG. 4). The enzyme tolerated truncations up to amino acid residue 158 without a significant loss in editing efficiency (FIG. 4). The major gain in site selectivity was seen with the removal of at least 13-14 amino acids from the C-terminus of CDA1 (nCDA1Δ195-BE3, nCDA1Δ194-BE3; FIG. 4). Larger deletions had similar beneficial effects on editing precision, although some of them displayed slightly reduced overall editing efficiency (FIG. 4). Unlike the full-length base editor, the best-performing truncated variants showed a clear preference for one or two Cs within the oligo(C) stretch (e.g., nCDA1Δ194-BE3 for C.sub.−18 and, to a lesser extent, C.sub.−17 within the (C).sub.9 motif: FIG. 4a; nCDA1Δ192-BE3 and nCDA1Δ190-BE3 for C.sub.−18 in the (C).sub.8 motif: FIG. 4b). By contrast, truncations at the N-terminus of CDA1 in cCDA1-BE3 had no significant effect on the width of the editing window.
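
    The relationship between a truncation name and the number of deleted residues can be checked with a short calculation. The following Python sketch is illustrative only. It assumes that CDA1 is 208 residues long (inferred from the NES spanning residues 199 to 208 and from Δ195 corresponding to a 13-residue deletion) and that the number after Δ denotes the last retained residue; the function names are hypothetical.

```python
# Assumption: CDA1 has 208 residues. This is inferred from the text (NES at
# residues 199-208; the Δ195 variant lacks 13 C-terminal residues).
CDA1_LENGTH = 208
NES_RESIDUES = range(199, 209)  # residues 199..208 inclusive


def removed_residues(last_kept: int, length: int = CDA1_LENGTH) -> int:
    """Number of C-terminal residues deleted in a Δ<last_kept> variant."""
    if not 1 <= last_kept <= length:
        raise ValueError("last_kept must lie within the protein")
    return length - last_kept


def nes_fully_removed(last_kept: int) -> bool:
    """True if every NES residue lies beyond the truncation point."""
    return all(residue > last_kept for residue in NES_RESIDUES)


for variant in (198, 195, 194, 192, 190):
    print(f"nCDA1Δ{variant}-BE3: {removed_residues(variant)} residues removed, "
          f"NES fully removed: {nes_fully_removed(variant)}")
```

    Under these assumptions, Δ198 removes exactly the ten NES residues, while Δ195 and Δ194 remove 13 and 14 residues, matching the deletion sizes given above.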

    [0183] Tests on oligo(C) motifs represent the most stringent assays for site selectivity of base editors. However, such long C stretches would only rarely be targets of genome editing with base editors in vivo. To assess whether base editors with C-terminally truncated CDA1 domains also show superior performance in more natural (heteropolymeric) genomic sequence contexts, we targeted four sites in the Can1 gene, each of which contains at least one additional C directly adjacent or close to position C.sub.−18. When the base editing outcome of nCDA1-BE3, cCDA1-BE3 and our base editors with truncated CDA1 domains were compared, our base editors displayed editing with much higher precision (FIG. 5). For all four tested sites, our base editors mainly edited position C.sub.−18, with a 2- to 20-fold higher efficiency than other adjacent Cs (FIG. 5a). Importantly, the base editors also produced predominantly single-C-modified products at position C.sub.−18 (accounting for 50-94% of all edited products), whereas nCDA1-BE3 and cCDA1-BE3 produced mainly double or triple modified products (FIG. 5b, c). We also investigated the indel frequency and base editing purity at these sites when treated by narrowed-window base editors. We found that the frequency of editing errors was very low, consistent with what has been reported for other base editors.

    [0184] Finally, we also determined the base editing outcome in individual colonies obtained by the canavanine selection method. While nCDA1-BE3 and cCDA1-BE3 yielded only 1 and 6 colonies (out of total 24 randomly picked colonies), respectively, that carried the specifically C.sub.−18 edited Can1 gene biallelically (i.e., in a homozygous fashion), the base editors with truncated CDA1 domains yielded 18-24 colonies that were homozygous for the allele only edited at position C.sub.−18. Importantly, two of the base editors produced 100% precisely edited homozygous clones (FIG. 6; Table 1).

    TABLE-US-00002 TABLE 1 Base editors with CDA1 truncations exhibit many more homozygous C.sub.−19T.sub.−18 colonies than nCDA1-BE3 and cCDA1-BE3*. For each base editor, 24 canavanine-resistant colonies were randomly picked from the selection plate followed by sequencing of the Can1 locus. The major types of edited products are listed in the first column of the table, and the colony numbers representing each product type are given. For nCDA1-BE3, the genotype of the remaining colony is C.sub.−19T.sub.−18/T.sub.−19C.sub.−18; for nCDA1Δ194-BE3, the remaining two colonies are C.sub.−19T.sub.−18/T.sub.−19C.sub.−18 and T.sub.−19T.sub.−18/T.sub.−19C.sub.−18, respectively.

    Product type                                        nCDA1-BE3      cCDA1-BE3      nCDA1Δ194-BE3  nCDA1Δ193-BE3  nCDA1Δ192-BE3  nCDA1Δ190-BE3  nCDA1Δ184-BE3  nCDA1Δ176-BE3
    C.sub.−19T.sub.−18 Homozygous                       1/24           6/24           18/24          21/24          22/24          24/24          24/24          20/24
    C.sub.−19T.sub.−18/T.sub.−19T.sub.−18 Heterozygous  0/24           11/24          2/24           2/24           1/24           0/24           0/24           2/24
    T.sub.−19C.sub.−18 Homozygous                       22/24          7/24           2/24           1/24           1/24           0/24           0/24           2/24
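
    The colony counts of Table 1 can be converted into fractions of precisely edited clones. The Python sketch below is illustrative; the counts are copied from Table 1, whereas the variable and function names are hypothetical.

```python
# Colony counts copied from Table 1 (24 colonies sequenced per base editor).
# Tuple order follows the table rows:
#   (C-19 T-18 homozygous, C-19 T-18 / T-19 T-18 heterozygous, T-19 C-18 homozygous)
TABLE_1 = {
    "nCDA1-BE3":     (1, 0, 22),
    "cCDA1-BE3":     (6, 11, 7),
    "nCDA1Δ194-BE3": (18, 2, 2),
    "nCDA1Δ193-BE3": (21, 2, 1),
    "nCDA1Δ192-BE3": (22, 1, 1),
    "nCDA1Δ190-BE3": (24, 0, 0),
    "nCDA1Δ184-BE3": (24, 0, 0),
    "nCDA1Δ176-BE3": (20, 2, 2),
}
COLONIES_PER_EDITOR = 24


def precise_fraction(editor: str) -> float:
    """Fraction of colonies homozygous for the allele edited only at C-18."""
    return TABLE_1[editor][0] / COLONIES_PER_EDITOR


def unlisted_colonies(editor: str) -> int:
    """Colonies whose genotype is not one of the three listed product types."""
    return COLONIES_PER_EDITOR - sum(TABLE_1[editor])


for name in TABLE_1:
    print(f"{name}: {precise_fraction(name):.0%} precise, "
          f"{unlisted_colonies(name)} colonies with other genotypes")
```

    The colonies not covered by the three rows (one for nCDA1-BE3, two for nCDA1Δ194-BE3) correspond to the additional genotypes named in the table legend.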

    [0185] Expanding Precision Base Editing to Non-NGG PAM Sequences

    [0186] Recently, several Cas9 variants have been described that recognize non-NGG PAM sequences (Nishimasu, H. et al. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361, 1259-1262 (2018); Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57-63 (2018); Kleinstiver, B. P. et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481-485 (2015)). To test whether Cas9 variants with expanded PAM compatibility can be used in our high-precision BEs to extend their DNA targeting scope, we replaced the nCas9 sequence with that of four different nCas9 variants recognizing four different non-NGG PAMs (FIG. 7a, b). Of particular interest is the minimal PAM sequence NG (as recognized by variant SpCas9-NG; FIG. 7b), which occurs much more frequently in DNA sequences than the wild-type PAM sequence NGG. As the deaminase domain, we tested the full-length CDA1 and a series of truncated CDA1 versions that lack 13 to 20 C-terminal amino acids. When fused to nCas9, this range of C-terminal deletions was shown previously to provide the maximum increase in editing precision while retaining high editing activity (21). In this way, 32 new BEs were constructed: the full-length CDA1 (as N-terminal or C-terminal fusion) and 6 CDA1 deletions combined with the VQR-Cas9 variant (nCDA1Δ195-VQRBE3; nCDA1Δ194-VQRBE3; nCDA1Δ193-VQRBE3; nCDA1Δ192-VQRBE3; nCDA1Δ190-VQRBE3; nCDA1Δ188-VQRBE3; FIG. 7a, c) that recognizes the PAM sequence NGA (FIG. 7b), the full-length CDA1 (as N-terminal or C-terminal fusion) and 6 CDA1 deletions combined with the VRER-Cas9 variant (nCDA1Δ195-VRERBE3; nCDA1Δ194-VRERBE3; nCDA1Δ193-VRERBE3; nCDA1Δ192-VRERBE3; nCDA1Δ190-VRERBE3; nCDA1Δ188-VRERBE3; FIG. 7d) that recognizes the PAM sequence NGCG (FIG. 7b), the full-length CDA1 (as N-terminal or C-terminal fusion) and 6 CDA1 deletions combined with the xCas9 variant (nCDA1Δ195-xBE3; nCDA1Δ194-xBE3; nCDA1Δ193-xBE3; nCDA1Δ192-xBE3; nCDA1Δ190-xBE3; nCDA1Δ188-xBE3; FIG. 7e) that recognizes the PAM sequences NG, GAA and GAT (FIG. 7b), and the full-length CDA1 (as N-terminal or C-terminal fusion) and 6 CDA1 deletions combined with the SpCas9-NG variant (nCDA1Δ195-NGBE3; nCDA1Δ194-NGBE3; nCDA1Δ193-NGBE3; nCDA1Δ192-NGBE3; nCDA1Δ190-NGBE3; nCDA1Δ188-NGBE3; FIG. 7f,g) that recognizes the PAM sequence NG (FIG. 7b).
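
    The count of 32 constructs follows from combining four Cas9 variants with eight deaminase versions each (two full-length fusions and six truncations). A minimal Python sketch of this enumeration is given below; the construct names follow the naming pattern used in this paragraph, and the variable names are hypothetical.

```python
from itertools import product

# Name suffix used for each Cas9 variant in the construct names above.
CAS9_SUFFIXES = {
    "VQR-Cas9": "VQRBE3",
    "VRER-Cas9": "VRERBE3",
    "xCas9": "xBE3",
    "SpCas9-NG": "NGBE3",
}
# C-terminal CDA1 truncations (13 to 20 residues deleted).
TRUNCATIONS = (195, 194, 193, 192, 190, 188)

# Two full-length fusions (N- and C-terminal) plus six truncated versions.
DEAMINASE_VERSIONS = ["nCDA1", "cCDA1"] + [f"nCDA1Δ{t}" for t in TRUNCATIONS]

constructs = [
    f"{deaminase}-{suffix}"
    for suffix, deaminase in product(CAS9_SUFFIXES.values(), DEAMINASE_VERSIONS)
]

print(len(constructs))  # 4 variants x 8 deaminase versions
for name in constructs[:8]:
    print(name)
```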

    [0187] For each set of BEs, we tested target sites that contain a stretch of consecutive cytidines within the activity window upstream of the PAM. PolyC motifs were used to provide the most rigorous test for editing precision, in that specific editing of a single C would require maximum discriminatory power. Editing efficiency and precision were first assessed by dideoxy chain termination sequencing of amplified PCR products, and the two best-performing BEs were then further characterized by high-throughput next-generation sequencing (FIG. 7; see Methods; Tan, J., et al. Engineering of high-precision base editors for site-specific single nucleotide replacement. Nat. Commun. 10, 439 (2019)).

    [0188] The VQR-Cas9 variant recognizes the PAM sequence NGA (FIG. 7b). The activity window ranged from C.sub.−14 to C.sub.−19 in target sequence PolyC-1-NGA and from C.sub.−14 to C.sub.−20 in target sequence PolyC-2-NGA. By contrast, VQR-Cas9 BEs harboring CDA1 truncations had a much narrower activity window and predominantly edited positions C.sub.−17 and C.sub.−18 in target sequence PolyC-1-NGA and C.sub.−17 and C.sub.−18 in sequence PolyC-2-NGA (FIG. 7c). Interestingly, the largest truncation, nCDA1Δ188-VQRBE3, even discriminated to some extent between the two positions in that C.sub.−18 was edited nearly twice as efficiently as C.sub.−17 in sequence PolyC-1-NGA (FIG. 7c).

    [0189] The VRER-Cas9 variant recognizes the PAM sequence NGCG (FIG. 7b). The truncated variants efficiently edited both target sequences and displayed greatly superior editing precision on sequence PolyC-4-NGCG (FIG. 7d).

    [0190] Recently, two Cas9 variants, designated xCas9 and SpCas9-NG, were developed that show greatly relaxed PAM recognition specificity and, instead of NGG, recognize the minimal PAM sequence NG (Nishimasu, H. et al. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361, 1259-1262 (2018); Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57-63 (2018)). When tested on three non-NGG target sites (PolyC-1-NGA, PolyC-5-NGC and PolyC-6-NGT), xCas9-derived BEs displayed detectable activity only on one of the three sites (PolyC-5-NGC; FIG. 7e). A particularly well-performing truncated variant, nCDA1Δ194-xBE3, edited position C.sub.−18 with high selectivity and strongly enhanced efficiency (of more than 35%; FIG. 7e).

    [0191] BEs constructed with SpCas9-NG edited all three non-NGG target sites (FIG. 7f,g). Compared to the full-length BE (nCDA1-NGBE3), the truncated versions again exhibited superior editing preference. The truncated versions predominantly edited one or two nucleotides (FIG. 7f,g). Typically, position C.sub.−18 was most efficiently recognized, but dependent on the target site, some BEs also edited C.sub.−17 (e.g., nCDA1Δ194-NGBE3 in PolyC-1-NGA) or C.sub.−19 (e.g., nCDA1Δ194-NGBE3 in PolyC-6-NGT; FIG. 7g) at high efficiency. For comparison, we also tested the reciprocal fusions harboring the SpCas9 variants at the N-terminus (cCDA1-VQRBE3, cCDA1-VRERBE3, cCDA1-xBE3 and cCDA1-NGBE3). These fusions showed a narrower activity window than the C-terminal fusions, but did not reach the specificity of the best-performing fusions with truncated CDA1 versions. When target sites upstream of the wild-type PAM of Cas9, NGG, were tested, the SpCas9-NG-derived BEs displayed reduced editing activity compared to wild-type Cas9-derived BEs. This finding is consistent with recent studies that reported lower genome editing activity of SpCas9-NG on canonical NGG PAMs (Nishimasu, H. et al. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361, 1259-1262 (2018); Zhong, Z. et al. Improving plant genome editing with high-fidelity xCas9 and non-canonical PAM-targeting Cas9-NG. Mol. Plant 12, 1027-1036 (2019)).

    [0192] Taken together, our findings indicate that BEs with truncated CDA1 sequences tolerate replacement of Cas9 with variants that recognize alternative PAMs, including PAMs with greatly relaxed specificity such as NG. The high efficiency and accuracy of these new editors greatly expand the editing scope of high-precision BEs.
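
    The PAM specificities listed above can be expressed as simple sequence patterns. The following Python sketch is illustrative only: it checks which Cas9 variants have a PAM matching the nucleotides immediately 3′ of a protospacer and estimates how often a PAM is expected in random sequence, assuming equal frequencies of the four bases (an assumption not made in the experiments). The function names are hypothetical, and a matching PAM does not guarantee editing activity, as seen for the xCas9-derived BEs.

```python
import re

# PAM sequences as stated in the text (N = any nucleotide).
PAMS = {
    "SpCas9 (wild type)": ["NGG"],
    "VQR-Cas9": ["NGA"],
    "VRER-Cas9": ["NGCG"],
    "xCas9": ["NG", "GAA", "GAT"],
    "SpCas9-NG": ["NG"],
}


def pam_regex(pam: str) -> "re.Pattern[str]":
    """Compile a PAM with N wildcards into a regular expression."""
    return re.compile("".join("[ACGT]" if base == "N" else base for base in pam))


def compatible_variants(downstream: str) -> list:
    """Variants whose PAM matches the start of the sequence 3' of the protospacer."""
    seq = downstream.upper()
    return [
        name
        for name, pams in PAMS.items()
        if any(pam_regex(pam).match(seq) for pam in pams)
    ]


def expected_frequency(pam: str) -> float:
    """Expected PAM frequency per position, assuming uniform base composition."""
    freq = 1.0
    for base in pam:
        if base != "N":
            freq *= 0.25
    return freq


print(compatible_variants("TGAC"))
print(expected_frequency("NG") / expected_frequency("NGG"))
```

    Under the uniform-composition assumption, NG is expected four times as often as NGG, which illustrates why the relaxed PAM expands the targeting scope.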

    [0193] Engineering of A3A-Based Precision BEs

    [0194] In an attempt to develop additional high-precision BEs that selectively edit nucleotide positions other than C.sub.−18, we generated fusions of several deaminases to nCas9 by omitting a linker sequence between the two proteins. This approach was taken to investigate the possibility that these deaminases inherently harbor a linker-like fragment at their C-terminus.

    [0195] Six different deaminases were tested by fusing nCas9 directly to their C-terminus. The fusion proteins were then assayed for their base editing efficiency on two polyC-containing target sites. The BE based on the human cytidine deaminase APOBEC3A (A3A; Gehrke, J. M. et al. An APOBEC3A-Cas9 base editor with minimized bystander and off-target activities. Nat. Biotechnol. 36, 977-982 (2018)), referred to as hA3A-NL-BE3, displayed the best performance in that it conferred the highest editing efficiency on both target sequences. We, therefore, chose A3A for further optimization.

    [0196] For comparison, we also generated an A3A-BE3 editor with the standard XTEN linker (Rees, H. A. et al. Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery. Nat. Commun. 8, 15790 (2017)). Surprisingly, we observed that hA3A-NL-BE3 (for brevity subsequently referred to as A3A-NL-BE3) showed a slightly broader editing window than A3A-BE3 and also caused a shift in the most strongly edited (central) positions, despite the shorter connection between the cytidine deaminase domain (A3A) and the nCas9 domain of the fusion protein. This may be attributable to linker removal slightly altering the spatial structure of the fusion protein (and, in this way, affecting positioning of the deaminase domain on the target sequence), and would be consistent with the variable effects of linker engineering seen in previous studies (Kim, Y. B., et al. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotechnol. 35, 371-376 (2017); Tan, J., et al. Engineering of high-precision base editors for site-specific single nucleotide replacement. Nat. Commun. 10, 439 (2019)). The editing efficiency of both BEs was similar at both tested sites (Supplementary FIG. 10), possibly suggesting that the C-terminus of A3A is extraordinarily flexible.

    [0197] A3A-based BEs were reported to exhibit a lower dependence on the sequence context, reduced sensitivity to DNA methylation and a wider editing window (Zong, Y. et al. Efficient C-to-T base editing in plants using a fusion of nCas9 and human APOBEC3A. Nat. Biotechnol. 36, 950-953 (2018); Wang, X., et al. Efficient base editing in methylated regions with a human APOBEC3A-Cas9 fusion. Nat. Biotechnol. 36, 946-949 (2018); Gehrke, J. M. et al. An APOBEC3A-Cas9 base editor with minimized bystander and off-target activities. Nat. Biotechnol. 36, 977-982 (2018)). To test if the precision of these BEs can be improved by narrowing the activity window, we constructed a series of truncations at the C-terminus of A3A and determined their impact on base editing (FIG. 8a). Previously, we showed that the major gain in site selectivity for CDA1-based BEs was seen with the removal of at least 13 amino acids from the C-terminus (nCDA1Δ195-BE3; Tan, J., et al. Engineering of high-precision base editors for site-specific single nucleotide replacement. Nat. Commun. 10, 439 (2019)). Alignment of A3A with CDA1 revealed that the 13 amino acid CDA1 truncation corresponds to residue 194 of A3A. We generated six BEs with C-terminally truncated A3A versions fused to nCas9 and tested them on two polycytidine motifs (FIG. 8b). Deletion of 17 amino acids (A3AΔ182-BE3) made the editing significantly more specific in that A3AΔ182-BE3 preferentially edits position C.sub.−15 or C.sub.−16 (FIG. 8b). When tested on target sequence polyC-8, the truncated editors A3AΔ190-BE3, A3AΔ186-BE3 and A3AΔ182-BE3 displayed improved specificity. For example, A3AΔ182-BE3 exhibits a strong preference for positions C.sub.−15 and C.sub.−16, while showing greatly reduced editing activity at the neighboring positions C.sub.−17 and C.sub.−14 (FIG. 8c).

    [0198] To confirm the superior precision of the truncated editors A3AΔ190-BE3, A3AΔ186-BE3 and A3AΔ182-BE3, we compared the base editing outcomes when targeting different cytidines within the yeast Can1 gene (Tan, J., et al. Engineering of high-precision base editors for site-specific single nucleotide replacement. Nat. Commun. 10, 439 (2019)). Each of the five tested sites contains one or two target Cs at different distances from the PAM, ranging from position C.sub.−19 to position C.sub.−11 (FIG. 9a). Canavanine-resistant colonies can arise only when C-to-T base editing occurs and results in synthesis of an inactive gene product (Tan, J., et al. Engineering of high-precision base editors for site-specific single nucleotide replacement. Nat. Commun. 10, 439 (2019)). While the BE with the full-length A3A (A3A-BE3) non-selectively edited all Cs within a window of nine nucleotides (FIG. 9b), the BEs containing truncated A3A versions mainly edited positions C.sub.−15 or C.sub.−16, confirming the results obtained with polycytidine target sequences (FIG. 8b).

    [0199] It was recently reported that mutations in A3A (N57G mutation in an A3A variant dubbed eA3A) can reduce bystander editing frequency by enhancing the preference of the editor for TCR motifs (with R being A or G; Gehrke, J. M. et al. An APOBEC3A-Cas9 base editor with minimized bystander and off-target activities. Nat. Biotechnol. 36, 977-982 (2018)). We, therefore, generated an eA3A-BE3 editor and compared it with our best-performing truncated A3A BEs. We found that eA3A, although mainly editing C.sub.−15 or C.sub.−16, suffered from reduced editing activity (FIG. 9b), suggesting relatively poor editing at non-TCR sites.

    [0200] It has been reported that A3A-derived BEs can induce significant transcriptome-wide off-target editing at the RNA level. Specific amino acid substitutions (R128A or Y130F) in A3A largely eliminate these off-target activities (Zhou, C., et al. Off-target RNA mutation induced by DNA base editing and its elimination by mutagenesis. Nature 571, 275-278 (2019); Grünewald, J., et al. Transcriptome-wide off-target RNA editing induced by CRISPR-guided DNA base editors. Nature 569, 433-437 (2019)). We therefore investigated the effect of each of these two mutations on the width of the base editing window and the BE activity when combined with proper A3A truncations. Introduction of either of the two mutations into A3A-BE3 neither reduced the base editing efficiency, consistent with previous findings (Zhou, C., et al. Off-target RNA mutation induced by DNA base editing and its elimination by mutagenesis. Nature 571, 275-278 (2019)), nor did it affect the base editing window. When we combined these mutations with the two optimal A3A truncations (A3AΔ186 and A3AΔ182), we found that Y130F, but not R128A, in combination with the A3A version truncated at residue 186 (i.e., BE variant A3A(Y130F)Δ186-BE3) displays a base editing window and an editing efficiency similar to A3AΔ186-BE3, and thus should be used to suppress off-target RNA editing.

    [0201] Together, these data demonstrate that the A3A deaminase can be engineered to obtain high-precision base editors that predominantly edit position C.sub.−15 or C.sub.−16, while retaining high editing efficiency.

    [0202] Analysis of Genome-Wide Off-Target Editing by Whole Genome Sequencing

    [0203] Recently, cytosine base editors were reported to produce substantial genome-wide off-target effects that are largely independent of the sgRNA (Jin, S., et al. Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. Science 364, 292-295 (2019); Zuo, E., et al. Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos. Science 364, 289-292 (2019)). Since a narrower editing window means fewer target nucleotides, we envisioned that our narrow-window base editors could also reduce the off-target DNA editing. We, therefore, investigated off-target editing in yeast cells treated with nCDA1-BE3, cCDA1-BE3, nCDA1Δ190-BE3 and a no-BE control, in combination with an sgRNA targeting a Can1 site. Canavanine selection was used to isolate colonies harboring on-target editing events. The truncated CDA1 version Δ190 was chosen for this experiment, because we had previously shown that this version displays high editing precision as well as high editing efficiency for most tested sites (Tan, J., et al. Engineering of high-precision base editors for site-specific single nucleotide replacement. Nat. Commun. 10, 439 (2019)). For all constructs, cultures grown from three different transformed colonies were mixed, followed by genomic DNA isolation and whole-genome sequencing. The three BE variants showed numbers of indels comparable to the no-BE control (FIG. 10a). When the total number of SNVs (single nucleotide variants) was analyzed, the full-length fusions were found to display many more SNVs than the control, in agreement with the previous reports on off-target effects of cytosine BEs (Jin, S., et al. Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. Science 364, 292-295 (2019); Zuo, E., et al. Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos. Science 364, 289-292 (2019)). However, the truncated version exhibited a substantially reduced number of SNVs that was only slightly higher than that of the negative control (FIG. 10b). We also analyzed the mutation types and found that, in nCDA1-BE3 and cCDA1-BE3, the frequency of C-to-T (G-to-A) transitions was significantly higher than in the control and the truncated base editor nCDA1Δ190-BE3 (FIG. 10c). These findings indicate that high editing precision of BEs can contribute to reduced non-specific editing at off-target sites.
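
    The mutation-type analysis can be illustrated with a short Python sketch that computes the share of C-to-T (G-to-A) transitions among a list of SNVs. This is not the pipeline used for FIG. 10c; the function name is hypothetical, and the example calls are invented solely to show the calculation.

```python
# A C-to-T change on one strand is read as G-to-A on the opposite strand,
# so both are counted as the deaminase-type transition.
CT_TYPE = {("C", "T"), ("G", "A")}


def ct_transition_fraction(snvs) -> float:
    """Fraction of (reference, alternate) SNVs that are C-to-T or G-to-A."""
    snvs = list(snvs)
    if not snvs:
        return 0.0
    hits = sum(1 for ref, alt in snvs if (ref.upper(), alt.upper()) in CT_TYPE)
    return hits / len(snvs)


# Hypothetical SNV calls for illustration only; not data from the experiment.
example_calls = [("C", "T"), ("G", "A"), ("A", "G"), ("C", "T")]
print(ct_transition_fraction(example_calls))  # 0.75
```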

    [0204] Guidelines for the Choice of the Optimal Cytidine BE

    [0205] Three different cytidine deaminases (APOBEC1, CDA1 and APOBEC3A) have been engineered to produce efficient cytosine BEs, modify PAM specificities, and alter position and width of the editing window. BE variants with different properties have been obtained that differ in their suitability for (i) different target sequences and (ii) different positions of the C to be edited within the protospacer.

    [0206] There is now sufficient information available to define some guidelines for the choice of the best BE depending on the position of the C, the sequence context and the presence or absence of bystander Cs (see Table 1 which is presented further above).

    [0207] If the target C is located at position C.sub.−19 relative to the PAM and no bystander C is present, three BEs can be recommended: nCDA1-BE3, nCDA1Δ198-BE3 and A3A-NL-BE3. If the target C is in the same position (C.sub.−19), but has a bystander C directly upstream (CCDDD motif, with D being any nucleotide but C), cCDA1-BE3 would be the best choice (Tan, J., et al. Engineering of high-precision base editors for site-specific single nucleotide replacement. Nat. Commun. 10, 439 (2019)).

    [0208] If the target C is located at C.sub.−18 and has a bystander C in its vicinity (NCN motif, with N being any nucleotide, including a possible bystander C), BEs with C-terminal truncations of CDA1 (Δ194 to Δ188) are recommended (FIG. 7; Tan, J., et al. Engineering of high-precision base editors for site-specific single nucleotide replacement. Nat. Commun. 10, 439 (2019)), and it may be advisable to test two or three different truncations.

    [0209] For editing at C.sub.−16 with a 5′ bystander C (NCD context) or editing at C.sub.−15 with a 3′ bystander C (DCN), A3AΔ182-BE3 and A3A(Y130F)Δ186-BE3 are the editors of choice (FIGS. 2 and 4; Table 1).
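
    The guidelines of the preceding paragraphs can be summarized as a lookup, sketched below in Python. The sketch encodes only the cases stated above and returns an empty list for any other combination; the function name and the bystander labels ("none", "5prime", "3prime") are hypothetical conventions introduced for illustration.

```python
def recommend_editors(position: int, bystander: str = "none") -> list:
    """Suggest BEs for a target C at `position` relative to the PAM.

    bystander: "none", "5prime" (bystander C directly upstream of the target)
    or "3prime" (bystander C directly downstream). Combinations not covered
    by the guidelines return an empty list.
    """
    if bystander not in ("none", "5prime", "3prime"):
        raise ValueError("bystander must be 'none', '5prime' or '3prime'")

    if position == -19 and bystander == "none":
        return ["nCDA1-BE3", "nCDA1Δ198-BE3", "A3A-NL-BE3"]
    if position == -19 and bystander == "5prime":
        return ["cCDA1-BE3"]
    if position == -18 and bystander != "none":
        # The text advises testing two or three truncations in this range.
        return ["CDA1 C-terminal truncations Δ194 to Δ188"]
    if (position == -16 and bystander == "5prime") or (
        position == -15 and bystander == "3prime"
    ):
        return ["A3AΔ182-BE3", "A3A(Y130F)Δ186-BE3"]
    return []


print(recommend_editors(-19))
print(recommend_editors(-15, "3prime"))
```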

    [0210] With our set of narrow-window BEs, many disease-causing T-to-C and A-to-G mutations can now potentially be corrected in a precise manner. For example, a T-to-C mutation at position 497 of the coding region of the human gene encoding presenilin-1 (PSEN1-L166P mutation) is associated with early-onset Alzheimer's disease (Moehlmann, T., et al. Presenilin-1 mutations of leucine 166 equally affect the generation of the Notch and APP intracellular domains independent of their effect on Abeta 42 production. Proc. Natl. Acad. Sci. USA 99, 8025-8030 (2002)). This mutation can be corrected by a BE that has this C within its predicted editing window at position −18 relative to the PAM sequence NG. Precision is important here, because an additional C is present immediately adjacent to the target C (at position 496), which also lies within the editing window (−19 relative to the PAM). Using precision BEs with CDA1 truncations, this C now can be targeted much more accurately (Table 1). Similarly, an A-to-G mutation at position 980 of the coding region of the tyrosinase-encoding gene (representing a T-to-C mutation in the complementary strand) causes oculocutaneous albinism (TYR-Y327C mutation; 8). The target C is in a TCAC motif and located in position −15 of the PAM sequence AGG. Therefore, this mutation can be precisely corrected with the BEs A3AΔ182-BE3 or A3A(Y130F)Δ186-BE3 (Table 1).
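
    The coding positions and amino acid changes named in these two examples can be cross-checked with simple codon arithmetic. The Python sketch below assumes the standard genetic code and 1-based numbering of the coding region; it does not use the actual PSEN1 or TYR sequences, and the function names are hypothetical.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def locate(cds_position: int) -> tuple:
    """Map a 1-based coding position to (codon number, 1-based codon position)."""
    index = cds_position - 1
    return index // 3 + 1, index % 3 + 1


def substitutions(wild_aa: str, mutant_aa: str, codon_pos: int, ref: str, alt: str) -> list:
    """Codon pairs in which ref->alt at codon_pos changes wild_aa into mutant_aa."""
    pairs = []
    for codon, aa in CODON_TABLE.items():
        if aa == wild_aa and codon[codon_pos - 1] == ref:
            mutant = codon[: codon_pos - 1] + alt + codon[codon_pos:]
            if CODON_TABLE[mutant] == mutant_aa:
                pairs.append((codon, mutant))
    return pairs


print(locate(497))   # PSEN1: codon 166, second codon position
print(locate(980))   # TYR: codon 327, second codon position
print(substitutions("L", "P", 2, "T", "C"))
print(substitutions("Y", "C", 2, "A", "G"))
```

    Under the standard code, every Leu-to-Pro change caused by T-to-C at the second codon position yields a CCN codon, so the first codon position (coding position 496) is necessarily a C, in agreement with the adjacent bystander C described above.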