Fusion proteins for base editing

Abstract

Provided are fusion proteins that include an apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A) and a clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) protein, and optionally a uracil glycosylase inhibitor (UGI). Such a fusion protein can conduct base editing in DNA by deaminating cytosine to uracil, even when the cytosine is in a GpC context or is methylated.

Claims

1. A fusion protein comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A) and a second fragment comprising a clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) protein, wherein the APOBEC3A is a mutant of human APOBEC3A having a mutation selected from the group consisting of D131Y, Y132D, W104A, P134Y and combinations thereof, according to residue numbering in SEQ ID NO:1, wherein the amino acid sequence of the fusion protein has at least 85% sequence identity to SEQ ID NO:1, and wherein the mutant retains cytidine deaminase activity.

2. The fusion protein of claim 1, further comprising a uracil glycosylase inhibitor (UGI).

3. The fusion protein of claim 1, wherein the mutant human APOBEC3A has mutations selected from the group consisting of Y130F+D131E+Y132D, Y130F+D131Y+Y132D, W98Y+W104A, W98Y+P134Y, W104A+P134Y, W104A+Y130F, W104A+Y132D, W98Y+W104A+Y130F, W98Y+W104A+Y132D, W104A+Y130F+P134Y, and W104A+Y132D+P134Y, according to residue numbering in SEQ ID NO:1.

4. The fusion protein of claim 1, wherein the human APOBEC3A is human APOBEC3A isoform a or isoform b.

5. The fusion protein of claim 1, wherein the APOBEC3A comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 3-5, 22-23, 25-34.

6. The fusion protein of claim 1, wherein the Cas protein is selected from the group consisting of Streptococcus pyogenes Cas9 (SpCas9), Francisella novicida Cas9 (FnCas9), Streptococcus thermophilus CRISPR-1 Cas9 (St1Cas9), Streptococcus thermophilus CRISPR-3 Cas9 (St3Cas9), NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, D1135V/R1335Q/T1337R (VQR) SpCas9, D1135E/R1335Q/T1337R (EQR) SpCas9, D1135V/G1218R/R1335E/T1337R (VRER) SpCas9, E1369R/E1449H/R1556A (RHA) FnCas9, E782K/N968K/R1015H (KKH) Staphylococcus aureus Cas9 (SaCas9), Neisseria meningitidis Cas9 (NmeCas9), Streptococcus thermophilus Cas9 (StCas9), Campylobacter jejuni Cas9 (CjCas9), Acidaminococcus sp. Cpf1 (AsCpf1), Francisella novicida Cpf1 (FnCpf1), Smithella sp. Cpf1 (SsCpf1), Porphyromonas crevioricanis Cpf1 (PcCpf1), Butyrivibrio proteoclasticus Cpf1 (BpCpf1), Candidatus Methanoplasma termitum Cpf1 (CmtCpf1), Leptospira inadai Cpf1 (LiCpf1), Porphyromonas macacae Cpf1 (PmCpf1), Parcubacteria bacterium 3310 Cpf1 (Pb3310Cpf1), Parcubacteria bacterium 4417 Cpf1 (Pb4417Cpf1), Butyrivibrio sp. NC3005 Cpf1 (BsCpf1), Eubacterium eligens Cpf1 (EeCpf1), Bacillus hisashii Cas12b (BhCas12b), Alicyclobacillus kakegawensis Cas12b (AkCas12b), Elusimicrobia bacterium Cas12b (EbCas12b), Laceyella sediminis Cas12b (LsCas12b), Ruminococcus flavefaciens Cas13d (RfCas13d), Leptotrichia wadei Cas13a (LwaCas13a), Prevotella sp. Cas13b (PspCas13b), Porphyromonas gulae Cas13b (PguCas13b), Riemerella anatipestifer Cas13b (RanCas13b), CasX, and CasY.

7. The fusion protein of claim 1, wherein the Cas protein is a mutant of a protein selected from the group consisting of SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR SpCas9, EQR SpCas9, VRER SpCas9, RHA FnCas9, KKH SaCas9, NmeCas9, StCas9, CjCas9, AsCpf1, FnCpf1, SsCpf1, PcCpf1, BpCpf1, CmtCpf1, LiCpf1, PmCpf1, Pb3310Cpf1, Pb4417Cpf1, BsCpf1, EeCpf1, BhCas12b, AkCas12b, EbCas12b, LsCas12b, RfCas13d, LwaCas13a, PspCas13b, PguCas13b, RanCas13b, CasX, and CasY, wherein the mutant retains DNA-binding capability but does not introduce double-stranded DNA breaks.

8. The fusion protein of claim 7, wherein the mutant Cas protein is capable of introducing a nick to one of the strands of a double stranded DNA bound by the mutant.

9. The fusion protein of claim 7, wherein the mutant Cas protein comprises an amino acid sequence selected from the group consisting of SEQ ID NO:11 and 37-39.

10. The fusion protein of claim 1, wherein the first fragment is at the N-terminal side of the second fragment.

11. The fusion protein of claim 2, wherein the UGI comprises the amino acid sequence of SEQ ID NO:12 or has at least 90% sequence identity to SEQ ID NO:12 and retains uracil glycosylase inhibition activity.

12. The fusion protein of claim 11, wherein the first fragment is at the N-terminal side of the second fragment which is at the N-terminal side of the UGI.

13. A method of editing a target polynucleotide, comprising contacting the target polynucleotide with a fusion protein of claim 1 and a guide RNA having at least partial sequence complementarity to the target polynucleotide, wherein the editing comprises deamination of a cytosine (C) in the target polynucleotide.

14. The method of claim 13, wherein the C is methylated.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1A-B. Construction and performance of hA3A-BE. Panel A: Schematic diagram illustrating the co-expression of BE3/sgRNA or hA3A-BE/sgRNA. Panel B: Compared with co-expression of BE3/sgRNA, co-expression of hA3A-BE/sgRNA achieved more efficient base editing of the C in GpC contexts within the sgRNA-targeted genomic regions (sgFANCF-M-L6 and sgSITE4). Dashed boxes indicate cytosines located in a GpC context. Sequences shown in panel B, from left column to right column and from top to bottom, are SEQ ID NO:51-56.

(2) FIG. 2A-B. Construction and performance of hA3A-BE-Y130F and hA3A-BE-Y132D. Panel A: Schematic diagram illustrating the co-expression of hA3A-BE/sgRNA, hA3A-BE-Y130F/sgRNA or hA3A-BE-Y132D/sgRNA. Panel B: Compared with co-expression of hA3A-BE/sgRNA, co-expression of hA3A-BE-Y130F/sgRNA or hA3A-BE-Y132D/sgRNA induced base editing within narrower windows in the sgRNA-targeted genomic regions (sgSITE3 and sgEMX1). Dashed boxes indicate the base editing windows. Sequences shown in panel B, from left column to right column and from top to bottom, are SEQ ID NO:57-64.

(3) FIG. 3A-B. Construction and performance of hA3A-BE-W104A and hA3A-BE-D131Y. Panel A: Schematic diagram illustrating the co-expression of hA3A-BE/sgRNA, hA3A-BE-W104A/sgRNA or hA3A-BE-D131Y/sgRNA. Panel B: Compared with co-expression of hA3A-BE/sgRNA, co-expression of hA3A-BE-W104A/sgRNA or hA3A-BE-D131Y/sgRNA induced more efficient base editing in the sgRNA-targeted genomic regions (sgFANCF and sgSITE2). Dashed boxes indicate the edited cytosines. Sequences shown in panel B, from left column to right column and from top to bottom, are SEQ ID NO:65-72.

(4) FIG. 4A-B. Construction and performance of hA3A-BE-Y130F-D131E-Y132D and hA3A-BE-Y130F-D131Y-Y132D. Panel A: Schematic diagram illustrating the co-expression of hA3A-BE/sgRNA, hA3A-BE-Y130F-D131E-Y132D/sgRNA or hA3A-BE-Y130F-D131Y-Y132D/sgRNA. Panel B: Compared with co-expression of hA3A-BE/sgRNA, co-expression of hA3A-BE-Y130F-D131E-Y132D/sgRNA or hA3A-BE-Y130F-D131Y-Y132D/sgRNA induced base editing within narrower windows in the sgRNA-targeted genomic regions (sgFANCF and sgSITE3). Dashed boxes indicate the edited cytosines. Sequences shown in panel B, from left column to right column and from top to bottom, are SEQ ID NO:73-80.

(5) FIG. 5a-h. hA3A-BE3 induces efficient base editing in methylated regions and in GpC contexts. (a) Distribution of BE-editable T-to-C (or A-to-G) variants. Potentially editable cytosines (underlined) are sub-classified according to their 3 adjacent bases. (b) Screening of BEs for efficient base editing in a high-methylation background. A series of new BEs was constructed by fusing different APOBEC/AID deaminases with Cas9 nickase (nCas9) and uracil DNA glycosylase inhibitor (UGI). (c) Cumulative base editing frequencies induced by different BEs in unmethylated and methylated vectors. The commonly used rA1-based BE3 was chosen for comparison. Means ± s.d. are from three (six for hA3A-BE3) independent experiments. (d) Immunoblots of BE3 and hA3A-BE3 co-transfected with unmethylated or methylated vectors. Tubulin was used as a loading control; immunoblot images are representative of three independent experiments. (e) Comparison of base editing efficiencies induced by BE3 and hA3A-BE3 in genomic regions with natively high levels of DNA methylation. C-to-T editing frequencies of the indicated cytosines were determined individually. Target site sequences are shown with the BE3 editing window (positions 4-8, counting the PAM-distal base as position 1) in pink, the PAM in cyan, and CpG sites in capitals. Shaded gray: guanines on the 5′ side of editable cytosines. NT, native HEK293T cells with no treatment. (f) Statistical analysis of normalized C-to-T editing frequencies in regions with natively high levels of DNA methylation shown in (e), with frequencies induced by BE3 set to 100%. n=48 samples from three independent experiments. (g) Comparison of base editing efficiencies induced by BE3 and hA3A-BE3 at the C of GpC in genomic regions with natively low levels of DNA methylation. (h) Statistical analysis of normalized C-to-T editing frequencies at GpC sites in regions with natively low levels of DNA methylation shown in (g), with frequencies induced by BE3 set to 100%. n=24 samples from three independent experiments. (e,g) Means ± s.d. are from three independent experiments. (f,h) P value, one-tailed Student's t test. The median and interquartile range (IQR) are shown. Sequences shown in FIG. 5e are SEQ ID NO:81-89. Sequences shown in FIG. 5g are SEQ ID NO:90-95.

(6) FIG. 6a-i. Improvements in hA3A-BE3. (a) Comparison of base editing efficiencies induced by BE3, hA3A-BE3, hA3A-BE3-Y130F and hA3A-BE3-Y132D in genomic regions with natively high levels of DNA methylation. Target site sequences are shown with the overlapping editing window (positions 4-7) in pink, the PAM in cyan, and CpG sites in capitals. NT, native HEK293T cells with no treatment. (b) Statistical analysis of normalized C-to-T editing frequencies in the overlapping editing window shown in (a), with frequencies induced by BE3 set to 100%. n=12 samples from three independent experiments. (c) Comparison of base editing efficiencies induced by BE3, hA3A-BE3, hA3A-BE3-Y130F and hA3A-BE3-Y132D at the C of GpC in the overlapping editing window in genomic regions with natively low levels of DNA methylation. (d) Statistical analysis of normalized C-to-T editing frequencies shown in (c), with frequencies induced by BE3 set to 100%. n=9 samples from three independent experiments. (e) Immunoblots of BEs transfected into HEK293T cells. Tubulin was used as a loading control; immunoblot images are representative of three independent experiments. (f) Comparison of base editing efficiencies induced by hA3A-BE3-Y130F, hA3A-eBE-Y130F, hA3A-BE3-Y132D and hA3A-eBE-Y132D at the C of GpC in the overlapping editing window in genomic regions with natively low levels of DNA methylation. (g) Statistical analysis of normalized C-to-T editing frequencies shown in (f), with frequencies induced by hA3A-BE3-Y130F (left) or hA3A-BE3-Y132D (right) set to 100%. n=9 samples from three independent experiments. (h,i) Comparison of product purity (h) and indels (i) yielded by hA3A-BE3-Y130F, hA3A-eBE-Y130F, hA3A-BE3-Y132D and hA3A-eBE-Y132D in genomic DNA regions with natively low levels of DNA methylation. The asterisk denotes an unusually high basal indel frequency (or an amplification, sequencing, or alignment artifact) at the examined VEGFA-M-c site in NT. (a,c,f,i) Means ± s.d. are from three independent experiments. (b,d,g) P value, one-tailed Student's t test. The median and IQR are shown. Sequences shown in FIG. 6a are SEQ ID NO:96-98.

(7) FIGS. 7A-B and 8A-B show the vector structures of the tested base editors and charts of their editing efficiencies on the target DYRK1A gene.

(8) FIGS. 9A-B and 10A-B show the vector structures of the tested base editors and charts of their editing efficiencies at the SITE6 target site.

(9) FIGS. 11A-B and 12A-B show the vector structures of the tested base editors and charts of their editing efficiencies on the target RUNX1 gene.

(10) FIGS. 13-18 show the sequencing results for Examples 3-5. Sequences shown in FIG. 13, from left column to right column and from top to bottom, are SEQ ID NO:99-114. Sequences shown in FIG. 14, from left column to right column and from top to bottom, are SEQ ID NO:115-126. Sequences shown in FIG. 15, from left column to right column and from top to bottom, are SEQ ID NO:127-142. Sequences shown in FIG. 16, from left column to right column and from top to bottom, are SEQ ID NO:143-156. Sequences shown in FIG. 17, from left column to right column and from top to bottom, are SEQ ID NO:157-172. Sequences shown in FIG. 18, from left column to right column and from top to bottom, are SEQ ID NO:173-184.

DETAILED DESCRIPTION

Definitions

(11) It is to be noted that the term "a" or "an" entity refers to one or more of that entity; for example, "an antibody" is understood to represent one or more antibodies. As such, the terms "a" (or "an"), "one or more," and "at least one" can be used interchangeably herein.

(12) As used herein, the term "polypeptide" is intended to encompass a singular "polypeptide" as well as plural "polypeptides," and refers to a molecule composed of monomers (amino acids) linearly linked by amide bonds (also known as peptide bonds). The term "polypeptide" refers to any chain or chains of two or more amino acids and does not refer to a specific length of the product. Thus, peptides, dipeptides, tripeptides, oligopeptides, "protein," "amino acid chain," or any other term used to refer to a chain or chains of two or more amino acids are included within the definition of "polypeptide," and the term "polypeptide" may be used instead of, or interchangeably with, any of these terms. The term "polypeptide" is also intended to refer to the products of post-expression modifications of the polypeptide, including without limitation glycosylation, acetylation, phosphorylation, amidation, derivatization by known protecting/blocking groups, proteolytic cleavage, or modification by non-naturally occurring amino acids. A polypeptide may be derived from a natural biological source or produced by recombinant technology, but is not necessarily translated from a designated nucleic acid sequence. It may be generated in any manner, including by chemical synthesis.

(13) The term "isolated" as used herein with respect to cells and nucleic acids, such as DNA or RNA, refers to molecules separated from other DNAs or RNAs, respectively, that are present in the natural source of the macromolecule. The term "isolated" as used herein also refers to a nucleic acid or peptide that is substantially free of cellular material, viral material, or culture medium when produced by recombinant DNA techniques, or of chemical precursors or other chemicals when chemically synthesized. Moreover, an "isolated nucleic acid" is meant to include nucleic acid fragments which are not naturally occurring as fragments and would not be found in the natural state. The term "isolated" is also used herein to refer to cells or polypeptides which are isolated from other cellular proteins or tissues. The term "isolated polypeptide" is meant to encompass both purified and recombinant polypeptides.

(14) As used herein, the term "recombinant" as it pertains to polypeptides or polynucleotides refers to a form of the polypeptide or polynucleotide that does not exist naturally, a non-limiting example of which can be created by combining polynucleotides or polypeptides that would not normally occur together.

(15) "Homology," "identity," or "similarity" refers to sequence similarity between two peptides or between two nucleic acid molecules. Homology can be determined by comparing a position in each sequence that may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same base or amino acid, the molecules are homologous at that position. A degree of homology between sequences is a function of the number of matching or homologous positions shared by the sequences. An "unrelated" or "non-homologous" sequence shares less than 40% identity, and preferably less than 25% identity, with one of the sequences of the present disclosure.

(16) A polynucleotide or polynucleotide region (or a polypeptide or polypeptide region) having a certain percentage (for example, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98% or 99%) of "sequence identity" to another sequence means that, when the two sequences are aligned, that percentage of bases (or amino acids) is identical between them. This alignment and the percent homology or sequence identity can be determined using software programs known in the art, for example those described in Ausubel et al. eds. (2007) Current Protocols in Molecular Biology. Preferably, default parameters are used for alignment. One alignment program is BLAST, using default parameters. In particular, suitable programs are BLASTN and BLASTP, using the following default parameters: Genetic code=standard; filter=none; strand=both; cutoff=60; expect=10; Matrix=BLOSUM62; Descriptions=50 sequences; sort by=HIGH SCORE; Databases=non-redundant, GenBank+EMBL+DDBJ+PDB+GenBank CDS translations+SwissProtein+SPupdate+PIR. Biologically equivalent polynucleotides are those having the above-noted specified percent homology and encoding a polypeptide having the same or similar biological activity.
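The percent-identity definition above can be made concrete with a short calculation. The Python sketch below assumes the two sequences have already been aligned (equal length, with "-" marking a gap), for example by BLASTP; it does not perform the alignment itself. It divides the number of identical, non-gap columns by the number of aligned columns. This is one common convention, and other tools may use a different denominator (for example, the length of the shorter sequence), so their reported values can differ slightly. The function name is illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes the inputs are already aligned: equal length, '-' marks a gap.
    Columns that are gaps in both sequences are ignored; any other column
    containing a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("inputs must be pre-aligned to the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical toy peptides (not from this disclosure) differing at 1 of 10 columns.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKM"))  # 90.0
    # One gap column out of ten also gives 90%.
    print(percent_identity("ACDE-GHIKL", "ACDEFGHIKL"))  # 90.0
```

For instance, a gap-free comparison of the 199-residue SEQ ID NO:1 with a double-substitution mutant gives 197/199 × 100, or about 99.0%.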

(17) The term "an equivalent nucleic acid or polynucleotide" refers to a nucleic acid having a nucleotide sequence with a certain degree of homology, or sequence identity, to the nucleotide sequence of the nucleic acid or its complement. A homolog of a double-stranded nucleic acid is intended to include nucleic acids having a nucleotide sequence which has a certain degree of homology with the nucleic acid or with its complement. In one aspect, homologs of nucleic acids are capable of hybridizing to the nucleic acid or its complement. Likewise, "an equivalent polypeptide" refers to a polypeptide having a certain degree of homology, or sequence identity, with the amino acid sequence of a reference polypeptide. In some aspects, the sequence identity is at least about 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99%. In some aspects, the equivalent polypeptide or polynucleotide has one, two, three, four, or five additions, deletions, substitutions, or combinations thereof as compared to the reference polypeptide or polynucleotide. In some aspects, the equivalent sequence retains the activity (e.g., epitope binding) or structure (e.g., a salt bridge) of the reference sequence.

(18) Hybridization reactions can be performed under conditions of different stringency. In general, a low-stringency hybridization reaction is carried out at about 40 °C in about 10× SSC or a solution of equivalent ionic strength/temperature. A moderate-stringency hybridization is typically performed at about 50 °C in about 6× SSC, and a high-stringency hybridization reaction is generally performed at about 60 °C in about 1× SSC. Hybridization reactions can also be performed under physiological conditions, which are well known to one of skill in the art. A non-limiting example of a physiological condition is the temperature, ionic strength, pH and concentration of Mg²⁺ normally found in a cell.

(19) A polynucleotide is composed of a specific sequence of four nucleotide bases: adenine (A); cytosine (C); guanine (G); thymine (T); and uracil (U) in place of thymine when the polynucleotide is RNA. Thus, the term "polynucleotide sequence" refers to the alphabetical representation of a polynucleotide molecule. This alphabetical representation can be input into databases in a computer having a central processing unit and used for bioinformatics applications such as functional genomics and homology searching. The term "polymorphism" refers to the coexistence of more than one form of a gene or portion thereof. A portion of a gene of which there are at least two different forms, i.e., two different nucleotide sequences, is referred to as a "polymorphic region of a gene." A polymorphic region can be a single nucleotide, the identity of which differs in different alleles.

(20) The terms "polynucleotide" and "oligonucleotide" are used interchangeably and refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides or analogs thereof. Polynucleotides can have any three-dimensional structure and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: a gene or gene fragment (for example, a probe, primer, EST or SAGE tag), exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, dsRNA, siRNA, miRNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes and primers. A polynucleotide can comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure can be imparted before or after assembly of the polynucleotide. The sequence of nucleotides can be interrupted by non-nucleotide components. A polynucleotide can be further modified after polymerization, such as by conjugation with a labeling component. The term also refers to both double- and single-stranded molecules. Unless otherwise specified or required, any embodiment of this disclosure that is a polynucleotide encompasses both the double-stranded form and each of two complementary single-stranded forms known or predicted to make up the double-stranded form.

(21) The term "encode" as it is applied to polynucleotides refers to a polynucleotide which is said to "encode" a polypeptide if, in its native state or when manipulated by methods well known to those skilled in the art, it can be transcribed and/or translated to produce the mRNA for the polypeptide and/or a fragment thereof. The antisense strand is the complement of such a nucleic acid, and the encoding sequence can be deduced therefrom.
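The following minimal Python sketch shows how the antisense strand and the transcribed mRNA follow from an encoding (sense) DNA sequence. As a simplifying assumption, it handles only the four standard DNA bases. IUPAC ambiguity codes and modified bases such as 5-methylcytosine are rejected rather than interpreted. The function names and the example fragment are hypothetical.

```python
# Watson-Crick pairing for the four standard DNA bases.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _check_dna(seq: str) -> str:
    """Upper-case the sequence and reject anything other than A, C, G, T."""
    seq = seq.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unsupported characters: {sorted(invalid)}")
    return seq


def antisense(sense: str) -> str:
    """Antisense strand written 5'->3': the reverse complement of the sense strand."""
    return _check_dna(sense).translate(_DNA_COMPLEMENT)[::-1]


def transcribe(sense: str) -> str:
    """mRNA has the sense (coding) strand sequence, with U in place of T."""
    return _check_dna(sense).replace("T", "U")


if __name__ == "__main__":
    coding = "ATGGAAGCC"       # hypothetical 3-codon fragment (Met-Glu-Ala)
    print(antisense(coding))   # GGCTTCCAT
    print(transcribe(coding))  # AUGGAAGCC
```

Taking the reverse complement twice returns the original sequence. This reflects the point above that the encoding sequence can be deduced from the antisense strand.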

(22) Fusion Proteins

(23) Current base editors (BEs) based on rat APOBEC1 (rA1) cannot efficiently edit C in methylated regions or in a GpC context, which limits the use of base editing. The present disclosure provides fusion molecules that combine an apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A or A3A) and a clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) protein, optionally further with a uracil glycosylase inhibitor (UGI).

(24) The resulting fusion protein can efficiently deaminate cytosines to uracils. Because uracil is read as thymine during DNA replication or repair, this results in C-to-T substitutions. Surprisingly and unexpectedly, such base editing was effective even when the C follows a G (i.e., in a GpC dinucleotide context) and/or lies in a methylated region. This is of considerable clinical relevance, because cytosine methylation is common in living cells.
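The position-dependent C-to-T outcome can be modeled from a protospacer sequence using the position convention given for FIG. 5e. In that convention, the PAM-distal base is position 1 and the BE3 window spans positions 4-8. The Python sketch below is an idealized model that assumes every C inside the window is converted to T. Real editing is partial, and it depends on both the editor and the sequence context, which is exactly what the experiments compare (for example, GpC sites and the narrower windows of the Y130F and Y132D variants). The protospacer and the helper names are hypothetical.

```python
from typing import List, Tuple


def cytosines_in_window(protospacer: str, window: Tuple[int, int] = (4, 8)) -> List[Tuple[int, bool]]:
    """List (position, in_GpC) for every C inside the editing window.

    Positions are 1-based and counted from the PAM-distal end of the
    protospacer (position 1); the PAM lies 3' of the last base.
    in_GpC is True when the base immediately 5' of the C is a G.
    """
    start, end = window
    if start < 1 or end < start:
        raise ValueError("window must satisfy 1 <= start <= end")
    seq = protospacer.upper()
    hits = []
    for pos in range(start, min(end, len(seq)) + 1):
        if seq[pos - 1] == "C":
            in_gpc = pos > 1 and seq[pos - 2] == "G"
            hits.append((pos, in_gpc))
    return hits


def idealized_c_to_t(protospacer: str, window: Tuple[int, int] = (4, 8)) -> str:
    """Idealized product: every C in the window becomes T (efficiency ignored)."""
    seq = list(protospacer.upper())
    for pos, _ in cytosines_in_window(protospacer, window):
        seq[pos - 1] = "T"
    return "".join(seq)


if __name__ == "__main__":
    spacer = "AAAGCACCTTAAAAAAAAAA"           # hypothetical 20-nt protospacer
    print(cytosines_in_window(spacer))        # [(5, True), (7, False), (8, False)]
    print(idealized_c_to_t(spacer))           # AAAGTATTTTAAAAAAAAAA
    print(cytosines_in_window(spacer, (4, 7)))  # [(5, True), (7, False)]
```

Position 5 in this toy example is the kind of GpC cytosine that, according to FIG. 5g-h, rA1-based BE3 edits poorly and hA3A-BE3 edits more efficiently.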

(25) In accordance with one embodiment of the present disclosure, provided is a fusion protein comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A) and a second fragment comprising a clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) protein.

(26) APOBEC3A, also referred to as apolipoprotein B mRNA editing enzyme catalytic subunit 3A or A3A, is a protein of the APOBEC3 family found in humans, non-human primates, and some other mammals. The APOBEC3A protein lacks the zinc binding activity of other family members. In humans, isoform a (NP_663745.1; SEQ ID NO:1) and isoform b (NP_001257335.1; SEQ ID NO:6) are both active; isoform a contains 18 additional residues near the N-terminus. The term "APOBEC3A" also encompasses variants and mutants that have a certain level (e.g., 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99%) of sequence identity to a wildtype mammalian APOBEC3A and retain its cytidine deaminase activity.
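The relationship between the two human isoforms can be checked directly against the sequences listed in Table 1 (SEQ ID NO:1 and SEQ ID NO:6). In the Python sketch below, removing residues 12-29 of isoform a reproduces isoform b exactly. The sequences are copied from Table 1, and the helper name is illustrative. Because the residues flanking the 18-residue segment repeat (R10/H11 and R28/H29), the same gap could equally be placed at residues 10-27 or 11-28. Every placement gives the same offset of 18 for residues after position 29.

```python
# Human APOBEC3A isoform a (SEQ ID NO:1, 199 residues), copied from Table 1.
ISOFORM_A = (
    "MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ"
    "HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP"
    "CFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQV"
    "SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN"
)

# Human APOBEC3A isoform b (SEQ ID NO:6, 181 residues), copied from Table 1.
ISOFORM_B = (
    "MEASPASGPRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYG"
    "RHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENT"
    "HVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDH"
    "QGCPFQPWDGLDEHSQALSGRLRAILQNQGN"
)


def delete_residues(seq: str, first: int, last: int) -> str:
    """Remove residues first..last (1-based, inclusive) from seq."""
    if not 1 <= first <= last <= len(seq):
        raise ValueError("invalid residue range")
    return seq[: first - 1] + seq[last:]


if __name__ == "__main__":
    print(len(ISOFORM_A), len(ISOFORM_B))                    # 199 181
    print(ISOFORM_A[11:29])                                  # LMDPHIFTSNFNNGIGRH
    print(delete_residues(ISOFORM_A, 12, 29) == ISOFORM_B)   # True
```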

(27) As demonstrated in the experimental examples, certain mutants (e.g., Y130F (SEQ ID NO:2), Y132D (SEQ ID NO:3), W104A (SEQ ID NO:4), D131Y (SEQ ID NO:5), W98Y (SEQ ID NO:24), and P134Y (SEQ ID NO:25)) even outperformed wildtype human APOBEC3A, for example by editing more efficiently or within a narrower window. A number of tested combinations of these mutations, such as Y130F+D131E+Y132D (SEQ ID NO:22), also performed well. Moreover, although not specifically tested, the same mutations are believed to also work in isoform b of A3A. Examples of such variants and mutants are provided in Table 1 below.

(28) TABLE-US-00001 TABLE1 ExamplesofAPOBEC3ASequences Name Sequence SEQIDNO: Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 1 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP isoforma 101CFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQV wildtype 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 2 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP isoforma 101CFSWGCAGEVRAFLQENTHVRLRIFAARIFDYDPLYKEALQMLRDAGAQV Y130F 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 3 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP isoforma 101CFSWGCAGEVRAFLQENTHVRLRIFAARIYDDDPLYKEALQMLRDAGAQV Y132D 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 4 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP isoforma 101CFSAGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQV W104A 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 5 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP isoforma 101CFSWGCAGEVRAFLQENTHVRLRIFAARIYYYDPLYKEALQMLRDAGAQV D131Y 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYG 6 APOBEC3A 51RHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENT isoformb 101HVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDH wildtype 151QGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYG 7 APOBEC3A 51RHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENT isoformb 101HVRLRIFAARIFDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDH Y112F 151QGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYG 8 APOBEC3A 51RHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGACGEVRAFLQENT isoformb 101HVRLRIFAARIYDDDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDH Y114D 151QGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 
1MEASPASGPRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYG 9 APOBEC3A 51RHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSAGCAGRVRAFLQENT isoformb 101HVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDH W86A 151QGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYG 10 APOBEC3A 51RHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENT isoformb 101HVRLRIFAARIYYYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDH D113Y 151QGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 22 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP isoforma 101CFSWGCAGEVRAFLQENTHVRLRIFAARIFEDDPLYKEALQMLRDAGAQV Y130FD131E 151SIMTYDEFKHCWDTFVDHQGVFPQPWDGLDEHSQALSGRLRAILQNQGN Y132D Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 23 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP isoforma 101CFSWGCAGEVRAFLQENTHVRLRIFAARIFYDDPLYKEALQMLRDAGAQV Y130FD131Y 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN Y132D Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 24 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRDLDLVPSLQLDPAQIYRVTWFISYSP isoforma 101CFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQV W98Y 150SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 25 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP isoforma 101VFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDYLYKEALQMLRDAGAQV P134Y 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 26 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISYSP isoforma 101CFSAGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQV W98Y+W104A 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 27 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISYSP isoforma 101CFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDYLYKEALQMLRDAGAQV W98Y+P134Y 151SIMTYDEFKHCWDTFVDHQGVPFQPWDGLDEHSQALSGRLRAILQNQGN Human 
1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 28 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP isoforma 101CFSAGCAGEVRAFLQENTHVRLRIFAARIYDYDYLYKEALQMLRDAGAQV W104A+P134Y 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 29 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISYSP isoforma 101VFSAGCAGEVRAFLQENTHVRLRIFAARIFDYDPLYKEALQMLRDAGAQV W98Y+W104A+ 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN Y130F Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 30 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISYSP isoforma 101CFSAGCAGEVRAFLQENTHVRLRIFAARIYDDDPLYKEALQMLRDAGAQV W98Y+W104A+ 151SIMTYDEFKHCWDTFVDHQGVPFQPWDGLDEHSQALSGRLRAILQNQGN Y132D Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 31 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP isoforma 101VFSAGCAGEVRAFLQENTHVRLRIFAARIFDYDYLYKEALQMLRDAGAQV W104A+Y130F+ 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN P134Y Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 32 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP isoforma 101CFSAGCAGEVRAFLQENTHVRLRIFAARIYDDDYLYKEALQMLRDAGAQV W104A+Y132D+ 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN P134Y Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 33 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP isoforma 101CFSAGCAGEVRAFLQENTHVRLRIFAARIFDYDPLYKEALQMLRDAGAQV W104A+Y130F 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 34 APOBEC3A 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP isoforma 101CFSAGCAGEVRAFLQENTHVRLRIFAARIYDDDPLYKEALQMLRDAGAQV W104A+Y132D 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALAGRLRAILQNQGN Human 1MEASPASGPRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYG 35 APOBEC3A 51RHAELRFLDLVPSLQLDPAQIYRVTWFISYSPCFSWGCAGRVRAFLQENT isoformbW80Y 101HVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDH 
151QGCPFQPWDGLDEHSQALSGRLRAILQNQGN Human 1MEASPASGPRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYG 36 APOBEC3A 51RHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGRVRAFLQENT isoformbP116Y 101HVRLRIFAARIYDYDYLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDH 151QGCPFQPWDGLDEHSQALSGRLRAILQNQGN

(29) In some embodiments, the APOBEC3A in the fusion protein of the present disclosure is human isoform a or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity to isoform a. In some embodiments, the APOBEC3A in the fusion protein of the present disclosure is human isoform b or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity to isoform b. In some embodiments, the APOBEC3A in the fusion protein of the present disclosure is rat APOBEC3 or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity to rat APOBEC3. In some embodiments, the APOBEC3A in the fusion protein of the present disclosure is mouse APOBEC3 or a sequence having at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity to mouse APOBEC3. In some embodiments, the sequence retains cytidine deaminase activity.

(30) In some embodiments, the APOBEC3A includes a Y130F mutation, according to residue numbering in SEQ ID NO:1 (the numbering differs in human isoform b and in rat and mouse sequences, but can readily be converted). In some embodiments, the APOBEC3A includes a Y132D mutation, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes a W104A mutation, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes a D131Y mutation, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes a D131E mutation, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes a W98Y mutation, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes a P134Y mutation, according to residue numbering in SEQ ID NO:1.
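For human isoform b, the numbering conversion reduces to a fixed offset. Isoform b lacks 18 residues of isoform a near the N-terminus (taken here as residues 12-29 of SEQ ID NO:1), so positions from 30 onward shift down by 18. The Python sketch below applies that rule to mutation names. Its output matches the isoform b names used in Table 1 (Y112F, Y114D, W86A, D113Y, W80Y, P116Y). Converting to rat or mouse numbering would require a pairwise alignment and is not covered. The function names are hypothetical.

```python
import re

# Assumed convention: isoform b lacks residues 12-29 of isoform a (SEQ ID NO:1),
# so residues from position 30 onward shift by 18 (for example Y130 -> Y112).
_GAP_FIRST, _GAP_LAST = 12, 29
_OFFSET = _GAP_LAST - _GAP_FIRST + 1  # 18
_ISOFORM_A_LENGTH = 199

_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def isoform_a_to_b(position: int) -> int:
    """Map a 1-based isoform a position to the corresponding isoform b position."""
    if not 1 <= position <= _ISOFORM_A_LENGTH:
        raise ValueError("isoform a has 199 residues")
    if position < _GAP_FIRST:
        return position
    if position <= _GAP_LAST:
        raise ValueError(f"residue {position} of isoform a has no isoform b counterpart")
    return position - _OFFSET


def convert_substitution(name: str) -> str:
    """Rename a substitution such as 'Y130F' from isoform a to isoform b numbering."""
    match = _SUBSTITUTION.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"cannot parse {name!r}")
    wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
    return f"{wt}{isoform_a_to_b(pos)}{new}"


if __name__ == "__main__":
    for name in ["Y130F", "Y132D", "W104A", "D131Y", "W98Y", "P134Y"]:
        print(name, "->", convert_substitution(name))
```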

(31) In some embodiments, the APOBEC3A includes mutations Y130F, D131E, and Y132D, according to residue numbering in SEQ ID NO:1 (the numbering differs in human isoform b and in the rat and mouse sequences, but can readily be converted). In some embodiments, the APOBEC3A includes mutations Y130F, D131Y, and Y132D, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes mutations W98Y and W104A, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes mutations W98Y and P134Y, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes mutations W104A and P134Y, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes mutations W98Y, W104A, and Y130F, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes mutations W98Y, W104A, and Y132D, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes mutations W104A, Y130F, and P134Y, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes mutations W104A, Y132D, and P134Y, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes mutations W104A and Y130F, according to residue numbering in SEQ ID NO:1. In some embodiments, the APOBEC3A includes mutations W104A and Y132D, according to residue numbering in SEQ ID NO:1.
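
The combinations above use a compact "wild-type residue, position, new residue" notation joined by "+". A minimal sketch of how such a combination could be parsed and applied to a sequence is shown below (function names are hypothetical); it checks the wild-type residue at each position so that a numbering mistake raises an error instead of silently producing a wrong variant. The example fragment is residues 121-140 of isoform a as they appear in fusion protein 1 (SEQ ID NO:16).

```python
import re

# One substitution such as "Y130F": wild-type residue, position, replacement.
_SUBSTITUTION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def parse_combination(spec: str) -> list[tuple[str, int, str]]:
    """Split e.g. "Y130F+D131E+Y132D" into (wild_type, position, new) tuples."""
    substitutions = []
    for token in spec.split("+"):
        match = _SUBSTITUTION.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"cannot parse substitution {token!r}")
        substitutions.append((match.group(1), int(match.group(2)), match.group(3)))
    return substitutions


def apply_combination(seq: str, spec: str, start: int = 1) -> str:
    """Apply every substitution in spec, verifying the wild-type residue first.

    start is the residue number of seq[0], so a fragment can be edited
    using full-length (SEQ ID NO:1) numbering.
    """
    residues = list(seq)
    for wild_type, position, new in parse_combination(spec):
        index = position - start
        if not 0 <= index < len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[index] != wild_type:
            raise ValueError(f"expected {wild_type}{position}, found {residues[index]}")
        residues[index] = new
    return "".join(residues)


fragment = "RLRIFAARIYDYDPLYKEAL"  # isoform a residues 121-140
print(apply_combination(fragment, "Y130F+D131E+Y132D", start=121))
# RLRIFAARIFEDDPLYKEAL
```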

(32) Example APOBEC3A sequences are shown in SEQ ID NO:1-10 and 22-36.

(33) The APOBEC3A protein can tolerate further modifications, such as additions, deletions and/or substitutions, at other amino acid positions as well. Such modifications can be substitutions at one, two, three or more positions. In one embodiment, the modification is a substitution at one of the positions. Such substitutions, in some embodiments, are conservative substitutions. In some embodiments, the modified APOBEC3A protein still retains cytidine deaminase activity. In some embodiments, the modified APOBEC3A protein retains the mutations tested in the experimental examples.

(34) In various embodiments, the APOBEC3A can be substituted with another deaminase, such as A3B (APOBEC3B), A3C (APOBEC3C), A3D (APOBEC3D), A3F (APOBEC3F), A3G (APOBEC3G), A3H (APOBEC3H), A3 (APOBEC3), or AID (AICDA).

(35) In some embodiments, provided is a fusion protein comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3B (APOBEC3B) and a second fragment comprising a clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) protein. In some embodiments, provided is a fusion protein comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3C (APOBEC3C) and a second fragment comprising a Cas protein. In some embodiments, provided is a fusion protein comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3D (APOBEC3D) and a second fragment comprising a Cas protein. In some embodiments, provided is a fusion protein comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3F (APOBEC3F) and a second fragment comprising a Cas protein. In some embodiments, provided is a fusion protein comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3G (APOBEC3G) and a second fragment comprising a Cas protein. In some embodiments, provided is a fusion protein comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3H (APOBEC3H) and a second fragment comprising a Cas protein. In some embodiments, provided is a fusion protein comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3 (APOBEC3) and a second fragment comprising a Cas protein. In some embodiments, provided is a fusion protein comprising a first fragment comprising an activation-induced cytidine deaminase (AID, encoded by the AICDA gene) and a second fragment comprising a Cas protein.

(36) In some embodiments, the APOBEC protein is a human protein. In some embodiments, the APOBEC protein is a mouse or rat protein. Example deaminases, with their NCBI accession numbers, are listed in the table below.

(37) Example deaminases
Deaminase | Version | NCBI Accession Nos.
A3B (APOBEC3B) | hA3B (human) | NP_001257340, NP_004891
A3C (APOBEC3C) | hA3C (human) | NP_055323
A3D (APOBEC3D) | hA3D (human) | NP_689639, NP_001350710
A3F (APOBEC3F) | hA3F (human) | NP_660341, NP_001006667
A3G (APOBEC3G) | hA3G (human) | NP_068594, NP_001336365, NP_001336366, NP_001336367
A3H (APOBEC3H) | hA3H (human) | NP_001159474, NP_001159475, NP_001159476, NP_861438
A1 (APOBEC1) | hA1 (human) | NP_001291495, NP_001635, NP_005880
A1 (APOBEC1) | mA1 (mouse) | NP_001127863, NP_112436
A3 (APOBEC3) | mA3 (mouse) | NP_001153887, NP_001333970, NP_084531
AID (AICDA) | hAID (human) | NP_001317272, NP_065712
AID (AICDA) | mAID (mouse) | NP_033775
AID (AICDA) | cAICDA (channel catfish) | NP_001187114

(38) A conservative amino acid substitution is one in which an amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art, including basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Thus, a nonessential amino acid residue in a polypeptide of the present disclosure is preferably replaced with another amino acid residue from the same side-chain family. In another embodiment, a string of amino acids can be replaced with a structurally similar string that differs in order and/or composition of side-chain family members.
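
The side-chain families just listed can be encoded directly as sets, which makes the "same family" test mechanical. The sketch below is a minimal, hypothetical helper that treats a substitution as conservative when the two residues share at least one of the families named in this paragraph; it does not use the similarity scores of Table A. Applied to the tested A3A mutations, it classifies Y130F, D131E and W104A as conservative and Y132D as non-conservative under this scheme.

```python
# Side-chain families as listed in paragraph (38); a residue can belong to several.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def shared_families(a: str, b: str) -> list[str]:
    """Names of the families that contain both residues (one-letter codes)."""
    return sorted(
        name for name, members in SIDE_CHAIN_FAMILIES.items()
        if a in members and b in members
    )


def is_conservative(wild_type: str, replacement: str) -> bool:
    """True when the two residues share at least one side-chain family."""
    return bool(shared_families(wild_type.upper(), replacement.upper()))


for mutation in ("Y130F", "Y132D", "D131E", "W104A"):
    print(mutation, is_conservative(mutation[0], mutation[-1]))
```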

(39) Non-limiting examples of conservative amino acid substitutions are provided in the table below, where a similarity score of 0 or higher indicates conservative substitution between the two amino acids.

(40) TABLE A: Amino Acid Similarity Matrix
   C  G  P  S  A  T  D  E  N  Q  H  K  R  V  M  I  L  F  Y  W
W  8  7  6  2  6  5  7  7  4  5  3  3  2  6  4  5  2  0  0 17
Y  0  5  5  3  3  3  4  4  2  4  0  4  5  2  2  1  1  7 10
F  4  5  5  3  4  3  6  5  4  5  2  5  4  1  0  1  2  9
L  6  4  3  3  2  2  4  3  3  2  2  3  3  2  4  2  6
I  2  3  2  1  1  0  2  2  2  2  2  2  2  4  2  5
M  5  3  2  2  1  1  3  2  0  1  2  0  0  2  6
V  2  1  1  1  0  0  2  2  2  2  2  2  2  4
R  4  3  0  0  2  1  1  1  0  1  2  3  6
K  5  2  1  0  1  0  0  0  1  1  0  5
H  3  2  0  1  1  1  1  1  2  3  6
Q  5  1  0  1  0  1  2  2  1  4
N  4  0  1  1  0  0  2  1  2
E  5  0  1  0  0  0  3  4
D  5  1  1  0  0  0  4
T  2  0  0  1  1  3
A  2  1  1  1  2
S  0  1  1  1
P  3  1  6
G  3  5
C 12

(41) TABLE B: Conservative Amino Acid Substitutions
For Amino Acid | Substitute With
Alanine | D-Ala, Gly, Aib, β-Ala, L-Cys, D-Cys
Arginine | D-Arg, Lys, D-Lys, Orn, D-Orn
Asparagine | D-Asn, Asp, D-Asp, Glu, D-Glu, Gln, D-Gln
Aspartic Acid | D-Asp, D-Asn, Asn, Glu, D-Glu, Gln, D-Gln
Cysteine | D-Cys, S-Me-Cys, Met, D-Met, Thr, D-Thr, L-Ser, D-Ser
Glutamine | D-Gln, Asn, D-Asn, Glu, D-Glu, Asp, D-Asp
Glutamic Acid | D-Glu, D-Asp, Asp, Asn, D-Asn, Gln, D-Gln
Glycine | Ala, D-Ala, Pro, D-Pro, Aib, β-Ala
Isoleucine | D-Ile, Val, D-Val, Leu, D-Leu, Met, D-Met
Leucine | Val, D-Val, Met, D-Met, D-Ile, D-Leu, Ile
Lysine | D-Lys, Arg, D-Arg, Orn, D-Orn
Methionine | D-Met, S-Me-Cys, Ile, D-Ile, Leu, D-Leu, Val, D-Val
Phenylalanine | D-Phe, Tyr, D-Tyr, His, D-His, Trp, D-Trp
Proline | D-Pro
Serine | D-Ser, Thr, D-Thr, allo-Thr, L-Cys, D-Cys
Threonine | D-Thr, Ser, D-Ser, allo-Thr, Met, D-Met, Val, D-Val
Tyrosine | D-Tyr, Phe, D-Phe, His, D-His, Trp, D-Trp
Valine | D-Val, Leu, D-Leu, Ile, D-Ile, Met, D-Met

(42) The term clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) protein, or simply Cas protein, refers to RNA-guided DNA endonuclease enzymes associated with the CRISPR adaptive immunity system in Streptococcus pyogenes, as well as in other bacteria. Non-limiting examples of Cas proteins include Streptococcus pyogenes Cas9 (SpCas9), Staphylococcus aureus Cas9 (SaCas9), Acidaminococcus sp. Cas12a (Cpf1), Lachnospiraceae bacterium Cas12a (Cpf1), and Francisella novicida Cas12a (Cpf1). Additional examples are provided in Komor et al., CRISPR-Based Technologies for the Manipulation of Eukaryotic Genomes, Cell. 2017 Jan. 12; 168(1-2):20-36.

(43) Example Cas proteins include SpCas9, FnCas9, St1Cas9, St3Cas9, NmCas9, SaCas9, AsCpf1, LbCpf1, FnCpf1, VQR SpCas9, EQR SpCas9, VRER SpCas9, RHA FnCas9, KKH SaCas9, NmeCas9, StCas9, CjCas9, SsCpf1, PcCpf1, BpCpf1, CmtCpf1, LiCpf1, PmCpf1, Pb3310Cpf1, Pb4417Cpf1, BsCpf1, EeCpf1, BhCas12b, AkCas12b, EbCas12b, LsCas12b, RfCas13d, LwaCas13a, PspCas13b, PguCas13b, RanCas13b, CasX, CasY, and those provided in Table C below.

(44) TABLE C: Example Cas Proteins
Cas9 proteins: Cas9 from Streptococcus pyogenes (SpCas9); Cas9 from Staphylococcus aureus (SaCas9); Cas9 from Neisseria meningitidis (NmeCas9); Cas9 from Streptococcus thermophilus (StCas9); Cas9 from Campylobacter jejuni (CjCas9)
Cas12a (Cpf1) proteins: Cas12a (Cpf1) from Lachnospiraceae bacterium (LbCpf1); Cas12a (Cpf1) from Acidaminococcus sp. BV3L6 (AsCpf1); Cas12a (Cpf1) from Francisella novicida (FnCpf1); Cas12a (Cpf1) from Smithella sp. SC_K08D17 (SsCpf1); Cas12a (Cpf1) from Porphyromonas crevioricanis (PcCpf1); Cas12a (Cpf1) from Butyrivibrio proteoclasticus (BpCpf1); Cas12a (Cpf1) from Candidatus Methanoplasma termitum (CmtCpf1); Cas12a (Cpf1) from Leptospira inadai (LiCpf1); Cas12a (Cpf1) from Porphyromonas macacae (PmCpf1); Cas12a (Cpf1) from Peregrinibacteria bacterium GW2011_GWA2_33_10 (Pb3310Cpf1); Cas12a (Cpf1) from Parcubacteria bacterium GW2011_GWC2_44_17 (Pb4417Cpf1); Cas12a (Cpf1) from Butyrivibrio sp. NC3005 (BsCpf1); Cas12a (Cpf1) from Eubacterium eligens (EeCpf1)
Cas12b (C2c1) proteins: Cas12b (C2c1) from Bacillus hisashii (BhCas12b); Cas12b (C2c1) from Bacillus hisashii with a gain-of-function mutation (see, e.g., Strecker et al., Nature Communications 10, article 212 (2019)); Cas12b (C2c1) from Alicyclobacillus kakegawensis (AkCas12b); Cas12b (C2c1) from Elusimicrobia bacterium (EbCas12b); Cas12b (C2c1) from Laceyella sediminis (LsCas12b)
Cas13 proteins: Cas13d from Ruminococcus flavefaciens XPD3002 (RfCas13d); Cas13a from Leptotrichia wadei (LwaCas13a); Cas13b from Prevotella sp. P5-125 (PspCas13b); Cas13b from Porphyromonas gulae (PguCas13b); Cas13b from Riemerella anatipestifer (RanCas13b)
Engineered Cas proteins: nickases (mutation in one nuclease domain); catalytically inactive mutants (dCas; mutations in both nuclease domains); enhanced variants with improved specificity (see, e.g., Chen et al., Nature, 550, 407-410 (2017))

(45) In some embodiments, the Cas protein is a mutant of a protein selected from those above, wherein the mutant retains DNA-binding capability but does not introduce double-strand DNA breaks.

(46) For example, it is known that residues Asp10 and His840 of SpCas9 are important for its catalytic (nuclease) activity. When both residues are mutated to Ala, the mutant loses nuclease activity. In another embodiment, only the Asp10Ala mutation is made; such a mutant protein cannot generate a double-strand break and instead generates a nick on one of the strands. Such a mutant is also referred to as a Cas9 nickase. A non-limiting example of a Cas9 nickase is provided in SEQ ID NO:11. Non-limiting examples of Cas12a nickases are provided in SEQ ID NO:37-39. Cas proteins also encompass mutants of known Cas proteins that have a certain sequence identity to them (e.g., 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99% or more). In some embodiments, the Cas protein retains the catalytic (nuclease) activity.
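
Whether an SpCas9-derived sequence is an active nuclease, a nickase, or catalytically inactive comes down to the two residues discussed above. The sketch below is a minimal, hypothetical helper (`classify_spcas9` is not part of this disclosure) that reads positions 10 and 840 in SpCas9 numbering; `offset` is an assumed parameter for any residues, such as an N-terminal tag, that shift the numbering. It also labels the reciprocal H840A single mutant, which is likewise a nickase.

```python
def classify_spcas9(seq: str, offset: int = 0) -> str:
    """Classify an SpCas9-derived sequence by its residues 10 and 840.

    SpCas9 residue n is read at seq[n - 1 + offset]; offset is 0 when
    seq[0] is SpCas9 Met1.
    """
    if len(seq) < 840 + offset:
        raise ValueError("sequence too short to contain residue 840")
    d10 = seq[10 - 1 + offset]
    h840 = seq[840 - 1 + offset]
    states = {
        ("D", "H"): "active nuclease",
        ("A", "H"): "D10A nickase",
        ("D", "A"): "H840A nickase",
        ("A", "A"): "catalytically inactive (D10A/H840A)",
    }
    try:
        return states[(d10, h840)]
    except KeyError:
        raise ValueError(f"unexpected residues {d10}10 and {h840}840") from None


# Synthetic 1368-residue stand-in with only the two diagnostic positions set.
core = ["G"] * 1368
core[9], core[839] = "A", "H"
print(classify_spcas9("".join(core)))  # D10A nickase
```

SEQ ID NO:11 begins with a 21-residue tag (MYPYDVPDYASPKKKRKVEAS) followed by SpCas9 from Asp2, so the appropriate offset would be 20; by the residue numbering printed in that listing, position 30 is Ala and position 860 is His, which this helper would report as a D10A nickase.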

(47) In some embodiments, the Cas protein in a fusion protein of the present disclosure is a Cas12a (Cpf1, CRISPR-associated endonuclease in Prevotella and Francisella 1) protein. In conventional base editors, Cas9 is the most commonly used DNA endonuclease. Cas12a (Cpf1) has the advantage of recognizing A/T-rich sequences when used together with APOBEC1 in base editors. In another surprising discovery of the present disclosure, when APOBEC1 was replaced with A3A, the editing efficiency was greatly increased (see, e.g., Examples 3-5 and FIGS. 7B, 9B and 11B). Moreover, the editing efficiency of such a Cas12a-A3A fusion can be further increased when the A3A includes a few of the tested mutations (Examples 3-5 and FIGS. 7B, 9B and 11B), and the editing window of such a Cas12a-A3A fusion can be narrowed to achieve more precise editing when additional tested mutations are included in A3A (Examples 3-5 and FIGS. 8B, 10B and 12B).

(48) In some embodiments, therefore, provided is a fusion protein comprising a first fragment comprising an apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A) and a second fragment comprising a CRISPR-associated endonuclease in Prevotella and Francisella 1 (Cpf1). Examples of APOBEC3A, as well as its alternatives (e.g., A3B (APOBEC3B), A3C (APOBEC3C), A3D (APOBEC3D), A3F (APOBEC3F), A3G (APOBEC3G), A3H (APOBEC3H), A3 (APOBEC3), or AID (AICDA)) and biological equivalents (homologues) have been disclosed above. Non-limiting example fusion sequences are provided in SEQ ID NO:40-50.

(49) In some embodiments, the fusion protein further comprises a uracil glycosylase inhibitor (UGI). A non-limiting example of UGI is found in Bacillus phage AR9 (YP_009283008.1). In some embodiments, the UGI comprises the amino acid sequence of SEQ ID NO:12, or has at least 90% sequence identity to SEQ ID NO:12 and retains uracil-DNA glycosylase inhibitory activity.

(50) In some embodiments, the UGI is not fused to the fusion protein but is instead provided separately (free UGI, not fused to a Cas protein or a cytosine deaminase) when the fusion protein is used for genome editing. In some embodiments, free UGI is provided together with a fusion protein that itself also includes a UGI portion.

(51) Preferably, a peptide linker is provided between each pair of adjacent fragments in the fusion protein. In some embodiments, the peptide linker has from 1 to 100 amino acid residues (or, without limitation, 3-20 or 4-15). In some embodiments, at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90% of the amino acid residues of the peptide linker are selected from the group consisting of alanine, glycine, cysteine, and serine. In some embodiments, the peptide linker has the amino acid sequence of SEQ ID NO:13 or 14.
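
The composition criterion above is a simple fraction, so a candidate linker can be checked against the stated thresholds directly. A minimal sketch (the helper name is hypothetical), applied to the two linkers listed as SEQ ID NO:13 and 14:

```python
# Residues named in paragraph (51) as preferred linker components.
SMALL_RESIDUES = frozenset("AGCS")


def small_residue_fraction(linker: str) -> float:
    """Fraction of linker residues that are Ala, Gly, Cys or Ser."""
    if not linker:
        raise ValueError("linker must contain at least one residue")
    return sum(residue in SMALL_RESIDUES for residue in linker) / len(linker)


for name, linker in {"SEQ ID NO:13": "SGSETPGTSESATPES", "SEQ ID NO:14": "SGGS"}.items():
    print(name, len(linker), small_residue_fraction(linker))
# SEQ ID NO:13 16 0.5
# SEQ ID NO:14 4 1.0
```

By this count, the 16-residue linker of SEQ ID NO:13 falls within the 3-20 residue range and meets the 50% threshold, while the 4-residue SGGS linker of SEQ ID NO:14 consists entirely of such residues.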

(52) The APOBEC3A, Cas protein, and UGI can be arranged in any manner. However, in a preferred embodiment, APOBEC3A is placed at the N-terminal side of the Cas protein. In one embodiment, the Cas protein is placed at the N-terminal side of the UGI.

(53) In some embodiments, the fusion protein further comprises a nuclear localization sequence such as SEQ ID NO:15.
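
Reading the fusion protein sequences in Table 2, fusion proteins 1-5 appear to follow one layout from N- to C-terminus: deaminase, linker of SEQ ID NO:13, Cas9 nickase, SGGS, UGI, SGGS, and the nuclear localization sequence PKKKRKV. The sketch below assembles a fusion in that order under that assumption; the function name is hypothetical, and the short strings in the example are truncated placeholders, not real domain sequences.

```python
# Component sequences as listed in this disclosure.
LINKER_1 = "SGSETPGTSESATPES"  # SEQ ID NO:13
LINKER_2 = "SGGS"  # SEQ ID NO:14
NLS = "PKKKRKV"  # nuclear localization sequence (SEQ ID NO:15)


def assemble_fusion(deaminase: str, cas: str, ugi: str = "", nls: str = NLS) -> str:
    """Join fragments N- to C-terminally: deaminase, Cas, optional UGI, optional NLS.

    The deaminase is placed N-terminal to the Cas protein and joined to it
    through LINKER_1; each later fragment is joined through LINKER_2.
    """
    parts = [deaminase, LINKER_1, cas]
    for fragment in (ugi, nls):
        if fragment:  # skip fragments that are not supplied
            parts += [LINKER_2, fragment]
    return "".join(parts)


# Truncated placeholder fragments, for illustration only.
print(assemble_fusion("MEAS", "DKKY", ugi="TNLS"))
```

Keeping the arrangement in one function makes alternative orders, such as omitting the fused UGI when free UGI is supplied separately, a matter of changing the arguments rather than editing sequences by hand.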

(54) Non-limiting examples of fusion proteins include those having an amino acid sequence selected from the group consisting of SEQ ID NO:16-20.

(55) TABLE-US-00006 TABLE2 AdditionalSequences Name Sequence SEQIDNO: Cas9-Nickase 1MYPYDVPDYASPKKKRKVEASDKKYSIGLAIGTNSVGWAVITDEYKVPSK 11 51KFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRIC 101YLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHE 151KYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSD 201VDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLP 251GEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLA 301QIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQ 351DLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFEIPIL 401EKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFY 451PFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE 501VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYV 551TEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEI 601SGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDRE 651MIEERLKTYAHLFDDKVMKQKLRRRYTGWGRLSRKLINGIRDKQSGKTIL 701DFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGS 751PAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRER 801MKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDI 851NRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMK 901NYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKH 951VAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINN 1001YHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIG 1051KATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDF 1101ATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKK 1151YGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPID 1201FLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPS 1251KYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRV 1301ILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTT 1351IDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSPKKKRKVEAS Uracil-DNA- 1TNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDEST 12 glycosylase 51DENVMLLTSDAPEYKPWALVIQDSNGENKIKML inhibitor(UGI) Linker1 1SGSETPGTSESATPES 13 Linker2 1SGGS 14 Nuclear 1PKKKRKV localization sequence Fusionprotein1 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 16 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP 101CFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQV 
151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNS 201GSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLG 251NTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIF 301SNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIY 351HLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFI 401QLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG 451LFGNLIALSLSLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQY 501ADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLK 551ALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEYFKFIKPILEKMDGT 601EELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDN 651REKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA 701SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRK 751PAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDR 801FNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERL 851KTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSD 901GFANRNFMQLIHDDSLTFKEIDQKAQVSGQGDSLHEHIANLAGSPAIKKG 951ILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEE 1001GIKELGSQILHEKPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY 1051DVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQL 1101LNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILD 1151SRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHD 1201AYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKY 1251FFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKV 1301LSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDS 1351PTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKG 1401YKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFL 1451YLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADAN 1501LDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRY 1551TSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQ 1601LVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEY 1651KPWALVIQDSNGENKIKMLSGGSPKKKRKV Fusionprotein2 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 17 (Y130F) 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP 101CFSWGCAGEVRAFLQENTHVRLRIFAARIFDYDPLYKEALQMLRDAGAQV 151SIMTYDEFKHVWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNS 201GSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLG 
251NTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIF 301SNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIY 351HLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFI 401QLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG 451LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQY 501ADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLK 551ALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKIPLEKMDGT 601EELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDN 651REKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA 701SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRK 751PAFLSGEQKKQIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDR 801FNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERL 851KTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSD 901GFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKG 951ILQTVKVVDEKVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEE 1001GIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY 1051DVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQL 1101LNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILD 1151SRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHD 1201AYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKY 1251FFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKV 1301LSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDS 1351PTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKG 1401YKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSLYVNFL 1451YLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADAN 1501LDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRY 1551TSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQ 1601LVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEY 1651KPWALVIQDSNGENKIKMLSGGSPKKKRKV Fusionprotein3 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 18 (Y132D) 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP 101CFSWGCAGEVRAFLQENTHVRLRIFAARIYDDDPLYKEALQMLRDAGAQV 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNS 201GSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLG 251NTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIF 301SNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIY 
351HLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFI 401QLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG 451LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQY 501ADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLK 551ALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGT 601EELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDN 651REKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA 701SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRK 751PAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDR 801FNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERL 851KTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSD 901GFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKG 951ILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEE 1001GIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY 1051DVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQL 1101LNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILD 1151SRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHD 1201AYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKY 1251FFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKV 1301LSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDS 1351PTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKG 1401YKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFL 1451YLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADAN 1501LDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRY 1551TSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQ 1601LVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEY 1651KPWALVIQDSNGENKIKMLSGGSPKKKRKV Fusionprotein4 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 19 (W104A) 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP 101CFSAGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQV 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNS 201GSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLG 251NTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIF 301SNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIY 351HLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFI 401QLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG 
451LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQY 501ADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLK 551ALVRQQLPEKYEIKFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGT 601EELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDN 651REKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA 701SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRK 751PAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDR 801FNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERL 851KTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSD 901GFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKG 951ILQTVKVVDELVKVMGHRKPENIVIEMARENQTTQKGQKNSRERMKRIEE 1001GIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY 1051DVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQL 1101LNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILD 1151SRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHD 1201AYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKY 1251FFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKV 1301LSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDS 1351PTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKG 1401YKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFL 1451YLASHYEKLKGSPENDEQKQLFVEQHKHYLDEIIEQISEFSKRVILADAN 1501LDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRY 1551TSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQ 1601LVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEY 1651KPWALVIQDSNGENKIKMLSGGSPKKKRKV Fusionprotein5 1MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQ 20 51HRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSP 101CFSWGCAGEVRAFLQENTHVRLRIFAARIYYYDPLYKEALQMLRDAGAQV 151SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNS 201GSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLG 251NTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIF 301SNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIY 351HLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFI 401QLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNG 451LFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQY 501ADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLK 
551ALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGT 601EELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDN 651REKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGA 701SAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRK 751PAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDR 801FNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERL 851KTYAHLFDDKVMKQLKRRRYTGWGRLSRKPINGIRDKQSGKTILDFLKSD 901GFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKG 951ILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEE 1001GIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDY 1051DVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQL 1101LNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILD 1151SRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHD 1201AYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKY 1251FFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKV 1301LSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDS 1351PTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKG 1401YKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFL 1451YLASHYELKLGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADAN 1501LDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRY 1551TSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQ 1601LVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEY 1651KPWALVIQDSNGENKIKMLSGGSPKKKRKV DNAconstruct 1Atatgccaagtacgccccctattgacgtcaatgacggtaaatggcccgcc 21 51tggcattatgcccagtacatgaccttatgggactttcctacttggcagta 101catctacgtattagtcatcgctattaccatggtgatgcggttttggcagt 151acatcaatgggcgtggatagcggtttgactcacggggatttccaagtctc 201caccccattgacgtcaatgggagtttgttttggcaccaaaatcaacggga 251ctttccaaaatgtcgtaacaactccgccccattgacgcaaatgggcggta 301ggcgtgtacggtgggaggtctatataagcagagctggtttagtgaaccgt 351cagatccgctagagatccgcggccgctaatacgactcactatagggagag 401ccgccaccatggaagccagcccagcatccgggcccagacacttgatggat 451ccacacatattcacttccaactttaacaatggcattggaaggcataagac 501ctacctgtgctacgaagtggagcgcctggacaatggcacctcggtcaaga 551tggaccagcacaggggctttctacacaaccaggctaagaatcttctctgt 601ggcttttacggccgccatgcggagctgcgcttcttggacctggttccttc 
651tttgcagttggacccggcccagatctacagggtcacttggttcatctcct 701ggagcccctgcttctcctggggctgtgccggggaagtgcgtgcgttcctt 751caggagaacacacacgtgagactgcgtatcttcgctgcccgcatctatga 801ttacgaccccctatataaggaggcactgcaaatgctgcgggatgctgggg 851cccaagtctccatcatgacctacgatgaatttaagcactgctgggacacc 901tttgtggaccaccagggatgtcccttccagccctgggatggactagatga 951gcacagccaagccctgagtgggaggctgcgggccattctccagaatcagg 1001gaaacagcggcagcgagactcccgggacctcagagtccgccacacccgaa 1051agtgataaaaagtattctattggtttagccatcggcactaattccgttgg 1101atgggctgtcataaccgatgaatacaaagtaccttcaaagaaatttaagg 1151tgttggggaacacagaccgtcattcgattaaaaagaatcttatcggtgcc 1201ctcctattcgatagtggcgaaacggcagaggcgactcgcctgaaacgaac 1251cgctcggagaaggtatacacgtcgcaagaaccgaatatgttacttacaag 1301aaatttttagcaatgagatggccaaagttgacgattctttctttcaccgt 1351ttggaagagtccttccttgtcgaagaggacaagaaacatgaacggcaccc 1401catctttggaaacatagtagatgaggtggcatatcatgaaaagtacccaa 1451cgatttatcacctcagaaaaaagctagttgactcaactgataaagcggac 1501ctgaggttaatctacttggctcttgcccatatgataaagttccgtgggca 1551ctttctcattgagggtgatctaaatccggacaactcggatgtcgacaaac 1601tgttcatccagttagtacaaacctataatcagttgtttgaagagaaccct 1651ataaatgcaagtggcgtggatgcgaaggctattcttagcgcccgcctctc 1701taaatcccgacggctagaaaacctgatcgcacaattacccggagagaaga 1751aaaatgggttgttcggtaaccttatagcgctctcactaggcctgacacca 1801aattttaagtcgaacttcgacttagctgaagatgccaaattgcagcttag 1851taaggacacgtacgatgacgatctcgacaatctactggcacaaattggag 1901atcagtatgcggacttatttttggctgccaaaaaccttagcgatgcaatc 1951ctcctatctgacatactgagagttaatactgagattaccaaggcgccgtt 2001atccgcttcaatgatcaaaaggtacgatgaacatcaccaagacttgacac 2051ttctcaaggccctagtccgtcagcaactgcctgagaaatataaggaaata 2101ttctttgatcagtcgaaaaacgggtacgcaggttatattgacggcggagc 2151gagtcaagaggaattctacaagtttatcaaacccatattagagaagatgg 2201atgggacggaagagttgcttgtaaaactcaatcgcgaagatctactgcga 2251aagcagcggactttcgacaacggtagcattccacatcaaatccacttagg 2301cgaattgcatgctatacttagaaggcaggaggatttttatccgttcctca 2351aagacaatcgtgaaaagattgagaaaatcctaacctttcgcataccttac 2401tatgtgggacccctggcccgagggaactctcggttcgcatggatgacaag 
2451aaagtccgaagaaacgattactccatggaattttgaggaagttgtcgata 2501aaggtgcgtcagctcaatcgttcatcgagaggatgaccaactttgaccag 2551aatttaccgaacgaaaaagtattgcctaagcacagtttactttacgagta 2601tttcacagtgtacaatgaactcacgaaagttaagtatgtcactgagggca 2651tgcgtaaacccgcctttctaagcggagaacagaagaaagcaatagtagat 2701ctgttattcaagaccaaccgcaaagtgacagttaagcaattgaaagagga 2751ctactttaagaaaattgaatgcttcgattctgtcgagatctccggggtag 2801aagatcgatttaatgcgtcacttggtacgtatcatgacctcctaaagata 2851attaaagataaggacttcctggataacgaagagaatgaagatatcttaga 2901agatatagtgttgactcttaccctctttgaagatcgggaaatgattgagg 2951aaagactaaaaacatacgctcacctgttcgacgataaggttatgaaacag 3001ttaaagaggcgtcgctatacgggctggggacgattgtcgcggaaacttat 3051caacgggataagagacaagcaaagtggtaaaactattctcgattttctaa 3101agagcgacggcttcgccaataggaactttatgcagctgatccatgatgac 3151tctttaaccttcaaagaggatatacaaaaggcacaggtttccggacaagg 3201ggactcattgcacgaacatattgcgaatcttgctggttcgccagccatca 3251aaaagggcatactccagacagtcaaagtagtggatgagctagttaaggtc 3301atgggacgtcacaaaccggaaaacattgtaatcgagatggcacgcgaaaa 3351tcaaacgactcagaaggggcaaaaaaacagtcgagagcggatgaagagaa 3401tagaagagggtattaaagaactgggcagccagatcttaaaggagcatcct 3451gtggaaaatacccaattgcagaacgagaaactttacctctattacctaca 3501aaatggaagggacatgtatgttgatcaggaactggacataaaccgtttat 3551ctgattacgacgtcgatcacattgtaccccaatcctttttgaaggacgat 3601tcaatcgacaataaagtgcttacacgctcggataagaaccgagggaaaag 3651tgacaatgttccaagcgaggaagtcgtaaagaaaatgaagaactattggc 3701ggcagctcctaaatgcgaaactgataacgcaaagaaagttcgataactta 3751actaaagctgagaggggtggcttgtctgaacttgacaaggccggatttat 3801taaacgtcagctcgtggaaacccgccaaatcacaaagcatgttgcacaga 3851tactagattcccgaatgaatacgaaatacgacgagaacgataagctgatt 3901cgggaagtcaaagtaatcactttaaagtcaaaattggtgtcggacttcag 3951aaaggattttcaattctataaagttagggagataaataactaccaccatg 4001cgcacgacgcttatcttaatgccgtcgtagggaccgcactcattaagaaa 4051tacccgaagctagaaagtgagtttgtgtatggtgattacaaagtttatga 4101cgtccgtaagatgatcgcgaaaagcgaacaggagataggcaaggctacag 4151ccaaatacttcttttattctaacattatgaatttctttaagacggaaatc 4201actctggcaaacggagagatacgcaaacgacctttaattgaaaccaatgg 
4251ggagacaggtgaaatcgtatgggataagggccgggacttcgcgacggtga 4301gaaaagttttgtccatgccccaagtcaacatagtaaagaaaactgaggtg 4351cagaccggagggttttcaaaggaatcgattcttccaaaaaggaatagtga 4401taagctcatcgctcgtaaaaaggactgggacccgaaaaagtacggtggct 4451tcgatagccctacagttgcctattctgtcctagtagtggcaaaagttgag 4501aagggaaaatccaagaaactgaagtcagtcaaagaattattggggataac 4551gattatggagcgctcgtcttttgaaaagaaccccatcgacttccttgagg 4601cgaaaggttacaaggaagtaaaaaaggatctcataattaaactaccaaag 4651tatagtctgtttgagttagaaaatggccgaaaacggatgttggctagcgc 4701cggagagcttcaaaaggggaacgaactcgcactaccgtctaaatacgtga 4751atttcctgtatttagcgtcccattacgagaagttgaaaggttcacctgaa 4801gataacgaacagaagcaactttttgttgagcagcacaaacattatctcga 4851cgaaatcatagagcaaatttcggaattcagtaagagagtcatcctagctg 4901atgccaatctggacaaagtattaagcgcatacaacaagcacagggataaa 4951cccatacgtgagcaggcggaaaatattatccatttgtttactcttaccaa 5001cctcggcgctccagccgcattcaagtattttgacacaacggattcaccaa 5051aacgatacacttctaccaaggaggtgctagacgcgacactgattcaccaa 5101tccatcacgggattatatgaaactcggatagatttgtcacagcttggggg 5151tgactctggtggttctactaatctgtcagatattattgaaaaggagaccg 5201gtaagcaactggttatccaggaatccatcctcatgctcccagaggaggtg 5251gaagaagtcattgggaacaagccggaaagcgatatactcgtgcacaccgc 5301ctacgacgagagcaccgacgagaatgtcatgcttctgactagcgacgccc 5351ctgaatacaagccttgggctctggtcatacaggatagcaacggtgagaac 5401aagattaagatgctctctggtggttctcccaagaagaagaggaaagtcta 5451accggtcatcatcaccatcaccattgagtttaaacccgctgatcagcctc 5501gactgtgccttctagttgccagccatctgttgtttgcccctcccccgtgc 5551cttccttgaccctggaaggtgccactcccactgtcctttcctaataaaat 5601gaggaaattgcatcgcattgtctgagtaggtgtcattctattctgggggg 5651tggggtggggcaggacagcaagggggaggattgggaagacaatagcaggc 5701atgctggggatgcggtgggctctatggcttctgaggcggaaagaaccagc 5751tggggctcgataccgtcgacctctagctagagcttggcgtaatcatggtc 5801atagctgtttcctgtgtgaaattgttatccgctcacaattccacacaaca 5851tacgagccggaagcataaagtgtaaagcctagggtgcctaatgagtgagc 5901taactcacattaattgcgttgcgctcactgcccgctttccagtcgggaaa 5951cctgtcgtgccagctgcattaatgaatcggccaacgcgcggggagaggcg 6001gtttgcgtattgggcgctcttccgcttcctcgctcactgactcgctgcgc 
6051tcggtcgttcggctgcggcgagcggtatcagctcactcaaaggcggtaat 6101acggttatccacagaatcaggggataacgcaggaaagaacatgtgagcaa 6151aaggccagcaaaaggccaggaaccgtaaaaaggccgcgttgctggcgttt 6201ttccataggctccgcccccctgacgagcatcacaaaaatcgacgctcaag 6251tcagaggtggcgaaacccgacaggactataaagataccaggcgtttcccc 6301ctggaagctccctcgtgcgctctcctgttccgaccctgccgcttaccgga 6351tacctgtccgcctttctcccttcgggaagcgtggcgctttctcatagctc 6401acgctgtaggtatctcagttcggtgtaggtcgttcgctccaagctgggct 6451gtgtgcacgaaccccccgttcagcccgaccgctgcgccttatccggtaac 6501tatcgtcttgagtccaacccggtaagacacgacttatcgccactggcagc 6551agccactggtaacaggattagcagagcgaggtatgtaggcggtgctacag 6601agttcttgaagtggtggcctaactacggctacactagaagaacagtattt 6651ggtatctgcgctctgctgaagccagttaccttcggaaaaagagttggtag 6701ctcttgatccggcaaacaaaccaccgctggtagcggtggtttttttgttt 6751gcaagcagcagattacgcgcagaaaaaaaggatctcaagaagatcctttg 6801atcttttctacggggtctgacgctcagtggaacgaaaactcacgttaagg 6851gattttggtcatgagattatcaaaaaggatcttcacctagatccttttaa 6901attaaaaatgaagttttaaatcaatctaaagtatatatgagtaaacttgg 6951tctgacagttaccaatgcttaatcagtgaggcacctatctcagcgatctg 7001tctatttcgttcatccatagttgcctgactccccgtcgtgtagataacta 7051cgatacgggagggcttaccatctggccccagtgctgcaatgataccgcga 7101gacccacgctcaccggctccagatttatcagcaataaaccagccagccgg 7151aagggccgagcgcagaagtggtcctgcaactttatccgcctccatccagt 7201ctattaattgttgccgggaagctagagtaagtagttcgccagttaatagt 7251ttgcgcaacgttgttgccattgctacaggcatcgtggtgtcacgctcgtc 7301gtttggtatggcttcattcagctccggttcccaacgatcaaggcgagtta 7351catgatcccccatgttgtgcaaaaaagcggttagctccttcggtcctccg 7401atcgttgtcagaagtaagttggccgcagtgttatcactcatggttatggc 7451agcactgcataattctcttactgtcatgccatccgtaagatgcttttctg 7501tgactggtgagtactcaaccaagtcattctgagaatagtgtatgcggcga 7551ccgagttgctcttgcccggcgtcaatacgggataataccgcgccacatag 7601cagaactttaaaagtgctcatcattggaaaacgttcttcggggcgaaaac 7651tctcaaggatcttaccgctgttgagatccagttcgatgtaacccactcgt 7701gcacccaactgatcttcagcatcttttactttcaccagcgtttctgggtg 7751agcaaaaacaggaaggcaaaatgccgcaaaaaagggaataagggcgacac 7801ggaaatgttgaatactcatactcttcctttttcaatattattgaagcatt 
7851tatcagggttattgtctcatgagcggatacatatttgaatgtatttagaa 7901aaataaacaaataggggttccgcgcacatttccccgaaaagtgccacctg 7951acgtcgacggatcgggagatcgatctcccgatcccctagggtcgactctc 8001agtacaatctgctctgatgccgcatagttaagccagtatctgctccctgc 8051ttgtgtgttggaggtcgctgagtagtgcgcgagcaaaatttaagctacaa 8101caaggcaaggcttgaccgacaattgcatgaagaatctgcttagggttagg 8151cgttttgcgctgcttcgcgatgtacgggccagatatacgcgttgacattg 8201attattgactagttattaatagtaatcaattacggggtcattagttcata 8251gcccatatatggagttccgcgttacataacttacggtaaatggcccgcct 8301ggctgaccgcccaacgacccccgcccattgacgtcaataatgacgtatgt 8351tcccatagtaacgccaatagggactttccattgacgtcaatgggtggagt 8401atttacggtaaactgcccacttggcagtacatcaagtgtatc Lb-dCas12a 1MSKLEKFTNCYSLSKTLRFKAIPVGKTQENIDNKRLLVEDEKRAEDYKGV 37 51KKLLDRYYLSFINDVLHSIKLKNLNNYISLFRKKTRTEKENKELENLEIN 101LRKEIAKAFKGNEGYKSLFKKDIIETILPEFLDDKDEIALVNSFNGFTTA 151FTGFFDNRENMFSEEAKSTSIAFRCINENLTRYISNMDIFEKVDAIFDKH 201EVQEIKEKILNSDYDVEDFFEGEFFNFVLTQEGIDVYNAIIGGFVTESGE 251KIKGLNEYINLYNQKTKQKLPKFKPLYKQVLSDRESLSFYGEGYTSDEEV 301LEVFRNTLNKNSEIFSSIKKLEKLFKNFDEYSSAGIFVKNGPAISTISKD 351IFGEWNVIRDKWNAEYDDIHLKKKAVVTEKYEDDRRKSFKKIGSFSLEQL 401QEYADADLSVVEKLKEIIIQKVDEIYKVYGSSEKLFDADFVLEKSLKKND 451AVVAIMKDLLDSVKSFENYIKAFFGEGKETNRDESFYGDFVLAYDILLKV 501DHIYDAIRNYVTQKPYSKDKFKLYFQNPQFMGGWDKDKETDYRATILRYG 551SKYYLAIMDKKYAKCLQKIDKDDVNGNYEKINYKLLPGPNKMLPKVFFSK 601KWMAYYNPSEDIQKIYKNGTFKKGDMFNLNDCHKLIDFFKDSISRYPKWS 651NAYDFNFSETEKYKDIAGFYREVEEQGYKVSFESASKKEVDKLVEEGKLY 701MFQIYNKDFSDKSHGTPNLHTMYFKLLFDENNHGQIRLSGGAELFMRRAS 751LKKEELVVHPANSPIANKNPDNPKKTTTLSYDVYKDKRFSEDQYELHIPI 801AINKCPKNIFKINTEVRVLLKHDDNPYVIGIARGERNLLYIVVVDGKGNI 851VEQYSLNEIINNFNGIRIKTDYHSLLDKKEKERFEARQNWTSIENIKELK 901AGYISQVVHKICELVEKYDAVIALADLNSGFKNSRVKVEKQVYQKFEKML 951IDKLNYMVDKKSNPCATGGALKGYQITNKFESFKSMSTQNGFIFYIPAWL 1001TSKIDPSTGFVNLLKTKYTSIADSKKFISSFDRIMYVPEEDLFEFALDYK 1051NFSRTDADYIKKWKLYSYGNRIRIFRNPKKNNVFDWEEVCLTSAYKELFN 1101KYGINYQQGDIRALLCEQSDKAFYSSFMALMSLMLQMRNSITGRTDVAFL 1151ISPVKNSDGIFYDSRNYEAQENAILPKNADANGAYNIARKVLWAIGQFKK 1201AEDEKLDKVKIAISNKEWLEYAQTSVKHGS 
AsCas12a 1MTQFEGFTNLYQVSKTLRFELIPQGKTLKHIQEQGFIEEDKARNDHYKEL 38 51KPIIDRIYKTYADQCLQLVQLDWENLSAAIDSYRKEKTEETRNALIEEQA 101TYRNAIHDYFIGRTDNLTDAINKRHAEIYKGLFKAELFNGKVLKQLGTVT 151TTEHENALLRSFDKFTTYFSGFYENRKNVFSAEDISTAIPHRIVQDNFPK 201FKENCHIFTRLITAVPSLREHFENVKKAIGIFVSTSIEEVFSFPFYNQLL 251TQTQIDLYNQLLGGISREAGTEKIKGLNEVLNLAIQKNDETAHIIASLPH 301RFIPLFKQILSDRNTLSFILEEFKSDEEVIQSFCKYKTLLRNENVLETAE 351ALFNELNSIDLTHIFISHKKLETISSALCDHWDTLRNALYERRISELTGK 401ITKSAKEKVQRSLKHEDINLQEIISAAGKELSEAFKQKTSEILSHAHAAL 451DQPLPTTLKKQEEKEILKSQLDSLLGLYHLLDWFAVDESNEVDPEFSARL 501TGIKLEMEPSLSFYNKARNYATKKPYSVEKFKLNFQMPTLASGWDVNKEK 551NNGAILFVKNGLYYLGIMPKQKGRYKALSFEPTEKTSEGFDKMYYDYFPD 601AAKMIPKCSTQLKAVTAHFQTHTTPILLSNNFIEPLEITKEIYDLNNPEK 651EPKKFQTAYAKKTGDQKGYREALCKWIDFTRDFLSKYTKTTSIDLSSLRP 701SSQYKDLGEYYAELNPLLYHISFQRIAEKEIMDAVETGKLYLFQIYNKDF 751AKGHHGKPNLHTLYWTGLFSPENLAKTSIKLNGQAELFYRPKSRMKRMAH 801RLGEKMLNKKLKDQKTPIPDTLYQELYDYVNHRLSHDLSDEARALLPNVI 851TKEVSHEIIKDRRFTSDKFFFHVPITLNYQAANSPSKFNQRVNAYLKEHP 901ETPIIGIDRGERNLIYITVIDSTGKILEQRSLNTIQQFDYQKKLDNREKE 951RVAARQAWSVVGTIKDLKQGYLSQVIHEIVDLMIHYQAVVVLENLNFGFK 1001SKRTGIAEKAVYQQFEKMLIDKLNCLVLKDYPAEKVGGVLNPYQLTDQFT 1051SFAKMGTQSGFLFYVPAPYTSKIDPLTGFVDPFVWKTIKNHESRKHFLEG 1101FDFLHYDVKTGDFILHFKMNRNLSFQRGLPGFMPAWDIVFEKNETQFDAK 1151GTPFIAGKRIVPVIENHRFTGRYRDLYPANELIALLEEKGIVFRDGSNIL 1201PKLLENDDSHAIDTMVALIRSVLQMRNSNAATGEDYINSPVRDLNGVCFD 1251SRFQNPEWPMDADANGAYHIALKGQLLLNHLKESKDLKLQNGISNQDWLA 1301YIQELRN FnCas12a 1MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKA 39 51KQIIDKYHQFFIEEILSSVCISEDLLQNYSDVYFKLKKSDDDNLQKDFKS 101AKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLILWLKQSKDNGI 151ELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSNDIPTSII 201YRIVDDNLPKFLENKAKYESLKDKAPEAINYEQIKKDLAEELTFDIDYKT 251SEVNQRVFSLDEVFEIANFNNYLNQSGITKFNTIIGGKFVNGENTKRKGI 301NEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVT 351TMQSFYEQIAAFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLT 401DLSQQVFDDYSVIGTAVLEYITQQIAPKNLDNPSKKEQELIAKKTEKAKY 451LSLETIKLALEEFNKHRDIDKQCRFEEILANFAAIPMIFDEIAQNKDNLA 
501QISIKYQNQGKKDLLQASAEDDVKAIKDLLDQTNNLLHKLKIFHISQSED 551KANILDKDEHFYLVFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNF 601ENSTLANGWDKNKEPDNTAILFIKDDKYYLGVMNKKNNKIFDDKAIKENK 651GEGYKKIVYKLLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKN 701GSPQKGYEKFEFNIEDCRKFIDFYKQSISKHPEWKDFGFRFSDTQRYNSI 751DEFYREVENQGYKLTFENISESYIDSVVNQGKLYLFQIYNKDFSAYSKGR 801PNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKKITHPAKEAIA 851NKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEI 901NLLLKEKANDVHILSIDRGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMK 951TNYHDKLAAIEKDRDSARKDWKKINNIKEMKEGYLSQVVHEIAKLVIEYN 1001AIVVFEDLNFGFKRGRFKVEKQVYQKLEKMLIEKLNYLVFKDNEFDKTGG 1051VLRAYQLTAPFETFKKMGKQTGIIYYVPAGFTSKICPVTGFVNQLYPKYE 1101SVSKSQEFFSKFDKICYNLDKGYFEFSFDYKNFGDKAAKGKWTIASFGSR 1151LINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEYGHGECIKAAICGESD 1201KKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNM 1251PQDADANGAYHIGLKGLMLLGRIKNNQEGKKLNLVIKNEEYFEFVQNRNN dCas12a-hA3A-BE 1MPKKKRKVMEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDN 40 51GTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRV 101TWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQM 151LRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRA 201ILQNQGNSGSETPGTSESATPESMSKLEKFTNCYSLSKTLRFKAIPVGKT 251QENIDNKRLLVEDEKRAEDYKGVKKLLDRYYLSFINDVLHSIKLKNLNNY 301ISLFRKKTRTEKENKELENLEINLRKEIAKAFKGNEGYKSLFKKDIIETI 351LPEFLDDKDEIALVNSFNGFTTAFTGFFDNRENMFSEEAKSTSIAFRCIN 401ENLTRYISNMDIFEKVDAIFDKHEVQEIKEKILNSDYDVEDFFEGEFFNF 451VLTQEGIDVYNAIIGGFVTESGEKIKGLNEYINLYNQKTKQKLPKFKPLY 501KQVLSDRESLSFYGEGYTSDEEVLEVFRNTLNKNSEIFSSIKKLEKLFKN 551FDEYSSAGIFVKNGPAISTISKDIFGEWNVIRDKWNAEYDDIHLKKKAVV 601TEKYEDDRRKSFKKIGSFSLEQLQEYADADLSVVEKLKEIIIQKVDEIYK 651VYGSSEKLFDADFVLEKSLKKNDAVVAIMKDLLDSVKSFENYIKAFFGEG 701KETNRDESFYGDFVLAYDILLKVDHIYDAIRNYVTQKPYSKDKFKLYFQN 751PQFMGGWDKDKETDYRATILRYGSKYYLAIMDKKYAKCLQKIDKDDVNGN 801YEKINYKLLPGPNKMLPKVFFSKKWMAYYNPSEDIQKIYKNGTFKKGDMF 851NLNDCHKLIDFFKDSISRYPKWSNAYDFNFSETEKYKDIAGFYREVEEQG 901YKVSFESASKKEVDKLVEEGKLYMFQIYNKDFSDKSHGTPNLHTMYFKLL 951FDENNHGQIRLSGGAELFMRRASLKKEELVVHPANSPIANKNPDNPKKTT 
1001TLSYDVYKDKRFSEDQYELHIPIAINKCPKNIFKINTEVRVLLKHDDNPY 1051VIGIARGERNLLYIVVVDGKGNIVEQYSLNEIINNFNGIRIKTDYHSLLD 1101KKEKERFEARQNWTSIENIKELKAGYISQVVHKICELVEKYDAVIALADL 1151NSGFKNSRVKVEKQVYQKFEKMLIDKLNYMVDKKSNPCATGGALKGYQIT 1201NKFESFKSMSTQNGFIFYIPAWLTSKIDPSTGFVNLLKTKYTSIADSKKF 1251ISSFDRIMYVPEEDLFEFALDYKNFSRTDADYIKKWKLYSYGNRIRIFRN 1301PKKNNVFDWEEVCLTSAYKELFNKYGINYQQGDIRALLCEQSDKAFYSSF 1351MALMSLMLQMRNSITGRTDVAFLISPVKNSDGIFYDSRNYEAQENAILPK 1401NADANGAYNIARKVLWAIGQFKKAEDEKLDKVKIAISNKEWLEYAQTSVK 1451HGSPKKKRKVSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKP 1501ESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG 1551SPKKKRKV dCas12a-hA3A- 1MPKKKRKVMEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDN 41 BE-W98Y 51GTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRV 101TWFISYSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKRALQM 151LRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRA 201ILQNQGNSGSETPGTSESATPESMSKLEKFTNCYSLSKTLRFKAIPVGKT 251QENIDNKRLLVEDEKRAEDYKGVKKLLDRYYLSFINDVLHSIKLKNLNNY 301ISLFRKKTRTEKENKELENLEINLRKEIAKAFKGNEGYKSLFKKDIIETI 351LPEFLDDKDEIALVNSFNGFTTAFTGFFDNRENMFSEEAKSTSIAFRCIN 401ENLTRYISNMDIFEKVDAIFDKHEVQEIKEKILNSDYDVEDFFEGEFFNF 451VLTQEGIDVYNAIIGGFVTESGEKIKGLNEYINLYNQKTKQKLPKFKPLY 501KQVLSDRESLSFYGEGYTSDEEVLEVFRNTLNKNSEIFSSIKKLEKLFKN 551FDEYSSAGIFVKNGPAISTISKDIFGEWNVIRDKWNAEYDDIHLKKKAVV 601TEKYEDDRRKSFKKIGSFSLEQLQEYADADLSVVEKLKEIIIQKVDEIYK 651VYGSSEKLFDADFVLEKSLKKNDAVVAIMKDLLDSVKSFENYIKAFFGEG 701KETNRDESFYGDFVLAYDILLKVDHIYDAIRNYVTQKPYSKDKFKLYFQN 751PQFMGGWDKDKETDYRATILRYGSKYYLAIMDKKYAKCLQKIDKDDVNGN 801YEKINYKLLPGPNKMLPKVFFSKKWMAYYNPSEDIQKIYKNGTFKKGDMF 851NLNDCHKLIDFFKDSISRYPKWSNAYDFNFSETEKYKDIAGFYREVEEQG 901YKVSFESASKKEVDKLVEEGKLYMFQIYNKDFSDKSHGTPNLHTMYFKLL 951FDENNHGQIRLSGGAELFMRRASLKKEELVVHPANSPIANKNPDNPKKTT 1001TLSYDVYKDKRFSEDQYELHIPIAINKCPKNIFKINTEVRVLLKHDDNPY 1051VIGIARGERNLLYIVVVDGKGNIVEQYSLNEIINNFNGIRIKTDYHSLLD 1101KKEKERFEARQNWTSIENIKELKAGYISQVVHKICELVEKYDAVIALADL 1151NSGFKNSRVKVEKQVYQKFEKMLIDKLNYMVDKKSNPCATGGALKGYQIT 1201NKFESFKSMSTQNGFIFYIPAWLTSKIDPSTGFVNLLKTKYTSIADSKKF 
1251ISSFDRIMYVPEEDLFEFALDYKNFSRTDADYIKKWKLYSYGNRIRIFRN 1301PKKNNVFDWEEVCLTSAYKELFNKYGINYQQGDIRALLCEQSDKAFYSSF 1351MALMSLMLQMRNSITGRTDVAFLISPVKNSDGIFYDSRNYEAQENAILPK 1401NADANGAYNIARKVLWAIGQFKKAEDEKLDKVKIAISNKEWLEYAQTSVK 1451HGSPKKKRKVSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKP 1501ESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG 1551SPKKKRKV dCas12a-hA3A- 1MPKKKRKVMEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDN 42 BE-W104A 51GTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRV 101TWFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKRALQM 151LRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRA 201ILQNQGNSGSETPGTSESATPESMSKLEKFTNCYSLSKTLRFKAIPVGKT 251QENIDNKRLLVEDEKRAEDYKGVKKLLDRYYLSFINDVLHSIKLKNLNNY 301ISLFRKKTRTEKENKELENLEINLRKEIAKAFKGNEGYKSLFKKDIIETI 351LPEFLDDKDEIALVNSFNGFTTAFTGFFDNRENMFSEEAKSTSIAFRCIN 401ENLTRYISNMDIFEKVDAIFDKHEVQEIKEKILNSDYDVEDFFEGEFFNF 451VLTQEGIDVYNAIIGGFVTESGEKIKGLNEYINLYNQKTKQKLPKFKPLY 501KQVLSDRESLSFYGEGYTSDEEVLEVFRNTLNKNSEIFSSIKKLEKLFKN 551FDEYSSAGIFVKNGPAISTISKDIFGEWNVIRDKWNAEYDDIHLKKKAVV 601TEKYEDDRRKSFKKIGSFSLEQLQEYADADLSVVEKLKEIIIQKVDEIYK 651VYGSSEKLFDADFVLEKSLKKNDAVVAIMKDLLDSVKSFENYIKAFFGEG 701KETNRDESFYGDFVLAYDILLKVDHIYDAIRNYVTQKPYSKDKFKLYFQN 751PQFMGGWDKDKETDYRATILRYGSKYYLAIMDKKYAKCLQKIDKDDVNGN 801YEKINYKLLPGPNKMLPKVFFSKKWMAYYNPSEDIQKIYKNGTFKKGDMF 851NLNDCHKLIDFFKDSISRYPKWSNAYDFNFSETEKYKDIAGFYREVEEQG 901YKVSFESASKKEVDKLVEEGKLYMFQIYNKDFSDKSHGTPNLHTMYFKLL 951FDENNHGQIRLSGGAELFMRRASLKKEELVVHPANSPIANKNPDNPKKTT 1001TLSYDVYKDKRFSEDQYELHIPIAINKCPKNIFKINTEVRVLLKHDDNPY 1051VIGIARGERNLLYIVVVDGKGNIVEQYSLNEIINNFNGIRIKTDYHSLLD 1101KKEKERFEARQNWTSIENIKELKAGYISQVVHKICELVEKYDAVIALADL 1151NSGFKNSRVKVEKQVYQKFEKMLIDKLNYMVDKKSNPCATGGALKGYQIT 1201NKFESFKSMSTQNGFIFYIPAWLTSKIDPSTGFVNLLKTKYTSIADSKKF 1251ISSFDRIMYVPEEDLFEFALDYKNFSRTDADYIKKWKLYSYGNRIRIFRN 1301PKKNNVFDWEEVCLTSAYKELFNKYGINYQQGDIRALLCEQSDKAFYSSF 1351MALMSLMLQMRNSITGRTDVAFLISPVKNSDGIFYDSRNYEAQENAILPK 1401NADANGAYNIARKVLWAIGQFKKAEDEKLDKVKIAISNKEWLEYAQTSVK 1451HGSPKKKRKVSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKP 
1501ESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG 1551SPKKKRKV dCas12a-hA3A- 1MPKKKRKVMEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDN 43 BE-P134Y 51GTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRV 101TWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDYLYKRALQM 151LRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRA 201ILQNQGNSGSETPGTSESATPESMSKLEKFTNCYSLSKTLRFKAIPVGKT 251QENIDNKRLLVEDEKRAEDYKGVKKLLDRYYLSFINDVLHSIKLKNLNNY 301ISLFRKKTRTEKENKELENLEINLRKEIAKAFKGNEGYKSLFKKDIIETI 351LPEFLDDKDEIALVNSFNGFTTAFTGFFDNRENMFSEEAKSTSIAFRCIN 401ENLTRYISNMDIFEKVDAIFDKHEVQEIKEKILNSDYDVEDFFEGEFFNF 451VLTQEGIDVYNAIIGGFVTESGEKIKGLNEYINLYNQKTKQKLPKFKPLY 501KQVLSDRESLSFYGEGYTSDEEVLEVFRNTLNKNSEIFSSIKKLEKLFKN 551FDEYSSAGIFVKNGPAISTISKDIFGEWNVIRDKWNAEYDDIHLKKKAVV 601TEKYEDDRRKSFKKIGSFSLEQLQEYADADLSVVEKLKEIIIQKVDEIYK 651VYGSSEKLFDADFVLEKSLKKNDAVVAIMKDLLDSVKSFENYIKAFFGEG 701KETNRDESFYGDFVLAYDILLKVDHIYDAIRNYVTQKPYSKDKFKLYFQN 751PQFMGGWDKDKETDYRATILRYGSKYYLAIMDKKYAKCLQKIDKDDVNGN 801YEKINYKLLPGPNKMLPKVFFSKKWMAYYNPSEDIQKIYKNGTFKKGDMF 851NLNDCHKLIDFFKDSISRYPKWSNAYDFNFSETEKYKDIAGFYREVEEQG 901YKVSFESASKKEVDKLVEEGKLYMFQIYNKDFSDKSHGTPNLHTMYFKLL 951FDENNHGQIRLSGGAELFMRRASLKKEELVVHPANSPIANKNPDNPKKTT 1001TLSYDVYKDKRFSEDQYELHIPIAINKCPKNIFKINTEVRVLLKHDDNPY 1051VIGIARGERNLLYIVVVDGKGNIVEQYSLNEIINNFNGIRIKTDYHSLLD 1101KKEKERFEARQNWTSIENIKELKAGYISQVVHKICELVEKYDAVIALADL 1151NSGFKNSRVKVEKQVYQKFEKMLIDKLNYMVDKKSNPCATGGALKGYQIT 1201NKFESFKSMSTQNGFIFYIPAWLTSKIDPSTGFVNLLKTKYTSIADSKKF 1251ISSFDRIMYVPEEDLFEFALDYKNFSRTDADYIKKWKLYSYGNRIRIFRN 1301PKKNNVFDWEEVCLTSAYKELFNKYGINYQQGDIRALLCEQSDKAFYSSF 1351MALMSLMLQMRNSITGRTDVAFLISPVKNSDGIFYDSRNYEAQENAILPK 1401NADANGAYNIARKVLWAIGQFKKAEDEKLDKVKIAISNKEWLEYAQTSVK 1451HGSPKKKRKVSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKP 1501ESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG 1551SPKKKRKV dCas12a-hA3A- 1MPKKKRKVMEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDN 44 BE-W98Y-W104A 51GTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRV 101TWFISYSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKRALQM 
151LRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRA 201ILQNQGNSGSETPGTSESATPESMSKLEKFTNCYSLSKTLRFKAIPVGKT 251QENIDNKRLLVEDEKRAEDYKGVKKLLDRYYLSFINDVLHSIKLKNLNNY 301ISLFRKKTRTEKENKELENLEINLRKEIAKAFKGNEGYKSLFKKDIIETI 351LPEFLDDKDEIALVNSFNGFTTAFTGFFDNRENMFSEEAKSTSIAFRCIN 401ENLTRYISNMDIFEKVDAIFDKHEVQEIKEKILNSDYDVEDFFEGEFFNF 451VLTQEGIDVYNAIIGGFVTESGEKIKGLNEYINLYNQKTKQKLPKFKPLY 501KQVLSDRESLSFYGEGYTSDEEVLEVFRNTLNKNSEIFSSIKKLEKLFKN 551FDEYSSAGIFVKNGPAISTISKDIFGEWNVIRDKWNAEYDDIHLKKKAVV 601TEKYEDDRRKSFKKIGSFSLEQLQEYADADLSVVEKLKEIIIQKVDEIYK 651VYGSSEKLFDADFVLEKSLKKNDAVVAIMKDLLDSVKSFENYIKAFFGEG 701KETNRDESFYGDFVLAYDILLKVDHIYDAIRNYVTQKPYSKDKFKLYFQN 751PQFMGGWDKDKETDYRATILRYGSKYYLAIMDKKYAKCLQKIDKDDVNGN 801YEKINYKLLPGPNKMLPKVFFSKKWMAYYNPSEDIQKIYKNGTFKKGDMF 851NLNDCHKLIDFFKDSISRYPKWSNAYDFNFSETEKYKDIAGFYREVEEQG 901YKVSFESASKKEVDKLVEEGKLYMFQIYNKDFSDKSHGTPNLHTMYFKLL 951FDENNHGQIRLSGGAELFMRRASLKKEELVVHPANSPIANKNPDNPKKTT 1001TLSYDVYKDKRFSEDQYELHIPIAINKCPKNIFKINTEVRVLLKHDDNPY 1051VIGIARGERNLLYIVVVDGKGNIVEQYSLNEIINNFNGIRIKTDYHSLLD 1101KKEKERFEARQNWTSIENIKELKAGYISQVVHKICELVEKYDAVIALADL 1151NSGFKNSRVKVEKQVYQKFEKMLIDKLNYMVDKKSNPCATGGALKGYQIT 1201NKFESFKSMSTQNGFIFYIPAWLTSKIDPSTGFVNLLKTKYTSIADSKKF 1251ISSFDRIMYVPEEDLFEFALDYKNFSRTDADYIKKWKLYSYGNRIRIFRN 1301PKKNNVFDWEEVCLTSAYKELFNKYGINYQQGDIRALLCEQSDKAFYSSF 1351MALMSLMLQMRNSITGRTDVAFLISPVKNSDGIFYDSRNYEAQENAILPK 1401NADANGAYNIARKVLWAIGQFKKAEDEKLDKVKIAISNKEWLEYAQTSVK 1451HGSPKKKRKVSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKP 1501ESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG 1551SPKKKRKV dCas12a-hA3A- 1MPKKKRKVMEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDN 45 BE-W98Y-P134Y 51GTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRV 101TWFISYSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDYLYKRALQM 151LRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRA 201ILQNQGNSGSETPGTSESATPESMSKLEKFTNCYSLSKTLRFKAIPVGKT 251QENIDNKRLLVEDEKRAEDYKGVKKLLDRYYLSFINDVLHSIKLKNLNNY 301ISLFRKKTRTEKENKELENLEINLRKEIAKAFKGNEGYKSLFKKDIIETI 351LPEFLDDKDEIALVNSFNGFTTAFTGFFDNRENMFSEEAKSTSIAFRCIN 
401ENLTRYISNMDIFEKVDAIFDKHEVQEIKEKILNSDYDVEDFFEGEFFNF 451VLTQEGIDVYNAIIGGFVTESGEKIKGLNEYINLYNQKTKQKLPKFKPLY 501KQVLSDRESLSFYGEGYTSDEEVLEVFRNTLNKNSEIFSSIKKLEKLFKN 551FDEYSSAGIFVKNGPAISTISKDIFGEWNVIRDKWNAEYDDIHLKKKAVV 601TEKYEDDRRKSFKKIGSFSLEQLQEYADADLSVVEKLKEIIIQKVDEIYK 651VYGSSEKLFDADFVLEKSLKKNDAVVAIMKDLLDSVKSFENYIKAFFGEG 701KETNRDESFYGDFVLAYDILLKVDHIYDAIRNYVTQKPYSKDKFKLYFQN 751PQFMGGWDKDKETDYRATILRYGSKYYLAIMDKKYAKCLQKIDKDDVNGN 801YEKINYKLLPGPNKMLPKVFFSKKWMAYYNPSEDIQKIYKNGTFKKGDMF 851NLNDCHKLIDFFKDSISRYPKWSNAYDFNFSETEKYKDIAGFYREVEEQG 901YKVSFESASKKEVDKLVEEGKLYMFQIYNKDFSDKSHGTPNLHTMYFKLL 951FDENNHGQIRLSGGAELFMRRASLKKEELVVHPANSPIANKNPDNPKKTT 1001TLSYDVYKDKRFSEDQYELHIPIAINKCPKNIFKINTEVRVLLKHDDNPY 1051VIGIARGERNLLYIVVVDGKGNIVEQYSLNEIINNFNGIRIKTDYHSLLD 1101KKEKERFEARQNWTSIENIKELKAGYISQVVHKICELVEKYDAVIALADL 1151NSGFKNSRVKVEKQVYQKFEKMLIDKLNYMVDKKSNPCATGGALKGYQIT 1201NKFESFKSMSTQNGFIFYIPAWLTSKIDPSTGFVNLLKTKYTSIADSKKF 1251ISSFDRIMYVPEEDLFEFALDYKNFSRTDADYIKKWKLYSYGNRIRIFRN 1301PKKNNVFDWEEVCLTSAYKELFNKYGINYQQGDIRALLCEQSDKAFYSSF 1351MALMSLMLQMRNSITGRTDVAFLISPVKNSDGIFYDSRNYEAQENAILPK 1401NADANGAYNIARKVLWAIGQFKKAEDEKLDKVKIAISNKEWLEYAQTSVK 1451HGSPKKKRKVSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKP 1501ESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG 1551SPKKKRKV dCas12a-hA3A- 1MPKKKRKVMEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDN 46 BE-W104A-P134Y 51GTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRV 101TWFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYDYDYLYKRALQM 151LRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRA 201ILQNQGNSGSETPGTSESATPESMSKLEKFTNCYSLSKTLRFKAIPVGKT 251QENIDNKRLLVEDEKRAEDYKGVKKLLDRYYLSFINDVLHSIKLKNLNNY 301ISLFRKKTRTEKENKELENLEINLRKEIAKAFKGNEGYKSLFKKDIIETI 351LPEFLDDKDEIALVNSFNGFTTAFTGFFDNRENMFSEEAKSTSIAFRCIN 401ENLTRYISNMDIFEKVDAIFDKHEVQEIKEKILNSDYDVEDFFEGEFFNF 451VLTQEGIDVYNAIIGGFVTESGEKIKGLNEYINLYNQKTKQKLPKFKPLY 501KQVLSDRESLSFYGEGYTSDEEVLEVFRNTLNKNSEIFSSIKKLEKLFKN 551FDEYSSAGIFVKNGPAISTISKDIFGEWNVIRDKWNAEYDDIHLKKKAVV 601TEKYEDDRRKSFKKIGSFSLEQLQEYADADLSVVEKLKEIIIQKVDEIYK 
651VYGSSEKLFDADFVLEKSLKKNDAVVAIMKDLLDSVKSFENYIKAFFGEG 701KETNRDESFYGDFVLAYDILLKVDHIYDAIRNYVTQKPYSKDKFKLYFQN 751PQFMGGWDKDKETDYRATILRYGSKYYLAIMDKKYAKCLQKIDKDDVNGN 801YEKINYKLLPGPNKMLPKVFFSKKWMAYYNPSEDIQKIYKNGTFKKGDMF 851NLNDCHKLIDFFKDSISRYPKWSNAYDFNFSETEKYKDIAGFYREVEEQG 901YKVSFESASKKEVDKLVEEGKLYMFQIYNKDFSDKSHGTPNLHTMYFKLL 951FDENNHGQIRLSGGAELFMRRASLKKEELVVHPANSPIANKNPDNPKKTT 1001TLSYDVYKDKRFSEDQYELHIPIAINKCPKNIFKINTEVRVLLKHDDNPY 1051VIGIARGERNLLYIVVVDGKGNIVEQYSLNEIINNFNGIRIKTDYHSLLD 1101KKEKERFEARQNWTSIENIKELKAGYISQVVHKICELVEKYDAVIALADL 1151NSGFKNSRVKVEKQVYQKFEKMLIDKLNYMVDKKSNPCATGGALKGYQIT 1201NKFESFKSMSTQNGFIFYIPAWLTSKIDPSTGFVNLLKTKYTSIADSKKF 1251ISSFDRIMYVPEEDLFEFALDYKNFSRTDADYIKKWKLYSYGNRIRIFRN 1301PKKNNVFDWEEVCLTSAYKELFNKYGINYQQGDIRALLCEQSDKAFYSSF 1351MALMSLMLQMRNSITGRTDVAFLISPVKNSDGIFYDSRNYEAQENAILPK 1401NADANGAYNIARKVLWAIGQFKKAEDEKLDKVKIAISNKEWLEYAQTSVK 1451HGSPKKKRKVSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKP 1501ESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG 1551SPKKKRKV dCas12a-hA3A- 1MPKKKRKVMEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDN 47 BE-W98Y-W104A- 51GTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRV Y130F 101TWFISYSPCFSAGCAGEVRAFLQENTHVRLRIFAARIFDYDPLYKRALQM 151LRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRA 201ILQNQGNSGSETPGTSESATPESMSKLEKFTNCYSLSKTLRFKAIPVGKT 251QENIDNKRLLVEDEKRAEDYKGVKKLLDRYYLSFINDVLHSIKLKNLNNY 301ISLFRKKTRTEKENKELENLEINLRKEIAKAFKGNEGYKSLFKKDIIETI 351LPEFLDDKDEIALVNSFNGFTTAFTGFFDNRENMFSEEAKSTSIAFRCIN 401ENLTRYISNMDIFEKVDAIFDKHEVQEIKEKILNSDYDVEDFFEGEFFNF 451VLTQEGIDVYNAIIGGFVTESGEKIKGLNEYINLYNQKTKQKLPKFKPLY 501KQVLSDRESLSFYGEGYTSDEEVLEVFRNTLNKNSEIFSSIKKLEKLFKN 551FDEYSSAGIFVKNGPAISTISKDIFGEWNVIRDKWNAEYDDIHLKKKAVV 601TEKYEDDRRKSFKKIGSFSLEQLQEYADADLSVVEKLKEIIIQKVDEIYK 651VYGSSEKLFDADFVLEKSLKKNDAVVAIMKDLLDSVKSFENYIKAFFGEG 701KETNRDESFYGDFVLAYDILLKVDHIYDAIRNYVTQKPYSKDKFKLYFQN 751PQFMGGWDKDKETDYRATILRYGSKYYLAIMDKKYAKCLQKIDKDDVNGN 801YEKINYKLLPGPNKMLPKVFFSKKWMAYYNPSEDIQKIYKNGTFKKGDMF 
851NLNDCHKLIDFFKDSISRYPKWSNAYDFNFSETEKYKDIAGFYREVEEQG 901YKVSFESASKKEVDKLVEEGKLYMFQIYNKDFSDKSHGTPNLHTMYFKLL 951FDENNHGQIRLSGGAELFMRRASLKKEELVVHPANSPIANKNPDNPKKTT 1001TLSYDVYKDKRFSEDQYELHIPIAINKCPKNIFKINTEVRVLLKHDDNPY 1051VIGIARGERNLLYIVVVDGKGNIVEQYSLNEIINNFNGIRIKTDYHSLLD 1101KKEKERFEARQNWTSIENIKELKAGYISQVVHKICELVEKYDAVIALADL 1151NSGFKNSRVKVEKQVYQKFEKMLIDKLNYMVDKKSNPCATGGALKGYQIT 1201NKFESFKSMSTQNGFIFYIPAWLTSKIDPSTGFVNLLKTKYTSIADSKKF 1251ISSFDRIMYVPEEDLFEFALDYKNFSRTDADYIKKWKLYSYGNRIRIFRN 1301PKKNNVFDWEEVCLTSAYKELFNKYGINYQQGDIRALLCEQSDKAFYSSF 1351MALMSLMLQMRNSITGRTDVAFLISPVKNSDGIFYDSRNYEAQENAILPK 1401NADANGAYNIARKVLWAIGQFKKAEDEKLDKVKIAISNKEWLEYAQTSVK 1451HGSPKKKRKVSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKP 1501ESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG 1551SPKKKRKV dCas12a-hA3A-BE- 1MPKKKRKVMEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDN 48 W98Y-W104A-Y132D 51GTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRV 101TWFISYSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYDDDPLYKRALQM 151LRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRA 201ILQNQGNSGSETPGTSESATPESMSKLEKFTNCYSLSKTLRFKAIPVGKT 251QENIDNKRLLVEDEKRAEDYKGVKKLLDRYYLSFINDVLHSIKLKNLNNY 301ISLFRKKTRTEKENKELENLEINLRKEIAKAFKGNEGYKSLFKKDIIETI 351LPEFLDDKDEIALVNSFNGFTTAFTGFFDNRENMFSEEAKSTSIAFRCIN 401ENLTRYISNMDIFEKVDAIFDKHEVQEIKEKILNSDYDVEDFFEGEFFNF 451VLTQEGIDVYNAIIGGFVTESGEKIKGLNEYINLYNQKTKQKLPKFKPLY 501KQVLSDRESLSFYGEGYTSDEEVLEVFRNTLNKNSEIFSSIKKLEKLFKN 551FDEYSSAGIFVKNGPAISTISKDIFGEWNVIRDKWNAEYDDIHLKKKAVV 601TEKYEDDRRKSFKKIGSFSLEQLQEYADADLSVVEKLKEIIIQKVDEIYK 651VYGSSEKLFDADFVLEKSLKKNDAVVAIMKDLLDSVKSFENYIKAFFGEG 701KETNRDESFYGDFVLAYDILLKVDHIYDAIRNYVTQKPYSKDKFKLYFQN 751PQFMGGWDKDKETDYRATILRYGSKYYLAIMDKKYAKCLQKIDKDDVNGN 801YEKINYKLLPGPNKMLPKVFFSKKWMAYYNPSEDIQKIYKNGTFKKGDMF 851NLNDCHKLIDFFKDSISRYPKWSNAYDFNFSETEKYKDIAGFYREVEEQG 901YKVSFESASKKEVDKLVEEGKLYMFQIYNKDFSDKSHGTPNLHTMYFKLL 951FDENNHGQIRLSGGAELFMRRASLKKEELVVHPANSPIANKNPDNPKKTT 1001TLSYDVYKDKRFSEDQYELHIPIAINKCPKNIFKINTEVRVLLKHDDNPY 
1051VIGIARGERNLLYIVVVDGKGNIVEQYSLNEIINNFNGIRIKTDYHSLLD 1101KKEKERFEARQNWTSIENIKELKAGYISQVVHKICELVEKYDAVIALADL 1151NSGFKNSRVKVEKQVYQKFEKMLIDKLNYMVDKKSNPCATGGALKGYQIT 1201NKFESFKSMSTQNGFIFYIPAWLTSKIDPSTGFVNLLKTKYTSIADSKKF 1251ISSFDRIMYVPEEDLFEFALDYKNFSRTDADYIKKWKLYSYGNRIRIFRN 1301PKKNNVFDWEEVCLTSAYKELFNKYGINYQQGDIRALLCEQSDKAFYSSF 1351MALMSLMLQMRNSITGRTDVAFLISPVKNSDGIFYDSRNYEAQENAILPK 1401NADANGAYNIARKVLWAIGQFKKAEDEKLDKVKIAISNKEWLEYAQTSVK 1451HGSPKKKRKVSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKP 1501ESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG 1551SPKKKRKV dCas12a-hA3A-BE- 1MPKKKRKVMEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDN 49 W104A-Y130F-P134Y 51GTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRV 101TWFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIFDYDYLYKRALQM 151LRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRA 201ILQNQGNSGSETPGTSESATPESMSKLEKFTNCYSLSKTLRFKAIPVGKT 251QENIDNKRLLVEDEKRAEDYKGVKKLLDRYYLSFINDVLHSIKLKNLNNY 301ISLFRKKTRTEKENKELENLEINLRKEIAKAFKGNEGYKSLFKKDIIETI 351LPEFLDDKDEIALVNSFNGFTTAFTGFFDNRENMFSEEAKSTSIAFRCIN 401ENLTRYISNMDIFEKVDAIFDKHEVQEIKEKILNSDYDVEDFFEGEFFNF 451VLTQEGIDVYNAIIGGFVTESGEKIKGLNEYINLYNQKTKQKLPKFKPLY 501KQVLSDRESLSFYGEGYTSDEEVLEVFRNTLNKNSEIFSSIKKLEKLFKN 551FDEYSSAGIFVKNGPAISTISKDIFGEWNVIRDKWNAEYDDIHLKKKAVV 601TEKYEDDRRKSFKKIGSFSLEQLQEYADADLSVVEKLKEIIIQKVDEIYK 651VYGSSEKLFDADFVLEKSLKKNDAVVAIMKDLLDSVKSFENYIKAFFGEG 701KETNRDESFYGDFVLAYDILLKVDHIYDAIRNYVTQKPYSKDKFKLYFQN 751PQFMGGWDKDKETDYRATILRYGSKYYLAIMDKKYAKCLQKIDKDDVNGN 801YEKINYKLLPGPNKMLPKVFFSKKWMAYYNPSEDIQKIYKNGTFKKGDMF 851NLNDCHKLIDFFKDSISRYPKWSNAYDFNFSETEKYKDIAGFYREVEEQG 901YKVSFESASKKEVDKLVEEGKLYMFQIYNKDFSDKSHGTPNLHTMYFKLL 951FDENNHGQIRLSGGAELFMRRASLKKEELVVHPANSPIANKNPDNPKKTT 1001TLSYDVYKDKRFSEDQYELHIPIAINKCPKNIFKINTEVRVLLKHDDNPY 1051VIGIARGERNLLYIVVVDGKGNIVEQYSLNEIINNFNGIRIKTDYHSLLD 1101KKEKERFEARQNWTSIENIKELKAGYISQVVHKICELVEKYDAVIALADL 1151NSGFKNSRVKVEKQVYQKFEKMLIDKLNYMVDKKSNPCATGGALKGYQIT 1201NKFESFKSMSTQNGFIFYIPAWLTSKIDPSTGFVNLLKTKYTSIADSKKF 
1251ISSFDRIMYVPEEDLFEFALDYKNFSRTDADYIKKWKLYSYGNRIRIFRN 1301PKKNNVFDWEEVCLTSAYKELFNKYGINYQQGDIRALLCEQSDKAFYSSF 1351MALMSLMLQMRNSITGRTDVAFLISPVKNSDGIFYDSRNYEAQENAILPK 1401NADANGAYNIARKVLWAIGQFKKAEDEKLDKVKIAISNKEWLEYAQTSVK 1451HGSPKKKRKVSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKP 1501ESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG 1551SPKKKRKV dCas12a-hA3A- 1MPKKKRKVMEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDN 50 BE-W104A-Y132D- 51GTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRV P134Y 101TWFISWSPCFSAGCAGEVRAFLQENTHVRLRIFAARIYDDDYLYKRALQM 151LRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRA 201ILQNQGNSGSETPGTSESATPESMSKLEKFTNCYSLSKTLRFKAIPVGKT 251QENIDNKRLLVEDEKRAEDYKGVKKLLDRYYLSFINDVLHSIKLKNLNNY 301ISLFRKKTRTEKENKELENLEINLRKEIAKAFKGNEGYKSLFKKDIIETI 351LPEFLDDKDEIALVNSFNGFTTAFTGFFDNRENMFSEEAKSTSIAFRCIN 401ENLTRYISNMDIFEKVDAIFDKHEVQEIKEKILNSDYDVEDFFEGEFFNF 451VLTQEGIDVYNAIIGGFVTESGEKIKGLNEYINLYNQKTKQKLPKFKPLY 501KQVLSDRESLSFYGEGYTSDEEVLEVFRNTLNKNSEIFSSIKKLEKLFKN 551FDEYSSAGIFVKNGPAISTISKDIFGEWNVIRDKWNAEYDDIHLKKKAVV 601TEKYEDDRRKSFKKIGSFSLEQLQEYADADLSVVEKLKEIIIQKVDEIYK 651VYGSSEKLFDADFVLEKSLKKNDAVVAIMKDLLDSVKSFENYIKAFFGEG 701KETNRDESFYGDFVLAYDILLKVDHIYDAIRNYVTQKPYSKDKFKLYFQN 751PQFMGGWDKDKETDYRATILRYGSKYYLAIMDKKYAKCLQKIDKDDVNGN 801YEKINYKLLPGPNKMLPKVFFSKKWMAYYNPSEDIQKIYKNGTFKKGDMF 851NLNDCHKLIDFFKDSISRYPKWSNAYDFNFSETEKYKDIAGFYREVEEQG 901YKVSFESASKKEVDKLVEEGKLYMFQIYNKDFSDKSHGTPNLHTMYFKLL 951FDENNHGQIRLSGGAELFMRRASLKKEELVVHPANSPIANKNPDNPKKTT 1001TLSYDVYKDKRFSEDQYELHIPIAINKCPKNIFKINTEVRVLLKHDDNPY 1051VIGIARGERNLLYIVVVDGKGNIVEQYSLNEIINNFNGIRIKTDYHSLLD 1101KKEKERFEARQNWTSIENIKELKAGYISQVVHKICELVEKYDAVIALADL 1151NSGFKNSRVKVEKQVYQKFEKMLIDKLNYMVDKKSNPCATGGALKGYQIT 1201NKFESFKSMSTQNGFIFYIPAWLTSKIDPSTGFVNLLKTKYTSIADSKKF 1251ISSFDRIMYVPEEDLFEFALDYKNFSRTDADYIKKWKLYSYGNRIRIFRN 1301PKKNNVFDWEEVCLTSAYKELFNKYGINYQQGDIRALLCEQSDKAFYSSF 1351MALMSLMLQMRNSITGRTDVAFLISPVKNSDGIFYDSRNYEAQENAILPK 1401NADANGAYNIARKVLWAIGQFKKAEDEKLDKVKIAISNKEWLEYAQTSVK 
1451HGSPKKKRKVSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKP 1501ESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGG 1551SPKKKRKV

(56) The present disclosure also provides isolated polynucleotides or nucleic acid molecules (e.g., SEQ ID NO:21) encoding the fusion proteins of the disclosure, or variants or derivatives thereof. Methods of making fusion proteins are well known in the art and are described herein.

(57) Compositions and Methods

(58) The present disclosure also provides compositions and methods. Such compositions comprise an effective amount of a fusion protein of the disclosure and an acceptable carrier. In some embodiments, the composition further includes a guide RNA having the desired complementarity to a target DNA. Such a composition can be used for base editing in a sample.

(59) The fusion proteins and the compositions can be used for base editing. In one embodiment, a method for editing a target polynucleotide is provided, comprising contacting the target polynucleotide with a fusion protein of the present disclosure and a guide RNA having at least partial sequence complementarity to the target polynucleotide, wherein the editing comprises deamination of a cytosine (C) in the target polynucleotide.

(60) The presently disclosed fusion proteins are shown to edit cytosine at any position and in any sequence context, for example ApC, CpC, GpC, TpC, CpA, CpG and CpT. It is surprising and unexpected that these fusion proteins can edit a C in a GpC dinucleotide context, even when the C is methylated.
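The dinucleotide contexts named above (NpC for the 5' neighbor, CpN for the 3' neighbor) can be read directly off a sequence. The following is a minimal, hypothetical helper, not part of the disclosure, that lists each C in a sequence with both contexts, assuming the sequence is written 5' to 3' on the strand that carries the C. A single C can belong to a GpC and a CpG context at the same time.

```python
from typing import List, Optional, Tuple


def cytosine_contexts(seq: str, pos: int) -> Tuple[Optional[str], Optional[str]]:
    """Return the 5' (NpC) and 3' (CpN) dinucleotides around the C at 0-based pos.

    None is returned for a side that runs off the end of the sequence.
    """
    seq = seq.upper()
    if not 0 <= pos < len(seq):
        raise IndexError(f"position {pos} is outside the sequence")
    if seq[pos] != "C":
        raise ValueError(f"position {pos} is {seq[pos]!r}, not C")
    five_prime = f"{seq[pos - 1]}pC" if pos > 0 else None
    three_prime = f"Cp{seq[pos + 1]}" if pos + 1 < len(seq) else None
    return five_prime, three_prime


def list_cytosines(seq: str) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """List every C in seq with its 5' and 3' dinucleotide contexts."""
    return [(i, *cytosine_contexts(seq, i)) for i, base in enumerate(seq.upper()) if base == "C"]


if __name__ == "__main__":
    # Hypothetical sequence; the C at index 2 is in both a GpC and a CpG context.
    for pos, five, three in list_cytosines("AGCGTCCA"):
        print(pos, five, three)
```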

(61) The contacting between the fusion protein (and the guide RNA) and the target polynucleotide can take place in vitro, for example in cell culture. When the contacting takes place ex vivo or in vivo, the fusion proteins can have clinical or therapeutic significance.

EXAMPLES

Example 1: Base Editors

(62) Human apolipoprotein B mRNA editing enzyme catalytic subunit 3A (APOBEC3A, hA3A; SEQ ID NO:1) was incorporated into an expression vector that also encoded a Cas9 nickase (SEQ ID NO:11) and a uracil-DNA glycosylase inhibitor from Bacillus phage AR9 (SEQ ID NO:12). The Cas9 nickase carried an Asp10Ala (D10A) mutation, which abolished its double-strand cleavage activity while still allowing it to nick one DNA strand.

(63) The fusion vector hA3A-nCas9-UGI (hA3A-BE, SEQ ID NO:21) and an sgRNA expression vector were co-transfected into eukaryotic cells (FIG. 1A) to perform C-to-T base editing at the sgRNA target site in the genome. After PCR amplification of the targeted genomic DNA, the C-to-T base editing efficiency at the target site was determined by Sanger sequencing. As illustrated at two sgRNA target sites (sgFANCF-M-L6 and sgSITE4), co-expression of hA3A-BE and sgRNA edited the C of GpC more efficiently than co-expression of BE3 (APOBEC1-nCas9-UGI) and sgRNA (FIG. 1B, dashed box).
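Editing efficiency from a Sanger trace is often approximated from the relative peak heights at the target position. The sketch below assumes that simple ratio (T signal over total A/C/G/T signal); it is an illustration only, not the quantification procedure used in this example, and dedicated trace-decomposition tools such as EditR additionally account for background noise. The function name and the peak heights are hypothetical.

```python
from typing import Mapping


def c_to_t_fraction(peaks: Mapping[str, float]) -> float:
    """Approximate the fraction of T signal at one target C from Sanger peak heights.

    peaks maps base letters to peak heights at the position of interest.
    This is a plain ratio; no background subtraction is performed.
    """
    total = sum(peaks.get(base, 0.0) for base in "ACGT")
    if total <= 0:
        raise ValueError("no signal at this position")
    return peaks.get("T", 0.0) / total


if __name__ == "__main__":
    # Hypothetical peak heights for one edited C in a PCR amplicon.
    print(f"{c_to_t_fraction({'C': 620.0, 'T': 380.0, 'A': 0.0, 'G': 0.0}):.2f}")
```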

(64) Next, the mutations Y130F (SEQ ID NO:2) and Y132D (SEQ ID NO:3) were individually introduced into the hA3A gene in the construct, generating the base editors hA3A-BE-Y130F and hA3A-BE-Y132D, respectively (FIG. 2A). The Y130F and Y132D mutations narrowed the base-editing window of hA3A-BE and thereby further improved its editing precision (FIG. 2B).

(65) Furthermore, the mutations W104A (SEQ ID NO:4) and D131Y (SEQ ID NO:5) were individually introduced into the hA3A gene of hA3A-BE, generating the base editors hA3A-BE-W104A and hA3A-BE-D131Y, respectively (FIG. 3A). Both hA3A-BE-W104A and hA3A-BE-D131Y produced the desired C-to-T substitutions with higher efficiency than hA3A-BE (FIG. 3B).

(66) In a further experiment, two triple-mutant variants of human APOBEC3A (hA3A) in hA3A-BE3 (Y130F-D131E-Y132D, SEQ ID NO:22; and Y130F-D131Y-Y132D, SEQ ID NO:23) were tested (FIG. 4A). The resulting base editors, hA3A-BE-Y130F-D131E-Y132D and hA3A-BE-Y130F-D131Y-Y132D, had narrower editing windows (positions 4-6 in the target region) and therefore higher editing precision (FIG. 4B).
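The practical effect of a narrower window is that fewer cytosines in the protospacer are candidates for deamination. The sketch below lists the C positions that fall inside an inclusive window, assuming the usual Cas9 base-editor convention of numbering protospacer positions 1-20 from the PAM-distal end. The 4-6 window comes from the text above; the 4-8 window and the protospacer are hypothetical values chosen only for comparison.

```python
from typing import List, Tuple


def cytosines_in_window(protospacer: str, window: Tuple[int, int] = (4, 6)) -> List[int]:
    """Return 1-based positions of C within an inclusive editing window.

    Positions are counted from the PAM-distal (5') end of the protospacer,
    with the PAM assumed to lie 3' of the last position.
    """
    lo, hi = window
    return [
        i for i, base in enumerate(protospacer.upper(), start=1)
        if base == "C" and lo <= i <= hi
    ]


if __name__ == "__main__":
    spacer = "GACCCTGCATCAGGACCTGA"  # hypothetical 20-nt protospacer
    print("narrow window (4-6):", cytosines_in_window(spacer, (4, 6)))
    print("wider window (4-8):", cytosines_in_window(spacer, (4, 8)))
```

In this hypothetical case the narrower window excludes the C at position 8, which is the kind of bystander edit a narrower window is meant to reduce.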

Example 2: Efficient Base Editing in Methylated Regions with a Human APOBEC3A-Cas9 Fusion

(67) Base editors (BEs) enable the generation of targeted single-nucleotide mutations, but currently used rat APOBEC1-based BEs are relatively inefficient at editing cytosines in highly methylated regions or in GpC contexts. By screening a variety of APOBEC/AID deaminases, this example shows that BEs conjugated to human APOBEC3A, and versions engineered to have narrower editing windows, can mediate efficient C-to-T base editing in regions with high methylation levels and high GpC dinucleotide content.

(68) Base editors (BEs), which combine a cytidine deaminase with Cas9 or Cpf1, have been successfully applied to perform targeted base editing, including C-to-T conversion. Numerous human diseases have been reported to be driven by point mutations in genomic DNA. With recently developed BEs, these disease-related point mutations could potentially be corrected, providing new therapeutic options. An analysis of disease-related T-to-C mutations that could in theory be reverted to thymine by BEs found that in 43% of them the mutant cytosine lies in a CpG dinucleotide (FIG. 5a). Cytosines in CpG dinucleotides are usually methylated in mammalian cells, and cytosine methylation strongly suppresses the cytidine deamination catalyzed by some APOBEC/AID deaminases. This example shows that CpG methylation hinders C-to-T base editing by current BEs, and describes BEs that were developed for efficient C-to-T base editing in highly methylated regions.
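The 43% figure is a proportion over a set of mutation sites. As an illustration only (this is not the analysis pipeline behind FIG. 5a), the sketch below computes such a proportion from hypothetical trinucleotides centered on each mutant C, read 5' to 3' on the strand carrying the C; a site counts as CpG when the base 3' of the C is G.

```python
from typing import Iterable


def cpg_fraction(trinucleotides: Iterable[str]) -> float:
    """Fraction of target cytosines that sit in a CpG dinucleotide.

    Each item is a 3-nt string (5' neighbor, target C, 3' neighbor) read on
    the strand that carries the C; the C is in CpG when its 3' neighbor is G.
    """
    sites = [t.upper() for t in trinucleotides]
    if not sites:
        raise ValueError("no sites given")
    for t in sites:
        if len(t) != 3 or t[1] != "C":
            raise ValueError(f"expected an NCN trinucleotide, got {t!r}")
    return sum(t[2] == "G" for t in sites) / len(sites)


if __name__ == "__main__":
    # Hypothetical contexts of mutant C's that a C-to-T editor would revert.
    print(cpg_fraction(["ACG", "TCA", "GCG", "CCT"]))
```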

(69) Methods and Materials

(70) Plasmid Construction

(71) The primer set hA3A_PCR_F/hA3A_PCR_R was used to amplify the Human_APOBEC3A fragment from the template pUC57-Human_APOBEC3A (synthesized by GenScript). The Human_APOBEC3A fragment was then cloned into SacI- and SmaI-linearized pCMV-BE3 (Addgene, 73021) with the Clone Express recombination kit (Vazyme, C112-02) to generate the hA3A-BE3 expression vector pCMV-hAPOBEC3A-XTEN-D10A-SGGS-UGI-SGGS-NLS. The hA3B-BE3, hA3C-BE3, hA3D-BE3, hA3F-BE3, hA3G-BE3, hA3H-BE3, hAID-BE3, hA1-BE3, mA3-BE3, mAID-BE3, mA1-BE3 and cAICDA-BE3 expression vectors were constructed with the same strategy. The PmCDA1 expression vector pcDNA3.1_pCMV-nCas-PmCDA1-ugi pH1-gRNA (HPRT) was purchased from Addgene (79620).

(72) The fragment SupF was amplified with the primer set SupF_PCR_F/SupF_PCR_R from the shuttle vector pSP189 and then cloned into pEASY-ZERO-BLUNT (TransGen Biotech, CB501) to generate the vector pEASY-SupF-ZERO-BLUNT.

(73) Oligonucleotides SupF_sg1_FOR/SupF_sg1_REV and SupF_sg2_FOR/SupF_sg2_REV were annealed and ligated into BsaI linearized pGL3-U6-sgRNA-PGK-puromycin (addgene, 51133) to generate the sgRNA expression vectors psgSupF-1 and psgSupF-2 that target the SupF gene in pEASY-SupF-ZERO-BLUNT.
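Annealed oligo pairs like these are ligated into a BsaI-cut vector, so each oligo carries a 4-nt 5′ overhang matching the vector ends. The sketch below shows that design step under stated assumptions: the overhangs ACCG and AAAC, the helper names and the example protospacer are all hypothetical, and the real overhangs should be read off the pGL3-U6-sgRNA-PGK-puromycin sequence map.

```python
# Hypothetical helper for designing sgRNA cloning oligos.
# ASSUMPTION: the 4-nt overhangs below are illustrative; read the real BsaI
# overhangs off the pGL3-U6-sgRNA-PGK-puromycin sequence before ordering oligos.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def sgrna_oligos(protospacer: str,
                 forward_overhang: str = "ACCG",
                 reverse_overhang: str = "AAAC") -> tuple:
    """Return (FOR, REV) oligos that anneal into a duplex with 5' sticky ends."""
    spacer = protospacer.upper()
    if not spacer or set(spacer) - set("ACGT"):
        raise ValueError("protospacer must be a non-empty A/C/G/T string")
    forward = forward_overhang + spacer
    reverse = reverse_overhang + reverse_complement(spacer)
    return forward, reverse


# Usage with a made-up 20-nt protospacer:
fwd, rev = sgrna_oligos("GATTACAGATTACAGATTAC")
print(fwd)  # ACCGGATTACAGATTACAGATTAC
print(rev)  # AAACGTAATCTGTAATCTGTAATC
```

The protospacer pairs with its reverse complement in the duplex, leaving the two overhangs single-stranded at opposite 5′ ends.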

(74) Two primer sets (hA3A_PCR_F/hA3A_Y130F_PCR_R and hA3A_Y130F_PCR_F/hA3A_PCR_R) were used to amplify the Y130F-containing fragment hA3A-Y130F. The fragment was then cloned into the ApaI- and SmaI-linearized hA3A-BE3 expression vector to generate the hA3A-BE3-Y130F expression vector pCMV-hAPOBEC3A_Y130F-XTEN-D10A-SGGS-UGI-SGGS-NLS. The hA3A-BE3-D131Y, hA3A-BE3-Y132D, hA3A-BE3-C101S and hA3A-BE3-C106S expression vectors were constructed with the same strategy.

(75) The fragment Human_APOBEC3A_Y130F was amplified with the primer set hA3A_PCR_F/hA3A_PCR_R from the template hA3A-BE3-Y130F and then cloned into SacI- and SmaI-linearized pCMV-eBE-S3 (ref. 19) to generate the hA3A-eBE-Y130F expression vector pCMV-hAPOBEC3A_Y130F-XTEN-D10A-SGGS-UGI-SGGS-NLS-T2A-UGI-NLS-P2A-UGI-NLS-T2A-UGI-NLS. The hA3A-eBE-Y132D expression vector was constructed in a similar way.
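The hyphen-delimited vector names above encode the module order of each construct. As a minimal sketch, assuming module names contain no internal hyphens, the hypothetical helper below splits such a name and counts UGI copies, distinguishing the UGI fused in-frame from those placed after a 2A peptide (which are released as separate polypeptides).

```python
# Hypothetical helper: decompose a hyphen-delimited construct name into modules
# and count UGI copies. Assumes module names themselves contain no hyphens.
TWO_A_PEPTIDES = {"T2A", "P2A", "E2A", "F2A"}


def describe_construct(name: str) -> dict:
    parts = name.split("-")
    # Drop the promoter prefix (e.g. "pCMV") so only the coding modules remain.
    modules = parts[1:] if parts[0].startswith("pCMV") else parts
    total_ugi = modules.count("UGI")
    # A UGI directly preceded by a 2A peptide is released as a separate polypeptide.
    two_a_ugi = sum(
        1 for prev, cur in zip(modules, modules[1:])
        if prev in TWO_A_PEPTIDES and cur == "UGI"
    )
    return {
        "deaminase": modules[0],
        "total_ugi": total_ugi,
        "two_a_ugi": two_a_ugi,
        "fused_ugi": total_ugi - two_a_ugi,
    }


ebe = describe_construct(
    "pCMV-hAPOBEC3A_Y130F-XTEN-D10A-SGGS-UGI-SGGS-NLS"
    "-T2A-UGI-NLS-P2A-UGI-NLS-T2A-UGI-NLS"
)
be3 = describe_construct("pCMV-hAPOBEC3A_Y130F-XTEN-D10A-SGGS-UGI-SGGS-NLS")
print(ebe)  # 4 UGI copies in total, 3 of them after a 2A peptide
print(be3)  # a single fused UGI
```

The count of three 2A-linked UGIs for the eBE construct matches the "three copies of the 2A-UGI sequence" described for hA3A-eBE-Y130F and hA3A-eBE-Y132D.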

(76) Oligonucleotides hEMX1_FOR/hEMX1_REV were annealed and ligated into BsaI linearized pGL3-U6-sgRNA-PGK-puromycin to generate sgEMX1 expression vector psgEMX1. Other sgRNA expression vectors were constructed with the same strategy.

(77) Antibodies

(78) Antibodies were purchased from the following sources: anti-alpha-tubulin (T6199), Sigma; anti-Cas9 (ab204448), Abcam.

(79) Immunoblotting Analysis

(80) Protein samples in sample loading buffer were incubated at 95 °C for 20 min and separated by SDS-PAGE, and the proteins were transferred to nitrocellulose membranes (Thermo Fisher Scientific). After blocking for 2 h with TBST (25 mM Tris pH 8.0, 150 mM NaCl and 0.1% Tween 20) containing 5% (w/v) nonfat dry milk, the membranes were incubated overnight with the indicated primary antibodies. After extensive washing, the membranes were incubated with HRP-conjugated secondary antibodies for 1 h. Reactive bands were developed with ECL (Thermo Fisher Scientific) and detected with an Amersham Imager 600.

(81) Cell Culture and Transfection

(82) HEK293T cells from ATCC were maintained in DMEM (10566, Gibco/Thermo Fisher Scientific)+10% FBS (16000-044, Gibco/Thermo Fisher Scientific) and regularly tested to exclude Mycoplasma contamination.

(83) The dCas9-SunTag-TET1CD system was used to induce targeted demethylation of genomic regions with natively high methylation levels (e.g., the FANCF, MAGEA1 and MSSK1 regions). The dCas9-DNMT3a-DNMT3L system was used to induce targeted methylation of genomic regions with natively low methylation levels (e.g., the VEGFA and PDL1 regions). HEK293T cells were transfected using LIPOFECTAMINE 2000 (Life, Invitrogen) with either 3 μg pCAG-scFvGCN4sfGFPTET1CD (synthesized by Genscript) and 1 μg sgRNA expression vector, or 3 μg dCas9-DNMT3a-DNMT3L (synthesized by Genscript) and 1 μg sgRNA expression vector. Blasticidin (10 μg/ml, Sigma, 15205) and puromycin (1 μg/ml, Merck, 540411) were added 24 h after transfection. One week later, a portion of the cells was collected to determine DNA methylation levels, and the remaining cells were stored in liquid nitrogen for base editing. The sgRNAs used to induce genomic DNA methylation/demethylation are the same ones used to induce base editing.

(84) For base editing in genomic DNA, HEK293T cells were seeded in 24-well plates at a density of 1.6×10⁵ cells per well and transfected with 200 μl serum-free Opti-MEM containing 5.04 μl LIPOFECTAMINE LTX (Life, Invitrogen), 1.68 μl LIPOFECTAMINE plus (Life, Invitrogen), 1 μg BE3 expression vector (or hA3A-BE3, hA3A-BE3-Y130F, hA3A-BE3-D131Y, hA3A-BE3-Y132D, hA3A-BE3-C101S, hA3A-BE3-C106S, hA3A-eBE-Y130F or hA3A-eBE-Y132D expression vector) and 0.68 μg sgRNA expression vector. After 72 h, genomic DNA was extracted from the cells with QuickExtract DNA Extraction Solution (QE09050, Epicentre), or the cells were lysed in 2× SDS loading buffer for western blotting.
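The genomic-editing recipe above scales the transfection reagents to the DNA mass: 1 μg editor plus 0.68 μg sgRNA vector gives 1.68 μg DNA, used with 1.68 μl PLUS and 5.04 μl LTX, i.e. 1 μl PLUS and 3 μl LTX per μg DNA. The hypothetical helper below simply reuses those derived ratios; it is an arithmetic sketch, not a validated protocol.

```python
# Worked arithmetic (hypothetical helper): 1 ug editor + 0.68 ug sgRNA = 1.68 ug
# DNA with 1.68 ul PLUS and 5.04 ul LTX, i.e. 1 ul PLUS and 3 ul LTX per ug DNA.
PLUS_UL_PER_UG = 1.0
LTX_UL_PER_UG = 3.0


def transfection_mix(editor_ug: float, sgrna_ug: float) -> dict:
    """Return reagent volumes (ul) scaled to the total plasmid mass (ug)."""
    if editor_ug < 0 or sgrna_ug < 0:
        raise ValueError("DNA amounts must be non-negative")
    total_ug = editor_ug + sgrna_ug
    return {
        "total_dna_ug": round(total_ug, 2),
        "plus_ul": round(total_ug * PLUS_UL_PER_UG, 2),
        "ltx_ul": round(total_ug * LTX_UL_PER_UG, 2),
    }


print(transfection_mix(1.0, 0.68))
# {'total_dna_ug': 1.68, 'plus_ul': 1.68, 'ltx_ul': 5.04}
```

These ratios describe only this 24-well protocol; the plasmid-vector transfection in paragraph (85) uses 4 μl LTX and 2 μl PLUS for 1.5 μg DNA.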

(85) For base editing in plasmid vectors, 293T cells were seeded in 6-well plates at a density of 3×10⁵ cells per well and transfected with 500 μl serum-free Opti-MEM containing 4 μl LIPOFECTAMINE LTX (Life, Invitrogen), 2 μl LIPOFECTAMINE plus (Life, Invitrogen), 1 μg BE3 expression vector (or hA3A-BE3, hA3B-BE3, hA3C-BE3, hA3D-BE3, hA3F-BE3, hA3G-BE3, hA3H-BE3, hAID-BE3, hA1-BE3, mA3-BE3, mAID-BE3, mA1-BE3, cAICDA-BE3 or pmCDA1 expression vector) and 0.5 μg sgRNA expression vector. After 24 h, these cells were transfected with 500 μl serum-free Opti-MEM containing 4 μl LIPOFECTAMINE LTX, 2 μl LIPOFECTAMINE plus and 1.5 μg unmethylated (or methylated) pEASY-SupF-ZERO-BLUNT. After 48 h, plasmids were extracted from the cells with the TIANprep Mini Plasmid Kit (DP103-A, TIANGEN), or the cells were lysed in 2× SDS loading buffer for western blotting.

(86) Bisulfite Sequencing Analysis

(87) Genomic DNA was isolated and treated with bisulfite according to the instructions of the EZ DNA Methylation-Direct Kit (Zymo Research, D5021). The bisulfite-treated DNA was PCR-amplified with Taq Hot Start Version (Takara, R007B), and the PCR products were ligated into the T-Vector pMD™19 (Takara, 3271). Eight clones were picked and analyzed by Sanger sequencing (Genewiz). The primers used for bisulfite PCR are listed in Supplementary Table 2.
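After bisulfite treatment and PCR, unmethylated cytosines read as T while methylated cytosines remain C, so the methylation level of a region can be estimated from the sequenced clones. The sketch below assumes each clone has already been aligned gap-free to the top-strand reference; the helper names, reference and clone sequences are hypothetical.

```python
# Sketch: estimate the CpG methylation level from bisulfite clone sequences.
# Assumes each clone is already aligned gap-free to the top-strand reference.
# After bisulfite treatment and PCR, unmethylated C reads as T; methylated C stays C.

def cpg_positions(reference: str) -> list:
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_level(reference: str, clones: list) -> float:
    sites = cpg_positions(reference)
    methylated = informative = 0
    for clone in clones:
        if len(clone) != len(reference):
            raise ValueError("clone must be aligned to the reference without gaps")
        for i in sites:
            base = clone[i].upper()
            if base == "C":      # protected from conversion -> methylated
                methylated += 1
                informative += 1
            elif base == "T":    # converted -> unmethylated
                informative += 1
            # any other base (sequencing error, N) is ignored
    return methylated / informative if informative else float("nan")


reference = "ACGTTCGA"             # hypothetical region with two CpGs
clones = ["ACGTTTGA", "ATGTTTGA"]  # hypothetical bisulfite clones
print(cpg_positions(reference))    # [1, 5]
print(methylation_level(reference, clones))  # 0.25
```

With eight clones per region, as in the text, each CpG contributes up to eight informative calls.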

(88) Plasmid DNA Methylation

(89) For in vitro methylation, 2 μl plasmid DNA was methylated with 1 μl CpG methyltransferase (M.SssI, Life, EM0821) in a 20 μl reaction. After in vitro methylation, pEASY-SupF-ZERO-BLUNT was digested with BstUI (NEB, R0518S) to determine the methylation level.
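BstUI recognizes CGCG and cuts between the two CG pairs (CG^CG); its cleavage is blocked by CpG methylation, so resistance to BstUI digestion reports on how completely M.SssI methylated the plasmid. The sketch below predicts cut positions for a hypothetical sequence, modeling only the two extremes (fully unmethylated or fully methylated); the actual pEASY-SupF-ZERO-BLUNT sequence is not given here.

```python
import re

# Sketch: predict BstUI digestion of a DNA sequence. BstUI recognizes CGCG and
# cuts CG^CG (blunt); CpG methylation blocks it, so a fully methylated plasmid
# is expected to stay uncut. Partial methylation (not modeled) gives partial digestion.

def bstui_cut_positions(seq: str, cpg_methylated: bool = False) -> list:
    """0-based index of the first base after each cut."""
    if cpg_methylated:
        return []
    # Lookahead so overlapping sites (e.g. CGCGCG) are all reported.
    return [m.start() + 2 for m in re.finditer(r"(?=CGCG)", seq.upper())]


print(bstui_cut_positions("AACGCGCGTT"))        # [4, 6]
print(bstui_cut_positions("AACGCGCGTT", True))  # []
```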

(90) Blue/White Colony Screening

(91) The plasmids extracted from transfected cells were transformed into E. coli strain MBM7070 (lacZ^aug_amber), which was grown on LB plates containing 50 μg/ml kanamycin, 1 mM IPTG and 0.03% Bluo-gal (Life, Invitrogen) at 37 °C overnight and then at room temperature for another day for maximal color development. The cumulative base editing frequency was calculated by dividing the number of white colonies by the total number of colonies.
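In this supF-based readout, a functional SupF amber-suppressor tRNA rescues the lacZ amber allele and gives blue colonies, while edits that inactivate SupF leave colonies white. The arithmetic for the cumulative editing frequency is sketched below; the function name and colony counts are hypothetical.

```python
# Sketch of the colony-count arithmetic (hypothetical helper and counts).
# Functional SupF suppresses the lacZ amber allele -> blue colony;
# SupF inactivated by editing -> white colony.

def cumulative_editing_frequency(white: int, blue: int) -> float:
    if white < 0 or blue < 0:
        raise ValueError("colony counts must be non-negative")
    total = white + blue
    if total == 0:
        raise ValueError("no colonies counted")
    return white / total


print(cumulative_editing_frequency(white=39, blue=61))  # 0.39
```

Because only edits that disrupt SupF function turn colonies white, this frequency is a cumulative, functional readout rather than a per-base measurement.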

(92) DNA Library Preparation and Sequencing

(93) Target genomic sites were PCR-amplified with the high-fidelity DNA polymerase PrimeSTAR HS (Clontech) using primers flanking each examined sgRNA target site. The PCR primers used to amplify target genomic sequences are listed in Supplementary Table 2. Indexed DNA libraries were prepared using the TruSeq ChIP Sample Preparation Kit (Illumina) with minor modifications. Briefly, the PCR products were fragmented with a Covaris S220 and then amplified using the TruSeq ChIP Sample Preparation Kit (Illumina). After quantitation with the Qubit High-Sensitivity DNA kit (Life, Invitrogen), PCR products with different tags were pooled for deep sequencing on an Illumina NextSeq 500 (2×150) or HiSeq X Ten (2×150) at the CAS-MPG Partner Institute for Computational Biology Omics Core, Shanghai, China. Raw read quality was evaluated with FastQC. For paired-end sequencing, only R1 reads were used. Adaptor sequences and bases at both read ends with Phred quality scores lower than 28 were trimmed. Trimmed reads were then mapped to the target sequences with the BWA-MEM algorithm (BWA v0.7.9a). After pileup with samtools (v0.1.18), indels and base substitutions were calculated.
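The end-trimming rule above (remove bases with Phred quality below 28 from both read ends) can be sketched as follows. This is a simplified illustration assuming Phred+33 quality encoding; the function name is hypothetical, adapter removal is not shown, and dedicated trimming tools may apply different (e.g. sliding-window) rules.

```python
# Simplified end trimming (hypothetical helper). Assumes Phred+33 encoded
# qualities; adapter removal, which the text also mentions, is not shown.

def trim_low_quality_ends(seq: str, qual: str, threshold: int = 28,
                          offset: int = 33) -> tuple:
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = [ord(ch) - offset for ch in qual]
    start = 0
    while start < len(scores) and scores[start] < threshold:
        start += 1  # clip low-quality bases from the 5' end
    end = len(scores)
    while end > start and scores[end - 1] < threshold:
        end -= 1    # clip low-quality bases from the 3' end
    return seq[start:end], qual[start:end]


# '#' encodes Q2, '=' encodes Q28 and 'I' encodes Q40 in Phred+33.
print(trim_low_quality_ends("ACGTACGT", "##II=I##"))  # ('GTAC', 'II=I')
```

Internal low-quality bases are kept; only the ends are clipped, and a base at exactly Q28 is retained because the threshold is "lower than 28".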

(94) Indel Frequency Calculation

(95) Indels were estimated in the aligned region spanning from 8 nucleotides upstream of the target site to 19 nucleotides downstream of the PAM (50 bp in total). Indel frequencies were calculated by dividing the number of reads containing at least one inserted and/or deleted nucleotide by the number of all mapped reads covering the same region.
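Assuming a 20-nt protospacer followed by a 3-nt PAM, the 50-bp window works out as 8 + 20 + 3 + 19 = 50. The sketch below builds that window and counts reads whose SAM CIGAR string places an insertion or deletion inside it; the helper names, coordinates and reads are hypothetical, and the boundary rule for insertions is one reasonable choice rather than the authors' stated method.

```python
import re

# Window arithmetic, assuming a 20-nt protospacer followed by a 3-nt PAM:
# 8 (upstream) + 20 (protospacer) + 3 (PAM) + 19 (downstream) = 50 bp.
UPSTREAM, PROTOSPACER, PAM, DOWNSTREAM = 8, 20, 3, 19
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_window(protospacer_start: int) -> tuple:
    """0-based half-open [start, end) reference window around one target site."""
    start = protospacer_start - UPSTREAM
    end = protospacer_start + PROTOSPACER + PAM + DOWNSTREAM
    return start, end


def has_indel_in_window(ref_start: int, cigar: str, win: tuple) -> bool:
    """True if any insertion or deletion in an aligned read touches the window."""
    win_start, win_end = win
    pos = ref_start  # 0-based reference coordinate of the next aligned base
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I" and win_start <= pos < win_end:
            return True  # insertion sits immediately before reference base `pos`
        if op == "D" and pos < win_end and pos + n > win_start:
            return True  # deleted reference bases overlap the window
        if op in "MDN=X":
            pos += n  # these operations consume reference bases
    return False


def indel_frequency(reads: list, win: tuple) -> float:
    """reads: (ref_start, cigar) pairs for all reads mapped over the window."""
    if not reads:
        raise ValueError("no mapped reads")
    hits = sum(has_indel_in_window(start, cigar, win) for start, cigar in reads)
    return hits / len(reads)


window = indel_window(100)
print(window, window[1] - window[0])  # (92, 142) 50
reads = [(50, "100M"), (50, "60M2D40M"), (50, "30M1I70M"), (60, "90M")]
print(indel_frequency(reads, window))  # 0.25
```

The insertion in the third read lies at reference position 80, outside the window, so only the deletion read is counted.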

(96) Base Substitution Calculation

(97) Base substitutions were evaluated at each position of the examined sgRNA target sites covered by at least 1,000 independent mapped reads; obvious base substitutions were observed only at the targeted base editing sites. Base substitution frequencies were calculated by dividing the number of reads carrying a base substitution by the total number of reads.
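Per-position substitution frequencies follow directly from base counts at that position. As a minimal sketch, assuming the counts have already been tallied from a pileup, the hypothetical helper below applies the 1,000-read depth filter and divides each non-reference base count by the depth.

```python
from typing import Dict, Optional

MIN_DEPTH = 1000  # positions with fewer mapped reads are skipped, as in the text


def substitution_frequencies(ref_base: str,
                             base_counts: Dict[str, int]) -> Optional[Dict[str, float]]:
    """Per-alternative-base substitution frequency at one pileup position.

    base_counts maps each observed base (A/C/G/T) to its read count, e.g. as
    tallied from a samtools pileup. Returns None below MIN_DEPTH.
    """
    depth = sum(base_counts.values())
    if depth < MIN_DEPTH:
        return None
    return {base: n / depth for base, n in base_counts.items()
            if base != ref_base and n > 0}


print(substitution_frequencies("C", {"C": 700, "T": 280, "G": 10, "A": 10}))
# {'T': 0.28, 'G': 0.01, 'A': 0.01}
print(substitution_frequencies("C", {"C": 900, "T": 99}))  # None (depth 999)
```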

(98) Calculation of BE-Targetable Genetic Variants

(99) Single nucleotide variants (SNVs) from the NCBI ClinVar database were overlapped with pathogenic human allele sequences from the NCBI dbSNP database to identify pathogenic T-to-C and A-to-G mutations. Of 3,089 pathogenic T-to-C or A-to-G mutations, 2,499 are potentially editable by SpCas9-BE3, SaCas9-BE3, dLbCpf1-BE or xCas9-BE3, given a nearby PAM sequence. These 2,499 BE-targetable SNVs were further sub-classified according to the 3′-adjacent base of the target cytosine, i.e., CpA, CpC, CpG and CpT (FIG. 5a).
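For a T-to-C variant the cytosine to be reverted sits on the reported (top) strand, whereas for an A-to-G variant it sits on the opposite strand, so its 3′ neighbor pairs with the top-strand base 5′ of the G. The sketch below applies that rule to classify variants by CpN context; the helper names and example sequences are hypothetical, not ClinVar records.

```python
from collections import Counter

_PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def cpn_context(top_strand: str, index: int) -> str:
    """Return the CpN context of the cytosine that a BE would deaminate.

    If the top strand carries C at `index` (a T-to-C variant), its 3' neighbour
    is top[index + 1]. If it carries G (an A-to-G variant), the target C is on
    the bottom strand, whose 3' neighbour pairs with top[index - 1].
    """
    seq = top_strand.upper()
    base = seq[index]
    if base == "C" and index + 1 < len(seq):
        return "Cp" + seq[index + 1]
    if base == "G" and index > 0:
        return "Cp" + _PAIR[seq[index - 1]]
    raise ValueError("expected an internal C or G at the variant position")


def context_fractions(variants: list) -> dict:
    """variants: (top_strand_sequence, variant_index) pairs."""
    counts = Counter(cpn_context(seq, i) for seq, i in variants)
    total = sum(counts.values())
    return {ctx: counts[ctx] / total for ctx in ("CpA", "CpC", "CpG", "CpT")}


variants = [("ACGT", 1), ("TCGA", 2), ("AAGT", 2), ("GCAT", 1)]
print([cpn_context(s, i) for s, i in variants])  # ['CpG', 'CpG', 'CpT', 'CpA']
print(context_fractions(variants))
# {'CpA': 0.25, 'CpC': 0.0, 'CpG': 0.5, 'CpT': 0.25}
```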

(100) Statistical Analysis

(101) P values in this study were calculated with a one-tailed Student's t test.

(102) Data Availability

(103) The deep-sequencing data from this study are deposited in the NCBI Gene Expression Omnibus (accession no. GSE114999) and the National Omics Data Encyclopedia (accession no. OEP000030).

(104) Results

(105) This example first examined the base editing efficiency of a commonly used BE, the rat APOBEC1 (rA1)-based BE3, in human cells with either increased or decreased levels of DNA methylation. When DNA methylation was promoted by DNMT3 in regions with natively low methylation levels, BE3 editing frequencies decreased. Conversely, when DNA methylation was reduced by TET1 in regions with natively high methylation levels, BE3-induced editing frequencies increased accordingly. These results suggest that the canonical rA1-based BE3 is less efficient at editing cytosines embedded in highly methylated genomic regions. Notably, DNA methylation suppressed C-to-T editing at both CpG and flanking non-CpG sites (median decrement ~28%, P=2×10⁻⁸ for CpG sites and ~51%, P=7×10⁻⁴⁰ for flanking non-CpG sites). APOBECs deaminate cytidines on single-stranded DNA in a processive manner; CpG methylation may therefore affect the sliding of APOBEC along the DNA and impair its binding to, and deamination of, flanking non-CpG sites.

(106) To screen for deaminases enabling efficient base editing in a high-methylation background, a series of BEs was generated by fusing Cas9 nickase with fifteen different APOBEC/AID deaminases (FIG. 5b). These BEs were then tested in an E. coli-derived vector system (FIG. 5b) that has previously been used to probe mutations. In unmethylated vectors, these BEs showed varied levels of base editing. BEs containing human APOBEC3A (hA3A-BE3, mean editing frequency ~39%), human APOBEC3B (hA3B-BE3, ~33%) or human AID (hAID-BE3, ~28%) mediated base editing at levels comparable to BE3 (~31%) (FIG. 5c). In methylated vectors, however, only hA3A-BE3 induced efficient base editing (mean editing frequency ~35%), whereas BE3 (~12%) and the other examined BEs (~1%-20%) showed relatively low editing efficiencies (FIG. 5c). Of note, the protein levels of hA3A-BE3, BE3 and the other examined BEs were comparable (FIG. 5d).

(107) Consistent with the observations in E. coli-derived vectors, hA3A-BE3 exhibited significantly higher base editing frequencies than rA1-based BE3 in all tested genomic regions, both those with a natively high-methylation background (median ~1.7-fold, P=2×10⁻¹⁰, FIG. 5e,f) and those with an induced high-methylation background (median ~1.8-fold, P=5×10⁻⁴). Thus, using hA3A as the deaminase module of a BE can generally achieve high base editing efficiency in genomic regions with high methylation levels.
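The fold changes and decrements above are reported as medians over many target sites. Assuming each site's value is the ratio of the two editors' (or two conditions') editing frequencies at that same site, the summary statistics can be sketched as below; the helper names and frequencies are hypothetical, and the text does not spell out the exact per-site procedure.

```python
from statistics import median


def per_site_ratios(test: list, reference: list) -> list:
    """Editing frequency of `test` divided by `reference`, site by site."""
    if len(test) != len(reference):
        raise ValueError("both lists must cover the same sites")
    return [t / r for t, r in zip(test, reference) if r > 0]


def median_fold_change(test: list, reference: list) -> float:
    return median(per_site_ratios(test, reference))


def median_decrement(treated: list, control: list) -> float:
    """Median per-site fractional loss of editing, 1 - treated/control."""
    return median([1 - ratio for ratio in per_site_ratios(treated, control)])


# Hypothetical per-site editing frequencies at three sites.
print(median_fold_change([0.34, 0.20, 0.50], [0.20, 0.10, 0.25]))  # 2.0
print(median_decrement([0.36, 0.5, 0.1], [0.5, 1.0, 0.2]))         # 0.5
```

Sites where the reference editor shows no editing are skipped to avoid dividing by zero.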

(108) Base editing of cytosines in a GpC context was generally inefficient with rA1-based BEs. In contrast, this example found that hA3A-BE3 induced efficient base editing at most cytosines in GpC sites in both endogenous and induced high-methylation backgrounds (FIG. 5e). This example further compared their editing efficiencies under both endogenous and induced low-methylation backgrounds and observed a similar superiority of hA3A-BE3 over BE3 in editing cytosines in the GpC context (FIG. 5g,h). Statistical analysis confirmed that hA3A-BE3 induced significantly higher base editing efficiency than BE3 at cytosines in the GpC context under both high-methylation (median ~2.3-fold, P=1×10⁻⁵) and low-methylation (median ~1.8-fold, P=6×10⁻⁹) conditions. Notably, hA3A-BE3-mediated base editing was as efficient as BE3 at cytosines in non-GpC contexts in all tested low-methylation regions (median ~1.1-fold, P=0.045). hA3A-BE3 also yielded less non-C-to-T conversion than BE3 in both high-methylation regions (median product purity ~97% for hA3A-BE3 versus 94% for BE3, P=3×10⁻…) and low-methylation regions (median ~92% for hA3A-BE3 versus 90% for BE3, P=4×10⁻⁶). Both BE3 and hA3A-BE3 induced less non-C-to-T conversion at highly methylated CpG sites than at CpG sites with low methylation (median ~95% vs. ~90%, P=3×10⁻⁵ for BE3; median ~95% vs. ~92%, P=5×10⁻⁴ for hA3A-BE3). hA3A-BE3 did, however, induce higher indel frequencies than BE3 (median ~2-fold in both high- and low-methylation regions). This increase may be caused by the high deaminase activity of hA3A, which can trigger downstream DNA repair pathways that generate DNA double-strand breaks.
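The purity percentages above are read here as the share of edited products at a target cytosine that are the intended C-to-T change, with non-C-to-T conversion (C-to-A, C-to-G) as the remainder. That definition is an assumption, as is the hypothetical helper and read counts below.

```python
def product_purity(base_counts: dict, ref_base: str = "C", desired: str = "T") -> float:
    """Fraction of edited reads at a target C that carry the intended C-to-T change.

    ASSUMPTION: purity = C-to-T reads / all reads with any substitution at this C.
    Indel-containing reads are not considered in this sketch.
    """
    edited = {b: n for b, n in base_counts.items() if b != ref_base}
    total_edited = sum(edited.values())
    if total_edited == 0:
        return float("nan")  # nothing edited, purity undefined
    return edited.get(desired, 0) / total_edited


purity = product_purity({"C": 500, "T": 485, "G": 10, "A": 5})
print(purity)      # 0.97
print(1 - purity)  # share of non-C-to-T products
```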

(109) These results suggest that hA3A-BE3 can efficiently induce base editing in a broader range of sequence contexts (FIG. 5). However, the editing window of hA3A-BE3 (12 nt, positions 2-13 of the sgRNA target site) is wider than that of BE3 (5 nt, positions 4-8). Because the wide editing window of hA3A-BE3 may result from the high deaminase activity of hA3A, mutations that reduce the deaminase activity of hA3A might correspondingly narrow the editing window of hA3A-BE3. Selected mutations (Y130F, D131Y or Y132D) successfully narrowed the editing window with little effect on base editing efficiency, whereas mutations in the zinc-coordinating motif (C101S and C106S) almost completely eliminated deaminase activity.

(110) This example then focused on two engineered hA3A-BE3s, hA3A-BE3-Y130F and hA3A-BE3-Y132D, whose editing windows (positions 3-8 for hA3A-BE3-Y130F and positions 3-7 for hA3A-BE3-Y132D) are similar to that of BE3 (positions 4-8). In highly methylated regions, hA3A-BE3-Y130F and hA3A-BE3-Y132D induced higher editing efficiencies than BE3 at all editable sites within the overlapping editing window (positions 4-7) (FIG. 6a, cytosines in pink; FIG. 6b, median ~2.3-fold, P=0.002 for hA3A-BE3-Y130F and median ~1.2-fold, P=0.03 for hA3A-BE3-Y132D). For cytosines outside the overlapping editing window, hA3A-BE3-Y132D induced C-to-T editing frequencies similar to BE3, whereas hA3A-BE3-Y130F induced higher editing frequencies (FIG. 6a, cytosines in black). Like the original hA3A-BE3, both hA3A-BE3-Y130F and hA3A-BE3-Y132D edited cytosines in GpC contexts more efficiently than BE3 within the overlapping editing window (FIG. 6c,d, median ~2.3-fold, P=3×10⁻⁵ for hA3A-BE3-Y130F and median ~1.9-fold, P=0.002 for hA3A-BE3-Y132D). The protein expression levels of hA3A-BE3-Y130F and hA3A-BE3-Y132D were very similar to that of BE3 (FIG. 6e), even though the two engineered hA3A-BEs induced higher C-to-T editing efficiencies (FIG. 6b,d). In terms of product purity, hA3A-BE3-Y130F yielded less non-C-to-T conversion than BE3 (median ~96.3% for hA3A-BE3-Y130F versus 95.6% for BE3, P=0.03, in high-methylation regions; median ~92% versus 90%, P=0.002, in low-methylation regions) but more indels (median ~2.1-fold, P=0.0002, in high-methylation regions; median ~1.3-fold, P=0.12, in low-methylation regions). The product purity of hA3A-BE3-Y132D was higher than that of BE3 in native low-methylation regions (median ~93% for hA3A-BE3-Y132D versus 90% for BE3, P=0.001), but lower in native high-methylation regions (median ~94.9% for hA3A-BE3-Y132D versus 95.6% for BE3, P=0.03). Nevertheless, indel frequencies induced by hA3A-BE3-Y132D were comparable to those induced by BE3 at all tested sites (median ~1.2-fold in both high- and low-methylation regions).
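The window widths and the shared window (positions 4-7) quoted in paragraphs (109) and (110) follow from simple interval arithmetic. The sketch below, assuming the usual convention that positions are counted 1-based and inclusive from the PAM-distal end of the protospacer, recomputes them; the `Window` type and `overlap` helper are hypothetical.

```python
from typing import NamedTuple, Optional


class Window(NamedTuple):
    """Editing window as 1-based, inclusive protospacer positions."""
    first: int
    last: int

    @property
    def width(self) -> int:
        return self.last - self.first + 1


def overlap(*windows: Window) -> Optional[Window]:
    """Intersection of several windows, or None if they do not overlap."""
    first = max(w.first for w in windows)
    last = min(w.last for w in windows)
    return Window(first, last) if first <= last else None


windows = {
    "hA3A-BE3": Window(2, 13),
    "BE3": Window(4, 8),
    "hA3A-BE3-Y130F": Window(3, 8),
    "hA3A-BE3-Y132D": Window(3, 7),
}
print({name: w.width for name, w in windows.items()})
# {'hA3A-BE3': 12, 'BE3': 5, 'hA3A-BE3-Y130F': 6, 'hA3A-BE3-Y132D': 5}
print(overlap(windows["BE3"], windows["hA3A-BE3-Y130F"], windows["hA3A-BE3-Y132D"]))
# Window(first=4, last=7)
```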

(111) To further enhance the C-to-T base editing system, three copies of a 2A-uracil DNA glycosylase inhibitor (UGI) sequence were fused to the C-terminus of hA3A-BE3-Y130F and hA3A-BE3-Y132D, generating hA3A-eBE-Y130F and hA3A-eBE-Y132D. In low-methylation regions, hA3A-eBE-Y130F and hA3A-eBE-Y132D induced significantly higher editing efficiencies (FIG. 6f,g, median ~1.2-fold, P=0.0004 for hA3A-eBE-Y130F and median ~1.2-fold, P=0.004 for hA3A-eBE-Y132D), higher product purity (FIG. 6h, median ~96% for hA3A-eBE-Y130F versus 94% for hA3A-BE3-Y130F, P=0.006, and median ~96% for hA3A-eBE-Y132D versus 92% for hA3A-BE3-Y132D, P=0.004) and lower indel frequencies (FIG. 6i, median decrement ~21%, P=4×10⁻⁵ for hA3A-eBE-Y130F and median decrement ~9%, P=0.03 for hA3A-eBE-Y132D) than hA3A-BE3-Y130F and hA3A-BE3-Y132D, respectively. In high-methylation regions, hA3A-eBE-Y130F and hA3A-eBE-Y132D induced significantly higher product purity (median ~97% for hA3A-eBE-Y130F versus 95% for hA3A-BE3-Y130F, P=0.003, and median ~97% for hA3A-eBE-Y132D versus 95% for hA3A-BE3-Y132D, P=0.003) and lower indel frequencies (median decrement ~23%, P=2×10⁻⁷ for hA3A-eBE-Y130F and median decrement ~21%, P=4×10⁻⁵ for hA3A-eBE-Y132D) than hA3A-BE3-Y130F and hA3A-BE3-Y132D, respectively, while editing efficiencies remained unchanged (median ~1-fold for both hA3A-eBE-Y130F and hA3A-eBE-Y132D). Together, these results indicate that hA3A-BE3-Y130F, hA3A-BE3-Y132D, hA3A-eBE-Y130F and hA3A-eBE-Y132D can mediate highly efficient base editing within narrower editing windows than the original hA3A-BE3 in all examined contexts.

(112) Here, this example demonstrates that hA3A-BE3 and its engineered forms can induce efficient base editing in all examined contexts, including both methylated DNA regions and GpC dinucleotides. It is contemplated that hA3A can also be conjugated with other Cas proteins to further expand the scope of base editing.

Example 3. Gene Editing of Human DYRK1A with dCas12a-hA3A Base Editors

(113) This example tested base editors that combined a Cas12a (Cpf1) and various mutant human A3A proteins.

(114) Methods

(115) Construction of dCas12a-hA3A-BE Expression Vector

(116) Using pUC57-hA3A (synthesized by Genscript Biotechnology Co., Ltd.) as a template and suitable primers, the coding sequence of hA3A was amplified by PCR as a fragment whose ends are homologous to the linearized vector, and the fragment was purified by gel electrophoresis. The purified fragment was then recombined into the dCas12a-BE vector linearized with SacI and SmaI using the Clone Express plasmid recombination kit to obtain the expression vector dCas12a-hA3A-BE.

(117) Construction of dCas12a-hA3A-BE-W98Y Expression Vector

(118) Using dCas12a-hA3A-BE as a template, two PCR products were amplified that carry the W98Y mutation and a homology arm overlapping each other, together with segments homologous to the linearized vector. After purification by gel electrophoresis, the two fragments were simultaneously recombined into the dCas12a-hA3A-BE vector linearized with ApaI and SmaI using the Clone Express plasmid recombination kit to obtain the expression vector dCas12a-hA3A-BE-W98Y.

(119) The expression vectors dCas12a-hA3A-BE-W104A, dCas12a-hA3A-BE-P134Y, dCas12a-hA3A-BE-W98Y-W104A, dCas12a-hA3A-BE-W98Y-P134Y, dCas12a-hA3A-BE-W104A-P134Y, dCas12a-hA3A-BE-W98Y-W104A-Y130F, dCas12a-hA3A-BE-W98Y-W104A-Y132D, dCas12a-hA3A-BE-W104A-Y130F-P134Y and dCas12a-hA3A-BE-W104A-Y132D-P134Y were constructed likewise. Relevant sequences are shown in Tables 1 and 2.

(120) Construction of gRNA Expression Plasmid

(121) Annealed oligonucleotides targeting the human DYRK1A site were ligated with T4 DNA ligase into the gRNA expression vector pLb-Cas12a-pGL3-U6-sgRNA digested with the restriction endonuclease BsaI, yielding the gRNA expression plasmid sgDYRK1A.

(122) Eukaryotic Cell Transfection

(123) The sgDYRK1A plasmid and each of the base editor vectors dCas12a-hA3A-BE, dCas12a-hA3A-BE-W98Y, dCas12a-hA3A-BE-W104A, dCas12a-hA3A-BE-P134Y, dCas12a-hA3A-BE-W98Y-W104A, dCas12a-hA3A-BE-W98Y-P134Y, dCas12a-hA3A-BE-W104A-P134Y, dCas12a-hA3A-BE-W98Y-W104A-Y130F, dCas12a-hA3A-BE-W98Y-W104A-Y132D, dCas12a-hA3A-BE-W104A-Y130F-P134Y and dCas12a-hA3A-BE-W104A-Y132D-P134Y were mixed in 200 μl Opti-MEM at a ratio of 0.68 μg sgDYRK1A to 1 μg base editor vector; 1.68 μl LIPOFECTAMINE plus and 5.04 μl LIPOFECTAMINE LTX were added, and the mixture was allowed to stand at room temperature for 5 minutes. The mixture was then added to HEK293T cells (160,000 per well) in 24-well plates containing 500 μl DMEM (+10% FBS) per well. After 12 h, the medium was replaced with fresh medium containing 1% penicillin-streptomycin. The cells were harvested after 60 h of incubation.

(124) EditR Analysis of Sanger Sequencing Results

(125) Sanger sequencing results were analyzed with the EditR software (moriaritylab.shinyapps.io/editr_v10/), a web-based tool for analyzing Sanger sequencing results developed in 2018 (Kluesner M G, Nedveck D A, Lahr W S, et al. EditR: A Method to Quantify Base Editing from Sanger Sequencing [J]. The CRISPR Journal, 2018, 1(3): 239-250.). Given the sgRNA sequence, EditR uses the Sanger sequencing signal of a DNA sample to output the base editing efficiency at the sgRNA target site, providing a simple, accurate and efficient analysis.
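The core idea behind trace-based quantification is that, at an edited position, the chromatogram shows overlapping peaks for the reference and edited bases. The sketch below uses a simplified peak-height ratio to convey that idea; it is an assumption-laden illustration, not EditR's own method, which additionally models background noise across the trace. The function name and peak heights are hypothetical.

```python
def percent_edited(peaks: dict, ref_base: str, alt_base: str) -> float:
    """Percent of signal at one trace position attributable to the edited base.

    peaks: peak heights of the four bases at the target position, e.g. read
    from the sample's chromatogram. Simplified: no background-noise model.
    """
    ref, alt = peaks[ref_base], peaks[alt_base]
    if ref + alt <= 0:
        raise ValueError("no signal for the reference or alternative base")
    return 100.0 * alt / (ref + alt)


print(percent_edited({"C": 600, "T": 400, "A": 20, "G": 15}, "C", "T"))  # 40.0
```

When the trace is read from the opposite strand, a C-to-T edit appears as a G-to-A change, so `ref_base` and `alt_base` would be "G" and "A".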

(126) The sequencing results are shown in FIGS. 11 and 12. The EditR analysis results are presented in FIGS. 7 and 8. When fused to the conventional cytosine deaminase A1 (APOBEC1), Cas12a (Cpf1) exhibited poor editing efficiency (see, e.g., FIG. 7B, the first column in each group). Fusion with the wild-type hA3A protein greatly increased editing efficiency (see, e.g., the second column). Interestingly, the A3A mutations W98Y, W104A and P134Y, alone or in pairwise combination, further increased editing efficiency (FIG. 7). In addition, the editing window of such a Cas12a-A3A editor can be narrowed to achieve more precise editing when the Y130F or Y132D mutation is additionally included in A3A (FIG. 8).

Example 4. Gene Editing of Human SITE6 with dCas12a-hA3A Base Editors

(127) This example tested various indicated base editors with the human gene SITE6.

(128) The experimental procedure is similar to that of Example 3. The sequencing results are shown in detail in FIGS. 15 and 16 (two experimental replicates). The EditR analysis results are shown in FIGS. 9 and 10. As in Example 3, the Cas12a-A3A editor had greater editing efficiency than the Cas12a-A1 editor, and the A3A mutations W98Y, W104A and P134Y, alone or in pairwise combination, further increased editing efficiency (FIG. 9). In addition, the editing window of such a Cas12a-A3A editor can be narrowed to achieve more precise editing when the Y130F or Y132D mutation is additionally included in A3A (FIG. 10).

Example 5. Gene Editing of Human RUNX1 with dCas12a-hA3A Base Editors

(129) This example tested various indicated base editors with the human gene RUNX1.

(130) The experimental procedure is similar to that of Example 3. The sequencing results are shown in detail in FIGS. 17 and 18 (two experimental replicates). The EditR analysis results are shown in FIGS. 11 and 12. As in Example 3, the Cas12a-A3A editor had greater editing efficiency than the Cas12a-rA1 editor, and the A3A mutations W98Y, W104A and P134Y, alone or in pairwise combination, further increased editing efficiency (FIG. 11). In addition, the editing window of such a Cas12a-A3A editor can be narrowed to achieve more precise editing when the Y130F or Y132D mutation is additionally included in A3A (FIG. 12).

(131) The present disclosure is not to be limited in scope by the specific embodiments described which are intended as single illustrations of individual aspects of the disclosure, and any compositions or methods which are functionally equivalent are within the scope of this disclosure. It will be apparent to those skilled in the art that various modifications and variations can be made in the methods and compositions of the present disclosure without departing from the spirit or scope of the disclosure. Thus, it is intended that the present disclosure cover the modifications and variations of this disclosure provided they come within the scope of the appended claims and their equivalents.

(132) All publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.


CPC classification

C12N2310/20; C12N15/01; C12N15/907; C12N9/78; C12N9/22; C12N2800/80; C12Y305/04; C07K2319/00 (section C: CHEMISTRY; METALLURGY)

International classification

C12N9/78; C12N15/01; C12N15/11; C12N15/90; C12N9/22 (section C: CHEMISTRY; METALLURGY)
