CRISPR/CAS12F ENZYME AND SYSTEM

Abstract

The application belongs to the field of nucleic acid editing, in particular to the field of clustered regularly interspaced short palindromic repeats (CRISPR) technology. In particular, the application provides a Cas effector protein, a fusion protein with the Cas effector protein, and a nucleic acid molecule encoding the same. Also provided are a compound and a composition for nucleic acid editing (e.g., gene or genome editing) with the protein or the nucleic acid molecule, and a method for nucleic acid editing (e.g., gene or genome editing) using the protein.

Claims

1.-50. (canceled)

51. A protein comprising an amino acid sequence of any one of SEQ ID NOs: 1, 2, or 3 with one or more amino acid substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions compared to SEQ ID NOs: 1, 2, or 3) and of at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity compared to any one of SEQ ID NOs: 1, 2, and 3; for example, the protein is an effector protein in the CRISPR/Cas system.

52. A conjugate or a fusion protein comprising the protein of claim 51 and a modified portion, an additional protein, or an additional polypeptide, wherein the modified portion, the additional protein, or the additional polypeptide is selected from a protein, a polypeptide, a detectable label, an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcription activation domain (for example, VP64), a transcription repression domain (for example, KRAB domain or SID domain), a nuclease domain (for example, Fok1), a domain having an activity selected from: nucleotide deaminase, methylase activity, demethylase, transcription activation activity, transcription inhibition activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity, and nucleic acid binding activity, and any combinations thereof.

53. The conjugate or the fusion protein of claim 52, wherein the modified portion, the additional protein, or the additional polypeptide is connected to the N-terminus or C-terminus of the protein through a linker.

54. The conjugate or the fusion protein of claim 52, wherein the conjugate or the fusion protein comprises an NLS sequence and wherein the NLS sequence is shown in SEQ ID NO: 19 and/or the NLS sequence is located at, near, or close to the end of the protein (e.g., N-terminal or C-terminal) or wherein the fusion protein has an amino acid sequence as shown in SEQ ID NO: 20.

55. An isolated nucleic acid molecule comprising a sequence selected from the following or consisting of a sequence selected from the following: (i) a sequence of SEQ ID NO: 7 or 13 with one or more base substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions or additions compared to SEQ ID NO: 7 or 13) and of at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% sequence identity with the sequence as shown in SEQ ID NO:7 or 13; (ii) a sequence that hybridizes to the sequence as described in (i) under stringent conditions; or (iii) a complementary sequence of the sequence as described in (i); in addition, the sequence as described in any one of (i)-(iii) substantially retains the biological function of the sequence from which it is derived; for example, the isolated nucleic acid molecule is RNA; for example, the isolated nucleic acid molecule is a direct repeat sequence in the CRISPR/Cas system.

56. The isolated nucleic acid molecule of claim 55, wherein the nucleic acid molecule comprises one or more stem loops or optimized secondary structures; for example, the sequence as described in any one of (i) to (iii) retains the secondary structure of the sequence from which it is derived.

57. A complex comprising: (i) a protein component having an amino acid sequence of SEQ ID NOs: 1, 2, or 3 with one or more amino acid substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions compared to SEQ ID NOs: 1, 2, or 3) and of at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity compared to any one of SEQ ID NOs: 1, 2, and 3; and (ii) a nucleic acid component, which comprises the isolated nucleic acid molecule of claim 55 and a targeting sequence capable of hybridizing to the target sequence from 5′ to 3′ direction, wherein the protein component and the nucleic acid component combine with each other to form a complex; for example, the nucleic acid component is a guide RNA in the CRISPR/Cas system; for example, the nucleic acid molecule is RNA; for example, the complex does not contain trans-activating crRNA (tracrRNA).

58. The complex of claim 57, wherein the targeting sequence is attached to the 3′ end of the nucleic acid molecule or wherein the targeting sequence comprises a complementary sequence of the target sequence.

59. An isolated nucleic acid molecule comprising: (i) a nucleotide sequence encoding the protein component of claim 57; (ii) a nucleotide sequence encoding the nucleic acid component of claim 57; and/or (iii) a nucleotide sequence containing (i) and (ii); for example, the nucleotide sequence described in any one of (i) to (iii) is codon-optimized for expression in a prokaryotic cell or an eukaryotic cell.

60. A vector comprising the isolated nucleic acid molecule of claim 59.

61. A host cell comprising the vector of claim 60.

62. A composition comprising: (i) a first component, which is selected from: a protein having an amino acid sequence of any one of SEQ ID NOs: 1, 2, or 3 with one or more amino acid substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions compared to SEQ ID NOs: 1, 2, or 3) and of at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity compared to any one of SEQ ID NOs: 1, 2, and 3, a nucleotide sequence encoding the protein, and any combinations thereof; and (ii) a second component, which is a nucleotide sequence containing a guide RNA, or a nucleotide sequence encoding the nucleotide sequence containing a guide RNA; wherein the guide RNA includes a direct repeat sequence and a targeting sequence from the 5′ to 3′, and the targeting sequence can hybridize with the target sequence; the targeting RNA can form a complex with the protein, conjugate or fusion protein as described in (i); the direct repeat sequence is an isolated nucleic acid molecule as defined in claim 55; for example, the composition does not contain a trans-activating crRNA (tracrRNA).

63. A composition comprising one or more vectors comprising: (i) a first nucleic acid, which is a nucleotide sequence encoding a protein having an amino acid sequence of any one of SEQ ID NOs: 1, 2, or 3, with one or more amino acid substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions compared to SEQ ID NOs: 1, 2, or 3) and of at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity compared to any one of SEQ ID NOs: 1, 2, and 3; optionally, the first nucleic acid is operationally linked to a first regulatory element; and (ii) a second nucleic acid, which encodes a nucleotide sequence comprising a guide RNA; optionally the second nucleic acid is operationally linked to a second regulatory element; wherein: the first nucleic acid and the second nucleic acid are present on the same or different vectors; the guide RNA includes a direct repeat sequence and a targeting sequence from the 5′ to 3′, and the targeting sequence can hybridize with the target sequence; the guide RNA can form a complex with the effector protein or fusion protein as described in (i); the direct repeat sequence is an isolated nucleic acid molecule as defined in claim 55; for example, the composition does not contain a trans-activating crRNA (tracrRNA).

64. The composition of claim 63, wherein the first regulatory element and/or the second regulatory element is a promoter, such as an inducible promoter and/or wherein at least one component of the composition is non-naturally occurring or modified and/or wherein the targeting sequence is connected to the 3′ end of the direct repeat sequence and/or wherein the targeting sequence comprises a complementary sequence of the target sequence and/or wherein when the target sequence is DNA, the target sequence is located at the 3′ end of the protospacer adjacent motif (PAM), and the PAM has a sequence shown by 5′-TTN, wherein N is selected from A, G, T, C; when the target sequence is RNA, the target sequence does not have PAM domain restrictions and/or wherein the target sequence is a DNA or RNA sequence derived from a prokaryotic cell or an eukaryotic cell or wherein the target sequence is a non-naturally occurring DNA or RNA sequence and/or wherein the target sequence is present in a cell; for example, the target sequence is present in a nucleus or in a cytoplasm (e.g., organelle); for example, the cell is an eukaryotic cell; for example, the cell is a prokaryotic cell and/or wherein the protein is linked to one or more NLS sequences, wherein the NLS sequence is connected to the N-terminus or C-terminus of the protein or the NLS sequence is fused to the N-terminus or C-terminus of the protein.

65. A kit comprising the protein of claim 51.

66. A delivery composition comprising a delivery vehicle and the protein of claim 51, wherein the delivery vehicle is selected from a particle, a lipid particle, sugar particle, metal particle, protein particle, liposome, exosome, microvesicle, gene gun, or viral vector (e.g., replication defective retrovirus, lentivirus, adenovirus or adeno-associated virus).

67. A method for modifying a target gene, comprising contacting the complex of claim 57 with the target gene, or delivering that to a cell containing the target gene, wherein the target sequence is present in the target gene, wherein the target gene is present in a cell selected from a prokaryotic cell, a eukaryotic cell, a mammalian cell, a human cell, and a plant cell and/or wherein the target gene is present in a nucleic acid molecule (e.g., a plasmid) in vitro and/or wherein the modification refers to a break in the target sequence, such as a double-strand break in DNA or a single-strand break in RNA; for example, the modification also includes the insertion of an exogenous nucleic acid into the break.

68. A method for altering the expression of a gene product, comprising combining the complex of claim 57 with a nucleic acid molecule encoding the gene product, or delivering that to a cell containing the nucleic acid molecule in which the target sequence is present, wherein the nucleic acid molecule is present in a cell selected from a prokaryotic cell, an eukaryotic cell, a mammalian cell, a human cell, and a plant cell and/or wherein the nucleic acid molecule is present in a nucleic acid molecule (e.g., a plasmid) in vitro and/or wherein the expression of the gene product is altered (e.g., enhanced or reduced), preferably the gene product is a protein and/or wherein the protein, isolated nucleic acid molecule, complex, vector or composition is contained in a delivery vehicle; wherein the delivery vehicle is selected from a lipid particle, sugar particle, metal particle, protein particle, liposome, exosome, viral vector (such as replication-defective retrovirus, lentivirus, adenovirus or adeno-associated virus) and/or wherein the method is used to change one or more target sequences in a target gene or a nucleic acid molecule encoding a target gene product to modify a cell, cell line, or organism.

69. A cell or progeny thereof obtained by the method of claim 67, wherein the cell contains a modification that is not present in its wild type.

70. A cell product of the cell or progeny thereof of claim 69.

71. An in vitro, isolated or in vivo cell or cell line or progeny thereof, the cell or cell line or the progeny thereof comprising the protein of claim 51; for example, the cell is an eukaryotic cell; for example, the cell is an animal cell (for example, a mammalian cell, such as a human cell) or a plant cell; for example, the cell is a stem cell or stem cell line.

72. Use of the protein of claim 51 in a nucleic acid editing (for example, gene or genome editing); for example, the gene or genome editing includes modifying genes, knocking out genes, altering the expression of gene products, repairing mutations, and/or inserting polynucleotides.

73. Use of the protein of claim 51 in the preparation of a preparation for (i) isolated gene or genome editing; (ii) detection of an isolated single-stranded DNA; (iii) editing a target sequence in a target locus to modify a biological or non-human organism; or (iv) treating a disease caused by defects in the target sequence in the target locus.

Description

DESCRIPTION OF THE DRAWINGS

[0196] FIG. 1 is the result of the crRNA structure analysis of Cas12f.4, Cas12f.5 and Cas12f.6 in Example 2, showing the secondary structure of the Repeat sequence.

[0197] FIG. 2 shows the analysis result of the PAM domain in Example 3.

[0198] FIGS. 3a-3c are the results of the detection of the cleavage activity of Cas12f.4 in a human cell line in Example 4.

[0199] FIGS. 4a-4c are the results of the detection of the cleavage activity of Cas12f.4 in a maize protoplast cell in Example 5.

SEQUENCE INFORMATION

[0200] Information on partial sequences involved in the present invention is provided in Table 1 below.

TABLE 1. Description of the sequences

SEQ ID NO:   Description
1            an amino acid sequence of Cas12f.4
2            an amino acid sequence of Cas12f.5
3            an amino acid sequence of Cas12f.6
4            a coding nucleic acid sequence of Cas12f.4
5            a coding nucleic acid sequence of Cas12f.5
6            a coding nucleic acid sequence of Cas12f.6
7            Cas12f.4/prototype direct repeat
8            Cas12f.5/prototype direct repeat
9            Cas12f.6/prototype direct repeat
10           Cas12f.4/a coding nucleic acid sequence of prototype direct repeat
11           Cas12f.5/a coding nucleic acid sequence of prototype direct repeat
12           Cas12f.6/a coding nucleic acid sequence of prototype direct repeat
13           Cas12f.4/mature direct repeat
14           Cas12f.5/mature direct repeat
15           Cas12f.6/mature direct repeat
16           Cas12f.4/a coding nucleic acid sequence of mature direct repeat
17           Cas12f.5/a coding nucleic acid sequence of mature direct repeat
18           Cas12f.6/a coding nucleic acid sequence of mature direct repeat
19           NLS sequence
20           an amino acid sequence of Cas12f.4-NLS fusion protein
21           an amino acid sequence of Cas12f.5-NLS fusion protein
22           an amino acid sequence of Cas12f.6-NLS fusion protein
23           a plasmid expressing Cas12f.4 system
24           PAM library sequence
25           guide RNA-VEGFA of Cas12f.4 system
26           guide RNA-VEGFA of Cas12f.5 system
27           guide RNA-VEGFA of Cas12f.6 system
28           guide RNA-PDI1 of Cas12f.4 system
29           guide RNA-SBE2.2 of Cas12f.4 system

DETAILED DESCRIPTION

[0201] The invention will now be described with reference to the following examples which are intended to illustrate the present invention rather than limit the present invention.

[0202] Unless otherwise specified, the experiments and methods described in the examples are basically performed according to conventional methods well known in the art and described in various references. For example, conventional techniques such as immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA used in the present invention can be found in Sambrook, Fritsch and Maniatis, “MOLECULAR CLONING: A LABORATORY MANUAL”, 2nd edition (1989); “CURRENT PROTOCOLS IN MOLECULAR BIOLOGY” (edited by F. M. Ausubel et al., (1987)); “METHODS IN ENZYMOLOGY” series (Academic Publishing Company): “PCR 2: A PRACTICAL APPROACH” (edited by M. J. MacPherson, B D Hames and G. R. Taylor (1995)), “ANTIBODIES, A LABORATORY MANUAL”, edited by Harlow and Lane (1988), and “ANIMAL CELL CULTURE” (edited by R. I. Freshney (1987)).

[0203] In addition, if the specific conditions are not specified in the examples, it shall be carried out in accordance with the conventional conditions or the conditions recommended by the manufacturer. The reagents or instruments used without the manufacturer's indication are all conventional products that can be purchased commercially. Those skilled in the art know that the embodiments describe the present invention by way of example, and are not intended to limit the scope of protection claimed by the present invention. All publications and other references mentioned in this article are incorporated into this article by reference in their entirety.

[0204] The sources of some reagents involved in the following examples are as follows:

[0205] LB liquid medium: 10 g Tryptone, 5 g Yeast Extract, 10 g NaCl, diluted to 1 L, and sterilized. If antibiotics are needed, they are added at a final concentration of 50 μg/ml after cooling the medium.

[0206] Chloroform/isoamyl alcohol: adding 240 ml of chloroform to 10 ml of isoamyl alcohol and mixing them well.

[0207] RNP buffer: 100 mM sodium chloride, 50 mM Tris-HCl, 10 mM MgCl2, 100 μg/ml BSA, pH 7.9.

[0208] The prokaryotic expression vectors pACYC-Duet-1 and pUC19 are purchased from Beijing Quanshijin Biotechnology Co., Ltd.

[0209] E. coli competence EC100 is purchased from Epicentre company.

Example 1. Acquisition of Cas12f Gene and Cas12f Guide RNA

[0210] 1. CRISPR and gene annotation: Prodigal was used to perform gene annotation on the microbial genome and metagenomic data of the NCBI and JGI databases to obtain all proteins, and Piler-CR was used to annotate CRISPR loci. Default parameters were used throughout.

[0211] 2. Protein filtering: Redundancy among the annotated proteins was eliminated by sequence identity, that is, proteins with exactly identical sequences were removed, and proteins longer than 800 amino acids were classified as macromolecular proteins. Since all effector proteins of class 2 CRISPR/Cas systems discovered so far are more than 900 amino acids in length, only macromolecular proteins were considered when exploring CRISPR effector proteins, in order to reduce the computational complexity.

[0212] 3. Obtaining CRISPR-associated macromolecular proteins: extending each CRISPR locus by 10 Kb upstream and downstream, and identifying non-redundant macromolecular proteins in the adjacent interval of CRISPR.

[0213] 4. Clustering of CRISPR-associated macromolecular proteins: BLASTP was used to perform all-against-all pairwise comparisons of the non-redundant macromolecular CRISPR-associated proteins, and comparison results with Evalue<1E-10 were output. MCL was used to perform cluster analysis on the BLASTP output to obtain CRISPR-associated protein families.

[0214] 5. Identification of CRISPR-enriched macromolecular protein families: BLASTP was used to compare the proteins of each CRISPR-associated protein family to the non-redundant macromolecular protein database from which the CRISPR-associated proteins had been removed, and comparison results with Evalue<1E-10 were output. If the homologous proteins found in the non-CRISPR-associated protein database are less than 100%, the proteins of this family are considered to be enriched in the CRISPR region. In this way, we identified the CRISPR-enriched macromolecular protein families.

[0215] 6. Annotation of protein functions and domains: using the Pfam database, NR database and Cas protein collected from NCBI to annotate the CRISPR-enriched macromolecular protein family to obtain a new CRISPR/Cas protein family. Using Mafft to perform multiple sequence alignments for each CRISPR/Cas family protein, and then using JPred and HHpred to perform conserved domain analysis to identify protein families containing RuvC domains.

[0216] On this basis, the present inventors have obtained a new Cas effector protein, namely Cas12f, with three active homologue sequences, named Cas12f.4 (SEQ ID NO: 1), Cas12f.5 (SEQ ID NO: 2) and Cas12f.6 (SEQ ID NO: 3), respectively. The coding DNA sequences of the three homologues are shown in SEQ ID NOs: 4, 5, and 6, respectively. The prototype direct repeat sequences (repeat sequences contained in pre-crRNA) corresponding to Cas12f.4, Cas12f.5, and Cas12f.6 are shown in SEQ ID NOs: 7, 8, and 9, respectively. The mature direct repeat sequences (repeat sequences contained in mature crRNA) corresponding to Cas12f.4, Cas12f.5, and Cas12f.6 are shown in SEQ ID NOs: 13, 14, and 15, respectively.
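By way of illustration only, the filtering described in steps 2 and 3 above (length cutoff, proximity to a CRISPR locus, removal of identical sequences) can be sketched as follows. The record types `Protein` and `CrisprLocus` and all function names are hypothetical and are not part of Prodigal or Piler-CR; coordinates are assumed to have already been parsed from the annotation output. The order of the three filters is a simplification made here and is not stated in the examples.

```python
from dataclasses import dataclass

MIN_LENGTH_AA = 800   # length cutoff for "macromolecular" proteins (step 2)
FLANK_BP = 10_000     # extension upstream and downstream of each locus (step 3)


@dataclass(frozen=True)
class Protein:
    """Hypothetical record for one annotated protein."""
    name: str
    contig: str
    start: int
    end: int
    sequence: str


@dataclass(frozen=True)
class CrisprLocus:
    """Hypothetical record for one annotated CRISPR array."""
    contig: str
    start: int
    end: int


def near_locus(protein, locus, flank=FLANK_BP):
    """True if the protein overlaps the locus extended by `flank` bp on each side."""
    if protein.contig != locus.contig:
        return False
    return protein.end >= locus.start - flank and protein.start <= locus.end + flank


def deduplicate(proteins):
    """Keep the first protein seen for each exactly identical sequence."""
    seen = set()
    unique = []
    for protein in proteins:
        if protein.sequence not in seen:
            seen.add(protein.sequence)
            unique.append(protein)
    return unique


def crispr_associated_large_proteins(proteins, loci):
    """Large, non-redundant proteins located within the flank of any CRISPR locus."""
    large = [p for p in proteins if len(p.sequence) > MIN_LENGTH_AA]
    adjacent = [p for p in large if any(near_locus(p, locus) for locus in loci)]
    return deduplicate(adjacent)
```

The resulting list would then be passed to the pairwise comparison and clustering of steps 4 and 5, which are carried out with BLASTP and MCL and are not reproduced here.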

Example 2. Processing of Mature crRNA by Cas12f Gene

[0217] 1. The double-stranded DNA molecule as shown in SEQ ID NO: 4 was artificially synthesized, and the double-stranded DNA molecule as shown in SEQ ID NO: 10 was artificially synthesized at the same time.

[0218] 2. Connecting the double-stranded DNA molecule synthesized in step 1 with the prokaryotic expression vector pACYC-Duet-1 to obtain the recombinant plasmid pACYC-Duet-1+CRISPR/Cas12f.

[0219] The recombinant plasmid pACYC-Duet-1+CRISPR/Cas12f was sequenced. Sequencing results show that the recombinant plasmid pACYC-Duet-1+CRISPR/Cas12f contains the sequences as shown in SEQ ID NO: 4 and SEQ ID NO: 10, and expresses the Cas12f.4 protein as shown in SEQ ID NO: 1 and the Cas12f.4 prototype direct repeat sequence as shown in SEQ ID NO: 7. The recombinant plasmid pACYC-Duet-1+CRISPR/Cas12f was introduced into E. coli EC100 to obtain a recombinant bacterium, which was named EC100-CRISPR/Cas12f.

[0220] 3. Taking a single clone of EC100-CRISPR/Cas12f, inoculating it into 100 mL LB liquid medium (containing 50 μg/mL ampicillin), culturing with shaking at 37° C. and 200 rpm for 12 h to obtain a culture broth.

[0221] 4. Extracting bacterial RNA: 1.5 mL of bacterial culture was transferred to a pre-cooled microcentrifuge tube and centrifuged at 6000×g for 5 minutes at 4° C. The supernatant was discarded, and the cell pellet was resuspended in 200 μL of Max Bacterial Enhancement Reagent preheated to 95° C., mixed well by pipetting, and incubated at 95° C. for 4 minutes. 1 mL of TRIzol® Reagent was added to the lysate, mixed by pipetting, and incubated at room temperature for 5 minutes. 0.2 mL of cold chloroform was added, and the tube was shaken by hand for 15 seconds and incubated at room temperature for 2-3 minutes. The sample was centrifuged at 12,000×g for 15 minutes at 4° C. 600 μL of supernatant was transferred to a new tube, 0.5 mL of cold isopropanol was added to precipitate the RNA, and the tube was mixed by inversion and incubated at room temperature for 10 minutes. The sample was centrifuged at 15,000×g for 10 minutes at 4° C., the supernatant was discarded, and 1 mL of 75% ethanol was added and mixed by vortexing. The sample was centrifuged at 7500×g for 5 minutes at 4° C., the supernatant was discarded, and the pellet was air-dried. The RNA pellet was dissolved in 50 μL of RNase-free water and incubated at 60° C. for 10 minutes.

[0222] 5. DNA digestion: 20 μg of RNA was dissolved in 39.5 μL dH2O, heated at 65° C. for 5 min, and placed on ice for 5 min. 0.5 μL RNAI, 5 μL buffer, and 5 μL DNaseI were added (50 μL system), and the reaction was incubated at 37° C. for 45 min. 50 μL dH2O was added to adjust the volume to 100 μL. A 2 mL Phase-Lock tube was centrifuged at 16000 g for 30 s, after which 100 μL of phenol:chloroform:isoamyl alcohol (25:24:1) and 100 μL of digested RNA were added, shaken for 15 s, and centrifuged at 16000 g for 12 min at 15° C. The supernatant was transferred to a new 1.5 mL centrifuge tube, a volume of isopropanol equal to that of the supernatant and 1/10 volume of NaOAc were added, and the mixture was incubated for 1 hour or at −20° C. overnight. The sample was centrifuged at 16000 g for 30 min at 4° C., and the supernatant was discarded. 350 μL of 75% ethanol was added to wash the pellet, the sample was centrifuged at 16000 g for 10 min at 4° C., and the supernatant was discarded. The pellet was dried and dissolved in 20 μL RNase-free water at 65° C. for 5 min. The concentration was measured with NanoDrop, and the sample was run on a gel.

[0223] 6. 3′ dephosphorylation and 5′ phosphorylation: Water was added to ˜20 μg of each digested RNA to a volume of 42.5 μL, and the sample was heated at 90° C. for 2 min and cooled on ice for 5 minutes. 5 μL 10×T4 PNK buffer, 0.5 μL RNaI, and 2 μL T4 PNK were added (50 μL), and the reaction was incubated at 37° C. for 6 h. 1 μL T4 PNK and 1.25 μL ATP (100 mM) were then added, and the reaction was incubated at 37° C. for 1 h. 47.75 μL dH2O was added to adjust the volume to 100 μL. A 2 mL Phase-Lock tube was centrifuged at 16000 g for 30 s, after which 100 μL of phenol:chloroform:isoamyl alcohol (25:24:1) and 100 μL of digested RNA were added, shaken for 15 s, and centrifuged at 16000 g for 12 min at 15° C. The supernatant was transferred to a new 1.5 mL centrifuge tube, a volume of isopropanol equal to that of the supernatant and 1/10 of the total volume of NaOAc were added, and the mixture was incubated for 1 hour or at −20° C. overnight. The sample was centrifuged at 16000 g for 30 min at 4° C., and the supernatant was discarded. 350 μL of 75% ethanol was added to wash the pellet, the sample was centrifuged at 16000 g for 10 min at 4° C., and the supernatant was discarded. The pellet was dried and dissolved in 21 μL RNase-free water at 65° C. for 5 min, and the concentration was measured with NanoDrop.

[0224] 7. RNA monophosphorylation: 20 μL RNA was heated at 90° C. for 1 min and cooled on ice for 5 min. 2 μL RNA 5′ Polyphosphatase 10×Reaction buffer, 0.5 μL Inhibitor, and 1 μL RNA 5′ Polyphosphatase (20 Units) were added, RNase-free water was added to 20 μL, and the reaction was incubated at 37° C. for 60 min. 80 μL dH2O was added to adjust the volume to 100 μL. A 2 mL Phase-Lock tube was centrifuged at 16000 g for 30 s, after which 100 μL of phenol:chloroform:isoamyl alcohol (25:24:1) and 100 μL of digested RNA were added, shaken for 15 s, and centrifuged at 16000 g for 12 min at 15° C. The supernatant was transferred to a new 1.5 mL centrifuge tube, a volume of isopropanol equal to that of the supernatant and 1/10 of the total volume of NaOAc were added, and the mixture was incubated for 1 hour or at −20° C. overnight. The sample was centrifuged at 16000 g for 30 min at 4° C., the supernatant was discarded, 350 μL of 75% ethanol was added to wash the precipitate, the sample was centrifuged at 16000 g for 10 min at 4° C., and the supernatant was discarded. The pellet was dried and dissolved in 21 μL RNase-free water at 65° C. for 5 min, and the concentration was measured with NanoDrop.

[0225] 8. Preparation of cDNA library: The following were combined in a total volume of 50 μL: 16.5 μL RNase-free water; 5 μL Poly(A) Polymerase 10×Reaction buffer; 5 μL 10 mM ATP; 1.5 μL RiboGuard RNase Inhibitor; 20 μL RNA Substrate; and 2 μL Poly(A) Polymerase (4 Units). The reaction was incubated at 37° C. for 20 minutes. 50 μL dH2O was added to adjust the volume to 100 μL. A 2 mL Phase-Lock tube was centrifuged at 16000 g for 30 s, after which 100 μL of phenol:chloroform:isoamyl alcohol (25:24:1) and 100 μL of digested RNA were added, shaken for 15 s, and centrifuged at 16000 g for 12 min. The supernatant was transferred to a new 1.5 mL centrifuge tube, a volume of isopropanol equal to that of the supernatant and 1/10 of the total volume of NaOAc were added, and the mixture was incubated for 1 hour or at −20° C. overnight. The sample was centrifuged at 16000 g for 30 min at 4° C., the supernatant was discarded, and the pellet was dried and dissolved in 11 μL RNase-free water at 65° C. for 5 min. The concentration was measured with NanoDrop.

[0226] 9. Adding the sequencing linker to the cDNA library and sending it to Beijing berrygenomics for sequencing.

[0227] 10. Performing quality filtering on the original data to remove sequences with an average base quality value lower than 30. After removing the linker from the sequence, the RNA sequence from 25 nt to 50 nt was retained, and aligned to the reference sequence of the CRISPR array with bowtie.

[0228] 11. Through this alignment, we have found that the pre-crRNA of Cas12f.4 can be successfully processed into a 45 nt mature crRNA in E. coli, which consists of a 23 nt Repeat sequence and a 19-22 nt targeting sequence.

12. ViennaRNA and VARNA were used to predict and visualize the structure of the mature crRNA. We have found that the 3′ end of the Repeat sequence of the crRNA can form an 8-base stem loop (FIG. 1).

[0229] 13. After predicting the 23 nt sequence of the 3′ end of the crRNA of Cas12f.5 and Cas12f.6, we have found a similar secondary structure (FIG. 1).
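By way of illustration only, the read filtering described in step 10 above (mean base quality of at least 30 on the raw read, then a length of 25 nt to 50 nt after linker removal) can be sketched as follows. The function names are hypothetical, the Phred+33 quality encoding is an assumption not stated in the example, and linker removal and the bowtie alignment are assumed to be carried out by separate tools.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality_line:
        return 0.0
    return sum(ord(char) - offset for char in quality_line) / len(quality_line)


def passes_quality(raw_quality_line, min_mean_q=30):
    """Reads with a mean base quality lower than `min_mean_q` are removed."""
    return mean_phred(raw_quality_line) >= min_mean_q


def passes_length(trimmed_sequence, min_len=25, max_len=50):
    """After linker removal, only reads of 25 nt to 50 nt are retained."""
    return min_len <= len(trimmed_sequence) <= max_len


def filter_reads(reads):
    """Filter (raw_quality_line, trimmed_sequence) pairs; returns the kept sequences."""
    return [
        trimmed
        for raw_quality, trimmed in reads
        if passes_quality(raw_quality) and passes_length(trimmed)
    ]
```

The retained sequences correspond to the reads that are subsequently aligned to the reference sequence of the CRISPR array.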

Example 3. Identification of the PAM Domain of the Cas12f Gene

[0230] 1. The recombinant plasmid pACYC-Duet-1+CRISPR/Cas12f was constructed and sequenced. According to the sequencing results, the structure of the recombinant plasmid pACYC-Duet-1+CRISPR/Cas12f is described as follows: the small fragment between the recognition sequences of the restriction endonucleases Pml I and Kpn I of the vector pACYC-Duet-1 was replaced with the double-stranded sequence shown at positions 1 to 3713 from the 5′ end in the sequence as shown in SEQ ID NO: 4. The recombinant plasmid pACYC-Duet-1+CRISPR/Cas12f expresses the Cas12f.4 protein as shown in SEQ ID NO: 1 and the Cas12f guide RNA as shown in SEQ ID NO: 25.

[0231] 2. The recombinant plasmid pACYC-Duet-1+CRISPR/Cas12f contains an expression cassette, and the nucleotide sequence of the expression cassette is shown in SEQ ID NO: 23. In the sequence as shown in SEQ ID NO: 23, positions 1 to 44 from the 5′ end are the nucleotide sequence of the pLacZ promoter, positions 45 to 3326 are the nucleotide sequence of the Cas12f.4 gene, and positions 3327 to 3412 are the nucleotide sequence of the terminator (used to terminate transcription). From the 5′ end, positions 3413 to 3452 are the nucleotide sequence of the J23119 promoter, positions 3453 to 3,628 are the nucleotide sequence of the CRISPR array, and positions 3627 to 3713 are the nucleotide sequence of the rrnB-T1 terminator (used to terminate transcription).

[0232] 3. The acquisition of the recombinant E. coli: the recombinant plasmid pACYC-Duet-1+CRISPR/Cas12f was introduced into E. coli EC100 to obtain recombinant E. coli, named EC100/pACYC-Duet-1+CRISPR/Cas12f. The recombinant plasmid pACYC-Duet-1 was introduced into E. coli EC100 to obtain a recombinant E. coli named EC100/pACYC-Duet-1.

[0233] 4. Construction of the PAM library: the sequence shown in SEQ ID NO: 24 is artificially synthesized and connected to the pUC19 vector, wherein the sequence as shown in SEQ ID NO: 24 includes eight random bases at the 5′ end and the target sequence. Eight random bases were designed in front of the 5′ end of the target sequence of the PAM library to construct a plasmid library. The plasmids were transferred into Escherichia coli containing the Cas12f.4 locus and Escherichia coli without the Cas.12f.4 locus, respectively. After treatment at 37° C. for 1 hour, we extracted the plasmid, and performed PCR amplification and sequencing on the sequence of the PAM region.

[0234] 5. The acquisition of the PAM library domain: the number of occurrences of 65,536 combinations of PAM sequences in the experimental group and the control group were counted, and the number of PAM sequences in each group was used for normalization. For any PAM sequence, when the log 2 (normalized value of the control group/normalized value of the experimental group) is greater than 3.5, we deem that this PAM is significantly consumed. We obtained a total of 3,548 significantly consumed PAM sequences, all accounting for 5.41%. We used Weblogo to predict the significantly consumed PAM sequence and found that the PAM domain of Cas12f.4 was a strict 5′-TTN structure (FIG. 2), and almost 100% of the second and third bases in front of the target sequence were T, and the other positions can be any sequence. This is a more rigorous PAM recognition method than C2c1, which has been reported for the most rigorous PAM recognition.

[0235] 6. Verification of the PAM domain: through the PAM library consumption experiment, we obtained the PAM domain of Cas12f.4. In order to verify the stringency of this domain, we set up 10 groups of PAMs for in vivo experiments and tested the editing activity of Cas12f on these PAMs. First, we integrated the 30 nt target and PAM sequence into a non-conserved position of the kanamycin resistance gene of the plasmid, and then mixed it with the complex formed by CRISPR/Cas12f and guide RNA for 8 hours. By plating and counting the number of colonies, we can judge the consumption activity of Cas12f on the different PAM sequences. The experimental results show that the CRISPR/Cas12f.4 system effectively edits only target sequences with a 5′-TTA, 5′-TTT, 5′-TTC or 5′-TTG PAM, and has no editing activity on target sequences with a 5′-TAT, 5′-TCT, 5′-TCG, 5′-ATT, 5′-CTT or 5′-GTT PAM, thus verifying the stringency of the PAM recognition of Cas12f.4. By counting the colonies for the different PAMs, we found that the editing activity of the CRISPR/Cas12f.4 system on 5′-TTA, 5′-TTT and 5′-TTC is higher than that on 5′-TTG.
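The relationship between the ten tested PAMs and the 5′-TTN motif can be checked with a short pattern match. The following Python sketch is for illustration only: the helper name `pam_matches` is hypothetical, and the sketch tests motif membership only; it does not model editing activity.

```python
import re

# Minimal IUPAC lookup covering the symbols needed for the 5'-TTN motif.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

# The ten PAMs tested in the verification experiment, in the order listed.
TESTED_PAMS = ["TTA", "TTT", "TTC", "TTG", "TAT", "TCT", "TCG", "ATT", "CTT", "GTT"]


def pam_matches(pam, motif="TTN"):
    """Return True when `pam` (written 5' to 3') fits the IUPAC `motif`."""
    pattern = "".join(IUPAC[symbol] for symbol in motif)
    return re.fullmatch(pattern, pam) is not None


if __name__ == "__main__":
    for pam in TESTED_PAMS:
        label = "matches TTN" if pam_matches(pam) else "does not match TTN"
        print(f"5'-{pam}: {label}")
```

The four PAMs that fit the motif are the four on which editing was observed, and the six that do not fit are the six with no editing activity.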

Example 4. Cas12f.4, Cas12f.5, Cas12f.6 Cleavage in Human Cell Lines

[0236] The eukaryotic expression vector containing the Cas12f.4 gene and the PCR product containing the U6 promoter and crRNA (SEQ ID NO: 25) sequence were transfected into human HEK293T cells by liposome transfection (FIG. 3a), and the cells were incubated for 72 hours at 37° C. with 5% carbon dioxide. Total cellular DNA was extracted, and the 700 bp sequence containing the target site was amplified. A next-generation sequencing library was constructed from the PCR products using Tn5, and the sequencing was completed by Beijing Annoroad Genomics Technology Co., Ltd. The sequencing results were aligned to the VEGFA gene of the human genome, and the cleavage pattern of Cas12f.4 at the target site was identified (FIG. 3b). The editing efficiency of the CRISPR/Cas12f.4 system for VEGFA can reach 4.2%. The original sequencing data is shown in FIG. 3c.

[0237] The same method was used to detect the cleavage activity of Cas12f.5 and Cas12f.6 on VEGFA; their crRNAs are shown in SEQ ID NO: 26 and SEQ ID NO: 27, respectively. The results in FIG. 3c show that the editing efficiencies of the CRISPR/Cas12f.5 and CRISPR/Cas12f.6 systems on VEGFA are 0.31% and 0.19%, respectively.
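An editing efficiency of this kind is a fraction of sequencing reads. The following Python sketch shows one crude way to compute such a percentage; it is not the pipeline actually used. The function name `editing_efficiency`, the placeholder target `GATTACA` and the toy reads are hypothetical. Treating any read that lacks the intact target as edited is a simplifying assumption, since sequencing errors would also be counted.

```python
def editing_efficiency(reads, wild_type_target):
    """Return the percentage of reads in which the intact target is absent.

    Simplifying assumption of this sketch: a read counts as edited whenever
    the unmodified target sequence cannot be found in it.
    """
    if not reads:
        raise ValueError("no reads supplied")
    edited = sum(1 for read in reads if wild_type_target not in read)
    return 100.0 * edited / len(reads)


# Toy amplicon reads; the second read carries a 1-nt deletion in the target.
TOY_READS = [
    "AAGATTACATT",
    "AAGATACATT",
    "AAGATTACATT",
    "AAGATTACATT",
]

if __name__ == "__main__":
    print(f"{editing_efficiency(TOY_READS, 'GATTACA'):.1f}% of toy reads edited")
```

In practice an alignment-based indel caller restricted to a window around the cleavage site would be more robust than a substring test.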

Example 5. Cleavage of Cas12f.4 in a Maize Protoplast

[0238] The purified Cas12f.4 protein (60 μg) and the guide RNA (120 μg) shown in SEQ ID NO: 28 or 29 were mixed at 37° C. to form a ribonucleoprotein complex (RNP). The CRISPR/Cas12f.4 RNP was then transferred into maize protoplasts by PEG4000-mediated protoplast transformation, and the protoplasts were cultured in the dark at 37° C. for 24 hours (FIG. 4a). After the culture, the protoplasts were collected by centrifugation, the supernatant was discarded, and the protoplast DNA was extracted. DNA fragments of about 600 bp covering the regions upstream and downstream of the target site were amplified. The DNA fragment containing the target site was subjected to T7 endonuclease digestion, and the result is shown in FIG. 4b. The CRISPR/Cas12f.4 system has high-efficiency cleavage activity for PDI1 and SEB2.2. The DNA fragment containing the target site was ligated into the Blunt Simple vector and plated, single clones were Sanger sequenced by Thermo Fisher Scientific (China) Co., Ltd., and the sequencing results were aligned to the PDI1 and SEB2.2 genes in the maize genome; the results are shown in FIGS. 4b-4c. The cleavage efficiency of Cas12f.4 at the target sites was determined to be 33.5% and 16.7%, respectively.
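Two simple calculations relate to the readouts in this example. The Python sketch below is illustrative only and rests on assumptions. `clone_efficiency` expresses a clone-based efficiency as the share of sequenced clones carrying a mutation, and the clone counts in the usage example are invented, not the counts behind the reported percentages. `t7e1_indel_percent` applies a commonly used approximation for T7 endonuclease assays, indel % = 100 × (1 − √(1 − fraction cleaved)); the text does not state that this formula was applied. Both function names and all band intensities are hypothetical.

```python
import math


def clone_efficiency(mutant_clones, total_clones):
    """Percentage of Sanger-sequenced clones that carry a mutation at the target."""
    if total_clones <= 0:
        raise ValueError("total_clones must be positive")
    return 100.0 * mutant_clones / total_clones


def t7e1_indel_percent(parent_band, cleaved_bands):
    """Estimate indel frequency from gel band intensities.

    Uses the common approximation 100 * (1 - sqrt(1 - fraction_cleaved)),
    where fraction_cleaved is the cleaved signal divided by the total signal.
    This formula is an assumption of the sketch, not taken from the text.
    """
    cleaved = sum(cleaved_bands)
    total = parent_band + cleaved
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = cleaved / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


if __name__ == "__main__":
    # Invented example values, for illustration only.
    print(f"{clone_efficiency(5, 30):.2f}% of clones mutated")
    print(f"{t7e1_indel_percent(50.0, [25.0, 25.0]):.2f}% estimated indels")
```

The square root in the T7 endonuclease estimate accounts for the fact that heteroduplexes form between edited and unedited strands after re-annealing, so the cleaved fraction overstates the proportion of edited molecules.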

[0239] Although the specific embodiments of the present invention have been described in detail, those skilled in the art will understand that various modifications and changes can be made to the details in light of all the teachings that have been published, and that these changes are within the protection scope of the present invention. The full scope of the present invention is given by the appended claims and any equivalents thereof.

CPC classification (CHEMISTRY; METALLURGY)

C12N2310/20; C12N15/111; C07K2319/09; C07K2319/40; C12N15/62; C12N9/22; C07K2319/71; C07K2319/01; C12N15/90; C12N15/902; C07K2319/70

International classification (CHEMISTRY; METALLURGY)

C12N15/90; C12N15/11; C12N15/62; C12N9/22
