Novel CRISPR-Cas delta enzyme and system

Abstract

The present invention relates to the field of nucleic acid editing, in particular to the field of clustered regularly interspaced short palindromic repeat (CRISPR) technology. Specifically, the present invention relates to Cas effector proteins, fusion proteins comprising such proteins, and nucleic acid molecules encoding them. The present invention also relates to complexes and compositions for nucleic acid editing (e.g., gene or genome editing), which comprise the proteins or fusion proteins of the present invention, or nucleic acid molecules encoding them. The present invention also relates to a method for nucleic acid editing (e.g., gene or genome editing), which uses the proteins or fusion proteins of the present invention.

Claims

1. A protein, which comprises or consists of a sequence selected from the following: (i) a sequence as set forth in any one of SEQ ID NOs: 1, 2, and 3; (ii) a sequence having a substitution, deletion, or addition of one or more amino acids as compared to the sequence as set forth in any one of SEQ ID NOs: 1, 2, and 3; (iii) a sequence having a substitution, deletion, or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 amino acids as compared to the sequence as set forth in any one of SEQ ID NOs: 1, 2, and 3; or (iv) a sequence having a sequence identity of at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% as compared to the sequence as set forth in any one of SEQ ID NOs: 1, 2, and 3.

2. A truncated protein, wherein the truncated protein has a truncation of 1 to 10, 11 to 20, 21 to 30, 31 to 40, 41 to 50, 51 to 60 or more amino acids at the N-terminal and/or C-terminal as compared to the protein according to claim 1.

3. The truncated protein according to claim 2, wherein the truncated protein is characterized by one or more of the following: (1) the truncated protein has a truncation of 31 amino acids at the N-terminal as compared to the sequence as set forth in SEQ ID NO: 1 or 2; (2) the truncated protein comprises a sequence selected from the following, or consists of a sequence selected from the following: (i) a sequence as set forth in SEQ ID NO: 3; (ii) a sequence having a substitution, deletion, or addition of one or more amino acids as compared to the sequence as set forth in SEQ ID NO: 3; (iii) a sequence having a substitution, deletion, or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 amino acids as compared to the sequence as set forth in SEQ ID NO: 3; or (iv) a sequence having a sequence identity of at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% as compared to the sequence as set forth in SEQ ID NO: 3.

4. A conjugate, which comprises the protein according to claim 1 or a truncated protein comprising the protein and a modification portion; wherein, the modification portion is selected from an additional protein or polypeptide, a detectable label, and any combination thereof.

5. The conjugate according to claim 4, wherein the conjugate is characterized by one or more of the following: (1) the modification portion is connected to the N-terminal or C-terminal of the protein or truncated protein optionally via a linker; (2) the modification portion is fused to the N-terminal or C-terminal of the protein or truncated protein; (3) the additional protein or polypeptide is selected from the group consisting of an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain, a transcriptional repression domain, a nuclease domain, and any combination thereof; (4) the conjugate comprises an epitope tag; and (5) the conjugate comprises an NLS sequence.

6. A fusion protein, which comprises the protein according to claim 1 or a truncated protein comprising the protein and an additional protein or polypeptide.

7. The fusion protein according to claim 6, wherein the fusion protein is characterized by one or more of the following: (1) the additional protein or polypeptide is connected to the N-terminal or C-terminal of the protein or truncated protein optionally via a linker; (2) the additional protein or polypeptide is selected from the group consisting of an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcriptional activation domain, a transcriptional repression domain, a nuclease domain, and any combination thereof; (3) the fusion protein comprises an epitope tag; and (4) the fusion protein comprises an NLS sequence.

8. The fusion protein according to claim 7, wherein the fusion protein is characterized by one or more of the following: (1) the NLS sequence is set forth in SEQ ID NO: 27; (2) the NLS sequence is located at, near or close to the N-terminal or C-terminal of the protein or truncated protein; (3) the fusion protein has an amino acid sequence as set forth in any one of SEQ ID NOs: 28 to 30.

9. An isolated nucleic acid molecule, which comprises: a nucleotide sequence encoding the protein according to claim 1, or a truncated protein comprising the protein, or a fusion protein comprising the protein.

10. A vector, which comprises the isolated nucleic acid molecule according to claim 9.

11. A host cell, which comprises the isolated nucleic acid molecule according to claim 9 or a vector comprising the isolated nucleic acid molecule.

12. A composition or complex, which comprises: (i) a first component, which is selected from the group consisting of the protein according to claim 1, a truncated protein comprising the protein, a conjugate comprising the protein, a fusion protein comprising the protein, a nucleotide sequence encoding the protein, truncated protein or fusion protein, and any combination thereof; and (ii) a second component, which is a nucleotide sequence comprising a guide RNA, or a nucleotide sequence encoding the nucleotide sequence comprising a guide RNA; wherein, the guide RNA comprises a direct repeat sequence and a guide sequence in the 5' to 3' direction, and the guide sequence is capable of hybridizing with a target sequence; the guide RNA is capable of forming a complex with the protein, truncated protein, conjugate or fusion protein as described in (i).

13. The composition or complex according to claim 12, wherein the composition or complex is characterized by one or more of the following: (1) the guide sequence is linked to the 3' end of the direct repeat sequence; for example, the guide sequence comprises a complementary sequence of the target sequence; (2) the composition or complex does not comprise a trans-activating crRNA (tracrRNA).

14. A composition or complex, which comprises one or more vectors, wherein the one or more vectors comprise: (i) a first nucleic acid, which comprises a nucleotide sequence encoding the protein according to claim 1, a truncated protein comprising the protein, or a fusion protein comprising the protein; optionally, the first nucleic acid is operably linked to a first regulatory element; and (ii) a second nucleic acid, which comprises a nucleotide sequence encoding a guide RNA; optionally, the second nucleic acid is operably linked to a second regulatory element; wherein: the first nucleic acid and the second nucleic acid are present on the same vector or different vectors; the guide RNA comprises a direct repeat sequence and a guide sequence in the 5' to 3' direction, and the guide sequence is capable of hybridizing with a target sequence; the guide RNA is capable of forming a complex with the protein, truncated protein or fusion protein as described in (i).

15. The composition or complex according to claim 12, wherein, when the target sequence is DNA, the target sequence is located at the 3' end of a protospacer adjacent motif (PAM), and the PAM has a sequence shown as 5'-RYR, wherein R is A or G, and Y is T or C; preferably, the sequence of the PAM is selected from the group consisting of ATG, ACG, GTG, ATA, ACA, GCA, GTA and GCG.

16. The composition or complex according to claim 12, wherein, the target sequence is a DNA or RNA sequence derived from a prokaryotic cell or a eukaryotic cell; or, the target sequence is a non-naturally occurring DNA or RNA sequence.

17. The composition or complex according to claim 12, wherein, the target sequence is present in a cell; or, the target sequence is present in a nucleic acid molecule (e.g., a plasmid) in vitro; for example, the target sequence is present in a cell nucleus or cytoplasm (e.g., an organelle); for example, the cell is a prokaryotic cell; for example, the cell is a eukaryotic cell.

18. The composition or complex according to claim 12, wherein, the protein or truncated protein is linked to one or more NLS sequences, or, the conjugate or fusion protein comprises one or more NLS sequences; for example, the NLS sequence is linked to the N-terminal or C-terminal of the protein or truncated protein; for example, the NLS sequence is fused to the N-terminal or C-terminal of the protein or truncated protein.

19. A kit, which comprises one or more components selected from the following: the protein according to claim 1, a truncated protein comprising the protein, a conjugate comprising the protein, a fusion protein comprising the protein, an isolated nucleic acid molecule comprising a nucleotide sequence encoding the protein, a vector comprising the protein, and a composition or complex comprising the protein.

20. A delivery composition, which comprises a delivery vector and one or more selected from the following: the protein according to claim 1, a truncated protein comprising the protein, a conjugate comprising the protein, a fusion protein comprising the protein, an isolated nucleic acid molecule comprising a nucleotide sequence encoding the protein, a vector comprising the protein, and a composition or complex comprising the protein.

21. The delivery composition according to claim 20, wherein the delivery composition is characterized by one or more of the following: (1) the delivery vector is a particle; (2) the delivery vector is selected from the group consisting of lipid particle, sugar particle, metal particle, protein particle, liposome, exosome, microvesicle, gene gun and viral vector.

22. A method for modification of a target gene, which comprises: contacting a composition or complex comprising the protein with the target gene, or delivering it to a cell containing the target gene; wherein, the target sequence is present in the target gene.

23. The method according to claim 22, wherein the method is characterized by one or more of the following: (1) the target gene is present in a cell, or, the target gene is present in a nucleic acid molecule (for example, a plasmid) in vitro; (2) the cell is a prokaryotic cell; for example, the cell is a eukaryotic cell; for example, the cell is selected from the group consisting of animal cell (e.g., mammalian cell, such as human cell) and plant cell; (3) the modification refers to a break in the target sequence, such as a double-strand break in DNA or a single-strand break in RNA; and (4) the modification further comprises inserting an exogenous nucleic acid into the break.

24. A method for changing the expression of a gene product, which comprises: contacting a composition or complex comprising the protein with a nucleic acid molecule encoding the gene product, or delivering it to a cell containing the nucleic acid molecule; wherein, the target sequence is present in the nucleic acid molecule.

25. The method according to claim 24, wherein the method is characterized by one or more of the following: (1) the nucleic acid molecule is present in a cell, or the nucleic acid molecule is present in a nucleic acid molecule (for example, a plasmid) in vitro; (2) the cell is a prokaryotic cell; for example, the cell is a eukaryotic cell; for example, the cell is selected from the group consisting of animal cell (e.g., mammalian cell, such as human cell) and plant cell; (3) the expression of the gene product is changed (e.g., enhanced or reduced); for example, the gene product is a protein; (4) the protein, truncated protein, conjugate, fusion protein, isolated nucleic acid molecule, vector or composition or complex is contained in a delivery vehicle; for example, the delivery vehicle is selected from the group consisting of lipid particle, sugar particle, metal particle, protein particle, liposome, exosome and viral vector; and (5) the method is used to modify a cell, cell line or organism by changing one or more target sequences in a target gene or a nucleic acid molecule encoding a target gene product.

26. A cell or progeny thereof obtained by the method according to claim 20, wherein the cell comprises a modification that is not present in its wild type.

27. A cell product of the cell or progeny thereof according to claim 26.

28. An in vitro, ex vivo or in vivo cell or cell line or progeny thereof, wherein the cell or cell line or progeny thereof comprises: the protein according to claim 1, a truncated protein comprising the protein, a conjugate comprising the protein, a fusion protein comprising the protein, an isolated nucleic acid molecule comprising a nucleotide sequence encoding the protein, a vector comprising the protein, or a composition or complex comprising the protein; for example, the cell is a prokaryotic cell or a eukaryotic cell.

29. A method for detecting whether a target nucleic acid is present in a sample, comprising the following steps: (1) contacting the sample with a labeled DNA probe and any of the following components: a composition or complex comprising the protein, or a kit comprising the protein; wherein, the guide sequence contained in the composition or complex, or kit is capable of hybridizing with the target nucleic acid, and the DNA probe is not capable of hybridizing with the guide sequence; preferably, the DNA probe emits a detectable signal after being cleaved; (2) detecting the detectable signal generated by the cleavage of DNA probes by the protein or truncated protein contained in the composition or complex, or kit, thereby determining whether the target nucleic acid is present in the sample; preferably, one end (e.g., the 5' end) of the DNA probe is labeled with a fluorescent group, and the other end (e.g., the 3' end) is labeled with a quenching group.

30. The method according to claim 29, wherein the method is characterized by one or more of the following: (1) the sequence of the target nucleic acid is a sequence obtained from a pathogen; preferably, the pathogen is selected from the group consisting of a virus, a bacterium, a fungus, a protozoan, a parasite and any combination thereof; (2) the sequence of the target nucleic acid is obtained from the genome of a tumor cell; (3) the method further comprises a step of contacting the sample with a reagent for reverse transcription; preferably, the reagent for reverse transcription is selected from the group consisting of reverse transcriptase, oligonucleotide primer, dNTP and any combination thereof; (4) the target nucleic acid is single-stranded or double-stranded; preferably, the sequence of the target nucleic acid is a DNA or RNA sequence derived from a prokaryotic cell or a eukaryotic cell; or, the sequence of the target nucleic acid is a non-naturally occurring DNA or RNA sequence; (5) the detectable signal is determined by one or more methods selected from the following: imaging-based detection, sensor-based detection, color detection, gold nanoparticle-based detection, fluorescence polarization, colloidal phase transition/dispersion, electrochemical detection and semiconductor-based sensing; (6) the method further comprises a step of amplifying the target nucleic acid in the sample.
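
As an illustration of the PAM definition recited in claim 15, the following minimal Python sketch expands the degenerate 5'-RYR motif into its concrete sequences and scans one strand of a DNA string for candidate target sequences located immediately 3' of a PAM. The function names, the 20-nt target length, and the example input are assumptions made for illustration only; they are not taken from the Examples.

```python
from itertools import product

# Degenerate bases used in claim 15: R is A or G, Y is T or C.
IUPAC = {"R": "AG", "Y": "TC"}

def expand_pam(motif="RYR"):
    """Return every concrete PAM sequence matching a degenerate motif."""
    choices = [IUPAC.get(base, base) for base in motif]
    return sorted("".join(p) for p in product(*choices))

def find_targets(dna, motif="RYR", target_len=20):
    """Yield (pam_start, pam, target) tuples for one strand of `dna`.

    The target is the `target_len` bases immediately 3' of the PAM.
    target_len=20 is an illustrative assumption, not a value from the claims.
    """
    pams = set(expand_pam(motif))
    dna = dna.upper()
    k = len(motif)
    for i in range(len(dna) - k - target_len + 1):
        pam = dna[i:i + k]
        if pam in pams:
            yield i, pam, dna[i + k:i + k + target_len]

# The eight expansions of RYR coincide with the PAMs listed in claim 15.
print(expand_pam())

# Made-up input: a single ATG PAM followed by a 20-nt stretch.
for hit in find_targets("CCATG" + "A" * 20):
    print(hit)
```

Only the given strand is scanned; a fuller search would also scan the reverse complement.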

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0206] FIGS. 1A and B show the PAM structure analysis and verification results of Cas-1 in Example 3.

[0207] FIG. 2 shows the verification of the PAM recognition site of Cas-1 in Escherichia coli in Example 3.

[0208] FIG. 3 shows the analysis of the cleavage characteristics of Cas-1 protein on the target sequence in Example 3.

[0209] FIGS. 4A and B show the analysis results of the trans-cleavage activity of Cas-1 in Example 4.

[0210] FIG. 5 shows the analysis of the editing activity of Cas-1 at 14 target sites in HeLa cells in Example 5.

[0211] FIGS. 6A and B show the editing activity of Cas-1 on AAVS1 targets at different PAM sites in HeLa cells in Example 5.

[0212] FIG. 7 shows the editing activity of Cas-1 for crRNAs composed of direct repeat sequences and guide sequences with different lengths in HeLa cells in Example 5, in which R represents direct repeat sequences, T represents guide sequences, and the numbers represent the number of bases in the corresponding sequence.

[0213] FIG. 8A shows the components in the expression vector with the protein of Cas-1 constructed in Example 6; FIG. 8B shows the base sequences of Cas-1-edited plants in maize in Example 6.

[0214] FIG. 9 shows the cleavage results of Cas-1D31 on double-stranded DNA in Example 9.
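
The R/T notation described for FIG. 7 can be sketched in Python as follows. The sketch assembles a crRNA as a direct repeat followed by a guide sequence in the 5' to 3' direction, and shortens the Cas-1 direct repeat of SEQ ID NO: 7 from its 5' end, which reproduces the shorter repeats listed as SEQ ID NOs: 9 to 16 in Table 1. The function names, the exact label format (e.g. "R36T20"), and the guide sequence are hypothetical and serve only as an illustration.

```python
# Full-length Cas-1 direct repeat (SEQ ID NO: 7 in Table 1), written 5' to 3'.
CAS1_DR = "GUGCUGACGACCAGCACUAGAUGGUCGUUCAGGCAC"

def truncate_repeat(direct_repeat, length):
    """Keep the 3'-most `length` bases, i.e. trim the repeat from its 5' end."""
    if not 0 < length <= len(direct_repeat):
        raise ValueError("length must be between 1 and the repeat length")
    return direct_repeat[-length:]

def build_crrna(direct_repeat, guide):
    """Join direct repeat and guide 5' to 3'; a DNA guide is written as RNA."""
    return direct_repeat + guide.upper().replace("T", "U")

def label(direct_repeat, guide):
    """Hypothetical label in the spirit of FIG. 7: R = repeat, T = guide."""
    return f"R{len(direct_repeat)}T{len(guide)}"

guide = "ACGTACGTACGTACGTACGT"  # made-up 20-nt guide, for illustration only
for n in (36, 34, 32, 30, 28, 26, 24, 22, 20):
    dr = truncate_repeat(CAS1_DR, n)
    print(label(dr, guide), build_crrna(dr, guide))
```

Placing the guide at the 3' end of the repeat follows the orientation recited in claims 12 and 13.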

SEQUENCE INFORMATION

[0215] The information of the partial sequences involved in the present invention is provided in Table 1 below.

TABLE-US-00001 TABLE1 Descriptionofsequences SEQ ID NO: Description 1 AminoacidsequenceofCas-1 MSKDGDKMKHSKQEVSEDKGVKKDPSVFRCQNLAILEMREEVADYYADLQADYR HYLPLMILWGQNGFMEPDETGLNFSAVLPSTAKTRCENRYGAQIAADLKLIADDLP SSLYAIFTNDYQPLTGVKKVGDRWKFVTSLNTVSDEEFAKKVKPLLRDYKGVTEEQ WRRFREERVAGKTADQVWDDLDSILRNPTLAALKKIHDKHMFRVKRPESKLYCDA LAHmLSSSYDSWKKINEARQLELQAMRDEQQGLLANKTTNSLYNLVCEFADELESQ NYGLTRRIFFIARKEWEDEKSHYLPVIKLLVTEKYLPLLQADNKTVSPAVKAWDICS TLKQRSPYIRFVPPTKDYPIPFGQTGRGHKFTLKEEAGKVCFLVPGLTKDEMPLSGSH YFSSLRIDKVGTDFKISFRHKTKIRSKKSARVTVKQPRINGTVTEVGILRRNGKFFLR VAYKIAYPQHNIDLKSYFASAAPSTQKLELLPAQMRVAGVDLNISDPIVVSKADIFK GEDGGPLSVLDYGSGKVVGEPVIICKDTSRAGKISSLATKCRTLRDVIRNFRHSLHQ GTPLPSKSLEFLQEVPVAQEDLKSPSPRYLIQTWIKYIRTEMKRLHYKVWLDGYCHT SEAMRmLDLVDQVAFLMKSYENIHVKPGHKIPKKTTASLKIRETSRTRFRLHVSRLV GARVVRACEDCQMIFLEDLSTGFSEGSNNSLARLFSAGQLKNSIVMAAEKVGKAVV FVCKDGTSIKDPVFGLHGLRPSKKTPGIPPDKKRLYVRRDGKIGYINADQAASLNVL LVGLSHSVAKYKFFVKDGKLQDDEEVAAQQGKKPRKIGVRLERLLKMNFDTPARL FFSLEKDQVVPCYEATETPYTGWVYVYHDRLLTTRQRDQQVNQLAQEVKDLLRDG VKIPKWDVIPDCDNSCLGFSPCLVLDDETAAVSVV 2 AminoacidsequenceofCas-2 MELKTTFCQSLKIKILPESHKLYYDELQADLKKMRIFFATHSGLNYSYKEDENDSEW KIASSKKSTLSPEAQRLIDIIAKGGIGAEHAYGIFYKDKKVSDKIRRDNSNCIFAmLLP NLVNFEEEYKKRKSKLQLTKEEWEEYHKRCLAGESVEVLWSEEIAPKIGPNEAARL ASLAYHQFPDSRHPADITHCRALMSMVADSFCSWVECHKLHREETEKLKTKLQEKI ESCKEAYNLLLAYSEDLYSRNYGLSARVLKALIDNKSRCCFDIAFQIAEKYPALMKI DEKTLFKAYNALKLHWKIKRRKPFISLPKIERDYQVPFGLTGSRGKKFDVYVENKNI VVEIDGQKVETFSSHYFSDMQISKIYGEKKNLKGFKLKFRHKLKDKKKEVYGEWIE AELKEIKIKKDMETGDFYLYLPYTTTHNEKNLLLEKFFSYADPLKETIFKNGKTELPN EFLGFGFDLNLSDPIAMAVAEFVRDSSDGEIGALDYGHGKLLDASVLVCNSSLSKRI NDLVGDNRKLIQAIRSYKNSLVTKEMDEESGEWLKKALGNAKYGNHRHQLQLVMS KLNKKSKKYWQECRKNGHNDLSENIALLKLLDVQFSLHKSYNNIHIFDYKKQIHSH KTDSKRENFREFVTKQFAATIVKHCKEVMAKYGREYAVVFLEDLEMYFDADADNN SLIRLFAPGQLKKYIASALAKSKIGYVFIPPAGTSKTDPLTSKIGFRATGKYYKQLGY LLNKSDLYVERNGVIGKINSDVAAAINILLKGVNHSIVPYRFLNKHAGSEQKRLKRF EGEIGCDLKKMPKTRLYYLDGKIITEEEKNSLEEALLAEIKHFLLSKANIPEFDAIPGS NGTCKAFSVPRCLK 3 AminoacidsequenceofCas-1D31 
NLAILEMREEVADYYADLQADYRHYLPLMILWGQNGFMEPDETGLNFSAVLPSTA KTRCENRYGAQIAADLKLIADDLPSSLYAIFTNDYQPLTGVKKVGDRWKFVTSLNT VSDEEFAKKVKPLLRDYKGVTEEQWRRFREERVAGKTADQVWDDLDSILRNPTLA ALKKIHDKHMFRVKRPESKLYCDALAHmLSSSYDSWKKINEARQLELQAMRDEQQ GLLANKTTNSLYNLVCEFADELESQNYGLTRRIFFIARKEWEDEKSHYLPVIKLLVT EKYLPLLQADNKTVSPAVKAWDICSTLKQRSPYIRFVPPTKDYPIPFGQTGRGHKFT LKEEAGKVCFLVPGLTKDEMPLSGSHYFSSLRIDKVGTDFKISFRHKTKIRSKKSARV TVKQPRINGTVTEVGILRRNGKFFLRVAYKIAYPQHNIDLKSYFASAAPSTQKLELLP AQMRVAGVDLNISDPIVVSKADIFKGEDGGPLSVLDYGSGKVVGEPVIICKDTSRAG KISSLATKCRTLRDVIRNFRHSLHQGTPLPSKSLEFLQEVPVAQEDLKSPSPRYLIQT WIKYIRTEMKRLHYKVWLDGYCHTSEAMRmLDLVDQVAFLMKSYENIHVKPGHKI PKKTTASLKIRETSRTRFRLHVSRLVGARVVRACEDCQMIFLEDLSTGFSEGSNNSLA RLFSAGQLKNSIVMAAEKVGKAVVFVCKDGTSIKDPVFGLHGLRPSKKTPGIPPDKK RLYVRRDGKIGYINADQAASLNVLLVGLSHSVAKYKFFVKDGKLQDDEEVAAQQG KKPRKIGVRLERLLKMNFDTPARLFFSLEKDQVVPCYEATETPYTGWVYVYHDRLL TTRQRDQQVNQLAQEVKDLLRDGVKIPKWDVIPDCDNSCLGFSPCLVLDDETAAVS VV 4 EncodingnucleotidesequenceofCas-1 ATGTCCAAAGACGGTGACAAGATGAAGCACAGCAAACAAGAAGTCTCTGAGGA CAAAGGGGTCAAGAAAGACCCGTCTGTCTTCCGCTGTCAGAACCTCGCGATTCT GGAGATGCGTGAAGAAGTCGCTGATTACTACGCCGACCTTCAGGCCGACTACCG TCACTACTTGCCCCTGATGATCCTCTGGGGCCAGAACGGGTTCATGGAGCCAGA CGAAACTGGTCTGAACTTCAGTGCCGTCCTGCCATCGACCGCAAAAACACGGTG CGAGAACCGCTACGGTGCGCAGATTGCGGCTGATCTCAAGTTGATCGCCGACGA TCTCCCGAGCAGTCTCTATGCCATCTTCACCAACGACTACCAGCCCCTCACCGGC GTGAAGAAGGTCGGCGACAGATGGAAGTTCGTCACGTCCCTCAACACTGTCAGC GACGAAGAGTTCGCCAAGAAGGTGAAGCCGCTCCTGCGTGACTACAAAGGCGTT ACCGAAGAGCAGTGGCGTCGATTTCGTGAAGAGCGTGTCGCCGGCAAGACAGCC GATCAGGTCTGGGACGACCTTGACAGCATTCTGCGAAACCCGACGCTCGCGGCC CTGAAGAAGATCCACGACAAGCACATGTTCCGGGTCAAGCGACCCGAGAGCAA ACTCTACTGTGACGCCCTGGCTCACATGCTGAGTTCCAGCTACGACAGTTGGAA GAAAATCAACGAGGCCCGCCAACTCGAACTCCAAGCCATGCGTGACGAGCAGC AGGGCTTGCTCGCCAACAAGACCACGAACAGTCTGTACAATCTGGTGTGCGAGT TCGCCGACGAACTGGAGTCGCAGAACTACGGTCTGACTCGGCGGATCTTCTTCA TCGCCCGCAAAGAGTGGGAGGACGAAAAGTCCCATTACCTCCCGGTCATCAAGC TCCTCGTCACGGAGAAGTATCTGCCCCTGCTCCAGGCGGATAACAAGACTGTCA 
GCCCCGCCGTCAAGGCGTGGGACATCTGCTCTACTCTGAAGCAGAGGTCGCCGT ACATCCGCTTCGTGCCTCCGACCAAGGACTATCCGATCCCCTTCGGCCAGACGG GTCGCGGACACAAGTTCACCTTGAAAGAGGAAGCCGGAAAGGTCTGTTTCCTCG TACCCGGCCTGACCAAGGACGAGATGCCTCTGTCCGGCTCTCACTACTTCTCAAG CCTCCGTATCGACAAGGTCGGCACAGACTTCAAGATCTCGTTCCGTCACAAGAC CAAGATTCGCAGCAAAAAGTCTGCGCGAGTCACCGTCAAGCAGCCCAGGATCAA CGGCACCGTCACTGAGGTCGGCATCCTACGCAGGAACGGCAAGTTCTTCCTGCG TGTCGCTTACAAGATCGCCTACCCACAGCACAACATCGACCTGAAGAGCTACTT CGCCTCGGCAGCCCCCTCCACGCAAAAGCTGGAGCTTCTGCCGGCTCAAATGAG AGTCGCGGGCGTCGATCTGAACATCAGCGACCCGATTGTCGTCTCCAAGGCCGA CATCTTCAAGGGCGAAGACGGCGGGCCGCTGTCTGTGCTGGACTACGGCTCTGG CAAGGTGGTAGGTGAGCCTGTGATCATCTGCAAGGACACCAGCAGGGCCGGGA AGATCAGTTCCCTCGCCACGAAATGCCGGACGCTACGCGACGTGATCCGCAACT TTCGTCATTCACTGCATCAGGGTACGCCCTTGCCCAGCAAGAGTCTTGAATTTCT CCAAGAAGTTCCCGTGGCGCAGGAGGACTTGAAGTCGCCCAGCCCTCGCTATCT GATCCAGACGTGGATCAAGTACATCAGGACCGAGATGAAGCGACTTCACTACAA GGTCTGGCTCGACGGGTACTGCCACACTAGCGAAGCCATGCGGATGCTCGATCT GGTCGATCAGGTCGCGTTCCTAATGAAGTCGTATGAGAACATCCACGTCAAACC GGGTCACAAGATCCCGAAGAAGACCACTGCGTCCCTGAAGATTCGCGAGACGAG CCGAACGCGGTTCAGGCTGCATGTCTCCCGACTTGTCGGTGCGCGAGTGGTTCGT GCGTGCGAGGACTGCCAGATGATCTTCTTGGAAGATCTCTCGACAGGTTTCTCCG AAGGCAGCAATAACTCGTTGGCCCGGCTGTTTTCGGCTGGCCAACTGAAGAACT CCATCGTCATGGCGGCAGAGAAGGTCGGCAAAGCCGTAGTCTTCGTGTGCAAGG ACGGTACGTCTATCAAAGATCCTGTTTTCGGTCTGCACGGCCTGCGGCCTTCCAA GAAGACCCCAGGCATTCCGCCGGACAAGAAGCGTCTCTACGTCCGTCGCGACGG CAAAATCGGCTACATCAACGCCGATCAGGCGGCTTCGCTGAATGTGCTGCTGGT GGGGCTGTCGCACTCCGTCGCCAAGTACAAGTTCTTTGTCAAGGACGGGAAGCT ACAAGACGACGAGGAGGTGGCTGCCCAGCAAGGCAAGAAGCCGCGTAAGATCG GTGTTCGTCTGGAAAGACTCCTCAAGATGAACTTCGACACCCCTGCCAGACTTTT CTTCTCGCTGGAGAAAGATCAGGTGGTGCCGTGCTACGAGGCCACGGAAACGCC TTACACCGGCTGGGTCTACGTCTACCACGACCGACTGCTGACCACCCGGCAACG TGATCAGCAAGTCAATCAACTCGCACAGGAAGTCAAGGATCTCCTGCGGGATGG TGTCAAAATTCCTAAATGGGACGTAATCCCAGACTGCGACAACAGTTGCCTCGG CTTCTCTCCGTGTCTGGTGCTTGACGACGAAACCGCCGCTGTTTCTGTGGTTTAG 5 EncodingnucleotidesequenceofCas-2 ATGGAATTGAAGACAACTTTTTGTCAGTCACTAAAAATTAAGATCCTCCCCGAA 
AGCCATAAGTTGTATTACGATGAGCTGCAAGCCGACCTAAAGAAGATGAGAATC TTCTTCGCAACGCACTCTGGGCTCAACTACAGCTACAAGGAAGATGAAAACGAT TCGGAATGGAAGATAGCGTCGTCAAAAAAGAGCACACTCTCACCTGAAGCCCAA AGGCTTATTGACATCATCGCCAAAGGCGGTATAGGAGCAGAGCACGCCTACGGA ATATTTTACAAGGACAAAAAGGTGTCGGACAAGATCCGGCGAGATAATTCAAAC TGCATCTTCGCCATGCTATTACCTAACCTCGTCAATTTCGAGGAAGAATACAAAA AGAGGAAAAGTAAGCTCCAGCTCACCAAGGAAGAGTGGGAGGAGTACCATAAA CGATGTCTTGCAGGCGAGAGCGTGGAAGTCCTGTGGTCCGAGGAGATTGCTCCA AAAATTGGGCCTAACGAAGCGGCGAGACTGGCATCACTCGCGTATCACCAGTTC CCAGATTCTCGTCATCCGGCGGACATTACCCACTGCAGGGCGCTCATGTCTATGG TAGCAGACTCTTTTTGTTCCTGGGTCGAGTGCCACAAGCTGCACCGTGAGGAAA CAGAGAAACTCAAGACGAAATTACAGGAGAAGATAGAGAGCTGCAAGGAGGCA TACAATCTGCTACTCGCTTATTCCGAAGACCTGTACAGTAGGAATTATGGTCTGA GTGCCAGGGTTTTGAAGGCACTTATTGACAACAAATCCCGGTGCTGCTTCGACAT AGCCTTTCAGATCGCGGAGAAATATCCTGCACTGATGAAGATTGATGAAAAGAC CCTGTTCAAGGCCTACAACGCCCTAAAGCTCCACTGGAAAATAAAACGACGCAA GCCGTTCATCTCGCTGCCCAAGATCGAGAGGGACTACCAGGTGCCCTTTGGACT CACAGGAAGCAGAGGCAAGAAGTTCGACGTCTATGTGGAAAACAAGAATATTG TGGTCGAGATCGACGGCCAGAAAGTCGAAACCTTCAGCTCTCACTATTTCTCTGA TATGCAGATCTCCAAGATTTACGGTGAAAAGAAAAATTTGAAGGGATTCAAGTT GAAATTCCGCCACAAGCTGAAGGACAAGAAGAAGGAGGTTTATGGAGAGTGGA TTGAGGCGGAGCTGAAGGAAATCAAGATTAAGAAAGACATGGAGACAGGCGAC TTCTACCTGTATCTCCCATACACTACAACGCATAATGAGAAAAACTTATTGCTCG AGAAGTTCTTTTCCTACGCCGATCCACTTAAGGAGACTATTTTTAAGAACGGAAA GACGGAGCTTCCAAATGAATTTCTAGGCTTCGGCTTCGATTTGAACCTTTCCGAC CCGATTGCGATGGCGGTTGCCGAGTTTGTTCGGGACTCATCAGATGGTGAGATT GGCGCCCTCGACTATGGTCACGGTAAGCTCCTCGACGCTAGTGTTTTGGTCTGCA ACTCATCACTATCAAAGCGCATCAATGACCTGGTTGGAGACAATAGAAAGCTGA TCCAAGCTATTAGATCGTACAAGAACTCGCTAGTAACTAAGGAAATGGACGAGG AGAGTGGCGAGTGGCTTAAAAAAGCTCTAGGCAATGCAAAGTACGGCAACCAT AGGCATCAGCTTCAACTGGTCATGTCTAAGCTGAACAAAAAAAGCAAAAAGTAC TGGCAGGAATGTCGCAAGAACGGCCACAACGATCTTTCCGAAAATATTGCACTA TTAAAGTTGTTGGACGTGCAGTTCTCTTTGCATAAAAGCTATAACAACATACACA TCTTTGATTACAAAAAACAAATTCACAGCCATAAGACTGACTCCAAAAGGGAGA ATTTCCGTGAATTTGTGACGAAGCAATTTGCGGCTACCATAGTAAAGCATTGCA AGGAGGTGATGGCTAAATATGGCAGAGAGTACGCCGTTGTCTTTTTGGAGGACC 
TGGAGATGTACTTTGATGCTGATGCTGATAACAATTCTCTGATCCGGCTTTTTGC ACCCGGCCAATTGAAGAAGTACATCGCCTCCGCTCTGGCTAAGAGCAAGATAGG TTATGTGTTCATCCCTCCTGCTGGGACAAGTAAGACCGACCCGTTAACTTCGAAA ATTGGATTCAGGGCTACCGGAAAATACTACAAGCAGCTGGGCTATCTGCTTAAT AAGTCCGACCTCTACGTGGAGCGCAACGGGGTTATCGGGAAGATCAACAGTGAT GTAGCTGCGGCGATAAATATTCTCTTAAAGGGCGTGAACCACTCAATCGTGCCG TACCGCTTCCTGAATAAACATGCTGGGTCTGAACAGAAGCGCCTCAAACGTTTT GAAGGGGAGATCGGTTGTGATCTGAAAAAGATGCCAAAGACACGCCTCTACTAT CTTGACGGAAAAATCATCACTGAAGAGGAGAAGAACAGCCTGGAGGAAGCGCT ACTCGCTGAGATCAAGCACTTCCTTCTGTCGAAGGCCAACATTCCGGAGTTTGAT GCAATACCGGGGAGCAATGGCACCTGCAAAGCCTTCTCCGTTCCACGCTGCCTT AAGTGA 6 EncodingnucleotidesequenceofCas-1D31 AACCTCGCGATTCTGGAGATGCGTGAAGAAGTCGCTGATTACTACGCCGACCTT CAGGCCGACTACCGTCACTACTTGCCCCTGATGATCCTCTGGGGCCAGAACGGG TTCATGGAGCCAGACGAAACTGGTCTGAACTTCAGTGCCGTCCTGCCATCGACC GCAAAAACACGGTGCGAGAACCGCTACGGTGCGCAGATTGCGGCTGATCTCAAG TTGATCGCCGACGATCTCCCGAGCAGTCTCTATGCCATCTTCACCAACGACTACC AGCCCCTCACCGGCGTGAAGAAGGTCGGCGACAGATGGAAGTTCGTCACGTCCC TCAACACTGTCAGCGACGAAGAGTTCGCCAAGAAGGTGAAGCCGCTCCTGCGTG ACTACAAAGGCGTTACCGAAGAGCAGTGGCGTCGATTTCGTGAAGAGCGTGTCG CCGGCAAGACAGCCGATCAGGTCTGGGACGACCTTGACAGCATTCTGCGAAACC CGACGCTCGCGGCCCTGAAGAAGATCCACGACAAGCACATGTTCCGGGTCAAGC GACCCGAGAGCAAACTCTACTGTGACGCCCTGGCTCACATGCTGAGTTCCAGCT ACGACAGTTGGAAGAAAATCAACGAGGCCCGCCAACTCGAACTCCAAGCCATGC GTGACGAGCAGCAGGGCTTGCTCGCCAACAAGACCACGAACAGTCTGTACAATC TGGTGTGCGAGTTCGCCGACGAACTGGAGTCGCAGAACTACGGTCTGACTCGGC GGATCTTCTTCATCGCCCGCAAAGAGTGGGAGGACGAAAAGTCCCATTACCTCC CGGTCATCAAGCTCCTCGTCACGGAGAAGTATCTGCCCCTGCTCCAGGCGGATA ACAAGACTGTCAGCCCCGCCGTCAAGGCGTGGGACATCTGCTCTACTCTGAAGC AGAGGTCGCCGTACATCCGCTTCGTGCCTCCGACCAAGGACTATCCGATCCCCTT CGGCCAGACGGGTCGCGGACACAAGTTCACCTTGAAAGAGGAAGCCGGAAAGG TCTGTTTCCTCGTACCCGGCCTGACCAAGGACGAGATGCCTCTGTCCGGCTCTCA CTACTTCTCAAGCCTCCGTATCGACAAGGTCGGCACAGACTTCAAGATCTCGTTC CGTCACAAGACCAAGATTCGCAGCAAAAAGTCTGCGCGAGTCACCGTCAAGCAG CCCAGGATCAACGGCACCGTCACTGAGGTCGGCATCCTACGCAGGAACGGCAAG TTCTTCCTGCGTGTCGCTTACAAGATCGCCTACCCACAGCACAACATCGACCTGA 
AGAGCTACTTCGCCTCGGCAGCCCCCTCCACGCAAAAGCTGGAGCTTCTGCCGG CTCAAATGAGAGTCGCGGGCGTCGATCTGAACATCAGCGACCCGATTGTCGTCT CCAAGGCCGACATCTTCAAGGGCGAAGACGGCGGGCCGCTGTCTGTGCTGGACT ACGGCTCTGGCAAGGTGGTAGGTGAGCCTGTGATCATCTGCAAGGACACCAGCA GGGCCGGGAAGATCAGTTCCCTCGCCACGAAATGCCGGACGCTACGCGACGTGA TCCGCAACTTTCGTCATTCACTGCATCAGGGTACGCCCTTGCCCAGCAAGAGTCT TGAATTTCTCCAAGAAGTTCCCGTGGCGCAGGAGGACTTGAAGTCGCCCAGCCC TCGCTATCTGATCCAGACGTGGATCAAGTACATCAGGACCGAGATGAAGCGACT TCACTACAAGGTCTGGCTCGACGGGTACTGCCACACTAGCGAAGCCATGCGGAT GCTCGATCTGGTCGATCAGGTCGCGTTCCTAATGAAGTCGTATGAGAACATCCA CGTCAAACCGGGTCACAAGATCCCGAAGAAGACCACTGCGTCCCTGAAGATTCG CGAGACGAGCCGAACGCGGTTCAGGCTGCATGTCTCCCGACTTGTCGGTGCGCG AGTGGTTCGTGCGTGCGAGGACTGCCAGATGATCTTCTTGGAAGATCTCTCGAC AGGTTTCTCCGAAGGCAGCAATAACTCGTTGGCCCGGCTGTTTTCGGCTGGCCAA CTGAAGAACTCCATCGTCATGGCGGCAGAGAAGGTCGGCAAAGCCGTAGTCTTC GTGTGCAAGGACGGTACGTCTATCAAAGATCCTGTTTTCGGTCTGCACGGCCTGC GGCCTTCCAAGAAGACCCCAGGCATTCCGCCGGACAAGAAGCGTCTCTACGTCC GTCGCGACGGCAAAATCGGCTACATCAACGCCGATCAGGCGGCTTCGCTGAATG TGCTGCTGGTGGGGCTGTCGCACTCCGTCGCCAAGTACAAGTTCTTTGTCAAGGA CGGGAAGCTACAAGACGACGAGGAGGTGGCTGCCCAGCAAGGCAAGAAGCCGC GTAAGATCGGTGTTCGTCTGGAAAGACTCCTCAAGATGAACTTCGACACCCCTG CCAGACTTTTCTTCTCGCTGGAGAAAGATCAGGTGGTGCCGTGCTACGAGGCCA CGGAAACGCCTTACACCGGCTGGGTCTACGTCTACCACGACCGACTGCTGACCA CCCGGCAACGTGATCAGCAAGTCAATCAACTCGCACAGGAAGTCAAGGATCTCC TGCGGGATGGTGTCAAAATTCCTAAATGGGACGTAATCCCAGACTGCGACAACA GTTGCCTCGGCTTCTCTCCGTGTCTGGTGCTTGACGACGAAACCGCCGCTGTTTC TGTGGTTTAG 7 DirectrepeatsequenceofCas-1 GUGCUGACGACCAGCACUAGAUGGUCGUUCAGGCAC 8 DirectrepeatsequenceofCas-2 GUGCUGAACAGGGUCGCUAGGCGUUGUUCAAGGCAC 9 DirectrepeatsequenceofCas-1R34 GCUGACGACCAGCACUAGAUGGUCGUUCAGGCAC 10 DirectrepeatsequenceofCas-1R32 UGACGACCAGCACUAGAUGGUCGUUCAGGCAC 11 DirectrepeatsequenceofCas-1R30 ACGACCAGCACUAGAUGGUCGUUCAGGCAC 12 DirectrepeatsequenceofCas-1R28 GACCAGCACUAGAUGGUCGUUCAGGCAC 13 DirectrepeatsequenceofCas-1R26 CCAGCACUAGAUGGUCGUUCAGGCAC 14 DirectrepeatsequenceofCas-1R24 AGCACUAGAUGGUCGUUCAGGCAC 15 DirectrepeatsequenceofCas-1R22 
CACUAGAUGGUCGUUCAGGCAC 16 DirectrepeatsequenceofCas-1R20 CUAGAUGGUCGUUCAGGCAC 17 EncodingnucleicacidsequenceofdirectrepeatsequenceofCas-1 GTGCTGACGACCAGCACTAGATGGTCGTTCAGGCAC 18 EncodingnucleicacidsequenceofdirectrepeatsequenceofCas-2 GTGCTGAACAGGGTCGCTAGGCGTTGTTCAAGGCACG 19 EncodingnucleicacidsequenceofdirectrepeatsequenceofCas-1R34 GCTGACGACCAGCACTAGATGGTCGTTCAGGCAC 20 EncodingnucleicacidsequenceofdirectrepeatsequenceofCas-1R32 TGACGACCAGCACTAGATGGTCGTTCAGGCAC 21 EncodingnucleicacidsequenceofdirectrepeatsequenceofCas-1R30 ACGACCAGCACTAGATGGTCGTTCAGGCAC 22 EncodingnucleicacidsequenceofdirectrepeatsequenceofCas-1R28 GACCAGCACTAGATGGTCGTTCAGGCAC 23 EncodingnucleicacidsequenceofdirectrepeatsequenceofCas-1R26 CCAGCACTAGATGGTCGTTCAGGCAC 24 EncodingnucleicacidsequenceofdirectrepeatsequenceofCas-1R24 AGCACTAGATGGTCGTTCAGGCAC 25 EncodingnucleicacidsequenceofdirectrepeatsequenceofCas-1R22 CACTAGATGGTCGTTCAGGCAC 26 EncodingnucleicacidsequenceofdirectrepeatsequenceofCas-1R20 CTAGATGGTCGTTCAGGCAC 27 NLSsequence SRADPKKKRKV 28 AminoacidsequenceofCas-1-NLSfusionprotein MGPKKKRKVMDYKDHDGDYKDHDIDYKDDDDKMSKDGDKMKHSKQEVSEDKG VKKDPSVFRCONLAILEMREEVADYYADLQADYRHYLPLMILWGQNGFMEPDETG LNFSAVLPSTAKTRCENRYGAQIAADLKLIADDLPSSLYAIFTNDYQPLTGVKKVGD RWKFVTSLNTVSDEEFAKKVKPLLRDYKGVTEEQWRRFREERVAGKTADQVWDD LDSILRNPTLAALKKIHDKHMFRVKRPESKLYCDALAHmLSSSYDSWKKINEARQL ELQAMRDEQQGLLANKTTNSLYNLVCEFADELESQNYGLTRRIFFIARKEWEDEKS HYLPVIKLLVTEKYLPLLQADNKTVSPAVKAWDICSTLKQRSPYIRFVPPTKDYPIPF GQTGRGHKFTLKEEAGKVCFLVPGLTKDEMPLSGSHYFSSLRIDKVGTDFKISFRHK TKIRSKKSARVTVKQPRINGTVTEVGILRRNGKFFLRVAYKIAYPQHNIDLKSYFASA APSTQKLELLPAQMRVAGVDLNISDPIVVSKADIFKGEDGGPLSVLDYGSGKVVGEP VIICKDTSRAGKISSLATKCRTLRDVIRNFRHSLHQGTPLPSKSLEFLQEVPVAQEDL KSPSPRYLIQTWIKYIRTEMKRLHYKVWLDGYCHTSEAMRmLDLVDQVAFLMKSY ENIHVKPGHKIPKKTTASLKIRETSRTRFRLHVSRLVGARVVRACEDCQMIFLEDLST GFSEGSNNSLARLFSAGQLKNSIVMAAEKVGKAVVFVCKDGTSIKDPVFGLHGLRP SKKTPGIPPDKKRLYVRRDGKIGYINADQAASLNVLLVGLSHSVAKYKFFVKDGKL QDDEEVAAQQGKKPRKIGVRLERLLKMNFDTPARLFFSLEKDQVVPCYEATETPYT 
GWVYVYHDRLLTTRQRDQQVNQLAQEVKDLLRDGVKIPKWDVIPDCDNSCLGFSP CLVLDDETAAVSVVSRADPKKKRKV 29 AminoacidsequenceofCas-2-NLSfusionprotein MGPKKKRKVMDYKDHDGDYKDHDIDYKDDDDKMELKTTFCQSLKIKILPESHKLY YDELQADLKKMRIFFATHSGLNYSYKEDENDSEWKIASSKKSTLSPEAQRLIDIIAKG GIGAEHAYGIFYKDKKVSDKIRRDNSNCIFAmLLPNLVNFEEEYKKRKSKLQLTKEE WEEYHKRCLAGESVEVLWSEEIAPKIGPNEAARLASLAYHQFPDSRHPADITHCRAL MSMVADSFCSWVECHKLHREETEKLKTKLQEKIESCKEAYNLLLAYSEDLYSRNYG LSARVLKALIDNKSRCCFDIAFQIAEKYPALMKIDEKTLFKAYNALKLHWKIKRRKP FISLPKIERDYQVPFGLTGSRGKKFDVYVENKNIVVEIDGQKVETFSSHYFSDMQISK IYGEKKNLKGFKLKFRHKLKDKKKEVYGEWIEAELKEIKIKKDMETGDFYLYLPYT TTHNEKNLLLEKFFSYADPLKETIFKNGKTELPNEFLGFGFDLNLSDPIAMAVAEFVR DSSDGEIGALDYGHGKLLDASVLVCNSSLSKRINDLVGDNRKLIQAIRSYKNSLVTK EMDEESGEWLKKALGNAKYGNHRHQLQLVMSKLNKKSKKYWQECRKNGHNDLS ENIALLKLLDVQFSLHKSYNNIHIFDYKKQIHSHKTDSKRENFREFVTKQFAATIVKH CKEVMAKYGREYAVVFLEDLEMYFDADADNNSLIRLFAPGQLKKYIASALAKSKIG YVFIPPAGTSKTDPLTSKIGFRATGKYYKQLGYLLNKSDLYVERNGVIGKINSDVAA AINILLKGVNHSIVPYRFLNKHAGSEQKRLKRFEGEIGCDLKKMPKTRLYYLDGKIIT EEEKNSLEEALLAEIKHFLLSKANIPEFDAIPGSNGTCKAFSVPRCLKSRADPKKKRK V 30 AminoacidsequenceofCas-1D31-NLSfusionprotein MGPKKKRKVMDYKDHDGDYKDHDIDYKDDDDKNLAILEMREEVADYYADLQAD YRHYLPLMILWGQNGFMEPDETGLNFSAVLPSTAKTRCENRYGAQIAADLKLIADD LPSSLYAIFTNDYQPLTGVKKVGDRWKFVTSLNTVSDEEFAKKVKPLLRDYKGVTE EQWRRFREERVAGKTADQVWDDLDSILRNPTLAALKKIHDKHMFRVKRPESKLYC DALAHmLSSSYDSWKKINEARQLELQAMRDEQQGLLANKTTNSLYNLVCEFADEL ESQNYGLTRRIFFIARKEWEDEKSHYLPVIKLLVTEKYLPLLQADNKTVSPAVKAWD ICSTLKQRSPYIRFVPPTKDYPIPFGQTGRGHKFTLKEEAGKVCFLVPGLTKDEMPLS GSHYFSSLRIDKVGTDFKISFRHKTKIRSKKSARVTVKQPRINGTVTEVGILRRNGKF FLRVAYKIAYPQHNIDLKSYFASAAPSTQKLELLPAQMRVAGVDLNISDPIVVSKADI FKGEDGGPLSVLDYGSGKVVGEPVIICKDTSRAGKISSLATKCRTLRDVIRNFRHSLH QGTPLPSKSLEFLQEVPVAQEDLKSPSPRYLIQTWIKYIRTEMKRLHYKVWLDGYCH TSEAMRmLDLVDQVAFLMKSYENIHVKPGHKIPKKTTASLKIRETSRTRFRLHVSRL VGARVVRACEDCQMIFLEDLSTGFSEGSNNSLARLFSAGQLKNSIVMAAEKVGKAV VFVCKDGTSIKDPVFGLHGLRPSKKTPGIPPDKKRLYVRRDGKIGYINADQAASLNV LLVGLSHSVAKYKFFVKDGKLQDDEEVAAQQGKKPRKIGVRLERLLKMNFDTPAR 
LFFSLEKDQVVPCYEATETPYTGWVYVYHDRLLTTRQRDQQVNQLAQEVKDLLRD GVKIPKWDVIPDCDNSCLGFSPCLVLDDETAAVSVVSRADPKKKRKV 31 NucleotidesequenceofCas-1systemexpressioncassette ATGGGACCAAAGAAGAAGAGAAAGGTTATGGATTACAAGGACCACGACGGAGA CTATAAAGATCATGACATCGATTATAAGGATGATGATGACAAGATGTCTAAAGA CGGGGACAAAATGAAACACTCCAAGCAAGAAGTTTCCGAGGACAAAGGCGTCA AGAAAGATCCGTCCGTGTTTCGATGCCAGAACCTCGCCATACTCGAGATGAGGG AGGAGGTCGCTGACTATTACGCAGACCTTCAGGCTGACTACCGGCACTACCTGC CTCTGATGATTCTGTGGGGGCAAAATGGCTTTATGGAGCCAGATGAAACGGGCC TCAACTTCAGCGCCGTCCTGCCGTCGACCGCGAAAACACGCTGTGAGAATAGGT ATGGTGCCCAGATTGCAGCCGACCTGAAACTCATCGCAGATGATCTGCCGTCTA GCCTGTATGCTATATTCACGAATGATTACCAGCCTTTAACTGGCGTGAAGAAGGT TGGTGACCGTTGGAAGTTTGTGACTTCATTGAACACCGTGTCGGATGAGGAATTT GCAAAGAAAGTCAAACCCCTCCTTCGCGACTACAAAGGTGTCACAGAGGAGCAG TGGCGTCGTTTCCGTGAGGAACGAGTTGCTGGGAAAACAGCCGACCAGGTTTGG GACGATCTCGATTCCATCTTGAGGAACCCGACGCTTGCCGCCCTCAAAAAGATT CATGACAAGCACATGTTCAGAGTTAAGAGGCCGGAGTCAAAGCTGTATTGTGAT GCATTGGCGCATATGCTCTCATCTTCATATGATTCATGGAAAAAGATCAATGAA GCACGTCAGTTGGAACTACAGGCCATGAGAGACGAACAACAAGGCCTGCTGGC AAATAAAACTACTAACTCTCTTTACAATCTGGTCTGTGAGTTCGCGGATGAGCTT GAATCACAAAACTACGGCCTGACCCGAAGGATATTCTTCATCGCCAGGAAAGAG TGGGAGGATGAAAAGTCGCACTACCTTCCTGTGATCAAACTCTTGGTGACCGAG AAGTATCTTCCACTTCTGCAGGCGGATAACAAAACAGTTTCTCCGGCGGTTAAG GCGTGGGATATTTGCAGCACGTTGAAGCAGCGCAGTCCATATATACGGTTCGTT CCTCCCACAAAAGACTATCCGATTCCATTTGGCCAGACAGGTAGGGGACACAAG TTTACACTCAAGGAGGAGGCCGGGAAGGTTTGCTTCCTGGTTCCAGGACTCACC AAGGATGAAATGCCACTCTCGGGCTCCCACTATTTCTCTAGTCTGCGAATTGACA AGGTCGGCACAGACTTCAAGATCAGCTTCCGCCATAAGACGAAGATTCGAAGCA AGAAGTCAGCTAGAGTGACCGTAAAACAACCTCGGATCAACGGCACCGTTACGG AGGTGGGTATCCTGCGTCGCAACGGCAAGTTTTTCTTACGGGTTGCGTACAAGAT TGCTTACCCGCAACACAACATTGATCTTAAGAGCTACTTCGCCAGCGCCGCCCCT TCCACTCAGAAGCTCGAGCTGCTCCCTGCTCAGATGAGGGTGGCGGGAGTCGAT TTGAACATTTCCGATCCCATCGTCGTATCCAAAGCTGATATTTTTAAGGGAGAGG ATGGAGGGCCATTATCGGTCTTGGACTACGGATCTGGCAAGGTAGTGGGCGAGC CCGTCATCATCTGTAAGGATACAAGTCGCGCAGGTAAAATCTCCTCGCTCGCGA CTAAGTGCCGCACCTTAAGAGACGTCATTCGGAATTTCCGCCACTCTCTCCATCA 
GGGAACCCCCCTACCCAGTAAGAGCCTGGAGTTTCTGCAGGAGGTGCCAGTGGC CCAGGAGGACCTAAAATCGCCATCCCCCAGGTATCTCATACAGACCTGGATAAA GTATATCAGAACAGAGATGAAGCGCTTGCATTATAAAGTGTGGTTGGACGGCTA CTGCCACACATCTGAAGCTATGCGTATGCTTGATCTAGTTGATCAGGTCGCCTTC CTCATGAAGTCATACGAAAACATACATGTGAAGCCGGGCCATAAAATTCCCAAG AAAACTACCGCGAGTCTCAAGATTAGGGAAACATCCAGGACGAGATTTAGACTG CATGTATCCAGGTTGGTCGGGGCCAGGGTGGTCCGCGCCTGTGAAGATTGTCAG ATGATCTTCCTGGAAGACTTGTCGACCGGCTTCTCAGAAGGTAGCAACAATAGC CTCGCGCGGCTGTTTAGCGCCGGCCAGCTGAAGAATAGTATAGTGATGGCGGCG GAGAAGGTGGGTAAAGCTGTTGTATTTGTCTGCAAGGATGGAACTAGCATAAAG GACCCTGTCTTCGGTTTACACGGCCTTCGGCCGTCGAAAAAGACCCCAGGCATC CCTCCGGATAAAAAAAGACTTTATGTGAGACGGGATGGTAAGATTGGGTACATC AATGCTGATCAAGCTGCAAGTCTGAATGTACTATTGGTAGGTCTGAGCCATAGT GTGGCCAAGTACAAGTTCTTCGTCAAAGATGGTAAGTTGCAGGATGACGAAGAG GTGGCGGCACAACAAGGAAAGAAGCCTAGGAAGATTGGCGTTCGCCTGGAAAG ACTACTAAAGATGAATTTTGATACCCCGGCCCGCCTTTTTTTTAGCCTGGAGAAG GACCAGGTGGTTCCGTGCTACGAGGCGACTGAGACGCCTTACACGGGTTGGGTG TACGTCTACCATGACAGGCTTCTCACAACACGCCAGAGGGACCAACAGGTGAAT CAACTTGCCCAAGAGGTGAAAGACTTACTTAGGGACGGGGTGAAGATCCCTAAA TGGGACGTGATCCCAGACTGCGACAACAGTTGCCTCGGGTTCTCTCCCTGCCTGG TGCTTGACGACGAGACTGCTGCTGTCTCAGTTGTTGGATCTGGGAGCAAGCGGC CCGCAGCAACGAAAAAGGCAGGGCAAGCAAAAAAGAAGAAATGA 32 NucleotidesequenceofCas-2systemexpressioncassette ATGGGTCCGAAAAAGAAGCGGAAAGTTATGGATTATAAGGATCATGATGGCGA CTATAAGGATCATGACATAGATTATAAAGATGATGACGACAAAATGGAATTGAA GACAACTTTTTGTCAGTCACTAAAAATTAAGATCCTCCCCGAAAGCCATAAGTTG TATTACGATGAGCTGCAAGCCGACCTAAAGAAGATGAGAATCTTCTTCGCAACG CACTCTGGGCTCAACTACAGCTACAAGGAAGATGAAAACGATTCGGAATGGAAG ATAGCGTCGTCAAAAAAGAGCACACTCTCACCTGAAGCCCAAAGGCTTATTGAC ATCATCGCCAAAGGCGGTATAGGAGCAGAGCACGCCTACGGAATATTTTACAAG GACAAAAAGGTGTCGGACAAGATCCGGCGAGATAATTCAAACTGCATCTTCGCC ATGCTATTACCTAACCTCGTCAATTTCGAGGAAGAATACAAAAAGAGGAAAAGT AAGCTCCAGCTCACCAAGGAAGAGTGGGAGGAGTACCATAAACGATGTCTTGCA GGCGAGAGCGTGGAAGTCCTGTGGTCCGAGGAGATTGCTCCAAAAATTGGGCCT AACGAAGCGGCGAGACTGGCATCACTCGCGTATCACCAGTTCCCAGATTCTCGT CATCCGGCGGACATTACCCACTGCAGGGCGCTCATGTCTATGGTAGCAGACTCTT 
TTTGTTCCTGGGTCGAGTGCCACAAGCTGCACCGTGAGGAAACAGAGAAACTCA AGACGAAATTACAGGAGAAGATAGAGAGCTGCAAGGAGGCATACAATCTGCTA CTCGCTTATTCCGAAGACCTGTACAGTAGGAATTATGGTCTGAGTGCCAGGGTTT TGAAGGCACTTATTGACAACAAATCCCGGTGCTGCTTCGACATAGCCTTTCAGAT CGCGGAGAAATATCCTGCACTGATGAAGATTGATGAAAAGACCCTGTTCAAGGC CTACAACGCCCTAAAGCTCCACTGGAAAATAAAACGACGCAAGCCGTTCATCTC GCTGCCCAAGATCGAGAGGGACTACCAGGTGCCCTTTGGACTCACAGGAAGCAG AGGCAAGAAGTTCGACGTCTATGTGGAAAACAAGAATATTGTGGTCGAGATCGA CGGCCAGAAAGTCGAAACCTTCAGCTCTCACTATTTCTCTGATATGCAGATCTCC AAGATTTACGGTGAAAAGAAAAATTTGAAGGGATTCAAGTTGAAATTCCGCCAC AAGCTGAAGGACAAGAAGAAGGAGGTTTATGGAGAGTGGATTGAGGCGGAGCT GAAGGAAATCAAGATTAAGAAAGACATGGAGACAGGCGACTTCTACCTGTATCT CCCATACACTACAACGCATAATGAGAAAAACTTATTGCTCGAGAAGTTCTTTTCC TACGCCGATCCACTTAAGGAGACTATTTTTAAGAACGGAAAGACGGAGCTTCCA AATGAATTTCTAGGCTTCGGCTTCGATTTGAACCTTTCCGACCCGATTGCGATGG CGGTTGCCGAGTTTGTTCGGGACTCATCAGATGGTGAGATTGGCGCCCTCGACTA TGGTCACGGTAAGCTCCTCGACGCTAGTGTTTTGGTCTGCAACTCATCACTATCA AAGCGCATCAATGACCTGGTTGGAGACAATAGAAAGCTGATCCAAGCTATTAGA TCGTACAAGAACTCGCTAGTAACTAAGGAAATGGACGAGGAGAGTGGCGAGTG GCTTAAAAAAGCTCTAGGCAATGCAAAGTACGGCAACCATAGGCATCAGCTTCA ACTGGTCATGTCTAAGCTGAACAAAAAAAGCAAAAAGTACTGGCAGGAATGTCG CAAGAACGGCCACAACGATCTTTCCGAAAATATTGCACTATTAAAGTTGTTGGA CGTGCAGTTCTCTTTGCATAAAAGCTATAACAACATACACATCTTTGATTACAAA AAACAAATTCACAGCCATAAGACTGACTCCAAAAGGGAGAATTTCCGTGAATTT GTGACGAAGCAATTTGCGGCTACCATAGTAAAGCATTGCAAGGAGGTGATGGCT AAATATGGCAGAGAGTACGCCGTTGTCTTTTTGGAGGACCTGGAGATGTACTTT GATGCTGATGCTGATAACAATTCTCTGATCCGGCTTTTTGCACCCGGCCAATTGA AGAAGTACATCGCCTCCGCTCTGGCTAAGAGCAAGATAGGTTATGTGTTCATCC CTCCTGCTGGGACAAGTAAGACCGACCCGTTAACTTCGAAAATTGGATTCAGGG CTACCGGAAAATACTACAAGCAGCTGGGCTATCTGCTTAATAAGTCCGACCTCT ACGTGGAGCGCAACGGGGTTATCGGGAAGATCAACAGTGATGTAGCTGCGGCG ATAAATATTCTCTTAAAGGGCGTGAACCACTCAATCGTGCCGTACCGCTTCCTGA ATAAACATGCTGGGTCTGAACAGAAGCGCCTCAAACGTTTTGAAGGGGAGATCG GTTGTGATCTGAAAAAGATGCCAAAGACACGCCTCTACTATCTTGACGGAAAAA TCATCACTGAAGAGGAGAAGAACAGCCTGGAGGAAGCGCTACTCGCTGAGATC AAGCACTTCCTTCTGTCGAAGGCCAACATTCCGGAGTTTGATGCAATACCGGGG 
AGCAATGGCACCTGCAAAGCCTTCTCCGTTCCACGCTGCCTTAAGGGCAGCGGG AGCAAGCGGCCCGCCGCGACCAAAAAAGCGGGGCAAGCCAAGAAGAAAAAGTG A 33 NucleotidesequenceofCas-1D31systemexpressioncassette ATGGGACCAAAGAAGAAGAGAAAGGTTATGGATTACAAGGACCACGACGGAGA CTATAAAGATCATGACATCGATTATAAGGATGATGATGACAAGAACCTCGCCAT ACTCGAGATGAGGGAGGAGGTCGCTGACTATTACGCAGACCTTCAGGCTGACTA CCGGCACTACCTGCCTCTGATGATTCTGTGGGGGCAAAATGGCTTTATGGAGCC AGATGAAACGGGCCTCAACTTCAGCGCCGTCCTGCCGTCGACCGCGAAAACACG CTGTGAGAATAGGTATGGTGCCCAGATTGCAGCCGACCTGAAACTCATCGCAGA TGATCTGCCGTCTAGCCTGTATGCTATATTCACGAATGATTACCAGCCTTTAACT GGCGTGAAGAAGGTTGGTGACCGTTGGAAGTTTGTGACTTCATTGAACACCGTG TCGGATGAGGAATTTGCAAAGAAAGTCAAACCCCTCCTTCGCGACTACAAAGGT GTCACAGAGGAGCAGTGGCGTCGTTTCCGTGAGGAACGAGTTGCTGGGAAAACA GCCGACCAGGTTTGGGACGATCTCGATTCCATCTTGAGGAACCCGACGCTTGCC GCCCTCAAAAAGATTCATGACAAGCACATGTTCAGAGTTAAGAGGCCGGAGTCA AAGCTGTATTGTGATGCATTGGCGCATATGCTCTCATCTTCATATGATTCATGGA AAAAGATCAATGAAGCACGTCAGTTGGAACTACAGGCCATGAGAGACGAACAA CAAGGCCTGCTGGCAAATAAAACTACTAACTCTCTTTACAATCTGGTCTGTGAGT TCGCGGATGAGCTTGAATCACAAAACTACGGCCTGACCCGAAGGATATTCTTCA TCGCCAGGAAAGAGTGGGAGGATGAAAAGTCGCACTACCTTCCTGTGATCAAAC TCTTGGTGACCGAGAAGTATCTTCCACTTCTGCAGGCGGATAACAAAACAGTTTC TCCGGCGGTTAAGGCGTGGGATATTTGCAGCACGTTGAAGCAGCGCAGTCCATA TATACGGTTCGTTCCTCCCACAAAAGACTATCCGATTCCATTTGGCCAGACAGGT AGGGGACACAAGTTTACACTCAAGGAGGAGGCCGGGAAGGTTTGCTTCCTGGTT CCAGGACTCACCAAGGATGAAATGCCACTCTCGGGCTCCCACTATTTCTCTAGTC TGCGAATTGACAAGGTCGGCACAGACTTCAAGATCAGCTTCCGCCATAAGACGA AGATTCGAAGCAAGAAGTCAGCTAGAGTGACCGTAAAACAACCTCGGATCAAC GGCACCGTTACGGAGGTGGGTATCCTGCGTCGCAACGGCAAGTTTTTCTTACGG GTTGCGTACAAGATTGCTTACCCGCAACACAACATTGATCTTAAGAGCTACTTCG CCAGCGCCGCCCCTTCCACTCAGAAGCTCGAGCTGCTCCCTGCTCAGATGAGGG TGGCGGGAGTCGATTTGAACATTTCCGATCCCATCGTCGTATCCAAAGCTGATAT TTTTAAGGGAGAGGATGGAGGGCCATTATCGGTCTTGGACTACGGATCTGGCAA GGTAGTGGGCGAGCCCGTCATCATCTGTAAGGATACAAGTCGCGCAGGTAAAAT CTCCTCGCTCGCGACTAAGTGCCGCACCTTAAGAGACGTCATTCGGAATTTCCGC CACTCTCTCCATCAGGGAACCCCCCTACCCAGTAAGAGCCTGGAGTTTCTGCAG GAGGTGCCAGTGGCCCAGGAGGACCTAAAATCGCCATCCCCCAGGTATCTCATA 
CAGACCTGGATAAAGTATATCAGAACAGAGATGAAGCGCTTGCATTATAAAGTG TGGTTGGACGGCTACTGCCACACATCTGAAGCTATGCGTATGCTTGATCTAGTTG ATCAGGTCGCCTTCCTCATGAAGTCATACGAAAACATACATGTGAAGCCGGGCC ATAAAATTCCCAAGAAAACTACCGCGAGTCTCAAGATTAGGGAAACATCCAGGA CGAGATTTAGACTGCATGTATCCAGGTTGGTCGGGGCCAGGGTGGTCCGCGCCT GTGAAGATTGTCAGATGATCTTCCTGGAAGACTTGTCGACCGGCTTCTCAGAAG GTAGCAACAATAGCCTCGCGCGGCTGTTTAGCGCCGGCCAGCTGAAGAATAGTA TAGTGATGGCGGCGGAGAAGGTGGGTAAAGCTGTTGTATTTGTCTGCAAGGATG GAACTAGCATAAAGGACCCTGTCTTCGGTTTACACGGCCTTCGGCCGTCGAAAA AGACCCCAGGCATCCCTCCGGATAAAAAAAGACTTTATGTGAGACGGGATGGTA AGATTGGGTACATCAATGCTGATCAAGCTGCAAGTCTGAATGTACTATTGGTAG GTCTGAGCCATAGTGTGGCCAAGTACAAGTTCTTCGTCAAAGATGGTAAGTTGC AGGATGACGAAGAGGTGGCGGCACAACAAGGAAAGAAGCCTAGGAAGATTGGC GTTCGCCTGGAAAGACTACTAAAGATGAATTTTGATACCCCGGCCCGCCTTTTTT TTAGCCTGGAGAAGGACCAGGTGGTTCCGTGCTACGAGGCGACTGAGACGCCTT ACACGGGTTGGGTGTACGTCTACCATGACAGGCTTCTCACAACACGCCAGAGGG ACCAACAGGTGAATCAACTTGCCCAAGAGGTGAAAGACTTACTTAGGGACGGGG TGAAGATCCCTAAATGGGACGTGATCCCAGACTGCGACAACAGTTGCCTCGGGT TCTCTCCCTGCCTGGTGCTTGACGACGAGACTGCTGCTGTCTCAGTTGTTGGATC TGGGAGCAAGCGGCCCGCAGCAACGAAAAAGGCAGGGCAAGCAAAAAAGAAG AAATGA 34 PAMlibrarysequence NNNNNNNNGGTATAACAACTTCGACGAGCTCTACA 35 TargetsequencesforPAMrecognitionsiteidentificationofCas-1 GGUAUAACAACUUCGACGAGCUCUACA 36 GuidesequenceofinvitroenzymedigestionofCas-1 GGUAUAACAACUUCGACGAGCUCUACA 37 GuidesequenceofCas-1inmaize CGGUGGGCUGGCGCUGGGGUUCAGCU 38 Guidesequenceofsg1ofCas-1inhumancells GAGCCAGAGAGGAUCCUGGGAGGGAG 39 Guidesequenceofsg2ofCas-1inhumancells UGACUUUGUCACAGCCCAAGAUAGUU 40 Guidesequenceofsg3ofCas-1inhumancells AAACCCAGACACAUAGCAAUUCAGGA 41 Guidesequenceofsg4ofCas-1inhumancells CUGAGGGGCUGCUGGUUUGGCUGGUG 42 Guidesequenceofsg5ofCas-1inhumancells GAGAUGCCAGCAGAAGUUGGGCAGAA 43 Guidesequenceofsg6ofCas-1inhumancells GGGCAGAGUGGAGAUGGUGGGGACAA 44 Guidesequenceofsg7ofCas-1inhumancells ACUAGGGUGGGCAACCACAAACCCAC 45 Guidesequenceofsg8ofCas-1inhumancells UGUACAGAAGGCUGAAAGGAGAGAAC 46 Guidesequenceofsg9ofCas-1inhumancells CGGUACCAGUUUAGCACGAAGCUCUC 
47 Guide sequence of sg10 of Cas-1 in human cells CCACCAUUGUCUUUCCUAGCGGAAUG
48 Guide sequence of sg11 of Cas-1 in human cells AAGCACUGUGGGUACGAAGGAAAUGA
49 Guide sequence of sg12 of Cas-1 in human cells UGUCACAAAGUAAGGAUUCUGAUGUG
50 Guide sequence of sg13 of Cas-1 in human cells CUGUUGUUGAAGGCGUUUGCACAUGC
51 Guide sequence of sg14 of Cas-1 in human cells UCUGCAGGCCAGAUGAGGGCUCCAGA
52 Guide sequence of AAVS1-ATG PAM in Cas-1 human cells GAGCCAGAGAGGAUCCUGGGAGGGAG
53 Guide sequence of AAVS1-ATA PAM in Cas-1 human cells GACCACUGUGUGGGGGUAAAGGACCU
54 Guide sequence of AAVS1-ACA PAM in Cas-1 human cells CCCCCAUUUCCUGGAGCCAUCUCUCU
55 Guide sequence of AAVS1-GCA PAM in Cas-1 human cells AACCUUAGAGGUUCUGGCAAGGAGAG
56 Guide sequence of AAVS1-GTA PAM in Cas-1 human cells AGCAAACCUUAGAGGUUCUGGCAAGG
57 Guide sequence of AAVS1-ACG PAM in Cas-1 human cells AUGGAGCCAGAGAGGAUCCUGGGAGG
58 Guide sequence of AAVS1-GTG PAM in Cas-1 human cells GGAGGGAAGGGGGGGAUGCGUGACCU
59 Guide sequence of AAVS1-GCG PAM in Cas-1 human cells UGACCUGCCCGGUUCUCAGUGGCCAC
60 Guide sequence of Cas-1 T26 GAGCCAGAGAGGAUCCUGGGAGGGAG
61 Guide sequence of Cas-1 T24 GAGCCAGAGAGGAUCCUGGGAGGG
62 Guide sequence of Cas-1 T22 GAGCCAGAGAGGAUCCUGGGAG
63 Guide sequence of Cas-1 T21 GAGCCAGAGAGGAUCCUGGGA
64 Guide sequence of Cas-1 T20 GAGCCAGAGAGGAUCCUGGG
65 Guide sequence of Cas-1 T19 GAGCCAGAGAGGAUCCUGG
66 Guide sequence of Cas-1 T18 GAGCCAGAGAGGAUCCUG
67 Guide sequence of Cas-1 T16 GAGCCAGAGAGGAUCC

Specific Modes for Carrying Out the Invention

[0216] The present invention is now described with reference to the following examples which are intended to illustrate the present invention (but not to limit the present invention).

[0217] Unless otherwise specified, the experiments and procedures described in the examples were performed essentially according to conventional methods known in the art and described in various references. For example, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA used in the present invention can be found in Sambrook, Fritsch, and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd ed. (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, edited by F. M. Ausubel et al. (1987); METHODS IN ENZYMOLOGY series, Academic Press; PCR 2: A PRACTICAL APPROACH, edited by M. J. MacPherson, B. D. Hames, and G. R. Taylor (1995); ANTIBODIES, A LABORATORY MANUAL, edited by Harlow and Lane (1988); and ANIMAL CELL CULTURE, edited by R. I. Freshney (1987).

[0218] In addition, when specific conditions were not specified in the examples, they were carried out under conventional conditions or conditions recommended by the manufacturer. The reagents or instruments used without indicating the manufacturer were all conventional products that could be obtained commercially. It is known to those skilled in the art that the examples describe the present invention by way of example, and are not intended to limit the scope sought to be protected by the present invention. All publications and other references mentioned herein are incorporated herein by reference in their entirety.

[0219] The sources of some reagents involved in the following examples were as follows:

[0220] LB liquid culture medium: 10 g of tryptone, 5 g of yeast extract, and 10 g of NaCl were dissolved, brought to a final volume of 1 L, and sterilized. If an antibiotic was required, it was added after the culture medium had cooled, to a final concentration of 50 μg/mL.

[0221] Chloroform/isoamyl alcohol: 10 mL of isoamyl alcohol was added to 240 mL of chloroform and mixed well.

[0222] RNP buffer: 100 mM sodium chloride, 50 mM Tris-HCl, 10 mM MgCl2, 100 μg/mL BSA, pH 7.9.

[0223] Prokaryotic expression vectors pET-30a, pUC19, and pACYCDuet-1 were purchased from Beijing Quanshijin Biotechnology Co., Ltd.

[0224] Escherichia coli competent TSC-E03 was purchased from Beijing Qingke Biotechnology Co., Ltd.

Example 1. Acquisition of Cas Sequences and Cas Guide RNA

[0225] 1. Annotation of CRISPR and genes: Prodigal was used to perform the gene annotation of the data of microbial genome and metagenome of NCBI and JGI databases to obtain all proteins, and Piler-CR was used to perform the annotation of CRISPR loci, and the parameters were all default parameters.

[0226] 2. Protein filtering: The annotated proteins were de-duplicated on the basis of sequence identity, so that proteins with completely identical sequences were removed.

[0227] 3. Acquisition of CRISPR-related proteins: Each CRISPR locus was extended by 10 Kb upstream and downstream, and the non-redundant proteins in the CRISPR adjacent interval were identified.

[0228] 4. Clustering of CRISPR-related proteins: BLASTP was used to perform all-against-all pairwise alignment of the non-redundant CRISPR-related proteins, and alignments with an E-value < 1E-10 were output. MCL was used to cluster the BLASTP output into CRISPR-related protein families.

[0229] 5. Identification of CRISPR-enriched protein families: BLASTP was used to align the proteins of the CRISPR-related protein families against the non-redundant protein database from which the CRISPR-related proteins had been removed, and alignments with an E-value < 1E-10 were output. If the homologous proteins found in the non-CRISPR-related protein database were less than 100%, the proteins of that family were considered to be enriched in the CRISPR region. In this way, the CRISPR-enriched protein families were identified.

[0230] 6. Annotation of protein functions and domains: The CRISPR-enriched protein family was annotated using the Pfam database, the NR database, and the Cas proteins collected from NCBI to obtain a new CRISPR/Cas protein family. Multiple sequence alignment of each CRISPR/Cas family protein was performed using Mafft, and then conserved domain analysis was performed using JPred and HHpred to identify the protein family containing RuvC domain.
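
Steps 2 and 3 of the screening procedure above (removal of identical proteins, then collection of proteins within 10 Kb of a CRISPR locus) can be illustrated with a minimal Python sketch. The `Gene` record, the toy coordinates, and the choice to count any gene that overlaps the extended interval are assumptions made for illustration only; they are not taken from the pipeline actually used.

```python
from dataclasses import dataclass

FLANK_BP = 10_000  # 10 Kb extension on each side of a CRISPR locus (step 3)


@dataclass(frozen=True)
class Gene:
    """Hypothetical record for one annotated protein-coding gene."""
    protein_id: str
    contig: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    sequence: str   # amino acid sequence


def remove_identical(genes):
    """Keep the first gene seen for each distinct protein sequence (step 2)."""
    seen = set()
    unique = []
    for gene in genes:
        if gene.sequence not in seen:
            seen.add(gene.sequence)
            unique.append(gene)
    return unique


def crispr_adjacent(genes, contig, locus_start, locus_end, flank=FLANK_BP):
    """Return genes overlapping the CRISPR locus extended by `flank` bp (step 3).

    Treating partial overlap as "adjacent" is an assumption of this sketch.
    """
    lo = max(1, locus_start - flank)
    hi = locus_end + flank
    return [g for g in genes
            if g.contig == contig and g.start <= hi and g.end >= lo]


# Hypothetical toy data: p3 has the same sequence as p1 and is dropped.
genes = [
    Gene("p1", "c1", 2_000, 4_500, "MKT"),
    Gene("p2", "c1", 30_000, 31_000, "MLS"),
    Gene("p3", "c1", 14_000, 16_000, "MKT"),
    Gene("p4", "c2", 2_000, 3_000, "MAA"),
]
unique = remove_identical(genes)
nearby = crispr_adjacent(unique, "c1", locus_start=12_000, locus_end=12_600)
print([g.protein_id for g in nearby])
```
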

[0231] On this basis, the inventors obtained new Cas effector proteins, which were named Cas-1 and Cas-2. The amino acid sequences of these proteins were set forth in SEQ ID NO: 1 and SEQ ID NO: 2, respectively, and the nucleotide sequences encoding them were set forth in SEQ ID NO: 4 and SEQ ID NO: 5, respectively. The direct repeat sequences (the repeat sequences contained in pre-crRNA) corresponding to Cas-1 and Cas-2 were set forth in SEQ ID NO: 7 and SEQ ID NO: 8, respectively.

Example 2. Description of Sequence Structure of Cas Gene

[0232] 1. The CRISPR/Cas sequence fragment was synthesized by Beijing Qingke Biotechnology Co., Ltd., constructed into the protein expression vector pET-30a(+), and confirmed by first-generation sequencing. According to the sequencing results, the recombinant plasmids pET-30a+CRISPR/Cas were as follows:

[0233] (1) The recombinant plasmid pET-30a+CRISPR/Cas-1 contained an expression cassette, and the expression cassette sequence was set forth in SEQ ID NO: 31. In the sequence as set forth in SEQ ID NO: 31, from the 5′ end, positions 1 to 27 were the nucleotide sequence of SV40-NLS, positions 28 to 96 were the nucleotide sequence of 3×FLAG, positions 97 to 2904 were the nucleotide sequence of Cas-1, and positions 2905 to 2964 were the nucleoplasmin NLS signal peptide.

[0234] (2) The recombinant plasmid pET-30a+CRISPR/Cas-2 contained an expression cassette, and the expression cassette sequence was set forth in SEQ ID NO: 32. In the sequence as set forth in SEQ ID NO: 32, from the 5′ end, positions 1 to 27 were the nucleotide sequence of SV40-NLS, positions 28 to 96 were the nucleotide sequence of 3×FLAG, positions 97 to 2697 were the nucleotide sequence of Cas-2, and positions 2698 to 2757 were the nucleoplasmin NLS signal peptide.
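
The segment coordinates given above for the two expression cassettes can be sanity-checked with a short Python sketch. It only uses the positions stated in the text; the function name `check_layout` and the two checks performed (segments are contiguous, and each segment is a whole number of codons) are illustrative assumptions, not part of the described method.

```python
# Segment coordinates (1-based, inclusive) as stated for SEQ ID NO: 31 and 32.
CASSETTES = {
    "Cas-1": [
        ("SV40-NLS", 1, 27),
        ("3xFLAG", 28, 96),
        ("Cas-1", 97, 2904),
        ("nucleoplasmin NLS", 2905, 2964),
    ],
    "Cas-2": [
        ("SV40-NLS", 1, 27),
        ("3xFLAG", 28, 96),
        ("Cas-2", 97, 2697),
        ("nucleoplasmin NLS", 2698, 2757),
    ],
}


def check_layout(segments):
    """Return (name, length_nt, codons) per segment.

    Raises ValueError if segments are not contiguous from position 1 or if
    a segment length is not a multiple of three.
    """
    report = []
    expected_start = 1
    for name, start, end in segments:
        if start != expected_start:
            raise ValueError(f"{name}: expected start {expected_start}, got {start}")
        length = end - start + 1
        if length % 3 != 0:
            raise ValueError(f"{name}: length {length} is not a multiple of 3")
        report.append((name, length, length // 3))
        expected_start = end + 1
    return report


for cassette, segments in CASSETTES.items():
    for name, length, codons in check_layout(segments):
        print(f"{cassette:6s} {name:18s} {length:5d} nt  {codons:4d} codons")
```

Under these coordinates every segment is in frame with its neighbours, which is what a single fused open reading frame requires.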

Example 3. Identification of PAM and DNA Cleavage Mode of CRISPR/Cas System

I. In Vitro Expression and Purification of Cas Protein

[0235] The specific steps of in vitro expression and purification of Cas protein were as follows:

[0236] 1. The nucleotide sequences as set forth in SEQ ID NOs: 31 to 32 were artificially synthesized.

[0237] 2. The recombinant plasmids pET-30a-CRISPR/Cas-1 and pET-30a-CRISPR/Cas-2 were introduced into E. coli TSC-E03 to obtain recombinant bacteria, which were named TSC-E03-CRISPR/Cas-1 and TSC-E03-CRISPR/Cas-2. Single clones of TSC-E03-CRISPR/Cas-1 and TSC-E03-CRISPR/Cas-2 were picked, inoculated into 100 mL of LB liquid culture medium (containing 50 μg/mL kanamycin), and cultured with shaking at 37 °C and 200 rpm for 12 h to obtain culture solutions.

[0238] 3. The culture solutions were inoculated into 50 mL of LB liquid culture medium (containing 50 μg/mL kanamycin) at a volume ratio of 1:100 and cultured with shaking at 37 °C and 200 rpm until the OD600 value reached 0.6; IPTG was then added to a final concentration of 1 mM, and the cultures were shaken at 18 °C and 220 rpm for 14 h and centrifuged at 4 °C and 7000 rpm for 10 min to obtain bacterial precipitates.

[0239] 5. The bacterial precipitates were resuspended in 100 mL of 100 mM Tris-HCl buffer (pH 8.0), ultrasonically disrupted (ultrasonic power: 600 W; cycle program: disruption 4 s, pause 6 s, total 20 min), and then centrifuged at 4 °C and 10000 rpm for 10 min to collect Supernatant A.

[0240] 6. Supernatant A was centrifuged at 4 °C and 12000 rpm for 10 min to collect Supernatant B.

[0241] 7. The nickel column produced by GE was used to purify Supernatant B (referring to the instructions of the nickel column for the specific steps of purification), and then the protein quantification kit produced by Thermo Fisher was used to quantify Cas-1 to Cas-3 proteins.

II. Transcription and Purification of Cas Protein Guide RNA:

[0242] 1. The templates for guide RNA transcription were designed respectively. The structure of each transcription template was: T7 promoter+direct repeat sequence of Cas-1 or Cas-2 (SEQ ID NOs: 7 to 8)+guide sequence (SEQ ID NO: 36). The primers were designed using Primer5.0 software to ensure that the Forward primer and the Reverse primer had at least 18 bp of overlapping sequence.

[0243] 2. The following reaction system was prepared, gently pipetted up and down to mix well, centrifuged briefly, and placed in a PCR instrument for slow annealing. The PCR system was as follows:

TABLE-US-00002
Component                  Volume (μL)
Forward Primer (100 nM)    7.5
Reverse Primer (100 nM)    7.5
2× KAPA Mix                25
ddH2O                      10
Total volume               50
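
As a quick consistency check on the annealing mix tabulated above, the following Python sketch confirms that the component volumes add up to the stated total and derives the final concentrations. The primer stock value is used exactly as printed in the table (100 nM); the helper name `final_concentration` is an illustrative assumption.

```python
# Component volumes in microliters, as listed in the table.
components_ul = {
    "Forward primer": 7.5,
    "Reverse primer": 7.5,
    "2x KAPA Mix": 25.0,
    "ddH2O": 10.0,
}
TOTAL_UL = 50.0


def final_concentration(stock, volume_ul, total_ul):
    """Dilution of a stock into the final reaction: C_final = C_stock * V / V_total."""
    return stock * volume_ul / total_ul


assert sum(components_ul.values()) == TOTAL_UL

primer_final_nm = final_concentration(100.0, components_ul["Forward primer"], TOTAL_UL)
mix_final_fold = final_concentration(2.0, components_ul["2x KAPA Mix"], TOTAL_UL)
print(f"each primer: {primer_final_nm} nM, KAPA Mix: {mix_final_fold}x")
```
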

[0244] 3. The template was purified with the MinElute PCR Purification Kit, and the steps were as follows:

[0245] 1) The PCR product was mixed with 5 volumes of Buffer PB and applied to a MinElute column placed in a 2 mL collection tube, allowed to stand at room temperature for 2 min, and centrifuged at 12000 × g for 1 min;

[0246] 2) The waste liquid was discarded, and 750 μL of Buffer PE (with ethanol added before use) was added and centrifuged at 12000 × g for 1 min;

[0247] 3) The waste liquid was discarded, 350 μL of Buffer PE was added and centrifuged at 12000 × g for 1 min, then the waste liquid was discarded, and centrifugation was performed at 12000 × g for 2 min;

[0248] 4) The MinElute column was placed in a new 1.5 mL centrifuge tube, the lid was opened, and the column was allowed to stand at 65 °C for 2 min;

[0249] 5) 20 μL of preheated EB solution was added, allowed to stand for 2 min, and centrifuged at 12000 × g for 2 min. To improve the recovery rate, the eluate in the centrifuge tube could be passed through the MinElute column 2 to 3 times;

[0250] 6) The template concentration was measured by Nanodrop, and the template was frozen at −20 °C for later use.

[0251] 4. Purification of guide RNA: DNase I in the system was removed by extraction with phenol:chloroform:isoamyl alcohol (25:24:1), as follows:

[0252] 1) 80 μL of RNase-free H2O was added to the post-transcription reaction system to adjust the volume to 100 μL;

[0253] 2) A 2 mL Phase Lock Gel (PLG) Heavy tube was taken out and centrifuged at 15000 × g for 2 min; 100 μL of phenol:chloroform:isoamyl alcohol (25:24:1) and the 100 μL of DNase I-digested RNA were added; the Phase Lock tube was gently flicked 5 to 10 times by hand to mix evenly, and then centrifuged at 15 °C and 16000 × g for 12 min;

[0254] 3) A new RNase-free 1.5 mL centrifuge tube was taken, and the supernatant from the previous centrifugation was pipetted into it without drawing up the gel; isopropanol of the same volume as the supernatant and sodium acetate solution of one-tenth the volume were added and mixed well with a pipette tip, and the tube was placed in a −20 °C refrigerator for 1 h or overnight;

[0255] 4) Centrifugation was performed at 4 °C and 16000 × g for 30 min, and the supernatant was discarded; pre-cooled 75% ethanol was added, and the precipitate was mixed well by pipetting and centrifuged at 4 °C and 16000 × g for 12 min; the supernatant was discarded, the tube was allowed to stand in a fume hood for 2 to 3 min so that the ethanol on the RNA surface dried in the air, and 100 μL of RNase-free H2O was added and mixed well by pipetting.

[0256] 5. The concentration of the purified crRNA was measured by Nanodrop, and the crRNA was uniformly diluted to 250 ng/μL, aliquoted into 200 μL PCR tubes, and frozen at −80 °C for later use.
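
The dilution of the purified crRNA to a uniform working concentration follows the usual relation C1·V1 = C2·V2. A minimal Python sketch is given below; the measured stock concentration and volume in the example call are hypothetical values, since the actual Nanodrop readings are not given.

```python
def dilute_to_target(stock_ng_per_ul, stock_volume_ul, target_ng_per_ul=250.0):
    """Return (final_volume_ul, diluent_ul) from C1 * V1 = C2 * V2.

    The default target of 250 ng/uL is the working concentration stated
    for the crRNA; the stock values are supplied by the caller.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    final_volume_ul = stock_ng_per_ul * stock_volume_ul / target_ng_per_ul
    return final_volume_ul, final_volume_ul - stock_volume_ul


# Hypothetical reading: 100 uL of crRNA measured at 1000 ng/uL.
final_ul, water_ul = dilute_to_target(1000.0, 100.0)
print(f"add {water_ul} uL RNase-free water for a final volume of {final_ul} uL")
```
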

III. Cas Protein In Vitro Enzyme Digestion and PAM Consumption:

1. Establishment of Double-Stranded DNA Enzyme Digestion System:

[0257] (1) The following reaction system was prepared, gently pipetted to mix well, centrifuged briefly, and incubated at 37 °C for 15 min. The DNA cleavage reaction system was as follows:

TABLE-US-00003
Component                   Sample amount
12-crRNA (250 ng/μL)        600 ng
12 protein (0.5 μg/μL)      0.5 μg
10× DNA Cleavage buffer     1 μL
RNase-free H2O              Supplemented to 7 μL

[0258] (2) 3 μL of substrate DNA (100 ng/μL; 300 ng in total) was added, gently pipetted to mix well, centrifuged briefly, and incubated at 37 °C for 8 h;

[0259] (3) RNase was added and incubated at 37 °C for 15 min to fully digest the RNA impurities in the system;

[0260] (4) Proteinase K was added and incubated at 58 °C for 15 min to digest the Cas-1 to Cas-3 proteins;

[0261] (5) Detection was performed by agarose gel electrophoresis.

[0262] The gel results showed that Cas-1 was capable of effectively cleaving double-stranded DNA.
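
The pipetting volumes implied by the double-stranded DNA cleavage system above can be worked out from the stated amounts and stock concentrations. The Python sketch below does that arithmetic; it assumes the missing unit prefixes in the table are micro (μg, μL), and the variable names are illustrative only.

```python
def volume_ul(mass_ng, conc_ng_per_ul):
    """Volume needed to deliver a given mass from a stock of known concentration."""
    return mass_ng / conc_ng_per_ul


crrna_ul = volume_ul(600, 250)        # 600 ng crRNA at 250 ng/uL
protein_ul = volume_ul(500, 500)      # 0.5 ug protein at 0.5 ug/uL, in ng units
buffer_ul = 1.0                       # 10x DNA cleavage buffer
PREMIX_UL = 7.0                       # volume before substrate is added

# Water makes up the remainder of the 7 uL pre-incubation mix.
water_ul = PREMIX_UL - (crrna_ul + protein_ul + buffer_ul)

substrate_ul = volume_ul(300, 100)    # 300 ng substrate DNA at 100 ng/uL
total_ul = PREMIX_UL + substrate_ul
buffer_fold = 10 * buffer_ul / total_ul

print(f"crRNA {crrna_ul:.1f} uL, protein {protein_ul:.1f} uL, "
      f"water {water_ul:.1f} uL, substrate {substrate_ul:.1f} uL, "
      f"total {total_ul:.1f} uL, buffer {buffer_fold:.0f}x")
```

With these numbers the final reaction volume is 10 μL, so the 1 μL of 10× buffer ends up at 1×.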

2. Identification of PAM Site:

[0263] (1) The reaction system was prepared as in step 6 above, except that the substrate DNA was replaced with a plasmid library carrying 8 random bases before the target, and was incubated at 37 °C for 8 h. The control sample contained Cas protein but no crRNA. Three replicates were set up for each protein;

[0264] (2) After the reaction, the reaction sample was subjected to column purification, and the purified product was used as a template to construct the second-generation sequencing library. The system and method for library construction were the same as the library construction method in step 2 of PAM library consumption in Escherichia coli. The specific operation process was as follows:

[0265] The following reagents were prepared (each sample corresponded to one R-directed primer and to multiple F-directed primers):

TABLE-US-00004
Reagent                  Usage amount
Template                 20 ng
High-fidelity PCR mix    20 μL
NGS-Lib-Fwd-1-10         2 μL
NGS-Lib-Rev              2 μL
Distilled water          Supplemented to 40 μL

[0266] The prepared reaction system was loaded in a PCR instrument, and the program was as follows:

TABLE-US-00005
Temperature     Time
98 °C           3 min
98 °C           15 s
60 °C           30 s
72 °C           20 s
Go to step 2    20 cycles
72 °C           5 min
10 °C           Hold

[0267] 1 G of sequencing data was obtained for each sample;

[0268] (3) The numbers of occurrences of the PAM sequences in the experimental group and the control group were counted, respectively, and normalized to the total number of PAM sequences in each group. For any PAM sequence, when log2(normalized value of the control group/normalized value of the experimental group) was greater than 3.5, the PAM was considered to be significantly consumed. The significantly consumed PAM sequences were thus obtained from all PAM sequences. In addition, WebLogo was used to analyze the significantly consumed PAM sequences, and finally the PAM domains of Cas were obtained (FIG. 1A and FIG. 1B).

[0269] (4) Verification of PAM library domains: Through the PAM library consumption experiment, we obtained the PAM domain of Cas-1. In order to verify the rigor of this domain, we set up an ATG PAM for in vivo experiments to test the editing activity of Cas-1 on this PAM. First, we integrated the 26 nt target of the T7 promoter with the corresponding PAM site and the sequence of the T7 terminator into the vector pET30a-Cas-1, which was then co-transformed with the pACYCDuet-1 plasmid and plated on kanamycin and chloramphenicol resistance plates for screening. Single colonies with double resistance were picked for shaking culture, and IPTG induction was performed for 12 hours at an OD value of 1.0. Then, the bacteria before and after induction were compared by gradient dilution. If the chloramphenicol resistance gene was edited, growth on the chloramphenicol resistance plate was poor. From the experimental results (FIG. 2 and FIG. 3), we could see that CRISPR/Cas-1 could effectively edit only target sequences with specific PAM domains (e.g., ATG), and had no editing activity on the other target sequences (e.g., CCC), thus verifying the accuracy of PAM domain recognition by Cas-1. These results confirmed that Cas-1 had a rigorous PAM recognition mode, so Cas-1 could significantly reduce off-target effects.

[0270] (5) Analysis of dsDNA cleavage mode of Cas-1 protein:

[0271] In order to further determine the cleavage site and substrate cleavage mode of Cas-1 protein when targeting dsDNA, the bands produced by Cas-1 digestion of the targeted substrate dsDNA were individually excised and recovered. Since the dsDNA fragments were designed with the 26 nt target site farther from the 5′ end than from the 3′ end, the longer fragments were subjected to first-generation sequencing using the F-direct primers that had amplified the linear dsDNA fragments, while the recovered shorter fragments were sequenced with the R-direct primers. By aligning the first-generation sequencing results to the reference dsDNA sequence, the cleavage sites of the Cas-1 protein on the substrate were located at 18 nt and 19 nt from the PAM, thereby forming a sticky end with a 1 nt overhang (FIG. 3).
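
The PAM consumption criterion described in step (3) of the PAM site identification above (normalize counts within each group, then call a PAM significantly consumed when log2 of control over experiment exceeds 3.5) can be sketched in Python as follows. The toy counts use 3 nt PAMs rather than the 8-base random library, and the pseudocount that avoids division by zero is an assumption of this sketch, not something stated in the procedure.

```python
import math

LOG2_THRESHOLD = 3.5  # cutoff stated in the text


def normalized(counts):
    """Divide each PAM count by the total number of PAM reads in the group."""
    total = sum(counts.values())
    return {pam: n / total for pam, n in counts.items()}


def depleted_pams(control_counts, experiment_counts,
                  threshold=LOG2_THRESHOLD, pseudocount=1):
    """Return PAMs whose log2(control / experiment) exceeds the threshold.

    `pseudocount` is added to every count so that PAMs absent from one
    group do not cause a division by zero (an assumption of this sketch).
    """
    pams = set(control_counts) | set(experiment_counts)
    ctrl = normalized({p: control_counts.get(p, 0) + pseudocount for p in pams})
    expt = normalized({p: experiment_counts.get(p, 0) + pseudocount for p in pams})
    return sorted(p for p in pams if math.log2(ctrl[p] / expt[p]) > threshold)


# Hypothetical read counts per PAM.
control = {"ATG": 1000, "CCC": 1000, "GTA": 1000}
experiment = {"ATG": 20, "CCC": 1500, "GTA": 1480}
print(depleted_pams(control, experiment))
```

In this toy data set only ATG is strongly under-represented after cleavage, so it is the only PAM reported.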

Example 4. Analysis of Trans-Cleavage Activity of CRISPR/Cas System

[0272] A single-stranded or double-stranded DNA without SEQ ID NO: 36 was designed as the enzyme substrate for trans-cleavage activity analysis; a single-stranded or double-stranded DNA containing SEQ ID NO: 36 was used as the targeted substrate, and the purified protein and the corresponding crRNA were prepared into the following reaction system as shown in the table below.

TABLE-US-00006
Component                    Volume
Cas protein                  1 μL (60 nM)
crRNA                        1 μL (120 nM)
ssDNA                        1 μL (30 nM)
10× ssDNA Cleavage Buffer    2 μL

[0273] The above reaction system was placed in a PCR instrument and incubated at 25 °C for 10 min; then 2 μL of single-stranded DNA without the target sequence was added and incubated at 37 °C for 1 h; 1 μL of RNase A was added and incubated at 37 °C for 30 min; 1 μL of Proteinase K was added and incubated at 55 °C for 30 min; 4 μL of 6× DNA loading buffer was added, and detection was performed on a 1.5% agarose gel. If the targeted substrate was dsDNA, a primer of about 90 nt without the target sequence was synthesized, diluted to 1 μM, aliquoted and stored at −20 °C for use as a random substrate in the trans-cleavage experiment. The dsDNA fragments from the in vitro dsDNA enzyme cleavage experiment were used as the targeted substrate, and the purified proteins and corresponding crRNA were prepared into the reaction system shown in the following table.

TABLE-US-00007
Component                    Volume
Cas protein                  1 μL (200 nM)
crRNA                        1 μL (800 nM)
dsDNA                        1 μL (30 nM)
10× ssDNA Cleavage Buffer    2 μL
Nuclease-free water          Supplemented to 18 μL

[0274] The above reaction system was placed in a PCR instrument and incubated at 37 °C for 15 min; (1) 2 μL (100 nM) of diluted random substrate ssDNA was added and incubated at 37 °C for 2 h; (2) 1 μL of RNase A was added and incubated at 37 °C for 30 min; (3) 1 μL of Proteinase K was added and incubated at 55 °C for 30 min; (4) 2× RNA Loading Buffer was added, and detection was performed on a 10% denaturing polyacrylamide gel.

[0275] The experimental results are shown in FIG. 4A and FIG. 4B and confirmed that, once the trans-cleavage activity of Cas-1 protein was activated, both single-stranded DNA and double-stranded DNA substrates could be cleaved.

Example 5. Detection of Editing Activity of Cas in Human Cells

[0276] In this example, 14 target sites, i.e., sg1 to sg14, were selected from the genome sequence of human HeLa cells, and 14 guide sequences were designed for these target sites (the sequences were set forth in SEQ ID NO: 38 to SEQ ID NO: 51, respectively). Further, the eukaryotic expression vector containing the Cas-1 gene and the expression vector containing the U6 promoter and guide RNA (containing the direct repeat sequence as set forth in SEQ ID NO: 7 and one of the human-cell guide sequences set forth in SEQ ID NO: 38 to SEQ ID NO: 51) were introduced into human HeLa cells by liposome transfection, and the cells were cultured at 37 °C and 5% carbon dioxide for 72 hours. DNA from all cells was extracted, and the sequences containing 200 bp of the target site were amplified. The PCR products were sent to Beijing Jiyinjia Medical Testing Laboratory Co., Ltd. for second-generation library construction and sequencing.

[0277] The sequencing results were analyzed bioinformatically by comparison with the original target sequences. The results showed that Cas-1 could effectively edit all 14 target sites, with an editing efficiency of up to about 60% (FIG. 5).

Example 6. Analysis of Editing Activity of Different PAM Recognition Sites of Cas-1 in HELA Cells

[0278] The PAM site recognized by Cas-1 was identified as RYR (where R is A or G, and Y is T or C) by the experimental method of Example 3. To confirm whether all recognized PAM sites support editing in eukaryotes, their editing efficiency was verified in HELA cells. Eight guide sequences were designed for these PAM recognition sites (set forth in SEQ ID NO: 52 to SEQ ID NO: 59, respectively). The target sequences for the different PAM recognition sites were selected within a 200 bp range of the AAVS1 gene, and the expression vector containing the U6 promoter and guide RNA (containing the direct repeat sequence as set forth in SEQ ID NO: 7 and a guide sequence for editing in human cells as set forth in SEQ ID NO: 52 to SEQ ID NO: 59) was introduced into human HELA cells by liposome transfection, and the cells were cultured at 37 °C and 5% carbon dioxide for 72 hours.

[0279] Through second-generation library construction, second-generation sequencing, and bioinformatics analysis of the transfected samples, it was found that all RYR PAM recognition sites of Cas-1 supported editing, with varying efficiency, among which the PAM recognition sites ATG and ACG had higher editing efficiency (FIG. 6A and FIG. 6B).
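
The degenerate PAM RYR corresponds to eight concrete trinucleotides, which matches the eight guide sequences designed in this example. The sketch below enumerates them using the standard IUPAC nucleotide codes; the function name and the lookup table are illustrative assumptions and are not part of the described method.

```python
from itertools import product

# Standard IUPAC nucleotide codes needed here (R = purine, Y = pyrimidine).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",
    "Y": "CT",
    "N": "ACGT",
}


def expand_pam(pattern):
    """Expand a degenerate PAM pattern into all concrete sequences."""
    choices = (IUPAC[symbol] for symbol in pattern.upper())
    return ["".join(bases) for bases in product(*choices)]


pams = expand_pam("RYR")
print(len(pams), pams)
```

The expansion contains both ATG and ACG, the two PAMs reported above as giving higher editing efficiency.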

Example 7. Analysis of Editing Activity of Cas-1 with Different Repeat and Target Lengths

[0280] To further verify the effects of direct repeat sequences (Repeat) and guide sequences (Target) of different lengths on editing efficiency, Repeat sequences of different lengths (SEQ ID NO: 7 or 9 to 16) were combined with a fixed Target sequence (SEQ ID NO: 38), and Target sequences of different lengths (SEQ ID NO: 60 to 67) were combined with a fixed Repeat sequence (SEQ ID NO: 7). The differences in editing efficiency in eukaryotes were determined by animal cell transfection. The specific experimental procedures and steps were as described in Example 5.

[0281] Bioinformatics analysis showed that Repeat still retained a certain editing activity when up to 18 bases were removed from the 5′ end, and that Target had editing activity in the range of 16 to 26 bases (FIG. 7).
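
A truncation series of this kind can be sketched in a few lines. In the example below, the repeat sequence is a hypothetical 36-base placeholder (the real sequences are those set forth in the SEQ ID NOs cited above and are not reproduced here), and the function names are illustrative. The limits of 18 bases for the 5′ truncation and 16 to 26 bases for the guide length are taken from the results above.

```python
def truncate_5prime(seq, n):
    """Remove n bases from the 5' end of a sequence written 5'->3'."""
    if not 0 <= n <= len(seq):
        raise ValueError("n must be between 0 and the sequence length")
    return seq[n:]


def guide_length_in_active_range(guide, low=16, high=26):
    """Check whether a guide length falls in the range reported as active."""
    return low <= len(guide) <= high


# Hypothetical placeholder repeat (36 bases); NOT the sequence of SEQ ID NO: 7.
repeat = "ACGUACGUACGUACGUACGUACGUACGUACGUACGU"

# Variants with 0 to 18 bases removed from the 5' end.
variants = [truncate_5prime(repeat, n) for n in range(0, 19)]

for n, variant in enumerate(variants):
    print(n, len(variant), variant)
```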

Example 8. Analysis of Editing Activity of Cas-1 in Stable Genetic Transformation of Maize

[0282] A target site in the maize ZmGL2 gene was designed (the direct repeat sequence of the Cas-1 protein was set forth in SEQ ID NO: 7, and the guide sequence was set forth in SEQ ID NO: 37). The crRNA sequence was ligated into an expression vector under the OsU3 promoter, and the Ubi promoter and the Cas-1 protein coding sequence were constructed into the same expression vector (FIG. 8A). Positive plants were obtained by genetic transformation and screening of maize callus, and first-generation sequencing was used to analyze whether the ZmGL2 target site was edited. A total of 6 mutants of different deletion types were obtained (FIG. 8B).

[0283] The above experiments confirmed that the Cas-1 protein successfully edited the target site in maize.

Example 9. Analysis of Editing Activity of Cas-1 Truncated Protein

[0284] To further reduce the protein size and improve the delivery efficiency, the 31 amino acids at the N-terminal of Cas-1 were deleted to obtain a truncated form of Cas-1, named Cas-1D31 (the amino acid sequence was set forth in SEQ ID NO: 3, and the nucleotide sequence was set forth in SEQ ID NO: 6). Furthermore, the Cas-1D31-NLS fusion protein, whose amino acid sequence was set forth in SEQ ID NO: 30, was synthesized on the basis of this amino acid sequence and constructed into the pET30a vector by homologous recombination. The steps of prokaryotic protein purification and the in vitro enzyme digestion experiment were the same as those described in Example 3.

[0285] The results of in vitro enzyme digestion verification showed that Cas-1D31 still had double-stranded DNA cleavage activity (FIG. 9).

[0286] Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that various modifications and changes can be made to the details based on all the teachings that have been disclosed, and that these changes are within the scope of protection of the present invention. The full scope of the invention is given by the appended claims and any equivalents thereof.
