CRISPR/CAS12J ENZYME AND SYSTEM

20220002691 · 2022-01-06

Abstract

Provided are a Cas effector protein, a fusion protein containing said protein, and a nucleic acid molecule encoding same. Also provided are a complex and a composition for nucleic acid editing, for example, a complex and a composition for gene or genome editing, containing the Cas effector protein or the fusion protein, or the nucleic acid molecule encoding same. Also provided is a method for nucleic acid editing, for example, a method for gene or genome editing, using the Cas effector protein or the fusion protein.

Claims

1. A protein having an amino acid sequence as shown in any one of SEQ ID NOs: 1-20, 107, and 108 or an ortholog, homolog, variant or functional fragment thereof; wherein, the ortholog, homolog, variant or functional fragment substantially retains the biological function of the sequence from which it is derived; for example, the ortholog, homolog, or variant has at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity compared to the sequence from which it is derived; for example, the ortholog, homolog, or variant has at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity compared with the sequence as shown in any one of SEQ ID NOs: 1-20, 107, 108, and substantially retains the biological functions of the sequence from which it is derived; for example, the protein is an effector protein in the CRISPR/Cas system.

2. The protein of claim 1, which comprises a sequence selected from the following, or consists of a sequence selected from the following: (i) a sequence as shown in any one of SEQ ID NOs: 1-20, 107, 108; (ii) compared with the sequence as shown in any one of SEQ ID NOs: 1-20, 107, 108, a sequence having one or more amino acid substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions); or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence as shown in any one of SEQ ID NOs: 1-20, 107, and 108; for example, the protein has an amino acid sequence as shown in any one of SEQ ID NOs: 1-20, 107, and 108.

3. The protein of claim 1 or 2, which comprises a sequence selected from the following, or consists of a sequence selected from the following: (i) a sequence as shown in SEQ ID NO: 17; (ii) compared with the sequence as shown in SEQ ID NO: 17, a sequence having one or more amino acid substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions); or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence as shown in SEQ ID NO: 17; for example, the protein has an amino acid sequence as shown in SEQ ID NO: 17.

4. The protein of claim 1 or 2, which comprises a sequence selected from the following, or consists of a sequence selected from the following: (i) a sequence as shown in SEQ ID NO: 2; (ii) compared with the sequence as shown in SEQ ID NO: 2, a sequence having one or more amino acid substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions); or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence as shown in SEQ ID NO: 2; for example, the protein has an amino acid sequence as shown in SEQ ID NO: 2.

5. The protein of claim 1 or 2, which comprises a sequence selected from the following, or consists of a sequence selected from the following: (i) a sequence as shown in SEQ ID NO: 20; (ii) compared with the sequence as shown in SEQ ID NO: 20, a sequence having one or more amino acid substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 amino acid substitutions, deletions or additions); or (iii) a sequence having at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with the sequence as shown in SEQ ID NO: 20; for example, the protein has an amino acid sequence as shown in SEQ ID NO: 20.

6. A conjugate comprising the protein of any one of claims 1-5 and a modified portion; for example, the modified portion is selected from an additional protein or polypeptide, a detectable label, and any combinations thereof; for example, the modified portion is optionally connected to the N-terminus or C-terminus of the protein through a linker; for example, the modified portion is fused to the N-terminus or C-terminus of the protein; for example, the additional protein or polypeptide is selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcription activation domain (such as, VP64), a transcription repression domain (for example, KRAB domain or SID domain), a nuclease domain (for example, Fok 1), a domain having an activity selected from: nucleotide deaminase, methylase activity, demethylase, transcription activation activity, transcription inhibition activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity; and any combinations thereof; for example, the conjugate comprises an epitope tag; for example, the conjugate comprises an NLS sequence; for example, the NLS sequence is shown in SEQ ID NO: 81; for example, the NLS sequence is located at, near or close to the end of the protein (e.g., N-terminal or C-terminal).

7. A fusion protein comprising the protein of any one of claims 1-5 and an additional protein or polypeptide; for example, the additional protein or polypeptide is optionally linked to the N-terminus or C-terminus of the protein through a linker; for example, the additional protein or polypeptide is selected from an epitope tag, a reporter gene sequence, a nuclear localization signal (NLS) sequence, a targeting moiety, a transcription activation domain (such as VP64), a transcription repression domain (for example, KRAB domain or SID domain), a nuclease domain (for example, Fok 1), a domain having an activity selected from: a nucleotide deaminase, methylase activity, a demethylase, transcription activation activity, transcription inhibition activity, transcription release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, single-stranded DNA cleavage activity, double-stranded DNA cleavage activity and nucleic acid binding activity; and any combinations thereof; for example, the fusion protein comprises an epitope tag; for example, the fusion protein comprises an NLS sequence; for example, the NLS sequence is shown in SEQ ID NO: 81; for example, the NLS sequence is located at, near, or close to the end of the protein (for example, the N-terminus or the C-terminus); for example, the fusion protein has an amino acid sequence selected from: SEQ ID NOs: 82-101; for example, the fusion protein has an amino acid sequence selected from: SEQ ID NOs: 83, 98, 101.

8. An isolated nucleic acid molecule comprising a sequence selected from the following or consisting of a sequence selected from the following: (i) a sequence as shown in any one of SEQ ID NOs: 41-60; (ii) compared with the sequence as shown in any one of SEQ ID NOs: 41-60, a sequence having one or more base substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions or additions); (iv) a sequence having at least 95% sequence identity with the sequence as shown in any one of SEQ ID NOs: 41-60; (v) a sequence that hybridizes to the sequence as described in any one of (i) to (iii) under stringent conditions; or (vi) a complementary sequence of the sequence as described in any one of (i)-(iii); in addition, the sequence as described in any one of (ii)-(v) substantially retains the biological function of the sequence from which it is derived; for example, the nucleic acid molecule contains one or more stem loops or optimized secondary structures; for example, the sequence described in any one of (ii)-(v) retains the secondary structure of the sequence from which it is derived; for example, the nucleic acid molecule comprises a sequence selected from the following, or consists of a sequence selected from the following: (a) a nucleotide sequence as shown in any one of SEQ ID NOs: 41-60; (b) a sequence that hybridizes to the sequence as described in (a) under stringent conditions; or (c) a complementary sequence of the sequence as described in (a); for example, the isolated nucleic acid molecule is RNA; for example, the isolated nucleic acid molecule is a direct repeat sequence in the CRISPR/Cas system.

9. The isolated nucleic acid molecule of claim 8, which comprises a sequence selected from the following, or consists of a sequence selected from the following: (i) a sequence as shown in SEQ ID NO: 57; (ii) compared with the sequence as shown in SEQ ID NO: 57, a sequence having one or more base substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions or additions); (iii) a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% sequence identity with the sequence as shown in SEQ ID NO: 57; or (iv) a sequence that hybridizes to the sequence as described in any one of (i) to (iii) under stringent conditions; or (v) a complementary sequence of the sequence as described in any one of (i)-(iii); for example, the nucleic acid molecule comprises a sequence selected from the following, or consists of a sequence selected from the following: (a) a nucleotide sequence as shown in SEQ ID NO: 57; (b) a sequence that hybridizes to the sequence as described in (a) under stringent conditions; or (c) a complementary sequence of the nucleotide sequence as shown in SEQ ID NO: 57.

10. The isolated nucleic acid molecule of claim 8, which comprises a sequence selected from the following, or consists of a sequence selected from the following: (i) a sequence as shown in SEQ ID NO: 42; (ii) compared with the sequence as shown in SEQ ID NO: 42, a sequence having one or more base substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions or additions); (iii) a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% sequence identity with the sequence as shown in SEQ ID NO: 42; or (iv) a sequence that hybridizes to the sequence as described in any one of (i) to (iii) under stringent conditions; or (v) a complementary sequence of the sequence as described in any one of (i)-(iii); for example, the nucleic acid molecule comprises a sequence selected from the following, or consists of a sequence selected from the following: (a) a nucleotide sequence as shown in SEQ ID NO: 42; (b) a sequence that hybridizes to the sequence as described in (a) under stringent conditions; or (c) a complementary sequence of the nucleotide sequence as shown in SEQ ID NO: 42.

11. The isolated nucleic acid molecule of claim 8, which comprises a sequence selected from the following, or consists of a sequence selected from the following: (i) a sequence as shown in SEQ ID NO: 60; (ii) compared with the sequence as shown in SEQ ID NO: 60, a sequence having one or more base substitutions, deletions or additions (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 base substitutions, deletions or additions); (iii) a sequence having at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% sequence identity with the sequence as shown in SEQ ID NO: 60; or (iv) a sequence that hybridizes to the sequence as described in any one of (i) to (iii) under stringent conditions; or (v) a complementary sequence of the sequence as described in any one of (i)-(iii); for example, the nucleic acid molecule comprises a sequence selected from the following, or consists of a sequence selected from the following: (a) a nucleotide sequence as shown in SEQ ID NO: 60; (b) a sequence that hybridizes to the sequence as described in (a) under stringent conditions; or (c) a complementary sequence of the nucleotide sequence as shown in SEQ ID NO: 60.

12. A complex comprising: (i) a protein component, which is selected from: the protein of any one of claims 1-5, the conjugate of claim 6 or the fusion protein of claim 7, and any combinations thereof; and (ii) a nucleic acid component, which comprises, from 5′ to 3′, the isolated nucleic acid molecule of any one of claims 8-11 and a guide sequence capable of hybridizing to the target sequence, wherein the protein component and the nucleic acid component combine with each other to form a complex; for example, the guide sequence is attached to the 3′ end of the nucleic acid molecule; for example, the guide sequence comprises the complementary sequence of the target sequence; for example, the nucleic acid component is a guide RNA in the CRISPR/Cas system; for example, the nucleic acid molecule is RNA; for example, the complex does not comprise trans-acting crRNA (tracrRNA).

13. The complex of claim 12, comprising: (i) a protein component selected from: the protein of claim 3, a conjugate or fusion protein comprising the protein; and (ii) a nucleic acid component, which comprises the isolated nucleic acid molecule of claim 9 and the guide sequence.

14. The complex of claim 12, comprising: (i) a protein component selected from: the protein of claim 4, a conjugate or fusion protein comprising the protein; and (ii) a nucleic acid component, which comprises the isolated nucleic acid molecule of claim 10 and the guide sequence.

15. The complex of claim 12, comprising: (i) a protein component selected from: the protein of claim 5, a conjugate or fusion protein comprising the protein; and (ii) a nucleic acid component, which comprises the isolated nucleic acid molecule of claim 11 and the guide sequence.

16. An isolated nucleic acid molecule comprising: (i) a nucleotide sequence encoding the protein of any one of claims 1-5, or the conjugate of claim 6, or the fusion protein of claim 7; (ii) a nucleotide sequence encoding the isolated nucleic acid molecule of claims 8-11; and/or, (iii) a nucleotide sequence containing (i) and (ii); for example, the nucleotide sequence described in any one of (i) to (iii) is codon optimized for expression in a prokaryotic cell or eukaryotic cell.

17. A vector comprising the isolated nucleic acid molecule of claim 16.

18. A host cell comprising the isolated nucleic acid molecule of claim 16 or the vector of claim 17.

19. A composition comprising: (i) a first component, which is selected from: the protein of any one of claims 1-5, the conjugate of claim 6, the fusion protein of claim 7, a nucleotide sequence encoding the protein or fusion protein, and any combinations thereof; and (ii) a second component, which is a nucleotide sequence containing a guide RNA, or a nucleotide sequence encoding the nucleotide sequence containing a guide RNA; wherein the guide RNA includes a direct repeat sequence and a guide sequence from the 5′ to 3′, and the guide sequence can hybridize with the target sequence; the guide RNA can form a complex with the protein, conjugate or fusion protein as described in (i); for example, the direct repeat sequence is an isolated nucleic acid molecule as defined in any one of claims 8-11; for example, the guide sequence is connected to the 3′ end of the direct repeat sequence; for example, the guide sequence comprises the complementary sequence of the target sequence; for example, the composition does not contain trans-acting crRNA (tracrRNA); for example, the composition is non-naturally occurring or modified; for example, at least one component in the composition is non-naturally occurring or modified; for example, the first component is non-naturally occurring or modified; and/or, the second component is non-naturally occurring or modified.

20. The composition of claim 19, wherein: the first component is selected from: the protein of claim 3, or a conjugate or fusion protein comprising the protein, or a nucleotide sequence encoding the protein or fusion protein, and any combinations thereof; the direct repeat sequence is an isolated nucleic acid molecule as defined in claim 9; preferably, when the target sequence is DNA, the target sequence is located at the 3′ end of the protospacer adjacent motif (PAM), and the PAM has the sequence shown by 5′-ATG.

21. The composition of claim 19, wherein: the first component is selected from: the protein of claim 4, or a conjugate or fusion protein comprising the protein, or a nucleotide sequence encoding the protein or fusion protein, and any combinations thereof; the direct repeat sequence is an isolated nucleic acid molecule as defined in claim 10; preferably, when the target sequence is DNA, the target sequence is located at the 3′ end of the protospacer adjacent motif (PAM), and the PAM has the sequence shown by 5′-TTN.

22. The composition of claim 19, wherein: the first component is selected from: the protein of claim 5, or a conjugate or fusion protein comprising the protein, or a nucleotide sequence encoding the protein or fusion protein, and any combinations thereof; the direct repeat sequence is an isolated nucleic acid molecule as defined in claim 11; preferably, when the target sequence is DNA, the target sequence is located at the 3′ end of the protospacer adjacent motif (PAM), and the PAM has the sequence shown by 5′-KTR.

23. A composition comprising one or more vectors, the one or more vectors comprising: (i) a first nucleic acid, which is a nucleotide sequence encoding the protein of any one of claims 1-5 or the fusion protein of claim 7; optionally, the first nucleic acid is operably linked to a first regulatory element; and (ii) a second nucleic acid, which encodes a nucleotide sequence comprising a guide RNA; optionally the second nucleic acid is operably linked to a second regulatory element; wherein: the first nucleic acid and the second nucleic acid are present on the same or different vectors; the guide RNA comprises a direct repeat sequence and a guide sequence from the 5′ to 3′, and the guide sequence can hybridize with the target sequence; the guide RNA can form a complex with the effector protein or fusion protein as described in (i); for example, the direct repeat sequence is an isolated nucleic acid molecule as defined in any one of claims 8-11; for example, the guide sequence is connected to the 3′ end of the direct repeat sequence; for example, the guide sequence comprises the complementary sequence of the target sequence; for example, the composition does not contain trans-acting crRNA (tracrRNA); for example, the composition is non-naturally occurring or modified; for example, at least one component in the composition is non-naturally occurring or modified; for example, the first regulatory element is a promoter, such as an inducible promoter; for example, the second regulatory element is a promoter, such as an inducible promoter.

24. The composition of claim 23, wherein: the first nucleic acid is a nucleotide sequence encoding the protein of claim 3 or a fusion protein containing the protein; the direct repeat sequence is an isolated nucleic acid molecule as defined in claim 9; preferably, when the target sequence is DNA, the target sequence is located at the 3′ end of the protospacer adjacent motif (PAM), and the PAM has the sequence shown by 5′-ATG.

25. The composition of claim 23, wherein: the first nucleic acid is a nucleotide sequence encoding the protein of claim 4 or a fusion protein containing the protein; the direct repeat sequence is an isolated nucleic acid molecule as defined in claim 10; preferably, when the target sequence is DNA, the target sequence is located at the 3′ end of the protospacer adjacent motif (PAM), and the PAM has the sequence shown by 5′-TTN.

26. The composition of claim 23, wherein: the first nucleic acid is a nucleotide sequence encoding the protein of claim 5 or a fusion protein containing the protein; the direct repeat sequence is an isolated nucleic acid molecule as defined in claim 11; preferably, when the target sequence is DNA, the target sequence is located at the 3′ end of the protospacer adjacent motif (PAM), and the PAM has the sequence shown by 5′-KTR.

27. The composition of any one of claims 19-26, wherein when the target sequence is RNA, the target RNA sequence does not have PAM domain restrictions.

28. The composition of any one of claims 19-27, wherein the target sequence is a DNA or RNA sequence derived from a prokaryotic cell or a eukaryotic cell; or the target sequence is a non-naturally occurring DNA or RNA sequence.

29. The composition of any one of claims 19-28, wherein the target sequence is present in a cell; for example, the target sequence is present in the cell nucleus or in the cytoplasm (e.g., organelles); for example, the cell is a eukaryotic cell; for example, the cell is a prokaryotic cell.

30. The composition of any one of claims 19-29, wherein the protein is linked to one or more NLS sequences, or the conjugate or fusion protein comprises one or more NLS sequences; for example, the NLS sequence is linked to the N-terminus or C-terminus of the protein; for example, the NLS sequence is fused to the N-terminus or C-terminus of the protein.

31. A kit comprising one or more components selected from the group consisting of: the protein of any one of claims 1-5, the conjugate of claim 6, the fusion protein of claim 7, the isolated nucleic acid molecule of any one of claims 8-11, the complex of any one of claims 12-15, the isolated nucleic acid molecule of claim 16, the vector of claim 17, the composition of any one of claims 19-30; for example, the kit comprises the composition of any one of claims 19-22, and instructions for using the composition; for example, the kit comprises the composition of any one of claims 23-26, and instructions for using the composition.

32. A delivery composition comprising a delivery vehicle and one or more selected from the group consisting of: the protein of any one of claims 1-5, the conjugate of claim 6, the fusion protein of claim 7, the isolated nucleic acid molecule of any one of claims 8-11, the complex of any one of claims 12-15, the isolated nucleic acid molecule of claim 16, the vector of claim 17, the composition of any one of claims 19-30; for example, the delivery vehicle is a particle; for example, the delivery vehicle is selected from a lipid particle, sugar particle, metal particle, protein particle, liposome, exosome, microvesicle, gene gun, or viral vector (e.g., replication defective retrovirus, lentivirus, adenovirus or adeno-associated virus).

33. A method for modifying a target gene, comprising: contacting the complex of any one of claims 12-15 or the composition of any one of claims 19-30 with the target gene, or delivering that to a cell containing the target gene; the target sequence is present in the target gene; for example, the target gene is present in the cell; for example, the cell is a prokaryotic cell; for example, the cell is a eukaryotic cell; for example, the cell is selected from an animal cell (for example, a mammalian cell, such as a human cell) and a plant cell; for example, the target gene is present in a nucleic acid molecule (e.g., a plasmid) in vitro; for example, the modification refers to a break in the target sequence, such as a double-strand break in DNA or a single-strand break in RNA; for example, the modification further includes inserting an exogenous nucleic acid into the break.

34. The method of claim 33, which comprises contacting the complex of claim 13, the composition of claim 20, or the composition of claim 24 with the target gene, or delivering that to a cell containing the target gene.

35. The method of claim 33, which comprises contacting the complex of claim 14, the composition of claim 21, or the composition of claim 25 with the target gene, or delivering that to a cell containing the target gene.

36. The method of claim 33, comprising contacting the complex of claim 15, the composition of claim 22, or the composition of claim 26 with the target gene, or delivering that to a cell containing the target gene.

37. A method for altering the expression of a gene product, comprising: contacting the complex of any one of claims 12-15 or the composition of any one of claims 19-30 with a nucleic acid molecule encoding the gene product, or delivering that to a cell containing the nucleic acid molecule, the target sequence is present in the nucleic acid molecule; for example, the nucleic acid molecule is present in the cell; for example, the cell is a prokaryotic cell; for example, the cell is a eukaryotic cell; for example, the cell is selected from an animal cell (for example, a mammalian cell, such as a human cell) and a plant cell; for example, the nucleic acid molecule is present in a nucleic acid molecule (e.g., a plasmid) in vitro; for example, the expression of the gene product is altered (e.g., enhanced or decreased); for example, the gene product is a protein.

38. The method of claim 37, which comprises contacting the complex of claim 13, the composition of claim 20, or the composition of claim 24 with a nucleic acid molecule encoding the gene product, or delivering that to a cell containing the nucleic acid molecule.

39. The method of claim 37, which comprises contacting the complex of claim 14, the composition of claim 21, or the composition of claim 25 with a nucleic acid molecule encoding the gene product, or delivering that to a cell containing the nucleic acid molecule.

40. The method of claim 37, which comprises contacting the complex of claim 15, the composition of claim 22, or the composition of claim 26 with a nucleic acid molecule encoding the gene product, or delivering that to a cell containing the nucleic acid molecule.

41. The method of any one of claims 33-40, wherein the protein, conjugate, fusion protein, isolated nucleic acid molecule, complex, vector or composition is contained in a delivery vehicle; for example, the delivery vehicle is selected from a lipid particle, sugar particle, metal particle, protein particle, liposome, exosome, viral vector (such as replication-defective retrovirus, lentivirus, adenovirus or adeno-associated virus).

42. The method of any one of claims 33-41, which is used to change one or more target sequences in a target gene or a nucleic acid molecule encoding a target gene product to modify a cell, cell line, or organism.

43. A cell or its progeny obtained by the method of any one of claims 33-42, wherein the cell contains a modification that is not present in its wild type.

44. The cell product of the cell or its progeny of claim 43.

45. An in vitro, isolated or in vivo cell or cell line or its progeny, the cell or cell line or its progeny comprises: the protein of any one of claims 1-5, the conjugate of claim 6, the fusion protein of claim 7, the isolated nucleic acid molecule of any one of claims 8-11, the complex of any one of claims 12-15, the isolated nucleic acid molecule of claim 16, the vector of claim 17, the composition of any one of claims 19-30; for example, the cell or cell line or its progeny comprises: the complex of claim 13, the composition of claim 20, or the composition of claim 24; for example, the cell or cell line or its progeny comprises: the complex of claim 14, the composition of claim 21, or the composition of claim 25; for example, the cell or cell line or its progeny comprises: the complex of claim 15, the composition of claim 22, or the composition of claim 26; for example, the cell is a eukaryotic cell; for example, the cell is an animal cell (for example, a mammalian cell, such as a human cell) or a plant cell; for example, the cell is a stem cell or stem cell line.

46. Use of the protein of any one of claims 1-5, the conjugate of claim 6, the fusion protein of claim 7, the isolated nucleic acid molecule of any one of claims 8-11, the complex of any one of claims 12-15, the isolated nucleic acid molecule of claim 16, the vector of claim 17, the composition of any one of claims 19-30, or the kit of claim 31 for nucleic acid editing (for example, gene or genome editing); for example, the gene or genome editing includes modifying genes, knocking out genes, altering the expression of gene products, repairing mutations, and/or inserting polynucleotides.

47. Use of the protein of any one of claims 1-5, the conjugate of claim 6, the fusion protein of claim 7, the isolated nucleic acid molecule of any one of claims 8-11, the complex of any one of claims 12-15, the isolated nucleic acid molecule of claim 16, the vector of claim 17, the composition of any one of claims 19-30, or the kit of claim 31 in the preparation of a formulation for: (i) the in vitro gene or genome editing; (ii) the detection of an isolated single-stranded DNA; (iii) editing the target sequence in the target locus to modify a biological or non-human organism; (iv) the treatment of the disease caused by defects in the target sequence in the target locus.
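
By way of non-limiting illustration, the PAM constraints recited in claims 20-22 and 24-26 (5′-ATG, 5′-TTN and 5′-KTR, with the target sequence located 3′ of the PAM) and the guide RNA layout of claims 19 and 23 (direct repeat sequence followed by guide sequence, 5′ to 3′) may be sketched in Python as follows. The sketch assumes the standard IUPAC degenerate codes (N = A/C/G/T, K = G/T, R = A/G). The function names, the 20-nt default spacer length, the example DNA and the placeholder direct repeat are hypothetical and are not taken from the sequence listing. Only the supplied strand is scanned, and the guide is written as the RNA form of the PAM-proximal protospacer, which is an assumption about strand convention rather than a statement of the claimed method.

```python
import re

# Standard IUPAC codes needed for the PAMs recited in the claims.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "R": "AG"}

# PAMs as recited in claims 20-22; the keys are the variant names of Table 1.
PAMS = {"Cas12j.19": "ATG", "Cas12j.4": "TTN", "Cas12j.22": "KTR"}


def pam_regex(pam):
    """Translate an IUPAC PAM such as 'KTR' into a regex such as '[GT][T][AG]'."""
    return "".join("[" + IUPAC[base] + "]" for base in pam.upper())


def find_protospacers(dna, pam, spacer_len=20):
    """Return (pam_start, protospacer) pairs on the supplied strand.

    The protospacer is taken immediately 3' of the PAM. spacer_len is a
    hypothetical default, not a value stated in the claims.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping PAM sites are all reported.
    pattern = re.compile("(?=(" + pam_regex(pam) + "))")
    hits = []
    for match in pattern.finditer(dna):
        start = match.start() + len(pam)
        protospacer = dna[start:start + spacer_len]
        if len(protospacer) == spacer_len:  # skip sites too close to the end
            hits.append((match.start(), protospacer))
    return hits


def build_guide_rna(direct_repeat, protospacer):
    """Concatenate direct repeat and guide sequence, 5' to 3', as RNA.

    Assumption: the guide is the RNA form of the protospacer on the scanned
    strand, so that it can pair with the opposite strand.
    """
    to_rna = lambda seq: seq.upper().replace("T", "U")
    return to_rna(direct_repeat) + to_rna(protospacer)
```

For instance, with a made-up DNA string `"GGATG" + "ACGT" * 5 + "CC"`, `find_protospacers(dna, PAMS["Cas12j.19"])` reports a single site whose PAM starts at index 2, and `build_guide_rna("GTTTCA", "ACGT")` yields `"GUUUCAACGU"`; the direct repeat `"GTTTCA"` is a placeholder only.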

Description

DESCRIPTION OF THE DRAWINGS

[0514] FIG. 1 shows a gel electrophoresis result of pre-crRNA processing by the Cas12j protein.

[0515] FIGS. 2A-2B show a result of the analysis of the PAM domain of the Cas12j protein.

[0516] FIG. 3 shows an identification result of the DNA cleavage mode of the CRISPR/Cas12j system.

[0517] FIG. 4 shows a result of in vitro cleavage site analysis of Cas12j.4, Cas12j.19 and Cas12j.22.

[0518] FIG. 5 shows a result of in vitro digestion activity of Cas12j.19 at different temperatures.

[0519] FIG. 6 shows a result of the effect of different spacer lengths on the enzyme cleavage activity in the CRISPR/Cas12j.19 system.

[0520] FIG. 7 shows a result of the effect of different repeat lengths on the enzyme cleavage activity in the CRISPR/Cas12j.19 system. WT represents a repeat sequence without truncation.

[0521] FIG. 8 shows a result of CRISPR/Cas12j.19 system's tolerance to spacer mismatches. WT represents a spacer sequence without mutation.

SEQUENCE INFORMATION

[0522] Partial sequence information involved in the present invention is provided in Table 1 below.

TABLE 1. Description of the sequences

SEQ ID NO:  Description
1     amino acid sequence of Cas12j.3
2     amino acid sequence of Cas12j.4
3     amino acid sequence of Cas12j.5
4     amino acid sequence of Cas12j.6
5     amino acid sequence of Cas12j.7
6     amino acid sequence of Cas12j.8
7     amino acid sequence of Cas12j.9
8     amino acid sequence of Cas12j.10
9     amino acid sequence of Cas12j.11
10    amino acid sequence of Cas12j.12
11    amino acid sequence of Cas12j.13
12    amino acid sequence of Cas12j.14
13    amino acid sequence of Cas12j.15
14    amino acid sequence of Cas12j.16
15    amino acid sequence of Cas12j.17
16    amino acid sequence of Cas12j.18
17    amino acid sequence of Cas12j.19
18    amino acid sequence of Cas12j.20
19    amino acid sequence of Cas12j.21
20    amino acid sequence of Cas12j.22
21    an encoding nucleic acid sequence of Cas12j.3
22    an encoding nucleic acid sequence of Cas12j.4
23    an encoding nucleic acid sequence of Cas12j.5
24    an encoding nucleic acid sequence of Cas12j.6
25    an encoding nucleic acid sequence of Cas12j.7
26    an encoding nucleic acid sequence of Cas12j.8
27    an encoding nucleic acid sequence of Cas12j.9
28    an encoding nucleic acid sequence of Cas12j.10
29    an encoding nucleic acid sequence of Cas12j.11
30    an encoding nucleic acid sequence of Cas12j.12
31    an encoding nucleic acid sequence of Cas12j.13
32    an encoding nucleic acid sequence of Cas12j.14
33    an encoding nucleic acid sequence of Cas12j.15
34    an encoding nucleic acid sequence of Cas12j.16
35    an encoding nucleic acid sequence of Cas12j.17
36    an encoding nucleic acid sequence of Cas12j.18
37    an encoding nucleic acid sequence of Cas12j.19
38    an encoding nucleic acid sequence of Cas12j.20
39    an encoding nucleic acid sequence of Cas12j.21
40    an encoding nucleic acid sequence of Cas12j.22
41    Cas12j.3 prototype direct repeat sequence
42    Cas12j.4 prototype direct repeat sequence
43    Cas12j.5 prototype direct repeat sequence
44    Cas12j.6 prototype direct repeat sequence
45    Cas12j.7 prototype direct repeat sequence
46    Cas12j.8 prototype direct repeat sequence
47    Cas12j.9 prototype direct repeat sequence
48    Cas12j.10 prototype direct repeat sequence
49    Cas12j.11 prototype direct repeat sequence
50    Cas12j.12 prototype direct repeat sequence
51    Cas12j.13 prototype direct repeat sequence
52    Cas12j.14 prototype direct repeat sequence
53    Cas12j.15 prototype direct repeat sequence
54    Cas12j.16 prototype direct repeat sequence
55    Cas12j.17 prototype direct repeat sequence
56    Cas12j.18 prototype direct repeat sequence
57    Cas12j.19 prototype direct repeat sequence
58    Cas12j.20 prototype direct repeat sequence
59    Cas12j.21 prototype direct repeat sequence
60    Cas12j.22 prototype direct repeat sequence
61    a coding nucleic acid sequence of Cas12j.3 prototype direct repeat sequence
62    a coding nucleic acid sequence of Cas12j.4 prototype direct repeat sequence
63    a coding nucleic acid sequence of Cas12j.5 prototype direct repeat sequence
64    a coding nucleic acid sequence of Cas12j.6 prototype direct repeat sequence
65    a coding nucleic acid sequence of Cas12j.7 prototype direct repeat sequence
66    a coding nucleic acid sequence of Cas12j.8 prototype direct repeat sequence
67    a coding nucleic acid sequence of Cas12j.9 prototype direct repeat sequence
68    a coding nucleic acid sequence of Cas12j.10 prototype direct repeat sequence
69    a coding nucleic acid sequence of Cas12j.11 prototype direct repeat sequence
70    a coding nucleic acid sequence of Cas12j.12 prototype direct repeat sequence
71    a coding nucleic acid sequence of Cas12j.13 prototype direct repeat sequence
72    a coding nucleic acid sequence of Cas12j.14 prototype direct repeat sequence
73    a coding nucleic acid sequence of Cas12j.15 prototype direct repeat sequence
74    a coding nucleic acid sequence of Cas12j.16 prototype direct repeat sequence
75    a coding nucleic acid sequence of Cas12j.17 prototype direct repeat sequence
76    a coding nucleic acid sequence of Cas12j.18 prototype direct repeat sequence
77    a coding nucleic acid sequence of Cas12j.19 prototype direct repeat sequence
78    a coding nucleic acid sequence of Cas12j.20 prototype direct repeat sequence
79    a coding nucleic acid sequence of Cas12j.21 prototype direct repeat sequence
80    a coding nucleic acid sequence of Cas12j.22 prototype direct repeat sequence
81    NLS sequence
82    an amino acid sequence of Cas12j.3-NLS fusion protein
83    an amino acid sequence of Cas12j.4-NLS fusion protein
84    an amino acid sequence of Cas12j.5-NLS fusion protein
85    an amino acid sequence of Cas12j.6-NLS fusion protein
86    an amino acid sequence of Cas12j.7-NLS fusion protein
87    an amino acid sequence of Cas12j.8-NLS fusion protein
88    an amino acid sequence of Cas12j.9-NLS fusion protein
89    an amino acid sequence of Cas12j.10-NLS fusion protein
90    an amino acid sequence of Cas12j.11-NLS fusion protein
91    an amino acid sequence of Cas12j.12-NLS fusion protein
92    an amino acid sequence of Cas12j.13-NLS fusion protein
93    an amino acid sequence of Cas12j.14-NLS fusion protein
94    an amino acid sequence of Cas12j.15-NLS fusion protein
95    an amino acid sequence of Cas12j.16-NLS fusion protein
96    an amino acid sequence of Cas12j.17-NLS fusion protein
97    an amino acid sequence of Cas12j.18-NLS fusion protein
98    an amino acid sequence of Cas12j.19-NLS fusion protein
99    an amino acid sequence of Cas12j.20-NLS fusion protein
100   an amino acid sequence of Cas12j.21-NLS fusion protein
101   an amino acid sequence of Cas12j.22-NLS fusion protein
102   Plasmid expressing Cas12j.3 system
103   PAM library sequence
104   Pre-crRNA processing and PAM consumption guide RNA
105   Cas12j.19 guide RNA
106   targeted double-stranded DNA sequence
107   Cas12j.1 amino acid sequence
108   Cas12j.2 amino acid sequence

DETAILED DESCRIPTION

[0523] The invention will now be described with reference to the following examples which are intended to illustrate the present invention rather than limit the present invention.

[0524] Unless otherwise specified, the experiments and methods described in the examples are performed essentially according to conventional methods well known in the art and described in various references. For example, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA used in the present invention can be found in Sambrook, Fritsch and Maniatis, “MOLECULAR CLONING: A LABORATORY MANUAL”, 2nd edition (1989); “CURRENT PROTOCOLS IN MOLECULAR BIOLOGY” (edited by F. M. Ausubel et al. (1987)); the “METHODS IN ENZYMOLOGY” series (Academic Press, Inc.); “PCR 2: A PRACTICAL APPROACH” (edited by M. J. MacPherson, B. D. Hames and G. R. Taylor (1995)); “ANTIBODIES, A LABORATORY MANUAL”, edited by Harlow and Lane (1988); and “ANIMAL CELL CULTURE” (edited by R. I. Freshney (1987)).

[0525] In addition, where specific conditions are not given in the examples, the experiments are carried out under conventional conditions or the conditions recommended by the manufacturer. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products. Those skilled in the art know that the examples describe the present invention by way of illustration and are not intended to limit the scope of protection claimed by the present invention. All publications and other references mentioned herein are incorporated herein by reference in their entirety.

[0526] The sources of some reagents involved in the following examples are as follows:

[0527] LB liquid medium: 10 g tryptone, 5 g yeast extract, and 10 g NaCl, made up to 1 L with water and sterilized. If antibiotics are needed, they are added to a final concentration of 50 μg/ml after the medium has cooled.

[0528] Chloroform/isoamyl alcohol: adding 240 ml of chloroform to 10 ml of isoamyl alcohol and mixing them well.

[0529] RNP buffer: 100 mM sodium chloride, 50 mM Tris-HCl, 10 mM MgCl2, 100 μg/ml BSA, pH 7.9.

[0530] The prokaryotic expression vectors pACYC-Duet-1 and pUC19 are purchased from Genscript Biotech Corporation.

[0531] E. coli competence EC100 is purchased from Epicentre company.

EXAMPLE 1

Acquisition of Cas12j Gene and Cas12j Guide RNA

[0532] 1. CRISPR and gene annotation: using Prodigal to perform gene annotation on the microbial genome and metagenomic data of NCBI and JGI databases to obtain all proteins. At the same time, using Piler-CR to annotate CRISPR locus. The parameters are the default parameters.

[0533] 2. Protein filtering: eliminating redundancy among the annotated proteins by sequence identity, that is, removing proteins with exactly identical sequences, and at the same time classifying proteins longer than 800 amino acids as macromolecular proteins. Since all the effector proteins of Class 2 CRISPR/Cas systems discovered so far are more than 900 amino acids in length, only macromolecular proteins longer than 800 amino acids were considered when mining CRISPR effector proteins, in order to reduce the computational complexity.

3. Obtaining CRISPR-associated macromolecular proteins: extending each CRISPR locus by 10 kb upstream and downstream, and identifying the non-redundant macromolecular proteins within this interval adjacent to the CRISPR locus.

[0534] 4. Clustering of CRISPR-associated macromolecular proteins: using BLASTP to perform all-against-all pairwise comparisons of the non-redundant CRISPR-associated macromolecular proteins, and outputting the comparison results with E-value < 1E-10. Using MCL to perform cluster analysis on the BLASTP output to obtain CRISPR-associated protein families.

[0535] 5. Identification of CRISPR-enriched macromolecular protein families: using BLASTP to compare the proteins of each CRISPR-associated protein family against the non-redundant macromolecular protein database from which the CRISPR-associated proteins have been removed, and outputting the comparison results with E-value < 1E-10. If the homologous protein found in a non-CRISPR-associated protein database is less than 100%, it means that the proteins of this family are enriched in the CRISPR region. In this way, the CRISPR-enriched macromolecular protein families were identified.

6. Annotation of protein functions and domains: using the Pfam database, the NR database and the Cas proteins collected from NCBI to annotate the CRISPR-enriched macromolecular protein families to obtain new CRISPR/Cas protein families. Using Mafft to perform multiple sequence alignments for each CRISPR/Cas protein family, and then using JPred and HHpred to perform conserved domain analysis to identify protein families containing RuvC domains.
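Steps 2 and 3 above can be sketched as a simple filter. The following is a minimal illustration only, not the inventors' actual pipeline: the record types `Protein` and `CrisprLocus`, their field names, and the function `crispr_associated_large_proteins` are hypothetical. Only the 800-amino-acid cutoff and the 10 kb flank are taken from the text.

```python
from dataclasses import dataclass

MIN_LENGTH_AA = 800   # "macromolecular" cutoff from step 2
FLANK_BP = 10_000     # 10 kb extension of each CRISPR locus from step 3


@dataclass(frozen=True)
class Protein:        # hypothetical record for an annotated protein
    name: str
    contig: str
    start: int
    end: int
    sequence: str


@dataclass(frozen=True)
class CrisprLocus:    # hypothetical record for an annotated CRISPR array
    contig: str
    start: int
    end: int


def crispr_associated_large_proteins(proteins, loci):
    """Keep proteins longer than MIN_LENGTH_AA that overlap a CRISPR locus
    extended by FLANK_BP on each side; exact duplicate sequences among the
    kept proteins are dropped."""
    seen_sequences = set()
    selected = []
    for p in proteins:
        if len(p.sequence) <= MIN_LENGTH_AA:
            continue  # not a macromolecular protein
        if p.sequence in seen_sequences:
            continue  # exact duplicate of a protein already kept
        near_crispr = any(
            locus.contig == p.contig
            and p.start <= locus.end + FLANK_BP
            and p.end >= locus.start - FLANK_BP
            for locus in loci
        )
        if near_crispr:
            seen_sequences.add(p.sequence)
            selected.append(p)
    return selected
```

The proteins returned by such a filter would then be the input to the BLASTP and MCL clustering of step 4.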

[0536] On this basis, the present inventors have obtained a new family of Cas effector proteins, namely Cas12j, with 22 active homologues named Cas12j.3 (SEQ ID NO: 1), Cas12j.4 (SEQ ID NO: 2), Cas12j.5 (SEQ ID NO: 3), Cas12j.6 (SEQ ID NO: 4), Cas12j.7 (SEQ ID NO: 5), Cas12j.8 (SEQ ID NO: 6), Cas12j.9 (SEQ ID NO: 7), Cas12j.10 (SEQ ID NO: 8), Cas12j.11 (SEQ ID NO: 9), Cas12j.12 (SEQ ID NO: 10), Cas12j.13 (SEQ ID NO: 11), Cas12j.14 (SEQ ID NO: 12), Cas12j.15 (SEQ ID NO: 13), Cas12j.16 (SEQ ID NO: 14), Cas12j.17 (SEQ ID NO: 15), Cas12j.18 (SEQ ID NO: 16), Cas12j.19 (SEQ ID NO: 17), Cas12j.20 (SEQ ID NO: 18), Cas12j.21 (SEQ ID NO: 19), Cas12j.22 (SEQ ID NO: 20), Cas12j.1 (SEQ ID NO: 107), and Cas12j.2 (SEQ ID NO: 108), respectively. The coding DNA sequences of 20 of these homologues (Cas12j.3 to Cas12j.22) are shown in SEQ ID NOs: 21-40, respectively. The prototype direct repeat sequences (repeat sequences contained in the pre-crRNA) corresponding to Cas12j.3 to Cas12j.22 are shown in SEQ ID NOs: 41-60, respectively (see Table 1).

EXAMPLE 2

Processing of Pre-crRNA by Cas12j Gene

[0537] I. In vitro Expression and Purification of Cas12j Protein

[0538] The specific steps of in vitro expression and purification of Cas12j protein were as follows:

[0539] 1. Artificially synthesizing DNA sequence encoding Cas12j protein (SEQ ID NO: 82-101) with nuclear localization signal.

[0540] 2. Connecting the double-stranded DNA molecule synthesized in step 1 with the prokaryotic expression vector pET-30a (+) to obtain a recombinant plasmid pET-30a-CRISPR/Cas12j.

[0541] 3. Introducing the recombinant plasmid pET-30a-CRISPR/Cas12j into E. coli EC100 to obtain a recombinant bacteria, which is named EC100-CRISPR/Cas12j.

[0542] Taking a single clone of EC100-CRISPR/Cas12j, inoculating it into 100 mL of LB liquid medium (containing 50 μg/mL ampicillin), and culturing it with shaking at 37° C. and 200 rpm for 12 h to obtain a bacterial culture.

[0543] 4. Inoculating the bacterial culture into 50 mL of LB liquid medium (containing 50 μg/mL ampicillin) at a volume ratio of 1:100, culturing with shaking at 37° C. and 200 rpm until the OD600 value is 0.6, then adding IPTG to a final concentration of 1 mM, culturing with shaking at 28° C. and 220 rpm for 4 h, centrifuging at 4° C. and 10000 rpm for 10 min, and collecting the bacterial pellet.

[0544] 5. Resuspending the bacterial pellet in 100 mL of 100 mM Tris-HCl buffer (pH 8.0), subjecting it to ultrasonication (ultrasonic power 600 W; cycle program: 4 s on, 6 s off, 20 min in total), then centrifuging at 4° C. and 10000 rpm for 10 min, and collecting supernatant A.

6. Centrifuging supernatant A at 12000 rpm and 4° C. for 10 min, and collecting supernatant B.

[0545] 7. Using the nickel column produced by GE to purify the supernatant B (refer to the instructions of the nickel column for the specific purification steps), and then using the protein quantification kit produced by Thermo Fisher to quantify the Cas12j protein.

[0546] II. Transcription and Purification of Cas12j Protein Guide RNA:

[0547] 1. Designing a template for guide RNA transcription. The structure of the transcription template is: T7 promoter+Cas12j prototype repeat (SEQ ID NO: 41-60)+spacer (SEQ ID NO: 104), primer design uses Primer5.0 software to ensure that the forward primer and reverse primer have at least 18 bp overlapping sequence.

[0548] 2. Configuring the following reaction system, gently pipetted to mix, centrifuged briefly, and annealed slowly in a PCR machine:

PCR amplification reaction

Component                  Volume (μl)
Forward primer (100 nM)    7.5
Reverse primer (100 nM)    7.5
2*KAPA Mix                 25
ddH2O                      10
Total volume               50

Primer annealing PCR reaction procedure

Temperature (° C.)    Time       Ramp (° C./s)
98                    5 min      2
85/95                 0.05 s     -
85                    1 min      0.03
75/85                 0.05 s     -
75                    1 min      0.03
72/75                 0.05 s     -
72                    1 min      0.03
55/65                 0.05 s     -
55                    1 min      0.03
45/55                 0.05 s     -
45                    1 min      0.03
35/45                 0.05 s     -
35                    1 min      0.03
30/35                 0.05 s     -
30                    1 min      0.03
25                    1 min      -
10                    forever    -

[0549] 3. Using the MinElute PCR Purification Kit to purify the template. The steps are as follows:

[0550] 1) Adding 5 times the volume of PB to the PCR product, putting a MinElute column on a 2 ml collection tube, and placing it at room temperature for 2 min, 12000 g/2 min;

[0551] 2) Discarding the waste liquid and adding 750 μl Buffer PE (remember to add ethanol before use), 12000 g/2 min;

[0552] 3) Discarding the waste liquid, adding 3541 Buffer PE, 12000 g/2 min, discarding the waste liquid, 12000 g, and vacuum centrifuged for 2 min;

[0553] 4) Changing the MinElute column to a new 1.5 ml centrifuge tube, opening the cap, and placing it at 65° C. for 2 minutes;

[0554] 5) Adding 20 μl of preheated EB solution and placing it for 2 min, 12000 g/2 min, in order to improve the recovery rate, the contents of the centrifuge tube can be passed through the MinElute spin column for 2-3 times;

[0555] 6) Measuring the concentration with a NanoDrop and storing at −20° C., ready for use.

[0556] 4. Purification of guide RNA: phenol: chloroform: isoamyl alcohol (25:24:1) extraction to remove DNAseI in the system

[0557] 1) Adding 80 μl RNA free H.sub.2O to the post-transcription reaction system and adjusting the volume to 100 μl;

[0558] 2) Taking out a 2 ml Phase Lock Gel (PLG) Heavy tube, centrifuging it at 15000 g for 2 min, adding 100 μl of phenol:chloroform:isoamyl alcohol (25:24:1) and 100 μl of the DNase I-digested RNA, gently flicking the Phase Lock tube 5-10 times by hand to mix evenly, and then centrifuging at 15° C./16000 g for 12 min;

[0559] 3) Taking a new RNA-free 1.5 ml centrifuge tube and aspirating the supernatant from the previous centrifugation into the centrifuge tube. Be careful not to absorb the gel, adding isopropanol equal to the volume of the supernatant and one-tenth volume of sodium acetate solution, mixed well with pipette tip, putting it in the refrigerator at −20° C. for 1h or overnight;

[0560] 4) Centrifuged at 4° C./16000 g for 30 min, discarding the supernatant, adding 75% of pre-cooled ethanol, mixing the precipitate well by pipetting, centrifuged at 4° C./16000 g for 12 min, discarding the supernatant, and placed for 2-3 min in a fume cupboard. Drying the ethanol on the surface of the RNA, adding 100 μl of RNA free H.sub.2O, and mixed by pipetting.

[0561] 5) Measuring the concentration of the purified crRNA with a NanoDrop, uniformly diluting it to 250 ng/μl, dispensing it into 200 μl PCR tubes, and storing at −80° C., ready for use.

[0562] 4. Transcription of the Cas12j pre-crRNA uses NEB's HiScribe T7 high-efficiency RNA synthesis kit. The reaction system is shown in the following table:

DNA transcription system

Component                 Volume (μl)
ATP (100 mM)              2
GTP (100 mM)              2
CTP (100 mM)              2
UTP (100 mM)              2
10*Reaction buffer        2
T7 RNA Polymerase Mix     2
DNA template              8
Total                     20

[0563] Setting the PCR reaction procedure as: 37° C./3h or 31° C./forever, adding DNAseI, 37° C./45 min

[0564] 5. Purification of PrecrRNA:

[0565] (1) Phenol: chloroform: isoamyl alcohol (25:24:1) extraction to remove DNAseI in the system

[0566] 1) Adding 80 μl of RNase-free H2O to the post-transcription reaction system to adjust the volume to 100 μl;

[0567] 2) Taking out a 2 ml Phase Lock Gel (PLG) Heavy tube, centrifuging it at 15000 g for 2 min, adding 100 μl of phenol:chloroform:isoamyl alcohol (25:24:1) and 100 μl of the DNase I-digested RNA, gently flicking the Phase Lock tube by hand 5-10 times to mix evenly, and then centrifuging at 15° C./16000 g for 12 min;

[0568] 3) Taking a new RNase-free 1.5 ml centrifuge tube and transferring the supernatant from the previous centrifugation into it, taking care not to aspirate the gel; adding isopropanol equal to the volume of the supernatant and one-tenth volume of sodium acetate solution, mixing well with a pipette tip, and placing it in a refrigerator at −20° C. for 1 h or overnight;

[0569] 4) Centrifuging at 4° C./16000 g for 30 min, discarding the supernatant, adding pre-cooled 75% ethanol, mixing the precipitate well by pipetting, centrifuging at 4° C./16000 g for 12 min, discarding the supernatant, and leaving the tube in a fume cupboard for 2-3 min to dry the ethanol on the surface of the RNA; then adding 100 μl of RNase-free H2O and mixing by pipetting.

[0570] (2) Running the gel and purifying the precrRNA from the polyacrylamide gel, using the ZR small-RNA™ PAGE Recovery Kit from ZYMO RESEARCH to recover the precrRNA. The steps are as follows:

[0571] 1) The size of the precrRNA band is about 90 bp, cutting the RNA fragment of the corresponding band, and transferring it to a 1.5 ml RNA-free centrifuge tube;

[0572] 2) Using Squisher™-Single to completely mash the gel, adding 400 μl of RNA Recovery Buffer, and heating in a 65° C. water bath for 15 minutes;

[0573] 3) Quick freezing in liquid nitrogen for 5 minutes, immediately taking it out and putting it in a 65° C. water bath to heat for 5 minutes;

[0574] 4) Taking out the Zymo-Spin™ IV column on the collection tube, then adding the dissolved gel to it, centrifuged at 12000 g for 5 minutes, and retaining the liquid in the collection tube;

[0575] 5) Taking out the Zymo-Spin™IIIC column on a new collection tube, adding the liquid collected in the previous step to it, centrifuged at 2000 g for 2 minutes, and retaining the liquid in the collection tube;

[0576] 6) Estimating the volume of liquid in the collection tube, adding 2 times the volume of RNA MAX Buffer, and mixed upside down;

[0577] 7) Placing a Zymo-Spin™ IC column in a new collection tube, adding the liquid from the collection tube of step 6) to it, letting it stand for 2 minutes, and centrifuging at 12000 g for 2 minutes;

[0578] 8) Adding 800 μl of RNA Wash Buffer (note that a certain volume of absolute ethanol must be added according to the instructions before use), centrifuging at 12000 g for 2 min, and discarding the liquid in the collection tube;

9) Adding 400 μl of RNA Wash Buffer, centrifuging at 12000 g for 2 min, discarding the liquid in the collection tube, and then vacuum centrifuging for 2 min;

[0579] 10) Placing in a 65° C. oven for 1 min, adding 20 μl of RNase-free H2O, measuring the concentration of the collected precrRNA with a NanoDrop, uniformly adjusting the concentration to 200 ng/μl, dispensing into PCR tubes, and storing frozen at −80° C., ready for use.

[0580] 6. Establishment of an in vitro pre-crRNA digestion system

[0581] (1) Configuring the following reaction system, mixed gently by pipetting and centrifuged briefly. Placed at 37° C., 1 hour;

In vitro pre-crRNA digestion system

Reagent                 Dosage
pre-crRNA               400 ng
Cas protein             1 μg
RNA Cleavage Buffer     1 μL
RNase-free H2O          make up to 10 μL

[0582] (2) Adding 10 μl of 2× RNA loading dye to the above reaction system and placing at 98° C. for 3 min, then placing on ice for 2 min immediately after the reaction is completed;

[0583] (3) Loading 10 μl into a sample well of a 10% TBE-Urea polyacrylamide gel and running at 150 V for 40 min;

[0584] (4) Adding SYBR Gold nucleic acid gel stain dye to the 1× TBE electrophoresis buffer, placed in the gel, stained at room temperature for 10-15 minutes, and then scanning the gel.

[0585] The gel scanning results are shown in FIG. 1. The result shows that Cas12j.1, Cas12j.4, Cas12j.18, Cas12j.19, Cas12j.21, and Cas12j.22 have pre-crRNA cleavage activity in vitro.

EXAMPLE 3

Identification of the PAM Domain of Cas12j Protein

[0586] 1. Construction and sequencing of the recombinant plasmid pACYC-Duet-1+CRISPR/Cas12j. According to the sequencing results, the structure of the recombinant plasmid pACYC-Duet-1+CRISPR/Cas12j is as follows: the small fragment between the Pml I and Kpn I restriction endonuclease recognition sequences of the vector pACYC-Duet-1 is replaced with the Cas12j gene (the double-stranded DNA molecule running from the first position at the 5′ end to the last position at the 3′ end of a sequence as shown in SEQ ID NOs: 21-40). The recombinant plasmid pACYC-Duet-1+CRISPR/Cas12j expresses the Cas12j protein (SEQ ID NOs: 1-20, 107, 108) and the Cas12j guide RNA as shown in SEQ ID NO: 104.

[0587] 2. The recombinant plasmid pACYC-Duet-1+CRISPR/Cas12j contains an expression cassette, and the nucleotide sequence of the expression cassette is composed of Cas12j gene connected to SEQ ID NO: 104 respectively. For example, as shown in SEQ ID No: 102. In the sequence as shown in SEQ ID NO: 102, positions 1 to 44 from the 5′end is the nucleotide sequence of the pLacZ promoter, positions 45 to 3056 is the nucleotide sequence of the Cas12j.3 gene, and positions 3057 to 3143 is the nucleotide sequence of the rrnB T1 terminator (used to terminate transcription). From the 5′end, positions 3144 to 3178 is the nucleotide sequence of the J23119 promoter, positions 3179 to 3241 is the nucleotide sequence of the CRISPR array, and positions 3244 to 3268 is the nucleotide sequence of the rrnB-T2 terminator (used to terminate transcription).

[0588] 3. Obtaining recombinant Escherichia coli: the recombinant plasmid pACYC-Duet-1+CRISPR/Cas12j was introduced into Escherichia coli EC100 to obtain recombinant Escherichia coli, named EC100/pACYC-Duet-1+CRISPR/Cas12j. The recombinant plasmid pACYC-Duet-1 was introduced into E. coli EC100 to obtain a recombinant E. coli named EC100/pACYC-Duet-1.

[0589] 4. Construction of the PAM library: the sequence as shown in SEQ ID NO: 103 is artificially synthesized and ligated into the pUC19 vector. The sequence as shown in SEQ ID NO: 103 consists of eight random bases followed by the target sequence, so that the eight random bases lie immediately 5′ of the target sequence and the resulting plasmids form a PAM library. The plasmid library was transformed into Escherichia coli containing the Cas12j locus and into Escherichia coli without the Cas12j locus, respectively. After treatment at 37° C. for 1 hour, the plasmids were extracted, and the PAM region was amplified by PCR and sequenced.

[0590] 5. Determination of the PAM domain: the number of occurrences of each of the 65,536 (4^8) possible PAM sequences was counted in the experimental group and the control group, respectively, and the counts in each group were normalized. A PAM sequence is considered significantly consumed when log2 (normalized value of control group/normalized value of experimental group) is greater than 3.5. Weblogo was used to derive the consensus of the significantly consumed PAM sequences, giving the PAM domain of each protein: 5′-TTVW for Cas12j.1, 5′-TTN for Cas12j.4 and Cas12j.12, 5′-AYR for Cas12j.18, 5′-ATG for Cas12j.19, 5′-VTTG for Cas12j.21, and 5′-KTR for Cas12j.22. The PAM domain analysis results are shown in FIGS. 2A-2B.
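The depletion criterion described above can be illustrated with a short sketch. It is not the inventors' analysis script: the function names are hypothetical, normalization is assumed here to mean conversion of counts to within-group frequencies, and a pseudocount of 1 is added (an assumption) so that PAMs absent from one group do not cause division by zero. Only the log2 ratio and the 3.5 cutoff come from the text.

```python
import math

LOG2_CUTOFF = 3.5  # threshold stated in the text


def normalize(counts):
    """Convert raw counts to frequencies within one group (assumed normalization)."""
    total = sum(counts.values())
    return {pam: n / total for pam, n in counts.items()}


def depleted_pams(control_counts, experiment_counts, pseudocount=1):
    """Return PAMs whose log2(control / experiment) frequency ratio exceeds the cutoff."""
    pams = set(control_counts) | set(experiment_counts)
    # The pseudocount is an illustrative assumption, not part of the source.
    ctrl = normalize({p: control_counts.get(p, 0) + pseudocount for p in pams})
    expt = normalize({p: experiment_counts.get(p, 0) + pseudocount for p in pams})
    return sorted(p for p in pams if math.log2(ctrl[p] / expt[p]) > LOG2_CUTOFF)


if __name__ == "__main__":
    # Toy counts for two 8-nt PAM candidates (made-up numbers for illustration).
    control = {"ACGTATTG": 100, "ACGTAAAA": 100}
    experiment = {"ACGTATTG": 1, "ACGTAAAA": 199}
    print(depleted_pams(control, experiment))
```

The list of depleted 8-mers produced this way would be the input to a sequence-logo tool such as Weblogo.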

EXAMPLE 4

Identification of DNA Cutting Method of CRISPR/Cas12j System

[0591] I. In vitro Expression and Purification of Cas12j Protein

[0592] The specific steps of in vitro expression and purification of Cas12j protein are as follows:

[0593] 1. Artificially synthesizing DNA sequence encoding Cas12j protein (SEQ ID NO: 82-101) with nuclear localization signal.

[0594] 2. Connecting the double-stranded DNA molecule synthesized in step 1 with the prokaryotic expression vector pET-30a (+) to obtain a recombinant plasmid pET-30a-CRISPR/Cas12j.

[0595] 3. Introducing the recombinant plasmid pET-30a-CRISPR/Cas12j into E. coli EC100 to obtain a recombinant bacterium, which is named EC100-CRISPR/Cas12j. Taking a single clone of EC100-CRISPR/Cas12j, inoculating it into 100 mL of LB liquid medium (containing 50 μg/mL ampicillin), and culturing it with shaking at 37° C. and 200 rpm for 12 h to obtain a bacterial culture.

[0596] 4. Inoculating the bacterial culture into 50 mL of LB liquid medium (containing 50 μg/mL ampicillin) at a volume ratio of 1:100, culturing with shaking at 37° C. and 200 rpm until the OD600 value is 0.6, then adding IPTG to a final concentration of 1 mM, culturing with shaking at 28° C. and 220 rpm for 4 hours, and centrifuging at 4° C. and 10000 rpm for 10 minutes to collect the bacterial pellet.

[0597] 5. Resuspending the bacterial pellet in 100 mL of 100 mM Tris-HCl buffer (pH 8.0), subjecting it to ultrasonication (ultrasonic power 600 W; cycle program: 4 s on, 6 s off, 20 min in total), then centrifuging at 4° C. and 10000 rpm for 10 min, and collecting supernatant A.

[0598] 6. Taking the supernatant A, centrifuged at 12000 rpm at 4° C. for 10 min, and collecting the supernatant B.

[0599] 7. Using the nickel column produced by GE to purify the supernatant B (refer to the instructions of the nickel column for the specific steps of purification), and then using the protein quantification kit produced by Thermo Fisher to quantify the Cas12j protein.

[0600] II. Transcription and Purification of Cas12j Protein Guide RNA:

[0601] 1. Designing a template for guide RNA transcription. The structure of the transcription template is: T7 promoter +Cas12j prototype repeat (SEQ ID NO: 41-60)+spacer (SEQ ID NO: 105), primer design uses Primer 5.0 software to ensure that the forward primer and reverse primer have at least 18 bp overlapping sequence.

[0602] 2. Configuring the following reaction system, gently pipetted to mix, centrifuged briefly, and annealed slowly in a PCR machine:

PCR amplification reaction

Component                  Volume (μl)
Forward primer (100 nM)    7.5
Reverse primer (100 nM)    7.5
2*KAPA Mix                 25
ddH2O                      10
Total volume               50

[0603] 3. Using the MinElute PCR Purification Kit to purify the template. The steps are as follows:

[0604] 1) Adding 5 times the volume of PB to the PCR product, putting a MinElute column on a 2 ml collection tube, and placed at room temperature for 2 min, 12000 g/2 min;

[0605] 2) Discarding the waste liquid and adding 750 μl of Buffer PE (remember to add ethanol before use), 12000 g/2 min;

[0606] 3) Discarding the waste liquor, adding 3541 Buffer PE, 12000 g/2 min, discarding the waste liquor, 12000 g, and vacuum centrifuged for 2 min;

[0607] 4) Changing the MinElute column to a new 1.5 ml centrifuge tube, opening the cap, and placed at 65° C. for 2 minutes;

[0608] 5) Adding 20 μl of preheated EB solution and placed for 2 min, 12000 g/2 min. In order to improve the recovery rate, the contents of the centrifuge tube can be passed through the MinElute spin column 2-3 times;

[0609] 6) Measuring the concentration with Nanodrop and frozen stored at −20° C., ready for use.

[0610] 4. Purification of guide RNA: phenol: chloroform: isoamyl alcohol (25:24:1) extraction to remove DNAseI in the system

[0611] 1) Adding 80 μl RNA free H.sub.2O to the post-transcription reaction system and adjusting the volume to 100 μl;

[0612] 2) Taking out a 2 ml Phase Lock Gel (PLG) Heavy tube, centrifuging it at 15000 g for 2 min, adding 100 μl of phenol:chloroform:isoamyl alcohol (25:24:1) and 100 μl of the DNase I-digested RNA, gently flicking the Phase Lock tube by hand 5-10 times to mix evenly, and then centrifuging at 15° C./16000 g for 12 min;

[0613] 3) Taking a new RNase-free 1.5 ml centrifuge tube and transferring the supernatant from the previous centrifugation into it, taking care not to aspirate the gel; adding isopropanol equal to the volume of the supernatant and one-tenth volume of sodium acetate solution, mixing well with a pipette tip, and placing it in a refrigerator at −20° C. for 1 h or overnight;

[0614] 4) Centrifuging at 4° C./16000 g for 30 min, discarding the supernatant, adding pre-cooled 75% ethanol, mixing the precipitate by pipetting, centrifuging at 4° C./16000 g for 12 min, discarding the supernatant, and leaving the tube in a fume cupboard for 2-3 min to dry the ethanol on the surface of the RNA; then adding 100 μl of RNase-free H2O and mixing by pipetting.

[0615] 5. Measuring the concentration of the purified crRNA with a NanoDrop, uniformly diluting it to 250 ng/μl, dispensing it into 200 μl PCR tubes, and storing frozen at −80° C., ready for use.

[0616] 6. The establishment of double-stranded DNA digestion system:

[0617] (1) Configuring the following reaction system, mixed gently by pipetting and centrifuged briefly. Placed at 37° C. for 15 min;

DNA cleavage reaction system

Component                   Sample amount
12j-crRNA (250 ng/μl)       600 ng
12j protein (0.5 μg/μl)     0.5 μg
10*DNA Cleavage buffer      1 μl
RNase-free H2O              make up to 7 μl

[0618] (2) Adding 300 ng of substrate DNA (SEQ ID NO: 106; 3 μL at 100 ng/μl), gently pipetting to mix, and centrifuging briefly. Placing at 37° C. for 8 hours;

[0619] (3) Adding RNase and placed at 37° C. for 15 minutes to fully digest the RNA impurities in the system;

[0620] (4) Adding proteinase K, placed at 58° C. for 15 minutes, and digesting Cas12j protein;

[0621] (5) Agarose running gel for the detection.

[0622] The result of the running gel is shown in FIG. 3. Cas12j.4, Cas12j.19 and Cas12j.22 can cut double-stranded DNA effectively. However, the cleavage activity of Cas12j.22 is very weak.
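As a worked check of the reaction set-up in steps (1) and (2), the pipetting volumes implied by the stated masses and stock concentrations can be computed as below. This is only an arithmetic sketch; the function names are hypothetical, and the default values are the amounts given in the DNA cleavage reaction system and in step (2).

```python
def premix_volumes_ul(crrna_ng=600.0, crrna_ng_per_ul=250.0,
                      protein_ug=0.5, protein_ug_per_ul=0.5,
                      buffer_ul=1.0, premix_total_ul=7.0):
    """Volumes (in microliters) for the 7 ul RNP pre-mix of step (1)."""
    crrna_ul = crrna_ng / crrna_ng_per_ul        # 600 ng at 250 ng/ul
    protein_ul = protein_ug / protein_ug_per_ul  # 0.5 ug at 0.5 ug/ul
    water_ul = premix_total_ul - (crrna_ul + protein_ul + buffer_ul)
    if water_ul < 0:
        raise ValueError("components exceed the pre-mix volume")
    return {"crRNA": crrna_ul, "protein": protein_ul,
            "buffer": buffer_ul, "water": water_ul}


def substrate_volume_ul(dna_ng=300.0, dna_ng_per_ul=100.0):
    """Volume (in microliters) of substrate DNA added in step (2)."""
    return dna_ng / dna_ng_per_ul


if __name__ == "__main__":
    premix = premix_volumes_ul()
    total = sum(premix.values()) + substrate_volume_ul()
    print(premix, "final volume:", round(total, 2), "ul")
```

With the stated amounts, the pre-mix takes about 2.4 μl of crRNA, 1 μl of protein, 1 μl of buffer and 2.6 μl of water; adding 3 μl of substrate brings the reaction to 10 μl, so the 10* buffer ends up at 1*.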

[0623] III. The results of in vitro cleavage site analysis of Cas12j.4, Cas12j.19, and Cas12j.22

[0624] Next, we analyzed the in vitro cleavage active sites of these three proteins with DNA double-strand cleavage activity. We recovered the strips cut in the previous step and sent them to the company for Sanger sequencing. Sequencing results are compared with seqman software. The comparison results are shown in FIG. 4. From the peak diagram we can see: Cas12j.4, Cas12j.19, Cas12j.22 have different cleavage methods, the cleavage sites of Cas12j.4 and Cas12j.22 are located at 18nt and 25nt at the end of PAM. After cutting, a Int sticky end is formed. Cas12j.19 has a cleavage site 25nt away from the distal end of PAM, forming an end of about lnt.

EXAMPLE 5

The results of in vitro enzyme cleavage activity detection of Cas12j.19 at different temperatures

[0625] Cas12j.19 (SEQ ID NO: 17) and the guide RNA (SEQ ID NO: 105) were incubated at 25° C. for 15 minutes to form an RNA-protein complex, usually called RNP. Double-stranded DNA (SEQ ID NO: 106) was then added to the reaction system, and the reactions were placed at different set temperatures (17° C., 22° C., 27° C., 32° C., 37° C., 42° C., 47° C., 52° C., 62° C., 67° C., and 72° C.) and reacted for 8 h. After the reaction was completed, RNase was added and the RNA was digested at 37° C. for 15 minutes; proteinase K was then added and reacted at 58° C. for 15 minutes to digest the protein. DNA consumption was detected by agarose gel electrophoresis. The results are shown in FIG. 5. The results show that Cas12j.19 has double-stranded DNA cleavage activity between 27° C. and 42° C.

EXAMPLE 6

Results of the Effect of Different Spacer Lengths of Cas12j.19 on Enzyme Cleavage Activity

[0626] Since the cleavage site of Cas12j.19 is outside the target sequence, we further tested the influence on the cleavage activity of the length of the part of the Cas12j.19 guide RNA (SEQ ID NO: 105) that contains the target site sequence, commonly referred to as the spacer sequence. The spacer was truncated to lengths of 14-28 nt to obtain the truncated guide RNAs shown in FIG. 6. Cas12j.19 and each truncated guide RNA were incubated at 25° C. for 15 minutes to form RNP; double-stranded DNA (SEQ ID NO: 106) was then added to the reaction system and reacted at 37° C. for 8 hours. After the reaction was completed, RNase was added and the RNA was digested at 37° C. for 15 minutes; proteinase K was then added and reacted at 58° C. for 15 minutes to digest the protein, and the digestion results were detected by agarose gel electrophoresis. The results are shown in FIG. 6. The results show that the spacer length required for Cas12j.19 to exert its cleavage activity is at least 14 nt.
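A truncation series of this kind can be generated programmatically. The sketch below is hypothetical in several respects: the text does not state from which end the spacer was truncated, so trimming from the 3′ end is an assumption; the function names are invented; and the repeat and spacer strings in the usage example are placeholders, not SEQ ID NO: 105. The repeat-then-spacer order follows the transcription template described in Example 2.

```python
def truncate_spacer(spacer, lengths):
    """Return {length: spacer trimmed to that length}, keeping the 5' end.

    Trimming from the 3' end is an assumption made for illustration.
    """
    too_long = [n for n in lengths if n > len(spacer)]
    if too_long:
        raise ValueError(f"requested lengths exceed spacer length: {too_long}")
    return {n: spacer[:n] for n in lengths}


def build_guide(repeat, spacer):
    """Assemble a guide RNA as direct repeat followed by spacer."""
    return repeat + spacer


if __name__ == "__main__":
    # Placeholder sequences only; the real sequences are in the sequence listing.
    repeat = "GUCGGAACGCUCAACGAUUGCCCCUCACGAGGGGAC"
    spacer = "ACGUACGUACGUACGUACGUACGUACGU"  # 28 nt placeholder
    for length, trimmed in truncate_spacer(spacer, range(14, 29, 2)).items():
        print(length, build_guide(repeat, trimmed))
```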

EXAMPLE 7

Results of the Effect of Different Repeat Lengths of Cas12j.19 on Enzyme Cleavage Activity

[0627] Similarly, we tested the effect of the length of the direct repeat sequence of the guide RNA on the double-stranded DNA cleavage activity of Cas12j.19. We truncated the direct repeat sequence in the guide RNA (SEQ ID NO: 105) to 24-34 nt to obtain the truncations shown in FIG. 7. Cas12j.19 and the corresponding guide RNAs with different repeat lengths were incubated at 25° C. for 15 minutes to form RNP, and double-stranded DNA was then added to the reaction system and reacted at 37° C. for 8 hours. After the reaction, RNase was added and the RNA was digested at 37° C. for 15 minutes; proteinase K was then added and reacted at 58° C. for 15 minutes to digest the protein. The digestion results were detected by agarose gel electrophoresis and are shown in FIG. 7. The results show that the shortest direct repeat sequence at which cleavage is detected has a length of 32 nt.
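The construction of guide RNAs with shortened direct repeats can be illustrated as follows. This is a hedged sketch only: `guides_with_truncated_repeat` is a hypothetical helper, both sequences are arbitrary placeholders rather than parts of SEQ ID NO: 105, the repeat is assumed to lie 5′ of the spacer and to be trimmed from its 5′ end, and the 2-nt step is assumed because the text gives only the range of repeat lengths.

```python
def guides_with_truncated_repeat(repeat, spacer, repeat_lengths):
    """Return {repeat length: guide RNA} with the repeat trimmed at its 5' end.

    Assumption: the guide is laid out as 5'-repeat-spacer-3', so the 3' end
    of the repeat (adjoining the spacer) is kept intact.
    """
    guides = {}
    for n in repeat_lengths:
        if not 1 <= n <= len(repeat):
            raise ValueError(f"repeat length {n} is outside 1..{len(repeat)}")
        guides[n] = repeat[-n:] + spacer
    return guides


# Arbitrary placeholders; NOT taken from SEQ ID NO: 105.
EXAMPLE_REPEAT = "AUCGGCUAAGCUUGACCGUAUGGCAUCCGAUAGCUA"  # 36 nt
EXAMPLE_SPACER = "GACUUCGAGCAUGGCUAACG"  # 20 nt

# The 2-nt step is an assumption.
REPEAT_LENGTHS = range(24, 35, 2)

guides = guides_with_truncated_repeat(EXAMPLE_REPEAT, EXAMPLE_SPACER, REPEAT_LENGTHS)
for n, guide in guides.items():
    print(n, guide)
```

Keeping the spacer fixed while varying only the repeat isolates the effect of repeat length on cleavage.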

EXAMPLE 8

Results of Cas12j.19's Tolerance to Spacer Mismatch

[0628] The complementary pairing between the sequence containing the target site in the guide RNA and the original target sequence is of great significance for DNA recombination and cleavage. Point mutations were introduced successively into the part of the guide RNA (SEQ ID NO: 105) that contains the target sequence (that is, at the bases at positions 1, 3, 5, 7, 9, 11, 13, 15, and 17, counted from the 5′ end of the spacer) to obtain the mutants shown in FIG. 8, thereby forming mismatches with the target sequence. Cas12j.19 was incubated with the corresponding guide RNA containing the mutation site at 25° C. for 15 minutes to form RNP, and double-stranded DNA (SEQ ID NO: 106) was then added to the reaction system and reacted at 37° C. for 8 hours. After the reaction, RNase was added and the RNA was digested at 37° C. for 15 minutes; proteinase K was then added and reacted at 58° C. for 15 minutes to digest the protein. The digestion results were detected by agarose gel electrophoresis and are shown in FIG. 8. The results show that mismatches with the target sequence within the first 5 nt from the 5′ end of the spacer sequence have an important effect on double-stranded DNA cleavage by Cas12j.19. In addition, a mismatch at the 13th nt greatly affects the double-stranded DNA cleavage activity of Cas12j.19. The low mismatch tolerance of Cas12j.19 makes a lower off-target rate possible.
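The mutant series described above can be sketched as follows. This is an illustration under stated assumptions rather than the design used for FIG. 8: `point_mutant` is a hypothetical helper, `EXAMPLE_SPACER` is an arbitrary placeholder and not the spacer of SEQ ID NO: 105, each guide is assumed to carry a single mutation, and swapping a base for its complement is an assumed substitution rule, since the text does not state which base replaces the original.

```python
# Assumed substitution rule: each mutated base is replaced by its complement.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Mutated positions from Example 8, 1-based, counted from the 5' end of the spacer.
MISMATCH_POSITIONS = [1, 3, 5, 7, 9, 11, 13, 15, 17]


def point_mutant(spacer, position):
    """Return the spacer with the base at the 1-based position swapped."""
    if not 1 <= position <= len(spacer):
        raise ValueError(f"position {position} is outside 1..{len(spacer)}")
    i = position - 1
    return spacer[:i] + COMPLEMENT[spacer[i]] + spacer[i + 1:]


# Arbitrary 20-nt placeholder; NOT the spacer of SEQ ID NO: 105.
EXAMPLE_SPACER = "GACUUCGAGCAUGGCUAACG"

# One single-mismatch spacer per listed position (an assumption).
mutants = {p: point_mutant(EXAMPLE_SPACER, p) for p in MISMATCH_POSITIONS}
for p, seq in mutants.items():
    print(p, seq)
```

Because each mutant differs from the parent spacer at exactly one position, any loss of cleavage can be attributed to the mismatch at that position.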

[0629] Although the specific embodiments of the present invention have been described in detail, those skilled in the art will understand that various modifications and changes can be made to the details in light of all the teachings that have been published, and that these changes are within the protection scope of the present invention. The full scope of the present invention is given by the appended claims and any equivalents thereof.

Cpc classification

C12N2310/20 (CHEMISTRY; METALLURGY)
C12N9/22 (CHEMISTRY; METALLURGY)
A61K31/7088 (HUMAN NECESSITIES)
C07K2319/20 (CHEMISTRY; METALLURGY)
A61K38/465 (HUMAN NECESSITIES)
C12N2800/80 (CHEMISTRY; METALLURGY)
C12N15/11 (CHEMISTRY; METALLURGY)
A61K48/00 (HUMAN NECESSITIES)
C07K2319/70 (CHEMISTRY; METALLURGY)
A61K47/62 (HUMAN NECESSITIES)
C07K2319/00 (CHEMISTRY; METALLURGY)
C07K2319/09 (CHEMISTRY; METALLURGY)
C07K2319/40 (CHEMISTRY; METALLURGY)
C12N15/907 (CHEMISTRY; METALLURGY)
C07K2319/71 (CHEMISTRY; METALLURGY)

International classification

C12N9/22 (CHEMISTRY; METALLURGY)
A61K31/7088 (HUMAN NECESSITIES)
A61K38/46 (HUMAN NECESSITIES)
A61K47/62 (HUMAN NECESSITIES)
C12N15/11 (CHEMISTRY; METALLURGY)
C12N15/90 (CHEMISTRY; METALLURGY)
