UNIVERSAL RIBOSWITCH FOR INDUCIBLE GENE EXPRESSION

Abstract

Aspects described herein relate to methods for controlling expression of RNA and polypeptides of interest using a tuneable self-splicing intron. Specifically, there are provided modified 5′ and 3′ exons of the T4 td intron, which function as a tuneable self-splicing intron that can be introduced into any gene of interest at multiple positions in the open reading frame, thereby allowing the intron to be inserted without changing the amino acid sequence of the protein of interest. Methods and a system for inducer-controlled modification of a target genomic locus in a cell are also provided herein. The invention further provides kits for expressing an RNA of interest or a polypeptide of interest, wherein the expression is in transformed host cells under the control of an inducer molecule.

Claims

1. A method for controlling expression of a polypeptide of interest (POI) in a cell, comprising A. providing a cell comprising a polynucleotide construct, the polynucleotide construct comprising: i. a promoter functional in the cell; ii. a polynucleotide portion encoding said POI; and iii. a polynucleotide portion encoding at least one self-splicing intron which includes 5′ and 3′ exon nucleotide sequences, wherein the self-splicing activity of the intron is controlled by an inducer molecule; wherein the inducer-controlled self-splicing intron is located (a) at or 5′ of the start of the polynucleotide portion encoding the POI, or (b) within the polynucleotide portion encoding the POI; B. subjecting the cell to conditions under which polypeptides are expressed in the cell, thereby causing transcription of the polynucleotide construct into RNA transcripts in the cell; and C. subjecting the cell to conditions which cause a concentration of inducer molecule to promote the self-splicing activity of the intron in the transcripts; thereby resulting in expression of the POI.

2. A method for controlling expression of an RNA of interest (ROI) in a cell, comprising: A. providing a cell comprising a polynucleotide construct, the polynucleotide construct comprising: i. a promoter functional in the cell; ii. a polynucleotide portion encoding the ROI; and iii. a polynucleotide portion encoding at least one self-splicing intron which includes 5′ and 3′ exon sequences, wherein the self-splicing activity of the intron is controlled by an inducer molecule; wherein the inducer-controlled self-splicing intron is located (a) at or 5′ of the start of the polynucleotide portion encoding the ROI, or (b) within the polynucleotide portion encoding the ROI; B. subjecting the cell to conditions under which the polynucleotide construct is transcribed into RNA transcripts in the cell; and C. subjecting the cell to conditions which produce a concentration of inducer molecule that promotes the self-splicing activity of the intron in the RNA transcript to produce the ROI; thereby resulting in the expression of the ROI.

3. A method as claimed in claim 1, wherein the self-splicing intron is 3′ of and in-frame with the start codon and the expressed POI comprises an amino acid tag sequence encoded by a polynucleotide sequence which includes the 5′ and 3′ exon nucleotide sequences of the self-splicing intron rendered contiguous by self-splicing of the intron; preferably wherein the self-splicing intron is directly adjacent to the start codon and the amino acid tag sequence is an N-terminal amino acid tag in the expressed POI.

4. A method as claimed in claim 1 or claim 2, wherein the self-splicing intron is 5′ of the polynucleotide portion from which the ROI or POI is expressed and the said polynucleotide is not disrupted by the self-splicing activity of the intron; preferably wherein the self-splicing intron is 5′ of the start codon of the polynucleotide encoding the POI.

5. A method as claimed in claim 1 or claim 2, wherein the self-splicing intron is located within the polynucleotide portion encoding the ROI or POI and preferably does not result in a tag sequence in the ROI or POI.

6. A method as claimed in any of claims 1, 3, 4 or 5, wherein the polynucleotide construct further comprises a polynucleotide sequence encoding an additional amino acid sequence.

7. A method as claimed in claim 6, wherein the additional amino acid sequence is a functional moiety, e.g. a protein purification or detection tag, a cellular localization sequence or a fluorescent moiety.

8. A method as claimed in any preceding claim, wherein there are two or more self-splicing introns located 3′ of and in frame with the start codon.

9. A method as claimed in any preceding claim, wherein there is a single self-splicing intron located 5′ of the start of the polynucleotide portion encoding the ROI or POI.

10. A method as claimed in any preceding claim, wherein the inducer molecule is provided to the cell.

11. A method as claimed in any of claims 1 to 9, wherein (a) the inducer molecule is generated as a result of expression of a separate gene in the cell, wherein the separate gene is under the control of different expression regulatory elements; optionally wherein the different expression regulatory elements are responsive to a different inducer molecule and/or physical condition, e.g. temperature; or (b) the inducer molecule is naturally synthesized by the cell in response to a chemical and/or physical condition to which the cell is subjected.

12. A method as claimed in any preceding claim, wherein the self-splicing intron comprises an aptamer which has binding affinity for the inducer molecule.

13. A method as claimed in any preceding claim, wherein the inducer is selected from flavin mononucleotide, thiamine pyrophosphate, S-adenosylmethionine, S-adenosylhomocysteine, adenosylcobalamin, cyclic di-GMP, adenine, guanine, glycine, lysine, theophylline, 3-methylxanthine, caffeine, 1-methylxanthine, 7-methylxanthine, 1,3-dimethyluric acid, hypoxanthine, xanthine, theobromine, tetracycline, neomycin or malachite green; preferably wherein the inducer is theophylline.

14. A method as claimed in any preceding claim, wherein the 5′ exon nucleotide sequence and/or 3′ exon nucleotide sequence of the self-splicing intron are modified compared to the respective wild type exon nucleotide sequence(s) of the intron.

15. A method as claimed in any preceding claim, wherein the self-splicing intron is a group I intron.

16. A method as claimed in any of claims 1 to 14, wherein the self-splicing intron is a group II or a group III intron.

17. A method as claimed in any preceding claim, wherein the 5′ exon sequence of the self-splicing intron is NNNNNNGGT (SEQ ID NO: 3) and the 3′ exon sequence of the self-splicing intron is CTN (SEQ ID NO: 4), wherein N is A, T, C or G; optionally wherein the 5′ exon sequence is TTBYBDGGT (SEQ ID NO: 5) and the 3′ exon sequence is CTH (SEQ ID NO: 6), wherein B=G/T/C, Y=C/T, D=G/A/T and H=A/T/C; optionally wherein the 5′ exon sequence is selected from TCCTCAGGT (SEQ ID NO: 7), TCCTCGGGT (SEQ ID NO: 8), TCCTTGGGT (SEQ ID NO: 9), TCCTCTGGT (SEQ ID NO: 10) or TTCTTGGGT (SEQ ID NO: 11) and the 3′ exon sequence is CTA (SEQ ID NO: 12).

18. A method as claimed in any of claims 1 or 3 to 17, wherein the POI is selected from any of: i. a sequence specific DNA/RNA binding protein; preferably a meganuclease (MGN), zinc finger nuclease (ZFN), a TALEN, an RNA-guided nuclease or a DNA-guided nuclease; ii. an RNA-guided nuclease; preferably a Crispr-Cas protein; iii. a sequence-specific DNA binding protein lacking nuclease activity or a nickase; optionally fused to a heterologous functional moiety; preferably wherein the POI is a base editor or a prime editor.

19. A method as claimed in claim 18, wherein the POI is ii) or iii) and the polynucleotide further comprises a portion encoding a targeting RNA molecule, e.g. guide RNA (gRNA) which directs ii) or iii) to a target locus in a DNA sequence.

20. An isolated polynucleotide comprising: i. a promoter functional in a cell; ii. a polynucleotide portion encoding an RNA of interest (ROI) or a polypeptide of interest (POI); and iii. a polynucleotide portion encoding at least one self-splicing intron which includes 5′ and 3′ exon nucleotide sequences, wherein the self-splicing activity of the intron is controlled by an inducer molecule; wherein the inducer-controlled self-splicing intron is located (a) at or 5′ of the start of the polynucleotide portion encoding the ROI or POI, or (b) within the polynucleotide portion encoding the POI or ROI.

21. A polynucleotide as claimed in claim 20, wherein the ROI is translatable into a POI.

22. A polynucleotide as claimed in claim 20 or claim 21, wherein the self-splicing intron is 3′ of and in-frame with the start codon and a POI when expressed from the polynucleotide comprises an amino acid tag sequence encoded by a polynucleotide sequence which includes the 5′ and 3′ exon nucleotide sequences of the self-splicing intron rendered contiguous by self-splicing of the intron; preferably wherein the amino acid tag sequence is an N-terminal amino acid tag in the expressed POI.

23. A polynucleotide as claimed in claim 20 or claim 21, wherein the self-splicing intron is 5′ of the polynucleotide portion from which the ROI or POI is expressed and the said polynucleotide is not disrupted by the self-splicing activity of the intron; preferably wherein the self-splicing intron is 5′ of the start codon of the polynucleotide encoding the POI.

24. A polynucleotide as claimed in any one of claims 20 to 23, wherein the self-splicing intron is located within the polynucleotide portion encoding the ROI or POI and preferably does not result in a tag sequence in the ROI or POI.

25. A polynucleotide as claimed in any of claims 20 to 24, wherein the polynucleotide construct further comprises a polynucleotide sequence encoding an additional amino acid sequence; optionally wherein the additional amino acid sequence is a functional moiety, e.g. a protein purification or detection tag, a cellular localization sequence or a fluorescent moiety.

26. A polynucleotide as claimed in any of claims 20 to 25, wherein there is a single self-splicing intron located 5′ of the start of the polynucleotide portion encoding the ROI or POI.

27. A polynucleotide as claimed in any of claims 20 to 26, wherein the self-splicing intron comprises an aptamer which has binding affinity for the inducer molecule; optionally wherein the inducer is selected from flavin mononucleotide, thiamine pyrophosphate, S-adenosylmethionine, S-adenosylhomocysteine, adenosylcobalamin, cyclic di-GMP, adenine, guanine, glycine, lysine, theophylline, 3-methylxanthine, caffeine, 1-methylxanthine, 7-methylxanthine, 1,3-dimethyluric acid, hypoxanthine, xanthine, theobromine, tetracycline, neomycin or malachite green; preferably wherein the inducer is theophylline.

28. A polynucleotide as claimed in any of claims 20 to 27, wherein the 5′ exon nucleotide sequence and/or 3′ exon nucleotide sequence of the self-splicing intron are modified compared to the respective wild type exon nucleotide sequence(s) of the intron.

29. A polynucleotide as claimed in any of claims 20 to 28, wherein the self-splicing intron is a group I intron.

30. A polynucleotide as claimed in any of claims 20 to 29, wherein the 5′ exon sequence of the self-splicing intron is NNNNNNGGT (SEQ ID NO: 3) and/or the 3′ exon sequence is CTN (SEQ ID NO: 4), wherein N is A, T, C or G; optionally wherein the 5′ exon sequence is TTBYBDGGT (SEQ ID NO: 5) and the 3′ exon sequence is CTH (SEQ ID NO: 6), wherein B=G/T/C, Y=C/T, D=G/A/T and H=A/T/C; preferably wherein the 5′ exon sequence is selected from TCCTCAGGT (SEQ ID NO: 7), TCCTCGGGT (SEQ ID NO: 8), TCCTTGGGT (SEQ ID NO: 9), TCCTCTGGT (SEQ ID NO: 10) or TTCTTGGGT (SEQ ID NO: 11) and the 3′ exon sequence is CTA (SEQ ID NO: 12).

31. A polynucleotide as claimed in any of claims 20 to 30, wherein the POI is selected from i. a sequence specific DNA/RNA binding protein; preferably a meganuclease (MGN), zinc finger nuclease (ZFN), a TALEN, an RNA-guided nuclease or a DNA-guided nuclease; ii. an RNA-guided nuclease; preferably a Crispr-Cas protein; iii. a sequence-specific DNA binding protein lacking nuclease activity or a nickase; optionally fused to a heterologous functional moiety; preferably wherein the POI is a base editor or a prime editor.

32. A polynucleotide as claimed in claim 31, wherein the POI is ii) or iii) and the polynucleotide further comprises a portion encoding a targeting RNA molecule, e.g. a guide RNA (gRNA) which directs the ii) or iii) to a target locus in a DNA sequence; optionally wherein the gRNA is under the control of a self-splicing intron.

33. An expression vector comprising a polynucleotide of any of claims 20 to 32.

34. A transformed cell for inducer molecule-controlled expression of an RNA of interest (ROI) or polypeptide of interest (POI), wherein the cell comprises a polynucleotide of any of claims 20 to 32, or an expression vector of claim 33.

35. A kit for expressing an RNA of interest (ROI) or a polypeptide of interest (POI) and wherein the expression is under the control of an inducer molecule comprising: i. a composition comprising a polynucleotide of any of claims 20 to 32, or an expression vector of claim 33, or a transformed cell of claim 34; and ii. a composition comprising an inducer molecule which activates self-splicing activity of a self-splicing intron when expressed in a cell.

36. A system for generating an RNA of interest (ROI) or a polypeptide of interest (POI), comprising a transformed cell of claim 34.

37. A method of inducer-controlled modification of a target genomic locus in a cell, comprising introducing or generating in the cell a ribonuclease complex comprising a Crispr-Cas nuclease and a gRNA molecule for the target genetic locus; wherein the Crispr-Cas nuclease and/or the gRNA is comprised as the POI and/or ROI in a polynucleotide of any of claims 20 to 32 or an expression vector of claim 33; and subjecting the cell to a condition which causes a concentration of inducer molecule to promote the self-splicing activity of the intron, thereby resulting in expression of the Crispr-Cas nuclease and/or gRNA in the cell; optionally wherein a homologous repair (HR) template is encoded by the same or a different polynucleotide or expression vector, and the HR template is expressed in the cell.

38. A method of inducer-controlled base editing of a target genomic locus in a cell, comprising: A. introducing or generating in the cell a ribonuclease complex comprising a base editor and a gRNA molecule for the target genetic locus, wherein the base editor and/or gRNA is comprised as the respective ROI or POI in a polynucleotide or polynucleotides of any of claims 20 to 32 or an expression vector of claim 33; and B. (a) providing inducer molecule to the cell, or (b) subjecting the cell to a condition which causes a concentration of inducer molecule to promote the self-splicing activity of the intron, thereby resulting in expression of the base editor and/or gRNA in the cell.

39. A method of inducer-controlled prime editing of a target genomic locus in a cell, comprising: A. introducing or generating in the cell a ribonuclease complex comprising a prime editor and a prime editing guide RNA (pegRNA) molecule for the target genetic locus, wherein the prime editor and/or pegRNA is comprised as the respective ROI or POI in a polynucleotide or polynucleotides of any of claims 20 to 32 or an expression vector of claim 33; and B. (a) providing inducer molecule to the cell, or (b) subjecting the cell to a condition which causes a concentration of inducer molecule to promote the self-splicing activity of the intron, thereby resulting in expression of the prime editor and/or pegRNA in the cell.

40. A method as claimed in any of claims 37 to 39, wherein the inducer molecule is provided to the cell.

41. A method as claimed in any of claims 37 to 39, wherein (a) the inducer molecule is generated as a result of expression of a separate gene in the cell, wherein the separate gene is under the control of different expression regulatory elements; optionally wherein the different expression regulatory elements are responsive to a different inducer molecule and/or physical condition, e.g. temperature; or (b) the inducer molecule is naturally synthesized by the cell in response to a chemical and/or physical condition to which the cell is subjected.

42. A method as claimed in any of claims 37 to 40, wherein a first polynucleotide comprises a self-splicing intron under the control of a first inducer molecule, and a second polynucleotide comprises a self-splicing intron which is under the control of a second different inducer molecule.

43. A system for inducer controlled genetic modification of a cell, comprising at least a first expression vector, the first expression vector comprising a polynucleotide of any of claims 20 to 32, wherein the respective POI or ROI is selected from: A. a Crispr-Cas nuclease, and/or B. a gRNA, and/or C. an HR template.

44. A system for inducer controlled genetic modification of a cell, comprising at least a first expression vector, the first expression vector comprising a polynucleotide of any of claims 20 to 32, wherein the respective POI or ROI is selected from: A. a base editor, and/or B. a gRNA.

45. A system for inducer controlled genetic modification of a cell, comprising at least a first expression vector, the first expression vector comprising a polynucleotide of any of claims 20 to 32, wherein the respective POI or ROI is selected from: A. a prime editor, and/or B. a pegRNA.

46. A system as claimed in any of claims 43 to 45, wherein each individual POI and/or ROI is under the control of a respective self-splicing intron.

47. A system as claimed in any of claims 43 to 46, wherein a first polynucleotide comprises a self-splicing intron under the control of a first inducer molecule, and a second polynucleotide comprises a self-splicing intron which is under the control of a second different inducer molecule.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0140] Embodiments of the invention are further described hereinafter with reference to the accompanying drawings, in which:

[0141] FIG. 1 is a diagram of the predicted secondary and tertiary structure of the T4 td intron.

[0142] FIG. 2 is a simplified cartoon version of the T4 td intron as shown in FIG. 1.

[0143] FIG. 3 shows the nucleotide sequences and structures of the wild type P6a loop (left) and the theophylline aptamer P6a loop (right) of the T4 td intron.

[0144] FIG. 4A shows greater detail of the 5′ and 3′ exon sequences of the T4 td intron.

[0145] FIG. 4B shows the nucleotide positions of the 5′ and 3′ exon sequences of the T4 td intron which are modifiable in pursuit of the invention.

[0146] FIG. 5 is a schematic representation of the interruption of the LacZ gene with the T4 td intron.

[0147] FIG. 6 shows the results of LacZ activity for position −7 mutants. The asterisk indicates the wild-type intron.

[0148] FIG. 7 shows the results of LacZ activity of position +296 mutants. The asterisk indicates the wild-type intron.

[0149] FIG. 8 shows the results of LacZ activity of all possible combinations of pair (P), wobble pair (W) and mismatch (M) at positions −6 to −4. The wild type intron (*) is PWM/(UUG) and set to 1.

[0150] FIG. 9A is a cartoon of the insertion of the modified T4 td intron into the GOI.

[0151] FIG. 9B shows the insertion site in the FnCas12a gene.

[0152] FIG. 10 is a schematic representation of multiple introns inserted at different positions at the GOI.

[0153] FIG. 11 is a cartoon of a T4 td intron containing the theophylline aptamer at the P6 loop.

[0154] FIG. 12A shows the steps from DNA to RNA to protein (transcription and translation) for the intron-GOI constructs of the invention.

[0155] FIG. 12B shows the steps from DNA to RNA to protein, with induction, for the theophylline-dependent intron-GOI constructs of the invention.

[0156] FIG. 12C compares the steps from DNA to RNA to protein with the induction and translation of the tagged-intron-GOI constructs of the invention.

[0157] FIG. 12D shows theophylline dependent T4 td introns introduced as tags after the start codon ATG of the gene of interest (GOI).

[0158] FIG. 13 shows the steps from DNA to RNA to protein with the intron positioned directly before the ATG start codon of the gene of interest.

[0159] FIG. 14 shows full sequence information for (top) the self-splicing intron just after the ATG of the GOI, and (bottom) the self-splicing intron just before the ATG of the GOI.

[0160] FIG. 15 is a photograph of plates showing induced activity of FnCas12a in E. coli MG1655 transformed with the series of plasmids listed in Table 4.

[0161] FIG. 16 shows SIBR-Cas genome editing assays in E. coli MG1655. (A) Editing efficiency of the LacZ gene in E. coli MG1655. Blue/white screening was performed to distinguish the edited (white) from the unedited (blue) colonies when using any of the four different SIBR-Cas variants, Int1 (0%), Int2 (1% ± 0.35%), Int3 (49% ± 8.75%) and Int4 (80% ± 5.57%), or the WT-FnCas12a (0%, n.d.). The percentage above each variant indicates the percentage of white colonies among the total colony forming units per mL (CFU mL−1). (B) Representative plates of edited E. coli MG1655 cells at the LacZ locus using the four different SIBR-Cas variants (Int1-4; Int1 is the weakest and Int4 the strongest splicer). (C) Unbiased (without X-Gal in the medium) editing efficiency of the LacZ gene using the four different SIBR-Cas variants Int1 (0%), Int2 (0%), Int3 (29% ± 18.04%), Int4 (38% ± 6.41%) or the WT-FnCas12a (0%, n.d.). All of the non-targeting (NT) controls showed 0% targeting efficiency. N.d.: not determined.

[0162] FIG. 17 is a photograph of plates showing induced activity of FnCas12a in P. putida. P. putida was transformed with a series of plasmids listed in Table 6.

[0163] FIG. 18 shows the results of knock-out efficiency of the FlgM gene in P. putida using the Tagged-intron variants.

[0164] FIG. 19 shows the plasmid containing theophylline induced self-splicing intron and FnCas12a used to transform Flavobacterium IR1, the silent mutations introduced into the FnCas12a and the process of cleavage and ligation to produce a functional Cas12a.

[0165] FIG. 20 shows the experimental protocol for transformation and induction of Flavobacterium sp. IR1.

[0166] FIG. 21A is a schematic representation of SprF gene deletion in the genome of IR1.

[0167] FIG. 21B is a photograph of transformed Flavobacterium sp. IR1 cells grown on agar plates. Mutants are on the left side and WT are on the right side. FIG. 21C is a photograph of an Agarose gel electrophoresis of PCR of colonies (1-16) of transformed Flavobacterium sp. IR1 compared to WT grown for 72 hours. Also shown is a Sanger sequencing result of mutant and WT colonies.

[0168] FIG. 21D is a photograph of an Agarose gel electrophoresis of PCR of colonies (1-16) of transformed Flavobacterium sp. IR1 compared to WT grown for 96 hours. Also shown is a Sanger sequencing result of mutant and WT colonies.

[0169] FIG. 22A shows the transformation efficiencies for plasmids in Flavobacterium IR1.

[0170] FIG. 22B shows the toxicity of theophylline in the growth medium for Flavobacterium IR1.

[0171] FIG. 23 is an adapted diagram of the procedure for obtaining knock-outs in Flavobacterium IR1.

[0172] FIG. 24 shows the results for editing efficiencies of the SprF gene of Flavobacterium IR1 using the four different tagged-intron variants.

[0173] FIG. 25 shows the constructs used for testing the functionality of SIBR in the yeast Saccharomyces cerevisiae. pUDE731 constitutively expresses the WT FnCas12a. PL-319 splits the FnCas12a with a T4 td intron (which does not contain an aptamer) into a 5′ and a 3′ exon. The intron is inserted before the RuvC I domain, and the 5′ and 3′ flanking regions have been adapted to avoid amino acid changes while still maintaining splicing of the intron. PL-320 splits the FnCas12a with a SIBR T4 td intron (which contains a theophylline aptamer) into a 5′ and a 3′ exon. The intron is inserted before the RuvC I domain, and the 5′ and 3′ flanking regions have been adapted to avoid amino acid changes while still maintaining splicing of the intron.

[0174] FIG. 26 shows that SIBR-Cas is functional in the yeast S. cerevisiae. pUDE731, PL-319 or PL-320 were co-transformed with a plasmid containing either a non-targeting (NT) or a targeting (T) crRNA. Transformants were serially diluted (10^0, 10^−1, 10^−2) and plated on selective medium containing different concentrations of the theophylline inducer (0, 5, 10 mM).

DETAILED DESCRIPTION

[0175] Ribozymes and riboswitches are gene regulation systems found in a wide range of bacterial species. The catalytic and/or regulatory functionality of these RNA molecules relies on their primary, secondary and tertiary structures, making them strong candidates for developing universal tools for regulating gene expression without the use of proteins (Breaker, R. R. Riboswitches and the RNA world. Cold Spring Harbor perspectives in biology 4, a003566 (2012); Park, S. V. et al. Catalytic RNA, ribozyme, and its applications in synthetic biology. Biotechnology advances 37, 107452 (2019); Serganov, A. & Nudler, E. A decade of riboswitches. Cell 152, 17-24 (2013); Serganov, A. & Patel, D. J. Ribozymes, riboswitches and beyond: regulation of gene expression without proteins. Nature Reviews Genetics 8, 776-790 (2007); Weinberg, C. E., Weinberg, Z. & Hammann, C. Novel ribozymes: discovery, catalytic mechanisms, and the quest to understand biological function. Nucleic acids research 47, 9480-9494 (2019)). To this end, several studies have used ribozymes and riboswitches to control the expression of a gene of interest (GOI), and also to regulate the activity and function of CRISPR-Cas (Zhao, J. et al. Development of aptamer-based inhibitors for CRISPR/Cas system. Nucleic Acids Research (2020); Cañadas, I. C. et al. RiboCas: a universal CRISPR-based editing tool for Clostridium. ACS synthetic biology 8, 1379-1390 (2019); Tang, W., Hu, J. H. & Liu, D. R. Aptazyme-embedded guide RNAs enable ligand-responsive genome editing and transcriptional activation. Nature communications 8, 1-8 (2017); Siu, K.-H. & Chen, W. Riboregulated toehold-gated gRNA for programmable CRISPR-Cas9 function. Nature chemical biology 15, 217-220 (2019); Kundert, K. et al. Controlling CRISPR-Cas9 with ligand-activated and ligand-deactivated sgRNAs. Nature communications 10, 1-11 (2019); Park, S. V. et al. Catalytic RNA, ribozyme, and its applications in synthetic biology. Biotechnology advances 37, 107452 (2019)). Although quite successful, these approaches leave room for improvement. For example, the technology developed by Tang et al. (2017) requires base pairing of the CRISPR spacer sequence with the 5′ end of the hammerhead ribozyme, which must be redesigned whenever the CRISPR spacer is changed. Moreover, the studies by Kundert et al. (2019), Siu et al. (2019) and Zhao et al. (2020) rely on the secondary structure of the Cas9 single guide RNA (sgRNA), which rules out the use of other CRISPR-Cas systems. Lastly, the RiboCas technology developed by Cañadas et al. (2019) regulates the expression of Cas9 by masking the RBS with a theophylline-dependent riboswitch. Although this technology is a smart alternative to previous approaches, it can be cumbersome to use either in organisms that do not use the canonical RBS sequence, or in cases where the secondary structure of the 5′ UTR sequence interferes with the theophylline aptamer (Chen, S., Bagdasarian, M., Kaufman, M. & Walker, E. Characterization of strong promoters from an environmental Flavobacterium hibernum strain by using a green fluorescent protein-based reporter system. Appl. Environ. Microbiol. 73, 1089-1100 (2007); Gómez, E., Álvarez, B., Duchaud, E. & Guijarro, J. A. Development of a markerless deletion system for the fish-pathogenic bacterium Flavobacterium psychrophilum. PLoS One 10, e0117969 (2015); Accetto, T. & Avguštin, G. Inability of Prevotella bryantii to form a functional Shine-Dalgarno interaction reflects unique evolution of ribosome binding sites in Bacteroidetes. PloS one 6 (2011)).

[0176] The inventors substituted the wild type (WT) P6a loop of the T4 td intron with a theophylline-responsive aptamer (see FIG. 3) to inducibly control the translation of the thymidylate synthase gene with a variation of theophylline-responsive aptamers. The intron described by Thompson et al. (2002), supra, cannot be transferred to genes other than the td gene, as that would disrupt the amino acid sequence of the POI. This disruption is caused mainly by the 5′ and 3′ exon sequences of the intron, which form part of the intron but are retained in the mRNA after splicing (see FIG. 4A). Transferring the WT T4 td intron to another gene will therefore disrupt the amino acid sequence of the encoded protein, leading to a non-functional protein. Consequently, such a riboswitch is not universal and is restricted to the td gene.

[0177] To create a universal T4 td intron riboswitch, the inventors introduced modifications to the intron allowing it to be transferred to any gene of interest without compromising its splicing activity. The modifications are located in the 5′ and 3′ exon sequences of the T4 td intron (FIGS. 4A and 4B). LacZ colorimetric assays were performed in Escherichia coli. A simplified construction system, in the form of Tagged-intron variants, was then used to control the targeting activity of FnCas12a in E. coli. The system was then transferred into industrially relevant bacterial strains of Pseudomonas putida and Flavobacterium IR1.

[0178] When converting the T4 td intron into a universal riboswitch, certain modifications were introduced to the intron, allowing it to be transferred into any gene of interest without compromising its activity. The use of the inducer-controlled self-splicing intron to control CRISPR-Cas proteins was found to enable the engineering of prokaryotes that had previously proved intractable to CRISPR-Cas-based modification (e.g. Flavobacterium IR1).

[0179] In more detail, the inventors explored the role of the 5′ exon and 3′ exon sequences of the td intron and determined their effect on splicing activity by substituting the relevant bases in the 5′ exon and 3′ exon (see FIGS. 4A and 4B). FIG. 4B shows all possible nucleotide sequences at the 5′ and 3′ exon sequences of the T4 td intron, wherein N represents any nucleotide A, T, G or C. Positions −9 to −1 represent the 5′ exon sequence of the T4 td intron, and positions +294 to +296 represent the 3′ exon sequence. There are 16,777,216 possible combinations of nucleotide changes at positions −9 to −1 plus +294 to +296 (four possible nucleotides at each of 12 positions, i.e. 4^12). Of the 5′ exon sequence, positions −3 to −1 are kept wild type to maintain the self-splicing ability of the intron. Of the 3′ exon sequence, positions +294 and +295 are kept wild type to maintain the self-splicing ability of the intron.
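The combinatorial count above, and the degenerate IUPAC notation used for the exon families in claims 17 and 30, can be checked with a short Python sketch. The helper names (IUPAC, count_space, matches) are illustrative assumptions rather than part of the described method, and only the IUPAC codes that appear in SEQ ID NOs: 3 to 6 are included.

```python
# IUPAC nucleotide codes that appear in SEQ ID NOs: 3 to 6 (a subset of the full table).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "B": "CGT", "D": "AGT", "H": "ACT", "Y": "CT",
}


def count_space(pattern: str) -> int:
    """Number of concrete sequences encoded by a degenerate IUPAC pattern."""
    total = 1
    for code in pattern.upper():
        total *= len(IUPAC[code])
    return total


def matches(seq: str, pattern: str) -> bool:
    """True if `seq` is one of the sequences encoded by `pattern`."""
    return len(seq) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(seq.upper(), pattern.upper())
    )


# All 12 positions (-9 to -1 and +294 to +296) free: four choices per position.
print(4 ** 12)  # 16777216

# -3 to -1 fixed as GGT and +294/+295 fixed as CT (SEQ ID NOs: 3 and 4).
print(count_space("NNNNNNGGT") * count_space("CTN"))  # 16384

# Narrower family defined by SEQ ID NOs: 5 and 6.
print(count_space("TTBYBDGGT") * count_space("CTH"))  # 162

# The specific 5' exons of SEQ ID NOs: 7 to 11 all fall within SEQ ID NO: 3.
for exon5 in ["TCCTCAGGT", "TCCTCGGGT", "TCCTTGGGT", "TCCTCTGGT", "TTCTTGGGT"]:
    print(exon5, matches(exon5, "NNNNNNGGT"))
```

Fixing the five wild type positions shrinks the search space from 4^12 to 4^7 variants, which is what makes the targeted screens of individual positions described below practical.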

[0180] Initially, the inventors substituted the −7 and +296 positions of the 5′ exon and 3′ exon, respectively, inserted the different variants into the LacZα gene and performed assays in E. coli (see Examples 1 to 3). Then, positions −6, −5 and −4 of the 5′ exon of the td intron were tested. This identified several base substitutions that either increased self-splicing, and therefore LacZα activity, or decreased self-splicing, and therefore LacZα activity.
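FIG. 8 screens every combination of pair (P), wobble pair (W) and mismatch (M) at positions −6 to −4 of the P1 stem. A minimal Python sketch of that classification, assuming standard RNA pairing rules (Watson-Crick pairs count as P, G·U as W, anything else as M), is shown below. The function names are hypothetical, and the partner nucleotides in the intron's internal guide sequence are supplied by the caller because they appear only in FIG. 4A; the partner string used in the example is made up.

```python
from itertools import product

# Watson-Crick pairs (P) and the G.U wobble pair (W) in RNA; anything else is a mismatch (M).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify(exon_base: str, partner_base: str) -> str:
    """Return 'P', 'W' or 'M' for one P1-stem base pair (DNA T is read as RNA U)."""
    pair = (exon_base.upper().replace("T", "U"), partner_base.upper().replace("T", "U"))
    if pair in WATSON_CRICK:
        return "P"
    if pair in WOBBLE:
        return "W"
    return "M"


def p1_signature(exon_bases: str, partner_bases: str) -> str:
    """P/W/M string for exon positions -6, -5, -4 against their pairing partners,
    with the partners listed in the same order as the exon positions."""
    if len(exon_bases) != len(partner_bases):
        raise ValueError("exon and partner strings must have equal length")
    return "".join(classify(e, p) for e, p in zip(exon_bases, partner_bases))


# Every pairing state at positions -6 to -4: 3 ** 3 = 27 combinations.
all_states = ["".join(state) for state in product("PWM", repeat=3)]
print(len(all_states))  # 27
print(p1_signature("TCA", "AGG"))  # PPM, using hypothetical partner bases
```

Enumerating pairing states rather than raw nucleotides reduces the 64 possible triplets at −6 to −4 to 27 structural classes, which is the grouping used to present the LacZ results.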

[0181] The inventors then further modified the 5′ exon and 3′ exon sequences of the intron in order to control (titrate) its self-splicing activity, or to introduce it at multiple sites in the open reading frame (ORF) of the gene of interest (GOI). The inventors were successful in transferring the self-splicing intron to any GOI at different positions in the ORF.
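Placing the intron at multiple positions in an ORF without altering the encoded protein can be illustrated with a Python sketch that scans a coding sequence for junctions already matching the constrained positions (GGT at −3 to −1 and CT at +294/+295, per the discussion of FIG. 4B). It assumes that positions −9 to −4 and +296 may take any nucleotide, as in SEQ ID NOs: 3 and 4, and ignores their effect on splicing efficiency. The ORF, the placeholder intron and the helper names are hypothetical. Because the exons are the host gene's own nucleotides, idealized splicing restores the original ORF exactly, whatever the codon phase of the insertion site.

```python
def candidate_insertion_sites(orf: str) -> list[int]:
    """Return 0-based cut points where the ORF already contains GGT (positions
    -3 to -1) directly followed by CT (positions +294, +295), with at least six
    nucleotides upstream (-9 to -4) and one downstream (+296)."""
    orf = orf.upper()
    junction = "GGTCT"
    sites = []
    hit = orf.find(junction)
    while hit != -1:
        if hit >= 6 and hit + 5 < len(orf):
            sites.append(hit + 3)  # cut between GGT and CT
        hit = orf.find(junction, hit + 1)
    return sites


def insert_intron(orf: str, site: int, intron: str) -> str:
    """Place the intron body at the cut point (models the DNA construct)."""
    return orf[:site] + intron + orf[site:]


def splice(pre_mrna: str, site: int, intron_len: int) -> str:
    """Idealized self-splicing: excise the intron and join the flanking exons."""
    return pre_mrna[:site] + pre_mrna[site + intron_len:]


# Hypothetical ORF (not from the source) with two GGTCT junctions.
orf = "ATGGGTCTAAAAGGTCTGTAA"
sites = candidate_insertion_sites(orf)
print(sites)  # [15]; the junction at index 3 lacks six upstream nucleotides
intron = "n" * 100  # placeholder for the aptamer-containing intron body (length arbitrary)
construct = insert_intron(orf, sites[0], intron)
assert splice(construct, sites[0], len(intron)) == orf  # spliced ORF is unchanged
```

Where a gene contains no suitable junction, silent codon changes (such as those described for FnCas12a in FIG. 19) could in principle create one; that design step is not modelled here.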

[0182] Altered splicing efficiency resulting from changes in the base-pair interactions at the P1 stem of the T4 td intron was previously observed by Pichler A. & Schroeder R. (2002) Folding Problems of the 5′ Splice Site Containing the P1 Stem of the Group I Thymidylate Synthase Intron J. Biol. Chem 277 (20) 17987-17993, who created two mutant variants to either stabilize (-4A, -5C, -6T) or destabilize (-4C, -5A, -6C) the base-pair interactions at the P1 stem and observed increased splicing efficiency for both variants compared to the WT intron. These results contradict the present results, as stabilization (-4A, -5C, -6T) of the P1 stem decreased the splicing efficiency by approximately 80% (compared to the WT intron) in our setup (FIG. 8D). Moreover, although we did not generate an exact replica of the destabilized variant from Pichler et al. (2002), we did generate a destabilized P1 stem with mismatches at -4 (T), -5 (G) and -6 (C), which likewise decreased splicing efficiency (by approximately 30%) compared to the WT intron in our setup. The differences in splicing efficiency observed on stabilizing or destabilizing the P1 stem may be attributable to the different experimental setups: we measured splicing efficiency through enzymatic activity, whereas Pichler et al. (2002) performed cis-splicing assays on total RNA isolated from E. coli cells carrying the different intron variants. It is therefore likely that total RNA levels give a misleading impression of the amount of protein generated from the spliced T4 td intron variants. In addition, relocating the T4 td intron directly after the ATG start codon may have affected its splicing efficiency in an unpredictable way. Our results are therefore surprising and novel compared to Pichler et al. (2002), both because they contradict the earlier findings and because the intron is in a different position.

[0183] The inventors further provide a universal tag sequence whereby the intron is introduced just after the ATG start codon, making the design independent of the particular gene or protein. The tag sequence leaves a 4-amino-acid tag at the N-terminus of the protein of interest (POI), just after the methionine (M) encoded by the start codon. Because the tag consists of only 4 amino acids, it does not usually hinder the activity of the expressed protein. A TEV protease cleavage site can be added directly after the tag sequence so that the tag can subsequently be removed with TEV protease; this cleavage leaves a single amino acid attached to the protein of interest. Other cleavage sequences and proteases well known in the art may be used, e.g. https://web.expasy.org/peptide_cutter/ and https://web.expasy.org/peptide_cutter/peptidecutter_enzymes.html.
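To illustrate why TEV cleavage leaves a single residue on the POI, here is a minimal sketch assuming the canonical TEV recognition motif ENLYFQ followed by G or S, with cleavage between Q and that residue. The tag residues ("XXXX") and the POI sequence are hypothetical placeholders, not sequences from the source, and real TEV specificity is broader than this single regular expression.

```python
import re

# Canonical TEV protease motif: ENLYFQ followed by G or S (the P1' residue).
# Cleavage occurs between Q and the P1' residue, which stays on the product.
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")


def tev_cleave(protein: str) -> str:
    """Return the C-terminal cleavage product, or the input if no site is found."""
    match = TEV_SITE.search(protein)
    if match is None:
        return protein
    return protein[match.end():]


# Hypothetical construct: start Met + placeholder 4-aa tag ("XXXX") + TEV site
# + placeholder POI sequence. None of these residues come from the source.
construct = "M" + "XXXX" + "ENLYFQ" + "G" + "SKGEELFTG"
print(tev_cleave(construct))  # GSKGEELFTG
```

The leading G in the product is the single amino acid that remains attached to the POI after the Met, tag and ENLYFQ residues are removed.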

[0184] Using different versions of the tag-introns, the inventors are able to control (titrate) expression of a GOI at the protein level. Tag sequences are chosen from those shown in FIG. 12D. Tag1: -4P, -5P, -6P; Tag2: -4W, -5P, -6P; Tag3: -4W, -5W, -6P; Tag4: -4M, -5P, -6P. The inventors found that self-splicing activity followed the order Tag1 > Tag2 > Tag3 > Tag4, where Tag1 is the tightest and Tag4 the loosest intron. Schematic representations of the tags are shown in FIG. 12.
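Each tag is defined only by whether positions -4, -5 and -6 pair (P), wobble (W) or mismatch (M) with the intron. The sketch below lists which RNA bases satisfy each tag pattern, assuming the partner bases given later in [0220] (+12 U for -4, +13 G for -5, +14 A for -6) and counting only Watson-Crick pairs and G·U wobbles as interactions. It does not reproduce the actual tag codons of FIG. 12D; in DNA primers, U would be written as T.

```python
# IGS partner bases of 5' exon positions -4, -5 and -6 (+12 U, +13 G, +14 A),
# taken from the pairing described for the T4 td intron in [0220].
PARTNER = {-4: "U", -5: "G", -6: "A"}
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def interaction(exon_base: str, partner: str) -> str:
    """Classify an exon/IGS base combination as Pair, Wobble or Mismatch."""
    if (exon_base, partner) in WATSON_CRICK:
        return "P"
    if (exon_base, partner) in WOBBLE:
        return "W"
    return "M"


def candidate_bases(pattern: dict) -> dict:
    """For each position, list the RNA bases that give the requested P/W/M."""
    return {
        pos: [b for b in "ACGU" if interaction(b, PARTNER[pos]) == kind]
        for pos, kind in pattern.items()
    }


TAGS = {
    "Tag1": {-4: "P", -5: "P", -6: "P"},
    "Tag2": {-4: "W", -5: "P", -6: "P"},
    "Tag3": {-4: "W", -5: "W", -6: "P"},
    "Tag4": {-4: "M", -5: "P", -6: "P"},
}

for name, pattern in TAGS.items():
    print(name, candidate_bases(pattern))
```

Only Tag4 leaves a choice (C or U at -4), which gives some freedom when the tag codons must also encode particular amino acids.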

[0185] Tagged introns have been successfully tested in E. coli, P. putida and Flavobacterium IR1 by inserting them after the start codon of Cas12a. This approach allowed efficient editing of the bacterium of interest. More specifically, editing efficiencies of up to 75% were reached in P. putida with Tag4 (FIG. 18). In addition, the non-model Flavobacterium IR1, which had never previously been edited with CRISPR-Cas, was easily and efficiently engineered with efficiencies reaching 100% (FIG. 24).

[0186] The invention is applicable to any self-splicing intron and these are found in many species of bacteriophage, bacteria, protozoa and fungi, for example. The self-splicing introns are usually found embedded in specific genes of a species or strain. For example, the T4 td self-splicing intron is located in the td gene of the T4 bacteriophage.

[0187] Other self-splicing introns from bacteriophages are: T6: td, RB3: td, LZ2: td, TulA: td, 1: DNA polymerase, W31: DNA polymerase, Pf-WMP3: DNA polymerase, 822: td, SPO1: DNA polymerase, SP82: DNA polymerase, cpe: DNA polymerase, SPb prophage (Ribonucleotide reductase (bnrdE and bnrdF)), Sb3: lysin, rlt: ORF40, LLH: Terminase, Twort (introns nrdE-11 & nrdE-12): ORF142.

[0188] Examples of self-splicing introns from bacteria are: Agrobacterium tumefaciens A136: tRNA-Arg(CCU), Azoarcus sp. strain BH72: tRNA-Ile(CAU), Coxiella burnetii (Cbu.L1917): 23S rRNA, Coxiella burnetii (Cbu.L1951): 23S rRNA, Thermotoga neapolitana NS-E Tna.bL1931: 23S rRNA, Thermotoga subterranea SL1 Tsu.bL1926: 23S rRNA, Clostridium botulinum: tmma pos. 338, Geobacillus stearothermophilus (NBRC 12550): flagellin, Bacillus sp. Kps3: flagellin, Clostridium difficile strain 630: CD3246, Anabaena PCC7120: tRNA-Leu(UAA), Scytonema hofmanii: tRNA-fMet, Synechocystis PCC 6803: tRNA-fMet, Neochloris aquatica: rnl pos. 1931, Calothrix sp. strain PCC7601: Cal.x1, Calothrix sp. strain PCC7101: Cal.x2, L. lactis ML3: Ll.LtrB, L. lactis 712: IntL, S. meliloti GR4: RmInt1.

[0189] Examples of self-splicing introns from protozoa are: Tetrahymena thermophila (Tth.L1925): 26S rRNA, Didymium iridis (Dir.S956-1): SSU rDNA, Didymium iridis (Dir.S956-2): SSU rDNA, Physarum polycephalum (Ppo.L1925): LSU rDNA, Amoebidium parasiticum: rnl, pos. 2500 and rnl, pos. 1403, Naegleria (NaGIR1 and NaGIR2): SSU rRNA.

[0190] Examples of self-splicing introns from fungi are: Neurospora crassa: rnl, pos. 2449, Saccharomyces cerevisiae (Sc.OX1,3): SSU rDNA, Candida albicans: 25S rRNA, Scytalidium dimidiatum (rns, pos. 1199).

[0191] Examples of self-splicing introns from other miscellaneous organisms are: Simkania negevensis Z(T): 23S rRNA, Chlamydomonas nivalis: rnl, pos. 2593, Dunaliella parva: rnl, pos. 1931, Aureoumbra lagunensis: SSU rRNA, Bangia atropurpurea: SSU rRNA.

[0192] Calothrix sp. strain PCC7601: Cal.x1, Calothrix sp. strain PCC7101: Cal.x2, L. lactis ML3: Ll.LtrB, L. lactis 712: IntL and S. meliloti GR4: RmInt1 are Group II introns, while all the others are Group I introns.

[0193] Examples of Group III introns include the Euglena gracilis introns found in the psbC, rps18, ycf8, ycf13, rpoC1, rpl16, psbF, rps3, rpl23, rps18, rps19, rpl14, rps8, rps14, rpl16 and psbK genes.

[0194] Self-splicing Group I introns are a unique type of ribozyme. Group I introns have been described to control gene expression and RNA processing in bacteria and phages, but also in some eukaryotes (protozoa and plants) (Hausner, G., Hafez, M. & Edgell, D. R. Bacterial group I introns: mobile RNA catalysts. Mobile DNA 5, 1-12 (2014); Edgell, D. R., Belfort, M. & Shub, D. A. Barriers to intron promiscuity in bacteria. Journal of Bacteriology 182, 5281-5289 (2000); Nielsen, H. & Johansen, S. D. Group I introns: moving in new directions. RNA Biology 6, 375-383 (2009)). Owing to their prevalence and simple nature, Group I introns have the potential to be used as universal, synthetic ribozymes to control gene expression. In particular, when a ribozyme is associated with a specific ligand-binding sequence (RNA aptamer), the presence or absence of the ligand can switch its splicing activity ON or OFF (a riboswitch), potentially controlling the expression of an associated gene. An example of a natural Group I intron-based riboswitch has been discovered in the bacterium Clostridium difficile, where the intron sequence resides between the RBS and the ATG start codon of an adjacent gene. After transcription, this results in a secondary structure in the 5′-UTR that prevents recruitment of the ribosome, hence hampering translation initiation. After induction by intracellular GTP or c-di-GMP, this ribozyme catalyses its own splicing from the precursor transcript, resulting in appropriate re-positioning of the RBS upstream of the start codon and thereby allowing the ribosome to start translation (Lee, E. R., Baker, J. L., Weinberg, Z., Sudarsan, N. & Breaker, R. R. An allosteric self-splicing ribozyme triggered by a bacterial second messenger. Science 329, 845-848 (2010); Chen, A. G., Sudarsan, N. & Breaker, R. R. Mechanism for gene control by a natural allosteric group I ribozyme. RNA 17, 1967-1972 (2011)). 
Although this natural mechanism is a beautiful case of gene expression control, its requirement for specific endogenous inducers (GTP and c-di-GMP) as well as its dependency on specific secondary structures (including both the ribozyme and the coding sequence) complicates its general applicability. A synthetic alternative was provided by Thompson et al. (2002), when they combined the self-splicing Group I intron of the T4 bacteriophage with a theophylline aptamer towards a functional inducible gene expression system (Thompson, K. M., Syrett, H. A., Knudsen, S. M. & Ellington, A. D. Group I aptazymes as genetic regulatory switches. BMC biotechnology 2, 21 (2002)). Although this system was restricted to controlling the original thymidylate synthase (td) gene, we here describe its repurposing as a generic system to tune gene expression.

[0195] The inventors have also created a novel system termed Self-splicing Intron Based Riboswitch Cas (SIBR-Cas), which uses the Group I-based aptazyme to enhance recombination in prokaryotes. The inducer-controlled T4 td intron (containing an in-frame stop codon) is inserted into a CRISPR-Cas nuclease gene (e.g. Cas12a), resulting in incomplete translation and preventing formation of a functional CRISPR-Cas nuclease. Upon exposure to theophylline, the synthetic riboswitch undergoes a conformational change that induces the self-splicing activity of the td intron, excising the intron and joining the 5′ exon to the 3′ exon. This restores the complete mRNA of the CRISPR-Cas gene, which in turn leads to functional expression (translation) of the CRISPR-Cas nuclease. In the particular example of Cas12a, controlling its expression allows a time series to be run to find the appropriate induction time for counter-selection by Cas12a, thereby increasing the chances of obtaining correct HDR-based mutants.
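The logic of the switch, an in-frame stop codon inside the intron that truncates translation unless the intron is removed, can be sketched with a toy sequence. This is a simplified model, not the SIBR-Cas construct: the exon and intron sequences are hypothetical and far shorter than the real td intron, and theophylline sensing is reduced to a boolean `inducer` flag.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate in frame from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def express(pre_mrna: str, intron_start: int, intron_end: int, inducer: bool) -> str:
    """Model the switch: the intron is excised only when the inducer is present."""
    if inducer:
        pre_mrna = pre_mrna[:intron_start] + pre_mrna[intron_end:]
    return translate(pre_mrna)


# Hypothetical toy gene (DNA alphabet): exon 1, intron with an in-frame TAG, exon 2.
exon1, intron, exon2 = "ATGAAA", "GGTTAGCCC", "GAAGGCTGGTAA"
pre = exon1 + intron + exon2
start, end = len(exon1), len(exon1) + len(intron)
print(express(pre, start, end, inducer=False))  # MKG (truncated)
print(express(pre, start, end, inducer=True))   # MKEGW (full length)
```

Without the inducer, the ribosome reads into the intron and stops at its TAG; after excision, the exons join in frame and the full-length product is made.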

[0196] So long as the relevant inducer, e.g. theophylline, can reach the self-splicing intron, the SIBR-Cas system can be used in any organism. The advantages of such a technology are:
[0197] Tight control of the GOI (in this case the Cas protein) at the mRNA level. Complete, functional protein will be formed after induction with theophylline.
[0198] Universality: the intron can be introduced into virtually any GOI, in any archaeon, bacterium or eukaryote, as long as the inducer can enter the cell of interest, at least at moderate temperatures.
[0199] No complex design is required, as a tag sequence can be used for insertion of the intron at the beginning of the GOI.
[0200] It offers the only option for engineering non-model organisms with high AT %, low HDR efficiencies and no inducible (or characterised) promoters (see the example of Flavobacterium IR1 below).

[0201] The SIBR-Cas tool can be applied for editing virtually any GOI in any cell of interest. The inventors have applied SIBR-Cas to Flavobacterium IR1.

[0202] Suitable nucleases for use in the methods described herein may be selected at the option of the skilled person. The choice may depend on the optimal growth temperature of the particular microbe being used. The CRISPR-Cas nucleases may be selected from any Cas Type I, Type II or Type III. More particularly, the Cas may be selected from Cas9, Cas12a (previously known as Cpf1) or Cas13 (previously known as C2c2); also any of Caw, Cas12b, Cas12c, Cas13a,b,c,d, Cas4, Csn2, Csf1, Csx10, Csx11, Cmr5, Csm2, Cas10, Csy1,2,3, Cse1,2, Cas10d, Cas8a,b,c, Cas5 or Cas3. The CRISPR-Cas nuclease may be any variant from any species, whether well known, e.g. from Streptococcus pyogenes (SpyCas9), or less commonly used, such as from Geobacillus thermodenitrificans T12 (ThermoCas9) or Geobacillus stearothermophilus (GeoCas9). Methods described herein may preferably use Cas9, preferably Streptococcus pyogenes Cas9, or C2c1. Alternatively, methods described herein may preferably use Cas12a (Cpf1). Further alternative nucleases suitable for the methods described herein are C2c3 or Argonaute. It is also contemplated that the methods described herein may use other nucleases such as zinc finger nucleases (ZFNs), meganucleases or transcription activator-like effector nucleases (TALENs).

[0203] In order that expression of any of the polynucleotide constructs or expression vectors of the invention described herein can be carried out in a chosen host cell, these incorporate regulatory elements which allow expression in the host cell of interest and preferably which facilitate high levels of expression. Such regulatory sequences may be capable of influencing transcription or translation of a gene or gene product, for example in terms of initiation, accuracy, rate, stability, downstream processing and mobility.

[0204] Such elements may include, for example, strong and/or constitutive promoters, 5′ and 3′ UTRs, transcriptional and/or translational enhancers, transcription factor or protein binding sequences, start sites and termination sequences, ribosome binding sites, recombination sites, polyadenylation sequences, sense or antisense sequences, sequences ensuring correct initiation of transcription and, optionally, poly-A signals ensuring termination of transcription and transcript stabilisation in the host cell. The regulatory sequences may be plant-, animal-, bacteria-, fungal- or virus-derived, and preferably may be derived from the same organism as the host cell. Clearly, appropriate regulatory elements will vary according to the host cell of interest. For example, regulatory elements which facilitate high-level expression in prokaryotic host cells such as E. coli may include the pLac, T7, P(Bla), P(Cat), P(Kat), trp or tac promoters. Regulatory elements which facilitate high-level expression in eukaryotic host cells might include the AOX1 or GAL1 promoter in yeast, or the CMV or SV40 promoters, CMV enhancer, SV40 enhancer, Herpes simplex virus VP16 transcriptional activator or inclusion of a globin intron in animal cells. In plants, constitutive high-level expression may be obtained using, for example, the Zea mays ubiquitin 1 promoter or the 35S and 19S promoters of cauliflower mosaic virus.

[0205] Suitable regulatory elements may be constitutive (directing expression under most environmental conditions or developmental stages), developmental-stage-specific or inducible. Suitably, promoters may be chosen which permit expression of the protein of interest at particular developmental stages or in response to extra- or intra-cellular conditions, signals or externally applied stimuli. For example, a range of promoters exist for use in E. coli which give high-level expression at particular stages of growth (e.g. the osmY stationary-phase promoter) or in response to particular stimuli (e.g. the HtpG heat shock promoter).

[0206] Suitable expression vectors may comprise additional sequences encoding selectable markers which allow for the selection of said vector in a suitable host cell and/or under particular conditions.

[0207] Regarding transformation of a host cell with a heterologous gene sequence, expression constructs comprising the polynucleotide sequences of the invention may be located in plasmids (expression vectors) which are used to transform the host cell. Methods of transformation may include, but are not limited to, heat shock, electroporation, particle bombardment, chemical induction, microinjection, viral transformation, Agrobacterium-mediated transformation, PEG-mediated transformation and lipofection.

[0208] As well as a ROI or POI, the polynucleotides of the invention as described herein may include a selectable marker protein, which may be used to screen cell populations positively or negatively. For example, where the expression of a particular POI in a host cell is coupled to relief of an auxotrophic deficit, it will be appreciated that such selectable markers may include polynucleotide sequences encoding proteins to which the cell is fatally sensitive. In these embodiments of the invention, the presence of the desired product may be coupled to the restoration of translation of the reporter protein. In this way, host cells expressing the protein of interest may be selected from those which do not express it.

[0209] Where the expression of a particular POI in a host cell is coupled to promotion of cell growth and/or division, it will be appreciated that such selectable markers may include polynucleotide sequences encoding proteins which promote cell growth and/or division. In these embodiments of the invention, the presence of the desired product may be coupled to the restoration of translation of the reporter protein. In this way host cells expressing the protein of interest may be selected from those which do not express the protein of interest.

[0210] The polynucleotides may include a reporter protein which may be assayed or monitored. Such reporter proteins include, for example, Green Fluorescent Protein (GFP), Yellow Fluorescent Protein (YFP), Red Fluorescent Protein (RFP), Cyan Fluorescent Protein (CFP) or luciferase fusion tags. The reporter protein may be an enzyme which can be used to generate an optical signal. Alternatively, the expression vector may incorporate a polynucleotide reporter encoding a luminescent protein, such as a luciferase (e.g. firefly luciferase), or a chromogenic enzyme such as beta-galactosidase (LacZ) or beta-glucuronidase (GUS).

[0211] Tags used for detection of reporter protein expression may also be antigen peptide tags. A cleavable tag may also be provided for affinity purification, e.g. a polyhistidine tag. It is envisaged that other types of label may also be used to indicate expression of the reporter protein including, for example, organic dye molecules or radiolabels. In particular, preferred expression vectors will include sequences encoding a fluorescent protein, for example GFP, which will enable the screening and, optionally, separation (selection) of a cell which expresses the protein of interest, for example by Fluorescence Activated Cell Sorting (FACS).

EXAMPLES

Example 1: Effect of Position -7 on the Self-Splicing of the T4 td Intron

[0212] The flanking regions (5′ and 3′ exons) of group I introns are part of the coding sequence as well as of the ribozyme (see FIGS. 1, 2 and 4). The T4 td intron structure shown in the figures follows the format of Cech, T. R., et al., (1994) Representation of the secondary and tertiary structure of group I introns Nature Structural Biology 1 (5): 273-280. Uppercase letters indicate the intron and lowercase letters the exons. Arrows indicate the splice sites. Boxed portions can be replaced by the theophylline aptamer to generate a theophylline-dependent aptazyme. Referring to FIG. 2, the intron stem-loops are labelled P1 to P10. The light grey boxes indicate the 5′ and 3′ exon interactions with the intron. Horizontal or vertical lines within the loops indicate base pairing, whereas black solid circles within the loops indicate wobble pairing. Light grey arrows indicate the splice sites. FIG. 3 shows the wild-type and theophylline aptamer structures of the P6a loop of the T4 td intron: the left-hand portion of FIG. 3 is the wild-type (WT) P6a loop, and the right-hand portion is the theophylline aptamer which replaces the WT P6a loop for inducible splicing of the intron.

[0213] When inserting the intron into another gene, it is almost impossible to retain both the intron flanking regions and the CDS. Applying minor changes to the CDS with synonymous codons may create a site that resembles the wild-type intron flanking regions. However, it is not clear to what extent the flanking regions determine the splicing efficiency.

[0214] To investigate the effect of the flanking regions of the T4 td intron on its splicing efficiency and on the expression of the target gene, a series of constructs was made containing the lacZ gene from E. coli with the intron between amino acids D6 and S7 (see FIG. 5). This location was chosen because insertion there was considered least likely to affect the structure of the protein. The lacZ gene was used because its functional expression can be easily monitored by a colorimetric assay and because of its high tolerance for modification at the 5′ end. LacZα interrupted by the T4 td intron is non-functional; upon self-splicing and excision of the intron, the ORF of LacZα is complete and functional. LacZα encodes β-galactosidase, which hydrolyses ortho-nitrophenyl-β-D-galactopyranoside (ONPG) into β-D-galactose and ortho-nitrophenol (ONP), and ONP can be measured by colorimetric assays. Mutations were made by PCR in the 5′ flank as depicted in Table 1.
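The source states that ONP release is measured colorimetrically and that LacZ activities are reported as a fraction of the wild-type intron (as in FIG. 8), but it does not say how absorbance readings were converted into activity. One common convention, assumed here, is Miller units; the sketch below computes them and then normalises each variant to wild type. The readings are hypothetical and are not data from the source.

```python
def miller_units(od420: float, od550: float, od600: float,
                 minutes: float, volume_ml: float) -> float:
    """Miller units: 1000 * (OD420 - 1.75*OD550) / (t * V * OD600).

    OD420 reports ortho-nitrophenol released from ONPG, OD550 corrects for
    light scattering by cell debris, t is the reaction time in minutes and
    V the culture volume assayed in millilitres.
    """
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)


def relative_to_wild_type(activities: dict, wt_key: str = "WT") -> dict:
    """Express each variant's activity as a fraction of the wild-type intron."""
    wt = activities[wt_key]
    return {name: value / wt for name, value in activities.items()}


# Hypothetical readings, not data from the source.
wt = miller_units(od420=0.50, od550=0.0, od600=0.5, minutes=10, volume_ml=0.1)
variant = miller_units(od420=0.25, od550=0.0, od600=0.5, minutes=10, volume_ml=0.1)
print(relative_to_wild_type({"WT": wt, "7W": variant}))  # {'WT': 1.0, '7W': 0.5}
```

Normalising to wild type cancels assay-to-assay scaling, which is why the relative values, rather than raw units, are what can be compared across the intron variants.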

TABLE 1. Primers used to introduce point mutations at the -7 position of the 5′ exon of the T4 td intron. Bold bases show the -7 to -4 positions. Underlined bases show the -7 point mutations.
Plasmid name | -7 | -6 | -5 | -4 | +296 | Forward primer | Reverse primer
pEA001[WT] | P | P | W | W | M | GATCTTAAGGATGTTCTcttgGGTTAATTGAGGCCTGAGTATAAGGTG (SEQ ID NO: 24) | TGActgcagAATATTAAACGGTAGCATTATGTTCAGATAAGGTCG (SEQ ID NO: 25)
pEA001[7W] | W | P | W | W | M | GATCTTAAGGATGTTCT[...]GGTTAATTGAGGCCTGAGTATAAGGTG (SEQ ID NO: 26) | TGActgcagAATATTAAACGGTAGCATTATGTTCAGATAAGGTCG (SEQ ID NO: 25)
pEA001[7M] | M | P | W | W | M | GATCTTAAGGATGTTTT[...]GGTTAATTGAGGCCTGAGTATAAGGTG (SEQ ID NO: 27) | TGActgcagAATATTAAACGGTAGCATTATGTTCAGATAAGGTCG (SEQ ID NO: 25)
P: Pair; W: Wobble; M: Mismatch.

[0215] The wild-type interactions are shown in FIG. 4A. All position numbers are relative to the 5′ splice site. The P1 structure suggested by Thompson et al. (2002) Supra does not show involvement of position -7 in base pairing. Since base pairing at this position is nevertheless possible, the effect of disallowing it was assessed. If there is no interaction between position -7 and the Internal Guide Sequence (IGS) (the sequence at P1 that base-pairs with the 5′ exon sequence), no difference in LacZ activity should be observed. The -7 position was mutated (while all other nucleotides remained WT) to form either a wobble pair or a mismatch. The self-splicing activity of the three intron variants was assessed as β-galactosidase activity after overnight growth.

[0216] In FIG. 4A, the triangles indicate the splice sites. The light grey boxes attached to the intron indicate the 5′ and 3′ exons. WT and mutant variants are represented for both the 5′ and 3′ exons. The white circles indicate modified nucleotides. Nucleotides have been modified into b (G/U/C), y (C/U), d (G/A/U) or h (A/U/C). Positions -7 and +296 were changed only in single-nucleotide mutants, in which the other positions conformed to the natural intron. Positions -4 to -6 were changed in all possible combinations of match, mismatch and wobble (where possible).

[0217] FIG. 6 shows that changing the WT -7 position (pair) to a wobble pair or a mismatch decreases β-galactosidase activity, indicating decreased splicing activity of the mutant T4 td introns. Both the wobble pair (7W) and the mismatch (7M) show decreased activity compared to the wild type (7P). The results show that position -7 preferably pairs with position +15. A weaker interaction in the form of a wobble base pair impedes intron splicing, but not as severely as having no interaction at all.

Example 2: Effect of Position +296 on the Self-Splicing of the T4 td Intron

[0218] Thompson et al. (2002) Supra do not show any interaction between position +296 (the +3 position in the 3′ exon) and the P1 loop of the T4 td intron, similar to the situation with position -7. Point mutations at the +296 position of the T4 td intron were therefore made to determine whether they affect the splicing activity of the intron. The WT +296 position (mismatch) was mutated by PCR (Table 2) to form either a pair or a wobble pair with the P1 loop. All mutants were assayed for β-galactosidase activity after overnight growth.

TABLE 2. Primers used to introduce point mutations at the +296 position of the 3′ exon of the T4 td intron. Bold underlined bases show the +296 point mutations. Sequences are shown from 5′ to 3′.
Plasmid name | -7 | -6 | -5 | -4 | +296 | Forward primer | Reverse primer
pEA001[WT] | P | P | W | W | M | GATCTTAAGGATGTTCTcttgGGTTAATTGAGGCCTGAGTATAAGGTG (SEQ ID NO: 24) | TGActgcagAATATTAAACGGTAGCATTATGTTCAGATAAGGTCG (SEQ ID NO: 25)
pEA001[296P] | P | P | W | W | P | GATCTTAAGGATGTTTTcttgGGTTAATTGAGGCCTGAGTATAAGGTG (SEQ ID NO: 28) | TGActgcagAATATTAAACGG[...]AGCATTATGTTCAGATAAGGTCG (SEQ ID NO: 29)
pEA001[296W] | P | P | W | W | W | GATCTTAAGGATGTTTTcttgGGTTAATTGAGGCCTGAGTATAAGGTG (SEQ ID NO: 28) | TGActgcagAATATTAAACGG[...]AGCATTATGTTCAGATAAGGTCG (SEQ ID NO: 30)
P: Pair; W: Wobble; M: Mismatch.

[0219] FIG. 7 shows the LacZ activity of the position +296 mutants; the asterisk indicates the wild-type intron. Base substitution at the +296 position led to reduced β-galactosidase activity. Stabilising the interactions between the 5′ end of the intron and the 5′ end of the 3′ exon does not aid splicing, as both the pair (+296P) and the wobble pair (+296W) exhibit lower LacZ activity than the wild type (+296M). This implies that a mismatch allows the highest intron splicing activity, whereas the weak wobble base pair impedes splicing to some extent and the stronger pair decreases splicing to a significantly larger extent. Therefore, the WT position +296 of the T4 td intron does not appear to be involved in pairing with the P1 loop, and altering the pairing at position +296 impedes the self-splicing activity of the T4 td intron.

Example 3: Effect of Positions -4 to -6 on the Self-Splicing of the T4 td Intron

[0220] This example investigated the effect of altering positions -4 to -6 in all possible combinations of pair (P), mismatch (M) and wobble pair (W) (where applicable), while preserving all the other bases as WT. With reference to FIGS. 1 and 4A, positions -4 to -6 are G, U and U (read 5′ to 3′, positions -6 to -4 are UUG), and these pair or interact with U, G and A at positions +12 to +14, respectively. Mutations were introduced by PCR as indicated in Table 3.
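Given the partners stated above (+12 U for -4, +13 G for -5, +14 A for -6), the possible P/W/M combinations can be enumerated mechanically. The sketch below assumes that only Watson-Crick pairs and G·U wobbles count as interactions. Under that assumption, position -6 (facing A) can only pair or mismatch, which leaves 2 × 3 × 3 = 18 classes, matching the 18 variant rows of Table 3. The wild-type UUG comes out as PWW, UCA as the fully paired PPP, and PMP as UAA or UGA.

```python
from itertools import product

# IGS partners for positions -6, -5, -4 (+14 A, +13 G, +12 U), per [0220].
PARTNER = {-6: "A", -5: "G", -4: "U"}
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def interaction(base: str, partner: str) -> str:
    """Classify an exon/IGS base combination as Pair, Wobble or Mismatch."""
    if (base, partner) in WATSON_CRICK:
        return "P"
    if (base, partner) in WOBBLE:
        return "W"
    return "M"


def label(triplet: str) -> str:
    """Label bases at -6, -5, -4 (written 5'->3'), e.g. 'UUG' -> 'PWW'."""
    return "".join(
        interaction(base, PARTNER[pos]) for base, pos in zip(triplet, (-6, -5, -4))
    )


def variant_classes() -> dict:
    """Group all 64 possible triplets by their P/W/M label."""
    classes = {}
    for bases in product("ACGU", repeat=3):
        triplet = "".join(bases)
        classes.setdefault(label(triplet), []).append(triplet)
    return classes


classes = variant_classes()
print(label("UUG"), label("UCA"), len(classes))  # PWW PPP 18
print(classes["PMP"])  # ['UAA', 'UGA']
```

The same labelling also classifies the destabilized variant discussed in [0182] (-6C, -5G, -4U, i.e. CGU) as MMM.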

[0221] In FIG. 8, the wild-type intron (*) is PWW (UUG, sequences shown from 5′ to 3′) and its activity is set to 1; all other LacZ activities are expressed as a fraction of the wild-type activity. PMP reads either UAA or UGA, both being stop codons (**). FIG. 8 shows that a -4 mismatch is preferred in almost all variants, except for those in which -6 and -5 are also mismatched. A wobble base pair at position -5 largely negates the effect that positions -6 and -4 have on splicing. In contrast, with a pair or a mismatch at position -5, the splicing efficiency may be very high or very low depending on positions -6 and -4. Position -6 generally favours being paired; however, fully strengthening P1 (PPP) is detrimental to splicing. Completely mismatching positions -6 to -4 impedes splicing, but not to a very large extent. The cumulative effect of changes in the intron flanking regions is currently unknown, but for many genes an insertion position with at least decent splicing can already be found by retaining the WT interactions at positions -1 to -3, +294 and +295 and, where necessary, changing positions -4 to -6. Such alterations will yield an active intron with good splicing efficiency.

TABLE-US-00003 TABLE3 Primersusedtointroducemutationsatthe6to4positionsofthe5exon oftheT4tdintron.Boldbasesshowthe7to4positions.Underlinedbasesshowthe6 to4mutations.Sequencesareshownfrom5to3. Plasmidname 7 6 5 4 296 Forward Reverse pEA001[WT] P1 P W W M GATCTTAAGGA TGActgcagAATATTAA TGTTCT custom-character GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:24) pEA001[PPP] P P P P M GATCTTAAGGA TGActgcagAATATTAA TGTTTT GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:72) pEA001[PPW] P P P W M GATCTTAAGGA TGActgcagAATATTAA TGTTTT custom-character GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:73) pEA001[PPM] P P P M M GATCTTAAGGA TGActgcagAATATTAA TGTTTT GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:74) pEA001[PWP] P P W P M GATCTTAAGGA TGActgcagAATATTAA TGTTTT custom-character GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:75) pEA001[PWW] P P W W M GATCTTAAGGA TGActgcagAATATTAA TGTTTT GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:28) pEA001[PWM] P P W M M GATCTTAAGGA TGActgcagAATATTAA TGTTTT custom-character GGT ACGGTAGCATTATGT TAATTGAGGCC TCAGATAAGGTCG TGAGTATAAGG (SEQIDNO:25) TG(SEQIDNO: 31) pEA001[PMP] P P M P M GATCTTAAGGA TGActgcagAATATTAA TGTTTT GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:32) pEA001[PMW] P P M W M GATCTTAAGGA TGActgcagAATATTAA TGTTTT custom-character GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:33) pEA001[PMM] P P M M M GATCTTAAGGA TGActgcagAATATTAA TGTTTT GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:34) pEA001[MPP] P M P P M GATCTTAAGGA TGActgcagAATATTAA TGTTTT custom-character GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:35) pEA001[MPW] P M P W M GATCTTAAGGA TGActgcagAATATTAA TGTTTT 
GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:36) pEA001[MPM] ? M P M M GATCTTAAGGA TGActgcagAATATTAA TGTTTT custom-character GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:37) pEA001[MPW] P M W P M GATCTTAAGGA TGActgcagAATATTAA TGTTTT GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:38) pEA001[MWW] P M W W M GATCTTAAGGA TGActgcagAATATTAA TGTTTT custom-character GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:39) pEA001[MWM] P M W M M GATCTTAAGGA TGActgcagAATATTAA TGTTTT GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:40) pEA001[MMP] P M M P M GATCTTAAGGA TGActgcagAATATTAA TGTTTT custom-character GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:41) pEA001[MMW] P M M W M GATCTTAAGGA TGActgcagAATATTAA TGTTTT GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:43) pEA001[MMM] P M M M M GATCTTAAGGA TGActgcagAATATTAA TGTTTT custom-character GG ACGGTAGCATTATGT TTAATTGAGGC TCAGATAAGGTCG CTGAGTATAAG (SEQIDNO:25) GTG(SEQID NO:42) P: Pair; W: Wobble; M: Mismatch.

Example 4: Script Development for the Introduction of the T4 td Intron at any Gene of Interest

[0222] Transferring the T4 td intron into the open reading frame (ORF) of genes other than the WT thymidylate synthase gene can be achieved by following the script provided in Example 11 and by introducing silent mutations into the 5′ and 3′ flanking regions of the intron. The script retains the WT interactions of positions -1 to -3, +294 and +295, but changes positions -4 to -6 and +296 in order to find an insertion site in the gene of interest (GOI). The script ensures that the insertion site preserves the amino acid sequence of the protein encoded by the GOI by introducing silent mutations, and it also ensures sufficient splicing activity of the intron according to the results above.
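The Example 11 script is not reproduced in this section. The hypothetical function `feasible_sites` below sketches only its core feasibility check: which insertion points in a protein-coding sequence can be made to carry fixed exon bases (for example at positions -3 to -1, +294 and +295) using synonymous codons alone. The constraint values in the usage line are placeholders, not the real td exon bases, and the sketch does not score positions -4 to -6 or +296 for splicing efficiency.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def feasible_sites(protein: str, required: dict) -> list:
    """Find intron insertion points reachable with silent mutations only.

    `required` maps an offset from the insertion point to a base: offsets -3..-1
    are the last 5' exon bases, 0 and 1 the first 3' exon bases. Returns a list
    of (cut, {codon_index: codon}), where `cut` is the number of coding bases
    upstream of the intron and each codon is a synonymous choice satisfying
    every constraint that falls inside it.
    """
    n = 3 * len(protein)
    sites = []
    for cut in range(1, n):
        by_codon = {}
        for offset, base in required.items():
            pos = cut + offset
            if not 0 <= pos < n:
                break  # constraint falls outside the coding sequence
            by_codon.setdefault(pos // 3, {})[pos % 3] = base
        else:
            choice = {}
            for idx, fixed in by_codon.items():
                codon = next((c for c in SYNONYMS[protein[idx]]
                              if all(c[i] == b for i, b in fixed.items())), None)
                if codon is None:
                    break  # no synonymous codon fits this position
                choice[idx] = codon
            else:
                sites.append((cut, choice))
    return sites


# Toy example with placeholder constraints (NOT the real td exon bases).
print(feasible_sites("ML", {-1: "G", 0: "C"}))  # [(3, {0: 'ATG', 1: 'CTT'})]
```

Because codons do not overlap, each codon touched by a constraint can be chosen independently, so a site is feasible exactly when every affected codon has a matching synonym.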

[0223] FIG. 9A shows a cartoon of the insertion of the modified T4 td intron into the GOI. FIG. 9B shows the insertion site in the WT FnCas12a gene. Bases shown in lower case and bold typeface indicate the 5′ and 3′ exon sequences that are modified and interact with the T4 td intron.

[0224] An example of the insertion site for the FnCas12a gene is shown in FIG. 9B; this insertion site was generated using the script as described herein and in Example 11. The script can be applied to virtually any GOI. In addition, multiple introns (more than one) can be introduced into the GOI, as shown in FIG. 10. Multiple introns will provide tighter control over self-splicing.

[0225] To control the splicing activity of the T4 td intron, Thompson et al. (2002) Supra attached a theophylline aptamer to the P6 stem loop of the T4 td intron. In a similar fashion, the theophylline aptamer was also added to the modified (changes at positions 4 to 6) T4 td introns developed in this example. In this way, tight, titratable and inducible control of the GOI was obtained. A schematic representation of the T4 td intron with the theophylline aptamer at the P6 stem loop is shown in FIG. 11. A step-wise cartoon is also shown in FIGS. 12A and 12B, which compares the steps from DNA to RNA to protein (transcription and translation) with the steps of induction and translation of the intron-GOI constructs of the invention. In FIG. 12A the GOI is split by the WT T4 td intron. Step (1): the GOI is transcribed to form the inactive pre-mRNA molecule. Step (2): the intron is excised by spontaneous self-splicing, yielding a functional mRNA. Step (3): the mRNA is translated into the protein of interest. In FIG. 12B the GOI is split by the theophylline-dependent T4 td intron. Splicing and formation of the mRNA occur only in the presence of an inducer, e.g. theophylline.

Example 5: Generation of Tagged-T4 td Intron Variants and Use Thereof to Control Expression of FnCas12a in Escherichia coli MG1655

[0226] To further control the splicing activity of the modified introns, a theophylline aptamer was added at the P6 stem loop of the T4 td intron as previously described (see Thompson et al. (2002) Supra) and shown in FIG. 11. Also, to simplify the design and usage of the system, the T4 td intron was introduced at the 5′ end of the gene of interest (GOI), directly after the ATG start codon, preserving the original reading frame of the protein of interest (POI) (FIG. 12C).

[0227] As shown in FIG. 12C, the intron is inserted directly after the ATG start codon of the GOI which will result in a 4 amino acid tag sequence when not counting the M from the start codon (and a 5 amino acid tag including the M encoded by the start codon) attached to the final translated protein. Step (1): the GOI is transcribed to form the inactive pre-mRNA molecule. Step (2): upon ligand binding the aptamer changes conformation and the intron can splice out of the mRNA, yielding a functional mRNA. Step (3): the mRNA is translated into the protein of interest which contains a 4 amino acid tag sequence at the N-terminus.

[0228] Splicing of the intron results in a short (four amino acid long) tag sequence attached to the N-terminus of the POI (when not counting the M encoded by the start codon) whereas unspliced mRNA results in a small, non-functional peptide sequence (due to stop codons present in the T4 td intron).
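The contrast described above, a full-length product only after splicing and a short peptide from unspliced mRNA, can be shown with a toy sketch. The exon and intron sequences below are hypothetical and far shorter than the real ~300 nt T4 td intron; they are chosen only so that the toy "intron" carries an in-frame stop codon. DNA letters (T for U) stand in for the mRNA, and the function names are illustrative.

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO))  # standard genetic code (NCBI table 1)


def translate_until_stop(mrna):
    """Translate from the first base until the first in-frame stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def splice(pre_mrna, intron):
    """Remove the (single) intron sequence, joining the 5' and 3' exons."""
    start = pre_mrna.index(intron)
    return pre_mrna[:start] + pre_mrna[start + len(intron):]


# Hypothetical toy sequences (not the real T4 td intron or any real GOI).
EXON_5 = "ATGAAA"         # start codon plus one codon of the toy GOI
TOY_INTRON = "GCTTAAGCC"  # contains an in-frame TAA stop codon
EXON_3 = "GGTTTCTGA"      # rest of the toy ORF, ending in a stop codon
pre_mrna = EXON_5 + TOY_INTRON + EXON_3

print(translate_until_stop(pre_mrna))                      # unspliced: truncated peptide
print(translate_until_stop(splice(pre_mrna, TOY_INTRON)))  # spliced: full toy protein
```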

[0229] FIG. 12D shows theophylline-dependent T4 td introns introduced as tags after the ATG start codon of the gene of interest (GOI). Four different intron variants (as shown previously in FIG. 8), referred to as Tag1, Tag2, etc. (Tag1: 4P, 5P, 6P; Tag2: 4W, 5P, 6P; Tag3: 4W, 5W, 6P; Tag4: 4M, 5P, 6P), are inserted directly after the start codon of the GOI (the FnCas12a gene). The 5′ exon sequences of the four different tags are: Tag1: TCCtcaGGT (SEQ ID NO: 7); Tag2: TCCtcgGGT (SEQ ID NO: 8); Tag3: TCCttgGGT (SEQ ID NO: 9); Tag4: TCCtctGGT (SEQ ID NO: 10). The 3′ exon sequence is conserved amongst the 4 different tags and is indicated as CTA (SEQ ID NO: 12). The amino acids corresponding to the tag sequences are indicated with capital bold letters above the tagged introns. Intron-less (wild-type) FnCas12a was used as a reference for comparison. The targeting efficiency of the different tagged-T4 td intron variants was assessed using either a LacZα targeting (T) or a non-targeting (NT) crRNA and by comparing the number of colony forming units (CFUs) of transformed E. coli MG1655 per µg of plasmid used (CFUs µg⁻¹), determined by counting the colonies observed. All the plasmids used for this experiment are listed in Table 4.
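The Pair/Wobble/Mismatch labels of the four tags can be re-derived from their 5′ exon sequences with a small sketch. It rests on two assumptions not stated explicitly above: that the intron bases pairing with the lower-case exon positions are the complements of the all-paired Tag1 bases, and that the lower-case bases, read left to right, correspond to positions 6, 5 and 4. The pair classes mirror `findbindingtype()` in the Example 11 script; the function names are illustrative.

```python
# Pair classes mirroring findbindingtype() in the Example 11 script:
# P = Watson-Crick pair, W = G.T (G.U) wobble, M = mismatch.
PAIRS = {("T", "A"), ("A", "T"), ("C", "G"), ("G", "C")}
WOBBLES = {("T", "G"), ("G", "T")}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

TAGS = {
    "Tag1": "TCCtcaGGT",
    "Tag2": "TCCtcgGGT",
    "Tag3": "TCCttgGGT",
    "Tag4": "TCCtctGGT",
}


def binding_type(partner, exon_base):
    """Classify one intron-partner/exon base combination as P, W or M."""
    pair = (partner.upper(), exon_base.upper())
    if pair in PAIRS:
        return "P"
    if pair in WOBBLES:
        return "W"
    return "M"


def classify(tag, reference=TAGS["Tag1"]):
    """Label the lower-case exon positions of `tag` as P, W or M.

    Assumptions: the intron partner bases are the complements of the all-paired
    reference (Tag1) bases, and lower-case bases read left to right are positions 6, 5, 4.
    """
    partners = [COMPLEMENT[b.upper()] for b in reference if b.islower()]
    modified = [b for b in tag if b.islower()]
    return {6 - i: binding_type(p, b) for i, (p, b) in enumerate(zip(partners, modified))}


for name, seq in TAGS.items():
    labels = classify(seq)
    print(name, ", ".join(f"{pos}{labels[pos]}" for pos in (4, 5, 6)))
```

Under these assumptions the printed labels agree with the P/W/M annotations listed for Tag1 to Tag4, which supports reading the lower-case bases as the tuned exon positions.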

TABLE 4
Plasmids used for targeting in E. coli MG1655.
Plasmid name | Description and relevant characteristics
pSIBR EcoPpu NT tag 1 | KanR, FnCas12a with Tag1-intron, NT crRNA
pSIBR EcoPpu NT tag 2 | KanR, FnCas12a with Tag2-intron, NT crRNA
pSIBR EcoPpu NT tag 3 | KanR, FnCas12a with Tag3-intron, NT crRNA
pSIBR EcoPpu NT tag 4 | KanR, FnCas12a with Tag4-intron, NT crRNA
pSIBR EcoPpu NT no intron | KanR, WT FnCas12a, NT crRNA
pSIBR EcoPpu T lacZ tag 1 | KanR, FnCas12a with Tag1-intron, LacZ T crRNA
pSIBR EcoPpu T lacZ tag 2 | KanR, FnCas12a with Tag2-intron, LacZ T crRNA
pSIBR EcoPpu T lacZ tag 3 | KanR, FnCas12a with Tag3-intron, LacZ T crRNA
pSIBR EcoPpu T lacZ tag 4 | KanR, FnCas12a with Tag4-intron, LacZ T crRNA
pSIBR EcoPpu T lacZ no intron | KanR, WT FnCas12a, LacZ T crRNA

[0230] Electrocompetent E. coli MG1655 were transformed (2.5 kV, 200 Ω, 25 µF) with 10 ng µL⁻¹ of the respective plasmid and recovered for 1 hour in 500 µL LB medium [10 g L⁻¹ tryptone (Oxoid), 5 g L⁻¹ yeast extract (BD), 10 g L⁻¹ NaCl (Acros)] at 37 °C. The recovered culture was then serially diluted and drop plated on selective (50 µg mL⁻¹ kanamycin) LB agar plates in the presence or absence of 2 mM theophylline. The agar plates were incubated at 30 °C for 24 hours and the CFUs were counted.
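The CFUs µg⁻¹ values compared in FIG. 15 follow from the colony count on a drop-plated spot, the dilution of that spot, the fraction of the recovery culture plated and the amount of plasmid used. The sketch below shows that arithmetic. The function name is hypothetical, and the drop volume and the volume of plasmid solution added are not stated above, so they appear only as illustrative parameters.

```python
def cfu_per_ug(colonies, dilution_exponent, drop_volume_ul, recovery_volume_ul, dna_ug):
    """Transformants per microgram of plasmid, estimated from one drop-plated spot.

    colonies:           colonies counted in the spot
    dilution_exponent:  n for a spot taken from the 10^-n dilution
    drop_volume_ul:     volume of the spotted drop (hypothetical, e.g. 10 uL)
    recovery_volume_ul: total recovery culture volume (500 uL in [0230])
    dna_ug:             plasmid used in the transformation, in micrograms
    """
    if colonies < 0 or drop_volume_ul <= 0 or dna_ug <= 0:
        raise ValueError("counts must be non-negative and volumes/DNA positive")
    cfu_in_undiluted_drop = colonies * 10 ** dilution_exponent
    cfu_in_recovery = cfu_in_undiluted_drop * recovery_volume_ul / drop_volume_ul
    return cfu_in_recovery / dna_ug


# Hypothetical example: 12 colonies in a 10 uL drop of the 10^-2 dilution,
# 500 uL recovery, and 1 uL of a 10 ng/uL plasmid solution (0.01 ug).
print(f"{cfu_per_ug(12, 2, 10, 500, 0.01):.3g} CFUs per ug")
```

Because every construct is plated from the same recovery volume and the same plasmid amount, ratios between induced and non-induced plates depend only on colony counts and dilutions; the absolute CFUs µg⁻¹ additionally depend on the assumed drop and DNA volumes.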

[0231] FIG. 15 shows the efficacy of the tagged introns in controlling the activity of FnCas12a in E. coli MG1655. E. coli MG1655 was transformed with a series of plasmids (listed in Table 4), and the transformants were serially diluted and drop plated on selective (kanamycin) LB media with or without 2 mM theophylline. None of the transformants bearing an NT crRNA showed a reduction in CFUs on selective media, with or without theophylline. In contrast, transformants bearing the T crRNA showed a clear reduction in CFUs when plated on selective medium with theophylline. The CFU reduction was most apparent for the Tag4-intron, where colonies could only be observed at the 10 dilution, followed by the Tag3-intron (10⁻¹), then the Tag2-intron (10⁻²) and finally the Tag1-intron (10⁻²). The intron-less control transformation showed complete elimination of CFUs on both induced and non-induced selective media. Collectively, these results show tight control of FnCas12a and its inducibility by theophylline through splicing of the T4 td intron variants.

[0232] FIG. 13 shows an alternative arrangement of the steps from DNA to RNA to protein, with the intron positioned directly before the ATG start codon of the gene of interest. This arrangement prevents the ribosome from finding an appropriate start codon in close proximity to the ribosome binding site (RBS), and protein translation is therefore inhibited. Upon induction and splicing of the intron, however, the GOI is brought closer to the RBS and can therefore be translated successfully, without a tag sequence attached to it.

[0233] FIG. 14 provides more detailed sequence based information for examples of methods where the self-splicing intron is placed just after the ATG of the GOI, or just before the ATG of the GOI.

Example 6: Efficient Homologous Recombination in E. coli MG1655 Using T4 td Intron Variants

[0234] For efficient genome editing in bacteria, HR should precede CRISPR-Cas counterselection. To assess whether tight control over CRISPR-Cas targeting could bolster the efficiency of CRISPR-Cas mediated genome editing by allowing more time for HR to occur, we used SIBR-Cas to knock out the LacZ gene of E. coli MG1655 through HR and CRISPR-Cas counterselection, using a blue/white colony screening assay. To facilitate HR, we added 500 bp upstream and downstream homology arms to the plasmids expressing the four SIBR-Cas (Int1-4) and WT-FnCas12a variants that target the LacZ gene. After 1 hour of recovery, we induced expression of the SIBR-Cas variants to counterselect against unedited (WT) cells, enriching for mutants.

[0235] The WT-FnCas12a variant targeting the LacZ gene produced no colonies, demonstrating both the targeting efficiency of WT-FnCas12a and the inefficient HR system of the WT E. coli MG1655 strain (FIG. 16A). In contrast, the SIBR-Cas variants produced multiple colonies, of which 80% of the total CFUs mL⁻¹ were white when Int4 was used, followed by the Int3 (49%), Int2 (1%) and Int1 (0%) variants (FIGS. 16A and 16B). In line with the previous results, the high editing efficiency obtained with Int4 suggests that its high splicing efficiency translates into a stronger counter-selective pressure. No white colonies were observed for the non-targeting controls, demonstrating that the efficiency of editing without CRISPR-Cas counterselection is negligible.

[0236] Disruption of LacZ can also arise through non-HR mediated events (spontaneous mutations or occasional error-prone DNA repair following DNA cleavage by Cas12a), and not all gene deletions can be screened phenotypically. We therefore repeated the experiment with X-gal omitted from the medium, to eliminate the possibility of false positives, and screened randomly selected colonies by PCR for LacZ deletion. This showed 0%, 0%, 29% and 38% editing efficiency for the Int1, Int2, Int3 and Int4 SIBR-Cas variants, respectively (FIG. 24C). The WT-FnCas12a variant targeting LacZ did not yield any colonies, and all colonies obtained from the NT controls had the intact, wild-type LacZ locus. The lower editing efficiency compared with the blue/white screening might be attributed to spontaneous LacZ mutations that escape CRISPR-Cas counterselection. Nevertheless, a high editing efficiency (38%) was observed with SIBR-Cas Int4 without the use of recombinases or any other complex systems.

Example 7: Tight and Inducible Expression of FnCas12a in Pseudomonas putida Using Tagged-T4 td Intron Variants

[0237] Following the successful demonstration of inducible expression of FnCas12a in E. coli, the system was transferred to Pseudomonas putida, an organism with very low HR efficiency. Plasmids bearing one of the four T4 td intron-FnCas12a variants or the intron-less FnCas12a, together with an EndA T or an NT crRNA, were transformed into P. putida, and the targeting efficiency was assessed by comparing the CFUs µg⁻¹ in the presence or absence of the theophylline inducer. All the plasmids used for this experiment are listed in Table 5.

TABLE 5
Plasmids used for targeting in P. putida.
Plasmid name | Description and relevant characteristics
pSIBR EcoPpu NT tag 1 | KanR, FnCas12a with Tag1-intron, NT crRNA
pSIBR EcoPpu NT tag 2 | KanR, FnCas12a with Tag2-intron, NT crRNA
pSIBR EcoPpu NT tag 3 | KanR, FnCas12a with Tag3-intron, NT crRNA
pSIBR EcoPpu NT tag 4 | KanR, FnCas12a with Tag4-intron, NT crRNA
pSIBR EcoPpu NT no intron | KanR, WT FnCas12a, NT crRNA
pSIBR EcoPpu T EndA tag 1 | KanR, FnCas12a with Tag1-intron, EndA T crRNA
pSIBR EcoPpu T EndA tag 2 | KanR, FnCas12a with Tag2-intron, EndA T crRNA
pSIBR EcoPpu T EndA tag 3 | KanR, FnCas12a with Tag3-intron, EndA T crRNA
pSIBR EcoPpu T EndA tag 4 | KanR, FnCas12a with Tag4-intron, EndA T crRNA
pSIBR EcoPpu T EndA no intron | KanR, WT FnCas12a, EndA T crRNA

[0238] In more detail, electrocompetent P. putida cells were transformed (2.5 kV, 200 Ω, 25 µF) with 200 ng plasmid and recovered in 1 mL LB for 2 hours at 30 °C. The culture was then serially diluted and drop plated on selective (50 µg mL⁻¹ kanamycin) LB agar plates in the presence or absence of 2 mM theophylline. The agar plates were incubated at 30 °C for 24 hours and the CFUs were counted.

[0239] FIG. 17 shows the efficacy of the tagged introns in controlling the activity of FnCas12a in P. putida. P. putida was transformed with a series of plasmids (listed in Table 5), and the transformants were serially diluted and drop plated on selective (kanamycin) LB media with or without 2 mM theophylline. As in E. coli (Example 5 above), only the transformants bearing the T crRNA and plated on selective media with theophylline showed reduced CFUs. In P. putida, however, the effect of induction was more pronounced than in E. coli, as no CFUs could be observed for transformants bearing the T crRNA and plated on selective media with theophylline. The intron-less controls did not yield any CFUs on media with or without theophylline, reflecting constitutive expression of FnCas12a and hence constitutive targeting. Collectively, the tagged-intron constructs work efficiently in P. putida and can be used for genome engineering.

Example 8: Efficient Homologous Recombination in P. putida Using Tagged-T4 td Intron Variants

[0240] Further genome editing experiments were conducted to knock out the FlgM gene of P. putida. A repair template (1125 bp) bearing approximately 500 bp homology arms upstream and downstream of the FlgM gene was included on the plasmids. The repair template was introduced into each of the four tagged-intron-FnCas12a variants, along with either the T crRNA for counterselection or the NT crRNA as a control. The plasmids are listed in Table 6. Plasmids were transformed into P. putida by electroporation, and the transformed cells were recovered in LB medium for 2 hours before plating on LB agar plates containing 50 µg mL⁻¹ kanamycin and 2 mM theophylline. Plates were incubated at 30 °C overnight, and the resulting colonies were screened by colony PCR for knock-out of the FlgM gene. FIG. 18 shows the knock-out efficiency of the FlgM gene in P. putida. Tag1 to Tag4 represent the four different tagged-intron variant plasmids, containing either the T (targeting) or the NT (non-targeting) crRNA. Transformants bearing the Tag4 intron-FnCas12a variant showed 70% editing efficiency, whereas transformants bearing the Tag3, Tag2 or Tag1 intron-FnCas12a variant showed 36.6%, 39.3% and 0% editing efficiency, respectively. In contrast, transformants bearing plasmids with an NT crRNA were all WT.

TABLE 6
Plasmids used for homologous recombination in P. putida.
Plasmid name | Description and relevant characteristics
pSIBR EcoPpu NT FlgM HA tag 1 | KanR, FnCas12a with Tag1-intron, NT crRNA, Homologous arms for FlgM
pSIBR EcoPpu NT FlgM HA tag 2 | KanR, FnCas12a with Tag2-intron, NT crRNA, Homologous arms for FlgM
pSIBR EcoPpu NT FlgM HA tag 3 | KanR, FnCas12a with Tag3-intron, NT crRNA, Homologous arms for FlgM
pSIBR EcoPpu NT FlgM HA tag 4 | KanR, FnCas12a with Tag4-intron, NT crRNA, Homologous arms for FlgM
pSIBR EcoPpu T FlgM HA tag 1 | KanR, FnCas12a with Tag1-intron, EndA T crRNA, Homologous arms for FlgM
pSIBR EcoPpu T FlgM HA tag 2 | KanR, FnCas12a with Tag2-intron, EndA T crRNA, Homologous arms for FlgM
pSIBR EcoPpu T FlgM HA tag 3 | KanR, FnCas12a with Tag3-intron, EndA T crRNA, Homologous arms for FlgM
pSIBR EcoPpu T FlgM HA tag 4 | KanR, FnCas12a with Tag4-intron, EndA T crRNA, Homologous arms for FlgM

Example 9: Using a Theophylline Induced Self-Splicing Intron to Control the Expression of Cas12a and Knock Out the SprF Essential Gene in the Non-Model Organism Flavobacterium IR1

[0241] Flavobacterium IR1 is a non-model organism known for its iridescent colour (see Johansen, V., et al., (2018) Genetic manipulation of structural colour in bacterial colonies Proceedings of the National Academy of Sciences 115 (11): 2652-2657; and Schertel, L., G. T. et al., (2020) Complex photonic response reveals three-dimensional self-organization of structural coloured bacterial colonies Journal of the Royal Society Interface 17 (166): 20200196). The lack of genomic tools and the low HR efficiency of IR1 are currently the main bottlenecks limiting the fundamental characterization and commercial exploitation of this phenomenon (e.g. the development of new paints). As IR1 is a recently discovered non-model organism, no inducible promoters have been characterized for it. Therefore, CRISPR-Cas cannot be controlled in IR1 without a promoter-independent regulatory system such as the one disclosed herein.

[0242] To establish controllable genetic engineering tools for IR1, plasmids were constructed by inserting the 300 bp self-splicing aptazyme intron of Thompson et al., (2002) Supra into the fncas12a gene to provide a module, and subsequently inserting this module into two editing plasmids, yielding pSIBRFnCas12a_sprF_HR_NT (non-targeting spacer) and pSIBRFnCas12a_sprF_HR_S3 (spacer targeting the sprF gene). For this, the theophylline-dependent T4 td intron was introduced into the ORF of FnCas12a. The insertion position was identified using the algorithm of Example 11 and is illustrated in FIG. 18. As shown in FIG. 19, mutations were introduced into the FnCas12a sequence flanking the 5′ and 3′ ends of the intron in order to maintain the self-splicing activity of the ribozyme. FIG. 19 also shows a map of the plasmid containing the FnCas12a gene disrupted by the T4 td intron. Below the plasmid map, the 5′ and 3′ exons are shown without (upper) and with (lower) silent mutations. The silent mutations were made to suit positions 7, 4 and +296 of the exon. The right-hand side of FIG. 19 shows the FnCas12a mRNA containing the T4 td intron. In the presence of theophylline, the intron forms a secondary structure that induces self-splicing and results in a correct mRNA, which expresses functional FnCas12a.

[0243] The constructed plasmids were then transformed into IR1 and cultured following the experimental design shown in FIG. 20. An adapted method can be used, as shown in FIG. 22. This setting was applied to both pRiboFnCas12a_sprF_HR_NT and pRiboFnCas12a_sprF_HR_S3. Each treatment (variation in incubation time) was performed in duplicate.

[0244] Prior to theophylline induction, liquid cultures of IR1 transformed with pSIBRFnCas12a_sprF_HR_S3 and incubated for 0, 24 and 48 h showed no obvious growth (data not shown). Correspondingly, no colonies were obtained after plating these cultures following theophylline induction in liquid culture. In contrast, IR1 transformed with the non-targeting plasmid pSIBRFnCas12aFb_sprF_HR_NT showed growth after 24 and 48 h of incubation, both before and after induction with theophylline.

[0245] FIG. 21A is a schematic representation of the SprF gene deletion achieved in the genome of IR1. FIG. 21B shows Flavobacterium sp. IR1 cells transformed with pSIBRFnCas12a_sprF_S3_HR (left) and pSIBRFnCas12a_NT_HR (right) and grown on ASWBC agar. The loss of structural colour in the colonies is suggested to result from SprF gene deletion. FIG. 21C shows agarose gel electrophoresis of colony PCR products from colonies 1-16 of Flavobacterium sp. IR1 cells transformed with pSIBRFnCas12a_sprF_S3_HR after 72 h incubation, in duplicate. In culture 1 (left), all tested colonies were knock-outs (100% editing efficiency), whereas in culture 2 (right), one colony was a knock-out mutant and 6 colonies were mixed knock-out and wild-type. FIG. 21D shows agarose gel electrophoresis of colony PCR products from colonies 1-16 of Flavobacterium sp. IR1 cells transformed with pSIBRFnCas12a_sprF_S3_HR after 96 h incubation, in duplicate. In both cultures, 16 out of 16 colonies were SprF knock-out mutants (100%). The colonies were obtained by plating the transformants grown for 72 or 96 h in liquid culture, followed by induction with 2 mM theophylline in liquid culture and an additional 24 h incubation. A band at 3038 bp indicates the correct SprF mutant, corresponding to deletion of the 999 bp SprF gene. The last lane is the negative (wild-type) control, which corresponds to a 4037 bp DNA fragment.
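The genotype calls above rest on simple amplicon arithmetic: the wild-type locus gives 4037 bp, and deleting the 999 bp SprF gene leaves 3038 bp. The sketch below turns band sizes into calls and an editing efficiency. The sizing tolerance, the function names and the example lanes are hypothetical, and whether "mixed" colonies count as edited is not specified above, so it is left as an explicit flag.

```python
# Amplicon sizes stated for the SprF locus in [0245].
WT_AMPLICON_BP = 4037   # intact (wild-type) SprF locus
DELETION_BP = 999       # size of the deleted SprF gene
KO_AMPLICON_BP = WT_AMPLICON_BP - DELETION_BP  # 3038 bp expected after deletion


def call_genotype(band_sizes_bp, tolerance_bp=100):
    """Call one colony from its colony-PCR band sizes (tolerance is hypothetical)."""
    has_ko = any(abs(b - KO_AMPLICON_BP) <= tolerance_bp for b in band_sizes_bp)
    has_wt = any(abs(b - WT_AMPLICON_BP) <= tolerance_bp for b in band_sizes_bp)
    if has_ko and has_wt:
        return "mixed"
    if has_ko:
        return "knockout"
    if has_wt:
        return "wild-type"
    return "no call"


def editing_efficiency(calls, count_mixed=False):
    """Percentage of screened colonies called as edited."""
    if not calls:
        raise ValueError("no colonies screened")
    edited = sum(c == "knockout" or (count_mixed and c == "mixed") for c in calls)
    return 100.0 * edited / len(calls)


# Hypothetical gel readout for four colonies (not data from FIG. 21).
lanes = [[3040], [3035, 4030], [4037], [3038]]
calls = [call_genotype(bands) for bands in lanes]
print(calls)
print(editing_efficiency(calls), editing_efficiency(calls, count_mixed=True))
```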

[0246] Interestingly, after 72 h and 96 h incubation, cultures transformed with pSIBRFnCas12a_sprF_HR_S3 started to show some growth (data not shown). Likewise, colonies were also obtained when these cultures were plated after theophylline induction. As shown in FIG. 21B, the structure and colour of the colonies transformed with pSIBRFnCas12a_sprF_HR_S3 after 72 h and 96 h were drastically reduced compared with the colonies transformed with pSIBRFnCas12aFb_sprF_HR_NT. This was the expected phenotype for disruption/knock-out of the target gene, sprF, as previously reported (Johansen, V., et al., (2018) Supra). As shown in FIG. 21C, PCR screening of these colonies showed that correctly edited mutants were obtained after 72 h incubation, with editing efficiencies from 43% up to 100%. As shown in FIG. 21D, after 96 h incubation the editing efficiency of the Cas system was maintained at 100% in both replicates. Two colonies that appeared to be knock-outs were further confirmed by Sanger sequencing, as shown in both FIGS. 21C and 21D.

Example 10: Highly Efficient Homologous Recombination in the Non-Model Organism Flavobacterium IR1

[0247] To demonstrate the inefficient HR mechanism of IR1, the organism was transformed by electroporation with a plasmid expressing an intron-less (WT) FnCas12a under a constitutive promoter (OmpA-P) and a T crRNA targeting the SprF gene under the constitutive promoter HU-P. A repair template (2963 bp) for knocking out the SprF gene through HR was also included on the plasmid, resulting in the final plasmid pFnCas12aFb_sprF_HR_T. As a control, the crRNA was replaced with an NT crRNA, resulting in pFnCas12aFb_sprF_HR_NT. The pCP11 empty vector was also used as an indicator of transformation efficiency.

[0248] IR1 electrocompetent cells were prepared as follows. IR1 was grown overnight in 10 mL of ASW at 25 °C, 200 rpm. The overnight culture was used to inoculate 2 × 100 mL ASW broth in 500 mL baffled flasks, which were grown until they reached an OD600 of 0.3. The cells were then harvested by centrifugation at 4000 rpm for 10 minutes at 4 °C. The cells were washed twice with 1 volume of washing buffer (10 mM MgCl₂ and 5 mM CaCl₂) at 4 °C and once with 10% (v/v) glycerol (Gilchrist and Smit, (1991) Transformation of freshwater and marine caulobacters by electroporation Journal of Bacteriology 173.2: 921-925). The pellet was resuspended in 10% glycerol to 1/100 of the initial volume. Cells were divided into 100 µL aliquots in 1.5 mL Eppendorf tubes and stored at −80 °C until use.

[0249] IR1 electrocompetent cells were transformed with 1 µg µL⁻¹ plasmid in a 1-mm cuvette using the following settings: 1.5 kV, 200 Ω, 25 µF. 900 µL of ASW medium [5 g L⁻¹ peptone (Sigma #70173), 1 g L⁻¹ yeast extract (BD), 10 g L⁻¹ sea salt (Sel marin)] was added immediately, and the cells were incubated at 25 °C for 4 hours for recovery. The cells were plated on ASWBC agar [ASW medium, 15 g L⁻¹ agar (Oxoid), 100 mg L⁻¹ nigrosine (Aldrich #198285) and 5 g L⁻¹ Kappa Carrageenan (Special Ingredients)] supplemented with 200 µg mL⁻¹ erythromycin and incubated at 25 °C for 2 to 3 days.

[0250] FIG. 22 shows the transformation efficiency and theophylline toxicity for Flavobacterium IR1. FIG. 22A shows the transformation efficiency using the empty vector pCP11, the control pFnCas12aFb-NT and the pFnCas12aFb-T (targeting the SprF gene) plasmids. As predicted, targeting the SprF gene with pFnCas12aFb_sprF_HR_S3 did not yield any viable colonies, even after recovery for 4 hours, whereas an average of 52 CFUs/µg was obtained with the pFnCas12aFb_sprF_HR_NT control and 176 CFUs/µg with the pCP11 empty vector.

[0251] Clearly, the constitutive expression of WT FnCas12a and the T crRNA, together with the inefficient HR machinery of IR1, resulted in targeting of the IR1 genome and cell death. To overcome this limitation, we propose using the four T4 td intron-FnCas12a variants (as developed for E. coli and P. putida), which should provide tight, controlled expression of FnCas12a and allow HR to precede counterselection.

[0252] Because theophylline uptake by IR1 appears to be a prerequisite, a toxicity assay was carried out on the growth of IR1 with varying theophylline concentrations (0, 0.1, 2, 5 and 10 mM) over 24 hours at 25 °C. FIG. 22B shows theophylline toxicity at 0, 0.1, 2 and 10 mM theophylline in the growth medium of IR1. The arrow indicates the time when theophylline was added to the medium. Theophylline concentrations of up to 2 mM did not significantly affect the growth of IR1. However, 5 and 10 mM theophylline decreased the growth of IR1, indicating toxicity but also uptake of theophylline. Therefore, 2 mM theophylline was used to induce intron splicing in subsequent experiments.

[0253] To achieve efficient HR in IR1, the WT FnCas12a in pFnCas12aFb_sprF_HR_S3 was replaced with the four T4 td intron-FnCas12a variants developed previously for E. coli and P. putida, resulting in the plasmids listed in Table 7. As a control, the WT FnCas12a of pFnCas12aFb_sprF_HR_NT was replaced with the same four variants (Table 7). In addition, an improved method was developed to increase the number of colonies obtained, as this increases the chances of obtaining knock-outs. FIG. 23 shows schematically the procedure for obtaining knock-outs in Flavobacterium IR1. 2 µg of plasmid were transformed into electrocompetent Flavobacterium IR1 and recovered in 1 mL ASW medium for 4 hours at 25 °C with shaking at 200 rpm. The recovered culture was then inoculated into 10 mL ASW selective medium (erythromycin) and grown at 25 °C with shaking at 200 rpm for a total of 120 hours. Every 24 hours, a 1 mL sample was taken and centrifuged at 4700 rpm for 5 min, and the concentrated cell pellet was plated on selective ASW agar plates containing 2 mM theophylline and incubated for 48 h at 25 °C. The resulting colonies were then screened by colony PCR for the presence or absence of the target gene.

[0254] FIG. 14 shows the editing efficiencies for the SprF gene of Flavobacterium IR1 obtained with the four different tagged-intron variants. The editing efficiencies are indicated in the bar graphs. Colonies (if any) were analysed for knock-out of the SprF gene by colony PCR every 24 hours. The table below the bar graph indicates the editing efficiency per biological replicate. Colony PCR showed knock-out efficiencies of up to 100% when the Tag1-, Tag3- and Tag4-intron variants were used. The NT controls yielded only WT colonies, confirming the effectiveness of the tool in counterselecting for correct knock-outs.

TABLE 7
Plasmids used for homologous recombination in Flavobacterium IR1.
Plasmid name | Description and relevant characteristics
pSIBR Flavo NT SprF HA tag 1 | SpecR for E. coli, ErmR for Flavo, FnCas12a with Tag1-intron, NT crRNA, Homologous arms for FlgM
pSIBR Flavo NT SprF HA tag 2 | SpecR for E. coli, ErmR for Flavo, FnCas12a with Tag2-intron, NT crRNA, Homologous arms for FlgM
pSIBR Flavo NT SprF HA tag 3 | SpecR for E. coli, ErmR for Flavo, FnCas12a with Tag3-intron, NT crRNA, Homologous arms for FlgM
pSIBR Flavo NT SprF HA tag 4 | SpecR for E. coli, ErmR for Flavo, FnCas12a with Tag4-intron, NT crRNA, Homologous arms for FlgM
pSIBR Flavo T SprF HA tag 1 | SpecR for E. coli, ErmR for Flavo, FnCas12a with Tag1-intron, T crRNA, Homologous arms for FlgM
pSIBR Flavo T SprF HA tag 2 | SpecR for E. coli, ErmR for Flavo, FnCas12a with Tag2-intron, T crRNA, Homologous arms for FlgM
pSIBR Flavo T SprF HA tag 3 | SpecR for E. coli, ErmR for Flavo, FnCas12a with Tag3-intron, T crRNA, Homologous arms for FlgM
pSIBR Flavo T SprF HA tag 4 | SpecR for E. coli, ErmR for Flavo, FnCas12a with Tag4-intron, T crRNA, Homologous arms for FlgM

Example 11: Python Script Used to Find Insertion Sites

[0255] With the following script, the user supplies the sequence of the gene of interest as a file, and the script returns possible insertion sites for the T4 td intron. The insertion sites will require point mutations to be introduced when inserting the intron at the target site. Multiple sites may be returned; a site near the beginning of the gene is recommended, to eliminate potential function of partially produced proteins. The script reads three comma-separated files from the working directory: Codon.csv (codon, amino acid), TdScoreTable.csv (binding-type string, score) and NEB RE.csv (enzyme name, recognition sequence).

TABLE-US-00008
# Python 3 listing: finds candidate intron insertion sites and restriction sites in a coding sequence.
import csv

# IUPAC ambiguity codes and the bases each one matches
IUPAC_MATCH = {'R': 'AG', 'Y': 'CT', 'S': 'GC', 'W': 'AT', 'K': 'GT', 'M': 'AC',
               'B': 'CGT', 'D': 'AGT', 'H': 'ACT', 'V': 'ACG'}

# IUPAC complements
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', 'R': 'Y', 'Y': 'R',
              'S': 'S', 'W': 'W', 'K': 'M', 'M': 'K', 'B': 'V', 'V': 'B',
              'D': 'H', 'H': 'D', 'N': 'N'}


def findbindingtype(Q, S):
    """Classify a query/subject base pair: (P)air, (W)obble, (M)ismatch, X if either is N."""
    if Q == 'N' or S == 'N':
        return 'X'
    if (Q, S) in (('T', 'A'), ('C', 'G'), ('A', 'T'), ('G', 'C')):
        return 'P'
    if (Q, S) in (('T', 'G'), ('G', 'T')):
        return 'W'
    return 'M'


def permutatelist(X):
    """Every concatenation that takes one item from each list in X (Cartesian product)."""
    Y = ['']
    for i in X:
        Z = Y
        Y = []
        for j in i:
            for prefix in Z:
                Y.append(prefix + j)
    return Y


def matchsense(q, s):
    """Return 1.0 if subject base s is matched by (IUPAC) query base q, otherwise 100.0."""
    if q == s or q == 'N' or s in IUPAC_MATCH.get(q, ''):
        return 1.000
    return 100.000


def RevComp(sequence):
    """Reverse complement of an IUPAC DNA sequence; unknown characters are dropped."""
    return ''.join(COMPLEMENT.get(base, '') for base in reversed(sequence))


# -------------- Input Subject -------------------------
# Read the subject DNA, keeping only A, C, G and T characters.
SubDNA = []
FN = input('DNA File name = ')
with open(FN) as f:
    for r in f:
        for c in r:
            if c.upper() in ('G', 'A', 'T', 'C'):
                SubDNA.append(c.upper())

# -------------- Input Query ---------------------------
#          8    7    6    5    4    3    2    1    1    2    3    4
QueDNA = ['N', 'T', 'G', 'A', 'G', 'T', 'C', 'C', 'G', 'G', 'A', 'G', 'T']  # Native
QueDNA = ['N', 'T', 'G', 'A', 'G', 'T', 'C', 'C', 'G', 'G', 'A', 'G']  # Simplified
# Query functions:
# (P)air, (W)obbly, (N)onbinding
# (F)ixed (P)air, (F)ixed (W)obbly, (F)ixed (N)onbinding

# -------------- Import Codon Table --------------------
# Codon.csv rows: codon, amino acid (64 rows)
AA = []
Codon = []
with open('Codon.csv') as csvfile:
    for r in csv.reader(csvfile, delimiter=',', quotechar='|'):
        Codon.append(r[0])
        AA.append(r[1])

# -------------- Translate Subject ---------------------
SubPro = []
for i in range(len(SubDNA) // 3):
    qcodon = SubDNA[3*i] + SubDNA[3*i + 1] + SubDNA[3*i + 2]
    for j in range(64):
        if Codon[j] == qcodon:
            SubPro.append(AA[j])

# --------------- Define Peptide -----------------------
# Peptide length whose codons cover the query in all three frames
PepLen = (len(QueDNA) + 4) // 3
TBS_DNA = []
TBS_Pos = []
for s in range(len(SubPro) - PepLen - 1):  # Subject protein start
    Peptide = SubPro[s:s + PepLen]
    # --------------- Reverse Translate --------------------
    mRNA = [[Codon[j] for j in range(64) if AA[j] == i] for i in Peptide]
    RevTrans = permutatelist(mRNA)
    TBS_DNA.extend(RevTrans)
    TBS_Pos.extend([s] * len(RevTrans))

# --------------- Test binding type --------------------
print('Binding Type analysis')
ScoreTable_BP = []
ScoreTable_Score = []
with open('TdScoreTable.csv') as csvfile:
    for r in csv.reader(csvfile, delimiter=',', quotechar='|'):
        ScoreTable_BP.append(r[0])
        ScoreTable_Score.append(float(r[1]))  # assumes a numeric score column
DNA_List = []
Pos_List = []
Score_List = []
Pep_List = []
Frame_List = []
for i in range(len(TBS_DNA)):  # Subject list
    Subject = TBS_DNA[i]
    for f in range(3):  # Frame select
        BP = ''
        for j in range(len(QueDNA)):
            BP = BP + findbindingtype(QueDNA[j], Subject[j + f])
        Score = 0
        for k in range(len(ScoreTable_BP)):
            if BP == ScoreTable_BP[k]:
                Score = ScoreTable_Score[k]
        DNA_List.append(Subject)
        Pos_List.append(TBS_Pos[i] + 1)
        Score_List.append(Score)
        Frame_List.append(f + 1)
        Pep_List.append(''.join(SubPro[TBS_Pos[i]:TBS_Pos[i] + PepLen]))

# --------------- Select 5 best ------------------------
Result = [[Pos_List[row], DNA_List[row], Pep_List[row], Score_List[row], Frame_List[row]]
          for row in range(len(Pos_List))]
# Sort by position, highest score first within each position (both sorts are stable)
Result2 = sorted(sorted(Result, key=lambda r: r[3], reverse=True), key=lambda r: r[0])
Result3 = []
A = None
count = 0
for row in Result2:
    if row[0] != A:  # new position: restart the count
        A = row[0]
        count = 0
    if count < 5:
        Result3.append(row)
        count = count + 1
print(Result3)

# --------------- Export to CSV ------------------------
print('Writing intron sites to file')
with open(FN + '.csv', 'w', newline='') as F:
    Writer = csv.writer(F, delimiter=',')
    Writer.writerow(['Position', 'DNA', 'Protein', 'Score', 'Frame'])
    Writer.writerows(Result3)

# ------------------ Find the Restriction Enzymes ------------------
# ------------------ Import RE Sequences ---------------------------
PreREName = []
PreRESeq = []
with open('NEB RE.csv') as csvfile:
    for r in csv.reader(csvfile, delimiter=',', quotechar='|'):
        PreREName.append(r[0])
        PreRESeq.append(r[1])

# ------------------ Find RE in DNA Sequence -----------------------
# Keep enzymes that cut the subject DNA at most three times and record
# their existing sites as codon numbers.
print('Analysing current restriction endonuclease sites')
REName = []
RESeq = []
REPresent = []
PreREPresent = []
for e in range(len(PreRESeq)):
    QueDNA = list(PreRESeq[e])
    s = 0
    Sites = ''
    for i in range(len(SubDNA) - len(QueDNA) + 1):
        score = 0
        for j in range(len(QueDNA)):
            score = score + matchsense(QueDNA[j], SubDNA[i + j])
        if score == len(QueDNA):
            if s == 0:
                Sites = str(i // 3 + 1)
            elif s <= 3:
                Sites = Sites + ', ' + str(i // 3 + 1)
            s = s + 1
    PreREPresent.append(Sites if s > 0 else 'N/A')
    if s <= 3:
        REPresent.append(PreREPresent[e])
        REName.append(PreREName[e])
        RESeq.append(PreRESeq[e])

# ------------------ Find RE in Protein Sequence -------------------
# For each enzyme, frame and window, choose for every residue the first codon
# compatible with the recognition site; a hit requires every residue to match.
ResultName = []
ResultPos = []
ResultDNA = []
ResultProtein = []
ResultFrame = []
ResultOtherRE = []
print('Analysing potential restriction endonuclease sites')
for e in range(len(RESeq)):
    QueDNA = ['N', 'N', 'N'] + list(RESeq[e]) + ['N', 'N', 'N', 'N']
    nAA = (len(QueDNA) - 3) // 3  # residues spanned by the padded site
    for f in range(3):
        for i in range(len(SubPro) - nAA + 1):  # Subject amino acid start
            currentDNA = ''
            currentProtein = ''
            currentScore = 0
            m = 0  # residues with a compatible codon
            for j in range(nAA):  # Subject amino acid
                SubAA = SubPro[i + j]
                maxscore = 0
                for k in range(64):  # 64 codons/AA
                    if SubAA == AA[k]:
                        score = 0
                        QueCodon = Codon[k]
                        for I in range(3):
                            score = score + matchsense(QueDNA[3*j + 3 + I - f], QueCodon[I])
                        if score < 100 and score > maxscore:
                            maxscore = score
                            maxcodon = k
                if maxscore > 0:
                    m = m + 1
                    currentDNA = currentDNA + Codon[maxcodon]
                    currentProtein = currentProtein + AA[maxcodon]
                    currentScore = currentScore + maxscore
            if m == nAA:
                ResultName.append(REName[e])
                ResultPos.append(i + 1)
                ResultDNA.append(currentDNA)
                ResultProtein.append(currentProtein)
                ResultFrame.append(f + 1)
                ResultOtherRE.append(REPresent[e])

print('Writing RE sites to file')
with open(FN + '-RE.csv', 'w', newline='') as F:
    Writer = csv.writer(F, delimiter=',')
    Writer.writerow(['Enzyme', 'Position', 'DNA Seq', 'AA Seq', 'Frame', 'RE sites present'])
    Writer.writerows(zip(ResultName, ResultPos, ResultDNA, ResultProtein, ResultFrame, ResultOtherRE))
print('Done')
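The reverse-translation step of the script (permutatelist applied to the synonymous codons of each residue) is a Cartesian product over codon choices. A compact sketch of the same idea with itertools.product follows; it assumes the standard genetic code is built in memory instead of being read from Codon.csv, and the names CODON_TABLE, codons_for and reverse_translate are illustrative, not part of the original script.

```python
from itertools import product

# Standard genetic code, codons enumerated with T, C, A, G at each position.
# Assumption: Codon.csv in the script holds the same 64 codon/amino-acid pairs.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons_for(aa):
    """All codons encoding one amino acid (one-letter code, '*' for stop)."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]


def reverse_translate(peptide):
    """Every DNA sequence encoding `peptide`: the same set permutatelist(mRNA) builds."""
    return ["".join(combo) for combo in product(*(codons_for(aa) for aa in peptide))]


print(len(reverse_translate("MK")))  # 1 codon for M x 2 codons for K
print(reverse_translate("MK"))
```

The number of candidates grows as the product of codon degeneracies (for example 6 x 6 = 36 for "LR"), which is why the script limits each peptide window to a few residues.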

Example 12: Using the SIBR T4 td Intron for Inducible Gene Expression in the Eukaryotic Model Organism Baker's Yeast (Saccharomyces cerevisiae)

[0256] To demonstrate the functionality and applicability of SIBR in eukaryotic systems, we transferred SIBR into the eukaryotic model organism baker's yeast (Saccharomyces cerevisiae) and used it to control the expression of the FnCas12a protein.

[0257] To control the activity of FnCas12a, we sought to disrupt the encoded protein by placing SIBR within its coding sequence. To this end, using the knowledge acquired in Examples 1, 2 and 3 and the script generated in Example 4 and/or 11, we introduced SIBR before the RuvC I domain at amino acid position 859 (FIG. 25). Following the script of Example 4 and/or 11, the 5′ exonic sequence of the intron was 5′-AAAGAGTCGGT-3′ (SEQ ID NO: 76), so as not to disturb the amino acid sequence of FnCas12a upon excision of the intron (FIG. 25). The 3′ exonic sequence of the intron was 5′-CTT-3′ (SEQ ID NO: 77), likewise maintaining the amino acid sequence of FnCas12a upon excision of the intron (FIG. 25). Accordingly, two plasmids were constructed that contained the modified intron (either with or without the theophylline aptamer) at amino acid position 859 of FnCas12a. The FnCas12a+intron gene constructs were expressed from the constitutive TEF1 promoter. These plasmids were named PL-319 (FnCas12a+intron without aptamer) and PL-320 (FnCas12a+intron with aptamer). As a positive control for targeting, another plasmid was used (pUDE731; Addgene plasmid #103008) in which wild-type FnCas12a (with no intron in its sequence) was constitutively expressed from the TEF1 promoter.
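The design rule used here, placing the intron where the native coding sequence immediately upstream and downstream already equals the required 5′ and 3′ exonic sequences, can be sketched as a simple check before insertion. Only the exonic sequences come from SEQ ID NO: 76 and 77; the coding-sequence fragment, the placeholder intron and the helper names insert_intron and splice are hypothetical, and splicing is idealised as exact removal of the intron.

```python
EXON5 = "AAAGAGTCGGT"  # 5' exonic sequence from SEQ ID NO: 76
EXON3 = "CTT"          # 3' exonic sequence from SEQ ID NO: 77


def insert_intron(cds, site, intron, exon5=EXON5, exon3=EXON3):
    """Insert `intron` before index `site` of `cds` after checking both flanks.

    The native bases just upstream of `site` must already equal `exon5` and
    those just downstream must equal `exon3`; otherwise ValueError is raised,
    because excision would then leave a changed coding sequence behind.
    """
    upstream = cds[site - len(exon5):site] if site >= len(exon5) else ""
    if upstream != exon5 or cds[site:site + len(exon3)] != exon3:
        raise ValueError("native sequence at this site does not match the exonic sequences")
    return cds[:site] + intron + cds[site:]


def splice(pre_mrna, intron):
    """Idealised precise self-splicing: remove the first copy of `intron`."""
    start = pre_mrna.index(intron)
    return pre_mrna[:start] + pre_mrna[start + len(intron):]


# Hypothetical coding-sequence fragment and a placeholder intron
fragment = "GCA" + EXON5 + EXON3 + "GGA"
intron = "N" * 10
pre_mrna = insert_intron(fragment, 3 + len(EXON5), intron)
assert splice(pre_mrna, intron) == fragment  # excision restores the original sequence
```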

[0258] PL-319, PL-320 or pUDE731 was co-transformed into S. cerevisiae together with either a plasmid containing a non-targeting spacer (PL-207) or a plasmid containing a targeting spacer (PL-074). The targeting spacer targeted the ADE2 gene. S. cerevisiae was transformed using the LiAc/SS carrier DNA/PEG method of Gietz and Schiestl (Gietz, R. D. and Schiestl, R. H., 2007. High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nature Protocols, 2(1), pp. 31-34), with 500 ng of each plasmid per transformation. After transformation, the yeast cells were recovered in YPD medium for 3 hours at 30 °C, serially diluted in PBS and plated on drop-out minimal agar medium omitting uracil (for the selection of PL-319, PL-320 or pUDE731) (1.7 g/L bacto-yeast nitrogen base without amino acids and without ammonium sulfate; 1 g/L monosodium glutamate; 20 g/L glucose; 20 g/L agar) containing 200 μg/mL Geneticin (G418 sulfate) (for the selection of the targeting and non-targeting plasmids PL-074 and PL-207) and different concentrations of theophylline (0, 5, 10 or 20 mM).
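The plate recipes above translate into the following masses per litre of medium. This worked sketch assumes anhydrous theophylline (C7H8N4O2, molar mass about 180.16 g/mol) and a hypothetical 1 L batch; the helper names are illustrative, and the hydrate form or the actual batch volume would change the numbers.

```python
THEOPHYLLINE_MW = 180.16  # g/mol, anhydrous theophylline (C7H8N4O2)


def grams_needed(conc_mM, volume_L, mw=THEOPHYLLINE_MW):
    """Grams of solute for a `conc_mM` millimolar solution of `volume_L` litres."""
    return conc_mM / 1000 * volume_L * mw


def g418_mg(conc_ug_per_mL, volume_L):
    """Milligrams of G418 for a ug/mL working concentration in `volume_L` litres.

    ug/mL x (volume_L x 1000 mL) gives ug; dividing by 1000 gives mg, so the
    two factors of 1000 cancel and the result is conc x volume_L.
    """
    return conc_ug_per_mL * volume_L


for conc in (5, 10, 20):
    print(f"{conc} mM theophylline: {grams_needed(conc, 1.0):.3f} g per litre of agar")
print(f"G418 at 200 ug/mL: {g418_mg(200, 1.0):.0f} mg per litre of agar")
```

The G418 figure refers to active antibiotic; commercial G418 sulfate powders have a lot-specific potency, so the weighed amount is usually corrected for it.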

[0259] The results of this experiment are depicted in FIG. 26. S. cerevisiae cells co-transformed with PL-319, PL-320 or pUDE731 and the non-targeting plasmid PL-207 formed colonies up to the 10⁻² dilution, regardless of the presence or absence of theophylline. In all these cases, the colonies were of comparable size. As expected, when pUDE731 was co-transformed with the targeting plasmid PL-074, almost no colonies were formed, regardless of the presence or absence of theophylline. Similarly, when PL-319 was co-transformed with PL-074, a reduced number of colonies was formed regardless of the presence or absence of theophylline. This indicates that the intron is indeed able to self-splice out of the FnCas12a mRNA, yielding a functional FnCas12a that is able to target and cleave the target site. In contrast, when PL-320 was co-transformed with PL-074, normal-size colonies were formed only when theophylline was omitted from the agar medium; when theophylline (5, 10 or 20 mM) was included, a reduced number of colonies was formed. The effect of the inducer dose is also apparent, as fewer and smaller colonies are visible with increasing theophylline concentration. This result indicates that inducible excision of the intron can be achieved in eukaryotes such as S. cerevisiae.

Example 13: Turning any Group I Intron into a SIBR System

[0260] As noted herein above, Group I introns, like the T4 td intron, form core secondary structures consisting of multiple paired regions. In principle, any Group I intron can be turned into a Self-splicing Intron Based Riboswitch (SIBR) according to the invention by following a stepwise approach similar to the one described in this patent, for example as set out below.

[0261] As a first step, a library of mutant 5′ and 3′ exonic sequences is developed, since the 5′ and 3′ exonic sequences of Group I introns interact with the intron sequence and affect the secondary and tertiary structure of the intron. This mutant library will serve as the basis for defining the effect of the 5′ and 3′ exonic sequences on the splicing efficiency of the intron. Moreover, this library will contain introns with a range of splicing efficiencies (from low to high). It is likely that the mutant intron library will contain introns with better splicing efficiency than the wild-type intron, similar to the results observed in Examples 1-3. This library will also allow the intron of interest to be transferred into the open reading frame of any gene of interest without disturbing the amino acid sequence of the target gene/protein, for example by applying the script of Example 4 and/or 11.
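A library of mutant exonic sequences is conveniently specified as a degenerate template in which the positions to be varied carry IUPAC ambiguity codes such as N. The sketch below expands such a template into every concrete variant; the template "TNNTNGGGT" and the function name expand_degenerate are hypothetical illustrations, not the libraries used in Examples 1-3.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (DNA alphabet)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(template):
    """Return every concrete sequence encoded by a degenerate IUPAC template."""
    choices = [IUPAC[base] for base in template.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical exon template with three randomised positions
library = expand_degenerate("TNNTNGGGT")
print(len(library))  # 4 ** 3 variants
```

Library size grows as 4 to the power of the number of N positions, which in practice bounds how many positions can be randomised while still screening every variant.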

[0262] Next, to achieve inducible control over the splicing of the intron, an aptamer moiety that responds to a specific small molecule (e.g. theophylline) is introduced at one or more pairing (P) domains of the intron. For example, as described by Thompson et al., 2002, and as also shown in Examples 5 and 7, the theophylline aptamer is introduced at the P6 domain of the T4 td intron, turning it into an inducible self-splicing gene regulator. Another example is described by Kertsburg and Soukup, 2002 (Nucleic Acids Research, Volume 30, Issue 21, 1 Nov. 2002, pages 4599-4606), who turned the Tetrahymena group I intron into an inducible self-splicing intron by replacing the P6 domain, the P8 domain, or both, with a theophylline aptamer. Similar approaches (to that of Thompson et al., 2002 and that of Kertsburg and Soukup, 2002) can be taken for any other Group I intron, in which one of its P domains is altered to contain an aptamer moiety that responds to a specific small molecule and can consequently control the splicing of the intron.
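At the sequence level, introducing an aptamer at a pairing domain amounts to replacing (or interrupting) a defined stretch of the intron. A minimal sketch, assuming the domain boundaries are already known from a secondary-structure model; the toy sequence, the coordinates and the "[APTAMER]" placeholder are hypothetical and are not the P6 coordinates of the T4 td intron.

```python
def replace_domain(intron, start, end, aptamer):
    """Replace intron[start:end] (0-based, end exclusive) with `aptamer`.

    Passing start == end inserts the aptamer without removing any bases.
    """
    if not 0 <= start <= end <= len(intron):
        raise ValueError("domain coordinates fall outside the intron")
    return intron[:start] + aptamer + intron[end:]


# Hypothetical example: swap positions 4-7 of a toy RNA for an aptamer placeholder
toy_intron = "GGGAAACCCUUUGGG"
engineered = replace_domain(toy_intron, 4, 8, "[APTAMER]")
print(engineered)
```

The sketch only performs the sequence edit; whether the engineered domain still folds, and whether ligand binding switches splicing, still has to be established experimentally as in the cited work.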

[0263] After generating the mutant intron library (mutations in the 5′ and 3′ exonic sequences) and achieving inducible control over the splicing of the intron (through the introduction of an aptamer into one of the P domains of the intron), the generated intron variants can be moved to the ATG start codon, or 5′ of the start codon, of the polynucleotide portion encoding the POI, or within the polynucleotide portion encoding the POI. When transferring the intron to a location of choice, care should be taken to avoid a frameshift after splicing, as this would result in a nonsense protein.
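The frameshift caveat can be checked mechanically: after excision, the spliced transcript should translate to exactly the same protein as the unmodified coding sequence. A sketch assuming the standard genetic code and idealised splicing; the coding sequence and the faulty splice product are hypothetical, and translate only accepts A, C, G and T.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(cds):
    """Translate in frame 1, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def splicing_preserves_protein(original_cds, spliced):
    """True if the spliced transcript encodes the same protein as the original CDS."""
    return translate(spliced) == translate(original_cds)


original = "ATGAAAGAGTCGGTCTTGCATTAA"              # hypothetical CDS
bad_splice = original[:14] + "A" + original[14:]  # one extra nucleotide left behind
print(translate(original), splicing_preserves_protein(original, original))
print(translate(bad_splice), splicing_preserves_protein(original, bad_splice))
```

Any net gain or loss of nucleotides that is not a multiple of three shifts every downstream codon, which is why the second check fails even though only one base differs.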

Example 14: Turning any Group II Intron into a SIBR System

[0264] Group II introns are found in higher eukaryotes (plants), lower eukaryotes (fungi and yeasts) and bacteria. Like Group I introns, Group II introns interrupt genes (separating them into 5′ and 3′ exons), and their excision (which releases a lariat rather than the linear product observed for Group I introns) allows a functional protein to be formed. Group II introns can self-splice, although some intron-encoded proteins (IEPs) may facilitate splicing by stabilizing the intron RNA structure. The 5′ and 3′ exonic sequences of Group II introns (called intron-binding sites, IBS) interact with conserved regions of the intron (called exon-binding sites, EBS) to form long-range tertiary interactions. These intron-exon interactions are necessary for splicing because they position the splice sites of the exons at the active site of the intron, enabling the transesterification reactions that mediate excision of the intron. Because splicing depends on these interactions, the exon sequences must be conserved, which limits the transfer of any Group II intron into an arbitrary gene of interest (GOI). To overcome this, an approach similar to the one developed in this patent for Group I introns can be used.
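Because the IBS and EBS pair antiparallel, whether a candidate exon sequence can still engage the intron comes down to a base-pairing check. The sketch below assumes Watson-Crick pairs and G·U wobble pairs both count (mirroring the P and W categories of findbindingtype in the script above) and uses hypothetical IBS/EBS sequences written 5′ to 3′ as RNA; pairing_profile is an illustrative name.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing_profile(ibs, ebs):
    """Classify each IBS position against the antiparallel EBS.

    Both sequences are given 5'->3' as RNA; the EBS is reversed so that
    ibs[i] faces ebs[-1 - i]. Returns P (pair), W (wobble) or M (mismatch)
    for each position.
    """
    if len(ibs) != len(ebs):
        raise ValueError("IBS and EBS must have the same length")
    profile = []
    for x, y in zip(ibs.upper(), reversed(ebs.upper())):
        if (x, y) in PAIRS:
            profile.append("P")
        elif (x, y) in WOBBLE:
            profile.append("W")
        else:
            profile.append("M")
    return "".join(profile)


print(pairing_profile("GUCAUC", "GAUGAC"))  # hypothetical native pair
print(pairing_profile("GUCGUC", "GAUGAC"))  # one IBS substitution
```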

[0265] First, a mutant library of Group II introns can be generated in which the exon sequences (IBS1 and IBS2 for the 5′ exon and IBS3 for the 3′ exon) are mutated. In some cases, especially when the IBS is heavily mutated, the EBS may need to be modified as well to maintain the IBS-EBS base-pairing necessary for the formation of long-range tertiary interactions. The self-splicing efficiency of the resulting mutant library is then assessed following an approach similar to that employed for LacZ in Examples 1-3. Notably, self-splicing efficiency can be assessed by any other in vitro or in vivo method (other than LacZ), as long as it can distinguish the formation of spliced products from unspliced products, or the formation and quantity of active protein from inactive protein.
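When the IBS is heavily mutated, one way to derive a compensatory EBS is to take the reverse complement of the new IBS, which restores full Watson-Crick pairing by construction. A sketch under that simplifying assumption (RNA alphabet, no deliberate wobble pairs); the sequences and the names compensatory_ebs and mutations_needed are hypothetical, and any EBS change would still need testing for its effect on folding of the intron itself.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def compensatory_ebs(ibs):
    """Return the EBS (5'->3') that pairs perfectly with `ibs` (5'->3')."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(ibs.upper()))


def mutations_needed(old_ebs, new_ebs):
    """List (position, old base, new base) for EBS positions that must change."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(old_ebs, new_ebs)) if a != b]


old_ebs = compensatory_ebs("GUCAUC")  # hypothetical native IBS
new_ebs = compensatory_ebs("GACAAC")  # hypothetical mutated IBS
print(old_ebs, new_ebs, mutations_needed(old_ebs, new_ebs))
```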

[0266] Where the Group II intron mutant library is assayed through a protein (similar to LacZ in Examples 1-3), the Group II intron can, for convenience, be placed directly after the ATG start codon so that the coding sequence of the protein is maintained. This approach was described in Examples 1-3 (LacZ) and Example 5 (FnCas12a). The Group II intron mutant library assay is expected to yield a range of efficiently and poorly splicing introns, which can then be used to modulate/tune the expression of the gene/RNA/protein of interest.
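A library assayed in this way can be summarised as a splicing efficiency per variant, S / (S + U), and ordered to pick variants spanning the desired range. The sketch assumes spliced (S) and unspliced (U) product amounts, such as band intensities or read counts, are available for each variant; the variant names and numbers are hypothetical placeholders, not measured data.

```python
def splicing_efficiency(spliced, unspliced):
    """Fraction of transcripts spliced: S / (S + U)."""
    total = spliced + unspliced
    if total <= 0:
        raise ValueError("no signal for this variant")
    return spliced / total


def rank_variants(measurements):
    """Sort variants from most to least efficient.

    `measurements` maps a variant name to a (spliced, unspliced) pair.
    """
    scored = {name: splicing_efficiency(s, u) for name, (s, u) in measurements.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical placeholder values for illustration only
example = {"variant_a": (90, 10), "variant_b": (30, 70), "variant_c": (60, 40)}
ranking = rank_variants(example)
print(ranking)
```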

[0267] After establishing the requirements for splicing as defined by the IBS-EBS interactions, a script similar to Example 4 and/or 11 can be developed that allows for transferring the mutant Group II intron to virtually any gene/RNA/protein of interest.
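The core of such a script is a scan of the target coding sequence for positions where the existing nucleotides already satisfy the exon requirements, so that no amino acid changes are needed. A simplified sketch, assuming the requirement can be written as a degenerate IUPAC motif that must match the native sequence exactly; unlike the script listed above (TABLE-US-00008), which reverse-translates each peptide and therefore also considers synonymous codons, it does not explore codon changes. The motifs, the sequence and the name find_sites are hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_sites(cds, motif):
    """Yield (nt_index, codon_number, frame) for every match of a degenerate motif.

    nt_index is 0-based, codon_number is 1-based, and frame (1-3) is the
    position of the motif's first base within its codon.
    """
    motif = motif.upper()
    cds = cds.upper()
    for i in range(len(cds) - len(motif) + 1):
        window = cds[i:i + len(motif)]
        if all(base in IUPAC[m] for base, m in zip(window, motif)):
            yield i, i // 3 + 1, i % 3 + 1


cds = "ATGAAAGAGTCGGTCTTGCATTAA"  # hypothetical coding sequence
print(list(find_sites(cds, "KTC")))
print(list(find_sites(cds, "GAGTCGGT")))
```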

[0268] If inducible self-splicing is required, an aptamer moiety that responds to a specific small molecule (e.g. theophylline) is introduced at one or more pairing (P) domains of the intron. To achieve this, the approaches developed and applied by Thompson et al. (2002), ibid., and Kertsburg and Soukup (2002), ibid., can be used.

[0269] After generating the mutant Group II intron library (mutations in the 5′ and 3′ exonic sequences) and achieving inducible control over the splicing of the Group II intron (through the introduction of an aptamer into one of the P domains of the intron), the generated Group II intron variants can be moved to the ATG start codon, or 5′ of the ATG start codon, of the polynucleotide portion encoding the POI, or within the polynucleotide portion encoding the POI. When transferring the intron to the location of choice, care should be taken to avoid a frameshift after splicing, as this would result in a nonsense protein.

Example 15: Turning any Group III Intron into a SIBR System

[0270] In general, Group III introns are short (approx. 100 nt) U-rich introns that are predominantly found in Euglena gracilis. Group III introns are considered streamlined versions of Group II introns, as they retain the 5′ splice site of Group II introns but lack the catalytic domain V and domains II-IV. They splice by a mechanism similar to that of Group II introns, in which IBS1 pairs with EBS1 to form long-range tertiary interactions and facilitate splicing (Hong, L. and Hallick, R. B., 1994. A group III intron is formed from domains of two individual group II introns. Genes & Development, 8(13), pp. 1589-1599). In principle, Group III introns can be turned into SIBR by changing/mutating the IBS-EBS interactions as described in Example 14 and by introducing a ligand-dependent aptamer into one of their domains (e.g. domain VI). The resulting mutant libraries can then be used to modulate the splicing efficiency of the introns.

[0271] Throughout the description and claims of this specification, the words "comprise" and "contain" and variations of them mean "including but not limited to", and they are not intended to (and do not) exclude other moieties, additives, components, integers or steps. Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.

[0272] Features, integers, characteristics, compounds, chemical moieties or groups described in conjunction with a particular aspect, embodiment or example of the invention are to be understood to be applicable to any other aspect, embodiment or example described herein unless incompatible therewith. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. The invention is not restricted to the details of any foregoing embodiments. The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.

[0273] The reader's attention is directed to all papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.

[0274] Nucleotide Sequences

TABLE-US-00009

Sequence of WT T4 td intron (SEQ ID NO: 44):
TAATTGAGGCCTGAGTATAAGGTGACTTATACTTGTAATCTATCTAAACGGGGAACCT
CTCTAGTAGACAATCCCGTGCTAAATTGTAGGACTTGCCCTTTAATAAATACTTCTATA
TTTAAAGAGGTATTTATGAAAAGCGGAATTTATCAGATTAAAAATACTTTAAACAATAAA
GTATATGTAGGAAGTGCTAAAGATTTTGAAAAGAGATGGAAGAGGCATTTTAAAGATT
TAGAAAAAGGATGCCATTCTTCTATAAAACTTCAGAGGTCTTTTAACAAACATGGTAAT
GTGTTTGAATGTTCTATTTTGGAAGAAATTCCATATGAGAAAGATTTGATTATTGAACG
AGAAAATTTTTGGATTAAAGAGCTTAATTCTAAAATTAATGGATACAATATTGCTGATG
CAACGTTTGGTGATACATGTTCTACGCATCCATTAAAAGAAGAAATTATTAAGAAACGT
TCTGAAACTGTTAAAGCTAAGATGCTTAAACTTGGACCTGATGGTCGGAAAGCTCTTT
ACAGTAAACCCGGAAGTAAAAACGGGCGTTGGAATCCAGAAACCCATAAGTTTTGTAA
GTGCGGTGTTCGCATACAAACTTCTGCTTATACTTGTAGTAAATGCAGAAATCGTTCA
GGTGAAAATAATTCATTCTTTAATCATAAGCATTCAGACATAACTAAATCTAAAATATCA
GAAAAGATGAAAGGTAAAAAGCCTAGTAATATTAAAAAGATTTCATGTGATGGGGTTAT
TTTTGATTGTGCAGCAGATGCAGCTAGACATTTTAAAATTTCGTCTGGATTAGTTACTT
ATCGTGTAAAATCTGATAAATGGAATTGGTTCTACATAAATGCCTAACGACTATCCCTT
TGGGGAGTAGGGTCAAGTGACTCGAAACGATAGACAACTTGCTTTAACAAGTTGGAG
ATATAGTCTGCTCTGCATGGTGACATGCAGCTGGATATAATTCCGGGGTAAGATTAAC
GACCTTATCTGAACATAATG

Sequence of the WT T4 td intron with the theophylline aptamer (SEQ ID NO: 45):
TTCTTGGGTTAATTGAGGCCTGAGTATAAGGTGACTTATACTTGTAATCTATCTAAACG
GGGAACCTCTCTAGTAGACAATCCCGTGCTAAATTGATACCAGCATCGTCTTGATGCC
CTTGGCAGCATAAATGCCTAACGACTATCCCTTTGGGGAGTAGGGTCAAGTGACTCG
AAACGATAGACAACTTGCTTTAACAAGTTGGAGATATAGTCTGCTCTGCATGGTGACA
TGCAGCTGGATATAATTCCGGGGTAAGATTAACGACCTTATCTGAACATAATGCTA

Sequence of the Tag1 T4 td intron with the theophylline aptamer (SEQ ID NO: 49):
TCCTCAGGTTAATTGAGGCCTGAGTATAAGGTGACTTATACTTGTAATCTATCTAAACG
GGGAACCTCTCTAGTAGACAATCCCGTGCTAAATTGATACCAGCATCGTCTTGATGCC
CTTGGCAGCATAAATGCCTAACGACTATCCCTTTGGGGAGTAGGGTCAAGTGACTCG
AAACGATAGACAACTTGCTTTAACAAGTTGGAGATATAGTCTGCTCTGCATGGTGACA
TGCAGCTGGATATAATTCCGGGGTAAGATTAACGACCTTATCTGAACATAATGCTA

Sequence of the Tag2 T4 td intron with the theophylline aptamer (SEQ ID NO: 46):
TCCTCGGGTTAATTGAGGCCTGAGTATAAGGTGACTTATACTTGTAATCTATCTAAAC
GGGGAACCTCTCTAGTAGACAATCCCGTGCTAAATTGATACCAGCATCGTCTTGATGC
CCTTGGCAGCATAAATGCCTAACGACTATCCCTTTGGGGAGTAGGGTCAAGTGACTC
GAAACGATAGACAACTTGCTTTAACAAGTTGGAGATATAGTCTGCTCTGCATGGTGAC
ATGCAGCTGGATATAATTCCGGGGTAAGATTAACGACCTTATCTGAACATAATGCTA

Sequence of the Tag3 T4 td intron with the theophylline aptamer (SEQ ID NO: 47):
TCCTTGGGTTAATTGAGGCCTGAGTATAAGGTGACTTATACTTGTAATCTATCTAAACG
GGGAACCTCTCTAGTAGACAATCCCGTGCTAAATTGATACCAGCATCGTCTTGATGCC
CTTGGCAGCATAAATGCCTAACGACTATCCCTTTGGGGAGTAGGGTCAAGTGACTCG
AAACGATAGACAACTTGCTTTAACAAGTTGGAGATATAGTCTGCTCTGCATGGTGACA
TGCAGCTGGATATAATTCCGGGGTAAGATTAACGACCTTATCTGAACATAATGCTA

Sequence of the Tag4 T4 td intron with the theophylline aptamer (SEQ ID NO: 48):
TCCTCTGGTTAATTGAGGCCTGAGTATAAGGTGACTTATACTTGTAATCTATCTAAACG
GGGAACCTCTCTAGTAGACAATCCCGTGCTAAATTGATACCAGCATCGTCTTGATGCC
CTTGGCAGCATAAATGCCTAACGACTATCCCTTTGGGGAGTAGGGTCAAGTGACTCG
AAACGATAGACAACTTGCTTTAACAAGTTGGAGATATAGTCTGCTCTGCATGGTGACA
TGCAGCTGGATATAATTCCGGGGTAAGATTAACGACCTTATCTGAACATAATGCTA
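Each aptamer-containing sequence listed above begins with nine nucleotides that precede the point at which SEQ ID NO: 44 starts (TAATTGAGG...); these leading nucleotides appear to be the exonic context that distinguishes the Tag variants. The small sketch below reports where each variant's leader differs from that of SEQ ID NO: 45; the leaders are copied from the table, while the dictionary and the function name differences are illustrative.

```python
# First nine nucleotides of each listed sequence (copied from the table above)
LEADERS = {
    "WT + aptamer (SEQ ID NO: 45)": "TTCTTGGGT",
    "Tag1 (SEQ ID NO: 49)": "TCCTCAGGT",
    "Tag2 (SEQ ID NO: 46)": "TCCTCGGGT",
    "Tag3 (SEQ ID NO: 47)": "TCCTTGGGT",
    "Tag4 (SEQ ID NO: 48)": "TCCTCTGGT",
}


def differences(reference, variant):
    """1-based positions where two equal-length sequences differ, as (pos, ref, var)."""
    return [(i + 1, r, v) for i, (r, v) in enumerate(zip(reference, variant)) if r != v]


reference = LEADERS["WT + aptamer (SEQ ID NO: 45)"]
for name, leader in LEADERS.items():
    print(name, differences(reference, leader))
```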

CPC classification (CHEMISTRY; METALLURGY): C12N1/20, C12N2310/20, C12N9/22, C12N15/70, C12N2840/002, C12N2800/80, C12N15/67, C12N15/11, C12N15/78, C12N15/902, C12N2800/101

International classification (CHEMISTRY; METALLURGY): C12N15/90, C12N15/11, C12N9/22, C12N15/67, C12N15/70, C12N15/78, C12N1/20
