CHEMICAL-INDUCIBLE GENOME ENGINEERING TECHNOLOGY

Abstract

The present disclosure refers to an endonuclease-based gene editing construct, wherein the construct comprises a CRISPR-associated endonuclease (such as Cas9 or Cpf1) or a derivative thereof and at least one or more hormone binding domains of the estrogen receptor (ERT2) or derivatives thereof. The present disclosure also describes a method of editing a genome of a host cell using the construct as disclosed herein, the method comprising transfecting the host cell with the nucleic acid sequence as defined herein and incubating the cell with an inducing agent.

Claims

1. An endonuclease-based gene editing construct, wherein the construct comprises the following components: (a) a CRISPR-associated endonuclease or a derivative thereof; and (b) at least one or more hormone binding domains of the estrogen receptor (ERT2) or derivatives thereof.

2. The construct of claim 1, wherein the at least one or more mutated hormone binding domains of the estrogen receptor (ERT2) are located upstream or located downstream of the CRISPR-associated endonuclease, wherein if there are two or more ERT2, the ERT2 are all located upstream, or all located downstream, or located both upstream and downstream of the CRISPR-associated endonuclease.

3. The construct of claim 1, wherein the mutated hormone binding domain of the estrogen receptor (ERT2) is SEQ ID NO: 4 or derivatives thereof.

4. The construct of claim 1, wherein the construct further comprises one or more selected from the group of one or more localization sequences, a binding tag, a self-cleaving peptide, and a selectable marker.

5.-7. (canceled)

8. The construct of claim 1 comprising the following formula (I): ##STR00003## wherein A is absent or is a mutated hormone binding domain of the estrogen receptor (ERT2), or a binding tag; wherein B is a localization sequence or derivatives thereof, or the binding tag, or absent; wherein both C.sub.1 and C.sub.2 are present or only C.sub.1 is present; wherein C.sub.1 and C.sub.2 are each independently selected from the group consisting of the localization sequence, derivatives thereof of the localization sequence, and a mutated hormone binding domain of the estrogen receptor (ERT2); wherein when C.sub.1 is one mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is another mutated hormone binding domain of the estrogen receptor (ERT2); wherein X is a CRISPR-associated endonuclease or a derivative thereof; wherein D is selected from the group consisting of a mutated hormone binding domain of the estrogen receptor (ERT2), the localization sequence and derivatives of the localization sequence; wherein E is absent or is selected from the group consisting of a mutated hormone binding domain of the estrogen receptor (ERT2) and a self-cleaving peptide; wherein F is absent or is selected from the group consisting of the self-cleaving peptide and a selectable marker; wherein G is absent or is the selectable marker; wherein L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.6, L.sup.7 and L.sup.8 are linker sequences; wherein at least one of the linker sequences is present; wherein each of the linkers sequences is independently between 1 to 25 amino acids long; wherein each linker sequence independently comprises natural or unnatural or a mixture of natural and unnatural amino acids; wherein, if any one or more of the linker sequences of L.sup.1 to L.sup.8 is absent, the neighbouring substituents are bound by a peptide bond; wherein L.sup.1 is selected from the group consisting of PR, TG, TGPGPGGS, TGPGPGGSAGDTTGPGTGPG and TGGGS; wherein L.sup.2 is 
selected from the group consisting of PRGGS, GGSPRGGS, PR, GGSPRGGS and TPGGPRGGS; wherein L.sup.3 is selected from the group consisting of PG, SGSEGA, GASGSKTPG, SGSETPGTSESAGA, SGSETPGTGPGGA, SESATPESGA, GTSESATPESGGA, GGSGGSGA, GA, GGGS, TPESGA, SGSETPGTGA, SGSETPGTSEGA, PAG, PAGGGS, SGSETPGTPGGA, TPESGPGGA and GASGS; wherein L.sup.4 is GGGS; wherein L.sup.5 is PAG or PAGGGS; wherein L.sup.6 is GA; wherein L.sup.7 and L.sup.8 are independently selected from the linkers as disclosed in any of L.sup.1 to L.sup.6.

9. The construct of claim 8, wherein a) A is absent, B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent; b) A is the mutated hormone binding domain of the estrogen receptor (ERT2), B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent; c) A is the binding tag, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent; d) D is the localization sequence, E is the self-cleaving peptide, F is the selectable marker and G is absent; e) D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker; f) D and E are each one mutated hormone binding domain of the estrogen receptor (ERT2); g) D is the mutated hormone binding domain of the estrogen receptor (ERT2) and E is absent; h) A is the mutated hormone binding domain of the estrogen receptor (ERT2), B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the self-cleaving peptide, F is the selectable marker and G is absent; i) A is the binding tag, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the self-cleaving peptide, F is the selectable marker and G is absent; j) A is absent, B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker; k) A is the mutated hormone binding domain of the estrogen receptor (ERT2), B is the binding tag, 
C.sub.1 is the localization sequence and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker; l) wherein A is the binding tag, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker; m) the linker sequences comprise of the amino acids A, E, G, P, S and T; or n) the linker sequences consist of the amino acids A, E, G, P, S and T.
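As an informal illustration of the linker constraints recited in claims 8 and 9 (a length of 1 to 25 amino acids and, under option n), a composition limited to A, E, G, P, S and T), the following minimal Python sketch checks a candidate linker. The function names are hypothetical, and the sketch assumes linkers are written as one-letter codes of natural amino acids only; it does not model unnatural amino acids and is not part of the claimed subject matter.

```python
# Illustrative only: helper names are hypothetical and not taken from the disclosure.
AEGPST = frozenset("AEGPST")


def linker_length_ok(linker: str, min_len: int = 1, max_len: int = 25) -> bool:
    """Return True if the linker is between min_len and max_len residues long."""
    return min_len <= len(linker) <= max_len


def consists_of_aegpst(linker: str) -> bool:
    """Return True if every residue of a non-empty linker is A, E, G, P, S or T."""
    return len(linker) > 0 and set(linker) <= AEGPST


# Example linkers copied from the L.sup.3 list recited in claim 8.
for linker in ("SGSETPGTSESAGA", "GGSGGSGA", "GASGSKTPG"):
    print(linker, linker_length_ok(linker), consists_of_aegpst(linker))
```

Under these assumptions, GASGSKTPG satisfies the length limit but fails the composition check because it contains K.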

10.-20. (canceled)

21. The construct of claim 1 comprising the following formula (II): ##STR00004## wherein B is a localization sequence or derivatives thereof, or the binding tag; wherein both C.sub.1 and C.sub.2 are present or only C.sub.1 is present; wherein C.sub.1 and C.sub.2 are each independently selected from the group consisting of the localization sequence, derivatives thereof of the localization sequence, and a mutated hormone binding domain of the estrogen receptor (ERT2); wherein when C.sub.1 is one mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is another mutated hormone binding domain of the estrogen receptor (ERT2); wherein X is a CRISPR-associated endonuclease or a derivative thereof; wherein D is selected from the group consisting of a mutated hormone binding domain of the estrogen receptor (ERT2), the localization sequence and derivatives of the localization sequence; wherein E is absent or is selected from the group consisting of a mutated hormone binding domain of the estrogen receptor (ERT2) and a self-cleaving peptide; wherein F is absent or is selected from the group consisting of the self-cleaving peptide, the mutated hormone binding domain of the estrogen receptor (ERT2) and a selectable marker; wherein G is absent or is the selectable marker; wherein L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.7 and L.sup.8 are linker sequences; wherein at least one of the linker sequences is present; wherein each of the linkers sequences is independently between 1 to 25 amino acids long; wherein each linker sequence independently comprises natural or unnatural or a mixture of natural and unnatural amino acids; wherein the linker sequences comprise the amino acids A, E, G, P, S and T; wherein, if any one or more of the linker sequences of L.sup.1 to L.sup.5 and L.sup.7 to L.sup.8 is absent, the neighbouring substituents are bound by a peptide bond; wherein L.sup.1 is selected from the group consisting of PR, TG, TGPGPGGS, 
TGPGPGGSAGDTTGPGTGPG, TGPGGS, TGPGGSAGDTTGPGGS and TGGGS; wherein L.sup.2 is selected from the group consisting of PRGGS, GGSPRGGS, PR, GGSPRGGS and TPGGPRGGS; wherein L.sup.3 is selected from the group consisting of PG, SGSEGA, GASGSKTPG, SGSETPGTSESAGA, SGSETPGTGPGGA, SESATPESGA, GTSESATPESGGA, GGSGGSGA, GA, GGGS, TPESGA, SGSETPGTGA, SGSETPGTSEGA, PAG, PAGGGS, SGSETPGTPGGA, TPESGPGGA and GASGS; wherein L.sup.4 is GGGS; wherein L.sup.5 and L.sup.7 are independently PAG, SGS or PAGGGS; wherein L.sup.8 is selected from the linkers as disclosed in any of L.sup.1 to L.sup.5 and L.sup.7.

22. The construct of claim 21, wherein a) the linker sequences comprise of the amino acids A, E, G, P, S and T; b) the linker sequences consist of the amino acids A, E, G, P, S and T; c) B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent; d) D is a localization sequence and E and F are each a mutated hormone binding domain of the estrogen receptor (ERT2); e) A is absent, B is localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is absent, X is a CRISPR-associated endonuclease or a derivative thereof, D is localization sequence, E is a mutated hormone binding domain of the estrogen receptor (ERT2) and F is absent; f) B is a localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is absent, X is a CRISPR-associated endonuclease or a derivative thereof, D is localization sequence and E and F are both each a mutated hormone binding domain of the estrogen receptor (ERT2); or g) B is a localization sequence, C.sub.1 and C.sub.2 are each independently a mutated hormone binding domain of the estrogen receptor (ERT2), X is a CRISPR-associated endonuclease or a derivative thereof, D is localization sequence and E and F are both each a mutated hormone binding domain of the estrogen receptor (ERT2).

23.-28. (canceled)

29. The construct of claim 1, wherein the CRISPR-associated endonuclease, or derivative thereof, is selected from the group consisting of a wild type CRISPR-associated protein 9 (Cas9), a mutated CRISPR-associated protein 9 (Cas9), wherein the mutated CRISPR-associated protein 9 (Cas9) is functional; a wild type Cpf1 (CRISPR from Prevotella and Francisella 1) protein, and a mutated Cpf1 protein, wherein the mutated Cpf1 protein is functional.

30. The construct of claim 29, wherein a) the CRISPR-associated protein 9 (Cas9), or derivative thereof, is selected from the group consisting of Streptococcus pyogenes, Streptococcus thermophilus, Listeria innocua, Staphylococcus aureus and Neisseria meningitidis; b) the CRISPR-associated protein 9 (Cas9), or derivative thereof, has at least 95% sequence identity to SEQ ID NO: 1; c) the Cpf1 protein, or derivative thereof, is selected from the group consisting of Acidaminococcus, Lachnospiraceae, Parcubacteria, Butyrivibrio proteoclasticus, Peregrinibacteria, Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi, Smithella, Leptospira inadai, Francisella novicida, Candidatus Methanoplasma termitum and Eubacterium eligens; or d) the Cpf1 protein, or derivative thereof, has at least 95% sequence identity to SEQ ID NO: 2 or 3.

31.-33. (canceled)

34. The construct of claim 1, wherein a) the localization sequence is selected from the group consisting of nuclear localization sequence, mitochondrial localization sequence and derivatives thereof, optionally wherein the at least one or more nuclear localization sequences (NLS) are selected from the group consisting of Simian Vacuolating Virus 40 (SV40) Large T-antigen, Nucleoplasmin, Importin α, EGL-13, c-MYC, TUS, AR, PLSCR1, PEP, TPX2, RB, TP53, N1N2, PB2, CBP80, SRY, hnRNP A1, HRP1, Borna Disease Virus p10, Ty1 Integrase, and the Chelsky consensus sequence; b) at least one or more nuclear localization sequences (NLS) are monopartite or bipartite NLS; or c) at least one or more nuclear localization sequences (NLS) are classical NLS (cNLS) or proline-tyrosine (PY)-NLS.

35.-40. (canceled)

41. The construct of claim 8, wherein the binding tag is located at either the N-terminus or the C-terminus of the construct, or at both ends of the construct.

42. (canceled)

43. The construct of claim 8, wherein the binding tag is selected from the group consisting of a V5 epitope tag, a FLAG tag, a tandem FLAG-tag, a triple FLAG tag (3FLAG), a Human influenza hemagglutinin (HA) tag, a tandem HA tag, a triple HA tag (3HA), a sextuple Histidine tag (6HIS), biotin, c-MYC, a Glutathione-S-transferase (GST) tag, a Strep-tag, a Strep-tag II, a S-tag, a natural histidine affinity tag (HAT), a Calmodulin-binding peptide (CBP) tag, a Streptavidin-binding peptide (SBP) tag, a Chitin-binding domain, a Maltose-binding protein (MBP) and derivatives thereof.

44. The construct of claim 43, wherein the V5 epitope tag sequence is SEQ ID NO: 12 or a derivative thereof.

45. The construct of claim 8, wherein the self-cleaving peptide is a 2A self-cleaving peptide or a derivative thereof.

46. The construct of claim 45, wherein the 2A self-cleaving peptide is SEQ ID NO: 13 or a derivative thereof.

47.-50. (canceled)

51. The construct of claim 1, wherein the construct has at least 90% sequence identity to SEQ ID NOs: 15 to 74.

52. The construct of claim 1, wherein the construct has a sequence selected from the group consisting of SEQ ID NO: 37, SEQ ID NO: 74 and SEQ ID NO: 249.

53. A nucleic acid sequence encoding an endonuclease-based gene editing construct, wherein the construct comprises the following components: (a) a CRISPR-associated endonuclease or a derivative thereof; and (b) at least one or more hormone binding domains of the estrogen receptor (ERT2) or derivatives thereof.

54.-56. (canceled)

57. A method of editing a genome of a host cell using the construct of any one of the preceding claims, the method comprising: (a) transfecting the host cell with a nucleic acid sequence encoding an endonuclease-based gene editing construct, wherein the construct comprises the following components: (i) a CRISPR-associated endonuclease or a derivative thereof; and (ii) at least one or more hormone binding domains of the estrogen receptor (ERT2) or derivatives thereof; and (b) incubating the cell of operation (a) with an inducing agent.

58.-64. (canceled)

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] The invention will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:

[0014] FIG. 1 | Building and testing a 4-HT-inducible Cas9. (a) Schematic of the ERT2-based strategy. A fusion of Cas9 (red) and ERT2 (crescent shape) was predicted to be sequestered in the cytoplasm. However, in the presence of an antagonist such as 4-HT (circle), the enzyme could enter the nucleus, form a complex with a sgRNA (brown and green), and generate a double-stranded break (triangles) in the cell's DNA (black). (b) Architectures of different Cas9-ERT2 fusions tested. Top, the original wild-type Cas9 construct, which contains a V5 epitope tag and two NLSs joined to Cas9. The orange fluorescent protein (OFP) is separated from Cas9 by a 2A self-cleaving peptide. Other rows represent five distinct configurations of NLS, ERT2, and Cas9 that were evaluated in this study. (c) Extent of genome modification determined by the Surveyor cleavage assay. Four genomic loci were tested, with each locus evaluated in at least two biological replicates, giving a total of eight independent experiments. Boxplots indicate the range of INDELs obtained from all the experiments. Wild-type Cas9 generated robust DNA modifications at all the targeted loci with or without 4-HT. In contrast, variant E, an ERT2-Cas9-ERT2 fusion, exhibited low activity in the absence of 4-HT but significantly higher activity in the presence of the chemical (**P<0.05, Student's t-test). (d) Extent of genome modification determined by Illumina deep sequencing. Wildtype Cas9 exhibited robust genome editing activity independent of tamoxifen. Consistent with the results from the Surveyor assay, fusion of the ERT2 domain to both the N-terminus and C-terminus of Cas9 could render the endonuclease activity of the enzyme to be significantly dependent on tamoxifen (**P<0.05, Student's t-test).
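For readers unfamiliar with how a Surveyor cleavage assay yields an INDEL percentage, the sketch below shows one widely used estimate based on gel band intensities. This section does not state which formula was applied, so the formula, the function name, and the band intensities used in the example are assumptions made for illustration only.

```python
import math


def surveyor_indel_percent(uncut: float, cut1: float, cut2: float) -> float:
    """Estimate INDEL frequency (%) from Surveyor gel band intensities.

    Assumed (commonly used) estimate: 100 * (1 - sqrt(1 - f_cut)), where
    f_cut is the fraction of total band intensity found in the two cleavage bands.
    """
    total = uncut + cut1 + cut2
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    f_cut = (cut1 + cut2) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - f_cut))


# Hypothetical intensities (arbitrary units) for one lane.
print(round(surveyor_indel_percent(uncut=80.0, cut1=12.0, cut2=8.0), 1))
```

The square root reflects the fact that cleavage occurs only at heteroduplexes formed after re-annealing, so the cleaved fraction overstates the fraction of modified alleles.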

[0015] FIG. 2 | Optimization of the ERT2-Cas9-ERT2 architecture. (a) Left, placement of linkers and ERT2 copies evaluated in this study. Right, grouping of the 30 Cas9 variants tested on the basis of how they differ from the original variant E (see Supplementary Table 1 for details). OFP, orange fluorescent protein; 2A, self-cleaving peptide. (b) GFP disruption activity of the 30 Cas9 variants, wild-type Cas9, and the original variant E in the absence or presence of 4-HT. Boxes represent the range of values achieved by every variant within a particular group. Center lines, median; box limits, interquartile range; whiskers, 1.5× interquartile range. Each construct was tested in at least six biological replicates. n.s., not significant (P>0.25, Wilcoxon rank-sum test). (c) Detection of genome modification by Surveyor assay. Target site DNA was amplified from cells transfected with wild-type Cas9 or a Cas9 variant and treated with or without 4-HT (n=2 biological replicates per construct). n.s., not significant (P>0.25, Wilcoxon rank-sum test). (d) GFP reduction and INDEL percentages for the best-performing Cas9 variants. Each variant contained four copies of ERT2 and showed 4-HT-dependent activity, as assessed by GFP disruption assay (top), Surveyor assay (middle), and deep sequencing experiments (bottom). Transfection of wild-type Cas9 without and with the appropriate sgRNA served as negative and positive control, respectively.

[0016] FIG. 3 | Optimization of 4-HT treatment conditions. (a) Background activity of Cas9 variants 27, 29, and 30 at multiple genomic loci (P1 and P2, promoter sites; I1 and I2, intron sites), as determined by Surveyor assay to quantify INDEL frequency (n=1 replicate for variants 27 and 29; n=2 (VEGFA P1 and P2) or 3 (WAS I1 and I2, TAT, and FANCF) biological replicates for variant 30). (b) Intracellular localization of the Cas9 variants as determined by immunohistochemistry. Transfected HEK293 cells were untreated (0 h) or treated with 4-HT for 6 or 24 h before they were fixed and stained (n=7 biological replicates per construct and time point). At least 300 cells were counted for each sample. Although 4-HT treatment led to an increase in the percentage of cells containing nuclear protein for all three Cas9 constructs, variant 30 showed the lowest percentage of cells with nuclear Cas9 in the absence of 4-HT (**P<0.05, Wilcoxon rank-sum test); var, variant. Center lines, median; box limits, interquartile range; whiskers, 1.5× interquartile range. (c) Targeting efficiency of iCas (variant 30) across multiple genomic loci. Cells were analyzed by Surveyor assay after different durations of treatment with 1 μM 4-HT, applied 24 h after transfection (n≥2 biological replicates for each locus and time point; see Online Methods for details). (d) DNA-modification specificity of iCas with different durations of 4-HT treatment, analyzed by Surveyor assay for off-target modifications (off) at various loci. Blue indicates no cleavage observed; red indicates the presence of cleavage bands (see Table 2 for details).

[0017] FIG. 4 | Comparison of iCas with an alternative inducible promoter-based system. (a) Overview of experimental setup for the comparative study. STF3A cells were engineered to stably produce the transactivator protein (Tet-On 3G) required for a doxycycline-inducible promoter (P.sub.TRE3G) to be functional. The hexagons with small circles at their corners represent retroviruses used to stably integrate the transactivator gene into the genome of the STF3A cell line. The upper concentric circles denote plasmids encoding iCas, while the lower concentric circles denote plasmids encoding wild-type Cas9 under the control of P.sub.TRE3G. (b) Detection of genome modification at the CTNNB1 locus by Surveyor assay. Arrows indicate the expected cleavage bands. The full gel image is shown in FIG. 28. (c) Repression of Wnt signaling pathway assayed by a Wnt-responsive luciferase reporter. A plasmid encoding either iCas (light) or P.sub.TRE3G-Cas9 (dark) was transfected into STF3A-Tet-On cells with or without sgRNA. The transfected cells were treated with 4-HT or dox for 6 h and harvested after another 72 h. All luciferase readings were normalized to those from control samples (no sgRNA). Data represent mean ± s.d. of 5 biological replicates (**P<0.005, ***P<0.001, Student's t-test). (d) Expression of CCND1, a Wnt target gene, measured by quantitative real-time PCR. Data represent mean ± s.d. of 5 biological replicates (*P<0.05, Student's t-test).

[0018] FIG. 5 | Comparison of iCas with intein-Cas9 and split-Cas9. (a) Detection of genome modification at the EMX1 locus by Surveyor cleavage assay. Transfected HEK293 cells were treated with (+, solid lines) or without (−, dashed lines) chemical inducer and harvested at 12, 24, 48, 72, or 96 h after treatment. Data represent mean ± s.e.m. of 4 (12, 24, 48, and 96 h) or 8 (72 h) biological replicates. (b) Switching ratios at the EMX1 locus upon addition of inducer. *P<0.1, **P<0.05, Student's t-test. Center lines, median; box limits, interquartile range; whiskers, 1.5× interquartile range. (c,d) Surveyor cleavage assay evaluating the ability of iCas, intein-Cas9, and split-Cas9 to edit two genomic loci simultaneously after 12 (c) or 24 h (d) of inducer treatment. Arrows indicate the expected cleavage bands. (Full gel images are shown in FIG. 29.)
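The switching ratio of FIG. 5(b) is not defined in this section. A minimal sketch follows, assuming it is the ratio of INDEL frequency with inducer to INDEL frequency without inducer. The function name, the `floor` parameter (a hypothetical detection limit used to avoid division by zero), and the example values are illustrative assumptions rather than details from the disclosure.

```python
def switching_ratio(induced: float, uninduced: float, floor: float = 0.1) -> float:
    """Ratio of editing activity with inducer to activity without inducer.

    `floor` is a hypothetical detection limit (in % INDEL) substituted for
    uninduced values below it, so that undetectable background does not
    cause a division by zero.
    """
    if induced < 0 or uninduced < 0:
        raise ValueError("INDEL frequencies must be non-negative")
    return induced / max(uninduced, floor)


# Hypothetical INDEL frequencies (%) for one construct at one time point.
print(switching_ratio(induced=20.0, uninduced=2.0))
```

A higher ratio indicates tighter control: strong activity with the inducer and little background activity without it.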

[0019] FIG. 6 | Toggling the activity of iCas on and off. (a,b) Immunofluorescence images (a) and quantification (b) of iCas in HEK293-iCas cells either untreated (N.A.) or treated with 4-HT and then fixed at 0, 48, or 72 h after removal of 4-HT from the culture medium. DAPI, nuclear staining; V5, antibody specific for V5-tagged Cas9. Scale bar, 50 μm. In (b), at least 300 cells were counted for each sample and time point. Data represent mean ± s.d. from three biological replicates. n.s., not significant (P>0.25); **P<0.001, Student's t-test. (c) Surveyor assay detection of genome modification in HEK293 cells after two cycles of transfection and 4-HT treatment (4-HT 1 and 4-HT 2). A sgRNA targeting the WAS locus was transfected first, and a sgRNA targeting the ASXL2 locus second. Arrows indicate the expected cleavage bands. (The full gel image is shown in FIG. 30.)

[0020] FIG. 7 | Assessing the feasibility of an ERT2-based strategy to control the activity of Cas9. (a) Illumina deep sequencing was used to quantify the percentage of insertions and deletions (indels) generated at four targeted sites (n=2 biological replicates for each locus). Wild-type Cas9 exhibited robust genome editing activity independent of 4HT. In contrast, fusion of the ERT2 domain to both the N-terminus and C-terminus of Cas9 (variant E) rendered the endonuclease activity of the enzyme significantly dependent on 4HT (**P<0.05, Student's t-test). (b) The genome editing activity of variant E was evaluated with and without 1 μM 4HT by clonal Sanger sequencing (one replicate). Specifically, to estimate the INDEL frequency, PCR amplicons were cloned into the pUC19 vector and at least 24 clones were sequenced for each sample. Consistent with the results from Surveyor assays and deep sequencing, an increase in genome modification was observed upon addition of 4HT for each of the genomic loci tested. However, even in the absence of the chemical, variant E exhibited some leaky genome editing activity, indicating that the construct needed to be further optimized in order to reduce the background activity of the fusion protein.

[0021] FIG. 8 | Subcellular localization of an ERT2-Cas9-ERT2 fusion. Western blot analysis was performed to determine the subcellular distribution of both wild-type Cas9 and the Cas9 variant E (see FIG. 1b) in the presence or absence of 1 μM 4HT. Transfected HEK293 cells were separated into cytoplasmic and nuclear fractions using the REAP protocol. Both the wild-type Cas9 protein and variant E were tagged with a V5 epitope and thus could be readily detected using an anti-V5 antibody. 3PGDH served as a cytosolic marker, while total histone H3 served as a nuclear marker. Treatment with 1 μM 4HT for 24 hours caused a 3.4-fold increase in the nuclear-to-cytoplasmic ratio of the ERT2-Cas9-ERT2 protein but only a 1.2-fold increase for the wild-type Cas9 protein. W: whole cell lysate, N: nuclear fraction, C: cytoplasmic fraction.
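The fold increase in the nuclear-to-cytoplasmic ratio reported for FIG. 8 can be illustrated with the short sketch below. It assumes that band intensities have already been quantified and normalized to the respective fraction markers. The function names and the intensity values are hypothetical and are not measurements from the disclosure.

```python
def nuclear_to_cytoplasmic_ratio(nuclear: float, cytoplasmic: float) -> float:
    """Ratio of nuclear to cytoplasmic band intensity for one condition."""
    if cytoplasmic <= 0:
        raise ValueError("cytoplasmic intensity must be positive")
    return nuclear / cytoplasmic


def fold_change(treated_nuc: float, treated_cyt: float,
                untreated_nuc: float, untreated_cyt: float) -> float:
    """Fold change of the nuclear-to-cytoplasmic ratio upon treatment."""
    treated = nuclear_to_cytoplasmic_ratio(treated_nuc, treated_cyt)
    untreated = nuclear_to_cytoplasmic_ratio(untreated_nuc, untreated_cyt)
    if untreated <= 0:
        raise ValueError("untreated ratio must be positive")
    return treated / untreated


# Hypothetical normalized band intensities (arbitrary units).
print(round(fold_change(6.8, 2.0, 1.0, 1.0), 2))
```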

[0022] FIG. 9 | GFP disruption assay for evaluating different fusion proteins of Cas9 and ERT2. (a) shows a schematic illustrating the principle behind the GFP disruption assay. Fluorescent cells are transfected with a plasmid encoding a Cas9 variant and a sgRNA targeting the eGFP gene. Upon addition of 4HT, the Cas9 variant translocates into the nucleus and cleaves the targeted genomic locus, thereby stimulating the error-prone, non-homologous end-joining (NHEJ) pathway for DNA repair. If a frame shift mutation occurs, the cell will show a loss of fluorescence signal. (b) shows flow cytometry graphs depicting representative data from a multitude of flow cytometry experiments. Within the HEK293-GFP cells, there are two sub-populations, namely GFP-high and GFP-intermediate cells. When the HEK293-GFP cells are transfected with an eGFP-targeting sgRNA and an active Cas9 enzyme, they lose fluorescence, as shown by an increase in the proportion of cells that are GFP-intermediate. Hence, to determine the activity of a Cas9 variant in the presence or absence of 4HT, mean GFP fluorescence intensity was measured from at least 10,000 live single successfully transfected (OFP-positive) cells for every sample.

[0023] FIG. 10 | Individual results from the evaluation of 30 different fusions of Cas9 and ERT2. (a) shows line graphs depicting results of the GFP disruption assay. Light yellow background shading indicates variants with two copies of ERT2 each, light blue shading indicates variants with three copies of ERT2 each, and light red shading indicates variants with four copies of ERT2 each. The dotted horizontal lines depict the median reductions in GFP intensity. Every construct was tested in at least six biological replicates. In the absence of the inducer, most of the Cas9 variants with three or four ERT2 domains exhibit a lower reduction in GFP signal than the median (9.2%). (b) shows line graphs depicting results of the Surveyor cleavage assay. Within each colour background shading, the variants are ranked in decreasing order of INDEL frequency observed without 4HT. The blue dotted horizontal line represents the median INDEL frequency observed in the absence of the inducer (2.0%), while the orange dotted horizontal line represents the median INDEL frequency observed in the presence of the inducer (10.4%) (n=2 biological replicates per construct). Most of the Cas9 variants with three or four ERT2 domains exhibit lower background activity (no 4HT present) than the median. (c) shows line graphs depicting the results of deep sequencing experiments. The orange or blue dotted horizontal line indicates the median INDEL frequency measured with or without 4HT respectively (n=2 biological replicates per construct). Notably, all the Cas9 variants with four copies of ERT2 displayed lower levels of leaky background activity than the median (2.8%).

[0024] FIG. 11 | Assessing the effectiveness of different optimization strategies. Data is shown as box plots, depicting the performance of four classes of Cas9 variants (see FIG. 2a), which was evaluated by Illumina deep sequencing. INDEL frequency was quantified from high throughput sequencing of DNA amplified from the EMX1 target locus in the absence or presence of 4HT (n=2 biological replicates per construct). A no-guide RNA control was included to determine the background measurement error. Without 4HT, Group 3 and 4 variants exhibited an activity level that is not above background (n.s.: not significant, P>0.25, Wilcoxon rank sum test).

[0025] FIG. 12 | Comparison of results from GFP disruption assay, Surveyor cleavage assay, and deep sequencing. To determine how well the results from the different assays agree with one another, the performance of the 30 tested Cas9 variants was rank ordered in terms of leakiness and level of induced activity for each of the three assays. The difference in rank between any two assays for every Cas9 variant was then calculated and the data shown as column graphs. Notably, it was found that the distribution of rank differences is clearly non-random (P<0.05, Kolmogorov-Smirnov test) and that for most of the Cas9 variants, the relative rankings from at least two of the assays are in close agreement with one another.
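The rank comparison described for FIG. 12 can be sketched as follows: variants are ranked within each assay, and the per-variant difference in rank between two assays is then computed. The function names and example values are hypothetical. Ties are broken by input order here, which is an assumption, since the tie-handling rule is not stated in this section.

```python
from typing import Dict


def rank_order(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank variants from lowest (rank 1) to highest score.

    Ties keep their input order because Python's sort is stable.
    """
    ordered = sorted(scores, key=lambda name: scores[name])
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def rank_differences(assay_a: Dict[str, float],
                     assay_b: Dict[str, float]) -> Dict[str, int]:
    """Per-variant difference in rank between two assays (a minus b)."""
    ranks_a = rank_order(assay_a)
    ranks_b = rank_order(assay_b)
    return {name: ranks_a[name] - ranks_b[name] for name in ranks_a}


# Hypothetical leakiness values (activity without 4HT) from two assays.
gfp = {"v1": 5.0, "v2": 9.0, "v3": 2.0}
surveyor = {"v1": 1.0, "v2": 3.0, "v3": 2.0}
print(rank_differences(gfp, surveyor))
```

Rank differences clustered near zero indicate that the two assays order the variants similarly.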

[0026] FIG. 13 | Flowchart for identifying the best-performing Cas9 variants. A total of 30 Cas9 variants, divided into four groups based on their architecture (see FIG. 2a), were evaluated for leakiness in activity and cutting efficiency. The different fusion proteins were assessed using the GFP disruption assay, Surveyor assay, and deep sequencing. Only eight of the variants showed less background editing activity than the original ERT2-Cas9-ERT2 protein (variant E) across all experiments. Out of these, three variants, all of which contained four copies of ERT2, showed a clear increase in editing activity upon the addition of 4HT to a level that is above the leakiness of variant E in all experiments.
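The two-step selection summarized for FIG. 13 can be sketched as a simple filter. In this sketch a variant is kept only if, in every assay, its background activity is below that of variant E and its induced activity exceeds the background activity of variant E. The data layout, the function name, the strict comparisons used to stand in for a "clear increase", and all numbers are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# (activity without 4HT, activity with 4HT) for one assay
Measurement = Tuple[float, float]


def select_variants(variants: Dict[str, Dict[str, Measurement]],
                    reference: Dict[str, Measurement]) -> List[str]:
    """Return variants that pass both criteria in every assay of `reference`."""
    selected = []
    for name, assays in variants.items():
        # Step 1: less background activity than the reference in all assays.
        less_leaky = all(assays[a][0] < reference[a][0] for a in reference)
        # Step 2: induced activity above the reference's background in all assays.
        induced_above = all(assays[a][1] > reference[a][0] for a in reference)
        if less_leaky and induced_above:
            selected.append(name)
    return selected


# Hypothetical measurements; the reference stands in for variant E.
variant_e = {"gfp": (9.0, 22.0), "surveyor": (2.0, 10.0)}
candidates = {
    "v27": {"gfp": (3.0, 20.0), "surveyor": (1.0, 12.0)},
    "v05": {"gfp": (12.0, 25.0), "surveyor": (1.5, 14.0)},
    "v11": {"gfp": (2.0, 4.0), "surveyor": (0.5, 1.5)},
}
print(select_variants(candidates, variant_e))
```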

[0027] FIG. 14 | Effect of 4HT dose on EMX1 targeting efficiency, as shown in three-dimensional column graphs. (a) The extent of genome modification was quantified using the Surveyor cleavage assay. Various concentrations of 4HT over different treatment durations were tested for each of the three best-performing Cas9 variants (n=2 or 3 biological replicates for each data point, except for Variant 27's 16 hr and 48 hr time points where n=1 replicate). Editing activity generally increased with longer durations of 4HT treatment for all three variants. (b) The extent of genome modification was quantified using Illumina deep sequencing. Due to the high sensitivity of sequencing, INDELs could be detected within 2 hours of 4HT treatment for all three variants. Additionally, the extent of genome modification increased with longer periods of chemical treatment, agreeing with the results from the Surveyor assays (n=1 replicate for the 0 hour, 16 hour, and 24 hour time points; n=2 or 3 replicates for the 2 hour, 4 hour, 6 hour, and 8 hour time points).

[0028] FIG. 15 | Titration of 4HT concentration for optimal induction of genome editing activity. The extent of genome modification at the EMX1 locus was measured using (a) the Surveyor assay and (b) deep sequencing for three different concentrations of 4HT. Overall, treatment with 10 nM 4HT consistently resulted in lower INDEL frequencies than either 100 nM or 1000 nM 4HT (**P<0.005, ***P<0.001, Wilcoxon rank sum test), indicating that at least 100 nM 4HT should be used for maximum activation of the inducible genome editing system.

[0029] FIG. 16 | Genome modification at the FANCF locus in HEK293 cells, the data represented as images showing the results and the concentration of gene induction. 24 hours after transfection with Cas9 variant 27, 29, or 30, the cells were either harvested immediately (0 hour time point) or were exposed to 0 nM, 100 nM or 1000 nM 4HT for another 24 hours before genomic DNA was isolated and analysed by the Surveyor assay. Arrows indicate the expected cleavage bands. Regardless of the Cas9 variant used, strong cleavage bands were observed when the cells were treated with 100 nM or 1000 nM 4HT. However, in the absence of 4HT, no cleavage band was detected for variant 30, while cuts were observed for variants 27 and 29. Furthermore, the leakiness in editing activity became more pronounced over time, as indicated by the increase in INDEL frequency from 2.5% to 7.7% for variant 27 and from 2.9% to 3.5% for variant 29.

[0030] FIG. 17 | Intracellular localization of the Cas9 variants. Micrographs of representative images are shown. Immunohistochemical staining was utilised to determine the localization of the (ERT2)2-Cas9-(ERT2)2 proteins. 24 hours after transfection, the HEK293 cells were treated with 4HT for 0 hours, 6 hours, or 24 hours before being fixed and stained with anti-V5 antibody. Only the cells that were successfully stained (dark cells) were counted (scale bar=10 µm). Over 300 cells were counted for each sample and time point.

[0031] FIG. 18 | Genome modification at the TAT locus in different cancer cell lines. The efficiency of iCas was tested in the breast cancer cell line MCF7, as well as in the colorectal cancer cell lines DLD1 and HCT116, the results of which are shown as gel images. Based on the Surveyor assays, genome modifications were detected after the cells were treated with 1 µM 4HT for 6 hours. The INDEL frequency increased when the treatment duration was lengthened to 8 hours for all the cell lines tested. Arrows indicate the expected cleavage bands.

[0032] FIG. 19 | Specificity of iCas evaluated by Surveyor cleavage assay at multiple genomic loci. The specificity of iCas was tested using seven distinct guide RNAs (gRNAs), the results of which are shown in three-dimensional column graphs. It was found that iCas displayed variable specificity profiles for different gRNAs, which could be broadly divided into three groups: (a) highly specific, (b) moderately specific, and (c) unspecific. For the gRNAs targeting the EMX1 exonic locus and the second intronic site within the WAS gene, iCas exhibited almost no off-target effects for all durations of 4HT treatment tested. For the gRNAs targeting the first site within the VEGFA promoter, the TAT locus, and the FANCF locus, off-target effects were observed after 16 hours of 4HT treatment. For the gRNAs targeting the second site within the VEGFA promoter and the first intronic site within the WAS gene, off-target genome editing occurred at about the same time as the intended on-target genome modifications, regardless of how the duration of 4HT treatment was varied. (n≥2 replicates for each on-target site; n≥3 replicates for each EMX1 off-target site; n≥1 replicate for other off-target sites.)

[0033] FIG. 20 | Specificity of iCas evaluated by deep sequencing at multiple genomic loci. The gRNAs used were separated into three groups based on FIG. 17. The results from the deep sequencing experiments (n≥1 replicate for each data point) largely mirrored those obtained from the Surveyor assays. All the results are shown in three-dimensional column graphs. (a) The gRNAs targeting EMX1 and WAS intronic site 2 were highly specific. (b) The gRNAs targeting VEGFA promoter site 1 and TAT showed some off-target genome editing, which was generally less than the corresponding on-target modifications. However, unlike the Surveyor assay, only minimal off-target INDELs for the FANCF gRNA were observed using deep sequencing. (c) The gRNAs targeting VEGFA promoter site 2 and WAS intronic site 1 exhibited comparable on-target and off-target genome modifications, even with less than 8 hours of 4HT treatment. See Table 2 for details of the on-target and off-target sites.

[0034] FIG. 21 | Specificity of iCas in comparison with wildtype Cas9. To evaluate the specificity of wildtype Cas9 and iCas, (a) the Surveyor assay and (b) deep sequencing, respectively, were used to analyse two known off-target sites (Off 1 and Off 2) of the EMX1-targeting sgRNA, the data of which are shown as column graphs. From the Surveyor assay, cleavage bands were observed for wildtype Cas9 at both off-target sites as previously reported. However, the iCas system did not induce any observable genome modification at the two sites after 24 hours of treatment with 1 µM 4HT. The deep sequencing data showed that iCas produced INDELs at levels barely above background at both off-target sites, while wildtype Cas9 generated significantly higher INDEL frequencies (*P<0.005, Student's t-test). Error bars reflect the standard deviation from at least two biological replicates.

[0035] FIG. 22 | Comparison of iCas with PTRE3G-Cas9. (a) shows a schematic overview of the experimental setup for the comparative study. A previously reported STF3A cell line was engineered to stably produce the transactivator protein (Tet-On 3G) required for a functional doxycycline (dox)-inducible promoter (P.sub.TRE3G). The STF3A cells carry a TCF/Lef responsive luciferase reporter and also express high levels of Wnt3a. The hexagons with small circles at their corners represent retroviruses used to stably integrate the transactivator gene into the genome of the STF3A cell line. The upper concentric circles denote plasmids encoding iCas, while the lower concentric circles denote plasmids encoding wildtype Cas9 under the control of P.sub.TRE3G. (b) shows brightfield and fluorescence images demonstrating successful transfection of the cells and expression of the tdTomato gene. To evaluate the STF3A-TetOn cell line, the engineered cells were transfected with a plasmid carrying the tdTomato gene under the control of a doxycycline-inducible promoter. The cells exhibited a strong fluorescence signal upon treatment with doxycycline for 24 hours. In contrast, there was very little fluorescence signal in the absence of the chemical. Various concentrations of doxycycline, from 50 to 1000 ng/ml, were tested, all of which yielded similar fluorescence intensities (scale bar=400 µm).

[0036] FIG. 23 | Levels of β-catenin transcript and protein. (a) shows column graphs depicting the expression of β-catenin as assessed by quantitative real-time PCR (qRT-PCR). HEK293 cells were transfected with either iCas or PTRE3G-Cas9 and then treated with the corresponding inducer for 6 hours. Subsequently, they were harvested for analysis after another 72 hours. When the cells were co-transfected with iCas and a sgRNA targeting β-catenin, a significant decrease in the transcript level of β-catenin (*P<0.05, Student's t-test) was observed. Such a decrease was not observed for cells transfected with an EMX1-targeting sgRNA or with P.sub.TRE3G-Cas9 instead of iCas. Error bars reflect the standard deviation from at least three biological replicates. (b) shows images of Western blots, showing the detected levels of β-catenin protein. In cells that were co-transfected with iCas and a β-catenin-targeting sgRNA, the amount of β-catenin protein dropped to less than 20% of the original level. Such a large decrease was absent from cells that were not transfected with the β-catenin-targeting sgRNA or were transfected with PTRE3G-Cas9 instead of iCas.

[0037] FIG. 24 | Perturbation of Wnt signalling by iCas. The expression levels of two Wnt target genes, MYC and JUN, were measured using qRT-PCR and the data are shown as column graphs. All measurements were normalized to those of the control samples (no sgRNA). The expression of MYC and JUN was significantly down-regulated in STF3A-TetOn cells that were transfected with both a plasmid encoding iCas and a sgRNA targeting β-catenin (*P<0.05, **P<0.005, Student's t-test). Error bars reflect the standard deviation from at least three biological replicates.

[0038] FIG. 25 | Benchmarking three different conditional genome editing technologies by the Surveyor cleavage assay. (a) shows line plots depicting the change in % of INDELs present over time for various systems. The TAT and WAS genomic loci were targeted using iCas, intein-Cas9, or split-Cas9. Transfected HEK293 cells were treated with (solid lines) or without (dotted lines) the appropriate chemical inducer and harvested at 12, 24, 48, 72, or 96 hours after treatment. Upon activation, the iCas technology generated INDELS more rapidly than the other two systems. Error bars reflect the SEM from at least three biological replicates. (b) shows box plots depicting the switching ratios (extent of genome modification in the presence of inducer divided by extent of genome modification in the absence of inducer) at the TAT and WAS loci. Overall, the iCas system was turned on with comparable or higher efficiencies than intein-Cas9 and split-Cas9 upon addition of the appropriate inducer (*P<0.1, Student's t-test).
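The switching ratio defined for FIG. 25(b) is a simple quotient, and a minimal sketch of the calculation is given here for illustration only. The function name and the INDEL percentages in the usage line are hypothetical and are not measurements from this disclosure.

```python
def switching_ratio(indel_with_inducer: float, indel_without_inducer: float) -> float:
    """Extent of genome modification with inducer divided by extent without inducer.

    Both arguments are INDEL frequencies in the same unit (for example, percent).
    """
    if indel_with_inducer < 0 or indel_without_inducer < 0:
        raise ValueError("INDEL frequencies cannot be negative")
    if indel_without_inducer == 0:
        # No detectable background editing: the quotient is undefined.
        raise ValueError("switching ratio is undefined when background editing is zero")
    return indel_with_inducer / indel_without_inducer


# Hypothetical example: 20% INDELs with inducer, 5% without.
print(switching_ratio(20.0, 5.0))  # 4.0
```

A higher ratio indicates a larger difference between the induced state and the background (leaky) state at a given locus.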

[0039] FIG. 26 | Comparison of iCas with intein-Cas9 and split-Cas9. (a) shows line graphs depicting the quantification of INDEL frequency at the EMX1 (left panel), TAT (middle panel), and WAS loci (right panel) by deep sequencing. Transfected HEK293 cells were treated with (solid lines) or without (dotted lines) the appropriate chemical inducer and harvested at 12, 24, 48, 72, or 96 hours after treatment. The background activities of iCas and intein-Cas9 in the absence of 4HT were similar, but the iCas system exhibited higher editing activity upon addition of the inducer at all three loci tested. Additionally, although the split-Cas9 architecture appeared to have the lowest amount of leakiness, its activity was switched on more slowly than that of iCas and intein-Cas9 after addition of the appropriate inducer. Notably, the frequency of INDELs generated by split-Cas9 after approximately 96 hours of induction could be readily achieved by iCas after only 12 hours of induction. Error bars reflect the SEM from at least three biological replicates. (b) shows box plots depicting the switching ratios at different genomic loci. Based on the deep sequencing measurements, the iCas system was turned on more efficiently than intein-Cas9 at all the loci tested upon addition of 4HT, while the split-Cas9 architecture outperformed intein-Cas9 at the WAS locus (*P<0.1, **P<0.05, ***P<0.01, Student's t-test). (c) shows gel images evaluating the ability of iCas, intein-Cas9, or split-Cas9 to edit two genomic loci (EMX1 and TCF7) simultaneously upon 24 hours of inducer treatment. Based on data gathered from the Surveyor cleavage assay, genome modification was observed at both targeted loci for iCas. In contrast, intein-Cas9 produced cuts only at the EMX1 locus and not at the TCF7 locus, while no genome modification was detected for split-Cas9 at either of the targeted loci. Arrows indicate the expected cleavage bands.

[0040] FIG. 27 | Temporal switching of iCas activity. (a) shows gel images depicting the results of the integration of the iCas system into HEK293 cells using retroviral transduction. To assess the functionality of this cell line, a lentivirus expressing a sgRNA targeting one of the coding exons of the PARP4 gene was generated, and HEK293-iCas cells were infected with the virus. After puromycin selection and continuous passaging of the cells for at least two weeks, the cells were treated with or without 1 µM 4HT for 24 hours before being harvested for analysis using the Surveyor assay. Cleavage bands (indicated by arrows) were observed for the treated cells, but not for the untreated cells, indicating that the level of leakiness is sufficiently low to minimize unwanted genome editing in the absence of the inducer. (b) shows a schematic outlining the experiment to toggle the activity of iCas on-off-on. When the cells are treated with 4HT, the iCas enzyme is expected to translocate into the nucleus and be able to edit the DNA (these cells are depicted as yellow). However, when the inducer has been removed for more than 72 hours, the iCas protein is expected to translocate out of the nucleus, and thus no further editing is anticipated.

[0041] FIG. 28 | Levels of iCas transcript in a stable line and during transfection. Column graphs here show the quantification of the expression of iCas using qRT-PCR. The transcript level of the editing enzyme was found to be more than a hundredfold higher when an iCas-bearing plasmid was transfected into wildtype HEK293 cells than in the HEK293-iCas stable line. Cells were harvested 72 hours after transfection (n=3 biological replicates).

[0042] FIG. 29 | Effect of enzyme dosage on the level of background activity, shown as gel images. Different amounts of an iCas-bearing plasmid were transfected together with an EMX1-targeting sgRNA into HEK293 cells. These cells were harvested 96 hours after transfection for the Surveyor cleavage assay. DNA modification was clearly observed when 1 µg of plasmid was utilised, even in the absence of the inducer. Importantly, background editing activity at the EMX1 locus was noticeably reduced when 0.5 µg or 0.25 µg of plasmid was used instead. The switching ratio improved from 2.83 to 4.16 and 7.96, respectively. Hence, these results indicate that the extent of leakiness may be modulated by adjusting the dosage of the iCas enzyme. Arrows indicate the height of the expected cleavage bands.

[0043] FIG. 30 | Detection of genome modification at the β-catenin locus using the Surveyor assay, the results of which are shown as an agarose gel image. The uncropped full gel image corresponding to the image shown in FIG. 4a is shown.

[0044] FIG. 31 | Evaluating the ability of iCas, intein-Cas9, or split-Cas9 to edit two genomic loci simultaneously, the results of which are shown as agarose gel images. (a) Cells were treated with the appropriate inducer for 12 hours. The uncropped full gel images corresponding to FIG. 5c are shown. (b) Cells were treated with the appropriate inducer for 24 hours. The uncropped full gel images corresponding to FIG. 5d are shown.

[0045] FIG. 32 | Detection of genome modification using the Surveyor cleavage assay to demonstrate that iCas activity could be switched on and off repeatedly, as shown in the presented gel image. The uncropped full gel image corresponding to FIG. 6c is shown.

DETAILED DESCRIPTION OF THE PRESENT INVENTION

[0046] Recently, the development of genome editing technologies has opened up new avenues of biomedical research and holds the promise to accelerate knowledge discovery and drug development. The CRISPR-Cas9 system, for example, which is co-opted from bacteria, is particularly attractive because the elements that recognize the target genomic loci are simple single guide RNA (sgRNA) molecules, which bind the loci-of-interest by complementary base-pairing and are hence straightforward to design and synthesize. The sgRNA recruits the Cas9 nuclease to the DNA to create a double-stranded break. Much effort has been devoted to improving the specificity of the technology and various strategies have been proposed to mitigate off-target mutagenesis by the Cas9 enzyme.

[0047] In one aspect, the present invention refers to an endonuclease-based gene editing construct. As used herein, the term endonuclease(s) refers to enzymes that are capable of cleaving/restricting, that is inducing a strand break, in a section of a nucleic acid sequence. Depending on the type of endonuclease required, the endonuclease can be capable of cleaving within a single strand region of a nucleic acid sequence, a double strand region of a nucleic acid sequence or both. In general, endonucleases can be divided into 3 types, that is Type I, II and III, according to their mechanism of action. Type I and type III nucleases typically refer to large multi-subunit endonucleases that have both endonuclease and methylase activity (that is ATP [adenosine triphosphate] is required as a source of energy). Type II endonucleases, on the other hand, are simpler in structure and do not require an energy source such as ATP. The type of restriction site and specificity of the endonuclease to its particular restriction site, that is the site where the strand break is induced, varies between each endonuclease. It is also possible for an endonuclease to cleave the nucleic acid strand a number of base pairs upstream or downstream from the recognition site. For example, Type I endonucleases are known for cleaving random nucleic acid sequences up to 1000 or more base pairs upstream and/or downstream from the recognition site. Type III endonucleases, on the other hand, are known for cleaving nucleic acid sequences up to 25 or more base pairs from the recognition sites. Thus, in one example, the endonuclease is, but is not limited to, CRISPR-associated endonuclease, for example Cas9 and Cpf1, or derivatives thereof.

[0048] As used herein, the term CRISPR refers to Clustered Regularly Interspaced Short Palindromic Repeats, which are segments of prokaryotic DNA containing short repetitions of base sequences. Each repetition can be followed by short segments of spacer DNA within a sequence. The term Cas9 refers to CRISPR associated protein 9, which is an RNA-guided DNA endonuclease enzyme associated with the CRISPR type II adaptive immunity system in, for example, Streptococcus pyogenes, among other bacteria. S. pyogenes utilizes Cas9 to interrogate and cleave foreign DNA, such as invading bacteriophage DNA or plasmid DNA. Cas9 interrogates the foreign DNA by unwinding it and checking whether the foreign DNA is complementary to the 20 base pair spacer region of the guide RNA. If the interrogated DNA substrate is complementary to the 20 base pair spacer region of the guide RNA, Cas9 cleaves the invading DNA. Mechanistically speaking and without being bound by theory, the CRISPR-Cas9 mechanism has a number of parallels with the mechanism of RNA interference (RNAi) present in eukaryotes.

[0049] Thus, in one example, the CRISPR-associated endonuclease, or derivative thereof, is selected from the group consisting of a wild type CRISPR-associated protein 9 (Cas9), a mutated CRISPR-associated protein 9 (Cas9), a wild type Cpf1 (CRISPR from Prevotella and Francisella 1) protein, and a mutated Cpf1 protein. In the event where the protein is mutated, the mutant protein is to be functional. In another example, the CRISPR-associated protein 9 (Cas9), or derivative thereof, is selected from the group consisting of Streptococcus pyogenes, Streptococcus thermophilus, Listeria innocua, Staphylococcus aureus and Neisseria meningitidis. In yet another example, the CRISPR-associated protein 9 (Cas9), or derivative thereof, has at least 99%, at least 98%, at least 97%, at least 96%, at least 95%, at least 94%, at least 93%, at least 92%, at least 91%, at least 90%, at least 89%, at least 85%, at least 80%, or at least 75% sequence identity to SEQ ID NO: 1. In yet another example, the CRISPR-associated protein 9 (Cas9), or derivative thereof, has at least 95% sequence identity to SEQ ID NO: 1. In a further example, the Cpf1 protein, or derivative thereof, is selected from the group consisting of Acidaminococcus, Lachnospiraceae, Parcubacteria, Butyrivibrio proteoclasticus, Peregrinibacteria, Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi, Smithella, Leptospira inadai, Francisella novicida, Candidatus Methanoplasma termitum and Eubacterium eligens. In another example, the Cpf1 protein, or derivative thereof, has at least 99%, at least 98%, at least 97%, at least 96%, at least 95%, at least 94%, at least 93%, at least 92%, at least 91%, at least 90%, at least 89%, at least 85%, at least 80%, or at least 75% sequence identity to SEQ ID NO: 2 or 3.
In another example, the Cpf1 protein, or derivative thereof, has at least 95% sequence identity to SEQ ID NO: 2 or 3. The term sequence identity means that two nucleic acid or amino acid sequences are identical (i.e., on a nucleotide-by-nucleotide or residue-by-residue basis) over the comparison window. The percentage of sequence identity is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I) or residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the comparison window (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity. In light of the above, it is understood by a person skilled in the art what is meant by a sequence identity of, for example, at least 95%.
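The percentage calculation described above can be sketched as follows. This is an illustrative sketch only: it assumes the two sequences have already been optimally aligned by other means and are supplied as equal-length strings, with "-" marking alignment gaps. The function names and the treatment of a gap position as a non-matching position within the window are assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of sequence identity over a comparison window.

    The inputs are two pre-aligned sequences of equal length; the window
    size is taken to be that common length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # Count positions where the identical base or residue occurs in both sequences.
    matched = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    # Divide by the window size and multiply by 100.
    return 100.0 * matched / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the percentage identity is at least the given threshold (e.g. 95)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Nine of ten positions match, so the identity is 90%.
print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))  # 90.0
```

Under these assumptions, a candidate sequence would satisfy an "at least 95% sequence identity" criterion when `meets_identity_threshold(candidate, reference, 95.0)` returns `True` for the aligned pair.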

[0050] The terms upstream and downstream refer to relative positions in a nucleic acid sequence, that is in a DNA or RNA sequence. Each strand of DNA or RNA has a 5′ end and a 3′ end, which are so named for the carbon position on the deoxyribose (or ribose) ring. By convention, upstream and downstream relate to the 5′ to 3′ direction in which RNA transcription takes place. In this case, upstream is toward the 5′ end of the RNA molecule and downstream is toward the 3′ end of the RNA molecule. When considering double-stranded DNA, upstream is toward the 5′ end of the coding strand for the gene in question and downstream is toward the 3′ end. Due to the anti-parallel nature of DNA, this means the 3′ end of the template strand is upstream of the gene and the 5′ end is downstream. It is noted that some genes on the same DNA molecule may be transcribed in opposite directions. This means the upstream and downstream areas of the molecule may change depending on which gene is used as the reference.

[0051] In order for such an endonuclease-based gene editing construct to be functional, factors other than the endonuclease itself may be required. In the case of, for example, CRISPR-associated endonucleases, such as Cas9 or Cpf1, a guide nucleic acid sequence is required in order to guide the endonucleases to the correct excision or editing loci. Therefore, the endonuclease needs to be capable of cleaving a nucleic acid in a specific section marked by the binding of, for example, a guide nucleic acid sequence. In one example, a single strand guide nucleic acid sequence would bind to a complementary sequence within a genome or a stretch of nucleic acid. This binding of the guide sequence to the genome results in a double strand nucleic acid section, which is then recognized by the endonuclease and is then targeted for excision. Thus, in one example, the sequence of the guide nucleic acid sequence is complementary to the sequence of the intended restriction site. In another example, the sequence of the guide nucleic acid sequence is identical to the sequence of the intended restriction site. In another example, more than one guide nucleic acid sequence is used in conjunction with one or more nucleases. In another example, for example when multiple endonucleases are used, the guide sequences are specific for each endonuclease. In another example, where a single endonuclease and multiple guide sequences are used, the guide sequences must be so constructed that the endonuclease is capable of restricting the nucleic acid sequence at all of the restriction sites. Therefore, by delivering, for example, a Cas9 endonuclease and an appropriate guide nucleic acid sequence into a cell, the cell's genome can be cleaved at a desired location, thereby allowing existing genes to be removed and/or new genes to be added, or the function of existing genes to be modulated.
In terms of the present invention, the process of gene editing becomes simplified in terms of procedure, because the sgRNA molecules guide the Cas9 nuclease to the (then double strand) loci within the genome, which is then excised from that location. This removes the double strand section from the loci in question, thereby creating, for example, a gene knock-out or knock-down for situations where the sgRNA binds to a functional part of a gene, or a gene knock-in in the event that a gene is introduced into the restriction site.
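The relationship between a guide sequence and its intended site on either DNA strand can be illustrated with a short sketch. The sequences and function names below are hypothetical and chosen only for illustration. The sketch writes the guide in the DNA alphabet, reports exact matches only, and deliberately ignores any further sequence requirements that a particular endonuclease may impose on a target site.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_guide_sites(spacer: str, genome: str) -> list[tuple[int, str]]:
    """Locate exact matches of a guide spacer on either strand.

    Returns (position, strand) pairs, where position is the 0-based index
    on the given strand. '+' means the spacer is identical to the given
    strand (and so complementary to the opposite strand); '-' means the
    spacer is complementary to the given strand.
    """
    spacer = spacer.upper()
    genome = genome.upper()
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(query, start + 1)
    return sorted(hits)


# Hypothetical sequences for illustration only.
genome = "GGGACCGTATTTTACGGTCCC"
print(find_guide_sites("ACCGTA", genome))  # [(3, '+'), (12, '-')]
```

Whether a guide is described as identical or complementary to its site therefore depends only on which strand is taken as the reference.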

[0052] There are various ways of controlling or inducing certain aspects of a biological system. For example, the lac operon system is frequently used for prokaryotic gene regulation, as it allows for an effective, inducible regulatory mechanism based on the absence or the presence of lactose. In general, such systems can be described using the terms inducible and repressible systems, whereby an inducible system is off unless there is the presence of a control molecule (also called an inducer) that allows for, in this case, gene expression. The molecule is said to induce expression. On the other hand, a repressible system is on except in the presence of some molecule (also called a co-repressor) that suppresses, in this case, gene expression. The molecule is said to repress expression. In both cases, the manner by which the induction or repression happens is dependent on the control mechanisms, as well as differences between prokaryotic and eukaryotic cells. Another example of an inducible expression system is tetracycline controlled transcriptional activation, wherein the activation of transcriptional activity is dependent on the presence of tetracycline. Having said that, these on and off switches that are usually found in the field of protein expression can be used in other situations where control over a specific enzyme function is desired. In one example, the inducible system used is the ERT2-tamoxifen inducible system. This system allows for temporal control of the enzyme in question, as the ERT2 domain can be fused to any protein of interest, allowing reversible control over its activity by administering or removing tamoxifen (or derivatives thereof, for example, 4-hydroxytamoxifen), that is, the inducing agent that switches the target protein either on or off, depending on the concept used.
For example, without being bound by theory, it is thought that in the constructs disclosed herein, the ERT2 domains effectively sequester the Cas9-dependent constructs outside of the nucleus, where they cannot perform their DNA editing activity. In the presence of an inducing agent, for example tamoxifen, however, the fusion protein can then rapidly translocate into the nucleus to perform its function.

[0053] As explained previously, the inducing agent used would depend on the type of inducible/repressible system used. Also, in order to be able to function as an inducing agent, the compound needs to be small enough to penetrate the cell membrane and thereby be present in the cell cytoplasm, or even the cell nucleus, depending on where the expressed protein is found. In one example, the construct as disclosed herein comprises the following components: a CRISPR-associated endonuclease (such as Cas9 or Cpf1) or a derivative thereof; and at least one or more hormone binding domains of the estrogen receptor (ERT2) or derivatives thereof. In one example, the one or more hormone binding domains of the estrogen receptor (ERT2) are located upstream or located downstream of the CRISPR-associated endonuclease. In another example, if there are two or more ERT2 present in the construct, the ERT2 are all located upstream, or all located downstream, or located both upstream and downstream of the CRISPR-associated endonuclease. In another example, the hormone binding domain of the estrogen receptor (ERT2) is mutated. In yet another example, the mutated hormone binding domain of the estrogen receptor (ERT2) is SEQ ID NO: 4, or derivatives, or variations thereof. In one example, the inducing agent is, but is not limited to, tamoxifen, 4-hydroxytamoxifen or derivatives thereof. In another example, the inducing agent is 4-hydroxytamoxifen.

[0054] The concentration of the inducing agent used or required in order to control the protein in question depends on the inducing agent used, as well as the time for which the host cell is exposed to the inducing agent. It will be appreciated that the inducing agent should not be used in concentrations that may result in a toxic or adverse effect in the host cell. Thus, in one example, the concentration of the inducing agent used is 0.5 µM, about 0.25 µM, about 1 µM, about 1000 nM, about 500 nM, about 250 nM, about 100 nM, about 50 nM, about 25 nM or about 10 nM. In another example, the concentration of the inducing agent used is a concentration of about 1 µM. It will also be appreciated that the length of time a host cell is exposed to an inducing agent may have an effect on the length of time the inducible or repressible system is turned on, or off, respectively. Thus, in one example, the host cell is incubated with the inducing agent for about 2, about 3, about 4, about 5, about 5.5, about 6, about 6.5, about 7, about 7.5, about 8, about 8.5, about 9, about 12, about 16, about 23.5, about 24, about 24.5, about 36 or about 48 hours. In another example, the host cell is incubated with the inducing agent for about 4, about 6, about 8 or about 12 hours.
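Because inducer concentrations in this disclosure are quoted in both micromolar and nanomolar units, a small conversion helper can make values directly comparable. The following is a minimal sketch; the function names and unit labels are hypothetical, and the default minimum of 100 nM merely reflects the observation reported for FIG. 15 rather than a general requirement.

```python
# Conversion factors to nanomolar; "uM" is accepted as an ASCII spelling of µM.
_TO_NANOMOLAR = {"nM": 1.0, "uM": 1_000.0, "µM": 1_000.0}


def to_nanomolar(value: float, unit: str) -> float:
    """Convert a concentration given in nM or µM to nM."""
    if unit not in _TO_NANOMOLAR:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * _TO_NANOMOLAR[unit]


def reaches_minimum(value: float, unit: str, minimum_nm: float = 100.0) -> bool:
    """True if the concentration is at least minimum_nm nanomolar."""
    return to_nanomolar(value, unit) >= minimum_nm


print(to_nanomolar(1, "uM"))      # 1000.0, i.e. 1 µM equals 1000 nM
print(to_nanomolar(0.25, "uM"))   # 250.0
print(reaches_minimum(10, "nM"))  # False
```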

[0055] As used herein, the term localization sequence refers to an amino acid sequence which tags a protein for transport into a specific compartment of the cell or the cell nucleus. One example of a localization sequence is a nuclear localization sequence or signal (NLS), which tags a protein for import into the nucleus of the cell. Another example is a nuclear export signal (NES), which has the opposite function in that it tags a protein for export out of the nucleus into the cytoplasm. Nuclear localization sequences can be divided into non-classical and classical NLSs. Classical nuclear localization sequences, that is NLSs that use the classical nuclear import cycle which may require the presence of an importin protein, can be further classified as either monopartite (which means to have a single part) or bipartite (to have more than one part, in this case two parts). For example, the sequence PKKKRKV in the SV40 Large T-antigen is considered to be a monopartite NLS. The NLS of nucleoplasmin, KR[PAATKKAGQA]KKKK, is an example of a bipartite signal, wherein two clusters of basic amino acids are present, separated by a spacer of about 10 amino acids. It is noted that this spacer may be variable in length. Examples of nuclear localization signals are, but are not limited to the nuclear localization signals of SV40 large T-Antigen (monopartite; PKKKRKV or CGGGPKKKRKVED), c-myc (monopartite; PAAKRVKLD), and nucleoplasmin (bipartite; AVKRPAATKKAGQAKKKKLD or KRPAATKKAGQAKKKK); EGL-13 (monopartite; MSRRRKANPTKLSENAKKLAKEVEN) and TUS-protein (monopartite; KLKIKRPVK). In another example, the nuclear localization signals (NLSs) are classical NLSs (cNLS) or proline-tyrosine (PY)-NLS. In yet another example, the nuclear localization signals (NLSs) are monopartite or bipartite NLSs. 
In a further example, the nuclear localization signal is, but is not limited to, the nuclear localization signal of the Large T-antigen of the Simian Vacuolating Virus 40 (SV40), nucleoplasmin, importin α, EGL-13, c-MYC, TUS, AR, PLSCR1, PEP, TPX2, RB, TP53, N1N2, PB2, CBP80, SRY, hnRNP A1, HRP1, Borna Disease Virus p10, Ty1 Integrase, and the Chelsky consensus sequence. As used herein, in regards to NLS, the terms signal and sequence are used interchangeably. In yet another example, the nuclear localization sequence (NLS) is SEQ ID NO: 5 or SEQ ID NO: 6.
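As an illustration of how the example signals named in this paragraph might be located within a fusion protein sequence, the following sketch searches for three of them by exact string matching. The lookup table contains only motifs quoted above; the function name and the test protein are hypothetical, and exact matching is an assumption that would miss derivatives or variant signals.

```python
# Example NLS motifs quoted in the paragraph above, with their stated class.
KNOWN_NLS = {
    "SV40 large T-antigen": ("PKKKRKV", "monopartite"),
    "c-myc": ("PAAKRVKLD", "monopartite"),
    "nucleoplasmin": ("KRPAATKKAGQAKKKK", "bipartite"),
}


def find_known_nls(protein: str) -> list[tuple[int, str, str]]:
    """Return (position, source, class) for each listed motif found.

    Positions are 0-based indices of the first occurrence of each motif
    in the one-letter amino acid sequence.
    """
    protein = protein.upper()
    hits = []
    for source, (motif, nls_class) in KNOWN_NLS.items():
        position = protein.find(motif)
        if position != -1:
            hits.append((position, source, nls_class))
    return sorted(hits)


# Hypothetical test protein assembled from short placeholder segments.
protein = "MAS" + "PKKKRKV" + "GSG" + "KRPAATKKAGQAKKKK" + "LE"
for position, source, nls_class in find_known_nls(protein):
    print(position, source, nls_class)
# 3 SV40 large T-antigen monopartite
# 13 nucleoplasmin bipartite
```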

[0056] There are many other types of NLS, such as the acidic M9 domain of hnRNP A1, the sequence KIPIK in yeast transcription repressor Matα2, and the complex signals of U snRNPs. Most of these NLSs appear to be recognized directly by specific receptors of the importin β family without the intervention of an importin α-like protein and are therefore considered to be non-classical nuclear localization sequences. Another example of a localization sequence is a mitochondrial targeting signal, which is a 10 to 70 amino acid long peptide that is usually present at the end of nascent proteins and which directs these nascent proteins to the mitochondria. It is usually found at the N-terminus and consists of an alternating pattern of hydrophobic and positively charged amino acids, thereby usually forming an amphipathic helix. Mitochondrial targeting signals can also contain additional signals that subsequently direct the protein to different regions of the mitochondria, for example the mitochondrial matrix. Like many signal peptides, mitochondrial targeting signals may be, and usually are, cleaved in vivo once targeting is complete. Yet another example of a non-classical nuclear localization protein is a proline tyrosine nuclear localization protein, so named for the presence of a PY-NLS motif, which is a proline-tyrosine amino acid pairing which allows the protein to bind to, for example, importin β2, thereby facilitating its transport. Therefore, in another example, the localization sequence is a nuclear localization sequence, mitochondrial localization sequence or derivatives thereof. In one example, the mitochondrial localization sequence (MLS) is, but is not limited to, ATP5B, SOD2, COX8A, OTC, or TFAM. In another example, the mitochondrial localization sequence (MLS) is, but is not limited to, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10 or SEQ ID NO: 11.

[0057] Thus, in one example, the construct as disclosed herein further comprises at least one localization sequence. In another example, the construct as disclosed herein comprises one or more localization sequences.

[0058] In terms of artificially generated fusion proteins, it is possible to attach various modifications, such as, for example, localization sequences, binding tags, selectable markers, optical markers and the like, to either the N-terminus, the C-terminus or both the N- and C-termini of a fusion peptide. This is possible even if in nature, for example, localization signals are usually found at the N-termini of proteins, as these are generally added towards the end of protein translation/expression. Therefore, the presently claimed construct can have one or more of said modifications at each terminus of the protein, provided the functionality of the modification is retained. That is, if, for example, a localization signal is required to work in a biological setting in vitro, for example in protein overexpression, then the localization signal needs to be at the N-terminus of the protein, in accordance with its usual position in nature. The same can be said of other modifications, for example binding tags. Thus, in one example, the binding tag is located at either the N-terminus or the C-terminus of the construct, or at both ends of the construct. In another example, the binding tag is located at the N-terminus of the construct.

[0059] Protein or binding tags are peptide sequences which can be genetically added to the sequence of a recombinant protein prior to expression. Often, these tags are removable, and are intended to be so, for example by chemical agents or by enzymatic means, such as proteolysis or intein splicing, or by changing the physico-chemical environment of the protein, such as changing the pH value, certain solute concentrations in solution or a change of aqueous to non-aqueous solution. Binding tags are attached to proteins for various purposes, for example, but not limited to, purification via affinity, chromatographic purification, solubilization, detection (optical, immunological or otherwise), protein binding assays or to allow certain modifications of the protein, for example enzymatic modifications, or chemical modifications. Such binding tags may also be attached as multiples to the terminus of the protein in question, for example a single His-tag (HIS) may also be used as a triple His-tag (3HIS) or a sextuple His-tag (6HIS). Thus, in one example, the construct as described herein comprises a binding tag. In another example, the binding tag is, but is not limited to, a V5 epitope tag, a FLAG tag, a tandem FLAG-tag, a triple FLAG tag (3FLAG), a Human influenza hemagglutinin (HA) tag, a tandem HA tag, a triple HA tag (3HA), a sextuple Histidine tag (6HIS), biotin, c-MYC, a Glutathione-S-transferase (GST) tag, a Strep-tag, a Strep-tag II, an S-tag (a peptide derived from pancreatic ribonuclease A (RNase A)), a natural histidine affinity tag (HAT), a Calmodulin-binding peptide (CBP) tag, a Streptavidin-binding peptide (SBP) tag, a Chitin-binding domain, a Maltose-binding protein (MBP) or derivatives thereof. In one example, the construct comprises a V5 epitope tag. In another example, the V5 epitope tag sequence is SEQ ID NO: 12 or derivatives thereof.

[0060] In one example, the construct, as disclosed herein, includes a self-cleaving peptide. Self-cleaving peptides, first discovered in picornaviruses, are peptides of between 19 and 22 amino acids in length and are usually found between two proteins in some members of the picornavirus family. Using self-cleaving proteins, picornaviruses are capable of producing equimolar levels of multiple genes from the same mRNA. Having said that, such self-cleaving proteins are known to be found in other species of viruses and a person skilled in the art, based on the information provided herein, will be readily able to determine a suitable substitution for the self-cleaving protein disclosed herein, if required. The term self-cleaving, as used in the art, is not entirely accurate, as, without being bound by theory, these self-cleaving peptides are thought to function by inducing the ribosome to skip the synthesis of a peptide bond at the C-terminus of a 2A element, leading to separation between, for example, the end of the 2A sequence and the next peptide downstream. The cleavage of the peptide occurs between the glycine and proline residues found at the C-terminus of the 2A peptide, meaning the upstream cistron will have a few additional residues added to the end, while the downstream cistron will start with the proline residue. Thus, in one example, the construct as described herein comprises a self-cleaving peptide. In another example, the self-cleaving peptide is, but is not limited to, a 2A self-cleaving peptide. In another example, the 2A self-cleaving peptide is SEQ ID NO: 13 or a derivative thereof.
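By way of illustration only, the ribosomal skip described above may be sketched in Python as follows. The sketch assumes the conserved C-terminal 2A motif D-x-E-x-N-P-G-P and uses the commonly cited porcine teschovirus-1 2A (P2A) sequence as a stand-in; SEQ ID NO: 13 is not reproduced here, and the function name split_at_2a and the flanking toy sequences are hypothetical.

```python
import re

# Assumption: the conserved 2A C-terminal motif D-x-E-x-N-P-G-P.
# The lookahead keeps the final proline out of the match, so the match
# ends directly after the glycine, where the peptide bond is skipped.
MOTIF_2A = re.compile(r"D.E.NPG(?=P)")

# Assumption: commonly cited P2A sequence, used only as a stand-in.
P2A = "ATNFSLLKQAGDVEENPGP"


def split_at_2a(polypeptide):
    """Split a polypeptide between the Gly and Pro of the first 2A motif.

    Returns [upstream, downstream]; if no motif is found, returns the
    input unchanged in a one-element list.
    """
    match = MOTIF_2A.search(polypeptide)
    if match is None:
        return [polypeptide]
    cut = match.end()  # index just after the glycine
    return [polypeptide[:cut], polypeptide[cut:]]


# Toy upstream and downstream sequences (hypothetical).
upstream, downstream = split_at_2a("MAAA" + P2A + "MKK")
```

In this sketch the upstream product retains the 2A residues up to and including the glycine, and the downstream product begins with the proline, consistent with the description above.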

[0061] As used herein, the term selectable marker refers to a marker that can be added to the peptide in question for selection purposes. The type of detection required would then dictate the type of marker that may be used. Thus, in one example, the construct as described herein comprises a selectable marker. In another example, the selectable marker is, but is not limited to, an imaging marker, a cell-surface marker, an antibiotic, an antibiotic resistance marker or derivatives thereof.

[0062] For example, if it is required to optically select the peptide in question, one chooses an optical marker or an imaging marker, that is a marker that is capable of optical detection. Examples of such an optical or imaging marker are, but are not limited to, green fluorescent protein (GFP), enhanced green fluorescent protein (eGFP), superfolder green fluorescent protein, red fluorescent protein (RFP), mCherry, orange fluorescent protein (OFP), cyan fluorescent protein (CFP), enhanced cyan fluorescent protein (eCFP), Cerulean, enhanced blue fluorescent protein (eBFP), yellow fluorescent protein (YFP), enhanced yellow fluorescent protein (eYFP), Venus, far-red fluorescent protein or derivatives thereof. If selection via, for example, resistance to a certain compound is required, an antibiotic resistance marker can be included in the peptide. Examples of such an antibiotic resistance marker are, but are not limited to, a drug-resistant cassette for puromycin, a drug-resistant cassette for blasticidin, a drug-resistant cassette for zeocin, a drug-resistant cassette for G418, a drug-resistant cassette for hygromycin B, a drug-resistant cassette for ampicillin, a drug-resistant cassette for kanamycin, a drug-resistant cassette for chloramphenicol, and derivatives thereof. Such selection markers are usually added to the genetic sequence for the protein in question and are therefore expressed concurrently when the protein is expressed.

[0063] A cell-surface marker is a protein that is usually found on the surface of the cell, which can be used to characterize a cell type and/or differentiate between different cell (sub)types. Such cell-surface markers can also include glycoproteins. One example of cell-surface markers are proteins that are named after the so-called cluster of differentiation. This cluster of differentiation is used to catalogue the various epitopes (hence, proteins) present on a cell's surface, which are used as targets for, for example, monoclonal antibodies. The epitopes are then numbered and named CDX, with the X denoting a running catalogue number. Therefore, it is possible to positively identify various cell types using one or more CD markers. In one example, the cell-surface marker is, but is not limited to, CD3, CD4, CD8, CD11a, CD11b, CD14, CD15, CD16, CD19, CD20, CD22, CD24, CD25, CD30, CD31, CD34, CD38, CD56, CD61, CD91, CD117, CD45, CD114, CD182, Foxp3 or derivatives thereof.

[0064] The present disclosure describes constructs, the general formula of which is according to formula I as shown below:

##STR00001##

wherein the letters denote positions within the peptide sequence. In one example, A is absent, or is a mutated hormone binding domain of the estrogen receptor (ERT2), or the binding tag. In another example, B is the localization sequence, or derivatives thereof, or the binding tag, or absent. In another example, C.sub.1 and C.sub.2 are each independently any one of the localization sequences or derivatives thereof, or the mutated hormone binding domains of the estrogen receptor (ERT2). In yet another example, in the event that C.sub.1 is one mutated hormone binding domain of the estrogen receptor (ERT2), then C.sub.2 is another mutated hormone binding domain of the estrogen receptor (ERT2). In another example, C.sub.2 is absent. In a further example, X is a CRISPR-associated endonuclease or a derivative thereof. In yet another example, D is a mutated hormone binding domain of the estrogen receptor (ERT2), or the localization sequences or derivatives thereof. In one example, E is absent or is a mutated hormone binding domain of the estrogen receptor (ERT2), or the self-cleaving peptide. In another example, F is absent or is the self-cleaving peptide, or the selectable marker. In yet another example, G is absent or is the selectable marker.
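By way of illustration only, the permitted occupants of each position of formula I, as set out above, may be captured in a small lookup table and checked programmatically. The following Python sketch is not part of the disclosed constructs; the type labels (for example "ERT2", "binding_tag") and the function name check_formula_i are hypothetical shorthand, derivatives are not distinguished from their parent component, and the C.sub.1/C.sub.2 rule is read as permitting C.sub.2 to be either another ERT2 domain or absent.

```python
# Allowed component types per position of formula I, transcribed from the
# paragraph above. None stands for "absent". Labels are hypothetical shorthand.
ALLOWED = {
    "A": {None, "ERT2", "binding_tag"},
    "B": {None, "localization", "binding_tag"},
    "C1": {"localization", "ERT2"},
    "C2": {None, "localization", "ERT2"},
    "X": {"endonuclease"},
    "D": {"ERT2", "localization"},
    "E": {None, "ERT2", "self_cleaving"},
    "F": {None, "self_cleaving", "selectable_marker"},
    "G": {None, "selectable_marker"},
}


def check_formula_i(assignment):
    """Return a list of problems; an empty list means the assignment is
    consistent with the position rules as transcribed above."""
    problems = []
    for position, allowed in ALLOWED.items():
        value = assignment.get(position)  # a missing key is treated as absent
        if value not in allowed:
            problems.append(f"{position}: {value!r} not permitted")
    # Assumed reading: when C1 is ERT2, C2 is another ERT2 domain or absent.
    if assignment.get("C1") == "ERT2" and assignment.get("C2") not in (None, "ERT2"):
        problems.append("C2 must be ERT2 or absent when C1 is ERT2")
    return problems


# One arrangement consistent with the rules above.
example = {
    "A": "ERT2", "B": "binding_tag", "C1": "localization", "C2": None,
    "X": "endonuclease", "D": "localization", "E": "self_cleaving",
    "F": "selectable_marker", "G": None,
}
problems = check_formula_i(example)  # expected to be an empty list
```

Such a table only checks which component type occupies which position; it does not model the linker sequences or the biological activity of the resulting fusion protein.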

[0065] In the above structure, the terms L.sup.1 to L.sup.8 denote linker sequences between the positions within the peptide sequence. In one example, any of the linker sequences L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.6, L.sup.7 or L.sup.8 are absent. In another example, one or more of the linker sequences L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.6, L.sup.7 or L.sup.8 are absent. In yet another example, the linker sequences are between 1 to 5, between 4 to 8, between 5 to 10, between 10 to 20, between 20 to 25 or 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 amino acids in length.

[0066] A peptide can comprise natural amino acids, unnatural amino acids, or a combination of both unnatural and natural amino acids. As used herein, the term natural amino acid refers to proteinogenic amino acids, which are amino acids that are precursors to proteins. These amino acids are assembled during translation to result in a nascent protein. Presently, there are 23 proteinogenic amino acids known, 20 of which are found in the standard genetic code, along with an additional 3 amino acids (selenocysteine, pyrrolysine and N-formylmethionine) that can be incorporated into the peptide using special translation mechanisms. Humans are capable of synthesizing 12 of these from each other or from other molecules of intermediary metabolism. The other nine must be consumed (usually as their protein derivatives), and so they are called essential amino acids. The essential amino acids are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine (i.e. H, I, L, K, M, F, T, W, V). Unnatural, that is non-proteinogenic, amino acids are amino acids that are not naturally encoded or that are not found in the genetic code of any organism. These unnatural amino acids, however, can be found, for example, as intermediates in biosynthesis, post-translationally incorporated into proteins, as components of, for example, bacterial cell walls, neurotransmitters and toxins, and in natural and man-made pharmacological compounds. Thus, in one example, the linker sequences comprise natural or unnatural amino acids, or combinations of both. In another example, one or more, or all of the linker sequences comprise the amino acids A, E, G, P, S and T. In yet another example, one or more, or all of the linker sequences consist of the amino acids A, E, G, P, S and T. In one example, in the event that the linker sequence is absent, the neighbouring substituents then are bound by a peptide bond. 
In another example, the linker sequence L.sup.1 is any one of PR, TG, TGPGPGGS, TGPGPGGSAGDTTGPGTGPG or TGGGS. In another example, the linker sequence L.sup.2 is absent or, independently, any one of PRGGS, GGSPRGGS, PR, GGSPRGGS or TPGGPRGGS. In another example, the linker sequence L.sup.3 is any one of PG, SGSEGA, GASGSKTPG, SGSETPGTSESAGA, SGSETPGTGPGGA, SESATPESGA, GTSESATPESGGA, GGSGGSGA, GA, GGGS, TPESGA, SGSETPGTGA, SGSETPGTSEGA, PAG, PAGGGS, SGSETPGTPGGA, TPESGPGGA or GASGS. In yet another example, the linker sequence L.sup.4 is GGGS or absent. In a further example, the linker sequence L.sup.5 is any one of PAG or PAGGGS. In yet another example, the linker sequence L.sup.6 is GA or absent.
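By way of illustration only, the linker constraints described above (a length of between 1 and 25 amino acids and, in some examples, a composition restricted to A, E, G, P, S and T) may be checked as in the following Python sketch. The list of L.sup.1 options is transcribed from the text above; the function name linker_report is hypothetical.

```python
# Residues named above for linkers that consist only of A, E, G, P, S and T.
AEGPST = set("AEGPST")

# L1 options transcribed from the paragraph above.
L1_OPTIONS = ["PR", "TG", "TGPGPGGS", "TGPGPGGSAGDTTGPGTGPG", "TGGGS"]


def linker_report(sequence, max_length=25):
    """Report whether a linker is within the stated length range and
    whether it consists solely of the residues A, E, G, P, S and T."""
    return {
        "length_ok": 1 <= len(sequence) <= max_length,
        "only_AEGPST": set(sequence) <= AEGPST,
    }


# Map each listed L1 linker to its report.
reports = {linker: linker_report(linker) for linker in L1_OPTIONS}
```

Applied to the listed L.sup.1 options, this check shows that all five fall within the stated length range, whereas PR and TGPGPGGSAGDTTGPGTGPG contain residues (R and D, respectively) outside the A, E, G, P, S and T set, in keeping with that composition being recited for one or more, rather than necessarily all, of the linker sequences.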

[0067] In the present disclosure, the terms polypeptide, peptide, and protein are used interchangeably. As used herein, the term peptide thus refers to a chain of amino acids which are connected via amide bonds. When the amino acids are alpha-amino acids, either the L-optical isomer or the D-optical isomer can be used, the L-isomers being preferred in nature. The term polypeptide or protein as used herein encompasses any amino acid sequence and includes, but may not be limited to, modified sequences such as glycoproteins. The term polypeptide is specifically intended to cover naturally occurring proteins, as well as those that are recombinantly or synthetically produced.

[0068] In one example, the structure is according to formula I, wherein A is absent or is a mutated hormone binding domain of the estrogen receptor (ERT2), or the binding tag; wherein B is the localization sequence or derivatives thereof, or the binding tag, or absent; wherein C.sub.1 and C.sub.2 are each independently any one of the localization sequences or derivatives thereof, or the mutated hormone binding domains of the estrogen receptor (ERT2); wherein when C.sub.1 is one mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is another mutated hormone binding domain of the estrogen receptor (ERT2); or wherein C.sub.2 is absent; wherein X is CRISPR-associated endonuclease or a derivative thereof; wherein D is a mutated hormone binding domain of the estrogen receptor (ERT2), or the localization sequence or derivatives thereof; wherein E is absent or is a mutated hormone binding domain of the estrogen receptor (ERT2), or the self-cleaving peptide; wherein F is absent or is the self-cleaving peptide, or the selectable marker; wherein G is absent or is the selectable marker; wherein L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.6, L.sup.7 and L.sup.8 are linker sequences; wherein any of the linkers L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.6, L.sup.7 or L.sup.8 are absent; wherein the linkers sequences are between 1 to 5, between 4 to 8, between 5 to 10, between 10 to 20, between 20 to 25 or 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 amino acids long; wherein the linker sequences comprise the natural or unnatural amino acids; wherein the linker sequences comprise the amino acids A, E, G, P, S and T; wherein the linker sequences consist of the amino acids A, E, G, P, S and T; wherein if undefined, the linker sequence is absent, the neighbouring substituents are bound by a peptide bond; wherein L.sup.1 is any one of PR, TG, TGPGPGGS, TGPGPGGSAGDTTGPGTGPG or TGGGS; wherein L.sup.2 is absent or any one of 
PRGGS, GGSPRGGS, PR, GGSPRGGS or TPGGPRGGS; wherein L.sup.3 is any one of PG, SGSEGA, GASGSKTPG, SGSETPGTSESAGA, SGSETPGTGPGGA, SESATPESGA, GTSESATPESGGA, GGSGGSGA, GA, GGGS, TPESGA, SGSETPGTGA, SGSETPGTSEGA, PAG, PAGGGS, SGSETPGTPGGA, TPESGPGGA or GASGS; wherein L.sup.4 is GGGS or absent; wherein L.sup.5 is any one of PAG or PAGGGS; wherein L.sup.6 is GA or absent, wherein L.sup.7 and L.sup.8 are independently selected from the linkers as disclosed in any of L.sup.1 to L.sup.6. In one example, A is absent. In another example, A is a mutated hormone binding domain of the estrogen receptor (ERT2). In yet another example, A is a binding tag.

[0069] In one example, B is the binding tag. In another example, B is a localization sequence.

[0070] In one example, C.sub.1 is the localization sequence. In another example, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2).

[0071] In one example, C.sub.2 is absent.

[0072] In one example, D is the localization sequence. In another example, D is a mutated hormone binding domain of the estrogen receptor (ERT2).

[0073] In one example, E is the self-cleaving peptide. In another example, E is the mutated hormone binding domain of the estrogen receptor (ERT2). In yet another example, E is absent.

[0074] In one example, F is the selectable marker. In another example, F is the self-cleaving peptide.

[0075] In one example, G is absent. In another example, G is the selectable marker.

[0076] In one example, X is the CRISPR-associated endonuclease or derivative thereof.

[0077] In a further example, A is absent, B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent. In yet another example, A is the mutated hormone binding domain of the estrogen receptor (ERT2), B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent. In a further example, A is the binding tag, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent. In yet another example, D is the localization sequence, E is the self-cleaving peptide, F is the selectable marker and G is absent. In a further example, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker. In one example, D and E are each one mutated hormone binding domain of the estrogen receptor (ERT2). In another example, D is the mutated hormone binding domain of the estrogen receptor (ERT2) and E is absent. In yet another example, A is the mutated hormone binding domain of the estrogen receptor (ERT2), B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the self-cleaving peptide, F is the selectable marker and G is absent. In a further example, A is the binding tag, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the self-cleaving peptide, F is the selectable marker and G is absent. 
In yet another example, A is absent, B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker. In one example, A is the mutated hormone binding domain of the estrogen receptor (ERT2), B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker. In another example, A is the binding tag, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker.

[0078] In another example, the construct comprises the following formula (II):

##STR00002##

wherein A is absent; wherein B is a localization sequence or derivatives thereof, or the binding tag;

[0079] wherein both C.sub.1 and C.sub.2 are present or only C.sub.1 is present; wherein C.sub.1 and C.sub.2 are each independently selected from the group consisting of the localization sequence, derivatives thereof of the localization sequence, and a mutated hormone binding domain of the estrogen receptor (ERT2); wherein when C.sub.1 is one mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is another mutated hormone binding domain of the estrogen receptor (ERT2); wherein X is a CRISPR-associated endonuclease or a derivative thereof; wherein D is selected from the group consisting of a mutated hormone binding domain of the estrogen receptor (ERT2), the localization sequence and derivatives of the localization sequence; wherein E is absent or is selected from the group consisting of a mutated hormone binding domain of the estrogen receptor (ERT2) and a self-cleaving peptide; wherein F is absent or is selected from the group consisting of the self-cleaving peptide, the mutated hormone binding domain of the estrogen receptor (ERT2) and a selectable marker; wherein G is absent or is the selectable marker; wherein L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.6, L.sup.7 and L.sup.8 are linker sequences; wherein at least one of the linker sequences is present; wherein each of the linkers sequences is independently between 1 to 25 amino acids long; wherein each linker sequence independently comprises natural or unnatural or a mixture of natural and unnatural amino acids; wherein the linker sequences comprise the amino acids A, E, G, P, S and T; wherein, if any one or more of the linker sequences of L.sup.1 to L.sup.8 is absent, the neighbouring substituents are bound by a peptide bond; wherein L.sup.1 is selected from the group consisting of PR, TG, TGPGPGGS, TGPGPGGSAGDTTGPGTGPG, TGPGGS, TGPGGSAGDTTGPGGS and TGGGS; wherein L.sup.2 is selected from the group consisting of PRGGS, GGSPRGGS, PR, GGSPRGGS and TPGGPRGGS; wherein L.sup.3 is selected from 
the group consisting of PG, SGSEGA, GASGSKTPG, SGSETPGTSESAGA, SGSETPGTGPGGA, SESATPESGA, GTSESATPESGGA, GGSGGSGA, GA, GGGS, TPESGA, SGSETPGTGA, SGSETPGTSEGA, PAG, PAGGGS, SGSETPGTPGGA, TPESGPGGA and GASGS; wherein L.sup.4 is GGGS; wherein L.sup.5 and L.sup.7 are independently PAG, SGS or PAGGGS; wherein L.sup.6 is GA; wherein L.sup.8 is selected from the linkers as disclosed in any of L.sup.1 to L.sup.6.

[0080] In one example, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent. In another example, D is a localization sequence and E and F are each a mutated hormone binding domain of the estrogen receptor (ERT2). In yet another example, A is absent, B is a localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is absent, X is a CRISPR-associated endonuclease or a derivative thereof, D is a localization sequence, E is a mutated hormone binding domain of the estrogen receptor (ERT2) and F is absent. In a further example, B is a localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is absent, X is a CRISPR-associated endonuclease or a derivative thereof, D is a localization sequence and E and F are each a mutated hormone binding domain of the estrogen receptor (ERT2). In another example, B is a localization sequence, C.sub.1 and C.sub.2 are each independently a mutated hormone binding domain of the estrogen receptor (ERT2), X is a CRISPR-associated endonuclease or a derivative thereof, D is a localization sequence and E and F are each a mutated hormone binding domain of the estrogen receptor (ERT2).

[0081] In one example, the construct, as disclosed herein, has at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to any one of SEQ ID NOs: 15 to 74. In another example, the construct, as disclosed herein, has a sequence identity of between 80% to 95% to any one of SEQ ID NOs: 15 to 74. In yet another example, the construct has a sequence identity of at least 90% to any one of SEQ ID NOs: 15 to 74.
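By way of illustration only, a percentage sequence identity of the kind referred to above may be computed from two sequences that have already been aligned to equal length. The following Python sketch assumes one common convention (identical, non-gap columns divided by the total alignment length); other conventions exist and the present disclosure does not prescribe a particular one. The function name percent_identity and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: columns where both sequences carry the same
    residue (gaps, written '-', never count as matches) divided by the
    total number of alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy example: one mismatch in eight aligned positions.
identity = percent_identity("TGPGPGGS", "TGPGGGGS")
meets_80 = identity >= 80.0
meets_90 = identity >= 90.0
```

In the toy example, seven of eight positions match, so the pair would satisfy an 80% identity threshold but not a 90% threshold under the assumed convention.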

[0082] As used herein, the term variant includes a reference to substantially similar sequences. Generally, nucleic acid sequence variants of the invention encode a polypeptide which retains qualitative biological activity in common with the polypeptide encoded by the non-variant nucleic acid sequence. Generally, polypeptide sequence variants of the invention also possess qualitative biological activity in common with the non-variant polypeptide. Further, these polypeptide sequence variants may have at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to the non-variant peptide. Variants may be made using, for example, the methods of protein engineering and site-directed mutagenesis as is well known in the art. Further, a variant peptide or protein may include analogues, wherein the term analogue, as used herein, with reference to a peptide, means a peptide which is a derivative of a peptide of the invention, whereby the term derivative comprises a polypeptide that has an addition, deletion or substitution of one or more amino acids compared to the non-variant peptide, such that the polypeptide retains substantially the same function as the non-variant peptide. The substitution may be one or more conservative amino acid substitutions. The term derivative or derivation also refers to compounds other than amino acids, which have been modified from the original compound. In some examples, these derivatives retain the same or have increased desired function. With regard to chemical compounds, the term derivative refers to a chemical substance derived from another substance, either directly or by modification or partial substitution. In this case, chemical derivatives may, but do not necessarily, retain their original function. 
The term conservative amino acid substitution as used herein refers to a substitution or replacement of one amino acid for another amino acid with similar properties within a peptide chain (primary sequence of a protein). For example, the substitution of the charged amino acid glutamic acid (Glu) for the similarly charged amino acid aspartic acid (Asp) would be a conservative amino acid substitution. Conservative amino acid substitution tables providing functionally similar amino acids are well known to one of ordinary skill in the art. The following six groups are examples of amino acids that are considered to be conservative substitutions for one another:

1) Alanine (A), Serine (S), Threonine (T);

[0083] 2) Aspartic acid (D), Glutamic acid (E);

3) Asparagine (N), Glutamine (Q);

4) Arginine (R), Lysine (K);

5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and

6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
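By way of illustration only, the six groups listed above may be represented as a lookup table for testing whether a given substitution is conservative. The following Python sketch uses the one-letter codes given in the list; the function name is_conservative is hypothetical, and residues not appearing in any of the six groups (for example glycine) are treated as having no conservative partner under this table.

```python
# The six groups listed above, written with one-letter amino acid codes.
GROUPS = ["AST", "DE", "NQ", "RK", "ILMV", "FYW"]

# Map each residue to the index of the group it belongs to.
GROUP_OF = {residue: index for index, group in enumerate(GROUPS) for residue in group}


def is_conservative(original, replacement):
    """True if the two residues differ and belong to the same listed group."""
    group_a = GROUP_OF.get(original)
    group_b = GROUP_OF.get(replacement)
    return original != replacement and group_a is not None and group_a == group_b


glu_to_asp = is_conservative("E", "D")  # both in group 2
glu_to_lys = is_conservative("E", "K")  # different groups
```

Under this table, the glutamic acid to aspartic acid substitution mentioned above is classed as conservative, whereas glutamic acid to lysine is not.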

[0084] A non-conservative amino acid substitution can result from changes in: (a) the structure of the amino acid backbone in the area of the substitution; (b) the charge or hydrophobicity of the amino acid; or (c) the bulk of an amino acid side chain. Substitutions generally expected to produce the greatest changes in protein properties are those in which: (a) a hydrophilic residue is substituted for (or by) a hydrophobic residue; (b) a proline is substituted for (or by) any other residue; (c) a residue having a bulky side chain, e.g., phenylalanine, is substituted for (or by) one not having a side chain, e.g., glycine; or (d) a residue having an electropositive side chain, e.g., lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g., glutamyl or aspartyl.

[0085] As used herein, the term mutation or grammatical variants thereof, in general, relates to an altered genetic sequence which results in the gene coding for a non-functioning protein, or a protein with substantially reduced, or altered function. The term mutation also relates to a modification of the genome or part of a nucleic acid sequence of any biological organism, virus or extra-chromosomal genetic element, or any genetic element that has been included in the nucleic acid sequence of a fusion protein. The mutation can be performed by replacing one nucleotide by another in the nucleic acid sequence of any of the genetic elements, thus creating a different amino acid in the position where the nucleotide was replaced. The techniques in order to achieve such mutations are well known to a person skilled in the art. For example, the mutation can be induced artificially using, but not limited to, chemicals, PCR reactions, and radiation. When artificially created, in the context of the invention, a mutation is by extension, the replacement of an amino acid encoded by a given nucleic acid sequence to another amino acid in a nucleic acid sequence or a genetic element. Thus, the section of the construct, as disclosed herein, containing the full, unchanged sequences for, for example, the hormone binding domain of the estrogen receptor (ERT2), would be considered to contain the wild type hormone binding domain of the estrogen receptor (ERT2), while sections of the construct carrying a mutation in the hormone binding domain of the estrogen receptor (ERT2) are termed mutated hormone binding domain of the estrogen receptor (ERT2).
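By way of illustration only, the effect of replacing one nucleotide by another, as described above, may be sketched in Python using the standard genetic code. The toy coding sequence, the function names translate and point_mutate, and the chosen position are hypothetical and do not correspond to any sequence of the present disclosure.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order for
# each of the three positions; '*' denotes a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence codon by codon, ignoring any
    trailing bases that do not form a complete codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def point_mutate(dna, index, new_base):
    """Return a copy of dna with the base at the 0-based index replaced."""
    return dna[:index] + new_base + dna[index + 1:]


wild_type = "ATGGAAAAA"                    # hypothetical toy sequence: M, E, K
mutant = point_mutate(wild_type, 5, "C")   # GAA (Glu) becomes GAC (Asp)
wild_type_peptide = translate(wild_type)
mutant_peptide = translate(mutant)
```

In this toy case a single nucleotide replacement changes the second codon from GAA to GAC, so that glutamic acid is replaced by aspartic acid in the encoded peptide.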

[0086] The present disclosure describes constructs for the expression of fusion proteins having the desired capability of genome engineering, that is genome editing. In order for such fusion proteins to be expressed, the constructs, as disclosed herein, need to be brought into a cell for protein expression. Thus, in one example, a host cell is transfected with the nucleic acid sequence as described herein, thereby resulting in the expression of the desired protein within the cell. In another example, the transfection is done via nucleofection or electroporation. In another example, the present disclosure describes a nucleic acid sequence encoding any one of the constructs as disclosed herein. In yet another example, there is disclosed a vector comprising the nucleic acid sequence of a construct as disclosed herein. In a further example, a host cell comprising the vector as disclosed herein is described. In one example, the host cell is a mammalian cell. In another example, the mammalian cell is, but is not limited to, mouse, horse, sheep, pig, cow, hamster or human. In another example, the host cell is bacterial.

[0087] Any or all of the components, as described herein, may be provided in the form of a kit. Thus, in one example, a kit comprising the construct as disclosed herein and an inducing agent is described. In another example, the kit comprises tamoxifen as an inducing agent, and/or a derivative thereof.

[0088] Described herein are also methods for using the claimed construct for genome editing. Thus, in one example, there is disclosed a method of editing a genome of a host cell using the construct as disclosed herein, wherein the host cell, comprising the nucleic acid sequence as defined herein, is incubated with an inducing agent. Also disclosed herein is a method of editing a genome of a host cell using the construct as defined herein, wherein the method comprises transfecting the host cell with the nucleic acid sequence as defined herein; and incubating the cell with an inducing agent. In another example, the transfection can be done using, for example, nucleofection or electroporation.

[0089] The invention illustratively described herein may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. Thus, for example, the terms comprising, including, containing, etc. shall be read expansively and without limitation. Additionally, the terms and expressions employed herein have been used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the inventions embodied therein herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention.

[0090] The invention has been described broadly and generically herein. Each of the narrower species and sub-generic groupings falling within the generic disclosure also form part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein.

[0091] Other embodiments are within the following claims and non-limiting examples. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.

EXPERIMENTAL SECTION

[0092] The CRISPR (clustered regularly interspaced short palindromic repeats)-Cas9 system enables ready modification of the mammalian genome and has been used to generate single or multiplexed gene knockouts, introduce specific point mutations, or insert epitope tags. However, there is a lack of generalizable methods to rapidly control the activity of the Cas9 endonuclease.

[0093] Disclosed herein is the development of a Cas9 variant, whose activity can be switched on and off in mammalian cells, for example, human cells, using an inducing agent, for example the chemical tamoxifen. Fusions of the wildtype Cas9 enzyme with the mutated hormone-binding domain of the estrogen receptor (ERT2) were generated. Furthermore, these Cas9 variants were systematically engineered by varying the position of ERT2 relative to Cas9, altering the number of ERT2 copies at the N- or C-terminus of Cas9, and testing different linker lengths and compositions. The optimized Cas9 variant (iCas) shows minimal endonuclease activity in the absence of tamoxifen but exhibits high editing efficiencies at multiple loci when the inducing agent is added. The duration and concentration of the inducing agent, for example tamoxifen, were also tuned so as to eliminate off-target genome modification. Additionally, iCas was utilised to target the Wnt signalling pathway, and it was demonstrated that genome modification and signalling perturbation occurred much more rapidly than with an alternative system that relied on a doxycycline-inducible promoter to drive Cas9 expression. The results highlight the utility of iCas for tight spatiotemporal control of genome editing activity.

Initial Development of a Chemical-Inducible Cas9 Variant

[0094] Different fusions of the ERT2 domain with wildtype Cas9 derived from the bacterium Streptococcus pyogenes (FIG. 1b) were constructed and tested. ERT2 was placed at either the N- or C-terminus of Cas9 and the position of the nuclear localization signal (NLS) was also varied. Using HEK293 cells, the constructs were evaluated for editing activity with and without 1 μM 4-hydroxytamoxifen (4HT) by targeting four distinct genomic loci: a coding exon of the EMX1 gene, an intron of the PPP1R12C gene, and two separate sites within the promoter region of the VEGFA gene. As determined using the Surveyor cleavage assay, which provides an estimation of the amount of genome modifications present, it was found that only variant E, an ERT2-Cas9-ERT2 fusion, showed low editing activity in the absence of 4HT, but significantly higher editing when the chemical was present (P<0.05, Student's t-test) across all four targeted loci (FIG. 1c). Illumina deep sequencing technology was also used to quantify the percentage of insertions and deletions (INDELs) generated at the targeted sites by each construct, and it was observed that addition of tamoxifen significantly increased the editing activity of variant E at all the targeted loci (P<0.05, Student's t-test), but did not have a consistent effect on the other variants tested (FIG. 1d). The results were further confirmed by Sanger sequencing of individual clones (FIG. 5). The difference in genome editing activity with and without tamoxifen was not due to a change in overall Cas9 protein levels, but was rather a result of a dramatic change in the amount of Cas9 present in the nucleus (FIG. 6). Taken together, these results indicate that the fusion of ERT2 domains to both the N- and C-terminus of Cas9 rendered the endonuclease activity of Cas9 dependent on tamoxifen by sequestering the enzyme in the cytoplasm in the absence of the inducer; upon addition of tamoxifen, the ERT2-Cas9-ERT2 fusion protein was able to translocate into the nucleus to perform its genome editing function.

Optimization of the ERT2-Cas9-ERT2 Architecture

[0095] All the initial fusion variants tested showed some background activity without tamoxifen, especially at the EMX1 exonic site and one of the VEGFA promoter sites. Hence, it was sought to develop the conditional genome editing system further. First, the lengths and amino acid compositions of the protein linkers between each ERT2 domain and the Cas9 enzyme were varied. Linker lengths that were tested ranged from 2 to 20 amino acids and the main focus was on the linker composition primarily of six amino acids (A, E, G, P, S, and T), which had previously been reported to be ideal for generating open flexible loops, and therefore polypeptides in stable conformations. Second, since the size of Cas9 is around four times that of Cre (160 kDa versus 40 kDa), it was reasoned that more copies of the ERT2 domain may be required to fully control the cellular localization and subsequent activity of the Cas9 nuclease. Thus, different copy numbers of ERT2 at either the N- or C-terminus of Cas9 were tested. In total, 30 variants with distinct configurations (FIG. 2a and Table 1) were further analysed. The variants were classified into four separate groups based on how they differed from the initial ERT2-Cas9-ERT2 fusion.

[0096] To assay the activities of all the Cas9 variants, a green fluorescent protein (GFP) disruption assay was employed, whereby cleavage and erroneous repair of a constitutively expressed GFP gene in HEK293 cells causes a loss of fluorescence signal which can be detected by flow cytometry (FIG. 7). Two different sgRNAs were used to target non-overlapping regions of the GFP gene. For comparison, the original ERT2-Cas9-ERT2 fusion (variant E) and the wildtype Cas9 enzyme were included, which provided an estimation of the maximum possible reduction in fluorescence signal. It was observed that cells transfected with the wildtype Cas9 enzyme showed a high reduction in GFP intensity regardless of whether 4-hydroxytamoxifen was present or absent (FIG. 2b). In contrast, all the tested variants exhibited an increased reduction of fluorescence signal upon 24 hours of 4-hydroxytamoxifen treatment. It was also observed that most of the variants showed some loss of fluorescence signal, even without the presence of tamoxifen, suggesting activity leakage. The variants that showed the least leakage belonged to Group 3 and Group 4, which contained two copies of ERT2 on the C-terminus of Cas9.

[0097] To confirm the results of the GFP disruption experiments, the T7 endonuclease I Surveyor assay was performed to detect genome modifications (FIG. 2c and FIG. 8b), and the mutation landscape was also analysed by Illumina deep sequencing (FIG. 2d, FIG. 8c and FIG. 9), using EMX1 as the test genomic locus. Consistent with the flow cytometry-based studies, it was found that varying the linker length or composition alone generally did not improve the performance of the inducible system. Instead, increasing the copy number of ERT2 domains, particularly on the C-terminus of Cas9, resulted in an overall level of background activity that was not significantly different from the control plasmid that did not express a sgRNA (no sgRNA). The fusion of additional ERT2 domains did not inactivate the Cas9 enzyme, as all the tested variants showed an increase in the amount of genome modifications, e.g. insertions and deletions (INDELs), upon 1 μM 4-hydroxytamoxifen treatment, as determined by the Surveyor assay or by deep sequencing.

[0098] Next, all data was examined together to identify the best performing variants. The rank orders of the Cas9 variants in at least two out of the three assays agreed well with one another (P<0.05, Kolmogorov-Smirnov test) (FIG. 10). Notably, 8 out of the 30 Cas9 variants demonstrated a consistently lower level of background activity than the original ERT2-Cas9-ERT2 fusion (variant E) across all experiments (FIG. 11). However, only three of these (variants 27, 29, and 30), all of which were from group 4, showed consistent and robust editing activity upon induction (FIG. 2d). Hence, variants 27, 29, and 30 were pursued further, as these gave a high percentage of genome modifications with 4-hydroxytamoxifen but a low percentage of INDELS without 4-hydroxytamoxifen.

Characterization and Performance of iCas Under Different 4-hydroxytamoxifen Treatment Regimes

[0099] In previous experiments, HEK293 cells had been transfected with the relevant plasmids, incubated for 24 hours, and then treated with 1 μM tamoxifen for another 24 hours. However, the amount of Cas9 in the cell has to be tightly controlled, because insufficient Cas9 will give rise to inefficient cleavage of the target genomic locus, while excess Cas9 may lead to unintended non-specific cleavage of off-target sites. Hence, the aim was to ascertain the behaviour of the optimized Cas9 variants under a range of tamoxifen treatment conditions, which would in turn determine the level of nuclease activity in the cell.

[0100] Three different concentrations of 4-hydroxytamoxifen (10 nM, 100 nM, and 1,000 nM) and seven durations of chemical treatment (2 hours, 4 hours, 6 hours, 8 hours, 16 hours, 24 hours, and 48 hours) were tested for variants 27, 29, and 30. The amount of genome modification at the EMX1 locus was quantified using the Surveyor assay (FIG. 3a and FIG. 12a). Cleavage activity was detected within 4 hours of 4-hydroxytamoxifen treatment for all three variants, showing an increasing trend with longer treatment durations and appearing to plateau at around 8 hours, which was further confirmed by deep sequencing (FIG. 12b). Notably, owing to its higher sensitivity, deep sequencing also revealed a low level of DNA editing after just 2 hours of 4-hydroxytamoxifen treatment. Additionally, it was found that 4-hydroxytamoxifen yielded a significantly lower level of nuclease activity at 10 nM than at 100 nM or 1,000 nM (P<0.005, Wilcoxon rank-sum test) (FIG. 13). Hence, either 100 nM or 1,000 nM of 4-hydroxytamoxifen was used in all subsequent experiments.
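The treatment grid described in this paragraph can be enumerated programmatically, for example when planning plate layouts. The following is a minimal sketch only; the variable names and the dictionary layout are illustrative assumptions, and only the concentrations, durations, and variant numbers are taken from the text.

```python
from itertools import product

# Values taken from the paragraph above; names are illustrative assumptions.
concentrations_nM = [10, 100, 1000]
durations_h = [2, 4, 6, 8, 16, 24, 48]
variants = [27, 29, 30]

# One entry per (variant, concentration, duration) combination.
conditions = [
    {"variant": v, "4HT_nM": c, "hours": h}
    for v, c, h in product(variants, concentrations_nM, durations_h)
]

print(len(conditions))
```

Untreated (0 nM) controls, which are discussed in the following paragraph, would be added as a further concentration level if required.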

[0101] A key performance measure of an inducible system is whether the system exhibits any background activity in the absence of the inducer. Surveyor assay showed a low amount of genome modification at the EMX1 locus for all three variants without 4-hydroxytamoxifen treatment (0 nM). Leaky activity per se was observed only at the last time point (48 hours) for variant 30 (FIG. 3a; FIG. 12b). From deep sequencing, leaky activity was first detected at 6 hours, 2 hours, and 16 hours for variants 27, 29, and 30, respectively (FIG. 12b). Subsequently, it was tested whether the three variants displayed any leaky activity at six other endogenous genomic loci, namely two sites in the promoter region of the VEGFA gene, two distinct sites in the intron of the WAS gene, one site in an intron of the TAT gene, and one site in the coding region of the FANCF gene. Genomic DNA was isolated 24 hours after transfection without any tamoxifen treatment and analysed using the Surveyor assay (FIG. 3). Consistent with the EMX1 results, a low amount of genome modification was observed at four loci for variant 27 and at two loci for variant 29. No cleavage bands were detected for variant 30 in the absence of the inducer. Additionally, the leakiness in activity observed for variants 27 and 29 became more pronounced over time (FIG. 14).

[0102] At 24 hours after transfection with a FANCF-targeting plasmid, cells were treated with or without tamoxifen for another 24 hours before genomic DNA was isolated and analysed by the Surveyor assay (FIG. 10). Although strong cleavage bands were observed when the cells had been exposed to 100 nM or 1000 nM tamoxifen, respectively, for all the three Cas9 variants, it was also possible to detect an increase in genome modification for variant 27 and variant 29 in the absence of tamoxifen. Again, no cleavage bands were observed for variant 30 without tamoxifen treatment.

[0103] To verify the results from the Surveyor assays and deep sequencing experiments, immunohistochemical staining was performed to determine the subcellular localization of the three variants, all of which contained two copies of ERT2 at both termini of the enzyme ((ERT2)2-Cas9-(ERT2)2), with or without 1 μM 4-HT. 24 hours after transfection with plasmids carrying a Cas9 variant and a sgRNA targeting the EMX1, VEGFA, FANCF, WAS, or TAT genomic locus, the cells were either fixed immediately and stained with anti-V5 or were subjected to 6 h or 24 h 4-HT treatment before fixation and staining (FIG. 15). The percentage of cells that showed a nuclear localization of (ERT2)2-Cas9-(ERT2)2 was quantified (FIG. 3b). For all three Cas9 variants, it was observed that addition of 4-hydroxytamoxifen led to a significant increase in the percentages of cells exhibiting a nuclear localization of (ERT2)2-Cas9-(ERT2)2 (P<0.05, Student's t-test). Most of the protein translocation occurred within the first 6 hours of 4-hydroxytamoxifen treatment. Importantly, in the absence of 4-hydroxytamoxifen, cells that were transfected with variant 30 showed significantly less nuclear localization of (ERT2)2-Cas9-(ERT2)2 than cells that were transfected with variant 27 or variant 29 (P<0.05, Student's t-test). Collectively, these data indicated that variant 30 had less background activity than variants 27 and 29 across multiple loci, thereby suggesting that variant 30 could be used for precise control of genome editing. Hence, all subsequent experiments were performed with variant 30, hereafter referred to as iCas.

[0104] It was sought to test the robustness of iCas by using it to target the VEGFA promoter as well as the WAS, TAT, and FANCF genes for different durations of 1 μM 4-HT treatment (2 hours, 4 hours, 6 hours, 8 hours, 16 hours, and 24 hours). Consistently, the Surveyor assay showed nuclease activity within 4 hours of 4-hydroxytamoxifen treatment for all loci tested (FIG. 3c). The editing activity continued to increase with longer treatment durations. Additionally, iCas showed similarly fast responses to 4-hydroxytamoxifen in different human cell lines (FIG. 16), including the cancer cell lines MCF7, DLD1, and HCT116. These results indicate that iCas is a robust inducible genome-editing system in mammalian cells.

Specificity of iCas at Endogenous Off-Target Sites

[0105] To assess the DNA cleavage specificity of iCas, the modification of known Cas9 off-target sites of the EMX1, VEGFA, FANCF, WAS, and TAT sgRNAs was measured. Twenty-four hours after transfection, HEK293 cells were treated with 1 μM 4-hydroxytamoxifen for different durations (4 hours, 6 hours, 8 hours, 16 hours, and 24 hours), and the Surveyor assay was used to assess editing activity at each off-target site (FIG. 3d and FIG. 17). Overall, cleavage at off-target sites tended to emerge later than at the corresponding on-target sites, or it occurred at lower levels, which was further confirmed by deep sequencing (FIG. 18). Nevertheless, the sgRNAs tested could be divided into three groups. In the first group, the sgRNAs were highly specific for their intended target (FIGS. 17a and 18a). For EMX1, the iCas system did not yield any measurable cleavage at the two off-target sites tested, but wild-type Cas9 produced off-target modifications as described previously (FIG. 19). In the second group, the sgRNAs were moderately specific, as exemplified by the TAT sgRNA (FIGS. 17b and 18b). Here, the optimal time window of 4-hydroxytamoxifen treatment for minimizing off-target effects appeared to be around 4 to 8 hours. In the third group, the sgRNAs were unspecific, and genome modifications could be detected at on-target and off-target sites at approximately the same time (FIGS. 17c and 18c). For these sgRNAs, it was not possible to tune the duration of chemical treatment to obtain the desired target genome modification without considerable off-target editing. Collectively, the data showed that limiting Cas9 activity is generally a viable strategy to improve the specificity of the endonuclease at most but not all genomic loci.

Comparison of iCas with a Promoter-Based Approach

[0106] As different methods may be adopted for inducible genome editing, iCas was compared with an alternative strategy whereby the wild-type Cas9 enzyme was expressed under a doxycycline (dox)-inducible promoter (P.sub.TRE3G-Cas9). To this end, a previously reported STF3A cell line that carries a Wnt-responsive luciferase reporter and also strongly expresses a Wnt ligand was used, thereby giving high reporter activity. It was reasoned, without being bound by theory, that if β-catenin, a key signal transducer in the Wnt pathway, was inactivated, luciferase expression would be reduced considerably. Thus, it was sought to use iCas or P.sub.TRE3G-Cas9 to knock out CTNNB1, which encodes β-catenin, and to determine how rapidly each conditional system could perturb Wnt signalling upon induction. Firstly, a gene encoding the Tet-On 3G transactivator, which binds to and activates expression from P.sub.TRE3G in the presence of doxycycline, was stably integrated into the STF3A cell line (FIG. 20a), and the functionality of the engineered (STF3A-Tet-On) cells was verified (FIG. 20b). Next, iCas or P.sub.TRE3G-Cas9 was used to target the second coding exon of CTNNB1 near the ATG start codon. 24 hours after transfection, cells were treated with 1 μM 4-hydroxytamoxifen or 1 μg/ml doxycycline for 6 hours, 12 hours, or 24 hours. The cells were then harvested for analysis using the Surveyor assay. iCas was consistently able to modify the target locus within 6 hours of 4-hydroxytamoxifen treatment, and the INDEL frequency increased with longer exposures to 4-hydroxytamoxifen (FIG. 4a). No cleavage bands were observed in the absence of 4-hydroxytamoxifen at any time point. However, for the P.sub.TRE3G-Cas9 system, cleavage bands were only observed after the cells were exposed to doxycycline for 24 hours.

[0107] To demonstrate the impact of genome modification at the CTNNB1 locus, luciferase assays were performed on the STF3A-Tet-On cell line after transfection with iCas or P.sub.TRE3G-Cas9. Cells were treated for 6 hours with the respective chemical and then harvested after another 72 hours to allow sufficient time for changes in β-catenin or luciferase protein levels. It was verified that both the transcript and protein levels of β-catenin were downregulated in cells co-transfected with iCas and a CTNNB1-targeting sgRNA (FIG. 21). Consequently, a significant decrease in luciferase activity was observed in these cells (P<0.001, Student's t-test) (FIG. 4b). In contrast, there was no significant change in β-catenin expression or luciferase activity in cells transfected with an EMX1-targeting sgRNA or P.sub.TRE3G-Cas9. Additionally, the expression profiles of known Wnt target genes paralleled the results from the luciferase assays (FIG. 4c and FIG. 22). Collectively, these data highlight the iCas system's advantage in speed over an alternative inducible-promoter approach in temporal control of genome-editing activity.

Benchmarking Different Post-Translational Control Systems

[0108] Two other chemical-inducible strategies that rely on post-translational control were recently reported, and it was sought to benchmark iCas against these other strategies. The best-performing intein-Cas9 and split-Cas9 constructs from these studies were cloned into the same plasmid backbone as iCas, and all experiments were performed side by side in HEK293 cells to ensure a fair comparison. The iCas and intein-Cas9 systems were induced with 1 μM 4-hydroxytamoxifen and the split-Cas9 system with 200 nM rapamycin, on the basis of published reports. For the comparison, the EMX1, TAT, and WAS genomic loci were targeted with or without the appropriate inducer. Different durations of chemical treatment were tested, and the extent of genome modification was measured by the Surveyor assay (FIG. 5a and FIG. 23a) and by deep sequencing (FIG. 24a). Overall, without the inducer, the split-Cas9 architecture showed the lowest level of background activity, and iCas and intein-Cas9 had comparable levels of leakiness. However, with the inducer, iCas consistently showed higher cleavage efficiency than intein-Cas9 and split-Cas9, at all time points and at all genomic loci. Notably, the amount of INDELs produced by active iCas was 1.6- to 4.8-fold higher than that produced by the reassembled split-Cas9 complex. Hence, without being bound by theory, the lower background observed in split-Cas9 appeared to be a consequence of an overall reduction in editing activity. Next, the switching ratio was calculated, which is defined as the extent of genome modification with the relevant inducer divided by the extent of genome modification without the inducer (FIG. 5b and FIGS. 23b and 24b). Overall, the iCas system and the split-Cas9 architecture produced similar switching ratios. However, in the Surveyor assay, iCas showed significantly higher ratios than intein-Cas9 at the EMX1 and WAS loci (P<0.1, Student's t-test), and in deep sequencing it showed significantly higher switching ratios than intein-Cas9 at all tested loci (P<0.05, Student's t-test). These results suggest that iCas is turned on more efficiently than intein-Cas9 upon addition of 4-hydroxytamoxifen.
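The switching ratio defined above (genome modification with the inducer divided by genome modification without the inducer) can be illustrated with a short calculation. The following is a sketch under stated assumptions: the function name, the `floor` parameter used to avoid division by zero when no background editing is detected, and the example percentages are all hypothetical and are not taken from the experiments described herein.

```python
def switching_ratio(indel_with_inducer, indel_without_inducer, floor=0.1):
    """Return the ratio of editing with inducer to editing without inducer.

    Both inputs are percentages of modified alleles. `floor` is an assumed
    lower bound (in percent) substituted for background values below it, so
    that an undetectable background does not cause a division by zero.
    """
    if indel_with_inducer < 0 or indel_without_inducer < 0:
        raise ValueError("INDEL percentages must be non-negative")
    background = max(indel_without_inducer, floor)
    return indel_with_inducer / background


# Hypothetical example: 30% INDELs with inducer, 2% without.
print(switching_ratio(30.0, 2.0))
```

A higher ratio indicates a system that is both strongly induced and tightly repressed; a low background alone does not give a high ratio if induced activity is also low.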

[0109] Besides single gene targeting, the ability of iCas to perform multiplex genome engineering was compared with that of intein-Cas9 or split-Cas9. HEK293 cells were co-transfected with a sgRNA targeting EMX1 and another sgRNA targeting a coding exon of ADAR1 (ADAR), and subsequently the extent of genome modification was analysed by the Surveyor assay. After 12 hours of chemical treatment, it was observed that iCas generated INDELs at both the EMX1 and ADAR1 genomic loci (FIG. 5c). In contrast, intein-Cas9 and split-Cas9 did not produce detectable cleavage at any of the targeted loci. Additionally, after 24 hours of chemical treatment, iCas produced more genome modification than at 12 hours, and intein-Cas9 was also able to edit both the EMX1 and ADAR1 loci (FIG. 5d). However, split-Cas9 still did not edit any of the targeted genes. These results were further confirmed with a different pair of sgRNAs (FIG. 40c). Collectively, these data highlight the advantage of iCas over intein-Cas9 and split-Cas9 in performing conditional multiplex genome editing.

Repeated Toggling of iCas Activity

[0110] In principle, a conditional system such as iCas should allow users to generate stable cell lines and induce its activity whenever needed. To demonstrate this, retroviral transduction was used to establish a HEK293 cell line that stably expresses iCas (HEK293-iCas cells). The cell line was verified to be functional (FIG. 25a), and the intracellular localization of iCas was monitored by immunofluorescence (FIG. 6a). Without 4-hydroxytamoxifen, most cells produced an iCas protein that was localized in the cytoplasm; only 15% of the cells contained nuclear-localized protein. However, upon 24 hour treatment with 4-hydroxytamoxifen, the proportion of cells with nuclear-localized protein increased significantly, to 48% (P<0.001, Student's t-test). The inducer was then washed away and the cells were immunostained with anti-V5 antibody at 48 and 72 hours after removal of 4-hydroxytamoxifen. Quantification of microscopy images showed that by 72 hours, the percentage of cells with nuclear-localized protein had decreased to a level that was not significantly different from that of the pre-induction state (FIG. 6b).

[0111] Subsequently, the possibility of toggling the activity of iCas was explored (FIG. 25b). After 1 μM 4-hydroxytamoxifen treatment of HEK293 cells co-transfected with iCas and a first sgRNA targeting the WAS locus, the inducer was removed and a waiting time of 72 hours was allowed to pass, thereby allowing nuclear-localized iCas protein to exit the nucleus before a second sgRNA targeting a coding exon of ASXL2 was introduced. The cells were then either treated with 4-hydroxytamoxifen a second time or left untreated. From the Surveyor assay, cleavage activity was readily observed at both targeted loci for cells that were treated twice with the inducer (FIG. 6c); however, cleavage was detected only at the WAS locus in cells that were exposed to 4-hydroxytamoxifen after the first transfection but not after the second transfection, indicating that iCas was successfully switched off after the first induction event. Hence, these results show that iCas is a reversible genome-editing system.

Methods

Cell Culture and Transfection

[0112] All cell lines were cultured in Dulbecco's Modified Eagle Medium (DMEM) supplemented with 10% FBS, 2 mM L-Glutamine and 1% penicillin/streptomycin. Transfection was performed in 12-well plates at around 70% cell confluency using either Turbofect (Thermo Scientific) or Lipofectamine 2000 (Life Technologies), according to manufacturers' instructions. When necessary, cells were treated with varying concentrations of 4-hydroxytamoxifen (Sigma Aldrich).

PCR and Mutagenesis

[0113] All oligonucleotides for PCR and mutagenesis reactions were purchased from Integrated DNA Technologies (IDT). PCR was performed with MyTaq DNA Polymerase (BioLine), Phusion High-Fidelity DNA Polymerase (New England Biolabs), or Q5 High-Fidelity DNA Polymerase (New England Biolabs). For MyTaq, the following cycling parameters were used: 95° C. for 3 minutes, followed by 35 cycles of (95° C. for 30 seconds, 60° C. for 30 seconds, and 72° C. for 30 seconds), and then 72° C. for 2 minutes. For Phusion and Q5, the following cycling parameters were used: 98° C. for 3 minutes followed by 40 cycles of (98° C. for 15 seconds, 63° C. for 30 seconds, and 72° C. for 30 seconds), and then 72° C. for 2 minutes. Mutagenesis was performed using QuikChange Lightning Site-Directed Mutagenesis kit (Agilent Technologies) according to manufacturer's instructions, in order to incorporate novel restriction sites or DNA linker fragments into the CRISPR-Cas9 variant plasmids. Mutagenic primers were designed using the QuikChange Primer Design Tool (http://www.genomics.agilent.com/primerDesignProgram.jsp).

Construction of Cas9 Variants

[0114] The GeneArt CRISPR nuclease vector (Life Technologies), which contains a human codon-optimized Streptococcus pyogenes Cas9 enzyme with a V5 epitope tag, was used as the wildtype Cas9 expression plasmid. The ERT2 domain was isolated using PCR from the pCAG-ERT2-Cre-ERT2 plasmid (Addgene #13777) and cloned into the pCR-BluntII-TOPO vector (Life Technologies). Different linkers and restriction sites were added using the QuikChange Lightning kit (Agilent Technologies). Each of the modified ERT2 fragments was flanked with either AgeI and SfoI or EcoRI and XbaI cut sites for cloning into the N- or C-terminus of Cas9, respectively. All Cas9 variants were confirmed by Sanger sequencing.

GFP Disruption Assay

[0115] HEK293-GFP stable cells were purchased from GenTarget. One day after seeding, cells were transfected using Lipofectamine 2000 (Life Technologies) according to manufacturer's instructions, with efficiency reaching at least about 70% per well. Experimental cells were treated with 1 mM 4-hydroxytamoxifen (Sigma Aldrich), while control cells remained in culture media devoid of tamoxifen. 5 days after transfection, cells were trypsinised and resuspended in PBS containing 2% FBS for analysis by flow cytometry. All the data were normalized to the average fluorescence intensity of cells transfected with a plasmid that did not express any sgRNA.
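The normalization step described above, in which all data are expressed relative to the average fluorescence of cells transfected with a plasmid lacking any sgRNA, can be sketched as follows. The function name and the example intensities are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean


def normalize_to_no_sgrna(sample_intensities, no_sgrna_intensities):
    """Divide each sample intensity by the mean of the no-sgRNA controls.

    A value of 1.0 means no loss of GFP signal relative to the control;
    lower values indicate disruption of the GFP gene.
    """
    baseline = mean(no_sgrna_intensities)
    if baseline <= 0:
        raise ValueError("control fluorescence must be positive")
    return [value / baseline for value in sample_intensities]


# Hypothetical mean fluorescence intensities (arbitrary units).
print(normalize_to_no_sgrna([50.0, 100.0], [90.0, 110.0]))
```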

Generation of STF3A-TetOn Stable Cells

[0116] STF3A cells were modified to stably express the Tet-On 3G transactivator protein via retroviral transduction and drug selection. Briefly, to generate retroviruses, GP2-293 cells were transfected at around 70% confluence with a transfection mix comprising 20 μg pCMV-VSVG envelope vector, 50 μg pRETROX-TET3G vector (Clontech), and 140 μl Lipofectamine 2000 (Life Technologies) diluted in 3.75 ml Opti-MEM (Life Technologies) and 7.5 ml DMEM containing 10% FBS. The transfection mix was substituted with 10 ml DMEM containing 5% FBS after 6 hours of incubation at 37° C. Retrovirus-containing medium was harvested after 24 hours and purified using Amicon Ultra-15 Centrifugal Filter Units (Merck Millipore). STF3A cells were then infected twice with 20 μl retroviruses each time and subsequently selected in DMEM containing 500 μg/ml G418 over 5 days. To test the expression of the transactivator gene, STF3A-TetOn cells were transfected with 1 μg pTRE-tdTomato vector (Addgene #50798) and observed for red fluorescence 24 hours after treatment with 1 μg/ml doxycycline.

Luciferase Assay

[0117] STF3A-TetOn cells were transfected with 1 μg iCas or pTRE3G-Cas9 and treated with 1 μM tamoxifen or 1 μg/ml doxycycline, respectively, for 6 hours. The cells were then trypsinised and re-seeded equally into a Corning 96-well flat clear bottom white plate. Samples were assayed for luciferase activity using Dual-Glo Luciferase (Promega) according to manufacturer's instructions. All measurements were taken using the i-control software for Tecan microplate readers. All firefly luciferase measurements were normalized to the corresponding renilla luciferase readings.
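The per-well normalization of firefly to renilla luciferase readings described above amounts to a pairwise ratio. A minimal sketch is given below; the function name and the example readings are hypothetical.

```python
def normalized_luciferase(firefly, renilla):
    """Return the firefly/renilla ratio for each well.

    `firefly` and `renilla` are equal-length sequences of readings taken
    from the same wells, in the same order.
    """
    if len(firefly) != len(renilla):
        raise ValueError("each firefly reading needs a matching renilla reading")
    return [f / r for f, r in zip(firefly, renilla)]


# Hypothetical luminescence readings (arbitrary units) for two wells.
print(normalized_luciferase([200.0, 50.0], [100.0, 100.0]))
```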

Surveyor Cleavage Assay

[0118] Genomic DNA was isolated from cells using the DNeasy Blood and Tissue Kit (Qiagen) and the loci-of-interest were amplified using Q5 High-Fidelity DNA Polymerase (New England Biolabs; see Table 3 for list of primers). The PCR products were purified using the GeneJET Gel Extraction Kit (Thermo Scientific). Subsequently, 250 ng DNA was incubated at 95° C. for 5 minutes in 1× NEBuffer 2 and then slowly cooled at a rate of 0.1° C./second. After annealing, 5 U T7 endonuclease I (New England Biolabs) was added to each sample and the reactions were incubated at 37° C. for 50 minutes. The T7E1-digested products were separated on a 2.5% agarose gel stained with GelRed (Biotium) and the gel bands were quantified using ImageJ.
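The text above does not state how the quantified gel bands were converted into a percentage of genome modification. One widely used convention for T7 endonuclease I and Surveyor assays, shown here purely as an assumption and not as the method used herein, estimates the INDEL percentage as 100 × (1 − √(1 − f)), where f is the fraction of total band intensity found in the cleaved bands. The function name and example intensities are hypothetical.

```python
from math import sqrt


def estimate_indel_percent(uncut, cut_1, cut_2):
    """Estimate percent INDELs from integrated band intensities.

    Assumed convention (not stated in the source text):
        f = (cut_1 + cut_2) / (uncut + cut_1 + cut_2)
        indel % = 100 * (1 - sqrt(1 - f))
    The square root accounts for heteroduplexes forming from one mutant
    and one wildtype strand after re-annealing.
    """
    total = uncut + cut_1 + cut_2
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    fraction_cleaved = (cut_1 + cut_2) / total
    return 100.0 * (1.0 - sqrt(1.0 - fraction_cleaved))


# Hypothetical ImageJ band intensities (arbitrary units).
print(round(estimate_indel_percent(64.0, 18.0, 18.0), 1))
```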

Illumina Deep Sequencing

[0119] Sequencing libraries were constructed via two rounds of PCR. In the first round, the loci-of-interest were amplified from genomic DNA using Q5 High-Fidelity DNA Polymerase (New England Biolabs) and the primers listed in Supplementary Table 4. Each forward primer contains the common sequence GCG TTA TCG AGG TC, while each reverse primer contains the common sequence GTG CTC TTC CGA TCT. In the second round, the PCR products from the first round were barcoded using Phusion High-Fidelity DNA Polymerase (New England Biolabs) and the following primers: Forward: AAT GAT ACG GCG ACC ACC GAG ATC TAC ACC CTA CAC GAG CGT TAT CGA GGT C; Reverse: CAA GCA GAA GAC GGC ATA CGA GAT (barcode) GTG ACT GGA GTT CAG ACG TGT GCT CTT CCG ATC T. 10 bp barcodes designed by Fluidigm for the Access Array System were used. All samples were sequenced on MiSeq (Illumina) to produce paired 151 bp reads.
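The relationship between the first-round common sequences and the second-round barcoding primers can be checked with a short script. The sketch below assembles a barcoded reverse primer from the sequences given above and confirms that each second-round primer ends with the corresponding first-round common sequence. The function name and the example barcode are hypothetical; the barcode shown is a placeholder and not an actual Fluidigm barcode.

```python
# Sequences copied from the paragraph above, with spaces removed.
FWD_COMMON = "GCGTTATCGAGGTC"
REV_COMMON = "GTGCTCTTCCGATCT"
FWD_ROUND2 = "AATGATACGGCGACCACCGAGATCTACACCCTACACGAGCGTTATCGAGGTC"
REV_ROUND2_5P = "CAAGCAGAAGACGGCATACGAGAT"
REV_ROUND2_3P = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"


def barcoded_reverse_primer(barcode):
    """Insert a 10-nt barcode between the two halves of the reverse primer."""
    barcode = barcode.upper()
    if len(barcode) != 10 or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be 10 nucleotides of A, C, G, T")
    return REV_ROUND2_5P + barcode + REV_ROUND2_3P


# Placeholder barcode for illustration only.
primer = barcoded_reverse_primer("ACGTACGTAC")
print(primer)
print(FWD_ROUND2.endswith(FWD_COMMON), primer.endswith(REV_COMMON))
```

The 3' overlap is what allows the second-round primers to anneal to the first-round amplicons.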

Cell Fractionation

[0120] HEK293 cells were fractionated using the Rapid Efficient And Practical (REAP) method. Briefly, the cells were scraped in ice-cold PBS, collected into 1.5 ml Eppendorf tubes, and pop-spun for 10 seconds in a table-top centrifuge. The supernatant was discarded and the pellet was lysed with 0.1% Igepal CA630 (Sigma Aldrich) in PBS supplemented with protease inhibitor (Calbiochem). Whole cell lysates were aliquoted and the remainder was pop-spun for 10 seconds. The supernatant, comprising the cytosolic fraction, was collected into a new tube. The pellet, comprising the nuclear fraction, was resuspended using 0.1% Igepal CA630 in PBS with protease inhibitor. Whole cell lysates and nuclear fractions were subjected to 10 cycles of sonication (each cycle consisted of 30 seconds sonication followed by 30 seconds rest).

Western Blot Analysis

[0121] Proteins from whole cell lysates, nuclear fractions, and cytosolic fractions were loaded in equal amounts for SDS-PAGE and then transferred onto a nitrocellulose membrane for western blot. The primary antibodies used were α-V5 (Life Technologies, 1:8000 dilution), α-3PGDH (Santa Cruz, 1:1000 dilution), and α-total histone H3 (Abcam, 1:10000 dilution). Primary antibodies were diluted in TBST+5% milk and incubated overnight at 4° C. Secondary antibodies were used at a 1:2500 dilution in TBST+5% milk. Membranes were exposed after addition of WesternBright Sirius HRP substrate (Advansta).

Immunohistochemistry

[0122] Paraformaldehyde-fixed HEK293 cells were first incubated with blocking solution (10% FBS in 0.1 M PBS) (JR Scientific Inc) for 30 minutes and then quenched with 3% hydrogen peroxide. Next, the samples were incubated for 2 hours at room temperature or overnight at 4° C. with primary antibody specific against the V5 epitope tag (Life Technologies) in blocking solution. Negative controls were incubated with blocking solution without any primary antibody. Subsequently, the samples were thoroughly washed with PBS and then incubated for 1 hour at room temperature with secondary horseradish peroxidase (HRP)-conjugated antibody (GE Healthcare UK Ltd). After further incubation with DAB substrate (Vector Laboratories) for 10 minutes at room temperature, the cover slips were washed with distilled water, counter-stained with hematoxylin (Vector Laboratories) for 10 minutes to reveal cellular material, and mounted onto glass slides (Thermo Scientific). All slides were viewed and imaged using a light microscope (Zeiss Axio Imager Z1 with attached Leica Axiocam MRc5 camera) with the appropriate filters.

Tables

[0123]

TABLE-US-00001
TABLE 1. List of Cas9 variants constructed and tested. Amino acids for the different protein linkers are given in bold letters.

No.  Details
1    NLS-TG-ERT2-SGSETPGTSESAGA-Cas9-NLS-ERT2
2    NLS-TG-ERT2-SGSEGA-Cas9-NLS-ERT2
3    NLS-TG-ERT2-GGSGGSGA-Cas9-NLS-ERT2
4    NLS-TG-ERT2-GTSESATPESGGA-Cas9-NLS-ERT2
5    NLS-TG-ERT2-SGSETPGTGA-Cas9-NLS-ERT2
6    NLS-TG-ERT2-SESATPESGA-Cas9-NLS-ERT2
7    NLS-TGGGS-ERT2-SGSETPGTGA-Cas9-NLS-ERT2
8    NLS-TGGGS-ERT2-SGSETPGTPGGA-Cas9-NLS-ERT2
9    NLS-TG-ERT2-GASGSKTPG-Cas9-NLS-ERT2
10   NLS-TG-ERT2-TPESGA-Cas9-NLS-ERT2
11   NLS-TGPGGS-ERT2-GA-Cas9-NLS-ERT2
12   NLS-TGGGS-ERT2-SGSETPGTSEGA-Cas9-NLS-ERT2
13   NLS-TGGGS-ERT2-TPESGA-Cas9-NLS-ERT2
14   NLS-TGPGGSAGDTTGPGGS-ERT2-GA-Cas9-NLS-ERT2
15   NLS-TGGGS-ERT2-SESATPESGA-Cas9-NLS-ERT2
16   NLS-TGGGS-ERT2-SGSEGA-Cas9-NLS-ERT2
17   NLS-TG-ERT2-PG-Cas9-NLS-ERT2
18   NLS-TG-ERT2-GA-Cas9-NLS-SGS-ERT2
19   NLS-TG-ERT2-GA-Cas9-NLS-GGGS-ERT2
20   NLS-TG-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-NLS-ERT2
21   NLS-TGGGS-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-NLS-ERT2
22   NLS-TGGGS-ERT2-PRGGS-ERT2-TPESGA-Cas9-NLS-ERT2
23   NLS-TG-ERT2-GA-Cas9-NLS-ERT2-PAG-ERT2
24   NLS-TG-ERT2-GA-Cas9-NLS-ERT2-PAGGGS-ERT2
25   NLS-TG-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-NLS-ERT2-PAGGGS-ERT2
26   NLS-TGGGS-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-NLS-ERT2-PAGGGS-ERT2
27   NLS-TGGGS-ERT2-PRGGS-ERT2-TPESGA-Cas9-NLS-ERT2-PAG-ERT2
28   NLS-TGGGS-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-NLS-ERT2-PAG-ERT2
29   NLS-TG-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-NLS-ERT2-PAG-ERT2
30   NLS-TGGGS-ERT2-PRGGS-ERT2-TPESGA-Cas9-NLS-ERT2-PAGGGS-ERT2

TABLE-US-00002
TABLE 2. Non-specific off-target sites investigated in this study.

Target    Location          Site   Sequence
EMX1      Chr2:73160982     On     GAGTCCGAGCAGAAGAAGAAggg
          Chr5:45359083     Off1   GAGTTAGAGCAGAAGAAGAAagg
          Chr15:44109747    Off2   GAGTCTAAGCAGAAGAAGAAgag
VEGFAP1   Chr6:43737313     On     GGGTGGGGGGAGTTTGCTCCtgg
          Chr15:65637553    Off1   GGATGGAGGGAGTTTGCTCCtgg
          Chr17:39796344    Off2   TAGTGGAGGGAGCTTGCTCCtgg
          Chr1:99347667     Off3   GGGGAGGGGAAGTTTGCTCCtgg
VEGFAP2   Chr6:43737454     On     GGTGAGTGAGTGTGTGCGTGtgg
          Chr9:16681608     Off1   AGTGAGTGAGTGTGTGTGTGggg
          Chr5:89440985     Off2   AGAGAGTGAGTGTGTGCATGagg
          Chr5:115434659    Off3   TGTGGGTGAGTGTGTGCGTGagg
          Chr22:37662840    Off4   GCTGAGTGAGTGTATGCGTGtgg
WASI1     ChrX:48544569     On     TGGATGGAGGAATGAGGAGTtgg
          Chr1:30597854     Off1   TGGATGGAGGGATGAGGAGTggg
          Chr2:242451414    Off2   GGGATGGAGGGATGAGGAGTggg
          Chr18:21810215    Off3   AGGAGGGAGGAATGGGGAGTtgg
WASI2     ChrX:48544562     On     CCCATCCATCCAGAGACACAggg
          ChrX:90817748     Off1   CTCTTCCACCCAGAGACACAggg
TAT       Chr16:71609818    On     TCCTCCTGAGACTCCATACCtgg
          Chr6:12810776     Off1   CCATCCTGAGACTCCATACCtgg
FANCF     Chr11:22647354    On     GGAATCCCTTCTGCAGCACCtgg
          Chr18:8707544     Off1   GGAACCCCGTCTGCAGCACCagg
          Chr10:43410014    Off2   GGAGTCCCTCCTACAGCACCagg
          Chr10:37953183    Off3   GGAGTCCCTCCTACAGCACCagg
          Chr17:78923961    Off4   AGAGGCCCCTCTGCAGCACCagg
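Each off-target site in Table 2 differs from its on-target site at only a few positions. The Python sketch below shows one way to locate those mismatches, using the EMX1 entries from the table. It assumes that the three lowercase bases at the end of each sequence are the PAM and that positions are counted from the 5' end of the 20-nt target; the function name is hypothetical.

```python
from typing import List


def mismatch_positions(on_target: str, off_target: str, pam_len: int = 3) -> List[int]:
    """Return 1-based positions (from the 5' end) where two target sites differ.

    Assumption: the last `pam_len` bases of each sequence are the PAM and are
    excluded from the comparison.
    """
    if len(on_target) != len(off_target):
        raise ValueError("sequences must have the same length")
    on_site = on_target[:-pam_len].upper()
    off_site = off_target[:-pam_len].upper()
    return [i + 1 for i, (a, b) in enumerate(zip(on_site, off_site)) if a != b]


# EMX1 entries copied from Table 2.
EMX1_ON = "GAGTCCGAGCAGAAGAAGAAggg"
EMX1_OFF = {
    "Off1": "GAGTTAGAGCAGAAGAAGAAagg",
    "Off2": "GAGTCTAAGCAGAAGAAGAAgag",
}

for name, seq in EMX1_OFF.items():
    print(name, mismatch_positions(EMX1_ON, seq))
# Off1 [5, 6]
# Off2 [6, 7]
```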

TABLE-US-00003
TABLE 3. PCR primers used for the Surveyor cleavage assay.

Primer Name                    Primer Sequence
EMX1_On_Set1_FOR               GCCCCTAACCCTATGTAGCC
EMX1_On_Set1_REV               GGAGATTGGAGACACGGAGA
EMX1_On_Set2_FOR               CTGTGTCCTCTTCCTGCCCT
EMX1_On_Set2_REV               CTCTCCGAGGAGAAGGCCAA
EMX1_Off1_FOR                  TTGAGACATGGGGATAGAATCA
EMX1_Off1_REV                  CAGGAATAGCCCTACAAAGGTG
EMX1_Off2_FOR                  GTTCTGTAAACGCCGTAGCC
EMX1_Off2_REV                  GGATGCAGTCTGCCTTTTTG
PPP1R12C_On_Set1_FOR           GTCTAACCCCCACCTCCTGT
PPP1R12C_On_Set1_REV           ACACCTAGGACGCACCATTC
PPP1R12C_On_Set2_FOR           CGGTTAATGTGGCTCTGGTT
PPP1R12C_On_Set2_REV           CGCACGGAGGAACAATATAAA
VEGFA_Promoter1_On_Set1_FOR    CTGGACACTTCCCAAAGGAC
VEGFA_Promoter1_On_Set1_REV    AGGGAGCAGGAAAGTGAGGT
VEGFA_Promoter1_On_Set2_FOR    TCACTGACTAACCCCGGAAC
VEGFA_Promoter1_On_Set2_REV    CTGAGAGCCGTTCCCTCTTT
VEGFA_Promoter1_Off1_FOR       GGGCTAGAGTGTAGTGGCACA
VEGFA_Promoter1_Off1_REV       GCCCTGTTTTCATCCTACACA
VEGFA_Promoter1_Off2_FOR       AAGTTGGGCAAGAGTCCAGA
VEGFA_Promoter1_Off2_REV       ACCAGCAGAGGAAGGGCTAT
VEGFA_Promoter1_Off3_FOR       TGCCATTTTTAAGCCATCAG
VEGFA_Promoter1_Off3_REV       AGCCCATTCTTTTTGCAGTG
VEGFA_Promoter2_On_FOR         CCAGATGGCACATTGTCAGA
VEGFA_Promoter2_On_REV         CCAAGGTTCACAGCCTGAAA
VEGFA_Promoter2_Off1_FOR       GCCGTCTGTTAGAGGGACAA
VEGFA_Promoter2_Off1_REV       GTCTTCCCCCAACCTCCAGT
VEGFA_Promoter2_Off2_FOR       GGCCCAATCTTAGTGTTTCAGA
VEGFA_Promoter2_Off2_REV       TGGTTAAAAGCAAAGGATGTGA
VEGFA_Promoter2_Off3_FOR       CCCTCGCTAGATACTGAGGAAA
VEGFA_Promoter2_Off3_REV       TGGCCAAGATAAGGAAACAAC
VEGFA_Promoter2_Off4_FOR       TGATTCCGCTGACACGTAAC
VEGFA_Promoter2_Off4_REV       TTCAGAGCCTCTCACCACCT
WAS_Intron1-2_On_Set1_FOR      CAGCCAATGAAGGTGAGTCC
WAS_Intron1-2_On_Set1_REV      GTGGATCCCACAAACCATTC
WAS_Intron1-2_On_Set2_FOR      AGGAATCAGAGGCAAAGTGG
WAS_Intron1-2_On_Set2_REV      TCCCATCAATTCATCCCTCT
WAS_Intron1_Off1_FOR           CTGTCCTCTCTGCAGGAACC
WAS_Intron1_Off1_REV           GTCTGGATCCCTGCATCACT
WAS_Intron1_Off2_FOR           CGAGGTTCCAGAATGCTCTT
WAS_Intron1_Off2_REV           GGGAGGCTAAACCCTGAAAC
WAS_Intron1_Off3_FOR           TCTTCAATGTTCCCCCACAT
WAS_Intron1_Off3_REV           AGGCTGCCATTGTCTGAAGT
WAS_Intron2_Off1_Set1_FOR      TCTCAGAGATACAAGGGAAATCG
WAS_Intron2_Off1_Set1_REV      CCAGCAGACTCTGGGTCTATTTA
WAS_Intron2_Off1_Set2_FOR      TACAAGGGAAATCGTGAGACC
WAS_Intron2_Off1_Set2_REV      AGTCAGCATGCAGATTCTGGT
TAT_On_FOR                     GACAACATGAAGGTGAAACCAA
TAT_On_REV                     GTCAAAGAAAGCCAGGAAAGAA
TAT_Off1_FOR                   TGTGGTTGGTTGGTTTGTTG
TAT_Off1_REV                   GTGACCAAGCAGGCTCTTTC
FANCF_On_FOR                   ACCTCTTTGTGTGGCGAAAG
FANCF_On_REV                   CCAGGCTCTCTTGGAGTGTC
FANCF_Off1_FOR                 CAGACTTCACCACCATGCAC
FANCF_Off1_REV                 GGCCAGTCCTTTGTAAGCAT
FANCF_Off2_FOR                 AATGTAAGAGGCAACCAAAGGA
FANCF_Off2_REV                 GTTAATGGAAGGTGAAGGCAGT
FANCF_Off3_FOR                 AATGCAAGAGGCAAACAAAAA
FANCF_Off3_REV                 CCAACATCTTCACAAGGGTTC
FANCF_Off4_FOR                 CAACCTTCATCCTTGGCTTG
FANCF_Off4_REV                 GAGACAGAGCCATGCAACCTA
CTNNB_1_On_FOR                 GCCACCAGCAGGAATCTAGT
CTNNB_1_On_REV                 TCAAAACTGCATTCTGACTTTCA
ADAR1_On_FOR                   GGGCAGGAACCTGTCATAAA
ADAR1_On_REV                   CCCTTGTTCAGCCAAGATTC
TCF7_On_FOR                    TTCCTTCCCAAGTCAGGAACT
TCF7_On_REV                    TATGGGAGAAAAGACCAGCAC
PARP4_On_FOR                   GGACTTCCAGCTTTTTGCAC
PARP4_On_REV                   TTGCTCTCGGGATTTTAGGA
ASXL2_On_FOR                   CATGGCAGCCCCTTTCTAT
ASXL2_On_REV                   GCCTGGCCATAAGTCATTTT

TABLE-US-00004
TABLE 4. PCR primers used for making Illumina sequencing libraries.

Primer Name                       Primer Sequence
EMX1_On_Adapter_FOR               GCGTTATCGAGGTCGGGCCTCCTGAGTTTCTCAT
EMX1_On_Adapter_REV               GTGCTCTTCCGATCTGTGGTTGCCCACCCTAGTC
EMX1_Off1_Adapter_FOR             GCGTTATCGAGGTCTGCACATGTATGTACAGGAGTCAT
EMX1_Off1_Adapter_REV             GTGCTCTTCCGATCTCACCTTTTAAGATCTGACAGAGAAA
EMX1_Off2_Adapter_FOR             GCGTTATCGAGGTCTGGGCGAGAAAGGTAACTTATG
EMX1_Off2_Adapter_REV             GTGCTCTTCCGATCTACTGTTTCACTGCCTACCTTCC
PPP1R12C_On_Adapter_Set1_FOR      GCGTTATCGAGGTCGATCAGTGAAACGCACCAGA
PPP1R12C_On_Adapter_Set1_REV      GTGCTCTTCCGATCTGTCTAACCCCCACCTCCTGT
PPP1R12C_On_Adapter_Set2_FOR      GCGTTATCGAGGTCGTCAGAGCAGCTCAGGTTCTG
PPP1R12C_On_Adapter_Set2_REV      GTGCTCTTCCGATCTTAGGCCTCCTCCTTCCTAGTCT
VEGFA_Promoter1_On_Adapter_FOR    GCGTTATCGAGGTCGCACATTGTCAGAGGGACAC
VEGFA_Promoter1_On_Adapter_REV    GTGCTCTTCCGATCTCACACGTCCTCACTCTCGAA
VEGFA_Promoter1_Off1_Adapter_FOR  GCGTTATCGAGGTCTCTCAAACTCCTGGGCTCAA
VEGFA_Promoter1_Off1_Adapter_REV  GTGCTCTTCCGATCTCTGGTTTTTGGTTTGGGAAA
VEGFA_Promoter1_Off2_Adapter_FOR  GCGTTATCGAGGTCCCCTCTCCATGAAACTTTGC
VEGFA_Promoter1_Off2_Adapter_REV  GTGCTCTTCCGATCTAGGGCAAAACAGGAGAACAG
VEGFA_Promoter1_Off3_Adapter_FOR  GCGTTATCGAGGTCGCATCTCTGCCTTCATTGCT
VEGFA_Promoter1_Off3_Adapter_REV  GTGCTCTTCCGATCTGCCTACTCCAGGGTTTCTCA
VEGFA_Promoter2_On_Adapter_FOR    GCGTTATCGAGGTCGCAGACGGCAGTCACTAGG
VEGFA_Promoter2_On_Adapter_REV    GTGCTCTTCCGATCTCCGTTCCCTCTTTGCTAGG
VEGFA_Promoter2_Off1_Adapter_FOR  GCGTTATCGAGGTCGATCCGGTGCTGCAGTGA
VEGFA_Promoter2_Off1_Adapter_REV  GTGCTCTTCCGATCTGCTCTCCACCTCGATGTCA
VEGFA_Promoter2_Off2_Adapter_FOR  GCGTTATCGAGGTCTCAAAGTTTCACATGGTTGC
VEGFA_Promoter2_Off2_Adapter_REV  GTGCTCTTCCGATCTGTGTGGAGGGTGGGACCT
VEGFA_Promoter2_Off3_Adapter_FOR  GCGTTATCGAGGTCATTATGCGTATTCAGGGTGTGC
VEGFA_Promoter2_Off3_Adapter_REV  GTGCTCTTCCGATCTGCTGGTCAGAGGGTACAACTTTT
VEGFA_Promoter2_Off4_Adapter_FOR  GCGTTATCGAGGTCGGTTAGGAGAGCTGGCTTGGA
VEGFA_Promoter2_Off4_Adapter_REV  GTGCTCTTCCGATCTCTGGCCTCGGCCTCTCA
WAS_Intron1-2_On_Adapter_FOR      GCGTTATCGAGGTCGGCAGGGCTGTGATAACTCT
WAS_Intron1-2_On_Adapter_REV      GTGCTCTTCCGATCTATCTACCGCCAATCCATCC
WAS_Intron1_Off1_Adapter_FOR      GCGTTATCGAGGTCACGGCATGGAATTATTTGGTT
WAS_Intron1_Off1_Adapter_REV      GTGCTCTTCCGATCTGCCTGGGAGAGAAATCAACTC
WAS_Intron1_Off2_Adapter_FOR      GCGTTATCGAGGTCACTGTGTAGGAAGCCCACTCTC
WAS_Intron1_Off2_Adapter_REV      GTGCTCTTCCGATCTAAAGCTTGGTGACAGTGAAATG
WAS_Intron1_Off3_Adapter_FOR      GCGTTATCGAGGTCCATGAAGGGAAGAGGTGCAT
WAS_Intron1_Off3_Adapter_REV      GTGCTCTTCCGATCTCCAACGTGACCCTTTTTGAG
WAS_Intron2_Off1_Adapter_FOR      GCGTTATCGAGGTCTCACAGTCTCTTCCCCTGCT
WAS_Intron2_Off1_Adapter_REV      GTGCTCTTCCGATCTCTTGGCCAGTGTCTTTCCAT
TAT_On_Adapter_FOR                GCGTTATCGAGGTCTGTGTTTGGAAACCTGCCTA
TAT_On_Adapter_REV                GTGCTCTTCCGATCTCCAAATCCAAAGGACCATGT
TAT_Off1_Adapter_FOR              GCGTTATCGAGGTCCATCCCCTGGCATCTAGAAA
TAT_Off1_Adapter_REV              GTGCTCTTCCGATCTTCACTACCTGGTGGCTATGG
FANCF_On_Adapter_FOR              GCGTTATCGAGGTCAGCATTGCAGAGAGGCGTAT
FANCF_On_Adapter_REV              GTGCTCTTCCGATCTATGGATGTGGCGCAGGTAG
FANCF_Off1_Adapter_FOR            GCGTTATCGAGGTCCACAGATTGATGCCACTGGA
FANCF_Off1_Adapter_REV            GTGCTCTTCCGATCTACGCCAGCACTTTCTAAGGA
FANCF_Off2-3_Adapter_FOR          GCGTTATCGAGGTCTTACCAGATGGAGGACAGTGA
FANCF_Off2-3_Adapter_REV          GTGCTCTTCCGATCTACCAGTTTGAGACCTCTGACC
FANCF_Off4_Adapter_FOR            GCGTTATCGAGGTCGGCTCTGGGTACAGTTCTGC
FANCF_Off4_Adapter_REV            GTGCTCTTCCGATCTGCCACAGACGAAGACACAGA

TABLE-US-00005
TABLE 1. List of Cas9 variants constructed and tested. Amino acids for the different protein linkers are given in bold.

SEQ ID No.  No.  Details
15          17   NLS-PR-ERT2-PG-Cas9-ERT2
16          2    NLS-TG-ERT2-SGSEGA-Cas9-ERT2
17          9    NLS-TG-ERT2-GASGSKTPG-Cas9-ERT2
18          1    NLS-TG-ERT2-SGSETPGTSESAGA-Cas9-ERT2
19          5    NLS-TG-ERT2-SGSETPGTGPGGA-Cas9-ERT2
20          6    NLS-TG-ERT2-SESATPESGA-Cas9-ERT2
21          4    NLS-TG-ERT2-GTSESATPESGGA-Cas9-ERT2
22          3    NLS-TG-ERT2-GGSGGSGA-Cas9-ERT2
23          11   NLS-TGPGPGGS-ERT2-GA-Cas9-ERT2
24          14   NLS-TGPGPGGSAGDTTGPGTGPG-ERT2-GA-Cas9-ERT2
25          19   NLS-TG-ERT2-GA-Cas9-GGGS-ERT2
26          13   NLS-TGGGS-ERT2-TPESGA-Cas9-ERT2
27          15   NLS-TGGGS-ERT2-SESATPESGA-Cas9-ERT2
28          16   NLS-TGGGS-ERT2-SGSEGA-Cas9-ERT2
29          7    NLS-TGGGS-ERT2-SGSETPGTGA-Cas9-ERT2
30          12   NLS-TGGGS-ERT2-SGSETPGTSEGA-Cas9-ERT2
31          22   NLS-TGGGS-ERT2-PRGGS-ERT2-TPESGA-Cas9-ERT2
32          21   NLS-TGGGS-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-ERT2
33          23   NLS-TG-ERT2-GA-Cas9-ERT2-PAG-ERT2
34          24   NLS-TG-ERT2-GA-Cas9-ERT2-PAGGGS-ERT2
35          8    NLS-TGGGS-ERT2-SGSETPGTPGGA-Cas9-ERT2
36          27   NLS-TGGGS-ERT2-PRGGS-ERT2-TPESGA-Cas9-ERT2-PAG-ERT2
37          30   NLS-TGGGS-ERT2-PR-ERT2-TPESGA-Cas9-ERT2-PAGGGS-ERT2
38          25   NLS-TG-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-ERT2-PAGGGS-ERT2
39          28   NLS-TGGGS-ERT2-TPGGPRGGS-ERT2-TPESGA-Cas9-ERT2-PAG-ERT2

CPC classification: C12N2310/20; C12N9/22; C12N2800/80; C07K14/70567; C12N15/11; C07K14/721 (all CHEMISTRY; METALLURGY)

International classification: C12N9/22; C07K14/72; C12N15/11 (all CHEMISTRY; METALLURGY)
