CHEMICAL-INDUCIBLE GENOME ENGINEERING TECHNOLOGY
20190382740 · 2019-12-19
Inventors
Cpc classification
C12N2310/20
CHEMISTRY; METALLURGY
C12N9/22
CHEMISTRY; METALLURGY
C12N2800/80
CHEMISTRY; METALLURGY
C07K14/70567
CHEMISTRY; METALLURGY
C12N15/11
CHEMISTRY; METALLURGY
International classification
C12N9/22
CHEMISTRY; METALLURGY
Abstract
The present disclosure refers to an endonuclease-based gene editing construct, wherein the construct comprises a CRISPR-associated endonuclease (such as Cas9 or Cpf1) or a derivative thereof and at least one or more hormone binding domains of the estrogen receptor (ERT2) or derivatives thereof. The present disclosure also describes a method of editing a genome of a host cell using the construct as disclosed herein, the method comprising transfecting the host cell with the nucleic acid sequence as defined herein and incubating the cell with an inducing agent.
Claims
1. An endonuclease-based gene editing construct, wherein the construct comprises the following components: (a) a CRISPR-associated endonuclease or a derivative thereof; and (b) at least one or more hormone binding domains of the estrogen receptor (ERT2) or derivatives thereof.
2. The construct of claim 1, wherein the at least one or more mutated hormone binding domains of the estrogen receptor (ERT2) are located upstream or located downstream of the CRISPR-associated endonuclease, wherein if there are two or more ERT2, the ERT2 are all located upstream, or all located downstream, or located both upstream and downstream of the CRISPR-associated endonuclease.
3. The construct of claim 1, wherein the mutated hormone binding domain of the estrogen receptor (ERT2) is SEQ ID NO: 4 or derivatives thereof.
4. The construct of claim 1, wherein the construct further comprises one or more selected from the group of one or more localization sequences, a binding tag, a self-cleaving peptide, and a selectable marker.
5.-7. (canceled)
8. The construct of claim 1 comprising the following formula (I): ##STR00003## wherein A is absent or is a mutated hormone binding domain of the estrogen receptor (ERT2), or a binding tag; wherein B is a localization sequence or derivatives thereof, or the binding tag, or absent; wherein both C.sub.1 and C.sub.2 are present or only C.sub.1 is present; wherein C.sub.1 and C.sub.2 are each independently selected from the group consisting of the localization sequence, derivatives of the localization sequence, and a mutated hormone binding domain of the estrogen receptor (ERT2); wherein when C.sub.1 is one mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is another mutated hormone binding domain of the estrogen receptor (ERT2); wherein X is a CRISPR-associated endonuclease or a derivative thereof; wherein D is selected from the group consisting of a mutated hormone binding domain of the estrogen receptor (ERT2), the localization sequence and derivatives of the localization sequence; wherein E is absent or is selected from the group consisting of a mutated hormone binding domain of the estrogen receptor (ERT2) and a self-cleaving peptide; wherein F is absent or is selected from the group consisting of the self-cleaving peptide and a selectable marker; wherein G is absent or is the selectable marker; wherein L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.6, L.sup.7 and L.sup.8 are linker sequences; wherein at least one of the linker sequences is present; wherein each of the linker sequences is independently between 1 and 25 amino acids long; wherein each linker sequence independently comprises natural or unnatural or a mixture of natural and unnatural amino acids; wherein, if any one or more of the linker sequences of L.sup.1 to L.sup.8 is absent, the neighbouring substituents are bound by a peptide bond; wherein L.sup.1 is selected from the group consisting of PR, TG, TGPGPGGS, TGPGPGGSAGDTTGPGTGPG and TGGGS; wherein L.sup.2 is selected from the group consisting of PRGGS, GGSPRGGS, PR, GGSPRGGS and TPGGPRGGS; wherein L.sup.3 is selected from the group consisting of PG, SGSEGA, GASGSKTPG, SGSETPGTSESAGA, SGSETPGTGPGGA, SESATPESGA, GTSESATPESGGA, GGSGGSGA, GA, GGGS, TPESGA, SGSETPGTGA, SGSETPGTSEGA, PAG, PAGGGS, SGSETPGTPGGA, TPESGPGGA and GASGS; wherein L.sup.4 is GGGS; wherein L.sup.5 is PAG or PAGGGS; wherein L.sup.6 is GA; wherein L.sup.7 and L.sup.8 are independently selected from the linkers as disclosed in any of L.sup.1 to L.sup.6.
9. The construct of claim 8, wherein a) A is absent, B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent; b) A is the mutated hormone binding domain of the estrogen receptor (ERT2), B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent; c) A is the binding tag, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent; d) D is the localization sequence, E is the self-cleaving peptide, F is the selectable marker and G is absent; e) D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker; f) D and E are each one mutated hormone binding domain of the estrogen receptor (ERT2); g) D is the mutated hormone binding domain of the estrogen receptor (ERT2) and E is absent; h) A is the mutated hormone binding domain of the estrogen receptor (ERT2), B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the self-cleaving peptide, F is the selectable marker and G is absent; i) A is the binding tag, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the self-cleaving peptide, F is the selectable marker and G is absent; j) A is absent, B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker; k) A is the mutated hormone binding domain of the estrogen receptor (ERT2), B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker; l) A is the binding tag, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker; m) the linker sequences comprise the amino acids A, E, G, P, S and T; or n) the linker sequences consist of the amino acids A, E, G, P, S and T.
10.-20. (canceled)
21. The construct of claim 1 comprising the following formula (II): ##STR00004## wherein B is a localization sequence or derivatives thereof, or the binding tag; wherein both C.sub.1 and C.sub.2 are present or only C.sub.1 is present; wherein C.sub.1 and C.sub.2 are each independently selected from the group consisting of the localization sequence, derivatives of the localization sequence, and a mutated hormone binding domain of the estrogen receptor (ERT2); wherein when C.sub.1 is one mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is another mutated hormone binding domain of the estrogen receptor (ERT2); wherein X is a CRISPR-associated endonuclease or a derivative thereof; wherein D is selected from the group consisting of a mutated hormone binding domain of the estrogen receptor (ERT2), the localization sequence and derivatives of the localization sequence; wherein E is absent or is selected from the group consisting of a mutated hormone binding domain of the estrogen receptor (ERT2) and a self-cleaving peptide; wherein F is absent or is selected from the group consisting of the self-cleaving peptide, the mutated hormone binding domain of the estrogen receptor (ERT2) and a selectable marker; wherein G is absent or is the selectable marker; wherein L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.7 and L.sup.8 are linker sequences; wherein at least one of the linker sequences is present; wherein each of the linker sequences is independently between 1 and 25 amino acids long; wherein each linker sequence independently comprises natural or unnatural or a mixture of natural and unnatural amino acids; wherein the linker sequences comprise the amino acids A, E, G, P, S and T; wherein, if any one or more of the linker sequences of L.sup.1 to L.sup.5 and L.sup.7 to L.sup.8 is absent, the neighbouring substituents are bound by a peptide bond; wherein L.sup.1 is selected from the group consisting of PR, TG, TGPGPGGS, TGPGPGGSAGDTTGPGTGPG, TGPGGS, TGPGGSAGDTTGPGGS and TGGGS; wherein L.sup.2 is selected from the group consisting of PRGGS, GGSPRGGS, PR, GGSPRGGS and TPGGPRGGS; wherein L.sup.3 is selected from the group consisting of PG, SGSEGA, GASGSKTPG, SGSETPGTSESAGA, SGSETPGTGPGGA, SESATPESGA, GTSESATPESGGA, GGSGGSGA, GA, GGGS, TPESGA, SGSETPGTGA, SGSETPGTSEGA, PAG, PAGGGS, SGSETPGTPGGA, TPESGPGGA and GASGS; wherein L.sup.4 is GGGS; wherein L.sup.5 and L.sup.7 are independently PAG, SGS or PAGGGS; wherein L.sup.8 is selected from the linkers as disclosed in any of L.sup.1 to L.sup.5 and L.sup.7.
22. The construct of claim 21, wherein a) the linker sequences comprise the amino acids A, E, G, P, S and T; b) the linker sequences consist of the amino acids A, E, G, P, S and T; c) B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent; d) D is a localization sequence and E and F are each a mutated hormone binding domain of the estrogen receptor (ERT2); e) A is absent, B is a localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is absent, X is a CRISPR-associated endonuclease or a derivative thereof, D is a localization sequence, E is a mutated hormone binding domain of the estrogen receptor (ERT2) and F is absent; f) B is a localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is absent, X is a CRISPR-associated endonuclease or a derivative thereof, D is a localization sequence and E and F are each a mutated hormone binding domain of the estrogen receptor (ERT2); or g) B is a localization sequence, C.sub.1 and C.sub.2 are each independently a mutated hormone binding domain of the estrogen receptor (ERT2), X is a CRISPR-associated endonuclease or a derivative thereof, D is a localization sequence and E and F are each a mutated hormone binding domain of the estrogen receptor (ERT2).
23.-28. (canceled)
29. The construct of claim 1, wherein the CRISPR-associated endonuclease, or derivative thereof, is selected from the group consisting of a wild type CRISPR-associated protein 9 (Cas9), a mutated CRISPR-associated protein 9 (Cas9), wherein the mutated CRISPR-associated protein 9 (Cas9) is functional; a wild type Cpf1 (CRISPR from Prevotella and Francisella 1) protein, and a mutated Cpf1 protein, wherein the mutated Cpf1 protein is functional.
30. The construct of claim 29, wherein a) the CRISPR-associated protein 9 (Cas9), or derivative thereof, is selected from the group consisting of Streptococcus pyogenes, Streptococcus thermophilus, Listeria innocua, Staphylococcus aureus and Neisseria meningitidis; b) the CRISPR-associated protein 9 (Cas9), or derivative thereof, has at least 95% sequence identity to SEQ ID NO: 1; c) the Cpf1 protein, or derivative thereof, is selected from the group consisting of Acidaminococcus, Lachnospiraceae, Parcubacteria, Butyrivibrio proteoclasticus, Peregrinibacteria, Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi, Smithella, Leptospira inadai, Francisella novicida, Candidatus Methanoplasma termitum and Eubacterium eligens; or d) the Cpf1 protein, or derivative thereof, has at least 95% sequence identity to SEQ ID NO: 2 or 3.
31.-33. (canceled)
34. The construct of claim 1, wherein a) the localization sequence is selected from the group consisting of nuclear localization sequence, mitochondrial localization sequence and derivatives thereof, optionally wherein the at least one or more nuclear localization sequences (NLS) are selected from the group consisting of Simian Vacuolating Virus 40 (SV40) Large T-antigen, Nucleoplasmin, Importin α, EGL-13, c-MYC, TUS, AR, PLSCR1, PEP, TPX2, RB, TP53, N1N2, PB2, CBP80, SRY, hnRNP A1, HRP1, Borna Disease Virus p10, Ty1 Integrase, and the Chelsky consensus sequence; b) at least one or more nuclear localization sequences (NLS) are monopartite or bipartite NLS; or c) at least one or more nuclear localization sequences (NLS) are classical NLS (cNLS) or proline-tyrosine (PY)-NLS.
35.-40. (canceled)
41. The construct of claim 8, wherein the binding tag is located at either the N-terminus or the C-terminus of the construct, or at both ends of the construct.
42. (canceled)
43. The construct of claim 8, wherein the binding tag is selected from the group consisting of a V5 epitope tag, a FLAG tag, a tandem FLAG-tag, a triple FLAG tag (3FLAG), a Human influenza hemagglutinin (HA) tag, a tandem HA tag, a triple HA tag (3HA), a sextuple Histidine tag (6HIS), biotin, c-MYC, a Glutathione-S-transferase (GST) tag, a Strep-tag, a Strep-tag II, a S-tag, a natural histidine affinity tag (HAT), a Calmodulin-binding peptide (CBP) tag, a Streptavidin-binding peptide (SBP) tag, a Chitin-binding domain, a Maltose-binding protein (MBP) and derivatives thereof.
44. The construct of claim 43, wherein the V5 epitope tag sequence is SEQ ID NO: 12 or a derivative thereof.
45. The construct of claim 8, wherein the self-cleaving peptide is a 2A self-cleaving peptide or a derivative thereof.
46. The construct of claim 45, wherein the 2A self-cleaving peptide is SEQ ID NO: 13 or a derivative thereof.
47.-50. (canceled)
51. The construct of claim 1, wherein the construct has at least 90% sequence identity to SEQ ID NOs: 15 to 74.
52. The construct of claim 1, wherein the construct has a sequence selected from the group consisting of SEQ ID NO: 37, SEQ ID NO: 74 and SEQ ID NO: 249.
53. A nucleic acid sequence encoding an endonuclease-based gene editing construct, wherein the construct comprises the following components: (a) a CRISPR-associated endonuclease or a derivative thereof; and (b) at least one or more hormone binding domains of the estrogen receptor (ERT2) or derivatives thereof.
54.-56. (canceled)
57. A method of editing a genome of a host cell using the construct of any one of the preceding claims, the method comprising: (a) transfecting the host cell with a nucleic acid sequence encoding an endonuclease-based gene editing construct, wherein the construct comprises the following components: (i) a CRISPR-associated endonuclease or a derivative thereof; and (ii) at least one or more hormone binding domains of the estrogen receptor (ERT2) or derivatives thereof; and (b) incubating the cell of operation (a) with an inducing agent.
58.-64. (canceled)
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] The invention will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:
[0014]-[0045] (figure descriptions not reproduced)
DETAILED DESCRIPTION OF THE PRESENT INVENTION
[0046] Recently, the development of genome editing technologies has opened up new avenues of biomedical research and holds the promise to accelerate knowledge discovery and drug development. The CRISPR-Cas9 system, for example, which is co-opted from bacteria, is particularly attractive because the elements that recognize the target genomic loci are simple single guide RNA (sgRNA) molecules, which bind the loci-of-interest by complementary base-pairing and are hence straightforward to design and synthesize. The sgRNA recruits the Cas9 nuclease to the DNA to create a double-stranded break. Much effort has been devoted to improving the specificity of the technology and various strategies have been proposed to mitigate off-target mutagenesis by the Cas9 enzyme.
[0047] In one aspect, the present invention refers to an endonuclease-based gene editing construct. As used herein, the term endonuclease(s) refers to enzymes that are capable of cleaving/restricting, that is inducing a strand break, in a section of a nucleic acid sequence. Depending on the type of endonuclease, the endonuclease can be capable of cleaving within a single-stranded region of a nucleic acid sequence, a double-stranded region of a nucleic acid sequence, or both. In general, endonucleases can be divided into three types, that is Type I, II and III, according to their mechanism of action. Type I and Type III endonucleases are typically large multi-subunit enzymes that have both endonuclease and methylase activity and that require ATP (adenosine triphosphate) as a source of energy. Type II endonucleases, on the other hand, are simpler in structure and do not require an energy source such as ATP. The type of restriction site, and the specificity of the endonuclease for its particular restriction site, that is the site where the strand break is induced, vary between endonucleases. It is also possible for an endonuclease to cleave the nucleic acid strand a number of base pairs upstream or downstream from the recognition site. For example, Type I endonucleases are known for cleaving random nucleic acid sequences up to 1000 or more base pairs upstream and/or downstream from the recognition site. Type III endonucleases, on the other hand, are known for cleaving nucleic acid sequences up to 25 or more base pairs from the recognition sites. Thus, in one example, the endonuclease is, but is not limited to, a CRISPR-associated endonuclease, for example Cas9 and Cpf1, or derivatives thereof.
[0048] As used herein, the term CRISPR refers to Clustered Regularly Interspaced Short Palindromic Repeats, which are segments of prokaryotic DNA containing short repetitions of base sequences. Each repetition can be followed by short segments of spacer DNA within a sequence. The term Cas9 refers to CRISPR-associated protein 9, which is an RNA-guided DNA endonuclease enzyme associated with the CRISPR type II adaptive immunity system in, for example, Streptococcus pyogenes, among other bacteria. S. pyogenes utilizes Cas9 to interrogate and cleave foreign DNA, such as invading bacteriophage DNA or plasmid DNA. Cas9 interrogates the foreign DNA by unwinding it and checking whether the foreign DNA is complementary to the 20-nucleotide spacer region of the guide RNA. If the interrogated DNA substrate is complementary to the 20-nucleotide spacer region of the guide RNA, Cas9 cleaves the invading DNA. Mechanistically speaking and without being bound by theory, the CRISPR-Cas9 mechanism has a number of parallels with the mechanism of RNA interference (RNAi) present in eukaryotes.
[0049] Thus, in one example, the CRISPR-associated endonuclease, or derivative thereof, is selected from the group consisting of a wild type CRISPR-associated protein 9 (Cas9), a mutated CRISPR-associated protein 9 (Cas9), a wild type Cpf1 (CRISPR from Prevotella and Francisella 1) protein, and a mutated Cpf1 protein. In the event where the protein is mutated, the mutant protein is to be functional. In another example, the CRISPR-associated protein 9 (Cas9), or derivative thereof, is selected from the group consisting of Streptococcus pyogenes, Streptococcus thermophilus, Listeria innocua, Staphylococcus aureus and Neisseria meningitidis. In yet another example, the CRISPR-associated protein 9 (Cas9), or derivative thereof, has at least 99%, at least 98%, at least 97%, at least 96%, at least 95%, at least 94%, at least 93%, at least 92%, at least 91%, at least 90%, at least 89%, at least 85%, at least 80%, or at least 75% sequence identity to SEQ ID NO: 1. In yet another example, the CRISPR-associated protein 9 (Cas9), or derivative thereof, has at least 95% sequence identity to SEQ ID NO: 1. In a further example, the Cpf1 protein, or derivative thereof, is selected from the group consisting of Acidaminococcus, Lachnospiraceae, Parcubacteria, Butyrivibrio proteoclasticus, Peregrinibacteria, Porphyromonas crevioricanis, Prevotella disiens, Moraxella bovoculi, Smithella, Leptospira inadai, Francisella novicida, Candidatus Methanoplasma termitum and Eubacterium eligens. In another example, the Cpf1 protein, or derivative thereof, has at least 99%, at least 98%, at least 97%, at least 96%, at least 95%, at least 94%, at least 93%, at least 92%, at least 91%, at least 90%, at least 89%, at least 85%, at least 80%, or at least 75% sequence identity to SEQ ID NO: 2 or 3. In another example, the Cpf1 protein, or derivative thereof, has at least 95% sequence identity to SEQ ID NO: 2 or 3. The term sequence identity means that two nucleic acid or amino acid sequences are identical (i.e., on a nucleotide-by-nucleotide or residue-by-residue basis) over the comparison window. The percentage of sequence identity is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I) or residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the comparison window (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity. In light of the above, a person skilled in the art understands what is meant by a sequence identity of, for example, at least 95%.
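By way of non-limiting illustration, the percentage calculation described above can be sketched in Python. The function names, the use of "-" as a gap character, and the choice to count gap positions as part of the window but never as matches are assumptions made for this sketch only; the sequences are taken to be already optimally aligned by some other tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of sequence identity over a comparison window.

    Both inputs are assumed to be pre-aligned and of equal length, so the
    window size is simply the length of the alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # Count positions where the same base or residue occurs in both sequences.
    # A gap ("-") is never counted as a match (assumption of this sketch).
    matched = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    # Matched positions divided by the window size, multiplied by 100.
    return 100.0 * matched / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 95.0) -> bool:
    """True if the percentage of identity is at least `threshold`."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, two aligned 20-residue sequences that differ at a single position have 19 matched positions out of 20, which satisfies a threshold of at least 95%.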
[0050] The terms upstream and downstream refer to relative positions in a nucleic acid sequence, that is in a DNA or RNA sequence. Each strand of DNA or RNA has a 5′ end and a 3′ end, which are so named for the carbon position on the deoxyribose (or ribose) ring. By convention, upstream and downstream relate to the 5′ to 3′ direction in which RNA transcription takes place. In this case, upstream is toward the 5′ end of the RNA molecule and downstream is toward the 3′ end of the RNA molecule. When considering double-stranded DNA, upstream is toward the 5′ end of the coding strand for the gene in question and downstream is toward the 3′ end. Due to the anti-parallel nature of DNA, this means the 3′ end of the template strand is upstream of the gene and the 5′ end is downstream. It is noted that some genes on the same DNA molecule may be transcribed in opposite directions. This means the upstream and downstream areas of the molecule may change depending on which gene is used as the reference.
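As a non-limiting sketch of the convention just described, the following Python function classifies a position relative to a gene, given coordinates numbered along one reference strand. The function name, the coordinate scheme and the "+"/"-" labels for the strand carrying the coding sequence are hypothetical and chosen for illustration only.

```python
def relative_position(site: int, gene_start: int, gene_end: int, strand: str) -> str:
    """Classify `site` as upstream, downstream or within a gene.

    Coordinates are numbered along the reference strand, with
    gene_start <= gene_end. `strand` is "+" if the coding strand is the
    reference strand and "-" if the gene is transcribed in the opposite
    direction.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    if gene_start <= site <= gene_end:
        return "within"
    if strand == "+":
        # The 5' end of the coding strand lies at the lower coordinates.
        return "upstream" if site < gene_start else "downstream"
    # For a gene on the opposite strand the direction is reversed.
    return "upstream" if site > gene_end else "downstream"
```

The same position can therefore be upstream of one gene and downstream of another gene that is transcribed in the opposite direction.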
[0051] In order for such an endonuclease-based gene editing construct to be functional, factors other than the endonuclease itself may be required. In the case of, for example, CRISPR-associated endonucleases, such as Cas9 or Cpf1, a guide nucleic acid sequence is required in order to guide the endonucleases to the correct excision or editing locus. Therefore, the endonuclease needs to be capable of cleaving a nucleic acid in a specific section marked by the binding of, for example, a guide nucleic acid sequence. In one example, a single-stranded guide nucleic acid sequence would bind to a complementary sequence within a genome or a stretch of nucleic acid. This binding of the guide sequence to the genome results in a double-stranded nucleic acid section, which is then recognized by the endonuclease and is then targeted for excision. Thus, in one example, the sequence of the guide nucleic acid sequence is complementary to the sequence of the intended restriction site. In another example, the sequence of the guide nucleic acid sequence is identical to the sequence of the intended restriction site. In another example, more than one guide nucleic acid sequence is used in conjunction with one or more nucleases. In another example, for example when multiple endonucleases are used, the guide sequences are specific for each endonuclease. In another example, where a single endonuclease and multiple guide sequences are used, the guide sequences must be so constructed that the endonuclease is capable of restricting the nucleic acid sequence at all of the restriction sites. Therefore, by delivering, for example, a Cas9 endonuclease and an appropriate guide nucleic acid sequence into a cell, the cell's genome can be cleaved at a desired location, thereby allowing existing genes to be removed and/or new genes to be added, or the function of existing genes to be modulated. In terms of the present invention, the process of gene editing becomes simplified in terms of procedure, because the sgRNA molecules guide the Cas9 nuclease to the (then double-stranded) locus within the genome, which is then excised from that location. This removes the double-stranded section from the locus in question, thereby creating, for example, a gene knock-out or knock-down for situations where the sgRNA binds to a functional part of a gene, or a gene knock-in in the event that a gene is introduced into the restriction site.
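The relationship between a guide sequence and its target by complementary base-pairing can be illustrated, in a non-limiting manner, with the following Python sketch. It searches a DNA string for sites that are either identical to the guide or complementary to it. The function names are hypothetical, only exact matches are reported, and any further sequence requirements of a particular endonuclease are deliberately not modelled.

```python
# Watson-Crick complement table for DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_target_sites(genome: str, guide: str) -> list[tuple[int, str]]:
    """Locate exact target sites for a guide sequence.

    Returns (start, strand) pairs, where strand "+" means the genome window
    is identical to the guide and "-" means the window is the reverse
    complement of the guide. An RNA guide may be given with U instead of T.
    """
    genome = genome.upper()
    guide = guide.upper().replace("U", "T")
    guide_rc = reverse_complement(guide)
    hits = []
    for start in range(len(genome) - len(guide) + 1):
        window = genome[start:start + len(guide)]
        if window == guide:
            hits.append((start, "+"))
        elif window == guide_rc:
            hits.append((start, "-"))
    return hits
```

Where several guide sequences are used with a single endonuclease, the same search can be run once per guide and the resulting site lists combined.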
[0052] There are various ways of controlling or inducing certain aspects of a biological system. For example, the lac operon system is frequently used for prokaryotic gene regulation, as it allows for an effective, inducible regulatory mechanism based on the absence or the presence of lactose. In general, such systems can be described using the terms inducible and repressible systems, whereby an inducible system is off unless there is the presence of a control molecule (also called an inducer) that allows for, in this case, gene expression. The molecule is said to induce expression. On the other hand, a repressible system is on except in the presence of some molecule (also called a co-repressor) that suppresses, in this case, gene expression. The molecule is said to repress expression. In both cases, the manner by which the induction or repression happens is dependent on the control mechanisms, as well as differences between prokaryotic and eukaryotic cells. Another example of an inducible expression system is tetracycline-controlled transcriptional activation, wherein the activation of transcriptional activity is dependent on the presence of tetracycline. Having said that, these on and off switches that are usually found in the field of protein expression can be used in other situations where control over a specific enzyme function is desired. In one example, the inducible system used is the ERT2-tamoxifen inducible system. This system allows for temporal control of the enzyme in question, as the ERT2 domain can be fused to any protein of interest, allowing reversible control over its activity by administering or removing tamoxifen (or derivatives thereof, for example, 4-hydroxytamoxifen), that is the inducing agent that switches the target protein either on or off, depending on the concept used. For example, without being bound by theory, it is thought that in the constructs disclosed herein, the ERT2 domains effectively sequester the Cas9-dependent constructs outside of the nucleus, where they cannot perform their DNA editing activity. In the presence of an inducing agent, for example tamoxifen, however, the fusion protein can then rapidly translocate into the nucleus to perform its function.
[0053] As explained previously, the inducing agent used would depend on the type of inducible/repressible system used. Also, in order to be able to function as an inducing agent, the compound needs to be small enough to penetrate the cell membrane and thereby be present in the cell cytoplasm, or even the cell nucleus, depending on where the expressed protein is found. In one example, the construct as disclosed herein comprises the following components: a CRISPR-associated endonuclease (such as Cas9 or Cpf1) or a derivative thereof; and at least one or more hormone binding domains of the estrogen receptor (ERT2) or derivatives thereof. In one example, the one or more hormone binding domains of the estrogen receptor (ERT2) are located upstream or located downstream of the CRISPR-associated endonuclease. In another example, if there are two or more ERT2 present in the construct, the ERT2 are all located upstream, or all located downstream, or located both upstream and downstream of the CRISPR-associated endonuclease. In another example, the hormone binding domain of the estrogen receptor (ERT2) is mutated. In yet another example, the mutated hormone binding domain of the estrogen receptor (ERT2) is SEQ ID NO: 4, or derivatives, or variations thereof. In one example, the inducing agent is, but is not limited to, tamoxifen, 4-hydroxytamoxifen or derivatives thereof. In another example, the inducing agent is 4-hydroxytamoxifen.
[0054] The concentration of the inducing agent used or required in order to control the protein in question depends on the inducing agent used, as well as the time for which the host cell is exposed to the inducing agent. It will be appreciated that the inducing agent should not be used in concentrations that may result in a toxic or adverse effect in the host cell. Thus, in one example, the concentration of the inducing agent used is about 0.5 μM, about 0.25 μM, about 1 μM, about 1000 nM, about 500 nM, about 250 nM, about 100 nM, about 50 nM, about 25 nM or about 10 nM. In another example, the concentration of the inducing agent used is about 1 μM. It will also be appreciated that the length of time a host cell is exposed to an inducing agent may have an effect on the length of time the inducible or repressible system is turned on, or off, respectively. Thus, in one example, the host cell is incubated with the inducing agent for about 2, about 3, about 4, about 5, about 5.5, about 6, about 6.5, about 7, about 7.5, about 8, about 8.5, about 9, about 12, about 16, about 23.5, about 24, about 24.5, about 36 or about 48 hours. In another example, the host cell is incubated with the inducing agent for about 4, about 6, about 8 or about 12 hours.
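Purely as an illustrative sketch, the concentrations and incubation times listed above can be normalized to a single unit and combined into a grid of test conditions. The helper names are hypothetical, and the sketch assumes that the listed values not given in nM are micromolar (written "uM" in the code).

```python
from itertools import product

# Conversion factors to nanomolar; "uM" stands for micromolar (assumption).
_UNIT_TO_NM = {"nM": 1.0, "uM": 1000.0}


def to_nanomolar(value: float, unit: str) -> float:
    """Convert a concentration given in nM or uM to nM."""
    return value * _UNIT_TO_NM[unit]


def induction_grid(concentrations, hours):
    """Return every (concentration in nM, incubation hours) combination.

    `concentrations` is an iterable of (value, unit) pairs. Duplicate
    conditions, such as 1 uM and 1000 nM, are merged.
    """
    unique_nm = sorted({to_nanomolar(v, u) for v, u in concentrations})
    unique_hours = sorted(set(hours))
    return list(product(unique_nm, unique_hours))
```

For instance, passing the concentrations 1 uM, 1000 nM and 500 nM together with incubation times of 4, 6, 8 and 12 hours gives eight distinct conditions, because 1 uM and 1000 nM describe the same concentration.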
[0055] As used herein, the term localization sequence refers to an amino acid sequence which tags a protein for transport into a specific compartment of the cell or the cell nucleus. One example of a localization sequence is a nuclear localization sequence or signal (NLS), which tags a protein for import into the nucleus of the cell. Another example is a nuclear export signal (NES), which has the opposite function in that it tags a protein for export out of the nucleus into the cytoplasm. Nuclear localization sequences can be divided into non-classical and classical NLSs. Classical nuclear localization sequences, that is NLSs that use the classical nuclear import cycle which may require the presence of an importin protein, can be further classified as either monopartite (which means to have a single part) or bipartite (to have more than one part, in this case two parts). For example, the sequence PKKKRKV in the SV40 Large T-antigen is considered to be a monopartite NLS. The NLS of nucleoplasmin, KR[PAATKKAGQA]KKKK, is an example of a bipartite signal, wherein two clusters of basic amino acids are present, separated by a spacer of about 10 amino acids. It is noted that this spacer may be variable in length. Examples of nuclear localization signals are, but are not limited to the nuclear localization signals of SV40 large T-Antigen (monopartite; PKKKRKV or CGGGPKKKRKVED), c-myc (monopartite; PAAKRVKLD), and nucleoplasmin (bipartite; AVKRPAATKKAGQAKKKKLD or KRPAATKKAGQAKKKK); EGL-13 (monopartite; MSRRRKANPTKLSENAKKLAKEVEN) and TUS-protein (monopartite; KLKIKRPVK). In another example, the nuclear localization signals (NLSs) are classical NLSs (cNLS) or proline-tyrosine (PY)-NLS. In yet another example, the nuclear localization signals (NLSs) are monopartite or bipartite NLSs. 
In a further example, the nuclear localization signal is, but is not limited to, the nuclear localization signal of the Large T-antigen of the Simian Vacuolating Virus 40 (SV40), nucleoplasmin, importin α, EGL-13, c-MYC, TUS, AR, PLSCR1, PEP, TPX2, RB, TP53, N1N2, PB2, CBP80, SRY, hnRNP A1, HRP1, Borna Disease Virus p10, Ty1 Integrase, and the Chelsky consensus sequence. As used herein, in regards to NLS, the terms signal and sequence are used interchangeably. In yet another example, the nuclear localization sequence (NLS) is SEQ ID NO: 5 or SEQ ID NO: 6.
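The distinction between monopartite and bipartite signals described above can be sketched in a few lines of code. The sketch below looks for the SV40 monopartite sequence PKKKRKV given in the text, and uses a deliberately simplified, hypothetical regular expression for bipartite signals (two basic residues, a spacer of 8 to 12 residues, then at least three basic residues). The pattern and the function name `find_nls` are assumptions for illustration and are not a validated NLS predictor.

```python
import re

# Monopartite example from the text: the SV40 large T-antigen NLS.
SV40_NLS = "PKKKRKV"

# Hypothetical, simplified bipartite pattern: two basic residues (K or R),
# a spacer of 8 to 12 residues (the text says "about 10", variable),
# followed by a cluster of at least three basic residues.
BIPARTITE = re.compile(r"[KR]{2}.{8,12}[KR]{3,}")


def find_nls(protein):
    """Return (kind, start index, matched text) for the signals found."""
    hits = []
    start = protein.find(SV40_NLS)
    if start != -1:
        hits.append(("monopartite", start, SV40_NLS))
    match = BIPARTITE.search(protein)
    if match:
        hits.append(("bipartite", match.start(), match.group()))
    return hits


# Nucleoplasmin example from the text.
print(find_nls("AVKRPAATKKAGQAKKKKLD"))
```

Applied to the nucleoplasmin sequence AVKRPAATKKAGQAKKKKLD, the simplified pattern recovers the core bipartite signal KRPAATKKAGQAKKKK.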
[0056] There are many other types of NLS, such as the acidic M9 domain of hnRNP A1, the sequence KIPIK in yeast transcription repressor Matα2, and the complex signals of U snRNPs. Most of these NLSs appear to be recognized directly by specific receptors of the importin β family without the intervention of an importin α-like protein and are therefore considered to be non-classical nuclear localization sequences. Another example of a localization sequence is a mitochondrial targeting signal, which is a 10 to 70 amino acid long peptide that is usually present at the end of nascent proteins and which directs these nascent proteins to the mitochondria. It is usually found at the N-terminus and consists of an alternating pattern of hydrophobic and positively charged amino acids, thereby usually forming an amphipathic helix. Mitochondrial targeting signals can also contain additional signals that subsequently direct the protein to different regions of the mitochondria, for example the mitochondrial matrix. Like many signal peptides, mitochondrial targeting signals may be, and usually are, cleaved in vivo once targeting is complete. Yet another example of a non-classical nuclear localization protein is a proline tyrosine nuclear localization protein, so named for the presence of a PY-NLS motif, which is a proline-tyrosine amino acid pairing which allows the protein to bind to, for example, importin β2, thereby facilitating its transport. Therefore, in another example, the localization sequence is a nuclear localization sequence, mitochondrial localization sequence or derivatives thereof. In one example, the mitochondrial localization sequence (MLS) is, but is not limited to, ATP5B, SOD2, COX8A, OTC, or TFAM. In another example, the mitochondrial localization sequence (MLS) is, but is not limited to, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10 or SEQ ID NO: 11.
[0057] Thus, in one example, the construct as disclosed herein further comprises at least one localization sequence. In another example, the construct as disclosed herein comprises one or more localization sequences.
[0058] In terms of artificially generated fusion proteins, it is possible to attach various modifications, such as, for example, localization sequences, binding tags, selectable markers, optical markers and the like, to either the N-terminus, the C-terminus or both the N- and C-termini of a fusion peptide. This is possible even if in nature, for example, localization signals are usually found at the N-termini of proteins, as these are generally added towards the end of protein translation/expression. Therefore, the presently claimed construct can have one or more of said modifications at each terminus of the protein, provided the functionality of the modification is retained. That is, if, for example, a localization signal is required to work in a biological setting in vitro, for example in protein overexpression, then the localization signal needs to be at the N-terminus of the protein, in accordance with its usual position in nature. The same can be said of other modifications, for example binding tags. Thus, in one example, the binding tag is located at either the N-terminus or the C-terminus of the construct, or at both ends of the construct. In another example, the binding tag is located at the N-terminus of the construct.
[0059] Protein or binding tags are peptide sequences which can be genetically added to the sequence of a recombinant protein prior to expression. Often, these tags are removable, and are intended to be so, by for example chemical agents or by enzymatic means, such as proteolysis or intein splicing, or by changing the physico-chemical environment of the protein, such as changing the pH value, certain solute concentrations in solution or a change of aqueous to non-aqueous solution. Binding tags are attached to proteins for various purposes, for example, but not limited to, purification via affinity, chromatographic purification, solubilization, detection (optical, immunological or otherwise), protein binding assays or to allow certain modifications of the protein, for example enzymatic modifications, or chemical modifications. Such binding tags may also be attached as multiples to the terminus of the protein in question, for example a single His-tag (HIS) may also be used as a triple His-tag (3HIS) or a sextuple His-tag (6HIS). Thus, in one example, the construct as described herein comprises a binding tag. In another example, the binding tag is, but is not limited to, a V5 epitope tag, a FLAG tag, a tandem FLAG-tag, a triple FLAG tag (3FLAG), a Human influenza hemagglutinin (HA) tag, a tandem HA tag, a triple HA tag (3HA), a sextuple Histidine tag (6HIS), biotin, c-MYC, a Glutathione-S-transferase (GST) tag, a Strep-tag, a Strep-tag II, an S-tag (a peptide derived from pancreatic ribonuclease A (RNase A)), a natural histidine affinity tag (HAT), a Calmodulin-binding peptide (CBP) tag, a Streptavidin-binding peptide (SBP) tag, a Chitin-binding domain, a Maltose-binding protein (MBP) or derivatives thereof. In one example, the construct comprises a V5 epitope tag. In another example, the V5 epitope tag sequence is SEQ ID NO: 12 or derivatives thereof.
[0060] In one example, the construct, as disclosed herein, includes a self-cleaving peptide. Self-cleaving peptides, first discovered in picornaviruses, are peptides of between 19 and 22 amino acids in length and are usually found between two proteins in some members of the picornavirus family. Using self-cleaving proteins, picornaviruses are capable of producing equimolar levels of multiple genes from the same mRNA. Having said that, such self-cleaving proteins are known to be found in other species of viruses and a person skilled in the art, based on the information provided herein, will be readily able to determine a suitable substitution for the self-cleaving protein disclosed herein, if required. The term self-cleaving, as used in the art, is not entirely accurate, as, without being bound by theory, these self-cleaving peptides are thought to function by inducing the ribosome to skip the synthesis of a peptide bond at the C-terminus of a 2A element, leading to separation between, for example, the end of the 2A sequence and the next peptide downstream. The cleavage of the peptide occurs between the glycine and proline residues found on the C-terminus of the resulting peptide, meaning the upstream cistron will have a few additional residues added to the end, while the downstream cistron will start with the proline residue. Thus, in one example, the construct as described herein comprises a self-cleaving peptide. In another example, the self-cleaving peptide is, but is not limited to, a 2A self-cleaving peptide. In another example, the 2A self-cleaving peptide is SEQ ID NO: 13 or derivative thereof.
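The separation between the glycine and proline residues described above can be illustrated with a short sketch. The peptide used here, ATNFSLLKQAGDVEENPGP, is a commonly cited porcine teschovirus-1 2A (P2A) sequence and is an assumption for illustration; it is not asserted to be SEQ ID NO: 13. The function name `split_at_2a` and the flanking sequences are likewise hypothetical.

```python
# Commonly cited P2A sequence, used here as an illustrative assumption;
# it is not asserted to be identical to SEQ ID NO: 13.
P2A = "ATNFSLLKQAGDVEENPGP"


def split_at_2a(polyprotein, peptide=P2A):
    """Split a translated sequence between the final G and P of the 2A peptide.

    The upstream product keeps the 2A residues up to and including the
    glycine; the downstream product starts with the proline.
    """
    index = polyprotein.find(peptide)
    if index == -1:
        return [polyprotein]  # no 2A element found, nothing to separate
    cut = index + len(peptide) - 1  # index of the final proline
    return [polyprotein[:cut], polyprotein[cut:]]


upstream, downstream = split_at_2a("MKT" + P2A + "MVSK")
print(upstream)    # ends with the glycine of the 2A element
print(downstream)  # starts with the proline of the 2A element
```

In this sketch the upstream product carries the additional 2A residues at its C-terminus, while the downstream product begins with proline, consistent with the description above.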
[0061] As used herein, the term selectable marker refers to a marker that can be added to the peptide in question for selection purposes. The type of detection required would then dictate the type of marker that may be used. Thus, in one example, the construct as described herein comprises a selectable marker. In another example, the selectable marker is, but is not limited to, an imaging marker, a cell-surface marker, an antibiotic, an antibiotic resistance marker or derivatives thereof.
[0062] For example, if it is required to optically select the peptide in question, one chooses an optical marker or an imaging marker, that is a marker that is capable of optical detection. Examples of such an optical or imaging marker are, but are not limited to, green fluorescent protein (GFP), enhanced green fluorescent protein (eGFP), superfolder green fluorescent protein, red fluorescent protein (RFP), mCherry, orange fluorescent protein (OFP), cyan fluorescent protein (CFP), enhanced cyan fluorescent protein (eCFP), Cerulean, enhanced blue fluorescent protein (eBFP), yellow fluorescent protein (YFP), enhanced yellow fluorescent protein (eYFP), Venus, far-red fluorescent protein or derivatives thereof. If selection via, for example, resistance to a certain compound is required, an antibiotic resistance marker can be included in the peptide. Examples of such an antibiotic resistance marker are, but are not limited to, a drug-resistant cassette for puromycin, a drug-resistant cassette for blasticidin, a drug-resistant cassette for zeocin, a drug-resistant cassette for G418, a drug-resistant cassette for hygromycin B, a drug-resistant cassette for ampicillin, a drug-resistant cassette for kanamycin, a drug-resistant cassette for chloramphenicol, and derivatives thereof. Such selection markers are usually added to the genetic sequence for the protein in question and are therefore expressed concurrently when the protein is expressed.
[0063] A cell-surface marker is a protein that is usually found on the surface of the cell, which can be used to characterize a cell type and/or differentiate between different cell (sub)types. Such cell-surface markers can also include glycoproteins. One example of cell-surface markers are proteins that are named after the so-called cluster of differentiation. This cluster of differentiation is used to catalogue the various epitopes (hence, proteins) present on a cell's surface, which are used as targets for, for example, monoclonal antibodies. The epitopes are then numbered and named CDX, with the X denoting a running catalogue number. Therefore, it is possible to positively identify various cell types using one or more CD markers. In one example, the cell-surface marker is, but is not limited to, CD3, CD4, CD8, CD11a, CD11b, CD14, CD15, CD16, CD19, CD20, CD22, CD24, CD25, CD30, CD31, CD34, CD38, CD56, CD61, CD91, CD117, CD45, CD114, CD182, Foxp3 or derivatives thereof.
[0064] The present disclosure describes constructs, the general formula of which is according to formula I as shown below:
##STR00001##
wherein the letters denote positions within the peptide sequence. In one example, A is absent, or is a mutated hormone binding domain of the estrogen receptor (ERT2), or the binding tag. In another example, B is the localization sequence, or derivatives thereof, or the binding tag, or absent. In another example, C.sub.1 and C.sub.2 are each independently any one of the localization sequences or derivatives thereof, or the mutated hormone binding domains of the estrogen receptor (ERT2). In yet another example, in the event that C.sub.1 is one mutated hormone binding domain of the estrogen receptor (ERT2), then C.sub.2 is another mutated hormone binding domain of the estrogen receptor (ERT2). In another example, C.sub.2 is absent. In a further example, X is a CRISPR-associated endonuclease or a derivative thereof. In yet another example, D is a mutated hormone binding domain of the estrogen receptor (ERT2), or the localization sequences or derivatives thereof. In one example, E is absent or is a mutated hormone binding domain of the estrogen receptor (ERT2), or the self-cleaving peptide. In another example, F is absent or is the self-cleaving peptide, or the selectable marker. In yet another example, G is absent or is the selectable marker.
[0065] In the above structure, the terms L.sup.1 to L.sup.8 denote linker sequences between the positions within the peptide sequence. In one example, any of the linker sequences L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.6, L.sup.7 or L.sup.8 are absent. In another example, one or more of the linker sequences L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.6, L.sup.7 or L.sup.8 are absent. In yet another example, the linker sequences are between 1 to 5, between 4 to 8, between 5 to 10, between 10 to 20, between 20 to 25 or 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 amino acids in length.
[0066] A peptide can comprise natural amino acids, unnatural amino acids, or a combination of both unnatural and natural amino acids. As used herein, the term natural amino acid refers to proteinogenic amino acids, which are amino acids that are precursors to proteins. These amino acids are assembled during translation to result in a nascent protein. Presently, there are 23 proteinogenic amino acids known, 20 of which are found in the standard genetic code, along with an additional 3 amino acids (selenocysteine, pyrrolysine and N-formylmethionine) that can be incorporated into the peptide using special translation mechanisms. Humans are capable of synthesizing 12 of these from each other or from other molecules of intermediary metabolism. The other nine must be consumed (usually as their protein derivatives), and so they are called essential amino acids. The essential amino acids are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine (i.e. H, I, L, K, M, F, T, W, V). Unnatural, that is non-proteinogenic amino acids, are amino acids that are not naturally encoded or that are not found in the genetic code of any organisms. These unnatural amino acids, however, can be found in, for example, as intermediates in biosynthesis, post-translationally incorporated into protein, as components of, for example bacterial cell walls, neurotransmitters and toxins, and for example in natural and man-made pharmacological compounds. Thus, in one example, the linker sequences comprise natural or unnatural amino acids, or combinations of both. In another example, one or more, or all of the linker sequences comprise the amino acids A, E, G, P, S and T. In yet another example, one or more, or all of the linker sequences consist of the amino acids A, E, G, P, S and T. In one example, in the event that the linker sequence is absent, the neighbouring substituents then are bound by a peptide bond. 
In another example, the linker sequence L.sup.1 is any one of PR, TG, TGPGPGGS, TGPGPGGSAGDTTGPGTGPG or TGGGS. In another example, the linker sequence L.sup.2 is absent or, independently, any one of PRGGS, GGSPRGGS, PR, GGSPRGGS or TPGGPRGGS. In another example, the linker sequence L.sup.3 is any one of PG, SGSEGA, GASGSKTPG, SGSETPGTSESAGA, SGSETPGTGPGGA, SESATPESGA, GTSESATPESGGA, GGSGGSGA, GA, GGGS, TPESGA, SGSETPGTGA, SGSETPGTSEGA, PAG, PAGGGS, SGSETPGTPGGA, TPESGPGGA or GASGS. In yet another example, the linker sequence L.sup.4 is GGGS or absent. In a further example, the linker sequence L.sup.5 is any one of PAG or PAGGGS. In yet another example, the linker sequence L.sup.6 is GA or absent.
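The length and composition criteria for linker sequences discussed above can be checked with a short sketch. The upper length bound of 25 amino acids is taken from the ranges recited for the linkers; the function name `linker_report` and its return format are hypothetical. It may be noted that some of the listed linkers, such as PR, contain residues outside A, E, G, P, S and T, so the composition criterion applies only to the examples in which it is recited.

```python
# Residues recited for linkers that "consist of" A, E, G, P, S and T.
ALLOWED_RESIDUES = set("AEGPST")


def linker_report(linker, max_len=25):
    """Report whether a linker meets the length and composition criteria."""
    return {
        "length_ok": 1 <= len(linker) <= max_len,
        "only_AEGPST": set(linker) <= ALLOWED_RESIDUES,
    }


# A few linkers taken from the lists in the text.
for name in ["TGGGS", "SGSETPGTSESAGA", "PR", "GASGSKTPG"]:
    print(name, linker_report(name))
```

An absent linker corresponds to an empty string in this sketch, in which case the neighbouring substituents are simply joined directly by a peptide bond.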
[0067] In the present disclosure, the terms polypeptide, peptide, and protein are used interchangeably. As used herein, the term peptide thus refers to a chain of amino acids which are connected via amide bonds. When the amino acids are alpha-amino acids, either the L-optical isomer or the D-optical isomer can be used, the L-isomers being preferred in nature. The term polypeptide or protein as used herein encompasses any amino acid sequence and includes, but may not be limited to, modified sequences such as glycoproteins. The term polypeptide is specifically intended to cover naturally occurring proteins, as well as those that are recombinantly or synthetically produced.
[0068] In one example, the structure is according to formula I, wherein A is absent or is a mutated hormone binding domain of the estrogen receptor (ERT2), or the binding tag; wherein B is the localization sequence or derivatives thereof, or the binding tag, or absent; wherein C.sub.1 and C.sub.2 are each independently any one of the localization sequences or derivatives thereof, or the mutated hormone binding domains of the estrogen receptor (ERT2); wherein when C.sub.1 is one mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is another mutated hormone binding domain of the estrogen receptor (ERT2); or wherein C.sub.2 is absent; wherein X is CRISPR-associated endonuclease or a derivative thereof; wherein D is a mutated hormone binding domain of the estrogen receptor (ERT2), or the localization sequence or derivatives thereof; wherein E is absent or is a mutated hormone binding domain of the estrogen receptor (ERT2), or the self-cleaving peptide; wherein F is absent or is the self-cleaving peptide, or the selectable marker; wherein G is absent or is the selectable marker; wherein L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.6, L.sup.7 and L.sup.8 are linker sequences; wherein any of the linkers L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.6, L.sup.7 or L.sup.8 are absent; wherein the linkers sequences are between 1 to 5, between 4 to 8, between 5 to 10, between 10 to 20, between 20 to 25 or 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 amino acids long; wherein the linker sequences comprise the natural or unnatural amino acids; wherein the linker sequences comprise the amino acids A, E, G, P, S and T; wherein the linker sequences consist of the amino acids A, E, G, P, S and T; wherein if undefined, the linker sequence is absent, the neighbouring substituents are bound by a peptide bond; wherein L.sup.1 is any one of PR, TG, TGPGPGGS, TGPGPGGSAGDTTGPGTGPG or TGGGS; wherein L.sup.2 is absent or any one of 
PRGGS, GGSPRGGS, PR, GGSPRGGS or TPGGPRGGS; wherein L.sup.3 is any one of PG, SGSEGA, GASGSKTPG, SGSETPGTSESAGA, SGSETPGTGPGGA, SESATPESGA, GTSESATPESGGA, GGSGGSGA, GA, GGGS, TPESGA, SGSETPGTGA, SGSETPGTSEGA, PAG, PAGGGS, SGSETPGTPGGA, TPESGPGGA or GASGS; wherein L.sup.4 is GGGS or absent; wherein L.sup.5 is any one of PAG or PAGGGS; wherein L.sup.6 is GA or absent, wherein L.sup.7 and L.sup.8 are independently selected from the linkers as disclosed in any of L.sup.1 to L.sup.6. In one example, A is absent. In another example, A is a mutated hormone binding domain of the estrogen receptor (ERT2). In yet another example, A is a binding tag.
[0069] In one example, B is the binding tag. In another example, B is a localization sequence.
[0070] In one example, C.sub.1 is the localization sequence. In another example, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2).
[0071] In one example, C.sub.2 is absent.
[0072] In one example, D is the localization sequence. In another example, D is a mutated hormone binding domain of the estrogen receptor (ERT2).
[0073] In one example, E is the self-cleaving peptide. In another example, E is the mutated hormone binding domain of the estrogen receptor (ERT2). In yet another example, E is absent.
[0074] In one example, F is the selectable marker. In another example, F is the self-cleaving peptide.
[0075] In one example, G is absent. In another example, G is the selectable marker.
[0076] In one example, X is the CRISPR-associated endonuclease or derivative thereof.
[0077] In a further example, A is absent, B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent. In yet another example, A is the mutated hormone binding domain of the estrogen receptor (ERT2), B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent. In a further example, A is the binding tag, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent. In yet another example, D is the localization sequence, E is the self-cleaving peptide, F is the selectable marker and G is absent. In a further example, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker. In one example, D and E are each one mutated hormone binding domain of the estrogen receptor (ERT2). In another example, D is the mutated hormone binding domain of the estrogen receptor (ERT2) and E is absent. In yet another example, A is the mutated hormone binding domain of the estrogen receptor (ERT2), B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the self-cleaving peptide, F is the selectable marker and G is absent. In a further example, A is the binding tag, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the self-cleaving peptide, F is the selectable marker and G is absent. 
In yet another example, A is absent, B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker. In one example, A is the mutated hormone binding domain of the estrogen receptor (ERT2), B is the binding tag, C.sub.1 is the localization sequence and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker. In another example, A is the binding tag, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent, X is the CRISPR-associated endonuclease or derivative thereof, D is the localization sequence, E is the mutated hormone binding domain of the estrogen receptor (ERT2), F is the self-cleaving peptide and G is the selectable marker.
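The permitted occupants of each position in formula I, as set out above, can be expressed as a simple lookup table and checked programmatically. The sketch below is illustrative only: the labels ("ERT2", "tag", "localization", "2A", "marker", "endonuclease"), the use of `None` for an absent position, and the function name `check_formula_i` are assumptions, and linkers L.sup.1 to L.sup.8 are not modelled.

```python
# Allowed occupant of each position in formula I, as described in the text.
# None stands for "absent". All labels are hypothetical shorthand.
ALLOWED = {
    "A": {None, "ERT2", "tag"},
    "B": {None, "localization", "tag"},
    "C1": {"localization", "ERT2"},
    "C2": {None, "localization", "ERT2"},
    "X": {"endonuclease"},
    "D": {"ERT2", "localization"},
    "E": {None, "ERT2", "2A"},
    "F": {None, "2A", "marker"},
    "G": {None, "marker"},
}


def check_formula_i(construct):
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    for position, allowed in ALLOWED.items():
        value = construct.get(position)
        if value not in allowed:
            problems.append(f"{position}: {value!r} not allowed")
    # When C1 is an ERT2 domain, C2 is another ERT2 domain or is absent.
    if construct.get("C1") == "ERT2" and construct.get("C2") not in (None, "ERT2"):
        problems.append("C2 must be ERT2 or absent when C1 is ERT2")
    # The construct comprises at least one ERT2 domain.
    if "ERT2" not in construct.values():
        problems.append("no ERT2 domain present")
    return problems


# One of the complete examples given in the text.
example = {"A": "ERT2", "B": "tag", "C1": "localization", "C2": None,
           "X": "endonuclease", "D": "localization", "E": "2A",
           "F": "marker", "G": None}
print(check_formula_i(example))
```

For the complete example shown, the check returns an empty list, indicating that each position is occupied by a permitted component.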
[0078] In another example, the construct comprises the following formula (II):
##STR00002##
wherein A is absent; wherein B is a localization sequence or derivatives thereof, or the binding tag;
[0079] wherein both C.sub.1 and C.sub.2 are present or only C.sub.1 is present; wherein C.sub.1 and C.sub.2 are each independently selected from the group consisting of the localization sequence, derivatives of the localization sequence, and a mutated hormone binding domain of the estrogen receptor (ERT2); wherein when C.sub.1 is one mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is another mutated hormone binding domain of the estrogen receptor (ERT2); wherein X is a CRISPR-associated endonuclease or a derivative thereof; wherein D is selected from the group consisting of a mutated hormone binding domain of the estrogen receptor (ERT2), the localization sequence and derivatives of the localization sequence; wherein E is absent or is selected from the group consisting of a mutated hormone binding domain of the estrogen receptor (ERT2) and a self-cleaving peptide; wherein F is absent or is selected from the group consisting of the self-cleaving peptide, the mutated hormone binding domain of the estrogen receptor (ERT2) and a selectable marker; wherein G is absent or is the selectable marker; wherein L.sup.1, L.sup.2, L.sup.3, L.sup.4, L.sup.5, L.sup.6, L.sup.7 and L.sup.8 are linker sequences; wherein at least one of the linker sequences is present; wherein each of the linker sequences is independently between 1 to 25 amino acids long; wherein each linker sequence independently comprises natural or unnatural or a mixture of natural and unnatural amino acids; wherein the linker sequences comprise the amino acids A, E, G, P, S and T; wherein, if any one or more of the linker sequences of L.sup.1 to L.sup.8 is absent, the neighbouring substituents are bound by a peptide bond; wherein L.sup.1 is selected from the group consisting of PR, TG, TGPGPGGS, TGPGPGGSAGDTTGPGTGPG, TGPGGS, TGPGGSAGDTTGPGGS and TGGGS; wherein L.sup.2 is selected from the group consisting of PRGGS, GGSPRGGS, PR, GGSPRGGS and TPGGPRGGS; wherein L.sup.3 is selected from 
the group consisting of PG, SGSEGA, GASGSKTPG, SGSETPGTSESAGA, SGSETPGTGPGGA, SESATPESGA, GTSESATPESGGA, GGSGGSGA, GA, GGGS, TPESGA, SGSETPGTGA, SGSETPGTSEGA, PAG, PAGGGS, SGSETPGTPGGA, TPESGPGGA and GASGS; wherein L.sup.4 is GGGS; wherein L.sup.5 and L.sup.7 are independently PAG, SGS or PAGGGS; wherein L.sup.6 is GA; wherein L.sup.8 is selected from the linkers as disclosed in any of L.sup.1 to L.sup.6.
[0080] In one example, B is the localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2) and C.sub.2 is absent. In another example, D is a localization sequence and E and F are each a mutated hormone binding domain of the estrogen receptor (ERT2). In yet another example, A is absent, B is a localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is absent, X is a CRISPR-associated endonuclease or a derivative thereof, D is a localization sequence, E is a mutated hormone binding domain of the estrogen receptor (ERT2) and F is absent. In a further example, B is a localization sequence, C.sub.1 is the mutated hormone binding domain of the estrogen receptor (ERT2), C.sub.2 is absent, X is a CRISPR-associated endonuclease or a derivative thereof, D is a localization sequence and E and F are each a mutated hormone binding domain of the estrogen receptor (ERT2). In another example, B is a localization sequence, C.sub.1 and C.sub.2 are each independently a mutated hormone binding domain of the estrogen receptor (ERT2), X is a CRISPR-associated endonuclease or a derivative thereof, D is a localization sequence and E and F are each a mutated hormone binding domain of the estrogen receptor (ERT2).
[0081] In one example, the construct, as disclosed herein, has at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to any one of SEQ ID NOs: 15 to 74. In another example, the construct, as disclosed herein, has a sequence identity of between 80% to 95% to any one of SEQ ID NOs: 15 to 74. In yet another example, the construct has a sequence identity of at least 90% to any one of SEQ ID NOs: 15 to 74.
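For illustration, percentage sequence identity between two sequences can be computed as sketched below. The sketch assumes that the two sequences have already been aligned to equal length (with "-" as a gap character that counts as a mismatch) and uses the alignment length as the denominator. Other definitions of sequence identity exist, and this disclosure does not prescribe a particular one; the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical positions between two pre-aligned sequences.

    Assumes equal-length, already aligned inputs. A gap character ('-')
    never matches a residue, so it counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        a == b and a != "-" for a, b in zip(aligned_a, aligned_b)
    )
    return 100.0 * matches / len(aligned_a)


# Two ten-residue sequences differing at one position.
print(percent_identity("TGPGPGGSAG", "TGPGPGGSAA"))
```

Under these assumptions, two ten-residue sequences differing at a single position share 90% identity and would therefore satisfy a threshold of at least 90%.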
[0082] As used herein, the term variant includes a reference to substantially similar sequences. Generally, nucleic acid sequence variants of the invention encode a polypeptide which retains qualitative biological activity in common with the polypeptide encoded by the non-variant nucleic acid sequence. Generally, polypeptide sequence variants of the invention also possess qualitative biological activity in common with the non-variant polypeptide. Further, these polypeptide sequence variants may have at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% sequence identity to the non-variant peptide. Variants may be made using, for example, the methods of protein engineering and site-directed mutagenesis as is well known in the art. Further, a variant peptide or protein may include analogues, wherein the term analogue, as used herein, with reference to a peptide, means a peptide which is a derivative of a peptide of the invention, whereby the term derivative comprises a polypeptide that has an addition, deletion or substitution of one or more amino acids compared to the non-variant peptide, such that the polypeptide retains substantially the same function as the non-variant peptide. The substitution may be one or more conservative amino acid substitutions. The term derivative or derivation also refers to compounds other than amino acids, which have been modified from the original compound. In some examples, these derivatives retain the same or have increased desired function. In regards to chemical compounds, the term derivative refers to a chemical substance derived from another substance, either directly or by modification or partial substitution. In this case, chemical derivatives may, but do not necessarily, retain their original function. 
The term conservative amino acid substitution as used herein refers to a substitution or replacement of one amino acid for another amino acid with similar properties within a peptide chain (primary sequence of a protein). For example, the substitution of the charged amino acid glutamic acid (Glu) for the similarly charged amino acid aspartic acid (Asp) would be a conservative amino acid substitution. Conservative amino acid substitution tables providing functionally similar amino acids are well known to one of ordinary skill in the art. The following six groups are examples of amino acids that are considered to be conservative substitutions for one another:
1) Alanine (A), Serine (S), Threonine (T);
[0083] 2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
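The six groups above can be encoded directly to test whether a given substitution is conservative in the sense used herein. This is a minimal sketch based only on the groups listed; the function name `is_conservative` is hypothetical, and residues not appearing in any group (for example glycine) are treated as having no conservative replacement.

```python
# The six groups of mutually conservative amino acids listed in the text,
# in one-letter code.
CONSERVATIVE_GROUPS = ["AST", "DE", "NQ", "RK", "ILMV", "FYW"]


def is_conservative(original, replacement):
    """Return True if both residues differ and fall within the same group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )


print(is_conservative("E", "D"))  # glutamic acid to aspartic acid
print(is_conservative("K", "E"))  # lysine to glutamic acid
```

The first call corresponds to the glutamic acid to aspartic acid example given above; the second replaces a positively charged residue with a negatively charged one and is not conservative under these groups.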
[0084] A non-conservative amino acid substitution can result from changes in: (a) the structure of the amino acid backbone in the area of the substitution; (b) the charge or hydrophobicity of the amino acid; or (c) the bulk of an amino acid side chain. Substitutions generally expected to produce the greatest changes in protein properties are those in which: (a) a hydrophilic residue is substituted for (or by) a hydrophobic residue; (b) a proline is substituted for (or by) any other residue; (c) a residue having a bulky side chain, e.g., phenylalanine, is substituted for (or by) one not having a side chain, e.g., glycine; or (d) a residue having an electropositive side chain, e.g., lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g., glutamyl or aspartyl.
[0085] As used herein, the term mutation or grammatical variants thereof, in general, relates to an altered genetic sequence which results in the gene coding for a non-functioning protein, or a protein with substantially reduced, or altered function. The term mutation also relates to a modification of the genome or part of a nucleic acid sequence of any biological organism, virus or extra-chromosomal genetic element, or any genetic element that has been included in the nucleic acid sequence of a fusion protein. The mutation can be performed by replacing one nucleotide by another in the nucleic acid sequence of any of the genetic elements, thus creating a different amino acid in the position where the nucleotide was replaced. The techniques in order to achieve such mutations are well known to a person skilled in the art. For example, the mutation can be induced artificially using, but not limited to, chemicals, PCR reactions, and radiation. When artificially created, in the context of the invention, a mutation is by extension, the replacement of an amino acid encoded by a given nucleic acid sequence to another amino acid in a nucleic acid sequence or a genetic element. Thus, the section of the construct, as disclosed herein, containing the full, unchanged sequences for, for example, the hormone binding domain of the estrogen receptor (ERT2), would be considered to contain the wild type hormone binding domain of the estrogen receptor (ERT2), while sections of the construct carrying a mutation in the hormone binding domain of the estrogen receptor (ERT2) are termed mutated hormone binding domain of the estrogen receptor (ERT2).
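The effect of replacing a single nucleotide, as described above, can be illustrated with the standard genetic code. The sketch below builds the standard codon table, translates a short hypothetical coding sequence, and shows how one nucleotide replacement changes the encoded amino acid. The example sequence and the function names `translate` and `point_mutate` are assumptions for illustration and do not correspond to any sequence of this disclosure.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon ('*' marks a stop)."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def point_mutate(dna, position, new_base):
    """Replace the nucleotide at a 0-based position with another nucleotide."""
    return dna[:position] + new_base + dna[position + 1:]


wild_type = "ATGGAAAAG"                      # hypothetical sequence: Met-Glu-Lys
mutant = point_mutate(wild_type, 5, "T")     # GAA (Glu) becomes GAT (Asp)
print(translate(wild_type), translate(mutant))
```

Here a single A to T replacement converts the codon GAA to GAT, so that glutamic acid is replaced by aspartic acid in the encoded peptide.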
[0086] The present disclosure describes constructs for the expression of fusion proteins having the desired capability of genome engineering, that is genome editing. In order for such fusion proteins to be expressed, the constructs, as disclosed herein, need to be brought into a cell for protein expression. Thus, in one example, a host cell is transfected with the nucleic acid sequence as described herein, thereby resulting in the expression of the desired protein within the cell. In another example, the transfection is done via nucleofection or electroporation. In another example, the present disclosure describes a nucleic acid sequence encoding any one of the constructs as disclosed herein. In yet another example, there is disclosed a vector comprising the nucleic acid sequence of a construct as disclosed herein. In a further example, a host cell comprising the vector as disclosed herein is described. In one example, the host cell is a mammalian cell. In another example, the mammalian cell is, but is not limited to, mouse, horse, sheep, pig, cow, hamster or human. In another example, the host cell is bacterial.
[0087] Any or all of the components, as described herein, may be provided in the form of a kit. Thus, in one example, a kit comprising the construct as disclosed herein and an inducing agent is described. In another example, the kit comprises tamoxifen as an inducing agent, and/or a derivative thereof.
[0088] Described herein are also methods for using the claimed construct for genome editing. Thus, in one example, there is disclosed a method of editing a genome of a host cell using the construct as disclosed herein, wherein the host cell, comprising the nucleic acid sequence as defined herein, is incubated with an inducing agent. Also disclosed herein is a method of editing a genome of a host cell using the construct as defined herein, wherein the method comprises transfecting the host cell with the nucleic acid sequence as defined herein; and incubating the cell with an inducing agent. In another example, the transfection can be done using, for example, nucleofection or electroporation.
[0089] The invention illustratively described herein may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. Thus, for example, the terms comprising, including, containing, etc. shall be read expansively and without limitation. Additionally, the terms and expressions employed herein have been used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the inventions embodied therein herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention.
[0090] The invention has been described broadly and generically herein. Each of the narrower species and sub-generic groupings falling within the generic disclosure also form part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein.
[0091] Other embodiments are within the following claims and non-limiting examples. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.
EXPERIMENTAL SECTION
[0092] The CRISPR (clustered regularly interspaced short palindromic repeats)-Cas9 system enables ready modification of the mammalian genome and has been used to generate single or multiplexed gene knockouts, introduce specific point mutations, or insert epitope tags. However, there is a lack of generalizable methods to rapidly control the activity of the Cas9 endonuclease.
[0093] Disclosed herein is the development of a Cas9 variant, whose activity can be switched on and off in mammalian cells, for example, human cells, using an inducing agent, for example the chemical tamoxifen. Fusions of the wildtype Cas9 enzyme with the mutated hormone-binding domain of the estrogen receptor (ERT2) were generated. Furthermore, these Cas9 variants were systematically engineered by varying the position of ERT2 relative to Cas9, altering the number of ERT2 copies at the N- or C-terminus of Cas9, and testing different linker lengths and compositions. The optimized Cas9 variant (iCas) shows minimal endonuclease activity in the absence of tamoxifen but exhibits high editing efficiencies at multiple loci when the inducing agent is added. The duration and concentration of the inducing agent, for example tamoxifen, were also tuned so as to eliminate off-target genome modification. Additionally, iCas was utilised to target the Wnt signalling pathway, and it was demonstrated that genome modification and signalling perturbation occurred much more rapidly than with an alternative system that relied on a doxycycline-inducible promoter to drive Cas9 expression. The results highlight the utility of iCas for tight spatiotemporal control of genome editing activity.
Initial Development of a Chemical-Inducible Cas9 Variant
[0094] Different fusions of the ERT2 domain with wildtype Cas9 derived from the bacterium Streptococcus pyogenes (
Optimization of the ERT2-Cas9-ERT2 Architecture
[0095] All the initial fusion variants tested showed some background activity without tamoxifen, especially at the EMX1 exonic site and one of the VEGFA promoter sites. Hence, it was sought to develop the conditional genome editing system further. First, the lengths and amino acid compositions of the protein linkers between each ERT2 domain and the Cas9 enzyme were varied. Linker lengths that were tested ranged from 2 to 20 amino acids, and the focus was on linkers composed primarily of six amino acids (A, E, G, P, S, and T), which had previously been reported to be ideal for generating open flexible loops, and therefore polypeptides in stable conformations. Second, since the size of Cas9 is around four times that of Cre (160 kDa versus 40 kDa), it was reasoned that more copies of the ERT2 domain may be required to fully control the cellular localization and subsequent activity of the Cas9 nuclease. Thus, different copy numbers of ERT2 at either the N- or C-terminus of Cas9 were tested. In total, 30 variants with distinct configurations (
[0096] To assay the activities of all the Cas9 variants, a green fluorescent protein (GFP) disruption assay was employed, whereby cleavage and erroneous repair of a constitutively expressed GFP gene in HEK293 cells causes a loss of fluorescence signal which can be detected by flow cytometry (
[0097] To confirm the results of the GFP disruption experiments, the T7 endonuclease I Surveyor assay was performed to detect genome modifications (
[0098] Next, all data was examined together to identify the best performing variants. The rank orders of the Cas9 variants in at least two out of the three assays agreed well with one another (P<0.05, Kolmogorov-Smirnov test) (
Characterization and Performance of iCas Under Different 4-hydroxytamoxifen Treatment Regimes
[0099] In previous experiments, HEK293 cells had been transfected with the relevant plasmids, incubated for 24 hours, and then treated with 1 μM tamoxifen for another 24 hours. However, the amount of Cas9 in the cell has to be tightly controlled, because insufficient Cas9 will give rise to inefficient cleavage of the target genomic locus, while excess Cas9 may lead to unintended non-specific cleavage of off-target sites. Hence, the aim was to ascertain the behaviour of the optimized Cas9 variants under a range of tamoxifen treatment conditions, which would in turn determine the level of nuclease activity in the cell.
[0100] Three different concentrations of 4-hydroxytamoxifen (10 nM, 100 nM, and 1000 nM) and seven durations of chemical treatment (2 hours, 4 hours, 6 hours, 8 hours, 16 hours, 24 hours, and 48 hours) were tested for variants 27, 30, and 29. The amount of genome modification at the EMX1 locus was quantified using the Surveyor assay (
[0101] A key performance measure of an inducible system is whether the system exhibits any background activity in the absence of the inducer. Surveyor assay showed a low amount of genome modification at the EMX1 locus for all three variants without 4-hydroxytamoxifen treatment (0 nM). Leaky activity per se was observed only at the last time point (48 hours) for Variant 30 (
[0102] At 24 hours after transfection with a FANCF-targeting plasmid, cells were treated with or without tamoxifen for another 24 hours before genomic DNA was isolated and analysed by the Surveyor assay (
[0103] To verify the results from the Surveyor assays and deep sequencing experiments, immunohistochemical staining was performed to determine the subcellular localization of the three variants, all of which contained two copies of ERT2 at both termini of the enzyme ((ERT2)2-Cas9-(ERT2)2), with or without 1 μM 4-HT. 24 hours after transfection with plasmids carrying a Cas9 variant and a sgRNA targeting the EMX1, VEGFA, FANCF, WAS, or TAT genomic locus, the cells were either fixed immediately and stained with anti-V5 or were subjected to 6 h or 24 h 4-HT treatment before fixation and staining (
[0104] It was sought to test the robustness of iCas by using it to target the VEGFA promoter as well as the WAS, TAT, and FANCF genes for different durations of 1 μM 4-HT treatment (2 hours, 4 hours, 6 hours, 8 hours, 16 hours, and 24 hours). Consistently, the Surveyor assay showed nuclease activity within 4 hours of 4-hydroxytamoxifen treatment for all loci tested (
Specificity of iCas at Endogenous Off-Target Sites
[0105] To assess the DNA cleavage specificity of iCas, the modification of known Cas9 off-target sites of the EMX1, VEGFA, FANCF, WAS, and TAT sgRNAs was measured. Twenty-four hours after transfection, HEK293 cells were treated with 1 μM 4-hydroxytamoxifen for different durations (4 hours, 6 hours, 8 hours, 16 hours, and 24 hours), and the Surveyor assay was used to assess editing activity at each off-target site (
Comparison of iCas with a Promoter-Based Approach
[0106] As different methods may be adopted for inducible genome editing, iCas was compared with an alternative strategy whereby the wild-type Cas9 enzyme was expressed under a doxycycline (dox)-inducible promoter (P.sub.TRE3G-Cas9). To this end, a previously reported STF3A cell line that carries a Wnt-responsive luciferase reporter and also strongly expresses a Wnt ligand was used, thereby giving high reporter activity. It was reasoned, without being bound by theory, that if β-catenin, a key signal transducer in the Wnt pathway, was inactivated, luciferase expression would be reduced considerably. Thus, it was sought to use iCas or P.sub.TRE3G-Cas9 to knock out CTNNB1, which encodes β-catenin, and to determine how rapidly each conditional system could perturb Wnt signalling upon induction. Firstly, a gene encoding the Tet-On 3G transactivator, which binds to and activates expression from P.sub.TRE3G in the presence of doxycycline, was stably integrated into the STF3A cell line (
[0107] To demonstrate the impact of genome modification at the CTNNB1 locus, luciferase assays were performed on the STF3A-TetOn cell line after transfection with iCas or P.sub.TRE3G-Cas9. Cells were treated for 6 hours with the respective chemical and then harvested after another 72 hours to allow sufficient time for changes in β-catenin or luciferase protein levels. It was verified that both the transcript and protein levels of β-catenin were downregulated in cells co-transfected with iCas and a CTNNB1-targeting sgRNA (
Benchmarking Different Post-Translational Control Systems
[0108] Two other chemical-inducible strategies that rely on post-translational control were recently reported, and it was sought to benchmark iCas against these other strategies. The best-performing intein-Cas9 and split-Cas9 constructs from these studies were cloned into the same plasmid backbone as iCas, and all experiments were performed side by side in HEK293 cells to ensure a fair comparison. The iCas and intein-Cas9 systems were induced with 1 μM 4-hydroxytamoxifen and the split-Cas9 system with 200 nM rapamycin, on the basis of published reports. For the comparison, the EMX1, TAT, and WAS genomic loci were targeted with or without the appropriate inducer. Different durations of chemical treatment were tested, and the extent of genome modification was measured by the Surveyor assay (
[0109] Besides single gene targeting, the ability of iCas to perform multiplex genome engineering was compared with that of intein-Cas9 or split-Cas9. HEK293 cells were co-transfected with a sgRNA targeting EMX1 and another sgRNA targeting a coding exon of ADAR1 (ADAR), and subsequently the extent of genome modification was analysed by the Surveyor assay. After 12 hours of chemical treatment, it was observed that iCas generated INDELs at both the EMX1 and ADAR1 genomic loci (
Repeated Toggling of iCas Activity
[0110] In principle, a conditional system such as iCas should allow users to generate stable cell lines and induce its activity whenever needed. To demonstrate this, retroviral transduction was used to establish a HEK293 cell line that stably expresses iCas (HEK293-iCas cells). The cell line was verified to be functional (
[0111] Subsequently, the possibility of toggling the activity of iCas was explored (
Methods
Cell Culture and Transfection
[0112] All cell lines were cultured in Dulbecco's Modified Eagle Medium (DMEM) supplemented with 10% FBS, 2 mM L-Glutamine and 1% penicillin/streptomycin. Transfection was performed in 12-well plates at around 70% cell confluency using either Turbofect (Thermo Scientific) or Lipofectamine 2000 (Life Technologies), according to manufacturers' instructions. When necessary, cells were treated with varying concentrations of 4-hydroxytamoxifen (Sigma Aldrich).
PCR and Mutagenesis
[0113] All oligonucleotides for PCR and mutagenesis reactions were purchased from Integrated DNA Technologies (IDT). PCR was performed with MyTaq DNA Polymerase (BioLine), Phusion High-Fidelity DNA Polymerase (New England Biolabs), or Q5 High-Fidelity DNA Polymerase (New England Biolabs). For MyTaq, the following cycling parameters were used: 95° C. for 3 minutes, followed by 35 cycles of (95° C. for 30 seconds, 60° C. for 30 seconds, and 72° C. for 30 seconds), and then 72° C. for 2 minutes. For Phusion and Q5, the following cycling parameters were used: 98° C. for 3 minutes followed by 40 cycles of (98° C. for 15 seconds, 63° C. for 30 seconds, and 72° C. for 30 seconds), and then 72° C. for 2 minutes. Mutagenesis was performed using the QuikChange Lightning Site-Directed Mutagenesis kit (Agilent Technologies) according to the manufacturer's instructions, in order to incorporate novel restriction sites or DNA linker fragments into the CRISPR-Cas9 variant plasmids. Mutagenic primers were designed using the QuikChange Primer Design Tool (http://www.genomics.agilent.com/primerDesignProgram.jsp).
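As a worked illustration of the two cycling programs above, the following minimal Python sketch adds up the programmed hold times. The function name and argument layout are hypothetical, and the totals exclude thermocycler ramping between temperatures, which is instrument-dependent and not stated in the text.

```python
def program_hold_seconds(initial_s, cycles, step_seconds, final_s):
    """Sum the programmed hold times of a three-stage PCR program.

    Ramp time between temperatures is NOT included (instrument-dependent).
    """
    return initial_s + cycles * sum(step_seconds) + final_s


# MyTaq: 3 min, then 35 x (30 s + 30 s + 30 s), then 2 min
mytaq_seconds = program_hold_seconds(180, 35, (30, 30, 30), 120)

# Phusion / Q5: 3 min, then 40 x (15 s + 30 s + 30 s), then 2 min
high_fidelity_seconds = program_hold_seconds(180, 40, (15, 30, 30), 120)

print(mytaq_seconds / 60)          # 57.5 minutes of programmed holds
print(high_fidelity_seconds / 60)  # 55.0 minutes of programmed holds
```

Both programs therefore occupy just under one hour of hold time, before ramping is taken into account.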
Construction of Cas9 Variants
[0114] The GeneArt CRISPR nuclease vector (Life Technologies), which contains a human codon-optimized Streptococcus pyogenes Cas9 enzyme with a V5 epitope tag, was used as the wildtype Cas9 expression plasmid. The ERT2 domain was isolated using PCR from the pCAG-ERT2-Cre-ERT2 plasmid (Addgene #13777) and cloned into the pCR-BluntII-TOPO vector (Life Technologies). Different linkers and restriction sites were added using the QuikChange Lightning kit (Agilent Technologies). Each of the modified ERT2 fragments was flanked with either AgeI and SfoI or EcoRI and XbaI cut sites for cloning into the N- or C-terminus of Cas9 respectively. All Cas9 variants were confirmed by Sanger sequencing.
GFP Disruption Assay
[0115] HEK293-GFP stable cells were purchased from GenTarget. One day after seeding, cells were transfected using Lipofectamine 2000 (Life Technologies) according to manufacturer's instructions, with efficiency reaching at least about 70% per well. Experimental cells were treated with 1 mM 4-hydroxytamoxifen (Sigma Aldrich), while control cells remained in culture media devoid of tamoxifen. 5 days after transfection, cells were trypsinised and resuspended in PBS containing 2% FBS for analysis by flow cytometry. All the data were normalized to the average fluorescence intensity of cells transfected with a plasmid that did not express any sgRNA.
Generation of STF3A-TetOn Stable Cells
[0116] STF3A cells were modified to stably express the Tet-On 3G transactivator protein via retroviral transduction and drug selection. Briefly, to generate retroviruses, GP2-293 cells were transfected at around 70% confluence with a transfection mix comprising 20 μg pCMV-VSVG envelope vector, 50 μg pRETROX-TET3G vector (CloneTech), and 140 μl Lipofectamine 2000 (Life Technologies) diluted in 3.75 ml Opti-MEM (Life Technologies) and 7.5 ml DMEM containing 10% FBS. The transfection mix was substituted with 10 ml DMEM containing 5% FBS after 6 hours of incubation at 37° C. Retrovirus-containing medium was harvested after 24 hours and purified using Amicon Ultra-15 Centrifugal Filter Units (Merck Millipore). STF3A cells were then infected twice with 20 μl retroviruses each time and subsequently selected in DMEM containing 500 μg/ml G418 over 5 days. To test the expression of the transactivator gene, STF3A-TetOn cells were transfected with 1 μg pTRE-tdTomato vector (Addgene #50798) and observed for red fluorescence 24 hours after treatment with 1 μg/ml doxycycline.
Luciferase Assay
[0117] STF3A-TetOn cells were transfected with 1 μg iCas or pTRE3G-Cas9 and treated with 1 μM tamoxifen or 1 μg/ml doxycycline respectively for 6 hours. The cells were then trypsinised and re-seeded equally into a Corning 96-well flat clear bottom white plate. Samples were assayed for luciferase activity using Dual-Glo Luciferase (Promega) according to the manufacturer's instructions. All measurements were taken using the i-control software for Tecan microplate readers. All firefly luciferase measurements were normalized to the corresponding renilla luciferase readings.
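The normalization described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the readings are hypothetical, and expressing the induced sample relative to an uninduced sample is an assumption about how such data might be compared, not a step stated in the text.

```python
def firefly_to_renilla(firefly, renilla):
    """Normalize a firefly luciferase reading to its paired renilla reading."""
    if renilla <= 0:
        raise ValueError("renilla reading must be positive")
    return firefly / renilla


def relative_reporter_activity(induced, uninduced):
    """Ratio of normalized signals; each argument is a (firefly, renilla) pair.

    Comparing against an uninduced sample is an assumption for illustration.
    """
    return firefly_to_renilla(*induced) / firefly_to_renilla(*uninduced)


# Hypothetical plate-reader values (arbitrary luminescence units).
uninduced_well = (50000.0, 1000.0)
induced_well = (12500.0, 1000.0)

print(relative_reporter_activity(induced_well, uninduced_well))  # 0.25
```

Dividing by the renilla reading corrects for well-to-well differences in cell number and transfection efficiency before any comparison between conditions is made.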
Surveyor Cleavage Assay
[0118] Genomic DNA was isolated from cells using the DNeasy Blood and Tissue Kit (Qiagen) and the loci-of-interest were amplified using Q5 High-Fidelity DNA Polymerase (New England Biolabs; see Table 3 for list of primers). The PCR products were purified using the GeneJET Gel Extraction Kit (Thermo Scientific). Subsequently, 250 ng DNA was incubated at 95° C. for 5 minutes in 1× NEBuffer 2 and then slowly cooled at a rate of 0.1° C./second. After annealing, 5 U T7 endonuclease I (New England Biolabs) was added to each sample and the reactions were incubated at 37° C. for 50 minutes. The T7E1-digested products were separated on a 2.5% agarose gel stained with GelRed (Biotium) and the gel bands were quantified using ImageJ.
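The text states only that gel bands were quantified with ImageJ; it does not say how band intensities were converted into a modification frequency. The Python sketch below shows one widely used estimate for mismatch-cleavage assays, indel % = 100 × (1 − √(1 − f)), where f is the cleaved fraction of total band intensity. Its use here is an assumption for illustration, and the function names and intensities are hypothetical.

```python
import math


def cleaved_fraction(uncut_intensity, cut_intensities):
    """Fraction of total band intensity found in the cleavage products."""
    cut_total = sum(cut_intensities)
    total = uncut_intensity + cut_total
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return cut_total / total


def estimate_indel_percent(uncut_intensity, cut_intensities):
    """Estimate indel frequency, assuming random re-annealing of strands.

    Assumed formula (not given in the text): 100 * (1 - sqrt(1 - f)).
    """
    f = cleaved_fraction(uncut_intensity, cut_intensities)
    return 100.0 * (1.0 - math.sqrt(1.0 - f))


# Hypothetical ImageJ band intensities: one parental band, two cleavage bands.
print(round(estimate_indel_percent(64.0, [18.0, 18.0]), 2))  # 20.0
```

The square root reflects that a heteroduplex, which is what the endonuclease cuts, forms whenever a mutant strand anneals to a non-identical strand, so the cleaved fraction overstates the underlying mutant allele frequency.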
Illumina Deep Sequencing
[0119] Sequencing libraries were constructed via two rounds of PCR. In the first round, the loci-of-interest were amplified from genomic DNA using Q5 High-Fidelity DNA Polymerase (New England Biolabs) and the primers listed in Table 4. Each forward primer contains the common sequence GCG TTA TCG AGG TC, while each reverse primer contains the common sequence GTG CTC TTC CGA TCT. In the second round, the PCR products from the first round were barcoded using Phusion High-Fidelity DNA Polymerase (New England Biolabs) and the following primers: Forward: AAT GAT ACG GCG ACC ACC GAG ATC TAC ACC CTA CAC GAG CGT TAT CGA GGT C; Reverse: CAA GCA GAA GAC GGC ATA CGA GAT (barcode) GTG ACT GGA GTT CAG ACG TGT GCT CTT CCG ATC T. 10 bp barcodes designed by Fluidigm for the Access Array System were used. All samples were sequenced on MiSeq (Illumina) to produce paired 151 bp reads.
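The two-round primer design can be illustrated with a short Python sketch. The common sequences and second-round primers are taken from the paragraph above with spaces removed; the function names are hypothetical, and the barcode shown is a placeholder rather than an actual Fluidigm Access Array barcode.

```python
FORWARD_COMMON = "GCGTTATCGAGGTC"
REVERSE_COMMON = "GTGCTCTTCCGATCT"

ROUND2_FORWARD = "AATGATACGGCGACCACCGAGATCTACACCCTACACGAGCGTTATCGAGGTC"
ROUND2_REVERSE_5P = "CAAGCAGAAGACGGCATACGAGAT"
ROUND2_REVERSE_3P = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"


def add_common_sequence(locus_specific, common):
    """Build a first-round primer: common sequence followed by the locus part."""
    return common + locus_specific.upper()


def barcoded_reverse_primer(barcode):
    """Insert a 10 bp barcode into the second-round reverse primer."""
    barcode = barcode.upper()
    if len(barcode) != 10 or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be 10 bases of A, C, G, or T")
    return ROUND2_REVERSE_5P + barcode + ROUND2_REVERSE_3P


# Locus-specific part of EMX1_On_Adapter_FOR, as listed in Table 4.
emx1_forward = add_common_sequence("GGGCCTCCTGAGTTTCTCAT", FORWARD_COMMON)
print(emx1_forward)

# The 3' ends of the second-round primers match the first-round common
# sequences, which is what lets round two prime on round-one products.
print(ROUND2_FORWARD.endswith(FORWARD_COMMON))
print(ROUND2_REVERSE_3P.endswith(REVERSE_COMMON))

# Placeholder barcode, for illustration only.
print(barcoded_reverse_primer("ACGTACGTAC"))
```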
Cell Fractionation
[0120] HEK293 cells were fractionated using the Rapid Efficient And Practical (REAP) method. Briefly, the cells were scraped in ice-cold PBS, collected into 1.5 ml Eppendorf tubes, and pop-spun for 10 seconds in a table-top centrifuge. The supernatant was discarded and the pellet was lysed with 0.1% Igepal CA630 (Sigma Aldrich) in PBS supplemented with protease inhibitor (Calbiochem). Whole cell lysates were aliquoted and the remainder was pop-spun for 10 seconds. The supernatant, comprising the cytosolic fraction, was collected into a new tube. The pellet, comprising the nuclear fraction, was resuspended using 0.1% Igepal CA630 in PBS with protease inhibitor. Whole cell lysates and nuclear fractions were subjected to 10 cycles of sonication (each cycle consisted of 30 seconds sonication followed by 30 seconds rest).
Western Blot Analysis
[0121] Proteins from whole cell lysates, nuclear fractions, and cytosolic fractions were loaded in equal amounts for SDS PAGE and then transferred onto a nitrocellulose membrane for western blot. The primary antibodies used were α-V5 (Life Technologies, 1:8000 dilution), α-3PGDH (Santa Cruz, 1:1000 dilution), and α-total histone H3 (Abcam, 1:10000 dilution). Primary antibodies were diluted in TBST+5% milk and incubated overnight at 4° C. Secondary antibodies were used at a 1:2500 dilution in TBST+5% milk. Membranes were exposed after addition of WesternBright Sirius HRP substrate (Advansta).
Immunohistochemistry
[0122] Paraformaldehyde-fixed HEK293 cells were first incubated with blocking solution (10% FBS in 0.1M PBS) (JR Scientific Inc) for 30 minutes and then quenched with 3% hydrogen peroxide. Next, the samples were incubated for 2 hours at room temperature or at 4° C. overnight with primary antibody specific against the V5 epitope tag (Life Technologies) in blocking solution. Negative controls were incubated with blocking solution without any primary antibody. Subsequently, the samples were thoroughly washed with PBS and then incubated for 1 hour at room temperature with secondary horseradish peroxidase (HRP)-conjugated antibody (GE Healthcare UK Ltd). After further incubation with DAB substrate (Vector Laboratories) for 10 minutes at room temperature, the cover slips were washed with distilled water, counter-stained with hematoxylin (Vector Laboratories) for 10 minutes to reveal cellular material, and mounted onto glass slides (Thermo Scientific). All slides were viewed and imaged using a light microscope (Zeiss Axio Imager Z1 with attached Leica Axiocam MRc5 camera) with the appropriate filters.
Tables
[0123]
TABLE-US-00001
TABLE 1 List of Cas9 variants constructed and tested. Amino acids for the different protein linkers are given in bold letters.
No.  Details
1    NLS-TG-ERT2-SGSETPGTSESAGA-Cas9-NLS-ERT2
2    NLS-TG-ERT2-SGSEGA-Cas9-NLS-ERT2
3    NLS-TG-ERT2-GGSGGSGA-Cas9-NLS-ERT2
4    NLS-TG-ERT2-GTSESATPESGGA-Cas9-NLS-ERT2
5    NLS-TG-ERT2-SGSETPGTGA-Cas9-NLS-ERT2
6    NLS-TG-ERT2-SESATPESGA-Cas9-NLS-ERT2
7    NLS-TGGGS-ERT2-SGSETPGTGA-Cas9-NLS-ERT2
8    NLS-TGGGS-ERT2-SGSETPGTPGGA-Cas9-NLS-ERT2
9    NLS-TG-ERT2-GASGSKTPG-Cas9-NLS-ERT2
10   NLS-TG-ERT2-TPESGA-Cas9-NLS-ERT2
11   NLS-TGPGGS-ERT2-GA-Cas9-NLS-ERT2
12   NLS-TGGGS-ERT2-SGSETPGTSEGA-Cas9-NLS-ERT2
13   NLS-TGGGS-ERT2-TPESGA-Cas9-NLS-ERT2
14   NLS-TGPGGSAGDTTGPGGS-ERT2-GA-Cas9-NLS-ERT2
15   NLS-TGGGS-ERT2-SESATPESGA-Cas9-NLS-ERT2
16   NLS-TGGGS-ERT2-SGSEGA-Cas9-NLS-ERT2
17   NLS-TG-ERT2-PG-Cas9-NLS-ERT2
18   NLS-TG-ERT2-GA-Cas9-NLS-SGS-ERT2
19   NLS-TG-ERT2-GA-Cas9-NLS-GGGS-ERT2
20   NLS-TG-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-NLS-ERT2
21   NLS-TGGGS-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-NLS-ERT2
22   NLS-TGGGS-ERT2-PRGGS-ERT2-TPESGA-Cas9-NLS-ERT2
23   NLS-TG-ERT2-GA-Cas9-NLS-ERT2-PAG-ERT2
24   NLS-TG-ERT2-GA-Cas9-NLS-ERT2-PAGGGS-ERT2
25   NLS-TG-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-NLS-ERT2-PAGGGS-ERT2
26   NLS-TGGGS-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-NLS-ERT2-PAGGGS-ERT2
27   NLS-TGGGS-ERT2-PRGGS-ERT2-TPESGA-Cas9-NLS-ERT2-PAG-ERT2
28   NLS-TGGGS-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-NLS-ERT2-PAG-ERT2
29   NLS-TG-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-NLS-ERT2-PAG-ERT2
30   NLS-TGGGS-ERT2-PRGGS-ERT2-TPESGA-Cas9-NLS-ERT2-PAGGGS-ERT2
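The architecture strings in Table 1 follow a regular hyphen-delimited pattern, so they can be summarized programmatically. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes that every token other than NLS, ERT2, and Cas9 is a linker written in one-letter amino acid code.

```python
KNOWN_DOMAINS = {"NLS", "ERT2", "Cas9"}


def parse_architecture(architecture):
    """Summarize a hyphen-delimited construct string from Table 1.

    Assumption: any token that is not NLS, ERT2, or Cas9 is a linker.
    """
    tokens = architecture.split("-")
    if tokens.count("Cas9") != 1:
        raise ValueError("expected exactly one Cas9 token")
    cas9_index = tokens.index("Cas9")
    return {
        "n_terminal_ert2": tokens[:cas9_index].count("ERT2"),
        "c_terminal_ert2": tokens[cas9_index + 1:].count("ERT2"),
        "linkers": [t for t in tokens if t not in KNOWN_DOMAINS],
    }


# Variant 30 as listed in Table 1.
variant_30 = "NLS-TGGGS-ERT2-PRGGS-ERT2-TPESGA-Cas9-NLS-ERT2-PAGGGS-ERT2"
summary = parse_architecture(variant_30)
print(summary["n_terminal_ert2"], summary["c_terminal_ert2"])  # 2 2
print(summary["linkers"])  # ['TGGGS', 'PRGGS', 'TPESGA', 'PAGGGS']
```

For variant 30 this recovers two ERT2 copies on each side of Cas9, matching the (ERT2)2-Cas9-(ERT2)2 arrangement described for the optimized variants.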
TABLE-US-00002
TABLE 2 Non-specific off-target sites investigated in this study.
Target    Location         Site  Sequence
EMX1      Chr2:73160982    On    GAGTCCGAGCAGAAGAAGAAggg
          Chr5:45359083    Off1  GAGTTAGAGCAGAAGAAGAAagg
          Chr15:44109747   Off2  GAGTCTAAGCAGAAGAAGAAgag
VEGFAP1   Chr6:43737313    On    GGGTGGGGGGAGTTTGCTCCtgg
          Chr15:65637553   Off1  GGATGGAGGGAGTTTGCTCCtgg
          Chr17:39796344   Off2  TAGTGGAGGGAGCTTGCTCCtgg
          Chr1:99347667    Off3  GGGGAGGGGAAGTTTGCTCCtgg
VEGFAP2   Chr6:43737454    On    GGTGAGTGAGTGTGTGCGTGtgg
          Chr9:16681608    Off1  AGTGAGTGAGTGTGTGTGTGggg
          Chr5:89440985    Off2  AGAGAGTGAGTGTGTGCATGagg
          Chr5:115434659   Off3  TGTGGGTGAGTGTGTGCGTGagg
          Chr22:37662840   Off4  GCTGAGTGAGTGTATGCGTGtgg
WASI1     ChrX:48544569    On    TGGATGGAGGAATGAGGAGTtgg
          Chr1:30597854    Off1  TGGATGGAGGGATGAGGAGTggg
          Chr2:242451414   Off2  GGGATGGAGGGATGAGGAGTggg
          Chr18:21810215   Off3  AGGAGGGAGGAATGGGGAGTtgg
WASI2     ChrX:48544562    On    CCCATCCATCCAGAGACACAggg
          ChrX:90817748    Off1  CTCTTCCACCCAGAGACACAggg
TAT       Chr16:71609818   On    TCCTCCTGAGACTCCATACCtgg
          Chr6:12810776    Off1  CCATCCTGAGACTCCATACCtgg
FANCF     Chr11:22647354   On    GGAATCCCTTCTGCAGCACCtgg
          Chr18:8707544    Off1  GGAACCCCGTCTGCAGCACCagg
          Chr10:43410014   Off2  GGAGTCCCTCCTACAGCACCagg
          Chr10:37953183   Off3  GGAGTCCCTCCTACAGCACCagg
          Chr17:78923961   Off4  AGAGGCCCCTCTGCAGCACCagg
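To make the relationship between each on-target site and its off-target sites in Table 2 concrete, the Python sketch below lists the positions at which two sites differ. It assumes that the 20 uppercase bases are the protospacer and the 3 lowercase bases are the adjacent PAM, and it counts positions from the 5' end; the function names are hypothetical.

```python
def split_site(site):
    """Split a 23-nt site into its 20-nt protospacer and 3-nt PAM.

    Assumption: the last three (lowercase) bases in Table 2 are the PAM.
    """
    if len(site) != 23:
        raise ValueError("expected a 23-nt site (20-nt protospacer + 3-nt PAM)")
    return site[:20].upper(), site[20:].upper()


def mismatch_positions(on_target, off_target):
    """Return 1-based protospacer positions (from the 5' end) that differ."""
    on_protospacer, _ = split_site(on_target)
    off_protospacer, _ = split_site(off_target)
    return [
        index + 1
        for index, (a, b) in enumerate(zip(on_protospacer, off_protospacer))
        if a != b
    ]


# EMX1 rows from Table 2.
emx1_on = "GAGTCCGAGCAGAAGAAGAAggg"
emx1_off1 = "GAGTTAGAGCAGAAGAAGAAagg"
emx1_off2 = "GAGTCTAAGCAGAAGAAGAAgag"

print(mismatch_positions(emx1_on, emx1_off1))  # [5, 6]
print(mismatch_positions(emx1_on, emx1_off2))  # [6, 7]
```

Both EMX1 off-target sites differ from the on-target protospacer at two positions near the 5' end.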
TABLE-US-00003
TABLE 3 PCR primers used for the Surveyor cleavage assay.
Primer Name                    Primer Sequence
EMX1_On_Set1_FOR               GCCCCTAACCCTATGTAGCC
EMX1_On_Set1_REV               GGAGATTGGAGACACGGAGA
EMX1_On_Set2_FOR               CTGTGTCCTCTTCCTGCCCT
EMX1_On_Set2_REV               CTCTCCGAGGAGAAGGCCAA
EMX1_Off1_FOR                  TTGAGACATGGGGATAGAATCA
EMX1_Off1_REV                  CAGGAATAGCCCTACAAAGGTG
EMX1_Off2_FOR                  GTTCTGTAAACGCCGTAGCC
EMX1_Off2_REV                  GGATGCAGTCTGCCTTTTTG
PPP1R12C_On_Set1_FOR           GTCTAACCCCCACCTCCTGT
PPP1R12C_On_Set1_REV           ACACCTAGGACGCACCATTC
PPP1R12C_On_Set2_FOR           CGGTTAATGTGGCTCTGGTT
PPP1R12C_On_Set2_REV           CGCACGGAGGAACAATATAAA
VEGFA_Promoter1_On_Set1_FOR    CTGGACACTTCCCAAAGGAC
VEGFA_Promoter1_On_Set1_REV    AGGGAGCAGGAAAGTGAGGT
VEGFA_Promoter1_On_Set2_FOR    TCACTGACTAACCCCGGAAC
VEGFA_Promoter1_On_Set2_REV    CTGAGAGCCGTTCCCTCTTT
VEGFA_Promoter1_Off1_FOR       GGGCTAGAGTGTAGTGGCACA
VEGFA_Promoter1_Off1_REV       GCCCTGTTTTCATCCTACACA
VEGFA_Promoter1_Off2_FOR       AAGTTGGGCAAGAGTCCAGA
VEGFA_Promoter1_Off2_REV       ACCAGCAGAGGAAGGGCTAT
VEGFA_Promoter1_Off3_FOR       TGCCATTTTTAAGCCATCAG
VEGFA_Promoter1_Off3_REV       AGCCCATTCTTTTTGCAGTG
VEGFA_Promoter2_On_FOR         CCAGATGGCACATTGTCAGA
VEGFA_Promoter2_On_REV         CCAAGGTTCACAGCCTGAAA
VEGFA_Promoter2_Off1_FOR       GCCGTCTGTTAGAGGGACAA
VEGFA_Promoter2_Off1_REV       GTCTTCCCCCAACCTCCAGT
VEGFA_Promoter2_Off2_FOR       GGCCCAATCTTAGTGTTTCAGA
VEGFA_Promoter2_Off2_REV       TGGTTAAAAGCAAAGGATGTGA
VEGFA_Promoter2_Off3_FOR       CCCTCGCTAGATACTGAGGAAA
VEGFA_Promoter2_Off3_REV       TGGCCAAGATAAGGAAACAAC
VEGFA_Promoter2_Off4_FOR       TGATTCCGCTGACACGTAAC
VEGFA_Promoter2_Off4_REV       TTCAGAGCCTCTCACCACCT
WAS_Intron1-2_On_Set1_FOR      CAGCCAATGAAGGTGAGTCC
WAS_Intron1-2_On_Set1_REV      GTGGATCCCACAAACCATTC
WAS_Intron1-2_On_Set2_FOR      AGGAATCAGAGGCAAAGTGG
WAS_Intron1-2_On_Set2_REV      TCCCATCAATTCATCCCTCT
WAS_Intron1_Off1_FOR           CTGTCCTCTCTGCAGGAACC
WAS_Intron1_Off1_REV           GTCTGGATCCCTGCATCACT
WAS_Intron1_Off2_FOR           CGAGGTTCCAGAATGCTCTT
WAS_Intron1_Off2_REV           GGGAGGCTAAACCCTGAAAC
WAS_Intron1_Off3_FOR           TCTTCAATGTTCCCCCACAT
WAS_Intron1_Off3_REV           AGGCTGCCATTGTCTGAAGT
WAS_Intron2_Off1_Set1_FOR      TCTCAGAGATACAAGGGAAATCG
WAS_Intron2_Off1_Set1_REV      CCAGCAGACTCTGGGTCTATTTA
WAS_Intron2_Off1_Set2_FOR      TACAAGGGAAATCGTGAGACC
WAS_Intron2_Off1_Set2_REV      AGTCAGCATGCAGATTCTGGT
TAT_On_FOR                     GACAACATGAAGGTGAAACCAA
TAT_On_REV                     GTCAAAGAAAGCCAGGAAAGAA
TAT_Off1_FOR                   TGTGGTTGGTTGGTTTGTTG
TAT_Off1_REV                   GTGACCAAGCAGGCTCTTTC
FANCF_On_FOR                   ACCTCTTTGTGTGGCGAAAG
FANCF_On_REV                   CCAGGCTCTCTTGGAGTGTC
FANCF_Off1_FOR                 CAGACTTCACCACCATGCAC
FANCF_Off1_REV                 GGCCAGTCCTTTGTAAGCAT
FANCF_Off2_FOR                 AATGTAAGAGGCAACCAAAGGA
FANCF_Off2_REV                 GTTAATGGAAGGTGAAGGCAGT
FANCF_Off3_FOR                 AATGCAAGAGGCAAACAAAAA
FANCF_Off3_REV                 CCAACATCTTCACAAGGGTTC
FANCF_Off4_FOR                 CAACCTTCATCCTTGGCTTG
FANCF_Off4_REV                 GAGACAGAGCCATGCAACCTA
CTNNB_1_On_FOR                 GCCACCAGCAGGAATCTAGT
CTNNB_1_On_REV                 TCAAAACTGCATTCTGACTTTCA
ADAR1_On_FOR                   GGGCAGGAACCTGTCATAAA
ADAR1_On_REV                   CCCTTGTTCAGCCAAGATTC
TCF7_On_FOR                    TTCCTTCCCAAGTCAGGAACT
TCF7_On_REV                    TATGGGAGAAAAGACCAGCAC
PARP4_On_FOR                   GGACTTCCAGCTTTTTGCAC
PARP4_On_REV                   TTGCTCTCGGGATTTTAGGA
ASXL2_On_FOR                   CATGGCAGCCCCTTTCTAT
ASXL2_On_REV                   GCCTGGCCATAAGTCATTTT
TABLE-US-00004
TABLE 4 PCR primers used for making Illumina sequencing libraries.
Primer Name                         Primer Sequence
EMX1_On_Adapter_FOR                 GCGTTATCGAGGTCGGGCCTCCTGAGTTTCTCAT
EMX1_On_Adapter_REV                 GTGCTCTTCCGATCTGTGGTTGCCCACCCTAGTC
EMX1_Off1_Adapter_FOR               GCGTTATCGAGGTCTGCACATGTATGTACAGGAGTCAT
EMX1_Off1_Adapter_REV               GTGCTCTTCCGATCTCACCTTTTAAGATCTGACAGAGAAA
EMX1_Off2_Adapter_FOR               GCGTTATCGAGGTCTGGGCGAGAAAGGTAACTTATG
EMX1_Off2_Adapter_REV               GTGCTCTTCCGATCTACTGTTTCACTGCCTACCTTCC
PPP1R12C_On_Adapter_Set1_FOR        GCGTTATCGAGGTCGATCAGTGAAACGCACCAGA
PPP1R12C_On_Adapter_Set1_REV        GTGCTCTTCCGATCTGTCTAACCCCCACCTCCTGT
PPP1R12C_On_Adapter_Set2_FOR        GCGTTATCGAGGTCGTCAGAGCAGCTCAGGTTCTG
PPP1R12C_On_Adapter_Set2_REV        GTGCTCTTCCGATCTTAGGCCTCCTCCTTCCTAGTCT
VEGFA_Promoter1_On_Adapter_FOR      GCGTTATCGAGGTCGCACATTGTCAGAGGGACAC
VEGFA_Promoter1_On_Adapter_REV      GTGCTCTTCCGATCTCACACGTCCTCACTCTCGAA
VEGFA_Promoter1_Off1_Adapter_FOR    GCGTTATCGAGGTCTCTCAAACTCCTGGGCTCAA
VEGFA_Promoter1_Off1_Adapter_REV    GTGCTCTTCCGATCTCTGGTTTTTGGTTTGGGAAA
VEGFA_Promoter1_Off2_Adapter_FOR    GCGTTATCGAGGTCCCCTCTCCATGAAACTTTGC
VEGFA_Promoter1_Off2_Adapter_REV    GTGCTCTTCCGATCTAGGGCAAAACAGGAGAACAG
VEGFA_Promoter1_Off3_Adapter_FOR    GCGTTATCGAGGTCGCATCTCTGCCTTCATTGCT
VEGFA_Promoter1_Off3_Adapter_REV    GTGCTCTTCCGATCTGCCTACTCCAGGGTTTCTCA
VEGFA_Promoter2_On_Adapter_FOR      GCGTTATCGAGGTCGCAGACGGCAGTCACTAGG
VEGFA_Promoter2_On_Adapter_REV      GTGCTCTTCCGATCTCCGTTCCCTCTTTGCTAGG
VEGFA_Promoter2_Off1_Adapter_FOR    GCGTTATCGAGGTCGATCCGGTGCTGCAGTGA
VEGFA_Promoter2_Off1_Adapter_REV    GTGCTCTTCCGATCTGCTCTCCACCTCGATGTCA
VEGFA_Promoter2_Off2_Adapter_FOR    GCGTTATCGAGGTCTCAAAGTTTCACATGGTTGC
VEGFA_Promoter2_Off2_Adapter_REV    GTGCTCTTCCGATCTGTGTGGAGGGTGGGACCT
VEGFA_Promoter2_Off3_Adapter_FOR    GCGTTATCGAGGTCATTATGCGTATTCAGGGTGTGC
VEGFA_Promoter2_Off3_Adapter_REV    GTGCTCTTCCGATCTGCTGGTCAGAGGGTACAACTTTT
VEGFA_Promoter2_Off4_Adapter_FOR    GCGTTATCGAGGTCGGTTAGGAGAGCTGGCTTGGA
VEGFA_Promoter2_Off4_Adapter_REV    GTGCTCTTCCGATCTCTGGCCTCGGCCTCTCA
WAS_Intron1-2_On_Adapter_FOR        GCGTTATCGAGGTCGGCAGGGCTGTGATAACTCT
WAS_Intron1-2_On_Adapter_REV        GTGCTCTTCCGATCTATCTACCGCCAATCCATCC
WAS_Intron1_Off1_Adapter_FOR        GCGTTATCGAGGTCACGGCATGGAATTATTTGGTT
WAS_Intron1_Off1_Adapter_REV        GTGCTCTTCCGATCTGCCTGGGAGAGAAATCAACTC
WAS_Intron1_Off2_Adapter_FOR        GCGTTATCGAGGTCACTGTGTAGGAAGCCCACTCTC
WAS_Intron1_Off2_Adapter_REV        GTGCTCTTCCGATCTAAAGCTTGGTGACAGTGAAATG
WAS_Intron1_Off3_Adapter_FOR        GCGTTATCGAGGTCCATGAAGGGAAGAGGTGCAT
WAS_Intron1_Off3_Adapter_REV        GTGCTCTTCCGATCTCCAACGTGACCCTTTTTGAG
WAS_Intron2_Off1_Adapter_FOR        GCGTTATCGAGGTCTCACAGTCTCTTCCCCTGCT
WAS_Intron2_Off1_Adapter_REV        GTGCTCTTCCGATCTCTTGGCCAGTGTCTTTCCAT
TAT_On_Adapter_FOR                  GCGTTATCGAGGTCTGTGTTTGGAAACCTGCCTA
TAT_On_Adapter_REV                  GTGCTCTTCCGATCTCCAAATCCAAAGGACCATGT
TAT_Off1_Adapter_FOR                GCGTTATCGAGGTCCATCCCCTGGCATCTAGAAA
TAT_Off1_Adapter_REV                GTGCTCTTCCGATCTTCACTACCTGGTGGCTATGG
FANCF_On_Adapter_FOR                GCGTTATCGAGGTCAGCATTGCAGAGAGGCGTAT
FANCF_On_Adapter_REV                GTGCTCTTCCGATCTATGGATGTGGCGCAGGTAG
FANCF_Off1_Adapter_FOR              GCGTTATCGAGGTCCACAGATTGATGCCACTGGA
FANCF_Off1_Adapter_REV              GTGCTCTTCCGATCTACGCCAGCACTTTCTAAGGA
FANCF_Off2-3_Adapter_FOR            GCGTTATCGAGGTCTTACCAGATGGAGGACAGTGA
FANCF_Off2-3_Adapter_REV            GTGCTCTTCCGATCTACCAGTTTGAGACCTCTGACC
FANCF_Off4_Adapter_FOR              GCGTTATCGAGGTCGGCTCTGGGTACAGTTCTGC
FANCF_Off4_Adapter_REV              GTGCTCTTCCGATCTGCCACAGACGAAGACACAGA
TABLE-US-00005
TABLE 1 List of Cas9 variants constructed and tested. Amino acids for the different protein linkers are given in bold.
SEQ ID No.  No.  Details
15          17   NLS-PR-ERT2-PG-Cas9-ERT2
16          2    NLS-TG-ERT2-SGSEGA-Cas9-ERT2
17          9    NLS-TG-ERT2-GASGSKTPG-Cas9-ERT2
18          1    NLS-TG-ERT2-SGSETPGTSESAGA-Cas9-ERT2
19          5    NLS-TG-ERT2-SGSETPGTGPGGA-Cas9-ERT2
20          6    NLS-TG-ERT2-SESATPESGA-Cas9-ERT2
21          4    NLS-TG-ERT2-GTSESATPESGGA-Cas9-ERT2
22          3    NLS-TG-ERT2-GGSGGSGA-Cas9-ERT2
23          11   NLS-TGPGPGGS-ERT2-GA-Cas9-ERT2
24          14   NLS-TGPGPGGSAGDTTGPGTGPG-ERT2-GA-Cas9-ERT2
25          19   NLS-TG-ERT2-GA-Cas9-GGGS-ERT2
26          13   NLS-TGGGS-ERT2-TPESGA-Cas9-ERT2
27          15   NLS-TGGGS-ERT2-SESATPESGA-Cas9-ERT2
28          16   NLS-TGGGS-ERT2-SGSEGA-Cas9-ERT2
29          7    NLS-TGGGS-ERT2-SGSETPGTGA-Cas9-ERT2
30          12   NLS-TGGGS-ERT2-SGSETPGTSEGA-Cas9-ERT2
31          22   NLS-TGGGS-ERT2-PRGGS-ERT2-TPESGA-Cas9-ERT2
32          21   NLS-TGGGS-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-ERT2
33          23   NLS-TG-ERT2-GA-Cas9-ERT2-PAG-ERT2
34          24   NLS-TG-ERT2-GA-Cas9-ERT2-PAGGGS-ERT2
35          8    NLS-TGGGS-ERT2-SGSETPGTPGGA-Cas9-ERT2
36          27   NLS-TGGGS-ERT2-PRGGS-ERT2-TPESGA-Cas9-ERT2-PAG-ERT2
37          30   NLS-TGGGS-ERT2-PR-ERT2-TPESGA-Cas9-ERT2-PAGGGS-ERT2
38          25   NLS-TG-ERT2-GGSPRGGS-ERT2-TPESGA-Cas9-ERT2-PAGGGS-ERT2
39          28   NLS-TGGGS-ERT2-TPGGPRGGS-ERT2-TPESGA-Cas9-ERT2-PAG-ERT2