TERPENE SYNTHASE PRODUCING PATCHOULOL AND ELEMOL, AND PREFERABLY ALSO POGOSTOL
20210102224 · 2021-04-08
Inventors
- Martinus Julius Beekwilder (Renkum, NL)
- Adèle Margaretha Maria Liduina Van Houwelingen (Wekerom, NL)
- Hendrik Jan Bosch (Wageningen, NL)
- Georg Friedrich Lentzen (Geleen, NL)
- Elena MELILLO (Geleen, NL)
CPC classification
Y02E50/10
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
C12N15/8243
CHEMISTRY; METALLURGY
International classification
Abstract
The invention is directed to a patchoulol synthase, to a nucleic acid encoding said patchoulol synthase, to an expression vector comprising said nucleic acid, to a host cell comprising said expression vector, to a method of preparing patchoulol and elemol, and preferably also pogostol, and to a method of preparing a patchoulol synthase.
Claims
1. Patchoulol synthase comprising an amino acid sequence as shown in SEQ ID NO: 4 or a functional homologue thereof, said homologue being a patchoulol synthase comprising an amino acid sequence which has a sequence identity of at least 80% with SEQ ID NO: 4.
2. Patchoulol synthase according to claim 1, having at least 85%, at least 90%, at least 95%, or at least 98% sequence identity with SEQ ID NO: 4.
3. Nucleic acid, comprising a nucleic acid sequence encoding a patchoulol synthase according to claim 1, or a complementary sequence thereof.
4. Nucleic acid according to claim 3, wherein the nucleic acid comprises a nucleic acid sequence as shown in SEQ ID NO: 3 or SEQ ID NO: 5, or a nucleic acid sequence having a sequence identity of at least 80%, more preferably at least 85%, more preferably at least 90%, more preferably at least 95%, most preferably at least 98% with a sequence shown in SEQ ID NO: 3 or SEQ ID NO: 5, or a complementary sequence of any of these sequences.
5. Expression vector comprising a nucleic acid according to claim 3.
6. A host cell, which may be an organism per se or part of a multi-cellular organism, said host cell comprising an expression vector comprising a heterologous nucleic acid sequence according to claim 3.
7. A host cell according to claim 6, wherein the host cell is a bacterial cell selected from the group of Gram-negative bacteria, in particular from the group of Rhodobacter, Paracoccus and Escherichia.
8. A host cell according to claim 6, wherein the host cell is a fungal cell selected from the group of Aspergillus, Blakeslea, Penicillium, Phaffia (Xanthophyllomyces), Pichia, Saccharomyces, and Yarrowia.
9. Transgenic plant or culture comprising transgenic plant cells, said plant or culture comprising host cells according to claim 6, wherein the host cell is of a transgenic plant selected from Nicotiana spp., Solanum spp., Cichorium intybus, Lactuca sativa, Mentha spp., Artemisia annua, tuber forming plants, oil crops, liquid culture plants, tobacco BY2 cells, Physcomitrella patens, and trees.
10. Transgenic mushroom or culture comprising transgenic mushroom cells, said mushroom or culture comprising host cells according to claim 6, wherein the host cell is selected from Schizophyllum, Agaricus and Pleurotus.
11. Method for preparing patchoulol and elemol, and preferably also pogostol, comprising converting a farnesyl diphosphate to patchoulol and elemol, and preferably also pogostol in the presence of a patchoulol synthase according to claim 1.
12. Method for preparing patchoulol and elemol, and preferably also pogostol, according to claim 11, wherein the patchoulol and elemol, and preferably also pogostol, are prepared in a host cell, a plant or plant culture, or a mushroom or mushroom culture, expressing said patchoulol synthase.
13. Method according to claim 11, further comprising isolating the patchoulol and/or pogostol and/or elemol.
14. Method according to claim 11, wherein the pogostol to patchoulol ratio is higher than 1:5.
15. Method according to claim 11, wherein the elemol to patchoulol ratio is higher than 1:10.
16. Method according to claim 11, wherein the elemol to pogostol ratio is 1:3 or higher, preferably 1:2 or higher.
Description
FIGURE LEGENDS
EXAMPLES
Example 1
[0137] GC-MS Analysis of Nardostachys jatamansi
[0138] A Nardostachys jatamansi plant about 20 cm tall was purchased from Poyntzfield Herb Nursery, Black Isle, By Dingwall IV7 8LX, Ross & Cromarty, Scotland. The plant was dissected into leaf and root material. For extraction, 0.5 g of plant material was weighed into a precooled glass tube and 2 mL of dichloromethane was added. The suspension was vortexed for 1 min, sonicated for 5 min in an ultrasonic bath and centrifuged for 5 min at 1500 g at room temperature. The supernatant was collected and filtered over a column of 1 g sodium sulphate. About 2 microliters of the filtrate were analysed by GC-MS using a gas chromatograph as described in detail by Cankar et al. (2015). Patchoulol was identified by comparing retention times and mass spectra with those of patchouli oil (Sigma-Aldrich).
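Identifying a peak against a reference such as patchouli oil rests on two checks: agreement in retention time and similarity of the mass spectra. The sketch below shows one common way to score spectral similarity, a cosine (dot-product) score over nominal m/z intensities. The 0.1 min retention-time tolerance, the 0.9 score cut-off and the toy spectra are illustrative assumptions, not values from this work.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine (dot-product) similarity of two spectra given as {m/z: intensity} dicts.

    An m/z present in only one spectrum contributes zero to the dot product.
    Returns 0.0 (no shared ions) up to 1.0 (identical relative intensity pattern).
    """
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_reference(rt_sample, spec_sample, rt_ref, spec_ref, rt_tol=0.1, min_score=0.9):
    """Hypothetical identification rule: retention times within rt_tol minutes
    and a spectral cosine score of at least min_score."""
    return (abs(rt_sample - rt_ref) <= rt_tol
            and cosine_similarity(spec_sample, spec_ref) >= min_score)


# Toy spectra with made-up intensities (not real patchoulol data).
reference = {41: 30.0, 93: 55.0, 161: 100.0, 222: 20.0}
sample = {41: 28.0, 93: 60.0, 161: 100.0, 222: 18.0}
print(round(cosine_similarity(sample, reference), 3))
print(matches_reference(16.42, sample, 16.40, reference))
```

A real library search (for example against NIST) uses weighted variants of this score, so the threshold would need to be set for the scoring method actually used.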
[0139] Results:
[0140] The roots of N. jatamansi appeared to contain compounds corresponding to patchoulol (Rt 16.4 min) and α-patchoulene (Rt 13.9 min). Root tissue was therefore selected for RNA extraction.
Example 2
[0141] RNA Extraction and Analysis
[0142] The RNA of N. jatamansi root material was isolated as follows. About 15 mL of extraction buffer (2% hexadecyltrimethylammonium bromide, 2% polyvinylpyrrolidone K 30, 100 mM Tris-HCl (pH 8.0), 25 mM EDTA, 2.0 M NaCl, 0.5 g/L spermidine and 2% β-mercaptoethanol) was warmed to 65° C., after which 3 g of ground tissue was added and mixed. The mixture was extracted twice with an equal volume of chloroform:isoamylalcohol (1:24), and one-fourth volume of 10 M LiCl was added to the supernatant and mixed. The RNA was precipitated overnight at 4° C. and harvested by centrifugation at 10 000 g for 20 min. The pellet was dissolved in 500 microliter of SSTE [1.0 M NaCl, 0.5% SDS, 10 mM Tris-HCl (pH 8.0), 1 mM EDTA (pH 8.0)] and extracted once with an equal volume of chloroform:isoamylalcohol. Two volumes of ethanol were added to the supernatant, and the mixture was incubated for at least 2 h at −20° C. and centrifuged at 13 000 g, after which the supernatant was removed. The pellet was air-dried and resuspended in water. Total RNA (60 μg) was shipped to Vertis Biotechnology AG (Freising, Germany). PolyA+ RNA was isolated, and random-primed cDNA was synthesized using a randomized N6 adapter primer and M-MLV H− reverse transcriptase. The cDNA was sheared and fractionated, and fragments of 500 bp were used for further analysis. The cDNAs carry the Illumina adaptor sequences A and B attached to their 5′ and 3′ ends.
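The LiCl step is a simple mixing calculation: adding one-fourth volume of a 10 M stock to one volume of supernatant gives 10 × 0.25 / 1.25 = 2 M LiCl. A minimal helper, assuming the supernatant contains no LiCl beforehand, that volumes are additive, and (for the second line) that the ethanol used is absolute:

```python
def final_concentration(stock_conc, added_volumes, sample_volumes=1.0):
    """Concentration after mixing `added_volumes` of stock into `sample_volumes`
    of a solution containing none of the solute (volumes assumed additive)."""
    return stock_conc * added_volumes / (sample_volumes + added_volumes)


licl_molar = final_concentration(10.0, 0.25)       # one-fourth volume of 10 M LiCl
ethanol_percent = final_concentration(100.0, 2.0)  # two volumes of (assumed) absolute ethanol
print(f"LiCl: {licl_molar:.1f} M, ethanol: {ethanol_percent:.1f}% (v/v)")
# prints: LiCl: 2.0 M, ethanol: 66.7% (v/v)
```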
[0143] The material was subsequently analysed on an Illumina MiSeq sequencer. In total, 28,331,910 sequence reads were obtained, with a total length of 11,064,059,151 base pairs. Trimmomatic-0.32 was used to trim Illumina sequencing adapters from the reads, SeqPrep was used to merge overlapping paired-end reads, and bowtie2 (version 2.2.1) was used to remove phiX contamination (phiX DNA is used as a spike-in control and is usually present at <1%). Paired-end and single reads were assembled with Trinity (trinityrnaseq-2.0.2), yielding a total of 160,871 contigs. To identify sesquiterpene synthases, the N. jatamansi contigs were used to create a database of cDNA sequences. This database was searched with TBLASTN for cDNA sequences encoding proteins with identity to sesquiterpene synthase protein sequences, including the patchoulol synthases from Pogostemon cablin and Valeriana. In total, 77 contigs in the N. jatamansi cDNA database were identified as having significant homology to sesquiterpene synthases. These 77 contigs were further characterized by aligning them with BLASTX to protein sequences in the UniProt database (downloaded Aug. 28, 2015); based on their homology to terpene synthase sequences in UniProt, 24 of them were identified as putative sesquiterpene synthases and another 11 as putative monoterpene synthases. Contigs were screened for open reading frames encoding full-length terpene synthase proteins, based on the alignments provided by the BLASTX analysis. Two full-length putative sesquiterpene synthase-encoding RNA sequences were identified, one of which was contig 27692.
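The final screening step, finding contigs that contain a complete open reading frame, can be approximated without the BLASTX alignments by a plain longest-ORF scan over both strands. The sketch below is that simpler substitute, not the procedure used here. The 300-codon minimum is an assumed cut-off, chosen because full-length sesquiterpene synthases (such as SEQ ID NO: 4) are well above that length.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def reverse_complement(seq):
    """Reverse complement; characters other than ACGT (e.g. N) are kept as-is."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def longest_orf(seq, min_codons=300):
    """Return (strand, start, end, orf) for the longest ATG...stop ORF, or None.

    Coordinates refer to the scanned strand: the contig itself for "+",
    its reverse complement for "-". ORFs without an in-frame stop before the
    contig end are treated as incomplete and skipped.
    """
    seq = seq.upper()
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] != "ATG":
                    i += 3
                    continue
                j = i
                while j + 3 <= len(s) and s[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(s):
                    break  # runs off the contig: no complete ORF further in this frame
                orf = s[i:j + 3]
                if len(orf) // 3 >= min_codons and (best is None or len(orf) > len(best[3])):
                    best = (strand, i, j + 3, orf)
                i = j + 3  # resume scanning after the stop codon
    return best


print(longest_orf("CCATGAAATTTTAGCC", min_codons=1))
# prints: ('+', 2, 14, 'ATGAAATTTTAG')
```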
Example 3: Cloning of Nardostachys jatamansi Patchoulol Synthase (NjPAT)
[0144] Full-length open reading frames were amplified from N. jatamansi cDNA. Forward and reverse primers as shown in Table 1 were designed and used to amplify the complete open reading frames such that each reading frame was fused to the C-terminus of a His-6 tag in the plasmid pCDF-DUET-1 (Novagen corporation). A total of 5 different terpene synthase ORFs were cloned. Using the primers 27692-Fw (SEQ ID NO: 1) and 27692-Re (SEQ ID NO: 2), two different, closely related cDNAs were cloned. One of these (pTS11-1) encoded a protein lacking 70 amino acids relative to the protein encoded by contig 27692 described above. The other clone (pTS11-2) contained the cDNA sequence SEQ ID NO: 3, which encodes the protein SEQ ID NO: 4.
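Whether a cloned cDNA encodes the expected protein can be checked by translating it with the standard genetic code. A minimal sketch: it assumes the reading frame starts at the first base, stops at the first stop codon, ignores whitespace and letter case (both occur in the sequence listing as printed), and writes X for codons containing ambiguous bases. If the listing is error-free, translating SEQ ID NO: 3 this way should reproduce SEQ ID NO: 4.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate a coding sequence from its first base up to the first stop codon."""
    cds = "".join(cds.split()).upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


seq3_start = "ATGTCAATTATTATTGCAACAAACAGTACTGAGCATCCA"  # first 39 nt of SEQ ID NO: 3
print(translate(seq3_start))  # prints: MSIIIATNSTEHP (N-terminus of SEQ ID NO: 4)
```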
[0145] The cloned variants were analysed by sequencing the TS insert. The different variants were introduced into chemically competent E. coli BL21-RIL (Stratagene) by heat-shock transformation and selected on LB agar with 1% glucose, 50 ug/ml spectinomycin and 50 ug/ml chloramphenicol. Transformants were transferred to 5 ml LB liquid medium with 1% glucose, 50 ug/ml spectinomycin and 50 ug/ml chloramphenicol and grown overnight at 37° C. and 250 rpm.
[0146] 200 microliters of these cultures were transferred to 20 mL of LB medium with the appropriate antibiotics in a 100 mL Erlenmeyer flask and incubated at 37° C., 250 rpm until the A600 reached 0.4 to 0.6. Subsequently, 1 mM IPTG was added and cultures were incubated overnight at 18° C. and 250 rpm. The next day, cells were harvested by centrifugation (10 min, 8000×g), the medium was removed, and cells were resuspended in 1 mL resuspension buffer (50 mM Tris-HCl pH=7.5, 1.4 mM β-mercaptoethanol; 4° C.). Cells were disrupted by shaking 2 times for 10 seconds with 0.2 g zirconium sand in a FastPrep machine at speed 6.5. Insoluble particles were subsequently removed by centrifugation (10 min, 13,000×g, 4° C.). The soluble protein fraction was immediately used for enzyme assays.
Example 4: In Vitro Enzyme Assay
[0147] For enzyme assays, a mix was made in a glass tube of 800 μL of MOPSO buffer (15 mM MOPSO (3-[N-morpholino]-2-hydroxypropane sulphonic acid) pH=7.0, 12.5% glycerol, 1 mM MgCl2, 0.1% Tween 20, 1 mM ascorbic acid, 1 mM dithiothreitol), 100 microliter of purified enzyme solution, 5 μL of farnesyl diphosphate or geranyl diphosphate (10 mM; Sigma FPP, dried by evaporation and dissolved in 50% ethanol) and 20 μL of 250 mM Na-orthovanadate. This mix was incubated at 30° C. with mild agitation for 2 hours. Subsequently, the aqueous phase was extracted with 2 mL ethyl acetate. The ethyl acetate phase was collected, centrifuged at 1200×g, dried over a sodium sulphate column and analyzed by GC-MS.
[0148] The GC-MS analysis was performed on an Agilent Technologies system comprising a 7890A GC system, a 5975C inert MSD detector (70 eV), a 7683 autosampler and injector, and a Phenomenex Zebron ZB-5MS column (30 m length × 0.25 mm internal diameter, 0.25 μm stationary phase) with a Guardian precolumn (5 m). In this system, 1 microliter of the sample was injected. The injector was at 250° C., injection was splitless, and the ZB-5MS column was maintained at 45° C. for 2 minutes, after which a gradient of 10° C. per minute was started, up to 300° C. Peaks were detected in total ion count chromatograms. Compounds were identified by their retention index and by comparison of their mass spectra with libraries (NIST8 and in-house).
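Because substrate and inhibitor are added as small volumes of concentrated stock, their working concentrations follow from the total assay volume (800 + 100 + 5 + 20 = 925 μL). A sketch of that arithmetic, assuming volumes are additive and that the listed buffer concentrations refer to the 800 μL of buffer before mixing:

```python
def final_conc(stock, volume_ul, total_ul):
    """Working concentration of a component added as `volume_ul` of `stock`."""
    return stock * volume_ul / total_ul


total_ul = 800 + 100 + 5 + 20  # buffer + enzyme + FPP + orthovanadate = 925 µL
fpp_uM = final_conc(10.0, 5, total_ul) * 1000    # 10 mM stock, reported in µM
vanadate_mM = final_conc(250.0, 20, total_ul)    # 250 mM stock
mgcl2_mM = final_conc(1.0, 800, total_ul)        # 1 mM in the buffer portion
print(f"FPP: {fpp_uM:.0f} µM, Na-orthovanadate: {vanadate_mM:.1f} mM, MgCl2: {mgcl2_mM:.2f} mM")
# prints: FPP: 54 µM, Na-orthovanadate: 5.4 mM, MgCl2: 0.86 mM
```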
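For temperature-programmed runs, a retention index is commonly computed with the linear (van den Dool and Kratz) formula, interpolating between the n-alkanes eluting just before and after the compound: RI = 100 × [n + (N − n)(t_x − t_n)/(t_N − t_n)]. The text does not say which alkane ladder or formula was used, so the sketch below and its ladder values are illustrative assumptions.

```python
import bisect


def linear_retention_index(rt, alkane_rts):
    """Linear retention index of a peak eluting at `rt` minutes.

    alkane_rts maps n-alkane carbon number -> retention time (min). The two
    alkanes bracketing `rt` are used; gaps in the ladder are allowed.
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    k = bisect.bisect_right(times, rt) - 1  # last alkane eluting at or before rt
    if k < 0 or k >= len(times) - 1:
        raise ValueError("retention time lies outside the alkane ladder")
    n, big_n = carbons[k], carbons[k + 1]
    t_n, t_big_n = times[k], times[k + 1]
    return 100.0 * (n + (big_n - n) * (rt - t_n) / (t_big_n - t_n))


# Hypothetical alkane ladder, not measured in this work.
ladder = {14: 13.10, 15: 14.30, 16: 15.45, 17: 16.55}
print(round(linear_retention_index(15.0, ladder)))
```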
[0149] Clone TS11-2 was found to produce patchoulol in this in vitro assay, and thus to encode a patchoulol synthase, and was termed NjPAT. The closely related clone TS11-1 did not produce any sesquiterpenes in the in vitro assay.
Example 5: Cloning of NjPAT for the Expression in Rhodobacter sphaeroides with Solubility Tags
[0150] For expression of the NjPAT gene fused to different solubility tags under the control of the SPppa promoter, the constructs SPppa-MBP, SPppa-Fh8, SPppa-SUMO, SPppa-NusA, SPppa-RsTRX and SPppa-GST, and the NjPAT gene were synthesized by GenScript USA Inc. (Piscataway, N.J., USA). Both the NjPAT gene (SEQ ID NO: 5) and all sequences coding for the solubility tags were codon-optimized for expression in R. sphaeroides.
[0151] The plasmid p-m-SPppa-MBP-NjPAT-mpmii alt (
[0152] The plasmid p-m-SPppa-Fh8-NjPAT-mpmii alt (
[0153] The construct SPppa-SUMO was amplified using the primers Pppa-Fw (SEQ ID NO: 6) and SUMO_NjPAT_Rv (SEQ ID NO: 16), and the NjPAT gene was amplified with the primers SUMO_NjPAT_Fw (SEQ ID NO: 17) and NjPAT_Rv (SEQ ID NO: 9). The nucleotide sequence of the construct SPppa-SUMO-NjPAT is given in SEQ ID NO: 18; the protein sequence is represented in SEQ ID NO: 19. The map of the final plasmid p-m-SPppa-SUMO-NjPAT-mpmii alt is illustrated in
[0154] The construct SPppa-NusA was amplified using the primers Pppa-Fw (SEQ ID NO: 6) and NusA_NjPAT_Rv (SEQ ID NO: 20), and the NjPAT gene was amplified with the primers NusA_NjPAT_Fw (SEQ ID NO: 21) and NjPAT_Rv (SEQ ID NO: 9). The nucleotide sequence of the construct SPppa-NusA-NjPAT is given in SEQ ID NO: 22; the protein sequence is represented in SEQ ID NO: 23. The map of the final plasmid p-m-SPppa-NusA-NjPAT-mpmii alt is illustrated in
[0155] The construct SPppa-RsTRX was amplified using the primers Pppa-Fw (SEQ ID NO: 6) and RsTRX_NjPAT_Rv (SEQ ID NO: 24), and the NjPAT gene was amplified with the primers RsTRX_NjPAT_Fw (SEQ ID NO: 25) and NjPAT_Rv (SEQ ID NO: 9). The nucleotide sequence of the construct SPppa-RsTRX-NjPAT is given in SEQ ID NO: 26; the protein sequence is represented in SEQ ID NO: 27. The map of the final plasmid p-m-SPppa-RsTRX-NjPAT-mpmii alt is illustrated in
[0156] The construct SPppa-GST was amplified using the primers Pppa-Fw (SEQ ID NO: 6) and GST_NjPAT_Rv (SEQ ID NO: 28), and the NjPAT gene was amplified with the primers GST_NjPAT_Fw (SEQ ID NO: 29) and NjPAT_Rv (SEQ ID NO: 9). The nucleotide sequence of the construct SPppa-GST-NjPAT is given in SEQ ID NO: 30; the protein sequence is represented in SEQ ID NO: 31. The map of the final plasmid p-m-SPppa-GST-NjPAT-mpmii alt is illustrated in
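In the sequence listing, the 5′ end of the SUMO and NusA forward junction primers (SEQ ID NO: 17 and 21) matches the reverse complement of the corresponding tag reverse primers (SEQ ID NO: 16 and 20), which is consistent with joining the tag and NjPAT fragments through overlapping ends. The assembly method is not stated in the text; the sketch below only measures that shared stretch, using the primer sequences as listed.

```python
def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence (case-insensitive)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def junction_overlap(tag_rv, njpat_fw):
    """Length of the longest 5' prefix of njpat_fw that occurs in the reverse
    complement of tag_rv, i.e. the stretch both junction primers share."""
    target = reverse_complement(tag_rv)
    fw = njpat_fw.upper()
    k = 0
    while k < len(fw) and fw[: k + 1] in target:
        k += 1
    return k


primer_pairs = {
    "SUMO": ("GATCGACATGCCGATCTGCTCGCG", "GATCGGCATGTCGATCATCATCGCC"),    # SEQ ID NO: 16, 17
    "NusA": ("TGATCGACATCGCCTCGTCGCCGAAC", "CGAGGCGATGTCGATCATCATCGCC"),  # SEQ ID NO: 20, 21
}
for tag, (rv, fw) in primer_pairs.items():
    print(tag, junction_overlap(rv, fw))
# prints: SUMO 16, then NusA 17
```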
[0157] The plasmids were transferred from E. coli S17-1 to R. sphaeroides Rs265-9c by conjugation using standard procedures (U.S. Pat. No. 9,260,709 B2).
Example 6: Growth Conditions Shake Flask Experiments
[0158] Seed cultures were grown in 100 ml shake flasks without baffles containing 20 ml RS102 medium (U.S. Pat. No. 9,260,709 B2) with 100 mg/L neomycin, inoculated with a loop of glycerol stock. The flasks were incubated for 72 hours at 30° C. in a shaking incubator with an orbit of 50 mm at 110 rpm.
[0159] At the end of the 72 hours, the OD600 of the culture was measured to calculate the exact volume of culture to be transferred to the larger flasks.
[0160] Shake flask experiments were performed in 300 ml shake flasks with 2 bottom baffles. Twenty ml of RS102 medium and neomycin to a final concentration of 100 mg/L were added to the flask together with 2 ml of sterile n-dodecane. The volume of the inoculum was adjusted to obtain a final OD600 value of 0.05 in 20 ml medium.
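The inoculum volume follows from C1V1 = C2V2, that is V_inoculum = OD_target × V_final / OD_seed. A sketch assuming OD600 scales linearly with cell density over this range and that the 20 ml final volume includes the inoculum (the 2 ml dodecane overlay is left out):

```python
def inoculum_volume_ml(seed_od600, target_od600=0.05, final_volume_ml=20.0):
    """Volume of seed culture (ml) that gives target_od600 in final_volume_ml."""
    if seed_od600 <= target_od600:
        raise ValueError("seed culture must be denser than the target OD600")
    return target_od600 * final_volume_ml / seed_od600


# Example with a hypothetical seed-culture OD600 of 4.0.
print(f"{inoculum_volume_ml(4.0):.2f} ml")  # prints: 0.25 ml
```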
[0161] The flasks were kept for 72 hours at 30° C. in a shaking incubator with an orbit of 50 mm at 110 rpm.
Example 7: Sample Preparation for Analysis of Isoprenoid Content in Organic Phase
[0162] Cultures were collected 72 hours after inoculation in pre-weighed 50 ml PP tubes, which were then centrifuged at 4500×g for 20 minutes. The n-dodecane layer was transferred to a microcentrifuge tube for later GC analysis.
[0163] Ten microliters of ethyl laurate were weighed into a 10-ml glass vial, to which 800 μl of the isolated dodecane solution was added and weighed. Subsequently, 8 ml of acetone was added to the vial to dilute the dodecane for more accurate GC analysis. Approximately 1.5 ml of the terpene-containing dodecane-in-acetone solution was transferred to a chromatography vial.
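Because both the ethyl laurate internal standard and the dodecane phase are weighed into the same vial, terpene content can be expressed per gram of dodecane from the peak-area ratio; the acetone dilution cancels out. The text does not give this calculation. The sketch assumes a relative response factor (RRF) of each analyte versus ethyl laurate determined separately by calibration, with 1.0 used only as a placeholder, and the peak areas and weights are made up.

```python
def analyte_mg_per_g_dodecane(area_analyte, area_istd, istd_mass_mg, dodecane_mass_g, rrf=1.0):
    """Internal-standard quantification.

    rrf = (area_analyte / mass_analyte) / (area_istd / mass_istd), obtained from
    calibration (1.0 is a placeholder). Returns mg analyte per g dodecane phase.
    """
    analyte_mass_mg = (area_analyte / area_istd) * istd_mass_mg / rrf
    return analyte_mass_mg / dodecane_mass_g


# Hypothetical peak areas and weights, for illustration only.
print(round(analyte_mg_per_g_dodecane(5.0e5, 1.0e6, 8.0, 0.60), 2))  # prints: 6.67
```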
Example 8: Gas Chromatography Flame Ionization Detector (GC-FID)
[0164] Gas chromatography was performed on a Shimadzu GC2010 Plus equipped with a Restek Rtx-5Sil MS capillary column (30 m × 0.25 mm, 0.5 μm). The injector and FID detector temperatures were set to 280° C. and 300° C., respectively. Gas flow through the column was set at 40 mL/min. The initial oven temperature was 160° C., increased to 180° C. at a rate of 2° C./min, further increased to 300° C. at a rate of 50° C./min, and held at that temperature for 3 min. The injected sample volume was 1 μL with a 1:50 split ratio, and the nitrogen makeup flow was 30 ml/min.
Example 9: Gas Chromatography Quadrupole Time-of-Flight (GC-QTOF) Analysis of Terpenes Produced by NjPAT
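The run time of an oven program is the sum of its holds and ramp durations, each ramp lasting (end − start)/rate. For the FID program above, assuming no initial hold since none is stated, that is 20/2 + 120/50 + 3 = 15.4 min. A small helper that also covers the GC-MS programs of Examples 4 and 9:

```python
def oven_program_minutes(start_c, initial_hold_min, ramps):
    """Total run time of a GC oven program.

    ramps: sequence of (rate in °C/min, target in °C, hold at target in min).
    """
    total = initial_hold_min
    temperature = start_c
    for rate, target, hold in ramps:
        total += (target - temperature) / rate + hold
        temperature = target
    return total


fid = oven_program_minutes(160, 0, [(2, 180, 0), (50, 300, 3)])  # Example 8, no initial hold assumed
gcms = oven_program_minutes(45, 2, [(10, 300, 0)])               # Example 4
qtof = oven_program_minutes(40, 2, [(10, 280, 0)])               # Example 9
print(f"GC-FID {fid:.1f} min, GC-MS {gcms:.1f} min, GC-QTOF {qtof:.1f} min")
# prints: GC-FID 15.4 min, GC-MS 27.5 min, GC-QTOF 26.0 min
```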
[0165] A stock solution of the dodecane extract from R. sphaeroides cultures was prepared at 1 mg/ml in hexane and then diluted to 20 μg/ml in methanol.
[0166] Two μl of the dilution were transferred to a Tenax tube and dry-purged for 2 h with a 200 ml/minute stream of helium to remove solvent. The Tenax tube was desorbed using a Markes thermal desorber (Unity TD100) for 5 min at 240° C. onto a cold trap with multibed packing (0° C.). Trapped material was injected by heating at 40° C./sec to 260° C., with a split flow of 9 ml/min and a column flow of 1.2 ml/min helium. Injected material was analyzed on a DB-5MS column (Agilent, 30 m, 0.25 mm id, 1 μm df) in an Agilent 7890B GC.
[0167] The temperature program was as follows: 2 min 40° C., then ramp at 10° C./min to 280° C. Compounds were detected using an Agilent 7200 QTOF MS with a mass range of 50-350 amu.
[0168] MassHunter (Agilent, version B.07.00) was used for data analysis. Deconvoluted spectra were compared with the NIST library (NIST version 2.2, 2014).
Example 10: Analysis of Terpene Production by R. sphaeroides Strains
[0169] The Rhodobacter strain harbouring the plasmid p-m-SPppa-MBP-NjPAT-mpmii alt showed the highest production of terpene species, among which patchoulol, pogostol, and surprisingly elemol (
TABLE-US-00001 SEQUENCES SEQ ID NO: 1 27692-fw Nucleotide sequence atatgagctcaATGTCAATTATTATTGCAACAAACAGTACTGAGCATCC SEQ ID NO: 2 27692-re Nucleotide sequence atatgcggccgcTTATATGGATACGGGATCTACGAACAACGATATGATGTG SEQ ID NO: 3 NjPAT nucleotide sequence ATGTCAATTATTATTGCAACAAACAGTACTGAGCATCCAATTTTTCGTCCATTAGCAAA TTTTCCACCAAGTTTATGGGGCAATCTTTTCACTTCATTCTCCATGGATAATCAGGCTA GGGAAATATATGCTAAAGAACATGAAGGTTTAAAAGAAAAAGTGAGAATGATGTTTTTA GATACAACAAATTACAAAATTTCAGAGAAAATCAATTTCATAAACACAGTGGAAAGATT AGGTGTATCATATCATTTTGAGAAAGAGATTGAAGAACTACTTCATCAAATGTTTGATG CTCATTCTAAACACCTAGATGATATTCAAGAATTTGATTTGTTCACTTTGGGAATTTACT TCAGGATTCTAAGGCAACATGGTTATAAAATCTCTTGTGATGTTTTCAATAAGTTGAAA GATAGCAATGGCGAATTCAAGGACGAACTTAAAGATGATGTGAATGGTATGCTAAGTT TCTATGAAGCAACACATGTAAGAACACATGGAGAAAATATTTTAGATGAAGCTCTCATT TATACAAAAGCTCAACTTGAATCCATGGCCGCTGCAAGTTTAAGCCCATTTCTCGCGAA CCAAGTTAAGCATGCTTTGATGCAAGCTCTCCACAAAGGGATCCCAAGAATCGAAGCA CGTAACTATATCTCTGTTTACGAAGAAGATCCTAACAAAAATGATTTGTTATTGAGGTT CTCAAAGATAGATTTCAATCTAGTACAAATGATTCACAAGCAAGAATTGTGCGATACCT TTAGgTGGTGGAAAGATTTGGAGTTCGAATCGAAACTATCTTTTGCAaGGAATAGAGTG GTGGAAGCCTACTTATGGACTCTTAGCGCGTACTACGAACCAAAATACTCTTCCGCTCG GATTATATTAGTCAAACTAATGGTTATAATATCTGTTACGGATGACACATATGATGCAT ATGGTACATTAGATGAACTTCAACTTTTTACAGATGCAATACAAAGGTTGGATATGAGT TCTATCAATCAACTTCCAGATTACATGAAGACCATCTATAAAGCTCTCCTAGATCTTTTT GACGAAATAGAAGATCGATTATCGAAGCATGAAACTGATCATTCTTACCGCGTTGCTTA TGCGAAATATGTGTATAAAGAGATCGTTAGGTGCTACGATATGGAGTACAAATGGTTC AACAAAAATTACGTGCCGGCATTTGAAGAATATATGCAGAAAGCGTTAGTCACATCAG GTAACCGTTTGCTCATAACGTTTTCCTTTCTGGGAATGGACGAAGTCGCAACTATTCAA GCGTTCGAGTGGGTAAAAAGTAATGCCAAAATGATAGTCTCTTCCAATAAAGTATTACG ACTTATTGATGACATAATGAGTCACGAGGAAGAGGATGAAAGGGGACATGTTGCAACA GGGATTGAATGCTTTGTAAAAGAACATGGACTAACTAGGGAAGAGGTTATCGTTGAAT TTCATAAGAGGATTGATGATGCTTGGAAGGATATAAATGAGGAATTTATAACGCCAAAT AATTTACCGATTGAGATACTTACGCGTGTTCTAAACCTTACAAGAATTGGAGATGTTGT TTACAAGTATGATGACGGGTATACTCATCCGGAGAAAGCGTTGAAAGATCACATCATAT CGTTGTTCGTAGATCCCGTATCCATATAA SEQ ID NO: 4 NjPAT Aminoacid sequence 
MSIIIATNSTEHPIFRPLANFPPSLWGNLFTSFSMDNQAREIYAKEHEGLKEKVRMMFLD TTNYKISEKINFINTVERLGVSYHFEKEIEELLHQMFDAHSKHLDDIQEFDLFTLGIYFRI LRQHGYKISCDVFNKLKDSNGEFKDELKDDVNGMLSFYEATHVRTHGENILDEALIYT KAQLESMAAASLSPFLANQVKHALMQALHKGIPRIEARNYISVYEEDPNKNDLLLRFSKI DFNLVQMIHKQELCDTFRWWKDLEFESKLSFARNRVVEAYLWTLSAYYEPKYSSARIIL VKLMVIISVTDDTYDAYGTLDELQLFTDAIQRLDMSSINQLPDYMKTIYKALLDLFDEIE DRLSKHETDHSYRVAYAKYVYKEIVRCYDMEYKWFNKNYVPAFEEYMQKALVTSGNRL LITFSFLGMDEVATIQAFEWVKSNAKMIVSSNKVLRLIDDIMSHEEEDERGHVATGIECF VKEHGLTREEVIVEFHKRIDDAWKDINEEFITPNNLPIEILTRVLNLTRIGDVVYKYDDG YTHPEKALKDHIISLFVDPVSI SEQ ID NO: 5 NjPAT codon optimized Nucleotide sequence ATGTCGATCATCATCGCCACCAACAGCACCGAGCATCCCATCTTCCGCCCTCGCCA ACTTCCCGCCGTCGCTCTGGGGCAACCTGTTCACCTCGTTCAGCATGGACAACCAGGC GCGCGAGATCTACGCCAAGGAGCACGAGGGCCTCAAGGAGAAGGTCCGGATGATGTT CCTGGACACCACGAACTACAAGATCTCGGAGAAGATCAACTTCATCAACACCGTCGAG CGCCTGGGCGTGAGCTATCACTTCGAGAAGGAGATCGAGGAGCTGCTCCATCAGATGT TCGACGCCCACTCGAAGCATCTGGACGACATCCAGGAGTTCGACCTCTTCACGCTGGG CATCTACTTCCGCATCCTGCGGCAGCATGGCTATAAGATCTCGTGCGACGTCTTCAACA AGCTGAAGGACAGCAACGGCGAGTTCAAGGACGAGCTCAAGGACGACGTCAACGGCA TGCTGTCGTTCTATGAGGCCACCCATGTGCGCACCCATGGCGAGAACATCCTCGACGA GGCGCTGATCTACACGAAGGCCCAGCTGGAGTCGATGGCGGCCGCCTCGCTCAGCCC GTTCCTGGCCAACCAGGTGAAGCACGCGCTCATGCAGGCCCTGCATAAGGGCATCCCG CGCATCGAGGCGCGGAACTACATCTCGGTCTATGAGGAAGACCCGAACAAGAACGACC TGCTCCTGCGCTTCAGCAAGATCGACTTCAACCTGGTGCAGATGATCCACAAGGAGGA GCTGTGCGACACCTTCCGGTGGTGGAAGGACCTGGAGTTCGAGTCGAAGCTGTCGTTC GCCCGCAACCGGGTGGTGGAGGCCTACCTCTGGACGCTGTCGGCGTACTATGAGCCGA AGTATTCGAGCGCCCGCATCATCCTCGTGAAGCTGATGGTCATCATCTCGGTGACCGA CGACACGTACGACGCCTATGGCACCCTGGACGAGCTCCAGCTGTTCACGGACGCGATC CAGCGCCTCGACATGTCGAGCATCAACCAGCTGCCCGACTACATGAAGACCATCTATA AGGCGCTCCTGGACCTCTTCGACGAGATCGAGGACCGCCTGTCGAAGCACGAGACGGA CCATAGCTACCGGGTGGCGTATGCCAAGTACGTCTATAAGGAGATCGTGCGCTGCTAC GACATGGAGTATAAGTGGTTCAACAAGAACTACGTCCCCGCCTTCGAGGAGTATATGC AGAAGGCCCTGGTGACCTCGGGCAACCGGCTCCTGATCACGTTCAGCTTCCTGGGCAT GGACGAGGTCGCGACCATCCAGGCCTTCGAGTGGGTGAAGTCGAACGCCAAGATGATC 
GTGTCGTCGAACAAGGTGCTCCGCCTGATCGACGACATCATGTCGCACGAGGAAGAGG ACGAGCGCGGCCATGTGGCCACGGGCATCGAGTGCTTCGTGAAGGAGCACGGCCTGA CCCGCGAGGAAGTGATCGTCGAGTTCCATAAGCGGATCGACGACGCGTGGAAGGACAT CAACGAGGAGTTCATCACCCCGAACAACCTGCCGATCGAGATCCTGACCCGCGTGCTC AACCTGACCCGGATCGGCGACGTGGTCTACAAGTATGACGACGGCTACACGCACCCCG AGAAGGCCCTCAAGGACCATATCATCAGCCTGTTCGTGGACCCCGTCAGCATCTGA SEQ ID NO: 6 Pppa_Fw Nucleotide sequence actggcctcagaattcAAAtttatttgctttgtgagcggataac SEQ ID NO: 7 MBP_NjPAT_Rv Nucleotide sequence GGCGATGATGATCGACATGATCTTG SEQ ID NO: 8 MBP_NjPAT_Fw Nucleotide sequence CAAGATCATGTCGATCATCATCGC SEQ ID NO: 9 NjPAT_Rv Nucleotide sequence tttatgatttggatcCTCAGATGCTGACGGGGT SEQ ID NO: 10 SPppa-MBP-NjPAT Nucleotide sequence AAATTTATTTGCTTTGTGAGCGGATAACAATTATTAGATTCACCGGCGAGCCAGCAGGA ATTTCACTCTAGATGACAGGAGGGACATATGAAGATCGAGGAAGGCAAGCTCGTGATC TGGATCAACGGCGACAAGGGCTACAACGGCCTGGCCGAGGTGGGCAAGAAGTTCGAG AAGGACACCGGCATCAAGGTGACGGTGGAGCACCCGGACAAGCTCGAGGAGAAGTTC CCGCAGGTGGCGGCCACGGGCGACGGCCCGGACATCATCTTCTGGGCCCATGACCGC TTCGGCGGCTACGCCCAGTCGGGCCTGCTGGCCGAGATCACCCCGGACAAGGCGTTCC AGGACAAGCTCTATCCCTTCACGTGGGACGCCGTGCGCTACAACGGCAAGCTGATCGC GTATCCCATCGCGGTGGAGGCCCTGTCGCTCATCTATAACAAGGACCTGCTCCCGAAC CCGCCCAAGACCTGGGAGGAGATCCCCGCCCTCGACAAGGAGCTGAAGGCCAAGGGC AAGTCGGCGCTCATGTTCAACCTGCAGGAGCCGTACTTCACCTGGCCCCTGATCGCGG CCGACGGCGGCTACGCGTTCAAGTATGAGAACGGCAAGTATGACATCAAGGACGTGGG CGTGGACAACGCGGGCGCCAAGGCCGGCCTGACCTTCCTCGTGGACCTGATCAAGAAC AAGCACATGAACGCCGACACGGACTACTCGATCGCGGAGGCCGCGTTCAACAAGGGC GAGACCGCCATGACGATCAACGGCCCGTGGGCGTGGTCGAACATCGACACCTCGAAG GTGAACTATGGCGTGACCGTGCTCCCCACGTTCAAGGGCCAGCCCTCGAAGCCCTTCG TGGGCGTGCTGTCGGCGGGCATCAACGCCGCGTCGCCGAACAAGGAGCTCGCGAAGG AGTTCCTGGAGAACTACCTGCTCACCGACGAGGGCCTGGAGGCCGTGAACAAGGACAA GCCCCTGGGCGCCGTGGCCCTGAAGTCGTATGAGGAAGAGCTGGTGAAGGACCCGCG CATCGCGGCCACCATGGAGAACGCGCAGAAGGGCGAGATCATGCCGAACATCCCCCA GATGTCGGCCTTCTGGTATGCGGTGCGCACCGCCGTGATCAACGCGGCCTCGGGCCGC CAGACCGTGGACGAGGCCCTCAAGGACGCCCAGACCGGCGACGACGACGACAAGATC ATGTCGATCATCATCGCCACCAACAGCACCGAGCATCCCATCTTCCGCCCGCTCGCCA 
ACTTCCCGCCGTCGCTCTGGGGCAACCTGTTCACCTCGTTCAGCATGGACAACCAGGC GCGCGAGATCTACGCCAAGGAGCACGAGGGCCTCAAGGAGAAGGTCCGGATGATGTT CCTGGACACCACGAACTACAAGATCTCGGAGAAGATCAACTTCATCAACACCGTCGAG CGCCTGGGCGTGAGCTATCACTTCGAGAAGGAGATCGAGGAGCTGCTCCATCAGATGT TCGACGCCCACTCGAAGCATCTGGACGACATCCAGGAGTTCGACCTCTTCACGCTGGG CATCTACTTCCGCATCCTGCGGCAGCATGGCTATAAGATCTCGTGCGACGTCTTCAACA AGCTGAAGGACAGCAACGGCGAGTTCAAGGACGAGCTCAAGGACGACGTCAACGGCA TGCTGTCGTTCTATGAGGCCACCCATGTGCGCACCCATGGCGAGAACATCCTCGACGA GGCGCTGATCTACACGAAGGCCCAGCTGGAGTCGATGGCGGCCGCCTCGCTCAGCCC GTTCCTGGCCAACCAGGTGAAGCACGCGCTCATGCAGGCCCTGCATAAGGGCATCCCG CGCATCGAGGCGCGGAACTACATCTCGGTCTATGAGGAAGACCCGAACAAGAACGACC TGCTCCTGCGCTTCAGCAAGATCGACTTCAACCTGGTGCAGATGATCCACAAGCAGGA GCTGTGCGACACCTTCCGGTGGTGGAAGGACCTGGAGTTCGAGTCGAAGCTGTCGTTC GCCCGCAACCGGGTGGTGGAGGCCTACCTCTGGACGCTGTCGGCGTACTATGAGCCGA AGTATTCGAGCGCCCGCATCATCCTCGTGAAGCTGATGGTCATCATCTCGGTGACCGA CGACACGTACGACGCCTATGGCACCCTGGACGAGCTCCAGCTGTTCACGGACGCGATC CAGCGCCTCGACATGTCGAGCATCAACCAGCTGCCCGACTACATGAAGACCATCTATA AGGCGCTCCTGGACCTCTTCGACGAGATCGAGGACCGCCTGTCGAAGCACGAGACGGA CCATAGCTACCGGGTGGCGTATGCCAAGTACGTCTATAAGGAGATCGTGCGCTGCTAC GACATGGAGTATAAGTGGTTCAACAAGAACTACGTCCCCGCCTTCGAGGAGTATATGC AGAAGGCCCTGGTGACCTCGGGCAACCGGCTCCTGATCACGTTCAGCTTCCTGGGCAT GGACGAGGTCGCGACCATCCAGGCCTTCGAGTGGGTGAAGTCGAACGCCAAGATGATC GTGTCGTCGAACAAGGTGCTCCGCCTGATCGACGACATCATGTCGCACGAGGAAGAGG ACGAGCGCGGCCATGTGGCCACGGGCATCGAGTGCTTCGTGAAGGAGCACGGCCTGA CCCGCGAGGAAGTGATCGTCGAGTTCCATAAGCGGATCGACGACGCGTGGAAGGACAT CAACGAGGAGTTCATCACCCCGAACAACCTGCCGATCGAGATCCTGACCCGCGTGCTC AACCTGACCCGGATCGGCGACGTGGTCTACAAGTATGACGACGGCTACACGCACCCCG AGAAGGCCCTCAAGGACCATATCATCAGCCTGTTCGTGGACCCCGTCAGCATCTGA Sequence in Italics is the SPppa promoter; the underlined sequence is the codon optimized MBP and the sequence in normal font is the codon optimized NjPAT. 
SEQ ID NO: 11 MBP-NjPAT Aminoacid sequence MKIEEGKLVIWINGDKGYNGLAEVGKKFEKDTGIKVTVEHPDKLEEKFPQVAATGDGP DIIFWAHDRFGGYAQSGLLAEITPDKAFQDKLYPFTWDAVRYNGKLIAYPIAVEALSLIY NKDLLPNPPKTWEEIPALDKELKAKGKSALMFNLQEPYFTWPLIAADGGYAFKYENGK YDIKDVGVDNAGAKAGLTFLVDLIKNKHMNADTDYSIAEAAFNKGETAMTINGPWAWS NIDTSKVNYGVTVLPTFKGQPSKPFVGVLSAGINAASPNKELAKEFLENYLLTDEGLEAV NKDKPLGAVALKSYEEELVKDPRIAATMENAQKGEIMPNIPQMSAFWYAVRTAVINAAS GRQTVDEALKDAQTGDDDDKIMSIIIATNSTEHPIFRPLANFPPSLWGNLFTSFSMDNQA REIYAKEHEGLKEKVRMMFLDTTNYKISEKINFINTVERLGVSYHFEKEIEELLHQMFD AHSKHLDDIQEFDLFTLGIYFRILRQHGYKISCDVFNKLKDSNGEFKDELKDDVNGMLS FYEATHVRTHGENILDEALIYTKAQLESMAAASLSPFLANQVKHALMQALHKGIPRIEA RNYISVYEEDPNKNDLLLRFSKIDFNLVQMIHKQELCDTFRWWKDLEFESKLSFARNRV VEAYLWTLSAYYEPKYSSARIILVKLMVIISVTDDTYDAYGTLDELQLFTDAIQRLDMSSI NQLPDYMKTIYKALLDLFDEIEDRLSKHETDHSYRVAYAKYVYKEIVRCYDMEYKWFN KNYVPAFEEYMQKAINTSIGNRLLITFSFLGMDEVATIQAFEWVKSNAKMIVSSINKVLRL IDDIMSHEEEDERGHVATGIECFVKEHGLTREEVIVEFHKRIDDAWKDINEEFITPNNLP IEILTRVLNLTRIGDVVYKYDDGYTHPEKALKDHIISLFVDPVSI The underlined sequence is the MBP tag SEQ ID NO: 12 Fh8_NjPAT_Rv Nucleotide sequence GATCGACATCGACGAGAGGATCGAG SEQ ID NO: 13 Fh8_NjPAT_Fw Nucleotide sequence CTCGTCGATGTCGATCATCATCGCC SEQ ID NO: 14 SPppa-Fh8-NjPAT Nucleotide sequence AAATTTATTTGCTTTGTGAGCGGATAACAATTATTAGATTCACCGGCGAGCCAGCAGGA ATTTCACTCTAGATGACAGGAGGGACATATGCCCTCGGTGCAAGAGGTGGAGAAGCTG CTGCATGTGCTGGACCGGAACGGCGACGGCAAGGTGTCGGCGGAGGAGCTGAAGGCG TTCGCCGACGACTCGAAGTGCCCGCTGGACTCGAACAAGATCAAGGCGTTCATCAAGG AGCATGACAAGAACAAGGACGGCAAGCTGGACCTCAAGGAGCTGGTCTCGATCCTCTC GTCGATGTCGATCATCATCGCCACCAACAGCACCGAGCATCCCATCTTCCGCCCGCTC GCCAACTTCCCGCCGTCGCTCTGGGGCAACCTGTTCACCTCGTTCAGCATGGACAACC AGGCGCGCGAGATCTACGCCAAGGAGCACGAGGGCCTCAAGGAGAAGGTCCGGATGA TGTTCCTGGACACCACGAACTACAAGATCTCGGAGAAGATCAACTTCATCAACACCGTC GAGCGCCTGGGCGTGAGCTATCACTTCGAGAAGGAGATCGAGGAGCTGCTCCATCAGA TGTTCGACGCCCACTCGAAGCATCTGGACGACATCCAGGAGTTCGACCTCTTCACGCT GGGCATCTACTTCCGCATCCTGCGGCAGCATGGCTATAAGATCTCGTGCGACGTCTTC AACAAGCTGAAGGACAGCAACGGCGAGTTCAAGGACGAGCTCAAGGACGACGTCAAC 
GGCATGCTGTCGTTCTATGAGGCCACCCATGTGCGCACCCATGGCGAGAACATCCTCG ACGAGGCGCTGATCTACACGAAGGCCCAGCTGGAGTCGATGGCGGCCGCCTCGCTCA GCCCGTTCCTGGCCAACCAGGTGAAGCACGCGCTCATGCAGGCCCTGCATAAGGGCAT CCCGCGCATCGAGGCGCGGAACTACATCTCGGTCTATGAGGAAGACCCGAACAAGAAC GACCTGCTCCTGCGCTTCAGCAAGATCGACTTCAACCTGGTGCAGATGATCCACAAGC AGGAGCTGTGCGACACCTTCCGGTGGTGGAAGGACCTGGAGTTCGAGTCGAAGCTGTC GTTCGCCCGCAACCGGGTGGTGGAGGCCTACCTCTGGACGCTGTCGGCGTACTATGAG CCGAAGTATTCGAGCGCCCGCATCATCCTCGTGAAGCTGATGGTCATCATCTCGGTGA CCGACGACACGTACGACGCCTATGGCACCCTGGACGAGCTCCAGCTGTTCACGGACGC GATCCAGCGCCTCGACATGTCGAGCATCAACCAGCTGCCCGACTACATGAAGACCATC TATAAGGCGCTCCTGGACCTCTTCGACGAGATCGAGGACCGCCTGTCGAAGCACGAGA CGGACCATAGCTACCGGGTGGCGTATGCCAAGTACGTCTATAAGGAGATCGTGCGCTG CTACGACATGGAGTATAAGTGGTTCAACAAGAACTACGTCCCCGCCTTCGAGGAGTAT ATGCAGAAGGCCCTGGTGACCTCGGGCAACCGGCTCCTGATCACGTTCAGCTTCCTGG GCATGGACGAGGTCGCGACCATCCAGGCCTTCGAGTGGGTGAAGTCGAACGCCAAGAT GATCGTGTCGTCGAACAAGGTGCTCCGCCTGATCGACGACATCATGTCGCACGAGGAA GAGGACGAGCGCGGCCATGTGGCCACGGGCATCGAGTGCTTCGTGAAGGAGCACGGC CTGACCCGCGAGGAAGTGATCGTCGAGTTCCATAAGCGGATCGACGACGCGTGGAAG GACATCAACGAGGAGTTCATCACCCCAACAACCTGCCGATCGAGATCCTGACCCGCG TGCTCAACCTGACCCGGATCGGCGACGTGGTCTACAAGTATGACGACGGCTACACGCA CCCCGAGAAGGCCCTCAAGGACCATATCATCAGCCTGTTCGTGGACCCCGTCAGCATC TGA Sequence in Italics is the SPppa promoter; the underlined sequence is the codon optimized Fh8 and the sequence in normal font is the codon optimized NjPAT. 
SEQ ID NO: 15 Fh8-NjPAT Aminoacid sequence MPSVQEVEKLLHVLDRNGDGKVSAEELKAFADDSKCPLDSNKIKAFIKEHDKNKDGKL DLKELVSILSSMSIIIATNSTEHPIFRPLANFPPSLWGNLFTSFSMDNQAREIYAKEHEGL KEKVRMMFLDTTNYKISEKINFINTVERLGVSYHFEKEIEELLHQMFDAHSKHLDDIQE FDLFTLGIYFRILRQHGYKISCDVFNKLKDSNGEFKDELKDDVNGMLSFYEATHVRTHG ENILDEALIYTKAQLESMAAASLSPFLANQVKHALMQALHKGIPRIEARNYISVYEEDPN KNDLLLRFSKIDFNLVQMIHKQELCDTFRWWKDLEFESKLSFARNRVVEAYLWTLSAY YEPKYSSARIILVKLMVIISVTDDTYDAYGTLDELQLFTDAIQRLDMSSINQLPDYMKTIY KALLDLFDEIEDRLSKHETDHSYRVAYAKYVYKEIVRCYDMEYKWFNKNYVPAFEEYM QKALVTSGNRLLITFSFLGMDEVATIQAFEWVKSNAKMIVSSNKVLRLIDDIMSHEEEDE RGHVATGIECFVKEHGLTREEVIVEFHKRIDDAWKDINEEFITPNNLPIEILTRVLNLTRI GDVVYKYDDGYTHPEKALKDHIISLFVDPVSI The underlined sequence is the Fh8 tag SEQ ID NO: 16 SUMO_NjPAT_Rv Nucleotide sequence GATCGACATGCCGATCTGCTCGCG SEQ ID NO: 17 SUMO_NjPAT_Fw Nucleotide sequence GATCGGCATGTCGATCATCATCGCC SEQ ID NO: 18 SPppa-SUMO-NjPAT Nucleotide sequence AAATTTATTTGCTTTGTGAGCGGATAACAATTATTAGATTCACCGGCGAGCCAGCAGGA ATTTCACTCTAGATGACAGGAGGGACATATGTCGGACAGCGAGGTGAACCAGGAAGCC AAGCCCGAGGTGAAGCCCGAGGTGAAGCCGGAGACCCACATCAACCTGAAGGTGTCG GACGGCTCGTCGGAGATCTTCTTCAAGATCAAGAAGACCACGCCCCTGCGCCGCCTCA TGGAGGCCTTCGCCAAGCGCCAGGGCAAGGAGATGGACTCGCTGCGCTTCCTCTACGA CGGCATCCGCATCCAGGCGGACCAGACGCCGGAGGACCTCGACATGGAGGACAACGA CATCATCGAGGCGCATCGCGAGCAGATCGGCATGTCGATCATCATCGCCACCAACAGC ACCGAGCATCCCATCTTCCGCCCGCTCGCCAACTTCCCGCCGTCGCTCTGGGGCAACC TGTTCACCTCGTTCAGCATGGACAACCAGGCGCGCGAGATCTACGCCAAGGAGCACGA GGGCCTCAAGGAGAAGGTCCGGATGATGTTCCTGGACACCACGAACTACAAGATCTCG GAGAAGATCAACTTCATCAACACCGTCGAGCGCCTGGGCGTGAGCTATCACTTCGAGA AGGAGATCGAGGAGCTGCTCCATCAGATGTTCGACGCCCACTCGAAGCATCTGGACGA CATCCAGGAGTTCGACCTCTTCACGCTGGGCATCTACTTCCGCATCCTGCGGCAGCAT GGCTATAAGATCTCGTGCGACGTCTTCAACAAGCTGAAGGACAGCAACGGCGAGTTCA AGGACGAGCTCAAGGACGACGTCAACGGCATGCTGTCGTTCTATGAGGCCACCCATGT GCGCACCCATGGCGAGAACATCCTCGACGAGGCGCTGATCTACACGAAGGCCCAGCTG GAGTCGATGGCGGCCGCCTCGCTCAGCCCGTTCCTGGCCAACCAGGTGAAGCACGCGC TCATGCAGGCCCTGCATAAGGGCATCCCGCGCATCGAGGCGCGGAACTACATCTCGGT 
CTATGAGGAAGACCCGAACAAGAACGACCTGCTCCTGCGCTTCAGCAAGATCGACTTC AACCTGGTGCAGATGATCCACAAGCAGGAGCTGTGCGACACCTTCCGGTGGTGGAAGG ACCTGGAGTTCGAGTCGAAGCTGTCGTTCGCCCGCAACCGGGTGGTGGAGGCCTACCT CTGGACGCTGTCGGCGTACTATGAGCCGAAGTATTCGAGCGCCCGCATCATCCTCGTG AAGCTGATGGTCATCATCTCGGTGACCGACGACACGTACGACGCCTATGGCACCCTGG ACGAGCTCCAGCTGTTCACGGACGCGATCCAGCGCCTCGACATGTCGAGCATCAACCA GCTGCCCGACTACATGAAGACCATCTATAAGGCGCTCCTGGACCTCTTCGACGAGATC GAGGACCGCCTGTCGAAGCACGAGACGGACCATAGCTACCGGGTGGCGTATGCCAAG TACGTCTATAAGGAGATCGTGCGCTGCTACGACATGGAGTATAAGTGGTTCAACAAGA ACTACGTCCCCGCCTTCGAGGAGTATATGCAGAAGGCCCTGGTGACCTCGGGCAACCG GCTCCTGATCACGTTCAGCTTCCTGGGCATGGACGAGGTCGCGACCATCCAGGCCTTC GAGTGGGTGAAGTCGAACGCCAAGATGATCGTGTCGTCGAACAAGGTGCTCCGCCTGA TCGACGACATCATGTCGCACGAGGAAGAGGACGAGCGCGGCCATGTGGCCACGGGCA TCGAGTGCTTCGTGAAGGAGCACGGCCTGACCCGCGAGGAAGTGATCGTCGAGTTCCA TAAGCGGATCGACGACGCGTGGAAGGACATCAACGAGGAGTTCATCACCCCGAACAAC CTGCCGATCGAGATCCTGACCCGCGTGCTCAACCTGACCCGGATCGGCGACGTGGTCT ACAAGTATGACGACGGCTACACGCACCCCGAGAAGGCCCTCAAGGACCATATCATCAG CCTGTTCGTGGACCCCGTCAGCATCTGA Sequence in Italics is the SPppa promoter; the underlined sequence is the codon optimized SUMO and the sequence in normal font is the codon optimized NjPAT. 
SEQ ID NO: 19 SUMO-NjPAT Aminoacid sequence MSDSEVNQEAKPEVKPEVKPETHINLKVSDGSSEIFFKIKKTTPLRRLMEAFAKRQGKE MDSLRFLYDGIRIQADQTPEDLDMEDNDIIEAHREQIGMSIIIIATNSTEHPIFRPLANFPP SLWGNLFTSFSMDNQAREIYAKEHEGLKEKVRMMFLDTTNYKISEKINFINTVERLGVS YHFEKEIEELLHQMFDAHSKHLDDIQEFDLFTLGIYFRILRQHGYKISCDVFNKLKDSN GEFKDELKDDVNGMLSFYEATHVRTHGENILDEALIYTKAQLESMAAASLSPFLANQVK HALMQALHKGIPRIEARNYISVYEEDPNKNDLLLRFSKIDFNLVQMIHKQELCDTFRWW KDLEFESKLSFARNRVVEAYLWTLSAYYEPKYSSARIILVKLMVIISVTDDTYDAYGTLDE LQLFTDAIQRLDMSSINQLPDYMKTIYKALLDLFDEIEDRLSKHETDHSYRVAYAKYVYK EIVRCYDMEYKWFNKNYVPAFEEYMQKALVTSGNRLLITFSFLGMDEVATIQAFEWVK SNAKMIVSSNKVLRLIDDIMSHEEEDERGHVATGIECFVKEHGLTREEVIVEFHKRIDDA WKDINEEFITPNNLPIEILTRVLNLTRIGDVVYKYDDGYTHPEKALKDHIISLFVDPVSI The underlined sequence is the SUMO tag SEQ ID NO: 20 NusA_NjPAT_Rv Nucleotide sequence TGATCGACATCGCCTCGTCGCCGAAC SEQ ID NO: 21 NusA_NjPAT_Fw Nucleotide sequence CGAGGCGATGTCGATCATCATCGCC SEQ ID NO: 22 SPppa-NusA-NjPAT Nucleotide sequence AAATTTATTTGCTTTGTGAGCGGATAACAATTATTAGATTCACCGGCGAGCCAGCAGGA ATTTCACTCTAGATGACAGGAGGGACAT ATGAACAAGGAGATCCTCGCGGTGGTGGAGGCGGTGTCGAACGAGAAGGCGCTGCCC CGCGAGAAGATCTTCGAGGCCCTGGAGTCGGCCCTGGCCACCGCGACCAAGAAGAAG TACGAGCAGGAGATCGACGTGCGCGTGCAGATCGACCGCAAGTCGGGCGACTTCGAC ACGTTCCGCCGCTGGCTCGTGGTGGACGAGGTGACCCAGCCCACGAAGGAGATCACCC TGGAGGCGGCCCGCTATGAGGACGAGTCGCTGAACCTCGGCGACTATGTGGAGGACC AGATCGAGTCGGTGACCTTCGACCGCATCACCACGCAGACGGCGAAGCAGGTGATCGT GCAGAAGGTGCGCGAGGCCGAGCGCGCCATGGTGGTGGACCAGTTCCGCGAGCACGA GGGCGAGATCATCACCGGCGTGGTGAAGAAGGTGAACCGCGACAACATCTCGCTGGA CCTGGGCAACAACGCGGAGGCCGTGATCCTGCGCGAGGACATGCTCCCGCGCGAGAA CTTCCGCCCGGGCGACCGCGTGCGCGGCGTGCTCTATTCGGTGCGCCCCGAGGCCCGT GGCGCCCAGCTGTTCGTGACCCGCTCGAAGCCGGAGATGCTGATCGAGCTCTTCCGCA TCGAGGTGCCCGAGATCGGCGAGGAAGTGATCGAGATCAAGGCGGCCGCCCGCGACC CGGGCTCGCGCGCGAAGATCGCCGTGAAGACCAACGACAAGCGCATCGACCCCGTGG GCGCCTGCGTGGGCATGCGTGGCGCCCCCGTGCAGGCCGTGTCGACCGAGCTCGGCG GCGAGCGCATCGACATCGTGCTGTGGGACGACAACCCGGCGCAGTTCGTGATCAACGC CATGGCCCCGGCGGACGTGGCCTCGATCGTGGTGGACGAGGACAAGCATACCATGGA
CATCGCCGTGGAGGCGGGCAACCTGGCCCAGGCCATCGGCCGCAACGGCCAGAACGT GCGCCTGGCCTCGCAGCTCTCGGGCTGGGAGCTGAACGTGATGACGGTGGACGACCT GCAGGCCAAGCATCAGGCCGAGGCCCATGCCGCCATCGACACCTTCACGAAGTACCTC GACATCGACGAGGACTTCGCGACCGTGCTCGTGGAGGAAGGCTTCTCGACGCTGGAG GAGCTCGCCTATGTGCCGATGAAGGAGCTGCTCGAGATCGAGGGCCTGGACGAGCCG ACGGTGGAGGCGCTCCGCGAGCGCGCCAAGAACGCCCTGGCCACCATCGCCCAGGCC CAGGAAGAGTCGCTGGGCGACAACAAGCCGGCCGACGACCTGCTCAACCTGGAGGGC GTGGACCGCGACCTGGCCTTCAAGCTCGCCGCCCGCGGCGTGTGCACGCTCGAGGAC CTGGCCGAGCAGGGCATCGACGACCTGGCCGACATCGAGGGCCTCACCGACGAGAAG GCCGGCGCCCTGATCATGGCCGCCCGCAACATCTGCTGGTTCGGCGACGAGGCGATGT CGATCATCATCGCCACCAACAGCACCGAGCATCCCATCTTCCGCCCGCTCGCCAACTTC CCGCCGTCGCTCTGGGGCAACCTGTTCACCTCGTTCAGCATGGACAACCAGGCGCGCG AGATCTACGCCAAGGAGCACGAGGGCCTCAAGGAGAAGGTCCGGATGATGTTCCTGGA CACCACGAACTACAAGATCTCGGAGAAGATCAACTTCATCAACACCGTCGAGCGCCTG GGCGTGAGCTATCACTTCGAGAAGGAGATCGAGGAGCTGCTCCATCAGATGTTCGACG CCCACTCGAAGCATCTGGACGACATCCAGGAGTTCGACCTCTTCACGCTGGGCATCTA CTTCCGCATCCTGCGGCAGCATGGCTATAAGATCTCGTGCGACGTCTTCAACAAGCTG AAGGACAGCAACGGCGAGTTCAAGGACGAGCTCAAGGACGACGTCAACGGCATGCTG TCGTTCTATGAGGCCACCCATGTGCGCACCCATGGCGAGAACATCCTCGACGAGGCGC TGATCTACACGAAGGCCCAGCTGGAGTCGATGGCGGCCGCCTCGCTCAGCCCGTTCCT GGCCAACCAGGTGAAGCACGCGCTCATGCAGGCCCTGCATAAGGGCATCCCGCGCATC GAGGCGCGGAACTACATCTCGGTCTATGAGGAAGACCCGAACAAGAACGACCTGCTCC TGCGCTTCAGCAAGATCGACTTCAACCTGGTGCAGATGATCCACAAGCAGGAGCTGTG CGACACCTTCCGGTGGTGGAAGGACCTGGAGTTCGAGTCGAAGCTGTCGTTCGCCCGC AACCGGGTGGTGGAGGCCTACCTCTGGACGCTGTCGGCGTACTATGAGCCGAAGTATT CGAGCGCCCGCATCATCCTCGTGAAGCTGATGGTCATCATCTCGGTGACCGACGACAC GTACGACGCCTATGGCACCCTGGACGAGCTCCAGCTGTTCACGGACGCGATCCAGCGC CTCGACATGTCGAGCATCAACCAGCTGCCCGACTACATGAAGACCATCTATAAGGCGC TCCTGGACCTCTTCGACGAGATCGAGGACCGCCTGTCGAAGCACGAGACGGACCATAG CTACCGGGTGGCGTATGCCAAGTACGTCTATAAGGAGATCGTGCGCTGCTACGACATG GAGTATAAGTGGTTCAACAAGAACTACGTCCCCGCCTTCGAGGAGTATATGCAGAAGG CCCTGGTGACCTCGGGCAACCGGCTCCTGATCACGTTCAGCTTCCTGGGCATGGACGA GGTCGCGACCATCCAGGCCTTCGAGTGGGTGAAGTCGAACGCCAAGATGATCGTGTCG TCGAACAAGGTGCTCCGCCTGATCGACGACATCATGTCGCACGAGGAAGAGGACGAGC 
GCGGCCATGTGGCCACGGGCATCGAGTGCTTCGTGAAGGAGCACGGCCTGACCCGCG AGGAAGTGATCGTCGAGTTCCATAAGCGGATCGACGACGCGTGGAAGGACATCAACGA GGAGTTCATCACCCCGAACAACCTGCCGATCGAGATCCTGACCCGCGTGCTCAACCTG ACCCGGATCGGCGACGTGGTCTACAAGTATGACGACGGCTACACGCACCCCGAGAAGG CCCTCAAGGACCATATCATCAGCCTGTTCGTGGACCCCGTCAGCATCTGA Sequence in Italics is the SPppa promoter; the underlined sequence is the codon optimized NusA and the sequence in normal font is the codon optimized NjPAT. SEQ ID NO: 23 NusA-NjPAT Aminoacid sequence MNKEILAVVEAVSNEKALPREKIFEALESALATATKKKYEQEIDVRVQIDRKSGDFDTFR RWLVVDEVTQPTKEITLEAARYEDESLNLGDYVEDQIESVTFDRITTQTAKQVIVQKVRE AERAMVVDQFREHEGEIITGVVKKVNRDNISLDLGNNAEAVILREDMLPRENFRPGDRV RGVLYSVRPEARGAQLFVTRSKPEMLIELFRIEVPEIGEEVIEIKAAARDPGSRAKIAVKT NDKRIDPVGACVGMRGARVQAVSTELGGERIDIVLWDDNPAQFVINAMAPADVASIVVD EDKHTMDIAVEAGNLAQAIGRNGQNVRLASQLSGWELNVMTVDDLQAKHQAEAHAAI DTFTKYLDIDEDFATVLVEEGFSTLEELAYVPMKELLEIEGLDEPTVEALRERAKNALAT IAQAQEESLGDNKPADDLLNLEGVDRDLAFKLAARGVCTLEDLAEQGIDDLADIEGLTD EKAGALIMAARNICWFGDEAMSIIIATNSTEHPIFRPLANFPPSLWGNLFTSFSMDNQAR EIYAKEHEGLKEKVRMMFLDTTNYKISEKINFINTVERLGVSYHFEKEIEELLHQMFDA HSKHLDDIQEFDLFTLGIYFRILRQHGYKISCDVFNKLKDSNGEFKDELKDDVNGMLSF YEATHVRTHGENILDEALIYTKAQLESMAAASLSPFLANQVKHALMQALHKGIPRIEAR NYISVYEEDPNKNDLLLRFSKIDFNLVQMIHKQELCDTFRWWKDLEFESKLSFARNRVV EAYLWTLSAYYEPKYSSARIILVKLMVIISVTDDTYDAYGTLDELQLFTDAIQRLDMSSIN QLPDYMKTIYKALLDLFDEIEDRLSKHETDHSYRVAYAKYVYKEIVRCYDMEYKWFNK NYVPAFEEYMQKALVTSGNRLLITFSFLGMDEVATIQAFEWVKSNAKMIVSSNKVLRLI DDIMSHEEEDERGHVATGIECFVKEHGLTREEVIVEFHKRIDDAWKDINEEFITPNNLPI EILTRVLNLTRIGDVVYKYDDGYTHPEKALKDHIISLFVDPVSI The underlined sequence is the NusA tag SEQ ID NO: 24 RsTRX_NjPAT_Rv Nucleotide sequence TGATCGACATGAGCGCCGAGGCGATC SEQ ID NO: 25 RsTRX_NjPAT_Fw Nucleotide sequence ggcgctcATGTCGATCATCATCGCC SEQ ID NO: 26 SPppa-RsTRX-NjPAT Nucleotide sequence AAATTTATTTGCTTTGTGAGCGGATAACAATTATTAGATTCACCGGCGAGCCAGCAGGA ATTTCACTCTAGATGACAGGAGGGACATATGTCCACCGTTCCCGTGACGGACGCCACC TTCGACACCGAGGTGCGCAAGTCCGACGTGCCCGTCGTCGTCGATTTCTGGGCCGAAT 
GGTGCGGCCCCTGCCGGCAGATCGGCCCGGCGCTCGAGGAGCTCTCGAAGGAATATG CCGGCAAGGTGAAGATCGTGAAGGTCAATGTCGACGAGAACCCCGAGAGCCCGGCGA TGCTGGGCGTTCGCGGCATCCCGGCGCTGTTCCTGTTCAAGAACGGTCAGGTCGTGTC GAACAAGGTCGGCGCTGCGCCGAAGGCCGCGCTGGCCACCTGGATCGCCTCGGCGCT CATGTCGATCATCATCGCCACCAACAGCACCGAGCATCCCATCTTCCGCCCGCTCGCC AACTTCCCGCCGTCGCTCTGGGGCAACCTGTTCACCTCGTTCAGCATGGACAACCAGG CGCGCGAGATCTACGCCAAGGAGCACGAGGGCCTCAAGGAGAAGGTCCGGATGATGT TCCTGGACACCACGAACTACAAGATCTCGGAGAAGATCAACTTCATCAACACCGTCGA GCGCCTGGGCGTGAGCTATCACTTCGAGAAGGAGATCGAGGAGCTGCTCCATCAGATG TTCGACGCCCACTCGAAGCATCTGGACGACATCCAGGAGTTCGACCTCTTCACGCTGG GCATCTACTTCCGCATCCTGCGGCAGCATGGCTATAAGATCTCGTGCGACGTCTTCAAC AAGCTGAAGGACAGCAACGGCGAGTTCAAGGACGAGCTCAAGGACGACGTCAACGGC ATGCTGTCGTTCTATGAGGCCACCCATGTGCGCACCCATGGCGAGAACATCCTCGACG AGGCGCTGATCTACACGAAGGCCCAGCTGGAGTCGATGGCGGCCGCCTCGCTCAGCCC GTTCCTGGCCAACCAGGTGAAGCACGCGCTCATGCAGGCCCTGCATAAGGGCATCCCG CGCATCGAGGCGCGGAACTACATCTCGGTCTATGAGGAAGACCCGAACAAGAACGACC TGCTCCTGCGCTTCAGCAAGATCGACTTCAACCTGGTGCAGATGATCCACAAGCAGGA GCTGTGCGACACCTTCCGGTGGTGGAAGGACCTGGAGTTCGAGTCGAAGCTGTCGTTC GCCCGCAACCGGGTGGTGGAGGCCTACCTCTGGACGCTGTCGGCGTACTATGAGCCGA AGTATTCGAGCGCCCGCATCATCCTCGTGAAGCTGATGGTCATCATCTCGGTGACCGA CGACACGTACGACGCCTATGGCACCCTGGACGAGCTCCAGCTGTTCACGGACGCGATC CAGCGCCTCGACATGTCGAGCATCAACCAGCTGCCCGACTACATGAAGACCATCTATA AGGCGCTCCTGGACCTCTTCGACGAGATCGAGGACCGCCTGTCGAAGCACGAGACGGA CCATAGCTACCGGGTGGCGTATGCCAAGTACGTCTATAAGGAGATCGTGCGCTGCTAC GACATGGAGTATAAGTGGTTCAACAAGAACTACGTCCCCGCCTTCGAGGAGTATATGC AGAAGGCCCTGGTGACCTCGGGCAACCGGCTCCTGATCACGTTCAGCTTCCTGGGCAT GGACGAGGTCGCGACCATCCAGGCCTTCGAGTGGGTGAAGTCGAACGCCAAGATGATC GTGTCGTCGAACAAGGTGCTCCGCCTGATCGACGACATCATGTCGCACGAGGAAGAGG ACGAGCGCGGCCATGTGGCCACGGGCATCGAGTGCTTCGTGAAGGAGCACGGCCTGA CCCGCGAGGAAGTGATCGTCGAGTTCCATAAGCGGATCGACGACGCGTGGAAGGACAT CAACGAGGAGTTCATCACCCCGAACAACCTGCCGATCGAGATCCTGACCCGCGTGCTC AACCTGACCCGGATCGGCGACGTGGTCTACAAGTATGACGACGGCTACACGCACCCCG AGAAGGCCCTCAAGGACCATATCATCAGCCTGTTCGTGGACCCCGTCAGCATCTGA Sequence in Italics is the SPppa promoter; the underlined 
sequence is the codon optimized RsTRX and the sequence in normal font is the codon optimized NjPAT. SEQ ID NO: 27 RsTRX-NjPAT Aminoacid sequence MSTVPVTDATFDTEVRKSDVPVVVDFWAEWCGPCRQIGPALEELSKEYAGKVKIVKVN VDENPESPAMLGVRGIPALFLFKNGQVVSNKVGAAPKAALATWIASALMSIIIATNSTEH PIFRPLANFPPSLWGNLFTSFSMDNQAREIYAKEHEGLKEKVRMMFLDTTNYKISEKIN FINTVERLGVSYHFEKEIEELLHQMFDAHSKHLDDIQEFDLFTLGIYFRILRQHGYKISC DVFNKLKDSNGEFKDELKDDVNGMLSFYEATHVRTHGENILDEALIYTKAQLESMAAA SLSPFLANQVKHALMQALHKGIPRIEARNYISVYEEDPNKNDLLLRFSKIDFNLVQMIHK QELCDTFRWWKDLEFESKISFARNRVVEAYLWTLSAYYEPKYSSARIILVKLMVIISVTD DTYDAYGTLDELQLFTDAIQRLDMSSINQLPDYMKTIYKALLDLFDEIEDRLSKHETDHS YRVAYAKYVYKEIVRCYDMEYKWFNKNYVPAFEEYMQKALVTSGNRLLITFSFLGMDE VATIQAFEWVKSNAKMIVSSNKVLRLIDDIMSHEEEDERGHVATGIECFVKEHGLTREE VIVEFHKRIDDAWKDINEEFITPNNLPIEILTRVLNLTRIGDVVYKYDDGYTHPEKALKD HIISLFVDPVSI The underlined sequence is the RsTRX tag SEQ ID NO: 28 GST_NjPAT_Rv Nucleotide sequence ATCGACATCTTGGGCGGATGG SEQ ID NO: 29 GST_NjPAT_Fw Nucleotide sequence GCCCAAGATGTCGATCATCATCGCC SEQ ID NO: 30 SPppa-GST-NjPAT Nucleotide sequence AAATTTATTTGCTTTGTGAGCGGATAACAATTATTAGATTCACCGGCGAGCCAGCAGGA ATTTCACTCTAGATGACAGGAGGGACATATGTCGCCGATCCTGGGCTATTGGAAGATC AAGGGCCTCGTGCAGCCCACCCGCCTGCTCCTGGAGTACCTGGAGGAGAAGTATGAGG AGCACCTCTACGAGCGCGACGAGGGCGACAAGTGGCGCAACAAGAAGTTCGAGCTCG GCCTGGAGTTCCCGAACCTGCCCTACTATATCGACGGCGACGTGAAGCTCACGCAGTC GATGGCCATCATCCGCTACATCGCGGACAAGCATAACATGCTGGGCGGCTGCCCCAAG GAGCGCGCGGAGATCTCGATGCTGGAGGGCGCGGTGCTCGACATCCGCTATGGCGTG TCGCGCATCGCCTACTCGAAGGACTTCGAGACCCTGAAGGTGGACTTCCTCTCGAAGC TGCCGGAGATGCTCAAGATGTTCGAGGACCGCCTGTGCCACAAGACCTATCTCAACGG CGACCACGTGACGCATCCCGACTTCATGCTCTATGACGCGCTGGACGTGGTGCTCTAC ATGGACCCGATGTGCCTGGACGCCTTCCCCAAGCTCGTGTGCTTCAAGAAGCGCATCG AGGCGATCCCGCAGATCGACAAGTATCTGAAGTCGTCGAAGTACATCGCCTGGCCCCT CCAGGGCTGGCAGGCGACGTTCGGCGGCGGCGACCATCCGCCCAAGATGTCGATCAT CATCGCCACCAACAGCACCGAGCATCCCATCTTCCGCCCGCTCGCCAACTTCCCGCCG TCGCTCTGGGGCAACCTGTTCACCTCGTTCAGCATGGACAACCAGGCGCGCGAGATCT ACGCCAAGGAGCACGAGGGCCTCAAGGAGAAGGTCCGGATGATGTTCCTGGACACCA
CGAACTACAAGATCTCGGAGAAGATCAACTTCATCAACACCGTCGAGCGCCTGGGCGT GAGCTATCACTTCGAGAAGGAGATCGAGGAGCTGCTCCATCAGATGTTCGACGCCCAC TCGAAGCATCTGGACGACATCCAGGAGTTCGACCTCTTCACGCTGGGCATCTACTTCC GCATCCTGCGGCAGCATGGCTATAAGATCTCGTGCGACGTCTTCAACAAGCTGAAGGA CAGCAACGGCGAGTTCAAGGACGAGCTCAAGGACGACGTCAACGGCATGCTGTCGTTC TATGAGGCCACCCATGTGCGCACCCATGGCGAGAACATCCTCGACGAGGCGCTGATCT ACACGAAGGCCCAGCTGGAGTCGATGGCGGCCGCCTCGCTCAGCCCGTTCCTGGCCAA CCAGGTGAAGCACGCGCTCATGCAGGCCCTGCATAAGGGCATCCCGCGCATCGAGGC GCGGAACTACATCTCGGTCTATGAGGAAGACCCGAACAAGAACGACCTGCTCCTGCGC TTCAGCAAGATCGACTTCAACCTGGTGCAGATGATCCACAAGCAGGAGCTGTGCGACA CCTTCCGGTGGTGGAAGGACCTGGAGTTCGAGTCGAAGCTGTCGTTCGCCCGCAACCG GGTGGTGGAGGCCTACCTCTGGACGCTGTCGGCGTACTATGAGCCGAAGTATTCGAGC GCCCGCATCATCCTCGTGAAGCTGATGGTCATCATCTCGGTGACCGACGACACGTACG ACGCCTATGGCACCCTGGACGAGCTCCAGCTGTTCACGGACGCGATCCAGCGCCTCGA CATGTCGAGCATCAACCAGCTGCCCGACTACATGAAGACCATCTATAAGGCGCTCCTG GACCTCTTCGACGAGATCGAGGACCGCCTGTCGAAGCACGAGACGGACCATAGCTACC GGGTGGCGTATGCCAAGTACGTCTATAAGGAGATCGTGCGCTGCTACGACATGGAGTA TAAGTGGTTCAACAAGAACTACGTCCCCGCCTTCGAGGAGTATATGCAGAAGGCCCTG GTGACCTCGGGCAACCGGCTCCTGATCACGTTCAGCTTCCTGGGCATGGACGAGGTCG CGACCATCCAGGCCTTCGAGTGGGTGAAGTCGAACGCCAAGATGATCGTGTCGTCGAA CAAGGTGCTCCGCCTGATCGACGACATCATGTCGCACGAGGAAGAGGACGAGCGCGG CCATGTGGCCACGGGCATCGAGTGCTTCGTGAAGGAGCACGGCCTGACCCGCGAGGA AGTGATCGTCGAGTTCCATAAGCGGATCGACGACGCGTGGAAGGACATCAACGAGGAG TTCATCACCCCGAACAACCTGCCGATCGAGATCCTGACCCGCGTGCTCAACCTGACCC GGATCGGCGACGTGGTCTACAAGTATGACGACGGCTACACGCACCCCGAGAAGGCCCT CAAGGACCATATCATCAGCCTGTTCGTGGACCCCGTCAGCATCTGA Sequence in Italics is the SPppa promoter; the underlined sequence is the codon optimized GST and the sequence in normal font is the codon optimized NjPAT. 
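You can check that the GST coding sequence and the codon optimized NjPAT are joined in frame. Translate a codon-aligned excerpt of SEQ ID NO: 30 across the junction and compare it with the corresponding stretch of SEQ ID NO: 31. The sketch below uses the standard genetic code. The excerpt boundaries and the helper `translate` are illustrative assumptions and are not taken from the patent.

```python
# Minimal sketch: translate a codon-aligned excerpt of SEQ ID NO: 30 across the
# GST/NjPAT junction with the standard genetic code (illustrative helper only).
BASES = "TCAG"
# Amino acids for the 64 codons in order TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string (whitespace ignored) to one-letter codes."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[p:p + 3]] for p in range(0, len(seq), 3))


# Codon-aligned excerpt of SEQ ID NO: 30: 3' end of the GST ORF, then NjPAT start.
GST_END = "CAGGGCTGGCAGGCGACGTTCGGCGGCGGCGACCATCCGCCCAAG"
NJPAT_ORF_START = "ATGTCGATCATCATCGCCACCAACAGCACCGAGCAT"

print(translate(GST_END), translate(NJPAT_ORF_START))
```

The GST part translates to QGWQATFGGGDHPPK and the NjPAT part to MSIIIATNSTEH. Read together they give the junction ...GGGDHPPKMSIIIATNSTEH... of SEQ ID NO: 31, which indicates that the two reading frames are fused without a frameshift.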
SEQ ID NO: 31 GST-NjPAT Aminoacid sequence MSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELGLEFPNLPYYI DGDVKLTQSMAIIRYIADKHNMLGGCPKERAEISMLEGAVLDIRYGVSRIAYSKDFETLK VDFLSKLPEMLKMFEDRLCHKTYLNGDHVTHPDFMLYDALDVVLYMDPMCLDAFPKL VCFKKRIEAIPQIDKYLKSSKYIAWPLQGWQATFGGGDHPPKMSIIIATNSTEHPIFRPLA NFPPSLWGNLFTSFSMDNQAREIYAKEHEGLKEKVRMMFLDTTNYKISEKINFINTVER LGVSYHFEKEIEELLHQMFDAHSKHLDDIQEFDLFTLGIYFRILRQHGYKISCDVFNKLK DSNGEFKDELKDDVNGMLSFYEATHVRTHGENILDEALIYTKAQLESMAAASLSPFLAN QVKHALMQALHKGIPRIEARNYISVYEEDPNKNDLLLRFSKIDFNLVQMIHKQELCDTF RWWKDLEFESKLSFARNRVVEAYLWTLSAYYEPKYSSARIILVKLMVIISVTDDTYDAYG TLDELQLFTDAIQRLDMSSINQLPDYMKTIYKALLDLFDEIEDRLSKHETDHSYRVAYAK YVYKEIVRCYDMEYKWFNKNYVPAFEEYMQKALVTSGNRLLITFSFLGMDEVATIQAFE WVKSNAKMIVSSNKVLRLIDDIMSHEEEDERGHVATGIECFVKEHGLTREEVIVEFHKR IDDAWKDINEEFITPNNLPIEILTRVLNLTRIGDVVYKYDDGYTHPEKALKDHIISLFVDP VSI The underlined sequence is the GST tag
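Each fusion protein listed here places a solubility tag at the N-terminus, followed by NjPAT. Below is a minimal Python sketch for separating the two parts of such a fusion. It assumes that NjPAT can be recognized by the N-terminal motif MSIIIATNSTEH, as at the GST/NjPAT junction of SEQ ID NO: 31. This is an assumption made for illustration: the listing marks the tag only by underlining and does not define a motif. The names `split_fusion` and `NJPAT_START` are hypothetical. Whitespace is stripped first because the listed sequences are wrapped with spaces.

```python
# Minimal sketch: split a tag-NjPAT fusion protein into its tag and NjPAT parts.
# ASSUMPTION: NjPAT begins with the motif below, as at the GST/NjPAT junction
# of SEQ ID NO: 31. split_fusion and NJPAT_START are hypothetical names.
NJPAT_START = "MSIIIATNSTEH"


def split_fusion(fusion: str, motif: str = NJPAT_START) -> tuple[str, str]:
    """Return (tag, NjPAT part), split at the first occurrence of motif."""
    seq = "".join(fusion.split())  # drop spaces/newlines used for line wrapping
    idx = seq.find(motif)
    if idx < 0:
        raise ValueError(f"motif {motif!r} not found in fusion sequence")
    return seq[:idx], seq[idx:]


# GST tag of SEQ ID NO: 31 followed by the first residues of NjPAT (excerpt).
GST_NJPAT_EXCERPT = """
MSPILGYWKIKGLVQPTRLLLEYLEEKYEEHLYERDEGDKWRNKKFELGLEFPNLPYYI
DGDVKLTQSMAIIRYIADKHNMLGGCPKERAEISMLEGAVLDIRYGVSRIAYSKDFETLK
VDFLSKLPEMLKMFEDRLCHKTYLNGDHVTHPDFMLYDALDVVLYMDPMCLDAFPKL
VCFKKRIEAIPQIDKYLKSSKYIAWPLQGWQATFGGGDHPPKMSIIIATNSTEHPIFRPLA
"""

tag, passenger = split_fusion(GST_NJPAT_EXCERPT)
print(tag[-5:], passenger[:12])  # prints: DHPPK MSIIIATNSTEH
```

The split relies on an exact string match. A tag variant or a sequence that differs from the motif at the junction will raise an error rather than be split silently.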