SYNTHETIC SANTALENE SYNTHASES

Abstract

Disclosed are santalene synthases with improved product profile and methods for improving santalene synthases. The invention further relates to santalene compositions produced by fermentation that have a greater beta-santalene content than alpha-santalene content.

Claims

1.-15. (canceled)

16. A synthetic beta santalene synthase characterized by the fact that the tertiary structure of the part of said synthetic beta santalene synthase that corresponds to the stretch from amino acid position 272 to position 291 of SEQ ID NO: 1 has an increased flexibility compared to the flexibility of the same tertiary structure in a naturally occurring santalene synthase, wherein the flexibility is determined by root mean square fluctuation analysis using simulations for 500 ns for both the synthetic and the naturally occurring santalene synthase with these settings: pH 8.0, 300 K, 1 atm, water environment, ions present without substrate, and evaluation for each enzyme structure on the last 450 ns of simulation; and wherein said synthetic beta santalene synthase is further characterized by its ability to produce beta-santalene and alpha-santalene in a ratio that is equal to or greater than 1 under typical conditions suitable for the production of both these santalenes, wherein beta-santalene is (−)-β-santalene (CAS number 511-59-1).

17. A synthetic beta santalene synthase producing beta-santalene and alpha-santalene from farnesyl pyrophosphate, wherein the santalene synthase has at least 50% sequence identity to a. the amino acid positions 261 to 278, preferably position 261 to position 272, of SEQ ID NO: 2, 3, 29, 57 or 58 wherein the position corresponding to position 261 of SEQ ID NO: 2, 3, 29, 57 or 58 is an Arginine residue and the position corresponding to position 278 of SEQ ID NO: 2, 3, 29, 57 or 58 is a Proline residue and said Arginine and said Proline are used to align the two protein sequences for the sequence identity determination; or b. the amino acid positions 261 to 302 of SEQ ID NO: 2, 3 or 40, wherein the position corresponding to position 261 of SEQ ID NO: 2 or 3 is an Arginine residue and three Aspartate residues are found at positions that correspond to the Aspartates at positions 298, 299 and 302 of SEQ ID NO: 2 or 3 or 29 to 40; or c. a combination of a. and b. above; or d. the full-length of SEQ ID NO: 1; or e. a combination of any of a. to c. above with d.; and wherein said synthetic beta santalene synthase is producing beta-santalene and alpha-santalene in a ratio of the two that is equal to or greater than 1 under conditions suitable for the production of these santalenes.

18. The synthetic beta santalene synthase of claim 16, wherein it further has the following amino acids at the positions corresponding to the positions in SEQ ID NO: 1 provided in brackets: an Arginine (261), Aspartate (262), Arginine (263), Leucine or Isoleucine or Valine or Methionine (264), Leucine or Isoleucine or Valine (265), Glutamic Acid or Glutamine (266) and Histidine or Tyrosine (268).

19. The synthetic beta santalene synthase of claim 16 that has at the amino acid position corresponding to a. position 267 of SEQ ID NO: 1 any of the following amino acids: Serine, Leucine, Threonine, Cysteine, Isoleucine, Valine, Tryptophan, Glycine or Alanine; or b. position 291 of SEQ ID NO: 1 any of the following amino acids: Threonine, Cysteine, Serine, Phenylalanine or Valine; or c. a combination of a. and b. above; or d. position 267 of SEQ ID NO: 1 an Asparagine and the position corresponding to position 291 of SEQ ID NO: 1 is any of the following amino acids: Threonine, Cysteine, Serine, Phenylalanine or Valine; or e. position 267 of SEQ ID NO: 1 any of the following amino acids: Serine, Leucine, Threonine, Cysteine, Isoleucine, Valine, Tryptophan, Glycine or Alanine and the position corresponding to position 291 of SEQ ID NO: 1 is an Isoleucine.

20. A synthetic nucleic acid encoding for the synthetic santalene synthase of claim 16.

21. An expression cassette comprising the synthetic nucleic acid of claim 20.

22. A method for the production of a composition comprising beta-santalene in excess of alpha-santalene comprising the following steps: I. providing one or more improved beta santalene synthase according to claim 16 in an active form and with all required co-factors, II. contacting farnesyl pyrophosphate with the one or more improved beta santalene synthases under conditions permitting the production of santalenes, III. producing beta-santalene and alpha santalene and optionally bergamotene from farnesyl pyrophosphate, wherein the amount of beta-santalene produced is larger than the amount of alpha-santalene produced, wherein beta-santalene is (−)-β-santalene (CAS number 511-59-1), IV. optionally purifying the products.

23. A non-human host cell suitable to produce the santalene synthase according to claim 16 from a nucleic acid encoding said santalene synthase and suitable to provide the santalene synthase with farnesyl pyrophosphate and all co-factors required for its activity, wherein the host cell comprises a nucleic acid encoding the santalene synthase.

24. The santalene synthase, non-human host cell or the method of claim 16 wherein the santalene synthase produces an excess of trans-α-bergamotene over alpha-santalene in addition to producing more beta-santalene than alpha-santalene.

25. A composition produced by a synthetic santalene synthase, the method or the non-human host cell of claim 16, wherein the composition comprises beta-santalene in excess to alpha-santalene, wherein beta-santalene is (−)-β-santalene (CAS number 511-59-1).

26. The composition of claim 25, wherein the composition comprises more beta-santalene than bergamotene, and more bergamotene than alpha-santalene.

27. The composition of claim 16 wherein the composition comprises at least 12% (w/w) trans-α-bergamotene.

28. A method for the production of a composition comprising beta-santalol in excess to alpha-santalol comprising the steps of: I. Producing a composition comprising beta-santalene in excess over alpha-santalene by a method of claim 16 or with the host cells or the santalene synthases of claim 16; II. Oxidising at least some of the beta-santalene and alpha-santalene in the composition produced in step I to their respective alcohols to produce a composition comprising beta-santalol in excess of alpha-santalol; and III. optionally purifying the products.

29. A composition produced by the method of claim 28 comprising beta-santalol in excess to alpha-santalol, wherein the beta-santalol to alpha santalol ratio is at least 1.3.

30. Use of any of the synthetic santalene synthases of SEQ ID NO: 2, 3, 13 to 53, 56 to 58 for the production of a composition comprising alpha-santalene, beta-santalene and trans-α-bergamotene.

Description

DESCRIPTION OF FIGURES

[0213] FIG. 1 shows an alignment of the known santalene synthases CiCaSSy wildtype (SEQ ID NO: 1), SaSSy, SaSSy14, SspiSSy, SauSSy, ClaSSy and SaSSy134 (SEQ ID NO: 4 to 9, respectively). SaSSy134 is labelled with SEQ 280 in this alignment. Also included are the two improved beta santalene synthases, the N267S and N267L mutants (SEQ ID NO: 2 and 3). The alignment was done with the ClustalW software using typical settings. Black background shading marks a strongly conserved residue, grey background shading marks a residue that is conserved in at least 50% of the aligned sequences, and white shading marks non-conserved amino acids.

[0214] FIG. 2 shows a 3D model of CiCaSSy (SEQ ID NO: 1) created with the PyMOL software. The alpha helices are shown; the two helices marked in black are Helix C (short black helix) and Helix D (longer black helix) of CiCaSSy.

[0215] FIG. 3 shows a graphical representation of the interaction of Helix C and Helix D in the wildtype CiCaSSy (A) and the N267S mutant (B). The alpha helix in the center of the images represents Helix D, the alpha helix to the left represents Helix C. The side chains of the two amino acids in position 267 are marked in dark grey.

[0216] FIG. 4 shows the changes in the three main products alpha-santalene, beta-santalene and bergamotene obtained by improving the santalene synthase of SEQ ID NO: 1 (“Wildtype”). Values have been normalised on these three major products; minor products are not shown. Filled black bars represent alpha-santalene, empty bars represent bergamotene and diagonally lined bars represent beta-santalene. Replacing position 267 with a Serine as in SEQ ID NO: 2 (“N267S”) or a Leucine residue as in SEQ ID NO: 3 (“N267L”) allows the enzyme to produce more beta-santalene and more bergamotene (N267S) than alpha-santalene, or more bergamotene and still considerable alpha-santalene, but less beta-santalene (N267L). Data for two santalene synthases known in the art are shown for comparison (termed “SaSSy” and “SaSSY-134” in FIG. 4). These data were taken from the values reported in the art, see WO2015153501. The known santalene synthases (wildtype CiCaSSy, SaSSy and SaSSy-134) show a larger production of alpha-santalene than of the other two compounds. The improved versions N267S and N267L show how this product profile can be altered according to the desired prevalence of either beta-santalene alone over alpha-santalene, as by N267L, or of both beta-santalene and bergamotene over alpha-santalene, as by N267S.

[0217] FIG. 5 shows the changes in the three products alpha-santalene, bergamotene and beta-santalene obtained by improving the santalene synthase at the position that corresponds to position 291 of SEQ ID NO: 1, alone or in combination with modifications of the position that corresponds to position 267 of SEQ ID NO: 1. Minor products are not shown for improved clarity. Filled black bars represent alpha-santalene, empty bars represent bergamotene and diagonally lined bars represent beta-santalene. Wildtype (SEQ ID NO: 1) and the modified enzyme “I291L” are shown as controls. Replacing position 291 with a Leucine residue as in SEQ ID NO: 53 (“I291L”) did not change the fact that a surplus of alpha-santalene compared to beta-santalene and bergamotene is produced; on the contrary, this modification enhances the production of alpha-santalene over that of the wildtype enzyme, as can be seen from the figure.

[0218] Replacing position 291 with a Valine, Serine, Threonine or Cysteine (“I291V”, “I291S”, “I291T” and “I291C”, respectively) allows the enzyme to produce more beta-santalene than alpha-santalene, yet maintain much larger levels of alpha-santalene than in the N267S version of the improved beta santalene synthase. The improved versions I291V, I291S, I291C and I291T show how this product profile can be altered according to the desired prevalence of either beta-santalene alone over alpha-santalene, as by I291T, I291S and I291C, or of beta-santalene over alpha-santalene together with bergamotene at levels similar to or above those of alpha-santalene, as by I291V, while maintaining larger alpha-santalene levels than the N267S improvement. Such a profile with more remaining alpha-santalene can be advantageous for some applications.

[0219] The last two groups of bars show the results for the two double mutants in which the positions corresponding to positions 267 and 291 of SEQ ID NO: 1 are both modified. The data shown for “I291T/N267S” are from an enzyme in which position 267 was filled with a Serine and position 291 with a Threonine. The data shown for “I291T/N267T” are from an enzyme that had a Threonine introduced in both these positions. As can be seen, of the mutants shown in FIG. 5 the largest percentage of beta-santalene is produced by the double mutant “I291T/N267S”. The amounts of alpha-santalene and bergamotene for the “I291T/N267S” enzyme are intermediate between the values for the two single mutants, with the mutation N267S having more impact on these values than I291T in this combination. The data for the other double mutant show that a Threonine at position 267 has effects on alpha-santalene, bergamotene and beta-santalene similar to those of a Serine at this position, yet not quite as strong.

EXAMPLES

[0220] Publicly available electronic sequence information was used to analyse santalene synthase structures using standard software tools. A 3D model was generated of CiCaSSy (SEQ ID NO: 1), a santalene synthase from Cinnamomum camphora disclosed as SEQ ID NO: 3 in the international patent application published as WO2018160066, with a normal alpha-santalene to beta-santalene ratio. Common tools for such analysis include structural alignment software such as DALI, CE and STAMP; see http://www.rcsb.org/pdb/home/home.do for a selection.

[0221] The enzyme known as CiCaSSy has a somewhat unusual amino acid positioning compared to other santalene synthases. For example, it shares less than 50% sequence identity with many other santalene synthases, yet in some stretches it combines elements from many other santalene synthases. The active site cavity was identified, and residues within it were targeted for mutagenesis. In particular, residues that might influence the product profile were prioritized. An area comprising two spatially close α-helices in the middle of the amino acid sequence was chosen for mutations. In this area of the protein, CiCaSSy differs in some amino acids from each of the known santalene synthases, yet at the same time shares many elements with different groups of santalene synthases, in a combination only found in CiCaSSy. If this area of the protein is the key part for the desired product profile changes, transfer to other santalene synthase sequences is easily feasible even if they differ to a great extent in the remaining part.

Mutation Testing

[0222] After in-depth study, residue 267 of CiCaSSy was chosen for mutation. The inventors realized that the surroundings of N267 in SEQ ID NO: 1 are so favourable that they chose to replace the unusual asparagine at position 267 with Serine and with Leucine, even though these residues are found at the corresponding location in other santalene synthases known to lack the desired performance. DNA sequences encoding CiCaSSy proteins with the two desired mutations at position 267 were synthesized. The resulting protein sequences, named N267S and N267L, are given in SEQ ID NO: 2 and SEQ ID NO: 3, respectively.

[0223] A root-mean-square deviation of atomic positions (RMSD) analysis and a root mean square fluctuation (RMSF) analysis were performed with these two novel protein sequences. Each enzyme was simulated for 500 ns under the same conditions (pH 8.0, 300 K, 1 atm, water environment, ions present without substrate). RMSD provides an indication of the movements and flexibility of the overall protein, while RMSF indicates the average movement and flexibility at a given position. The RMSF analysis showed that N267S had the predicted increase in fluctuation, and hence flexibility, over the wildtype CiCaSSy in the area of the loop between Helix C and Helix D and in the part of Helix D that interacts with the side chain of position 267 in SEQ ID NO: 1 to 3. The flexibility of the stretch that corresponds to positions 272 to 291 (the area where the side chains of Helix D are located that interact with the side chain of the amino acid at position 267) was increased in N267S compared to the wildtype CiCaSSy. The increase was of higher magnitude in the stretch from 272 to 284, which contains the loop between Helix C and Helix D and is therefore expected to be less rigid than a helix. Both N267S and N267L had further stretches of increased fluctuation, indicating flexibility, further downstream in the area of positions 380 to 500. This pattern was not observed in the RMSF analysis of any of the other santalene synthases with alpha-santalene surplus production that were analysed (SEQ ID NO: 4, 5, 8 or 9). RMSD analysis showed that for N267S, after 30000 picoseconds, the deviations in nm increased by about one fifth from the initial equilibrium. This flexibility in structure was not observed in any of the other santalene sequences analysed.
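The comparison of the stretch from position 272 to 291 described above can be sketched as follows. This is only an illustrative sketch: the text does not state how the per-residue RMSF values are aggregated, so the use of the mean over the stretch is an assumption, the function names are hypothetical, and the numbers are made up solely to exercise the code.

```python
from statistics import mean

def mean_rmsf_in_stretch(rmsf_by_position, first=272, last=291):
    """Average the per-residue RMSF (nm) over positions first..last, inclusive."""
    return mean(rmsf_by_position[pos] for pos in range(first, last + 1))

def is_more_flexible(rmsf_variant, rmsf_reference, first=272, last=291):
    """True if the variant's mean RMSF in the stretch exceeds the reference's."""
    return (mean_rmsf_in_stretch(rmsf_variant, first, last)
            > mean_rmsf_in_stretch(rmsf_reference, first, last))

# Hypothetical RMSF values in nm (NOT data from the simulations described here).
wildtype = {pos: 0.10 for pos in range(260, 300)}
variant = dict(wildtype)
for pos in range(272, 285):  # pretend the loop region 272-284 fluctuates more
    variant[pos] = 0.18

print(is_more_flexible(variant, wildtype))
```

A per-residue comparison, or a comparison restricted to positions 272 to 284, could be obtained by changing the `first` and `last` arguments.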

[0224] The procedure described for the wildtype CiCaSSy in examples 6 to 19 of WO2018160066 (p. 44, l. 19 to p. 50, l. 22; incorporated herein by reference) was applied for the experiments with the mutated CiCaSSy sequences encoding the proteins N267S and N267L. The mutated DNA sequences encoding the CiCaSSy santalene synthases of SEQ ID NO: 2 and 3 were introduced into Rhodobacter sphaeroides by the procedure disclosed in the international patent application published as WO2018160066 for CiCaSSy (SEQ ID NO: 1 of the present invention; SEQ ID NO: 3 in WO2018160066), using a plasmid-based system to heterologously express the DNA sequence and form the mutated enzyme. Fermentation of Rhodobacter sphaeroides, and extraction and analysis of the alpha-santalene, beta-santalene and bergamotene produced by the host cells, were performed as in WO2018160066.

[0225] The determination of alpha-santalene, beta-santalene and bergamotene was performed by gas chromatography with a flame ionization detector (FID):

[0226] Gas chromatography was performed on a Shimadzu GC2010 Plus equipped with a Restek Rtx-5Sil MS capillary column (30 m × 0.25 mm, 0.5 μm). The injector and FID detector temperatures were set to 280° C. and 300° C., respectively. Gas flow through the column was set at 40 mL/min. The oven initial temperature was 160° C., increased to 180° C. at a rate of 2° C./min, further increased to 300° C. at a rate of 50° C./min, and held at that temperature for 3 min. Injected sample volume was 1 μL with a 1:50 split ratio, and the nitrogen makeup flow was 30 mL/min.
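As a worked check of the oven program above, the total run time per injection can be computed from the two ramps and the final hold. The sketch below assumes no initial hold at 160° C. (none is stated); the function name is hypothetical.

```python
def oven_program_minutes(start_c, ramps, final_hold_min):
    """Total run time of a temperature program.

    ramps: list of (target temperature in deg C, rate in deg C per min),
    applied in order starting from start_c.
    """
    total = 0.0
    current = start_c
    for target_c, rate_c_per_min in ramps:
        total += (target_c - current) / rate_c_per_min
        current = target_c
    return total + final_hold_min

# 160 -> 180 at 2 deg C/min, 180 -> 300 at 50 deg C/min, then a 3 min hold
run_time = oven_program_minutes(160, [(180, 2), (300, 50)], 3)
print(f"{run_time:.1f} min")  # 10 + 2.4 + 3 = 15.4 min
```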

[0227] The two enzyme mutants at position 267 of CiCaSSy, N267S and N267L, had a significant effect on the product ratios of alpha-santalene, beta-santalene and bergamotene. Both mutations led to increased beta-santalene production compared to wildtype CiCaSSy, even producing more beta-santalene than alpha-santalene for the first time, and to an increased beta-santalene to alpha-santalene product ratio (FIG. 4). Surprisingly, the N267S mutant also produced significantly less alpha-santalene, meaning this mutant had high specificity for beta-santalene over alpha-santalene, the first time such a phenomenon has been observed. The N267L mutation caused an even larger change in product ratios and produced beta-santalene as its major product, whilst producing relatively less alpha-santalene and trans-α-bergamotene, as shown in FIG. 4.
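The normalisation on the three major products used for FIG. 4, and the beta-santalene to alpha-santalene ratio, can be illustrated with a short sketch. It assumes that FID peak areas are used directly as a measure of relative amount, which is not stated in the text; the function names are hypothetical and the peak areas are invented for illustration and are not measured values.

```python
def normalise_profile(peak_areas):
    """Scale the three major products so that their shares sum to 100 %."""
    total = sum(peak_areas.values())
    return {name: 100.0 * area / total for name, area in peak_areas.items()}

def beta_to_alpha_ratio(peak_areas):
    """Ratio of beta-santalene to alpha-santalene; >= 1 means a beta surplus."""
    return peak_areas["beta-santalene"] / peak_areas["alpha-santalene"]

# Hypothetical FID peak areas in arbitrary units (NOT experimental data).
example = {"alpha-santalene": 20.0, "beta-santalene": 50.0, "bergamotene": 30.0}

shares = normalise_profile(example)
ratio = beta_to_alpha_ratio(example)
print(shares, ratio)
```

Because the ratio is computed from two of the three values, it is unaffected by the normalisation itself.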

[0228] Additional mutants were tested with the same experimental set-up described above. For example, replacing the position corresponding to position 267 of SEQ ID NO: 1 with a Glycine, Alanine or Tryptophan also resulted in improved santalene synthases.

[0229] Further, it was found that replacing the Isoleucine at the position corresponding to position 291 of SEQ ID NO: 1 with a Leucine resulted in higher alpha-santalene levels than in the wildtype, and introducing a Histidine at this position destroyed the activity as a santalene synthase. This demonstrated that the position is important, and that it also matters how it is changed.

[0230] The I291V, I291S, I291C, I291F and I291T mutants were also tested and showed, like N267S and N267L, a surplus of beta-santalene, but in comparison to N267S there was more alpha-santalene remaining, albeit less alpha-santalene than in the wildtype control (see FIG. 5 and table 1). The largest percentage value for beta-santalene was found when SEQ ID NO: 34 was expressed in the host cells.

[0231] Double mutants with N267S or N267T changes at the position corresponding to position 267 of SEQ ID NO: 1 and a Threonine instead of an Isoleucine at the position corresponding to the position 291 of SEQ ID NO: 1 resulted also in improved beta santalene synthases with an excess of beta-santalene, although they showed an intermediate product profile compared to the single mutant improved beta santalene synthase enzymes (see FIG. 5).

[0232] Modelling of these mutants showed in the RMSF plot that the N267S single mutant as well as its double mutants with a Serine, Cysteine or Threonine at the position corresponding to position 291 of SEQ ID NO: 1 show increased flexibility in Helix C and Helix D, which is consistent with the experimental results for N267S and its double mutant with Threonine at the position corresponding to position 291 of SEQ ID NO: 1 (see FIG. 5). Interestingly, the RMSF data showed that the flexibility in some other regions further downstream also appears to be increased.

Software Tools Used

Homology Models

[0233] Homology models were generated using the Schrödinger Prime package (www.schrodinger.com/prime; Schrödinger Release 2020-2: Prime, Schrödinger, LLC, New York, N.Y., 2020; M Jacobson et al., Proteins, 2004, 55, 351-367). Template structures were downloaded from the PDB, the Protein Data Bank (HM Berman et al., Nucleic Acids Research, 2000, 28, 235-242); the template structure used for generating each homology model is indicated in Table 2.

TABLE 2 Template structures used for homology model generation. For each homology model built, the Protein Data Bank (PDB) code of the template structure used is indicated.

Homology model    Template structure (PDB code)
CiCaSSy           6A1I
CiCaSSy N267L     6A1I
CiCaSSy N267S     6A1I
SaSSy             5ZZJ
SaSSy 14          5ZZJ
ClaSSy            6A1I
SaSSy 134         5ZZJ

MD Simulations

[0234] MD simulations were performed using the software GROMACS version 2018 (www.gromacs.org; D van der Spoel et al., J Comput Chem, 2005, 26, 1701-1718). All the enzymes were described with the OPLS-AA force field (WL Jorgensen and J Tirado-Rives, J Am Chem Soc, 1988, 110, 1657-1666); enzyme protonation was defined at pH 8.0 and calculated using the tool pdb2pqr (T J Dolinsky et al., Nucleic Acids Res, 2007, 35, W522-W525); the 3 metal ions (Mg2+) were included in the model by fixing their position relative to their coordinating amino acid residues as described in MW van der Kamp et al., Biochemistry, 2013, 52, 8094-8105. Each enzyme was put in the center of a cubic system of 1000 nm³ and explicitly solvated with TIP4P water (WL Jorgensen et al., J Chem Phys, 1983, 79, 926-935); the total charge of the system was neutralized by adding the appropriate number of Na+ or Cl− ions. Each system was minimized for 10000 steps using a steepest descent algorithm and subsequently equilibrated for 10 ns. After equilibration, each system was simulated for 500 ns. Temperature was kept constant at 300 K using the v-rescale algorithm (G Bussi et al., J Chem Phys, 2007, 126, 014101), pressure was kept constant at 1 atm using the Parrinello-Rahman algorithm (M Parrinello and A Rahman, Phys Rev Lett, 1980, 45, 1196-1198) and electrostatic interactions were simulated by the extended particle mesh Ewald algorithm (U Essmann et al., J Chem Phys, 1995, 103, 8577-8593). Simulation frames were saved every 5 ps.
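The figures given above imply some simple bookkeeping for each trajectory, sketched below. The variable names are illustrative, and the frame counts ignore whether the frame at time zero is also written, which would add one frame.

```python
BOX_VOLUME_NM3 = 1000.0   # cubic system volume stated above
PRODUCTION_NS = 500       # production simulation length
SAVE_INTERVAL_PS = 5      # a frame is saved every 5 ps
RMSF_WINDOW_NS = 450      # RMSF is evaluated on the last 450 ns

# Edge length of a cube with the stated volume (rounded to absorb float error)
box_edge_nm = round(BOX_VOLUME_NM3 ** (1 / 3), 6)

# Number of saved frames in the whole run and in the RMSF evaluation window
frames_total = PRODUCTION_NS * 1000 // SAVE_INTERVAL_PS
frames_rmsf = RMSF_WINDOW_NS * 1000 // SAVE_INTERVAL_PS

# Index of the first frame inside the RMSF window (the first 50 ns are skipped)
first_rmsf_frame = (PRODUCTION_NS - RMSF_WINDOW_NS) * 1000 // SAVE_INTERVAL_PS

print(box_edge_nm, frames_total, frames_rmsf, first_rmsf_frame)
```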

RMSD

[0235] Root Mean Square Deviation (RMSD) was evaluated for each enzyme structure on the full simulation length (500 ns). Calculations were performed by the gmx rms tool of the GROMACS package after having performed a structural superimposition of the protein structure for each trajectory frame (gmx trjconv) using the equilibrated system as a reference.
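For orientation, the quantity reported per frame can be sketched in plain Python as below. This is not the GROMACS implementation: it assumes the frame has already been superimposed on the reference, uses equal weights for all atoms (the GROMACS tool may weight atoms differently depending on its options), and the toy coordinates are invented.

```python
import math

def rmsd(coords, reference):
    """RMSD between two already superimposed structures.

    coords, reference: lists of (x, y, z) tuples in nm, same atom order.
    """
    if len(coords) != len(reference):
        raise ValueError("structures must have the same number of atoms")
    squared = sum((a - b) ** 2
                  for atom, ref_atom in zip(coords, reference)
                  for a, b in zip(atom, ref_atom))
    return math.sqrt(squared / len(coords))

# Toy two-atom example (hypothetical coordinates, nm)
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
print(rmsd(frame, reference))
```

Evaluating this function for every saved frame against the equilibrated structure gives an RMSD-versus-time curve of the kind discussed for N267S.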

RMSF

[0236] Root Mean Square Fluctuation (RMSF) was evaluated for each enzyme structure on the last 450 ns of simulation. Calculations were performed by the gmx rmsf tool of the GROMACS package after having performed a structural superimposition of the protein structure for each trajectory frame (gmx trjconv) and using the protein Cα of the equilibrated system as a reference.
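A minimal sketch of the per-atom RMSF is given below. It assumes the usual definition, in which fluctuations are measured around each atom's time-averaged position, and assumes the frames have already been superimposed; it is not the GROMACS implementation, and the toy trajectory is invented.

```python
import math

def rmsf_per_atom(trajectory):
    """Per-atom RMSF of a superimposed trajectory.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples (nm)
    for the same atoms (e.g. one C-alpha per residue) in the same order.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i
        mean_pos = [sum(frame[i][k] for frame in trajectory) / n_frames
                    for k in range(3)]
        # Mean squared distance of atom i from its average position
        msd = sum(sum((frame[i][k] - mean_pos[k]) ** 2 for k in range(3))
                  for frame in trajectory) / n_frames
        result.append(math.sqrt(msd))
    return result

# Toy trajectory: atom 0 is static, atom 1 oscillates by +/- 0.1 nm along x
toy = [
    [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0)],
]
fluctuations = rmsf_per_atom(toy)
print(fluctuations)
```

Applied to the C-alpha atoms of the frames from the last 450 ns, this yields one value per residue, which is what an RMSF plot shows along the sequence.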

Pictures

[0237] The protein pictures for FIGS. 2 and 3 were generated by using the PyMOL (pymol.org) software. RMSD and RMSF pictures were generated by using the Matplotlib library (matplotlib.org) for Python version 3.6 (python.org).

PFAM Domain Analysis

[0238] PFAM domain PF01397 “Terpene_synth” and a C-terminal PFAM domain PF03936 “Terpene_synth_C” were identified using version 32.0 of the PFAM software on May 29, 2020 and confirmed with version 33.1 of the PFAM software released on Jun. 11, 2020. For details on PFAM see “The Pfam protein families database in 2019”, S. El-Gebali, J. Mistry, A. Bateman, S. R. Eddy, A. Luciani, S. C. Potter, M. Qureshi, L. J. Richardson, G. A. Salazar, A. Smart, E. L. L. Sonnhammer, L. Hirsh, L. Paladin, D. Piovesan, S. C. E. Tosatto, R. D. Finn, Nucleic Acids Research (2019); http://pfam.xfam.org/; and “Pfam: The protein families database in 2021”, J. Mistry, S. Chuguransky, L. Williams, M. Qureshi, G. A. Salazar, E. L. L. Sonnhammer, S. C. E. Tosatto, L. Paladin, S. Raj, L. J. Richardson, R. D. Finn, A. Bateman, Nucleic Acids Research (2020), doi: 10.1093/nar/gkaa913.

Interpro Motifs

[0239] The following domains

“Terpene synthase, metal-binding domain”                              IPR005630
“Terpene cyclase-like 1, C-terminal domain”                           IPR034741
“Terpene synthase, N-terminal domain”                                 IPR001906

and these homologous superfamilies

“Isoprenoid synthase domain superfamily”                              IPR008949
“Terpenoid cyclases/protein prenyltransferase alpha-alpha toroid”     IPR008930
“Terpene synthase, N-terminal domain superfamily”                     IPR036965

were identified with the InterPro scan software version 83.0, released December 2020; for further details of InterPro see: Blum M, Chang H, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L, Salazar G A, Williams L, Bork P, Bridge A, Gough J, Haft D H, Letunic I, Marchler-Bauer A, Mi H, Natale D A, Necci M, Orengo C A, Pandurangan A P, Rivoire C, Sigrist C J A, Sillitoe I, Thanki N, Thomas P D, Tosatto S C E, Wu C H, Bateman A and Finn R D, The InterPro protein families and domains database: 20 years on. Nucleic Acids Research, November 2020 (doi: 10.1093/nar/gkaa977).

CPC classification: C12P5/007; C12P7/02; C12N9/88; C12Y402/03083 (CHEMISTRY; METALLURGY)

International classification: C12N9/88; C12P5/00; C12P7/02 (CHEMISTRY; METALLURGY)
