PROMOTER VARIANTS

20200347391 · 2020-11-05

    Abstract

    An isolated and/or artificial pG1-x promoter, which is a functional variant of the carbon source regulatable pG1 promoter of Pichia pastoris identified by SEQ ID 1, which pG1-x promoter consists of or comprises at least a part of SEQ ID 1 with a length of at least 293 bp, characterized by the following promoter regions: a) at least one core regulatory region comprising the nucleotide sequences SEQ ID 2 and SEQ ID 3; and b) a non-core regulatory region, which is any region within the pG1-x promoter sequence other than the core regulatory region; wherein the pG1-x promoter comprises at least one mutation in any of the promoter regions and a sequence identity of at least 80% in SEQ ID 2 and SEQ ID 3, and a sequence identity of at least 50% in any region other than SEQ ID 2 or SEQ ID 3; and further wherein the pG1-x promoter is characterized by the same or an increased promoter strength and induction ratio as compared to the pG1 promoter, wherein the promoter strength is at least 1.1-fold increased in the induced state as compared to the pG1 promoter, and/or the induction ratio is at least 1.1-fold increased as compared to the pG1 promoter.

    Claims

    1-40. (canceled)

    41. A method of producing a protein of interest (POI) by culturing a recombinant host cell which comprises an expression construct expressing the POI under the control of a carbon source regulatable promoter, which method is performed according to a speed fermentation protocol starting with a batch phase as the first step, followed by a fed-batch phase as the second step, wherein: a) in the first step a basal carbon source is used which represses the promoter and the cells are cultured to grow the cells until the basal carbon source is consumed; and b) in the second step no or a growth-limiting amount of a supplemental carbon source is added, thereby de-repressing the promoter to induce production of the POI, wherein the cells are cultured at a specific growth rate within the range of 0.04 h⁻¹ to 0.2 h⁻¹ for around (+/-10%) 15 to 80 h.

    42. The method of claim 41, wherein a) the basal carbon source is selected from the group consisting of glucose, glycerol, ethanol, a mixture thereof, and complex nutrient material; and b) the supplemental carbon source is a hexose such as glucose, fructose, galactose or mannose, a disaccharide, such as saccharose, an alcohol, such as glycerol or ethanol, or a mixture of any of the foregoing.

    43. The method of claim 41, wherein the oxygen partial pressure (pO2) is continuously decreasing during the batch phase and the end of the batch phase is characterized by an increase of pO2.

    44. The method of claim 43, wherein the pO2 is decreased to below 65% saturation during the batch phase followed by an increase to above 65% saturation at the end of the batch phase.

    45. The method of claim 41, wherein the batch phase is performed for around (+/-10%) 20 to 36 h.

    46. The method of claim 41, wherein the batch phase is performed at a temperature between 25° C. and 30° C. for around (+/-10%) 23 to 36 h, using 40-50 g/L glycerol or glucose as a basal carbon source.

    47. The method of claim 41, wherein the cultivation in the fed-batch phase is performed for around (+/-10%) 15-40 h.

    48. The method of claim 41, wherein the POI is produced at a space time yield of around (+/-10%) 30 mg (L h)⁻¹.

    49. The method of claim 48, wherein the cultivation in the fed-batch phase is performed for around (+/-10%) 30 h.

    50. The method of claim 41, wherein the promoter is a carbon source regulatable pG1 promoter of Pichia pastoris identified by SEQ ID 1 or a functional variant promoter (pG1-x), which is characterized by the same or an increased promoter strength and/or induction ratio as compared to the pG1 promoter.

    51. The method of claim 50, wherein the pG1-x promoter comprises or consists of the nucleotide sequence selected from the group consisting of any of a) SEQ ID 37-44, or any of SEQ ID 45-76; b) SEQ ID 77-80, or any of SEQ ID 81-112; c) SEQ ID 113-114, or any of SEQ ID 115-130; d) SEQ ID 131-132, or any of SEQ ID 133-148; e) SEQ ID 149-150, or any of SEQ ID 151-166; f) SEQ ID 167-168, or any of SEQ ID 169-184; g) SEQ ID 185-186, or any of SEQ ID 187-202; h) SEQ ID 203-204, or any of SEQ ID 205-220; i) SEQ ID 221-222, or any of SEQ ID 223-238; j) SEQ ID 239-240, or any of SEQ ID 241-256; k) SEQ ID 32-36, or any of SEQ ID 257-259; l) a functional variant of any of a)-k) above, which is characterized by one or more of the following features: i) the nucleotide sequence comprises a deletion of one or more nucleotides at the 5'-end of the promoter sequence, preferably leaving at least 293 nucleotides of the 3' region of the promoter sequence; ii) the nucleotide sequence comprises one or more TFBS; iii) the nucleotide sequence comprises at least one or at least two core regulatory regions, each comprising at least 80% sequence identity to SEQ ID 4; iv) the nucleotide sequence comprises at least one or at least two main regulatory regions comprising at least 80% sequence identity to SEQ ID 5; v) the nucleotide sequence comprises at least one or at least two core regulatory regions, each comprising SEQ ID 2 and SEQ ID 3, and at least 80% sequence identity to the corresponding region within SEQ ID NO:1; vi) the nucleotide sequence comprises at least one or at least two thymine (T) motifs identified by any one of SEQ ID NO: 12-29; vii) the nucleotide sequence comprises a 3'-terminal nucleotide sequence comprising at least part of a translation initiation site; viii) the nucleotide sequence is at least 80% identical to 293 bp of SEQ ID NO:1; ix) the nucleotide sequence has a length up to 2000 bp.

    52. The method of claim 50, wherein the pG1-x promoter is any one of SEQ ID 37-44.

    53. The method of claim 50, wherein the pG1-x promoter is any one of SEQ ID 45-76.

    54. The method of claim 41, wherein the promoter is operably linked to a nucleotide sequence encoding the POI, which nucleic acid is not natively associated with the nucleotide sequence encoding the POI.

    55. The method of claim 41, wherein the promoter has a strength to produce the POI at a transcription rate of at least 15% as compared to the native pGAP promoter of the cell.

    Description

    FIGURES

    [0174] FIG. 1: pG1 sequence analysis for carbon source-related TFBS using MatInspector. pG1 (also referred to as P_GTH1) was initially amplified and cloned from position -965 to -1 (length of 965 bp; the sequence is provided in FIG. 6 (SEQ ID 1; in particular, SEQ ID 9 has been used)). Numbers indicate TFBS which were selected for deletion (listed in Table 2). Associated matrix families are F$CSRE (carbon source response elements, striped boxes), F$ADR (Yeast metabolic regulator, dotted boxes), F$MGCM (Monomeric Gal4-class motifs, filled boxes) and F$YMIG (Yeast GC-Box Proteins, white boxes). Other TFBS might be affected by the deletions (matrix match detail information is given in Table 1). The black dashed box indicates the main regulatory region of pG1 which was identified by the screening of shortened pG1 variants. The asterisk indicates the position of the prominent TAT (position -390 to -374) motif which was also selected for deletion and for mutation. Alternative 5'-starts of the shortened pG1 promoter variants are labeled with arrows and the length of the corresponding variant.

    [0175] FIG. 2: Screening data of the shortened pG1 promoter variants. The geometric mean of the population's specific eGFP fluorescence (fluorescence related to cell volume) is shown for clones expressing eGFP under control of pG1 (clone #8, verified GCN of 1) or a shortened pG1 variant (each 2 clones cultivated in triplicates, selected in pre-screenings) in repressing and inducing growth conditions. Non-expressing wild type P. pastoris cells were used as negative control. Samples were taken during the repressing pre-culture and after 24 and 48 hours induction with feed beads.

    [0176] FIG. 3: Screening data of the TFBS deletion and -TAT mutation variants

    [0177] The geometric mean of the population's specific eGFP fluorescence (fluorescence related to cell volume) is shown for clones expressing eGFP under the control of pG1 (clone #8, verified GCN of 1) or a pG1 variant (up to 9 clones were pool cultivated in 3 wells) in repressing and inducing growth conditions. Wild type P. pastoris cells were used as negative control.

    [0178] FIG. 4: Screening data of the pG1 duplication variants

    [0179] The geometric mean of the population's specific eGFP fluorescence (fluorescence related to cell volume) is shown for clones expressing eGFP under the control of pG1 (clone #8, verified GCN of 1) or a pG1 variant (up to 9 clones were pool cultivated in 3 wells, selected in pre-screenings) in repressing and inducing growth conditions. Wild type P. pastoris cells were used as negative control.

    [0180] FIG. 5: Fed batch cultivation of pG1 and pG1 variants expressing eGFP. Relative eGFP fluorescence was measured from bioreactor samples (diluted to similar biomass densities) using a plate reader and is shown over the feed time (batch end set to 0) in batch (A) and fed batch cultivation (B). A clone expressing eGFP under control of pG1 (#8) was compared to clones expressing under control of a pG1 deletion variant (pG1-2, SEQ ID 211), a TAT mutation variant (pG1-T16, SEQ ID 257), and a duplication variant (pG1-D1240, SEQ ID 49).

    [0181] FIG. 6: pG1 and pG1-x promoter sequences

    [0182] FIG. 6a: Reference sequences

    [0183] FIG. 6b: Sequences of pG1-x promoter

    [0184] Individual Sequence Elements:

    TABLE-US-00001
    Position 8 (SEQ ID 2): [00001] embedded image (e.g. position 293 to 285 in SEQ ID 8)
    Position 9 (SEQ ID 3): [00002] embedded image (e.g. position 275 to 261 in SEQ ID 8)
    Core region (SEQ ID 4): [00003] embedded image (e.g. position 293 to 261 in SEQ ID 8)
    Main regulatory region (SEQ ID 5): [00004] embedded image [00005] embedded image AATTTTCCGGGGATTACGGATAATAC (e.g. position 328 to 211 in SEQ ID 8)
    3'-terminal nucleotide sequence (SEQ ID 6): [00006] embedded image

    [0185] Indications in Sequences:

    [0186] Main regulatory region: bold

    [0187] Core regulatory region: bold, italic and underlined; SEQ ID 2 and 3 double underlined

    [0188] T motif: italic and underlined; may be optionally extended (at the 5'-terminal end of the T motif) by a preceding TA sequence, or (at the 3'-terminal end of the T motif) by a succeeding AT sequence

    [0189] 3'-terminal region: [marking not shown]

    [0190] Region less relevant for promoter activity in the reference pG1 (P_GTH1) sequences: [marking not shown]. One or more nucleotides up to all nucleotides within the region ranging from the 5'-terminal end to -328 (region underlined in FIG. 6a with a dash-dot line) may be substituted or deleted, or further nucleotides may be inserted within such region; however, preferred embodiments still comprise at least one T motif which is (T)n (n=13-20) with or without preceding A or TA nucleotides, or with or without succeeding A or AT nucleotides. Such a less relevant region which can be partially or fully deleted is the region ranging from the 5'-terminal end to the first or 5' main regulatory region (bold) in any one of SEQ ID 37 to SEQ ID 202; preferably, up to 50, 100, 150, 200, 250, 300, 320, or 325 nucleotides of the 5'-terminal end of any one of SEQ ID 37 to SEQ ID 202 can be deleted.

    [0191] Deletion: del (underlined)

    TABLE-US-00002
    (T)n (n=13-20) motifs: may be optionally extended at its 5' end, e.g. by A or TA; or at its 3' end, e.g. by A or AT
    (T)13: SEQ ID 12: TTTTTTTTTTTTT
    (T)14: SEQ ID 13: TTTTTTTTTTTTTT
    (T)15: SEQ ID 14: TTTTTTTTTTTTTTT
    (T)16: SEQ ID 15: TTTTTTTTTTTTTTTT
    (T)17: SEQ ID 16: TTTTTTTTTTTTTTTTT
    (T)18: SEQ ID 17: TTTTTTTTTTTTTTTTTT
    (T)19: SEQ ID 18: TTTTTTTTTTTTTTTTTTT
    (T)20: SEQ ID 19: TTTTTTTTTTTTTTTTTTTT

    TA(T)n (n=13-20) motifs: may be optionally mutated to substitute the A at position 2 for a T (A/T)
    TA(T)13: SEQ ID 20: TATTTTTTTTTTTTT
    TA(T)13 (substituted A/T), SEQ ID 14 (see (T)15): TTTTTTTTTTTTTTT
    TA(T)14: SEQ ID 21: TATTTTTTTTTTTTTT
    TA(T)14 (substituted A/T), SEQ ID 15 (see (T)16): TTTTTTTTTTTTTTTT
    TA(T)15: SEQ ID 22: TATTTTTTTTTTTTTTT
    TA(T)15 (substituted A/T), SEQ ID 16 (see (T)17): TTTTTTTTTTTTTTTTT
    TA(T)16: SEQ ID 23: TATTTTTTTTTTTTTTTT
    TA(T)16 (substituted A/T), SEQ ID 17 (see (T)18): TTTTTTTTTTTTTTTTTT
    TA(T)17: SEQ ID 24: TATTTTTTTTTTTTTTTTT
    TA(T)17 (substituted A/T), SEQ ID 18 (see (T)19): TTTTTTTTTTTTTTTTTTT
    TA(T)18: SEQ ID 25: TATTTTTTTTTTTTTTTTTT
    TA(T)18 (substituted A/T), SEQ ID 19 (see (T)20): TTTTTTTTTTTTTTTTTTTT
    TA(T)19: SEQ ID 26: TATTTTTTTTTTTTTTTTTTT
    TA(T)19 (substituted A/T), SEQ ID 28 (i.e. (T)21): TTTTTTTTTTTTTTTTTTTTT
    TA(T)20: SEQ ID 27: TATTTTTTTTTTTTTTTTTTTT
    TA(T)20 (substituted A/T), SEQ ID 29 (i.e. (T)22): TTTTTTTTTTTTTTTTTTTTTT
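
The motif table above follows a simple pattern, which can be sketched in a few lines of Python. This is only an illustration: the function names `t_motif` and `substitute_a_to_t` are hypothetical and do not appear in the specification. The sketch assumes that a (T)n motif is a run of n thymines with n between 13 and 20, and that the A/T substitution replaces the A at position 2 of a TA(T)n motif, which yields a (T)n+2 run.

```python
def t_motif(n: int, prefix: str = "", suffix: str = "") -> str:
    """Build a (T)n motif, optionally extended at the 5' end (e.g. "A" or "TA")
    or at the 3' end (e.g. "A" or "AT")."""
    if not 13 <= n <= 20:
        raise ValueError("n must be in the range 13-20")
    return prefix + "T" * n + suffix


def substitute_a_to_t(ta_motif: str) -> str:
    """Replace the A at position 2 of a TA(T)n motif with a T."""
    if not ta_motif.startswith("TA"):
        raise ValueError("expected a motif starting with TA")
    return "TT" + ta_motif[2:]


# TA(T)13 becomes a run of 15 thymines after the A/T substitution.
ta_t13 = t_motif(13, prefix="TA")
print(ta_t13, substitute_a_to_t(ta_t13))
```

Under these assumptions, the substituted TA(T)19 and TA(T)20 motifs come out as runs of 21 and 22 thymines, matching the lengths listed for SEQ ID 28 and SEQ ID 29.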

    [0192] FIG. 7:

    [0193] Native pGAP promoter sequence of P. pastoris (GS115) (SEQ ID 260)

    TABLE-US-00003
    #     Name  GS115 PAS*       PIPA*      description
    pGAP  TDH3  PAS_chr2-1_0437  PIPA02510  Glyceraldehyde-3-phosphate dehydrogenase
    *PAS: ORF name in P. pastoris GS115; PIPA: ORF name in P. pastoris type strain DSMZ70382

    [0194] FIG. 7 continued: Transcription factor sequences

    [0195] Rgt1 (PAS_chr1-3_0233) (SEQ ID 261)

    [0196] Cat8-2(PAS_chr4_0540) (SEQ ID 262)

    [0197] Cat8-1(PAS_chr2-1_0757) (SEQ ID 263)

    [0198] FIG. 8: Prior art sequences

    [0199] pG1 (SEQ ID 264), pG1a (SEQ ID 265), pG1b (SEQ ID 266), pG1c (SEQ ID 267), pG1d (SEQ ID 268), pG1e (SEQ ID 269), or pG1f (SEQ ID 270), as described in WO2013050551 A1

    [0200] FIG. 9: Fed batch cultivation of the selected pG1-3 embodiment of SEQ ID 39 (pG1-D1240 (SEQ ID 49)) expressing an alternative scaffold protein as a model protein using (A) the standard fed batch protocol, (B) the space-time yield optimized fed batch protocol (speed fermentation) adapted from Maurer et al. (Microbial Cell Factories, 2006, 5:37)

    DETAILED DESCRIPTION OF THE INVENTION

    [0201] Specific terms as used throughout the specification have the following meaning.

    [0202] The term "carbon source", also referred to as "carbon substrate", as used herein shall mean a fermentable carbon substrate, typically a source carbohydrate, suitable as an energy source for microorganisms, such as those capable of being metabolized by host organisms or production cell lines, in particular sources selected from the group consisting of monosaccharides, oligosaccharides, polysaccharides, alcohols including glycerol, in the purified form, in minimal media or provided in raw materials, such as a complex nutrient material. The carbon source may be used according to the invention as a single carbon source or as a mixture of different carbon sources.

    [0203] A basal carbon source such as used according to the invention typically is a carbon source suitable for cell growth, such as a nutrient for eukaryotic cells. The basal carbon source may be provided in a medium, such as a basal medium or complex medium, but also in a chemically defined medium containing a purified carbon source. The basal carbon source typically is provided in an amount to provide for cell growth, in particular during the growth phase in a cultivation process, for example to obtain cell densities of at least 5 g/L cell dry mass, preferably at least 10 g/L cell dry mass, or at least 15 g/L cell dry mass, e.g. exhibiting viabilities of more than 90% during standard sub-culture steps, preferably more than 95%.

    [0204] According to the invention the basal carbon source is typically used in an excess or surplus amount, which is understood as an excess providing energy to increase the biomass, e.g. during the cultivation of a cell line with a high specific growth rate, such as during the growth phase of a cell line in a batch or fed-batch cultivation process. This surplus amount is particularly in excess of the limited amount of a supplemental carbon source (as used under growth-limited conditions) to achieve a residual concentration in the fermentation broth that is measurable and typically at least 10 fold higher, preferably at least 50 fold or at least 100 fold higher than during feeding the limited amount of the supplemental carbon source.

    [0205] A supplemental carbon source such as used according to the invention typically is a supplemental substrate facilitating the production of fermentation products by production cell lines, in particular in the production phase of a cultivation process. The production phase specifically follows a growth phase, e.g. in batch, fed-batch and continuous cultivation process. The supplemental carbon source specifically may be contained in the feed of a fed-batch process. The supplemental carbon source is typically employed in a cell culture under carbon substrate limited conditions, i.e. using the carbon source in a limited amount.

    [0206] A limited amount of a carbon source or a limited carbon source is herein understood to specifically refer to the type and amount of a carbon substrate facilitating the production of fermentation products by production cell lines, in particular in a cultivation process with controlled growth rates of less than the maximum growth rate. The production phase specifically follows a growth phase, e.g. in batch, fed-batch and continuous cultivation process. Cell culture processes may employ batch culture, continuous culture, and fed-batch culture. Batch culture is a culture process by which a small amount of a seed culture solution is added to a medium and cells are grown without adding an additional medium or discharging a culture solution during culture. Continuous culture is a culture process by which a medium is continuously added and discharged during culture. The continuous culture also includes perfusion culture. Fed-batch culture, which is an intermediate between the batch culture and the continuous culture and also referred to as semi-batch culture, is a culture process by which a medium is continuously or sequentially added during culture but, unlike the continuous culture, a culture solution is not continuously discharged.

    [0207] Specifically preferred is a fed-batch process which is based on feeding of a growth limiting nutrient substrate to a culture. The fed-batch strategy, including single fed-batch or repeated fed-batch fermentation, is typically used in bio-industrial processes to reach a high cell density in the bioreactor. The controlled addition of the carbon substrate directly affects the growth rate of the culture and helps to avoid overflow metabolism or the formation of unwanted metabolic byproducts. Under carbon source limited conditions, the carbon source specifically may be contained in the feed of a fed-batch process. Thereby, the carbon substrate is provided in a limited amount.

    [0208] Also in chemostat or continuous culture as described herein, the growth rate can be tightly controlled.

    [0209] The limited amount of a carbon source is herein particularly understood as the amount of a carbon source necessary to keep a production cell line under growth-limited conditions, e.g. in a production phase or production mode. Such a limited amount may be employed in a fed-batch process, where the carbon source is contained in a feed medium and supplied to the culture at low feed rates for sustained energy delivery, e.g. to produce a POI, while keeping the biomass at low specific growth rates. A feed medium is typically added to a fermentation broth during the production phase of a cell culture.

    [0210] The limited amount of a carbon source may, for example, be determined by the residual amount of the carbon source in the cell culture broth, which is below a predetermined threshold or even below the detection limit as measured in a standard (carbohydrate) assay. The residual amount typically would be determined in the fermentation broth upon harvesting a fermentation product.

    [0211] The limited amount of a carbon source may as well be determined by defining the average feed rate of the carbon source to the fermenter, e.g. as determined by the amount added over the full cultivation process, e.g. the fed-batch phase, per cultivation time, to determine a calculated average amount per time. This average feed rate is kept low to ensure complete usage of the supplemental carbon source by the cell culture, e.g. between 0.6 g L⁻¹ h⁻¹ (g carbon source per L initial fermentation volume and h time) and 25 g L⁻¹ h⁻¹, preferably between 1.6 g L⁻¹ h⁻¹ and 20 g L⁻¹ h⁻¹.
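
The average feed rate described in the preceding paragraph can be illustrated with a short calculation. The following is a minimal sketch, assuming that the total amount of carbon source fed, the initial fermentation volume and the feed duration are known. The function names and the example numbers are hypothetical and are not taken from the specification.

```python
def average_feed_rate(total_carbon_g: float, initial_volume_l: float,
                      feed_time_h: float) -> float:
    """Average feed rate in g carbon source per L initial volume per h."""
    if initial_volume_l <= 0 or feed_time_h <= 0:
        raise ValueError("volume and time must be positive")
    return total_carbon_g / (initial_volume_l * feed_time_h)


def is_limited_feed(rate: float, low: float = 0.6, high: float = 25.0) -> bool:
    """Check a rate against the stated range (default 0.6 to 25 g/(L h))."""
    return low <= rate <= high


# Hypothetical example: 300 g carbon source fed over 30 h into 1 L initial volume.
rate = average_feed_rate(300.0, 1.0, 30.0)
print(rate, is_limited_feed(rate), is_limited_feed(rate, low=1.6, high=20.0))
```

The second call to `is_limited_feed` uses the preferred range of 1.6 to 20 g L⁻¹ h⁻¹.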

    [0212] The limited amount of a carbon source may also be determined by measuring the specific growth rate, which specific growth rate is kept low, e.g. lower than the maximum specific growth rate, during the production phase, e.g. within a predetermined range, such as in the range of 0.001 h⁻¹ to 0.20 h⁻¹, or 0.005 h⁻¹ to 0.20 h⁻¹, preferably between 0.01 h⁻¹ and 0.15 h⁻¹.
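
As an illustration of how a measured specific growth rate might be compared with the ranges given above, the sketch below estimates the rate from two biomass measurements. It relies on the standard exponential growth relation, mu = ln(X_end / X_start) / elapsed time, which is a textbook definition and not a formula stated in the specification. It also assumes that total biomass amounts (not concentrations) are used, so that volume changes during feeding do not distort the estimate. All names and numbers are hypothetical.

```python
import math


def specific_growth_rate(biomass_start_g: float, biomass_end_g: float,
                         elapsed_h: float) -> float:
    """Specific growth rate in 1/h, assuming exponential growth between
    the two measurements of total biomass."""
    if biomass_start_g <= 0 or biomass_end_g <= 0 or elapsed_h <= 0:
        raise ValueError("biomass and time must be positive")
    return math.log(biomass_end_g / biomass_start_g) / elapsed_h


def in_production_range(mu: float, low: float = 0.001, high: float = 0.20) -> bool:
    """Check mu against a predetermined range (default 0.001 to 0.20 1/h)."""
    return low <= mu <= high


# Hypothetical example: total biomass doubles from 100 g to 200 g within 10 h.
mu = specific_growth_rate(100.0, 200.0, 10.0)
print(round(mu, 4), in_production_range(mu))
```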

    [0213] Specifically, a feed medium is used which is chemically defined and methanol-free.

    [0214] The term chemically defined with respect to cell culture medium, such as a minimal medium or feed medium in a fed-batch process, shall mean a cultivation medium suitable for the in vitro cell culture of a production cell line, in which all of the chemical components and (poly)peptides are known. Typically, a chemically defined medium is entirely free of animal-derived components and represents a pure and consistent cell culture environment.

    [0215] The term cell line as used herein refers to an established clone of a particular cell type that has acquired the ability to proliferate over a prolonged period of time. The term host cell line refers to a cell line as used for expressing an endogenous or recombinant gene or products of a metabolic pathway to produce polypeptides or cell metabolites mediated by such polypeptides. A production host cell line or production cell line is commonly understood to be a cell line ready-to-use for cultivation in a bioreactor to obtain the product of a production process, such as a POI. The term eukaryotic host or eukaryotic cell line shall mean any eukaryotic cell or organism, which may be cultivated to produce a POI or a host cell metabolite. It is well understood that the term does not include human beings.

    [0216] The term "cell culture" or "cultivation", also termed "fermentation", with respect to a host cell line means the maintenance of cells in an artificial, e.g. in vitro, environment, under conditions favoring growth, differentiation or continued viability, in an active or quiescent state, of the cells, specifically in a controlled bioreactor according to methods known in the industry.

    [0217] When cultivating a cell culture using the culture media of the present invention, the cell culture is brought into contact with the media in a culture vessel or with substrate under conditions suitable to support cultivation of the cell culture. In certain embodiments, a culture medium as described herein is used to culture cells according to standard cell culture techniques that are well-known in the art. In various aspects of the invention, a culture medium is provided that can be used for the growth of eukaryotic cells, specifically yeast or filamentous fungi.

    [0218] Cell culture media provide the nutrients necessary to maintain and grow cells in a controlled, artificial and in vitro environment. Characteristics and compositions of the cell culture media vary depending on the particular cellular requirements. Important parameters include osmolality, pH, and nutrient formulations. Feeding of nutrients may be done in a continuous or discontinuous mode according to methods known in the art. The culture media used according to the invention are particularly useful for producing recombinant proteins.

    [0219] Whereas a batch process is a cultivation mode in which all the nutrients necessary for cultivation of the cells are contained in the initial culture medium, without additional supply of further nutrients during fermentation, in a fed-batch process, after a batch phase, a feeding phase takes place in which one or more nutrients are supplied to the culture by feeding. The purpose of nutrient feeding is to increase the amount of biomass in order to increase the amount of recombinant protein as well. Although in most cultivation processes the mode of feeding is critical and important, the present invention employing the promoter of the invention is not restricted with regard to a certain mode of cultivation.

    [0220] In certain embodiments, the method of the invention is a fed-batch process. Specifically, a host cell transformed with a nucleic acid construct encoding a desired recombinant POI, is cultured in a growth phase medium and transitioned to a production phase medium in order to produce a desired recombinant POI.

    [0221] In another embodiment, host cells of the present invention are cultivated in continuous mode, e.g. a chemostat. A continuous fermentation process is characterized by a defined, constant and continuous rate of feeding of fresh culture medium into the bioreactor, whereby culture broth is at the same time removed from the bioreactor at the same defined, constant and continuous removal rate. By keeping culture medium, feeding rate and removal rate at the same constant level, the cultivation parameters and conditions in the bioreactor remain constant.

    [0222] A stable cell culture as described herein is specifically understood to refer to a cell culture maintaining the genetic properties, specifically keeping the POI production level high, e.g. at least at a g level, even after about 20 generations of cultivation, preferably at least 30 generations, more preferably at least 40 generations, most preferred of at least 50 generations. Specifically, a stable recombinant host cell line is provided which is considered a great advantage when used for industrial scale production.

    [0223] The cell culture of the invention is particularly advantageous for methods on an industrial manufacturing scale, e.g. with respect to both the volume and the technical system, in combination with a cultivation mode that is based on feeding of nutrients, in particular a fed-batch or batch process, or a continuous or semi-continuous process (e.g. chemostat).

    [0224] The term expression or expression system or expression cassette refers to nucleic acid molecules containing a desired coding sequence and control sequences in operable linkage, so that hosts transformed or transfected with these sequences are capable of producing the encoded proteins or host cell metabolites. In order to effect transformation, the expression system may be included in a vector; however, the relevant DNA may also be integrated into the host chromosome. Expression may refer to secreted or non-secreted expression products, including polypeptides or metabolites.

    [0225] Expression constructs or vectors or plasmid used herein are defined as DNA sequences that are required for the transcription of cloned recombinant nucleotide sequences, i.e. of recombinant genes and the translation of their mRNA in a suitable host organism. Expression vectors or plasmids usually comprise an origin for autonomous replication in the host cells, selectable markers (e.g. an amino acid synthesis gene or a gene conferring resistance to antibiotics such as zeocin, kanamycin, G418 or hygromycin), a number of restriction enzyme cleavage sites, a suitable promoter sequence and a transcription terminator, which components are operably linked together. The terms plasmid and vector as used herein include autonomously replicating nucleotide sequences as well as genome integrating nucleotide sequences.

    [0226] The expression construct of the invention specifically comprises a promoter of the invention, operably linked to a nucleotide sequence encoding a POI under the transcriptional control of said promoter, which promoter is not natively associated with the coding sequence of the POI.

    [0227] The term heterologous as used herein with respect to a nucleotide or amino acid sequence or protein, refers to a compound which is either foreign, i.e. exogenous, such as not found in nature, to a given host cell; or that is naturally found in a given host cell, e.g., is endogenous, however, in the context of a heterologous construct, e.g. employing a heterologous nucleic acid. The heterologous nucleotide sequence as found endogenously may also be produced in an unnatural, e.g. greater than expected or greater than naturally found, amount in the cell. The heterologous nucleotide sequence, or a nucleic acid comprising the heterologous nucleotide sequence, possibly differs in sequence from the endogenous nucleotide sequence but encodes the same protein as found endogenously. Specifically, heterologous nucleotide sequences are those not found in the same relationship to a host cell in nature. Any recombinant or artificial nucleotide sequence is understood to be heterologous. An example of a heterologous polynucleotide is a nucleotide sequence not natively associated with the promoter according to the invention, e.g. to obtain a hybrid promoter, or operably linked to a coding sequence, as described herein. As a result, a hybrid or chimeric polynucleotide may be obtained. A further example of a heterologous compound is a POI encoding polynucleotide operably linked to a transcriptional control element, e.g., a promoter of the invention, to which an endogenous, naturally-occurring POI coding sequence is not normally operably linked.

    [0228] The term "variant" as used herein in the context of the present invention shall refer to any sequence with a specific sequence identity or homology to a comparable parent sequence. A variant is specifically any sequence derived from a parent sequence, e.g. by size variation, such as (terminal or non-terminal, such as interstitial, i.e. with deletions or insertions within the nucleotide sequence) elongation or fragmentation, by mutation, or by hybridization (including combination of sequences).

    [0229] The pG1-x promoter as described herein is specifically an artificial variant of the native (wild-type) pG1 promoter. Though there is a certain degree of sequence identity to the native structure, it is well understood that the materials, methods and uses of the invention, e.g. specifically referring to isolated nucleic acid sequences, amino acid sequences, expression constructs, transformed host cells and recombinant proteins, are man-made or synthetic, and are therefore not considered as a result of law of nature.

    [0230] The promoter herein referred to as pG1-x promoter is a variant of the pG1 promoter and its nucleotide sequence may be produced by mutagenesis of the pG1 promoter which is used as a parent sequence for producing a variant. A pG1-x promoter includes a promoter comprising two, three, four or more copies of SEQ ID 2, SEQ ID 3, SEQ ID 4 or SEQ ID 5.

    [0231] A series of pG1-x promoters is e.g., exemplified by the promoter comprising or consisting of any of the sequences exemplified in FIG. 6b, in particular any of the following sequences:

    [0232] a) SEQ ID 37-44, preferably any of SEQ ID 45-76;

    [0233] b) SEQ ID 77-80, preferably any of SEQ ID 81-112;

    [0234] c) SEQ ID 113-114, preferably any of SEQ ID 115-130;

    [0235] d) SEQ ID 131-132, preferably any of SEQ ID 133-148;

    [0236] e) SEQ ID 149-150, preferably any of SEQ ID 151-166;

    [0237] f) SEQ ID 167-168, preferably any of SEQ ID 169-184;

    [0238] g) SEQ ID 185-186, preferably any of SEQ ID 187-202;

    [0239] h) SEQ ID 203-204, preferably any of SEQ ID 205-220;

    [0240] i) SEQ ID 221-222, preferably any of SEQ ID 223-238;

    [0241] j) SEQ ID 239-240, preferably any of SEQ ID 241-256; and

    [0242] k) SEQ ID 32-36 or SEQ ID 257-259.

    [0243] A pG1-x promoter also includes 3' fragments of any one of SEQ ID 37 to SEQ ID 202 wherein part or all of the 5'-terminal end up to the first or 5' main regulatory region has been deleted; preferably, up to 50, 100, 150, 200, 250, 300, 320, or 325 nucleotides of the 5'-terminal end of any one of SEQ ID 37 to SEQ ID 202 are deleted.
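
    The 5'-terminal deletions listed above can be illustrated with a minimal sketch. The function name, the placeholder sequence and the rule of skipping deletions that would remove the whole sequence are assumptions made for illustration only; the sequence shown is not any of SEQ ID 37 to SEQ ID 202.

```python
# Hypothetical helper: build 3' fragments by deleting n nucleotides from the 5' end.
DELETION_LENGTHS = (50, 100, 150, 200, 250, 300, 320, 325)  # values named in the text


def five_prime_truncations(promoter, deletion_lengths=DELETION_LENGTHS):
    """Return {n: fragment}, where fragment lacks the first n nucleotides (5' end)."""
    fragments = {}
    for n in deletion_lengths:
        # Assumption: a deletion as long as the sequence itself is not a useful fragment.
        if n < len(promoter):
            fragments[n] = promoter[n:]
    return fragments


# Placeholder 400-nt sequence, NOT a real promoter sequence.
toy_promoter = "ACGT" * 100
fragments = five_prime_truncations(toy_promoter)
for n, fragment in fragments.items():
    print(f"5' deletion of {n} nt leaves a 3' fragment of {len(fragment)} nt")
```

    Each fragment retains the 3' end of the parent sequence, which is where the regulatory regions and the transcription start are expected to lie according to the description above.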

    [0244] The pG1-x promoter is characterized by having the same or an increased promoter strength and induction ratio as compared to the pG1 promoter, wherein

    [0245] the promoter strength is at least 1.1-fold increased in the induced state as compared to the pG1 promoter, and/or

    [0246] the induction ratio is at least 1.1-fold increased as compared to the pG1 promoter.
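
    One possible reading of this criterion can be sketched as follows. The function name and arguments are hypothetical, and the interpretation that "the same or increased" means a fold change of at least 1.0 for both properties (without a measurement tolerance) is an assumption made for illustration.

```python
# Hypothetical check of the "same or increased, with at least one 1.1-fold gain" criterion.
def meets_pg1x_criteria(strength_variant, strength_pg1,
                        induction_variant, induction_pg1,
                        min_fold=1.1):
    """Return True if neither property is decreased and at least one is >= min_fold."""
    strength_fold = strength_variant / strength_pg1      # induced-state strength vs. pG1
    induction_fold = induction_variant / induction_pg1   # induction ratio vs. pG1

    # Assumption: "the same or increased" is read as fold change >= 1.0 for both.
    not_decreased = strength_fold >= 1.0 and induction_fold >= 1.0
    # "and/or": at least one of the two properties reaches the 1.1-fold threshold.
    improved = strength_fold >= min_fold or induction_fold >= min_fold
    return not_decreased and improved


# Illustrative numbers only (arbitrary units), not measured data.
print(meets_pg1x_criteria(120, 100, 30, 30))  # stronger, same induction ratio
print(meets_pg1x_criteria(120, 100, 20, 30))  # stronger, but induction ratio decreased
```

    In practice a tolerance for "substantially the same" could be applied to the 1.0 bound; the strict bound is used here only to keep the sketch simple.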

    [0247] Further pG1-x variants are feasible e.g., using the exemplified pG1-x promoter of FIG. 6b, or size variants, in particular elongated variants or fragments thereof, as parent sequences to produce variants by mutagenesis of certain regions, in particular such, that the essential elements and functions of the promoter be maintained or even improved. The pG1-x promoter variants may e.g., be derived from any of the exemplified pG1-x promoter sequences by mutagenesis to produce sequences suitable for use as a promoter in recombinant cell lines. Such variant promoter may be obtained from a library of mutant sequences by selecting those library members with predetermined properties. Variant promoters may have the same or even improved properties, e.g. improved in the promoter strength, the induction of POI production, with increased differential effect under repressing and de-repressing conditions (in particular the induction ratio). The variant promoter may also comprise a nucleotide sequence from analogous sequences, e.g. from eukaryotic species other than Pichia pastoris or from a genus other than Pichia, such as from K. lactis, Z. rouxii, P. stipitis, H. polymorpha.

    [0248] The term functionally active as used herein with respect to e.g., a promoter variant, the pG1-x promoter or variant of a pG1-x promoter as described herein or variant of the pG1 promoter, means a variant sequence resulting from modification of a parent sequence by mutagenesis, specifically by insertion, deletion or substitution of one or more nucleotides within the sequence or at either or both of the distal ends of the sequence, and which modification does not affect (in particular impair) the activity of this sequence. Regarding the pG1-x promoter as described herein, the function and activity is specifically characterized by the promoter activity and strength as well as the induction ratio.

    [0249] Functionally active promoter variants as described herein are specifically characterized by exhibiting substantially the same promoter activity as the pG1 promoter (+/-10%, or +/-5%), or even higher.

    [0250] Functionally active promoter variants as described herein are specifically characterized by exhibiting substantially the same regulatable properties as the pG1 promoter e.g., measured by the induction ratio (+/-10%, or +/-5%), or an even higher induction ratio.

    [0251] The term promoter as used herein refers to a DNA sequence capable of controlling the expression of a coding sequence or functional RNA. Promoter activity may be assessed by its transcriptional efficiency. This may be determined directly by measurement of the amount of mRNA transcription from the promoter, e.g. by Northern blotting, or indirectly by measurement of the amount of gene product expressed from the promoter.

    [0253] The pG1-x promoter as described herein specifically initiates, regulates, or otherwise mediates or controls the expression of a coding DNA. Promoter DNA and coding DNA may be from the same gene or from different genes, and may be from the same or different organisms.

    [0254] The pG1-x promoter as described herein is specifically understood as a regulatable promoter, in particular a carbon source regulatable promoter with different promoter strength in the repressed and induced state.

    [0255] The strength of the promoter of the invention specifically refers to its transcription strength, represented by the efficiency of initiation of transcription occurring at that promoter with high or low frequency. The higher the transcription strength, the more frequently transcription will occur at that promoter. Promoter strength is important, because it determines how often a given mRNA sequence is transcribed, effectively giving higher priority for transcription to some genes over others, leading to a higher concentration of the transcript. A gene that codes for a protein that is required in large quantities, for example, typically has a relatively strong promoter. The RNA polymerase can only perform one transcription task at a time and so must prioritize its work to be efficient. Differences in promoter strength are selected to allow for this prioritization.

    [0256] According to the invention the regulatable promoter is relatively strong in the fully induced state, which is typically understood as the state of about maximal activity.

    [0257] The relative strength is commonly determined with respect to a comparable promoter, such as the pG1 promoter, or a standard promoter, such as the respective pGAP promoter of the cell as used as the host cell. The frequency of transcription is commonly understood as the transcription rate, e.g. as determined by the amount of a transcript in a suitable assay, e.g. RT-PCR or Northern blotting. For example, the transcription strength of a promoter according to the invention is determined in the host cell which is P. pastoris and compared to the native pGAP promoter of P. pastoris.

    [0258] The strength of a promoter to express a gene of interest is commonly understood as the expression strength or the capability to support a high expression level/rate. For example, the expression and/or transcription strength of a promoter of the invention is determined in the host cell which is P. pastoris and compared to the native pGAP promoter of P. pastoris.

    [0259] The comparative transcription strength employing the pGAP promoter as a reference (standard) may be determined by standard means, such as by measuring the quantity of transcripts, e.g. employing a microarray, or else in a cell culture, such as by measuring the quantity of respective gene expression products in recombinant cells. An exemplary test is illustrated in the Examples section.

    [0260] In particular, the transcription rate may be determined by the transcription strength on a microarray, or with quantitative real time PCR (qRT-PCR) where microarray or qRT-PCR data show the difference of expression level between conditions with high growth rate and conditions with low growth rate, or conditions employing different media composition, and a high signal intensity as compared to the native pGAP promoter.

    [0261] The expression rate may, for example, be determined by the amount of expression of a reporter gene, such as eGFP.

    [0262] The pG1-x promoter as described herein exerts a relatively high transcription strength, reflected by a transcription rate or transcription strength of at least 15% as compared to the native pGAP promoter in the host cell, sometimes called homologous pGAP promoter. Preferably the transcription rate or strength is at least 20%, in specifically preferred cases at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90% and at least 100% or even higher, such as at least 150% or at least 200% as compared to the native pGAP promoter, e.g. determined in the eukaryotic cell selected as host cell for producing the POI.
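
    The comparison against the native pGAP promoter amounts to a simple percentage, sketched below. The function names and the example signal values are hypothetical; the signals could be transcript amounts or reporter readouts measured under the same conditions in the same host cell.

```python
# Hypothetical helpers: express promoter strength as a percentage of the pGAP reference.
def relative_strength_percent(promoter_signal, pgap_signal):
    """Transcription (or expression) strength as a percentage of the native pGAP signal."""
    if pgap_signal <= 0:
        raise ValueError("The pGAP reference signal must be positive.")
    return 100.0 * promoter_signal / pgap_signal


def is_relatively_strong(promoter_signal, pgap_signal, min_percent=15.0):
    """Apply the 'at least 15% of native pGAP' threshold named in the text."""
    return relative_strength_percent(promoter_signal, pgap_signal) >= min_percent


# Illustrative numbers only (arbitrary units), not measured data.
print(relative_strength_percent(30, 200))   # 15.0
print(is_relatively_strong(20, 200))        # False, since 10% is below the threshold
```

    The preferred higher thresholds (20%, 30% and so on up to 200%) can be tested by passing a different min_percent value.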

    [0263] The native pGAP promoter, a constitutive promoter present in most living organisms, typically initiates expression of the gap gene encoding glyceraldehyde-3-phosphate dehydrogenase (GAPDH). GAPDH (EC 1.2.1.12), a key enzyme of glycolysis and gluconeogenesis, plays a crucial role in catabolic and anabolic carbohydrate metabolism.

    [0264] The native pGAP promoter specifically is active in a recombinant eukaryotic cell in a similar way as in a native eukaryotic cell of the same species or strain, including the unmodified (non-recombinant) or recombinant eukaryotic cell. Such native pGAP promoter is commonly understood to be an endogenous promoter, thus, homologous to the eukaryotic cell, and serves as a standard or reference promoter for comparison purposes.

    [0265] For example, a native pGAP promoter of P. pastoris is the unmodified, endogenous promoter sequence in P. pastoris, as used to control the expression of GAPDH in P. pastoris, e.g. having the sequence shown in FIG. 13: native pGAP promoter sequence of P. pastoris (GS115) (SEQ ID 260). If P. pastoris is used as a host for producing a POI according to the invention, the transcription strength or rate of the promoter according to the invention is compared to such native pGAP promoter of P. pastoris.

    [0266] As another example, a native pGAP promoter of S. cerevisiae is the unmodified, endogenous promoter sequence in S. cerevisiae, as used to control the expression of GAPDH in S. cerevisiae. If S. cerevisiae is used as a host for producing a POI according to the invention, the transcription strength or rate of the promoter according to the invention is compared to such native pGAP promoter of S. cerevisiae.

    [0267] Therefore, the relative expression or transcription strength of a promoter according to the invention is usually compared to the native pGAP promoter of a cell of the same species or strain that is used as a host for producing a POI.

    [0268] The term regulatable with respect to a pG1-x promoter or pG1 promoter as used herein shall refer to a promoter that is repressed in a eukaryotic cell in the presence of an excess amount of a carbon source (nutrient or basal substrate) in the growth phase of a batch culture, and de-repressed to exert strong promoter activity in the production phase of a production cell line, e.g. upon reduction of the amount of carbon, such as upon feeding of a growth limiting carbon source (nutrient or supplemental substrate) to a culture according to the fed-batch strategy. In this regard, the term regulatable is understood as carbon source-limit regulatable or glucose-limit regulatable, referring to the de-repression of a promoter by carbon consumption, reduction, shortcoming or depletion, or by limited addition of the carbon source so that it is readily consumed by the cells.

    [0269] The functionally active pG1-x promoter as described herein is a relatively strong regulatable promoter that is silenced or repressed under cell growth conditions (growth phase), and activated or de-repressed under production condition (production phase), and therefore suitable for inducing POI production in a production cell line by limiting the carbon source.

    [0270] Specifically, the promoter as described herein is carbon source regulatable with a differential promoter strength as determined in a test comparing its strength in the presence of glucose and glucose limitation, showing that it is still repressed at relatively high glucose concentrations, preferably at concentrations of at least 10 g/L, preferably at least 20 g/L. Specifically the promoter according to the invention is fully induced at limited glucose concentrations and glucose threshold concentrations fully inducing the promoter, which threshold is less than 20 g/L, preferably less than 10 g/L, less than 1 g/L, even less than 0.1 g/L or less than 50 mg/L, preferably with a full transcription strength of e.g. at least 50% of the native, homologous pGAP promoter, at glucose concentrations of less than 40 mg/L.

    [0271] Preferably the induction ratio is understood as a differential promoter strength which is determined by the initiation of POI production upon switching to inducing conditions below a predetermined carbon source threshold, and compared to the strength in the repressed state. The transcription strength commonly is understood as the strength in the fully induced state, i.e. showing about maximum activities under de-repressing conditions. The differential promoter strength is, e.g. determined according to the efficiency or yield of POI production in a recombinant host cell line under de-repressing conditions as compared to repressing conditions, or else by the amount of a transcript. The regulatable promoter according to the invention has a preferred differential promoter strength, which is at least 2 fold, more preferably at least 5 fold, even more preferred at least 10 fold, more preferred at least 20 fold, more preferably at least 30, 40, 50, or 100 fold in the de-repressed state compared to the repressed state, also understood as fold induction.
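
    The fold induction described above can be sketched as a ratio of two measurements. The function names and example values are hypothetical, and the choice of readout (POI yield or transcript amount) is left open, as in the text.

```python
# Hypothetical helpers: fold induction and the highest preferred tier it reaches.
PREFERRED_FOLDS = (2, 5, 10, 20, 30, 40, 50, 100)  # tiers named in the text


def fold_induction(derepressed_signal, repressed_signal):
    """Differential promoter strength: de-repressed state relative to repressed state."""
    if repressed_signal <= 0:
        raise ValueError("The repressed-state signal must be positive.")
    return derepressed_signal / repressed_signal


def highest_preferred_tier(fold):
    """Return the largest preferred tier met by the fold induction, or None."""
    return max((tier for tier in PREFERRED_FOLDS if fold >= tier), default=None)


# Illustrative numbers only (arbitrary units), not measured data.
fold = fold_induction(500, 20)
print(fold, highest_preferred_tier(fold))  # 25.0 20
```

    A repressed-state signal close to the detection limit makes the ratio unstable, so a background-corrected readout would typically be used before applying such a calculation.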

    [0272] The term sequence identity of a variant as compared to a parent sequence indicates the degree of identity (or homology) in that two or more nucleotide sequences have the same or conserved base pairs at a corresponding position, to a certain degree, up to a degree close to 100%. A homologous sequence typically has at least about 50% nucleotide sequence identity, preferably at least about 60% identity, more preferably at least about 70% identity, more preferably at least about 80% identity, more preferably at least about 90% identity, more preferably at least about 95% identity.

    [0273] Percent (%) identity with respect to the nucleotide sequence e.g., of a promoter or a gene, is defined as the percentage of nucleotides in a candidate DNA sequence that is identical with the nucleotides in the DNA sequence, after aligning the sequence and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent nucleotide sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared. For purposes of the present invention, the sequence identity between two nucleotide sequences is determined using the NCBI BLAST program version 2.2.29 (Jan. 6, 2014) with blastn set at the following exemplary parameters: Word Size: 11; Expect value: 10; Gap costs: Existence=5, Extension=2; Filter=low complexity activated; Match/Mismatch Scores: 2,-3; Filter String: L; m.
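
    For two sequences that have already been aligned (e.g. by blastn with the parameters above), percent identity can be computed as sketched below. The function name is hypothetical, and the use of the alignment length (including gap columns) as the denominator is an assumption; other conventions divide by the length of one of the sequences.

```python
# Hypothetical helper: percent identity of two pre-aligned, gapped nucleotide sequences.
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical, non-gap columns as a percentage of all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("Aligned sequences must have the same length.")
    if not aligned_a:
        raise ValueError("Aligned sequences must not be empty.")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap  # a gap aligned to a gap is not counted as a match
    )
    # Assumption: the denominator is the full alignment length, gaps included.
    return 100.0 * matches / len(aligned_a)


# Toy alignment, NOT derived from any SEQ ID: 7 identical columns out of 9.
print(round(percent_identity("ACGT-ACGT", "ACGTTACCT"), 1))  # 77.8
```

    The sketch does not perform the alignment itself; it only evaluates an alignment produced elsewhere.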

    [0274] The term mutagenesis as used in the context of the present invention shall refer to a method of providing mutants of a nucleotide sequence, e.g. through insertion, deletion and/or substitution of one or more nucleotides, so as to obtain variants thereof with at least one change in the non-coding or coding region. Mutagenesis may be through random, semi-random or site directed mutation. Specific pG1-x promoter variants are derived from the pG1 promoter sequence by a mutagenesis method using the pG1 nucleotide sequence as a parent sequence. Such mutagenesis methods encompass those methods of engineering the nucleic acid or de novo synthesizing a nucleotide sequence using the pG1 promoter sequence information as a template. Specific mutagenesis methods apply rational promoter engineering.

    [0275] The pG1-x promoter may be produced by mutagenesis of the pG1 promoter, and variants of the pG1-x promoter as described herein may further be produced, including functionally active variants, employing standard techniques. The promoter may e.g. be modified to generate promoter variants with altered expression levels and regulatory properties. For instance, a promoter library may be prepared by mutagenesis of selected promoter sequences, which may be used as parent molecules, e.g. to fine-tune the gene expression in eukaryotic cells by analyzing variants for their expression under different fermentation strategies and selecting suitable variants. A synthetic library of variants may be used, e.g. to select a promoter matching the requirements for producing a selected POI. Such variants may have increased expression efficiency in eukaryotic host cells and differential expression under carbon source rich and limiting conditions. Typically large randomized gene libraries are produced with a high gene diversity, which may be selected according to a specifically desired genotype or phenotype.

    [0276] Some of the preferred pG1-x promoters as described herein are size variants of the pG1 promoter and comprise more than one copy of certain elements or regions of the promoter, or comprise one or more (the same or different) fragments of the pG1 promoter.

    [0277] Specific mutagenesis methods provide for point mutations of one or more nucleotides in a sequence, in particular tandem point mutations, such as to change at least 2, 3, 4, 5, 6, 7, 8, 9, 10, or even more continuous nucleotides within the nucleotide sequence of the promoter. Such mutation is typically at least one of a deletion, insertion, and/or substitution of one or more nucleotides. The promoter sequence may be mutated at the distal ends, in particular within the 5'-region which amounts to up to 50% of the nucleotide sequence, which may be highly variable without substantially losing the promoter activity. The promoter sequence may specifically be mutated within the main regulatory region, yet, it is preferred that the sequence identity to the pG1 parent main regulatory region and in particular to the parent core regulatory region is high, such as e.g. at least 80%. Within the main regulatory region, but outside the core regulatory region the variability of the sequence may be higher so as to obtain a sequence identity of less than 80%.

    [0278] The core regulatory region specifically incorporates the SEQ ID 2 and SEQ ID 3, which represent transcription factor binding sites (TFBS) and an interstitional region between SEQ ID 2 and SEQ ID 3.

    [0279] The nucleotide sequence identified as SEQ ID 2 comprises at least part of the TFBS recognized by Rgt1, Cat8-1 and Cat8-2.

    [0280] The nucleotide sequence identified as SEQ ID 3 comprises at least part of the TFBS recognized by Rgt1, Cat8-1 and Cat8-2.

    [0281] Specifically, the nucleotide sequence between SEQ ID 2 and SEQ ID 3 (the interstitional sequence) may be mutated to a non-homologous sequence (e.g., with a sequence identity of less than 50%) or even be deleted.

    [0282] Any mutations within the SEQ ID 2 and SEQ ID 3 are specifically conservative, i.e. such as to maintain (or improve) the recognition by the respective transcription factor. Upon engineering such conservative mutants, the sequence identity within the SEQ ID 2 and/or SEQ ID 3 nucleotide sequence is at least 90%, preferably at least 95%.

    [0283] The main regulatory region comprises or consists of the nucleotide sequence identified by SEQ ID 5. Such region comprises the core regulatory region and further non-core regulatory region, which comprises essential elements of the pG1 promoter and which may be mutated to a certain extent to produce the pG1-x promoter as described herein.

    [0284] Specific regions of site directed mutagenesis are e.g., the non-core regulatory region of the pG1 or the pG1-x promoter (inside or outside the main regulatory region). However, specific mutants may as well be prepared by mutagenesis methods directed to the core regulatory region of the promoter, keeping a certain degree of sequence identity to maintain the promoter function. Further specific regions are outside or within the main regulatory region. Specifically, the promoter may comprise a hybrid nucleotide sequence, e.g. comprising the core regulatory region of the pG1 promoter and one or more regions of an alternative (native or artificial) promoter. For example, the translation initiation site at the 3'-region (specifically the 3'-end, which comprises at least 10 terminal nucleotides, or at least 15 terminal nucleotides) of a promoter other than the pG1 promoter may be used to substitute the translation initiation site of the pG1 promoter.

    [0285] Specific mutations refer to the duplication of selected regions (or motifs) of the pG1 promoter e.g., the T motif or the extended T motif. Such selected motifs may be elongated by additional nucleotides or shortened at one or both distal ends of the motif, or within the motif. The native pG1 sequence comprises a TAT motif consisting of the nucleotides T followed by A followed by T15 (SEQ ID 14). Such TAT motif 5'-TATTTTTTTTTTTTTTT-3' (SEQ ID 22) has turned out to have a positive effect on the promoter strength, which may even be increased by duplicating the TAT motif, or inserting at least 2, or 3, or 4 copies of the TAT motif, either the same TAT motif or using an alternative T motif, extended T motif (e.g. a TAT motif), which comprises at least the T13 motif (SEQ ID 12).
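
    Duplication of the TAT motif can be illustrated with a minimal string-level sketch. The function names and the flanking placeholder sequence are hypothetical; the motif is built from the description "T followed by A followed by T15", and tandem insertion at the position of the first motif is an assumption about how copies might be arranged.

```python
# Motif built from the textual description: T, then A, then fifteen T (17 nt in total).
TAT_MOTIF = "TA" + "T" * 15


def count_motif(sequence, motif=TAT_MOTIF):
    """Count non-overlapping occurrences of the motif in the sequence."""
    return sequence.upper().count(motif)


def duplicate_motif(sequence, motif=TAT_MOTIF, copies=2):
    """Replace the first motif occurrence by `copies` tandem copies of the motif."""
    sequence = sequence.upper()
    index = sequence.find(motif)
    if index < 0:
        raise ValueError("Motif not found in sequence.")
    # Assumption: copies are placed in tandem at the position of the original motif.
    return sequence[:index] + motif * copies + sequence[index + len(motif):]


# Placeholder flanks, NOT taken from the pG1 promoter sequence.
toy_sequence = "GCGCGC" + TAT_MOTIF + "GCGCGC"
doubled = duplicate_motif(toy_sequence)
print(count_motif(toy_sequence), count_motif(doubled))  # 1 2
```

    Passing copies=3 or copies=4 gives the higher copy numbers mentioned above; an alternative or extended T motif can be supplied through the motif argument.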

    [0286] The invention further encompasses a nucleotide sequence which hybridizes under stringent conditions to the pG1-x promoter.

    [0287] As used in the present invention, the term hybridization or hybridizing is intended to mean the process during which two nucleic acid sequences anneal to one another with stable and specific hydrogen bonds so as to form a double strand under appropriate conditions. The hybridization between two complementary sequences or sufficiently complementary sequences depends on the operating conditions that are used, and in particular the stringency. The stringency may be understood to denote the degree of homology; the higher the stringency, the higher percent homology between the sequences. The stringency may be defined in particular by the base composition of the two nucleic sequences, and/or by the degree of mismatching between these two nucleic sequences. By varying the conditions, e.g. salt concentration and temperature, a given nucleic acid sequence may be allowed to hybridize only with its exact complement (high stringency) or with any somewhat related sequences (low stringency). Increasing the temperature or decreasing the salt concentration may tend to increase the selectivity of a hybridization reaction.

    [0288] As used herein, the phrase hybridizing under stringent hybridizing conditions is preferably understood to refer to hybridizing under conditions of certain stringency. In a preferred embodiment the stringent hybridizing conditions are conditions where homology of the two nucleic acid sequences is at least 70%, preferably at least 80%, preferably at least 90%, i.e. under conditions where hybridization is only possible if the double strand obtained during this hybridization comprises preferably at least 70%, preferably at least 80%, preferably at least 90% of A-T bonds and C-G bonds.

    [0289] The stringency may depend on the reaction parameters, such as the concentration and the type of ionic species present in the hybridization solution, the nature and the concentration of denaturing agents and/or the hybridization temperature. The appropriate conditions can be determined by those skilled in the art, e.g. as described in Sambrook et al. (Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, 1989).

    [0290] The term isolated or isolation as used herein with respect to a nucleic acid, a POI or other compound shall refer to such compound that has been sufficiently separated from the environment with which it would naturally be associated, so as to exist in substantially pure form. Isolated does not necessarily mean the exclusion of artificial or synthetic mixtures with other compounds or materials, or the presence of impurities that do not interfere with the fundamental activity, and that may be present, for example, due to incomplete purification. In particular, isolated nucleic acid molecules of the present invention are also meant to include those chemically synthesized, and in particular those not naturally-occurring in P. pastoris or any other organism, herein referred to as artificial. With reference to nucleic acids of the invention, the term isolated nucleic acid or isolated nucleic acid sequence is sometimes used. This term, when applied to DNA, refers to a DNA molecule that is separated from sequences with which it is immediately contiguous in the naturally occurring genome of the organism in which it originated. For example, an isolated nucleic acid may comprise a DNA molecule inserted into a vector, such as a plasmid or virus vector, or integrated into the genomic DNA of a prokaryotic or eukaryotic cell or host organism. An isolated nucleic acid (either DNA or RNA) may further represent a molecule produced directly by biological or synthetic means and separated from other components present during its production.

    [0291] The term operably linked as used herein refers to the association of nucleotide sequences on a single nucleic acid molecule, e.g. a vector, in a way such that the function of one or more nucleotide sequences is affected by at least one other nucleotide sequence present on said nucleic acid molecule. For example, a promoter is operably linked with a coding sequence of a recombinant gene, when it is capable of effecting the expression of that coding sequence. As a further example, a nucleic acid encoding a signal peptide is operably linked to a nucleic acid sequence encoding a POI, when it is capable of expressing a protein in the secreted form, such as a preform of a mature protein or the mature protein. Specifically, such nucleic acids operably linked to each other may be immediately linked, i.e. without further elements or nucleic acid sequences in between the nucleic acid encoding the signal peptide and the nucleic acid sequence encoding a POI.

    [0292] A promoter sequence is typically understood to be operably linked to a coding sequence, if the promoter controls the transcription of the coding sequence. If a promoter sequence is not natively associated with the coding sequence, its transcription is either not controlled by the promoter in native (wild-type) cells or the sequences are recombined with different contiguous sequences.

    [0293] The term protein of interest (POI) as used herein refers to a polypeptide or a protein that is produced by means of recombinant technology in a host cell. More specifically, the protein may either be a polypeptide not naturally occurring in the host cell, i.e. a heterologous protein, or else may be native to the host cell, i.e. a homologous protein to the host cell, but is produced, for example, by transformation with a self-replicating vector containing the nucleic acid sequence encoding the POI, or upon integration by recombinant techniques of one or more copies of the nucleic acid sequence encoding the POI into the genome of the host cell, or by recombinant modification of one or more regulatory sequences controlling the expression of the gene encoding the POI, e.g. of the promoter sequence. In some cases the term POI as used herein also refers to any metabolite produced by the host cell as mediated by the recombinantly expressed protein.

    [0294] The POI may specifically be recovered from the cell culture in the purified form, e.g. substantially pure.

    [0295] The term substantially pure or purified as used herein shall refer to a preparation comprising at least 50% (w/w), preferably at least 60%, 70%, 80%, 90% or 95% of a compound, such as a nucleic acid molecule or a POI. Purity is measured by methods appropriate for the compound (e.g. chromatographic methods, polyacrylamide gel electrophoresis, HPLC analysis, and the like).

    [0296] The term recombinant as used herein shall mean being prepared by or the result of genetic engineering. Thus, a recombinant microorganism comprises at least one recombinant nucleic acid. A recombinant microorganism specifically comprises an expression vector or cloning vector, or it has been genetically engineered to contain a recombinant nucleic acid sequence. A recombinant protein is produced by expressing a respective recombinant nucleic acid in a host. A recombinant promoter is a genetically engineered non-coding nucleotide sequence suitable for its use as a functionally active promoter as described herein.

    [0297] In general, the recombinant nucleic acids or organisms as referred to herein may be produced by recombination techniques well known to a person skilled in the art. In accordance with the present invention there may be employed conventional molecular biology, microbiology, and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Maniatis, Fritsch & Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, (1982).

    [0298] According to a preferred embodiment of the present invention, a recombinant construct is obtained by ligating the promoter and relevant genes into a vector or expression construct. These genes can be stably integrated into the host cell genome by transforming the host cell using such vectors or expression constructs.

    [0299] Expression vectors may include but are not limited to cloning vectors, modified cloning vectors and specifically designed plasmids. The preferred expression vector as used in the invention may be any expression vector suitable for expression of a recombinant gene in a host cell and is selected depending on the host organism. The recombinant expression vector may be any vector which is capable of replicating in or integrating into the genome of the host organisms, also called host vector.

    [0300] Appropriate expression vectors typically comprise further regulatory sequences suitable for expressing DNA encoding a POI in a eukaryotic host cell. Examples of regulatory sequences include operators, enhancers, ribosomal binding sites, and sequences that control transcription and translation initiation and termination. The regulatory sequences may be operably linked to the DNA sequence to be expressed.

    [0301] To allow expression of a recombinant nucleotide sequence in a host cell, the expression vector may provide the promoter according to the invention adjacent to the 5' end of the coding sequence, e.g. upstream from the gene of interest (GOI) or a signal peptide gene enabling secretion of the POI. The transcription is thereby regulated and initiated by this promoter sequence.

    [0302] The term signal peptide as used herein shall specifically refer to a native signal peptide, a heterologous signal peptide or a hybrid of a native and a heterologous signal peptide, and may specifically be heterologous or homologous to the host organism producing a POI. The function of the signal peptide is to allow the POI to be secreted to enter the endoplasmic reticulum. It is usually a short (3-60 amino acids long) peptide chain that directs the transport of a protein outside the plasma membrane, thereby making it easy to separate and purify a heterologous protein. Some signal peptides are cleaved from the protein by signal peptidase after the proteins are transported.

    [0303] Exemplary signal peptides are signal sequences from S. cerevisiae alpha-mating factor prepro peptide and the signal peptides from the P. pastoris acid phosphatase gene (PHO1) and the extracellular protein X (EPX1) (Heiss et al., 2015; WO2014067926A1).

    [0304] Expression vectors comprising one or more of the regulatory elements (such as the pG1-x promoter and optionally a signal sequence) may be constructed to drive expression of a POI, and the expressed yield is compared to constructs with conventional regulatory elements, such as to prove the function of the relevant sequences. The identified nucleotide sequences may be amplified by PCR using specific nucleotide primers, cloned into an expression vector and transformed into a eukaryotic cell line, e.g. using a yeast vector and a strain of P. pastoris, for high level production of various different POI. To estimate the effect of the pG1-x promoter as described herein on the amount of recombinant POI so produced, the eukaryotic cell line may be cultured in shake flask experiments and fed-batch or chemostat fermentations in comparison with strains comprising a conventional pG1 promoter or the pGAP promoter, in the respective cell. In particular, the choice of the promoter has a great impact on the recombinant protein production.

    [0305] The POI can be produced using the recombinant host cell line by culturing a transformant thus obtained in an appropriate medium, isolating the expressed product or metabolite from the culture, and optionally purifying it by a suitable method.

    [0306] Transformants according to the present invention can be obtained by introducing such a vector DNA, e.g. plasmid DNA, into a host and selecting transformants which express the POI or the host cell metabolite with high yields. Host cells are treated to enable them to incorporate foreign DNA by methods conventionally used for transformation of eukaryotic cells, such as the electric pulse method, the protoplast method, the lithium acetate method, and modified methods thereof. P. pastoris is preferably transformed by electroporation. Preferred methods of transformation for the uptake of the recombinant DNA fragment by the microorganism include chemical transformation, electroporation or transformation by protoplastation. Transformants according to the present invention can be obtained by introducing such a vector DNA, e.g. plasmid DNA, into a host and selecting transformants which express the relevant protein or host cell metabolite with high yields.

    [0307] Several different approaches for the production of the POI according to the method of the invention are preferred. Substances may be expressed, processed and optionally secreted by transforming a eukaryotic host cell with an expression vector harboring recombinant DNA encoding a relevant protein and at least one of the regulatory elements as described above, preparing a culture of the transformed cell, growing the culture, inducing transcription and POI production, and recovering the product of the fermentation process.

    [0308] The host cell according to the invention is preferably tested for its expression capacity or yield by the following test: ELISA, activity assay, HPLC, or other suitable tests.

    [0309] The invention specifically allows for the fermentation process on a pilot or industrial scale. The industrial process scale would preferably employ volumes of at least 10 L, specifically at least 50 L, preferably at least 1 m³, preferably at least 10 m³, most preferably at least 100 m³.

    [0310] Production conditions in industrial scale are preferred, which refer to e.g. fed batch cultivation in reactor volumes of 100 L to 10 m³ or larger, employing typical process times of several days, or continuous processes in fermenter volumes of approximately 50-1000 L or larger, with dilution rates of approximately 0.02-0.15 h⁻¹.
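
    The dilution rates quoted for continuous processes can be translated into feed flows and mean residence times. The following is a minimal Python sketch, assuming an ideal, well-mixed chemostat in which the dilution rate D equals the feed flow divided by the working volume; the function names are illustrative and do not come from the source.

```python
def feed_flow_l_per_h(dilution_rate_per_h, working_volume_l):
    """Feed flow F (L/h) for dilution rate D (1/h) and working volume V (L): F = D * V."""
    if dilution_rate_per_h <= 0 or working_volume_l <= 0:
        raise ValueError("dilution rate and volume must be positive")
    return dilution_rate_per_h * working_volume_l


def mean_residence_time_h(dilution_rate_per_h):
    """Mean residence time (h) in an ideal chemostat: tau = 1 / D."""
    if dilution_rate_per_h <= 0:
        raise ValueError("dilution rate must be positive")
    return 1.0 / dilution_rate_per_h


# Range of dilution rates named in the text, applied to a 50 L fermenter.
for d in (0.02, 0.15):
    flow = feed_flow_l_per_h(d, 50.0)
    tau = mean_residence_time_h(d)
    print(f"D = {d} 1/h: feed = {flow:.2f} L/h, residence time = {tau:.1f} h")
```

    At steady state in such a chemostat the specific growth rate of the culture equals the dilution rate, which is why a low dilution rate corresponds to slow, substrate-limited growth.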

    [0311] The suitable cultivation techniques may encompass cultivation in a bioreactor starting with a batch phase, followed by a short exponential fed batch phase at high specific growth rate, further followed by a fed batch phase at a low specific growth rate. Another suitable cultivation technique may encompass a batch phase followed by a continuous cultivation phase at a low dilution rate.
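
    A fed batch phase at a defined specific growth rate is commonly implemented with an exponentially increasing feed. The sketch below uses the standard textbook mass balance (maintenance neglected), not a protocol taken from the source; the biomass yield, feed concentration and starting values are hypothetical placeholders.

```python
import math


def exponential_feed_rate(t_h, mu_per_h, x0_g_per_l, v0_l,
                          yield_g_per_g, feed_conc_g_per_l):
    """Feed rate (L/h) that sustains growth at specific growth rate mu.

    Assumes biomass grows as X0*V0*exp(mu*t) and that substrate is
    consumed only for growth with a constant yield (g biomass / g substrate).
    """
    biomass_g = x0_g_per_l * v0_l * math.exp(mu_per_h * t_h)
    substrate_demand_g_per_h = mu_per_h * biomass_g / yield_g_per_g
    return substrate_demand_g_per_h / feed_conc_g_per_l


# Hypothetical values: 20 g/L biomass in 1 L after the batch phase,
# yield 0.5 g/g, feed with 500 g/L carbon source, mu = 0.1 1/h.
for t in (0, 5, 10):
    rate = exponential_feed_rate(t, 0.1, 20.0, 1.0, 0.5, 500.0)
    print(f"t = {t:2d} h: feed rate = {rate * 1000:.1f} mL/h")
```

    Choosing a lower mu in the same function gives the slow, growth-limiting feed used in the later fed batch phase.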

    [0312] A preferred embodiment includes a batch culture to provide biomass followed by a fed-batch culture for high-yield POI production.

    [0313] It is preferred to cultivate the host cell line as described herein in a bioreactor under growth conditions to obtain a cell density of at least 1 g/L cell dry weight, more preferably at least 10 g/L cell dry weight, preferably at least 20 g/L cell dry weight. It is advantageous to provide for such yields of biomass production on a pilot or industrial scale.

    [0314] A growth medium allowing the accumulation of biomass, specifically a basal growth medium, typically comprises a carbon source, a nitrogen source, a source for sulphur and a source for phosphate. Typically, such a medium comprises furthermore trace elements and vitamins, and may further comprise amino acids, peptone or yeast extract.

    [0315] Preferred nitrogen sources include NH₄H₂PO₄, NH₃ or (NH₄)₂SO₄.

    [0316] Preferred sulphur sources include MgSO₄, (NH₄)₂SO₄ or K₂SO₄.

    [0317] Preferred phosphate sources include NH₄H₂PO₄, H₃PO₄, NaH₂PO₄, KH₂PO₄, Na₂HPO₄ or K₂HPO₄.

    [0318] Further typical medium components include KCl, CaCl₂, and trace elements such as Fe, Co, Cu, Ni, Zn, Mo, Mn, I, B.

    [0319] Preferably the medium is supplemented with vitamin B₇.

    [0320] A typical growth medium for P. pastoris comprises glycerol, sorbitol or glucose, NH₄H₂PO₄, MgSO₄, KCl, CaCl₂, biotin, and trace elements.

    [0321] In the production phase a production medium is specifically used with only a limited amount of a supplemental carbon source.

    [0322] Preferably the host cell line is cultivated in a mineral medium with a suitable carbon source, thereby further simplifying the isolation process significantly. An example of a preferred mineral medium is one containing a utilizable carbon source (e.g. glucose, glycerol, sorbitol or methanol), salts containing the macro elements (potassium, magnesium, calcium, ammonium, chloride, sulphate, phosphate) and trace elements (copper, iodide, manganese, molybdate, cobalt, zinc, and iron salts, and boric acid), and optionally vitamins or amino acids, e.g. to complement auxotrophies.

    [0323] Specifically, the cells are cultivated under conditions suitable to effect expression of the desired POI, which can be purified from the cells or culture medium, depending on the nature of the expression system and the expressed protein, e.g. whether the protein is fused to a signal peptide and whether the protein is soluble or membrane-bound. As will be understood by the skilled artisan, cultivation conditions will vary according to factors that include the type of host cell and particular expression vector employed.

    [0324] A typical production medium comprises a supplemental carbon source, and further NH₄H₂PO₄, MgSO₄, KCl, CaCl₂, biotin, and trace elements.

    [0325] For example, the feed of the supplemental carbon source added to the fermentation may comprise a carbon source with up to 50 wt % utilizable sugars. The low feed rate of the supplemental medium will limit the effects of product or byproduct inhibition on the cell growth, so that a high product yield based on substrate provision will be possible.

    [0326] The fermentation preferably is carried out at a pH ranging from 3 to 7.5.

    [0327] Typical fermentation times are about 24 to 120 hours with temperatures in the range of 20° C. to 35° C., preferably 22-30° C.

    [0328] The POI is preferably expressed employing conditions to produce yields of at least 1 mg/L, preferably at least 10 mg/L, preferably at least 100 mg/L, most preferred at least 1 g/L.

    [0329] It is understood that the methods disclosed herein may further include cultivating said recombinant host cells under conditions permitting the expression of the POI, preferably in the secreted form or else as intracellular product. A recombinantly produced POI or a host cell metabolite can then be isolated from the cell culture medium and further purified by techniques well known to a person skilled in the art.

    [0330] The POI produced according to the invention typically can be isolated and purified using state of the art techniques, including the increase of the concentration of the desired POI and/or the decrease of the concentration of at least one impurity.

    [0331] If the POI is secreted from the cells, it can be isolated and purified from the culture medium using state of the art techniques. Secretion of the recombinant expression products from the host cells is generally advantageous for reasons that include facilitating the purification process, since the products are recovered from the culture supernatant rather than from the complex mixture of proteins that results when yeast cells are disrupted to release intracellular proteins.

    [0332] The cultured transformant cells may also be ruptured sonically or mechanically, enzymatically or chemically to obtain a cell extract containing the desired POI, from which the POI is isolated and purified.

    [0333] As isolation and purification methods for obtaining a recombinant polypeptide or protein product, methods, such as methods utilizing difference in solubility, such as salting out and solvent precipitation, methods utilizing difference in molecular weight, such as ultrafiltration and gel electrophoresis, methods utilizing difference in electric charge, such as ion-exchange chromatography, methods utilizing specific affinity, such as affinity chromatography, methods utilizing difference in hydrophobicity, such as reverse phase high performance liquid chromatography, and methods utilizing difference in isoelectric point, such as isoelectric focusing may be used.

    [0334] The highly purified product is essentially free from contaminating proteins, and preferably has a purity of at least 90%, more preferred at least 95%, or even at least 98%, up to 100%. The purified products may be obtained by purification of the cell culture supernatant or else from cellular debris.

    [0335] As isolation and purification methods the following standard methods are preferred: cell disruption (if the POI is obtained intracellularly), cell (debris) separation and wash by microfiltration or tangential flow filtration (TFF) or centrifugation, POI purification by precipitation or heat treatment, POI activation by enzymatic digest, POI purification by chromatography, such as ion exchange (IEX), hydrophobic interaction chromatography (HIC), affinity chromatography, size exclusion (SEC) or HPLC chromatography, POI precipitation or concentration and washing by ultrafiltration steps.

    [0336] The isolated and purified POI can be identified by conventional methods such as Western blot, HPLC, activity assay, or ELISA.

    [0337] The POI can be any eukaryotic, prokaryotic or synthetic polypeptide. It can be a secreted protein or an intracellular protein. The present invention also provides for the recombinant production of functional homologs, functional equivalent variants, derivatives and biologically active fragments of naturally occurring proteins. Functional homologs are preferably identical with or correspond to and have the functional characteristics of a sequence.

    [0338] A POI referred to herein may be a product homologous to the eukaryotic host cell or heterologous, preferably for therapeutic, prophylactic, diagnostic, analytic or industrial use.

    [0339] The POI is preferably a heterologous recombinant polypeptide or protein, produced in a eukaryotic cell, preferably a yeast cell, preferably as secreted proteins. Examples of preferably produced proteins are immunoglobulins, immunoglobulin fragments, aprotinin, tissue factor pathway inhibitor or other protease inhibitors, and insulin or insulin precursors, insulin analogues, growth hormones, interleukins, tissue plasminogen activator, transforming growth factor α or β, glucagon, glucagon-like peptide 1 (GLP-1), glucagon-like peptide 2 (GLP-2), GRPP, Factor VII, Factor VIII, Factor XIII, platelet-derived growth factor1, serum albumin, enzymes, such as lipases or proteases, or a functional homolog, functional equivalent variant, derivative and biologically active fragment with a similar function as the native protein. The POI may be structurally similar to the native protein and may be derived from the native protein by addition of one or more amino acids to either or both the C- and N-terminal end or the side-chain of the native protein, substitution of one or more amino acids at one or a number of different sites in the native amino acid sequence, deletion of one or more amino acids at either or both ends of the native protein or at one or several sites in the amino acid sequence, or insertion of one or more amino acids at one or more sites in the native amino acid sequence. Such modifications are well known for several of the proteins mentioned above.

    [0340] A POI can also be selected from substrates, enzymes, inhibitors or cofactors that provide for biochemical reactions in the host cell, with the aim to obtain the product of said biochemical reaction or a cascade of several reactions, e.g. to obtain a metabolite of the host cell. Exemplary products can be vitamins, such as riboflavin, organic acids, and alcohols, which can be obtained with increased yields following the expression of a recombinant protein or a POI according to the invention.

    [0341] In general, the host cell, which expresses a recombinant product, can be any eukaryotic cell suitable for recombinant expression of a POI.

    [0342] Examples of preferred mammalian cells are BHK, CHO (CHO-DG44, CHO-DUXB11, CHO-DUKX, CHO-K1, CHOK1SV, CHOS), HeLa, HEK293, MDCK, NIH3T3, NS0, PER.C6, SP2/0 and VERO cells.

    [0343] Examples of preferred yeast cells used as host cells according to the invention include but are not limited to the Saccharomyces genus (e.g. Saccharomyces cerevisiae), the Pichia genus (e.g. P. pastoris, or P. methanolica), the Komagataella genus (K. pastoris, K. pseudopastoris or K. phaffii), Hansenula polymorpha, Yarrowia lipolytica, Scheffersomyces stipitis or Kluyveromyces lactis.

    [0344] Newer literature divides and renames Pichia pastoris into Komagataella pastoris, Komagataella phaffii and Komagataella pseudopastoris. Herein Pichia pastoris is used synonymously for all, Komagataella pastoris, Komagataella phaffii and Komagataella pseudopastoris.

    [0345] The preferred yeast host cells are derived from methylotrophic yeast, such as from Pichia or Komagataella, e.g. Pichia pastoris, or Komagataella pastoris, or K. phaffii, or K. pseudopastoris. Examples of the host include yeasts such as P. pastoris. Examples of P. pastoris strains include CBS 704 (=NRRL Y-1603=DSMZ 70382), CBS 2612 (=NRRL Y-7556), CBS 7435 (=NRRL Y-11430), CBS 9173-9189 (CBS strains: CBS-KNAW Fungal Biodiversity Centre, Centraalbureau voor Schimmel-cultures, Utrecht, The Netherlands), and DSMZ 70877 (German Collection of Microorganisms and Cell Cultures), but also strains from Invitrogen, such as X-33, GS115, KM71 and SMD1168. Examples of S. cerevisiae strains include W303, CEN.PK and the BY-series (EUROSCARF collection). All of the strains described above have been successfully used to produce transformants and express heterologous genes.

    [0346] A preferred yeast host cell according to the invention, such as a P. pastoris or S. cerevisiae host cell, contains a heterologous or recombinant promoter sequence, which may be derived from a P. pastoris or S. cerevisiae strain different from the production host. In another specific embodiment the host cell according to the invention comprises a recombinant expression construct according to the invention comprising the promoter originating from the same genus, species or strain as the host cell.

    [0347] According to the invention it is preferred to provide a P. pastoris host cell line comprising a pG1-x promoter sequence as described herein operably linked to the nucleotide sequence coding for the POI.

    [0348] If the POI is a protein homologous to the host cell, i.e. a protein which is naturally occurring in the host cell, the expression of the POI in the host cell may be modulated by the exchange of its native promoter sequence with a promoter sequence according to the invention.

    [0349] This purpose may be achieved e.g. by transformation of a host cell with a recombinant DNA molecule comprising homologous sequences of the target gene to allow site specific recombination, the promoter sequence and a selective marker suitable for the host cell. The site specific recombination shall take place in order to operably link the promoter sequence with the nucleotide sequence encoding the POI. This results in the expression of the POI from the promoter sequence according to the invention instead of from the native promoter sequence.

    [0350] It is specifically preferred that the pG1-x promoter has an increased promoter activity relative to the native promoter sequence of the POI.

    [0351] According to a specific embodiment, the POI production method employs a recombinant nucleotide sequence encoding the POI, which is provided on a plasmid suitable for integration into the genome of the host cell, in a single copy or in multiple copies per cell. The recombinant nucleotide sequence encoding the POI may also be provided on an autonomously replicating plasmid in a single copy or in multiple copies per cell.

    [0352] The preferred method as described herein employs a plasmid, which is a eukaryotic expression vector, preferably a yeast expression vector. Expression vectors may include but are not limited to cloning vectors, modified cloning vectors and specifically designed plasmids. The preferred expression vector as used in the invention may be any expression vector suitable for expression of a recombinant gene in a host cell and is selected depending on the host organism. The recombinant expression vector may be any vector which is capable of replicating in or integrating into the genome of the host organisms, also called host vector, such as a yeast vector, which carries a DNA construct according to the invention. A preferred yeast expression vector is for expression in yeast selected from the group consisting of methylotrophic yeasts represented by the genera Hansenula, Pichia, Candida and Torulopsis.

    [0353] In the present invention, it is preferred to use plasmids derived from pPICZ, pGAPZ, pPIC9, pPICZalfa, pGAPZalfa, pPIC9K, pGAPHis or pPUZZLE as the vector.

    [0354] According to a preferred embodiment of the present invention, a recombinant construct is obtained by ligating the relevant genes into a vector. These genes can be stably integrated into the host cell genome by transforming the host cell using such vectors. The polypeptides encoded by the genes can be produced using the recombinant host cell line by culturing a transformant, thus obtained in an appropriate medium, isolating the expressed POI from the culture, and purifying it by a method appropriate for the expressed product, in particular to separate the POI from contaminating proteins.

    [0355] Expression vectors may comprise one or more phenotypic selectable markers, e.g. a gene encoding a protein that confers antibiotic resistance or that supplies an auxotrophic requirement. Yeast vectors commonly contain an origin of replication from a yeast plasmid, an autonomously replicating sequence (ARS), or alternatively, a sequence used for integration into the host genome, a promoter region, sequences for polyadenylation, sequences for transcription termination, and a selectable marker.

    [0356] The procedures used to ligate the DNA sequences and regulatory elements, e.g. the pG1-x promoter and the gene(s) coding for the POI, the promoter and the terminator, respectively, and to insert them into suitable vectors containing the information necessary for integration or host replication, are well-known to persons skilled in the art, e.g. described by J. Sambrook et al. (Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, 1989).

    [0357] It will be understood that the vector, which uses the regulatory elements according to the invention and/or the POI as an integration target, may be constructed either by first preparing a DNA construct containing the entire DNA sequence coding for the regulatory elements and/or the POI and subsequently inserting this fragment into a suitable expression vector, or by sequentially inserting DNA fragments containing genetic information for the individual elements, followed by ligation.

    [0358] Also multicloning vectors, which are vectors having a multicloning site, can be used according to the invention, wherein a desired heterologous gene can be incorporated at a multicloning site to provide an expression vector. In expression vectors, the promoter is placed upstream of the gene of the POI and regulates the expression of the gene. In the case of multicloning vectors, because the gene of the POI is introduced at the multicloning site, the promoter is placed upstream of the multicloning site.

    [0359] The DNA construct as provided to obtain a recombinant host cell according to the invention may be prepared synthetically by established standard methods, e.g. the phosphoramidite method. The DNA construct may also be of genomic or cDNA origin, for instance obtained by preparing a genomic or cDNA library and screening for DNA sequences coding for all or part of the polypeptide of the invention by hybridization using synthetic oligonucleotide probes in accordance with standard techniques (Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, 1989). Finally, the DNA construct may be of mixed synthetic and genomic, mixed synthetic and cDNA or mixed genomic and cDNA origin prepared by annealing fragments of synthetic, genomic or cDNA origin, as appropriate, the fragments corresponding to various parts of the entire DNA construct, in accordance with standard techniques.

    [0360] In another preferred embodiment, the yeast expression vector is able to stably integrate in the yeast genome, e.g. by homologous recombination.

    [0361] A transformant host cell according to the invention obtained by transforming the cell with the regulatory elements according to the invention and/or the POI genes may preferably first be cultivated at conditions to grow efficiently to a large cell number. When the cell line is prepared for the POI expression, cultivation techniques are chosen to produce the expression product.

    [0362] The foregoing description will be more fully understood with reference to the following examples. Such examples are, however, merely representative of methods of practicing one or more embodiments of the present invention and should not be read as limiting the scope of invention.

    EXAMPLES

    Example 1: 5′-Shortening of pG1 Reveals the Main Regulatory Region of pG1

    [0363] The native (wild-type) pG1 promoter has been isolated from P. pastoris (Komagataella phaffii) strain CBS2612 (CBS strains: CBS-KNAW Fungal Biodiversity Centre, Centraalbureau voor Schimmelcultures, Utrecht, The Netherlands). As determined by Sanger sequencing and subsequent BLAST analysis, the pG1 promoter sequence of CBS2612 had more than 95% sequence identity to the respective regions in the genomic sequences of the strains GS115 (Invitrogen) (upstream of PAS_chr1-3_0011) and CBS7435 (upstream of P7435_Chr1-0007) or K. pastoris DSMZ 70382 (DSMZ strains: German Collection of Microorganisms and Cell Cultures) (upstream of PIPA00372). During the analysis of the genomic region of pG1, it was realized that its gene GTH1 has a different start annotation in the strains CBS7435 (P7435_Chr1-0007) and DSMZ 70382 (PIPA00372) than in GS115 (PAS_chr1-3_0011). In contrast to GS115 and CBS2612, the coding sequence is annotated to start 36 bp further downstream in the genomic sequences of the other two strains.

    [0364] In order to identify the relevant regulatory region of pG1, eight shortened pG1 variants were cloned from CBS2612, starting from the alternative 5′ positions -858, -663, -492, -371, -328, -283, -211 and -66 and extending to position -1 (see FIG. 1, numbering based on the start of the GTH1 gene locus PAS_chr1-3_0011). These shortened promoter variants were screened for eGFP expression in deep well plates as described in Example 8 to test for the repression (glycerol) and induction (glucose feed beads) properties in comparison to the original 965 bp version of pG1 (FIG. 2). No difference in eGFP signal was found for any of the length variants in the repressing condition, showing that promoter repression was not impaired in any of the shortened variants. After 48 hours of induction, the expression capacity remained fully functional for the promoter variants down to a length of 328 bp. The 283 bp variant was only about two thirds as strong as the original pG1 promoter. The two shortest length variants (211 and 66 bp) appeared to be almost nonfunctional. These results indicate that the region between position -400 and -200 contains important regulatory features.
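
    The 5′-shortening scheme can be reproduced in silico by slicing the promoter sequence at the listed start positions. The following Python sketch assumes the position numbering described above (position -1 is the last promoter base before the gene start); the sequence used is a repetitive placeholder of 965 nt, not SEQ ID 1, and the function name is illustrative.

```python
# Start positions of the eight shortened variants named in the text.
START_POSITIONS = (-858, -663, -492, -371, -328, -283, -211, -66)


def shorten_5prime(promoter, start_position):
    """Return the promoter from start_position (negative) through position -1."""
    if not -len(promoter) <= start_position <= -1:
        raise ValueError("start position lies outside the promoter")
    # Negative slicing keeps the 3'-terminal part, i.e. the bases nearest the gene.
    return promoter[start_position:]


# Placeholder sequence of the same length as the original 965 bp pG1 version.
placeholder_promoter = ("ACGT" * 242)[:965]

variants = {pos: shorten_5prime(placeholder_promoter, pos) for pos in START_POSITIONS}
for pos, seq in variants.items():
    print(f"start {pos:5d}: {len(seq)} bp")
```

    Each variant shares its 3′ end with the full-length promoter, so differences in expression can be attributed to the removed upstream sequence.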

    Example 2: A High Density of Predicted Carbon Source Related TFBS Marks the Main Regulatory Region of the pG1 Promoter

    [0365] The pG1 promoter sequence (1000 bp upstream of the gene PAS_chr1-3_0011) was searched for matrix families belonging to the matrix groups fungi and general core promoter elements using the MatInspector from Genomatix. 111 putative TFBS belonging to 46 different matrix families were found (Table 1). The most common matrix families in the analyzed sequence were monomeric Gal4-class motifs (F$MGCM, 12 binding sites), homeodomain-containing transcriptional regulators (F$HOMD, 6 binding sites), fungal basic leucine zipper family (F$BZIP, 5 binding sites) and yeast GC-Box Proteins (F$YMIG, 5 binding sites). A very high TFBS density was noticed between positions -400 and -200, with about two thirds of the mentioned TFBS (most common matrix families) occurring there (18 out of 28). Regarding general core promoter elements, no yeast- or fungi-related motifs were identified by the MatInspector, but a TATA box can be found starting at position -26.
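
    The enrichment of binding sites in the region between -400 and -200 can be checked with simple arithmetic. The Python sketch below uses only the counts stated in the paragraph above; the helper for counting sites in a window is illustrative, and the window length is approximated as 200 bp out of 1000 bp analyzed.

```python
# Site counts per matrix family as reported in the text.
family_counts = {"F$MGCM": 12, "F$HOMD": 6, "F$BZIP": 5, "F$YMIG": 5}
total_sites = sum(family_counts.values())   # 28 sites in the four families
sites_in_window = 18                        # reported between -400 and -200

share_of_sites = sites_in_window / total_sites   # about two thirds
share_of_sequence = 200 / 1000                   # window is ~20% of the sequence
enrichment = share_of_sites / share_of_sequence


def fraction_in_window(site_positions, window_start=-400, window_end=-200):
    """Fraction of site positions that fall inside the window (inclusive)."""
    inside = [p for p in site_positions if window_start <= p <= window_end]
    return len(inside) / len(site_positions)


print(f"{share_of_sites:.0%} of sites in ~{share_of_sequence:.0%} of the sequence "
      f"({enrichment:.1f}-fold enrichment)")
```

    With real MatInspector output, `fraction_in_window` could be applied to the predicted site positions; no such positions are listed here, so none are invented.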

    [0366] A prominent motif was identified e.g. at position -390 to -375, which was termed TAT14 due to its sequence 5′-TATTTTTTTTTTTTTT-3′ (SEQ ID 21) or TAT15 due to its sequence 5′-TATTTTTTTTTTTTTTT-3′ (SEQ ID 22). Such poly(A:T) tracts in promoter regions are known to negatively affect nucleosome binding and to stimulate TF binding at nearby sites in yeast.
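
    A motif of this kind can be located in a sequence with a regular expression. The Python sketch below assumes the naming convention that TAT14 means "TA" followed by 14 Ts; the flanking bases in the example are arbitrary placeholders, and the function name is illustrative.

```python
import re


def find_tat_tracts(sequence, min_t=14):
    """Find 'TA' followed by at least min_t Ts.

    Returns a list of (start, end, number_of_Ts) with 0-based, end-exclusive
    coordinates on the given strand.
    """
    pattern = re.compile(r"TA(T{%d,})" % min_t)
    return [(m.start(), m.end(), len(m.group(1)))
            for m in pattern.finditer(sequence.upper())]


# Placeholder context: arbitrary flanks around TAT14 and TAT15 motifs.
tat14 = "TA" + "T" * 14
tat15 = "TA" + "T" * 15
print(find_tat_tracts("GCGC" + tat14 + "GCGC"))
print(find_tat_tracts("GCGC" + tat15 + "GCGC"))
```

    Because the quantifier is greedy, a tract with 15 Ts is reported once with its full length rather than as a TAT14 hit.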

    Example 3: The Carbon Source-Related Transcription Factors Mxr1, Rgt1, Cat8-1, Cat8-2 and Mig1 were Revealed to be Important for the Regulatory Properties of pG1

    [0367] Transcription factor binding sites with predicted glucose- or carbon source dependency were selected for further analysis (see FIG. 1 and Table 2). pG1 variants with deletions of the respective regions were generated using overlap-extension PCR. Table 3 lists all selected TFBS and indicates all TFBS which are (partially) affected by the deletion (detailed list in Table 2). For some deletions (e.g. pG1-9 and pG1-10), some nucleotides of the respective TFBS were left untouched in order to keep closely neighboring TFBS functional and to separately examine their effect.
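
    The product of such an overlap-extension PCR deletion is the upstream flank joined directly to the downstream flank. The following Python sketch models only that resulting sequence, not primer design; coordinates follow the negative numbering used for pG1, and the example sequence and function name are hypothetical.

```python
def delete_region(promoter, first_pos, last_pos):
    """Delete positions first_pos..last_pos (negative, inclusive) from a promoter.

    Position -1 is the last base of the promoter, directly before the gene start.
    """
    n = len(promoter)
    if not -n <= first_pos <= last_pos <= -1:
        raise ValueError("deletion coordinates lie outside the promoter")
    start = n + first_pos        # 0-based index of the first deleted base
    end = n + last_pos + 1       # 0-based index just past the last deleted base
    # Upstream and downstream flanks become directly adjoined.
    return promoter[:start] + promoter[end:]


toy_promoter = "AAAACCCCGGGG"
print(delete_region(toy_promoter, -8, -5))   # removes the CCCC block
```

    The newly adjoined junction is itself a new sequence, which is one reason why a deletion can have effects beyond the loss of the targeted TFBS.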

    [0368] All TFBS deletion and TAT mutation variants were screened for eGFP expression as described in Example 8 in repressing (glycerol) and inducing conditions (glucose feed bead) (FIG. 3). It is important to consider that individual TF/TFBS are usually not sufficient to fulfill a promoter's regulation. TFBS deletions also imply that the promoter sequence can be affected by the newly formed adjoined sequence, by altered distances between TFBS or by changes of higher order properties (chromatin organization). The same TFBS at different positions of the promoter can have different functions, also because of other adjacent TFBS. At closely neighbouring TFBS, TFs might either act synergistically or restrict binding of other TFs due to steric hindrance.
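
    Screening results of this kind are typically summarized as promoter strength in the induced state and as an induction ratio (induced signal divided by repressed signal), each compared with the pG1 reference. The Python sketch below illustrates the calculation; all fluorescence values are hypothetical and the function names are illustrative.

```python
def induction_ratio(induced_signal, repressed_signal):
    """Induction ratio: eGFP signal when induced divided by signal when repressed."""
    if repressed_signal <= 0:
        raise ValueError("repressed signal must be positive")
    return induced_signal / repressed_signal


def fold_changes(variant, reference):
    """Compare a variant with the reference promoter.

    Both arguments are (repressed_signal, induced_signal) tuples.
    Returns (strength_fold, induction_ratio_fold).
    """
    strength_fold = variant[1] / reference[1]
    ratio_fold = induction_ratio(variant[1], variant[0]) / induction_ratio(
        reference[1], reference[0])
    return strength_fold, ratio_fold


# Hypothetical readings (arbitrary fluorescence units), not measured data.
reference_pg1 = (100.0, 1000.0)
variant = (100.0, 1300.0)
strength, ratio = fold_changes(variant, reference_pg1)
print(f"strength: {strength:.2f}-fold, induction ratio: {ratio:.2f}-fold")
```

    A variant with unchanged repression but higher induced signal improves both figures, whereas a variant with liberated repression can gain strength while losing induction ratio.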

    [0369] Four different carbon source-related TF families were deleted in the pG1 promoter variants (see Table 2 and Table 3): Yeast metabolic regulator (F$ADR; matrixes: F$ADR1.01), Monomeric Gal4-class motifs (F$MGCM; matrixes: F$RGT1.01, F$RGT1.02), Carbon source-responsive elements (F$CSRE; matrixes: F$CSRE.01, F$SIP4.01) and Yeast GC-Box Proteins (F$YMIG; matrixes: F$MIG1.01 and F$MIG1.02). The corresponding transcription factors in S. cerevisiae are Adr1, Rgt1, Sip4/Cat8 and Mig1, respectively.

    [0370] Carbon source dependent promoters are controlled by glucose repression and/or induction by carbohydrates or other non-sugar carbon sources. Glucose repression is mainly conducted by the Snf1 protein kinase complex, the transcriptional repressor Mig1 and protein phosphatase 1. Downstream factors regulate e.g. respiratory genes (Hap4), gluconeogenesis genes (Cat8, Sip4) and glucose transporters (Rgt1) in S. cerevisiae.

    [0371] P. pastoris has two Mig1 homologs, called Mig1-1 and Mig1-2, the second of which possibly acts as carbon catabolite repressor. When glucose is available, Mig1 acts as a repressor, while Rgt1 acts as transcriptional activator. To fulfill repressor function, Mig1 gets dephosphorylated and imported into the nucleus where it recruits the corepressors Ssn6 and Tup1.

    [0372] In limiting glucose, Rgt1 gets dephosphorylated and acts as transcriptional repressor. Rgt1 function is controlled by its phosphorylation state (Rgt1 has four phosphorylation sites), and induction of regulated promoters does not require Rgt1 dissociation in S. cerevisiae, as typically seen for transcriptional repressors.

    [0373] The carbon source-responsive zinc-finger transcription factor Adr1 is required for transcriptional activation of the glucose-repressible alcohol dehydrogenase (ADH2) gene in S. cerevisiae. The Adr1 homolog in P. pastoris is Mxr1 (PAS_chr4_0487), the key regulator of methanol metabolism, and it was reported to be a positively acting transcription factor that is essential for strong P_AOX induction on methanol. The reported TFBS core motif 5′-CYCC-3′ for Mxr1 matches both F$ADR1.01 sites found in the pG1 promoter sequence.

    [0374] The carbon source response element (CSRE) is bound by the transcriptional activators Sip4 and Cat8 and functions to induce the expression of gluconeogenesis genes in S. cerevisiae. Two P. pastoris homologs of ScCat8 can be found: Cat8-1 (PAS_chr2-1_0757) and Cat8-2 (PAS_chr4_0540), both also being the best blastp hits for ScSip4. Cat8-2 is weakly similar to ScCat8, and it potentially plays an important role in derepressing conditions.

    Example 4: Deletion Variants of the pG1 Promoter Reveal TFBS Responsible for its Repression and Induction

    [0375] Out of the 5 deletion variants residing upstream (5′) of the main regulatory region of pG1 identified before (see dashed box in FIG. 1 and Table 2), the variants pG1-1, -2 and -4 appear to have a beneficial effect on promoter strength, while the deletion variants pG1-3 and -5 had no effect on GFP expression compared to the original pG1 promoter (SEQ ID 9). This result suggests that 5′ shortening of the promoter might be beneficial for the engineering of pG1. TFBS deletions within the main regulatory region of pG1 (pG1-6 to -12, see FIG. 1 and Table 2) had different impacts on eGFP expression, but none showed increased induction without losing the repression properties. Therefore, it is assumed that the main regulatory region of pG1 needs to be maintained in engineered pG1 promoter variants in order to retain its tight regulation. Accordingly, without this region, much lower induction in limiting glucose was observed in Example 1 (pG1-328 and pG1-283, FIG. 2).

    [0376] Mig1 binding sites were deleted in pG1-3, -4, -10 and -11 (F$MIG1.02 in pG1-3; F$MIG1.01 in pG1-4, -10 and -11), where pG1-10 and pG1-11 also include F$ADR1.01 and F$RGT1.02 deletions, respectively. Slightly tighter repression was found for pG1-3, while pG1-4 had unchanged repression but enhanced eGFP levels after induction.

    [0377] The liberated repression seen for pG1-10 and the weaker promoter induction of pG1-10 and pG1-11 could also be connected to F$RGT1 binding sites in this region (F$RGT1.01 and F$RGT1.02 deleted in pG1-9 and pG1-11). Also, Mig1 could play a bifunctional role in pG1 regulation: two MIG1 genes are found in P. pastoris (MIG1-1, MIG1-2) and they were shown to be regulated in opposite directions depending on glucose availability.

    [0378] The deletion of F$ADR1.01 increased eGFP levels in the variant pG1-1, although Mxr1 (positive regulator of methanol metabolism in Pp, homolog of ScADR1) binding site deletion would be expected to rather weaken the promoter. Combined deletion of F$ADR1.01 with F$MIG1.01 in pG1-10 liberated promoter repression on glycerol and weakened its induction, which is a conclusive response for Mig1 TFBS deletion.

    [0379] In the main regulatory region, the binding site F$RGT1.02 was deleted in the variants pG1-6 (two sites), -7, -8, -11 and -12, and F$RGT1.01 was deleted in pG1-9. The variant harboring the deletion of the paired F$RGT1.02 site (pG1-6, binding sites on opposite strands with a shift of 7 bp) showed slightly relieved repression and reduced induction. The variants pG1-7 and pG1-8 contain very close F$RGT1.02 sites, where the first lies on the negative strand and the second on the positive strand; pG1-8 also contains the deletion of an F$SIP4.01 site. The first (pG1-7) showed slightly relieved repression and increased induction, while the second (pG1-8) was much more weakly induced (but had unchanged promoter repression). This indicates a strong role for the transcriptional activators Cat8-1 and/or Cat8-2 (strongest homologs of ScSip4) in pG1 induction. The variant pG1-9 was created to delete the closely located F$RGT1.01 and F$CSRE.01 TFBS (binding sites on opposite strands), and the drastic loss of repression indicates a strong role of these TFBS in tightly controlling pG1, most likely through binding of Rgt1, Cat8-1 and/or Cat8-2. The deletion of F$RGT1.02 in the variant pG1-12 did not have an effect on eGFP expression performance. Interestingly, CAT8-2 transcription is strongly upregulated in limiting glucose compared to glucose surplus, while RGT1 and CAT8-2 were not transcriptionally regulated in the tested conditions.

    Example 5: pG1 Promoter Strength is Dependent on the Poly(A:T) Tract TAT14

    [0380] The TAT motif is located about 80 bp upstream (5′, e.g. position -390 to -374) of the main regulatory region of pG1. Repeated sequencing of the 5′ region of GTH1 in P. pastoris CBS2612, CBS7435 or GS115 resulted in the detection of 15 ± 1 Ts in the TAT motif. To elucidate its impact on promoter performance, the TAT14 motif was selected for deletion (pG1-TAT14) and mutation (to T16, T18 and T20; pG1-T16, pG1-T18, pG1-T20). Primers (see primers #37-42 in Table 4) were initially designed to obtain T18, T20 and T22, but variants with different lengths (T16, T20 and T18, respectively) were obtained and used. Deletion of the TAT14 motif resulted in lower GFP signals, whereas its elongation increased the expression strength of pG1. This indicates that the use of an elongated TAT14 motif would be beneficial for pG1 engineering.

    Example 6: Partial Sequence Duplications of pG1's Main Regulatory Region Significantly Improve its Expression Strength

    [0381] Two duplication variants (pG1-D1240 (SEQ ID 49) and pG1-D1427 (SEQ ID 85); the numbers state the lengths of the respective promoter variants) of the pG1 promoter were generated by PCR amplification of two sequence fragments (-472 to -188 and -472 to -1) and insertion using the restriction sites PstI and BglII (positions 509-514 and 525-530). The duplicated sections start upstream of the TFBS deleted in pG1-5 and end after the main regulatory region of pG1 for the first variant (pG1-D1240), while the second duplication (pG1-D1427) extends to the 3′ end of the pG1 promoter. These variants were screened for eGFP expression in the same way as described for the TFBS deletion and TAT14 mutation variants (see Example 8). Both duplication variants showed tighter repression in excess glycerol and stronger induction upon limiting glucose (FIG. 4).

    [0382] The post-transformational stability of the duplication variant clone pG1-D1240 #3 was tested by performing three consecutive batch cultivations without selection pressure, which is equal to about 20 generations. eGFP expression was stable over the whole cultivation time (data not shown). In comparison, a typical P. pastoris bioreactor process starts with OD.sub.600=1 (0.2-0.4 g/L YDM) in the batch phase and ends with 100 g/L YDM after the fed batch phase and thereby takes about 10 generations.
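    The generation counts quoted above can be reproduced by counting biomass doublings. The following is a minimal sketch, not the applicants' calculation: it assumes pure exponential doubling and, for the bioreactor case, borrows the 0.4 L start volume and 0.7 L final working volume from Example 7 (the paragraph above states concentrations only). The function name `generations` is illustrative.

```python
import math

def generations(initial_biomass_g, final_biomass_g):
    """Number of doublings needed to grow from the initial to the final biomass."""
    return math.log2(final_biomass_g / initial_biomass_g)

# Assumed volumes (taken from Example 7, not stated in this paragraph).
start_volume_l = 0.4
final_volume_l = 0.7
final_biomass_g = 100.0 * final_volume_l  # 100 g/L YDM at the end of the fed batch

# Start concentration range of 0.2-0.4 g/L YDM (OD600 = 1).
low = generations(0.4 * start_volume_l, final_biomass_g)
high = generations(0.2 * start_volume_l, final_biomass_g)
print(f"bioreactor process: {low:.1f} to {high:.1f} generations")

# Three consecutive batches covering about 20 generations in total
# correspond to 2 ** (20 / 3) fold growth per batch.
print(f"fold growth per batch: {2 ** (20 / 3):.0f}")
```

    Under these assumptions the estimate lands slightly below 10 doublings, which is in line with the "about 10 generations" stated for a typical process.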

    Example 7: Verification of pG1 Promoter Variant Performance in Fed Batch Bioreactor Cultivation

    [0383] In order to verify the performance of the generated promoter variants in bioprocess conditions, some variants were selected for fed batch cultivation based on their altered eGFP expression performance: pG1-2 (SEQ ID 211) was the most enhanced variant upstream of the main regulatory region, and pG1-T16 (SEQ ID 257) and pG1-D1240 (SEQ ID 49) showed higher eGFP expression levels in limiting glucose without losing promoter repression in the glycerol condition. A bioreactor cultivation, which was started with a glycerol batch phase followed by a space-time yield optimized fed batch (Prielhofer et al., 2013), was performed for one clone each and compared to the control strain pG1 #8 for eGFP expression (see FIG. 5 and Table 5).

    [0384] Fed batch fermentations were performed in DASGIP reactors with a final working volume of 0.7 L.

    [0385] The following media were used:

    [0386] PTM.sub.1 Trace Salts Stock Solution Contained Per Liter

    [0387] 6.0 g CuSO.sub.4.5H.sub.2O, 0.08 g NaI, 3.36 g MnSO.sub.4.H.sub.2O, 0.2 g Na.sub.2MoO.sub.4.2H.sub.2O, 0.02 g H.sub.3BO.sub.3, 0.82 g CoCl.sub.2, 20.0 g ZnCl.sub.2, 65.0 g FeSO.sub.4.7H.sub.2O, 0.2 g biotin and 5.0 ml H.sub.2SO.sub.4 (95%-98%).

    [0388] Glycerol Batch Medium Contained Per Liter

    [0389] 2 g Citric acid monohydrate (C.sub.6H.sub.8O.sub.7.H.sub.2O), 39.2 g Glycerol, 12.6 g NH.sub.4H.sub.2PO.sub.4, 0.5 g MgSO.sub.4.7H.sub.2O, 0.9 g KCl, 0.022 g CaCl.sub.2.2H.sub.2O, 0.4 mg biotin and 4.6 ml PTM1 trace salts stock solution. HCl was added to set the pH to 5.

    [0390] Glucose Fed Batch Medium Contained Per Liter

    [0391] 464 g glucose monohydrate, 5.2 g MgSO.sub.4.7H.sub.2O, 8.4 g KCl, 0.28 g CaCl.sub.2.2H.sub.2O, 0.34 mg biotin and 10.1 mL PTM1 trace salts stock solution.

    [0392] The dissolved oxygen was controlled at DO=20% with the stirrer speed (400-1200 rpm). The aeration rate was 24 L h.sup.-1 air, the temperature was controlled at 25° C. and the pH setpoint of 5 was controlled by addition of NH.sub.4OH (25%).

    [0393] To start the fermentation, 400 mL batch medium was sterile filtered into the fermenter and inoculated from a selective pre-culture of the respective P. pastoris clone to a starting optical density (OD600) of 1. The batch phase of approximately 25 h (reaching a dry biomass concentration of approximately 20 g/L) was followed by a glucose-limited fed batch starting with an exponential feed for 7 h and a constant feed rate of 15 g/L for 13 h, leading to a final dry biomass concentration of approximately 100 g/L. Samples were taken during the batch and fed batch phases and analyzed for eGFP expression using a plate reader (Infinite 200, Tecan, CH). For this purpose, samples were diluted to an optical density (OD600) of 5. Results are shown in FIG. 5 as relative fluorescence per bioreactor (FL/r).

    [0394] The gene copy number of these three clones was analyzed using real-time PCR, which showed a GCN of one for all of them (data not shown). All pG1 variants displayed good repression in the batch phase and strong expression in the induced state (Table 5). The strong improvement of the duplication variant pG1-D1240 could be verified in bioreactor conditions: the clone pG1-D1240 #3 showed a 50% increase in GFP fluorescence at the fed batch end compared to pG1. Although the signal was already increased at the batch end, the induction ratio was even slightly higher than for the original pG1. In contrast to the screening, the clone pG1-2 #3 had a slightly increased signal at the batch end and an about 10% weaker signal at the fed batch end. The TAT14 mutation variant clone pG1-T16 #3 showed the strongest signal at the batch end and fell behind the duplication variant at the fed batch end, reaching about 20% improvement over the control pG1 #8, similar to the screening result. The different induction behavior of the clones in the batch phase is explained by derepression due to the decreasing glycerol concentration throughout the batch phase (see FIG. 5A). Overall, the fed batch cultivations largely confirmed the results obtained in small scale screening.

    ACHIEVEMENTS AND CONCLUSIONS

    [0395] Gene promoters with carbon source-dependent regulation are favorable for bioprocess application because the production phase can be separated from growth. Potential promoter-based protein production improvement can be accomplished by finding the optimal growth conditions (e.g. growth rate, feeding strategy) or by directly manipulating the promoter sequence (e.g. mutations, deletions).

    [0396] Several pG1 promoter variants were constructed with shortened length, TFBS deletions, TAT motif mutations and fragment duplications. Thereby, the main regulatory region of pG1, including its important TFBS, was identified. The analysis of TFBS deletions indicates that the transcription factors Rgt1 and Cat8-1 and/or Cat8-2 play an essential role in pG1 repression and induction: two motifs consisting of F$RGT1 and F$CSRE binding sites at the same position on opposite strands were deleted. Deletion of the first part (pG1-8, position -293 to -285; RGT1: (+) -310 to -299, CSRE: (-) -299 to -285) caused weakened promoter induction, while deletion of the second part (pG1-9, position -275 to -261; RGT1: (-) -275 to -259, CSRE: (+) -276 to -260) led to decreased promoter repression. Thereby, regulatory motifs were identified which are essential and characteristic for pG1 regulation.

    [0397] The role of the transcriptional regulators Mig1 (F$MIG1) and Mxr1 (F$ADR1) might be more important in other conditions such as excess glucose or methanol induction. Other transcription factors which bind in or close to that region might also contribute to pG1's regulation.

    [0398] Poly(A:T) tracts are known to play a role in promoter sequences, and the TAT motif in pG1, which is located upstream (e.g. position -390 to -375) of the main regulatory region, could be shown to be essential for its strength. Elongation of this motif to T16, T18 and T20 had a positive effect on promoter performance.

    [0399] Deletion variants of pG1 revealed that 5′ shortening might be beneficial for promoter engineering as well. Deletion of TFBS for Mxr1, Mig1, Rgt1 and Cat8 upstream of the main regulatory region of pG1 improved eGFP expression, although this effect was not seen for the 5′-shortened promoter variants.

    [0400] Two variants with partial sequence duplications reached greatly enhanced expression capacities compared to the wild type pG1.

    [0401] Distinct features underlying the good expression performance of pG1 could be assigned, which provides a solid basis for rational promoter engineering: 5′ shortening, TAT motif use with optional mutation/elongation, and fragment duplication. pG1 variant performance in small scale screening could successfully be verified in fed batch cultivations.

    Abbreviations

    [0402] CSRE: carbon source response element, F$: fungi specific TF matrix, GCN: gene copy number, GOI: gene of interest, Pp: Pichia pastoris, Sc: Saccharomyces cerevisiae, TF: transcription factor(s), TFBS: transcription factor binding site(s), YDM: yeast dry mass

    Example 8: Determining the Repression, Induction, pG1-x Expression Level (Expression Level Compared to pG1), Induction Ratio

    [0403] The promoter strength as compared to the pG1 promoter and the induction ratio can be determined by the following standard assay: P. pastoris strains are screened in 24-deep well plates at 25° C. with shaking at 280 rpm with 2 mL culture per well. Glucose feed beads (6 mm, Kuhner, CH) are used to generate glucose-limiting growth conditions. Cells are analyzed for eGFP expression during repression (YP+1% glycerol, exponential phase) and induction (YP+1 feed bead, for 20-28 hours) using flow cytometry. The specific eGFP fluorescence is calculated from fluorescence intensity and forward scatter for at least 3000 data points of the flow cytometry data. Forward scatter is a relative measure for the cell volume. Specific eGFP fluorescence equals fluorescence intensity (FI) divided by forward scatter (FSC) to the power of 1.5, that is FI/FSC.sup.1.5 (Hohenblum, H., N. Borth & D. Mattanovich, (2003) Assessing viability and cell-associated product of recombinant protein producing Pichia pastoris with flow cytometry. J Biotechnol 102: 281-290). From this data, the geometric mean of the population's specific fluorescence is used and normalized by subtracting the background signal of non-producing P. pastoris wild type cells. The specific eGFP fluorescence of the glycerol condition is termed Repression, and the specific eGFP fluorescence of the limited glucose condition (glucose feed beads) is termed Induction. Only Repression and Induction values of the same screening and flow cytometry measurement can be compared and used for calculations. To determine relative pG1-x promoter strength, the eGFP expression level in the induced state of a pG1-x promoter is compared to that of the original pG1 promoter by dividing the Induction value of a strain comprising the pG1-x promoter by the Induction value of a strain comprising the original pG1 promoter. The Induction ratio is calculated by dividing the Induction value by the Repression value of the same strain/promoter. Repression, Induction, relative pG1-x promoter strength and Induction ratio are shown in Table 6 for several promoter variants.
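    The calculation described in this assay can be sketched in a few lines. This is an illustrative outline only, not the applicants' analysis script: the function names, the `(FI, FSC)` tuple layout and the toy numbers are assumptions, and gating or export of the flow cytometry events is not covered.

```python
import math

def specific_fluorescence(fi, fsc):
    """Specific eGFP fluorescence of one event: FI / FSC^1.5."""
    return fi / fsc ** 1.5

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def population_signal(events, background, min_events=3000):
    """Geometric mean of the specific fluorescence of all (FI, FSC) events,
    normalized by subtracting the wild-type background signal."""
    if len(events) < min_events:
        raise ValueError("the assay asks for at least 3000 data points")
    specific = [specific_fluorescence(fi, fsc) for fi, fsc in events]
    return geometric_mean(specific) - background

def relative_promoter_strength(induction_variant, induction_pg1):
    """Induction value of the pG1-x strain divided by that of the pG1 strain."""
    return induction_variant / induction_pg1

def induction_ratio(induction, repression):
    """Induction value divided by the Repression value of the same strain."""
    return induction / repression

# Toy data (synthetic, two events only, hence min_events=2).
toy_events = [(8.0, 4.0), (64.0, 16.0)]
toy_signal = population_signal(toy_events, background=0.25, min_events=2)
```

    As the assay text stresses, the Repression and Induction values fed into the two ratio functions should come from the same screening and the same flow cytometry measurement.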

    [0404] Further examples have shown that, by using a pG1-x promoter comprising or consisting of the nucleotide sequence SEQ ID 49, a model protein (POI) was produced in P. pastoris at much higher yields (more than 3.5-fold increase in fed-batch experiments) as compared to the unmodified pG1 promoter (reference SEQ ID 7).

    Example 9: Comparison of Speed Fermentation and Standard Fermentation

    [0405] Summary: Significantly reduced fermentation times could be obtained for the expression of an alternative scaffold protein as model protein under control of a pG1-3 embodiment of SEQ ID 39 (pG1-D1240 (SEQ ID 49)) promoter by employing a space-time yield optimized fed batch protocol instead of using a standard fed batch regime.

    [0406] A clone expressing a model protein under control of pG1-D1240 (SEQ ID 49) was selected for the fed batch cultivations. Fed batch cultivations were performed in DASGIP reactors (Eppendorf, Germany) with a final working volume of 0.5 L. Media and trace element solution were prepared as previously described in Example 7, except for the glycerol concentration in the glycerol batch medium, which was 45 g/L. During cultivation the dissolved oxygen level was controlled at DO=30% with the stirrer speed (400-1200 rpm). The aeration rate was 1 vvm air, the temperature was controlled at 25° C. and the pH set-point of 5.0 was controlled by addition of NH.sub.4OH (25%). To start the bioreactor cultivation, 250 mL batch medium was inoculated from a pre-culture of the respective P. pastoris clone to a starting optical density (OD600) of 1.0. The batch phase on glycerol took approximately 30 h and reached a dry biomass concentration of 25-29 g/L. The glycerol batch phase was followed by a glucose-limited fed batch. Two different fed batch cultivation modes were compared: (A) a standard fed batch protocol using a constant feed rate, and (B) a space-time yield optimized fed batch protocol (Speed fermentation), where the glucose feed rate was optimized to maximize the volumetric productivity of the fermentation.

    [0407] For the standard cultivation, a constant glucose feed rate of 1.25 mL h.sup.-1 was selected. The fed batch cultivation was maintained for 100 h (126 h total cultivation time), resulting in a final dry biomass concentration of approximately 90 g L.sup.-1. For the Speed fermentation, a model-based optimization algorithm (Maurer et al., Microbial Cell Factories, 2006, 5:37) was adopted, where the optimized volumetric glucose feed rate F(t) was approximated by a linearly increasing function: F(t) [mL h.sup.-1]=0.3234 mL h.sup.-2*t+3.3921 mL h.sup.-1. The fed batch phase was maintained for t=33 h (60 h total cultivation time), which resulted in a final dry biomass concentration of approximately 140 g L.sup.-1.
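    The two feed regimes can be compared numerically from the coefficients given above. The sketch below is illustrative: the function names are assumptions, the cumulative volume is simply the integral of the stated feed rate, and sampling, base addition and evaporation are ignored.

```python
SLOPE_ML_PER_H2 = 0.3234      # slope of the linear Speed feed profile
INTERCEPT_ML_PER_H = 3.3921   # feed rate at the start of the fed batch (t = 0)
STANDARD_RATE_ML_PER_H = 1.25

def speed_feed_rate(t_h):
    """Speed fermentation feed rate F(t) in mL/h, t in hours since feed start."""
    return SLOPE_ML_PER_H2 * t_h + INTERCEPT_ML_PER_H

def speed_feed_volume(t_h):
    """Cumulative feed volume in mL: integral of F(t) from 0 to t."""
    return 0.5 * SLOPE_ML_PER_H2 * t_h ** 2 + INTERCEPT_ML_PER_H * t_h

def standard_feed_volume(t_h):
    """Cumulative feed volume in mL for the constant-rate standard protocol."""
    return STANDARD_RATE_ML_PER_H * t_h

print(f"Speed feed rate at 33 h: {speed_feed_rate(33):.2f} mL/h")
print(f"Speed feed volume after 33 h: {speed_feed_volume(33):.0f} mL")
print(f"Standard feed volume after 100 h: {standard_feed_volume(100):.0f} mL")
```

    With these numbers the Speed protocol delivers more feed in 33 h than the standard protocol does in 100 h, which fits the higher final biomass concentration reported for it.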

    [0408] Samples were taken at the end of the batch and during the fed batch phase. Product titers were analyzed from clarified supernatants using a HT low MW protein express reagent kit and the Caliper LabChip GI system (Perkin Elmer, USA). As a reference standard for absolute quantification a purified standard of alternative scaffold protein was used.

    [0409] FIG. 9 shows the product and biomass generation over the total cultivation time for the standard cultivation (A) and the Speed fermentation (B). Final product titers of 6.4 g L.sup.-1 and 4.3 g L.sup.-1 were reached after 60 h and 126 h for the Speed fermentation and the standard fermentation, respectively. In other words, a 1.4-fold higher titer (resp. 1.2-fold higher broth titers) was obtained in a significantly shorter fermentation time (66 h shorter) when supplementing the glucose feed during expression under the pG1-D1240 (SEQ ID 49) promoter as described for the Speed fermentation instead of using the described standard feed regime.
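    The benefit of the Speed protocol is easiest to see as space-time yield, i.e. titer divided by total cultivation time. The short sketch below uses only the titers and times quoted above; the function name and the use of total cultivation time (batch plus fed batch) as the denominator are assumptions made for illustration.

```python
def space_time_yield(titer_g_per_l, total_time_h):
    """Average volumetric productivity in g per liter per hour."""
    return titer_g_per_l / total_time_h

speed_sty = space_time_yield(6.4, 60)       # Speed fermentation
standard_sty = space_time_yield(4.3, 126)   # standard fermentation
time_saved_h = 126 - 60

print(f"Speed:    {speed_sty:.3f} g/(L h)")
print(f"Standard: {standard_sty:.3f} g/(L h)")
print(f"Ratio:    {speed_sty / standard_sty:.1f}-fold, {time_saved_h} h saved")
```

    On this simple measure the Speed fermentation is roughly three times as productive per unit time as the standard regime.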

    [0410] Tables

    TABLE-US-00004 TABLE 1 TFBS identified in the pG1 promoter sequence using MatInspector. Targeted carbon source-related TFBS of the pG1 deletion variants are shown in bold.
    Matrix family | Detailed family information | Matrix | Detailed matrix information | Start position | End position | Strand | Sequence | SEQ ID NO.
    F$TEAF | TEA/ATTS DNA binding domain factors | F$ABAA.01 | Aspergillus spore/developmental regulator | 985 | 969 | - | accctaCATTctactgg | 271
    F$NRGF | NRG zinc finger factors | F$NRG1.01 | Transcriptional repressor Nrg1 | 976 | 964 | + | tgtAGGGtcccca | 272
    F$YSTR | Yeast stress response elements | F$MSN2.01 | Transcriptional activator for genes in multistress response | 956 | 942 | - | gagactaGGGGgagc | 273
    F$PDRE | Pleiotropic drug resistance responsive elements | F$PDRE.01 | Pleiotropic drug resistance responsive element (yeast) | 944 | 936 | - | TCCCtggag | 274
    F$YMAT | Yeast mating factors | F$HMRA2.01 | Hidden Mat Right A2, a2 is one of two genes encoded by the a mating type cassette in S. cerevisiae | 939 | 927 | + | gggaaaTGTAaaa | 275
    F$MADS | Yeast MADS-Box factors | F$RLM1.01 | Yeast MADS-Box RLM1 transcription factor | 926 | 908 | - | gtttTCTAttagcagtata | 276
    O$INRE | Core promoter initiator elements | O$DINR.01 | Drosophila initiator motifs | 899 | 889 | + | gcTCAGttgtc | 277
    F$RFXP | Regulatory factor X protein, homologous to mammalian RFX1-5 | F$RFX1.02 | RFX1 (CRT1), acts by recruiting Ssn6 and Tup1, general repressors to the promoters of damage-inducible genes | 896 | 882 | - | ttatcctgaCAACtg | 278
    F$HOMD | Homeodomain-containing transcriptional regulators | F$YOX1.02 | Yeast homeobox 1, homeodomain-containing transcriptional repressor | 889 | 875 | - | aacgtaATTAtcctg | 279
    F$HOMD | Homeodomain-containing transcriptional regulators | F$YOX1.02 | Yeast homeobox 1, homeodomain-containing transcriptional repressor | 888 | 874 | + | aggataATTAcgttc | 280
    O$MTEN | Core promoter motif ten elements | O$DMTE.01 | Drosophila motif ten element | 888 | 868 | - | acagtcgAACGtaattatcct | 281
    F$BZIP | Fungal basic leucine zipper family | F$CST6.01 | Chromosome stability, bZIP transcription factor of the ATF/CREB family (ACA2) | 885 | 865 | - | actacagtcgaACGTaattat | 282
    F$MADS | Yeast MADS-Box factors | F$RLM1.01 | Yeast MADS-Box RLM1 transcription factor | 855 | 837 | - | tcttTCTAacaatacagat | 283
    F$YMAT | Yeast mating factors | F$MATALPHA2.02 | Homeodomain transcriptional repressor Matalpha2 | 853 | 841 | + | ctgtaTTGTtaga | 284
    F$MMAT | M-box interacting with Mat1-Mc | F$MAT1MC.01 | HMG-BOX protein interacts with M-box site, cooperativity with HMG-Box STE11 protein | 852 | 842 | + | tgtATTGttag | 285
    F$STPF | STP gene family | F$STP2.01 | Proteolytically activated transcription factor | 828 | 814 | - | gcggcGCCGtaaaaa | 286
    F$STPF | STP gene family | F$STP2.01 | Proteolytically activated transcription factor | 823 | 809 | + | acggcGCCGccatat | 287
    F$YADR | Yeast metabolic regulator | F$ADR1.01 | Alcohol Dehydrogenase Regulator, carbon source-responsive zinc-finger transcription factor | 785 | 777 | + | aaCCCCact | 288
    F$RFXP | Regulatory factor X protein, homologous to mammalian RFX1-5 | F$RFX1.01 | RFX1 (CRT1) is a DNA-binding protein that acts by recruiting Ssn6 and Tup1, general repressors to the promoters of damage-inducible genes | 763 | 749 | - | cgtgtataGCAAcag | 289
    F$YMCB | Yeast MluI cell cycle box | F$SWI4.01 | DNA binding component of the SBF (SCB binding factor) complex (Swi4p-Swi6p) | 756 | 744 | + | tatacaCGAAcca | 290
    F$CYTO | Activator of cytochrome C | F$HAP1.01 | HAP1, S. cerevisiae member of GAL family, regulates heme dependent cytochrome expression | 715 | 701 | + | ctgaagtcATCGgtt | 291
    F$FKHD | Fungal fork head transcription factors | F$FKH1.01 | Fork head transcription factor Fkh1 | 709 | 693 | + | tcatcggTTAAcaatca | 292
    F$ROX1 | Repressor of hypoxic genes | F$ROX1.01 | Heme-dependent transcriptional repressor of hypoxic genes | 704 | 692 | - | ttgaTTGTtaacc | 293
    F$YMAT | Yeast mating factors | F$MATALPHA2.02 | Homeodomain transcriptional repressor Matalpha2 | 703 | 691 | - | cttgaTTGTtaac | 294
    F$MMAT | M-box interacting with Mat1-Mc | F$MAT1MC.01 | HMG-BOX protein interacts with M-box site, cooperativity with HMG-Box STE11 protein | 702 | 692 | - | ttgATTGttaa | 295
    F$YHSF | Yeast heat shock factors | F$HSF1.01 | Trimeric heat shock transcription factor | 678 | 646 | - | aacacctactgaatatGGAAaggagcattcaga | 296
    F$PHD1 | Pseudohyphal determinant 1 | F$PHD1.03 | Transcription factor involved in regulation of filamentous growth | 635 | 623 | - | gcaGTGCatgcaa | 297
    F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters | 628 | 612 | + | cactgCGGAagaattag | 298
    F$CSRE | Carbon source-responsive elements | F$CSRE.01 | Carbon source-responsive element (yeast) | 626 | 612 | - | ctaattctTCCGcag | 299
    F$YRSC | Yeast transcription factors remodeling chromatin structure | F$RSC3.01 | Component of the RSC chromatin remodeling complex | 614 | 594 | + | tagccaatagCGCGtttcata | 300
    F$YMCB | Yeast MluI cell cycle box | F$STUAP.01 | Aspergillus Stunted protein, (bHLH)-like structure, regulates multicellular complexity during asexual reproduction | 609 | 597 | - | gaaaCGCGctatt | 301
    F$YMCB | Yeast MluI cell cycle box | F$MCB.01 | MluI cell cycle box, activates G1/S-specific transcription (yeast) | 608 | 596 | + | atagCGCGtttca | 302
    F$DUIS | DAL upstream induction sequence | F$DAL82.01 | Transcriptional activator for allantoin catabolic genes | 597 | 589 | + | cataTGCGc | 303
    F$PHD1 | Pseudohyphal determinant 1 | F$PHD1.02 | Transcription factor involved in regulation of filamentous growth | 597 | 585 | + | cataTGCGctttt | 304
    F$RDNA | RDNA binding factor | F$REB1.02 | rDNA enhancer binding protein 1, termination factor for RNA polymerase I and transcription factor for RNA polymerase II | 589 | 577 | + | cttTTACcccctc | 305
    F$YMIG | Yeast GC-Box Proteins | F$MIG1.02 | MIG1, zinc finger protein mediates glucose repression | 586 | 568 | - | ttgacaaaagaGGGGgtaa | 306
    F$YSTR | Yeast stress response elements | F$MSN2.01 | Transcriptional activator for genes in multistress response | 586 | 572 | - | caaaagaGGGGgtaa | 307
    F$BZIP | Fungal basic leucine zipper family | F$YAP1.02 | Yeast activator protein of the basic leucine zipper (bZIP) family | 585 | 565 | + | taccccctcttttGTCAagcg | 308
    F$TALE | Fungal TALE homeodomain class | F$TOS8.01 | Homeodomain-containing transcription factor | 579 | 567 | + | ctcttttGTCAag | 309
    F$DUIS | DAL upstream induction sequence | F$DAL82.01 | Transcriptional activator for allantoin catabolic genes | 567 | 559 | - | atttTGCGc | 310
    F$YMIG | Yeast GC-Box Proteins | F$MIG1.01 | MIG1, zinc finger protein mediates glucose repression | 553 | 535 | + | taagatttggtGGGGgtgt | 311
    F$YRAP | Yeast activator of glycolyse genes/repressor of mating type I | F$RAP1.06 | RAP1 (TUF1), activator or repressor depending on context | 546 | 524 | - | gctaacggctcaCACCcccacca | 312
    F$IRTF | Iron-responsive transcriptional activators | F$AFT2.01 | Activator of Fe (iron) transcription 2, iron-regulated transcriptional activator | 543 | 529 | - | cggctcaCACCccca | 313
    O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | Avian C-type LTR TATA box | 530 | 514 | - | ttgtactTCAGctaacg | 314
    F$RRPE | Ribosomal RNA processing element | F$STB3.01 | Ribosomal RNA processing element (RRPE)-binding protein | 504 | 488 | - | tgcagtttTTTCaggga | 315
    F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters | 442 | 426 | - | atatcAGGAaaaacata | 316
    F$GATA | Fungal GATA binding factors | F$GZF3.01 | GATA zinc finger protein Gzf3 | 434 | 420 | + | tcctGATAtgcatca | 317
    F$PHD1 | Pseudohyphal determinant 1 | F$PHD1.01 | Transcription factor involved in regulation of filamentous growth | 430 | 418 | + | gataTGCAtcaaa | 318
    F$YMAT | Yeast mating factors | F$MATA1.01 | Homeodomain protein mating factor a1 | 429 | 417 | - | ttttGATGcatat | 319
    F$ICGG | Inverted CGG triplets spaced preferentially by 10 bp | F$CHA4.01 | Fungal zinc cluster transcription factor Cha4, single triplet | 408 | 388 | + | taaaacctgaatctCCGCtat | 320
    F$MGCM | Monomeric Gal4-class motifs | F$YRR1.01 | Zinc cluster transcription factor, activates genes involved in multidrug resistance (PDR2) | 403 | 387 | - | aatagCGGAgattcagg | 321
    F$RDR1 | Repressor of Drug Resistance 1 | F$RDR1.01 | Repressor of Drug Resistance 1 (transcriptional repressor involved in the control of multidrug resistance) | 399 | 389 | - | tagCGGAgatt | 322
    F$RFXP | Regulatory factor X protein, homologous to mammalian RFX1-5 | F$RFX1.02 | RFX1 (CRT1), acts by recruiting Ssn6 and Tup1, general repressors to the promoters of damage-inducible genes | 366 | 352 | - | ttgtcacgaAAACgg | 323
    F$YMCB | Yeast MluI cell cycle box | F$SWI4.01 | DNA binding component of the SBF (SCB binding factor) complex (Swi4p-Swi6p) | 364 | 352 | - | ttgtcaCGAAaac | 324
    F$BZIP | Fungal basic leucine zipper family | F$YAP1.02 | Yeast activator protein of the basic leucine zipper (bZIP) family | 361 | 345 | - | tggaaattaatttGTCAcgaa | 325
    F$RRPE | Ribosomal RNA processing element | F$STB3.01 | Ribosomal RNA processing element (RRPE)-binding protein | 359 | 347 | - | aattaattTGTCacgaa | 326
    F$TALE | Fungal TALE homeodomain class | F$CUP9.01 | Homeodomain transcriptional repressor Cup9 | 361 | 341 | - | ttaattTGTCacg | 327
    F$HOMD | Homeodomain-containing transcriptional regulators | F$YOX1.01 | Yeast homeobox 1, homeodomain-containing transcriptional repressor | 358 | 344 | - | aaattAATTtgtcac | 328
    F$HOMD | Homeodomain-containing transcriptional regulators | F$YOX1.01 | Yeast homeobox 1, homeodomain-containing transcriptional repressor | 357 | 343 | + | tgacaAATTaatttc | 329
    F$ICGG | Inverted CGG triplets spaced preferentially by 10 bp | F$TEA1.01 | Ty1 enhancer activator, zinc cluster DNA-binding protein | 357 | 337 | + | tgacaaaTTAAtttccaacgg | 330
    F$MGCM | Monomeric Gal4-class motifs | F$YRR1.01 | Zinc cluster transcription factor, activates genes involved in multidrug resistance (PDR2) | 352 | 336 | - | cccgtTGGAaattaatt | 331
    F$ASG1 | Activator of stress genes | F$ASG1.01 | Fungal zinc cluster transcription factor Asg1 | 340 | 324 | - | tCCGGacaagaccccgt | 332
    F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters | 337 | 321 | - | ttatcCGGAcaagaccc | 333
    F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters | 330 | 320 | + | ttgtcCGGAtaagagaa | 334
    F$RDR1 | Repressor of Drug Resistance 1 | F$RDR1.01 | Repressor of Drug Resistance 1 (transcriptional repressor involved in the control of multidrug resistance) | 332 | 316 | + | gtcCGGAtaag | 335
    F$GATA | Fungal GATA binding factors | F$GATA.01 | GATA binding factor (yeast) | 329 | 315 | + | tccgGATAagagaat | 336
    F$PRES | Pheromone response elements | F$STE12.01 | Transcription factor activated by a MAP kinase signaling cascade, activates genes involved in mating or pseudohyphal/invasive growth pathways | 315 | 303 | - | taatcaAACAaaa | 337
    F$GATA | Fungal GATA binding factors | F$GAT1.01 | GATA-type Zn finger protein Gat1 | 311 | 297 | - | aacggATAAtcaaac | 338
    F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters | 310 | 294 | - | ccgaaCGGAtaatcaaa | 339
    O$MTEN | Core promoter motif ten elements | O$DMTE.01 | Drosophila motif ten element | 310 | 290 | - | ttatccgAACGgataatcaaa | 340
    F$YORE | Yeast oleate response elements | F$OAF1.01 | Oleate-activated transcription factor, acts alone and as a heterodimer with Pip2p | 307 | 283 | - | cgtccatttaTCCGaacggataatc | 341
    F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters | 299 | 289 | + | ccgttCGGAtaaatgga | 342
    F$YGAL | Yeast GAL4 factor | F$GAL4.01 | GAL4 transcriptional activator in response to galactose induction | 301 | 285 | - | agcaggcgtccatttatCCGAacgg | 343
    F$CSRE | Carbon source-responsive elements | F$SIP4.01 | Zinc cluster transcriptional activator, binds to the carbon source-responsive element (CSRE) of gluconeogenic genes | 299 | 285 | - | tCCATttatccgaac | 344
    F$RDR1 | Repressor of Drug Resistance 1 | F$RDR1.01 | Repressor of Drug Resistance 1 (transcriptional repressor involved in the control of multidrug resistance) | 301 | 277 | + | gttCGGAtaaa | 345
    F$YGAL | Yeast GAL4 factor | F$LAC9.01 | LAC9 binding site, homologous to GAL4 of Saccharomyces cerevisiae | 299 | 275 | + | gttCGGAtaaatggacgcctgctcc | 346
    F$FBAS | Fungi branched amino acid biosynthesis | F$LEU3.02 | LEU3, S. cerevisiae, zinc cluster protein | 275 | 261 | - | taaCCGGaaaaatatgg | 347
    F$CSRE | Carbon source-responsive elements | F$CSRE.01 | Carbon source-responsive element (yeast) | 276 | 260 | + | catattttTCCGgtt | 348
    F$MGCM | Monomeric Gal4-class motifs | F$RGT1.01 | Glucose-responsive transcription factor involved in regulation of glucose transporters | 275 | 259 | - | ataacCGGAaaaatatg | 349
    F$ICGG | Inverted CGG triplets spaced preferentially by 10 bp | F$TEA1.01 | Ty1 enhancer activator, zinc cluster DNA-binding protein | 269 | 249 | - | aggtgggGTAAtaaccggaaa | 350
    F$RDNA | RDNA binding factor | F$REB1.02 | rDNA enhancer binding protein 1, termination factor for RNA polymerase I and transcription factor for RNA polymerase II | 262 | 250 | + | ttaTTACcccacc | 351
    F$YMCM | Yeast cell cycle and metabolic regulator | F$MCM1.02 | Yeast factor MCM1 cooperating with MATalpha factors | 258 | 250 | - | cTTCCaggtggggtaat | 352
    F$YMIG | Yeast GC-Box Proteins | F$MIG1.01 | MIG1, zinc finger protein mediates glucose repression | 260 | 244 | - | cacttccaggtGGGGtaat | 353
    F$YADR | Yeast metabolic regulator | F$ADR1.01 | Alcohol Dehydrogenase Regulator, carbon source-responsive zinc-finger transcription factor | 260 | 242 | + | taCCCCacc | 354
    F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters | 239 | 223 | - | atcccCGGAaaattctg | 355
    F$YMIG | Yeast GC-Box Proteins | F$MIG1.01 | MIG1, zinc finger protein mediates glucose repression | 239 | 221 | + | cagaattttccGGGGatta | 356
    F$ICGG | Inverted CGG triplets spaced preferentially by 10 bp | F$TEA1.01 | Ty1 enhancer activator, zinc cluster DNA-binding protein | 232 | 224 | - | attatccGTAAtccccggaaa | 357
    F$ARPU | Regulator of pyrimidine and purine utilization pathway | F$PPR1.01 | Pyrimidine pathway regulator 1 | 231 | 223 | - | atccgtaatccCCGGaa | 358
    F$PDRE | Pleiotropic drug resistance responsive elements | F$PDRE.01 | Pleiotropic drug resistance responsive element (yeast) | 232 | 216 | - | TCCCcggaa | 359
    F$ARPU | Regulator of pyrimidine and purine utilization pathway | F$PPR1.01 | Pyrimidine pathway regulator 1 | 231 | 215 | + | tccggggattaCGGAta | 360
    F$PDRE | Pleiotropic drug resistance responsive elements | F$PDRE.01 | Pleiotropic drug resistance responsive element (yeast) | 230 | 216 | + | TCCGgggat | 361
    F$CYTO | Activator of cytochrome C | F$HAP1.01 | HAP1, S. cerevisiae member of GAL family, regulates heme dependent cytochrome expression | 233 | 213 | + | ccggggatTACGgat | 362
    F$YQA1 | Neurospora crassa QA1 gene activator | F$QA1F.01 | qa-1F, required for quinic acid induction of transcription in the qa gene cluster | 228 | 208 | + | ggggattacggaTAATacggt | 363
    F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters | 225 | 209 | + | gattaCGGAtaatacgg | 364
    F$CYTO | Activator of cytochrome C | F$HAP1.01 | HAP1, S. cerevisiae member of GAL family, regulates heme dependent cytochrome expression | 221 | 207 | + | acggataaTACGgtg | 365
    F$BZIP | Fungal basic leucine zipper family | F$CIN5.01 | bZIP transcriptional factor of the yAP-1 family that mediates pleiotropic drug resistance and salt tolerance | 208 | 188 | + | tggtctggattaatTAATacg | 366
    F$BZIP | Fungal basic leucine zipper family | F$CIN5.01 | bZIP transcriptional factor of the yAP-1 family that mediates pleiotropic drug resistance and salt tolerance | 203 | 189 | - | cttggcgtattaatTAATcca | 367
    F$HOMD | Homeodomain-containing transcriptional regulators | F$YOX1.02 | Yeast homeobox 1, homeodomain-containing transcriptional repressor | 202 | 188 | - | gtattaATTAatcca | 368
    F$HOMD | Homeodomain-containing transcriptional regulators | F$YOX1.02 | Yeast homeobox 1, homeodomain-containing transcriptional repressor | 203 | 183 | + | ggattaATTAatacg | 369
    F$YABF | Yeast ABF factors | F$ABF1.04 | ARS (autonomously replicating sequence)-binding factor I | 202 | 184 | + | ggATTAattaatacgccaa | 370
    F$PHRR | pH responsive regulators | F$RIM101.01 | Transcriptional repressor involved in response to pH and in cell wall construction | 192 | 176 | + | atacGCCAagtcttaca | 371
    F$PRES | Pheromone response elements | F$STE12.01 | Transcription factor activated by a MAP kinase signaling cascade, activates genes involved in mating or pseudohyphal/invasive growth pathways | 175 | 163 | - | gactgcAACAaaa | 372
    F$FKHD | Fungal fork head transcription factors | F$FKH2.01 | Fork head transcription factor Fkh2 | 148 | 132 | + | gcaataaTAAAcaagat | 373
    F$YCAT | Yeast CCAAT binding factors | F$HAP234.01 | Yeast factor complex HAP2/3/5, homolog to vertebrate NF-Y/CP1/CBF | 124 | 112 | - | ctaatCCAAtaaa | 374
    F$YORE | Yeast oleate response elements | F$ORE.01 | Oleate response element, binding motif of Oaf1 homodimers or Oaf1/Pip2 heterodimers | 120 | 96 | - | CGGGgtcaagctgcaactaatccaa | 375
    F$AAAU | A. nidulans activator of acetate utilization genes | F$FACBCB.01 | FACB, activator of acetate utilization genes with a GAL4-type Zn(II)2Cys6 zinc binuclear cluster | 109 | 93 | + | GCAGcttgaccccgcca | 376
    F$YMIG | Yeast GC-Box Proteins | F$MIG3.01 | Zinc finger transcriptional repressor MIG3 | 104 | 86 | - | ctagctatggcGGGGtcaa | 377
    F$YRAP | Yeast activator of glycolyse genes/repressor of mating type I | F$RAP1.06 | RAP1 (TUF1), activator or repressor depending on context | 74 | 52 | - | tgcatcatctaaCACCcatagca | 378
    F$PHD1 | Pseudohyphal determinant 1 | F$PHD1.03 | Transcription factor involved in regulation of filamentous growth | 60 | 48 | - | caaGTGCatcatc | 379
    O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | Cellular and viral TATA box elements | 31 | 15 | + | gagtaTAAAagatcctt | 380
    F$MGCM | Monomeric Gal4-class motifs | F$LYS14.01 | Transcriptional activator involved in regulation of genes of the lysine biosynthesis pathway | 17 | 1 | - | aagggtGGAAttttaag | 381

    TABLE-US-00005 TABLE 2 Affected TFBS of the pG1 promoter sequence in the deletion mutants pG1-1 to 12.
    Sequence analysis was done using MatInspector from Genomatix. Glucose- and carbon-related TFBS which were selected for deletion are shown in bold and the corresponding ID (1-12) and deleted positions are stated in column 1 and 2.

Deletion | Position | Matrix Family | Family Information | Matrix | Detailed Matrix Information
1 | 785 to 777 | F$YADR | Yeast metabolic regulator | F$ADR1.01 | Alcohol Dehydrogenase Regulator, carbon source-responsive zinc-finger transcription factor
2 | 628 to 612 | F$PHD1 | Pseudohyphal determinant 1 | F$PHD1.03 | Transcription factor involved in regulation of filamentous growth
 |  | F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters
 |  | F$CSRE | Carbon source-responsive elements | F$CSRE.01 | Carbon source-responsive element (yeast)
3 | 586 to 568 | F$RDNA | RDNA binding factor | F$REB1.02 | rDNA enhancer binding protein 1, termination factor for RNA polymerase I and transcription factor for RNA polymerase II
 |  | F$YMIG | Yeast GC-Box Proteins | F$MIG1.02 | MIG1, zinc finger protein mediates glucose repression
 |  | F$YSTR | Yeast stress response elements | F$MSN2.01 | Transcriptional activator for genes in multistress response
 |  | F$BZIP | Fungal basic leucine zipper family | F$YAP1.02 | Yeast activator protein of the basic leucine zipper (bZIP) family
 |  | F$TALE | Fungal TALE homeodomain class | F$TOS8.01 | Homeodomain-containing transcription factor
4 | 553 to 535 | F$YMIG | Yeast GC-Box Proteins | F$MIG1.01 | MIG1, zinc finger protein mediates glucose repression
 |  | F$YRAP | Yeast activator of glycolyse genes/repressor of mating type I | F$RAP1.06 | RAP1 (TUF1), activator or repressor depending on context
 |  | F$IRTF | Iron-responsive transcriptional activators | F$AFT2.01 | Activator of Fe (iron) transcription 2, iron-regulated transcriptional activator
5 | 442 to 426 | F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters
 |  | F$GATA | Fungal GATA binding factors | F$GZF3.01 | GATA zinc finger protein Gzf3
 |  | F$PHD1 | Pseudohyphal determinant 1 | F$PHD1.01 | Transcription factor involved in regulation of filamentous growth
6 | 337 to 316 | F$ASG1 | Activator of stress genes | F$ASG1.01 | Fungal zinc cluster transcription factor Asg1
 |  | F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters
 |  | F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters
 |  | F$RDR1 | Repressor of Drug Resistance 1 | F$RDR1.01 | Repressor of Drug Resistance 1 (transcriptional repressor involved in the control of multidrug resistance)
 |  | F$GATA | Fungal GATA binding factors | F$GATA.01 | GATA binding factor (yeast)
 |  | F$PRES | Pheromone response elements | F$STE12.01 | Transcription factor activated by a MAP kinase signaling cascade, activates genes involved in mating or pseudohyphal/invasive growth pathways
7 | 310 to 299 | F$GATA | Fungal GATA binding factors | F$GAT1.01 | GATA-type Zn finger protein Gat1
 |  | F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters
 |  | O$MTEN | Core promoter motif ten elements | O$DMTE.01 | Drosophila motif ten element
 |  | F$YORE | Yeast oleate response elements | F$OAF1.01 | Oleate-activated transcription factor, acts alone and as a heterodimer with Pip2p
 |  | F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters
 |  | F$YGAL | Yeast GAL4 factor | F$GAL4.01 | GAL4 transcriptional activator in response to galactose induction
8 | 293 to 285 | F$CSRE | Carbon source-responsive elements | F$SIP4.01 | Zinc cluster transcriptional activator, binds to the carbon source-responsive element (CSRE) of gluconeogenic genes
 |  | F$RDR1 | Repressor of Drug Resistance 1 | F$RDR1.01 | Repressor of Drug Resistance 1 (transcriptional repressor involved in the control of multidrug resistance)
 |  | F$YGAL | Yeast GAL4 factor | F$LAC9.01 | LAC9 binding site, homologous to GAL4 of Saccharomyces cerevisiae
 |  | F$FBAS | Fungi branched amino acid biosynthesis | F$LEU3.02 | LEU3, S. cerevisiae, zinc cluster protein
9 | 275 to 261 | F$CSRE | Carbon source-responsive elements | F$CSRE.01 | Carbon source-responsive element (yeast)
 |  | F$MGCM | Monomeric Gal4-class motifs | F$RGT1.01 | Glucose-responsive transcription factor involved in regulation of glucose transporters
 |  | F$ICGG | Inverted CGG triplets spaced preferentially by 10 bp | F$TEA1.01 | Ty1 enhancer activator, zinc cluster DNA-binding protein
 |  | F$RDNA | RDNA binding factor | F$REB1.02 | rDNA enhancer binding protein 1, termination factor for RNA polymerase I and transcription factor for RNA polymerase II
 |  | F$YMCM | Yeast cell cycle and metabolic regulator | F$MCM1.02 | Yeast factor MCM1 cooperating with MATalpha factors
10 | 258 to 242 | F$YMIG | Yeast GC-Box Proteins | F$MIG1.01 | MIG1, zinc finger protein mediates glucose repression
 |  | F$YADR | Yeast metabolic regulator | F$ADR1.01 | Alcohol Dehydrogenase Regulator, carbon source-responsive zinc-finger transcription factor
11 | 239 to 221 | F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters
 |  | F$YMIG | Yeast GC-Box Proteins | F$MIG1.01 | MIG1, zinc finger protein mediates glucose repression
 |  | F$ICGG | Inverted CGG triplets spaced preferentially by 10 bp | F$TEA1.01 | Ty1 enhancer activator, zinc cluster DNA-binding protein
 |  | F$ARPU | Regulator of pyrimidine and purine utilization pathway | F$PPR1.01 | Pyrimidine pathway regulator 1
 |  | F$PDRE | Pleiotropic drug resistance responsive elements | F$PDRE.01 | Pleiotropic drug resistance responsive element (yeast)
 |  | F$ARPU | Regulator of pyrimidine and purine utilization pathway | F$PPR1.01 | Pyrimidine pathway regulator 1
 |  | F$PDRE | Pleiotropic drug resistance responsive elements | F$PDRE.01 | Pleiotropic drug resistance responsive element (yeast)
 |  | F$CYTO | Activator of cytochrome C | F$HAP1.01 | HAP1, S. cerevisiae member of GAL family, regulates heme dependent cytochrome expression
 |  | F$YQA1 | Neurospora crassa QA1 gene activator | F$QA1F.01 | qa-1F, required for quinic acid induction of transcription in the qa gene cluster
12 | 220 to 209 | F$MGCM | Monomeric Gal4-class motifs | F$RGT1.02 | Glucose-responsive transcription factor involved in regulation of glucose transporters
 |  | F$CYTO | Activator of cytochrome C | F$HAP1.01 | HAP1, S. cerevisiae member of GAL family, regulates heme dependent cytochrome expression

    TABLE-US-00006 TABLE 3 Positions and TFBS deletions of pG1 TFBS deletion variants
    Targeted and affected TFBS in pG1 TFBS deletion variants (pG1-1 to 12) are listed. Targeted carbon source-related TFBS are shown in bold. Detailed information for all TFBS and for the deleted TFBS is provided in Table 1 and Table 2, respectively.

pG1- | Position | TFBS Deletions (TF Matrices)
1 | 785 to 777 | F$ADR1.01
2 | 628 to 612 | F$PHD1.03, F$RGT1.02, F$CSRE.01
3 | 586 to 568 | F$REB1.02, F$MIG1.02, F$MSN2.01, F$YAP1.02, F$TOS8.01
4 | 553 to 535 | F$MIG1.01, F$RAP1.06, F$AFT2.01
5 | 442 to 426 | F$RGT1.02, F$GZF3.01, F$PHD1.01
6 | 337 to 316 | F$ASG1.01, F$RGT1.02, F$RGT1.02, F$RDR1.01, F$GATA.01
7 | 310 to 299 | F$STE12.01, F$GAT1.01, F$RGT1.02, O$DMTE.01, F$OAF1.01
8 | 293 to 285 | F$OAF1.01, F$RGT1.02, F$GAL4.01, F$SIP4.01, F$RDR1.01, F$LAC9.01
9 | 275 to 261 | F$LEU3.02, F$CSRE.01, F$RGT1.01, F$TEA1.01
10 | 258 to 242 | F$REB1.02, F$MCM1.02, F$MIG1.01, F$ADR1.01
11 | 239 to 221 | F$RGT1.02, F$MIG1.01, F$TEA1.01, F$PPR1.01, F$PDRE.01, F$PPR1.01, F$PDRE.01
12 | 220 to 209 | F$HAP1.01, F$QA1F.01, F$RGT1.02, F$HAP1.01
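
The deletion map in Table 3 lends itself to simple lookups, for example finding every variant in which a given TF matrix is affected. The following is a minimal sketch: the positions and matrix names are transcribed from Table 3, while the names `DELETIONS`, `deletion_length` and `variants_deleting` are hypothetical helpers introduced here. It assumes the stated positions are inclusive at both ends, which the source does not state explicitly.

```python
# Deletion variants pG1-1 to pG1-12, transcribed from Table 3.
# Each entry: variant number -> ((from_position, to_position), [affected TF matrices]).
DELETIONS = {
    1: ((785, 777), ["F$ADR1.01"]),
    2: ((628, 612), ["F$PHD1.03", "F$RGT1.02", "F$CSRE.01"]),
    3: ((586, 568), ["F$REB1.02", "F$MIG1.02", "F$MSN2.01", "F$YAP1.02", "F$TOS8.01"]),
    4: ((553, 535), ["F$MIG1.01", "F$RAP1.06", "F$AFT2.01"]),
    5: ((442, 426), ["F$RGT1.02", "F$GZF3.01", "F$PHD1.01"]),
    6: ((337, 316), ["F$ASG1.01", "F$RGT1.02", "F$RGT1.02", "F$RDR1.01", "F$GATA.01"]),
    7: ((310, 299), ["F$STE12.01", "F$GAT1.01", "F$RGT1.02", "O$DMTE.01", "F$OAF1.01"]),
    8: ((293, 285), ["F$OAF1.01", "F$RGT1.02", "F$GAL4.01", "F$SIP4.01", "F$RDR1.01", "F$LAC9.01"]),
    9: ((275, 261), ["F$LEU3.02", "F$CSRE.01", "F$RGT1.01", "F$TEA1.01"]),
    10: ((258, 242), ["F$REB1.02", "F$MCM1.02", "F$MIG1.01", "F$ADR1.01"]),
    11: ((239, 221), ["F$RGT1.02", "F$MIG1.01", "F$TEA1.01", "F$PPR1.01", "F$PDRE.01",
                      "F$PPR1.01", "F$PDRE.01"]),
    12: ((220, 209), ["F$HAP1.01", "F$QA1F.01", "F$RGT1.02", "F$HAP1.01"]),
}


def deletion_length(variant: int) -> int:
    """Length of the deleted stretch in bp, ASSUMING inclusive end positions."""
    (start, end), _ = DELETIONS[variant]
    return abs(start - end) + 1


def variants_deleting(matrix: str) -> list[int]:
    """Variant numbers whose deletion affects the given TF matrix."""
    return [v for v, (_, matrices) in sorted(DELETIONS.items()) if matrix in matrices]


if __name__ == "__main__":
    for v in sorted(DELETIONS):
        print(f"pG1-{v}: {deletion_length(v)} bp deleted")
    print("F$RGT1.02 affected in:", variants_deleting("F$RGT1.02"))
```

Under these assumptions, the glucose-responsive F$RGT1.02 matrix is affected in seven of the twelve variants, which is easy to miss when reading the table row by row.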

    TABLE-US-00007 TABLE 4 Primer sequences

# | Name | Product | Sequence (SEQ ID NO.) | T_M
1 | pG1_fw | pG1 | GATAGGGCCCCAAACATTTGCTCCCCCTAGTCTC (SEQ ID 382) | 71
2 | pG1back | pG1/pG1-s | GATACCTGCAGGAAGGGTGGAATTTTAAGGATCTTTTAT (SEQ ID 383) | 70
3 | pG1-858_fw | pG1-s858 | GATAGGGCCCGGAATCTGTATTGTTAGAAAGAACGAGAG (SEQ ID 384) | 71
4 | pG1-663_fw | pG1-s663 | GATAGGGCCCCCATATTCAGTAGGTGTTTCTTGCAC (SEQ ID 385) | 69
5 | pG1-492_fw | pG1-s492 | GATAGGGCCCCTGCAGATAGACTTCAAGATCTCAGG (SEQ ID 386) | 69
6 | pG1-371_fw | pG1-s371 | GATAGGGCCCGACCCCGTTTTCGTGACAAATT (SEQ ID 387) | 70
7 | pG1-328_fw | pG1-s328 | GATAGGGCCCCCGGATAAGAGAATTTTGTTTGATTAT (SEQ ID 388) | 70
8 | pG1-283_fw | pG1-s283 | GATAGGGCCCGCCTGCTCCATATTTTTCCGG (SEQ ID 389) | 71
9 | pG1-211_fw | pG1-s211 | GATAGGGCCCCGGTGGTCTGGATTAATTAATACG (SEQ ID 390) | 68
10 | pG1-66_fw | pG1-s66 | GATAGGGCCCGTGTTAGATGATGCACTTGGATGC (SEQ ID 391) | 68
11 | pG1-1_fw | pG1-1 | GAAAACAGCTTGAACTTTCAAAGGTTCTGTTGCTATACACGAAC (SEQ ID 392) | 69
12 | pG1-1_bw | pG1-1 | GTTCGTGTATAGCAACAGAACCTTTGAAAGTTCAAGCTGTTTTCACACGGCC (SEQ ID 393) | 68
13 | pG1-2_fw | pG1-2 | GTAGGTGTTTCTTGCACTTTTGCATGCCAATAGCGCGTTTCATATGC (SEQ ID 394) | 67
14 | pG1-2_bw | pG1-2 | GCATATGAAACGCGCTATTGGCATGCAAAAGTGCAAGAAACACCTAC (SEQ ID 395) | 68
15 | pG1-3_fw | pG1-3 | CGCGTTTCATATGCGCTTGCGCAAAATGCCTGTAAGATTTG (SEQ ID 396) | 68
16 | pG1-3bw | pG1-3 | CAAATCTTACAGGCATTTTGCGCAAGCGCATATGAAACGCG (SEQ ID 397) | 65
17 | pG1-4_fw | pG1-4 | GTCAAGCGCAAAATGCCTGGAGCCGTTAGCTGAAGTACAACAG (SEQ ID 398) | 65
18 | pG1-4_bw | pG1-4 | CTGTTGTACTTCAGCTAACGGCTCCAGGCATTTTGCGCTTGAC (SEQ ID 399) | 67
19 | pG1-5_fw | pG1-5 | GGGATTCCCACTATTTGGTATTCTGAGCATCAAAACTCTAATCTAAAACCTGAATCTC (SEQ ID 400) | 67
20 | pG1-5_bw | pG1-5 | GAGATTCAGGTTTTAGATTAGAGTTTTGATGCTCAGAATACCAAATAGTGGGAATCCC (SEQ ID 401) | 68
21 | pG1-6_fw | pG1-6 | GTTTTCGTGACAAATTAATTTCCAACGTTTTGTTTGATTATCCGTTCGG (SEQ ID 402) | 65
22 | PG1-6_bw | pG1-6 | CCGAACGGATAATCAAACAAAACGTTGGAAATTAATTTGTCACGAAAAC (SEQ ID 403) | 68
23 | pG1-7_fw | pG1-7 | CCGGATAAGAGAATTTTGTTCGGATAAATGGACGCCTG (SEQ ID 404) | 67
24 | pG1-7_bw | pG1-7 | CAGGCGTCCATTTATCCGAACAAAATTCTCTTATCCGGACAAGACC (SEQ ID 405) | 68
25 | pG1-8_fw | pG1-8 | GAATTTTGTTTGATTATCCGTTCGGCGCCTGCTCCATATTTTTCCG (SEQ ID 406) | 70
26 | pG1-8_bw | pG1-8 | CGGAAAAATATGGAGCAGGCGCCGAACGGATAATCAAACAAAATTC (SEQ ID 407) | 67
27 | pG1-9_fw | pG1-9 | CGGATAAATGGACGCCTGCTCATTACCCCACCTGGAAGTGCC (SEQ ID 408) | 68
28 | PG1-9_bw | pG1-9 | GGCACTTCCAGGTGGGGTAATGAGCAGGCGTCCATTTATCCG (SEQ ID 409) | 70
29 | PG1-10_fw | pG1-10 | GCCTGCTCCATATTTTTCCGGTTATCCCAGAATTTTCCG (SEQ ID 410) | 53
30 | pG1-10_bw | pG1-10 | CGGAAAATTCTGGGATAACCGGAAAAATATGGAGCAGGC (SEQ ID 411) | 69
31 | PG1-11_fw | pG1-11 | TATTACCCCACCTGGAAGTGCCCGGATAATACGGTGGTCTGGATTAAT (SEQ ID 412) | 67
32 | PG1-11_bw | pG1-11 | ATTAATCCAGACCACCGTATTATCCGGGCACTTCCAGGTGGGGTAATA (SEQ ID 413) | 68
33 | PG1-12_fw | pG1-12 | CCAGAATTTTCGGGGGATTATGGTCTGGATTAATTAATACGCCAAGTC (SEQ ID 414) | 68
34 | PG1-12_bw | pG1-12 | GACTTGGCGTATTAATTAATCCAGACCATAATCCCCGGAAAATTCTGG (SEQ ID 415) | 65
35 | pG1-ATAT14_fw | pG1-T14 | CAAAACTCTAATCTAAAACCTGAATCTCCGCGATGACCCCGTTTTCGTGAC (SEQ ID 416) | 67
36 | pG1-ATAT14_bw | pG1-T14 | GTCACGAAAACGGGGTCATCGCGGAGATTCAGGTTTTAGATTAGAGTTTTG (SEQ ID 417) | 69
37 | pG1-TAT18_fw | pG1-T18 | CCTGAATCTCCGCTTTTTTTTTTTTTTTTTTGATGACCCCG (SEQ ID 418) | 70
38 | pG1-TAT18_bw | pG1-T18 | CGGGGTCATCAAAAAAAAAAAAAAAAAAGCGGAGATTCAGG (SEQ ID 419) | 70
39 | pG1-TAT20_fw | pG1-T20 | CCTGAATCTCCGCTTTTTTTTTTTTTTTTTTTTGATGACCCCG (SEQ ID 420) | 70
40 | pG1-TAT20_bw | pG1-T20 | CGGGGTCATCAAAAAAAAAAAAAAAAAAAAGCGGAGATTCAGG (SEQ ID 421) | 70
41 | pG1-TAT22_fw | pG1-T22 | CCTGAATCTCCGCTTTTTTTTTTTTTTTTTTTTTTGATGACCCCG (SEQ ID 422) | 70
42 | pG1-TAT22_bw | pG1-T22 | CGGGGTCATCAAAAAAAAAAAAAAAAAAAAAAGCGGAGATTCAGG (SEQ ID 423) | 70
43 | pG1-d-472_fw | pG1-d1240/-d1427 | GATACTGCAGCTCAGGGATTCCCACTATTTGGTATTC (SEQ ID 424) | 68
44 | pG1-d-188_bw | pG1-d1240 | GATAGATCTCGTATTAATTAATCCAGACCACCG (SEQ ID 425) | 64
45 | pG1-d-1_bw | pG1-d1427 | GATAGATCTAAGGGTGGAATTTTAAGGATCTTTTAT (SEQ ID 426) | 64
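
Several forward/backward primer pairs in Table 4 appear to be exact reverse complements of each other, as is common for overlap-based mutagenesis primers. The following minimal sketch checks this for two pairs whose sequences are transcribed from the table (pG1-2 and pG1-10). The helper names `reverse_complement` and `is_reverse_complement_pair` are hypothetical, and the check is an illustration only: not every pair in the table has this property (the pG1-1 primers, for instance, differ in length).

```python
# Complement lookup for unambiguous DNA bases (upper case only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_reverse_complement_pair(fw: str, bw: str) -> bool:
    """True if the backward primer is exactly the reverse complement of the forward primer."""
    return reverse_complement(fw) == bw


# Primer sequences transcribed from Table 4 (SEQ ID 394/395 and 410/411).
PRIMER_PAIRS = {
    "pG1-2": (
        "GTAGGTGTTTCTTGCACTTTTGCATGCCAATAGCGCGTTTCATATGC",
        "GCATATGAAACGCGCTATTGGCATGCAAAAGTGCAAGAAACACCTAC",
    ),
    "pG1-10": (
        "GCCTGCTCCATATTTTTCCGGTTATCCCAGAATTTTCCG",
        "CGGAAAATTCTGGGATAACCGGAAAAATATGGAGCAGGC",
    ),
}

if __name__ == "__main__":
    for product, (fw, bw) in PRIMER_PAIRS.items():
        print(product, "reverse-complementary:", is_reverse_complement_pair(fw, bw))
```

A check of this kind is also a quick way to catch transcription errors when primer sequences are copied out of a wrapped table.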

    TABLE-US-00008 TABLE 5 Fed batch cultivation of pG1 (herein referred to as pG1 #8) and pG1-x variants (herein also referred to as pG1-variants) expressing eGFP.
    Relative eGFP fluorescence is shown for the batch end and for the fed batch end. The time points were set to 0 at the batch end. A clone expressing eGFP under control of pG1 (#8) was compared to clones expressing under control of a pG1 deletion (pG1-2), a TAT14 mutation (pG1-T16), and a duplication (pG1-D1240) variant. The biomass concentrations (YDM) in the batch and fed batch were as expected.

Clone | Batch end: t [h] | Batch end: YDM [g/L] | Batch end: relative eGFP fluorescence | Batch end: % | Fed batch end: t [h] | Fed batch end: YDM [g/L] | Fed batch end: relative eGFP fluorescence | Fed batch end: %
pG1 #8 | 5.3 | 9.8 | 44 +/- 1 | 100 | 19.5 | 118.6 | 2005 +/- 36 | 100
pG1-2 #3 | 4.6 | 11.0 | 51 +/- 1 | 116 | 19.5 | 110.6 | 1819 +/- 43 | 91
pG1-T16 #3 | 3.0 | 14.2 | 70 +/- 1 | 160 | 19.5 | 113.1 | 2383 +/- 24 | 119
pG1-D1240 #3 | 3.0 | 14.9 | 62 +/- 1 | 141 | 19.5 | 113.3 | 2948 +/- 33 | 147
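
The "%" columns of Table 5 express each clone's eGFP fluorescence relative to the pG1 #8 reference at the same time point. The following minimal sketch recomputes them from the mean fluorescence values transcribed from the table. The names `BATCH_END`, `FED_BATCH_END` and `relative_percent` are hypothetical. Because the published means are already rounded, a recomputed value can differ from the published percentage by about one point.

```python
# Mean relative eGFP fluorescence transcribed from Table 5 (error terms omitted).
BATCH_END = {"pG1 #8": 44, "pG1-2 #3": 51, "pG1-T16 #3": 70, "pG1-D1240 #3": 62}
FED_BATCH_END = {"pG1 #8": 2005, "pG1-2 #3": 1819, "pG1-T16 #3": 2383, "pG1-D1240 #3": 2948}

REFERENCE = "pG1 #8"  # clone expressing eGFP under control of the unmodified pG1


def relative_percent(values: dict[str, float], reference: str = REFERENCE) -> dict[str, int]:
    """Express each clone's fluorescence as a percentage of the reference clone."""
    ref = values[reference]
    return {clone: round(100 * value / ref) for clone, value in values.items()}


if __name__ == "__main__":
    print("Batch end:    ", relative_percent(BATCH_END))
    print("Fed batch end:", relative_percent(FED_BATCH_END))
```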

    TABLE-US-00009 TABLE 6 Promoter strength compared to pG1 and promoter induction ratio of pG1 variants from a comparative deep-well screening.
    The expression strength of the pG1-x variants (induced) is related to the eGFP expression level obtained with the original pG1 promoter. The induction ratio is calculated from the GFP level in the induced and repressed state.

 | pG1 (P_GTH1) | pG1-8 | pG1-9 | pG1-T16 | pG1-T18 | pG1-T20 | pG1-D1240 | pG1-D1427
Repression | 6.1 | 5.8 | 9.4 | 5.4 | 6.7 | 5.3 | 5.3 | 5.5
Induction | 15.3 | 11.0 | 21.4 | 17.0 | 20.8 | 16.2 | 21.6 | 22.9
Expression level | 1.00 | 0.72 | 1.40 | 1.11 | 1.36 | 1.06 | 1.41 | 1.49
Induction ratio | 2.52 | 1.89 | 2.27 | 3.12 | 3.10 | 3.03 | 4.05 | 4.18
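
The two derived rows of Table 6 follow directly from the repressed and induced GFP levels: the expression level is the induced signal relative to pG1, and the induction ratio is the induced signal divided by the repressed signal. The following minimal sketch recomputes both from the values transcribed from the table and applies the 1.1-fold criterion stated in the abstract (increased promoter strength and/or increased induction ratio relative to pG1). The names `TABLE6`, `expression_level`, `induction_ratio` and `meets_threshold` are hypothetical, and because the published inputs are rounded, the recomputed figures match the published rows only approximately.

```python
# (repressed, induced) GFP levels transcribed from Table 6.
TABLE6 = {
    "pG1": (6.1, 15.3),
    "pG1-8": (5.8, 11.0),
    "pG1-9": (9.4, 21.4),
    "pG1-T16": (5.4, 17.0),
    "pG1-T18": (6.7, 20.8),
    "pG1-T20": (5.3, 16.2),
    "pG1-D1240": (5.3, 21.6),
    "pG1-D1427": (5.5, 22.9),
}


def expression_level(variant: str, reference: str = "pG1") -> float:
    """Induced signal of the variant relative to the induced signal of the reference."""
    return TABLE6[variant][1] / TABLE6[reference][1]


def induction_ratio(variant: str) -> float:
    """Induced signal divided by repressed signal."""
    repressed, induced = TABLE6[variant]
    return induced / repressed


def meets_threshold(variant: str, fold: float = 1.1, reference: str = "pG1") -> bool:
    """True if strength and/or induction ratio is at least `fold` times that of the reference."""
    stronger = expression_level(variant, reference) >= fold
    more_inducible = induction_ratio(variant) / induction_ratio(reference) >= fold
    return stronger or more_inducible


if __name__ == "__main__":
    for name in TABLE6:
        print(f"{name}: level={expression_level(name):.2f}, "
              f"ratio={induction_ratio(name):.2f}, meets 1.1-fold={meets_threshold(name)}")
```

With the rounded inputs, pG1-9 qualifies through its expression level alone and pG1-T20 through its induction ratio alone, which illustrates why the criterion is phrased as "and/or".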