Multi-copy gene protein expression system

11851666 · 2023-12-26

Abstract

The present invention belongs to the field of biotechnology, specifically to the field of recombinant protein expression. The present invention focuses on two problems commonly encountered during recombinant protein expression: a low quantity of expressed protein and genetic instability of the cell lines used for recombinant protein expression. The basic principle of the present invention is to introduce into a cell several expression cassettes that all code for the same mature recombinant protein of interest but have different nucleotide sequences. An expression cassette is a polynucleotide sequence which comprises at least a promoter sequence, a start codon, a polynucleotide sequence coding for a protein which is intended to be recombinantly expressed (the protein of interest, POI), a stop codon and a terminator.

Claims

1. A host cell comprising three or more different types of expression cassettes, each expression cassette coding for the same Protein Of Interest (POI) with identical mature amino acid sequence, and each type of expression cassette at least is comprising a promoter sequence, a polynucleotide sequence of the coding sequence of the POI, a terminator sequence, and optionally a signal sequence, wherein said expression cassettes differ in that they comprise (A) (Aa) different promoter sequences, (Ab) different polynucleotide sequences coding for the identical mature amino acid sequence of the POI due to the use of degenerated genetic code, and optionally (Ac) different terminator sequences, and/or (Ad) different signal sequences, if present, or wherein said expression cassettes differ in that they comprise (B) (Ba) the same promoter sequences, (Bb) different polynucleotide sequences coding for the identical mature amino acid sequence of the POI due to the use of degenerated genetic code, and optionally (Bc) different terminator sequences, and/or (Bd) different signal sequences, if present, wherein at least one expression cassette codes for two or more POI with identical mature amino acid sequence, wherein between the coding sequences of said two or more POI is respectively located an internal ribosomal entry site (IRES) sequence.

2. The host cell according to claim 1, wherein in point (Ab) of alternative (A) of claim 1 said different polynucleotide sequences of the coding sequence of the POI are coded by a degenerated genetic code, which degenerated genetic code results in at least 50% of the maximum theoretical polynucleotide sequence difference possible for that particular POI coding polynucleotide sequence, in order to get an identical mature amino acid sequence of said particular POI, or wherein in point (Bb) of alternative (B) said different polynucleotide sequences of the coding sequence of the POI are coded by a degenerated genetic code, which degenerated genetic code results in at least 50% of the maximum theoretical polynucleotide sequence difference possible for that particular POI coding polynucleotide sequence, in order to get an identical mature amino acid sequence of said particular POI.

3. The host cell according to claim 1, wherein in alternative (A) said promoter, said terminator and/or said signal sequences, if present, and in alternative (B) said terminator and/or said signal sequence, if present, respectively differ between the used different expression cassettes by at least 20%, regarding their nucleotide sequence.

4. The host cell according to claim 1, wherein said POI is heterologous to said host cell.

5. The host cell according to claim 1, wherein said different polynucleotide sequences of the coding sequences of the POI at least have a length of 30 nucleotides.

6. The host cell according to claim 1, wherein said host cell is (i) an eukaryotic cell, selected from (a) filamentous fungal cells; (b) yeast cells; (c) mammalian cells; (d) human cells; (e) insect cells; or (ii) a prokaryotic cell.

7. A method of generating a host cell as defined in claim 1, comprising the step of transfecting said host cell with at least three different nucleic acid sequences, wherein each nucleic acid sequence comprises at least one different expression cassette coding for the same mature amino acid sequence of said POI.

8. A method of generating a host cell as defined in claim 1, comprising the step of transfecting said host cell with at least one nucleic acid sequence, wherein said nucleic acid sequence comprises at least three different types of expression cassettes, and each of said expression cassettes is coding for the same mature amino acid sequence of said POI.

9. A nucleic acid comprising at least three different types of expression cassettes, each expression cassette coding for the same Protein Of Interest (POI) with identical mature amino acid sequence, and each type of expression cassette at least is comprising a promoter sequence, a polynucleotide sequence of the coding sequence of the POI, a terminator sequence, and optionally a signal sequence, wherein said expression cassettes differ in that they comprise (A) (Aa) different promoter sequences, (Ab) different polynucleotide sequences coding for the identical mature amino acid sequence of the POI due to the use of degenerated genetic code, and optionally (Ac) different terminator sequences, and/or (Ad) different signal sequences, if present, or wherein said expression cassettes differ in that they comprise (B) (Ba) the same promoter sequences, (Bb) different polynucleotide sequences coding for the identical mature amino acid sequence of the POI due to the use of degenerated genetic code, and optionally (Bc) different terminator sequences, and/or (Bd) different signal sequences, if present, wherein at least one expression cassette codes for two or more POI with identical mature amino acid sequence, wherein between the coding sequences of said two or more POI is respectively located an internal ribosomal entry site (IRES) sequence.

10. A vector comprising at least three different types of expression cassettes, each expression cassette coding for the same Protein Of Interest (POI) with identical mature amino acid sequence, and each type of expression cassette at least is comprising a promoter sequence, a polynucleotide sequence of the coding sequence of the POI, a terminator sequence, and optionally a signal sequence, wherein said expression cassettes differ in that they comprise (A) (Aa) different promoter sequences, (Ab) different polynucleotide sequences coding for the identical mature amino acid sequence of the POI due to the use of degenerated genetic code, and optionally (Ac) different terminator sequences, and/or (Ad) different signal sequences, if present, or wherein said expression cassettes differ in that they comprise (B) (Ba) the same promoter sequences, (Bb) different polynucleotide sequences coding for the identical mature amino acid sequence of the POI due to the use of degenerated genetic code, and optionally (Bc) different terminator sequences, and/or (Bd) different signal sequences, if present, wherein at least one expression cassette codes for two or more POI with identical mature amino acid sequence, wherein between the coding sequences of said two or more POI is respectively located an internal ribosomal entry site (IRES) sequence.

11. A kit comprising the nucleic acid as defined in claim 9 and an instruction manual.

12. A process for the manufacture of a POI, comprising (i) a step of generating a host cell as defined in claim 1, comprising the step of transfecting said host cell (A) with at least three different nucleic acid sequences, wherein each nucleic acid sequence comprises at least one different expression cassette coding for the same mature amino acid sequence of said POI, or (B) with at least one nucleic acid sequence, wherein said nucleic acid sequence comprises at least three different types of expression cassettes, and each of said expression cassettes is coding for the same mature amino acid sequence of said POI, and (ii) a step of obtaining the POI.

13. The process according to claim 12, wherein said POI is a single chain protein or originates from a precursor of a single chain polypeptide.

Description

BRIEF DESCRIPTION OF THE FIGURES

(1) FIG. 1: Vector maps of the vectors used for transfection of yeast cells (Pichia pastoris), wherein the vectors comprise 1, 2, 3 or 4 expression cassettes for the POI and wherein, within one vector, each POI expression cassette always uses different sequences for the promoter sequence, the signal sequence, the GOI sequence (different coding sequences which, due to the degenerated genetic code, always result in the same amino acid sequence of the POI; the GOI variants are termed variant 1 to variant 4, abbreviated var1 to var4), and the terminator sequence. Every yeast vector comprises as the vector backbone a Zeocin antibiotic resistance expression cassette comprising the hybrid promoter working in yeast as well as in E. coli (pILV5 combined with pEM72), followed by the coding sequence of the Zeocin antibiotic resistance (ZeoR), followed by the alcohol oxidase terminator (AODTT), followed by the pUC origin of replication (pUC ori). Only in the case of Y391_1 xGOI is the pUC ori followed by the lectin-like protein terminator sequence (LLPTT).

(2) FIG. 1 A:

(3) Yeast vector Y391_1 xGOI contains, in addition to the vector backbone, the following expression cassette for a GOI, which in this case is a single-chain antibody (scFv): lectin-like protein promoter (pLLP); as the gene of interest (GOI), a single-chain antibody (scFv_var4); and the alcohol dehydrogenase terminator sequence (ADHTT).

(4) FIG. 1B:

(5) Yeast vector Y393_2 xGOI contains, in addition to the vector backbone, the following two expression cassettes for a GOI, which in both cases code for the same amino acid sequence of a single-chain antibody (scFv): (i) glyceraldehyde-3-phosphate dehydrogenase promoter (pGAP), mating factor alpha 2 signal sequence (MFa2SS), as the gene of interest variation 1 of the single-chain antibody (scFv_var1), lectin-like protein terminator sequence (LLPTT); (ii) lectin-like protein promoter (pLLP), as the gene of interest a second variation of the same single-chain antibody (scFv_var4), alcohol dehydrogenase terminator sequence (ADHTT).

(6) FIG. 1C:

(7) Yeast vector Y394_3 xGOI contains, in addition to the vector backbone, the following three expression cassettes for a GOI, which in all three cases code for the same amino acid sequence of a single-chain antibody (scFv): (i) alcohol dehydrogenase promoter (pADH), human serum albumin signal sequence (HSASS), single-chain antibody, variant 2 (scFv_var2), cytochrome c1 terminator sequence (cyc1TT); (ii) glyceraldehyde-3-phosphate dehydrogenase promoter (pGAP), mating factor alpha 2 signal sequence (MFa2SS), single-chain antibody, variant 1 (scFv_var1), lectin-like protein terminator sequence (LLPTT); (iii) lectin-like protein promoter (pLLP), lectin-like protein signal sequence (LLPSS), single-chain antibody, variant 4 (scFv_var4), alcohol dehydrogenase terminator sequence (ADHTT).

(8) FIG. 1D:

(9) Yeast vector Y395_4 xGOI contains, in addition to the vector backbone, the following four expression cassettes for a GOI, which in all four cases code for the same amino acid sequence of a single-chain antibody (scFv): (i) alcohol dehydrogenase promoter (pADH), human serum albumin signal sequence (HSASS), single-chain antibody, variant 2 (scFv_var2), cytochrome c1 terminator sequence (cyc1TT); (ii) glyceraldehyde-3-phosphate dehydrogenase promoter (pGAP), mating factor alpha 2 signal sequence (MFa2SS), single-chain antibody, variant 1 (scFv_var1), lectin-like protein terminator sequence (LLPTT); (iii) lectin-like protein promoter (pLLP), lectin-like protein signal sequence (LLPSS), single-chain antibody, variant 4 (scFv_var4), alcohol dehydrogenase terminator sequence (ADHTT); (iv) translation elongation factor promoter (pTEF), mating factor alpha 4 signal sequence (MFa4SS), single-chain antibody, variant 3 (scFv_var3), alcohol oxidase terminator sequence (AOXTT).

(10) FIG. 2:

(11) Sequences of the expression vectors from FIG. 1.

(12) A) Yeast vector Y391_1 xGOI (SEQ-ID NO.: 1)

(13) B) Yeast vector Y393_2 xGOI (SEQ-ID NO.: 2)

(14) C) Yeast vector Y394_3 xGOI (SEQ-ID NO.: 3)

(15) D) Yeast vector Y395_4 xGOI (SEQ-ID NO.: 4)

(16) FIG. 3:

(17) Vector maps of the vectors used for transfection of mammalian cells (CHO cells), each vector comprising a single expression cassette for the GOI, wherein the GOI is the sequence of a fusion protein consisting of a constant region of an antibody fused to the ligand-binding domain of TNF receptor 2. Each vector furthermore comprises the metabolic selection marker dihydrofolate reductase (DHFR), an enzyme which, for example, allows CHO (Chinese hamster ovary) cells to grow in cell culture medium lacking thymidine, thereby allowing CHO (or other) cells transfected with DHFR-comprising vectors to be selected from non-transfected cells. Furthermore, each vector comprises the sequence of the neomycin resistance gene (NeoR), which allows transformed cells to be selected using the antibiotic neomycin. Furthermore, each vector comprises another antibiotic resistance gene selected from Ampicillin Resistance (AmpR), Spectinomycin Resistance (SpectR) and Chloramphenicol Resistance (CmR). Each vector comprises a different promoter, a different signal sequence and a different terminator sequence within the expression cassette for the GOI.

(18) FIG. 3 A depicts the vector pNT-MG001. Details of the vector elements are shown in Table 7.

(19) FIG. 3 B depicts the vector pNT-MG002. Details of the vector elements are shown in Table 7.

(20) FIG. 3 C depicts the vector pNT-MG003. Details of the vector elements are shown in Table 7.

(21) FIG. 3 D depicts the vector pNT-MG004. Details of the vector elements are shown in Table 7.

(22) FIG. 4:

(23) Sequences of the expression vectors from FIG. 3.

(24) A) Mammalian vector pNT-MG001 (SEQ-ID NO.: 5)

(25) B) Mammalian vector pNT-MG002 (SEQ-ID NO.: 6)

(26) C) Mammalian vector pNT-MG003 (SEQ-ID NO.: 7)

(27) D) Mammalian vector pNT-MG004 (SEQ-ID NO.: 8)

EXAMPLES AND METHODS

(28) Methods for Pichia pastoris Cells

(29) Generation of yeast vectors: The set of vectors contains one vector with one expression cassette, one vector with two different expression cassettes, one vector with three different expression cassettes and one vector with four different expression cassettes. In the vector set, each of the four different expression cassettes has a different nucleotide sequence of the GOI, but the resulting POI has an identical mature amino acid sequence, and each of the four different expression cassettes comprises a different promoter nucleotide sequence, a different signal sequence, and a different terminator nucleotide sequence. FIG. 1A to 1D show the vector maps for these vectors, whereas FIG. 2A to 2D and SEQ-ID-NO. 1, 2, 3, and 4 show the complete nucleotide sequences of these vectors.

(30) The four different nucleotide sequences of the POI are designed by use of the degenerated genetic code. The POI is a single-chain antibody (scFv, ESBA1845; scFv = single-chain variable fragment, an artificial antibody fragment comprising a single polypeptide chain including its antigen-binding domain). Four different variants of said scFv are used, termed scFv_var1, scFv_var2, scFv_var3, and scFv_var4, which all code for an identical amino acid sequence but have different nucleotide sequences due to the use of the degenerated genetic code. The promoter sequences used are the lectin-like protein promoter from Pichia pastoris (pLLP), the GAP promoter (pGAP), the ADH promoter (pADH), and the TEF promoter (pTEF). The secretion signal sequences used for the POI are the signal sequence of lectin-like protein from P. pastoris (LLPSS), the signal sequence of mating factor alpha-4 from S. cerevisiae (MFa4SS), the signal sequence of human serum albumin (HSASS), and the signal sequence of mating factor alpha-2 of S. cerevisiae (MFa2SS). The termination sequences are the termination sequence of alcohol dehydrogenase (ADHTT), the termination sequence of the lectin-like protein from Pichia pastoris (LLPTT), the termination sequence of cytochrome c1 (cyc1TT), and the termination sequence of alcohol oxidase (AOXTT). The yeast cell selection marker used in all vectors is Zeocin-r, expressed by use of the ILV5-promoter, the EM72-signal sequence and the AOD terminator. The pUC ori is used in all yeast expression vectors.
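The principle of using the degenerated genetic code to obtain different nucleotide sequences for one amino acid sequence can be illustrated with a minimal Python sketch. It is not the design procedure used for scFv_var1 to scFv_var4, which is not disclosed here; it simply assumes the standard genetic code, picks synonymous codons at random, and checks that every variant translates back to the same peptide. The peptide `MKLSAVR` and the names `make_variants` and `translate` are hypothetical and chosen for illustration only.

```python
import random
from math import prod

# Standard genetic code: amino acids listed for codons in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Reverse map: amino acid -> list of synonymous codons.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def translate(cds):
    """Translate a coding sequence (length divisible by 3) into amino acids."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def make_variants(protein, n, seed=0):
    """Return n distinct coding sequences that all encode `protein`."""
    max_variants = prod(len(AA_TO_CODONS[aa]) for aa in protein)
    if n > max_variants:
        raise ValueError("not enough synonymous sequences for this protein")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    variants = []
    while len(variants) < n:
        candidate = "".join(rng.choice(AA_TO_CODONS[aa]) for aa in protein)
        if candidate not in variants:
            variants.append(candidate)
    return variants


PEPTIDE = "MKLSAVR"  # hypothetical peptide, not the scFv of the examples
VARIANTS = make_variants(PEPTIDE, 4)
for number, cds in enumerate(VARIANTS, start=1):
    print(f"var{number}: {cds} -> {translate(cds)}")
```

A real design would typically also take the codon usage of the host cell into account; random codon choice is used here only to keep the sketch short.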

(31) Generation of Vectors

(32) The four different expression vectors are designed as depicted in the vector maps of FIG. 1A to 1D, having the vector sequences as depicted in FIG. 2A to 2D and SEQ ID NOs: 1, 2, 3, and 4. All vectors are chemically synthesized using the DNA2.0 synthesis service (now ATUM, Newark, CA, USA).

(33) Transfection of P. pastoris

(34) The four different vectors are transfected individually into the Pichia pastoris yeast cell SSS1. This yeast cell is described in patent application WO2016139279A1 and is genetically identical to Pichia pastoris CBS 7435 and to NRRL Y-11430, except that the ssn6-like gene is disrupted at position 807,480 of chromosome 1 of the P. pastoris CBS 7435 genome by insertion of the expression cassette as described in WO 2016/139270 A1. The complete sequence of CBS 7435 is disclosed in Journal of Biotechnology, 2011, Vol. 154, pages 312-320. The nucleotide sequences are published in GenBank under the following Accession Numbers: Chromosome 1: FR839628.1; Chromosome 2: FR839629.1; Chromosome 3: FR839630.1; Chromosome 4: FR839631.1; Mitochondrion: FR839632.1.

(35) Expression of POI in 48-Deep Well Plates, Semi Quantitative Measurement of POI

(36) The transfections are streaked out and individual transformed clones are cultured in synthetic medium. After 70 hours, cell culture supernatant is removed from the culture, yeast cells and cell debris are removed from the supernatant by centrifugation, and 10 µl of supernatant is loaded and electrophoretically separated on SDS-PAGE gels (Novex NuPage 4-12%, Invitrogen). After staining of the SDS-PAGE gels with Coomassie blue or silver, the protein band of the scFv (ESBA1845), which has a molecular weight of about 26 kDa, is semi-quantitatively determined by scanning and densitometric measurement of the protein band in the gels. The signal intensity gives an estimate of the expression rate of the scFv protein.

(37) The concentration of POI in the supernatant was determined by automated capillary electrophoresis (LabChip GXII-Touch, Perkin Elmer, Waltham, MA, USA) according to the manufacturer's recommendations.

(38) TABLE 5: Expression of POI, measured by Lab-on-a-chip, Perkin Elmer

| Transfected plasmid | Titer at harvest after 120 h [g/L] | Fold increase relative to cells transfected with Y391_1xGOI |
| --- | --- | --- |
| Y391_1xGOI (SEQ ID NO: 1) | 1.2 | 1 |
| Y393_2xGOI (SEQ ID NO: 2) | 1.6 | 1.3 |
| Y394_3xGOI (SEQ ID NO: 3) | 1.7 | 1.4 |
| Y395_4xGOI (SEQ ID NO: 4) | 1.7 | 1.4 |
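The fold increase reported in Table 5 is the titer of each clone divided by the titer obtained with the single-cassette vector Y391_1xGOI. The following short Python sketch recomputes that column from the titers in the table; the function name `fold_increase` and the rounding to one decimal place are assumptions made for illustration.

```python
# Titers at harvest [g/L] as listed in Table 5.
TITERS_G_PER_L = {
    "Y391_1xGOI": 1.2,
    "Y393_2xGOI": 1.6,
    "Y394_3xGOI": 1.7,
    "Y395_4xGOI": 1.7,
}


def fold_increase(titers, reference):
    """Divide every titer by the reference titer and round to one decimal."""
    ref = titers[reference]
    return {name: round(titer / ref, 1) for name, titer in titers.items()}


FOLDS = fold_increase(TITERS_G_PER_L, reference="Y391_1xGOI")
for name, fold in FOLDS.items():
    print(f"{name}: {fold}-fold")
```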

(39) Expression of POI in P. pastoris in Shaker Flasks, Determination of Genetic Stability

(40) The individual P. pastoris clones are cultured in shaker flasks for 4 weeks. The cell culture is diluted with medium when needed in order to ensure growth of the cells. Before and after this 4-week culture, the copy number of the expression cassettes is determined, for example by quantitative PCR (qPCR). Optionally or in addition, the sequence of the expression cassettes is determined by sequencing and the correct size of the PCR-amplified nucleic acids is determined by agarose gel electrophoresis, according to methods known in the art. These experiments are performed in order to determine the genetic stability of the clones.

Methods for CHO Cells

(41) Generation of Vectors

(42) Four different CHO expression vectors are designed, each coding for the same POI. Two different nucleotide sequences coding for the same amino acid sequence of the POI are used (Etanercept var1 and Etanercept var2). Each of the four vectors contains only one expression cassette coding for the POI, one expression cassette for neomycin resistance (antibiotic selection marker), one expression cassette for another antibiotic resistance, and one expression cassette for DHFR (metabolic selection marker needed for growth of the CHO cell line). Within each vector, different promoters and terminators are used for the GOI, the neomycin selection marker, and the DHFR. The nucleotide sequences of the neomycin selection marker and of the DHFR are identical in all four vectors. All vectors are chemically synthesized using the GeneArt synthesis service of Geneart AG (Regensburg, Germany, now belonging to Life Technologies). Details on the vector elements of the different vectors can be found in Table 6, vector maps are depicted in FIGS. 3A to 3D, and the sequences are depicted in FIGS. 4A to 4D and in SEQ ID NOs: 5, 6, 7, and 8.

(43) Each CHO vector comprises only one expression cassette for the POI, and this expression cassette is different in each of the four vectors. In detail, each expression cassette uses a different promoter, a different signal sequence and a different terminator. The POI is always the same. Furthermore, each vector comprises an expression cassette for the metabolic selection marker DHFR (each time coded by the same nucleotide sequence), an expression cassette for the antibiotic selection marker Neomycin R (NeoR) (each time coded by the same nucleotide sequence), and an expression cassette coding for another antibiotic selection marker, which is either a different selection marker, namely Ampicillin Resistance (AmpR), Spectinomycin Resistance (SpectR) or Chloramphenicol Resistance (CmR), or the same selection marker inserted into the vector in a different orientation, in this case the Ampicillin Resistance marker in two different orientations within vectors pNT-MG001 and pNT-MG004. Furthermore, all four vectors contain as a vector backbone a phage f1 sequence and an origin of replication, either pBR322 or p16A, wherein pBR322 is also used in two different orientations within the vectors. An overview of the different vector elements of the mammalian vectors is given in Table 6 below.

(44) TABLE 6: Vector elements of pNT-MG001 to pNT-MG004

| Vector | POI: p | POI: SS | POI | POI: TT | DHFR: p | DHFR: TT | NeoR: p | NeoR: TT | Antibiotic | ori | phage f1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| pNT-MG001 (SEQ ID NO: 5) | CMV + enh. + RK intron | var1 | var2 | bGH | SV40 | synthetic | PGK | BG | AmpR** | pBR322* | yes |
| pNT-MG002 (SEQ ID NO: 6) | SV40 + enh. + Hbb intron II | var2 | var2 | SV40 | EF1a | BG | BG | bGH | SpectR | p16A | yes |
| pNT-MG003 (SEQ ID NO: 7) | EF1a + EF1a first intron | var3 | var2 | BG | CMV | bGH | SV40 | SV40early | CmR | pBR322 | yes |
| pNT-MG004 (SEQ ID NO: 8) | CMV + enh. + RK intron | var1 | var1 | SV40 | SV40 | SV40 | SV40 + enh. | synthetic | AmpR | pBR322 | yes |

p = promoter, SS = signal sequence, TT = terminator, DHFR = dihydrofolate reductase, NeoR = Neomycin R resistance, ori = origin of replication, var1 = variation 1 of sequence, var2 = variation 2 of sequence, var3 = variation 3 of sequence, AmpR = Ampicillin Resistance, SpectR = Spectinomycin Resistance, CmR = Chloramphenicol Resistance

*ori pBR322 has a different orientation within the vector in pNT-MG001, as compared to pNT-MG003 and pNT-MG004

**the antibiotic resistance AmpR has a different orientation within the vector in pNT-MG001, as compared to pNT-MB

(45) The nucleotide sequences of the vectors pNT-MG001 to pNT-MG004 are given in FIG. 4 A to D and in the sequence protocol, SEQ-ID NO. 5, 6, 7, and 8. As can be seen from Table 6 and from FIG. 3 A to D, pNT-MG001 to pNT-MG003 all contain as a POI the sequence of Etanercept, var2 (= version 2), whereas pNT-MG004 contains Etanercept, var1 (= version 1). var1 and var2 both represent a codon-optimized nucleotide sequence, and both code for the same amino acid sequence, however with slightly different codon usage. The nucleotide sequences of var1 and var2 are more than 90% identical (determined by methods as described elsewhere herein), and the difference is caused only by the use of two different codon-optimizing algorithms for var1 and var2. Only the nucleotide sequence of var2 (used in vectors pNT-MG001 to pNT-MG003) is given in FIG. 4 and in the sequence protocol. For the principle of the invention and for carrying out the described experiments, it is not necessary to know the var1 nucleotide sequence, as long as it is clear that both var1 and var2 code for exactly the same amino acid sequence.
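How a nucleotide identity figure such as "more than 90% identical" can be obtained for two coding sequences of the same protein is sketched below in Python. The method actually used for var1 and var2 is described elsewhere and is not reproduced here; this sketch assumes two sequences of equal length compared position by position without gaps, which is reasonable when both encode the same amino acid sequence codon for codon. The two 9-nucleotide sequences are hypothetical examples, not Etanercept sequences.

```python
def percent_identity(seq_a, seq_b):
    """Ungapped, position-by-position nucleotide identity in percent."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical example: both sequences encode Met-Lys-Leu,
# but differ in the third position of two codons.
EXAMPLE_A = "ATGAAACTG"
EXAMPLE_B = "ATGAAGCTC"
IDENTITY = percent_identity(EXAMPLE_A, EXAMPLE_B)
print(f"{IDENTITY:.1f}% identical")
```

In this toy example 7 of 9 positions match, so the identity is about 77.8%.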

(46) Table 7 shows all features of the expression vectors used: Y391_1 xGOI, Y393_2 xGOI, Y394_3 xGOI, Y395_4 xGOI, pNT-MG001, pNT-MG002, pNT-MG003, and pNT-MG004.

(47) TABLE 7: Features of the used expression vectors

| Vector name | Seq.-ID NO. | Feature | Position within sequence |
| --- | --- | --- | --- |
| Y391_1xGOI | 1 | pUC ori | 1-673 |
| | | LLPTT | 674-1084 |
| | | pLLP | 1091-1701 |
| | | LLPSS | 1709-1783 |
| | | scFv_var4 | 1784-2545 |
| | | ADHTT | 2558-2857 |
| | | pILV5 | 2863-3416 |
| | | pEM72 | 3417-3481 |
| | | ZeoR | 3482-3856 |
| | | AODTT | 3865-4335 |
| Y393_2xGOI | 2 | pUC ori | 1-674 |
| | | pGAP | 698-1183 |
| | | MFa2SS | 1197-1451 |
| | | scFv_var1 | 1452-2216 |
| | | LLPTT | 2223-2633 |
| | | pLLP | 2640-3244 |
| | | LLPSS | 3258-3332 |
| | | scFv_var4 | 3333-4097 |
| | | ADHTT | 4107-4406 |
| | | pILV5 | 4412-4965 |
| | | pEM72 | 4966-5030 |
| | | ZeoR | 5031-5405 |
| | | AODTT | 5414-5884 |
| Y394_3xGOI | 3 | pADH | 4-863 |
| | | HSASS | 877-930 |
| | | scFv_var2 | 931-1695 |
| | | cyc1TT | 1702-1972 |
| | | pGAP | 1984-2469 |
| | | MFa2SS | 2483-2737 |
| | | scFv_var1 | 2738-3502 |
| | | LLPTT | 3509-3919 |
| | | pLLP | 3926-4530 |
| | | LLPSS | 4544-4618 |
| | | scFv_var4 | 4619-5383 |
| | | ADHTT | 5393-5692 |
| | | pILV5 | 5698-6251 |
| | | pEM72 | 6252-6316 |
| | | ZeoR | 6317-6691 |
| | | AODTT | 6700-7170 |
| | | pUC ori | 7191-7864 |
| Y395_4xGOI | 4 | pADH | 4-863 |
| | | HSASS | 877-930 |
| | | scFv_var2 | 931-1695 |
| | | cyc1TT | 1702-1972 |
| | | pGAP | 1984-2469 |
| | | MFa2SS | 2483-2737 |
| | | scFv_var1 | 2738-3502 |
| | | LLPTT | 3509-3919 |
| | | pLLP | 3926-4530 |
| | | LLPSS | 4544-4618 |
| | | scFv_var4 | 4619-5383 |
| | | ADHTT | 5393-5692 |
| | | pTEF | 5698-6397 |
| | | MFa4SS | 6411-6467 |
| | | scFv_var3 | 6468-7232 |
| | | AOX1TT | 7239-7498 |
| | | pILV5 | 7499-8052 |
| | | pEM72 | 8053-8117 |
| | | ZeoR | 8118-8492 |
| | | AODTT | 8501-8971 |
| | | pUC ori | 8992-9665 |
| pNT-MG001 | 5 | syntheticTT | 12-60 |
| | | pCMV + enh + RK intron | 66-1065 |
| | | SS var1 | 1134-1199 |
| | | Etanercept var2 | 1200-2606 |
| | | bGHTT | 2668-2895 |
| | | phage f1 | 2990-3445 |
| | | pPGK | 3509-4063 |
| | | NeoR | 4086-4880 |
| | | BGTT | 4944-5526 |
| | | AmpR | 5884-6744 |
| | | pBR322 ori | 6745-7555 |
| | | pSV40 | 7615-7954 |
| | | DHFR | 8031-8594 |
| pNT-MG002 | 6 | pSV40 + enh + Hbb intron II | 12-1296 |
| | | SS var2 | 1360-1413 |
| | | Etanercept var2 | 1414-2820 |
| | | SV40TT | 2882-3103 |
| | | phage f1 | 3199-3654 |
| | | pBG | 3718-4105 |
| | | NeoR | 4106-4900 |
| | | bGHTT | 4901-5128 |
| | | p16A ori | 5320-6266 |
| | | SpectR | 6267-7277 |
| | | pEF1a | 7582-8765 |
| | | DHFR | 8842-9405 |
| | | BGTT | 10252-10783 |
| pNT-MG003 | 7 | pEF1a + EF1a first intron | 12-1253 |
| | | SS var2 | 1254-1310 |
| | | Etanercept var2 | 1311-2717 |
| | | BGTT | 2765-3347 |
| | | phage f1 | 3442-3897 |
| | | pSV40 | 3961-4311 |
| | | NeoR | 4334-5128 |
| | | SV40earlyTT | 5129-5434 |
| | | pBR322 ori | 5653-6463 |
| | | CmR | 6492-7151 |
| | | pCMV | 7406-7994 |
| | | DHFR | 8071-8634 |
| | | bGHTT | 9480-9707 |
| pNT-MG004 | 8 | pCMV + enh + RK intron | 1-1000 |
| | | SS var1 | 1054-1119 |
| | | Etanercept var1 | 1120-2526 |
| | | SV40TT | 2574-2795 |
| | | phage f1 | 2890-3345 |
| | | pSV40 + enh | 3409-3827 |
| | | NeoR | 3872-4666 |
| | | syntheticTT | 4730-4778 |
| | | AmpR | 5189-6049 |
| | | pBR322 ori | 6050-6860 |
| | | pSV40 | 6927-7265 |
| | | DHFR | 7342-7905 |
| | | SV40TT | 8753-8956 |

SS = signal sequence, TT = terminator, var1 = variant 1, ori = origin of replication, enh = enhancer

(48) Obtaining Stable Cell Lines

(49) CHO (DHFR) cells are transfected either with an individual vector of the four vectors or with a mix of all four vectors. Stable transfections are performed using the Amaxa Nucleofection kit (Lonza AG, Switzerland) following the manufacturer's instructions. Briefly, 5 × 10⁶ CHO cells are transfected with 3 µg of linearized vector DNA per transfection. After transfection, growth medium is added and cells are grown in a 10% CO₂ atmosphere for 24-48 h at 37 °C with shaking at 110 rpm. Following the recovery of the cells, two selection rounds are performed. First, cells are selected using medium containing G418; once 90% cell viability is reached, selection continues using methotrexate (MTX). Cells are maintained under MTX selection until cell viability reaches more than 90% (usually 3-4 weeks post-transfection). Throughout the selection period, cells are supplied with fresh medium twice per week. Single-cell cloning is performed using a standard limiting dilution cloning approach. Individual clones are selected based on vector copy number (i.e. at least two copies per clone).

(50) From each transfection, individual clones are selected and tested for expression rate (titer) of the POI, titer stability of the clone over time, leader peptide cleavage per clone, and genetic stability of the clone over time. Titer means the concentration (mg/L) of recombinant POI, in this case Etanercept, in the tissue culture medium.

(51) Analysis of Vector Copy Numbers in Cell Lines

(52) Integrated vector copy numbers are assessed using quantitative PCR (qPCR). Relative quantification is used to estimate the number of integrated expression constructs per clone. The copy number assessment is repeated after 3 months to determine whether the copy number of the POI within the individual cell lines is stable over time. Separation of the PCR products by agarose gel electrophoresis further makes it possible to determine whether the size of the PCR-amplified polynucleotide is stable over time, which is another indicator of the genetic stability of the individual clones of the cell lines. High-resolution melting analysis of the PCR products can be used to confirm their identity.
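The relative quantification step can be illustrated with a short Python sketch. The text does not specify the calculation, so the sketch assumes the widely used 2^(-ΔΔCt) approach with roughly 100% amplification efficiency, an endogenous single-copy reference gene, and a calibrator sample of known copy number; the function name, parameter names and Ct values are hypothetical.

```python
def relative_copy_number(ct_target, ct_reference,
                         ct_target_cal, ct_reference_cal,
                         calibrator_copies=1.0):
    """Estimate copies per genome with the 2^(-ddCt) method (assumed here).

    ct_target / ct_reference: Ct values of the expression construct and of
    the reference gene in the clone under test.
    ct_target_cal / ct_reference_cal: the same two Ct values measured in a
    calibrator sample carrying `calibrator_copies` copies of the construct.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    delta_delta = delta_sample - delta_calibrator
    return calibrator_copies * 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target amplifies two cycles earlier in the
# clone than in a single-copy calibrator, relative to the reference gene.
COPIES = relative_copy_number(ct_target=22.0, ct_reference=24.0,
                              ct_target_cal=24.0, ct_reference_cal=24.0)
print(f"estimated copies per genome: {COPIES:.1f}")
```

Running the same calculation on samples taken at the start and after 3 months gives two estimates whose agreement indicates whether the copy number has remained stable.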

(53) Analysis of Production of POI by Cell Lines

(54) A 14-day generic fed-batch process is applied for productivity assessment. All fed-batch processes are performed in 100 mL serum-free medium. The medium is inoculated with 4 × 10⁵ viable cells/mL and the cell culture is incubated in a 10% CO₂ atmosphere at 37 °C with shaking at 110 rpm (50 mm shaking diameter) and with a temperature shift to 33 °C on day 7. Cell concentration and viability are measured using a Vi-Cell XR analyzer. Titers are measured on cultivation days 7, 10 and 14 using the Cedex system (Roche Diagnostics Deutschland GmbH, Mannheim, Germany). The measurement is based on a turbidimetric method using antibodies directed against the human Fc region. Harvests are collected at the end of the fed-batch processes and purified using Protein A chromatography.

(55) Analysis of Genetic Stability of the Cell Lines

(56) Individual cell clones are seeded at a density of 3 × 10⁵ cells/ml in 75 cm³ flasks in suspension culture in the absence of selective pressure. Productivity testing is done every 6 weeks over a period of 3 months. Expression of the POI is measured using standard methods known to the skilled person, such as ELISA assays, ELISPOT, quantitative western blotting, quantitative mass spectrometry, surface plasmon resonance (e.g. Biacore, Sweden), etc.

(57) Analysis of Signal Peptide Cleavage by the Cell Lines

(58) Analysis of the correct leader peptide cleavage is done by peptide sequencing using mass spectrometry or Edman degradation. Signal peptide miscleavage can be assessed using intact mass measurement. The protein is first de-glycosylated with N-glycosidase F (PNGase F), and subsequently the intact mass of the protein is analyzed using LC-MS on a high-resolution mass spectrometer. Masses are identified according to the calculated theoretical masses of the protein and of the signal peptide adducts, and the proportion of miscleaved signal peptide is calculated from the peak intensities.
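The last step, calculating the proportion of miscleaved signal peptide from peak intensities, can be sketched in Python as follows. The sketch assumes that each identified species (correctly cleaved protein and each signal peptide adduct) has been assigned one summed peak intensity and that intensities are directly comparable between species; the species labels and intensity values are hypothetical.

```python
def miscleaved_fraction(peak_intensities, correct_species="mature"):
    """Fraction of total intensity that is not the correctly cleaved protein."""
    total = sum(peak_intensities.values())
    if total <= 0:
        raise ValueError("total peak intensity must be positive")
    correct = peak_intensities.get(correct_species, 0.0)
    return (total - correct) / total


# Hypothetical intensities for the correctly cleaved protein and two
# miscleaved species (two extra residues, one residue missing).
PEAKS = {"mature": 950.0, "mature+2aa": 30.0, "mature-1aa": 20.0}
FRACTION = miscleaved_fraction(PEAKS)
print(f"miscleaved: {100 * FRACTION:.1f}%")
```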

(59) All methods described or mentioned herein for Pichia pastoris yeast cells, CHO mammalian cells, as well as for other types of cells according to the invention, are standard methods known to the skilled person. Such methods are described, for example, in standard laboratory method manuals such as M. R. Green, J. Sambrook, 2013, Molecular cloning: a laboratory manual, Cold Spring Harbor, N.Y., or in Current Protocols in Molecular Biology, John Wiley & Sons Inc. ISSN 1934-3639 and Current Protocols in Protein Science, John Wiley & Sons Inc. ISSN 1934-3655, or in other titles of the Current Protocols series of John Wiley & Sons Inc.

(60) The invention does not include the chance presence of two or more expression cassettes within an individual cell of a cell library, which expression cassettes comprise the same GOI but with different coding sequences for that same GOI, wherein said cell library is intended to screen for a GOI coding sequence with a maximal expression rate in the cell line used for construction of the cell library.