NUCLEOTIDE SEQUENCES ENCODING PEPTIDE LINKERS

Abstract

The invention provides improved nucleotide sequences and nucleic acids that encode glycine serine linkers and that use an excess of GGA, GGG, and GGT/GGU codons to encode the glycine residues. The invention further relates to nucleotide sequences and nucleic acids that encode (fusion) proteins and polypeptides comprising glycine serine linkers, which nucleotide sequences and nucleic acids comprise such improved nucleotide sequences and nucleic acids of the invention.

Claims

1. Nucleotide sequence and/or a nucleic acid that encodes a peptide linker, in which the peptide linker encoded by said nucleotide sequence or nucleic acid comprises or (essentially) consists of glycine and serine residues, in which: more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in said peptide linker are either GGA, GGG, or GGT/GGU; more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in said peptide linker are either GGA or GGG; and/or less than 30%, preferably less than 15%, more preferably less than 10%, such as less than 5% and up to less than 1% and lower (including 0%) of the codons that encode a glycine residue in said peptide linker are GGC.

2. Nucleotide sequence and/or a nucleic acid according to claim 1, in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% or more (including 100%) of the codons that encode a glycine residue in said peptide linker are either GGA, GGG, or GGT/GGU.

3. Nucleotide sequence and/or a nucleic acid according to any of claims 1 or 2, in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% or more (including 100%) of the codons that encode a glycine residue in said peptide linker are either GGA or GGG.

4. Nucleotide sequence and/or a nucleic acid according to any of claims 1 to 3, in which less than 30%, preferably less than 15%, more preferably less than 10%, such as less than 5% and up to less than 1% or lower (including 0%) of the codons that encode a glycine residue in said peptide linker are GGC.

5. Nucleotide sequence and/or a nucleic acid according to any of claims 1 to 4, in which said peptide linker comprises or (essentially) consists of one or more (such as two or more) repeats of the sequence motif GGGGS (SEQ ID NO:1).

6. Nucleotide sequence and/or a nucleic acid according to any of claims 1 to 5, in which said peptide linker is a 9 GS linker, a 15 GS linker, a 20 GS linker, or a 35 GS linker.

7. Nucleotide sequence and/or a nucleic acid according to claim 6, in which said peptide linker is a 35 GS linker.

8. Nucleotide sequence and/or a nucleic acid that encodes a (fusion) protein or fusion polypeptide, in which the fusion protein or polypeptide that is encoded by said nucleotide sequence and/or a nucleic acid comprises two or more peptide moieties that are suitably linked via one or more peptide linkers, in which said one or more peptide linkers are encoded by a nucleotide sequence or nucleic acid according to any of claims 1 to 7.

9. Nucleotide sequence and/or a nucleic acid according to claim 8, in which the two or more peptide moieties are both immunoglobulin single variable domains.

10. Nucleotide sequence and/or a nucleic acid according to claim 9, in which the two or more peptide moieties are both VHH's, humanized VHH's, sequence-optimized VHH's, or camelized VH's, such as camelized human VH's.

11. Nucleotide sequence and/or a nucleic acid according to any of claims 8 to 10, which encodes a bivalent, trivalent, bispecific, trispecific, biparatopic, or tetravalent construct.

12. Genetic construct that comprises a nucleotide sequence and/or a nucleic acid according to any of claims 1 to 11.

13. Method for expressing or producing a (fusion) protein or polypeptide, in which said method at least comprises the step of expressing a nucleotide sequence or nucleic acid according to any of claims 8 to 11 in a suitable host cell or host organism, and optionally also comprises the step of isolating/purifying the (fusion) protein or polypeptide thus expressed.

14. Method for expressing or producing a (fusion) protein or polypeptide according to claim 13, wherein the host is Pichia, such as Pichia pastoris.

15. Method for expressing or producing a (fusion) protein or polypeptide according to claim 13, wherein the host is a mammalian cell, such as a Chinese hamster ovary (CHO) cell.

16. A host cell or host organism that comprises a nucleotide sequence and/or a nucleic acid that encodes a (fusion) protein or fusion polypeptide according to any of claims 8 to 11.

17. Method for reducing the level of Gly to Asp misincorporation in a peptide linker, said method comprising the step of replacing, in the nucleic acid sequence and/or nucleic acid that encodes said peptide linker, at least one GGC codon with a GGG, GGA or GGT/GGU codon.

18. Method for reducing the level of Gly to Asp misincorporation in a peptide linker according to claim 17, wherein the at least one GGC codon is replaced with a GGG or GGA codon.

19. Method for reducing the level of Gly to Asp misincorporation in a peptide linker according to any of claims 17 or 18, wherein the peptide linker comprises or (essentially) consists of one or more (such as two or more) repeats of the sequence motif GGGGS (SEQ ID NO:1).

20. Method for reducing the level of Gly to Asp misincorporation in a peptide linker according to any of claims 17 to 19, wherein the peptide linker is a 9 GS linker, a 15 GS linker, a 20 GS linker, or a 35 GS linker.

21. Method for reducing the level of Gly to Asp misincorporation in a peptide linker according to any of claims 17 to 20, wherein the peptide linker is a 35 GS linker.

22. Method for reducing the level of Gly to Asp misincorporation in a peptide linker according to any of claims 17 to 21, wherein the peptide linker links two or more peptide moieties.

23. Method for reducing the level of Gly to Asp misincorporation in a peptide linker according to claim 22, wherein the peptide moieties are immunoglobulin single variable domains.

24. Method for reducing the level of Gly to Asp misincorporation in a peptide linker according to claim 23, wherein the peptide moieties are VHH's, humanized VHH's, sequence-optimized VHH's, or camelized VH's, such as camelized human VH's.

25. Method for reducing the level of Gly to Asp misincorporation in a peptide linker according to any of claims 22-24, wherein the peptide linker is comprised in a bivalent, trivalent, bispecific, trispecific, biparatopic, or tetravalent construct.

Description

FIGURE LEGENDS

[0087] FIG. 1 schematically shows some non-limiting examples of Nanobody constructs containing linkers;

[0088] FIG. 2 schematically shows the tetravalent Nanobody construct used in Example 1 to illustrate the invention. FIG. 2 also shows the localization of the T10 peptide in this construct;

[0089] FIG. 3 shows the amino acid sequence (SEQ ID NO:10) and codon usage (SEQ ID NO: 11) of peptide T10. In the sequence, amino acid residues and codons where a misincorporation with aspartic acid was observed are indicated in bold/underline (note, for the residues/codons indicated in italics/underline, misincorporation could have been expected but was not observed).

[0090] FIG. 4 shows the amino acid sequence (SEQ ID NO:12) and coding sequences (SEQ ID NOs: 13 to 15) of the 35 GS linkers in Nanobody Construct A. Specific codons for glycine susceptible to misincorporation with aspartic acid (GGT and GGC) are indicated in bold/underline. Codons for serine are annotated in small caps.

[0091] FIG. 5 shows a cation exchange chromatogram of purified Nanobody Construct A on a Source 15S column (GE Healthcare Life Sciences) with a pH gradient (green trace, CX-1 pH gradient buffer A (pH 5.6) and B (pH 10.2), Thermo Scientific), recorded at UV 254 nm (red (lower) trace) and UV 280 nm (blue (upper) trace). The pH recording is shown as a gray trace. The pre-peaks are acidic variants of Nanobody Construct A. Fractions 14, 15, 16, and 17 were pooled for subsequent characterization of the acidic variants, and fraction 18 was used for characterization of the main peak;

[0092] FIG. 6 shows the Max-ent deconvoluted mass spectra obtained for acidic variants (top pane) and main peak (bottom pane) collected from cation exchange fractionation of purified Nanobody Construct A. The most important mass measured in the acidic fractions is 59689.4 Da, which is 58 Dalton higher than the mass of Nanobody Construct A as measured in the pH-IEX main peak fraction (59630.9 Da, see bottom pane);

[0093] FIG. 7 lists the peptide fragments (SEQ ID NOs: 16 to 33) of tryptic peptide T10 generated by an Asp-N digest, an endoproteinase cleaving at the N-terminus of an aspartic acid. Each cleavage site corresponds with a glycine exchanged to an aspartic acid;

[0094] FIG. 8 shows the relative levels of Gly to Asp misincorporation of three sites (C1, C2, and C3) in the GS linker(s) of (a) Nanobody construct A; (b) Nanobody construct A after depletion of variants with Asp misincorporation by pH-IEX; (c) Nanobody construct A in which 100% of GGC codon sequences were replaced with a GGG, GGA or GGT codon sequence;

[0095] FIG. 9 shows the ten constructs that were produced to investigate the impact of valency and linker length on Gly to Asp misincorporation as described in Example 3;

[0096] FIG. 10 shows the relative levels of Gly to Asp misincorporation of the two sites (C1 and C2) in the 9GS linker; (A) bivalent construct, (B) trivalent construct, (C) tetravalent construct;

[0097] FIG. 11 shows the relative levels of Gly to Asp misincorporation of the five sites (C1, C2, C3, C4, and C5) in the 20GS linker; (A) bivalent construct, (B) trivalent construct, (C) tetravalent construct;

[0098] FIG. 12 shows the relative levels of Gly to Asp misincorporation of the nine sites (C1 to C9) in the 35GS linker; (A) bivalent construct, (B) trivalent construct and (C) tetravalent construct, (D) tetravalent construct without GGC codons.

[0099] The entire contents of all of the references (including literature references, issued patents, published patent applications, and co-pending patent applications) cited throughout this application are hereby expressly incorporated by reference, in particular for the teaching that is referenced hereinabove.

EXPERIMENTAL PART

Example 1

Construction of an Expression Vector for a Tetravalent Nanobody Construct

[0100] In this Example, the invention will be illustrated using, as a non-limiting example, a tetravalent Nanobody construct consisting of four sequence optimized variable domains of a heavy-chain llama antibody, which are fused head-to-tail with 35GS linkers (see FIG. 2). The overall construct used (also referred to herein as Nanobody construct A) can be schematically represented by the formula

[A]-[35GS linker]-[B]-[35GS linker]-[C]-[35GS linker]-[C]

in which [A], [B] and [C] represent three different Nanobodies and [35GS linker] represents a 35GS linker (see also FIG. 2).

[0101] DNA fragments containing the coding information of Nanobody Construct A were cloned into the multiple cloning site of a Pichia expression vector that contains a Zeocin resistance gene (a derivative of the original pPpT4_Alpha_S expression vector described by Näätsaari et al., PLoS One. 2012; 7(6):e39720), such that the Nanobody sequence was downstream of and in frame with the alpha Mating Factor (aMF) signal peptide sequence.

Transformation of the Nanobody Construct A Coding Sequence, Expression and Secretion of the Construct in Pichia pastoris

[0102] Transformation and expression studies were performed in the Pichia strain NRRL Y-11430 (ARS Patent Culture Collection, 1815 North University St., Peoria). This WT strain was used to make a derivative strain overexpressing the endogenous Pichia auxiliary protein KAR2 (GeneID:8198455) as well as Nanobody Construct A. Both Nanobody Construct A and KAR2 were under the control of the methanol-inducible AOX1 promoter. Transformation was performed by standard techniques and in accordance with the standard handbooks (see for example Methods In Molecular Biology 2007, Humana Press Inc.). Transformants were grown on selective medium containing Zeocin, and a number of individual colonies were selected and evaluated for the expression level of Nanobody Construct A in 5 mL shake-flask cultures in BMCM medium, induced by the addition of methanol as described in Pichia protocols (see again the standard handbooks). The best expressing clone was used in a standard fed-batch fermentation. Glycerol fed batches were performed and induction was initiated by the addition of methanol. The productions were performed at 2 L scale at pH 6 and 30° C. in complex medium with a methanol feed rate of 4 mL/(L·h).

Purification of the Nanobody Construct A after Fed-Batch Fermentation

[0103] Nanobody Construct A was purified as follows: after fermentation, part of the cell broth was clarified via a 750 kDa hollow fiber, followed by a capture step using a CIEX Poros XS resin, a polish step using CIEX Nuvia HR-S resin and a flow-through step on an AIEX Sartobind STIC PA. Finally, a concentration and buffer exchange step was performed via UF/DF using the Hydrosart 10 kDa membrane.

Analysis of Purified Nanobody Construct A on Ion Exchange Chromatography and Determination of Molecular Weight of Acidic Variants

[0104] The purified Nanobody Construct A was analyzed by strong cation exchange chromatography using a pH gradient (pH-IEX). The chromatogram, shown in FIG. 5, shows acidic variants of Nanobody Construct A eluting as a group of pre-peaks relative to the main peak. After fraction collection of the acidic and main peaks, the nature of the acidic variants was investigated by determining their molecular weight by electrospray Q-TOF mass spectrometry. The deconvoluted mass spectra are shown in FIG. 6. The main mass observed in the acidic fraction was 59689.4 Da, which is 58 Dalton higher than the mass of Nanobody Construct A as measured in the pH-IEX main peak fraction. The mass measured for Nanobody Construct A in the main peak fraction (59630.9 Da) is 12 ppm higher than the theoretical molecular weight of Nanobody Construct A, i.e. within the measurement error of the instrument.

[0105] A 58 Dalton mass difference can be explained by the exchange of glycine with the acidic amino acid aspartic acid.
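As a rough check on the 58 Dalton figure, the mass shift expected for a single Gly to Asp exchange can be computed from the residue formulas (Gly = C2H3NO, Asp = C4H5NO3). The sketch below is illustrative only: the atomic masses are standard reference values, and the function and variable names are assumptions that do not appear in the original disclosure.

```python
# Illustrative check of the Gly -> Asp mass shift (all names are hypothetical).
# Atomic masses are standard reference values, rounded.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Elemental composition of each amino acid as a residue within a peptide chain.
GLY = {"C": 2, "H": 3, "N": 1, "O": 1}
ASP = {"C": 4, "H": 5, "N": 1, "O": 3}


def residue_mass(formula, table):
    """Sum the atomic masses of a residue formula using the given mass table."""
    return sum(table[element] * count for element, count in formula.items())


def gly_to_asp_shift(table):
    """Mass gained when one glycine residue is exchanged for aspartic acid."""
    return residue_mass(ASP, table) - residue_mass(GLY, table)


mono_shift = gly_to_asp_shift(MONOISOTOPIC)
avg_shift = gly_to_asp_shift(AVERAGE)

# Difference between the two intact masses reported for the acidic and main fractions.
observed_shift = 59689.4 - 59630.9

print(f"monoisotopic: {mono_shift:.3f} Da, average: {avg_shift:.3f} Da, "
      f"observed: {observed_shift:.1f} Da")
```

Both computed values are close to 58 Da, in line with the difference between the two intact masses reported above.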

Analysis and Identification of Acidic Variants by Peptide Map Reversed Phase UHPLC Coupled with Mass Spectrometry (RP-UHPLC-MS)

[0106] Peptide map analysis (after trypsin digest) of the acidic variants fraction of Nanobody Construct A resulted in identification of two peptides with a mass increment of 58 Dalton. As schematically shown in FIG. 2, one of these two peptides (referred to herein as the T10 peptide) corresponds to a part of the sequence that encompasses a few of the C-terminal amino acid residues of the first Nanobody in the construct, the first 35GS linker and a few of the N-terminal amino acid residues of the second Nanobody in the construct. The amino acid sequence (SEQ ID NO:10) and nucleotide sequence (SEQ ID NO:11) of the T10 peptide are shown in FIG. 3.

[0107] As collision induced fragmentation in the mass spectrometer led to only partial sequence coverage of the T10 peptide, the T10 peptide of the trypsin digest was fractionated by reversed phase chromatography, and subsequently digested with the enzyme Asp-N. The enzyme Asp-N is an endoproteinase that hydrolyses peptide bonds on the N-terminal side of aspartic acid residues. Because no aspartic acid residues are present in the sequence of this peptide, cleavages were only expected in the case of Gly->Asp misincorporation events. In the analysis of the Asp-N digest of the T10 peptide by RP-UHPLC-MS, different fragments were identified with a mass corresponding to fragments of the T10 peptide with a mass increment of 58 Dalton. In total 9 Asp-N fragmentation sites were identified, as shown in FIG. 7. Quite unexpectedly, it was observed that the Asp misincorporation only occurred at GGC codons (see also FIG. 3), and not at GGT codons, although both glycine codons can in principle be misread by the aspartic acid tRNAs (having the anticodons CUG and CUA). In both cases there is a G-(mRNA)/U-(tRNA) mismatch, i.e. the most common mismatch during translation, along with wobble position mismatches (C/U and/or U/U), that cause amino acid misincorporation. Thus, more generally, according to the invention, when a codon encoding glycine other than GGA or GGG (i.e. that is not GGA or GGG) is present in a nucleotide sequence of the invention, it may be preferred that the codon is GGT or GGU rather than GGC.
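The codon relationship behind this reasoning can be illustrated with a few entries of the standard genetic code: a G/U mismatch at the second codon position amounts to reading the middle G as an A, which turns GGC and GGT into the aspartic acid codons GAC and GAT, whereas GGA and GGG become glutamic acid codons. The following is a minimal sketch with hypothetical function names; it shows only this codon relationship and does not model why misincorporation was observed at GGC but not at GGT.

```python
# Subset of the standard genetic code, sufficient for this illustration.
CODON_TO_AA = {
    "GGA": "Gly", "GGG": "Gly", "GGT": "Gly", "GGC": "Gly",
    "GAA": "Glu", "GAG": "Glu", "GAT": "Asp", "GAC": "Asp",
}


def second_position_g_to_a(codon):
    """Return the codon obtained when the middle G is read as an A."""
    if len(codon) != 3 or codon[1] != "G":
        raise ValueError("expected a three-letter codon with G at the second position")
    return codon[0] + "A" + codon[2]


def near_cognate_outcome(gly_codon):
    """Return (neighbouring codon, amino acid) for a second-position misreading."""
    neighbour = second_position_g_to_a(gly_codon)
    return neighbour, CODON_TO_AA[neighbour]


for codon in ("GGA", "GGG", "GGT", "GGC"):
    neighbour, amino_acid = near_cognate_outcome(codon)
    print(f"{codon} -> {neighbour} ({amino_acid})")
```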

[0108] As mentioned, the peptide map analysis of Nanobody Construct A also resulted in identification of a second peptide with a mass increment of 58 Dalton. This peptide was found to correspond to one of the CDR's of one of the Nanobodies present in Nanobody Construct A. Further analysis (data not shown) confirmed that also for this peptide, the observed mass increment of 58 Dalton was most likely due to Asp misincorporation.

Example 2: Codon Optimization in the Nucleic Acid Sequence of the 35GS Linkers

[0109] The GGC codon sequences present in the 35GS linker sequence of Nanobody construct A were replaced with a GGG, GGA or GGT codon sequence.

[0110] The obtained Nanobody constructs were expressed in Pichia strain NRRL Y-11430 and purified as described above. The level of Asp misincorporation in the obtained polypeptides was measured by the same method as described above. The mass spectrometer was set up to quantify 3 out of 9 misincorporation sites.

[0111] The relative levels of Asp misincorporation in the 35GS linker of the polypeptide obtained with the Reference Nanobody construct A (no codon optimization) and of the polypeptide obtained with the codon optimized Nanobody construct A are shown in FIG. 8.
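The codon replacement of Example 2 can be sketched in code as an in-frame substitution of every GGC codon by a synonymous glycine codon. The sketch below is an assumption-laden illustration: the source does not state which of GGG, GGA or GGT was used at each position, so the cycling scheme, the function names and the example sequence (a made-up coding sequence for two GGGGS repeats, not one of SEQ ID NOs: 13 to 15) are all hypothetical.

```python
def split_codons(seq):
    """Split a coding sequence into in-frame codons (RNA input is converted to DNA)."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def replace_ggc(seq, replacements=("GGA", "GGG", "GGT")):
    """Replace each in-frame GGC codon, cycling through the replacement codons.

    The cycling order is an arbitrary choice made for this sketch.
    """
    allowed = {"GGA", "GGG", "GGT"}
    if not replacements or not set(replacements) <= allowed:
        raise ValueError("replacements must be chosen from GGA, GGG and GGT")
    out = []
    replaced = 0
    for codon in split_codons(seq):
        if codon == "GGC":
            codon = replacements[replaced % len(replacements)]
            replaced += 1
        out.append(codon)
    return "".join(out)


# Hypothetical coding sequence for GGGGSGGGGS; it contains four in-frame GGC codons.
example = "GGTGGCGGAGGCTCTGGCGGTGGCGGATCC"
optimized = replace_ggc(example)
print(optimized)
```

Passing `replacements=("GGA", "GGG")` restricts the substitution to the two codons preferred in the narrower embodiments.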

Example 3: Observation of Asp Misincorporation in Other Linkers

[0112] In this example, the impact of Nanobody valency and linker length on Gly to Asp misincorporation was studied. For this, bi-, tri- and tetravalent constructs, each with 9GS, 20GS or 35GS linker sequences and a Nanobody building block sequence (different from the Nanobody building block sequence present in Nanobody construct A), were produced. An extra tetravalent, 35GS linker Nanobody construct was also produced without any GGC codons. The ten new constructs are shown in FIG. 9. The 9GS linker contains 2 GGC codons, the 20GS linker contains 5 GGC codons and the 35GS linker contains 9 GGC codons.

[0113] Each possible new peptide after Gly to Asp misincorporation was followed with the mass spectrometry method as described above. The method was further optimized to allow simultaneous quantification of all 9 Asp-N fragmentation sites. The results on the misincorporation are shown in FIG. 10 (9GS linker), FIG. 11 (20GS linker) and FIG. 12 (35GS linker).

[0114] From these results it can be concluded that the valency or the linker length does not have an impact on Gly to Asp misincorporation levels. Removal or reduction of the number of GGC codons clearly reduces the level of Gly to Asp misincorporation.

[0115] Finally, although the invention is described herein mainly with respect to GS linkers, it will be clear to the skilled person that the invention can generally be applied to other peptide linkers that contain glycine residues.

[0116] Thus, in a further aspect, the invention relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker, in which the peptide linker encoded by said nucleotide sequence and/or nucleic acid contains four or more glycine residues, in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in said peptide linker are either GGA, GGG or GGT/GGU.

[0117] In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker, in which the peptide linker encoded by said nucleotide sequence and/or nucleic acid contains four or more glycine residues, in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in said peptide linker are either GGA or GGG.

[0118] In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker, in which the peptide linker encoded by said nucleotide sequence and/or nucleic acid contains four or more glycine residues, in which less than 30%, preferably less than 15%, more preferably less than 10%, such as less than 5% and up to less than 1% and lower (including 0%) of the codons that encode a glycine residue in said peptide linker are GGC.
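The three codon-usage criteria set out in this aspect can be checked for a given linker coding sequence by counting its in-frame glycine codons. The following is a minimal sketch under stated assumptions: the function names, the default thresholds (the broadest values of 70% and 30%) and the example sequences are hypothetical and serve only to illustrate the calculation.

```python
GLY_CODONS = ("GGA", "GGG", "GGT", "GGC")


def glycine_codon_fractions(seq):
    """Return the fraction of glycine codons that are GGA, GGG, GGT and GGC.

    GGU in RNA input is counted as GGT. Only in-frame codons are considered.
    """
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    gly = [codon for codon in codons if codon in GLY_CODONS]
    if not gly:
        raise ValueError("sequence contains no glycine codons")
    return {codon: gly.count(codon) / len(gly) for codon in GLY_CODONS}


def meets_criteria(seq, preferred_min=0.70, ggc_max=0.30):
    """Evaluate the three criteria; the defaults are the broadest thresholds."""
    f = glycine_codon_fractions(seq)
    return {
        "GGA_GGG_GGT": f["GGA"] + f["GGG"] + f["GGT"] > preferred_min,
        "GGA_GGG": f["GGA"] + f["GGG"] > preferred_min,
        "GGC": f["GGC"] < ggc_max,
    }


# Hypothetical GGGGS coding sequences, made up for this example.
without_ggc = "GGAGGGGGTGGATCT"
mostly_ggc = "GGCGGCGGAGGCTCT"
print(meets_criteria(without_ggc))
print(meets_criteria(mostly_ggc))
```

Stricter embodiments can be tested by passing other thresholds, for example `preferred_min=0.90` and `ggc_max=0.10`.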

CPC classification

A61K31/713 (HUMAN NECESSITIES)
A61K47/65 (HUMAN NECESSITIES)
A61K47/6889 (HUMAN NECESSITIES)
C12N15/62 (CHEMISTRY; METALLURGY)
C07K7/04 (CHEMISTRY; METALLURGY)
C07K2317/569 (CHEMISTRY; METALLURGY)
C07K2317/35 (CHEMISTRY; METALLURGY)
C12N15/67 (CHEMISTRY; METALLURGY)
C12N15/63 (CHEMISTRY; METALLURGY)
C07K2319/00 (CHEMISTRY; METALLURGY)
C07K16/468 (CHEMISTRY; METALLURGY)

International classification

C12N15/62 (CHEMISTRY; METALLURGY)
C07K16/46 (CHEMISTRY; METALLURGY)
