Protein surface remodeling
10407474 · 2019-09-10
Inventors
- David R. Liu (Lexington, MA)
- Kevin John Phillips (Somerville, MA, US)
- Michael S. Lawrence (Atkinson, NH, US)
International classification
C12N9/00
CHEMISTRY; METALLURGY
Abstract
Aggregation is a major cause of the misbehavior of proteins. A system for modifying a protein to create a more stable variant is provided. The method involves identifying non-conserved hydrophobic amino acid residues on the surface of a protein, suitable for mutating to more hydrophilic residues (e.g., charged amino acids). Any number of residues on the surface may be changed to create a variant that is more soluble, resistant to aggregation, has a greater ability to re-fold, and/or is more stable under a variety of conditions. The invention also provides GFP, streptavidin, and GST variants with an increased theoretical net charge created by the inventive technology. Kits are also provided for carrying out such modifications on any protein of interest.
Claims
1. A supercharged protein variant of a wild-type protein, wherein the supercharged protein variant comprises a modified primary amino acid sequence as compared to the wild-type sequence, resulting in a theoretical net charge on the supercharged protein variant of +10 to +52 at physiological pH, wherein the wild-type protein is selected from the group consisting of enzymes, transcription factors, DNA-binding proteins, and RNA-binding proteins.
2. The supercharged protein variant of claim 1, wherein the net charge at physiological pH of the supercharged protein variant is increased by at least +3 as compared to the wild-type sequence.
3. The supercharged protein variant of claim 2, wherein the net charge at physiological pH of the supercharged protein variant is increased by at least +25 as compared to the wild-type sequence.
4. The supercharged protein variant of claim 1, wherein the theoretical net charge at physiological pH of the supercharged protein is within the range of +52 to +20.
5. The supercharged protein variant of claim 4, wherein the theoretical net charge at physiological pH of the supercharged protein is within the range of +52 to +30.
6. The supercharged protein variant of claim 1, wherein the supercharged protein variant retains at least 50% of the activity of the wild-type protein.
7. The supercharged protein variant of claim 6, wherein the supercharged protein variant retains at least 90% of the activity of the wild-type protein.
8. The supercharged protein variant of claim 1, wherein the modified primary amino acid sequence comprises replacing a plurality of non-conserved, surface residues with a natural amino acid residue that is positively charged at physiological pH; and wherein non-conserved, surface residues are identified by comparing the amino acid sequence of the protein with at least one other amino acid sequence of the protein from the same protein family or a different species, wherein a residue is non-conserved if less than or equal to 50% of the amino acid sequences have the same amino acid at a particular position.
9. The supercharged protein variant of claim 1, wherein the modified primary amino acid sequence of the supercharged protein variant comprises a replacement of at least five surface residues of the wild-type protein with a different residue.
10. The supercharged protein variant of claim 9, wherein the modified primary amino acid sequence of the supercharged protein variant comprises a replacement of at least five surface residues of the wild-type protein with a lysine, histidine, or arginine residue.
11. The supercharged protein variant of claim 1, wherein the wild-type protein is a monomeric protein.
12. The supercharged protein variant of claim 1, wherein the wild-type protein is a multimeric protein.
13. The supercharged protein variant of claim 1, wherein the wild-type protein is an enzyme.
14. The supercharged protein variant of claim 1, wherein the wild-type protein is a transcription factor.
15. The supercharged protein variant of claim 1, wherein the wild-type protein is a DNA-binding protein.
16. The supercharged protein variant of claim 1, wherein the wild-type protein is an RNA-binding protein.
17. The supercharged protein variant of claim 1, wherein the variant is a fusion protein.
18. The supercharged protein variant of claim 17, wherein the fusion protein comprises a linker.
19. A complex comprising the supercharged protein variant of claim 1 and an oppositely charged macromolecule.
20. A composition comprising the supercharged protein variant of claim 1.
Description
BRIEF DESCRIPTION OF THE DRAWING
DETAILED DESCRIPTION OF CERTAIN PREFERRED EMBODIMENTS OF THE INVENTION
(6) The invention provides a system for modifying proteins to be more stable. The system is thought to work by changing non-conserved amino acids on the surface of a protein to more polar or charged amino acid residues. The amino acid residues to be modified may be hydrophobic, hydrophilic, charged, or a combination thereof. Any protein may be modified using the inventive system to produce a more stable variant. These modifications of surface residues have been found to improve the extrathermodynamic properties of proteins. As proteins are increasingly used as therapeutic agents and continue to be used as research tools, a system for altering a protein to make it more stable is important and useful. Proteins modified by the inventive method typically are resistant to aggregation, have an increased ability to refold, resist improper folding, have improved solubility, and are generally more stable under a wide range of conditions, including denaturing conditions such as heat or the presence of a detergent.
(7) Any protein may be modified to create a more stable variant using the inventive system. Natural as well as unnatural proteins (e.g., engineered proteins) may be modified. Examples of proteins that may be modified include receptors, membrane-bound proteins, transmembrane proteins, enzymes, transcription factors, extracellular proteins, therapeutic proteins, cytokines, messenger proteins, DNA-binding proteins, RNA-binding proteins, proteins involved in signal transduction, structural proteins, cytoplasmic proteins, nuclear proteins, hydrophobic proteins, hydrophilic proteins, etc. The protein to be modified may be derived from any species of plant, animal, or microorganism. In certain embodiments, the protein is a mammalian protein. In certain embodiments, the protein is a human protein. In certain embodiments, the protein is derived from an organism typically used in research. For example, the protein to be modified may be from a primate (e.g., ape, monkey), rodent (e.g., hamster, gerbil), rabbit, pig, dog, cat, fish (e.g., zebrafish), nematode (e.g., C. elegans), yeast (e.g., Saccharomyces cerevisiae), or bacterium (e.g., E. coli).
(8) The inventive system is particularly useful in modifying proteins that are susceptible to aggregation or have stability issues. The system may also be used to modify proteins that are being overexpressed. For example, therapeutic proteins that are being produced recombinantly may benefit from being modified by the inventive system. Such modified therapeutic proteins are not only easier to produce and purify but may also be more stable during storage and use.
(9) The inventive system involves identifying non-conserved surface residues of a protein of interest and replacing some of those residues with a residue that is hydrophilic, polar, or charged at physiological pH. The inventive system includes not only methods for modifying a protein but also reagents and kits that are useful in modifying a protein to make it more stable.
(10) The surface residues of the protein to be modified are identified using any method(s) known in the art. In certain embodiments, the surface residues are identified by computer modeling of the protein. In certain embodiments, the three-dimensional structure of the protein is known and/or determined, and the surface residues are identified by visualizing the structure of the protein. In other embodiments, the surface residues are predicted using computer software. In certain particular embodiments, Average Neighbor Atoms per Sidechain Atom (AvNAPSA) is used to predict surface exposure. AvNAPSA is an automated measure of surface exposure which has been implemented as a computer program. See Appendix A. A low AvNAPSA value indicates a surface exposed residue, whereas a high value indicates a residue in the interior of the protein. In certain embodiments, the software is used to predict the secondary structure and/or tertiary structure of a protein and the surface residues are identified based on this prediction. In other embodiments, the prediction of surface residues is based on hydrophobicity and hydrophilicity of the residues and their clustering in the primary sequence of the protein. Besides in silico methods, the surface residues of the protein may also be identified using various biochemical techniques, for example, protease cleavage, surface modification, etc.
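The AvNAPSA measure described above can be illustrated with a short Python sketch. It is not the program of Appendix A. It assumes that atoms are supplied as simple records (a residue number, a hypothetical `sidechain` flag, and coordinates in angstroms) and that a "neighbor" is any other atom within a 10 Å cutoff; the cutoff and atom-selection rules used in Appendix A may differ. For each residue, the neighbors of every side-chain atom are counted and the counts are averaged, so a low value suggests an exposed residue.

```python
from collections import defaultdict
from typing import NamedTuple


class Atom(NamedTuple):
    residue_id: int   # residue number in the chain
    sidechain: bool   # hypothetical flag: True for side-chain atoms
    x: float          # coordinates in angstroms
    y: float
    z: float


def avnapsa(atoms: list[Atom], cutoff: float = 10.0) -> dict[int, float]:
    """Average number of neighbor atoms per side-chain atom, per residue.

    Assumption: a neighbor is any other atom within `cutoff` angstroms.
    Residues without side-chain atoms (e.g., glycine) are omitted.
    """
    cutoff_sq = cutoff * cutoff
    counts: dict[int, list[int]] = defaultdict(list)
    for i, a in enumerate(atoms):
        if not a.sidechain:
            continue
        n = 0
        for j, b in enumerate(atoms):
            if i == j:
                continue  # an atom is not its own neighbor
            d_sq = (a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2
            if d_sq <= cutoff_sq:
                n += 1
        counts[a.residue_id].append(n)
    # Low averages indicate surface exposure; high averages indicate burial.
    return {res: sum(c) / len(c) for res, c in counts.items()}
```

The brute-force double loop is quadratic in the number of atoms, which is acceptable for illustration; for a full structure a spatial grid or k-d tree would normally be used to find neighbors. Residues would then be sorted by ascending score so that the most exposed candidates come first.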
(11) Of the surface residues, it is then determined which are conserved or important to the functioning of the protein. The identification of conserved residues can be determined using any method known in the art. In certain embodiments, the conserved residues are identified by aligning the primary sequence of the protein of interest with related proteins. These related proteins may be from the same family of proteins. For example, if the protein is an immunoglobulin, other immunoglobulin sequences may be used. The related proteins may also be the same protein from a different species. For example, the conserved residues may be identified by aligning the sequences of the same protein from different species. To give but another example, proteins of similar function or biological activity may be aligned. Preferably, 2, 3, 4, 5, 6, 7, 8, 9, or 10 different sequences are used to determine the conserved amino acids in the protein. In certain embodiments, the residue is considered conserved if over 50%, 60%, 70%, 75%, 80%, or 90% of the sequences have the same amino acid in a particular position. In other embodiments, the residue is considered conserved if over 50%, 60%, 70%, 75%, 80%, or 90% of the sequences have the same or a similar (e.g., valine, leucine, and isoleucine; glycine and alanine; glutamine and asparagine; or aspartate and glutamate) amino acid in a particular position. Many software packages are available for aligning and comparing protein sequences as described herein. As would be appreciated by one of skill in the art, either the conserved residues may be determined first or the surface residues may be determined first. The order does not matter. In certain embodiments, a computer software package may determine surface residues and conserved residues simultaneously. Important residues in the protein may also be identified by mutagenesis of the protein. 
For example, alanine scanning of the protein can be used to determine the important amino acid residues in the protein. In other embodiments, site-directed mutagenesis may be used.
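The conservation criterion above can be sketched in Python, assuming the sequences have already been aligned to equal length (for example, with one of the alignment packages mentioned) and that gaps are written as '-'. A column is called conserved when its most common residue occurs in more than `threshold` of the sequences. The optional similarity groups are the examples given in the text (valine/leucine/isoleucine, glycine/alanine, glutamine/asparagine, aspartate/glutamate); the function and parameter names are hypothetical.

```python
from collections import Counter

# Similarity groups taken from the examples in the text; illustrative only.
SIMILAR_GROUPS = [set("VLI"), set("GA"), set("QN"), set("DE")]


def _group(aa: str) -> str:
    """Map a residue to a representative of its similarity group (or itself)."""
    for g in SIMILAR_GROUPS:
        if aa in g:
            return min(g)
    return aa


def conserved_positions(alignment: list[str], threshold: float = 0.5,
                        use_similarity: bool = False) -> list[int]:
    """Return 0-based alignment columns whose most common residue exceeds `threshold`.

    Gaps ('-') never count toward the majority but do count in the denominator.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    conserved = []
    for col in range(length):
        column = [s[col] for s in alignment if s[col] != "-"]
        if use_similarity:
            column = [_group(aa) for aa in column]
        if not column:
            continue
        _, top = Counter(column).most_common(1)[0]
        if top / len(alignment) > threshold:
            conserved.append(col)
    return conserved
```

Raising `threshold` to 0.7 or 0.9 corresponds to the stricter embodiments. The returned values are alignment columns, so they would need to be mapped back to residue numbers of the protein of interest if its aligned sequence contains gaps.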
(12) Once non-conserved surface residues of the protein have been identified, each of the residues is identified as hydrophobic or hydrophilic. In certain embodiments, each residue is assigned a hydrophobicity score. For example, each non-conserved surface residue may be assigned an octanol/water log P value. Other hydrophobicity parameters may also be used. Such scales for amino acids have been discussed in: Janin, Surface and Inside Volumes in Globular Proteins, Nature 277:491-492, 1979; Wolfenden et al., Affinities of Amino Acid Side Chains for Solvent Water, Biochemistry 20:849-855, 1981; Kyte et al., A Simple Method for Displaying the Hydropathic Character of a Protein, J. Mol. Biol. 157:105-132, 1982; Rose et al., Hydrophobicity of Amino Acid Residues in Globular Proteins, Science 229:834-838, 1985; Cornette et al., Hydrophobicity Scales and Computational Techniques for Detecting Amphipathic Structures in Proteins, J. Mol. Biol. 195:659-685, 1987; Charton and Charton, The Structure Dependence of Amino Acid Hydrophobicity Parameters, J. Theor. Biol. 99:629-644, 1982; each of which is incorporated by reference. Any of these hydrophobicity parameters may be used in the inventive method to determine which non-conserved residues to modify. In certain embodiments, hydrophilic or charged residues are identified for modification.
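As one concrete instance of the scales cited above, the following sketch scores candidate positions with the Kyte-Doolittle hydropathy values and lists the most hydrophobic first. The candidate positions are assumed to be the non-conserved surface positions found in the preceding steps, and the helper name is hypothetical; any of the other cited scales, or octanol/water log P values, could be substituted for the dictionary.

```python
# Kyte-Doolittle hydropathy values (Kyte et al., J. Mol. Biol. 157:105-132, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def rank_by_hydropathy(sequence: str, positions: list[int]) -> list[tuple[int, str, float]]:
    """Return (position, residue, score) for 1-based positions, most hydrophobic first."""
    scored = [(p, sequence[p - 1], KYTE_DOOLITTLE[sequence[p - 1]]) for p in positions]
    return sorted(scored, key=lambda t: t[2], reverse=True)
```

For example, `rank_by_hydropathy("MSKGEELFTG", [3, 7, 8, 9])` places leucine 7 first and lysine 3 last. Embodiments that target hydrophobic residues would take candidates from the top of this list; embodiments that target hydrophilic or charged residues would take them from the bottom.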
(13) At least one identified non-conserved or non-vital surface residue is then chosen for modification. In certain embodiments, hydrophobic residue(s) are chosen for modification. In other embodiments, hydrophilic and/or charged residue(s) are chosen for modification. In certain embodiments, more than one residue is chosen for modification. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 of the identified residues are chosen for modification. In certain embodiments, over 10, over 15, or over 20 residues are chosen for modification. As would be appreciated by one of skill in the art, the larger the protein, the more residues will need to be modified. Likewise, the more hydrophobic or susceptible to aggregation or precipitation the protein is, the more residues will need to be modified. In certain embodiments, multiple variants of the protein, each with different modifications, are produced and tested to determine the best variant in terms of biological activity and stability.
(14) In certain embodiments, the residues chosen for modification are mutated into more hydrophilic residues (including charged residues). Typically, the residues are mutated into more hydrophilic natural amino acids. In certain embodiments, the residues are mutated into amino acids that are charged at physiological pH. For example, the residue may be changed to an arginine, aspartate, glutamate, histidine, or lysine. In certain embodiments, all the residues to be modified are changed into the same residue. For example, all the chosen residues are changed to glutamate. In other embodiments, the chosen residues are changed into different residues; however, all the final residues may be either positively charged or negatively charged at physiological pH. In certain embodiments, to create a negatively charged protein, all the residues to be mutated are converted to glutamate and/or aspartate residues. In certain embodiments, to create a positively charged protein, all the residues to be mutated are converted to lysine residues. For example, all the chosen residues for modification are asparagine, glutamine, lysine, and/or arginine, and these residues are mutated into aspartate or glutamate residues. To give but another example, all the chosen residues for modification are aspartate, glutamate, asparagine, and/or glutamine, and these residues are mutated into lysine. This approach allows the net charge on the protein to be changed to the greatest extent, because replacing a residue of the opposite charge shifts the net charge by two units, whereas replacing a neutral residue shifts it by one.
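The substitution rule in the two examples above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: positions are 1-based indices into a one-letter sequence, the chosen positions are assumed to be non-conserved surface residues identified earlier, and only the residue types named in the examples are replaced. The function name is hypothetical.

```python
def supercharge(sequence: str, positions: list[int], target: str = "K") -> str:
    """Replace residues at the given 1-based positions with `target`.

    Following the examples in the text: when `target` is 'K', only D, E, N,
    and Q are mutated; when `target` is 'D' or 'E', only N, Q, K, and R are
    mutated. Other residues at the listed positions are left unchanged.
    """
    if target == "K":
        eligible = set("DENQ")
    elif target in ("D", "E"):
        eligible = set("NQKR")
    else:
        raise ValueError("target must be 'K', 'D', or 'E'")
    residues = list(sequence)
    for p in positions:
        if residues[p - 1] in eligible:
            residues[p - 1] = target
    return "".join(residues)
```

For example, `supercharge("MSKGEELFTG", [2, 5, 6, 7], "K")` converts the two glutamates to lysines and leaves serine 2 and leucine 7 untouched, so a full candidate list can be passed without pre-filtering.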
(15) In other embodiments, the protein may be modified to keep the net charge on the modified protein the same as on the unmodified protein. In still other embodiments, the protein may be modified to decrease the overall net charge on the protein while increasing the total number of charged residues on the surface. In certain embodiments, the theoretical net charge is increased by at least +1, +2, +3, +4, +5, +10, +15, +20, +25, +30, or +35. In certain embodiments, the theoretical net charge is decreased by at least 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, or 35. In certain embodiments, the chosen amino acids are changed into non-ionic, polar residues (e.g., cysteine, serine, threonine, tyrosine, glutamine, asparagine).
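The theoretical net charge referred to above can be approximated by simple residue counting. The sketch below assumes one common simplified convention, +1 for each lysine and arginine and -1 for each aspartate and glutamate, with histidine and the termini ignored; the values reported in this document may have been computed with a different convention, so the functions (hypothetical names) are illustrative only. Under this convention a glutamate-to-lysine substitution changes the net charge by +2 and an asparagine-to-lysine substitution by +1.

```python
def theoretical_net_charge(sequence: str) -> int:
    """Count-based net charge: +1 per K/R, -1 per D/E; His and termini ignored."""
    s = sequence.upper()
    return (s.count("K") + s.count("R")) - (s.count("D") + s.count("E"))


def charge_change(wild_type: str, variant: str) -> int:
    """Change in count-based net charge from wild type to variant."""
    return theoretical_net_charge(variant) - theoretical_net_charge(wild_type)
```

For the ten-residue fragment `"MSKGEELFTG"`, two E-to-K substitutions move the count-based charge from -1 to +3, a change of +4.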
(16) These modifications or mutations in the protein may be accomplished using any technique known in the art. Recombinant DNA techniques for introducing such changes in a protein sequence are well known in the art. In certain embodiments, the modifications are made by site-directed mutagenesis of the polynucleotide encoding the protein. Other techniques for introducing mutations are discussed in Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. by Sambrook, Fritsch, and Maniatis (Cold Spring Harbor Laboratory Press: 1989); the treatise, Methods in Enzymology (Academic Press, Inc., N.Y.); Ausubel et al., Current Protocols in Molecular Biology (John Wiley & Sons, Inc., New York, 1999); each of which is incorporated herein by reference. The modified protein is expressed and tested. In certain embodiments, a series of variants is prepared and each variant is tested to determine its biological activity and its stability. The variant chosen for subsequent use may be the most stable one, the most active one, or the one with the greatest overall combination of activity and stability. After a first set of variants is prepared, an additional set of variants may be prepared based on what is learned from the first set. The variants are typically created and overexpressed using recombinant techniques known in the art.
(17) The inventive system has been used to create variants of GFP. These variants have been shown to be more stable and to retain their fluorescence. A GFP from Aequorea victoria is described in GenBank Accession Number P42212, incorporated herein by reference. The amino acid sequence of this wild-type GFP is as follows:
(18) TABLE-US-00001 (SEQIDNO:1) MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTT GKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFF KDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNV YIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHY LSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
Wild-type GFP has a theoretical net charge of -7. Using the inventive system, variants with theoretical net charges of -29, -30, -25, +36, +48, and +49 have been created. Even after the +36 GFP is heated to 95 °C, 100% of the variant protein remains soluble, and the protein retains 70% of its fluorescence.
(19) The amino acid sequences of the variants of GFP that have been created include:
(20) TABLE-US-00002 GFP-NEG25 (SEQIDNO:2) MGHHHHHHGGASKGEELFTGVVPILVELDGDVNGHEFSVRGEGEGDATEG ELTLKFICTTGELPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPE GYVQERTISFKDDGTYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHK LEYNFNSHDVYITADKQENGIKAEFEIRHNVEDGSVQLADHYQQNTPIGD GPVLLPDDHYLSTESALSKDPNEDRDHMVLLEFVTAAGIDHGMDELYK GFP-NEG29 (SEQIDNO:3) MGHHHHHHGGASKGEELFDGEVPILVELDGDVNGHEFSVRGEGEGDATEG ELTLKFICTTGELPVPWPTLVTTLTYGVQCFSRYPDHMDQHDFFKSAMPE GYVQERTISFKDDGTYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHK LEYNFNSHDVYITADKQENGIKAEFEIRHNVEDGSVQLADHYQQNTPIGD GPVLLPDDHYLSTESALSKDPNEDRDHMVLLEFVTAAGIDHGMDELYK GFP-NEG30 (SEQIDNO:4) MGHHHHHHGGASKGEELFDGVVPILVELDGDVNGHEFSVRGEGEGDATEG ELTLKFICTTGELPVPWPTLVTTLTYGVQCFSDYPDHMDQHDFFKSAMPE GYVQERTISFKDDGTYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHK LEYNFNSHDVYITADKQENGIKAEFEIRHNVEDGSVQLADHYQQNTPIGD GPVLLPDDHYLSTESALSKDPNEDRDHMVLLEFVTAAGIDHGMDELYK GFP-POS36 (SEQIDNO:5) MGHHHHHHGGASKGERLFRGKVPILVELKGDVNGHKFSVRGKGKGDATRG KLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPKHMKRHDFFKSAMPK GYVQERTISFKKDGKYKTRAEVKFEGRTLVNRIKLKGRDFKEKGNILGHK LAYNFNSHKVYITADKRKNGIKAKFKIRHNVKDGSVQLADHYQQNTPIGR GPVLLPRNHYLSTRSKLSKDPKEKRDHMVLLEFVTAAGIKHGRDERYK GFP-POS42 (SEQIDNO:6) MGHHHHHHGGRSKGKRLFRGKVPILVELKGDVNGHKFSVRGKGKGDATRG KLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPKHMKRHDFFKSAMPK GYVQERTISFKKDGKYKTRAEVKFEGRTLVNRIKLKGRDFKEKGNILGHK LRYNFNSHKVYITADKRKNGIKAKFKIRHNVKDGSVQLADHYQQNTPIGR GPVLLPRKHYLSTRSKLSKDPKEKRDHMVLLEFVTAAGIKHGRKERYK GFP-POS49 (SEQIDNO:7) MGHHHHHHGGRSKGKRLFRGKVPILVKLKGDVNGHKFSVRGKGKGDATRG KLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPKHMKRHDFFKSAMPK GYVQERTISFKKDGKYKTRAEVKFKGRTLVNRIKLKGRDFKEKGNILGHK LRYNFNSHKVYITADKRKNGIKAKFKIRHNVKDGSVQLAKHYQQNTPIGR GPVLLPRKHYLSTRSKLSKDPKEKRDHMVLKEFVTAAGIKHGRKERYK
As would be appreciated by one of skill in the art, homologous proteins are also considered to be within the scope of this invention. For example, any protein that includes a stretch of 20, 30, 40, 50, or 100 amino acids which are 60%, 70%, 80%, 90%, 95%, or 100% homologous to any of the above sequences is considered part of the invention. Addition and deletion variants are also contemplated by the invention. In certain embodiments, any GFP with a mutated residue as shown in any of the above sequences is considered part of the invention. In certain embodiments, the sequence includes 2, 3, 4, 5, 6, 7, 8, 9, 10, or more mutations as shown in any of the sequences above.
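One way to check whether a sequence contains a stretch of a given length with a given degree of similarity to one of the above sequences is sketched below. It assumes that "homologous" is measured as ungapped percent identity between equal-length windows, which is a simplification; gapped alignments (for example, BLAST) are more commonly used in practice. The function name is hypothetical.

```python
def best_window_identity(query: str, reference: str, window: int) -> float:
    """Highest percent identity over any ungapped pair of `window`-length stretches.

    Every `window`-length substring of `query` is compared position by
    position with every `window`-length substring of `reference`.
    """
    if window <= 0 or window > min(len(query), len(reference)):
        raise ValueError("window must be between 1 and the shorter sequence length")
    best = 0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(reference) - window + 1):
            r = reference[j:j + window]
            matches = sum(a == b for a, b in zip(q, r))
            best = max(best, matches)
    return 100.0 * best / window
```

A candidate would meet, say, the "stretch of 20 amino acids that are 90% homologous" embodiment under this reading if `best_window_identity(candidate, reference, 20) >= 90.0`. The double loop compares every window pair, which is fast for proteins of a few hundred residues.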
(21) Any DNA sequence that encodes the above GFP variants is also included within the scope of the invention. Exemplary DNA sequences that encode each of the variants above are as follows:
(22) TABLE-US-00003 GFP-NEG25 (SEQIDNO:8) ATGGGGCATCACCATCATCATCATGGCGGTGCGTCTAAGGGGGAGGAGTT ATTTACGGGTGTGGTGCCGATCCTGGTGGAGCTTGATGGCGATGTTAACG GCCATGAATTTTCTGTCCGCGGTGAAGGGGAGGGTGATGCCACGGAAGGG GAGCTGACACTTAAATTTATTTGCACCACCGGTGAACTCCCGGTCCCGTG GCCGACCCTGGTGACCACCCTGACCTACGGCGTTCAATGCTTTTCACGTT ATCCGGATCACATGAAGCAACACGACTTCTTTAAAAGCGCGATGCCTGAA GGCTATGTTCAAGAACGTACAATTAGTTTTAAAGATGACGGCACCTACAA GACCCGTGCGGAAGTAAAATTTGAAGGGGACACTTTAGTGAACCGCATCG AGCTGAAAGGGATCGATTTTAAAGAAGATGGGAATATCCTGGGACACAAA CTTGAATACAACTTTAATAGTCATGACGTCTATATCACGGCGGACAAACA GGAAAACGGAATTAAGGCAGAATTTGAGATTCGGCATAATGTCGAAGATG GCTCGGTACAGTTGGCTGATCACTATCAGCAGAATACGCCGATTGGAGAT GGTCCGGTTTTATTACCAGACGATCACTATCTGTCCACCGAATCCGCCCT GAGCAAAGATCCGAATGAAGACCGGGACCATATGGTTCTGCTGGAATTTG TTACGGCGGCTGGTATTGACCATGGCATGGATGAGCTGTATAAGTAG GFP-NEG29 (SEQIDNO:9) ATGGGGCATCACCATCATCATCATGGCGGTGCGTCTAAGGGGGAGGAGTT ATTTGATGGTGAAGTGCCGATCCTGGTGGAGCTTGATGGCGATGTTAACG GCCATGAATTTTCTGTCCGCGGTGAAGGGGAGGGTGATGCCACGGAAGGG GAGCTGACACTTAAATTTATTTGCACCACCGGTGAACTCCCGGTCCCGTG GCCGACCCTGGTGACCACCCTGACCTACGGCGTTCAATGCTTTTCACGTT ATCCGGATCACATGGACCAACACGACTTCTTTAAAAGCGCGATGCCTGAA GGCTATGTTCAAGAACGTACAATTAGTTTTAAAGATGACGGCACCTACAA GACCCGTGCGGAAGTAAAATTTGAAGGGGACACTTTAGTGAACCGCATCG AGCTGAAAGGGATCGATTTTAAAGAAGATGGGAATATCCTGGGACACAAA CTTGAATACAACTTTAATAGTCATGACGTCTATATCACGGCGGACAAACA GGAAAACGGAATTAAGGCAGAATTTGAGATTCGGCATAATGTCGAAGATG GCTCGGTACAGTTGGCTGATCACTATCAGCAGAATACGCCGATTGGAGAT GGTCCGGTTTTATTACCAGACGATCACTATCTGTCCACCGAATCCGCCCT GAGCAAAGATCCGAATGAAGACCGGGACCATATGGTTCTGCTGGAATTTG TTACGGCGGCTGGTATTGACCATGGCATGGATGAGCTGTATAAGTAG GFP-NEG30 (SEQIDNO:10) ATGGGGCATCACCATCATCATCATGGCGGTGCGTCTAAGGGGGAGGAGTT ATTTGATGGTGTGGTGCCGATCCTGGTGGAGCTTGATGGCGATGTTAACG GCCATGAATTTTCTGTCCGCGGTGAAGGGGAGGGTGATGCCACGGAAGGG GAGCTGACACTTAAATTTATTTGCACCACCGGTGAACTCCCGGTCCCGTG GCCGACCCTGGTGACCACCCTGACCTACGGCGTTCAATGCTTTTCAGATT ATCCGGATCACATGGACCAACACGACTTCTTTAAAAGCGCGATGCCTGAA GGCTATGTTCAAGAACGTACAATTAGTTTTAAAGATGACGGCACCTACAA 
GACCCGTGCGGAAGTAAAATTTGAAGGGGACACTTTAGTGAACCGCATCG AGCTGAAAGGGATCGATTTTAAAGAAGATGGGAATATCCTGGGACACAAA CTTGAATACAACTTTAATAGTCATGACGTCTATATCACGGCGGACAAACA GGAAAACGGAATTAAGGCAGAATTTGAGATTCGGCATAATGTCGAAGATG GCTCGGTACAGTTGGCTGATCACTATCAGCAGAATACGCCGATTGGAGAT GGTCCGGTTTTATTACCAGACGATCACTATCTGTCCACCGAATCCGCCCT GAGCAAAGATCCGAATGAAGACCGGGACCATATGGTTCTGCTGGAATTTG TTACGGCGGCTGGTATTGACCATGGCATGGATGAGCTGTATAAGTAG GFP-POS36 (SEQIDNO:11) ATGGGGCATCATCATCATCACCACGGCGGGGCGTCTAAGGGAGAGCGCTT GTTTCGCGGCAAAGTCCCGATTCTTGTGGAGCTCAAAGGTGATGTAAATG GTCATAAATTTAGTGTGCGCGGGAAAGGGAAAGGAGATGCTACGCGGGGC AAGCTCACCCTGAAATTTATTTGCACAACCGGCAAACTGCCAGTGCCGTG GCCTACATTAGTCACTACTCTGACGTACGGTGTTCAGTGCTTTTCTCGCT ATCCCAAACACATGAAACGCCATGATTTCTTCAAGAGCGCGATGCCAAAA GGTTATGTGCAGGAACGCACCATCAGCTTTAAAAAAGACGGCAAATATAA AACCCGTGCAGAAGTTAAATTCGAAGGCCGCACCCTGGTCAACCGCATTA AACTGAAAGGTCGTGACTTCAAAGAGAAAGGTAATATTCTTGGTCACAAA CTGCGCTATAATTTCAACTCTCACAAAGTTTATATTACGGCGGATAAACG TAAAAACGGGATTAAAGCGAAATTTAAGATTCGTCATAATGTTAAAGACG GCAGTGTGCAGTTAGCGGATCATTATCAGCAGAATACCCCAATTGGTCGC GGTCCAGTGCTGCTGCCGCGTAACCATTATCTGTCGACCCGCAGCAAACT CAGCAAAGACCCGAAAGAAAAACGTGACCACATGGTATTACTGGAATTTG TGACCGCAGCAGGCATTAAACATGGCCGCGATGAACGTTACAAATAG GFP-POS42 (SEQIDNO:12) ATGGGCCATCATCATCACCACCACGGCGGCCGCTCAAAAGGTAAACGCTT GTTCCGTGGTAAAGTACCGATCTTAGTGGAGCTCAAAGGGGATGTGAATG GCCATAAGTTCTCGGTTCGTGGCAAAGGTAAGGGAGATGCGACGCGCGGC AAATTAACGCTGAAATTCATTTGTACTACAGGTAAACTGCCGGTGCCATG GCCTACTCTCGTCACCACGTTGACCTATGGGGTTCAATGCTTCAGCCGGT ACCCTAAACACATGAACCGCCACGATTTCTTCAAATCGGCGATGCCAAAG GGGTATGTCCAGGAACGCACTATCAGCTTCAAAAAAGACGGTAAGTATAA AACTCGTGCTGAAGTTAAATTCGAAGGACGCACACTGGTAAATCGCATTA AATTGAAGGGGCGCGACTTTAAGGAAAAAGGTAATATCTTAGGTCACAAA TTGCGCTACAACTTCAACTCTCATAAAGTTTACATTACAGCAGATAAGCG TAAAAATGGCATCAAAGCGAAATTCAAAATTCGTCACAATGTGAAAGATG GTAGCGTGCAATTAGCCGATCATTACCAGCAGAATACGCCGATCGGTCGC GGCCCAGTACTGTTGCCGCGCAAACATTACTTATCTACCCGGAGTAAACT GTCTAAAGACCCAAAAGAGAAGCGCGACCATATGGTTCTCCTGGAGTTTG TCACCGCCGCCGGAATTAAACACGGCCGCAAAGAGCGCTATAAATAG GFP-POS49 (SEQIDNO:13) 
ATGGGCCACCATCATCATCACCACGGGGGACGCTCTAAAGGTAAACGTCT GTTTCGTGGAAAGGTGCCCATTCTGGTTAAACTCAAAGGTGATGTCAACG GCCATAAGTTTTCGGTTCGTGGCAAAGGTAAAGGTGATGCGACGCGCGGG AAATTAACACTGAAATTTATTTGCACAACCGGAAAACTCCCTGTGCCGTG GCCGACTTTGGTGACCACATTAACCTATGGTGTTCAATGCTTCTCACGTT ATCCGAAGCATATGAAACGTCATGATTTTTTCAAATCGGCTATGCCGAAA GGTTACGTCCAGGAGCGCACCATCTCATTTAAGAAAGACGGTAAGTATAA AACCCGTGCTGAAGTAAAATTCAAAGGACGCACCCTGGTGAATCGCATTA AACTGAAAGGTCGTGATTTCAAAGAAAAGGGAAATATTTTAGGGCATAAG CTCCGTTATAATTTTAACAGTCATAAGGTGTATATTACCGCTGATAAACG CAAAAACGGAATCAAAGCGAAATTTAAGATCCGTCATAATGTAAAAGATG GCTCAGTCCAACTGGCAAAACATTACCAGCAGAATACCCCGATCGGCCGC GGTCCTGTGCTTCTGCCGCGTAAACACTACTTGTCGACCCGGTCAAAATT GAGTAAAGATCCGAAGGAAAAGCGTGATCACATGGTCTTGAAGGAATTTG TAACTGCAGCAGGTATTAAACACGGGCGCAAAGAACGTTACAAATAG
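Whether a listed DNA sequence encodes the corresponding variant can be checked by translating it with the standard genetic code. The following sketch assumes the standard code and a reading frame that starts at the first base passed in, and it strips the spaces used to group bases in the tables; the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (third base varies fastest); '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate a coding sequence in frame 0; stop at the first stop codon if `to_stop`."""
    seq = "".join(dna.split()).upper()  # remove the spaces used in the tables
    protein = []
    for i in range(0, len(seq) - 2, 3):  # any trailing partial codon is ignored
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

The GFP entries begin at the ATG start codon, so the first codons of SEQ ID NO:8 translate to the MGHHHHHHGGA leader seen in the protein tables. The streptavidin and GST DNA entries listed later begin with nine flanking bases (GGTTCAGCC) before the ATG, so those would be translated from index 9, e.g. `translate(seq[9:])`.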
(23) Polynucleotide sequences homologous to the above sequences are also within the scope of the present invention. In certain embodiments, the polynucleotide sequence includes a stretch of 50, 100, or 150 nucleotides that are 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 100% homologous to any one of the above sequences. The present invention also includes sequences in which one or more nucleotides are inserted into or deleted from one of the above sequences. Any polynucleotide sequence with a mutation as shown in any of the sequences above is considered part of the invention. In certain embodiments, the sequence includes 2, 3, 4, 5, 6, 7, 8, 9, 10, or more mutations as shown in any of the sequences above.
(24) The present invention also provides vectors (e.g., plasmids, cosmids, viruses) that comprise any of the inventive sequences herein or any other sequence (DNA or protein) modified using the inventive system. In certain embodiments, the vector includes elements such as promoters, enhancers, and ribosome-binding sites that are useful in overexpressing the inventive GFP variant in a cell. The invention also includes cells comprising the inventive sequences or vectors. In certain embodiments, the cells overexpress the variant GFP. The cells may be bacterial cells (e.g., E. coli), fungal cells (e.g., P. pastoris), yeast cells (e.g., S. cerevisiae), mammalian cells (e.g., CHO cells), or human cells.
(25) The inventive system has been used to create variants of streptavidin. These variants have been shown to form soluble tetramers that bind biotin. The amino acid sequence of wild-type streptavidin is as follows:
(26) TABLE-US-00004 (SEQIDNO:28) AAEAGITGTWYNQLGSTFIVTAGADGALTGTYESAVGNAESRYVLTGRYD SAPATDGSGTALGWTVAWKNNYRNAHSATTWSGQYVGGAEARINTQWLLT SGTTEANAWKSTLVGHDTFTKVKPSAAS
Wild-type streptavidin has a theoretical net charge of -4. Using the inventive system, variants with theoretical net charges of -40 and +52 have been created. Even after the variants were heated to 100 °C, the proteins remained soluble.
(27) The amino acid sequences of the variants of streptavidin that have been created include:
(28) TABLE-US-00005 SAV-NEG40 (SEQIDNO:29) MGHHHHHHGGAEAGITGTWYNQLGSTFIVTAGADGALTGTYESAVGDAES EYVLTGRYDSAPATDGSGTALGWTVAWKNDYENAHSATTWSGQYVGGAEA RINTQWLLTSGTTEADAWKSTLVGHDTFTKVEPSAAS SAV-POS52 (SEQIDNO:30) MGHHHHHHGGAKAGITGTWYNQLGSTFIVTAGAKGALTGTYESAVGNAKS RYVLTGRYDSAPATKGSGTALGWTVAWKNKYRNAHSATTWSGQYVGGAKA RINTQWLLTSGTTKAKAWKSTLVGHDTFTKVKPSAAS
As would be appreciated by one of skill in the art, homologous proteins are also considered to be within the scope of this invention. For example, any protein that includes a stretch of 20, 30, 40, 50, or 100 amino acids which are 60%, 70%, 80%, 90%, 95%, or 100% homologous to any of the above sequences is considered part of the invention. Addition and deletion variants are also contemplated by the invention. In certain embodiments, any streptavidin with a mutated residue as shown in any of the above sequences is considered part of the invention. In certain embodiments, the sequence includes 2, 3, 4, 5, 6, 7, 8, 9, 10, or more mutations as shown in any of the sequences above.
(29) Any DNA sequence that encodes the above streptavidin variants is also included within the scope of the invention. Exemplary DNA sequences which encode each of the variants above are as follows:
(30) TABLE-US-00006 SAV-NEG40 (SEQIDNO:31) GGTTCAGCCATGGGTCATCACCACCACCATCACGGTGGCGCCGAAGCAGG TATTACCGGTACCTGGTATAACCAGTTAGGCTCAACCTTTATTGTGACCG CGGGAGCGGACGGCGCCTTAACCGGTACCTACGAATCAGCTGTAGGTGAC GCGGAATCAGAGTACGTATTAACCGGTCGTTATGATAGCGCGCCGGCGAC TGACGGTAGCGGTACTGCTTTAGGTTGGACCGTAGCGTGGAAGAATGATT ATGAAAACGCACATAGCGCAACAACGTGGTCAGGGCAGTACGTTGGCGGA GCTGAGGCGCGCATTAACACGCAGTGGTTATTAACTAGCGGCACCACTGA AGCTGATGCCTGGAAGAGCACGTTAGTGGGTCATGATACCTTCACTAAAG TGGAACCTTCAGCTGCGTCATAATAATGACTCGAGACCTGCA SAV-POS52 (SEQIDNO:32) GGTTCAGCCATGGGTCATCACCACCACCATCACGGTGGCGCCAAAGCAGG TATTACCGGTACCTGGTATAACCAGTTAGGCTCAACCTTTATTGTGACCG CGGGAGCGAAAGGCGCCTTAACCGGTACCTACGAATCAGCTGTAGGAAAC GCAAAATCACGCTACGTATTAACCGGTCGTTATGATAGCGCGCCGGCGAC TAAAGGTAGCGGTACTGCTTTAGGTTGGACCGTAGCGTGGAAGAATAAGT ATCGTAATGCGCACAGTGCTACCACTTGGTCAGGGCAGTACGTAGGGGGA GCCAAAGCACGTATCAACACGCAGTGGTTATTAACATCAGGTACCACCAA AGCGAAAGCCTGGAAGAGCACGTTAGTGGGTCATGATACCTTCACTAAAG TGAAACCTTCAGCTGCGTCATAATAATGACTCGAGACCTGCA
(31) Polynucleotide sequences homologous to the above sequences are also within the scope of the present invention. In certain embodiments, the polynucleotide sequence includes a stretch of 50, 100, or 150 nucleotides that are 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 100% homologous to any one of the above sequences. The present invention also includes sequences in which one or more nucleotides are inserted into or deleted from one of the above sequences. Any polynucleotide sequence with a mutation as shown in any of the sequences above is considered part of the invention. In certain embodiments, the sequence includes 2, 3, 4, 5, 6, 7, 8, 9, 10, or more mutations as shown in any of the sequences above.
(32) The present invention also provides vectors (e.g., plasmids, cosmids, viruses) that comprise any of the inventive sequences herein or any other sequence (DNA or protein) modified using the inventive system. In certain embodiments, the vector includes elements such as promoters, enhancers, and ribosome-binding sites that are useful in overexpressing the inventive streptavidin variant in a cell. The invention also includes cells comprising the inventive sequences or vectors. In certain embodiments, the cells overexpress the variant streptavidin. The cells may be bacterial cells (e.g., E. coli), fungal cells (e.g., P. pastoris), yeast cells (e.g., S. cerevisiae), mammalian cells (e.g., CHO cells), or human cells.
(33) The inventive system has been used to create variants of glutathione-S-transferase (GST). These variants have been shown to retain the catalytic activity of wild-type GST. The amino acid sequence of wild-type GST is as follows:
(34) TABLE-US-00007 (SEQIDNO:33) MGHHHHHHGGPPYTITYFPVRGRCEAMRMLLADQDQSWKEEVVTMETWPP LKPSCLFRQLPKFQDGDLTLYQSNAILRHLGRSFGLYGKDQKEAALVDMV NDGVEDLRCKYATLIYTNYEAGKEKYVKELPEHLKPFETLLSQNQGGQAF VVGSQISFADYNLLDLLRIHQVLNPSCLDAFPLLSAYVARLSARPKIKAF LASPEHVNRPINGNGKQ
Wild-type GST has a theoretical net charge of +2. Using the inventive system, a variant with a theoretical net charge of -40 has been created. This variant catalyzes the addition of glutathione to chloronitrobenzene with a specific activity only 2.7-fold lower than that of wild-type GST. Even after the variant was heated to 100 °C, the protein remained soluble, and it recovered 40% of its catalytic activity upon cooling.
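For comparison with activity thresholds expressed as percentages, a 2.7-fold reduction in specific activity corresponds to the following fraction of wild-type activity:

```latex
\frac{\text{specific activity}_{\text{variant}}}{\text{specific activity}_{\text{wild type}}}
  = \frac{1}{2.7} \approx 0.37
```

That is, the GST variant described above retains roughly 37% of the wild-type specific activity.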
(35) The amino acid sequences of variants of GST include:
(36) TABLE-US-00008 GST-NEG40 (SEQIDNO:34) MGHHHHHHGGPPYTITYFPVRGRCEAMRMLLADQDQSWEEEVVTMETWPP LKPSCLFRQLPKFQDGDLTLYQSNAILRHLGRSFGLYGEDEEEAALVDMV NDGVEDLRCKYATLIYTDYEAGKEEYVEELPEHLKPFETLLSENEGGEAF VVGSEISFADYNLLDLLRIHQVLNPSCLDAFPLLSAYVARLSARPEIEAF LASPEHVDRPINGNGKQ GST-POS50 (SEQIDNO:35) MGHHHHHHGGPPYTITYFPVRGRCEAMRMLLADQKQSWKEEVVTMKTWPP LKPSCLFRQLPKFQDGKLTLYQSNAILRHLGRSFGLYGKKQKEAALVDMV NDGVEDLRCKYATLIYTKYKAGKKKYVKKLPKHLKPFETLLSKNKGGKAF VVGSKISFADYNLLDLLRIHQVLNPSCLKAFPLLSAYVARLSARPKIKAF LASPEHVKRPINGNGKQ
As would be appreciated by one of skill in the art, homologous proteins are also considered to be within the scope of this invention. For example, any protein that includes a stretch of 20, 30, 40, 50, or 100 amino acids that is 60%, 70%, 80%, 90%, 95%, or 100% homologous to any of the above sequences is considered part of the invention. In addition, insertion and deletion variants are also contemplated by the invention. In certain embodiments, any GST with a mutated residue as shown in any of the above sequences is considered part of the invention. In certain embodiments, the sequence includes 2, 3, 4, 5, 6, 7, 8, 9, 10, or more mutations as shown in any of the sequences above.
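The homology and mutation language above can be made concrete with two small helpers. This is a sketch under stated assumptions: "homologous" is approximated here as ungapped percent identity over a fixed window, whereas real comparisons normally use gapped alignments (e.g., BLAST), and mutations are written in the common wild-type residue/position/new residue notation. The helper names are hypothetical; the example fragments are copied from SEQ ID NO: 33 and SEQ ID NO: 35.

```python
def list_mutations(wild_type: str, variant: str) -> list[str]:
    """List substitutions such as 'D6K' (1-based positions) between two
    equal-length sequences; insertions/deletions are not handled."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences differ in length; align them first")
    return [
        f"{old}{position}{new}"
        for position, (old, new) in enumerate(zip(wild_type, variant), start=1)
        if old != new
    ]


def best_window_identity(query: str, reference: str, window: int) -> float:
    """Highest fraction of identical residues over any ungapped stretch of
    `window` residues in `query` compared with any such stretch in `reference`."""
    if window <= 0 or window > min(len(query), len(reference)):
        raise ValueError("window must fit inside both sequences")
    best = 0.0
    for qi in range(len(query) - window + 1):
        q = query[qi:qi + window]
        for ri in range(len(reference) - window + 1):
            r = reference[ri:ri + window]
            matches = sum(a == b for a, b in zip(q, r))
            best = max(best, matches / window)
    return best


# Fragments of wild-type GST (SEQ ID NO: 33) and GST-POS50 (SEQ ID NO: 35).
wt_fragment = "LLADQDQSWKEEVVTMETWPP"
pos_fragment = "LLADQKQSWKEEVVTMKTWPP"
print(list_mutations(wt_fragment, pos_fragment))            # ['D6K', 'E17K']
print(best_window_identity(wt_fragment, pos_fragment, 20))  # 0.9
```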
(37) Any DNA sequence that encodes the above GST variants is also included within the scope of the invention. Exemplary DNA sequences which encode each of the variants above are as follows:
(38) TABLE-US-00009 GST-NEG40 (SEQIDNO:36) GGTTCAGCCATGGGTCATCACCACCACCATCACGGTGGCCCGCCGTACAC CATTACATACTTTCCGGTACGTGGTCGTTGTGAAGCGATGCGTATGTTAT TAGCGGACCAGGACCAATCATGGGAAGAAGAAGTAGTGACAATGGAAACC TGGCCGCCGTTAAAGCCTAGCTGTTTATTCCGTCAATTACCGAAGTTTCA GGATGGTGATTTAACCTTATACCAGTCTAACGCGATCTTACGTCATTTAG GTCGCTCATTTGGTTTATACGGTGAAGATGAAGAAGAAGCAGCCTTAGTG GATATGGTGAATGATGGCGTGGAAGACTTACGTTGTAAATACGCGACGTT AATTTACACTGATTATGAAGCCGGTAAAGAGGAGTACGTGGAAGAATTAC CTGAACACCTGAAGCCGTTTGAAACATTACTGAGCGAAAATGAAGGAGGT GAGGCGTTCGTAGTTGGTAGCGAAATTAGCTTCGCTGATTATAACTTATT AGACTTATTACGCATTCACCAGGTTTTAAATCCTAGCTGTTTAGACGCTT TCCCGTTACTGAGCGCATATGTAGCGCGCCTGAGCGCCCGTCCGGAAATT GAAGCTTTCTTAGCGTCACCTGAACACGTAGACCGCCCGATTAACGGAAA CGGCAAGCAGTAATAATGAGGTACCACCTGCA GST-POS50 (SEQIDNO:37) GGTTCAGCCATGGGTCATCACCACCACCATCACGGTGGCCCGCCGTACAC CATTACATACTTTCCGGTACGTGGTCGTTGTGAAGCGATGCGTATGTTAT TAGCGGACCAGAAACAATCATGGAAAGAAGAAGTAGTGACAATGAAGACC TGGCCGCCGTTAAAGCCTAGCTGTTTATTCCGTCAATTACCGAAGTTTCA GGATGGTAAATTAACCTTATACCAGTCTAACGCGATCTTACGTCATTTAG GTCGCTCATTTGGTTTATACGGTAAGAAGCAGAAAGAAGCAGCCTTAGTG GATATGGTGAATGATGGCGTGGAAGACTTACGTTGTAAATACGCGACGTT AATTTACACTAAATATAAAGCCGGTAAAAAGAAGTACGTGAAAAAATTAC CTAAACACCTGAAGCCGTTTGAAACATTACTGAGCAAAAATAAAGGAGGT AAGGCGTTCGTAGTTGGTAGCAAGATTAGCTTCGCTGATTATAACTTATT AGACTTATTACGCATTCACCAGGTTTTAAATCCTAGCTGTTTAAAGGCTT TCCCGTTACTGAGCGCATATGTAGCGCGCCTGAGCGCCCGTCCGAAGATC AAAGCTTTCTTAGCGTCACCTGAACACGTGAAGCGCCCGATTAACGGAAA CGGCAAGCAGTAATAATGAGGTACCACCTGCA
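One way to check that a listed DNA sequence encodes the intended variant is to translate it. The sketch below assumes the standard genetic code and that the open reading frame begins at the first ATG (each listing carries a short 5' flank, GGTTCAGCC, before the start codon); `translate_orf` is a hypothetical helper, not part of any tool named in this document.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon ('*')."""
    seq = "".join(dna.split()).upper()  # drop the spaces/line breaks used in listings
    start = seq.find("ATG")
    if start < 0:
        raise ValueError("no ATG start codon found")
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 48 nucleotides of SEQ ID NO: 36 (GST-NEG40).
print(translate_orf("GGTTCAGCCATGGGTCATCACCACCACCATCACGGTGGCCCGCCGTAC"))
# MGHHHHHHGGPPY
```

Translating the full SEQ ID NO: 36 or 37 and comparing the result with SEQ ID NO: 34 or 35 is a straightforward consistency check.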
(39) The present invention also provides vectors (e.g., plasmids, cosmids, viruses, etc.) that comprise any of the inventive sequences herein or any other sequence (DNA or protein) modified using the inventive system. In certain embodiments, the vector includes elements such as promoter, enhancer, and ribosomal binding site sequences useful in overexpressing the inventive GST variant in a cell. The invention also includes cells comprising the inventive sequences or vectors. In certain embodiments, the cells overexpress the variant GST. The cells may be bacterial cells (e.g., E. coli), fungal cells (e.g., P. pastoris), yeast cells (e.g., S. cerevisiae), mammalian cells (e.g., CHO cells), or human cells.
(40) The present invention also includes kits for modifying proteins of interest to produce more stable variants of the protein. These kits typically include all or most of the reagents needed to create a more stable variant of a protein. In certain embodiments, the kit includes computer software to aid a researcher in designing the more stable variant protein based on the inventive method. The kit may also include all or some of the following: reagents, primers, oligonucleotides, nucleotides, enzymes, buffers, cells, media, plates, tubes, instructions, vectors, etc. The researcher using the kit typically provides the DNA sequence to be mutated to create the more stable variant. The contents are typically packaged for convenient use in a laboratory.
(41) These and other aspects of the present invention will be further appreciated upon consideration of the following Examples, which are intended to illustrate certain particular embodiments of the invention but are not intended to limit its scope, as defined by the claims.
EXAMPLES
Example 1: Supercharging Proteins Can Impart Extraordinary Resilience
(42) Protein aggregation, a well-known culprit in human disease (Cohen, F. E.; Kelly, J. W., Nature 2003, 426, (6968), 905-9; Chiti, F.; Dobson, C. M., Annu Rev Biochem 2006, 75, 333-66; each of which is incorporated herein by reference), is also a major problem facing the use of proteins as therapeutic or diagnostic agents (Frokjaer, S.; Otzen, D. E., Nat Rev Drug Discov 2005, 4, (4), 298-306; Fowler, S. B.; Poon, S.; Muff, R.; Chiti, F.; Dobson, C. M.; Zurdo, J., Proc Natl Acad Sci USA 2005, 102, (29), 10105-10; each of which is incorporated herein by reference). Insights into the protein aggregation problem have been garnered from the study of natural proteins. It has long been known that proteins are least soluble at their isoelectric point, where they bear a net charge of zero (Loeb, J., J Gen Physiol 1921, 4, 547-555; incorporated herein by reference). More recently, small differences in net charge (±3 charge units) have been shown to predict aggregation tendencies among variants of a globular protein (Chiti, F.; Stefani, M.; Taddei, N.; Ramponi, G.; Dobson, C. M., Nature 2003, 424, (6950), 805-8; incorporated herein by reference), and also among intrinsically disordered peptides (Pawar, A. P.; Dubay, K. F.; Zurdo, J.; Chiti, F.; Vendruscolo, M.; Dobson, C. M., J Mol Biol 2005, 350, (2), 379-92; incorporated herein by reference).
Together with recent evidence that some proteins can tolerate significant changes in net charge (for example, the finding that carbonic anhydrase retains catalytic activity after exhaustive chemical acetylation of its surface lysines (Gudiksen et al., J Am Chem Soc 2005, 127, (13), 4707-14; incorporated herein by reference)), these observations led us to hypothesize that the solubility and aggregation resistance of some proteins might be significantly enhanced, without abolishing their folding or function, by extensively mutating their surfaces to dramatically increase their net charge, a process we refer to herein as supercharging.
(43) We began with a recently reported state-of-the-art variant of green fluorescent protein (GFP) called superfolder GFP (sfGFP), which has been highly optimized for folding efficiency and resistance to denaturants (Pedelacq et al., Nat Biotechnol 2006, 24, (1), 79-88; incorporated herein by reference). Superfolder GFP has a net charge of −7, similar to that of wild-type GFP. Guided by a simple algorithm to calculate solvent exposure of amino acids (see Materials and Methods), we designed a supercharged variant of GFP having a theoretical net charge of +36 by mutating 29 of its most solvent-exposed residues to positively charged amino acids.
(44) Although sfGFP is the product of a long history of GFP optimization (Giepmans et al., Science 2006, 312, (5771), 217-24; incorporated herein by reference), it remains susceptible to aggregation induced by thermal or chemical unfolding. Heating sfGFP to 100 °C induced its quantitative precipitation and the irreversible loss of fluorescence.
(45) In addition to this remarkable aggregation resistance, supercharged GFP variants show a strong, reversible avidity for highly charged macromolecules of the opposite charge.
(46) We next sought to determine whether the supercharging principle could apply to proteins other than GFP, which is monomeric and has a well-shielded fluorophore. To this end, we applied the supercharging process to two proteins unrelated to GFP. Streptavidin is a tetramer with a total net charge of −4. Using the solvent-exposure algorithm, we designed two supercharged streptavidin variants with net charges of −40 or +52. Both supercharged streptavidin variants were capable of forming soluble tetramers that bind biotin, albeit with reduced affinity.
(47) Glutathione-S-transferase (GST), a dimer with a total net charge of +2, was supercharged to yield a dimer with a net charge of −40 that catalyzed the addition of glutathione to chlorodinitrobenzene with a specific activity only 2.7-fold lower than that of wild-type GST.
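The oligomer net charges quoted above follow from the per-monomer counts in Table 1: each identical subunit contributes n_pos − n_neg, so the total is that value multiplied by the number of subunits. A worked check:

```latex
\begin{aligned}
Q_{\mathrm{oligomer}} &= n_{\mathrm{sub}}\,(n_{\mathrm{pos}} - n_{\mathrm{neg}}) \\
\text{wtSAV: } & 4\,(8 - 9) = -4 \qquad \text{SAV}(+52)\text{: } 4\,(16 - 3) = +52 \\
\text{SAV}(-40)\text{: } & 4\,(5 - 15) = -40 \qquad \text{wtGST: } 2\,(24 - 23) = +2 \\
\text{GST}(-40)\text{: } & 2\,(17 - 37) = -40
\end{aligned}
```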
(48) In summary, we have demonstrated that monomeric and multimeric proteins of varying structures and functions can be supercharged by simply replacing their most solvent-exposed residues with like-charged amino acids. Supercharging profoundly alters the intermolecular properties of proteins, imparting remarkable aggregation resistance and the ability to associate in folded form with oppositely charged macromolecules like molecular Velcro. We note that these unusual intermolecular properties arise from high net charge, rather than from the total number of charged amino acids, which was not significantly changed by the supercharging process (Table 1).
(49) In contrast to these dramatic intermolecular effects, the intramolecular properties of the seven supercharged proteins studied here, including folding, fluorescence, ligand binding, and enzymatic catalysis, remained largely intact. Supercharging therefore may represent a useful approach for reducing the aggregation tendency and improving the solubility of proteins without abolishing their function. These principles may be particularly useful in de novo protein design efforts, where unpredictable protein handling properties including aggregation remain a significant challenge. In light of the above results of supercharging natural proteins, it is tempting to speculate that the aggregation resistance of designed proteins could also be improved by biasing the design process to increase the frequency of like-charged amino acids at positions predicted to lie on the outside of the folded protein.
(50) Protein supercharging illustrates the remarkable plasticity of protein surfaces and highlights the opportunities that arise from the mutational tolerance of solvent-exposed residues. For example, it was recently shown that the thermodynamic stability of some proteins can be enhanced by rationally engineering charge-charge interactions (Strickler et al., Biochemistry 2006, 45, (9), 2761-6; incorporated herein by reference). Protein supercharging demonstrates how this plasticity can be exploited in a different way to impart extraordinary resistance to protein aggregation. Our findings are consistent with the results of a complementary study in which removal of all charges from ubiquitin left its folding intact but significantly impaired its solubility (Loladze et al., Protein Sci 2002, 11, (1), 174-7; incorporated herein by reference).
(51) These observations may also illuminate the modest net-charge distribution of natural proteins (Knight et al., Proc Natl Acad Sci USA 2004, 101, (22), 8390-5; Gitlin et al., Angew Chem Int Ed Engl 2006, 45, (19), 3022-60; each of which is incorporated herein by reference): the net charge of 84% of Protein Data Bank (PDB) polypeptides, for example, falls within ±10. Our results argue against the hypothesis that high net charge creates sufficient electrostatic repulsion to force unfolding. Indeed, GFP(+48) has a higher positive net charge than any polypeptide currently in the PDB, yet retains the ability to fold and fluoresce. Instead, our findings suggest that nonspecific intermolecular adhesion may have disfavored the evolution of many highly charged natural proteins. Almost all natural proteins with very high net charge, such as ribosomal proteins L3 (+36) and L15 (+44), which bind RNA, or calsequestrin (−80), which binds calcium cations, associate with oppositely charged species as part of their essential cellular functions.
(52) Materials and Methods
(53) Design Procedure and Supercharged Protein Sequences.
(54) Solvent-exposed residues (shown in grey below) were identified from published structural data (Weber, P. C., Ohlendorf, D. H., Wendoloski, J. J. & Salemme, F. R. Structural origins of high-affinity biotin binding to streptavidin. Science 243, 85-88 (1989); Dirr, H., Reinemer, P. & Huber, R. Refined crystal structure of porcine class Pi glutathione S-transferase (pGST P1-1) at 2.1 Å resolution. J Mol Biol 243, 72-92 (1994); Pedelacq, J. D., Cabantous, S., Tran, T., Terwilliger, T. C. & Waldo, G. S. Engineering and characterization of a superfolder green fluorescent protein. Nat Biotechnol 24, 79-88 (2006); each of which is incorporated herein by reference) as those having AvNAPSA < 150, where AvNAPSA is the average number of neighbor atoms (within 10 Å) per side-chain atom. Charged or highly polar solvent-exposed residues (D, E, R, K, N, and Q) were mutated either to Asp or Glu, for negative supercharging (red), or to Lys or Arg, for positive supercharging (blue). Additional surface-exposed positions to mutate in green fluorescent protein (GFP) variants were chosen on the basis of sequence variability at these positions among GFP homologues. The supercharging design process for streptavidin (SAV) and glutathione-S-transferase (GST) was fully automated: residues were first sorted by solvent exposure, and then the most solvent-exposed charged or highly polar residues were mutated either to Lys for positive supercharging, or to Glu (unless the starting residue was Asn, in which case to Asp) for negative supercharging.
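The automated design procedure for SAV and GST can be sketched in Python. The sketch assumes details the text does not specify: a neighbor is any protein atom within 10 Å of a side-chain atom other than the atom itself, glycine (no side-chain atoms) is treated as buried, and `n_mutations` is a hypothetical stopping parameter that a designer would tune toward a target net charge. The GFP-specific step based on sequence variability among homologues is not modeled, and all names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
CANDIDATES = set("DERKNQ")  # charged or highly polar residues eligible for mutation


def avnapsa(sidechain: Sequence[Coord], protein_atoms: Sequence[Coord],
            cutoff: float = 10.0) -> float:
    """Average number of protein atoms within `cutoff` Å of each side-chain atom.

    Assumptions: the atom itself is excluded (distance > 0); residues with no
    side-chain atoms (Gly) are treated as buried.
    """
    if not sidechain:
        return math.inf
    total = 0
    for a in sidechain:
        total += sum(1 for b in protein_atoms if 0.0 < math.dist(a, b) <= cutoff)
    return total / len(sidechain)


def supercharge(sequence: str, exposure: Dict[int, float], n_mutations: int,
                positive: bool = True,
                threshold: float = 150.0) -> Tuple[str, List[str]]:
    """Mutate the most exposed D/E/R/K/N/Q positions (lowest AvNAPSA first).

    `exposure` maps 1-based positions to AvNAPSA values. Positive mode mutates
    to Lys; negative mode mutates to Glu, or to Asp when the residue is Asn.
    """
    ranked = sorted(
        (p for p, v in exposure.items()
         if v < threshold and sequence[p - 1] in CANDIDATES),
        key=lambda p: (exposure[p], p),
    )
    residues = list(sequence)
    mutations: List[str] = []
    for p in ranked:
        if len(mutations) >= n_mutations:
            break
        old = residues[p - 1]
        new = "K" if positive else ("D" if old == "N" else "E")
        if new != old:  # skip positions that already carry the target residue
            residues[p - 1] = new
            mutations.append(f"{old}{p}{new}")
    return "".join(residues), mutations


# Hypothetical toy input: a 7-residue chain with made-up AvNAPSA values.
toy_exposure = {2: 50.0, 3: 120.0, 4: 30.0, 5: 200.0, 6: 10.0, 7: 80.0}
print(supercharge("MDKNQSE", toy_exposure, n_mutations=2))
# ('MKKKQSE', ['N4K', 'D2K'])
print(supercharge("MDKNQSE", toy_exposure, n_mutations=10, positive=False))
# ('MEEDQSE', ['N4D', 'D2E', 'K3E'])
```

Lower AvNAPSA means fewer neighboring atoms and therefore greater solvent exposure, which is why positions are ranked in ascending order.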
(55) TABLE-US-00010
(56) Protein Expression and Purification.
(57) Synthetic genes optimized for E. coli codon usage were purchased from DNA 2.0, cloned into a pET expression vector (Novagen), and overexpressed in E. coli BL21(DE3)pLysS for 5-10 hours at 15 °C. Cells were harvested by centrifugation and lysed by sonication. Proteins were purified by Ni-NTA agarose chromatography (Qiagen), buffer-exchanged into 100 mM NaCl, 50 mM potassium phosphate pH 7.5, and concentrated by ultrafiltration (Millipore). All GFP variants were purified under native conditions. Wild-type streptavidin was purchased from Promega. Supercharged streptavidin variants, as well as supercharged GST, were purified under denaturing conditions and refolded as reported previously for wild-type streptavidin (Thompson et al. Construction and expression of a synthetic streptavidin-encoding gene in Escherichia coli. Gene 136, 243-246 (1993); incorporated herein by reference). Wild-type GST was purified under either native or denaturing conditions, yielding protein of comparable activity.
(58) Electrostatic Surface Potential Calculations
(59) Models of −30 and +48 supercharged GFP variants were based on the crystal structure of superfolder GFP (Pedelacq et al., Engineering and characterization of a superfolder green fluorescent protein. Nat Biotechnol 24, 79-88 (2006); incorporated herein by reference). Electrostatic potentials were calculated using APBS (Baker et al., Electrostatics of nanosystems: application to microtubules and the ribosome. Proc. Natl Acad Sci USA 98, 10037-10041 (2001); incorporated herein by reference) and rendered with PyMol (Delano, W. L., The PyMOL Molecular Graphics System, www.pymol.org (2002); incorporated herein by reference) using a scale of −25 kT/e (red) to +25 kT/e (blue).
(60) Protein Staining and UV-Induced Fluorescence
(61) 0.2 µg of each GFP variant was analyzed by electrophoresis in a 10% denaturing polyacrylamide gel and stained with Coomassie brilliant blue dye. 0.2 µg of the same protein samples in 25 mM Tris pH 8.0 with 100 mM NaCl was placed in a 0.2 mL Eppendorf tube and photographed under UV light (360 nm).
(62) Thermal Denaturation and Aggregation
(63) Purified GFP variants were diluted to 2 mg/mL in 25 mM Tris pH 8.0, 100 mM NaCl, and 10 mM beta-mercaptoethanol (BME), then photographed under UV illumination (native). The samples were heated to 100 °C for 1 minute, then photographed again under UV illumination (boiled). Finally, the samples were cooled for 2 h at room temperature and photographed again under UV illumination (cooled).
(64) Chemically Induced Aggregation
(65) 2,2,2-Trifluoroethanol (TFE) was added to produce solutions with 1.5 mg/mL protein, 25 mM Tris pH 7.0, 10 mM BME, and 40% TFE. Aggregation at 25 °C was monitored by right-angle light scattering.
(66) Size-Exclusion Chromatography (Table 1).
(67) The multimeric state of SAV and GST variants was determined by analyzing 20-50 µg of protein on a Superdex 75 gel-filtration column. The buffer was 100 mM NaCl, 50 mM potassium phosphate pH 7.5. Molecular weights were determined by comparison with a set of monomeric protein standards of known molecular weight analyzed separately under identical conditions.
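The text does not state how molecular weights were read from the standards. A common convention, assumed here, is a linear calibration of log10(MW) against elution volume; the standards below are made-up values, while the oligomer check uses native and monomer molecular weights from Table 1. Function names are hypothetical.

```python
import math
from typing import List, Tuple


def fit_calibration(standards: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of log10(MW) = slope * elution_volume + intercept.

    `standards` holds (elution volume in mL, molecular weight in kDa) pairs.
    """
    xs = [v for v, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(elution_volume: float, slope: float, intercept: float) -> float:
    """Interpolate the apparent molecular weight (kDa) of an unknown."""
    return 10 ** (slope * elution_volume + intercept)


def oligomeric_state(native_mw: float, monomer_mw: float) -> int:
    """Nearest whole number of subunits implied by the native MW."""
    return round(native_mw / monomer_mw)


# Made-up standards lying exactly on log10(MW) = -0.2 * V + 4.0.
standards = [(10.0, 100.0), (12.0, 10 ** 1.6), (15.0, 10.0)]
slope, intercept = fit_calibration(standards)
print(round(slope, 3), round(intercept, 3))           # -0.2 4.0
print(round(estimate_mw(12.5, slope, intercept), 1))  # 31.6

# Table 1: SAV(+52) 55 kD native vs 14.5 kD monomer; wtGST 50 kD vs 24.6 kD.
print(oligomeric_state(55, 14.5), oligomeric_state(50, 24.6))  # 4 2
```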
(68) TABLE-US-00011 TABLE 1: Calculated and experimentally determined protein properties.

| name | MW (kD) | length (aa) | n_pos | n_neg | n_charged | Q_net | pI | ΔG (kcal/mol)^a | native MW (kD)^b | % soluble after boiling^c |
|---|---|---|---|---|---|---|---|---|---|---|
| GFP (−30) | 27.8 | 248 | 19 | 49 | 68 | −30 | 4.8 | 10.2 | n.d. | 98 |
| GFP (−25) | 27.8 | 248 | 21 | 46 | 67 | −25 | 5.0 | n.d. | n.d. | n.d. |
| sfGFP | 27.8 | 248 | 27 | 34 | 61 | −7 | 6.6 | 11.2 | n.d. | 4 |
| GFP (+36) | 28.5 | 248 | 56 | 20 | 76 | +36 | 10.4 | 8.8 | n.d. | 97 |
| GFP (+48) | 28.6 | 248 | 63 | 15 | 78 | +48 | 10.8 | 7.1 | n.d. | n.d. |
| SAV (−40) | 14.3 | 137 | 5 | 15 | 20 | −10 | 5.1 | n.d. | 55 ± 5 (tetramer) | 99 |
| wtSAV | 13.3 | 128 | 8 | 9 | 17 | −1 | 6.5 | n.d. | 50 ± 5 (tetramer) | 7 |
| SAV (+52) | 14.5 | 137 | 16 | 3 | 19 | +13 | 10.3 | n.d. | 55 ± 5 (tetramer) | 97 |
| GST (−40) | 24.7 | 217 | 17 | 37 | 54 | −20 | 4.8 | n.d. | 50 ± 5 (dimer) | 96 |
| wtGST | 24.6 | 217 | 24 | 23 | 47 | +1 | 7.9 | n.d. | 50 ± 5 (dimer) | 3 |
| GST (+50)^d | 24.7 | 217 | 39 | 14 | 53 | +25 | 10.0 | n.d. | n.d. | n.d. |

n_pos, number of positively charged amino acids (per monomer); n_neg, number of negatively charged amino acids; n_charged, total number of charged amino acids; Q_net, theoretical net charge at neutral pH; pI, calculated isoelectric point; n.d., not determined.
^a Measured by guanidinium denaturation (FIG. 2c).
^b Measured by size-exclusion chromatography.
^c Percent protein remaining in the supernatant after 5 min at 100 °C, cooling to 25 °C, and brief centrifugation.
^d Protein failed to express in E. coli.
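Table 1 lists a calculated isoelectric point for each protein. A standard way to compute one is to estimate net charge with the Henderson-Hasselbalch equation and bisect on pH until the charge crosses zero. The pKa set below is an illustrative assumption (the document does not say which values were used), and unlike the simple counting behind Q_net, this model also includes His, Cys, Tyr and the chain termini, so its output need not match Table 1 exactly.

```python
# Illustrative pKa values (an assumption; the document does not state which
# set was used for the Table 1 pI values, so results may differ slightly).
PKA_BASIC = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge_at_ph(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge for a single chain."""
    seq = "".join(sequence.split()).upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["N_term"] = counts["C_term"] = 1
    # Basic groups carry +1 when protonated; acidic groups carry -1 when deprotonated.
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection on pH in [0, 14]; net charge falls monotonically as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge_at_ph(sequence, mid) > 0:
            lo = mid  # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


# N-terminal His-tag region shared by the GST sequences above.
tag = "MGHHHHHHGGPPYTITYFPV"
print(round(isoelectric_point(tag), 2))
```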
Other Embodiments
(69) Those of ordinary skill in the art will readily appreciate that the foregoing represents merely certain preferred embodiments of the invention. Various changes and modifications to the procedures and compositions described above can be made without departing from the spirit or scope of the present invention, as set forth in the following claims.