Computer assisted antibody re-epitoping
11468969 · 2022-10-11
Assignee
Inventors
- Yanay OFRAN (Tel Aviv, IL)
- Guy NIMROD (Tel Aviv, IL)
- Sharon Fischman (Modiein, IL)
- Asael Herman (Nes Tziona, IL)
Cpc classification
G16B15/00
PHYSICS
G16B35/00
PHYSICS
C40B10/00
CHEMISTRY; METALLURGY
G16B15/30
PHYSICS
G16B20/00
PHYSICS
International classification
G16B20/00
PHYSICS
G16B15/00
PHYSICS
G16B35/00
PHYSICS
C40B10/00
CHEMISTRY; METALLURGY
G16B15/30
PHYSICS
Abstract
The present invention is directed to a method for generating a library of antigen binding molecules for screening for binding to an epitope of interest, said method comprising: a. selecting a template antigen-binding molecule from a set of possible template antigen binding molecules wherein said selected template does not specifically bind the epitope of interest but is known to specifically bind another epitope; b. selecting at least one residue position in said template antigen-binding molecule for mutation; and c. selecting at least one variant residue to substitute at the at least one residue position selected in b; such that a library containing a plurality of variants of said template is generated.
Claims
1. A method for generating a library of variant antigen binding molecules for screening for binding to an epitope of interest, said method comprising: (a) analyzing binding affinities of a set of possible template antigen binding molecules for the epitope of interest and selecting sequence data of a template antigen-binding molecule from said set of possible template antigen binding molecules, wherein said selected template does not bind the epitope of interest and wherein said set of possible template antigen-binding molecules consists of a plurality of known antibodies that do not bind the epitope of interest, wherein said selecting sequence data comprises screening three-dimensional structures or structural models of the set of possible template antigen-binding molecules based on one or more of the following criteria: shape complementarity to the epitope of interest, physico-chemical complementarity to the epitope of interest and the predicted free energy of the interaction with the epitope of interest; (b) selecting at least one residue position in said selected sequence data of the template antigen-binding molecule of (a) for mutation, wherein said selecting said at least one position comprises (i) screening the three dimensional structure and/or a three-dimensional model of the template antigen-binding molecule sequence data selected in (a) to identify residues that contribute to binding to the epitope of interest; or (ii) conducting multiple sequence alignments of the nucleic acid or the amino acid sequence of the template antigen-binding molecule selected in (a) to identify substitutable positions; (c) selecting at least one variant residue to substitute at the at least one residue position selected in the sequence data selected in (b), wherein said selecting the at least one variant residue comprises for each residue identified in step (b), identifying one or more amino acid substitutions that are preferred, allowed or neutral at that residue position; (d) substituting at least one variant residue at the at least one residue position selected in (c) in said sequence data; (e) generating sequence data of one or more variant antigen binding molecules and synthesizing one or more variant antigen binding molecules of said template antigen-binding molecule, wherein said sequence data of the variant antigen binding molecules comprises at least one substitution identified and substituted in the sequence data of (d); and (f) screening said variant antigen binding molecules synthesized, said molecules comprising antibodies or antigen-binding fragments thereof, for binding to the epitope of interest, and selecting variant antigen binding molecules that bind the epitope of interest; wherein said method generates a library of variant antigen-binding molecules comprising said substituted sequence data of said template antigen-binding molecule for screening for binding to the epitope of interest.
2. The method of claim 1, wherein said preferred, allowed and/or neutral substitutions in the sequence data are determined by analyzing the amino acid sequences of a plurality of known antibodies compared with the sequence data of the template antigen-binding molecule, wherein analyzing comprises comparison of said amino acid sequences, analysis of composition of said amino acid sequences, analysis of ΔΔG of binding energy, probability of mutation from said germline sequence, or sequence-similarity search algorithms, or any combination thereof.
3. The method of claim 2, wherein said substitutions in the sequence data comprise a position prone to somatic hypermutation; a position selected based on sequence analysis using multi-sequence alignment to the template, wherein said analysis is by machine learning; or a position based on structural analysis, wherein said analysis is by machine learning; or a combination thereof.
4. The method of claim 1, wherein said at least one residue selected for mutation is in a CDR region of the sequence data of the template.
5. The method of claim 1, wherein the at least one variant residue selected for mutation comprises residues that are in less than all of the CDRs.
6. The method of claim 1 wherein said method is computer implemented.
Description
BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES
(7) The structural regions are marked as follows: Ag interface marked as *, VH-VL interface marked as Δ, both interfaces marked as #, and ABRs that are not in interfaces are marked as o.
(10) [...] is the frequency of a specific amino acid in the germ-line sequences of the group; "mutations in group" is the number of mutations in the group. Standard errors are presented by the error bars.
DETAILED DESCRIPTION OF THE INVENTION
Definitions
(26) As used herein, the term “antigen binding molecule” refers in its broadest sense to a molecule that specifically binds an antigenic determinant. An antigen binding molecule can be, for example, an antibody or a fragment thereof that specifically binds to an antigenic determinant. By “specifically binds” is meant that the binding is selective for the antigen of interest and can be discriminated from unwanted or nonspecific interactions.
(27) As used herein, the term “antibody” is intended to include whole antibody molecules, including monoclonal, polyclonal and multispecific (e.g., bispecific) antibodies. Also encompassed are antibody fragments that retain binding specificity including, but not limited to, VH fragments, VL fragments, Fab fragments, F(ab′)₂ fragments, scFv fragments, Fv fragments, minibodies, diabodies, triabodies, and tetrabodies (see, e.g., Hudson and Souriau, Nature Med. 9: 129-134 (2003) (hereby incorporated by reference in its entirety)). Also encompassed are humanized, primatized and chimeric antibodies.
(28) As used herein, the term “variant” refers to a polypeptide differing from a specifically recited polypeptide of the invention by amino acid insertions, deletions, and/or substitutions, created using, e.g., recombinant DNA techniques. Variants of the antigen binding molecules of the present invention include antigen binding molecules wherein one or several of the amino acid residues are modified by substitution, addition and/or deletion in such manner that does not substantially affect antigen binding affinity (that is, the affinity remains within one order of magnitude of the affinity of another variant). Guidance in determining which amino acid residues may be replaced, added or deleted without abolishing activities of interest, may be found by comparing the sequence of the particular polypeptide with that of homologous peptides and minimizing the number of amino acid sequence changes made in regions of high homology (conserved regions) or by replacing amino acids with consensus sequence amino acids.
(29) As used herein, “shape complementarity” means that the 3D shapes of the interacting surfaces, whether determined experimentally, through homology modeling or through de-novo modeling, fit each other without clashes or steric hindrance.
(30) As used herein, “physico-chemical complementarity” means alignments of complementary charges, pi-pi interactions, donors and/or acceptors of H-bonds and any other molecular interactions that stabilize the complex.
(31) As used herein, “substitutable positions” means positions in the antibody that, according to sequence and structure analysis, may be substituted without compromising the structure, expression stability or other characteristics of the antibody other than what it can bind.
(32) As used herein, “preferred substitution” means that variability in a given position occurs more than expected by chance when comparing similar sequences.
(33) As used herein, “neutral substitution” means that variability in a given position occurs as expected by chance when comparing similar sequences.
(34) As used herein, “allowed substitution” means that variability in a given position occurs less than expected by chance when comparing similar sequences.
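The three definitions above can be illustrated with a minimal sketch that compares an observed substitution frequency with the frequency expected by chance. The function name, the use of a log2 ratio, and the `tolerance` cutoff are assumptions made for illustration only; the definitions do not specify how "as expected by chance" is to be tested.

```python
import math


def classify_substitution(observed, expected, tolerance=0.5):
    """Label a substitution at one position as preferred, neutral or allowed.

    observed  -- frequency of the substitution among similar sequences
    expected  -- frequency expected by chance (must be positive)
    tolerance -- hypothetical cutoff on |log2(observed / expected)| within
                 which the two frequencies are treated as equal
    """
    if expected <= 0:
        raise ValueError("expected frequency must be positive")
    if observed <= 0:
        # Never observed: outside all three categories defined above.
        return "unobserved"
    log_ratio = math.log2(observed / expected)
    if log_ratio > tolerance:
        return "preferred"   # more variability than expected by chance
    if log_ratio < -tolerance:
        return "allowed"     # less variability than expected by chance
    return "neutral"         # variability as expected by chance
```

In practice a statistical test over the aligned sequences could replace the fixed cutoff; the sketch only shows how the three labels relate to one another.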
(35) As used herein, “enriched residues” and “depleted residues” are determined as follows: The propensity of each amino acid in a given position in the original library determines the expected distribution of amino acids in this position, assuming that the position does not affect binding. After one or more rounds of selection, the observed propensities of amino acids in that position are recorded. If, by a predefined statistic, e.g. measuring the observed frequency compared to expected frequency using a measure such as log-odds, a certain amino acid is observed significantly more frequently than expected under the null hypothesis, then the amino acid is said to be enriched in that position. If it appears significantly less, it is said to be depleted.
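As a non-limiting illustration of the log-odds measure mentioned above, the following sketch computes, for one position, the log2 ratio between the amino acid frequencies observed after selection and those in the original library. The pseudocount and the fixed `threshold` are assumptions; they stand in for the "predefined statistic" and do not constitute a significance test.

```python
import math
from collections import Counter


def position_log_odds(library_residues, selected_residues, pseudocount=1.0):
    """Return log2(observed / expected) for each amino acid at one position.

    library_residues  -- residues at the position in the original library
    selected_residues -- residues at the position after selection
    pseudocount       -- assumed smoothing term that avoids division by zero
    """
    alphabet = sorted(set(library_residues) | set(selected_residues))
    expected = Counter(library_residues)
    observed = Counter(selected_residues)
    n_exp = len(library_residues) + pseudocount * len(alphabet)
    n_obs = len(selected_residues) + pseudocount * len(alphabet)
    scores = {}
    for aa in alphabet:
        p_exp = (expected[aa] + pseudocount) / n_exp
        p_obs = (observed[aa] + pseudocount) / n_obs
        scores[aa] = math.log2(p_obs / p_exp)
    return scores


def call_enrichment(scores, threshold=1.0):
    """Label each amino acid using a hypothetical log-odds threshold."""
    calls = {}
    for aa, score in scores.items():
        if score >= threshold:
            calls[aa] = "enriched"
        elif score <= -threshold:
            calls[aa] = "depleted"
        else:
            calls[aa] = "unchanged"
    return calls
```

For example, a position that is half alanine and half tyrosine in the library and mostly tyrosine after selection yields a positive score for tyrosine and a negative score for alanine.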
(36) Protein-protein docking is a computational method used to predict the structure of macromolecular complexes by orienting the three-dimensional structures of two binding partners relative to each other, with the goal of accurately modeling the binding interface. A variety of algorithms can be utilized to sample the rotational and translational search space, including Fast Fourier Transform (Comeau, S. R., et al., ClusPro: a fully automated algorithm for protein-protein docking: Nucleic Acids Res, v. 32, p. W96-9 (2004); Ohue, M., et al., MEGADOCK: an all-to-all protein-protein interaction prediction system using tertiary structure data: Protein Pept Lett, v. 21, p. 766-78 (2014); Tovchigrechko, A., and I. A. Vakser, GRAMM-X public web server for protein-protein docking: Nucleic Acids Res, v. 34, p. W310-4 (2006)) (each of which is hereby incorporated by reference in its entirety), geometric hashing (Schneidman-Duhovny, D., et al., PatchDock and SymmDock: servers for rigid and symmetric docking: Nucleic Acids Res, v. 33, p. W363-7 (2005)) (hereby incorporated by reference in its entirety), spherical polar Fourier (Ritchie, D. W., and V. Venkatraman, Ultra-fast FFT protein docking on graphics processors: Bioinformatics, v. 26, p. 2398-405 (2010)) (hereby incorporated by reference in its entirety), and Monte Carlo search (Gray, J. J., et al., Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations: J Mol Biol, v. 331, p. 281-99 (2003); Huang, S. Y., Search strategies and evaluation in protein-protein docking: principles, advances and challenges: Drug Discov Today, v. 19, p. 1081-1096 (2014)) (each of which is hereby incorporated by reference in its entirety). The key to successful protein-protein docking is the ability to select native or near-native structures from the thousands of docking poses the search algorithm generates, which is not a trivial challenge (Huang, S. Y., Search strategies and evaluation in protein-protein docking: principles, advances and challenges: Drug Discov Today, v. 19, p. 1081-1096 (2014); Moal, I. H., et al., Scoring functions for protein-protein interactions: Curr Opin Struct Biol, v. 23, p. 862-7 (2013)) (each of which is hereby incorporated by reference in its entirety). To select docking poses, different scoring functions can be implemented to rank the set of docking poses, for example, optimizing shape complementarity, energy functions (van der Waals, electrostatics, desolvation), binding free energies, and statistical potentials (Chen, R., et al., ZDOCK: an initial-stage protein-docking algorithm: Proteins, v. 52, p. 80-7 (2003); Gray, J. J., et al., Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations: J Mol Biol, v. 331, p. 281-99 (2003); Huang, S. Y., Search strategies and evaluation in protein-protein docking: principles, advances and challenges: Drug Discov Today, v. 19, p. 1081-1096 (2014); Moal, I. H., et al., Scoring functions for protein-protein interactions: Curr Opin Struct Biol, v. 23, p. 862-7 (2013); Norel, R., et al., Electrostatic contributions to protein-protein interactions: fast energetic filters for docking and their physical basis: Protein Sci, v. 10, p. 2147-61 (2001); Ohue, M., et al., MEGADOCK: an all-to-all protein-protein interaction prediction system using tertiary structure data: Protein Pept Lett, v. 21, p. 766-78 (2014); Schneidman-Duhovny, D., et al., PatchDock and SymmDock: servers for rigid and symmetric docking: Nucleic Acids Res, v. 33, p. W363-7 (2005)) (each of which is hereby incorporated by reference in its entirety). In addition to these physical and statistical scoring functions, biological data can be incorporated either at the search stage or at the scoring stage, for example by defining residues that contribute to the binding interface or by restricting the docked interface to the CDRs of an Ab in Ab-Ag docking (Dominguez, C., et al., HADDOCK: a protein-protein docking approach based on biochemical or biophysical information: J Am Chem Soc, v. 125, p. 1731-7 (2003); Gray, J. J., et al., Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations: J Mol Biol, v. 331, p. 281-99 (2003)) (each of which is hereby incorporated by reference in its entirety).
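The combination of a scoring function with biological restraints described above can be sketched as follows. This is a simplified illustration and not the scoring function of any cited docking program: the `Pose` fields, the linear weighting, and the `min_cdr_fraction` cutoff are all hypothetical, and the per-pose terms are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    name: str
    shape: float                   # shape-complementarity term, higher is better
    electrostatics: float          # energy-like term, more negative is better
    desolvation: float             # energy-like term, more negative is better
    antibody_contacts: frozenset   # antibody residue ids in the docked interface


def rank_poses(poses, cdr_residues, weights=(1.0, 1.0, 1.0), min_cdr_fraction=0.8):
    """Discard poses whose interface lies mostly outside the CDRs, then rank.

    cdr_residues     -- set of antibody residue ids considered to be CDR positions
    weights          -- hypothetical weights for (shape, electrostatics, desolvation)
    min_cdr_fraction -- assumed minimum fraction of contacts that must be in CDRs
    """
    w_shape, w_elec, w_desolv = weights
    kept = []
    for pose in poses:
        if not pose.antibody_contacts:
            continue  # no interface at all
        in_cdr = len(pose.antibody_contacts & cdr_residues) / len(pose.antibody_contacts)
        if in_cdr < min_cdr_fraction:
            continue  # biological restraint: interface must be made by the CDRs
        # Energy-like terms are subtracted so that a higher score is better.
        score = (w_shape * pose.shape
                 - w_elec * pose.electrostatics
                 - w_desolv * pose.desolvation)
        kept.append((score, pose.name))
    return [name for score, name in sorted(kept, reverse=True)]
```

Applying the restraint as a filter before ranking corresponds to using the biological data at the scoring stage; the same restraint could instead limit the search space at the search stage.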
(37) Several challenges to the problem of protein-protein docking exist. Docking methods generally perform well when re-docking the individual binding partners taken from the structure of a bound complex, yet performance degrades when the structures of the two proteins in their unbound state are used (Janin, J., Protein-protein docking tested in blind predictions: the CAPRI experiment: Mol Biosyst, v. 6, p. 2351-62 (2010)) (hereby incorporated by reference in its entirety). Moreover, rigid docking is often performed, which does not take into account the potentially large conformational changes in secondary structure that may occur in some cases of protein-protein binding. Advances in docking include attempts to incorporate flexibility into the structures being docked, whether at the level of the backbone or of the side chains (Zacharias, M., Accounting for conformational changes during protein-protein docking: Curr Opin Struct Biol, v. 20, p. 180-6 (2010)) (hereby incorporated by reference in its entirety).
(38) A reasonably accurate model of the interface of a protein-protein complex is important for protein design experiments that aim to introduce novel function into a protein scaffold (Fleishman, S. J., et al., Computational design of proteins targeting the conserved stem region of influenza hemagglutinin: Science, v. 332, p. 816-21 (2011)) (hereby incorporated by reference in its entirety). In some cases, there has even been success using models of the proteins of interest for docking and subsequent protein design (Tharakaraman, K., et al., Redesign of a cross-reactive antibody to dengue virus with broad-spectrum activity and increased in vivo potency: Proc Natl Acad Sci USA, v. 110, p. E1555-64 (2013)) (hereby incorporated by reference in its entirety).
(39) In order to predict the structure of a macromolecular complex, using docking or other methods, a three-dimensional structure of each individual protein is required. In the absence of experimentally determined structures (e.g., by X-ray crystallography or NMR), a model of the protein must be generated. In general, models can be built using three methods: homology modeling, ab initio modeling and fold-recognition/threading methods (Petrey, D., and B. Honig, Protein structure prediction: inroads to biology: Mol Cell, v. 20, p. 811-9 (2005)) (hereby incorporated by reference in its entirety). Reliable models can be generated by homology modeling if the protein of interest has a homolog with an experimentally determined structure, where the homology is at least ˜30% sequence identity (over a significant alignment length) (Rost, B., Twilight zone of protein sequence alignments: Protein Eng, v. 12, p. 85-94 (1999)) (hereby incorporated by reference in its entirety). The homolog structure is used as a ‘template’ on which to build the model (Sali, A., and T. L. Blundell, Comparative protein modelling by satisfaction of spatial restraints: J Mol Biol, v. 234, p. 779-815 (1993); Sali, A., et al., Evaluation of comparative protein modeling by MODELLER: Proteins, v. 23, p. 318-26 (1995); Webb, B., and A. Sali, Comparative Protein Structure Modeling Using MODELLER: Curr Protoc Bioinformatics, v. 47, p. 5.6.1-5.6.32 (2014)) (each of which is hereby incorporated by reference in its entirety). This 30% identity ‘rule of thumb’ may be sufficient for reliably modeling the correct protein fold; however, insertions or deletions, or sequence variability within loop regions, complicate the modeling, and additional modeling approaches may be required. For proteins that do not have homologs with known 3D structures, or for regions of a protein with a high degree of variability relative to the template, methods such as ab initio modeling or fold recognition can be implemented (Petrey, D., and B. Honig, Protein structure prediction: inroads to biology: Mol Cell, v. 20, p. 811-9 (2005)) (hereby incorporated by reference in its entirety).
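The ˜30% identity rule of thumb can be made concrete with a short sketch that computes percent identity over the aligned (non-gap) columns of a pairwise alignment. The function names and the `min_aligned_length` default are hypothetical: the text requires only "a significant alignment length" and gives no number, and other conventions for the identity denominator exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, number of aligned columns) for two aligned sequences.

    Columns in which either sequence has a gap are ignored, which is one of
    several conventions for computing sequence identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0, 0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns), len(columns)


def usable_as_template(aligned_a, aligned_b, min_identity=30.0, min_aligned_length=80):
    """Apply the rule of thumb; min_aligned_length is an assumed placeholder value."""
    identity, length = percent_identity(aligned_a, aligned_b)
    return identity >= min_identity and length >= min_aligned_length
```

A candidate that passes this check may still require separate loop modeling, since identity over the whole alignment says little about variable loop regions.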
(40) Structural relationships between evolutionarily distant sequences, as identified by structure alignments and/or other computational tools, can be used as a method to predict function for proteins that lack functional annotation but have known structures (Goldsmith-Fischman, S., and B. Honig, Structural genomics: computational methods for structure analysis: Protein Sci, v. 12, p. 1813-21 (2003); Goldsmith-Fischman, S., et al., The SufE sulfur-acceptor protein contains a conserved core structure that mediates interdomain interactions in a variety of redox protein complexes: J Mol Biol, v. 344, p. 549-65 (2004)) (each of which is hereby incorporated by reference in its entirety). As an extension of this idea, the structure of the interface in a protein-protein complex (experimental or modeled by docking) may be used to identify and/or predict additional potential binders, by aligning regions of the protein comprising one side of the interface with a database of protein 3D structures, either by structural alignment of atoms or alignment of protein surfaces (Dey, F., et al., Toward a “structural BLAST”: using structural relationships to infer function: Protein Sci, v. 22, p. 359-66 (2013); Gao, M., and J. Skolnick, iAlign: a method for the structural comparison of protein-protein interfaces: Bioinformatics, v. 26, p. 2259-65 (2010); Pandit, S. B., and J. Skolnick, Fr-TM-align: a new protein structural alignment method based on fragment alignments and the TM-score: BMC Bioinformatics, v. 9, p. 531 (2008); Shulman-Peleg, A. et al., SiteEngines: recognition and comparison of binding sites and protein-protein interfaces: Nucleic Acids Res, v. 33, p. W337-41 (2005); Zhang, Q. C., et al., Structure-based prediction of protein-protein interactions on a genome-wide scale: Nature, v. 490, p. 556-60 (2012)) (each of which is hereby incorporated by reference in its entirety).
(41) Molecular Dynamics (MD) is a method that computationally simulates the movement of atoms and the resulting behavior of macromolecules in a biological system (Karplus, M., and J. A. McCammon, Molecular dynamics simulations of biomolecules: Nat Struct Biol, v. 9, p. 646-52 (2002)) (hereby incorporated by reference in its entirety). The physical properties of the interaction potentials between atoms are described by a force field, a set of functions approximating different properties of the atoms. The solvent properties of the biological system can be modeled explicitly (i.e., using 3D models of water molecules) or implicitly, using various solvent models (Feig, M. et al., Journal of Computational Chemistry 25 (2): 265-84 (2004)) (hereby incorporated by reference in its entirety). MD can be utilized to assess and evaluate models of proteins, protein-ligand complexes and protein-protein interfaces.
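The principle of propagating atomic positions under a force field can be illustrated with a toy one-dimensional example. The sketch below integrates a single particle in a harmonic potential using the velocity Verlet scheme, a standard MD integrator; it is not the algorithm of any particular MD package, and the harmonic term, units and parameter values are assumptions chosen for simplicity.

```python
def harmonic_force(x, k=1.0, x0=0.0):
    """Force from a harmonic term 0.5 * k * (x - x0)**2, as used for bonds."""
    return -k * (x - x0)


def velocity_verlet(x, v, mass=1.0, dt=0.01, steps=1000, force=harmonic_force):
    """Propagate one particle in 1D and return final x, final v and the trajectory."""
    f = force(x)
    trajectory = [x]
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append(x)
    return x, v, trajectory


def total_energy(x, v, mass=1.0, k=1.0, x0=0.0):
    """Kinetic plus potential energy for the harmonic toy system."""
    return 0.5 * mass * v * v + 0.5 * k * (x - x0) ** 2
```

Near-conservation of `total_energy` along the trajectory is a common sanity check on the time step; a real simulation sums many such force-field terms over all atoms and adds a solvent model.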
(42) In addition to physics-based approaches, machine learning methods can be implemented to analyze and predict components of protein-protein interfaces. Machine learning methods such as Support Vector Machines (SVMs) and Random Forests are general algorithms developed to ‘learn’ from example data represented as vectors (Breiman, L., Random forests: Machine Learning, v. 45, p. 5-32 (2001); Cortes, C., and V. Vapnik, Support-vector networks: Machine Learning, v. 20, p. 273-297 (1995)) (each of which is hereby incorporated by reference in its entirety). Machine learning approaches as well as statistics-based methods have been used to predict Ag-Ab interfaces (Sela-Culang, I., et al., Using a combined computational-experimental approach to predict antibody-specific B cell epitopes: Structure, v. 22, p. 646-57 (2014)) (hereby incorporated by reference in its entirety) and to suggest positions that may participate in Ag binding (Burkovitz, A., et al., Large-scale analysis of somatic hypermutations in antibodies reveals which structural regions, positions and amino acids are modified to improve affinity: FEBS J, v. 281, p. 306-19 (2014)) (hereby incorporated by reference in its entirety).
(43) The molecular mechanisms that underlie somatic hypermutations (SHMs) have been the focus of extensive research. The introduced mutations are predominantly point mutations and rarely base insertions or deletions (Zhao, S. et al., Mol Immunol 47, 694-700 (2010); Li, Z. et al., Genes Dev 18, 1-11 (2004)) (each of which is hereby incorporated by reference in its entirety) and are mediated by the activation-induced deaminase (AID) enzyme (Maul, R. W. et al., Adv Immunol 105, 159-191 (2010); Muramatsu, M. et al., J Biol Chem 274, 18470-18476 (1999)) (each of which is hereby incorporated by reference in its entirety). AID introduces diversity by converting cytosine to uracil, which activates error-prone DNA repair mechanisms (Maul, R. W. et al., Adv Immunol 105, 159-191 (2010); Pham, P. et al., Nature 424, 103-107 (2003); Peled, J. U. et al., Annu Rev Immunol 26, 481-511 (2008)) (each of which is hereby incorporated by reference in its entirety). Cytosines located within DNA motifs that are preferred binding targets of the AID enzyme are commonly referred to as hotspots (Dorner, T. et al., Eur J Immunol 28, 3384-3396 (1998)) (hereby incorporated by reference in its entirety). However, not all of the hotspots are targeted (Kinoshita, K. et al., Nat Rev Mol Cell Biol 2, 493-503 (2001)) (hereby incorporated by reference in its entirety), and many SHMs occur near hotspots but not within them (Clark, L. A. et al., J Immunol 177, 333-340 (2006)) (hereby incorporated by reference in its entirety). The assumption that AID plays an important role in the SHM process inspired attempts to utilize it in vitro, e.g., by coupling mammalian cell-surface display with AID-directed SHM (Bowers, P. M. et al., Proc Natl Acad Sci USA 108, 20455-20460 (2011)) (hereby incorporated by reference in its entirety), or by designing phage display libraries based on DNA hotspots (Chowdhury, P. S. et al., Nat Biotechnol 17, 568-572 (1999)) (hereby incorporated by reference in its entirety).
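By way of illustration, hotspot motifs can be located in a nucleotide sequence with a simple pattern scan. The paragraph above does not name a specific motif; the sketch below assumes the WRCY consensus (W = A/T, R = A/G, Y = C/T) and its reverse complement RGYW, which are widely used in the SHM literature. The function name and the output format are hypothetical.

```python
import re

# Lookahead patterns so that overlapping motifs are all reported.
# WRCY: the targeted cytosine is the third base of the motif.
WRCY = re.compile(r"(?=([AT][AG]C[CT]))")
# RGYW: reverse complement of WRCY; the targeted base is the G at the second position.
RGYW = re.compile(r"(?=([AG]G[CT][AT]))")


def find_hotspots(dna):
    """Return sorted (index of targeted base, motif class, matched 4-mer) tuples.

    Indices are 0-based positions in the input sequence.
    """
    dna = dna.upper()
    hits = []
    for match in WRCY.finditer(dna):
        hits.append((match.start() + 2, "WRCY", match.group(1)))
    for match in RGYW.finditer(dna):
        hits.append((match.start() + 1, "RGYW", match.group(1)))
    return sorted(hits)
```

For example, `find_hotspots("TTAGCTAA")` reports the palindromic AGCT 4-mer twice, once for each strand. As noted above, the presence of a motif does not imply that the position is actually mutated.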
(44) Studies that have attempted to characterize SHMs structurally mostly involved analyses of the crystal structures of one or a few pairs of germline and mature variants of a specific Ab in order to determine how structural factors affect affinity enhancement. In one such study, examination of the X-ray crystal structures of four anti-lysozyme Ab variants at various maturation stages revealed that binding is enhanced by burial of increasing amounts of apolar surface area and by improving shape complementarity (Li, Y. et al., Nat Struct Biol 10, 482-488 (2003)) (hereby incorporated by reference in its entirety). However, analysis of another set of Abs found that the mature Ab does not have better shape complementarity to the Ag than its germline variant, but exhibits a small improvement in shape complementarity between the variable light (VL) chain and the variable heavy (VH) chain, and has a higher electrostatic contribution to Ag binding than that of the germline Ab (Midelfort, K. S. et al., J Mol Biol 343, 685-701 (2004)) (hereby incorporated by reference in its entirety). The X-ray structure of an anti-hapten Ab and its corresponding germline Ab suggested that, in this case, the increased affinity is achieved mainly by electrostatic optimization (Chong, L. T. et al., Proc Natl Acad Sci USA 96, 14330-14335 (1999)) (hereby incorporated by reference in its entirety). Several studies used molecular dynamics simulations of a handful of mature Abs (Wong, S. E. et al., Proteins 79, 821-829 (2011)) (herein incorporated by reference in its entirety), or of a specific Ab lineage (Schmidt, A. G. et al., Proc Natl Acad Sci USA 110, 264-269 (2013); Thorpe, I. F. et al., Proc Natl Acad Sci USA 104, 8821-8826 (2007)) (each of which is herein incorporated by reference in its entirety), and reported that rigidification of the paratope leads to a reduction in the entropic cost of the interaction.
(45) The studies that have examined whether SHMs are focused on residues involved in Ag binding reached contradictory conclusions. Clark et al. identified SHMs in over 11,000 Ab sequences (Clark, L. A. et al., J Immunol 177, 333-340 (2006)) (herein incorporated by reference in its entirety). They reported that Ag-contacting positions are mutated three times more often than core residues. However, in this analysis, interface positions in the Ab sequence were defined as Ab positions that are within 12 Å of an Ag atom in any PDB structure, a definition that covers mostly residues that do not physically interact with the Ag. SHMs and hotspots were reported to be over-represented in the complementarity-determining regions (CDRs) (Clark, L. A. et al., J Immunol 177, 333-340 (2006); Dorner, T. et al., J Immunol 158, 2779-2789 (1997)). However, while CDRs cover ˜80% of the Ag-binding residues, 50-60% of the residues in the CDRs do not contact the Ag (Kunik, V. et al., PLoS Comput Biol 8, e1002388 (2012)) (herein incorporated by reference in its entirety). Several studies indicated that SHMs mostly occur in the periphery of the germline Ag-binding site and not in its center (Tomlinson, I. M. et al., J Mol Biol 256, 813-817 (1996); Thom, G. et al., Proc Natl Acad Sci USA 103, 7619-7624 (2006)) (hereby incorporated by reference in their entirety), and that SHMs do not show a clear preference toward residues that are in contact with the Ag (Ramirez-Benitez, M. C. et al., Proteins 45, 199-206 (2001); Raghunathan, T. et al., J Mol Recog 25, 103-113 (2012)) (hereby incorporated by reference in their entirety). It has even been suggested that mutations in the interface may be disfavored because they disrupt the Ab-Ag interaction (Ramirez-Benitez, M. C. et al., Proteins 45, 199-206 (2001); Persson, J. et al., Tumour Biol 30, 221-231 (2009)) (hereby incorporated by reference in their entirety).
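The sensitivity of an interface definition to the chosen distance cutoff, as discussed above for the 12 Å definition, can be illustrated with a minimal sketch. The data layout (a mapping from antibody position to atom coordinates) and the function name are assumptions; coordinates are taken to be in Å.

```python
import math


def interface_positions(antibody_atoms, antigen_atoms, cutoff):
    """Return antibody positions having any atom within `cutoff` of an antigen atom.

    antibody_atoms -- dict mapping a position label to a list of (x, y, z) tuples
    antigen_atoms  -- list of (x, y, z) tuples
    cutoff         -- distance threshold in the same units as the coordinates
    """
    positions = set()
    for position, atoms in antibody_atoms.items():
        if any(math.dist(a, g) <= cutoff for a in atoms for g in antigen_atoms):
            positions.add(position)
    return positions
```

With a short cutoff (for example 4 Å, an assumed value typical of direct-contact definitions) only positions adjacent to the antigen are returned, whereas a 12 Å cutoff also returns positions several residues away from any physical contact.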
(46) In one embodiment, the steps of the process of the present invention correspond to the iterative process described in
(47) Modeling:
(48) In one embodiment of the invention, a model of the antigen of interest in the receptor-bound conformation is generated (e.g., using tools for homology structural modeling such as MODELLER (Fiser, A., et al., Modeling of loops in protein structures: Protein Sci, v. 9, p. 1753-73 (2000); Marti-Renom, M. A., et al., Comparative protein structure modeling of genes and genomes: Annu Rev Biophys Biomol Struct, v. 29, p. 291-325 (2000); Sali, A., and T. L. Blundell, Comparative protein modelling by satisfaction of spatial restraints: J Mol Biol, v. 234, p. 779-815 (1993)) (each of which is hereby incorporated by reference in its entirety) as implemented in the Discovery Studio suite (Accelrys et al., 2013 (hereby incorporated by reference in its entirety)), or any other structure prediction tool). When an experimentally determined structure is available (e.g., in the PDB (Berman, H. M., et al., The Protein Data Bank: Acta Crystallogr D Biol Crystallogr, v. 58, p. 899-907 (2002)) (hereby incorporated by reference in its entirety)), it can be used as well. The model may be further refined by energy minimization (e.g., using CHARMM as implemented in the Discovery Studio suite (Brooks, B. R., et al., CHARMM: the biomolecular simulation program: J Comput Chem, v. 30, p. 1545-614 (2009)) (hereby incorporated by reference in its entirety), or any other minimization software), and in some cases by molecular dynamics (MD) simulations (e.g., using GROMACS (Hess, B., et al., GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation: Journal of Chemical Theory and Computation, v. 4, p. 435-447 (2008)) (hereby incorporated by reference in its entirety) or other MD software tools).
(49) When it is impossible to reliably model the entire protein, a structural model of the desired epitope alone may be used. This model can be generated using, for example, homology modeling (as described above) or de-novo prediction of the structural determinant.
(50) Docking:
(51) In one embodiment of the present invention, the model (or experimental structure) is then docked against a database of antibody three-dimensional structures, using, for example, ZDOCK (Chen, R., et al., ZDOCK: an initial-stage protein-docking algorithm: Proteins, v. 52, p. 80-7 (2003); Pierce, B., and Z. Weng, ZRANK: reranking protein docking predictions with an optimized energy function: Proteins, v. 67, p. 1078-86 (2007); Vreven, T., et al., Performance of ZDOCK in CAPRI rounds 20-26: Proteins (2013)) (each of which is herein incorporated by reference in its entirety) as implemented in Discovery Studio (Accelrys Software Inc., Discovery Studio Modeling Environment, Release 4.0, San Diego (2013); Marcatili, P., et al., The association of heavy and light chain variable domains in antibodies: implications for antigen specificity: Febs Journal, v. 278, p. 2858-2866 (2011)) (each of which is hereby incorporated by reference in its entirety), and/or additional docking algorithms, e.g., Hex (Ritchie, D. W., and V. Venkatraman, Ultra-fast FFT protein docking on graphics processors: Bioinformatics, v. 26, p. 2398-405 (2010)) (hereby incorporated by reference in its entirety) or MEGADOCK (Ohue, M., et al., MEGADOCK: an all-to-all protein-protein interaction prediction system using tertiary structure data: Protein Pept Lett, v. 21, p. 766-78 (2014)) (herein incorporated by reference in its entirety). Biological and structural data for the antigen and antibody may be used to focus the docking or to eliminate unlikely poses (e.g., poses in which the contacts with the antigen are made by residues in the constant region) so that the epitope of interest and the CDRs are in the docked interface. This screening of poses may rely on the following considerations:
(52) 1. Determining whether the contacting residues in the pose involve CDR positions that are likely to be in contact with the antigen. This can be based on biophysical assessment and on statistical assessment of the propensities of contacts in each position in all known antibodies, as described in (Kunik, V., and Y. Ofran, The indistinguishability of epitopes from protein surface is explained by the distinct binding preferences of each of the six antigen-binding loops: Protein Eng Des Sel. (2013); Kunik, V., et al., Structural consensus among antibodies defines the antigen binding site: PLoS Comput Biol, v. 8, p. e1002388 (2012b)) (each of which is hereby incorporated by reference in its entirety). Identification of the antigen binding residues can be based on the process described in (Kunik, V., et al. Paratome: an online tool for systematic identification of antigen-binding regions in antibodies based on sequence or structure, Nucleic Acids Res, v. 40: England, p. W521-4 (2012a)) (hereby incorporated by reference in its entirety), or on other methods for identifying CDRs (e.g. Chothia, C., and A. M. Lesk, Canonical structures for the hypervariable regions of immunoglobulins: J Mol Biol, v. 196, p. 901-17 (1987); Giudicelli, V., et al., IMGT/GENE-DB: a comprehensive database for human and mouse immunoglobulin and T cell receptor genes: Nucleic Acids Res, v. 33, p. D256-61 (2005); Kabat, E. A., et al., Sequences of proteins of immunological interest, National Institutes of Health, Bethesda (1983); Lefranc, M. P., et al., IMGT/3Dstructure-DB and IMGT/DomainGapAlign: a database and a tool for immunoglobulins or antibodies, T cell receptors, MHC, IgSF and MhcSF: Nucleic Acids Research, v. 38, p. D301-D307 (2010); Lefranc, M. P., et al. IMGT/3Dstructure-DB and IMGT/StructuralQuery, a database and a tool for immunoglobulin, T cell receptor and MHC structural data: Nucleic Acids Research, v. 32, p. D208-D210 (2004); Lefranc, M. P., et al. IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains: Dev Comp Immunol, v. 27, p. 55-77 (2003); Morea, V., et al. Antibody modeling: implications for engineering and design: Methods, v. 20, p. 267-79 (2000)) (hereby incorporated by reference in their entirety) or antigen binding residues (Krawczyk, K., et al., Antibody i-Patch prediction of the antibody binding site improves rigid local antibody-antigen docking: Protein Eng Des Sel, v. 26, p. 621-9 (2013); Krawczyk, K., et al., Improving B-cell epitope prediction and its application to global antibody-antigen docking: Bioinformatics, v. 30, p. 2288-94 (2014); Olimpieri, P. P., et al. Prediction of site-specific interactions in antibody-antigen complexes: the proABC method and server: Bioinformatics, v. 29, p. 2285-91 (2013); Tramontano, A., et al. Framework residue-71 is a major determinant of the position and conformation of the 2nd hypervariable region in the VH domains of immunoglobulins: Journal of Molecular Biology, v. 215, p. 175-182 (1990)) (each of which is hereby incorporated by reference in its entirety).
(53) 2. Removing poses in which the epitope does not overlap with the preselected epitope.
(54) 3. Selecting poses that, based on structure-function analysis, are likely to result in desired biological activity.
(55) In one embodiment, the resulting docking poses are then filtered in order to identify poses that have “native-like” properties, such as shape and/or biophysical feature complementarity. Additional scores are learned from known antibody-antigen complexes. The following filters may be implemented:
(56) A. Docking ranking: Top X ranking by various docking scoring functions.
(57) B. Docking consensus: For each docked antibody-antigen complex, poses that pass filter A are compared between at least two different docking algorithms, and those that are generated by more than one algorithm (based on agreement in RMSD of the antibody CDRs) are selected for further analysis.
(58) C. Knowledge-based features of known antibody-antigen complexes: Machine learning is used to evaluate the complexes that have passed filter B. For example, the inventors developed two different types of machine-learning classifiers, based on an approach similar to the one described in Sela-Culang, I. et al., Structure 22:646-657 (2014) (herein incorporated by reference in its entirety).
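The consensus test of filter B can be illustrated with a minimal sketch. It assumes that each pose is represented by the coordinates of its antibody CDR atoms in a common antigen reference frame, and that two poses agree when the RMSD between them is at or below a cutoff. The function names and the 5 Å default cutoff are hypothetical and are not taken from the present description.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def consensus_poses(poses_a, poses_b, cutoff=5.0):
    """Return index pairs (i, j) where pose i of algorithm A agrees with pose j of B.

    The cutoff (in the same units as the coordinates) is a hypothetical default.
    """
    return [
        (i, j)
        for i, pose_a in enumerate(poses_a)
        for j, pose_b in enumerate(poses_b)
        if rmsd(pose_a, pose_b) <= cutoff
    ]
```

Poses of one algorithm that appear in no returned pair would be discarded before the machine-learning evaluation of filter C.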
(59) First Type of Classifiers:
(60) The present inventors assembled a training set of antibody-antigen complexes of known structure. In each complex the present inventors identified the ABR/CDR residues on the antibody that contact the antigen, and the residues on the antigen that contact the antibody. Each antigen residue was described in terms of its secondary structure (predicted or experimentally determined), evolutionary conservation, solvent accessibility, the identity, secondary structure and conservation of each of its neighbors (the inventors used a sequence window of 3 to 7 residues on each side but other window sizes may be used as well). The antibody residues were described in varied windows in terms of residue type, solvent accessibility, the position of residue within the CDR, the type of the CDR, and whether it is a germ-line residue or mutated (SHM). In addition, we built a knowledge-based potential for contacts between antibody residues and antigen residues. These potentials quantify the propensity (e.g. in terms of log likelihood) for a contact between a certain type of residue on the antibody and a certain type of residue on the antigen. That is, it assesses whether a certain type of residue-residue contact between antibody and antigen occurs more or less than expected by chance. This allowed the inventors to determine whether this particular contact is favored or disfavored in antibody-antigen interfaces. The inventors also built a more detailed set of such potentials for each CDR separately. This allows us to give a positive or negative score for each contact on each CDR. When possible (e.g. when the amount of experimental data permits), the inventors also built additional sets of potentials for specific structural positions on each CDR. This was done by aligning multiple CDRs that are similar to each other and then assessing the propensities of each of the 20×20 possible contacts between residue on the antibody position and residues on the antigen.
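A knowledge-based contact potential of the kind described above can be sketched as a log-odds score. The sketch below assumes that the expected frequency of a residue-residue contact is the product of the marginal frequencies of the two residue types among all observed contacts; this null model, the function name and the input format are illustrative assumptions, not the inventors' exact procedure.

```python
import math
from collections import Counter


def contact_potentials(contacts):
    """Log-odds score for each observed (antibody residue, antigen residue) pair.

    contacts: list of (antibody_residue_type, antigen_residue_type) tuples.
    A positive score means the pair occurs more often than expected by chance
    under the assumed independence null model; a negative score means less often.
    """
    total = len(contacts)
    pair_counts = Counter(contacts)
    antibody_counts = Counter(ab for ab, _ in contacts)
    antigen_counts = Counter(ag for _, ag in contacts)
    potentials = {}
    for (ab, ag), count in pair_counts.items():
        observed = count / total
        expected = (antibody_counts[ab] / total) * (antigen_counts[ag] / total)
        potentials[(ab, ag)] = math.log(observed / expected)
    return potentials
```

A per-CDR set of potentials could be obtained by calling the same function on the contacts of each CDR separately.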
(61) The input vector for the supervised machine-learning algorithm (Random Forest and SVMs were used, but other machine learning algorithms can be used as well) consisted of a vector that describes a residue position on the antibody, a vector that describes a residue position on the antigen, and the contact potential for this pair. The positive training set was the observed contacts, and the negative set was random pairing of ABR antibody and antigen surface residues. A 3-fold cross-validation was used. The classifier distinguished well between real and decoy antigenic contacts.
(62) Antibody-antigen complexes can be examined by analyzing the classifiers' predictions on the interface residue pairs, for example by taking the geometric or the arithmetic mean of the prediction scores on all, or on a subset, of the residue pairs in the interface in question.
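As a minimal sketch of this aggregation, assuming each residue pair in the interface has already received a classifier score in the range (0, 1], the two means can be computed as follows. The function name and the `method` parameter are hypothetical.

```python
import statistics


def interface_score(pair_scores, method="arithmetic"):
    """Aggregate per-residue-pair classifier scores into one interface score.

    pair_scores: iterable of positive scores, one per interface residue pair.
    method: "arithmetic" or "geometric" (hypothetical parameter name).
    """
    scores = list(pair_scores)
    if method == "arithmetic":
        return statistics.fmean(scores)
    if method == "geometric":
        # The geometric mean penalizes interfaces that contain a few very low scores.
        return statistics.geometric_mean(scores)
    raise ValueError("method must be 'arithmetic' or 'geometric'")
```

The geometric mean is the stricter of the two, since a single near-zero pair score pulls the interface score down sharply.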
(63) A Second Type of Classifier:
(64) The present inventors assembled a positive training set of antibody-antigen interfaces collected from experimentally determined 3D structures. A negative set was assembled from docking structures of antibodies to proteins, under the assumption that in the vast majority of cases a random antibody will not bind a random antigen and thus these interfaces represent false interfaces. The inventors filtered these negative interfaces, as described above, to retain only native-like complexes. Then, each interface was described using the following features: the number of contacts; the fraction of contacts that are germ-line and the fraction that are SHMs; the number of specific contacts, H-bonds, aromatic interactions, etc.; a score for the curvature of the surface; an assessment of shape complementarity; an assessment of charge complementarity; the area of the interface; the relative area of the interface on the antigen; and the reduction in solvent accessible area for the antibody and for the theoretical paratope (as calculated by canonic CDRs or by Paratome (Kunik, V., S. Ashkenazi, and Y. Ofran, Paratome: an online tool for systematic identification of antigen-binding regions in antibodies based on sequence or structure, Nucleic Acids Res, v. 40: England, p. W521-4 (2012a)) (herein incorporated by reference in its entirety)). Other biophysical and structural descriptions of the interface may be used as well (e.g. conservation). The inventors also recorded the potentials for all contacts, as described above. In addition, docking was run for the positive set, and the docking scores of all docked poses were recorded. The inventors added to the vector that represented each interface features that described the distribution of docking scores. This is motivated by the observation that the distribution of docking scores of the different poses of a given antibody-antigen pair differs dramatically between pairs that are known to bind each other and pairs that are not known to bind each other (and that are assumed not to). These features include the distance (in terms of standard deviations) of the extreme values from the mean and/or the median, the standard deviation itself, the distance between the mean and the median, and quintile characteristics. The inventors then used a Random Forest and an SVM to distinguish between real interfaces and decoys. A 10-fold cross-validation showed that this classifier distinguishes well between real and false interfaces.
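The docking-score distribution features can be sketched as follows. This is an illustrative computation under the assumption that the scores of all poses for one antibody-antigen pair are available as a list of numbers; the feature names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics


def docking_score_features(scores):
    """Summarize the distribution of docking scores for one antibody-antigen pair."""
    mean = statistics.fmean(scores)
    median = statistics.median(scores)
    std = statistics.stdev(scores)  # sample standard deviation (assumed choice)

    def z(value):
        # Distance from the mean in units of standard deviations.
        return 0.0 if std == 0 else (value - mean) / std

    return {
        "std": std,
        "max_z": z(max(scores)),
        "min_z": z(min(scores)),
        "mean_median_gap": mean - median,
        "quintiles": statistics.quantiles(scores, n=5),  # four cut points
    }
```

A pair with one pose scoring far above the rest yields a large `max_z`, which is the kind of signal the description attributes to true binders.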
(65) In addition to identifying “native-like” complexes based on results of protein-protein docking methods, the antibody-antigen complex may be modeled based on information obtained from structural analyses of protein-protein interfaces. Structures of either the antibodies or the antigen, or even only the epitope, may be screened against a database of 3D structures of protein complexes, in the form of local structure alignments, to identify protein-protein interfaces in which one partner shares structural features with the query protein. Superposition of the query (antibody or antigen/epitope) on the structurally similar protein-protein complex may suggest a model of the antibody-antigen complex, which can subsequently be analyzed using binding free energy calculations, e.g. using the energy calculation tools in Discovery Studio (Accelrys Software Inc., 2013, Discovery Studio Modeling Environment, Release 4.0, San Diego, Accelrys Software Inc.) (hereby incorporated by reference in its entirety), or similar tools such as FoldX (Schymkowitz, J., et al. The FoldX web server: an online force field: Nucleic Acids Research, v. 33, p. W382-8 (2005)) (hereby incorporated by reference in its entirety), Rosetta (Kuhlman, B., et al. Design of a novel globular protein fold with atomic-level accuracy: Science, v. 302, p. 1364-8 (2003); Kunik, V., et al., Paratome: an online tool for systematic identification of antigen-binding regions in antibodies based on sequence or structure, Nucleic Acids Res, v. 40: England, p. W521-4 (2012a); Liu, Y., and B. Kuhlman, RosettaDesign server for protein design: Nucleic Acids Res, v. 34, p. W235-8 (2006)) (hereby incorporated by reference in their entirety) or other computational tools. It is also possible to use the machine-learning analysis described above. This methodology can also be implemented as a filter to analyze the models resulting from protein-protein docking. 
In addition, antibody-antigen interfaces arising from protein-protein docking can be structurally compared, using these methods, with known protein-protein interfaces to identify interactions that may introduce specificity.
(66) Docking models that pass the filters and represent potential complexes with the template antibody may be subjected to energetic refinement (for example, minimization and side chain refinement implemented in Discovery Studio or similar methods) prior to further analyses, and MD simulations may be used to assess their stability.
(67) The process of pose selection described above enables the selection of a docked model with the antibody structure to be used as a template for library design.
(68) Libraries:
(69) In one embodiment of the present invention, positions within the CDRs of the template antibody or antibodies are selected for the introduction of variability for library design. For each antibody template, the CDRs are identified using, for example, Paratome (Kunik, V., et al. Paratome: an online tool for systematic identification of antigen-binding regions in antibodies based on sequence or structure, Nucleic Acids Res, v. 40: England, p. W521-4 (2012a)) (hereby incorporated by reference in its entirety) or other tools for CDR identification. Based on the docked model of the antibody-antigen complex, residues within the CDRs that are in the interface with the antigen in the model are selected as potential candidates for mutational variability. Sequence analysis (using Blast or similar program) and, in some cases, structure based sequence alignments (North, B. et al., J. Mol. Biol. 406:228-256 (2011) (herein incorporated by reference in its entirety) are used to analyze these positions to determine whether they are likely to tolerate variability (based on how often variability is observed in related sequences). In addition, bioinformatic analyses of SHM data such as the data available in the analysis in (Burkovitz, A., I. Sela-Culang, and Y. Ofran, 2014, Large-scale analysis of somatic hypermutations in antibodies reveals which structural regions, positions and amino acids are modified to improve affinity: FEBS J, v. 281, p. 306-19) (hereby incorporated by reference in its entirety), may be used to evaluate the variability of these positions as well as their potential structural and functional relevance. Thus, the SHM data can be used to select both the positions and the variations. As seen in
(70) Variation at each selected CDR position can be determined using physical-chemical considerations, knowledge-based approaches, and the SHM data described above. In one embodiment, the residue positions are mutated in silico to other amino acids, either in the context of the docked model or the structure of the free antibody, in order to calculate the effect of the mutation on the binding free energy and the folding energy (stability), respectively, using, for example, the Mutation Energy protocols implemented in Discovery Studio (Accelrys Software Inc., 2013, Discovery Studio Modeling Environment, Release 4.0, San Diego, Accelrys Software Inc.) (hereby incorporated by reference in its entirety), or similar algorithms (e.g. FoldX (Schymkowitz, J., J. Borg, F. Stricher, R. Nys, F. Rousseau, and L. Serrano, 2005, The FoldX web server: an online force field: Nucleic Acids Research, v. 33, p. W382-8) (hereby incorporated by reference in its entirety), Rosetta, or algorithms available in the Schrödinger suite (Kuhlman, B., G. Dantas, G. C. Ireton, G. Varani, B. L. Stoddard, and D. Baker, 2003, Design of a novel globular protein fold with atomic-level accuracy: Science, v. 302, p. 1364-8; Liu, Y., and B. Kuhlman, 2006, RosettaDesign server for protein design: Nucleic Acids Res, v. 34, p. W235-8; Schrödinger Release 2014-3, 2014, MacroModel, version 10.5, Schrödinger, LLC, New York, N.Y.; Schymkowitz, J., J. Borg, F. Stricher, R. Nys, F. Rousseau, and L. Serrano, 2005, The FoldX web server: an online force field: Nucleic Acids Research, v. 33, p. W382-8) (each of which is herein incorporated by reference in its entirety)). Sequence analysis and structure based sequence alignments are used to analyze the CDR positions when considering the resulting in silico mutations to determine their likelihood. 
3D models of the mutated antibodies in complex with the antigen may be analyzed by machine learning to identify favorable mutations and may be subjected to molecular dynamics simulations to assess the stability of the mutant antibody-antigen docked pose. Interfaces of known binders of the antigen can also be used as a guide for the variability. A genetic algorithm or another search/optimization algorithm may be applied to the classifiers to suggest positions and mutations for the library.
EXAMPLES
Example 1
(71) Identification of Substitutable Paratope Residues and Potential Substitutions
(72) This experiment sought to determine the principles that guide in vivo Ab affinity maturation. In particular, we attempted to identify factors that determine which residues are removed and which new ones are introduced during the SHM process. Given the controversies regarding the tendency of the paratope to undergo SHM, we sought to determine whether different structural parts of the Abs have different tendencies for substitutions. To this end, we analyzed 3495 SHMs in 196 structurally characterized Ab-Ag complexes, and examined (a) the role of AID hotspots in directing mutations, (b) the selective pressure for substitutions in different structural regions of the Ab, and (c) the predicted energetic effect of each substitution. It was found that AID motifs have no effect on selection of mutated residues, but the energetic contribution to Ag binding appears to have a major effect. Finally, a map was generated of the preferred substitutions in each region of the Ab. These results contribute to understanding of the principles that govern the SHM process, and may guide the design and engineering of high-affinity Abs.
(73) Using the data regarding preferred substitutions, we identified residues in the template sequence to be modified. Template variants were created by substituting these residues with variant residues indicated by the SHM analysis. In this manner, a library of template variants was formed for subsequent screening.
Example 2
Materials and Methods
(74) A. Ab-Ag Complex Dataset Construction
(75) 3D structure files of 752 Ab-Ag complexes were downloaded from IMGT/3Dstructure-DB (version 4.5.0) (Ehrenmann, F. et al., Nucleic Acids Res 38, D301-D307 (2010); Kaas, Q. et al., Nucleic Acids Res 32, D208-D210 (2004)) (each of which is herein incorporated by reference in its entirety). Complexes with Abs from human (154 structures) or mouse and chimeric Abs (492 structures) were retained. Abs from mouse and chimeric Abs were grouped as mouse Abs. To identify the light and heavy chains in each complex, we clustered human sequences into two clusters and murine sequences into two clusters, each corresponding to either heavy or light, using BlastClust (Dondoshansky, I. et al., BLASTclust (NCBI Software Development Toolkit). National Center for Biotechnology Information, Bethesda, Md. (2002)) (herein incorporated by reference in its entirety). Complexes that included only one chain and light chain dimers were removed. For redundancy removal, VH and VL sequences of each Ab were concatenated, and BlastClust was used with sequence identity of 97% and coverage of 95%. The Ab-Ag complex that was not engineered or mutated was selected as the representative of each cluster. In cases where there was more than one non-engineered complex, the one with the longest Ag and the best resolution was used. We identified Ags that are proteins or peptides. All other Ags were removed. One complex (PDB ID 1IGC) in which the sole non-Ab chain was protein G was also excluded from the analysis. In cases where the closest gene in IMGT did not agree with the annotated species, we reviewed the relevant literature, which led to exclusion of 12 complexes from the analysis: six of these cases were humanized Abs, five of them came from non-naive synthetic libraries and one came from rabbit. Overall, the dataset contained 196 non-redundant Ab-Ag complexes.
(76) B. Identification of Germline Precursors and SHMs
(77) Sequence alignment was used to identify the related germline gene precursors and identify SHMs. Only variable regions were analyzed. Human and mouse sequences were submitted separately. Default parameters were used. The CDRH3 and CDRL3 alignments were manually reviewed and corrected accordingly. Similar results were obtained when the analysis was repeated after removing junction positions (positions 106-116 for the VH domain and positions 115 and 116 for the VL domain).
(78) C. Definition of SHM Contacting Residues, Germline Contacting Residues and Protein-Protein Interfaces
(79) For each complex structure in the protein-protein dataset (fully described previously in Kunik, V. et al., Protein Eng Des Sel 26:599-609 (2013)) (herein incorporated by reference in its entirety), the interface of a given chain included all residues in that chain for which at least one of their heavy atoms is within a distance of 6 Å from any of the other chains (Ofran, Y., “Prediction of protein interaction sites” In C
(80) D. Energy Calculation
(81) We performed a computational alanine scan for all contacting residues in the Ab, and assessed the effect of this mutation on Ag binding. To assess SHMs, we mutated each introduced residue back to its germline residue. ΔΔG values were calculated using FoldX. (Schymkowitz, J. et al., Nucleic Acids Res 33, W382-W388 (2005); Guerois, R. et al., J Mol Biol 320: 369-387 (2002)) (each of which is herein incorporated by reference in its entirety). The following steps were performed in both cases, as they differ from each other only in the mutation target (alanine or the corresponding germline residue). First, PDB structures were optimized using the FoldX RepairPDB function. Then each mutation was performed separately using the BuildModel function. This resulted in generation of mutants and their corresponding wild-type structure models. The heavy chain and the light chain of the Ab were grouped together to calculate the energy values of the assembled Ab, and the AnalyzeComplex function was used to calculate the binding ΔG of each model. The ΔΔG value for each mutant was then calculated by subtracting the wild-type ΔG value from the mutant ΔG value.
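The final subtraction step can be written out as a short sketch. The binding ΔG values would come from the energy calculation described above; the numbers used in any call are hypothetical, and the helper that tallies the sign of ΔΔG is an illustrative addition rather than part of the described procedure.

```python
def ddg(dg_mutant, dg_wild_type):
    """Binding ΔΔG of a mutation: mutant ΔG minus wild-type ΔG.

    A positive value means the mutation is predicted to weaken binding.
    """
    return dg_mutant - dg_wild_type


def ddg_sign_fractions(ddg_values):
    """Fraction of mutations with zero, positive and negative ΔΔG."""
    total = len(ddg_values)
    counts = {"zero": 0, "positive": 0, "negative": 0}
    for value in ddg_values:
        if value > 0:
            counts["positive"] += 1
        elif value < 0:
            counts["negative"] += 1
        else:
            counts["zero"] += 1
    return {key: count / total for key, count in counts.items()}
```

For a reversion to the germline residue, a positive ΔΔG indicates that the SHM-introduced residue is predicted to contribute favorably to Ag binding.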
(82) E. Ab Structural Division Into Non-Overlapping Structural Regions
(83) Contact between two residues was defined as at least two heavy atoms (one from each residue) within a distance of 6 Å. The region “Ag interface” comprises all residues that contact the Ag but do not contact residues from the other Ab chain. The region “VH-VL interface” comprises all residues that contact the other Ab chains but not the Ag. The region “both interfaces” comprises Ab residues that contact both the Ag and the other Ab chain. The ABRs were identified using Paratome. (Kunik, V. et al., Nucleic Acids Res 40, W521-W524 (2012)) (herein incorporated by reference in its entirety). Residues in the ABR regions that do not contact the Ag or the other Ab chain were grouped as “ABRs not in interfaces”.
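The division into non-overlapping regions can be sketched as a simple decision rule. The sketch assumes that heavy-atom coordinates are available as (x, y, z) tuples in Å, that “within 6 Å” includes exactly 6 Å, and that residues falling into none of the four named regions are labeled “other residues”. Function and parameter names are hypothetical.

```python
import math

CONTACT_CUTOFF = 6.0  # Å, as stated in the text


def in_contact(atoms_a, atoms_b, cutoff=CONTACT_CUTOFF):
    """True if any heavy atom of one residue is within the cutoff of any heavy atom of the other."""
    return any(math.dist(a, b) <= cutoff for a in atoms_a for b in atoms_b)


def structural_region(contacts_ag, contacts_other_chain, in_abr):
    """Assign an Ab residue to exactly one structural region.

    contacts_ag: residue contacts the Ag.
    contacts_other_chain: residue contacts the other Ab chain.
    in_abr: residue lies within an ABR.
    """
    if contacts_ag and contacts_other_chain:
        return "both interfaces"
    if contacts_ag:
        return "Ag interface"
    if contacts_other_chain:
        return "VH-VL interface"
    if in_abr:
        return "ABRs not in interfaces"
    return "other residues"
```

Because the tests are applied in a fixed order, every residue receives exactly one label, which is what makes the regions non-overlapping.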
(84) F. Amino Acids Within DNA Hotspot Motifs
(85) The DNA hotspot motifs were RGYW or WRCY (Dörner, T. et al., Eur J Immunol 28, 3384-3396 (1998)) (herein incorporated by reference in its entirety), where R indicates a purine base, Y indicates a pyrimidine base, and W indicates an A or T base. For each amino acid, the proportion within hotspot motifs is the number of occasions the amino acid appeared within a hotspot motif out of the total appearances of the same amino acid in the germline sequences (V and J segments only) for all Abs in the dataset.
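Locating the two motifs in a germline DNA sequence can be sketched with a regular expression, using the base classes defined above (R = A or G, Y = C or T, W = A or T). The function name is hypothetical; a lookahead is used so that overlapping motif occurrences are all reported.

```python
import re

# RGYW -> [AG]G[CT][AT]; WRCY -> [AT][AG]C[CT]. Lookahead allows overlapping hits.
HOTSPOT_PATTERN = re.compile(r"(?=([AG]G[CT][AT]|[AT][AG]C[CT]))")


def hotspot_positions(dna):
    """Return the set of 0-based base indices covered by any RGYW or WRCY motif."""
    covered = set()
    for match in HOTSPOT_PATTERN.finditer(dna.upper()):
        start = match.start()
        covered.update(range(start, start + 4))
    return covered
```

For example, the sequence AGCT satisfies both RGYW and WRCY, reflecting that the two motifs are reverse complements of each other.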
(86) G. Distance from the Nearest Hotspot Motif
(87) For each amino acid or mutation up to position 105 (according to IMGT numbering) in the V region, the distance from the nearest hotspot motif (RGYW or WRCY) was calculated as described previously (Clark, L. A. et al., J. Immunol. 177: 333-340 (2006)) (herein incorporated by reference in its entirety). Briefly, the distance was defined as the number of bases between the middle nucleotide of the codon and the nearest base of a hotspot motif. A distance of zero indicates that the middle nucleotide of the codon is inside a hotspot motif. Since the motifs have four positions, the center nucleotide of a codon is four times more likely to fall somewhere within the motif than to fall at any other specific distance from it. Therefore, the observed number of cases with a distance of zero was divided by four before calculation of distributions. Amino acids or mutations that had two hotspots at exactly the same distance were counted twice for that distance (with opposite signs).
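The distance calculation can be sketched as follows. The sign convention (positive when the motif lies downstream of the codon, negative when upstream) and the convention that a base immediately adjacent to a motif has distance 1 are assumptions made for illustration; the function names are hypothetical.

```python
from collections import Counter


def nearest_hotspot_distances(middle_base, motif_starts, motif_length=4):
    """Signed distance(s) from a codon's middle base to the nearest hotspot motif.

    Returns [0] if the base lies inside a motif, one value if a single motif is
    nearest, and two values of opposite sign if two motifs are equally near.
    """
    signed = []
    for start in motif_starts:
        end = start + motif_length - 1
        if start <= middle_base <= end:
            signed.append(0)
        elif middle_base < start:
            signed.append(start - middle_base)   # motif downstream: positive
        else:
            signed.append(end - middle_base)     # motif upstream: negative
    if not signed:
        return []
    nearest = min(abs(d) for d in signed)
    return sorted({d for d in signed if abs(d) == nearest})


def weighted_distance_histogram(distances):
    """Count distances, dividing the zero-distance count by four as described."""
    histogram = dict(Counter(distances))
    if 0 in histogram:
        histogram[0] = histogram[0] / 4
    return histogram
```

The division by four corrects for the fact that a four-base motif offers four positions at which the distance is zero, compared with one position for any other distance.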
(88) H. Amino Acid Propensity for Mutation
(89) The 196 Ab-Ag complexes were divided into three random subsets. The propensity of each amino acid to be mutated in each subset was calculated as:
(90)
(91) where AA1_gl,region → X_mature,region is the number of changes from amino acid AA1 in the germ-line to any amino acid in the structural region,
(92)
is the frequency of amino acid AA1 in the germ-line sequences of the structural region, and “mutations in the region” is the number of mutations in the structural region. Priors of 1 were added. Propensity values from each of the random subsets were averaged and then used for standard error calculation.
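Because the formula itself is not reproduced here, the following is only a sketch of one form that is consistent with the three quantities defined above. It assumes a base-2 log ratio between the observed share of mutations affecting the amino acid and its germ-line frequency, and it assumes that the prior of 1 is added to both the change count and the total mutation count. Both assumptions, and all names, are hypothetical.

```python
import math
import statistics


def mutation_propensity(n_changes, n_mutations, germline_frequency, prior=1):
    """Assumed log2 propensity of an amino acid to be mutated in a structural region.

    n_changes: changes from this germ-line amino acid to any amino acid in the region.
    n_mutations: total number of mutations in the region.
    germline_frequency: frequency of the amino acid in germ-line sequences of the region.
    """
    observed_share = (n_changes + prior) / (n_mutations + prior)
    return math.log2(observed_share / germline_frequency)


def mean_and_standard_error(subset_values):
    """Average the propensities of the random subsets and estimate the standard error."""
    mean = statistics.fmean(subset_values)
    standard_error = statistics.stdev(subset_values) / math.sqrt(len(subset_values))
    return mean, standard_error
```

Under this assumed form, a value of 0 means the amino acid is mutated exactly as often as its germ-line frequency would predict, and positive or negative values indicate over- or under-mutation.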
I. Mutation Probability and Ab Position Numbering
(93) Abs positions and CDR definitions are numbered according to IMGT numbering. (Lefranc, M. P. et al., Dev Comp Immunol 27: 55-77 (2003)) (herein incorporated by reference in its entirety). The mutation probability was calculated as the number of mutations in a specific position divided by the number of appearances of an amino acid in this specific position. If the number of appearances of an amino acid in a specific position was ≤5, it was excluded from
(94) J. Standard Error Calculation
(95) Standard errors for
Example 3
(96) A. Dataset Construction and SHMs Identification
(97) A non-redundant dataset of 196 Ab-Ag complexes was generated (Table S1). Overall, 3495 SHMs were identified in the variable regions. Of those, 2172 occurred in mouse sequences (with a mean of 14.87 mutations per Ab) and 1323 occurred in human sequences (with a mean of 26.46 mutations per Ab). This difference may be ascribed, at least in part, to the way Abs are collected from mice and humans. The former are typically killed, and Abs collected, shortly after exposure to the Ag when they are a few months old. Human Abs, on the other hand, are typically collected from the blood of infected adults after repeated exposures to Ags.
(98) B. AID Hotspot Motifs are Not Correlated to SHMs
(99) As only the amino acid sequences of the mature Abs are available in the Protein Data Bank, it is impossible in most cases to retrieve the DNA sequences of the mature Ab from public databases. However, it is possible to retrieve the DNA sequences of the germline genes. These sequences allow us to evaluate the relationships between SHMs and AID hotspot motifs (RGYW or WRCY; R indicates a purine base, Y indicates a pyrimidine base, W indicates an A or a T base) (Dörner, T. et al., Eur. J. Immunol. 28:3384-3396 (1998)) (hereby incorporated by reference in its entirety) in the germline genes.
(100)
(101) C. SHMs Occur More in Heavy Chains, but Light Chain SHMs are as Important Energetically
(102) We assessed the energetic effect on the binding of the Ag for every mutated residue in the Ab by mutating it back to its germline amino acid (in silico) and predicting the effect of this mutation on the ΔΔG of binding. The calculations were performed using FoldX (Schymkowitz, J. et al., Nucleic Acids Res 33: W382-W388) (hereby incorporated by reference in its entirety), which uses parameters and weights derived from experimental data from a large number of mutations. Large-scale assessments of the energetic predictions by FoldX for 1030 mutants (Guerois, R. et al., J Mol Biol 320: 369-387 (2002) (hereby incorporated by reference in its entirety)) have shown them to be strongly correlated (R=0.83) with experimentally measured effects. Thus, while FoldX may not always provide individual accurate predictions, it may be trusted to reveal trends in large sets of mutations. Half (51%) of the SHMs had predicted ΔΔG values of 0, suggesting that they have no effect on binding, while 32% of the SHMs had positive ΔΔG values and only 17% had negative ΔΔG values, indicating that, as expected, mutating mature residues back to their germline amino acids hampers Ag binding more often than improving it. The distribution of ΔΔG values for SHMs in the VH domain is almost identical to that of SHMs in the VL domain (
(103) D. The Ag Combining Site has the Highest SHM Propensity
(104) We divided the Ab into five non-overlapping structural regions (
(105)
(106) where ‘r’ represents one of the five structural regions. If there is no preference for mutations in one region, the value of P_r for that region is 0. This propensity may be used to assess the selective pressure on each of the structural regions defined. Consistent with previous reports (Ramirez-Benitez, M. C. et al., Proteins 45:199-206 (2001); Raghunathan, G. et al., J. Mol. Recog. 25:103-113 (2012)), we found that most of the mutations (71.63%) occur outside the Ag-binding site, with 18.55%, 13.75% and 39.33% of the mutations being introduced into the regions “VH-VL interface”, “ABRs not in interfaces” and “other residues”, respectively. However, 87.75% of the Ab residues in the variable region do not contact the Ag. Thus, when normalizing to the relative sizes of these regions (
(107) E. Germline Residues Account for Most of the Binding of the Ag
(108) To determine which contacts contribute more to Ag binding, i.e. those that are formed by the residues mutated during SHM (“SHM contacting residues”) or those that are formed by residues retained from the primary germline sequence (“germline contacting residues”), we compared their predicted energetic contribution by mutating each contacting residue to alanine and calculating the effect of this mutation on binding energy (see “Experimental procedures”). The results are shown in
(109) F. SHMs Make the Ab-Ag Interface more Similar to Other Protein-Protein Interfaces
(110) We compared the amino acid composition of SHM contacts and germline contacts with those of general protein-protein interfaces. All aliphatic hydrophobic amino acids (alanine, isoleucine, leucine, methionine, proline and valine) are under-represented in the Ab-Ag interface compared with general interfaces (
(111) G. Structure and Function Drive the Propensity for Mutation
(112) To understand the role of different amino acids in SHM and the differences between the structural regions, we further analyzed the propensities for mutation in germline amino acids during SHM. As shown in
(113) All polar amino acids show a very distinct preference across these four structural regions. Tyrosine, which is highly important in Ag binding due to its over-representation in Ab ABRs (Kunik, V. et al., Prot. Eng. Des. Sel. 26:599-609 (2013)), is actually a preferred target for substitution in ABR residues that are not in interfaces and in the VH-VL interface. The only exception is the Ag interface, in which tyrosine is slightly protected from substitutions. Threonine, which has also been suggested to be over-represented in Ag interfaces (Ofran, Y. et al., J. Immunol. 181:6230-6235 (2008)), is mostly neutral to mutation, but is mutated less than expected in the VH-VL interface. Tryptophan is a slightly preferred target for mutation among the residues that are part of both interfaces, and is highly under-mutated in all other regions. Asparagine and glutamine show opposite patterns. While asparagine is over-represented, glutamine is under-represented in both the VH-VL interface and ABRs that are not in any interface. Asparagine also has high mutability in both interfaces, and glutamine is mutated less than expected in the Ag interface. As for the charged amino acids, arginine shows a negative propensity for mutation in the VH-VL interface and in both interfaces. Lysine shows a positive propensity for mutations in ABRs that are not in interfaces. Glutamic acid, aspartic acid and histidine are all less mutated than expected in the Ag interface and in both interfaces.
(114) H. Five Amino Acids Account for 49% of Mutations in the Ag Interface Region
(115)
(116) The propensities for substitutions in
(117) I. Mutation Probability and Energy Contribution Reveal Promising Positions for Affinity Enhancement
(118) Rational design of high-affinity Abs requires targeting of Ab positions for mutations. Our analysis identifies such positions based on in vivo SHM data.
(119) The regions in the Ab that have high average ΔΔG values for mutating their residues back to the germline amino acids overlap to some extent with regions that have a high mutation probability. However, not all CDR positions undergo substitutions that contribute to binding. For example, CDRH2 (VH positions 56-65) has high mutation probabilities for most of its residues. However, positions 63 and 65 have, on average, no energetic effect on binding despite their high probability for mutations. Positions that are frequently mutated and also show a substantial effect of SHMs on Ag-binding energy, such as 38, 55, 57, 59, 112, 113 and 114 on the VH domain and 110 and 116 on the VL domain, may be promising targets for in vitro affinity enhancement.
(120) J. Discussion
(121) Many of the insights into the structural basis of in vivo affinity maturation were obtained from analyses of SHMs in a single pair, or in several pairs, of germline and mature Abs (Li Y, Li H, Yang F, Smith-Gill S J & Mariuzza R A (2003) X-ray snapshots of the maturation of an antibody response to a protein antigen. Nat Struct Biol 10, 482-488; Midelfort K S, Hernandez H H, Lippow S M, Tidor B, Drennan C L & Wittrup K D (2004) Substantial energetic improvement with minimal structural perturbation in a high affinity mutant antibody. J Mol Biol 343, 685-701; Chong L T, Duan Y, Wang L, Massova I & Kollman P A (1999) Molecular dynamics and free-energy calculations applied to affinity maturation in antibody 48G7. Proc Natl Acad Sci USA 96, 14330-14335; Wong S E, Sellers B D & Jacobson M P (2011) Effects of somatic mutations on CDR loop flexibility during affinity maturation. Proteins 79, 821-829; Schmidt A G, Xu H, Khan A R, O'Donnell T, Khurana S, King L R, Manischewitz J, Golding H, Suphaphiphat P, Carfi A, et al. (2013) Preconfiguration of the antigen-binding site during affinity maturation of a broadly neutralizing influenza virus antibody. Proc Natl Acad Sci USA 110, 264-269; Thorpe I F & Brooks C L 3rd (2007) Molecular evolution of affinity and flexibility in the immune system. Proc Natl Acad Sci USA 104, 8821-8826; Acierno J P, Braden B C, Klinke S, Goldbaum F A & Cauerhff A (2007) Affinity maturation increases the stability and plasticity of the Fv domain of anti-protein antibodies. J Mol Biol 374, 130-146). 
Large-scale studies that attempted to elucidate the principles that guide SHM reached contradictory conclusions regarding preference for SHMs in the Ab-Ag interface (Clark L A, Ganesan S, Papp S & van Vlijmen H W (2006) Trends in antibody sequence changes during the somatic hypermutation process: 333-340; Dorner T, Brezinschek H P, Brezinschek R I, Foster S J, Domiati-Saad R & Lipsky P E (1997) Analysis of the frequency and pattern of somatic mutations within nonproductively rearranged human variable heavy chain genes. J Immunol 158, 2779-2789; Ramirez-Benitez M C & Almagro J C (2001) Analysis of antibodies of known structure suggests a lack of correspondence between the residues in contact with the antigen and those modified by somatic hypermutation. Proteins 45: 199-206; Raghunathan G, Smart J, Williams J & Almagro J C (2012) Antigen-binding site anatomy and somatic mutations in antibodies that recognize different types of antigens. J Mol Recog 25: 103-113.). Our division of the Ab into various structural regions, and the calculation of mutation probability and the energy effects of SHMs in each region, reveal that the highest propensity for SHMs is in Ag-binding regions (Ag interface and both interfaces). These regions also provide the greatest energetic contribution to Ag binding. These results are consistent with the selection of B cells based on Ag binding and with previous studies that showed fine-tuning of the Ag-binding site through SHMs (Li Y, Li H, Yang F, Smith-Gill S J & Mariuzza R A (2003) X-ray snapshots of the maturation of an antibody response to a protein antigen. Nat Struct Biol 10, 482-488: Chong L T, Duan Y, Wang L, Massova I & Kollman P A (1999) Molecular dynamics and free-energy calculations applied to affinity maturation in antibody 48G7. Proc Natl Acad Sci USA 96, 14330-14335). 
Although to a lesser extent than the regions involved in Ag binding, ABR residues that are not in the interfaces and residues in the VH-VL interface are both favored targets for mutations and make a substantial energetic contribution to Ag binding. This is consistent with previous studies that showed how internal interface stabilization (Acierno J P, Braden B C, Klinke S, Goldbaum F A & Cauerhff A (2007) Affinity maturation increases the stability and plasticity of the Fv domain of anti-protein antibodies. J Mol Biol 374, 130-146) and increased VH-VL interface shape complementarity (Midelfort K S, Hernandez H H, Lippow S M, Tidor B, Drennan C L & Wittrup K D (2004) Substantial energetic improvement with minimal structural perturbation in a high affinity mutant antibody. J Mol Biol 343, 685-701) result in enhanced Ag binding.
(122) DNA motifs that enhance targeting of the AID enzyme have been the focus of many studies that attempted to identify SHM sites. Such DNA hotspot motifs were previously suggested to play an important role in the formation of SHMs (Dorner T, Foster S J, Farner N L & Lipsky P E (1998) Somatic hypermutation of human immunoglobulin heavy chain genes: targeting of RGYW motifs on both DNA strands. Eur J Immunol 28, 3384-3396). However, our results indicate that the mature Ab sequence is determined by the affinity and possibly the stability of the Ab. The lack of correlation between the extent to which an amino acid is located within hotspots and its frequency among mutated positions suggests that structural and functional considerations play a much more important role than the presence of AID hotspots.
(123) Our analysis of SHM, germline and general protein-protein interfaces suggested some evolutionary insights. Tyrosine and tryptophan, which are large, flexible, amphipathic amino acids, were previously suggested to be highly represented in the Ag interfaces, and have been proposed to allow binding of several structurally similar Ags (Mian I S, Bradwell A R & Olson A J (1991) Structure, function and properties of antibody binding sites. J Mol Biol 217, 133-151). However, the affinity maturation process decreases their representation and increases the representation of aliphatic hydrophobic amino acids. Both SHM contacts and protein-protein contacts are the result of specific evolution and optimization of contacts, while germline-Ag contacts occur between partners that have never met before. This may explain the abundance of germline interface residues that may form several different kinds of contacts, and also the higher similarity between protein-protein interfaces and SHM contacting residues. This observation is consistent with a previous study that suggested that Ab affinity maturation and protein-protein interface evolution are guided by similar principles (J Biol Chem 285: 3865-3871).
(124) The ΔΔG values in this study were predicted by FoldX (Guerois R, et al. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations: 369-387 (2002) (hereby incorporated by reference in its entirety)). While there may be other tools that allow energetic assessment of individual mutations, FoldX enables rapid assessment of a large number of SHMs. An independent assessment has shown that FoldX is particularly good at assessing the energetic effect of mutations to amino acids other than alanine and mutations of residues located in loops (Potapov V, et al., Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details 553-560 (2009)). Previous studies have shown that FoldX may be used to identify trends in the evolution of protein function (Tokuriki N, et al., How protein stability and new functions trade off PLoS Comput Biol 4, e1000002 (2008); Tokuriki N, et al., The stability effects of protein mutations appear to be universally distributed 1318-1332 (2007)). Furthermore, it has recently been used for the study of Ab-Ag interactions (Kunik V, et al. Structural consensus among antibodies defines the antigen binding site. PLoS Comput Biol 8, e1002388 (2012); Kunik V & Ofran Y. The indistinguishability of epitopes from protein surface is explained by the distinct binding preferences of each of the six antigen-binding loops: 599-609 (2013)). The FoldX energy function also includes scoring parameters for the entropic cost of mutation. However, these parameters are calculated based on theoretical data and have been acknowledged to be a crude estimation of the entropy (Schymkowitz J, et al. The FoldX web server: an online force field. Nucleic Acids Res 33: W382-W388 (2005)). It has been shown that loss of flexibility in the Ab paratope and thus a lower entropic cost of the interaction is an important aspect in Ab affinity maturation (Wong S E, et al. 
Effects of somatic mutations on CDR loop flexibility during affinity maturation, Proteins 79: 821-829 (2011); Schmidt A G, et al., Preconfiguration of the antigen-binding site during affinity maturation of a broadly neutralizing influenza virus antibody, Proc Natl Acad Sci USA 110: 264-269 (2013); Thorpe I F & Brooks C L 3rd, Molecular evolution of affinity and flexibility in the immune system, Proc Natl Acad Sci USA 104: 8821-8826 (2007)). Quantification of such effects requires long molecular dynamics simulations or experimental procedures. Such methods are not applicable to a large number of Ab-Ag complexes; thus the estimation of paratope rigidification is beyond the scope of this study.
(125) The Ab-Ag dataset we used consists of 196 non-redundant Ab-Ag complexes. As more Ab-Ag complexes become available, it will be possible to also apply this approach to Ab-hapten interaction, which is currently not practical, and even to the interfaces with specific Ags such as gp120 or hemagglutinin, to elicit SHM patterns that are unique for that Ag. For example, it has recently been shown that Abs that broadly neutralize HIV are characterized by a remarkably high number of SHMs (Scheid J F, et al., Broad diversity of neutralizing antibodies isolated from memory B cells in HIV-infected individuals, Nature 458: 636-640 (2009); Kwong P D & Mascola J R, Human antibodies that neutralize HIV-1: identification, structures, and B cell ontogenies, Immunity 37: 412-425 (2012); Wu X, et al., Rational design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV-1, Science 329: 856-861 (2010)) and may also require SHMs in their framework regions (Klein F, et al. Somatic mutations of the immunoglobulin framework are generally required for broad and potent HIV-1 neutralization, Cell 153: 126-138 (2013)).
(126) Over recent decades, Abs have become one of the most effective and popular tools in biotechnology and biomedicine (Maynard J & Georgiou G, Antibody engineering, Annu Rev Biomed Eng 2: 339-376 (2000)) and more than 30 Abs and Ab derivatives have been approved for therapeutic use by the US Food and Drug Administration (Beck A, et al., Strategies and challenges for the next generation of therapeutic antibodies, Nat Rev Immunol 10: 345-352 (2010)). Therapeutic and diagnostic Abs frequently require engineering to enhance the affinity of Abs raised in immunized animals or selected by library screens. This step is important to expand detection limits, extend dissociation half-life, decrease drug dosage and increase drug efficiency (Lippow S M, et al., Computational design of antibody-affinity improvement beyond in vivo maturation, Nat Biotechnol 25: 1171-1176 (2007)). The structural and biophysical principles identified here may allow more focused in vitro design of Abs with enhanced affinities for use in building the libraries of the invention.
Example 4
Antibody Re-epitoping
(127) 1. Modeling
(128) A model of the antigen of interest, in this case IL-17A in the receptor bound conformation, was generated using Modeler as implemented in the Discovery Studio suite.
(129) 2. Docking
(130) The model was then docked against a large database of antibody three-dimensional structures using ZDOCK as implemented in Discovery Studio. Various poses were screened in order to identify poses that have “native like” properties. For the IL-17A antibody, poses providing optimal blocking of the binding site of the IL17AR were sought. A docking pose of antibody 2ZJS (PDB id) and the model of IL-17A was selected as a template for library design.
(131) 3. Libraries
(132) Positions within the CDRs of the antibody were selected for the introduction of variability for library design according to the methods described infra. For the initial library based on the 2ZJS antibody from the PDB, docked to IL-17A as described above, five positions were selected for variation (1 on chain H, 4 on chain L), yielding a library with diversity (at the amino acid level) of ~500,000. In addition to the 2ZJS-based library, other libraries were designed based on the docking models with the following PDB structures: 2ADG, 1GPO, 3A6C, 3C09, 1DFB. Libraries based on 2ADG and 2ZJS yielded IL-17A binders.
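The amino-acid-level diversity of such a library is the product of the number of residues allowed at each varied position. Full randomization of five positions would give 20^5 = 3,200,000 sequences, so a diversity of ~500,000 presumably reflects restricted residue sets at some positions. The following Python sketch illustrates the calculation only; the position labels and residue sets are hypothetical and are not the sets used for the 2ZJS library.

```python
from math import prod

def library_diversity(allowed_residues):
    """Count distinct amino acid sequences encoded by a combinatorial design.

    allowed_residues maps a position label to the residues allowed there.
    """
    return prod(len(set(residues)) for residues in allowed_residues.values())

# Hypothetical three-position design (labels and residue sets are illustrative).
design = {"H:pos1": "AY", "L:pos1": "DEK", "L:pos2": "FWYH"}
print(library_diversity(design))  # 2 * 3 * 4 = 24
```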
(133) 4. Testing
(134) The initial selection of libraries (2ZJS and 2ADG) against IL-17A yielded several clones that bound the antigen specifically with sub-micromolar affinity, based on titrations performed on the yeast.
(135) After each round of selection the surviving clones were deep-sequenced to analyze which variants are subject to selective pressures and which substitutions are favored or disfavored in the various positions. The results of this analysis were used to design an improved library. Briefly, positions that are under selective pressure (i.e. mutations in these positions improve or hamper binding) are positions that have an effect on the interface. This information can be used to refine the original model of antibody-antigen complex, and, in turn will allow another iteration of the process described above, yielding new libraries with more focused variations.
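One simple way to look for positions under selective pressure in such deep-sequencing data is to compare residue frequencies at each varied position before and after selection. The Python sketch below is an illustration under stated assumptions, not the analysis actually used: it assumes aligned, equal-length variant sequences, a zero-based position index, and a pseudocount (a hypothetical parameter) to avoid division by zero.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_enrichment(pre_seqs, post_seqs, position, pseudocount=1.0):
    """Log2 ratio of post-selection to pre-selection frequency for each residue."""
    pre = Counter(seq[position] for seq in pre_seqs)
    post = Counter(seq[position] for seq in post_seqs)
    pre_total = len(pre_seqs) + pseudocount * len(AMINO_ACIDS)
    post_total = len(post_seqs) + pseudocount * len(AMINO_ACIDS)
    scores = {}
    for aa in AMINO_ACIDS:
        pre_freq = (pre[aa] + pseudocount) / pre_total
        post_freq = (post[aa] + pseudocount) / post_total
        scores[aa] = log2(post_freq / pre_freq)
    return scores

# Toy reads (hypothetical): aromatic residues survive selection at position 1.
pre_round = ["AYG", "ASG", "ADG", "AWG"]
post_round = ["AYG", "AWG", "AYG", "AWG"]
scores = position_enrichment(pre_round, post_round, position=1)
favored = sorted(aa for aa, s in scores.items() if s > 0)
print(favored)
```

Positive scores mark substitutions that are favored by selection and negative scores mark disfavored ones; positions where the scores are all close to zero are unlikely to affect the interface.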
(136) 5. Another Iteration:
(137) Clones selected from this library as IL-17A binders were utilized as the basis for the introduction of additional variation to improve affinity and utility. Specific positions within the antibody were selected based on sequence analysis (for example, Blast), positions suggested in the literature, and/or the analysis of deep sequencing data from the initial library. Based on these analyses, a next-generation library was designed.
(138) In this particular case, we were able to identify several positions in two of the libraries that were under selection. For example, in the library that was based on 2ZJS we observed a clear overrepresentation of aromatic residues at two neighboring positions. This round of selection culminated in an scFv that shows full cross-blocking of the soluble IL17Ra.
(139) 6. Isolate Binders:
(140)
(141) Additional analysis of the soluble scFv has shown that it does not only bind the IL17a but is also highly thermo-stable, as shown in
Example 5
Differences Between Synthetic and Natural Antibodies
(142) A. Background
(143) A critical question, therefore, in designing synthetic libraries is to what extent the resulting Abs are similar to natural Abs in the way they recognize and bind the Ag. Indeed, good therapeutic biomolecules do not have to mimic natural Abs. However, it is often assumed that libraries that better mimic natural Abs and natural diversity are more likely to yield better binders with a better profile. Some novel approaches for library design attempt to introduce diversity that will better imitate natural diversity while also yielding Abs with improved biophysical properties. For example, the human combinatorial antibody library (HuCAL) was created to represent the most frequently used germline families and was optimized to obtain high expression and low aggregation in E. coli. The CDR cassettes were designed to mimic the length and amino acid composition of naturally occurring Abs (Knappik A, et al., J Mol Biol 296:57-86 (2000); Rothe C, et al., J Mol Biol 376:1182-200 (2008)) (herein incorporated by reference in their entirety). Sidhu et al. (Sidhu S S, et al. J Mol Biol 338:299-310 (2004) (herein incorporated by reference in its entirety)) used a single stable framework scaffold to introduce diversity to the heavy chain, based on the observed propensities of amino acids in CDRs of natural Abs. Another strategy was to amplify only the CDR sequences from naïve B cells and randomly combine these CDRs into a selected Ab framework that can be highly expressed in a bacterial system (Soderlind E, et al. Nat Biotechnol 18:852-6 (2000) (herein incorporated by reference in its entirety)). Further understanding of key properties of naturally existing Abs will help Ab engineering technologies to obtain more promising therapeutic Ab candidates.
(144) Here, we compare synthetic Abs to natural Abs to assess to what extent synthetic Abs indeed mimic natural ones. This comparison allowed us to review and revise common assumptions about Ab-Ag interaction. We employ a novel computational tool we developed, “CDRs analyzer” to explore biophysical characteristics of Abs. In this analysis, natural Abs are Abs that originated from hybridoma or from immunized or naïve libraries, and synthetic Abs are Abs that were selected from a synthetic library (i.e., a library that is not naïve or immunized). We found that synthetic Abs rely on CDRH3 significantly more than natural Abs. The binding contribution of CDRH1 and CDRH2 of synthetic Abs is smaller than their contribution in natural Abs. When analyzing the binding mode, we found that epitopes of natural Abs contain many epitope residues that contact multiple CDRs, while epitopes of synthetic Abs have more residues that contact only one CDR. These results show that the current way in which synthetic libraries are designed often yields Abs that do not mimic the way in which natural Abs bind their Ags. Our analysis suggests a set of considerations for library design that will take better advantage of the binding possibilities offered by the structure of the Ab. We discuss how this can yield libraries with more effective binders and with greater diversity of paratopes.
(145) B. Methods
(146) B.1 Construction of Natural Ab-Ag Complexes Datasets
(147) To build a large non-redundant set of natural Abs, a previously published non-redundant dataset of 196 Ab-Ag complexes (Burkovitz, A. et al., FEBS J 281:306-19 (2014) (herein incorporated by reference in its entirety)) was further filtered to create the current study dataset of natural Ab-Ag complexes. The “CDRs Analyzer” cannot analyze scFv Abs, Abs that contain disordered residues in the CDRs or non-standard amino acids, complexes that were solved by NMR and complexes composed of more than 25000 atoms. Complexes that met these conditions were deleted from the original dataset. In addition, complexes that included synthetic Abs were moved to the synthetic Ab-Ag dataset. Finally, complexes that contain an Ag with a length of ≤30 amino acids were also removed. The resulting dataset contained 101 natural Ab-Ag complexes (Table S1).
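The filtering steps above can be expressed as a simple predicate over annotated complexes. The following Python sketch is illustrative only: the record type, its field names and the toy entries are hypothetical, and it assumes each complex has already been annotated with the relevant properties.

```python
from dataclasses import dataclass

@dataclass
class AbAgComplex:
    # All field names are hypothetical annotations for illustration.
    label: str
    is_scfv: bool
    has_disordered_cdr: bool
    has_nonstandard_residues: bool
    method: str            # e.g. "X-RAY" or "NMR"
    n_atoms: int
    antigen_length: int
    is_synthetic: bool

def split_dataset(complexes):
    """Return (natural, moved_to_synthetic) following the exclusion rules in the text."""
    natural, moved_to_synthetic = [], []
    for c in complexes:
        if c.is_synthetic:
            moved_to_synthetic.append(c)   # handled by the synthetic dataset
            continue
        excluded = (
            c.is_scfv
            or c.has_disordered_cdr
            or c.has_nonstandard_residues
            or c.method == "NMR"
            or c.n_atoms > 25000
            or c.antigen_length <= 30
        )
        if not excluded:
            natural.append(c)
    return natural, moved_to_synthetic

toy = [
    AbAgComplex("toy1", False, False, False, "X-RAY", 8000, 120, False),
    AbAgComplex("toy2", False, False, False, "X-RAY", 8000, 25, False),   # short Ag
    AbAgComplex("toy3", False, False, False, "NMR", 8000, 120, False),    # NMR
    AbAgComplex("toy4", False, False, False, "X-RAY", 8000, 120, True),   # synthetic
]
natural, moved = split_dataset(toy)
print([c.label for c in natural], [c.label for c in moved])
```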
(148) B.2 Construction of Synthetic Ab-Ag Complexes Datasets
(149) A synthetic Ab-Ag complexes dataset was constructed using both the PDB and sAbDab (Dunbar, J. et al. Nucleic Acids Res 42:D1140-6 (2014) (herein incorporated by reference in its entirety)). The PDB query search was used to manually curate synthetic Ab-Ag complexes. The PDB query type was set to “PubMed abstract” and the search words were “phage display antibody” and “library antibody”. In addition, the sequences of the light chain, the heavy chain or the full variable domain of a representative synthetic Ab (PDB ID: 2H9G) were used to search the sAbDab database using the framework-region-only option. A retrieved PDB entry was considered a synthetic Ab-Ag complex if the library from which the Ab was isolated included variable domain sequences that were not obtained from a natural repertoire. Two Ab-Ag complexes were considered redundant if the two Abs bound the same Ag at a similar epitope. Redundancy was removed according to this criterion. We removed from the dataset complexes that contain scFvs, Ags with a length of ≤30 amino acids, Abs that contain disordered residues in the CDRs or non-standard amino acids, complexes with resolution ≤3.6 Å and complexes that are composed of more than 25000 atoms. The final synthetic Ab-Ag complexes dataset contained 36 non-redundant PDB entries.
(150) B.3 Analyzing Ab-Ag Complexes Using “CDRs Analyzer”
(151) CDRs analyzer takes as an input an X-ray structure of Ab-Ag complex in a PDB file format. It automatically identifies the CDRs residues and calculates a set of parameters for all six CDRs. The output is an HTML page presenting the calculated parameters (described below) for each of the CDRs, a list of contacting residues and list of specific interactions. “CDRs Analyzer” was implemented in Perl and Python. The front end of the server is designed in HTML and XML.
(152) Description of CDRs Analyzer:
(153) B.3.1 CDRs Identification
(154) The CDRs are identified using Paratome (Kunik, V. et al., Nucleic Acids Res 40:W521-4 (2012); Kunik, V. et al. PLoS Comput Biol 8:e1002388 (2012) (herein incorporated by reference in their entirety)). An Ag-contacting residue within ±15 residues from the Ag binding region boundaries as defined by Paratome is added to the nearest CDR. An Ag-contacting residue is a residue on the Ab that has at least one non-hydrogen atom within 5 Å from a non-hydrogen atom in the Ag.
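The contact definition above (at least one non-hydrogen Ab atom within 5 Å of a non-hydrogen Ag atom) can be sketched in Python as follows. This is a minimal illustration, not the “CDRs Analyzer” implementation: the atom tuple layout and residue labels are assumptions, and a production tool would parse the PDB file and use a spatial index rather than an all-pairs loop.

```python
from math import dist

def contacting_residues(ab_atoms, ag_atoms, cutoff=5.0):
    """Return Ab residue ids with a heavy atom within `cutoff` Å of an Ag heavy atom.

    Each atom is a (residue_id, element, (x, y, z)) tuple (assumed layout).
    """
    ag_heavy = [xyz for _, element, xyz in ag_atoms if element != "H"]
    contacts = set()
    for res_id, element, xyz in ab_atoms:
        if element == "H" or res_id in contacts:
            continue  # skip hydrogens and residues already marked as contacting
        if any(dist(xyz, other) <= cutoff for other in ag_heavy):
            contacts.add(res_id)
    return contacts

# Toy coordinates (hypothetical).
ab = [
    ("H:100", "C", (0.0, 0.0, 0.0)),   # 3 Å from the Ag atom: contact
    ("H:101", "C", (10.0, 0.0, 0.0)),  # 7 Å away: no contact
    ("H:102", "H", (3.0, 0.0, 1.0)),   # hydrogen: ignored
]
ag = [("A:1", "N", (3.0, 0.0, 0.0))]
print(contacting_residues(ab, ag))
```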
(155) B.3.2 Number of Contacting Residues
(156) The number of “contacting residues” is the number of residues in a CDR that are in contact with the Ag and the number of residues in the Ag that are in contact with the CDR.
(157) B.3.3 Number of Specific Interactions
(158) The number of “specific interactions” is the sum of the number of salt-bridges, pi-pi, cation-pi and possible H-bonds (McDonald, I. K. et al., J Mol Biol 238:777-93 (1994) (herein incorporated by reference in its entirety)) between the CDR and the Ag. A salt bridge is defined as one Asp or Glu side-chain carboxyl oxygen atom and one side-chain nitrogen atom of Arg, Lys or His that are within 4.0 Å of each other. H-bonds were identified by first adding polar hydrogen atoms to the complex using Discovery Studio Visualizer and then submitting the output file to the HBPLUS program with default parameters (McDonald, I. K. et al., J Mol Biol 238:777-93 (1994) (herein incorporated by reference in its entirety)). Pi-pi interactions are identified according to McGaughey et al. (McGaughey, G. B. et al. J Biol Chem 273:15458-63 (1998) (herein incorporated by reference in its entirety)).
(159) Briefly, the distance between the centroids of each pair of pi rings should be 8 Å or less, and at least one atom from each ring should be within 4.5 Å. In addition, the angle theta between the normal of one or both rings and the centroid-centroid vector must fall between 0 and ±60 degrees. The angle lambda between the normals of the two rings must fall between 0 and ±30 degrees. A cation-pi interaction is defined if the side-chain cation of Lys or Arg is within 7 Å of the centroid of a pi ring, the perpendicular distance between the cation and the plane of the ring is within 6 Å, and the angle between the cation-centroid vector and the ring plane is more than 45 degrees.
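The salt-bridge and cation-pi criteria above are purely geometric and can be sketched in Python as below. This is an illustration under assumptions, not the “CDRs Analyzer” code: it takes pre-selected atom coordinates as plain tuples, treats the ring as planar, and estimates the ring normal from the first two ring atoms and the centroid.

```python
from math import asin, cos, degrees, dist, pi, sin, sqrt

def is_salt_bridge(carboxyl_oxygen, sidechain_nitrogen, cutoff=4.0):
    """Asp/Glu carboxyl O and Arg/Lys/His side-chain N within 4.0 Å."""
    return dist(carboxyl_oxygen, sidechain_nitrogen) <= cutoff

def _centroid(ring):
    return tuple(sum(atom[i] for atom in ring) / len(ring) for i in range(3))

def _unit_normal(ring, center):
    a = tuple(ring[0][i] - center[i] for i in range(3))
    b = tuple(ring[1][i] - center[i] for i in range(3))
    n = (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])
    length = sqrt(sum(x * x for x in n))
    return tuple(x / length for x in n)

def is_cation_pi(cation, ring):
    """Cation within 7 Å of the centroid, within 6 Å of the ring plane,
    and more than 45 degrees above the plane."""
    center = _centroid(ring)
    v = tuple(cation[i] - center[i] for i in range(3))
    d = sqrt(sum(x * x for x in v))
    if d == 0.0 or d > 7.0:
        return False
    normal = _unit_normal(ring, center)
    perpendicular = abs(sum(v[i] * normal[i] for i in range(3)))
    angle_to_plane = degrees(asin(min(1.0, perpendicular / d)))
    return perpendicular <= 6.0 and angle_to_plane > 45.0

# Idealized six-membered ring in the xy plane (hypothetical coordinates).
ring = [(1.4 * cos(k * pi / 3), 1.4 * sin(k * pi / 3), 0.0) for k in range(6)]
print(is_cation_pi((0.0, 0.0, 4.0), ring))   # cation directly above the ring
print(is_cation_pi((5.0, 0.0, 1.0), ring))   # cation nearly in the ring plane
```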
(160) B.3.4 Energy Calculations (ΔΔG)
(161) The effect of in-silico mutation of each CDR residue to ALA is calculated using FoldX. (Guerois, R. et al., Journal of Molecular Biology 320:369-87 (2002) (herein incorporated by reference in its entirety)) FoldX's calculations have been previously shown to be correlated to experimentally measured results of 1030 mutants (R=0.83). (Guerois, R. et al. Journal of Molecular Biology 320:369-87 (2002)) A recently published study curated 1100 mutations in Ab-Ag complexes and examined the performance of different energy scoring methods. (Sirin, S. et al. Protein Sci 2015 (herein incorporated by reference in its entirety))
(162) FoldX was one of the top performers in that study, on both destabilizing (ΔΔG>1.0 kcal/mol) and stabilizing (ΔΔG<−1.0 kcal/mol) mutations.
(163) Each PDB structure is first optimized using the FoldX RepairPDB function. Then, residues in the CDR are mutated to Ala using the BuildModel function, which generates mutants and their corresponding wild-type structure models. The heavy chain and the light chain of the Ab are grouped together to calculate the energy values of the assembled Ab, and the AnalyzeComplex function is used to calculate the binding ΔG of each model. The ΔΔG for each mutant is then computed by subtracting the wild-type calculated ΔG value from the mutant calculated ΔG value. The “ΔΔG” of a CDR is considered as the sum over its residues. The “CDRs Analyzer” outputs the ranking of the six CDRs according to the ΔΔG values.
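Once the binding ΔG values of the mutant and matching wild-type models are available, the per-CDR ΔΔG and the CDR ranking follow by simple arithmetic. The Python sketch below shows only this bookkeeping step and does not call FoldX; the dictionary layout, residue labels and energy values are hypothetical.

```python
def cdr_ddg_ranking(wild_type_dg, mutant_dg, residue_to_cdr):
    """Sum per-residue ΔΔG (mutant ΔG minus wild-type ΔG) over each CDR.

    wild_type_dg and mutant_dg map a residue id to the binding ΔG (kcal/mol)
    of its wild-type model and its Ala-mutant model, respectively.
    Returns (totals per CDR, CDR names ranked from highest to lowest ΔΔG).
    """
    totals = {}
    for residue, dg_mutant in mutant_dg.items():
        ddg = dg_mutant - wild_type_dg[residue]
        cdr = residue_to_cdr[residue]
        totals[cdr] = totals.get(cdr, 0.0) + ddg
    ranking = sorted(totals, key=totals.get, reverse=True)
    return totals, ranking

# Toy values (hypothetical, kcal/mol).
wild_type = {"H:100": -10.0, "H:101": -10.0, "L:50": -10.0}
ala_mutant = {"H:100": -8.5, "H:101": -9.5, "L:50": -10.25}
cdr_of = {"H:100": "H3", "H:101": "H3", "L:50": "L2"}
totals, ranking = cdr_ddg_ranking(wild_type, ala_mutant, cdr_of)
print(totals, ranking)
```

A positive total means that alanine substitution in that CDR is predicted to weaken binding, so the CDR ranks higher in its contribution.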
(164) B.3.5 Delta Relative Surface Accessibility (ΔRSA)
(165) RSA is given by dividing the solvent accessibility value by the surface area of the given amino acid (Chothia, C., J Mol Biol 105:1-12 (1976) (herein incorporated by reference in its entirety)). The solvent accessibility of the Ab residues is calculated using the DSSP program (Kabsch, W. et al., Biopolymers 22:2577-637 (1983) (herein incorporated by reference in its entirety)). RSA is computed for each of the residues in the CDR, once in the presence of the Ag (RSA_bound) and once in its absence (RSA_unbound). The ΔRSA is given by subtracting RSA_bound from RSA_unbound. The ΔRSA of a CDR is considered as the sum over its residues.
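The ΔRSA computation reduces to a per-residue difference of normalized accessibilities, summed over the CDR. The Python sketch below assumes the bound and unbound solvent accessibilities have already been obtained (for example from two DSSP runs) and that a reference surface area is supplied per residue; all names and numbers are hypothetical.

```python
def cdr_delta_rsa(acc_unbound, acc_bound, reference_area):
    """Return (per-residue ΔRSA, CDR ΔRSA) with ΔRSA = RSA_unbound - RSA_bound.

    acc_unbound / acc_bound: residue id -> solvent accessibility (Å^2).
    reference_area: residue id -> surface area of that amino acid type (Å^2).
    """
    per_residue = {}
    for residue, unbound in acc_unbound.items():
        rsa_unbound = unbound / reference_area[residue]
        rsa_bound = acc_bound[residue] / reference_area[residue]
        per_residue[residue] = rsa_unbound - rsa_bound
    return per_residue, sum(per_residue.values())

# Toy values (hypothetical): one residue is buried on binding, one is unaffected.
unbound = {"H:100": 120.0, "H:101": 50.0}
bound = {"H:100": 20.0, "H:101": 50.0}
reference = {"H:100": 200.0, "H:101": 100.0}
per_residue, cdr_total = cdr_delta_rsa(unbound, bound, reference)
print(per_residue, cdr_total)
```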
(166) B.3.6 Binding Contribution Score
(167) To evaluate the involvement of each CDR in Ag recognition we used an estimated calculation, which sums the four parameter values into a single “binding contribution score”. For each of the four binding parameters above, values are normalized and scored according to their quartiles: 4 points for values within the top 25% of the scores, 1 for the values within the lowest 25%. The “binding contribution score” of a given CDR is the sum of the scores over its four criteria and varies from 4 (no contribution to Ag binding) to 16 (highest contribution to Ag binding). The binding contribution calculation gives an equal weight to the four binding parameters. When more structural data become available, these weights should be assessed and optimized. To verify that the score is not sensitive to arbitrary cutoffs, we checked different binding contribution scores by dividing the parameter values into bins of thirds and fifths (instead of quarters). This did not change the results.
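The quartile scoring can be sketched in Python as follows. Several details are assumptions rather than statements of the source: the two middle quartiles are taken to score 3 and 2 points, the quartile boundaries are computed from a reference population of values for each parameter, and the default method of `statistics.quantiles` is used to place the cut points.

```python
from statistics import quantiles

def quartile_score(value, population):
    """Score 1 (lowest quartile) to 4 (top quartile) relative to `population`."""
    cuts = quantiles(population, n=4)          # three quartile boundaries
    return 1 + sum(value > cut for cut in cuts)

def binding_contribution_score(cdr_values, populations):
    """Equal-weight sum of quartile scores over the binding parameters."""
    return sum(
        quartile_score(cdr_values[name], populations[name]) for name in populations
    )

# Toy reference populations (hypothetical): values 1..100 for each parameter.
parameters = ["contacts", "specific_interactions", "ddg", "delta_rsa"]
populations = {name: list(range(1, 101)) for name in parameters}
strong_cdr = {name: 100 for name in parameters}
weak_cdr = {name: 1 for name in parameters}
print(binding_contribution_score(strong_cdr, populations))  # 16
print(binding_contribution_score(weak_cdr, populations))    # 4
```

Changing `n=4` to 3 or 5 reproduces the robustness check with thirds and fifths described above.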
(168) B.3.7 Independent and Integrated Ag Residues
(169) An “independent residue” is an Ag residue that is in contact with residues that belong to only one CDR. An “integrated residue” is an Ag residue that is in contact with at least three CDRs. These parameters are used by the “CDRs Analyzer” to calculate the “Independent binding score”, which measures the potential of a given CDR to bind the Ag as a peptide. (Burkovitz, A. et al., J Immunol 190:2327-34 (2013) (herein incorporated by reference in its entirety)). For that purpose, the percentage of independent or integrated residues for a given CDR was calculated out of the Ag residues contacting that CDR. Here, we aimed to evaluate the complexity of the Ab-Ag interaction. Thus, the percentages of independent and integrated residues were calculated out of the total number of epitope residues.
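Given a map from each epitope residue to the set of CDRs it contacts, the two complexity measures follow directly from the definitions above. The Python sketch below computes them as percentages of all epitope residues; the input layout and residue labels are hypothetical.

```python
def epitope_complexity(contacts):
    """Return (% independent, % integrated) out of all epitope residues.

    contacts maps an Ag residue id to the set of CDR names it contacts.
    Independent: contacts exactly one CDR. Integrated: contacts three or more.
    """
    epitope = [cdrs for cdrs in contacts.values() if cdrs]
    if not epitope:
        return 0.0, 0.0
    independent = sum(1 for cdrs in epitope if len(cdrs) == 1)
    integrated = sum(1 for cdrs in epitope if len(cdrs) >= 3)
    return 100.0 * independent / len(epitope), 100.0 * integrated / len(epitope)

# Toy epitope (hypothetical residue ids and contacts).
toy_contacts = {
    "A:10": {"H3"},
    "A:11": {"H3", "H2"},
    "A:12": {"H1", "H2", "H3"},
    "A:13": {"L1"},
}
print(epitope_complexity(toy_contacts))  # (50.0, 25.0)
```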
(170) B.3.8 Independent Binding Score
(171) The six parameters above (contacting residues, specific interactions, ΔΔG, ΔRSA, percentage of independent and integrated Ag residues) are used to evaluate the potential of a CDR to bind the Ag as a peptide. (Burkovitz, A. et al., J Immunol 190:2327-34 (2013) (herein incorporated by reference in its entirety)) The values of each of the parameters are normalized and scored according to their quartiles: 4 points for values within the top 25% of the scores, 1 for the values within the lowest 25%. The “Independent binding score” of a given CDR is the sum of the scores over its six criteria.
(172) C. Results
(173) C.1 Data Sets of Natural and Synthetic Abs
(174) Analyzing the Protein Data Bank (PDB) (Berman H M, et al., Nucleic Acids Research 28:235-42 (2000) (herein incorporated by reference in its entirety)) in search of a non-redundant set of natural or synthetic Abs (Methods) yielded a total of 137 Ab-Ag complexes. Of these, 101 are natural (Table S1) and 36 are synthetic (Table S2).
(175) C.2 “CDRs Analyzer”—A Computational Framework for Exploring Ab-Ag Interactions.
(176) The analysis utilized “CDRs Analyzer”, a computational tool we introduce for analyzing Ab-Ag interfaces. It is designed to assist Ab engineering by providing quantitative assessment of the biophysical properties of each residue and each CDR in the paratope. “CDRs Analyzer” takes as input a 3D structure of an Ab-Ag complex in PDB format and the chain IDs of the Ab and Ag chains to be analyzed. The server provides output at both the residue and the CDR levels. The output includes a list of H-bonds (calculated by HBPLUS (McDonald I K, and Thornton J M, J Mol Biol 238:777-93 (1994) (herein incorporated by reference in its entirety))), salt-bridges, pi-pi and cation-pi interactions, and a list of contacting residues (see Methods). Additionally, “CDRs Analyzer” calculates, for each CDR, four parameters to evaluate its contribution to Ag binding: (1) “Contacting residues” is the sum of the number of residues in the CDR that are in contact with the Ag and the number of residues in the Ag that are in contact with the CDR; (2) “Specific interactions” is the number of salt-bridges, pi-pi and cation-pi interactions and the number of possible H-bonds between the CDR residues and the Ag; (3) “Calculated ΔΔG” is the predicted effect on binding of mutating each CDR residue to ALA, calculated using FoldX (Guerois, R. et al., Journal of Molecular Biology 320:369-87 (2002) (herein incorporated by reference in its entirety)); and (4) “delta relative accessible surface area (ΔRSA)” is the sum of the changes in the relative solvent accessibility of each CDR residue upon dissociation of the Ab-Ag complex, calculated using DSSP (Kabsch, W. and Sander, C., Biopolymers 22:2577-637 (1983) (herein incorporated by reference in its entirety)). These four binding parameters were combined to give a score that assesses the contribution to Ag binding of a given CDR. This score varies from 4 (no contribution to Ag binding) to 16 (highest contribution to Ag binding; see Methods). 
It is a unified score that gives an equal weight for the four binding parameters. Ideally, as more structural data become available, the weight that each parameter should have in the final score can be explored and optimized. The binding contribution score is a combined measurement of the Ag binding portion of a given CDR relative to other CDRs of the Ab.
(177) Additionally, “CDRs Analyzer” provides the potential of a CDR to bind the Ag as a peptide, based on a computational approach that was described previously (Burkovitz A, et al., J Immunol 190:2327-34 (2013) (herein incorporated by reference in its entirety)). “CDRs Analyzer” is available online at http://www.ofranlab.org/CDRs_Analyzer.
(178) C.3 Synthetic Abs Rely Heavily on CDRH3 at the expense of CDRH2 and CDRH1.
(179) CDRH3, which encompasses the V-D-J recombination site, is the most diverse component of natural Abs. As shown in Table A1, in natural Abs CDRH3 has, on average, higher values than any other CDR, for all of the four parameters that were assessed.
(180) C.4 Unlike Synthetic Abs, CDRs in Natural Abs Specialize in Specific Types of Contacts
(181) “CDRs Analyzer” also provides a list of specific contacts (H-bonds, salt bridges, cation-pi or pi-pi). The distribution of each type of interaction across the six CDRs is shown in
(182) In natural Abs, each CDR on the heavy chain specializes in different types of interactions (Kunik, V. and Ofran, Y. Protein Eng Des Sel 2013 (herein incorporated by reference in its entirety)). As shown above, CDRH2 is responsible for the largest share of salt-bridges (39.66%). CDRH3 is the main source of H-bonds (30.14%), and all heavy chain CDRs contribute similar shares of the cation-pi interactions (20.57%, 22.7% and 26.24% of cation-pi interactions from CDRH3, CDRH1 and CDRH2, respectively). This differentiation and specialization is lost in synthetic Abs. For the Abs that emerge from synthetic libraries, CDRH3 takes the central role in all analyzed interactions. CDRH2 has a share equal to that of CDRH3 only in cation-pi contacts.
(183) C.5 The Focus of Synthetic Abs on CDRH3 Creates Interfaces that are Less Complex and More Modular.
(184) We evaluate the complexity of the Ab-Ag interaction using two parameters: independent epitope residues and integrated epitope residues. These parameters reflect the extent to which the six CDRs create an integral interface. An epitope residue on the Ag is considered an “independent residue” if it contacts only one CDR. An epitope residue that contacts three or more different CDRs is considered an “integrated residue”. To assess the complexity of the Ab-Ag interaction, the percentages of integrated and independent residues out of all residues that contact the paratope are calculated (note, however, that the raw output of “CDRs Analyzer” provides these calculations as a percentage of the residues that contact a given CDR and not as a percentage of the residues that contact the entire paratope; see Methods). On average, 57.49% of the epitope residues of natural Abs are independent (that is, they contact only one CDR), whereas epitopes of synthetic Abs are composed of 63.09% independent residues (
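By way of non-limiting illustration, the classification of epitope residues as independent (contacting exactly one CDR) or integrated (contacting three or more CDRs), expressed as a percentage of all residues that contact the paratope, may be sketched as follows. The input format, the function name, and the residue labels are hypothetical; the actual contact definition is given in the Methods.

```python
def interface_complexity(contacts):
    """Return (% independent, % integrated) epitope residues.

    `contacts` maps each Ag residue label to the set of CDRs it contacts.
    Residues that contact no CDR are not part of the epitope and are ignored.
    """
    epitope = {res: cdrs for res, cdrs in contacts.items() if cdrs}
    if not epitope:
        raise ValueError("no epitope residue contacts the paratope")
    total = len(epitope)
    independent = sum(1 for cdrs in epitope.values() if len(cdrs) == 1)
    integrated = sum(1 for cdrs in epitope.values() if len(cdrs) >= 3)
    return 100 * independent / total, 100 * integrated / total


# Hypothetical contact map for one Ab-Ag complex (illustration only).
example_contacts = {
    "A:45": {"H3"},
    "A:46": {"H3", "H2"},
    "A:47": {"H1", "H2", "H3"},
    "A:90": {"L1"},
    "A:91": {"L3", "H3", "L1", "H2"},
    "A:120": set(),  # not in contact with any CDR, so excluded
}
pct_independent, pct_integrated = interface_complexity(example_contacts)
print(pct_independent, pct_integrated)
```

Residues that contact exactly two CDRs fall into neither class, so the two percentages need not sum to 100.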
(185) C.6 Demonstrating the Differences Between Synthetic and Natural Abs
(186)
(187) D. Analysis of Results
(188) Synthetic libraries are clearly successful in yielding specific binders that often become successful drug leads. Here, we ask to what extent the products of these libraries mimic natural Abs. One may argue that, as long as the leads are successful, there is no need for the libraries to mimic natural Abs. However, our analysis can be important in two ways: first, as a basic science endeavor, it helps reveal the principles that guide natural Ab-Ag interaction. Second, revealing these principles suggests new avenues that may make synthetic libraries even more potent. While the dataset of synthetic Abs is smaller than that of the natural Abs, it represents a diverse collection of synthetic Abs isolated from a variety of generic (e.g. HuCAL (Knappik, A. et al., J Mol Biol 296:57-86 (2000)) or Lee et al. (Lee, C. V. et al., J Mol Biol 340:1073-93 (2004)) (herein incorporated by reference in their entirety)) or custom-made libraries. The synthetic Abs in the dataset bind 30 different Ags, which vary in size from 51 to 915 residues. Where two Abs bind the same Ag, we verified that they recognize different epitopes. Thus, the synthetic Abs dataset represents the current strategies for library design. Obviously, as more synthetic Abs become available, this analysis should be repeated to refine the insights and further establish their significance.
(189) Large-scale analysis of Ab-Ag complexes can help reveal the principles that allow Igs to accommodate an exquisitely matching paratope for virtually any surface, while strictly maintaining their overall fold. (Novotny, J. et al., Proc Natl Acad Sci USA 83:226-30 (1986); Sela-Culang, I. et al., Front Immunol 4:302 (2013); Sela-Culang, I. et al., Curr Opin Virol 11:98-102. (herein incorporated by reference in their entirety)) The great challenge of Ab design is to make synthetic libraries that will yield Abs against a wide range of targets and epitopes. Indeed, in vivo Ab development relies on a more complex process, and hence may yield Abs with improved properties. This complex process includes gene rearrangement, somatic hypermutation, and clonal selection, both through positive selection for Ag recognition and negative selection against self-binding. We aimed to identify the differences between the Ag binding mechanisms of synthetic Abs and natural Abs, which may help improve library design to yield more natural-like Abs. It also allowed us to revisit common assumptions about the role of CDRH3 in Ag recognition.
(190) Obviously, some individual natural Abs and some individual synthetic Abs may be exceptions to the rule. Yet, our results reveal consistent differences between natural and synthetic Abs. The focus of synthetic libraries on engineering CDRH3 creates CDRH3 loops that participate in Ag recognition above the average of CDRH3 in natural Abs. As a result, CDRs H1 and H2 of synthetic Abs contribute less to Ag binding. CDRH3 loops encompass the V-D-J junction, hence this region displays the largest diversity among the six CDRs of the Abs in terms of length, sequence, and structure (Chothia, C. et al., Nature 342:877-83 (1989); Kuroda, D. et al., Proteins 73:608-20 (2008); Morea, V. et al., J Mol Biol 275:269-94 (1998) (herein incorporated by reference in their entirety)). CDRH3 is also located at the center of the binding site and is the CDR loop that undergoes the most significant conformational changes upon binding (Sela-Culang, I. et al., J Immunol 189:4890-9 (2012) (herein incorporated by reference in its entirety)). Thus, it is commonly assumed that CDRH3 accounts for the ability of Abs to recognize and bind specific epitopes. Understandably, Ab engineering methods often focus on CDRH3. For example, Fellouse et al. designed phage display libraries with diversity of 10.sup.4 to 10.sup.22 in CDRH3 and diversity of 32 to 896 in other CDRs. (Fellouse, F. A. et al., J Mol Biol 373:924-40 (2007) (herein incorporated by reference in its entirety)) In the initial HuCAL libraries, (Knappik, A. et al., J Mol Biol 296:57-86 (2000) (herein incorporated by reference in its entirety)) diversity beyond the 49 chosen frameworks was introduced only to CDRH3 and CDRL3. In other studies, specific Abs were obtained from libraries in which diversity was introduced only into CDRH3. (Mahon, C. M. et al. J Mol Biol 425:1712-30 (2013); Braunagel, M. and Little, M. Nucleic Acids Res 25:4690-1 (1997); der Maur, A A et al., J Biol Chem 277:45075-85 (2002) (herein incorporated by reference in their entirety))
(191) However, the relative importance of CDRH3 compared to other CDRs has recently been revisited in numerous studies. Large-scale analyses (Kunik, V. and Ofran, Y., Protein Eng Des Sel 2013; Robin, G. et al, J Mol Biol 426:3729-43 (2014) (herein incorporated by reference in their entirety)) of Abs have assessed the role of CDRH3. It has been demonstrated that CDRH2 may be as important as CDRH3 (Robin, G. et al. J Mol Biol 426:3729-43 (2014) (herein incorporated by reference in its entirety)) in its contribution to the binding free energy of the Ab-Ag complex. In addition, in 93% of the Ab-Ag complexes, CDRH2 contained at least one residue with a high energetic contribution (ΔΔG>0.8 kcal/mol), compared with 90% of the complexes with such residues from CDRH3. In another study, CDRH3 was found to be responsible for 30.6% of the energetically important Ag-binding residues. (Kunik, V. and Ofran, Y., Protein Eng Des Sel 2013. (herein incorporated by reference in its entirety)) That is, most of the energetically important Ag-binding residues come from other CDRs. This has also been shown for specific examples such as the interaction between HyHEL-10 and lysozyme, in which CDRH2 and CDRL1 play a dominant role, while CDRH3 shows a very low binding contribution. (Burkovitz, A. et al., J Immunol 190:2327-34 (2013) (herein incorporated by reference in its entirety)) That CDRH3 is not necessary for the versatility of Abs was ultimately demonstrated by a study showing that synthetic libraries with diverse CDRL3 and fixed CDRH3 can yield specific Abs against different Ags. (Persson, H. et al., J Mol Biol 425:803-11 (2013) (herein incorporated by reference in its entirety)) In another study, the introduction of diversity into the sequence of an anti-ErbB2 Ab only at CDRH3 did not result in an affinity-enhanced variant, while beneficial mutants could be obtained by engineering one of the other contacting CDRs (CDRH1, H2, L1 or L3). (Hu, D. et al., PLoS One 10:e0129125 (2015) (herein incorporated by reference in its entirety)) This emphasizes that the importance of CDRH3 differs between Abs.
(192) The reliance of synthetic Abs on CDRH3 may take a toll on the diversity of the epitopes that the library can bind, which may be referred to as the effective diversity of the library (as opposed to its actual diversity, represented by the number of unique sequences). Existing synthetic libraries tend to yield Abs with CDRH3 dominance. The typically fixed length and sequence of the other loops do not allow for paratopes with other binding topologies. It is therefore possible that, while the number of variants in the library may be higher than the number of variants in natural repertoires, these synthetic Abs represent only a small subset of the possible Abs that would be represented in a much smaller natural set of Ab sequences.
(193) The effective diversity of a library is not the number of unique Ab sequences it has, but the number of different epitopes they can bind. This is defined by how many of the variants are expressed and fold into Abs with paratopes that are very different from each other. Our results suggest that tampering only with CDRH3 may not be a good way to obtain diverse paratopes. Based on the results presented here, one can propose approaches for improving Ab engineering. Building libraries that allow for higher diversity in all CDRs may result in Abs that have binding modes that are more similar to those of natural Abs, which might increase the effective diversity of the library and culminate in higher success rates. Of note is the degeneration of CDRH2 and CDRH1 in synthetic Abs, most remarkably in the percentage of salt-bridges coming from these CDRs and of H-bonds and cation-pi interactions from CDRH1. To correct for this and create better libraries, the amino acid composition in these CDRs should be adjusted to favor these types of interactions. This could be achieved by elevating the propensity of charged amino acids in CDRH2 and CDRH1 to produce more salt bridges, or by elevating the propensity of aromatic, positively charged or polar amino acids in CDRH1 to produce more cation-pi interactions and H-bonds. It is also possible that the frameworks that are commonly used in synthetic libraries are suitable for producing interactions that rely on CDRH3. Considering additional frameworks may, therefore, be beneficial.
(194) A novel approach for the design of synthetic libraries is based on the diversity of the natural Ig repertoire (naïve, memory and plasma B-cells), which can be characterized using next generation sequencing (NGS). (Glanville, J. et al., Proc Natl Acad Sci USA 106:20216-21 (2009); Zhai, W. et al., J Mol Biol 412:55-71 (2011))
(195) Glanville et al. (Zhai, W. et al., J Mol Biol 412:55-71 (2011) (herein incorporated by reference in their entirety)) analyzed ~10.sup.5 sequences of Ab variable fragments from 654 healthy human donors and, consistent with our finding, reported a substantial contribution to total diversity from somatically mutated residues in CDRs 1 and 2. Based on these results, a synthetic Ab library was constructed by introducing diversity at positions across the six CDRs, while the amino acid usage at each position was designed to mimic the usage in natural repertoires. (Zhai, W. et al., J Mol Biol 412:55-71 (2011) (herein incorporated by reference in its entirety)) The 3D structures of the Ab-Ag complexes that were selected from these modern libraries are still not available. We expect that the relative binding contribution of the different CDRs in these synthetic Abs will better mimic the natural Ab binding mechanism than the synthetic Abs analyzed in the current study.
(196) Although there are many available tools for the automated analysis of Ab sequences, (Kaas, Q. et al., Nucleic Acids Res 32:D208-10 (2004); Ehrenmann, F. et al., Nucleic Acids Res 38:D301-7 (2010); Kunik, V. et al., Nucleic Acids Res 40:W521-4 (2012); Abhinandan, K. R. et al., J Mol Biol 369:852-62 (2007); Ye, J. et al. Nucleic Acids Res 41:W34-40 (2013); Retter, I. et al Nucleic Acids Res 33:D671-4 (2005) (herein incorporated by reference in their entirety)) the development of tools for the structural analysis of Ab-Ag complexes is still in its infancy. Two existing tools that provide comprehensive structural analysis of Abs are ABangle, for calculating the orientation between the VH and the VL, (Dunbar, J. et al., Protein Eng Des Sel 26:611-20 (2013) (herein incorporated by reference in its entirety)) and the “AbAgDb dataset”, which contains interaction profiles of ~500 Ab-Ag complexes in the PDB. (Kulkarni-Kale, U. et al., Methods Mol Biol 1184:149-64 (2014) (herein incorporated by reference in its entirety)). In the “AbAgDb”, the data are available only for the curated PDB entries, and most of the output is at the atom or residue level and not at the CDR level, similarly to tools analyzing general protein-protein interactions. (Tina, K. G. et al., Nucleic Acids Res 35:W473-6 (2007); Laskowski, R. A. et al. Trends Biochem Sci 22:488-90 (1997) (herein incorporated by reference in their entirety)).
(197) “CDRs Analyzer” is designed to assist Ab engineering protocols by providing quantitative assessment of the biophysical properties both at the loop level (by assessing the contribution of each CDR) and at the residue level (by identifying specific positions of interest within the interface). Here, we used “CDRs Analyzer” to explore the differences between natural and synthetic interactions. This tool can be used to analyze Abs against pathogenic Ags or human-self Ags, to explore the theory that V-genes are evolutionarily pre-configured to recognize common motifs in Ags from pathogenic sources. “CDRs Analyzer” can also be applied to characterize other sets of immunological interactions. For example, it allows evaluation of the differences in binding properties of peptide-binding Abs and protein-binding Abs, or the differences between different families of Abs or even differences between Abs against different Ags. However, the most straightforward way to use “CDRs Analyzer” is for the analysis of individual Abs. It is applicable to experimentally solved Ab-Ag complexes as well as to computational models of such complexes. The output of “CDRs Analyzer” can assist different Ab engineering protocols. The contacting residues list and the specific interactions list can guide the choice of specific positions for Ab affinity enhancement, for decreasing aggregation, or for deimmunization. The CDRs binding contribution may be an important consideration for CDR grafting, Ab humanization, design of two-in-one Abs and for identifying CDR-derived peptides. (Burkovitz, A. et al. J Immunol 190:2327-34 (2013) (herein incorporated by reference in its entirety)).
(198) TABLE-US-00001 TABLE A1 Binding parameters of CDRs from natural and synthetic Abs: average values (± standard error) of the four binding parameters calculated by “CDRs Analyzer” for all CDRs in natural and synthetic Abs

CDR  Abs        Contacting residues  Specific interactions  ΔΔG           ΔRSA
H1   natural    9.39 (±0.45)         1.64 (±0.16)           2.28 (±0.25)  0.81 (±0.05)
H1   synthetic  9.17 (±1.05)         0.94 (±0.18)           2.19 (±0.41)  0.84 (±0.12)
H2   natural    11.76 (±0.54)        2.5 (±0.21)            3.7 (±0.29)   1.1 (±0.06)
H2   synthetic  11.94 (±1.08)        1.97 (±0.33)           3.34 (±0.43)  1.25 (±0.13)
H3   natural    14.25 (±0.48)        2.79 (±0.22)           5.77 (±0.36)  1.31 (±0.06)
H3   synthetic  18.97 (±1.23)        4.39 (±0.56)           8.2 (±0.68)   1.83 (±0.16)
L1   natural    6.62 (±0.43)         0.93 (±0.13)           1.78 (±0.19)  0.59 (±0.05)
L1   synthetic  6.03 (±0.78)         0.92 (±0.28)           1.4 (±0.34)   0.59 (±0.08)
L2   natural    4.58 (±0.51)         0.64 (±0.11)           1.24 (±0.17)  0.41 (±0.05)
L2   synthetic  4.75 (±0.85)         0.83 (±0.27)           1.3 (±0.33)   0.45 (±0.09)
L3   natural    7.74 (±0.43)         1.35 (±0.14)           1.76 (±0.17)  0.62 (±0.04)
L3   synthetic  7.06 (±0.71)         1.47 (±0.28)           1.93 (±0.32)  0.59 (±0.07)
(199) TABLE-US-00002 TABLE S1 Non-redundant dataset of natural Ab-Ag complexes:

No.  PDB ID  Heavy chain  Light chain  Antigen chains  Origin of Ab           Origin of Ag
1    1A14    H            L            N               Hybridoma              Pathogenic
2    1AFV    H            L            A               Hybridoma              Pathogenic
3    1AHW    B            A            C               Hybridoma              Human - self
4    1AR1    C            D            B               Hybridoma              Non pathogenic
5    1DQJ    B            A            C               Hybridoma              Non pathogenic
6    1E6J    H            L            P               Hybridoma              Pathogenic
7    1EGJ    H            L            A               Hybridoma              Human - self
8    1EO8    H            L            A               Hybridoma              Pathogenic
9    1EZV    X            Y            E               Hybridoma              Non pathogenic
10   1FBI    H            L            X               Hybridoma              Non pathogenic
11   1FE8    H            L            A               Hybridoma              Human - self
12   1FJ1    B            A            F               Hybridoma              Pathogenic
13   1FSK    C            B            A               Hybridoma              Pathogenic
14   1H0D    B            A            C               Hybridoma              Human - self
15   1IQD    B            A            C               Immunized              Human - self
16   1JHL    H            L            A               Hybridoma              Non pathogenic
17   1JPS    H            L            T               Hybridoma (Humanized)  Human - self
18   1JRH    H            L            I               Hybridoma              Human - self
19   1K4C    A            B            C               Hybridoma              Non pathogenic
20   1KB5    H            L            AB              Hybridoma              Non pathogenic
21   1KEN    H            L            AC              Hybridoma              Pathogenic
22   1MLC    B            A            E               Hybridoma              Non pathogenic
23   1NCA    H            L            N               Hybridoma              Pathogenic
24   1NDG    A            B            C               Hybridoma              Non pathogenic
25   1NMB    H            L            N               Hybridoma              Pathogenic
26   1NSN    H            L            S               Hybridoma              Pathogenic
27   1OAK    H            L            A               Hybridoma              Human - self
28   1OB1    B            A            C               Hybridoma              Pathogenic
29   1ORQ    B            A            C               Hybridoma              Non pathogenic
30   1ORS    B            A            C               Hybridoma              Non pathogenic
31   1OSP    H            L            O               Hybridoma              Pathogenic
32   1OTS    C            D            AB              Hybridoma              Pathogenic
33   1P2C    B            A            C               Hybridoma              Non pathogenic
34   1PKQ    B            A            E               Hybridoma              Non pathogenic
35   1QFU    H            L            A               Hybridoma              Pathogenic
36   1RJL    B            A            C               Hybridoma              Pathogenic
37   1RVF    H            L            123             Hybridoma              Pathogenic
38   1RZJ    H            L            G               Immunized              Pathogenic
39   1SY6    H            L            A               Hybridoma              Human - self
40   1TPX    B            C            A               Hybridoma              Pathogenic
41   1V7M    H            L            V               Hybridoma              Human - self
42   1VFB    B            A            C               Hybridoma              Non pathogenic
43   1WEJ    H            L            F               Hybridoma              Non pathogenic
44   1YJD    H            L            C               Hybridoma              Human - self
45   1YNT    B            A            F               Hybridoma              Pathogenic
46   1YQV    H            L            Y               Hybridoma              Non pathogenic
47   1YY9    D            C            A               Hybridoma              Human - self
48   1Z3G    H            L            A               Hybridoma              Pathogenic
49   1ZTX    H            L            E               Hybridoma              Pathogenic
50   2ADF    H            L            A               Hybridoma              Human - self
51   2AEP    H            L            A               Hybridoma              Pathogenic
52   2BDN    H            L            A               Hybridoma              Human - self
53   2DD8    H            L            S               Naïve                  Pathogenic
54   2DTG    A            B            E               Hybridoma              Human - self
55   2DTG    C            D            E               Hybridoma              Human - self
56   2FD6    H            L            U               Hybridoma              Human - self
57   2HMI    D            C            B               Hybridoma              Pathogenic
58   2J4W    H            L            D               Hybridoma              Pathogenic
59   2JEL    H            L            P               Hybridoma              Pathogenic
60   2NY7    H            L            G               Immunized              Pathogenic
61   2VIR    B            A            C               Hybridoma              Pathogenic
62   2VWE    E            C            AB              Hybridoma              Human - self
63   2VXQ    H            L            A               Immunized              Pathogenic
64   2VXS    K            O            DC              Naïve                  Human - self
65   2VXT    H            L            I               Hybridoma              Human - self
66   2W9E    H            L            A               Hybridoma              Human - self
67   2XQY    G            L            A               Hybridoma              Pathogenic
68   2XRA    H            L            A               Immunized              Pathogenic
69   2XWT    A            B            C               Immunized              Human - self
70   2YC1    A            B            C               Naïve                  Toxin
71   2ZJS    H            L            Y               Hybridoma              Non pathogenic
72   3AB0    B            C            A               Hybridoma              Pathogenic
73   3BGF    B            C            A               Hybridoma              Pathogenic
74   3BSZ    H            L            F               Hybridoma              Human - self
75   3CVH    H            L            AC              Hybridoma              Non pathogenic
76   3D85    B            A            C               Hybridoma              Human - self
77   3D9A    H            L            C               Hybridoma              Non pathogenic
78   3EOA    H            L            I               Hybridoma (Humanized)  Human - self
79   3FFD    A            B            P               Hybridoma              Human - self
80   3FMG    H            L            A               Hybridoma              Pathogenic
81   3G04    B            A            C               Immunized              Human - self
82   3GI8    H            L            C               Hybridoma              Non pathogenic
83   3GJF    H            L            AC              Naïve                  Human - self
84   3I50    H            L            E               Hybridoma              Pathogenic
85   3IDX    H            L            G               Immunized              Pathogenic
86   3IU3    A            B            K               Hybridoma              Human - self
87   3IYW    H            L            ABC             Immunized              Pathogenic
88   3JWD    H            L            A               Immunized              Pathogenic
89   3KJ4    H            L            A               Hybridoma              Non pathogenic
90   3KJ6    H            L            A               Hybridoma              Human - self
91   3KS0    K            J            A               Hybridoma              Non pathogenic
92   3L5W    H            L            I               Hybridoma              Human - self
93   3LIZ    H            L            A               Hybridoma              Non pathogenic
94   3LQA    H            L            GC              Immunized              Pathogenic
95   3LZF    H            L            A               Immunized              Pathogenic
96   3MXW    H            L            A               Hybridoma              Non pathogenic
97   3NCY    P            S            AB              Hybridoma              Pathogenic
98   3NIG    H            L            A               Hybridoma              Human - self
99   3O0R    H            L            BC              Hybridoma              Pathogenic
100  3P30    H            L            A               Immunized              Pathogenic
101  3RAJ    H            L            A               Hybridoma              Human - self
(200) TABLE-US-00003 TABLE S2 Non-redundant dataset of synthetic Ab-Ag complexes:

No.  PDB ID  Heavy chain  Light chain  Antigen chains  Origin of Ab                                        Origin of Ag
1    1ZA3    B            A            S               YS binary code .sup.1                               Human - self
2    2FJG    H            L            VW              Lee et al. .sup.2 2004a                             Human - self
3    2FJH    H            L            VW              Lee et al. .sup.2 2004a                             Human - self
4    2H9G    B            A            R               Lee et al. .sup.2 2004a                             Human - self
5    2HFG    H            L            R               Lee et al. .sup.2 2004a                             Human - self
6    2QQK    H            L            A               VH/VL library .sup.3                                Human - self
7    2QQN    H            L            A               VH/VL library .sup.3                                Human - self
8    2R0K    H            L            A               Lee et al. .sup.2 2004a                             Human - self
9    2XTJ    D            B            A               HuCAL GOLD .sup.4                                   Human - self
10   3BN9    D            C            B               HuCAL .sup.5                                        Human - self
11   3DVG    B            A            XY              Fellouse et al. .sup.6                              Human - self
12   3G6D    H            L            A               HuCAL GOLD .sup.4                                   Human - self
13   3G6J    F            E            AB              Lee et al. .sup.2 2004a                             Human - self
14   3GRW    H            L            A               Lee et al. .sup.2 2004a                             Human - self
15   3HI6    X            Y            B               Hoet et al. .sup.7                                  Human - self
16   3K2U    H            L            A               Lee et al. .sup.2 2004a                             Human - self
17   3KR3    H            L            D               Hoet et al. .sup.7                                  Human - self
18   3L95    H            L            Y                                                                   Human - self
19   3MA9    H            L            A               HuCAL GOLD .sup.4                                   Pathogenic
20   3N85    H            L            A               WS binary code .sup.8                               Human - self
21   3NH7    H            L            A               HuCAL GOLD .sup.4                                   Human - self
22   3NPS    B            C            A               HuCAL .sup.5                                        Human - self
23   3P0Y    H            L            A               Lee et al. .sup.2 2004a and Bostrom et al. .sup.9   Human - self
24   3P11    H            L            A               Lee et al. .sup.2 2004a and Bostrom et al. .sup.9   Human - self
25   3PGF    H            L            A               Fellouse et al. .sup.6                              Pathogenic
26   3PNW    B            A            C               Persson et al. .sup.10                              Human - self
27   3R1G    H            L            B               VH/VL library .sup.3                                Human - self
28   3SOB    H            L            B               Lee et al. .sup.2 2004a                             Human - self
29   3U30    C            B            A               Lee et al. .sup.2 2004a                             Human - self
30   4DKE    H            L            AB              Lee et al 2004b .sup.11 and VH/VL library .sup.3    Human - self
31   4DKF    H            L            AB              Lee et al 2004b .sup.11 and VH/VL library .sup.3    Human - self
32   4DN4    H            L            M               HuCAL GOLD .sup.4                                   Human - self
33   4JQI    H            L            AV              Fellouse et al. .sup.6                              Non-pathogenic
34   4OGY    H            L            A               Hoet et al. .sup.7                                  Human - self
35   4XTR    E            F            AB                                                                  Non-pathogenic
36   4ZFG    H            L            A               Lee et al. .sup.2 2004a and Bostrom et al. .sup.9   Human - self

.sup.1 Fellouse F A, Li B, Compaan D M, Peden A A, Hymowitz S G, Sidhu S S. Molecular recognition by a binary code. J Mol Biol 2005; 348: 1153-62.
.sup.2 Lee C V, Liang W C, Dennis M S, Eigenbrot C, Sidhu S S, Fuh G. High-affinity human antibodies from phage-displayed synthetic Fab libraries with a single framework scaffold. J Mol Biol 2004; 340: 1073-93.
.sup.3 Liang W C, Dennis M S, Stawicki S, Chanthery Y, Pan Q, Chen Y, et al. Function blocking antibodies to neuropilin-1 generated from a designed human synthetic antibody phage library. J Mol Biol 2007; 366: 815-29.
.sup.4 Rothe C, Urlinger S, Löhning C, Prassler J, Stark Y, Jäger U, et al. The human combinatorial antibody library HuCAL GOLD combines diversification of all six CDRs according to the natural immune system with a novel display method for efficient selection of high-affinity antibodies. J Mol Biol 2008; 376: 1182-200.
.sup.5 Knappik A, Ge L, Honegger A, Pack P, Fischer M, Wellnhofer G, et al. Fully synthetic human combinatorial antibody libraries (HuCAL) based on modular consensus frameworks and CDRs randomized with trinucleotides. J Mol Biol 2000; 296: 57-86.
.sup.6 Fellouse F A, Esaki K, Birtalan S, Raptis D, Cancasci V J, Koide A, et al. High-throughput generation of synthetic antibodies from highly functional minimalist phage-displayed libraries. J Mol Biol 2007; 373: 924-40.
.sup.7 Hoet R M, Cohen E H, Kent R B, Rookey K, Schoonbroodt S, Hogan S, et al. Generation of high-affinity human antibodies by combining donor-derived and synthetic complementarity-determining-region diversity. Nat Biotechnol 2005; 23: 344-8.
.sup.8 Birtalan S, Fisher R D, Sidhu S S. The functional capacity of the natural amino acids for molecular recognition. Mol Biosyst 2010; 6: 1186-94.
.sup.9 Bostrom J, Yu S F, Kan D, Appleton B A, Lee C V, Billeci K, et al. Variants of the antibody herceptin that interact with HER2 and VEGF at the antigen binding site. Science 2009; 323: 1610-4.
.sup.10 Persson H, Ye W, Wernimont A, Adams J J, Koide A, Koide S, et al. CDR-H3 diversity is not required for antigen recognition by synthetic antibodies. J Mol Biol 2013; 425: 803-11.
.sup.11 Lee C V, Sidhu S S, Fuh G. Bivalent antibody phage display mimics natural immunoglobulin. J Immunol Methods 2004; 284: 119-32.
Example 6
(201) Methods for Re-Epitoping Antibody
(202) The antibody in this example binds to the human P2X4. Methods to re-epitope the antibody to introduce improved binding were developed. Strategies based on sequence, structural, and biological data were implemented to generate libraries that yielded improved Abs.
(203) The first strategy for library design was based on sequence analyses of the antibody of this example in order to identify positions that play a key role in the native paratope as well as positions and specific variants that may contribute to a re-epitoped interface. Positions were selected for variation if they were in the CDRs, as defined by Paratome and/or Kabat, and if they were not conserved based on sequence alignments of homologs obtained by a BLAST search of the PDB database. A total of 50 positions spanning CDRs in both the H and L chains were selected. Each selected position was varied independently, using an NNK codon (where N denotes any of the four standard nucleotides and K denotes guanine or thymine), such that the library was made up of clones with single mutations. In addition, a library of clones with double mutations, one in the H chain and one in the L chain, was constructed and cloned into a phage display plasmid.
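By way of non-limiting illustration, the coverage provided by an NNK codon may be checked by expanding the degenerate codon against the standard genetic code, as sketched below. The variable and function names are hypothetical; the codon table is the standard genetic code written in the conventional T, C, A, G order.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguity codes used here: N = any base, K = G or T.
IUPAC = {"N": "ACGT", "K": "GT"}


def expand(degenerate_codon):
    """List every concrete codon encoded by a degenerate codon."""
    choices = (IUPAC.get(base, base) for base in degenerate_codon)
    return ["".join(codon) for codon in product(*choices)]


nnk_codons = expand("NNK")
encoded = {CODON_TABLE[c] for c in nnk_codons}
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

print(len(nnk_codons))        # number of distinct codons
print(len(encoded - {"*"}))   # number of distinct amino acids
print(stop_codons)            # stop codons that remain possible

# DNA-level size of a single-mutation library over the 50 selected positions.
single_mutant_dna_variants = 50 * len(nnk_codons)
print(single_mutant_dna_variants)
```

An NNK codon therefore allows all 20 amino acids at the varied position with 32 codons, while leaving only one of the three stop codons (TAG) possible.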
(204) Following three rounds of selection against P2X4 lipoparticles, as well as ‘null’ lipoparticles (i.e., lipoparticles that do not present the receptor), the libraries underwent deep sequencing to identify positions and variants that were variable or conserved under the different selection conditions.
(205) Standard sequencing identified a variant with increased affinity towards P2X4, which contained two mutations (one in each chain). This variant was expressed as soluble scFv and as IgG and the binding affinity was measured using standard techniques.
(206) A second strategy for library design was to select positions for variation based on a combination of sequence, structure, and biological data, which are predicted to form surface patches on the Ab. Variation at each of these patches, or clusters of residues, may yield insight into the native paratope, as well as specific variants that contribute to binding and/or are relevant for re-epitoping. As this strategy includes prediction of surface patches, a three-dimensional model of the antibody is required.
(207) Alternatively, one of the P2X4 library designs (based on the P2X4 binder) is based on SHM data (Burkovitz, A. et al. FEBS J, v. 281, p. 306-19 (2014); Kunik, V. et al., Nucleic Acids Res, v. 40: England, p. W521-4 (2012a) (hereby incorporated by reference in their entirety)). SHM data are used to choose positions to vary, and the data that describe the frequencies of the observed Ag-binding amino acids per CDR are used to choose the variation at each position. This design does not depend on a 3D model of the antibody, and can be useful for designing a general library that can be used for different targets. Any germline sequence or an antibody with known favorable experimental properties can be used.
(208) Several models of the antibody of this example were generated. Modeling was performed with the Antibody Modeling Protocols in Discovery Studio and in MOE. One of the models underwent further refinement by energy minimization.
(209) Positions for variation were selected if they met the following criteria: 1) a high probability of mutation from germline based on the data in Burkovitz et al. (frequency greater than 0.2); 2) defined as a CDR position by Paratome; and 3) >10% solvent accessible in the antibody model. As H3 is not fully represented in the data from Burkovitz et al., all positions in H3 were included. Residues that were predicted to be structurally important (for example, forming a salt-bridge within the antibody in the model, or contributing to hydrophobic core packing) were excluded, even if they had >10% solvent accessibility.
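By way of non-limiting illustration, the selection criteria above may be expressed as a simple filter, as sketched below. The data structure, field names, and example positions are hypothetical. The sketch also assumes that the inclusion of all H3 positions waives only the SHM-frequency criterion, so that the accessibility threshold and the structural exclusion still apply to H3; the description does not state this explicitly.

```python
from dataclasses import dataclass
from typing import Optional

SHM_FREQUENCY_CUTOFF = 0.2    # criterion 1: frequency greater than 0.2
ACCESSIBILITY_CUTOFF = 10.0   # criterion 3: >10% solvent accessible


@dataclass(frozen=True)
class Position:
    label: str                      # hypothetical chain:residue label
    cdr: Optional[str]              # CDR name per Paratome, or None
    shm_frequency: float            # probability of mutation from germline
    relative_accessibility: float   # percent, from the antibody model
    structurally_important: bool = False


def is_selected(pos):
    """Apply the three criteria plus the H3 and structural rules."""
    if pos.cdr is None:                      # criterion 2: must be in a CDR
        return False
    if pos.structurally_important:           # e.g. internal salt-bridge, core
        return False
    if pos.relative_accessibility <= ACCESSIBILITY_CUTOFF:
        return False
    if pos.cdr == "H3":                      # assumption: SHM filter waived
        return True
    return pos.shm_frequency > SHM_FREQUENCY_CUTOFF


# Hypothetical candidate positions (illustration only).
candidates = [
    Position("H:31", "H1", 0.35, 42.0),
    Position("H:52", "H2", 0.10, 55.0),          # low SHM frequency
    Position("H:100", "H3", 0.00, 30.0),         # H3: SHM data not required
    Position("L:32", "L1", 0.40, 5.0),           # buried
    Position("L:91", "L3", 0.50, 25.0, True),    # structurally important
    Position("H:70", None, 0.60, 60.0),          # framework, not a CDR
]
selected = [p.label for p in candidates if is_selected(p)]
print(selected)
```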
(210) Positions that met the above requirements were visually inspected in the models. Groups of 5 of these positions that were in spatial proximity were selected for variation with an NNS codon at each position (S denotes guanine or cytosine). Five such libraries were constructed, each spanning a distinct cluster of residues, although with some overlap in positions between some of the libraries. The libraries were cloned into a phage display system and underwent selection against P2X4 by employing an iterative process of depletion on HEK cells and panning on P2X4-overexpressing HEK cells.
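By way of non-limiting illustration, the theoretical size of a library in which 5 positions are each varied with an NNS codon may be estimated as sketched below. The sketch relies on two properties of the NNS codon under the standard genetic code: it spans 32 codons covering all 20 amino acids, and TAG is the only stop codon ending in G or C. The function name is hypothetical, and the figures are theoretical upper bounds rather than measured library sizes.

```python
NNS_CODONS = 4 * 4 * 2   # N x N x S (S = G or C)
NNS_STOP_CODONS = 1      # TAG is the only stop codon ending in G or C
AMINO_ACIDS = 20


def library_sizes(n_positions):
    """Return (DNA variants, protein variants, stop-free fraction)."""
    dna_variants = NNS_CODONS ** n_positions
    protein_variants = AMINO_ACIDS ** n_positions
    stop_free_fraction = (
        (NNS_CODONS - NNS_STOP_CODONS) / NNS_CODONS
    ) ** n_positions
    return dna_variants, protein_variants, stop_free_fraction


dna, protein, stop_free = library_sizes(5)
print(dna)        # distinct DNA sequences over the 5 varied positions
print(protein)    # distinct full-length amino acid combinations
print(round(stop_free, 3))  # fraction of clones without a premature stop
```

Restricting each library to a cluster of 5 positions keeps the theoretical DNA diversity (32 to the power of 5, about 3.4 x 10^7) within a range that phage display libraries can typically sample.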
(211) Enriched clones were sequenced and individually tested for binding. Purified scFv-phage fusions of enriched clones were mixed with a negative control scFv-phage particle at a ratio of 1:1000 and underwent one round of panning on P2X4-expressing HEK cells or on negative control HEK cells. Phages were eluted from the cells, and the ratio of the tested clone scFv-phage to the negative control scFv-phage was determined. The enrichment of the tested scFv-phage in the course of panning is proportional to binding. In this way, a re-epitoped clone displaying improved binding was identified. The next steps will be to purify a soluble scFv and then IgG, determine affinity, and test for biological activity.
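By way of non-limiting illustration, the enrichment of a tested scFv-phage relative to the negative control scFv-phage may be computed from the input and output ratios as sketched below. The function name and the eluted phage counts are hypothetical and are not experimental results; how the two phage species are counted after elution is not specified here.

```python
def enrichment(input_test, input_control, output_test, output_control):
    """Fold change of the test:control ratio over one round of panning."""
    if input_test <= 0 or input_control <= 0 or output_control <= 0:
        raise ValueError("counts used as denominators must be positive")
    input_ratio = input_test / input_control
    output_ratio = output_test / output_control
    return output_ratio / input_ratio


# Input mix of 1:1000 (tested clone : negative control), as in the text.
# Eluted counts below are hypothetical, for illustration only.
on_target_cells = enrichment(1, 1000, output_test=30, output_control=70)
on_control_cells = enrichment(1, 1000, output_test=1, output_control=999)

# Enrichment on P2X4-expressing cells relative to control cells.
specific_enrichment = on_target_cells / on_control_cells
print(on_target_cells, on_control_cells, specific_enrichment)
```

An enrichment close to 1 indicates that the tested clone is recovered no better than the negative control, whereas a large value on P2X4-expressing cells together with a value near 1 on control cells is consistent with target-specific binding.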
(212) It is to be appreciated that the Detailed Description section, and not the Summary and Abstract sections, is intended to be used to interpret the claims. The Summary and Abstract sections may set forth one or more but not all exemplary embodiments of the present invention as contemplated by the inventor(s), and thus, are not intended to limit the present invention and the appended claims in any way.
(213) The present invention has been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed.
(214) The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present invention. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.
(215) The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
(216) The claims in the instant application are different than those of the parent application or other related applications. The Applicant therefore rescinds any disclaimer of claim scope made in the parent application or any predecessor application in relation to the instant application. The Examiner is therefore advised that any such previous disclaimer and the cited references that it was made to avoid, may need to be revisited. Further, the Examiner is also reminded that any disclaimer made in the instant application should not be read into or against the parent application.