SYSTEM AND METHOD FOR GENERATING ANTIBODY LIBRARIES
20180196926 · 2018-07-12
Cpc classification
C40B50/06
CHEMISTRY; METALLURGY
C07K16/00
CHEMISTRY; METALLURGY
G16B15/30
PHYSICS
G16B30/00
PHYSICS
G01N2500/04
PHYSICS
G16B15/00
PHYSICS
B01J19/0046
PERFORMING OPERATIONS; TRANSPORTING
G16B35/00
PHYSICS
G16C20/10
PHYSICS
C07K2317/10
CHEMISTRY; METALLURGY
G16C10/00
PHYSICS
International classification
B01J19/00
PERFORMING OPERATIONS; TRANSPORTING
Abstract
The invention relates to a system and method for generating an antibody library. Specifically, the invention relates to a computer-implemented system and method for generating a library of antibodies based on a predetermined epitope.
Claims
1. A computer implemented method for generating a library of antibodies, the method comprising: generating one or more seed structures based on one or more predetermined amino acid sequences of a complementarity determining region (CDR), one or more predetermined variable heavy (VH) and variable light (VL) structural framework (VH/VL) pairs, or a combination thereof; providing a predetermined epitope; docking said one or more seed structures on said epitope; evaluating one or more motifs of said one or more seed structures for one or more predetermined developability properties; and identifying one or more target structures in order to generate a library, thereby generating a library of antibodies.
2. The method of claim 1, wherein the step of generating one or more seed structures comprising: obtaining a first amino acid sequence of a complementarity determining region (CDR) associated with a heavy chain and a second amino acid sequence of a CDR associated with a light chain from a database of CDR sequences; obtaining one or more variable heavy (VH) and variable light (VL) structural framework (VH/VL) pairs, wherein each of said pair having one or more predetermined developability properties that facilitate for screening antibodies; and analyzing said amino acid sequences and said VH/VL pairs with the use of a macro-molecular algorithmic unit to generate one or more seed structures.
3. The method of claim 1, further comprising: evaluating the docked seed structures for a shape complementarity and an epitope overlap; selecting one or more seed structures having a value exceeding a predetermined threshold level, wherein said value is associated with a shape complementarity score, an epitope overlap score, or a combination thereof.
4. The method of claim 1, wherein the step of evaluating one or more motifs comprising evaluating one or more motifs of the selected structures to determine whether said one or more motifs exhibit a negative effect for one or more predetermined developability properties.
5. The method of claim 1, wherein the step of identifying one or more target structures is based on the determination of presence or absence of said negative effect of said one or more motifs.
6. The method of claim 2, wherein said first amino acid sequence is H3 sequence of CDR3.
7. The method of claim 2, wherein said first amino acid sequence is L3 sequence of CDR3.
8. The method of claim 2, wherein said database is a CDR3 sequence database.
9. The method of claim 2, wherein said one or more predetermined developability properties facilitate for selecting one or more VH/VL pairs.
10. The method of claim 2, wherein at least one of said one or more predetermined developability properties is an immunogenicity.
11. The method of claim 2, wherein at least one of said one or more predetermined developability properties is an expression rate (mg/L), a relative display rate, a thermal stability (T.sub.m), an aggregation propensity, a serum half-life, an immunogenicity, or a viscosity.
12. The method of claim 2, wherein said macro-molecular algorithmic unit evaluates the amino acid sequence of H3 loop, L3 loop, or a combination thereof.
13. The method of claim 2, wherein said macro-molecular algorithmic unit modifies or optimizes the amino acid sequence of H3 loop, L3 loop, or a combination thereof, based on a Point Specific Scoring Matrix (PSSM) and said one or more VH/VL pairs.
14. The method of claim 2, wherein said one or more seed structures are generated based on an energy function of H3 loop, L3 loop, said one or more VH/VL pairs or a combination thereof.
15. The method of claim 2, wherein said one or more seed structures are generated based on humanization of said structures.
16. The method of claim 1, wherein said predetermined epitope is a subset of a protein.
17. The method of claim 1, wherein said predetermined epitope has one or more residues that interact with its interacting partner at a distance <4 Å.
18. The method of claim 3, further comprising evaluating the selected seed structures for a simulated annealing process.
19. The method of claim 18, wherein said annealing process is performed by a Monte Carlo simulation.
20. The method of claim 18, wherein said annealing process is performed based on rigid body minimization, antibody H3-L3 sequence optimization, optimizing the packing of interface and core, optimizing the backbone of antibody, optimizing the light and heavy chain orientation, optimizing the antibody as monomer, or a combination thereof.
21. The method of claim 4, wherein the step of evaluation optionally comprising analyzing one or more residues in the H3 or L3 loops to determine a mutation based on a Point Specific Scoring Matrix (PSSM) or a probability threshold and evaluate an energy score.
22. The method of claim 4, wherein the step of evaluation comprising removing immunogenic motifs.
23. The method of claim 4, wherein the step of evaluation comprising removing one or more motifs with negative effects on one or more predetermined developability properties.
24. A system for generating a library of antibodies, the system comprising: a seed structure generation unit that generates one or more seed structures based on one or more predetermined amino acid sequences of a complementarity determining region (CDR), one or more predetermined variable heavy (VH) and variable light (VL) structural framework (VH/VL) pairs, or a combination thereof; an epitope unit that provides a predetermined epitope; a docking unit that facilitates docking said one or more seed structures on said epitope; an evaluation unit that evaluates one or more motifs of said one or more seed structures for one or more predetermined developability properties; and a library generation unit that identifies one or more target structures in order to generate a library of antibodies.
25. A computer readable storage media comprising instructions to perform a method for generating a library of antibodies, the method comprising: generating one or more seed structures based on one or more predetermined amino acid sequences of a complementarity determining region (CDR), one or more predetermined variable heavy (VH) and variable light (VL) structural framework (VH/VL) pairs, or a combination thereof; providing a predetermined epitope; docking said one or more seed structures on said epitope; evaluating one or more motifs of said one or more seed structures for one or more predetermined developability properties; and identifying one or more target structures in order to generate a library, thereby generating a library of antibodies.
26. A computer implemented method for generating a library of antibodies, the method comprising: obtaining a first amino acid sequence of a complementarity determining region (CDR) associated with a heavy chain and a second amino acid sequence of a CDR associated with a light chain from a database of CDR sequences; obtaining one or more variable heavy (VH) and variable light (VL) structural framework (VH/VL) pairs, wherein each of said pair having one or more predetermined developability properties that facilitate for screening antibodies; analyzing said amino acid sequences and said VH/VL pairs with the use of a macro-molecular algorithmic unit to generate one or more seed structures; providing a predetermined epitope; docking said one or more seed structures on said epitope; evaluating the docked seed structures for a shape complementarity and an epitope overlap; selecting one or more seed structures having a value exceeding a predetermined threshold level, wherein said value is associated with a shape complementarity score, an epitope overlap score, or a combination thereof; evaluating one or more motifs of the selected structures to determine whether said one or more motifs exhibit a negative effect for one or more predetermined developability properties; and identifying one or more target structures based on the determination of said negative effect of said one or more motifs in order to generate a library, thereby generating a library of antibodies.
27. A system for generating a library of antibodies, the system comprising: a complementarity determining region (CDR) unit that facilitates obtaining a first amino acid sequence of a CDR associated with a heavy chain and a second amino acid sequence of a CDR associated with a light chain from a database of CDR sequences; a framework unit that facilitates obtaining one or more variable heavy (VH) and variable light (VL) structural framework (VH/VL) pairs, wherein each of said pair having one or more predetermined developability properties that facilitate for screening antibodies; an analysis unit that facilitates analyzing said amino acid sequences and said VH/VL pairs with the use of a macro-molecular algorithmic unit to generate one or more seed structures; an epitope unit that provides a predetermined epitope; a docking unit that facilitates docking said one or more seed structures on said epitope; an evaluation unit that facilitates evaluating the docked seed structures for a shape complementarity and an epitope overlap; a selection unit that facilitates selecting one or more seed structures having a value exceeding a predetermined threshold level, wherein said value is associated with a shape complementarity score, an epitope overlap score, or a combination thereof; a motif evaluation unit that facilitates evaluating one or more motifs of the selected structures to determine whether said one or more motifs exhibit a negative effect for one or more predetermined developability properties; and a library generation unit that facilitates identifying one or more target structures based on the determination of said negative effect of said one or more motifs in order to generate a library, thereby generating a library of antibodies.
28. A computer readable storage media comprising instructions to perform a method for generating a library of antibodies, the method comprising: obtaining a first amino acid sequence of a complementarity determining region (CDR) associated with a heavy chain and a second amino acid sequence of a CDR associated with a light chain from a database of CDR sequences; obtaining one or more variable heavy (VH) and variable light (VL) structural framework (VH/VL) pairs, wherein each of said pair having one or more predetermined developability properties that facilitate for screening antibodies; analyzing said amino acid sequences and said VH/VL pairs with the use of a macro-molecular algorithmic unit to generate one or more seed structures; providing a predetermined epitope; docking said one or more seed structures on said epitope; evaluating the docked seed structures for a shape complementarity and an epitope overlap; selecting one or more seed structures having a value exceeding a predetermined threshold level, wherein said value is associated with a shape complementarity score, an epitope overlap score, or a combination thereof; evaluating one or more motifs of the selected structures to determine whether said one or more motifs exhibit a negative effect for one or more predetermined developability properties; and identifying one or more target structures based on the determination of said negative effect of said one or more motifs in order to generate a library, thereby generating a library of antibodies.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The invention will be better understood from a reading of the following detailed description taken in conjunction with the drawings in which like reference designators are used to designate like elements:
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
[0024]
[0025]
DETAILED DESCRIPTION OF THE INVENTION
[0026] The invention provides a system and method for generating an antibody library. Specifically, the invention relates to a computer-implemented system and method for generating a library of antibodies based on a predetermined epitope.
[0027]
[0028] As shown in
[0029] In one aspect, an antibody library can be generated in an online environment. As illustrated in
[0030] In one embodiment, server 11 may include a plurality of programmed platforms or units, including, but not limited to, a seed generation platform 12, a docking platform 20, a design platform 28, and an epitope unit 34. Seed generation platform 12 may include one or more programmable units, including, but not limited to, a complementarity determining region (CDR) unit 14, a framework unit 16, and an analysis unit 18. Docking platform 20 may include a plurality of programmed platforms or units, including, but not limited to, a docking unit 22, an evaluation unit 24, and a selection unit 26. Design platform 28 may include a plurality of programmed platforms or units, including, but not limited to, a motif evaluation unit 30 and a library generation unit 32.
[0031] The term platform or unit, as used herein, may refer to a collection of programmed computer software codes for performing one or more tasks.
[0032] CDR unit 14 may enable a user to obtain a first amino acid sequence of a CDR associated with a heavy chain and a second amino acid sequence of a CDR associated with a light chain from a database 35 of CDR sequences. In one embodiment, the first amino acid sequence is the H3 sequence of CDR3. In another embodiment, the first amino acid sequence is the L3 sequence of CDR3. In one example, database 35 is a CDR3 sequence database.
[0033] Framework unit 16 may enable a user to obtain one or more variable heavy (VH) and variable light (VL) structural framework (VH/VL) pairs. Each pair may have one or more predetermined developability properties that facilitate screening antibodies. The predetermined developability properties may also facilitate selecting one or more desirable VH/VL pairs. Examples of a predetermined developability property include, but are not limited to, an expression rate (mg/L), a relative display rate, a thermal stability (T.sub.m), an aggregation propensity, a serum half-life, an immunogenicity, and a viscosity. In a particular embodiment, the predetermined developability property is immunogenicity.
[0034] Analysis unit 18 may analyze the amino acid sequences and the VH/VL pairs with the use of a macro-molecular algorithmic unit to generate one or more seed structures.
[0035] The macro-molecular algorithmic unit may evaluate the amino acid sequence of the H3 loop, the L3 loop, or a combination thereof. The macro-molecular algorithmic unit can also be used to modify or optimize the amino acid sequence of the H3 loop, the L3 loop, or a combination thereof. In one embodiment, the amino acid sequence of the H3 loop, the L3 loop, or a combination thereof can be modified or optimized based on a Point Specific Scoring Matrix (PSSM). In another embodiment, the amino acid sequence of the H3 loop, the L3 loop, or a combination thereof can be modified or optimized based on one or more VH/VL pairs.
[0036] In one aspect, one or more seed structures are generated based on an energy function of H3 loop, L3 loop, VH/VL pair or a combination thereof. In another aspect, one or more seed structures are generated based on humanization of the structures.
[0037] Epitope unit 34 may provide a predetermined epitope. In one example, the epitope is determined based on a subset of a protein. In another example, the epitope has one or more residues that interact with its interacting partner at a predetermined distance. In one embodiment, the distance is <4 Å. Other suitable distances are also encompassed within the scope of the invention.
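The distance-based epitope definition above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the atom record layout, the residue identifiers, and the any-atom distance criterion are assumptions made for the example.

```python
import math

def epitope_residues(target_atoms, partner_atoms, cutoff=4.0):
    """Return sorted target residue ids that have at least one atom closer
    than `cutoff` angstroms to any atom of the interacting partner.

    Each atom is a hypothetical record: (residue_id, (x, y, z)).
    """
    epitope = set()
    for res_id, xyz in target_atoms:
        if res_id in epitope:
            continue  # residue already classified as part of the epitope
        if any(math.dist(xyz, p_xyz) < cutoff for _, p_xyz in partner_atoms):
            epitope.add(res_id)
    return sorted(epitope)

# Toy coordinates (assumed values, one atom per residue for brevity).
target = [
    ("A:10", (0.0, 0.0, 0.0)),
    ("A:11", (3.0, 0.0, 0.0)),
    ("A:50", (20.0, 0.0, 0.0)),
]
partner = [("B:5", (5.0, 0.0, 0.0))]

print(epitope_residues(target, partner))  # ['A:11']
```

The cutoff is a parameter because the text notes that distances other than 4 Å are also contemplated.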
[0038] Docking unit 22 may dock one or more seed structures on the epitope. Evaluation unit 24 may evaluate the docked seed structures for shape complementarity and epitope overlap.
[0039] Selection unit 26 may select one or more seed structures having a value exceeding a predetermined threshold level. In one embodiment, the predetermined threshold level is based on a shape complementarity score. In another embodiment, the predetermined threshold level is based on an epitope overlap score. In some embodiments, the predetermined threshold level is based on a combination of a shape complementarity score and an epitope overlap score.
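As a rough illustration of threshold-based selection, the sketch below keeps only docked seeds whose scores exceed whichever thresholds are supplied. The `DockedSeed` record, the score values, and the threshold values are hypothetical; the scores themselves are assumed to have been computed elsewhere by docking software.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DockedSeed:
    # Hypothetical record for one docked seed structure.
    name: str
    shape_complementarity: float
    epitope_overlap: float

def select_seeds(seeds, sc_min=None, overlap_min=None):
    """Keep seeds whose scores exceed every threshold that is supplied.

    Passing only `sc_min` or only `overlap_min` selects on a single score;
    passing both selects on the combination.
    """
    kept = []
    for seed in seeds:
        if sc_min is not None and seed.shape_complementarity <= sc_min:
            continue
        if overlap_min is not None and seed.epitope_overlap <= overlap_min:
            continue
        kept.append(seed)
    return kept

# Assumed example scores.
seeds = [
    DockedSeed("seed_a", 0.70, 0.80),
    DockedSeed("seed_b", 0.55, 0.90),
    DockedSeed("seed_c", 0.72, 0.30),
]

print([s.name for s in select_seeds(seeds, sc_min=0.6, overlap_min=0.5)])
```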
[0040] In some embodiments, one or more selected seed structures can be optimized using a simulated annealing process which is an adaptation of the Monte Carlo method to generate sample states of a thermodynamic system. In another embodiment, the simulated annealing process is composed of rigid body minimization, antibody H3-L3 sequence optimization, optimizing the packing of interface and core, optimizing the backbone of antibody, optimizing the light and heavy chain orientation, optimizing the antibody as monomer, or a combination thereof.
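For readers unfamiliar with the technique, the following is a generic Monte Carlo simulated annealing loop using the Metropolis acceptance rule. It is a sketch only: the one-dimensional toy energy, the move generator, and the geometric cooling schedule are assumptions chosen for illustration and stand in for the structural moves (rigid body minimization, sequence optimization, and so on) named above.

```python
import math
import random

def simulated_annealing(state, energy, propose,
                        t_start=2.0, t_end=0.01, steps=2000, seed=0):
    """Minimize `energy` by Metropolis Monte Carlo with a cooling temperature."""
    rng = random.Random(seed)
    current, e_current = state, energy(state)
    best, e_best = current, e_current
    for i in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        t = t_start * (t_end / t_start) ** (i / max(steps - 1, 1))
        candidate = propose(current, rng)
        e_candidate = energy(candidate)
        delta = e_candidate - e_current
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best

# Toy problem standing in for a structural energy function.
def toy_energy(x):
    return (x - 3.0) ** 2

def toy_move(x, rng):
    return x + rng.uniform(-0.5, 0.5)

best_x, best_e = simulated_annealing(10.0, toy_energy, toy_move)
print(best_x, best_e)
```

Tracking the best state separately from the current state means the returned energy is never worse than the starting energy, even though individual uphill moves are accepted along the way.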
[0041] Motif evaluation unit 30 may evaluate one or more motifs of the selected structures to determine whether one or more motifs exhibit a negative effect on one or more predetermined developability properties. In some embodiments, the one or more motifs with negative effects are removed. In a particular embodiment, an immunogenic motif is removed.
[0042] In one embodiment, CDR regions are mutated according to a Point Specific Scoring Matrix (PSSM) and the evaluation may be performed by evaluating an energy score that is derived from the algorithmic unit.
[0043] Library generation unit 32 may identify one or more target structures based on the determination of any negative effect of one or more motifs in order to generate a library.
[0044]
[0045]
[0046] As shown in item 76, a packing and a side chain minimization can be performed. As shown in item 78, an energy score can be derived. As shown in item 79, immunogenic or sequence motif affecting developability can be penalized to determine the energy function. As shown in item 80, an output score can be sorted based on energy estimates. As shown in item 84, one or more top ranking structures or models can be selected for each VH/VL pair to serve as seeds for docking stage.
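The penalize, sort, and select steps described above might look like the following sketch. The motif list, the penalty weight, and the convention that a lower score is better are assumptions for illustration; the N-X-S/T sequon is used only as a familiar example of a sequence motif and is not named in the source.

```python
import re
from collections import defaultdict

# Assumed motif penalties: (compiled pattern, penalty added per match).
# N-X-S/T with X != P is the N-linked glycosylation sequon, used here
# purely as an illustrative sequence motif.
MOTIF_PENALTIES = [(re.compile(r"N[^P][ST]"), 5.0)]

def penalized_score(energy, sequence):
    """Add a penalty to the energy for each non-overlapping motif match."""
    penalty = sum(weight * len(pattern.findall(sequence))
                  for pattern, weight in MOTIF_PENALTIES)
    return energy + penalty

def top_seeds(models, n):
    """Group models by VH/VL pair and keep the n lowest-scoring ones.

    `models` is an iterable of (framework_id, loop_sequence, energy).
    """
    by_framework = defaultdict(list)
    for framework_id, sequence, energy in models:
        by_framework[framework_id].append(
            (penalized_score(energy, sequence), sequence))
    return {fw: sorted(scored)[:n] for fw, scored in by_framework.items()}

# Assumed example models.
models = [
    ("VH1/VK1", "ARDYGNYSFDY", -120.0),  # contains NYS, so it is penalized
    ("VH1/VK1", "ARDYGGYAFDY", -118.0),
    ("VH3/VK1", "ARGGSWFAY", -110.0),
]

print(top_seeds(models, n=1))
```

In this toy input the first model has the lowest raw energy but drops below the second once the motif penalty is applied.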
[0047]
[0048]
[0049] The following examples are presented in order to more fully illustrate the preferred embodiments of the invention. They should in no way be construed, however, as limiting the broad scope of the invention.
EXAMPLES
Example 1
[0050] Our invention utilizes computational processing power to compute optimal antibody molecules that bind a predefined epitope of a selected target polypeptide molecule. Given a computer system and macro-molecular modeling software that is able to approximate the free energy of a protein molecule (the terms "free energy score" and "score" are used interchangeably), the algorithm is detailed below and is divided into 3 sections:
[0051] 1. Seed generation
[0052] 2. Docking
[0053] 3. Design
[0054] Each of the first two sections generates the input for the next section. Unless otherwise stated, all procedures described here (such as grafting, mutating) are purely computational.
Stage 1: Seed Generation
[0055] 1. Collect H3+L3 sequences from a data set (either human or other organism):
[0056] a. B cell repertoire
[0057] b. existing PDB structures
[0058] 2. Collect VH/VL pairs of antibody frameworks that have good developability properties (F) (see Table 1).
[0059] 3. Use a macro-molecular modeling software to either:
[0060] a. model (do not change the amino acid sequence of the H3+L3 loops), or
[0061] b. design (optimize the amino acid sequence of the loops according to the PSSM and the VH/VL structure)
[0062] the H3-L3 combinations on top of all VH/VL pairs of antibody frameworks.
[0063] 4. Select the top N best energy scoring structures (VH-H3-VL-L3) for each framework (N×F) to serve as seeds.
[0064] 5. If started from a non-human framework, humanize at the end.
[0065] Stage 2: Docking
[0066] 6. Define the epitope (E) (E: set of protein residues):
[0067] a. Rational selection: manually define a subset of protein residues to serve as the epitope.
[0068] b. According to interacting partner: define the epitope as the set of all residues that interact (distance to partner <4 Å) with that target's interacting partner.
[0069] 7. Dock all seeds on the target using a protein docking software.
[0070] 8. Collect the top P best predicted complexes for each seed, based on shape complementarity score.
[0071] 9. For each complex P, calculate the epitope overlap.
Example
[0072] a. Calculate E.sub.p: the set of residues that interact (distance to partner <4 Å) with the target's interacting partner
[0073] b. Calculate:
Stage 3: Design
[0077] 1. Use a protein modeling software and a predefined energy function to iterate the following as a Monte Carlo with Simulated Annealing process:
[0078] a. Rigid body minimization
[0079] b. Antibody H3-L3 sequence optimization
[0080] c. Optimize packing of interface and core
[0081] d. Optimize backbone of antibody
[0082] e. Optimize light and heavy chain orientation
[0083] f. Optimize antibody as monomer
[0084] 2. Extract a chosen number of best scoring structures.
[0085] 3. Optionally, enrich the set of selected antibodies by running FilterScan:
[0086] a. Go over each position in the H3 and L3 loops and try all possible mutations, or mutations according to the PSSM and a probability threshold (mutations that are more common according to the PSSM will have a higher probability of being sampled).
[0087] b. Evaluate the energy score and accept only if improved.
[0088] 4. For each chosen structure:
[0089] a. Remove motifs that may have a negative effect on developability.
[0090] b. Remove immunogenic motifs.
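The optional FilterScan step (step 3 above) can be sketched as a PSSM-weighted point mutation scan that keeps a mutant only when its energy improves. Everything concrete here is an assumption for illustration: the PSSM is represented as per-position residue probabilities, the threshold value is arbitrary, and the toy energy function stands in for the score produced by the modeling software.

```python
import random

def filter_scan(sequence, loop_positions, pssm, energy,
                threshold=0.05, seed=0):
    """Return single-point mutants of `sequence` whose energy improves.

    `pssm[pos]` maps residue -> probability (assumed representation).
    Residues below `threshold` are never tried; the rest are sampled in
    proportion to their PSSM probability.
    """
    rng = random.Random(seed)
    base_energy = energy(sequence)
    accepted = []
    for pos in loop_positions:
        candidates = {aa: p for aa, p in pssm[pos].items()
                      if p >= threshold and aa != sequence[pos]}
        if not candidates:
            continue
        # More common residues have a higher probability of being sampled.
        aa = rng.choices(list(candidates),
                         weights=list(candidates.values()), k=1)[0]
        mutant = sequence[:pos] + aa + sequence[pos + 1:]
        mutant_energy = energy(mutant)
        if mutant_energy < base_energy:  # accept only if improved
            accepted.append((mutant, mutant_energy))
    return accepted

# Toy inputs: the energy simply counts G and W residues (assumed stand-in).
def toy_energy(seq):
    return seq.count("G") + seq.count("W")

toy_pssm = {
    2: {"G": 0.50, "A": 0.48, "P": 0.02},
    4: {"Y": 0.60, "W": 0.40},
}

print(filter_scan("ARGDY", [2, 4], toy_pssm, toy_energy))
```

Each accepted variant carries exactly one mutation relative to its parent, which matches the later remark that FilterScan generates one mutation per model.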
TABLE 1. Developability properties used for selecting VH/VL frameworks
Developability properties used for screening
Expression rate (mg/L)
Relative display rates (Yeast, Phage, Bacteria, Ribosome)
Thermal stability (T.sub.m)
Aggregation propensity
Serum half life
Immunogenicity
Viscosity
Implementation
[0091] On an Amazon cloud, installed with a protein modeling software:
[0092] 1. Start with 50,000 antibody models and dock each of them on the target.
[0093] 2. Calculate the overlap with the interaction site of the ligand (epitope) and take the best 10% of the models.
[0094] 3. Run a design algorithm on each of the 10% and generate 5 designs for each. (On our cluster, it took 2 hours for a single CPU to generate 1 design; overall, 50,000 CPU hours.)
[0095] 4. Amplify the variability of the designs by running the FilterScan algorithm.
[0096] 5. Pick the best scoring 50,000 for synthesis.
[0097] Alternatively, one can start with more antibody models in the first step and omit the FilterScan step. Starting from a larger number of antibody models should yield a library with a larger diversity, as the FilterScan algorithm generates just one mutation per model. Starting from a larger number of antibody models, however, requires more CPU hours and is therefore more costly.
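The CPU-hour figure quoted in step 3 follows from the other numbers in the list, as the short calculation below shows. The function and parameter names are hypothetical; the default values are the ones stated in the text.

```python
def design_budget(n_models=50_000, keep_fraction=0.10,
                  designs_per_model=5, cpu_hours_per_design=2.0):
    """Return (models kept, designs generated, total CPU hours)."""
    kept = round(n_models * keep_fraction)
    designs = kept * designs_per_model
    return kept, designs, designs * cpu_hours_per_design

print(design_budget())  # (5000, 25000, 50000.0)
```

Doubling the number of starting models doubles the design cost under these assumptions, which is the trade-off against diversity noted in the text.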
[0098] Having described preferred embodiments of the invention with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments, and that various changes and modifications may be effected therein by those skilled in the art without departing from the scope or spirit of the invention as defined in the appended claims.