COMPUTATIONAL PIPELINE FOR ANTIBODY MODELING AND DESIGN
20180260518 · 2018-09-13
Inventors
Cpc classification
G16B15/00
PHYSICS
C07K16/00
CHEMISTRY; METALLURGY
G16B15/30
PHYSICS
International classification
Abstract
This disclosure presents methods for antibody structure prediction and design. The growing number of antibody structures and sequences is used with powerful protein modeling methods to design and predict antibody structural models with up to sub-angstrom accuracy. The invention also relates to systems and methods for generating an antibody library. Specifically, the invention relates to computer-implemented systems and methods for generating an antibody library for a predetermined epitope. The invention further relates to determining structural models of the interface between an antibody and its antigen. The invention also relates to determining structural models of an unbound complementarity determining region of an antibody.
Claims
1. A computer implemented method or a computer readable storage media comprising instructions to perform a method for generating a library of antibody models to an epitope, the method comprising: providing a sequence database of complementarity determining regions (CDRs) and a database of backbone dihedral angles for CDR segments compatible with a preselected structural template and based on known antibody 3-D structures; docking said preselected structural template on said epitope; evaluating one or more sequences from said databases using a simulated annealing process; and identifying one or more segment sequences in order to generate a library, thereby generating a library of antibody models to the epitope.
2. The method of claim 1, further comprising the step of generating the databases of CDRs and of backbone dihedral angles for CDR segments, comprising the steps of: selecting an antibody structure to serve as a template; obtaining a set of non-redundant high resolution antibody models; extracting the ScFv portion from each antibody model; cutting each ScFv into four segments, wherein the segments are: VH residues 1-99, an H3 loop, VL residues 1-87, and an L3 loop; generating a Point Specific Scoring Matrix (PSSM) for each segment; and generating database entries for each segment using a macromolecular algorithmic unit.
3. The method of claim 1, wherein said annealing process is performed by a Monte Carlo simulation with metropolis acceptance criteria.
4. The method of claim 3, wherein said annealing process comprises optimizing the packing of side chains, VH-VL rigid body minimization, antibody-ligand rigid body minimization, antibody VH-VL sequence optimization, optimizing the backbone of antibody, optimizing rotamers, optimizing the light and heavy chain orientation, optimizing the antibody as monomer, or a combination thereof.
5. The method of claim 4, wherein multiple iterations of said Monte Carlo simulation are performed, wherein a structural model is accepted if it passes predetermined metropolis acceptance criteria after a predetermined number of iterations.
6. The method of claim 2, wherein said macromolecular algorithmic unit cyclically optimizes or modifies the amino acid sequence based on a predetermined Point Specific Scoring Matrix (PSSM) and Root Mean Square Deviation (RMSD) of the modeled segment, wherein sequences having an RMSD below a predetermined threshold following a predetermined number of optimization/modification cycles are included in the database.
7. The method of claim 6, wherein each cycle of said cyclical optimization further comprises inserting a chain break at a random position that is not a part of secondary structure (α-helix or β-sheet).
8. The method of claim 6, wherein said cyclical optimization further comprises evaluation of each segment or a combination thereof, using a cyclical loop closure process repeated over a predetermined number of cycles.
9. The method of claim 8, wherein said cyclical loop closure process evaluates segments based on sequence optimization, side chain packing, side chain minimization and energy function.
10. The method of claim 1, wherein said epitope is a protein.
11. A computer implemented method or a computer readable storage media comprising instructions to perform a method for generating a 3-D structural model of an antibody-antigen complex, the method comprising: providing a sequence database of complementarity determining regions (CDRs) and a database of backbone dihedral angles for CDR segments compatible with a preselected structural template and based on known antibody 3-D structures; providing a predetermined antibody sequence and a predetermined epitope structure for said antigen; docking said preselected structural template on said epitope; evaluating the predetermined antibody sequence having structural conformations of entries from said databases using a simulated annealing process; and optimizing the structural conformation, thereby generating the 3-D structural model of the antibody-antigen complex.
12. The method of claim 11, further comprising the step of generating the databases of CDRs and of backbone dihedral angles for CDR segments, comprising the steps of: selecting an antibody structure to serve as a template; obtaining a set of non-redundant high resolution antibody models; extracting the ScFv portion from each antibody model; cutting each ScFv into four segments, wherein the segments are: VH residues 1-99, an H3 loop, VL residues 1-87, and an L3 loop; generating a Point Specific Scoring Matrix (PSSM) for each segment; and generating database entries for each segment using a macromolecular algorithmic unit.
13. The method of claim 11, wherein said structural conformations comprise a set of backbone dihedral angles for each entry in said database.
14. The method of claim 11, wherein said annealing process is performed by a Monte Carlo simulation with metropolis acceptance criteria.
15. The method of claim 14, wherein said annealing process comprises optimizing the packing of side chains, optimizing rotamers, antibody-ligand rigid body minimization, VL-VH rigid body minimization, optimizing the backbone of antibody, optimizing the light and heavy chain orientation, optimizing the antibody as monomer, or a combination thereof.
16. The method of claim 15, wherein multiple iterations of said Monte Carlo simulation are performed, wherein a structural model is accepted if it passes metropolis acceptance criteria after a predetermined number of iterations.
17. The method of claim 12, wherein said macromolecular algorithmic unit cyclically optimizes or modifies the amino acid sequence based on a predetermined Point Specific Scoring Matrix (PSSM) and Root Mean Square Deviation (RMSD) of the modeled segment, wherein sequences having an RMSD below a predetermined threshold following a predetermined number of optimization/modification cycles are included in the sequence database.
18. The method of claim 17, wherein each cycle of said cyclical optimization further comprises inserting a chain break at a random position that is not a part of secondary structure (α-helix or β-sheet).
19. The method of claim 18, wherein said cyclical optimization further comprises evaluation of each segment or a combination thereof, using a cyclical loop closure process repeated over a predetermined number of cycles.
20. The method of claim 19, wherein said cyclical loop closure process evaluates segments based on sequence optimization, side chain packing, side chain minimization and energy function.
21. The method of claim 11, wherein said epitope is a protein.
22. A computer implemented method or a computer readable storage media comprising instructions to perform a method for generating a 3-D structural model of an antibody, the method comprising: providing a sequence database of complementarity determining regions (CDRs) and a database of backbone dihedral angles for CDR segments compatible with a preselected structural template and based on known antibody 3-D structures; providing a predetermined antibody sequence; evaluating the predetermined antibody sequence having structural conformations of entries from said databases using a simulated annealing process; and optimizing the structural conformation, thereby generating the 3-D structural model of the antibody.
23. The method of claim 22, further comprising the step of generating the databases of CDRs and of backbone dihedral angles for CDR segments, comprising the steps of: selecting an antibody structure to serve as a template; obtaining a set of non-redundant high resolution antibody models; extracting the ScFv portion from each antibody model; cutting each ScFv into four segments, wherein the segments are: VH residues 1-99, an H3 loop, VL residues 1-87, and an L3 loop; generating a Point Specific Scoring Matrix (PSSM) for each segment; and generating database entries for each segment using a macromolecular algorithmic unit.
24. The method of claim 22, wherein said structural conformations comprise a set of backbone dihedral angles for each entry in said database.
25. The method of claim 22, wherein said optimization process is performed by a Monte Carlo simulation with metropolis acceptance criteria.
26. The method of claim 25, wherein said annealing process comprises optimizing the packing of side chains, optimizing rotamers, VL-VH rigid body minimization, optimizing the backbone of antibody, optimizing the light and heavy chain orientation, optimizing the antibody as monomer, or a combination thereof.
27. The method of claim 26, wherein multiple iterations of said Monte Carlo simulation are performed, wherein a structural model is accepted if it passes metropolis acceptance criteria after a predetermined number of iterations.
28. The method of claim 23, wherein said macromolecular algorithmic unit cyclically optimizes or modifies the amino acid sequence based on a predetermined Point Specific Scoring Matrix (PSSM) and Root Mean Square Deviation (RMSD) of the modeled segment, wherein sequences having an RMSD below a predetermined threshold following a predetermined number of optimization/modification cycles are entered into the sequence database.
29. The method of claim 28, wherein each cycle of said cyclical optimization further comprises inserting a chain break at a random position that is not a part of secondary structure (α-helix or β-sheet).
30. The method of claim 29, wherein said cyclical optimization further comprises evaluation of each segment or a combination thereof, using a cyclical loop closure process repeated over a predetermined number of cycles.
31. The method of claim 30, wherein said cyclical loop closure process evaluates segments based on sequence optimization, side chain packing, side chain minimization and energy function.
32.-34. (canceled)
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The invention will be better understood from a reading of the following detailed description taken in conjunction with the drawings in which like reference designators are used to designate like elements:
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
[0024]
[0025]
DETAILED DESCRIPTION OF THE INVENTION
[0026] The invention relates to systems and methods for generating an antibody and/or an antibody library. Specifically, the invention relates to computer-implemented systems and methods for generating an antibody model or library of antibody models targeted to a predetermined epitope. The invention further relates to determining structural models of the interface between a predetermined antibody and its antigen. The invention also relates to determining structural models of a predetermined antibody's unbound complementarity determining region.
[0027] Several studies clearly indicate that CDR-H3 can be modeled with good (<2.0 Å) accuracy just by relying on the existing structural database of antibodies. The accuracy of those methods can be greatly improved by incorporating energy-based refinement and knowledge-based sampling approaches. Gray et al. have developed antibody-modeling software named RosettaAntibody, based on their Rosetta design suite (Rohl, Carol A et al. Protein structure prediction using Rosetta. Methods in enzymology (2004) 383:66-93.). They modeled CDR-H3 loops using fragments taken from the Protein Data Bank (PDB) and relaxed the resulting loops using cyclic coordinate descent. Then, these loops were minimized using the Rosetta minimization procedure.
[0028]
[0029] As shown in
[0030] In one aspect, an antibody library can be generated in an online environment. As illustrated in
[0031] In one embodiment, server 11 may include a plurality of programmed platforms or units, including, but not limited to, a seed generation platform 12, a docking platform 20, a design platform 28, and an epitope unit 34. Seed generation platform 12 may include one or more programmable units, including, but not limited to, a complementarity determining region (CDR) unit 14, a framework unit 16, and an analysis unit 18. Docking platform 20 may include a plurality of programmed platforms or units, including, but not limited to, a docking unit 22, an evaluation unit 24, and a selection unit 26. Design platform 28 may include a plurality of programmed platforms or units, including, but not limited to, a motif evaluation unit 30 and a library generation unit 32.
[0032] The term platform or unit, as used herein, may refer to a collection of programmed computer software codes for performing one or more tasks.
[0033] In one embodiment, the systems and methods use a database of antibody complementarity determining regions (CDRs) that are compatible with a preselected template antibody structure. In another embodiment, the database is built through a process of optimizing known CDR sequences for compatibility with template antibody structures using a Point Specific Scoring Matrix (PSSM) and then evaluating an energy score. In a further embodiment, sequence optimization includes sampling a mutation into a position in the sequence under examination, guided by the PSSM distribution and repeating the PSSM based compatibility analysis. In yet another embodiment, a PSSM is generated for every CDR cluster with known 3-D structure prior to sequence optimization.
[0034] In one embodiment, the PSSM is created using PSI-BLAST (Position-Specific Iterative Basic Local Alignment Search Tool). In another embodiment, the PSSM is created by counting the number of amino acids, and then calculating the likelihood of each amino acid in each position using a background distribution.
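The counting approach described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `build_pssm`, the uniform background distribution, and the pseudocount are all assumptions introduced here, and the input is assumed to be a set of equal-length, already aligned segment sequences from one cluster.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, background=None, pseudocount=1.0):
    """Return pssm[i][aa] = log2(observed frequency / background frequency).

    Hypothetical helper: counts amino acids per column, then scores each
    amino acid against a background distribution.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    if background is None:
        # Assumption: uniform background; a real one would be estimated
        # from a large sequence database.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

    pssm = []
    for i in range(length):
        # Count only the 20 standard amino acids in this column.
        counts = Counter(s[i] for s in aligned_seqs if s[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background[aa])
        pssm.append(column)
    return pssm


pssm = build_pssm(["ARD", "ARE", "GRD"])
```

The pseudocount keeps unseen amino acids from receiving a score of negative infinity, which matters when a cluster contains only a few sequences.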
[0035]
[0036]
[0037] After placing the template, a Monte-Carlo procedure with the metropolis acceptance criteria is carried out for each entry in the segment database for a preset number of iterations (K=250), as shown in items 202-215, in order to simulate annealing. As shown in item 202, the backbone dihedral angles of a segment from the segment database are input into the annealing simulation. As shown in item 203, a PSSM of the segment under examination is input into the annealing simulation. As shown in item 204, a sequence optimization algorithm is run for amino acids that pass a predefined likelihood threshold, as determined using the PSSM. As shown in item 205, the side chains are packed in the selected segment. As shown in item 206, Rotamer-Trials with Minimization is performed. In one embodiment, the RotamerTrials are performed until the score of the complex shows an insignificant change between iterations. In another embodiment, the RotamerTrials are performed for a predetermined number of iterations. In one embodiment, the rotamer trials are performed for at least 5 iterations, at least 6 iterations, at least 7 iterations, at least 8 iterations, at least 9 iterations, at least 10 iterations, or for more than 10 iterations. As shown in item 207, rigid body minimization of the ScFv versus the target is performed. As shown in item 208, a second sequence optimization is performed in a way similar to item 204. As shown in item 209, the side chains are packed in the selected segment in a way similar to item 205. As shown in item 210, cyclical Rotamer-Trials Minimization is performed in a way similar to item 206. As shown in item 211, a rigid body minimization of the VH versus the VL is performed. As shown in item 212, a third sequence optimization is performed in a way similar to item 204. As shown in item 213, the side chains are packed in the selected segment in a way similar to item 205. 
As shown in item 214, Rotamer-Trials Minimization is performed in a way similar to item 206. As shown in item 215, the structural models that pass the predefined Metropolis acceptance criterion are accepted for output (item 216). Additional rounds of optimization prior to output (between items 214 and 215 in
[0038]
[0039] The following examples are presented in order to more fully illustrate the preferred embodiments of the invention. They should in no way be construed, however, as limiting the broad scope of the invention.
EXAMPLES
[0040] Embodiments of this invention utilize computational processing power to compute optimal antibody molecules, as well as structural models of antibody-antigen interfaces and of unbound complementarity determining regions. Provided herein are methods and systems to determine optimal antibody molecules that comprise the library.
Example 1
[0041] In this Example, computational processing power is used to compute antibody structural models that bind an epitope of a selected target polypeptide. Given a computer system and macromolecular modeling software that is able to approximate the free energy of a protein molecule (the terms "free energy score" and "score" are used interchangeably), the process detailed below is divided into 3 stages:
[0042] 1. Creating a Point Specific Scoring Matrix (PSSM) for antibody segments
[0043] 2. Creating a segment database
[0044] 3. Designing an antibody for a target
[0045] The first step in this process is to build a database of antibody CDR backbone segments, in particular by computing backbone phi, psi, and omega angles from known antibody crystal structures. Each of the first two stages generates the input for the next stage. In the final stage, these parameters are mixed and matched using macromolecular modeling software either to compute an optimal sequence for designing a de novo antibody binder model toward a particular target, or to predict an unknown antibody structure given a sequence.
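As an illustration of how backbone phi, psi, and omega angles may be computed from crystal-structure coordinates, the following sketch uses the standard four-point dihedral formula. The function names and the residue representation (a dict holding `N`, `CA`, and `C` coordinates) are assumptions made for this example; a modeling suite would normally provide these values directly.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def backbone_dihedrals(residues):
    """Return [(phi, psi, omega), ...]; undefined terminal angles are None."""
    angles = []
    for i, res in enumerate(residues):
        prev_res = residues[i - 1] if i > 0 else None
        next_res = residues[i + 1] if i + 1 < len(residues) else None
        phi = (dihedral(prev_res["C"], res["N"], res["CA"], res["C"])
               if prev_res else None)
        psi = (dihedral(res["N"], res["CA"], res["C"], next_res["N"])
               if next_res else None)
        omega = (dihedral(res["CA"], res["C"], next_res["N"], next_res["CA"])
                 if next_res else None)
        angles.append((phi, psi, omega))
    return angles
```

Storing segments as dihedral angles rather than Cartesian coordinates is what allows a segment to be copied onto a different template, as the later stages do.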
Stage 1: Creating a Point Specific Scoring Matrix (PSSM) for Antibody Segments:
[0046] 1. Partition the segments according to their length;
[0047] 2. For each group of segments of equal length:
[0048]    a. Cluster the segments according to RMSD;
[0049]    b. For each cluster, generate a PSSM (using PSI-BLAST or any equivalent tool).
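The partition-then-cluster steps above might look like the following sketch. The text does not name a clustering algorithm, so a simple greedy "leader" scheme is swapped in here for illustration; the 1.0 Å cutoff, the function names, and the assumption that coordinates are already superimposed on a common frame are all hypothetical.

```python
import math
from collections import defaultdict


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumption: the coordinates are already superimposed; no fitting is done.
    """
    if len(a) != len(b):
        raise ValueError("segments must have the same number of atoms")
    squared = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(squared / len(a))


def cluster_segments(segments, cutoff=1.0):
    """Group segments by length, then cluster each group by RMSD.

    segments: dict mapping a segment name to its coordinate list.
    Returns a dict mapping length -> list of clusters (lists of names);
    the first name in each cluster acts as its representative.
    """
    by_length = defaultdict(list)
    for name, coords in segments.items():
        by_length[len(coords)].append(name)

    result = {}
    for length, names in by_length.items():
        clusters = []
        for name in names:
            for cluster in clusters:
                if rmsd(segments[name], segments[cluster[0]]) <= cutoff:
                    cluster.append(name)
                    break
            else:
                # No existing representative is close enough: new cluster.
                clusters.append([name])
        result[length] = clusters
    return result
```

Each resulting cluster would then supply the aligned sequences from which one PSSM is generated.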
Stage 2: Creating a Segment Database (See FIG. 1):
[0050] 1. Choose an antibody structure to serve as a template;
[0051] 2. Collect all non-redundant (99% sequence identity cutoff) high-resolution (<3.0 Å) antibody models;
[0052] 3. Extract the ScFv portion from the models;
[0053] 4. Cut each ScFv into 4 segments (Chothia numbering):
[0054]    a. VH residues 1-91;
[0055]    b. VH residues 92-END;
[0056]    c. VL residues 1-87;
[0057]    d. VL residues 88-END.
[0058] 5. For each segment:
[0059]    a. While N<10:
[0060]       i. Copy the dihedral angles of the segment to the respective segment of the template.
[0061]       ii. Load the PSSM that corresponds to that segment.
[0062]       iii. Insert a chain break at a random position in the segment that is not part of a secondary structure (α-helix or β-sheet).
[0063]       iv. Use a macromolecular modeling software suite (e.g. Rosetta) to run a loop closure algorithm (either CCD or kinematic loop closure) for K iterations, in an attempt to close the loop.
[0064]       v. Restrain the energy function in a way that:
[0065]          1. Penalizes large deviations from the original dihedral angles or the XYZ coordinates of the backbone atoms.
[0066]          2. Favors backbone perturbations that bring the C and N termini on both sides of the cut to within range of a peptide bond (i.e. 1.33 Å ± 1 std).
[0067]          3. Favors sequences with high likelihood (computed w.r.t. the PSSM).
[0068]       vi. Between each perturbation of the backbone done by the loop closure algorithm, apply side-chain packing and minimization, as well as sequence optimization (inserting amino acids that result in an optimal score).
[0069]       vii. Do not allow introducing amino acids that have a likelihood lower than a predefined threshold at a particular position, according to the PSSM.
[0070]       viii. If the loop is not closed after K iterations of CCD/kinematic loop closure, increment N by 1 and return to step a.i.
[0071]       ix. If, after K iterations, the RMSD of the modeled segment is smaller than X (X should be predefined, usually 0<X<1 Å), add the segment to the database. Otherwise, increment N by 1 and return to step a.i.
[0072] 6. If the segment failed to pass the conditions above after 10 iterations, discard it.
[0073] 7. If the segment passed the conditions above after 10 iterations or fewer, add it to the database.
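The retry logic of step 5 can be summarized by the sketch below. The callables `close_loop` and `segment_rmsd` are hypothetical stand-ins for the modeling-suite operations (grafting the dihedrals, inserting the chain break, and running K iterations of restrained loop closure with packing and design); only the accept, retry, and discard control flow is shown.

```python
def try_add_segment(segment, close_loop, segment_rmsd,
                    max_attempts=10, rmsd_cutoff=1.0):
    """Return the closed model for a segment, or None if it is discarded.

    close_loop(segment) -> (closed, model): hypothetical wrapper around one
        attempt of K loop-closure iterations on the template.
    segment_rmsd(model, segment) -> float: RMSD of the modeled segment
        relative to the original segment, in angstroms.
    """
    for _attempt in range(max_attempts):      # "While N<10"
        closed, model = close_loop(segment)
        if not closed:
            continue                          # step viii: retry
        if segment_rmsd(model, segment) < rmsd_cutoff:
            return model                      # step ix: accept
    return None                               # step 6: discard


def build_segment_db(segments, close_loop, segment_rmsd, **kwargs):
    """Collect every segment that passes within the allowed attempts."""
    database = []
    for segment in segments:
        model = try_add_segment(segment, close_loop, segment_rmsd, **kwargs)
        if model is not None:
            database.append(model)
    return database
```

Because the chain break is placed at a random position on each attempt, repeating the attempt can succeed where an earlier one failed, which is presumably why up to 10 tries are allowed.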
Stage 3 Design an Antibody Structural Model for a Target (See FIG. 2):
[0074] Use a macromolecular modeling software (such as Rosetta) to perform the following steps:
[0075] 1. Place the template in the desired position and orientation towards the target epitope. This can be done either by using docking software or by superimposing the template on an existing antibody-antigen complex.
[0076] 2. Run the following Monte-Carlo procedure with the Metropolis acceptance criteria for K iterations:
[0077]    a. Load the backbone dihedral angles of a randomly selected segment from the database that was created in Stage 2.
[0078]    b. Load the PSSM that matches each of the selected segments.
[0079]    c. Modify the default scoring function of the macromolecular modeling software:
[0080]       i. Introduce a scoring term that favors sequences with high likelihood, with respect to the previously computed PSSM.
[0081]    d. Using the PSSM, for each position select a set of amino acids that pass a predefined likelihood threshold.
[0082]    e. Run a design procedure: for each position in the segment and in a sphere of a predefined distance from it (10 Å), use the amino acids selected in the above step to optimize the sequence w.r.t. the scoring function of the macromolecular modeling software.
[0083]    f. Pack the side chains in the selected segment and in a sphere of a predefined distance (10 Å) around the segment.
[0084]    g. Run the Rotamer-Trials Minimization procedure: repeat until the score of the complex converges (e.g. delta between two iterations <2) or the number of iterations reaches 5:
[0085]       i. For each side chain in the segment and in a sphere of a predefined distance (e.g. 10 Å), find the rotamer with the minimal energy.
[0086]    h. Run a rigid body minimization procedure, allowing the ScFv to sample the 6 degrees of freedom w.r.t. the target.
[0087]    i. Pack the segment's side chains and the side chains that are located in a sphere of a predefined distance from it, and run Rotamer-Trials with minimization (identical to g).
[0088]    j. Run a design procedure (identical to e).
[0089]    k. Run a rigid body minimization procedure, allowing the VH/VL segments to sample the 6 degrees of freedom w.r.t. each other.
[0090]    l. Run a design procedure (identical to e).
[0091]    m. Pack the segment's side chains and the side chains that are located in a sphere of a predefined distance from it, and run Rotamer-Trials with minimization (identical to g).
[0092]    n. Accept the new structural model if it passes the Metropolis acceptance criterion (with the given structural model score as a parameter).
[0093] 3. Output the structural model that was last accepted by the Monte-Carlo method.
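The outer Monte-Carlo loop with the Metropolis criterion can be sketched generically as below. The `propose` callable stands in for steps a through m (segment swap, design, packing, minimization) and `score` for the modeling suite's scoring function; both are hypothetical, as are the temperature value and the fixed seed. The default of 250 iterations follows the K=250 figure given earlier in the description.

```python
import math
import random


def metropolis_accept(old_score, new_score, temperature, rng=random):
    """Always accept improvements; accept worse scores with Boltzmann odds."""
    if new_score <= old_score:
        return True
    return rng.random() < math.exp(-(new_score - old_score) / temperature)


def monte_carlo(initial, propose, score, iterations=250,
                temperature=1.0, seed=0):
    """Return the last accepted model and its score.

    propose(model, rng) -> candidate model (hypothetical move set).
    score(model) -> float, lower is better.
    """
    rng = random.Random(seed)
    current, current_score = initial, score(initial)
    for _ in range(iterations):
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        if metropolis_accept(current_score, candidate_score, temperature, rng):
            current, current_score = candidate, candidate_score
    return current, current_score


# Toy usage: minimize x**2 with random steps, standing in for a real model.
final, final_score = monte_carlo(
    10.0,
    propose=lambda x, rng: x + rng.uniform(-1.0, 1.0),
    score=lambda x: x * x,
    temperature=0.1,
)
```

Occasionally accepting a worse-scoring model is what lets the search escape local minima; returning the last accepted model, rather than the best seen, mirrors step 3 of the text.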
Example 2
[0094] In this Example, computational processing power is used to compute the structural model of an antibody-antigen interface for an antibody bound to an epitope of a selected target polypeptide, as well as to compute the structural model of an unbound complementarity determining region (CDR) for an antibody. Given a computer system and macromolecular modeling software that is able to approximate the free energy of a protein molecule (the terms "free energy score" and "score" are used interchangeably), the process detailed below is divided into 3 stages:
[0095] 1. Creating a Point Specific Scoring Matrix (PSSM) for antibody segments;
[0096] 2. Creating a segment database;
[0097] 3. Modeling an antibody from sequence.
[0098] The first two stages in this Example are similar to those outlined in Example 1.
Stage 3 Model an Antibody from Sequence
Use a macromolecular modeling software (such as Rosetta) to perform the following steps:
[0099] 1. Load the template PDB file and the antibody sequence.
[0100] 2. If the modeling is done in the presence of a target ligand, dock the template in the desired position on the target ligand using docking software or superimposition.
[0101] 3. For each segment in the template (VH, H3, VL, L3), load the backbone dihedral angles of a random segment with the same length from the segment database that was created in Stage 2.
[0102] 4. Thread the antibody sequence on the ScFv.
[0103] 5. Run the following Monte-Carlo procedure for K iterations:
[0104]    a. Load the backbone dihedral angles of a randomly selected segment from the database that was created in Stage 2.
[0105]    b. Pack the side chains in the selected segment and in a sphere of a predefined distance (10 Å) around the segment.
[0106]    c. Run the Rotamer-Trials Minimization procedure: repeat until the score of the complex converges (e.g. delta between two iterations <2) or the number of iterations reaches 5:
[0107]       i. For each side chain in the segment and in a sphere of a predefined distance (e.g. 10 Å), find the rotamer with the minimal energy.
[0108]    d. If a target exists in the simulation, run a rigid body minimization procedure, allowing the ScFv to sample the 6 degrees of freedom w.r.t. the target.
[0109]    e. Pack the segment's side chains and the side chains that are located in a sphere of a predefined distance from it, and run Rotamer-Trials with minimization (identical to c).
[0110]    f. Run a rigid body minimization procedure, allowing the VH/VL segments to sample the 6 degrees of freedom w.r.t. each other.
[0111]    g. Pack the segment's side chains and the side chains that are located in a sphere of a predefined distance from it, and run Rotamer-Trials with minimization (identical to c).
[0112]    h. Accept the new structural model if it passes the Monte-Carlo acceptance criteria (with the given structural model score as a parameter).
[0113] 6. Output the structural model that was last accepted by the Monte-Carlo method.
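The stopping rule of the Rotamer-Trials Minimization step (repeat until the score change between iterations is small, or an iteration limit is hit) can be sketched as a small driver loop. The `sweep` and `score_fn` callables are hypothetical placeholders for one pass of per-side-chain rotamer selection and for the scoring function. Treating 5 as an upper limit is an interpretation of the text, and the tolerance of 2 score units follows the example value given there.

```python
def rotamer_trials(state, sweep, score_fn, tol=2.0, max_iterations=5):
    """Repeat rotamer sweeps until the score converges or the limit is hit.

    sweep(state) -> new state after one pass over the side chains
        (hypothetical; a real pass picks the lowest-energy rotamer per
        side chain near the segment).
    score_fn(state) -> float score of the complex.
    Returns (final_state, iterations_performed).
    """
    previous = score_fn(state)
    for iteration in range(1, max_iterations + 1):
        state = sweep(state)
        current = score_fn(state)
        if abs(previous - current) < tol:
            return state, iteration      # converged
        previous = current
    return state, max_iterations         # iteration limit reached


# Toy usage: the "state" is just a number whose score halves on each sweep.
state, n = rotamer_trials(16.0, sweep=lambda s: s / 2, score_fn=lambda s: s)
```

In this toy run the score deltas are 8, 4, 2, and 1, so the loop stops at the fourth sweep, the first one whose delta falls below the tolerance.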
[0114]
[0115]
[0116] Having described preferred embodiments of the invention with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments, and that various changes and modifications may be effected therein by those skilled in the art without departing from the scope or spirit of the invention as defined in the appended claims.