METHODS AND APPARATUS FOR DOUBLE-INTEGRATION ORTHOGONAL SPACE TEMPERING

20180052952 · 2018-02-22

Assignee

Florida State University Research Foundation, Inc. (Tallahassee, FL)

Abstract

The orthogonal space random walk (OSRW) algorithm is generalized to the orthogonal space tempering (OST) method via the introduction of the orthogonal space sampling temperature. A double-integration recursion method enables practically efficient and robust OST free energy calculations, augmented by a θ-dynamics approach. The double-integration OST (DI-OST) method performs alchemical free energy simulations to calculate the free energy difference between benzyl phosphonate and difluorobenzyl phosphonate in aqueous solution, to estimate the solvation free energy of the octanol molecule, and to predict the nontrivial Barnase-Barstar binding affinity change induced by the Barnase N58A mutation. The DI-OST method robustly enables practically efficient free energy predictions, particularly when strongly coupled slow environmental transitions are involved. A classical set of p38 MAP kinase inhibitors is also employed as a test bed for evaluating relative binding affinity calculation methods. Throughout the molecular dynamics (MD) sampling, no human intervention was involved.

Claims

1. A method for predicting a chemical state, comprising: orthogonal space tempering through orthogonal space sampling temperature.

2. The method according to claim 1, further comprising: double integration recursion.

3. The method according to claim 2, wherein: the double integration recursion is based on dynamic reference restraining.

4. The method according to claim 3, wherein: the method provides an output selected from the group consisting of the free energy difference between benzyl phosphonate and difluorobenzyl phosphonate in aqueous solution, an estimate of the pK.sub.a value of a buried titratable residue, Glu-66, in the interior of the V66E staphylococcal nuclease mutant, and the binding affinity of xylene in the T.sub.4 lysozyme L.sub.99A mutant.

5. A system for predicting a chemical state, said system embodied on a computer readable medium coupled to a processor and comprising: means for accepting input; means for performing orthogonal space tempering through orthogonal space sampling temperature based on said input; and means for providing output.

6. The system according to claim 5, wherein: said input includes a molecular structure and an energy function.

7. The system according to claim 6, wherein: said output includes molecular trajectory and free energy.

8. The system according to claim 5, further comprising: means for performing double integration recursion.

9. The system according to claim 8, wherein: the double integration recursion is based on dynamic reference restraining.

10. The system according to claim 5, wherein: said output is selected from the group consisting of the free energy difference between benzyl phosphonate and difluorobenzyl phosphonate in aqueous solution, an estimate of the pK.sub.a value of a buried titratable residue, Glu-66, in the interior of the V66E staphylococcal nuclease mutant, and the binding affinity of xylene in the T.sub.4 lysozyme L.sub.99A mutant.

11. A computer readable medium containing program instructions for predicting a chemical state, wherein execution of the program instructions by one or more processors of a computer system causes the one or more processors to carry out the steps of: accepting input; performing orthogonal space tempering through orthogonal space sampling temperature based on said input; and providing output.

12. The computer readable medium according to claim 11, wherein: said input includes a molecular structure and an energy function.

13. The computer readable medium according to claim 12, wherein: said output includes molecular trajectory and free energy.

14. The computer readable medium according to claim 11, wherein: said steps include performing double integration recursion.

15. The computer readable medium according to claim 14, wherein: the double integration recursion is based on dynamic reference restraining.

16. The computer readable medium according to claim 11, wherein: said output is selected from the group consisting of the free energy difference between benzyl phosphonate and difluorobenzyl phosphonate in aqueous solution, an estimate of the pK.sub.a value of a buried titratable residue, Glu-66, in the interior of the V66E staphylococcal nuclease mutant, and the binding affinity of xylene in the T.sub.4 lysozyme L.sub.99A mutant.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0041] FIG. 1 is a high level flow chart illustrating how the invention functions.

[0042] FIG. 2 is a high level block diagram illustrating an apparatus for carrying out the invention.

[0043] FIG. 3 illustrates the alchemical free energy simulation setup with (a) The thermodynamic cycle, and (b) The illustrative example of the alchemical transition setup.

[0044] FIG. 4 illustrates the relative free energy prediction results with (a) The time-dependent changes of the accuracy indexes including RMSD, MUE, and PI, and (b) The comparison between the experimental values and the predicted results. The results predicted at 3.5 ns are shown in blue circles and the final results with the elongated simulation lengths are shown in red circles.

[0045] FIG. 5 illustrates the details of the OST simulation of the bound 14 with (a) The binding site structure, (b) The time-dependent changes of the number of the surrounding waters and the λ values, and (c) The two-dimensional biasing potential g.sub.m(λ,φ) adaptively generated in the OST simulation.

DETAILED DESCRIPTION

[0046] The DI-OST Algorithm

[0047] The present invention is focused on alchemical free energy simulations, by which protein-ligand binding affinity changes, protein-protein binding affinity changes, solvation energies, pK.sub.a values, and other chemical state related thermodynamic properties can be predicted. The disclosed DI-OST algorithm is also applicable to the geometry-based potential of mean force calculations.

[0048] To carry out alchemical free energy calculations, as described in Equation (1), a scaling parameter λ needs to be introduced to connect the two target chemical states. The simplest hybrid energy function is the linear form shown in Equation (4).

U.sub.o(λ)=(1-λ)U.sub.s.sup.A+λU.sub.s.sup.B+U.sub.e (4)

[0049] where U.sub.s.sup.A and U.sub.s.sup.B are the energy terms unique to the two end chemical states; U.sub.e represents the common environmental energy terms shared by the two end states. When dummy atoms are employed in one of the end states, soft-core potentials are commonly applied to the van der Waals terms and/or the electrostatic terms in U.sub.s.sup.A and U.sub.s.sup.B to avoid the end point singularity issue.
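As an illustration of Equation (4), the following Python sketch evaluates the linear hybrid energy and its derivative with respect to λ, which plays the role of the generalized force F.sub.λ for this linear form. The function names and the energy values in the usage lines are hypothetical and are not taken from the CHARMM implementation; the soft-core treatment is not included.

```python
def hybrid_energy(lam, u_a, u_b, u_env):
    """Linear hybrid energy of Equation (4): (1 - lam)*U_s^A + lam*U_s^B + U_e."""
    return (1.0 - lam) * u_a + lam * u_b + u_env


def generalized_force(u_a, u_b):
    """dU_o/dlam for the linear form; it does not depend on lam itself."""
    return u_b - u_a


# Hypothetical end-state energies (kcal/mol) for a single configuration.
u_a, u_b, u_env = 2.0, 5.0, 1.0
print(hybrid_energy(0.0, u_a, u_b, u_env))  # pure state A plus environment
print(hybrid_energy(1.0, u_a, u_b, u_env))  # pure state B plus environment
print(generalized_force(u_a, u_b))
```

For the linear form the generalized force is simply the energy gap between the two end states, which is why its fluctuations report on environmental changes coupled to the alchemical transition.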

[0050] In GE alchemical free energy simulations, λ needs to be dynamically coupled with the motion of the rest of the system. Such extended dynamics can be realized either via the hybrid Monte Carlo method, where scaling parameter jumps along a prearranged discrete λ ladder are enabled through the Metropolis acceptance/rejection procedure, or via the λ-dynamics method, where λ moves in the continuous region between 0 and 1 are enabled through an extended Hamiltonian approach. The extended dynamics of the scaling parameter in OSRW are implemented on the basis of the λ-dynamics method. In the original λ-dynamics free energy calculation method, the scaling parameter λ is treated as a one-dimensional fictitious particle. In the present invention, to rigorously constrain λ between 0 and 1, a novel θ-dynamics approach is proposed. In this θ-dynamics, λ is set as the function λ(θ); the variable θ is treated as a one-dimensional fictitious particle, which travels periodically between -π and π. In OSRW simulations, uniform λ distributions are targeted. Here, the θ-dynamics approach is used mainly to constrain the λ range; it remains preferable to have uniform sampling in the λ space. For this purpose, the functional form of λ(θ) is designed as shown in Equation (5).

[00002] $\begin{matrix} \lambda(\theta) = \begin{cases} r\sin^{2}\frac{\theta}{2}, & |\theta| \le \theta_{o} \\ a\theta + b, & \theta_{o} < \theta < \pi - \theta_{o} \\ -a\theta + b, & \theta_{o} - \pi < \theta < -\theta_{o} \\ r\sin^{2}\frac{\theta}{2} + c, & \pi - \theta_{o} \le |\theta| \le \pi \end{cases} & (5) \end{matrix}$

[0051] in which r=1/(1-cos θ.sub.o+(π/2-θ.sub.o)sin θ.sub.o), a=(r/2)sin θ.sub.o, b=(r/2)(1-cos θ.sub.o-θ.sub.o sin θ.sub.o), and c=(r/2)(π-θ.sub.o)sin θ.sub.o+(r/2)(1-cos θ.sub.o-θ.sub.o sin θ.sub.o)-r sin.sup.2((π-θ.sub.o)/2). With these constants, λ(θ) is continuous across the four branches, with λ(0)=0 and λ(±π)=1. In Equation (5), θ.sub.o is the parameter that separates the linear region from the end-state (λ=0,1) transition regions. In OSRW and OST simulations, θ.sub.o should be set to a tiny value so that the linear region covers almost the entire λ range (a approaches 1/π and b approaches zero); thus the Jacobian contribution from the λ(θ) function is negligible. The propagation and the thermalization of the θ particle are based on the Langevin equation, in the same way as the λ particle is treated in the original λ-dynamics method.
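The mapping of Equation (5) can be sketched in Python as follows. This is a minimal illustration, not the CHARMM implementation: the function name and the default θ.sub.o value are hypothetical, and the constant r is taken in the form that makes λ(±π) equal to 1, which is an assumption of the sketch.

```python
import math


def lambda_of_theta(theta, theta_o=0.05):
    """Piecewise map from the periodic variable theta to lambda in [0, 1]."""
    # Assumption: r is normalized so that lambda(pi) = 1.
    r = 1.0 / (1.0 - math.cos(theta_o)
               + (math.pi / 2.0 - theta_o) * math.sin(theta_o))
    a = 0.5 * r * math.sin(theta_o)
    b = 0.5 * r * (1.0 - math.cos(theta_o) - theta_o * math.sin(theta_o))
    c = a * (math.pi - theta_o) + b - r * math.sin(0.5 * (math.pi - theta_o)) ** 2

    # Wrap theta into [-pi, pi) because the fictitious particle is periodic.
    theta = (theta + math.pi) % (2.0 * math.pi) - math.pi
    t = abs(theta)  # the map is even in theta

    if t <= theta_o:                 # smooth turn-around near lambda = 0
        return r * math.sin(0.5 * t) ** 2
    if t < math.pi - theta_o:        # linear region
        return a * t + b
    return r * math.sin(0.5 * t) ** 2 + c  # smooth turn-around near lambda = 1


for theta in (0.0, 0.5 * math.pi, math.pi):
    print(theta, lambda_of_theta(theta))
```

Because the slope is zero at θ=0 and θ=±π, the fictitious particle turns around smoothly at both end states instead of being reflected by a hard wall.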

[0052] The OSRW method is based on the modified potential energy function described in Equation (2). The OSRW algorithm has two recursion components: the recursion kernel, which adaptively updates g.sub.m(λ,F.sub.λ) toward its target function -G.sub.o(λ,F.sub.λ), and the recursion slave, which adaptively updates f.sub.m(λ) toward its target function -G.sub.o(λ) based on the concurrent g.sub.m(λ,F.sub.λ) function. In the original implementation, the metadynamics strategy is employed as the recursion kernel. Specifically, the free energy biasing potential g.sub.m(λ,F.sub.λ) can be obtained by repetitively adding a relatively small Gaussian-shaped repulsive potential, as given in Equation (6)

[00003] $\begin{matrix} h_{0}\exp\left(-\frac{|\lambda - \lambda(t_{i})|^{2}}{2w_{1}^{2}}\right)\exp\left(-\frac{|F_{\lambda} - F_{\lambda}(t_{i})|^{2}}{2w_{2}^{2}}\right) & (6) \end{matrix}$

[0053] which is centered around [λ(t.sub.i),F.sub.λ(t.sub.i)] at the scheduled update time t.sub.i and thereby discourages the system from returning to often visited configurations. With this procedure repeated, the overall biasing potential shown in Equation (7)

[00004] $\begin{matrix} g_{m}(\lambda,F_{\lambda}) = \sum_{t_{i}} h_{0}\exp\left(-\frac{|\lambda - \lambda(t_{i})|^{2}}{2w_{1}^{2}}\right)\exp\left(-\frac{|F_{\lambda} - F_{\lambda}(t_{i})|^{2}}{2w_{2}^{2}}\right) & (7) \end{matrix}$

[0054] will build up and eventually flatten the underlying curvature of the free energy surface in the (λ,F.sub.λ) space. Then, the free energy profile along the reaction coordinate (λ,F.sub.λ), which should eventually converge to G.sub.o(λ,F.sub.λ), can be estimated as -g.sub.m(λ,F.sub.λ).
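The hill deposition of Equations (6) and (7) can be sketched on a regular (λ,F.sub.λ) grid as follows. This is an illustrative NumPy sketch under stated assumptions: the grid ranges, the function names, and the list of visited points are hypothetical, while the default height and widths follow the values quoted for the OSRW test simulations in this description.

```python
import numpy as np


def gaussian_hill(lam_grid, f_grid, lam_c, f_c, h=0.01, w1=0.01, w2=4.0):
    """One repulsive Gaussian of Equation (6), centered at (lam_c, f_c)."""
    return (h
            * np.exp(-(lam_grid - lam_c) ** 2 / (2.0 * w1 ** 2))
            * np.exp(-(f_grid - f_c) ** 2 / (2.0 * w2 ** 2)))


def build_bias(lam_axis, f_axis, visited, **hill_kwargs):
    """Sum of hills over all scheduled update times, Equation (7)."""
    lam_grid, f_grid = np.meshgrid(lam_axis, f_axis, indexing="ij")
    g_m = np.zeros_like(lam_grid)
    for lam_c, f_c in visited:
        g_m += gaussian_hill(lam_grid, f_grid, lam_c, f_c, **hill_kwargs)
    return g_m


lam_axis = np.linspace(0.0, 1.0, 101)
f_axis = np.linspace(-20.0, 20.0, 41)
# Hypothetical (lambda, F_lambda) values recorded at three update times.
visited = [(0.50, 0.0), (0.50, 0.0), (0.52, 3.0)]
g_m = build_bias(lam_axis, f_axis, visited)
print(g_m.max())
```

Repeated visits to the same region stack hills there, so the accumulated bias pushes the system toward less visited parts of the (λ,F.sub.λ) plane.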

[0055] Since for a state λ′, the free energy profile along its generalized force direction can be estimated as -g.sub.m[λ′,F.sub.λ(λ′)], the generalized force distribution should be proportional to exp{β.sub.o g.sub.m[λ′,F.sub.λ(λ′)]}, in which β.sub.o represents 1/(kT.sub.o). On the basis of the above discussion, free energy derivatives at each state can be obtained as shown in Equation (8).

[00005] $\begin{matrix} \left.\frac{dG_{o}}{d\lambda}\right|_{\lambda'} = \langle F_{\lambda}\rangle_{\lambda'} = \frac{\int_{F_{\lambda}} F_{\lambda}\exp\{\beta_{o}[g_{m}(\lambda,F_{\lambda})]\}\,\delta(\lambda-\lambda')}{\int_{F_{\lambda}} \exp\{\beta_{o}[g_{m}(\lambda,F_{\lambda})]\}\,\delta(\lambda-\lambda')} & (8) \end{matrix}$

[0056] Following the TI formula, the free energy change between the initial state with λ.sub.i, which is the lower bound of the collective variable range, and any target state with the order parameter λ can unfold as a function of λ, as shown in Equation (9).

[00006] $\begin{matrix} G_{o}(\lambda) = \int_{\lambda_{i}}^{\lambda} \left.\frac{dG_{o}}{d\lambda}\right|_{\lambda'} d\lambda' & (9) \end{matrix}$

[0057] In the original OSRW implementation, the metadynamics strategy, as described in Equation (7), serves as the recursion kernel; the TI based formula (Equations (8) and (9)) serves as the recursion slave, with f.sub.m(λ) recursively set as the instantaneously estimated -G.sub.o(λ).
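The recursion slave of Equations (8) and (9) can be sketched on a grid as follows. This is a minimal NumPy illustration rather than the patented implementation: the function names are hypothetical, the integral over F.sub.λ is replaced by a sum over a uniform grid, and the bias used in the usage lines is a synthetic surface chosen so that the exact answer is known.

```python
import numpy as np


def mean_force_from_bias(g_m, f_axis, beta_o):
    """Equation (8): <F_lambda> in each lambda row, weighted by exp(beta_o * g_m)."""
    # Subtracting the row maximum avoids overflow and cancels in the ratio.
    w = np.exp(beta_o * (g_m - g_m.max(axis=1, keepdims=True)))
    return (w * f_axis[None, :]).sum(axis=1) / w.sum(axis=1)


def integrate_ti(lam_axis, dg_dlam):
    """Equation (9): cumulative trapezoid integral from the lower lambda bound."""
    steps = 0.5 * (dg_dlam[1:] + dg_dlam[:-1]) * np.diff(lam_axis)
    return np.concatenate(([0.0], np.cumsum(steps)))


# Synthetic converged bias: at each lambda, F_lambda is Gaussian around 10*lambda.
beta_o = 1.0 / 0.596  # 1/(kT_o) in mol/kcal near 300 K
lam_axis = np.linspace(0.0, 1.0, 11)
f_axis = np.linspace(-30.0, 40.0, 701)
g_m = -((f_axis[None, :] - 10.0 * lam_axis[:, None]) ** 2) / (2.0 * 4.0 * beta_o)

dg_dlam = mean_force_from_bias(g_m, f_axis, beta_o)
g_o = integrate_ti(lam_axis, dg_dlam)
f_m = -g_o  # the slave sets f_m to the negative of the estimated G_o
print(g_o[-1])
```

In this synthetic case the mean force grows linearly with λ, so the integrated profile is quadratic and the end-to-end free energy difference is 5 in the chosen units.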

[0058] On the basis of the above OSRW procedure, we carried out a free energy simulation study on the model system. The model simulation was performed on the basis of two-dimensional Langevin dynamics, where the temperature was set as 50 K and the particle mass was set as 100 g/mol. The OSRW simulation led to a converged free energy profile G.sub.o(x) [targeted as -f.sub.m(x)] and a converged g.sub.m(x,∂U.sub.o/∂x) (in the model case, ∂U.sub.o/∂x is the generalized force), where two energy minima are smoothly connected along ∂U.sub.o/∂x at the transition state region. When converged, -g.sub.m(x,∂U.sub.o/∂x) represents the residual free energy surface that remains after the free energy surface flattening treatment along the order parameter. It reveals that a residual free energy barrier exists around the transition state region. This barrier can be traced along ∂U.sub.o/∂x near the transition state, and more importantly, the residual barrier height (about 2.2 kcal/mol) is similar to that of the hidden energy barrier. In this model system, the generalized force can reveal the direction of the order-parameter-coupled hidden process; this is a prerequisite for efficient and accurate calculations of the target free energy profile G.sub.o(x).

[0059] To further understand the role of ∂U.sub.o/∂x and the difference between the OSRW sampling [e.g., based on U.sub.o+f.sub.m(x)+g.sub.m(x,∂U.sub.o/∂x) as in Equation (2)] and the classical generalized ensemble sampling [e.g., based on U.sub.o+f.sub.m(x) as in Equation (1)], we respectively employed the biasing energy functions f.sub.m(x) and f.sub.m(x)+g.sub.m(x,∂U.sub.o/∂x), which were obtained in the recursion step, to perform two corresponding equilibrium generalized ensemble simulations. The OSRW sampling allows the system to travel repetitively between the two energy minima; by comparison, in the classical generalized ensemble simulation, the system is trapped in the original energy minimum state due to the lack of sampling acceleration along the hidden dimension. Furthermore, according to the umbrella sampling reweighting relationship, the samples collected from the OSRW simulation can be employed to recover the free energy surface along x and y, the well-sampled region of which is the same as the target energy surface. As shown from this recovered free energy surface, the samples are more concentrated along the minimum energy path that connects the two energy wells.

[0060] In an OSRW simulation, the sampling volume in the orthogonal space increases with the elongation of the simulation length; additionally, the diffusion sampling overhead around the states where no hidden barrier exists continuously increases. As mentioned above, the OSRW method can be generalized to the orthogonal space tempering (OST) algorithm. The target energy function of the OST scheme is described in Equation (3). In the OST scheme, free energy surfaces along the generalized force direction are not completely flattened. The orthogonal space effective sampling temperature T.sub.ES then imposes an effective sampling boundary that ensures long-time scale convergence. A larger T.sub.ES allows more efficient crossing of hidden free energy barriers but introduces more diffusion sampling overhead. Interestingly, when T.sub.ES approaches infinity, the OST method becomes the original OSRW algorithm; when T.sub.ES approaches the system reservoir temperature T.sub.o, the second-order GE sampling reduces to the first-order GE sampling described in Equation (1).

[0061] The metadynamics method according to the invention achieves adaptive recursions based on a dynamic force-balancing relationship. Its performance strongly depends on energy surface ruggedness and preset parameters. To improve the convergence behavior of OST, in the present work, we designed an alternative method to gain robust recursions.

[0062] Among various recursion methods, the adaptive biasing force (ABF) algorithm has a similar efficiency to that of the metadynamics algorithm. In contrast to the metadynamics technique, the ABF method has been mathematically proven; thus the usage of the ABF method as the recursion kernel, specifically via the calculation of the F.sub.λ-dependent free energy profile G.sub.o(λ,F.sub.λ) at each λ state, can ensure free energy convergence robustness. A challenging issue remains: how to numerically calculate the generalized force of F.sub.λ to estimate the target F.sub.λ-dependent free energy profiles. As a matter of fact, calculating generalized forces of complex order parameters has been known to be a difficult issue in the ABF algorithm implementation. To circumvent this issue, in our OST implementation, we propose a dynamic reference restraining (DRR) recursion strategy. Specifically, the target OST potential described above with reference to Equation (3) is rewritten as Equation (10)

[00007] $\begin{matrix} U_{m} = U_{o}(\lambda) + \frac{1}{2}k_{\phi}(F_{\lambda} - \phi)^{2} + f_{m}(\lambda) + \frac{T_{ES} - T_{o}}{T_{ES}}\,g_{m}(\lambda,\phi) & (10) \end{matrix}$

[0063] in which the generalized force fluctuation is restrained to the move of another dynamic particle φ. In Equation (10), f.sub.m(λ) is still targeted toward -G.sub.o(λ), and g.sub.m(λ,φ) is targeted toward -G.sub.o″(λ,φ), the negative of the free energy surface along (λ,φ) in the canonical ensemble with the energy function U.sub.o(λ)+½k.sub.φ(F.sub.λ-φ).sup.2-G′(λ), where G′(λ) is the λ-dependent free energy surface in the canonical ensemble with U.sub.o(λ)+½k.sub.φ(F.sub.λ-φ).sup.2 as the energy function. On the basis of Equation (10), motions along F.sub.λ are indirectly activated via the restraining treatment to the dynamic reference φ. Here, the dynamics of the φ particle are also realized through the same extended Hamiltonian method as in λ-dynamics or θ-dynamics, which was discussed above.
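Equation (10) can be evaluated for a single configuration as in the following Python sketch. The function name and the input values are hypothetical; the defaults for k.sub.φ, T.sub.ES, and T.sub.o follow the values quoted for the benzyl phosphonate test simulations in this description.

```python
def ost_potential(u_o, f_lambda, phi, f_m, g_m, k_phi=0.1, t_es=600.0, t_o=300.0):
    """Modified potential of Equation (10) for one (configuration, lambda, phi)."""
    restraint = 0.5 * k_phi * (f_lambda - phi) ** 2  # couples F_lambda to phi
    tempering = (t_es - t_o) / t_es                  # 0 at T_ES = T_o, -> 1 as T_ES -> infinity
    return u_o + restraint + f_m + tempering * g_m


# Hypothetical instantaneous values in kcal/mol.
print(ost_potential(u_o=1.0, f_lambda=3.0, phi=1.0, f_m=-2.0, g_m=4.0))
print(ost_potential(u_o=1.0, f_lambda=3.0, phi=1.0, f_m=-2.0, g_m=4.0, t_es=300.0))
```

The second call shows the limit T.sub.ES=T.sub.o, where the orthogonal space bias is switched off; as T.sub.ES grows, the prefactor approaches 1 and the full bias is applied.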

[0064] According to the OST target function in Equation (10), we need to design a recursion kernel to estimate G.sub.o″(λ,φ) in order to adaptively update g.sub.m(λ,φ). To obtain this two-dimensional function, first, the ABF method is directly employed to calculate the φ-dependent free energy profile at each λ′ state, specifically on the basis of the TI relationship shown in Equation (11).

[00008] $\begin{matrix} G_{o'}(\lambda',\phi) = \int_{\phi} \left\langle \frac{\partial U_{o'}(\lambda,\phi)}{\partial\phi}\,\delta(\lambda-\lambda') \right\rangle_{\phi'} d\phi' & (11) \end{matrix}$

[0065] Here, U.sub.o′(λ,φ) represents U.sub.o(λ)+½k.sub.φ(F.sub.λ-φ).sup.2; then ∂U.sub.o′(λ,φ)/∂φ can be simply evaluated as -k.sub.φ(F.sub.λ-φ). It is noted that the numerical boundary of G.sub.o′(λ′,φ), i.e., the integration boundary in Equation (11), changes as the recursion proceeds. Following the general ABF strategy, <∂U.sub.o′(λ,φ)/∂φ δ(λ-λ′)>.sub.φ′ can be adaptively estimated as shown in Equation (12)

[00009] $\begin{matrix} \frac{\sum_{i} -k_{\phi}[F_{\lambda}(t_{i}) - \phi(t_{i})]\,\delta[\lambda(t_{i}) - \lambda']\,\delta[\phi(t_{i}) - \phi']}{\sum_{i} \delta[\lambda(t_{i}) - \lambda']\,\delta[\phi(t_{i}) - \phi']} & (12) \end{matrix}$

[0066] where t.sub.i is the ith scheduled sample-collecting time. Equations (11) and (12) only yield the one-dimensional function G.sub.o′(λ′,φ) at each λ′ state. The height of the G.sub.o′(λ′,φ) function can be recalibrated as shown in Equation (13)

[00010] $\begin{matrix} G_{o''}(\lambda',\phi) = G_{o'}(\lambda',\phi) - G_{o',\min}(\lambda',\phi) + kT_{o}\ln\int_{\phi}\exp\left(-\frac{G_{o'}(\lambda',\phi) - G_{o',\min}(\lambda',\phi)}{kT_{o}}\right) & (13) \end{matrix}$

[0067] where G.sub.o′,min(λ′,φ) is the lowest value in the free energy curve G.sub.o′(λ′,φ); G.sub.o″(λ′,φ) represents the postcalibration function of G.sub.o′(λ′,φ). All of the calibrated one-dimensional G.sub.o″(λ′,φ) functions can be assembled into the target two-dimensional G.sub.o″(λ,φ) function. Then, g.sub.m(λ,φ) can be adaptively updated as the instantaneously estimated -G.sub.o″(λ,φ). This calibration procedure is based on the g.sub.m(λ,φ) function definition in Equation (10), specifically to fulfill the condition that the target energy function for the g.sub.m(λ,φ) free energy flattening treatment has already been flattened along the λ direction. In the DI-OST method according to the invention, Equations (11)-(13) constitute the recursion kernel.
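The three kernel steps of Equations (11)-(13) can be sketched with NumPy histograms as follows. This is an illustration under stated assumptions rather than the patented implementation: the function names, the bin layout, and the synthetic samples are hypothetical, the delta functions are replaced by bins, and the calibration term is applied with the sign that makes the Boltzmann weight of every λ row integrate to the same value, which is the flattening condition described above.

```python
import numpy as np


def abf_mean_force(lam_s, phi_s, f_s, lam_edges, phi_edges, k_phi=0.1):
    """Equation (12): per-bin average of -k_phi * (F_lambda - phi)."""
    force = -k_phi * (f_s - phi_s)
    bins = [lam_edges, phi_edges]
    total, _, _ = np.histogram2d(lam_s, phi_s, bins=bins, weights=force)
    count, _, _ = np.histogram2d(lam_s, phi_s, bins=bins)
    mean = np.where(count > 0, total / np.maximum(count, 1), 0.0)
    return mean, count


def integrate_along_phi(mean_force, phi_edges):
    """Equation (11): cumulative integral over phi within each lambda bin."""
    return np.cumsum(mean_force * np.diff(phi_edges)[None, :], axis=1)


def calibrate(g_rows, phi_edges, kt_o=0.596):
    """Equation (13): shift each row so that sum(exp(-G/kT_o) * dphi) equals 1."""
    dphi = np.diff(phi_edges)[None, :]
    shifted = g_rows - g_rows.min(axis=1, keepdims=True)
    z = (np.exp(-shifted / kt_o) * dphi).sum(axis=1, keepdims=True)
    return shifted + kt_o * np.log(z)


# Synthetic samples in which F_lambda always sits 2 units above phi.
rng = np.random.default_rng(0)
lam_s = rng.uniform(0.0, 1.0, 20000)
phi_s = rng.uniform(-5.0, 5.0, 20000)
f_s = phi_s + 2.0
lam_edges = np.linspace(0.0, 1.0, 6)
phi_edges = np.linspace(-5.0, 5.0, 11)

mean_force, count = abf_mean_force(lam_s, phi_s, f_s, lam_edges, phi_edges)
g_cal = calibrate(integrate_along_phi(mean_force, phi_edges), phi_edges)
g_m = -g_cal  # bias update: negative of the calibrated surface
print(g_m.shape)
```

Because only running sums of the restraint force and of the visit counts are stored, the estimate can be refreshed at any scheduled update time without revisiting earlier samples.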

[0068] Regarding the recursion slave, the TI formula in Equation (9) is still used to estimate G.sub.o(λ); thus (dG.sub.o/dλ)|.sub.λ′ at each λ′ state needs to be evaluated. Unlike the recursion in the original OSRW algorithm, where the target function of the recursion kernel is -G.sub.o(λ,F.sub.λ), here the target function of the recursion kernel, -G.sub.o″(λ,φ), does not provide direct information on the generalized force F.sub.λ distributions. Because F.sub.λ is restrained to φ, a simple but approximate way of estimating (dG.sub.o/dλ)|.sub.λ′ rests on the assumption <φ>.sub.λ′=<F.sub.λ>.sub.λ′. Thus, (dG.sub.o/dλ)|.sub.λ′ can be approximated via Equation (14).

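[00011] $\begin{matrix} \left.\frac{dG_{o}}{d\lambda}\right|_{\lambda'} = \langle F_{\lambda}\rangle_{\lambda'} \approx \langle\phi\rangle_{\lambda'} = \frac{\int_{\phi} \phi\exp\{\beta_{o}[g_{m}(\lambda,\phi)]\}\,\delta(\lambda-\lambda')}{\int_{\phi} \exp\{\beta_{o}[g_{m}(\lambda,\phi)]\}\,\delta(\lambda-\lambda')} & (14) \end{matrix}$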

[0069] To more rigorously estimate (dG.sub.o/dλ)|.sub.λ′, G.sub.o(λ,F.sub.λ) needs to be calculated for each λ′ state as described above. Notably, the samples collected at the λ′ state with F.sub.λ=F.sub.λ′ can be considered as being obtained from multiple independent ensembles, each of which corresponds to a unique restraining reference value φ′. According to the umbrella integration relationship, based on the samples from each (λ′,φ′) restraining ensemble, dG.sub.o(λ′,F.sub.λ)/dF.sub.λ at F.sub.λ=F.sub.λ′ can be estimated as (1/β.sub.o)(F.sub.λ′-F̄.sub.λ.sup.λ′,φ′)/(σ.sub.λ.sup.λ′,φ′).sup.2-k.sub.φ(F.sub.λ′-φ′), where F̄.sub.λ.sup.λ′,φ′ stands for the average of the F.sub.λ values of all of the samples in the (λ′,φ′) restraining ensemble and σ.sub.λ.sup.λ′,φ′ represents the standard deviation of these samples, so that its square is their variance. Using the multihistogram approach to combine the estimations from all of the restraining ensembles that are visited at the λ′ state, dG.sub.o(λ′,F.sub.λ)/dF.sub.λ at F.sub.λ=F.sub.λ′ can be calculated as shown in Equation (15)

[00012] $\begin{matrix} \left.\frac{dG_{o}(\lambda',F_{\lambda})}{dF_{\lambda}}\right|_{F_{\lambda'}} = \frac{\sum_{\phi'} N(\phi' \mid \lambda',F_{\lambda'})\left[\frac{1}{\beta_{o}}\,\frac{F_{\lambda'} - \overline{F_{\lambda}^{\lambda',\phi'}}}{(\sigma_{\lambda}^{\lambda',\phi'})^{2}} - k_{\phi}(F_{\lambda'} - \phi')\right]}{\sum_{\phi'} N(\phi' \mid \lambda',F_{\lambda'})} & (15) \end{matrix}$

[0070] where N(φ′|λ′,F.sub.λ′) denotes the total number of the (λ′,F.sub.λ′) samples that are collected from the φ′ restraining ensemble.

[0071] Then, based on the TI relationship, G.sub.o′(λ′,F.sub.λ) can be calculated according to Equation (16).

[00013] $\begin{matrix} G_{o'}(\lambda',F_{\lambda}) = \int_{F_{\lambda'}} \left.\frac{dG_{o}(\lambda',F_{\lambda})}{dF_{\lambda}}\right|_{F_{\lambda'}} dF_{\lambda'} & (16) \end{matrix}$

[0072] Again, as in Equation (11), the numerical boundary of G.sub.o′(λ′,F.sub.λ), i.e., the integration boundary in Equation (16), changes as the recursion proceeds. Following the corresponding derivation in the original OSRW method, we can obtain (dG.sub.o/dλ)|.sub.λ′ at the λ′ state using Equation (17).

[00014] $\begin{matrix} \left.\frac{dG_{o}}{d\lambda}\right|_{\lambda'} = \langle F_{\lambda}\rangle_{\lambda'} = \frac{\int_{F_{\lambda}} F_{\lambda}\exp\{-\beta_{o}[G_{o'}(\lambda,F_{\lambda})]\}\,\delta(\lambda-\lambda')}{\int_{F_{\lambda}} \exp\{-\beta_{o}[G_{o'}(\lambda,F_{\lambda})]\}\,\delta(\lambda-\lambda')} & (17) \end{matrix}$

[0073] On the basis of the corresponding TI formula in Equation (9), f.sub.m(λ), which is targeted as -G.sub.o(λ), can then be adaptively updated. In the DI-OST method according to the invention, Equations (15)-(17) and (9) constitute the recursion slave. Notably, f.sub.m(λ) does not have to be strictly equal to -G.sub.o(λ); it is highly recommended to employ the approximate approach based on Equations (11)-(14) and (9) to update f.sub.m(λ), and the more rigorous approach based on Equations (15)-(17) and (9) to estimate G.sub.o(λ), because <φ>.sub.λ′ in Equation (14) is directly estimated from the φ-space ABF calculations (Equations (11) and (12)) and should converge faster. In the DI-OST method, both the recursion kernel and the recursion slave are based on integration schemes. Therefore, it is named the double-integration recursion method.
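The umbrella integration estimate of Equation (15) can be sketched for a single λ′ state as follows. This is a minimal NumPy illustration, not the patented implementation: the function name, the dictionary layout keyed by the restraining reference φ′, and the synthetic harmonic test case are hypothetical. The resulting derivative would then be integrated over F.sub.λ (Equation (16)) and Boltzmann-averaged (Equation (17)).

```python
import numpy as np


def dg_df_umbrella_integration(samples_by_phi, f_edges, beta_o, k_phi):
    """Equation (15): count-weighted unbiased dG/dF_lambda on a grid of F_lambda bins."""
    centers = 0.5 * (f_edges[1:] + f_edges[:-1])
    num = np.zeros_like(centers)
    den = np.zeros_like(centers)
    for phi_ref, f_samples in samples_by_phi.items():
        f_samples = np.asarray(f_samples, dtype=float)
        mean, var = f_samples.mean(), f_samples.var()
        counts, _ = np.histogram(f_samples, bins=f_edges)
        # Biased-ensemble derivative minus the derivative of the restraint.
        local = (centers - mean) / (beta_o * var) - k_phi * (centers - phi_ref)
        num += counts * local
        den += counts
    deriv = np.full_like(centers, np.nan)  # NaN marks unvisited bins
    visited = den > 0
    deriv[visited] = num[visited] / den[visited]
    return centers, deriv


# Synthetic check: true profile 0.5 * kappa * F^2, sampled under three restraints.
rng = np.random.default_rng(1)
beta_o, k_phi, kappa = 1.0 / 0.596, 0.1, 1.0
sigma = np.sqrt(1.0 / (beta_o * (kappa + k_phi)))
samples = {phi: rng.normal(k_phi * phi / (kappa + k_phi), sigma, 20000)
           for phi in (-2.0, 0.0, 2.0)}
centers, deriv = dg_df_umbrella_integration(samples, np.linspace(-2.0, 2.0, 21),
                                            beta_o, k_phi)
print(np.nanmax(np.abs(deriv - kappa * centers)))
```

For the harmonic test profile the restraint contribution cancels analytically, so the printed deviation reflects only the statistical noise of the sampled means and variances.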

[0074] The double-integration recursion based OST method is implemented in the orthogonal space sampling module, which is currently coupled with our customized CHARMM program. See Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; Swaminathan, S.; Karplus, M. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 1983, 4, 187-217, and Brooks, B. R.; Brooks, C. L.; Mackerell, A. D.; Nilsson, L.; Petrella, R. J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; Caflisch, A.; Caves, L.; Cui, Q.; Dinner, A. R.; Feig, M.; Fischer, S.; Gao, J.; Hodoscek, M.; Im, W.; Kuczera, K.; Lazaridis, T.; Ma, J.; Ovchinnikov, V.; Paci, E.; Pastor, R. W.; Post, C. B.; Pu, J. Z.; Schaefer, M.; Tidor, B.; Venable, R. M.; Woodcock, H. L.; Wu, X.; Yang, W.; York, D. M.; Karplus, M. CHARMM: The biomolecular simulation program. J. Comput. Chem. 2009, 30, 1545-1614. CHARMM is available from Harvard University.

[0075] In the present invention, the van der Waals soft-core potential form shown in Equation (18) is employed to treat the atoms that are annihilated

[00015] $\begin{matrix} U_{softcore\,vdW} = (1-\lambda)\left[\frac{A}{(\alpha_{vdW}\lambda^{2} + r^{6})^{2}} - \frac{B}{\alpha_{vdW}\lambda^{2} + r^{6}}\right] & (18) \end{matrix}$

[0076] where α.sub.vdW is the van der Waals soft-core shifting parameter. It is noted that Equation (18) is different from the one in the currently released CHARMM program. The electrostatic soft-core potential is based on Equation (19)

[00016] $\begin{matrix} U_{softcore\,elec} = \frac{(1-\lambda)\,Q_{A}Q_{B}}{\sqrt{\alpha_{elec}\lambda + r^{2}}} & (19) \end{matrix}$

[0077] where α.sub.elec is the electrostatic soft-core shifting parameter. In Equations (18) and (19), the annihilation is assumed to occur at the state of λ=1; to be consistent, in this study, all of the dummy atoms are set at the state of λ=1.
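The soft-core forms of Equations (18) and (19) can be evaluated as in the following Python sketch. The function names and the test coefficients are hypothetical; the default shifting parameters follow the values 0.5 and 5.0 quoted for the test simulations in this description, and any unit conversion constant for the electrostatic term is omitted, as in Equation (19).

```python
import math


def softcore_vdw(r, lam, a_coef, b_coef, alpha_vdw=0.5):
    """Equation (18): (1 - lam) * [A / d^2 - B / d] with d = alpha_vdw * lam^2 + r^6."""
    d = alpha_vdw * lam ** 2 + r ** 6
    return (1.0 - lam) * (a_coef / d ** 2 - b_coef / d)


def softcore_elec(r, lam, q_a, q_b, alpha_elec=5.0):
    """Equation (19): (1 - lam) * Q_A * Q_B / sqrt(alpha_elec * lam + r^2)."""
    return (1.0 - lam) * q_a * q_b / math.sqrt(alpha_elec * lam + r ** 2)


# At lam = 0 the van der Waals term is the plain A/r^12 - B/r^6 form.
print(softcore_vdw(2.0, 0.0, a_coef=4096.0, b_coef=64.0))
# At partial coupling both terms stay finite even at zero separation.
print(softcore_vdw(0.0, 0.5, a_coef=4096.0, b_coef=64.0))
print(softcore_elec(0.0, 0.5, q_a=1.0, q_b=-1.0))
# At lam = 1 the annihilated atoms no longer interact.
print(softcore_vdw(1.5, 1.0, a_coef=4096.0, b_coef=64.0))
```

The shifting terms vanish at λ=0, so the unmodified interaction is recovered for the fully coupled state, while for any λ greater than 0 the denominators remain positive and the end point singularity is avoided.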

[0078] In the present invention, the DI-OST method is demonstrated in the context of alchemical free energy simulation, specifically to calculate the free energy difference between benzyl phosphonate and difluorobenzyl phosphonate in aqueous solution, to estimate the solvation free energy of the octanol molecule, and to predict the Barnase-Barstar nontrivial binding affinity change induced by the Barnase N58A mutation.

[0079] The molecules of benzyl phosphonate (BP) and difluorobenzyl phosphonate (F2BP) are the side chain analogues of prototypical phosphotyrosine mimetics, which are common targets in drug discovery. The free energy difference between these two molecules in aqueous solution, ΔG.sub.BP.fwdarw.F2BP.sup.aqueous, has been used as a test bed to analyze free energy simulation methods. In practical studies, if combined with the free energy difference in gas phase ΔG.sub.BP.fwdarw.F2BP.sup.gas, ΔG.sub.BP.fwdarw.F2BP.sup.aqueous-ΔG.sub.BP.fwdarw.F2BP.sup.gas gives rise to the value of the solvation energy difference; if combined with the free energy difference in a protein binding site ΔG.sub.BP.fwdarw.F2BP.sup.protein, ΔG.sub.BP.fwdarw.F2BP.sup.protein-ΔG.sub.BP.fwdarw.F2BP.sup.aqueous gives rise to the value of the binding free energy difference. Here, the test calculations on ΔG.sub.BP.fwdarw.F2BP.sup.aqueous are used to comparatively evaluate the original OSRW method and the invention's DI-OST method in the aspects of algorithm robustness and long-time convergence. On the basis of each of the two methods, five sets of independent simulations were carried out. The MD simulation setup was the same as the one in the earlier studies, where the BP and F2BP molecules are described with the CHARMM22 parameters. In total, 294 water molecules are included in the truncated octahedral box; the water molecules are treated with the TIP3P model. The diagram below shows the setup of the alchemical transition from BP to F2BP.

##STR00001##

[0080] Because there is no vanishing atom in either of the end states, the linear hybrid energy function (as described by Equation (4)) is used in this model study.

[0081] In the five OSRW simulations, g.sub.m(λ,F.sub.λ) (in Equation (7)) was updated every 10 time steps; the height of the Gaussian function h was set as 0.01 kcal/mol; the widths of the Gaussian function, w.sub.1 and w.sub.2, were set as 0.01 and 4 kcal/mol respectively; and f.sub.m(λ) was updated (based on Equations (8) and (9)) once per 1000 time steps. In the five DI-OST simulations, the samples were collected every time step; g.sub.m(λ,φ) was updated (based on Equations (11)-(13)) once per 1000 time steps; f.sub.m(λ) was updated (based on Equations (15)-(17) and (9)) once per 1000 time steps; and T.sub.ES was set as 600 K (the system reservoir temperature is 300 K). The length of each simulation is 20 nanoseconds (ns).

[0082] The model calculation on the octanol solvation free energy is to understand the role of the orthogonal space sampling temperature T.sub.ES. The octanol molecule, which is described by the CHARMM general force field (CGFF), is embedded in a truncated octahedral water box with a total of 713 TIP3P water molecules. In the alchemical free energy simulation setup, the solvated octanol molecule (λ=0) is changed to a gas phase molecule (λ=1), which does not have any interaction with the solvent molecules. Accordingly, all of the van der Waals and the electrostatic energy terms describing the solute-solvent interactions are subject to the soft-core treatment, in which α.sub.vdW is set as 0.5 and α.sub.elec is set as 5.0. Then, the solvation free energy of octanol ΔG.sub.octanol.sup.solvation can be estimated as the negative of the free energy difference ΔG.sub.λ=0.fwdarw.λ=1 between the two end states.

[0083] To understand the influence of T.sub.ES on sampling efficiency, two sets of independent DI-OST simulations were run, each of which includes eight simulations, with T.sub.ES set as 750 K in one set and 375 K in the other (the system reservoir temperature is 300 K). The samples were collected every time step. g.sub.m(λ,φ) was updated (based on Equations (11)-(13)) once per 1000 time steps. f.sub.m(λ) was also updated (based on Equations (15)-(17) and (9)) once per 1000 time steps. The length of each simulation is 17 ns.

[0084] The model study on the binding between barnase, an extracellular RNase of Bacillus amyloliquefaciens, and barstar, the intracellular polypeptide inhibitor of barnase, demonstrates the DI-OST method in predicting mutation induced protein-protein binding affinity changes. The barnase N58A mutation is located at the second layer of the binding interface; this noncharging mutation causes a binding affinity loss of about 3.1 kcal/mol. The DI-OST simulations were performed to calculate the alchemical free energy changes in two environments: ΔG.sub.Asn.fwdarw.Ala.sup.complex in the barnase-barstar complex and ΔG.sub.Asn.fwdarw.Ala.sup.barnase in the unbound barnase. The binding affinity change ΔΔG.sub.Asn.fwdarw.Ala can be calculated as ΔG.sub.Asn.fwdarw.Ala.sup.complex-ΔG.sub.Asn.fwdarw.Ala.sup.barnase. All of the systems are treated with the CHARMM27/CMAP model. In the model for the ΔG.sub.Asn.fwdarw.Ala.sup.complex calculation, the barnase-barstar complex (with the PDB code of 1BRS) is embedded in the octahedral box with 18,902 water molecules; in the model for the ΔG.sub.Asn.fwdarw.Ala.sup.barnase calculation, the unbound barnase (also based on the PDB code of 1BRS) is embedded in the octahedral box with 11,291 water molecules.

[0085] In the alchemical free energy simulation setup shown in the diagram below, the vanishing atoms in Asn58 (λ=0) are switched to the corresponding dummy atoms at λ=1. The bond, angle, and dihedral terms associated with the dummy atoms are set identical to the corresponding ones of the original asparagine residue. All of the van der Waals terms and the electrostatic energy terms associated with the dummy atoms are subject to the soft-core treatment, in which α.sub.vdW was set as 0.5 and α.sub.elec was set as 5.0. The three DI-OST simulations were performed with T.sub.ES set as 1500 K (the system reservoir temperature is 300 K); the samples were collected every time step. g.sub.m(λ,φ) was updated (based on Equations (11)-(13)) once per 1000 time steps. f.sub.m(λ) was also updated (based on Equations (15)-(17) and (9)) once per 1000 time steps.

##STR00002##

[0086] The CGFF parameters were generated through the CHARMM-GUI server. The particle mesh Ewald (PME) method was applied to treat the long-range Coulombic interactions, while the short-range interactions were totally switched off at 12 Å. The Nosé-Hoover method was employed to maintain a constant reservoir temperature of 300 K, and the Langevin piston algorithm was used to maintain a constant pressure of 1 atm. The time step was set as 1 fs.

[0087] The results from one of the five DI-OST simulations are summarized as follows. In about 800 ps, the scaling parameter λ completed the first one-way trip, which started at λ=0. It is noted that free energy estimations are only possible when the sampling covers the entire λ space. At 820 ps, the initial estimation of ΔG.sub.BP.fwdarw.F2BP.sup.aqueous gives 299.91 kcal/mol, which is very close to the finally converged result, 299.77 kcal/mol. In the DI-OST scheme, the ((T.sub.ES-T.sub.o)/T.sub.ES)g.sub.m(,) biasing term accelerates the λ moves, which through the restraint term k (F.sub.).sup.2 induces simultaneous fluctuation enlargement of the generalized force F.sub.λ. In these simulations, the restraint force constant k was set as 0.1 (kcal/mol).sup.1; F.sub.λ and are robustly synchronized. The recursive orthogonal space tempering treatment allows F.sub.λ fluctuations to be continually enlarged until around 8 ns, when the space sampling boundary imposed by T.sub.ES was reached. Subsequent recursion kernel and recursion slave updates enable continuous refinement of the g.sub.m(,) and f.sub.m(λ) terms. At the end of the 20 ns simulation, the orthogonal space sampling temperature of 600 K allows the fluctuations of and F.sub.λ to overcome 9 kT strongly coupled free energy barriers that are hidden in the orthogonal space.
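The notion of a first one-way trip used above can be made concrete with a short sketch. The Python function below is only an illustration under stated assumptions: the function name, the tolerance `tol`, and the sample trajectory are hypothetical and are not taken from the FLOSS software. It scans a recorded λ trajectory and reports the first time at which λ reaches the end state opposite to the one where it started.

```python
from typing import Optional, Sequence


def first_one_way_trip(lam: Sequence[float], dt_ps: float,
                       lo: float = 0.0, hi: float = 1.0,
                       tol: float = 0.01) -> Optional[float]:
    """Return the time (ps) at which lambda first reaches the end state
    opposite to its starting end state, or None if it never does.

    tol is a hypothetical tolerance for deciding that an end state is reached.
    """
    if not lam:
        return None
    # Decide which end state the trajectory starts from.
    starts_low = abs(lam[0] - lo) <= abs(lam[0] - hi)
    for i, value in enumerate(lam):
        if starts_low and value >= hi - tol:
            return i * dt_ps
        if not starts_low and value <= lo + tol:
            return i * dt_ps
    return None


# Hypothetical trajectory, one sample every 100 ps (illustrative values only).
trajectory = [0.0, 0.2, 0.1, 0.4, 0.7, 0.6, 0.9, 0.995, 0.8]
print(first_one_way_trip(trajectory, dt_ps=100.0))
```

Free energy estimates would only be reported for times later than the returned value, since before that point part of the λ range has not been visited.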

[0088] The BP and D2BP molecules differ only in their local polarity. One would expect moderate environment changes to be associated with the target alchemical transition; simulating the BP-D2BP transition may not fully demonstrate the sampling power of the DI-OST method. However, because of its simplicity, this is an ideal system to test the robustness and the long-time convergence behavior of a free energy simulation method. The estimated free energies from the five DI-OST simulations converge to the average value of 299.77 kcal/mol, which quantitatively agrees with the results obtained from the classical free energy simulation studies. Notably, as mentioned above, in this model study, we only targeted our calculations on the estimation of the alchemical free energy difference ΔG.sub.BP.fwdarw.F2BP.sup.aqueous, the value of which alone does not have any physical meaning. With simulation lengths of 20 ns, the variance of the five independently estimated values is as low as 0.01 kcal/mol. Within only 940 picoseconds (ps), all five DI-OST simulations had completed their first one-way trips. At that point, the average of the estimated values is 299.82 kcal/mol, and the variance of the calculation results is 0.12 kcal/mol. In 2 ns, the average of the estimated values converges to 299.79 kcal/mol, and the variance of the calculation results is 0.04 kcal/mol. In DI-OST simulations, G.sub.o(λ) [the negative of f.sub.m(λ)] should converge faster than G.sub.o(,) [the negative of g.sub.m(,)] because the free energy derivative dG.sub.o(λ)/dλ is largely determined by the lower region of the free energy surface along (λ, F.sub.λ). Besides the sampling efficiency, the DI-OST method provides robust free energy estimation and rigorous long-time convergence.

[0089] As discussed above, the original OSRW method is limited in two aspects. First, the orthogonal space sampling temperature T.sub.ES is effectively infinity; thus, there is no boundary to restrict the magnitude of F.sub.λ fluctuation enlargement. The orthogonal space free energy surface flattening treatment enlarges F.sub.λ fluctuations boundlessly. In comparison with the DI-OST simulations, which have their sampling boundaries imposed by the finite T.sub.ES value (600 K), the OSRW simulations have ever-increasing sampling coverage. Consequently, both the average and the variance of the free energy results show time-dependent oscillatory behaviors. Second, the original OSRW method is based on the metadynamics recursion kernel. The metadynamics kernel provides extra dynamic boosts on λ moves, so the first one-way trips can be quickly completed (around 350 ps on average). Although the free energy estimations could be started earlier, both the short-time and the long-time convergence behaviors of the OSRW simulations are not nearly as good as those of the DI-OST simulations. For example, at 2 ns, the average of the free energy values from the OSRW simulations converges to 299.97 kcal/mol, and the variance of these results is about 0.10 kcal/mol. The metadynamics sampling in the OSRW simulations is by nature in the nonequilibrium regime; in comparison, the sampling in the DI-OST simulations starts in the near-equilibrium regime and rigorously approaches the equilibrium regime as the two recursion target functions converge. The robustness and the convergence behavior of OSRW simulations can be improved by decreasing the employed Gaussian height; however, it is expected that the orthogonal space recursion (the recursion kernel) efficiency will then be lower and the g.sub.m(λ, F.sub.λ) convergence will be slower.

[0090] The DI-OST algorithm allows the orthogonal space sampling strategy to be more robustly realized for free energy simulations. It should be noted that, although the above comparison demonstrates the better robustness and long-time convergence behavior of the DI-OST simulations, the absolute performance of the OSRW simulations within the simulated time scale is also expected to be superior.

[0091] Among various alchemical free energy simulation applications, solvation free energy calculations are unique because they may require extensive sampling while their results remain quantitatively verifiable by classical free energy simulations. In this study, we carried out solvation energy calculations on the octanol molecule to understand the role of the orthogonal space sampling temperature T.sub.ES in the DI-OST method.

[0092] As discussed above, the sampling length required to achieve the first one-way trip is a key factor in sampling efficiency measurement. The average of the first one-way trip sampling lengths in the eight T.sub.ES=750 K DI-OST simulations is 1.6 ns; the variance of these sampling lengths is 0.53 ns. In comparison, the average of the first one-way trip sampling lengths in the eight T.sub.ES=375 K DI-OST simulations is 3.57 ns, and the variance of the first one-way trip sampling lengths is 0.63 ns. The sampling bottleneck is located in the region of (0.7, 0.8); infrequent crossing of this region slows down overall round-trip diffusivity. The solute appearance/annihilation transition is the major event in this sampling bottleneck region. It is noted that due to the employment of the soft-core potential, the solute appearance/annihilation transition is shifted from λ=1, the expected region when the linear hybrid alchemical potential is applied, to this new region. Solvent molecule reorganizations are the hidden events that are associated with solute insertions/annihilations. When the orthogonal space sampling temperature T.sub.ES is higher (for example 750 K), the magnitude of the F.sub.λ fluctuation is expected to be larger and hidden free energy barriers associated with solvent reorganizations can be more quickly crossed; thereby, the sampling of the bottleneck region can be more efficient.

[0093] In the eight T.sub.ES=750 K DI-OST simulations, the time-dependent averages and variances of the estimated desolvation free energies behave as follows. At around 2 ns, the average of the estimated values is 3.45 kcal/mol and the variance of these values is about 0.23 kcal/mol. At around 6 ns, the average of the estimated values drops to around 3.35 kcal/mol, while their variance decreases to 0.17 kcal/mol. At around 13.5 ns, the free energy estimations reach very nice convergence with the average value of 3.36 kcal/mol, and the estimation variance drops below 0.1 kcal/mol. By the inclusion of the long-range Lennard-Jones correction (0.79 kcal/mol), the predicted solvation energy, 4.15±0.1 kcal/mol, is in excellent agreement with the experimental value of 4.09 kcal/mol. At 17 ns, a nicely converged g.sub.m(,) function was obtained with the variance further reduced to 0.08 kcal/mol. The orthogonal space sampling temperature of 750 K allows the fluctuations of and F.sub.λ to quickly escape 5 kT strongly coupled free energy barriers. In comparison, the eight T.sub.ES=350 K DI-OST simulations have smaller sampling coverage in the orthogonal space. The lack of sampling in the orthogonal space not only leads to the longer sampling length requirement for the first one-way trips, as discussed above, but also leads to slower convergence. At 17 ns, some of the T.sub.ES=350 K DI-OST simulations have not yet converged well, as the variance among them is still larger than 0.1 kcal/mol. As a result, the average of these values is about 0.05 kcal/mol away from the average of the values estimated from the T.sub.ES=750 K simulations. With T.sub.ES=350 K, the orthogonal space sampling treatment only allows the fluctuations of and F.sub.λ to escape less than 2 kT strongly coupled hidden free energy barriers.
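The bookkeeping behind these numbers can be sketched in a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the use of the sample standard deviation as the spread statistic is an assumption, since the text reports a "variance" in kcal/mol without defining the estimator. The only values taken from the text are the converged average (3.36 kcal/mol), the long-range Lennard-Jones correction (0.79 kcal/mol), and the experimental value (4.09 kcal/mol), used here as magnitudes exactly as printed.

```python
import statistics
from typing import Sequence, Tuple


def summarize(estimates: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over independent simulations.

    Assumption: the spread reported in the text may use another estimator.
    """
    return statistics.mean(estimates), statistics.stdev(estimates)


def corrected_solvation_energy(simulated: float, lj_correction: float) -> float:
    """Add the long-range Lennard-Jones correction to the simulated value."""
    return simulated + lj_correction


# Values quoted in the text (kcal/mol), used as printed.
value = corrected_solvation_energy(3.36, 0.79)
deviation = abs(value - 4.09)
print(round(value, 2), round(deviation, 2))
```

With per-simulation estimates available, `summarize` would be called on the eight values at each reporting time to trace the time-dependent average and spread.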

[0094] As shown in the above analysis, the orthogonal space tempering treatment allows the sampling bottleneck regions, where hidden free energy barriers exist, to be more efficiently explored. If there is no hidden free energy barrier in the orthogonal space, a higher orthogonal space sampling temperature T.sub.ES may introduce more diffusion sampling overhead, which might lower free energy estimation precision. In practical biomolecular simulation studies, there usually exist large hidden free energy barriers, and then, obtaining accurate free energy estimation should be a higher priority than improving estimation precision, as long as the estimation precision is in a reasonable range. On the basis of our experience, when a new system is explored, we would like to recommend setting T.sub.ES in a range between 750 and 1500 K.
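To give a feel for how the choice of T.sub.ES enters, the sketch below evaluates the prefactor (T.sub.ES - T.sub.o)/T.sub.ES that scales the biasing term in the DI-OST scheme, and converts barrier heights quoted in kT into kcal/mol. This is a minimal illustration: the function names are hypothetical, T.sub.o is taken to be the 300 K reservoir temperature used in the simulations, and the Boltzmann constant value is the standard one in kcal/(mol K).

```python
K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def tempering_factor(t_es: float, t_0: float = 300.0) -> float:
    """Prefactor (T_ES - T_0) / T_ES applied to the biasing term.

    It tends to 1 as T_ES grows, the limit in which T_ES is effectively
    infinite as in the original OSRW method.
    """
    if t_es <= t_0:
        raise ValueError("T_ES is expected to exceed the reservoir temperature")
    return (t_es - t_0) / t_es


def barrier_in_kcal(n_kt: float, t_0: float = 300.0) -> float:
    """Convert a barrier height given in units of kT into kcal/mol."""
    return n_kt * K_B * t_0


for t_es in (600.0, 750.0, 1500.0):
    print(t_es, tempering_factor(t_es))
```

At 300 K, one kT is roughly 0.6 kcal/mol, so a barrier quoted as several kT can be compared directly with the kcal/mol free energy values reported elsewhere in this description.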

[0095] It has been known that charge-charge interactions are directly responsible for the strong binding between Barnase and Barstar. The Barnase Asn58 residue is located at the second layer of the binding interface. As measured experimentally, the noncharging N58A mutation causes 3.1 kcal/mol of binding affinity loss. This unusual electrostatic response suggests that nontrivial conformational changes are likely to be coupled with the N58A mutation. To quantitatively understand the N58A-induced binding affinity change, a specialized technique like the DI-OST method should be applied to ensure adequate sampling of the coupled structural transitions. To confidently sample such conformational changes, in the DI-OST simulations, T.sub.ES is set at 1500 K.

[0096] Two DI-OST simulations, which are respectively based on the Barnase-Barstar (bound) complex structure and the Barnase (unbound) structure, were performed. In 4 ns, multiple round-trips were realized in both of the DI-OST simulations. It took the bound-state simulation only 1.1 ns to complete the first one-way trip, while it took the unbound-state sampling about 1.8 ns to cover the entire order parameter range. The dynamics of the scaling parameter in the unbound-state simulation reveals that the region of λ=0.4 is the sampling bottleneck area, where slow gating events need to occur for continuing λ travels. In 4 ns, good convergence was realized in both of the free energy simulations. Through the DI-OST recursion treatment, the λ-dependent free energy derivatives dG.sub.o/dλ were calculated; the binding affinity change ΔΔG.sub.Asn.fwdarw.Ala is largely attributable to the difference that occurs near the alanine state (λ=1), where the two free energy derivative curves are distinct from each other. As discussed below, the conformational change of the mutated (N58A) Barnase induced by the binding/unbinding of Barstar is mainly responsible for ΔΔG.sub.Asn.fwdarw.Ala. On the basis of the TI formula (Equation (9)), ΔG.sub.Asn.fwdarw.Ala.sup.complex is estimated to be 94.0 kcal/mol and ΔG.sub.Asn.fwdarw.Ala.sup.Barnase is estimated to be 91.1 kcal/mol; thus ΔΔG.sub.Asn.fwdarw.Ala can be predicted to be 2.9 kcal/mol, which is in excellent agreement with the experimental value of 3.1 kcal/mol. The orthogonal space tempering treatment allows the fluctuations of and F.sub.λ to overcome 12-14 kT of the strongly coupled hidden free energy barriers.
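The two numerical steps in this paragraph, integrating the free energy derivative over λ and differencing the two environments, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the trapezoidal rule is an assumed quadrature (the text refers to Equation (9) without restating it here), and no derivative curve from the actual simulations is reproduced. The two free energy values are the ones quoted above.

```python
from typing import Sequence


def ti_free_energy(lams: Sequence[float], dg_dlam: Sequence[float]) -> float:
    """Integrate dG/dlambda over lambda with the trapezoidal rule.

    Assumption: a simple quadrature chosen for illustration.
    """
    if len(lams) != len(dg_dlam) or len(lams) < 2:
        raise ValueError("need matching grids with at least two points")
    total = 0.0
    for i in range(1, len(lams)):
        total += 0.5 * (dg_dlam[i] + dg_dlam[i - 1]) * (lams[i] - lams[i - 1])
    return total


def binding_affinity_change(dg_complex: float, dg_unbound: float) -> float:
    """Binding affinity change: complex-state value minus unbound-state value."""
    return dg_complex - dg_unbound


# Values quoted in the text (kcal/mol).
ddg = binding_affinity_change(94.0, 91.1)
print(round(ddg, 1))
```

In practice `ti_free_energy` would be applied once to the derivative curve from the complex simulation and once to the curve from the unbound simulation before taking the difference.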

[0097] The comparison of the crystal structures (1BRS and 1BNR) suggests that the Barnase protein has the identical conformation in the bound and the unbound states. The Barnase Asn58 is located on a Barstar-binding loop, but at the opposite side from the binding interface residues, for instance, Arg59. In these structures, the binding interface region on the Arg59-containing loop is zipped by the hydrogen bond between the amide group of Gly61 and the carbonyl group of Asn58; thereby Arg59 can be accurately positioned into the binding site. This zipped structure is further locked by two additional hydrogen bonds between the Asn58 side chain and the backbone amide/carbonyl groups. In the bound-state DI-OST simulation, with residue 58 repeatedly interconverted between the two end chemical states, asparagine and alanine, the structure of the Arg59-containing loop stayed unchanged, even when λ approached the alanine state (λ=1). The hydrogen bond between the amide group of Gly61 and the carbonyl group of Asn58 was not broken during the entire simulation. The fluctuation of the distance between residues 58 and 63 was modest. In contrast, in the unbound-state simulation, synchronously with the λ move, the Arg59-containing loop varied back and forth between the original zipped conformation (at the asparagine state when λ=0) and a newly formed unzipped conformation (at the alanine state when λ=1). When residue 58 turned to alanine, the distance between residues 58 and 63 increased, and when λ traveled back to the asparagine state, the canonical hydrogen bonds between these two residues were formed again. Correspondingly, the zipping hydrogen bond repetitively broke and reformed. On the unzipped loop of the unbound N58A mutant, Arg59 flips away from its wild-type conformation, which is originally preorganized to bind Barstar.

[0098] The above analysis suggests that there is strong coupling between the Barnase-Barstar binding and the Arg59-containing loop zipping, and that Asn58 plays a pivotal role in prestabilizing the zipped conformation of the Arg59-containing loop when Barnase is in the unbound state. Therefore, the Barnase-Barstar binding can be enhanced. When Asn58 is mutated to alanine, the Arg59-containing loop in the unbound Barnase is unzipped due to the loss of both the locking hydrogen bonds by Asn58 and the binding of Barstar. When the N58A mutant binds Barstar, some free energy penalty needs to be paid in order to form the bound conformation, which, as revealed by the bound-state DI-OST simulation, stays zipped in the Barstar-bound state regardless of the existence of Asn58. The two simulations share similar free energy derivative curves near the asparagine (λ=0) state; this indicates that there is only a modest contribution from the direct electrostatic interaction difference to the binding affinity change. In essence, the binding affinity change induced by the N58A mutation is largely attributable to the mutation-induced conformational change in the unbound state. The DI-OST method allows the corresponding conformational change to be sampled synchronously with the λ moves; therefore, the binding affinity change can be efficiently predicted.

[0099] The simulations described above were performed using a 16-core Intel 3.2 GHz cluster. However, as discussed below, other computing platforms may be preferred.

[0100] Turning now to FIG. 1, the invention is realized with the implementation of two pieces of software: a modified version of CHARMM and FLOSS (which is the software that implements the above-described orthogonal space recursion and propagation calculations). The FLOSS software can be obtained from Florida State University.

[0101] As shown, input is provided using the FLOSS command line interface. The code listing below shows an example of the input:

TABLE-US-00001
!!!!!!! OSRW ======================================================
BLOCK 3
call 2 sele none end
call 3 sele none end
RMLA BOND THET !DIHE IMPH
!RMBOnd and RMANgle work only in lambda-dynamics.
QLDM THETa TYP2 THE0 0.00001
LANG temp 300.0
!LDIN BLOCKnum LAMBda LamVelocity LamMass ReferenceE
LDIN 1 1.0 0.0 100.0 0.0 200.0
LDIN 2 1.0 0.0 100.0 0.0 200.0
LDIN 3 0.0 0.0 100.0 0.0 200.0
LDMA ! key word for convert lambdas into coefficient matrix
end
set ndyn 20000000
open unit 48 form write name gauss.dat
open unit 49 form write name gauss.rst
open unit 77 form write name Flambda.dat
open unit 78 form write name Flambda-fitting-parameters.dat
open unit 88 form write name free.dat
open unit 9 form write name dvdl.dat
!open unit 55 form write name wtmp.dat
!open unit 66 form write name fitting.dat
!open unit 67 form write name fitting.rst
! Turn on umbrella sampling fitting facilities
umbrella lamb nresol 100 trig 30 poly 20
umbrella init nsim 1000 update 10000 equi 1000 thresh 100000 temp 300
umbrella stoff
set fqrestart 1000 ! use the same output frequency for gauss.rst, Flambda-fitting-parameters.dat, and @j_dyn1.res.
open unit 60 form write name g2d.dat
open unit 61 form write name g2d.rst
open unit 39 form write name flc.dat
open unit 29 form write name lamct.dat
! Turn on OSRW
OSRW QABF BSOR 4 UG2D 60 UGRS 61 -
DURS TMASa 0.00005 TTEMp 300.0 TBETa 500.0 TFORCeconst 0.05 -
RMME MLEN 5000 SLEN @ndyn MAXR 5000 -
UFLC 39 UNCL 29 -
QGS2 GSFA 0.9 -
ND2S 5 CCAD 5 GCBO 20.0 GCON 25.0 QFLC FTOS 200 -
MINLambda 0.0 MAXLambda 1.0 BINLambda 100 - ! minimum, maximum lambda (always 0 and 1 respectively for alchemical free energy simulation)
- ! and number of bins
MINDudl 400.0 MAXDudl 600.0 BINDudl 1000 - ! minimum, maximum dU/dlambda and number of bins
HGAUssian 0.0 - ! Gaussian height
WGALambda 0.01 WGADudl 2.0 - ! Widths for lambda and dU/dlambda
FQAGaussian 1 - ! frequency to add one Gaussian count
FQOGaussian @fqrestart - ! frequency to output Gaussian counts
UNOGaussian 48 UNGRestart 49 - ! units for Gaussian counts output and restart files
W4SM 5 - ! Cutoff bin numbers for Gaussian energy calculation
FQDUdl 10 - ! frequency to output dvdl.dat
UNDudl 9 - ! unit for dvdl.dat
FQFRee 100 - ! frequency to output free.dat
UNFree 88 - ! unit for free.dat
FITLambda 1000 - ! frequency to update F(lambda)
FITOutput @fqrestart - ! frequency to output Flambda.dat and Flambda-fitting-parameters.dat
NBLFitting 200 - ! number of bins of lambda for F(lambda)
NBDFitting 2000 - ! number of bins of dU/dlambda for F(lambda)
DUBC 0.001 DUBHeight 3.0 - ! cutoffs for <dU/dlambda> calculation
FITUnit 77 FTPU 78 - ! units for Flambda.dat and Flambda-fitting-parameters.dat
NOCEntering
!!!!!!!! END OSRW ======================================================

[0102] The input is then sent to the CHARMM molecular dynamics engine, which passes it to an input interpreter. The input interpreter sends some input to the FLOSS environment and some to the CHARMM environment. These two software engines operate in parallel and pass information to each other as illustrated. Both engines perform recursive calculations. CHARMM calculates molecular dynamics and FLOSS performs orthogonal space calculations. Two outputs are provided: the molecular trajectory and the free energy information from FLOSS, and a regular MD output from CHARMM. These outputs can be used to predict the most viable new drug candidates.

[0103] The FLOSS output includes four data files called dvdl.dat, flc.dat, free.dat, and g2d.pm3d.dat. The file dvdl.dat gives the time-dependent parameter changes. The file flc.dat gives the current free energy related information. The file free.dat gives the time-dependent estimated free energy values. The file g2d.pm3d.dat gives the orthogonal space free energy surface information. A snippet example of each file is listed below.

TABLE-US-00002
0 1.0000000 39.997488 73.154030 33.156542 3.1415927 0.0000000 73.154030 33.156542 0.0000000 0.0000000 0.0000000 0.0000000
10 0.99708939 36.283837 15.564164 51.848001 3.1324437 0.23703498E01 20.052203 53.326373 405.14451 177.87914 0.89760770E01 0.29567442E01
20 0.99539778 19.519949 32.634796 52.154745 3.1271294 0.14915055 35.771960 56.894757 251.89787 28.301134 0.62743291E01 0.94800234E01
30 0.99166579 20.173132 35.976338 56.149470 3.1154051 0.26341060E01 31.543365 60.959727 173.40017 121.16818 0.88659460E01 0.96205139E01
40 0.98581450 9.6569670 40.567302 50.224269 3.0970227 0.92541334E01 28.119055 54.267326 272.34691 311.89749 0.24896494 0.80861144E01
dvdl.dat
0.500000E02 136.265 0.00000 0.00000 0.00000
0.150000E01 136.265 1.36265 0.00000 0.00000
0.250000E01 136.265 2.72529 0.00000 0.00000
0.350000E01 136.265 4.08794 0.00000 0.00000
0.450000E01 136.265 5.45059 0.00000 0.00000
0.550000E01 136.265 6.81324 0.00000 0.00000
0.650000E01 136.265 8.17588 0.00000 0.00000
0.750000E01 136.265 9.53853 0.00000 0.00000
0.850000E01 136.265 10.9012 0.00000 0.00000
0.950000E01 136.265 12.2638 0.00000 0.00000
flc.dat
1000 100.000000
2000 100.000000
3000 100.000000
4000 83.694128
5000 84.032298
6000 84.492292
free.dat
0.6450000 61.50000 49.50000
0.6450000 61.50000 48.50000
0.6450000 61.50000 47.50000
0.6450000 61.50000 46.50000
0.6450000 61.50000 45.50000
0.6450000 61.50000 44.50000
0.6450000 61.50000 43.50000
0.6450000 61.50000 42.50000
g2d.pm3d.dat
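As an illustration of how the free.dat output might be consumed downstream, the sketch below parses the two-column snippet listed above. It is a minimal sketch under stated assumptions: the reader function is hypothetical and not part of FLOSS, and the interpretation of the first column as the MD step and the second as the running free energy estimate (kcal/mol) is inferred from the description of free.dat rather than from a documented file format.

```python
import io
from typing import List, TextIO, Tuple


def read_free_dat(handle: TextIO) -> List[Tuple[int, float]]:
    """Parse whitespace-separated (step, estimate) rows.

    Assumption: column 1 is the MD step, column 2 the free energy estimate.
    Lines that do not have exactly two fields are skipped.
    """
    records = []
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue
        records.append((int(fields[0]), float(fields[1])))
    return records


# The free.dat snippet from the listing above.
sample = io.StringIO(
    "1000 100.000000\n"
    "2000 100.000000\n"
    "3000 100.000000\n"
    "4000 83.694128\n"
    "5000 84.032298\n"
    "6000 84.492292\n"
)
records = read_free_dat(sample)
last_step, last_value = records[-1]
print(last_step, last_value)
```

For a real run, `open("free.dat")` would replace the in-memory sample, and the last record would give the most recent estimate.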

[0104] Turning now to FIG. 2, an apparatus for implementing the invention includes a processor with associated memory, an input and an output. As mentioned above, the test simulations were performed with a 16-core Intel 3.2 GHz cluster. However, it is believed that other computing platforms may be preferred. In particular, it is believed that GPUs are more powerful in implementing the invention than CPUs. The NVIDIA GPU platform (http://www.nvidia.com/object/gpu-applications/html) is the presently preferred platform.

Orthogonal Space Tempering Simulation Study of p38 MAP Kinase Inhibitors

[0105] The p38 MAP Kinase inhibitor dataset consists of 17 molecules with a common scaffold (FIGS. 3a and 3b). Computational prediction of the relative binding affinity via the alchemical thermodynamic cycle in FIG. 3a can be considered as a quantitative analogue of the intuitive assessment of an alternative substitution group in lead optimization by medicinal chemists. The p38 MAP Kinase inhibitor test system is unique in that non-additive effects such as protein binding site dynamics and water displacements may make non-trivial contributions. Therefore, the relative binding efficacies are challenging to predict either by chemical intuition or via more approximate methods. When brute-force MD propagations are applied, extensive simulation lengths are required to adequately sample these subtle effects. In an OST free energy simulation, through the effective tempering treatment along the generalized force direction, the sampling of environmental relaxations can be simultaneously enhanced. In the present study, the orthogonal space sampling temperature was set as 1500 K in order to gain quicker exploration of environmental responses.

[0106] In the OST scheme, the data collection for a single free energy calculation is made through one continuous simulation. As elaborated in the supporting information, the target-state B and the reference-state A (FIG. 3b) are connected in an alchemical potential, based on which the dynamics of the order parameter λ is propagated via the extended Hamiltonian formulation. Treated by the OST strategy, round-trip moves of the λ particle between the two end states (the state B: λ=0; and the reference state A: λ=1) are efficiently accelerated and the alchemical transition can thereby be repetitively sampled. According to the thermodynamic cycle in FIG. 3a, one relative binding affinity prediction requires two OST simulations: one on the protein-ligand complex and the other on the solvated ligand in a water box that is usually much smaller than the one containing the protein-ligand complex. Thus, the overall computing cost is mostly the time spent on the protein-ligand OST simulations. In the present study, compound 1 is employed as the common reference state A. As shown in FIG. 3b, when the target compound, for instance 14, has an excess number of atoms, the soft-core potentials were applied to treat the annihilated atoms. Notably, compounds 4 to 17 have two possible binding poses: the exposed conformer, in which the R1 and R2 groups directly face the bulk water, and the buried conformer, in which the R1 and R2 groups flip away from the binding entrance residues (FIG. 4). These two binding poses are likely to be separated by large free energy barriers. 
Thus, in order to confidently assess each of these compounds, two independent ligand-bound OST simulations were performed: one based on the exposed conformer to calculate ΔG.sub.B(exposed).fwdarw.A.sup.Protein-Ligand; and the other, depending on the convenience of setups, either based on the buried conformer to calculate ΔG.sub.B(buried).fwdarw.A.sup.Protein-Ligand or via the pseudo-alchemical approach to directly estimate ΔG.sub.B(exposed).fwdarw.B(buried).sup.Protein-Ligand. To ensure the broad relevance of the present prediction study, the ligands were treated with a general GAFF/AM1-BCC potential, the protein was described by the AMBER94 force field, and the solvent water was described by the TIP3P model; moreover, all the MD setups including the parameter generations were automatically obtained through the CHARMM-GUI web-server. (See, e.g., Wang, J. M.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and testing of a general AMBER force field. J. Comput. Chem. 2004, 25, 1157-1174; and Jo, S.; Kim, T.; Iyer, V. G.; Im, W. CHARMM-GUI: A web-based graphical user interface for CHARMM. J. Comput. Chem. 2008, 29, 1859-1865.)
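The thermodynamic cycle bookkeeping for a compound with two candidate poses can be sketched as follows. This is an illustration under stated assumptions: the function names and all numerical inputs are hypothetical placeholders, not simulation results. Closing the cycle gives the binding free energy of B relative to A as the solvated-ligand B-to-A free energy minus the protein-ligand B-to-A free energy; selecting the pose with the lower binding free energy follows the convention stated for the accuracy analysis in this study.

```python
from typing import Dict, Tuple


def relative_binding_free_energy(dg_complex_b_to_a: float,
                                 dg_solvent_b_to_a: float) -> float:
    """Binding free energy of B relative to reference A from the cycle:
    dG_bind(B) - dG_bind(A) = dG_solvent(B->A) - dG_complex(B->A)."""
    return dg_solvent_b_to_a - dg_complex_b_to_a


def best_pose(dg_complex_by_pose: Dict[str, float],
              dg_solvent_b_to_a: float) -> Tuple[str, float]:
    """Return the pose with the lowest relative binding free energy."""
    scores = {
        pose: relative_binding_free_energy(dg, dg_solvent_b_to_a)
        for pose, dg in dg_complex_by_pose.items()
    }
    pose = min(scores, key=scores.get)
    return pose, scores[pose]


# Hypothetical placeholder inputs in kcal/mol (not from the simulations).
pose, ddg = best_pose({"exposed": 10.0, "buried": 11.5}, dg_solvent_b_to_a=9.0)
print(pose, ddg)
```

Only one solvated-ligand simulation is needed per compound, since the pose distinction exists only in the protein-bound state.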

[0107] The experimental relative binding free energies were estimated based on the measured log(IC.sub.50) values. The time-dependent comparison between the experimental relative binding free energies and the prediction results shows that within 2.5 ns of each of the protein-ligand simulations, the overall RMSD was lowered to be within 1.0 kcal/mol; the overall mean unsigned error (MUE) was lowered to be within 0.8 kcal/mol; and the overall predictive index (PI), which represents the ranking order correlation, was increased to be above 0.76. These accuracy indexes robustly improved with the elongation of the simulation lengths; for instance, at 3.5 ns, the prediction accuracy was enhanced to 0.92 kcal/mol of RMSD, 0.77 kcal/mol of MUE, and 0.76 of PI. As discussed earlier, on compounds 4 to 17, two sets of OST simulations were performed to respectively estimate the binding free energies of the two possible poses; the above comparison analysis is based on the lower estimated binding free energy of each compound. The ΔG.sub.B(exposed).fwdarw.B(buried).sup.Protein-Ligand (or ΔG.sub.B(exposed).fwdarw.A.sup.Protein-Ligand - ΔG.sub.B(buried).fwdarw.A.sup.Protein-Ligand) calculations converged relatively fast, so that the dominant binding pose of each of these compounds could be clearly distinguished within 1.0 ns. As identified, among all of these compounds, only 13 and 17 favor the buried pose; these two compounds are structurally distinct from the others in that both the R1 and R3 positions are occupied by hydrophobic moieties. The identified poses are largely consistent with the results from an earlier study except for the case of 13. In this earlier study, based on its identified exposed conformer, the binding free energy of 13 was significantly overestimated. Chemical accuracy has been defined as the situation when the overall prediction error is less than 1.0 kcal/mol. 
Although chemical accuracy is an essential target, how to robustly achieve such accuracy is an equally important issue because a predictive method needs to be robustly applicable beyond a group of experts, who usually know how to use their great insights to bias sampling decisions. In this regard, the results of the present study are particularly meaningful in that they demonstrate the possibility for an automated sampling procedure to efficiently enable the chemical accuracy level of predictions. Those skilled in the art will appreciate that here chemical accuracy is not defined in a strict manner, because binding affinities are not always linearly proportional to the log(IC.sub.50) values and five out of the 16 estimations have individual unsigned errors larger than 1.0 kcal/mol (FIG. 4b).
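The three accuracy indexes quoted in this paragraph can be computed as in the sketch below. It is offered as an illustration only: the function names and the sample inputs are hypothetical, and the predictive index is written in the commonly used pairwise-weighted form (each pair weighted by the magnitude of its experimental difference, scored +1 for a correctly ranked pair, -1 for an incorrectly ranked pair, and 0 for a predicted tie), which is assumed rather than confirmed to be the exact definition used in this study.

```python
import math
from typing import Sequence


def rmsd(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Root-mean-square deviation between predictions and experiment."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))


def mue(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Mean unsigned error between predictions and experiment."""
    return sum(abs(p - e) for p, e in zip(pred, expt)) / len(pred)


def predictive_index(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Pairwise-weighted ranking index in [-1, 1] (assumed definition)."""
    num = 0.0
    den = 0.0
    for i in range(len(pred)):
        for j in range(i + 1, len(pred)):
            de = expt[j] - expt[i]
            dp = pred[j] - pred[i]
            weight = abs(de)
            if de == 0 or dp == 0:
                score = 0
            else:
                score = 1 if de * dp > 0 else -1
            num += weight * score
            den += weight
    return num / den if den else 0.0


# Hypothetical relative binding free energies in kcal/mol (illustrative only).
expt = [0.0, 1.0, 2.0]
pred = [0.5, 1.0, 1.5]
print(rmsd(pred, expt), mue(pred, expt), predictive_index(pred, expt))
```

Because the index weights each pair by its experimental separation, misranking two compounds of nearly equal potency costs little, while misranking a strong and a weak binder costs a lot.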

[0108] As mentioned above, the p38 MAP Kinase inhibitor dataset has been known to be a challenging test bed; more approximate approaches often fail to satisfactorily reproduce the experimental values due to their inadequate description of non-additive effects such as water displacements. Moreover, the relative binding affinity range is as narrow as 2-3 kcal/mol, which corresponds to a 100-fold potency range. A recent pioneering effort was made to address this issue by pre-sampling water molecules around the binding site. Although this pre-sampling treatment allows the prediction accuracy to be improved to the level of 1.44 kcal/mol of RMSD, 0.92 kcal/mol of MUE, and 0.61 of PI, in this earlier study, complex educated procedures were still needed and hundreds of times as many sampling steps (the Monte Carlo steps) were required. In comparison, the OST method allows a similar accuracy level to be acquired within 1.0 ns of the sampling length for each of the protein-ligand simulations. It is noted that because different force fields are employed, the above efficiency comparison is only conditionally informative. In each of the free energy calculations, before the OST simulation, only a short (50 picoseconds) canonical MD simulation was employed to slightly equilibrate the system because, owing to their self-healing nature, OST simulations do not need to begin with fully equilibrated structures. Overall, the MD lengths required by OST are even shorter than (at most equivalent to) those required by popular approximate methods such as the MM/PBSA method.
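The correspondence between a fold change in potency and a free energy range can be checked with a short calculation. The sketch below is illustrative and rests on two assumptions that are labeled in the code: the free energy difference is taken as RT ln(fold), which treats the potency ratio as a ratio of binding constants, and the temperature is taken as 300 K. The function name is hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def fold_to_free_energy(fold: float, temperature: float = 300.0) -> float:
    """Free energy range (kcal/mol) spanned by a given potency fold range.

    Assumptions: dG = RT ln(fold), and a temperature of 300 K by default.
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    return R_KCAL * temperature * math.log(fold)


print(round(fold_to_free_energy(100.0), 2))
```

A 100-fold range comes out at about 2.7 kcal/mol, consistent with the 2-3 kcal/mol range stated above; as the text itself cautions, IC.sub.50 ratios are not always strictly proportional to binding constants.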

[0109] To examine the long-time convergence behavior and the prediction robustness, the length of the simulation on each of the protein-ligand complexes was further elongated. Due to the heterogeneity of the sampling lengths (ranging from 4 ns to 9 ns), the final results are simply shown by the blue circles in FIG. 4b. Most of the prediction results stayed unchanged. Only in three systems are the changes relatively significant (FIG. 4b); the prediction accuracy on 14 and 15 was enhanced by >0.5 kcal/mol and the prediction accuracy on 5 was slightly lowered by 0.25 kcal/mol. Interestingly, calculating the relative binding affinities of 14 and 15 has been known to be sampling-costly. As shown in FIG. 5a, which illustrates the binding pocket surrounding compound 14, the R1 group (the hydroxyl group in 14) is closely behind the deeper binding entrance, and its polarity directly influences the displacement of neighboring water molecules and the opening/closure of the nearby binding entrance. Therefore, in order to quantitatively simulate the alchemical transition between a target state with a polar group at the R1 position, such as 14 or 15, and the reference state 1 with a hydrophobic phenyl ring at the same location, adequate sampling of possible slow environmental relaxations is pivotal. In the orthogonal sampling scheme, the generalized force is employed as the order parameter to describe strongly coupled orthogonal space motions; the enlargement of its fluctuation via the effective tempering treatment in the OST method allows the sampling of slow motions that gate order parameter transitions to be specifically accelerated. In the simulation on 14 (FIG. 5b), with the moves of the order parameter λ, the number of water molecules surrounding R1 (within 5 Å) synchronously fluctuated, largely between 1 and 5. The two-dimensional biasing potential g.sub.m(,), which was adaptively obtained during the OST simulation, is shown in FIG. 5c. 
In the algorithm, the generalized force is restrained to ; this potential reveals the fact that the orthogonal space sampling temperature 1500 K allows the fluctuation of the generalized force to overcome 8 KT strongly-coupled free energy barriers that are hidden in the orthogonal space, such as the barriers responsible for water displacements and binding entrance opening/closure.
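The water-count observable described for FIG. 5b can be illustrated with a minimal sketch. It assumes that "surrounding R1" means water oxygen atoms lying within a 5 Å cutoff of a single reference atom of the R1 group, and that coordinates are supplied in Å as plain tuples; the specification does not state the exact counting convention, and the function names are hypothetical.

```python
import math


def count_waters_near(site, water_oxygens, cutoff=5.0):
    """Count water oxygens within `cutoff` (in Angstrom) of a reference site.

    `site` is an (x, y, z) tuple for the assumed R1 reference atom and
    `water_oxygens` is an iterable of (x, y, z) tuples.
    """
    return sum(1 for oxygen in water_oxygens if math.dist(site, oxygen) <= cutoff)


def water_count_series(frames, cutoff=5.0):
    """Return the water count for each trajectory frame.

    `frames` is an iterable of (site, water_oxygens) pairs, one per frame.
    """
    return [count_waters_near(site, waters, cutoff) for site, waters in frames]
```

Plotting such a series against the order parameter recorded at the same frames is one way to check whether hydration of the R1 site moves in step with the alchemical transition, as the text reports for compound 14.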

[0110] In this work, a very general force field, which could be built automatically through a web-server, was employed; the accuracy acquired is surprisingly encouraging. Certainly, to gain better accuracy, and particularly to improve the PI value so that drug candidates in an even narrower potency range can be distinguished more confidently, the quality of the employed potential energy functions needs to be improved. The unprecedented efficiency observed in the present study also sheds light on the feasibility of applying more advanced energy models in predicting protein-ligand interactions. Currently, we are actively coupling the OST algorithm with a fast MD engine, GROMACS (See, e.g., Van der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A. E.; Berendsen, H. J. C. GROMACS: Fast, flexible, and free. J. Comput. Chem. 2005, 26, 1701-1718). Based on the observed efficiency, meeting the pharmaceutical-industry lead optimization requirement of predictions within two days is within reach. Moreover, the minimum-human-input feature of automated sampling methods can be appealing to industrial drug discovery processes as well. Although challenging, the p38 MAP Kinase inhibitor dataset certainly cannot represent all possible complex scenarios; further test studies on a variety of datasets will be performed, in particular on situations in which no confident protein-ligand complex structure is available.

[0111] There have been described and illustrated herein several embodiments of methods and apparatus for double-integration orthogonal space tempering. While particular embodiments of the invention have been described, it is not intended that the invention be limited thereto, as it is intended that the invention be as broad in scope as the art will allow and that the specification be read likewise. It will therefore be appreciated by those skilled in the art that yet other modifications could be made to the provided invention without deviating from its spirit and scope as claimed.

Cpc classification

G16C20/50 (PHYSICS)

G16B5/00 (PHYSICS)

G16C10/00 (PHYSICS)

International classification

G06F19/12 (PHYSICS)
