Methods for shape comparison between drug molecules
09811642 · 2017-11-07
Assignee
Inventors
Cpc classification
G16B15/00
PHYSICS
G16C20/30
PHYSICS
International classification
G06F7/60
PHYSICS
Abstract
The invention relates to a method for calculating molecular volume and comparing the shapes of two molecules. The method includes the steps of: loading three-dimensional structure information of a first molecule, the three-dimensional structure information comprising the type and coordinate values of each atom contained in the first molecule; obtaining the van der Waals radius of each atom in the first molecule based on its type, and converting the three-dimensional structure information into a group of Gaussian spheres representing the atoms in the first molecule; calculating the overlap volume for each pair of Gaussian spheres; calculating the weight of each Gaussian sphere; and calculating the self-overlap volume of the first molecule, the self-overlap volume being used as the volume of the first molecule. The present invention is useful for the mathematical expression of molecular shape, shape comparison of drug molecules, and pharmacophore comparison of drug molecules; these comparisons are, in turn, useful for virtual screening of drug molecules.
Claims
1. A method for shape comparison between drug molecules, comprising steps of: (31) loading in three-dimensional structure information for a first molecule and a second molecule and calculating self-overlap volumes of the first and second molecules, the calculation comprising steps of: loading in three-dimensional structure information of the first molecule, the three-dimensional structure information comprising type and coordinate values of each atom contained in the first molecule; obtaining respective van der Waals radius based on the type of respective atom contained in the first molecule, converting the three-dimensional structure information into a group of Gaussian spheres representing atoms in the first molecule, each of the set of Gaussian spheres having a radius equal to the van der Waals radius of respective atom and a position equal to the coordinate position of respective atom; calculating overlap volume for each pair of Gaussian spheres in the first molecule, wherein the ij-th pair of Gaussian spheres in the first molecule consists of the i-th and the j-th Gaussian spheres and has an overlap volume $v_{ij}$; calculating the weight of each Gaussian sphere in the first molecule, the weight of the i-th Gaussian sphere in the first molecule
2. The method of claim 1, wherein the intermolecular volume of step (32) is also an overlap volume between molecules, which is calculated by a step of calculating overlap volumes
3. The method of claim 1, wherein k ranges between 0.5 and 1.0, and k′ ranges between 0.5 and 1.0.
Description
BRIEF DESCRIPTION OF DRAWINGS
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
(8) The present invention will now be described in more detail with reference to preferred examples in combination with the drawings.
(9) The present invention proposes a quantitative method for comparing the three-dimensional shapes of molecules, which reduces the computational complexity of the comparison while increasing its accuracy.
(10) In Formulas (1) and (2), the Gaussian function $g_i(r)$ is used in place of the hard-sphere density function $f_i(r)$, while the remaining parts of Formulas (1) and (2) are kept unchanged. The Gaussian volume density for each atom in a molecule is expressed in Formula (3):
(11)
(12) wherein $r$ represents the coordinate of any point, $r_i$ represents the coordinate of atom i, $\sigma_i$ represents the van der Waals radius of atom i, and p is an adjustable factor, typically with a value of 2.7. Because Gaussian functions are simple to sum and to differentiate, the computational complexity is reduced. However, when Formula (2) is used to calculate molecular shape, the number of higher-order summation terms grows explosively, which complicates the design of the algorithm. Therefore, Formula (1) is approximated, i.e., the molecular volume density is simplified to a simple sum of the atomic volume densities:
(13)
(14) Although this approximation simplifies the calculation, it reduces the accuracy, because the volume overlaps between atoms are completely ignored.
(15) The self-overlap volume of a molecule obtained by Formula (4) exceeds the actual molecular volume by a factor of between 2 and 6.
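The overcounting described above can be illustrated on the smallest possible case, two identical overlapping atoms. The sketch below is not taken from the patent: it assumes the widely used Gaussian form g(r) = p·exp(−α|r − r_i|²) with α chosen so that each Gaussian integrates to the hard-sphere volume, and the radius and separation are illustrative values only. It compares the unweighted Gaussian self-overlap with the exact volume of the union of two hard spheres.

```python
import math

P = 2.7  # amplitude quoted in the text as the typical value for Formula (3)

def alpha(sigma, p=P):
    # Assumed exponent: makes p*exp(-alpha*r^2) integrate to (4/3)*pi*sigma^3.
    return math.pi * (3.0 * p / (4.0 * math.pi * sigma ** 3)) ** (2.0 / 3.0)

def pair_overlap(sigma_i, sigma_j, d, p=P):
    # Closed-form integral of the product of two spherical Gaussians
    # whose centres are a distance d apart.
    a, b = alpha(sigma_i, p), alpha(sigma_j, p)
    return p * p * math.exp(-a * b * d * d / (a + b)) * (math.pi / (a + b)) ** 1.5

def unweighted_self_overlap(sigma, d):
    # Two identical atoms: two self terms plus the cross term counted twice.
    return 2.0 * pair_overlap(sigma, sigma, 0.0) + 2.0 * pair_overlap(sigma, sigma, d)

def hard_sphere_union(radius, d):
    # Exact volume of the union of two equal spheres with centres d apart.
    sphere = 4.0 / 3.0 * math.pi * radius ** 3
    if d >= 2.0 * radius:
        return 2.0 * sphere
    lens = math.pi * (4.0 * radius + d) * (2.0 * radius - d) ** 2 / 12.0
    return 2.0 * sphere - lens

sigma, d = 1.7, 1.5  # illustrative radius and separation, in angstroms
ratio = unweighted_self_overlap(sigma, d) / hard_sphere_union(sigma, d)
print(f"overestimation ratio for this diatomic: {ratio:.2f}")
```

A diatomic has only one overlapping pair, so its ratio is smaller than for a real drug-like molecule, where every atom overlaps several neighbours.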
(16) Because the calculation of molecular shape similarity involves the overlap volume between molecules and the self-overlap volume of the molecule,
(17)
(18) a large error will be generated when Formula (4) is used for the similarity calculation. Formula (4) is therefore further modified to achieve a quantitative comparison method for molecular three-dimensional shape.
(19) For any molecule consisting of N atoms, the shape of the molecule can be expressed by a volume density function,
(20)
(21) wherein $w_i$ is a weight factor for atom i, $r_i$ represents the coordinate of atom i, $\sigma_i$ represents the van der Waals radius of atom i, and p is an adjustable factor which is set to a value of $2\sqrt{2}$. With this value of p, the overlap volume of an atomic Gaussian sphere with itself equals the volume of the atom itself, i.e., $4\pi\sigma_i^3/3$.
(22) The determination of the weight factor is critical to the present invention. The factor eliminates, to the greatest possible extent, the errors caused by the simple summation approximation. Because the error originates from counting the overlapping volume between atoms more than once, the self-overlap volume of the molecule obtained without the weight factor is several times the actual value. The weight factor is an empirical function of the environment of an atom in the molecule. The empirical function satisfies the following requirements:
(23) If the atom is remote from the other atoms in the molecule (for example, an inert gas molecule contains only one atom), the weight factor should equal 1; if the atom overlaps with other atoms, the factor should be less than 1, and the greater the overlap, the smaller the factor.
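The effect of the choice of p can be checked numerically. The following sketch assumes the Gaussian g(r) = p·exp(−α|r|²) with α = π(3p/(4πσ³))^(2/3), a common form that this text does not reproduce explicitly, so it should be read as an assumption. Under that assumption the self-overlap integral of one atom has a closed form, and its ratio to the hard-sphere volume is p/(2√2).

```python
import math

def gaussian_exponent(sigma, p):
    # Assumed exponent: makes the Gaussian integrate to (4/3)*pi*sigma^3.
    return math.pi * (3.0 * p / (4.0 * math.pi * sigma ** 3)) ** (2.0 / 3.0)

def self_overlap(sigma, p):
    # Integral of g(r)^2 over all space for g(r) = p * exp(-a * |r|^2).
    a = gaussian_exponent(sigma, p)
    return p * p * (math.pi / (2.0 * a)) ** 1.5

sigma = 1.7  # illustrative van der Waals radius in angstroms
sphere = 4.0 / 3.0 * math.pi * sigma ** 3

# Ratio of Gaussian self-overlap to the hard-sphere volume for both p values.
print(self_overlap(sigma, 2.0 * math.sqrt(2.0)) / sphere)
print(self_overlap(sigma, 2.7) / sphere)
```

With p = 2√2 the ratio is 1 up to floating-point rounding, which is the property stated above; with p = 2.7 the self-overlap falls slightly short of the atomic volume.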
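The two requirements can be checked against a candidate function. The sketch below assumes one simple form that satisfies them, w_i = v_i / (v_i + k·Σ_{j≠i} v_ij); the exact expression used by the invention is not reproduced in this text, so this form, the value k = 0.8, and the volumes passed in are illustrative assumptions.

```python
def weight(v_i, overlaps_with_others, k=0.8):
    """Assumed weight form: v_i / (v_i + k * sum of overlaps with other atoms).

    v_i is the volume of the atom's own Gaussian sphere; overlaps_with_others
    holds its pair overlap volumes with every other atom in the molecule.
    """
    return v_i / (v_i + k * sum(overlaps_with_others))

# Illustrative volumes in cubic angstroms (not taken from any real molecule).
isolated = weight(20.0, [])           # no neighbours
lightly = weight(20.0, [2.0])         # one weakly overlapping neighbour
heavily = weight(20.0, [8.0, 6.0])    # two strongly overlapping neighbours

print(isolated, lightly, heavily)
```

An isolated atom keeps a weight of exactly 1, and the weight decreases monotonically as the total overlap grows.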
(24) The present invention proposes a method for determining the value of the weight factor:
(25)
(26) wherein k is a universal constant obtainable by fitting, ranging between 0.5 and 1.0. Accordingly, the overlap volume between two molecules is calculated by Formula (8):
(27) $$V_{12} = \sum_{i \in A} \sum_{j \in B} w_i \, w_j' \, v_{ij}''$$
(28) wherein $w_i$ is the weight factor of the i-th atom in the first molecule, $w_j'$ is the weight factor of the j-th atom in the second molecule, $v_{ij}''$ is the overlap volume between the i-th atom in the first molecule and the j-th atom in the second molecule, A is the set of all the atoms in the first molecule, and B is the set of all the atoms in the second molecule.
(29) If the first and second molecules have the same 3-D structure, $V_{12}$ becomes the self-overlap volume of the molecule, which is supposed to equal the volume of the molecule. The constant k is obtained by fitting the self-overlap volumes to the hard-sphere volumes for a set of diverse molecules.
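The double sum over the two atom sets can be sketched as follows. The pair overlap uses the closed-form product of two Gaussians with p = 2√2 and an assumed exponent α = π(3p/(4πσ³))^(2/3); the function names and the single-atom test molecules are hypothetical and serve only to show the structure of the sum.

```python
import math

P = 2.0 * math.sqrt(2.0)

def alpha(sigma):
    # Assumed Gaussian exponent for an atom of van der Waals radius sigma.
    return math.pi * (3.0 * P / (4.0 * math.pi * sigma ** 3)) ** (2.0 / 3.0)

def pair_overlap(ci, si, cj, sj):
    # Overlap integral of two atomic Gaussians centred at ci and cj.
    a, b = alpha(si), alpha(sj)
    d2 = sum((x - y) ** 2 for x, y in zip(ci, cj))
    return P * P * math.exp(-a * b * d2 / (a + b)) * (math.pi / (a + b)) ** 1.5

def intermolecular_overlap(coords_a, radii_a, w_a, coords_b, radii_b, w_b):
    # Weighted double sum over atoms of molecule A and atoms of molecule B.
    total = 0.0
    for ci, si, wi in zip(coords_a, radii_a, w_a):
        for cj, sj, wj in zip(coords_b, radii_b, w_b):
            total += wi * wj * pair_overlap(ci, si, cj, sj)
    return total

# Two single-atom "molecules" (weight 1 each), first coincident, then far apart.
same = intermolecular_overlap([(0, 0, 0)], [1.7], [1.0], [(0, 0, 0)], [1.7], [1.0])
far = intermolecular_overlap([(0, 0, 0)], [1.7], [1.0], [(50, 0, 0)], [1.7], [1.0])
print(same, far)
```

For coincident single atoms the result reduces to the atomic volume, and it decays to zero as the atoms separate.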
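The fitting of k can be illustrated on a toy training set for which the hard-sphere volume is known exactly: pairs of equal spheres at a few separations. This is only a sketch under stated assumptions. A real fit would use a diverse set of molecules as the text says; the weight form w = v/(v + k·v12), the Gaussian exponent, the radii and separations, and the grid search are all choices made for this example.

```python
import math

P = 2.0 * math.sqrt(2.0)

def alpha(sigma):
    # Assumed Gaussian exponent for an atom of van der Waals radius sigma.
    return math.pi * (3.0 * P / (4.0 * math.pi * sigma ** 3)) ** (2.0 / 3.0)

def diatomic_self_overlap(sigma, d, k):
    a = alpha(sigma)
    v = P * P * (math.pi / (2.0 * a)) ** 1.5   # self term of each atom
    v12 = v * math.exp(-a * d * d / 2.0)       # cross term for equal exponents
    w = v / (v + k * v12)                      # assumed weight form
    return w * w * (2.0 * v + 2.0 * v12)       # weighted self-overlap volume

def hard_sphere_union(radius, d):
    # Exact volume of the union of two equal spheres with centres d apart.
    sphere = 4.0 / 3.0 * math.pi * radius ** 3
    if d >= 2.0 * radius:
        return 2.0 * sphere
    lens = math.pi * (4.0 * radius + d) * (2.0 * radius - d) ** 2 / 12.0
    return 2.0 * sphere - lens

def loss(k, cases):
    # Sum of squared differences between Gaussian and hard-sphere volumes.
    return sum((diatomic_self_overlap(s, d, k) - hard_sphere_union(s, d)) ** 2
               for s, d in cases)

# Toy cases: (radius, separation) in angstroms, chosen only for illustration.
cases = [(1.7, 1.2), (1.7, 1.5), (1.52, 1.2), (1.55, 1.4)]
grid = [0.5 + 0.001 * n for n in range(501)]   # scan k over about [0.5, 1.0]
k_fit = min(grid, key=lambda k: loss(k, cases))
print(f"k fitted on the toy set: {k_fit:.3f}")
```

The scan is restricted to the range of k stated in the text; a least-squares routine from a numerical library could replace the grid without changing the idea.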
(30)
(31) Because the accuracy of the overlap volume between molecules is improved by the present invention, the accuracy of shape similarity computation of the molecules by Formula (5) is also improved.
(32) Furthermore, the weight factor, as an empirical constant, can also be determined by the topological properties of an atom in a molecule, such as atom hybridization, bond type and number, neighboring atom types, and so on. A weight factor determined in this way is independent of the molecular conformation, which simplifies the flexible overlap between molecules and reduces the complexity of flexible comparison of molecular shape similarity.
Example I. Calculation of Molecule Volume
(33) As shown in
(34) step S301, loading three-dimensional structure information of a first molecule, the three-dimensional structure information comprising the type and coordinate values of each atom contained in the first molecule;
(35) step S302, obtaining the van der Waals radius of each atom in the first molecule based on its type, and converting the three-dimensional structure information into a group of Gaussian spheres representing the atoms in the first molecule, each Gaussian sphere having a radius equal to the van der Waals radius of the respective atom and a position equal to the coordinate position of the respective atom, the expression of the Gaussian function being shown in Formula (3);
(36) step S303, calculating the overlap volume for each pair of Gaussian spheres, wherein the ij-th pair of Gaussian spheres consists of the i-th and the j-th Gaussian spheres and has an overlap volume $v_{ij}$; because the product of two Gaussian functions is still a Gaussian function, the overlap integral can be obtained simply, and the overlap integrals are stored in a two-dimensional array for reuse in subsequent calculations;
(37) step S304, calculating the weight of each Gaussian sphere and storing it in a one-dimensional array for later use, the weight of the i-th Gaussian sphere being
(38)
wherein $v_i$ is the volume of the i-th Gaussian sphere, and k is a constant;
(39) step S305, calculating the self-overlap volume of the first molecule
(40) $$V_{11} = \sum_{i \in A} \sum_{j \in A} w_i \, w_j \, v_{ij}$$
the self-overlap volume being used as the volume of the first molecule, wherein $w_i$ is the weight of the i-th Gaussian sphere, $w_j$ is the weight of the j-th Gaussian sphere, $v_{ij}$ is the overlap volume of the i-th and the j-th Gaussian spheres, and A is the set of all Gaussian spheres in the first molecule.
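Steps S301 to S305 can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: the Gaussian exponent and the weight form w_i = v_i/(v_i + k·Σ_{j≠i} v_ij) are assumptions, k = 0.8 is an arbitrary value inside the stated range, the radius table holds commonly tabulated Bondi radii, and the function and variable names are hypothetical.

```python
import math

P = 2.0 * math.sqrt(2.0)
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}  # Bondi radii, angstroms

def alpha(sigma):
    # Assumed Gaussian exponent for an atom of van der Waals radius sigma.
    return math.pi * (3.0 * P / (4.0 * math.pi * sigma ** 3)) ** (2.0 / 3.0)

def pair_overlap(ci, si, cj, sj):
    # The product of two Gaussians is a Gaussian, so the integral is closed-form.
    a, b = alpha(si), alpha(sj)
    d2 = sum((x - y) ** 2 for x, y in zip(ci, cj))
    return P * P * math.exp(-a * b * d2 / (a + b)) * (math.pi / (a + b)) ** 1.5

def molecule_volume(atoms, k=0.8):
    """atoms: list of (element, (x, y, z)) as loaded in S301.

    Returns (self-overlap volume, list of per-atom weights).
    """
    # S302: look up radii and build the Gaussian spheres.
    radii = [VDW_RADII[element] for element, _ in atoms]
    coords = [xyz for _, xyz in atoms]
    n = len(atoms)
    # S303: two-dimensional array of pair overlap volumes v[i][j].
    v = [[pair_overlap(coords[i], radii[i], coords[j], radii[j])
          for j in range(n)] for i in range(n)]
    # S304: one-dimensional array of weights (assumed functional form).
    w = [v[i][i] / (v[i][i] + k * (sum(v[i]) - v[i][i])) for i in range(n)]
    # S305: weighted self-overlap volume, used as the molecular volume.
    volume = sum(w[i] * w[j] * v[i][j] for i in range(n) for j in range(n))
    return volume, w

# Illustrative diatomic: a carbon and an oxygen 1.13 angstroms apart.
volume, weights = molecule_volume([("C", (0.0, 0.0, 0.0)), ("O", (1.13, 0.0, 0.0))])
print(volume, weights)
```

Storing the pair overlaps and the weights in arrays mirrors the reuse described in steps S303 and S304.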
Example II. Calculation of Maximum Overlap Volume and Shape Similarity Between Two Molecules
(41) The present example involves maximizing the overlap volume between molecules, the process of which is shown in
(42) Step S401, read in the first and second molecules, calculate the self-overlap volume of each molecule by the method of Example I together with the weight factor of each atom, and store both in the memory of a computer. Specifically, this includes:
(43) loading three-dimensional structure information of the first molecule, the three-dimensional structure information comprising the type and coordinate values of each atom contained in the first molecule;
(44) obtaining the van der Waals radius of each atom in the first molecule based on its type, and converting the three-dimensional structure information into a group of Gaussian spheres representing the atoms in the first molecule, each Gaussian sphere having a radius equal to the van der Waals radius of the respective atom and a position equal to the coordinate position of the respective atom;
(45) calculating the overlap volume for each pair of Gaussian spheres in the first molecule, wherein the ij-th pair of Gaussian spheres consists of the i-th and the j-th Gaussian spheres in the first molecule and has an overlap volume $v_{ij}$;
(46) calculating the weight of each Gaussian sphere in the first molecule, the weight of the i-th Gaussian sphere in the first molecule being
(47)
wherein $v_i$ is the volume of the i-th Gaussian sphere in the first molecule, and k is a constant;
(48) calculating the self-overlap volume of the first molecule
(49) $$V_{11} = \sum_{i \in A} \sum_{j \in A} w_i \, w_j \, v_{ij}$$
wherein A is the set of all Gaussian spheres in the first molecule;
(50) loading three-dimensional structure information of the second molecule, the three-dimensional structure information comprising the type and coordinate values of each atom contained in the second molecule;
(51) obtaining the van der Waals radius of each atom in the second molecule based on its type, and converting the three-dimensional structure information into a group of Gaussian spheres representing the atoms in the second molecule, each Gaussian sphere having a radius equal to the van der Waals radius of the respective atom and a position equal to the coordinate position of the respective atom;
(52) calculating the overlap volume for each pair of Gaussian spheres in the second molecule, wherein the ij-th pair of Gaussian spheres consists of the i-th and the j-th Gaussian spheres in the second molecule and has an overlap volume $v_{ij}'$;
(53) calculating the weight of each Gaussian sphere in the second molecule, the weight of the i-th Gaussian sphere in the second molecule being
(54)
wherein $v_i'$ is the volume of the i-th Gaussian sphere in the second molecule, and k′ is a constant;
(55) calculating the self-overlap volume of the second molecule
(56) $$V_{22} = \sum_{i \in B} \sum_{j \in B} w_i' \, w_j' \, v_{ij}'$$
wherein B is the set of all Gaussian spheres in the second molecule.
(57) Step S402, an initial overlap is performed for the two molecules. As a starting point for maximizing the overlap volume, the centers of gravity of the two molecules are superimposed. The initial overlap need not be very accurate, because the overlap volume is subsequently maximized. A simple way to superimpose the centers of gravity is to take the average position of the atoms in each molecule as its center of gravity and translate the molecules so that these positions coincide;
(58) During overlap optimization, one of the molecules (such as the first molecule) is held fixed, and the other molecule undergoes rigid-body rotation and translation in order to obtain the maximum overlap volume through Formula (8). The transformation involves six degrees of freedom. In this example, the Newton-Raphson method is adopted to optimize the six degrees of freedom; then perform Step S403.
(59) Step S403, for the current overlap, the overlap volume
(60) $$V_{12} = \sum_{i \in A} \sum_{j \in B} w_i \, w_j' \, v_{ij}''$$
between the first and second molecules is calculated, wherein $v_{ij}''$ is the overlap volume between the i-th atom in the first molecule and the j-th atom in the second molecule. The first and second derivatives of $V_{12}$ with respect to the coordinates of each atom in the second molecule are calculated, and the derivatives are then transformed onto the variables for rigid rotation and translation, to obtain the first and second derivatives of $V_{12}$ with respect to the six degrees of freedom of the rotation and translation of the second molecule.
(61) Step S404, the Newton-Raphson method is adopted to determine the variables for transforming the six degrees of freedom for the rotation and translation of the second molecule, and transform the coordinate of the second molecule accordingly, and then perform Step S405;
(62) Step S405, determine whether convergence has been reached. If yes, perform Step S406. If no, return to Step S403.
(63) One typical issue in optimization is multiple maxima. The purpose of this example is to obtain the global maximum overlap volume. The solution is to adopt multiple initial overlaps (initial orientations) and select the maximum value from the optimized results. Normally, the different initial overlaps can be generated by rotating any one of the overlaps about different axes.
(64) Step S406, store the overlap volume obtained for the current initial overlap. If any initial overlap has not yet been processed, select one of the unprocessed initial overlaps as the current initial overlap and perform Step S403. Otherwise, select the maximum value $V_{12}^{Max}$ from the overlap volumes obtained for all of the initial overlaps as the maximum overlap volume, and perform Step S407.
(65) Step S407, calculate the similarity
(66)
between the first and second molecules and output the similarity $S_{12}$ as the comparison result of the shapes of the two molecules.
(67) The convergence condition of Step S405 is normally that the modulus (norm) of the first-order derivative is less than a predetermined threshold.
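The flow of Steps S401 to S407 can be sketched compactly. Several substitutions are made here and should be read as assumptions rather than as the method of this example: a derivative-free coordinate search replaces the Newton-Raphson update, the similarity is taken to be a Tanimoto-style ratio V_max/(V11 + V22 − V_max), the weight form and Gaussian exponent are assumed, and the four starting orientations (identity plus 180° turns about x, y and z) are one simple choice of initial overlaps.

```python
import math

P = 2.0 * math.sqrt(2.0)

def alpha(s):
    return math.pi * (3.0 * P / (4.0 * math.pi * s ** 3)) ** (2.0 / 3.0)

def overlap(ci, si, cj, sj):
    a, b = alpha(si), alpha(sj)
    d2 = sum((x - y) ** 2 for x, y in zip(ci, cj))
    return P * P * math.exp(-a * b * d2 / (a + b)) * (math.pi / (a + b)) ** 1.5

def weights(coords, radii, k):
    # Assumed weight form; unchanged by rigid motion, so computed once.
    out = []
    for i, (ci, si) in enumerate(zip(coords, radii)):
        own = overlap(ci, si, ci, si)
        others = sum(overlap(ci, si, cj, sj)
                     for j, (cj, sj) in enumerate(zip(coords, radii)) if j != i)
        out.append(own / (own + k * others))
    return out

def v12(ca, ra, wa, cb, rb, wb):
    return sum(wi * wj * overlap(ci, si, cj, sj)
               for ci, si, wi in zip(ca, ra, wa)
               for cj, sj, wj in zip(cb, rb, wb))

def centred(coords):
    # S402: move the mean atomic position to the origin.
    n = len(coords)
    c = [sum(p[a] for p in coords) / n for a in range(3)]
    return [tuple(p[a] - c[a] for a in range(3)) for p in coords]

def transform(coords, q):
    # q = (tx, ty, tz, ax, ay, az): rotate about x, y, z, then translate.
    tx, ty, tz, ax, ay, az = q
    cx, sx, cy, sy = math.cos(ax), math.sin(ax), math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    out = []
    for x, y, z in coords:
        y, z = cx * y - sx * z, sx * y + cx * z
        x, z = cy * x + sy * z, -sy * x + cy * z
        x, y = cz * x - sz * y, sz * x + cz * y
        out.append((x + tx, y + ty, z + tz))
    return out

def maximise(score, q0, step=0.5, tol=1e-4):
    # Coordinate search (swapped in for Newton-Raphson): try +/- step on each
    # of the six degrees of freedom; halve the step when nothing improves.
    q, best = list(q0), score(q0)
    for _ in range(2000):
        if step <= tol:
            break
        improved = False
        for i in range(6):
            for delta in (step, -step):
                trial = list(q)
                trial[i] += delta
                s = score(trial)
                if s > best:
                    q, best, improved = trial, s, True
        if not improved:
            step *= 0.5
    return best, q

def shape_similarity(ca, ra, cb, rb, k=0.8):
    ca, cb = centred(ca), centred(cb)
    wa, wb = weights(ca, ra, k), weights(cb, rb, k)
    v11 = v12(ca, ra, wa, ca, ra, wa)
    v22 = v12(cb, rb, wb, cb, rb, wb)
    starts = [(0, 0, 0, 0, 0, 0), (0, 0, 0, math.pi, 0, 0),
              (0, 0, 0, 0, math.pi, 0), (0, 0, 0, 0, 0, math.pi)]
    score = lambda q: v12(ca, ra, wa, transform(cb, q), rb, wb)
    v_max = max(maximise(score, s)[0] for s in starts)   # S403 to S406
    return v_max / (v11 + v22 - v_max)                   # S407, assumed Tanimoto

mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.4, 0.3)]
radii = [1.7, 1.7, 1.52]
# The same molecule turned 180 degrees about z and shifted.
moved = [(-x + 3.0, -y - 2.0, z + 1.0) for x, y, z in mol]
print(shape_similarity(mol, radii, moved, radii))
```

Because the weights depend only on intramolecular distances, they are computed once and reused for every trial pose, which is the reuse that Step S401 describes.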
Example III. Comparison of Composite Similarity of Shapes and Pharmacophores Between Two Molecules
(68) The present example further incorporates the contribution of pharmacophores to the similarity comparison of molecules, which is illustrated in
(69) Step S501, read in the 3-D structural information of the first and second molecules, calculate the self-overlap volume of each molecule and the weight factor of each atom, identify and locate the pharmacophores in each of the molecules, and store these results in the memory of a computer. Each pharmacophore is represented by a Gaussian sphere with a radius of 2 angstroms. The Gaussian spheres are colored according to the types of the pharmacophores. For example, a hydrogen bond donor is colored green, a hydrogen bond acceptor is colored pink, a hydrophobic group is colored pearl blue, a positive charge is colored red, a negative charge is colored navy blue, and so on.
(70) To obtain the composite similarity, the overlap between the two molecules needs to be optimized. Unlike Example II, the target function to be optimized includes, in addition to the overlap volume between the two molecules, the overlap contribution between pharmacophores of the same type in the two molecules. Because pharmacophores of the same type are normally dispersed within a molecule, no weight calibration is necessary. The target function to be optimized is:
(71)
(72) wherein $V_{12}$ is the overlap volume between the molecules obtained by Formula (8), $F_{ij}'$ is the overlap volume between the i-th pharmacophore in the first molecule and the j-th pharmacophore in the second molecule, and the summation over $F_{ij}'$ is limited to pharmacophores of the same type (color).
(73) The self composite overlap volume
(74)
of the first molecule is calculated by Formula (9), wherein $F_{ij}$ is the overlap volume between the i-th pharmacophore and the j-th pharmacophore in the first molecule, and the summation over $F_{ij}$ is limited to pharmacophores of the same type (color). The self composite overlap volume
(75)
of the second molecule is calculated by Formula (9), wherein $F_{ij}'$ is the overlap volume between the i-th pharmacophore and the j-th pharmacophore in the second molecule, and the summation over $F_{ij}'$ is limited to pharmacophores of the same type (color).
(76) Step S502, an initial overlap is performed for the two molecules. As a starting point for maximizing the overlap volume, the centers of gravity of the two molecules are superimposed. The average position of the atoms in each molecule, serving as its center of gravity, is translated so that the two centers coincide;
(77) During overlap optimization, one of the molecules (such as the first molecule) is held fixed, and the other molecule undergoes rigid-body rotation and translation in order to obtain the maximum overlap volume through Formula (9). The transformation involves six degrees of freedom. In this example, the Newton-Raphson method is adopted to optimize the six degrees of freedom; then perform Step S503.
(78) Step S503, for the current overlap, the composite overlap volume
(79)
between the first and second molecules is calculated. The first and second derivatives of $O_{12}$ with respect to the coordinates of each atom in the second molecule are calculated, and the derivatives are then transformed onto the variables for rigid rotation and translation, to obtain the first and second derivatives of $O_{12}$ with respect to the six degrees of freedom of the rotation and translation of the second molecule.
(80) Step S504, the Newton-Raphson method is adopted to determine the variables for transforming the six degrees of freedom for the rotation and translation of the second molecule, and transform the coordinate of the second molecule accordingly, and then perform Step S505;
(81) Step S505, determine whether convergence has been reached. If yes, perform Step S506. If no, return to Step S503.
(82) One typical issue in optimization is multiple maxima. The purpose of this example is to obtain the global maximum overlap volume. The solution is to adopt multiple initial overlaps (initial orientations) and select the maximum value from the optimized results. Normally, the different initial overlaps can be generated by rotating any one of the overlaps about different axes.
(83) Step S506, store the composite overlap volume obtained for the current initial overlap. If any initial overlap has not yet been processed, select one of the unprocessed initial overlaps as the current initial overlap and perform Step S503. Otherwise, select the maximum value $O_{12}^{Max}$ from the composite overlap volumes obtained for all of the initial overlaps as the maximum overlap volume, and perform Step S507.
(84) Step S507, calculate the similarity
(85)
(Formula 10) between the first and second molecules and output the similarity $S_{12}$ as the comparison result of the shapes of the two molecules.
(86) The convergence condition of Step S505 is normally that the modulus (norm) of the first-order derivative is less than a predetermined threshold.
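The pharmacophore term of this example can be sketched as a sum restricted to pairs of the same type. Formula (9) is not reproduced in this text, so the plain sum of the shape term and the pharmacophore term below is an assumption, as are the Gaussian exponent, the feature labels and the function names; the 2 angstrom radius and the absence of weights for pharmacophores follow the description above.

```python
import math

P = 2.0 * math.sqrt(2.0)
PHARMACOPHORE_RADIUS = 2.0  # angstroms, as stated in Step S501

def alpha(s):
    # Assumed Gaussian exponent for a sphere of radius s.
    return math.pi * (3.0 * P / (4.0 * math.pi * s ** 3)) ** (2.0 / 3.0)

def overlap(ci, si, cj, sj):
    a, b = alpha(si), alpha(sj)
    d2 = sum((x - y) ** 2 for x, y in zip(ci, cj))
    return P * P * math.exp(-a * b * d2 / (a + b)) * (math.pi / (a + b)) ** 1.5

def pharmacophore_overlap(features_a, features_b):
    """features: list of (kind, (x, y, z)); only same-kind pairs contribute."""
    total = 0.0
    for kind_a, ca in features_a:
        for kind_b, cb in features_b:
            if kind_a == kind_b:  # same type (color) only, no weight factors
                total += overlap(ca, PHARMACOPHORE_RADIUS, cb, PHARMACOPHORE_RADIUS)
    return total

def composite_overlap(shape_overlap, features_a, features_b):
    # Assumption: the composite target is the shape overlap plus the
    # same-type pharmacophore overlap, with no extra coefficient.
    return shape_overlap + pharmacophore_overlap(features_a, features_b)

# Hypothetical feature lists for two molecules.
first = [("donor", (0.0, 0.0, 0.0)), ("hydrophobe", (3.0, 0.0, 0.0))]
second = [("donor", (0.5, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0))]
print(composite_overlap(100.0, first, second))
```

In a full implementation this function would replace the shape-only score inside the rigid-body optimization, with the pharmacophore centers transformed together with the atoms of the second molecule.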
Example IV. Overlap of a Set of Molecules
(87) The present example is an extension to Example II, which is illustrated in
(88) Step S601, the system loads a set of molecules and calculates the volume of each molecule and the weight factor of each atom in each molecule. The results are stored in the memory of a computer.
(89) Step S602, select one molecule from the set as the target. The target molecule may be the first molecule, the molecule having the largest volume, or any molecule designated as desired. After selection, the target molecule is held fixed.
(90) Step S603, rotate and translate each of the remaining molecules by the method described in Example II to obtain its maximum overlap with the target molecule. Because of the multiple maxima issue, multiple initial overlap positions can be adopted for each molecule being overlapped, and the optimal overlap result is then selected.
(91) Step S604, output the coordinates of the target molecule and of the molecules overlapped onto the target molecule. The overlapped molecules thus obtained can be used for 3-D quantitative structure-activity relationship (QSAR) analysis or for establishing a pharmacophore model.
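The control flow of Steps S601 to S604 can be sketched independently of the alignment details. In the sketch below, `align` is a hypothetical callable standing in for the pairwise procedure of Example II: it takes the target and a mobile molecule and returns the best overlap together with the transformed mobile molecule. The default target choice (largest volume) is one of the options named in Step S602.

```python
def overlay_set(molecules, volumes, align, target_index=None):
    """Overlay every molecule in a set onto one fixed target.

    molecules: list of molecule objects of any type accepted by `align`
    volumes:   self-overlap volume of each molecule (S601)
    align:     hypothetical callable (target, mobile) -> (overlap, aligned_mobile)
    """
    # S602: pick the target; by default the molecule with the largest volume.
    if target_index is None:
        target_index = max(range(len(molecules)), key=lambda i: volumes[i])
    target = molecules[target_index]

    # S603: align each remaining molecule to the fixed target.
    results = []
    for i, mol in enumerate(molecules):
        if i == target_index:
            continue
        best_overlap, aligned = align(target, mol)
        results.append((i, best_overlap, aligned))

    # S604: return the target index and the aligned molecules for output.
    return target_index, results

# Stand-in data: numbers play the role of molecules, and a dummy align
# function returns the smaller value as the "overlap".
dummy_align = lambda target, mobile: (min(target, mobile), mobile)
index, aligned = overlay_set([3.0, 9.0, 5.0], [3.0, 9.0, 5.0], dummy_align)
print(index, aligned)
```

Passing the alignment routine as a parameter keeps the set-level loop separate from the choice of optimizer and of initial orientations.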