Method for screening of target-based drugs through numerical inversion of quantitative structure-(drug)performance relationships and molecular dynamics simulation
11705224 · 2023-07-18
Assignee
Inventors
Cpc classification
G16C20/30
PHYSICS
G16C20/20
PHYSICS
G16C10/00
PHYSICS
International classification
G16B15/30
PHYSICS
G16C10/00
PHYSICS
G16C20/20
PHYSICS
Abstract
Disclosed is a target-based drug screening method using inverse quantitative structure-(drug)performance relationships (QSPR) analysis and molecular dynamics simulation. The method includes modeling a molecular structure of a test compound group against a target molecule, obtaining a quantitative structure-(drug)performance relationships (QSPR) of the test compound group, acquiring the optimal pharmacophore of a novel target-based drug through a numerical inversion of the QSPR, and selecting drug candidates having a molecular structure similar to the optimum pharmacophore from the test compound group.
Claims
1. A computer-implemented target-based drug screening method comprising: modeling, using a computer processor, a molecular structure of a test compound group against a target molecule, wherein the test compound group comprises one or more compounds that alters an activity of the target molecule, and wherein the target molecule comprises a polypeptide or a nucleic acid, the modeling comprising: receiving, by the computer processor, biological experimental data and chemical experimental data of the test compound group, and optimizing, by the computer processor, the molecular structure of the test compound group by quantum chemistry modeling of the molecular structure of the test compound on the basis of the biological and chemical experimental data; generating, by the computer processor, quantitative structure-performance relationships (QSPR) between one or more structure and performance of the test compound group, the generating comprising: producing, by the computer processor, molecular descriptors from the molecular structure of the test compound group by comparing against previously generated drug candidates stored in a database, and modeling by the computer processor, the QSPR on the basis of the molecular descriptors; acquiring, by the computer processor, an optimum pharmacophore of a novel drug through a numerical inversion of the QSPR; and selecting, by the computer processor, one or more drug candidate having a molecular structure similar to the optimum pharmacophore from the test compound group, the selecting comprising: verifying, by the computer processor, an optimum candidate based on images generated by the computer processor performing molecular dynamics simulation on data representing a plurality of complexes each comprising one of the selected drug candidates and the target molecule, the drug candidate data having different lipophilicity values associated with properties of the optimum candidate, wherein verifying the optimum candidate is based at least in part on the lipophilicity values.
2. The method according to claim 1, wherein, in the QSPR, the performance comprises one or more performances selected from among biological activity, inhibitory activity, the lipophilicity, toxicity, metabolic stability and blood-brain barrier permeability.
3. The method according to claim 1, wherein the generating further comprises: selecting one or more molecular descriptors from among the produced molecular descriptors; and using, in the modeling of the QSPR, a genetic algorithm using the selected molecular descriptors, wherein the molecular descriptors in the modeling of the QSPR is the selected molecular descriptors.
4. The method according to claim 1, wherein, during the acquiring by the computer processor, the optimum pharmacophore of the novel drug is acquired through a numerical inversion process according to Expression 2,
5. The method according to claim 1, wherein the selecting further comprises: rating each novel drug candidate according to an Euclidean distance between the optimum pharmacophore of the novel drug and the molecular structure of the novel drug candidate group; and selecting drug candidates that are rated equal to or higher than a predetermined level from among the drug candidates in the novel drug candidate group.
6. The method according to claim 1, wherein the selecting further comprises selecting the one or more drug candidate data having a maximum lipophilicity value.
Description
BEST MODE
(11) Hereinafter, the present invention will be described in detail. The following description details one embodiment of the present invention; the scope of the present invention, as defined by the appended claims, is not limited to that embodiment even where definite or limiting terms and expressions are used. In describing the embodiment, well-known functions or constructions are not described in detail where they would obscure the gist of the present invention.
(12)
(13) Referring to
(14)
(15) The molecular structure modeling step S10 is to model the molecular structures of compounds in a test compound group to be tested against a target molecule. Examples of the target molecule include a protein, an enzyme, DNA, and RNA. The test compound refers to a compound that inhibits the activity of the target molecule or alters the target molecule.
(16) Specifically, referring to
(17) The compound selection process S11 is to select a group of test compounds. In this process, test compounds that can inhibit the activity of the target molecule or alter the target molecule are selected.
(18) The data reception process S12 is to receive biological and chemical experimental data of the test compounds. The data includes the boiling point, freezing point, polarity, solubility, reactivity, toxicity, selectivity, and the like of each test compound.
(19) In addition, the molecular structure modeling process S13 is to model the molecular structure of the test compound group on the basis of the experimental data and optimize the molecular structure by using quantum chemistry.
(20)
(21) The quantitative structure-(drug)performance relationships model creation step S20 is to obtain relationships between the structures and the performances of the test compound group.
(22) Specifically, referring to
(23) The molecular descriptor calculation process S21 is to calculate 4000 or more molecular descriptors on the basis of the molecular structure.
(24) In addition, the quantitative structure-(drug)performance relationships (QSPR) modeling process S22 is to obtain a QSPR model on the basis of the molecular descriptors.
(25) Specifically, in the QSPR modeling process S22, a genetic algorithm (GA) is applied to the molecular descriptors produced by the calculation process S21 to select a subset of them, and the quantitative structure-(drug)performance relationships are then modeled by using the selected molecular descriptors.
(26) Here, the genetic algorithm (GA) is a widely used optimization algorithm inspired by natural selection and the Darwinian evolution of genes in biological systems, and it has been applied successfully to tasks such as data mining and optimization.
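The descriptor selection described above can be illustrated with a small genetic algorithm. The following is a minimal sketch under stated assumptions, not the disclosed implementation: the fitness is the R² of an ordinary least-squares fit minus a size penalty (a stand-in for the PLS-based fitness mentioned later in the example), and the function names `solve`, `fitness`, and `select_descriptors`, the penalty, the population settings, and the synthetic data are all hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting; None if singular."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fitness(X, y, mask, penalty=0.01):
    """R^2 of a least-squares fit on the selected descriptors, minus a size penalty (assumed)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    rows = [[1.0] + [row[j] for j in cols] for row in X]  # intercept + selected columns
    p = len(cols) + 1
    A = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    rhs = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(A, rhs)
    if beta is None:
        return 0.0
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(bk * rk for bk, rk in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot - penalty * len(cols)

def select_descriptors(X, y, generations=30, pop_size=20, mutation=0.1, seed=0):
    """Evolve boolean masks over descriptor columns; returns (best mask, best-fitness history)."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(X, y, m), reverse=True)
        history.append(fitness(X, y, pop[0]))
        children = [pop[0][:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            # tournament selection of two parents, one-point crossover, bit-flip mutation
            a, b = (max(rng.sample(pop, 3), key=lambda m: fitness(X, y, m)) for _ in range(2))
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            children.append([(not g) if rng.random() < mutation else g for g in child])
        pop = children
    best = max(pop, key=lambda m: fitness(X, y, m))
    history.append(fitness(X, y, best))
    return best, history

# Synthetic data (hypothetical): the response depends only on descriptors 1 and 4.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(30)]
y = [2.0 * row[1] - 3.0 * row[4] for row in X]
best, history = select_descriptors(X, y)
print([j for j, keep in enumerate(best) if keep])
```

Because the best mask is carried over unchanged in every generation, the best fitness never decreases; the size penalty is what discourages the search from simply keeping every descriptor.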
(27) In the present invention, QSPRs are modeled by using molecular descriptors calculated theoretically, rather than using 2-dimensional molecular descriptors that are often used in conventional quantitative structure-activity relationships (QSAR). Therefore, a more accurate description of the molecular structure of a target-based drug is possible.
(28) In addition, in the quantitative structure-(drug)performance relationships (QSPR), the performance may include one or more performances selected from among biological activity, inhibitory activity, lipophilicity, toxicity, metabolic stability and blood-brain barrier permeability. The relationships between each of the various performances and the molecular structure are obtained and used in the subsequent step.
(29) In the present invention, since nearly 4000 molecular descriptors are used to model the structure-performance relationships, the selection of the molecular descriptors that have the greatest impact on the prediction of drug activity and the regression modeling can be performed simultaneously.
(30) In addition, the optimal pharmacophore acquisition step S30 is to obtain the optimal pharmacophore of a novel drug through numerical inversions of the QSPRs.
(31) Specifically, the optimal pharmacophore acquisition step S30 is to obtain the optimal pharmacophore which maximizes the performance (e.g., lipophilicity) of a drug through the numerical inversion which is performed according to Expression 1.
$$
\begin{aligned}
x^{*} = \arg\max\ & \log \hat{k}_{w} \\
\text{s.t.}\quad & \log \hat{k}_{w} = C\hat{t} \\
& \hat{t} = Px \\
& \hat{t}^{T} S_{t}^{-1} t \le c_{1} \\
& \lVert P\hat{t} - x \rVert \le c_{2}
\end{aligned}
\qquad \text{(Expression 1)}
$$
(32) x: a vector of molecular descriptors of a novel drug
(33) x*: a vector of optimal molecular descriptors of a novel drug candidate calculated through mathematical programming based on Expression 1
(34) C: output variable loading matrix of partial least squares (PLS)
(35) t: a PLS score vector of input variables (where the input variables are the molecular descriptors x)
(36) P: PLS loading matrix
(37) ^ (circumflex): value predicted by the PLS model
(38) S_t: sample covariance matrix of t
(39) c_1, c_2: appropriate constants
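To make the inversion concrete, the following minimal sketch solves a simplified version of the problem in closed form. It assumes a single response with a linear model (the loading C reduced to a vector `c`), keeps only the constraint on the scores in the form t^T S_t^{-1} t ≤ c_1, and reconstructs the descriptors as x = P t so that the residual bound c_2 is met trivially. Under these assumptions the maximizer is t* = sqrt(c_1 / (c^T S_t c)) · S_t c. The function name `invert_linear_model` and all numerical values are hypothetical, and descriptor centering and scaling are omitted.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def invert_linear_model(c, S_t, P, c1):
    """Maximize c.t subject to t^T S_t^{-1} t <= c1 (simplified form of the inversion).

    Returns the optimal scores t*, the reconstructed descriptors x* = P t*,
    and the predicted response c.t*.
    """
    Sc = matvec(S_t, c)
    scale = math.sqrt(c1 / dot(c, Sc))   # from the Lagrange condition on the active constraint
    t_opt = [scale * v for v in Sc]
    x_opt = matvec(P, t_opt)             # assumption: x is rebuilt from the scores via P
    return t_opt, x_opt, dot(c, t_opt)

# Hypothetical two-component model with three descriptors.
c = [2.0, 1.0]
S_t = [[4.0, 0.0], [0.0, 1.0]]
P = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
t_opt, x_opt, value = invert_linear_model(c, S_t, P, c1=9.0)
print(t_opt, x_opt, value)
```

With the full set of constraints, a general-purpose constrained optimizer would be needed instead of this closed form; the sketch only shows why the score-space constraint keeps the optimum inside the region covered by the training data.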
(40) In addition, in the optimal pharmacophore acquisition step S30, an optimal pharmacophore having a performance designated by the user (for example, lipophilicity (log k_w) or activity (log k_i)) can be obtained through the numerical inversion performed according to Expression 2.
(41)
(42) x: a vector of molecular descriptors of a novel drug
(43) x*: a vector of optimal molecular descriptors of a novel drug calculated through mathematical programming based on Expression 2
(44) C: output variable loading matrix of partial least squares (PLS)
(45) t: a PLS score vector of input variables (where the input variables are the molecular descriptors x)
(46) P: PLS loading matrix
(47) ^ (circumflex): value predicted by the PLS model
(48) S_t: sample covariance matrix of t
(49) c_1, c_2: appropriate constants
(50) log k_w,ref: lipophilicity value set by the user
(51) log k_i,ref: activity value set by the user
(52) In addition, in the optimal pharmacophore acquisition step S30, the optimal pharmacophore having the performance designated by the user can be obtained through the numerical inversion using various objective functions such as Expression 1 or Expression 2.
(53)
(54) The drug candidate group screening step S40 is to select drug candidates having a molecular structure similar to the optimal pharmacophore of the novel drug.
(55) Specifically, referring to
(56) In addition, the novel drug candidate group rating process S40 rates each of the drug candidates in the novel drug candidate group according to the Euclidean distance between the optimal pharmacophore of the novel drug and the molecular structure of that candidate. A candidate with a shorter Euclidean distance receives a higher rating.
(57) In addition, the novel drug candidate group selection process 41 is a process of selecting drug candidates that are rated equal to or higher than a predetermined level from among all of the candidates in the novel drug candidate group.
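The rating and selection processes can be sketched as a distance-based ranking. This is a minimal illustration, assuming that each candidate is represented by a descriptor vector of the same length as the optimal pharmacophore and that the "predetermined level" is expressed as a maximum distance; the function name `rank_candidates`, the candidate names, and the values are hypothetical.

```python
import math

def rank_candidates(x_opt, candidates, max_distance):
    """Return (name, distance) pairs within max_distance of x_opt, nearest first."""
    scored = [(name, math.dist(x_opt, desc)) for name, desc in candidates.items()]
    selected = [(name, d) for name, d in scored if d <= max_distance]
    return sorted(selected, key=lambda pair: pair[1])

# Hypothetical descriptor vectors; real descriptors would typically be scaled first.
x_opt = [0.0, 0.0]
candidates = {"A": [3.0, 4.0], "B": [1.0, 0.0], "C": [6.0, 8.0]}
ranking = rank_candidates(x_opt, candidates, max_distance=5.0)
print(ranking)  # [('B', 1.0), ('A', 5.0)]
```

When descriptors have very different numerical ranges, the distance is dominated by the largest one, so standardizing the descriptors before ranking is a reasonable design choice.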
(58)
(59) On the other hand, referring to
(60) In addition, the novel target-based drug verification step S50 uses molecular dynamics simulation to verify complexes, each composed of the target molecule and one of the drug candidates for the novel target-based drug selected through the screening step.
(61) Specifically, in the novel target-based drug verification step S50, the optimum candidate for a novel drug against the target molecule can be selected by verifying the drug candidates by performing molecular dynamics simulation on the complexes each being composed of one of the selected drug candidates and the target molecule.
(62) As described above, by using the molecular dynamics simulation, the present invention examines the various structural changes (conformational ensembles) of the drug candidate group and the target molecule rather than only their fixed molecular structures.
(63) The present invention is described in greater detail below with reference to examples and experiments. These examples and experiments are provided for illustration only, and the scope of the present invention is not limited thereto.
Example: Designing and Screening of Sulfonamide Derivatives Inhibiting CA IX (i.e., Target Molecule)
(64) The present invention was applied to designing of sulfonamide derivatives inhibiting CA IX (i.e., target molecule). In the present example, lipophilicity was set as the performance of a drug, and quantitative structure-performance relationships (QSPR) were modeled using partial least squares (PLS).
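For readers unfamiliar with PLS, the following minimal sketch fits a one-component PLS model for a single response and shows where the weight vector, the loading vector P, the scores t, and the response loading C come from. It is an illustration under stated assumptions only: one latent component, mean-centering without scaling, and a hypothetical function name `pls1_one_component` with hypothetical data; the model in the example may use more components.

```python
import math

def pls1_one_component(X, y):
    """Fit a one-component PLS model for a single response.

    Returns the weight vector w, the loading vector p, the response loading c,
    and a predict(x) function.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]   # centered descriptors
    yc = [v - y_mean for v in y]                                 # centered response
    # Weight vector: direction in descriptor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]       # scores
    tt = sum(v * v for v in t)
    p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
    c = sum(ti * yi for ti, yi in zip(t, yc)) / tt               # response loading

    def predict(x):
        score = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
        return y_mean + c * score

    return w, p, c, predict

# Hypothetical data: the response is exactly twice the first descriptor.
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [0.0, 2.0, 4.0]
w, p, c, predict = pls1_one_component(X, y)
print(w, p, c, predict([3.0, 1.0]))  # [1.0, 0.0] [1.0, 0.0] 2.0 6.0
```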
(65) Liquid chromatography-mass spectrometry (LC-MS) was used to determine the lipophilicity (log k_w) values of 14 pre-synthesized sulfonamide isomers (Table 1).
(66) TABLE 1
Structure | Compound | R_1 | R_2 | log k_w
14 Sulfonamide Compounds and their Lipophilicity Values
(67) The molecular structure was optimized through PM3 semi-empirical quantum mechanics and nearly 4000 molecular descriptors were calculated for the optimized structure. This data was divided into a training data set and a test data set. Next, a genetic algorithm combined with the partial least squares method was used for descriptor selection. Thus, a quantitative structure-performance relationships (QSPR) was established on the basis of four molecular descriptors (Table 2).
(68) TABLE 2
Molecular descriptor | Description
R7e+ | R maximum auto-correlation of lag 7 weighted by Sanderson electronegativities
F10[C-Cl] | Frequency of a (C-Cl) atomic pair at a topological distance n of 10
H6i | H auto-correlation of lag 6 weighted by an ionization potential
MLOGP | Moriguchi octanol-water partition coefficient indicating hydrophobicity
Molecular Descriptors Selected for QSPR Modeling
(69)
(70) As shown in
(71) The developed QSPR model is inverted through numerical optimization subject to the constraints described below.
$$
\begin{aligned}
x^{*} = \arg\max\ & \log \hat{k}_{w} \\
\text{s.t.}\quad & \log \hat{k}_{w} = C\hat{t} \\
& \hat{t} = Px \\
& \hat{t}^{T} S_{t}^{-1} t \le c_{1} \\
& \lVert P\hat{t} - x \rVert \le c_{2}
\end{aligned}
$$
(72) x: a vector of molecular descriptors of a novel drug
(73) x*: a vector of molecular descriptors of a novel drug calculated through mathematical programming based on the above expression
(74) C: output variable (i.e., lipophilicity) loading matrix of partial least squares (PLS)
(75) t: a PLS score vector of input variables (where the input variables are the molecular descriptors x)
(76) P: PLS loading matrix
(77) ^ (circumflex): value predicted by the PLS model
(78) S_t: sample covariance matrix of t
(79) c_1, c_2: appropriate constants
(80) The optimum molecular descriptors were obtained through this method and were compared against a database of previously generated drug candidates. Each of the drug candidates was then rated according to its Euclidean distance from the optimum molecular descriptors. This procedure yielded 11 drug candidates for a maximum log k_w value of 3.965 (Table 3).
(81) TABLE 3
x* | R7e+ | F10[C-Cl] | H6i | MLOGP
Max. log k_w = 3.965 | 0.025 | 5 | 1.040 | 4.529
Structure | R_1 | R_2 | x value obtained from Database | Distance
Candidates listed: C45, C46, C65, C66, C64, C63, C56, C55, C59, C51, C34
Euclidean Distance Between Abnormal Drug Designed by Structure-Performance Relationships Model and Each of Candidate Compounds in Database
(82)
(83) In order to verify the derived candidate compounds for a target-based drug, as shown in
(84) The molecular dynamics simulation showed that the Zn²⁺ ion was coordinated with three histidine residues at the active site. This was checked to ensure that the function of the CA IX enzyme was not impaired in the simulation.
(85)
(86) The stability and flexibility of the enzyme were analyzed by calculating the root mean square deviation (RMSD) and the root mean square fluctuation (RMSF). The hydrogen bonds and hydrophobic and hydrophilic interactions were also evaluated.
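The two measures can be sketched as follows. This is a minimal illustration, assuming that every trajectory frame has already been superimposed on the reference structure (no fitting is performed here) and that coordinates are given per atom as (x, y, z) tuples; the function names `rmsd` and `rmsf` and the coordinates are hypothetical.

```python
import math

def rmsd(frame, reference):
    """Root mean square deviation of one frame from a reference structure."""
    n_atoms = len(reference)
    sq = sum((a - b) ** 2 for p, q in zip(frame, reference) for a, b in zip(p, q))
    return math.sqrt(sq / n_atoms)

def rmsf(frames):
    """Per-atom root mean square fluctuation about the time-averaged position."""
    n_frames, n_atoms = len(frames), len(frames[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msf = sum((f[i][k] - mean[k]) ** 2 for f in frames for k in range(3)) / n_frames
        result.append(math.sqrt(msf))
    return result

# Two hypothetical frames of two atoms: atom 0 is fixed, atom 1 moves along x.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(frames))                 # [0.0, 1.0]
print(rmsd(frames[1], frames[0]))   # sqrt(2)
```

RMSD summarizes how far a whole structure has drifted from the reference in a given frame, while RMSF resolves the motion per atom or residue over the whole trajectory.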
(87) As shown in
(88) This discrepancy is due to the small size of the 9FK sulfonamide. Since the screened compounds are bulky, the enzyme must adapt its conformation so that it can accommodate the compound within its active site.
(89)
(90) The root mean square fluctuation (RMSF) was calculated for all the complexes to evaluate the conformational ensemble of CA IX as shown in
(91) The results showed that the enzyme was most stable in the CA IX-9FK complex. All the other complexes showed a similar RMSF pattern. However, in most cases, higher RMSF values were exhibited in two characteristic regions: (i) the flexible N-terminus (residues 9 to 20) and (ii) flexible loops (residues 230 to 240).
(92) The hydrogen bonding network found in the CA IX structure was analyzed to determine the cause of the difference in RMSF value. To this end, the percentage of hydrogen bonds that are formed was calculated through simulation (Table 4). It was found that a hydrogen bond was formed in each of the W9-H68, A133-R136, and R136-G139 bond pairs.
(93) TABLE 4
Bond pair | C45 | C46 | C65 | C66 | C64 | C63 | C56 | C55 | C59 | C51 | C34 | 9FK
W9-H68 | 0.04 | 1.73 | 67.01 | 2.77 | 35 | 21.77 | 9.09 | 0.41 | 26.07 | 48.06 | 0 | 80.45
A133-R136 | 43.55 | 49.71 | 49.04 | 36.07 | 48.71 | 38.48 | 8.12 | 44.04 | 12.07 | 52.3 | 2.69 | 40.09
R136-G139 | 46.17 | 45.43 | 47.47 | 30.65 | 51.81 | 32.83 | 16.7 | 38.64 | 2.12 | 55.2 | 0.96 | 38.04
S237-P234 | 56.47 | 44.01 | 64.93 | 30.11 | 33.26 | 0 | 67.82 | 63.49 | 67.11 | 67.17 | 46.47 | 92.14
S237-G233 | 69.42 | 0 | 66.81 | 76.46 | 34.86 | 78.51 | 65.51 | 55.86 | 55.51 | 46.92 | 68.82 | 83.82
L249-R252 | 46.75 | 66.86 | 3.66 | 66.4 | 70.65 | 20.36 | 69.95 | 85.47 | 68 | 89.04 | 80.53 | 82
Percentage of Hydrogen Bonds Formed in Simulation
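A percentage of this kind can be computed as the fraction of trajectory frames in which a hydrogen bond is present. The following minimal sketch assumes a geometric criterion (donor-acceptor distance of at most 0.35 nm and hydrogen-donor-acceptor angle of at most 30 degrees); these cut-offs are a common convention and are an assumption here, since the criterion used for Table 4 is not stated. The function name `hbond_occupancy` and the per-frame values are hypothetical.

```python
def hbond_occupancy(distances_nm, angles_deg, max_dist=0.35, max_angle=30.0):
    """Percentage of frames in which both geometric criteria are satisfied."""
    if len(distances_nm) != len(angles_deg):
        raise ValueError("one distance and one angle are required per frame")
    if not distances_nm:
        raise ValueError("at least one frame is required")
    formed = sum(1 for d, a in zip(distances_nm, angles_deg)
                 if d <= max_dist and a <= max_angle)
    return 100.0 * formed / len(distances_nm)

# Hypothetical per-frame geometry for one residue pair over four frames.
distances = [0.30, 0.40, 0.33, 0.29]   # donor-acceptor distance, nm
angles = [20.0, 10.0, 45.0, 25.0]      # hydrogen-donor-acceptor angle, degrees
occupancy = hbond_occupancy(distances, angles)
print(occupancy)  # 50.0
```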
(94) The stability of the designed sulfonamide derivatives at the active site of the CA IX has a significant impact on the interaction. For this reason, changes in the conformation and flexibility of enzymes are considered. Their flexibility was observed by analyzing the root mean square deviation (RMSD) value when overlaying the conformations first in 22.06 ns. The results showed that all of the ligands were quite stable at the active site.
(95) The interactions between CA IX and the designed sulfonamide derivatives, and between CA IX and 9FK, were analyzed by calculating the percentage of ligand-amino acid residue interactions (Table 5) and then calculating the average number of atoms and residues within 0.35 nm of the ligand throughout the simulation (Table 6).
(96) TABLE 5
Residue | C45 | C46 | C65 | C66 | C64 | C63 | C56 | C55 | C59 | C51 | C34 | 9FK
T200 | 60.47 | 92.22 | 78.73 | n.a. | 98.02 | 82.95 | 98.02 | 100 | 99.19 | n.a. | 99.34 | 100
E106 | n.a. | 100 | n.a. | n.a. | n.a. | 32.24 | 97.01 | 100 | n.a. | n.a. | 100 | 100
Q92 | 99.94 | n.a. | 38.33 | 84.48 | 99.42 | 93.23 | 95.43 | 62.05 | n.a. | 91.77 | 80.81 | 71.97
W210 | n.a. | 31.74 | 30.7 | n.a. | 81.86 | n.a. | n.a. | 65.91 | 100 | 96.02 | 99.8 | 64.09
L134 | 66.68 | n.a. | 53.46 | n.a. | n.a. | n.a. | n.a. | n.a. | 71.83 | 58.6 | n.a. | 59.09
P203 | 83.41 | n.a. | 62.94 | n.a. | 44.02 | n.a. | n.a. | n.a. | n.a. | 51.52 | n.a. | 46.94
Interaction Between CA IX and Candidate Compound for Drug
(97) TABLE 6
Compound | Number of atoms within 0.35 nm of ligand | Number of protein residues within 0.35 nm of ligand
C45 | 29.19 ± 3.46 | 13.5 ± 1.54
C46 | 35.78 ± 4.23 | 14.5 ± 1.53
C65 | 29.13 ± 3.06 | 17.18 ± 1.70
C66 | 22.28 ± 5.53 | 10.08 ± 1.58
C64 | 28.12 ± 3.56 | 12.82 ± 1.76
C63 | 24.01 ± 2.89 | 13.56 ± 1.49
C56 | 32.70 ± 4.07 | 16.06 ± 1.52
C55 | 35.06 ± 4.89 | 16.66 ± 1.66
C59 | 45.79 ± 4.23 | 18.14 ± 1.60
C51 | 23.49 ± 3.28 | 13.09 ± 1.56
C34 | 45.12 ± 4.35 | 16.62 ± 1.49
9FK | 18.39 ± 3.26 | 9.41 ± 1.60
Interaction Between CA IX and Candidate Compound for Drug
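A contact count of this kind can be sketched as follows. This is a minimal illustration, assuming that coordinates are in nm, that a protein atom counts as a contact when it lies within the cut-off of any ligand atom, and that the ± values are sample standard deviations over frames (the table does not state this). The function names `atoms_near_ligand` and `contact_stats` and the coordinates are hypothetical.

```python
import math
from statistics import mean, stdev

def atoms_near_ligand(protein, ligand, cutoff=0.35):
    """Count protein atoms within `cutoff` nm of at least one ligand atom."""
    return sum(1 for p in protein
               if any(math.dist(p, atom) <= cutoff for atom in ligand))

def contact_stats(frames, cutoff=0.35):
    """Mean and sample standard deviation of the contact count over frames.

    `frames` is a list of (protein_coords, ligand_coords) pairs, one per frame.
    """
    counts = [atoms_near_ligand(protein, ligand, cutoff) for protein, ligand in frames]
    return mean(counts), stdev(counts)

# Two hypothetical frames: three protein atoms around a single ligand atom.
ligand = [(0.0, 0.0, 0.0)]
frames = [
    ([(0.30, 0.0, 0.0), (0.50, 0.0, 0.0), (0.0, 0.34, 0.0)], ligand),  # 2 contacts
    ([(0.30, 0.0, 0.0), (0.60, 0.0, 0.0), (0.0, 0.90, 0.0)], ligand),  # 1 contact
]
avg, sd = contact_stats(frames)
print(avg, sd)
```

Counting residues instead of atoms follows the same pattern, with each residue counted once if any of its atoms is within the cut-off.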
(98) The interactions of 9FK determined by crystallographic studies were maintained throughout the simulation. The C65 ligand showed very similar characteristics except for its interaction with residue E106. C66, in contrast, forms the complex that differs most from the CA IX-9FK complex: only two of the seven residues that interact with the ligand were conserved. Stronger hydrophobic and hydrophilic interactions were observed in the simulations of the other complexes.
(99) As noted above, according to the example, 14 novel drug candidates were proposed through an inverse quantitative structure-performance relationships analysis, and molecular dynamics simulations were performed on 11 of the 14 candidates. As a result, all 11 candidates were found to be more suitable for the inhibition of CA IX than the sulfonamide used as the baseline compound.
(100) In addition, all hydrophobic and hydrophilic interactions between substitution groups and active sites were carefully analyzed. Since crystallization of CA IX-ligand complexes is very difficult due to the complex membrane binding structures of enzymes, such an analysis can provide insight into and guidance for future synthesis.
(101) According to the analysis results, two compounds, C59 and C34, are particularly promising for actual synthesis for the in vitro and in vivo experiments to be performed in the subsequent step. The two compounds are (i) (5-chloro-4-methyl-2-sulfamoyl-phenyl) (1E)-4-chloro-5-hydroxy-N-(4-methylanilino)pentaneimidothioate and (ii) (5-chloro-4-methyl-2-sulfamoyl-phenyl) (1E)-N-(4-methyl-2-nitro-anilino)hexamidothioate.
(102) As described above, the target-based drug screening method according to the present invention can contribute to the rapid and efficient discovery of novel target-based drugs. That is, the simplicity and rapid computation of the inverse quantitative structure-performance relationships analysis and the molecular dynamics simulation significantly reduce the cost of drug discovery and make it possible to avoid the synthesis and wasteful pharmaceutical testing of unsuitable compounds.
INDUSTRIAL APPLICABILITY
(103) The present invention can be applied at the initial research stage of drug development because it can significantly reduce the investment required for the discovery of target-based drugs and avoid the synthesis and wasteful pharmaceutical testing of unsuitable compounds.