METHOD FOR PREDICTING THERAPEUTIC EFFICACY OF COMBINED DRUG BY MACHINE LEARNING ENSEMBLE MODEL

Abstract

A method of predicting therapeutic efficacy of a combined drug is provided. The method of predicting therapeutic efficacy of a combined drug can be useful in efficiently predicting therapeutic efficacy of the combined drug on cells by establishing and learning data through a computer using data on cells, data on individual drugs, and data on reaction between the cells and the individual drugs.

Claims

1. A method of predicting therapeutic efficacy of a combined drug, comprising: providing cell-related data; providing drug-related data on a plurality of drugs to be combined; providing drug/cell correlation-related data on correlation between the drugs and the cells; learning the cell-related data, the drug-related data, and the drug/cell correlation-related data using a computer algorithm; and evaluating combined therapeutic efficacy of the drugs to be combined.

2. The method of claim 1, wherein the providing of the cell-related data comprises: a first step of providing gene-level data; and a second step of providing pathway-level data deduced from the gene-level data.

3. The method of claim 2, wherein the gene-level data comprises mutation-related data, or intragenic copy number variation-related data.

4. The method of claim 1, wherein the providing of the drug-related data on the plurality of drugs comprises: providing the drug-related data on each of the plurality of drugs whose combined therapeutic efficacy is intended to be evaluated.

5. The method of claim 1, wherein the providing of the drug-related data on the plurality of drugs comprises: extracting the drug-related data at a pathway level from the drug-related data at a gene level.

6. The method of claim 5, wherein the drug-related data at the gene level provided in the providing of the drug-related data provide information on a target at a gene level.

7. The method of claim 5, wherein the drug-related data at the pathway level deduced from the drug-related data at the gene level provide mapping information and module information on a target at a pathway level.

8. The method of claim 1, wherein the providing of the drug/cell correlation-related data comprises: providing the drug/cell correlation-related data on each of the plurality of drugs whose combined therapeutic efficacy is intended to be evaluated.

9. The method of claim 1, wherein the providing of the drug/cell correlation-related data comprises: mapping feature data at a pathway level from the data on the correlation between the individual drugs and the cells at a gene level.

10. The method of claim 8, wherein the data on the correlation between the individual drugs and the cells at the gene level comprise drug target-related data, dose-related data, and drug response-related parameters.

11. The method of claim 1, wherein establishing a learning model for the cell-related data, the drug-related data and the correlation between the drugs and the cells using the computer algorithm comprises deducing n (n>1) gradient boosting classifier models consisting of a combination of different feature data and a combination of different learning parameters.

12. The method of claim 11, wherein the establishing of the learning model for the cell-related data, the drug-related data and the correlation between the drugs and the cells using the computer algorithm comprises predicting the combined therapeutic efficacy of the drugs to maximize cross-validation performance using an ensemble of the n (n>1) gradient boosting classifier models.

13. The method of claim 11, wherein predicting and evaluating the combined therapeutic efficacy of the drugs to be combined is performed by calculating probabilities (P_S) of classifier models which predict the combined drug to have a synergic effect and probabilities (P_N) of classifier models which predict the combined drug to have no synergic effect.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0041] The above and other objects, features and advantages of the present invention will become more apparent to those of ordinary skill in the art by describing in detail exemplary embodiments thereof with reference to the accompanying drawings, in which:

[0042] FIG. 1 is a schematic diagram showing a method of predicting therapeutic efficacy of a combined drug according to the present invention;

[0043] FIGS. 2 to 8 are diagrams showing input data for machine learning:

[0044] FIG. 2 shows a mutation matrix at a gene level;

[0045] FIG. 3 shows a copy number variation matrix at a gene level;

[0046] FIG. 4 shows a drug target matrix at a gene level;

[0047] FIG. 5 shows a mutation matrix at a pathway level;

[0048] FIG. 6 shows a copy number variation matrix at a pathway level;

[0049] FIG. 7 shows a drug target matrix at a pathway level; and

[0050] FIG. 8 shows a correlation matrix between drugs and cells;

[0051] FIG. 9 shows input features used for ensemble learning of a gradient boosting classifier model in the method of predicting therapeutic efficacy of a combined drug according to the present invention;

[0052] FIG. 10 shows that gradient boosting classifier models which form an entire ensemble model have different performance supplementation patterns according to the learning parameters in the method of predicting therapeutic efficacy of a combined drug according to the present invention;

[0053] FIG. 11 shows a method of preventing predicted results from being changed according to the order of drug combinations upon prediction of synergic effects of drugs in the method of predicting therapeutic efficacy of a combined drug according to the present invention;

[0054] FIG. 12 shows an increase in precision value using a class-weighting technique to solve problems such as class distribution imbalance, which distinguishes between the presence and absence of the synergic effect of the combined drug, in the method of predicting therapeutic efficacy of a combined drug according to the present invention;

[0055] FIG. 13 shows that an ensemble is formed using probability values obtained through the prediction rather than values themselves predicted in each model when the ensemble is formed in the ensemble model according to one exemplary embodiment of the present invention;

[0056] FIG. 14 shows synergic result values and consequential confidence values which are predicted by the ensemble model when predicted in each of the models forming the ensemble model in the method of predicting therapeutic efficacy of a combined drug according to the present invention;

[0057] FIG. 15 shows the entire performance of the ensemble model according to one exemplary embodiment of the present invention; and

[0058] FIG. 16 shows types of cell lines having high prediction precision in the ensemble model according to one exemplary embodiment of the present invention.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

[0059] Exemplary embodiments of the present invention will be described in detail below with reference to the accompanying drawings. While the present invention is shown and described in connection with exemplary embodiments thereof, it will be apparent to those skilled in the art that various modifications can be made without departing from the scope of the invention.

[0060] Unless specifically stated otherwise, all the technical and scientific terms used in this specification have the same meanings as what are generally understood by a person skilled in the related art to which the present invention belongs. In general, the nomenclatures used in this specification and the experimental methods described below are widely known and generally used in the related art.

[0061] According to one exemplary embodiment of a method of predicting therapeutic efficacy of a combined drug according to the present invention, three models were established. The test results in each of the models are shown in FIG. 9.

[0062] In the method of predicting therapeutic efficacy of a combined drug according to the present invention, each of the models forming an ensemble exhibits different prediction performance, depending on the constructed data and the learning parameters used. For each of the three models S4, S5 and S11, a different combination was made of the data sets and the learning parameters, namely the type of loss function, the learning rate, the data sampling ratio, the number of trees forming the gradient boosting model, and the class weights to be learned.

[0063] FIG. 10 shows the performance features of each of the models by comparing predicted values to correct values. As shown in FIG. 10, the models exhibit different performance features and complement one another, so that the optimum predicted values are determined through an ensemble effect.

[0064] FIG. 11 shows a method of preventing predicted results from being changed according to the order of drug combinations when synergic effects of drugs are predicted according to the drug combinations in one embodiment of the method of predicting therapeutic efficacy of a combined drug according to the present invention.

[0065] In the method of predicting therapeutic efficacy of a combined drug according to the present invention, the data used for machine learning are divided into learning data and test data for the purpose of prediction. In a prediction test, correct prediction is achieved only when the data format of the test data is identical to that of the learning data. For example, when the position of a piece of information is changed, an unintended test may be carried out because the information at the changed position is used instead of the original information.

[0066] The method of predicting therapeutic efficacy of a combined drug according to the present invention includes learning on a plurality of drugs, and therefore has a problem in that the positions of the drugs may be changed during a test. In the present invention, the reliability of the results may be improved by performing learning on the plurality of drugs in duplicate, with the positions of the information on the drugs exchanged during machine learning.
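The duplicated learning described above may be illustrated by the following minimal sketch. The row layout (cell features, features of a first drug, features of a second drug, and a label) and all names are assumptions made for illustration only and are not taken from the specification.

```python
from typing import List, Sequence, Tuple

# Hypothetical row layout: (cell features, drug 1 features, drug 2 features, label).
Row = Tuple[Sequence[float], Sequence[float], Sequence[float], int]


def augment_with_swapped_order(rows: List[Row]) -> List[Row]:
    """Return each row followed by a copy in which the two drugs are exchanged."""
    augmented: List[Row] = []
    for cell, drug_a, drug_b, label in rows:
        augmented.append((cell, drug_a, drug_b, label))
        # Same cell and same label, but the drug positions are exchanged.
        augmented.append((cell, drug_b, drug_a, label))
    return augmented


# Made-up feature values, used only to show the shape of the data.
training_rows: List[Row] = [
    ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], 1),
    ([0.0, 1.0], [1.0, 0.0], [0.0, 0.0], 0),
]
augmented_rows = augment_with_swapped_order(training_rows)
```

Under this assumption, a model trained on the augmented rows sees both orderings of every drug pair with the same label, so a test row is less likely to be predicted differently merely because its drugs are listed in the opposite order.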

[0067] In the method of predicting therapeutic efficacy of a combined drug according to the present invention, there is generally an imbalance in the learning data between the number of cases in which the synergic effect is present and the number of cases in which the synergic effect is absent as the correct answer. This is because, in biological problems, the cases in which the drugs are effective are few in absolute terms.

[0068] FIG. 12 shows a method of solving the problem of class imbalance according to the present invention. The method of predicting therapeutic efficacy of a combined drug according to the present invention is characterized by finding an optimum range of performance while changing the class weights to compensate for such a class imbalance. As shown in FIG. 12, the class weights were applied from a base weight of 1.0 to a peak weight of 2.2 in increments of 0.2. At the peak weight, the recall performance is not degraded below the baseline.
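The weight sweep described above may be sketched as follows. The selection rule (the highest precision among weights whose recall does not fall below the recall at the base weight) and the `evaluate` callback are assumptions introduced for illustration; the specification states only the range and the increment of the weights.

```python
from typing import Callable, List, Tuple


def class_weight_grid(start: float = 1.0, stop: float = 2.2,
                      step: float = 0.2) -> List[float]:
    """Weights from start to stop inclusive, rounded to avoid float drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 1) for i in range(count)]


def pick_weight(evaluate: Callable[[float], Tuple[float, float]],
                weights: List[float]) -> float:
    """Pick the weight with the best precision whose recall stays at baseline.

    evaluate(weight) is a hypothetical callback returning (precision, recall),
    for example from cross-validation; weights[0] is treated as the baseline.
    """
    best_precision, baseline_recall = evaluate(weights[0])
    best_weight = weights[0]
    for weight in weights[1:]:
        precision, recall = evaluate(weight)
        if recall >= baseline_recall and precision > best_precision:
            best_weight, best_precision = weight, precision
    return best_weight


grid = class_weight_grid()
```

The grid contains seven weights, and `pick_weight` can be called with any evaluation routine that trains a model at a given class weight and reports its precision and recall.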

[0069] In the method of predicting therapeutic efficacy of a combined drug according to the present invention, six evaluation indexes were defined to evaluate the performance of the algorithm in predicting the synergy values of the drug combinations. The results of measuring the performance for each of the evaluation indexes are shown in FIG. 13.

[0070] 1. Sequential three-way ANOVA: score_global = sgn × log10(p)

[0071] sgn: sign of the effect size,

[0072] p: p-value for the F-statistic

[0073] 2. BAC_20=(Sensitivity+Specificity)/2

[0074] 3. Precision_20=TP/(TP+FP)

[0075] 4. Sensitivity_20=TP/(TP+FN)

[0076] 5. Specificity_20=TN/(TN+FP)

[0077] 6. F1_20=2TP/(2TP+FP+FN)

[0078] Note: TP: True Positive, TN: True Negative, FP: False Positive, and FN: False Negative

[0079] When a confusion matrix was constructed to calculate BAC, precision, sensitivity, specificity, and F1 values, the cut-off value for the presence or absence of the synergy values was set as 20.
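The indexes 2 to 6 above may be computed as in the following minimal sketch. Whether a synergy value exactly equal to the cut-off is counted as synergic is not stated in the specification, so the `>=` comparison is an assumption, as are the function name and the example values.

```python
from typing import Dict, Sequence

SYNERGY_CUTOFF = 20.0  # cut-off value stated in the text


def metrics_at_cutoff(observed: Sequence[float], predicted: Sequence[float],
                      cutoff: float = SYNERGY_CUTOFF) -> Dict[str, float]:
    """Build the confusion matrix at the cut-off and derive the five indexes."""
    tp = tn = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        actual_pos = obs >= cutoff  # assumption: a value at the cut-off is synergic
        pred_pos = pred >= cutoff
        if actual_pos and pred_pos:
            tp += 1
        elif actual_pos:
            fn += 1
        elif pred_pos:
            fp += 1
        else:
            tn += 1
    # Ratios with an empty denominator are reported as 0.0 in this sketch.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {
        "BAC_20": (sensitivity + specificity) / 2,
        "Precision_20": precision,
        "Sensitivity_20": sensitivity,
        "Specificity_20": specificity,
        "F1_20": f1,
    }


# Made-up synergy values for illustration only.
observed_scores = [35.0, 5.0, 22.0, 10.0, 40.0, 0.0]
predicted_scores = [30.0, 25.0, 10.0, 3.0, 21.0, 8.0]
example_metrics = metrics_at_cutoff(observed_scores, predicted_scores)
```

In this made-up example the confusion matrix is TP=2, FP=1, FN=1 and TN=2, so every one of the five indexes evaluates to 2/3.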

[0080] In the method of predicting therapeutic efficacy of a combined drug according to the present invention, when an ensemble is formed from the result values predicted by each of the gradient boosting models forming the ensemble, that is, from the predicted presence or absence of the synergic effect, the predicted presence and absence are not simply summed up. Instead, the confidence bounds predicted by each model for the presence and absence of the synergic effect are summed up, as shown in FIG. 14, thereby further improving the prediction performance of the ensemble.
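The difference between summing the predicted labels and summing the predicted probabilities may be illustrated by the following sketch. The unweighted sums, the 0.5 threshold for an individual model, and the example probabilities are assumptions; the specification does not state the exact combination rule.

```python
from typing import Sequence, Tuple


def hard_vote(p_synergy: Sequence[float], threshold: float = 0.5) -> int:
    """Majority vote on each model's 0/1 decision (the simple-sum approach)."""
    votes = sum(1 for p in p_synergy if p >= threshold)
    return 1 if votes > len(p_synergy) / 2 else 0


def probability_vote(p_synergy: Sequence[float]) -> Tuple[int, float]:
    """Sum P_S and P_N over the models and compare the two sums."""
    total_p_s = sum(p_synergy)                   # probability of a synergic effect
    total_p_n = sum(1.0 - p for p in p_synergy)  # probability of no synergic effect
    label = 1 if total_p_s > total_p_n else 0
    confidence = total_p_s / len(p_synergy)      # mean probability of synergy
    return label, confidence


# Made-up outputs of three models for one drug combination.
model_outputs = [0.95, 0.45, 0.40]
hard_label = hard_vote(model_outputs)
soft_label, soft_confidence = probability_vote(model_outputs)
```

With these made-up outputs, two of the three models individually predict no synergic effect, so the majority vote gives 0, whereas the summed probabilities (about 1.80 for synergy against about 1.20 for no synergy) give 1, because one model is highly confident and the other two are close to undecided.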

[0081] FIG. 15 shows synergy values (synergy_score) and confidence values according to the drug combination (id) predicted using the developed algorithm in the method of predicting therapeutic efficacy of a combined drug according to the present invention.

[0082] The drug combinations and concentrations thereof with which certain cells were treated are shown in the id of FIG. 15.

[0083] (Case) NCI-H747;IGFR_2;MAP2K_1;3.000;10.000

[0084] NCI-H747: cell type, IGFR_2: drug1 name, MAP2K_1: drug2 name, 3.000: peak concentration (uM) of drug1, 10.000: peak concentration (uM) of drug2
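The id format of the above case may be read as in the following minimal sketch. The class and field names are hypothetical; only the order of the five semicolon-separated fields is taken from the case shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CombinationId:
    cell_line: str
    drug1: str
    drug2: str
    drug1_peak_um: float  # peak concentration of drug1 in uM
    drug2_peak_um: float  # peak concentration of drug2 in uM


def parse_combination_id(raw: str) -> CombinationId:
    """Split an id of the form cell;drug1;drug2;conc1;conc2 into typed fields."""
    parts = raw.strip().split(";")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields separated by ';', got {len(parts)}")
    cell_line, drug1, drug2, conc1, conc2 = parts
    return CombinationId(cell_line, drug1, drug2, float(conc1), float(conc2))


example_id = parse_combination_id("NCI-H747;IGFR_2;MAP2K_1;3.000;10.000")
```

An id with a missing or extra field raises a `ValueError` instead of being silently misread.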

[0085] FIG. 15 shows that the synergy_score is indicated by 1 or 0 using certain cut-off values for the presence and absence of the synergy values of the combinations.

[0086] In the method of predicting therapeutic efficacy of a combined drug according to the present invention, the presence or absence of the synergic effect predicted through gradient boosting depends on the confidence values. The presence or absence of the synergic effect is determined by applying the cut-off values to the confidence bounds. In FIG. 15, the confidence value refers to an output value of the gradient boosting model, that is, a probability value determining whether or not the drug combination has a synergic effect.

[0087] FIG. 16 shows the results of analyzing whether the method of predicting therapeutic efficacy of a combined drug according to the present invention, using the developed algorithm, is particularly strong at predicting the combined therapeutic efficacy against certain cell lines and, if so, for which cell lines the combined therapeutic efficacy is most readily predicted.

[0088] Confusion matrices between the correct answers and the predicted results, based on the certain cut-off values, were generated for a total of 85 cell lines. The accuracy and the accuracy p-value of the predicted results for each cell line were calculated using these confusion matrices. As a result, it was revealed that the accuracy p-value was less than 0.1 in 11 of the 85 cell lines.
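The specification does not define how the accuracy p-value was calculated. The following sketch assumes one common definition, namely a one-sided exact binomial test of the observed accuracy against the no-information rate (the proportion of the larger class); both this definition and the function name are assumptions made for illustration only.

```python
from math import comb
from typing import Sequence


def accuracy_p_value(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """P(X >= correct) for X ~ Binomial(n, no-information rate).

    Assumption: the no-information rate is the share of the majority class
    among the correct answers of the cell line.
    """
    n = len(actual)
    if n == 0:
        raise ValueError("at least one prediction is required")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    positives = sum(1 for a in actual if a == 1)
    nir = max(positives, n - positives) / n
    # Upper tail of the binomial distribution, summed term by term.
    return sum(comb(n, k) * nir ** k * (1 - nir) ** (n - k)
               for k in range(correct, n + 1))


# Made-up labels for one cell line: 1 = synergic, 0 = not synergic.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
p_all_correct = accuracy_p_value(labels, labels)
```

Under this assumed definition, a cell line would be counted among those with a significant result when the returned value is less than 0.1.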

[0089] Therefore, the present inventors have judged that the prediction method according to the present invention is significantly strong for the 11 corresponding cell lines, and identified the primary site and histology of each of these cell lines. Then, the 11 cell lines whose accuracy p-value was less than 0.1 were sorted according to accuracy. The resulting bar graph, in which organs are indicated by different colors, is shown in FIG. 16. As shown in FIG. 16, the 11 cell lines consisted of eight lung carcinoma cell lines, two breast carcinoma cell lines, and one large intestine carcinoma cell line.

[0090] The method of predicting therapeutic efficacy of a combined drug according to the present invention can be useful in efficiently predicting therapeutic efficacy of the combined drug on certain cells. To this end, input features for machine learning are designed, and data are established and learned using a computer, from data on cells, data on individual drugs, and data on reaction between the cells and the individual drugs, so as to predict the therapeutic efficacy of the combined drug on a certain cell line.

[0091] Accordingly, the method of predicting therapeutic efficacy of a combined drug according to the present invention is applicable to a new-drug development process by choosing a drug combination having a high probability of exhibiting a synergic effect.

[0092] It will be apparent to those skilled in the art that various modifications can be made to the above-described exemplary embodiments of the present invention without departing from the scope of the invention. Thus, it is intended that the present invention covers all such modifications provided they come within the scope of the appended claims and their equivalents.

CPC classification: G16B40/00; G06N7/01; G06N20/00; G16C20/30; G06N20/20; G16B5/20; G16B5/00; G16C20/70; G16B20/00

International classification: G06F19/24; G06N7/00; G06F19/12