MOLECULAR TYPING OF MULTIPLE MYELOMA AND APPLICATION

Abstract

Disclosed are a molecular typing method for multiple myeloma and its application. Specifically, disclosed is a product comprising a substance for obtaining or detecting the expression of 97 genes in multiple myeloma patients to be tested and an apparatus for running a multiple myeloma Bayesian classifier. Using the product, the present invention identifies a gene module co-expressed with the MCL1 gene, thereby distinguishing molecular subtypes of multiple myeloma that differ in prognosis and bortezomib sensitivity.

Claims

1. An application of a substance for obtaining or detecting the expression of 97 genes in patients with multiple myeloma to be tested in the preparation of products for predicting the prognosis of the patients with multiple myeloma to be tested, the 97 genes comprising: ACBD3, ADAR, ADSS, ALDH2, ANP32E, ANXA2, ATF3, ATP8B2, CACYBP, CAPN2, CCND1, CCT3, CDC42SE1, CERS2, CHSY3, CLIC1, CLMN, COPA, CSNK1G3, DAP3, DENND1B, ENSA, EPRS, EPSTI1, EVL, FAM13A, FAM49A, FLAD1, FRZB, GLRX2, HAX1, HDGF, HLA-A, HLA-B, HLA-C, HLA-F, HLA-G, IL6R, ISG20L2, JTB, KLF2, LAMTOR2, LDHA, MCL1, MOXD1, MRPL24, MRPL9, MVP, MYL6, NDUFS2, NOP58, NOTCH2NL, NTAN1, PAK1, PI4KB, PIEZO1, PIK3AP1, PIM2, PIP5K1B, PMVK, POGZ, PPIA, PRCC, PRKCA, PRRC2C, PSMB4, PSMD4, RAB29, RCBTB2, SCAMP3, SCAPER, SDHC, SEL1L3, SELPLG, SHC1, SIDT1, SSR2, STAP1, TAP1, TIMM17A, TLR10, TMCO1, TOR1AIP2, TOR3A, TP53INP1, TPM3, TRANK1, TROVE2, UAP1, UBE2Q1, UBQLN4, UHMK1, VPS45, YY1AP1, ZC3H11A, ZFP36, and ZNF593.

2. The application according to claim 1, characterised in that: the prognosis is reflected in a prognostic survival rate, a length of survival or a degree of survival risk.

3. An application of a substance for obtaining or detecting the expression of 97 genes in patients with multiple myeloma (MM) to be tested in the preparation of products having at least one of the following functions a to c: a. detecting the efficacy of bortezomib or a bortezomib-containing treatment in patients with MM; b. detecting the sensitivity of patients with MM to bortezomib or a bortezomib-containing treatment; c. guiding the administration of bortezomib or a bortezomib-containing treatment to patients with MM; the 97 genes comprising: ACBD3, ADAR, ADSS, ALDH2, ANP32E, ANXA2, ATF3, ATP8B2, CACYBP, CAPN2, CCND1, CCT3, CDC42SE1, CERS2, CHSY3, CLIC1, CLMN, COPA, CSNK1G3, DAP3, DENND1B, ENSA, EPRS, EPSTI1, EVL, FAM13A, FAM49A, FLAD1, FRZB, GLRX2, HAX1, HDGF, HLA-A, HLA-B, HLA-C, HLA-F, HLA-G, IL6R, ISG20L2, JTB, KLF2, LAMTOR2, LDHA, MCL1, MOXD1, MRPL24, MRPL9, MVP, MYL6, NDUFS2, NOP58, NOTCH2NL, NTAN1, PAK1, PI4KB, PIEZO1, PIK3AP1, PIM2, PIP5K1B, PMVK, POGZ, PPIA, PRCC, PRKCA, PRRC2C, PSMB4, PSMD4, RAB29, RCBTB2, SCAMP3, SCAPER, SDHC, SEL1L3, SELPLG, SHC1, SIDT1, SSR2, STAP1, TAP1, TIMM17A, TLR10, TMCO1, TOR1AIP2, TOR3A, TP53INP1, TPM3, TRANK1, TROVE2, UAP1, UBE2Q1, UBQLN4, UHMK1, VPS45, YY1AP1, ZC3H11A, ZFP36, and ZNF593.

4. An application of a substance for obtaining or detecting the expression of 97 genes in patients with multiple myeloma (MM) to be tested, or of an apparatus for running a Bayesian classifier of multiple myeloma, in the preparation of products for predicting the prognosis of the patients with multiple myeloma to be tested, the 97 genes comprising: ACBD3, ADAR, ADSS, ALDH2, ANP32E, ANXA2, ATF3, ATP8B2, CACYBP, CAPN2, CCND1, CCT3, CDC42SE1, CERS2, CHSY3, CLIC1, CLMN, COPA, CSNK1G3, DAP3, DENND1B, ENSA, EPRS, EPSTI1, EVL, FAM13A, FAM49A, FLAD1, FRZB, GLRX2, HAX1, HDGF, HLA-A, HLA-B, HLA-C, HLA-F, HLA-G, IL6R, ISG20L2, JTB, KLF2, LAMTOR2, LDHA, MCL1, MOXD1, MRPL24, MRPL9, MVP, MYL6, NDUFS2, NOP58, NOTCH2NL, NTAN1, PAK1, PI4KB, PIEZO1, PIK3AP1, PIM2, PIP5K1B, PMVK, POGZ, PPIA, PRCC, PRKCA, PRRC2C, PSMB4, PSMD4, RAB29, RCBTB2, SCAMP3, SCAPER, SDHC, SEL1L3, SELPLG, SHC1, SIDT1, SSR2, STAP1, TAP1, TIMM17A, TLR10, TMCO1, TOR1AIP2, TOR3A, TP53INP1, TPM3, TRANK1, TROVE2, UAP1, UBE2Q1, UBQLN4, UHMK1, VPS45, YY1AP1, ZC3H11A, ZFP36, and ZNF593; wherein the Bayesian classifier of multiple myeloma is obtained by a method comprising the following steps: 1) obtaining the expression data of the 97 classifier genes in n MM samples; 2) assigning the MM samples to an MCL1-M high subtype or an MCL1-M low subtype by consensus clustering; and 3) constructing the Bayesian classifier with the naïve Bayes method on the basis of the two subtypes of step 2), the expression data of the 97 genes in the n multiple myeloma samples of step 1), and prognostic survival data of the n multiple myeloma samples.

5. The application according to claim 4, characterised in that: the prognosis is reflected in a prognostic survival rate, a length of survival or a degree of survival risk.

6. An application of a substance for obtaining or detecting the expression of 97 genes in patients with multiple myeloma (MM) to be tested, or of an apparatus for running a Bayesian classifier of multiple myeloma, in the preparation of products having at least one of the following functions a to c: a. detecting the efficacy of bortezomib or a bortezomib-containing treatment in patients with MM; b. detecting the sensitivity of patients with MM to bortezomib or a bortezomib-containing treatment; c. guiding the administration of bortezomib or a bortezomib-containing treatment to patients with MM; the 97 genes comprising: ACBD3, ADAR, ADSS, ALDH2, ANP32E, ANXA2, ATF3, ATP8B2, CACYBP, CAPN2, CCND1, CCT3, CDC42SE1, CERS2, CHSY3, CLIC1, CLMN, COPA, CSNK1G3, DAP3, DENND1B, ENSA, EPRS, EPSTI1, EVL, FAM13A, FAM49A, FLAD1, FRZB, GLRX2, HAX1, HDGF, HLA-A, HLA-B, HLA-C, HLA-F, HLA-G, IL6R, ISG20L2, JTB, KLF2, LAMTOR2, LDHA, MCL1, MOXD1, MRPL24, MRPL9, MVP, MYL6, NDUFS2, NOP58, NOTCH2NL, NTAN1, PAK1, PI4KB, PIEZO1, PIK3AP1, PIM2, PIP5K1B, PMVK, POGZ, PPIA, PRCC, PRKCA, PRRC2C, PSMB4, PSMD4, RAB29, RCBTB2, SCAMP3, SCAPER, SDHC, SEL1L3, SELPLG, SHC1, SIDT1, SSR2, STAP1, TAP1, TIMM17A, TLR10, TMCO1, TOR1AIP2, TOR3A, TP53INP1, TPM3, TRANK1, TROVE2, UAP1, UBE2Q1, UBQLN4, UHMK1, VPS45, YY1AP1, ZC3H11A, ZFP36, and ZNF593; and wherein the Bayesian classifier of multiple myeloma is obtained by a method comprising the following steps: 1) obtaining the expression data of the 97 classifier genes in n MM samples; 2) assigning the MM samples to an MCL1-M high subtype or an MCL1-M low subtype by consensus clustering; and 3) constructing the Bayesian classifier with the naïve Bayes method on the basis of the two subtypes of step 2), the expression data of the 97 genes in the n multiple myeloma samples of step 1), and prognostic survival data of the n multiple myeloma samples.

7. A product, comprising a substance for obtaining or detecting the expression of 97 genes in patients with multiple myeloma (MM) to be tested or an apparatus for running a Bayesian classifier of multiple myeloma.

8. The product according to claim 7, characterised in that: the product has at least one of the following functions 1) to 4): 1) predicting the prognosis of patients with multiple myeloma to be tested; 2) detecting the sensitivity of patients with multiple myeloma to bortezomib or a drug containing bortezomib; 3) detecting the efficacy of bortezomib or a drug containing bortezomib in the patients with multiple myeloma to be tested; 4) guiding the administration of bortezomib or a drug containing bortezomib to the patients with multiple myeloma to be tested.

9. The product according to claim 7 or 8, characterised in that: the product further comprises a carrier recording a detection method; the detection method comprises the following steps: obtaining or detecting the expression of 97 genes in the multiple myeloma patient to be tested to obtain the expression data of the 97 genes in that patient; and then classifying the expression data of the 97 genes in that patient with a Bayes classifier of multiple myeloma, wherein the predicted prognosis of patients with multiple myeloma belonging to an MCL1-M-High subtype is significantly poorer than that of patients with multiple myeloma belonging to an MCL1-M-Low subtype; or the detection method comprises the following steps: obtaining or detecting the expression of 97 genes in the multiple myeloma patient to be tested to obtain the expression data of the 97 genes in that patient; and then classifying the expression data of the 97 genes in that patient with a Bayes classifier of multiple myeloma, wherein the predicted prognosis of patients with multiple myeloma belonging to an MCL1-M-High subtype is better than that of patients with multiple myeloma belonging to an MCL1-M-Low subtype; or the detection method comprises the following steps: obtaining or detecting the expression of 97 genes in the multiple myeloma patient to be tested to obtain the expression data of the 97 genes in that patient; and then classifying the expression data of the 97 genes in that patient with a Bayes classifier of multiple myeloma, wherein if the patient with multiple myeloma to be tested belongs to an MCL1-M-High subtype, bortezomib or a drug containing bortezomib is used for treatment, and if the patient belongs to an MCL1-M-Low subtype, bortezomib or a bortezomib-containing drug is not used for treatment.

10. The product according to any one of claims 7 to 9, characterised in that: the multiple myeloma patients to be tested are a single patient or a plurality of patients.

11. A method of constructing a model for classifying multiple myeloma patients, comprising the following steps: 1) obtaining the expression data of the 97 classifier genes in n MM samples; 2) assigning the MM samples to an MCL1-M high subtype or an MCL1-M low subtype by consensus clustering; and 3) constructing the Bayesian classifier, that is, the target model, with the naïve Bayes method on the basis of the two subtypes of step 2), the expression data of the 97 genes in the n multiple myeloma samples of step 1), and prognostic survival data of the n multiple myeloma samples.

12. A naïve Bayes classifier prepared by the method according to claim 11.

13. An application of the naïve Bayes classifier according to claim 12 in at least one of the following 1) to 4), or in preparing a product for at least one of 1) to 4): 1) predicting the prognosis of patients with multiple myeloma to be tested; 2) detecting the sensitivity of patients with multiple myeloma to bortezomib or a drug containing bortezomib; 3) detecting the efficacy of bortezomib or a drug containing bortezomib in the patients with multiple myeloma to be tested; 4) guiding the administration of bortezomib or a drug containing bortezomib to the patients with multiple myeloma to be tested.

14. A method for predicting the prognosis of patients with multiple myeloma to be tested, comprising the following steps: obtaining or detecting the expression of 97 genes in the multiple myeloma patient to be tested to obtain the expression data of the 97 genes in that patient; and then classifying the expression data of the 97 genes in that patient with a Bayes classifier of multiple myeloma; wherein the predicted prognosis of patients with multiple myeloma belonging to an MCL1-M-High subtype is significantly poorer than that of patients with multiple myeloma belonging to an MCL1-M-Low subtype.

15. The method according to claim 14, characterised in that: the prognosis is reflected in a prognostic survival rate, a length of survival or a degree of survival risk; and the significantly poorer predicted prognosis of patients with multiple myeloma belonging to the MCL1-M-High subtype compared with patients belonging to the MCL1-M-Low subtype is reflected as at least one of the following 1) to 3): 1) the predicted prognostic survival rate of patients with multiple myeloma to be tested belonging to the MCL1-M-High subtype is significantly lower than that of patients belonging to the MCL1-M-Low subtype; 2) the predicted length of survival of patients with multiple myeloma to be tested belonging to the MCL1-M-High subtype is significantly shorter than that of patients belonging to the MCL1-M-Low subtype; 3) the predicted degree of survival risk of patients with multiple myeloma to be tested belonging to the MCL1-M-High subtype is significantly higher than that of patients belonging to the MCL1-M-Low subtype.

16. A method for detecting the efficacy of bortezomib or a drug containing bortezomib in a patient with multiple myeloma to be tested, comprising the following steps: obtaining or detecting the expression of 97 genes in the multiple myeloma patient to be tested to obtain the expression data of the 97 genes in that patient; and then classifying the expression data of the 97 genes in that patient with a Bayes classifier of multiple myeloma, wherein the predicted prognosis of patients with multiple myeloma belonging to an MCL1-M-High subtype is better than that of patients with multiple myeloma belonging to an MCL1-M-Low subtype.

17. A method for guiding the administration of bortezomib or a drug containing bortezomib to a patient with multiple myeloma to be tested, comprising the following steps: obtaining or detecting the expression of 97 genes in the multiple myeloma patient to be tested to obtain the expression data of the 97 genes in that patient; and then classifying the expression data of the 97 genes in that patient with a Bayes classifier of multiple myeloma, wherein if the patient with multiple myeloma to be tested belongs to an MCL1-M-High subtype, bortezomib or a drug containing bortezomib is used for treatment, and if the patient belongs to an MCL1-M-Low subtype, bortezomib or a bortezomib-containing drug is not used for treatment.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0071] FIG. 1 is a plot of the ROC curve for the Bayes classification in the GSE2658 data set.

[0072] FIG. 2 is a plot of the ROC curve for the Bayes classification in the MMRF data set.

[0073] FIG. 3 is a plot of the ROC curve for the Bayes classification in the GSE19784 data set.

[0074] FIG. 4 shows the overall survival of patients with MCL1-M high MM or MCL1-M low MM in GSE2658.

[0075] FIG. 5 shows the overall survival of patients with MCL1-M high MM or MCL1-M low MM in MMRF.

[0076] FIGS. 6A and 6B show the overall survival (FIG. 6A) and progression-free survival (FIG. 6B) of patients with MCL1-M high MM or MCL1-M low MM in GSE19784.

[0077] FIGS. 7A to 7D show distinct responses of patients with MCL1-M high MM or MCL1-M low MM to bortezomib-containing treatment in GSE19784.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0078] All of the experimental methods used in the following Examples are conventional methods unless otherwise indicated.

[0079] All of the materials, reagents, etc. used in the following Examples are commercially available unless otherwise indicated.

Example 1. Screening of Molecular Diagnostic Markers for Multiple Myeloma and Implementation of Molecular Typing

[0080] Pearson correlation coefficient analysis of the MM gene expression data set GSE2658 published by NCBI identified a gene module co-expressed with MCL1 (MCL1-M) containing 87 genes. In addition, 46 genes upregulated in MM samples with low MCL1-M expression were identified. To obtain stable classification results, 36 of these 133 genes (87 + 46) with low classification capacity were excluded, leaving 97 genes with robust differential expression and relatively high expression levels.
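The co-expression screen in [0080] can be pictured as correlating every gene's profile with the MCL1 profile across the same samples and keeping the genes above a cut-off. The stdlib-only Python sketch below assumes expression values are supplied per gene; the function names and the cut-off `r_min=0.5` are hypothetical, since the threshold the applicant used is not stated, and the later steps (adding the 46 upregulated genes and excluding 36 genes) are not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def coexpression_module(expr, seed="MCL1", r_min=0.5):
    """Genes whose profile correlates with the seed gene at r >= r_min.

    expr maps gene symbol -> expression values over the same samples.
    r_min is a hypothetical cut-off; the applicant's threshold is not stated.
    The seed gene itself (r = 1) is included in the returned module.
    """
    seed_profile = expr[seed]
    return sorted(g for g, profile in expr.items()
                  if pearson(seed_profile, profile) >= r_min)


# Toy data (hypothetical values, five samples per gene)
expr = {
    "MCL1": [1, 2, 3, 4, 5],
    "GENE_A": [2, 4, 6, 8, 10],
    "GENE_B": [5, 4, 3, 2, 1],
    "GENE_C": [1, 3, 2, 5, 4],
}
print(coexpression_module(expr))  # ['GENE_A', 'GENE_C', 'MCL1']
```

Pearson correlation is scale-free, so genes with very different absolute expression levels can still fall into the same module.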

[0081] These 97 genes are as follows: ACBD3, ADAR, ADSS, ALDH2, ANP32E, ANXA2, ATF3, ATP8B2, CACYBP, CAPN2, CCND1, CCT3, CDC42SE1, CERS2, CHSY3, CLIC1, CLMN, COPA, CSNK1G3, DAP3, DENND1B, ENSA, EPRS, EPSTI1, EVL, FAM13A, FAM49A, FLAD1, FRZB, GLRX2, HAX1, HDGF, HLA-A, HLA-B, HLA-C, HLA-F, HLA-G, IL6R, ISG20L2, JTB, KLF2, LAMTOR2, LDHA, MCL1, MOXD1, MRPL24, MRPL9, MVP, MYL6, NDUFS2, NOP58, NOTCH2NL, NTAN1, PAK1, PI4KB, PIEZO1, PIK3AP1, PIM2, PIP5K1B, PMVK, POGZ, PPIA, PRCC, PRKCA, PRRC2C, PSMB4, PSMD4, RAB29, RCBTB2, SCAMP3, SCAPER, SDHC, SEL1L3, SELPLG, SHC1, SIDT1, SSR2, STAP1, TAP1, TIMM17A, TLR10, TMCO1, TOR1AIP2, TOR3A, TP53INP1, TPM3, TRANK1, TROVE2, UAP1, UBE2Q1, UBQLN4, UHMK1, VPS45, YY1AP1, ZC3H11A, ZFP36, and ZNF593.

[0082] These 97 genes were selected as classifier genes. Based on their expression data, the 551 MM samples in GSE2658 were clustered into the MCL1-M high and MCL1-M low subtypes by consensus clustering. However, a clustering-based method cannot classify individual samples. To enable classification of individual MM samples, the 551 samples were randomly split at a ratio of 2:1 into a training set (369 samples) and a validation set (182 samples). The sampling was stratified by the consensus clustering results, so that the proportions of MCL1-M high and MCL1-M low samples in the training and validation sets remained the same as in the original data set.
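A stratified 2:1 split of the kind described in [0082] can be sketched by sampling within each consensus-clustering subtype separately. This is a minimal Python illustration with hypothetical names and sample counts; the R code of this Example instead redraws random splits until the training-set prior falls within a narrow window, which approximates the same stratification.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split sample indices into training and validation sets within each subtype.

    labels[i] is the consensus-clustering subtype of sample i. Sampling each
    subtype separately keeps the subtype proportions equal in both sets.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_subtype = defaultdict(list)
    for idx, subtype in enumerate(labels):
        by_subtype[subtype].append(idx)
    train, validation = [], []
    for indices in by_subtype.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_fraction)
        train.extend(indices[:n_train])
        validation.extend(indices[n_train:])
    return sorted(train), sorted(validation)


# Hypothetical example: 30 MCL1-M-High and 60 MCL1-M-Low samples
labels = ["MCL1-M-High"] * 30 + ["MCL1-M-Low"] * 60
train, validation = stratified_split(labels)
print(len(train), len(validation))  # 60 30
```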

[0083] Using the expression data of the 97 classifier genes in the 369 training-set samples, together with the MCL1-M high or MCL1-M low subtype assigned to each sample by consensus clustering, a Bayes classifier that assigns individual MM samples to the MCL1-M high or MCL1-M low subtype was trained with the naïve Bayes algorithm implemented in the klaR package for R.

The code for the MM Bayes classifier is as follows:

    options(warn = 1)
    # Install (if needed) and load the machine learning package klaR
    install.packages("klaR")
    library(klaR)

    # Draw a random training subset of row indices at the given ratio
    split_train_test <- function(data, ratio) {
      sample(nrow(data), as.integer(nrow(data) * ratio))
    }

    # Load expression data of the 97 classifier genes in GSE2658 and pre-process:
    # column 1 holds the consensus-clustering subtype, the other columns the genes
    GSE2658.data <- read.delim("gse2658.batch_removed.txt", row.names = 1, stringsAsFactors = TRUE)
    GSE2658 <- t(apply(GSE2658.data[, -1], 1, scale))  # z-score each sample across genes
    GSE2658 <- data.frame(GSE2658.data[, 1], GSE2658)
    colnames(GSE2658) <- colnames(GSE2658.data)
    rownames(GSE2658) <- rownames(GSE2658.data)

    i <- 0
    repeat {
      # Split samples 2:1 into training and validation sets; redraw until the
      # training-set prior of the first subtype lies between 0.451 and 0.453
      repeat {
        train_indices <- split_train_test(GSE2658, 0.67)
        train_sets <- GSE2658[train_indices, ]
        test_sets <- GSE2658[-train_indices, ]
        # Construct the naive Bayes classification model on the training set
        GSE2658.NB <- NaiveBayes(subtype ~ ., data = train_sets, fL = 1)
        prior_1 <- as.vector(GSE2658.NB$apriori)[1]
        if (prior_1 > 0.451 && prior_1 < 0.453) break
      }
      # Verify the model on the validation set
      results <- predict(GSE2658.NB, test_sets[, -1], threshold = 0.1, type = "raw")
      predicted_class <- as.data.frame(results)
      predicted_class[, 2:3] <- apply(predicted_class[, 2:3], 2, round, 3)
      # Keep the first model whose accuracy exceeds 95% in both subtypes
      compare_table <- data.frame(predicted_class = predicted_class$class,
                                  original_class = test_sets$subtype)
      acc_table <- prop.table(table(compare_table), 2)
      accuracy <- c(acc_table[1, 1], acc_table[2, 2])
      if (accuracy[1] > 0.95 && accuracy[2] > 0.95) break
      i <- i + 1
    }
    print(paste0("Accuracy exceeded 0.95 in both subtypes after ", i, " rejected trial(s)"))

    # Bayes classification-based subtype prediction in the MMRF data set
    mmrf.data <- read.delim("mmrf.batch_removed.txt", row.names = 1, stringsAsFactors = TRUE)
    mmrf <- t(apply(mmrf.data[, -1], 1, scale))
    rownames(mmrf) <- rownames(mmrf.data)
    colnames(mmrf) <- colnames(mmrf.data)[-1]
    mmrf <- data.frame(subtype = mmrf.data$subtype, mmrf)
    results.mmrf <- predict(GSE2658.NB, mmrf[, -1], threshold = 0.01, type = "raw")
    predicted_class.mmrf <- as.data.frame(results.mmrf)
    predicted_class.mmrf[, 2:3] <- apply(predicted_class.mmrf[, 2:3], 2, round, 3)
    compare_table.mmrf <- data.frame(predicted_class = predicted_class.mmrf$class,
                                     original_class = mmrf.data$subtype)
    prop.table(table(compare_table.mmrf), 2)

    # Bayes classification-based subtype prediction in the GSE19784 data set
    gse19784.data <- read.delim("gse19784.batch_removed.txt", row.names = 1, stringsAsFactors = TRUE)
    gse19784 <- t(apply(gse19784.data[, -1], 1, scale))
    colnames(gse19784) <- colnames(gse19784.data)[-1]
    gse19784 <- data.frame(subtype = gse19784.data$subtype, gse19784)
    results_19784 <- predict(GSE2658.NB, gse19784[, -1], threshold = 0.01, type = "raw")
    predicted_class_19784 <- as.data.frame(results_19784)
    predicted_class_19784[, 2:3] <- apply(predicted_class_19784[, 2:3], 2, round, 3)
    compare_table_19784 <- data.frame(predicted_class = predicted_class_19784$class,
                                      original_class = gse19784$subtype)
    prop.table(table(compare_table_19784), 2)
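With numeric predictors and its default settings, the klaR `NaiveBayes` call above models each gene within each subtype as an independent normal distribution and combines these with the subtype priors. For readers without R, the following stdlib-only Python sketch illustrates that Gaussian naïve Bayes idea. It is not the klaR implementation (it omits klaR's `threshold` and `fL` handling, for example), and the class name, method names and toy data are hypothetical.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes with one normal density per gene and subtype (hypothetical helper).

    Needs at least two training samples per subtype to estimate an SD.
    """

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.classes = sorted(groups)
        # Class priors from training-set frequencies
        self.prior = {c: len(groups[c]) / len(y) for c in self.classes}
        self.mean, self.sd = {}, {}
        for c in self.classes:
            columns = list(zip(*groups[c]))  # one tuple of values per gene
            means = [sum(col) / len(col) for col in columns]
            # Sample standard deviation (n - 1), floored to avoid division by zero
            sds = [max(math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1)), 1e-6)
                   for col, m in zip(columns, means)]
            self.mean[c], self.sd[c] = means, sds
        return self

    def predict_proba(self, row):
        """Posterior probability of each subtype, assuming genes are independent."""
        log_post = {}
        for c in self.classes:
            lp = math.log(self.prior[c])
            for x, m, s in zip(row, self.mean[c], self.sd[c]):
                lp += -0.5 * math.log(2 * math.pi * s * s) - (x - m) ** 2 / (2 * s * s)
            log_post[c] = lp
        top = max(log_post.values())  # log-sum-exp normalisation for numerical stability
        total = sum(math.exp(v - top) for v in log_post.values())
        return {c: math.exp(v - top) / total for c, v in log_post.items()}

    def predict(self, row):
        proba = self.predict_proba(row)
        return max(proba, key=proba.get)


# Hypothetical z-scored profiles of two genes
X = [[-1.0, -0.9], [-1.2, -1.1], [-0.8, -1.0], [1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]
y = ["MCL1-M-Low"] * 3 + ["MCL1-M-High"] * 3
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([1.1, 1.0]))  # MCL1-M-High
```

Working in log space avoids numerical underflow when 97 small density values are multiplied together.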

[0084] The 182 samples in the validation set were used to evaluate the classification accuracy.

[0085] The Bayes classification model was optimised using the accuracy of each run until the accuracy exceeded 95%. The accuracy of the Bayes classification in GSE2658 is presented in Table 1, and the ROC curve in FIG. 1.

TABLE 1 Accuracy of the naïve Bayes prediction model in GSE2658
(rows: molecular subtyping predicted by the naïve Bayes model; columns: molecular subtyping by unsupervised consensus clustering)

                   MCL1-M-High    MCL1-M-Low
    MCL1-M-High    77             3
    MCL1-M-Low     4              98
    Accuracy       95.1%          97.0%
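The accuracy figures in Table 1 are per-subtype proportions: each consensus-clustering column is divided by its own total, which is what `prop.table(table(...), 2)` does in the R code. The short Python sketch below reproduces the Table 1 percentages from its counts; the nested-dictionary layout and function name are assumptions made for illustration.

```python
def per_subtype_accuracy(confusion):
    """confusion[predicted][reference] -> count; accuracy per reference subtype.

    Each column (consensus-clustering subtype) is normalised to its own total.
    """
    subtypes = list(confusion)
    accuracy = {}
    for ref in subtypes:
        column_total = sum(confusion[pred][ref] for pred in subtypes)
        accuracy[ref] = confusion[ref][ref] / column_total
    return accuracy


# Counts from Table 1 (rows: naïve Bayes prediction; columns: consensus clustering)
table1 = {
    "MCL1-M-High": {"MCL1-M-High": 77, "MCL1-M-Low": 3},
    "MCL1-M-Low": {"MCL1-M-High": 4, "MCL1-M-Low": 98},
}
acc = per_subtype_accuracy(table1)
print({k: f"{v:.1%}" for k, v in acc.items()})
# → {'MCL1-M-High': '95.1%', 'MCL1-M-Low': '97.0%'}
```

For MCL1-M-High this is 77 / (77 + 4) = 77/81, and for MCL1-M-Low it is 98 / (3 + 98) = 98/101.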

[0086] To test whether the naïve Bayes model developed with data from GSE2658 can be used generally, the applicant applied it to the MM data set MMRF published by NCI and to the GEO MM data set GSE19784.

[0087] The MMRF data set differs from GSE2658 in that its expression data were obtained by mRNA-seq. The accuracy of the Bayes classification model in the MMRF data set is presented in Table 2, and the ROC curve in FIG. 2.

TABLE 2 Accuracy in the MMRF data set of the naïve Bayes prediction model established with the GSE2658 data set
(rows: molecular subtyping predicted by the naïve Bayes model; columns: molecular subtyping by unsupervised consensus clustering)

                   MCL1-M-High    MCL1-M-Low
    MCL1-M-High    240            11
    MCL1-M-Low     4              323
    Accuracy       94.5%          96.7%

[0088] The results show that the classifier maintains high accuracy even across platforms, indicating high value for broader application.
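One plausible contributor to this cross-platform robustness, although the source does not analyse it, is that the R code z-scores each sample across the classifier genes (`apply(..., 1, scale)`) before classification. That transformation removes any per-sample shift and positive rescaling of expression values, as the Python sketch below illustrates with hypothetical values; real platform differences are generally not this simple.

```python
import statistics


def zscore_sample(values):
    """Standardise one sample across genes to mean 0 and SD 1, like R's scale()."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # n - 1 denominator, as in R's sd()
    return [(v - mean) / sd for v in values]


# Hypothetical: the same three-gene pattern on two measurement scales
array_sample = [8.0, 10.0, 12.0]                      # e.g. log2 microarray intensities
rnaseq_sample = [2.0 * v - 3.0 for v in array_sample]  # same pattern, shifted and rescaled
print(zscore_sample(array_sample))   # [-1.0, 0.0, 1.0]
print(zscore_sample(rnaseq_sample))  # [-1.0, 0.0, 1.0]
```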

[0089] Like GSE2658, the expression data in GSE19784 were generated on the Affymetrix HG-U133 Plus 2.0 platform. However, GSE19784 was generated by different laboratories at a different time, so its experimental conditions are unlikely to match those of GSE2658, and the two data sets may differ in the dynamics and noise of their gene expression profiles. The results of the naïve Bayes prediction in GSE19784 are shown in Table 3, and the ROC curve in FIG. 3. Accurate classification was also achieved in GSE19784.

TABLE 3 Accuracy in the GSE19784 data set of the classifier built with the GSE2658 data set
(rows: molecular subtyping predicted by the naïve Bayes model; columns: molecular subtyping by unsupervised consensus clustering)

                   MCL1-M-High    MCL1-M-Low
    MCL1-M-High    98             25
    MCL1-M-Low     7              174
    Accuracy       93.3%          87.4%

[0090] The results show that the classifier copes well with these inter-laboratory differences and still maintains high accuracy.
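The source does not state how the ROC curves in FIGS. 1 to 3 were constructed. One common construction, sketched below in Python under that assumption, lowers a threshold over the posterior probability of the positive subtype (here taken to be MCL1-M-High, with consensus-clustering labels as the reference) and records the false-positive and true-positive rates; the area under the curve then follows from the trapezoidal rule. Function names and the example posteriors are hypothetical.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR), lowering the threshold over posterior probabilities.

    scores: posterior probability of the positive subtype (e.g. MCL1-M-High);
    labels: 1 if the reference (consensus-clustering) subtype is positive, else 0.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Move past all samples tied at this score before emitting a point
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical posteriors for four validation samples
pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(auc(pts))  # 0.75
```

The AUC equals the fraction of (positive, negative) sample pairs in which the positive sample receives the higher posterior, which gives a quick sanity check on small examples.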

Example 2. Application of Naïve Bayes Prediction Model in the Prediction of Survival of Patients with MM

[0091] I. Data set GSE2658

[0092] Based on the expression data of the 97 classifier genes in the 551 pre-treatment MM samples in the GSE2658 data set, the samples were classified with the naïve Bayes prediction model developed in Example 1, yielding 249 MCL1-M high MMs and 302 MCL1-M low MMs.

[0093] The follow-up period for the 551 MM patients was 72 months. The results of survival analysis (Kaplan-Meier analysis and Cox regression analysis) are shown in FIG. 4. Survival differed between the MCL1-M high and MCL1-M low subtypes: overall survival was significantly lower in patients with MCL1-M high MM than in patients with MCL1-M low MM (log-rank test, p=0.0201; hazard ratio 1.588, p=0.0212).
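The source does not name the software used for the survival analysis. As an illustration of the two techniques named in [0093], the stdlib-only Python sketch below computes a Kaplan-Meier survival curve and a two-group log-rank test; the function names and the toy follow-up data are hypothetical and are not the GSE2658 data.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: [(t, S(t))] at each distinct event time.

    events[i] is 1 if patient i died at times[i], 0 if censored then.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # deaths plus censorings at each time
    at_risk, surv, curve = len(times), 1.0, []
    for t in sorted(exits):
        if deaths[t]:
            surv *= 1 - deaths[t] / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve


def log_rank(times, events, groups):
    """Two-group log-rank test; groups[i] is 1 (e.g. MCL1-M high) or 0. Returns (chi2, p)."""
    data = list(zip(times, events, groups))
    o_minus_e = variance = 0.0
    for t in sorted({ti for ti, e, _ in data if e}):
        n = sum(1 for ti, _, _ in data if ti >= t)
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        d1 = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df


# Hypothetical follow-up times (months) and death indicators
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
chi2, p = log_rank([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])
print(round(chi2, 3))  # 2.882
```

With one degree of freedom, the chi-square upper tail equals the two-sided normal tail at the square root of the statistic, which is why `math.erfc` suffices here.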

[0094] Thus, based on the expression of the 97 classifier genes in the MCL1 gene module, the naïve Bayes prediction model enables prediction of the prognosis of patients with MM.

[0095] II. Data set MMRF

[0096] Based on the expression data of the 97 classifier genes in 534 pre-treatment MM samples in the MMRF data set, molecular classification with the naïve Bayes prediction model developed in Example 1 yielded 231 MCL1-M high MMs and 303 MCL1-M low MMs.

[0097] The follow-up period for the 534 MM patients was 48 months. The results of survival analysis (Kaplan-Meier analysis and Cox regression analysis) are shown in FIG. 5. Survival differed between the MCL1-M high and MCL1-M low subtypes: overall survival was significantly lower in patients with MCL1-M high MM than in patients with MCL1-M low MM (log-rank test, p=0.006663; hazard ratio 1.838, p=0.00706).

[0098] Thus, irrespective of the technical platform used to measure expression, the expression of the 97 classifier genes and the naïve Bayes prediction model enable prediction of the prognosis of patients with MM.

[0099] III. Data set GSE19784

[0100] Based on the expression data of the 97 classifier genes in 304 pre-treatment MM samples in the GSE19784 data set, molecular classification with the naïve Bayes prediction model developed in Example 1 yielded 107 MCL1-M high MMs and 196 MCL1-M low MMs.

[0101] The follow-up period for the 304 MM patients was 96 months. The results of survival analysis (Kaplan-Meier analysis and Cox regression analysis) are shown in FIG. 6 (FIG. 6A, overall survival; FIG. 6B, progression-free survival). Survival differed between the MCL1-M high and MCL1-M low subtypes: overall survival was significantly lower in patients with MCL1-M high MM than in patients with MCL1-M low MM (log-rank test, p<0.0001; hazard ratio 1.91, p=0.0002). GSE19784 also contains progression-free survival data, and progression-free survival was likewise significantly lower in patients with MCL1-M high MM than in patients with MCL1-M low MM (log-rank test, p=0.0282; likelihood ratio test, hazard ratio 1.36, p=0.031). These results confirm that the expression of the 97 classifier genes and the naïve Bayes prediction model enable prediction of the prognosis of patients with MM.
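The hazard ratios and the likelihood ratio test reported above come from Cox regression. For a single 0/1 covariate (for example, 1 for MCL1-M high), the Cox partial likelihood has a simple form, and the Python sketch below fits it by Newton-Raphson with Breslow handling of tied event times, then compares the fit against β = 0 with a likelihood ratio test. This is an illustrative sketch on hypothetical data, not the applicant's analysis; it assumes the two groups are not perfectly separated in time, otherwise β diverges.

```python
import math


def cox_binary(times, events, x, max_iter=50):
    """Cox model h(t | x) = h0(t) * exp(beta * x) for a 0/1 covariate (Breslow ties).

    Returns (hazard_ratio, lr_chi2, lr_p), where the likelihood ratio test
    compares the fitted beta against beta = 0.
    """
    rows = list(zip(times, events, x))
    # For each distinct event time: deaths d, deaths with x = 1 (s), risk-set sizes n1, n0
    strata = []
    for t in sorted({ti for ti, e, _ in rows if e}):
        d = sum(1 for ti, e, _ in rows if ti == t and e)
        s = sum(1 for ti, e, xi in rows if ti == t and e and xi == 1)
        n1 = sum(1 for ti, _, xi in rows if ti >= t and xi == 1)
        n0 = sum(1 for ti, _, xi in rows if ti >= t and xi == 0)
        strata.append((d, s, n1, n0))

    def log_partial_likelihood(beta):
        return sum(beta * s - d * math.log(n0 + n1 * math.exp(beta))
                   for d, s, n1, n0 in strata)

    beta = 0.0
    for _ in range(max_iter):  # Newton-Raphson on the concave partial likelihood
        score = info = 0.0
        for d, s, n1, n0 in strata:
            p = n1 * math.exp(beta) / (n0 + n1 * math.exp(beta))
            score += s - d * p
            info += d * p * (1 - p)
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    lr = 2 * (log_partial_likelihood(beta) - log_partial_likelihood(0.0))
    return math.exp(beta), lr, math.erfc(math.sqrt(max(lr, 0.0) / 2))


# Hypothetical data: group 1 tends to die earlier than group 0
hr, lr, p = cox_binary([1, 2, 3, 4, 5, 6], [1] * 6, [1, 1, 0, 1, 0, 0])
print(hr > 1)  # True
```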

Example 3. The Molecular Diagnostic Markers and Classification of Multiple Myeloma Predict Whether the Patient to be Tested Can Be Treated with Bortezomib

[0102] The gene expression data in GSE19784 were generated from MM patients enrolled in a randomised phase III clinical trial (the HOVON-65/GMMG-HD4 trial), for which the treatment details of all patients were documented. The patients were randomly assigned to one of two groups receiving either VAD (vincristine, doxorubicin, and dexamethasone; 155 patients) or PAD (bortezomib, doxorubicin, and dexamethasone; 148 patients). The difference between the two regimens is that PAD contains bortezomib. All expression data were derived from samples taken before treatment.

[0103] The MM samples were classified into the MCL1-M high and MCL1-M low subtypes with the naïve Bayes prediction model (as described in Example 1). Survival analysis (Kaplan-Meier analysis and Cox regression analysis) was then performed separately by treatment within the MCL1-M high samples (51 treated with PAD; 56 treated with VAD) and within the MCL1-M low samples (104 treated with PAD; 92 treated with VAD).

[0104] The results are shown in FIG. 7: FIG. 7A, overall survival in the MCL1-M high subtype; FIG. 7B, overall survival in the MCL1-M low subtype; FIG. 7C, progression-free survival in the MCL1-M high subtype; FIG. 7D, progression-free survival in the MCL1-M low subtype (left panels, MCL1-M high subtype; right panels, MCL1-M low subtype; upper panels, overall survival; lower panels, progression-free survival). Bortezomib-based PAD treatment prolonged survival, particularly progression-free survival, only in patients with MCL1-M high MM. Thus, bortezomib-based PAD treatment can delay the progression of MCL1-M high MM but shows no effect in patients with MCL1-M low MM. In summary, the invention enables stratification of patients with MM for treatment decisions, so that treatment of MCL1-M low MM with bortezomib can be avoided. This reduces the treatment-related economic burden and avoids treatment-induced side effects.

Example 4. Application of Naïve Bayes Prediction Model in Stratifying Patients with MM into Different Risk Groups

[0105] Bone marrow samples from 30 patients with newly diagnosed MM were collected at Beijing Chaoyang Hospital. CD138+ cells were purified with anti-CD138 antibody-coated beads and used to prepare total RNA. The RNA preparations were hybridised to Affymetrix PrimeView arrays to measure the expression of the 97 classifier genes.

[0106] Consensus clustering was used to assign the samples, as a group, to the MCL1-M high or MCL1-M low subtype, and the naïve Bayes prediction model developed in Example 1 was used to classify each sample individually as MCL1-M high or MCL1-M low.

[0107] As shown in Table 4, the classification results of consensus clustering and the naïve Bayes prediction model were highly concordant: only 1 MCL1-M high MM was predicted as MCL1-M low, suggesting that the naïve Bayes prediction model can be used to predict the MM subtype of individual samples.

[0108] Because of the small size of the data set and the short follow-up period, survival analysis was not performed. However, according to traditional risk parameters (existing clinical risk indices), the 19 MCL1-M high MMs included 14 high-risk MMs, whereas the 11 MCL1-M low MMs included only 3 high-risk MMs. This indicates that, in this example, the established classification can still predict the patient's prognosis.

TABLE 4 Accuracy in the collected samples of the classifier built with the GSE2658 data set
(rows: molecular subtyping predicted by the naïve Bayes model; columns: molecular subtyping by unsupervised consensus clustering)

                   MCL1-M-High    MCL1-M-Low
    MCL1-M-High    18             0
    MCL1-M-Low     1              11
    Accuracy       94.7%          100%

INDUSTRIAL APPLICABILITY

[0109] Current molecular classification schemes neither correlate with the cellular origin of MM nor predict treatment effect. The present invention explored gene co-expression networks around key signalling pathways of germinal centre development in order to understand MM etiology and to classify MM at the molecular level. The applicant screened for dysregulated gene networks involved in the development of B cells into plasma cells, because these networks potentially play important roles in MM pathogenesis. Following a series of analyses, the applicant identified the gene co-expression module around MCL1 (MCL1-M) and developed a classification scheme that assigns MMs to the MCL1-M high or MCL1-M low subtype. These two subtypes differ in prognosis and in patterns of genomic alterations. More importantly, this classification scheme predicts response to bortezomib treatment and correlates with plasma cell development. The present invention provides a new platform for developing individualised precision therapy against MM and improves the understanding of MM pathogenesis.

CPC classification: C12Q1/68 (CHEMISTRY; METALLURGY); G16B40/00, G01N33/57484, G16B25/00, G16H50/30, G01N33/5011, G01N2800/52 (PHYSICS)

International classification: G01N33/574, G01N33/50 (PHYSICS)
