MEANS AND METHODS FOR DETERMINING METABOLIC ADAPTATION

20210405030 · 2021-12-30

    Abstract

    The present invention relates to a method of determining a metabolic adaptation of a living entity of interest to a first set of environmental conditions and to a second set of environmental conditions comprising (a) determining with a first substrate concentration at least two activities of at least one enzyme comprised in a specimen of said living entity maintained under said first set of environmental conditions and at least two activities of said at least one enzyme comprised in a specimen of said living entity maintained under said second set of environmental conditions, wherein said activities are determined at two non-identical points in time t_1 and t_2 after starting the determining reaction; (b) determining with a second substrate concentration at least two activities of at least one enzyme comprised in a specimen of said living entity maintained under said first set of environmental conditions and at least two activities of said at least one enzyme comprised in a specimen of said living entity maintained under said second set of environmental conditions, wherein said activities are determined at two non-identical points in time t_3 and t_4 after starting the determining reaction; wherein said second substrate concentration is at most twofold, preferably is about equal to or lower than, the K_M of said enzyme for said substrate; and (c) determining the metabolic adaptation of said living entity based on comparing at least one non-linear activity determined in step (a) and/or (b) to at least one further activity determined in step (a) and/or (b). The present invention also relates to devices and further methods related thereto; as well as to a redox-fixed HMGB1 derivative polypeptide.

    Claims

    1. A method of determining a metabolic adaptation of a living entity of interest to a first set of environmental conditions and to a second set of environmental conditions comprising (a) determining with a first substrate concentration at least two activities of at least one enzyme comprised in a specimen of said living entity maintained under said first set of environmental conditions and at least two activities of said at least one enzyme comprised in a specimen of said living entity maintained under said second set of environmental conditions, wherein said activities are determined at two non-identical points in time t_1 and t_2 after starting the determining reaction; (b) determining with a second substrate concentration at least two activities of at least one enzyme comprised in a specimen of said living entity maintained under said first set of environmental conditions and at least two activities of said at least one enzyme comprised in a specimen of said living entity maintained under said second set of environmental conditions, wherein said activities are determined at two non-identical points in time t_3 and t_4 after starting the determining reaction; wherein said second substrate concentration is at most twofold, preferably is about equal to or lower than, the K_M of said enzyme for said substrate; and (c) determining the metabolic adaptation of said living entity based on comparing at least one non-linear activity determined in step (a) and/or (b) to at least one further activity determined in step (a) and/or (b).

    2. The method of claim 1, wherein said first substrate concentration (i) is at least twofold, preferably at least fivefold, more preferably at least tenfold the K_M of said enzyme for said substrate; or (ii) wherein said first substrate concentration is at most twofold, preferably is about equal to or lower than, the K_M of said enzyme for said substrate and is non-identical to the second substrate concentration.

    3. The method of claim 1, wherein at least step (c) is computer-implemented, preferably by training an automated machine learning algorithm with the data of steps (a) and (b) of cells having a known metabolic adaptation.

    4. The method of claim 1, wherein said first environmental condition is normoxia and wherein said second environmental condition is hypoxia and wherein said metabolic adaptation is switch of energy metabolism from oxidative phosphorylation under normoxia to glycolysis under hypoxia.

    5. The method of claim 1, wherein said at least one enzyme is pyruvate kinase, preferably pyruvate kinase M2.

    6. The method of claim 5, wherein said substrate is pyruvate and wherein said first substrate concentration is 10 mM and wherein said second substrate concentration is 0.1 mM.

    7. The method of claim 5, wherein a strong change in the activity of either high-affinity pyruvate kinase (PKHA) or low-affinity pyruvate kinase (PKLA) under hypoxic conditions as compared to the activity under normoxic conditions is indicative of a successful switch from oxidative phosphorylation under normoxia to glycolysis under hypoxia, and/or wherein a moderate or no change in the activity of either PKHA or PKLA under hypoxic conditions as compared to the activity under normoxic conditions, or a parallel change of both PKHA and PKLA, is indicative of an unsuccessful switch from oxidative phosphorylation under normoxia to glycolysis under hypoxia.

    8. The method of claim 1, wherein at least one of said activities determined in steps (a) and (b) is a non-linear activity.

    9. (canceled)

    10. A method for determining an activation status of immune cells in a test sample comprising said immune cells, comprising (a) incubating a first subportion of said test sample comprising immune cells under normoxic conditions, (b) incubating a second subportion of said test sample comprising immune cells under hypoxic conditions, (c) determining the activities of at least the enzymes high-affinity Pyruvate Kinase (PKHA) and low-affinity Pyruvate Kinase (PKLA) in cells of said first and second subportions, (d) comparing said activities determined in step (c), and (e) based on the result of comparison step (d), determining the activation status of the immune cells in said test sample.

    11. (canceled)

    12. The method of claim 10, wherein said immune cells are peripheral blood mononuclear cells (PBMCs), preferably are T-cells or hematopoietic stem cells, more preferably are CD34+ hematopoietic stem cells.

    13. (canceled)

    14. A method of determining a modulation of at least one enzyme activity by an extract of a fixed cell sample and optionally providing a risk classification for a patient suffering from disease, comprising (i) providing at least a first and a second aliquot of said at least one enzyme; (ii) contacting said second aliquot with said extract of a fixed cell sample; (iii) determining the activity of the first aliquot of step (i) and the activity of the second aliquot of step (ii); (iv) comparing the activities of the first aliquot and the second aliquot determined in step (iii), and thereby (v) determining a modulation of at least one enzyme activity by an extract of a fixed cell sample.

    15. The method of claim 14, wherein said fixed cell sample is a sample of a subject.

    16. The method of claim 15, wherein said determining step (iii) further comprises (I) determining with a first substrate concentration at least two activities of said first aliquot and at least two activities of said second aliquot, wherein said activities are determined at two non-identical points in time t_1 and t_2 after starting the determining reaction; and (II) determining with a second substrate concentration at least two activities of first aliquot and at least two activities of said second aliquot, wherein said activities are determined at two non-identical points in time t_3 and t_4 after starting the determining reaction; wherein said second substrate concentration is at most twofold, preferably is about equal to or lower than, the K_M of said enzyme for said substrate.

    17. The method of claim 14, wherein said fixed cell sample is an aldehyde-fixed cell sample, preferably a formaldehyde- and/or glutaraldehyde-fixed sample, preferably a formaldehyde-fixed sample.

    18. (canceled)

    19. (canceled)

    20. (canceled)

    Description

    FIGURE LEGENDS

    [0147] FIG. 1: Proliferation of PBMCs under the indicated conditions; P-M2 tide: induction of glycolysis by incubation with P-M2 tide; A) control (no activator compound); B) hGM-CSF added; C) to G) HMGB1 or variant thereof added.

    [0148] FIG. 2: Chemotaxis of PBMCs under hypoxic conditions in the absence and in the presence of P-M2 tide and in the presence of HMGB1 or a variant thereof; Values are % improvement in the presence of a human 4E-HMGB1 with a stable disulfide bond (S-S 4E-hHMGB1) relative to the indicated compound.

    [0149] FIG. 3: Induction of chemotaxis by HMGB1 cytokines 4E-hHMGB1, 4Q-hHMGB1 and alkylated 4E-hHMGB1 in CD34+ enriched human hematopoietic cord blood cells.

    [0150] FIG. 4: Schematic representation of patient sample processing in Example 7; Solid tumor (CRC) samples were processed analogously, but fragmented CRC-tissue was incubated instead of a cell suspension in 3 ml RPMI medium. NX=normoxic (aerobic) conditions; AX=anoxic (anaerobic) conditions.

    [0151] FIG. 5: Pipetting scheme and settings for enzyme kinetic analysis using the EnFin-Test™ kits

    [0152] FIG. 6: Data evaluation; a matrix that displays time series data (A) was melted (melt function from R) to a single vector in which all time points, positive and negative controls (noise), and both conditions (aerobic/anaerobic) are represented in 168 features. The vectors were used to train machine learning algorithms (B): SVM (top left), where light grey dots represent support vectors at the decision margin and middle grey and dark grey dots represent the two classes ("0" or "1"); Decision Trees (C5.0 and Random Forest, bottom), showing the n trees used for majority voting; and (feed-forward) Neural Networks (top right), here with an input layer, an output layer and two hidden layers, where dots represent neurons. Enzymes added as purified enzymes are referred to as "Avatar enzymes".

    [0153] FIG. 7: Activities of the indicated enzymes contacted with the indicated fixed cell extracts in Example 7: A) PKLA/extracts from patients 1 or 2, B) PKLA/extracts from patients 3 or 4, C) PKLA/extracts from patients 5 or 6; D) LDH/extracts from patients 3 or 4, and E) PKHA/extracts from patients 1 or 2.

    [0154] The following Examples shall merely illustrate the invention. They shall not be construed, whatsoever, to limit the scope of the invention.

    EXAMPLE 1

    [0155] The cellular model used herein discriminates between an immuno-activating environment and an immuno-suppressive environment. An immuno-activating environment comprises supplementation of growth factors/serum and allows growth and survival of blood-derived immune cells. An immuno-suppressive environment comprises deprivation of growth factors/serum (starvation) and diminishes growth and survival of blood-derived immune cells. Growth and survival of immune cells in the patient's blood/tissues are decisive for an appropriately functioning/activatable immune system and are regarded as indicators of immune system function (Pearce et al. (2013), Immunity 38 (4):633; Odegaard et al. (2013), Immunity 38:644). In both conditions a switch to anaerobic glycolysis was induced pharmacologically, and proliferation and survival of the treated immune cells were compared to those of unmodified immune cells. As a proof of principle, we showed that unmodified immune cells (with no detectable switch to anaerobic glycolysis by means of the herein described blood test) displayed decreased proliferation and survival only in an immuno-suppressive environment. Thus, by applying the claimed test method to blood-derived immune cells, one can detect inactivatable/anergic immune cells. Importantly, one can discriminate immune cells that turned anergic upon immuno-suppressive conditions (e.g. from an immune-suppressed patient) from immune cells that are activatable though exposed to the same immuno-suppressive conditions (e.g. from an immune-suppressed patient). This allows specific identification of immune cells that are anergic and, if desirable (e.g. in immuno-paralyzed patients), should be stimulated with appropriate agents.

    Experimental Setup (FIG. 1A)

    [0156] (a) Control 10% FCS: immuno-activating environment, immune cells show a switch within the central carbon metabolism (CCM) to anaerobic glycolysis, thus the patient's immune system is active. Blood test result: 2.67 (significantly changed, reference interval 2.0±0.3).
    (b) Control 10% FCS+P-M2 tide: immuno-activating environment, immune cells show a switch within CCM to anaerobic glycolysis, thus the patient's immune system is active. Blood test result: 0.30 (significantly changed, reference interval 2.0±0.3).
    (c) Control starvation: immuno-suppressive environment, immune cells show no switch within CCM to anaerobic glycolysis, thus the patient's immune system is inactive/anergic. Blood test result: 2.14 (not changed, reference interval 2.0±0.3).
    (d) Control starvation+P-M2 tide: immuno-suppressive environment, immune cells show a switch within CCM to anaerobic glycolysis, thus the patient's immune system is active. Blood test result: 1.37 (significantly changed, reference interval 2.0±0.3).
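
    The four readouts above can be checked against the stated reference interval with a few lines of R. This is only a sketch: the function name and the rule "outside 2.0±0.3 means significantly changed" are assumptions inferred from the values listed in (a) to (d), not a disclosed scoring procedure.

```r
# Sketch: flag a blood test result as "significantly changed" when it lies
# outside the reference interval 2.0 +/- 0.3 quoted in (a) to (d).
# Function name and decision rule are illustrative assumptions.
is_significantly_changed <- function(score, centre = 2.0, half_width = 0.3) {
  score < (centre - half_width) | score > (centre + half_width)
}

# Results taken from conditions (a) to (d)
scores <- c(a = 2.67, b = 0.30, c = 2.14, d = 1.37)
changed <- is_significantly_changed(scores)
print(changed)
```

    Under this rule only condition (c) falls inside the interval, which agrees with the "not changed" label given for the starved control.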

    [0157] The switch from OXPHOS to glycolysis occurs upon activation of immune cells (Palsson-McDermott et al. (2015), Cell Metabolism 21:65; Pearce et al. (2013), Immunity 38 (4):633). To measure the maximum possible switch, OXPHOS has to be eliminated completely. This is achieved herein by cultivating the immune cells under oxygen deficiency (anoxia), where OXPHOS is inactive due to the lack of oxygen as substrate.

    EXAMPLE 2

    [0158] To evaluate to what extent and by which agent anergic immune cells could be stimulated, we used recombinant human GM-CSF (20 ng/ml, FIG. 1 B) or recombinant/synthetic variants of the immune system stimulating human HMGB1 cytokine (200 nM, FIG. 1 C-G).

    [0159] Mononuclear human blood donor cells with an increased test score (i.e. showing a switch within CCM from OXPHOS to glycolysis measured by the blood test of this invention) did not die under immuno-suppressive conditions (starvation) in control cells. Treatment with human granulocyte-macrophage colony-stimulating factor (GM-CSF) partly rescued immuno-suppressed cells with no switch to anaerobic glycolysis (unchanged test score), whereas immuno-suppressed cells displaying a switch to anaerobic glycolysis were fully rescued from cell death and showed increased proliferation (and survival) compared to unstimulated control samples. Wildtype human recombinant HMGB1 (high mobility group box 1 protein), a potent immuno-stimulating cytokine and DAMP, failed to increase proliferation and survival of immune cells. Recombinant human 4E-HMGB1 (a variant of HMGB1 with four tyrosine residues exchanged for glutamate residues as specified herein above; WO 2018/108327) increased proliferation of non-suppressed immune cells showing a switch to anaerobic glycolysis and also increased survival and proliferation of immuno-suppressed (starved) immune cells showing a switch to anaerobic glycolysis. However, compared with hGM-CSF stimulated cells, it had moderate effects on proliferation and survival of suppressed immune cells showing this switch. In contrast, recombinant human 4Q-HMGB1 (a variant of HMGB1 with four tyrosine residues exchanged for glutamine residues as specified herein above; WO2017/098051) showed a weak stimulation of immuno-suppressed cells with no switch to anaerobic glycolysis compared to 4E-hHMGB1 and no significant increase in stimulation of immuno-suppressed cells that had a switch to anaerobic glycolysis (compared to control cells).

    [0160] To potently increase stimulation of the immuno-suppressed cells (to at least the level seen in activatable immune cells exposed to immuno-suppressive conditions) we changed the chemical structure of the most promising immune-stimulating recombinant HMGB1 variant, 4E-hHMGB1, in two ways: (i) by changing the SH (sulfhydryl) residues (residues Cys-23, Cys-45 and Cys-106 now being irreversibly reduced via alkylation in 4E-hHMGB1) to stable alkylated residues (non-oxidizable (reduced-alkylated) form; alkyl 4E-hHMGB1) and (ii) by introducing permanent disulfide bonds into the protein structure (A-Box domain, residues Cys-23 and Cys-45 now forming a stable disulfide bond in 4E-hHMGB1; S-S 4E-hHMGB1). Stimulation with alkyl 4E-hHMGB1 resulted in increased survival and proliferation of immuno-suppressed immune cells (compared to control, hGM-CSF and wt-hHMGB1); however, the effect was smaller than with (non-alkylated) 4E-hHMGB1. Stimulation with the synthetic recombinant S-S 4E-hHMGB1 variant showed the best survival and proliferation effects on both activated (with a switch to anaerobic glycolysis) and non-activated (showing no switch to anaerobic glycolysis) immuno-suppressed immune cells. It activated proliferation in non-suppressed immune cells to a lesser extent than its 4E-hHMGB1 counterpart; however, the challenge of this invention was (a) to identify immune cells that were inactive/anergic (no switch to anaerobic glycolysis) when exposed to immuno-suppressive conditions and (b) to show that they could be potently activated by the new compounds presented in this invention, thereby cancelling the immuno-suppressive effect.

    EXAMPLE 3

    [0161] In further experiments using the synthetic and/or recombinant HMGB1 variants we also examined chemotaxis of differentiated immune cells (mononuclear blood donor cells, FIG. 2). S-S 4E-hHMGB1 increased chemotaxis in serum-starved (immuno-suppressed) immune cells that were inactive/anergic, i.e. that displayed no switch to anaerobic glycolysis (as defined by the test score of this invention), compared to hGM-CSF, wt hHMGB1 cytokine and alkylated 4E-hHMGB1. Moreover, synthetic S-S 4E-hHMGB1 had superior chemotactic effects (compared to the wildtype hHMGB1 cytokine) on blood donor mononuclear cells that were exposed to immuno-suppressive conditions but active (that showed a switch towards anaerobic glycolysis). Although S-S 4E-hHMGB1 failed to induce more chemotaxis than possible with the three recombinant/synthetic HMGB1 cytokines 4E-hHMGB1, 4Q-hHMGB1 and alkylated 4E-hHMGB1, it was superior to all indicated compounds in inducing chemotaxis in CD34+ enriched human hematopoietic cord blood cells (HSC, FIG. 3). Chemotactic stimulation of (hematopoietic) stem cells is an important feature of immune system stimulation in response to pathogen-induced inflammation, autoimmune disease, sepsis, prevention of development, progression or recurrence of cancer, immunodeficiency disorders, prevention of exhaustion of immune cells, and tissue repair (proliferation and/or migration of immune stem cells within/to the wounded/ischemic tissue, e.g. blood, bone marrow, heart, coronary arteries, vessels, bones, brain, nerves, myelin sheaths) (Palsson-McDermott et al. (2015), Cell Metabolism 21:65).

    [0162] In summary, we showed that anergic blood donor immune cells cultivated under immuno-suppressive conditions (Model Summary (c)) could be potently stimulated by treatment with S-S 4E-hHMGB1, which was best-in-class compared to the other indicated compounds, also regarding increased chemotaxis of human mononuclear immune cells and human CD34+ enriched cord blood hematopoietic stem cells. We also provide proof-of-principle ex vivo data on human immune cells showing that (i) modulation of PKM2 activity by pharmacological inhibition of PK la, achieved by specifically blocking the PK tetramer (using a well-characterized phosphotyrosine peptide called P-M2 tide; GGAVDDDpYAQFANGG), enables immune cells to shift from OXPHOS to anaerobic glycolysis and (ii) that the occurrence of this shift enables immune cells to proliferate/survive/migrate under immuno-suppressive conditions.

    [0163] One important feature of this blood test is that it shows the individual's unique immune cell activation status. That is achieved by assessing the change of enzyme activities within the individual's samples under two conditions (normoxic and anoxic), thus not using absolute values (that would require reference values of other individuals to perform inter-individual comparison of immune status) but relative values (ratios).
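
    The within-individual comparison described above can be illustrated as follows. The sketch assumes that the relative value is simply the anoxic activity divided by the normoxic activity of the same sample; the activity numbers are hypothetical placeholders, and the actual scoring formula of the blood test is not given in this text.

```r
# Hypothetical enzyme activities (arbitrary units) for ONE individual,
# measured in the same sample under normoxic and anoxic culture conditions.
# All numbers are illustrative placeholders, not measured data.
activities <- data.frame(
  enzyme   = c("PKLA", "PKHA", "LDH"),
  normoxia = c(10, 4, 20),
  anoxia   = c(25, 4, 30)
)

# Relative value per enzyme: each individual serves as their own reference,
# so no inter-individual reference values are needed.
activities$ratio <- activities$anoxia / activities$normoxia
print(activities)
```

    Because numerator and denominator come from the same individual, the ratio is unaffected by factors that scale both measurements equally, such as the number of cells in the homogenate.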

    EXAMPLE 4: EXPERIMENTAL DETAILS

    EXPERIMENT 1

    [0164] Healthy blood donor mononuclear lymphocytes were cultured under normoxia (21% oxygen) and anoxia (0% oxygen). Enzyme activities of PK low affinity (PK la), PK high affinity (PK ha) and lactate dehydrogenase (LDH) were measured in the homogenates. Under anoxic cell culture conditions, macromolecule (amino acids, fatty acids, RNA/DNA) and energy (ATP) synthesis has to be fueled by glucose intermediates from central carbon metabolism (CCM). A significant change in enzyme activities from normoxia to anoxia was indicative of an increased shift to anaerobic glycolysis, shown by increased proliferation (cell culture supplemented with growth factors) and survival (cell culture not supplemented with growth factors, i.e. serum starvation conditions) of immune cells. One important finding was that when immune cells that displayed no shift to anaerobic glycolysis were suppressed by starvation (no supplement of growth factors), these consequently died and/or did not proliferate (compared to the unsuppressed control and the suppressed control showing a shift to anaerobic glycolysis). Noncleavable cross-linking of sulfhydryl groups was done with 3×800 μl 4E-hHMGB1 (410 μg/ml in PBS, pH 7.4) using the midi-dialyse system, adding 8 μl 0.5 M DTT (1 h, 22° C.) followed by dialysis with 500 ml ice-cold PBS+5 mM EDTA (pH 7, 1 h, 4° C.) and dialysis with 500 ml ice-cold PBS+5 mM EDTA (pH 7, overnight, 4° C.). Then 8 μl of BMOE (bis(maleimido)ethane, Thermo Fisher) was added (1 h, 22° C.) followed by 8 μl 0.5 M DTT. The solution was dialyzed for 1 h with 500 ml ice-cold PBS+5 mM EDTA (pH 7, 4° C.) and overnight with 500 ml ice-cold PBS+5 mM EDTA (pH 7, 4° C.). Alkylation of sulfhydryl groups was done with 3×800 μl 4E-hHMGB1 (410 μg/ml in PBS, pH 7.4) using the midi-dialyse system, adding 8 μl 0.5 M DTT (0.1 h, 22° C.) followed by dialysis with 500 ml ice-cold PBS+5 mM EDTA (pH 8, 1 h, 4° C.) and dialysis with 500 ml ice-cold PBS+5 mM EDTA (pH 8, overnight, 4° C.). Then 8 μl of 0.5 M iodoacetamide (30 mM, 22° C., without light) was added followed by 8 μl 0.5 M DTT. The solution was dialyzed for 1 h with 500 ml ice-cold PBS+5 mM EDTA (pH 7.4, 4° C.) and overnight with 500 ml ice-cold PBS+5 mM EDTA (pH 7.4, 4° C.). P-M2 tide oligopeptide (Gly-Gly-Ala-Val-Asp-Asp-Asp-pTyr-Ala-Gln-Phe-Ala-Asn-Gly-Gly) was purchased from Enzo Life Sciences. (n=8).

    EXPERIMENT 2

    [0165] The chemotactic capacity of different human HMGB1 forms (expressed in HEK cells as described before and, in the case of synthetic alkyl 4E-hHMGB1 and synthetic S-S 4E-hHMGB1, modified chemically on their cysteine residues) or human recombinant GM-CSF towards healthy blood donor mononuclear lymphocytes (isolated by the recommended Ficoll gradient standard procedure using Ficoll-Paque™) was assessed with or without P-M2 tide oligopeptide (from Enzo Life Sciences; Gly-Gly-Ala-Val-Asp-Asp-Asp-pTyr-Ala-Gln-Phe-Ala-Asn-Gly-Gly, SEQ ID NO:1). The distinct HMGB1 variants display different binding properties to the allosteric center of the PK la enzyme. The assay was performed according to the instructions of the manufacturer using Corning Plate HTS Transwell (CLS3374-2 EA) systems with 5 μm pores (10,000 cells per well, 3 h, n=2).

    EXPERIMENT 3

    [0166] Chemotaxis induced by human HMGB1 variants was assessed in fresh human CD34+ enriched cord blood hematopoietic stem cells (HSC). HSCs were isolated according to Wein et al. (2010), Stem Cell Res 4:129, and immediately used for the chemotaxis assay. Briefly, mononuclear cells (MNC) were isolated by density gradient centrifugation with the Ficoll-hypaque technique (Biochrom, Berlin, Germany). CD34+ cells were purified by positive selection with a monoclonal anti-CD34 antibody using magnetic microbeads on an affinity column with the AutoMACS system (all Miltenyi Biotec, Bergisch-Gladbach, Germany). Reanalysis of the isolated cells by flow cytometry revealed a purity of >95% CD34+ cells. The assay was performed according to the instructions of the manufacturer using Corning Plate HTS Transwell (CLS3374-2 EA) systems with 5 μm pores (10,000 cells per well, 3 h and 18 h, n=3).

    EXAMPLE 5

    [0167] Code example for correct classification of leukemia patients (n=22; classification problem=responders vs non-responders to chemotherapy plus rituximab treatment within two years after beginning of the treatment) and colorectal cancer patients (n=101; classification problem=DFS after curative surgery (resection of the primary tumor plus metastasis if present) within two years after surgery); 100% accuracy, p<0.05:

    TABLE-US-00002
    ##########
    # Caret
    # http://dataaspirant.com/2017/01/19/support-vector-machine-classifier-implementation-r-caret-package/
    ##########
    library(gtools)
    library(openxlsx)
    library(reshape2)
    library(caret)
    library(e1071)
    library(plyr)
    library(DMwR)
    library(randomForest)

    baseDirectory <- "resources"
    myWD <- "D:\\Daten\\Sicherheitskopie_011117\\machine learning\\machineLearning"
    setwd(myWD)
    # MeltDataPerPatient() is not defined in this listing; it has to be
    # provided by the sourced helper library.
    source(file.path(myWD, "EnfinMachineLearningHelperLibrary.R"))

    ################################################################################
    # CLL
    ################################################################################
    #LBP_CLL_BaseDirectory <- file.path(baseDirectory, "LBP_CLL")
    LBP_CLL_BaseDirectory <- file.path(baseDirectory, "LBP_CLL_EDITED")
    # 'pattern' is a regular expression, so match the file extension explicitly
    cll_filepaths <- list.files(LBP_CLL_BaseDirectory, recursive = TRUE, full.names = TRUE,
                                include.dirs = TRUE, pattern = "\\.xlsx$")
    cll_filepaths <- file.path(myWD, cll_filepaths)

    # Read every sheet of every workbook into a list keyed by sheet name
    myRawData <- list()
    for (i in seq_along(cll_filepaths)) {
      sheetNames <- getSheetNames(cll_filepaths[i])
      for (j in seq_along(sheetNames)) {
        sheetNamesFixed <- gsub(" ", "", sheetNames[j])
        myRawData[[sheetNamesFixed]] <- read.xlsx(cll_filepaths[i], sheetNames[j], rowNames = TRUE)
      }
    }
    dataPerPatientCLL <- MeltDataPerPatient(myRawData)

    ################################################################################
    # Importing and adding classes
    ################################################################################
    dataPerPatientCLLforClassification <- as.data.frame(t(dataPerPatientCLL))
    # Preparing the additional data for classification and regression
    classesForCLL <- read.xlsx(file.path(baseDirectory, "templateDataForCLL_gg.xlsx"))
    classesForCLL[classesForCLL == "na"] <- NA
    classesForCLL$start <- convertToDate(classesForCLL$start)
    classesForCLL$end <- convertToDate(classesForCLL$end)
    classesForCLL$TFSdays <- classesForCLL$end - classesForCLL$start

    dataPerPatientCLLforClassification <- NA
    # Append the class labels as the last row (one column per patient)
    dataPerPatientCLLforClassification <- rbind(dataPerPatientCLL, classesForCLL$class)
    # Drop patients whose class label is missing
    dataPerPatientCLLforClassification <- dataPerPatientCLLforClassification[
      , !is.na(dataPerPatientCLLforClassification[nrow(dataPerPatientCLLforClassification), ])]
    dataPerPatientCLLforClassification <- t(dataPerPatientCLLforClassification)
    class(dataPerPatientCLLforClassification) <- "numeric"
    colnames(dataPerPatientCLLforClassification) <- c(paste0("f", 1:168), "class")
    dataPerPatientCLLforClassification <- as.data.frame(dataPerPatientCLLforClassification)
    dataPerPatientCLLforClassification$class <- factor(dataPerPatientCLLforClassification$class)

    data <- dataPerPatientCLLforClassification
    newData <- SMOTE(class ~ ., data)   # balance the classes by synthetic oversampling

    mySeed <- 3233
    #mySeed <- 1
    set.seed(mySeed)
    proportion <- 0.7
    intrain <- createDataPartition(y = newData$class, p = proportion, list = FALSE)
    training <- newData[intrain, ]
    testing <- newData[-intrain, ]
    training[["class"]] <- factor(training[["class"]])

    trctrl_repeatedcv <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
    trctrl <- trctrl_repeatedcv
    grid <- expand.grid(C = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 5))

    svm_Linear_Grid <- train(class ~ ., data = training,
                             method = "svmLinear",
                             trControl = trctrl,
                             preProcess = c("center", "scale"),
                             tuneGrid = grid,
                             tuneLength = 10)
    svm_svmRadial <- train(class ~ ., data = training,
                           method = "svmRadial",             # Radial kernel
                           # tuneLength = 9,                 # 9 values of the cost function
                           preProc = c("center", "scale"),   # Center and scale data
                           trControl = trctrl)
    rf_random <- train(class ~ ., data = training,
                       method = "rf",                        # Random forest
                       preProc = c("center", "scale"),       # Center and scale data
                       trControl = trctrl)

    test_pred_svm_radial <- predict(svm_svmRadial, newdata = testing)
    test_pred_svm_linear <- predict(svm_Linear_Grid, newdata = testing)
    test_pred_rf <- predict(rf_random, newdata = testing)
    confusionMatrix(test_pred_svm_radial, testing$class, positive = "1")
    confusionMatrix(test_pred_svm_linear, testing$class, positive = "1")
    confusionMatrix(test_pred_rf, testing$class, positive = "1")

    ################################################################################
    # CRC
    ################################################################################
    # Note: SortData() (see helper functions below) must be defined before this
    # section is run.
    impact_CRC_baseDirectory <- file.path(baseDirectory, "IMPACT_EnFin")
    crc_filepaths <- list.files(impact_CRC_baseDirectory, recursive = TRUE, full.names = TRUE,
                                include.dirs = TRUE, pattern = "\\.xlsx$")
    crc_filepaths <- file.path(myWD, crc_filepaths)
    crc_filepaths <- crc_filepaths[!grepl("CRC BA IMP", crc_filepaths)]

    dataPerPatientCRC <- NULL
    dataPerPatientCRCColumnNames <- NULL
    listOfPlattesIdentifiers <- paste0("Platte ", 1:(length(crc_filepaths)), ",")
    for (platteIdentifier in listOfPlattesIdentifiers) {
      kitFilepath <- grepl(platteIdentifier, crc_filepaths)
      rawData <- read.xlsx(crc_filepaths[kitFilepath], rowNames = TRUE)
      listOfPatients <- SortData(rawData)
      for (patient in names(listOfPatients)) {
        patientIdentifier <- paste(platteIdentifier, patient)
        # Melt the per-patient table into one feature vector (one column per patient)
        meltedPatient <- melt(listOfPatients[patient])
        dataPerPatientCRC <- cbind(dataPerPatientCRC, meltedPatient[, 2])
        dataPerPatientCRCColumnNames <- c(dataPerPatientCRCColumnNames, patientIdentifier)
      }
    }
    colnames(dataPerPatientCRC) <- dataPerPatientCRCColumnNames
    colnames(dataPerPatientCRC) <- gsub(",", "_", gsub(" ", "", colnames(dataPerPatientCRC), fixed = TRUE), fixed = TRUE)
    colnames(dataPerPatientCRC) <- gsub("Platte", "Kit", colnames(dataPerPatientCRC), fixed = TRUE)

    daysPerYear <- 365
    threshold <- 2 * daysPerYear
    #dataPerPatientCRCforRegressionTMP <- dataPerPatientCRC
    classesForCRC <- read.xlsx(file.path(baseDirectory, "templateDataForCRC_gg.xlsx"))
    dataPerPatientCRCforClassification <- as.data.frame(t(dataPerPatientCRC))
    # classification is representing recurrenceWithinTwoYears
    #dataPerPatientCRCforClassification <- cbind(dataPerPatientCRCforClassification, class = as.integer(classesForCRC$DFS < threshold))
    dataPerPatientCRCforClassification <- cbind(dataPerPatientCRCforClassification,
                                                class = as.integer(classesForCRC$DFS < threshold))
    dataPerPatientCRCforClassification$class <- factor(dataPerPatientCRCforClassification$class)

    data <- dataPerPatientCRCforClassification
    newData <- SMOTE(class ~ ., data)

    mySeed <- 3233
    #mySeed <- 1
    set.seed(mySeed)
    proportion <- 0.7
    intrain <- createDataPartition(y = newData$class, p = proportion, list = FALSE)
    training <- newData[intrain, ]
    testing <- newData[-intrain, ]
    training[["class"]] <- factor(training[["class"]])

    trctrl_repeatedcv <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
    trctrl <- trctrl_repeatedcv
    grid <- expand.grid(C = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 5))

    svm_Linear <- train(class ~ ., data = training,
                        method = "svmLinear",
                        trControl = trctrl,
                        preProcess = c("center", "scale"),
                        tuneGrid = grid,
                        tuneLength = 10)
    svm_svm_radial <- train(class ~ ., data = training,
                            method = "svmRadial",            # Radial kernel
                            # tuneLength = 9,                # 9 values of the cost function
                            preProc = c("center", "scale"),  # Center and scale data
                            trControl = trctrl)
    rf_random <- train(class ~ ., data = training,
                       method = "rf",                        # Random forest
                       preProc = c("center", "scale"),       # Center and scale data
                       trControl = trctrl)

    test_pred_svm_radial <- predict(svm_svm_radial, newdata = testing)
    test_pred_svm_linear <- predict(svm_Linear, newdata = testing)
    test_pred_rf <- predict(rf_random, newdata = testing)
    confusionMatrix(test_pred_svm_radial, testing$class, positive = "1")
    confusionMatrix(test_pred_svm_linear, testing$class, positive = "1")
    confusionMatrix(test_pred_rf, testing$class, positive = "1")

    ####################################################
    # Helper functions
    ####################################################
    # Split a vector into consecutive chunks of length 'by', dropping NA padding
    slice <- function(input, by = 2) {
      starts <- seq(1, length(input), by)
      tt <- lapply(starts, function(y) input[y:(y + (by - 1))])
      llply(tt, function(x) x[!is.na(x)])
    }

    # Split one plate of raw readings into one table per patient
    SortData <- function(rawData) {
      #baseDirectory <- "/Users/tanovsky/wip/Enfin/KitAnalyzer/"
      #inputDataFilepath <- file.path(baseDirectory, "input")
      #outputDataFilepath <- file.path(baseDirectory, "output")
      #setwd(baseDirectory)
      #rawDataFilepath <- file.path(inputDataFilepath, "Beipiel-Rohdaten.xlsx")
      #test_that("Input file is accessible", {
      #  expect_equal(file.exists(rawDataFilepath), TRUE)
      #})
      #rawData <- read.xlsx(rawDataFilepath, colNames = TRUE, rowNames = TRUE)
      # For test
      # rawData <- myData()
      wellColumns <- ""
      kMaxNumberOfPatients <- 4
      kColumnsPerPatient <- 3
      timeIntervals <- seq(from = 0, to = 30, by = 5)
      timeIntervalLabels <- colnames(rawData)

      # generate a vector with all the wells that should be split by corresponding patient
      orderedWells <- c()
      for (patient in 1:kMaxNumberOfPatients) {
        endColumn <- patient * kColumnsPerPatient
        beginColumn <- endColumn - kColumnsPerPatient + 1
        for (patientColumn in c(beginColumn:endColumn)) {
          for (wellRow in LETTERS[1:8]) {
            orderedWells <- c(orderedWells, paste0(wellRow, patientColumn))
          }
        }
      }

      ####################################################
      # Generate tables for Patients and fill with time values
      ####################################################
      patientTables <- list()
      patientIdentifier <- paste0("Patient", 1:kMaxNumberOfPatients)
      #splittedWellsPerPatient <- split(orderedWells, kMaxNumberOfPatients * )
      splittedWellsPerPatient <- slice(orderedWells, 24)
      for (i in 1:kMaxNumberOfPatients) {
        patientTables[[patientIdentifier[i]]] <- data.frame(
          matrix(ncol = length(timeIntervalLabels),
                 nrow = length(orderedWells) / kMaxNumberOfPatients))
        rownames(patientTables[[patientIdentifier[i]]]) <- splittedWellsPerPatient[[i]]
        colnames(patientTables[[patientIdentifier[i]]]) <- timeIntervalLabels
        # patientTables[[patientIdentifier[i]]] <- cbind(patientTables[[patientIdentifier[i]]], Well = rownames(patientTables[[patientIdentifier[i]]]))
        for (wellIdentifer in rownames(patientTables[[patientIdentifier[i]]])) {
          for (timeIdentifier in timeIntervalLabels) {
            patientTables[[patientIdentifier[i]]][wellIdentifer, timeIdentifier] <-
              rawData[wellIdentifer, timeIdentifier]
          }
        }
      }

      # Clean the empty patients (the whole matrix has NAs)
      for (i in names(patientTables)) {
        if (all(is.na(patientTables[[i]]))) {
          patientTables[i] <- NULL
        }
      }
      return(patientTables)
    }

    EXAMPLE 6

    [0168] Examples of training set results for the leukemia classification problem of Example 5 (classification of responders vs non-responders to chemotherapy plus rituximab treatment within two years after beginning of the treatment) using random forest, decision trees (C5.0) and SVM (support vector machine) with radial and linear kernels. The ROC (receiver operating characteristic) values reach 100% or nearly 100%:

    >fit.rf

    Random Forest

    [0169] 49 samples
    168 predictors
    2 classes: ‘no’, ‘yes’
    Resampling: Cross-Validated (10 fold, repeated 10 times)
    Summary of sample sizes: 44, 44, 44, 44, 44, 44, . . .
    Resampling results across tuning parameters:
    mtry ROC Sens Spec
    2 1 0.990 1
    85 1 1.000 1
    168 1 0.995 1
    ROC was used to select the optimal model using the largest value.
    The final value used for the model was mtry=2.
    >fit.c50

    C5.0

    [0170] 49 samples
    168 predictors
    2 classes: ‘no’, ‘yes’
    Resampling: Cross-Validated (10 fold, repeated 10 times)
    Summary of sample sizes: 44, 44, 44, 44, 44, 45, . . .
    Resampling results across tuning parameters:

    TABLE-US-00003
    model  winnow  trials  ROC        Sens       Spec
    rules  FALSE    1      0.9658333  1.0000000  0.9316667
    rules  FALSE   10      0.9658333  1.0000000  0.9316667
    rules  FALSE   20      0.9658333  1.0000000  0.9316667
    rules  TRUE     1      0.8808333  0.9116667  0.8483333
    rules  TRUE    10      0.9150000  0.9466667  0.8650000
    rules  TRUE    20      0.9141667  0.9466667  0.8650000
    tree   FALSE    1      0.9937500  1.0000000  0.9316667
    tree   FALSE   10      0.9937500  1.0000000  0.9316667
    tree   FALSE   20      0.9937500  1.0000000  0.9316667
    tree   TRUE     1      0.9172222  0.9116667  0.8550000
    tree   TRUE    10      0.9330556  0.9466667  0.8616667
    tree   TRUE    20      0.9322222  0.9466667  0.8650000
    ROC was used to select the optimal model using the largest value.
    The final values used for the model were trials=1, model=tree and winnow=FALSE.
    >fit.svmlinear
    Support Vector Machines with Linear Kernel
    49 samples
    168 predictors
    2 classes: ‘no’, ‘yes’
    Resampling: Cross-Validated (10 fold, repeated 10 times)
    Summary of sample sizes: 44, 44, 44, 44, 45, 43, . . .
    Resampling results:

    ROC Sens Spec

    [0171] 1 0.96 1
    Tuning parameter ‘C’ was held constant at a value of 1
    >fit.svmradial
    Support Vector Machines with Radial Basis Function Kernel
    49 samples
    168 predictors
    2 classes: ‘no’, ‘yes’
    Pre-processing: centered (168), scaled (168)
    Resampling: Cross-Validated (10 fold, repeated 10 times)
    Summary of sample sizes: 44, 43, 45, 44, 44, 44,
    Resampling results across tuning parameters:

    TABLE-US-00004
    C     ROC        Sens       Spec
    0.25  0.9977778  0.9883333  0.9533333
    0.50  1.0000000  0.9816667  0.9933333
    1.00  1.0000000  0.9866667  0.9933333
    Tuning parameter ‘sigma’ was held constant at a value of 0.00438827
    ROC was used to select the optimal model using the largest value.
    The final values used for the model were sigma=0.00438827 and C=0.5.

    [0172] Example of time series data set used for analysis (4 patient samples, Table 2). Raw data show decrease of NADH at 340 nm, 37° C. Monitoring time in this example is every 5 min for 30 min.

    TABLE-US-00005 TABLE 2 Exemplary raw measurement data
    Well  0 Min  5 Min  10 Min  15 Min  20 Min  25 Min  30 Min
    A1  1.26857  1.14825  1.00028  0.850122  0.700256  0.554552  0.417307
    A2  1.33039  1.25651  1.19512  1.14865  1.11217  1.08596  1.06602
    A3  1.27575  1.12478  0.958741  0.785704  0.61579  0.452555  0.310368
    A4  1.27277  1.22358  1.18633  1.15094  1.11961  1.08782  1.05755
    A5  1.31892  1.27451  1.25127  1.23522  1.21413  1.19761  1.18474
    A6  1.33023  1.26994  1.21552  1.16176  1.10694  1.05226  0.999643
    A7  1.26071  1.21396  1.18168  1.15074  1.12631  1.10273  1.07921
    A8  1.31283  1.27415  1.253  1.23497  1.22182  1.21158  1.20167
    A9  1.32218  1.2434  1.17291  1.1041  0.919813  0.966311  0.894534
    A10  1.2504  1.18829  1.12402  1.07154  1.00902  0.956888  0.902667
    A11  1.31601  1.27018  1.23352  1.21266  1.41455  1.17203  1.15766
    A12  1.33882  1.24255  1.34939  1.0645  0.977156  0.886928  0.805678
    B1  1.27768  1.15963  1.02283  0.880338  0.738349  0.599386  0.463488
    B2  1.33228  1.26557  1.20652  1.16085  1.12725  1.09873  1.07739
    B3  1.33684  1.19362  1.04199  0.877884  0.713342  0.553416  0.406442
    B4  1.27572  1.23432  1.20056  1.16667  1.13666  1.09487  1.07808
    B5  1.32848  1.29218  1.26804  1.24347  1.23083  1.20873  1.20041
    B6  1.34887  1.27915  1.23564  1.18053  1.12951  1.08048  1.02312
    B7  1.26282  1.21983  1.19549  1.1692  1.1433  1.12261  1.09833
    B8  1.31983  1.28809  1.26538  1.24907  1.23416  1.22408  1.21408
    B9  1.33135  1.26073  1.19649  1.12895  1.06186  0.997245  0.931714
    B10  1.27897  1.21421  1.15557  1.09822  1.04149  0.986908  0.934232
    B11  1.32625  1.27903  1.24407  1.21842  1.196  1.17735  1.1619
    B12  1.35323  1.25852  1.17454  1.09038  1.00244  0.920261  0.838866
    C1  1.25726  1.15913  1.04161  0.924069  0.80692  0.684793  0.570215
    C2  1.30206  1.25648  1.27994  1.16624  1.1336  1.10319  1.08187
    C3  1.34976  1.2059  1.07721  0.940067  0.799821  0.659655  0.525451
    C4  1.26492  1.21156  1.16686  1.11685  1.07117  1.02658  0.983142
    C5  1.31633  1.2804  1.25308  1.22862  1.20936  1.19034  1.17503
    C6  1.3395  1.27563  1.22613  1.1786  1.12375  1.07  1.01772
    C7  1.25793  1.20839  1.16656  1.12215  1.0805  1.03971  0.999224
    C8  1.3103  1.27283  1.25384  1.23156  1.21864  1.20582  1.19533
    C9  1.32655  1.25886  1.20374  1.14134  1.07723  1.01518  0.954185
    C10  1.26671  1.20615  1.15426  1.09922  1.05138  1.00763  0.965319
    C11  1.29699  1.25674  1.234  1.21491  1.20335  1.19244  1.18279
    C12  1.32521  1.24689  1.17899  1.10967  1.04321  0.977535  0.914886
    D1  1.25007  1.1454  1.02967  0.909196  0.787937  0.668458  0.552616
    D2  1.29446  1.24116  1.19385  1.15413  1.11984  1.09117  1.06756
    D3  1.30873  1.18908  1.0612  0.926379  0.789862  0.652003  0.519386
    D4  1.24969  1.19794  1.15215  1.10665  1.06054  1.01541  0.972247
    D5  1.30613  1.25894  1.24667  1.22364  1.20532  1.18796  1.17213
    D6  1.32609  1.26598  1.21863  1.1715  1.11845  1.06776  1.01773
    D7  1.24413  1.19641  1.15582  1.11233  1.07075  1.02989  0.990333
    D8  1.29458  1.2689  1.24792  1.23129  1.11565  1.2037  1.18615
    D9  1.30569  1.24427  1.1843  1.12814  1.06402  1.00757  0.945874
    D10  1.22895  1.17977  1.12721  1.08437  1.05594  0.996782  0.955109
    D11  1.28585  1.25152  1.24202  1.20863  1.19789  1.18557  1.17687
    D12  1.32485  1.25974  1.19634  1.12607  1.06529  1.00369  0.945298
    E1  1.26585  1.24468  1.22993  1.21788  1.20687  1.19663  1.18684
    E2  1.28588  1.26609  1.25555  1.24634  1.23834  1.23301  1.22454
    E3  1.40492  1.38842  1.38142  1.37631  1.37537  1.35759  1.37925
    E4  1.25224  1.22717  1.22456  1.20899  1.20537  1.19215  1.18729
    E5  1.28445  1.26099  1.2578  1.24481  1.24227  1.24312  1.22902
    E6  1.41559  1.39233  1.39125  1.39526  1.39198  1.39374  1.39444
    E7  1.24941  1.23203  1.22008  1.21025  1.19981  1.19116  1.18131
    E8  1.28294  1.26681  1.25575  1.24786  1.23931  1.23234  1.22482
    E9  1.41624  1.40295  1.39325  1.38915  1.38831  1.39135  1.3926
    E10  1.25103  1.2344  1.22255  1.21066  1.19934  1.19019  1.18001
    E11  1.28282  1.26508  1.25373  1.2456  1.23579  1.22951  1.22388
    E12  1.40091  1.38208  1.37353  1.36838  1.37118  1.37277  1.37596
    F1  1.27381  1.25177  1.42954  1.22591  1.2172  1.20462  1.19653
    F2  1.31582  1.27527  1.26896  1.2579  1.25167  1.24271  1.23626
    F3  1.40752  1.38937  1.38345  1.3758  1.37474  1.37291  1.38097
    F4  1.25509  1.23478  1.2266  1.21557  1.20815  1.19841  1.19044
    F5  1.2965  1.27722  1.26885  1.26017  1.25426  1.24492  1.23656
    F6  1.4187  1.39706  1.3931  1.38531  1.38785  1.38914  1.39011
    F7  1.246  1.22802  1.21812  1.20745  1.19826  1.18993  1.18121
    F8  1.29022  1.27465  1.26567  1.25686  1.24998  1.24217  1.23517
    F9  1.41724  1.40326  1.39596  1.38903  1.3893  1.39166  1.39336
    F10  1.24859  1.23024  1.21949  1.2079  1.19759  1.18777  1.17869
    F11  1.28758  1.26826  1.25715  1.24684  1.23775  1.23027  1.22347
    F12  1.40227  1.38389  1.37645  1.26949  1.37169  1.35972  1.37408
    G1  1.18531  0.919044  0.594516  0.246662  0.142402  0.141582  0.141228
    G2  1.20175  1.05315  1.02611  1.01948  1.01648  1.01242  1.00878
    G3  1.28761  1.03063  0.803991  0.588096  0.414563  0.292175  0.216463
    G4  1.1742  0.931209  0.636465  0.316307  0.142279  0.141106  0.140634
    G5  1.21033  1.06008  1.02962  1.02425  1.01994  1.01586  1.01182
    G6  1.30585  1.05157  0.817872  0.603655  0.429518  0.306902  0.229231
    G7  1.16608  0.935107  0.650304  0.339653  0.13371  0.141104  0.140561
    G8  1.19612  1.05326  1.01648  1.01461  1.00731  1.00632  1.00034
    G9  1.30101  1.0522  0.814083  0.599029  0.430415  0.298879  0.22278
    G10  1.1673  0.931729  0.659445  0.348536  0.142251  0.140762  0.140366
    G11  1.21504  1.05768  1.02159  1.01397  1.00866  1.00364  1.00117
    G12  1.32031  1.04353  0.796514  0.578628  0.406124  0.287374  0.216896
    H1  1.17455  0.890913  0.547969  0.195446  0.139742  0.139161  0.138134
    H2  1.19457  1.04327  1.0205  0.996873  1.01074  0.998359  1.00137
    H3  1.31764  1.06892  0.873249  0.682162  0.525871  0.402686  0.312871
    H4  1.15428  0.892204  0.579256  0.244908  0.138164  0.13817  0.137432
    H5  1.19279  1.04071  1.01799  1.01421  1.0075  1.00471  0.998634
    H6  1.30429  1.03489  0.79438  0.582402  0.415275  0.296579  0.222946
    H7  1.16315  0.909108  0.599414  0.271546  0.139156  0.138371  0.138153
    H8  1.1959  1.0441  1.01703  1.01247  1.00661  1.00317  0.998108
    H9  1.31983  1.04303  0.800813  0.586715  0.416342  0.302321  0.229748
    H10  1.16614  0.9116  0.605442  0.280762  0.138635  0.137896  0.13746
    H11  1.20422  1.04782  1.0182  1.01127  1.00568  1.00119  0.996086
    H12  1.31099  1.01021  0.753462  0.526417  0.355479  0.246476  0.189229
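    As a rough illustration of how the readings in Table 2 relate to enzyme activities, the following R sketch computes the change in absorbance per minute between consecutive readings of well A1. The helper name `interval_rates` is hypothetical and not part of the disclosed listing, and treating each 5 min difference as a rate is an assumption made only for illustration.

```r
# Absorbance at 340 nm for well A1, copied from Table 2 (0 to 30 min).
a1 <- c(1.26857, 1.14825, 1.00028, 0.850122, 0.700256, 0.554552, 0.417307)
time_min <- seq(from = 0, to = 30, by = 5)

# Hypothetical helper: absorbance change per minute for each interval.
interval_rates <- function(absorbance, time) {
  diff(absorbance) / diff(time)
}

rates <- interval_rates(a1, time_min)

# Negative values reflect NADH consumption. Roughly constant rates suggest
# the linear range; rates that drift over time suggest non-linear kinetics.
print(round(rates, 4))
```

    Comparing such interval rates across time points is one simple way to see whether a well stays in the linear range over the whole 30 min.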

    EXAMPLE 7: RISK CLASSIFICATION OF CHRONIC LYMPHOCYTIC LEUKEMIA AND COLON CANCER PATIENTS BY MONITORING ENZYME KINETICS

    7.1 Methods

    [0173] Overview: PBMCs (peripheral blood mononuclear cells) from chronic lymphocytic leukemia (CLL) patients and cancer tissue from the primary tumor of colorectal cancer (CRC) patients were analyzed with the EnFin®-test kits in anaerobic tissue/blood culture for activity of key metabolic enzymes responsible for anaerobic adaptation. Kinetic data including linear and non-linear enzyme activities were vectorized with all single time series data piled up to a single vector. Popular Machine Learning algorithms including SVM, RF, C5.0 and Neural Networks were trained on the enzyme kinetic datasets generated with the enzymatic test kits and evaluated using separated datasets.

    [0174] Specific description: For CLL the study sample consisted of 22 patients diagnosed with CLL who presented at the University Hospital Heidelberg between 2013 and 2014 and were treated with CIT. Peripheral blood mononuclear cells (PBMCs) were isolated by Ficoll gradient. The research was approved by the Ethics Committee of the University of Heidelberg (S-356/2013 and S-254/2016).

    [0175] For CRC the study sample consisted of 101 patients diagnosed with CRC who presented at the University Hospital Heidelberg between 2009 and 2012 and were selected from the DACHS study (German Cancer Research Center) within the framework of the IMPACT consortium (“Improving long-term prognosis and quality of life of patients with colorectal cancer”). Analysis was done with cryo-frozen tissue samples from the primary tumor available through the clinical research unit (KFO 227) of the University Hospital Heidelberg; consistent quality was guaranteed by the tissue biobank of the National Center for Tumor Diseases (NCT) in Heidelberg, Germany. The research was approved by the Ethics Committee of the University of Heidelberg (KFO 310/2001).

    Genetic Aberrations and CEA Serum Levels

    [0176] Chromosomal aberrations by fluorescence in situ hybridization (FISH) as well as TP53 and Immunoglobulin Heavy Chain Variable (IGHV) mutation status were obtained from medical reports. Preoperative Carcinoembryonic Antigen (CEA) serum levels were obtained from medical reports and were available for 99 colorectal cancer patients. Sera were obtained on the day before surgery.

    Assay Software Part: Benchmark Methods and Performance Evaluation

    [0177] Experiments were run on an Intel Core 2 Duo T6600 (2.2 GHz) with 3 GB of RAM under Windows 7. The algorithms were coded in R (R 3.5.1, www.R-project.org) using the caret, gtools, openxlsx, reshape2, e1071, plyr, DMwR, randomForest, nnct and C50 packages. We selected well-known learning algorithms (Support Vector Machine (SVM), RF, C5.0) to be used in conjunction with a popular method for handling imbalanced data. All algorithms were used both with and without the Synthetic Minority Oversampling Technique (SMOTE). SMOTE was applied to alter the number of instances such that the number of instances in each class became more balanced. To achieve this, SMOTE combines the features of existing instances with the features of their nearest neighbors to create additional synthetic instances. The number of nearest neighbors (k) was left at its default (=5) (Wang et al. (2015), Comput Methods Programs Biomed 119 (2): 63-76; Alghamdi et al. (2017), PLoS One 12 (7): e0179805). Data were split into training and testing sets (70% and 30%, respectively). Due to the small number of available patients, we performed repeated k-fold cross validation (RkCV) to avoid overfitting. K-fold cross validation splits the training data into k sets of equal size, and in turn uses k−1 sets for training and the remaining set for prediction. RkCV repeats this k-fold cross validation several times, each time with a new random split of the training data into k sets. As implementation, we used the R package caret with its trainControl settings defined to perform “repeatedcv”, 10 repeats, 10 folds and the smote sampling parameter to resolve class imbalances resulting from the splitting. For training, we used a grid search to tune the hyper-parameters of the classification algorithm while performing cross validation. For each of the evaluated algorithms (SVMLinear, SVMRadial, RF, C5.0 and two neural networks (avNNet and pcaNNet)), the model was used to predict the classification on the testing set and the following performance measurements were recorded: accuracy, sensitivity, specificity, positive and negative predictive values. To ensure that results were robust, the whole process was repeated 5 times for random training/test partitions of the initial data.
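    The resampling scheme described above can be sketched in base R as follows. This is a minimal illustration and not the caret implementation: the helper `repeated_kfold` is hypothetical, the sample count of 49 is borrowed from the training output in Example 6, and the seed 3233 is the one used in the listing.

```r
# Hypothetical helper: assign each of n samples to one of k folds, and do so
# `repeats` times with a fresh random assignment each time.
repeated_kfold <- function(n, k = 10, repeats = 3, seed = 3233) {
  set.seed(seed)
  lapply(seq_len(repeats), function(r) {
    # Shuffle the fold labels so that every repeat uses a different split.
    fold_id <- sample(rep(seq_len(k), length.out = n))
    # For each fold, return the indices that are held out for prediction.
    lapply(seq_len(k), function(f) which(fold_id == f))
  })
}

splits <- repeated_kfold(n = 49, k = 10, repeats = 3)

held_out <- splits[[1]][[1]]                  # repeat 1, fold 1: prediction set
train_idx <- setdiff(seq_len(49), held_out)   # remaining k-1 folds: training set
```

    Within one repeat every sample is held out exactly once, so the performance estimate of a repeat uses all training samples while never predicting a sample that the model was fitted on.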

    [0178] To evaluate the effectiveness we compared accuracy results of the algorithms. Additionally we reported sensitivity, specificity, positive predictive value, negative predictive value, recall, F1 (F-measure), prevalence, detection prevalence, detection rate and balanced accuracy. Accuracy was calculated as follows:


    Accuracy=(TP+TN)/(TP+TN+FP+FN)  (1)

    where TP denotes true positives, TN denotes true negatives, FP denotes false positives, and FN denotes false negatives. To define the diagnostic sensitivity and specificity the following equations were used. For CLL: Sensitivity [%]=100×(number of correctly classified high risk patients)/(total number of high risk patients), with high risk defined as progress within 2 years after CIT. Specificity [%]=100×(number of correctly classified low risk patients)/(total number of low risk patients), with low risk defined as no progress event within 2 years after CIT. For CRC: Sensitivity [%]=100×(number of correctly classified high risk patients)/(total number of high risk patients), with high risk defined as recurrence within 2 years after surgery. Specificity [%]=100×(number of correctly classified low risk patients)/(total number of low risk patients), with low risk defined as no recurrence within 2 years after surgery.
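    For illustration, the performance measures defined above can be computed from the four confusion matrix counts as in the following R sketch. The function name `classification_metrics` is hypothetical and the counts are made up; they are not study data.

```r
# Hypothetical helper: performance measures from confusion-matrix counts,
# with "positive" meaning high risk.
classification_metrics <- function(tp, tn, fp, fn) {
  c(accuracy    = (tp + tn) / (tp + tn + fp + fn),
    sensitivity = tp / (tp + fn),   # share of high risk patients found
    specificity = tn / (tn + fp),   # share of low risk patients found
    ppv         = tp / (tp + fp),   # positive predictive value
    npv         = tn / (tn + fn))   # negative predictive value
}

# Made-up counts, for illustration only (not study data).
m <- classification_metrics(tp = 8, tn = 15, fp = 5, fn = 2)
print(round(100 * m, 1))
```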

    Assay Hardware Part: Sample Preparation and Analysis

    [0179] Sample preparation and analysis were performed with EnFin-CLL-Test™ kits (CE IVD, #6102 ENF, EnFin® GmbH, Germany) and EnFin-CRC-Test™ kits (RUO, #980010-6101 ENF) for CLL and CRC, respectively, according to the instructions of the manufacturer. Briefly, two 3 cm petri dishes per patient were filled with 3 ml RPMI 1640 (Life Technologies, Paisley, UK) and 1*10.sup.7 cells (CLL) or fragmented (25 mg) primary tumor tissue (CRC). For both CLL and CRC, one dish was wrapped with an oxygen impermeable shell (GasPak™ EZ, Becton Dickinson, New Jersey, USA) to generate anoxic (=anaerobic) conditions. After incubation (for CLL samples: 16-24 h, for cryo CRC samples: 5 h) at 37° C. and 5% CO.sub.2 (anoxic (Ax) and normoxic (Nx) sample), cells were washed and resuspended in 500 μl of the provided buffer solution (FIG. 4). Enzymes were extracted by ultrasound (Diagenode Bioruptor® Sonication System, Diagenode, Seraing, Belgium), and 1 μg protein per well was loaded following a fixed pipetting protocol (FIG. 5), ensuring correct data import later by the software. Activities of PK-la (low affinity), PK-ha (high affinity) and LDH were monitored as decrease of NADH at 340 nm for 30 min at 37° C. in a microplate reader (VICTOR X 2030, Perkin Elmer, Waltham, USA), run in duplicates, comprising the whole range of enzyme activity, i.e. linear and non-linear kinetics. Positive and negative controls were within the recommended ranges.

    Statistical Analyses

    [0180] Statistical analyses were performed using the statistical software R 3.5.1 (www.R-project.org). Endpoints were defined according to iwCLL criteria or ESMO (European Society for Medical Oncology) guidelines. Patients' characteristics in the two groups defined by clinical high risk versus low risk were compared by Fisher's exact test for categorical parameters and by t-tests or Mann-Whitney tests for metric parameters. Accuracy estimates and 95% confidence intervals (CI) were calculated; a lower CI limit above 50% was interpreted as superiority over chance.
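    The superiority criterion can be illustrated with a short R sketch. The text does not state which interval method was used; the exact binomial (Clopper-Pearson) interval from `binom.test` in base R is assumed here as one common choice, and the counts are made up for illustration.

```r
# Made-up example: 27 of 30 test samples classified correctly (not study data).
n_correct <- 27
n_total <- 30

# Exact binomial 95% confidence interval for the accuracy (an assumed method).
ci <- binom.test(n_correct, n_total, conf.level = 0.95)$conf.int
accuracy_pct <- 100 * n_correct / n_total
ci_pct <- 100 * as.numeric(ci)

# Superiority over chance in the sense used above: lower CI limit above 50%.
superior_over_chance <- ci_pct[1] > 50
print(superior_over_chance)
```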

    7.2 Results

    [0181] Overview: CLL patients harboring anaerobic cells in their samples relapsed very early after chemo-immunotherapy (CIT). However, the recommended clinical markers in CLL, TP53 and IGHV mutation analysis, failed to predict response to CIT. In CRC, both high CEA serum levels (above 2.5 ng/ml) and the presence of anaerobic cells predicted early recurrence after surgery with curative intent. Machine learning outperformed all markers in both cancers. The best results showed an accuracy of 99% (95% CI 98-100) for CLL and an accuracy of 91% (95% CI 88-93) for colorectal cancer.

    Specific Description: CLL and CRC Test Cohorts

    [0182] 96 chronic lymphocytic leukemia patients were prospectively enrolled in our trial; of these, 74 were eligible to be analyzed with the EnFin-CLL™ test kits. With the kits, 27 cases were classified as High Risk (anaerobic growth) and 47 as Low Risk (aerobic growth). Both risk groups showed similar clinical parameters including cytogenetic abnormalities, TP53 mutation and lymphocyte doubling time. Clinical characteristics are shown in Table 3.

    TABLE-US-00006 TABLE 3 Patient characteristics CLL.
    Parameter | Kit HR (n = 27) | Kit LR (n = 47) | p-value
    Kit, median (range) | 1.59 (1.24-3.28) | 1.91 (1.70-2.26) | 0.071
    Age at diagnosis, median (range) [years] | 63.9 (40-81.9) | 63.9 (31.8-83.8) | 0.63
    Age ≥65 years, n (%) | 13 (48) | 22 (48) | 1
    Age ≥75 years, n (%) | 4 (15) | 4 (9) | 0.457
    Gender female/male | 12/15 | 18/29 | 0.631
    WBC median (range) [/nl] | 57290 (20570, 247200) | 61640 (17200, 234100) | 0.823
    PB lymphocytes, median (range) [%] | 94 (82, 99) | 90 (67, 100) | 0.089
    BRAF mut n (%) | 0 (0) | 3 (8) | 0.281
    MYD88 mut n (%) | 0 (0) | 1 (3) | 1
    NOTCH1 mut n (%) | 3 (13) | 3 (8) | 0.666
    SF3B1 mut n (%) | 4 (17) | 3 (8) | 0.408
    del11q22-23 n (%) | 2 (8) | 8 (18) | 0.303
    trisomy 12 n (%) | 1 (4) | 12 (27) | 0.023
    del13q14 n (%) | 20 (77) | 26 (58) | 1
    del17p13 and/or TP53mut | 6 (25) | 8 (22) | 0.765

    [0183] Patients were classified as clinical high risk (CLL-HR) when having progressive disease within two years after beginning of CIT, as low risk (CLL-LR) when having no progressive disease within two years after beginning of CIT. CIT included bendamustine in combination with rituximab (BR), cyclophosphamide/doxorubicine/vincristine/prednisolone in combination with rituximab (R-CHOP), chlorambucil in combination with rituximab (R-CBL) or obinutuzumab (G-CBL) and fludarabine/cyclophosphamide in combination with ofatumumab (O-FC). 22 patients were treated with CIT. This dataset was used for machine learning. The ratio of CLL-HR to CLL-LR in the dataset was balanced with 1:1.2.

    [0184] Out of the 101 colorectal cancer patients from the DACHS study 22 were classified as high risk (recurrent within two years after surgery). This dataset was used for machine learning. The ratio of CRC-HR to CRC-LR in the dataset was unbalanced with 1:4.6. Clinical characteristics are shown in Table 4.

    TABLE-US-00007 TABLE 4 Patient characteristics CRC.
    Parameter | HR (n = 22) | LR (n = 79) | p-value
    Kit, mean, sd (range) | 1.78, 0.67 (0.09-3.16) | 2.15, 0.80 (0.08-5.08) | 0.049
    Age at diagnosis, mean, sd (range) [years] | 65.6, 10 (46-90) | 62.5, 11.1 (33-81) | 0.243
    Age ≥65 years, n (%) | 12 (55) | 38 (48) | 0.636
    Age ≥75 years, n (%) | 3 (14) | 8 (10) | 0.701
    Gender female | 9 (%) | 25 (%) | 0.451
    CEA, median (range) [/nl] | 4.2 (0.8-390) | 1.8 (0.2-1334.7) | 0.04
    Adj. Chemotherapy n (%) | 18 (82) | 38 (49) | 0.007
    CA19-9, median (range) | 22.5 (2.6-95905.2) | 11.1 (0.5-2432.8) | 0.032
    BRAF mut n (%) | 2/11 (18)* | 0/27 (0)* | 0.49
    Kras mut n (%) | 6/11 (55)* | 5/27 (19)* | 0.061
    MSI n (%) | 0/10 (0)* | 2/22 (9)* | 1
    Localization rectum/colon (%) | 71/29 | 60/40 | 0.324
    Margin R0/R1 (%) | 67/33 | 92/8 | 0.002
    UICC stage 1/2/3/4 (%) | 6/13/25/56 | 14/42/38/6 | <0.0005
    Radiotherapy (%).sup.† | 43 | 35 | 0.138

    [0185] Notably, both risk groups consisted of colon and rectal cancer patients with a predominance of rectal cancer (71% rectal cancer in the high risk group, 60% in the low risk group). According to treatment standards, the majority of rectal cancer patients were subjected to radiotherapy in both groups. Both risk groups showed similar mutational characteristics regarding Ras, BRAF, and MSI. Importantly, 82% of the patients in the high risk group received adjuvant chemotherapy compared to 49% in the low risk group (Table 4).

    Data Preprocessing and Feature Space

    [0186] For correct labeling as high risk (=“1”) or low risk (=“0”) for the supervised training of the machine learning algorithms, unsorted raw data from the microplate reader were sorted by patient as shown in FIG. 6. The code is disclosed in the GitHub repository www.github.com/enfinlab. Using the whole range of enzyme activity (substrate (S) to enzyme (E) ratio: S<<E, S≈E and S>>E), i.e. including enzymatic activity within the non-linear range, under two conditions (with oxygen and without oxygen as described above), we vectorized the time series by piling up all single time series into a single vector (FIG. 6). In this simplest approach, the feature space for each binary classification problem (CLL and CRC) consisted of 168 features. Thus every time point of every well, including positive and negative controls (noise) and both conditions (aerobic/anaerobic), is represented as a single feature.
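    The vectorization step can be sketched in R as follows. This is an illustration under stated assumptions rather than the disclosed code: the patient table is filled with random placeholder values, the object names are hypothetical, and the layout of 24 wells by 7 time points is inferred from the SortData listing, which is consistent with 24 × 7 = 168 features.

```r
# Hypothetical stand-in for one patient's table: 24 wells x 7 time points.
wells <- paste0(rep(LETTERS[1:8], times = 3), rep(1:3, each = 8))
time_labels <- paste(seq(0, 30, by = 5), "Min")
set.seed(1)
patient_table <- matrix(runif(24 * 7, min = 0.1, max = 1.4),  # placeholder values
                        nrow = 24, dimnames = list(wells, time_labels))

# Pile up all time series into one vector, time point by time point, so that
# every well/time-point combination becomes one feature.
feature_vector <- as.vector(patient_table)
names(feature_vector) <- as.vector(outer(wells, time_labels, paste, sep = "_"))

length(feature_vector)  # 24 wells * 7 time points = 168 features
```

    One such vector per patient, stacked row-wise and combined with the class label, gives a data frame of the shape expected by the training functions.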

    AI Classifiers Outperform the Gold-Standard Biomarkers TP53, IGHV and CEA and the EnFin® Assay

    [0187] CLL patients with TP53 aberrations and unmutated IGHV status respond poorly to CIT. A very simple feed-forward neural network (pcaNNet) with one neuron in the hidden layer outperformed these standard clinical markers (TP53 mutation analysis and IGHV mutation analysis) in the original CLL dataset by far (no significant results for TP53 and IGHV, Table 3). PcaNNet was as accurate as the kit assay (detecting anaerobic leukemia cells in the patients' PBMC samples) in finding CLL-HR cases; however, it slightly outperformed the kit regarding positive/negative predictive values, which are particularly important for clinical decision making (pcaNNet: PPV=90%-98%, NPV=64%-100%; EnFin-CLL-Test™ kit: PPV=100%, NPV=50%, Tables 5, 6). In addition, synthetically oversampled (SMOTEd) datasets with the same data distribution as in the original dataset were created to test algorithm performance on a larger sample size. In the synthetically oversampled CLL dataset all algorithms showed near perfect classification results (mean accuracy of approximately 99%, Table 5).

    TABLE-US-00008 TABLE 5 Machine learning performance, CLL dataset (n = 22): first value: without SMOTE, second value: SMOTE within 10 × 3 cross-validation resampling procedure, third value: SMOTEd CLL dataset. All machine learning results are mean values from 5 runs with unseen test datasets (hold-out sets).
    Dataset CLL
    Algorithm | Algorithm family | Accuracy (95% CI) | SE | SP | PPV | NPV
    SVMLinear | Support Vector Machine | 53 (48-58); 60* (55-65); 99* (98-100) | 47; 53; 100 | 60; 67; 98 | 52; 83; 98 | 60; 57; 100
    SVM Radial | Support Vector Machine | 50 (45-55); 63* (58-68); 99* (98-100) | 20; 67; 100 | 80; 60; 98 | NaN; NaN; 98 | 50; 70; 100
    RF | Bagging | 53 (46-58); 63* (58-68); 99* (98-100) | 47; 73; 100 | 60; 53; 98 | 52; 52; 98 | 60; 81; 100
    C5.0 | Boosting | 73* (69-78); 63* (58-68); 97* (96-99) | 73; 73; 96 | 73; 53; 98 | 75; 63; 98 | 82; NaN; 97
    avNNet | Neural Network | 57* (52-62); 67* (62-71); 96* (94-98) | 47; 53; 96 | 67; 80; 97 | 48; 80; 96 | 61; 62; 97
    pcaNNet | Neural Network | 70* (85-75); 67* (62-72); 99* (98-100) | 47; 53; 100 | 93; 80; 98 | 90; 83; 98 | 64; 62; 100
    Standard marker | | Accuracy | SE | SP | PPV | NPV
    TP53 | | 25 | 18 | 100 | 100 | 10
    IGHV | | 71 | 80 | 50 | 80 | 50
    Kit | | 69 | 55 | 100 | 100 | 50
    P-values: IGHV, p = 0.52; del17p, p = 0.64; Kit-CLL, p = 0.037; CEA, p = 0.027; Kit-CRC, p = 0.049. SE = Sensitivity; SP = Specificity; PPV = Positive Predictive Value; NPV = Negative Predictive Value; *superior over chance.

    TABLE-US-00009 TABLE 6 Machine learning performance, CRC dataset (n = 101): first value: SMOTE within 10 × 10 cross-validation resampling procedure, second value: SMOTEd CRC dataset. All machine learning results are mean values from 5 runs with unseen test datasets (hold-out sets).
    Dataset CRC
    Algorithm | Algorithm family | Accuracy (95% CI) | SE | SP | PPV | NPV
    SVMLinear | Support Vector Machine | 60* (56-64); 90* (88-93) | 23; 100 | 70; 98 | 14; 98 | 78; 100
    SVM Radial | Support Vector Machine | 76* (72-79); 89* (87-92) | 10; 85 | 93; 92 | NaN; 90 | 80; 100
    RF | Bagging | 67* (63-71); 87* (84-90) | 27; 89 | 77; 85 | 25; 82 | 80; 92
    C5.0 | Boosting | 65* (61-69); 87* (84-90) | 23; 85 | 76; 98 | 18; 84 | 79; 89
    avNNet | Neural Network | 62* (58-66); 90* (89-94) | 33; 96 | 70; 85 | 23; 83 | 80; 97
    pcaNNet | Neural Network | 68* (64-72); 91* (88-93) | 37; 97 | 77; 86 | 32; 84 | 82; 100
    Standard marker | | Accuracy | SE | SP | PPV | NPV
    CEA | | 66 | 61 | 50 | 22 | 85
    Kit | | 69 | 18 | 84 | 24 | 79
    P-values: IGHV, p = 0.52; del17p, p = 0.64; Kit-CLL, p = 0.037; CEA, p = 0.027; Kit-CRC, p = 0.049. SE = Sensitivity; SP = Specificity; PPV = Positive Predictive Value; NPV = Negative Predictive Value; *superior over chance.

    [0188] It is well known that CEA is very inconsistent in detecting recurrence in CRC. Nevertheless, no alternative biomarker exists that is both approved and more reliable; thus CEA in combination with other surveillance measures is still the gold standard in clinics for recurrence detection. However, it is not used for predicting disease-free survival (DFS). Predicting DFS in CRC with a test as early as possible, e.g. at the time point of surgery, would be very beneficial for follow-up and treatment of patients. In particular, it is important to detect patients who relapse within two years after curative surgery (resection of the primary tumor plus metastasis if present), as several studies have shown that their tumors are very aggressive. These patients could benefit from aggressive management integrating chemotherapy/radiation plus alternative targeted drugs and intensive follow-up. In this study we compared the classification performance of CEA, the EnFin-CRC-Test™ and AI algorithms (trained with the raw data from the EnFin-CRC-Test™) in predicting DFS after surgery with curative intent. The EnFin-CRC-Test™, which detects anaerobic cells in the CRC tissue, and preoperatively elevated CEA serum levels (threshold 2.5 ng/ml) showed similar performance, with accuracies of 69% and 66%, PPVs of 24% and 22% and NPVs of 79% and 85%, respectively (Table 6). Neural networks (pcaNNet) had a comparable accuracy (68%, Table 6); however, they showed better predictive value performance (PPV=32%; NPV=84%, Table 6), which is critical for decision making in clinical routine. In the synthetically oversampled CRC dataset all algorithms showed excellent classification results (mean accuracies ranging from 87% to 91%, Table 6). Here pcaNNet reached the best results (91% accuracy, Table 6).

    [0189] Taken together, AI algorithms were superior to clinically approved or recommended/experimental biomarkers. Very simple neural networks (pcaNNet) with default settings had the best overall performance. They showed both superior performance on unseen real-life clinical data and near perfect classification results on unseen synthetically oversampled datasets (derived from the real-life datasets used in this study). Despite the small sample size and the use of default settings for the ML algorithms, we have shown that it is possible to learn from enzyme kinetic data and outperform extensively validated clinical markers that have been used for a long time for clinical decision making.

    EXAMPLE 8: MODULATION OF ENZYME ACTIVITY BY EXTRACTS FROM FORMALIN-FIXED AND EMBEDDED CELLS (“FFPE PROTEOME”)

    [0190] 20 FFPE tissue sections (10 μm) of 10 colon carcinomas (2 sections per patient) with different clinical courses (UICC stages I-IV, I: superficial infiltration, II: infiltration of the muscle layer of the colon, III: infiltration of regional lymph nodes, IV: distant metastases) were used. Approximately 500 μg of proteome were extracted, 4 μg thereof were added to the reaction volume. In particular PKLA shows differences in its activities. Overall there is a total of 168 data points (feature space) per tumor. From the complex relationships of the data points, depending on the control activities present at each measurement point (activities with extraction buffer or without extraction buffer (and without proteome, respectively)), in particular machine learning can search for patterns that correlate with the disease progress/individual treatment.

    NON-STANDARD LITERATURE CITED

    [0191] Alghamdi et al. (2017), PLoS One 12 (7): e0179805
    [0192] Alves-Filho et al. (2016), Front. Immunol.; doi: 10.3389/fimmu.2016.00145
    [0193] Amoedo et al. (2013), Biosci. Rep. 33:e00080
    [0194] Arts et al. (2017), J. Leukoc. Biol. 101:151
    [0195] Cascone et al. (2018), Cell Metabolism 27:977
    [0196] Cheng et al. (2016), Nat Immunol. 17 (4):406
    [0197] Delano et al. (2016), JCI 26:23
    [0198] EP2821790 A1
    [0199] Gatenby et al. (2004), Nature Reviews Cancer 4:891
    [0200] Gdynia et al. (2018), EBioMedicine 32:125
    [0201] Irving et al. (2017), Front. Immunol.; doi: 10.3389/fimmu.2017.00267
    [0202] Kojima et al. (2017), Nature Chemical Biology; doi: 10.1038/nchembio.2498
    [0203] Lee et al. (2018), PNAS 115 (19):E4463
    [0204] Odegaard et al. (2013), Immunity 38:644
    [0205] Palsson-McDermott et al. (2015), Cell Metabolism 21:65
    [0206] Pearce et al. (2013), Immunity 38 (4):633
    [0207] Roybal et al. (2017), Annu. Rev. Immunol. 35:229
    [0208] Venereau et al. (2012), J. Exp. Med. 209 (9):1519
    [0209] Wang et al. (2015), Comput Methods Programs Biomed 119 (2):63-76
    [0210] Wasmuth et al. (2005), Journal of Hepatology 42:195
    [0211] Wein et al. (2010), Stem Cell Res 4:129
    [0212] WO2017/098051 A2
    [0213] WO 2018/108327