METHOD TO PREDICT PATHOLOGICAL GRADE AND TO IDENTIFY DRUG TARGETS AGAINST GLIOMA TUMOR
20210230705 · 2021-07-29
Inventors
Cpc classification
G16B25/00
PHYSICS
G16H50/30
PHYSICS
International classification
Abstract
Systems and methods for developing predictive models are provided that consider the mechanistic regulations of various bimolecular events and measure the expression of various genes, proteins, and metabolites. The predictive models are utilized in a computer-implemented method to classify the patient derived glioma tumor samples in different pathological grades and successively predict the patient specific combination of drug targets by analyzing the intra-tumor heterogeneity.
Claims
1. A method for determining the risk of tumor development grade of Glioblastoma (GBM or Low Grade Glioblastoma) from the general GBM developmental model, the method comprising; (i) collecting tumor samples from patients diagnosed with glioblastoma cancer followed by extracting and purifying whole cell mRNA form the samples; (ii) performing high-throughput mRNA sequencing process requiring purified and enriched mRNA library for generating the pair-end short reads of the sequences, which is further checked by measuring the library size and base quality and shared in a computer readable text file; (iii) processing RNA-Seq files of step (ii) for quality checking of the sequences, aligning of the sequence reads to the human reference genome and quantifying the transcripts/genes; (iv) performing differential gene expression analyses to determine the differential mRNA expression profile of the collected tumor specimen with respect to normal cells; and (v) producing a predictive computing module based on the tumor specimen, wherein the predictive computing module utilizes the mRNA expression profile of step (iv) comprising select input proteins, performs mechanistic simulations to compute the frequencies of tumorigenic [LGG (Low Grade Glioblastoma) and GBM] and non-tumorigenic cells, followed by computing a novel quantitative (i.e., phenotype predictor) score of each sub-types of glioma tumor cells to predict the chances of occurrence of glioma tumor of respective grades.
2. The method of claim 1, wherein the method is based on an activity ratio (AR).
3. The method of claim 1, wherein the predictive computing module comprising selecting marker proteins involved in cellular states of phenotypic functions of normal neurogenesis and glioblastoma tumorigenesis in adults is selected from the group consisting of apoptosis, aNSC (Neural Stem Cell) renewal, NPC (Neuron Progenitor Cell) differentiation, ASPC (Astrocyte Progenitor Cell) differentiation, and GBM (Glioblastoma) development, the method comprising: (i) defining cellular states, viz. quiescent (or inactive) state, NSC (neural stem cell) renewal, GSC (Glioma Stem Cell) renewal, ASPC (Astrocytes progenitor cell), low and high grade glioblastoma tumor (LGG-I, LGG-II, Grade-IV) states based on activation of phenotypic function and co-expression of multiple marker proteins; (ii) simultaneously identifying (a) proteins in the network model that drive the cellular states by measuring the novel activity ratio (AR) scores thereby identifying driver proteins causing tumorigenesis in the subject and classifying the tumors into different grades (low and high); and (b) a risk of a subject developing glioblastoma (low or high) by measuring the novel phenotype predictor score of the cellular states; (iii) identifying intra-tumor heterogeneities by assessing the variations of protein expressions with high or low grades of glioblastoma tumor cells and the molecular sub-clones of the tumor cells; and (iv) screening and identifying a combination of cell signaling proteins as potential drug targets by a novel perturbation analysis and determining the chance of tumor relapse or cell death.
4. The method of claim 3, wherein the predictive computing module comprises: (i) defining a plurality of input, intermediate, and output molecules (e.g., proteins, metabolites) and their intra-cellular biochemical reactions involved in the process of normal neurogenesis and glioblastoma tumorigenesis in adult human brain; (ii) selecting marker proteins (or end products of the intra-cellular biochemical reactions) responsible for common phenotypic outcomes observed during normal neurogenesis; (iii) writing logical equations for defining the functional and dynamic relationships between the molecules and their connections with the phenotypic expressions; (iv) defining a cellular state “Quiescent (inactive)” wherein no phenotypes are observed; (v) randomly assigning the binary initial states (either 1 or 0) to the input proteins and creating 1 million random expression vectors of the input proteins; (vi) dividing the 1 million binary expression vectors of the input proteins in 100 separated batches, in which each batch contains 10,000 random expression vectors; (vii) starting the simulations for each batch (10,000 initial states) and recording the phenotypic output of each initial state; (viii) iterating the process over all 100 batches and simultaneously recording the distributions of the phenotypic outcomes (or cellular states) of each batch; (ix) computing the Shannon entropy of the observed phenotypic or cellular outcomes of each simulation batch and selecting the batch with highest entropy value for further analyses to compute the activity ratio (AR) score and frequency distributions of the observed cellular states; (x) checking the cellular state or phenotypes viz. quiescent state, apoptosis, NSC renewal, NPC differentiation, ASPC differentiation, and GSC renewal have arrived in the distribution or not; (xi) discarding the results if any of the phenotypes of (x) are missing in the simulation batch and repeating the steps from (iii) to (x) until observing the appropriate distributions of the desired phenotypes or cellular states mentioned in (x) of adult neurogenesis process; (xii) selecting the simulation batch with highest entropy if the simulation successfully reproduce the normal and aNSC's developmental process by expressing the phenotypes mentioned in (x) and subsequently compute the activity ratio (−1.44≤AR≤+1.44) of each input protein for individual phenotype or cellular state; (xiii) validating the activity ratio (AR) score of each protein (i.e., positive AR score for up-regulated/active states=+1.44; negative AR score for down-regulated/inactive states=−1.44) representing each phenotype with the experimental data, such as immunohistochemistry, western blot, mass spectrometry, or transcriptomics; and (xiv) if a protein shows wrong activity in any of the phenotype, repeating the simulation from step (iii) to (xiii), or otherwise proceeding to the next steps: (a) declaring the proteins with positive (and maximum=+1.44) and negative (and minimum=−1.44) activity ratio (AR) score observed in a particular phenotype as the driver proteins for the development of that phenotype, for example, the positive and maximum AR score of P53 is the driver of the phenotype apoptosis; (b) naming the simulation strategy from (i) to (xiv) as the model of aNSC development simulation, in which the normal neurogenesis and gliogenesis including natural cell death (apoptosis) processes are successfully simulated and validated with the experimental observations; (c) fitting the distribution of the phenotype prediction scores of each cellular state observed in 100 simulation batches with different distribution functions and calculating the confidence interval of the phenotype predictor score of a cellular state from its corresponding distribution function; (d) finding the cellular state Glioblastoma stem cells (GSC) renewal with input protein P53 at inactive state (i.e., mutation or negative AR score) in the simulation outputs and thus proving the inactivation of tumor suppressor protein P53 as the driver of GSC development from aNSCs in human brain; and (e) removing P53 protein from the input protein list made in step (v) and considering the P53 protein as constitutively down-regulated (binary: 0 or False) state in the logical equation model developed in step (iii) to create another new model.
5. The method of claim 1, wherein the cellular states are selected from the group consisting of quiescent (or inactive) state, NSC (neural stem cell) renewal, GSC (Glioma Stem Cell) renewal, ASPC (Astrocytes progenitor cell), and low and high grade glioblastoma tumor (LGG-I, LGG-II, Grade-IV) states based on activation of phenotypic function and co-expression of multiple marker proteins.
6. The method of claim 4 in a model with P53 mutation comprising: (i) assessing the observed cellular states and their frequency distributions across 100 simulation batches and identifying the simulation batch with highest Shannon entropy as mentioned in (ix); (ii) selecting the simulation batch with highest Shannon entropy for activity ration calculation and assessing the distribution of the frequencies of cellular states; (iii) checking whether the cellular state “apoptosis” is present in the frequency distribution or not. If yes, then modify the logical equations as mentioned at step (iii) and repeat the steps from (iii) to (xvi), otherwise proceed to the next step; (iv) checking whether the cellular state “GSC renewal” and “ASPC differentiation” are present or not. If not, then modify the logical equations as mentioned at step (iii) and repeat the steps from (iii) to (xvii), otherwise proceed to the next step; (v) calculating the AR score of each input protein driving the cellular state “ASPC differentiation” and extracting those proteins which have maximum & positive (+1.44) and minimum & negative (−1.44) AR scores in the cellular state of “ASPC differentiation”; (vi) repeating the simulation from step (iii) to (xix), if a protein shows wrong activity in any of the phenotype, otherwise proceed to the next step: (a) declaring the proteins with positive (and maximum=+1.44) and negative (and minimum=−1.44) activity ratio (AR) score observed in a particular phenotype as the driver proteins for the development of that phenotype, for example, JAK2 and STAT3 proteins showing positive and maximum AR score are the driver of “ASPC differentiation” process and P53 protein having negative AR score=−1.44 is the driver of GSC renewal; (b) renaming the simulation strategy from (xv)-(xix) as the model of “glioblastoma stem cells (GSC) development” model, in which the GSCs show continuous renewal process with no marker sign of apoptosis. GSCs also develop into matured but mutated (P53) astrocytes cells which clearly indicates the development of astrocytoma or glioma; (c) fitting the distribution of the phenotype prediction scores of each cellular state observed in 100 simulation batches with different distribution functions and calculating the confidence interval of the phenotype predictor score of a cellular state from its corresponding distribution function; (d) finding the cellular state Astrocytes progenitor cells (ASPC) differentiation and the AR scores of each protein which have shown either the value of +1.44 or −1.44; and (e) removing the proteins with AR score either +1.44 or −1.44 from the input protein list made in step (v) and considering the P53 protein as constitutively down-regulated (binary: 0 or False) state in the logical equation model developed in step (iii) to create another new model.
7. The method of claim 1, wherein after determining the risk of tumor development by LGG or GBM from the general GBM developmental model by using the patient's mRNA sequencing data, the method of target screening and identifying the potential protein for personalized target-based therapy comprises: (i) extracting the temporal activity profiles of intermediate molecules of the network model from the STG (a State Transition Graph) generated after simulation started from the initial states of the molecules to the cellular state representing high-grade (Grade-IV) glioblastoma cellular state; (ii) extracting the activity profile of Grade-IV glioblastoma cellular state in STG; (iii) identifying the delays and correlations of activity profiles of each intermediate molecule with the temporal activity profile observed for high-grade (Grade-IV) glioblastoma cellular state by Fast Fourier Transformation (FFT) analyses; (iv) extracting the molecules having absolute correlation value ≥0.75 with P value <0.05, and Delay ≤3 with the activity pattern of high-grade (Grade-IV) glioblastoma cellular state, and selecting for perturbation analyses; (v) perturbing the molecules by constitutively up-regulating molecules which have negative correlation with high-grade (Grade-IV) glioblastoma cellular state in the new perturbation analysis; (vi) perturbing the molecules by constitutively down-regulating the molecules which have negative correlation with high-grade (Grade-IV) glioblastoma cellular state in the new perturbation analysis; (vii) assessing the changes of frequencies of the cellular states representing the low-grade and high-grade glioblastoma cellular states in the perturbation studies with the previous simulation. (viii) placing the perturbed molecules at the top of the list on the basis of its ability to maximum suppression of low-grade and high-grade glioblastoma cellular states in the perturbation study, wherein the molecules held rank at the top of the list are considered as potential targets for target-based, personalized glioblastoma therapy.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0032]
[0033]
[0034]
[0035]
[0036]
[0037]
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
[0044]
DETAILED DESCRIPTION
Definitions
[0045] Normalized frequency of cellular state: The normalized frequency of a cellular state in a particular simulation batch is the ratio of its observed frequency to a total number of random initial conditions.
[0046] Shannon entropy: Shannon entropy is used in this application to measure the total information content of a simulation batch by measuring the summation of the negative logarithms of the probability mass functions of all the observed cellular states.
[0047] Activity Ratio (AR): Activity Ratio score of any input protein/node with respect to a particular cellular state of the logical model simulation is measured by calculating the total number of times the node is active or inactive within all the random input sequences, which trigger that particular cellular state in the attractor space.
[0048] Predictor Score: The predictor score of a particular cellular state observed in the simulation defines as the ratio of the total number of observed occurrences to the total phenotypic cost required for reaching the cellular state in the attractor space.
[0049] Phenotypic cost function is a function calculated by measuring the ratio of the average Hamming distance traversed by all the molecules while reaching at the particular phenotype/cellular state to the normalized frequency of the phenotypes.
[0050] Hamming distance between two successive nodes (or cellular states) in a State Transition Graph (STG) is the temporal changes or evolution of the expression dynamics of the pathway molecules in successive time points.
[0051] In the fixed-point (singleton attractor) state, the binary expressions of the cellular state and the pathway molecules will show homeostatic behavior, whereas in the periodic state it will be oscillatory in nature.
[0052] Particular embodiments will now be described in detail in connection with certain preferred and optional embodiments, so that various aspects thereof may be more fully understood and appreciated.
[0053] The embodiments disclosed herein narrate the stepwise executions of different computational modules-starting from the transcriptomic data collection from glioblastoma patients to tumor grade prediction, driver gene or biomarker identification, and personalized drug-targets identification.
[0054] Embodiments herein disclose a computer implemented process for identifying proteins/genes involved in glioblastoma tumorigenesis, calculating the potential risk of glioblastoma tumor development, classifying the grades of tumor cells, detecting patient specific intra-tumor heterogeneity and tumor sub-clones, and screening personalized potential drug-targets in a subject using mechanistic simulation of a network model.
[0055] Accordingly,
[0056] In a preferred embodiment the illustrations of
[0057] (i) collecting tumor samples from patients diagnosed with glioblastoma cancer (101) followed by extracting and purifying whole cell mRNA form the samples (102);
[0058] (ii) performing high-throughput mRNA sequencing process (103) requiring purified and enriched mRNA library for generating the pair-end short reads of the sequences, which is further checked by measuring the library size and base quality and shared in a computer readable text file;
[0059] (iii) processing RNA-Seq files (104) of step (ii) for quality checking of the sequences, aligning of the sequence reads to the human reference genome and quantifying the transcripts/genes;
[0060] (iv) performing differential gene expression analyses (105) to determine the differential mRNA expression profile of the collected tumor specimen with respect to normal cells (control);
[0061] (v) producing a predictive computing module (106) based on the tumor specimen, wherein the predictive computing module utilizes the mRNA expression profile comprising select input proteins, performs mechanistic simulations to compute the frequencies of tumorigenic (i.e., LGG and GBM) and non-tumorigenic cells, followed by computing a novel quantitative (i.e., phenotype predictor) score of each sub-types of glioma tumor cells to predict the chances of occurrence of glioma tumor of respective grades.
[0062] Accordingly, the tumor sample collection process (101) requires to be performed by an expert surgeon for removing a full or partial segment of the tumor followed by freezing the specimen at low temperatures. The mRNA library preparation process (102) comprises extraction and purification of whole cell mRNA molecules from freshly frozen (FF) tumor specimen and is required to be performed for quantifying and enriching the mRNA library. The high-throughput mRNA sequencing process (103) requires a purified and enriched mRNA library for generating pair-end short reads of sequences, which is further checked by measuring the library size and base quality and shared in either BAM or FASTQ file.
[0063] If mRNA library preparation (102) and high-throughput sequencing (103) are not available at tumor resection and specimen collection site (101) or if necrotic and diagnosis are required to be done prior to specimen freezing, the freshly operated tumor specimen maybe fixated in formalin solution. The fixated tumor sample is then embedded in a paraffin wax block for long-term preservation. This formalin-fixed paraffin embedded (FFPE) blocks are used for mRNA extraction, library preparation followed by high-throughput sequencing. The mRNA library and sequencing maybe generated in replicates and if possible same experiments maybe repeated for generating mRNA sequence of the negative control or healthy tissue collected from the same patient. To perform process steps 101, 102, and 103, well-established and highly standardized protocols routinely used for such purposes are to be followed.
[0064] The raw RNA sequence files of all replicates and negative controls generated from the process 103 are required to be stored in a remote data storage system, which is connected with the high-throughput RNA-Seq machine via internet/intranet.
[0065] The bioinformatics pipeline (104) maybe used to process RNA-Seq files (both control and tumor specimens) for quality checking of the sequences, alignment of the sequence reads to the human reference genome (e.g., GRCh38), and quantification of the transcripts/genes. State-of-the-art bioinformatics pipelines developed for RNA-Seq reads alignment and quantification processes may be used at this step.
[0066] The differential mRNA expression profile (105) of the collected tumor specimen with respect to normal cells (control) is obtained by performing differential gene expression analyses. The count table of all transcripts obtained from each replicate of both groups (i.e., tumor specimen and control cells) shall be used for comparing and identifying the significantly expressed genes (up or down regulation) in tumor cells with respect to control samples. The list of significantly over or under expressed genes identified in the tumor sample is considered as the “transcriptomics profile” of human subject under investigation.
[0067] The predictive computing module (106) illustrated in
[0068] (i) defining a plurality of input, intermediate, and output molecules (e.g., proteins, metabolites) and their intra-cellular biochemical reactions involved in the process of normal neurogenesis and glioblastoma tumorigenesis in adult human brain,
[0069] (ii) selecting marker proteins (or end products of the intra-cellular biochemical reactions) responsible for common phenotypic outcomes observed during normal neurogenesis (viz. Apoptosis, NSC (Neural Stem Cell) renewal, NPC (Neuron Progenitor Cell) differentiation, ASPC (Astrocyte Progenitor Cell) differentiation) and tumorigenesis processes (viz. GSC (Glioblastoma stem cells) renewal, GBM (Glioblastoma) development),
[0070] (iii) writing logical equations for defining the functional and dynamic relationships between the molecules and their connections with the phenotypic expressions,
[0071] (iv) defining a cellular state “Quiescent (inactive)” in which no phenotypes are observed,
[0072] (v) randomly assigning the binary initial states (either 1 or 0) to the input proteins and creating 1 million random expression vectors of the input proteins,
[0073] (vi) dividing the 1 million binary expression vectors of the input proteins in 100 separated batches, in which each batch contains 10,000 random expression vectors,
[0074] (vii) starting the simulations for each batch (10,000 initial states) and recording the phenotypic output of each initial state,
[0075] (viii) iterating the process over all 100 batches and simultaneously recording the distributions of the phenotypic outcomes (or cellular states) of each batch,
[0076] (ix) computing the Shannon entropy of the observed phenotypic or cellular outcomes of each simulation batch and selecting the batch with highest entropy value for further analyses to compute the activity ratio (AR) score and frequency distributions of the observed cellular states,
[0077] (x) checking the cellular state or phenotypes viz. quiescent state, apoptosis, NSC renewal, NPC differentiation, ASPC differentiation, and GSC renewal have arrived in the distribution or not,
[0078] (xi) discarding the results if any of the phenotypes of (x) are missing in the simulation batch and repeating the steps from (iii) to (x) until observing the appropriate distributions of the desired phenotypes or cellular states mentioned in (x) of adult neurogenesis process,
[0079] (xii) selecting the simulation batch with highest entropy if the simulation successfully reproduce the normal and aNSC's developmental process by expressing the phenotypes mentioned in (x) and subsequently compute the activity ratio (−1.44≤AR≤+1.44) of each input protein for individual phenotype or cellular state,
[0080] (xiii) validating the activity ratio (AR) score of each protein (i.e., positive AR score for up-regulated/active states=+1.44; negative AR score for down-regulated/inactive states=−1.44) representing each phenotype with the experimental data, such as immunohistochemistry, western blot, mass spectrometry, transcriptomics, etc.,
[0081] (xiv) repeating the simulation from step (iii) to (xiii), if a protein shows wrong activity in any of the phenotype, otherwise proceed to the next step (xv),
[0082] (xiv)(a) declaring the proteins with positive (and maximum=+1.44) and negative (and minimum=−1.44) activity ratio (AR) score observed in a particular phenotype as the driver proteins for the development of that phenotype, for example, the positive and maximum AR score of P53 is the driver of the phenotype apoptosis,
[0083] (xiv)(b) naming the simulation strategy from (i) to (xiv) as the model of aNSC development simulation, in which the normal neurogenesis and gliogenesis including natural cell death (apoptosis) processes are successfully simulated and validated with the experimental observations,
[0084] (xiv)(c) fitting the distribution of the phenotype prediction scores of each cellular state observed in 100 simulation batches with different distribution functions and calculating the confidence interval of the phenotype predictor score of a cellular state from its corresponding distribution function,
[0085] (xiv)(d) finding the cellular state Glioblastoma stem cells (GSC) renewal with input protein P53 at inactive state (i.e., mutation or negative AR score) in the simulation outputs and thus proving the inactivation of tumor suppressor protein P53 as the driver of GSC development from aNSCs in human brain,
[0086] (xiv)(e) removing P53 protein from the input protein list made in step (v) and considering the P53 protein as constitutively down-regulated (binary: 0 or False) state in the logical equation model developed in step (iii) to create another new model.
[0087] The newly developed model with P53 mutation (constitutively at 0 or OFF state) was simulated by following the same strategy implemented in step (v) to (viii). The stepwise deduction methods (illustrated in
[0088] (xv) assessing the observed cellular states and their frequency distributions across 100 simulation batches and identifying the simulation batch with highest Shannon entropy as mentioned in (ix),
[0089] (xvi) selecting the simulation batch with highest Shannon entropy for activity ration calculation and assessing the distribution of the frequencies of cellular states,
[0090] (xvii) checking whether the cellular state “apoptosis” is present in the frequency distribution or not. If yes, then modify the logical equations as mentioned at step (iii) and repeat the steps from (iii) to (xvi), otherwise proceed to the next step,
[0091] (xviii) checking whether the cellular state “GSC renewal” and “ASPC differentiation” are present or not. If not, then modify the logical equations as mentioned at step (iii) and repeat the steps from (iii) to (xvii), otherwise proceed to the next step,
[0092] (xix) calculating the AR score of each input protein driving the cellular state “ASPC differentiation” and extracting those proteins which have maximum & positive (+1.44) and minimum & negative (−1.44) AR scores in the cellular state of “ASPC differentiation”,
[0093] (xx) repeating the simulation from step (iii) to (xix), if a protein shows wrong activity in any of the phenotype, otherwise proceed to the next step,
[0094] (xx)(a) declaring the proteins with positive (and maximum=+1.44) and negative (and minimum=−1.44) activity ratio (AR) score observed in a particular phenotype as the driver proteins for the development of that phenotype, for example, JAK2 and STAT3 proteins showing positive and maximum AR score are the driver of “ASPC differentiation” process and P53 protein having negative AR score=−1.44 is the driver of GSC renewal,
[0095] (xx)(b) renaming the simulation strategy from (xv)-(xix) as the model of “glioblastoma stem cells (GSC) development” model, in which the GSCs show continuous renewal process with no marker sign of apoptosis. GSCs also develop into matured but mutated (P53) astrocytes cells which clearly indicates the development of astrocytoma or glioma,
[0096] (xx)(c) fitting the distribution of the phenotype prediction scores of each cellular state observed in 100 simulation batches with different distribution functions and calculating the confidence interval of the phenotype predictor score of a cellular state from its corresponding distribution function,
[0097] (xx)(d) finding the cellular state Astrocytes progenitor cells (ASPC) differentiation and the AR scores of each protein which have shown either the value of +1.44 or −1.44,
[0098] (xx)(e) removing the proteins with AR score either +1.44 or −1.44 from the input protein list made in step (v) and considering the P53 protein as constitutively down-regulated (binary: 0 or False) state in the logical equation model developed in step (iii) to create another new model.
[0099] This new development is then simulated by following the same strategy implemented in steps (v) to (viii). However, at this time, the main objective of the simulation is to observe the appearances of low grade glioma (LGG) and high-grade glioblastoma (Grade-IV GBM) cells. The stepwise deduction methods (illustrated in
[0100] (i) performing the similar steps from (xv) to (xvii),
[0101] (ii) checking whether the cellular state “GSC renewal”, “ASPC differentiation”, “GBM development” are present or not. If not, then modify the logical equations as mentioned at step (iii) and repeat the steps from (iii) to (xvii), otherwise proceed to the next step,
[0102] (iii) calculating the AR score of each input protein driving the cellular state “GSC renewal”, “ASPC differentiation”, “GBM development”, and extracting those proteins which have maximum & positive (+1.44) and minimum & negative (−1.44) AR scores in cellular states of “GSC renewal”, “ASPC differentiation”, and “GBM development”,
[0103] (iv) repeating the simulation from step (iii) to (xix), if a protein shows wrong activity in any of the phenotype, otherwise proceed to the next step,
[0104] (iv)(a) declaring the proteins with positive (and maximum=+1.44) and negative (and minimum=−1.44) activity ratio (AR) score observed in a particular phenotype as the driver proteins for the development of that phenotype,
[0105] (iv)(b) renaming the simulation strategy from (xxi) to (xxiii) as the model of “general glioblastoma developmental” model,
[0106] (iv)(c) fitting the distribution of the phenotype prediction scores of each cellular state observed in 100 simulation batches with different distribution functions and calculating the confidence interval of the phenotype predictor score of a cellular state from its corresponding distribution function.
[0107] Accordingly, a network model comprising a plurality of input proteins involved in cellular states of phenotypic functions/cellular states were obtained, out of which 3 cellular states, i.e. Quiescent, Apoptosis and Neuron Progenitor Cells (NPC) Differentiation belonged to singleton attractor states referring to either the fully differentiated, quiescent or dead cells (
Adult Neural Stem Cell Model Development and Simulation
[0108] The chemical reactions of the Notch pathway and its cross talks are compiled from previously published model of Notch pathway, which is further modified to simulate the developmental dynamics of aNSCs. The reactions are translated into total of 69 logical equations with 122 logical nodes. The logical nodes consist of 53 input molecules, 39 intermediate molecules, 25 target proteins and 5 phenotypes.
[0109] To initiate the Notch pathway activation process, intermediate receptor proteins NOTCH 1/2/3/4 are considered as active and the rest of the molecules (except inputs) are kept as inactive at the initial state. At this stage, the constructed model has no mutation and is simulated for the normal development of aNSCs. The phenotypes are mapped with different marker proteins whose temporal expressions regulate transition dynamics of corresponding phenotypes (Table 1). The simulation outcomes of the model are represented by analyzing the state transition graph (STG), in which the expression vector of all nodes (i.e. molecules) at a particular time is considered as a “state”. The transitions of the states are continued until it reaches the steady-state levels i.e. either at singleton or fixed-point or periodic state.
[0110] Two models viz. master aNSC and general GBM are employed in embodiments wherein transcriptomics of glioma tumor patients are used as inputs. At first, the transcriptomics profile is introduced as inputs in the master aNSC model to run a new simulation.
[0111] In an embodiment, the aNSCs are highly biased towards development of quiescent, neural progenitor and apoptotic states Analysis of simulation results revealed that normalized frequencies of singleton attractor or cellular states are comparably higher than periodic attractor states (
[0112] Embodiments herein provide a process for determining the risk of future occurrence of GBM tumor of an individual.
[0113] Determining Normalized Frequency: The normalized frequencies of each of the cellular states generated from the new simulation are compared with normalized frequencies of the cellular states observed in the master aNSC model. If the observed normalized frequencies are found consistent with respect to the expected normalized frequencies at the significance level (i.e., P-Value ≤0.01), then the inspected subject is categorized as healthy having no risk of developing GBM tumor in the near future. The normalized frequencies are measured by Chi-square goodness-of-fit test. On the other hand, if there is an inconsistency found in the observed and expected normalized frequencies, then the individual's transcriptomics profile would be further tested to detect the risk of developing Low or High grade GBM tumor in future. In this case, the transcriptomics profile was considered as inputs in the master general GBM model and a new simulation will be allowed to run again. If the observed and expected normalized frequencies are well fitted with each other, then it is concluded that the subject has a risk of developing Low-grade glioblastoma (LGG) in future.
[0114] Determining Predictor score: To quantify the chances of occurrences of tumor cells of the subject, the phenotype predictor scores are determined for each tumorigenic state i.e. LGG-I, LGG-II and Grade-IV glioblastoma, which will be further helpful to assess the severity of tumor progression.
[0115] If it is observed that the expected and observed normalized frequencies are not consistent with each other and the phenotype predictor score of all the tumorigenic states is increased many folds as compared to the master general GBM model, then it is concluded that the subject has a risk of developing high grade GBM tumor. Quantification and further assessments of the chances of developing high-grade GBM tumor can be calculated by measuring the corresponding values of the phenotype predictor scores. Followed by these analyses, for high-grade GBM tumor patients, another simulation will be performed by using the omics profile as input in the master Grade-IV GBM model to screen and rank suitable drug targets for personalized, target based GBM tumor therapy.
[0116] In another embodiment, the activity ratio score is determined in step (ii)(a) by determining the contribution of an arbitrary pathway molecule (X.sub.i) in the cell signaling network to drive cellular dynamics towards a particular cellular state. Accordingly, the calculation of activity ratio score comprises the following steps:
[0117] (i) identifying the total number of input proteins (I), which do not have any upstream activator and inhibitor in the network model,
[0118] (ii) identifying the expressions of the proteins/genes (G) in a subject through proteomics or transcriptomics (microarray, RNA-Seq etc.) analyses, where G.Math.I,
[0119] (iii) identifying a sub-set of input proteins whose expressions are not determined experimentally, i.e. I−G=X,
[0120] (iv) generating a set of N number of expression vectors of total X number proteins, each consisting of randomly chosen binary element X.sub.i={0,1},
[0121] (v) simulating the network model by considering input vectors sequentially and detecting the cellular states C={C.sup.1, C.sup.2, C.sup.3, . . . , C.sup.m},
[0122] (vi) identifying the expression exponent (γ).sub.X.sub.
Wherein, C=Set of Cellular States={C.sup.1, C.sup.2, C.sup.3, . . . , C.sup.m}
m=Total Number Phenotypes observed
X.sub.i∈X=Set of Input Molecules={X.sub.1, X.sub.2, X.sub.3, . . . , X.sub.X}
X=Total Input proteins
Y:=Set of Input sequences={Y.sub.1, Y.sub.2, Y.sub.3, . . . , Y.sub.|S.sub.
Where, |S|=Total number of input sequences reaching to a particular cellular states out of N random sequences i.e. |S|.Math.N
[0123] (vii) calculating the activity ratio of an arbitrary input molecule (X.sub.i) in the development of a cellular state C.sup.j by the following equation 4:
[0124] If AR score of a protein is either +1.44 or −1.44 in a given cellular state, then it can be said that the specific protein is highly essential and positive or negative inducer, respectively for that particular cellular state. This scoring technique is specifically useful for assessing the biological activity of a protein in the development of different grades of tumor cells during tumorigenesis.
[0125] In yet another embodiment, the a predictor score is provided in step (ii) (b), wherein the score is measured as the ratio of the total number of observed occurrences (P.sub.C.sub.
[0126] (i) extracting the state transition graph (STG) of the logical model simulation, wherein the expression states of all the molecules of network model at any arbitrary transition time Z.sub.i={M.sub.t.sup.1, M.sub.t.sup.2, M.sub.t.sup.3, . . . M.sub.t.sup.m:m=Total molecules} is considered as “node”. Node Z.sub.t is connected with the next transition state Z.sub.t+1, where Z.sub.0 is the initial state (either Up or Down regulation) of all the molecules at time t=0, [0127] Initial states of the molecules Z.sub.0 is considered from the subject's/patient's proteomics/transcriptomics profile.
[0128] (ii) identifying the steady state solutions (either fixed-point or cyclic) of the transition nodes {Z.sub.t, Z.sub.t+1, Z.sub.t+1, . . . Z.sub.t+T:T=Total simulation time} obtained in the STG by using the following conditions, [0129] iff Z.sub.t=Z.sub.t+1=S.sub.t=Fixed-point attractor [0130] Z.sub.t=Z.sub.t+s=L.sub.t=Cyclic attractor
[0131] (iii) mapping the steady state points with the cellular states by measuring the activities of the marker proteins corresponding to the cellular states. If all marker proteins of a said cellular state are found active in the steady state, then that steady state would be denoted with that cellular state,
[0132] (iv) calculating the average state transition cost Φ.sub.C.sub.
Where, s=Total number of transition states require to reach at the steady state;
>1 for Periodic attractor (=Periodicities of the cyclic attractor)
μ=Total mutational costs of the entire model
d.sub.S.sub.
P.sub.C.sub.
[0133] (v) calculating the total mutation cost Ω induced in the developing/differentiating stem cells within a subject due to the genetic mutation by using the following equation (9),
Wherein,
[0134] (C.sup.n:C.sub.1.sup.n, C.sub.2.sup.n, C.sub.3.sup.n, . . . C.sub.X.sup.n)=Cellular states observed without any mutation
(C.sup.m:C.sub.1.sup.m, C.sub.2.sup.m, C.sub.3.sup.m, . . . C.sub.X.sup.m)=Cellular states observed with mutation
μ=Total number of genetic mutations carried by the subject/patient
|C.sup.n∪C.sup.m|=Total number of cellular states
|C.sup.n−(C.sup.n∩C.sup.m)|=Total number of cellular states which are not appeared after inducing mutation
|C.sup.m−(C.sup.n∩C.sup.m)|=Total number of cellular states which are newly appeared after inducing mutation
[0135] (vi) calculating the total phenotypic cost (Ψ.sub.C.sub.
Ψ.sub.C.sub.
[0136] (vii) calculating the predictor score[Θ(C.sub.i)] of a particular cellular state (C.sub.i) by measuring the ratio of the total number of observed occurrences (P.sub.C.sub.
[0138] The predictor score is particularly useful for predicting and quantifying the chances of developing a specific grade of glioblastoma tumor of a subject/patient in the future.
[0139] Further, the phenotypic cost calculated in step (VI) is a function calculated by measuring the ratio of the average Hamming distance traversed by all the molecules while reaching at the particular phenotype/cellular state to the normalized frequency of the phenotypes.
[0140] In one more preferred embodiment, target screening is performed by identifying the intermediate molecules of network model, which are mutually interconnected and highly correlated with the temporal activity pattern observed for high-grade (Grade-IV) glioblastoma tumorigenic cellular state.
Implementation of Phenotype Prediction Score in Tumor Risk Prediction
[0141] The method for determining the risk of the development of glioma of a new patient is depicted in
Implementation of Personalized Drug-Target Screening
[0142] The method for determining the personalized drug-targets for individual patient is illustrated in
[0143] (i) extracting the temporal activity profiles of intermediate molecules of the network model from the STG generated after simulation started from the initial states of the molecules to the cellular state representing high-grade (Grade-IV) glioblastoma cellular state;
[0144] (ii) extracting the activity profile of Grade-IV glioblastoma cellular state in STG;
[0145] (iii) identifying the delays and correlations of activity profiles of each intermediate molecule with the temporal activity profile observed for high-grade (Grade-IV) glioblastoma cellular state by Fast Fourier Transformation (FFT) analyses;
[0146] (iv) extracting the molecules having absolute correlation value ≥0.75 with P value <0.05, and Delay ≤3 with the activity pattern of high-grade (Grade-IV) glioblastoma cellular state, and selecting for perturbation analyses;
[0147] (v) perturbing the molecules by constitutively up-regulating molecules which have negative correlation with high-grade (Grade-IV) glioblastoma cellular state in the new perturbation analysis;
[0148] (vi) perturbing the molecules by constitutively down-regulating the molecules which have negative correlation with high-grade (Grade-IV) glioblastoma cellular state in the new perturbation analysis;
[0149] (vii) assessing the changes of frequencies of the cellular states representing the low-grade and high-grade glioblastoma cellular states in the perturbation studies with the previous simulation.
[0150] (viii) placing the perturbed molecules at the top of the list on the basis of its ability to maximum suppression of low-grade and high-grade glioblastoma cellular states in the perturbation study. The molecules held rank at the top of the list are considered as potential targets for target-based, personalized glioblastoma therapy.
[0151] The similar strategies from steps (i) to (viii) may be utilized for identifying drug-targets for LGG patients.
RNA-Seq Data Analyses and Differential mRNA Expression Study
[0152] The raw HTSeq counts data are further processed and analyzed for differential gene expression (DEG) analysis to find out the transcripts, which are significantly expressed in the LGG and GBM tumor cells in comparison to the normal, solid tumor cells.
Drug Targets Screening and Ranking
[0153] The molecules, which are significantly positive and negative inducer of high-grade (Grade-IV) cellular state, are extracted through FFT analyses performed on the time course activity sequences of all the intermediate proteins of Notch signaling network. The correlation and delay of the trajectories of the temporal expression dynamics of all the molecules are measured with respect to the reference trajectory of Grade-IV cellular state observed in the STG of “Grade-IV GBM model” simulation study.
[0154] In another embodiment, activity ratio scores are provided, indicating proteins selected from the group comprising EP300, RBPJ, APH1A, NCSTN, PSENEN, SNW1, HAT1, and MAML1 are highly expressed (i.e. having maximum AR scores) in all the cyclic attractor states. These said proteins are the core components of Notch pathway, which maintain the balance between self-renewal and differentiations of aNSCs by helping transcriptions of the target genes (HES 1-7/HEY 1, 2, L).
[0155] In yet another embodiment, protein molecules showing maximum delay ≤3 and correlation ≥0.6 with the target signal are selected to screen the suitable proteins as drug targets (Table 5).
[0156] In another embodiment, the application of the Boolean network model is improvised to simulate the human neural/glioma stem cell development, self-renewal, differentiation, and apoptosis processes.
[0157] It considers numerous combinations of binary input states (i.e., up or down-regulation) of the cell signaling molecules of Notch and its cross-talks pathways to observe the possible phenotypic conditions of the neural and glioma stem cells at the steady-states. Although Boolean models were previously used for simulating the biological systems, its implication in the prediction of the risk of developing histopathological grades of glioma is newly proposed in the embodiments of this disclosure. Additionally, the use of Fast Fourier Transformation (FFT) on the state transition dynamics of the tumorigenesis models to predict the possible vulnerable points in the network structure is newly introduced in embodiments herein, which finally helped to identify the important drug-targets in the glioblastoma tumor.
EXAMPLES
[0158] The following examples are given by way of illustration only, such that the person having ordinary skill in the art would recognize that the examples are not to be construed to limit the scope of this disclosure or the claims in any manner.
Example 1
Construction of the Logical Model
[0159] Notch signaling pathway and its cross-talk reactions with other pathway molecules were considered to construct a logical dynamic model to understand the neurogenesis of adult neural stem cells (aNSCs) in the SVZ of the human brain. Preliminary data for constructing the core pathway model was taken from the logical dynamic model of the Notch pathway by Chowdhury & Sarkar (2013) Clin Exp Pharmacol 3(137):2161-1459. Based on this pathway data, the entire model was then modified and restructured to simulate developmental dynamics of aNSCs and mutated Glioblastoma stem cells (GSCs). The overall reaction mechanisms of this pathway are translated into logical equations using universal logic gates AND, OR and NOT. It was observed that to reach a particular cellular phenotype, a specific set of molecules in the signaling cascade get activated, which forms a functional module or a sub-network and helps to express a class of marker proteins responsible for a specific cell type. Eq. 1-2 were considered while constructing the dynamic Boolean model of Notch along with its cross talk reactions. The initial states (1/ON or 0/OFF) of the input nodes were chosen randomly by generating a random number for each input node from uniform distribution (Unif(0,1)) in the range of 0 to 1 and setting the cut-off at 0.5 (Eq. 2).
Where, k.sub.i=Total number of Input Nodes
[0160] The pathway consists of 117 molecules out of which 53 were input molecules that refer to molecules that do not possess any upstream regulators and during the signaling event they can be at either up-regulated (ON) or down-regulated (OFF) state (Eq. 1). Hence, the possible highest numbers of expression patterns of these input molecules are 2.sup.53. In presence of such enormous possibilities, mainly occurring due to variations in expressions of several extrinsic and intrinsic molecules, the adult and inactive NSCs in the neurogenic niche of SVZ have to make a right decision to opt for any of the cellular phenotypes either by proliferation and differentiation during the developmental process or also choose to stop its cell division or undergo apoptosis.
[0161] To capture such huge possibilities at equilibrium, a robust simulation technique to stimulate the developmental dynamics of aNSCs with all input states was required. However, logical dynamic simulations by considering all input conditions and finding all possible attractors at equilibrium state are not feasible in terms of required computational cost. Hence, to reduce the computational cost, a random fraction, non-redundant initial input sequences were generated by using uniform random number distribution and subsequently assigned binary states ON (1) or OFF (0) to each input node of the reconstructed model (Eq. 2). The entire random input sequences (10.sup.6) are then further divided into 100 separate simulation batches (10,000 random sequences) and simulations are performed for all the 100 batches.
Example 2
Marker Proteins of aNSC, GSC and GBM Models
[0162] The marker proteins whose expressions are analyzed to observe dynamics of different cellular states or phenotypes in the aNSC, GSC and GBM models are enlisted in Table 1. Oscillatory behaviors of the state transitions of marker proteins were observed in the corresponding cyclic attractor states (e.g. NSC Differentiation), whereas steady state saturation (high or low) was observed for marker proteins in fixed-point attractor states (
TABLE-US-00001 TABLE 1 Marker proteins mapped with different phenotypes Cellular Phenotypes Marker Proteins Apoptosis Pro-apoptotic: PUMA, NOX, BAD, BAX Anti-apoptotic: FLIP, IAP, BCL2 Neural Stem Cells Renewal CYCLIN-D3, CYCLIN-D1, CDK, HES1, HES5 Neural Progenitor Cells NESTIN, NEUROD, β-TUBULIN-III Astrocyte Progenitor Cells GFAP Glioblastoma Stem Cells CYCLIN-D3, CYCLIN-D1, CDK, HES1, HES5, FLIP, IAP, BCL2 Glioblastoma tumor Cells CYCLIN-D3, CYCLIN-D1, CDK, C-MYC, TENASCIN-C, GFAP
[0163] The expressions of these marker proteins are denoted in the model simulation by discrete binary states 0 or 1. The occurrences of a particular phenotypic state in the model are mapped with a Boolean function of the corresponding marker proteins of that phenotype and therefore its expression values will be varied in the discrete binary domain of {0,1}.
[0164] ƒ: M.fwdarw.P
[0165] ƒ:=Boolean Function
[0166] P:=Phenotypes ∈{p.sub.1, p.sub.2, p.sub.3, . . . }∈{0,1}
[0167] M:=Marker Proteins ∈{m.sub.1, m.sub.2, m.sub.3 . . . }∈{0,1}
Example 3
Determining Phenotypes and Cellular States
[0168] A total of five phenotypic functions were considered in the developed dynamic model, which are dependent on expression of specific marker proteins of normal and tumorigenic brain cells. These phenotypes are i) Apoptosis, ii) NSC Renewal, iii) NPC Differentiation, iv) ASPC Differentiation, and v) GBM Development. Depending on the distribution of input signals different marker proteins were expressed with different distribution, which in turn regulate expressions of different phenotype in the binary domain {1,0}. Theoretically, it can be considered that in total 32 single and combinations of phenotypes are possible to occur with equal probability in the attractor distribution space. The probability to reach any of the phenotypes would be unbiased if the logical model is purely random. However, in reality, the topology of the constructed logical model is not completely random and unbiased in nature as logical rules defined for Notch signaling network is taken from experimental evidences.
[0169] The normal functioning of Notch signaling network is specifically oriented towards neural stem cell renewal and its differentiation into neural progenitor cells. It was also observed that at the time of Notch pathway simulation, only a few phenotypic states occur in attractor space and those states are denoted as cellular states. A cellular state can be an individual phenotype or a combination of multiple phenotypes. The following attributes were considered during model simulation to define two typical cellular states: Quiescent cells and GSC Renewal. [0170] Cellular States:=Quiescent(t)=1 iff ∀ Phenotypes: P.sub.i(t)∈{0}; where, i=1, 2, . . . , 5 GSC Renewal(t)=NSC Renewal(t) and not Apoptosis(t)
[0171] Quiescent state is a distinct cellular state found at ON state only when all phenotypes are at OFF state. In this state all other cellular activities such as proliferation, differentiation and apoptosis are found at the dormant stage or at a lower rate. GSC renewal will be at ON state if the phenotype Apoptosis is at OFF state and NSC Renewal is at ON state. It was also observed that based on steady state distributions and temporal expressions patterns (1 or 0) of intermediate and target molecules of Notch pathway, these cellular states were found in either “Fixed-point” or “Periodic” attractors space.
Example 4
Calculation of Normalized Frequencies of Cellular States, Shannon Entropy and Activity Ratio (AR) Scores
[0172] To reduce the computational cost and time to find out all possible attractor states, a simulation based search algorithm was used to find the maximum number of attractors of the present logical model. The basic criterion of this algorithm was to run the simulation by taking randomly non-redundant, finite set of initial states (N=10.sup.6) from the overall population (9.0071993×10.sup.15) of input states. The random initial states were further divided into 100 separate batches (i.e. each batch will contain 10,000 random input sequences) and frequencies of all attractor states (cellular states) are calculated for each batch of simulation (
[0173] In an arbitrary batch of simulation (B.sub.i), there exists a set of cellular states C.sub.i={C.sub.i.sup.1, C.sub.i.sup.2, C.sub.i.sup.3, . . . , C.sub.i.sup.m}. Here, ‘i’ is the simulation batch number and ‘m’ is the total number of observed cellular states or the size of maximum information content possible to be observed in the attractor distribution Π(A) generated in the simulation draw B.sub.i.
[0174] If the probability mass functions of all the elements of the sets of cellular states (C.sub.i) drawn from simulation batches were taken into consideration, then according to the “principle of maximum entropy” the probability mass function with highest information entropy will be chosen as the “proper one” for further data analyses. The information entropy of a given batch of simulation can be calculated by calculating the Shannon entropy score of the observed cellular states.
[0175] Shannon entropy H(C.sub.i) of the i.sup.th instance of a simulation containing the distribution of cellular states C.sub.i={C.sub.i.sup.1, C.sub.i.sup.2, C.sub.i.sup.3, . . . , C.sub.i.sup.m} is the negative logarithm of the probability mass function of the observed cellular states in that instance of simulation.
[0176] Theorem: If Π(A.sub.1), Π(A.sub.2), Π(A.sub.3), . . . , Π(A.sub.n) are the attractor distributions generated from initial input states Π(I.sub.1), Π(I.sub.2), Π(I.sub.3), . . . , Π(I.sub.n) and their corresponding normalized frequency distributions of cellular states are Π(C.sub.1), Π(C.sub.2), Π(C.sub.3), . . . , Π(C.sub.n), then the simulation instance containing highest Shannon entropy score Π(C.sub.i) will possess maximum numbers of different types of cellular states C.sub.i={C.sub.i.sup.1, C.sub.i.sup.2, C.sub.i.sup.3, . . . , C.sub.i.sup.m} (Proof not shown here). At the level of maximum Shannon entropy score, there exists a maximum number of distinct cellular states (i.e. max (m)).
[0177] Activity Ratio Score: This scoring technique is a metric to determine the contribution of an arbitrary pathway molecule (X.sub.i) in the cell-signaling network to drive cellular dynamics towards a particular cellular state. This scoring technique is useful for extracting important proteins from the set of input proteins, which help the signaling cascade to reach the specific phenotype. In the pathway simulation, given a set of total random input sequences (N), if S.sub.k is the total number of random input sequences which direct the model simulation towards a particular cellular state (C.sub.i.sup.m), then the activity ratio of any arbitrary input molecule (X.sub.i) is calculated by using the following equations (Eq. 4 and 5).
Let us assume, C:=Set of Cellular States={C.sup.1, C.sup.2, C.sup.3, C.sup.m}
m=Total Number Phenotypes observed
X.sub.i∈X:=Set of Input Molecules={X.sub.1, X.sub.2, X.sub.3, . . . , X.sub.N1}
N1=Total Input proteins
Y:=Set of Input sequences={Y.sub.1, Y.sub.2, Y.sub.3, . . . , Y.sub.|S.sub.
Where, |S.sub.k|=Total number of input sequences reaching to a particular cellular states out of N random sequences i.e. |S.sub.k|.Math.N
[0178] Here, the proportion (γ).sub.X.sub.
Example 5
Calculation of Phenotype Cost Function and Predictor Scores
[0179] The emergence route of intra-tumor heterogeneity and the sub-clones of GBM tumor cells from a common origin of mutated cells were analyzed using tumor cell evolution phylogeny. If the overall expressions of all the pathway molecules at t.sup.th time are considered as a ‘state’ of the cell Z.sub.t={M.sub.t.sup.1, M.sub.t.sup.2, M.sub.t.sup.2, . . . M.sub.t.sup.m:m=Total molecules}, then the entire developmental routes through which different states {Z.sub.t, Z.sub.t+1, Z.sub.t+2, . . . Z.sub.t+T:T=Total simulation time} reach at the attractor states (singleton or periodic) are called as ‘State Transition Graph’(STG). The nodes of a STG are the states and edges that are directed which defines transition of one state to another state at every Boolean update. Hence, STG was able to depict the expression dynamics of all pathway components and the transformation of cells towards different cell types, starting from a common origin of tumor initiating cells. It is observed that for reaching at a particular attractor state, STG has to pass through few transition/intermediate states or nodes {S.sub.t, S.sub.t+1, S.sub.t+2, . . . } and then land into either fixed-point node {S.sub.t+s} or cyclic attractor nodes{L.sub.t+1, L.sub.t+2, L.sub.t+3, . . . L.sub.t+l}. It is considered that the transition from one state to another state of a cell is also associated with a signaling cost function (i.e. for chemical reactions/any physical processes) and it affects the overall state transition rate. Hence, it was assumed that the probability of a particular attractor/cellular state (C.sub.i) in equilibrium state starting from initial states depends on the overall distance and the total number of molecular transitions to reach the particular state. Hence, to quantify the average signaling cost functions (Φ.sub.C.sub.
Where, s=Total number of transition states require to reach at the steady state;
>1 for Periodic attractor (=Periodicities of the cyclic attractor)
μ=Total mutational costs of the entire model
d.sub.S.sub.
P.sub.C.sub.
[0180] Calculation of “Hamming distance” between two successive nodes in STG defines temporal changes or evolution of expression dynamics of pathway molecules in successive time points. On assuming that S.sub.i={X.sup.1.sub.t, X.sup.2.sub.t, X.sup.3.sub.t . . . , X.sup.N.sub.t}& S.sub.t+1={X.sup.1.sub.t+1, X.sup.2.sub.t+1, X.sup.3.sub.t+1 . . . , X.sup.N.sub.t+1} are the expression vectors referring to arbitrary successive states in STG. The expression of elements (X.sup.k.sub.j) lies in binary domain i.e. {0, 1}. The Hamming distance (|d.sub.S.sub.
[0181] It is also assumed that, apart from the signaling cost a mutated model in which few molecules are constitutively activated/suppressed will also induce mutational cost (Ω) in the cells at the time of its transitions. The mutational cost function for a specific model is dependent on the total number of induced mutations (μ) and the change in total number of added and omitted cellular states in the mutated model with respect to the non-mutated model (μ=0).
[0182] Considering the non-mutated model, there are a total maximum X number of cellular states (C.sup.n:C.sub.1.sup.n, C.sub.2.sup.n, C.sub.3.sup.n, . . . C.sub.X.sup.n) and in the mutated model there are total maximum Y number of Cellular states (C.sup.m:C.sub.1.sup.m, C.sub.2.sup.m, C.sub.3.sup.m, . . . C.sub.X.sup.m). Suppose in the mutated model there are total μ mutations induced. Hence, the mutational cost (Ω) is defined as follows (Eq. 9).
μ=Total induced mutations in the mutated model
|C.sup.n∪C.sup.m|=Total number of cellular states
|C.sup.n−(C.sup.n∩C.sup.m)|=Total number of cellular states which are not appeared after inducing mutation
|C.sup.m−(C.sup.n∩C.sup.m)|=Total number of cellular states which are newly appeared after inducing mutation
[0183] The phenotype cost function (Ψ.sub.C.sub.
Ψ.sub.C.sub.
[0184] Phenotype Predictor Score: The predictor score[Θ(C.sub.i)] of a particular cellular state observed in the simulation is defined as the ratio of the total number of observed occurrences (P.sub.C.sub.
Where, E(P.sub.C.sub.
Example 6
Maximum Number of Attractor States Using Shannon Entropy in aNSC Model
[0185] In the aNSC model simulation, expected normalized frequencies and distributions of cellular states under the steady state situation of each simulation batch were found to be highly uncertain. By considering all distinct cellular states appearing in each simulation batch as “events,” Shannon entropy scores are calculated for every batch. The maximum Shannon entropy score, 1.456 Shannon was subsequently opted for further analyses. Selection of the simulation batch with highest information entropy asserted that the probability of obtaining maximum number of attractor or cellular states with highest normalized frequency is maximum, indicating that 10,000 random initial states used in each batch are sufficient enough to explore maximum number possible attractor states of the model.
Example 7
aNSCs are Biased Towards Development of Quiescent, Neural Progenitor and Apoptotic States
[0186] Analysis of simulation results (
[0187] Complete differentiation of aNSCs into matured neurons (i.e. NPC Differentiation state) was also observed in this simulation within the singleton attractor state. The normalized frequency (Mean 0.125±0.003 S.D.) of this cellular state observed in the simulation outcomes indicated the natural bias of aNSCs to differentiation of a matured neuron. In contrast, normalized frequency of normal ASPC differentiation state, obtained within periodic attractor space, was observed to be very less (Mean 0.00023±0.00014 S.D.) and was designated as a non-biased event during neurogenesis (
[0188] It was also observed that all the periodic cellular states mapped with the self-renewal process of aNSCs followed by apoptosis, NPC differentiation or both (viz. NSC Renewal/Apoptosis, NSC Renewal/NPC Differentiation, NSC Renewal/NPC Differentiation/Apoptosis) have similar normalized frequency values (
Example 8
Activity Ratio Scores for Extracting Driver Proteins at Different Cellular States
[0189] To identify the driver genes/proteins of a particular cellular state, input molecules having optimum “Activity Ratio” score (−1.44≤AR Score≤+1.44) were extracted in each cellular state through comparative analysis (
Example 9
Phenotype Predictor Scores to Predict the Appearances of Cellular States
[0190] A higher value of phenotype predictor score of an arbitrary cellular state signifies the greater chance of that cellular state to appear within the attractor space. The mean values with 95% CI of all the cellular states observed in aNSC, GSC, general GBM and Grade-IV models are provided in Table 2, which follow Cauchy-like distribution (
TABLE-US-00002 TABLE 2 Mean values with 95% Confidence Interval of the phenotype predictor scores of cellular states Models aNSC GSC General GBM Grade-IV GBM Observed Cellular Mean Mean Mean Mean States or Phenotypes [95% CI] [95% CI] [95% CI] [95% CI] Quiescent 172.813 334.160 301.827 147.688 [171.596 [333.246 [300.945 [146.630 174.032] 335.074] 302.709] 148.747] Apoptosis 286.581 NA NA NA [284.957 288.206] NPC 55.326 107.264 96.677 40.372 Differentiation [54.496 [106.388 [95.862 [39.804 56.156] 108.140] 97.492] 40.940] GSC Renewal 0.100 0.193 NA NA [0.078 [0.163 0.124] 0.224] NSC Renewal/ 0.123 NA NA NA NPC Differentiation [0.095 0.150] GSC Renewal/ NA 0.226 NA NA NPC Differentiation [0.194 0.259] NSC Renewal/ 0.118 NA NA NA Apoptosis [0.090 0.147] NSC Renewal/ 0.124 NA NA NA NPC Differentiation/ [0.098 Apoptosis 0.151] GSC Renewal/ NA NA 0.072 1.063 ASPC Differentiation/ [0.053 [0.984 NPC Differentiation 0.0917] 1.141] ASPC Differentiation 0.054 0.098 23.655 159.059 (LGG-I) [0.040 [0.073 [23.150 [158.118 0.069] 0.123] 24.160] 160.000] GSC Renewal/ NA NA 1.265 4.282 ASPC Differentiation [1.161 [4.096 (LGG-II) 1.370] 4.470] GSC Renewal/ NA NA 0.970 29.414 ASPC Differentiation/ [0.873 [28.875 GBM Development 1.066] 29.955] (Grade-IV)
[0191] It was observed that the mean phenotype predictor score (mean 147.688, 95% CI [146.630 148.747]) of the “Quiescent” cellular state is lowest in the highly mutated Grade-IV GBM as compared to the other models. These results were also in accordance with
Example 10
Case Study Using TCGA-LGG & TCGA-GBM RNASeq Samples Data
(i) Selection of Patient Cohorts and Preparation of RNASeq Sample Data Sets
[0192] Two patient cohorts consisting of low-grade (TCGA-LGG) and high-grade Glioblastoma (TCGA-GBM) from “The Cancer Genome Atlas (TCGA)” research networks were selected. HTSeq raw counts files (RNASeq experiment) of Glioblastoma patients with TP53 mutation were selected to determine mRNA expression profiles of these two patient cohorts. The following is the statistics of the two cohorts from which the RNASeq raw count data was downloaded on 27th Apr., 2017 (Table 3).
TABLE-US-00003 TABLE 3 Statistics of the TCGA Glioblastoma patient cohorts # of Samples (RNA-Seq Raw Counts # # # Normal Primary Recurrent # of Cases (Patients) Tumor Tumor Tumor Cohorts M F U Total Samples Samples Samples Total TCGA-LGG 282 228 1 511 0 513 16 529 (General) TCGA-LGG 141 100 1 242 0 240 15 255 (TP53 Mutated) TCGA-GBM 366 230 21 617 5 13 156 174 (General) TCGA-GBM 32 21 0 53 0 57 0 57 (TP53 Mutated) M = Male, F = Female, U = Undefined/Not mentioned
[0193] The TCGA-Low Grade Glioblastoma (TCGA-LGG) cohort contains tumor samples from Grade-II and Grade-III glioblastoma patients, whereas the TCGA-GBM cohort contains samples from Grade-IV tumor patients. The RNASeq raw counts data (available as TXT files) for each patient is available in the open source GDC Data portal of National Cancer Institute (https://portal.gdc.cancer.gov/). Further information about the workflows related to sample collection; mRNA sequencing and data processing, read alignments, and mRNA quantification etc. are available in GDC Data User's Guide (https://docs.gdc.cancer.gov/Data/PDF/Data_UG.pdf).
(ii) Differential Expression Analyses of mRNA Molecules
[0194] The RNA SEQ raw counts data files corresponding to the patient cohort, which have TP53 mutation, are extracted from TCGA-LGG (General) and TCGA-GBM (General) patient cohorts. Differential expression analyses are performed by forming the contrasts between primary (TP53 mutated) (i) LGG (total samples=240) and (ii) GBM (total samples=57) tumor samples versus solid normal tumor (total samples=5) samples using “edgeR” statistical package. The mRNA molecules of following proteins from the set of input proteins (not provided herein) are found to be significantly expressed (up or down) in differential expression analyses (Table 4). The mRNA expression patterns observed in this analysis were considered as transcriptomics profiles of two different patient cohorts.
TABLE-US-00004 TABLE 4 Differentiation expression of the mRNA molecules in two different patient cohorts TCGA-GBM (TP53 Mutation) vs. TCGA-LGG (TP53 Mutation) vs. Normal solid tumor Normal solid tumor Up Regulation Down Regulation Up Regulation Down Regulation APH1, DLL3, FRINGE, CNTN1, DVL, DTX1, DLL1, DLL3, FBW7, JAG2, NOV GASE, HAT, HDAC, FBW7, JAG2, JIP1 FRINGE, JAG1, MAGP1, NEDD4, MAGP2, NEDD4, POFUT1 POFUT1, SAP30
(iii) Logical Simulations Using TCGA-LGG and TCGA-GBM Transcriptomics Data
[0195] The transcriptomics profiles of the input protein molecules observed in TCGA-LGG and TCGA-GBM patient cohorts were taken as inputs for further simulations of aNSC and general GBM model simulations.
(iv) Analysis of TCGA-LGG Patient Cohort
[0196] In the TCGA-LGG patient cohort there were total 5 and 3 protein molecules found to be over and under expressed, respectively (Table 2). Mutation or down regulation of P53 protein is also considered. Hence, out of total 53 input molecules of the master aNSC model, there were a total of 9 protein molecules kept constitutively expressed at ON or OFF state and eliminated from the input list for further randomization in the new simulation. Hence, the total mutations introduced in TCGA-LGG transcriptomics data on master aNSC model is 9 and rest 44 input proteins were randomized 10000 times in 10 separate batches.
[0197] The mean normalized frequency values of each cellular state were calculated from these 10 independent simulation batches, which were further used for checking the goodness-of-fit with normalized frequency values observed for cellular states in the master aNSC model. Chi-square goodness of fit test is used to compare the normalized frequency distributions of observed normalized frequencies with expected normalized frequencies observed in the master aNSC model. This statistical test shows significant differences between the expected and observed data (P-value <0.001), which in turn proves that the transcriptomics profile extracted from the TCGA-LGG cohort do not indicate the development of normal neurogenesis of aNSCs (
[0198] Due to the imposed mutations such as P53 down-regulation, DLL1, DLL3 up-regulation in the new model, cellular states “Apoptosis” and “NSC/NPC/Apoptosis” were not found in the attractor space. On the other hand, the normalized frequencies of the cellular states “ASPC differentiation” and “GSC renewal” were found in higher numbers in the TCGA-LGG transcriptomics data model of aNSC. Hence, it proves that the neural stem cells having transcriptomics profile of TCGA-LGG cohort were more inclined to the development of glioblastoma stem-like (GSC) and mutated astrocytes or tumor cells (ASPC).
[0199] The comparative statistics of the normalized frequency distributions of each cellular state observed in these two simulations is represented in
[0200] The mean values of the phenotype predictor scores of the tumorigenic cellular states viz. LGG-I (mean 19.00, 95% CI [18.60 19.41]), LGG-II (mean 1.10, 95% CI [0.98 1.22]), and Grade-IV (mean 0.83, 95% CI [0.73 0.93]) observed in this new simulation are similar with the values observed in master general GBM model (Table 2). The comparative statistics of the normalized frequency distributions of each cellular state observed in these two simulations is represented in
(v) Analyses of TCGA-GBM Patient Cohort
[0201] Similar to the analyses of TCGA-LGG tumor samples cohort, the transcriptomics profile (Table 2) observed for the TCGA-GBM sample cohort is also considered at first as inputs in the master aNSC and general GBM models. Total 12 and 6 proteins are found to be up and down regulated, respectively, in the differential expression analyses in the tumor sample cohort with identified TP53 mutation. Therefore, in total 19 mutations (μ=12+6+1) of the input proteins are considered in the input in the master aNSC model and a new model using the TCGA-GBM transcriptomics data on master aNSC model is developed. Here, the extra 1 mutation is added for the P53 mutation. These 19 input proteins are kept constitutively at ON or OFF state as per the differentiation expression results (Table 2) and the rest 34 input proteins out of 53 are further randomized 10000 times in 10 separate simulation batches. Similar Chi-square goodness-of-fit test is performed here to assess the similarities of the normalized frequency values of each cellular state observed in master aNSC and new model simulations. Simulation results and the comparative statistics of the normalized frequency distributions between these two models are depicted in
[0202] On the other hand, another set of simulation is performed to examine the effects of the differentially expressed transcripts of the TCGA-GBM tumor samples cohort (Table 2) on the master general GBM model. Here, the master general GBM model contains 4 mutations (TP53, JAK2, STAT3, and RBPJ) and in the TCGA-GBM cohort another 18 proteins are found to be differentially expressed. Therefore, in total there are 22 mutations (=18+4) will be added in the new simulation. These 22 proteins are kept constitutively expressed (either up or down) based on the transcriptomics profile generated from differential expression analyses (Table 2) and the rest 31 out of 53 input proteins are randomized 10000 times in 10 separate batches. Similar Chi-square goodness-of-fit test is also performed to study the similarities of the normalized frequency values of each cellular state observed in master general GBM model and new model simulations. A comparative statistic of the cellular states observed in this new model as compared to the master general GBM model is represented in
Example 11
Methodology for Drug Target Screening
[0203] The simulation outcome of “Grade-IV GBM model” simulation shows significant increase in the number of Grade-IV tumor cells in the attractor space. The cellular state involved is GSC Renewal/ASPC Differentiation/GBM Development (
[0204] The intermediate molecules which are mutually interconnected and highly correlated with the temporal activity pattern observed for Grade-IV cellular state in the “Grade-IV GBM” model simulation study are required to be extracted. The correlation and delay between a pair of time course data could be calculated by using Fast Fourier Transform (FFT) analysis. Hence, the delay and pair-wise correlation between the activity patterns of Grade-IV tumorigenic cellular state (i.e. target signal) and the time-course logical expressions data of all the intermediated molecules (i.e. Query signal) are measured by the following method.
[0205] Considering that C.sup.Grade-IV={c.sub.1, c.sub.2, c.sub.3, . . . , c.sub.T} is the time series activity (ON or 1 and OFF or 0) profile of the Grade-IV cellular state observed in the STG of “Grade-IV GBM model” simulation at the discrete time points t=1, 2, 3, . . . , T. This time-course data of Grade-IV cellular state is considered as the “Target” signal. Similarly, let us consider that the time-course logical expression (ON/1, OFF/0) profile of any arbitrary molecule X.sub.i={x.sub.1, x.sub.2, x.sub.3, . . . , X.sub.T}, which is considered as “Query” signal. Both the temporal signals (S) are decomposed into cyclic patterns (i.e. frequency domain) with each frequency n=1, 2, 3, . . . , T−1 by following FFT analyses as shown in Eq. 13 & Eq. 14.
[0206] Amplitude of the cycle with frequency n=0 is neglected. The amplitudes and phase angles of the cycles with higher frequencies (n>0) are calculated for both the signals. The frequency of the cycle (n) for which the amplitude is found at maximum magnitude is at first identified. After that phase angles of the cycles from both the target and query signals (ϕ.sub.C.sup.n,ϕ.sub.X.sup.n) are calculated at that frequency (n) and the difference or delay Δ.sup.n.sub.CX=ϕ.sub.C.sup.n−ϕ.sub.X.sup.n between the two signals is measured. The delay between two signals δ.sub.CX is further calculated in the range of
by using the following Eq. 15.
[0207] Lagged Pearson correlation coefficient is also calculated for measuring the strength and association between the two signals or trajectories. In this work the delay and correlation between Grade-IV cellular state and for each pathway molecules are measured pair-wise. There are total six probable outcomes, which are found while comparing all such pairs of trajectories (i.e. Target vs. Query) using this approach. These entire mathematical calculations are done in “DynOmics” package developed in the statistical package “R”.
[0208] (i) The signals are positively correlated (correlation >0) with no delay (Delay=0). In this case both target and query signals are superimposed on each other and the phase (or direction) of the signals is same (i.e. positively correlated). Hence, it is considered that the molecular expression pattern (i.e. query signal) is positively influencing the activity pattern or dynamics of Grade-IV tumor state and there is no delay between the time-course profiles. Therefore, the molecule is a strong positive inducer or activator of Grade-IV tumor cells, which means when the molecule is at ON (i.e. up-regulated/active) state, the activity of Grade-IV cellular state is also High (or ON).
[0209] (ii) The signals are positively correlated (correlation >0) with negative delay (Delay <0). In this case the phase (or direction) of both the signals is same (i.e. positively correlated), but the initiation of the target signal at initial time point is delayed with respect to the query signal. Hence, it is considered that the molecular expression pattern (i.e. query signal) is positively influencing the activity pattern or dynamics of Grade-IV tumor state, but there is a lag of the Grade-IV time-course activity profile with the respect to that positively influencing molecule. Therefore, the molecule is a positive inducer or activator of Grade-IV tumor cells, which means activation of these molecules, will lead to the higher expression of Grade-IV tumor state.
[0210] (iii) The signals are negatively correlated (correlation <0) with positive delay (Delay >0). In this case the phases of the signals are opposite (i.e. negatively correlated) and the initiation of the target signal is ahead of the query signal at initial time point. Hence, it is considered that the molecular expression pattern (i.e. query signal) is negatively influencing the activity pattern or dynamics of Grade-IV tumor state, but the Grade-IV time-course activity profile is running ahead with the respect to that negatively influencing molecule. Therefore, the molecule is a negative inducer or activator of Grade-IV tumor cells, which means which means when the molecule is at ON (i.e. up-regulated/active) state, the activity of Grade-IV cellular state is Low (or OFF).
[0211] (iv) The signals are negatively correlated (correlation <0) with no delay (Delay=0). In this case the phases of the signals are opposite (i.e. negatively correlated), but there is no delay at the initiation of the signals at initial time point. Hence, it is considered that the molecular expression pattern (i.e. query signal) is negatively influencing the activity pattern or dynamics of Grade-IV tumor state, but there is no lag between the trajectories of these two signals. Therefore, the molecule is a strong negative inducer or inhibitor of Grade-IV tumor cells which means activations of these molecules, will inhibit the expression or activation of Grade-IV tumorigenic state.
[0212] (v) The signals are negatively correlated (correlation <0) with negatively delay (Delay <0). In this case the phases of the signals are opposite (i.e. negatively correlated), but the initiation of the target signal at initial time point is delayed with respect to the query signal. Hence, it is considered that the molecular expression pattern (i.e. query signal) is negatively influencing the activity pattern or dynamics of Grade-IV tumor state, but there is a lag of the Grade-IV time-course activity profile with the respect to that negatively influencing molecule. Therefore, the molecule is a negative inducer or inhibitor of Grade-IV tumor cells which means activations of these molecules, will inhibit the expression or activation of Grade-IV tumorigenic state.
[0213] (vi) The signals are positively correlated (correlation >0) with positive delay (Delay >0).
[0214] In this case the phase (or direction) of both the signals is same (i.e. positively correlated), and the initiation of the target signal is ahead of the query signal at initial time point. Hence, it is considered that the molecular expression pattern (i.e. query signal) is positively influencing the activity pattern or dynamics of Grade-IV tumor state, but the Grade-IV time-course activity profile is running ahead with respect to negatively influencing molecule. Therefore, the molecule is a positive inducer or activator of Grade-IV tumor cells which means activations of these molecules, will lead to higher expression of Grade-IV tumor state. The pathway molecules, which show significant positive correlation (correlation value ≥0.6) and Delay ≤3 with the activity trajectory of Grade-IV cellular state, are listed in Table 5.
TABLE-US-00005 TABLE 5 Delay difference and significant correlation observed between Grade-IV trajectory and pathway molecules Original Query Signal Signal Delay ≤3 Correlation >=0.6 P-Value Grade-IV GFAP 1 1 0 Grade-IV HIF1A* 1 1 0 Grade-IV NUC_STAT3* 2 1 0 Grade-IV MASH1* 3 −1 0 Grade-IV NESTIN 3 −1 0 Grade-IV NEUROD 3 −1 0 Grade-IV PTEN* 3 −1 0 Grade-IV STAT3_P* 3 1 0 Grade-IV BAD 0 −0.7698004 0.005588 Grade-IV AKT* 1 0.7637626 0.010131 Grade-IV NGN1* 2 −0.7559289 0.018452 Grade-IV PI3K* 2 0.7559289 0.018452 Grade-IV HES1 0 0.6236096 0.040347 Grade-IV HES5 0 0.6236096 0.040347 *Proteins selected for drug target screening as they do not belong to the class of marker proteins responsible for defining different cellular states.
[0215] Perturbation study: The molecules listed in Table 5 showing positive correlation with the activity profile of Grade-IV cellular state are suppressed (i.e. targeted) by freezing the expression states as down-regulated (or OFF) state in the “Grade-IV GBM” model. On the other hand, the molecules, which are negatively correlated, are kept up-regulated (i.e. ON) in the perturbation study. The outcomes of targeting these identified proteins (individually or combinations) in terms of normalized frequency distributions are shown in
[0216] In another aspect, transcriptomics data are provided as inputs, a novel predictive model is developed to predict the future risk of Glioblastoma tumor of an individual with appropriate grade by using the transcriptomics profile of that individual as input. Also, a novel technique is introduced to screen and rank the potential drug-targets for suppressing the growth and differentiation of tumor cells.
[0217] Thus, methods according to embodiments of this disclosure may be deployed for personalized, glioblastoma related pathological studies, such as risk prediction, tumor grade detection, biomarker identification, and ranking of most potential drug targets by using transcriptomics/proteomics data of glioma or suspected patients, thus identifying identify the combinations of most effective and drug-targetable signaling proteins for the personalized and target-based anti-glioma therapy. The present methods helps to screen and identify potential molecules for target-based anti-glioma therapeutics. Identifying driver proteins in Notch signaling by activity ratio helps to establish the role of Notch pathway as the guardian behind the maintenance of stem-like properties of adult NSCs and the suppressor of neuronal differentiation process thorough this in-silico approach.
[0218] Using the methods disclosed herein, the fundamental understandings of the underlying molecular processes of Notch and its cross-talks pathways are explored to assess the governing principles working behind the self-renewal, differentiation, apoptosis, and cell growth arrest (i.e. quiescent state) of aNSCs/GSCs or glioma tumor cells. By assessing the variances in the genetic makeup of individual tumor cells, methods according to embodiments herein are competent to dissect underlying mechanisms working on the emergence of different sub-types and sub-clonal heterogeneities of glioma tumor.
[0219] The chances of developing different sub-types of glioma tumor cells from GSCs or astrocytes cells are also quantified, which is further applied as a personalized approach to detect the risk of early onsets of glioma tumor with appropriate grades (or sub-types).