Forecasting bacterial survival-success and adaptive evolution through multiomics stress-response mapping and machine learning
11591634 · 2023-02-28
Assignee
Inventors
Cpc classification
C12Q1/18
CHEMISTRY; METALLURGY
G16B20/20
PHYSICS
Y02A90/10
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G16H50/20
PHYSICS
G06F17/18
PHYSICS
G16H20/10
PHYSICS
G16H50/80
PHYSICS
International classification
G06F17/18
PHYSICS
C12Q1/18
CHEMISTRY; METALLURGY
Abstract
The present disclosure provides a novel integrated entropy-based method that combines genome-wide profiling and network analyses for diagnostic and prognostic applications. The present disclosure further provides the integration of multiomics datasets, network analyses and machine learning that enable predictions on diagnosing infectious diseases and predicting the probability that they will escape treatment/the host immune system and/or become antibiotic resistant. The present disclosure provides a primary gateway towards the development of highly accurate infectious disease prognostics.
Claims
1. An entropy method for predicting a diagnostic or prognostic outcome, comprising: a) generating and analyzing a substantial dataset for a pathogen of interest, b) calculating entropy using time-series RNA-Seq data and dependencies amongst genes of the pathogen of interest, and c) providing a fitness prediction using a simple decision rule, where if entropy is either above or below a threshold, fitness is respectively low or high, wherein the fitness prediction corresponds to the diagnostic or prognostic outcome.
2. The entropy method of claim 1, wherein entropy is transcriptional entropy.
3. The entropy method of claim 2, wherein entropy is calculated on a transcriptomic dataset with multiple timepoints, wherein entropy (H) is as follows:
$H = \ln\left(|\Sigma_{\rho}|\right)$ (1) Where $\Sigma$ is the empirical covariance matrix ($\Sigma_{ij}$ is the empirical covariance of gene$_i$ and gene$_j$ computed from the time series data), and $|\Sigma|$ denotes the determinant of $\Sigma$, and wherein $\Sigma_{\rho}$ is a graphical-lasso regularized $\Sigma$, where $\rho$ denotes the regularization strength.
4. The entropy method of claim 3, wherein entropy is computed with multiple timepoints, wherein a) temporal differential expression (DE) data is used to compute a gene-gene empirical covariance matrix $\Sigma$, b) graphical lasso is applied to $\Sigma$ to obtain a regularized inverse of this covariance matrix ($\Sigma_{\rho}^{-1}$), wherein the matrix $\Sigma_{\rho}^{-1}$ represents a network of dependencies of regulatory interactions of the genes, and c) the inverse of this matrix ($\Sigma_{\rho}$) is used in Equation 1 of claim 3 to compute entropy.
5. The entropy method of claim 2, wherein entropy is calculated for a single-timepoint transcriptome profile, wherein entropy ($H_{\mathrm{stp}}$) is as follows:
$H_{\mathrm{stp}} = \ln(\sigma^{2})$ (2) Wherein $\sigma^{2}$ is the variance of the distribution of differential expression across genes for a single timepoint.
6. The entropy method of claim 1, wherein the pathogen is a bacterium.
7. The entropy method of claim 6, wherein the bacterium is exposed to at least one antibiotic.
8. The entropy method of claim 7, wherein the fitness predictions indicate the antibiotic sensitivity.
9. An entropy-based method for predictions on infection progression in a patient, comprising: a) collecting a sample from the patient, b) directly performing RNA expression analyses on the sample and on the patient's response to a therapeutic agent, simultaneously, c) combining entropy level of the sample and the patient-response to the therapeutic agent, wherein the entropy fitness indicates whether the therapeutic agent works.
10. The entropy-based method of claim 9, wherein the sample is a bacterium.
11. The entropy-based method of claim 9, wherein the therapeutic agent is an antibiotic.
12. An entropy-based method for Antibiotic Susceptibility Test (AST), comprising: a) collecting a bacterium from a patient or any bacterium of interest, b) exposing the bacterium to a tested antibiotic in an effective amount, c) collecting RNA from the bacterium that was exposed to the tested antibiotic and performing RNA expression analyses, d) performing entropy calculation for survival probability and actual level of antibiotic sensitivity, and e) providing a recommendation on optimal treatment with the tested antibiotics in view of the entropy information.
13. The entropy-based method of claim 12, wherein the entropy calculation is performed on a transcriptomic dataset with multiple timepoints, wherein entropy (H) is as follows:
$H = \ln\left(|\Sigma_{\rho}|\right)$ (1) Where $\Sigma$ is the empirical covariance matrix ($\Sigma_{ij}$ is the empirical covariance of gene$_i$ and gene$_j$ computed from the time series data), and $|\Sigma|$ denotes the determinant of $\Sigma$, and wherein $\Sigma_{\rho}$ is a graphical-lasso regularized $\Sigma$, where $\rho$ denotes the regularization strength.
14. The entropy-based method of claim 12, wherein entropy is calculated for a single-timepoint transcriptome profile, wherein entropy ($H_{\mathrm{stp}}$) is as follows:
$H_{\mathrm{stp}} = \ln(\sigma^{2})$ (2) Wherein $\sigma^{2}$ is the variance of the distribution of differential expression across genes for a single timepoint.
15. A transcriptional entropy method for predicting a diagnostic or prognostic outcome, comprising: d) generating and analyzing a substantial dataset for a pathogen of interest, e) calculating entropy using time-series RNA-Seq data and dependencies amongst genes of the pathogen of interest, and f) providing a fitness prediction using a simple decision rule, where if entropy is either above or below a threshold, fitness is respectively low or high, wherein the fitness prediction corresponds to the diagnostic or prognostic outcome, and wherein entropy is calculated on a transcriptomic dataset with multiple timepoints, wherein entropy (H) is as follows:
$H = \ln\left(|\Sigma_{\rho}|\right)$ (1) Where $\Sigma$ is the empirical covariance matrix ($\Sigma_{ij}$ is the empirical covariance of gene$_i$ and gene$_j$ computed from the time series data), and $|\Sigma|$ denotes the determinant of $\Sigma$, and wherein $\Sigma_{\rho}$ is a graphical-lasso regularized $\Sigma$, where $\rho$ denotes the regularization strength.
16. The transcriptional entropy method of claim 15, wherein entropy is computed with multiple timepoints, wherein a) temporal differential expression (DE) data is used to compute a gene-gene empirical covariance matrix $\Sigma$, b) graphical lasso is applied to $\Sigma$ to obtain a regularized inverse of this covariance matrix ($\Sigma_{\rho}^{-1}$), wherein the matrix $\Sigma_{\rho}^{-1}$ represents a network of dependencies of regulatory interactions of the genes, and c) the inverse of this matrix ($\Sigma_{\rho}$) is used in Equation 1 of claim 3 to compute entropy.
17. The transcriptional entropy method of claim 15, wherein entropy is calculated for a single-timepoint transcriptome profile, wherein entropy ($H_{\mathrm{stp}}$) is as follows:
$H_{\mathrm{stp}} = \ln(\sigma^{2})$ (2) Wherein $\sigma^{2}$ is the variance of the distribution of differential expression across genes for a single timepoint.
18. The transcriptional entropy method of claim 15, wherein the pathogen is a bacterium.
19. The transcriptional entropy method of claim 18, wherein the bacterium is exposed to at least one antibiotic.
20. The transcriptional entropy method of claim 19, wherein the fitness predictions indicate the antibiotic sensitivity.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Many aspects of the present disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
DETAILED DESCRIPTION OF THE INVENTION
(14) Many modifications and other embodiments disclosed herein will come to mind to one skilled in the art to which the disclosed compositions and methods pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the disclosures are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. The skilled artisan will recognize many variants and adaptations of the aspects described herein. These variants and adaptations are intended to be included in the teachings of this disclosure and to be encompassed by the claims herein.
(15) Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
(16) As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present disclosure.
(17) Any recited method can be carried out in the order of events recited or in any other order that is logically possible. That is, unless otherwise expressly stated, it is in no way intended that any method or aspect set forth herein be construed as requiring that its steps be performed in a specific order. Accordingly, where a method claim does not specifically state in the claims or descriptions that the steps are to be limited to a specific order, it is in no way intended that an order be inferred, in any respect. This holds for any possible non-express basis for interpretation, including matters of logic with respect to arrangement of steps or operational flow, plain meaning derived from grammatical organization or punctuation, or the number or type of aspects described in the specification.
(18) All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided herein can be different from the actual publication dates, which can require independent confirmation.
(19) While aspects of the present disclosure can be described and claimed in a particular statutory class, such as the system statutory class, this is for convenience only and one of skill in the art will understand that each aspect of the present disclosure can be described and claimed in any statutory class.
(20) It is also to be understood that the terminology used herein is for the purpose of describing particular aspects only and is not intended to be limiting. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosed compositions and methods belong. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the specification and relevant art and should not be interpreted in an idealized or overly formal sense unless expressly defined herein.
(21) Prior to describing the various aspects of the present disclosure, the following definitions are provided and should be used unless otherwise indicated. Additional terms may be defined elsewhere in the present disclosure.
A. Definitions
(22) As used herein, “comprising” is to be interpreted as specifying the presence of the stated features, integers, steps, or components as referred to, but does not preclude the presence or addition of one or more features, integers, steps, or components, or groups thereof. Moreover, each of the terms “by”, “comprising,” “comprises”, “comprised of,” “including,” “includes,” “included,” “involving,” “involves,” “involved,” and “such as” are used in their open, non-limiting sense and may be used interchangeably. Further, the term “comprising” is intended to include examples and aspects encompassed by the terms “consisting essentially of” and “consisting of.” Similarly, the term “consisting essentially of” is intended to include examples encompassed by the term “consisting of.”
(23) As used herein, nomenclature for compounds, including organic compounds, can be given using common names, IUPAC, IUBMB, or CAS recommendations for nomenclature. When one or more stereochemical features are present, Cahn-Ingold-Prelog rules for stereochemistry can be employed to designate stereochemical priority, E/Z specification, and the like. One of skill in the art can readily ascertain the structure of a compound if given a name, either by systemic reduction of the compound structure using naming conventions, or by commercially available software, such as CHEMDRAW™ (Cambridgesoft Corporation, U.S.A.).
(24) As used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a dendrite,” “an antibody,” or “a biosensor,” includes, but is not limited to, two or more such dendrites, antibodies, biosensors, and the like, including a plurality of such dendrites, antibodies, biosensors, and the like.
(25) It should be noted that ratios, concentrations, amounts, and other numerical data can be expressed herein in a range format. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. It is also understood that there are a number of values disclosed herein, and that each value is also herein disclosed as “about” that particular value in addition to the value itself. For example, if the value “10” is disclosed, then “about 10” is also disclosed. Ranges can be expressed herein as from “about” one particular value, and/or to “about” another particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms a further aspect. For example, if the value “about 10” is disclosed, then “10” is also disclosed.
(26) When a range is expressed, a further aspect includes from the one particular value and/or to the other particular value. For example, where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure, e.g. the phrase “x to y” includes the range from ‘x’ to ‘y’ as well as the range greater than ‘x’ and less than ‘y’. The range can also be expressed as an upper limit, e.g. ‘about x, y, z, or less’ and should be interpreted to include the specific ranges of ‘about x’, ‘about y’, and ‘about z’ as well as the ranges of ‘less than x’, ‘less than y’, and ‘less than z’. Likewise, the phrase ‘about x, y, z, or greater’ should be interpreted to include the specific ranges of ‘about x’, ‘about y’, and ‘about z’ as well as the ranges of ‘greater than x’, ‘greater than y’, and ‘greater than z’. In addition, the phrase “about ‘x’ to ‘y’”, where ‘x’ and ‘y’ are numerical values, includes “about ‘x’ to about ‘y’”.
(27) It is to be understood that such a range format is used for convenience and brevity, and thus, should be interpreted in a flexible manner to include not only the numerical values explicitly recited as the limits of the range, but also to include all the individual numerical values or sub-ranges encompassed within that range as if each numerical value and sub-range is explicitly recited. To illustrate, a numerical range of “about 0.1% to 5%” should be interpreted to include not only the explicitly recited values of about 0.1% to about 5%, but also include individual values (e.g., about 1%, about 2%, about 3%, and about 4%) and the sub-ranges (e.g., about 0.5% to about 1.1%; about 0.5% to about 2.4%; about 0.5% to about 3.2%, and about 0.5% to about 4.4%, and other possible sub-ranges) within the indicated range.
(28) As used herein, the terms “about,” “approximate,” “at or about,” and “substantially” mean that the amount or value in question can be the exact value or a value that provides equivalent results or effects as recited in the claims or taught herein. That is, it is understood that amounts, sizes, formulations, parameters, and other quantities and characteristics are not and need not be exact, but may be approximate and/or larger or smaller, as desired, reflecting tolerances, conversion factors, rounding off, measurement error and the like, and other factors known to those of skill in the art such that equivalent results or effects are obtained. In some circumstances, the value that provides equivalent results or effects cannot be reasonably determined. In such cases, it is generally understood, as used herein, that “about” and “at or about” mean the nominal value indicated ±10% variation unless otherwise indicated or inferred. In general, an amount, size, formulation, parameter or other quantity or characteristic is “about,” “approximate,” or “at or about” whether or not expressly stated to be such. It is understood that where “about,” “approximate,” or “at or about” is used before a quantitative value, the parameter also includes the specific quantitative value itself, unless specifically stated otherwise.
(29) As used herein, the term “effective amount” refers to an amount that is sufficient to achieve the desired modification of a physical property of the composition or material. For example, an “effective amount” of an antibody refers to an amount that is sufficient to achieve the desired improvement or effect modulated by the indicated component, material, compound or protein, e.g. achieving the desired level of binding with an analyte bound by the antibody. The specific level in terms of concentration or amount as an effective amount will depend upon a variety of factors, including avidity of the antibody, target analyte, desired level of assay sensitivity and the like.
(30) Unless otherwise expressly stated, it is in no way intended that any method set forth herein be construed as requiring that its steps be performed in a specific order. Accordingly, where a method claim does not actually recite an order to be followed by its steps or it is not otherwise specifically stated in the claims or descriptions that the steps are to be limited to a specific order, it is in no way intended that an order be inferred, in any respect. This holds for any possible non-express basis for interpretation, including matters of logic with respect to arrangement of steps or operational flow; plain meaning derived from grammatical organization or punctuation; and the number or type of embodiments described in the specification.
(31) While specific elements and steps are discussed in connection to one another, it is understood that any element and/or steps provided herein is contemplated as being combinable with any other elements and/or steps regardless of explicit provision of the same while still being within the scope provided herein.
(32) It will be understood that certain features and subcombinations are of utility and may be employed without reference to other features and subcombinations. This is contemplated by and is within the scope of the claims.
(33) Since many possible aspects may be made without departing from the scope thereof, it is to be understood that all matter herein set forth or shown in the accompanying drawings and detailed description is to be interpreted as illustrative and not in a limiting sense.
(35) The present disclosure provides a novel integrated entropy-based approach for diagnostic and prognostic applications. Genes implicated in bacterial stress responses have been used to predict bacterial growth outcome under antibiotic treatment, with the objective of developing novel diagnostic methods. Current approaches are specific to a species or antibiotic, which limits their application in new settings. It is unclear whether methods can be developed that predict bacterial fitness independent of strain, species or type of stress. The present disclosure provides a substantial RNA-Seq and experimental evolution dataset for 9 strains and species, under 16 antibiotic and several non-antibiotic stress conditions, and provides a predictive approach that is generalizable. As a point of comparison to previous work, gene-panel-based methods were first implemented and tested for how accurately they predict antibiotic mechanism of action, as well as the fitness outcome of the bacterial pathogen Streptococcus pneumoniae in the presence of antibiotics or under nutrient depletion. Although these methods do well when evaluated on data obtained for a specific species, factors including limited gene homology restrict their application to other species. In contrast, a novel generalizable method was developed around the observation that global transcriptional disorder seems to be a common stress response in bacteria with low fitness. This disorder was quantified using entropy, which is a specific measure of randomness, and it was found that in low fitness cases increasing entropy and transcriptional disorder result from a loss of regulatory dependencies. Using entropy as a single feature, the present disclosure provides that fitness and antibiotic sensitivity (i.e. minimum inhibitory concentration) predictions can be made that generalize well beyond training data. Furthermore, entropy-based predictions were validated in seven (7) species under antibiotic and non-antibiotic conditions. By demonstrating the feasibility of universal predictions of bacterial fitness, this work establishes the fundamentals for potentially new approaches in infectious disease diagnostics.
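By way of non-limiting illustration, the single-feature decision rule recited in the claims (entropy above a threshold indicates low fitness, entropy below it indicates high fitness) can be sketched as follows. The function names, the midpoint-based threshold search, and the toy entropy values are assumptions made for this sketch only; the disclosure does not specify how the threshold is chosen.

```python
def predict_fitness(entropy, threshold):
    """Decision rule: entropy above the threshold -> low fitness, else high."""
    return "low" if entropy > threshold else "high"


def fit_threshold(entropies, labels):
    """Pick the threshold with the best training accuracy.

    Candidate thresholds are midpoints between consecutive sorted entropy
    values (an assumption of this sketch, not a method from the disclosure).
    """
    values = sorted(set(entropies))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values

    def accuracy(t):
        hits = sum(predict_fitness(h, t) == y for h, y in zip(entropies, labels))
        return hits / len(labels)

    return max(candidates, key=accuracy)


# Hypothetical training entropies and observed fitness outcomes.
train_entropy = [-3.1, -2.4, -0.2, 0.9]
train_fitness = ["high", "high", "low", "low"]

threshold = fit_threshold(train_entropy, train_fitness)
print(threshold, predict_fitness(0.0, threshold))
```

Because the rule uses one scalar feature, the fitted threshold is the only parameter, which is consistent with the stated aim of a predictor that does not depend on gene identity.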
(36) In certain embodiments, a major goal of this invention is to determine if there is a quantifiable feature that can accurately predict bacterial fitness in an environment, independent of strain, species or the type of stress. To be generalizable, the selected feature needs to be common across species and environments. By generating a large experimental dataset and analyzing published ones, the present disclosure provides that such a feature exists, namely transcriptomic entropy, which quantifies the level of transcriptional disorder while a bacterium is responding to the environment. It is important to realize that entropy is not simply a measure of large magnitude changes in the transcriptome. Instead, entropy takes into account condition-specific transcriptional dependencies among genes and quantifies the amount of independent changes. The underlying assumption is that gene expression patterns lose underlying dependencies and become more stochastic with increasing amounts of stress. The difference between simple measures of magnitude changes and more controlled measures of entropy is illustrated in
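By way of non-limiting illustration, the two entropy calculations recited in the claims, H = ln|Σ_ρ| for multiple timepoints and H_stp = ln(σ²) for a single timepoint, can be sketched in Python. The sketch assumes NumPy and scikit-learn are available and uses scikit-learn's `graphical_lasso` as one possible graphical lasso implementation; the function names, the value of ρ, and the synthetic data are hypothetical and are not taken from the disclosure.

```python
import numpy as np
from sklearn.covariance import graphical_lasso


def multi_timepoint_entropy(de, rho=0.05):
    """Return H = ln|Sigma_rho| for a genes x timepoints DE matrix."""
    sigma = np.cov(de)  # gene-gene empirical covariance (rows are genes)
    sigma_rho, _precision = graphical_lasso(sigma, alpha=rho)
    sign, logdet = np.linalg.slogdet(sigma_rho)  # stable log-determinant
    if sign <= 0:
        raise ValueError("regularized covariance is not positive definite")
    return float(logdet)


def single_timepoint_entropy(de):
    """Return H_stp = ln(sigma^2) for one timepoint's DE values across genes."""
    return float(np.log(np.var(de)))


# Synthetic toy data (assumption): 5 genes observed at 40 timepoints.
rng = np.random.default_rng(0)
shared = rng.normal(size=40)
coordinated = shared + 0.3 * rng.normal(size=(5, 40))   # genes move together
independent = np.sqrt(1.09) * rng.normal(size=(5, 40))  # similar variance, no dependencies

h_coordinated = multi_timepoint_entropy(coordinated)
h_independent = multi_timepoint_entropy(independent)
print(h_coordinated, h_independent)
```

In this toy setting both gene sets change by a similar magnitude, yet the set without dependencies yields the higher entropy, which mirrors the distinction drawn above between magnitude of change and loss of dependencies. Real time series typically have far fewer timepoints than genes, in which case the regularization carries more of the burden of keeping Σ_ρ invertible.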
(37) In certain embodiments, the present disclosure implements current gene-panel based approaches for two reasons: 1) To search for a gene-panel that would capture a general stress-response (if it exists), and thus would represent a set of genes and associated regulatory changes coordinated by the same mechanisms in response to different types of stress. The existence of such a general response has been mostly connected to the manner in which rpoS responds to stress in E. coli and a small number of other species. However, it is largely unclear which genes respond downstream of rpoS, whether this response is accompanied by stress-specific responses, and to what extent these transcriptional changes overlap across species and in response to different types of stress.sup.16. Moreover, if such a general stress response exists widely across species, it is unclear whether there is any predictive information to be extracted from it. Importantly, no such gene-panel could be identified within the dataset generated for S. pneumoniae and other species, or in the published datasets explored; 2) As a point of comparison for the entropy-based approach. This comparison highlights that an entropy-based approach yields better performance than a gene-panel based approach (Table 8) and has at least 3 additional advantages over existing gene-panel approaches: a) It is independent of specific genes, whereas gene-panels focus entirely on specific genes. This might lead researchers to interpret genes present in a particular panel as those most relevant to the stress response. However, caution should be taken in the interpretation of these gene panels, because it turns out that the genes that appear in these panels are strongly influenced by model parameters (λ) and input data (
(38) By demonstrating the feasibility of predictions of fitness outcomes and antibiotic sensitivity, several possibilities of integrating entropy-based predictions in a clinical diagnostic setting are envisioned. Currently, AST is often performed using culture-based methods. These methods may take days and even weeks for slow-growing species such as M. tuberculosis.sup.42, delaying diagnosis and treatment in clinical settings. Therefore, it is desirable to be able to predict the fitness outcome of such slow-growing species as early as possible, for instance using RNA expression data. Another potential application of the entropy-based fitness predictions is monitoring an active infection in vivo. Performing transcriptome profiling and predicting the fitness of the infectious agent directly in its host environment allows for monitoring of disease progression and determining if and when treatment is necessary. Simultaneously profiling the pathogen and the host using dual RNA-Seq.sup.43,44, and predicting the fitness of both, is also valuable in assessing the state and progression of an infection.
(39) Admittedly, direct implementation of RNA-Seq in diagnostic tests might not (yet) be practical, as RNA-Seq experiments remain relatively expensive, labor-intensive and time-consuming. With the advent of real-time sequencing technologies, such as Nanopore, the speed of data collection may soon be improved significantly. Additionally, a transcriptome can be subsampled by monitoring conserved genes across species. In this scenario, transcriptional entropy can be obtained via more economical gene expression technologies, such as NanoString nCounter.sup.45 or the Luminex platform.sup.46. The present disclosure provides a novel approach that uses entropy to predict fitness independently of gene-identity, gene-function, and type of stress. This approach can be applied as a fundamental building block for generalizable predictors of fitness and MICs for Gram-positive and Gram-negative species alike, and thereby possibly improve clinical decision-making.
(40) Now having described the aspects of the present disclosure, in general, the following Examples describe some additional aspects of the present disclosure. While aspects of the present disclosure are described in connection with the following examples and the corresponding text and figures, there is no intent to limit aspects of the present disclosure to this description. On the contrary, the intent is to cover all alternatives, modifications, and equivalents included within the spirit and scope of the present disclosure.
B. EXAMPLES
(41) The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how the compounds, compositions, articles, devices and/or methods claimed herein are made and evaluated, and are intended to be purely exemplary of the disclosure and are not intended to limit the scope of what the inventors regard as their disclosure. Efforts have been made to ensure accuracy with respect to numbers (e.g., amounts, temperature, etc.), but some errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, temperature is in ° C. or is at ambient temperature, and pressure is at or near atmospheric.
(42) The following examples are provided to illustrate embodiments of the present invention but are by no means intended to limit its scope.
(43) The examples described herein will be understood by one of ordinary skill in the art as exemplary protocols. One of ordinary skill in the art will be able to modify the below procedures appropriately and as necessary.
Example 1
(44) Gene-panel based methods can make fitness predictions across different antibiotics but are sensitive to input data and model parameters, and fail to generalize to previously unseen species
(45) Previously, the expression levels of specific genes have been used to predict susceptibility of a specific species under a specific antibiotic stress.sup.8,11,23. In contrast, the goal here is to identify a general predictor of fitness (presence or absence of growth) that works not only for a specific stress or species, but instead extends to as many previously unseen settings (i.e. species and conditions) as possible. In line with existing approaches, a gene-panel that predicts fitness is generated. This panel, when trained on expression data coming from multiple stress conditions, can predict bacterial fitness for any condition (rather than a specific condition). Importantly, it is also possible to assess, for the first time, how sensitive such models are to input data and model parameters. Below, it is first shown that gene-panel models are indeed highly sensitive to these factors and thereby have limited generalizability. Subsequently, an alternative approach is developed using entropy that is generalizable, robust, and condition-agnostic (i.e. applicable to many conditions).
(46) To test whether a gene-panel model can be trained that predicts fitness for many different conditions, a large RNA-Seq dataset was generated for the human pathogen Streptococcus pneumoniae. To produce transcriptomic response profiles from multiple stress conditions, S. pneumoniae strains TIGR4 (T4) and Taiwan-19F (19F) were grown in the presence or absence of 1× the minimum inhibitory concentration (MIC) of 16 antibiotics representing 4 mechanisms of action (MOA). These include cell wall synthesis inhibitors (CWSI), DNA synthesis inhibitors (DSI), protein synthesis inhibitors (PSI) and RNA synthesis inhibitors ((RSI);
(47) Transcriptome data were separated into a training set for parameter fitting and a test set. The test set includes a completely different set of antibiotic conditions, to enable proper evaluation of model performance on previously unseen data (Table 1). A condition-agnostic predictor of fitness was developed by fitting a regression model on the training set, which includes high and low fitness outcomes from 5 antibiotics (representing 4 MOAs), 3 nutrient depletion conditions, and 3 S. pneumoniae strain backgrounds. Lasso-regularization was used to limit the number of features, thereby lowering the risk of overfitting the model (there are over 1500 genes in common for the 3 strains, and therefore as many potential features that could be used).sup.25. To avoid any bias in the selection of features, the regularization strength (λ) was automatically determined using cross-validation analysis on the training data (
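By way of non-limiting illustration, a lasso-regularized gene-panel model whose regularization strength is chosen by cross-validation can be sketched with scikit-learn's `LassoCV`. The synthetic data, the 0/1 coding of the fitness outcome, the 0.5 cutoff, and the use of 5 folds are assumptions of this sketch; the disclosure does not state which software or settings were used.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic stand-in for expression data (assumption): rows are experiments,
# columns are genes. Only genes 0, 1 and 2 drive the toy outcome.
rng = np.random.default_rng(1)
n_experiments, n_genes = 60, 50
X = rng.normal(size=(n_experiments, n_genes))
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(float)  # 1.0 = low fitness, 0.0 = high

# Cross-validation over a grid of regularization strengths picks lambda
# (called alpha in scikit-learn) without manual tuning.
model = LassoCV(cv=5, random_state=0).fit(X, y)

panel = np.flatnonzero(model.coef_)     # genes retained in the panel
predicted_low = model.predict(X) > 0.5  # assumed cutoff on the regression output

print("selected lambda:", model.alpha_)
print("panel size:", len(panel), "of", n_genes)
```

The L1 penalty drives most coefficients to exactly zero, so the genes with non-zero coefficients form the panel; with over 1500 candidate genes, this is what keeps the number of features manageable.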
(48) Fitness predictions that rely on the expression of specific genes are potentially influenced by the data used during training.sup.23. A model robust to input data would recover mostly the same features (i.e. genes) when small subsets of input are omitted during parameter fitting. In order to test the sensitivity of the regression model to input data, the same type of regression model was trained on 5 different subsets of the training dataset, each time omitting a different 20% of the data. The features included and their coefficients varied greatly in these experiments (
(49) While a condition-agnostic gene-panel is sensitive to input data and model parameter λ, it remains to be seen whether condition-specific models suffer from the same issue as well. For three MOA's for which the data were generated for multiple antibiotics (CWSI, DSI, and PSI), regularized regression models were trained (Table 4), and the models' sensitivities to input data and λ were evaluated. In all 3 cases, the models change with input and λ, and show no enrichment for specific functional categories (
(50) Besides a lack of functional enrichment, neither the MOA-specific nor the condition-agnostic gene-panels developed here include genes that are known direct-targets of the antibiotics used. Moreover, in addition to being sensitive to input data and regularization strength, the condition-agnostic fitness gene-panel is limited in its applicability to other species, as genes in this panel lack homologs in other Gram-positive as well as Gram-negative species (
Example 2
(51) Building a universal predictor for fitness based on transcriptional responses is challenging because antibiotics trigger distinct responses based on their mechanisms of action
(52) One reason why it may be non-trivial to produce a condition-agnostic model is that the different conditions (i.e. MOA's of different antibiotics) trigger such distinct responses that a common signature among them is unlikely to be found. To determine whether responses from different antibiotics that fall under the same MOA cluster together, principal component analysis (PCA) was performed on the complete differential expression dataset. Each experiment is presented as one trajectory, connecting individual timepoints within that experiment (
(53) To further analyze whether different MOA's trigger different responses, a multi-class logistic regression model was fit on the training dataset and evaluated on the test set. If a simple classifier can successfully distinguish between different MOA's, this would imply that there are discriminating signals specific to each MOA. Similar to the fitness prediction, the regularization parameter was selected via a principled automatic procedure (without making any arbitrary decisions) to avoid overfitting. (
(54) Example 3
(55) Reimagining statistical entropy as a measurement of transcriptional disorder leads to a flexible fitness predictor
(56) While the practical application of the MOA model may be useful, the main goal of this example is to build a versatile toolbox for fitness predictions that does not have many parameters to tune, does not rely on specific genes, and therefore possibly has improved generalizability compared to gene-panel models. To accomplish this, the focus was placed on the following observation, made in the data presented in this work as well as in previously published studies.sup.11,12,23,29: bacteria with low fitness in a given condition trigger larger and seemingly more chaotic gene expression changes than those with high fitness (
(57) To calculate entropy on a transcriptomic dataset with multiple timepoints, the classical statistical concept of entropy (H) is redefined as follows:
H=ln(|Σ.sub.ρ|) (1)
(58) Where Σ is the empirical covariance matrix (Σ.sub.ij is the empirical covariance of gene.sub.i and gene.sub.j computed from the time series data), and |Σ| denotes the determinant of Σ.sup.30-33. Σ.sub.ρ is a graphical-lasso regularized Σ, where ρ denotes the regularization strength.
(59) Entropy is computed from experiments with multiple timepoints as follows. 1) The temporal differential expression (DE) data is used to compute a gene-gene empirical covariance matrix Σ. 2) Graphical lasso.sup.34 is applied to Σ to obtain a regularized inverse of this covariance matrix (Σ.sub.ρ.sup.−1). The matrix Σ.sub.ρ.sup.−1 represents a network of dependencies of the regulatory interactions of the genes. 3) The inverse of this matrix (Σ.sub.ρ) can then be used in Equation 1 to compute entropy (
(60) It is important to note that, with the described approach, a high entropy response reflects large changes in magnitude in the transcriptome that come from independently responding genes. This means that large changes in magnitude can still result in low entropy, when changes in expression are synchronized among genes (
(61) Training of this multi time-point entropy model includes the determination of two parameters: regularization strength ρ and threshold t. This is accomplished by first determining ρ by 5-fold crossvalidation (on the training set), and then determining t for this selected ρ. ρ at 1.5 minimizes crossvalidation error (
(62) Example 4
(63) An even simpler model of entropy can predict fitness outcome from a single time-point
(64) The time course experiments accurately capture a bacterium's survival in a test environment, but they are labor intensive and potentially expensive. In cases where temporal information may not be available, computing covariance across genes is not possible. However, entropy can still be determined for a single-timepoint transcriptome profile as follows.sup.38:
H.sub.stp=ln(σ.sup.2) (2)
Where σ.sup.2 is the variance of the distribution of differential expression across genes for a single timepoint (
(65) Overall, the entropy model (and its variants) has several advantages. First, it is based on a simple and intuitive principle: large and independent changes in the transcriptome are indicative of dysregulation and, beyond a threshold, predictive of low fitness. Second, it is possible to simplify the entropy-based model to accommodate less data (i.e. a single-timepoint transcriptome). Third, an entropy-based model has few parameters (at most 2 parameters need to be determined) and is therefore less likely to be overfit to data. Fourth, the model does not depend on the identity of specific genes, which may or may not be present in different strains/species. Fifth, the model could be easily applied to other data types (e.g. proteomics, metabolomics). Therefore, an entropy-based model is more likely than a gene-panel based approach to be generalizable to previously unseen conditions and species.
Example 5
(66) Entropy-based fitness predictions are strain, species and stress-type independent and can be used to infer the antibiotic minimum inhibitory concentration
(67) To test if the entropy-based approach is indeed generalizable and successfully predicts fitness for other S. pneumoniae strains and other species, a new RNA-Seq dataset was generated under ciprofloxacin exposure for Salmonella Typhimurium, Staphylococcus aureus, E. coli, Klebsiella pneumoniae and two additional S. pneumoniae strains representing serotypes 1 and 23F (Table 1). These five species represent both Gram-negative and Gram-positive bacteria and cover a wide range of ciprofloxacin MICs (
(68) Furthermore, the entropy measurement of each strain was found to be inversely proportional to the MIC.sub.CIP (
(69) To further validate the approach, data from Bhattacharyya et al.sup.11 was used. In this RNA-Seq dataset, susceptible and resistant strains from 3 species were exposed to 3 different antibiotics (2 of which were not present in our dataset). Again, by using the entropy threshold of 2.08 (obtained above through training on the S. pneumoniae data) susceptible strains with low fitness are successfully separated from resistant strains with high fitness (
(70) To explore the applicability of entropy beyond nutrient and antibiotic stress, entropy-based fitness classification was performed on a published collection of 193 M. tuberculosis transcription factor over-expression (TFOE) strains.sup.39. Upon TFOE, these strains exhibit fitness changes, ranging from severe growth defects to small growth advantages.sup.40. Over-expression of a single transcription factor can thereby exert stress on the bacterium that can result in different fitness outcomes. By calculating entropy from whole-genome microarray data collected from each TFOE strain, it is possible to distinguish strains based on their fitness levels at an accuracy of 0.78, using a newly trained entropy threshold for this dataset (
Example 6
Methods
(71) Bacterial strains, culture media and growth curve assays
(72) S. pneumoniae strain TIGR4 (T4; NC_003028.3) is a serotype 4 strain originally isolated from a Norwegian patient.sup.3,4, Taiwan-19F (19F; NC_012469.1) is a multi-drug resistant strain.sup.5,6 and D39 (NC_008533) is a commonly used serotype 2 strain originally isolated from a patient about 90 years ago.sup.7. Strains PG1 and PG19 were isolated from adults with pneumococcal bacteremia infection and included in the Pneumococcal Bacteremia Collection Nijmegen (PBCN).sup.8. All S. pneumoniae gene numbers refer to the T4 genome. Correspondence between homologous genes among S. pneumoniae strains and gene function annotations are described herewith. Escherichia coli strain AR538, Klebsiella pneumoniae strain AR497 and Salmonella enterica subsp Typhimurium strain AR635 were clinical isolates obtained from the Centers for Disease Control and Prevention (CDC). Staphylococcus aureus strain MN6 was kindly provided by George Sakoulas (Center of Immunity, Infection & Inflammation, UCSD School of Medicine). Unless otherwise specified, S. pneumoniae strains were cultivated in Todd Hewitt medium with 5% yeast extract (THY) with 5 μL/mL oxyrase (Oxyrase, Inc) or on sheep's blood agar plates (Northeastern Laboratories) at 37° C. with 5% CO.sub.2. A. baumannii, E. coli, K. pneumoniae, S. aureus and S. Typhimurium were cultured in Mueller Hinton broth II (Sigma) at 37° C. with 220 rpm constant shaking. Tn-Seq and RNA-Seq experiments of S. pneumoniae under nutrient-depletion and antibiotic conditions were performed in chemically defined medium (CDM).sup.9 and semi-defined minimal medium (SDMM).sup.10, respectively. RNA-Seq experiments for A. baumannii, S. Typhimurium, E. coli, K. pneumoniae, and S. aureus were performed in Mueller Hinton broth II. Single strain growth assays were performed at least three times using 96-well plates by taking OD.sub.600 measurements on a Tecan Infinite 200 PRO plate reader.
Temporal RNA-Seq Sample Collection, Preparation and Analysis
(73) In nutrient RNA-Seq experiments, T4, D39 and adapted D39 were collected at 30 and 90 min after depletion of D39-essential nutrients. In the training set antibiotic RNA-Seq experiments, wild-type and adapted T4 or 19F were collected at 10, 20, 30, 45, 60, 90, 120 min post-vancomycin, rifampicin or penicillin treatment. Additional time points at 150, 180, 210 and 240 min were collected in levofloxacin and kanamycin experiments due to the slower transcriptional response. In the test set antibiotic RNA-Seq experiments, wild-type T4 and 19F were collected at 30 and 120 min post-cefepime, ciprofloxacin, daptomycin or tetracycline treatment. Ciprofloxacin-adapted T4 and 19F were collected at 30 and 120 min post-ciprofloxacin treatment. T4 was collected at 30 and 120 min post-amoxicillin, ceftriaxone, imipenem, linezolid, moxifloxacin or tobramycin treatment. Wild-type strains were exposed to 1xMIC antibiotics; antibiotic-adapted strains were exposed to 1xMIC and 1.5-2xMIC of the respective antibiotic. Cell pellets were collected by centrifugation at 4000 rpm at 4° C., snap frozen, and stored at −80° C. until RNA isolation with the RNeasy Mini Kit (Qiagen). 400 ng of total RNA from each sample was used for generating cDNA libraries following the RNAtag-Seq protocol.sup.11 as previously described.sup.9. PCR amplified cDNA libraries were sequenced on an Illumina NextSeq500, generating a high sequencing depth of ~7.5 million reads per sample.sup.12. RNA-Seq data was analyzed using an in-house developed analysis pipeline. In brief, raw reads are demultiplexed by 5′ and 3′ indices.sup.11, trimmed to 59 base pairs, and quality filtered (96% sequence quality>Q14). Filtered reads are mapped to the corresponding reference genomes using bowtie2 with the --very-sensitive option (-D 20 -R 3 -N 0 -L 20 -i S,1,0.50).sup.13. Mapped reads are aggregated by featureCounts and differential expression is calculated with DESeq2.sup.14,15.
In each pair-wise differential expression comparison, significant differential expression is filtered based on two criteria: |log2foldchange|>1 and adjusted p-value (padj) <0.05. All differential expression comparisons are made between the presence and absence of the antibiotic or nutrient at the same time point.
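The two significance criteria above can be written as a single predicate. The following is a minimal sketch, not the in-house pipeline itself: the record type `DEResult`, the function name `is_significant`, and the example locus tags and values are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class DEResult(NamedTuple):
    """One gene from a pair-wise comparison (hypothetical record layout)."""
    gene: str
    log2_fold_change: float
    padj: float


def is_significant(result, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Apply both criteria from the text: |log2foldchange| > 1 and padj < 0.05."""
    return abs(result.log2_fold_change) > lfc_cutoff and result.padj < padj_cutoff


# Made-up values, for illustration only.
results = [
    DEResult("gene_1", 2.3, 0.001),   # large change, significant
    DEResult("gene_2", -1.4, 0.03),   # down-regulated, significant
    DEResult("gene_3", 0.6, 0.0001),  # change too small
    DEResult("gene_4", -3.1, 0.20),   # adjusted p-value too high
]

significant = [r.gene for r in results if is_significant(r)]
print(significant)
```

Because both comparisons are strict, a gene with a missing (NaN) adjusted p-value fails the second test and is excluded.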
Experimental Evolution
(74) D39 was used as the parental strain in nutrient-depletion evolution experiments; T4 and 19F were used as parental strains in antibiotic evolution experiments. Four replicate populations were grown in fresh CDM with a decreasing concentration of uracil or L-Val for nutrient adaptation populations, or an increasing concentration of ciprofloxacin, cefepime, levofloxacin, kanamycin, penicillin, rifampicin, or vancomycin for antibiotic adaptation populations. Four replicate populations were serial passaged in CDM or SDMM as controls to identify background adaptations in nutrient or antibiotic adaptation experiments, respectively. When populations were adapted to their nutrient or antibiotic environment, a single colony was picked from each experiment and checked for its adaptive phenotype by growth curve experiments.
Determination of Relative Minimal Inhibitory Concentration (MIC)
(75) 1 to 5×10.sup.5 CFU of mid-exponential bacteria in 100 μL was diluted with 100 μL of fresh medium with a single antibiotic to achieve a final concentration gradient of cefepime (T4: 0.008-0.8 μg/mL; 19F: 0.6-2.4 μg/mL), ciprofloxacin (S. pneumoniae strains: 0.125-4.0 μg/mL; other species: 0.0125-25 μg/mL), daptomycin (15-55 μg/mL), levofloxacin (0.1-2 μg/mL), kanamycin (35-250 μg/mL), penicillin (T4: 0.02-0.055 μg/mL, 19F: 1-4 μg/mL), rifampicin (0.005-0.04 μg/mL), tetracycline (T4: 4-18 μg/mL, 19F: 19-22 μg/mL), amoxicillin (0.01-0.16 μg/mL), imipenem (0.0005-0.045 μg/mL), ceftriaxone (0.0005-0.009 μg/mL), linezolid (0.05-0.65 μg/mL), tobramycin (35-255 μg/mL), cotrimoxazole (0.5-7.5 μg/mL), moxifloxacin (0.05-0.70 μg/mL), and vancomycin (0.1-0.5 μg/mL) in 96-well plates. Each concentration was tested in triplicate. Growth was monitored on a Tecan Infinite 200 PRO plate reader at 37° C. for 16 hours. MIC is determined as the lowest concentration that abolishes bacterial growth (
Selection of Gene Panel For Fitness Prediction
(76) Differential expression data from experiments from all experimental timepoints with time≥60 min were assembled in R (v3.4.3). The data were split into training and test sets as described in Supplementary Table 1, yielding a training set of 138 and a test set of 19 experiments. Genes with incomplete data (e.g. genes unique to one strain) were omitted. The differential expression data was then scaled such that the values for each gene had mean=0 and variance=1. A binomial logistic regression model was fit to the training set with glmnet v3.0-2. In order to determine the appropriate value of the regularization parameter lambda, 5-fold crossvalidation was performed on the training set and mean squared error (MSE) of the crossvalidation set for each of the 5 folds was computed as a measure of classification error. The value of lambda was selected to be the largest at which the MSE is within 1 standard deviation of the minimal MSE overall.sup.16,17.
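The per-gene scaling and the lambda selection rule described above can be illustrated without glmnet. This is a minimal sketch under stated assumptions: the helper names `scale_gene` and `pick_lambda` are hypothetical, sample variance is assumed for the scaling, and the crossvalidation errors are made-up numbers rather than results from this work.

```python
import statistics


def scale_gene(values):
    """Scale one gene's values to mean 0 and (sample) variance 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # A constant gene carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def pick_lambda(lambdas, cv_error, cv_spread):
    """Return the largest lambda whose crossvalidation error is within
    one spread unit (1 standard deviation, as in the text) of the minimum."""
    best = min(range(len(lambdas)), key=lambda i: cv_error[i])
    limit = cv_error[best] + cv_spread[best]
    eligible = [lam for lam, err in zip(lambdas, cv_error) if err <= limit]
    return max(eligible)


scaled = scale_gene([1.0, 2.0, 3.0])

# Illustrative crossvalidation summary (not measured data).
lambdas = [0.01, 0.05, 0.10, 0.20]
cv_error = [0.25, 0.18, 0.19, 0.30]
cv_spread = [0.02, 0.02, 0.02, 0.03]
chosen = pick_lambda(lambdas, cv_error, cv_spread)
print(scaled, chosen)
```

Choosing the largest eligible lambda rather than the error-minimizing one trades a small amount of crossvalidation error for a sparser panel, which is the stated motivation for regularization here.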
(77) Evaluation of the gene panel's sensitivity to input data was done using another 5-fold crossvalidation strategy, where for each fold, the training portion includes 80% of the original training dataset. The model was fit with the same strategy as above, selecting the best lambda for each fold.
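One way to summarize how much the selected panel changes between folds is the overlap of the selected feature sets. The sketch below uses the Jaccard index for this purpose; that metric is an assumption introduced here for illustration and is not stated in the text, and the gene names are placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two feature sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(panels):
    """Average Jaccard index over all pairs of per-fold panels."""
    pairs = list(combinations(panels, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical panels selected in three folds (placeholder gene names).
fold_panels = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneD"},
    {"geneB", "geneC", "geneE"},
]
stability = mean_pairwise_jaccard(fold_panels)
print(stability)
```

A value near 1 would indicate that nearly the same genes are recovered in every fold; a value near 0 would indicate a panel that depends strongly on which experiments were held out.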
(78) Evaluation of the gene panel's sensitivity to lambda was done using the standard output of the glmnet function.
(79) For gene panels specific to a single MOA, the training and test sets were filtered to include only experiments from that MOA. Otherwise, all analyses were done as described above.
PCA and Trajectory Clustering
(80) For principal component analysis (PCA), differential expression (log2fold change of +/− antibiotic comparisons) data from all 255 experimental conditions (per time point per antibiotic from all experiments excluding CIP-validation set with A. baumannii, E. coli, K. pneumoniae, S. Typhimurium, S. aureus, S. pneumoniae serotype 1 and 23F strains) were assembled in R (v3.4.3). The function “prcomp” was used for PCA. Timepoints of the same experiment were connected to form trajectories. Since not all experiments are on the exact same time scale (e.g. KAN experiments extend to 240 min whereas RIF experiments cover 120 min), equivalent timepoints for each experiment were determined to be
(81)
and t.sub.max being the latest time point available for the corresponding experiment. If a timepoint did not correspond to an existing RNA-Seq data point, this time point was inferred by linear interpolation of the existing trajectories. To cluster these trajectories, a trajectory-distance metric between two trajectories X and Y is defined as the sum of Euclidean distances (‘dist’, on the principal component coordinates)
(82)
of all timepoints i. All pairwise distances are computed for all pairs of trajectories included in the analysis (WT strains with low fitness, for PSI, DSI, CWSI and RSI). Kmeans clustering in MATLAB with k=4 is used on the pairwise distances to cluster the trajectories.
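The trajectory distance can be sketched as follows. The exact formula for the equivalent timepoints is not reproduced in the text above, so this sketch assumes they are equal fractions of t.sub.max; that choice, the function names, and the toy coordinates are all assumptions made for illustration.

```python
import math


def interpolate(times, points, t):
    """Linearly interpolate a trajectory of coordinate tuples at time t."""
    if t <= times[0]:
        return tuple(points[0])
    for i in range(1, len(times)):
        if t <= times[i]:
            w = (t - times[i - 1]) / (times[i] - times[i - 1])
            return tuple(a + w * (b - a) for a, b in zip(points[i - 1], points[i]))
    return tuple(points[-1])


def resample(times, points, n_steps):
    """Sample a trajectory at n_steps equal fractions of its own t_max
    (ASSUMPTION: the original equivalent-timepoint formula may differ)."""
    t_max = times[-1]
    return [interpolate(times, points, t_max * k / n_steps)
            for k in range(1, n_steps + 1)]


def trajectory_distance(x, y):
    """Sum of Euclidean distances between matching timepoints of X and Y."""
    return sum(math.dist(p, q) for p, q in zip(x, y))


# Two toy trajectories in a 2-D principal component space, on different time scales.
traj_a = resample([30, 60, 120], [(0, 0), (1, 0), (3, 0)], n_steps=4)
traj_b = resample([60, 120, 240], [(0, 1), (1, 1), (3, 1)], n_steps=4)
d = trajectory_distance(traj_a, traj_b)
print(d)
```

After resampling, both trajectories have the same number of points, so the matrix of pairwise distances between all trajectories can be passed to a clustering routine.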
Selection of Gene Panel for MOA Prediction
(83) Differential expression (log2 fold change of drug/no drug comparison) data from all antibiotic experiments with low fitness outcome and time≥60 minutes were assembled in R (v3.4.3). The data were split into training and test sets as described in Supplemental Table 1, yielding a training set of 39 and a test set of 15 experiments. Similar to the fitness gene panel data preparation, genes with incomplete data were omitted. A multinomial logistic regression model was fit to the training set with glmnet v3.0-2. The appropriate value of lambda was selected using a similar crossvalidation scheme to the fitness gene panel: the largest lambda at which the crossvalidation error is within 1 standard deviation of the minimal error overall.
(84) Evaluation of the model's sensitivity to input and lambda was done as described in the “Selection of gene panel for fitness prediction” section above.
Gene Set Enrichment Analysis
(85) Gene panels for S. pneumoniae were evaluated for enrichment of functional categories (the category annotation can be found in Supplemental File 1), using a hypergeometric test, and Benjamini-Hochberg correction for multiple comparisons. For gene panels in Bhattacharyya et al..sup.18, enrichment for GO terms was evaluated using the same procedure. The GO term annotation was acquired from Uniprot.
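The enrichment test can be sketched with the standard library alone. The following assumes a one-sided (over-representation) hypergeometric test followed by the Benjamini-Hochberg step-up adjustment; the function names and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, panel_size, category_size, genome_size):
    """P(X >= k): probability of drawing at least k category genes when
    panel_size genes are drawn without replacement from genome_size genes."""
    total = comb(genome_size, panel_size)
    upper = min(panel_size, category_size)
    favourable = sum(
        comb(category_size, i) * comb(genome_size - category_size, panel_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: 3-gene panel, all 3 in a 4-gene category, 10-gene genome.
p = hypergeom_pvalue(k=3, panel_size=3, category_size=4, genome_size=10)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(p, adj)
```

One p-value is computed per functional category, and the adjustment is applied across all categories tested for a given panel.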
Quantifying Entropy of Single Timepoint and Temporal Transcriptional Data
(86) Entropy of a single-time point experiment (H.sub.stp) is calculated using the definition of entropy on normal distributions (equation 1)
(87)
Where σ.sup.2 is the variance of the differential expression distribution. Note that this is equivalent to Equation (2) in the main text, with some added constants. A threshold was selected by scanning a range of possible thresholds and selecting one that maximizes accuracy on the training set (same as gene panel training set). Performance is reported on this training set and test set (same as gene panel test set).
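For reference, the differential entropy of a normal distribution with variance σ² is ½·ln(2πe·σ²), which differs from ln(σ²) only by a positive scale factor and an additive constant, so thresholding either quantity separates experiments in the same way. The sketch below uses the simpler ln(σ²) form together with the threshold scan described above. The function names and all numbers are hypothetical, and sample variance is an assumption.

```python
import math
import statistics


def single_timepoint_entropy(log2_fold_changes):
    """ln of the variance of the differential expression distribution."""
    return math.log(statistics.variance(log2_fold_changes))


def predict_fitness(entropy, threshold):
    """Decision rule: entropy above the threshold means low fitness."""
    return "low" if entropy > threshold else "high"


def select_threshold(entropies, labels, candidates):
    """Scan candidate thresholds and keep the one with the highest accuracy."""
    def accuracy(t):
        hits = sum(predict_fitness(h, t) == y for h, y in zip(entropies, labels))
        return hits / len(labels)
    return max(candidates, key=accuracy)


# Toy profiles (made-up values): large scattered changes vs. small changes.
h_disordered = single_timepoint_entropy([3.0, -2.5, 1.5, -3.0, 2.0])
h_quiet = single_timepoint_entropy([0.2, -0.1, 0.1, -0.3, 0.1])

# Toy training set of entropies with known fitness outcomes.
train_entropy = [-2.0, -1.5, 0.5, 1.2]
train_fitness = ["high", "high", "low", "low"]
best_t = select_threshold(train_entropy, train_fitness, [-3.0, -1.0, 1.0])
print(h_disordered, h_quiet, best_t)
```

When several candidates reach the same accuracy, `max` returns the first one encountered, so the order of the candidate list acts as a tie-breaker in this sketch.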
(88) While this metric is informative, it does not take into account the temporal changes that occur in genes. For the temporal entropy models, all experiments have at least 2 timepoints at which RNA-Seq was performed. For each experiment, all available timepoints were used for temporal entropy analysis. The variance of differential expression is quantified using equation 2
(89)
Where σ.sub.i.sup.2 is the variance in differential expression of gene.sub.i over t time points. Thresholding is done similarly to H.sub.stp.
(90) An assumption in the previous model is that each gene's variation over time is independent of that of the other genes. Genes in the same regulon are known to be co-expressed and are examples of highly covarying genes. This means there are potential correlations between pairs of genes. In order to account for this phenomenon, the expression changes in N genes over t timepoints are considered to come from a multivariate normal distribution, with N dimensions. This is in contrast to equation 2, where N independent univariate normal distributions are considered. The entropy of a multivariate normal distribution is defined as (equation 3)
H.sub.temporal2=ln(|Σ|) (3)
(91) Where Σ∈ℝ.sup.N×N is the covariance matrix (Σ.sub.ij is the covariance of gene.sub.i and gene.sub.j), and |Σ| denotes the determinant of Σ. Thresholding is done similarly to H.sub.stp.
(92) It is likely that Σ takes into account indirect links between genes. To correct for this, regularization is applied using glasso (v.1.10), which eliminates spurious links that are potentially an artifact of such indirect covariances. Glasso applies an L1-penalty to estimate a sparse inverse covariance matrix (precision matrix).sup.19. The inverse of this sparse precision matrix is used as the regularized covariance matrix Σ.sup.ρ where ρ denotes the regularization strength. The higher the value of ρ, the sparser the matrix. Multiple values of ρ are scanned between 0 and 5 (ρ=0 being equivalent to H.sub.temporal2 and ρ>5 being equivalent to H.sub.temporal1). For each ρ, the inverse-regularized-inverse covariance entropy is computed as (equation 4)
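A two-gene toy case shows why the determinant of the covariance matrix rewards independent changes and penalizes synchronized ones. This is a minimal sketch with made-up expression changes and hypothetical function names; it omits regularization, which is needed in practice because the empirical covariance matrix is singular when there are more genes than timepoints.

```python
import math


def covariance(x, y):
    """Sample covariance of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def two_gene_entropy(g1, g2):
    """ln|Sigma| for the 2x2 covariance matrix of two genes."""
    det = covariance(g1, g1) * covariance(g2, g2) - covariance(g1, g2) ** 2
    return math.log(det)


# Made-up differential expression of gene 1 over five timepoints.
gene1 = [0.0, 2.0, -2.0, 1.0, -1.0]
# A gene that follows gene 1 closely (synchronized response).
gene2_sync = [0.1, 1.9, -2.1, 1.2, -0.9]
# A gene with changes of similar size but a different pattern.
gene2_indep = [2.0, 0.0, 1.0, -2.0, -1.0]

entropy_sync = two_gene_entropy(gene1, gene2_sync)
entropy_indep = two_gene_entropy(gene1, gene2_indep)
print(entropy_sync, entropy_indep)
```

Both pairs have changes of comparable magnitude, yet the synchronized pair yields a much smaller determinant and therefore lower entropy.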
H.sub.temporal3=ln(|Σ.sup.ρ|) (4)
and the appropriate threshold is selected. Note that this is the main temporal model for entropy presented in this work. The accuracy of each of these models is reported (with varying ρ) on the training set. The final model selected has the ρ and threshold values that maximize accuracy.
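The relation between the temporal variants can be made explicit with a standard identity for positive-definite covariance matrices. The block below is a sketch that assumes H.sub.temporal1 has the form of a sum of per-gene log-variances with additive constants dropped; the original equation is not reproduced in the text, so that form is an assumption. R denotes the gene-gene correlation matrix and D the diagonal matrix of per-gene standard deviations, both symbols introduced here.

```latex
\Sigma = D\,R\,D, \qquad D = \operatorname{diag}(\sigma_1, \dots, \sigma_N)
\;\Longrightarrow\;
\ln\lvert\Sigma\rvert
  = \underbrace{\sum_{i=1}^{N} \ln \sigma_i^{2}}_{\text{independent-gene term}}
  \;+\; \underbrace{\ln\lvert R\rvert}_{\le\, 0},
\qquad
\ln\lvert R\rvert = 0 \iff R = I .
```

The correlation term is never positive, so any covariation between genes lowers the entropy relative to the independent-gene case. When regularization removes all off-diagonal dependencies, the correlation term vanishes and only the sum of log-variances remains, which is consistent with the statement that large ρ is equivalent to H.sub.temporal1.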
(93) TABLE-US-00001 TABLE 1 Project setup
Column groups: Experimental Setup (Stress, Species, Strain, Fitness); Data Collection (RNA-Seq timepoints); Predictions (the four remaining columns).
Stress | Species | Strain | Fitness | RNA-Seq timepoints (min) | Fitness (Gene-panel) | MOA (Gene-panel) | Entropy | Entropy (single timepoint)
Amoxicillin | Streptococcus pneumoniae | T4 | Low | 30, 120 | Test | Test | Test | Test
Cefepime | Streptococcus pneumoniae | T4 | Low | 30, 120 | Test | Test | Test | Test
Cefepime | Streptococcus pneumoniae | 19F | Low, High | 30, 120 | Test | Test | Test | Test
Ceftriaxone | Streptococcus pneumoniae | T4 | Low | 30, 120 | Test | Test | Test | Test
Ciprofloxacin | Streptococcus pneumoniae | T4 | Low | 30, 120 | Test | Test | Test | Test
Ciprofloxacin | Streptococcus pneumoniae | 19F | Low | 30, 120 | Test | Test | Test | Test
Ciprofloxacin | Streptococcus pneumoniae | aT4 | High | 30, 120 | Test |  | Test | Test
Ciprofloxacin | Streptococcus pneumoniae | a19F | High | 30, 120 | Test |  | Test | Test
Ciprofloxacin | Streptococcus pneumoniae | 23F | Low, High | 120 |  |  |  | Validation
Ciprofloxacin | Streptococcus pneumoniae | 1 | Low | 120 |  |  |  | Validation
Ciprofloxacin | Salmonella Typhimurium | AR635 (CDC) | Low | 120 |  |  |  | Validation
Ciprofloxacin | Staphylococcus aureus | MN6 (CDC) | Low, High | 120 |  |  |  | Validation
Ciprofloxacin | Escherichia coli | AR538 (CDC) | Low, High | 120 |  |  |  | Validation
Ciprofloxacin | Klebsiella pneumoniae | AR497 (CDC) | Low, High | 120 |  |  |  | Validation
Ciprofloxacin | Acinetobacter baumannii | LAC-4 | High | 120 |  |  |  | Validation
Ciprofloxacin | Acinetobacter baumannii | 17978 | Low | 120 |  |  |  | Validation
Cotrimoxazole | Streptococcus pneumoniae | T4 | Low | 30, 120 | Test | Test | Test | Test
Daptomycin | Streptococcus pneumoniae | T4 | Low | 30, 120 | Test |  | Test | Test
Daptomycin | Streptococcus pneumoniae | 19F | Low | 30, 120 | Test |  | Test | Test
Imipenem | Streptococcus pneumoniae | T4 | Low | 30, 120 | Test | Test | Test | Test
Kanamycin | Streptococcus pneumoniae | T4 | Low | 10, 20, 30, 45, 60, 90, 120, 150, 180, 210, 240 | Training | Training | Training | Training
Kanamycin | Streptococcus pneumoniae | 19F | Low | 10, 20, 30, 45, 60, 90, 120, 150, 180, 210, 240 | Training | Training | Training | Training
Kanamycin | Streptococcus pneumoniae | aT4 | High | 30, 60, 90, 120, 150, 180, 210, 240 | Training |  | Training | Training
Kanamycin | Streptococcus pneumoniae | a19F | High | 30, 60, 90, 120, 150, 180, 210, 240 | Training |  | Training | Training
Levofloxacin | Streptococcus pneumoniae | T4 | Low | 10, 20, 30, 45, 60, 90, 120, 150, 180 | Training | Training | Training | Training
Levofloxacin | Streptococcus pneumoniae | 19F | Low | 30, 60, 90, 120, 150, 180, 210, 240 | Training | Training | Training | Training
Levofloxacin | Streptococcus pneumoniae | aT4 | High | 30, 60, 90, 120, 150, 180, 210, 240 | Training |  | Training | Training
Levofloxacin | Streptococcus pneumoniae | a19F | High | 30, 60, 90, 120, 150, 180, 210, 240 | Training |  | Training | Training
Linezolid | Streptococcus pneumoniae | T4 | Low | 30, 120 | Test | Test | Test | Test
Moxifloxacin | Streptococcus pneumoniae | T4 | Low | 30, 120 | Test | Test | Test | Test
No Glycine | Streptococcus pneumoniae | T4 | High | 30, 90 | Training |  | Training | Training
No Glycine | Streptococcus pneumoniae | D39 | Low | 30, 90 | Training |  | Training | Training
No Uracil | Streptococcus pneumoniae | T4 | High | 30, 90 | Training |  | Training | Training
No Uracil | Streptococcus pneumoniae | D39 | Low | 30, 60, 90, 120 | Training |  | Training | Training
No Uracil | Streptococcus pneumoniae | aD39 | High | 30, 90 | Training |  | Training | Training
No Valine | Streptococcus pneumoniae | T4 | High | 30, 90 | Training |  | Training | Training
No Valine | Streptococcus pneumoniae | D39 | Low | 30, 90 | Training |  | Training | Training
No Valine | Streptococcus pneumoniae | aD39 | High | 30, 20 | Training |  | Training | Training
Penicillin | Streptococcus pneumoniae | T4 | Low | 10, 20, 30, 45, 60, 90, 120 | Training | Training | Training | Training
Penicillin | Streptococcus pneumoniae | 19F | High | 10, 20, 30, 45, 60, 90, 120 | Training |  | Training | Training
Penicillin | Streptococcus pneumoniae | a19F | High | 10, 20, 30, 45, 60, 90, 120 | Training |  | Training | Training
Rifampicin | Streptococcus pneumoniae | T4 | Low | 10, 20, 30, 45, 60, 90, 120 | Training | Training | Training | Training
Rifampicin | Streptococcus pneumoniae | 19F | Low | 10, 20, 30, 45, 60, 90, 120 | Training | Training | Training | Training
Rifampicin | Streptococcus pneumoniae | aT4 | High | 10, 20, 30, 45, 60, 90, 120 | Training |  | Training | Training
Rifampicin | Streptococcus pneumoniae | a19F | High | 10, 20, 30, 45, 60, 90, 120 | Training |  | Training | Training
Tetracycline | Streptococcus pneumoniae | T4 | Low | 30, 120 | Test | Test | Test | Test
Tetracycline | Streptococcus pneumoniae | 19F | Low, High | 30, 120 | Test | Test | Test | Test
Tobramycin | Streptococcus pneumoniae | T4 | Low | 30, 120 | Test | Test | Test | Test
Vancomycin | Streptococcus pneumoniae | T4 | Low | 10, 20, 30, 45, 60, 90 | Training | Training | Training | Training
Vancomycin | Streptococcus pneumoniae | 19F | Low | 10, 20, 30, 45, 60, 90, 120 | Training | Training | Training | Training
Vancomycin | Streptococcus pneumoniae | aT4 | High | 10, 20, 30, 45, 60, 90, 120 | Training |  | Training | Training
Vancomycin | Streptococcus pneumoniae | a19F | High | 10, 20, 30, 45, 60, 90, 120 | Training |  | Training | Training
(94) TABLE-US-00002 TABLE 2 Antibiotic minimum inhibitory concentrations (MIC) used in this study
Antibiotic | 1x MIC for T4 (μg/mL) | 1x MIC for 19F (μg/mL)
Amoxicillin (AMX) | 0.12 | na
Cefepime (CEF) | 0.8 | 2.2
Ceftriaxone (CFT) | 0.007 | na
Ciprofloxacin (CIP) | 1 | 1
Cotrimoxazole (COT) | 6.5 | na
Daptomycin (DAP) | 35 | 35
Imipenem (IMI) | 0.03 | na
Kanamycin (KAN) | 90 | 90
Levofloxacin (LVX) | 1 | 1.1
Linezolid (LIN) | 0.5 | na
Moxifloxacin (MOX) | 0.45 | na
Penicillin (PEN) | 0.03 | 2.25
Rifampicin (RIF) | 0.035 | 0.035
Tetracycline (TET) | 8 | 22
Tobramycin (TOB) | 175 | na
Vancomycin (VNC) | 0.24 | 0.24
na: not tested in this study.
(95) TABLE-US-00003 TABLE 3 Features and their coefficients in the fitness gene panel. Feature: the TIGR4 locus tag of the selected feature. Coefficient: the coefficient of the feature. Tag: Functional tag, Category: Functional category of the gene.
Feature | Coefficient | Tag | Category | Gene Name | Gene Description
SP_0161 | −0.010279271 | GENETIC INFORMATION PROCESSING | Transcription |  | 
SP_0568 | 0.025001143 | GENETIC INFORMATION PROCESSING | Translation |  | valyl-tRNA synthetase
SP_1300 | −0.034269552 | HYPOTHETICAL/UNKNOWN | NA |  | Hypothetical Protein
SP_1869 | −0.03957627 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport |  | 
SP_2195 | −0.019990526 | GENETIC INFORMATION PROCESSING | Transcription | ctsR | Transcriptional regulator CtsR
SP_0929 | 0.032539524 | GENETIC INFORMATION PROCESSING | Translation | rluD | Ribosomal large subunit pseudouridine synthase D
SP_0141 | −0.00392369 | GENETIC INFORMATION PROCESSING | Transcription |  | transcriptional regulator, MutR family
SP_0680 | −0.019730034 | GENETIC INFORMATION PROCESSING | Translation | rsuA-2 | Ribosomal small subunit pseudouridine synthase A
SP_0197 | −0.00602077 | METABOLISM | Cofactor and vitamin metabolism |  | Dihydrofolate synthase (EC 6.3.2.12)
SP_0336 | 0.022187461 | METABOLISM | Cell wall metabolism |  | Penicillin-binding protein 2x
SP_1438 | 0.009735969 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport |  | ABC transporter, ATPase component
SP_0434 | 0.024817121 | HYPOTHETICAL/UNKNOWN | NA |  | 
SP_0520 | −0.023400459 | HYPOTHETICAL/UNKNOWN | NA |  | 
SP_1041 | 0.019562441 | HYPOTHETICAL/UNKNOWN | NA |  | 
SP_0589 | 0.126448171 | METABOLISM | Amino acid metabolism | cysE | serine acetyltransferase
SP_1686 | −0.017175824 | HYPOTHETICAL/UNKNOWN | NA |  | Hypothetical Protein
SP_1895 | −0.001292326 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | rafG | sugar ABC transporter substrate-binding protein
SP_2167 | −0.003983055 | METABOLISM | Carbohydrate metabolism | fucK | L-fuculose kinase fucK
SP_1798 | −0.017352287 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport |  | 
SP_1800 | 0.093566147 | GENETIC INFORMATION PROCESSING | Transcription |  | 
SP_1856 | −0.08663485 | GENETIC INFORMATION PROCESSING | Transcription |  | 
SP_1857 | −0.008283306 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport |  | Cation efflux system protein
SP_1396 | 0.000580467 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | pstB1 | Phosphate ABC transporter, ATP-binding component 1
SP_1478 | 0.029056712 | METABOLISM | Various metabolism |  | Aldo/keto reductase
SP_1527 | 0.006450843 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | aliB | Oligopeptide ABC transporter, oligopeptide-binding protein
SP_1597 | −0.015615773 | HYPOTHETICAL/UNKNOWN | NA |  | Hypothetical Protein
SP_2201 | −0.040685009 | CELLULAR PROCESSES | Cellular community | cbpD | Late competence protein cbpD-murein hydrolase
SP_1974 | −0.016625727 | METABOLISM | Various metabolism |  | Acylphosphatase
(Intercept) | 0.760173069 | NA | NA | NA | NA
(96) TABLE-US-00004 TABLE 4 Features and their coefficients in the MOA-specific fitness panels.

| Panel | Feature | Coefficient | Tag | Category | Gene Name | Gene Description |
|---|---|---|---|---|---|---|
| CWSI-specific fitness | (Intercept) | 0.802510556 | | | | |
| CWSI-specific fitness | SP_0091 | −0.076552298 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | ugpE | ABC transporter permease |
| CWSI-specific fitness | SP_0410 | −0.014990773 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | NA | exfoliative toxin |
| CWSI-specific fitness | SP_1974 | −0.091374816 | METABOLISM | Various metabolism | NA | Acylphosphatase |
| CWSI-specific fitness | SP_2051 | −0.021830947 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | cglC | Late competence protein ComGC |
| DSI-specific fitness | (Intercept) | 0.852083322 | | | | |
| DSI-specific fitness | SP_0079 | −0.231416616 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | trkA | Trk family potassium uptake protein |
| DSI-specific fitness | SP_0106 | −0.068270673 | METABOLISM | Amino acid metabolism | sdaB | L-serine dehydratase, iron-sulfur-dependent, beta subunit (EC 4.3.1.17) |
| DSI-specific fitness | SP_0894 | 0.020592366 | METABOLISM | Amino acid metabolism | pepX | x-prolyl-dipeptidyl aminopeptidase |
| DSI-specific fitness | SP_1398 | 0.007518056 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | pstA | Phosphate ABC transporter, permease component |
| DSI-specific fitness | SP_1472 | 0.007183581 | METABOLISM | Various metabolism | NA | Oxidoreductase |
| DSI-specific fitness | SP_2103 | −0.00451615 | GENETIC INFORMATION PROCESSING | Translation | rrmA | 23S rRNA (guanine(745)-N(1))-methyltransferase |
| DSI-specific fitness | SP_2201 | −0.005319221 | CELLULAR PROCESSES | Cellular community | cbpD | Late competence protein cbpD - murein hydrolase |
| DSI-specific fitness | SP_2217 | −0.009811251 | CELLULAR PROCESSES | Cell division | mreD | rod shape-determining protein |
| PSI-specific fitness | (Intercept) | 0.760960208 | | | | |
| PSI-specific fitness | SP_0434 | 0.0496811 | HYPOTHETICAL/UNKNOWN | | | |
| PSI-specific fitness | SP_0678 | 0.05860058 | HYPOTHETICAL/UNKNOWN | | | hypothetical protein |
| PSI-specific fitness | SP_0913 | −0.024961142 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | | ABC transporter permease |
| PSI-specific fitness | SP_0953 | −0.053737175 | METABOLISM | Various metabolism | | Acetyltransferase |
| PSI-specific fitness | SP_1118 | 0.019527231 | METABOLISM | Cell wall metabolism | | pullulanase |
| PSI-specific fitness | SP_1857 | 0.094105937 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | | Cation efflux system protein |

Panel: which panel the feature belongs to. Feature: the TIGR4 locus tag of the selected feature. Coefficient: the coefficient of the feature. Tag: functional tag. Category: functional category of the gene.
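As a hedged illustration of how a panel in Table 4 could be applied, the following Python sketch computes a linear predictor (the intercept plus the sum of coefficient times expression value) from the CWSI-specific fitness panel. The coefficients are copied from Table 4. The expression values, the function name `linear_predictor`, and the choice to treat a missing gene as 0 are assumptions made for illustration only; the input scaling and any link function or cutoff that turns this score into a fitness call are not specified in this table.

```python
# Coefficients copied from the CWSI-specific fitness panel in Table 4.
CWSI_PANEL = {
    "(Intercept)": 0.802510556,
    "SP_0091": -0.076552298,
    "SP_0410": -0.014990773,
    "SP_1974": -0.091374816,
    "SP_2051": -0.021830947,
}


def linear_predictor(panel, expression):
    """Return intercept + sum(coefficient * expression) over the panel genes.

    Genes absent from `expression` contribute 0 (an illustrative assumption).
    """
    score = panel["(Intercept)"]
    for locus_tag, coefficient in panel.items():
        if locus_tag == "(Intercept)":
            continue
        score += coefficient * expression.get(locus_tag, 0.0)
    return score


# Hypothetical expression changes for one sample; these are NOT measured data.
example_expression = {
    "SP_0091": 1.0,
    "SP_0410": 0.0,
    "SP_1974": 2.0,
    "SP_2051": -1.0,
}

score = linear_predictor(CWSI_PANEL, example_expression)
print(f"linear predictor: {score:.6f}")
```

With no expression input at all, the function returns the intercept, which makes the role of each coefficient as an adjustment around that baseline easy to see.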
(97) TABLE-US-00005 TABLE 5 Homologs in 6 pathogenic species of genes in 2 gene-panels that predict fitness (corresponding to FIGS. 1F, 1G). Homology data was obtained from the PATRIC database.sup.1. When present, the homolog is represented by its locus tag. An empty cell indicates absence of the homolog in that species. The two fitness panels included are the condition-agnostic gene-panel presented in this work, and the E. coli-specific CIP gene-panel in a previous study.sup.2.

| Gene-panel | PATRIC cross-genus families (PGfams) | Acinetobacter baumannii ATCC 17978 | Escherichia coli str. K-12 substr. MG1655 | Klebsiella pneumoniae HS11286 | Salmonella enterica serovar Typhimurium str. LT2 | Staphylococcus aureus subsp. aureus NCTC 8325 | Streptococcus pneumoniae TIGR4 |
|---|---|---|---|---|---|---|---|
| This work | PGF_00004652 | | b4490 | KPHS_19230 | | SAOUHSC_00327 | SP_1300 |
| This work | PGF_00013324 | | | | | | SP_0520 |
| This work | PGF_00015320 | | | | | | SP_1869 |
| This work | PGF_00018135 | | | | | | SP_1800 |
| This work | PGF_00035026 | | | | | | SP_1115 |
| This work | PGF_00057506 | A1S_1043 | | | STM0354 | SAOUHSC_02461 | SP_1856 |
| This work | PGF_00300574 | | | | | | SP_1041 |
| This work | PGF_00402268 | | | | | SAOUHSC_01907 | SP_1478 |
| This work | PGF_00417739 | | | | | | SP_2201 |
| This work | PGF_01264994 | | b4280 | | STM1133 | | SP_1325 |
| This work | PGF_02147779 | | b1243 | KPHS_30000 | STM1746.S | SAOUHSC_00927 | SP_1527 |
| This work | PGF_02640939 | | | | | | SP_2167 |
| This work | PGF_03065340 | | | | | SAOUHSC_02645 | SP_0161 |
| This work | PGF_03285992 | | | | | | SP_1438 |
| This work | PGF_03515040 | | | | | | SP_1597 |
| This work | PGF_03520500 | | | | | | SP_1798 |
| This work | PGF_03889881 | | | | | SAOUHSC_00502 | SP_2195 |
| This work | PGF_04485073 | A1S_2868 | b2315 | KPHS_37740 | STM2365 | SAOUHSC_01766 | SP_0197 |
| This work | PGF_04695681 | A1S_1407 | b3607 | KPHS_51110 | STM3699 | SAOUHSC_00510 | SP_0589 |
| This work | PGF_05500127 | A1S_2742 | b4258 | KPHS_05160 | STM4475 | SAOUHSC_01767 | SP_0568 |
| This work | PGF_06213055 | A1S_2445 | b3725 | KPHS_52970 | STM3854 | SAOUHSC_01385 | SP_1396 |
| This work | PGF_06874321 | | | | | | SP_1895 |
| This work | PGF_07619772 | A1S_1045 | | | | | SP_1857 |
| This work | PGF_09626318 | A1S_2120 | b2183 | KPHS_36820 | STM2222 | SAOUHSC_01870 | SP_0280 |
| This work | PGF_10302926 | A1S_3204 | b0084 | KPHS_33530 | STM0122 | SAOUHSC_01145 | SP_1673 |
| This work | PGF_10367439 | | b0968 | KPHS_18770 | STM1083 | SAOUHSC_01406 | SP_1974 |
| This work | PGF_10569727 | A1S_0841 | b2594 | KPHS_39870 | STM2662 | SAOUHSC_01163 | SP_0929 |
| This work | PGF_12783997 | | | | | | SP_0434 |
| Barczak et al. | PGF_00026615 | A1S_0888 | b3959 | KPHS_01010 | STM4122 | SAOUHSC_00147 | |
| Barczak et al. | PGF_00047078 | A1S_1962 | b2699 | KPHS_41020 | STM2829 | SAOUHSC_01262 | SP_1940 |
| Barczak et al. | PGF_00403095 | A1S_0765 | b2498 | KPHS_38940 | STM2498 | SAOUHSC_02353 | SP_0745 |
| Barczak et al. | PGF_00690318 | | b3645 | KPHS_52620 | | | |
| Barczak et al. | PGF_04041316 | A1S_3295 | b4058 | KPHS_02820 | STM4254 | SAOUHSC_00780 | SP_0186 |
(98) TABLE-US-00006 TABLE 6 Features and their coefficients in the MOA gene panel.

| Feature | Coefficient CWSI | Coefficient DSI | Coefficient PSI | Coefficient RSI | Tag | Category | Gene Name | Gene Description |
|---|---|---|---|---|---|---|---|---|
| SP_0338 | −0.06118 | −0.08609 | 0.195218 | −0.04794 | GENETIC INFORMATION PROCESSING | Folding, sorting, degradation | | ATP-dependent Clp protease ATP-binding subunit |
| SP_0781 | 0.021672 | −0.02267 | −0.00596 | 0.006963 | HYPOTHETICAL/UNKNOWN | NA | | hypothetical protein |
| SP_0837 | −0.09865 | −0.03419 | 0.022513 | 0.110322 | GENETIC INFORMATION PROCESSING | Transcription | flaR | Putative DNA topology modulation protein FlaR |
| SP_0959 | −0.23 | 0.103554 | 0.240421 | −0.11398 | GENETIC INFORMATION PROCESSING | Translation | infC | translation initiation factor IF-3 |
| SP_1072 | −0.09697 | −0.06232 | −0.04418 | 0.203463 | GENETIC INFORMATION PROCESSING | Replication | dnaG | DNA primase |
| SP_1073 | −0.31249 | −0.31154 | −0.20928 | 0.833316 | GENETIC INFORMATION PROCESSING | Transcription | rpoD | RNA polymerase sigma factor RpoD |
| SP_1926 | −0.00215 | 0.000952 | 0.002799 | −0.0016 | HYPOTHETICAL/UNKNOWN | NA | | |
| SP_2107 | 0.099959 | −0.02395 | −0.06983 | −0.00617 | METABOLISM | Carbohydrate metabolism | malM | 4-alpha-glucanotransferase |
| SP_0977 | −0.02788 | −0.04462 | 0.051361 | 0.021145 | METABOLISM | Various metabolism | tehB | tellurite resistance protein TehB |
| SP_1027 | 0.144475 | −0.08052 | −0.0076 | −0.05635 | HYPOTHETICAL/UNKNOWN | NA | | |
| SP_0151 | −0.00295 | 0.016818 | −0.01176 | −0.00211 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | | ABC transporter ATP-binding protein |
| SP_0330 | 0.144194 | 0.064263 | −0.06385 | −0.1446 | GENETIC INFORMATION PROCESSING | Transcription | regR | Sugar binding transcriptional regulator RegR |
| SP_1438 | 0.286369 | −0.31169 | −0.11143 | 0.136752 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | | ABC transporter, ATPase component |
| SP_0452 | −0.07995 | 0.015807 | −0.11159 | 0.17573 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | | amino acid ABC transporter ATP-binding protein |
| SP_0961 | −0.12918 | 0.06233 | 0.106884 | −0.04003 | GENETIC INFORMATION PROCESSING | Translation | rplT | 50S ribosomal protein L20 |
| SP_2066 | −0.11206 | 0.056273 | −0.02984 | 0.08562 | METABOLISM | Amino acid metabolism | thrC | threonine synthase |
| SP_1190 | −0.02423 | −0.12339 | 0.053684 | 0.093939 | METABOLISM | Carbohydrate metabolism | lacD | |
| SP_1219 | 0.035421 | −0.1574 | 0.31921 | −0.19723 | GENETIC INFORMATION PROCESSING | Replication | gyrA | DNA gyrase, subunit A |
| SP_1227 | 0.002386 | −0.0231 | 0.065068 | −0.04436 | ENVIRONMENTAL INFORMATION PROCESSING | Signal transduction | vicR | DNA binding response regulator |
| SP_1805 | 0.057825 | −0.13463 | 0.064938 | 0.011868 | HYPOTHETICAL/UNKNOWN | NA | | |
| SP_1584 | −0.00664 | −0.0483 | 0.030221 | 0.024721 | GENETIC INFORMATION PROCESSING | Transcription | codY | GTP-sensing transcriptional pleiotropic repressor codY |
| SP_1588 | 0.284896 | −0.00575 | −0.14589 | −0.13325 | METABOLISM | Nucleotide metabolism | | Pyridine nucleotide-disulfide oxidoreductase |
| SP_1271 | 3.07E−05 | −0.05722 | 0.053564 | 0.003624 | METABOLISM | Carbohydrate metabolism | ispD | 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase |
| SP_2229 | −0.08279 | 0.249454 | −0.15575 | −0.01092 | GENETIC INFORMATION PROCESSING | Translation | trpS | tryptophanyl-tRNA synthetase II |
| SP_1630 | −0.01263 | −0.01856 | 0.15798 | −0.12679 | HYPOTHETICAL/UNKNOWN | NA | | Hypothetical Protein |
| SP_2113 | −0.04844 | 0.222414 | −0.1673 | −0.00668 | HYPOTHETICAL/UNKNOWN | NA | | |
| SP_1690 | 0.132823 | −0.08118 | −0.06945 | 0.017806 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | | Putative ABC transporter, substrate-binding component |
| SP_1691 | 0.106428 | −0.04025 | −0.07081 | 0.004635 | HYPOTHETICAL/UNKNOWN | NA | | Hypothetical Protein |
| SP_2186 | 0.061442 | 0.085354 | −0.10325 | −0.04355 | METABOLISM | Lipid metabolism | glpK | glycerol kinase |
| SP_2238 | −0.05756 | 0.172883 | −0.08982 | −0.02551 | GENETIC INFORMATION PROCESSING | Translation | rlmH | 23S rRNA (pseudouridine(1915)-N(3))-methyltransferase |
| SP_2084 | 0.21789 | 0.049282 | −0.3045 | 0.037323 | ENVIRONMENTAL INFORMATION PROCESSING | Membrane transport | pstS | |
| SP_2100 | −0.09722 | 0.247676 | −0.24355 | 0.093085 | GENETIC INFORMATION PROCESSING | Translation | | |
| SP_2141 | 0.022763 | 0.037692 | −0.08685 | 0.026395 | METABOLISM | Cell wall metabolism | | |
| SP_2145 | 0.001032 | 0.014399 | −0.0191 | 0.003666 | METABOLISM | Cell wall metabolism | | Alpha-1,2-mannosidase |
| (Intercept) | −0.98051 | 0.317393 | 1.643493 | −0.98037 | NA | NA | NA | NA |

Feature: the TIGR4 locus tag of the selected feature. Coefficient: the coefficient of the feature. Tag: functional tag. Category: functional category of the gene.
(99) TABLE-US-00007 TABLE 7 Temporal entropy model predictions.

| Survive | Strain | Adapted | AB | Concentration | Group | Entropy (rho = 1.5) | Prediction (rho = 1.5) | Entropy (rho = ∞) | Prediction (rho = ∞) | Entropy (rho = 0) | Prediction (rho = 0) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TRUE | 19F | FALSE | CEF | L | Test | 975.3411 | TRUE | −4.2986 | TRUE | −82233.4 | TRUE |
| FALSE | 19F | FALSE | CEF | H | Test | 1054.675 | TRUE | −3.40738 | TRUE | −80373.9 | TRUE |
| FALSE | 19F | FALSE | CIP | L | Test | 1036.983 | TRUE | −3.49596 | TRUE | −80496.2 | TRUE |
| FALSE | 19F | FALSE | DAP | L | Test | 1082.904 | FALSE | −3.17811 | TRUE | −79772.5 | TRUE |
| FALSE | 19F | FALSE | KAN | L | Train | 1193.194 | FALSE | −1.7714 | FALSE | −75912.6 | FALSE |
| FALSE | 19F | FALSE | LVX | L | Train | 1287.214 | FALSE | −1.47674 | FALSE | −76695.9 | FALSE |
| TRUE | 19F | FALSE | PEN | L | Train | 923.4468 | TRUE | −4.11277 | TRUE | −82510.7 | TRUE |
| FALSE | 19F | FALSE | RIF | L | Train | 1306.995 | FALSE | −1.57052 | FALSE | −76930.1 | FALSE |
| FALSE | 19F | FALSE | TET | H | Test | 1086.258 | FALSE | −2.89806 | TRUE | −79282 | TRUE |
| TRUE | 19F | FALSE | TET | L | Test | 1117.299 | FALSE | −2.92531 | TRUE | −79344.8 | TRUE |
| FALSE | 19F | FALSE | VNC | L | Train | 1256.953 | FALSE | −1.62245 | FALSE | −76788.8 | FALSE |
| TRUE | 19F | TRUE | CIP | L | Test | 903.3138 | TRUE | −5.16339 | TRUE | −83965.8 | TRUE |
| TRUE | T4 | TRUE | CIP | L | Test | 914.6849 | TRUE | −5.97936 | TRUE | −88166.7 | TRUE |
| FALSE | D39 | FALSE | GLY | NA | Train | 1074.217 | FALSE | −2.84361 | TRUE | −76848.4 | FALSE |
| FALSE | D39 | FALSE | URA | NA | Train | 1296.135 | FALSE | −1.514 | FALSE | −73826 | FALSE |
| FALSE | D39 | FALSE | VAL | NA | Train | 1080.02 | FALSE | −2.71829 | FALSE | −76464 | FALSE |
| TRUE | 19F | TRUE | KAN | L | Train | 933.9229 | TRUE | −3.54389 | TRUE | −80165.2 | TRUE |
| TRUE | 19F | TRUE | KAN | H | Train | 1047.375 | TRUE | −2.42022 | FALSE | −78580.6 | TRUE |
| TRUE | T4 | TRUE | KAN | L | Train | 935.3213 | TRUE | −4.06503 | TRUE | −82044.5 | TRUE |
| TRUE | T4 | TRUE | KAN | H | Train | 1003.908 | TRUE | −3.12312 | TRUE | −79658.3 | TRUE |
| TRUE | 19F | TRUE | LVX | L | Train | 924.1036 | TRUE | −3.86135 | TRUE | −80932.6 | TRUE |
| TRUE | 19F | TRUE | LVX | H | Train | 975.5749 | TRUE | −3.15134 | TRUE | −80109.2 | TRUE |
| TRUE | T4 | TRUE | LVX | L | Train | 956.8751 | TRUE | −3.64705 | TRUE | −80767 | TRUE |
| TRUE | T4 | TRUE | LVX | H | Train | 957.6326 | TRUE | −3.60188 | TRUE | −80657.5 | TRUE |
| TRUE | T4 | TRUE | PEN | L | Train | 1029.411 | TRUE | −3.10914 | TRUE | −82619.2 | TRUE |
| TRUE | T4 | TRUE | PEN | H | Train | 1091.334 | FALSE | −2.74313 | TRUE | −81410.6 | TRUE |
| TRUE | 19F | TRUE | RIF | L | Train | 881.3857 | TRUE | −5.62567 | TRUE | −85750 | TRUE |
| TRUE | 19F | TRUE | RIF | H | Train | 888.8274 | TRUE | −4.94695 | TRUE | −84354.6 | TRUE |
| TRUE | T4 | TRUE | RIF | L | Train | 974.8755 | TRUE | −3.50127 | TRUE | −83054.6 | TRUE |
| TRUE | T4 | TRUE | RIF | H | Train | 956.5659 | TRUE | −3.67255 | TRUE | −83306.6 | TRUE |
| FALSE | T4 | FALSE | AMX | NA | Test2 | 1341.037 | FALSE | −1.46209 | FALSE | −77140.8 | FALSE |
| FALSE | T4 | FALSE | CEF | L | Test | 1345.329 | FALSE | −1.70141 | FALSE | −78701.4 | TRUE |
| FALSE | T4 | FALSE | CFT | NA | Test2 | 1166.756 | FALSE | −2.70692 | FALSE | −80896.4 | TRUE |
| FALSE | T4 | FALSE | CIP | L | Test | 1333.311 | FALSE | −2.04466 | FALSE | −79503.9 | TRUE |
| FALSE | T4 | FALSE | COT | NA | Test2 | 1329.673 | FALSE | −1.71242 | FALSE | −78776.7 | TRUE |
| FALSE | T4 | FALSE | DAP | L | Test | 1128.687 | FALSE | −2.45533 | FALSE | −80438.6 | TRUE |
| TRUE | T4 | FALSE | GLY | NA | Train | 1004.896 | TRUE | −4.07824 | TRUE | −84076.7 | TRUE |
| FALSE | T4 | FALSE | IMI | NA | Test2 | 1087.932 | FALSE | −3.26728 | TRUE | −82136.9 | TRUE |
| FALSE | T4 | FALSE | KAN | L | Train | 1119.428 | FALSE | −2.38138 | FALSE | −79686.8 | TRUE |
| FALSE | T4 | FALSE | LIN | NA | Test2 | 1204.522 | FALSE | −2.07443 | FALSE | −79638.5 | TRUE |
| FALSE | T4 | FALSE | LVX | L | Train | 1215.97 | FALSE | −2.09028 | FALSE | −79662.6 | TRUE |
| FALSE | T4 | FALSE | MOX | NA | Test2 | 1183.66 | FALSE | −2.62069 | FALSE | −80808.4 | TRUE |
| FALSE | T4 | FALSE | PEN | L | Train | 1091.652 | FALSE | −2.63357 | FALSE | −81456 | TRUE |
| FALSE | T4 | FALSE | RIF | L | Train | 1191.624 | FALSE | −2.03601 | FALSE | −80172.9 | TRUE |
| FALSE | T4 | FALSE | TET | L | Test | 1221.145 | FALSE | −1.52925 | FALSE | −78424.8 | TRUE |
| FALSE | T4 | FALSE | TOB | NA | Test2 | 1353.414 | FALSE | −1.64539 | FALSE | −77322.3 | FALSE |
| TRUE | T4 | FALSE | URA | NA | Train | 953.8118 | TRUE | −4.95241 | TRUE | −86005.9 | TRUE |
| TRUE | T4 | FALSE | VAL | NA | Train | 1058.581 | TRUE | −3.60328 | TRUE | −83025.3 | TRUE |
| FALSE | T4 | FALSE | VNC | L | Train | 1162.361 | FALSE | −2.24791 | FALSE | −80640.9 | TRUE |
| TRUE | D39 | TRUE | URA | NA | Train | 881.8132 | TRUE | −5.58159 | TRUE | −82611.2 | TRUE |
| TRUE | 19F | TRUE | VNC | L | Train | 888.2264 | TRUE | −5.07248 | TRUE | −84705.3 | TRUE |
| TRUE | 19F | TRUE | VNC | H | Train | 920.2137 | TRUE | −3.84356 | TRUE | −81456 | TRUE |
| TRUE | D39 | TRUE | VAL | NA | Train | 975.0714 | TRUE | −3.89497 | TRUE | −79046.8 | TRUE |
| TRUE | T4 | TRUE | VNC | L | Train | 920.4918 | TRUE | −4.61982 | TRUE | −86106.7 | TRUE |
| TRUE | T4 | TRUE | VNC | H | Train | 1061.581 | TRUE | −2.77797 | TRUE | −81905.2 | TRUE |

Survive: real survival outcome. Strain: strain background used. Adapted: whether the strain has been adapted to the condition. AB: antibiotic (or nutrient) condition. Concentration: concentration of antibiotic used (L = low, H = high). Group: whether the experiment is in the training set or test set. For each of the temporal models (regularization value rho = 1.5, ∞, or 0), the entropy values and predictions are listed.
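The decision rule behind the predictions in Table 7 (low entropy predicts survival, high entropy predicts failure) can be sketched as follows. The five rows are copied from the rho = 1.5 columns of Table 7. The threshold value 1068.0 is an assumption: it is simply a number lying between the largest entropy that Table 7 labels TRUE (1061.581) and the smallest entropy it labels FALSE (1074.217), and the fitted threshold itself is not stated in this section. The function and variable names are illustrative.

```python
# Illustrative threshold (assumption): any value between 1061.581 and 1074.217
# reproduces the rho = 1.5 predictions listed in Table 7.
ASSUMED_THRESHOLD = 1068.0


def predict_survival(entropy, threshold=ASSUMED_THRESHOLD):
    """Low entropy -> high fitness -> predicted to survive (True)."""
    return entropy < threshold


# (condition label, observed survival, entropy at rho = 1.5, prediction in Table 7)
rows = [
    ("19F CEF L", True, 975.3411, True),
    ("19F DAP L", False, 1082.904, False),
    ("19F CEF H", False, 1054.675, True),            # false positive in Table 7
    ("T4 adapted PEN H", True, 1091.334, False),     # false negative in Table 7
    ("T4 TOB", False, 1353.414, False),
]

predictions = [predict_survival(entropy) for _, _, entropy, _ in rows]
table_predictions = [listed for _, _, _, listed in rows]
n_correct = sum(pred == survived for pred, (_, survived, _, _) in zip(predictions, rows))

for (label, survived, entropy, _), pred in zip(rows, predictions):
    print(f"{label}: entropy={entropy}, predicted={pred}, observed={survived}")
print(f"{n_correct} of {len(rows)} example rows predicted correctly")
```

The two deliberately chosen misclassified rows show that the rule is a single cutoff on one number, so conditions whose entropy falls close to the cutoff are the ones most likely to be mispredicted.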
(100) TABLE-US-00008 TABLE 8 Model performance for all finalized fitness prediction models.

| Model | Group | Number | NIR | TP | TN | FP | FN | Cohen's Kappa | AUROC |
|---|---|---|---|---|---|---|---|---|---|
| Gene Panel | Training | 138 | 0.6812 | 94 | 34 | 10 | 0 | 0.8224 | 0.993956 |
| Gene Panel | Test | 19 | 0.7895 | 4 | 11 | 4 | 0 | 0.5366 | 0.75 |
| Temporal entropy (rho = ∞) | Training | 36 | 0.6667 | 23 | 11 | 1 | 1 | 0.875 | 0.982639 |
| Temporal entropy (rho = ∞) | Test | 19 | 0.7895 | 4 | 10 | 5 | 0 | 0.4571 | 0.933333 |
| Temporal entropy (rho = 0) | Training | 36 | 0.6667 | 24 | 7 | 5 | 0 | 0.6512 | 0.913194 |
| Temporal entropy (rho = 0) | Test | 19 | 0.7895 | 4 | 2 | 13 | 0 | 0.0608 | 0.85 |
| Temporal entropy (rho = 1.5) | Training | 36 | 0.6667 | 23 | 12 | 0 | 1 | 0.9388 | 0.993056 |
| Temporal entropy (rho = 1.5) | Test | 19 | 0.75 | 3 | 13 | 2 | 1 | 0.5649 | 0.916667 |
| Entropy (single timepoint) | Training | 231 | 0.6537 | 147 | 41 | 39 | 4 | 0.5417 | 0.790149 |
| Entropy (single timepoint) | Test | 38 | 0.7895 | 8 | 15 | 15 | 0 | 0.2963 | 0.875 |
| Entropy (single timepoint; early) | Training | 93 | 0.6129 | 54 | 16 | 20 | 3 | 0.4291 | 0.705166 |
| Entropy (single timepoint; early) | Test | 19 | 0.7895 | 3 | 9 | 6 | 1 | 0.24 | 0.8 |
| Entropy (single timepoint; late) | Training | 138 | 0.6812 | 90 | 31 | 13 | 4 | 0.7001 | 0.897727 |
| Entropy (single timepoint; late) | Test | 19 | 0.7895 | 4 | 12 | 3 | 0 | 0.6275 | 1 |

| Model | Group | AUPRC | Sensitivity | Specificity | PPV | NPV | Accuracy | Balanced Accuracy | F1 |
|---|---|---|---|---|---|---|---|---|---|
| Gene Panel | Training | 0.997277 | 1 | 0.772727 | 0.903846 | 1 | 0.927536 | 0.886364 | 0.949495 |
| Gene Panel | Test | 0.314235 | 1 | 0.733333 | 0.5 | 1 | 0.789474 | 0.866667 | 0.666667 |
| Temporal entropy (rho = ∞) | Training | 0.967558 | 0.958333 | 0.916667 | 0.958333 | 0.916667 | 0.944444 | 0.9375 | 0.958333 |
| Temporal entropy (rho = ∞) | Test | 0.980821 | 1 | 0.666667 | 0.444444 | 1 | 0.736842 | 0.833333 | 0.615385 |
| Temporal entropy (rho = 0) | Training | 0.862189 | 1 | 0.583333 | 0.827586 | 1 | 0.861111 | 0.791667 | 0.90566 |
| Temporal entropy (rho = 0) | Test | 0.944888 | 1 | 0.133333 | 0.235294 | 1 | 0.315789 | 0.566667 | 0.380952 |
| Temporal entropy (rho = 1.5) | Training | 0.986079 | 0.958333 | 1 | 1 | 0.923077 | 0.972222 | 0.979167 | 0.978723 |
| Temporal entropy (rho = 1.5) | Test | 0.97502 | 0.75 | 0.866667 | 0.6 | 0.928571 | 0.842105 | 0.808333 | 0.666667 |
| Entropy (single timepoint) | Training | 0.771126 | 0.97351 | 0.5125 | 0.790323 | 0.911111 | 0.813853 | 0.743005 | 0.872404 |
| Entropy (single timepoint) | Test | 0.963406 | 1 | 0.5 | 0.347826 | 1 | 0.605263 | 0.75 | 0.516129 |
| Entropy (single timepoint; early) | Training | 0.709462 | 0.947368 | 0.444444 | 0.72973 | 0.842105 | 0.752688 | 0.695906 | 0.824427 |
| Entropy (single timepoint; early) | Test | 0.90758 | 0.75 | 0.6 | 0.333333 | 0.9 | 0.631579 | 0.675 | 0.461538 |
| Entropy (single timepoint; late) | Training | 0.861196 | 0.957447 | 0.704545 | 0.873786 | 0.885714 | 0.876812 | 0.830996 | 0.913706 |
| Entropy (single timepoint; late) | Test | 1 | 1 | 0.8 | 0.571429 | 1 | 0.842105 | 0.9 | 0.727273 |

Model: name of the model. Group: training or test set. Number: number of data points in the group. NIR: no information rate, which is the same as the prevalence of the majority outcome. TP: number of true positives. TN: number of true negatives. FP: number of false positives. FN: number of false negatives. PPV: positive predictive value. NPV: negative predictive value.
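Most of the summary statistics in Table 8 follow directly from the four confusion-matrix counts. The Python sketch below recomputes them for the Gene Panel training row (TP = 94, TN = 34, FP = 10, FN = 0) using the standard textbook definitions; it is offered as a consistency check under that assumption, not as the code used to generate the table. The function name is illustrative, and AUROC and AUPRC are omitted because they require the underlying scores rather than the counts.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion counts.

    Assumes every denominator is non-zero (both classes present and
    at least one positive and one negative prediction).
    """
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, used for Cohen's kappa.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "nir": max(tp + fn, tn + fp) / n,  # prevalence of the majority outcome
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "kappa": (accuracy - expected) / (1 - expected),
    }


# Counts copied from the "Gene Panel / Training" row of Table 8.
gene_panel_training = classification_metrics(tp=94, tn=34, fp=10, fn=0)
for name, value in gene_panel_training.items():
    print(f"{name}: {value:.6f}")
```

Running the same function on other rows is a quick way to spot transcription problems in either half of the table.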
C. References
(101) References are cited throughout this disclosure by reference number(s) enclosed in parentheses, corresponding to one or more of the following numbered references. For example, citation of reference numbers 1 and 2 immediately below would be indicated in the disclosure as (1, 2) or (1-2).
(102) 1. Battesti, A. & Bouveret, E. Acyl carrier protein/SpoT interaction, the switch linking SpoT-dependent stress response to fatty acid metabolism. Molecular Microbiology 62, 1048-1063 (2006).
(103) 2. Lopez, J. M., Dromerick, A. & Freese, E. Response of Guanosine 5′-Triphosphate Concentration to Nutritional Changes and Its Significance for Bacillus subtilis Sporulation. Journal of Bacteriology 146, 605-613 (1981).
(104) 3. Chatterjee, A., Saranath, D., Bhatter, P. & Mistry, N. Global Transcriptional Profiling of Longitudinal Clinical Isolates of Mycobacterium tuberculosis Exhibiting Rapid Accumulation of Drug Resistance. PLOS ONE 8, e54717 (2013).
(105) 4. Erill, I., Campoy, S. & Barbé, J. Aeons of distress: an evolutionary perspective on the bacterial SOS response. FEMS Microbiol Rev 31, 637-656 (2007).
(106) 5. Au, N. et al. Genetic Composition of the Bacillus subtilis SOS System. Journal of Bacteriology 187, 7655-7666 (2005).
(107) 6. Little, J. W. & Mount, D. W. The SOS regulatory system of Escherichia coli. Cell 29, 11-22 (1982).
(108) 7. Yim, G., McClure, J., Surette, M. G. & Davies, J. E. Modulation of Salmonella gene expression by subinhibitory concentrations of quinolones. The Journal of Antibiotics 64, 73-78 (2011).
(109) 8. Barczak, A. K. et al. RNA signatures allow rapid identification of pathogens and antibiotic susceptibilities. Proc Natl Acad Sci U S A 109, 6217-6222 (2012).
(110) 9. Suzuki, S., Horinouchi, T. & Furusawa, C. Prediction of antibiotic resistance by gene expression profiles. Nature Communications 5, 5792 (2014).
(111) 10. Horinouchi, T. et al. Prediction of Cross-resistance and Collateral Sensitivity by Gene Expression profiles and Genomic Mutations. Sci Rep 7, (2017).
(112) 11. Bhattacharyya, R. P. et al. Simultaneous detection of genotype and phenotype enables rapid and accurate antibiotic susceptibility determination. Nature Medicine 1-7 (2019) doi:10.1038/s41591-019-0650-9.
(113) 12. Khazaei, T., Barlow, J. T., Schoepp, N. G. & Ismagilov, R. F. RNA markers enable phenotypic test of antibiotic susceptibility in Neisseria gonorrhoeae after 10 minutes of ciprofloxacin exposure. Sci Rep 8, (2018).
(114) 13. Zeitler, K. & Narayanan, N. The Present and Future State of Antimicrobial Stewardship and Rapid Diagnostic Testing: Can One Ideally Succeed Without the Other? Curr Treat Options Infect Dis 11, 177-187 (2019).
(115) 14. Boutte, C. C. & Crosson, S. Bacterial lifestyle shapes stringent response activation. Trends in Microbiology 21, 174-180 (2013).
(116) 15. Baharoglu, Z. & Mazel, D. SOS, the formidable strategy of bacteria against aggressions. FEMS Microbiol Rev 38, 1126-1145 (2014).
(117) 16. Gottesman, S. Trouble is coming: Signaling pathways that regulate general stress responses in bacteria. J. Biol. Chem. jbc.REV119.005593 (2019) doi:10.1074/jbc.REV119.005593.
(118) 17. Jensen, P. A., Zhu, Z. & van Opijnen, T. Antibiotics Disrupt Coordination between Transcriptional and Phenotypic Stress Responses in Pathogenic Bacteria. Cell Rep 20, 1705-1716 (2017).
(119) 18. van Opijnen, T., Bodi, K. L. & Camilli, A. Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms. Nature Methods 6, 767-772 (2009).
(120) 19. van Opijnen, T. & Camilli, A. Transposon insertion sequencing: a new tool for systems-level analysis of microorganisms. Nature Reviews Microbiology 11, 435-442 (2013).
(121) 20. van Opijnen, T. & Camilli, A. A fine scale phenotype–genotype virulence map of a bacterial pathogen. Genome Res. 22, 2541-2551 (2012).
(122) 21. van Opijnen, T. & Camilli, A. Genome-Wide Fitness and Genetic Interactions Determined by Tn-seq, a High-Throughput Massively Parallel Sequencing Method for Microorganisms. Current Protocols in Microbiology 19, 1E.3.1-1E.3.16 (2010).
(123) 22. van Opijnen, T., Dedrick, S. & Bento, J. Strain Dependent Genetic Networks for Antibiotic-Sensitivity in a Bacterial Pathogen with a Large Pan-Genome. PLOS Pathogens 12, e1005869 (2016).
(124) 23. Wadsworth, C. B., Sater, M. R. A., Bhattacharyya, R. P. & Grad, Y. H. Impact of Species Diversity on the Design of RNA-Based Diagnostics for Antibiotic Resistance in Neisseria gonorrhoeae. Antimicrobial Agents and Chemotherapy 63, (2019).
(125) 24. Surujon, D. & van Opijnen, T. ShinyOmics: collaborative exploration of omics-data. BMC Bioinformatics 21, 22 (2020).
(126) 25. Friedman, J., Hastie, T. & Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw 33, 1-22 (2010).
(127) 26. Krstajic, D., Buturovic, L. J., Leahy, D. E. & Thomas, S. Cross-validation pitfalls when selecting and assessing regression and classification models. Journal of Cheminformatics 6, 10 (2014).
(128) 27. Hutter, B. et al. Prediction of Mechanisms of Action of Antibacterial Compounds by Gene Expression Profiling. Antimicrobial Agents and Chemotherapy 48, 2838-2844 (2004).
(129) 28. Boshoff, H. I. M. et al. The Transcriptional Responses of Mycobacterium tuberculosis to Inhibitors of Metabolism: Novel Insights into Drug Mechanisms of Action. J. Biol. Chem. 279, 40174-40184 (2004).
(130) 29. Kaneko, K., Furusawa, C. & Yomo, T. Universal Relationship in Gene-Expression Changes for Cells in Steady-Growth State. Phys. Rev. X 5, 011014 (2015).
(131) 30. Ahmed, N. A. & Gokhale, D. V. Entropy expressions and their estimators for multivariate distributions. IEEE Transactions on Information Theory 35, 688-692 (1989).
(132) 31. Misra, N., Singh, H. & Demchuk, E. Estimation of the entropy of a multivariate normal distribution. Journal of Multivariate Analysis 92, 324-342 (2005).
(133) 32. Cai, T. T., Liang, T. & Zhou, H. H. Law of log determinant of sample covariance matrix and optimal estimation of differential entropy for high-dimensional Gaussian distributions. Journal of Multivariate Analysis 137, 161-172 (2015).
(134) 33. Srivastava, S. & Gupta, M. R. Bayesian estimation of the entropy of the multivariate Gaussian. in 2008 IEEE International Symposium on Information Theory 1103-1107 (2008). doi:10.1109/ISIT.2008.4595158.
(135) 34. Friedman, J., Hastie, T. & Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432-441 (2008).
(136) 35. Thieffry, D., Huerta, A. M., Pérez-Rueda, E. & Collado-Vides, J. From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli. BioEssays 20, 433-440 (1998).
(137) 36. di Bernardo, D. et al. Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks. Nature Biotechnology 23, 377-383 (2005).
(138) 37. Galagan, J. E. et al. The Mycobacterium tuberculosis regulatory network and hypoxia. Nature 499, 178-183 (2013).
(139) 38. Lazo, A. V. & Rathie, P. On the entropy of continuous probability distributions (Corresp.). IEEE Transactions on Information Theory 24, 120-122 (1978).
(140) 39. Rustad, T. R. et al. Mapping and manipulating the Mycobacterium tuberculosis transcriptome using a transcription factor overexpression-derived regulatory network. Genome Biology 15, 502 (2014).
(141) 40. Ma, S. et al. Network stress test reveals novel drug potentiators in Mycobacterium tuberculosis. bioRxiv 429373 (2018) doi:10.1101/429373.
(142) 41. Zhang, B. & Horvath, S. A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology 4, (2005).
(143) 42. Jenkins, S. G. & Schuetz, A. N. Current Concepts in Laboratory Testing to Guide Antimicrobial Therapy. Mayo Clinic Proceedings 87, 290-308 (2012).
(144) 43. Westermann, A. J., Gorski, S. A. & Vogel, J. Dual RNA-seq of pathogen and host. Nature Reviews Microbiology 10, 618-630 (2012).
(145) 44. Aprianto, R., Slager, J., Holsappel, S. & Veening, J.-W. Time-resolved dual RNA-seq reveals extensive rewiring of lung epithelial and pneumococcal transcriptomes during early infection. Genome Biology 17, 198 (2016).
(146) 45. Geiss, G. K. et al. Direct multiplexed measurement of gene expression with color-coded probe pairs. Nature Biotechnology 26, 317-325 (2008).
46. Dunbar, S. A. Applications of Luminex® xMAP™ technology for rapid, high-throughput multiplexed nucleic acid detection. Clinica Chimica Acta 363, 71-82 (2006).
(147) 46. Wattam, A. R. et al. PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res 42, D581-D591 (2014).
(148) 47. Barczak, A. K. et al. RNA signatures allow rapid identification of pathogens and antibiotic susceptibilities. Proc Natl Acad Sci U S A 109, 6217-6222 (2012).
(149) 48. Aaberge, I. S., Eng, J., Lermark, G. & Lovik, M. Virulence of Streptococcus pneumoniae in mice: a standardized method for preparation and frozen storage of the experimental bacterial inoculum. Microbial Pathogenesis 18, 141-152 (1995).
(150) 49. Tettelin, H. et al. Complete Genome Sequence of a Virulent Isolate of Streptococcus pneumoniae. Science 293, 498-506 (2001).
(151) 50. Shi, Z.-Y., Enright, M. C., Wilkinson, P., Griffiths, D. & Spratt, B. G. Identification of Three Major Clones of Multiply Antibiotic-Resistant Streptococcus pneumoniae in Taiwanese Hospitals by Multilocus Sequence Typing. J Clin Microbiol 36, 3514-3519 (1998).
(152) 51. McGee, L. et al. Nomenclature of Major Antimicrobial-Resistant Clones of Streptococcus pneumoniae Defined by the Pneumococcal Molecular Epidemiology Network. Journal of Clinical Microbiology 39, 2565-2571 (2001).
(153) 52. Lanie, J. A. et al. Genome Sequence of Avery's Virulent Serotype 2 Strain D39 of Streptococcus pneumoniae and Comparison with That of Unencapsulated Laboratory Strain R6. Journal of Bacteriology 189, 38-51 (2007).
(154) 53. Cremers, A. J. H. et al. The post-vaccine microevolution of invasive Streptococcus pneumoniae. Scientific Reports 5, 14952 (2015).
(155) 54. Jensen, P. A., Zhu, Z. & van Opijnen, T. Antibiotics Disrupt Coordination between Transcriptional and Phenotypic Stress Responses in Pathogenic Bacteria. Cell Rep 20, 1705-1716 (2017).
(156) 55. van Opijnen, T. & Camilli, A. A fine scale phenotype–genotype virulence map of a bacterial pathogen. Genome Res. 22, 2541-2551 (2012).
(157) 56. van Opijnen, T. & Camilli, A. A fine scale phenotype–genotype virulence map of a bacterial pathogen. Genome Res. 22, 2541-2551 (2012).
(158) 57. Shishkin, A. A. et al. Simultaneous generation of many RNA-seq libraries in a single reaction. Nature Methods 12, 323-325 (2015).
(159) 58. Haas, B. J., Chin, M., Nusbaum, C., Birren, B. W. & Livny, J. How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes? BMC Genomics 13, 734 (2012).
(160) 59. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357-359 (2012).
(161) 60. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923-930 (2014).
(162) 61. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15, 550 (2014).
(163) 62. Friedman, J., Hastie, T. & Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw 33, 1-22 (2010).
(164) 63. Krstajic, D., Buturovic, L. J., Leahy, D. E. & Thomas, S. Cross-validation pitfalls when selecting and assessing regression and classification models. Journal of Cheminformatics 6, 10 (2014).
(165) 64. Bhattacharyya, R. P. et al. Simultaneous detection of genotype and phenotype enables rapid and accurate antibiotic susceptibility determination. Nature Medicine 1-7 (2019) doi:10.1038/s41591-019-0650-9.
(166) 65. Friedman, J., Hastie, T. & Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432-441 (2008).
(167) It will be apparent to those skilled in the art that various modifications and variations can be made in the present disclosure without departing from the scope or spirit of the disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.