Methods and devices for nucleic acid-based real-time determination of disease states

Abstract

The present invention is directed to methods and devices for real-time diagnosis of disease states in subjects, for example, infections caused by one or more microorganisms or cancer.

Claims

1. A method of treating a disease or infection caused by a microorganism in a subject comprising administering to the subject a compound that inhibits the growth of a microorganism whose significance score meets or exceeds a threshold value, wherein the significance score is calculated by: comparing sequence reads with one or more databases comprising the genetic information from a control subject of the same species and the genetic information from a plurality of microorganisms to determine whether or not a compared sequence read maps to a species comprised within the one or more databases, wherein the sequence reads are obtained by sequencing nucleic acids present in a biological sample obtained from the subject; determining over time the number of compared sequence reads mapping to a particular microorganism and the number of compared sequence reads mapping to a species, wherein the sequence reads mapping to a species comprise sequence reads mapping to the species of the subject; and computing a significance score for the probability of finding in the subject a compared sequence read mapping to the particular microorganism based on the number of compared sequence reads mapping to the particular microorganism and the number of compared sequence reads mapping to a species.

2. The method according to claim 1, wherein when the significance score for the particular microorganism meets or exceeds a threshold value, the particular microorganism is determined to be present in the subject.

3. The method according to claim 1, wherein when the significance score for the particular microorganism meets or exceeds a threshold value, the particular microorganism is determined to be relevant for causing a disease in the subject.

4. The method according to claim 3, wherein when the significance score for the particular microorganism exceeds a threshold value with few sequence reads, the disease due to the presence of the microorganism is considered to be severe.

5. A method of treating a disease or infection caused by a microorganism in a subject comprising administering to the subject a compound that inhibits the growth of a microorganism whose significance score meets or exceeds a threshold value, wherein the significance score is calculated by: computing over time a significance score for the probability of finding in the subject a sequence read mapping to a particular microorganism based on the number of sequence reads mapping to the particular microorganism and the number of sequence reads mapping to a species, wherein the sequence reads mapping to a species comprise sequence reads mapping to the species of the subject; wherein the sequence reads mapping to the particular microorganism and the sequence reads mapping to a species are obtained by comparing sequence reads with one or more databases comprising the genetic information from a control subject of the same species and the genetic information from a plurality of microorganisms to determine whether or not a compared sequence read maps to a species comprised within the one or more databases, and wherein the sequence reads are generated by sequencing nucleic acids present in a biological sample obtained from the subject.

6. The method according to claim 5, wherein when the significance score for the particular microorganism meets or exceeds a threshold value, the particular microorganism is determined to be present in the subject.

7. The method according to claim 5, wherein when the significance score for the particular microorganism meets or exceeds a threshold value, the particular microorganism is determined to be relevant for causing a disease in the subject.

8. The method according to claim 7, wherein when the significance score for the particular microorganism exceeds a threshold value with few sequence reads, the disease due to the presence of the microorganism is considered to be severe.

9. A method of treating a disease or infection caused by a microorganism in a subject comprising administering to the subject a compound that inhibits the growth of a microorganism whose significance score meets or exceeds a threshold value, wherein the significance score is calculated by: determining over time the number of sequence reads mapping to a particular microorganism and the number of sequence reads mapping to a species, wherein the sequence reads mapping to a species comprise sequence reads mapping to the species of the subject; wherein the sequence reads are obtained by comparing sequence reads with one or more databases comprising the genetic information from a control subject of the same species and the genetic information from a plurality of microorganisms to determine whether or not a sequence read maps to a species comprised within the one or more databases, and wherein the sequence reads are generated by sequencing nucleic acids present in a biological sample obtained from the subject; and computing a significance score for the probability of finding in the subject a sequence read mapping to the particular microorganism based on the number of sequence reads mapping to the particular microorganism and the number of sequence reads mapping to a species.

10. The method according to claim 9, wherein when the significance score for the particular microorganism meets or exceeds a threshold value, the particular microorganism is determined to be present in the subject.

11. The method according to claim 9, wherein when the significance score for the particular microorganism meets or exceeds a threshold value, the particular microorganism is determined to be relevant for causing a disease in the subject.

12. The method according to claim 11, wherein when the significance score for the particular microorganism exceeds a threshold value with few sequence reads, the disease due to the presence of the microorganism is considered to be severe.

13. The method according to claim 9, wherein the sequence reads mapping to a species further comprise sequence reads mapping to any other microorganism present in the sample.

14. The method according to claim 1, wherein the sequence reads mapping to a species further comprise sequence reads mapping to any other microorganism present in the sample.

15. The method according to claim 5, wherein the sequence reads mapping to a species further comprise sequence reads mapping to any other microorganism present in the sample.

Description

FIGURES

(1) FIG. 1 shows the complete test run for patient S9 (the test was not stopped when a microorganism was labelled as significant) for seven different microorganisms. A horizontal dashed line indicates the statistical relevance threshold.

(2) FIG. 2 shows the complete test run for patient S11 (the test was not stopped when a microorganism was labelled as significant) for four different microorganisms. A horizontal dashed line indicates the statistical relevance threshold.

(3) FIG. 3 shows the complete test run for patient S60 (the test was not stopped when a microorganism was labelled as significant) for five different microorganisms. A horizontal dashed line indicates the statistical relevance threshold.

EXAMPLES

(4) The techniques and methods used herein are described herein or carried out in a manner known per se and as described, for example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. All methods including the use of kits and reagents are carried out according to the manufacturers' information unless specifically indicated.

Example 1

(5) Biological samples, i.e., blood plasma, were obtained from human subjects suspected of suffering from an infectious disease. The nucleic acids in the samples were sequenced using a next-generation sequencing method, generating a plurality of sequence reads. These data were stored and subsequently analyzed as follows.

(6) The individual sequence reads were compared to one or more databases comprising the genetic information of both humans and a plurality of microorganisms, such that each read, if possible, was mapped either to a particular microorganism or to the human genome. The mapping provided, in real time, the total number of reads mapping to a particular microorganism and the total number of reads that could be mapped to a species, i.e., the particular microorganism, the human genome, as well as any other microorganisms. Thus, the number of reads attributed to a particular microorganism or to the human subject was known at every time point during the diagnostic procedure.
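The running tally described above can be sketched as follows. This is only an illustrative sketch: the function name `running_counts`, the species labels, and the convention that an unmappable read is represented by `None` are assumptions made for the example and do not come from the description.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple


def running_counts(
    mapped_species: Iterable[Optional[str]],
) -> Iterator[Tuple[int, Counter]]:
    """Yield (n, counts) after every read that maps to some species.

    Each item of `mapped_species` is the label of the species a read mapped
    to (host or microorganism), or None if the read could not be mapped.
    """
    counts: Counter = Counter()  # starts empty and grows as new species appear
    n = 0  # total number of mapped reads so far (host plus microbial)
    for species in mapped_species:
        if species is None:
            continue  # unmapped reads contribute to neither count
        counts[species] += 1
        n += 1
        yield n, counts.copy()  # snapshot, so earlier states stay unchanged


# Hypothetical stream of mapping results, in the order the reads arrive.
reads = ["Homo sapiens", "Homo sapiens", None, "E. coli", "Homo sapiens", "E. coli"]
snapshots = list(running_counts(reads))
final_n, final_counts = snapshots[-1]
print(final_n, dict(final_counts))
```

Because a snapshot is produced after every mapped read, the per-species counts and the total are available at each point of the run rather than only at its end.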

(7) This information allowed for the generation of a count vector $C = (c_1, \ldots, c_l)$, $m = 1, \ldots, l$, which holds the number of reads $c_m$ for every species m in a sample/patient j at an arbitrary but fixed point in time during diagnosis. $c_m$ changes over time during the diagnosis of a patient j as new reads are mapped to a species. In addition, C can grow as new microorganism species are identified. At first, an empty vector is initialized, and a dynamic one is generated during the runtime of the method. C describes the microbial burden of the patient currently diagnosed. To identify those microbes whose burden is of abnormal abundance, the complement of the cumulative distribution function (cdf), i.e., one minus the cdf, of this specific microbial burden in patient j at a given time was calculated as follows:

(8) $p(M_{j}) = 1 - \mathrm{cdf}(c_{m}; n, p_{m}) = 1 - \sum_{i=0}^{c_{m}} \binom{n}{i} p_{m}^{i} (1 - p_{m})^{n-i} \qquad [1]$
where $c_m$ is the number of reads measured for species m in patient j at the current time and n is the total number of reads that could be mapped (microbial and host). $p_m$ is the discovery probability, which is calculated in real time and represents the probability of detecting a read for species m.
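A minimal sketch of how Formula 1 might be evaluated numerically is given below. The function name `binomial_tail` is hypothetical, and the discovery probability `p_m` is simply passed in, since the description does not specify how it is estimated. The sketch follows Formula 1 literally, so it returns P(X > c_m) rather than P(X >= c_m), and it sums the upper tail in log space so that large n does not overflow.

```python
import math


def binomial_tail(c_m: int, n: int, p_m: float) -> float:
    """Return 1 - cdf(c_m; n, p_m), i.e. P(X > c_m) for X ~ Binomial(n, p_m)."""
    if not 0.0 < p_m < 1.0:
        raise ValueError("p_m must lie strictly between 0 and 1")
    if not 0 <= c_m <= n:
        raise ValueError("c_m must satisfy 0 <= c_m <= n")
    log_p, log_q = math.log(p_m), math.log1p(-p_m)
    total = 0.0
    # Sum the upper tail directly; this avoids the cancellation that
    # 1 - (lower tail) suffers when the p-value is very small.
    for i in range(c_m + 1, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        term = math.exp(log_term)
        total += term
        # Past the mode the terms only shrink, so stop once they are negligible.
        if i > n * p_m and term <= total * 1e-17:
            break
    return min(total, 1.0)


# Hypothetical numbers: 12 reads for species m among 100,000 mapped reads,
# with an assumed discovery probability of 1e-5 (about one read expected).
print(binomial_tail(12, 100_000, 1e-5))
```

With an expected count of about one read, observing twelve yields a very small value, which is the situation the description treats as an abnormal microbial burden.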

(9) In contrast to conventional testing, this is not an endpoint test but runs within the framework of sequential testing. Thus, through the sequential testing approach, all necessary and important information is available while the test runs and not only after the test finishes. This provides a new way of infection diagnosis and a new way of testing procedures in the area of next-generation sequencing. The information provided is a p-value that describes whether or not the current amount of nucleic acid mapped to a certain species is considered unusual, in which case the p-value is very low, given the discovery probability for this species and the current running test setup.
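The sequential character of the test can be illustrated with the sketch below, which re-evaluates the Formula 1 tail probability while reads arrive and reports how many mapped reads were needed before a species became relevant. Several points are assumptions made for the example: the names `reads_until_relevant`, `p_target` and `alpha`, the rule that a species is relevant once its p-value is at or below `alpha`, and the choice to re-test only when a read of the target species arrives (for a fixed count, the tail probability can only grow as n grows, so it cannot newly fall below the threshold in between).

```python
import math
from typing import Iterable, Optional


def binomial_tail(c: int, n: int, p: float) -> float:
    """P(X > c) for X ~ Binomial(n, p), i.e. Formula 1, summed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(c + 1, n + 1):
        total += math.exp(
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
    return min(total, 1.0)


def reads_until_relevant(
    mapped_species: Iterable[str], target: str, p_target: float, alpha: float
) -> Optional[int]:
    """Return the number of mapped reads at the first read of `target` whose
    p-value is at or below `alpha`, or None if that never happens."""
    n = 0  # mapped reads seen so far (host plus microbial)
    c = 0  # reads seen so far for the target species
    for species in mapped_species:
        n += 1
        if species != target:
            continue
        c += 1
        if binomial_tail(c, n, p_target) <= alpha:
            return n  # the run could be stopped here for this species
    return None


# Hypothetical stream: 99 host reads, one microbial read, 50 host reads,
# and a second microbial read.
stream = ["Homo sapiens"] * 99 + ["E. cloacae"] + ["Homo sapiens"] * 50 + ["E. cloacae"]
print(reads_until_relevant(stream, "E. cloacae", p_target=1e-5, alpha=1e-6))
print(reads_until_relevant(stream, "E. cloacae", p_target=1e-5, alpha=1e-9))
```

The returned read count is one possible realization of a characteristic variable such as "reads until relevant": a stricter threshold requires more evidence and therefore more reads before the species is reported.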

(10) This method allows for the definition of new characteristic variables such as “microbial signals per event”. These variables are directly dependent on the times at which a microorganism becomes statistically relevant, and therefore the new variables are of particular importance. Possible characteristic variables are “microbial reads per second” or “microbial reads per human reads”. For every subject and every microorganism, such variables can be calculated and will therefore provide a deeper insight into the degree of severity of infection for each sample analyzed. In addition, those characteristic variables will enable the comparison of samples sequenced with different technologies because of the technological independence of such variables.
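As a simple illustration, two of the characteristic variables named above could be computed from running counts as sketched here. The function names and the example numbers are hypothetical, and the description does not prescribe these exact definitions.

```python
def microbial_reads_per_human_read(microbial_reads: int, human_reads: int) -> float:
    """Ratio of reads mapped to one microorganism to reads mapped to the host."""
    if human_reads <= 0:
        raise ValueError("human_reads must be positive")
    return microbial_reads / human_reads


def microbial_reads_per_second(microbial_reads: int, elapsed_seconds: float) -> float:
    """Rate at which reads for one microorganism accumulated during the run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return microbial_reads / elapsed_seconds


# Hypothetical values for one microorganism in one sample.
ratio = microbial_reads_per_human_read(50, 100_000)
rate = microbial_reads_per_second(50, 200.0)
print(ratio, rate)
```

The ratio to host reads does not depend on how fast a given instrument produces data, which is one way to read the statement that such variables are independent of the sequencing technology.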

Example 2

(11) Nucleic acids from a biological sample of blood plasma obtained from subject S9 were sequenced such that the probability of finding in the subject a compared sequence read mapping to the particular microorganism based on the number of compared sequence reads mapping to the particular microorganism and the number of compared sequence reads mapping to a species was calculated according to the invention. The results are presented in FIG. 1.

(12) FIG. 1 shows the complete test sequence (the test was not interrupted or aborted when certain microorganisms reached significance) for 7 different microorganisms at the same time. A horizontal red dashed line is also shown, which represents a statistical threshold that must be exceeded before a microorganism is considered “relevant” for causing the infection. It is also clear that the blue line, representing the microorganism Enterobacter cloacae, exceeded the statistical threshold after only a few moments of generating data, such that the test could have been terminated after only a few moments for this microorganism. The purple line, belonging to the bacterium E. coli, shows a slow increase in value but does not cross the significance level for relevance until after 500k reads, indicating that it and the other microorganisms are either contaminants or commensal microorganisms.

Example 3

(13) Nucleic acids from a biological sample of blood plasma obtained from subject S11 were sequenced such that the probability of finding in the subject a compared sequence read mapping to the particular microorganism based on the number of compared sequence reads mapping to the particular microorganism and the number of compared sequence reads mapping to a species was calculated according to the present invention. The results are presented in FIG. 2.

(14) Similarly to FIG. 1, FIG. 2 shows a fast rise in the probability that a single bacterium, here K. pneumoniae (in green), is relevant to the disease state, i.e., is the causative agent of the infection. It is noted that Cutibacterium acnes, which is a bacterium living on human skin, is detected, but the relevance/probability for this bacterium to be the causative agent of the infection is zero. This indicates that the method, as intended, filters out commensal species. In contrast, the relevance of E. coli increases up to the significance threshold over a time frame of 350k reads. Although it is not shown as relevant, this might indicate that the patient is in danger of developing a secondary infection caused by E. coli.

(15) This indicates that the method generates information that a current “endpoint based” test may not be able to provide. Thus, this method provides data that can prompt the clinician to act against an infection before it is actually clinically relevant. Another advantage of the herein described method is the ability to detect infections caused by multiple microorganisms, and to further break down which microorganisms are the main causative agent(s).

Example 4

(16) Nucleic acids from a biological sample of blood plasma obtained from subject S60 were sequenced such that the probability of finding in the subject a compared sequence read mapping to the particular microorganism based on the number of compared sequence reads mapping to the particular microorganism and the number of compared sequence reads mapping to a species was calculated according to the present invention. The results are presented in FIG. 3.

(17) As is clearly depicted in FIG. 3, the main infectious agent is B. fragilis, since the green line crosses the relevance threshold right at the start of the method. However, after some events (reads analyzed), two other bacteria, E. coli and S. aureus (indicated by the orange and purple lines, respectively), make a significant leap over the relevance threshold, showing that those two are also contributing to the sepsis of the subject.

(18) In a traditional routine test, the results for all three bacteria would look the same: each microorganism would have been assigned more or less the same relevance. However, using the method described herein, and through the objective use of characteristic variables such as “events per time”, the main causative agent as well as the other microorganisms contributing to the infection were clearly identified.

(19) The axes in the above figures are always the logarithm of the p-value calculated with Formula 1 and the number of reads analyzed. Of course, it is possible to alter the units depicted on the read axis. It is solely necessary that the new units allow a unique ordering of reads. This might be, for example, the order in which reads were generated or the time at which they were compared to the database. Using the above-described method, the above-mentioned characteristic variables, for example “reads per event until relevant” for a specific microorganism and patient, can be calculated. These variables can be used to compare different patients infected with the same microorganism. Further, the main causative agent can be identified by comparing the variables of different microbes in the very same patient.

(20) It is assumed that real infections fall within a certain interval, say [x-y], measured, for example, by “reads per event”. Contaminants and commensals would then show up outside of the boundaries of this “infection interval”. Therefore, statistical analysis using those infection intervals suffices to identify infections and assess the relevance of the identified microorganisms. In addition, the severity of the infection is assessed by those intervals. This is accomplished using the statistical framework of waiting time analysis. Most of the time, waiting time analyses are carried out using an exponential distribution. Therefore, assuming that the “characteristic infection variable” is an exponentially distributed random variable:
$X \sim \mathrm{Exp}(\lambda)$ [2]
and assuming that the wait time for a certain microorganism is between 500 and 1000 reads, we have $\lambda = 1/500$ and $\lambda = 1/1000$. Since we are interested in the probability $P(500 < X < 1000)$, we calculate $P(X < 1000) - P(X \le 500)$. This describes the probability of NOT suffering from an infection. Since we want a faster interval than this, we calculate $P(X \le 500)$. Now, if the 500th read is again a microorganism read, we calculate, taking $\lambda = 1/500$, $P(X > 500) = e^{-500\lambda} \approx 0.37$. So, it is not unusual to see a microorganism read after 500 reads of the host, given the interval of 500-1000 for this particular species. However, if we see a second microorganism read after just 10 reads, we calculate $P(X > 10) = e^{-10\lambda} \approx 0.98$. Since we have seen a microorganism read after 10 signals (compared reads), we are interested in $P(X \le 10)$ and therefore $1 - P(X > 10) \approx 0.0198$. Thus, it is highly unlikely to detect a microorganism within 10 signals, so that if a microorganism is detected after 10 signals, there is a need to report it to the clinician.
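The waiting time arithmetic above can be checked with a few lines of Python. This is a sketch under the assumption that the rate λ = 1/500 is used for all three probabilities; the helper names `exp_survival` and `exp_cdf` are introduced only for this example.

```python
import math


def exp_survival(x: float, lam: float) -> float:
    """P(X > x) for X ~ Exp(lam)."""
    return math.exp(-lam * x)


def exp_cdf(x: float, lam: float) -> float:
    """P(X <= x) for X ~ Exp(lam); expm1 keeps precision for small lam * x."""
    return -math.expm1(-lam * x)


lam = 1 / 500  # assumed rate: on average one microorganism read per 500 reads

p_wait_over_500 = exp_survival(500, lam)             # e^-1
p_within_10 = exp_cdf(10, lam)                       # 1 - e^-0.02
p_between = exp_cdf(1000, lam) - exp_cdf(500, lam)   # P(500 < X < 1000)

print(round(p_wait_over_500, 3))  # 0.368
print(round(p_within_10, 4))      # 0.0198
print(round(p_between, 3))        # 0.233
```

A waiting time of more than 500 reads is therefore unremarkable under this rate, whereas a waiting time of 10 reads or fewer has a probability of about 2% and would be the event worth reporting.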

(21) Both approaches, the coupling of probabilities with a fixed but arbitrary number of events given a set of events and the resulting waiting time analysis, are not described in infectious disease diagnostics or in diagnostics in general. In general, if data generation can be separated into different channels or blocks, the testing can again be parallelized across the individual channels (i.e., testing every channel individually and treating each channel as a separate experiment), thereby minimizing the time to result. This, as well, is not possible using endpoint testing, meaning that the method described herein is scalable towards higher throughput, in contrast to endpoint-based testing.

(22) The invention provides, in particular, the following:

(23) 1. A method for determining the presence of microorganisms in a subject comprising: (a) sequencing nucleic acids present in a biological sample obtained from the subject to obtain a plurality of nucleic acid sequence reads; (b) comparing sequence reads obtained in step (a) with one or more databases comprising the genetic information from a control subject of the same species and the genetic information from a plurality of microorganisms to determine whether or not a compared sequence read maps to a species comprised within the one or more databases; and (c) determining over time the number of compared sequence reads mapping to a particular microorganism and the number of compared sequence reads mapping to a species.

(24) 2. A method for determining the presence of microorganisms in a subject comprising: (a) comparing sequence reads with one or more databases comprising the genetic information from a control subject of the same species and the genetic information from a plurality of microorganisms to determine whether or not a compared sequence read maps to a species comprised within the one or more databases, wherein the sequence reads are obtained by sequencing nucleic acids present in a biological sample obtained from the subject; and (b) determining over time the number of compared sequence reads mapping to a particular microorganism and the number of compared sequence reads mapping to a species.

(25) 3. The method according to item 1 or 2, wherein the method further comprises computing a significance score for the probability of finding in the subject a compared sequence read mapping to the particular microorganism based on the number of compared sequence reads mapping to the particular microorganism and the number of compared sequence reads mapping to a species.

(26) 4. The method according to item 3, wherein when the score for the particular microorganism meets or exceeds a threshold value, the particular microorganism is determined to be present in the subject.

(27) 5. The method according to item 3, wherein when the score for the particular microorganism meets or exceeds a threshold value, the particular microorganism is determined to be relevant for causing a disease in the subject.

(28) 6. The method according to item 5, wherein when the score for the particular microorganism exceeds a threshold value with few sequence reads, the disease due to the presence of the microorganism is considered to be severe.

(29) 7. A method for determining the presence of a disease state in a subject comprising: (a) sequencing nucleic acids present in a biological sample obtained from the subject to obtain a plurality of nucleic acid sequence reads; (b) comparing sequence reads obtained in step (a) with one or more databases comprising the genetic information from a control subject of the same species to determine whether or not a compared sequence read maps to the control subject; and (c) determining over time the number of compared sequence reads mapping and not mapping to the control subject.

(30) 8. The method according to item 7, wherein the method further comprises computing a significance score for the probability of finding in the subject a compared sequence read not mapping to the control subject based on the number of compared sequence reads not mapping to the control subject and the number of compared sequence reads mapping to the control subject.

(31) 9. The method according to item 8, wherein when the score meets or exceeds a threshold value, the disease state is determined to be present in the subject.

(32) 10. The method according to any one of items 7 to 9, wherein the disease state is cancer.

(33) 11. The method according to item 10, wherein the cancer is caused by a genetic abnormality.

(34) 12. The method according to any one of items 7 to 9, wherein the disease state is an infection caused by a microorganism.

(35) 13. The method according to item 12, wherein the microorganism is a virus, a bacterium, a fungus or a parasite.

(36) 14. The method according to any one of the preceding items, wherein the biological sample is selected from the group consisting of whole blood, serum, blood plasma, amniotic fluid, synovial fluid, liquor, tissue or cell smear, tissue or cell swab, urine, tissue, sputum, stool, gastrointestinal secretions, lymph fluid, and lavage.

(37) 15. The method according to any one of the preceding items, wherein the subject is a vertebrate, preferably a mammal, for example, human, dog, cat, pig, horse, cattle, sheep, goat, mouse, or rat.

(38) 16. The method according to item 15, wherein the subject is human.

(39) 17. The method according to any one of the preceding items, wherein the sequencing is performed by molecular high-throughput sequence analysis.

(40) 18. The method according to any one of the preceding items, wherein when the particular microorganism or the disease state is determined to be present in the subject, the method further comprises administering to the subject a pharmaceutically-active compound known to treat a disease caused by the particular microorganism or the disease state.

(41) 19. A method for diagnosing an infectious disease caused by microorganisms in a subject comprising: (a) sequencing nucleic acids present in a biological sample obtained from the subject to obtain a plurality of nucleic acid sequence reads; (b) comparing sequence reads obtained in step (a) with one or more databases comprising the genetic information from a control subject of the same species and the genetic information from a plurality of microorganisms to determine whether or not a compared sequence read maps to a species comprised within the one or more databases; (c) determining over time the number of compared sequence reads mapping to a particular microorganism and the number of compared sequence reads mapping to a species; and (d) computing a significance score for the probability of finding in the subject a compared sequence read mapping to the particular microorganism based on the number of compared sequence reads mapping to the particular microorganism and the number of compared sequence reads mapping to a species, wherein when the score for the particular microorganism meets or exceeds a threshold value, the particular microorganism is determined to be causing the infectious disease.

(42) 20. A computer-readable storage medium storing program code comprising instructions which when executed by a processor carry out the method according to any one of items 1 to 19.

(43) 21. A computer system comprising a processor configured to carry out the method according to any one of items 1 to 19.

CPC classification

C12Q2535/122 (CHEMISTRY; METALLURGY)
Y02A90/10 (GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS)
G16B35/00 (PHYSICS)
C12Q1/689 (CHEMISTRY; METALLURGY)
G16B30/00 (PHYSICS)

International classification

A61B5/00 (HUMAN NECESSITIES)
A61B10/00 (HUMAN NECESSITIES)
C12Q1/04 (CHEMISTRY; METALLURGY)
C12Q1/06 (CHEMISTRY; METALLURGY)
C12Q1/68 (CHEMISTRY; METALLURGY)
C12Q1/689 (CHEMISTRY; METALLURGY)
G16B30/00 (PHYSICS)
