Method for detecting abnormal values of a biomarker

Abstract

The invention proposes a new method for monitoring a health event in a mammal, comprising detecting at least one abnormal value within a series of values related to at least one biomarker, based on appropriate Z-scores.

Claims

1. A method for monitoring a health event in a mammal, comprising detecting at least one abnormal value within a series of values related to at least one biomarker, said biomarker corresponding to a physiological parameter or to the level of any biological or chemical entity measured from a biological sample of said mammal, the series of values ($x_1, x_2, \ldots, x_n$) related to said at least one biomarker being obtained from normally distributed independent variables ($X_1, X_2, \ldots, X_n$); said method comprising the following steps:
E1: determining, from each biological sample collected from said mammal at different periods of time, a value related to said at least one biomarker, thereby acquiring a series of values ($x_1, x_2, \ldots, x_n$) related to said biomarker;
E2: storing the series of values ($x_1, x_2, \ldots, x_n$) in a database (DB) stored in a memory (12);
said method further comprising the following steps, run with a processor (14) that can retrieve data from the database (DB) in the memory:
E3: calculating, for the whole series of values of step E2, a single value $t_n$ of an indicator ($T_n$), said indicator ($T_n$) being based on a studentized form ($R_{n,i}$) of the variables ($X_1, X_2, \ldots, X_n$), said calculation consisting of extracting the maximum observed value ($t_n$) of the studentized form ($R_{n,i}$) calculated for each value of the series of values ($x_1, x_2, \ldots, x_n$);
E4: comparing the observed value ($t_n$) of the indicator ($T_n$) to the quantile ($c_{\alpha,n}$) of the distribution of ($T_n$), said quantile being stored in the memory;
E5: if the observed value ($t_n$) of the indicator ($T_n$) is above the quantile, reporting, on displaying means (20), the presence of an abnormal value in the series, thereby indicating the occurrence of a health event for said mammal;
wherein, when searching for one series of samples which comprises an abnormal value, a single series $x_1, x_2, \ldots, x_n$ of n values is stored in the database (DB), n is the number of variables, $X_1, X_2, \ldots, X_n$ represent the n variables, the studentized form is a studentized residual expressed as:
$R_{n,i} = \frac{X_i - \overline{X}_{n,-i}}{\hat{\sigma}_{n,-i} \sqrt{1 + \frac{1}{n-1}}}$
the indicator $T_n$ is expressed as:
$T_n = \max_{i \in \{1, \ldots, n\}} \left| R_{n,i} \right|$
where
$\overline{X}_{n,-i} = \frac{1}{n-1} \sum_{k=1, k \neq i}^{n} X_k$
$\hat{\sigma}_{n,-i}^2 = \frac{1}{n-2} \sum_{k=1, k \neq i}^{n} \left(X_k - \overline{X}_{n,-i}\right)^2$
and the relevant table of quantiles of the distribution of $T_n$ is Table 1;
or, in the multivariate case, a single series $x_1, x_2, \ldots, x_n$ of n values is stored in the database (DB), n is the number of variables of dimension d, $X_1, X_2, \ldots, X_n$ are n independent $\mathbb{R}^d$-valued random vectors, where $X_i \sim N(\mu_i, C)$ and the covariance matrix C is assumed to be invertible, the studentized form is expressed as a normalized length of the residual vector:
$R_{n,i} = \frac{n-1}{nd} \left(X_i - \overline{X}_{n,-i}\right)' C_{n,-i}^{-1} \left(X_i - \overline{X}_{n,-i}\right)$
where the notation $z'$ means the transpose of the vector z, the indicator $T_n$ is expressed as:
$T_n = \max_{i \in \{1, \ldots, n\}} R_{n,i}$
where
$\overline{X}_{n,-i} = \frac{1}{n-1} \sum_{k=1, k \neq i}^{n} X_k$
$C_{n,-i} = \frac{1}{n-1-d} \sum_{k=1, k \neq i}^{n} \left(X_k - \overline{X}_{n,-i}\right) \left(X_k - \overline{X}_{n,-i}\right)'$
and the relevant table of quantiles of the distribution of $T_n$ is, for instance, Table 1 (for d = 2 and d = 3);
or wherein, when searching for a series of consecutive observations that are abnormal compared to the rest of the series, a single series $x_1, x_2, \ldots, x_n$ of n values is stored in the database (DB), n is the number of variables, $X_1, X_2, \ldots, X_n$ represent the variables, $\varphi$ is the collection of all possible intervals I of consecutive integers included in $\{1, \ldots, n\}$ with length $1 \leq |I| < n$, $\{1, \ldots, n\} = I \cup \overline{I}$ and $I \cap \overline{I} = \emptyset$, the studentized form is expressed as:
$R_{n,I} = \frac{\overline{X}_I - \overline{X}_{\overline{I}}}{\hat{\sigma}_{n,I} \sqrt{1 + \frac{1}{n-1}}}$
the indicator $T_n$ is expressed as:
$T_n = \max_{I \in \varphi} \left| R_{n,I} \right|$
where
$\overline{X}_I = \frac{1}{|I|} \sum_{k \in I} X_k$
$\overline{X}_{\overline{I}} = \frac{1}{n - |I|} \sum_{k \in \overline{I}} X_k$
$\hat{\sigma}_{n,I}^2 = \frac{1}{n-2} \left( \sum_{k \in I} \left(X_k - \overline{X}_I\right)^2 + \sum_{k \in \overline{I}} \left(X_k - \overline{X}_{\overline{I}}\right)^2 \right)$
and the relevant table of quantiles of the distribution of $T_n$ is, for instance, Table 2;
or wherein, when searching for an abnormal series when a sample is split into two subsamples of observations, two series of $n_1$ and $n_2$ values, $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$, are stored in the database (DB), the studentized forms are studentized residuals expressed as:
$R_{n_1,i} = \frac{X_i - \overline{X}_{n_1,-i}}{\hat{\sigma}_{X,-i,Y} \sqrt{1 + \frac{1}{n_1-1}}}$ and $R_{n_2,j} = \frac{Y_j - \overline{Y}_{n_2,-j}}{\hat{\sigma}_{Y,-j,X} \sqrt{1 + \frac{1}{n_2-1}}}$
the indicator $T_n$ is expressed as:
$T_n = \max \left\{ \max_{i \in \{1, \ldots, n_1\}} \left| R_{n_1,i} \right|, \max_{j \in \{1, \ldots, n_2\}} \left| R_{n_2,j} \right| \right\}$
where
$\overline{X}_{n_1,-i} = \frac{1}{n_1-1} \sum_{k=1, k \neq i}^{n_1} X_k$
$\overline{Y}_{n_2,-j} = \frac{1}{n_2-1} \sum_{k=1, k \neq j}^{n_2} Y_k$
$\hat{\sigma}_{X,-i,Y}^2 = \frac{1}{n-3} \left( \sum_{k=1, k \neq i}^{n_1} \left(X_k - \overline{X}_{n_1,-i}\right)^2 + \sum_{k=1}^{n_2} \left(Y_k - \overline{Y}_{n_2}\right)^2 \right)$
$\hat{\sigma}_{Y,-j,X}^2 = \frac{1}{n-3} \left( \sum_{k=1}^{n_1} \left(X_k - \overline{X}_{n_1}\right)^2 + \sum_{k=1, k \neq j}^{n_2} \left(Y_k - \overline{Y}_{n_2,-j}\right)^2 \right)$
and the relevant table of quantiles of the distribution of $T_n$ is, for instance, Table 3;
or, in a multivariate case, two series of $n_1$ and $n_2$ values, $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$, are stored in the database (DB), $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ represent the variables as two independent series of independent Gaussian $\mathbb{R}^d$-valued random vectors, where $X_i \sim N(\mu_{X_i}, C)$, $Y_j \sim N(\mu_{Y_j}, C)$ and the covariance matrix C is assumed to be invertible, n is the number of variables of dimension d, the studentized form is expressed as a normalized length of the residual vector:
$R_{n_1,i} = \frac{n_1-1}{n_1 d} \left(X_i - \overline{X}_{n_1,-i}\right)' C_{X,-i,Y}^{-1} \left(X_i - \overline{X}_{n_1,-i}\right)$
$R_{n_2,j} = \frac{n_2-1}{n_2 d} \left(Y_j - \overline{Y}_{n_2,-j}\right)' C_{Y,-j,X}^{-1} \left(Y_j - \overline{Y}_{n_2,-j}\right)$
and the indicator $T_n$ is expressed as:
$T_n = \max \left\{ \max_{i \in \{1, \ldots, n_1\}} R_{n_1,i}, \max_{j \in \{1, \ldots, n_2\}} R_{n_2,j} \right\}$
where
$\overline{X}_{n_1,-i} = \frac{1}{n_1-1} \sum_{k=1, k \neq i}^{n_1} X_k$
$\overline{Y}_{n_2,-j} = \frac{1}{n_2-1} \sum_{k=1, k \neq j}^{n_2} Y_k$
$C_{X,-i,Y} = \frac{1}{n-2-d} \left( \sum_{k=1, k \neq i}^{n_1} \left(X_k - \overline{X}_{n_1,-i}\right) \left(X_k - \overline{X}_{n_1,-i}\right)' + \sum_{k=1}^{n_2} \left(Y_k - \overline{Y}_{n_2}\right) \left(Y_k - \overline{Y}_{n_2}\right)' \right)$
$C_{Y,-j,X} = \frac{1}{n-2-d} \left( \sum_{k=1}^{n_1} \left(X_k - \overline{X}_{n_1}\right) \left(X_k - \overline{X}_{n_1}\right)' + \sum_{k=1, k \neq j}^{n_2} \left(Y_k - \overline{Y}_{n_2,-j}\right) \left(Y_k - \overline{Y}_{n_2,-j}\right)' \right).$

2. The method according to claim 1, comprising the further steps of: E6: identifying, from the series of values stored in step E2, at least one value considered as abnormal, and E7: reporting said at least one abnormal value on the displaying means (20).

3. The method according to claim 1, further comprising a step ES of running a Shapiro-Wilk test on the stored values ($x_1, x_2, \ldots, x_{n_1}$) to check whether the variables are normally distributed.

4. The method according to claim 1, further comprising a step ET of applying a function, such as a log transformation, to the values to turn them into normally distributed variables.

5. The method according to claim 1, wherein the at least one biomarker represented by the series of values is chosen from the following list: ferritin, serum iron, hemoglobin, erythrocyte count, hematocrit levels, complete blood count, platelets, reticulocytes, soluble transferrin receptor, vitamin B9 in red blood cells, blood sugar, cholesterol, triglycerides, serum glutamic oxaloacetic transaminase (SGOT), serum glutamate pyruvate transaminase (SGPT), gamma-glutamyltransferase (γ-GT), lactate dehydrogenase (LDH), bilirubin, electrolytes (e.g. Na⁺, Cl⁻, K⁺, HCO₃⁻, Ca²⁺, Mg²⁺), alkaline phosphatases, magnesium in red blood cells, creatinine, androstenedione, urea, uric acid, haptoglobin, C-reactive protein (CRP), transthyretin, orosomucoid, creatine phosphokinase (CPK), inorganic phosphate (PO₄), thyroid-stimulating hormone (TSH), testosterone, cortisol, erythropoietin (EPO), luteinizing hormone (LH), insulin-like growth factor 1 (IGF-1), osteocalcin, calcifediol (25(OH)D3).

6. A computer program comprising code lines loaded onto a non-transitory computer-readable medium and to be executed by a processor (14), said code lines being configured to carry out steps E2 to E4 of the method according to claim 1.

7. A system for monitoring a health event in a mammal, comprising detecting at least one abnormal value within a series of values related to at least one biomarker, said biomarker corresponding to a physiological parameter or to the level of any biological or chemical entity measured from a biological sample of said mammal, the system comprising:
a database (DB) stored in a memory (12), the database comprising at least a series of values ($x_1, x_2, \ldots, x_n$) related to said at least one biomarker, the values representing the evolution of the biomarker at different periods of time and being obtained from independent variables ($X_1, X_2, \ldots, X_n$) normally distributed according to $N(\mu_X, \sigma^2)$;
a processor (14) comprising:
calculating means to calculate, for the whole series of values ($x_1, x_2, \ldots, x_n$) stored in the database (DB), a single value ($t_n$) of an indicator ($T_n$), said indicator ($T_n$) being based on a studentized form ($R_{n,i}$) of the variables ($X_1, X_2, \ldots, X_n$), said calculation consisting of extracting the maximum value ($t_n$) of the studentized form ($R_{n,i}$) calculated for each value of the series ($x_1, x_2, \ldots, x_n$);
comparing means to compare the observed value ($t_n$) of the indicator ($T_n$) to the quantile ($c_{\alpha,n}$) of the distribution of ($T_n$);
instruction means to instruct displaying means to report, or reporting means to report on displaying means, the presence of an abnormal value in the series if the observed value ($t_n$) of the indicator ($T_n$) is above the quantile;
wherein, when searching for one series of samples which comprises an abnormal value, a single series $x_1, x_2, \ldots, x_n$ of n values is stored in the database (DB), n is the number of variables, $X_1, X_2, \ldots, X_n$ represent the n variables, the studentized form is a studentized residual expressed as:
$R_{n,i} = \frac{X_i - \overline{X}_{n,-i}}{\hat{\sigma}_{n,-i} \sqrt{1 + \frac{1}{n-1}}}$
the indicator $T_n$ is expressed as:
$T_n = \max_{i \in \{1, \ldots, n\}} \left| R_{n,i} \right|$
where
$\overline{X}_{n,-i} = \frac{1}{n-1} \sum_{k=1, k \neq i}^{n} X_k$
$\hat{\sigma}_{n,-i}^2 = \frac{1}{n-2} \sum_{k=1, k \neq i}^{n} \left(X_k - \overline{X}_{n,-i}\right)^2$
and the relevant table of quantiles of the distribution of $T_n$ is Table 1;
or, in the multivariate case, a single series $x_1, x_2, \ldots, x_n$ of n values is stored in the database (DB), n is the number of variables of dimension d, $X_1, X_2, \ldots, X_n$ are n independent $\mathbb{R}^d$-valued random vectors, where $X_i \sim N(\mu_i, C)$ and the covariance matrix C is assumed to be invertible, the studentized form is expressed as a normalized length of the residual vector:
$R_{n,i} = \frac{n-1}{nd} \left(X_i - \overline{X}_{n,-i}\right)' C_{n,-i}^{-1} \left(X_i - \overline{X}_{n,-i}\right)$
where the notation $z'$ means the transpose of the vector z, the indicator $T_n$ is expressed as:
$T_n = \max_{i \in \{1, \ldots, n\}} R_{n,i}$
where
$\overline{X}_{n,-i} = \frac{1}{n-1} \sum_{k=1, k \neq i}^{n} X_k$
$C_{n,-i} = \frac{1}{n-1-d} \sum_{k=1, k \neq i}^{n} \left(X_k - \overline{X}_{n,-i}\right) \left(X_k - \overline{X}_{n,-i}\right)'$
and the relevant table of quantiles of the distribution of $T_n$ is, for instance, Table 1 (for d = 2 and d = 3);
or wherein, when searching for a series of consecutive observations that are abnormal compared to the rest of the series, a single series $x_1, x_2, \ldots, x_n$ of n values is stored in the database (DB), n is the number of variables, $X_1, X_2, \ldots, X_n$ represent the variables, $\varphi$ is the collection of all possible intervals I of consecutive integers included in $\{1, \ldots, n\}$ with length $1 \leq |I| < n$, $\{1, \ldots, n\} = I \cup \overline{I}$ and $I \cap \overline{I} = \emptyset$, the studentized form is expressed as:
$R_{n,I} = \frac{\overline{X}_I - \overline{X}_{\overline{I}}}{\hat{\sigma}_{n,I} \sqrt{1 + \frac{1}{n-1}}}$
the indicator $T_n$ is expressed as:
$T_n = \max_{I \in \varphi} \left| R_{n,I} \right|$
where
$\overline{X}_I = \frac{1}{|I|} \sum_{k \in I} X_k$
$\overline{X}_{\overline{I}} = \frac{1}{n - |I|} \sum_{k \in \overline{I}} X_k$
$\hat{\sigma}_{n,I}^2 = \frac{1}{n-2} \left( \sum_{k \in I} \left(X_k - \overline{X}_I\right)^2 + \sum_{k \in \overline{I}} \left(X_k - \overline{X}_{\overline{I}}\right)^2 \right)$
and the relevant table of quantiles of the distribution of $T_n$ is, for instance, Table 2;
or wherein, when searching for an abnormal series when a sample is split into two subsamples of observations, two series of $n_1$ and $n_2$ values, $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$, are stored in the database (DB), the studentized forms are studentized residuals expressed as:
$R_{n_1,i} = \frac{X_i - \overline{X}_{n_1,-i}}{\hat{\sigma}_{X,-i,Y} \sqrt{1 + \frac{1}{n_1-1}}}$ and $R_{n_2,j} = \frac{Y_j - \overline{Y}_{n_2,-j}}{\hat{\sigma}_{Y,-j,X} \sqrt{1 + \frac{1}{n_2-1}}}$
the indicator $T_n$ is expressed as:
$T_n = \max \left\{ \max_{i \in \{1, \ldots, n_1\}} \left| R_{n_1,i} \right|, \max_{j \in \{1, \ldots, n_2\}} \left| R_{n_2,j} \right| \right\}$
where
$\overline{X}_{n_1,-i} = \frac{1}{n_1-1} \sum_{k=1, k \neq i}^{n_1} X_k$
$\overline{Y}_{n_2,-j} = \frac{1}{n_2-1} \sum_{k=1, k \neq j}^{n_2} Y_k$
$\hat{\sigma}_{X,-i,Y}^2 = \frac{1}{n-3} \left( \sum_{k=1, k \neq i}^{n_1} \left(X_k - \overline{X}_{n_1,-i}\right)^2 + \sum_{k=1}^{n_2} \left(Y_k - \overline{Y}_{n_2}\right)^2 \right)$
$\hat{\sigma}_{Y,-j,X}^2 = \frac{1}{n-3} \left( \sum_{k=1}^{n_1} \left(X_k - \overline{X}_{n_1}\right)^2 + \sum_{k=1, k \neq j}^{n_2} \left(Y_k - \overline{Y}_{n_2,-j}\right)^2 \right)$
and the relevant table of quantiles of the distribution of $T_n$ is, for instance, Table 3;
or, in a multivariate case, two series of $n_1$ and $n_2$ values, $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$, are stored in the database (DB), $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ represent the variables as two independent series of independent Gaussian $\mathbb{R}^d$-valued random vectors, where $X_i \sim N(\mu_{X_i}, C)$, $Y_j \sim N(\mu_{Y_j}, C)$ and the covariance matrix C is assumed to be invertible, n is the number of variables of dimension d, the studentized form is expressed as a normalized length of the residual vector:
$R_{n_1,i} = \frac{n_1-1}{n_1 d} \left(X_i - \overline{X}_{n_1,-i}\right)' C_{X,-i,Y}^{-1} \left(X_i - \overline{X}_{n_1,-i}\right)$
$R_{n_2,j} = \frac{n_2-1}{n_2 d} \left(Y_j - \overline{Y}_{n_2,-j}\right)' C_{Y,-j,X}^{-1} \left(Y_j - \overline{Y}_{n_2,-j}\right)$
and the indicator $T_n$ is expressed as:
$T_n = \max \left\{ \max_{i \in \{1, \ldots, n_1\}} R_{n_1,i}, \max_{j \in \{1, \ldots, n_2\}} R_{n_2,j} \right\}$
where
$\overline{X}_{n_1,-i} = \frac{1}{n_1-1} \sum_{k=1, k \neq i}^{n_1} X_k$
$\overline{Y}_{n_2,-j} = \frac{1}{n_2-1} \sum_{k=1, k \neq j}^{n_2} Y_k$
$C_{X,-i,Y} = \frac{1}{n-2-d} \left( \sum_{k=1, k \neq i}^{n_1} \left(X_k - \overline{X}_{n_1,-i}\right) \left(X_k - \overline{X}_{n_1,-i}\right)' + \sum_{k=1}^{n_2} \left(Y_k - \overline{Y}_{n_2}\right) \left(Y_k - \overline{Y}_{n_2}\right)' \right)$
$C_{Y,-j,X} = \frac{1}{n-2-d} \left( \sum_{k=1}^{n_1} \left(X_k - \overline{X}_{n_1}\right) \left(X_k - \overline{X}_{n_1}\right)' + \sum_{k=1, k \neq j}^{n_2} \left(Y_k - \overline{Y}_{n_2,-j}\right) \left(Y_k - \overline{Y}_{n_2,-j}\right)' \right).$

8. Use of a method according to claim 1 for detecting a potential doping issue.

9. Use of a method according to claim 1 wherein monitoring a health event comprises detecting any potential health issue of the individual, in order to launch further investigations.

10. Use of a program according to claim 6 for detecting a potential doping issue.

11. Use of a system according to claim 7 for detecting a potential doping issue.

12. Use of a program according to claim 6 wherein monitoring a health event comprises detecting any potential health issue of the individual, in order to launch further investigations.

13. Use of a system according to claim 7 wherein monitoring a health event comprises detecting any potential health issue of the individual, in order to launch further investigations.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The following figures are provided, in a non-limiting way, to help understand the invention:

(2) FIG. 1 illustrates a set-up for implementing the invention,

(3) FIG. 2 illustrates the main steps of an embodiment of the invention,

(4) FIG. 3 illustrates the frequency of abnormal values detected by different methods according to embodiments of the invention,

(5) FIG. 4 illustrates the distribution of the estimated $T_n$ in an embodiment of the invention versus the normal distribution, for erythrocyte count and hematocrit,

(6) FIG. 5 illustrates the frequency of abnormal series for each biomarker and each method according to different embodiments of the invention (α = 5%),

(7) FIG. 6a and FIG. 6b illustrate examples of results for the different embodiments of the invention, for several biomarkers,

(8) FIG. 7 illustrates a flow chart representing some steps of an embodiment according to the invention,

(9) FIG. 8 to FIG. 15 show Tables 1 to 3, which give the values of the quantiles for the different disclosed embodiments. Table 1 comprises three tables, 1a, 1b and 1c.

(10) FIG. 16 illustrates the frequency of abnormal series for each biomarker and each method according to different embodiments of the invention for α = 1.0, 2.5, 5.0 or 10.0% (number of abnormal series/total number of series (percentage of abnormal series)),

(11) FIG. 17 illustrates an example of an abnormal series identified with method 2 for ferritin (increase of ferritin levels over time),

(12) FIG. 18 illustrates an example of an abnormal series identified with method 2 for ferritin (decrease of ferritin levels over time).

DETAILED DESCRIPTION

(13) In the biological field, the assumption that successive values of a biomarker can be represented as independent variables is justified. A study by Sottas et al. (Pierre-Edouard Sottas, Norbert Baume, Christophe Saudan, Carine Schweizer, Matthias Kamber, Martial Saugy; Bayesian detection of abnormal values in longitudinal biomarkers with an application to T/E ratio, Biostatistics, Volume 8, Issue 2, 1 Apr. 2007, Pages 285-296) showed that no correlation can be observed between two values of a biomarker measured more than three days apart.

(14) In the case of blood samples, for instance, the average interval between two samples is of the order of several months.

(15) For the purposes of the present invention, “biomarker” refers to the level of any biological or chemical entity measured from a biological sample of a mammal. Complementarily, biomarkers can also represent physiological parameters of the mammal, which are first collected and gathered and then subsequently used in the method according to the invention, thereby not requiring any interaction with the body of the subject.

(16) The biomarker(s) may for instance be one or more biomarkers chosen from the following list: the concentrations of ferritin, serum iron, hemoglobin, erythrocyte count, hematocrit levels, complete blood count, platelets, reticulocytes, soluble transferrin receptor, vitamin B9 in red blood cells, blood sugar, cholesterol, triglycerides, serum glutamic oxaloacetic transaminase (SGOT), serum glutamate pyruvate transaminase (SGPT), gamma-glutamyltransferase (γ-GT), lactate dehydrogenase (LDH), bilirubin, electrolytes (e.g. Na⁺, Cl⁻, K⁺, HCO₃⁻, Ca²⁺, Mg²⁺), alkaline phosphatases, magnesium in red blood cells, creatinine, androstenedione, urea, uric acid, haptoglobin, C-reactive protein (CRP), transthyretin, orosomucoid, creatine phosphokinase (CPK), inorganic phosphate (PO₄), thyroid-stimulating hormone (TSH), testosterone, cortisol, erythropoietin (EPO), luteinizing hormone (LH), insulin-like growth factor 1 (IGF-1), osteocalcin, calcifediol (25(OH)D3).

(17) In a particular aspect, the biomarker(s) is (are) one or more biomarkers chosen from the following list: the concentrations of ferritin, serum iron, hemoglobin, erythrocyte count, hematocrit levels.

(18) Biomarkers corresponding to the level of any biological or chemical entity are measured from a sample of a subject, for example a fluid sample such as blood, plasma, serum or urine.

(19) Examples of basic physiological parameters known in the art are the ECG, heart rate, respiratory rate, respiratory volume, body temperature, blood pressure and electromyogram, measured by any technical means of the art.

(20) The term “health event” refers to any situation related to the state of health of a subject that is suspected following the detection, by the method of the invention, of an abnormal value in a series of values of one or more biomarkers.

(21) In the present description, reference will be made to a human subject but the disclosed method applies to any mammal.

(22) The rationale of the invention is that, in a Gaussian setting, the joint distribution of suitably studentized residuals does not depend on the unknown parameters of the distribution.

(23) In a first step E1, several biological samples of the mammal, collected at different periods of time, are analyzed. For each sample, a value related to at least one of the above-mentioned biomarkers is acquired. Therefore, for each mammal, a series of values $x_1, x_2, \ldots, x_n$ is determined in step E1.

(24) These values represent the evolution of the biomarker over time. Typically, index 1 refers to the oldest measurement and index n to the latest one.

(25) Step E1 can encompass the in vitro analyses run in laboratories on the blood sample. In a particular aspect, step E1 also encompasses the tests run on the individual to obtain a value of a biomarker, more particularly a biomarker related to a physiological parameter. In another particular aspect, step E1 does not encompass the tests run on the individual but only the gathering and formatting of the data related to said biomarkers, so that they can be processed in the further steps of the method. In a more particular aspect, step E1, when related to a physiological parameter, does not encompass the test run on the individual but only the gathering and formatting of the data related to said physiological parameter.

(26) In a second step E2, the series of values $x_1, x_2, \ldots, x_n$ is stored in a database DB. This database preferably contains several series related to different biomarkers.

(27) As shown in FIG. 1, which illustrates a system according to an embodiment of the invention, the database is stored in a memory 12, which can be ROM or RAM, of a calculation unit 10. The calculation unit 10 also comprises processing means 14, such as a processor, to process data from the database DB of the memory 12. To load data into the memory 12, the calculation unit 10 comprises interface means 16. The processing means 14 comprise calculation means, comparing means and instruction means (all of which can, for instance, be included in the same processor).
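As an illustration of how step E2 and the database DB could be organized, the following is a minimal sketch using Python's built-in `sqlite3` module. The table name `biomarker_values`, its columns and the helper functions are hypothetical choices made for this example, not a schema disclosed by the invention; the only property it relies on is that each series can be retrieved in chronological order (index 1 = oldest value).

```python
import sqlite3
from typing import List, Sequence


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a minimal biomarker store (hypothetical schema standing in for database DB)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS biomarker_values ("
        " subject_id TEXT NOT NULL,"
        " biomarker TEXT NOT NULL,"
        " sample_index INTEGER NOT NULL,"  # 1 = oldest measurement, n = latest
        " value REAL NOT NULL,"
        " PRIMARY KEY (subject_id, biomarker, sample_index))"
    )
    return conn


def store_series(conn: sqlite3.Connection, subject_id: str, biomarker: str,
                 values: Sequence[float]) -> None:
    """Step E2: store the series x_1, ..., x_n of one biomarker for one subject."""
    rows = [(subject_id, biomarker, i, float(v)) for i, v in enumerate(values, start=1)]
    with conn:  # the connection context manager commits the transaction on success
        conn.executemany("INSERT OR REPLACE INTO biomarker_values VALUES (?, ?, ?, ?)", rows)


def load_series(conn: sqlite3.Connection, subject_id: str, biomarker: str) -> List[float]:
    """Retrieve the series in chronological order, as the processor would before step E3."""
    cur = conn.execute(
        "SELECT value FROM biomarker_values WHERE subject_id = ? AND biomarker = ? "
        "ORDER BY sample_index",
        (subject_id, biomarker),
    )
    return [row[0] for row in cur]


conn = create_store()
store_series(conn, "subject-01", "ferritin", [85.0, 92.0, 88.0])  # illustrative numbers only
print(load_series(conn, "subject-01", "ferritin"))  # [85.0, 92.0, 88.0]
```

Keeping one row per measurement, rather than one row per series, lets the database hold several series for different biomarkers and grow as new samples arrive.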

(28) The system can also comprise acquisition means (sensors, laboratory equipment, etc.) which can be used during step E1.

(29) The results obtained by the calculation unit 10 are shown on displaying means 20, for instance a screen. Communication between the calculation unit 10 and the displaying means 20 is achieved, for instance, via a wired connection (VGA, HDMI, etc.).

(30) The calculation unit 10 can be a personal computer or a remote server (cloud computing) communicating with a local interface, for instance through a network (Ethernet, Wi-Fi, etc.).

(31) Before being processed according to the method disclosed below, the series of values $x_1, x_2, \ldots, x_n$ can be pre-processed in a step ET to put it in a suitable form. Step ET will be illustrated later.
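Claims 3 and 4 give a log transformation as an example of step ET and a Shapiro test as the normality check of step ES. A minimal sketch of the log transformation, assuming the raw values are strictly positive concentrations; the function name `log_transform` is hypothetical. The normality check could then be run on the transformed values, for example with `scipy.stats.shapiro`, which returns the test statistic and a p-value; it is not reproduced here.

```python
import math
from typing import List, Sequence


def log_transform(values: Sequence[float]) -> List[float]:
    """Step ET (sketch): natural-log transform of strictly positive biomarker values."""
    out = []
    for v in values:
        if v <= 0:
            # The log is undefined for zero or negative values; reject them explicitly.
            raise ValueError(f"log transform needs strictly positive values, got {v!r}")
        out.append(math.log(v))
    return out


values = [85.0, 92.0, 88.0, 140.0]  # illustrative numbers only
log_values = log_transform(values)
```

One side effect of working on the log scale: multiplying every value by a constant (for example a change of units) becomes a constant shift, log(c·v) = log(c) + log(v), and the studentized indicators used below are unaffected by such a shift.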

(32) The database DB is updated with data obtained from different processes.

(33) The method of the invention uses the maximum of a “studentized” form. Several embodiments are disclosed below.

(34) This method makes it possible to detect an abnormal value (or an abnormal subseries of values) in a series of values.

(35) The invention will be disclosed in detail for the first embodiment; the other embodiments will be disclosed more briefly.

(36) FIG. 2 illustrates the main steps of a method according to an embodiment.

(37) Method 1 of the Invention

(38) In a first aspect of the invention, the method proposes a global test, defined by the following indicator:

(39) $T_{n} = \max_{i \in \{1, \ldots, n\}} \left| R_{n,i} \right|$

(40) Where

(41) $R_{n,i} = \frac{X_{i} - \overline{X}_{n,-i}}{\hat{\sigma}_{n,-i} \sqrt{1 + \frac{1}{n-1}}}$

(42) $R_{n,i}$ being the studentized residual form, and

(43) $\overline{X}_{n,-i} = \frac{1}{n-1} \sum_{k=1, k \neq i}^{n} X_{k}, \qquad \hat{\sigma}_{n,-i}^{2} = \frac{1}{n-2} \sum_{k=1, k \neq i}^{n} \left(X_{k} - \overline{X}_{n,-i}\right)^{2}$

(44) In this embodiment, the studentized residual form $R_{n,i}$ is a studentized residual in the strict sense.
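A minimal sketch of the calculation of step E3 for Method 1, using only the Python standard library: each $R_{n,i}$ is computed with $X_i$ left out of the mean and of the variance, and $t_n$ is the largest absolute value. The function names are hypothetical and this is an illustration of the formulas above, not the implementation used by the invention.

```python
import math
from typing import List, Sequence


def studentized_residuals(x: Sequence[float]) -> List[float]:
    """Leave-one-out studentized residuals R_{n,i} of Method 1."""
    n = len(x)
    if n < 3:
        raise ValueError("Method 1 requires n >= 3 values")
    total = sum(x)
    residuals = []
    for i, xi in enumerate(x):
        # Mean of the series with X_i left out (divisor n - 1).
        mean_wo = (total - xi) / (n - 1)
        # Unbiased variance of the series with X_i left out (divisor n - 2).
        var_wo = sum((xk - mean_wo) ** 2 for k, xk in enumerate(x) if k != i) / (n - 2)
        # Under H0, X_i - mean_wo has variance sigma^2 * (1 + 1/(n - 1)).
        residuals.append((xi - mean_wo) / math.sqrt(var_wo * (1 + 1 / (n - 1))))
    return residuals


def indicator_tn(x: Sequence[float]) -> float:
    """Observed value t_n of T_n = max_i |R_{n,i}| (step E3)."""
    return max(abs(r) for r in studentized_residuals(x))


series = [1.0, 2.0, 3.0]
print(indicator_tn(series))                        # about 1.732 (sqrt(3))
print(indicator_tn([10 * v + 5 for v in series]))  # same value up to rounding
```

The second call illustrates why the null distribution can be tabulated: rescaling and shifting the series leaves $t_n$ unchanged. The sketch raises `ZeroDivisionError` if all values other than $X_i$ are identical, since $\hat{\sigma}_{n,-i}$ is then zero.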

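(45) This method is based on the maximum of n (non-independent) variables, each following a Student distribution with n−2 degrees of freedom: under the null hypothesis of equal means, $X_i - \overline{X}_{n,-i}$ follows $N\left(0, \sigma^2 \left(1 + \frac{1}{n-1}\right)\right)$ and is independent of $\hat{\sigma}_{n,-i}^2$, which has n−2 degrees of freedom.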

(46) Under the null hypothesis $H_0$: $\mu_1 = \mu_2 = \ldots = \mu_n = \mu$, the distribution of $T_n$ does not depend on the unknown parameters $\mu$ and $\sigma^2$ and can therefore be tabulated.

(47) Indeed, although no analytical solution is available, more than sufficient precision can be obtained through tabulation by simulation (for instance, twenty million draws per threshold are easily computed with standard computational means and give a precision of $10^{-3}$).
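One way to obtain such a table, sketched under the assumption that plain Monte Carlo simulation is acceptable: because the null distribution of $T_n$ does not depend on $\mu$ or $\sigma^2$, it suffices to simulate standard normal series of length n, compute $T_n$ for each, and read off the empirical quantile of order 1−α as an estimate of $c_{\alpha,n}$. The default number of draws here is far below the twenty million mentioned above, so the precision is correspondingly lower; the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def indicator_tn(x: Sequence[float]) -> float:
    """Method 1 indicator: max over i of |R_{n,i}| (leave-one-out studentized residuals)."""
    n = len(x)
    total = sum(x)
    best = 0.0
    for i, xi in enumerate(x):
        mean_wo = (total - xi) / (n - 1)
        var_wo = sum((xk - mean_wo) ** 2 for k, xk in enumerate(x) if k != i) / (n - 2)
        best = max(best, abs(xi - mean_wo) / math.sqrt(var_wo * (1 + 1 / (n - 1))))
    return best


def simulate_null_tn(n: int, n_sims: int, seed: int = 0) -> List[float]:
    """Sorted draws of T_n under H0; N(0, 1) suffices since T_n is free of mu and sigma^2."""
    rng = random.Random(seed)
    return sorted(indicator_tn([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(n_sims))


def empirical_quantile(sorted_draws: List[float], alpha: float) -> float:
    """Empirical quantile of order 1 - alpha, an estimate of c_{alpha,n}."""
    k = min(len(sorted_draws) - 1, math.ceil((1 - alpha) * len(sorted_draws)) - 1)
    return sorted_draws[k]


draws = simulate_null_tn(n=10, n_sims=10_000, seed=1)
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(empirical_quantile(draws, alpha), 2))
```

Reusing the same sorted draws for every α guarantees that the estimated thresholds are ordered consistently (a smaller α never gives a smaller threshold).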

(48) Once the indicator $T_n$ has been implemented, the processor 14 performs the calculation in a step E3. At least n operations are carried out: one for each studentized residual $R_{n,i}$, then at least one operation to extract the maximum absolute value, which is the observed value $t_n$ of $T_n$.

(49) According to this method, for one series of samples $x_1, x_2, \ldots, x_n$ representing the values of the variables $X_1, X_2, \ldots, X_n$, the calculation step provides a single output, noted $t_n$.

(50) The next step E4 is the comparison of the observed value $t_n$ of the indicator $T_n$ with a threshold. The threshold is taken from a quantile table representing the distribution of $T_n$.

(51) The quantile tables are typically precomputed and stored in the memory.

(52) For a level α, the series is considered abnormal (i.e. at least one value of the series is abnormal) if $t_n > c_{\alpha,n}$, where $c_{\alpha,n}$ is such that $P_n([c_{\alpha,n}, \infty[) = \alpha$, $P_n$ denoting the distribution of $T_n$ under the null hypothesis.

(53) The processor therefore performs this comparison using the value $t_n$ obtained at the end of step E3.

(54) In a step E5, the result of the comparison is displayed on the displaying means 20 if $t_n > c_{\alpha,n}$, where $c_{\alpha,n}$ is the quantile of order 1−α of the distribution of $T_n$. In other words, if the observed value $t_n$ of the indicator $T_n$ is above the quantile, the presence of an abnormal value in the series is reported on the displaying means, thereby indicating the occurrence of a health event for said mammal.
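A minimal sketch of steps E4 and E5, together with the identification of the most extreme value described as step E6 in claim 2. The quantile $c_{\alpha,n}$ is passed in as a parameter because it would come from Table 1a for the relevant n; the value 3.0 used below is only a placeholder, not a value from that table, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def studentized_residuals(x: Sequence[float]) -> List[float]:
    """Leave-one-out studentized residuals R_{n,i} of Method 1."""
    n = len(x)
    total = sum(x)
    out = []
    for i, xi in enumerate(x):
        mean_wo = (total - xi) / (n - 1)
        var_wo = sum((xk - mean_wo) ** 2 for k, xk in enumerate(x) if k != i) / (n - 2)
        out.append((xi - mean_wo) / math.sqrt(var_wo * (1 + 1 / (n - 1))))
    return out


def check_series(x: Sequence[float], c_alpha_n: float) -> Tuple[bool, float, Optional[int]]:
    """Return (abnormal?, t_n, 0-based index of the most extreme value, or None if normal)."""
    r = studentized_residuals(x)
    i_max = max(range(len(r)), key=lambda i: abs(r[i]))  # position of the maximum |R_{n,i}|
    t_n = abs(r[i_max])
    if t_n > c_alpha_n:  # step E4: compare t_n with the stored quantile c_{alpha,n}
        return True, t_n, i_max
    return False, t_n, None


# c_alpha_n should be read from Table 1a for this n; 3.0 is a placeholder.
abnormal, t_n, idx = check_series([10.0, 10.2, 9.9, 10.1, 15.0], c_alpha_n=3.0)
if abnormal:
    print(f"abnormal value at position {idx + 1} (t_n = {t_n:.1f})")  # step E5: report
```

Returning the position of the largest $|R_{n,i}|$ only when the global test rejects keeps the decision at level α driven by the single indicator $T_n$.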

(55) Table 1a provides values for the quantiles of the distribution of $T_n$.

(56) The method can be applied as soon as n≥3.

(57) Once an abnormal value has been detected, applications to doping control or to the medical follow-up of individuals are immediate.

(58) In another embodiment, this method can equally be applied to the multivariate case, that is, when $X_1, X_2, \ldots, X_n$ are n independent $\mathbb{R}^d$-valued random vectors, where $X_i \sim N(\mu_i, C)$ and C (the covariance matrix) is assumed to be invertible.

(59) To detect whether a vector of the series is abnormal, the indicator $T_n$ is defined as follows:

(60) $T_{n} = \max_{i \in \{1, \ldots, n\}} R_{n,i}$

(61) Where

(62) $R_{n, i} = \frac{n - 1}{nd} {(X_{i} - {\overline{X}}_{n, - i})}^{'} C_{n, - i}^{- 1} (X_{i} - {\overline{X}}_{n, - i})$

(63) $R_{n,i}$ being a kind of studentized form, more precisely a normalized length of the residual vector, $z'$ denoting the transpose of the vector z, and

(64) $\overline{X}_{n,-i} = \frac{1}{n-1} \sum_{k=1, k \neq i}^{n} X_{k}, \qquad C_{n,-i} = \frac{1}{n-1-d} \sum_{k=1, k \neq i}^{n} \left(X_{k} - \overline{X}_{n,-i}\right) \left(X_{k} - \overline{X}_{n,-i}\right)'$

(65) Step E2 is identical; only the indicator to be applied differs.

(66) As previously explained, under the null hypothesis $\mu_1 = \mu_2 = \ldots = \mu_n = \mu$, the distribution of $T_n$ does not depend on $\mu$ or C and can therefore be tabulated.

(67) The comparison step E4 is the same and the displaying step E5 is also the same.

(68) This method can be applied as soon as n≥d+2.

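(69) Tables 1b and 1c provide values for the quantiles of the distribution of $T_n$ for d = 2 and d = 3, respectively.

(70) The univariate case described above is a special case of the multivariate case, up to a square in the expression of $T_n$: for d = 1, $C_{n,-i} = \hat{\sigma}_{n,-i}^2$ and, since $\frac{n-1}{n} = 1 / \left(1 + \frac{1}{n-1}\right)$, the multivariate $R_{n,i}$ is the square of the univariate studentized residual.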
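A minimal NumPy sketch of the multivariate indicator, assuming the n observations are arranged as an (n, d) array with one row per sampling date and one column per biomarker (a layout chosen for this example). The function names are hypothetical. The linear system $C_{n,-i} z = X_i - \overline{X}_{n,-i}$ is solved instead of forming the inverse explicitly, which is the usual numerically safer choice.

```python
import numpy as np


def multivariate_residuals(x) -> np.ndarray:
    """R_{n,i} = (n-1)/(n d) (X_i - Xbar_{n,-i})' C_{n,-i}^{-1} (X_i - Xbar_{n,-i}) for each row."""
    x = np.asarray(x, dtype=float)
    if x.ndim != 2:
        raise ValueError("expected an (n, d) array: one row per sample, one column per biomarker")
    n, d = x.shape
    if n < d + 2:
        raise ValueError("the multivariate method requires n >= d + 2")
    r = np.empty(n)
    for i in range(n):
        rest = np.delete(x, i, axis=0)                 # the n - 1 vectors X_k, k != i
        mean_wo = rest.mean(axis=0)                    # Xbar_{n,-i}
        centered = rest - mean_wo
        cov_wo = centered.T @ centered / (n - 1 - d)   # C_{n,-i}, divisor n - 1 - d
        diff = x[i] - mean_wo
        r[i] = (n - 1) / (n * d) * diff @ np.linalg.solve(cov_wo, diff)
    return r


def multivariate_tn(x) -> float:
    """Observed value t_n of T_n = max_i R_{n,i} (no absolute value needed: R_{n,i} >= 0)."""
    return float(multivariate_residuals(x).max())


# With d = 1 this is the square of the univariate t_n: sqrt(3)^2 = 3 for the series 1, 2, 3.
print(multivariate_tn([[1.0], [2.0], [3.0]]))
```

Applying an invertible linear map and a shift to every vector leaves each $R_{n,i}$ unchanged (a Mahalanobis-type distance), which is another way to see why the null distribution does not depend on $\mu$ or C.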

(71) Method 2 of the Invention

(72) A further implementation of the invention consists in detecting a series of consecutive observations that are abnormal compared to the rest of the series.

(73) Formally, the question is whether there is a set $I = \{i, \ldots, j\}$ of consecutive integers for which the expectation of the $X_k$, $k \in I$, differs from the expectation of the other variables, that is $\mu_k = \mu_l = \mu_A$ if k and l do not belong to I, and $\mu_k = \mu_l = \mu_B$ if k and l belong to I, with $\mu_A \neq \mu_B$.

(74) The rationale behind Method 2 is the same as behind Method 1: the processor 14 computes the maximum of a studentized form.

(75) More precisely, the proposed indicator is

(76) $T_n = \max_{I \in \varphi} |R_{n,I}|$

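(77) Where $\varphi$ is the collection of all intervals $I$ included in $\{1, \dots, n\}$ with length $1 \leq |I| < n$, and $\overline{I}$ is the complement of $I$, so that $\{1, \dots, n\} = I \cup \overline{I}$ and $I \cap \overline{I} = \emptyset$.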

(78) $R_{n,I} = \frac{\overline{X}_I - \overline{X}_{\overline{I}}}{\hat{\sigma}_{n,I}\sqrt{1 + \frac{1}{n-1}}}$, with $\overline{X}_I = \frac{1}{|I|}\sum_{k \in I} X_k$, $\overline{X}_{\overline{I}} = \frac{1}{n-|I|}\sum_{k \in \overline{I}} X_k$ and $\hat{\sigma}_{n,I}^2 = \frac{1}{n-2}\left(\sum_{k \in I}(X_k - \overline{X}_I)^2 + \sum_{k \in \overline{I}}(X_k - \overline{X}_{\overline{I}})^2\right)$
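A brute-force sketch of the Method 2 indicator, assuming a univariate series stored as a 1-D NumPy array: it enumerates every interval $I$ with $1 \leq |I| < n$, evaluates $|R_{n,I}|$ as in (78), and also returns the interval that attains the maximum, i.e. the subseries that would be reported after the comparison step. The function name `method2_indicator`, the 0-based inclusive interval bounds and the $n \geq 3$ check are illustrative assumptions, not taken from the source.

```python
import numpy as np

def method2_indicator(x):
    """Return (t_n, (first, last)) for Method 2 on a univariate series.

    (first, last) are the 0-based inclusive bounds of the interval I
    at which |R_{n,I}| is largest.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("assumed n >= 3: the pooled variance divides by n - 2")
    scale = np.sqrt(1 + 1 / (n - 1))
    best, best_interval = -np.inf, None
    for start in range(n):
        for stop in range(start + 1, n + 1):           # I = {start, ..., stop - 1}
            if stop - start == n:                       # |I| < n: skip the whole series
                continue
            inside = x[start:stop]
            outside = np.concatenate([x[:start], x[stop:]])   # complement of I
            ss = ((inside - inside.mean()) ** 2).sum() + ((outside - outside.mean()) ** 2).sum()
            r = (inside.mean() - outside.mean()) / (np.sqrt(ss / (n - 2)) * scale)
            if abs(r) > best:
                best, best_interval = abs(r), (start, stop - 1)
    return float(best), best_interval
```

The search visits $O(n^2)$ intervals at $O(n)$ cost each, which is negligible for series of a dozen or so measurements.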

(79) Because the indicator works on intervals $I$, this studentized form is adapted from the studentized residual, in the strict sense, used in Method 1.

(80) Under the null hypothesis, again, the distribution of $T_n$ does not depend on the unknown parameters $\mu, \sigma^2$ and can be tabulated.

(81) The comparison and displaying steps E4, E5 are similar to the ones previously disclosed.

(82) If the comparison in step E4 is positive, a subseries of consecutive values can be considered abnormal compared with the rest of the series.

(83) Table 2 provides the quantiles of the distribution of $T_n$.

(84) Like Method 1, Method 2 can be extended to the multivariate case.

(85) Method 3 of the Invention

(86) Finally, the method of the invention can be applied to two series of observations, $x_1, x_2, \dots, x_{n_1}$ and $y_1, y_2, \dots, y_{n_2}$. In that case, the question is whether one of these observations is abnormal given the other observations in the same subsample.

(87) This method handles the case where the season influences the expectation of the variables: the behavior of the biomarkers may differ according to the period of the year (for instance, summer and winter). The sample is thus split into two subsamples, $x_1, x_2, \dots, x_{n_1}$ and $y_1, y_2, \dots, y_{n_2}$.

(88) The proposed indicator is

(89) $T_n = \max\left\{\max_{i \in \{1, \dots, n_1\}} |R_{n_1,i}|,\ \max_{j \in \{1, \dots, n_2\}} |R_{n_2,j}|\right\}$

(90) Where

(91) $R_{n_1,i} = \frac{X_i - \overline{X}_{n_1,-i}}{\hat{\sigma}_{X,-i,Y}\sqrt{1 + \frac{1}{n_1-1}}}, \qquad R_{n_2,j} = \frac{Y_j - \overline{Y}_{n_2,-j}}{\hat{\sigma}_{Y,-j,X}\sqrt{1 + \frac{1}{n_2-1}}}$

(92) $R_{n_1,i}$ and $R_{n_2,j}$ being studentized residuals, where:

(93) $\overline{X}_{n_1,-i} = \frac{1}{n_1-1}\sum_{k=1,\,k\neq i}^{n_1} X_k, \qquad \overline{Y}_{n_2,-j} = \frac{1}{n_2-1}\sum_{k=1,\,k\neq j}^{n_2} Y_k,$
$\hat{\sigma}_{X,-i,Y}^2 = \frac{1}{n-3}\left(\sum_{k=1,\,k\neq i}^{n_1}(X_k - \overline{X}_{n_1,-i})^2 + \sum_{k=1}^{n_2}(Y_k - \overline{Y}_{n_2})^2\right),$
$\hat{\sigma}_{Y,-j,X}^2 = \frac{1}{n-3}\left(\sum_{k=1}^{n_1}(X_k - \overline{X}_{n_1})^2 + \sum_{k=1,\,k\neq j}^{n_2}(Y_k - \overline{Y}_{n_2,-j})^2\right),$
with $n = n_1 + n_2$, and $\overline{X}_{n_1}$, $\overline{Y}_{n_2}$ the means of the full subsamples.
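A minimal sketch of the univariate Method 3 indicator, assuming the two seasonal subsamples are given as separate 1-D arrays `x` and `y`. Each leave-one-out residual is studentized with the pooled variance of (93), which combines the remaining values of its own subsample with the full other subsample and divides by $n - 3$, where $n = n_1 + n_2$. The name `method3_indicator` is hypothetical.

```python
import numpy as np

def method3_indicator(x, y):
    """Return (t_n, R_x, R_y) for Method 3 with two univariate subsamples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    if n1 < 2 or n2 < 2:
        raise ValueError("Method 3 requires n1 >= 2 and n2 >= 2")
    n = n1 + n2

    def residuals(a, ss_other):
        m = a.size
        r = np.empty(m)
        for i in range(m):
            rest = np.delete(a, i)                              # subsample without a_i
            ss_rest = ((rest - rest.mean()) ** 2).sum()
            sigma = np.sqrt((ss_rest + ss_other) / (n - 3))     # pooled estimate, n - 3 d.o.f.
            r[i] = (a[i] - rest.mean()) / (sigma * np.sqrt(1 + 1 / (m - 1)))
        return r

    ss_x = ((x - x.mean()) ** 2).sum()   # full-subsample sums of squares
    ss_y = ((y - y.mean()) ** 2).sum()
    R_x = residuals(x, ss_y)
    R_y = residuals(y, ss_x)
    return float(max(np.abs(R_x).max(), np.abs(R_y).max())), R_x, R_y
```

Because each subsample keeps its own mean, a systematic difference between, say, summer and winter values does not by itself inflate the indicator; only deviations within a subsample do.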

(94) Under the null hypothesis, again, the distribution of $T_n$ does not depend on the unknown parameters $\mu_A, \mu_B, \sigma^2$ and can therefore be tabulated.

(95) Table 3 provides the quantiles of the distribution of $T_n$.

(96) This method can be applied whenever $n_1 \geq 2$ and $n_2 \geq 2$.

(97) The same multivariate extension also applies to Method 3.

(98) In that case, let $X_1, X_2, \dots, X_{n_1}$ and $Y_1, Y_2, \dots, Y_{n_2}$ be two independent series of independent Gaussian $\mathbb{R}^d$-valued random vectors, with $X_i \sim N(\mu_{X_i}, C)$ and $Y_j \sim N(\mu_{Y_j}, C)$, where $C$ is assumed to be invertible.

(99) The purpose is to detect whether a vector $x_k$ or $y_l$ is abnormal given the other vectors in the same subsample.

(100) The following indicator is proposed:

(101) $T_n = \max\left\{\max_{i \in \{1, \dots, n_1\}} |R_{n_1,i}|,\ \max_{j \in \{1, \dots, n_2\}} |R_{n_2,j}|\right\}$

(102) Where

(103) $R_{n_1,i} = \frac{n_1-1}{n_1 d}\,(X_i - \overline{X}_{n_1,-i})'\,C_{X,-i,Y}^{-1}\,(X_i - \overline{X}_{n_1,-i}), \qquad R_{n_2,j} = \frac{n_2-1}{n_2 d}\,(Y_j - \overline{Y}_{n_2,-j})'\,C_{Y,-j,X}^{-1}\,(Y_j - \overline{Y}_{n_2,-j})$

(104) $R_{n_1,i}$ and $R_{n_2,j}$ being the normalized lengths of the residual vectors, where:

(105) $\overline{X}_{n_1,-i} = \frac{1}{n_1-1}\sum_{k=1,\,k\neq i}^{n_1} X_k, \qquad \overline{Y}_{n_2,-j} = \frac{1}{n_2-1}\sum_{k=1,\,k\neq j}^{n_2} Y_k,$
$C_{X,-i,Y} = \frac{1}{n-2-d}\left(\sum_{k=1,\,k\neq i}^{n_1}(X_k - \overline{X}_{n_1,-i})(X_k - \overline{X}_{n_1,-i})' + \sum_{k=1}^{n_2}(Y_k - \overline{Y}_{n_2})(Y_k - \overline{Y}_{n_2})'\right),$
$C_{Y,-j,X} = \frac{1}{n-2-d}\left(\sum_{k=1}^{n_1}(X_k - \overline{X}_{n_1})(X_k - \overline{X}_{n_1})' + \sum_{k=1,\,k\neq j}^{n_2}(Y_k - \overline{Y}_{n_2,-j})(Y_k - \overline{Y}_{n_2,-j})'\right)$
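The multivariate version of Method 3 follows the same pattern, with the pooled covariance matrices of (105). In this sketch the names are hypothetical and the rows of `X` and `Y` are assumed to be the $d$-dimensional samples of each subsample. The size check reads the condition of paragraph (109) as $n_1 + n_2 \geq d + 3$; this is an assumption, chosen because it is the condition under which the divisor $n - 2 - d$ is positive.

```python
import numpy as np

def _scatter(Z):
    """Sum of (z_k - mean)(z_k - mean)' over the rows of Z."""
    c = Z - Z.mean(axis=0)
    return c.T @ c

def mv_method3_indicator(X, Y):
    """Return (t_n, R_x, R_y) for the multivariate Method 3 sketch."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    (n1, d), n2 = X.shape, Y.shape[0]
    n = n1 + n2
    # Assumption: the size condition of (109) is read as n1 + n2 >= d + 3.
    if n1 < 2 or n2 < 2 or n < d + 3:
        raise ValueError("requires n1 >= 2, n2 >= 2 and n1 + n2 >= d + 3")

    def residuals(A, scatter_other):
        m = A.shape[0]
        R = np.empty(m)
        for i in range(m):
            rest = np.delete(A, i, axis=0)
            C = (_scatter(rest) + scatter_other) / (n - 2 - d)  # C_{X,-i,Y} or C_{Y,-j,X}
            r = A[i] - rest.mean(axis=0)                        # residual vector
            R[i] = (m - 1) / (m * d) * r @ np.linalg.solve(C, r)
        return R

    R_x = residuals(X, _scatter(Y))
    R_y = residuals(Y, _scatter(X))
    return float(max(R_x.max(), R_y.max())), R_x, R_y
```

Since every $R$ is a quadratic form, it is non-negative, and the result is unchanged if the same invertible linear map is applied to both subsamples, with a separate shift for each.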

(106) Under the null hypothesis $\mu_{X_1} = \dots = \mu_{X_{n_1}} = \mu_X$ and $\mu_{Y_1} = \dots = \mu_{Y_{n_2}} = \mu_Y$, the distribution of $T_n$ does not depend on the unknown parameters $\mu_X$, $\mu_Y$, $C$ and can be tabulated.

(107) The comparison and displaying steps E4, E5 are the same as previously explained.

(108) The quantiles of the distribution of $T_n$ can be computed, but they are not disclosed in the tables here.

(109) This method can be applied whenever $n_1 \geq 2$, $n_2 \geq 2$ and $n = n_1 + n_2 \geq d+3$.

(110) The steps of the different embodiments of the disclosed method are summarized in the flow chart of FIG. 7.

(111) Further Elements

(112) To check that the variables are normally distributed, a Shapiro-Wilk test can be run on the stored values (step ES). Step ES can be run, for instance, during step E2 or between steps E2 and E3.

(113) To make some variables normally distributed, some series are log-transformed in a further step ET. Any other suitable transformation could also be applied.

(114) Step ET can be run, for instance, during step E2 or between steps E2 and E3.
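Steps ET and ES can be sketched with NumPy and SciPy's `scipy.stats.shapiro` (Shapiro-Wilk test). The helper `prepare_series`, the choice of the natural logarithm and the 5% decision threshold are assumptions made for illustration; the source leaves the transformation and the decision rule open.

```python
import numpy as np
from scipy import stats

def prepare_series(values, log_transform=False, alpha=0.05):
    """Hypothetical helper combining step ET (optional log transform)
    and step ES (Shapiro-Wilk check).

    Returns the series to analyze, the Shapiro-Wilk p-value, and whether
    normality is not rejected at level alpha.
    """
    x = np.asarray(values, dtype=float)
    if log_transform:
        if np.any(x <= 0):
            raise ValueError("a log transform needs strictly positive values")
        x = np.log(x)                          # step ET
    if x.size < 3:
        raise ValueError("the Shapiro-Wilk test needs at least 3 values")
    _, p_value = stats.shapiro(x)              # step ES
    p_value = float(p_value)
    return x, p_value, p_value >= alpha
```

Returning the transformed series keeps steps E3 onwards working on the same values that passed the normality check.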

(115) The computation of the quantiles is known to the skilled person and is not detailed here.
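Since the distribution of $T_n$ under the null hypothesis does not depend on $\mu$ or $\sigma^2$, one standard way to approximate the quantile $c_{\alpha,n}$ is Monte Carlo simulation from $N(0,1)$. The sketch below does this for the univariate Method 1 statistic, using leave-one-out sum identities to vectorize the computation. It is an illustration only: the function names, the number of simulations and the seed are arbitrary, and the result approximates, but is not, the tabulated quantiles of the source.

```python
import numpy as np

def method1_tn_batch(Z):
    """Univariate Method 1 statistic T_n for each row of Z (shape (m, n))."""
    n = Z.shape[1]
    s = Z.sum(axis=1, keepdims=True)                 # row sums
    q = (Z ** 2).sum(axis=1, keepdims=True)          # row sums of squares
    loo_mean = (s - Z) / (n - 1)                     # mean of the row without entry i
    loo_ss = q - Z ** 2 - (s - Z) ** 2 / (n - 1)     # sum of squares around that mean
    loo_sd = np.sqrt(loo_ss / (n - 2))
    R = (Z - loo_mean) / (loo_sd * np.sqrt(1 + 1 / (n - 1)))
    return np.abs(R).max(axis=1)

def simulate_quantile(n, alpha, n_sim=100_000, seed=0):
    """Monte Carlo estimate of the (1 - alpha) quantile of T_n under H0."""
    rng = np.random.default_rng(seed)
    t = method1_tn_batch(rng.standard_normal((n_sim, n)))
    return float(np.quantile(t, 1 - alpha))
```

Increasing `n_sim` reduces the Monte Carlo error; the same approach would apply to Methods 2 and 3 by swapping in the corresponding statistic.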

(116) Once step E5 has been completed and it is known whether an abnormal value (or an abnormal subseries of values) exists, that abnormal value can be identified in and extracted from the series in step E6, and reported on the displaying means 20 in step E7.

(117) The abnormal value is the one at which the maximum defining the indicator $T_n$ is attained.
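Steps E3, E4 and E6 of the univariate Method 1 can be combined as in the sketch below: the index attaining the maximum is kept while $t_n$ is computed, so identifying the abnormal value costs nothing extra. The function name and the 0-based index are illustrative; `quantile` stands for the stored value $c_{\alpha,n}$, which the caller must supply (for example from Table 1).

```python
import numpy as np

def detect_abnormal_value(x, quantile):
    """Sketch of steps E3, E4 and E6 for univariate Method 1.

    Returns (abnormal, t_n, index), where index is the 0-based position of
    the abnormal value, or None when t_n does not exceed the quantile.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    r = np.empty(n)
    for i in range(n):                                   # E3: studentized residuals
        rest = np.delete(x, i)
        r[i] = (x[i] - rest.mean()) / (rest.std(ddof=1) * np.sqrt(1 + 1 / (n - 1)))
    i_max = int(np.argmax(np.abs(r)))                    # position attaining T_n
    t_n = float(np.abs(r[i_max]))
    abnormal = t_n > quantile                            # E4: compare with c_{alpha,n}
    return abnormal, t_n, (i_max if abnormal else None)  # E6: identify the value
```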

(118) Steps E5 and E6 consume few resources, since all the values entering the indicator $T_n$ have already been computed.

(119) The present invention provides a new and simple method based on maxima of Z-scores that does not rely on data from an external (reference) population. The method is extended to three Z-score-based analyses for detecting abnormal values in a series of biological measurements from an individual's samples, each suited to particular situations. For example, the invention makes it possible to assess an individual's baseline while taking into account the seasonal changes that alter biomarker values. The multivariate approach avoids multiple-testing issues and accounts for possible correlations between biomarkers.

(120) As disclosed below, the embodiments of the invention (Methods 1 to 3) have been tested on the follow-up of elite athletes, and in most cases the results agree with the expected false discovery rate.

(121) Performance of Method 1

(122) This example shows how Method 1 detects an abnormal value when all the others are identically distributed.

(123) $10^6$ sequences of $n$ independent Gaussian random variables are simulated. The third random variable has distribution $N(\mu, 1)$, with $\mu \geq 0$, whereas all the other random variables have distribution $N(0, 1)$. The simulations are performed for $n$ ranging from 4 to 15 and $\mu$ ranging from 0 to 10 in steps of 0.1.
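The power study of paragraph (123) can be reproduced in miniature with the sketch below, which shifts the third variable by $\mu$ and counts how often $T_n$ exceeds a simulated quantile. It uses far fewer than $10^6$ sequences and hypothetical function names, so its estimates are only indicative and are not the figures reported here.

```python
import numpy as np

def method1_tn_batch(Z):
    """Univariate Method 1 statistic T_n for each row of Z (shape (m, n))."""
    n = Z.shape[1]
    s = Z.sum(axis=1, keepdims=True)
    q = (Z ** 2).sum(axis=1, keepdims=True)
    loo_mean = (s - Z) / (n - 1)                          # mean without entry i
    loo_sd = np.sqrt((q - Z ** 2 - (s - Z) ** 2 / (n - 1)) / (n - 2))
    R = (Z - loo_mean) / (loo_sd * np.sqrt(1 + 1 / (n - 1)))
    return np.abs(R).max(axis=1)

def simulated_power(n, mu, alpha=0.05, n_sim=20_000, seed=0):
    """Share of sequences flagged when the third variable follows N(mu, 1)."""
    rng = np.random.default_rng(seed)
    # Quantile of T_n under H0, estimated from pure N(0, 1) sequences.
    c = np.quantile(method1_tn_batch(rng.standard_normal((n_sim, n))), 1 - alpha)
    Z = rng.standard_normal((n_sim, n))
    Z[:, 2] += mu                                         # shift the third variable
    return float(np.mean(method1_tn_batch(Z) > c))
```

For instance, `simulated_power(9, 6.0)` estimates the share of detected sequences for $n = 9$ and $\mu = 6$ at the 5% level; with $\mu = 0$ the estimate should stay close to $\alpha$.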

(124) The same procedure is used for the multivariate analysis, with $d=2$. The third random vector has distribution $N(\tilde{\mu}, C)$, whereas all the other random vectors have distribution $N(0, I_d)$. Here $\tilde{\mu} = (\mu, \mu)$, with the same values of $\mu$ as in the univariate case, and the covariance matrix $C$ is given by:

(125) $C = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$

(126) The results are given in FIG. 3: Method 1 in panels A and C and its multivariate extension in panels B and D, as a function of $\mu$ and $\mu_d$, for the levels $\alpha = 5\%$ (A, B) and $\alpha = 1\%$ (C, D); the dashed line at $\mu = 3$ is only a visual guide.

(127) As expected, the power increases with $n$ and $\mu$. For $n = 9$ and $\mu = 2$, 4 and 6, the percentages of abnormal sequences detected are 10.53%, 51.06% and 90.57%, respectively.

(128) Application to Football Players

(129) After approval by the Institutional Ethics Committee, the three methods were applied to a database of elite soccer players.

A) Application to blood biomarkers from a sample of 2577 soccer players.

(130) The database consists of five typical biological markers from 2577 male soccer players from the French elite leagues 1 and 2: concentrations of ferritin (μmol/L) and serum iron (μmol/L), hemoglobin (g/L), erythrocytes (T/L) and hematocrit levels (%).

(131) The biomarkers were collected every six months, in July/August and in January/March, from 2006 to 2012, for a total of twelve collections. The long interval between two measurements (around six months) allowed for independent sampling (Sharpe et al. (2006)). Because of the high number of transfers between clubs, injuries and the progressive inclusion of new clubs in the elite league, only series with at least five of the twelve possible measurements were analyzed, namely ferritin and serum iron levels from 757 players, erythrocyte and hematocrit levels from 799 players, and hemoglobin levels from 807 players.

(132) Moreover, a technical issue with the sampling instrument resulted in the loss of the marker data from the July/August 2009 collection. According to Custer et al. (1995) and Tufts et al. (1985), ferritin and serum iron measurements follow a log-normal distribution, so the logarithm of the observations was used for these two biomarkers (step E1).

(133) The individual biomarker series must satisfy the normality assumption: for each individual and each biomarker, step ES was conducted, and a Shapiro-Wilk test confirmed that it is not unrealistic to assume that the observations are drawn from a normal sample.

(134) A preliminary analysis of the data set (not detailed here) concluded that only ferritin and serum iron are eligible for Methods 1 and 2. The other biomarkers were found to be subject to significant seasonal variability, so Method 3 can be applied to them.

(135) For the two markers ferritin and serum iron, the empirical distribution of $T_n$ (Method 1) is close to the theoretical one under $H_0$ (see FIG. 4, which shows the distribution of the estimated $T_n$ together with the corresponding distribution simulated for $10^6$ individuals (gray curve), for erythrocytes (A) and hematocrit (B)). This confirms that the quantiles of the theoretical distribution of $T_n$ can be used to detect abnormal values in this dataset.

(136) The frequencies of abnormal series detected by the procedures are reported in FIG. 5 for a level $\alpha = 5\%$.

(137) Most of the results are close to the expected false discovery rate: between 5% and 6.7% for ferritin/Method 1, serum iron/Method 2 and the three other biomarkers with Method 3, and 8.19% for serum iron/Method 1. However, the percentage of abnormal subseries for ferritin/Method 2 is close to 11%, significantly higher than the expected level $\alpha = 5\%$. Although further analyses are required to shed light on these series, notably on the actual cause of these abnormal values, the method of the invention is found effective in detecting abnormal values within series of biomarker data. Moreover, the same percentage of abnormal values (6.9%) is obtained by applying the multivariate version of Method 1 or the two univariate procedures.

(138) FIG. 6a and FIG. 6b give examples of abnormal observations detected by the three methods for different biomarkers: Method 1 (A, B, C), Method 2 (D, E, F) and Method 3 (G, H, I); panels J, K and L show the multivariate version of Method 1. The y-axis represents the biomarker values (units depend on the biomarker), and the x-axis corresponds to time (semesters).

B) Application to blood biomarkers from a sample of 3936 soccer players and blind analysis validation study.

(139) As above, five typical biological markers, namely concentrations of ferritin (μmol/L) and serum iron (μmol/L), hemoglobin (g/L), erythrocytes (T/L) and hematocrit levels (%), were collected from 3936 male soccer players from the French elite leagues 1 and 2 between 21/06/2005 and 05/10/2017.

(140) The frequencies of abnormal series detected by the procedures are reported in FIG. 16 for levels $\alpha$ = 1%, 2.5%, 5% and 10%. The higher $\alpha$, the larger the number of series flagged as abnormal.

(141) These results confirm those of the above analysis. The percentage of abnormal series is close to the expected false discovery rate, except for ferritin/Method 2, for which the number of abnormal series is unexpectedly high (15.5% for $\alpha = 5\%$). As shown by the two typical cases depicted in FIGS. 17 and 18, ferritin levels increase or decrease over time for several individuals of the tested population. In FIGS. 17 and 18, the x-axis corresponds to the indices of the samples, starting at x=1, and the y-axis is the ferritin value in μmol/L.

(142) Validation of Data

(143) Hemoglobin series from 60 randomly chosen individuals (each with at least 7 measurements) were extracted from the database and analyzed independently, to look for abnormal profiles, by a medical doctor (on data not log-transformed) and by the methods of the invention (with a threshold as low as $\alpha = 2.5\%$). The abnormal profiles identified by the doctor were then compared with those identified by Methods 1, 2 and 3.

(144) The doctor flagged 13 of the 60 series (22%) as “abnormal”. All four series (100%) detected by Method 1 are in the doctor's list, as are six of the seven series (85.7%) detected by Method 2 and five of the seven series (71.4%) detected by Method 3.

(145) It can therefore be inferred that Methods 1, 2 and 3 are in line with the doctor's diagnosis and can be used to detect abnormal series with good accuracy. The lower concordance rate between the doctor's analysis and Method 3 might be explained by the fact that Method 3 takes seasonality into account, whereas the doctor did not.

CPC classification: A61B5/004, A61B5/0004, G16H50/20, G06F17/18, A61B5/7264, G16H50/30

International classification: G01N33/48, G16H50/20, G16H50/30, A61B5/00, G06F17/18, G01N33/50
