PATIENT DIABETES MONITORING SYSTEM WITH CLUSTERING OF UNSUPERVISED DAILY CGM PROFILES (OR INSULIN PROFILES) AND METHOD THEREOF
20170251980 · 2017-09-07
Inventors
Cpc classification
A61B5/14532
HUMAN NECESSITIES
G06F2218/00
PHYSICS
G06F17/16
PHYSICS
A61B5/7246
HUMAN NECESSITIES
A61B5/14503
HUMAN NECESSITIES
International classification
A61B5/00
HUMAN NECESSITIES
A61B5/145
HUMAN NECESSITIES
Abstract
A patient diabetes monitoring system with an efficient unsupervised daily monitoring profile clustering algorithm, a method, and a computer product thereof are disclosed. The system may include a physiological data input device or sensor which receives a plurality of physiological measurements to generate a dataset, a memory which stores a clustering algorithm, and a processor. The clustering algorithm, when executed by the processor, causes the processor to automatically pre-process the dataset to control an amount of bias/aggressiveness from the collected unsupervised daily monitoring profiles, thereby generating a pre-processed dataset, build a similarity matrix from the pre-processed dataset, and output an optimum number of similarity clusters found by the processor from the similarity matrix.
Claims
1. A patient diabetes monitoring system for a patient comprising: a physiological data input device which acquires a plurality of physiological measurements of the patient within a time window to generate at least one time window dataset of collected unsupervised daily monitoring profiles; a memory storing an unsupervised daily monitoring profile clustering algorithm; and a processor in communication with said input device to receive said generated at least one time window dataset, and in communication with said memory in order to execute said unsupervised daily monitoring profile clustering algorithm, wherein said unsupervised daily monitoring profile clustering algorithm when executed by said processor causes said processor automatically to: pre-process the dataset to control an amount of bias/aggressiveness from the collected unsupervised daily monitoring profiles to generate a pre-processed dataset, build a similarity matrix from the pre-processed dataset, and output an optimum number of similarity clusters found by the processor from the similarity matrix.
2. The system of claim 1, wherein the pre-processing of the dataset controls the amount of bias/aggressiveness via a data transformation of the dataset that makes the pre-processed dataset symmetric for retrospective analysis.
3. The system of claim 2, wherein the data transformation for retrospective analysis results from processing the dataset with a hazard function defined by $G_t = \alpha \ln(G - \beta) - \alpha \ln(\alpha)$, where parameter $\alpha = T_c - \beta$ and parameter $\beta = D_r - 1$, in which $T_c$ is a center of a transformed space, $D_r$ is a minimum defined glucose level, $G_t$ is the transformed data of blood glucose concentration measurements provided in the dataset, and $G$ is the original glucose level value of the blood glucose concentration measurements provided in the dataset, measured in millimoles per liter.
4. The system of claim 1, wherein after the pre-processing of the dataset, the pre-processed dataset is then processed to build the similarity matrix to account for time-series dynamics in the pre-processed dataset.
5. The system of claim 4, wherein the time-series dynamics in the pre-processed dataset are accounted for by a distance matrix that accounts for glucose value levels in an actual space or transformed space as well as for a rate of change of the glucose value levels to compute a distance between each pair of similar time series of data presented in the pre-processed dataset.
6. The system of claim 5, wherein the distance matrix is defined by:
$d(X_i, Y_i) = k\,|X_i - Y_i| + (1 - k)\,|(m_x - m_y)(X_i - Y_i)|$, where $X_i$ is a glucose level value in a first time series X at time i, $Y_i$ is a glucose value in a second time series Y at time i, k is a weighting factor, $m_x$ is the slope at time i for the first time series X, and $m_y$ is the slope at time i for the second time series Y.
7. The system of claim 6, wherein a sum of distances between the first and second time series X and Y is used in an elastic alignment procedure to account for varying temporal responses/shifts in the pre-processed dataset.
8. The system of claim 7, wherein the elastic alignment procedure is a dynamic time warping process which allows for elastic matching of the first and second time series X and Y by local compression or elongation along a time axis.
9. The system of claim 8, wherein the dynamic time warping process results in any penalty being added to the sum of the distances between the first and second time series X and Y.
10. The system of claim 9, wherein the first and second time series are CGM curves.
11. The system of claim 9, wherein the first and second time series X and Y of the pre-processed dataset are processed by the processor with the penalty as follows: (e) start at the origin, where the distance between the curves of the first time series X and the second time series Y is $X(1,1) = Y(1,1)$; (f) keep the first row a constant distance by $X(i,1) = X(i-1,1) + Y(i,1)$; (g) keep the first column constant by $X(1,j) = X(1,j-1) + Y(1,j)$; and (h) carry on for the next row and next column to the end of the search space of the pre-processed dataset as defined by $X(i,j) = \min\big(X(i,j-1),\ X(i-1,j-1),\ X(i-1,j)\big) + Y(i,j)$.
12. The system of claim 1, wherein output of the build a similarity matrix process is checked against one or more conditions to evaluate if a determined alignment path is a valid path, the one or more conditions being: monotonicity, continuity, boundary conditions, search window, and slope.
13. The system of claim 12, wherein output of the similarity matrix process is then used in an agglomerative clustering process to output similarity clusters, the agglomerative clustering process having the following pseudo code: (a) Compute a distance matrix between data points of the output; (b) Let each of the data points be a cluster; (c) Repeat the following: i. Merge the two closest clusters, and ii. Update the distance matrix; and (d) Repeat until only a single cluster remains.
14. The system of claim 13, wherein an inflection point in the distance matrix is calculated by the processor to find the optimal minimum number of clusters.
15. The system of claim 14, wherein if d(l) is a distance curve in the distance matrix, d′(l) is a first derivative of the distance curve, and d″(l) is a second derivative of the distance curve, and if d′(l) exists, then the optimal minimum number of clusters along the curve d(l) is calculated by the processor to be a point l where d″(l)=0.
16. The system of claim 14, wherein the processor calculates the inflection point as follows: (a) let the first k points on the distance curve d(l) with p points be 1, 2, ..., k, and find the slopes $m_1 = \frac{d(2) - d(1)}{2 - 1}$, $m_2 = \frac{d(3) - d(1)}{3 - 1}$, ..., $m_{k-1} = \frac{d(k) - d(1)}{k - 1}$; (b) calculate the median of the slopes from step (a): $m_a = \operatorname{median}(m_1, m_2, \ldots, m_{k-1})$; (c) let the last n points on the distance curve d(l) with p points be p−n, ..., p−1, p, and find the slopes $m_1 = \frac{d(p) - d(p-1)}{p - (p-1)}$, $m_2 = \frac{d(p) - d(p-2)}{p - (p-2)}$, ..., $m_n = \frac{d(p) - d(p-n)}{p - (p-n)}$; and (d) calculate the median of the slopes from step (c): $m_b = \operatorname{median}(m_1, m_2, \ldots, m_n)$, where a first line is defined by the median slope $m_a$ with a starting point at the first point along the distance curve d(l), and a second line is defined by the median slope $m_b$ with a starting point at the end point along the distance curve d(l), the inflection point being a projection of an intersection point between the first and second lines on the distance curve d(l), denoted by $l_p$, and if the inflection point $l_p$ is not an integer, then the optimal minimum number of clusters $L_{\min}$ is found by:
17. The system of claim 1, wherein the physiological data input device is a CGM.
18. A non-transitory computer-readable medium that stores a program that, when executed by a processor, causes the processor to execute, via a patient diabetes monitoring system having a physiological data input device which acquires a plurality of physiological measurements of the patient within a time window to generate at least one time window dataset of collected unsupervised daily monitoring profiles and which is in communication with said processor, such that said processor receives said generated at least one time window dataset, and in communication with said memory, an unsupervised daily monitoring profile clustering algorithm that causes said processor to automatically: pre-process the dataset to control an amount of bias/aggressiveness from the collected unsupervised daily monitoring profiles to generate a pre-processed dataset, build a similarity matrix from the pre-processed dataset, and output an optimum number of similarity clusters.
19. The non-transitory computer-readable medium of claim 18, wherein a CGM profile or insulin profile is the at least one time window dataset from a patient, and comprises raw data, transformed data, raw data associated with related data tags, transformed data associated with related data tags, or combinations thereof.
20. A method for identifying day(s) where a diabetes control therapy was inadequate for a patient using a monitoring system comprising a display device, a physiological data input device and a processor, the method comprising: receiving automatically from the physiological data input device a plurality of physiological measurements of the patient within a time window to generate at least one time window dataset of collected unsupervised daily monitoring profiles; and executing from a memory a stored unsupervised daily monitoring profile clustering algorithm and causing the processor automatically to: pre-process the dataset to control an amount of bias/aggressiveness from the collected unsupervised daily monitoring profiles, thereby generating a pre-processed dataset, build a similarity matrix from the pre-processed dataset, and output on the display device an optimum number of similarity clusters found by the processor from the similarity matrix.
Description
BRIEF DESCRIPTION OF THE SEVERAL DRAWING VIEWS
[0031] The drawings are not intended to be limiting in any way, and it is contemplated that various embodiments of the invention may be carried out in a variety of other ways, including those not necessarily depicted in the drawings. The accompanying drawings incorporated in and forming a part of the specification illustrate several aspects of the present invention, and together with the description serve to explain the principles of the invention; it being understood, however, that this invention is not limited to the precise arrangements shown.
DETAILED DESCRIPTION
[0032] The following description of certain examples should not be used to limit the scope of the present invention. Other features, aspects, and advantages of the versions disclosed herein will become apparent to those skilled in the art from the following description, which is by way of illustration, one of the best modes contemplated for carrying out the invention. As will be realized, the versions described herein are capable of other different and obvious aspects, all without departing from the invention. Accordingly, the drawings and descriptions should be regarded as illustrative in nature and not restrictive.
Exemplary Devices and Methods
[0034] The device 105 may further include an input mechanism or user interface 145 to input information and/or make data/output requests. Exemplary input mechanisms or user interfaces 145 may include a touch screen, input buttons, a keyboard, a mouse, a microphone, and combinations thereof. In one embodiment, the patient diabetes monitoring system 100 enables continuous glucose monitoring in which device 105 is operable to take multiple measurements of a concentration of glucose or a substance indicative of the concentration or presence of glucose via the physiological data input device 110, and process that dataset, e.g. a dataset 131 containing a plurality of unsupervised daily CGM glucose measurements (CGM profiles), using the processor 120 to find similar patterns represented in the dataset. As used herein, continuous (or continual) glucose monitor (or monitoring) may include the period in which monitoring of glucose concentration is continuously, continually, and/or intermittently (e.g., regularly or irregularly) performed.
[0035] Referring to
[0036] As used herein, the physiological data input device 110 may be a continuous device, for example, a subcutaneous, transdermal (e.g., transcutaneous), or intravascular device. However, it should be understood that the devices and methods described herein can be applied to any device (including external devices) capable of detecting a concentration of glucose and providing an output signal that represents the concentration of glucose. The physiological data input device 110 in another embodiment can be hardware and/or software which can analyze a plurality of intermittent biological samples, for example, blood, interstitial fluid, other desired biological fluid, etc. The physiological data input device 110 can use any method of glucose-sensing, including enzymatic, chemical, physical, electrochemical, spectrophotometric, polarimetric, calorimetric, radiometric, etc. The physiological data input device 110 may use any method, including invasive, minimally invasive, and non-invasive sensing techniques, to provide an output signal indicative of, e.g., the glucose concentration or other physiological data. The output signal can be a raw data measurement that is used to provide a useful value of glucose to a user, such as a patient or physician, who may be using the device. Smoothing, evaluation methods, etc. may be applied to the raw data measurement to provide transformed data measurements to the user, such as discussed hereafter in later sections with reference made to
[0037] Data measurements provided in the dataset 131 may be derived from the intermittent collection of data comprising measurements made by a device, such as e.g., the physiological data input device 110 (for example, a current measurement that ultimately corresponds to a glucose amount or concentration). The data measurements may be further associated with relevant data tags. By way of example only, a data tag may include when a meal was eaten, insulin was given, exercise took place, etc. Additionally, a data tag may include the amount of nutritional content in a meal, insulin, oral medication, exercise, etc. The data measurements may further comprise determining transformed data measurements from one or more raw data measurements and associating those transformed data measurements with relevant data tags.
[0038] The data measurements in the dataset 131 are obtained from a particular biological system (e.g., blood, interstitial fluid, etc.) using a device, such as e.g., the physiological data input device 110, maintained in operative contact with the biological system over a time window. The time window may be a defined period of time (e.g., hour(s), day(s), etc.) to obtain a series of data measurements (e.g., second(s), minute(s), hour(s), etc.) resulting in at least one time window dataset, e.g., dataset 131. The time window may be started and stopped by the diabetic patient 102 as well. By way of example only, the diabetic patient 102 may start the time window at the beginning of a meal and stop the time window at some later date after the meal. The at least one time window dataset (or data measurements) 131 may be collected from a single individual. Alternatively, the at least one time window dataset (or data measurements) 131 may be collected from multiple individuals and compiled into a database, at either the time the at least one time window dataset (or data measurements) 131 was collected or subsequently. The at least one time window dataset 131 may include raw data measurements, transformed data measurements, raw or transformed data measurements associated with data tags, or a combination thereof from the sensor.
[0039] The physiological data input device 110 may be capable of measuring only glucose in one embodiment. Alternately, in other embodiments, the physiological data input device 110 may be capable of measuring any other physiological analyte of interest that is a specific substance or component that is being detected and/or measured by chemical, physical, enzymatic, or optical analysis. The dataset 131 for each physiological analyte is collected and compiled into a multi-analyte database such as, e.g., database 130. In another example, the database 130 can also be formulated by compiling data measurements collected using multiple monitors, each of which measures a single substance, resulting in the multi-analyte database.
[0040] Examples of physiological analytes can include any specific substance, component, or combinations thereof that one is desirous of detecting and/or measuring in a chemical, physical, enzymatic, or optical analysis. Such physiological analytes include, but are not limited to, urate/uric acid, glucose, urea (blood urea nitrogen), lactate and/or lactic acid, hydroxybutyrate, cholesterol, triglycerides, creatine, creatinine, insulin, hematocrit, hemoglobin, carbonate, calcium, potassium, sodium, chloride, bicarbonate, blood gases (e.g., carbon dioxide, oxygen, etc.), heavy metals (e.g., lead, copper, etc.), lipids, amino acids, enzyme substrates or products indicating a disease state or condition, other markers of disease states or conditions, etc. In the case of multi-analyte databases, all of the physiological analytes may be related to a single physiologic state or condition; alternatively, in other embodiments, each physiological analyte may be relevant to a different physiological state or condition.
[0041] In still other embodiments, one or more of the above described physiological data/information may be entered manually by the patient 102 to be included in the dataset 131, as well as requested for output (e.g., displayed on display 140, sent to another external device via data interface 115, etc.), via the user interface 145. In still other embodiments, the input device 110 may also include, for example, a controller, microcontroller, processor, microprocessor, etc. that is configured to receive and/or process signals, communicate with processor 120, and generate a CGM profile (or insulin profile). The CGM profile (or insulin profile) can be the most recent dataset 131 (e.g., the most recent at least one time window dataset gathered by the input device 110, a dataset from the current day, hour(s), minute(s), etc. provided in memory 135 and/or database 130) and/or any other dataset of interest, e.g., historical data (previous day(s), week(s), month(s), year(s), etc.) of the patient 102. The dataset 131 can be provided from the input device 110, the database 130, the memory 135, the user interface 145, and/or from any other external source of patient data that the device 105 may communicate with via the data interface 115. It is to be appreciated that the CGM profile (or insulin profile) can therefore be generated from any of the data available to the device 105, and by any method performed by the processor 120, the input device 110 (if provided with processing means), or an external device(s) operating on the data (and provided to the device via the data interface 115), in order to provide on the display 140 a pattern(s) of interest, such as, e.g., one or more glucose curves 133 depicted by
[0042] Exemplary methods for generating a glucose curve 133 may include: having the processor 120 draw a glucose curve using glucose data measurements provided by the physiological data input device 110, having the processor 120 draw a glucose curve using glucose data measurements read from database 130 and/or memory 135 for the at least one time window or other time periods, having the processor 120 draw a glucose curve using input received via the user interface 145, having the processor 120 select a glucose curve that represents a common behavior or condition (e.g., falling blood glucose during exercise, rise of blood glucose after a meal, etc.) that may be detected in the data of the patient 102, and combinations thereof. In other embodiments, the glucose curve need not be selected from actual glucose data measurements as discussed above in regard to historical and/or external data. The CGM profile (or insulin profile) can also be generated from data resulting from a query inputted via the user interface 145 and run by the processor 120 on recent data gathered by the input device 110 or stored data provided in database 130, memory 135 and/or in other external sources that were queried by the processor 120 via data interface 115. The CGM profile (or insulin profile) may also include any relevant data tags or multi-analyte data, and the generated and/or received CGM profile (or insulin profile) may be stored in the database 130 and/or memory 135 until needed by the processor 120 for an unsupervised daily monitoring profile clustering process discussed hereafter in a later section.
[0043] The data interface 115 may be hardware and/or software which provides the device 105 with the ability to communicate wired and/or wirelessly with other devices and components as discussed hereafter in some embodiments, as well as to read from and write to non-transitory computer-readable products or storage media, such as non-transitory computer-readable medium 148, in other embodiments. For the purposes of this description, a non-transitory computer-readable product or storage medium can be any apparatus that can contain or store programs and/or code for use by or in connection with a processor, apparatus, or device. Examples of a non-transitory computer-readable product or storage medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
[0044] Still referring to
[0045] Still referring to
[0046] Still referring to
[0047] Again referring to
Pre-Processing
[0052] The purpose of the pre-processing 202 of the dataset 131 is to control the amount of bias/aggressiveness of any penalty, on either the hyperglycemic or the hypoglycemic side, that may exist in the dataset due to the unsupervised conditions of the data collection, as well as to provide a data transformation that makes the transformed data symmetric for better statistical analysis. It is to be appreciated that the current consensus on accepted normal blood sugar levels for healthy people is between 4.0 and 6.0 mmol/L in the pre-prandial state and up to 7.0 mmol/L in the postprandial state. For people with diabetes (T1 or T2), recommended normal glucose levels are between 4.0 and 9.0 mmol/L. Outside this range the person is at risk of hyperglycemia if above 9.0 mmol/L, or of hypoglycemia if below 4.0 mmol/L. Others have proposed using a hazard function for SMBG measurements to evaluate the risk associated with each BG value; specifically, equation (1), often called the “Kovatchev function,” has been proposed as such a hazard function. Equation (1) is as follows:
$H(g) = \left(1.509\left[(\ln g)^{1.084} - 5.381\right]\right)^{2} \qquad (1)$
where H(g) is the transformed blood glucose and g is the blood glucose concentration, which with these constants is expressed in milligrams per deciliter. See, e.g., Kovatchev et al., “Symmetrization of the blood glucose measurement scale and its applications,” Diabetes Care, 1997, 20, 1655-1658. The hazard function of equation (1) has its center at 112.5 mg/dl (6.3 mmol/L), which is referred to as the optimal blood glucose concentration. Furthermore, the hazard associated with hypoglycemia rises significantly faster than that associated with hyperglycemia. This hazard function, however, is not useful for retrospective analysis of CGM data, since a healthcare professional would have the same concerns for a person with a postprandial peak glucose level of 12 mmol/L or of 15 mmol/L, even though the risks calculated by equation (1) would be significantly different, for example, as shown for illustration purposes in
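A quick numerical check makes the two properties above concrete. The sketch below evaluates equation (1) using the constants published by Kovatchev et al. (1.509, 1.084, 5.381), which expect glucose in mg/dL; the helper name `kovatchev_hazard` and the 18.0 conversion factor are illustrative assumptions, not part of the disclosed system.

```python
import math

MGDL_PER_MMOL = 18.0  # approximate mmol/L -> mg/dL factor for glucose (illustrative)


def kovatchev_hazard(g_mgdl: float) -> float:
    """Equation (1) with Kovatchev et al.'s published constants; g in mg/dL."""
    f = 1.509 * (math.log(g_mgdl) ** 1.084 - 5.381)
    return f * f


# Compare a hypoglycemic level, the optimum, and two postprandial peak levels.
for mmol in (3.0, 6.25, 12.0, 15.0):
    print(f"{mmol:5.2f} mmol/L -> H = {kovatchev_hazard(mmol * MGDL_PER_MMOL):.3f}")
```

For these inputs, 3 mmol/L (about 3 mmol/L below the center) receives a larger hazard than 12 mmol/L (almost 6 mmol/L above it), and 15 mmol/L receives nearly twice the hazard of 12 mmol/L, which is the discrepancy that makes equation (1) ill-suited to retrospective review.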
[0053]
$G_t = \alpha \ln(G - \beta) - \alpha \ln(\alpha) \qquad (2)$
where parameter $\alpha = T_c - \beta$ and parameter $\beta = D_r - 1$, in which $T_c$ is the center of the transformed space, $D_r$ is a minimum defined glucose level, $G_t$ is a transformed glucose level value, and $G$ is the original glucose level value in the dataset 131. The transformation for retrospective analysis performed on the same CGM profile traces depicted by
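Equation (2) is straightforward to compute. The following minimal sketch assumes glucose in mmol/L and uses hypothetical parameter values (T_c = 7.0 mmol/L, D_r = 2.0 mmol/L) chosen only for illustration; the document does not state the values used by algorithm 200, and the function name is likewise an assumption.

```python
import math


def retrospective_transform(g: float, t_c: float, d_r: float) -> float:
    """Equation (2): G_t = alpha*ln(G - beta) - alpha*ln(alpha),
    with beta = D_r - 1 and alpha = T_c - beta (glucose in mmol/L)."""
    beta = d_r - 1.0
    alpha = t_c - beta
    if alpha <= 0 or g <= beta:
        raise ValueError("requires T_c > D_r - 1 and G > D_r - 1")
    return alpha * math.log(g - beta) - alpha * math.log(alpha)


# Hypothetical parameters: center T_c = 7.0 mmol/L, minimum level D_r = 2.0 mmol/L.
profile = [3.5, 5.0, 7.0, 9.0, 12.0, 15.0]
transformed = [retrospective_transform(g, t_c=7.0, d_r=2.0) for g in profile]
print([round(v, 3) for v in transformed])
```

Because G − β equals α when G = T_c, the two logarithms cancel and the chosen center maps to 0; at G = D_r the first term vanishes, so the minimum defined level maps to −α ln α.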
Similarity Matrix Process
[0054] When the profiles are plotted with glucose along the y axis and time along the x axis, the similarity between two CGM profile traces can be computed from the straight-line distance between corresponding data points of the two time series. This is known as the L² norm. For a time series vector X with data points $X_i$, i = 1, 2, 3, ..., n, and another time series vector Y with data points $Y_i$, i = 1, 2, 3, ..., n, the distance between the two time series vectors can be calculated according to equation (3), defined as:
$d(X, Y) = \sqrt{\sum_{i=1}^{n} (X_i - Y_i)^2} \qquad (3)$
[0055] It is to be appreciated that the Euclidean distance of equation (3) is one of the most routinely used metrics, but it fails to take into account the dynamic nature of the time series. Therefore, according to the embodiments of the present invention, a distance metric is disclosed by equation (4) that takes into account both the glucose value (in actual or transformed space) and the dynamic component, i.e., the slope or rate of change of glucose, to compute a distance between the two time series. Equation (4) is defined as follows:
$d(X_i, Y_i) = k\,|X_i - Y_i| + (1 - k)\,|(m_x - m_y)(X_i - Y_i)| \qquad (4)$
where $X_i$ is a glucose value in a first time series X at time i; $Y_i$ is a glucose value in a second time series Y at time i; k is a weighting factor; $m_x$ is the slope at time i for time series X; and $m_y$ is the slope at time i for time series Y. The distance metric 900 of equation (4) is illustrated in the
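To see why equation (4) separates profiles that equation (3) cannot, the sketch below compares both metrics on two synthetic traces with the same pointwise offsets from a reference, one parallel to it and one with opposite trends. It assumes equal-length, uniformly sampled series and estimates the slopes m_x and m_y by finite differences per sample; that estimator, the default k = 0.5, the traces, and all function names are assumptions, since the document does not specify them.

```python
import math


def slopes(series):
    """Per-sample slope estimate (central differences inside, one-sided at
    the ends); an assumed stand-in for m_x and m_y."""
    n = len(series)
    if n < 2:
        return [0.0] * n
    inner = [(series[i + 1] - series[i - 1]) / 2.0 for i in range(1, n - 1)]
    return [series[1] - series[0]] + inner + [series[-1] - series[-2]]


def euclidean_distance(x, y):
    """Equation (3): straight-line (L2) distance between equal-length series."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def dynamic_distance(x, y, k=0.5):
    """Sum over i of equation (4): k|X_i - Y_i| + (1 - k)|(m_x - m_y)(X_i - Y_i)|."""
    mx, my = slopes(x), slopes(y)
    return sum(
        k * abs(xi - yi) + (1 - k) * abs((a - b) * (xi - yi))
        for xi, yi, a, b in zip(x, y, mx, my)
    )


x = [5, 6, 7, 8]           # reference trace
y_parallel = [6, 7, 8, 9]  # same trend, offset by 1
y_cross = [6, 5, 8, 7]     # same |offset| of 1, opposing local trends
print(euclidean_distance(x, y_parallel), euclidean_distance(x, y_cross))
print(dynamic_distance(x, y_parallel), dynamic_distance(x, y_cross))
```

Both comparison traces lie 1 unit from the reference at every sample, so equation (3) returns 2.0 for each, whereas the slope term of equation (4) raises the distance of the crossing trace from 2.0 to 4.0.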
[0056] It is to be appreciated that patient behavior may not be consistent within a day or between two days, and as a result the unsupervised CGM profile traces may be out of phase, or may show a delayed or compressed response to prescribed therapy such as, e.g., a correction bolus, an insulin bolus, meals, or physical activity. For example,
[0057] Dynamic time warping allows for elastic matching of two time series by local compression or elongation along the time axis. See Lucero, J. C.; Munhall, K. G.; Gracco, V. G.; Ramsay, J. O. (1997), “On the Registration of Time and the Patterning of Speech Movements”, Journal of Speech, Language, and Hearing Research 40: 1111-1117; see also Sakoe, Hiroaki; Chiba, Seibi, “Dynamic programming algorithm optimization for spoken word recognition”, IEEE Transactions on Acoustics, Speech and Signal Processing 26 (1): 43-49. The principle of dynamic time warping 1100 is briefly described with reference to
(a) Monotonicity, i.e., the path 1102 only moves forward in time;
(b) Continuity, i.e., the path 1102 cannot have breaks, i.e., it cannot skip data while moving forward, as depicted by arrows 1104;
(c) Boundary conditions satisfied, i.e., the path 1102 has to travel the entire length of both series and does not allow matching of only some of the samples (e.g., boxes 1106);
(d) Search window satisfied, i.e., local temporal shifts of the alignment path 1102 have to be within a pre-determined search width (e.g., lines 1108); and
(e) Slope satisfied, i.e., temporal compressions or elongations should not exceed a pre-determined width (e.g., line 1110).
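The five path conditions can be expressed as a simple validator. In this sketch a path is a list of 0-based (i, j) index pairs; the search window is modeled as a band |i − j| ≤ window, and the slope condition as a limit on consecutive purely horizontal or purely vertical steps. Both parameterizations, and the function name, are simplifying assumptions rather than the exact constraints used by the disclosed system.

```python
def is_valid_path(path, n, m, window, max_run):
    """Check a warping path of (i, j) pairs against conditions (a)-(e).

    `window` (band half-width) and `max_run` (longest allowed run of
    non-diagonal steps along one axis) are hypothetical parameters.
    """
    if not path or path[0] != (0, 0) or path[-1] != (n - 1, m - 1):
        return False  # (c) boundary conditions: must span both series
    run_axis, run_len = None, 0
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        di, dj = i1 - i0, j1 - j0
        if di < 0 or dj < 0:
            return False  # (a) monotonicity: never step back in time
        if di > 1 or dj > 1 or (di == 0 and dj == 0):
            return False  # (b) continuity: no skipped samples
        axis = None if (di == 1 and dj == 1) else ("i" if di else "j")
        if axis is not None and axis == run_axis:
            run_len += 1
        else:
            run_axis, run_len = axis, (1 if axis else 0)
        if run_len > max_run:
            return False  # (e) slope: too much local compression/elongation
    return all(abs(i - j) <= window for i, j in path)  # (d) search window


stair = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(is_valid_path(stair, 4, 4, window=3, max_run=2))  # run of 3 vertical steps
print(is_valid_path(stair, 4, 4, window=3, max_run=3))
```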
[0058] It is to be appreciated that during the aligning process the dynamic warping function temporally compresses or elongates the curves locally; to obtain better results, the algorithm 200 adds a penalty to the total distance between the two curves of the time series. To illustrate, the first and second time series X, Y of the transformed dataset are processed with the penalty as follows:
(a) Start at the origin, where the distance between the curves of the first time series X and the second time series Y is $X(1,1) = Y(1,1)$;
(b) Keep the first row a constant distance by $X(i,1) = X(i-1,1) + Y(i,1)$;
(c) Keep the first column constant by $X(1,j) = X(1,j-1) + Y(1,j)$; and
(d) Carry on for the next row and next column to the end of the search space of the transformed dataset, as defined by:
$X(i,j) = \min\big(X(i,j-1),\ X(i-1,j-1),\ X(i-1,j)\big) + Y(i,j) \qquad (5)$
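The recursion in equation (5) is the standard dynamic-programming form of dynamic time warping. The sketch below reads X(i, j) in equation (5) as the accumulated cost and Y(i, j) as the local distance between sample i of X and sample j of Y, computed with equation (4); this reading of the overloaded notation is our interpretation. The document does not specify the size of the warping penalty, so `step_penalty` is a hypothetical parameter charged on every non-diagonal (compressing or elongating) step; with `step_penalty=0` and no window the code reduces to equation (5). The finite-difference slope estimator is also an assumption.

```python
import math


def _slopes(s):
    """Per-sample slope estimate (assumed estimator for m_x, m_y)."""
    n = len(s)
    if n < 2:
        return [0.0] * n
    return [s[1] - s[0]] + [(s[i + 1] - s[i - 1]) / 2.0 for i in range(1, n - 1)] + [s[-1] - s[-2]]


def dtw_distance(x, y, k=0.5, window=None, step_penalty=0.0):
    """Total DTW distance using equation (4) as the local cost."""
    n, m = len(x), len(y)
    mx, my = _slopes(x), _slopes(y)

    def local(i, j):
        # Equation (4) for sample i of X against sample j of Y
        diff = x[i] - y[j]
        return k * abs(diff) + (1 - k) * abs((mx[i] - my[j]) * diff)

    w = max(n, m) if window is None else max(window, abs(n - m))
    acc = [[math.inf] * m for _ in range(n)]  # accumulated cost, X(i, j) in (5)
    for i in range(n):
        for j in range(max(0, i - w), min(m, i + w + 1)):
            if i == 0 and j == 0:
                acc[0][0] = local(0, 0)  # step (a): start at the origin
                continue
            best = math.inf
            if j > 0:
                best = min(best, acc[i][j - 1] + step_penalty)  # elongation
            if i > 0 and j > 0:
                best = min(best, acc[i - 1][j - 1])  # diagonal match
            if i > 0:
                best = min(best, acc[i - 1][j] + step_penalty)  # compression
            acc[i][j] = best + local(i, j)  # steps (b)-(d), equation (5)
    return acc[n - 1][m - 1]


x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]  # same shape, one sample earlier
print(dtw_distance(x, y, k=1.0))                    # -> 0.0
print(dtw_distance(x, y, k=1.0, step_penalty=0.5))  # -> 1.0
print(dtw_distance(x, y, k=1.0, window=0))          # diagonal only -> 4.0
```

Here `y` is `x` shifted one sample earlier. With elastic alignment the shift costs nothing, whereas forcing a one-to-one (diagonal) match with `window=0` yields a distance of 4.0; the penalty restores a cost for the two warping steps needed to realign the curves.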
[0059] In equation (5) above, the distance metric of equation (4) is used in place of a simple L2 norm (Euclidean distance). As a result, when the two curves X and Y are aligned, the dynamic warping function returns a total distance between the two curves that accounts for differences along the time axis as well as in glucose values.
Agglomeration Process
[0060] Due to its deterministic nature, hierarchical clustering yields consistent labeling, i.e., cluster members do not migrate from one cluster to another on repeated runs. See, e.g., Kaufman, L.; Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis (1 ed.). New York: John Wiley. ISBN 0-471-87876-6. This is particularly important, for example, in electronic consultations, where the health care provider (HCP) and the patient may be looking at the same dataset remotely on their respective computers, smartphones, etc. If the clustering algorithm 200 were not deterministic, the HCP and the patient could end up looking at different cluster members within the same labeled cluster, which would result in confusion and could induce medical error (incorrect therapy).
[0061] Using the output of the similarity matrix process 204, e.g., the distance matrix shown in
(b) Let each of the data points be a cluster;
(c) Repeat the following:
[0062] i. Merge the two closest clusters, and
[0063] ii. Update the distance matrix; and
(d) Repeat until only a single cluster remains.
Without a predefined stopping condition, the above process 206 starts with each data point in the distance matrix treated as its own cluster and automatically keeps merging until only one ‘super’ cluster remains.
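Steps (b) through (d) can be sketched with Ward's linkage (Ward, 1963) by updating the distance matrix with the Lance-Williams recurrence after each merge. This is a minimal O(n³) illustration with a hypothetical function name, not the disclosed implementation; note that the Ward update is exact only for Euclidean distances, so applying it to a DTW-based matrix, as the similarity matrix process would produce, is a heuristic.

```python
import math


def ward_agglomerate(dist):
    """Agglomerate a symmetric distance matrix with Ward's linkage.

    Starts from singleton clusters, repeatedly merges the two closest
    clusters, and updates the (squared) distance matrix with the
    Lance-Williams recurrence for Ward's method.
    Returns (members_a, members_b, merge_height) for each merge, in order.
    """
    n = len(dist)
    d2 = [[float(dist[i][j]) ** 2 for j in range(n)] for i in range(n)]
    members = {i: [i] for i in range(n)}  # (b) each data point is a cluster
    merges = []
    while len(members) > 1:  # (d) repeat until a single cluster remains
        ids = sorted(members)
        # (c) i: closest pair of active clusters (ties broken by lowest ids)
        h2, a, b = min((d2[u][v], u, v) for pos, u in enumerate(ids) for v in ids[pos + 1:])
        na, nb = len(members[a]), len(members[b])
        # (c) ii: Lance-Williams update; the merged cluster reuses slot a
        for c in ids:
            if c in (a, b):
                continue
            nc = len(members[c])
            d2[a][c] = d2[c][a] = (
                (na + nc) * d2[a][c] + (nb + nc) * d2[b][c] - nc * h2
            ) / (na + nb + nc)
        merges.append((sorted(members[a]), sorted(members[b]), math.sqrt(max(h2, 0.0))))
        members[a] = members[a] + members.pop(b)
    return merges


points = [0.0, 1.0, 5.0, 6.0]
dist = [[abs(p - q) for q in points] for p in points]
for left, right, height in ward_agglomerate(dist):
    print(left, right, round(height, 3))
```

Each merge records the pair of clusters joined and the merge height; one plausible way to obtain a distance curve d(l) over the number of clusters l is to read these heights in reverse merge order.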
[0064] It is to be appreciated that there are several ways to merge data points into clusters or to merge two clusters into one cluster. Perhaps the most robust method uses Ward's linkage, which minimizes the overall increase in within-cluster variance. See Ward, J. H., Jr. (1963), “Hierarchical Grouping to Optimize an Objective Function”, Journal of the American Statistical Association, 58, 236-244. As the name suggests, hierarchical clustering yields a relationship between the data points as the clustering progresses from one stage to another. This relationship can be represented using a tree structure known as a dendrogram 1400, as shown in
[0065] Finding the “right” number of clusters is perhaps one of the most challenging problems in data mining. This problem can be partly addressed by analyzing the ‘relationship’ depicted in the dendrogram 1400. Each stage where the data is merged gives an indication of the similarity of the members within the dataset, which is shown
[0066] A plot to find an optimum (minimum) number of clusters 1500 is graphically depicted by
[0067] (a) Let the first k points on the distance curve d(l) with p points be 1, 2, ..., k, and find the slopes:
[0068] $m_1 = \frac{d(2) - d(1)}{2 - 1}$,
[0069] $m_2 = \frac{d(3) - d(1)}{3 - 1}$, ...,
[0070] $m_{k-1} = \frac{d(k) - d(1)}{k - 1}$;
[0071] (b) Calculate the median of the slopes from step (a): $m_a = \operatorname{median}(m_1, m_2, \ldots, m_{k-1})$;
[0072] (c) Let the last n points on the distance curve d(l) with p points be p−n, ..., p−1, p, and find the slopes:
[0073] $m_1 = \frac{d(p) - d(p-1)}{p - (p-1)}$,
[0074] $m_2 = \frac{d(p) - d(p-2)}{p - (p-2)}$, ...,
[0075] $m_n = \frac{d(p) - d(p-n)}{p - (p-n)}$; and
[0076] (d) Calculate the median of the slopes from step (c): $m_b = \operatorname{median}(m_1, m_2, \ldots, m_n)$,
where a first line 1506 is defined by the median slope m.sub.a with a starting point as the first point along the distance curve d(l) 1502, and a second line 1508 is defined by the median slope m.sub.b with a starting point as the end point along the distance curve d(l), such that the inflection point 1504 is a projection of an intersection point 1510 between the first and second lines 1506 and 1508 on the distance curve d(l) and denoted by l.sub.p. Next in the process 206, if inflection point l.sub.p is not an integer, then the optimal minimum number of clusters L.sub.min is determined by the algorithm 200 in process 206 by the following:
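Independently of the integer-conversion rule for $L_{\min}$, the median-slope construction of [0067] to [0076] can be sketched in Python. The name `median_slope_knee` is hypothetical, and the sketch assumes the projection onto $d(l)$ is vertical, so $l_p$ is the abscissa where the lines $d(1) + m_a(l-1)$ and $d(p) + m_b(l-p)$ meet, i.e. $l_p = \frac{d(p) - d(1) + m_a - p\,m_b}{m_a - m_b}$. It returns the real-valued $l_p$ only.

```python
from statistics import median


def median_slope_knee(d, k, n):
    """Real-valued knee l_p of a distance curve given as d[0] = d(1), ..., d[p-1] = d(p).

    Hypothetical helper; assumes a vertical projection of the line intersection.
    """
    p = len(d)
    if not (2 <= k <= p and 1 <= n <= p - 1):
        raise ValueError("need 2 <= k <= p and 1 <= n <= p - 1")

    def D(l):
        # 1-based access, matching the notation d(l) in the text
        return d[l - 1]

    # (a)+(b): slopes from point 1 to points 2..k, then their median m_a
    m_a = median((D(j + 1) - D(1)) / j for j in range(1, k))
    # (c)+(d): slopes from points p-1..p-n to point p, then their median m_b
    m_b = median((D(p) - D(p - j)) / j for j in range(1, n + 1))
    if m_a == m_b:
        raise ValueError("parallel lines: no intersection")
    # Intersection of y = D(1) + m_a (l - 1) and y = D(p) + m_b (l - p)
    return (D(p) - D(1) + m_a - m_b * p) / (m_a - m_b)
```

For a curve that falls with slope −2 over its first four points and −0.5 afterwards, `median_slope_knee([10, 8, 6, 4, 3.5, 3, 2.5, 2], k=3, n=3)` returns 4.0, the corner between the two segments. Using medians rather than a single end-point slope makes each line less sensitive to one irregular merge distance.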
[0077] For example, starting with a dataset 131 of unsupervised CGM profile traces 133 from 10 days' worth of data from a diabetic user wearing a CGM, which is graphically depicted by
[0078] In addition, it is to be appreciated that the processor 120 in determining and producing clusters, such as the displayed clusters 1600A, 1600B, 1600C, 1600D and 1600E, based on the dataset 131 as disclosed herein, creates symbolic groupings/representations of unambiguous qualitative data regarding the sufficiency of the diabetes control therapy of the user, thereby providing a more specific and concrete way of processing and representing information (transformed via the clustering) than previously found in the prior art. For example, as depicted by
[0079] Likewise, as indicated in the next distinct cluster 1600B depicted by
[0080] To conveniently and quickly view the found distinct clusters 1600A, 1600B, 1600C, 1600D and 1600E, e.g. from an initial data (dump) plot 1605 display of the dataset 131 depicted by
[0081] In view of the above disclosure, it is apparent that in one embodiment disclosed is a patient diabetes monitoring system for a patient. The system comprises a physiological data input device which acquires a plurality of physiological measurements of the patient within a time window to generate at least one time window dataset of collected unsupervised daily monitoring profiles; a memory storing an unsupervised daily monitoring profile clustering algorithm; and a processor in communication with said input device to receive said generated at least one time window dataset, and in communication with said memory in order to execute said unsupervised daily monitoring profile clustering algorithm, wherein said unsupervised daily monitoring profile clustering algorithm when executed by said processor causes said processor automatically to: pre-process the dataset to control an amount of bias/aggressiveness from the collected unsupervised daily monitoring profiles to generate a pre-processed dataset, build a similarity matrix from the pre-processed dataset, and output an optimum number of similarity clusters found by the processor from the similarity matrix. In another embodiment of the system, the pre-processing of the dataset controls the amount of bias/aggressiveness via a data transformation of the dataset that makes the pre-processed dataset symmetric for retrospective analysis. In another embodiment of the system, the data transformation for retrospective analysis results from processing the dataset with a hazard function defined by $G_t = \alpha \ln(G - \beta) - \alpha \ln(\alpha)$, where the parameter $\alpha = T_c - \beta$ and the parameter $\beta = D_r - 1$, in which $T_c$ is a center of the transformed space, $D_r$ is a minimum defined glucose level, $G_t$ is the transformed data of the blood glucose concentration measurements provided in the dataset, and $G$ denotes the original glucose level values of the blood glucose concentration measurements provided in the dataset, measured in millimoles per liter.
In another embodiment of the system, the physiological data input device is a CGM.
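The hazard function of [0081] can be sketched directly. Since $G_t = \alpha \ln\frac{G-\beta}{\alpha}$, a reading at $G = T_c$ maps to 0 and a reading at $G = D_r$ maps to $-\alpha \ln \alpha$. The function name is hypothetical, and the values $T_c = 7.0$ and $D_r = 2.0$ mmol/L used below are illustrative placeholders, not values taken from the disclosure.

```python
import math


def hazard_transform(g, t_c, d_r):
    """G_t = alpha*ln(G - beta) - alpha*ln(alpha), with beta = D_r - 1 and alpha = T_c - beta.

    Hypothetical helper; g, t_c and d_r are in mmol/L.
    """
    beta = d_r - 1.0
    alpha = t_c - beta
    if alpha <= 0:
        raise ValueError("T_c must exceed D_r - 1")
    if g <= beta:
        raise ValueError("G must exceed D_r - 1 for the logarithm to be defined")
    return alpha * math.log(g - beta) - alpha * math.log(alpha)


# Illustrative parameters only (assumed, not from the disclosure)
T_C, D_R = 7.0, 2.0
profile = [4.0, 7.0, 12.0]
transformed = [hazard_transform(g, T_C, D_R) for g in profile]
```

Because the map is logarithmic, equal glucose steps below $T_c$ move $G_t$ further than equal steps above it, while the ordering of readings is preserved.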
[0082] In another embodiment of the above mentioned system, after the pre-processing of the dataset, the pre-processed dataset is then processed to build the similarity matrix to account for time-series dynamics in the pre-processed dataset. In another embodiment of the system, the time-series dynamics in the pre-processed dataset are accounted for by a distance matrix that uses glucose value levels in an actual space or transformed space as well as a rate of change of the glucose value levels to compute a distance between each pair of similar time series of data presented in the pre-processed dataset. In another embodiment of the system, the distance matrix is defined by $d(X_i, Y_i) = k\,|X_i - Y_i| + (1-k)\,|(m_x - m_y)(X_i - Y_i)|$, where $X_i$ is a glucose level value in a first time series $X$ at time $i$, $Y_i$ is a glucose value in a second time series $Y$ at time $i$, $k$ is a weighting factor, $m_x$ is the slope at time $i$ for the first time series $X$, and $m_y$ is the slope at time $i$ for the second time series $Y$. In another embodiment of the system, a sum of distances between the first and second time series $X$ and $Y$ is used in an elastic alignment procedure to account for varying temporal responses/shifts in the pre-processed dataset. In another embodiment of the system, the elastic alignment procedure is a dynamic time warping process, which allows for elastic matching of the first and second time series $X$ and $Y$ by local compression or elongation along a time axis. In another embodiment of the system, any penalty resulting from the dynamic time warping process is added to the sum of the distances between the first and second time series $X$ and $Y$. In another embodiment of the system, the first and second time series are CGM curves. In another embodiment of the system, the first and second time series $X$ and $Y$ of the pre-processed dataset are processed by the processor with the penalty as follows:
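[0083] (a) Start at the origin, where the cumulative distance between the curves of the first time series $X$ and the second time series $Y$ is $D(1,1) = d(1,1)$, with $d(i,j) = d(X_i, Y_j)$ denoting the local distance defined above and $D(i,j)$ the cumulative distance;
[0084] (b) Accumulate the distance along the boundary $j = 1$: $D(i,1) = D(i-1,1) + d(i,1)$;
[0085] (c) Accumulate the distance along the boundary $i = 1$: $D(1,j) = D(1,j-1) + d(1,j)$; and
[0086] (d) Carry on for the next row and next column to the end of the search space of the pre-processed dataset, as defined by: $D(i,j) = \min\big(D(i,j-1),\, D(i-1,j-1),\, D(i-1,j)\big) + d(i,j)$.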
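A sketch of the slope-weighted local distance from [0082] combined with the cumulative recurrence (a) to (d), written with 0-based indices. The function names, the default `k = 0.5`, and the slope estimate (first differences, with the first sample reusing the first difference) are assumptions, since they are not specified here. The sketch returns the cumulative distance at the end of the search space and adds no separate warping penalty.

```python
def slopes(series):
    """Rate of change per sample. Assumed estimator: first differences,
    with the first sample reusing the first difference."""
    if len(series) < 2:
        return [0.0] * len(series)
    diffs = [b - a for a, b in zip(series, series[1:])]
    return [diffs[0]] + diffs


def dtw(x, y, k=0.5):
    """Cumulative DTW distance between series x and y (hypothetical helper)."""
    mx, my = slopes(x), slopes(y)

    def d(i, j):
        # Local distance: k*|X_i - Y_j| + (1 - k)*|(m_x - m_y)*(X_i - Y_j)|
        diff = x[i] - y[j]
        return k * abs(diff) + (1 - k) * abs((mx[i] - my[j]) * diff)

    n, m = len(x), len(y)
    D = [[0.0] * m for _ in range(n)]
    D[0][0] = d(0, 0)                      # (a) origin
    for i in range(1, n):                  # (b) boundary j = 1
        D[i][0] = D[i - 1][0] + d(i, 0)
    for j in range(1, m):                  # (c) boundary i = 1
        D[0][j] = D[0][j - 1] + d(0, j)
    for i in range(1, n):                  # (d) rest of the search space
        for j in range(1, m):
            D[i][j] = min(D[i][j - 1], D[i - 1][j - 1], D[i - 1][j]) + d(i, j)
    return D[n - 1][m - 1]
```

Two profiles with the same shape shifted by one sample, such as `[0, 0, 1, 2, 1, 0]` and `[0, 1, 2, 1, 0, 0]`, get a DTW distance of 0.0, whereas a sample-by-sample comparison would report a positive distance. This is the elastic matching that [0082] describes.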
[0087] In another embodiment of the above disclosed system, the output of the process that builds the similarity matrix is checked against one or more conditions to evaluate whether a determined alignment path is a valid path, the one or more conditions being: monotonicity, continuity, boundary conditions, search window, and slope. In another embodiment of the system, the output of the similarity matrix process is then used in an agglomerative clustering process to output similarity clusters, the agglomerative clustering process having the following pseudocode:
(a) Compute a distance matrix between data points of the output;
(b) Let each of the data points be a cluster;
(c) Repeat the following:
[0088] i. Merge the two closest clusters, and
[0089] ii. Update the distance matrix; and
(d) Repeat until only a single cluster remains.
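[0087] names the alignment-path conditions but does not define them. The sketch below gives one hypothetical reading for a warping path of 0-based `(i, j)` cells. Boundary means the path starts at `(0, 0)` and ends at `(n-1, m-1)`. Monotonicity means indices never decrease, and continuity means each step moves at most one cell in each direction. The search window is assumed to be a Sakoe-Chiba style band `|i - j| <= window`, and the slope condition is assumed to cap the number of consecutive purely horizontal or vertical steps. The function name and both optional parameters are assumptions.

```python
def is_valid_path(path, n, m, window=None, max_run=None):
    """Check a warping path against assumed readings of the [0087] conditions."""
    # Boundary conditions: start and end at opposite corners of the search space
    if not path or path[0] != (0, 0) or path[-1] != (n - 1, m - 1):
        return False
    run_i = run_j = 0  # current runs of purely vertical / purely horizontal steps
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        di, dj = i1 - i0, j1 - j0
        # Monotonicity: indices never decrease
        if di < 0 or dj < 0:
            return False
        # Continuity: at most one cell per direction, and no stationary step
        if di > 1 or dj > 1 or (di == 0 and dj == 0):
            return False
        # Slope constraint (assumed form): cap consecutive non-diagonal steps
        if max_run is not None:
            run_i = run_i + 1 if (di == 1 and dj == 0) else 0
            run_j = run_j + 1 if (di == 0 and dj == 1) else 0
            if run_i > max_run or run_j > max_run:
                return False
    # Search window (assumed band): stay within `window` cells of the diagonal
    if window is not None and any(abs(i - j) > window for i, j in path):
        return False
    return True
```

A diagonal path such as `[(0, 0), (1, 1), (2, 2)]` passes every check, while `[(0, 0), (2, 2)]` fails continuity because it skips a cell.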
[0090] In another embodiment of the system, an inflection point in the distance matrix is calculated by the processor to find the optimal minimum number of clusters. In another embodiment of the system, if d(l) is a distance curve in the distance matrix, d′(l) is a first derivative of the distance curve, and d″(l) is a second derivative of the distance curve, and if d′(l) exists, then the optimal minimum number of clusters along the curve d(l) is calculated by the processor to be a point l where d″(l)=0. In another embodiment of the system, the processor calculates the inflection point as follows:
(a) Let the first k points on the distance curve $d(l)$, which has $p$ points, be $1, 2, \ldots, k$, and find the slopes from the first point: [0091] $m_1 = \frac{d(2)-d(1)}{2-1}$, $m_2 = \frac{d(3)-d(1)}{3-1}$, $\ldots$, $m_{k-1} = \frac{d(k)-d(1)}{k-1}$;
(b) Calculate the median of the slopes from step (a): $m_a = \operatorname{median}(m_1, m_2, \ldots, m_{k-1})$;
(c) Let the last points on the distance curve $d(l)$ be $p-n, \ldots, p-1, p$, and find the slopes to the end point: [0092] $m'_1 = \frac{d(p)-d(p-1)}{p-(p-1)}$, $m'_2 = \frac{d(p)-d(p-2)}{p-(p-2)}$, $\ldots$, $m'_n = \frac{d(p)-d(p-n)}{p-(p-n)}$; and
(d) Calculate the median of the slopes from step (c): $m_b = \operatorname{median}(m'_1, m'_2, \ldots, m'_n)$,
where a first line is defined by the median slope $m_a$ starting at the first point of the distance curve $d(l)$, and a second line is defined by the median slope $m_b$ starting at the end point of the distance curve $d(l)$, the inflection point being the projection onto the distance curve $d(l)$ of the intersection point of the first and second lines, denoted by $l_p$; and if the inflection point $l_p$ is not an integer, then the optimal minimum number of clusters $L_{\min}$ is found by:
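Separately from the median-slope construction, the second-derivative criterion $d''(l) = 0$ stated in [0090] can be approximated on a sampled curve with central second differences $d(l-1) - 2d(l) + d(l+1)$, interpolating linearly where they change sign. This finite-difference reading and the function name are assumptions made for illustration. A purely convex elbow curve has no sign change, and the sketch then returns `None`.

```python
def second_difference_zero_crossing(d):
    """1-based l where the discrete second difference of d vanishes or changes sign.

    Hypothetical helper; d[0] holds d(1). Returns None if no zero or sign change.
    """
    # d2[t] approximates d''(l) at l = t + 2 (1-based), centred on d[t + 1]
    d2 = [d[t] - 2 * d[t + 1] + d[t + 2] for t in range(len(d) - 2)]
    for t, a in enumerate(d2):
        if a == 0:
            return float(t + 2)
        if t + 1 < len(d2) and a * d2[t + 1] < 0:
            b = d2[t + 1]
            # Linear interpolation between l = t + 2 and l = t + 3
            return (t + 2) + a / (a - b)
    return None
```

For the S-shaped sequence `[0, 1, 4, 9, 12, 13]`, the second differences are 2, 2, −2, −2, so the crossing is placed at l = 3.5.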
[0093] In still another embodiment, disclosed is a non-transitory computer-readable medium that stores a program that, when executed by a processor, causes the processor to execute, via a patient diabetes monitoring system having a physiological data input device which acquires a plurality of physiological measurements of the patient within a time window to generate at least one time window dataset of collected unsupervised daily monitoring profiles and which is in communication with said processor, such that said processor receives said generated at least one time window dataset, and in communication with said memory, an unsupervised daily monitoring profile clustering algorithm that causes said processor to automatically: pre-process the dataset to control an amount of bias/aggressiveness from the collected unsupervised daily monitoring profiles to generate a pre-processed dataset, build a similarity matrix from the pre-processed dataset, and output an optimum number of similarity clusters. In another embodiment of the non-transitory computer-readable medium, the at least one time window dataset from a patient is a CGM profile or an insulin profile and comprises raw data, transformed data, raw data associated with related data tags, transformed data associated with related data tags, or combinations thereof.
[0094] In yet another embodiment, disclosed is a method for identifying day(s) where a diabetes control therapy was inadequate for a patient using a monitoring system comprising a display device, a physiological data input device and a processor. The method comprises receiving automatically from the physiological data input device a plurality of physiological measurements of the patient within a time window to generate at least one time window dataset of collected unsupervised daily monitoring profiles; and executing from a memory a stored unsupervised daily monitoring profile clustering algorithm and causing the processor automatically to: pre-process the dataset to control an amount of bias/aggressiveness from the collected unsupervised daily monitoring profiles, thereby generating a pre-processed dataset, build a similarity matrix from the pre-processed dataset, and output on the display device an optimum number of similarity clusters found by the processor from the similarity matrix.
[0095] While several devices and components thereof have been discussed in detail above, it should be understood that the components, features, configurations, and methods of using the devices discussed are not limited to the contexts provided above. In particular, components, features, configurations, and methods of use described in the context of one of the devices may be incorporated into any of the other devices. Furthermore, not limited to the further description provided below, additional and alternative suitable components, features, configurations, and methods of using the devices, as well as various ways in which the teachings herein may be combined and interchanged, will be apparent to those of ordinary skill in the art in view of the teachings herein.
[0096] Having shown and described various versions in the present disclosure, further adaptations of the methods and systems described herein may be accomplished by appropriate modifications by one of ordinary skill in the art without departing from the scope of the present invention. Several of such potential modifications have been mentioned, and others will be apparent to those skilled in the art. For instance, the examples, versions, geometrics, materials, dimensions, ratios, steps, and the like discussed above are illustrative and are not required. Accordingly, the scope of the present invention should be considered in terms of the following claims and understood not to be limited to the details of structure and operation shown and described in the specification and drawings.