ANALYSIS OF CARDIAC DATA
20210353166 · 2021-11-18
Assignee
Inventors
- Marek Sirendi (Tallinn, EE)
- Joshua Steven Oppenheimer (Arlington, VA, US)
- Marek REI (Cambridge, Cambridgeshire, GB)
Cpc classification
A61B5/7282
HUMAN NECESSITIES
G16H50/20
PHYSICS
A61B5/352
HUMAN NECESSITIES
A61B5/0816
HUMAN NECESSITIES
G16H50/30
PHYSICS
A61B5/349
HUMAN NECESSITIES
G16H50/70
PHYSICS
A61B5/7275
HUMAN NECESSITIES
A61B5/746
HUMAN NECESSITIES
A61B5/0245
HUMAN NECESSITIES
International classification
A61B5/00
HUMAN NECESSITIES
A61B5/0245
HUMAN NECESSITIES
A61B5/08
HUMAN NECESSITIES
G16H50/20
PHYSICS
Abstract
The present invention relates to a method of analysing cardiac data relating to a patient, comprising: providing cardiac data relating to the patient—optionally by using a means for providing physiological data (20); determining one or more properties of the data, wherein the or each property is determined over a particular context length, the context length being selected based on the or each property—optionally using an analysis module (24); comparing the or each property against a respective predetermined threshold value, thereby to indicate a probability of the patient experiencing a cardiac event—optionally using a means for providing an output (26); and providing an output based on the comparison. A system and apparatus corresponding to this method is also disclosed.
Claims
1. A method of analysing cardiac data relating to a patient, comprising: providing cardiac data relating to the patient; determining a property of the data, wherein the property is determined over a particular context length, the context length being selected based on the property; comparing one or more features of the property against a predetermined threshold value, thereby to indicate a probability of the patient experiencing a cardiac event; and providing an output based on the comparison.
2. The method of claim 1, further comprising modelling the property using a function; wherein comparing the one or more features of the property against the predetermined threshold value comprises comparing one or more descriptors of the function against a predetermined feature threshold value.
3. The method of claim 1, wherein determining a property of the data comprises: determining a plurality of datapoints related to the property; and modelling the property using a function comprises modelling the distribution of the datapoints using a function.
4. The method of claim 2, wherein modelling the property using a function comprises one or more of: determining a probability density function for the property; and superposing one or more Gaussian functions, preferably superposing Gaussian functions of equal surface.
5. (canceled)
6. The method of claim 2, wherein comparing one or more descriptors of the function comprises comparing at least one of: a mean; a variance; and a kurtosis.
7. The method of claim 1, further comprising providing contextual data relating to the patient; wherein the threshold value is dependent upon the contextual data.
8. The method of claim 1, further comprising: comparing a further property against a predetermined contextual threshold value, wherein the contextual threshold value is dependent upon contextual data; and providing an output based on both the comparison of the property and the comparison of the further property.
9. The method of claim 8, wherein the contextual data comprises at least one of: historic data related to the patient, an electronic health record related to the patient, physical characteristics of the patient; and demographic characteristics of the patient.
10. The method of claim 1, comprising: representing the data as a series of fixed size representations; providing an attention mechanism arranged to identify one or more points of interest within the data; and providing an output based on the points of interest; optionally, wherein representing the data as a series of fixed size representations comprises using a network operating over fixed-sized windows of data and/or wherein representing the data comprises using a neural network and/or a long short-term memory network.
11. (canceled)
12. The method of claim 1, wherein the threshold value is determined based on a dataset comprising a plurality of data obtained from multiple sources.
13. The method of claim 1, wherein the or each property is determined over a context length which is an optimally discriminating context length for that property.
14. The method of claim 1, wherein the properties comprise at least one of: a mean; a standard deviation; a standard deviation in successive differences; a measured heart rate variability (HRV) of a patient; and a fraction of multiple heartbeats that exceed an abnormality threshold.
15. The method of claim 1, wherein the predetermined threshold is determined by: training at least two classifiers to classify a property of multiple heartbeats within the cardiac data using at least one machine learning algorithm; and combining the at least two classifiers to produce a hybrid classifier; wherein the combination is based on a performance metric.
16. A method of training a hybrid classifier for analysing cardiac data related to a patient, the method comprising the steps of: training at least two classifiers to classify a property of multiple heartbeats within the cardiac data using two or more different machine learning algorithms; and combining the at least two classifiers to produce a hybrid classifier; wherein the combination is based on a performance metric.
17. The method of claim 16, further comprising determining a best performing classifier and a second best performing classifier based upon a performance metric; outputting the classification of the best performing classifier when the output of the best performing classifier is not close to a decision boundary; and outputting the classification of the second best performing classifier when the output of the best performing classifier is close to the decision boundary; optionally, wherein the output of the best performing classifier is considered to be not close to the decision boundary when a threshold probability of a correct classification is exceeded.
18. The method of claim 16, wherein training at least two classifiers comprises one or more of: combining at least two trained classifiers to produce a hybrid classifier; a. wherein combining the at least two trained classifiers comprises applying weightings to each classifier based on a performance metric associated with each respective classifier; b. providing annotated cardiac data, wherein the annotation indicates the occurrence of one or more cardiac events; training a detection classifier to detect cardiac events using the annotated cardiac data; labelling unannotated cardiac data using the trained detection classifier; and training a classifier to classify a property of multiple heartbeats using the labelled cardiac data; optionally, wherein labelling unannotated cardiac data using the trained detection classifier comprises labelling a subset of unannotated cardiac data dependent upon a threshold probability of correctness; and using a genetic algorithm and/or simulated annealing.
19. The method of claim 16, wherein the performance metric comprises at least one of: an accuracy; a sensitivity; a specificity; and an area under a receiver operating characteristic (ROC) curve.
20. (canceled)
21. (canceled)
22. The method of claim 16, further comprising: providing a reference dataset of annotated cardiac data; providing an input dataset of unannotated cardiac data; normalising each member of the reference dataset and each member of the input dataset to have the same dimensions; comparing each normalised member of the input dataset with one or more normalised members of the reference dataset to identify a measure of similarity; determining labels for the input dataset dependent upon the respective measures of similarity; and training a classifier to classify a property of multiple heartbeats using the labelled cardiac data; optionally wherein comparing each normalised member of the input dataset with one or more normalised members of the reference dataset comprises determining a root mean square error (RMSE).
23. The method of claim 16, wherein the cardiac data comprises ECG signals.
24. A system for analysing cardiac data relating to a patient, comprising: means for providing cardiac data relating to the patient; an analysis module for determining a property of the data, wherein the property is determined over a particular context length, the context length being selected based on the property; a comparison module for comparing the property against a predetermined threshold value, thereby to indicate a probability of the patient experiencing a cardiac event; and a presentation module for providing an output based on the comparison; optionally, wherein the analysis module comprises a hybrid classifier trained according to the method of claim 16.
25. (canceled)
Description
[0094] At least one exemplary embodiment of the present invention will now be described with reference to the accompanying figures, wherein similar reference numerals may be used to refer to similar features, and in which:
[0114] In what follows, cardiac data that terminates with a Ventricular Tachyarrhythmia (VTA) is referred to as ‘arrhythmic’, and cardiac data from control samples is labelled as ‘normal’.
[0115] Prediction Method for Use on Patients
[0117] The analysis module uses the input physiological data to analyse the heartbeat of the patient, and determine one or more probabilities of the patient experiencing a cardiac event within a period of time in the future.
[0118] RR interval sequences, as illustrated in
[0119] The method will now be described in more detail, with reference to the process flow illustrated in
[0120] Patients that suffer from VTAs are also likely to suffer from ectopic beats such as premature ventricular complexes. Therefore, as indicated in the process flow in
[0121] The effect of outlier removal on the data is illustrated in
[0122] The cleaned physiological data are then pre-processed as indicated in the process flow of
[0123] A series of derived quantities are computed based on RR interval data. The derived quantities (listed below) are referred to (interchangeably) as ‘features’ or ‘properties’:
[0124] i) Time Domain
[0125] The arithmetic mean, μ, of the RR intervals;
[0126] The standard deviation, σ, of the RR intervals;
[0127] The standard deviation in successive differences, σ.sub.Diff, of the RR intervals.
[0128] The distribution of RR intervals in the time domain can provide valuable data relating to the probability of a patient undergoing a cardiac event.
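The three time-domain quantities listed above can be sketched as follows. This is a minimal illustration rather than the implementation of the described method: the function name `time_domain_features` is hypothetical, and the use of the population (rather than sample) standard deviation is an assumption.

```python
import math


def time_domain_features(rr):
    """Return (mean, std, std of successive differences) of RR intervals.

    Assumption: population standard deviation (divide by N), which the
    source text does not specify.
    """
    n = len(rr)
    mu = sum(rr) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in rr) / n)

    # Successive differences between neighbouring RR intervals.
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    d_mu = sum(diffs) / len(diffs)
    sigma_diff = math.sqrt(sum((d - d_mu) ** 2 for d in diffs) / len(diffs))
    return mu, sigma, sigma_diff


# RR intervals in milliseconds (illustrative values only).
mu, sigma, sigma_diff = time_domain_features([800, 810, 790, 800])
```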
[0129] For example,
[0132] ii) Nonlinear Poincaré
[0133] Poincaré nonlinear analysis variables, SD1, SD2, and SD1/SD2.
[0134] A Poincaré HRV plot is a graph in which successive RR intervals are plotted against one another. From this plot values for SD1 (the dispersion of points perpendicular to the line of identity) and SD2 (the dispersion of points parallel to the line of identity) are determinable. These plots, and the determination of the SD1 and SD2 values, are well known. SD1, SD2, or a combination of SD1 and SD2 are used as inputs to the AI classifier.
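One common way of obtaining SD1 and SD2 is to rotate each pair of successive RR intervals by 45 degrees and take the standard deviation along each rotated axis. The sketch below follows that convention; the function name `poincare_sd` is hypothetical and the population standard deviation is an assumption.

```python
import math


def poincare_sd(rr):
    """Return (SD1, SD2, SD1/SD2) for a sequence of RR intervals."""
    pairs = list(zip(rr[:-1], rr[1:]))
    root2 = math.sqrt(2)

    # Coordinates perpendicular to, and along, the line of identity.
    perpendicular = [(b - a) / root2 for a, b in pairs]
    parallel = [(b + a) / root2 for a, b in pairs]

    def std(values):
        mean = sum(values) / len(values)
        return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

    sd1 = std(perpendicular)
    sd2 = std(parallel)
    ratio = sd1 / sd2 if sd2 else float("nan")
    return sd1, sd2, ratio


sd1, sd2, ratio = poincare_sd([800, 810, 790, 800])
```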
[0135] iii) Sample Entropy
[0136] Sample entropy over four epochs, S1, S2, S3 and S4.
[0137] iv) Frequency Domain
[0138] Frequency domain parameters, VLF, LF, HF and LF/HF, derived from the spectral power calculated from the Welch periodogram.
[0139] v) Ectopic Beat Frequency
[0140] The relative frequency of ectopic beats, f.sub.e.
[0141] The optimal context for each feature, i.e. the optimal—or maximally discriminating—‘context length’ (as discussed below) for determining whether a feature is indicative of a cardiac event, is determined before each feature is input into an Artificial Intelligence based classifier.
[0142] The features derived from the RR interval data are input into an Artificial Intelligence Based Classifier (the AI classifier). The AI classifier can comprise a pre-trained classifier, or preferably multiple pre-trained classifiers combined into a hybrid classifier, that has been trained (as described below) to identify abnormal beats in the physiological data by assigning a probability (i.e. a number in [0,1]) to each heartbeat that reflects the likelihood for the given heartbeat to lead to an arrhythmic episode.
[0143] More specifically, as is shown in
[0144] The beat-level classifier is trained to identify abnormal heartbeats within a dataset of heartbeats. This data, and the output of the beat-level classifier, is combined with patient-level data within the patient-level classifier, during which the beat data is assessed along with contextual patient data, such as an electronic health record, or a record of existing health conditions, to determine whether the beat-level data is indicative of an arrhythmic episode in a particular patient, or set of patients. The output of the patient-level classifier is fed into the decision-level classifier, which is trained to combine data from the preceding classifiers to output the probability of the given heartbeat indicating an upcoming arrhythmic episode. The decision-level classifier may be optimised for a certain metric, e.g. accuracy or specificity, as is described below.
[0145] The training of these classifiers occurs on three levels corresponding to the classifiers themselves: the beat-level, the patient-level, and finally the decision-level. At the beat-level, classifiers are trained to separate arrhythmic and normal heartbeats. At the patient-level, classifiers are trained to separate arrhythmic and normal patients with a combination of beat-level and patient-level inputs.
[0146] In practice, this typically involves data first being examined on the heartbeat level; then, if the beat-level classifier indicates that the data is high risk, the data is further examined on a patient level. By using a multi-level system, the classifier is capable of accounting for contextual data, such as a patient having an abnormally high resting heart rate, which may reduce the probability that an elevated heart rate suggests an upcoming cardiac event.
[0147] Examples of patient-level inputs include the arithmetic mean, the standard deviation, and an abnormality fraction computed from beat-level classifier outputs. Other aspects of the patient, such as those conventionally found in an Electronic Health Record, can also be incorporated at this stage. At the decision-level, the entire process is optimised for a metric such as accuracy or specificity by scanning over the space of all classifier hyperparameters. In some embodiments, a long short-term memory model is used and the hyperparameters comprise the learning rate and/or the network size.
[0148] In some embodiments, the hyperparameters used within one or more of the classifiers are optimised using evolutionary algorithms, preferably genetic algorithms. Characteristics of the classifiers are modified in a random or semi-random manner and the resulting performance of the classifiers is compared to the non-modified classifier. This process is repeatedly performed for a number of modified classifier architectures (those showing potential improvement over the non-modified classifier); this may lead to the discovery of well-performing hyperparameters that would not otherwise be considered.
[0149] In some embodiments, simulated annealing is used in order to optimise at least one of the beat-level classifier, patient-level classifier, and decision-level classifier.
[0150] In order to arrive at a robust decision, the number of ‘abnormal’ heartbeats (e.g. those which cross a threshold probability) is counted, and the fraction of said ‘abnormal’ heartbeats occurring in a given time window (for example, five minutes) is computed. This leads to an abnormality fraction, F, which is attributed to each patient. A ‘yes/no’ decision is then made based on this fraction, and an alert may be issued (or another action taken) for positive decisions. The alert may, for example, indicate that a cardiac event is predicted; in some embodiments, it also provides additional data related to the probability of the event occurring.
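The abnormality fraction and the resulting yes/no decision can be sketched as below. The function names and both threshold values (0.5 for a beat, 0.2 for the fraction) are hypothetical placeholders; the source does not give numerical values.

```python
def abnormality_fraction(beat_probabilities, probability_threshold=0.5):
    """Fraction of beats in a window whose probability crosses the threshold."""
    abnormal = sum(1 for p in beat_probabilities if p > probability_threshold)
    return abnormal / len(beat_probabilities)


def should_alert(beat_probabilities, probability_threshold=0.5,
                 fraction_threshold=0.2):
    """Yes/no decision based on the abnormality fraction F.

    Both thresholds are illustrative assumptions, not values from the text.
    """
    fraction = abnormality_fraction(beat_probabilities, probability_threshold)
    return fraction > fraction_threshold


# Per-beat probabilities for one time window (illustrative values only).
window = [0.1, 0.9, 0.8, 0.2, 0.3]
fraction = abnormality_fraction(window)
alert = should_alert(window)
```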
[0151] The counting of ‘abnormal’ heartbeats may also be used to obtain a rate of change of the occurrence of ‘abnormal’ heartbeats, where this rate of change may be used to identify both that a cardiac event is likely, and also to predict an urgency—where a high rate of change may indicate that a cardiac event is likely to occur soon.
[0152] Classifier Training/Architecture
[0153] The AI classifier, and more specifically each of the beat-level classifier, patient-level classifier, and decision-level classifier, can be trained by a machine learning system receiving as input examples of heartbeats from a training dataset comprising known normal and abnormal heartbeats from which the system can learn to predict whether an arrhythmia is going to occur. Each heartbeat in the training data set is represented as a real-valued vector containing values for features that describe the specific heartbeat, and enable a classification to be made. The training data is pre-processed in the same way as described above in relation to
[0154] Each of the beat-level classifier, the patient-level classifier, and the decision-level classifier may be trained using any combination of the methods described below.
[0155] There is freedom in the number of preceding heartbeats that should be included in the computation of a feature. This is referred to herein as ‘context length’. Multiple context lengths from 10 beats to 100,000 beats (though preferably context lengths of less than around 3,600) are considered as variables for time domain measures (μ, σ, and σ.sub.Diff) and Poincaré nonlinear analysis.
[0156] A χ.sup.2-test (‘chi-squared’ test) for statistical compatibility is performed for each ‘feature’ (i.e. derived quantity) and each context length between the ‘arrhythmic’ and ‘normal’ data sample distributions. Context lengths that are optimally discriminating, i.e. where the data range is the most significant for detecting a cardiac event, can then be selected as evidenced by a large χ.sup.2/ndf between the respective distributions, where “ndf” is the number of degrees of freedom.
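A rough sketch of how a context length might be selected by this criterion is given below. It histograms the 'arrhythmic' and 'normal' values of one feature over shared bins and uses the standard two-histogram χ² statistic for samples of unequal size. The function names, the number of bins, and the choice of ndf as the number of populated bins minus one are all assumptions made for illustration.

```python
import math


def chi2_per_ndf(sample_a, sample_b, n_bins=10):
    """Chi-squared per degree of freedom between two binned samples."""
    lo = min(min(sample_a), min(sample_b))
    hi = max(max(sample_a), max(sample_b))
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range

    def histogram(sample):
        counts = [0] * n_bins
        for value in sample:
            index = min(int((value - lo) / width), n_bins - 1)
            counts[index] += 1
        return counts

    hist_a, hist_b = histogram(sample_a), histogram(sample_b)
    # Scale factors compensate for different sample sizes.
    scale_a = math.sqrt(len(sample_b) / len(sample_a))
    scale_b = math.sqrt(len(sample_a) / len(sample_b))

    chi2, populated = 0.0, 0
    for a, b in zip(hist_a, hist_b):
        if a + b == 0:
            continue  # empty bins carry no information
        chi2 += (scale_a * a - scale_b * b) ** 2 / (a + b)
        populated += 1
    ndf = max(populated - 1, 1)  # assumption: populated bins minus one
    return chi2 / ndf


def best_context_length(features_by_length):
    """Pick the context length with the largest chi-squared per ndf.

    `features_by_length` maps a context length (in beats) to a pair
    (arrhythmic_values, normal_values) of one feature.
    """
    return max(features_by_length,
               key=lambda n: chi2_per_ndf(*features_by_length[n]))
```

A larger returned value indicates that the two distributions are less compatible, so the feature separates the two classes better at that context length.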
[0157] Referring to
[0158] Referring to
[0159] In some embodiments, the input datasets are processed to reduce the effect of statistical fluctuations present in the histograms of the properties (e.g. SD1, SD2). This reduces the effects of the binning density chosen to analyse those histograms.
[0160] In some embodiments, an adaptive kernel density estimation technique is used in which the histograms are smoothed with a function, preferably a continuous function, that represents the distribution of the property. Typically, a probability density function (PDF) is used for this smoothing, where an appropriate PDF may be determined by superposing Gaussian distributions with equal surface, but varying width. The width of each Gaussian is dependent upon the local event density of the measured histogram; generally, a wide Gaussian distribution is used if the local event density is low and a narrow Gaussian distribution is used if the local event density is high.
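The idea of superposing unit-area Gaussians whose widths depend on local event density can be sketched as follows. The width rule used here (a fixed-width pilot estimate followed by a square-root scaling, often attributed to Abramson) is an assumption chosen for illustration; the source only states that sparse regions receive wide Gaussians and dense regions narrow ones. The names `adaptive_kde` and `pilot_bandwidth` are hypothetical.

```python
import math


def gaussian(x, mu, sigma):
    """Unit-area Gaussian evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def adaptive_kde(data, pilot_bandwidth):
    """Return (pdf, widths) for an adaptive kernel density estimate."""
    n = len(data)
    # Pilot estimate of the local density at each datapoint (fixed width).
    pilot = [sum(gaussian(x, xi, pilot_bandwidth) for xi in data) / n
             for x in data]
    geometric_mean = math.exp(sum(math.log(p) for p in pilot) / n)
    # Low local density gives a wide kernel; high density a narrow one.
    widths = [pilot_bandwidth * math.sqrt(geometric_mean / p) for p in pilot]

    def pdf(x):
        # Each kernel has unit area, so the average also has unit area.
        return sum(gaussian(x, xi, w) for xi, w in zip(data, widths)) / n

    return pdf, widths


# Illustrative SD1-like values: a dense cluster and one isolated point.
pdf, widths = adaptive_kde([90.0, 91.0, 92.0, 93.0, 120.0], pilot_bandwidth=2.0)
```

In this example the isolated point at 120 receives a wider Gaussian than the points in the cluster around 90 to 93.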
[0161] In typical embodiments, the primary feature used to discriminate between normal and arrhythmic distributions is the mean of the Gaussian distributions determined as suitable for smoothing the histogram of a considered property. In some embodiments, the standard deviation of the determined Gaussian distributions is also considered. The Gaussian distributions used for smoothing a measured dataset are compared to predetermined distributions for known normal and arrhythmic cardiac datasets; this comparison is usable to evaluate an input dataset.
[0162] In practice, determining a PDF typically involves monitoring cardiac data over a period of time. Using this data, a distribution can be measured that is suitable for representation as a histogram. Using the measured distribution, a PDF that approximates the distribution of the property is determined; this PDF approximates the shape of an unbinned feature distribution and so reduces the adverse effects from sub-optimal binning.
[0163] As an example, and for illustrative purposes only, consider a dataset containing one point for an SD1 of 90 and one point for an SD1 of 92. Since the variance within this range is more likely due to a lack of data than a large variance within the probabilities of the considered SD1s, it might be more appropriate to use one bin of size five from 90-95 than five bins of size one. The use of a PDF reduces the effect of the binning size used by smoothing the histogram of measured data, thereby avoiding inaccurate peaks based upon sub-optimal binning.
[0164] In order to obtain PDFs to compare the measured data to, PDFs are predetermined for normal and arrhythmic datasets. This enables (the features of) a PDF determined using measured data to be evaluated (e.g. compared to threshold values that are indicative of arrhythmia). More specifically, descriptors of the PDFs used, such as the mean, variance, and/or kurtosis, are used within a comparison.
[0165] PDFs may be determined for specific situations, for example there may be determined a PDF for arrhythmic heartbeat data in patients over 60 with pre-existing heart conditions. The threshold values used for comparing features of a determined PDF are then selected from an appropriate predetermined PDF.
[0168] Determining the optimal context length for each feature preferably occurs prior to training. The context length is then held constant during the classifier training phase.
[0169] A maximum context length may also be enforced in order to limit the data storage needed, the recording time needed, and to ensure that a rapid decision is possible. The 3,600 beats mentioned previously may be used to limit the amount of data which must be considered.
[0170] In order to use the available dataset maximally, a 10-fold cross-validation is performed, whereby the dataset is divided into ten parts and the model is trained ten times. Each time, eight parts are used for training, one part for hyper-parameter tuning and one part for testing. The assignment of different folds is rotated during the ten times.
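The rotating assignment of folds (eight for training, one for hyper-parameter tuning, one for testing) can be sketched as below. The function name `rotating_folds` and the interleaved assignment of samples to folds are assumptions; any disjoint partition into ten parts would serve.

```python
def rotating_folds(n_samples, n_folds=10):
    """Yield (train, tune, test) index lists, rotating the roles of the folds."""
    # Assumption: samples are assigned to folds in an interleaved fashion.
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for k in range(n_folds):
        test_fold = k
        tune_fold = (k + 1) % n_folds
        train = [index
                 for j, fold in enumerate(folds)
                 if j not in (test_fold, tune_fold)
                 for index in fold]
        yield train, folds[tune_fold], folds[test_fold]


splits = list(rotating_folds(100))
```

Over the ten rotations, every sample appears exactly once in a test fold.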
[0171] Five separate machine learning algorithms, in particular, can be used in order to train classifiers (although this method is, of course, extendable to other algorithms). The algorithms are then, preferably, later combined to form a hybrid algorithm, in order to take advantage of each of their strengths.
[0172] In some embodiments, the combining of algorithms comprises “committee voting”, where the outputs from each classifier are combined, with these outputs weighted dependent upon the performance of the corresponding classifier. Better performing classifiers, as determined using the metrics described below, have higher weightings and a larger effect in the determination of the committee classifier output.
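A minimal sketch of performance-weighted committee voting follows. Using the raw performance metric of each classifier directly as its weight is an assumption; the source only states that better performing classifiers receive higher weightings.

```python
def committee_vote(probabilities, performances):
    """Weighted average of classifier outputs.

    `probabilities` holds one output per classifier for the same heartbeat;
    `performances` holds the matching performance metric (for example, an
    accuracy), used here directly as the weight (an assumption).
    """
    total_weight = sum(performances)
    weighted = sum(p * w for p, w in zip(probabilities, performances))
    return weighted / total_weight


# Two classifiers; the first performs three times as well as the second.
combined = committee_vote([0.9, 0.1], [3.0, 1.0])
```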
[0173] In some embodiments, “ask a friend” voting is used, where the best performing classifier is used, unless the output of the best performing classifier is close to the decision boundary associated with this classifier. The decision boundary is a boundary, as described with reference to equation 1.1, that separates distributions identified as normal from distributions identified as arrhythmic.
[0174] Close to a decision boundary, the output of the associated classifier is less certain. Where the output of the best performing classifier is close to the decision boundary associated with the best performing classifier, the second best performing classifier is used either instead of or in combination with the best performing classifier.
[0175] In some embodiments, being close to the decision boundary relates to being beneath a certain threshold probability of correctness. Close to the decision boundary, the probability of a false positive and/or a false negative is relatively large; further from the decision boundary, the ‘certainty’ of the classifier (that is, how confident the classifier is that the output answer is the correct one) increases.
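The “ask a friend” rule can be sketched as below, for the variant in which the second best classifier is used instead of (rather than in combination with) the best one. The boundary of 0.5 and the margin of 0.1 that defines 'close' are hypothetical values.

```python
def ask_a_friend(best_output, second_best_output, boundary=0.5, margin=0.1):
    """Return the best classifier's output unless it is near the boundary.

    `boundary` and `margin` are illustrative assumptions; 'close' is taken
    to mean within `margin` of the decision boundary.
    """
    if abs(best_output - boundary) >= margin:
        return best_output
    return second_best_output


confident = ask_a_friend(0.9, 0.2)   # best classifier is far from the boundary
deferred = ask_a_friend(0.55, 0.2)   # best classifier is uncertain
```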
[0176] In some embodiments, both “committee voting” and “ask a friend” voting are used, where a “committee voting” classifier is formed from the weighted classifiers and this classifier is considered within the “ask a friend” voting.
[0177] In some embodiments, the classifier is a long short-term memory unit which may record values over an arbitrary time interval. This type of classifier is particularly useful for processes which have time lags between events (such as cardiac events).
[0178] In some embodiments, a convolutional neural network could be used to detect patterns within the recorded data, where this may be combined with an attention mechanism. An attention mechanism enables the neural network to ‘learn’ where it needs to focus and dynamically assign more importance to those areas. The attention mechanism calculates a weight for each time-window in the input stream and uses it to scale the importance of information coming from that window. This method has been shown to be very successful in other domains such as language processing and also enables visualisation of where the model is focusing, thereby making the actions of the system more human-interpretable.
[0179] More specifically, in some embodiments, the neural model is arranged to represent the temporal stream as a series of fixed-size representations. This can be achieved using a long short-term memory (LSTM) architecture or a convolutional neural network operating over fixed-sized windows. An attention mechanism is constructed on top of the network to allow the model to dynamically predict how much focus should be assigned to each position in the temporal stream. When analysing the cases where the model predicts positive labels for VTA, the attention weights are usable to visualise which areas in the signal were most important for making the prediction. In addition, the gradient on individual feature vectors is usable to find which specific features were most important for making the prediction at that time. This allows specific features and specific times in the data stream to be flagged as being of relevance, which enables a practitioner to rapidly identify relevant parts of a data stream. An informed decision can therefore be made regarding the health of the patient; furthermore, anomalies within the data that are worthy of further inspection are observable.
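The weighting step of an attention mechanism can be illustrated with the sketch below. It assumes that a score per time window has already been produced (in a trained model this would come from learned parameters, which are not shown), normalises the scores with a softmax, and uses the resulting weights to combine the fixed-size window representations. The function name `attention_pool` is hypothetical.

```python
import math


def attention_pool(window_vectors, scores):
    """Combine fixed-size window representations using attention weights.

    `scores` holds one unnormalised relevance score per window; here they
    are supplied by the caller (an assumption for illustration).
    """
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]

    dimension = len(window_vectors[0])
    pooled = [sum(w * vector[d] for w, vector in zip(weights, window_vectors))
              for d in range(dimension)]
    return pooled, weights


pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
```

The returned weights sum to one and can be plotted against time to show which windows contributed most to a prediction.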
[0180] 1. Artificial Neural Network
[0181] The feature vectors are given as input to an artificial neural network consisting of three layers. The first layer is an “input layer”, the size of which depends on the number of features in the feature vectors. The second layer is a “hidden layer” with tanh activation, with size 10. Finally, the third layer is a single neuron with sigmoid activation. The neurons in the hidden layer will automatically discover useful features from the input data. The model can then make a prediction based on this higher-level representation. The network may be optimised using AdaDelta, for example. Parameters may be updated based on mean squared error as the loss function. The model may be tested on the development set after every full pass through the training data, preferably wherein the best model is used for final evaluation.
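The forward pass of the three-layer network described above can be sketched as follows. Only the architecture (hidden layer of size 10 with tanh, single sigmoid output) is taken from the text; the random initialisation range, the function names, and the absence of any training loop are simplifications made for illustration.

```python
import math
import random


def init_network(n_features, hidden_size=10, seed=0):
    """Create untrained weights; the initialisation range is an assumption."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
          for _ in range(hidden_size)]
    b1 = [0.0] * hidden_size
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(features, params):
    """Hidden layer with tanh activation, then one sigmoid output neuron."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))


params = init_network(n_features=3)
probability = forward([0.8, 0.05, 0.02], params)
```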
[0182] 2. Support Vector Machines (SVM)
[0183] Support Vector Machines (SVM) are a separate class of supervised machine learning algorithms. Instead of focusing on finding useful features, they treat the problem as a task of separation in a high-dimensional space. Given that the feature vectors contain n features, they aim to find an n−1 dimensional hyperplane that best separates the positive and negative cases. This hyperplane is optimised during training so that the distance to the nearest datapoint in either class is maximal.
[0184] 3. k-Nearest Neighbours
[0185] k-Nearest Neighbours (k-NN) is an algorithm that analyses individual points in the high-dimensional feature space. Given a new feature vector that we wish to classify, k-NN returns k most similar points from the training data. Since we know the labels of these points, k-NN assigns the most frequent label as the prediction for the new point. This offers an alternative view to the problem—it no longer assumes that heartbeats of a single class are in a similar area in the feature space, but instead allows us to look for individual points that have very similar features.
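A minimal k-NN sketch is shown below. The Euclidean distance and the value k=3 are assumptions; the text does not specify the similarity measure or the value of k.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Return the most frequent label among the k nearest training points."""
    # Assumption: similarity is measured by Euclidean distance.
    order = sorted(range(len(train_vectors)),
                   key=lambda i: math.dist(train_vectors[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


vectors = [[0, 0], [0, 1], [1, 0], [5, 5], [6, 5]]
labels = ["normal", "normal", "normal", "arrhythmic", "arrhythmic"]
prediction = knn_predict(vectors, labels, [0.2, 0.2])
```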
[0186] 4. Gaussian Process
[0187] Gaussian Process is a statistical model where each datapoint is associated with a normally distributed random variable. The Gaussian Process itself is a distribution over distributions, which is learned during training. This model associates each prediction also with a measure of uncertainty, allowing us to evaluate how confident the model is in its own classification. As this type of model is difficult to train with more than 3,000 datapoints, it is preferable to ensure that a suitable size is sampled during training.
[0188] 5. Random Forest
[0189] Random forests are based on constructing multiple decision trees and averaging the results. Each decision tree is a model that attempts to separate two samples based on sequential splittings for each input feature. In this implementation, datapoints that are misclassified are given a weight larger than one (referred to as ‘boosting’ or as a ‘boosted decision tree’ method).
[0190] Each classifier assigns a probability (i.e. a number in [0,1]) to each heartbeat that reflects the likelihood for the given heartbeat to lead to an arrhythmic episode. Several different thresholds for the probability may be considered and the value that optimally separates the ‘arrhythmic’ and ‘normal’ datasets is chosen. This may be referred to as optimal classification separation.
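Scanning candidate thresholds for the value that best separates the two datasets can be sketched as below. Using accuracy as the separation criterion is an assumption; another metric, such as specificity, could be substituted.

```python
def best_threshold(arrhythmic_probs, normal_probs, candidates):
    """Return the candidate threshold giving the highest accuracy."""
    def accuracy(threshold):
        true_positives = sum(p > threshold for p in arrhythmic_probs)
        true_negatives = sum(p <= threshold for p in normal_probs)
        return (true_positives + true_negatives) / (
            len(arrhythmic_probs) + len(normal_probs))

    return max(candidates, key=accuracy)


# Classifier outputs for known beats (illustrative values only).
threshold = best_threshold([0.7, 0.8, 0.9], [0.1, 0.2, 0.6], [0.3, 0.5, 0.65])
```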
[0191] In some embodiments, the methods of predicting cardiac events are used (and/or embedded) within a portable device, such as a pacemaker, or an implantable cardioverter-defibrillator. Within such a device, it is important that computations are minimised, to maximise the battery life of the device. To achieve this, algorithms with low computational cost are used (possibly at the expense of some accuracy).
[0192] An example of using low computational cost algorithms is the use of difference of area (DOA) methods, which have a low complexity, within waveform analysis. Bin area methods (BAM) may also be used as these provide a trade-off between complexity and accuracy. More generally, it is preferable to use algorithms which analyse time domain features as opposed to those which analyse frequency domain features.
[0193] In order to speed up the execution of the Random Forest algorithm, in some embodiments each input feature is discretised so that the volume of information fed to the decision trees is reduced. This approach is used to speed up the execution of the classifier and to reduce the effect of noise by choosing step sizes greater than the fluctuations present in the features on account of noise.
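Discretising a feature to a fixed step size can be sketched as follows. The function names and the step sizes in the example are hypothetical; in practice each step would be chosen to exceed the noise fluctuation of the corresponding feature.

```python
def discretise(value, step):
    """Snap a feature value to the nearest multiple of `step`."""
    return round(value / step) * step


def discretise_features(features, steps):
    """Apply a per-feature step size to a feature vector."""
    return [discretise(value, step) for value, step in zip(features, steps)]


# Mean RR and standard deviation in milliseconds (illustrative values only).
coarse = discretise_features([803, 47], [10, 5])
```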
[0194] In some embodiments, classifiers are formed using ‘distilling’. First, a very complex and computation-intensive neural network is trained. Next, a simpler and faster model is constructed and trained on the output of the former model. This approach results in models (and classifiers) that have the benefits of both speed and accuracy.
[0195] ‘Batching’ is another method that is used in some embodiments to speed up computation. If a model has limited processing power and cannot process one heartbeat at a time, the incoming data can be combined into batches of ten heartbeats to reduce the computational burden. This results in the model being up to ten beats behind in making predictions, but enables the use of more accurate models.
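Grouping an incoming stream of heartbeats into batches of ten can be sketched with a generator. The name `batches` is hypothetical; in this sketch an incomplete final batch is held back until enough beats have arrived, which reflects the delay of up to ten beats mentioned above.

```python
def batches(stream, batch_size=10):
    """Yield consecutive full batches from a stream of heartbeats."""
    batch = []
    for beat in stream:
        batch.append(beat)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Beats left over in an incomplete batch are not emitted here.


full_batches = list(batches(range(25)))
```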
[0196] In some embodiments, an adversarial training model is used, where cases for which the classifier would misclassify data are determined and these cases are used to improve the performance of the classifiers.
[0197] As an example: a neural network is provided that is trained to classify RR sequences. Starting with a healthy rhythm, it is determined which (small) changes need to be made to this rhythm in order for the network to misclassify it as a VT example. This method then enables identification of the weak points of the network. These examples (of misclassified datasets) are subsequently introduced into the training data and the classifiers are trained to classify them correctly. This results in a more robust model with a decreased likelihood of misclassifications.
[0198] In some embodiments, existing training data, with small random noise included in the signal, is added to the training set with the same labels. Given that the noise is small, it is valid to expect that the true label of these examples should not change. Using this data for training introduces the model to a wider variation of datapoints around the known recorded instances, making it more robust during testing.
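As a sketch only, noise augmentation of this kind could be written as below. Gaussian noise is an assumption (the text says only that the noise is small and random), and `augment_with_noise`, `noise_std` and `copies` are hypothetical names.

```python
import numpy as np

def augment_with_noise(X, y, noise_std, copies=1, seed=0):
    """Append noisy copies of X to the training set, reusing the labels y."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    noisy = [X + rng.normal(0.0, noise_std, size=X.shape) for _ in range(copies)]
    X_aug = np.concatenate([X] + noisy, axis=0)
    y_aug = np.concatenate([y] * (copies + 1), axis=0)  # labels are unchanged
    return X_aug, y_aug

# Two hypothetical RR sequences (seconds) with labels 0 = normal, 1 = arrhythmic.
X = np.array([[0.80, 0.82, 0.81], [0.45, 0.40, 0.42]])
y = np.array([0, 1])
X_aug, y_aug = augment_with_noise(X, y, noise_std=0.005, copies=2)
```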
[0199] In some embodiments, noise is generated such that it maximally confuses the model. Using gradient descent, it is calculable how individual feature values should be changed in order to make the model give a wrong prediction. L2 regularization may be used to ensure that the modifications will be minimal, therefore the true label of the example can be assumed to be the same as the original datapoint and any mistakes the model makes are due to discovered blindspots.
[0200] By including these adversarial examples in the training data, the model is able to learn to correct for these incorrect predictions. The process can be repeated iteratively to continue improving the model.
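The following sketch illustrates the idea under stated assumptions: the model is taken to be a fixed logistic classifier (the text describes neural networks), and the perturbation is found by gradient descent on the cross-entropy towards the wrong label plus an L2 penalty. The names `adversarial_example`, `l2_weight`, `lr` and `steps`, and all numerical values, are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_example(x, y_true, w, b, l2_weight=0.5, lr=0.1, steps=200):
    """Find a small perturbation that pushes a logistic model to the wrong label.

    Minimises cross-entropy(wrong label, prediction) + l2_weight * ||delta||^2
    by gradient descent on delta; the L2 term keeps the change small.
    """
    x = np.asarray(x, dtype=float)
    delta = np.zeros_like(x)
    target = 1.0 - y_true  # the wrong label
    for _ in range(steps):
        p = sigmoid(w @ (x + delta) + b)
        grad = (p - target) * w + 2.0 * l2_weight * delta
        delta -= lr * grad
    return x + delta, delta

# Hypothetical two-feature model and a datapoint whose true label is 1.
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.4, 0.3])
x_adv, delta = adversarial_example(x, y_true=1, w=w, b=b)
p_before = sigmoid(w @ x + b)
p_after = sigmoid(w @ x_adv + b)
```

The perturbed datapoint `x_adv` would then be added to the training data with the original label, as described above.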
[0201] Annotated data for VT/VF detection and prediction is very limited and therefore it is beneficial to make use of available data in unannotated datasets of ECG signals. In some embodiments, a detection system is trained and then used to return new examples for training the prediction system. Referring to
[0202] In a first step 102, the available annotated data is used to train a detection system. This training uses machine learning techniques as are known and/or as are described herein.
[0203] In a second step 104, the detection system is applied on unannotated data.
[0204] In a third step 106, the detection system returns a labelled subset of the data that the model finds most likely to be positive (e.g. indicative of there being a cardiac event) and/or a labelled subset of the data examples that the model finds most likely to be negative (e.g. indicative of there not being a cardiac event).
[0205] In some embodiments, the detection system labels a subset of data for which the probability of a correct label being applied, as determined by the detection system, exceeds a certain threshold. This enables the provision of a large training set while minimising the risk of incorrect labelling.
[0206] In a fourth step 108, an updated training set is provided comprising the previously available annotated data and the previously unannotated data alongside the predicted labels. While these labels are not guaranteed to be correct, they are likely to assist the prediction system (that used to predict cardiac events in patients) and move the performance closer to the detection system.
[0207] In a fifth step 110, both the detection system and the prediction system are retrained with the updated training set.
[0208] In a sixth step 112, it is determined whether the performance of the prediction system is improving on a dedicated test set.
[0209] If the performance of the prediction system is improving, the process is repeated from the first step 102. That is, the new data is used as annotated data within the training of the detection system, and this retrained detection system is used to label a subset of the remaining unannotated data. The retrained detection system is better able to label the previously unlabelled data, so that a repeatedly retrained detection system is able to label an unannotated dataset piecemeal with only a small possibility of erroneous labelling.
[0210] Once the performance of the prediction system is no longer improving, in a seventh step 114, the detection system and the prediction system are output.
[0211] In some embodiments, repeating the process involves reclassifying data within the third step to determine whether there is an improvement in the performance of the prediction system, e.g. a datapoint previously labelled as positive is labelled as negative and it is determined whether this relabelling improves the performance of the prediction system. In some embodiments, the measure of improvement may be improvement within a specified number of iterations, so that a single iteration without improvement does not halt the method.
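A heavily simplified sketch of the iterative labelling loop is given below. It assumes a toy nearest-centroid model in place of the detection system, uses that same model's test accuracy as the stopping criterion (the description evaluates a separate prediction system), and does not implement the relabelling variant. `CentroidDetector`, `self_train` and the `confidence` parameter are hypothetical.

```python
import numpy as np

class CentroidDetector:
    """Toy stand-in for the detection system: nearest class centroid."""

    def fit(self, X, y):
        self.c0 = X[y == 0].mean(axis=0)
        self.c1 = X[y == 1].mean(axis=0)
        return self

    def predict_proba(self, X):
        d0 = np.linalg.norm(X - self.c0, axis=1)
        d1 = np.linalg.norm(X - self.c1, axis=1)
        return d0 / (d0 + d1 + 1e-12)  # near 1 when close to the positive centroid

def self_train(X_lab, y_lab, X_unlab, X_test, y_test, confidence=0.9, max_rounds=10):
    model = CentroidDetector().fit(X_lab, y_lab)
    best_acc = -1.0
    for _ in range(max_rounds):
        acc = np.mean((model.predict_proba(X_test) >= 0.5) == y_test)
        if acc <= best_acc or len(X_unlab) == 0:
            break  # stop once test performance is no longer improving
        best_acc = acc
        p = model.predict_proba(X_unlab)
        sure = (p >= confidence) | (p <= 1.0 - confidence)
        if not sure.any():
            break
        # Move confidently labelled examples into the annotated set.
        X_lab = np.vstack([X_lab, X_unlab[sure]])
        y_lab = np.concatenate([y_lab, (p[sure] >= 0.5).astype(int)])
        X_unlab = X_unlab[~sure]
        model = CentroidDetector().fit(X_lab, y_lab)
    return model, X_lab, y_lab

X_lab = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y_lab = np.array([0, 0, 1, 1])
X_unlab = np.array([[0.1, 0.3], [4.8, 5.2], [2.5, 2.5]])
X_test = np.array([[0.3, 0.2], [4.9, 5.1]])
y_test = np.array([0, 1])
model, X_out, y_out = self_train(X_lab, y_lab, X_unlab, X_test, y_test)
```

In this example the ambiguous datapoint at (2.5, 2.5) is left unlabelled because the model's confidence does not reach the threshold.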
[0212] Similarly, the number of available training examples for Ventricular Tachycardia identification is limited. In some embodiments, there is performed a method for VT detection within unlabelled datasets based only on a subset of known examples and the presence of wide QRS complexes, which is an identifying feature of Ventricular Tachycardia. Such a method is shown in
[0213] In a first step 122, individual QRS complexes are extracted from both the reference ECG signal (the known examples) and an input ECG signal (unknown data). Existing algorithms for RR interval conversion can be used for this.
[0214] In a second step 124, complexes are normalised into the same range in each dimension so that they are comparable.
[0215] In a third step 126, the similarity between the shapes of the complexes from both the input and reference signals is calculated. Root mean square error (RMSE) is used for comparing the shapes.
[0216] In a fourth step 128, it is determined whether the similarity is higher than an assigned threshold.
[0217] If the similarity is higher than the threshold, in a fifth step it is determined that, based upon the exceeding of the threshold, the QRS complex is abnormally wide and should therefore be assigned for manual review or directly classified as Ventricular Tachycardia.
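The normalisation and comparison steps might be sketched as follows. The assumptions here are that normalisation means resampling each complex to a fixed number of points and scaling its amplitude into [0, 1], and that a low RMSE corresponds to a high similarity, so the similarity threshold is expressed as a maximum RMSE. The names `normalise_complex`, `shape_rmse`, `matches_reference` and `max_rmse` are hypothetical.

```python
import numpy as np

def normalise_complex(samples, length=50):
    """Resample to a fixed length and scale amplitude into [0, 1]."""
    samples = np.asarray(samples, dtype=float)
    old_t = np.linspace(0.0, 1.0, len(samples))
    new_t = np.linspace(0.0, 1.0, length)
    resampled = np.interp(new_t, old_t, samples)
    span = resampled.max() - resampled.min()
    if span == 0:
        return np.zeros(length)  # flat input carries no shape information
    return (resampled - resampled.min()) / span

def shape_rmse(complex_a, complex_b, length=50):
    """Root mean square error between two normalised complexes."""
    a = normalise_complex(complex_a, length)
    b = normalise_complex(complex_b, length)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def matches_reference(candidate, reference, max_rmse=0.1):
    """True when the candidate's shape is close to the reference shape."""
    return shape_rmse(candidate, reference) <= max_rmse

reference = [0, 1, 2, 1, 0]              # hypothetical reference complex
same_shape = [0, 1, 2, 3, 4, 3, 2, 1, 0] # same shape, other scale and sampling
other_shape = [0, 0, 0, 1, 1]
```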
[0218] In various embodiments, the similarity threshold is used to determine various characteristics. In the example of
[0219]
[0220]
[0221] An optimal decision boundary is arrived at by minimising the root mean square error, denoted as RMSE, and defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(F_i - F_{decision}\right)^2}$$
[0222] Where $F_i$ is the fraction of abnormal heartbeats for the i'th misclassified patient, $F_{decision}$ is the abnormality fraction under consideration, and N is the number of misclassified patients. RMSE can be thought of as a measure of distance from the decision boundary for misclassifications.
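For illustration, the RMSE over misclassifications may be computed as below, assuming the usual root-mean-square form taken over the misclassified patients only. The function name and the example values are hypothetical.

```python
import numpy as np

def rmse_over_misclassifications(fractions, f_decision):
    """RMSE between misclassified patients' abnormality fractions and the boundary."""
    fractions = np.asarray(fractions, dtype=float)
    if fractions.size == 0:
        return 0.0  # no misclassifications at this boundary
    return float(np.sqrt(np.mean((fractions - f_decision) ** 2)))

# Two hypothetical misclassified patients and a candidate boundary of 0.4.
value = rmse_over_misclassifications([0.3, 0.5], f_decision=0.4)
```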
[0223] A hybrid classifier may be created by combining the abnormality fractions, F, for each model listed above. The combination is a weighted sum defined as:
$$F_{hybrid} = \sum_{j} w_j F_j$$
[0224] Where $F_{hybrid}$ is the abnormality fraction of the hybrid classifier, $w_j$ is the weight attributed to the j'th classifier and $F_j$ is the corresponding abnormality fraction, F. The weights, $w_j$, are determined according to the performance of the classifiers, as measured by their RMSE value.
[0225] More specifically, the weights, $w_j$, are determined dependent upon their RMSE value over misclassifications. The motivation for doing so is to achieve optimal performance of the resulting hybrid classifier in an unbiased way. Other commonly used metrics could lead to the wrong weights being attributed to classifiers and, consequently, suboptimal decisions.
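A sketch of the weighted combination follows. The text does not state how the weights are derived from the RMSE values; this sketch assumes inverse-RMSE weights normalised to sum to one, so that classifiers with a lower RMSE receive a larger weight. `rmse_weights` and `hybrid_fraction` are hypothetical names.

```python
import numpy as np

def rmse_weights(rmse_values):
    """Assumed rule: weights proportional to 1 / RMSE, normalised to sum to 1.

    Requires every RMSE value to be non-zero.
    """
    inverse = 1.0 / np.asarray(rmse_values, dtype=float)
    return inverse / inverse.sum()

def hybrid_fraction(fractions, weights):
    """Weighted sum of the per-classifier abnormality fractions."""
    return float(np.dot(weights, fractions))

# Three hypothetical classifiers with their RMSE values and abnormality fractions.
weights = rmse_weights([0.1, 0.2, 0.4])
combined = hybrid_fraction([0.7, 0.35, 0.0], weights)
```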
[0226] The performance of the method described herein may be determined according to a number of performance metrics; exemplary metrics are listed below:
[0227] Accuracy (A), defined as:
$$A = \frac{TP + TN}{TP + TN + FP + FN}$$
[0228] Where the numerator is a sum of true positives (TP) and true negatives (TN) and the denominator additionally includes false positives (FP) and false negatives (FN).
[0229] Sensitivity (SE), defined as:
$$SE = \frac{TP}{TP + FN}$$
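These two metrics can be computed directly from the counts of a confusion matrix, as in the sketch below. The standard definitions are used, and the counts shown are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all classifications that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Fraction of true cardiac-event cases that are correctly identified."""
    return tp / (tp + fn)

# Hypothetical counts for 100 patients.
a = accuracy(tp=40, tn=50, fp=5, fn=5)
se = sensitivity(tp=40, fn=5)
```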
[0233] The method described herein may be integrated with and/or implemented by existing patient monitoring equipment.
[0234]
[0235]
[0236] The decision of which classifier to use may depend upon the dataset, where the boosted decision tree may be used as long as the boosted decision tree is not operating close to the associated decision boundary, and the committee voter is used if the boosted decision tree is operating close to the associated decision boundary.
[0237] System Architecture
[0238]
[0239] The analysis module 24 is configured to evaluate the extracted physiological data, for example evaluating a property of multiple heartbeats in the data, and determine whether said property exceeds an abnormality threshold. This information is then used to derive a probability of the patient experiencing a cardiac event, for example using the method described above in relation to
[0240] The analysis module 24 comprises a hybrid classifier trained and operating as described above in relation to
[0241] If the analysis module 24 determines that the probability of the patient experiencing a cardiac event in a subsequent time period is above some pre-defined threshold, then the analysis module will trigger a means for providing an output 26, for example an alarm, or other alert, that can alert a healthcare provider that the patient is at risk. This can enable the healthcare provider to take preventative action.
[0242] In some embodiments, the output comprises alerting a healthcare provider and providing medical records, such as ECG readings, to the provider. The provider may then be inclined to analyse the data further and/or maintain a close watch on the patient, so that rapid action can be taken if the patient does suffer a cardiac arrest. This output may comprise a ‘risk window’ in which a heightened risk is identified; a close watch could then be maintained in this period.
[0243] Display of Output
[0244] The output displays one or more probabilities, as determined using the methods described. The probabilities are output in numerous forms, notably:
[0245] A binary assessment is used as a threshold indicator, where a critical value triggers an alarm. This is particularly useful as a first indicator that a patient may require attention. A threshold here is used to indicate that urgent help is required, or that patient data should be looked at more closely. There may be multiple thresholds which each have a differing level of urgency.
[0246] A probability of a cardiac event is output, where this allows a user to allocate resources, and make other decisions, appropriately. An uncertainty estimate is output alongside this probability.
[0247] In differing embodiments, the probability is output quantitatively (for example as a percentage risk) and/or qualitatively (for example a patient may be categorised as one of low risk, medium risk, or high risk, where these correspond to probability ranges). A qualitative measure may be used to simplify the immediate interpretation by a user.
[0248] A probability density function is output, where this allows a user to more fully assess a situation.
[0249] These probabilities are typically used in conjunction so that, upon a threshold risk being passed, a user is directed to view a probability, or a probability function, to determine an appropriate action. This can then be used as a general indicator of a patient's health, where an increased likelihood of a cardiac event indicates that a patient is more likely to need attention during a certain period.
[0250] An uncertainty also being displayed further aids the determination of an appropriate action. A potential problem with any data-based analysis, particularly an analysis of a complex situation, such as the prediction of a cardiac event, is that a precise result is rarely achievable; this leads to a figure (such as a probability) on its own having limited use, especially due to the difficulty in determining if this figure is reasonable. The inclusion of an uncertainty-based measure (such as a variance, or error bounds) enables a better judgement to be made regarding any given figure/probability.
[0251] Advantageously, a probability enables a user to make a rapid assessment, as a probability is intuitively interpreted more easily than, for example, a risk score. Additionally, a probability density function gives a user a large amount of information in a concise format.
[0252] In various embodiments, probabilities are also output for a number of timeframes. An initial output is simply a probability without any time reference. A more useful output is a probable time-to-cardiac-event. More specifically, probabilities may be output for time ranges, where this allows efficient allocation of resources.
[0253] The outputting of probability density functions for numerous timeframes enables limited resources to be scheduled effectively: for example a limited number of staff to be directed to be ready to assist certain patients at times of increased risk; a probability density function may be used to assess whether a cardiac event is almost certain or whether the risk is more unpredictable.
[0254] In some embodiments, a probability density function is displayed numerically, where a mean, a standard deviation, and a kurtosis (indicating the skew of the distribution) are displayed. In these, or other, embodiments, the function is (also) displayed graphically.
[0255] There are, in some embodiments, numerous, user selectable, ways to illustrate a probability, for example a best fit normal distribution, a skew normal distribution, or a Poisson distribution. A preferred distribution is suggested during analysis, where a suitable distribution depends on, for example, the amount of information available.
[0256] In some embodiments, the probability assessment is continuously updated, where this occurs as relevant information is obtained. An initial assessment uses historic data, and/or admissions data; this initial assessment is then updated (and improved) using recorded and evaluated data (such as the RR intervals above) as it becomes available.
[0257] In preferred embodiments, a Bayesian probabilistic framework is used in this updating, where Bayesian inference is used to obtain a probability. This is related to a form of Bayes' rule, which is displayed in equation 2.1 below:
$$P(Y \mid X, \alpha) = \frac{P(X \mid Y)\,P(Y \mid \alpha)}{P(X \mid \alpha)} \qquad (2.1)$$
[0258] where: P(Y|α) is the prior distribution (e.g. the previously calculated probability);
[0259] P(Y|X,α) is the posterior distribution (e.g. the updated probability);
[0260] P(X|α) is the marginal likelihood (e.g. the likelihood of the recently sampled data given the entire set of data);
[0261] P(X|Y) is the sampling distribution (e.g. the probability of the observed data given the current distribution); and
[0262] α is the statistical hyperparameter of the parameter distribution (e.g. Y˜P(Y|α)).
[0263] This equation is used to derive an updated probability based upon a prior probability and the probability of the occurrence of the recently sampled data. Using this equation, recent data which is indicative of a cardiac event being likely would be more concerning in a patient previously judged to be high-risk than it would in a patient previously judged to be low-risk (an interpretation of this is that in the low-risk patient this data is more likely to be anomalous). The use of Bayesian inference is then useful for reducing the rate of false positives, as the prior probability will be small for low-risk patients.
[0264] Notably, in the given example, the occurrence of data indicative of a cardiac event would be unlikely given the prior distribution, and so this would have a significant effect on the posterior distribution. Due to this, the data would not simply be written off entirely as anomalous; while it may not immediately result in a warning, continued occurrence of data indicative of a likely cardiac event would rapidly increase the probability (so that a cardiac event is unlikely to be missed); however, advantageously, a single (potentially anomalous) datapoint would not trigger a false positive warning.
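The behaviour described above can be illustrated with a two-hypothesis Bayesian update (cardiac event upcoming, or not). This is a simplification of the distributions discussed, and the likelihood values and prior probabilities below are hypothetical.

```python
def bayes_update(prior, p_data_given_event, p_data_given_no_event):
    """Posterior probability of an upcoming event after one observation."""
    marginal = (p_data_given_event * prior
                + p_data_given_no_event * (1.0 - prior))
    return p_data_given_event * prior / marginal

# One observation that is 8 times more likely if an event is upcoming.
low_risk = bayes_update(0.01, 0.8, 0.1)   # about 0.07: no warning yet
high_risk = bayes_update(0.30, 0.8, 0.1)  # about 0.77: concerning

# Repeated concerning observations for the low-risk patient.
p = 0.01
for _ in range(3):
    p = bayes_update(p, 0.8, 0.1)
```

A single concerning observation leaves the low-risk patient's probability low, whereas three consecutive observations raise it above 0.8.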
[0265] To further reduce the likelihood of false negatives, in some embodiments, a Bayesian inference model is used alongside a threshold marginal likelihood: a marginal likelihood which is indicative of a very high chance of an upcoming cardiac event then triggers a warning even if the overall probability remains low due to a consistently low prior probability.
[0266] The updating of the probability takes place periodically (for example every five seconds, or every minute), where a longer update (or refresh) period uses less computing power. This update period is, in some embodiments, small enough that the probability is updated effectively continuously (i.e. the period is so small as to not be noticeable by a user).
[0267] In some embodiments, there is a component within the apparatus which allows a choice of the update period—this may also be selectively determined based on the use of the apparatus (where an implanted device may prioritise battery longevity over rapid updates).
[0268] A consideration here is that, in many situations, it is possible to maintain an accurate probability while making only periodic updates, especially where there is a large prior distribution (i.e. where measurements have been taken for a long time). The update period is then based upon the prior distribution. An upper limit may be placed on the update period, so that updates remain regular enough that they do not miss a cardiac event.
[0269]
[0270] One or more measurement device(s) (e.g. an ECG, a patient file) 32 transmit(s) data to a local server 34. These data are then transmitted to a network server 36, and fed through an analysis module 24 (as discussed above, e.g. with reference to
[0271] By sending data via a network server 36, instead of storing all data on a local device, the data can be displayed to numerous users simultaneously. This allows multiple opinions to be gathered, or numerous users to be alerted simultaneously, so that the user in the best position to respond may be notified.
[0272] The use of a network server 36 also enables remote monitoring of a patient. This may be used for a patient with an implantable device, where data recorded by the device is transferred to a network server 36, evaluated by the analysis module 24, and then displayed on a UI 42 to both the user and (separately) a healthcare professional, who may then check on the user at an appropriate time.
[0273] The figures as described above show a system for monitoring a patient. As a general overview: in
[0274] The analysis module 24 is provided with the specific data (the RR intervals) as in
[0283] This output is then presented using a means for providing an output 26.
[0284] The means for providing physiological data, and the means for providing an output are described in more detail above with reference to
[0285]
[0286] It can be seen from
[0287] Alternatives and Modifications
[0288] Data Types
[0289] The use of RR intervals is an example of a type of data (more specifically a type of physiological data, and even more specifically a type of cardiac data) which is usable with the described methods; more generally, any type of patient data, or any combination of types of data, could be used with these methods, where the use of a combination of patient data may lead to fewer false positives (or false negatives). Examples of preferred types of data are (with some overlap as, for example, telemetry records and clinical data both comprise physiological data):
[0290] telemetry records, such as arterial blood pressure, pulse contour data, or pulse rate;
[0291] demographic data, such as age, sex, or race (this may come from an electronic health report/patient profile);
[0292] admission/historic data, such as a recent illness or any history of illness; in particular concomitant conditions, such as emphysema or diabetes;
[0293] clinical data, such as haemoglobin values;
[0294] laboratory data, such as the results of tests;
[0295] imaging data, such as x-rays or MRI scans.
[0296] Where multiple data types are considered, each of these types of data is treated similarly to the RR intervals: properties (such as a mean or a standard deviation) are extracted, and an optimal context length for these features determined—as an example, there is an optimal length of patient history to consider, where data more than, for example, 10 years old may have a negligible contribution to a prediction of future health. Numerous data types are considered in the determination of a probability, where, in some embodiments, each data type has a different weighting (where this weighting is based upon historic data and determined by the classifiers).
[0297] In various embodiments, the data types used are optimised, where this is used within the display of a probability. In each situation, there is selected a combination of data features with the most significant effect; this is particularly useful where an implantable device is used, and using a low number of data types is desirable, as this minimises the computational burden.
[0298] In some embodiments, to avoid the need for new measuring equipment, analysis occurs only using data which is attainable using current measuring methods.
[0299] While the data recording methods discussed have primarily involved specialist equipment (e.g. electrocardiograms), the methods discussed could equally be used with other, more widely available equipment. As an example, there exist many user wearable devices which are used to monitor a heartrate or a pulse (such as a Fitbit™). The data recorded using this, or a similar, device could be used with the AI classifier described above to obtain a probability of a cardiac event, or to output a general health measure. If used in such a device, the output may be a displayed probability, or measure of health, to the user, or an automatic warning sent to, for example, an ambulance, if a threshold probability is exceeded. This may be particularly useful in devices such as a Fitbit™, which are used during periods of increased activity (where stress may be placed upon the heart).
[0300] Context Length Determination
[0301] The context length determination has been explained using the example of a χ.sup.2-test (‘chi-squared’ test); numerous other tests could be used to make this determination. Various embodiments use one of (or a combination of): a Kolmogorov-Smirnov test, a comparison of the moments of distributions, or an Energy Test (as described by Guenter Zech and Berkan Aslan).
[0302] When using the Energy Test, an Energy Test metric, T, is computed between two distinct unbinned multivariate distributions. One such example is arrhythmic and normal heartbeat distributions, which give a non-zero T-value. This is used in some embodiments as an additional test on the probability of a cardiac event: an Energy Test is performed and a T-value calculated; this T-value is updated after each heartbeat and a warning is issued if the T-value exceeds a predetermined threshold (which is based on past data, and may be determined for each patient based upon their specific data). The context length over which the Energy Test is performed is determined as with any other dataset. This test may be used in isolation, or in conjunction with any other method described, where use in conjunction with other methods may reduce the likelihood of false negatives or false positives.
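By way of example only, one common form of the energy test statistic for two unbinned samples is sketched below. The Gaussian weighting function and its width `sigma` are assumptions, since the text does not specify the weighting function, and the sample values are hypothetical.

```python
import numpy as np

def _kernel_sum(a, b, sigma):
    """Sum of Gaussian weights over all pairs (a_i, b_j)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2)).sum()

def energy_test_statistic(x, y, sigma=1.0):
    """T-value between samples x (n points) and y (m points).

    T = (1/n^2) sum_{i<j} psi(x_i, x_j) + (1/m^2) sum_{i<j} psi(y_i, y_j)
        - (1/(n*m)) sum_{i,j} psi(x_i, y_j)
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, m = len(x), len(y)
    # Remove the i == j terms (each equals 1) and halve to count i < j only.
    xx = (_kernel_sum(x, x, sigma) - n) / (2.0 * n * n)
    yy = (_kernel_sum(y, y, sigma) - m) / (2.0 * m * m)
    xy = _kernel_sum(x, y, sigma) / (n * m)
    return xx + yy - xy

reference = [0.0, 0.1, 0.2, 0.3]    # hypothetical feature values
similar = [0.05, 0.15, 0.25, 0.35]  # drawn from much the same region
shifted = [5.0, 5.1, 5.2, 5.3]      # clearly different distribution
t_similar = energy_test_statistic(reference, similar)
t_shifted = energy_test_statistic(reference, shifted)
```

A larger T-value indicates a larger difference between the two samples, so `t_shifted` exceeds `t_similar`; a warning threshold would be applied to this value.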
[0303] Autoregressive Models
[0304] In some embodiments, autocorrelation is considered along with a measure of the lag required to obtain an autocorrelation. As an example, in the short term, the occurrence of one cardiac event may be indicative of another cardiac event being likely to occur (i.e. recent cardiac events may have high autocorrelation), as these events are often related to periods of otherwise poor health. In the long term, a previously occurring cardiac event (e.g. a cardiac event which occurred in a previous year) may be a poor indicator of a subsequent cardiac event (i.e. distant cardiac events may have low autocorrelation), as the period of poor health may have passed. The suitability of using an autoregressive model is determined by comparing these correlations and lags.
[0305] A consideration with autocorrelation is that (useful) autocorrelation may be negative or positive. In the previously used example, it may be the case that a previous, but distant cardiac event (e.g. one that occurred in a previous year), is a good indicator that a cardiac event is unlikely, as the person may have worked to improve their health in response to the previous event.
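A minimal sketch of a sample autocorrelation at a given lag is shown below, using the standard estimator normalised by the lag-zero sum of squares. The function name and the alternating example series are illustrative assumptions.

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`.

    Assumes the series is not constant (non-zero variance).
    """
    series = np.asarray(series, dtype=float)
    centred = series - series.mean()
    if lag == 0:
        return 1.0
    denom = np.dot(centred, centred)
    return float(np.dot(centred[:-lag], centred[lag:]) / denom)

# An alternating series: negative autocorrelation at lag 1, positive at lag 2.
series = [1, -1, 1, -1, 1, -1, 1, -1]
r1 = autocorrelation(series, 1)
r2 = autocorrelation(series, 2)
```

Comparing the magnitude and sign of such values across lags gives one way of judging whether an autoregressive model is suitable.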
[0306] Other Conditions
[0307] The methods described could be used for a range of other conditions; for example, indicators of an upcoming arrhythmia may be used to predict a stroke as well as a cardiac event. The methods disclosed herein could also be used to measure conditions away from the heart: the flow of blood could, for example, be monitored as it relates to transfer to the brain. In this situation, a context length would still be of relevance: monitoring the blood flow into the brain could be used to give a prediction of brain-related events (such as brain aneurysms).
[0308] More generally, the methods disclosed could be used as a general indication of health. Abnormal operation of any pulse based condition is a possible indicator of not only the probability of a specific event (e.g. arrhythmia), but also that the patient is likely to be at heightened risk of a more general health-related incident. These methods may then be used to indicate that a patient may need more careful monitoring during a determined period, or that it may be valuable to analyse patient data in more detail and/or to carry out tests.
[0309] It will be understood that the invention has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.
[0310] Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.
[0311] Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.