PREDICTING SURGERY DURATION
20230411005 · 2023-12-21
Inventors
- Mariana Nikolova-Simons (Eindhoven, NL)
- Marko Djikic (Eindhoven, NL)
- Rikkert Keldermann (Eindhoven, NL)
- Leon Montenij (Eindhoven, DE)
Cpc classification
G16H40/20
PHYSICS
G16H50/70
PHYSICS
G16H50/20
PHYSICS
G16H20/40
PHYSICS
G16H40/00
PHYSICS
International classification
G16H50/20
PHYSICS
G16H20/40
PHYSICS
Abstract
A system and method are provided for generating a predictive model for predicting a surgery duration. The system and method use a feature selection technique to identify a set of features in training data, which set of features is predictive of the surgery duration, train a number of predictive models using the set of features as input and the surgery duration as prediction target, wherein the predictive models include at least a linear predictive model and a non-linear predictive model, and generate an ensemble model which combines at least two of the predictive models. Such an ensemble model may optimally combine linear and non-linear predictions and therefore allow linear and non-linear relationships between features and the surgery duration to be taken into account. Advantageously, more accurate surgery planning may safeguard the health of patients, for example by ensuring that there are sufficient resources available for acute surgeries, or by avoiding that elective surgeries have to be postponed due to a presumed lack of resources.
Claims
1. A system for generating a predictive model for predicting a surgery duration, comprising: an input interface for accessing medical data, the medical data comprising records of surgeries, wherein a record is indicative of at least a type of surgery and a surgery duration; a processor subsystem configured to generate a predictive model for predicting the surgery duration by: using a feature selection technique to identify a set of features in training data, which set of features is predictive of the surgery duration, wherein the training data comprises a first part of the medical data; training a number of predictive models using the set of features in the training data as input and the surgery duration as prediction target, wherein the predictive models include at least a linear predictive model and a non-linear predictive model; using a second part of the medical data, evaluating a performance of each of the predictive models in predicting the surgery duration, wherein the evaluating of the performance comprises using a performance metric which characterises a time difference between a predicted surgery duration and an actual surgery duration; based on the performance of the predictive models, generating an ensemble model which combines at least two of the predictive models; and outputting the ensemble model for use in predicting the surgery duration.
2. The system according to claim 1, wherein the performance metric characterises whether the time difference between the predicted surgery duration and the actual surgery duration is positive or negative and a degree of the difference.
3. The system according to claim 2, wherein the performance metric categorizes the time difference into a number of categories, wherein the categories include at least a first category indicating that the predicted surgery duration exceeds the actual surgery duration by a first margin and a second category which indicates that the predicted surgery duration is less than the actual surgery duration by a second margin.
4. The system according to claim 1, wherein in evaluating the performance, a positive time difference between the predicted surgery duration and the actual surgery duration is weighted differently than a negative time difference.
5. The system according to claim 1, further comprising a user interface subsystem comprising a user input interface to a user input device for receiving user input and a display output to a display for displaying output of the system, wherein the processor subsystem is configured to, using the user interface subsystem: enable a user to evaluate the performance of a previous predictive model for predicting the surgery duration using the performance metric; request a new predictive model to be generated by the system if the performance of the previous predictive model is deemed insufficient by the user or the system.
6. The system according to claim 5, wherein the processor subsystem is configured to, using the user interface subsystem, enable the user to further evaluate the performance of the new predictive model using the performance metric.
7. The system according to claim 1, wherein the processor subsystem is configured to: generate two or more ensemble models which each combines at least two of the predictive models; using a third part of the medical data, evaluating a performance of the two or more ensemble models in predicting the surgery duration, wherein the evaluating of the performance comprises using the performance metric; and based on the performance of the two or more ensemble models, selecting the ensemble model for output.
8. The system according to claim 1, wherein a record in the medical data is further indicative of one or more of: an identity of a surgeon, a clinical role of a surgeon, a type of surgical procedure, a surgical urgency, a type of post-surgery bed, an average surgery duration for a type of surgical procedure for a surgeon, and a number of surgical procedures performed during the surgery.
9. The system according to claim 1, wherein the medical data further comprises patient data of patients of the respective surgeries.
10. The system according to claim 9, wherein the patient data is indicative of one or more of: an age, a gender, a body-mass index, an American Society of Anaesthesiology (ASA)-score, a number of medications taken, a number of comorbidities, and a creatinine level, of a respective patient.
11. The system according to claim 1, wherein the processor subsystem is configured to identify the set of features in the training data using multivariate inferential analysis, wherein the processor subsystem is further configured to use univariate inferential analysis as a filter to determine an input set of features to the multivariate inferential analysis, wherein the features of the input set of features are individually predictive of the surgery duration.
12. The system according to claim 1, wherein the predictive models comprise one or more of: a linear regression-based model, a random forest-based model, and a gradient boosting-based model.
13. The system according to claim 1, wherein a record in the medical data is indicative of whether a surgery is an elective surgery or an acute surgery, wherein the processor subsystem is configured to generate different predictive models for predicting the surgical duration of elective surgeries and for predicting the surgical duration of acute surgeries.
14. A computer-implemented method for generating a predictive model for predicting a surgery duration, comprising: accessing medical data, the medical data comprising records of surgeries, wherein a record is indicative of at least a type of surgery and a surgery duration; generating a predictive model for predicting the surgery duration by: using a feature selection technique to identify a set of features in training data, which set of features is predictive of the surgery duration, wherein the training data comprises a first part of the medical data; training a number of predictive models using the set of features in the training data as input and the surgery duration as prediction target, wherein the predictive models include at least a linear predictive model and a non-linear predictive model; using a second part of the medical data, evaluating a performance of each of the predictive models in predicting the surgery duration, wherein the evaluating of the performance comprises using a performance metric which characterises a time difference between a predicted surgery duration and an actual surgery duration; based on the performance of the predictive models, generating an ensemble model which combines at least two of the predictive models; and outputting the ensemble model for use in predicting the surgery duration.
15. A transitory or non-transitory computer-readable medium comprising data representing a computer program, the computer program comprising instructions for causing a processor system to perform the method according to claim 14.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0054] These and other aspects of the invention will be apparent from and elucidated further with reference to the embodiments described by way of example in the following description and with reference to the accompanying drawings.
[0066] It should be noted that the figures are purely diagrammatic and not drawn to scale. In the figures, elements which correspond to elements already described may have the same reference numerals.
LIST OF REFERENCE NUMBERS
[0067] The following list of reference numbers is provided for facilitating the interpretation of the drawings and shall not be construed as limiting the claims.
[0068] 20 data storage
[0069] 22 training data
[0070] 24 test data
[0071] 26 validation data
[0072] 60 display
[0073] 62 display data
[0074] 80 user input device(s)
[0075] 82 user input data
[0076] 100 system for generating and/or using predictive model
[0077] 120 data storage interface
[0078] 140 processor subsystem
[0079] 142-146 data communication
[0080] 160 memory
[0081] 180 user interface subsystem
[0082] 182 display output interface
[0083] 184 user input interface
[0084] 200 surgery duration predictor
[0085] 210 data splitting
[0086] 212 feature selection from data source(s)
[0087] 214 training predictive models
[0088] 216 testing predictive models
[0089] 218 creating ensemble of predictive models
[0090] 220 surgery duration analyser
[0091] 230 quantify deltas between predicted and actual surgery durations
[0092] 232 categorize surgeries based on deltas
[0093] 234 visualize deltas
[0094] 236 select best predictive model
[0095] 300 categories representing deltas between actual surgery duration and predicted surgery duration
[0097] 302 ahead of schedule categories
[0098] 304 on time category
[0099] 306 behind schedule categories
[0100] 310 surgical procedures according to medical subspecialty
[0101] 312 number of surgeries
[0102] 314 surgery duration
[0103] 320 features
[0104] 322 feature importance
[0105] 400 method of generating predictive model
[0106] 410 accessing medical data
[0107] 420 identifying features in medical data
[0108] 430 training predictive models
[0109] 440 evaluating performance of predictive models
[0110] 450 generating and outputting ensemble model
[0112] 500 non-transitory computer-readable medium
[0113] 510 data representing computer program
DETAILED DESCRIPTION OF EMBODIMENTS
[0114]
[0115] The system 100 is shown to comprise a data storage interface 120 to a data storage 20. In some embodiments, the data storage 20 may store medical data comprising records of surgeries. As also shown in
[0116] The system 100 is further shown to comprise a processor subsystem 140 configured to internally communicate with the data storage interface 120 via data communication 142, with a memory 160 via data communication 144 and with a user interface subsystem 180 via data communication 146. The memory 160 may for example be a volatile memory in which a computer program may be loaded which may cause the processor subsystem 140 to carry out functions which are described in this specification, for example in relation to generating a predictive model and/or to using the predictive model.
[0117] In some embodiments, the system 100 may comprise a user interface subsystem 180, which user interface subsystem may be configured to, during operation of the system 100, enable a user to interact with the system 100, for example using a graphical user interface. In particular, as also described elsewhere, the graphical user interface may enable the user to analyse the performance of predictive model(s). For that and other purposes, the user interface subsystem 180 is shown to comprise a user input interface 184 configured to receive user input data 82 from one or more user input devices 80 operable by the user. The user input devices 80 may take various forms, including but not limited to a keyboard, mouse, touch screen, microphone, etc.
[0118] In some embodiments, the processor subsystem 140 may be configured to, during operation of the system 100, generate a predictive model for predicting the surgery duration. For that purpose, the processor subsystem 140 may be configured to use a feature selection technique to identify a set of features in the training data 22, which set of features is predictive of the surgery duration, train a number of predictive models using the set of features in the training data as input and the surgery duration as prediction target, wherein the predictive models include at least a linear predictive model and a non-linear predictive model, and use the test data 24 to evaluate a performance of each of the predictive models in predicting the surgery duration, wherein the evaluating of the performance comprises using a performance metric which characterises a time difference between a predicted surgery duration and an actual surgery duration. The processor subsystem 140 may be further configured to, based on the performance of the predictive models, generate an ensemble model which combines at least two of the predictive models and output the ensemble model so that it can be used by other systems and methods to predict the surgery duration.
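By way of illustration only, the combining of two predictive models may be sketched as follows. The sketch assumes that two base models have already been trained and that their predictions on held-out surgeries are available as plain lists of minutes. It uses a linear blender fitted by least squares without an intercept, which is only one possible choice of blender. The function names `fit_linear_blend` and `blend` and all numbers are hypothetical and are not taken from this specification.

```python
def fit_linear_blend(pred_a, pred_b, actual):
    """Least-squares weights (w_a, w_b) so that w_a*pred_a + w_b*pred_b approximates actual."""
    # Normal equations for a two-column design matrix without intercept.
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * y for a, y in zip(pred_a, actual))
    sby = sum(b * y for b, y in zip(pred_b, actual))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("base predictions are collinear; cannot fit blend")
    w_a = (say * sbb - sby * sab) / det
    w_b = (saa * sby - sab * say) / det
    return w_a, w_b


def blend(pred_a, pred_b, weights):
    """Combine two prediction lists using previously fitted blend weights."""
    w_a, w_b = weights
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]


# Hypothetical held-out predictions (minutes) from two base models.
pred_lm = [100, 200, 150, 300]
pred_gb = [120, 180, 170, 260]
actual = [110, 190, 160, 280]

weights = fit_linear_blend(pred_lm, pred_gb, actual)
blended = blend(pred_lm, pred_gb, weights)
```

In such a sketch, the blend weights would typically be fitted on data that was not used to train the base models, so that the blender does not simply reward the base model that overfits the training data most.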
[0119] In other embodiments, the processor subsystem 140 may alternatively or additionally be configured to, during operation of the system 100, use the predictive model in the form of the ensemble model for inference purposes, namely, to predict the surgery duration from input data. In such embodiments, the processor subsystem 140 may be configured to, using the data storage interface 120, access model data representing the ensemble model and input data indicative of at least a type of surgery, and to use the input data as input to the predictive model to obtain a prediction of the surgery duration as an output. The predicted surgery duration may for example be output by the system 100 on the display 60.
[0120] These and other operations of the system 100, and various optional aspects thereof, will be explained in more detail with reference to
[0121] In general, the system 100 may be embodied as, or in, a single device or apparatus. The device or apparatus may be a general-purpose device or apparatus, such as a workstation or a computer, but may also be application-specific, such as a patient monitor. The device or apparatus may comprise one or more microprocessors which may represent the processor subsystem, and which may execute appropriate software. The software may have been downloaded and/or stored in a corresponding memory, e.g., a volatile memory such as RAM or a non-volatile memory such as Flash. Alternatively, the functional units of the system, e.g., the input interface, the user interface subsystem, and the processor subsystem, may be implemented in the device or apparatus in the form of programmable logic, e.g., as a Field-Programmable Gate Array (FPGA). In general, each functional unit of the system 100 may be implemented in the form of a circuit. It is noted that the system 100 may also be implemented in a distributed manner, e.g., involving different devices or apparatuses. For example, the distribution may be in accordance with a client-server model, e.g., using a server and workstation. For example, the user input interface and the display output interface may be part of the workstation, while the processor subsystem may be a subsystem of the server. It is noted that various other distributions are equally conceivable.
[0122]
[0123] The SDA may be configured to allow the performance of one or more predictive models to be analysed. In an exemplary workflow using the SDA, in a first step 230, an error may be quantified between an actual surgery duration and a surgery duration as predicted by a current predictive model, e.g., as used within the hospital or other health care institution. The current predictive model may for example be a predictive model which was previously generated using machine learning or may be a non-machine learned model which was generated based on heuristics. The error, which may elsewhere also be referred to as a delta, may be determined for surgeries for which the actual surgery duration is known, for example for a set of surgeries which were previously performed within a hospital, e.g., within the last month(s) or year(s). By way of example, the following refers to a set of cardiothoracic surgeries performed within a particular timeframe and within one hospital, on which the systems and methods described in this specification were evaluated. The deltas may for example be evaluated for all surgeries within the set, or for relevant sub-categories of surgeries. For example, the performance evaluation may distinguish between elective and acute surgeries and/or between medical specialties.
[0124] The deltas may in general be expressed in various ways, for example as a Root Mean Square Error (RMSE) or Mean Absolute Error (MAE). However, while such error metrics are well-known in the field of data analysis, they are not intuitive for use by clinical staff and planners. Further, these error measures do not distinguish between positive and negative time deltas. Instead, or additionally, a performance metric may be used which characterises whether the time difference between the predicted surgery duration and the actual surgery duration is positive or negative, as well as a degree of the difference. For example, such a performance metric may categorize the time difference into a number of categories, as for example explained with reference to
[0125] With continued reference to
[0126]
[0127] For example, as also shown in
[0128] With continued reference to
[0129] In a second step 212, features may be extracted from the medical data. For example, some types of features may characterize the surgery while other types of features may characterize the patient. Examples of the former type of features are given in Table 1 below, while examples of the latter type of features are given in Table 2 below. It is noted that typically, all features are available before the surgery is performed, e.g., at the point of surgery planning, and typically even before the patient is hospitalized.
TABLE 1: Features based on surgery characteristics

| Features | Type | Examples of values |
|---|---|---|
| Surgeon ID | unclustered | ID_6791, ID_86 |
| | clustered | Hierarchical clustering - ID_C1, ID_C2 |
| Surgeon type | categorical | Attendings vs. Residents |
| Surgery Procedures (Px) | unclustered | CABG, AVR, etc. |
| | clustered | According to: medical subspecialties; Hierarchical clustering - Px_C1, Px_C2 |
| Ave Px time per surgeon | | 183 min, 316 min |
| Number of Px | categorical | Single, double, multiple (>=3) |
| | discrete | 1, 2, 3, 4, 5, . . . |
| Surgery urgency | categorical | Acute, Elective |
| Post OR type of bed | categorical | ICU, PACU, general ward (GW) |
TABLE 2: Features based on patient characteristics

| Features | Type | Examples of values |
|---|---|---|
| Age | discrete | 67 years, 86 years |
| | categorical | [18-29], [30-44], [45-69], . . . |
| Gender | categorical | Male vs. Female |
| BMI | categorical | Underweight, Normal, Overweight, Obese |
| ASA score | discrete | 1, 2, 3, 4, 5 |
| Medication numbers | discrete | 1, 2, . . . , 22 |
| | categorical | [1, 5], [6-10], [11-15], [16++] |
| Creatinine levels | categorical | renal failure, severe decrease, moderate decrease, mild decrease, normal |
[0130]
[0131] After having extracted a set of features from the training data, the feature selection may be performed, by which a set of features may be selected from the larger set of extracted features, which set of features is most predictive of the surgery duration. For the feature selection, any known feature selection technique may be used. In a specific example, a two-step feature selection technique may be used. For example, the SDP may first perform univariate inferential analysis to investigate how well each of the extracted features can predict the surgery duration. Features that are not statistically significant predictors (e.g., p-value > 0.05) may be dropped. See for example Table 3 below, which shows that gender is not a statistically significant predictor for surgery duration in this particular instance of medical data, i.e., this particular set of cardiothoracic surgeries.
TABLE 3: Feature selection - univariate analysis. Univariate inferential analysis by linear regression models for surgery duration.

| Pt char | Category | n (%) | p (<0.05) |
|---|---|---|---|
| Age category | | | p < 0.001 |
| | 18-29 years | 67 (3%) | |
| | 30-44 years | 63 (3%) | |
| | 45-59 years | 477 (20%) | |
| | 60-74 years | 1265 (54%) | |
| | 75+ years | 491 (21%) | |
| Gender | | | p = 0.899 |
| | M | 1777 (75%) | |
| | F | 586 (25%) | |
| BMI category | | | p < 0.001 |
| | Unknown | 365 (15%) | |
| | Underweight | 20 (1%) | |
| | Normal | 543 (23%) | |
| | Overweight | 923 (39%) | |
| | Obese | 512 (22%) | |
| ASA score | | | p = 0.0385 |
| | 1 cat | 33 (1%) | |
| | 2 cat | 127 (5%) | |
| | 3 cat | 1032 (44%) | |
| | 4 cat | 665 (28%) | |
| | 5 cat | 9 (0%) | |
| | Unknown | 497 (21%) | |
| Meds number | Range [0, 22] | | p = 0.0041 |
| Creatinine levels | | | p = 0.01592 |
| | renalFailure | 9 (0%) | |
| | severeDecr | 120 (6%) | |
| | modDecr | 166 (7%) | |
| | mildDecr | 1013 (43%) | |
| | normal | 759 (32%) | |
| | unknown | 296 (12%) | |
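By way of illustration only, a univariate filter of the kind used to produce Table 3 may be sketched as follows. Table 3 reports p-values from linear regression models. To stay within the Python standard library, this sketch substitutes a permutation test on the absolute Pearson correlation, which is a different technique serving the same filtering purpose. The 0.05 threshold follows the text, while the function names, the number of permutations, and the synthetic data are hypothetical.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the association between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def univariate_filter(features, duration, alpha=0.05):
    """Keep only the features that are individually significant predictors."""
    return [name for name, values in features.items()
            if permutation_p_value(values, duration) < alpha]


# Synthetic example: duration depends on age but not on the alternating gender code.
age = list(range(20, 80, 2))
gender = [i % 2 for i in range(len(age))]
duration = [100 + 3 * a for a in age]

selected = univariate_filter({"age": age, "gender": gender}, duration)
```

The features retained by such a filter would then form the input set for the multivariate analysis.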
[0132] Secondly, the SDP may perform multivariate inferential analysis, for example using the Boruta algorithm based on Random Forest ML technique, which may categorize features into important, tentative, and unimportant features. An example of the output of the Boruta algorithm is visualized in
[0133] With continued reference to
[0134] Having trained the predictive models, in a fourth step 216, a performance of each of the predictive models in predicting the surgery duration may be evaluated. For that purpose, a second part of the medical data may be used, which part may elsewhere also be referred to as test data. Such performance evaluation may for example make use of the traditional RMSE and MAE errors, which yield the performance numbers listed in Table 4 below, in which separate models were trained for elective and acute surgeries. Preferably, however, the performance evaluation comprises using a performance metric which characterises the time difference between the predicted surgery duration and the actual surgery duration. For example, the performance metric may indicate whether the difference between the predicted surgery duration and actual surgery duration is positive or negative and the magnitude of the difference. In some embodiments, the difference may also be categorized in several categories, e.g., as shown in
TABLE 4: Predictive model performance

| Models | Elective* RMSE | Elective* MAE | Acute** RMSE | Acute** MAE |
|---|---|---|---|---|
| Current | 0.9894 | 0.7080 | 1.8740 | 1.2210 |
| New model: LM | 0.8375 | 0.6464 | 0.9246 | 0.6561 |
| New model: RF | 0.8278 | 0.6213 | 0.9189 | 0.6054 |
| New model: GB | 0.7994 | 0.6075 | 0.8910 | 0.6165 |

*LM, RF, GB predictive models trained on cardiothoracic elective surgeries only.
**LM, RF, GB predictive models trained on cardiothoracic acute surgeries only.
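By way of illustration only, the two conventional error measures reported in Table 4 may be computed as sketched below. The example also shows why these measures cannot distinguish between over-estimation and under-estimation: predictions that are uniformly 30 minutes too long and predictions that are uniformly 30 minutes too short yield identical values. The use of minutes as the unit and all numbers are assumptions made for the example.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual durations."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual durations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


actual = [180, 240, 300]
over = [210, 270, 330]   # every surgery predicted 30 min too long
under = [150, 210, 270]  # every surgery predicted 30 min too short

# Both error measures are blind to the sign of the delta.
same_rmse = rmse(over, actual) == rmse(under, actual)
same_mae = mae(over, actual) == mae(under, actual)
```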
[0135] With continued reference to
[0136] Once the ensemble model(s) are generated, the SDA may again perform its steps 230-236 to quantify the performance improvement over the current predictive model. If there are multiple predictive models, for example a set of ensemble models or a combination of individual predictive models and ensemble models, the performance improvement may be evaluated for each of these models. For the performance evaluation, the aforementioned performance metric may be used, which uses categories with clinical meaning such as behind schedule, on time, and ahead of schedule. Additionally, or alternatively, known error metrics such as the RMSE/MAE may be used in the evaluation. In general, the performance evaluation may be performed on a third part of the medical data, which third part is elsewhere also referred to as validation data. The best-performing model may then be selected for subsequent inferential use, for example for use as a predictor in actual surgery planning.
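By way of illustration only, the selection of a best-performing model on validation data may be sketched with a sign-aware cost in which a surgery that runs longer than predicted is penalised more heavily than one that finishes early. This specification does not prescribe a particular weighting, so the weights 2.0 and 1.0, the function names `asymmetric_cost` and `select_best`, and the numbers are hypothetical.

```python
def asymmetric_cost(predicted, actual, behind_weight=2.0, ahead_weight=1.0):
    """Mean weighted delta; under-estimated durations cost more than over-estimated ones."""
    total = 0.0
    for p, a in zip(predicted, actual):
        delta = a - p  # positive: the surgery ran longer than predicted
        if delta > 0:
            total += behind_weight * delta
        else:
            total += ahead_weight * -delta
    return total / len(actual)


def select_best(candidates, actual):
    """candidates maps a model name to its predictions on the validation data."""
    return min(candidates, key=lambda name: asymmetric_cost(candidates[name], actual))


# Hypothetical validation data (minutes).
actual = [200, 300]
candidates = {
    "model_A": [180, 280],  # 20 min too short on each surgery
    "model_B": [225, 325],  # 25 min too long on each surgery
}

best = select_best(candidates, actual)
```

In this example a plain mean absolute error would prefer model_A (20 minutes against 25 minutes), whereas the asymmetric cost prefers model_B because its errors do not cause surgeries to run behind schedule.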
[0137] For the medical data of the set of cardiothoracic surgeries, the best predictive model for surgery duration of acute surgeries was an ensemble model, namely LM+GB blended by RF, highlighted in Table 5 below. This ensemble model reduced the share of surgeries behind schedule by 28 percentage points (from 60% to 32%) and increased the share of on-time surgeries by 15 percentage points (from 30% to 45%). The share of surgeries ahead of schedule increased by 13 percentage points (from 10% to 23%), which may be considered to be an acceptable trade-off.
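TABLE 5: Predictive models performance of cardiothoracic acute surgeries

| Models | Ahead of schedule, 60 min | Ahead of schedule, (-60, -20] min | On time, (-20, 20] min | Behind the schedule, (20, 60] min | Behind the schedule, (60, 120] min | Behind the schedule, 60++ min |
|---|---|---|---|---|---|---|
| Current | 3% | 7% | 30% | 30% | 11% | 19% |
| New single model: LM | 5% | 29% | 32% | 20% | 12% | 2% |
| New single model: RF | 3% | 24% | 39% | 22% | 8% | 4% |
| New single model: GB | 4% | 31% | 37% | 16% | 9% | 3% |
| New ensemble model: LM + RF blended by GB | 5% | 21% | 38% | 24% | 6% | 6% |
| New ensemble model: LM + GB blended by RF | 6% | 17% | 45% | 17% | 8% | 7% |
| New ensemble model: RF + GB blended by LM | 3% | 21% | 44% | 21% | 7% | 4% |
| New ensemble model: LM + RF + GB blended by GB | 5% | 17% | 41% | 23% | 8% | 6% |
| New ensemble model: LM + RF + GB blended by RF | 3% | 24% | 43% | 17% | 9% | 4% |
| New ensemble model: LM + RF + GB blended by LM | 3% | 22% | 42% | 21% | 9% | 3% |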
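By way of illustration only, category shares of the kind reported in Table 5 may be computed as sketched below. The sketch assumes that the delta is the actual duration minus the predicted duration in minutes, so that a negative delta means ahead of schedule. The inner bin edges follow the interval headers of Table 5 read with negative bounds on the ahead-of-schedule side. The open-ended outer bins, the labels, and all function names are assumptions made for the example.

```python
from collections import Counter

# (inclusive upper bound in minutes, label), checked in ascending order.
BINS = [
    (-60, "ahead: 60 min or more"),
    (-20, "ahead: (-60, -20] min"),
    (20, "on time: (-20, 20] min"),
    (60, "behind: (20, 60] min"),
    (120, "behind: (60, 120] min"),
]
LAST_LABEL = "behind: more than 120 min"
ALL_LABELS = [label for _, label in BINS] + [LAST_LABEL]


def categorize(predicted_min, actual_min):
    """Return the category label for one surgery."""
    delta = actual_min - predicted_min
    for upper, label in BINS:
        if delta <= upper:
            return label
    return LAST_LABEL


def category_shares(predicted, actual):
    """Fraction of surgeries per category, including empty categories."""
    counts = Counter(categorize(p, a) for p, a in zip(predicted, actual))
    return {label: counts[label] / len(actual) for label in ALL_LABELS}


# Hypothetical example: four surgeries, each planned for 200 minutes.
predicted = [200, 200, 200, 200]
actual = [130, 200, 230, 330]
shares = category_shares(predicted, actual)
```

Summing the shares of the two ahead-of-schedule categories or of the three behind-schedule categories gives the coarse percentages quoted in the text.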
[0138] The best predictive model for surgery duration of elective cardiothoracic surgeries was also an ensemble model, namely the combination of LM+RF+GB blended by LM, highlighted in Table 6. This ensemble model reduced the share of surgeries behind schedule by 9 percentage points (from 37% to 28%) and increased the share of on-time surgeries by 5 percentage points (from 33% to 38%). The share of surgeries ahead of schedule increased by 4 percentage points (from 30% to 34%), which again may be considered to be an acceptable trade-off.
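TABLE 6: Predictive models performance of cardiothoracic elective surgeries

| Models | Ahead of schedule, 60 min | Ahead of schedule, (-60, -20] min | On time, (-20, 20] min | Behind the schedule, (20, 60] min | Behind the schedule, (60, 120] min | Behind the schedule, 60++ min |
|---|---|---|---|---|---|---|
| Current | 7% | 23% | 33% | 21% | 11% | 5% |
| New single model: LM | 9% | 27% | 33% | 20% | 9% | 2% |
| New single model: RF | 7% | 25% | 37% | 22% | 7% | 2% |
| New single model: GB | 7% | 26% | 35% | 23% | 7% | 2% |
| New ensemble model: LM + RF blended by GB | 9% | 28% | 34% | 19% | 7% | 3% |
| New ensemble model: LM + GB blended by RF | 10% | 23% | 36% | 20% | 9% | 2% |
| New ensemble model: RF + GB blended by LM | 7% | 28% | 37% | 19% | 7% | 2% |
| New ensemble model: LM + RF + GB blended by GB | 8% | 27% | 36% | 20% | 7% | 2% |
| New ensemble model: LM + RF + GB blended by RF | 8% | 25% | 36% | 20% | 8% | 3% |
| New ensemble model: LM + RF + GB blended by LM | 7% | 27% | 38% | 20% | 6% | 2% |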
TABLE 7: Surgery duration improvements for elective and acute surgeries

| Surgeries | Cardiothoracic elective* | Cardiothoracic acute** |
|---|---|---|
| behind the schedule | reduced 9% (from 37% to 28%) | reduced 28% (from 60% to 32%) |
| on time | increased 5% (from 33% to 38%) | increased 15% (from 30% to 45%) |
| ahead of schedule | increased 4% (from 30% to 34%) | increased 13% (from 10% to 23%) |

*Predictive model (LM + RF + GB by LM) trained on cardiothoracic elective surgeries only.
**Predictive model (LM + GB by RF) trained on cardiothoracic acute surgeries only.
[0139]
[0140]
[0141] The method 400 is shown to comprise, in an operation titled ACCESSING MEDICAL DATA, accessing 410 medical data as described elsewhere in this specification, and generating a predictive model for predicting the surgery duration by, in an operation titled IDENTIFYING FEATURES IN MEDICAL DATA, using a feature selection technique to identify 420 a set of features in training data, which set of features is predictive of the surgery duration, wherein the training data comprises a first part of the medical data, in an operation titled TRAINING PREDICTIVE MODELS, training 430 a number of predictive models using the set of features in the training data as input and the surgery duration as prediction target, wherein the predictive models include at least a linear predictive model and a non-linear predictive model, in an operation titled EVALUATING PERFORMANCE OF PREDICTIVE MODELS, using a second part of the medical data, evaluating 440 a performance of each of the predictive models in predicting the surgery duration, wherein the evaluating of the performance comprises using a performance metric which characterises a time difference between a predicted surgery duration and an actual surgery duration, and in an operation titled GENERATING AND OUTPUTTING ENSEMBLE MODEL, based on the performance of the predictive models, generating 450 an ensemble model which combines at least two of the predictive models and outputting the ensemble model for use in predicting the surgery duration. It will be appreciated that in general, operations of method 400 of
[0142] The method may be implemented on a computer as a computer implemented method, as dedicated hardware, or as a combination of both. As also illustrated in
[0143] Examples, embodiments or optional features, whether indicated as non-limiting or not, are not to be understood as limiting the invention as claimed.
[0144] It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verb "comprise" and its conjugations does not exclude the presence of elements or stages other than those stated in a claim. The article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Expressions such as "at least one of" when preceding a list or group of elements represent a selection of all or of any subset of elements from the list or group. For example, the expression "at least one of A, B, and C" should be understood as including only A, only B, only C, both A and B, both A and C, both B and C, or all of A, B, and C. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.