Abstract
Assessing the risk of postoperative cardiovascular events before performing non-cardiac surgery is clinically important. The current risk score systems for preoperative evaluation may not adequately represent a small subset of high-risk populations. Accordingly, this study aimed at applying iterative random forest to analyze combinations of factors that could potentially be clinically valuable in identifying these high-risk populations. To this end, we used the Japan Medical Data Center database, which includes claims data from Japan between January 2005 and April 2021, and employed iterative random forests to extract factor combinations that influence outcomes. The analysis demonstrated that a combination of a prior history of stroke and extremely low LDL-C levels was associated with a high non-cardiac postoperative risk. The incidence of major adverse cardiovascular events in the population characterized by the incidence of previous stroke and extremely low LDL-C levels was 15.43 events per 100 person-30 days [95% confidence interval, 6.66–30.41] in the test data. At this stage, the results only show correlation rather than causation; however, these findings may offer valuable insights for preoperative risk assessment in non-cardiac surgery.
Similar content being viewed by others
Introduction
More than 200 million surgeries are estimated to be performed annually worldwide1, with cardiovascular events, such as myocardial infarction and cardiac arrest, being among the most critical complications2. Accurate risk assessment is essential for balancing the benefits and risks of surgery and determining the need for interventions to mitigate risks. Furthermore, accurate risk assessment plays a crucial role in scheduling surgery, optimizing necessary hospital resources, and planning postoperative monitoring3,4,5. Risk assessment encompasses both patient and surgical factors, with known patient risk factors such as poor functional status, advanced age, diabetes, and renal dysfunction2. For preoperative assessment of non-cardiac surgery, simple scores combining these patient and surgical factors are used in medical practice. International guidelines recommend using such risk assessment scores6,7,8,9. Numerous risk scores have been developed and validated for non-cardiac surgery10,11,12,13,14,15,16,17; however, combinations of multiple factors that have not been established as risk factors may also identify high-risk groups for developing postoperative cardiovascular events, but exploratory studies on such findings are limited. Traditionally, risk score development has relied on calculating odds ratios from generalized linear models and hazard ratios from Cox proportional hazard models18. While the final prediction scores can stratify the risk, they may not capture a high-risk small population defined by a combination of factors that are not statistically significant in linear models. Identifying high-risk populations using the combination of multiple factors is highly beneficial in the preoperative assessment of patients undergoing non-cardiac surgery. This approach enables the screening of small yet high-risk populations.
The recent advances in machine learning techniques owing to improvements in computing power led to numerous developments in medical research19,20,21. A significant advantage of machine learning technology is its ability to automate feature extraction based on data22,23, which has the potential to identify high-risk patient populations. Combining multiple factors to detect high-risk patient groups can be applied using methods for identifying interactions or effect modifications. In this context, tree-based machine learning models are particularly noted for their suitability for interaction search, and algorithms have been refined to detect beneficial interactions more efficiently24,25. Tree-based machine learning uses combinations of multiple explanatory factors to make classifications through a step-by-step process of building a decision tree. To uncover effect modifications, capturing complex nonlinear relationships between covariate variables and outcomes is essential. Random Forests26, which consist of a set of decision trees, are recognized as machine learning algorithms capable of uncovering interactions27,28,29. However, even with the appropriate fitting of the random forest model, considerable challenges remain, including imbalances in the resulting data, potential biases in feature extraction, and difficulties in interpreting effect modification among variables. Consequently, extracting meaningful effect modifications from the model for predicting postoperative risk is not straightforward. To address these challenges, we applied iterative random forests (iRF) to extract effect modifications from highly imbalanced data and detect combinations of factors identifying high-risk populations in imbalanced clinical data. iRF is recognized as a method for extracting influential combinations of multiple explanatory factors30. Originally developed to identify gene combinations exhibiting interactions from gene expression data, iRF can be applied to clinical data for the same purpose. This study aimed at applying machine learning to identify novel combinations of features that define high-risk groups for cardiovascular events after undergoing non-cardiac surgery, which are not captured by existing prediction scores.
Results
Baseline characteristics of the study population
Using the selection flow, 616,019 surgical cases were included in the analysis. They were split by stratified 2-fold splitting into 308,009 and 308,010 cases for training and test dataset, respectively (Fig. 1; Table 1). No significant differences were observed in explanatory variables between the training and test data, except for preoperative treatment with insulin. Based on Pearson’s correlation coefficient, the iRF was performed using the remaining explanatory variables (sex, age, body mass index [BMI], low-density lipoprotein [LDL] cholesterol, aspartate aminotransferase [AST], history of ischemic heart disease, history of congestive heart failure, history of cerebrovascular disease, and elevated-risk surgery) after excluding the explanatory variables with high correlation coefficients (Figure S1-3).
Cumulative incidence plots using the Kaplan–Meier method. The plots show the cumulative incidence of postoperative MACE occurrence stratified by a combination of history of cerebrovascular disease and LDL-C < 51.34 mg/dL. The light colors at the top and bottom of the plot indicate confidence intervals. MACEs, major adverse cardiac events.
Extraction of high-risk conditions based on iRF
Table 2 shows the combinations of explanatory variables that were identified by iRF and threshold optimization which had a high incidence in the training data and the results of validation on the test data. The high incidence and reproducibility in the test data identified a combination of previous stroke and extremely low LDL-C (51.34 mb/dL). The cumulative incidence plots constructed using the Kaplan–Meier method revealed that the population with a combination of previous stroke and extremely low LDL-C displayed a significant event occurrence using the log-rank test (Fig. 2). In Fig. 2, the wide confidence intervals reflect the small sample size of the target population. However, since the incidence of outcomes within 30 days exceeds 10%, this population should be considered high risk.
To examine the patient characteristics according to the presence or absence of previous stroke and extremely low LDL levels, explanatory variables were tabulated for each combination of conditions. The results, tabulated for each cluster of each condition combination, are shown in Table 3. The group of patients with a history of stroke and extremely low LDL-C had a higher proportion of males (87.8%) and relatively older mean age of 56.3 years. This group tended to have higher levels of triglycerides and γ-glutamyl transpeptidase than the other groups. In addition, this group tended to have higher percentages of patients with known risk factors of ischemic heart disease, heart failure, and insulin use for diabetes than the other groups. Renal dysfunction was another known risk factor, and this group tended to have higher mean creatinine levels.
Validation of effect modification in a high-risk condition
One approach to evaluate the effects of two factors is to test for effect modification. Effect modification occurs when the effect of a particular exposure on an outcome varies depending on the value of another variable, known as the effect modifier, which may not necessarily be part of the causal pathway. To rigorously assess effect modification, it is recommended to evaluate both additive and multiplicative scales31. We adopted relative excess risk due to interaction (RERI), attributable proportion due to interaction (AP), and synergy index (SI) as evaluation indices for effect modification by additive scale according to recommendations. Multiplicative modification effects occur when the interaction among variables results in combined effects that are proportional to the product of their individual effects rather than a simple sum. Metrics such as RERI, AP, and SI are used to evaluate additive modification effects, determining whether the combination of two risk factors increases the risk beyond what would be expected from their individual contributions. To validate the effect modification of previous history of stroke and extremely low LDL-C, after visually confirming the proportional hazard assumption with complementary log-log plots (Figure S4), hazard ratios for each condition, multiplicative scale, RERI, AP, and SI were examined (Table 4). The obtained multiplicative scales did not show significant multiplicative effect between previous history of stroke and extremely low LDL-C. However, RERI and AP were significantly greater than 0 and SI significantly greater than 1. These results indicated the presence of additive effect modification between previous stroke and extremely low LDL-C.
Discussion
In this study, machine learning was used to extract combinations of explanatory variables to identify novel high-risk conditions associated with the composite outcome of cardiovascular events and death following non-cardiac surgery. The combination of prior stroke and extremely low LDL-C levels was found to be consistently high risk, despite being validated in a small patient population in both the training and test datasets. However, the sample size of the obtained group was very small, and the confidence interval of the incident plot was wide; therefore, it is considered that it is at a stage where reproducibility must be verified using different data in the future. Existing risk scores, such as the Revised Cardiac Risk Index (RCRI)12 and Cardiovascular Risk Index15, do not incorporate LDL-C as a parameter. The results of this survey suggest that it is possible to identify high-risk individuals by searching for cases with extremely low LDL-C and a history of stroke among populations that have been classified as low-risk based on these screening scores. Within the scope of this analysis, it is possible that patients with both a history of stroke and extremely low LDL-C levels had a higher prevalence of known perioperative risk factors, potentially identifying an already recognized high-risk group. However, the results indicate an additive effect modification between prior stroke and extremely low LDL-C levels, as supported by the measures of RERI, AP, and SI.
Previous stroke is a known risk factor for perioperative risk; however, the effect of extremely low LDL-C levels on the risk of developing cardiovascular events and death is debatable. The relationship between LDL-C levels and the risk of cardiovascular events or mortality may follow a U-shaped pattern rather than a simple linear relationship32,33. High LDL-C is a well-established contributor to the progression of atherosclerotic diseases. From the perspective of cardiovascular disease prevention, it is widely accepted that lower LDL-C levels are generally beneficial34. Consequently, elevated LDL-C levels are typically considered a significant risk factor for cardiovascular disease34,35,36,37. However, there is also evidence suggesting that extremely low LDL-C levels such as 70 mg/dL or lower increase the risk of cardiovascular events36,32,38,39,40. These extremely low levels can be categorized into two types: those induced by statin therapy and those occurring naturally. Previous studies have shown that individuals with naturally and extremely low baseline LDL-C levels may have an increased cardiovascular risk, potentially indicating underlying malnutrition41. In the context of surgery, extremely low LDL-C levels following coronary artery bypass grafting have been identified as a risk factor for adverse cardiovascular outcomes42. However, to the best of our knowledge, the effect of extremely low LDL-C levels in the context of non-cardiac surgeries has not been addressed. In this regard, our study suggests that even in non-cardiac surgeries, extremely low preoperative LDL-C levels may be linked to an increased risk of postoperative cardiovascular complications.
In terms of stroke and low LDL-C aspects, although a causal relationship between LDL-C lowering treatment and hemorrhagic stroke remains unclear43, previous meta-analyses of 23 studies on stroke and LDL-C have reported an inverse association between LDL-C and risk of hemorrhagic stroke44; our study findings are consistent with that report. However, lowering LDL-C after stroke is an established effective secondary prevention strategy against recurrence45. It is important to emphasize that our study does not establish a causal relationship between strong reduction of LDL-C levels after stroke and the occurrence of non-cardiac postoperative cardiovascular events and mortality. An important finding of this study was the identification of a population characterized by a combination of previous stroke and extremely low LDL-C levels as potentially at high risk for postoperative complications following non-cardiac surgery. This finding may be useful for further research into preoperative detection of patients at high risk of complications after non-cardiac surgery.
Although no significant multiplicative effect modification was observed for the obtained combination of previous stroke and extremely low LDL-C conditions, results of RERI, AP and SI supported the existence of additive effect modification. The results observed in this study indicate that more attention should be directed at the presence of extremely low LDL-C in conjunction with a history of stroke, although extremely low LDL-C had a significant hazard ratio with or without a history of stroke. However, concerning the ability to detect high-risk groups, this study is limited to the internal validation of the databases in Japan; thus, external validity on larger populations needs further validation.
Additionally, the causality between those variables is unclear. The fact that the cluster with previous stroke and extremely low LDL-C had a higher proportion of known risk factors in the patient characteristic aggregation for each condition does not rule out the possibility that the correlation with known risk factors identifies a high-risk group when these two conditions are present. Future studies are needed to determine whether the risk is increased by a causal interaction of these two factors.
In this study, iRF uncovered combinations of factors with additive effect modification in clinical data. The used algorithm changed the process of testing the stability of combinations of explanatory variables by bagging to an algorithm that focuses on the statistical significance and the estimated magnitude of the incident rate. The statistical significance of the incident rate and magnitude of the point estimate were replicated in the test and training data in only one of the 10 selected combinations. While the effectiveness of combination extraction was not outstanding, significant combinations in the test data were still identified and reproduced. Although iRF was originally developed as an algorithm to extract interactions between genes from gene expression data, and such modifications were made to apply it to clinical data, more optimal conditions for modifying the algorithm may exist for clinical data.
This study has some limitations. First, the analysis is based on claim data and the analyzed information is based on recorded codes. This may lead to biases with real-life clinical phenomena. In addition, the ICD-10 codes defining major adverse cardiovascular events (MACE) are not standardized46. In particular, this study relied primarily on ICD-10 codes for diagnostic information, a widely used method in database research. While the use of ICD-10 codes in Japanese medical claims data is known to have high sensitivity, it has been reported to exhibit lower positive predictive value47. This raises the possibility of a diagnostic bias, particularly toward overdiagnosis. Therefore, future research should consider using alternative datasets and conducting prospective validation to improve the robustness of the findings. Second, the Japan Medical Data Center (JMDC) data used in this study were primarily collected from corporate health insurance associations48. Since the dataset predominantly includes middle-aged to young workers and their families, it lacks sufficient representation of older individuals aged 75 and older, as well as non-working populations, which may limit the generalizability of the findings. It is important to consider the possibility that older non-workers may exhibit different characteristics from the study’s target population, and, in particular, that employment itself may serve as an indicator of good health. Additionally, given that the data are from Japan, where a universal health insurance system is in place, further validation using data from countries with different insurance systems is essential to ensure the broader applicability of these findings. Third, each analysis performed in this study was a complete case analysis drawn from data containing missing data. As shown in Table S1 in supporting information 1, the proportion of missing blood test data derived from health checkup records was too high to reliably apply imputation methods. In this study, group identification using iRF involved defining groups through combinations of multiple variables and assessing their associated risks. Consequently, a complete case analysis was conducted at each stage of defining these variable combinations. This approach may have exacerbated selection bias inherent to the database, which primarily consists of data from corporate health insurance associations. As noted in the database limitations, caution is warranted when interpreting the findings.
By applying machine learning techniques, a combination of previous stroke and extremely low LDL-C was identified to be correlated with high risk for developing composite outcome of MACE and death after non-cardiac surgery. However, it is important to note that the study findings do not demonstrate multiplicative modification effect of prior stroke and extremely low LDL-C levels but are limited to significant additive modification effect. Furthermore, as discussed earlier, this study is subject to several limitations. To establish external validity, it would be beneficial to verify the findings within Japan using other datasets and replicate the analysis using data from countries outside Japan. Additionally, given that this is a retrospective observational study, conducting a prospective study in the future would enhance the robustness of the findings. Considering these perspectives, this finding provides useful information for risk assessment prior to non-cardiac surgery. If the findings of this study are further validated, the accuracy of screening high-risk populations during the perioperative period could be enhanced.
Methods
Study design and population
This research utilized data from the JMDC claims database, which includes medical claims and health examination records in Japan48. The JMDC database, available for purchase, contains information on around 11.6 million individuals under the age of 75 between January 2005 and April 2021. Based on the surgical codes in Japan, K-codes, we excluded codes related to cardiac surgery, supplementary procedures, and blood transfusions to ensure that only non-cardiac surgeries were included in the analysis. In cases where multiple non-cardiac surgery codes were recorded on the same day, the procedure with the highest insurance points was selected for analysis. As a result, 3,797,257 cases with recorded K-codes for non-cardiac surgery were selected for analysis. The list of K-codes analyzed is presented in the supporting information 2. From the cases with these K-codes, we extracted those where the patient was at least 18 years old at the time of surgery, surgery was performed during a hospital stay, and surgery was performed under general or spinal anesthesia. Based on the number of outcome cases, data were divided into training and test datasets by stratified 2-fold splitting. The study was designed to detect combinations of factors predicting a high-risk population using the training dataset and validate the obtained combinations using test dataset.
Ethics approval
The study was conducted in accordance with the principles of the Declaration of Helsinki. Although this study used anonymized data and was outside the scope of the guidelines requiring informed consent or opt-out procedures in Japan, it was conducted after registration with the Ethics Committee of the University of Tokyo Hospital (Approval No. 2024105NIe).
Measurements
The K-codes used for the medical claims process of surgery in Japan were used to distinguish between cardiac and non-cardiac surgery. The administration of general or spinal anesthesia was determined by whether the medical claims code was calculated on the same day. The incidence of previous ischemic heart disease was determined by whether I20–I25 of the ICD-10 codes recorded in the database were preoperatively recorded. Similarly, history of stroke was determined by the presence of ICD-10 codes I60–I64 and G459, while history of diabetes was determined by the presence of ICD-10 codes E10–E14. For drugs, the JMDC database has Anatomical Therapeutic Chemical codes available, and insulin use was determined by the presence of the code A10A. For the items in the blood sample examination results, values from health examinations within one year prior to the operation were adopted.
Outcomes
MACE was defined as the combined outcome of myocardial infarction, heart failure, stroke, cardiopulmonary arrest, and death within 30 days after surgery. Each diagnosis was determined by the recorded ICD-10 codes, with myocardial infarction defined by I21–I22, heart failure by I50 and I110, stroke by I64–I64 and G459, and cardiopulmonary arrest by I46. Information about death was obtained from the diagnostic outcome information. The maximum observation period for outcome occurrence was 30 days after surgery.
Variable selection
In addition to the various items measured during health check-ups and recorded in a database, explanatory variables used in the RCRI were obtained from the database as the first explanatory variables. The variables used in the search for combinations were selected from these variables, with Pearson’s correlation coefficients calculated a priori to ensure that no factor exhibited high correlation to avoid multicollinearity. Specifically, we examined which variables should be retained based on clinical importance for combinations that presented absolute value of Pearson’s correlation coefficient of 0.3 or more, and ultimately retained the following 10 items for later analysis: sex, age, BMI, LDL-C, AST, history of ischemic heart disease, history of congestive heart failure, history of cerebrovascular disease, preoperative treatment with insulin, and elevated-risk surgery. The plots of the Pearson’s correlation coefficients before and after variable selection are shown in Figure S2 and Figure S3.
Combination extraction with machine learning
iRF was used to extract the combinations of factors that predict a high risk of postoperative MACE and occurring death events. The previously reported iRF algorithmic processes can be divided into three steps30. First, in the process of iteratively training the random forest model with the training data, the weighting of the explanatory variables is iteratively modified using the importance of the explanatory variables to generate a set of decision trees in which important explanatory variables predominantly appear. Second, the discriminant rules in the decision tree group are mapped to binary rules for each branch. Third, the algorithm evaluates and presents the stability of the discriminant rules by means of a bagging step. In this study, the third step was modified to suit clinical data analysis. Specifically, the training data were divided into two groups according to the obtained combination of explanatory variables, the incident rates were calculated, and combinations with overlapping confidence intervals for the incident rate of the two groups were excluded from the analysis. The combinations of explanatory variables were then descendingly sorted according to the incident rate point estimates, and the top 10 pairs were adopted for analysis using the test data. For the explanatory variables included in the combination, the study optimized the Matthews correlation coefficient (MCC) using the greedy method for the training data and calculated the thresholds for each explanatory variable. The greedy method is an algorithm for solving optimization problems, in which the most profitable partial solution is selected at each stage of the calculation, with the final solution being the combination of these partial solutions. The MCC is a performance metric that yields a high score only when the classifier demonstrates strong performance across all four elements of the confusion matrix: sensitivity, specificity, accuracy, and negative predictive value49. In summary, the proposed method identifies the optimal binary values (0 or 1) for categorical variables and determines the optimal thresholds for continuous variables to stratify risk into two distinct groups based on the data. The top 10 combinations in terms of incidence rates were assessed for their capability to extract high-risk groups in the test data. Factor combinations that were able to extract high risk groups in the test and training data were finally adopted. Machine learning and threshold optimization calculations were performed using R (version 4.3.3). The iRF (version 3.0.0) was used in this study to extract combinations of explanatory variables30. During the execution of iRF, the number of weighted random forest iterations was set to 10. All other hyperparameters were configured to their default values: the number of trees grown in each iteration was 500, the depth of the random intersection trees 5, the number of random intersection trees 100, and number of children in each split of a random intersection tree 2.
Statistical analyses
Chi-square test was used for categorical variables, which are presented as numbers and percentages. While Student’s t-test were used for continuous variables, which are presented as means and standard deviations. For incidence calculations, an incidence rate of 30 days per 100 persons was calculated. The Kaplan–Meier method was used to delineate cumulative incidence plots and the log-rank test for testing. After visually confirming the proportional hazard assumption with a complementary log-log plot, the Cox proportional hazards analysis was performed to assess whether effect modification existed for the explanatory variables included in the obtained conditions. Cases with missing values for the included variables were excluded and analyzed in each of the incidence calculations, depiction of cumulative incidence plots, log-rank tests, and Cox proportional hazards analysis. The aggregate results of missing values are shown in S1 Table. The statistical analyses were performed using R (version 4.3.3; R Foundation for Statistical computing; Vienna; Austria). To perform effect modification analysis, InteractionR (version 0.1.7)50 was used. We performed a Cox proportional hazards analysis that included the two factors of LDL-C < 50.34 mg/dL and a history of stroke, and the outcome as a binary variable (0 or 1). In creating the model, a model was created that included an interaction term for the two factors of LDL-C < 50.34 mg/dL and a history of stroke, and multiplicative scale, RERI, AP, and SI were calculated based on that model. Multiplicative scale, RERI, AP, and SI were calculated using all data and reported in line with the reported recommendations31.
Data availability
The data that support the findings of this study are available from JMDC, Inc. but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the corresponding author upon reasonable request and with permission of JMDC, Inc.
References
Weiser, T. G. et al. An estimation of the global volume of surgery: A modelling strategy based on available data. Lancet 372, 139–144 (2008).
Smilowitz, N. R. & Berger, J. S. Perioperative cardiovascular risk assessment and management for noncardiac surgery: A review. JAMA 324, 279–290 (2020).
International Surgical Outcomes Study group. Global patient outcomes after elective surgery: Prospective cohort study in 27 low-, middle- and high-income countries. Br. J. Anaesth. 117, 601–609 (2016).
Pearse, R. M. et al. Mortality after surgery in Europe: A 7 day cohort study. Lancet 380, 1059–1065 (2012).
Smilowitz, N. R. et al. Perioperative major adverse cardiovascular and cerebrovascular events associated with noncardiac surgery. JAMA Cardiol. 2, 181–187 (2017).
Fleisher, L. A. et al. 2014 ACC/AHA guideline on perioperative cardiovascular evaluation and management of patients undergoing noncardiac surgery. Circulation 130, e278–e333 (2014).
Halvorsen, S. et al. ESC Guidelines on cardiovascular assessment and management of patients undergoing non-cardiac surgery: Developed by the task force for cardiovascular assessment and management of patients undergoing non-cardiac surgery of the European Society of Cardiology (ESC) Endorsed by the European Society of Anaesthesiology and Intensive Care (ESAIC). Eur. Heart J. 43, 3826–3924 (2022).
Duceppe, E. et al. Canadian cardiovascular society guidelines on perioperative cardiac risk assessment and management for patients who undergo noncardiac surgery. Can. J. Cardiol. 33, 17–32 (2017).
Hiraoka, E. et al. JCS 2022 Guideline on Perioperative Cardiovascular Assessment and Management for non-cardiac surgery. Circ. J. advpub (2023).
Goldman, L. et al. Multifactorial index of cardiac risk in noncardiac surgical procedures. N Engl. J. Med. 297, 845–850 (1977).
Detsky, A. S. et al. Predicting cardiac complications in patients undergoing non-cardiac surgery. J. Gen. Intern. Med. 1, 211–219 (1986).
Lee, T. H. et al. Derivation and prospective validation of a simple index for prediction of cardiac risk of major noncardiac surgery. Circulation 100, 1043–1049 (1999).
Gupta, P. K. et al. Development and validation of a risk calculator for prediction of cardiac risk after surgery. Circulation 124, 381–387 (2011).
Bilimoria, K. Y. et al. Development and evaluation of the Universal ACS NSQIP Surgical Risk Calculator: A decision aid and informed Consent Tool for patients and surgeons. J. Am. Coll. Surg. 217 (2013).
Dakik, H. A. et al. A new index for pre-operative cardiovascular evaluation. J. Am. Coll. Cardiol. 73, 3067–3078 (2019).
Alrezk, R. et al. Derivation and validation of a geriatric-sensitive perioperative cardiac risk index. J. Am. Heart Assoc. 6, e006648. https://doi.org/10.1161/JAHA.117.006648 (2017).
Bertges, D. J. et al. The vascular Study Group of New England Cardiac Risk Index (VSG-CRI) predicts cardiac complications more accurately than the revised Cardiac Risk Index in vascular surgery patients. J. Vasc Surg. 52, 674–683e3 (2010).
Zhang, Z., Zhang, H. & Khanal, M. Development of scoring system for risk stratification in clinical medicine: A step-by-step tutorial. Annals Translational Med. 5(2017).
An, Q., Rahman, S., Zhou, J. & Kang, J. J. A comprehensive review on machine learning in healthcare industry: Classification, restrictions, opportunities and challenges. Sensors 23, 4178 (2023).
Krittanawong, C., Zhang, H., Wang, Z., Aydar, M. & Kitai, T. Artificial intelligence in precision cardiovascular medicine. J. Am. Coll. Cardiol. 69, 2657–2664 (2017).
Chakraborty, C., Bhattacharya, M., Pal, S. & Lee, S. From machine learning to deep learning: An advances of the recent data-driven paradigm shift in medicine and healthcare. Curr. Res. Biotechnol. 100164 (2023).
Mahesh, B. Machine learning algorithms-a review. Int. J. Sci. Res. (IJSR) [Internet] 9, 381–386 (2020).
Zheng, A. & Casari, A. In Feature Engineering for Machine Learning: Principles and Techniques for data Scientists (O’Reilly Media, Inc., 2018).
Lampa, E., Lind, L., Lind, P. M. & Bornefalk-Hermansson, A. The identification of complex interactions in epidemiology and toxicology: A simulation study of boosted regression trees. Environ. Health 13, 1–17 (2014).
García-Magariños, M., López‐de‐Ullibarri, I., Cao, R. & Salas, A. Evaluating the ability of tree‐based methods and logistic regression for the detection of SNP‐SNP interaction. Ann. Hum. Genet. 73, 360–369 (2009).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Wright, M. N., Ziegler, A. & König, I. R. Do little interactions get lost in dark random forests? BMC Bioinform. 17, 1–10 (2016).
Touw, W. G. et al. Data mining in the life sciences with random forest: A walk in the park or lost in the jungle? Brief. Bioinform. 14, 315–326 (2013).
Hornung, R. & Boulesteix, A. Interaction forests: Identifying and exploiting interpretable quantitative and qualitative interaction effects. Comput. Stat. Data Anal. 171, 107460 (2022).
Basu, S., Kumbier, K., Brown, J. B. & Yu, B. Iterative random forests to discover predictive and stable high-order interactions. Proceedings of the National Academy of Sciences 115, 1943–1948 (2018).
Knol, M. J. & VanderWeele, T. J. Recommendations for presenting analyses of effect modification and interaction. Int. J. Epidemiol. 41, 514–520 (2012).
Rong, S. et al. Association of low-density lipoprotein cholesterol levels with more than 20-year risk of cardiovascular and all-cause mortality in the general population. J. Am. Heart Assoc. 11, e023690 (2022).
Peng, K., Li, X., Wang, Z., Li, M. & Yang, Y. Association of low-density lipoprotein cholesterol levels with the risk of mortality and cardiovascular events: A meta-analysis of cohort studies with 1,232,694 participants. Medicine 101, e32003 (2022).
Ference, B. A. et al. Low-density lipoproteins cause atherosclerotic cardiovascular disease. 1. Evidence from genetic, epidemiologic, and clinical studies. A consensus statement from the European Atherosclerosis Society Consensus Panel. Eur. Heart J. 38, 2459–2472 (2017).
Abdullah, S. M. et al. Long-term association of low-density lipoprotein cholesterol with cardiovascular mortality in individuals at low 10-year risk of atherosclerotic cardiovascular disease: Results from the Cooper Center Longitudinal Study. Circulation 138, 2315–2325 (2018).
Liu, Y. et al. Association between low density lipoprotein cholesterol and all-cause mortality: Results from the NHANES 1999–2014. Sci. Rep. 11, 22111 (2021).
Brunner, F. J. et al. Application of non-HDL cholesterol for population-based cardiovascular risk stratification: Results from the multinational Cardiovascular Risk Consortium. Lancet 394, 2173–2183 (2019).
Wu, M. et al. Association of low-density lipoprotein-cholesterol with all-cause and cause-specific mortality. Diabetes Metabolic Syndrome: Clin. Res. Reviews 17, 102784 (2023).
Johannesen, C. D. L., Langsted, A., Mortensen, M. B. & Nordestgaard, B. G. Association between low density lipoprotein and all cause and cause specific mortality in Denmark: Prospective cohort study. BMJ 371, m4266 (2020).
Kip, K. E., Diamond, D., Mulukutla, S. & Marroquin, O. C. Is LDL cholesterol associated with long-term mortality among primary prevention adults? A retrospective cohort study from a large healthcare system. BMJ Open. 14, e077949 (2024).
Zhao, X., Wang, D. & Qin, L. Lipid profile and prognosis in patients with coronary heart disease: A meta-analysis of prospective cohort studies. BMC Cardiovasc. Disord. 21, 1–15 (2021).
Rezaee, M. et al. The prognostic role of the low and very low baseline LDL-C level in outcomes of patients with cardiac revascularization; comparative registry-based cohort design. J. Cardiothorac. Surg. 18, 240 (2023).
Gurevitz, C., Auriel, E., Elis, A. & Kornowski, R. The association between low levels of low density lipoprotein cholesterol and intracerebral hemorrhage: Cause for concern? J. Clin. Med. 11, 536. https://doi.org/10.3390/jcm11030536 (2022).
Wang, X., Dong, Y., Qi, X., Huang, C. & Hou, L. Cholesterol levels and risk of hemorrhagic stroke: A systematic review and meta-analysis. Stroke 44, 1833–1839 (2013).
Lee, M. et al. Association between intensity of low-density lipoprotein cholesterol reduction with statin-based therapies and secondary stroke prevention: A meta-analysis of randomized clinical trials. JAMA Neurol. 79, 349–358 (2022).
Bosco, E., Hsueh, L., McConeghy, K. W., Gravenstein, S. & Saade, E. Major adverse cardiovascular event definitions used in observational analysis of administrative databases: A systematic review. BMC Med. Res. Methodol. 21, 241 (2021).
Kanaoka, K. et al. Validity of diagnostic algorithms for cardiovascular diseases in Japanese health insurance claims. Circ. J. 87, 536–542 (2023).
Nagai, K. et al. Data resource profile: JMDC claims database sourced from health insurance societies. J. Gen. Fam Med. 22, 118–127 (2021).
Chicco, D. & Jurman, G. The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification. BioData Min. 16, 4 (2023).
Alli, B. Y. InteractionR: An R package for full reporting of effect modification and interaction. Softw. Impacts 10, 100147 (2021).
Acknowledgements
The authors would like to thank all of the staff and the graduate students of the Department of Healthcare Information Management at the University of Tokyo Hospital for providing an opportunity to continue this research.
Author information
Authors and Affiliations
Contributions
Conception and design of the study: T.S, Y.K; Data curation: T.S; Analysis and interpretation of data: T.S, T.T, Y.A, H.I, K.K, K.M, M.O, Y.K; Funding acquisition: Y.K; Visualization: T.S; Writing—original draft: T.S; Writing—review and editing: T.T, Y.A, H.I, K.K, K.M, M.O, Y.K; All authors reviewed and approved the final version to be submitted.
Corresponding author
Ethics declarations
Competing interests
YK is affiliated with the Artificial Intelligence and Digital Twin Development in Healthcare, Graduate School of Medicine, The University of Tokyo which is an endowment department. However, the sponsors had no influence over the interpretation, writing, or publication of this work. TS, TT, YA, HI, KK, KM, and MO have no conflicts of interest directly relevant to the content of this article.
Financial disclosure statement
This work was supported by Cross-ministerial Strategic Innovation Promotion Program on “Integrated Health Care System” (Grant Number JPJ012425). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Seki, T., Takiguchi, T., Akagi, Y. et al. Iterative random forest-based identification of a novel population with high risk of complications post non-cardiac surgery. Sci Rep 14, 26741 (2024). https://doi.org/10.1038/s41598-024-78482-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-024-78482-4