Iterative random forest-based identification of a novel population with high risk of complications post non-cardiac surgery

Seki, Tomohisa; Takiguchi, Toru; Akagi, Yu; Ito, Hiromasa; Kubota, Kazumi; Miyake, Kana; Okada, Masafumi; Kawazoe, Yoshimasa

doi:10.1038/s41598-024-78482-4

Published: 05 November 2024

Scientific Reports volume 14, Article number: 26741 (2024)

Abstract

Assessing the risk of postoperative cardiovascular events before non-cardiac surgery is clinically important. Current risk scores for preoperative evaluation may not adequately represent small, high-risk subpopulations. Accordingly, this study applied iterative random forests to search for combinations of factors that could be clinically valuable in identifying such populations. We used the Japan Medical Data Center database, which includes claims data from Japan between January 2005 and April 2021, and employed iterative random forests to extract factor combinations that influence outcomes. The analysis showed that the combination of a prior history of stroke and extremely low LDL-C levels was associated with a high risk after non-cardiac surgery. In the test data, the incidence of major adverse cardiovascular events in the population with both a history of stroke and extremely low LDL-C levels was 15.43 events per 100 person-30 days [95% confidence interval, 6.66–30.41]. At this stage, the results show only correlation rather than causation; however, these findings may offer valuable insights for preoperative risk assessment in non-cardiac surgery.

Introduction

More than 200 million surgeries are estimated to be performed annually worldwide¹, with cardiovascular events, such as myocardial infarction and cardiac arrest, being among the most critical complications². Accurate risk assessment is essential for balancing the benefits and risks of surgery and determining the need for interventions to mitigate risks. It also plays a crucial role in scheduling surgery, optimizing hospital resources, and planning postoperative monitoring^3,4,5. Risk assessment encompasses both patient and surgical factors; known patient risk factors include poor functional status, advanced age, diabetes, and renal dysfunction². For preoperative assessment of non-cardiac surgery, simple scores combining these patient and surgical factors are used in medical practice, and international guidelines recommend their use^6,7,8,9. Numerous risk scores have been developed and validated for non-cardiac surgery^10,11,12,13,14,15,16,17. However, combinations of multiple factors that have not been established as risk factors may also identify groups at high risk of postoperative cardiovascular events, and exploratory studies of such combinations are limited. Traditionally, risk score development has relied on odds ratios from generalized linear models and hazard ratios from Cox proportional hazards models¹⁸. While the resulting scores can stratify risk, they may not capture a small high-risk population defined by a combination of factors that are not statistically significant in linear models. Identifying high-risk populations from combinations of multiple factors would therefore be valuable in the preoperative assessment of patients undergoing non-cardiac surgery, because it enables screening for small yet high-risk populations.

Recent advances in machine learning, driven by improvements in computing power, have led to numerous developments in medical research^19,20,21. A significant advantage of machine learning is its ability to automate feature extraction from data^22,23, which has the potential to identify high-risk patient populations. Detecting high-risk patient groups from combinations of multiple factors can be approached with methods for identifying interactions or effect modification. In this context, tree-based machine learning models are particularly noted for their suitability for interaction search, and algorithms have been refined to detect beneficial interactions more efficiently^24,25. Tree-based models classify cases using combinations of multiple explanatory factors through the step-by-step construction of a decision tree. To uncover effect modification, capturing complex nonlinear relationships between covariates and outcomes is essential. Random forests²⁶, which consist of a set of decision trees, are recognized as machine learning algorithms capable of uncovering interactions^27,28,29. However, even when a random forest model is fitted appropriately, considerable challenges remain, including imbalance in the data, potential biases in feature extraction, and difficulty in interpreting effect modification among variables. Consequently, extracting meaningful effect modification from a model for predicting postoperative risk is not straightforward. To address these challenges, we applied iterative random forests (iRF) to extract effect modification from highly imbalanced clinical data and to detect combinations of factors that identify high-risk populations. iRF is recognized as a method for extracting influential combinations of multiple explanatory factors³⁰. Originally developed to identify interacting gene combinations from gene expression data, iRF can be applied to clinical data for the same purpose. This study aimed to apply machine learning to identify novel combinations of features that define groups at high risk of cardiovascular events after non-cardiac surgery and that are not captured by existing prediction scores.

Results

Baseline characteristics of the study population

Using the selection flow, 616,019 surgical cases were included in the analysis. They were divided by stratified 2-fold splitting into 308,009 cases for the training dataset and 308,010 cases for the test dataset (Fig. 1; Table 1). No significant differences were observed in the explanatory variables between the training and test data, except for preoperative treatment with insulin. After explanatory variables with high Pearson’s correlation coefficients were excluded, iRF was performed using the remaining explanatory variables (sex, age, body mass index [BMI], low-density lipoprotein [LDL] cholesterol, aspartate aminotransferase [AST], history of ischemic heart disease, history of congestive heart failure, history of cerebrovascular disease, and elevated-risk surgery) (Figures S1–S3).

Table 1 Background characteristics of the study population.

Extraction of high-risk conditions based on iRF

Table 2 shows the combinations of explanatory variables identified by iRF and threshold optimization that had a high incidence in the training data, together with the results of validation on the test data. A combination of previous stroke and extremely low LDL-C (< 50.34 mg/dL) showed a high incidence that was reproduced in the test data. Cumulative incidence plots constructed using the Kaplan–Meier method showed that the population with both previous stroke and extremely low LDL-C had a significantly higher event occurrence by the log-rank test (Fig. 2). In Fig. 2, the wide confidence intervals reflect the small sample size of the target population. However, since the incidence of outcomes within 30 days exceeds 10%, this population should be considered high risk.

Table 2 Results of iterative random forest and threshold optimization.

To examine patient characteristics according to the presence or absence of previous stroke and extremely low LDL-C levels, explanatory variables were tabulated for each combination of conditions (Table 3). The group of patients with a history of stroke and extremely low LDL-C had a higher proportion of males (87.8%) and a relatively high mean age of 56.3 years. This group tended to have higher levels of triglycerides and γ-glutamyl transpeptidase than the other groups. In addition, this group tended to have higher percentages of patients with known risk factors, namely ischemic heart disease, heart failure, and insulin use for diabetes, than the other groups. Renal dysfunction is another known risk factor, and this group tended to have higher mean creatinine levels.

Table 3 Patient characteristics divided by history of stroke and extremely low LDL-C.

Validation of effect modification in a high-risk condition

One approach to evaluating the joint effect of two factors is to test for effect modification. Effect modification occurs when the effect of a particular exposure on an outcome varies depending on the value of another variable, known as the effect modifier, which may not necessarily be part of the causal pathway. To rigorously assess effect modification, it is recommended to evaluate it on both the additive and multiplicative scales³¹. On the multiplicative scale, the question is whether the combined effect of the two variables departs from the product of their individual effects; on the additive scale, the question is whether it departs from their sum. Following the recommendations, we adopted the relative excess risk due to interaction (RERI), attributable proportion due to interaction (AP), and synergy index (SI) as indices on the additive scale; these determine whether the combination of two risk factors increases the risk beyond what would be expected from their individual contributions. To validate the effect modification between a previous history of stroke and extremely low LDL-C, after visually confirming the proportional hazards assumption with complementary log-log plots (Figure S4), we examined the hazard ratios for each condition, the multiplicative scale, RERI, AP, and SI (Table 4). The multiplicative scale did not show a significant multiplicative effect between a previous history of stroke and extremely low LDL-C. However, RERI and AP were significantly greater than 0, and SI was significantly greater than 1. These results indicate the presence of additive effect modification between previous stroke and extremely low LDL-C.
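The additive-scale indices named above have standard closed forms in terms of three hazard ratios measured against the doubly unexposed group. The following Python sketch illustrates those formulas only; it is not the authors' code (the analysis itself was run in R), the function name is hypothetical, and the input hazard ratios are made-up values rather than the estimates in Table 4. Confidence intervals, which the study also reports, are omitted.

```python
def additive_interaction(hr10: float, hr01: float, hr11: float) -> dict:
    """Interaction measures from three hazard ratios.

    hr10: exposure A only, hr01: exposure B only, hr11: both exposures,
    each relative to the group with neither exposure (HR = 1).
    """
    reri = hr11 - hr10 - hr01 + 1.0                     # relative excess risk due to interaction
    ap = reri / hr11                                    # attributable proportion due to interaction
    si = (hr11 - 1.0) / ((hr10 - 1.0) + (hr01 - 1.0))   # synergy index
    ratio = hr11 / (hr10 * hr01)                        # multiplicative-scale measure
    return {"RERI": reri, "AP": ap, "SI": si, "multiplicative": ratio}


# Hypothetical hazard ratios chosen for illustration, NOT taken from Table 4.
result = additive_interaction(hr10=2.0, hr01=3.0, hr11=6.0)
print(result)
```

With these illustrative inputs, RERI and AP exceed 0 and SI exceeds 1 while the multiplicative measure equals 1, which is the same qualitative pattern the text describes: additive effect modification without multiplicative effect modification.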

Table 4 Effect modification analysis of history of stroke and extremely low LDL-C.

Discussion

In this study, machine learning was used to extract combinations of explanatory variables to identify novel high-risk conditions associated with the composite outcome of cardiovascular events and death following non-cardiac surgery. The combination of prior stroke and extremely low LDL-C levels was consistently high risk in both the training and test datasets, although it was validated in a small patient population. Because the sample size of the identified group was very small and the confidence interval of the cumulative incidence plot was wide, reproducibility must be verified using different data in the future. Existing risk scores, such as the Revised Cardiac Risk Index (RCRI)¹² and Cardiovascular Risk Index¹⁵, do not incorporate LDL-C as a parameter. The results of this study suggest that high-risk individuals may be identified by searching for cases with extremely low LDL-C and a history of stroke among populations classified as low risk by these screening scores. Within the scope of this analysis, it is possible that patients with both a history of stroke and extremely low LDL-C levels had a higher prevalence of known perioperative risk factors, so that the combination identifies an already recognized high-risk group. However, the results indicate additive effect modification between prior stroke and extremely low LDL-C levels, as supported by RERI, AP, and SI.

Previous stroke is a known perioperative risk factor; however, the effect of extremely low LDL-C levels on the risk of cardiovascular events and death is debatable. The relationship between LDL-C levels and the risk of cardiovascular events or mortality may follow a U-shaped pattern rather than a simple linear relationship^32,33. High LDL-C is a well-established contributor to the progression of atherosclerotic diseases. From the perspective of cardiovascular disease prevention, it is widely accepted that lower LDL-C levels are generally beneficial³⁴. Consequently, elevated LDL-C levels are typically considered a significant risk factor for cardiovascular disease^34,35,36,37. However, there is also evidence suggesting that extremely low LDL-C levels, such as 70 mg/dL or lower, increase the risk of cardiovascular events^32,36,38,39,40. These extremely low levels can be categorized into two types: those induced by statin therapy and those occurring naturally. Previous studies have shown that individuals with naturally occurring, extremely low baseline LDL-C levels may have an increased cardiovascular risk, potentially reflecting underlying malnutrition⁴¹. In the context of surgery, extremely low LDL-C levels following coronary artery bypass grafting have been identified as a risk factor for adverse cardiovascular outcomes⁴². However, to the best of our knowledge, the effect of extremely low LDL-C levels in the context of non-cardiac surgery has not been addressed. In this regard, our study suggests that even in non-cardiac surgery, extremely low preoperative LDL-C levels may be linked to an increased risk of postoperative cardiovascular complications.

Regarding stroke and low LDL-C, although a causal relationship between LDL-C-lowering treatment and hemorrhagic stroke remains unclear⁴³, a previous meta-analysis of 23 studies on stroke and LDL-C reported an inverse association between LDL-C and the risk of hemorrhagic stroke⁴⁴; our findings are consistent with that report. However, lowering LDL-C after stroke is an established, effective secondary prevention strategy against recurrence⁴⁵. It is important to emphasize that our study does not establish a causal relationship between intensive reduction of LDL-C levels after stroke and the occurrence of cardiovascular events and mortality after non-cardiac surgery. An important finding of this study was the identification of a population characterized by a combination of previous stroke and extremely low LDL-C levels as potentially at high risk for postoperative complications following non-cardiac surgery. This finding may be useful for further research into preoperative detection of patients at high risk of complications after non-cardiac surgery.

Although no significant multiplicative effect modification was observed for the combination of previous stroke and extremely low LDL-C, the RERI, AP, and SI results supported the existence of additive effect modification. Although extremely low LDL-C had a significant hazard ratio with or without a history of stroke, the results indicate that more attention should be directed at extremely low LDL-C when it occurs in conjunction with a history of stroke. However, the ability to detect high-risk groups was assessed only by internal validation within a Japanese database; external validity in larger populations remains to be established.

Additionally, the causal relationship between these variables is unclear. Because the cluster with previous stroke and extremely low LDL-C had a higher proportion of known risk factors in the tabulation of patient characteristics for each condition, we cannot rule out the possibility that the two conditions identify a high-risk group through their correlation with those known risk factors. Future studies are needed to determine whether the risk is increased by a causal interaction of these two factors.

In this study, iRF uncovered combinations of factors with additive effect modification in clinical data. We modified the algorithm by replacing the bagging-based test of the stability of explanatory-variable combinations with a procedure that focuses on the statistical significance and estimated magnitude of the incidence rate. The statistical significance of the incidence rate and the magnitude of its point estimate were replicated in both the training and test data for only one of the 10 selected combinations. Thus, although the yield of combination extraction was not high, a combination that remained significant in the test data was identified and reproduced. iRF was originally developed to extract interactions between genes from gene expression data, and the modifications described here were made to apply it to clinical data; more suitable modifications of the algorithm for clinical data may exist.

This study has some limitations. First, the analysis is based on claims data, and the analyzed information is based on recorded codes, which may not accurately reflect real-life clinical events. In addition, the ICD-10 codes defining major adverse cardiovascular events (MACE) are not standardized⁴⁶. In particular, this study relied primarily on ICD-10 codes for diagnostic information, a widely used method in database research. While ICD-10 codes in Japanese medical claims data are known to have high sensitivity, they have been reported to have lower positive predictive value⁴⁷. This raises the possibility of diagnostic bias, particularly toward overdiagnosis. Therefore, future research should consider using alternative datasets and conducting prospective validation to improve the robustness of the findings. Second, the Japan Medical Data Center (JMDC) data used in this study were primarily collected from corporate health insurance associations⁴⁸. Since the dataset predominantly includes young to middle-aged workers and their families, it lacks sufficient representation of individuals aged 75 and older, as well as non-working populations, which may limit the generalizability of the findings. Older non-workers may differ from the study’s target population; in particular, employment itself may serve as an indicator of good health. Additionally, given that the data are from Japan, where a universal health insurance system is in place, further validation using data from countries with different insurance systems is essential to ensure the broader applicability of these findings. Third, each analysis performed in this study was a complete case analysis drawn from data containing missing values. As shown in Table S1 in supporting information 1, the proportion of missing blood test data derived from health checkup records was too high to reliably apply imputation methods. In this study, group identification using iRF involved defining groups through combinations of multiple variables and assessing their associated risks. Consequently, a complete case analysis was conducted at each stage of defining these variable combinations. This approach may have exacerbated the selection bias inherent to the database, which primarily consists of data from corporate health insurance associations. As with the database limitations noted above, caution is warranted when interpreting the findings.

By applying machine learning techniques, a combination of previous stroke and extremely low LDL-C was found to be correlated with a high risk of the composite outcome of MACE and death after non-cardiac surgery. However, it is important to note that the findings do not demonstrate multiplicative effect modification between prior stroke and extremely low LDL-C levels; they are limited to significant additive effect modification. Furthermore, as discussed earlier, this study is subject to several limitations. To establish external validity, it would be beneficial to verify the findings within Japan using other datasets and to replicate the analysis using data from countries outside Japan. Additionally, given that this is a retrospective observational study, conducting a prospective study in the future would enhance the robustness of the findings. With these caveats, this finding provides useful information for risk assessment prior to non-cardiac surgery. If the findings of this study are further validated, the accuracy of screening for high-risk populations during the perioperative period could be enhanced.

Methods

Study design and population

This research utilized data from the JMDC claims database, which includes medical claims and health examination records in Japan⁴⁸. The JMDC database, available for purchase, contains information on around 11.6 million individuals under the age of 75 between January 2005 and April 2021. Using the Japanese surgical procedure codes (K-codes), we excluded codes related to cardiac surgery, supplementary procedures, and blood transfusions to ensure that only non-cardiac surgeries were included in the analysis. When multiple non-cardiac surgery codes were recorded on the same day, the procedure with the highest insurance points was selected for analysis. As a result, 3,797,257 cases with recorded K-codes for non-cardiac surgery were selected. The list of K-codes analyzed is presented in supporting information 2. From these cases, we extracted those in which the patient was at least 18 years old at the time of surgery, surgery was performed during a hospital stay, and surgery was performed under general or spinal anesthesia. The data were divided into training and test datasets by stratified 2-fold splitting based on the number of outcome cases. The study was designed to detect combinations of factors predicting a high-risk population using the training dataset and to validate the obtained combinations using the test dataset.
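The stratified 2-fold split described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name, the seed, and the toy outcome vector are hypothetical, and the authors' actual splitting code (run in R) is not given in the source.

```python
import random


def stratified_two_fold(outcomes, seed=0):
    """Split case indices into two halves while keeping the outcome
    proportion as equal as possible in both halves."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(outcomes)):
        # Shuffle the cases of one outcome class, then cut them in half.
        idx = [i for i, y in enumerate(outcomes) if y == label]
        rng.shuffle(idx)
        half = len(idx) // 2
        train.extend(idx[:half])
        test.extend(idx[half:])
    return sorted(train), sorted(test)


# Toy data: 10 events among 1,000 cases (hypothetical, highly imbalanced).
outcomes = [1] * 10 + [0] * 990
train_idx, test_idx = stratified_two_fold(outcomes)
print(len(train_idx), len(test_idx), sum(outcomes[i] for i in train_idx))
```

Splitting within each outcome class, rather than over the pooled cases, is what keeps a rare outcome from landing disproportionately in one half.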

Ethics approval

The study was conducted in accordance with the principles of the Declaration of Helsinki. Although this study used anonymized data and was outside the scope of the guidelines requiring informed consent or opt-out procedures in Japan, it was conducted after registration with the Ethics Committee of the University of Tokyo Hospital (Approval No. 2024105NIe).

Measurements

The K-codes used for medical claims for surgery in Japan were used to distinguish between cardiac and non-cardiac surgery. The administration of general or spinal anesthesia was determined by whether the corresponding medical claims code was claimed on the same day as the surgery. A history of ischemic heart disease was determined by whether ICD-10 codes I20–I25 were recorded in the database before surgery. Similarly, history of stroke was determined by the presence of ICD-10 codes I60–I64 and G459, while history of diabetes was determined by the presence of ICD-10 codes E10–E14. For drugs, the JMDC database has Anatomical Therapeutic Chemical codes available, and insulin use was determined by the presence of the code A10A. For blood test results, values from health examinations within one year prior to the operation were adopted.

Outcomes

MACE was defined as the combined outcome of myocardial infarction, heart failure, stroke, cardiopulmonary arrest, and death within 30 days after surgery. Each diagnosis was determined by the recorded ICD-10 codes, with myocardial infarction defined by I21–I22, heart failure by I50 and I110, stroke by I60–I64 and G459, and cardiopulmonary arrest by I46. Information about death was obtained from the diagnostic outcome information. The maximum observation period for outcome occurrence was 30 days after surgery.
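As an illustration of this outcome definition, the Python sketch below maps recorded ICD-10 codes to MACE components by prefix. Several points are assumptions rather than details from the source: the stroke range is taken as I60–I64 plus G459, matching the stroke history definition in the Measurements section; codes are assumed to be already restricted to the 30 days after surgery; and the function and dictionary names are hypothetical.

```python
# Prefixes per component; codes are compared after removing the dot.
MACE_PREFIXES = {
    "myocardial_infarction": ("I21", "I22"),
    "heart_failure": ("I50", "I110"),
    "stroke": ("I60", "I61", "I62", "I63", "I64", "G459"),  # assumed range I60-I64
    "cardiopulmonary_arrest": ("I46",),
}


def mace_components(icd10_codes):
    """Return the set of MACE components matched by a list of ICD-10 codes."""
    normalized = [code.replace(".", "").upper() for code in icd10_codes]
    return {
        name
        for name, prefixes in MACE_PREFIXES.items()
        if any(code.startswith(prefixes) for code in normalized)
    }


def has_mace(icd10_codes, died_within_30_days=False):
    """Composite outcome: any matched component, or death within 30 days."""
    return died_within_30_days or bool(mace_components(icd10_codes))


print(mace_components(["I21.0", "K35.8"]))
print(has_mace(["K35.8"]), has_mace(["G45.9"]), has_mace([], died_within_30_days=True))
```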

Variable selection

In addition to the items measured during health check-ups and recorded in the database, the explanatory variables used in the RCRI were obtained from the database as the initial candidate explanatory variables. The variables used in the search for combinations were selected from these candidates, with Pearson’s correlation coefficients calculated a priori so that no retained pair of variables was highly correlated, to avoid multicollinearity. Specifically, for pairs with an absolute Pearson’s correlation coefficient of 0.3 or more, we decided which variable to retain based on clinical importance, and ultimately retained the following 10 items for later analysis: sex, age, BMI, LDL-C, AST, history of ischemic heart disease, history of congestive heart failure, history of cerebrovascular disease, preoperative treatment with insulin, and elevated-risk surgery. The plots of the Pearson’s correlation coefficients before and after variable selection are shown in Figure S2 and Figure S3.
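The screening step above, which flags variable pairs with an absolute Pearson's correlation coefficient of 0.3 or more, can be sketched in Python as follows. Only the flagging is automated here; in the study the choice of which variable to keep was made on clinical importance. The function names, column names, and toy values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.3):
    """List variable pairs whose |r| meets or exceeds the threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Toy columns (hypothetical values, not study data).
columns = {
    "ldl_c": [100, 120, 140, 160, 180],
    "total_cholesterol": [170, 195, 210, 240, 260],
    "age": [40, 50, 30, 50, 40],
}
flagged = correlated_pairs(columns)
print(flagged)
```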

Combination extraction with machine learning

iRF was used to extract the combinations of factors that predict a high risk of postoperative MACE and death. The previously reported iRF algorithm can be divided into three steps³⁰. First, a random forest model is trained iteratively on the training data, with the weights of the explanatory variables updated at each iteration according to their importance, to generate a set of decision trees in which important explanatory variables predominantly appear. Second, the decision rules in this set of decision trees are mapped to binary rules for each branch. Third, the algorithm evaluates and reports the stability of the decision rules by means of a bagging step. In this study, the third step was modified to suit clinical data analysis. Specifically, the training data were divided into two groups according to each obtained combination of explanatory variables, the incidence rates were calculated, and combinations for which the confidence intervals of the incidence rates of the two groups overlapped were excluded from the analysis. The remaining combinations were then sorted in descending order of the incidence rate point estimate, and the top 10 were adopted for analysis using the test data. For the explanatory variables included in each combination, thresholds were calculated by optimizing the Matthews correlation coefficient (MCC) on the training data using the greedy method. The greedy method is an algorithm for solving optimization problems in which the most profitable partial solution is selected at each stage of the calculation, with the final solution being the combination of these partial solutions. The MCC is a performance metric that yields a high score only when the classifier performs well on all four basic rates derived from the confusion matrix: sensitivity, specificity, precision, and negative predictive value⁴⁹. In summary, the proposed method identifies the optimal binary values (0 or 1) for categorical variables and determines the optimal thresholds for continuous variables to stratify risk into two distinct groups based on the data. The top 10 combinations in terms of incidence rates were assessed for their ability to extract high-risk groups in the test data. Factor combinations that extracted high-risk groups in both the training and test data were finally adopted. Machine learning and threshold optimization calculations were performed using R (version 4.3.3). The iRF package (version 3.0.0) was used to extract combinations of explanatory variables³⁰. During the execution of iRF, the number of weighted random forest iterations was set to 10. All other hyperparameters were left at their default values: the number of trees grown in each iteration was 500, the depth of the random intersection trees was 5, the number of random intersection trees was 100, and the number of children in each split of a random intersection tree was 2.
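To make the threshold optimization more concrete, the Python sketch below computes the MCC from the confusion matrix and performs a single greedy step: with a binary condition held fixed, it scans the observed values of one continuous variable and keeps the cut point with the highest MCC. This is a simplified illustration and not the authors' R implementation; the rule form (`flag and value < cut`), the function names, and the toy data are assumptions.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_upper_threshold(values, flags, y_true):
    """One greedy step: choose the cut maximizing MCC for the rule
    `flag is set and value < cut`, with the binary flag held fixed."""
    best_cut, best_score = None, -1.0
    for cut in sorted(set(values)):
        y_pred = [bool(f) and v < cut for v, f in zip(values, flags)]
        score = mcc(y_true, y_pred)
        if score > best_score:  # ties keep the smallest cut
            best_cut, best_score = cut, score
    return best_cut, best_score


# Toy data (hypothetical): LDL-C values, stroke history flags, outcomes.
ldl = [40, 45, 60, 90, 120, 150, 48, 110]
stroke = [1, 1, 1, 1, 0, 0, 0, 1]
event = [1, 1, 0, 0, 0, 0, 0, 0]
cut, score = best_upper_threshold(ldl, stroke, event)
print(cut, score)
```

In a full greedy search over a multi-variable combination, a step like this would be repeated for each variable in turn, fixing each chosen threshold before moving to the next.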

Statistical analyses

The chi-square test was used for categorical variables, which are presented as numbers and percentages, and Student’s t-test was used for continuous variables, which are presented as means and standard deviations. Incidence was expressed as the number of events per 100 person-30 days. The Kaplan–Meier method was used to draw cumulative incidence plots, and the log-rank test was used to compare them. After visually confirming the proportional hazards assumption with a complementary log-log plot, Cox proportional hazards analysis was performed to assess whether effect modification existed for the explanatory variables included in the obtained conditions. Cases with missing values for the included variables were excluded separately for each of the incidence calculations, cumulative incidence plots, log-rank tests, and Cox proportional hazards analyses. The aggregate results for missing values are shown in Table S1. The statistical analyses were performed using R (version 4.3.3; R Foundation for Statistical Computing, Vienna, Austria). To perform the effect modification analysis, InteractionR (version 0.1.7)⁵⁰ was used. We fitted a Cox proportional hazards model that included the two factors LDL-C < 50.34 mg/dL and a history of stroke, together with their interaction term, with the outcome as a binary variable (0 or 1); the multiplicative scale, RERI, AP, and SI were calculated from this model. These measures were calculated using all data and reported in line with published recommendations³¹.
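The incidence measure and the confidence-interval overlap check used to screen combinations can be sketched in Python as follows. The source does not state how the confidence intervals were computed; this sketch assumes a log-scale Wald interval for a Poisson rate, which is only one of several common choices. The function names and the event and person-time figures are hypothetical.

```python
import math


def incidence_per_100_person_30_days(events, person_days, z=1.96):
    """Return (rate, lower, upper) per 100 person-30 days.

    Uses an approximate log-scale Wald interval (an assumption);
    requires at least one event and positive person-time.
    """
    if events <= 0 or person_days <= 0:
        raise ValueError("events and person_days must be positive")
    rate = events / person_days * 100 * 30
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


def intervals_overlap(a, b):
    """True if the closed intervals a=(lo, hi) and b=(lo, hi) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical counts, not study data.
high = incidence_per_100_person_30_days(events=8, person_days=1500)
rest = incidence_per_100_person_30_days(events=300, person_days=9_000_000)
keep_combination = not intervals_overlap(high[1:], rest[1:])
print(high, rest, keep_combination)
```

A combination would be retained for ranking only when the two intervals do not overlap, mirroring the exclusion rule described in the Methods.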

Data availability

The data that support the findings of this study are available from JMDC, Inc. but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the corresponding author upon reasonable request and with permission of JMDC, Inc.

References

Weiser, T. G. et al. An estimation of the global volume of surgery: A modelling strategy based on available data. Lancet 372, 139–144 (2008).
Article PubMed Google Scholar
Smilowitz, N. R. & Berger, J. S. Perioperative cardiovascular risk assessment and management for noncardiac surgery: A review. JAMA 324, 279–290 (2020).
Article PubMed Google Scholar
International Surgical Outcomes Study group. Global patient outcomes after elective surgery: Prospective cohort study in 27 low-, middle- and high-income countries. Br. J. Anaesth. 117, 601–609 (2016).
Article Google Scholar
Pearse, R. M. et al. Mortality after surgery in Europe: A 7 day cohort study. Lancet 380, 1059–1065 (2012).
Article PubMed PubMed Central Google Scholar
Smilowitz, N. R. et al. Perioperative major adverse cardiovascular and cerebrovascular events associated with noncardiac surgery. JAMA Cardiol. 2, 181–187 (2017).
Article PubMed PubMed Central Google Scholar
Fleisher, L. A. et al. 2014 ACC/AHA guideline on perioperative cardiovascular evaluation and management of patients undergoing noncardiac surgery. Circulation 130, e278–e333 (2014).
PubMed Google Scholar
Halvorsen, S. et al. ESC Guidelines on cardiovascular assessment and management of patients undergoing non-cardiac surgery: Developed by the task force for cardiovascular assessment and management of patients undergoing non-cardiac surgery of the European Society of Cardiology (ESC) Endorsed by the European Society of Anaesthesiology and Intensive Care (ESAIC). Eur. Heart J. 43, 3826–3924 (2022).
Duceppe, E. et al. Canadian cardiovascular society guidelines on perioperative cardiac risk assessment and management for patients who undergo noncardiac surgery. Can. J. Cardiol. 33, 17–32 (2017).
Hiraoka, E. et al. JCS 2022 Guideline on Perioperative Cardiovascular Assessment and Management for non-cardiac surgery. Circ. J. advpub (2023).
Goldman, L. et al. Multifactorial index of cardiac risk in noncardiac surgical procedures. N Engl. J. Med. 297, 845–850 (1977).
Detsky, A. S. et al. Predicting cardiac complications in patients undergoing non-cardiac surgery. J. Gen. Intern. Med. 1, 211–219 (1986).
Lee, T. H. et al. Derivation and prospective validation of a simple index for prediction of cardiac risk of major noncardiac surgery. Circulation 100, 1043–1049 (1999).
Gupta, P. K. et al. Development and validation of a risk calculator for prediction of cardiac risk after surgery. Circulation 124, 381–387 (2011).
Bilimoria, K. Y. et al. Development and evaluation of the Universal ACS NSQIP Surgical Risk Calculator: A decision aid and informed Consent Tool for patients and surgeons. J. Am. Coll. Surg. 217 (2013).
Dakik, H. A. et al. A new index for pre-operative cardiovascular evaluation. J. Am. Coll. Cardiol. 73, 3067–3078 (2019).
Alrezk, R. et al. Derivation and validation of a geriatric-sensitive perioperative cardiac risk index. J. Am. Heart Assoc. 6, e006648. https://doi.org/10.1161/JAHA.117.006648 (2017).
Bertges, D. J. et al. The vascular Study Group of New England Cardiac Risk Index (VSG-CRI) predicts cardiac complications more accurately than the revised Cardiac Risk Index in vascular surgery patients. J. Vasc Surg. 52, 674–683e3 (2010).
Zhang, Z., Zhang, H. & Khanal, M. Development of scoring system for risk stratification in clinical medicine: A step-by-step tutorial. Annals Translational Med. 5 (2017).
An, Q., Rahman, S., Zhou, J. & Kang, J. J. A comprehensive review on machine learning in healthcare industry: Classification, restrictions, opportunities and challenges. Sensors 23, 4178 (2023).
Krittanawong, C., Zhang, H., Wang, Z., Aydar, M. & Kitai, T. Artificial intelligence in precision cardiovascular medicine. J. Am. Coll. Cardiol. 69, 2657–2664 (2017).
Chakraborty, C., Bhattacharya, M., Pal, S. & Lee, S. From machine learning to deep learning: An advances of the recent data-driven paradigm shift in medicine and healthcare. Curr. Res. Biotechnol. 100164 (2023).
Mahesh, B. Machine learning algorithms-a review. Int. J. Sci. Res. (IJSR) [Internet] 9, 381–386 (2020).
Zheng, A. & Casari, A. In Feature Engineering for Machine Learning: Principles and Techniques for data Scientists (O’Reilly Media, Inc., 2018).
Lampa, E., Lind, L., Lind, P. M. & Bornefalk-Hermansson, A. The identification of complex interactions in epidemiology and toxicology: A simulation study of boosted regression trees. Environ. Health 13, 1–17 (2014).
García-Magariños, M., López‐de‐Ullibarri, I., Cao, R. & Salas, A. Evaluating the ability of tree‐based methods and logistic regression for the detection of SNP‐SNP interaction. Ann. Hum. Genet. 73, 360–369 (2009).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Wright, M. N., Ziegler, A. & König, I. R. Do little interactions get lost in dark random forests? BMC Bioinform. 17, 1–10 (2016).
Touw, W. G. et al. Data mining in the life sciences with random forest: A walk in the park or lost in the jungle? Brief. Bioinform. 14, 315–326 (2013).
Hornung, R. & Boulesteix, A. Interaction forests: Identifying and exploiting interpretable quantitative and qualitative interaction effects. Comput. Stat. Data Anal. 171, 107460 (2022).
Basu, S., Kumbier, K., Brown, J. B. & Yu, B. Iterative random forests to discover predictive and stable high-order interactions. Proceedings of the National Academy of Sciences 115, 1943–1948 (2018).
Knol, M. J. & VanderWeele, T. J. Recommendations for presenting analyses of effect modification and interaction. Int. J. Epidemiol. 41, 514–520 (2012).
Rong, S. et al. Association of low-density lipoprotein cholesterol levels with more than 20-year risk of cardiovascular and all-cause mortality in the general population. J. Am. Heart Assoc. 11, e023690 (2022).
Peng, K., Li, X., Wang, Z., Li, M. & Yang, Y. Association of low-density lipoprotein cholesterol levels with the risk of mortality and cardiovascular events: A meta-analysis of cohort studies with 1,232,694 participants. Medicine 101, e32003 (2022).
Ference, B. A. et al. Low-density lipoproteins cause atherosclerotic cardiovascular disease. 1. Evidence from genetic, epidemiologic, and clinical studies. A consensus statement from the European Atherosclerosis Society Consensus Panel. Eur. Heart J. 38, 2459–2472 (2017).
Abdullah, S. M. et al. Long-term association of low-density lipoprotein cholesterol with cardiovascular mortality in individuals at low 10-year risk of atherosclerotic cardiovascular disease: Results from the Cooper Center Longitudinal Study. Circulation 138, 2315–2325 (2018).
Liu, Y. et al. Association between low density lipoprotein cholesterol and all-cause mortality: Results from the NHANES 1999–2014. Sci. Rep. 11, 22111 (2021).
Brunner, F. J. et al. Application of non-HDL cholesterol for population-based cardiovascular risk stratification: Results from the multinational Cardiovascular Risk Consortium. Lancet 394, 2173–2183 (2019).
Wu, M. et al. Association of low-density lipoprotein-cholesterol with all-cause and cause-specific mortality. Diabetes Metabolic Syndrome: Clin. Res. Reviews 17, 102784 (2023).
Johannesen, C. D. L., Langsted, A., Mortensen, M. B. & Nordestgaard, B. G. Association between low density lipoprotein and all cause and cause specific mortality in Denmark: Prospective cohort study. BMJ 371, m4266 (2020).
Kip, K. E., Diamond, D., Mulukutla, S. & Marroquin, O. C. Is LDL cholesterol associated with long-term mortality among primary prevention adults? A retrospective cohort study from a large healthcare system. BMJ Open. 14, e077949 (2024).
Zhao, X., Wang, D. & Qin, L. Lipid profile and prognosis in patients with coronary heart disease: A meta-analysis of prospective cohort studies. BMC Cardiovasc. Disord. 21, 1–15 (2021).
Rezaee, M. et al. The prognostic role of the low and very low baseline LDL-C level in outcomes of patients with cardiac revascularization; comparative registry-based cohort design. J. Cardiothorac. Surg. 18, 240 (2023).
Gurevitz, C., Auriel, E., Elis, A. & Kornowski, R. The association between low levels of low density lipoprotein cholesterol and intracerebral hemorrhage: Cause for concern? J. Clin. Med. 11, 536. https://doi.org/10.3390/jcm11030536 (2022).
Wang, X., Dong, Y., Qi, X., Huang, C. & Hou, L. Cholesterol levels and risk of hemorrhagic stroke: A systematic review and meta-analysis. Stroke 44, 1833–1839 (2013).
Lee, M. et al. Association between intensity of low-density lipoprotein cholesterol reduction with statin-based therapies and secondary stroke prevention: A meta-analysis of randomized clinical trials. JAMA Neurol. 79, 349–358 (2022).
Bosco, E., Hsueh, L., McConeghy, K. W., Gravenstein, S. & Saade, E. Major adverse cardiovascular event definitions used in observational analysis of administrative databases: A systematic review. BMC Med. Res. Methodol. 21, 241 (2021).
Kanaoka, K. et al. Validity of diagnostic algorithms for cardiovascular diseases in Japanese health insurance claims. Circ. J. 87, 536–542 (2023).
Nagai, K. et al. Data resource profile: JMDC claims database sourced from health insurance societies. J. Gen. Fam Med. 22, 118–127 (2021).
Chicco, D. & Jurman, G. The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification. BioData Min. 16, 4 (2023).
Alli, B. Y. InteractionR: An R package for full reporting of effect modification and interaction. Softw. Impacts 10, 100147 (2021).

Acknowledgements

The authors would like to thank all of the staff and the graduate students of the Department of Healthcare Information Management at the University of Tokyo Hospital for providing an opportunity to continue this research.

Author information

Authors and Affiliations

Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan
Tomohisa Seki, Toru Takiguchi, Hiromasa Ito, Kazumi Kubota, Kana Miyake, Masafumi Okada & Yoshimasa Kawazoe
Artificial Intelligence and Digital Twin in Healthcare, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
Yoshimasa Kawazoe
Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
Yu Akagi

Contributions

Conception and design of the study: T.S, Y.K; Data curation: T.S; Analysis and interpretation of data: T.S, T.T, Y.A, H.I, K.K, K.M, M.O, Y.K; Funding acquisition: Y.K; Visualization: T.S; Writing—original draft: T.S; Writing—review and editing: T.T, Y.A, H.I, K.K, K.M, M.O, Y.K; All authors reviewed and approved the final version to be submitted.

Corresponding author

Correspondence to Tomohisa Seki.

Ethics declarations

Competing interests

YK is affiliated with the Artificial Intelligence and Digital Twin Development in Healthcare, Graduate School of Medicine, The University of Tokyo which is an endowment department. However, the sponsors had no influence over the interpretation, writing, or publication of this work. TS, TT, YA, HI, KK, KM, and MO have no conflicts of interest directly relevant to the content of this article.

Financial disclosure statement

This work was supported by Cross-ministerial Strategic Innovation Promotion Program on “Integrated Health Care System” (Grant Number JPJ012425). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Seki, T., Takiguchi, T., Akagi, Y. et al. Iterative random forest-based identification of a novel population with high risk of complications post non-cardiac surgery. Sci Rep 14, 26741 (2024). https://doi.org/10.1038/s41598-024-78482-4

Received: 08 August 2024
Accepted: 31 October 2024
Published: 05 November 2024
DOI: https://doi.org/10.1038/s41598-024-78482-4
