Laboratory blood parameters and machine learning for the prognosis of esophageal squamous cell carcinoma

Background: Accurate and timely prognostic methods are needed to predict death in patients with esophageal squamous cell carcinoma (ESCC).

Objective: To develop and validate a prognostic model for ESCC patients using machine learning (ML) techniques applied to a comprehensive dataset of laboratory-derived blood parameters.

Methods: Three approaches, Gradient Boosting Machine (GBM), Random Survival Forest (RSF), and the classical Cox method, were used to develop models on a dataset of 2521 ESCC patients with 27 features. The models were evaluated with the concordance index (C-index) and time-dependent receiver operating characteristic (time-dependent ROC) curves. We used the optimal model to assess the association between features and prognosis and to stratify patients into low- and high-risk groups. Its performance was analyzed with Kaplan-Meier curves and compared with AJCC 8th edition staging. We further evaluated the overall effectiveness of the model in ESCC subgroups using risk scores and kernel density estimation (KDE) plots.

Results: RSF achieved a C-index of 0.746 and AUCs of 0.761 at three years and 0.771 at five years, a slight advantage over GBM and the classical Cox method. Subsequently, the following 14 features were selected to build the model: N stage, T stage, surgical margin, tumor length, age, number of dissected lymph nodes (LNs), MCH, Na, FIB, DBIL, CL, treatment, vascular invasion, and tumor grade. Based on this model, survival differed significantly between low-risk (3-year OS 81.8%, 5-year OS 69.8%) and high-risk (3-year OS 25.1%, 5-year OS 11.5%) patients in the training set, which was also verified in the test set (all P