An Empirical Study of Machine Learning Robustness and Scalability for Imbalanced Tabular Clinical Data in Emergency and Critical Care

arXiv:2512.21602v3

Abstract: Every year, millions of patients pass through emergency departments and intensive care units, where clinicians must make high-stakes decisions under time pressure and uncertainty. Machine learning could support prediction of deterioration, triage, and rare critical outcomes, but clinical data are often severely imbalanced, biasing models toward majority classes and reducing predictive performance. Developing robust and efficient models for imbalanced clinical tabular data therefore remains an important challenge. We evaluated six model families on imbalanced tabular data from the MIMIC-IV-ED and eICU databases: Decision Tree, Random Forest, XGBoost, TabNet, TabICL, and TabPFN v2.6. Trainable models were optimized using Bayesian hyperparameter tuning, while foundation models were evaluated in their pretrained inference regime without task-specific reweighting. Models were assessed using Macro F1-score, robustness to increasing imbalance, and computational scalability across seven clinical prediction tasks. Results differed across datasets. On MIMIC-IV-ED, TabPFN v2.6 and TabICL achieved the strongest average Macro F1 ranks, with XGBoost remaining competitive. On eICU, XGBoost consistently performed best, followed by other tree-based methods, while foundation models achieved intermediate performance. Across both datasets, TabNet showed the largest degradation under increasing imbalance and the highest computational cost. Training-time analysis showed that tree-based methods scaled most favorably with dataset size, while foundation models offered low per-task adaptation cost. These findings suggest that no single model family dominates across all clinical settings. However, tabular foundation models are narrowing the performance gap with strong classical baselines while offering a distinct efficiency-performance trade-off that may benefit resource-constrained clinical environments.
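The abstract's choice of Macro F1 as the metric for imbalanced data can be illustrated with a minimal sketch. The labels below are hypothetical (a 95:5 class split and a classifier that always predicts the majority class) and are not taken from MIMIC-IV-ED or eICU; the `macro_f1` helper is an illustrative implementation of the standard definition, not the authors' evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)

print(f"accuracy = {accuracy:.3f}")
print(f"macro F1 = {score:.3f}")
```

Because every class contributes equally to the average, the missed minority class pulls Macro F1 to roughly half of what accuracy reports in this toy case, which is why the metric suits the rare-outcome tasks the abstract describes.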
