Machine learning for Alzheimer's disease progression under extreme class imbalance.

Patrick O Akinwumi, Meihua Qian, Taiwo A Olorunsogbon, Chaoyi Zhou, Siyu Huang

BACKGROUND: Timely identification of individuals at risk for Alzheimer's disease (AD) progression remains a major clinical challenge. Traditional cognitive assessments provide limited prognostic insight, while many machine learning (ML) models rely on costly biomarkers or poorly interpretable algorithms that limit clinical scalability. This study evaluated whether widely available baseline demographic, clinical, and cognitive measures could support short-term progression prediction using interpretable ML methods under extreme class imbalance. METHODS: We analyzed 3,240 participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI), of whom 2,423 had valid 24-month follow-up data. The primary outcome was strict unidirectional diagnostic worsening within 24 months (13 events; 0.5%). Baseline demographic, clinical, and cognitive variables were used to train XGBoost and logistic regression models under natural class imbalance using stratified k-fold cross-validation with out-of-fold predictions. Model performance was evaluated using AUROC, area under the precision-recall curve (AUPRC), calibration analyses, and bootstrap confidence intervals. Sensitivity analyses evaluated cost-sensitive learning, threshold optimization, and alternative imputation strategies (KNN and MICE). Longitudinal mixed-effects modeling was conducted separately to characterize cognitive decline and was not used as input to the predictive models. SHAP (Shapley Additive Explanations) quantified feature contributions. RESULTS: Under natural class imbalance, XGBoost achieved AUROC = 0.912 and AUPRC = 0.051, while logistic regression achieved AUROC = 0.787 and AUPRC = 0.038. Although AUPRC exceeded the baseline event prevalence (0.5%), precision remained low and threshold optimization produced substantial false-positive burdens, limiting immediate clinical applicability. Cost-sensitive learning did not materially improve performance.
MICE imputation produced results comparable to median imputation, whereas KNN imputation reduced performance. SHAP analyses identified baseline cognitive severity, functional measures, and diagnostic status as dominant predictors. Mixed-effects modeling confirmed significant cognitive decline over time. CONCLUSION: Accessible baseline clinical and cognitive variables contain measurable but limited predictive signal for short-term AD progression under extreme event scarcity. These findings should be interpreted as an early-stage proof-of-concept rather than a clinically deployable decision-support tool. External validation remains necessary before clinical translation.
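
The evaluation design described in METHODS (stratified k-fold cross-validation with out-of-fold predictions, AUROC, AUPRC, and bootstrap confidence intervals) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic (only the sample size of 2,423 and the 13 events are taken from the abstract), only the logistic regression arm is shown, and the function names, the five folds, the 200 resamples, and the percentile bootstrap are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def out_of_fold_scores(X, y, n_splits=5, seed=0):
    """Return one held-out predicted probability per participant."""
    # Scaling lives inside the pipeline so it is refit on each training fold.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    # Stratification keeps the rare events spread across all folds.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")
    return proba[:, 1]


def bootstrap_ci(y, scores, metric, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric of pooled out-of-fold scores."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)
        # Skip resamples that contain a single class; the metrics are undefined there.
        if y[idx].min() == y[idx].max():
            continue
        stats.append(metric(y[idx], scores[idx]))
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Synthetic stand-in for the cohort: 2,423 rows, 13 events (about 0.5%).
rng = np.random.default_rng(0)
n, n_events = 2423, 13
y = np.zeros(n, dtype=int)
y[:n_events] = 1
X = rng.normal(size=(n, 5))
X[y == 1, 0] += 2.0  # hypothetical signal in one feature

scores = out_of_fold_scores(X, y)
prevalence = y.mean()  # AUPRC of an uninformative model is roughly this value
auroc = roc_auc_score(y, scores)
auprc = average_precision_score(y, scores)
auroc_ci = bootstrap_ci(y, scores, roc_auc_score)
auprc_ci = bootstrap_ci(y, scores, average_precision_score)

print(f"prevalence = {prevalence:.4f}")
print(f"AUROC = {auroc:.3f}, 95% CI {auroc_ci[0]:.3f} to {auroc_ci[1]:.3f}")
print(f"AUPRC = {auprc:.3f}, 95% CI {auprc_ci[0]:.3f} to {auprc_ci[1]:.3f}")
```

Comparing AUPRC against the prevalence rather than against 0.5 is the point of the sketch: with 13 events in 2,423 participants the chance-level AUPRC is about 0.005, so a high AUROC can coexist with low precision.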
