Longitudinal Language-Model Reasoning Enables Automated Labeling of Lung Cancer Recurrence from Unstructured Clinical Records.

Carlotta S Hoelzle, Johannes Brandt, Jonathan C Mueller, Maximiliano Klug, Julian Westphal, Daniel Rueckert, Maulik Chevli, Florian J Fintelmann

Many clinical endpoints are rarely captured as structured variables, necessitating labor-intensive manual abstraction from longitudinal narratives. We present SCRIBE, an open-source, training-free framework that extracts temporally precise, auditable clinical labels from unstructured records using only narrative text. SCRIBE uses multi-stage large language model reasoning to reconcile longitudinal evidence into accurate event labels and their timing while retaining verbatim evidence linked to the original source records. This traceability enables efficient expert verification and streamlines radiologic review by pinpointing exact diagnostic windows. In a multi-center cohort of 2,065 patients with lung cancer, SCRIBE achieved high recurrence detection performance, halved temporal localization error compared to note-level inference, and reduced the total token volume of multi-year patient documentation by nearly two orders of magnitude. Notably, expert adjudication revealed that 47.8% of false positives were valid events missing from official registries. These results demonstrate SCRIBE's capacity to automate high-fidelity endpoint extraction while auditing and improving the completeness of real-world clinical registries.
