Ambient AI Scribes to Create Educational Feedback Notes for Medical Students: Randomized Trial.

Jaideep S Talwalkar, David Chartash, Lisa Zhang, Michael Makutonin, Conrad W Safranek, Anne Elizabeth Sidamon-Eristoff, Lee H Schwamm, Donald S Wright

BACKGROUND: High-quality observation and feedback contribute to the development of clinical competence and professional growth in medical education. Faculty often struggle to translate verbal observations into written feedback because of documentation burden and competing demands. Ambient artificial intelligence (AI) scribes, already adopted in clinical practice, may address this challenge by capturing verbal exchanges and generating structured notes.

OBJECTIVE: The purpose of this study was to examine the use of ambient AI scribes to generate educational feedback notes during a formative medical interviewing workshop for first-year medical students in March and April 2025.

METHODS: Thirteen instructors were randomized to control (human-only) or intervention (AI scribe-assisted) workflows to complete narrative feedback forms. The intervention group used an AI scribe to generate transcripts of student-instructor encounters, which were then summarized into feedback notes using a large language model and edited by the instructors before submission. All narratives were scored using the Evaluation of Feedback Captured Tool (EFeCT). Factual accuracy of a subsample of unedited AI feedback summaries was reviewed against the source transcripts. Task load and usability were measured using the NASA Task Load Index and the System Usability Scale, respectively.

RESULTS: Instructors submitted feedback on 92.2% (94/102) of the students. EFeCT scores on a scale from 0 to 5 were higher for human-edited AI narratives (median 3.00, IQR 2.00-4.00) and unedited AI summaries (median 3.00, IQR 3.00-4.00) than for human-only narratives (median 2.00, IQR 1.75-3.00; P<.001). Human-only narratives were shorter than AI-assisted outputs (P<.001). Review of 117 AI-generated feedback elements showed a 6.8% (n=8) mischaracterization rate and a 1.7% (n=2) hallucination rate, with most errors corrected during editing. Task load was high and usability was marginal in both the control and intervention groups, with no significant differences between groups (P=.31 and P=.40, respectively).

CONCLUSIONS: An ambient AI scribe-assisted workflow improved the quality of written narrative feedback with no observed increase in instructor effort compared to human-only documentation. Although occasional inaccuracies required review, this innovation has the potential to transform feedback documentation.
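
As a quick arithmetic check, the percentages reported in RESULTS can be recomputed from the counts given in the abstract. The sketch below is illustrative only: the helper `pct` and the variable names are assumptions introduced here, not part of the study's analysis, and only counts stated in the abstract are used.

```python
def pct(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

# Counts taken directly from the abstract.
submission_rate = pct(94, 102)          # students who received feedback
mischaracterization_rate = pct(8, 117)  # mischaracterized AI feedback elements
hallucination_rate = pct(2, 117)        # hallucinated AI feedback elements

print(f"Feedback submitted: {submission_rate}%")
print(f"Mischaracterization rate: {mischaracterization_rate}%")
print(f"Hallucination rate: {hallucination_rate}%")
```

The recomputed values agree with the reported 92.2%, 6.8%, and 1.7%.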
