Evaluating large language models for accuracy incentivizes hallucinations.
Adam Tauman Kalai, Ofir Nachum, Santosh S Vempala, Edwin Zhang
Large language models sometimes produce confident, plausible falsehoods ('hallucinations'), limiting their reliability.