About the Session
Artificial intelligence in radiology depends on the idea of “ground truth,” yet in medical imaging, truth is rarely absolute. Radiology reports, pathology results, patient outcomes, and hybrid approaches each reflect different—and often imperfect—perspectives. Attendees will gain a clearer understanding of why these differences matter and how unclear or misaligned definitions of truth can undermine the clinical relevance, validation, and trustworthiness of AI systems.
Participants will explore how different labeling strategies shape AI performance and interpretation, drawing on examples from commonly used public datasets and from labels generated through NLP and large language models. By examining how weakly supervised or inconsistent annotations arise, attendees will come away better equipped to critically assess reported model accuracy, benchmarking claims, and the real-world applicability of AI tools in imaging.
Through discussion and an interactive exercise, participants will practice defining “ground truth” for a sample case and experience firsthand how expert interpretations can vary. Attendees will leave with practical strategies for managing uncertainty—such as adjudication, uncertainty-aware modeling, and hybrid or multimodal approaches—and a stronger framework for making transparent, context-aware decisions when developing, evaluating, or adopting AI in clinical imaging.
Objectives
- Identify the limitations and implications of using radiology reports, pathology findings, or clinical outcomes as ground truth in AI development.
- Compare labeling strategies—including public datasets and NLP/LLM-derived labels—and evaluate their impact on AI model performance.
- Describe approaches for managing label uncertainty, such as adjudication, uncertainty-aware modeling, and hybrid or multimodal strategies.
Presented By
Julie Bauml, MD
Diagnostic Radiologist, Imagen Technologies
Lindsey Johnstone, MD, MS