Foundation Models & Self-Supervised Learning in Imaging: What is it? Why should I care?

Thu, Jun 11

3:30 PM-4:00 PM ET

TechTalk Stage – Expo Hall A

About the Session

The traditional “one-task, one-model” paradigm in medical AI has created a fragmented ecosystem of siloed algorithms that lack clinical context and cross-modality reasoning. While individual tools excel at narrow tasks like nodule detection, they fail to integrate longitudinal patient data, such as EMR notes, genomic markers, and historical imaging series. The result is “alert fatigue” and incomplete diagnostic insights. This fragmentation hinders the transition toward true Imaging Intelligence, where the bottleneck is no longer image acquisition but the synthesis of disparate data streams. There is an urgent need for a unified framework that uses Foundation Models to provide a holistic, multimodal view of the patient journey within the imaging informatics workflow.

This session explores the paradigm shift from specialized AI to Generalist Medical and Multimodal Foundation Models. We will dive into the technical architecture of vision-language models (VLMs) and their ability to perform cross-modal reasoning, such as generating structured reports from pixel data or querying large-scale image archives using natural language prompts. Using real-world pilot data, we will demonstrate how these models bridge the gap between DICOM and non-DICOM data, making them useful across the enterprise. Attendees will examine the infrastructure requirements for deploying these large-scale models, addressing critical hurdles in GPU orchestration and prompt engineering for radiology, as well as the ethical implications of “hallucinations” in generative outputs. The session concludes with a strategic roadmap for integrating these foundational tools into existing PACS ecosystems to enhance diagnostic precision and operational efficiency.

Objectives

Analyze the architectural differences between narrow-task AI and multimodal foundation models within an enterprise imaging context.
Evaluate strategies for integrating non-imaging data (e.g., HL7/FHIR feeds) with computer vision models to improve diagnostic contextualization.
Identify the technical and ethical requirements for deploying generative AI, including GPU resource management and output validation protocols.