AI-Powered CT Scan Analysis Promises to Accelerate Clinical Assessments

SCIENMAG — Wed, 04 Mar 2026 18:00:31 +0000

In an advance poised to reshape medical imaging, a research team funded by the National Institutes of Health (NIH) has unveiled Merlin, a versatile machine learning model designed to extract deeper and broader insights from computed tomography (CT) scans. Rather than targeting a single imaging application, the model draws on large amounts of clinical data to perform a wide array of diagnostic and prognostic tasks. Merlin’s ability to interpret complex 3D abdominal CT scans marks a significant step towards automating and enhancing radiological assessment.

Merlin represents a new paradigm for artificial intelligence in medical imaging: a foundation model that learns from large volumes of routinely collected clinical data rather than from labels created for one specific task. Whereas conventional approaches are restricted to narrowly defined tasks, Merlin was trained on an extensive dataset of more than 15,000 3D abdominal CT scans, each paired with its corresponding radiology report, together with nearly one million diagnosis codes. This dataset comes from the Stanford University School of Medicine and forms the most comprehensive abdominal CT database assembled to date, enabling Merlin to learn relationships between visual imaging features and textual medical knowledge.

Merlin’s strength stems from an architecture that fuses three-dimensional scan data with the semantic content of natural-language reports. This integration allows the model to undertake over 750 distinct tasks, ranging from basic anatomical delineation to predicting disease development years before clinical manifestation. By learning from multimodal inputs during training, Merlin bridges the gap between raw imaging data and diagnostic interpretation, a task that conventionally requires expert radiologists supported by multiple rounds of clinical testing and evaluation.
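
The article does not describe Merlin’s training objective in detail. As a hedged illustration of how one image encoder can learn from two kinds of supervision at once, the sketch below assumes a multi-task loss: a multi-label binary cross-entropy over diagnosis codes plus a weighted report-alignment term computed elsewhere. The function names, the weighting, and the choice to simply sum the two terms are assumptions for illustration, not Merlin’s published recipe.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def multilabel_bce(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over independent diagnosis-code outputs.

    Each code gets its own yes/no prediction, so one scan can carry many codes.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))
    return total / len(logits)


def combined_loss(
    code_logits: Sequence[float],
    code_targets: Sequence[int],
    report_loss: float,
    report_weight: float = 1.0,
) -> float:
    """Hypothetical multi-task objective: diagnosis-code supervision plus a
    report-alignment term (for example a contrastive loss computed elsewhere)."""
    return multilabel_bce(code_logits, code_targets) + report_weight * report_loss


# One scan, three hypothetical codes; the third prediction is undecided (logit 0).
print(multilabel_bce([4.0, -4.0, 0.0], [1, 0, 1]))
```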

Merlin’s performance was evaluated on over 50,000 previously unseen abdominal CT scans from four independent hospitals. The model’s outputs agreed closely with diagnostic labels and conclusions produced by humans. For example, in predicting the ICD codes associated with individual scans, Merlin surpassed other contemporary AI tools, achieving greater than 81% accuracy across a broad suite of diagnostic labels and up to 90% accuracy within certain disease subsets. These results underscore Merlin’s potential as a reliable assistant in routine radiological workflows.
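
The article does not say how the 81% and 90% figures were aggregated. A common convention, assumed here, is to score each diagnosis code separately and then average the per-code scores (a macro average), with subset figures averaged over a chosen group of codes. The toy data and group names below are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def per_label_accuracy(preds: List[List[int]], truth: List[List[int]]) -> List[float]:
    """Accuracy of each label (column) across all scans (rows); values are 0 or 1."""
    n_scans = len(truth)
    n_labels = len(truth[0])
    return [
        sum(1 for p, t in zip(preds, truth) if p[j] == t[j]) / n_scans
        for j in range(n_labels)
    ]


def subset_means(accs: List[float], groups: Dict[str, List[int]]) -> Dict[str, float]:
    """Average the per-label accuracies within named groups of label indices."""
    return {name: mean(accs[j] for j in idx) for name, idx in groups.items()}


# Toy data: 4 scans x 3 hypothetical codes.
truth = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]]
preds = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 1, 0]]

accs = per_label_accuracy(preds, truth)  # one accuracy per code
overall = mean(accs)                     # macro average over codes
by_group = subset_means(accs, {"group_a": [0, 1], "group_b": [2]})
print(accs, overall, by_group)
```

Averaging per code gives rare and common diagnoses equal weight, which is one reason a headline figure can differ from the best-performing subset.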

Beyond diagnosing existing conditions, Merlin can also forecast future disease. In tests focusing on chronic diseases, such as diabetes, osteoporosis, and cardiovascular disease, the model identified individuals at elevated risk years before clinical onset, based solely on their abdominal CT scans. Merlin’s predictive accuracy reached 75%, compared with 68% for comparator models. This suggests that the scans contain subtle imaging biomarkers, so far unnoticed by human experts, that Merlin is able to detect.

A particularly compelling facet of Merlin’s versatility is its adaptability to imaging domains outside its training data. Although it was trained exclusively on abdominal CT scans, Merlin was also tested on chest CT images, a domain with different anatomical and pathological features. It matched or exceeded the diagnostic performance of models trained specifically on chest imaging data, further evidence of its generalizability and of the power of foundation-model approaches in medical AI.

Although Merlin is a generalist, a “jack-of-all-trades,” it consistently matched or outperformed specialized models built for individual diagnostic tasks. This breadth has generated excitement about integrating Merlin into clinical practice, not merely as a supplemental tool but potentially as a primary diagnostic aid. By reducing reliance on scarce radiological expertise, it could help alleviate growing physician shortages and streamline diagnostic workflows, thereby accelerating patient care and the start of treatment.

Despite these advances, some tasks such as drafting complete radiology reports from scratch remain challenging and require further refinement of Merlin’s learning algorithms and fine-tuning with more targeted datasets. The research team advocates for continuous model refinement through domain-specific customization, encouraging practitioners to augment Merlin with local clinical data to enhance performance tailored to specialized clinical environments or demographic variations.
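
The article does not say how practitioners would customize Merlin with local data. One lightweight option, assumed here for illustration, is a linear probe: keep the pretrained model frozen, compute its embedding for each local scan, and fit a small logistic-regression classifier on those embeddings. The two-dimensional embeddings below are made-up stand-ins; no real Merlin API is used or implied.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_linear_probe(
    features: List[Vector], labels: List[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[Vector, float]:
    """Fit logistic regression on frozen embeddings with full-batch gradient descent.

    The backbone is never updated; only the weight vector and bias are learned
    from the local labelled examples.
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Gradient of the log-loss with respect to the logit is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Vector, b: float, x: Vector) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical frozen embeddings for four local scans, with a local yes/no label.
local_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
local_labels = [1, 1, 0, 0]
w, b = train_linear_probe(local_feats, local_labels)
print(predict_proba(w, b, [0.95, 0.05]))
```

Fine-tuning the backbone itself is the heavier alternative; a probe learns only a handful of parameters per task.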

At its core, Merlin exemplifies a leap forward in multimodal artificial intelligence research, combining the spatial complexity of volumetric CT data with the semantic depth of diagnostic narratives. This combination lets the model characterize and predict disease with more nuance than previous generations of AI systems. The interplay of data scale, model design, and diverse task demands positions Merlin as a foundation on which future medical imaging innovations can be built.

This research, supported by several NIH institutes under multiple grants, also marks a pivotal collaboration between AI researchers and clinical scientists. It illuminates the potential for AI-driven tools not only to automate routine image analysis but also to reveal new medical insights, transforming radiology from a solely human-driven discipline into a synergistic human-machine partnership.

As the community begins to adopt and build upon Merlin, the implications span beyond immediate clinical applications. The model’s capacity to identify subtle patterns invisible to human eyes fuels optimism about discovering novel imaging biomarkers. Such biomarkers could inaugurate new frontiers in understanding disease pathophysiology, risk stratification, and personalized medicine, reshaping the landscape of preventative healthcare.

Ultimately, Merlin heralds a future in which advanced AI models streamline clinical decision-making, enhance diagnostic accuracy, and expand the role of medical imaging in health management. As senior author Akshay Chaudhari of Stanford University noted, this foundation model is poised to serve as a robust backbone for the broader medical community, and the potential applications built on it are bounded only by the limits of innovation itself.

Subject of Research: Medical imaging and machine learning applications in computed tomography (CT) scan analysis.

Article Title: Merlin: A Computed Tomography Vision Language Foundation Model and Dataset

News Publication Date: 4-Mar-2026

Web References:
https://www.nature.com/articles/s41586-026-10181-8

References:
Blankemeier, L., Kumar, A. et al. Merlin: A Computed Tomography Vision Language Foundation Model and Dataset. Nature (2026). https://doi.org/10.1038/s41586-026-10181-8

Keywords

Health and medicine, Artificial intelligence, Medical imaging, Clinical imaging

Generalist Models Revolutionize 3D CT Analysis

SCIENMAG — Sat, 28 Feb 2026 01:11:45 +0000

In medical imaging, artificial intelligence has brought transformative changes, yet significant hurdles remain for three-dimensional (3D) imaging techniques such as computed tomography (CT). Despite the critical role of 3D scans in diagnosing complex diseases, AI progress has been held back by the scarcity of large, well-annotated datasets that pair volumetric imaging data with rich textual context. Addressing this gap, a new study introduces CT-RATE, a publicly available dataset comprising an unprecedented volume of paired 3D chest CT scans and corresponding radiology reports, setting a new benchmark for medical AI development.

CT-RATE is remarkable not only for its size but also for its clinical depth and diversity. The dataset contains 25,692 non-contrast chest CT scans from 21,304 unique patients, covering a wide spectrum of clinical presentations. Each CT volume is aligned with its radiology report, providing multimodal data from which models can learn correlations between visual and textual medical information. The resource aims to accelerate progress in AI-assisted diagnostics by bridging radiological image interpretation and natural language understanding.

Using this dataset, the research team developed CT-CLIP, a contrastive learning framework tailored to 3D CT volumes. CT-CLIP extends contrastive language–image pretraining (CLIP), a paradigm known for enabling flexible zero-shot learning on natural images, to volumetric medical imaging. It maps entire 3D CT volumes into a joint embedding space shared with radiology reports, enabling a range of downstream applications without task-specific retraining.
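
The contrastive objective behind CLIP-style models can be sketched as a symmetric InfoNCE loss over a batch of matched scan–report pairs. This is a minimal sketch, assuming each CT volume and each report have already been encoded into fixed-length vectors by some 3D image encoder and text encoder (not shown; the article does not specify them). The temperature of 0.07 is an assumed value, and real training would run on batched tensors in a deep learning framework rather than Python lists.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def log_softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_loss(scan_emb: List[Vector], report_emb: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric contrastive (InfoNCE) loss over a batch of matched pairs.

    Pair i is (scan i, report i); every other combination in the batch is a negative.
    """
    scans = [l2_normalize(v) for v in scan_emb]
    reports = [l2_normalize(v) for v in report_emb]
    n = len(scans)
    # logits[i][j]: cosine similarity of scan i and report j, divided by the temperature.
    logits = [
        [sum(a * b for a, b in zip(scans[i], reports[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Scan-to-report: each row is an n-way classification whose answer is column i.
    scan_to_report = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Report-to-scan: the same classification along columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    report_to_scan = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (scan_to_report + report_to_scan) / 2


aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)  # low loss when pairs line up, high loss when they are mismatched
```

Minimizing this loss pulls each scan toward its own report and pushes it away from the other reports in the batch, which is what produces the shared embedding space.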

CT-CLIP’s potential is multifaceted. In multi-abnormality detection, it outperforms state-of-the-art fully supervised models across all major metrics. This suggests that CT-CLIP not only generalizes across diverse pathologies but also captures subtle features in the volumetric data that traditional models may overlook. CT-CLIP also supports case retrieval, rapidly finding similar historical cases, a capability that could strengthen diagnostic confidence and support medical education.
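
Once scans and reports share an embedding space, both applications reduce to similarity comparisons. The sketch below assumes a common CLIP zero-shot recipe: embed a positive prompt (for example "Pleural effusion is present.") and a negative prompt ("Pleural effusion is not present.") with the text encoder, then take a two-way softmax over the scan’s similarities to each; retrieval ranks stored scans by cosine similarity to a query scan. The prompt wording, the vectors, and the case IDs are hypothetical placeholders, not CT-CLIP’s actual interface.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_probability(
    scan: Vector, present_prompt: Vector, absent_prompt: Vector, temperature: float = 0.07
) -> float:
    """Probability that an abnormality is present, from a two-way softmax over prompt similarities."""
    s_pos = cosine(scan, present_prompt) / temperature
    s_neg = cosine(scan, absent_prompt) / temperature
    m = max(s_pos, s_neg)  # subtract the max before exponentiating, for stability
    e_pos, e_neg = math.exp(s_pos - m), math.exp(s_neg - m)
    return e_pos / (e_pos + e_neg)


def retrieve(query: Vector, database: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k stored cases whose embeddings are most similar to the query."""
    scored = [(case_id, cosine(query, emb)) for case_id, emb in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical 2-D embeddings standing in for real encoder outputs.
p = zero_shot_probability([1.0, 0.2], present_prompt=[1.0, 0.0], absent_prompt=[0.0, 1.0])
hits = retrieve([1.0, 0.0], {"case-a": [1.0, 0.1], "case-b": [0.0, 1.0], "case-c": [0.9, 0.5]}, k=2)
print(p, [case_id for case_id, _ in hits])
```

No abnormality-specific classifier is trained here: adding a new finding only requires writing a new pair of prompts.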

However, the work did not stop at detection and classification. Building on CT-CLIP’s vision encoder, the investigators introduced CT-CHAT, a vision–language foundation chat model designed for interactive dialogue about 3D chest CT volumes. By connecting the CT-CLIP vision encoder to a pretrained large language model, CT-CHAT combines visual understanding with conversational ability. Clinicians and researchers can query complex imaging data in natural language and receive contextually relevant, clinically useful responses grounded in the volumetric data.
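
The article says only that CT-CHAT connects CT-CLIP’s vision encoder to a pretrained large language model. A common way to make such a connection, assumed here for illustration (it is the general pattern of LLaVA-style models), is to project the encoder’s output features into the language model’s token-embedding width and feed them to the LLM alongside the embedded text prompt, here simply prepended as "visual tokens". The single linear layer and the tiny dimensions are hypothetical; real connectors may use small multilayer networks.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows = output dimensions


def project(features: List[Vector], weight: Matrix, bias: Vector) -> List[Vector]:
    """Linear map of each visual feature (width d_vis) into the LLM's embedding width (d_llm).

    weight has d_llm rows of length d_vis; bias has length d_llm.
    """
    return [
        [sum(w * x for w, x in zip(row, feature)) + b for row, b in zip(weight, bias)]
        for feature in features
    ]


def build_llm_input(
    visual_features: List[Vector], prompt_embeddings: List[Vector], weight: Matrix, bias: Vector
) -> List[Vector]:
    """Prepend the projected CT tokens to the embedded text prompt."""
    return project(visual_features, weight, bias) + prompt_embeddings


# Toy sizes: one visual token of width 2, an LLM embedding width of 3.
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.5]
sequence = build_llm_input([[1.0, 2.0]], [[0.1, 0.2, 0.3]], weight, bias)
print(sequence)
```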

CT-CHAT was fine-tuned on more than 2.7 million question–answer pairs generated from the CT-RATE dataset. This large supervised training corpus laid the foundation for an AI assistant capable of nuanced interpretation and communication, a crucial step toward making AI an integral partner in clinical decision-making. CT-CHAT also illustrates that moving AI systems from two-dimensional imaging to the more information-dense and computationally demanding domain of 3D medical imaging requires specialized methods and architectures.
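
The article does not describe CT-CHAT’s training loop. A standard way to fine-tune a chat model on question–answer pairs, assumed here, is to feed the question and answer as one token sequence but compute the loss only on the answer tokens, so the model learns to produce answers rather than to reproduce questions. The IGNORE value of -100 mirrors the default ignore_index of PyTorch’s CrossEntropyLoss; the token IDs and log-probabilities are toy values.

```python
import math
from typing import List

IGNORE = -100  # placeholder label meaning "no loss at this position"


def build_labels(question_ids: List[int], answer_ids: List[int]) -> List[int]:
    """Concatenate a question and its answer; only the answer positions keep real labels."""
    return [IGNORE] * len(question_ids) + list(answer_ids)


def masked_nll(log_probs: List[List[float]], labels: List[int]) -> float:
    """Mean negative log-likelihood over the positions whose label is not IGNORE.

    log_probs[t][v] is the model's log-probability of vocabulary item v at position t,
    assumed already aligned so that log_probs[t] is the prediction for labels[t].
    """
    losses = [-log_probs[t][y] for t, y in enumerate(labels) if y != IGNORE]
    return sum(losses) / len(losses)


# Toy example: 2 question tokens, 2 answer tokens, vocabulary of 10 ids.
labels = build_labels([5, 6], [7, 8])
uniform = [[math.log(0.1)] * 10 for _ in labels]
print(labels, masked_nll(uniform, labels))
```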

Together, CT-RATE, CT-CLIP, and CT-CHAT form a comprehensive ecosystem that addresses long-standing challenges in medical AI for 3D imaging. Open-sourcing these assets is a pivotal move towards democratizing access and fostering innovation across research communities and clinical institutions worldwide. This openness is expected to accelerate the development of new diagnostic tools, educational platforms, and decision support systems that combine volumetric imaging with natural language processing.

Moreover, the methodological advancements embodied by CT-CLIP demonstrate a crucial paradigm shift. Where traditional AI approaches often require extensive labeled datasets tailored for each specific diagnostic task, the contrastive learning framework promotes the development of foundation models capable of zero-shot adaptation. This not only streamlines the research and deployment pipeline but also significantly lowers the barrier for adopting AI solutions in diverse healthcare settings where annotated data may be scarce.

CT-CHAT’s conversational interface shows how AI can move beyond static image interpretation toward dynamic, interactive clinical workflows. By letting clinicians discuss patient scans with the AI, the model enables more intuitive exploration of the data: providing explanations, highlighting abnormalities, or retrieving related case histories on demand. Such a system could help radiologists treat AI as a collaborative partner rather than a black-box tool, enhancing transparency and trust in AI-generated insights.

The scale and multimodal integration of CT-RATE are particularly noteworthy given how difficult it is to align 3D imaging with free-text narratives. Radiology reports are rich in nuance, carrying clinical context, diagnostic reasoning, and descriptive details that are challenging for AI to parse. By creating a dataset that tightly couples volumetric data with authentic, contemporaneous reports, the researchers have provided an invaluable training ground for models to learn multimodal relationships, going beyond the limits of purely vision- or text-based datasets.

Looking ahead, the release of the CT-RATE dataset alongside CT-CLIP and CT-CHAT is poised to catalyze a new era of innovation in medical imaging AI. Researchers will now be able to develop more generalized, interpretable, and clinically useful AI tools that address not only detection but also explainability, usability, and integration into real-world workflows. This holistic approach is essential to surmount longstanding barriers, translating technical breakthroughs into tangible clinical benefit and improved patient outcomes.

Furthermore, the emphasis on open-source release fosters a spirit of collaborative advancement, inviting the global research community to build upon this foundational work. As AI models trained on CT-RATE grow more sophisticated, the potential to uncover previously unnoticed imaging biomarkers or to automate complex diagnostic tasks increases, thereby expanding the frontiers of precision medicine. The paradigm of multimodal, foundation model pretraining illustrated here is likely to inspire analogous developments in related domains such as MRI, ultrasound, and other medical imaging modalities.

In conclusion, this landmark study charts a path forward, demonstrating that collaboration among radiology, AI, and data science can unlock the diagnostic potential buried in vast repositories of 3D imaging data. From the construction of a large-scale multimodal dataset to the development of foundation models and interactive AI agents, the work shows how focused innovation can overcome data scarcity and computational challenges. It heralds a future in which AI not only augments the capabilities of clinicians but also transforms medical imaging analysis itself.

As the clinical community embraces these tools, the implications extend beyond diagnostics to include personalized medicine, disease monitoring, and education. The integration of CT-CHAT as an intelligent conversational interface exemplifies the move toward democratizing expert knowledge, making sophisticated image interpretation accessible to a wider spectrum of healthcare providers. Collectively, these advances underscore the profound and transformative impact that well-designed, large-scale multimodal AI systems can have on the practice of medicine in the 21st century.

Subject of Research: Development of generalist foundation models for 3D computed tomography through large-scale multimodal datasets.

Article Title: Generalist foundation models from a multimodal dataset for 3D computed tomography.

Article References:
Hamamci, I.E., Er, S., Wang, C. et al. Generalist foundation models from a multimodal dataset for 3D computed tomography. Nat. Biomed. Eng. (2026). https://doi.org/10.1038/s41551-025-01599-y

DOI: https://doi.org/10.1038/s41551-025-01599-y