Wednesday, May 13, 2026

Evaluating AI Anatomy Segmentation Models Without Ground Truth Data

May 13, 2026
in Biology

In medical imaging, artificial intelligence (AI) has changed how large collections of scans are analyzed. Automated anatomy segmentation, in which AI models label organs and structures in images such as chest CT scans, now makes possible large-scale studies that were previously infeasible because every scan had to be annotated by hand. However, as the number of segmentation models has multiplied, researchers have faced a fundamental problem: how to compare these tools objectively when no expert-verified ground truth is available.

A recent study published in the Journal of Medical Imaging addresses this problem by proposing a practical framework for evaluating the concordance of different AI-based anatomy segmentation models without using expert annotations as a gold standard. The work uses chest CT images from the National Lung Screening Trial (NLST), a widely used public dataset for cancer research, so the evaluation reflects the kind of data to which these models are applied in research and clinical settings.

The problem stems from the nature of public datasets like NLST: although they contain thousands of imaging volumes, they lack comprehensive organ and bone segmentations. Manually annotating such intricate structures is extremely time-consuming and requires highly skilled radiologists, which makes complete ground-truth labeling impractical. AI models can generate these labels automatically, but their outputs differ because each model may use different terminology, boundary definitions, or anatomical inclusion criteria. Without a consensus or an external standard, identifying the best model has remained difficult.

To address this, the investigators evaluated AI segmentation tools by their agreement with one another rather than by absolute accuracy. The premise is that if independently developed models agree on how a structure is labeled, that agreement is likely to indicate a reliable segmentation. Rather than seeking an elusive “correct” answer, the study quantifies where the tools align and where they diverge on the same dataset.

Comparing the models directly required a common baseline. The researchers selected six widely used open-source segmentation models: two versions of TotalSegmentator, Auto3DSeg, MOOSE, MultiTalent, and CADS. Because the models differ in output format and nomenclature, the team converted all results to the interoperable DICOM Segmentation (SEG) format. They also unified the labels using SNOMED-CT, a widely accepted medical terminology, assigning consistent codes and colors to each anatomical region. This harmonization made it possible to display segmentations from different models side by side on the same scan and compare them directly.
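
As a rough illustration of the label-harmonization step, the following Python sketch renames each model’s native labels to a shared concept key and merges labels that describe the same structure at a finer granularity. It is not the authors’ code: the model names, native label strings, and concept keys are hypothetical placeholders (a real pipeline would map to SNOMED-CT codes and write DICOM SEG objects), and masks are represented as sets of voxel coordinates to keep the example dependency-free.

```python
from typing import Dict, FrozenSet, Tuple

Voxel = Tuple[int, int, int]   # (z, y, x) voxel index
Mask = FrozenSet[Voxel]        # toy mask: the set of labeled voxels

# Hypothetical lookup table: (model, native label) -> shared concept key.
# A real pipeline would map to SNOMED-CT codes instead of readable keys.
LABEL_MAP: Dict[Tuple[str, str], str] = {
    ("model_a", "heart"): "HEART",
    ("model_a", "lung_left"): "LEFT_LUNG",
    ("model_b", "Heart"): "HEART",
    ("model_b", "left_upper_lobe"): "LEFT_LUNG",
    ("model_b", "left_lower_lobe"): "LEFT_LUNG",
}


def harmonize(model: str, segments: Dict[str, Mask]) -> Dict[str, Mask]:
    """Rename one model's native labels to shared concept keys.

    Native labels that map to the same concept are merged (voxel union);
    labels without a mapping are dropped so they are never compared.
    """
    out: Dict[str, Mask] = {}
    for native, mask in segments.items():
        concept = LABEL_MAP.get((model, native))
        if concept is None:
            continue
        out[concept] = out.get(concept, frozenset()) | frozenset(mask)
    return out


# Usage: model_b splits the left lung into lobes and also outputs a spleen.
b = harmonize("model_b", {
    "Heart": frozenset({(0, 0, 0)}),
    "left_upper_lobe": frozenset({(1, 0, 0)}),
    "left_lower_lobe": frozenset({(2, 0, 0)}),
    "spleen": frozenset({(3, 0, 0)}),
})
print(sorted(b))             # ['HEART', 'LEFT_LUNG']
print(len(b["LEFT_LUNG"]))   # 2
```

Dropping unmapped labels, rather than guessing a match, is a deliberate choice in this sketch: a structure enters the comparison only once it has an explicit entry in the mapping table.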

To make the results accessible, the study used two open-source platforms that are widely used in medical imaging research: OHIF Viewer, a browser-based tool, and 3D Slicer, a desktop application. The team extended both with custom integrations and plugins that display multiple segmentations simultaneously in three-dimensional and orthogonal two-dimensional views. This interface lets researchers interactively explore where models agree and where they disagree for individual organs and structures.

The analysis focused on a curated subset of 18 chest CT scans from different NLST participants. After excluding structures that were only partially imaged or inconsistently detected, the study concentrated on 24 key regions, including the lung lobes, heart, ribs, thoracic vertebrae, and sternum. For each structure, the authors defined a “consensus” segmentation as the set of voxels labeled by all models that segment that anatomical part. They then measured how closely each model’s output matched this consensus region, using metrics that quantify shape similarity and volumetric agreement.
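
The consensus-based comparison can be sketched in a few lines of Python. The article does not name the exact metrics, so this sketch assumes the Dice similarity coefficient for shape overlap and a simple voxel-count ratio for volume; the function names and toy masks are hypothetical, and real CT volumes would normally be handled as arrays rather than Python sets.

```python
from typing import Dict, FrozenSet, Tuple

Voxel = Tuple[int, int, int]
Mask = FrozenSet[Voxel]


def consensus(masks: Dict[str, Mask]) -> Mask:
    """Voxels labeled by every model that segments this structure."""
    if not masks:
        return frozenset()
    first, *rest = masks.values()
    return frozenset(first).intersection(*rest)


def dice(a: Mask, b: Mask) -> float:
    """Dice similarity coefficient: 2|A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2.0 * len(a & b) / total


def volume_ratio(a: Mask, ref: Mask) -> float:
    """Voxel count of a relative to the reference (1.0 = same volume)."""
    return len(a) / len(ref) if ref else float("inf")


def score_against_consensus(masks: Dict[str, Mask]) -> Dict[str, Tuple[float, float]]:
    """(Dice, volume ratio) of each model's mask versus the consensus."""
    ref = consensus(masks)
    return {name: (dice(m, ref), volume_ratio(m, ref)) for name, m in masks.items()}


# Toy masks for one structure from three hypothetical models.
masks = {
    "model_a": frozenset({(0, 0, 0), (0, 0, 1), (0, 0, 2)}),
    "model_b": frozenset({(0, 0, 0), (0, 0, 1)}),
    "model_c": frozenset({(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3)}),
}
scores = score_against_consensus(masks)
print(scores["model_b"])   # (1.0, 1.0)
print(scores["model_c"])   # (0.6666666666666666, 2.0)
```

Because the consensus is an intersection, every model’s mask contains it: as a model labels voxels the others leave out, its Dice score can only fall below 1.0 and its volume ratio can only rise above 1.0.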

The quantitative results were summarized in interactive plots that make it easy to spot outlier models or scans with problematic segmentations. The team also released a public interactive website where the research community can examine the detailed concordance metrics and the underlying imaging data, supporting transparency and collaborative refinement.
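
One way to automate the search for outliers that the interactive plots expose visually is a robust z-score over the per-scan scores for a single structure. This Python sketch is an assumption, not the study’s procedure: it uses the median absolute deviation (MAD) with the conventional 1.4826 scaling factor and an illustrative cutoff `k = 3.0`, and the model identifiers, scan identifiers, and scores are hypothetical toy values.

```python
from statistics import median
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (model, scan); hypothetical identifiers


def flag_outliers(scores: Dict[Key, float], k: float = 3.0) -> List[Key]:
    """Return (model, scan) pairs whose score is far below the typical value.

    Uses a one-sided robust z-score: (median - score) / (1.4826 * MAD).
    Only low scores are flagged, since low agreement is the concern.
    """
    values = list(scores.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # MAD is 0 when at least half the scores equal the median:
        # fall back to flagging anything strictly below the median.
        return sorted(key for key, v in scores.items() if v < med)
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    return sorted(key for key, v in scores.items() if (med - v) / scale > k)


toy_scores = {
    ("model_a", "scan01"): 0.95, ("model_b", "scan01"): 0.94,
    ("model_c", "scan01"): 0.96, ("model_a", "scan02"): 0.93,
    ("model_b", "scan02"): 0.60, ("model_c", "scan02"): 0.95,
}
print(flag_outliers(toy_scores))  # [('model_b', 'scan02')]
```

A median-based score is used here instead of the mean and standard deviation because a single badly failed segmentation would otherwise inflate the spread and partly hide itself.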

The results varied by structure. Lung segmentations showed strong agreement, with high overlap and nearly indistinguishable boundaries across all models. This consistency suggests that lung segmentation is a mature task, likely because of abundant training data and well-defined anatomical landmarks. Heart segmentations, by contrast, initially showed only moderate concordance, primarily because one outlier model used a narrower definition of the heart. Excluding this model markedly improved agreement among the remaining models.
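
The heart result suggests a simple diagnostic: recompute agreement with each model left out in turn and see which exclusion helps most. The Python sketch below is a hypothetical illustration of that idea, not the authors’ analysis; it assumes an intersection consensus and Dice scoring, and uses toy one-dimensional masks in which `model_d` stands in for a model with a narrower structure definition.

```python
from typing import Dict, FrozenSet, Tuple

Voxel = Tuple[int, int, int]
Mask = FrozenSet[Voxel]


def dice(a: Mask, b: Mask) -> float:
    """Dice similarity coefficient: 2|A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2.0 * len(a & b) / total


def mean_dice_to_consensus(masks: Dict[str, Mask]) -> float:
    """Average Dice of each mask against the intersection of all masks."""
    first, *rest = masks.values()
    ref = frozenset(first).intersection(*rest)
    return sum(dice(m, ref) for m in masks.values()) / len(masks)


def leave_one_out(masks: Dict[str, Mask]) -> Dict[str, float]:
    """Mean agreement among the remaining models after dropping each one.

    Needs at least two models so that every remaining subset is non-empty.
    """
    return {
        name: mean_dice_to_consensus({k: v for k, v in masks.items() if k != name})
        for name in masks
    }


def row(n: int) -> Mask:
    """Toy structure: n voxels in a line."""
    return frozenset((0, 0, x) for x in range(n))


masks = {"model_a": row(10), "model_b": row(10), "model_c": row(9), "model_d": row(4)}
loo = leave_one_out(masks)
print(max(loo, key=loo.get))  # model_d
```

In this toy case `model_d` scores a perfect 1.0 against the full consensus, because the intersection shrinks to its narrower mask; the problem only becomes visible when it is left out, which is one reason per-model scores benefit from being paired with exclusion tests and visual review.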

Bone structures proved more challenging. Four of the six models frequently mislabeled ribs and thoracic vertebrae, for example by merging adjacent bones or assigning the wrong vertebral level. The other two models, which were trained on different datasets, produced notably more consistent and anatomically complete segmentations. These differences were not apparent in aggregate statistics but became clear when the segmentations were reviewed visually side by side, underscoring the need to combine quantitative and qualitative evaluation.
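
Some of the bone errors described above, such as swapped or skipped vertebral levels, can be caught by a simple consistency check even without ground truth. This Python sketch is an assumption-laden illustration rather than part of the published framework: it expects hypothetical labels named `T1` to `T12`, stores voxels as `(z, y, x)` tuples, and assumes the z index increases toward the feet (the comparison would flip for the opposite convention).

```python
from typing import Dict, FrozenSet, List, Tuple

Voxel = Tuple[int, int, int]  # (z, y, x); z assumed to increase toward the feet


def centroid_z(mask: FrozenSet[Voxel]) -> float:
    """Mean z index of a non-empty mask."""
    return sum(v[0] for v in mask) / len(mask)


def check_vertebra_order(segments: Dict[str, FrozenSet[Voxel]]) -> List[str]:
    """List ordering problems among thoracic vertebra labels T1..T12.

    Flags a level whose centroid is not above the next labeled level, and
    gaps where intermediate levels are missing between two labeled ones.
    Empty or absent labels are skipped.
    """
    present = [f"T{i}" for i in range(1, 13) if segments.get(f"T{i}")]
    problems = []
    for upper, lower in zip(present, present[1:]):
        if centroid_z(segments[upper]) >= centroid_z(segments[lower]):
            problems.append(f"{upper} is not above {lower}")
    numbers = [int(name[1:]) for name in present]
    for a, b in zip(numbers, numbers[1:]):
        if b != a + 1:
            problems.append(f"missing level(s) between T{a} and T{b}")
    return problems


segments = {
    "T1": frozenset({(10, 0, 0), (11, 0, 0)}),
    "T2": frozenset({(14, 0, 0)}),
    "T3": frozenset({(12, 0, 0)}),
    "T5": frozenset({(20, 0, 0)}),
}
for problem in check_vertebra_order(segments):
    print(problem)
# T2 is not above T3
# missing level(s) between T3 and T5
```

A check like this cannot detect a consistent off-by-one shift, where every vertebra is labeled one level too high or too low, because the order is still correct; errors of that kind still call for the side-by-side visual review the study describes.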

The investigation highlights an important point: even highly cited AI segmentation models can have systematic weaknesses, particularly when they are trained on overlapping or limited data. It also demonstrates a practical way to assess models meaningfully without the prohibitive cost of manual ground-truth annotation. By combining standardized atlases, ontology-based label harmonization, automated voxel-wise comparison, and interactive visualization, the framework offers a reproducible and scalable approach to evaluating medical imaging AI tools.

Beyond its immediate findings, the work encourages a shift in biomedical AI research away from searching for a single “best” model and toward evidence-based choices informed by each model’s comparative strengths and weaknesses. The openly available software, label mappings, and sample datasets form a toolkit that applies not only to chest CT anatomy but can also be extended to other modalities and segmentation tasks.

As AI becomes integral to clinical workflows and population-scale studies alike, transparent evaluation frameworks like this will be indispensable. They empower data scientists, clinicians, and researchers to select segmentation models thoughtfully, gauge reliability, and appreciate limitations—ultimately enhancing the trustworthiness and impact of AI in healthcare.

In a landscape increasingly reliant on AI-generated annotations, the study by L. Giebeler et al. offers an approach that balances rigor with practicality, encourages collaboration, and raises the standard of medical image analysis by measuring agreement among models even when classical ground truth is unavailable.

Subject of Research: Not applicable
Article Title: In search of truth: evaluating concordance of AI-based anatomy segmentation models
News Publication Date: 3-Apr-2026
Web References:
  • https://www.spiedigitallibrary.org/journals/journal-of-medical-imaging/volume-13/issue-06/062204/In-search-of-truth–evaluating-concordance-of-AI-based/10.1117/1.JMI.13.6.062204.full
  • http://dx.doi.org/10.1117/1.JMI.13.6.062204
References:
  • Giebeler L., et al., “In search of truth: evaluating concordance of AI-based anatomy segmentation models,” Journal of Medical Imaging, 13(6), 062204 (2026).
Image Credits: L. Giebeler et al.
Keywords: Artificial intelligence, Medical imaging, Anatomy, Anatomy segmentation, AI evaluation, Chest CT, National Lung Screening Trial, Open-source models, DICOM segmentation, SNOMED-CT, 3D Slicer, OHIF Viewer

© 2025 Scienmag - Science Magazine
