The Germans Trias i Pujol Research Institute (IGTP), through its Biostatistics Unit, has reached a significant milestone in open scientific data sharing by publishing the DIVINE study database. The dataset is now accessible via Scientific Data, a Nature Portfolio journal, and contributes a substantial resource to COVID-19 research and epidemiological studies. Scientific Data emphasizes the accessibility, thorough documentation, and reusability of datasets, which matches the objective of this publication: fostering transparent, reproducible, and collaborative science.
The DIVINE database is a vast multicenter cohort comprising detailed clinical information from 5,813 patients hospitalized with COVID-19 across four pandemic waves between March 2020 and August 2021. These patients were admitted to five hospitals in the southern metropolitan area of Barcelona. The rich dataset integrates multidimensional clinical data, including patient demographics, underlying health conditions, treatments administered, and clinical outcomes both during hospitalization and in post-discharge follow-ups, enabling multifaceted analyses and model development.
To maximize usability and facilitate research integration, the data have been meticulously organized into an R package, available on the Comprehensive R Archive Network (CRAN). Alongside this, the authors maintain an associated GitHub repository and a Zenodo record for robust data traceability and archiving. This ensures that researchers worldwide can readily access and apply the dataset in diverse statistical, epidemiological, and predictive modeling contexts without ethical or logistical barriers.
Data anonymization was central to this initiative, preserving patient confidentiality while allowing detailed clinical and temporal analyses. Researchers can leverage the dataset to track patient trajectories through different viral waves, identify clinical risk factors associated with severe outcomes, and validate new predictive algorithms, making it a potent tool for advancing knowledge in infectious disease epidemiology and patient management strategies.
Prior to its public release, the DIVINE cohort had already been instrumental in numerous scientific investigations. These studies explored determinants of in-hospital mortality, examined long-term post-COVID sequelae, applied patient stratification approaches, and contributed to developing advanced predictive models tailored to clinical needs. By publishing this dataset, the researchers invite the global scientific community to extend this work, thereby accelerating advancements in precision medicine and public health responses.
Cristian Tebé, the head of the Biostatistics Unit at IGTP, underscored the ethical imperatives behind open data publication. According to Tebé, releasing clinical datasets promotes transparency, enhances reproducibility, accelerates discoveries, and reduces redundant research efforts, constituting a moral obligation towards both science and society. His perspective reflects a growing consensus within the scientific community advocating for open data as a vehicle for sustainable and trustworthy biomedical research.
The origins of the DIVINE cohort trace back to a collaborative response during the initial wave of the pandemic. The Infectious Diseases Service at Bellvitge University Hospital spearheaded data collection, supported by biostatisticians from the Bellvitge Biomedical Research Institute (IDIBELL). Subsequently, the project expanded through a network of Catalan healthcare and academic institutions, including IGTP, Universitat Politècnica de Catalunya, Universitat de Barcelona, and multiple healthcare consortia and hospitals, collectively representing the MetroSud and DIVINE research groups.
This multicenter approach ensured a heterogeneous and representative sample covering diverse patient demographics, healthcare settings, and temporal dynamics of SARS-CoV-2 infection. The inclusion of nuanced clinical variables collected prospectively enhances the dataset’s robustness, accuracy, and applicability in both retrospective and prospective analyses. Such data depth is critical for elucidating the complex interplay between viral pathogenesis, host factors, and treatment modalities influencing COVID-19 outcomes.
The dataset’s design also considers teaching and methodological training applications in biostatistics, epidemiology, and clinical data science. Educational institutions can adopt the DIVINE R package as a real-world resource for illustrating statistical modeling, survival analysis, risk stratification, and longitudinal data interpretation. This fosters a new generation of researchers equipped with practical skills grounded in actual pandemic data, addressing a critical gap in translational biomedical education.
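As an illustration of the kind of survival-analysis exercise such teaching might involve, the following is a minimal sketch of a Kaplan-Meier estimator in Python. It does not use the DIVINE R package or any DIVINE data: the durations and event flags are synthetic values invented for this example, and the function name `kaplan_meier` is hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time for each patient (e.g. days since admission).
    events:    1 if the event (e.g. death) was observed at that time,
               0 if the patient was censored (e.g. discharged or lost).
    Returns a list of (time, survival_probability) at each event time.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaves the risk set at their own time

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Synthetic toy cohort (NOT taken from DIVINE): six patients.
toy_durations = [3, 5, 5, 8, 10, 12]
toy_events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(toy_durations, toy_events)
for time, prob in curve:
    print(f"day {time:>2}: S(t) = {prob:.3f}")
```

Censored patients contribute to the number at risk until they leave, but they never trigger a drop in the curve. Handling censoring this way is the central idea students usually need to grasp before moving to regression models on real cohort data.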
Making the dataset openly available via multiple platforms reflects modern scientific best practice, combining open access publishing with resource sharing. This ecosystem supports continuous updates, community-driven improvements, and collaborative validation efforts, all of which are essential during an evolving public health crisis. The transparent provision of metadata, code, and documentation exemplifies reproducible research, enabling other teams to replicate findings, benchmark models, and generate novel hypotheses grounded in empirical evidence.
Furthermore, the ethical framework guiding this project, balancing patient privacy with maximal data utility, serves as a model beyond COVID-19. It demonstrates how sensitive clinical information can be securely shared to catalyze global scientific progress without compromising individual rights or institutional responsibilities. Such frameworks are increasingly vital in the era of big data and precision health, where collaborative efforts span disciplines, institutions, and borders.
This landmark publication reinforces the critical role of biostatistics and data science units within clinical research ecosystems. By bridging clinical data collection with rigorous statistical methodology and open science principles, the IGTP Biostatistics Unit highlights the transformative potential of integrated multidisciplinary teams. Their contribution illustrates how robust data infrastructure and analytic expertise can drive impactful findings and influence public health policies during and after the pandemic.
Looking ahead, the DIVINE cohort’s availability encourages further international collaborations and meta-analyses, integrating datasets from different regions and timeframes. The longitudinal and multicenter nature of the database positions it as a valuable resource for monitoring virus evolution, vaccine effectiveness, and emerging variants. As global health responses continue adapting to the pandemic’s shifting landscape, such datasets underpin evidence-based strategies and precision interventions.
In conclusion, the publication of the DIVINE COVID-19 cohort dataset epitomizes a shift towards greater openness, ethical responsibility, and methodological rigor in biomedical research. It empowers researchers, educators, and policymakers with a comprehensive resource that both deepens understanding and accelerates innovation in managing SARS-CoV-2 and its clinical consequences. The pioneering spirit of the IGTP Biostatistics Unit and their collaborators sets a benchmark for transparent scientific communication and data sharing in the face of global health challenges.
Subject of Research: People
Article Title: A multicenter COVID-19 database from four waves in the south metropolitan area of Barcelona, Catalonia
News Publication Date: 29-May-2026
References:
Tebé, C., et al. (2026). A multicenter COVID-19 database from four waves in the south metropolitan area of Barcelona, Catalonia. Scientific Data. DOI: 10.1038/s41597-026-07479-7
Image Credits: IGTP
Keywords: COVID-19, Open access, Scientific data, Statistics, Public health, Epidemiology

