Genomic Data Commons provides unprecedented cancer data resource
Launched in 2016 by then-Vice President Biden, the GDC now contains more than 3.3 petabytes of data and serves more than 50,000 unique users each month.
The National Cancer Institute’s Genomic Data Commons (GDC), launched in 2016 by then-Vice President Joseph Biden and hosted at the University of Chicago, has become one of the largest and most widely used resources in cancer genomics. It holds more than 3.3 petabytes of data from more than 65 projects and over 84,000 anonymized patient cases, and it serves more than 50,000 unique users each month.
In new papers published Feb. 22 in Nature Communications and Nature Genetics, the UChicago-based research team shares new details about the GDC, which is funded by the National Cancer Institute (NCI) via a subcontract with the Frederick National Laboratory for Cancer Research, currently operated by Leidos Biomedical Research, Inc. One of the papers describes the design and operation of the GDC. The other describes the pipelines the GDC uses to harmonize submitted data and to generate the datasets used by its research community.
The goal of the GDC is to provide the cancer research community with a repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine.
Data production for what would become the GDC began in June 2015 using a private cloud. Within a year, the GDC had analyzed more than 50,000 raw sequencing data inputs. The GDC includes genomic, transcriptomic, epigenomic, proteomic, clinical, and imaging data. The processing pipelines described in the Nature Communications paper have produced more than 1,660 TB of data on more than two dozen types of primary cancer. These data are stored in the GDC Data Portal, where they are available for viewing and downloading.
Along with the Data Portal, the GDC offers several other user resources: the GDC Data Analysis, Visualization, and Exploration (DAVE) Tools for interactive exploration of data by genomic variant or specific alteration; the GDC Data Submission Portal for submitting data; the GDC Data Transfer Tool (DTT) for downloading large genomic datasets; and the GDC data harmonization system, which allows users to run data submitted to the GDC through the harmonization pipelines.
“These data have a critical role to play,” said Robert Grossman, PhD, principal investigator for the GDC and director of the Center for Translational Data Science at UChicago. “As data accumulates, new signals will become easier to identify as important targets for understanding cancer biology. In addition, the data-sharing infrastructure can serve to inform research studies, providing new insight into genetic variation between individuals and how it may affect cancer patient outcomes.”
To read about the launch of the GDC, visit uchicagomedicine.org.
The Genomic Data Commons project is funded with Federal funds from the National Cancer Institute, National Institutes of Health from the following sources: Leidos Biomedical Research, Inc. Agreement 14X050 and Agreement 17X147, Task Order TO2 under Prime Contract HHSN261200800001E; and Agreement 17X147, Task Orders F07 and F12 under Prime Contract 75N91019D00024, Task Orders 75N91019F0129 and 75N91020F00003.
About the University of Chicago Medicine & Biological Sciences
The University of Chicago Medicine, with a history dating back to 1927, is one of the nation’s leading academic health systems. It unites the missions of the University of Chicago Medical Center, Pritzker School of Medicine and the Biological Sciences Division. Twelve Nobel Prize winners in physiology or medicine have been affiliated with the University of Chicago Medicine. Its main Hyde Park campus is home to the Center for Care and Discovery, Bernard Mitchell Hospital, Comer Children’s Hospital and the Duchossois Center for Advanced Medicine. It also has ambulatory facilities in Orland Park, South Loop and River East as well as affiliations and partnerships that create a regional network of care. UChicago Medicine offers a full range of specialty-care services for adults and children through more than 40 institutes and centers including an NCI-designated Comprehensive Cancer Center. Together with Harvey-based Ingalls Memorial, UChicago Medicine has 1,296 licensed beds, nearly 1,300 attending physicians, over 2,800 nurses and about 970 residents and fellows.
Visit UChicago Medicine’s health and science news blog at http://www.
Alison Caldwell, PhD