bioRxiv. 2024 Sep 22. pii: 2024.09.18.613544. [Epub ahead of print]
Genome in a Bottle Consortium
The Genome in a Bottle Consortium (GIAB), hosted by the National Institute of Standards and Technology (NIST), is developing new matched tumor-normal samples, the first to be explicitly consented for public dissemination of genomic data and cell lines. Here, we describe a comprehensive genomic dataset from the first individual, HG008, including DNA from an adherent, epithelial-like pancreatic ductal adenocarcinoma (PDAC) tumor cell line (HG008-T) and matched normal cells from duodenal tissue (HG008-N-D) and pancreatic tissue (HG008-N-P). The data come from thirteen whole-genome measurement technologies: Illumina paired-end, Element standard and long insert, Ultima UG100, PacBio (HiFi and Onso), Oxford Nanopore (standard and ultra-long), Bionano Optical Mapping, Arima and Phase Genomics Hi-C, G-banded karyotyping, directional genomic hybridization, and BioSkryb Genomics single-cell ResolveDNA. Most tumor data are from a large homogeneous batch of non-viable cells after 23 passages of the primary tumor cells, along with some data from different passages to enable an initial understanding of genomic instability. These data will be used by the GIAB Consortium to develop matched tumor-normal benchmarks for somatic variant detection. In addition, extensive data from two different normal tissues from the same individual can enable understanding of mosaicism. Long reads also contain methylation tags for epigenetic analyses. We expect these data to facilitate innovation in whole-genome measurement technologies, de novo assembly of tumor and normal genomes, and bioinformatic tools to identify small and structural somatic mutations. This first-of-its-kind, broadly consented, open-access resource will facilitate further understanding of sequencing methods used for cancer biology.