Public Data Resource

precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions

Contact: Nathanael David Olson.
Identifier: doi:10.18434/mds2-2336
Version: 1.0...
The precisionFDA Truth Challenge V2 aimed to assess the state of the art in variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). Starting with FASTQ files, 20 challenge participants applied their variant calling pipelines and submitted 64 variant callsets for one or more sequencing technologies (~35X Illumina, ~35X PacBio HiFi, and ~50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants, using the new GIAB benchmark sets and genome stratifications. Challenge submissions included a number of innovative methods for all three technologies, with graph-based and machine-learning methods scoring best for short-read and long-read datasets, respectively. New methods outperformed the 2016 Truth Challenge winners, and new machine-learning approaches combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants. This dataset includes the FASTQ files provided to participants, the submitted variant callsets as VCF files, and the benchmarking results, along with challenge submission metadata.
Research Areas
NIST R&D: Bioscience: Genomics
Keywords: Variant calling, bioinformatics, whole-genome sequencing, benchmarking, precisionFDA
These data are public.
Cite this dataset
Nathanael David Olson (2020), precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions, National Institute of Standards and Technology, https://doi.org/10.18434/mds2-2336 (Accessed 2025-07-14)
Repository Metadata
Machine-readable descriptions of this dataset are available in the following formats:
NERDm
Access Metrics
Metrics data is not available for all datasets, including this one. This may be because the data is served via servers external to this repository.