Data Publication

Challenging Medically-Relevant Genes Benchmark Set

Justin Wagner, Nathan D Olson, Jennifer McDaniel, Justin M Zook
Contact: Nathanael David Olson.
Identifier: doi:10.18434/mds2-2475
Version: 1.0...
This is v1.00 of the Challenging Medically-Relevant Genes (CMRG) benchmark, comprising a small variant benchmark and a structural variant benchmark focused on 273 challenging medically relevant genes for the Genome in a Bottle (GIAB) sample HG002 (also known as the Ashkenazi son). The benchmarks were generated from a trio-based hifiasm v0.11 (https://doi.org/10.1038/s41592-020-01056-5) diploid assembly of HG002: PacBio HiFi reads from HG002 were used for assembly, and Illumina reads from the parents, HG003 and HG004, were used to partition the assembly into phased haplotypes. The benchmark contains VCFs for small and structural variants, along with corresponding benchmark BED files. Any position inside the BED regions that has no variant in the VCF is considered homozygous reference. We extensively curated the variant calls and excluded any found to be questionable or erroneous. The benchmark helps measure performance in important challenging regions, including challenging segmental duplications, regions with complex variants, regions with structural variants, and regions affected by false duplications in GRCh37 or GRCh38. It is described in https://doi.org/10.1101/2021.06.07.444885.
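The relationship between the benchmark VCF and BED files can be illustrated with a minimal Python sketch. It is not part of the benchmark's own code: the function names and the toy records are hypothetical, the BED intervals are assumed to be non-overlapping, and each variant is matched only by its VCF POS field, which ignores the reference span of indels and structural variants. A real evaluation would rely on a dedicated variant comparison tool rather than this position lookup.

```python
from bisect import bisect_right


def parse_bed(lines):
    """Return {chrom: sorted [(start, end), ...]} from BED lines (0-based, half-open)."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    for intervals in regions.values():
        intervals.sort()
    return regions


def parse_vcf_sites(lines):
    """Return the set of (chrom, pos) pairs from VCF data lines (POS is 1-based)."""
    sites = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        sites.add((chrom, int(pos)))
    return sites


def in_benchmark(regions, chrom, pos):
    """True if the 1-based position falls inside a benchmark BED interval."""
    intervals = regions.get(chrom, [])
    zero_based = pos - 1
    # Find the last interval whose start is <= the position.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


def classify(regions, sites, chrom, pos):
    """Interpret a position the way the benchmark description defines it."""
    if not in_benchmark(regions, chrom, pos):
        return "outside benchmark"  # no truth assertion is made here
    return "variant" if (chrom, pos) in sites else "homozygous reference"


# Hypothetical toy records, not taken from the real benchmark files.
bed_lines = ["chr1\t100\t200", "chr1\t500\t600"]
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG002",
    "chr1\t150\t.\tA\tG\t.\tPASS\t.\tGT\t0|1",
]

regions = parse_bed(bed_lines)
sites = parse_vcf_sites(vcf_lines)

for pos in (150, 160, 300):
    print(pos, classify(regions, sites, "chr1", pos))
```

Positions outside the BED regions are reported separately because the benchmark makes no claim about them, so calls there should be counted neither as true nor as false.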
Research Areas
NIST R&D: Bioscience: Genomics
Keywords: Human genomics, DNA sequencing, Reference materials, Medical genomics, Bioinformatics
These data are public.
Data and related material can be found at the following locations:
GIAB FTP Site: NCBI-hosted Genome in a Bottle FTP site.
Code Repository: GitHub repository with the code used to generate the benchmark sets.
Code for Manuscript Analysis Repository: GitHub repository with the code used to generate figures and perform the analysis for the manuscript.
Cite this dataset
Justin Wagner, Nathan D Olson, Jennifer McDaniel, Justin M Zook (2021), Challenging Medically-Relevant Genes Benchmark Set, National Institute of Standards and Technology, https://doi.org/10.18434/mds2-2475 (Accessed 2025-07-09)
Repository Metadata
Machine-readable descriptions of this dataset are available in the following formats:
NERDm