PUBLIC
DATA REPOSITORY
1.15.4

The NIST Public Data Repository allows users to download a single data file from a dataset by clicking the download icon to the right of the file's name in the file listing; however, more often you will want to download many files from the dataset.

There are a few options to download files in bulk:

  • Downloading all files with the Data Cart (for fewer than 300 files)
  • Downloading a subset of files with the Data Cart
  • Download data using the rclone tool *
  • Download data using the Python script, pdrdownload.py *
  • Programmatic access to NIST data products

*recommended for datasets with more than 300 files

Note:
This page mentions third-party tools, some of which may be commercial products; this does not imply endorsement by NIST.

Download all files through the Data Cart

For smaller datasets, all the files can be easily downloaded through the browser using the PDR Data Cart. On the dataset's home page, click the "Download-All" icon above and to the left of the file listing. This opens a Data Cart (in a separate tab) containing all of the files from the dataset. A spinner icon appears on that page while it prepares the download. When it finishes, it shows a list of one or more Zip files containing the files from the dataset. Click the "Start Download" button to start downloading those Zip files.

Download selected portions of a dataset with the Data Cart

The PDR also provides a general Data Cart for downloading, in bulk, a selection of files from one or more datasets. To use this feature, add files or folders of files from a dataset to your cart by clicking the "Add-to-Cart" icons on the right side of the file listing. Alternatively, add all the files from a dataset to your cart by clicking the "Add-all-to-Cart" icon above and to the left of the file listing.

After adding the files of interest, you can open your view of the Cart by clicking either the "Cart" link in the top-right corner of any dataset's home page or the "Data Cart" link in the navigation bar on the right side of the dataset's page. You will see a listing of the files and folders you have added to the cart. You can browse the list, download individual files, or select the files you wish to download in bulk by clicking the selection boxes at the left. Click the "Download Selected" button to prepare the download. As with the "Download-All" feature, a pop-up will show you a list of one or more Zip files containing the selected files; click the "Start Download" button to start the actual download.

Downloading large datasets using the rclone tool

Install rclone

When PDR demand is high and the dataset you want to download contains a large number of files, the Data Cart can struggle to deliver the data. When a dataset has more than about 300 files, we recommend using rclone, a free, open-source tool for transferring many files to and from remote storage (such as a cloud drive) easily and reliably. The NIST PDR is fully compatible with rclone, making it a useful tool for downloading large datasets in bulk. It is available for all major platforms, including Linux, macOS, and Windows, and can be installed manually or through common OS package managers (e.g., apt, rpm, or Brew).

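After installing rclone, you can download all the files in a PDR dataset into a folder named dataset-id under your current directory by typing the following, replacing dataset-id with the dataset's identifier: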

rclone copy :http: ./dataset-id/ --http-url http://data.nist.gov/od/ds/dataset-id/ -P

If the download process gets interrupted for any reason, you can rerun the same command, and it will resume the download where it left off.
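
Because rerunning the same command resumes the download, the retry can also be scripted. The following is a minimal sketch, not part of the PDR tooling: the function names and the retry count are hypothetical, and it assumes rclone is installed and on your PATH (otherwise `subprocess.run` raises `FileNotFoundError`). It uses only the rclone arguments shown above.

```python
import subprocess
from typing import List


def build_rclone_command(dataset_id: str) -> List[str]:
    """Build the rclone command shown above for one dataset identifier."""
    return [
        "rclone", "copy", ":http:", f"./{dataset_id}/",
        "--http-url", f"http://data.nist.gov/od/ds/{dataset_id}/",
        "-P",
    ]


def download_dataset(dataset_id: str, max_attempts: int = 3) -> bool:
    """Run rclone, rerunning the same command if it exits with an error.

    max_attempts is an illustrative default, not a PDR recommendation.
    """
    command = build_rclone_command(dataset_id)
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(command)
        if result.returncode == 0:
            return True
        print(f"Attempt {attempt} failed with exit code {result.returncode}")
    return False
```

Calling `download_dataset("dataset-id")` with a real identifier in place of dataset-id returns True once rclone exits successfully and False if every attempt fails.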

Downloading large datasets using the Python script, pdrdownload.py

Download pdrdownload.py (this script requires Python 3.8 or higher)

Another way to download large datasets conveniently and reliably from the PDR is with our custom Python script, pdrdownload.py. For users who can run a Python script, it has several advantages:

  • It displays a preview of the number of files and the total number of bytes.
  • It downloads files into the same folder hierarchy as shown in the file listing on the dataset's home page.
  • It automatically checks that files were downloaded without error or corruption.
  • Restarting the script will resume file downloads after an interruption.
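
To illustrate the kind of integrity check mentioned in the list above, here is a minimal sketch of verifying a downloaded file against a published checksum. It is not taken from pdrdownload.py: the function names are hypothetical, and the use of SHA-256 is an assumption about the checksum algorithm.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_hex):
    """Compare a file's digest with the expected value (assumed SHA-256)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A file that fails this comparison would typically be downloaded again rather than used.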

To see a preview of what you will be downloading from your dataset, type:

python pdrdownload.py -I dataset-id

This will construct and save locally a list of the files in the dataset with the identifier, dataset-id, and it will display the total number of files and the total number of bytes available as part of this dataset.

To start the download, type:

python pdrdownload.py -I dataset-id -D

The script has other useful features, such as downloading subsets of a dataset and producing more verbose output. To see the full list of available options, type:

python pdrdownload.py --help

Programmatic access to NIST data products

The NIST Public Data Repository API allows users to create their own scripts for downloading files in bulk. In particular, you can download a JSON-encoded metadata description of a dataset, which provides the URLs of its downloadable files along with other information useful for tracking the data.
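
As a rough sketch of how a script might use such a metadata description, the example below pulls file URLs out of an already-downloaded JSON record. The field names `components`, `filepath`, and `downloadURL`, the sample record, and the function name are assumptions made for illustration; check them against the actual metadata returned for your dataset.

```python
import json

# Hypothetical, abbreviated metadata record used only for illustration.
SAMPLE_METADATA = """
{
  "title": "Example dataset",
  "components": [
    {"filepath": "readme.txt",
     "downloadURL": "https://example.org/ds/example/readme.txt"},
    {"filepath": "data"},
    {"filepath": "data/run1.csv",
     "downloadURL": "https://example.org/ds/example/data/run1.csv"}
  ]
}
"""


def list_downloadable_files(metadata):
    """Return (filepath, URL) pairs for entries that carry a download URL.

    Assumes files are listed under "components" with a "downloadURL" key;
    entries without one (such as folders) are skipped.
    """
    files = []
    for component in metadata.get("components", []):
        url = component.get("downloadURL")
        if url:
            files.append((component.get("filepath", ""), url))
    return files


record = json.loads(SAMPLE_METADATA)
downloadable = list_downloadable_files(record)
```

Each URL in the resulting list can then be fetched with the HTTP client of your choice.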