Data Publication

Theory aware Machine Learning (TaML)

Debra J. Audus, Austin McDannald, Brian DeCost
Contact: Debra Audus
Identifier: doi:10.18434/mds2-2637
Version: 1.1, First Released: 2022-06-22, Revised: 2023-01-05

Abstract

A code repository and accompanying data for incorporating imperfect theory into machine learning to improve prediction and explainability. The work focuses on a case study of the dimensions of a polymer chain in solvents of different quality. Included are Jupyter notebooks for quickly testing concepts and reproducing figures, as well as source code that computes the mean squared error as a function of dataset size for various machine learning models.
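The abstract mentions code that computes the mean squared error (MSE) as a function of dataset size for several models. The sketch below illustrates that kind of learning-curve comparison under stated assumptions: it uses a made-up 1-D toy function instead of the published data, a hand-written k-nearest-neighbour regressor stands in for the repository's models, and "theory-aware" is implemented as residual learning (fit the model to data minus theory, then add the theory back). Residual learning is one common way to use an imperfect theory; it is not necessarily the approach used in the repository. All names (`true_size`, `theory`, `knn_predict`, `learning_curve`, `EXPONENT`) are hypothetical.

```python
import numpy as np

# Hypothetical toy problem (NOT the dataset's values): x = log chain length,
# y = log chain size. The "theory" is a single power law, so it misses the
# smooth crossover term contained in the "true" function.
EXPONENT = 0.6  # illustrative scaling exponent, chosen arbitrarily


def true_size(x):
    return EXPONENT * x + 0.15 * np.tanh(x - 2.0)


def theory(x):
    return EXPONENT * x


def knn_predict(x_train, y_train, x_query, k=3):
    """Average the targets of the k nearest training points (1-D inputs)."""
    dist = np.abs(x_query[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def learning_curve(sizes, n_repeats=20, noise=0.02, k=3, seed=0):
    """Mean test MSE vs training-set size for a plain and a theory-aware model."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.5, 4.0, 200)
    y_test = true_size(x_test)
    curves = {"plain": [], "residual": []}
    for n in sizes:
        mse_plain, mse_resid = [], []
        for _ in range(n_repeats):
            # Fresh random training set of size n with Gaussian noise.
            x = rng.uniform(0.5, 4.0, n)
            y = true_size(x) + rng.normal(0.0, noise, n)
            # Plain model: learn y directly from x.
            pred = knn_predict(x, y, x_test, k)
            mse_plain.append(np.mean((pred - y_test) ** 2))
            # Theory-aware model: learn only the residual y - theory(x),
            # then add the theory back at prediction time.
            pred = theory(x_test) + knn_predict(x, y - theory(x), x_test, k)
            mse_resid.append(np.mean((pred - y_test) ** 2))
        curves["plain"].append(np.mean(mse_plain))
        curves["residual"].append(np.mean(mse_resid))
    return {name: np.array(v) for name, v in curves.items()}


if __name__ == "__main__":
    sizes = [5, 10, 20, 50, 100]
    curves = learning_curve(sizes)
    for n, p, r in zip(sizes, curves["plain"], curves["residual"]):
        print(f"n={n:4d}  plain MSE={p:.2e}  residual MSE={r:.2e}")
```

In this toy setting the residual model should have the advantage mainly when data are scarce, because it only has to learn the small, smooth part of the target that the theory gets wrong. As the training set grows, the plain model can learn the full trend on its own and the gap between the two curves should narrow.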

For additional details on the data, see the README.md provided with the data. For additional details on the code, see the README.md in the code repository (GitHub Repo for Theory aware Machine Learning). For additional details on the methodology, see Debra J. Audus, Austin McDannald, and Brian DeCost, "Leveraging Theory for Enhanced Machine Learning," *ACS Macro Letters* **2022**, *11* (9), 1117-1122, DOI: [10.1021/acsmacrolett.2c00369](https://doi.org/10.1021/acsmacrolett.2c00369).
Research Topics: Materials: Polymers, Information Technology: Data and informatics, Materials: Modeling and computational material science, Mathematics and Statistics: Uncertainty quantification    
Subject Keywords: polymers, machine learning, transfer learning, theory    

Data Access

These data are public.
Data and related material can be found at the following locations:
Files


About This Dataset

Cite this dataset
Audus, Debra, McDannald, Austin, DeCost, Brian (2022), Theory aware Machine Learning (TaML), National Institute of Standards and Technology, https://doi.org/10.18434/mds2-2637 (Accessed 2023-03-26)
Repository Metadata
Machine-readable descriptions of this dataset are available in the following formats:
NERDm
Access Metrics
Metrics data are not available for all datasets, including this one, possibly because the data are served from servers outside this repository.