Data Publication

Predicting ABM Results with Covering Arrays and Random Forests

Megan Olsen, M S Raunak, D. Richard Kuhn
Contact: M S Raunak.
Identifier: doi:10.18434/mds2-3002
Version: 1.0... First Released: 2023-10-05 Revised: 2023-10-05
Our goal is to explore the feasibility and usefulness of combining covering arrays with machine learning models to predict the results of an agent-based simulation model across its vast space of parameter value combinations. The challenge is to select parameter values that are representative of the model's overall behavior, so that the machine learning model can be trained to correctly predict behavior in previously untested regions of the parameter space. We chose Wilensky's Heat Bugs model in NetLogo for our study. It is a simple model that is amenable to quick data generation, has a limited number of outputs to predict, and exhibits emergent behavior, which makes it well suited to exploring this new approach.

We use covering arrays to reduce the parameter value space systematically, run the model for each parameter set in the 2-way and 3-way covering arrays, train a random forest model on the 2-way data (33,351 parameter combinations), and test its ability to predict the outcome of the simulation on the significantly larger 3-way data (3,971,955 parameter combinations), which was not seen during training.
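
As a rough illustration of how a 2-way covering array shrinks a parameter space, the following Python sketch greedily builds one for a toy parameter set. The parameter names are loosely modeled on Heat Bugs sliders, but the value levels, the function names, and the greedy construction itself are assumptions made for illustration; this record does not state which tool or algorithm generated the covering arrays in the dataset.

```python
from itertools import combinations, product

# Hypothetical parameters and levels (illustrative only, not the dataset's values).
PARAMS = {
    "bug-count": [50, 100, 200],
    "evaporation-rate": [0.01, 0.1, 0.5],
    "diffusion-rate": [0.1, 0.5, 0.9],
    "random-move-chance": [0, 25, 50],
}


def all_pairs(params):
    """Every (column i, value a, column j, value b) interaction a 2-way array must cover."""
    names = list(params)
    pairs = set()
    for i, j in combinations(range(len(names)), 2):
        for a, b in product(params[names[i]], params[names[j]]):
            pairs.add((i, a, j, b))
    return pairs


def row_pairs(row):
    """The 2-way interactions covered by a single parameter combination."""
    return {(i, row[i], j, row[j]) for i, j in combinations(range(len(row)), 2)}


def greedy_pairwise(params):
    """Greedy sketch: repeatedly pick the full-factorial row covering the most uncovered pairs."""
    names = list(params)
    remaining = all_pairs(params)
    candidates = list(product(*(params[n] for n in names)))
    rows = []
    while remaining:
        best = max(candidates, key=lambda r: len(row_pairs(r) & remaining))
        rows.append(best)
        remaining -= row_pairs(best)
    return rows


rows = greedy_pairwise(PARAMS)
full_factorial = 1
for levels in PARAMS.values():
    full_factorial *= len(levels)

print(f"full factorial: {full_factorial} runs, 2-way covering array: {len(rows)} runs")
```

Each row of the result is one simulation run to execute. Every pair of parameter values appears together in at least one row, yet the number of runs is far smaller than the full factorial. Enumerating all candidates as done here is only practical for toy spaces; a dedicated covering array generator would be needed at the scale described above.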
Research Areas
NIST R&D: Information Technology: Data and informatics
Keywords: agent-based modeling · machine learning · calibration
These data are public.
Cite this dataset
Megan Olsen, M S Raunak, D. Richard Kuhn (2023), Predicting ABM Results with Covering Arrays and Random Forests, National Institute of Standards and Technology, https://doi.org/10.18434/mds2-3002 (Accessed 2025-04-24)
Repository Metadata
Machine-readable descriptions of this dataset are available in the following formats:
NERDm