Data Publication

Predicting ABM Results with Covering Arrays and Random Forests

Megan Olsen, M S Raunak, D. Richard Kuhn

Contact: M S Raunak.

Identifier: doi:10.18434/mds2-3002

Version: 1.0... First Released: 2023-10-05 Revised: 2023-10-05

Our goal is to explore the feasibility and usefulness of combining covering arrays with machine learning models to predict the results of an agent-based simulation model across its vast space of parameter value combinations. The challenge is to select parameter values that are representative of the overall behavior of the model, so that the trained machine learning model can correctly predict behavior in previously untested areas of the parameter space. We have chosen Wilensky's Heat Bugs model in NetLogo for our study. It is a simple model with emergent behavior that is amenable to quick data generation and has a limited number of outputs to predict, which makes it well suited to exploring this new approach.

We use covering arrays to reduce the parameter value space systematically and run the model for each parameter set in the 2-way and 3-way covering arrays. We then train a random forest model on the 2-way data (33,351 parameter combinations) and test its ability to predict the outcome of the simulation on the significantly larger 3-way data (3,971,955 parameter combinations), which the model did not see during training.
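To illustrate what a t-way covering array guarantees, the following is a minimal sketch in Python. It is not the tool or procedure used to produce this dataset: the greedy construction, the function names, and the small parameter grid (four hypothetical parameters with three levels each) are all assumptions made for illustration. A t-way covering array contains every combination of values for every set of t parameters at least once, which is why the 2-way array is much smaller than the 3-way array.

```python
from itertools import combinations, product


def uncovered_tuples(rows, levels, t):
    """Return every t-way value combination that no row in `rows` covers."""
    missing = set()
    for cols in combinations(range(len(levels)), t):
        seen = {tuple(row[c] for c in cols) for row in rows}
        for values in product(*(levels[c] for c in cols)):
            if values not in seen:
                missing.add((cols, values))
    return missing


def greedy_covering_array(levels, t):
    """Build a t-way covering array greedily (illustrative, not optimal).

    At each step, pick the full-factorial row that covers the most
    still-uncovered t-way combinations.
    """
    col_sets = list(combinations(range(len(levels)), t))
    candidates = list(product(*levels))
    remaining = uncovered_tuples([], levels, t)
    rows = []
    while remaining:
        def gain(row):
            return sum(
                (cols, tuple(row[c] for c in cols)) in remaining
                for cols in col_sets
            )
        best = max(candidates, key=gain)
        rows.append(best)
        for cols in col_sets:
            remaining.discard((cols, tuple(best[c] for c in cols)))
    return rows


# Hypothetical grid: 4 parameters with 3 levels each (81 full-factorial rows).
levels = [(0, 1, 2)] * 4
two_way = greedy_covering_array(levels, t=2)
three_way = greedy_covering_array(levels, t=3)

print("2-way rows:", len(two_way))
print("3-way rows:", len(three_way))
print("full factorial rows:", len(list(product(*levels))))
```

In the workflow described above, the rows of the 2-way array would play the role of the training parameter sets and the rows of the 3-way array the role of the held-out test parameter sets.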

Research Areas

NIST R&D: Information Technology: Data and informatics

Keywords: agent-based modeling · machine learning · calibration

These data are public.

Cite this dataset

Megan Olsen, M S Raunak, D. Richard Kuhn (2023), Predicting ABM Results with Covering Arrays and Random Forests, National Institute of Standards and Technology, https://doi.org/10.18434/mds2-3002 (Accessed 2025-07-02)

Repository Metadata

Machine-readable descriptions of this dataset are available in the following formats:

NERDm
