Public Data Resource

Code used to produce terms list in the work "NLP-Driven Electron Microscopy Ontology Development"

Contact: June W. Lau.
Identifier: doi:10.18434/mds2-3198
Version: 1.0 First Released: 2024-09-05 Revised: 2024-09-05
This is a collection of code written by Maurice Curran that was used to process the Microscopy and Microanalysis conference proceedings corpus into the word products described in the publication "NLP-Driven Electron Microscopy Ontology Development". The scripts are written in Python and are to be used in the following order:
1. SettingUpTextFiles.py and CopyingText.py to get the raw text files;
2. SentenceConversion.py;
3. reference_remover.py;
4. testing.py and testingavg.py;
5. SentenceCreator.py;
6. matscholar_model.py to get the matscholar tags;
7. training_model_gensim.py to get the gensim model;
8. word2vecscript.py and gensim_visual.py.
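The script order above could be captured in a small driver such as the sketch below. It is only an illustration and not part of the published code: it assumes that all scripts sit in one directory and can be invoked without command-line arguments, which the description does not state. The names PIPELINE_STEPS, build_commands, and run_pipeline are hypothetical. By default it only prints the commands (a dry run) instead of executing them.

```python
import subprocess
import sys
from pathlib import Path

# Script names and their order are taken from the dataset description.
# Where a step names two scripts, they are listed in the order given there.
PIPELINE_STEPS = [
    ["SettingUpTextFiles.py", "CopyingText.py"],   # step 1: raw text files
    ["SentenceConversion.py"],                     # step 2
    ["reference_remover.py"],                      # step 3
    ["testing.py", "testingavg.py"],               # step 4
    ["SentenceCreator.py"],                        # step 5
    ["matscholar_model.py"],                       # step 6: matscholar tags
    ["training_model_gensim.py"],                  # step 7: gensim model
    ["word2vecscript.py", "gensim_visual.py"],     # step 8
]


def build_commands(script_dir="."):
    """Return one interpreter command per script, in pipeline order.

    Assumption: every script lives in script_dir and needs no arguments.
    """
    base = Path(script_dir)
    return [
        [sys.executable, str(base / name)]
        for step in PIPELINE_STEPS
        for name in step
    ]


def run_pipeline(script_dir=".", dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in build_commands(script_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing script.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Calling run_pipeline("path/to/scripts", dry_run=False) would execute the scripts one after another and stop at the first failure; whether each script expects input paths or other configuration has to be checked in the scripts themselves.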
Research Areas
NIST R&D: Information Technology: Data and informatics; Materials: Modeling and computational material science; Materials: Materials characterization
Keywords: Natural language processing, NLP, electron microscopy, controlled vocabulary, ontology
These data are public.
Cite this dataset
June W. Lau (2021), Code used to produce terms list in the work "NLP-Driven Electron Microscopy Ontology Development", National Institute of Standards and Technology, https://doi.org/10.18434/mds2-3198 (Accessed 2025-07-09)
Repository Metadata
Machine-readable descriptions of this dataset are available in the following formats:
NERDm