Scripts for the multifidelity machine learning (MFML) models used to assess data hierarchy scaling for the prediction of molecular excitation energies. The repository also includes scripts to generate the newly introduced $\Gamma$-curve. The scripts given here can be used to generate the figures from the manuscript of the same title, available at https://arxiv.org/abs/2410.11392. The `requirements.txt` file lists all packages required to run the scripts. The dataset used in this work is freely available at this Zenodo repository.
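To make the idea of a data hierarchy concrete, the sketch below computes training-set sizes per fidelity for a fixed scaling factor $\gamma$. It assumes, as an illustration rather than the manuscript's exact definition, that each fidelity below the target receives $\gamma$ times as many training samples as the fidelity above it, i.e. $N_{train}^{f} = \lceil \gamma^{F-f} N_{train}^{F} \rceil$, and that there are five fidelities with TZVP as the target. The function name `fixed_gamma_sizes` is hypothetical and is not part of this repository.

```python
import math


def fixed_gamma_sizes(n_target: int, n_fidelities: int, gamma: float) -> list[int]:
    """Hypothetical helper: training-set sizes from the cheapest fidelity
    (index 0) up to the target fidelity (index n_fidelities - 1).

    Assumes N_f = ceil(gamma**(F - f) * N_F), i.e. every step down the
    hierarchy multiplies the number of training samples by gamma.
    """
    top = n_fidelities - 1
    return [math.ceil(gamma ** (top - f) * n_target) for f in range(n_fidelities)]


# Five fidelities (assumed), 32 samples at the target fidelity, gamma = 2
print(fixed_gamma_sizes(32, 5, 2.0))  # -> [512, 256, 128, 64, 32]
```

Rounding up with `math.ceil` keeps non-integer factors (for example $\gamma = 1.5$) from producing fractional sample counts; the rounding rule used in the actual scripts may differ.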
- The Python file `Model_MFML.py` is the module developed in this previous work; it contains both the MFML and o-MFML implementations used in this work.
- `PrepFromQeMFi.py` separates the data from the QeMFi dataset into training, test, and validation sets.
- The Jupyter notebook `Plots.ipynb` provides the functions to reproduce the plots from the manuscript.
- `LearningCurve.py` generates the data needed to assess the different fixed scaling factors ($\gamma$).
- The script `RatioTimeBasedScalingFactor.py` produces the learning curves for the scaling factors defined as $\theta_{f-1}^f$.
- `TargetFidelityTimeRatioScalingFactor.py` generates the data for the scaling factors defined as $\theta_f^F$ in the manuscript.
- The script `ErrorContours_gamma2.py` generates the data needed to plot the error contours of MFML (Fig. 6 of the manuscript).
- `GammaCurve.py` creates all the data points needed to assess the different $\Gamma(N_{train}^{TZVP})$ from the manuscript. The value of `ntop` can be changed based on $N_{train}^{TZVP}$.
- The scripts `saveindexfortimevsmae.py` and `saveindex_extendedgamma.py` retrieve the indices of the training samples used, so that these can be used to generate the time-cost plots.
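The data preparation and index-saving steps can be pictured with the minimal sketch below. It is not the logic of `PrepFromQeMFi.py` or the `saveindex*` scripts: the split sizes, the seed, the `.npz` layout, and the helper names `split_indices`, `nested_training_indices`, and `save_indices` are all hypothetical. It also assumes, as is common for MFML, that the training set of each higher fidelity is contained in the training set of the fidelity below it.

```python
import os
import tempfile

import numpy as np


def split_indices(n_samples, n_test, n_val, seed=0):
    """Shuffle 0..n_samples-1 once and carve out test, validation and training pools."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    test = perm[:n_test]
    val = perm[n_test:n_test + n_val]
    train = perm[n_test + n_val:]
    return train, val, test


def nested_training_indices(train_pool, sizes):
    """One index array per fidelity (cheapest first); each higher-fidelity set is a
    prefix of, and therefore contained in, the set of the fidelity below it."""
    if any(a < b for a, b in zip(sizes, sizes[1:])):
        raise ValueError("sizes must be non-increasing from cheapest to target fidelity")
    if sizes[0] > len(train_pool):
        raise ValueError("not enough samples in the training pool")
    return [train_pool[:n] for n in sizes]


def save_indices(path, per_fidelity):
    """Store the per-fidelity indices so later analyses (e.g. time-cost plots) can reuse them."""
    np.savez(path, **{f"fidelity_{i}": idx for i, idx in enumerate(per_fidelity)})


# Illustrative numbers only: 1000 molecules, 100 test, 100 validation
train, val, test = split_indices(1000, n_test=100, n_val=100, seed=42)
per_fidelity = nested_training_indices(train, [512, 256, 128, 64, 32])
out = os.path.join(tempfile.mkdtemp(), "train_indices.npz")  # hypothetical file name
save_indices(out, per_fidelity)
```

Drawing all subsets from a single permutation keeps the test and validation molecules out of every training set and makes the nesting automatic, since each smaller set is a prefix of the larger one.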