If you want to use our proposed method MGS, please use the following updated repository: https://github.com/artefactory/mgs-grf
Repository for the paper "Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants".
In particular, you will find code to reproduce the paper experiments.
If you want to reproduce our paper experiments:
- the notebooks here and here reproduce the experiments
- this code contains the implementation of the protocols used for the numerical experiments of our article.
In order to use our MGS strategy:
The data sets used for our article should be downloaded into the data/externals folder. The data sets are available at the following addresses:
- Pima
- Phoneme : https://github.com/jbrownlee/Datasets/blob/master/phoneme.csv
- Abalone : https://archive.ics.uci.edu/dataset/1/abalone
- Wine : https://archive.ics.uci.edu/dataset/186/wine+quality
- Haberman : https://archive.ics.uci.edu/dataset/43/haberman+s+survival
- Yeast : https://archive.ics.uci.edu/dataset/110/yeast
- Vehicle : https://archive.ics.uci.edu/dataset/149/statlog+vehicle+silhouettes
- Ionosphere : https://archive.ics.uci.edu/dataset/52/ionosphere
- Breast cancer Wisconsin : https://archive.ics.uci.edu/dataset/15/breast+cancer+wisconsin+original
- CreditCard : https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
- MagicTel : https://www.openml.org/d/44125
- California : https://www.openml.org/d/44090
- House_16H : https://openml.org/d/821
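As an illustration of the download step, here is a minimal sketch that fetches the Phoneme file into data/externals using only the Python standard library. The helper names `github_blob_to_raw` and `download` are hypothetical and not part of this repository. The sketch also assumes that the experiment code expects the original remote file name, which this README does not state. The UCI, Kaggle and OpenML data sets are served differently and are not covered here.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Folder named in this README; the expected file names inside it are an assumption.
DATA_DIR = Path("data/externals")


def github_blob_to_raw(url: str) -> str:
    """Turn a github.com '/blob/' page URL into the matching raw-file URL."""
    raw = url.replace("https://github.com/", "https://raw.githubusercontent.com/", 1)
    return raw.replace("/blob/", "/", 1)


def download(url: str, data_dir: Path = DATA_DIR) -> Path:
    """Download `url` into `data_dir`, keeping the remote file name."""
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / url.rsplit("/", 1)[-1]
    if not target.exists():  # do not download again if the file is already there
        urlretrieve(url, target)
    return target


# Example call (requires network access):
# page = "https://github.com/jbrownlee/Datasets/blob/master/phoneme.csv"
# download(github_blob_to_raw(page))
```

The GitHub address listed above points to an HTML page, so the sketch first converts it to the raw-file URL before downloading.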
This work was done through a partnership between Artefact Research Center and the Laboratoire de Probabilités Statistiques et Modélisation (LPSM) of Sorbonne University.
If you find the code useful, please consider citing us:
@article{sakho2024we,
title={Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants},
author={Sakho, Abdoulaye and Malherbe, Emmanuel and Scornet, Erwan},
journal={arXiv preprint arXiv:2402.03819},
year={2024}
}