In our process, numeric variables (such as income) must be interpolated where values are missing and balanced where their distribution is skewed or uneven.
Interpolation techniques:
- Linear Interpolation (pandas.DataFrame.interpolate(method='linear'))
- Polynomial Interpolation (numpy.polyfit to fit the coefficients, numpy.polyval to evaluate them at the missing points)
- Spline Interpolation (scipy.interpolate.CubicSpline, or scipy.interpolate.interp1d with kind='cubic'; the older scipy.interpolate.spline has been removed from SciPy)
- KNN-based Imputation (sklearn.impute.KNNImputer)
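The four interpolation options above can be compared side by side. The following is a minimal sketch, not our production code: the toy `age`/`income` table, the choice of `deg=2`, and `n_neighbors=2` are all assumptions made for illustration, and it uses `scipy.interpolate.CubicSpline` for the spline step.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import CubicSpline
from sklearn.impute import KNNImputer

# Hypothetical toy data: income observed at some ages, with two gaps.
df = pd.DataFrame({
    "age": [20.0, 30.0, 40.0, 50.0, 60.0],
    "income": [20000.0, np.nan, 40000.0, np.nan, 60000.0],
})
known = df["income"].notna()

# 1. Linear interpolation between neighbouring observed values
#    (method="linear" treats rows as equally spaced).
linear = df["income"].interpolate(method="linear")

# 2. Polynomial: fit on the observed rows, evaluate at the missing ones.
coeffs = np.polyfit(df.loc[known, "age"], df.loc[known, "income"], deg=2)
poly = df["income"].copy()
poly.loc[~known] = np.polyval(coeffs, df.loc[~known, "age"])

# 3. Cubic spline through the observed points.
spline_fn = CubicSpline(df.loc[known, "age"], df.loc[known, "income"])
spline = df["income"].copy()
spline.loc[~known] = spline_fn(df.loc[~known, "age"])

# 4. KNN imputation: each gap is filled with the mean income of the
#    rows whose observed features (here, age) are closest.
knn = KNNImputer(n_neighbors=2).fit_transform(df[["age", "income"]])
knn_income = knn[:, 1]
```

On data this regular all four methods agree; they diverge once the observed values are noisy or unevenly spaced, which is when the choice between them matters.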
Balancing techniques:
- Standardization (sklearn.preprocessing.StandardScaler)
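- Quantile Normalization (sklearn.preprocessing.QuantileTransformer, which maps values onto a uniform or normal distribution; scipy.stats.mstats.rankdata only computes ranks, which is one step of the procedure)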
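- SMOTER (Synthetic Minority Over-sampling Technique for Regression) for balancing underrepresented numeric values (imbalanced-learn does not provide an imblearn.over_sampling.SMOTER class, since its SMOTE variants target classification labels; a separate implementation is needed, for example the third-party smogn package, which implements a related variant)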
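To show how the first two balancing options differ in practice, here is a short sketch. The log-normal `income` sample, the seed, and `n_quantiles=100` are assumptions chosen for illustration; SMOTER is left out because it requires a separate implementation.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer, StandardScaler

rng = np.random.default_rng(0)
# Hypothetical right-skewed income values, shaped as a single column.
income = rng.lognormal(mean=10.0, sigma=1.0, size=(1000, 1))

# Standardization: rescales to zero mean and unit variance.
# The shape of the distribution (including its skew) is unchanged.
standardized = StandardScaler().fit_transform(income)

# Quantile transform: maps the empirical quantiles onto a standard
# normal distribution, which removes most of the skew.
qt = QuantileTransformer(
    n_quantiles=100, output_distribution="normal", random_state=0
)
gaussianized = qt.fit_transform(income)
```

Standardization only changes location and scale, so a heavily skewed variable stays skewed; the quantile transform changes the shape itself, at the cost of distorting the distances between values.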