This project uses machine learning to predict house sale prices from the Iowa Home Prices Dataset. It was created as part of Kaggle's Intro to Machine Learning course.
Project Structure
train.csv: Training data used to build the model
test.csv: Test data used to generate predictions
notebook.ipynb: Jupyter notebook containing all steps (data preparation, modeling, evaluation)
submission.csv: Final predictions, ready for Kaggle submission
Approach
Load & Explore Data: Loaded the data and selected key features, such as:
LotArea, YearBuilt, 1stFlrSF, 2ndFlrSF, FullBath, BedroomAbvGr, TotRmsAbvGrd
Split Dataset: Used train_test_split to create training and validation sets.
Train Model: Started with a DecisionTreeRegressor, then improved accuracy with a RandomForestRegressor.
Evaluate Model: Used Mean Absolute Error (MAE) on the validation set to compare models.
Train on Full Data: Trained the best model (RandomForestRegressor) on all available training data.
Make Predictions: Predicted sale prices for test.csv and saved the results as submission.csv.
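The steps above can be sketched as a single script. This is a minimal sketch under stated assumptions, not the notebook's actual code. It assumes the Kaggle files use `SalePrice` as the target column and `Id` as the row identifier, and that the "tuned" tree was tuned by searching over `max_leaf_nodes` (the README doesn't say which hyperparameter was tuned). The candidate leaf counts, `random_state=1`, and the helper names `validation_maes` and `fit_and_predict` are hypothetical choices for illustration. MAE is computed with scikit-learn's `mean_absolute_error`, i.e. the average of |actual - predicted| over the validation rows.

```python
import os

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Features listed in this README. "SalePrice" (target) and "Id" are assumed
# column names from the Kaggle competition files.
FEATURES = ["LotArea", "YearBuilt", "1stFlrSF", "2ndFlrSF",
            "FullBath", "BedroomAbvGr", "TotRmsAbvGrd"]
TARGET = "SalePrice"


def validation_maes(X, y, leaf_options=(5, 50, 500, 5000), seed=1):
    """Hypothetical helper: return {model name: validation MAE}."""
    train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=seed)
    results = {}

    # Baseline: a default decision tree, grown until its leaves are pure.
    tree = DecisionTreeRegressor(random_state=seed).fit(train_X, train_y)
    results["tree_default"] = mean_absolute_error(val_y, tree.predict(val_X))

    # Assumed tuning: keep the max_leaf_nodes value with the lowest validation
    # MAE. Selecting on the same validation set makes this score a bit optimistic.
    results["tree_tuned"] = min(
        mean_absolute_error(
            val_y,
            DecisionTreeRegressor(max_leaf_nodes=n, random_state=seed)
            .fit(train_X, train_y)
            .predict(val_X),
        )
        for n in leaf_options
    )

    forest = RandomForestRegressor(random_state=seed).fit(train_X, train_y)
    results["random_forest"] = mean_absolute_error(val_y, forest.predict(val_X))
    return results


def fit_and_predict(train_df, test_df, seed=1):
    """Hypothetical helper: refit the forest on all training rows, predict test rows."""
    model = RandomForestRegressor(random_state=seed)
    model.fit(train_df[FEATURES], train_df[TARGET])
    return pd.DataFrame({"Id": test_df["Id"],
                         TARGET: model.predict(test_df[FEATURES])})


if __name__ == "__main__":
    # Only runs the full pipeline when the competition files are present.
    if os.path.exists("train.csv") and os.path.exists("test.csv"):
        train = pd.read_csv("train.csv")
        test = pd.read_csv("test.csv")
        print(validation_maes(train[FEATURES], train[TARGET]))
        fit_and_predict(train, test).to_csv("submission.csv", index=False)
```

Run from the folder that holds train.csv and test.csv, it prints the validation MAEs and writes submission.csv. Exact MAE values depend on the split and random seeds, so they may differ from the figures in the Results section.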
Results

| Model | Validation MAE (USD) |
|---|---|
| Decision Tree (default) | ~28,000 |
| Decision Tree (tuned) | ~24,000 |
| Random Forest | ~18,000 |

How to Reproduce
Clone this repo, or open the notebook on Kaggle.
Run all cells in order to generate submission.csv.
Go to the Data tab → submission.csv → Submit to send your results to the competition leaderboard.
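Before uploading, it can help to sanity-check the file locally. This is a minimal sketch, assuming the competition expects exactly two columns, `Id` and `SalePrice`, with one row for each `Id` in test.csv; `check_submission` is a hypothetical helper, not part of the notebook.

```python
import os

import pandas as pd


def check_submission(sub: pd.DataFrame, test: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of problems (empty means it looks OK)."""
    problems = []
    # Assumed format: exactly the columns Id and SalePrice, in that order.
    if list(sub.columns) != ["Id", "SalePrice"]:
        problems.append(f"unexpected columns: {list(sub.columns)}")
        return problems
    if len(sub) != len(test):
        problems.append(f"row count {len(sub)} != test rows {len(test)}")
    if set(sub["Id"]) != set(test["Id"]):
        problems.append("Id values do not match test.csv")
    if sub["SalePrice"].isna().any():
        problems.append("SalePrice contains missing values")
    return problems


if __name__ == "__main__":
    # Only checks when both files exist in the working directory.
    if os.path.exists("submission.csv") and os.path.exists("test.csv"):
        issues = check_submission(pd.read_csv("submission.csv"),
                                  pd.read_csv("test.csv"))
        print("looks OK" if not issues else "\n".join(issues))
```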
Competition Link
Kaggle Competition: Home Data for ML Course