
AQI Prediction using Machine Learning | Predict Air Quality Index from pollutant data with Python & ML.


prachi757/AQI-PREDICTION


🌍 AQI Prediction using Machine Learning

Predict the Air Quality Index (AQI) from real‑world pollutant data (CO, NO₂, SO₂, O₃, PM2.5, PM10) across multiple global cities using Python & ML.




✨ Features

  • 📊 Exploratory Data Analysis – Visual pollutant distributions & AQI trends.
  • 🧹 Data Cleaning – Null checks, duplicates, drop unused columns.
  • 🧮 Feature Engineering – One‑hot encode cities; scale numeric features.
  • 🤖 Machine Learning Models – Linear Regression baseline vs Random Forest ensemble.
  • 📈 Model Evaluation – R², RMSE, MAE comparison.
  • 🔮 Custom Prediction – Plug in new pollutant readings and estimate AQI.

🎯 Why This Matters

Poor air quality affects respiratory health, productivity, and urban planning. An ML model that estimates AQI from pollutant levels helps:

  • Citizens track exposure risk.
  • City agencies flag unhealthy air for public alerts.
  • Students learn regression modeling on environmental data.

📂 Dataset

Rows: 52,560 hourly records
Columns: Date, City, CO, NO2, SO2, O3, PM2.5, PM10, AQI
Cities Covered: Brasilia, Cairo, Dubai, London, New York, Sydney
Use: Educational/learning project dataset (bundled in the repo as Air_Quality_dataset.csv).
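The first workflow steps (load, inspect, drop Date) can be sketched as below. This is a minimal sketch, not the notebook's code: it assumes the CSV has a `Date` column plus the columns listed above, and that exact duplicate rows can be dropped. The function name `load_and_inspect` and the three demo rows are hypothetical. As a sanity check, 52,560 rows is consistent with one year of hourly readings for each of the six cities (6 × 365 × 24 = 52,560), which the printed per-city counts can confirm on the real file.

```python
import io

import pandas as pd


def load_and_inspect(source) -> pd.DataFrame:
    """Load the AQI CSV, print basic quality checks, and drop the Date column.

    `source` is a path such as "Air_Quality_dataset.csv" or any file-like
    object that pandas.read_csv accepts.
    """
    df = pd.read_csv(source)
    print("shape:", df.shape)
    print("dtypes:\n", df.dtypes, sep="")
    print("nulls per column:\n", df.isnull().sum(), sep="")
    print("rows per city:\n", df["City"].value_counts(), sep="")
    print("exact duplicate rows:", int(df.duplicated().sum()))
    # Assumption: exact duplicates (same Date, City and readings) are safe to drop
    df = df.drop_duplicates()
    # Date is not used as a model feature in this project
    return df.drop(columns=["Date"], errors="ignore").reset_index(drop=True)


# Three made-up rows standing in for the real file (the first two are duplicates)
demo_csv = io.StringIO(
    "Date,City,CO,NO2,SO2,O3,PM2.5,PM10,AQI\n"
    "2024-01-01 00:00,Cairo,0.7,45.0,12.0,32.0,58.0,105,150\n"
    "2024-01-01 00:00,Cairo,0.7,45.0,12.0,32.0,58.0,105,150\n"
    "2024-01-01 01:00,Cairo,0.6,40.0,11.0,30.0,50.0,98,140\n"
)
clean = load_and_inspect(demo_csv)
# On the real data: clean = load_and_inspect("Air_Quality_dataset.csv")
```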



🛠 Tech Stack

  • Python (pandas, numpy)
  • Visualization: matplotlib, seaborn
  • Modeling: scikit-learn (LinearRegression, RandomForestRegressor, MinMaxScaler, metrics)
  • Environment: Jupyter Notebook

🔍 Workflow

  1. Load CSV → pandas.read_csv()
  2. Inspect shape, dtypes, nulls
  3. Drop Date (not modeled)
  4. Encode City → one-hot columns
  5. Split train/test
  6. Scale features → MinMaxScaler (fit on the training split only)
  7. Train models:
    • Linear Regression (baseline)
    • Random Forest Regressor (ensemble)
  8. Evaluate → R², RMSE, MAE
  9. Predict on new samples
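Steps 4 to 8 can be sketched end to end with scikit-learn. This is a hedged sketch rather than the notebook itself: the 80/20 split, `random_state=42`, `n_estimators=100`, the helper name `train_and_evaluate`, and the synthetic `demo` data (with a made-up AQI relationship) are all assumptions for illustration. City dummies come from `pd.get_dummies`, which names them `City_<name>` by default.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

POLLUTANTS = ["CO", "NO2", "SO2", "O3", "PM2.5", "PM10"]


def train_and_evaluate(df: pd.DataFrame, seed: int = 42) -> dict:
    """Workflow steps 4-8 on a frame with City, pollutant and AQI columns."""
    # Step 4: one-hot encode City into City_<name> columns of 0/1
    X = pd.get_dummies(df.drop(columns=["AQI"]), columns=["City"], dtype=int)
    y = df["AQI"]
    # Step 5: hold out 20% of rows for testing (ratio is an assumption)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Step 6: fit the scaler on the training split only, then reuse it
    scaler = MinMaxScaler()
    X_train_s = scaler.fit_transform(X_train)
    X_test_s = scaler.transform(X_test)
    # Step 7: baseline vs ensemble
    models = {
        "Linear Regression": LinearRegression(),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train_s, y_train)
        pred = model.predict(X_test_s)
        # Step 8: R², RMSE (square root of MSE) and MAE on the held-out rows
        results[name] = {
            "R2": r2_score(y_test, pred),
            "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
            "MAE": mean_absolute_error(y_test, pred),
        }
    return results


# Synthetic demo data; the AQI formula here is made up for illustration only
rng = np.random.default_rng(0)
n = 600
demo = pd.DataFrame({"City": rng.choice(["Cairo", "London", "Sydney"], size=n)})
for col in POLLUTANTS:
    demo[col] = rng.uniform(0, 100, size=n)
demo["AQI"] = 0.8 * demo["PM2.5"] + 0.4 * demo["PM10"] + rng.normal(0, 5, size=n)

for name, scores in train_and_evaluate(demo).items():
    print(name, {k: round(v, 2) for k, v in scores.items()})
```

Fitting `MinMaxScaler` on the training split only and reusing it for the test split keeps test-set statistics out of training; the same fitted scaler is what must be reused at prediction time.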

📊 Results

| Model | R² | RMSE | MAE | Notes |
|---|---|---|---|---|
| Linear Regression | 0.83 | 10.21 | 7.38 | Baseline |
| Random Forest | 0.86 | 9.37 | 6.33 | ✅ Best model |

(Metrics from notebook run; will vary by random seed.)
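The three metrics in the table follow standard definitions. The sketch below computes them from scratch in plain Python (the function name `regression_metrics` and the toy values are hypothetical). It also shows why RMSE is never smaller than MAE: squaring weights large errors more heavily, which is why both models report an RMSE above their MAE.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Compute R², RMSE and MAE directly from their definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,                # share of AQI variance explained
        "RMSE": math.sqrt(ss_res / n),            # root mean squared error, AQI units
        "MAE": sum(abs(e) for e in errors) / n,   # mean absolute error, AQI units
    }


# Toy AQI values (made up): errors are -10, +10, 0, -10
metrics = regression_metrics([50, 100, 150, 200], [60, 90, 150, 210])
print({k: round(v, 3) for k, v in metrics.items()})
# → {'R2': 0.976, 'RMSE': 8.66, 'MAE': 7.5}
```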


🚀 Quickstart

Run the steps below to set up the project locally.

1. Clone the repository

git clone https://github.com/prachi757/aqi-prediction.git
cd aqi-prediction

If you forked this repo, use your fork's URL instead (it appears on your fork's GitHub page).

2. (Optional) Create & activate a virtual environment

macOS / Linux

python -m venv .venv
source .venv/bin/activate

Windows PowerShell

python -m venv .venv
.\.venv\Scripts\activate

3. Install dependencies

pip install --upgrade pip
pip install -r requirements.txt

4. Launch the notebook

jupyter notebook Major_Project.ipynb

Run cells top→bottom.


🧪 Try Your Own Prediction

After running the notebook and training the Random Forest model:

# Example new pollutant reading in raw (unscaled) units; it is scaled below
# Column order must match the training features:
# CO, NO2, SO2, O3, PM2_5, PM10, Brasilia, Cairo, Dubai, London, New_York, Sydney
# The one-hot city values (0, 1, 0, 0, 0, 0) select Cairo
new_sample = [[0.7, 45.0, 12.0, 32.0, 58.0, 105, 0, 1, 0, 0, 0, 0]]

# IMPORTANT: reuse the *same* scaler fitted on the training data (do not refit it)
new_sample_scaled = scaler.transform(new_sample)

# AQI_Regressor is the trained model from the notebook
pred = AQI_Regressor.predict(new_sample_scaled)
print(f"Predicted AQI: {pred[0]:.2f}")
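Typing the one-hot columns by position is easy to get wrong. A hedged alternative builds the row as a pandas DataFrame with named columns. It assumes the city dummies are named `City_<city>` (the `pd.get_dummies` default) and that you pass the exact training column list, e.g. `list(X_train.columns)` if the notebook keeps `X_train` as a DataFrame; `build_sample` and `feature_columns` are hypothetical names. When the scaler was fitted on a DataFrame, recent scikit-learn versions also check column names at `transform` time and raise an error on a mismatch, instead of silently producing a bad prediction.

```python
import pandas as pd


def build_sample(readings: dict, city: str, feature_columns: list) -> pd.DataFrame:
    """Build a one-row frame whose columns match the training features exactly.

    Hypothetical helper: assumes city dummies are named "City_<city>", the
    default produced by pd.get_dummies(df, columns=["City"]).
    """
    row = dict.fromkeys(feature_columns, 0)  # start with every feature at 0
    for name, value in readings.items():
        if name not in row:
            raise KeyError(f"unknown feature: {name}")
        row[name] = value
    city_col = f"City_{city}"
    if city_col not in row:
        raise KeyError(f"unknown city: {city}")
    row[city_col] = 1  # switch on exactly one city column
    return pd.DataFrame([row], columns=feature_columns)


# Assumed training column order; in the notebook this would come from the
# training features, e.g. list(X_train.columns)
feature_columns = [
    "CO", "NO2", "SO2", "O3", "PM2.5", "PM10",
    "City_Brasilia", "City_Cairo", "City_Dubai",
    "City_London", "City_New York", "City_Sydney",
]
sample = build_sample(
    {"CO": 0.7, "NO2": 45.0, "SO2": 12.0, "O3": 32.0, "PM2.5": 58.0, "PM10": 105},
    city="Cairo",
    feature_columns=feature_columns,
)
print(sample.T)
# With the notebook objects: AQI_Regressor.predict(scaler.transform(sample))
```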

📁 Project Structure

aqi-prediction/
│
├── Major_Project.ipynb        # Notebook: EDA + Modeling
├── Air_Quality_dataset.csv    # Dataset (hourly pollutant readings)
├── requirements.txt           # Python dependencies (installed with pip)
└── README.md                  # You are here!

🗺 Roadmap (Future Ideas)

  • Add Gradient Boosting / XGBoost
  • Include time‑series features from Date (hour, month, season)
  • Hyperparameter tuning (GridSearchCV)
  • Streamlit mini‑app for live AQI prediction
  • Feature importance + SHAP explainability
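For the time-series roadmap item, one possible starting point derives hour, month, and season from `Date`. This is a sketch under assumptions: `Date` parses with `pd.to_datetime`, seasons are meteorological (December to February is winter in the Northern Hemisphere), and Brasilia and Sydney, which lie in the Southern Hemisphere, get seasons shifted by six months. `add_time_features` and the two constants are hypothetical names.

```python
import pandas as pd

# Meteorological seasons keyed by Northern Hemisphere month
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}
# Cities from this dataset that lie south of the equator
SOUTHERN_HEMISPHERE = {"Brasilia", "Sydney"}


def add_time_features(df: pd.DataFrame, date_col: str = "Date") -> pd.DataFrame:
    """Return a copy of df with hour, month and season columns added."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    out["hour"] = dates.dt.hour
    out["month"] = dates.dt.month
    # Shift southern-hemisphere months by six so that January maps to summer
    southern = out["City"].isin(SOUTHERN_HEMISPHERE)
    nh_month = out["month"].where(~southern, (out["month"] + 5) % 12 + 1)
    out["season"] = nh_month.map(SEASON_BY_MONTH)
    return out


sample_rows = pd.DataFrame({
    "Date": ["2024-01-15 08:00", "2024-01-15 08:00", "2024-07-01 23:00"],
    "City": ["London", "Sydney", "Cairo"],
})
print(add_time_features(sample_rows)[["City", "hour", "month", "season"]])
```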

🙋 Contact

Prachi Garg
GitHub: prachi757
LinkedIn: Prachi Garg
Email: [email protected]


📜 License

Educational & portfolio use. Feel free to fork, learn from, and extend this project; please credit the original author.


⭐ Like this project?

If it helped you, star the repo and share! 🙌
