## Table of Contents

- Overview
- Project Structure
- Features
- Installation
- Usage
- Model Training & Evaluation
- Deployment
- Customization
- Contributing
- License
- Acknowledgements
- Contact
## Overview

Credit Risk Classification using AWS SageMaker is an end-to-end project that demonstrates how to build, train, and deploy a machine learning model to classify credit risk (e.g., predicting whether a loan applicant is likely to default) using AWS SageMaker’s managed ML services. The project covers data preprocessing, model development, evaluation, and deployment in a scalable and reproducible way.
## Project Structure

```
.
├── data/               # Raw and processed datasets
├── notebooks/          # Jupyter notebooks for EDA, training, and inference
├── src/                # Source code for data loading, model, utils, etc.
│   ├── preprocessing.py
│   ├── train.py
│   └── inference.py
├── requirements.txt    # Python dependencies
├── README.md
├── .gitignore
└── config/             # Configuration files and hyperparameters
```
## Features

- Data Preprocessing: Cleaning, feature engineering, and transformation scripts.
- Model Training: Training pipelines using the AWS SageMaker SDK.
- Model Evaluation: Metrics and visualization for evaluating model performance.
- Deployment: Scripts and steps for deploying models as SageMaker endpoints.
- Automation: Sample workflow for automating training and deployment.
- Scalability: Easily adaptable to larger datasets and more complex models.
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/naman-sriv/Credit_Risk_Classification_AWS_Sagemaker.git
   cd Credit_Risk_Classification_AWS_Sagemaker
   ```

2. Create and activate a virtual environment (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. (Optional) Set up AWS credentials:
   - Configure your AWS CLI with `aws configure`, or set environment variables as described in the AWS docs. You can verify access with the snippet below.
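Before launching any SageMaker jobs, it can help to confirm that your credentials are being picked up. This snippet is illustrative, not part of the repository; it only assumes `boto3` is installed:

```python
import boto3

# Ask AWS STS which identity the current credentials resolve to;
# this raises an error if no valid credentials are configured.
identity = boto3.client("sts").get_caller_identity()
print(f"Authenticated as: {identity['Arn']}")
```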
## Usage

- Check the `notebooks/` directory for EDA, training, and deployment walkthroughs.
- Example: `notebooks/eda.ipynb`, `notebooks/train_model.ipynb`, `notebooks/deploy_model.ipynb`
- Place your raw dataset in the `data/` folder, or update the paths in the config files.
- Run the data preprocessing script (sketched below):

  ```bash
  python src/preprocessing.py --config config/preprocessing.yaml
  ```
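The repository's `src/preprocessing.py` is the source of truth for this step. As a rough sketch of what a config-driven preprocessing script like it might do (the config keys and cleaning steps here are assumptions, not the repo's actual interface):

```python
import argparse

import pandas as pd
import yaml


def main() -> None:
    parser = argparse.ArgumentParser(description="Preprocess raw credit data.")
    parser.add_argument("--config", required=True, help="Path to a YAML config file.")
    args = parser.parse_args()

    with open(args.config) as f:
        cfg = yaml.safe_load(f)  # hypothetical keys: input_path, output_path

    df = pd.read_csv(cfg["input_path"])

    # Minimal cleaning: drop duplicates and impute missing numeric values.
    df = df.drop_duplicates()
    numeric_cols = df.select_dtypes("number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

    df.to_csv(cfg["output_path"], index=False)


if __name__ == "__main__":
    main()
```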
- Local training:

  ```bash
  python src/train.py --config config/train_config.yaml
  ```

- Or use SageMaker: follow the instructions in `notebooks/train_model.ipynb` to launch a SageMaker training job (a minimal SDK sketch follows).
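For orientation, launching a script-mode training job with the SageMaker Python SDK generally looks like the sketch below. The estimator class (this assumes a scikit-learn model), the role ARN, instance type, and S3 paths are all placeholders; the notebook remains the authoritative walkthrough:

```python
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder ARN

# Run src/train.py as a managed SageMaker training job.
estimator = SKLearn(
    entry_point="train.py",
    source_dir="src",
    role=role,
    instance_type="ml.m5.large",
    instance_count=1,
    framework_version="1.2-1",
    sagemaker_session=session,
)

# The S3 training-data location is a placeholder.
estimator.fit({"train": "s3://your-bucket/credit-risk/train/"})
```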
- Use evaluation scripts or notebooks to review metrics and visualizations.
- Deploy using SageMaker endpoint scripts or via notebook. Example (an SDK-level sketch follows):

  ```bash
  python src/deploy.py --model-path <model_artifact>
  ```
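As an illustration of what a deploy step like this typically does with the SageMaker SDK (the model class, which again assumes scikit-learn, plus the artifact path, role, and endpoint name are all placeholders):

```python
from sagemaker.sklearn.model import SKLearnModel

# Wrap a trained model artifact from S3; inference.py handles request parsing.
model = SKLearnModel(
    model_data="s3://your-bucket/credit-risk/model.tar.gz",  # placeholder path
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    entry_point="inference.py",
    source_dir="src",
    framework_version="1.2-1",
)

# Create a real-time HTTPS endpoint backed by a single instance.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="credit-risk-endpoint",  # placeholder name
)
```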
## Model Training & Evaluation

- Algorithms Used: (e.g., Logistic Regression, XGBoost, Random Forest)
- Evaluation Metrics: Accuracy, Precision, Recall, F1-score, ROC-AUC, etc. (see the sketch below)
- Validation: k-fold cross-validation, hold-out set, etc.
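The listed metrics map directly onto scikit-learn's `sklearn.metrics` module. A minimal sketch, assuming a fitted binary classifier with `predict`/`predict_proba` and labels where 1 means default:

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(model, X_test, y_true):
    """Compute the metrics listed above for a binary credit-risk classifier."""
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # P(default), used for ROC-AUC
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_prob),
    }
```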
## Deployment

- SageMaker Endpoint: Deploy the trained model as a REST API endpoint.
- Sample Request:

  ```python
  import boto3

  # Invoke the deployed endpoint with a CSV payload.
  runtime = boto3.client('sagemaker-runtime')
  response = runtime.invoke_endpoint(
      EndpointName='your-endpoint-name',
      ContentType='text/csv',
      Body='<CSV_DATA>'
  )
  print(response['Body'].read())
  ```
## Customization

- Change hyperparameters in the `config/` directory.
- Add new features or models in `src/` (see the sketch below).
- Update data sources as needed.
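One common way to keep new models easy to plug in is a small registry in the training code. This is a hypothetical pattern, not the repo's actual structure:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical registry: add an entry here to make a new algorithm
# selectable from the config (e.g. via a `model_name` key).
MODELS = {
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "random_forest": lambda: RandomForestClassifier(n_estimators=200),
}


def build_model(name: str):
    """Instantiate the model named in the config."""
    return MODELS[name]()
```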
## Contributing

Contributions are welcome! Please open issues or submit pull requests for improvements.

- Fork the repository.
- Create your feature branch: `git checkout -b feature/YourFeature`
- Commit your changes: `git commit -am 'Add some feature'`
- Push to the branch: `git push origin feature/YourFeature`
- Open a pull request.
## License

This project is licensed under the MIT License. See LICENSE for more information.
## Contact

For questions or feedback, please contact [naman-sriv](https://github.com/naman-sriv).