This repository demonstrates a comprehensive MLOps (Machine Learning Operations) pipeline that showcases industry-standard practices for end-to-end machine learning workflow automation. The project implements a production-ready ML system with automated training, validation, deployment, and monitoring capabilities.
This MLOps pipeline follows a modular, scalable architecture designed to handle real-world machine learning challenges:
- Data-Centric Approach: Implements robust data ingestion, validation, and feature engineering processes
- Model-Centric Operations: Automated model training, evaluation, and deployment with performance tracking
- Infrastructure-as-Code: Kubernetes manifests and Docker containerization for scalable deployments
- Observability-First: Comprehensive monitoring with metrics, logs, and distributed tracing
- CI/CD Integration: Automated testing, security scanning, and deployment pipelines
Core ML Pipeline Components:
- 6-Stage DVC Pipeline: Data ingestion → Validation → Feature engineering → Transformation → Training → Evaluation
- XGBoost Model: Achieved 92.15% accuracy with automated hyperparameter tuning
- MLflow Integration: Experiment tracking and model registry for version control
- Data Quality Monitoring: Evidently AI for drift detection and quality assessment
Production Infrastructure:
- Docker Containerization: Multi-stage builds with security best practices
- Kubernetes Deployment: Scalable orchestration with services, ingress, and monitoring
- Apache Airflow: Workflow orchestration and scheduling for automated pipeline execution
- GitHub Actions CI/CD: Automated testing, security scanning, and deployment
Observability Stack:
- Prometheus: Metrics collection and alerting for model performance
- Grafana: Custom dashboards for ML pipeline visualization
- ELK Stack: Centralized logging with Elasticsearch, Kibana, and Fluentd
- Jaeger: Distributed tracing for microservices monitoring
- Python 3.11+
- Docker & Docker Compose
- Git & DVC
- Kaggle Account (for data access)
- Clone the repository
git clone https://github.com/Abeshith/MLOps_PipeLine.git
cd MLOps_PipeLine
- Set up Python environment
# Using uv (recommended)
uv venv
uv pip install -r requirements.txt
# Or using pip
pip install -r requirements.txt
- Configure Kaggle credentials
# Create kaggle.json in ~/.kaggle/ directory
{
"username": "your_kaggle_username",
"key": "your_kaggle_key"
}
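The credentials file can also be written programmatically. A minimal sketch, assuming nothing beyond the standard library (the helper name `write_kaggle_credentials` is illustrative, not part of this repo):

```python
import json
from pathlib import Path

def write_kaggle_credentials(username: str, key: str,
                             kaggle_dir: Path = Path.home() / ".kaggle") -> Path:
    """Write kaggle.json where the Kaggle CLI expects to find it."""
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    cred_path = kaggle_dir / "kaggle.json"
    cred_path.write_text(json.dumps({"username": username, "key": key}))
    # The Kaggle CLI warns unless the file is readable by the owner only.
    cred_path.chmod(0o600)
    return cred_path
```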
Stage 1: Data Ingestion
python -m src.mlpipeline.pipeline.stage_01_data_ingestion
or
$env:PYTHONPATH = "src"; python src/mlpipeline/pipeline/stage_01_data_ingestion.py
Downloads Kaggle competition data, splits into train/test sets, validates data integrity
Stage 2: Data Validation
python -m src.mlpipeline.pipeline.stage_02_data_validation
or
$env:PYTHONPATH = "src"; python src/mlpipeline/pipeline/stage_02_data_validation.py
Schema validation against config/schema.yaml, data drift detection using Evidently AI
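Conceptually, the schema check compares the incoming columns against those declared in config/schema.yaml. A stdlib-only sketch of that idea (the repo's actual component also runs Evidently AI drift checks; the column names below are hypothetical):

```python
def validate_columns(actual_columns, schema_columns):
    """Return (is_valid, missing, unexpected) for a simple column-level schema check."""
    actual, expected = set(actual_columns), set(schema_columns)
    missing = sorted(expected - actual)      # columns the schema requires but data lacks
    unexpected = sorted(actual - expected)   # columns present in data but not in schema
    return (not missing and not unexpected), missing, unexpected
```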
Stage 3: Feature Engineering
python -m src.mlpipeline.pipeline.stage_03_feature_engineering
or
$env:PYTHONPATH = "src"; python src/mlpipeline/pipeline/stage_03_feature_engineering.py
Feature creation and transformation, correlation analysis, feature importance calculation
Stage 4: Data Transformation
python -m src.mlpipeline.pipeline.stage_04_data_transformation
or
$env:PYTHONPATH = "src"; python src/mlpipeline/pipeline/stage_04_data_transformation.py
Data preprocessing and scaling, encoding categorical variables, train/test preparation
Stage 5: Model Training
python -m src.mlpipeline.pipeline.stage_05_model_trainer
or
$env:PYTHONPATH = "src"; python src/mlpipeline/pipeline/stage_05_model_trainer.py
XGBoost model training with hyperparameter tuning, MLflow experiment tracking
Stage 6: Model Evaluation
python -m src.mlpipeline.pipeline.stage_06_model_evaluation
or
$env:PYTHONPATH = "src"; python src/mlpipeline/pipeline/stage_06_model_evaluation.py
Performance metrics calculation, model comparison, results logging to MLflow
Run Complete Pipeline Script
python main.py
or
$env:PYTHONPATH = "src"; python main.py
Executes all 6 stages sequentially with comprehensive logging and error handling
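The orchestration in main.py follows the usual stage-runner pattern: run each stage in order, log progress, and abort on the first failure. A minimal self-contained sketch of that pattern (not the repo's actual code):

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s [%(levelname)s] %(message)s")
logger = logging.getLogger("mlpipeline")

def run_pipeline(stages):
    """Run (name, callable) pairs in order, aborting on the first failure."""
    completed = []
    for name, stage in stages:
        logger.info(">>> stage started: %s", name)
        try:
            stage()
        except Exception:
            logger.exception(">>> stage failed: %s", name)
            raise  # stop the pipeline; downstream stages depend on this one
        logger.info(">>> stage completed: %s", name)
        completed.append(name)
    return completed
```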
Start Flask Application
python app.py
or
$env:PYTHONPATH = "src"; python app.py
Launches web interface at http://localhost:5000 for model predictions and monitoring
# Run complete pipeline
dvc repro
# Run specific stages
dvc repro data_ingestion
dvc repro data_validation
dvc repro feature_engineering
dvc repro data_transformation
dvc repro model_trainer
dvc repro model_evaluation
# View pipeline status
dvc dag
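Each `dvc repro` target maps to a stage declared in dvc.yaml, which records the stage's command, dependencies, and outputs so unchanged stages are skipped. A hypothetical excerpt showing the shape of one stage (paths here are assumptions; see the repo's actual dvc.yaml for the real dependency graph):

```yaml
stages:
  data_ingestion:
    cmd: python -m src.mlpipeline.pipeline.stage_01_data_ingestion
    deps:
      - src/mlpipeline/pipeline/stage_01_data_ingestion.py
      - config/config.yaml
    outs:
      - artifacts/data_ingestion
```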
Prerequisites
Note: Apache Airflow is not supported natively on Windows. Windows users should run it under WSL or in a Linux environment.
1. Copy Project to Linux Environment (Windows Users)
cp -r /mnt/d/MLOps_PipeLine ~/
2. Install Python and Setup Virtual Environment
# Install Python3 if not already installed
sudo apt update
sudo apt install python3 python3-pip python3-venv
# Navigate to project directory
cd ~/MLOps_PipeLine
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
3. Configure Apache Airflow
# Set Airflow home directory
export AIRFLOW_HOME=~/airflow
echo $AIRFLOW_HOME
# Configure Airflow settings
vim ~/airflow/airflow.cfg
Press `i` to enter insert mode, then replace the line
auth_manager = airflow.api.fastapi.auth.managers.simple.simple_auth_manager.SimpleAuthManager
with
auth_manager = airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager
When done, press Esc and type `:wq!` to save and exit.
# Create DAGs directory
mkdir -p ~/airflow/dags
# Copy DAG file to Airflow directory
cp model_dag.py ~/airflow/dags/
# Test DAG configuration
python ~/airflow/dags/model_dag.py
4. Launch Airflow Webserver
# Start Airflow standalone mode
airflow standalone
5. Execute Pipeline
- Open your web browser and navigate to: http://localhost:8080
- Search for ml_pipeline_dag in the DAGs list
- Click on the DAG and trigger the pipeline execution
- Monitor the workflow progress through the Airflow UI
# Initialize Minikube cluster
minikube start
# Deploy the application using deployment manifest
kubectl apply -f k8s/deployment.yaml
# Check pod status
kubectl get pods
# Apply service configuration
kubectl apply -f k8s/service.yaml
# Verify service deployment
kubectl get svc
# Forward service port to local machine
kubectl port-forward svc/mlapp-service 8000:80
# Access application in browser
# Navigate to: http://localhost:8000
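For reference, a hypothetical excerpt of k8s/service.yaml consistent with the commands above: the service is named `mlapp-service` and exposes port 80, which `kubectl port-forward svc/mlapp-service 8000:80` maps to port 8000 locally (the selector label and the Flask container port are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mlapp-service
spec:
  type: NodePort
  selector:
    app: mlapp          # assumed pod label from deployment.yaml
  ports:
    - port: 80          # service port used by port-forward
      targetPort: 5000  # assumed Flask container port
```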
# Edit service configuration
kubectl edit svc mlapp-service
# Change service type from NodePort to LoadBalancer
# Then press Esc and type :wq! to save and exit the editor
# Open new terminal and create tunnel
minikube tunnel
# In original terminal, check for external IP
kubectl get svc
# Access the application using the EXTERNAL-IP shown
# (with minikube tunnel this is typically 127.0.0.1)
# Deploy the ingress.yaml file
kubectl apply -f k8s/ingress.yaml
# Install the Ingress Controller (nginx)
minikube addons enable ingress
# Verify the nginx ingress controller pods are running
kubectl get pods -A | grep nginx
# Check Ingress is Deployed
kubectl get ingress # A Address is Being Updated like -> 192.168.49.2
# Set up local hostname resolution
sudo vim /etc/hosts
# Add
127.0.0.1 localhost
127.0.1.1 Abis-PC
192.168.49.2 foo.bar.com
# Press Esc and type :wq! to save and exit
# Check Updated or not
ping foo.bar.com
# then go to browser
http://foo.bar.com/demo
http://foo.bar.com/admin
Start Prometheus Monitoring
# Navigate to observability directory
cd observability
# Start Prometheus service
docker-compose up -d prometheus
# Access Prometheus UI
# URL: http://localhost:9090
Prometheus Operations
# Check application health
curl http://localhost:9090/api/v1/query?query=up
# View custom metrics
curl http://localhost:9090/api/v1/query?query=ml_model_accuracy
# Check targets status
curl http://localhost:9090/api/v1/targets
# View alert rules
curl http://localhost:9090/api/v1/rules
Configure ML Pipeline Metrics
# Prometheus collects metrics for:
# - Model accuracy and performance
# - Pipeline execution times
# - Resource utilization
# - API response times
# - Error rates and failures
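The curl queries above return JSON in Prometheus' instant-query format. A small stdlib-only helper for turning such a response into (labels, value) pairs; the `instant_query` wrapper and the metric names are illustrative:

```python
import json
from urllib.request import urlopen

def parse_instant_query(payload):
    """Extract (labels, value) pairs from a Prometheus instant-query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    # Each result carries a label set and a [timestamp, value-string] pair.
    return [(r["metric"], float(r["value"][1]))
            for r in payload["data"]["result"]]

def instant_query(expr, base_url="http://localhost:9090"):
    """Fetch and parse one PromQL expression (requires a running Prometheus)."""
    with urlopen(f"{base_url}/api/v1/query?query={expr}") as resp:
        return parse_instant_query(json.load(resp))
```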
Start Grafana Dashboard
# Start Grafana service
docker-compose up -d grafana
# Access Grafana UI
# URL: http://localhost:3000
# Default credentials: admin/admin
Import ML Pipeline Dashboards
# Pre-configured dashboards available:
# 1. ML Pipeline Performance Dashboard
# 2. Model Metrics Dashboard
# 3. Infrastructure Monitoring Dashboard
# Import custom dashboard JSON files from:
# observability/grafana/dashboards/
Grafana Operations
# Create data source (Prometheus)
# URL: http://prometheus:9090
# Import dashboard from file
# Upload: observability/grafana/dashboards/ml-pipeline-dashboard.json
# Set up alerting rules for model performance
# Configure notification channels (email, slack, webhook)
Start Elasticsearch & Kibana
# Start Elasticsearch service
docker-compose up -d elasticsearch
# Start Kibana service
docker-compose up -d kibana
# Access Kibana UI
# URL: http://localhost:5601
Elasticsearch Operations
# Check cluster health
curl http://localhost:9200/_cluster/health
# List indices
curl http://localhost:9200/_cat/indices?v
# Search ML pipeline logs
curl -X GET "localhost:9200/logs-*/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"level": "ERROR"
}
}
}'
# View pipeline execution logs
curl -X GET "localhost:9200/ml-pipeline-*/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"range": {
"@timestamp": {
"gte": "now-1h"
}
}
}
}'
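The request bodies above can also be built programmatically before being sent to Elasticsearch. A sketch reproducing the two query shapes shown (field and index names follow the curl examples):

```python
import json

def match_query(field, value):
    """Body for a simple match query, e.g. ERROR-level log lines."""
    return {"query": {"match": {field: value}}}

def recent_logs_query(window="now-1h"):
    """Body for a time-range query over the @timestamp field."""
    return {"query": {"range": {"@timestamp": {"gte": window}}}}
```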
Kibana Dashboard Setup
# Create index patterns for:
# - ml-pipeline-* (Pipeline execution logs)
# - application-* (Application logs)
# - error-* (Error logs)
# - performance-* (Performance metrics)
# Build visualizations for:
# - Pipeline success/failure rates
# - Model training duration trends
# - Error analysis and patterns
# - Resource usage over time
Fluentd Log Collection
# Start Fluentd service
docker-compose up -d fluentd
# Fluentd automatically collects logs from:
# - ML pipeline stages
# - Flask application
# - Kubernetes pods (if deployed)
# - Docker containers
# - System logs
# Configure log parsing in:
# observability/fluentd/fluent.conf
Start Jaeger Tracing
# Start Jaeger service
docker-compose up -d jaeger
# Access Jaeger UI
# URL: http://localhost:16686
Jaeger Operations
# View ML pipeline traces
# Search for service: ml-pipeline
# View operation: model_training
# Analyze request flows:
# - Data ingestion → validation → training → evaluation
# - API request → prediction → response
# - Error traces and bottleneck identification
# Configure trace sampling and retention
# Monitor microservice dependencies
Complete Monitoring Stack
# Start all monitoring services
docker-compose up -d
# Check services status
docker-compose ps
# View service logs
docker-compose logs prometheus
docker-compose logs grafana
docker-compose logs elasticsearch
Alert Configuration
# Prometheus alerts configured for:
# - Model accuracy degradation (< 90%)
# - High error rates (> 5%)
# - Pipeline failures
# - Resource exhaustion
# - API latency issues
# View active alerts
# URL: http://localhost:9090/alerts
# Configure alert notifications in Grafana
# Set up escalation policies and on-call rotations
Monitoring Cleanup
# Stop all services
docker-compose down
# Remove volumes (caution: deletes all data)
docker-compose down -v
# Remove specific service
docker-compose stop grafana
docker-compose rm grafana
Automated MLOps pipeline with GitHub Actions
- Setup: Python environment and dependencies
- Code Quality: Black formatting, isort, flake8 linting
- Data Validation: Schema validation and data quality checks
- Model Performance: Accuracy threshold validation (>90%)
- Security Scan: Secret detection and vulnerability scanning
- Build & Deploy: Docker build and push to registry
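The model-performance step can be implemented as a small gate script that fails the CI job when logged accuracy drops below the threshold. A sketch under the assumption that metrics are written to a JSON file with an `accuracy` key (the path and key are assumptions, not the repo's confirmed layout):

```python
import json
import sys

def enforce_accuracy_gate(metrics_path, threshold=0.90):
    """Exit non-zero if the recorded accuracy is below the CI threshold."""
    with open(metrics_path) as f:
        accuracy = json.load(f)["accuracy"]
    if accuracy < threshold:
        # A non-zero exit status fails the GitHub Actions step.
        sys.exit(f"accuracy {accuracy:.4f} is below the {threshold:.2f} gate")
    return accuracy
```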
Pipeline triggers:
- Push to the `main` branch
- Pull request creation
- Manual workflow dispatch
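In .github/workflows/ci-cd.yml these triggers correspond to an `on:` block along the following lines (hypothetical excerpt; the actual file may differ):

```yaml
on:
  push:
    branches: [main]
  pull_request:
  workflow_dispatch:   # manual trigger from the Actions tab
```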
# Add to GitHub repository secrets:
DOCKER_USERNAME=your_docker_username
DOCKER_PASSWORD=your_docker_password
MLOps_PipeLine/
├── src/mlpipeline/            # Core ML pipeline source code
│   ├── components/            # ML pipeline components
│   ├── config/                # Configuration management
│   ├── entity/                # Data entities and schemas
│   ├── pipeline/              # Stage-wise pipeline execution
│   └── utils/                 # Utility functions
├── config/                    # Configuration files
│   ├── config.yaml            # Main configuration
│   ├── params.yaml            # Model parameters
│   └── schema.yaml            # Data schema validation
├── artifacts/                 # Generated artifacts (DVC tracked)
│   ├── data_ingestion/        # Raw and processed data
│   ├── data_validation/       # Validation reports
│   ├── feature_engineering/   # Feature artifacts
│   ├── model_trainer/         # Trained models
│   └── model_evaluation/      # Performance metrics
├── k8s/                       # Kubernetes manifests
│   ├── deployment.yaml        # Application deployment
│   ├── service.yaml           # Service configuration
│   └── ingress.yaml           # Ingress routing
├── observability/             # Monitoring stack
│   ├── docker-compose.yml     # Complete monitoring setup
│   ├── prometheus/            # Metrics collection
│   ├── grafana/               # Visualization dashboards
│   ├── elasticsearch/         # Log storage
│   └── kibana/                # Log analysis
├── .github/workflows/         # CI/CD automation
│   └── ci-cd.yml              # Complete MLOps pipeline
├── dvc.yaml                   # DVC pipeline definition
├── Dockerfile                 # Container definition
├── model_dag.py               # Airflow DAG definition
└── app.py                     # Flask application
Current model performance metrics:
- Algorithm: XGBoost Classifier
- Accuracy: 92.15%
- Precision: 91.47%
- Recall: 92.00%
- F1-Score: 91.64%
- AUC: 94.04%
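For reference, accuracy, precision, recall, and F1 all derive from the binary confusion matrix in the standard way. A stdlib sketch with illustrative counts (not the model's actual confusion matrix):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)             # of predicted positives, how many were right
    recall = tp / (tp + fn)                # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```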
Performance tracking via MLflow with experiment comparison and model registry integration.
- Container Orchestration: Kubernetes with auto-scaling
- Service Mesh: Istio for traffic management (optional)
- Data Storage: DVC for version control, MLflow for experiments
- Monitoring: Full observability stack with metrics, logs, traces
- CI/CD: Automated testing, security scanning, deployment
This MLOps pipeline demonstrates development- and deployment-ready machine learning operations with:
- ✅ End-to-End Automation: From data ingestion to model deployment
- ✅ Scalable Infrastructure: Kubernetes orchestration with monitoring
- ✅ Quality Assurance: Automated testing, validation, and security scanning
- ✅ Observability: Comprehensive metrics, logging, and tracing
- ✅ Continuous Integration: GitHub Actions for automated workflows
- ✅ Model Governance: Version control, experiment tracking, performance monitoring
The pipeline achieves 92.15% model accuracy while maintaining production standards for reliability, scalability, and maintainability. It serves as a blueprint for implementing MLOps best practices in enterprise environments.
- Fully automated 6-stage ML pipeline
- Containerized deployment with security best practices
- Kubernetes orchestration with service mesh capabilities
- Real-time monitoring and alerting system
- CI/CD automation with quality gates
- Model performance tracking and drift detection
⭐ Star this repository if you found it helpful!