- Overview
- Features
- Repository Structure
- Prerequisites
- Infrastructure Setup
- Application Deployment
- Observability Stack
- CI/CD Pipeline
This project demonstrates an MLOps pipeline that deploys a pre-trained brain tumor detection service on Google Kubernetes Engine (GKE). It covers automated deployment, monitoring, and scaling of a machine learning model in production, combining CI/CD, observability, and infrastructure as code.
The system architecture diagram below illustrates the main components and their interactions:
- ML Service: Production-ready brain tumor detection service powered by a pre-trained model
- Cloud Infrastructure: Fully automated GCP infrastructure using Terraform and Ansible
- Kubernetes Orchestration: Scalable deployment on Google Kubernetes Engine (GKE)
- Metrics Monitoring: Real-time performance tracking with Prometheus and Grafana
- Log Management: Centralized logging with Elasticsearch, Logstash, and Kibana (ELK Stack)
- Distributed Tracing: Request tracing and performance analysis with Jaeger
- CI/CD Pipeline: Automated testing and deployment using Jenkins
- Infrastructure as Code: Version-controlled infrastructure with Terraform
- Configuration Management: Automated provisioning with Ansible
.
├── api/ # Brain tumor detection API service
├── charts/ # Helm charts for deployment
│ ├── brain-tumor-detection/ # Application chart
│ └── nginx-ingress/ # Ingress controller
├── custom_images/ # Custom container images
│ └── jenkins/ # Jenkins configuration
├── infrastructure/ # Infrastructure as Code
│ ├── ansible/ # Ansible playbooks
│ ├── credentials/ # GCP credentials
│ ├── ssh_keys/ # SSH keys for instances
│ └── terraform/ # Terraform configurations
├── models/ # ML model files
├── monitoring/ # Observability components
│ ├── K8s/ # Kubernetes monitoring
│ │ ├── elk-filebeat/ # ELK Stack configuration
│ │ ├── helmfile.yaml # Helm releases
│ │ ├── jaeger/ # Distributed tracing
│ │ └── kube-prometheus-stack/ # Prometheus & Grafana
│ └── Local/ # Local monitoring setup
├── notebooks/ # Training notebooks
└── scripts/ # Utility scripts
- Google Cloud Platform account with billing enabled
- Sufficient permissions to create GKE clusters and service accounts
| Tool | Minimum Version | Purpose |
|---|---|---|
| Google Cloud SDK | ≥ 440.0.0 | GCP resource management |
| Terraform | ≥ 1.5.0 | Infrastructure provisioning |
| kubectl | ≥ 1.26.0 | Kubernetes cluster management |
| Helm | ≥ 3.12.0 | Package management |
| Helmfile | ≥ 0.151.0 | Helm chart orchestration |
| Docker | ≥ 24.0.0 | Container management |
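The minimum versions in the table above can be checked with a small helper. This is a minimal sketch, not part of the repository: it assumes a `sort` that supports `-V` (version sort, as in GNU coreutils), and the function names `version_ge` and `check_tool` are made up for illustration.

```shell
# version_ge A B: succeeds when version A is greater than or equal to B.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME INSTALLED MINIMUM: prints a one-line verdict.
check_tool() {
  if version_ge "$2" "$3"; then
    echo "ok: $1 $2 (>= $3)"
  else
    echo "too old: $1 $2 (< $3)"
    return 1
  fi
}

# Example calls. The installed versions are placeholders; replace them with
# the output of each tool's own version command.
# check_tool terraform 1.6.2 1.5.0
# check_tool helm 3.11.0 3.12.0
```

Comparing with a version sort rather than a plain string comparison matters for cases such as 3.9.0 versus 3.12.0, where lexical order gives the wrong answer.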
- kubens - Kubernetes namespace switching utility
- kubectx - Kubernetes context switching utility
- Install and configure Google Cloud SDK:
# Follow installation guide at: cloud.google.com/sdk/docs/install
gcloud init
gcloud auth application-default login
- Create service account:
- Configure editor role
- Store credentials in infrastructure/credentials/
- Update configuration in terraform/terraform.tfvars
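The service account steps above can also be done from the command line. The sketch below wraps three standard `gcloud` commands in a hypothetical helper, `create_sa_key`; the service account name and key file name are placeholders, since the repository's own naming is not shown here.

```shell
# create_sa_key PROJECT_ID SA_NAME KEY_FILE
# Creates a service account, grants it the Editor role, and writes a JSON key.
# SA_NAME and KEY_FILE are placeholders chosen by the caller.
create_sa_key() {
  project=$1
  sa_name=$2
  key_file=$3
  sa_email="${sa_name}@${project}.iam.gserviceaccount.com"

  gcloud iam service-accounts create "$sa_name" --project "$project" &&
  gcloud projects add-iam-policy-binding "$project" \
    --member "serviceAccount:${sa_email}" \
    --role "roles/editor" &&
  gcloud iam service-accounts keys create "$key_file" \
    --iam-account "$sa_email"
}

# Example (hypothetical names):
# create_sa_key my-project mlops-deployer infrastructure/credentials/key.json
```

The commands are chained with `&&` so that a key is only written if the account and role binding were created successfully.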
Before creating a GKE cluster, you need to enable the Kubernetes Engine API for your Google Cloud Project:
- Navigate to Google Cloud Console Marketplace: https://console.cloud.google.com/marketplace/product/google/container.googleapis.com
- Ensure you have selected the correct project in the Google Cloud Console header
- On the Kubernetes Engine API page, click the "Enable" button
- Once enabled, you'll see a "Manage" button and status indicating the API is active
Note: Enabling this API is a prerequisite for all GKE operations and only needs to be done once per project. If you encounter any "API not enabled" errors during cluster creation, ensure this step has been completed successfully.
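The console steps above have a command-line equivalent, which may be easier to script. A minimal sketch follows, assuming `gcloud` is already authenticated; the helper names `enable_gke_api` and `gke_api_enabled` are hypothetical.

```shell
# enable_gke_api PROJECT_ID: enables the Kubernetes Engine API for a project.
enable_gke_api() {
  gcloud services enable container.googleapis.com --project "$1"
}

# gke_api_enabled PROJECT_ID: succeeds if the API is in the enabled list.
gke_api_enabled() {
  gcloud services list --enabled --project "$1" \
    --format "value(config.name)" | grep -qx "container.googleapis.com"
}

# Example (hypothetical project ID):
# gke_api_enabled my-project || enable_gke_api my-project
```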
cd infrastructure
make generate-key # Generate SSH keys
make init
make plan
make apply # Takes approximately 10-15 minutes
gcloud container clusters get-credentials [CLUSTER_NAME] --region [REGION]
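# get-credentials names the context gke_[PROJECT_ID]_[REGION]_[CLUSTER_NAME]
kubectx gke_[PROJECT_ID]_[REGION]_[CLUSTER_NAME]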
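The kubeconfig context written by `gcloud container clusters get-credentials` is named after the project, location, and cluster rather than the cluster alone. A small sketch of building that name for `kubectx`; the function name `gke_context_name` and the example values are hypothetical.

```shell
# gke_context_name PROJECT_ID LOCATION CLUSTER_NAME
# Prints the context name in the form gke_<project>_<location>_<cluster>.
gke_context_name() {
  printf 'gke_%s_%s_%s\n' "$1" "$2" "$3"
}

# Example: switch to the context, then confirm which one is active.
# kubectx "$(gke_context_name my-project us-central1 my-cluster)"
# kubectl config current-context
```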
Deploy Jenkins on Google Compute Engine using Ansible:
cd infrastructure
make deploy
- Create required namespaces:
kubectl create namespace model-serving
kubectl create namespace nginx-ingress
- Deploy components:
# Install Nginx ingress controller
helm upgrade --install nginx-ingress charts/nginx-ingress --namespace nginx-ingress
# Deploy brain tumor detection service
helm upgrade --install brain-tumor-detection charts/brain-tumor-detection --namespace model-serving
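After the two `helm upgrade --install` commands, it can help to confirm that each release exists and that its pods are starting. The sketch below uses `helm status` and `kubectl get pods`; the wrapper `check_release` is a hypothetical helper, not something the charts provide.

```shell
# check_release RELEASE NAMESPACE
# Fails if Helm does not know the release; otherwise lists the namespace's pods.
check_release() {
  helm status "$1" --namespace "$2" >/dev/null || {
    echo "release not found: $1 (namespace $2)" >&2
    return 1
  }
  kubectl get pods --namespace "$2"
}

# Release and namespace names as used in the commands above:
# check_release nginx-ingress nginx-ingress
# check_release brain-tumor-detection model-serving
```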
- Setup with Helmfile:
# Using Helmfile (recommended)
cd monitoring/K8s
helmfile sync
- Setup with Docker:
# Alternative: Using Docker Compose
cd monitoring/Local
docker compose up -d
cd elk && docker compose up -d
- Grafana (http://[NODE_IP]:30000):
# Retrieve admin password
kubectl get secret kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode
- Kibana (http://[NODE_IP]:5601):
# Retrieve elastic user password
kubectl get secret elasticsearch-master-credentials -o jsonpath="{.data.password}" | base64 --decode
- Jaeger UI is accessible at http://[NODE_IP]:16686
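The `[NODE_IP]` placeholder in the addresses above can be looked up with `kubectl`. This is a sketch that assumes the nodes have external addresses and simply takes the first node; the helper names are hypothetical.

```shell
# first_node_external_ip: prints the ExternalIP address of the first node.
# Prints nothing if the nodes have no external address.
first_node_external_ip() {
  kubectl get nodes \
    -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address}'
}

# dashboard_urls NODE_IP: prints the three UI addresses listed above.
dashboard_urls() {
  echo "Grafana: http://$1:30000"
  echo "Kibana:  http://$1:5601"
  echo "Jaeger:  http://$1:16686"
}

# Example:
# dashboard_urls "$(first_node_external_ip)"
```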
- Connect to Google Compute Engine:
cd infrastructure
ssh -i ssh_keys/jenkins_key [USERNAME]@[GCE_EXTERNAL_IP]
- Access Jenkins UI:
- Navigate to http://[GCE_EXTERNAL_IP]:8081
- Retrieve initial admin password:
sudo docker exec jenkins cat /var/jenkins_home/secrets/initialAdminPassword
- Install required plugins:
- Kubernetes
- Docker and Docker Pipeline
- Google Cloud SDK
Note: If Jenkins fails to restart after plugin installation, SSH into the GCE instance and restart the container:
sudo docker start jenkins
- Configure credentials:
- GitHub authentication
- DockerHub access token
- GKE service account
# Create service account
kubectl create serviceaccount model-serving-sa -n model-serving
# Get token (default expiration: 1 hour)
kubectl create token model-serving-sa -n model-serving
Note:
- The default token expires after 1 hour
- To create a token with a longer lifetime, use the --duration flag
Example:
# Create token valid for 1 year
kubectl create token model-serving-sa -n model-serving --duration=8760h
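The `8760h` value is one year expressed in hours (365 × 24). The same arithmetic for other lifetimes can be sketched with a hypothetical helper, `token_duration_hours`:

```shell
# token_duration_hours DAYS: prints a --duration value in hours for DAYS days.
token_duration_hours() {
  echo "$(( $1 * 24 ))h"
}

# Example: a 30-day token for the service account created above.
# kubectl create token model-serving-sa -n model-serving \
#   --duration="$(token_duration_hours 30)"
```

Note that `--duration` is a requested lifetime: according to the `kubectl create token` documentation, the server may return a token with a shorter or longer lifetime than the one requested.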
- Set up GKE permissions:
# Create admin binding for model-serving-sa service account
kubectl create clusterrolebinding model-serving-admin-binding \
--clusterrole=cluster-admin \
--serviceaccount=model-serving:model-serving-sa
# Create admin binding for default service account
kubectl create clusterrolebinding cluster-admin-default-binding \
--clusterrole=cluster-admin \
--user=system:serviceaccount:model-serving:default
- Configure pipeline:
- Create pipeline job
- Link Git repository
- Set up Jenkinsfile
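Before relying on the cluster role bindings created in the steps above, the permissions of each service account can be checked with `kubectl auth can-i` and impersonation. A minimal sketch; `sa_can_i` is a hypothetical wrapper, and running it assumes your own user is allowed to impersonate service accounts.

```shell
# sa_can_i NAMESPACE SERVICE_ACCOUNT VERB RESOURCE
# Asks the API server whether the service account may perform VERB on RESOURCE.
# kubectl prints "yes" or "no" and exits non-zero when the answer is no.
sa_can_i() {
  kubectl auth can-i "$3" "$4" \
    --as "system:serviceaccount:$1:$2" \
    --namespace "$1"
}

# Examples using the accounts bound above:
# sa_can_i model-serving model-serving-sa create deployments
# sa_can_i model-serving default create deployments
```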
The CI/CD pipeline includes the following stages:
- Code validation and linting
- Automated testing
- Docker image building
- Container registry push
- GKE deployment
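The stages above run in order, and a failure in one stage should stop the ones after it. The sketch below illustrates that fail-fast ordering in plain shell. It is not the repository's Jenkinsfile: `run_stage` and `run_pipeline` are hypothetical, and `lint_cmd`, `test_cmd`, `build_cmd`, `push_cmd`, and `deploy_cmd` stand in for whatever commands each stage actually runs.

```shell
# run_stage NAME COMMAND [ARGS...]
# Announces the stage, runs its command, and reports a failure by name.
run_stage() {
  stage_name=$1
  shift
  echo "==> ${stage_name}"
  "$@" || {
    echo "stage failed: ${stage_name}" >&2
    return 1
  }
}

# run_pipeline: runs the stages in order and stops at the first failure.
# The *_cmd names are placeholders that the caller must define.
run_pipeline() {
  run_stage "Code validation and linting" lint_cmd &&
  run_stage "Automated testing" test_cmd &&
  run_stage "Docker image building" build_cmd &&
  run_stage "Container registry push" push_cmd &&
  run_stage "GKE deployment" deploy_cmd
}
```

Chaining with `&&` means an image is never built or pushed from code that failed linting or tests.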
- Verify service status:
kubectl get services -n model-serving
- Test endpoints:
# Health check endpoint
curl http://[SERVICE_IP]:8000/health
# Brain tumor detection endpoint
curl -X POST http://[SERVICE_IP]:8000/detect/brain-tumor/image \
-F "image=@/path/to/image.jpg"
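Right after a deployment the service may take a short while to answer, so a single `curl` can fail even when the rollout is fine. A minimal sketch of polling the health endpoint shown above; `wait_for_health` is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```shell
# wait_for_health BASE_URL [ATTEMPTS] [DELAY_SECONDS]
# Polls BASE_URL/health until curl succeeds or the attempts run out.
wait_for_health() {
  base_url=$1
  attempts=${2:-10}
  delay=${3:-3}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP error responses
    if curl -fsS "${base_url}/health" >/dev/null 2>&1; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not healthy after ${attempts} attempt(s)" >&2
  return 1
}

# Example:
# wait_for_health "http://[SERVICE_IP]:8000" 10 3
```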