Slurm Docker Cluster is a multi-container Slurm cluster designed for rapid deployment using Docker Compose. This repository simplifies the process of setting up a robust Slurm environment for development, testing, or lightweight usage.
To get up and running with Slurm in Docker, make sure you have the following tools installed:

- Docker
- Docker Compose
- Git
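A quick way to confirm the tools are available before proceeding (a sketch, not part of the repo; the last check assumes Compose v2, which ships as a `docker` plugin):

```shell
# Check that the required tools are on PATH (illustrative sketch).
for tool in git docker; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
# Compose v2 is invoked as a docker subcommand:
docker compose version >/dev/null 2>&1 || echo "missing: docker compose"
```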
Clone the repository:

```bash
git clone https://github.com/giovtorres/slurm-docker-cluster.git
cd slurm-docker-cluster
```

This setup consists of the following containers:
- mysql: Stores job and cluster data.
- slurmdbd: Manages the Slurm database.
- slurmctld: The Slurm controller responsible for job and resource management.
- c1, c2: Compute nodes (running `slurmd`).
The cluster also defines named volumes for persistent storage:

- etc_munge: mounted to `/etc/munge`
- etc_slurm: mounted to `/etc/slurm`
- slurm_jobdir: mounted to `/data`
- var_lib_mysql: mounted to `/var/lib/mysql`
- var_log_slurm: mounted to `/var/log/slurm`
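In Docker Compose, named volumes like these are declared at the top level and mounted into each service that needs them. The fragment below is an illustrative sketch only; the repo's actual `docker-compose.yml` may differ in detail:

```yaml
# Illustrative fragment -- not the repo's full docker-compose.yml.
services:
  slurmctld:
    volumes:
      - etc_munge:/etc/munge
      - etc_slurm:/etc/slurm
      - slurm_jobdir:/data

volumes:
  etc_munge:
  etc_slurm:
  slurm_jobdir:
```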
Both the Slurm version and the image tag used by the Docker build can be set in a `.env` file, which Docker Compose picks up automatically. Update `SLURM_TAG` and `IMAGE_TAG` in the `.env` file and build the image:

```bash
docker compose build
```

Alternatively, you can build the Slurm Docker image locally by passing `SLURM_TAG` as a build argument and tagging the image with a version (`IMAGE_TAG`):

```bash
docker build --build-arg SLURM_TAG="slurm-21-08-6-1" -t slurm-docker-cluster:21.08.6 .
```

Once the image is built, deploy the cluster with the default Slurm version using Docker Compose:
```bash
docker compose up -d
```

To run a specific version and override what is configured in `.env`, set `IMAGE_TAG`:

```bash
IMAGE_TAG=21.08.6 docker compose up -d
```

This starts all containers in detached mode. You can monitor their status with:

```bash
docker compose ps
```

After the containers are up and running, register the cluster with SlurmDBD:

```bash
./register_cluster.sh
```

Tip: Wait a few seconds for the daemons to initialize before running the registration script to avoid connection errors like:

```
sacctmgr: error: Problem talking to the database: Connection refused
```
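If you script the cluster setup, that wait can be automated with a small retry loop. This is a sketch, not part of the repository; the `retry` helper name and its parameters are assumptions:

```shell
#!/bin/sh
# Sketch: retry a command until it succeeds or a limit is reached.
retry() {
  max=$1    # maximum number of attempts
  delay=$2  # seconds to sleep between attempts
  shift 2
  i=1
  until "$@"; do
    [ "$i" -ge "$max" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Assumed usage: try the registration up to 10 times, 2 seconds apart:
# retry 10 2 ./register_cluster.sh
```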
For real-time cluster logs, use:

```bash
docker compose logs -f
```

To interact with the Slurm controller, open a shell inside the slurmctld container:

```bash
docker exec -it slurmctld bash
```

Now you can run any Slurm command from inside the container:

```console
[root@slurmctld /]# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up 5-00:00:00      2   idle c[1-2]
```

The cluster mounts the `slurm_jobdir` volume across all nodes, making job files accessible from the `/data` directory. To submit a job:
```console
[root@slurmctld /]# cd /data/
[root@slurmctld data]# sbatch --wrap="hostname"
Submitted batch job 2
```

Check the output of the job:

```console
[root@slurmctld data]# cat slurm-2.out
c1
```

Stop the cluster without removing the containers:
```bash
docker compose stop
```

Restart it later:

```bash
docker compose start
```

To completely remove the containers and associated volumes:

```bash
docker compose down -v
```

You can modify Slurm configurations (`slurm.conf`, `slurmdbd.conf`) on the fly without rebuilding the containers. Just run:
```bash
./update_slurmfiles.sh slurm.conf slurmdbd.conf
docker compose restart
```

This makes it easy to add/remove nodes or test new configuration settings dynamically.
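For example, adding a third compute node would mean extending the node and partition definitions in `slurm.conf` and adding a matching `c3` service to `docker-compose.yml`, then pushing the updated file with the script above. The fragment below is illustrative; the exact attributes in this repo's config may differ (the `MaxTime` value matches the 5-00:00:00 limit shown by `sinfo` earlier):

```
# Illustrative slurm.conf fragment -- extend the node range to include c3
NodeName=c[1-3] CPUs=1 State=UNKNOWN
PartitionName=normal Default=yes Nodes=c[1-3] MaxTime=5-00:00:00 State=UP
```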
Contributions are welcome from the community! If you want to add features, fix bugs, or improve documentation:

- Fork this repo.
- Create a new branch: `git checkout -b feature/your-feature`.
- Submit a pull request.
This project is licensed under the MIT License.