The backend library for the Docker Model Runner.
Note: This package is still under rapid development and its APIs should not be considered stable.
This package supports the Docker Model Runner in Docker Desktop (in conjunction with Model Distribution and the Model CLI). It includes a main.go that mimics the Docker Desktop integration and allows the package to be run in standalone mode.
This guide is for external contributors who want to build and test the complete Docker Model Runner ecosystem from source.
The Docker Model Runner ecosystem consists of three main components:
- model-runner (this repository) - The backend daemon/server that manages and runs AI models
- model-cli - The CLI client that communicates with model-runner
- model-spec - The specification for packaging models as OCI artifacts
Before building from source, ensure you have the following installed:
- Go 1.24+ - Required for building both model-runner and model-cli
- Git - For cloning repositories
- Make - For using the provided Makefiles
- Docker (optional) - For building and running containerized versions
- CGO dependencies - Required for model-runner's GPU support:
  - On macOS: Xcode Command Line Tools (xcode-select --install)
  - On Linux: gcc/g++ and development headers
  - On Windows: MinGW-w64 or Visual Studio Build Tools
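A quick sanity check of the toolchain (not part of the repository's Makefiles) can save a failed build later; version output will vary by machine:
# Verify the Go toolchain (1.24+), Git, Make, and a C compiler for CGO
go version
git --version
make --version
gcc --version    # or: clang --version on macOS
docker version   # optional, only needed for the containerized workflow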
# Clone the model-runner repository
git clone https://github.com/docker/model-runner.git
cd model-runner
# Build the model-runner binary
make build
# Or build and run with specific backend arguments
make run LLAMA_ARGS="--verbose --jinja -ngl 999 --ctx-size 2048"
# Run tests to verify the build
make test
The model-runner binary will be created in the current directory. This is the backend server that manages models.
# In a new terminal/directory
git clone https://github.com/docker/model-cli.git
cd model-cli
# Build the CLI binary
make build
# The binary will be named 'model-cli'
# Optionally, install it as a Docker CLI plugin
make install # This will link it to ~/.docker/cli-plugins/docker-model
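If you installed the CLI as a Docker CLI plugin, you can verify the link and invoke it through docker; this assumes the standard plugin mechanism, and the subcommands mirror the model-cli examples below:
# Confirm the plugin symlink was created
ls -l ~/.docker/cli-plugins/docker-model
# The CLI should now also be available as a docker subcommand
docker model list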
Note: We use port 13434 in these examples to avoid conflicts with Docker Desktop's built-in Model Runner, which typically runs on port 12434.
- Start model-runner in one terminal:
cd model-runner
MODEL_RUNNER_PORT=13434 ./model-runner
# The server will start on port 13434
- Use model-cli in another terminal:
cd model-cli
# List available models (connecting to port 13434)
MODEL_RUNNER_PORT=13434 ./model-cli list
# Pull and run a model
MODEL_RUNNER_PORT=13434 ./model-cli run ai/smollm2 "Hello, how are you?"
- Build and run model-runner in Docker:
cd model-runner
make docker-build
make docker-run PORT=13434 MODELS_PATH=/path/to/models
- Connect with model-cli:
cd model-cli
MODEL_RUNNER_PORT=13434 ./model-cli list
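Whichever way you start the server (standalone binary or Docker container), you can also sanity-check it directly over HTTP. This is a minimal sketch assuming the configured TCP port (13434 here) serves the REST API described later in this document:
# Confirm the server is reachable and list installed models
curl http://localhost:13434/models
# Pull a model through the REST API (same request shape as in the API section below)
curl http://localhost:13434/models/create -X POST -d '{"from": "ai/smollm2"}'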
When making changes to either component:
- For model-runner changes:
  - Edit code in the model-runner repository
  - Run make build to rebuild
  - Run make test to verify changes
  - Restart the model-runner process
- For model-cli changes:
  - Edit code in the model-cli repository
  - Run make build to rebuild
  - Test against your running model-runner instance
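For example, a typical edit-build-restart loop for model-runner changes, assuming the standalone setup on port 13434 from above, might look like:
cd model-runner
# Rebuild and run the test suite after editing
make build && make test
# Restart the standalone server so the changes take effect
MODEL_RUNNER_PORT=13434 ./model-runner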
This project includes a Makefile to simplify common development tasks. It requires Docker Desktop >= 4.41.0. The Makefile provides the following targets:
- build - Build the Go application
- run - Run the application locally
- clean - Clean build artifacts
- test - Run tests
- docker-build - Build the Docker image
- docker-run - Run the application in a Docker container with TCP port access and mounted model storage
- help - Show available targets
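For example, to list the targets available in your checkout and run a local build with tests:
# Show the available Makefile targets
make help
# Build the binary and run the test suite
make build && make test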
The application can be run in Docker with the following features enabled by default:
- TCP port access (default port 8080)
- Persistent model storage in a local models directory
# Run with default settings
make docker-run
# Customize port and model storage location
make docker-run PORT=3000 MODELS_PATH=/path/to/your/models
This will:
- Create a models directory in your current working directory (or use the specified path)
- Mount this directory into the container
- Start the service on port 8080 (or the specified port)
- All models downloaded will be stored in the host's models directory and will persist between container runs
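As a quick check of the persistence behavior (paths here are illustrative), pull a model through a containerized instance and inspect the mounted directory on the host:
make docker-run PORT=13434 MODELS_PATH=$PWD/models
# In another terminal: pull a model via the REST API
curl http://localhost:13434/models/create -X POST -d '{"from": "ai/smollm2"}'
# The downloaded model data should now appear under the mounted host directory
ls -R ./models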
The Docker image includes the llama.cpp server binary from the docker/docker-model-backend-llamacpp image. You can specify the version of the image to use by setting the LLAMA_SERVER_VERSION variable. Additionally, you can configure the target OS, architecture, and acceleration type:
# Build with a specific llama.cpp server version
make docker-build LLAMA_SERVER_VERSION=v0.0.4
# Specify all parameters
make docker-build LLAMA_SERVER_VERSION=v0.0.4 LLAMA_SERVER_VARIANT=cpu
Default values:
- LLAMA_SERVER_VERSION: latest
- LLAMA_SERVER_VARIANT: cpu
The binary path in the image follows this pattern: /com.docker.llama-server.native.linux.${LLAMA_SERVER_VARIANT}.${TARGETARCH}
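For example, a CPU build targeting amd64 resolves to /com.docker.llama-server.native.linux.cpu.amd64 (TARGETARCH is supplied automatically by Docker's build arguments).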
The Model Runner exposes a REST API that can be accessed via TCP port. You can interact with it using curl commands.
When running with docker-run, you can use regular HTTP requests:
# List all available models
curl http://localhost:8080/models
# Create a new model
curl http://localhost:8080/models/create -X POST -d '{"from": "ai/smollm2"}'
# Get information about a specific model
curl http://localhost:8080/models/ai/smollm2
# Chat with a model
curl http://localhost:8080/engines/llama.cpp/v1/chat/completions -X POST -d '{
  "model": "ai/smollm2",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"}
  ]
}'
# Delete a model
curl http://localhost:8080/models/ai/smollm2 -X DELETE
# Get metrics
curl http://localhost:8080/metrics
The response will contain the model's reply:
{
  "id": "chat-12345",
  "object": "chat.completion",
  "created": 1682456789,
  "model": "ai/smollm2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you for asking! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 16,
    "total_tokens": 40
  }
}
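Since the chat endpoint follows the OpenAI-style /v1/chat/completions shape, streaming should work the same way as with other OpenAI-compatible servers; this is a sketch assuming the backend honors the standard stream flag:
# Stream the reply as server-sent events instead of a single JSON response
curl http://localhost:8080/engines/llama.cpp/v1/chat/completions -X POST -d '{
  "model": "ai/smollm2",
  "stream": true,
  "messages": [
    {"role": "user", "content": "Write a haiku about containers."}
  ]
}'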
The Model Runner exposes the llama.cpp server's metrics at the /metrics endpoint. This allows you to monitor model performance, request statistics, and resource usage.
# Get metrics in Prometheus format
curl http://localhost:8080/metrics
- Enable metrics (default): Metrics are enabled by default
- Disable metrics: Set the DISABLE_METRICS=1 environment variable
- Monitoring integration: Add the endpoint to your Prometheus configuration
Check METRICS.md for more details.
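For example, to run the standalone server with metrics collection turned off via the DISABLE_METRICS variable mentioned above:
# Start the server with the metrics endpoint disabled
DISABLE_METRICS=1 MODEL_RUNNER_PORT=13434 ./model-runner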
Experimental support for running in Kubernetes is available in the form of a Helm chart and static YAML.
If you are interested in a specific Kubernetes use-case, please start a discussion on the issue tracker.
For general questions and discussion, please use Docker Model Runner's Slack channel.
For discussion of issues, bugs, and features, use GitHub Issues and Pull Requests.