llm-d is a well-lit path for serving large language models at scale with the fastest time-to-value and competitive performance per dollar. Built on vLLM, Kubernetes, and Inference Gateway, llm-d provides modular solutions for distributed inference with features like KV-cache aware routing and disaggregated serving.
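To make "KV-cache aware routing" concrete, here is a minimal, illustrative sketch of prefix-hash routing in Python. This is not llm-d's implementation (llm-d performs cache-aware routing inside the inference scheduler behind the Inference Gateway); the replica names and the `select_replica` helper are hypothetical.

```python
# Toy illustration: route requests that share a prompt prefix to the same
# replica, so that replica's warm KV cache can be reused. Not llm-d code.
import hashlib

REPLICAS = ["vllm-0", "vllm-1", "vllm-2"]  # hypothetical pod names
PREFIX_CHARS = 256  # leading characters stand in for the shared token prefix

def select_replica(prompt: str) -> str:
    """Hash the prompt prefix so repeated prefixes land on the same replica."""
    digest = hashlib.sha256(prompt[:PREFIX_CHARS].encode("utf-8")).hexdigest()
    return REPLICAS[int(digest, 16) % len(REPLICAS)]

# Requests with the same system prompt always map to one replica:
print(select_replica("You are a helpful assistant. Summarize this article..."))
```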
- 📖 Documentation: llm-d.ai
- 🏗️ Architecture: llm-d architecture docs
- 📖 Project Details: PROJECT.md
- 📦 Releases: GitHub Releases
- 💬 Slack: Join our development discussions at llm-d.slack.com
- 📧 Google Group: Subscribe to llm-d-contributors for architecture docs and meeting invites
- 🗓️ Weekly Standup: Wednesdays at 12:30 ET - Public Calendar
- Read Guidelines: Review our Code of Conduct and contribution process
- Sign Commits: All commits require DCO sign-off (`git commit -s`); see the example below
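The `-s`/`--signoff` flag appends a Developer Certificate of Origin trailer built from your configured git identity (the commit message below is just an example):

```bash
# Sign off a commit; git adds a DCO trailer from user.name / user.email
git commit -s -m "fix: example change"

# The resulting commit message ends with a line like:
#   Signed-off-by: Your Name <you@example.com>
```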
- 🐛 Bug fixes and small features - Submit PRs directly to component repos
- 🚀 New features with APIs - Require project proposals
- 📚 Documentation - Help improve guides and examples
- 🧪 Testing & Benchmarking - Contribute to our test coverage
- 💡 Experimental features - Start in llm-d-incubation org
License: Apache 2.0