vLLM x LMCache x RTX 50~

Usage

# pull the image
docker pull docker.io/fuis/tritonserver:25.05-vllm-python-py3-lmcache

# start LMCache
uv sync && task lmcache

# start vLLM
task vllm

# test if it works
task test

Build LMCache

git clone https://github.com/LMCache/LMCache.git
cd LMCache
git checkout v0.3.0

# Edit two files to unpin torch
# requirements/build.txt
-torch==2.7.0
+torch

# requirements/common.txt
-torch==2.7.0
+torch
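
The two edits above can also be scripted. This is a minimal sketch, assuming GNU sed and that the pin sits on a line of its own exactly as `torch==<version>`. The function name `unpin_torch` is hypothetical and not part of LMCache.

```shell
# Hypothetical helper (not part of LMCache): rewrite a line that is exactly
# "torch==<version>" to an unpinned "torch" in every file passed as an argument.
# Other packages whose names start with "torch" are left untouched.
unpin_torch() {
  sed -i -E 's/^torch==[0-9][0-9.]*$/torch/' "$@"
}

# Usage, from the root of the LMCache checkout:
# unpin_torch requirements/build.txt requirements/common.txt
```

Running `git diff` afterwards should show only the two `torch` lines changed.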

docker run -it --rm --gpus all --ipc=host --net=host \
--ulimit memlock=-1 --ulimit stack=67108864 \
-e CUDA_VISIBLE_DEVICES=0 \
-w "$(pwd)" \
-v "$(pwd):$(pwd)" \
nvcr.io/nvidia/tritonserver:25.05-vllm-python-py3 \
bash -l

Then build the wheel inside the container:

export TORCH_CUDA_ARCH_LIST="7.5;8.0;8.6;9.0;10.0;12.0+PTX"
python3 setup.py bdist_wheel
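
Before starting a long build, it can help to confirm that the target GPU's compute capability is in the arch list. The sketch below is a hypothetical helper, not part of LMCache or PyTorch. It only does a plain string comparison of `major.minor` values and ignores a trailing `+PTX`.

```shell
# Hypothetical helper: succeed if compute capability $1 (e.g. "12.0") is listed
# in the semicolon-separated arch list $2. A trailing "+PTX" on an entry is
# ignored. The body runs in a subshell so IFS and globbing changes stay local.
arch_in_list() (
  set -f
  IFS=';'
  for entry in $2; do
    if [ "${entry%+PTX}" = "$1" ]; then
      return 0
    fi
  done
  return 1
)

# Usage, after exporting TORCH_CUDA_ARCH_LIST as above:
# arch_in_list "12.0" "$TORCH_CUDA_ARCH_LIST" && echo "12.0 is covered"
```

On recent NVIDIA drivers the capability of the installed GPU can usually be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`. Treat that as an assumption about your driver version. Once the build finishes, the wheel is written to the `dist/` directory.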

Build docker image

task docker
