-
Finally found out why there is such a huge GPU memory usage difference lol. The counterpart of Once I changed to |
-
Hi, I use vLLM for all my projects, but I had been thinking I should give SGLang a try, so I ran a performance test comparing them. Before the test I had no idea what result I would get, as I had no bias either way, so I was very surprised by the outcome!
I used one A10 GPU to test Qwen2.5-7B, as I have a specific, focused goal: to evaluate how vLLM and SGLang perform when running a small LLM on a mid-range NVIDIA GPU like the A10.
I found that SGLang uses only 7 GB of GPU memory, compared with 21 GB for vLLM (the A10 has 24 GB in total), and delivers much better results, especially more consistent response times.
But why is there such a big difference? Can someone help explain it? This is my project: https://github.com/qiulang/vllm-sglang-perf
Thanks a lot.
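For context, one likely explanation (my assumption, not confirmed in this thread): vLLM pre-allocates a fixed fraction of total GPU memory up front for its KV cache, controlled by the engine argument `gpu_memory_utilization`, which defaults to 0.9. That pre-allocation alone would account for roughly the 21 GB reported on a 24 GB A10, independent of how much memory the model actually needs. A quick sanity check of the arithmetic:

```python
# Sketch of the pre-allocation arithmetic (assumption: the 21 GB observed
# for vLLM is mostly KV-cache pre-allocation, not live model activations).

A10_TOTAL_GB = 24.0
DEFAULT_GPU_MEMORY_UTILIZATION = 0.9  # vLLM's documented default

# vLLM reserves total_memory * gpu_memory_utilization at startup,
# so nvidia-smi reports this figure regardless of actual load.
preallocated_gb = A10_TOTAL_GB * DEFAULT_GPU_MEMORY_UTILIZATION
print(f"vLLM pre-allocates ~{preallocated_gb:.1f} GB")  # ~21.6 GB
```

If that is the cause, the two numbers are not directly comparable: SGLang has an analogous knob (`mem_fraction_static`), and vLLM's footprint can be lowered by passing a smaller `gpu_memory_utilization` when constructing the engine, though a smaller KV cache may reduce throughput under load.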