-
Finally found out why there is such a huge GPU memory usage difference lol. The counterpart of Once I changed to |
-
Hi, I use vLLM for all my projects, but I had been thinking I should give SGLang a try, so I ran a performance test comparing them. Before the test I had no idea what result I would get, as I had no bias either way, so I was very surprised by the outcome!
I used one A10 GPU to test Qwen2.5-7B, as I have a specific, focused goal: to evaluate how vLLM and SGLang perform when running a small LLM on a mid-range NVIDIA GPU like the A10.
I found that SGLang uses only 7 GB of GPU memory, compared with 21 GB for vLLM (the A10 has 24 GB in total), and delivers much better results, especially more consistent response times.
But why is there such a big difference? Can someone help explain it? This is my project: https://github.com/qiulang/vllm-sglang-perf
Thanks a lot.
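For context, one likely explanation (my assumption, not confirmed in this thread): vLLM pre-allocates a fixed fraction of total GPU memory up front for its KV cache, controlled by the engine argument `gpu_memory_utilization`, which defaults to 0.9. That pre-allocation alone would account for roughly the 21 GB reported on a 24 GB A10, independent of how much memory the model actually needs. A quick sanity check of the arithmetic:

```python
# Sketch of the pre-allocation arithmetic (assumption: the 21 GB observed
# for vLLM is mostly KV-cache pre-allocation, not live model activations).

A10_TOTAL_GB = 24.0
DEFAULT_GPU_MEMORY_UTILIZATION = 0.9  # vLLM's documented default

# vLLM reserves total_memory * gpu_memory_utilization at startup,
# so nvidia-smi reports this figure regardless of actual load.
preallocated_gb = A10_TOTAL_GB * DEFAULT_GPU_MEMORY_UTILIZATION
print(f"vLLM pre-allocates ~{preallocated_gb:.1f} GB")  # ~21.6 GB
```

If that is the cause, the two numbers are not directly comparable: SGLang has an analogous knob (`mem_fraction_static`), and vLLM's footprint can be lowered by passing a smaller `gpu_memory_utilization` when constructing the engine, though a smaller KV cache may reduce throughput under load.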