
How is the performance of the model with pytorch as the backend #4745


Open · oppolll opened this issue May 29, 2025 · 2 comments
Labels: Investigating, Performance, triaged

Comments


oppolll commented May 29, 2025

Which backend gives better performance, PyTorch or TensorRT-LLM? In actual tests with Qwen3, I found that performance with the PyTorch backend was poor, and that a single GPU and multiple GPUs performed the same. Is this normal? Did I miss some detail of the inference configuration?

QiJune (Collaborator) commented May 29, 2025

@oppolll Could you please share your scripts? cc @byshiue

oppolll (Author) commented May 29, 2025

@oppolll Could you please share your scripts? cc @byshiue

PyTorch backend inference code reference: https://github.com/NVIDIA/TensorRT-LLM/blob/v0.20.0rc3/examples/pytorch/quickstart.py
LoRA inference code reference: https://github.com/NVIDIA/TensorRT-LLM/blob/v0.20.0rc3/tests/unittest/llmapi/test_llm_pytorch.py
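
My runs roughly follow that quickstart. A minimal sketch of the usage (the model name, sampling values, and tensor_parallel_size here are illustrative rather than my exact script, and the import path of the PyTorch-backend LLM class can differ between releases):

```python
# Minimal sketch following examples/pytorch/quickstart.py; not the exact script.
# Note: depending on the release, the PyTorch-backend LLM class may live at
# tensorrt_llm._torch.LLM instead of tensorrt_llm.LLM, so check the quickstart
# that matches the installed version.
from tensorrt_llm import LLM, SamplingParams


def main():
    prompts = [
        "Hello, my name is",
        "The capital of France is",
    ]
    # Illustrative sampling values, not a tuned configuration.
    sampling_params = SamplingParams(max_tokens=256, temperature=0.8, top_p=0.95)

    # tensor_parallel_size=2 is what should split the model across two GPUs;
    # with the default of 1 only a single GPU is used.
    llm = LLM(model="Qwen/Qwen3-32B", tensor_parallel_size=2)

    for output in llm.generate(prompts, sampling_params):
        print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")


if __name__ == "__main__":
    main()
```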

With an input length of 8192 and bf16 precision, I ran the following groups of experiments on H20 GPUs:

  1. With the Qwen2.5-32B model, comparing the TensorRT-LLM and PyTorch backends under the same experimental conditions, inference performance with the TensorRT-LLM backend is better.

  2. With the Qwen3-32B model and the PyTorch backend, performance is not much different from vLLM, whereas TensorRT-LLM inference is usually faster than vLLM.

  3. With the Qwen3-32B model and the PyTorch backend, inference performance was surprisingly the same on one GPU and on two GPUs.

  4. With the Qwen3-32B-FP8 model from https://huggingface.co/Qwen/Qwen3-32B-FP8 on a single GPU, inference performance was about the same as bf16.

  5. With the Qwen3-32B model and the PyTorch backend, loading a LoRA adapter for inference did not work. Is it because Qwen3 does not support LoRA inference? (A sketch of what I tried is just below this list.)
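
For item 5, what I tried follows the LoRA pattern in the referenced test_llm_pytorch.py, roughly as below. The adapter path is a placeholder, and the LoraConfig / LoRARequest import locations have moved around between releases, so treat this as a sketch rather than the exact code:

```python
# Sketch of the LoRA usage pattern from tests/unittest/llmapi/test_llm_pytorch.py.
# The adapter directory is a placeholder, and the import locations below are an
# assumption based on the 0.20-era layout (LoraConfig from tensorrt_llm.lora_manager,
# LoRARequest from tensorrt_llm.executor.request); they may differ in other versions.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.executor.request import LoRARequest
from tensorrt_llm.lora_manager import LoraConfig

lora_dir = "/path/to/qwen3-lora-adapter"  # placeholder path

# Register the adapter when constructing the LLM ...
llm = LLM(
    model="Qwen/Qwen3-32B",
    lora_config=LoraConfig(lora_dir=[lora_dir], max_lora_rank=64),
)

# ... and attach it to individual requests at generate time.
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("qwen3-adapter", 1, lora_dir),
)
print(outputs[0].outputs[0].text)
```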

Are the above results normal? Could I have overlooked some parameter configuration? Is there reference performance data for both backends that I can compare against?
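
To be clear about what I mean by "performance" above: a crude end-to-end tokens-per-second measurement over a batched generate() call, roughly as below, rather than the official trtllm-bench flow. The batch size and sampling values are illustrative, and the token_ids field on the output is an assumption (re-tokenizing the generated text would work as well):

```python
# Crude tokens-per-second check used to compare backend/GPU configurations;
# not an official benchmark (trtllm-bench would be the proper tool for that).
import time

from tensorrt_llm import LLM, SamplingParams

prompts = ["Summarize the following text: ..."] * 32  # illustrative batch
sampling_params = SamplingParams(max_tokens=256)

llm = LLM(model="Qwen/Qwen3-32B", tensor_parallel_size=2)

start = time.time()
outputs = llm.generate(prompts, sampling_params)
elapsed = time.time() - start

# token_ids on the completion output is assumed here; if it is not available,
# re-tokenizing output.outputs[0].text gives a close enough count.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} generated tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```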

@hchings added the Performance label May 30, 2025
@github-actions bot added the triaged and Investigating labels May 30, 2025