How is the performance of the model with pytorch as the backend #4745
Labels: Investigating, Performance, triaged
Which backend gives better performance: the PyTorch backend or the TensorRT backend? In my tests with Qwen3, performance with the PyTorch backend was poor, and throughput on a single GPU and on multiple GPUs was the same. Is this normal, or did I miss some detail in the inference configuration?
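For reference, here is a minimal sketch of how I understand multi-GPU inference is enabled with the PyTorch backend through the LLM API (the checkpoint path and `tensor_parallel_size` value are placeholders, and exact parameter names may differ across TensorRT-LLM versions):

```python
from tensorrt_llm import LLM, SamplingParams

# Hypothetical Qwen3 checkpoint path; substitute the actual model.
# tensor_parallel_size > 1 is what I expected to shard the model
# across GPUs -- if it defaults to 1, extra GPUs sit idle, which
# would explain identical single- vs multi-GPU throughput.
llm = LLM(
    model="Qwen/Qwen3-8B",   # placeholder checkpoint
    tensor_parallel_size=2,  # number of GPUs to shard across
)

prompts = ["Hello, my name is"]
sampling_params = SamplingParams(max_tokens=64)

for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```

Is an explicit tensor-parallel setting like this required for the PyTorch backend to use multiple GPUs, or is there some other configuration I should be looking at?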