Description
System Info
- CPU architecture (x86_64, aarch64)
- GPU name: NVIDIA A800
- GPU memory size: 80 GB
- Libraries: TensorRT-LLM tag: v0.10.0, tensorrtllm-backend tag: v0.10.0
- Container used: nvidia/cuda:12.5.0-devel-ubuntu22.04 to build the engine (succeeds; the engine runs locally), and nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3 or nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3 to launch Triton (fails). Typical container invocations are sketched after this list.
- OS: Ubuntu 22.04
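For completeness, a minimal sketch of how the two containers are typically started; the mounts, flags, and paths below are assumptions for illustration, not the exact commands used.

```bash
# Sketch only: typical container invocations; mount paths are placeholders.

# Engine-build container (works)
docker run --rm -it --gpus all -v $(pwd):/workspace -w /workspace \
    nvidia/cuda:12.5.0-devel-ubuntu22.04 bash

# Triton serving container (the one where the launch fails)
docker run --rm -it --gpus all --net host -v $(pwd):/models \
    nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3 bash
```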
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Model name: Qwen1.5-14b-Chat
- Build the engine following the steps in the TensorRT-LLM README. This succeeded.
- Launch the service with Triton. This failed. (Build and launch sketches follow this list.)
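For reference, a minimal sketch of the engine build, assuming the Qwen example from TensorRT-LLM v0.10.0; the paths (./Qwen1.5-14B-Chat, ./ckpt, ./engine) are placeholders rather than the exact ones used.

```bash
# Sketch only: engine build following the examples/qwen README of TensorRT-LLM v0.10.0.
cd TensorRT-LLM/examples/qwen

# Convert the Hugging Face checkpoint to TensorRT-LLM checkpoint format
python3 convert_checkpoint.py \
    --model_dir ./Qwen1.5-14B-Chat \
    --output_dir ./ckpt \
    --dtype float16

# Build the TensorRT engine from the converted checkpoint
trtllm-build \
    --checkpoint_dir ./ckpt \
    --output_dir ./engine \
    --gemm_plugin float16
```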
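And a minimal sketch of the Triton launch inside the tritonserver container, assuming the inflight_batcher_llm model repository shipped with tensorrtllm-backend v0.10.0; only a subset of the fill_template.py parameters is shown and the values are illustrative.

```bash
# Sketch only: launching Triton with tensorrtllm-backend v0.10.0.
# /models/engine is a placeholder for the engine directory built above.
cd tensorrtllm_backend

# Start from the packaged inflight_batcher_llm model repository
cp -r all_models/inflight_batcher_llm triton_model_repo

# Point the tensorrt_llm model at the built engine (other parameters, and the
# preprocessing/postprocessing tokenizer settings, are omitted here)
python3 tools/fill_template.py -i triton_model_repo/tensorrt_llm/config.pbtxt \
    engine_dir:/models/engine,triton_max_batch_size:8,decoupled_mode:False,max_beam_width:1,batching_strategy:inflight_fused_batching

# Launch Triton; this is the step that fails
python3 scripts/launch_triton_server.py --world_size 1 --model_repo $(pwd)/triton_model_repo
```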
Expected behavior
The Triton service launches successfully and serves the engine (see the smoke-test sketch below).
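A sketch of the check I would expect to pass once the service is up, assuming the default ensemble model and HTTP port 8000 from the tensorrtllm-backend README; the prompt and token count are arbitrary.

```bash
# Sketch only: smoke test against the generate endpoint once Triton is running.
curl -s -X POST localhost:8000/v2/models/ensemble/generate \
    -d '{"text_input": "What is machine learning?", "max_tokens": 32, "bad_words": "", "stop_words": ""}'
```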