Description
System Info
env: NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 V100 16G*8
docker images: nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
command: mpirun -n 2 --allow-run-as-root tritonserver --model-control-mode=explicit --model-repository=/data/multi_model_repo/ --load-model=Qwen1.5-7B-Chat --load-model=Llama3-8B-Chinese-Chat
Expected behavior
Both models load and can be called successfully.
Actual behavior
Both models load successfully, but when I call the Qwen1.5-7B-Chat model through the OpenAI-compatible interface, an error occurs.
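For reference, the failing request looked roughly like the following. This is a minimal sketch of the OpenAI-style chat payload only; the exact endpoint URL, port, and frontend wiring are assumptions based on the setup above, and the model name matches the --load-model flag from the reproduction command.

```python
import json

# Hypothetical OpenAI-style chat-completion payload; the "model"
# field must match the name passed to --load-model above.
payload = {
    "model": "Qwen1.5-7B-Chat",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}

# Serialized request body as it would be POSTed to the
# OpenAI-compatible endpoint (endpoint path is an assumption).
body = json.dumps(payload)
print(body)
```

The same payload with `"model": "Llama3-8B-Chinese-Chat"` works, so the error appears specific to routing requests to the Qwen model in this two-model mpirun setup.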
Additional notes
None