Description
Checklist
- I've prepended issue tag with type of change: [bug]
- (If applicable) I've attached the script to reproduce the bug
- (If applicable) I've documented below the DLC image/dockerfile this relates to
- (If applicable) I've documented below the tests I've run on the DLC image
- I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
- I've built my own container based off DLC (and I've attached the code used to build my own image)
Concise Description:
CUDA driver/runtime mismatch in the SageMaker Triton image version 25.04. The problem occurs when deploying on SageMaker (multi-model endpoint with GPU).
CUDA Forward Compatibility mode appears to be disabled in the image.
From the logs:
```
This container was built for NVIDIA Driver Release 575.51 or later, but version 470.256.02 was detected and compatibility mode is UNAVAILABLE
```
Driver requirements from NVIDIA:
> Release 25.04 is based on CUDA 12.9.0 which requires NVIDIA Driver release 575 or later. However, if you are running on a data center GPU (for example, T4 or any other data center GPU), you can use NVIDIA driver release 470.57 (or later R470), 525.85 (or later R525), 535.86 (or later R535), or 545.23 (or later R545).
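For reference, a minimal sketch (hypothetical helper names, not NVIDIA's actual entrypoint logic) of the version check that produces the error above: the image was built against driver 575.51, the host kernel driver is 470.256.02, and without forward compatibility the runtime refuses to initialize.

```python
# Sketch of the driver-version check implied by the error message.
# BUILT_FOR and HOST_DRIVER come from the log output in this issue;
# the helper names are illustrative, not NVIDIA's code.

def parse_version(v: str) -> tuple[int, ...]:
    """Turn '470.256.02' into (470, 256, 2) for component-wise comparison."""
    return tuple(int(part) for part in v.split("."))

def driver_is_sufficient(host: str, required: str) -> bool:
    """True if the host kernel driver meets the driver the image was built for."""
    return parse_version(host) >= parse_version(required)

BUILT_FOR = "575.51"        # driver release the 25.04 image was built for (from the log)
HOST_DRIVER = "470.256.02"  # kernel driver detected on the SageMaker GPU host

print(driver_is_sufficient(HOST_DRIVER, BUILT_FOR))  # False -> forward compat needed
```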
DLC image/dockerfile:
[account].dkr.ecr.[region].amazonaws.com/sagemaker-tritonserver:25.04-py3
Current behavior:
The Triton server does not start correctly and GPU inference is unavailable.
Expected behavior:
The Triton server starts and the GPU is available for inference.
Expected logs:
```
NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 12.4 driver version 550.54.14 with kernel driver version 470.256.02.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
```
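Per the NVIDIA requirement quoted above, a host R470 driver qualifies for forward compatibility on data-center GPUs if it is 470.57 or later, so 470.256.02 should work once compatibility mode is enabled. A small check of that eligibility rule (helper names are illustrative):

```python
# Driver branches NVIDIA lists as acceptable for release 25.04 on data center
# GPUs, per the requirements quoted in this issue:
# R470 >= 470.57, R525 >= 525.85, R535 >= 535.86, R545 >= 545.23.
COMPAT_BRANCHES = {470: (470, 57), 525: (525, 85), 535: (535, 86), 545: (545, 23)}

def forward_compat_eligible(driver: str) -> bool:
    """True if the host driver falls in a branch eligible for forward compatibility."""
    parts = tuple(int(p) for p in driver.split("."))
    minimum = COMPAT_BRANCHES.get(parts[0])
    return minimum is not None and parts >= minimum

print(forward_compat_eligible("470.256.02"))  # True -> compat mode should be usable
```

This is why the current failure points at the image rather than the host: the driver meets NVIDIA's stated floor for R470, yet compatibility mode reports UNAVAILABLE.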
Additional context:
Logs:
```
=============================
== Triton Inference Server ==
=============================

NVIDIA Release 25.04 (build <unknown>)
Triton Server Version 2.57.0

Copyright (c) 2018-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

GOVERNING TERMS: The software and materials are governed by the NVIDIA Software License Agreement
(found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/)
and the Product-Specific Terms for NVIDIA AI Products
(found at https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/).

ERROR: This container was built for NVIDIA Driver Release 575.51 or later, but version 470.256.02 was detected and compatibility mode is UNAVAILABLE. [[System has unsupported display driver / cuda driver combination (CUDA_ERROR_SYSTEM_DRIVER_MISMATCH) cuInit()=803]]

Triton is running in SageMaker MME mode. Using Triton ping mode: "live"
I0702 08:56:47.173997 122 cache_manager.cc:480] "Create CacheManager with cache_dir: '/opt/tritonserver/caches'"
W0702 08:56:47.175318 122 pinned_memory_manager.cc:273] "Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version"
I0702 08:56:47.175355 122 cuda_memory_manager.cc:117] "CUDA memory pool disabled"
I0702 08:56:47.197207 122 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0702 08:56:47.197237 122 server.cc:631]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+
I0702 08:56:47.197249 122 server.cc:674]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+
I0702 08:56:47.197307 122 tritonserver.cc:2598]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.57.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /tmp/sagemaker                                                                                                                                                                                                  |
| model_control_mode               | MODE_EXPLICIT                                                                                                                                                                                                   |
| strict_model_config              | 0                                                                                                                                                                                                               |
| model_config_name                |                                                                                                                                                                                                                 |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0702 08:56:47.197848 122 sagemaker_server.cc:311] "Started Sagemaker HTTPService at 0.0.0.0:8080"
I0702 08:56:50.295119 122 sagemaker_server.cc:190] "SageMaker request: 0 /ping"
I0702 08:56:50.345408 122 sagemaker_server.cc:190] "SageMaker request: 0 /models"
I0702 08:56:50.345450 122 sagemaker_server.cc:228] "SageMaker request: LIST ALL MODELS"
I0702 08:56:55.251236 122 sagemaker_server.cc:190] "SageMaker request: 0 /ping"
I0702 08:57:00.251085 122 sagemaker_server.cc:190] "SageMaker request: 0 /ping"
```