
[bug] CUDA driver version is insufficient for CUDA runtime version (Sagemaker Triton) #4990

@valibojici


Concise Description:
The SageMaker Triton image version 25.04 requires a newer NVIDIA driver than the one installed on the host, so the CUDA runtime cannot initialize. The problem occurs when deploying on SageMaker (multi-model endpoint with GPU).

CUDA Forward Compatibility mode is probably disabled or missing in this image; the log reports it as UNAVAILABLE.

From logs:

This container was built for NVIDIA Driver Release 575.51 or later, but version 470.256.02 was detected and compatibility mode is UNAVAILABLE
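For anyone triaging similar endpoints, the banner line above can be picked apart programmatically. The following is a minimal sketch, assuming the wording of the message stays exactly as quoted in this issue; the function name `parse_driver_error` and the field names are hypothetical and not part of Triton or SageMaker.

```python
import re

# Pattern derived only from the banner line quoted in this issue;
# other container releases may word the message differently.
DRIVER_ERROR_RE = re.compile(
    r"built for NVIDIA Driver Release (?P<required>[\d.]+) or later, "
    r"but version (?P<detected>[\d.]+) was detected "
    r"and compatibility mode is (?P<compat>\w+)"
)


def parse_driver_error(line):
    """Return required/detected driver versions and compat state, or None."""
    match = DRIVER_ERROR_RE.search(line)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    banner = (
        "This container was built for NVIDIA Driver Release 575.51 or later, "
        "but version 470.256.02 was detected and compatibility mode is UNAVAILABLE"
    )
    print(parse_driver_error(banner))
```

The pattern uses `search` rather than `match`, so it also works on the full log line that starts with `ERROR:`.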

Driver requirements from NVIDIA:

Release 25.04 is based on CUDA 12.9.0 which requires NVIDIA Driver release 575 or later. However, if you are running on a data center GPU (for example, T4 or any other data center GPU), you can use NVIDIA driver release 470.57 (or later R470), 525.85 (or later R525), 535.86 (or later R535), or 545.23 (or later R545).
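The rule in that release note can be written down as a small check, which makes it easier to see that the detected driver (470.256.02) falls inside the listed R470 range on a data center GPU. This is a sketch that encodes only the sentence quoted above; the names `driver_is_listed` and `DATA_CENTER_BRANCH_MINIMUMS` are hypothetical, and branches not mentioned in the quote are treated as not listed.

```python
# Minimum driver for release 25.04 (CUDA 12.9.0), per the quoted note.
MIN_DRIVER = (575,)

# Older branches the note lists for data center GPUs, keyed by major version.
DATA_CENTER_BRANCH_MINIMUMS = {
    470: (470, 57),
    525: (525, 85),
    535: (535, 86),
    545: (545, 23),
}


def parse_version(text):
    """Turn '470.256.02' into (470, 256, 2) for tuple comparison."""
    return tuple(int(part) for part in text.split("."))


def driver_is_listed(version_text, data_center_gpu):
    """True if the driver version satisfies the quoted requirement."""
    version = parse_version(version_text)
    if version >= MIN_DRIVER:
        return True
    if not data_center_gpu:
        return False
    minimum = DATA_CENTER_BRANCH_MINIMUMS.get(version[0])
    return minimum is not None and version >= minimum


if __name__ == "__main__":
    print(driver_is_listed("470.256.02", data_center_gpu=True))
    print(driver_is_listed("470.256.02", data_center_gpu=False))
```

Being listed here only means the note names the driver as usable on a data center GPU; it does not by itself show that the compatibility libraries are present in the image.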

DLC image/dockerfile:
[account].dkr.ecr.[region].amazonaws.com/sagemaker-tritonserver:25.04-py3

Current behavior:
Triton server starts without CUDA support (the pinned memory pool and the CUDA memory pool are disabled), so GPU inference is not available

Expected behavior:
Triton server starts and GPU is available for inference

Expected logs:

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 12.4 driver version 550.54.14 with kernel driver version 470.256.02.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

Additional context:

Logs:

2025-07-02T11:56:45.343+03:00 =============================
2025-07-02T11:56:45.343+03:00 == Triton Inference Server ==
2025-07-02T11:56:45.343+03:00 =============================
2025-07-02T11:56:45.343+03:00 NVIDIA Release 25.04 (build <unknown>)
2025-07-02T11:56:45.343+03:00 Triton Server Version 2.57.0
2025-07-02T11:56:45.343+03:00 Copyright (c) 2018-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2025-07-02T11:56:45.343+03:00 Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2025-07-02T11:56:45.343+03:00 GOVERNING TERMS: The software and materials are governed by the NVIDIA Software License Agreement
2025-07-02T11:56:45.343+03:00 (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/)
2025-07-02T11:56:45.343+03:00 and the Product-Specific Terms for NVIDIA AI Products
2025-07-02T11:56:45.343+03:00 (found at https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/).
2025-07-02T11:56:47.239+03:00 ERROR: This container was built for NVIDIA Driver Release 575.51 or later, but version 470.256.02 was detected and compatibility mode is UNAVAILABLE. [[System has unsupported display driver / cuda driver combination (CUDA_ERROR_SYSTEM_DRIVER_MISMATCH) cuInit()=803]]
2025-07-02T11:56:47.239+03:00 Triton is running in SageMaker MME mode. Using Triton ping mode: "live"
2025-07-02T11:56:47.239+03:00 I0702 08:56:47.173997 122 cache_manager.cc:480] "Create CacheManager with cache_dir: '/opt/tritonserver/caches'"
2025-07-02T11:56:47.239+03:00 W0702 08:56:47.175318 122 pinned_memory_manager.cc:273] "Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version"
2025-07-02T11:56:47.239+03:00 I0702 08:56:47.175355 122 cuda_memory_manager.cc:117] "CUDA memory pool disabled"
2025-07-02T11:56:47.239+03:00 I0702 08:56:47.197207 122 server.cc:604]
2025-07-02T11:56:47.239+03:00 +------------------+------+
2025-07-02T11:56:47.239+03:00 | Repository Agent | Path |
2025-07-02T11:56:47.239+03:00 +------------------+------+
2025-07-02T11:56:47.239+03:00 +------------------+------+
2025-07-02T11:56:47.239+03:00 I0702 08:56:47.197237 122 server.cc:631]
2025-07-02T11:56:47.239+03:00 +---------+------+--------+
2025-07-02T11:56:47.239+03:00 | Backend | Path | Config |
2025-07-02T11:56:47.239+03:00 +---------+------+--------+
2025-07-02T11:56:47.239+03:00 +---------+------+--------+
2025-07-02T11:56:47.239+03:00 I0702 08:56:47.197249 122 server.cc:674]
2025-07-02T11:56:47.239+03:00 +-------+---------+--------+
2025-07-02T11:56:47.239+03:00 | Model | Version | Status |
2025-07-02T11:56:47.239+03:00 +-------+---------+--------+
2025-07-02T11:56:47.239+03:00 +-------+---------+--------+
2025-07-02T11:56:47.239+03:00 I0702 08:56:47.197307 122 tritonserver.cc:2598]
2025-07-02T11:56:47.239+03:00 +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2025-07-02T11:56:47.239+03:00 | Option | Value |
2025-07-02T11:56:47.239+03:00 +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2025-07-02T11:56:47.239+03:00 | server_id | triton |
2025-07-02T11:56:47.239+03:00 | server_version | 2.57.0 |
2025-07-02T11:56:47.239+03:00 | server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
2025-07-02T11:56:47.239+03:00 | model_repository_path[0] | /tmp/sagemaker |
2025-07-02T11:56:47.239+03:00 | model_control_mode | MODE_EXPLICIT |
2025-07-02T11:56:47.239+03:00 | strict_model_config | 0 |
2025-07-02T11:56:47.239+03:00 | model_config_name | |
2025-07-02T11:56:47.239+03:00 | rate_limit | OFF |
2025-07-02T11:56:47.239+03:00 | pinned_memory_pool_byte_size | 268435456 |
2025-07-02T11:56:47.239+03:00 | min_supported_compute_capability | 6.0 |
2025-07-02T11:56:47.239+03:00 | strict_readiness | 1 |
2025-07-02T11:56:47.239+03:00 | exit_timeout | 30 |
2025-07-02T11:56:47.239+03:00 | cache_enabled | 0 |
2025-07-02T11:56:47.239+03:00 +----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2025-07-02T11:56:50.634+03:00 I0702 08:56:47.197848 122 sagemaker_server.cc:311] "Started Sagemaker HTTPService at 0.0.0.0:8080"
2025-07-02T11:56:50.634+03:00 I0702 08:56:50.295119 122 sagemaker_server.cc:190] "SageMaker request: 0 /ping"
2025-07-02T11:56:50.634+03:00 I0702 08:56:50.345408 122 sagemaker_server.cc:190] "SageMaker request: 0 /models"
2025-07-02T11:56:55.267+03:00 I0702 08:56:50.345450 122 sagemaker_server.cc:228] "SageMaker request: LIST ALL MODELS"
2025-07-02T11:56:59.343+03:00 I0702 08:56:55.251236 122 sagemaker_server.cc:190] "SageMaker request: 0 /ping"
2025-07-02T11:57:04.389+03:00 I0702 08:57:00.251085 122 sagemaker_server.cc:190] "SageMaker request: 0 /ping"
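Because the server keeps answering `/ping` even though CUDA failed to initialize, the warning is easy to miss in a long log. A sketch like the one below could pull out only the warning and error records. It assumes the record layout seen in this log (severity letter plus month and day, time, thread or process id, source file and line, then a quoted message); the helper names `parse_record` and `problem_messages` are hypothetical.

```python
import re

# Layout inferred from the records in this log, e.g.
# W0702 08:56:47.175318 122 pinned_memory_manager.cc:273] "message"
RECORD_RE = re.compile(
    r"(?:^|\s)(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+) +(?P<tid>\d+) "
    r"(?P<source>[\w.]+):(?P<line>\d+)\] ?(?P<message>.*)$"
)


def parse_record(line):
    """Parse one log line into a dict of fields, or return None."""
    match = RECORD_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["message"] = record["message"].strip('"')
    return record


def problem_messages(lines):
    """Collect messages whose severity is warning, error, or fatal."""
    found = []
    for line in lines:
        record = parse_record(line)
        if record is not None and record["severity"] in "WEF":
            found.append(record["message"])
    return found


if __name__ == "__main__":
    sample = [
        'I0702 08:56:47.175355 122 cuda_memory_manager.cc:117] "CUDA memory pool disabled"',
        'W0702 08:56:47.175318 122 pinned_memory_manager.cc:273] "Unable to allocate pinned system memory"',
    ]
    print(problem_messages(sample))
```

The pattern is not anchored to the start of the line, so it tolerates a CloudWatch timestamp in front of the record.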
