Issues: NVIDIA/k8s-dra-driver-gpu

Issues list

ComputeDomain: Daemonset pods are stuck in terminating forever [bug: Issue/PR to expose/discuss/fix a bug]
#380 opened May 30, 2025 by klueska
Build comprehensive CI pipeline to yield confidence for releasing [ci/testing: issue/PR related to CI and/or testing]
#369 opened May 14, 2025 by jgehrcke
Ensure compatibility with OpenShift Kubernetes Engine [ci/testing: issue/PR related to CI and/or testing]
#367 opened May 14, 2025 by jgehrcke
kubelet plugin: move MASK_NVIDIA_DRIVER_PARAMS into code [maintenance: issue/pr for maintenance, cleanup, refactor etc]
#366 opened May 14, 2025 by jgehrcke
GPUs: remove state associated with deleted ResourceClaims [robustness: issue/pr: edge cases & fault tolerance]
#365 opened May 14, 2025 by jgehrcke
ComputeDomain: do not require a priori node count for creation [feature: issue/PR that proposes a new feature or functionality]
#364 opened May 14, 2025 by jgehrcke
GPU sharing: revisit MPS support (change semantics of config, and daemon control) [feature: issue/PR that proposes a new feature or functionality]
#362 opened May 14, 2025 by jgehrcke
GPU sharing: support Dynamic MIG (using DRA partitionable devices) [feature: issue/PR that proposes a new feature or functionality]
#361 opened May 14, 2025 by jgehrcke
GPUs: take device offline when unhealthy (build logic in go-nvlib) [feature: issue/PR that proposes a new feature or functionality] [robustness: issue/pr: edge cases & fault tolerance]
#360 opened May 14, 2025 by jgehrcke
Docs for general installation process [documentation: Issue/PR focused on fixing/editing/adding documentation bits] [question: Categorizes issue or PR as a support question.]
#356 opened May 13, 2025 by fracappa
New channels do not appear in ResourceSlices [question: Categorizes issue or PR as a support question.]
#354 opened May 12, 2025 by robertdavidsmith
ComputeDomain: support more than one domain per node, with subsets of GPUs [feature: issue/PR that proposes a new feature or functionality]
#353 opened May 12, 2025 by jgehrcke
ComputeDomain: explore exposing Prometheus metrics [debuggability: issue/pr related to the ability to debug the system]
#352 opened May 12, 2025 by jgehrcke
ComputeDomain: add current state (health) and debug info to event stream/conditionals [debuggability: issue/pr related to the ability to debug the system]
#350 opened May 12, 2025 by jgehrcke
ComputeDomain: add support for elastic workloads [feature: issue/PR that proposes a new feature or functionality]
#349 opened May 12, 2025 by jgehrcke
ComputeDomain: add support for node failure (follow workload as it is re-scheduled) [feature: issue/PR that proposes a new feature or functionality] [robustness: issue/pr: edge cases & fault tolerance]
#348 opened May 12, 2025 by jgehrcke
Expose kubelet plugin socket path(s) as configuration parameter(s) [config: issue/PR about user-facing configuration interface]
#339 opened May 5, 2025 by jgehrcke
DRA Admin Access integration [feature: issue/PR that proposes a new feature or functionality]
#337 opened Apr 29, 2025 by ritazh
Could we please add a guide to help others to install in a k8s cluster ? [question: Categorizes issue or PR as a support question.]
#336 opened Apr 29, 2025 by kangclzjc
Make container log verbosity configurable / dynamic [feature: issue/PR that proposes a new feature or functionality]
#335 opened Apr 26, 2025 by jgehrcke
CUDA_MPS_PINNED_DEVICE_MEM_LIMIT is not set when using MPS [bug: Issue/PR to expose/discuss/fix a bug]
#319 opened Apr 16, 2025 by harche
First-class the use case of multiple pods per node in a specific ComputeDomain [feature: issue/PR that proposes a new feature or functionality]
#309 opened Mar 28, 2025 by jgehrcke
CD controller pod: hard requirement on control-plane label? [question: Categorizes issue or PR as a support question.]
#308 opened Mar 28, 2025 by jgehrcke