Issues: NVIDIA/k8s-dra-driver-gpu
ComputeDomain: Daemonset pods are stuck in terminating forever
bug
Issue/PR to expose/discuss/fix a bug
#380
opened May 30, 2025 by
klueska
Build comprehensive CI pipeline to yield confidence for releasing
ci/testing
issue/PR related to CI and/or testing
#369
opened May 14, 2025 by
jgehrcke
Ensure compatibility with OpenShift Kubernetes Engine
ci/testing
issue/PR related to CI and/or testing
#367
opened May 14, 2025 by
jgehrcke
kubelet plugin: move MASK_NVIDIA_DRIVER_PARAMS into code
maintenance
issue/pr for maintenance, cleanup, refactor etc
#366
opened May 14, 2025 by
jgehrcke
GPUs: remove state associated with deleted ResourceClaims
robustness
issue/pr: edge cases & fault tolerance
#365
opened May 14, 2025 by
jgehrcke
ComputeDomain: do not require a priori node count for creation
feature
issue/PR that proposes a new feature or functionality
#364
opened May 14, 2025 by
jgehrcke
GPU sharing: fix time-slicing config for old devices (Tesla P4)
#363
opened May 14, 2025 by
jgehrcke
GPU sharing: revisit MPS support (change semantics of config, and daemon control)
feature
issue/PR that proposes a new feature or functionality
#362
opened May 14, 2025 by
jgehrcke
GPU sharing: support Dynamic MIG (using DRA partitionable devices)
feature
issue/PR that proposes a new feature or functionality
#361
opened May 14, 2025 by
jgehrcke
GPUs: take device offline when unhealthy (build logic in go-nvlib)
feature
issue/PR that proposes a new feature or functionality
robustness
issue/pr: edge cases & fault tolerance
#360
opened May 14, 2025 by
jgehrcke
Docs for general installation process
documentation
Issue/PR focused on fixing/editing/adding documentation bits
question
Categorizes issue or PR as a support question.
#356
opened May 13, 2025 by
fracappa
New channels do not appear in ResourceSlices
question
Categorizes issue or PR as a support question.
#354
opened May 12, 2025 by
robertdavidsmith
ComputeDomain: support more than one domain per node, with subsets of GPUs
feature
issue/PR that proposes a new feature or functionality
#353
opened May 12, 2025 by
jgehrcke
ComputeDomain: explore exposing Prometheus metrics
debuggability
issue/pr related to the ability to debug the system
#352
opened May 12, 2025 by
jgehrcke
ComputeDomain: add current state (health) and debug info to event stream/conditionals
debuggability
issue/pr related to the ability to debug the system
#350
opened May 12, 2025 by
jgehrcke
ComputeDomain: add support for elastic workloads
feature
issue/PR that proposes a new feature or functionality
#349
opened May 12, 2025 by
jgehrcke
ComputeDomain: add support for node failure (follow workload as it is re-scheduled)
feature
issue/PR that proposes a new feature or functionality
robustness
issue/pr: edge cases & fault tolerance
#348
opened May 12, 2025 by
jgehrcke
Expose kubelet plugin socket path(s) as configuration parameter(s)
config
issue/PR about user-facing configuration interface
#339
opened May 5, 2025 by
jgehrcke
DRA Admin Access integration
feature
issue/PR that proposes a new feature or functionality
#337
opened Apr 29, 2025 by
ritazh
Could we please add a guide to help others to install in a k8s cluster?
question
Categorizes issue or PR as a support question.
#336
opened Apr 29, 2025 by
kangclzjc
Make container log verbosity configurable / dynamic
feature
issue/PR that proposes a new feature or functionality
#335
opened Apr 26, 2025 by
jgehrcke
ComputeDomain kubelet plugin: kubelet restart may permanently delete resource slice
bug
Issue/PR to expose/discuss/fix a bug
#330
opened Apr 25, 2025 by
jgehrcke
CUDA_MPS_PINNED_DEVICE_MEM_LIMIT is not set when using MPS
bug
Issue/PR to expose/discuss/fix a bug
#319
opened Apr 16, 2025 by
harche
First-class the use case of multiple pods per node in a specific ComputeDomain
feature
issue/PR that proposes a new feature or functionality
#309
opened Mar 28, 2025 by
jgehrcke
CD controller pod: hard requirement on control-plane label?
question
Categorizes issue or PR as a support question.
#308
opened Mar 28, 2025 by
jgehrcke