Open
Description
The nvidia runtime is already set as the default runtime in containerd:
sudo crictl info | jq '.config.containerd.defaultRuntimeName'
"nvidia"
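The check above can be turned into a pass/fail test for use on several nodes. This is a minimal sketch: `check_default_runtime` is a hypothetical helper name, not part of crictl or the device plugin, and it only compares the string that the `jq` command above prints against an expected runtime name.

```shell
# Hypothetical helper: compare the runtime name reported by crictl/jq
# with the expected one. $1 = expected name, $2 = value printed by jq.
check_default_runtime() {
  expected=$1
  actual=$2
  # jq prints JSON strings with surrounding double quotes unless -r is used;
  # strip one trailing and one leading quote if present.
  actual=${actual%\"}
  actual=${actual#\"}
  if [ "$actual" = "$expected" ]; then
    echo "ok: default runtime is $actual"
    return 0
  fi
  echo "mismatch: default runtime is '$actual', expected '$expected'" >&2
  return 1
}

# Usage on a node (needs crictl and jq, as in the command above):
#   check_default_runtime nvidia \
#     "$(sudo crictl info | jq '.config.containerd.defaultRuntimeName')"
```

The function returns a non-zero status on mismatch, so it can gate a later `k apply` in a script.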
When I apply the v0.18.0 static manifest with k apply:
k apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.18.0/deployments/static/nvidia-device-plugin.yml
I get the following error: error starting plugins: error getting plugins: unable to create plugins: failed to construct resource managers: invalid device discovery strategy
k logs nvidia-device-plugin-daemonset-fzltg -n kube-system
I1022 18:32:39.566652 1 main.go:239] "Starting NVIDIA Device Plugin" version=<
3c9ffca9
commit: 3c9ffca9491f0d2d362a7064138dfcd71bb57592
>
I1022 18:32:39.566674 1 main.go:242] Starting FS watcher for /var/lib/kubelet/device-plugins
I1022 18:32:39.566692 1 main.go:249] Starting OS watcher.
I1022 18:32:39.566840 1 main.go:264] Starting Plugins.
I1022 18:32:39.566851 1 main.go:321] Loading configuration.
I1022 18:32:39.567169 1 main.go:346] Updating config with default resource matching patterns.
I1022 18:32:39.567245 1 main.go:357]
Running with config:
{
"version": "v1",
"flags": {
"migStrategy": "none",
"failOnInitError": true,
"mpsRoot": "",
"nvidiaDriverRoot": "/",
"nvidiaDevRoot": "/",
"gdrcopyEnabled": false,
"gdsEnabled": false,
"mofedEnabled": false,
"useNodeFeatureAPI": null,
"deviceDiscoveryStrategy": "auto",
"plugin": {
"passDeviceSpecs": false,
"deviceListStrategy": [
"envvar"
],
"deviceIDStrategy": "uuid",
"cdiAnnotationPrefix": "cdi.k8s.io/",
"nvidiaCTKPath": "/usr/bin/nvidia-ctk",
"containerDriverRoot": "/driver-root"
}
},
"resources": {
"gpus": [
{
"pattern": "*",
"name": "nvidia.com/gpu"
}
]
},
"sharing": {
"timeSlicing": {}
},
"imex": {}
}
I1022 18:32:39.567250 1 main.go:360] Retrieving plugins.
E1022 18:32:39.567304 1 factory.go:113] Incompatible strategy detected auto
E1022 18:32:39.567309 1 factory.go:114] If this is a GPU node, did you configure the NVIDIA Container Toolkit?
E1022 18:32:39.567311 1 factory.go:115] You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
E1022 18:32:39.567312 1 factory.go:116] You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
E1022 18:32:39.567314 1 factory.go:117] If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes
E1022 18:32:39.567388 1 main.go:177] error starting plugins: error getting plugins: unable to create plugins: failed to construct resource managers: invalid device discovery strategy
v0.17.4 works fine:
$ k apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.4/deployments/static/nvidia-device-plugin.yml
daemonset.apps/nvidia-device-plugin-daemonset created
$ kubectl get pods -n kube-system | grep nvidia
nvidia-device-plugin-daemonset-ss9lm 0/1 ContainerCreating 0 8s
$ kubectl get pods -n kube-system | grep nvidia
nvidia-device-plugin-daemonset-ss9lm 1/1 Running 0 12s
Log from v0.17.4:
$ k logs nvidia-device-plugin-daemonset-ss9lm -n kube-system
I1022 18:38:35.316289 1 main.go:235] "Starting NVIDIA Device Plugin" version=<
fd56a747
commit: fd56a747defe15333adce40fcd3a06ffb129251b
>
I1022 18:38:35.316319 1 main.go:238] Starting FS watcher for /var/lib/kubelet/device-plugins
I1022 18:38:35.316336 1 main.go:245] Starting OS watcher.
I1022 18:38:35.316431 1 main.go:260] Starting Plugins.
I1022 18:38:35.316442 1 main.go:317] Loading configuration.
I1022 18:38:35.316944 1 main.go:342] Updating config with default resource matching patterns.
I1022 18:38:35.317034 1 main.go:353]
Running with config:
{
"version": "v1",
"flags": {
"migStrategy": "none",
"failOnInitError": false,
"mpsRoot": "",
"nvidiaDriverRoot": "/",
"nvidiaDevRoot": "/",
"gdsEnabled": false,
"mofedEnabled": false,
"useNodeFeatureAPI": null,
"deviceDiscoveryStrategy": "auto",
"plugin": {
"passDeviceSpecs": false,
"deviceListStrategy": [
"envvar"
],
"deviceIDStrategy": "uuid",
"cdiAnnotationPrefix": "cdi.k8s.io/",
"nvidiaCTKPath": "/usr/bin/nvidia-ctk",
"containerDriverRoot": "/driver-root"
}
},
"resources": {
"gpus": [
{
"pattern": "*",
"name": "nvidia.com/gpu"
}
]
},
"sharing": {
"timeSlicing": {}
},
"imex": {}
}
I1022 18:38:35.317039 1 main.go:356] Retrieving plugins.
I1022 18:38:35.331421 1 server.go:195] Starting GRPC server for 'nvidia.com/gpu'
I1022 18:38:35.331842 1 server.go:139] Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
I1022 18:38:35.332662 1 server.go:146] Registered device plugin for 'nvidia.com/gpu' with Kubelet
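To compare the two runs side by side, the logs can be reduced to their error lines and their "Running with config" blocks. The sketch below makes some assumptions: `klog_errors` and `extract_config` are hypothetical helper names, the log file names are placeholders, and the extractor assumes the real log keeps nested braces indented, so that the first `}` at column 0 closes the config (the indentation is lost in the paste above). In the two configs quoted above, the visible differences are `failOnInitError` (true in v0.18.0, false in v0.17.4) and the additional `gdrcopyEnabled` field in v0.18.0; `deviceDiscoveryStrategy` is `auto` in both.

```shell
# Keep only error/fatal lines. klog prefixes each line with a severity
# letter (I, W, E, F) followed by the month and day, e.g. "E1022 ".
klog_errors() {
  grep -E '^[EF][0-9]{4} '
}

# Print the JSON block that follows "Running with config:".
# Assumption: the closing brace of the block is the first "}" at column 0.
extract_config() {
  awk '
    /^Running with config:/ { on = 1; next }   # start after the marker line
    on                      { print }          # print lines inside the block
    on && /^}/              { exit }           # stop at the closing brace
  '
}

# Usage (file names are placeholders; process substitution requires bash):
#   k logs nvidia-device-plugin-daemonset-fzltg -n kube-system > v0.18.0.log
#   klog_errors < v0.18.0.log
#   diff <(extract_config < v0.17.4.log) <(extract_config < v0.18.0.log)
```

Because both versions run from the same DaemonSet name, each log has to be saved to a file before the other version is applied.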
gilgameshfreedom