Description
I faced a problem with resource exposure from a GPU with MIG enabled: from the perspective of k8s-device-plugin, there is no difference between 1g.10gb and 1g.10gb+me GPU instances.
Actual behavior
Here is my mig-manager config file:
```yaml
config19-config19:
  - devices: [0]
    mig-devices: &id001
      1g.10gb: 5
      1g.10gb+me: 1
      1g.20gb: 1
    mig-enabled: true
  - devices: [1]
    mig-devices: *id001
    mig-enabled: true
```

When I applied this config on my node with two A100 80GB GPUs attached, the device plugin exposed these resources:
```yaml
kind: Node
status:
  capacity:
    ...
    nvidia.com/gpu: "0"
    nvidia.com/mig-1g.10gb: "12"  # (5 x 1g.10gb + 1 x 1g.10gb+me) * 2
    nvidia.com/mig-1g.20gb: "2"   # 1 x 1g.20gb * 2
    ...
```

As you can see, there is no difference between the 1g.10gb and 1g.10gb+me GPU instances.
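Per-profile resource names like `nvidia.com/mig-1g.10gb` only appear when the plugin runs with its mixed MIG strategy, so that is presumably the setup here. A minimal sketch of the corresponding device-plugin config (verify the exact field layout against the version you deploy):

```yaml
# k8s-device-plugin config: expose each MIG profile as its own resource
version: v1
flags:
  migStrategy: mixed
```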
Desired behavior
I would like to see this:
```yaml
kind: Node
status:
  capacity:
    ...
    nvidia.com/gpu: "0"
    nvidia.com/mig-1g.10gb: "10"    # 5 x 1g.10gb * 2
    nvidia.com/mig-1g.10gb.me: "2"  # 1 x 1g.10gb+me * 2
    nvidia.com/mig-1g.20gb: "2"     # 1 x 1g.20gb * 2
    ...
```

Why do I need this
I want to run a Pod that must land on an ME GPU instance only, but I don't think that's currently possible. I can neither request it as an nvidia.com/mig-1g.10gb.me resource nor express it with nodeAffinity rules, since affinity rules only let me target the desired node, not a specific device on that node.
How can I achieve this? Maybe there are k8s-device-plugin config flags or something else?
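For illustration, if the plugin did distinguish the profiles, a Pod could pin itself to an ME instance through an ordinary extended-resource request. Note that `+` is not a legal character in a Kubernetes resource name, so some mapping such as `.me` would be needed; the resource name (and the image) below are hypothetical examples, not something the plugin exposes today:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: me-workload
spec:
  containers:
    - name: app
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # example image only
      resources:
        limits:
          nvidia.com/mig-1g.10gb.me: 1  # hypothetical resource name
```

The scheduler would then place the Pod only on nodes advertising that resource, and the device plugin would hand the container exactly an ME-enabled instance, which is what nodeAffinity alone cannot guarantee.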