New channels do not appear in ResourceSlices #354

Open

robertdavidsmith opened this issue May 12, 2025 · 5 comments
Labels
question Categorizes issue or PR as a support question.

Comments

@robertdavidsmith

robertdavidsmith commented May 12, 2025

Hi,

Currently, if you create new channels (e.g. use mknod to create files such as /dev/nvidia-caps-imex-channels/channel3), they don't show up in the ResourceSlices. This is true even after restarting all DRA driver pods. Do you have any plans to fix this? Alternatively, would you accept a PR that fixes it?

Thanks,

Rob

@jgehrcke added the question label on May 12, 2025
@jgehrcke
Collaborator

jgehrcke commented May 12, 2025

Hello, Rob!

if you create new channels (e.g. use mknod to create files such as /dev/nvidia-caps-imex-channels/channel3), they don't show up in the ResourceSlices

That is expected.

The ComputeDomain construct manages IMEX channels (and IMEX daemons, for that matter) under the hood. With the ComputeDomain primitive, we can treat anything-IMEX as an implementation detail. That is, as a user one would never go in and "create an IMEX channel". Orchestrating IMEX primitives is the responsibility of the ComputeDomain logic/implementation and generally one should not interfere with that.
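
For illustration, a minimal ComputeDomain object looks roughly like the sketch below (placeholder names; the field layout follows the current upstream examples and may differ between driver versions):

```yaml
# Sketch of a ComputeDomain resource (hypothetical names; verify field
# names against the examples shipped with your driver version).
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: my-compute-domain
spec:
  # Number of nodes the workload is expected to span.
  numNodes: 2
  channel:
    resourceClaimTemplate:
      # The driver creates a ResourceClaimTemplate with this name; pods
      # reference it to get the single IMEX channel (channel 0) injected.
      name: my-compute-domain-channel
```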

Is your motivation to offer more than one IMEX channel per single ComputeDomain? If so: what use case do you have in mind for that?

Currently -- by design -- one ComputeDomain is backed by precisely one IMEX channel (we picked channel zero for that). Also by current design, there is no further sub-division within one ComputeDomain (processes associated with one ComputeDomain are meant to see each other via that single shared IMEX channel).

The ComputeDomain concept is still in its infancy and we are certainly looking forward to making it more flexible and robust and powerful in the future. For example, we might be looking into using different channels as part of supporting more than one ComputeDomain per node (#353).

@robertdavidsmith
Author

robertdavidsmith commented May 13, 2025

Hello @jgehrcke,

Thanks for your comment, understood.

The use case is that we have multiple namespaces which may be running jobs at once. For example, we may have two 32-GPU jobs and two 2-GPU jobs, from four different namespaces, running on one NVL72. Ideally there would be a security boundary between the namespaces.

Being new to IMEX, I was thinking of making either a ComputeDomain per namespace, or a ComputeDomain per job. Then I wanted to do the following for each namespace (there will be ~100 namespaces).

I now understand the above won't work for 2-GPU jobs sharing a k8s node. Would it even work if the smallest job took a whole 4-GPU node?

What would you recommend doing to implement separation between namespaces? If we just put all namespaces on one IMEX channel, how big a security concern is this?

Thanks,

Rob

@robertdavidsmith
Author

(Note also the closely related #351.)

@jgehrcke
Collaborator

jgehrcke commented May 16, 2025

The use case is that we have multiple namespaces which may be running jobs at once [...] Ideally there would be a security boundary between the namespaces.

Perfect. The ComputeDomain (CD) primitive exists precisely to provide that security boundary. The security isolation between jobs in different CDs in different namespaces is strong. That is our ambition.

Being new to IMEX, I was thinking of making either a ComputeDomain per namespace, or a ComputeDomain per job.

A CD is really meant to be tied to a specific workload (to "one job").

Our idea is for a ComputeDomain to form around a workload on the fly.

This magic for automatic creation and teardown of a CD is enabled by the ResourceClaimTemplate approach as shown in this example (note how the pod spec refers to a resourceClaimTemplate, which is also defined in the same YAML document).
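
Roughly, the pattern in that example looks like the following sketch (placeholder names; the pod-level resourceClaims syntax assumes a recent Kubernetes DRA release, so refer to the linked example for the exact, tested manifest):

```yaml
# Sketch only (hypothetical names). The ComputeDomain provides a
# ResourceClaimTemplate; the pod spec in the same YAML document
# references it.
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: my-job-cd
spec:
  numNodes: 2
  channel:
    resourceClaimTemplate:
      name: my-job-cd-channel
---
apiVersion: v1
kind: Pod
metadata:
  name: my-job-worker
spec:
  containers:
  - name: worker
    image: my-workload-image   # placeholder
    resources:
      claims:
      - name: imex-channel     # refers to the pod-level claim below
  resourceClaims:
  - name: imex-channel
    # Recent Kubernetes DRA syntax; older releases nest this under a
    # "source" field.
    resourceClaimTemplateName: my-job-cd-channel
```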

In that case, the CD is formed automatically and dynamically around the job (the k8s pods). That implies forming a short-lived, single-channel IMEX domain under the hood, which is properly torn down upon job completion.

Then I wanted to do the following for each namespace (there will be ~100 namespaces): [...]

I believe and hope that none of what you wrote next is actually required! :)

Once the CD is formed, the containers that use it (across pods and across nodes) all have IMEX channel 0 injected and can use it.

What would you recommend doing to implement separation between namespaces? If we just put all namespaces on one imex channel how big a security concern is this?

The general idea is that when you use ComputeDomains as intended, isolation is done for you.

Next, let me respond a little more in-depth about the relationship between CDs and k8s namespaces, and about actual security.

GPU memory can only be shared among containers (via NVLink/IMEX) when those containers are all within the same CD. For clarity:

  • When they are in the same CD, they automatically have access to a shared IMEX channel.
  • When they are not in the same CD, there is no shared IMEX channel (there might be physical NVLink connectivity, but it cannot be misused; the lack of IMEX connectivity guarantees that).

A user that has access to the k8s namespace that a job and CD are deployed in can, of course, inject their own workloads into that namespace and thereby access the GPU memory of that job.

That is, to enforce the boundary that a CD provides (from an actual security perspective), one needs to make sure that a bad actor does not have access to the same k8s namespace that the (to-be-secured) job and CD are deployed in.

So, your plan to have many namespaces to isolate users and jobs from each other is exactly in alignment with our security/threat model.

In other words:

  • CDs in separate namespaces provide actual security isolation -- a user that has access to one k8s namespace and runs jobs in that namespace cannot reach into GPU memory shared within an IMEX domain belonging to a CD deployed in a different k8s namespace.
  • CDs in the same namespace provide "don't step onto each other's toes" security (often very useful, too).

@robertdavidsmith
Author

robertdavidsmith commented May 19, 2025

Thank you for your very detailed reply.

We will create a ComputeDomain per job, as you suggest. This could be done by a small Armada code change or by a k8s controller. Then, as you say, IMEX channel 0 is enough, and we can close this ticket.

Would you also recommend additional security measures, such as the following?

  • Using a Cilium network policy to block pod access to the IMEX SERVER_PORT (50000)? (A rough sketch follows after this list.)
  • Configuring IMEX_ENABLE_AUTH_ENCRYPTION in the nvidia-imex config?
  • Configuring Kerberos in the nvidia-imex config?
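
For the first bullet, something along the lines of the sketch below is what I had in mind (assuming the nvidia-imex daemons listen on TCP 50000 on the node network; selectors and deny semantics would need validating against our Cilium setup):

```yaml
# Sketch only: assumes IMEX daemons listen on TCP 50000 on the host
# network; adjust selectors/entities to the actual deployment.
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: deny-imex-port-from-pods
spec:
  endpointSelector: {}   # all workload pods; narrow with labels if needed
  egressDeny:
  - toEntities:
    - host
    - remote-node
    toPorts:
    - ports:
      - port: "50000"
        protocol: TCP
```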

Thanks,

Rob
