
MPS in K8s does not work on Jetson AGX Orin #1412

@micturkey


Hi, I'm running Kubernetes with nvidia/k8s-device-plugin:v0.17.3 (the latest release) on a Jetson AGX Orin node with JetPack 6.2, and the MPS control daemon pods crash persistently when MPS is enabled.

I know that JetPack 6.1 added official support for MPS on Jetson, and I can confirm that MPS works correctly on my Jetson AGX Orin outside of containers. For example, I can start the MPS control daemon manually on the host with: nvidia-cuda-mps-control -d, without any issues.

However, when I deploy the device plugin with MPS enabled and replicas: 4, no GPU resources show up on the node at all.
In addition, when I check the plugin pods via: kubectl get pods -n nvidia-device-plugin -o wide, some of the nvidia-device-plugin-mps-control-daemon pods are in the CrashLoopBackOff state.
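
For anyone trying to reproduce the "no GPU resources" symptom, here is a minimal sketch of the check, assuming the node object has been exported with something like kubectl get node <node-name> -o json. The helper name nvidia_allocatable and the trimmed sample node are hypothetical and only for illustration; the exact resource name advertised under MPS (for example nvidia.com/gpu) depends on the plugin configuration, so the sketch simply lists every allocatable entry under the nvidia.com/ prefix.

```python
import json


def nvidia_allocatable(node: dict) -> dict:
    """Return the allocatable entries whose resource name starts with 'nvidia.com/'."""
    allocatable = node.get("status", {}).get("allocatable", {})
    return {name: qty for name, qty in allocatable.items()
            if name.startswith("nvidia.com/")}


# Hypothetical, trimmed node object standing in for real `kubectl ... -o json` output.
# It mimics the failing case: ordinary resources are present, no GPU entry is.
sample_node = json.loads(
    '{"status": {"allocatable": {"cpu": "12", "memory": "30Gi", "pods": "110"}}}'
)

found = nvidia_allocatable(sample_node)
if found:
    print("GPU resources advertised:", found)
else:
    print("no nvidia.com/* resources advertised on this node")
```

An empty result here matches what I see with MPS enabled; with time-slicing the same check would be expected to return a non-empty mapping.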

Here is the error log from the crashing container (mps-control-daemon-ctr):

E0904 09:35:53.754750 312 main.go:84] error starting plugins: error getting daemons: error building device map: error building device map from config.resources: error building GPU device map: error visiting device: error building Device: error getting device paths: error getting GPU device minor number: Not Supported
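
The message above is a chain of wrapped errors, so the innermost (rightmost) entry is the one that actually failed. A small sketch that splits the line into its layers makes this easier to read. It assumes the log header ends at the first "] " and that the layers are joined by ": "; the function name error_chain is hypothetical.

```python
def error_chain(log_line: str) -> list[str]:
    """Split a wrapped error message into its layers, outermost first."""
    # Drop the log header (timestamp, PID, file:line) up to the first "] ".
    # If no header is present, treat the whole line as the message.
    message = log_line.partition("] ")[2] or log_line
    return [part.strip() for part in message.split(": ")]


LOG_LINE = (
    "E0904 09:35:53.754750 312 main.go:84] error starting plugins: "
    "error getting daemons: error building device map: "
    "error building device map from config.resources: "
    "error building GPU device map: error visiting device: "
    "error building Device: error getting device paths: "
    "error getting GPU device minor number: Not Supported"
)

chain = error_chain(LOG_LINE)
for depth, layer in enumerate(chain):
    # Indent each layer to show the nesting.
    print("  " * depth + layer)
```

Read this way, the failure originates in the step that queries the GPU device minor number, which returns "Not Supported" on this device; every outer layer only propagates that error.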

To investigate further, I also tried switching to time-slicing instead of MPS, and that worked perfectly: GPU resources were reported correctly, and no pods crashed.

So I’d like to ask:

Is it currently feasible to share GPU resources via MPS on Jetson Orin devices in Kubernetes?

This approach works well on traditional x86 platforms with discrete GPUs (e.g., RTX 2080, A100, etc.), but I’m unsure whether MPS-based sharing is officially supported on Jetson-class embedded GPUs.

If this is currently unsupported, is there any workaround or future roadmap for enabling it?

Thanks in advance!
