@mu8086 commented Jul 23, 2025

- Allow configuring a custom `runtimeClass.name` to avoid conflicts with NVIDIA's default runtime
- The topology server namespace used by `nvidia-smi` is now configurable via the `TOPOLOGY_CM_NAMESPACE` environment variable instead of being hardcoded to `gpu-operator`
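The namespace-resolution behavior described above can be sketched as a small shell helper: use `TOPOLOGY_CM_NAMESPACE` when it is set, otherwise fall back to the previously hardcoded default. The helper name is hypothetical; only the variable name and the `gpu-operator` default come from this PR.

```shell
# topology_namespace: hypothetical sketch of the lookup logic.
# Prefer TOPOLOGY_CM_NAMESPACE; fall back to the old hardcoded default.
topology_namespace() {
  echo "${TOPOLOGY_CM_NAMESPACE:-gpu-operator}"
}

# With the variable unset, the old default is returned;
# setting it (as the operator now does) overrides the default.
topology_namespace
TOPOLOGY_CM_NAMESPACE=runai topology_namespace
```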

### Example Helm upgrade command

```bash
helm upgrade --install fake-gpu-operator ~/git/fake-gpu-operator/deploy/fake-gpu-operator \
  --namespace runai --create-namespace \
  --set runtimeClass.name=fake-nvidia
```
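For the pod below to schedule, a `RuntimeClass` object with the overridden name must exist in the cluster; the chart is expected to render one from `runtimeClass.name`. A minimal sketch of what that object would look like — the `handler` value is an assumption, not taken from this PR:

```yaml
# Sketch of the RuntimeClass the chart would render for
# runtimeClass.name=fake-nvidia (handler value is assumed).
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: fake-nvidia
handler: fake-nvidia
```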

### Example: verified Pod spec

The Pod below verifies that the custom runtimeClass and the dynamic topology namespace injection work correctly.

```yaml
# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: "1"
spec:
  runtimeClassName: fake-nvidia
  containers:
  - name: ubuntu
    image: ubuntu:22.04
    command: ["/bin/bash", "-c"]
    args:
      - |
        sleep infinity;
    resources:
      limits:
        nvidia.com/gpu: 1
    env:
      - name: NODE_NAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
```
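The spec above only injects `NODE_NAME` explicitly; the topology namespace is expected to reach the container via the operator. A hedged sketch of the env entry the operator would inject so the fake `nvidia-smi` knows where the topology ConfigMap lives — the shape and value are assumptions based on this PR's description:

```yaml
# Assumed env entry injected by the operator (not part of pod.yaml):
env:
  - name: TOPOLOGY_CM_NAMESPACE
    value: runai
```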

@mu8086 changed the title to "feat: support custom runtimeClass and topology namespace" on Jul 23, 2025