KEP-5328: Node Capabilities #5347

pravk03 · 2025-05-28T00:45:56Z

One-line PR description: Add the initial KEP for KEP 5328: Node Capabilities

Issue link: Node Capabilities #5328

Other comments:

k8s-ci-robot · 2025-05-28T00:46:05Z

Welcome @pravk03!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2025-05-28T00:46:06Z

Hi @pravk03. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

wojtek-t

@dom4ha @sanposhiho @macsko - FYI

pravk03 · 2025-05-29T00:25:30Z

/cc @tallclair @yujuhong

sanposhiho · 2025-05-29T20:59:40Z

/sig scheduling

SergeyKanzhelev · 2025-06-18T19:36:18Z

> > Can we have any examples listed that will justify this? Right now the KEP suggests using it for FG-related capabilities, while not giving good examples where it would be non-FG related.
>
> The guaranteedQOSPodCPUResize example used in the KEP isn't purely a feature gate; it's a logical capability derived from a combination of feature gates and the Kubelet's cpuManagerPolicy configuration.
>
> While this is still in early stages, this recent discussion about making the pod requirement for exclusive resources more explicit also indicates a need for non-FG capabilities. The API field itself should be forward-facing enough to support such potential use-cases?

Those are all examples of FG-related capabilities, not generic long-term capabilities.

tallclair · 2025-06-18T23:35:00Z

It seems like most of the concerns with this are around the specific capabilities being added, but this KEP doesn't actually propose adding any capabilities. The examples given are hypothetical examples based on features currently in development, but no new features will be able to depend on capabilities until it goes to beta. This creates a bit of a chicken-and-egg situation, where it's hard to point to exactly how capabilities will be used until we have users lined up, but we can't line up users yet.

SergeyKanzhelev · 2025-06-18T23:41:44Z

> It seems like most of the concerns with this are around the specific capabilities being added, but this KEP doesn't actually propose adding any capabilities. The examples given are hypothetical examples based on features currently in development, but no new features will be able to depend on capabilities until it goes to beta. This creates a bit of a chicken-and-egg situation, where it's hard to point to exactly how capabilities will be used until we have users lined up, but we can't line up users yet.

We kind of need to know what the expected use cases will be. Maybe past examples or hypothetical examples thought through end-to-end. Right now this KEP is limited to just a set of name/value pairs and a scenario of FG discoverability. But we are already thinking there MAY be a need to support capabilities for node selection, the ability to declare tolerations for capabilities, and the ability to have node-restricted capabilities. Knowing the scope would help us understand whether the proposed API is needed (or whether alternatives suffice if the set of use cases is limited) and, if it is needed, what shape it should have.

keps/sig-node/5328-node-capabilities/README.md

pravk03 · 2025-06-19T17:25:49Z

> Maybe past examples or hypothetical examples thought through end-to-end

RuntimeClass was intended as a past example to illustrate non-FG-related runtime capabilities in the earlier version of the proposal. I agree that it was missing some details; thanks for highlighting them in your comment.

> Runtime handlers as a list of handlers is also not a good fit. The default handler runc is not specified in the pod spec, so it will not be used by the scheduler and by definition must not be added to capabilities. Non-default handlers may need more details on what they are. And the list of names may not fit into the value length limits. A special object representing the runtime is a better choice here.

I have tried to address these in the Case Study section.

tallclair · 2025-06-20T16:31:16Z

> Maybe past examples or hypothetical examples thought through end-to-end. Right now this KEP is limited to just set of name/value pairs and a scenario of FG discoverability.

I feel like we've discussed these options in depth already. Yes, these are all somewhat hypothetical because we've had to work around them in other ways. I'm sure we can dig up more examples from past KEPs, but is that necessary?

Capabilities that are not limited to just feature gates:

- swap enabled
- static CPU / memory manager enabled
- user namespace support

Feature gate capabilities:

- pod-level resources
- TLS for gRPC probes
- in-place resize (+IPPR for pod-level resources, IPPR for static CPU assignment, etc.)

> But already we are thinking there MAY be need to support capabilities for node selection, ability to declare tolerations for capabilities,

Not sure what node selection means, but we've explicitly said tolerations are out of scope.

> ability to have node-restricted capabilities.

Where did this come in? Capabilities are just added by the node, so I'm not sure what this would even mean.

pravk03 · 2025-06-20T19:17:13Z

We discussed this KEP today and decided to reconsider it for the 1.35 release cycle. The primary reason is to get input from sig-arch on using this capability-based framework as a general strategy for managing version skew.

A few more things were discussed that could be refined in the proposal:

- Evaluate the strategy for managing capabilities with a bounded lifetime. Define a clear lifecycle and deprecation path for capabilities tied to features that graduate to GA.
- We would need a better use case to consider long-term capabilities in scope. It can be considered a future enhancement once a clear use case arises.
- Further explore SemVer-based filtering in Node Selectors as a potential alternative.

cc @tallclair @SergeyKanzhelev @dchen1107 @yujuhong

pravk03 · 2025-07-01T23:54:12Z

This proposal was discussed in the SIG-Arch community meeting on June 26th (recording). It was generally seen as a beneficial strategy for managing version skew. The key takeaways and action items from the discussion are as follows:

- The KEP can be scoped to temporary capabilities to solve version skew use cases. The configuration skew use cases (like operating systems, etc.) are less clear at the moment and may need explicit fields (like pod.spec.os).
- Make the temporary nature of capabilities more obvious. This can prevent other components (webhooks, etc.) from depending on this API. Obfuscating the fields can be explored if required.
- A common library should be used to encapsulate the logic for inferring the capability requirements. This will ensure consistency between all consuming components, such as the kube-scheduler and admission controllers.
- Autoscaler support: The "client-side" problem (determining a pod's requirements) can be solved with the proposed shared library. The problem of how to scale up nodes (especially scaling up from 0 nodes) is still a challenge and needs further exploration. This is already covered in the future considerations section of the KEP.
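
To make the shared-library takeaway concrete, here is a minimal sketch, not the KEP's API. It assumes capabilities are reported as name/value string pairs on the node and that a single function infers a pod's requirements, so the scheduler and admission controllers cannot disagree. The capability name `example.inPlacePodResize`, the `Pod`/`Node` classes, and the `uses_in_place_resize` flag are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical capability name; the KEP leaves the actual
# feature-to-capability mapping to each feature that uses it.
IN_PLACE_RESIZE = "example.inPlacePodResize"


@dataclass
class Pod:
    name: str
    # Hypothetical stand-in for "the pod spec uses a field that
    # only works on nodes supporting the feature".
    uses_in_place_resize: bool = False


@dataclass
class Node:
    name: str
    # Capabilities as name/value pairs reported by the node.
    capabilities: Dict[str, str] = field(default_factory=dict)


def infer_requirements(pod: Pod) -> Set[str]:
    """Single shared place where pod requirements are derived."""
    required: Set[str] = set()
    if pod.uses_in_place_resize:
        required.add(IN_PLACE_RESIZE)
    return required


def node_satisfies(node: Node, required: Set[str]) -> bool:
    """A node matches only if it reports every required capability as "true"."""
    return all(node.capabilities.get(name) == "true" for name in required)


def filter_nodes(pod: Pod, nodes: List[Node]) -> List[str]:
    """Scheduler-style filter built on the two shared helpers above."""
    required = infer_requirements(pod)
    return [n.name for n in nodes if node_satisfies(n, required)]
```

An admission controller in this sketch would call `infer_requirements` and `node_satisfies` directly for a single target node, reusing the same logic as the filter.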

I will incorporate this feedback into the KEP and reach out when it's ready for review.

pravk03 · 2025-07-09T16:57:27Z

I've updated the KEP based on the SIG Architecture feedback (#5347 (comment)). The new version focuses more on capabilities tied to the feature lifecycle and expands on the deprecation strategy.

@tallclair @SergeyKanzhelev @haircommander @wojtek-t PTAL when you get a chance.

SergeyKanzhelev · 2025-07-21T18:57:25Z

keps/sig-node/5328-node-capabilities/kep.yaml

+  - '@wojtek-t'
+  - '@dom4ha'
+  - '@macsko'
+approvers:


I'd suggest adding somebody from SIG Scheduling as an approver here.

SergeyKanzhelev · 2025-07-21T18:59:00Z

keps/sig-node/5328-node-capabilities/README.md

+2. Introduce a shared library to encapsulate the logic for inferring a pod's requirements and matching them against node capabilities, ensuring consistency between control plane components that depends on capabilities.
+3. Enhance the kube-scheduler to filter nodes based on the pod's requirements.
+4. Enable API admission controllers to validate requests for operations against a node's actual feature support.
+


Suggested change

5. Enable a kubelet admission plugin to check that the Pod is compatible with the Node's features

SergeyKanzhelev · 2025-07-21T19:00:50Z

keps/sig-node/5328-node-capabilities/README.md

+Considered approaches:
+
+1.  Have the autoscaler inspect a running node in the target node pool and assume all new nodes will be identical. This would work only if a running node exists and fails for the "scale-from-zero" conditions.
+2.  This problem is fundamentally the same as what [kubernetes/autoscaler#7799](https://github.com/kubernetes/autoscaler/issues/7799) is tracking to support DRA use cases. The cluster-autoscaler currently does not consider DRA resources while scaling up and the long term solution would likely involve a new API surface to specify and/or modify autoscaler predictions.


Is it the same? I thought DRA is unique because DRA is not part of the Node status, so it is harder to add those resources to the templates.

SergeyKanzhelev · 2025-07-21T19:04:11Z

keps/sig-node/5328-node-capabilities/README.md

+**Node Capabilities Requirements:**
+
+1. Every capability must be associated with a Kubernetes feature graduating through the Alpha/Beta/GA process. This ensures capabilities are not used as permanent node attributes and are automatically removed after the feature is stable (after the supported version skew period)
+2. Must be derived from node's static configuration, which the Kubelet evaluates during bootstrap. Reporting new or changed capabilities requires a Kubelet restart to take effect. 


Also, capabilities must be calculated BEFORE pod admission. Otherwise, pod admission will fail on node restart.
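
The ordering concern can be illustrated with a small sketch. This is a toy model, not kubelet code: the `Kubelet` class, the `ExampleResizeGate` feature gate, and the derivation rule are hypothetical, and only the capability name `guaranteedQOSPodCPUResize` and the idea that it depends on feature gates plus `cpuManagerPolicy` come from the discussion.

```python
from typing import Dict, Optional, Set


class Kubelet:
    """Toy model: capabilities come from static config and must exist before admission."""

    def __init__(self, feature_gates: Dict[str, bool], cpu_manager_policy: str) -> None:
        self.feature_gates = feature_gates
        self.cpu_manager_policy = cpu_manager_policy
        # None means "not computed yet", which is different from "no capabilities".
        self.capabilities: Optional[Set[str]] = None

    def compute_capabilities(self) -> None:
        """Evaluate static configuration once, during bootstrap."""
        caps: Set[str] = set()
        # Hypothetical rule: the capability needs both a feature gate
        # and the static CPU manager policy.
        gate_on = self.feature_gates.get("ExampleResizeGate", False)
        if gate_on and self.cpu_manager_policy == "static":
            caps.add("guaranteedQOSPodCPUResize")
        self.capabilities = caps

    def admit(self, required: Set[str]) -> bool:
        """Admit a pod only if every required capability is present."""
        if self.capabilities is None:
            # Admitting here would wrongly reject pods after a restart.
            raise RuntimeError("capabilities not computed; admission must wait")
        return required <= self.capabilities
```

In this sketch, calling `admit` before `compute_capabilities` raises instead of returning `False`, which keeps "not ready yet" distinct from "node lacks the capability".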

SergeyKanzhelev · 2025-07-21T19:14:45Z

keps/sig-node/5328-node-capabilities/README.md

+* Graduation (GA): When the feature graduates to GA, the Kubelet continues to report the capability. This is necessary to manage version skew, allowing the control plane to correctly identify older nodes that do not yet have the GA feature.
+* Automated Deprecation (Post-GA): Kubelet automatically stop reporting the capability after the feature has been GA for a duration that exceeds the cluster's supported version skew. The capability check is bypassed in the shared library based on consumer component (e.g., kube-scheduler) version and feature gate graduation version.


This requires clarification, perhaps with specific versions showing how the supported version skew is calculated. Does this statement suggest that the capability will be removed only after GA + 3 versions? And after that, is the logic removed from both the control plane and the kubelet at the same time?

Clarification is mostly needed on when it is removed from the control plane.
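
One possible reading of the "GA + 3" question, worked through as a sketch rather than as the KEP's decision. It assumes the published Kubernetes skew policy, under which a kubelet may be up to three minor versions older than the control plane and never newer. The function names and the version numbers in the usage note are hypothetical.

```python
# Assumption: kubelet may trail the control plane by up to 3 minor versions.
MAX_KUBELET_SKEW = 3


def oldest_supported_kubelet(control_plane_minor: int, skew: int = MAX_KUBELET_SKEW) -> int:
    """Oldest kubelet minor version that may still join this control plane."""
    return control_plane_minor - skew


def check_still_needed(ga_minor: int, control_plane_minor: int,
                       skew: int = MAX_KUBELET_SKEW) -> bool:
    """True while a kubelet that predates the GA release can still be in the cluster."""
    return oldest_supported_kubelet(control_plane_minor, skew) < ga_minor
```

For a hypothetical feature that reaches GA in minor version 36, `check_still_needed(36, 38)` is true because a version 35 kubelet is still supported, while `check_still_needed(36, 39)` is false. Under the same assumptions, a kubelet at GA + 3 or later can only be paired with a control plane that has already dropped the check, so both sides could stop at the same minor version.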

SergeyKanzhelev · 2025-07-21T19:16:06Z

keps/sig-node/5328-node-capabilities/README.md

+1. Replace Taints/Tolerations or Node Labels/Selectors/Affinity.
+2. Serve as a reporting mechanism for permanent static node attributes (like architecture, or specific hardware).
+3. To define the exact mapping of a feature to a capability. This KEP proposes the framework that establishes the mechanism; specific mappings will be defined with the features that use them.
+4. To include full Cluster Autoscaler integration in the initial Alpha stage. The autoscaler makes scaling decisions based on node templates, which lack the capability information. Defining an integration strategy is deferred as a [future enhancement](#cluster-autoscaler-integration). 


This will delay adoption. Perhaps it can be solved in alpha.

k8s-ci-robot added the cncf-cla: yes and kind/kep labels May 28, 2025

k8s-ci-robot requested review from dchen1107 and derekwaynecarr May 28, 2025 00:46

k8s-ci-robot added the sig/node and needs-ok-to-test labels May 28, 2025

k8s-ci-robot added the size/XL label May 28, 2025

pravk03 marked this pull request as draft May 28, 2025 00:47

k8s-ci-robot added the do-not-merge/work-in-progress label May 28, 2025

pravk03 force-pushed the node-capabilities branch 2 times, most recently from 59e7e54 to 4719180 Compare May 28, 2025 00:59

wojtek-t reviewed May 28, 2025

keps/sig-node/5328-node-capabilities/README.md (outdated)

keps/sig-node/5328-node-capabilities/README.md (outdated)

keps/sig-node/5328-node-capabilities/kep.yaml

dom4ha reviewed May 28, 2025

keps/sig-node/5328-node-capabilities/README.md (outdated)

dom4ha reviewed May 28, 2025

keps/sig-node/5328-node-capabilities/README.md (outdated)

dom4ha reviewed May 28, 2025

keps/sig-node/5328-node-capabilities/README.md (outdated)

pravk03 force-pushed the node-capabilities branch 3 times, most recently from 4c11e06 to 9254f9b Compare May 28, 2025 23:11

pravk03 changed the title from "KEP-5328: Node Capability Aware Scheduling" to "KEP-5328: Node Capabilities" May 28, 2025

pravk03 marked this pull request as ready for review May 28, 2025 23:14

k8s-ci-robot removed the do-not-merge/work-in-progress label May 28, 2025

k8s-ci-robot requested a review from mrunalp May 28, 2025 23:14

k8s-ci-robot requested review from tallclair and yujuhong May 29, 2025 00:25

pravk03 force-pushed the node-capabilities branch from 9254f9b to f8291a4 Compare May 29, 2025 01:06

k8s-ci-robot added the sig/scheduling label May 29, 2025

pravk03 force-pushed the node-capabilities branch from 5fb093d to a3e1436 Compare June 18, 2025 20:36

pravk03 force-pushed the node-capabilities branch from a3e1436 to f069f62 Compare June 18, 2025 23:55

SergeyKanzhelev reviewed Jun 19, 2025

keps/sig-node/5328-node-capabilities/README.md (outdated)

SergeyKanzhelev reviewed Jun 19, 2025

keps/sig-node/5328-node-capabilities/README.md

pravk03 force-pushed the node-capabilities branch 3 times, most recently from 8d6230d to cd6d67e Compare June 19, 2025 17:18

k8s-ci-robot added the size/XXL label and removed the size/XL label Jun 19, 2025

pravk03 requested a review from wojtek-t June 20, 2025 16:01

pravk03 force-pushed the node-capabilities branch from cd6d67e to eff4d75 Compare July 8, 2025 15:17

k8s-ci-robot added the size/XL label and removed the size/XXL label Jul 8, 2025

pravk03 force-pushed the node-capabilities branch from eff4d75 to 1313dd7 Compare July 8, 2025 15:17

KEP-5328: Introduce Node Capabilities KEP

fc48bfb

pravk03 force-pushed the node-capabilities branch from 1313dd7 to fc48bfb Compare July 9, 2025 01:09

SergeyKanzhelev reviewed Jul 21, 2025

