
KEP-1645: define dual stack policies and fields #5264


Open · wants to merge 1 commit into master

Conversation

@MrFreezeex (Member):

  • One-line PR description: Define dual-stack recommendations and fields
  • Other comments: This does three things: defines an initial suggestion as to what an implementation may do to support dual-stack services, fixes the max items for the `ips` field (which is already fixed in the actual CRD), and adds an `ipFamilies` field matching the same field on Service, which implementations may use to reconcile this globally with an implementation-defined policy.
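
For illustration, a rough Go sketch of what the spec might look like with both changes applied; the doc comments and exact marker placement are my paraphrase of the PR description, not its literal diff:

```go
package v1alpha1

import corev1 "k8s.io/api/core/v1"

// Sketch only: the shape of ServiceImportSpec with the two changes this
// PR describes (ips capped at 2 instead of 1, plus a new ipFamilies
// field mirroring the one on core Service).
type ServiceImportSpec struct {
	// +listType=atomic
	Ports []ServicePort `json:"ports"`
	// ips holds the import's virtual IPs, at most one per family.
	// +kubebuilder:validation:MaxItems:=2
	IPs []string `json:"ips,omitempty"`
	// ipFamilies mirrors Service.spec.ipFamilies; implementations may
	// use it to reconcile IP allocation under their own policy.
	// +listType=atomic
	IPFamilies []corev1.IPFamily `json:"ipFamilies,omitempty"`
}
```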

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 29, 2025
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MrFreezeex
Once this PR has been reviewed and has the lgtm label, please assign skitt for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Apr 29, 2025
@k8s-ci-robot k8s-ci-robot requested review from JeremyOT and skitt April 29, 2025 12:59
@k8s-ci-robot k8s-ci-robot added sig/multicluster Categorizes an issue or PR as relevant to SIG Multicluster. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 29, 2025
Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>
Comment on lines +662 to +663:

```yaml
ipFamilies:
  - IPv4
```

A Contributor commented:

AFAIK, the IPv4 and IPv6 formats are quite different; aren't the formats in the `ips` field alone enough for the consumer to discern the family? I assume the consumer knows which IP family it can support.
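
As a tiny aside on that premise (my sketch, not from the thread): an entry's family is indeed recoverable by parsing the literal, e.g. with Go's standard library:

```go
package main

import (
	"fmt"
	"net/netip"
)

func main() {
	// Discern the IP family of each ips entry from its textual format.
	for _, ip := range []string{"10.96.12.34", "fd00:10:96::1234"} {
		addr, err := netip.ParseAddr(ip)
		if err != nil {
			continue // skip malformed entries
		}
		fmt.Printf("%s -> IPv6: %v\n", ip, addr.Is6())
	}
}
```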

@MrFreezeex (Member, Author) replied on Apr 29, 2025:

This field is aimed more at the allocation of the IPs than at what happens after the IPs are actually allocated, a bit like the current `type` field, or how the `ipFamilies` field behaves on a regular Service.

For instance, for Cilium we would most likely want to take the intersection of all the `ipFamilies` in the exported Services, which makes this field as relevant as the other fields for "allocating" the IPs (meaning creating the derived Service, as that is how we do it); see the sketch below.
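
A minimal sketch of that intersection, assuming the Cilium-style semantics described above (names are illustrative, not actual Cilium code):

```go
package main

import corev1 "k8s.io/api/core/v1"

// intersectIPFamilies keeps only the families present in every exported
// Service, so the derived Service never advertises a family that some
// backend cluster cannot serve.
func intersectIPFamilies(exported [][]corev1.IPFamily) []corev1.IPFamily {
	if len(exported) == 0 {
		return nil
	}
	counts := map[corev1.IPFamily]int{}
	for _, families := range exported {
		for _, f := range families {
			counts[f]++
		}
	}
	var out []corev1.IPFamily
	for _, f := range []corev1.IPFamily{corev1.IPv4Protocol, corev1.IPv6Protocol} {
		if counts[f] == len(exported) {
			out = append(out, f)
		}
	}
	return out
}
```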

The Contributor replied:

Oh, IIRC, this is for some controller to act on this field? Aren't we in the process of moving the ServiceImport to either status or root?

@MrFreezeex (Member, Author) replied:

Not entirely sure what you mean, but yes, our controller will use this to create the derived Service with the appropriate IP family.

```diff
@@ -590,10 +592,12 @@ const (
 type ServiceImportSpec struct {
 	// +listType=atomic
 	Ports []ServicePort `json:"ports"`
-	// +kubebuilder:validation:MaxItems:=1
+	// +kubebuilder:validation:MaxItems:=2
```
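
With the limit raised to 2, a dual-stack import can carry one virtual IP per family. A hypothetical example (addresses made up; `ipFamilies` is the field this PR proposes, not part of the released API):

```go
package main

import corev1 "k8s.io/api/core/v1"

func main() {
	// Illustrative values a dual-stack ServiceImport could carry once
	// MaxItems is 2; ipFamilies is the field this PR proposes.
	ips := []string{"10.96.12.34", "fd00:10:96::1234"} // one VIP per family
	ipFamilies := []corev1.IPFamily{corev1.IPv4Protocol, corev1.IPv6Protocol}
	_, _ = ips, ipFamilies
}
```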
A Member commented:

Shouldn’t we have this on IPFamilies too?

@MrFreezeex (Member, Author) replied:

The regular Service doesn't seem to have it (while it does for the `clusterIPs` field), but I don't mind adding it there too.

@mikemorris (Member) commented on Jun 24, 2025:

Should this actually be limited at all? It's not on the actual Service resource, and IIRC we had agreed that ServiceImport ports field should be the intersection of ports from Services exposed by a ServiceExport. (The KEP says union currently, but I believe we had decided this should be corrected to avoid publishing ports on the "frontend" which may not be available on some "backends").

@MrFreezeex (Member, Author) replied on Jun 24, 2025:

> and IIRC we had agreed that ServiceImport ports

No, we didn't; it's still union, and to my knowledge no one has an active PR or is working on changing this.

Also note that the field targeted here is `ips`, not `ports` 😅

@mikemorris (Member) replied on Jun 24, 2025:

> Also note that the field targeted here is `ips`, not `ports` 😅

Oops! Just read the diff wrong.

@lauralorenz (Contributor) commented:

Triage note: We had some discussion of comments from @mikemorris at the last SIG-MC meeting; can you please add them to this PR so we can talk about them?

@MrFreezeex (Member, Author) commented:

Hi @mikemorris, are you still planning to comment here about your concerns with this change?

@mikemorris (Member) left a comment:

Trying to capture a high-level concern I have with this direction:

  1. Using a Service is just one possible (albeit common) implementation of MCS - alternative explorations such as ClusterIP Gateways are currently being developed and may be an option in the future. Service is a bloated resource, and in general I would have a strong preference towards avoiding leaking what should be implementation details up into the actual spec resources.
  2. I don't think adding this field to ServiceImport should actually be necessary.
    1. On Service, it is optionally configured by a service owner in conjunction with `.spec.ipFamilyPolicy` to specify what IP family or families should be made available for a Service on a dual-stack cluster, and which family should be used for the legacy `.spec.clusterIP` field.
    2. In contrast to Service, on a ServiceImport no discretion or decision-making should be required - the available IP families will be determined by the IP families of Services exposed by ServiceExports, and may be constrained by the IP family stack configuration of the importing cluster. The exported Services may be on clusters with different dual-stack configurations (some IPv4-only, some IPv6-only, some dual-stack) and may have different configurations for which IPs are available for each Service in each cluster. I believe determining the appropriate dual-stack configuration should be possible by watching the associated exported Service resources (and their EndpointSlices) directly, from either a centralized controller or per-cluster decentralized API server watches (see the sketch after this list).
  3. I don't see this field as being helpful for order-of-operations concerns in creating an implementation Service resource, because at any point the available endpoints for a ServiceImport may change: an IPv6-only cluster may go offline while an IPv4-only cluster remains available, and the ServiceImport in a dual-stack cluster should likely be updated to drop its IPv6 address from the `ips` field if the topology of an implementation requires direct/flat networking and no IPv6 endpoints are available (cleaning up and removing the ServiceImport entirely if no backends are routable from the importing cluster is a viable alternative too). Similarly, if an exported Service on a new IP family becomes available when it wasn't originally, the ServiceImport should likely be updated to publish an address in the `ips` field for the newly-available family when adding the backends.
  4. I think what may be helpful instead is clarifying expected behavior in the various scenarios @MrFreezeex had laid out in the presentation, and possibly encoding those in conformance tests and/or a `status` field indicating that a ServiceImport is "ready" (has backends available which are reachable from the importing cluster, which may have different constraints or meaning in centralized vs decentralized implementations), rather than expecting the ServiceImport and any supporting infra to be created (and destroyed) synchronously and be routable immediately at creation.
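
To make point 2.ii concrete, a sketch under my own assumptions (function and parameter names are hypothetical, not from the KEP): a controller watching the exported EndpointSlices could derive the available families from their address types, constrained by what the importing cluster supports:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	discoveryv1 "k8s.io/api/discovery/v1"
)

// familiesFromSlices derives the IP families that actually have
// endpoints behind them, then drops any family the importing
// cluster's stack cannot serve.
func familiesFromSlices(slices []discoveryv1.EndpointSlice, clusterSupports map[corev1.IPFamily]bool) []corev1.IPFamily {
	seen := map[corev1.IPFamily]bool{}
	for _, s := range slices {
		switch s.AddressType {
		case discoveryv1.AddressTypeIPv4:
			seen[corev1.IPv4Protocol] = true
		case discoveryv1.AddressTypeIPv6:
			seen[corev1.IPv6Protocol] = true
		}
	}
	var out []corev1.IPFamily
	for _, f := range []corev1.IPFamily{corev1.IPv4Protocol, corev1.IPv6Protocol} {
		if seen[f] && clusterSupports[f] {
			out = append(out, f)
		}
	}
	return out
}
```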

@MrFreezeex (Member, Author) commented on Jun 24, 2025:

Thanks! Quickly answering those points, but we can have a longer discussion in the SIG meeting if you are available there.

> Trying to capture a high-level concern I have with this direction:
>
> 1. Using a Service is just one possible (albeit common) implementation of MCS - alternative explorations such as [ClusterIP Gateways](https://github.com/kubernetes-sigs/gateway-api/pull/3608) are currently being developed and may be an option in the future. Service is a bloated resource and in general I would have a strong preference towards avoiding leaking what should be implementation details up into the actual spec resources.

IIUC all the implementations work with a Service in one way or another; we cannot block PRs to ServiceImport by saying that another, hypothetical alternative to Service, which is very much experimental and (AFAIK) doesn't even have consensus among SIG Network/Gateway API folks, is the way forward. If we were to do that we could probably also forget about bumping MCS-API to v1beta1, for instance...

> 2. I don't think adding this field to ServiceImport should actually be necessary.
>
>    1. On Service, it is optionally configured by a service owner in conjunction with [`.spec.ipFamilyPolicy`](https://kubernetes.io/docs/concepts/services-networking/dual-stack/#services) to specify what IP family or families should be made available for a Service on a dual-stack cluster, and which family should be used for the legacy `.spec.clusterIP` field.
>    2. In contrast to Service, on a ServiceImport no discretion or decision-making should be required - the available IP families will be determined by the IP families of Services exposed by ServiceExports, and may be constrained by the IP family stack configuration of the importing cluster. The exported Services _may_ be on clusters with different dual-stack configurations (some IPv4-only, some IPv6-only, some dual-stack) and _may_ have different configurations for which IPs are available for each Service in each cluster. I believe determining the appropriate dual-stack configuration should be possible by watching the associated exported Service resources (and their EndpointSlices) directly, from either a centralized controller _or_ per-cluster decentralized API server watches.

In Cilium at least we do not have this info on the controller creating the derived Service. This controller is intentionally not connected to the other clusters; it only knows about the local ServiceImport and the (possibly not yet created) derived Service. We also do not always sync EndpointSlices from remote clusters, as an optimization (we only do so if there is a specific annotation or the ServiceImport is headless). So I need this info already merged from all clusters/"reconciled" onto the ServiceImport resource directly; a sketch of the derived-Service side follows below.
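
For context, a rough sketch of the derived-Service side under the constraints just described (the naming scheme is illustrative, and the families parameter stands in for the proposed `ipFamilies` field, already reconciled onto the ServiceImport by the global controller):

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	mcsv1alpha1 "sigs.k8s.io/mcs-api/pkg/apis/v1alpha1"
)

// deriveService builds the local Service backing a ServiceImport,
// using only information available on the local cluster.
func deriveService(imp *mcsv1alpha1.ServiceImport, families []corev1.IPFamily) *corev1.Service {
	// Convert the import's ports to core Service ports.
	ports := make([]corev1.ServicePort, 0, len(imp.Spec.Ports))
	for _, p := range imp.Spec.Ports {
		ports = append(ports, corev1.ServicePort{Name: p.Name, Protocol: p.Protocol, Port: p.Port})
	}
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "derived-" + imp.Name, // naming is illustrative
			Namespace: imp.Namespace,
		},
		Spec: corev1.ServiceSpec{
			Type:       corev1.ServiceTypeClusterIP,
			IPFamilies: families,
			Ports:      ports,
		},
	}
}
```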

> 3. I don't see this field as being helpful for order-of-operations concerns in creating an implementation Service resource, because at any point the available endpoints for a ServiceImport may _change_: an IPv6-only cluster may go offline while an IPv4-only cluster remains available, and the ServiceImport in a dual-stack cluster should likely be updated to drop its IPv6 address from the `ips` field if the topology of an implementation requires direct/flat networking and no IPv6 endpoints are available (cleaning up and removing the ServiceImport entirely if no backends are routable from the importing cluster is a viable alternative too). Similarly, if an exported Service on a new IP family becomes available when it wasn't originally, the ServiceImport should likely be updated to publish an address in the `ips` field for the newly-available family when adding the backends.

Yep, the ServiceImport ipFamilies may change, and we do plan to reflect that in the IPs. I am not sure how you see this as unhelpful, as you described pretty much what we are going to do... Also note that in our case we want some global consistency of what an IP family will get there, so we will essentially take the intersection of all the exported Services' ipFamilies and what is supported by the local cluster.

> 4. I think what may be helpful instead is clarifying expected behavior in the various scenarios @MrFreezeex had laid out in the presentation, and possibly encoding those in conformance tests and/or a `status` field indicating that a ServiceImport is "ready" (has backends available which are reachable from the importing cluster, which may have different constraints or meaning in centralized vs decentralized implementations), rather than expecting the ServiceImport and any supporting infra to be created (and destroyed) synchronously and be routable immediately at creation.

I was planning to do this in a second step/PR, as while the initial use case might be tied to this PR, some possible conditions on the ServiceImport might be relevant for other things (like reporting any errors related to the import 🤷‍♂️). I am not sure that, besides checking that the ServiceImport is ready, we can do much more in the conformance tests though.

I am not entirely sure what you mean about the behavior of "ready" and "to be created (and destroyed) synchronously and be routable immediately at creation"; to me it would be more a place to put any error state, whether that's something very generic like ready or some more specific implementation-defined errors if there's a need for that.
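
Purely as a sketch of that "place to put any error state" idea, assuming a hypothetical `Conditions` field on the ServiceImport status (the current v1alpha1 API does not define one):

```go
package main

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setImportReady records a generic Ready condition; implementations
// could layer more specific, implementation-defined reasons on top.
func setImportReady(conditions *[]metav1.Condition, ready bool, reason, message string) {
	status := metav1.ConditionFalse
	if ready {
		status = metav1.ConditionTrue
	}
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:    "Ready",
		Status:  status,
		Reason:  reason, // e.g. "BackendsReachable" or "NoRoutableBackends"
		Message: message,
	})
}
```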

@mikemorris (Member) commented:

> In Cilium at least we do not have this info on the controller creating the derived Service. This controller is intentionally not connected to the other clusters; it only knows about the local ServiceImport and the (possibly not yet created) derived Service. We also do not always sync EndpointSlices from remote clusters, as an optimization (we only do so if there is a specific annotation or the ServiceImport is headless). So I need this info already merged from all clusters/"reconciled" onto the ServiceImport resource directly.

Okay, this is the implementation detail/constraint I had not been familiar with; I will need to think about this more.

@lauralorenz (Contributor) commented:

Triage notes:
