MGMT-21485: Enable dpu-host mode that matches DPF requirements #2778
Conversation
@tsorya: This pull request references MGMT-21485 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Skipping CI for Draft Pull Request.
Requesting changes just to modify a variable name and to discuss the changes for managed clusters. All the rest looks good.
/retest
Force-pushed from f862560 to bac1bb9
@tsorya: This pull request references MGMT-21485 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
1 similar comment
/retest-required
@tsorya could you have a look at the failures in
A quick look at what's going on with openshift-ovn-kubernetes/ovnkube-control-plane should tell you whether we missed anything in the input parameters.
Looking
bac1bb9
to
180e442
Compare
…ost mode
Adding required gateway value for ovnk in dpu-host mode
This commit enables usage of ovn-kubernetes/ovn-kubernetes#5327

This commit introduces OVN_NODE_MODE environment variable to enable per-node feature enforcement, particularly for DPU host mode where certain features must be disabled regardless of cluster-wide configuration.

- Move feature toggles from ConfigMap (004-config.yaml) to startup scripts
  - ConfigMap values cannot be reliably overridden per-node, but script logic can be conditional
- Implement OVN_NODE_MODE-based conditional feature enablement in node startup script
- Update control-plane scripts to handle moved parameters
- Add 'dpu-host' mode that automatically disables incompatible features:
  - Egress IP and related features (egress firewall, egress QoS, egress service)
  - Multicast support
  - Multi-external gateway support
  - Multi-network policies and admin network policies
  - Network segmentation features
- Set gateway_interface='derive-from-mgmt-port' for DPU host nodes
- Add ovnkube_node_mode='--ovnkube-node-mode dpu-host' flag

From bindata/network/ovn-kubernetes/*/004-config.yaml:
- enable-egress-ip=true
- enable-egress-firewall=true
- enable-egress-qos=true
- enable-egress-service=true
- enable-multicast=true
- enable-multi-external-gateway=true

Note: HyperShift hosted cluster ConfigMap (managed/004-config.yaml) retains egress feature flags as DPU host mode is not supported in hosted cluster configurations.

- Add conditional blocks based on OVN_NODE_MODE
  - Full mode (default): All features enabled as configured
  - DPU host mode: Incompatible features force-disabled
- Rename egress_ip_enable_flag to egress_features_enable_flag for clarity
- Always-enabled features: Direct CLI flags (cleaner implementation)
  - --enable-egress-ip=true, --enable-egress-firewall=true, etc.
  - --enable-multicast, --enable-multi-external-gateway=true
- Conditional features: Script variables (matching original ConfigMap logic)
  - multi_network_enabled_flag, network_segmentation_enabled_flag
  - multi_network_policy_enabled_flag, admin_network_policy_enabled_flag
- Maintain backward compatibility for existing deployments

- Add comprehensive TestOVNKubernetesScriptLibCombined test covering:
  - DPU host mode feature gating and disabling
  - Full mode with multi-network features enabled/disabled
  - Non-mode-gated features (route advertisements, DNS resolver, etc.)
  - Gateway interface variable usage validation
  - Multi-external gateway and egress features flag behavior across modes
- Add TestOVNKubernetesControlPlaneFlags test covering:
  - Always-enabled features validation (direct CLI flags)
  - Conditional features validation (script variables)
  - Multi-network enablement logic (OVN_MULTI_NETWORK_ENABLE or OVN_NETWORK_SEGMENTATION_ENABLE)
  - Network segmentation logic validation
- Remove redundant individual test functions after consolidation
- Update existing config rendering tests for new ConfigMap content
- Update test assertions to use correct flag names (egress_features_enable_flag)

- Create docs/ovn_node_mode.md with detailed technical explanation
- Update docs/operands.md with OVN-Kubernetes node modes section
- Update docs/architecture.md with per-node configuration explanation
- Update README.md with DPU host mode support information
- Add implementation details, feature mapping tables, and migration notes
- Document multi-external gateway as disabled feature in DPU host mode
- Update all references to use correct flag names

ConfigMap-based feature control cannot be overridden per-node, making it impossible to disable features on specific node types (like DPU hosts) while keeping them enabled cluster-wide. Moving the logic to startup scripts allows the same cluster configuration to work across heterogeneous node types.

This change ensures that DPU host nodes automatically have incompatible features disabled, preventing runtime failures and enabling mixed-mode cluster deployments.

- Existing clusters continue to work without changes
- Default behavior (full mode) remains unchanged
- Control-plane components maintain identical functionality
- Migration is automatic during upgrade process
- No manual intervention required
- HyperShift hosted clusters unaffected (DPU host mode not supported)
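A minimal sketch of the dpu-host branch this message describes, assuming the variable names listed above (the real 008-script-lib.yaml logic covers more features and the full-mode path):

egress_features_enable_flag="--enable-egress-ip=true --enable-egress-firewall=true --enable-egress-qos=true --enable-egress-service=true"
gateway_interface=""   # full mode keeps the script's normal gateway selection
ovnkube_node_mode=""
if [[ "${OVN_NODE_MODE}" == "dpu-host" ]]; then
  # Force-disable features that are incompatible with DPU host mode,
  # regardless of the cluster-wide configuration.
  egress_features_enable_flag=""
  gateway_interface="derive-from-mgmt-port"
  ovnkube_node_mode="--ovnkube-node-mode dpu-host"
fi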
Force-pushed from 180e442 to c17dba5
/retest
/LGTM
@kyrtapz PTAL
/retest
          value: "{{.OVN_CONTROLLER_INACTIVITY_PROBE}}"
        - name: OVN_KUBE_LOG_LEVEL
          value: "4"
        - name: OVN_NODE_MODE
Your commit message states that "ConfigMap values cannot be reliably overridden per-node, but script logic can be conditional".
How is that solved by setting the env vars for the whole daemonset?
This one provides OVN_NODE_MODE per DaemonSet.
In the case of dpu-host mode, CNO will create a new DaemonSet with the dpu-host mode value.
So each different OVN_NODE_MODE will have its own DaemonSet.
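In other words, a minimal sketch of how the shared startup script can branch on the env var from the diff above (variable handling assumed; the "full" default is taken from the commit message):

ovn_node_mode="${OVN_NODE_MODE:-full}"
if [[ "${ovn_node_mode}" == "dpu-host" ]]; then
  # This DaemonSet was rendered for DPU hosts:
  # force-disable the incompatible features here.
  ovnkube_node_mode="--ovnkube-node-mode dpu-host"
fi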
In general it allows us to provide the mode we are running with.
if [[ "{{.OVN_MULTI_NETWORK_ENABLE}}" == "true" ]]; then
  multi_network_enabled_flag="--enable-multi-network"
fi
if [[ "{{.OVN_NETWORK_SEGMENTATION_ENABLE}}" == "true" ]]; then
This seems like a logic change. Was there a reason behind it?
@ricky-rav you are good with that, right? I vaguely remember you were working on some bug around this.
In general I took it from here:
https://github.com/openshift/cluster-network-operator/pull/2778/files#diff-633cbda8a04c3338bc6d456f1046b08e1fe1a1b09d165e74cc313335ca7b62b2R47 but I can remove it.
In general I left the OVN_NETWORK_SEGMENTATION_ENABLE logic as is.
@kyrtapz the bug you probably have in mind in the end was for DPU GA (4.21), so the PR is still open and we can get to it once we're done with 4.20: #2739
@tsorya so the difference between the original code and your code is that when OVN_MULTI_NETWORK_ENABLE=true and OVN_NETWORK_SEGMENTATION_ENABLE=false, we don't set any corresponding flags in the old code, while you set the multi network flag.
Let's keep the original behaviour for the sake of not introducing any new behaviour that hasn't been tested :)
Sorry for not realizing before.
Hmm, not really. Look at the ConfigMap part I removed:
{{- if .OVN_MULTI_NETWORK_ENABLE }} enable-multi-network=true {{- end }}
So in case OVN_MULTI_NETWORK_ENABLE was set, the control plane was getting this flag through the ConfigMap in any case.
From the running setup:
multi_network_enabled_flag=
if [[ "true" == "true" ]]; then
  multi_network_enabled_flag="--enable-multi-network"
fi
if [[ "true" == "true" ]]; then
  if [[ "true" != "true" ]]; then
    multi_network_enabled_flag="--enable-multi-network"
  fi
  network_segmentation_enabled_flag="--enable-network-segmentation"
fi
So in case --enable-network-segmentation is set, we will always have --enable-multi-network as well.
Thanks for confirming!
@kyrtapz so I've just discussed this with @tsorya. The way that configuration parameters are handled in ovnk is not as intuitive as I thought.
The key point is that non-default values that are present in the configmap take precedence over everything else (https://github.com/ovn-kubernetes/ovn-kubernetes/blame/7c047281d46a6e5f154a0f5d3feb8866af3619f4/go-controller/pkg/config/config.go#L750), also over input arguments that we pass here in the yaml.
So that's why Igal is removing a set of parameters from the configmap and adding them as input args in the yaml. These parameters are disabled by default in upstream ovn-k and simply enabling/disabling them through input arguments while keeping them enabled in the configmap wouldn't have worked.
In the particular case above, the logic that sets the multi network and network segmentation flags is being moved from the configmap to the yaml. The fact that the logic from the original yaml is more restrictive than the one from the original configmap didn't really matter in the end, because it was the non-default values in the configmap (multi network enabled, udn network segmentation enabled) that were taking precedence over everything else.
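A short sketch of the consequence, assuming a simplified ovnkube invocation (flag names are from this PR; the real command line carries many more arguments): once enable-multi-network=true is no longer rendered into the ConfigMap, the script-computed CLI flag is what actually takes effect.

multi_network_enabled_flag=""
if [[ "${OVN_MULTI_NETWORK_ENABLE}" == "true" ]]; then
  multi_network_enabled_flag="--enable-multi-network"
fi
# Previously, enable-multi-network=true in the rendered ConfigMap would
# have overridden whatever the script passed here:
# exec ovnkube ${multi_network_enabled_flag} ...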
/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw
@kyrtapz: Overrode contexts on behalf of kyrtapz: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kyrtapz, ricky-rav, tsorya The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw
@kyrtapz: Overrode contexts on behalf of kyrtapz: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/retest-required
/retest
/retest-required
/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw
@tsorya: tsorya unauthorized: /override is restricted to Repo administrators, approvers in the top-level OWNERS file, and the following GitHub teams in the openshift org: openshift-release-oversight, openshift-staff-engineers, openshift-sustaining-engineers. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/override ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw
@kyrtapz: Overrode contexts on behalf of kyrtapz: ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/retest-required
1 similar comment
@tsorya: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Merged 13a1527 into openshift:master
[ART PR BUILD NOTIFIER] Distgit: cluster-network-operator
MGMT-21485: Enable dpu-host mode that matches DPF requirements
This commit introduces OVN_NODE_MODE environment variable to enable per-node
feature enforcement, particularly for DPU host mode where certain features
must be disabled regardless of cluster-wide configuration.
- Move feature toggles from ConfigMap (004-config.yaml) to startup script (008-script-lib.yaml)
  - ConfigMap values cannot be reliably overridden per-node, but script logic can be conditional
- Implement OVN_NODE_MODE-based conditional feature enablement in startup script
- Add 'dpu-host' mode that automatically disables incompatible features:
  - Egress IP and related features (egress firewall, egress QoS, egress service)
  - Multicast support
  - Multi-external gateway support
  - Multi-network policies and admin network policies
  - Network segmentation features
- Set gateway_interface='derive-from-mgmt-port' for DPU host nodes
- Add ovnkube_node_mode='--ovnkube-node-mode dpu-host' flag
From bindata/network/ovn-kubernetes/*/004-config.yaml:
- enable-egress-ip=true
- enable-egress-firewall=true
- enable-egress-qos=true
- enable-egress-service=true
- enable-multicast=true
- enable-multi-external-gateway=true
Note: HyperShift hosted cluster ConfigMap (managed/004-config.yaml) retains
egress feature flags as DPU host mode is not supported in hosted cluster
configurations.
- Add conditional blocks in 008-script-lib.yaml based on OVN_NODE_MODE
  - Full mode (default): All features enabled as configured
  - DPU host mode: Incompatible features force-disabled
- Maintain backward compatibility for existing deployments
- Rename egress_ip_enable_flag to egress_features_enable_flag for clarity
- Add comprehensive TestOVNKubernetesScriptLibCombined test covering:
  - DPU host mode feature gating and disabling
  - Full mode with multi-network features enabled/disabled
  - Non-mode-gated features (route advertisements, DNS resolver, etc.)
  - Gateway interface variable usage validation
  - Multi-external gateway and egress features flag behavior across modes
- Add TestOVNKubernetesControlPlaneFlags test covering:
  - Always-enabled features validation (direct CLI flags)
  - Conditional features validation (script variables)
  - Multi-network enablement logic (OVN_MULTI_NETWORK_ENABLE or OVN_NETWORK_SEGMENTATION_ENABLE)
  - Network segmentation logic validation
- Remove redundant individual test functions after consolidation
- Update existing config rendering tests for new ConfigMap content
- Update test assertions to use correct flag names (egress_features_enable_flag)
- Create docs/ovn_node_mode.md with detailed technical explanation
- Update docs/operands.md with OVN-Kubernetes node modes section
- Update docs/architecture.md with per-node configuration explanation
- Update README.md with DPU host mode support information
- Add implementation details, feature mapping tables, and migration notes
- Document multi-external gateway as disabled feature in DPU host mode
- Update all references to use correct flag names
ConfigMap-based feature control cannot be overridden per-node, making it
impossible to disable features on specific node types (like DPU hosts) while
keeping them enabled cluster-wide. Moving the logic to startup scripts allows
the same cluster configuration to work across heterogeneous node types.
This change ensures that DPU host nodes automatically have incompatible
features disabled, preventing runtime failures and enabling mixed-mode
cluster deployments.
- Existing clusters continue to work without changes
- Default behavior (full mode) remains unchanged
- Migration is automatic during upgrade process
- No manual intervention required
- HyperShift hosted clusters unaffected (DPU host mode not supported)
MGMT-21314: CNO enable advanced gateway detection in ovnkube in dpu host mode
Adding required gateway value for ovnk in dpu-host mode
This commit enables usage of advanced gateway detection (ovn-kubernetes/ovn-kubernetes#5327) for ovnkube in dpu host mode.