@kyrtapz kyrtapz commented Aug 4, 2025

Add a CEL expression to ignore the default/openshift-ovn-kubernetes NAD. This prevents a circular dependency: ovn-k fails to start because the multus webhook blocks NAD creation, while the webhook runs in cluster-networked pods, which require ovn-k to be running.
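
The exemption can be pictured as a predicate evaluated for each admission request. The sketch below is a Python model of that logic, not the CEL expression from this PR (the diff is not shown in this conversation). The function name `webhook_applies` is hypothetical, and it assumes that `default` is the NAD name and `openshift-ovn-kubernetes` is its namespace.

```python
# Hypothetical model of the match condition; names are illustrative only.
# Assumption: "default" is the NAD name, "openshift-ovn-kubernetes" its namespace.
EXEMPT_NAMESPACE = "openshift-ovn-kubernetes"
EXEMPT_NAME = "default"


def webhook_applies(namespace: str, name: str) -> bool:
    """Return True if the Multus webhook should validate this NAD."""
    is_default_nad = namespace == EXEMPT_NAMESPACE and name == EXEMPT_NAME
    # Every NAD except the ovn-k default one is still sent to the webhook.
    return not is_default_nad
```

In a ValidatingWebhookConfiguration, a condition like this would typically live in a `matchConditions` entry, so the API server skips the webhook call for the exempted object instead of failing when the webhook has no endpoints.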

Order of events on install:

  1. OVN-Kubernetes and Multus manifests (webhook included) get applied.
  2. OVN-Kubernetes starts and tries to apply the default NAD but fails because the webhook is not running:
failed to run ovnkube: failed to start cluster manager: failed to ensure default network nad exists: Internal error occurred: failed calling webhook "multus-validating-config.k8s.io": failed to call webhook: Post "https://multus-admission-controller.openshift-multus.svc:443/validate?timeout=30s": no endpoints available for service "multus-admission-controller"
  3. The Multus webhook doesn't start because it is a cluster-networked deployment behind a service, so it depends on ovn-k being up.
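
The ordering above can be sketched as a small toy model. This is only an illustration under stated assumptions: `simulate_install` and its state keys are hypothetical names, and the model assumes that an unreachable webhook rejects the NAD request, as the error message in step 2 indicates.

```python
def simulate_install(exempt_default_nad: bool, max_rounds: int = 5) -> dict:
    """Toy model of the install ordering; returns which components came up."""
    state = {"ovnk_running": False, "webhook_running": False}
    for _ in range(max_rounds):
        progressed = False
        # ovn-k needs its default NAD; the request is admitted only if the
        # webhook can answer or the request is exempt from the webhook.
        nad_admitted = exempt_default_nad or state["webhook_running"]
        if not state["ovnk_running"] and nad_admitted:
            state["ovnk_running"] = True
            progressed = True
        # The webhook pods are cluster-networked, so they need ovn-k first.
        if not state["webhook_running"] and state["ovnk_running"]:
            state["webhook_running"] = True
            progressed = True
        if not progressed:
            break  # nothing can start any more: either done or deadlocked
    return state
```

Calling `simulate_install(False)` leaves both components down, which is the circular dependency; `simulate_install(True)` lets ovn-k start first and the webhook follow.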

@openshift-ci-robot
@kyrtapz: This pull request explicitly references no jira issue.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 4, 2025
@openshift-ci openshift-ci bot requested review from danwinship and ricky-rav August 4, 2025 15:14

openshift-ci bot commented Aug 4, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kyrtapz

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 4, 2025
@kyrtapz kyrtapz changed the title NO-JIRA: Avoid webhook race with ovn-kubernetes on install CORENET-6261: Avoid webhook race with ovn-kubernetes on install Aug 4, 2025

openshift-ci-robot commented Aug 4, 2025

@kyrtapz: This pull request references CORENET-6261 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

Add CEL expression to ignore default/openshift-ovn-kubernetes NAD to prevent
circular dependency where ovn-k fails to start because multus webhook blocks
NAD creation, while webhook uses cluster-networked pods which require ovn-k to be running.

Signed-off-by: Patryk Diak <[email protected]>
@kyrtapz kyrtapz force-pushed the ignore_default_nad branch from fff43c3 to bea25d9 Compare August 5, 2025 09:15

openshift-ci bot commented Oct 9, 2025

@kyrtapz: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-ovn-step-registry | bea25d9 | link | false | /test e2e-ovn-step-registry |
| ci/prow/4.20-upgrade-from-stable-4.19-e2e-gcp-ovn-upgrade | bea25d9 | link | false | /test 4.20-upgrade-from-stable-4.19-e2e-gcp-ovn-upgrade |
| ci/prow/e2e-aws-ovn-upgrade | bea25d9 | link | true | /test e2e-aws-ovn-upgrade |
| ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 | bea25d9 | link | false | /test e2e-vsphere-ovn-dualstack-primaryv6 |
| ci/prow/e2e-aws-ovn-windows | bea25d9 | link | true | /test e2e-aws-ovn-windows |
| ci/prow/4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade | bea25d9 | link | false | /test 4.20-upgrade-from-stable-4.19-e2e-aws-ovn-upgrade |
| ci/prow/e2e-openstack-ovn | bea25d9 | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-aws-ovn-serial | bea25d9 | link | false | /test e2e-aws-ovn-serial |
| ci/prow/e2e-vsphere-ovn | bea25d9 | link | false | /test e2e-vsphere-ovn |
| ci/prow/e2e-aws-ovn-hypershift-conformance | bea25d9 | link | true | /test e2e-aws-ovn-hypershift-conformance |
| ci/prow/4.20-upgrade-from-stable-4.19-e2e-azure-ovn-upgrade | bea25d9 | link | false | /test 4.20-upgrade-from-stable-4.19-e2e-azure-ovn-upgrade |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw | bea25d9 | link | true | /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw |
| ci/prow/e2e-aws-hypershift-ovn-kubevirt | bea25d9 | link | false | /test e2e-aws-hypershift-ovn-kubevirt |
| ci/prow/security | bea25d9 | link | false | /test security |
| ci/prow/e2e-aws-ovn-shared-to-local-gateway-mode-migration | bea25d9 | link | false | /test e2e-aws-ovn-shared-to-local-gateway-mode-migration |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp | bea25d9 | link | true | /test e2e-metal-ipi-ovn-dualstack-bgp |
| ci/prow/4.21-upgrade-from-stable-4.20-images | bea25d9 | link | true | /test 4.21-upgrade-from-stable-4.20-images |

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 9, 2025
@openshift-merge-robot

PR needs rebase.
