Skip to content

Add note about istio-cni security #16749

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions content/en/docs/ops/deployment/security-model/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,10 @@ The implications of this are discussed [below](#node-compromise).
Because this consolidates the elevated privileges required to setup networking into a single pod, rather than *every* pod,
this option is generally recommended.

#### Ambient mode

In ambient mode, the Istio CNI plugin (and the associated node agent) manages mesh enrollment for pods living on its node. Due to limitations in the Kubernetes API, it is not currently possible for the CNI plugin or its node agent to prevent pods from being scheduled on the node before the CNI plugin is installed and configured. This can occur even if using node cordons + taints as described [in our upgrade documentation](/docs/ambient/upgrade/helm#cni-node-agent). In these rare cases (e.g. on node restart or new node scale out), it is possible that a pod that is labeled for mesh enrollment may come up before the CNI's traffic redirection rules are applied, meaning that policies won't be enforced until after the CNI comes up and that pod is restarted. The Istio community is working with [various](https://github.com/containernetworking/cni/pull/1052) [upstream](https://github.com/kubernetes/kubernetes/issues/130594) communities to address this limitation, but in the meantime, you can enable [owned CNI mode](https://github.com/jaellio/istio/blob/master/releasenotes/notes/55968.yaml) to mitigate these race conditions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In ambient mode, the Istio CNI plugin (and the associated node agent) manages mesh enrollment for pods living on its node. Due to limitations in the Kubernetes API, it is not currently possible for the CNI plugin or its node agent to prevent pods from being scheduled on the node before the CNI plugin is installed and configured. This can occur even if using node cordons + taints as described [in our upgrade documentation](/docs/ambient/upgrade/helm#cni-node-agent). In these rare cases (e.g. on node restart or new node scale out), it is possible that a pod that is labeled for mesh enrollment may come up before the CNI's traffic redirection rules are applied, meaning that policies won't be enforced until after the CNI comes up and that pod is restarted. The Istio community is working with [various](https://github.com/containernetworking/cni/pull/1052) [upstream](https://github.com/kubernetes/kubernetes/issues/130594) communities to address this limitation, but in the meantime, you can enable [owned CNI mode](https://github.com/jaellio/istio/blob/master/releasenotes/notes/55968.yaml) to mitigate these race conditions.
In ambient mode, the Istio CNI plugin (and the associated node agent) manages mesh enrollment for pods living on its node.
Due to limitations in the Kubernetes API, it is not currently possible for the CNI plugin or its node agent to prevent pods from being scheduled on the node before the CNI plugin is installed and configured. This can occur even if [using node cordons and taints](/docs/ambient/upgrade/helm#cni-node-agent). In rare cases (e.g. on node restart or new node scale-out), it is possible that a pod that is labeled for mesh enrollment may come up before the CNI's traffic redirection rules are applied, meaning that policies won't be enforced until after the CNI comes up and that pod is restarted.
The Istio community is working with [the CNI](https://github.com/containernetworking/cni/pull/1052) and [Kubernetes communities](https://github.com/kubernetes/kubernetes/issues/130594) to address this limitation, but in the meantime, you can enable [owned CNI mode](https://github.com/istio/istio/blob/master/releasenotes/notes/55968.yaml) to mitigate these race conditions.

Reformatted a little for clarity. I wouldn't have thought a pod restart was required anywhere along the way, but one is mentioned? (In the context of the node taint controller, @ilrudie told me that it was only a problem until the CNI agent claimed the pod)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. please fast-follow by documenting that owned CNI mode feature somewhere other than a relnote!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah unfortunately a pod restart is required due to the nature of the bug 😓

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should we document an env var based feature like this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we really not handle all existing pods when the CNI starts? We used to be able to do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keithmattix What are you thoughts on including this update in 1.27.1 after we address the additional bugs above and have an alternative solution?

Copy link
Contributor

@jaellio jaellio Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we really not handle all existing pods when the CNI starts? We used to be able to do that.

@ilrudie (not positive this is what you meant, so correct me if I misunderstood :) )

Currently, our reconciliation on istio-cni happens before we install the istio cni. binary and config, so if a pod doesn't have a net ns when the snapshot is taken for reconciliation (we ignore it in this case) and has a net ns configured before istio-cni is installed then the pod will only be meshed after a restart.

We could fix this by changing the ordering of reconciliation and install as discussed here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you thoughts on including this update in 1.27.1 after we address the additional bugs above and have an alternative solution?

This sounds good to me - I wasn't aware/didn't fully understand the reconciliation aspect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should we document an env var based feature like this?

Somewhere in the ambient documentation relating to the CNI.
Perhaps https://istio.io/latest/docs/ambient/architecture/traffic-redirection/ for now?

We need to separate out the page under the "sidecar" section at some point

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaellio, that makes sense. I thought we did reconcile after installing the binary but if we don't then I can totally see how we'd sometimes wind up in limbo like this.

Thanks for the explanation.


### Sidecar Proxies

Istio may [optionally](/docs/overview/dataplane-modes/) deploy a sidecar proxy next to an application.
Expand Down