feat!: Add working conversion webhook with cert rotation #1066

Open
wants to merge 32 commits into
base: main

Member

@sbernauer sbernauer commented Jun 30, 2025

Description

Part of stackabletech/issues#642

A working usage example can be found in stackabletech/zookeeper-operator#958 (mainly look at rust/operator-binary/src/main.rs).

Definition of Done Checklist

  • Not all of these items are applicable to all PRs, the author should update this template to only leave the boxes in that are relevant
  • Please make sure all these things are done and tick the boxes

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Integration tests passed (for non trivial changes)
  • Changes need to be "offline" compatible

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added

@sbernauer sbernauer changed the title feat(webhook): Add functioning conversion webhook with cert rotation feat(webhook): Add working conversion webhook with cert rotation Jul 2, 2025
Comment on lines +29 to +31
pub const WEBHOOK_CA_LIFETIME: Duration = Duration::from_minutes_unchecked(3);
pub const WEBHOOK_CERTIFICATE_LIFETIME: Duration = Duration::from_minutes_unchecked(2);
pub const WEBHOOK_CERTIFICATE_ROTATION_INTERVAL: Duration = Duration::from_minutes_unchecked(1);
Member Author

Reminder to bump these before merging. They are currently this low to make testing easy.
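
When these values get bumped, the three durations presumably still need to keep their relative order: the rotation interval shorter than the certificate lifetime, and the certificate lifetime no longer than the CA lifetime. A minimal sketch of a compile-time guard for that, assuming `std::time::Duration` instead of the `Duration` type used in the diff. The constant names and the 30/7/6 day values are placeholders for illustration, not the values chosen for this PR.

```rust
use std::time::Duration;

// Placeholder values for illustration only; not the values chosen in this PR.
const DAY_SECS: u64 = 24 * 60 * 60;
pub const CA_LIFETIME: Duration = Duration::from_secs(30 * DAY_SECS);
pub const CERTIFICATE_LIFETIME: Duration = Duration::from_secs(7 * DAY_SECS);
pub const CERTIFICATE_ROTATION_INTERVAL: Duration = Duration::from_secs(6 * DAY_SECS);

/// Returns true if a certificate is always rotated before it expires and
/// never outlives the CA that signed it.
pub const fn lifetimes_are_consistent(ca: Duration, cert: Duration, rotation: Duration) -> bool {
    rotation.as_secs() < cert.as_secs() && cert.as_secs() <= ca.as_secs()
}

// Fails the build if one constant is bumped without adjusting the others.
const _: () = assert!(lifetimes_are_consistent(
    CA_LIFETIME,
    CERTIFICATE_LIFETIME,
    CERTIFICATE_ROTATION_INTERVAL
));
```

The testing values quoted above (3, 2 and 1 minutes) satisfy the same check.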

@sbernauer sbernauer moved this to Development: In Progress in Stackable Engineering Jul 2, 2025
@sbernauer sbernauer moved this from Development: In Progress to Development: In Review in Stackable Engineering Jul 2, 2025
@sbernauer sbernauer moved this from Development: In Review to Development: Waiting for Review in Stackable Engineering Jul 2, 2025
@Techassi Techassi self-requested a review July 3, 2025 06:43
@Techassi Techassi moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Jul 3, 2025
Member

@Techassi Techassi left a comment

Partial review, I didn't look at the CertificateResolver yet.

@@ -216,6 +233,9 @@ pub struct ProductOperatorRun {
#[arg(long, env, default_value = "")]
pub watch_namespace: WatchNamespace,

#[command(flatten)]
pub operator_environment: OperatorEnvironmentOpts,

#[command(flatten)]
pub telemetry_arguments: TelemetryOptions,
Member

note(non-blocking): I know this is unrelated, but can we simplify this to telemetry, to be in line with operator_environment? This would be a breaking change for downstream operators, but the addition of operator_environment is already breaking anyway.

note(non-blocking): Could we also do the same for cluster_info_opts? The _opts suffix seems redundant. We could also rename the struct to ClusterInfoOptions, because Kubernetes is implied (by context) and I prefer Options over Opts.

Member Author

2ec9b73

Regarding dropping the Kubernetes prefix: one could argue yes. But just yesterday someone talked about "cluster" and "nodes" and I was confused. He meant NifiCluster and its nodes...

@@ -278,6 +298,18 @@ impl ProductConfigPath {
}
}

#[derive(clap::Parser, Debug, PartialEq, Eq)]
pub struct OperatorEnvironmentOpts {
/// The namespace the operator is running in, usually `stackable-operators`.
Member

suggestion: We should mention that the operator namespace should preferably be set via the env variable in combination with the downward API (and link to the official docs for it).
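
To illustrate the suggestion: with the downward API, the Pod spec would expose `metadata.namespace` through a `fieldRef` as an environment variable, and the operator picks it up unless a CLI value overrides it. A rough sketch of that precedence using only the standard library. The variable name `OPERATOR_NAMESPACE` and both function names are assumptions for illustration; in the PR itself clap handles the env lookup.

```rust
use std::env;

/// Hypothetical variable name; the real one is not shown in this thread.
pub const OPERATOR_NAMESPACE_ENV: &str = "OPERATOR_NAMESPACE";

/// Picks the operator namespace: a non-empty CLI value wins, otherwise the
/// value returned by the environment lookup is used. Empty strings count as unset.
pub fn resolve_operator_namespace(
    cli_value: Option<&str>,
    env_lookup: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    cli_value
        .filter(|value| !value.is_empty())
        .map(str::to_owned)
        .or_else(|| env_lookup(OPERATOR_NAMESPACE_ENV).filter(|value| !value.is_empty()))
}

/// Convenience wrapper that reads the real process environment, which is
/// where a downward API `fieldRef` would place the Pod's namespace.
pub fn operator_namespace_from_process_env(cli_value: Option<&str>) -> Option<String> {
    resolve_operator_namespace(cli_value, |key| env::var(key).ok())
}
```

Passing the lookup as a closure keeps the precedence logic testable without mutating the process environment.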

Member Author

}
);

// env with namespace
unsafe { env::set_var(WATCH_NAMESPACE, "foo") };
unsafe { env::set_var(OPERATOR_SERVICE_NAME, "foo-operator") };
Member

note: I honestly don't know why we test this. Again, this is something the clap project tests upstream. If this broke, the whole Rust user base would be in deep trouble, because basically all CLI tools use clap.

As such, I would just get rid of all those tests.

Member Author

removed it in a8a2381

Comment on lines 59 to 69
/// A generic webhook handler receiving a request and state and sending back
/// a response.
///
/// This trait is not intended to be implemented by external crates and this
/// library provides various ready-to-use implementations for it. One such an
/// implementation is part of the [`ConversionWebhookServer`][1].
///
/// [1]: crate::servers::ConversionWebhookServer
pub(crate) trait StatefulWebhookHandler<Req, Res, S> {
fn call(self, req: Req, state: S) -> Res;
}
Member

question: Why was this removed?

Member Author

It imposed maintenance effort on me while I figured out what the code should look like.
I'm also pretty confident we are never going to need it for a conversion webhook, so I removed it to reduce the maintenance effort going forward.
Also, stateful conversion webhooks sound like a potential nightmare once we want HA and multiple instances.
We can obviously always re-add it if we really need it in the future.

let router = Router::new().route("/convert", post(handler_fn));
Self { router, options }
}
router = router.route(&format!("/convert/{crd_name}"), post(handler_fn));
Member

suggestion: Split this.

Suggested change
-router = router.route(&format!("/convert/{crd_name}"), post(handler_fn));
+let route = format!("/convert/{crd_name}", crd_name = crd.name_any());
+router = router.route(&route, post(handler_fn));

Member Author

Hope f1fee5d works for you
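
For readers following the thread: the point of the per-CRD route is that each CRD's conversion webhook client config (its `service.path`) can point at its own endpoint. A small standalone sketch of the path construction, kept free of axum and kube so it runs on its own. The function names and the example CRD name are illustrative assumptions, not code from the PR.

```rust
/// Builds the per-CRD conversion path, for example "/convert/foos.example.com".
pub fn conversion_route(crd_name: &str) -> String {
    format!("/convert/{crd_name}")
}

/// Returns one (crd_name, route) pair per CRD, in input order. In the PR each
/// route would be registered on the router with a POST handler.
pub fn conversion_routes<'a>(crd_names: &[&'a str]) -> Vec<(&'a str, String)> {
    crd_names
        .iter()
        .map(|name| (*name, conversion_route(name)))
        .collect()
}
```

Computing the route into a variable first, as suggested above, also makes the same string reusable when filling in the path on the CRD side.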

options: Options,
client: Client,
field_manager: impl Into<String> + Debug,
operator_environment: OperatorEnvironmentOpts,
Member

note: A good place for this might be the Options for ConversionWebhookServer.

/// }
/// ```
#[instrument(name = "create_conversion_webhook_server_with_state", skip(handler))]
pub fn new_with_state<H, S>(handler: H, state: S, options: Options) -> Self
Member

question: Why is this removed?

Member Author

Sounds like a duplicate of #1066 (comment) to me

.context(ConvertCaToPemSnafu)?;

let crd_api: Api<CustomResourceDefinition> = Api::all(client.clone());
for mut crd in crds.iter().cloned() {
Member

note: I feel like we shouldn't clone here, but instead mutate the CRDs (so that we also have an up-to-date version of the CRD in memory).

I also feel like we should be a little more clever about when to run all of the following code. We can skip running it about 99% of the time, as the certificate is still valid and nothing needs to be adjusted in the conversion section of the CRD.

Member Author

That doesn't really work out with the borrow checker (see the diff below). I also prefer the signature this way, as the only thing we should change about the CRD is the conversion section (which changes on every reconcile_crds call).
Having multiple functions take owned or &mut CRDs makes this less clear, and some function might (accidentally) change other things.
This clone happens every 7 days or so (the cert lifetime), so I'd say the performance impact is negligible here.

Regarding "we should be a little more clever":

Currently reconcile_crds is called once for every new cert, so every 7 days or so, whenever the cert changes.
Even if nothing had changed, k8s would not do anything, as the object would be unchanged.

diff --git a/crates/stackable-webhook/src/servers/conversion.rs b/crates/stackable-webhook/src/servers/conversion.rs
index 03fc3b2..0d265b1 100644
--- a/crates/stackable-webhook/src/servers/conversion.rs
+++ b/crates/stackable-webhook/src/servers/conversion.rs
@@ -189,7 +189,7 @@ impl ConversionWebhookServer {
         Self::reconcile_crds(
             &client,
             &field_manager,
-            &crds,
+            crds.clone(),
             &operator_environment,
             &current_cert,
         )
@@ -240,14 +240,14 @@ impl ConversionWebhookServer {
         mut cert_rx: mpsc::Receiver<Certificate>,
         client: &Client,
         field_manager: &str,
-        crds: &[CustomResourceDefinition],
+        mut crds: Vec<CustomResourceDefinition>,
         operator_environment: &OperatorEnvironmentOptions,
     ) -> Result<(), ConversionWebhookError> {
         while let Some(current_cert) = cert_rx.recv().await {
             Self::reconcile_crds(
                 client,
                 field_manager,
-                crds,
+                &mut crds,
                 operator_environment,
                 &current_cert,
             )
@@ -261,7 +261,7 @@ impl ConversionWebhookServer {
     async fn reconcile_crds(
         client: &Client,
         field_manager: &str,
-        crds: &[CustomResourceDefinition],
+        crds: &mut [CustomResourceDefinition],
         operator_environment: &OperatorEnvironmentOptions,
         current_cert: &Certificate,
     ) -> Result<(), ConversionWebhookError> {
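
The trade-off discussed in this thread can be sketched in isolation: the reconcile step borrows the CRD templates immutably and patches clones, so nothing outside the conversion section can be changed by accident, and it runs once per rotated certificate. This is only an illustration under simplifying assumptions: `Crd` and `Certificate` are hypothetical stand-ins for the real types, a synchronous `std::sync::mpsc` channel replaces the async receiver from the diff, and no API server call is made.

```rust
use std::sync::mpsc;

/// Hypothetical stand-in for `CustomResourceDefinition`, reduced to the
/// one field that matters for this sketch.
#[derive(Clone, Debug, PartialEq)]
pub struct Crd {
    pub name: String,
    pub ca_bundle: Option<String>,
}

/// Hypothetical stand-in for the rotated certificate.
#[derive(Clone, Debug, PartialEq)]
pub struct Certificate {
    pub ca_pem: String,
}

/// Returns patched copies of the CRDs. The templates are only borrowed
/// immutably, so this function cannot modify them.
pub fn reconcile_crds(crds: &[Crd], cert: &Certificate) -> Vec<Crd> {
    crds.iter()
        .cloned()
        .map(|mut crd| {
            crd.ca_bundle = Some(cert.ca_pem.clone());
            crd
        })
        .collect()
}

/// Runs one reconcile per received certificate until all senders are
/// dropped, and returns how many reconciles were performed.
pub fn reconcile_on_rotation(cert_rx: mpsc::Receiver<Certificate>, crds: &[Crd]) -> usize {
    let mut runs = 0;
    for cert in cert_rx {
        // A real implementation would apply these patched CRDs to the API server.
        let _patched = reconcile_crds(crds, &cert);
        runs += 1;
    }
    runs
}
```

The cost is one clone of the CRD list per rotation. With the `&mut` variant from the diff, the in-memory CRDs would stay up to date instead, at the price of every callee being able to mutate them.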

@Techassi Techassi changed the title feat(webhook): Add working conversion webhook with cert rotation feat!: Add working conversion webhook with cert rotation Jul 3, 2025
@sbernauer sbernauer requested a review from Techassi July 4, 2025 10:19
Labels
None yet
Projects
Status: Development: In Review
3 participants