-
-
Notifications
You must be signed in to change notification settings - Fork 15
feat!: Add working conversion webhook with cert rotation #1066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
pub const WEBHOOK_CA_LIFETIME: Duration = Duration::from_minutes_unchecked(3); | ||
pub const WEBHOOK_CERTIFICATE_LIFETIME: Duration = Duration::from_minutes_unchecked(2); | ||
pub const WEBHOOK_CERTIFICATE_ROTATION_INTERVAL: Duration = Duration::from_minutes_unchecked(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reminder to bump these before merging. Currently they are so low for easy testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review, I didn't look at the CertificateResolver
yet.
crates/stackable-operator/src/cli.rs
Outdated
@@ -216,6 +233,9 @@ pub struct ProductOperatorRun { | |||
#[arg(long, env, default_value = "")] | |||
pub watch_namespace: WatchNamespace, | |||
|
|||
#[command(flatten)] | |||
pub operator_environment: OperatorEnvironmentOpts, | |||
|
|||
#[command(flatten)] | |||
pub telemetry_arguments: TelemetryOptions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note(non-blocking): I know this is unrelated, but can we simplify this to telemetry
to be inline with operator_environment
. This would be a breaking change for downstream operators, but the addition of operator_environment
already is breaking anyways.
note(non-blocking): Could we also do the same for cluster_info_opts
. The _opts
suffix seems redundant. We could also rename the struct to ClusterInfoOptions
, because Kubernetes is implied (by context) and I prefer Options
over Opts
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding dropping Kubernetes prefix one could argue yes. But just yesterday someone talked about cluster and nodes and I was confused. He meant NifiCluster and nodes....
@@ -278,6 +298,18 @@ impl ProductConfigPath { | |||
} | |||
} | |||
|
|||
#[derive(clap::Parser, Debug, PartialEq, Eq)] | |||
pub struct OperatorEnvironmentOpts { | |||
/// The namespace the operator is running in, usually `stackable-operators`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: We should mention that setting the operator namespace is preferred to be done via the env variable in combination with the downward API (which should link to the official docs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crates/stackable-operator/src/cli.rs
Outdated
} | ||
); | ||
|
||
// env with namespace | ||
unsafe { env::set_var(WATCH_NAMESPACE, "foo") }; | ||
unsafe { env::set_var(OPERATOR_SERVICE_NAME, "foo-operator") }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: I honestly don't know why we test this. This is again something what the clap
project tests upstream. If this breaks, the whole Rust user-base would be in deep trouble, because basically all CLI tools use clap
.
As such, I would just get rid of all those tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed it in a8a2381
crates/stackable-webhook/src/lib.rs
Outdated
/// A generic webhook handler receiving a request and state and sending back | ||
/// a response. | ||
/// | ||
/// This trait is not intended to be implemented by external crates and this | ||
/// library provides various ready-to-use implementations for it. One such an | ||
/// implementation is part of the [`ConversionWebhookServer`][1]. | ||
/// | ||
/// [1]: crate::servers::ConversionWebhookServer | ||
pub(crate) trait StatefulWebhookHandler<Req, Res, S> { | ||
fn call(self, req: Req, state: S) -> Res; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why was this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It imposed maintenance effort to me while I figured out how the code should look like.
I'm also pretty confident we are never going to need it for a conversion web-hook. Therefore I removed it to have less maintenance effort going forward.
Also, stateful conversion webhooks sounds like a potential nightmare once we want to have HA and multiple instances.
We can obviously always re-add it in the case we really need it in the future.
let router = Router::new().route("/convert", post(handler_fn)); | ||
Self { router, options } | ||
} | ||
router = router.route(&format!("/convert/{crd_name}"), post(handler_fn)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Split this.
router = router.route(&format!("/convert/{crd_name}"), post(handler_fn)); | |
let route = format!("/convert/{crd_name}, crd_name = crd.name_any()); | |
router = router.route(&route), post(handler_fn)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hope f1fee5d works for you
options: Options, | ||
client: Client, | ||
field_manager: impl Into<String> + Debug, | ||
operator_environment: OperatorEnvironmentOpts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: A good place for this might be the Options
for ConversionWebhookServer
.
/// } | ||
/// ``` | ||
#[instrument(name = "create_conversion_webhook_server_with_state", skip(handler))] | ||
pub fn new_with_state<H, S>(handler: H, state: S, options: Options) -> Self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why is this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a duplicate of #1066 (comment) to me
.context(ConvertCaToPemSnafu)?; | ||
|
||
let crd_api: Api<CustomResourceDefinition> = Api::all(client.clone()); | ||
for mut crd in crds.iter().cloned() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: I feel like we shouldn't clone here, but instead mutate the CRDs (so that we also have an up2date version of the CRD in memory).
I also feel like we should be a little more clever on when to run all of the following code. We can skip running all of the below code for like 99% percent of the time, as the certificate is still valid and nothing needs to be adjusted in the conversion
section of the CRD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't really work out in terms of borrow checker (see below). Also I prefer the signature this way, as the only thing that we should change about the CRD is the conversion
(which changes every reconcile_crds
call).
Having multiple functions take owned or &mut CRDs makes this a bit less clear and some function might (accidentally) change things.
This clone is called every 7 days or so (cert lifetime), I'd say the performance is negligible here.
we should be a little more clever
Currently reconcile_crds
is called for every new cert, so every 7 days or so and every time the cert changes.
Even if there would be no change, k8s would not do anything, as the object is unchanged.
diff --git a/crates/stackable-webhook/src/servers/conversion.rs b/crates/stackable-webhook/src/servers/conversion.rs
index 03fc3b2..0d265b1 100644
--- a/crates/stackable-webhook/src/servers/conversion.rs
+++ b/crates/stackable-webhook/src/servers/conversion.rs
@@ -189,7 +189,7 @@ impl ConversionWebhookServer {
Self::reconcile_crds(
&client,
&field_manager,
- &crds,
+ crds.clone(),
&operator_environment,
¤t_cert,
)
@@ -240,14 +240,14 @@ impl ConversionWebhookServer {
mut cert_rx: mpsc::Receiver<Certificate>,
client: &Client,
field_manager: &str,
- crds: &[CustomResourceDefinition],
+ mut crds: Vec<CustomResourceDefinition>,
operator_environment: &OperatorEnvironmentOptions,
) -> Result<(), ConversionWebhookError> {
while let Some(current_cert) = cert_rx.recv().await {
Self::reconcile_crds(
client,
field_manager,
- crds,
+ &mut crds,
operator_environment,
¤t_cert,
)
@@ -261,7 +261,7 @@ impl ConversionWebhookServer {
async fn reconcile_crds(
client: &Client,
field_manager: &str,
- crds: &[CustomResourceDefinition],
+ crds: &mut [CustomResourceDefinition],
operator_environment: &OperatorEnvironmentOptions,
current_cert: &Certificate,
) -> Result<(), ConversionWebhookError> {
Co-authored-by: Techassi <[email protected]>
Description
Part of stackabletech/issues#642
An working example usage can be found in stackabletech/zookeeper-operator#958 (mainly look at
rust/operator-binary/src/main.rs
)Definition of Done Checklist
Author
Reviewer
Acceptance