OCPBUGS-57456: podman-etcd should keep the container for crash debugging #2062
base: main
Replace embedded configuration arguments with external files:
- etcd config file (`/var/lib/etcd/config.yaml`)
- podman env file (`/var/lib/etcd/env.yaml`)

This allows configuration changes to be applied via container restart rather than requiring full container recreation.
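A minimal sketch of the config-file idea, assuming a hypothetical helper `write_etcd_config` and only a small subset of etcd's YAML keys (`name`, `data-dir`, `force-new-cluster`). It is not the agent's actual code. It shows why restarting is enough: the container command line (`etcd --config-file ...`) never changes, only the file it reads does.

```shell
# Hypothetical sketch (not the agent's code): render etcd settings into a
# YAML file read via `etcd --config-file`, instead of passing inline flags.
# The function name and the chosen keys are assumptions for illustration.
write_etcd_config() {
    config_file=$1         # e.g. /var/lib/etcd/config.yaml
    node_name=$2
    data_dir=$3
    force_new_cluster=$4   # "true" or "false"

    # Write to a temporary file first, then move it into place, so etcd
    # never reads a half-written config.
    tmp_file="${config_file}.tmp"
    cat > "$tmp_file" << EOF || return 1
name: ${node_name}
data-dir: ${data_dir}
force-new-cluster: ${force_new_cluster}
EOF
    mv -f "$tmp_file" "$config_file"
}

# The container command line then stays constant, e.g.:
#   etcd --config-file /var/lib/etcd/config.yaml
# so a changed config only needs a container restart, not a new container.
```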
This commit modifies the `podman-etcd` resource agent to conditionally reuse existing containers, preventing their removal and preserving historical logs. The agent now:
* Compares `etcd-pod.yaml` changes to decide whether to reuse an existing container or create a new one.
* Retains containers after stops to ensure logs are always available for debugging.
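The reuse-or-recreate decision can be sketched as a file comparison. This assumes the agent keeps a copy of the manifest used when the container was created, and that a byte-for-byte `cmp -s` is an acceptable notion of "changed"; both the saved copy and the helper `container_needs_recreate` are hypothetical.

```shell
# Hypothetical sketch: returns 0 (true) when the container must be recreated,
# 1 (false) when the existing container can be reused as-is.
container_needs_recreate() {
    current_manifest=$1   # manifest the agent would start from now
    saved_manifest=$2     # copy saved when the container was created (assumed)

    # No saved copy: we cannot prove the container matches, so recreate.
    [ -f "$saved_manifest" ] || return 0

    # cmp -s is silent and exits 0 only when the files are identical.
    if cmp -s "$current_manifest" "$saved_manifest"; then
        return 1   # unchanged: reuse the existing container
    fi
    return 0       # changed (or unreadable): recreate
}
```

Treating a missing or unreadable saved copy as "changed" errs on the side of a fresh container, which is the behavior the agent had before this PR.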
Return early from the `podman_start` function if a container is already running; there is nothing further to do.

Signed-off-by: Carlo Lobrano <[email protected]>
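The early return can be sketched with `podman container inspect --format '{{.State.Running}}'`. The function names below are hypothetical, and the container create/reuse path that the real `podman_start` performs is omitted; only the short-circuit is shown.

```shell
# Hypothetical sketch of the early return in podman_start.
container_is_running() {
    # Prints "true" or "false"; an unknown container yields empty output.
    [ "$(podman container inspect --format '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

podman_start_sketch() {
    name=$1
    if container_is_running "$name"; then
        echo "$name already running; nothing to do"
        return 0   # OCF_SUCCESS in an OCF resource agent
    fi
    echo "starting $name"
    # The real agent would first create or reuse the container here.
    podman start "$name"
}
```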
Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2062/1/input
Definitely seems like a step forward for debugging purposes. I am concerned about ending up with an infinite list of old containers, though. Is it possible to keep just the previous and current ones?
The existing container is deleted right before starting a new one. I could improve it to keep only the last one.
Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2062/2/input
As this PR also introduces configuration files for podman (`env.yaml`) and etcd (`config.yaml`), I also need to back up those files together with the podman container; otherwise we will be unable to check what data etcd was started with.

/hold
I really like the new file-based configuration options. I think they make it easier to follow where the configuration options are sourced from and how they are propagated. Excited to also have these logs preserved.
Review comment on `heartbeat/podman-etcd` (outdated):

            FORCE_NEW_CLUSTER=false
        fi

        cat > "$OCF_RESKEY_podman_env_file" << EOF
For the collection of commented-out options, are these things we're leaving here because we expect to enable them down the line?
Those were leftovers I forgot to remove. However, I noticed that some env variables (e.g. `etcd_data`) were not used correctly and, thinking more about it, the env file for podman is not necessary anyway, as the only thing that can really change is the etcd command line. In short, I need to push a version without the podman env file :)
Branch updated from 48e121c to d5da5f6.

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2062/3/input
Branch updated from d5da5f6 to e6f68d0.

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2062/4/input
Branch updated from e6f68d0 to 4c021cc.
When recreating the container due to configuration changes, archive the current stopped container and its config file by renaming them to `${CONTAINER}-previous` and `/var/lib/etcd/config-previous.yaml`. This preserves the container state for debugging while the new instance is created. Falls back to deletion if archiving fails. Only one archived copy is maintained to limit disk usage. Note: the use of an environment file for podman was removed as it is an unnecessary complication. If the environment changes we are forced to create a new container anyway.
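The archive-then-fallback step can be sketched with `podman rename` and `podman rm -f`. The helper name `archive_container` is hypothetical, and dropping any older `-previous` container first is an assumption about how "only one archived copy" is enforced.

```shell
# Hypothetical sketch: archive the stopped container and its config as
# "<name>-previous" / "config-previous.yaml", else fall back to deletion.
archive_container() {
    name=$1                                   # e.g. etcd
    config=$2                                 # e.g. /var/lib/etcd/config.yaml
    prev_name="${name}-previous"
    prev_config="${config%.yaml}-previous.yaml"

    # Keep a single archived copy: remove the older archive, ignoring errors
    # when none exists.
    podman rm -f "$prev_name" >/dev/null 2>&1

    if podman rename "$name" "$prev_name" >/dev/null 2>&1; then
        # Archive the matching config so the old container can be inspected
        # together with the settings it was started with.
        [ -f "$config" ] && mv -f "$config" "$prev_config"
        return 0
    fi

    # Archiving failed: fall back to deleting the container.
    podman rm -f "$name" >/dev/null 2>&1
}
```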
Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2062/5/input
Branch updated from 4c021cc to 1353f02.

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2062/6/input
Let's keep it on hold again. Notice this line from etcd, which should be investigated further:

/hold
This change modifies the `podman-etcd` resource agent to conditionally reuse existing containers, preventing their removal and preserving historical logs. As a preliminary change, it replaces inline configuration arguments with an external etcd configuration file (`/var/lib/etcd/config.yaml`). This allows configuration changes to be applied via container restart rather than requiring full container recreation.

The agent can then compare changes to the `pod.yaml` manifest to decide whether to reuse an existing container or create a new one. When a new container is needed, the old one is kept with the `-previous` suffix, together with the corresponding `config.yaml`, renamed `config-previous.yaml`, to allow further debugging.