
Conversation

invidian (Member) commented:

This commit provides a PoC version of the agent waiting for all volumes
attached to the node to be detached as a step after draining the node.
Shutting down a Pod does not mean its volume has been detached: usually
the CSI agent runs as a DaemonSet on the node and takes care of
detaching the volume from the node once the Pod shuts down.

This commit improves the rebooting experience. Right now, if the CSI
agent does not get enough time to detach the volumes from the node, the
node is rebooted with the volumes still attached, and Pods using those
volumes cannot have them attached to other nodes, which effectively
increases the downtime for stateful workloads.
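
For illustration, here is a minimal sketch of how such a wait can look with client-go. The function name, package name, and polling interval are placeholders, not the exact code in this branch:

```go
package agent // hypothetical package name, for illustration only

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/klog/v2"
)

// waitForVolumeDetach blocks until no VolumeAttachment object references the
// given node anymore, polling the API server at a fixed interval. This is a
// sketch of the idea only, not the exact code in this PR.
func waitForVolumeDetach(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	for {
		attachments, err := client.StorageV1().VolumeAttachments().List(ctx, metav1.ListOptions{})
		if err != nil {
			return fmt.Errorf("listing volume attachments: %w", err)
		}

		remaining := 0
		for _, attachment := range attachments.Items {
			if attachment.Spec.NodeName == nodeName {
				remaining++
			}
		}

		if remaining == 0 {
			return nil
		}

		klog.Infof("Waiting for %d volume attachment(s) on node %q to be detached", remaining, nodeName)

		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(5 * time.Second):
		}
	}
}
```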

This commit still requires tests and a better interface for users.

If someone wants to try this feature on their own cluster, I've
published the following image I've been testing with:

quay.io/invidian/flatcar-linux-update-operator:97c0dee50c807dbba7d2debc59b369f84002797e

Closes #30

Signed-off-by: Mateusz Gozdek <[email protected]>

invidian (Member, Author) commented:

We should also consider compatibility with k8s versions before merging.

invidian (Member, Author) commented on Jun 26, 2022:

Just hit an issue with this code:

  1. Draining failed because one workload couldn't satisfy the PDB.
  2. Waiting for volume detachment never finished.

Perhaps we should also have some timeout while waiting for volumes to be detached.
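
For example (a sketch building on the wait-loop idea above; maxWait, clientset, and nodeName are placeholders), the wait could be bounded with a context deadline so a blocked drain or a misbehaving CSI driver cannot postpone the reboot forever:

```go
// Sketch: give up waiting for detachment after a configurable deadline and
// proceed with the reboot anyway. maxWait would need to be exposed to users,
// for example as a flag or environment variable.
ctx, cancel := context.WithTimeout(context.Background(), maxWait)
defer cancel()

if err := waitForVolumeDetach(ctx, clientset, nodeName); err != nil {
	klog.Errorf("Giving up waiting for volume detachment: %v", err)
}
```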

invidian force-pushed the invidian/wait-for-volumes-detach branch from 2affae3 to 88957e7 on January 11, 2023 at 17:47.
invidian (Member, Author) commented:

I also found a bug with RBAC, which is now fixed.
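
For reference, the extra permission this feature needs is read access to VolumeAttachment objects in the storage.k8s.io API group. Expressed as a client-go rbacv1 rule it looks roughly like the sketch below; the actual RBAC change in this PR may be written differently (for example as a YAML manifest):

```go
import rbacv1 "k8s.io/api/rbac/v1"

// Rough sketch of the additional ClusterRole rule the update-agent needs in
// order to list and watch VolumeAttachments for its node.
var volumeAttachmentsRule = rbacv1.PolicyRule{
	APIGroups: []string{"storage.k8s.io"},
	Resources: []string{"volumeattachments"},
	Verbs:     []string{"get", "list", "watch"},
}
```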


Review comment from invidian (Member, Author) on:

    klog.Info("Node drained, rebooting")

    for {

We should make this optional behind an opt-in flag.
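
A possible shape for such an opt-in, as a sketch only (the flag name is made up, and the agent may prefer an environment variable instead):

```go
import "flag"

// Hypothetical opt-in flag; disabled by default to preserve current behaviour.
var waitForVolumeDetachFlag = flag.Bool(
	"wait-for-volume-detach",
	false,
	"After draining, wait until all VolumeAttachments for this node are removed before rebooting.",
)

// Later, in the reboot path, after draining the node:
//
//	if *waitForVolumeDetachFlag {
//		// run the volume detachment wait loop here
//	}
```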

klog.Errorf("Ignoring node drain error and proceeding with reboot: %v", err)
}

klog.Info("Node drained, rebooting")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message needs to be adjusted (or perhaps moved below the volumes detachment).

Review comment from invidian (Member, Author) on lines +296 to +297:

    klog.Errorf("Listing volume attachments: %v", err)
    continue

We should probably add some mechanism here to give up, for example a timeout.
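
One way to do that (a sketch, assuming a reasonably recent k8s.io/apimachinery release) is to replace the hand-rolled loop with wait.PollUntilContextTimeout, which combines the retry interval with a give-up deadline; checkDetached is a hypothetical helper returning true once no VolumeAttachment for the node remains:

```go
import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/klog/v2"
)

// Sketch: retry every 5 seconds, give up after 10 minutes. Both values are
// placeholders and would likely need to be configurable.
func waitWithDeadline() {
	err := wait.PollUntilContextTimeout(context.Background(), 5*time.Second, 10*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			return checkDetached(ctx)
		},
	)
	if err != nil {
		klog.Errorf("Giving up waiting for volume detachment: %v", err)
	}
}
```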

Review comment from invidian (Member, Author) on:

        break
    }

    time.Sleep(5 * time.Second)

Perhaps this interval should also be adjustable.
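
A minimal way to make the interval adjustable, sketched with a hypothetical environment variable name and the current 5-second value as the fallback:

```go
import (
	"os"
	"time"
)

// detachPollInterval returns the interval between checks for remaining
// VolumeAttachments. The environment variable name is hypothetical; values
// use Go duration syntax, e.g. "10s" or "1m".
func detachPollInterval() time.Duration {
	if value := os.Getenv("VOLUME_DETACH_POLL_INTERVAL"); value != "" {
		if parsed, err := time.ParseDuration(value); err == nil {
			return parsed
		}
	}

	return 5 * time.Second // current default
}
```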

Successfully merging this pull request may close these issues.

Ensure update-agent waits for all volumes to be detached before rebooting