Description
Please, answer some short questions which should help us to understand your problem / question better?
- Which image of the operator are you using? e.g. registry.opensource.zalan.do/acid/postgres-operator:v1.9.0
- Where do you run it - cloud or metal? Kubernetes or OpenShift? Bare Metal K8s
- Are you running Postgres Operator in production? yes
- Type of issue? Bug report / question
Hello dear maintainers! Before we get into the details of what brings us here, we'd like to thank you for sharing this quite useful project and for your availability when considering our problems with it (see #1983 and #1984). That was really appreciated on our side ❤️
Context
We have a three-node K8s cluster (bare-metal) on which we run a three-node PostgreSQL cluster managed by Postgres Operator.
The configuration includes Patroni settings to enforce that a `sync_standby` replica is always available:
```yaml
spec:
  # [...]
  patroni:
    synchronous_mode: true
    synchronous_node_count: 1
```
We are running a rolling update of our K8s cluster by draining nodes in sequence before upgrading the control plane services. During the drain, Postgres Operator takes care of switching over the master to a selected candidate.
Problem
We have seen a situation where, for some reason, Patroni returns a member list that does not include a single `sync_standby` replica, even though the current master had logged that it assigned the role to a replica before we requested this list.
Relevant logs can be found at the bottom of this ticket description.
The candidate selected by `getSwitchoverCandidate` is unfortunately not the one that was promoted by the master. The failover request sent to the master is therefore rejected, since the candidate is not a `sync_standby` even though `synchronous_mode` is enabled in the configuration. All subsequent retries are made with the same candidate and keep failing.
We believe there are two issues:

- `getSwitchoverCandidate` loosely assumes that if `patroni.GetClusterMembers` does not include any `sync_standby` replica, then `synchronous_mode` is not enabled; our problematic situation shows that this assumption may be wrong (see the sketch below this list).
- `MigrateMasterPod` retries the actual `Switchover` call to Patroni, but always re-uses the same candidate until it stops with an error. This whole method doesn't seem to be retried, which may be another problem in itself.
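To make the first point concrete, here is a minimal, self-contained sketch of the decision we are describing; the names, types, and signature below are ours for illustration, not the operator's actual code:

```go
package selection

import "fmt"

// pickCandidate receives the cluster members already bucketed by role, which is
// roughly how we understand getSwitchoverCandidate to behave (a simplification,
// not the real implementation).
func pickCandidate(syncCandidates, candidates []string) (string, error) {
	// Preferred path: at least one sync_standby is reported, so it is chosen.
	if len(syncCandidates) > 0 {
		return syncCandidates[0], nil
	}
	// Problematic fallback: taken whenever Patroni momentarily reports no
	// sync_standby member, even on clusters where synchronous_mode is enabled.
	if len(candidates) > 0 {
		return candidates[0], nil
	}
	return "", fmt.Errorf("no switchover candidate found")
}
```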
Proposal
- Bail out when no sync replicas can be found if `synchronous_mode` is enabled
  - We could change `getSwitchoverCandidate` to read from `c.Spec.SynchronousMode` instead of assuming anything, and return an error in case `len(syncCandidates) == 0`
- Include the candidate selection in the retried loop within `MigrateMasterPod` to allow recovering from such failures (a rough sketch of both changes follows this list)
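As a rough illustration of both proposed changes (a sketch only: the names, signatures, and retry cadence below are hypothetical, not a patch against the operator), the selection could honour the `synchronous_mode` flag from the spec and be re-run on every retry:

```go
package migration

import (
	"fmt"
	"time"
)

type member struct {
	Name string
	Role string // as reported by Patroni: "sync_standby", "replica", ...
}

// pickCandidate fails fast when synchronous mode is enabled but Patroni does
// not report any sync_standby, instead of silently falling back to an async
// replica.
func pickCandidate(members []member, synchronousMode bool) (string, error) {
	var syncCandidates, candidates []string
	for _, m := range members {
		switch m.Role {
		case "sync_standby":
			syncCandidates = append(syncCandidates, m.Name)
		case "replica", "running":
			candidates = append(candidates, m.Name)
		}
	}
	if synchronousMode {
		if len(syncCandidates) == 0 {
			return "", fmt.Errorf("synchronous_mode is enabled but no sync_standby member was reported")
		}
		return syncCandidates[0], nil
	}
	if len(candidates) == 0 {
		return "", fmt.Errorf("no switchover candidate found")
	}
	return candidates[0], nil
}

// migrateMaster re-fetches the member list and re-selects the candidate on
// every attempt, so a transiently missing sync_standby can be recovered from.
func migrateMaster(getMembers func() ([]member, error), switchover func(candidate string) error, synchronousMode bool) error {
	const retries = 5
	var lastErr error
	for i := 0; i < retries; i++ {
		if i > 0 {
			time.Sleep(5 * time.Second)
		}
		members, err := getMembers()
		if err != nil {
			lastErr = err
			continue
		}
		candidate, err := pickCandidate(members, synchronousMode)
		if err != nil {
			lastErr = err
			continue
		}
		if err := switchover(candidate); err != nil {
			lastErr = err
			continue
		}
		return nil
	}
	return fmt.Errorf("still failing after %d retries: %w", retries, lastErr)
}
```

The behavioural change would be that a transient "no sync_standby" answer from Patroni leads to a freshly selected candidate on the next attempt, instead of five failover requests for the same rejected candidate.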
We would be happy to submit a PR if you agree with this proposal. We are also unsure whether this behavior from Patroni is expected or if it's a bug (see Logs section below for details), in which case we would also submit an issue in their project.
Logs
- `postgres-operator` logs:
  - When draining the first node:

    ```
    time="2023-02-21T14:21:06Z" level=info msg="moving pods: node \"/node01\" became unschedulable and does not have a ready label: map[]" pkg=controller
    time="2023-02-21T14:21:06Z" level=info msg="starting process to migrate master pod \"my-ns/postgres-2\"" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:21:06Z" level=debug msg="making GET http request: http://10.233.3.147:8008/cluster" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:21:06Z" level=info msg="check failed: pod \"my-ns/postgres-1\" is already on a live node" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:21:06Z" level=debug msg="switching over from \"postgres-2\" to \"my-ns/postgres-1\"" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:21:06Z" level=debug msg="subscribing to pod \"my-ns/postgres-1\"" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:21:06Z" level=debug msg="making POST http request: http://10.233.3.147:8008/failover" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:21:08Z" level=debug msg="successfully switched over from \"postgres-2\" to \"my-ns/postgres-1\"" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:21:09Z" level=debug msg="unsubscribing from pod \"my-ns/postgres-1\" events" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:21:09Z" level=info msg="1/1 master pods have been moved out from the \"/node01\" node" pkg=controller
    ```

  - When draining the second node, we see similar logs as for the first one.
  - When draining the last node (and failing):

    ```
    time="2023-02-21T14:31:07Z" level=info msg="moving pods: node \"/node03\" became unschedulable and does not have a ready label: map[]" pkg=controller
    time="2023-02-21T14:31:07Z" level=info msg="starting process to migrate master pod \"my-ns/postgres-0\"" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:31:07Z" level=debug msg="making GET http request: http://10.233.46.63:8008/cluster" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:31:07Z" level=info msg="check failed: pod \"my-ns/postgres-1\" is already on a live node" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:31:07Z" level=debug msg="switching over from \"postgres-0\" to \"my-ns/postgres-1\"" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:31:07Z" level=debug msg="subscribing to pod \"my-ns/postgres-1\"" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:31:07Z" level=debug msg="making POST http request: http://10.233.46.63:8008/failover" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:31:08Z" level=debug msg="unsubscribing from pod \"my-ns/postgres-1\" events" cluster-name=my-ns/postgres pkg=cluster
    time="2023-02-21T14:31:08Z" level=error msg="could not failover to pod \"my-ns/postgres-1\": could not switch over from \"postgres-0\" to \"my-ns/postgres-1\": patroni returned 'Failover failed'" cluster-name=my-ns/postgres pkg=cluster
    [... more attempts]
    time="2023-02-21T14:35:08Z" level=error msg="could not move master pod \"my-ns/postgres-0\": could not migrate master pod: still failing after 5 retries" pkg=controller
    time="2023-02-21T14:35:08Z" level=info msg="0/1 master pods have been moved out from the \"/node03\" node" pkg=controller
    time="2023-02-21T14:35:08Z" level=warning msg="failed to move master pods from the node \"node03\": could not move master 1/1 pods from the \"/node03\" node" pkg=controller
    ```

- `postgres-0` (master) logs:
  - after failover from `postgres-1` to `postgres-0` succeeded:

    ```
    2023-02-21 14:26:07,349 INFO: Cleaning up failover key after acquiring leader lock...
    2023-02-21 14:26:07,516 INFO: promoted self to leader by acquiring session lock
    2023-02-21 14:26:07,524 INFO: cleared rewind state after becoming the leader
    2023-02-21 14:26:07,517 INFO: Lock owner: postgres-0; I am postgres-0
    2023-02-21 14:26:07,652 INFO: updated leader lock during promote
    2023-02-21 14:26:19,358 INFO: Lock owner: postgres-0; I am postgres-0
    2023-02-21 14:26:19,412 INFO: Assigning synchronous standby status to ['postgres-2']
    2023-02-21 14:26:21,477 INFO: Synchronous standby status assigned to ['postgres-2']
    ```

  - when trying to failover to `postgres-1` again (because we're draining `node03`):

    ```
    2023-02-21 14:31:07,812 INFO: received failover request with leader=postgres-0 candidate=postgres-1 scheduled_at=None
    2023-02-21 14:31:07,825 INFO: Got response from postgres-1 http://10.233.213.196:8008/patroni: {"state": "running", "postmaster_start_time": "2023-02-21 14:30:25.180516+00:00", "role": "replica", "server_version": 140004, "xlog": {"received_location": 1595645080, "replayed_location": 1595645080, "replayed_timestamp": "2023-02-21 14:31:00.143559+00:00", "paused": false}, "timeline": 21, "dcs_last_seen": 1676989859, "database_system_identifier": "7174390007193890885", "patroni": {"version": "2.1.4", "scope": "postgres"}}
    2023-02-21 14:31:07,968 INFO: Lock owner: postgres-0; I am postgres-0
    2023-02-21 14:31:08,018 WARNING: Failover candidate=postgres-1 does not match with sync_standbys=None
    2023-02-21 14:31:08,018 WARNING: manual failover: members list is empty
    2023-02-21 14:31:08,018 WARNING: manual failover: no healthy members found, failover is not possible
    2023-02-21 14:31:08,019 INFO: Cleaning up failover key
    2023-02-21 14:31:08,068 INFO: Assigning synchronous standby status to ['postgres-2']
    2023-02-21 14:31:08,116 INFO: Synchronous standby status assigned to ['postgres-2']
    ```
Note: it appears that after the second failover attempt, the master reassigned the sync standby status to `postgres-2`, even though it had already reported doing so after becoming master. This may be an issue in itself, which we haven't been able to understand yet.