Inaccurate status-"RUNNING" shown when pod is stuck in pending state #2864

Open
@RavinaChidambaram

Description

  • Which image of the operator are you using? e.g. ghcr.io/zalando/postgres-operator:v1.12.2
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? K8s
  • Are you running Postgres Operator in production? yes
  • Type of issue? Bug

Hi,
I deployed a PostgreSQL cluster, and its pods got stuck in the Pending state. During this time, the PostgresClusterStatus was set to Creating. After some time, the status changed to CreateFailed, and the following errors and warning appeared in the operator logs:

time="2025-02-14T11:07:11Z" level=error msg="failed to create cluster: pod labels error: still failing after 200 retries" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:07:11Z" level=warning msg="cluster created failed: pod labels error: still failing after 200 retries" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:07:11Z" level=error msg="could not create cluster: pod labels error: still failing after 200 retries" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=controller worker=2

Later, when the next sync event runs, the operator patches the PostgresClusterStatus to Running, which is incorrect because the pods are still stuck in the Pending state.
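
As a rough illustration of the behavior one might expect instead, the sketch below reports Running only when every pod is in the Running phase and no sync step returned an error. It is written in Python for brevity and is not the operator's actual Go code. The function name `derive_cluster_status` is hypothetical, and using "SyncFailed" as the failure status is an assumption.

```python
from typing import Iterable


def derive_cluster_status(pod_phases: Iterable[str], sync_errors: Iterable[str]) -> str:
    """Hypothetical helper: decide which status a sync should report.

    pod_phases  -- Kubernetes pod phases of the cluster's pods, e.g. "Pending"
    sync_errors -- error messages collected from the individual sync steps
    """
    phases = list(pod_phases)
    errors = list(sync_errors)

    # A cluster without pods, or with any pod not Running, is not healthy.
    all_running = bool(phases) and all(phase == "Running" for phase in phases)

    if all_running and not errors:
        return "Running"
    # Assumption: "SyncFailed" is the status used for a sync that did not succeed.
    return "SyncFailed"


if __name__ == "__main__":
    # Mirrors the situation in this report: two Pending pods and failed sync steps.
    print(derive_cluster_status(
        ["Pending", "Pending"],
        ["could not sync roles", "could not sync databases"],
    ))
```

With the inputs from this report (two Pending pods, failed role and database sync), the sketch would not report Running.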

This incorrect status is misleading, because the status is the primary way for users to track the state of the PostgreSQL cluster.

Logs from the operator during this sync event:

time="2025-02-14T11:11:29Z" level=debug msg="syncing Patroni config" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:11:29Z" level=warning msg="Patroni config updated? false - errors during config sync: could not get Postgres config from pod test-pg-demo/tcl-minimal-cluster-demo-0: could not get Postgres config from pod test-pg-demo/tcl-minimal-cluster-demo-0:  is not a valid IP', 'could not get Postgres config from pod test-pg-demo/tcl-minimal-cluster-demo-1: could not get Postgres config from pod test-pg-demo/tcl-minimal-cluster-demo-1:  is not a valid IP" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:11:29Z" level=error msg="errors while restarting Postgres in pods via Patroni API: could not restart Postgres in  pod test-pg-demo/tcl-minimal-cluster-demo-0: could not get member data:  is not a valid IP', 'could not restart Postgres in  pod test-pg-demo/tcl-minimal-cluster-demo-1: could not get member data:  is not a valid IP" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:11:29Z" level=debug msg="syncing pod disruption budgets" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:11:29Z" level=debug msg="syncing roles" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:11:29Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:11:44Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:11:59Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:12:14Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:12:29Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:12:44Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:12:59Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:13:14Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:13:14Z" level=error msg="could not sync roles: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:13:14Z" level=debug msg="syncing databases" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:13:14Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:13:29Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:13:44Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:13:59Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:14:14Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:14:29Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:14:44Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:14:59Z" level=warning msg="could not connect to Postgres database: dial tcp 10.245.50.134:5432: connect: connection refused" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:14:59Z" level=error msg="could not sync databases: could not init database connection" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:14:59Z" level=debug msg="syncing prepared databases with schemas" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:14:59Z" level=debug msg="syncing connection pooler (master, replica) from (false, nil) to (false, nil)" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:14:59Z" level=info msg="identified non running pod, potentially skipping major version upgrade" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:14:59Z" level=info msg="identified non running pod, potentially skipping major version upgrade" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=cluster worker=2
time="2025-02-14T11:14:59Z" level=info msg="cluster has been synced" cluster-name=test-pg-demo/tcl-minimal-cluster-demo pkg=controller worker=2
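
The pattern in these logs (error-level lines followed by "cluster has been synced" for the same cluster) can be detected mechanically. Below is a minimal Python sketch that parses the key=value log format shown above and lists clusters reported as synced after errors were logged for them. The function names `parse_line` and `synced_despite_errors` are hypothetical, and the parser assumes every line follows the format seen in this report.

```python
import re

# Matches key=value pairs; a value is either double-quoted or a bare token.
_PAIR = re.compile(r'(\w[\w-]*)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line: str) -> dict:
    """Split one operator log line into a dict of its fields."""
    fields = {}
    for key, value in _PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # strip the surrounding quotes
        fields[key] = value
    return fields


def synced_despite_errors(lines):
    """Return (cluster, error_count) for each sync reported after errors."""
    error_counts = {}
    flagged = []
    for line in lines:
        fields = parse_line(line)
        cluster = fields.get("cluster-name")
        if cluster is None:
            continue
        if fields.get("level") == "error":
            error_counts[cluster] = error_counts.get(cluster, 0) + 1
        elif fields.get("msg") == "cluster has been synced":
            if error_counts.get(cluster, 0) > 0:
                flagged.append((cluster, error_counts[cluster]))
            error_counts[cluster] = 0  # start counting for the next sync
    return flagged


if __name__ == "__main__":
    import sys
    # Usage: pipe the operator logs into this script on stdin.
    for cluster, count in synced_despite_errors(sys.stdin):
        print(f"{cluster}: reported as synced after {count} error(s)")
```

Any cluster this prints had at least one failed sync step before the operator declared the sync complete, which is the situation described in this issue.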
