
Reproduce, Validate, and Document RDS Upgrade Blocker for Logical Replication #7726

@FolarinOyenuga

Description


Background

Users with RDS databases that have rds.logical_replication enabled are encountering failed major version upgrades, even when Terraform applies successfully. The upgrade silently fails during AWS precheck validation if logical replication slots or pglogical nodes exist in the database.
This issue is becoming common as more teams adopt logical replication configurations for CDC/pglogical use cases.

Example 1: Logical Replication Slots

A user attempted to upgrade RDS from PostgreSQL 16.8 → 17.6. Despite a successful Terraform apply, the database was never upgraded. The thread can be found here.
Precheck logs:

=== Latest Precheck Log Content ===
------------------------------------------------------------------
Upgrade could not be run on Mon Oct 20 12:03:29 2025
------------------------------------------------------------------
The instance could not be upgraded from 16.8.R1 to 17.6.R2 because of following reasons. Please take appropriate action on databases that have usages incompatible with requested major engine version upgrade and try again.
- The instance could not be upgraded because it has one or more logical replication slots. Please drop all logical replication slots and try again.

Key findings: Logical replication slots

  • Dev/prod databases upgraded successfully (no slots present)
  • Preprod database failed (2 active slots in the pg_replication_slots view)

Example 2: pglogical Nodes

Another user encountered a similar issue but with a different blocker. Thread can be found here.
Precheck logs:

------------------------------------------------------------------
Upgrade could not be run on Tue Oct 21 12:35:25 2025
------------------------------------------------------------------
The instance could not be upgraded from 16.8.R2 to 17.4.R2 because of following reasons. Please take appropriate action on databases that have usages incompatible with requested major engine version upgrade and try again.
- Following usages in database 'db74e3c6cbeecd38ba' need to be corrected before upgrade:
-- The instance can't be upgraded while the database has pglogical nodes created using pglogical extension. Drop all pglogical nodes and try again.

----------------------- END OF LOG ----------------------

Key findings: pglogical Nodes

  • User checked pg_replication_slots and found nothing
  • Issue was pglogical nodes, not replication slots
  • Different fix required using pglogical extension functions

Common Root Cause

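Both issues stem from the same Terraform configuration enabling logical replication (rds.logical_replication). The upgrade is blocked by the objects created on top of it (logical replication slots or pglogical nodes).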

Proposed user journey

Approach

Phase 1: Test Logical Replication Slots

  • Create a test RDS database with PostgreSQL 16.*
  • Enable logical replication (rds.logical_replication); an example can be found in lines 35 to 61 of this PR
  • Create a slot:
    SELECT pg_create_logical_replication_slot('test_slot', 'pgoutput');
  • Verify the slot exists:
    SELECT * FROM pg_replication_slots WHERE slot_type = 'logical';
  • Attempt a major version upgrade
  • Capture the failure: check the database's precheck logs
  • After the upgrade attempt fails, drop the replication slot:
    SELECT pg_drop_replication_slot('test_slot');
  • Verify the slots are gone:
    SELECT * FROM pg_replication_slots WHERE slot_type = 'logical';
    -- Should return 0 rows
  • Retry the upgrade and confirm it succeeds
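On a real database there may be several slots, and some may be active, as in the preprod case above. PostgreSQL refuses to drop a slot that is currently in use, so its consumer (for example a CDC tool or subscriber) has to be stopped first. The sketch below turns the rows of `SELECT slot_name, slot_type, active FROM pg_replication_slots;` into drop statements for inactive logical slots. It also lists the active ones that need attention. `ReplicationSlot`, `plan_slot_cleanup`, and `sql_literal` are hypothetical helpers, and the slot names in the example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationSlot:
    """One row of: SELECT slot_name, slot_type, active FROM pg_replication_slots;"""

    slot_name: str
    slot_type: str  # 'logical' or 'physical'
    active: bool


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def plan_slot_cleanup(slots: list[ReplicationSlot]) -> tuple[list[str], list[str]]:
    """Return (drop statements for inactive logical slots, names of active logical slots).

    Physical slots are skipped: the precheck log only reports logical slots.
    """
    drop_statements: list[str] = []
    still_active: list[str] = []
    for slot in slots:
        if slot.slot_type != "logical":
            continue
        if slot.active:
            # Dropping an in-use slot raises an error, so stop its consumer first.
            still_active.append(slot.slot_name)
        else:
            drop_statements.append(
                f"SELECT pg_drop_replication_slot({sql_literal(slot.slot_name)});"
            )
    return drop_statements, still_active


# Hypothetical slots: one idle logical slot, one in-use logical slot, one physical slot.
drops, active = plan_slot_cleanup([
    ReplicationSlot("test_slot", "logical", False),
    ReplicationSlot("cdc_slot", "logical", True),
    ReplicationSlot("replica_slot", "physical", True),
])
print(drops)   # ["SELECT pg_drop_replication_slot('test_slot');"]
print(active)  # ['cdc_slot']
```

The PostgreSQL documentation notes that a logical slot must be dropped while connected to the database it was created in. The database column of pg_replication_slots shows which database to connect to.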

Phase 2: Test pglogical Nodes

  • Use the same test RDS instance (or create a new one)
  • Enable the pglogical extension (example here)
  • Create a pglogical node:
    SELECT pglogical.create_node(
      node_name := 'test_node',
      dsn := 'host=localhost port=5432 dbname=testdb'
    );
  • Verify the node exists:
    SELECT node_id, node_name FROM pglogical.node;
  • Attempt a major version upgrade
  • Capture the failure: check the precheck logs (they should mention pglogical nodes)
  • Drop subscriptions (if any): list them, then drop each one by name
    SELECT sub_name FROM pglogical.subscription;
    SELECT pglogical.drop_subscription('sub_name');  -- replace 'sub_name' with each name returned above
  • Drop the node:
    SELECT pglogical.drop_node('test_node');
  • Verify cleanup:
    SELECT * FROM pglogical.node;
    -- Should return 0 rows
  • Retry the upgrade and confirm it succeeds
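The Phase 2 cleanup can be scripted in the same way. The sketch below builds the statements in the order the steps above use: every subscription first, then the node(s), then the verification query. Its inputs are assumed to come from `SELECT sub_name FROM pglogical.subscription;` and `SELECT node_name FROM pglogical.node;`. `plan_pglogical_cleanup` and `sql_literal` are hypothetical helpers, and the ordering simply mirrors this issue's steps.

```python
def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def plan_pglogical_cleanup(subscriptions: list[str], nodes: list[str]) -> list[str]:
    """Build cleanup statements: drop subscriptions, then nodes, then verify."""
    statements = [
        f"SELECT pglogical.drop_subscription({sql_literal(name)});"
        for name in subscriptions
    ]
    statements += [
        f"SELECT pglogical.drop_node({sql_literal(name)});" for name in nodes
    ]
    # Verification query from the steps above; it should return 0 rows afterwards.
    statements.append("SELECT * FROM pglogical.node;")
    return statements


# The test setup above: no subscriptions, one node called 'test_node'.
for statement in plan_pglogical_cleanup([], ["test_node"]):
    print(statement)
```

Example 2's precheck log names a specific database, so these queries should be run in each database where the pglogical extension is installed.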

Which part of the user docs does this impact

Based on the validated findings, create user guide documentation covering:

  • How to identify whether you have logical replication slots or pglogical nodes
  • The differences between logical replication slots, read replicas, the logical replication configuration (rds.logical_replication), and pglogical nodes
  • A pre-upgrade checklist
  • Step-by-step remediation
  • When to coordinate with teams
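One possible shape for the pre-upgrade checklist is to run both verification queries and act on the counts. Checking only one of them is not enough: in Example 2, pg_replication_slots was empty but a pglogical node still blocked the upgrade. The sketch below is a hypothetical illustration, not an existing tool. `PreUpgradeFindings` and `pre_upgrade_checklist` are made-up names, and the counts in the example are invented. It only covers the two blockers discussed in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreUpgradeFindings:
    """Row counts from the verification queries (hypothetical container)."""

    # SELECT count(*) FROM pg_replication_slots WHERE slot_type = 'logical';
    logical_slots: int
    # SELECT count(*) FROM pglogical.node;  (use 0 where pglogical is not installed,
    # because the query fails when the pglogical schema does not exist)
    pglogical_nodes: int


def pre_upgrade_checklist(findings: PreUpgradeFindings) -> list[str]:
    """Return outstanding remediation steps; empty means neither blocker was found."""
    actions: list[str] = []
    if findings.logical_slots > 0:
        actions.append(
            f"Drop {findings.logical_slots} logical replication slot(s) (see Phase 1)"
        )
    if findings.pglogical_nodes > 0:
        actions.append(
            f"Drop pglogical subscriptions, then {findings.pglogical_nodes} pglogical node(s) (see Phase 2)"
        )
    return actions


# Hypothetical counts shaped like Example 2: no logical slots, one pglogical node.
print(pre_upgrade_checklist(PreUpgradeFindings(logical_slots=0, pglogical_nodes=1)))
```

An empty list says nothing about other precheck failures. The precheck logs remain the authoritative source.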

Communicate changes

  • Post in #cloud-platform-update
  • Weeknotes item
  • Show the Thing/P&A All Hands/User CoP
  • Announcements channel

Questions / Assumptions

Definition of done

  • readme has been updated
  • user docs have been updated
  • another team member has reviewed
  • smoke tests are green
  • prepare demo for the team

Reference

AWS documentation regarding upgrade blocker caused by logical replication slots
How to write good user stories

Metadata

Labels: bug (Something isn't working), rds, support-request (Customer team support requests)

Type: No type

Projects: Status 👀 Review/QA

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests