Description
Background
Users with RDS databases that have rds.logical_replication enabled are encountering failed major version upgrades, even when Terraform applies successfully. The upgrade silently fails during AWS precheck validation if logical replication slots or pglogical nodes exist in the database.
This issue is becoming common as more teams adopt logical replication configurations for CDC/pglogical use cases.
Example 1: Logical Replication Slots
A user attempted to upgrade RDS from PostgreSQL 16.8 → 17.6. Despite a successful Terraform apply, the database never upgraded. The thread can be found here.
Precheck logs:
=== Latest Precheck Log Content ===
------------------------------------------------------------------
Upgrade could not be run on Mon Oct 20 12:03:29 2025
------------------------------------------------------------------
The instance could not be upgraded from 16.8.R1 to 17.6.R2 because of following reasons. Please take appropriate action on databases that have usages incompatible with requested major engine version upgrade and try again.
- The instance could not be upgraded because it has one or more logical replication slots. Please drop all logical replication slots and try again.
Key findings: Logical replication slots
- Dev/prod databases upgraded successfully (no slots present)
- Preprod database failed (2 active slots in the pg_replication_slots view)
Example 2: pglogical Nodes
Another user encountered a similar failure with a different blocker. The thread can be found here.
Precheck logs:
------------------------------------------------------------------
Upgrade could not be run on Tue Oct 21 12:35:25 2025
------------------------------------------------------------------
The instance could not be upgraded from 16.8.R2 to 17.4.R2 because of following reasons. Please take appropriate action on databases that have usages incompatible with requested major engine version upgrade and try again.
- Following usages in database 'db74e3c6cbeecd38ba' need to be corrected before upgrade:
-- The instance can't be upgraded while the database has pglogical nodes created using pglogical extension. Drop all pglogical nodes and try again.
----------------------- END OF LOG ----------------------
Key findings: pglogical Nodes
- User checked pg_replication_slots and found nothing
- Issue was pglogical nodes, not replication slots
- Different fix required using pglogical extension functions
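Because the two blockers need different fixes, it can help to tell them apart mechanically from the precheck log text. The following is a minimal Python sketch, assuming the log has already been retrieved as a string. The function name `classify_precheck_log` and the blocker labels are hypothetical, and the matched phrases are taken only from the two logs quoted above, so other blockers will not be recognised.

```python
import re

# Phrases copied from the two precheck logs quoted in this issue.
# Assumption: AWS may word other blockers differently; this sketch ignores them.
BLOCKER_PATTERNS = {
    "logical_replication_slots": re.compile(r"one or more logical replication slots"),
    "pglogical_nodes": re.compile(r"has pglogical nodes"),
}


def classify_precheck_log(log_text: str) -> list[str]:
    """Return the labels of the known blockers mentioned in a precheck log."""
    # Collapse whitespace so phrases broken across wrapped lines still match.
    normalized = re.sub(r"\s+", " ", log_text)
    return [
        label
        for label, pattern in BLOCKER_PATTERNS.items()
        if pattern.search(normalized)
    ]
```

An empty list means neither of the two known messages was found; it does not mean the upgrade is unblocked.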
Common Root Cause
Both issues stem from the same Terraform configuration enabling logical replication.
Proposed user journey
Approach
Phase 1: Test Logical Replication Slots
- Create a test RDS database with PostgreSQL 16.*
- Enable logical replication; an example can be found in lines 35 to 61 of this PR
- Create slots:
 SELECT pg_create_logical_replication_slot('test_slot', 'pgoutput');
- Verify slot exists:
 SELECT * FROM pg_replication_slots WHERE slot_type = 'logical';
- Attempt a major version upgrade
- Capture failure - check the database's precheck logs
- Drop the replication slots after the upgrade attempt fails:
 SELECT pg_drop_replication_slot('test_slot');
- Verify slots are gone:
 SELECT * FROM pg_replication_slots WHERE slot_type = 'logical';
-- Should return 0 rows
- Retry upgrade and confirm success.
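The list-and-drop steps in Phase 1 could be scripted roughly as follows. This is a sketch, not a tested tool: `drop_logical_slots` is a hypothetical helper, it assumes a DB-API 2.0 cursor whose driver uses `%s` placeholders (as psycopg does), and it defaults to a dry run. Active slots are skipped because `pg_drop_replication_slot` raises an error while a consumer is still connected, which is relevant to the preprod case above where both slots were active.

```python
# Same filter as the verification query in the steps above.
LIST_LOGICAL_SLOTS = (
    "SELECT slot_name, active FROM pg_replication_slots "
    "WHERE slot_type = 'logical'"
)


def drop_logical_slots(cursor, dry_run=True):
    """Find logical slots and optionally drop the inactive ones.

    Returns (to_drop, skipped_active): the inactive slots that were dropped
    (or would be, in a dry run) and the active slots left untouched.
    """
    cursor.execute(LIST_LOGICAL_SLOTS)
    rows = cursor.fetchall()

    to_drop, skipped_active = [], []
    for slot_name, active in rows:
        if active:
            # A slot with a connected consumer cannot be dropped; the
            # consumer (for example a CDC job) has to be stopped first.
            skipped_active.append(slot_name)
            continue
        if not dry_run:
            cursor.execute("SELECT pg_drop_replication_slot(%s)", (slot_name,))
        to_drop.append(slot_name)
    return to_drop, skipped_active
```

Running it first with `dry_run=True` gives a list that can be shared with the owning team before anything is removed.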
Phase 2: Test pglogical Nodes
- Use same test RDS instance (or create new one)
- Enable the pglogical extension (example here)
- Create pglogical node:
SELECT pglogical.create_node(
  node_name := 'test_node',
  dsn := 'host=localhost port=5432 dbname=testdb'
);
- Verify node exists:
 SELECT node_id, node_name FROM pglogical.node;
- Attempt major version upgrade
- Capture failure - check precheck logs (should mention pglogical nodes)
- Drop subscriptions (if any):
SELECT sub_name FROM pglogical.subscription;
-- Run once for each sub_name returned by the query above
SELECT pglogical.drop_subscription('sub_name');
- Drop the node:
 SELECT pglogical.drop_node('test_node');
- Verify cleanup:
SELECT * FROM pglogical.node;
-- Should return 0 rows
- Retry upgrade and confirm success
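The Phase 2 cleanup order (subscriptions first, then nodes) could be sketched in Python as below. `drop_pglogical_objects` is a hypothetical helper, and it makes several assumptions: a DB-API 2.0 cursor with `%s` placeholders, the pglogical extension already installed in the connected database (otherwise the first query fails), and a default dry run. The precheck log in Example 2 names a specific database, so this would need to be run against each database that uses pglogical.

```python
def drop_pglogical_objects(cursor, dry_run=True):
    """List pglogical subscriptions and nodes, and optionally drop them.

    Returns (subscriptions, nodes) as lists of names. Subscriptions are
    dropped before nodes, matching the order of the manual steps above.
    """
    cursor.execute("SELECT sub_name FROM pglogical.subscription")
    subscriptions = [row[0] for row in cursor.fetchall()]

    cursor.execute("SELECT node_name FROM pglogical.node")
    nodes = [row[0] for row in cursor.fetchall()]

    if not dry_run:
        for sub_name in subscriptions:
            cursor.execute("SELECT pglogical.drop_subscription(%s)", (sub_name,))
        for node_name in nodes:
            cursor.execute("SELECT pglogical.drop_node(%s)", (node_name,))

    return subscriptions, nodes
```

After a real run, the verification query from the steps above (`SELECT * FROM pglogical.node;`) should return 0 rows before the upgrade is retried.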
Which part of the user docs does this impact
Based on the validated findings, create user guide documentation covering:
- How to identify whether you have logical replication slots or pglogical nodes
- Differences between logical replication slots, read replicas, the rds.logical_replication configuration, and pglogical nodes
- Pre-upgrade checklist
- Step-by-step remediation
- When to coordinate with teams.
Communicate changes
- post for #cloud-platform-update
- Weeknotes item
- Show the Thing/P&A All Hands/User CoP
- Announcements channel
Questions / Assumptions
Definition of done
- readme has been updated
- user docs have been updated
- another team member has reviewed
- smoke tests are green
- prepare demo for the team
Reference
AWS documentation regarding upgrade blocker caused by logical replication slots
How to write good user stories