Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | CAJpy0uDW6BpNXLZ0AaP=_GU6pCsZf_7Sk2R0Ti+ov+EO6ruMkg@mail.gmail.com |
In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (shveta malik <shveta.malik@gmail.com>) |
Responses | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
List | pgsql-hackers |
On Wed, Jun 4, 2025 at 3:40 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Jun 4, 2025 at 6:41 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, May 20, 2025 at 9:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Yeah, I find the idea that the presence of a logical slot will allow
> > > the user to enable logical decoding/replication more appealing than
> > > this new alternative, leaving aside the challenges of realizing it.
>
> +1. This idea appears more user-friendly and easier to understand
> compared to other approaches, such as having multiple GUCs or using
> ALTER SYSTEM.
>
> > I've drafted this idea. Here are summary for attached two patches:
> >
> > 0001 patch allows us to create a logical slot without WAL reservation.
> >
> > 0002 patch is the main patch for dynamically enabling/disabling
> > logical decoding when wal_level is 'replica'.
>
> Thank You for the patches. I have done some initial testing, it seems
> to be working well. I will do more testing and review and will share
> further feedback.

I reviewed further and have a few concerns:

1) We now invalidate slots on the standby if the primary (with wal_level=replica) has dropped the last logical slot and internally reverted its runtime (effective) wal_level back to replica. Consider the following scenario involving a cascaded logical replication setup:

a) The publisher is configured with wal_level = replica and has created a publication (pub1).
b) A subscriber server creates a subscription (sub1) to pub1. As part of the slot creation for sub1, the publisher's effective wal_level is switched to logical.
c) The publisher also has a physical standby, which in turn has its own logical subscriber, named standby_sub1.

At this point, everything works as expected, i.e. changes from the publisher flow through the physical standby and are replicated to standby_sub1.

Now if the user drops sub1, the replication slot on the primary is also dropped.
Since this was the last logical slot, the primary automatically switches its effective wal_level back to replica. This change propagates to the standby, causing it to invalidate the slot for standby_sub1. As a result, the standby logs the following error:

STATEMENT: START_REPLICATION SLOT "standby_sub1" LOGICAL 0/0 (...)
ERROR: logical decoding needs to be enabled on the primary

Even if we manually recreate a logical slot on the primary afterward, the standby_sub1 subscriber is not able to proceed:

ERROR: can no longer access replication slot "standby_sub1"
DETAIL: This replication slot has been invalidated due to "wal_level_insufficient".

So removing the publisher's logical subscriber has, as a side effect, stopped the standby's logical subscriber from working. Is this behaviour acceptable?

Without this feature, if I manually switch wal_level back to replica on the primary, the server fails to start. This makes the issue obvious and prevents misconfiguration:

FATAL: logical replication slot "sub2" exists, but "wal_level" < "logical"
HINT: Change "wal_level" to be "logical" or higher.

The current behaviour, by contrast, is harder to diagnose, as the problem is effectively hidden behind subscription/slot creation and deletion.

2) 'show effective_wal_level' shows 'logical' if a slot exists on the primary. But on a physical standby, it still shows 'replica' even in the presence of slots. Is this intentional?

3) I haven’t tested this yet, but I’d like to discuss what the expected behavior should be if a slot exists on the primary but is marked as invalidated. Will an invalidated slot still keep the effective wal_level at logical, or will invalidating the only logical slot trigger a switch back to replica? This can happen in practice: a slot with unreserved WAL may be invalidated due to a timeout.

thanks
Shveta
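The slot-count rule behind points 1 and 3 can be sketched as a toy state model in C. This is a hypothetical illustration, not code from the patches: `Server`, `compute_effective()`, `standby_replay()`, `run_scenario()` and the `count_invalidated` switch are invented names. The model assumes, as described in point 1, that the effective wal_level is logical exactly while a logical slot exists on the primary, and that the standby invalidates all of its logical slots once the primary reverts to replica.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model only: none of these names are PostgreSQL APIs. */
#define MAX_SLOTS 8

typedef enum { WAL_REPLICA, WAL_LOGICAL } WalLevel;

typedef struct {
    char name[32];
    bool in_use;
    bool invalidated;
} Slot;

typedef struct {
    Slot slots[MAX_SLOTS];
    WalLevel effective;            /* runtime (effective) wal_level */
} Server;

/* Assumed rule: logical while at least one logical slot exists.
 * count_invalidated selects the policy asked about in point 3. */
static WalLevel compute_effective(const Server *s, bool count_invalidated)
{
    for (int i = 0; i < MAX_SLOTS; i++)
        if (s->slots[i].in_use && (count_invalidated || !s->slots[i].invalidated))
            return WAL_LOGICAL;
    return WAL_REPLICA;
}

static int create_slot(Server *s, const char *name)
{
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (!s->slots[i].in_use) {
            snprintf(s->slots[i].name, sizeof s->slots[i].name, "%s", name);
            s->slots[i].in_use = true;
            s->slots[i].invalidated = false;
            return i;
        }
    }
    return -1;                     /* no free slot */
}

static void drop_slot(Server *s, int idx)
{
    assert(idx >= 0 && idx < MAX_SLOTS);
    s->slots[idx].in_use = false;
}

/* Standby replays the primary's level; a revert to replica invalidates
 * every logical slot on the standby ("wal_level_insufficient"). */
static void standby_replay(Server *standby, WalLevel primary_level)
{
    if (primary_level == WAL_REPLICA)
        for (int i = 0; i < MAX_SLOTS; i++)
            if (standby->slots[i].in_use)
                standby->slots[i].invalidated = true;
}

typedef struct {
    WalLevel primary_after_drop;
    WalLevel primary_after_recreate;
    bool standby_slot_invalidated;
} ScenarioResult;

/* Point 1. No primary slot is invalidated here, so the policy flag is moot. */
static ScenarioResult run_scenario(void)
{
    Server primary = {0}, standby = {0};
    ScenarioResult r;

    int sub1 = create_slot(&primary, "sub1");        /* b) level -> logical */
    primary.effective = compute_effective(&primary, false);
    standby_replay(&standby, primary.effective);

    int sbs = create_slot(&standby, "standby_sub1"); /* c) cascaded subscriber */

    drop_slot(&primary, sub1);                       /* last slot gone */
    primary.effective = compute_effective(&primary, false);
    r.primary_after_drop = primary.effective;
    standby_replay(&standby, primary.effective);     /* standby slot invalidated */

    create_slot(&primary, "sub2");                   /* manual recreate */
    primary.effective = compute_effective(&primary, false);
    r.primary_after_recreate = primary.effective;
    standby_replay(&standby, primary.effective);

    r.standby_slot_invalidated = standby.slots[sbs].invalidated; /* sticky */
    return r;
}

/* Point 3: the only slot on the primary exists but is invalidated. */
static WalLevel level_with_only_invalidated_slot(bool count_invalidated)
{
    Server primary = {0};
    int idx = create_slot(&primary, "timed_out");
    primary.slots[idx].invalidated = true;           /* e.g. timeout */
    return compute_effective(&primary, count_invalidated);
}
```

In this model, `run_scenario()` reproduces point 1: the primary goes back to logical after the slot is recreated, but standby_sub1 stays invalidated because invalidation is one-way. The two calls `level_with_only_invalidated_slot(true)` and `level_with_only_invalidated_slot(false)` give the two candidate answers to point 3.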