Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | CAJpy0uCHiCPt0M+j4+kUn=0CXnO=95itRMgmcAWriV93sZhv-w@mail.gmail.com |
In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Wed, Jun 18, 2025 at 6:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Thank you for the comments!
>
> >
> > 2)
> > I see that when primary switches back its effective wal_level to
> > replica while standby has wal_level=logical in conf file, then standby
> > has this status:
> >
> > postgres=# show wal_level;
> >  wal_level
> > -----------
> >  logical
> >
> > postgres=# show effective_wal_level;
> >  effective_wal_level
> > ---------------------
> >  replica
> >
> > Is this correct? Can effective_wal_level be < wal_level anytime? I
> > feel it can be greater but never lesser.
>
> Hmm, I think we need to define what value we should show in
> effective_wal_level on standbys because the standbys actually are not
> writing any WALs and whether or not the logical decoding is enabled on
> the standbys depends on the primary.
>
> In the previous version patch, the standby's effective_wal_level value
> depended solely on the standby's wal_level value. However, it was
> confusing in a sense because it's possible that the logical decoding
> could be available even though effective_wal_level is 'replica' if the
> primary already enables it. One idea is that given that the logical
> decoding availability and effective_wal_level value are independent in
> principle, it's better to provide a SQL function to get the logical
> decoding status so that users can check the logical decoding
> availability without checking effective_wal_level. With that function,
> it might make sense to revert back the behavior to the previous one.
> That is, on the primary the effective_wal_level value is always
> greater than or equal to wal_level whereas on the standbys it's always
> the same as wal_level, and users would be able to check the logical
> decoding availability using the SQL function. Or it might also be
> worth considering to show effective_wal_level as NULL on standbys.

Yes, that is one idea. It will resolve the confusion.
But I was thinking, instead of having one new GUC + a SQL function,
can we have a GUC alone, which shows logical_decoding status plus the
cause of that. The new GUC will be applicable on both primary and
standby. As an example, let's say we name it as
logical_decoding_status, then it can have these values
(<status>_<cause>):

enabled_wal_level_logical: valid both for primary, standby
enabled_effective_wal_level_logical: valid only for primary
enabled_cascaded_logical_decoding: valid only for standby
disabled: valid both for primary, standby

'enabled_cascaded_logical_decoding' will indicate that logical
decoding is enabled on standby (even when its own wal_level=replica)
as a cascaded effect from primary. It can be possible either due to
primary's wal_level=logical or logical slot being present on primary.

> >
> > 3)
> > When standby invalidate obsolete slots due to effective_wal_level on
> > primary changed to replica, it dumps below:
> > LOG: invalidating obsolete replication slot "slot_st2"
> > DETAIL: Logical decoding on standby requires "wal_level" >= "logical"
> > on the primary server
> >
> > Shall we update this message as well to convey about slot-presence on primary.
> > DETAIL: Logical decoding on standby requires "wal_level" >= "logical"
> > or presence of logical slot on the primary server.
>
> Will fix.
>
> > 4)
> > I see that the slotsync worker is running all the time now as against
> > the previous behaviour where it will not start if wal_level is less
> > than logical or switched to '< logical' anytime. Even with wal_level
> > and effective_wal_level set to replica, slot-sync keeps on attempting
> > synchronization. This does not look correct. I think we need to find a
> > way to stop sot-sync worker when effective_wal_level is switched to
> > replica from logical.
>
> Right, will fix.
>
> > 5)
> > Can you please help me understand the changes at [1].
> >
> > a) Why is it needed when we have code logic at [2]
>
> This is because we use XLOG_LOGICAL_DECODING_STATUS_CHANGE record only
> for changing the logical decoding status online (i.e., without
> restarting the server). So I think we still these part of code in
> cases where we enable/disable the logical decoding by changing the
> wal_level value with restarting the server
>
> Suppose that both the primary and the standby set wal_level='replica',
> the logical decoding is not available on both sides. If the primary
> restarts with wal_level='logical', it doesn't write an
> XLOG_LOGICAL_DECODING_STATUS_CHANGE record.
>
> Another case is that suppose that the primary sets wal_level='logical'
> and the standby sets wal_level='replica', the logical decoding is
> available on both sides. If the primary restarts with
> wal_level='replica' we need to somehow tell the standby the fact that
> the logical decoding gets disabled.

Okay, I understand it now.

> (BTW I realized we need to
> invalidate the logical slots in this case too).
>

Yes, the behaviour should be the same. The difference in behaviour
between the two cases I pointed out is what confused me in the first
place.

> > b) in [1], why do we check n_inuse_logical_slots on standby and then
> > make decisions? Why not to disable logical-decoding directly just like
> > [2]
>
> It seems the code is incorrect. We should disable the logical decoding
> anyway if the primary disables it. Will fix.
>

I agree. So now the behaviour in case [1] will be exactly the same as
in case [2], i.e. invalidate the slots without checking slot usage on
the standby first.

thanks
Shveta
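
For illustration only, the logical_decoding_status values proposed in this message can be read as a small decision table. The sketch below is a Python model of that table, not code from the patch: the function names `primary_status` and `standby_status` and their parameters are hypothetical, and the choice to report `disabled` on a standby whenever the primary has logical decoding disabled (even if the standby's own wal_level is logical) is an assumption drawn from the slot-invalidation discussion, not something the message states outright.

```python
# Hypothetical model of the proposed logical_decoding_status GUC values.
# Names and parameters are illustrative assumptions, not PostgreSQL APIs.

VALID_WAL_LEVELS = ("minimal", "replica", "logical")


def primary_status(wal_level: str, has_logical_slot: bool) -> str:
    """Status a primary would report under the proposal."""
    if wal_level not in VALID_WAL_LEVELS:
        raise ValueError(f"unknown wal_level: {wal_level!r}")
    if wal_level == "logical":
        return "enabled_wal_level_logical"
    # With wal_level = replica, a logical slot raises the effective level.
    if wal_level == "replica" and has_logical_slot:
        return "enabled_effective_wal_level_logical"
    return "disabled"


def standby_status(wal_level: str, primary_decoding_enabled: bool) -> str:
    """Status a standby would report under the proposal."""
    if wal_level not in VALID_WAL_LEVELS:
        raise ValueError(f"unknown wal_level: {wal_level!r}")
    # Assumption: decoding on the standby depends on the primary, so a
    # primary with decoding disabled wins over the standby's own setting.
    if not primary_decoding_enabled:
        return "disabled"
    if wal_level == "logical":
        return "enabled_wal_level_logical"
    # Standby's own wal_level is below logical; decoding is available only
    # as a cascaded effect from the primary.
    return "enabled_cascaded_logical_decoding"


if __name__ == "__main__":
    print(primary_status("replica", has_logical_slot=True))
    print(standby_status("replica", primary_decoding_enabled=True))
```

Under these assumptions, the situation from point 2) (standby with wal_level=logical while the primary has dropped back to replica) would simply report `disabled`, so the question of whether effective_wal_level may be lower than wal_level would not arise.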