Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | CAExHW5uuSK8FzwF_kb9rMLWdzjCWGVHmxpmfY_-o1GxqzzbcXQ@mail.gmail.com Whole thread Raw |
In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (shveta malik <shveta.malik@gmail.com>) |
List | pgsql-hackers |
On Fri, Sep 26, 2025 at 11:13 AM shveta malik <shveta.malik@gmail.com> wrote: > Replying to this message to avoid forking the thread. Some more comments: +/* + * This function does several kinds of preparation works required to start + * the process of logical decoding status change. If the status change is + * required, it ensures we can change logical decoding status, setting + * LogicalDecodingCtl->status_change_inprogress on, and returns true. + * Otherwise, if it's not required or not allowed (e.g., during recovery + * or wal_level = 'logical'), it returns false. + */ +static bool +start_logical_decoding_status_change(bool new_status) +{ + Assert(!RecoveryInProgress()); The prologue mentions recovery as a case when we can't change status. Why is Assert here then? + + /* Prepare and start the activation process if it's disabled */ + if (!start_logical_decoding_status_change(true)) The prologue of start_logical_decoding_status_change() mentions that the function returns false if status change is not allowed. If the function returns false because the status change is not allowed, we are simply returning from EnsureLogicalDecodingEnabled(). The caller will assume that logical decoding is enabled. Doesn't it look like a missing case. May be the code is correct but comments aren't explaining the situation well. If a new replication slot is getting created while the last remaining slot is getting dropped, looks like we rely on LogicalDecodingControlLock to serialize the operations. Is that correct? I don't see an injection point in this code path. So I am imagine there's no direct test for this case. +void +DisableLogicalDecodingIfNecessary(bool complete_pending) +{ + bool need_wait = false; + + /* + * Both complete_pending and proc_exit_inprogress must not be true at the + * same time. + */ That's evident from the assertion. I feel the comment should explain the reason. + + /* With 'minimal' WAL level, logical decoding is always disabled */ + if (wal_level == WAL_LEVEL_MINIMAL) + return; Assume a server which has a standby running with wal_level = 'replica'. If we shut it down and restart it with wal_level = 'minimal', that won't be allowed. Hence we don't need to write a XLOG_LOGICAL_DECODING_STATUS_CHANGE record during recovery in that case. Do you think this should be documented somewhere? I did spend a couple hours trying this out before I realized why it is safe to return from here. May be a testcase for the same? ReplicationSlotCleanup(bool synced_only) { LWLockRelease(ReplicationSlotControlLock); + ... snip ... + if (dropped_logical && nlogicalslots == 0) + DisableLogicalDecodingIfNecessary(false); Some magic going on here. If dropped_logical is true, at least one of the loops dropped a logical slot. If nlogicalslots is zero, it means that the last loop did not find any logical slots. When both the conditions are true, it means that the function dropped the last logical slot. Hence it calls DisableLogicalDecodingIfNecessary(). Is that correct? I think we need some comments explaining this. Also there's a tiny window between dropping the last logical slot and calling DisableLogicalDecodingIfNecessary() when another logical slot could be created. But DisableLogicalDecodingIfNecessary() has checks to avoid disabling logical decoding in that case. I don't think all of that is clear from the code changes. Maybe that's because of the way this function is written in the first place. For example, why do we need multiple loops here? Why not just one loop that drops all the slots that need to be dropped? It's not as if the same backend is going to create new temporary slots while it is cleaning up existing ones. Same goes for ReplicationSlotsDropDBSlots(). But there it is at least possible that concurrent backends may have created a slot, so a loop there is justified. + /* Initialize logical info WAL logging state */ + InitializeProcessXLogLogicalInfo(); + /* * Initialize replication slots after pgstat. The exit hook might need to * drop ephemeral slots, which in turn triggers stats reporting. Do we need XlogLogicalInfo to be set before initializing replication slots? Do we need to update the comment here? -- Best Wishes, Ashutosh Bapat
pgsql-hackers by date: