Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date
Msg-id CAD21AoDLgUdKvjYGc1QP-Q9VxFu7k_HCdB7NigK87oUkCpYC6w@mail.gmail.com
Whole thread Raw
In response to Re: POC: enable logical decoding when wal_level = 'replica' without a server restart  (shveta malik <shveta.malik@gmail.com>)
List pgsql-hackers
On Thu, Nov 20, 2025 at 2:32 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Nov 20, 2025 at 3:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Nov 20, 2025 at 1:27 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Wed, Nov 19, 2025 at 11:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Wed, Nov 19, 2025 at 1:21 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > > >
> > > > > On Wed, Nov 19, 2025 at 1:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Nov 17, 2025 at 10:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > >
> > > > > > > On Tue, Nov 18, 2025 at 1:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Sun, Nov 16, 2025 at 9:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > > In v26-0002-FIXUP-remove-status_change_allowed-flag, by using
> > > > > > > > > status_change_inprogress, we ensure that no backend is allowed to
> > > > > > > > > toggle the logical_wal/decoding status till startup process marks the
> > > > > > > > > recovery state as recovery_done. I am trying to think what problem
> > > > > > > > > this part of design prevents. I have considered the following
> > > > > > > > > scenarios:
> > > > > > > > >
> > > > > > > > > Scenario-1:
> > > > > > > > > 1. Startup process enables logical_wal and logical_decoding. Writes
> > > > > > > > > WAL record for it
> > > > > > > > > 2. Backend disables logical_decoding, writes WAL for it, and disables
> > > > > > > > > logical_wal.
> > > > > > > > > 3. Startup process sets recovery_done and allows wal_writes
> > > > > > > > >
> > > > > > > > > Say, instead of using status_change_inprogress to prevent doing
> > > > > > > > > step-2, if we had used recovery_in_progress kind of flag then how is
> > > > > > > > > it possible for backends to create any problem for the current node or
> > > > > > > > > cascaded standbys? I think the only way a problem can happen is if we
> > > > > > > > > write the WAL to disable_logical decoding after any backend could have
> > > > > > > > > written a non-logical WAL information record. Can that happen if we
> > > > > > > > > use the recovery_in_progress flag to prevent disable of logical_wal?
> > > > > > > > > If so, how?
> > > > > > > >
> > > > > > > > The main idea of holding status_change_inprogress until the recovery
> > > > > > > > end is to prevent concurrent toggling the logical decoding status. In
> > > > > > > > your scenario, IIUC backends cannot write any WAL yet at step-2 since
> > > > > > > > it's allowed at step-3. It would end up with a FATAL error actually.
> > > > > > > > One alternative is to make processes call LocalSetXLogInsertAllowed()
> > > > > > > > so that they can write WAL even during recovery, but I don't use it as
> > > > > > > > I'm concerned that it could lead to other problems. On the other hand,
> > > > > > > > we cannot let the backend to disable logical_decoding and logical_wal
> > > > > > > > without WAL warite at step-2 because otherwise the cascaded standby
> > > > > > > > won't disable logical decoding.
> > > > > > > >
> > > > > > >
> > > > > > > Why can't we postpone disabling logical WAL, decoding to the next
> > > > > > > cycle of checkpointer when RecoveryInProgress() is true without
> > > > > > > relying on status_change_inprogress? So, this will lead to a window
> > > > > > > where there are no logical slots but still the effective_wal_level is
> > > > > > > logical. However, this could be true even without considering this
> > > > > > > problem because the checkpointer can take some time to disable the
> > > > > > > logical WAL and decoding.
> > > > > > >
> > > > > > > The other problematic case to consider is during promotion, the
> > > > > > > startup has marked logical decoding as disabled but not yet marked
> > > > > > > recovery-done. Then the backend created a slot and returned without
> > > > > > > marking logical decoding as enabled due to relying on
> > > > > > > RecoveryInProgress(). Then the start-up marked Recovery-Done. Now we
> > > > > > > have a logical slot present, but logical decoding is disabled. I think
> > > > > > > we can simply disallow the creation of a logical slot in this window
> > > > > > > (where effective_wal_level is 'replica' and RecoveryInProgress() is
> > > > > > > true).
> > > > > >
> > > > > > It sounds reasonable. Backends are already prohibited from creating
> > > > > > logical slots when effective_wal_level is 'replica' and
> > > > > > RecoveryInProgress() is true, so it should not be a problem.
> > > > > >
> > > > > > > If the above is feasible and sounds reasonable, then we don't even
> > > > > > > need the status_change_inprogress flag, at least not during the
> > > > > > > start-up flow.
> > > > > >
> > > > > > I've updated the patch based on the above suggestion. I believe we
> > > > > > still need the status_change_inprogress flag when not in recovery but
> > > > > > in the new version I don't use the flag during end-of-recovery action.
> > > > > >
> > > > >
> > > > > Thanks for the patch.
> > > > >
> > > > > I had a look at the new changes. It appears that the startup process
> > > > > now marks the state as pending_disable rather than directly disabling
> > > > > logical decoding itself. In this setup, I think the situation Amit
> > > > > described will no longer occur— the case where the startup process
> > > > > disables logical decoding, and during the brief window before
> > > > > recovery-done is set, a slot-creation attempt is issued and gets
> > > > > blocked.
> > > > >
> > > > > Currently, in the scenario where the primary has one slot and the
> > > > > standby has zero slots, if I promote the standby and then try to
> > > > > create another slot in parallel (after
> > > > > UpdateLogicalDecodingStatusEndOfRecovery but before marking
> > > > > RECOVERY_STATE_DONE), the slot creation proceeds (instead of being
> > > > > blocked) and eventually hangs in DecodingContextFindStartpoint()—it
> > > > > sleeps in read_local_xlog_page_guts() until the promotion completes.
> > > > >
> > > > > IIUC, Amit’s original idea was to disable logical decoding within the
> > > > > startup process itself rather than doing it lazily and block the
> > > > > slot-creation in parallel.
> > > >
> > > > Indeed. I misunderstood the idea. The attached new version patch
> > > > should implement the idea correctly.
> > > >
> > >
> > > Thanks. I will review.
> > >
> > > On reconsideration, it looks like the 'status_change_inprogress' flag
> > > can be removed entirely.
> > >
> > > The scenario where EnsureLogicalDecodingEnabled() is trying to enable
> > > logical decoding while DisableLogicalDecodingIfNecessary() is
> > > attempting to disable it cannot occur in practice. This is because a
> > > drop-logical-slot operation (for the only slot present) would fail in
> > > parallel if the slot is already held by a backend in
> > > EnsureLogicalDecodingEnabled().
> >
> > Could you elaborate on this case? IIUC we attempt to disable after
> > dropping the slot, so it could happen that
> > DisableLogicalDecodingIfNecessary() and EnsureLogicalDecodingEnabled()
> > are executed concurrently.
> >
>
> I was referring to the scenario where backend1 triggers the creation
> of slot1 and is executing EnsureLogicalDecodingEnabled(), while
> backend2 attempts to drop the same slot1 in parallel. In that
> situation, the drop will fail outright and won’t even reach
> DisableLogicalDecodingIfNecessary(), producing an error like:
> ERROR: replication slot "slot1" is active for PID <>.
>
> Now, consider the case where a slot already exists (say slot1).
> Suppose EnsureLogicalDecodingEnabled() is invoked for another slot
> (slot2) while DisableLogicalDecodingIfNecessary() is triggered
> concurrently to drop a pre-existing slot (slot1).
>
> If EnsureLogicalDecodingEnabled() acquires the lock first via
> start_logical_decoding_status_change(), it will essentially be a no-op
> since logical decoding is already enabled. Once the lock is released,
> the disable operation may proceed, which will also be no-op as
> logical-slot exists. This will leave logical-decoding in the enabled
> state, which is expected.
>
> If DisableLogicalDecodingIfNecessary() acquires the lock first, then,
> as mentioned earlier, making the first four steps atomic and releasing
> the lock only after disabling xlog_logical_info ensures safety. If
> somehow EnsureLogicalDecodingEnabled() executes between step 4 and
> step 5 of the disable process, it will re-enable logical decoding. And
> even if step 5 of DisableLogicalDecodingIfNecessary() emits the signal
> later, it will be of no harm, as xlog_logical_info will reflect the
> correct state (true) because EnsureLogicalDecodingEnabled() has
> already re-enabled it.

Thank you for the explanation.

It seems to work well. I wanted to avoid writing WAL and sending a
signal barrier to all processes while holding a lwlock but if it's
okay we can go with this approach.

If we remove the flag, probably we need to rethink how to deal with
the race condition during the standby promotion because the recent
idea relies on the fact that the checkpointer can retry disabling
logical decoding in background. It's probably okay to keep using the
lazy behavior in the deactivation paths for that purpose. I've drafted
the new approach in the attached patches. Please note that I've
squashed the previous 0002 patch into the main patch. The attached
0002 patch implements only the new idea Shveta proposed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: Re: POC: Parallel processing of indexes in autovacuum
Next
From: Sami Imseih
Date:
Subject: Re: another autovacuum scheduling thread