Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date
Msg-id CAD21AoDFkWxeG6bX1EkGY9=i6P0Xz-PCrw41XNFFGfJXaft4eA@mail.gmail.com
In response to RE: POC: enable logical decoding when wal_level = 'replica' without a server restart  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses RE: POC: enable logical decoding when wal_level = 'replica' without a server restart
List pgsql-hackers
On Wed, Jul 30, 2025 at 12:22 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Sawada-san,
>
> While reading more, I found a race condition.

Thank you for reviewing the patch!

> In this case the effective_wal_level
> can be logical even when there is no logical slot.
> UpdateLogicalDecodingStatusEndOfRecovery() checks the number of logical slots
> and then releases the lock. The startup process then reacquires the lock,
> compares the result with IsLogicalDecodingEnabled(), and updates the status if needed.
> So, wal_level can be inconsistent if the status is changed after the n_logical_slots
> is read.
>
> Steps:
> a) constructed a primary-standby system
> b) created a logical slot on the primary
> c) created a logical slot on the standby
> d) sent a promote signal to standby
> e) dropped a logical slot on standby, just after startup process released
>    LogicalDecodingControlLock in UpdateLogicalDecodingStatusEndOfRecovery().
>
> After the above, effective_wal_level stayed at logical. Is this the expected behavior?

No, we need to fix it.

I thought we could fix this issue by checking the number of in-use
logical slots while holding ReplicationSlotControlLock and
LogicalDecodingControlLock, but it seems we also need to deal with
another race condition between backends and the startup process at
the end of recovery.
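
To illustrate the first race and the lock-holding idea, here is a
minimal Python model. It is only a sketch and not PostgreSQL code: the
class, its methods, and the single lock (standing in for both
ReplicationSlotControlLock and LogicalDecodingControlLock) are
assumptions made for illustration. The "between" callback
deterministically injects a slot drop into the window where the lock
has been released.

```python
import threading


class Model:
    """Toy model of the end-of-recovery status update (hypothetical names)."""

    def __init__(self, n_logical_slots, decoding_enabled):
        # One lock stands in for both control locks mentioned in the thread.
        self.lock = threading.Lock()
        self.n_logical_slots = n_logical_slots
        self.decoding_enabled = decoding_enabled

    def drop_slot(self):
        # Backend path during recovery: the slot goes away, but the
        # decoding status is left untouched.
        with self.lock:
            self.n_logical_slots -= 1

    def end_of_recovery_racy(self, between=None):
        # Read the slot count, then release the lock.
        with self.lock:
            n = self.n_logical_slots
        # Window in which a concurrent slot drop can land.
        if between is not None:
            between()
        # Reacquire the lock and act on the now-stale count.
        with self.lock:
            want = n > 0
            if self.decoding_enabled != want:
                self.decoding_enabled = want

    def end_of_recovery_atomic(self):
        # Count and update under one lock acquisition, so a drop is
        # ordered either entirely before or entirely after the update.
        with self.lock:
            self.decoding_enabled = self.n_logical_slots > 0


# Racy path: the drop lands inside the window, the status stays enabled.
racy = Model(n_logical_slots=1, decoding_enabled=True)
racy.end_of_recovery_racy(between=racy.drop_slot)

# Atomic path: a drop that happens first is seen by the update.
fixed = Model(n_logical_slots=1, decoding_enabled=True)
fixed.drop_slot()
fixed.end_of_recovery_atomic()
```

In this model the racy path ends with zero slots and decoding still
enabled, while the atomic path disables it. A drop ordered after the
atomic update is still not handled, which is the second race described
below.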

Currently the backend skips controlling the logical decoding status if
the server is in recovery (by checking RecoveryInProgress()), but it's
possible that a backend process tries to drop a logical slot after the
startup process calls UpdateLogicalDecodingStatusEndOfRecovery() and
before the server starts accepting writes. In this case, the backend
ends up not disabling logical decoding, so it remains enabled. I think
we would somehow need to delay the logical decoding status change in
this period until recovery completes.
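
A rough Python sketch of this second window follows. Again, this is a
toy model and not PostgreSQL code; the Server class, the
status_check_pending flag, and the defer parameter are all hypothetical
and show just one possible shape for "delaying" the status change, not
a proposed design.

```python
class Server:
    """Toy model of the window between the end-of-recovery update
    and the point where the server accepts writes (hypothetical names)."""

    def __init__(self, n_logical_slots):
        self.in_recovery = True
        self.n_logical_slots = n_logical_slots
        self.decoding_enabled = n_logical_slots > 0
        # Hypothetical flag: a status change was requested during recovery.
        self.status_check_pending = False

    def end_of_recovery_update(self):
        # Startup process: set the status from the current slot count.
        self.decoding_enabled = self.n_logical_slots > 0

    def drop_slot(self, defer=False):
        self.n_logical_slots -= 1
        if self.in_recovery:
            # Current behaviour in the model: skip status control while
            # in recovery. With defer=True, remember that a re-check is due.
            if defer:
                self.status_check_pending = True
            return
        if self.n_logical_slots == 0:
            self.decoding_enabled = False

    def finish_recovery(self):
        # Recovery is over; apply any deferred status re-check.
        self.in_recovery = False
        if self.status_check_pending:
            self.status_check_pending = False
            self.end_of_recovery_update()


# Without deferral: the drop is skipped and decoding stays enabled.
stale = Server(n_logical_slots=1)
stale.end_of_recovery_update()
stale.drop_slot()
stale.finish_recovery()

# With the hypothetical deferral: the re-check runs once recovery completes.
deferred = Server(n_logical_slots=1)
deferred.end_of_recovery_update()
deferred.drop_slot(defer=True)
deferred.finish_recovery()
```

In the first sequence the model ends with no logical slots but decoding
still enabled; in the second, the deferred re-check disables it after
recovery completes.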

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


