Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Msg-id CAD21AoDjdeqwTHa5nL-3nfEnNA4SfrP4k0yR90kq68=JOLRWxg@mail.gmail.com
In response to RE: POC: enable logical decoding when wal_level = 'replica' without a server restart  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
On Fri, Aug 29, 2025 at 5:31 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Sawada-san,
>
> > My understanding of where the synced slot starts to move was not
> > right; it starts from the remote slot's restart_lsn, which could be
> > far ahead of the STATUS_CHANGE record that the startup process is
> > applying, at a point where logical decoding should already be enabled.
> > So the slotsync worker never tries to decode non-logical WAL records,
> > even if it advances the slot after the startup process has disabled
> > logical decoding.
>
> Let me confirm your point. If a slot is dropped and then re-created while the
> startup process is replaying, the WAL records would be laid out as below. Your
> point is that the restart_lsn of the re-created slot is at the beginning of (b),
> so that all records from there on can be decoded, right?
>
> ```
> STATUS_CHANGE true
> RUNNING_XACTS                   // (a) - generated by the first slot
> ...
> STATUS_CHANGE false             // due to the slot drop
> ...
> STATUS_CHANGE true              // from here all records are decode-safe
> RUNNING_XACTS                   // (b) - generated by the second slot, restart_lsn can set here
> ```

Yes. If I understand it correctly, even when the startup process is
processing the second STATUS_CHANGE record (i.e., disabling logical
decoding), the synced slot uses the corresponding remote slot's
restart_lsn, i.e., (b). I believe that if the standby has not received
RUNNING_XACTS (b) yet at that point, the slotsync worker skips syncing
the slot (see the check at the top of synchronize_one_slot()).

>
> > how efficiently to fix it. I've considered a simple idea: the
> > slotsync worker checks IsLogicalDecodingEnabled() before trying to
> > sync each logical slot. However, it doesn't solve the race condition;
> > the startup process can disable logical decoding right after the
> > slotsync worker passes the check, in which case users would see a
> > logical slot created after logical decoding has been disabled.
>
> So even if we add a check in the decoding functions, the startup process could
> still disable logical decoding after that check. Is that also right?

I think so. The IsLogicalDecodingEnabled() check determines whether a
process can start logical decoding, but it doesn't cover logical
decoding processes that are already running. The slot invalidation
mechanism is responsible for those.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


