Re: POC: enable logical decoding when wal_level = 'replica' without a server restart - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date
Msg-id CAD21AoAXZmk3os1c4Hf=Gzd82-rH=g6sPGRyhWC5i4eMQTxYrw@mail.gmail.com
Whole thread Raw
In response to Re: POC: enable logical decoding when wal_level = 'replica' without a server restart  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Fri, Sep 5, 2025 at 3:15 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Sep 4, 2025 at 11:15 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > od On Tue, Sep 2, 2025 at 8:11 PM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > Dear Sawada-san,
> > >
> > > Here are my comments.
> > >
> > > 01.
> > > ```
> > >         checkPoint.logicalDecodingEnabled = IsLogicalDecodingEnabled();
> > > ```
> > >
> > > Per my analysis, the value is always false here because StartupLogicalDecodingStatus
> > > is not called yet. Can we use "false" directly?
> >
> > I think that it's better to read the shared flag instead of directly
> > setting false since LogicalDecodingCtl is already initialized.
> >
> > >
> > > 02.
> > > ```
> > > elog(DEBUG1, "waiting for %d transactions to complete", running->xcnt);
> > > ```
> > >
> > > Here plural form is always used even if the running transaction is only one.
> > > How about something like:
> > > ```
> > > Number of transactions to wait finishing: %d
> > > ```
> >
> > Hmm, not sure it improves the message. I think we don't care much
> > about plurals in debug messages. And in our convention the main log
> > message doesn't start a capital character.
> >
> >
> > >
> > > 03.
> > > ```
> > >                 while (RecoveryInProgress())
> > >                 {
> > >                         pgstat_report_wait_start(WAIT_EVENT_LOGICAL_DECODING_STATUS_CHANGE_DELAY);
> > >                         pg_usleep(100000L); /* wait for 100 msec */
> > >                         pgstat_report_wait_end();
> > >                 }
> > > ```
> > >
> > > I found a stuck case here: if a backend process within the loop and startup waits
> > > a signal is processed, both of them can stuck. The backend waits the recovery
> > > state to be DONE, and the startup waits all processes handle consume the signal.
> > > IIUC we must add CHECK_FOR_INTERRUPTS() or ProcessProcSignalBarrier().
> > >
> > > Actual steps:
> > >
> > > 0.  constructed a streaming replication system, which the only primary server had
> > >     a logical slot. I.e., the effective_wal_level was logical.
> > > 1.  connected to a standby node
> > > 2.  attached to the backend process via gdb
> > > 3.  added a breakpoint at create_logical_replication_slot
> > > 4.  called pg_create_logical_replication_slot() on the backend.
> > >     the backend will stop before ReplicationSlotCreate().
> > > 5.  from another terminal, attached to the startup process via gdb
> > > 6.  added a breakpoint at UpdateLogicalDecodingStatusEndOfRecovery()
> > > 7.  from another terminal, send a promote signal to the standby.
> > >     The startup will stop at UpdateLogicalDecodingStatusEndOfRecovery()
> > > 8.  executed steps on startup process, untill delay_status_change was updated
> > >     and LogicalDecodingControlLock was released.
> > > 9.  detached from the backend process. It would stop at the loop in
> > >     start_logical_decoding_status_change().
> > > 10. detached from the startup process. It would wait all processes handled the
> > >     signal, but the backend won't do.
> >
> > Good find! I'll fix the problem by adding CHECK_FOR_INTERRUPTS() as
> > you suggested.
>
> I've attached the updated patch that incorporated all comments I got so far.

Sorry, I've attached the wrong version. Please find the attached correct one.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Should io_method=worker remain the default?
Next
From: Julien Rouhaud
Date:
Subject: Re: Extension security improvement: Add support for extensions with an owned schema