od On Tue, Sep 2, 2025 at 8:11 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Sawada-san,
>
> Here are my comments.
>
> 01.
> ```
> checkPoint.logicalDecodingEnabled = IsLogicalDecodingEnabled();
> ```
>
> Per my analysis, the value is always false here because StartupLogicalDecodingStatus
> is not called yet. Can we use "false" directly?
I think that it's better to read the shared flag instead of directly
setting false since LogicalDecodingCtl is already initialized.
>
> 02.
> ```
> elog(DEBUG1, "waiting for %d transactions to complete", running->xcnt);
> ```
>
> Here plural form is always used even if the running transaction is only one.
> How about something like:
> ```
> Number of transactions to wait finishing: %d
> ```
Hmm, not sure it improves the message. I think we don't care much
about plurals in debug messages. And in our convention the main log
message doesn't start a capital character.
>
> 03.
> ```
> while (RecoveryInProgress())
> {
> pgstat_report_wait_start(WAIT_EVENT_LOGICAL_DECODING_STATUS_CHANGE_DELAY);
> pg_usleep(100000L); /* wait for 100 msec */
> pgstat_report_wait_end();
> }
> ```
>
> I found a stuck case here: if a backend process within the loop and startup waits
> a signal is processed, both of them can stuck. The backend waits the recovery
> state to be DONE, and the startup waits all processes handle consume the signal.
> IIUC we must add CHECK_FOR_INTERRUPTS() or ProcessProcSignalBarrier().
>
> Actual steps:
>
> 0. constructed a streaming replication system, which the only primary server had
> a logical slot. I.e., the effective_wal_level was logical.
> 1. connected to a standby node
> 2. attached to the backend process via gdb
> 3. added a breakpoint at create_logical_replication_slot
> 4. called pg_create_logical_replication_slot() on the backend.
> the backend will stop before ReplicationSlotCreate().
> 5. from another terminal, attached to the startup process via gdb
> 6. added a breakpoint at UpdateLogicalDecodingStatusEndOfRecovery()
> 7. from another terminal, send a promote signal to the standby.
> The startup will stop at UpdateLogicalDecodingStatusEndOfRecovery()
> 8. executed steps on startup process, untill delay_status_change was updated
> and LogicalDecodingControlLock was released.
> 9. detached from the backend process. It would stop at the loop in
> start_logical_decoding_status_change().
> 10. detached from the startup process. It would wait all processes handled the
> signal, but the backend won't do.
Good find! I'll fix the problem by adding CHECK_FOR_INTERRUPTS() as
you suggested.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com