Dear Sawada-san,
>
> I think that the problem stems from the fact that the patch sets
> allow_status_change to true before completing all end-of-recovery
> actions for logical decoding status update. I think it should be done
> at the end of UpdateLogicalDecodingStatusEndOfRecovery(), i.e., after
> WaitForProcSignalBarrier(). That way, the logical decoding can always
> start from after the point where the startup updated the logical
> decoding status.
I tried a fix like [1], and it seems to work well. In the workload I shared,
the backend process cannot consume the ProcSignal barrier emitted by the startup
process, so the status change in shared memory is not yet allowed. The backend
then fails to create the replication slot, and effective_wal_level remains
replica.
[1]:
```
--- a/src/backend/replication/logical/logicalctl.c
+++ b/src/backend/replication/logical/logicalctl.c
@@ -532,13 +532,11 @@ UpdateLogicalDecodingStatusEndOfRecovery(void)
* processes to write XLOG_LOGICAL_DECODING_STATUS_CHANGE records prior to
* completing all end-of-recovery actions.
*/
- LogicalDecodingCtl->allow_status_change = true;
-
- LWLockRelease(LogicalDecodingControlLock);
if (need_wal)
CreateLogicalDecodingStatusChangeRecord(new_status);
+ LWLockRelease(LogicalDecodingControlLock);
/*
* Ensure all running processes have the updated status. We don't need to
* wait for running transactions to finish as we don't accept any writes
@@ -550,5 +548,9 @@ UpdateLogicalDecodingStatusEndOfRecovery(void)
WaitForProcSignalBarrier(
EmitProcSignalBarrier(PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO));
+ LWLockAcquire(LogicalDecodingControlLock, LW_EXCLUSIVE);
+ LogicalDecodingCtl->allow_status_change = true;
+ LWLockRelease(LogicalDecodingControlLock);
```
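
To illustrate the ordering that [1] is meant to enforce, here is a toy,
single-process model. It is only a sketch and not PostgreSQL code: ToyCtl and
all toy_* functions are hypothetical names, and the barrier is reduced to a
boolean flag. It shows that a backend's request is rejected until the startup
process has seen the barrier absorbed and only then sets allow_status_change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the shared LogicalDecodingCtl state. */
typedef struct ToyCtl
{
	bool		allow_status_change;	/* may backends change the status? */
	bool		logical_decoding_enabled;
	bool		barrier_absorbed;	/* have backends absorbed the barrier? */
} ToyCtl;

/* Startup process: publish the new status and "emit" the barrier. */
static void
toy_startup_update_status(ToyCtl *ctl, bool new_status)
{
	ctl->logical_decoding_enabled = new_status;
	ctl->barrier_absorbed = false;	/* emitted, but not yet absorbed */
}

/* Backend: absorb the barrier emitted by the startup process. */
static void
toy_backend_absorb_barrier(ToyCtl *ctl)
{
	ctl->barrier_absorbed = true;
}

/*
 * Startup process: last end-of-recovery step. The assert models that the
 * wait for the barrier has returned before status changes are allowed.
 */
static void
toy_startup_finish(ToyCtl *ctl)
{
	assert(ctl->barrier_absorbed);
	ctl->allow_status_change = true;
}

/* Backend: try to enable logical decoding, e.g. for slot creation. */
static bool
toy_backend_try_enable(ToyCtl *ctl)
{
	if (!ctl->allow_status_change)
		return false;			/* rejected: startup is not done yet */
	ctl->logical_decoding_enabled = true;
	return true;
}
```

In this model, calling toy_backend_try_enable() between
toy_startup_update_status() and toy_startup_finish() returns false, which
corresponds to the slot creation failure I observed in my workload.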
Best regards,
Hayato Kuroda
FUJITSU LIMITED