Dear Sawada-san,
> My understanding of where the synced slot starts to move was not
> right; it starts from the remote slot's restart_lsn, which could be
> far ahead from the STATUS_CHANGE record that the startup process is
> applying but where logical decoding should be enabled. It doesn't
> happen that the slotsync worker tries to decode non-logical WAL
> records even if it advances the slot after the startup disabled
> logical decoding.
Let me confirm your point. Suppose the slot is dropped and then re-created
while the startup process is still replaying WAL. The WAL records would then be
laid out as below. Your point is that the restart_lsn of the newly created slot
is at the beginning of (b), so all records from there can be decoded, right?
```
STATUS_CHANGE true
RUNNING_XACTS // (a) - generated by the first slot
...
STATUS_CHANGE false // due to the slot drop
...
STATUS_CHANGE true // from here all records are decode-safe
RUNNING_XACTS // (b) - generated by the second slot, restart_lsn can be set here
```
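To make sure I read the layout correctly, here is a toy model of it in C. It is
only a sketch: RecKind and find_decode_safe_start() are names I made up for
illustration and are not PostgreSQL's actual WAL record types or functions. It
scans the records and returns the position of the first RUNNING_XACTS after the
most recent STATUS_CHANGE true, which corresponds to (b) above.
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record kinds for illustration only. */
typedef enum
{
    REC_STATUS_CHANGE_TRUE,
    REC_STATUS_CHANGE_FALSE,
    REC_RUNNING_XACTS,
    REC_OTHER
} RecKind;

/*
 * Return the index of the first RUNNING_XACTS record that follows the most
 * recent STATUS_CHANGE true. Return -1 if there is no such record, or if
 * logical decoding is disabled at the end of the stream.
 */
static int
find_decode_safe_start(const RecKind *recs, size_t n)
{
    bool enabled = false;
    int  start = -1;

    assert(recs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        switch (recs[i])
        {
            case REC_STATUS_CHANGE_TRUE:
                enabled = true;
                start = -1;     /* forget any earlier candidate */
                break;
            case REC_STATUS_CHANGE_FALSE:
                enabled = false;
                start = -1;     /* earlier candidate is no longer usable */
                break;
            case REC_RUNNING_XACTS:
                if (enabled && start < 0)
                    start = (int) i;
                break;
            default:
                break;
        }
    }
    return start;
}
```
With the seven records above (counting each "..." as one other record), this
model picks the last record, (b), and never (a).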
> IIUC you're concerned it's possible that the slotsync worker creates
> or advances a logical slot between the startup changes the logical
> decoding status to false and sends the stop signal.
Right.
> how efficiently to fix it. I've considered a simple idea that the
> slotsync worker checks IsLogicalDecodingEnabled() before trying to
> sync one logical slot. However, it doesn't solve the race condition;
> the startup process can disable logical decoding right after the
> slotsync passed the check, in which case users would see the logical
> slot is created after logical decoding is disabled.
So... even if we add a check in the decoding functions, the startup process can
still disable logical decoding right after that check passes. Is that also right?
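To illustrate the interleaving I have in mind, here is a single-threaded toy
replay in C. All names (ToyState, slotsync_check() and so on) are hypothetical
stand-ins, not the real shared-memory structures or functions; the flag merely
plays the role of what IsLogicalDecodingEnabled() would report.
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared state; not PostgreSQL's real layout. */
typedef struct
{
    bool logical_decoding_enabled;  /* what the check would observe */
    bool slot_exists;               /* a synced logical slot was created */
} ToyState;

/* Slotsync worker, step 1: the proposed pre-check. */
static bool
slotsync_check(const ToyState *s)
{
    return s->logical_decoding_enabled;
}

/* Slotsync worker, step 2: create the slot if the earlier check passed. */
static void
slotsync_create(ToyState *s, bool check_passed)
{
    if (check_passed)
        s->slot_exists = true;
}

/* Startup process replaying STATUS_CHANGE false. */
static void
startup_disable(ToyState *s)
{
    s->logical_decoding_enabled = false;
}

/*
 * Replay the problematic order: check, disable, create. Returns true when a
 * slot exists although logical decoding has already been disabled.
 */
static bool
race_leaves_slot_while_disabled(void)
{
    ToyState s = {.logical_decoding_enabled = true, .slot_exists = false};
    bool     ok = slotsync_check(&s);   /* passes: still enabled */

    assert(ok);
    startup_disable(&s);                /* startup wins the race here */
    slotsync_create(&s, ok);            /* acts on the stale result */

    return s.slot_exists && !s.logical_decoding_enabled;
}
```
In this model the check alone cannot help, because nothing prevents the flag
from changing between the check and the creation.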
Best regards,
Hayato Kuroda
FUJITSU LIMITED