On Tue, Jul 25, 2023 at 10:33 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2023-07-25 14:31:00 +0530, Amit Kapila wrote:
> > To ensure that all the data has been sent during the upgrade, we can
> > ensure that each logical slot's confirmed_flush_lsn (position in the
> > WAL till which subscriber has confirmed that it has applied the WAL)
> > is the same as current_wal_insert_lsn. Now, because we don't send
> > XLOG_CHECKPOINT_SHUTDOWN even on clean shutdown, confirmed_flush_lsn
> > will never be the same as current_wal_insert_lsn. The one idea being
> > discussed in patch [1] (see 0003) is to ensure that each slot's LSN is
> > exactly XLOG_CHECKPOINT_SHUTDOWN ago which probably has some drawbacks
> > like what if we tomorrow add some other WAL in the shutdown checkpoint
> > path or the size of record changes then we would need to modify the
> > corresponding code in upgrade.
>
> Yea, that doesn't seem like a good path. But there is a variant that seems
> better: We could just scan the end of the WAL for records that should have
> been streamed out?
>
This sounds like a better idea. One way to realize it is to group the
slots by confirmed_flush_lsn and then scan the WAL based on those
groups. Once we ensure that the slot group with the highest
confirmed_flush_lsn is up-to-date (doesn't have any pending WAL
except for the shutdown checkpoint record), any slot group with a
lower confirmed_flush_lsn would be considered a group with pending
data.
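
To make the grouping idea concrete, here is a rough sketch in Python.
It is only an illustration, not PostgreSQL code: the function names,
the dict/list inputs and the "INSERT" record type are made up for the
example, and it assumes that a record starting at or after a slot's
confirmed_flush_lsn has not yet been confirmed by that slot.

```python
from collections import defaultdict

SHUTDOWN_CHECKPOINT = "XLOG_CHECKPOINT_SHUTDOWN"


def parse_lsn(text):
    """Convert an LSN in PostgreSQL's 'hi/lo' hex text form to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slots_with_pending_wal(slots, wal_records):
    """Return the sorted names of slots considered to have pending data.

    slots:       {slot_name: confirmed_flush_lsn as text}   (hypothetical input)
    wal_records: [(start_lsn as text, record_type), ...]    (hypothetical input)
    """
    # Group slots that share the same confirmed_flush_lsn.
    groups = defaultdict(list)
    for name, lsn in slots.items():
        groups[parse_lsn(lsn)].append(name)
    if not groups:
        return []

    highest = max(groups)

    # Scan the end of the WAL once, from the highest confirmed position.
    # Anything other than the shutdown checkpoint record counts as pending.
    highest_pending = any(
        parse_lsn(start) >= highest and rtype != SHUTDOWN_CHECKPOINT
        for start, rtype in wal_records
    )

    pending = []
    for lsn, names in groups.items():
        # Groups behind the highest one are treated as pending outright;
        # the highest group is pending only if the scan found something.
        if lsn < highest or highest_pending:
            pending.extend(names)
    return sorted(pending)
```

With slots a and b at 0/3000, slot c at 0/2000, and only the shutdown
checkpoint record at 0/3000, this reports just c as pending. A real
check would presumably also need to skip records that logical
decoding never streams; the sketch ignores that.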

BTW, I think the main reason for not trying to send
XLOG_CHECKPOINT_SHUTDOWN to logical walsenders is that, although today
there is no risk of hint bit updates (or any other way of generating
WAL) while decoding XLOG_CHECKPOINT_SHUTDOWN, there is no guarantee
that this will remain true in the future. Is there anything I am
missing here?

--
With Regards,
Amit Kapila.