On Mon, 11 May 2020 15:54:02 +0900
Michael Paquier <michael@paquier.xyz> wrote:
[...]
> There are several HA solutions floating around in the community, and I
> got to wonder as well if some of them don't just scan the local
> pg_wal/ of each standby in this case, even if that's more simple to
> let the nodes start and replay up to their latest point available.
PAF relies on pg_last_wal_receive_lsn(). Relying on pg_last_wal_replay_lsn
might be possible. As you explained, it would requires to compare current
replay LSN with the last received on disk thought. This might probably be done,
eg with pg_waldump maybe and a waiting loop.
However, such a waiting loop might be dangerous. If standbys are lagging far
behind and/or have read only sessions and/or load slowing down the replay, the
waiting loop might be very long. Maybe longer than the required RTO. The HA
automatic operator might even takes curative action because of some
recovery timeout, making things worst.
Regards,