On Mon, Jan 10, 2022 at 04:25:27PM -0500, Tom Lane wrote:
> Apropos of that, it's worth noting that wait_for_catchup *is*
> dependent on up-to-date stats, and here's a recent run where
> it sure looks like the timeout cause is AWOL stats collector:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2022-01-10%2004%3A51%3A34
>
> I wonder if we should refactor wait_for_catchup to probe the
> standby directly instead of relying on the upstream's view.

It would be nice. For logical replication tests, do we have a monitoring API
that is independent of the stats collector? If not, and we don't want to add
one, a hacky alternative might be for wait_for_catchup to run a WAL-writing
command every ~20s. That way, if the stats collector misses the datagram
reporting that the standby reached a certain LSN, it would get more chances
to receive an up-to-date report.
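To make the "nudge every ~20s" idea concrete, here is a minimal, language-neutral sketch in Python. The real wait_for_catchup lives in the Perl TAP test infrastructure, and this is not its implementation. The function name and both callables are hypothetical stand-ins: caught_up() represents whatever check decides the standby has reached the target LSN, and write_wal() represents issuing some small WAL-writing command on the upstream.

```python
import time
from typing import Callable


def wait_for_catchup_with_nudges(
    caught_up: Callable[[], bool],   # hypothetical: "has the standby reached the target LSN?"
    write_wal: Callable[[], None],   # hypothetical: run a small WAL-writing command upstream
    timeout: float = 180.0,
    nudge_interval: float = 20.0,
    poll_interval: float = 0.1,
) -> bool:
    """Poll caught_up() until it returns True or `timeout` seconds pass.

    If `nudge_interval` seconds go by without success, call write_wal()
    so that a lost progress report gets another chance to be replaced
    by a fresh one. Returns True on success, False on timeout.
    """
    start = time.monotonic()
    last_nudge = start
    while True:
        if caught_up():
            return True
        now = time.monotonic()
        if now - start >= timeout:
            return False
        if now - last_nudge >= nudge_interval:
            write_wal()          # extra WAL -> extra chance for a new report
            last_nudge = now
        time.sleep(poll_interval)
```

The nudge is deliberately decoupled from the polling interval: the check runs often so a healthy run finishes quickly, while the WAL-writing command runs rarely so it adds little noise to the test.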