Hi
On 2019-Jan-30, Konstantin Knizhnik wrote:
> One of our customers was faced with the following problem: he has
> setup physical primary-slave replication but for some reasons
> specified very large (~12 hours) recovery_min_apply_delay.
We also came across this exact same problem some time ago. It's pretty
nasty. I wrote a quick TAP reproducer, attached (it needed a quick patch
to PostgresNode itself, too).
I tried several strategies, none of which worked:
1. setting lastSourceFailed just before sleeping for the apply delay,
with the idea that for the next fetch we would try streaming. But this
doesn't work because WaitForWalToBecomeAvailable is not executed.
2. splitting WaitForWalToBecomeAvailable into two pieces, so that we can
call the first half in the restore loop. But this causes 1s of wait
between segments (error recovery) and we never actually catch up.
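To make the problem with (2) concrete, here is a toy model in Python.
It is not PostgreSQL code: the function name, the parameters and all
the numbers are made up for illustration, and it assumes the primary
produces segments at a steady rate. It only shows that a fixed wait
per segment can push the restore rate below the production rate, at
which point the backlog grows without bound.

```python
def backlog_after(duration_ms, produce_interval_ms, restore_time_ms,
                  retry_wait_ms):
    """Toy model: segments still waiting to be restored after duration_ms.

    Assumptions (all hypothetical, chosen for illustration only):
    - the primary finishes one segment every produce_interval_ms;
    - the standby needs restore_time_ms per segment, plus a fixed
      retry_wait_ms pause between consecutive segments.
    """
    produced = duration_ms // produce_interval_ms
    restored = duration_ms // (restore_time_ms + retry_wait_ms)
    # The standby cannot restore segments that do not exist yet.
    return max(produced - restored, 0)


ONE_HOUR_MS = 3_600_000

# One segment every 500ms, 100ms to restore each one.
without_wait = backlog_after(ONE_HOUR_MS, 500, 100, 0)
with_wait = backlog_after(ONE_HOUR_MS, 500, 100, 1000)

print("backlog without the 1s wait:", without_wait)
print("backlog with the 1s wait:", with_wait)
```

With these made-up numbers the standby keeps up when there is no wait
but falls steadily behind once each segment costs an extra second.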
What I thought back then was the *real* solution, but never got around
to implementing, is the idea you describe: starting a walreceiver at an
earlier point.
> I wonder if it can be considered as acceptable solution of the problem or
> there can be some better approach?
I didn't find one.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services