On Wed, 2009-09-30 at 18:45 +0300, Heikki Linnakangas wrote:
> Regarding this in InitStandbyDelayTimers:
> + /*
> + * If replication delay is enormously huge, just treat that as
> + * zero and work up from there. This prevents us from acting
> + * foolishly when replaying old log files.
> + */
> + if (*currentDelay_ms < 0)
> + *currentDelay_ms = 0;
> +
>
> So we're treating restoring from an old backup the same as an up-to-date
> standby server. If you're restoring from say a month old base backup
> with WAL archive up to present day, and have max_standby_delay set to
> say 5 seconds, the server will wait for that 5 seconds on each
> conflicting query before killing it. Until it reaches the point in the
> archive where the delay is less than INT_MAX/1000 seconds old: at that
> point it switches into "oh my goodness, we've fallen badly behind, let's
> try to catch up ASAP and kill any queries that get into the way" mode.
> That's pretty surprising behavior, and not documented either. I propose
> we simply remove the above check (fixing the rest of the code so that
> you don't hit integer overflows), and always respect max_standby_delay.
Agreed.
I will document the recommendation to set max_standby_delay = 0 if
performing an archive recovery (and explain why).
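For illustration only, here is a rough sketch of what "always respect max_standby_delay" without integer overflow might look like. This is not the patch code: the function names are hypothetical, timestamps are assumed to be int64 microseconds, and clamping a negative delay (a record timestamp in the future) to zero is my assumption rather than agreed behaviour.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: replication delay in milliseconds, computed in
 * 64-bit arithmetic so that a month-old WAL record (about 2.6e9 ms,
 * which is more than INT_MAX) does not wrap around.
 */
int64_t
standby_delay_ms(int64_t now_us, int64_t record_us)
{
    int64_t diff_us = now_us - record_us;

    /* Assumption: a record "from the future" means clock skew; treat as no delay. */
    if (diff_us < 0)
        diff_us = 0;

    return diff_us / 1000;
}

/*
 * Hypothetical helper: how long a conflicting query may still be waited
 * for.  Returns 0 once the delay has reached max_standby_delay, which is
 * always the case when replaying an old archive.
 */
int64_t
standby_wait_remaining_ms(int64_t now_us, int64_t record_us,
                          int max_standby_delay_s)
{
    int64_t limit_ms;
    int64_t delay_ms;

    assert(max_standby_delay_s >= 0);

    limit_ms = (int64_t) max_standby_delay_s * 1000;
    delay_ms = standby_delay_ms(now_us, record_us);

    return (delay_ms >= limit_ms) ? 0 : limit_ms - delay_ms;
}
```

With max_standby_delay = 5, a record that is one second old leaves 4000 ms of waiting, while a record from a month-old archive leaves none, so conflicting queries are cancelled straight away in both the "old backup" and the "badly behind" cases.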
> BTW, I wonder if we should warn or something if we find that the timestamps
> in the archive are in the future? IOW, if either the master's or the
> standby's clock is not set correctly.
Something similar was just spotted by a client. You can set a
recovery_target_timestamp that is before the pg_stop_recovery()
timestamp and it doesn't complain. Will fix.
I'm not sure I like the sound of a system moaning at me about the clock
settings. Perhaps just once at startup, when we read the control file.
-- Simon Riggs www.2ndQuadrant.com