On Fri, 2008-05-02 at 18:11 +0000, Wojciech Strzalka wrote:
> I've discovered problem with WAL recovery on standby server.
Thanks for reporting it.
> I start copy database to the second
> machine which takes me 30 minutes.
> The output from pg_standby:
> ------------------------------------
> Trigger file : /tmp/pgsql.promote_trigger.5432
> Waiting for WAL file : 00000001.history
> WAL file path : /var/lib/pgsql/incoming_wal/00000001.history
> Restoring to... : pg_xlog/RECOVERYHISTORY
> Sleep interval : 5 seconds
> Max wait interval : 0 forever
> Command for restore : ln -s -f "/var/lib/pgsql/incoming_wal/00000001.history" "pg_xlog/RECOVERYHISTORY"
> Keep archive history : 0000000100000001000000DB and later
...................................................^^
> running restore : OK
Looks like the first message gives an incorrect archive cut-off point.
My guess is that it's picking up a checkpoint that happens during the
backup as the starting point for the cutoff, which is incorrect. My guess
at a fix is that we need to make sure that the %r value is always less
than or equal to the %f value. Sounds like a good safeguard even if that
doesn't fix this specific bug.
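To make the idea concrete, here is a minimal, untested sketch of that safeguard. It is written in Python for brevity, not in pg_standby's C, and it is not the actual patch. The function names are hypothetical, and skipping cleanup entirely when %f is not a plain WAL segment name (as with the 00000001.history request above) is an assumption of the sketch, not established behaviour.

```python
import re

# A plain WAL segment file name is 24 uppercase hex digits:
# 8 for the timeline, 8 for the log id, 8 for the segment id.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def is_wal_segment(name):
    """Return True if name looks like a plain WAL segment file name."""
    return bool(WAL_SEGMENT_RE.match(name))


def safe_cutoff(requested, restart):
    """Return the oldest file name to keep, or None to skip cleanup.

    requested -- the %f value (the file being restored)
    restart   -- the %r value (the reported restart point), may be empty
    """
    if not restart or not is_wal_segment(restart):
        # No usable %r value: never remove anything.
        return None
    if not is_wal_segment(requested):
        # Assumption of this sketch: for .history or .backup requests
        # there is nothing sensible to compare against, so skip cleanup.
        return None
    # Fixed-width uppercase hex names order correctly as plain strings,
    # so min() enforces "cutoff <= %f".
    return min(requested, restart)
```

With the values from the output above, safe_cutoff("00000001.history", "0000000100000001000000DB") returns None, so nothing would be removed while the history file is being fetched. For an ordinary segment request the cutoff is clamped so that it never exceeds the file being restored.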
I'll check and see if I can recreate it, then patch. Won't be tonight
though.
If I'm right, then removing the %r value from the restore_command should
make this work correctly as a workaround. Please let me know either way.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com