Josh Berkus <josh@agliodbs.com> wrote:
> Currently, if archive_command is failing, pg_stop_backup() will hang
> forever. The only way to figure out what's wrong with pg_stop_backup()
> is to tail the PostgreSQL logs. This is difficult for users to
> troubleshoot, and strongly resists any kind of automation.
That is bad.
> Yes, we can work around this by setting statement_timeout, but that has
> two issues (a) the user has to remember to do it before the problem
> occurs, and (b) it won't differentiate between archive failure and other
> reasons it might time out.
Clearly not a long-term solution.
> As such, I propose that pg_stop_backup() should error with an
> appropriate error message ("Could not archive WAL segments") after
> three archiving attempts. We could also add an optional parameter to
> raise the number of attempts from the default of three.
That sounds sane to me.
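
To make the proposed behavior concrete, here is a rough sketch in
Python. It is only an illustration of the retry-then-error idea, not
how the server would implement it; the names stop_backup_wait,
try_archive, and ArchiveFailed are made up for this example, and
try_archive merely stands in for one run of archive_command.

```python
class ArchiveFailed(Exception):
    """Raised when WAL segments could not be archived (hypothetical name)."""


def stop_backup_wait(try_archive, max_attempts=3):
    """Call try_archive() until it succeeds or max_attempts is used up.

    try_archive is a hypothetical callable standing in for one run of
    archive_command: it returns True on success and False on failure.
    max_attempts mirrors the proposed optional parameter (default three).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        if try_archive():
            # Archiving succeeded; report how many attempts it took.
            return attempt

    # Every attempt failed: raise an error instead of waiting forever.
    raise ArchiveFailed(
        "Could not archive WAL segments after %d attempts" % max_attempts
    )
```

A caller would then get an exception it can catch and act on, rather
than a hung session that can only be diagnosed from the logs.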
> An alternative, if we were doing this from scratch, would be for
> pg_stop_backup to return false or -1 or something if it couldn't
> archive; there are reasons why a user might not care that
> archive_command was failing (shared storage comes to mind). However,
> that would be a surprising break with backwards compatibility, since
> currently users don't check the result value of pg_stop_backup().
Some might, which is a stronger argument against changing what gets
returned. Even in a green field though, I would argue that
pg_stop_backup() should return information about the minimum range
of WAL files needed to perform a consistent recovery -- or possibly
duplicate everything in the backup history file. An error seems
much more appropriate to indicate that the user does not have a
valid backup.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company