Our sync script is set up to fail if pg_start_backup fails, because if it fails for some other reason (anything other than a backup already being in progress) the sync won't be valid: the backup_label file will be missing, so the slave won't have the correct location to restart from.
Originally I had gone down the road of changing the sync script so that if pg_start_backup failed and the backup_label file existed, it would sync the backup_label right away so it could then do the sync. It was also set up so that if it didn't start the backup, it wouldn't stop the backup. This didn't work, however. If the DR starts the backup and begins its sync first, and the local slave then goes to start up while the backup is already in progress, the local slave completes its sync faster than the DR and then tries to start up. But the local slave would not come up into hot standby until pg_stop_backup was executed (it came up but would never switch over to allow read-only queries).
At that point I was going to change the script so that whoever first reached the point of needing to stop the backup would call stop backup. But the new procedure of calling start and then stop right away seems simpler (it certainly makes the slave startup script simpler).
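
For anyone following along, the first approach described above (reuse a backup that is already in progress, and only stop a backup you started yourself) boils down to roughly the decision below. This is only a sketch of the logic, not our actual sync script: `SyncPlan`, `plan_sync` and the field names are made up for illustration, and the only thing taken from the real system is the error text that pg_start_backup returns.

```python
from dataclasses import dataclass

# Error text returned by pg_start_backup when another backup is running.
ALREADY_IN_PROGRESS = "a backup is already in progress"


@dataclass
class SyncPlan:
    """Hypothetical summary of what one run of the sync script should do."""
    proceed: bool           # safe to run the sync at all
    copy_label_first: bool  # sync backup_label before anything else
    stop_when_done: bool    # this run started the backup, so it stops it


def plan_sync(start_error, backup_label_exists):
    """Decide how to sync, given the outcome of pg_start_backup.

    start_error is None when pg_start_backup succeeded, otherwise the
    error text it produced. backup_label_exists says whether the
    backup_label file is present in the master's data directory.
    """
    if start_error is None:
        # We started the backup, so we are responsible for stopping it.
        return SyncPlan(proceed=True, copy_label_first=False, stop_when_done=True)
    if ALREADY_IN_PROGRESS in start_error and backup_label_exists:
        # Someone else started it: grab backup_label first, never stop it.
        return SyncPlan(proceed=True, copy_label_first=True, stop_when_done=False)
    # Any other failure means no usable backup_label, so the sync is invalid.
    return SyncPlan(proceed=False, copy_label_first=False, stop_when_done=False)
```

The second branch is exactly where this went wrong for us: a run that does not own the backup can finish its sync first, and the slave it feeds then waits for a stop backup that only the other run will issue.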
On Wed, Sep 19, 2012 at 10:05 AM, Mike Roest
<mike.roest@replicon.com> wrote:
Specifically what is the error?
psql (9.1.5)
Type "help" for help.
postgres=# select pg_start_backup('hotbackup',true);
pg_start_backup
-----------------
61/B000020
(1 row)
postgres=# select pg_start_backup('hotbackup',true);
ERROR: a backup is already in progress
HINT: Run pg_stop_backup() and try again.
postgres=#