Hello,
I am playing with a script that implements physical backups by snapshotting the EBS-backed software RAID. My basic
workflow is this:
1. Stop PG on the slave
2. pg_start_backup on the master
3. On the slave:
   A. Unmount the PG RAID
   B. Snapshot each disk in the RAID
   C. Mount the PG RAID
4. pg_stop_backup
5. Restart PG on the slave
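
For concreteness, the ordering above can be sketched as follows. This is only an illustrative sketch, not the actual script: the function name build_backup_plan, the data directory, the mount point, the volume IDs, and the use of the AWS CLI for snapshots are all assumptions made for the example. It only builds and prints the ordered list of commands; it does not execute anything.

```python
from typing import List, Tuple

# Hypothetical placeholders; these paths are assumptions, not the real setup.
DATA_DIR = "/mnt/pgraid/data"
MOUNT_POINT = "/mnt/pgraid"


def build_backup_plan(volume_ids: List[str],
                      label: str = "ebs_snapshot") -> List[Tuple[str, str]]:
    """Return the workflow as an ordered list of (host, command) pairs."""
    plan = [
        # 1. Stop PG on the slave
        ("slave", f"pg_ctl stop -D {DATA_DIR} -m fast"),
        # 2. pg_start_backup on the master
        ("master", f"psql -c \"SELECT pg_start_backup('{label}')\""),
        # 3A. Unmount the PG RAID (assumes a matching /etc/fstab entry)
        ("slave", f"umount {MOUNT_POINT}"),
    ]
    # 3B. Snapshot each disk in the RAID (AWS CLI assumed)
    for vol in volume_ids:
        plan.append(("slave", f"aws ec2 create-snapshot --volume-id {vol}"))
    plan += [
        # 3C. Mount the PG RAID
        ("slave", f"mount {MOUNT_POINT}"),
        # 4. pg_stop_backup
        ("master", "psql -c \"SELECT pg_stop_backup()\""),
        # 5. Restart PG on the slave
        ("slave", f"pg_ctl start -D {DATA_DIR}"),
    ]
    return plan


if __name__ == "__main__":
    # The volume IDs here are made up for the dry run.
    for host, cmd in build_backup_plan(["vol-aaaa", "vol-bbbb"]):
        print(f"[{host}] {cmd}")
```
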
Step 3 is actually quite fast; however, on the master, I end up seeing the following warning:
WARNING: transaction log file "00000001000000CC00000076" could not be archived: too many failures
I am guessing (I will confirm with timestamps later) that this warning happens during steps 3A-3C; however, my questions
below stand regardless of when this failure occurs.
It is worth noting that the slave (seemingly) catches up eventually, recovering later log files, with streaming
replication current. Can I trust this state?
Should I be concerned about this warning? Is it a simple blip that can easily be ignored, or have I lost data? From
googling, it looks like the number of retry attempts is not a configurable parameter (it appears to have retried a handful of times).
If this is indeed a real problem, am I best off changing my archive_command to retain logs in a transient location when
I am in "snapshot mode", and then ship them in bulk once the snapshot has completed? Are there any other remedies that I
am missing?
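
To make the "transient location" idea concrete, here is a minimal sketch of what such an archive_command wrapper might look like. Everything in it is hypothetical: the function names archive_wal and flush_staged, the flag file that marks "snapshot mode", the staging directory, and the ship callable that stands in for whatever currently copies a segment to the slave. If ship raises an exception, a wrapper script built on this would exit non-zero, so PostgreSQL would treat the segment as not archived and retry it.

```python
import os
import shutil
from typing import Callable

# ship(path, name) is a stand-in for the real transfer step (an assumption).
Ship = Callable[[str, str], None]


def archive_wal(wal_path: str, wal_name: str, staging_dir: str,
                flag_file: str, ship: Ship) -> None:
    """Archive one WAL segment; stage it locally while the flag file exists."""
    if os.path.exists(flag_file):
        # "Snapshot mode": keep the segment locally instead of shipping it.
        os.makedirs(staging_dir, exist_ok=True)
        tmp = os.path.join(staging_dir, wal_name + ".tmp")
        shutil.copy2(wal_path, tmp)
        # Rename only after the copy finishes, so a partial file is never
        # mistaken for a complete segment.
        os.replace(tmp, os.path.join(staging_dir, wal_name))
    else:
        ship(wal_path, wal_name)


def flush_staged(staging_dir: str, ship: Ship) -> int:
    """Ship staged segments in file-name order and return how many were sent."""
    if not os.path.isdir(staging_dir):
        return 0
    shipped = 0
    # Sorting by name ships segments of one timeline in sequence order.
    for name in sorted(os.listdir(staging_dir)):
        if name.endswith(".tmp"):
            continue  # skip incomplete copies
        path = os.path.join(staging_dir, name)
        ship(path, name)
        os.remove(path)  # remove only after a successful ship
        shipped += 1
    return shipped
```

In this sketch the snapshot script would create the flag file before step 3A, remove it after step 3C, and then call flush_staged once.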
Thank you very much for your time,
Andrew Hannon