Failed to archive WAL segment `pg_xlog/00000002000011E800000012` on host `localhost:30022`
I'm seeing spikes that cause an outage every 15 minutes. I believe the error above is the cause of those spikes.
The server was rebooted and one parameter in postgresql.conf was changed: shared_buffers.
I don't believe that change is the cause of this.
Before the reboot on the server, everything was working.
I just can't find the solution.
What I did:
1 - I can connect as the postgres user between all the servers
2 - the file 00000002000011E800000012 is in the master's pg_xlog directory (it was already there)
3 - the file 00000002000011E800000012 is in the slave server's /9.2/data/wal_archive directory (it was already there)
So what comes to my mind, taking the above at face value, is that the archive_command is failing because it tries to archive that WAL segment and finds that the segment already exists in the target location. It correctly fails rather than risk corrupting the remote file, and because of the error the master does not remove its own copy of the segment either.
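To make that failure mode concrete, here is a rough Python sketch of the decision an archive script could make. It is not your actual archive_command (which I haven't seen); the function name archive_segment is made up for illustration, and it assumes the archive directory is reachable as a local path rather than over ssh. The one deliberate difference from a strict "never overwrite" command is that it returns success when the target already exists with identical contents, so a retry after an interrupted run would not block forever.

```python
import filecmp
import shutil
from pathlib import Path


def archive_segment(src: str, dst: str) -> int:
    """Hypothetical archive step; returns 0 on success, 1 on failure."""
    source, target = Path(src), Path(dst)

    if not target.exists():
        # Copy to a temporary name first so a crash mid-copy never
        # leaves a partial file under the final segment name.
        tmp = target.with_name(target.name + ".tmp")
        shutil.copyfile(source, tmp)
        tmp.replace(target)
        return 0

    # Target already exists: a byte-for-byte identical copy counts as
    # "already archived"; anything else is an error and is left untouched.
    if filecmp.cmp(source, target, shallow=False):
        return 0
    return 1
```

A nonzero return is what makes the master keep the segment and retry, which matches the behaviour described above.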
If you are certain, or can become certain, that the remote files are identical to the one on the master, it would seem that manually removing the WAL segment on the master would resolve the deadlock. I am not recommending that you do this, but it is an option to consider. There are still too many unknowns, and I am too inexperienced with this, to recommend anything definitive.
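One way to become certain is to compare checksums of the two copies. The following is only a sketch: sha256_of and same_content are names I made up, and same_content assumes both files are readable from one machine (for example after copying the slave's file over). Otherwise you could run sha256_of on each host and compare the hex strings by eye.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Matching digests would only tell you the bytes are the same; they say nothing about whether removing the segment is safe.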
Actually, strike that... the system knows which segment it is trying to archive, so simply removing it likely won't work out well; i.e., it probably won't just move on to the next file in the directory. I'm not positive in either case.