I am getting some weird "archive command failed" messages in the logs, but nothing is actually failing. I *think* the archive script is just exiting with a non-zero code.
Sounds OK so far, but here is the weird part. Archiving is NOT failing, it's working. There are no *.ready files in the archive_status directory. In fact, the WAL file named below is no longer on the server. Yet postgres complains about this file every time the archiver runs: I have over a thousand of these messages from the past 24 hours, all for the same file, with no *.ready files and everything archiving. We have other servers doing the same thing. So I suspect the script that runs the archive is to blame, although we tried redirecting its output to debug and haven't found anything yet. As if this were not strange enough, the error messages appear in hour-long blocks. For example, from 2:00pm to 2:59pm there are errors on what looks like every run, but then the errors stop and there is nothing until another random hour, like 11:00pm.
2015-01-09 02:00:50.478 CST,,,31084,,54a2e24d.796c,49091,,2014-12-30 11:35:09 CST,,0,LOG,00000,"archive command failed with exit code 1","The failed archive command was: /path/to/archive/script pg_xlog/0000000100002ED0000000CB 0000000100002ED0000000CB 9.1/clustername",,,,,,,,""
2015-01-09 02:00:50.479 CST,,,31084,,54a2e24d.796c,49092,,2014-12-30 11:35:09 CST,,0,WARNING,01000,"transaction log file ""0000000100002ED0000000CB"" could not be archived: too many failures",,,,,,,,,""
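To check whether the failures really come in hour-long blocks, something like the following could count the "archive command failed" entries per hour. This is only a sketch: the function name `failures_per_hour` is made up for illustration, and it assumes the csvlog column order seen in the lines above, with the timestamp in the first column and the message in the fourteenth.

```python
import csv
from collections import Counter


def failures_per_hour(csv_lines):
    """Count 'archive command failed' entries per 'YYYY-MM-DD HH' bucket.

    csv_lines is any iterable of csvlog lines, e.g. an open file object.
    Assumption: column 0 is log_time and column 13 is the message,
    as in the sample log lines above.
    """
    counts = Counter()
    for row in csv.reader(csv_lines):
        if len(row) < 14:
            continue  # skip short or malformed rows
        log_time, message = row[0], row[13]
        if message.startswith("archive command failed"):
            counts[log_time[:13]] += 1  # first 13 chars: date plus hour
    return counts
```

It can be fed an open log file, for example `with open(path, newline="") as f: print(sorted(failures_per_hour(f).items()))`, where `path` is whichever csvlog file is of interest.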
postgres=# select version();
version
-----------------------------------------------------------------------------------------------
PostgreSQL 9.1.13 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit
(1 row)
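Since redirecting the script's output has not shown anything yet, one thing we could try is a thin wrapper that records the exit code together with stdout and stderr on every run. Below is a rough sketch of that idea, not what we currently run. `REAL_SCRIPT` is the placeholder path from the log above, `DEBUG_LOG` is a hypothetical log location, and `run_and_log` is a made-up helper name.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper for the archive script: logs what it returns."""
import subprocess
import sys
import time

# Assumptions: placeholder paths. Substitute the real script and a log
# file that the postgres user can write to.
REAL_SCRIPT = "/path/to/archive/script"
DEBUG_LOG = "/tmp/archive_wrapper.log"


def run_and_log(argv, log_path):
    """Run argv, append exit code, stdout and stderr to log_path.

    Returns the child's exit code, or 1 if it was killed by a signal.
    """
    started = time.strftime("%Y-%m-%d %H:%M:%S")
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    with open(log_path, "a") as log:
        log.write("%s argv=%r exit=%d\n" % (started, argv, proc.returncode))
        if proc.stdout:
            log.write("  stdout: %s\n" % proc.stdout.strip())
        if proc.stderr:
            log.write("  stderr: %s\n" % proc.stderr.strip())
    # A negative returncode means the child died from a signal; the raw
    # value is already logged above, so report a generic failure.
    return proc.returncode if proc.returncode >= 0 else 1


# Only act when called with arguments, the way archive_command would.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_and_log([REAL_SCRIPT] + sys.argv[1:], DEBUG_LOG))
```

The idea would be to point archive_command at this wrapper with the same three arguments the current script receives, then compare the wrapper's log with the hours in which postgres reports failures.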