In an effort to track down the problem, I switched to using rsync rather
than scp to copy the files. I also compute the SHA1 hash on each end, and
have my archiving script exit with a non-zero status if there's a mismatch.
Sure enough:
    Oct 27 14:26:35 colo2vs1 canit-failover-wal-archive[29118]:
      Warning: rsync succeeded, but local_sha1 1fe9fc62b2a05d21530decac1c5442969adc5819
      != remote_sha1 4f9f8bcd151129db64acd05470f0f05954b56232 !!
This is a "can't happen" situation, so I have to investigate possible bugs in
rsync, ssh, the kernel, the network, the disk... bleah.
But I'm pretty sure it's not a PostgreSQL problem.
(Because my script exits with a non-zero status when the SHA1s mismatch,
PostgreSQL treats the WAL segment as not archived and re-archives it a short
time later. That retry succeeds, so I'm happy for now.)
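For anyone who wants to try the same check, here is a minimal Python sketch of an archive script that copies with rsync, hashes both ends, and exits non-zero on a mismatch. It is not the canit-failover-wal-archive script that produced the log above. It assumes the script is called from archive_command as `script %p %f`, that the standby is reachable over ssh, and that `sha1sum` exists on the standby. REMOTE_HOST, REMOTE_DIR and the function names are hypothetical placeholders.

```python
#!/usr/bin/env python3
"""Hypothetical WAL archive script: rsync a segment, then compare SHA1 on both ends.

Assumed invocation (placeholder path):
    archive_command = '/path/to/wal_archive.py %p %f'
"""
import hashlib
import subprocess
import sys

REMOTE_HOST = "standby.example.com"  # hypothetical standby host
REMOTE_DIR = "/var/lib/wal-archive"  # hypothetical archive directory


def sha1_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA1 of a local file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_sha1sum_output(output: str) -> str:
    """Extract the digest from `sha1sum FILE` output ("<hex>  <name>")."""
    fields = output.split()
    if not fields or len(fields[0]) != 40:
        raise ValueError(f"unexpected sha1sum output: {output!r}")
    return fields[0].lower()


def archive(wal_path: str, wal_name: str) -> int:
    """Copy one WAL segment, verify it, and return a process exit status."""
    remote_path = f"{REMOTE_DIR}/{wal_name}"
    local_sha1 = sha1_of_file(wal_path)

    # Copy with rsync over ssh; a failed rsync is an archive failure.
    rsync = subprocess.run(["rsync", "-a", wal_path, f"{REMOTE_HOST}:{remote_path}"])
    if rsync.returncode != 0:
        return rsync.returncode

    # Hash the copy on the remote side with sha1sum.
    remote = subprocess.run(
        ["ssh", REMOTE_HOST, "sha1sum", remote_path],
        capture_output=True,
        text=True,
    )
    if remote.returncode != 0:
        return 1
    remote_sha1 = parse_sha1sum_output(remote.stdout)

    if local_sha1 != remote_sha1:
        print(
            f"Warning: rsync succeeded, but local_sha1 {local_sha1} "
            f"!= remote_sha1 {remote_sha1} !!",
            file=sys.stderr,
        )
        return 1  # non-zero: PostgreSQL keeps the segment and retries later
    return 0


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} WAL_PATH WAL_NAME")
    sys.exit(archive(sys.argv[1], sys.argv[2]))
```

The value returned by archive() becomes the script's exit status. A non-zero status is what tells PostgreSQL the segment was not archived, so it retries the same segment later.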