Hi all
The attached patch fixes an issue I found while testing the prior revision: it read WAL from segments on the old timeline right up to the timeline switch point, which fails if the last segment on that timeline has been renamed with a .partial suffix.
Instead, we have to switch eagerly to reading that segment from the newest timeline that has a copy of it. We'll still be reading WAL records from the correct timeline, because on promotion the old timeline's partial segment is copied to a new name. We simply read it from the newest copy of that segment, which is either complete and archived or still being written to on the current timeline.
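To make that rule concrete, here's a minimal C sketch. It is not the patch's code: the TimelineEntry struct, timeline_for_segment() and the fixed 16MB segment size are hypothetical simplifications. It assumes the history holds only the ancestors of the current timeline, ordered oldest first, each with the LSN at which it begins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL types (sketch only). */
typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;
typedef uint64_t XLogSegNo;

/* Assumes the default 16MB WAL segment size. */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Hypothetical timeline history entry: a timeline and the LSN it begins at. */
typedef struct
{
    TimeLineID  tli;
    XLogRecPtr  begin;      /* switch point from the parent; 0 for the first */
} TimelineEntry;

/*
 * Return the timeline whose copy of the segment containing 'lsn' should be
 * opened. We pick the newest timeline that began in this segment or earlier:
 * if a later timeline began inside the segment, the older timeline's copy may
 * have been renamed to .partial, while the newer copy holds the same WAL up
 * to the switch point.
 */
static TimeLineID
timeline_for_segment(const TimelineEntry *history, size_t n, XLogRecPtr lsn)
{
    XLogSegNo   segno = lsn / WAL_SEG_SIZE;
    TimeLineID  result;
    size_t      i;

    assert(n > 0);
    result = history[0].tli;
    for (i = 1; i < n; i++)
    {
        /* History is ordered oldest first, so begin points only increase. */
        if (history[i].begin / WAL_SEG_SIZE <= segno)
            result = history[i].tli;
        else
            break;
    }
    return result;
}
```

With timeline 2 beginning partway into segment 3, a read at 0/3000000 resolves to timeline 2 even though that record was written on timeline 1, while a read in segment 2 still resolves to timeline 1.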
For example, if the old master was on timeline 1 and writing to 000000010000000000000003 when it dies and we promote a streaming replica, the replica will copy 000000010000000000000003 to 000000020000000000000003 and append its recovery checkpoint to the copy. It renames 000000010000000000000003 to 000000010000000000000003.partial, which means the xlogreader won't find it. If we're reading the record at 0/3000000 then even though 0/3000000 is on timeline 1, we have to read it from the segment on timeline 2.
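The segment names in that example follow the layout of PostgreSQL's XLogFileName macro: the timeline ID, then the high and low parts of the segment number, each as 8 uppercase hex digits. A minimal sketch that reproduces them, assuming the default 16MB segment size; wal_file_name() and partial_name() are hypothetical helpers, not backend functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

/* Assumes the default 16MB WAL segment size. */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)
#define SEGS_PER_XLOGID (UINT64_C(0x100000000) / WAL_SEG_SIZE)

/* Name of the segment on timeline 'tli' containing 'lsn'; buf needs >= 25 bytes. */
static void
wal_file_name(char *buf, size_t buflen, TimeLineID tli, XLogRecPtr lsn)
{
    uint64_t    segno = lsn / WAL_SEG_SIZE;

    assert(buflen >= 25);
    snprintf(buf, buflen, "%08X%08X%08X", (unsigned) tli,
             (unsigned) (segno / SEGS_PER_XLOGID),
             (unsigned) (segno % SEGS_PER_XLOGID));
}

/* Name the old timeline's last segment ends up with after promotion. */
static void
partial_name(char *buf, size_t buflen, TimeLineID tli, XLogRecPtr lsn)
{
    wal_file_name(buf, buflen, tli, lsn);
    assert(buflen >= strlen(buf) + sizeof(".partial"));
    strcat(buf, ".partial");
}
```

For LSN 0/3000000 this gives 000000010000000000000003 on timeline 1 and 000000020000000000000003 on timeline 2; after promotion only the timeline 2 name still exists without the .partial suffix, which is why the reader has to open that one.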
Fun, eh?
(I'm going to write a README.timelines to document some of this stuff soon, since it has some pretty hairy corners and some of the code paths are a bit special.)
I've written some initial TAP tests for timeline following that exploit the fact that replication slots are preserved on a replica if the replica is created with a filesystem-level copy that includes pg_replslot, rather than with pg_basebackup. They are not included here because they rely on TAP support improvements (filesystem backup support, psql enhancements, etc.) that I'll submit separately, but they're how I found the .partial issue.
A subsequent patch can add testing of slot creation and advance on replicas using a C test extension to prove that this approach can be used to achieve practical logical failover for extensions.
I think this is ready to go as-is.
I don't want to hold it up waiting for test framework enhancements unless those can be committed fairly easily. I think we need this in 9.6, and the tests demonstrate that it works when run separately.
See for a git tree containing the timeline following patch, TAP enhancements and the tests for timeline following.
--