Timeline following for logical slots - Mailing list pgsql-hackers

From Craig Ringer
Subject Timeline following for logical slots
Date
Msg-id CAMsr+YH-C1-X_+s=2nzAPnR0wwqJa-rUmVHSYyZaNSn93MUBMQ@mail.gmail.com
Responses Re: Timeline following for logical slots  (Craig Ringer <craig@2ndquadrant.com>)
Re: Timeline following for logical slots  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
Hi all

Per discussion on the failover slots thread (https://commitfest.postgresql.org/9/488/) I'm splitting timeline following for logical slots into its own separate patch.

The attached patch fixes an issue I found while testing the prior revision: it read WAL from segments on the old timeline right up to the timeline switch boundary, but that doesn't work if the last WAL segment on that timeline has been renamed with the .partial suffix.

Instead it's necessary to eagerly switch to reading the WAL segment from the newest timeline on that segment. We'll still be reading WAL records from the correct timeline since the partial WAL segment from the old timeline gets copied to a new name on promotion, but we're reading it from the newest copy of that segment, which is either complete and archived or is still being written to by the current timeline.

For example, if the old master was on timeline 1 and writing to 000000010000000000000003 when it dies and we promote a streaming replica, the replica will copy 000000010000000000000003 to 000000020000000000000003 and append its recovery checkpoint to the copy. It renames 000000010000000000000003 to 000000010000000000000003.partial, which means the xlogreader won't find it. If we're reading the record at 0/3000000 then even though 0/3000000 is on timeline 1, we have to read it from the segment on timeline 2.
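To make the file-selection rule in that example concrete, here is a minimal Python sketch. It is not the patch's code: the function names, the history representation, and the switch point 0/3000060 are made up for illustration, and it assumes the default 16MB WAL segment size.

```python
# Illustrative sketch only; names and the switch point are hypothetical.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024            # assumed default segment size
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE


def wal_file_name(tli, lsn):
    """Build the 24-hex-digit segment file name for a timeline and LSN."""
    segno = lsn // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        tli,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def record_timeline(history, lsn):
    """Timeline the record at lsn logically belongs to.

    history is a list of (tli, end_lsn) pairs, oldest first, with
    end_lsn=None for the current timeline.
    """
    for tli, end_lsn in history:
        if end_lsn is None or lsn < end_lsn:
            return tli
    raise ValueError("lsn is past the end of the timeline history")


def segment_timeline(history, lsn):
    """Newest timeline that starts before the end of the segment holding lsn.

    This is the timeline whose copy of the segment should be opened,
    even when the record itself belongs to an older timeline.
    """
    seg_end = (lsn // WAL_SEGMENT_SIZE + 1) * WAL_SEGMENT_SIZE
    start = 0
    chosen = history[0][0]
    for tli, end_lsn in history:
        if start < seg_end:
            chosen = tli
        if end_lsn is None:
            break
        start = end_lsn          # the next timeline begins where this one ends
    return chosen


# Timeline 1 ends mid-segment at the (made-up) switch point 0/3000060.
history = [(1, 0x3000060), (2, None)]
lsn = 0x3000000                  # 0/3000000

print(record_timeline(history, lsn))                        # record's timeline
print(wal_file_name(segment_timeline(history, lsn), lsn))   # file to open
```

With this history the record at 0/3000000 belongs to timeline 1, but the file to open is the timeline 2 copy of segment 3. A record in segment 2 still resolves to the timeline 1 file, since timeline 2 does not start until after that segment ends.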

Fun, eh?

(I'm going to write a README.timelines to document some of this stuff soon, since it has some pretty hairy corners and some of the code paths are a bit special.)

I've written some initial TAP tests for timeline following that exploit the fact that replication slots are preserved on a replica if the replica is created with a filesystem-level copy that includes pg_replslot, rather than with pg_basebackup. They are not included here because they rely on TAP support improvements (filesystem backup support, psql enhancements, etc.) that I'll submit separately, but they're how I found the .partial issue.

A subsequent patch can add testing of slot creation and advance on replicas using a C test extension to prove that this approach can be used to achieve practical logical failover for extensions.

I think this is ready to go as-is.

I don't want to hold it up waiting for test framework enhancements, unless those can be committed fairly easily, because I think we need this in 9.6 and the tests demonstrate that it works when run separately.

See  for a git tree containing the timeline following patch, TAP enhancements and the tests for timeline following.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
