Re: Timeline following for logical slots - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Timeline following for logical slots
Date
Msg-id CAMsr+YEPdRkoZA+uN90vSmXT9GYj0UFaYMn9=O11XW4L3cvosg@mail.gmail.com
In response to Timeline following for logical slots  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: Timeline following for logical slots
List pgsql-hackers
On 1 March 2016 at 21:00, Craig Ringer <craig@2ndquadrant.com> wrote:
Hi all

Per discussion on the failover slots thread (https://commitfest.postgresql.org/9/488/) I'm splitting timeline following for logical slots into its own separate patch.


I've updated the logical decoding timeline following patch to fix a bug found during test development. The bug relates to how Pg renames the last WAL segment on the old timeline, giving it a .partial suffix, on promotion. The xlogreader must switch eagerly to reading the newest-timeline version of a given segment, starting from the first page of that segment, since that version is the only one guaranteed to actually exist.
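
To make the eager-switch rule concrete, here is a rough sketch in Python. It is not the patch's code and not the xlogreader's actual logic; the names (TimelineEntry, timeline_for_read) are made up for illustration, and it assumes the default 16 MB WAL segment size and a timeline history ordered oldest to newest whose last entry has no end point.

```python
from typing import NamedTuple, Optional, Sequence

# Assumption: default 16 MB WAL segments.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


class TimelineEntry(NamedTuple):
    """Hypothetical timeline history entry, ordered oldest to newest."""
    tli: int             # timeline ID
    end: Optional[int]   # LSN where the next timeline forked off; None if newest


def segment_no(lsn: int) -> int:
    """Number of the WAL segment that contains the given LSN."""
    return lsn // WAL_SEGMENT_SIZE


def timeline_for_read(history: Sequence[TimelineEntry], lsn: int) -> int:
    """Pick the timeline whose copy of the segment containing lsn to read."""
    seg = segment_no(lsn)
    # Timeline the LSN historically belongs to: the first one that has not
    # ended yet at this LSN.
    idx = next(i for i, e in enumerate(history)
               if e.end is None or lsn < e.end)
    # Eager switch: if that timeline ends inside this same segment, its copy
    # of the segment may only exist as a .partial file, so move on to the
    # next timeline. Repeat in case several switches fall in one segment.
    while history[idx].end is not None and segment_no(history[idx].end) == seg:
        idx += 1
    return history[idx].tli
```

With a switch from timeline 1 to timeline 2 partway through a segment, every read within that segment resolves to timeline 2, including pages before the switch point, while reads in earlier segments still resolve to timeline 1.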

I'd really appreciate some review of the logic there by people who know timelines well and preferably know the xlogreader. It's really just one function and 2/3 comments; the code is simple but the reasoning leading to it is not.


I've also attached an updated version of the tests posted a few days ago. The tests depend on the remaining patches from the TAP enhancements tree, so it's easiest to just get the whole tree from https://github.com/2ndQuadrant/postgres/tree/dev/logical-decoding-timeline-following (subject to regular rebases and force pushes, so do not use it as a base).

The tests now include a test module that exposes some slot internals to SQL. This lets the client sync slot state from master to replica(s) without needing failover slots and without using extra WAL as transport. It's very much for-testing-only.

The new test module is used by a second round of tests to demonstrate the practicality of failover of a logical replication client to a physical replica using a base backup taken by pg_basebackup and without the presence of failover slots. I won't pretend it's pretty.

This proves that the approach works, barring unforeseen showstoppers. It also proves it's pretty ugly: failover slots provide a much, MUCH simpler and safer way for clients to achieve this, with way less custom code needed by each client to sync slot state.

I've got a bit of cleanup to do in the test suite, and a few more tests to write for cases where the slot on the replica is allowed to fall behind the slot on the master. Mostly, though, this is waiting on the remaining two TAP test patches before it can be evaluated for possible push.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
