Replication origins conflate two separate functions - Mailing list pgsql-hackers

From Craig Ringer
Subject Replication origins conflate two separate functions
Date
Msg-id CAMsr+YEViEm8sshLeK6CZV1rx7XadiWf9t63aVA2wfyHDzxwcg@mail.gmail.com
List pgsql-hackers
Hi folks

During some recent work with a plugin (pglogical) that uses replication origins heavily, it's become apparent that replication origins conflate two orthogonal features into one thing. There's "replication origins (session replay progress tracking)" and "replication origins (per-transaction commit origin tracking)".

TL;DR: it should be possible to set replication origins independently: per txn for node-of-origin purposes, and per session for position-tracking purposes.



This comes up in a few places:

- Sometimes you're replaying a txn with both a proximate origin (the immediate upstream) and an ultimate origin (the node that originally wrote it). You should really store the ultimate origin as the commit's replication origin, for conflict resolution and similar purposes. But you have set your session up with the proximate origin for replay position tracking. If you set up your session and then change replorigin_session_origin to the ultimate origin, *crash recovery will update the wrong replication origin's position tracking*.

- It's not currently possible to use replication origins for replay position tracking without also storing committs origin data (if committs is on), because setting replorigin_session_origin=InvalidRepOriginId turns off replorigin handling entirely.

- Using replorigin_advance is no substitute because it cannot be crash-safe. It risks either skipping some changes or replaying some twice.
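
To make the first point concrete, here is a toy model in C. It is not PostgreSQL source: every toy_* name is invented for this sketch, and it assumes only that a commit record carries a single origin id which redo uses to advance replay progress. The typedef widths mirror PostgreSQL's RepOriginId and XLogRecPtr.

```c
#include <stdint.h>

typedef uint16_t RepOriginId;   /* same width as PostgreSQL's RepOriginId */
typedef uint64_t XLogRecPtr;    /* same width as PostgreSQL's XLogRecPtr */

#define InvalidRepOriginId 0
#define TOY_MAX_ORIGINS 8       /* hypothetical limit for this sketch */

/* Toy commit record: one origin id has to serve both purposes. */
typedef struct ToyCommitRecord
{
    RepOriginId origin_id;      /* stamped from the session origin at commit */
    XLogRecPtr  origin_lsn;     /* upstream LSN of the replayed commit */
} ToyCommitRecord;

/* Toy replay-progress table, indexed by origin id. */
static XLogRecPtr toy_progress[TOY_MAX_ORIGINS];

/* Toy redo: advance progress for whichever origin the record names. */
static void
toy_redo_commit(const ToyCommitRecord *rec)
{
    if (rec->origin_id != InvalidRepOriginId &&
        rec->origin_id < TOY_MAX_ORIGINS)
        toy_progress[rec->origin_id] = rec->origin_lsn;
}

/*
 * Apply a commit received from `proximate` that was originally written on
 * `ultimate`. The session origin is switched to `ultimate` so the commit
 * carries the right node of origin, which means the record names `ultimate`.
 * Returns the origin whose replay progress actually moved.
 */
static RepOriginId
toy_apply_forwarded_commit(RepOriginId proximate, RepOriginId ultimate,
                           XLogRecPtr upstream_lsn)
{
    ToyCommitRecord rec;

    (void) proximate;           /* the position we wanted to track is lost */
    rec.origin_id = ultimate;
    rec.origin_lsn = upstream_lsn;
    toy_redo_commit(&rec);
    return rec.origin_id;
}
```

Calling toy_apply_forwarded_commit(1, 2, 1000) leaves toy_progress[1] untouched and moves toy_progress[2] instead, which is the wrong-origin update described above.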


It hasn't been much of an issue because nobody's been doing a great deal with parallelism, re-syncing tables, replication with hop distances other than 1, etc. But pglogical does support resyncing tables, adding new tables to existing subscriptions with an initial sync, cascading, etc, and we're starting to run into these issues. They'll no doubt be a problem for Pg core logical rep down the track, too.

For example, to resync a table pglogical makes a new slot for the copy, does a COPY from the new slot's snapshot, then a post-COPY replay from the replication slot, replaying only tuples for the table of interest, until it has caught up with the main apply position.

The problem is that there is no crash-safe way to record both the initial origin of the tuples (committs replication origin) and the replay progress during post-COPY catchup of changes to the table. You have to create a separate "origin for copy purposes" or something similar for each copy, then keep track of them all. So say you're using a temp slot and do everything in one txn. You don't need origins for position tracking then, but if you set up the main session origin for your copy (for committs purposes) you have issues with exclusive locking of origins, because the apply session already has it locked. Even if you work around that, you still can't do multiple parallel copies.

The tuples came from the same origin as the main apply process, so you want that origin set on the committs (e.g. origin=1) for correct conflict detection and reporting, etc.

To me this means that the commit record's replorigin info should separately track the (origin,lsn) for replay progress tracking and the (origin,committs) for xact info. It should be possible to record one or the other independently. A session replorigin should not have to be set in order to record the (origin,committs) info; it should only be needed for replay position tracking.

AFAICS that comes down to one extra RepOriginId in commit records with origins. There's no need for an extra flag bit; we'd just expand the existing XLOG_INCLUDE_ORIGIN and add a member to the xl_origin subrecord.
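
A minimal sketch of what that separation could look like, again as a toy C model rather than a patch. The struct layout, field names and Sketch*/sketch_* identifiers are all assumptions made for illustration; the real change would touch the existing commit record structures.

```c
#include <stdint.h>

typedef uint16_t RepOriginId;   /* same width as PostgreSQL's RepOriginId */
typedef uint64_t XLogRecPtr;    /* same width as PostgreSQL's XLogRecPtr */
typedef int64_t  TimestampTz;   /* same width as PostgreSQL's TimestampTz */

#define InvalidRepOriginId 0
#define SKETCH_MAX_ORIGINS 8    /* hypothetical limit for this sketch */

/* Hypothetical origin subrecord with the two roles split apart. */
typedef struct SketchCommitOrigin
{
    RepOriginId progress_origin;   /* whose replay position to advance */
    XLogRecPtr  origin_lsn;        /* upstream LSN for progress_origin */
    RepOriginId xact_origin;       /* node of origin stored with committs */
    TimestampTz origin_timestamp;  /* upstream commit timestamp */
} SketchCommitOrigin;

/* Toy replay-progress table and toy committs origin for the last commit. */
static XLogRecPtr  sketch_progress[SKETCH_MAX_ORIGINS];
static RepOriginId sketch_committs_origin;

/* Toy redo: each role is honoured only if its origin id is valid. */
static void
sketch_redo_commit(const SketchCommitOrigin *rec)
{
    if (rec->progress_origin != InvalidRepOriginId &&
        rec->progress_origin < SKETCH_MAX_ORIGINS)
        sketch_progress[rec->progress_origin] = rec->origin_lsn;

    /* Recorded even when no progress origin is set. */
    sketch_committs_origin = rec->xact_origin;
}
```

With this shape, a forwarded commit can advance progress for the proximate origin while recording the ultimate origin for conflict detection, and a one-txn table copy can set only xact_origin without touching any progress entry.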

Then we can support using origins for just position tracking, for just commit timestamp origin metadata, or for both.

No backward-compatibility-breaking changes would occur anywhere in the SQL UI. I'd probably add an extra arg to pg_replication_origin_xact_setup(...) that lets you set a per-xact origin, and tweak how replorigin tracks state so you can use pg_replication_origin_xact_setup without an active session origin.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
