Re: Timeline following for logical slots - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Timeline following for logical slots
Date
Msg-id 20160404100116.GB25969@awork2.anarazel.de
In response to Re: Timeline following for logical slots  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: Timeline following for logical slots  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On 2016-04-04 17:50:02 +0800, Craig Ringer wrote:
> To rephrase per my understanding: The client only specifies the point it
> wants to start seeing decoded commits. Decoding starts from the slot's
> restart_lsn, and that's the point from which the accumulation of reorder
> buffer contents begins, the snapshot building process begins, and where
> accumulation of relcache invalidation information begins. At restart_lsn no
> xact that is to be emitted to the client may yet be in progress. Decoding,
s/yet/already/
> whether or not the xacts will be fed to the output plugin callbacks,
> requires access to the system catalogs. Therefore catalog_xmin reported by
> the slot must be >= the real effective catalog_xmin of the heap and valid
> at the restart_lsn, not just the confirmed flush point or the point the
> client specifies to resume fetching changes from.

Hm. Maybe I'm misunderstanding you here, but doesn't it have to be <=?
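
To illustrate the quoted description of where decoding starts versus where the client starts seeing commits, here is a toy model. It is only a sketch: the `decode` function, the tuple-based WAL layout and the integer LSNs are invented for illustration and do not correspond to PostgreSQL's actual data structures. It shows that changes are accumulated from restart_lsn onward, while only transactions committing at or after the client's start point are emitted.

```python
from collections import defaultdict

def decode(wal, restart_lsn, start_lsn):
    """Toy model: read WAL from restart_lsn, emit xacts that commit at or after start_lsn.

    wal is a list of (lsn, xid, kind, payload) tuples; this layout is hypothetical.
    """
    pending = defaultdict(list)  # xid -> accumulated changes (stand-in for the reorder buffer)
    emitted = []
    for lsn, xid, kind, payload in wal:
        if lsn < restart_lsn:
            continue  # nothing before restart_lsn is ever read
        if kind == "change":
            pending[xid].append(payload)
        elif kind == "commit":
            changes = pending.pop(xid, [])
            if lsn >= start_lsn:
                emitted.append((xid, changes))  # commit is past the client's start point
    return emitted

WAL = [
    (10, 1, "change", "a"),
    (20, 2, "change", "b"),
    (30, 1, "commit", None),
    (40, 2, "change", "c"),
    (50, 2, "commit", None),
]

# restart_lsn precedes the first change of xid 2, so the xact is decoded completely.
complete = decode(WAL, restart_lsn=20, start_lsn=40)

# restart_lsn falls inside xid 2, so its first change is silently lost.
truncated = decode(WAL, restart_lsn=40, start_lsn=40)
```

In the second call xid 2 is already in progress at restart_lsn, so the emitted transaction is missing a change. That is the reason no xact that is to be emitted may already be in progress at restart_lsn.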

> On the original copy of the slot on the pre-failover master the restart_lsn
> would've been further ahead, as would the catalog_xmin. So catalog rows
> have been purged.
+may


> So it's necessary to ensure that the slot's restart_lsn and catalog_xmin
> are advanced in a timely, consistent manner on the replica's copy of the
> slot at a point where no vacuum changes to the catalog that could remove
> needed tuples have been replayed.

Right.


> The only way I can think of to do that really reliably right now, without
> full failover slots, is to use the newly committed pluggable WAL mechanism
> and add a hook to SaveSlotToPath() so slot info can be captured, injected
> in WAL, and replayed on the replica.

I personally think the primary answer is to use separate slots on
different machines. Failover slots can be an extension to that at some
point, but I think they're a secondary goal.


> It'd also be necessary to move
> CheckPointReplicationSlots() out of CheckPointGuts()  to the start of a
> checkpoint/restartpoint when WAL writing is still permitted, like the
> failover slots patch does.

Ugh. That makes me rather wary.


> Basically, failover slots as a plugin using a hook, without the
> additions to base backup commands and the backup label.

I'm going to be *VERY* hard to convince that adding a hook inside
checkpointing code is acceptable.


> I'd really hate 9.6 to go out with - still - no way to use logical decoding
> in a basic, bog-standard HA/failover environment. It overwhelmingly limits
> their utility and it's becoming a major drag on practical use of the
> feature. That's a difficulty given that the failover slots patch isn't
> especially trivial and you've shown that lazy sync of slot state is not
> sufficient.

I think the right way to do this is to focus on failover for logical
rep, with separate slots. The whole idea of integrating this with physical
rep imo makes this a *lot* more complex than necessary. Not all that
many people are going to want to use both physical rep and logical rep.


> The restart_lsn from the newer copy of the slot is, as you said, a point we
> know we can reconstruct visibility info.

We can on the master. There's absolutely no guarantee that the
associated serialized snapshot is present on the standby.


Andres


