Re: Timeline following for logical slots - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Timeline following for logical slots
Msg-id 20160404151529.GD25969@awork2.anarazel.de
In response to Re: Timeline following for logical slots  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: Timeline following for logical slots
List pgsql-hackers
On 2016-04-04 22:59:41 +0800, Craig Ringer wrote:
> Assuming that here you mean separate slots on different machines
> replicating via physical rep:

No, I don't.

> We don't currently allow the creation of a logical slot on a standby. Nor
> replay from it, even to advance it without receiving the decoded
> changes.

Yes. I know.


> > I think the right way to do this is to focus on failover for logical
> > rep, with separate slots. The whole idea of integrating this with physical
> > rep imo makes this a *lot* more complex than necessary. Not all that
> > many people are going to want to use physical rep and logical rep.

> If you're saying we should focus on failover between nodes that're
> themselves connected using logical replication rather than physical
> replication, I really have to strongly disagree.
> 
> TL;DR for book-length below: We're a long, long way from being able to
> deliver even vaguely decent logical rep based failover.

I don't buy that.


> * Robust sequence decoding and replication. If you were following the later
> parts of that discussion you will've seen how fun that's going to be, but
> it's the simplest of all of the problems.

Unconvinced. People used londiste and slony for years without that, and
it's not even remotely at the top of the list of problems with either.


> * Logical decoding and sending of in-progress xacts, so the logical client
> can already be most of the way through receiving a big xact when it
> commits. Without this we have a huge lag spike whenever a big xact happens,
> since we must first finish decoding it into a reorder buffer and can only
> then *begin* to send it to the client. During which time no later xacts may
> be decoded or replayed to the client. If you're running that rare thing,
> the perfect pure OLTP system, you won't care... but good luck finding one
> in the real world.

So? If you're using logical rep, you've always had that.


> * Either parallel apply on the client side or at least buffering of
> in-progress xacts on the client side so they can be safely flushed to disk
> and confirmed, allowing receive to continue while replay is done on the
> client. Otherwise sync rep is completely impractical... and there's no
> shortage of systems out there that can't afford to lose any transactions.
> Or at least have some crucial transactions they can't lose.

That's more or less trivial to implement. In an extension.


> * Robust, seamless DDL replication, so things don't just break randomly.
> This makes the other points above look nice and simple by comparison.

We're talking about a system which involves logical decoding. Whether
or not you have failover via physical rep has nothing to do with this
point.


> Physical rep *works*. Robustly. Reliably. With decent performance. It's
> proven. It supports sync rep. I'm confident telling people to use it.

Sure? And nothing prevents you from using it.


> I don't think there's any realistic way we're going to get there for
> logical rep in 9.6+n for n<2 unless a whole lot more people get on board
> and work on it. Even then.

I think the primary problem here is that you're focusing on things that
just are not very interesting for the majority of users, and which thus
won't get very enthusiastic help.  The way to make progress is to get
something basic in, and then iterate from there.  Instead you're
focusing on the fringes, which nobody cares about because the basics
aren't there.

FWIW, I plan to aggressively work on in-core (9.7) logical rep starting
in a few weeks. If we can coordinate on that end, I'm very happy, if not
then not.

> Right now we can deliver logical failover for DBs that:
> (b) don't do DDL, ever, or only do some limited DDL via direct admin
> commands where they can call some kind of helper function to queue and
> apply the DDL;
> (c) don't do big transactions or don't care about unbounded lag;
> (d) don't need synchronous replication or don't care about possibly large
> delays before commit is confirmed;
> (e) only manage role creation (among other things) via very strict
> processes that can be absolutely guaranteed to run on all nodes

And just about nothing of that has to do with any of the recent patches
sent towards logical rep. I agree about the problems, but you should
attack them, instead of building way too complicated workarounds which
barely anybody will find interesting.


> ... which in my view isn't a great many databases.

Meh^2. Slony and londiste have these problems. And the reason they're
not used isn't primarily those, but that they're external and waaayyy too
complicated to use.


> Physical rep has *none* of those problems. (Sure, it has others, but we're
> used to them).

So?


-Andres


