Re: Timeline following for logical slots - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Timeline following for logical slots
Date
Msg-id CA+TgmoZTyEGPCmpjebJZd3nQarOCUBzT+PnQXECa_RWYHRzHVA@mail.gmail.com
Whole thread Raw
In response to Re: Timeline following for logical slots  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: Timeline following for logical slots  (Craig Ringer <craig@2ndquadrant.com>)
Re: Timeline following for logical slots  (Petr Jelinek <petr@2ndquadrant.com>)
List pgsql-hackers
On Tue, Apr 5, 2016 at 3:51 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> First, I'd like to remind everyone that logical decoding is useful for more
> than replication. You can consume the change stream for audit
> logging/archival, to feed into a different DBMS, etc etc. This is not just
> about replicating from one PostgreSQL to another, though to be sure that'll
> be the main use in the near term.
>
> The Zalando guys at least are already using it for other things, and
> interest in the json support suggests they're not alone.

I have not forgotten any of that, nor do I consider it unimportant.

> Right now if you're doing any kind of logical deocoding from a master server
> that fails over to a standby the client just dies. The slot vanishes. You're
> stuffed. Gee, I hope you didn't need all that nice consistent ordering,
> because you're going to have to start from scratch and somehow reconcile the
> data in the new master with what you've already received ... and haven't.

Right, but right now you probably *aren't* do doing any kind of
logical decoding from a master server to a standby, because there's
squat in the core distribution that could make use of that capability.
So you never get as far as discovering this problem.

> I don't understand why it seems to be considered OK for logical slots to
> just vanish on failover. The only other things I can think of where that's
> considered OK are unlogged tables (because that's the point and we have
> failover-safe ones too) and the old hash indexes nobody's quite willing to
> remove yet.

First, it wasn't until 9.3 that physical standbys could follow
timeline switches, but that doesn't mean that streaming replication
was useless in 9.0 - 9.2, or that warm standby was useless in earlier
versions.  Logical decoding isn't useless without that capability
either.  Would it be nice if we did have that capability?  Of course.

Second, I'm not sure whether it was a good design decision to make
logical slots a special kind of object that sit off to the side,
neither configuration (like postgresql.conf) nor WAL-protected data
(like pg_clog and the data files themselves), but it was certainly a
very deliberate decision.  I sort of expected them to be WAL-logged,
but Andres argued (not unconvincingly) that we'd want to have slots on
standbys, and making them WAL-logged would preclude that.  So I don't
really think that this is much like hash indexes, which just never got
properly finished.  It's more like unlogged tables, where a deliberate
design decision to lose data was made in order to meet some other
goal.

>> realistically, there are a lot of people who simply don't change their
>> schema all that often, and who could (and might even prefer to) manage
>> that process in other ways - e.g. change nodes one by one while they
>> are off-line, then bring them on-line.
>
> Yeah, it's a bit more complex than that. Schema changes *must* be made at a
> specific point in replay. You can't generally just ALTER TABLE ... DROP
> COLUMN on the master then do the same thing on the replica. The replica
> probably still has un-replayed changes from the master that have the
> now-dropped column in their change stream, but now it can't apply them to
> the new table structure on the downstream. This particular case can be
> worked around, but column type changes, addition of non-null columns etc
> cannot.

There are certainly problem cases, but I'm not sure they really arise
that much in practice.  If you retype a column from text to integer,
you probably aren't storing anything in it other than integers, in
which case it is not necessarily the case that you are locked into
applying that change at a particular point in the change stream.  If
you are storing non-integers in a text column and relying on a USING
clause to make them look like integers during the conversion, then,
yes, that has to be done at a precise point in the change stream.  But
that's a pretty strange thing to do, and your application is most
likely going to get confused anyway, so you are probably taking a
maintenance window for the changeover anyway - in which case, there's
not really a big problem.  You can run the same change at the same
time on both servers while nothing else is happening.

> Also, a growing portion of the world uses generated schemas and migrations
> and can't easily pipe that through some arbitrary function we provide. They
> may not be too keen on stopping writes to their entire database as an
> alternative either (and we don't provide a true read-only mode for them to
> use if they did want to). I know it's fashionable around here to just write
> them off as idiots for using ORMs and so on, but they're rather widespread
> idiots who're just as likely to be interested in geographically distributed
> selective replication, change stream extraction, etc. This has been a
> persistent problem with people who want to use BDR, too.

It's not my intent to write anyone off as an idiot.

> Review and test responses have been pretty underwhelming for pglogical, and
> quite a bit seem to have boiled down to "this should live as an extension,
> we don't need it in core". It often feels like we can't win: if we seek to
> get it into core we're told it's not wanted/needed, but if we try to focus
> on solving issues in core to make it work better and let it live as an
> extension we're told we shouldn't bother until it's in core.

To be honest, I was shocked that pglogical and pglogical_output didn't
go into this release.  I assumed that you and other folks at
2ndQuadrant were going to make a big push to get that done.  I did
take a brief look at one of them - pglogical, I think - a week or two
ago but there were unaddressed review comments that had been pending
for months and there were a lot of fairly obvious things that needed
to be done before it could be seriously considered as a core
submission.  Like, for example, rewriting the documentation heavily
and making it look like the rest of our docs, and putting it in SGML
format.  The code seemed to need quite a bit of cleanup, too.  Now,
logical replication is a sufficiently important feature that if the
only way it's going to get into core is if I work on it myself, or get
other people at EnterpriseDB to do so, then I'll try to make that
happen.  But I was assuming that that was your/2ndQuadrant's patch,
that you were going to get it in shape, and that me poking my nose
into it wasn't going to be particularly welcome.  Maybe I've misread
the whole dynamic here.

> Do you want to get a logical replication system into core that doesn't work
> properly with lots of the other features in PostgreSQL? That's historically
> not how we've done things here, and sometimes massive amounts of work have
> been required to make new feature X work with obscure/awkward existing
> feature Y. Inheritance comes strongly to mind as an exciting challenge for
> many new features to support properly. Arguments are usually raised that
> reference things like the mess created by MySQL's storage engines and how
> we're better off making everything work right once and for all.

So, let me be clear, here.  You have every right to decide which
feature you want to work on, and if, at the current time, failover
slots is that thing and if, at the current time, in-core logical
replication solution is not that thing, then that is entirely up to
you.  I don't have any right to tell you what to work on, and I'm not
trying to tell you what to work on.  I'm giving you my opinion on what
I would work on if I were going to do some work related to logical
replication - nothing more.

That being said, if we get a logical replication system into core that
doesn't do DDL, doesn't do multi-master, doesn't know squat about
sequences, and rolls over and dies if a timeline switch happens, I
would consider that a huge step forward and I think a lot of other
people would, too.  We have a long history of building features
incrementally.  Parallel query in 9.6 doesn't parallelize every query
that can be parallelized, postgres_fdw has atrocious performance on
simple queries like SELECT count(*) FROM ft, and autovacuum existed
for a long time before it was turned on by default.  Some people fail
to get patches committed because they set their expectations too low,
and we come back and say "well, that's nice, but we really need a
little more here in order to consider this feature complete".  But
plenty of people also make the opposite mistake, of thinking that they
have to fix everything at once in order to have anything worthwhile,
and that's just as much of a trap as the other thing.  Even with lots
of limitations, built-in logical replication could still be good
enough to be used long enough to manage a major version upgrade -
people who have this problem today are using Slony, and I believe that
even a really basic version of logical rep could have significant
advantages over Slony in terms of both performance and ease of
configuration.

> Still, I don't really want to block work on making logical decoding more
> real-world usable on inclusion of a logical replication system for
> PostgreSQL, especially one that'll be lucky to get in for 9.7 at the
> earliest.

Well, as noted above, I think the main thing we need to figure out is
who is going to do the work.  The main thing blocking other people
from working on it is the belief that we are just waiting for you or
someone else at 2ndQuadrant to land the necessary patches, but if you
aren't really working on that in a focused way, then somebody else can
step up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: [BUGS] Breakage with VACUUM ANALYSE + partitions
Next
From: Amit Langote
Date:
Subject: Re: Materialized views vs. primary keys