Re: Changeset Extraction v7.6.1 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Changeset Extraction v7.6.1
Msg-id 20140218090758.GJ7161@awork2.anarazel.de
In response to Re: Changeset Extraction v7.6.1  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Changeset Extraction v7.6.1  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi Robert,

On 2014-02-17 20:31:34 -0500, Robert Haas wrote:
> 1. How safe is it to try to do decoding inside of a regular backend?
> What we're doing here is entering a special mode where we forbid the
> use of regular snapshots in favor of requiring the use of "decoding
> snapshots", and forbid access to non-catalog relations.  We then run
> through the decoding process; and then exit back into regular mode.
> On entering and on exiting this special mode, we
> InvalidateSystemCaches().  I don't see a big problem with having
> special backends (e.g. walsender) use this special mode, but I'm less
> convinced that it's wise to try to set things up so that we can switch
> back and forth between decoding mode and regular mode in a single
> backend.

The main reason the SQL interface exists is that it's awfully hard to
use isolationtester, pg_regress et al. when the output isn't also visible
via SQL. We tried hacking things in other ways, but that's what it came
down to. If you recall, the SQL changes interface previously lived only
in a test_logical_decoding extension, because I wasn't sure it was all
that interesting for real use cases.
It's sure nice for testing things though.
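For concreteness, the SQL-level usage in a pg_regress-style test looks roughly like this (a sketch against the interface as proposed; slot and function names may still change before commit):

```sql
-- create a logical slot using the test_decoding output plugin
SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');

-- ... perform some DML in another session ...

-- consume the decoded changes via plain SQL, visible to pg_regress
SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);

-- or peek without consuming, which is handy when a test needs to
-- inspect the same changes more than once
SELECT * FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL);

-- clean up
SELECT pg_drop_replication_slot('regression_slot');
```

Everything here runs over an ordinary connection, which is exactly what makes it usable from the existing regression test machinery.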

> I worry that won't end up working out very cleanly, and I
> think the prohibition against using this special mode in an
> XID-bearing transaction is merely a small downpayment on future pain
> in this area.

That restriction is in principle only needed when creating the slot, not
when getting changes. The only problem is that some piece of code
doesn't know about it.

The reasons it exists are twofold: One is that when looking for an
initial snapshot, we wait for concurrent transactions to end; if we
waited for the surrounding transaction itself we'd be in trouble, since
that could never happen. The second is that the code does a
XactLockTableWait() to "visualize" that it's waiting, so isolationtester
knows it should background the command; waiting on yourself is not a
good idea.
But neither is actually needed when not creating the slot, the code just
needs to be told about that.

> That having been said, I can't pretend at this point
> either to understand the genesis of this particular restriction or
> what other problems are likely to crop up in trying to allow this
> mode-switching.  So it's possible that I'm overblowing it, but it's
> makin' me nervous.

I am not terribly concerned, but I can understand where you are coming
from. I don't think replication solutions are going to need this, but
it's much handier for testing and the like.

> 2. I think the snapshot-export code is fundamentally misdesigned.  As
> I said before, the idea that we're going to export one single snapshot
> at one particular point in time strikes me as extremely short-sighted.

I don't think so. It's precisely what you need to implement a simple
replication solution. Yes, there are use cases that could benefit from
more possibilities, but that's always the case.

>  For example, consider one-to-many replication where clients may join
> or depart the replication group at any time.  Whenever somebody joins,
> we just want a <snapshot, LSN> pair such that they can apply all
> changes after the LSN except for XIDs that would have been visible to
> the snapshot.

And? They each need to create an individual replication slot, and each
of those will get its own snapshot.
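A sketch of what a joining client would do, assuming the walsender-protocol command and snapshot import mechanics as proposed (the snapshot name below is purely illustrative):

```sql
-- on a replication connection: create the slot; the command returns,
-- among other things, the name of an exported snapshot
CREATE_REPLICATION_SLOT node_b LOGICAL test_decoding;

-- on a regular connection: import that snapshot and copy the initial data
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000001-1';  -- name returned above (illustrative)
COPY my_table TO STDOUT;
COMMIT;

-- back on the replication connection: stream every change the slot
-- produces from its consistent point onwards
START_REPLICATION SLOT node_b LOGICAL 0/0;
```

The point is that the exported snapshot and the slot's start position are guaranteed to line up, so the client never sees a change twice and never misses one.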

> And in fact, we don't even need any special machinery
> for that; the client can just make a connection and *take a snapshot*
> once decoding is initialized enough.

No, they can't, for two reasons: For one, the commit order between
snapshots and WAL isn't necessarily the same. For another, clients would
then need logic to detect whether a transaction's contents have already
been applied or not, and that's nontrivial.

> This code is going to great
> pains to be able to export a snapshot at the precise point when all
> transactions that were running in the first xl_running_xacts record
> seen after the start of decoding have ended, but there's nothing
> magical about that point, except that it's the first point at which a
> freshly-taken snapshot is guaranteed to be good enough to establish an
> initial state for any table in the database.

I still maintain that there's something magic about that moment. It's
when all *future* (from the POV of the snapshot) changes will be
streamed, and all *past* changes are included in the exported snapshot.

> But do you really want to keep that snapshot around long enough to
> copy the entire database?  I bet you don't: if the database is big,
> holding back xmin for long enough to copy the whole thing isn't likely
> to be fun.

Well, that's how pg_dump works, it's not this patch's problem to fix
that.

> You might well want to copy one table at a time, with
> progressively newer snapshots, and apply to each table only those
> transactions that weren't part of the initial snapshot for that table.
>  Many other patterns are possible.  What you've got baked in here
> right now is suitable only for the simplest imaginable case, and yet
> we're paying a substantial price in implementation complexity for it.

Which implementation complexity are you talking about? The relevant code
is maybe 50-60 lines?

> Frankly, this code is *ugly*; the fact that SnapBuildExportSnapshot()
> needs to start a transaction so that it can push out a snapshot.  I
> think that's a pretty awful abuse of the transaction machinery, and
> the whole point of it, AFAICS, is to eliminate flexibility that we'd
> have with simpler approaches.

It's not my idea that snapshot importing requires that restriction. We
could possibly lift it and replace it with another check, but I don't
really see the problem.

> 3. As this feature is proposed, the only plugin we'll ship with 9.4 is
> a test_decoding plugin which, as its own documentation says, "doesn't
> do anything especially useful."  What exactly do we gain by forcing
> users who want to make use of these new capabilities to write C code?

What it gains us is an output plugin in which we can easily demonstrate
features so they can be exercised in the regression tests. Which I find
rather important.
Just like, e.g., the test_shm_mq stuff doesn't do anything really useful
either.
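That's also why the plugin's textual output is deliberately trivial: it can be matched directly in pg_regress expected files. With the default test options the decoded stream looks along these lines (format as in the current patch; details may still change):

```sql
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
-- produces rows along the lines of:
--   BEGIN
--   table public.data: INSERT: id[integer]:1 data[text]:'foo'
--   COMMIT
```

A plugin intended for real consumers couldn't be reshuffled this freely whenever a test needs it.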

> You previously stated that it wasn't possible (or there wasn't time)
> to write something generic, but how hard is it, really?  Sure, people
> who are hard-core should have the option to write C code, and I'm
> happy that they do.  But that shouldn't, IMHO anyway, be a requirement
> to use that feature, and I'm having trouble understanding why we're
> making it one.

I think the community will step up and provide further plugins. In
fact, there's already been a json plugin on the mailing list.

> The test_decoding plugin doesn't seem tremendously
> much simpler than something that someone could actually use, so why
> not make that the goal?

For one, its being a designated toy plugin allows us to easily change it
to showcase/test new features. For another, I still don't think it's
easy to agree on an output format. I think we should include some, once
they have matured, in 9.5.

Thanks,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


