Re: Changeset Extraction Interfaces - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Changeset Extraction Interfaces
Date
Msg-id CA+TgmoYXhW1fhewbeNWixMge+P3A7m3d9YEFLW_WjRH1i+nvmw@mail.gmail.com
In response to Re: Changeset Extraction Interfaces  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Changeset Extraction Interfaces
List pgsql-hackers
On Fri, Dec 13, 2013 at 9:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> If you imagine a scenario where somebody establishes a replication
>> slot and then keeps it forever, not often.  But if you're trying to do
>> something more ad hoc, where replication slots might be used just for
>> short periods of time and then abandoned, I think it could come up
>> pretty frequently.
>
> But can you imagine those users needing an exported snapshot? I can
> think of several short-lived usages, but all of those are unlikely to
> need a consistent view of the overall database. And those are less
> likely to be full blown replication solutions.
> I.e. it's not the DBA making that decision but the developer making the
> decision based on whether he requires the snapshot or not.

Well, it still seems to me that the right way to think about this is
that the change stream begins at a certain point, and then once you
cross a certain threshold (all transactions in progress at that time
have ended) any subsequent snapshot is a possible point from which to
roll forward.  You'll need to avoid applying any transactions that are
already included in the snapshot, but I don't really think that's
any great matter.  You're focusing on the first point at which the
consistent snapshot can be taken, and on throwing away any logical
changes that might have been available before that point so that they
don't have to be ignored in the application code, but I think that's
myopic.
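
To make the "avoid applying transactions already included in the snapshot" step concrete, here is a minimal sketch. It assumes a snapshot can be summarized as an xmin, an xmax, and a set of in-progress xids, that the stream carries only committed transactions, and that xid wraparound can be ignored. The names Snapshot, includes, and changes_to_apply are hypothetical and are not part of the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Toy model of a snapshot; all field names are assumptions."""
    xmin: int                             # every xid below this had finished
    xmax: int                             # every xid at or above this had not started
    in_progress: frozenset = frozenset()  # xids still running between xmin and xmax

    def includes(self, xid: int) -> bool:
        """True if a committed xid's effects are already visible in the snapshot."""
        if xid < self.xmin:
            return True
        if xid >= self.xmax:
            return False
        return xid not in self.in_progress


def changes_to_apply(stream, snapshot):
    """Keep only the changes the snapshot does not already contain.

    `stream` is an iterable of (xid, change) pairs for committed
    transactions, in commit order.
    """
    return [(xid, change) for xid, change in stream
            if not snapshot.includes(xid)]


snap = Snapshot(xmin=100, xmax=105, in_progress=frozenset({102}))
stream = [(101, "a"), (103, "d"), (102, "b"), (105, "c")]
result = changes_to_apply(stream, snap)
```

In this sketch, transactions 101 and 103 are skipped because the snapshot already reflects them, while 102 (still running when the snapshot was taken) and 105 (started afterwards) are applied.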

For example, suppose somebody is replicating tables on node A to node
B.  And then they decide to replicate some of the same tables to node
C.  Well, one way to do this is to have node C connect to node A and
acquire its own slot, but that means decoding everything twice.
Alternatively, you could reuse the same change stream, but you'll need
a new snapshot to roll forward from.  That doesn't seem like a problem
unless the API makes it a problem.
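
The second option, decoding once and reusing the stream, could look roughly like the sketch below. It is only an illustration under simplifying assumptions: each subscriber's snapshot is reduced to a plain set of xids it already contains, and the fan_out function and the dictionary layout are hypothetical.

```python
def fan_out(stream, subscribers):
    """Decode once and deliver to each subscriber only what it still needs.

    `stream` yields (xid, table, change) tuples for committed transactions.
    `subscribers` maps a node name to a dict with two assumed keys:
      "tables"              - tables that node replicates
      "already_in_snapshot" - xids whose effects its initial snapshot contains
    """
    delivered = {name: [] for name in subscribers}
    for xid, table, change in stream:
        for name, sub in subscribers.items():
            if table in sub["tables"] and xid not in sub["already_in_snapshot"]:
                delivered[name].append((xid, table, change))
    return delivered


subscribers = {
    # B has followed the stream from the start, so nothing is pre-applied.
    "B": {"tables": {"t1", "t2"}, "already_in_snapshot": set()},
    # C joined later with a fresh snapshot that already reflects xids 10 and 11.
    "C": {"tables": {"t1"}, "already_in_snapshot": {10, 11}},
}
stream = [(10, "t1", "x"), (11, "t2", "y"), (12, "t1", "z")]
delivered = fan_out(stream, subscribers)
```

Node C receives only the change from xid 12, without node A having to decode anything a second time.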

>> Generally, I think you're being too dismissive of the stuff I'm
>> complaining about here.  If we just can't get this, well then I
>> suppose we can't.
>
> I think I am just scared of needing to add more features before getting
> the basics done and in consequence overrunning 9.4...

I am sensitive to that.  On the other hand, this API is going to be a
lot harder to change once it's released, so we really need to avoid
painting ourselves into a corner with v1.  As far as high-level design
concerns go, there are three things that I'm not happy with:

1. Slots.  We know we need physical slots as well as logical slots,
but the patch as currently constituted only offers logical slots.
2. Snapshot Management.  This issue.
3. Incremental Decoding.  So that we can begin applying a really big
transaction speculatively before it's actually committed.

I'm willing to completely punt #3 as far as 9.4 is concerned, because
I see a pretty clear path to fixing that later.  I am not yet
convinced that either of the other two can or should be postponed.

>> Right.  I think your idea is good, but maybe there should also be a
>> version of the function that never confirms receipt even if the
>> transaction commits.  That would be useful for ad-hoc poking at the
>> queue.
>
> Ok, that sounds easy enough, maybe
> pg_decoding_slot_get_[binary_]changes()
> pg_decoding_slot_peek_[binary_]changes()
> ?

s/pg_decoding_slot/pg_logical_stream/?
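
The intended difference between the two variants can be shown with a toy model. This is not the proposed SQL interface: ToySlot and its methods are hypothetical, and the sketch confirms receipt immediately inside get_changes(), whereas the proposal ties confirmation to the calling transaction committing.

```python
class ToySlot:
    """Toy in-memory stand-in for a slot's queue of decoded changes."""

    def __init__(self, changes):
        self._changes = list(changes)
        self._confirmed = 0  # index of the first change not yet confirmed

    def peek_changes(self):
        """Return pending changes without confirming receipt."""
        return self._changes[self._confirmed:]

    def get_changes(self):
        """Return pending changes and confirm receipt, so they are not returned again."""
        pending = self._changes[self._confirmed:]
        self._confirmed = len(self._changes)
        return pending


slot = ToySlot(["INSERT t1", "UPDATE t1"])
first_peek = slot.peek_changes()   # looks at the queue, consumes nothing
second_peek = slot.peek_changes()  # same contents again
consumed = slot.get_changes()      # consumes both changes
after_get = slot.peek_changes()    # nothing left to look at
```

Peeking twice returns the same changes, which is what makes it suitable for ad-hoc poking at the queue; only the get variant moves the slot forward.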

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


