Re: logical changeset generation v6.2 - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: logical changeset generation v6.2
Msg-id: CA+TgmoZ4tQJCwpgx3O=7uuC7KLc+TKfP0GVBHjTPvj_O46WhVA@mail.gmail.com
In response to: Re: logical changeset generation v6.2 (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On Thu, Oct 10, 2013 at 7:11 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi Robert,
>
> On 2013-10-09 14:49:46 -0400, Robert Haas wrote:
>> I spent some time looking at the sample plugin (patch 9/12).  Here are
>> some review comments:
>>
>> - I think that the decoding plugin interface should work more like the
>> foreign data wrapper interface.  Instead of using pg_dlsym to look up
>> fixed names, I think there should be a struct of function pointers
>> that gets filled in and registered somehow.
>
> You mean something like CREATE OUTPUT PLUGIN registering a function with
> an INTERNAL return value returning a filled struct? I thought about
> that, but it seemed more complex. Happy to change it though if it's
> preferred.

I don't see any need for SQL syntax.  I was just thinking that the
_PG_init function could fill in a structure and then call
RegisterLogicalReplicationOutputPlugin(&mystruct).
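Sketching that very roughly, with every name below invented purely for
illustration (none of this exists in the posted patches):

    /* Sketch only: struct, callback names, and the Register function are
     * hypothetical, just to show the shape of the interface I mean. */
    typedef struct OutputPluginCallbacks
    {
        void    (*startup_cb) (void *ctx);
        void    (*begin_cb) (void *ctx, ReorderBufferTXN *txn);
        void    (*change_cb) (void *ctx, ReorderBufferTXN *txn,
                              Relation relation, ReorderBufferChange *change);
        void    (*commit_cb) (void *ctx, ReorderBufferTXN *txn);
        void    (*shutdown_cb) (void *ctx);
    } OutputPluginCallbacks;

    void
    _PG_init(void)
    {
        static OutputPluginCallbacks my_callbacks = {
            .startup_cb  = my_startup,
            .begin_cb    = my_begin,
            .change_cb   = my_change,
            .commit_cb   = my_commit,
            .shutdown_cb = my_shutdown
        };

        RegisterLogicalReplicationOutputPlugin(&my_callbacks);
    }

That keeps the symbol-lookup machinery out of the plugin, much as
RegisterBackgroundWorker() already works today.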

>> - Still wondering how we'll use this from a bgworker.
>
> Simplified code to consume data:

Cool.  As long as that use case is supported, I'm happy; I just want
to make sure we're not presuming that there must be an external
client.

>> - The output format doesn't look very machine-parseable.   I really
>> think we ought to provide something that is.  Maybe a CSV-like format,
>> or maybe something else, but I don't see why someone who wants to do
>> change logging should be forced to write and install C code.  If
>> something like Bucardo can run on an unmodified system and extract
>> change-sets this way without needing a .so file, that's going to be a
>> huge win for usability.
>
> We can change the current format but I really see little to no chance of
> agreeing on a replication format that's serviceable to several solutions
> short term. Once we've gained some experience - maybe even this cycle -
> that might be different.

I don't see why you're so pessimistic about that.  I know you haven't
worked it out yet, but what makes this harder than sitting down and
designing something?
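Even something as dumb as one CSV line per change would be a big step for
people who just want to consume changes without writing C.  Purely for
illustration, not a concrete proposal (the column set and escaping would
obviously need real design work):

    xid,action,schema,table,old_key,new_tuple
    700,BEGIN,,,,
    700,INSERT,public,accounts,,"id=1,balance=100.00"
    700,UPDATE,public,accounts,"id=1","id=1,balance=90.00"
    700,DELETE,public,accounts,"id=1",
    700,COMMIT,,,,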

>> More generally on this patch set, if I'm going to be committing any of
>> this, I'd prefer to start with what is currently patches 3 and 4, once
>> we reach agreement on those.
>
> Sounds like a reasonable start.

Perhaps you could reshuffle the order of the series, if it's not too much work.

>> Are we hoping to get any of this committed for this CF?  If so, let's
>> make a plan to get that done; time is short.  If not, let's update the
>> CF app accordingly.
>
> I'd really like to do so. I am travelling atm, but I will be back
> tomorrow evening and will push an updated patch this weekend. The issue
> I know of in the latest patches at
> http://www.postgresql.org/message-id/20131007133232.GA15202@awork2.anarazel.de
> is renaming from http://www.postgresql.org/message-id/20131008194758.GB3718183@alap2.anarazel.de

I'm a bit nervous about the way the combo CID logging works.  I would have
thought that you would emit one record per combo CID, but what you're
apparently doing is emitting one record per heap tuple that uses a
combo CID.  For some reason that feels like an abuse (and maybe kinda
inefficient, too).
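To spell out the distinction I'm drawing, the record layouts below are made
up purely for illustration: logging the mapping itself would mean one small
record per combo CID, whereas the per-tuple approach repeats the same
cmin/cmax information for every tuple that carries it.

    /* (a) one record per combo CID: just the backend-local mapping
     * (hypothetical layout) */
    typedef struct xl_combocid_map
    {
        TransactionId   top_xid;
        CommandId       combocid;
        CommandId       cmin;
        CommandId       cmax;
    } xl_combocid_map;

    /* (b) one record per heap tuple that uses a combo CID: the same
     * cmin/cmax pair gets written once per affected tuple
     * (hypothetical layout) */
    typedef struct xl_tuple_cid
    {
        TransactionId   top_xid;
        RelFileNode     node;
        ItemPointerData tid;
        CommandId       cmin;
        CommandId       cmax;
        CommandId       combocid;
    } xl_tuple_cid;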

Either way, I also wonder what happens if a (logical?) checkpoint
occurs between the combo CID record and the heap record to which it
refers, or how you prevent that from happening.  What if the combo CID
record is written and the transaction aborts before writing the heap
record (maybe without writing an abort record to WAL)?

What are the performance implications of this additional WAL logging?
What's the worst case?  What's the typical case?  Does it have a
noticeable overhead when wal_level < logical?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


