Re: logical changeset generation v3 - comparison to Postgres-R change set format - Mailing list pgsql-hackers

From Andres Freund
Subject Re: logical changeset generation v3 - comparison to Postgres-R change set format
Msg-id 20121116140532.GA6505@awork2.anarazel.de
In response to Re: logical changeset generation v3 - comparison to Postgres-R change set format  (Markus Wanner <markus@bluegap.ch>)
Responses Re: logical changeset generation v3 - comparison to Postgres-R change set format
List pgsql-hackers
Hi Markus,

On 2012-11-16 14:46:39 +0100, Markus Wanner wrote:
> On 11/15/2012 01:27 AM, Andres Freund wrote:
> > In response to this you will soon find the 14 patches that currently
> > implement $subject.
>
> Congratulations on that piece of work.

Thanks.

> I'd like to provide a comparison of the proposed change set format to
> the one used in Postgres-R.

Uh, sorry to interrupt you right here, but that's not the "proposed
format" ;) That's just an example output plugin that people wished
for. For the use case we're after, we (as in 2ndQ) also want to use binary
data. It's also rather useful for debugging and such.

I generally agree that the presented format is too verbose for actual
replication, but it seems fine enough for showing off ;)

If you look at Patch 12/14 "Add a simple decoding module in contrib
named 'test_decoding'" you can see that adding a different output format
should be pretty straightforward.

Which output plugin is used is determined by the initial
INIT_LOGICAL_REPLICATION '$plugin'; command in a replication connection.
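To illustrate what "pick the output format by name at connection setup" means, here is a minimal, self-contained sketch. It is not the patch's plugin API: OutputPlugin, format_verbose, format_compact and lookup_plugin are hypothetical names, and a real plugin receives decoded tuples rather than an action string and an int key.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback table; the real patch defines its own interface. */
typedef struct OutputPlugin
{
    const char *name;
    /* Formats one row change into buf; returns snprintf's result. */
    int (*format_change)(char *buf, size_t len,
                         const char *action, const char *table, int key);
} OutputPlugin;

/* Hypothetical verbose, human-readable format: handy for debugging. */
static int
format_verbose(char *buf, size_t len,
               const char *action, const char *table, int key)
{
    return snprintf(buf, len, "table %s: %s: id:%d", table, action, key);
}

/* Hypothetical compact format: first letter of the action, table, key. */
static int
format_compact(char *buf, size_t len,
               const char *action, const char *table, int key)
{
    assert(action[0] != '\0');
    return snprintf(buf, len, "%c|%s|%d", action[0], table, key);
}

static const OutputPlugin plugins[] = {
    {"verbose", format_verbose},
    {"compact", format_compact},
};

/* Resolve the plugin named when the replication connection is set up
 * (e.g. INIT_LOGICAL_REPLICATION 'compact'); NULL if unknown. */
static const OutputPlugin *
lookup_plugin(const char *name)
{
    for (size_t i = 0; i < sizeof(plugins) / sizeof(plugins[0]); i++)
        if (strcmp(plugins[i].name, name) == 0)
            return &plugins[i];
    return NULL;
}
```

With this shape, adding another format means adding one function and one table entry; the decoding side that produces the changes stays untouched.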

> To finish off this comparison, let's take a look at how and where the
> change sets are generated: in Postgres-R the change set stream is
> constructed directly from the heap modification routines, i.e. in
> heapam.c's heap_{insert,update,delete}() methods, whereas the patches
> proposed here parse the WAL to reconstruct the modifications and add the
> required meta information.
>
> To me, going via the WAL first sounded like a step that unnecessarily
> complicates matters. I recently talked to Andres and brought that up.
> Here's my current view of things:
>
> The Postgres-R approach is independent of WAL and its format, whereas
> the approach proposed here clearly is not. Either way, there is a
> certain overhead - however minimal it is - which the former adds to the
> transaction processing itself, while the latter postpones it to a
> separate XLogReader process. If there's any noticeable difference, it
> might reduce latency in case of asynchronous replication, but can only
> increase latency in the synchronous case. As far as I understood Andres,
> it was easier to collect the additional meta data from within the
> separate process.

There is also the point that if you do the processing inside heap_* you
need to make sure the replication-targeted data is safely received &
fsynced away. In "our" case that's not necessary, as WAL already provides
crash safety, so should the replication connection break you can simply
start from the location last confirmed as being safely sent.

As we want to provide asynchronous replication, that's a rather major
point.
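A toy model of the restart argument, as a minimal sketch: the "WAL" is an in-memory array assumed to be durable already, so the sender keeps no separately fsynced copy and, after a broken connection, simply resumes after the last confirmed position. WalRecord, Receiver and stream_from_confirmed are hypothetical names, and acknowledging each record immediately is a simplification; real confirmations can lag behind what has been sent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for WAL: append-only records that are already durable. */
typedef struct WalRecord
{
    uint64_t lsn;    /* position of the record in the log */
    int      change; /* payload; an int keeps the sketch small */
} WalRecord;

typedef struct Receiver
{
    uint64_t confirmed_lsn; /* last position reported back as safe */
    int      applied[16];
    size_t   napplied;
} Receiver;

/*
 * Send records with lsn > r->confirmed_lsn, stopping after max_records
 * to simulate the connection breaking. Returns the number sent.
 * Because WAL itself provides crash safety, a reconnect just calls this
 * again and continues after the confirmed position: nothing is lost and
 * nothing already confirmed is resent.
 */
static size_t
stream_from_confirmed(const WalRecord *wal, size_t nwal,
                      Receiver *r, size_t max_records)
{
    size_t sent = 0;

    for (size_t i = 0; i < nwal && sent < max_records; i++)
    {
        if (wal[i].lsn <= r->confirmed_lsn)
            continue; /* already confirmed before the break */
        assert(r->napplied < sizeof(r->applied) / sizeof(r->applied[0]));
        r->applied[r->napplied++] = wal[i].change;
        r->confirmed_lsn = wal[i].lsn; /* simplified: immediate ack */
        sent++;
    }
    return sent;
}
```

For example, streaming four records with a break after two, then reconnecting, delivers the remaining two exactly once.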

> In summary, I'd say that Postgres-R is an approach specifically
> targeting and optimized for multi-master replication between Postgres
> nodes, whereas the proposed patches are kept more general.

One major aim definitely was to optionally be able to replicate into just
about any target system, so yes, I certainly agree.

> I hope you found this to be an insightful and fair comparison.

Yes, input in general and especially from other replication providers is
certainly interesting and important!

Thanks,

Andres

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


