Re: Future In-Core Replication - Mailing list pgsql-hackers

From Christopher Browne
Subject Re: Future In-Core Replication
Date
Msg-id CAFNqd5U-6=i7+p_NZOSiXJu39VsA_3JH3+hRvVoaWhCrjfWXyQ@mail.gmail.com
Whole thread Raw
In response to Re: Future In-Core Replication  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Future In-Core Replication
List pgsql-hackers
On Fri, Apr 27, 2012 at 4:11 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> What I'm hoping to do is to build a basic prototype of logical
> replication using WAL translation, so we can inspect it to see what
> the downsides are. It's an extremely non-trivial problem and so I
> expect there to be mountains to climb. There are other routes to
> logical replication, with messages marshalled in a similar way to
> Slony/Londiste/Bucardo/Mammoth(?). So there are options, with
> measurements to be made and discussions to be had.

I'll note that the latest version of Slony, expected to be 2.2 (which
generally seems to work, but we're stuck at the moment waiting to get
free cycles to QA it) has made a substantial change to its data
representation.

The triggers used to cook data into a sort of "fractional WHERE
clause," transforming an I/U/D into a string that you'd trivially
combine with the string INSERT INTO/UPDATE/DELETE FROM to get the
logical update.  If there was need to do anything fancier, you'd be
left having to have a "fractional SQL parser" to split the data out by
hand.

New in 2.2 is that the log data is split out into an array of text
values which means that if someone wanted to do some transformation,
such as filtering on value, or filtering out columns, they could
modify the application-of-updates code to query for the data that they
want to fiddle with.  No parser needed.

It's doubtless worthwhile to take a peek at that to make sure it
informs your data representation appropriately.  It's important to
have data represented in a fashion that is amenable to manipulation,
and that decidedly wasn't the case pre-2.2.

I wonder if a meaningful transport mechanism might involve combining:
a) A trigger that indicates that some data needs to be captured in a
"logical" form (rather than the presently pretty purely physical form
of WAL)
b) Perhaps a way of capturing logical updates in WAL
c) One of the old ideas that fell through was to try to capture commit
timestamps via triggers.  Doing it directly turned out to be too
controversial to get in.  Perhaps that's something that could be
captured via some process that parses WAL.

Something seems wrong about that in that it mixes together updates of
multiple forms into WAL, physical *and* logical, and perhaps that
implies that there should be an altogether separate "logical updates
log." (LUL?)  That still involves capturing updates in a duplicative
fashion, e.g. - WAL + LUL, which seems somehow wrong.  Or perhaps I'm
tilting at a windmill here.  With Slony/Londiste/Bucardo, we're
capturing "LUL" in some tables, meaning that it gets written both to
the tables' data files as well as WAL.  Adding a binary LUL eliminates
those table files and attendant WAL updates, thus providing some
savings.

[Insert a LULCATS joke here...]

Perhaps I've just had too much coffee...
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


pgsql-hackers by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: xReader, double-effort
Next
From: Tom Lane
Date:
Subject: Re: plpython crash (PG 92)