Re: Future In-Core Replication - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Future In-Core Replication
Date
Msg-id 1335642042.3919.53.camel@hvost
In response to Re: Future In-Core Replication  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Future In-Core Replication  (Hannu Krosing <hannu@2ndQuadrant.com>)
Re: Future In-Core Replication  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Sat, 2012-04-28 at 09:36 +0100, Simon Riggs wrote:
> On Fri, Apr 27, 2012 at 11:50 PM, Christopher Browne <cbbrowne@gmail.com> wrote:
> > On Fri, Apr 27, 2012 at 4:11 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> What I'm hoping to do is to build a basic prototype of logical
> >> replication using WAL translation, so we can inspect it to see what
> >> the downsides are. It's an extremely non-trivial problem and so I
> >> expect there to be mountains to climb. There are other routes to
> >> logical replication, with messages marshalled in a similar way to
> >> Slony/Londiste/Bucardo/Mammoth(?). So there are options, with
> >> measurements to be made and discussions to be had.
> >
> > I'll note that the latest version of Slony ...has made a substantial change to its data
> > representation....
> 
> The basic model I'm working to is that "logical replication" will ship
> Logical Change Records (LCRs) using the same transport mechanism that
> we built for WAL.

One outcome of this LCR approach is probably that you will be shipping
changes as they are made, and on the slave you have to either apply them
in N parallel transactions, committing each transaction when the LCR for
the corresponding transaction says so, or collect the LCRs before
applying, then apply and commit the committed-on-master transactions in
commit order and throw away the aborted ones.

The optimal approach will probably be some combination of these, that is,
collect and apply the short ones, and start replay in a separate
transaction if the commit does not arrive within N ms.
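To make the "collect, then apply in commit order" half of this concrete, here is an illustrative sketch (not PostgreSQL code; the LCR tuple layout and operation names are invented for the example). Change LCRs are buffered per transaction; a commit LCR releases that transaction's buffer for replay, and a rollback LCR discards it:

```python
def apply_in_commit_order(lcr_stream):
    """Buffer change LCRs per transaction; on commit, replay that
    transaction's changes in commit order; on rollback, throw them away."""
    pending = {}   # xid -> list of buffered changes
    applied = []   # what the slave actually replays, in commit order
    for kind, xid, payload in lcr_stream:
        if kind == "change":
            pending.setdefault(xid, []).append(payload)
        elif kind == "commit":
            applied.extend(pending.pop(xid, []))
        elif kind == "rollback":
            pending.pop(xid, None)   # aborted on master: never applied
    return applied

# Two interleaved transactions: xid 1 commits, xid 2 aborts.
stream = [
    ("change", 1, "INSERT a"),
    ("change", 2, "INSERT b"),
    ("change", 1, "UPDATE a"),
    ("rollback", 2, None),
    ("commit", 1, None),
]
print(apply_in_commit_order(stream))   # ['INSERT a', 'UPDATE a']
```

The hybrid strategy described above would keep short transactions in `pending` and promote a long-running one to its own live slave transaction once its commit has not arrived within the timeout.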

As to what the LCRs should contain, they will probably be logical
equivalents of INSERT, UPDATE ... LIMIT 1, DELETE ... LIMIT 1, TRUNCATE
and all DDL.

The DDL could actually stay "raw" (as in LCRs for system tables) on the
generator side, as hopefully the rule that "system tables can't have
triggers" does not apply when generating the LCRs on the WAL path.
If we need to go back to ALTER TABLE ... commands, then this is probably
wisest to leave to the client. The client here could also be some
xReader-like middleman.

I would even go as far as to propose that a variant of DML-WITH-LIMIT-1
be added to PostgreSQL's SQL syntax, so that the LCRs could be converted
to SQL text for some tasks and thus be easy to process using generic
text-based tools.
The DML-WITH-LIMIT-1 is required to do single logical updates on tables
with non-unique rows.
And as with any logical updates, we will have a huge performance problem
when doing UPDATE or DELETE on a large table with no indexes, but
fortunately this problem is on the slave, not the master ;)
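A small sketch of why DML-WITH-LIMIT-1 matters when rendering an LCR as SQL text. Note the `DELETE ... LIMIT 1` syntax shown is the *proposed* extension, not valid PostgreSQL today, and the quoting here is simplified for the example:

```python
def lcr_to_sql(op, table, row):
    """Render one delete LCR as SQL text. On a table with no unique key,
    the WHERE clause may match several identical rows, so LIMIT 1 is
    needed to remove exactly the one logical row the LCR describes."""
    where = " AND ".join(f"{col} = {val!r}" for col, val in row.items())
    if op == "DELETE":
        return f"DELETE FROM {table} WHERE {where} LIMIT 1;"
    raise NotImplementedError(op)

# If table t holds two identical copies of (1, 'x'), a plain DELETE
# would remove both; the LCR only describes one logical row.
print(lcr_to_sql("DELETE", "t", {"a": 1, "b": "x"}))
# DELETE FROM t WHERE a = 1 AND b = 'x' LIMIT 1;
```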

Generating and shipping the LCRs at WAL-generation time, or perhaps even
a bit earlier, will have a huge performance benefit: it avoids the double
writing of captured events on the master which is currently needed for
several reasons, the main one being determining which transactions
commit and in what order. (This can't be solved on the master without a
local event log table, as we don't have commit/rollback triggers.)

If we delegate that part away from the master, then this alone enables us
to be almost as fast as a WAL-based replica in most cases, even when we
have a different logical structure on the slaves.

> How the LCRs are produced and how they are applied is a subject for
> debate and measurement. We're lucky enough to have a variety of
> mechanisms to compare, Slony 1.0/2.0, Slony 2.2/Londiste/Bucardo and
> its worth adding WAL translation there also. My initial thought is
> that WAL translation has many positive aspects to it and we are
> investigating. There are also some variants on those themes, such as
> the one you discussed above.
> 
> You probably won't recognise this as such, but I hope that people
> might see that I'm hoping to build Slony 3.0, Londiste++ etc. At some
> point, we'll all say "thats not Slony", but we'll also say (Josh
> already did) "thats not binary replication". But it will be the
> descendant of all.

If we get efficient and flexible logical change event generation on the
master, then I'm sure the current trigger-based logical replication
providers will switch to it (for full replication) or at least add an
extra LCR source. It may still make sense to leave some flexibility on
the master side, so that some decisions - possibly even complex ones -
could be made when generating the LCRs.

What I would like is to have some of this exposed to userspace via a
function which developers could use to push their own LCRs.
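As a hypothetical sketch of what such a userspace hook might look like: no such API exists, and the function name `emit_lcr` and the record layout here are entirely invented. The point is only that a developer-generated LCR would be shaped like the server-generated ones, so the slave applies both the same way:

```python
def emit_lcr(outgoing, op, table, row, xid):
    """Append a developer-generated LCR to the outgoing stream, shaped
    like the records the server would generate from WAL."""
    outgoing.append({"op": op, "table": table, "row": row, "xid": xid})

stream = []
# e.g. push a synthetic audit record that has no WAL-visible DML behind it
emit_lcr(stream, "INSERT", "audit_log", {"event": "manual entry"}, xid=42)
print(stream[0]["op"], stream[0]["table"])   # INSERT audit_log
```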

As mentioned above, a significant part of this approach can be prototyped
with user-level triggers as soon as we have triggers on commit and
rollback, even though at slightly reduced performance. That is, it will
still have the trigger overhead, but we can omit all the extra writing,
re-reading and event-table management on the master.

Wanting to play with Streaming Logical Replication (as opposed to the
current Chunked Logical Replication) is also one of the reasons I
complained when the "command triggers" patch was kicked out of 9.2.

> Backwards compatibility is not a goal, please note, but only because
> that will complicate matters intensely.

Currently there really is nothing similar enough for this to be backward
compatible with :)

> -- 
>  Simon Riggs                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
> 



