Re: [PATCH 8/8] Introduce wal decoding via catalog timetravel - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [PATCH 8/8] Introduce wal decoding via catalog timetravel
Date
Msg-id 201210110433.34628.andres@2ndquadrant.com
In response to Re: [PATCH 8/8] Introduce wal decoding via catalog timetravel  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thursday, October 11, 2012 03:10:48 AM Robert Haas wrote:
> On Wed, Oct 10, 2012 at 7:02 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> > The purpose of ApplyCache/transaction reassembly is to reassemble
> > interlaced records, and organise them by XID, so that the consumer
> > client code sees only streams (well, lists) of records split by XID.
> 
> I think I've mentioned it before, but in the interest of not being
> seen to critique the bikeshed only after it's been painted: this
> design gives up something very important that exists in our current
> built-in replication solution, namely pipelining.  With streaming
> replication as it exists today, a transaction that modifies a huge
> amount of data (such as a bulk load) can be applied on the standby as
> it happens.  The rows thus inserted will become visible only if and
> when the transaction commits on the master and the commit record is
> replayed on the standby.  This has a number of important advantages,
> perhaps most importantly that the lag between commit and data
> visibility remains short.  With the proposed system, we can't start
> applying the changes until the transaction has committed and the
> commit record has been replayed, so a big transaction is going to have
> a lot of apply latency.
I don't think there is a fundamental problem here, just incremental ones.

The major problems are:
* transactions with DDL & DML currently need to be reassembled; it might be 
possible to resolve this, though I haven't thought about it much
* subtransactions are only assigned to toplevel transactions at commit time
* you need a variable number of backends/parallel transactions open on the 
target system to apply all the transactions concurrently. You can't smash them 
together because one of them might roll back.

All of those seem solvable to me, so I am not too worried about adding a 
streaming mode somewhere down the line. I don't want to focus on it right now, 
though. Ok?
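
To make the reassembly constraints concrete, here is a minimal sketch in Python 
(not the patch's C code). It assumes each change arrives tagged with an XID and 
an LSN, and that the subtransaction list only becomes known at commit time. The 
class and method names (ReassemblyBuffer, add_change, commit, abort) are 
hypothetical and chosen for illustration only.

```python
class ReassemblyBuffer:
    """Toy stand-in for transaction reassembly: groups interleaved changes by XID."""

    def __init__(self):
        # xid -> list of (lsn, change), in arrival order
        self._by_xid = {}

    def add_change(self, xid, lsn, change):
        """Buffer one decoded change under the XID that produced it."""
        self._by_xid.setdefault(xid, []).append((lsn, change))

    def commit(self, xid, subxids=()):
        """Merge the toplevel XID with its subtransactions (only known now,
        at commit time) and return the changes ordered by LSN."""
        merged = []
        for x in (xid, *subxids):
            merged.extend(self._by_xid.pop(x, []))
        merged.sort(key=lambda item: item[0])
        return [change for _, change in merged]

    def abort(self, xid, subxids=()):
        """Discard everything buffered for a rolled-back transaction."""
        for x in (xid, *subxids):
            self._by_xid.pop(x, None)


buf = ReassemblyBuffer()
buf.add_change(100, 1, "INSERT a")
buf.add_change(200, 2, "INSERT b")   # unrelated, interleaved transaction
buf.add_change(101, 3, "UPDATE a")   # subtransaction of 100, not yet known as such
buf.add_change(100, 5, "INSERT c")
buf.abort(200)                       # nothing from 200 is ever handed out
print(buf.commit(100, subxids=[101]))  # ['INSERT a', 'UPDATE a', 'INSERT c']
```

Because nothing is handed to the consumer before commit() is called, a 
rollback costs nothing on the target. A streaming mode would have to give up 
exactly that property, which is where the three problems listed above come from.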

> Here's you making the same point in different words:
> > Applycache is presumably where you're going to want to spill
> > transaction streams to disk, eventually. That seems like a
> > prerequisite to commit.
> 
> Second, crash recovery.  I think whatever we put in place here has to
> be able to survive a crash on any node.  Decoding must be able to
> restart successfully after a system crash, and it has to be able to
> apply exactly the set of transactions that were committed but not
> applied prior to the crash.  Maybe an appropriate mechanism for this
> already exists or has been discussed, but I haven't seen it go by;
> sorry if I have missed the boat.
I have discussed it privately and roughly prototyped it, but not publicly. There 
are two pieces to this:
1) restartable after a crash/disconnection/shutdown
2) pick up exactly where it stopped

Those are somewhat different because 1) is relevant on the source side and can 
be solved there. 2) depends on the target system because it needs to ensure 
that it safely received the changes up to some point.

The idea for 1) is to serialize the applycache whenever we reach a checkpoint 
and have that as a starting point for every confirmed flush location of 2).

Obviously 2) will need cooperation from the receiving side.
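
One possible reading of how the two pieces fit together, as a rough Python 
sketch rather than the prototype itself: the source keeps a serialized 
applycache per checkpoint, the target reports the LSN up to which it has safely 
received changes, and on restart decoding resumes from the newest serialized 
state at or before that LSN, sending only transactions that committed after it. 
The function names and the use of plain integers for LSNs are assumptions made 
for illustration.

```python
import bisect


def choose_restart_point(snapshot_lsns, confirmed_flush_lsn):
    """Return the newest serialized-applycache LSN that is <= the confirmed
    flush LSN reported by the target, or None if no such snapshot exists."""
    lsns = sorted(snapshot_lsns)
    idx = bisect.bisect_right(lsns, confirmed_flush_lsn)
    return lsns[idx - 1] if idx else None


def commits_to_resend(commits, confirmed_flush_lsn):
    """Given (commit_lsn, xid) pairs seen while re-decoding, keep only the
    transactions the target has not confirmed yet, in commit order."""
    return [xid for commit_lsn, xid in sorted(commits)
            if commit_lsn > confirmed_flush_lsn]


# Snapshots were serialized at checkpoints 100, 200 and 300; the target has
# confirmed everything up to LSN 250.
print(choose_restart_point([100, 300, 200], 250))                   # 200
print(commits_to_resend([(260, 7), (240, 5), (250, 6)], 250))       # [7]
```

Transactions that committed between the chosen snapshot and the confirmed 
flush location are reassembled again during the restart but skipped, so the 
target sees each committed transaction exactly once.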

> > You consider this to be a throw-away function that won't ever be
> > committed. However, I strongly feel that you should move it into
> > /contrib, so that it can serve as a sort of reference implementation
> > for authors of decoder client code, in the same spirit as numerous
> > existing contrib modules (think contrib/spi).
> 
> Without prejudice to the rest of this review which looks quite
> well-considered, I'd like to add a particular +1 to this point.
So we're in violent agreement here ;)

Andres
-- 
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


