Re: Catalog/Metadata consistency during changeset extraction from wal - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Catalog/Metadata consistency during changeset extraction from wal
Date
Msg-id 201206222130.59842.andres@2ndquadrant.com
In response to Re: Catalog/Metadata consistency during changeset extraction from wal  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Catalog/Metadata consistency during changeset extraction from wal  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Friday, June 22, 2012 03:22:03 PM Andres Freund wrote:
> On Thursday, June 21, 2012 05:40:08 PM Andres Freund wrote:
> > On Thursday, June 21, 2012 03:56:54 PM Florian Pflug wrote:
> > > On Jun21, 2012, at 13:41 , Andres Freund wrote:
> > > > 3b)
> > > > Ensure that enough information in the catalog remains by fudging the
> > > > xmin horizon. Then reassemble an appropriate snapshot to read the
> > > > catalog as the tuple in question has seen it.
> > > 
> > > The ComboCID machinery makes that quite a bit harder, I fear. If a
> > > tuple is updated multiple times by the same transaction, you cannot
> > > decide whether a tuple was visible in a certain snapshot unless you
> > > have access to the updating backend's ComboCID hash.
> > 
> > That's a very good point. Not sure how I forgot that.
> > 
> > I think it might be possible to reconstruct a sensible ComboCID mapping
> > from the WAL stream. Let me think about it for a while...
> 
> I have a very, very preliminary thing which seems to work somewhat. I just
> log (cmin, cmax) additionally for every modified catalog tuple into the
> wal and so far that seems to be enough.
> Do you happen to have suggestions for other problematic things to look into
> before I put more time into it?
I'm continuing to play around with this. The tricky bit so far is 
subtransaction handling in transactions which modify the catalog (plus, 
possibly, tables which are marked as being required for decoding, i.e. 
something like a pg_enum equivalent).

Would somebody fundamentally object to one of the following things:
1.
replace

#define IsMVCCSnapshot(snapshot)  \
	((snapshot)->satisfies == HeapTupleSatisfiesMVCC)

with something like

#define IsMVCCSnapshot(snapshot)  \
	((snapshot)->satisfies == HeapTupleSatisfiesMVCC || \
	 (snapshot)->satisfies == HeapTupleSatisfiesMVCCDuringDecode)

The macro is only used sparingly and none of the code paths look hot enough 
that this could make a difference.

2.
Set SnapshotNowData.satisfies to HeapTupleSatisfiesNowDuringRecovery while 
reading the catalog for decoding.
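I.e. something like this around the catalog accesses done for decoding (just a
sketch; the helper names are made up and HeapTupleSatisfiesNowDuringRecovery
obviously doesn't exist yet):

#include "postgres.h"
#include "utils/snapshot.h"
#include "utils/tqual.h"

/* sketch only: decode-aware visibility routine still to be written */
extern bool HeapTupleSatisfiesNowDuringRecovery(HeapTupleHeader tuple,
												Snapshot snapshot,
												Buffer buffer);

static SnapshotSatisfiesFunc decode_saved_satisfies;

static void
decode_begin_catalog_access(void)
{
	/* temporarily route SnapshotNow through the decode-aware routine */
	decode_saved_satisfies = SnapshotNowData.satisfies;
	SnapshotNowData.satisfies = HeapTupleSatisfiesNowDuringRecovery;
}

static void
decode_end_catalog_access(void)
{
	/* restore normal SnapshotNow behaviour */
	SnapshotNowData.satisfies = decode_saved_satisfies;
}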

It's possible to get by without both, but faking up the data gets quite a bit 
more complex.

The problem that makes replacing SnapshotNow.satisfies useful is that there is 
no convenient way to represent subtransactions of the current transaction which 
have already committed according to the transaction log but aren't yet visible 
at the current LSN because they only started afterwards. It's relatively easy 
to fake this in an MVCC snapshot, but much harder for SnapshotNow, because 
there you cannot mark transactions as in-progress.
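For the MVCC case the faking really is just stuffing those subxids into 
->subxip so XidInMVCCSnapshot() considers them to be still running, roughly 
like this (illustration only, without any of the required bookkeeping around 
allocating subxip):

#include "postgres.h"
#include "utils/snapshot.h"

/*
 * Make 'snap' treat the given subtransactions as in-progress even though
 * they already committed according to the clog, so that tuples they wrote
 * stay invisible at the current decoding position.  Assumes snap->subxip
 * was allocated big enough.
 */
static void
decode_hide_future_subxacts(Snapshot snap,
							TransactionId *subxids, int nsubxids)
{
	int		i;

	for (i = 0; i < nsubxids; i++)
		snap->subxip[snap->subxcnt++] = subxids[i];

	/* subxip has to be treated as authoritative for this to work */
	snap->suboverflowed = false;
}

SnapshotNow has no equivalent per-snapshot state to put such xids into.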

Thanks,

Andres

-- 
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

