Re: CLUSTER and MVCC - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: CLUSTER and MVCC
Date
Msg-id 45F962F3.2020804@enterprisedb.com
Whole thread Raw
In response to Re: CLUSTER and MVCC  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> I'm thinking of keeping an in-memory mapping of old and new tids of 
>> updated tuples while clustering, instead. That means that cluster 
>> requires a little bit of memory for each RECENTLY_DEAD updated tuple. In 
>> the worst case that means that you run out of memory if there's too many 
>> of those in the table, but I doubt that's going to be a problem in practice.
> 
> That is more or less isomorphic to what VACUUM FULL does.  While people
> have complained about VACUUM FULL's memory usage on occasion, just at
> the moment I feel that the main problem with it is complexity.  If we
> still haven't gotten all the bugs out of VACUUM FULL after more than
> eight years of work on it, what are the odds that we can make CLUSTER
> do it right the first time?

Well, I can't guarantee that there's no bugs.

To copy a chain correctly, we need to correctly detect tuples that have 
a t_ctid pointing to a non-dead tuple (non-dead meaning 
HeapTupleSatisfiesVacuum(tuple) != DEAD), and tuples that are being 
pointed to by a non-dead tuple. If we incorrectly detect that a tuple 
belongs to either of those categories, when in fact it doesn't, we don't 
corrupt anything, but we waste a little bit of memory memorizing the 
tuple unnecessarily.

To detect tuples in the first category, we need to check that xmax of 
the tuple isn't invalid, and t_ctid doesn't point to itself.

To detect tuples in the second category, we need to check that xmin 
isn't invalid, and is greater than OldestXmin.

With both categories correctly identified, it's just a matter of mapping 
old ctids to corresponding tids in the new heap.

Unlike in my first proposal, if something nevertheless goes wrong in 
detecting the chains, we only lose the chaining between the tuples, but 
we don't otherwise lose any data. The latest version of each row is fine 
anyway. I think this approach is pretty robust, and it fails in a good way.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: CLUSTER and MVCC
Next
From: "Joshua D. Drake"
Date:
Subject: Re: [PATCHES] Bitmapscan changes