Re: CLUSTER and MVCC - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: CLUSTER and MVCC
Date
Msg-id 45F18AFC.2010209@enterprisedb.com
In response to Re: CLUSTER and MVCC  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: CLUSTER and MVCC  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-hackers
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Is there a particular reason why CLUSTER isn't MVCC-safe? It seems to me 
>> that it would be trivial to fix, by using SnapshotAny instead of 
>> SnapshotNow, and not overwriting the xmin/xmax with the xid of the 
>> cluster command.
> 
> The reason it's not trivial is that you also have to preserve the t_ctid
> links of update chains.  If you look into VACUUM FULL, a very large part
> of its complexity is that it moves update chains as a unit to make that
> possible.  (BTW, I believe the problem Pavan Deolasee reported yesterday
> is a bug somewhere in there --- it looks to me like sometimes the same
> update chain is getting copied multiple times.)

Ah, that's it. Thanks.

The easiest solution I can think of is to skip newer versions of updated 
rows when scanning the old relation, and to fetch and copy all tuples in 
the update chain to the new relation whenever you encounter the first 
tuple in the chain.

To get a stable view of which tuple is first in a chain, you need to get 
the oldest xmin once at the beginning, and use that throughout the 
operation. Since we take an exclusive lock on the table, no one can 
insert new updated tuples during the operation, and all updaters have 
finished before the lock is granted.

Those tuples wouldn't be in the cluster order, but that's not a 
big deal.
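
To make the scheme above concrete, here is a rough toy model in Python. It is 
only a sketch, not PostgreSQL code: the Version class, the cluster_copy 
function and the integer tids are all made up for illustration, and it 
assumes every xid involved has committed. A version counts as dead when its 
xmax is older than the oldest xmin taken once at the start.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    """Toy stand-in for a heap tuple (hypothetical, not a PostgreSQL struct)."""
    tid: int                        # position in the heap, stands in for a ctid
    key: int                        # clustering key
    xmin: int                       # xid that created this version
    xmax: Optional[int] = None      # xid that updated/deleted it, None if live
    next_tid: Optional[int] = None  # t_ctid-style link to the newer version


def cluster_copy(old_heap: List[Version], oldest_xmin: int) -> List[Version]:
    """Copy old_heap in key order, moving each update chain as a unit.

    oldest_xmin is computed once by the caller and used for every
    visibility decision, so "first tuple in the chain" stays stable.
    Assumes all xids are committed (a simplification for this sketch).
    """
    by_tid = {v.tid: v for v in old_heap}
    # Map each tuple to the older version that links to it, if any.
    pred = {v.next_tid: v for v in old_heap if v.next_tid is not None}

    def is_dead(v: Version) -> bool:
        # Dead to everyone: superseded before the oldest running snapshot.
        return v.xmax is not None and v.xmax < oldest_xmin

    new_heap: List[Version] = []
    for v in sorted(old_heap, key=lambda t: t.key):  # scan in cluster order
        if is_dead(v):
            continue  # nobody can see it, drop it
        older = pred.get(v.tid)
        if older is not None and not is_dead(older):
            continue  # newer version of an updated row: copied with its chain

        # v is the first surviving tuple of its chain: fetch the whole chain.
        chain = []
        cur: Optional[Version] = v
        while cur is not None:
            chain.append(cur)
            cur = by_tid.get(cur.next_tid) if cur.next_tid is not None else None

        # Copy the chain contiguously, keeping xmin/xmax and relinking next_tid.
        base = len(new_heap)
        for i, old in enumerate(chain):
            link = base + i + 1 if i + 1 < len(chain) else None
            new_heap.append(Version(base + i, old.key, old.xmin, old.xmax, link))
    return new_heap
```

In this model, a row whose key was changed by an update ends up right after 
its older version instead of at its own key position, which is the 
out-of-order case mentioned above.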

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

