Home > mailing lists

Re: CLUSTER and MVCC - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: CLUSTER and MVCC
Date	March 12, 2007 19:22:26
Msg-id	45F5D295.8090205@enterprisedb.com Whole thread Raw
In response to	Re: CLUSTER and MVCC (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses	Re: CLUSTER and MVCC Re: CLUSTER and MVCC
List	pgsql-hackers

Tree view

Heikki Linnakangas wrote:
> Tom Lane wrote:
>> The reason it's not trivial is that you also have to preserve the t_ctid
>> links of update chains.  If you look into VACUUM FULL, a very large part
>> of its complexity is that it moves update chains as a unit to make that
>> possible.  (BTW, I believe the problem Pavan Deolasee reported yesterday
>> is a bug somewhere in there --- it looks to me like sometimes the same
>> update chain is getting copied multiple times.)
> 
> Ah, that's it. Thanks.
> 
> The easiest solution I can think of is to skip newer versions of updated 
> rows when scanning the old relation, and to fetch and copy all tuples in 
> the update chain to the new relation whenever you encounter the first 
> tuple in the chain.
> 
> To get a stable view of what's the first tuple in chain, you need to get 
> the oldest xmin once at the beginning, and use that throughout the 
> operation. Since we take an exclusive lock on the table, no-one can 
> insert new updated tuples during the operation, and all updaters are 
> finished before the lock is granted.

I've been thinking about this some more, and I think the above would 
work. The tricky part is to recognize the first tuple in a chain, given 
the recent bug in vacuum full.

In each chain, there must be at least one non-dead tuple with xmin < 
Oldestxmin. Otherwise we're already missing tuples; there would be no 
version visible to a transaction with xid between OldestXmin and the min 
xmin present in the chain. That tuple is the root of the chain.

There might be a dead tuple in the middle of the chain. Dead tuples have 
xmax < OldestXmin, which means there must be another non-dead tuple in 
the chain with xmax >= OldestXmin. For cluster's purposes, that another 
non-dead tuple is also considered as a root. Dead tuples and chains 
ending in a dead tuples don't need to be stored in the new table.

This picture helped me a lot: 
http://community.enterprisedb.com/updatechain.gif

Arrows represent tuples, beginning at xmin and ending at xmax. 
OldestXmin is represented by the vertical bar, everything to the left is 
smaller than OldestXmin and everything to the right is larger than 
OldestXmin. The chain must begin with a tuple with xmin on the left side 
of OldestXmin, and the last tuple in the chain must end on the right side.

Does anyone see a problem with this? If not, I'm going to write a patch.

One potential issue I'm seeing is that if we rely on the unbroken chain 
starting from < OldestXmin, and that tuple isn't there because of a bug, 
for example, the later version of the tuple is skipped and the row is lost.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

pgsql-hackers by date:

From: Alvaro Herrera
Date: 12 March 2007, 17:18:19
Subject: Re: autovacuum next steps, take 3

From: Josh Berkus
Date: 12 March 2007, 19:50:52
Subject: Inconsistent behavior on select * from void_function()?

Re: CLUSTER and MVCC - Mailing list pgsql-hackers

Previous

Next