Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: vacuum, performance, and MVCC
Date
Msg-id Pine.OSF.4.61.0606250954090.124217@kosh.hut.fi
Whole thread Raw
In response to Re: vacuum, performance, and MVCC  (Bruce Momjian <bruce@momjian.us>)
Responses Re: vacuum, performance, and MVCC  (Bruce Momjian <bruce@momjian.us>)
Re: vacuum, performance, and MVCC  ("Jim C. Nasby" <jnasby@pervasive.com>)
List pgsql-hackers
On Sat, 24 Jun 2006, Bruce Momjian wrote:

> Because having them be on the same page is the only way you can update
> the page item pointer so when you recycle the row, you the indexes are
> now pointing to the new version.  Pages look like:
>
>     [marker][item1][item2][item3]...[tuple1][tuple2][tuple3]
>
> and indexes only point to items, not to tuples.  This allows tuples to
> be compacted on the page without affecting the indexes.
>
> If tuple1 is updated to tuple2, once tuple1 is no longer visible to any
> backends, you can modify item1 to point to tuple2, and you can mark the
> space used by tuple1 as reusable:
>
>     [marker][item1(tuple2)][item2][item3]...[free][tuple2][tuple3]

Ok, now I think I get it. So the limitation of old and new tuple being on 
the same page is required to make it possible to remove the old tuple 
without touching the indexes?

If you move the new tuple (logically, by modifying item pointers) on 
vacuum, isn't there a risk that a concurrent seqscan misses it?

> If you can't expire the old row because one of the indexed columns was
> modified, I see no reason to try to reduce the additional index entries.

It won't enable early expiration, but it means less work to do on update. 
If there's a lot of indexes, not having to add so many index tuples can be 
a significant saving.

To summarise, we have two issues related to frequent updates:
1. Index and heap bloat, requiring frequent vacuum.
2. Updates are more expensive than on other DBMSs, because we have to add 
a new index tuple in every index, even if none of the indexed columns are 
modified.

Tom suggested that we just improve vacuum and autovacuum, and someone 
brought up the dead space map idea again. Those are all worthwhile things 
to do and help with vacuuming after deletes as well as updates, but they 
only address issue 1. Mark's suggestion (assuming that it would've 
worked) as well as yours address both, but only for updates.

- Heikki


pgsql-hackers by date:

Previous
From: Jan Wieck
Date:
Subject: Re: vacuum, performance, and MVCC
Next
From: Bruce Momjian
Date:
Subject: Re: vacuum, performance, and MVCC