Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 200606251852.k5PIqih25183@momjian.us
Whole thread Raw
In response to Re: vacuum, performance, and MVCC  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
Heikki Linnakangas wrote:
> On Sat, 24 Jun 2006, Bruce Momjian wrote:
> 
> > Because having them be on the same page is the only way you can update
> > the page item pointer so when you recycle the row, you the indexes are
> > now pointing to the new version.  Pages look like:
> >
> >     [marker][item1][item2][item3]...[tuple1][tuple2][tuple3]
> >
> > and indexes only point to items, not to tuples.  This allows tuples to
> > be compacted on the page without affecting the indexes.
> >
> > If tuple1 is updated to tuple2, once tuple1 is no longer visible to any
> > backends, you can modify item1 to point to tuple2, and you can mark the
> > space used by tuple1 as reusable:
> >
> >     [marker][item1(tuple2)][item2][item3]...[free][tuple2][tuple3]
> 
> Ok, now I think I get it. So the limitation of old and new tuple being on 
> the same page is required to make it possible to remove the old tuple 
> without touching the indexes?

Yes, modifying the heap page item pointer is required for reuse.

> If you move the new tuple (logically, by modifying item pointers) on 
> vacuum, isn't there a risk that a concurrent seqscan misses it?

Well, you lock the page while updating the item pointer.  Because the
versions are on the same page, a single page lock should be fine.

> > If you can't expire the old row because one of the indexed columns was
> > modified, I see no reason to try to reduce the additional index entries.
> 
> It won't enable early expiration, but it means less work to do on update. 
> If there's a lot of indexes, not having to add so many index tuples can be 
> a significant saving.

Already added to TODO.
* Reuse index tuples that point to heap tuples that are not visible to  anyone?

> To summarise, we have two issues related to frequent updates:
> 1. Index and heap bloat, requiring frequent vacuum.
> 2. Updates are more expensive than on other DBMSs, because we have to add 
> a new index tuple in every index, even if none of the indexed columns are 
> modified.
> 
> Tom suggested that we just improve vacuum and autovacuum, and someone 
> brought up the dead space map idea again. Those are all worthwhile things 
> to do and help with vacuuming after deletes as well as updates, but they 
> only address issue 1. Mark's suggestion (assuming that it would've 
> worked) as well as yours address both, but only for updates.

Agreed.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: vacuum, performance, and MVCC
Next
From: Jan Wieck
Date:
Subject: Re: vacuum, performance, and MVCC