Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: vacuum, performance, and MVCC
Msg-id 200606252119.k5PLJYN20108@momjian.us
In response to Re: vacuum, performance, and MVCC  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers
Hannu Krosing wrote:
> On Sun, 2006-06-25 at 14:24, Bruce Momjian wrote:
> > Jan Wieck wrote:
> > > >> Sure, but index reuse seems a lot easier, as there is nothing additional
> > > >> to remember or clean out when doing it.
> > > > 
> > > > Yes, seems so.  TODO added:
> > > > 
> > > >     * Reuse index tuples that point to heap tuples that are not visible to
> > > >       anyone?
> > > > 
> > > >> When reusing a heap tuple you have to clean out all index entries
> > > >> pointing to it.
> > > > 
> > > > Well, not for an UPDATE with no key changes on the same page, if we do that.
> > > > 
> > > 
> > > An update that results in all the same values of every indexed column of 
> > > a known deleted invisible tuple. This reused tuple can by definition not 
> > > be the one currently updated. So unless it is a table without a primary 
> > > key, this assumes that at least 3 versions of the same row exist within 
> > > the same block. How likely is that to happen?
> > 
> > Good question.  You take the current tuple, and make another one on the
> > same page.  Later, an update can reuse the original tuple if it is no
> > longer visible to anyone (by changing the item id), so you only need two
> > tuples, not three.  My hope is that a repeated update would eventually
> > move to a page that has enough free space for two (or more) versions.
> 
> I can confirm that this is exactly what happens when running an
> update-heavy load with frequent vacuums. Eventually most rows get their
> own db pages or share the same page with 2-3 rows. And there will be
> lots of unused (filled up, or cleaned and not yet reused) pages.

Right, that was my guess: heavily updated rows start to move around in
the table, and because UPDATE tries to stay on the same page, once a
row hits a mostly-empty page, it stays there.
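
To make the "two tuples, not three" point concrete, here is a toy model
in Python. It is only a sketch and not PostgreSQL code: the SLOTS limit,
the update() helper, and the rule that the old version becomes dead
(invisible to everyone) immediately are all simplifying assumptions made
for illustration.

```python
# Toy model, not PostgreSQL code: a page holds at most SLOTS tuple versions.
SLOTS = 4  # hypothetical page capacity


def update(pages, page_no, row):
    """Write a new version of `row` and return the page it lands on.

    `pages` is a list of pages; each page is a list of (row, state)
    entries where state is "live" or "dead". "dead" stands for a version
    that is no longer visible to anyone (an assumption: the old version
    is treated as dead as soon as the update finishes).
    """
    page = pages[page_no]
    old = page.index((row, "live"))

    # 1. Reuse a dead slot on the same page if one exists.
    for i, (_, state) in enumerate(page):
        if state == "dead":
            page[i] = (row, "live")
            page[old] = (row, "dead")
            return page_no

    # 2. Otherwise use free space on the same page.
    if len(page) < SLOTS:
        page.append((row, "live"))
        page[old] = (row, "dead")
        return page_no

    # 3. Otherwise move to the emptiest page, adding one if all are full.
    target = min(range(len(pages)), key=lambda n: len(pages[n]))
    if len(pages[target]) >= SLOTS:
        pages.append([])
        target = len(pages) - 1
    pages[target].append((row, "live"))
    page[old] = (row, "dead")
    return target


# One full page; row "a" is updated five times in a row.
pages = [[("a", "live"), ("b", "live"), ("c", "live"), ("d", "live")]]
where = 0
for _ in range(5):
    where = update(pages, where, "a")
print(where, len(pages[where]))  # prints "1 2"
```

In this model the first update pushes the row to a new page, the second
creates a second version there, and every later update just flips
between those two slots, so the row settles on that page.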

> The overall performance could be made a little better by tuning the
> system to not put more than N new rows on the same page at initial
> insert or when the row moves to a new page during update. Currently
> several new rows are initially put on the same page and then move around
> during repeated updates until they slow(ish)ly claim their own page.

We have a fillfactor patch that will be in 8.2.
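
As a rough illustration of how a fill factor caps the "N new rows per
page" mentioned above, here is a back-of-the-envelope sketch in Python.
It assumes the setting is a percentage of the page that initial inserts
may fill, and it ignores page and tuple header overhead; the function
name and the sizes used are hypothetical.

```python
def rows_at_insert(page_bytes, row_bytes, fillfactor):
    """Rows placed on a fresh page before inserts move on to the next one.

    Simplified: the insert budget is fillfactor percent of the page and
    header overhead is ignored (both are assumptions for illustration).
    """
    budget = page_bytes * fillfactor // 100
    return budget // row_bytes


full = rows_at_insert(8192, 200, 100)  # page packed completely
half = rows_at_insert(8192, 200, 50)   # half of the page kept free
print(full, half)  # prints "40 20"
```

With half of the page left free, each of the 20 rows has room for one
more version on the same page before any of them has to move.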

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

