Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 200606241742.k5OHg0b07414@momjian.us
Whole thread Raw
In response to vacuum, performance, and MVCC  ("Mark Woodward" <pgsql@mohawksoft.com>)
Responses Re: vacuum, performance, and MVCC  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
bruce wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > I think at some point we have to admit that _polling_ the tables, which
> > > is what autovacuum does, just isn't going to work well, no matter how
> > > much it is tweeked, and another approach should be considered for
> > > certain workload cases.
> > 
> > Autovacuum polls in its current, first-generation implementation;
> > what I said upthread was it needs to be smarter than that.  I am not
> > sure how you get from that to the conclusion that the very next step
> > is to abandon the vacuuming approach altogether.
> 
> I am not ready to abandon autovacuum, but as I stated later the UPDATE
> with no key change case is common enought that it could be handled
> better without involving autovacuum and its limitations.
> 
> As I remember, most databases have problem with DELETE/INSERT cycles,
> but we seem to be hit by UPDATE performance more than most, and more
> than is wise.

In an attempt to get some concrete on these ideas...  ;-)

I think the major complexity in doing an in-place UPDATE when no key
columns change is allowing rollback on transaction abort (or backend
crash), and properly handling visibility for transactions in progress.

If the old and new rows are on the same heap page (perhaps a necessary
limitation), you can easily update the heap item id to point to the new
heap row.  All indexes will then point to the new row, and sequential
scans will still see both rows (which is good).  That leave the rollback
issue (undoing the item id change), and having index scans for current
backends still see the old row.

OK, I have an idea.  Right now, an UPDATE where the old and new rows are
on the same page have two index entries.  What if we made only one index
entry for both?  We already have UPDATE chaining, where the old row
points to the new one.  If no key columns change, we set a bit in the
heap that the chaining points to the old and new row (both on the same
page), so an index scan uses one index entry to see the old and new row,
and once the old row is no longer visible, the page index id can be
updated to point to the new row and the old row can be marked free and
perhaps used for a future UPDATE.  (UPDATE already tries to do keep
updates on the same heap page.)

FYI, the reason heap cleanup is possible once you go with a single index
entry for old and new rows is because there is no index cleanup (and
single-row index cleanup is very expensive).

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: "Magnus Hagander"
Date:
Subject: Re: Gist does not build with VC++ anymore
Next
From: "Mark Woodward"
Date:
Subject: Re: vacuum, performance, and MVCC