Partial vacuum versus pg_class.reltuples - Mailing list pgsql-hackers

From Tom Lane
Subject Partial vacuum versus pg_class.reltuples
Date
Msg-id 14616.1244317490@sss.pgh.pa.us
Whole thread Raw
Responses Re: Partial vacuum versus pg_class.reltuples  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Partial vacuum versus pg_class.reltuples  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Partial vacuum versus pg_class.reltuples  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
I complained a couple days ago that in HEAD, vacuum is putting
very bogus values into pg_class.reltuples for indexes:
http://archives.postgresql.org/pgsql-bugs/2009-06/msg00037.php

After looking through the code a bit, I've confirmed my prior guess that
this is caused by the partial-vacuum patch.  The number that's being
used is the number of live tuples found on the pages that were actually
scanned.  There's a kluge in there to keep from storing this number as
the relation's own reltuples, but the implications for indexes were not
considered.  The index AMs generally assume that what they're told in
IndexVacuumInfo.num_heap_tuples can be trusted, and that's not true
anymore.

I think probably the only workable solution for 8.4 is to use the prior
value of the relation's reltuples field as num_heap_tuples when we have
not scanned the whole heap.  This will effectively mean that index
reltuples values don't change either in a partial vacuum.

In the longer term, we need to do something else.  As the code now
stands, reltuples is close to being completely untrustworthy: it will
get updated only by VACUUM FULL or anti-wraparound vacuums, neither of
which will happen often in a well-run installation.  So the value will
inevitably diverge from reality, perhaps arbitrarily far from reality.
I wonder whether we shouldn't get rid of it altogether (and relpages
too), and make the planner look to the counts maintained by the stats
collector instead of using reltuples/relpages.  The main objection I can
think of to that is that turning off stats collection will no longer be
a viable option ... but how many people do that anyway?

Another interesting question is why successive vacuums aren't causing
the index reltuples counts to go to zero.  Shouldn't a partial vacuum
result in *all* pages of the relation being marked as not needing to
be examined by the next vacuum?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Grzegorz Jaskiewicz
Date:
Subject: Re: pg_migrator issue with contrib
Next
From: Andrew Dunstan
Date:
Subject: Re: PostgreSQL Developer meeting minutes up