Re: Partial vacuum versus pg_class.reltuples - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Partial vacuum versus pg_class.reltuples
Date
Msg-id 603c8f070906071308x2d36d23fm6b29c8236539bc0a@mail.gmail.com
Whole thread Raw
In response to Re: Partial vacuum versus pg_class.reltuples  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Partial vacuum versus pg_class.reltuples  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Jun 7, 2009 at 3:24 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Am I wrong to be frightened by the implications of updating this value
>> only once in a blue moon?
>
> It's not great, but I think it's probably not catastrophic either.
> Keep in mind that all we need from reltuples is that the ratio
> reltuples/relpages be a reasonable estimate of the density of live
> tuples, because what the planner actually uses is
>        GetRelationNumberOfBlocks() * reltuples / relpages.
> So for example an append-only table isn't a big problem, even if it's
> been quite a while since we updated reltuples and relpages.

My first reaction was to be relieved by this explanation, but on
further thought I'm not sure I was right to be relieved.  The
frequency of anti-wraparound vacuums is so low that it doesn't seem
inconceivable that this ratio could change considerably on a far
shorter time scale.  (For me, at least in some cases, it's less than
the frequency with which I dump+reload for maintenance reasons like OS
upgrade, PG upgrade, HW upgrade...)

> There was some mention of having a partial vacuum extrapolate a value of
> reltuples and update pg_class with that.  I'm afraid that that could be
> a seriously bad idea; because there is no very good reason to suppose
> that the subset of recently-modified pages forms a good sample of the
> whole table as far as live-tuple density goes.

I think you're right about that.

> [ thinks a bit and reads the code some more ... ]  There is a
> considerably safer alternative, which is to let ANALYZE update the
> reltuples estimate based on the pages it sampled; which should be a
> considerably less biased sample than the pages a partial vacuum would
> have looked at.  And we have already got the code doing that, either
> in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned
> out to be partial.

I'm confused with how this squares with the previous discussion - I
thought you observed a case where this wasn't happening.

> So actually I think we are in better shape than I realized, so far as
> the heap reltuples numbers go.  The case that's a bit nasty is where
> we are propagating the heap reltuples number to the index reltuples
> number for a GIN index.  (Remember this only matters for a partial
> index.)  As the code stands today, what we'll be propagating is the
> reltuples estimate from the most recent ANALYZE, not the ANALYZE that
> we might be about to conduct.  This is not great; but considering that
> we are completely ignoring the first-order problem of the partial index
> predicate's selectivity, quibbling about a second-order effect like the
> estimate being out of date is pretty pointless.

Being one ANALYZE behind shouldn't be too bad.

>> Do we have any reasonable manual way of forcing
>> VACUUM to scan the entire heap?
>
> You could use VACUUM FREEZE, for instance.

That'll generate a fair amount of I/O.

...Robert


pgsql-hackers by date:

Previous
From: Stefan Kaltenbrunner
Date:
Subject: Re: pg_migrator issue with contrib
Next
From: Tom Lane
Date:
Subject: Re: Partial vacuum versus pg_class.reltuples