Thread: Partial vacuum versus pg_class.reltuples
I complained a couple days ago that in HEAD, vacuum is putting very bogus values into pg_class.reltuples for indexes:
http://archives.postgresql.org/pgsql-bugs/2009-06/msg00037.php

After looking through the code a bit, I've confirmed my prior guess that this is caused by the partial-vacuum patch. The number that's being used is the number of live tuples found on the pages that were actually scanned. There's a kluge in there to keep from storing this number as the relation's own reltuples, but the implications for indexes were not considered. The index AMs generally assume that what they're told in IndexVacuumInfo.num_heap_tuples can be trusted, and that's not true anymore.

I think probably the only workable solution for 8.4 is to use the prior value of the relation's reltuples field as num_heap_tuples when we have not scanned the whole heap. This will effectively mean that index reltuples values don't change either in a partial vacuum.

In the longer term, we need to do something else. As the code now stands, reltuples is close to being completely untrustworthy: it will get updated only by VACUUM FULL or anti-wraparound vacuums, neither of which will happen often in a well-run installation. So the value will inevitably diverge from reality, perhaps arbitrarily far from reality. I wonder whether we shouldn't get rid of it altogether (and relpages too), and make the planner look to the counts maintained by the stats collector instead of using reltuples/relpages. The main objection I can think of to that is that turning off stats collection will no longer be a viable option ... but how many people do that anyway?

Another interesting question is why successive vacuums aren't causing the index reltuples counts to go to zero. Shouldn't a partial vacuum result in *all* pages of the relation being marked as not needing to be examined by the next vacuum?

            regards, tom lane
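A minimal standalone sketch of the rule Tom proposes above, in C: when the heap was only partially scanned, hand the index AMs the prior pg_class.reltuples value instead of the count taken from the scanned pages. The structs and field names here are illustrative stand-ins, not the real vacuumlazy.c data structures.

#include <stdio.h>
#include <stdbool.h>

/* Toy model of the relevant pieces; these are NOT the real PostgreSQL
 * structs, just enough state to illustrate the proposed rule. */
typedef struct
{
    double reltuples;               /* prior pg_class.reltuples for the heap */
} RelStats;

typedef struct
{
    bool   scanned_all;             /* did this vacuum look at every heap page? */
    double tuples_on_scanned_pages; /* live tuples counted on the scanned pages */
} VacuumResult;

/* Value to hand to the index AMs as IndexVacuumInfo.num_heap_tuples */
static double
num_heap_tuples_for_indexes(const RelStats *rel, const VacuumResult *vac)
{
    if (vac->scanned_all)
        return vac->tuples_on_scanned_pages;  /* full scan: count is trustworthy */
    return rel->reltuples;                    /* partial scan: keep prior estimate */
}

int
main(void)
{
    RelStats     rel     = { .reltuples = 1000000 };
    VacuumResult partial = { .scanned_all = false, .tuples_on_scanned_pages = 1234 };
    VacuumResult full    = { .scanned_all = true,  .tuples_on_scanned_pages = 998765 };

    printf("partial vacuum -> %.0f\n", num_heap_tuples_for_indexes(&rel, &partial));
    printf("full vacuum    -> %.0f\n", num_heap_tuples_for_indexes(&rel, &full));
    return 0;
}

With this rule a partial vacuum simply leaves the index estimates where they were, which is the 8.4 fallback behaviour Tom describes.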
I wrote:
> Another interesting question is why successive vacuums aren't causing
> the index reltuples counts to go to zero. Shouldn't a partial vacuum
> result in *all* pages of the relation being marked as not needing to
> be examined by the next vacuum?

I figured out the reason for that: the first 32 pages of the table are always scanned, even if the whole thing is frozen, because of the SKIP_PAGES_THRESHOLD logic. We could change that behavior by initializing all_visible_streak to SKIP_PAGES_THRESHOLD instead of zero. But if we did so then having even just page zero be skippable would mean that we clear scanned_all and thus fail to update reltuples, which is probably not a good thing.

            regards, tom lane
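A toy model of the skipping behaviour being described, assuming a streak counter and a threshold of 32 as with the real SKIP_PAGES_THRESHOLD. The exact placement of the increment versus the check in vacuumlazy.c may differ, but the effect Tom describes is the same: with the counter starting at zero, the pages at the front of the table can never be skipped, and skipping anything at all clears scanned_all.

#include <stdio.h>
#include <stdbool.h>

#define SKIP_PAGES_THRESHOLD 32   /* same constant name as in vacuumlazy.c */

/*
 * Sketch, not the real code: a page is skipped only once we have already
 * seen a run of at least SKIP_PAGES_THRESHOLD consecutive all-visible
 * pages, so pages at the head of the table always get scanned even when
 * the whole table is all-visible.
 */
int
main(void)
{
    const int nblocks = 100;
    int  all_visible_streak = 0;      /* the initialization Tom suggests changing */
    bool scanned_all = true;
    int  scanned = 0;

    for (int blkno = 0; blkno < nblocks; blkno++)
    {
        bool all_visible = true;      /* pretend the whole table is frozen */

        if (all_visible && all_visible_streak >= SKIP_PAGES_THRESHOLD)
        {
            scanned_all = false;      /* skipping anything clears scanned_all */
            all_visible_streak++;
            continue;
        }
        all_visible_streak = all_visible ? all_visible_streak + 1 : 0;
        scanned++;                    /* "scan" the page */
    }

    printf("scanned %d of %d pages, scanned_all = %s\n",
           scanned, nblocks, scanned_all ? "true" : "false");
    /* prints: scanned 32 of 100 pages, scanned_all = false */
    return 0;
}

Initializing all_visible_streak to SKIP_PAGES_THRESHOLD in this model makes block zero immediately skippable, which is exactly why scanned_all would then be cleared even when only page zero could be skipped.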
On Sat, 2009-06-06 at 15:44 -0400, Tom Lane wrote:
> I think probably the only workable solution for 8.4 is to use the prior
> value of the relation's reltuples field as num_heap_tuples when we have
> not scanned the whole heap. This will effectively mean that index
> reltuples values don't change either in a partial vacuum.

Our analysis in 2006 was that HOT had a very wide use case, while partial vacuum was needed in far fewer cases. ISTM that most vacuums will still be full-table vacuums, so a change is probably unwarranted and could cause a worse regression (somewhere) than the behaviour being fixed.

> In the longer term, we need to do something else. As the code now
> stands, reltuples is close to being completely untrustworthy: it will
> get updated only by VACUUM FULL or anti-wraparound vacuums, neither of
> which will happen often in a well-run installation. So the value will
> inevitably diverge from reality, perhaps arbitrarily far from reality.
> I wonder whether we shouldn't get rid of it altogether (and relpages
> too), and make the planner look to the counts maintained by the stats
> collector instead of using reltuples/relpages. The main objection I can
> think of to that is that turning off stats collection will no longer be
> a viable option ... but how many people do that anyway?

-1 for such radical change at this stage of release.

Perhaps we can recalculate the new reltuples value by pro-rating the amount of change against the previous value? i.e. for a fraction F of the table scanned by partial vacuum:

    new reltuples value = ((1 - F) * old value) + (F * (new value from partial scan))

So when we scan the whole table, the existing behaviour is preserved.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
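A sketch of the pro-rating Simon suggests, with one assumption made explicit: "new value from partial scan" is read here as the tuple density observed on the scanned pages extrapolated to the whole table. The function and parameter names are hypothetical.

#include <stdio.h>

/*
 * Sketch of the pro-rating idea (not committed code): blend the old
 * reltuples with an estimate extrapolated from the pages the partial
 * vacuum actually scanned, weighting by the fraction of the table scanned.
 */
static double
prorate_reltuples(double old_reltuples,
                  double rel_pages,        /* total pages in the table */
                  double scanned_pages,    /* pages this vacuum looked at */
                  double scanned_tuples)   /* live tuples found on them */
{
    if (scanned_pages <= 0 || rel_pages <= 0)
        return old_reltuples;

    double F = scanned_pages / rel_pages;                     /* fraction scanned */
    double extrapolated = (scanned_tuples / scanned_pages) * rel_pages;

    /* F == 1 reduces to the old behaviour: just use the fresh count */
    return (1.0 - F) * old_reltuples + F * extrapolated;
}

int
main(void)
{
    /* 10% of a 10000-page table scanned, with denser-than-average pages */
    printf("%.0f\n", prorate_reltuples(500000, 10000, 1000, 60000));
    return 0;
}

When F is 1 the blend reduces to the freshly counted value, so a full-table vacuum behaves as before; the caveat raised downthread is that the scanned pages may be a biased sample, so the extrapolated term can still mislead.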
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Sat, 2009-06-06 at 15:44 -0400, Tom Lane wrote:
>> In the longer term, we need to do something else.

> -1 for such radical change at this stage of release.

Uh, by "longer term" I meant this is something to think about for 8.5.

            regards, tom lane
On Sat, Jun 6, 2009 at 3:44 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> I complained a couple days ago that in HEAD, vacuum is putting
> very bogus values into pg_class.reltuples for indexes:
> http://archives.postgresql.org/pgsql-bugs/2009-06/msg00037.php
>
> After looking through the code a bit, I've confirmed my prior guess that
> this is caused by the partial-vacuum patch. The number that's being
> used is the number of live tuples found on the pages that were actually
> scanned. There's a kluge in there to keep from storing this number as
> the relation's own reltuples, but the implications for indexes were not
> considered. The index AMs generally assume that what they're told in
> IndexVacuumInfo.num_heap_tuples can be trusted, and that's not true
> anymore.
>
> I think probably the only workable solution for 8.4 is to use the prior
> value of the relation's reltuples field as num_heap_tuples when we have
> not scanned the whole heap. This will effectively mean that index
> reltuples values don't change either in a partial vacuum.

Am I wrong to be frightened by the implications of updating this value only once in a blue moon? Doesn't this have the potential to result in really bad plans? Do we have any reasonable manual way of forcing VACUUM to scan the entire heap?

...Robert
On Sun, Jun 7, 2009 at 7:11 PM, Robert Haas<robertmhaas@gmail.com> wrote:
> Am I wrong to be frightened by the implications of updating this value
> only once in a blue moon? Doesn't this have the potential to result
> in really bad plans? Do we have any reasonable manual way of forcing
> VACUUM to scan the entire heap?

It shouldn't be necessary to scan the entire heap to get a reasonable estimate for the number of tuples. Analyze doesn't have to, for example. Perhaps we should just make autovacuum periodically run analyze even if it has run a vacuum recently -- i.e. not make vacuum count as a recent analyze.

Actually it should be possible to get a more accurate estimate out of ANALYZE than we used to as well. It could very quickly scan the entire FSM and use that and the average tuple size to get a much more accurate estimate for the number of tuples.

For VACUUM ANALYZE we could have it count the actual number of tuples in the vacuumable pages, and separately take a sample of non-vacuumable pages and calculate an estimate based on the FSM and the average tuple size in those non-vacuumable pages, then add those two values together. Since it just looked at every vacuumable page, those FSM values are precisely accurate, and the estimate for average tuple size ought to be pretty reliable.

--
greg
http://mit.edu/~gsstark/resume.pdf
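A rough sketch of the FSM-based estimate Greg is describing. This is an idea from the thread, not existing code; the per-page overhead figure and the function name are placeholders rather than the exact page-header and line-pointer arithmetic.

#include <stdio.h>

#define BLCKSZ 8192   /* PostgreSQL's default block size */

/*
 * Hedged sketch: estimate the live-tuple count from the free space map
 * plus an average tuple width taken from the ANALYZE sample.
 */
static double
estimate_tuples_from_fsm(double rel_pages,
                         double total_free_bytes,   /* summed over the FSM */
                         double avg_tuple_bytes,    /* from the ANALYZE sample */
                         double per_page_overhead)  /* header, item pointers, ... */
{
    double usable = rel_pages * (BLCKSZ - per_page_overhead) - total_free_bytes;

    if (usable < 0 || avg_tuple_bytes <= 0)
        return 0;
    return usable / avg_tuple_bytes;
}

int
main(void)
{
    /* 10000 pages, ~20% free on average, 120-byte tuples */
    printf("%.0f tuples (rough)\n",
           estimate_tuples_from_fsm(10000, 10000 * 0.2 * BLCKSZ, 120, 100));
    return 0;
}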
Robert Haas <robertmhaas@gmail.com> writes:
> Am I wrong to be frightened by the implications of updating this value
> only once in a blue moon?

It's not great, but I think it's probably not catastrophic either. Keep in mind that all we need from reltuples is that the ratio reltuples/relpages be a reasonable estimate of the density of live tuples, because what the planner actually uses is GetRelationNumberOfBlocks() * reltuples / relpages. So for example an append-only table isn't a big problem, even if it's been quite a while since we updated reltuples and relpages.

There was some mention of having a partial vacuum extrapolate a value of reltuples and update pg_class with that. I'm afraid that that could be a seriously bad idea; because there is no very good reason to suppose that the subset of recently-modified pages forms a good sample of the whole table as far as live-tuple density goes.

[ thinks a bit and reads the code some more ... ] There is a considerably safer alternative, which is to let ANALYZE update the reltuples estimate based on the pages it sampled; which should be a considerably less biased sample than the pages a partial vacuum would have looked at. And we have already got the code doing that, either in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned out to be partial.

So actually I think we are in better shape than I realized, so far as the heap reltuples numbers go. The case that's a bit nasty is where we are propagating the heap reltuples number to the index reltuples number for a GIN index. (Remember this only matters for a partial index.) As the code stands today, what we'll be propagating is the reltuples estimate from the most recent ANALYZE, not the ANALYZE that we might be about to conduct. This is not great; but considering that we are completely ignoring the first-order problem of the partial index predicate's selectivity, quibbling about a second-order effect like the estimate being out of date is pretty pointless.

> Do we have any reasonable manual way of forcing
> VACUUM to scan the entire heap?

You could use VACUUM FREEZE, for instance.

            regards, tom lane
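The point that only the ratio matters can be shown in a few lines. This is a simplified stand-in for the planner's relation-size estimation, not the actual function; the name and the fallback branch are illustrative.

#include <stdio.h>

/*
 * Sketch: the stored reltuples/relpages density is rescaled by the table's
 * *current* physical size in blocks to get the planner's row estimate.
 */
static double
planner_row_estimate(double reltuples,       /* pg_class.reltuples */
                     double relpages,        /* pg_class.relpages */
                     double current_blocks)  /* table's current size in blocks */
{
    if (relpages <= 0)
        return reltuples;                    /* fallback; details vary */
    return current_blocks * (reltuples / relpages);
}

int
main(void)
{
    /* stale stats (1000 pages, 100k rows), but the table has grown to 4000 pages:
     * an append-only table still gets a sensible, proportionally scaled estimate */
    printf("%.0f\n", planner_row_estimate(100000, 1000, 4000));
    return 0;
}

Because the estimate scales with the current block count, growth alone doesn't hurt; what hurts is a change in tuple density, for example mass deletions, without the stats catching up.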
On Sun, Jun 7, 2009 at 3:24 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Am I wrong to be frightened by the implications of updating this value
>> only once in a blue moon?
>
> It's not great, but I think it's probably not catastrophic either.
> Keep in mind that all we need from reltuples is that the ratio
> reltuples/relpages be a reasonable estimate of the density of live
> tuples, because what the planner actually uses is
> GetRelationNumberOfBlocks() * reltuples / relpages.
> So for example an append-only table isn't a big problem, even if it's
> been quite a while since we updated reltuples and relpages.

My first reaction was to be relieved by this explanation, but on further thought I'm not sure I was right to be relieved. The frequency of anti-wraparound vacuums is so low that it doesn't seem inconceivable that this ratio could change considerably on a far shorter time scale. (For me, at least in some cases, it's less than the frequency with which I dump+reload for maintenance reasons like OS upgrade, PG upgrade, HW upgrade...)

> There was some mention of having a partial vacuum extrapolate a value of
> reltuples and update pg_class with that. I'm afraid that that could be
> a seriously bad idea; because there is no very good reason to suppose
> that the subset of recently-modified pages forms a good sample of the
> whole table as far as live-tuple density goes.

I think you're right about that.

> [ thinks a bit and reads the code some more ... ] There is a
> considerably safer alternative, which is to let ANALYZE update the
> reltuples estimate based on the pages it sampled; which should be a
> considerably less biased sample than the pages a partial vacuum would
> have looked at. And we have already got the code doing that, either
> in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned
> out to be partial.

I'm confused with how this squares with the previous discussion - I thought you observed a case where this wasn't happening.

> So actually I think we are in better shape than I realized, so far as
> the heap reltuples numbers go. The case that's a bit nasty is where
> we are propagating the heap reltuples number to the index reltuples
> number for a GIN index. (Remember this only matters for a partial
> index.) As the code stands today, what we'll be propagating is the
> reltuples estimate from the most recent ANALYZE, not the ANALYZE that
> we might be about to conduct. This is not great; but considering that
> we are completely ignoring the first-order problem of the partial index
> predicate's selectivity, quibbling about a second-order effect like the
> estimate being out of date is pretty pointless.

Being one ANALYZE behind shouldn't be too bad.

>> Do we have any reasonable manual way of forcing
>> VACUUM to scan the entire heap?
>
> You could use VACUUM FREEZE, for instance.

That'll generate a fair amount of I/O.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jun 7, 2009 at 3:24 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> [ thinks a bit and reads the code some more ... ] There is a
>> considerably safer alternative, which is to let ANALYZE update the
>> reltuples estimate based on the pages it sampled; which should be a
>> considerably less biased sample than the pages a partial vacuum would
>> have looked at. And we have already got the code doing that, either
>> in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned
>> out to be partial.

> I'm confused with how this squares with the previous discussion - I
> thought you observed a case where this wasn't happening.

No, the problem was that the wrong things were getting done to reltuples entries for indexes. The heap reltuples values were correct, but there is code that takes the heap value into account when estimating the value for an index, and that was all wet because it was using the number of tuples on the scanned pages rather than any total-table estimate.

>>> Do we have any reasonable manual way of forcing
>>> VACUUM to scan the entire heap?
>>
>> You could use VACUUM FREEZE, for instance.

> That'll generate a fair amount of I/O.

Er, isn't that exactly what you asked for?

            regards, tom lane
On Sun, Jun 7, 2009 at 4:19 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jun 7, 2009 at 3:24 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> [ thinks a bit and reads the code some more ... ] There is a
>>> considerably safer alternative, which is to let ANALYZE update the
>>> reltuples estimate based on the pages it sampled; which should be a
>>> considerably less biased sample than the pages a partial vacuum would
>>> have looked at. And we have already got the code doing that, either
>>> in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned
>>> out to be partial.
>
>> I'm confused with how this squares with the previous discussion - I
>> thought you observed a case where this wasn't happening.
>
> No, the problem was that the wrong things were getting done to reltuples
> entries for indexes. The heap reltuples values were correct, but there
> is code that takes the heap value into account when estimating the value
> for an index, and that was all wet because it was using the number of
> tuples on the scanned pages rather than any total-table estimate.
>
>>>> Do we have any reasonable manual way of forcing
>>>> VACUUM to scan the entire heap?
>>>
>>> You could use VACUUM FREEZE, for instance.
>
>> That'll generate a fair amount of I/O.
>
> Er, isn't that exactly what you asked for?

Basically, I'm trying to figure out what we're going to recommend to someone who gets bitten by whatever remaining corner case still exists after your recent patch, and I admit I'm not real clear on what that is. VACUUM FULL doesn't seem like a good solution because it's more than just "vacuum but don't skip any pages even if the visibility map says you can". Maybe we should just have a GUC to enable/disable partial vacuums.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> Basically, I'm trying to figure out what we're going to recommend to
> someone who gets bitten by whatever remaining corner case still exists
> after your recent patch, and I admit I'm not real clear on what that
> is.

If anyone actually shows up with a clear problem, we can recommend VACUUM FREEZE. I'm not going to panic about it right now.

            regards, tom lane
Robert Haas wrote:
> Basically, I'm trying to figure out what we're going to recommend to
> someone who gets bitten by whatever remaining corner case still exists
> after your recent patch, and I admit I'm not real clear on what that
> is. VACUUM FULL doesn't seem like a good solution because it's more
> than just "vacuum but don't skip any pages even if the visibility map
> says you can". Maybe we should just have a GUC to enable/disable
> partial vacuums.

IIRC you can set vacuum_freeze_table_age to 0.

--
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Robert Haas wrote:
>> Maybe we should just have a GUC to enable/disable
>> partial vacuums.

> IIRC you can set vacuum_freeze_table_age to 0.

That has the same effects as issuing VACUUM FREEZE, no?

            regards, tom lane
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Robert Haas wrote:
> >> Maybe we should just have a GUC to enable/disable
> >> partial vacuums.
>
> > IIRC you can set vacuum_freeze_table_age to 0.
>
> That has the same effects as issuing VACUUM FREEZE, no?

As far as I can make out from the docs, I think it only forces a full table scan, but the freeze age remains the same.

--
Alvaro Herrera                       http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Jun 8, 2009 at 10:40 AM, Alvaro Herrera<alvherre@commandprompt.com> wrote:
> Tom Lane wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> > Robert Haas wrote:
>> >> Maybe we should just have a GUC to enable/disable
>> >> partial vacuums.
>>
>> > IIRC you can set vacuum_freeze_table_age to 0.
>>
>> That has the same effects as issuing VACUUM FREEZE, no?
>
> As far as I can make out from the docs, I think it only forces a full table
> scan, but the freeze age remains the same.

Yeah, that looks like what I was looking for, thanks. I looked for it in the docs under the vacuum-related sections but couldn't find it, and since I didn't know its name, I couldn't find it that way either. I wonder if we should consider moving all of the vacuum and autovacuum parameters into one section.

http://developer.postgresql.org/pgdocs/postgres/runtime-config-client.html#GUC-VACUUM-FREEZE-TABLE-AGE

In the worst-case scenario where the new partial-table vacuums are causing headaches for someone, they should hopefully be able to use this parameter to basically turn them off without too many nasty side effects.

(Another nice thing about this parameter is that if you have a really big table that is append-mostly, you can potentially tune it downward to spread out the freezing activity. With the default settings, you might cruise along merrily until you hit 150M transactions and then generate an I/O storm as you freeze a pretty big chunk of the table all at once. With a lower setting, non-partial vacuums will be more frequent, but each one will generate a smaller amount of write traffic.)

...Robert
(back from vacation)

Tom Lane wrote:
> I wrote:
>> Another interesting question is why successive vacuums aren't causing
>> the index reltuples counts to go to zero. Shouldn't a partial vacuum
>> result in *all* pages of the relation being marked as not needing to
>> be examined by the next vacuum?
>
> I figured out the reason for that: the first 32 pages of the table are
> always scanned, even if the whole thing is frozen, because of the
> SKIP_PAGES_THRESHOLD logic. We could change that behavior by
> initializing all_visible_streak to SKIP_PAGES_THRESHOLD instead of zero.
> But if we did so then having even just page zero be skippable would mean
> that we clear scanned_all and thus fail to update reltuples, which is
> probably not a good thing.

Right, that's exactly why I wrote it like that. I also thought about scanning the (beginning of the) visibility map first to see if there are big enough gaps in there to warrant skipping pages, but went with the current approach because it's so much simpler.

--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com