Thread: lazy_update_relstats considered harmful (was Re: [PERFORM] pg_autovacuum not having enough suction ?)

I wrote:
> One thing that is possibly relevant here is that in 8.0 a plain VACUUM
> doesn't set reltuples to the exactly correct number, but to an
> interpolated value that reflects our estimate of the "steady state"
> average between vacuums.  I wonder if that code is wrong, or if it's
> operating as designed but is confusing autovac.

Now that I think it over, I'm thinking that I must have been suffering
severe brain fade the day I wrote lazy_update_relstats() (see
vacuumlazy.c).  The numbers that that routine is averaging are the pre-
and post-vacuum physical tuple counts.  But the difference between them
consists of known-dead tuples, and we shouldn't be factoring dead tuples
into reltuples.  The planner has always considered reltuples to count
only live tuples, and I think this is correct on two grounds:

1. The numbers of tuples estimated to be returned by scans certainly
shouldn't count dead ones.

2. Dead tuples don't have that much influence on scan costs either, at
least not once they are marked as known-dead.  Certainly they shouldn't
be charged at full freight.

It's possible that there'd be some value in adding a column to pg_class
to record dead tuple count, but given what we have now, the calculation
in lazy_update_relstats is totally wrong.
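
To make that concrete, here is a minimal C sketch of the kind of
averaging at issue -- a simplification for illustration, not the actual
vacuumlazy.c code:

    #include <stdio.h>

    /*
     * Hedged sketch of the interpolation described above, not the
     * actual vacuumlazy.c source.  The pre-vacuum count includes the
     * known-dead tuples that VACUUM is about to remove, so averaging
     * it with the post-vacuum count overstates the live-tuple count
     * by half the number of dead tuples removed.
     */
    static double
    interpolated_reltuples(double prevac_tuples, double postvac_tuples)
    {
        return (prevac_tuples + postvac_tuples) / 2.0;
    }

    int
    main(void)
    {
        /* 100,000 live tuples, 20,000 dead at vacuum time */
        printf("%.0f\n", interpolated_reltuples(120000, 100000));
        /* prints 110000, though only 100000 tuples are live */
        return 0;
    }

The proposed fix is simply to record the post-vacuum count, which is
exactly the live-tuple count.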

The idea I was trying to capture is that the tuple density is at a
minimum right after VACUUM, and will increase as free space is filled
in until the next VACUUM, so that recording the exact tuple count
underestimates the number of tuples that will be seen on-the-average.
But I'm not sure that idea really holds water.  The only way that a
table can be at "steady state" over a long period is if the number of
live tuples remains roughly constant (ie, inserts balance deletes).
What actually increases and decreases over a VACUUM cycle is the density
of *dead* tuples ... but per the above arguments this isn't something
we should adjust reltuples for.
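
To put invented numbers on it: in a steady-state table holding roughly
100,000 live tuples, it is only the dead-tuple density that cycles --
near zero just after a VACUUM, climbing back toward, say, 20,000 before
the next one -- while the live count stays near 100,000 throughout.  So
the averaged figure of 110,000 describes the physical tuple population,
not anything the planner should treat as live rows.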

So I'm thinking lazy_update_relstats should be ripped out and we should
go back to recording just the actual stats.

Sound reasonable?  Or was I right the first time and suffering brain
fade today?

            regards, tom lane

Re: lazy_update_relstats considered harmful (was Re: [PERFORM] pg_autovacuum not having enough suction ?)
From: Simon Riggs <simon@2ndquadrant.com>

On Fri, 2005-03-25 at 15:22 -0500, Tom Lane wrote:
> 2. Dead tuples don't have that much influence on scan costs either, at
> least not once they are marked as known-dead.  Certainly they shouldn't
> be charged at full freight.

Yes, minor additional CPU time, but the main issue is when the dead
tuples force additional I/O.

> It's possible that there'd be some value in adding a column to pg_class
> to record dead tuple count, but given what we have now, the calculation
> in lazy_update_relstats is totally wrong.

Yes, that's the way. We can record the (averaged?) dead tuple count, but
also record the actual row count in reltuples.

We definitely need to record both the physical and logical tuple counts,
since each contributes differently to run time.

For comparing a seq scan vs. an index scan, we need to look at the
physical tuple count * avg row size, whereas when we calculate the
number of rows returned we should look at fractions of the logical row
count.
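
As a rough illustration of that split (invented names, not planner
source):

    #include <stdio.h>

    /*
     * Hedged sketch, not planner code.  The physical (live + dead)
     * tuple count governs how much heap a seq scan must read, while
     * the logical (live) count governs how many rows come back.
     */
    #define BLOCK_SIZE 8192.0      /* assume the default 8K pages */

    static double
    est_heap_pages(double physical_tuples, double avg_row_size)
    {
        return physical_tuples * avg_row_size / BLOCK_SIZE;
    }

    static double
    est_rows_returned(double logical_tuples, double selectivity)
    {
        return logical_tuples * selectivity;
    }

    int
    main(void)
    {
        /* 120,000 physical rows of ~100 bytes, 100,000 live */
        printf("pages to scan: %.0f\n", est_heap_pages(120000, 100));
        printf("rows returned: %.0f\n", est_rows_returned(100000, 0.1));
        return 0;
    }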

> The idea I was trying to capture is that the tuple density is at a
> minimum right after VACUUM, and will increase as free space is filled
> in until the next VACUUM, so that recording the exact tuple count
> underestimates the number of tuples that will be seen on-the-average.
> But I'm not sure that idea really holds water.  The only way that a
> table can be at "steady state" over a long period is if the number of
> live tuples remains roughly constant (ie, inserts balance deletes).
> What actually increases and decreases over a VACUUM cycle is the density
> of *dead* tuples ... but per the above arguments this isn't something
> we should adjust reltuples for.
>
> So I'm thinking lazy_update_relstats should be ripped out and we should
> go back to recording just the actual stats.
>
> Sound reasonable?  Or was I right the first time and suffering brain
> fade today?

Well, I think the original idea had some validity, but clearly
lazy_update_relstats isn't the way to do it even though we thought so at
the time.

Best Regards, Simon Riggs



Re: lazy_update_relstats considered harmful (was Re: [PERFORM] pg_autovacuum not having enough suction ?)
From: Tom Lane

Simon Riggs <simon@2ndquadrant.com> writes:
> On Fri, 2005-03-25 at 15:22 -0500, Tom Lane wrote:
>> 2. Dead tuples don't have that much influence on scan costs either, at
>> least not once they are marked as known-dead.  Certainly they shouldn't
>> be charged at full freight.

> Yes, minor additional CPU time, but the main issue is when the dead
> tuples force additional I/O.

I/O costs are mostly estimated off relpages, though, not reltuples.
The only time you really pay through the nose for a dead tuple is when
an indexscan visits it, but with the known-dead marking we now do in
btree indexes, I'm pretty sure that path is seldom taken.
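
Roughly speaking, seq scan costing behaves like this simplification
(not the actual costsize.c code):

    #include <stdio.h>

    /*
     * Hedged simplification: I/O is charged per page via relpages
     * and CPU per tuple via reltuples, so dead tuples mostly affect
     * the estimate through the page count, not the tuple count.
     */
    static double
    sketch_seqscan_cost(double relpages, double reltuples)
    {
        const double cpu_tuple_cost = 0.01;   /* default setting */
        return relpages                       /* one unit per page */
             + reltuples * cpu_tuple_cost;    /* per-tuple CPU charge */
    }

    int
    main(void)
    {
        /* 1465 pages, 100,000 live tuples => 2465.00 */
        printf("%.2f\n", sketch_seqscan_cost(1465, 100000));
        return 0;
    }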

>> It's possible that there'd be some value in adding a column to pg_class
>> to record dead tuple count, but given what we have now, the calculation
>> in lazy_update_relstats is totally wrong.

> Yes, that's the way. We can record the (averaged?) dead tuple count, but
> also record the actual row count in reltuples.

What I'd be inclined to record is the actual number of dead rows removed
by the most recent VACUUM.  Any math on that is best done in the
planner, since we can change the logic more easily than the database
contents.  It'd probably be reasonable to take half of that number as
the estimate of the average number of dead tuples.
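
In code form, the planner-side math might look like this sketch
("reldeadtuples" stands for a hypothetical pg_class column; no such
column exists today):

    #include <stdio.h>

    /*
     * Sketch only: "reldeadtuples" would record the dead rows
     * removed by the most recent VACUUM.
     */
    static double
    est_avg_dead_tuples(double reldeadtuples)
    {
        /* dead tuples climb from ~0 just after VACUUM back toward
         * the removed count, so half is a fair long-run average */
        return reldeadtuples / 2.0;
    }

    int
    main(void)
    {
        printf("%.0f\n", est_avg_dead_tuples(20000));  /* 10000 */
        return 0;
    }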

But in any case, that's for the future; we can't have it in 8.0.*, and
right at the moment I'm focusing on what to push out for 8.0.2.

> We definitely need to record both the physical and logical tuple counts,
> since each contributes differently to run time.

There isn't any difference, if you are talking about fully dead tuples.
It would be possible for VACUUM to also count the number of
not-committed-but-not-removable tuples (ie, new from still-open
transactions, plus dead-but-still-visible-to-somebody), but I'm not sure
that it would be useful to do so, because that sort of count is hugely
transient.  The stat would be irrelevant moments after it was taken.

            regards, tom lane