Re: [HACKERS] lazy_update_relstats considered harmful (was Re: - Mailing list pgsql-performance

From Matthew T. O'Connor
Subject Re: [HACKERS] lazy_update_relstats considered harmful (was Re:
Date
Msg-id 42447EBB.20905@zeut.net
Whole thread Raw
In response to lazy_update_relstats considered harmful (was Re: pg_autovacuum not having enough suction ?)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom Lane wrote:

>I wrote:
>
>
>>One thing that is possibly relevant here is that in 8.0 a plain VACUUM
>>doesn't set reltuples to the exactly correct number, but to an
>>interpolated value that reflects our estimate of the "steady state"
>>average between vacuums.  I wonder if that code is wrong, or if it's
>>operating as designed but is confusing autovac.
>>
>>
>
>Now that I think it over, I'm thinking that I must have been suffering
>severe brain fade the day I wrote lazy_update_relstats() (see
>vacuumlazy.c).  The numbers that that routine is averaging are the pre-
>and post-vacuum physical tuple counts.  But the difference between them
>consists of known-dead tuples, and we shouldn't be factoring dead tuples
>into reltuples.  The planner has always considered reltuples to count
>only live tuples, and I think this is correct on two grounds:
>
>1. The numbers of tuples estimated to be returned by scans certainly
>shouldn't count dead ones.
>
>2. Dead tuples don't have that much influence on scan costs either, at
>least not once they are marked as known-dead.  Certainly they shouldn't
>be charged at full freight.
>
>It's possible that there'd be some value in adding a column to pg_class
>to record dead tuple count, but given what we have now, the calculation
>in lazy_update_relstats is totally wrong.
>
>The idea I was trying to capture is that the tuple density is at a
>minimum right after VACUUM, and will increase as free space is filled
>in until the next VACUUM, so that recording the exact tuple count
>underestimates the number of tuples that will be seen on-the-average.
>But I'm not sure that idea really holds water.  The only way that a
>table can be at "steady state" over a long period is if the number of
>live tuples remains roughly constant (ie, inserts balance deletes).
>What actually increases and decreases over a VACUUM cycle is the density
>of *dead* tuples ... but per the above arguments this isn't something
>we should adjust reltuples for.
>
>So I'm thinking lazy_update_relstats should be ripped out and we should
>go back to recording just the actual stats.
>
>Sound reasonable?  Or was I right the first time and suffering brain
>fade today?
>


pgsql-performance by date:

Previous
From: Stephan Szabo
Date:
Subject: Re: Delete query takes exorbitant amount of time
Next
From: Tom Lane
Date:
Subject: Re: Delete query takes exorbitant amount of time