Re: lazy_update_relstats considered harmful (was Re: [PERFORM] pg_autovacuum not having enough suction ?) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: lazy_update_relstats considered harmful (was Re: [PERFORM] pg_autovacuum not having enough suction ?)
Date
Msg-id 14073.1111790158@sss.pgh.pa.us
Whole thread Raw
In response to Re: lazy_update_relstats considered harmful (was Re: [PERFORM]  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Simon Riggs <simon@2ndquadrant.com> writes:
> On Fri, 2005-03-25 at 15:22 -0500, Tom Lane wrote:
>> 2. Dead tuples don't have that much influence on scan costs either, at
>> least not once they are marked as known-dead.  Certainly they shouldn't
>> be charged at full freight.

> Yes, minor additional CPU time, but the main issue is when the dead
> tuples force additional I/O.

I/O costs are mostly estimated off relpages, though, not reltuples.
The only time you really pay through the nose for a dead tuple is when
an indexscan visits it, but with the known-dead marking we now do in
btree indexes, I'm pretty sure that path is seldom taken.

>> It's possible that there'd be some value in adding a column to pg_class
>> to record dead tuple count, but given what we have now, the calculation
>> in lazy_update_relstats is totally wrong.

> Yes, thats the way. We can record the (averaged?) dead tuple count, but
> also record the actual row count in reltuples.

What I'd be inclined to record is the actual number of dead rows removed
by the most recent VACUUM.  Any math on that is best done in the
planner, since we can change the logic more easily than the database
contents.  It'd probably be reasonable to take half of that number as
the estimate of the average number of dead tuples.

But in any case, that's for the future; we can't have it in 8.0.*, and
right at the moment I'm focusing on what to push out for 8.0.2.

> We definitely need to record the physical and logical tuple counts,
> since each of them have different contributions to run-times.

There isn't any difference, if you are talking about fully dead tuples.
It would be possible for VACUUM to also count the number of
not-committed-but-not-removable tuples (ie, new from still-open
transactions, plus dead-but-still-visible-to-somebody), but I'm not sure
that it would be useful to do so, because that sort of count is hugely
transient.  The stat would be irrelevant moments after it was taken.

            regards, tom lane

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: lazy_update_relstats considered harmful (was Re: [PERFORM]
Next
From: Alvaro Herrera
Date:
Subject: HeapTupleSatisfiesUpdate missing a bet?