Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum
Date
Msg-id BANLkTinREie2cX2zqZPSEfUsZsdM+GjymQ@mail.gmail.com
Whole thread Raw
In response to Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, May 26, 2011 at 5:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> When we prune or vacuum a page, I don't suppose we have enough
>> information about that page's previous state to calculate a tuple
>> count delta, do we?  That would allow a far more accurate number to
>> be maintained than anything suggested so far, as long as we tweak
>> autovacuum to count inserts toward the need to vacuum.
>
> Well, that was the other direction that was suggested upthread: stop
> relying on reltuples at all, but use the stats collector's counts.
> That might be a good solution in the long run, but there are some
> issues:
>
> 1. It's not clear how using a current count, as opposed to
> time-of-last-vacuum count, would affect the behavior of the autovacuum
> control logic.  At first glance I think it would break it, since the
> basic logic there is "how much of the table changed since it was last
> vacuumed?".  Even if the equations could be modified to still work,
> I remember enough feedback control theory from undergrad EE to think that
> this is something to be seriously scared of tweaking without extensive
> testing.  IMO it is far more risky than what Robert is worried about.

Yeah, I think that would be broken.

> 2. You still have the problem that we're exposing inaccurate (or at
> least less accurate than they could be) counts to the planner and to
> onlooker clients.  We could change the planner to also depend on the
> stats collector instead of reltuples, but at that point you just removed
> the option for people to turn off the stats collector.  The implications
> for plan stability might be unpleasant, too.
>
> So that's not a direction I want to go without a significant amount
> of work and testing.

FWIW, I agree.  Your proposed solution is certainly better than trying
to do this; but it still seems a bit shaky to me.

Still, maybe we don't have a better option.  If it were me, I'd add an
additional safety valve: use your formula if the percentage of the
relation scanned is above some threshold where there's unlikely to be
too much skew.  But if the percentage scanned is too small, then don't
use that formula.  Instead, only update relpages/reltuples if the
relation is now larger; set relpages to the new actual value, and
scale up reltuples proportionately.

However, I just work here.  It's possible that I'm worrying about a
problem that won't materialize in practice.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Cédric Villemain
Date:
Subject: Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD
Next
From: Pavan Deolasee
Date:
Subject: Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD