Re: autovacuum not prioritising for-wraparound tables - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: autovacuum not prioritising for-wraparound tables
Date
Msg-id 20130321221547.GD3685@alvh.no-ip.org
Whole thread Raw
In response to Re: autovacuum not prioritising for-wraparound tables  (Christopher Browne <cbbrowne@gmail.com>)
List pgsql-hackers
Here's another cut at this patch.  This is mainly about the
infrastructure to pass the data around in autovacuum; the proposed
formulas probably need lot of work.

We still have two terms in autovacuum priority, the first one considers
dead tuples and the second one considers wraparound limit.  I have kept
Chris' proposal for the second term, but refined the first one a bit per
Jim Nasby's suggestion of discounting dead space.  So we now have

    return
        (d->p_deadtuples * (d->p_livetuples + d->p_deadtuples) * d->p_relpages) /
        ((d->p_livetuples + 1) * d->p_reltuples * nblocks) +
        exp(d->p_xidage * logf(nblocks) / UINT_MAX);

Here, deadtuples and livetuples come from pgstat data, while relpages
and reltuples come from pg_class.  nblocks, on the other hand, comes
from the actual number of blocks in the table.

I haven't considered the case where pg_class.reltuples = 0 (which
results in division-by-zero), but I think to be really robust here we'd
want to have some code copied from estimate_rel_size; or maybe simply
use some hardcoded magic value.  I lean towards the latter, because I'm
not sure we want to expend a relation open at this point (incurring an
attempt to lock the table, which could be problematic); hence the new
RelFileNodeGetNumberOfBlocks() thingy, which is admittedly pretty ugly,
not to mention untested.

(I have considered livetuples=0 however, hence the +1 there).

I think we now need to have a more focused discussion on useful
formulas to use here.  One thing I noticed that fails in the above
formula is that as nblocks grows, ceteris paribus, the score falls; but
that's wrong, because if you have a table that turns out to have much
larger nblocks because it bloated and pgstat lost the message, we need
to look harder at it.  So somehow we need to consider the tuple density
as given by pg_class.reltuples/pg_class.relpages, and compare with the
one given by pgstat.(live+dead) / nblocks; and raise the score as the
ratio goes down (in normal conditions the ratio should be 1; a bloated
table that pgstat hasn't noticed will have a lower ratio).

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: SIGHUP not received by custom bgworkers if postmaster is notified
Next
From: Michael Paquier
Date:
Subject: Re: Support for REINDEX CONCURRENTLY