Re: autovacuum not prioritising for-wraparound tables - Mailing list pgsql-hackers

From Robert Haas
Subject Re: autovacuum not prioritising for-wraparound tables
Date
Msg-id CA+TgmoZVaFnV83v=AjT1-=TeNQtnPoEWNwM3ydxt4+bts2=x2A@mail.gmail.com
In response to Re: autovacuum not prioritising for-wraparound tables  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: autovacuum not prioritising for-wraparound tables  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Sat, Feb 2, 2013 at 8:41 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> - It's probably important to have a formula where we can be sure that
>> the wrap-around term will eventually dominate the dead-tuple term,
>> with enough time to spare to make sure nothing really bad happens; on
>> the other hand, it's also desirable to avoid the case where a table
>> that has just crossed the threshold for wraparound vacuuming
>> immediately shoots to the top of the list even if it isn't truly
>> urgent.  It's unclear to me just from looking at this formula how well
>> the second term meets those goals.
>
> I just wanted to mention that if everything goes well, we won't *ever*
> get to an anti-wraparound-vacuum. Normally the table should cross the
> vacuum_freeze_table_age barrier earlier and promote a normal vacuum to a
> full-table vacuum which will set relfrozenxid to a new and lower value
> and thus prevent anti-wraparound vacuums from occurring.
> So prioritizing anti-wraparound vacuums immediately and heavily doesn't
> seem to be too bad.

IMHO, this is hopelessly optimistic.  Yes, it's intended to work that
way.  But INSERT-only or INSERT-mostly tables are far from an uncommon
use case; and in fact they're probably the most common cause of pain
in this area.  You insert a gajillion tuples, and vacuum never kicks
off, and then eventually you either update some tuples or hit
autovacuum_freeze_max_age and suddenly, BAM, you get this gigantic
vacuum that rewrites the entire table.  And then you open a support
ticket with your preferred PostgreSQL support provider and say
something like "WTF?".
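
To make the failure mode concrete, here is a minimal sketch of the two
triggers as they behaved at the time of this thread. The function names
are hypothetical; the defaults are the documented values of
autovacuum_vacuum_threshold (50), autovacuum_vacuum_scale_factor (0.2)
and autovacuum_freeze_max_age (200 million). Inserts create no dead
tuples, so only the second trigger can ever fire for an insert-only
table.

```python
# Illustrative sketch only: function names are hypothetical, and the
# defaults mirror the documented autovacuum settings.

def needs_vacuum(dead_tuples, reltuples, base_threshold=50, scale_factor=0.2):
    """Dead-tuple trigger: fires once dead tuples exceed
    base_threshold + scale_factor * reltuples."""
    return dead_tuples > base_threshold + scale_factor * reltuples


def needs_wraparound_vacuum(xid_age, freeze_max_age=200_000_000):
    """Anti-wraparound trigger: fires once the age of relfrozenxid
    exceeds autovacuum_freeze_max_age."""
    return xid_age > freeze_max_age


# An insert-only table with a billion rows never accumulates dead tuples,
# so the dead-tuple trigger stays silent no matter how large it grows.
reltuples = 1_000_000_000
print(needs_vacuum(dead_tuples=0, reltuples=reltuples))

# Nothing happens until the XID age crosses the limit, at which point the
# whole table has to be scanned in one go.
print(needs_wraparound_vacuum(xid_age=199_999_999))
print(needs_wraparound_vacuum(xid_age=200_000_001))
```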

>> - More generally, it seems to me that we ought to be trying to think
>> about the units in which these various quantities are measured.  Each
>> term ought to be unit-less.  So perhaps the first term ought to divide
>> dead tuples by total tuples, which has the nice property that the
>> result is a dimensionless quantity that never exceeds 1.0.  Then the
>> second term can be scaled somehow based on that value.
>
> I think we also need to be careful to not try to get too elaborate on
> this end. Once the general code for prioritization is in, the exact
> prioritization formula can be easily incrementally tweaked. Just about any
> half-way sensible prioritization is better than what we have right now and
> we might discover new effects once we do marginally better.

I agree.  It would be nice to have some way of measuring the positive
or negative impact of what we introduce, too, but I don't have a good
idea what that would be.

> Imo the browne_strength field should be called 'priority' and the
> prioritization calculation formula should be moved into an extra
> function.

Yeah, or maybe vacuum_priority, since that would be easier to grep for.
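
As a rough illustration of the kind of function being discussed, here is
a sketch that is not taken from the patch. It assumes a dimensionless
dead-tuple term (dead tuples over total tuples, so never above 1.0) plus
a wraparound term that is zero until the table crosses
autovacuum_freeze_max_age and then grows linearly. The name
vacuum_priority, the ramp_xids parameter and its value are all
hypothetical choices made for the example.

```python
# Hypothetical sketch, not the patch's formula. ramp_xids is an assumed
# tuning knob: the number of XIDs past freeze_max_age after which the
# wraparound term exceeds 1.0 and therefore outranks any pure
# dead-tuple score.

def vacuum_priority(dead_tuples, live_tuples, xid_age,
                    freeze_max_age=200_000_000, ramp_xids=100_000_000):
    total = dead_tuples + live_tuples
    # Dimensionless fraction in [0, 1]; an empty table scores 0.
    dead_term = dead_tuples / total if total > 0 else 0.0
    # Zero below the threshold, so a table that has only just crossed it
    # does not jump to the top of the list.
    overshoot = max(0, xid_age - freeze_max_age)
    wraparound_term = overshoot / ramp_xids
    return dead_term + wraparound_term


# (dead_tuples, live_tuples, xid_age) for three made-up tables
tables = {
    "bloated":      (900_000, 100_000, 50_000_000),
    "just_crossed": (0, 1_000_000_000, 200_000_001),
    "overdue":      (0, 1_000_000_000, 350_000_000),
}

# Highest priority first.
ranked = sorted(tables, key=lambda name: vacuum_priority(*tables[name]),
                reverse=True)
print(ranked)
```

With these made-up inputs the long-overdue table sorts first, the heavily
bloated table second, and the table that has barely crossed the
threshold last, which matches the two goals stated earlier in the thread.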

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


