Re: autovacuum not prioritising for-wraparound tables - Mailing list pgsql-hackers

From Robert Haas
Subject Re: autovacuum not prioritising for-wraparound tables
Msg-id CA+TgmoaabcQRQmr6XwhYzXsSu1TRh9W6e5h2DM+4593mr1m=fQ@mail.gmail.com
In response to Re: autovacuum not prioritising for-wraparound tables  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Jan 25, 2013 at 1:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jan 25, 2013 at 12:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> I don't think the first part is problematic. Which scenario do you have
>>> in mind where that would really cause adverse behaviour? autovacuum
>>> seldom does full-table vacuums on tables otherwise these days, so
>>> tables get "old" in that sense pretty regularly and mostly uniformly.
>
>> I'm worried about the case of a very, very frequently updated table
>> getting put ahead of a table that needs a wraparound vacuum, but only
>> just.  It doesn't sit well with me to think that the priority of that
>> goes from 0 (we don't even try to update it) to infinity (it goes
>> ahead of all tables needing to be vacuumed for dead tuples) the
>> instant we hit the vacuum_freeze_table_age.
>
> Well, really the answer to that is that we have multiple autovac
> workers, and even if the first one that comes along picks the wraparound
> job, the next one won't.

Sure, but you could easily have 10 or 20 tables cross the
vacuum_freeze_table_age threshold simultaneously - and you'll only be
able to process a few of those at a time, due to
autovacuum_max_workers.  Moreover, even if you don't hit the
autovacuum_max_workers limit (say it's jacked up to 100 or so), you're
still introducing a delay of up to N * autovacuum_naptime, where N is
the number of tables that cross the threshold at the same instant,
before any dead-tuple cleanup vacuums are initiated.  It's not
difficult to imagine that being bad.
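To put a rough number on it (a back-of-the-envelope sketch; the table count is invented and the naptime is the default):

```python
# Hypothetical scenario: 20 tables cross vacuum_freeze_table_age at the
# same instant, with the default autovacuum_naptime of 60 seconds.
n_tables = 20
autovacuum_naptime = 60  # seconds

# If wraparound jobs always sort first, up to N * naptime can elapse
# before any dead-tuple cleanup vacuum is launched.
worst_case_delay = n_tables * autovacuum_naptime
print(worst_case_delay)  # 1200 seconds, i.e. 20 minutes
```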

> Having said that, I agree that it might be better to express the
> sort priority as some sort of continuous function of multiple figures of
> merit, rather than "sort by one then the next".  See Chris Browne's
> mail for another variant.

Ah, so.  I think, though, that my variant is a whole lot simpler and
accomplishes mostly the same purpose.  One difference between my
proposal and the others that have popped up thus far is that I am not
convinced table size matters, or at least not in the way that people
are proposing to make it matter.  The main reason I can see why big
tables matter more than small tables is that a big table takes
*longer* to autovacuum than a small table.  If you are 123,456
transactions from a cluster-wide shutdown, and there is one big table
and one small table that need to be autovacuumed, you had better start
on the big one first - because the next autovacuum worker to come
along will quite possibly be able to finish the small one before
doomsday, but if you don't start the big one now you won't finish in
time.  This remains true even if the small table has a slightly older
relfrozenxid than the large one, but ceases to be true when the
difference is large enough that vacuuming the small one first will
advance datfrozenxid enough to extend the time until a shutdown occurs
by more than the time it takes to vacuum it.
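The decision rule above can be sketched as a toy function (not actual autovacuum code; all names and numbers here are invented for illustration):

```python
def vacuum_small_first(big_secs, small_secs, deadline_extension_secs):
    """Toy decision rule, not PostgreSQL code.

    big_secs / small_secs: estimated time to vacuum each table.
    deadline_extension_secs: how much vacuuming the small table first
    would push back the wraparound shutdown, by advancing datfrozenxid.

    Start with the small table only when the deadline relief it buys
    exceeds the time spent on it; otherwise the big table must go first,
    or it won't finish before doomsday.
    """
    return deadline_extension_secs > small_secs

# Small table only slightly older: the big one should go first.
print(vacuum_small_first(3600, 60, 5))      # False
# Small table much older: freezing it buys more time than it costs.
print(vacuum_small_first(3600, 60, 1800))   # True
```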

For dead-tuple vacuuming, the question of whether the table is large
or small does not seem to me to have a categorical right answer.  You
could argue that it's more important to recover 2GB of space in a 20GB
table than 2MB of space in a 20MB table, because more space is being
wasted.  On the flip side you could argue that a small table becomes
bloated much more easily than a large table, because even a minute of
heavy update activity can turn over the entire table contents, which
is unlikely for a larger table.  I am inclined to think that the
percentage of dead tuples is a more important rubric - if things are
going well, it shouldn't ever be much different from the threshold
that triggers AV in the first place - but if somehow it is much
different (e.g. because the table's been locked for a while, or is
accumulating more bloat than the threshold in a single
autovacuum_naptime), that seems like good justification for doing it
ahead of other things that are less bloated.
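That rubric might be sketched like this (invented names; the 0.2 default mirrors autovacuum_vacuum_scale_factor, but this is not the actual trigger math, which also involves autovacuum_vacuum_threshold):

```python
def bloat_priority(dead_tuples, live_tuples, av_scale_factor=0.2):
    """Toy rubric: rank tables by how far their dead-tuple fraction
    exceeds the fraction that triggers autovacuum in the first place.
    A value near 1.0 means "just crossed the threshold"; much larger
    values suggest the table was locked for a while, or bloats faster
    than one autovacuum_naptime can keep up with."""
    total = dead_tuples + live_tuples
    frac = dead_tuples / total if total else 0.0
    return frac / av_scale_factor

# Small table that turned over half its contents in a minute of updates:
print(bloat_priority(500_000, 500_000))   # 2.5 -> well past the trigger
# Large table just past the trigger point:
print(bloat_priority(2_000_000, 8_000_000))  # 1.0
```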

We do need to make sure that the formula is defined in such a way that
something that is *severely* past vacuum_freeze_table_age always beats
an arbitrarily-bloated table.
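One way to get that property (a sketch with invented names and constants, not a proposal for the actual formula) is to let the wraparound term grow without bound as relfrozenxid age approaches the shutdown horizon, so it eventually dominates any finite bloat score:

```python
def vacuum_priority(xid_age, bloat_fraction,
                    freeze_table_age=150_000_000,
                    shutdown_horizon=2_000_000_000):
    """Sketch only: combine both figures of merit into one continuous
    score.  The wraparound term is 0 below vacuum_freeze_table_age and
    diverges as xid_age nears the wraparound shutdown horizon, so a
    table severely past the freeze age always outranks an arbitrarily
    bloated one."""
    wrap = 0.0
    if xid_age > freeze_table_age:
        wrap = (xid_age - freeze_table_age) / max(shutdown_horizon - xid_age, 1)
    return wrap + bloat_fraction

# A table 123,456 XIDs from shutdown beats one that is 99% dead tuples:
old = vacuum_priority(2_000_000_000 - 123_456, 0.0)
bloated = vacuum_priority(0, 0.99)
print(old > bloated)  # True
```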

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


