Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower. - Mailing list pgsql-bugs

From David Gould
Subject Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
Date 2016-03-15
Msg-id 20160315142816.5dca09dd@engels
In response to Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs
On Tue, 15 Mar 2016 17:40:26 -0300
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> David Gould wrote:
> > On Mon, 29 Feb 2016 18:33:50 -0300
> > Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >
> > > Hi David, did you ever post an updated version of this patch?
> >
> > No. Let me fix that now. I've attached my current revision of the patch
> > based on master. This version is significantly better than the original
> > version and resolves multiple issues:
>
> Thanks for the patch.  I spent some time studying it and testing it.  I
> think there's a minor bug in this latest version: the "continue" when
> the "for" loop exits because of all relids being less than
> highest_oid_claimed should be a "break" instead.  Otherwise the worker
> will uselessly loop over the whole list of OIDs only to find each time
> that it has nothing to do.

Good catch. Thanks!
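
For anyone reading along without the patch open, the shape of the loop in
question is roughly the following. This is a simplified sketch, not the
actual patch hunk; relids[] stands in for the worker's OID-sorted work list
and get_highest_claimed_oid() for the shared-memory lookup, both of which
are assumptions made here for illustration.

    for (int i = 0; i < nrelids; i++)
    {
        Oid     relid = relids[i];
        Oid     highest_claimed = get_highest_claimed_oid(); /* shared memory, assumed helper */

        /*
         * The list is sorted by OID, so once another worker has claimed an
         * OID at or beyond the last entry on our list, everything that
         * remains is already covered and we can stop scanning entirely.
         * Using "continue" here would just spin through the rest of the
         * list finding nothing to do, which is the bug noted above.
         */
        if (highest_claimed >= relids[nrelids - 1])
            break;

        if (relid <= highest_claimed)
            continue;       /* this particular table is already covered */

        /* ... claim relid in shared memory and vacuum it ... */
    }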

> After seeing it work under pgbench inserting and deleting large numbers
> of tuples into a large number of tables, I figured that the algorithm
> you described doesn't actually work in certain corner cases: consider
> what happens if a table with OID INT_MAX is huge and very bloated.  At
> some point one worker is going to be stuck working on it for a long
> time, and from that point until the big vacuum is finished, all workers
> will see that one worker is processing a table with maximal OID and
> decide they can skip all tables in their work lists.
>
> I tested this by creating one table after all the other test tables were
> created (so it'd have a higher OID than the others -- but keep an eye on
> OID wraparound in your test install), setting a nonzero
> vacuum_cost_limit, then filling it with dead tuples in pgbench via
> "begin; insert into deadtups select * from generate_series(1, 100000);
> rollback".  I can see that one autovacuum process is stuck with it for
> over a minute, and while this one runs there is no other autovacuum
> activity in the database.
>
> We need to cater for this scenario, because it is going to bite somebody.
> I think it might work to still use the OID-sorted list of tables to
> process, but instead of pruning based just on the highest OID across all
> workers, we need to consider only the OID ranges that we've seen
> processed while the current worker has been running.  Seems fiddly :-(

This is a good observation. I was focused on the bad behavior with many
small tables, not on one large table at the end of the list. The problem
with all this vacuum code is that everyone who worked on it seemed to
focus on one particular usage scenario and was blind to the others,
apparently including me. However, I think I have a simple fix. I'll think
about it and post it once it has gelled.

> Another thing that makes me uneasy about this approach is that we've had
> complaints that some tables that have just enough dead tuples generated
> between each autovacuum run on a DB are vacuumed by all workers all the
> time, causing vacuum-starvation of other tables that accumulate more dead
> tuples.  It seems to me we should prioritise tables to vacuum depending
> on how much they actually need vacuuming.  If we depend on an algorithm
> that sorts by OID, we will never be able to prioritise anything.
>
>
> One seemingly trivial way to attack this problem is to forget the
> sorting idea and just keep track in shared memory of the last (say) 100
> tables that have been vacuumed by each worker, along with a
> timestamp for each.  Using that, each worker can determine whether each
> table on its own list has been vacuumed after it obtained its list, and
> skip it if so.  Since there's no specific sort order required, we could
> in the future change the ordering in which tables are processed for
> other reasons.

Your discomfort is well founded. The more I learned about autovacuum
scheduling, the less sense it made. Really, there should be some sort of
priority order for vacuuming based on some metric of need, and tables
should be processed in that order. I'm willing to work on that, and on
some other autovacuum issues I found as well. But this patch is intended
to be back-patchable.
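
For concreteness, the shared-memory bookkeeping described above might look
roughly like this. It is only a sketch of the idea, not a proposal for the
actual data structures; the names, the fixed size of 100, and the helper
function are all assumptions made here for illustration (Oid and
TimestampTz are the usual PostgreSQL types from postgres.h and
utils/timestamp.h).

    #define RECENT_VACUUMS 100

    typedef struct RecentVacuum
    {
        Oid         relid;          /* table that was vacuumed */
        TimestampTz finish_time;    /* when that vacuum finished */
    } RecentVacuum;

    /* one ring of recent vacuums per worker, kept in shared memory */
    typedef struct WorkerRecentVacuums
    {
        int          next;                      /* next slot to overwrite */
        RecentVacuum entries[RECENT_VACUUMS];
    } WorkerRecentVacuums;

    /*
     * Before vacuuming a table from its own list, a worker would scan every
     * worker's ring and skip the table if some entry shows it was vacuumed
     * after list_built_time, i.e. after this worker built its work list.
     */
    static bool
    recently_vacuumed(Oid relid, TimestampTz list_built_time,
                      WorkerRecentVacuums *workers, int nworkers)
    {
        for (int w = 0; w < nworkers; w++)
        {
            for (int i = 0; i < RECENT_VACUUMS; i++)
            {
                if (workers[w].entries[i].relid == relid &&
                    workers[w].entries[i].finish_time > list_built_time)
                    return true;
            }
        }
        return false;
    }

Since no particular ordering is required, the recording side would simply
overwrite entries[next] and advance next modulo RECENT_VACUUMS each time a
worker finishes a table.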

More later.

-dg


--
David Gould              510 282 0869         daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
