Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower. - Mailing list pgsql-bugs

From David Gould
Subject Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
Date
Msg-id 20151030231952.70eb5887@engels
In response to Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.  (David Gould <daveg@sonic.net>)
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.  (David Gould <daveg@sonic.net>)
List pgsql-bugs
On Fri, 30 Oct 2015 21:49:00 -0700
Jeff Janes <jeff.janes@gmail.com> wrote:

> On Fri, Oct 30, 2015 at 8:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> >> David Gould wrote:
> >>> Anyway, they are not actually vacuuming. They are waiting on the
> >>> VacuumScheduleLock. And requesting fresh snapshots from the
> >>> stats_collector.
> >
> >> Oh, I see.  Interesting.  Proposals welcome.  I especially dislike the
> >> ("very_expensive") pgstat check.
> >
> > Couldn't we simply move that out of the locked stanza?  That is, if no
> > other worker is working on the table, claim it, and release the lock
> > immediately.  Then do the "very expensive" check.  If that fails, we
> > have to re-take the lock to un-claim the table, but that sounds OK.
>
>
> The attached patch does that.  In a system with 4 CPUs and that had
> 100,000 tables, with a big chunk of them in need of vacuuming, and
> with 30 worker processes, this increased the throughput by a factor of
> 40.  Presumably it will do even better with more CPUs.
>
> It is still horribly inefficient, but 40 times less so.

That is a good result for such a small change.
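
For anyone following along, the claim-then-check idea Tom describes can be
sketched roughly as below. This is Python rather than the backend's C, and
every name in it (schedule_lock, claimed, try_vacuum, needs_vacuum,
do_vacuum) is made up for illustration; it is not taken from Jeff's patch.

```python
import threading

schedule_lock = threading.Lock()  # stands in for the autovacuum schedule lock
claimed = set()                   # table oids currently claimed by some worker


def try_vacuum(oid, needs_vacuum, do_vacuum):
    """Claim oid under the lock, then run the expensive recheck unlocked."""
    with schedule_lock:
        if oid in claimed:
            return False          # another worker already has this table
        claimed.add(oid)          # cheap claim; the lock is released right after
    try:
        if not needs_vacuum(oid):  # the "very expensive" stats recheck
            return False
        do_vacuum(oid)
        return True
    finally:
        with schedule_lock:       # re-take the lock only to un-claim
            claimed.discard(oid)
```

The point is only that the lock is held for two short set operations, so the
expensive recheck in one worker no longer blocks the others.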

The attached patch against REL9_5_STABLE goes a little further. It
claims the table under the lock, but also addresses the problem of all the
workers racing to redo the same table by enforcing an ordering on all the
workers. No worker can claim a table with an oid smaller than the highest
oid claimed by any worker. That is, instead of racing to the same table,
workers leapfrog over each other.
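
As a rough illustration of the leapfrog ordering (a Python sketch, not the
patch itself): assume each worker keeps its own sorted list of candidate
oids and all workers share one high-water mark. The class and method names
here are hypothetical, and the sketch assumes the mark only moves forward
within a single pass over the tables.

```python
import bisect
import threading


class HighWaterMark:
    """Shared record of the highest table oid claimed by any worker."""

    def __init__(self):
        self._lock = threading.Lock()
        self._highest = 0  # 0 used as "nothing claimed yet" (assumption)

    def claim_next(self, my_sorted_oids):
        """Claim the smallest oid in this worker's list above the shared mark.

        Returns the claimed oid, or None if the worker has nothing left
        above the mark.
        """
        with self._lock:
            # Skip every oid at or below what some worker already claimed.
            i = bisect.bisect_right(my_sorted_oids, self._highest)
            if i == len(my_sorted_oids):
                return None
            self._highest = my_sorted_oids[i]
            return self._highest
```

With two workers holding overlapping lists, alternating calls hand out
10, 20, 30, 40 in turn rather than both workers going after 10.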

In theory the recheck of the stats could be eliminated although this patch
does not do that. It does eliminate the special handling of stats snapshots
for autovacuum workers which cuts back on the excess rewriting of the stats
file somewhat.

I'll send numbers shortly, but as I recall it is over 100 times better than
the original.

-dg

--
David Gould              510 282 0869         daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
