Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower. - Mailing list pgsql-bugs

From David Gould
Subject Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
Msg-id 20151106164649.3ed185fa@engels
In response to Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs
On Tue, 3 Nov 2015 19:24:25 -0300
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> David Gould wrote:
>
> > This is for an idle system with 100,000 new small tables to analyze. I ran
> > all the test for an hour or 5000 tables processed. "jj" refers to the patch
> > from Jeff Janes, "dg" refers to the attached patch (same as previous).
> >
> > /autovacuum actions per minute/
> > workers   9.5b1     jj     dg
> > -------   -----   ----  -----
> >    1        183    171    285
> >    4         45    212   1158
> >    8         23    462   1225
>
> Nice numbers.
>
> > Could someone please take a look at the patch and comment? Thanks.
>
> 1. What's with all the FIXMEs?

Those were a trail of breadcrumbs left to explain some of the issues.
I have removed them and revised the comments.

> 2. I think you need more of an explanation of what your patch actually
> does.

I am writing a more complete description of the issues with the current
code and what the patch does to address them.

> 3. Do we want to backpatch?  Changes in behavior aren't acceptable on
> existing branches, because it might destabilize autovacuum behavior
> that's been carefully tuned in existing systems.  So if we want
> something to backpatch, ideally it shouldn't change the ordering in
> which tables are vacuumed, and instead arrive at the same results
> faster.  (I don't care about this restriction myself, but others do and
> strongly so.)

The current order of autovacuum operations is the physical order of the
rows in pg_class plus some jitter depending on which worker is able to grab
a table first. It seems unlikely anything could depend on this
particular order.

The heart of this patch is that it establishes a consistent order for all
the workers to take tables from so that they don't compete for work.
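
As a rough illustration of why a shared order helps, here is a toy model in
Python. It is not the patch and not autovacuum's actual code: the function
names, the lock-step scheduling, and the cost model (one "wasted check" each
time a worker looks at a table that another worker has already handled) are
all assumptions made for the sketch.

```python
def uncoordinated_scan(table_oids, n_workers):
    """Toy model: every worker walks the same list and rechecks each entry.

    Returns the number of wasted checks, i.e. the number of times a worker
    examined a table that another worker had already processed.
    """
    done = set()
    positions = [0] * n_workers  # each worker keeps its own private position
    wasted_checks = 0
    active = True
    while active:
        active = False
        for w in range(n_workers):
            # Skip entries already handled elsewhere; each skip costs a check.
            while (positions[w] < len(table_oids)
                   and table_oids[positions[w]] in done):
                positions[w] += 1
                wasted_checks += 1
            if positions[w] < len(table_oids):
                done.add(table_oids[positions[w]])
                positions[w] += 1
                active = True
    return wasted_checks


def shared_order_claims(table_oids, n_workers):
    """Toy model: workers claim tables from one agreed (sorted) order.

    A single shared cursor stands in for whatever shared state real workers
    would use, so no worker ever examines a table that is already taken.
    """
    ordered = sorted(table_oids)
    claims = {w: [] for w in range(n_workers)}
    cursor = 0
    while cursor < len(ordered):
        for w in range(n_workers):
            if cursor == len(ordered):
                break
            claims[w].append(ordered[cursor])
            cursor += 1
    return claims
```

In this model, 100 tables and 4 workers give 300 wasted checks for the
uncoordinated scan (every worker ends up looking at every table), while the
shared-order version hands each table to exactly one worker.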

The issue is that autovacuum is ineffective and costly with large numbers of
tables. The recent multixact fixes will expose more users to this issue as
they update to 9.3.9/9.4.4 and beyond.

> 4. In the master branch, behavior changes are acceptable, so we can do
> something more invasive.

I will make a version for master.

> 5. Is it easier to use a binary heap rather than the OID list thing you
> have? (see src/include/lib/binaryheap.h)  I don't think so, but it's
> worth asking.  Note that older branches don't have this, so
> backpatchable should not rely on it.

Thanks for the suggestion, but the binary heap requires knowing the size
in advance. The method of using an array and sorting is modeled on:

  src/backend/catalog/pg_inherits.c:find_inheritance_children()

which has a similar need for a sorted list of indefinite size.
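
The general "grow an array, then sort it" pattern can be sketched as below.
This is a Python illustration of the idea only, not a transcription of
find_inheritance_children() or of the patch; the function name, the starting
capacity, and the explicit doubling (which mimics what a C version would do
by reallocating) are assumptions.

```python
def collect_sorted_oids(oid_source, initial_capacity=32):
    """Collect an unknown number of OIDs into a growable array, then sort.

    The capacity is doubled whenever the array fills up, so the total size
    never needs to be known before collection starts.
    """
    capacity = initial_capacity
    oids = [0] * capacity
    count = 0
    for oid in oid_source:
        if count >= capacity:
            # Array is full: double its capacity before storing the next OID.
            capacity *= 2
            oids.extend([0] * (capacity - len(oids)))
        oids[count] = oid
        count += 1
    result = oids[:count]  # drop the unused tail
    result.sort()          # one sort after collection is complete
    return result
```

The sort happens once, after the list is complete, so the only bookkeeping
during collection is the occasional resize.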

Let's move this discussion to hackers to get a bit more coverage, especially
if we are considering a version against master. I will post an expanded
problem description, the revised patch and test results there after the
weekend.

-dg

--
David Gould                                    daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
