On Wed, 7 Mar 2018 21:39:08 -0800
Jeff Janes <jeff.janes@gmail.com> wrote:
> As for preventing it in the first place, based on your description of your
> hardware and operations, I was going to say you need to increase the max
> number of autovac workers, but then I remembered you from "Autovacuum slows
> down with large numbers of tables. More workers makes it slower" (
> https://www.postgresql.org/message-id/20151030133252.3033.4249%40wrigleys.postgresql.org).
> So you are probably still suffering from that? Your patch from then seemed
> to be pretty invasive and so controversial.
We have been building from source using that patch for the worker contention
since then. It's very effective; there is no way we could have continued to
rely on autovacuum without it. It's sort of a nuisance to keep updating it
for each point release that touches autovacuum, but here we are.
The current patch is motivated by the fact that even with effective workers
we still regularly find tables with inflated reltuples. I have some theories
about why, but no real proof. Mainly variants on "all the vacuum workers
were busy making their way through a list of 100,000 tables and did not get
back to the problem table before it became a problem."
I do have a design in mind for a larger, more principled patch that fixes the
same issue and some others too, but given the reaction to the earlier one I
hesitate to spend a lot of time on it. I'd be happy to discuss a way to try
to move forward, though, if anyone is interested.
Your patch helped, but it was mainly targeted at the lock contention part of
the problem.
The other part of the problem is that autovacuum workers force a rewrite
of the stats file every time they try to choose a new table to work on.
With large numbers of tables and many autovacuum workers, this is a
significant extra workload.
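To give a sense of scale, here is a rough back-of-envelope sketch of that
rewrite cost. All numbers are hypothetical illustrations (the per-entry size
and counts are assumptions, not measurements from our system):

```python
# Hypothetical model: each time a worker chooses a table, the whole
# stats file is rewritten, so write volume scales with
# (tables chosen) * (stats file size).

def stats_rewrite_bytes(n_tables, entry_bytes, tables_chosen):
    """Total bytes rewritten: one full-file rewrite per table chosen."""
    file_size = n_tables * entry_bytes   # approximate stats file size
    return tables_chosen * file_size

# e.g. 100,000 tables at ~100 bytes/entry -> ~10 MB stats file;
# choosing 10,000 tables in a cycle rewrites ~100 GB of stats data.
print(stats_rewrite_bytes(100_000, 100, 10_000))
```

The point is just that the rewrite work grows with the product of table
count and tables processed, which is why it shows up at our scale and not
on installations with a few hundred tables.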
-dg
--
David Gould daveg@sonic.net
If simplicity worked, the world would be overrun with insects.