
From: Robert Haas
Subject: Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum
Date:
Msg-id: CA+TgmoYp8OvDG30CvrV9gJEkTfrSZO=fSsjvO5hw--=W_6GvLA@mail.gmail.com
In response to: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses: Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum (Robert Haas <robertmhaas@gmail.com>)
           Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum (Amit Khandekar <amitdkhan.pg@gmail.com>)
List: pgsql-hackers
On Fri, Jan 13, 2017 at 8:45 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Amit Khandekar wrote:
>> In a server where autovacuum is disabled and its databases reach
>> autovacuum_freeze_max_age limit, an autovacuum is forced to prevent
>> xid wraparound issues. At this stage, when the server is loaded with a
>> lot of DML operations, an exceedingly high number of autovacuum
>> workers keep on getting spawned, and these do not do anything, and
>> then quit.
>
> I think this is the same problem as reported in
> https://www.postgresql.org/message-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@mail.gmail.com

If I understand correctly, and it's possible that I don't, the issues
are distinct.  I think that the issue in that thread has to do with
the autovacuum launcher starting workers over and over again in a
tight loop, whereas this issue seems to be about autovacuum workers
restarting the launcher over and over again in a tight loop.  In that
thread, it's the autovacuum launcher that is looping, which can only
happen when autovacuum=on.  In this thread, the autovacuum launcher is
repeatedly exiting and getting restarted, which can only happen when
autovacuum=off.

In general, it seems we've been pretty cavalier about just how often
it's reasonable to start the autovacuum launcher when autovacuum=off.
That code probably doesn't see much real-world use.  Foreground
processes signal the postmaster only every 64k transactions, which on
today's hardware can't happen more often than every couple of seconds if
you're not using subtransactions or intentionally burning XIDs, but
hardware keeps getting faster, and you might be using subtransactions.
However, requiring that 65,536 transactions pass between signals does
serve as something of a rate limit.  In the case about which Amit is
complaining, there's no rate limit at all.  As fast as the autovacuum
launcher starts up, it spawns a worker and exits; as fast as the
worker can determine that it can't do anything useful, it starts a new
launcher.  Clearly, some kind of rate control is needed here; the only
question is about where to put it.
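
To make the two code paths concrete: the foreground-process signal lives
in GetNewTransactionId(), which does approximately this (paraphrased from
memory, not the exact source):

    /* varsup.c, roughly: request a launcher at most once per 64K XIDs */
    if (IsUnderPostmaster && (xid % 65536) == 0)
        SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);

The unthrottled path is the similar SendPostmasterSignal() call at the
end of SetTransactionIdLimit(), which fires every time that function is
reached with the current XID already past xidVacLimit, and each
do-nothing worker reaches it via vac_update_datfrozenxid() and
vac_truncate_clog().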

I would be tempted to install something directly in postmaster.c.  If
CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) && Shutdown ==
NoShutdown but we last set start_autovac_launcher = true less than 10
seconds ago, don't do it again.  That limits us to launching the
autovacuum launcher at most six times a minute when autovacuum = off.
You could argue that defeats the point of the SendPostmasterSignal in
SetTransactionIdLimit, but I don't think so.  If vacuuming the oldest
database took less than 10 seconds, then we won't vacuum the
next-oldest database until we hit the next 64k transaction ID
boundary, but that can only cause a problem if we've got so many
databases that we don't get to them all before we run out of
transaction IDs, which is almost unthinkable.  If you had ten
thousand tiny databases that all crossed the threshold at the same
instant, it would take you 640 million transaction IDs to visit them
all.  If you also had autovacuum_freeze_max_age set very close to the
upper limit for that variable, you could conceivably have the system
shut down before all of those databases were reached.  But that's a
pretty artificial scenario.  If someone has that scenario, perhaps
they should consider more sensible configuration choices.
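
To make that concrete, something along these lines in sigusr1_handler()
would do; this is just a sketch, and avlauncher_last_request is an
invented name rather than an existing variable:

    if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) &&
        Shutdown == NoShutdown)
    {
        static time_t avlauncher_last_request = 0;  /* hypothetical */
        time_t        now = time(NULL);

        /* honor at most one forced launcher start every 10 seconds */
        if (now - avlauncher_last_request >= 10)
        {
            avlauncher_last_request = now;
            start_autovac_launcher = true;
        }
    }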

I wondered for a while why the existing guard in
vac_update_datfrozenxid() isn't sufficient to prevent this problem.
That turns out to be due to Tom's commit
794e3e81a0e8068de2606015352c1254cb071a78, which causes
ForceTransactionIdLimitUpdate() to always return true when we're past
xidVacLimit.  The commit doesn't contain much in the way of
justification for the change, but I think the issue must be that if
the database nearest to wraparound is dropped, we need some mechanism
for eventually forcing xidVacLimit to get updated, rather than just
spewing warnings.
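
For reference, the check as it stands after that commit is roughly this
(paraphrased, not the exact source):

    /* ForceTransactionIdLimitUpdate(), approximately */
    if (!TransactionIdIsNormal(xidVacLimit))
        return true;            /* limit not computed yet */
    if (TransactionIdFollowsOrEquals(nextXid, xidVacLimit))
        return true;            /* already past the limit: always recompute */
    if (!SearchSysCacheExists1(DATABASEOID, ObjectIdGetDatum(oldestXidDB)))
        return true;            /* pg_database row for oldest DB is gone */
    return false;

It's the second test that keeps the do-nothing workers going all the way
through vac_truncate_clog() again in the scenario Amit describes.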

Another place where we could insert a guard is inside
SetTransactionIdLimit itself.  This is a little tricky.  The easy idea
would be just to skip sending the signal if xidVacLimit hasn't
advanced, but that's wrong in the case where there are multiple
databases with exactly the same oldest XID; vacuuming the first one
doesn't change anything.  It would be correct -- I think -- to skip
sending the signal when xidVacLimit doesn't advance and
vac_update_datfrozenxid() didn't change the current database's value
either, but that requires passing a flag down the call stack a few
levels.  That's only mildly ugly so I'd be fine with it if it were the
best fix, but there seem to be better options.
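
Spelled out, that option would amount to something like the following at
the bottom of SetTransactionIdLimit(); prev_xidVacLimit and
datfrozenxid_advanced are invented names for illustration:

    /* hypothetical: only signal when something has actually changed */
    if (TransactionIdFollowsOrEquals(curXid, xidVacLimit) &&
        IsUnderPostmaster && !InRecovery &&
        (xidVacLimit != prev_xidVacLimit || datfrozenxid_advanced))
        SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);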

Amit's chosen yet another possible place to insert the guard: teach
autovacuum that if a worker skips at least one table due to concurrent
autovacuum activity AND ends up vacuuming no tables, don't call
vac_update_datfrozenxid().  Since there is or was another worker
running, vac_update_datfrozenxid() either already has been called or
will be when that worker finishes.  So that seems safe.  If his patch
were changed to skip vac_update_datfrozenxid() in all cases where we
do nothing rather than only when we skip a table due to concurrent
activity, we'd reintroduce the dropped-database problem that was fixed
by 794e3e81a0e8068de2606015352c1254cb071a78.
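
In outline, the end of do_autovacuum() would then look something like
this (did_vacuum and found_concurrent_worker are stand-in names, not
necessarily what the patch calls them):

    /*
     * Update pg_database.datfrozenxid unless the only reason we have
     * nothing to show for this run is that another worker was already
     * covering the tables we skipped; that worker has updated it or
     * will when it finishes.
     */
    if (did_vacuum || !found_concurrent_worker)
        vac_update_datfrozenxid();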

I'm not entirely sure whether Amit's fix is better or worse than the
postmaster-based fix.  It seems like a fairly fundamental weakness for
the postmaster to have no rate-limiting logic whatsoever here; it
should be the postmaster's job to judge whether it's getting swamped
with signals, and if we fix it in the postmaster then it stops systems
with high rates of XID consumption from going bonkers for that reason.
On the other hand, if somebody does have a scenario where repeatedly
signaling the postmaster to start the launcher in a tight loop is
allowing the system to zip through many small databases efficiently,
Amit's fix will let that keep working, whereas throttling in the
postmaster will make it take longer to get to all of those databases.
In many cases, that could be an improvement, since it would tend to
spread out the datfrozenxid values better, but I can't quite shake the
niggling fear that there might be some case I'm not thinking of where
it's problematic.  So I don't know.

As far as the problem on the other thread, maybe we could extend
Amit's approach so that when a worker exits after having skipped some
tables without having vacuumed any, we blacklist the database for some
period of time or some number of iterations: autovacuum workers aren't
allowed to choose that database until the blacklist entry expires.
That way, if it becomes evident that more autovacuum workers in that
database are useless, other databases get a chance to attract some
workers, at least for some period of time.  I'm not sure how to
calibrate that exactly, but it's a thought.  I think we should fix
this problem first, though; it's subject to a narrower and
less-speculative repair.
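
Purely as a sketch of the shape of that idea (none of this exists, and
the names are made up), the launcher's shared state could carry
something like:

    /* hypothetical per-database "don't retry yet" entry */
    typedef struct avl_dbase_skip
    {
        Oid         adl_datid;        /* database to leave alone for now */
        TimestampTz adl_skip_until;   /* earliest time to consider it again */
    } avl_dbase_skip;

and the launcher would simply pass over any database with an unexpired
entry when picking where to send the next worker.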

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


