autovacuum scheduling starvation and frenzy - Mailing list pgsql-hackers

From Jeff Janes
Subject autovacuum scheduling starvation and frenzy
Msg-id CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@mail.gmail.com
List pgsql-hackers
In testing 9.4 with some long-running tests, I noticed that the autovacuum launcher/worker sometimes goes a bit nuts. It vacuums the same database repeatedly without respect to the nap time.

As far as I can tell, the behavior is the same in older versions, but I haven't tested that.

This is my understanding of what is happening:

If you have a database with a large table in it that has just passed autovacuum_freeze_max_age, all future workers will be funnelled into that database until the wrap-around vacuum completes. But only one of those workers can actually vacuum the one table which is holding back the frozenxid. Maybe the 2nd worker to come along will find other useful work to do, but eventually all the vacuuming that needs doing is already in progress, and so each worker starts up, gets directed to this database, finds it can't help, and exits. So all other databases are entirely starved of autovacuuming for the entire duration of the wrap-around vacuuming of this one large table.
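To make the starvation concrete, here is a toy Python model of the selection rule as described above. It is only a sketch of my reading of the behavior, not the server's actual code: the Database class, its fields, and choose_database are all made-up names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Database:
    name: str
    xid_age: int            # age of the database's oldest unfrozen XID (hypothetical field)
    last_autovacuum: float  # time the last worker finished here (hypothetical field)


def choose_database(dbs, freeze_max_age):
    """Toy model: a database past freeze_max_age always wins."""
    at_risk = [d for d in dbs if d.xid_age > freeze_max_age]
    if at_risk:
        # Wrap-around danger takes priority unconditionally, oldest first.
        return max(at_risk, key=lambda d: d.xid_age)
    # Otherwise pick the least-recently-vacuumed database.
    return min(dbs, key=lambda d: d.last_autovacuum)


dbs = [
    Database("big", xid_age=250_000_000, last_autovacuum=100.0),
    Database("small_a", xid_age=1_000_000, last_autovacuum=10.0),
    Database("small_b", xid_age=2_000_000, last_autovacuum=20.0),
]

# Every worker launched while "big" is still past the limit lands in "big",
# no matter how long the other databases have been waiting.
picks = [choose_database(dbs, freeze_max_age=200_000_000).name for _ in range(5)]
print(picks)
```

In this model nothing about "big" changes between launches, because its age only drops once the one worker that holds the large table finishes. Until then small_a and small_b are never chosen.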

Also, the launcher decides when to launch the next worker by looking at the scheduled time of the least-recently-vacuumed database (with the implicit intention that that is the one that will get chosen to vacuum next). But since the worker gets redirected to the wrap-around database instead of the least-recently-vacuumed database, the least-recently-vacuumed database never gets its schedule updated and always looks like it is chronologically overdue. That means the launcher keeps launching new workers as fast as the previous ones exit, ignoring the nap time. So there is one long-running worker actually making progress, plus a frenzy of workers all attacking the same database, finding that there is nothing they can do.
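The frenzy can be illustrated with a small simulation. This is a deliberately simplified sketch, assuming a single overdue database, a fixed worker_runtime for a worker that starts and exits without finding work, and a launcher that sleeps only until that database's scheduled time. The function and parameter names are invented for the example.

```python
def simulate_launches(duration, naptime, worker_runtime, redirect_to_at_risk):
    """Count worker launches during `duration` seconds of simulated time.

    next_due is the scheduled time of the least-recently-vacuumed database.
    It only moves forward when a worker actually visits that database.
    """
    now = 0.0
    next_due = 0.0
    launches = 0
    while now < duration:
        # The launcher sleeps until the scheduled time, or not at all if it
        # is already in the past.
        now = max(now, next_due)
        if now >= duration:
            break
        launches += 1
        now += worker_runtime  # the worker starts, looks around, and exits
        if not redirect_to_at_risk:
            # The worker went to the overdue database, so it gets rescheduled.
            next_due = now + naptime
        # Otherwise the worker was redirected to the wrap-around database and
        # next_due stays in the past, so the launcher never sleeps.
    return launches


normal = simulate_launches(600, naptime=60, worker_runtime=0.5,
                           redirect_to_at_risk=False)
frenzy = simulate_launches(600, naptime=60, worker_runtime=0.5,
                           redirect_to_at_risk=True)
print(normal, frenzy)
```

With redirection the launch rate is bounded only by how quickly each worker exits, not by the nap time. The numbers are arbitrary; only the ratio between the two cases matters.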

I think that a database older than autovacuum_freeze_max_age should get first priority, but only if its next scheduled vacuum time is in the past. If it can beneficially use more than one vacuum worker, they would usually accumulate there naturally within a few naptime iterations[1]. And if it can't usefully use more than one worker, don't prevent other databases from using them.
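A sketch of the proposed rule, again as a toy Python model rather than a patch. The names (Database, next_scheduled, choose_database_proposed, NAPTIME) are hypothetical, and rescheduling a database at now + NAPTIME after each visit is an assumption made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Database:
    name: str
    xid_age: int           # age of the oldest unfrozen XID (hypothetical field)
    next_scheduled: float  # next time this database is due for a worker


def choose_database_proposed(dbs, freeze_max_age, now):
    """Give an at-risk database priority only when it is also due."""
    due = [d for d in dbs if d.next_scheduled <= now]
    if not due:
        return None  # nothing is due, so the launcher can sleep
    at_risk_due = [d for d in due if d.xid_age > freeze_max_age]
    if at_risk_due:
        return max(at_risk_due, key=lambda d: d.xid_age)
    return min(due, key=lambda d: d.next_scheduled)


NAPTIME = 60.0
dbs = [
    Database("big", 250_000_000, 0.0),
    Database("small_a", 1_000_000, 0.0),
    Database("small_b", 2_000_000, 0.0),
]

picks = []
for now in (0.0, 1.0, 2.0, 3.0, 61.0):
    db = choose_database_proposed(dbs, 200_000_000, now)
    picks.append(db.name if db else None)
    if db is not None:
        # Whichever database was chosen gets its own schedule pushed forward.
        db.next_scheduled = now + NAPTIME
print(picks)
```

Here the at-risk database still goes first, and it comes back to the front as soon as its own nap time has expired, but in between the other databases get their turn and the launcher has a real reason to sleep once nothing is due.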

[1] You could argue that all the other autovacuum_max_workers processes could become pinned down in long-running vacuums of other non-risk databases between the time that the database crosses autovacuum_freeze_max_age (and has its first worker started), and the time its nap time expires and so it becomes eligible for a second one. But that seems like a weak argument, as it could just as easily have happened that all of them got pinned down in non-risk databases a few transactions *before* the database crosses autovacuum_freeze_max_age in the first place.

Does this analysis and proposal seem sound?

Cheers,

Jeff
