Re: another autovacuum scheduling thread - Mailing list pgsql-hackers
| From | Robert Haas |
|---|---|
| Subject | Re: another autovacuum scheduling thread |
| Msg-id | CA+TgmoY27S+nbgdCrVrc8S4p38NwTAC8_Uyq5ZaX6zxYToebXA@mail.gmail.com |
| In response to | Re: another autovacuum scheduling thread (Nathan Bossart <nathandbossart@gmail.com>) |
| Responses | Re: another autovacuum scheduling thread |
| List | pgsql-hackers |
On Wed, Nov 12, 2025 at 3:10 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I do think re-prioritization is worth considering, but IMHO we should leave
> it out of phase 1. I think it's pretty easy to reason about one round of
> prioritization being okay. The order is completely arbitrary today, so how
> could ordering by vacuum-related criteria make things any worse? In my
> view, changing the list contents in fancier ways (e.g., adding
> just-processed tables back to the list) is a step further that requires
> more discussion and testing.

I agree with your view around reprioritization. To answer your rhetorical question, the way that reordering the list could hurt is if the current ordering (pg_class scan order) happened to be a near-optimal choice. For example, suppose the last table in pg_class order is in a state where vacuuming appears to be necessary but will be painful and/or useless (VACUUM will error, xmin will prevent all or most tuple removal, located on an incredibly slow disk with nothing cached, whatever). Re-sorting the list figures to move that table earlier, which will not work out for the best. I suspect that reprioritization actually increases the danger of this kind of failure mode. The more aggressive you are about making sure that the highest-priority tables actually get handled first, the more important it is to be correct about the real order of priority.

I do think in the long term a really good system is probably going to accumulate a bunch of extra logic to deal with cases like this. For example, if the first table in the queue causes VACUUM to spend an hour chugging away and then fail with an I/O error, we would ideally want to wait a while before retrying that table, so that others don't get starved. But like you say, there's no need to solve every problem at once. What seems important to me for this patch is that we don't choose an actively bad sort order.
For instance, if we don't get the balance between prioritizing anti-wraparound activity and controlling runaway bloat correct, and especially if there's no way to recover by tweaking settings, to me that's a scary scenario. I do think it's fairly realistic for a bad choice of sort order to end up being a regression over the current lack of a sort order. You might just be getting lucky right now -- say, because the catalog tables all occur first in the catalog and vacuuming those tends to be important, and among user tables, the ones you created first are actually the ones that are most important. That's not a particularly crazy scenario, IMHO.

Point being: I think we need to avoid the mindset that we can't be stupider than we are now. I don't think there's any way we would commit something that is GENERALLY stupider than we are now, but it's not about averages. It's about whether there are specific cases that are common enough to worry about which end up getting regressed. I'm honestly not sure how much of a risk that is, and, again, I'm not trying to kill the patch. It might well be that the patch is already good enough that such scenarios will be extremely rare. However, it's easy to get overconfident when replacing a completely unintelligent system with a smarter one. The risk of something backfiring can sometimes be higher than one anticipates.

One idea that might be worth considering is adding a reloption of some kind that lets the user exert positive control over the sort order. I know that's scope creep, so maybe it's a bad idea for that reason. But I think it would be a better idea than Sami's proposal to score system catalogs more highly, not so much because his idea is necessarily wrong-headed as because it doesn't help with what I see as the principal danger here, namely, that whatever we do will sometimes turn out to be wrong. Trying to be right 100% of the time is not going to work out as well as having a backup plan for the cases where we are wrong.
--
Robert Haas
EDB: http://www.enterprisedb.com