Re: another autovacuum scheduling thread - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: another autovacuum scheduling thread |
Date | |
Msg-id | CA+TgmoZVdQvtiQJPcKyK7Kfc-J=_jph2HqiDufb1jxD836LfYA@mail.gmail.com |
In response to | Re: another autovacuum scheduling thread (Jeremy Schneider <schneider@ardentperf.com>) |
List | pgsql-hackers |
On Fri, Oct 10, 2025 at 6:00 PM Jeremy Schneider <schneider@ardentperf.com> wrote:
> The spectacular failures I've seen with autovac usually come down to
> things like too much sleeping (cost_delay) or too few workers, where
> better ordering would be nice but probably wouldn't fix any real
> problems leading to the spectacular failures

Since I have said the same thing myself, I can hardly disagree. However, there are probably a few exceptions. For instance, if autovacuum on a certain table is failing repeatedly or accomplishing nothing without removing the apparent need to autovacuum, and happens to be the first one in pg_class, it could divert a lot of attention from other tables.

> Robert it sounds to me like the main use case you're focused on here
> is where basically wraparound is imminent - we are already screwed - and
> our very last hope was that a last-ditch autovac can finish just in time

Yes, I would argue that this is the scenario that really matters. As you say above, the main thing is having little enough sleeping and a sufficient number of workers. When that's the case, we can do the work in any order and life will mostly be fine. However, if we get into a desperate situation by, say, having one table that can't be vacuumed, and eventually someone fixes that, say by dropping the corrupt index that is preventing vacuuming of that table, we might like it if autovacuum focused on getting that table vacuumed rather than getting lost in the sauce.

Of course, if we have the pretty common situation where autovacuum gets behind on all tables, say due to a stale replication slot, then this is less critical, although a perfect system would probably prioritize vacuuming the *largest* tables in this situation, since those will take the longest to finish, and it's when a vacuum of every table in the cluster has been *completed* that the XID horizons can advance.

> I hope y'all just pick something and commit it without getting too lost
> in the details. I honestly think in the list of improvements around
> autovac, this is the lowest priority on my list of hopes and dreams as a
> user for wraparound prevention :) because if this ever matters to me for
> avoiding wraparound, I was screwed long before we got to this point and
> this is not going to fix my underlying problems.

I'm not sure if this was your intention, but to me this kind of reads like "well, it's not going to matter anyway so just do whatever and move on" and I don't agree with that. I think that if we're not going to do high-quality engineering here, we just shouldn't change anything at all. It's better to keep having the same bad behavior than for each release to have new and different bad behavior.

One possible positive result of leaning into this prioritization problem is that whoever's working on it (Nathan, in this case) might gain some useful insights about how to tackle some of the other problems in this space. All of this is hard enough that we haven't really had any major improvements in this area since, I want to say, 8.3, and it's desirable to break that logjam even if we don't all agree on which problems are most urgent. Even if I ultimately don't agree with whatever Nathan wants to do or proposes, I'm glad he's trying to do something, which is (in my experience) generally much better than making no effort at all.

--
Robert Haas
EDB: http://www.enterprisedb.com
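
Editorial illustration (not part of the original message): the largest-tables-first argument above can be checked with a toy scheduling model. The sketch below is only that, a sketch under stated assumptions: the table names and durations are hypothetical, each worker simply takes the next table in list order when it becomes free, and cost delays, thresholds, and everything else about PostgreSQL's real autovacuum launcher are ignored. It compares how long it takes for *every* table to finish under two orderings.

```python
import heapq


def makespan(durations, workers):
    """Return the time at which the last job finishes when each free
    worker picks up the next job in list order (toy model only)."""
    free_at = [0.0] * workers  # min-heap: time at which each worker is next free
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + d
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


# Hypothetical vacuum durations in arbitrary units, assumed to grow with table size.
tables = {"big": 10, "medium_a": 4, "medium_b": 4, "small_a": 1, "small_b": 1}

largest_first = sorted(tables.values(), reverse=True)
smallest_first = sorted(tables.values())

print("largest first :", makespan(largest_first, workers=2))
print("smallest first:", makespan(smallest_first, workers=2))
```

In this toy setup, starting the big table first lets the smaller tables fit alongside it on the other worker, whereas leaving it for last makes one worker run it alone after everything else is done. Real workloads will differ; the point is only that completion of the whole set, not of any one table, is what the ordering affects.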