Re: another autovacuum scheduling thread - Mailing list pgsql-hackers
From: Jeremy Schneider
Subject: Re: another autovacuum scheduling thread
Date:
Msg-id: 20251010145959.414a2c27@ardentperf.com
In response to: Re: another autovacuum scheduling thread (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: another autovacuum scheduling thread
List: pgsql-hackers
On Fri, 10 Oct 2025 16:24:51 -0400 Robert Haas <robertmhaas@gmail.com> wrote:

> I don't think we need something dramatically awesome to make a change
> to the status quo, but if it's extremely easy to think up simple
> scenarios in which a given idea will fail spectacularly, I'd be
> inclined to suspect that there will be a lot of real-world spectacular
> failures.

What does a real-world spectacular failure look like? "If those 3
autovac workers had processed tables in a different order, everything
would have been peachy"? But if autovac is going to get jammed up long
enough that the system hits wraparound, does it matter whether or not it
did a one-time pass over a bunch of small tables before it got jammed?

One particular table always scoring high shouldn't block autovac from
other tables, because a worker doesn't start a new iteration until it
has gone all the way through the list from its current iteration, right?
And one iteration of autovac needs to process everything in the list, so
it should take about the same overall time regardless of order.

The spectacular failures I've seen with autovac usually come down to
things like too much sleeping (cost_delay) or too few workers, where
better ordering would be nice but probably wouldn't fix the real
problems leading to the spectacular failures.

From Robert's 2024 pgConf.dev talk:

1. slow - forward progress not fast enough
2. stuck - no forward progress
3. spinning - not accomplishing anything
4. skipped - thinks it's not needed
5. starvation - can't keep up

I don't think any of these are really addressed by simply changing table
order.

From Robert's 2022 email to hackers:

> A few people have proposed scoring systems, which I think is closer to
> the right idea, because our basic goal is to start vacuuming any given
> table soon enough that we finish vacuuming it before some catastrophe
> strikes. ...
> If table A will cause wraparound in 2 hours and take 2 hours to
> vacuum, and table B will cause wraparound in 1 hour and take 10
> minutes to vacuum, table A is more urgent even though the catastrophe
> is further out.

Robert, it sounds to me like the main use case you're focused on here is
one where wraparound is basically imminent - we are already screwed -
and our very last hope is that a last-ditch autovac can finish just in
time. Failsafe and dynamic cost updates were huge advancements. Do we
allow dynamic adjustment of the worker count yet?

I hope y'all just pick something and commit it without getting too lost
in the details. I honestly think that, in the list of improvements
around autovac, this is the lowest priority on my list of hopes and
dreams as a user for wraparound prevention :) because if this ever
matters to me for avoiding wraparound, I was screwed long before we got
to this point, and it is not going to fix my underlying problems.

-Jeremy
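[Editorial aside: the table A/B example quoted above amounts to ordering by least slack, i.e. time until wraparound minus estimated vacuum duration. A minimal sketch of that idea follows; the table names and numbers come only from the quoted email, everything else is a hypothetical illustration, not actual PostgreSQL code.]

```python
# Least-slack ordering sketch for the quoted example: table A wraps in
# 2h and takes 2h to vacuum (slack 0h); table B wraps in 1h and takes
# 10 minutes (slack ~0.83h). A sorts first even though its wraparound
# deadline is further out, matching the argument in the quoted email.

tables = [
    {"name": "A", "hours_to_wraparound": 2.0, "hours_to_vacuum": 2.0},
    {"name": "B", "hours_to_wraparound": 1.0, "hours_to_vacuum": 10 / 60},
]

def slack(t):
    # Time to spare after subtracting the estimated vacuum duration;
    # smaller (or negative) slack means the table must start sooner.
    return t["hours_to_wraparound"] - t["hours_to_vacuum"]

for t in sorted(tables, key=slack):
    print(t["name"], round(slack(t), 2))
# A has slack 0.0 and B has slack 0.83, so A is vacuumed first.
```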