On Thu, 30 Oct 2025 at 15:58, wenhui qiu <qiuwenhuifx@gmail.com> wrote:
> In fact, with the introduction of the vacuum_max_eager_freeze_failure_rate feature, if a table’s age still exceeds
> more than 1.x times the autovacuum_freeze_max_age, it suggests that the vacuum freeze process is not functioning
> properly. Once the age surpasses vacuum_failsafe_age, wraparound issues are likely to occur soon. Taking the average of
> vacuum_failsafe_age and autovacuum_freeze_max_age is not a complex approach. Under the default configuration, this
> average already exceeds four times the autovacuum_freeze_max_age. At that stage, a DBA should have already intervened to
> investigate and resolve why the table age is not decreasing.
I don't think anyone would like to modify PostgreSQL in any way that
increases the chances that a table gets as old as vacuum_failsafe_age.
Regardless of the order in which tables are vacuumed, if a table gets
as old as that then vacuum is configured to run too slowly, or there
are not enough workers configured to cope with the given amount of
work. I think we need to tackle prioritisation and rate limiting as
two separate items. Nathan is proposing to improve the prioritisation
in this thread and it seems to me that your concerns are with rate
limiting. In this thread, I've suggested an idea that might help by
reducing the cost_delay based on the score of the table. I'd rather
not introduce that as a topic for further discussion here (I imagine
Nathan agrees). It's not as if the server is going to consume 1
billion xids in 5 mins. It's going to take at least a day, and more
likely several days or longer, for that to happen, and if autovacuum
has not managed to get on top of the workload in that time, then it's
configured to run too slowly and the cost_limit or cost_delay needs to
be adjusted.
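As a rough back-of-the-envelope check on that time scale, here is a
minimal sketch that assumes a steady xid consumption rate. The rates
are made-up illustrative values, not measurements from any real system,
and days_to_consume is a hypothetical helper, not anything in
PostgreSQL.

```python
# Back-of-the-envelope: how long does it take to burn through 1 billion xids?
# The rates below are illustrative assumptions, not measurements.
SECONDS_PER_DAY = 86_400
XIDS_TO_CONSUME = 1_000_000_000


def days_to_consume(xids: int, xids_per_second: float) -> float:
    """Return the number of days needed to consume `xids` at a steady rate."""
    return xids / xids_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    for rate in (1_000, 5_000, 10_000):
        days = days_to_consume(XIDS_TO_CONSUME, rate)
        print(f"{rate:>6} xids/s -> {days:.1f} days")
```

Even at a sustained 10,000 xids per second, this works out to a bit
over a day, which leaves autovacuum plenty of time to react if it is
configured to run fast enough.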
My concern is that there are countless problems with autovacuum and if
you try and lump them all into a single thread to fix them all at
once, we'll get nowhere. Autovacuum was added to core in 8.1, 20 years
ago, and I don't believe we've done anything to change the rate limiting
since then, aside from reducing the default cost_delay. It'd be good to
fix that at some point, just not here, please.
FWIW, I agree with Nathan about keeping the score calculation
non-magical. The score should be simple and easy to document. We can
introduce complexity to it as and when it's needed and when the
supporting evidence arrives, rather than from people waving their
hands.
David