Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: New strategies for freezing, advancing relfrozenxid early |
Date | |
Msg-id | CAH2-Wzn6dhY3M4MomiABKZDxVC4m=7ub3A8cy3_13j2pJTv1=w@mail.gmail.com |
In response to | Re: New strategies for freezing, advancing relfrozenxid early (Peter Geoghegan <pg@bowt.ie>) |
List | pgsql-hackers |
On Wed, Aug 31, 2022 at 12:03 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Actually I'm also ignoring some subtleties with Multis that could make
> this not quite happen, but again, that's only a super obscure corner case.
> The idea that just setting vacuum_freeze_min_age = 0 and
> vacuum_multixact_freeze_min_age = 0 will be enough is definitely true
> in spirit. You don't need to touch vacuum_freeze_table_age (if you did
> then you'd get aggressive VACUUMs, and one goal here is to avoid
> those whenever possible -- especially aggressive antiwraparound
> autovacuums).

Attached is v3. There is a new patch included here -- v3-0004-*patch, or "Unify aggressive VACUUM with antiwraparound VACUUM". No other notable changes.

I decided to work on this now because it seems like it might give a more complete picture of the high level direction that I'm pushing towards. Perhaps this will make it easier to review the patch series as a whole, even.

The new patch unifies the concept of antiwraparound VACUUM with the concept of aggressive VACUUM. Now there is only antiwraparound and regular VACUUM (uh, barring VACUUM FULL). And now antiwraparound VACUUMs are not limited to antiwraparound autovacuums -- a manual VACUUM can also be antiwraparound (that's just the new name for "aggressive").

We will typically only get antiwraparound vacuuming in a regular VACUUM when the user goes out of their way to get that behavior. VACUUM FREEZE is the best example. For the most part the skipping/freezing strategy stuff has a good sense of what matters already, and shouldn't need to be guided very often.

The patch relegates vacuum_freeze_table_age to a compatibility option, making its default -1, meaning "just use autovacuum_freeze_max_age". I always thought that having two table age based GUCs was confusing. There was a period between 2006 and 2009 when we had autovacuum_freeze_max_age, but did not yet have vacuum_freeze_table_age.
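To make the compatibility behavior concrete, here is a minimal sketch, in Python rather than PostgreSQL's C, of how a default of -1 could resolve to autovacuum_freeze_max_age. The function name is hypothetical and not taken from the patch, and any capping of explicitly set values is deliberately left out.

```python
def effective_freeze_table_age(vacuum_freeze_table_age: int,
                               autovacuum_freeze_max_age: int) -> int:
    """Resolve the table-age cutoff (in XIDs) that VACUUM would work with.

    Hypothetical helper: a negative setting (the proposed default of -1)
    means "just use autovacuum_freeze_max_age"; any other value is taken
    as an explicit, compatibility-style override.
    """
    if vacuum_freeze_table_age < 0:
        return autovacuum_freeze_max_age
    return vacuum_freeze_table_age


# With the proposed default, only one table-age GUC matters in practice.
default_cutoff = effective_freeze_table_age(-1, 200_000_000)
# An explicit setting still behaves as a separate knob.
override_cutoff = effective_freeze_table_age(150_000_000, 200_000_000)
```

Under this reading, a user who never touches vacuum_freeze_table_age only has to reason about autovacuum_freeze_max_age.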
This change can almost be thought of as a return to the simpler user interface that existed at that time. Of course we must not resurrect the problems that vacuum_freeze_table_age was intended to address (see originating commit 65878185) by mistake. We need an improved version of the same basic concept, too.

The patch more or less replaces the table-age-aggressive-escalation concept (previously implemented using vacuum_freeze_table_age) with new logic that makes vacuumlazy.c's choice of skipping strategy *also* depend upon table age -- it is now one more factor to be considered. Both costs and benefits are weighed here. We now give just a little weight to table age at a relatively early stage (XID-age-wise early), and escalate from there. As the table's relfrozenxid gets older and older, we give less and less weight to putting off the cost of freezing. This general approach is possible because the false dichotomy that is "aggressive vs non-aggressive" has mostly been eliminated. This makes things less confusing for users and hackers.

The details of the skipping-strategy-choice algorithm are still unsettled in v3 (no real change there). ISTM that the important thing is still the high level concepts. Jeff was slightly puzzled by the emphasis placed on the cost model/strategy stuff, at least at one point. Hopefully my intent will be made clearer by the ideas featured in the new patch. The skipping strategy decision making process isn't particularly complicated, but it now looks more like an optimization problem of some kind or other.

It might make sense to go further in the same direction by making "regular vs aggressive/antiwraparound" into a *strict* continuum. In other words, it might make sense to get rid of the two remaining cases where VACUUM conditions its behavior on whether this VACUUM operation is antiwraparound/aggressive or not.
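As a rough illustration of table age becoming one more weighted factor in the skipping strategy, consider the sketch below. It is not the algorithm from the patch, whose details are still unsettled in v3. The function names, the linear ramp, the early_age threshold, and the base_tolerance value are all assumptions invented for this example.

```python
def deferral_weight(table_age: int, early_age: int, max_age: int) -> float:
    """Weight in [0, 1] given to putting off the cost of freezing.

    Hypothetical model: full weight while the table is young, falling
    linearly to zero as table_age approaches max_age.
    """
    if table_age <= early_age:
        return 1.0
    if table_age >= max_age:
        return 0.0
    return (max_age - table_age) / (max_age - early_age)


def choose_eager_scan(extra_page_fraction: float, table_age: int,
                      early_age: int, max_age: int,
                      base_tolerance: float = 0.05) -> bool:
    """Decide whether to scan all-visible pages so relfrozenxid can advance.

    extra_page_fraction is the share of the table that would be scanned
    only because all-visible pages are not skipped. The tolerated share
    grows from base_tolerance (young table) to 1.0 (table at max_age).
    """
    weight = deferral_weight(table_age, early_age, max_age)
    tolerance = base_tolerance + (1.0 - base_tolerance) * (1.0 - weight)
    return extra_page_fraction <= tolerance


# Same extra scanning cost, different table ages (ages in XIDs).
young = choose_eager_scan(0.5, 50_000_000, 100_000_000, 200_000_000)
old = choose_eager_scan(0.5, 200_000_000, 100_000_000, 200_000_000)
```

In this toy model the young table declines the extra work and the old table accepts it, with no hard switch between a "regular" and an "aggressive" mode anywhere in between.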
The two cases I'm referring to are the cleanup lock skipping behavior around lazy_scan_noprune(), and the PROC_VACUUM_FOR_WRAPAROUND no-auto-cancellation behavior enforced in autovacuum workers. We will still need to keep roughly the same two behaviors, but the timelines can be totally different.

We must be reasonably sure that the cure won't be worse than the disease -- I'm aware of quite a few individual cases that felt that way [1]. Aggressive interventions can make sense, but they need to be proportionate to the problem that's right in front of us. "Kicking the can down the road" is often the safest and most responsible approach -- it all depends on the details.

[1] https://www.tritondatacenter.com/blog/manta-postmortem-7-27-2015 is the most high profile example, but I have personally been called in to deal with similar problems in the past

--
Peter Geoghegan