Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations |
Date | |
Msg-id | CA+TgmoaGoZ2wX6T4sj0eL5YAOQKW3tS8ViMuN+tcqWJqFPKFaA@mail.gmail.com |
In response to | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
|
List | pgsql-hackers |
On Fri, Jan 7, 2022 at 5:20 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I thought I was being conservative by suggesting
> autovacuum_freeze_max_age/2. My first thought was to teach VACUUM to
> make its FreezeLimit "OldestXmin - autovacuum_freeze_max_age". To me
> these two concepts really *are* the same thing: vacrel->FreezeLimit
> becomes a backstop, just as anti-wraparound autovacuum (the
> autovacuum_freeze_max_age cutoff) becomes a backstop.

I can't follow this. If the idea is that we're going to opportunistically freeze a page whenever that allows us to mark it all-visible, then the remaining question is what XID age we should use to force freezing when that rule doesn't apply. It seems to me that there is a rebuttable presumption that that case ought to work just as it does today - and I think I hear you saying that it should NOT work as it does today, but should use some other threshold. Yet I can't understand why you think that.

> I couldn't agree more. In fact, I was mostly thinking about how to
> *help* these users. Insisting on waiting for a cleanup lock before it
> becomes strictly necessary (when the table age is only 50
> million/vacuum_freeze_min_age) is actually a big part of the problem
> for these users. vacuum_freeze_min_age enforces a false dichotomy on
> aggressive VACUUMs, that just isn't unhelpful. Why should waiting on a
> cleanup lock fix anything?

Because waiting on a lock means that we'll acquire it as soon as it's available. If you repeatedly call your local Pizzeria Uno's and ask whether there is a wait, and head to the restaurant only when the answer is in the negative, you may never get there, because they may be busy every time you call - especially if you always call around lunch or dinner time. Even if you eventually get there, it may take multiple days before you find a time when a table is immediately available, whereas if you had just gone over there and stood in line, you likely would have been seated in under an hour and savoring the goodness of quality deep-dish pizza not too long thereafter. The same principle applies here.

I do think that waiting for a cleanup lock when the age of the page is only vacuum_freeze_min_age seems like it might be too aggressive, but I don't think that's how it works. AFAICS, it's based on whether the vacuum is marked as aggressive, which has to do with vacuum_freeze_table_age, not vacuum_freeze_min_age.

Let's turn the question around: if the age of the oldest XID on the page is >150 million transactions and the buffer cleanup lock is not available now, what makes you think that it's any more likely to be available when the XID age reaches 200 million or 300 million or 700 million? There is perhaps an argument for some kind of tunable that eventually shoots the other session in the head (if we can identify it, anyway) but it seems to me that regardless of what threshold we pick, polling is strictly less likely to find a time when the page is available than waiting for the cleanup lock. It has the counterbalancing advantage of allowing the autovacuum worker to do other useful work in the meantime and that is indeed a significant upside, but at some point you're going to have to give up and admit that polling is a failed strategy, and it's unclear why 150 million XIDs - or probably even 50 million XIDs - isn't long enough to say that we're not getting the job done with half measures.

--
Robert Haas
EDB: http://www.enterprisedb.com
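The polling-versus-waiting argument in the message above can be illustrated with a toy model. This is only a sketch and not PostgreSQL code: the function names, the time units, and the busy schedule are all made up for illustration. It assumes the page is pinned by some other session during known intervals, that a waiter obtains the cleanup lock at the first instant the page is free, and that a poller only succeeds if one of its periodic checks happens to land in a free gap.

```python
def first_free_time(busy_intervals, start):
    """Time at which a blocked waiter arriving at `start` gets the resource.

    busy_intervals: list of half-open (lo, hi) intervals during which the
    resource is held by someone else (hypothetical schedule).
    """
    t = start
    for lo, hi in sorted(busy_intervals):
        if lo <= t < hi:
            t = hi  # still busy: the waiter sleeps until this holder leaves
    return t


def first_successful_poll(busy_intervals, start, period, max_polls):
    """Time of the first poll that finds the resource free, or None.

    The poller checks at start, start + period, start + 2 * period, ...
    and gives up after max_polls attempts.
    """
    for i in range(max_polls):
        t = start + i * period
        if not any(lo <= t < hi for lo, hi in busy_intervals):
            return t
    return None


# Hypothetical schedule: the page is busy for 50 out of every 60 time units.
BUSY = [(k * 60, k * 60 + 50) for k in range(200)]

waited = first_free_time(BUSY, start=10)
polled_in_phase = first_successful_poll(BUSY, start=10, period=60, max_polls=100)
polled_off_phase = first_successful_poll(BUSY, start=10, period=45, max_polls=100)

print("waiter acquires at:", waited)
print("poller (period 60):", polled_in_phase)
print("poller (period 45):", polled_off_phase)
```

With this schedule the waiter gets the page in the first free gap, the poller whose period matches the busy cycle never succeeds, and the off-phase poller succeeds but no earlier than the waiter. The model deliberately ignores the upside the message acknowledges: a poller can do other useful work between attempts.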