Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers

From:           Peter Geoghegan
Subject:        Re: New strategies for freezing, advancing relfrozenxid early
Msg-id:         CAH2-WzmjHQJ7pbdO4BtWVJ6CLG-Mp9CNe914WUJdiScOTNRKRw@mail.gmail.com
In response to: Re: New strategies for freezing, advancing relfrozenxid early (Peter Geoghegan <pg@bowt.ie>)
List:           pgsql-hackers
On Thu, Dec 15, 2022 at 10:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
> I agree that the burden of catch-up freezing is excessive here (in
> fact I already wrote something to that effect on the wiki page). The
> likely solution can be simple enough.

Attached is v10, which fixes this issue, though using a different approach than the one I sketched here.

This revision also changes the terminology around VM skipping: we now call the strategies there "scanning strategies", per feedback from Jeff and John. That does seem a lot clearer. I also cleaned up the docs a little bit; they were messed up by a rebasing issue in v9.

I ended up fixing the aforementioned "too much catch-up freezing" issue by just getting rid of the whole concept of a second table-size threshold that forces the eager scanning strategy. I now believe that it's fine to just rely on the generic logic that determines scanning strategy based on a combination of table age and the added cost of eager scanning. It'll work in a way that doesn't result in too much of a freezing spike during any one VACUUM operation, without waiting until an antiwraparound autovacuum to advance relfrozenxid (it'll happen far earlier than that, though still quite a lot later than what you'd see with v9, so as to avoid the big spike in freezing that was possible in pgbench_history-like tables [1]).

This means that vacuum_freeze_strategy_threshold is now strictly concerned with freezing. A table that is always frozen eagerly will inevitably fall into a pattern of advancing relfrozenxid in every VACUUM operation, but that isn't something that needs to be documented or anything. We don't need to introduce a special case here.

The other notable change for v10 is in the final patch, which removes aggressive mode altogether. v10 now makes lazy_scan_noprune less willing to give up on setting relfrozenxid to a relatively recent XID.
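In case it helps review, here is a rough standalone sketch of the shape of that generic decision. To be clear, this is not the patch's actual code: the function names, the 5% baseline, and the linear ramp on table age are all invented for illustration; the real logic lives in vacuumlazy.c and is considerably more involved.

```c
#include <assert.h>
#include <stdint.h>

/* Toy illustration only -- names and heuristics are made up */
typedef enum { SCAN_LAZY, SCAN_EAGER } ScanStrategy;

/*
 * Weigh table age against the extra pages that eager scanning would
 * have to visit.  The older the table, the more added scanning cost
 * we tolerate in order to advance relfrozenxid; past the age
 * threshold we always scan eagerly.
 */
static ScanStrategy
choose_scan_strategy(uint32_t table_age, uint32_t table_age_threshold,
                     uint64_t extra_eager_pages, uint64_t total_pages)
{
    double      extra_fraction;
    double      tolerated;

    if (total_pages == 0)
        return SCAN_LAZY;

    if (table_age >= table_age_threshold)
        return SCAN_EAGER;      /* age alone forces eager scanning */

    extra_fraction = (double) extra_eager_pages / (double) total_pages;

    /* Tolerated added cost ramps up linearly with relative table age */
    tolerated = 0.05 * ((double) table_age / (double) table_age_threshold);

    return (extra_fraction <= tolerated) ? SCAN_EAGER : SCAN_LAZY;
}
```

The point is just that there is no separate forcing threshold any more: one smooth cost/age trade-off covers both the "young table, cheap to scan eagerly" and "old table, scan eagerly regardless" cases.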
Now lazy_scan_noprune is willing to wait a short while (a few tens of milliseconds) for a cleanup lock on a heap page when doing so might be all it takes to preserve VACUUM's ability to advance relfrozenxid all the way up to FreezeLimit, which is the traditional guarantee made by aggressive mode VACUUM. This makes lazy_scan_noprune "under promise and over deliver". It now only promises to advance relfrozenxid up to MinXid in the very worst case -- even if that means waiting indefinitely long for a cleanup lock. That's not a very strong promise, because advancing relfrozenxid up to MinXid is only barely adequate. At the same time, lazy_scan_noprune is willing to go to extra trouble to get a recent enough FreezeLimit: it'll wait a few tens of milliseconds, just not indefinitely. This seems likely to give us the best of both worlds.

This was based in part on something that Andres said about cleanup locks a while back. He had a concern about cases where even MinXid was before OldestXmin. That's addressed here to some degree, because I've also changed the way that MinXid is determined, so that it'll be a much earlier value. That doesn't have much downside now, because of the way that lazy_scan_noprune is now "aggressive-ish" when that happens to make sense.

Not being able to get a cleanup lock on our first attempt is relatively rare, and when it happens it's often completely benign. For example, it might just be that the checkpointer was writing out the same page at the time, which signifies nothing about it really being hard to get a cleanup lock -- the checkpointer will have dropped its conflicting buffer pin almost immediately. It would be a shame to accept a significantly older final relfrozenxid during an infrequent, long-running antiwraparound autovacuum of a larger table when that happens -- we should be willing to wait 30 milliseconds (just not 30 minutes, or 30 days).
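To illustrate the intended control flow, here is a toy standalone sketch. Again, this is not what vacuumlazy.c actually does: the function names and the simulated lock primitives are invented stand-ins (the real code works with Buffers and ConditionalLockBufferForCleanup-style primitives), and the booleans just model whether the conflicting pin clears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for illustration; these names are not PostgreSQL's */
typedef uint32_t Xid;

typedef enum { PROCESSED_PRUNE, PROCESSED_NOPRUNE } PageOutcome;

/* Simulated lock attempts; in the server these would take a Buffer */
static bool
try_cleanup_lock(bool pin_free)
{
    return pin_free;
}

static bool
wait_cleanup_lock_ms(bool pin_released_soon, int ms)
{
    (void) ms;                  /* simulated short (tens of ms) wait */
    return pin_released_soon;
}

/*
 * Pages whose oldest XID is already >= FreezeLimit never hold back
 * relfrozenxid, so no wait is warranted for them.  Otherwise a short
 * wait is worth attempting before falling back to
 * lazy_scan_noprune-style processing, which in the worst case caps
 * the final relfrozenxid at MinXid.
 */
static PageOutcome
process_page(Xid page_oldest_xid, Xid freeze_limit,
             bool pin_free, bool pin_released_soon)
{
    if (try_cleanup_lock(pin_free))
        return PROCESSED_PRUNE;

    if (page_oldest_xid < freeze_limit &&
        wait_cleanup_lock_ms(pin_released_soon, 30))
        return PROCESSED_PRUNE;

    /* Give up; relfrozenxid may fall back toward MinXid */
    return PROCESSED_NOPRUNE;
}
```

The checkpointer scenario from above is the middle branch: the first conditional attempt fails, the pin clears within milliseconds, and the 30ms wait rescues the FreezeLimit guarantee.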
None of this even comes up for pages whose XIDs are all >= FreezeLimit, which is actually most pages with the patch, even in larger tables. It's relatively rare for VACUUM to need to process any heap page in lazy_scan_noprune, and it'll be much rarer still for it to have to do a "short wait" like this. So "short waits" have a very small downside, and (at least occasionally) a huge upside.

By inventing a third alternative behavior (to go along with processing pages via standard lazy_scan_noprune skipping and processing pages in lazy_scan_prune), VACUUM has the flexibility to respond in a way that's proportionate to the problem at hand in one particular heap page. The new behavior has zero chance of mattering in most individual tables/workloads, but it's good to have every possible eventuality covered. I really hate the idea of getting a significantly worse outcome just because of something that happened in one single heap page -- because the wind changed direction at the wrong time.

[1] https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples#Patch

--
Peter Geoghegan