Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Removing more vacuumlazy.c special cases, relfrozenxid optimizations |
Date | |
Msg-id | CAH2-Wznp=c=Opj8Z7RMR3G=ec3_JfGYMN_YvmCEjoPCHzWbx0g@mail.gmail.com |
Responses | Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations |
List | pgsql-hackers |
Attached WIP patch series significantly simplifies the definition of scanned_pages inside vacuumlazy.c. Apart from making several very tricky things a lot simpler, and moving more complex code outside of the big "blkno" loop inside lazy_scan_heap (building on the Postgres 14 work), this refactoring directly facilitates 2 new optimizations (also in the patch):

1. We now collect LP_DEAD items into the dead_tuples array for all scanned pages -- even when we cannot get a cleanup lock.

2. We now don't give up on advancing relfrozenxid during a non-aggressive VACUUM when we happen to be unable to get a cleanup lock on a heap page.

Both optimizations are much more natural with the refactoring in place. Especially #2, which can be thought of as making aggressive and non-aggressive VACUUM behave similarly. Sure, we shouldn't wait for a cleanup lock in a non-aggressive VACUUM (by definition) -- and we still don't in the patch (obviously). But why wouldn't we at least *check* if the page has tuples that need to be frozen in order for us to advance relfrozenxid? Why give up on advancing relfrozenxid in a non-aggressive VACUUM when there's no good reason to?

See the draft commit messages from the patch series for many more details on the simplifications I am proposing.

I'm not sure how much value the second optimization has on its own. But I am sure that the general idea of teaching non-aggressive VACUUM to be conscious of the value of advancing relfrozenxid is a good one -- and so #2 is a good start on that work, at least. I've discussed this idea with Andres (CC'd) a few times before now.

Maybe we'll need another patch that makes VACUUM avoid setting heap pages to all-visible without also setting them to all-frozen (and freezing as necessary) in order to really get a benefit. Since, of course, a non-aggressive VACUUM still won't be able to advance relfrozenxid when it skipped over all-visible pages that are not also known to be all-frozen.
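
To make optimization #2 concrete, here is a minimal toy model in C. It is a sketch under stated assumptions, not code from the patch series: every name (ToyPage, ToyResult, toy_page_needs_freeze, toy_vacuum, the check_unlocked flag) is hypothetical, XIDs are compared as plain integers with wraparound ignored, and "advancing relfrozenxid" is reduced to a single yes/no answer for a given freeze limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not PostgreSQL's real structures. */
typedef uint32_t ToyXid;        /* plain integer comparison; wraparound ignored */

#define TOY_FROZEN_XID 2u       /* stand-in marker for "already frozen" */
#define TOY_MAX_TUPLES 4

typedef struct ToyPage
{
    bool   cleanup_lock_available;  /* could we get a cleanup lock without waiting? */
    int    ntuples;
    ToyXid xmin[TOY_MAX_TUPLES];
} ToyPage;

typedef struct ToyResult
{
    bool can_advance;   /* may relfrozenxid be advanced to freeze_limit? */
    int  nfrozen;       /* tuples frozen during this pass */
} ToyResult;

/* Does any tuple on the page carry an unfrozen XID older than freeze_limit? */
bool
toy_page_needs_freeze(const ToyPage *page, ToyXid freeze_limit)
{
    for (int i = 0; i < page->ntuples; i++)
    {
        if (page->xmin[i] != TOY_FROZEN_XID && page->xmin[i] < freeze_limit)
            return true;
    }
    return false;
}

/*
 * One non-aggressive pass over all pages. We never wait for a cleanup lock.
 * With check_unlocked = false, any missed lock makes us give up on advancing
 * relfrozenxid. With check_unlocked = true, we give up only if the page we
 * could not lock actually holds a tuple that would have needed freezing.
 */
ToyResult
toy_vacuum(ToyPage *pages, int npages, ToyXid freeze_limit, bool check_unlocked)
{
    ToyResult result = {true, 0};

    assert(pages != NULL && npages >= 0);

    for (int p = 0; p < npages; p++)
    {
        ToyPage *page = &pages[p];

        if (page->cleanup_lock_available)
        {
            /* Normal path: freeze every tuple older than the limit. */
            for (int i = 0; i < page->ntuples; i++)
            {
                if (page->xmin[i] != TOY_FROZEN_XID && page->xmin[i] < freeze_limit)
                {
                    page->xmin[i] = TOY_FROZEN_XID;
                    result.nfrozen++;
                }
            }
        }
        else if (!check_unlocked || toy_page_needs_freeze(page, freeze_limit))
        {
            /* No lock, and either we did not look or an old XID remains. */
            result.can_advance = false;
        }
    }
    return result;
}
```

In this model, a page that cannot be locked but whose only tuple has xmin 400 blocks advancement to a freeze limit of 100 when check_unlocked is false, and does not block it when check_unlocked is true. A page that cannot be locked and still holds xmin 10 blocks advancement either way, since freezing that tuple would have required the lock.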
Masahiko (CC'd) has expressed interest in working on opportunistic freezing. This refactoring patch seems related to that general area, too. At a high level, to me, this seems like the tuple freezing equivalent of the Postgres 14 work on bypassing index vacuuming when there are very few LP_DEAD items (interpret that as 0 LP_DEAD items, which is close to the truth anyway).

There are probably quite a few interesting opportunities to make VACUUM better by not having such a sharp distinction between aggressive and non-aggressive VACUUM. Why should they be so different? A good medium-term goal might be to completely eliminate aggressive VACUUMs. I have heard many stories about anti-wraparound/aggressive VACUUMs where the cure (which suddenly made autovacuum workers non-cancellable) was worse than the disease (not actually much danger of wraparound failure). For example:

https://www.joyent.com/blog/manta-postmortem-7-27-2015

Yes, this problem report is from 2015, which is before we even had the freeze map stuff. I still think that the point about aggressive VACUUMs blocking DDL (leading to chaos) remains valid.

There is another interesting area of future optimization within VACUUM that also seems relevant to this patch: the general idea of *avoiding* pruning during VACUUM, when it just doesn't make sense to do so -- better to avoid dirtying the page for now. Needlessly pruning inside lazy_scan_prune is hardly rare -- standard pgbench (maybe only with heap fill factor reduced to 95) will have autovacuums that *constantly* do it (granted, it may not matter so much there because VACUUM is unlikely to re-dirty the page anyway). This patch seems relevant to that area because it recognizes that pruning during VACUUM is not necessarily special -- a new function called lazy_scan_noprune may be used instead of lazy_scan_prune (though only when a cleanup lock cannot be acquired). These pages are nevertheless considered fully processed by VACUUM (this is perhaps 99% true, so it seems reasonable to round up to 100% true).

I find it easy to imagine generalizing the same basic idea -- recognizing more ways in which pruning by VACUUM isn't necessarily better than opportunistic pruning, at the level of each heap page. Of course we *need* to prune sometimes (e.g., it might be necessary to do so to set the page all-visible in the visibility map), but why bother when we don't, and when there is no reason to think that it'll help anyway? Something to think about, at least.

--
Peter Geoghegan