Re: should vacuum's first heap pass be read-only? - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: should vacuum's first heap pass be read-only?
Date:
Msg-id: CAH2-Wz=igvvGSPzXvhdgy3v69X02MK7Yg7To_YF4Oj2hrv3ZvA@mail.gmail.com
In response to: Re: should vacuum's first heap pass be read-only? (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: should vacuum's first heap pass be read-only? (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Tue, Apr 5, 2022 at 1:10 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I had assumed that this would not be the case, because if the page is
> being accessed by the workload, it can be pruned - and probably frozen
> too, if we wanted to write code for that and spend the cycles on it -
> and if it isn't, pruning and freezing probably aren't needed.

VACUUM has a top-down structure, and so seems to me to be the natural place to think about the high-level needs of the table as a whole, especially over time.

I don't think we actually need to scan the pages that we left some LP_DEAD items in during previous VACUUM operations. It seems possible to freeze newly appended pages quite often, without needlessly revisiting the pages from previous batches (even those with LP_DEAD items left behind). Maybe we need to rethink the definition of "VACUUM operation" a little to do that, but it seems relatively tractable.

As I said upthread recently, I am excited about the potential of "locking in" a set of scanned_pages using a local/private version of the visibility map (a copy taken just after OldestXmin is initially established), which VACUUM can work off of entirely. Especially if combined with the conveyor belt, which could make VACUUM operations suspendable and resumable.

I don't see any reason why it wouldn't be possible to "lock in" an initial scanned_pages, and then use that data structure (which could be persisted) to avoid revisiting the pages that we know we already visited (and left LP_DEAD items in). We could "resume the VACUUM operation that was suspended earlier" a bit later (rather than running several technically unrelated VACUUM operations in close succession). The later rounds of processing could even use new cutoffs for both pruning and freezing, despite being part of "the same VACUUM operation". They could have an "expanded rel_pages" that covers the newly appended pages that we want to quickly freeze tuples on.
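The "locked in" scanned_pages idea described above can be sketched as follows. This is a toy model in Python, not PostgreSQL C: the class name `VMSnapshot` and all of its methods are hypothetical illustrations of the design, not real PostgreSQL code. It shows how a private snapshot of the visibility map fixes the work list at the start, lets a suspended VACUUM resume without revisiting completed pages, and supports an "expanded rel_pages" covering newly appended pages.

```python
# Hypothetical sketch: a private "VM snapshot" that locks in the set of
# pages a (suspendable) VACUUM operation must visit. Pages already
# processed are never revisited on resume; rel_pages can be expanded to
# take in newly appended pages without rescanning old ones.

class VMSnapshot:
    def __init__(self, all_visible_pages, rel_pages):
        self.rel_pages = rel_pages
        # Work list fixed at snapshot time: every page that was not
        # all-visible when OldestXmin was established.
        self.to_scan = {p for p in range(rel_pages)
                        if p not in all_visible_pages}
        self.done = set()

    def next_page(self):
        # Lowest-numbered page still needing a visit, or None if finished.
        remaining = self.to_scan - self.done
        return min(remaining) if remaining else None

    def mark_done(self, page):
        self.done.add(page)

    def expand(self, new_rel_pages):
        # Newly appended pages join the work list ("expanded rel_pages");
        # previously processed pages keep their state.
        self.to_scan |= set(range(self.rel_pages, new_rel_pages))
        self.rel_pages = new_rel_pages
```

Because the snapshot is a plain set of page numbers plus a done-set, it could in principle be persisted alongside conveyor-belt state and reloaded when the operation resumes.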
AFAICT the only thing that we need to do to make this safe is to carry forward our original vacrel->NewRelfrozenXid (which can never be later than our original vacrel->OldestXmin).

Under this architecture, we don't really "skip index vacuuming" at all. Rather, we redefine VACUUM operations in a way that makes the final rel_pages provisional, at least when run in autovacuum. VACUUM itself can notice that it might be a good idea to "expand rel_pages" and expand the scope of the work it ultimately does, based on the observed characteristics of the table. No heap pages get repeat processing per "VACUUM operation" (relative to the current definition of the term). Some indexes will get "extra, earlier index vacuuming", which we've already said is the right way to think about all this (we should think of it as extra index vacuuming, not less index vacuuming).

> > But, these same LP_DEAD-heavy tables *also* have a very decent
> > chance of benefiting from a better index vacuuming strategy, something
> > *also* enabled by the conveyor belt design. So overall, in either scenario,
> > VACUUM concentrates on problems that are particular to a given table
> > and workload, without being hindered by implementation-level
> > restrictions.
>
> Well this is what I'm not sure about. We need to demonstrate that
> there are at least some workloads where retiring the LP_DEAD line
> pointers doesn't become the dominant concern.

It will eventually become the dominant concern. But that could take a while, compared to the growth in indexes. An LP_DEAD line pointer stub in a heap page is 4 bytes. The smallest possible B-Tree index tuple is 20 bytes on mainstream platforms (16 bytes + 4 byte line pointer). Granted, deduplication makes this less true, but that's far from guaranteed to help. Also, many tables have way more than one index. Of course it isn't nearly as simple as comparing the bytes of bloat in each case.
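The byte comparison above can be made concrete with some back-of-envelope arithmetic. This is an illustrative sketch only (the helper `bloat_bytes` is made up for this example); it uses the two figures quoted in the paragraph, and deliberately ignores deduplication, page-level overhead, and everything else that makes real bloat accounting harder.

```python
# Back-of-envelope bloat arithmetic from the figures quoted above:
# a dead heap line pointer stub is 4 bytes, while the smallest possible
# B-Tree index tuple is 20 bytes (16-byte tuple + 4-byte line pointer).

LP_DEAD_BYTES = 4            # stub left behind in the heap page
MIN_BTREE_TUPLE_BYTES = 20   # smallest index tuple, mainstream platforms

def bloat_bytes(dead_rows, n_indexes):
    """Crude lower bound on bytes of bloat: heap stubs vs. index tuples."""
    heap_bloat = dead_rows * LP_DEAD_BYTES
    index_bloat = dead_rows * n_indexes * MIN_BTREE_TUPLE_BYTES
    return heap_bloat, index_bloat

# With 1 million dead rows and 3 indexes, the heap carries ~4 MB of
# LP_DEAD stubs while the indexes carry ~60 MB of dead tuples.
heap, idx = bloat_bytes(dead_rows=1_000_000, n_indexes=3)
```

Even under these simplified assumptions, index bloat per dead row exceeds heap line pointer bloat by 5x per index, which is why retiring LP_DEAD stubs need not be the dominant concern for a long time on multi-index tables.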
More generally, I don't claim that it's easy to characterize which factor is more important, even in the abstract, even under ideal conditions -- it's very hard. But I'm sure that there are routinely very large differences among indexes and the heap structure.

--
Peter Geoghegan