Re: should vacuum's first heap pass be read-only? - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: should vacuum's first heap pass be read-only?
Msg-id: CAH2-Wzkt9Ey9NNm7q9nSaw5jdBjVsAq3yvb4UT4M93UaJVd_xg@mail.gmail.com
In response to: Re: should vacuum's first heap pass be read-only? (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: should vacuum's first heap pass be read-only?
List: pgsql-hackers
On Thu, Mar 31, 2022 at 1:25 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > I don't think we want to, exactly. Maybe it's easier to store
> > redundant TIDs than to avoid storing them in the first place (we can
> > probably just accept some redundancy). There is a trade-off to be made
> > there. I'm not at all sure of what the best trade-off is, though.
>
> My intuition is that storing redundant TIDs will turn out to be a very
> bad idea. I think that if we do that, the system will become prone to
> a new kind of vicious cycle (to try to put this in words similar to
> the ones you've been using to write about similar effects).

I don't feel very strongly about it either way. I definitely think
that there are workloads for which that will be true, and in general I
am no fan of putting off work that cannot possibly turn out to be
unnecessary in the end. I am in favor of "good laziness", not "bad
laziness".

> The more data we collect on the conveyor belt, the
> harder it's going to be when we eventually need to deduplicate.
>
> Also, I think it's possible to deduplicate at a very reasonable cost
> so long as (1) we enter each batch of TIDs into the conveyor belt as a
> distinguishable "run"

I definitely think that's the way to go, in general (regardless of
what we do about deduplicating TIDs).

> and (2) we never accumulate so many of these
> runs at the same time that we can't do a single merge pass to turn
> them into a sorted output stream. We're always going to discover dead
> TIDs in sorted order, so as we make our pass over the heap, we can
> simultaneously be doing an on-the-fly merge pass over the existing
> runs that are on the conveyor belt. That means we never need to store
> all the dead TIDs in memory at once.

That's a good idea IMV. I am vaguely reminded of an LSM tree.

> We just fetch enough data from
> each "run" to cover the block numbers for which we're performing the
> heap scan, and when the heap scan advances we throw away data for
> blocks that we've passed and fetch data for the blocks we're now
> reaching.

I wonder if you've thought about exploiting the new approach to
skipping pages using the visibility map from my patch series (which
you reviewed recently). I think that putting that in scope here could
be very helpful. It could make the stuff we already want to do with
[global] indexes easier, while also being independently useful work
based on the conveyor belt. The visibility map is very underexploited
as a source of accurate information about what VACUUM should be doing
IMV (in autovacuum.c's scheduling logic, but also in vacuumlazy.c
itself).

Imagine a world in which we decide up-front what pages we're going to
scan (i.e. our final vacrel->scanned_pages), by first scanning the
visibility map, and serializing it in local memory, or sometimes on
disk using the conveyor belt. Now you'll have a pretty good idea how
much TID deduplication will be necessary when you're done (to give one
example). In general, "locking in" a plan of action for VACUUM seems
like it would be very useful. It will tend to cut down on the number
of "dead but not yet removable" tuples that VACUUM encounters right
now -- you at least avoid visiting concurrently modified heap pages
that were all-visible back when OldestXmin was first established.

When all skippable ranges are known in advance, we can reorder things
in many different ways -- since the work of vacuuming can be
decomposed on the lazy_scan_heap side, too.
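To make the "locking in" idea a little more concrete, here is a rough,
self-contained sketch in C. It is purely illustrative -- invented
names, a plain bool array standing in for the visibility map, none of
the real vacuumlazy.c interfaces, and no SKIP_PAGES_THRESHOLD-style
heuristics or all-frozen handling -- but it shows the shape of it:
read the VM once, serialize the result as the plan, and have every
later stage work off that plan instead of going back to the VM:

/* Toy model: decide the final set of pages to scan before the heap pass. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t BlockNumber;

/* Stand-in for one VM lookup; the real thing reads the visibility map. */
static bool
page_is_all_visible(const bool *vm, BlockNumber blkno)
{
    return vm[blkno];
}

/*
 * Build the locked-in "plan": every block that is not all-visible as of
 * right now.  Serializing this (in memory, or on the conveyor belt)
 * means later stages never re-read the VM, and we know scanned_pages --
 * and so roughly how much dead-TID work to expect -- before the heap
 * pass even starts.
 */
static BlockNumber *
lock_in_scanned_pages(const bool *vm, BlockNumber nblocks,
                      BlockNumber *nscanned)
{
    BlockNumber *plan = malloc(sizeof(BlockNumber) * nblocks);
    BlockNumber  n = 0;

    for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
    {
        if (!page_is_all_visible(vm, blkno))
            plan[n++] = blkno;
    }

    *nscanned = n;
    return plan;
}

int
main(void)
{
    /* 8 heap pages; pages 2, 3, 4 and 6 are all-visible in the "VM" */
    bool         vm[] = {false, false, true, true, true, false, true, false};
    BlockNumber  nscanned;
    BlockNumber *plan = lock_in_scanned_pages(vm, 8, &nscanned);

    /*
     * The heap pass (or several decomposed heap passes) can now process
     * plan[0..nscanned-1] in whatever order is convenient.
     */
    for (BlockNumber i = 0; i < nscanned; i++)
        printf("will scan block %u\n", (unsigned) plan[i]);

    free(plan);
    return 0;
}

Since scanned_pages is known before the first heap page is read, the
expected amount of dead-TID work (and whether this VACUUM is even
worth doing right now) can be costed up front rather than discovered
as we go.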
The only ordering dependency among heap pages (that I can think of
offhand) is FSM vacuuming, which seems like it could be addressed
without great difficulty. Ideally everything can be processed in
whatever order is convenient as of right now, based on costs and
benefits. We could totally decompose the largest VACUUM operations
into individually processable units of work (heap and index units), so
that individual autovacuum workers no longer own particular VACUUM
operations at all. Autovacuum workers would instead multiplex all of
the units of work from all pending VACUUMs, scheduled centrally based
on the current load of the system. We can probably afford to be much
more sensitive to the number of pages we'll dirty relative to what the
system can take right now, and so on. With this design, cancelling an
autovacuum worker may just make the system temporarily suspend the
VACUUM operation, to be resumed later (in fact, cancelling an
autovacuum may not really be meaningful at all).

A design like this could also enable things like changing our mind
about advancing relfrozenxid: if we decided to skip all-visible pages
in this VACUUM operation, and come to regret that decision by the end
(because lots of XIDs were consumed in the interim), maybe we can go
back and get the all-visible pages we missed the first time around.

I don't think that all of these ideas will turn out to be winners, but
I'm just trying to stimulate discussion. Breaking almost all
dependencies on the order in which we process things seems like it has
real promise.

> > It all depends, of course. The decision needs to be made using a cost
> > model. I suspect it will be necessary to try it out, and see.
>
> But having said that, coming back to this with fresh eyes, I think
> Dilip has a really good point here. If the first thing we do at the
> start of every VACUUM is scan the heap in a way that is guaranteed to
> rediscover all of the dead TIDs that we've previously added to the
> conveyor belt plus maybe also new ones, we may as well just forget the
> whole idea of having a conveyor belt at all.

I definitely agree that that's bad, and that it would be the
inevitable result of being lazy about deduplicating consistently.

> The only way the conveyor belt system has any
> value is if we think that there is some set of circumstances where the
> heap scan is separated in time from the index vacuum, such that we
> might sometimes do an index vacuum without having done a heap scan
> just before.

I agree.

--
Peter Geoghegan