Re: should vacuum's first heap pass be read-only? - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: should vacuum's first heap pass be read-only?
Msg-id: CAH2-WzmV_eZ+=g8mBavygHwRVr1yxsx+WM+AspBZUiP_BEyr1A@mail.gmail.com
In response to: Re: should vacuum's first heap pass be read-only? (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Thu, Mar 31, 2022 at 5:31 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > I agree.
>
> But in http://postgr.es/m/CA+Tgmoa6kVEeurtyeOi3a+rA2XuynwQmJ_s-h4kUn6-bKMMDRw@mail.gmail.com
> (and the messages just before and just after it) we seemed to be
> agreeing on a design where that's exactly what happens. It seemed like
> a good idea to me at the time, but now it seems like it's a bad idea,
> because it involves using the conveyor belt in a way that adds no
> value.

There are two types of heap scans (in my mind, at least): those that prune, and those that VACUUM. While there has traditionally been a 1:1 correspondence between these two scans (barring cases with no LP_DEAD items whatsoever), that's no longer the case as of Postgres 14, which added the "bypass index scan in the event of few LP_DEAD items left by pruning" optimization (or Postgres 12, if you count INDEX_CLEANUP=off).

When I said "I agree" earlier today, I imagined that I was pretty much affirming everything else that I'd said up until that point of the email -- namely, that the conveyor belt is interesting as a way of breaking (or even just loosening) dependencies on the *order* in which we perform work within a given "VACUUM cycle". Things can be much looser than they are today, with indexes (which we've discussed a lot already), and even with heap pruning (which I brought up for the first time just today).

However, I don't see any way that it will be possible to break one particular ordering dependency, even with the conveyor belt stuff: the "basic invariant" described in the comments above lazy_scan_heap(), which lays out the rules for TID recycling -- we can only recycle TIDs when a full "VACUUM cycle" completes, just like today.
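To make the Postgres 14 bypass optimization mentioned above concrete, here is a toy C sketch of the decision it makes (this is not the actual vacuumlazy.c code; the constant name and 2% value are modeled on Postgres 14, but the real test also considers factors like the size of the dead-items array):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Modeled on Postgres 14's BYPASS_THRESHOLD_PAGES (0.02): if pruning
 * left LP_DEAD items on fewer than 2% of heap pages, the cost of a
 * full scan of every index likely outweighs the space it reclaims,
 * so index vacuuming (and the second heap pass) can be skipped.
 */
#define BYPASS_THRESHOLD_PAGES 0.02

static bool
should_bypass_index_vacuuming(int64_t rel_pages, int64_t lpdead_item_pages)
{
    /* Bypass when very few heap pages have LP_DEAD items */
    return lpdead_item_pages < (int64_t) (rel_pages * BYPASS_THRESHOLD_PAGES);
}
```

The point of the sketch is that the bypass decouples the pruning scan from index vacuuming: pruning always happens, but the VACUUM-proper work becomes conditional.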
That was a point I was making in the email from back in February: obviously it's unsafe to do lazy_vacuum_heap_page() processing of a page until we're already 100% sure that the LP_DEAD items are not referenced by any indexes, even indexes that have very little bloat (that don't really need to be vacuumed for their own sake).

However, the conveyor belt can add value by doing much more frequent processing in lazy_scan_prune() (of different pages each time, or perhaps even repeat processing of the same heap pages), and much more frequent index vacuuming for those indexes that seem to need it. So the lazy_scan_prune() work (pruning and freezing) can and probably should be separated in time from the index vacuuming (compared to the current design). Maybe not for all of the indexes -- typically for the majority, maybe 8 out of 10. We can do much less index vacuuming in those indexes that don't really need it, in order to be able to do much more in those that do. At some point we must still "complete a whole cycle of heap vacuuming" by processing all the heap pages that need it using lazy_vacuum_heap_page().

Separately, the conveyor belt seems to have promise as a way of breaking up work for multiplexing, or parallel processing.

--
Peter Geoghegan
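[The ordering invariant discussed above can be illustrated with a toy C model -- not Postgres code, just a sketch of the rule that dead TIDs may only be recycled (LP_DEAD set to LP_UNUSED by the second heap pass) once every index has been vacuumed within the current cycle, however loosely the work inside the cycle is scheduled:]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of one "VACUUM cycle". Pruning and per-index vacuuming
 * may be interleaved and repeated freely, but lazy_vacuum_heap_page()
 * style recycling is only safe once no index can still reference the
 * dead TIDs, i.e. once all indexes have been vacuumed this cycle.
 */
typedef struct VacuumCycle
{
    int nindexes;          /* total indexes on the table */
    int nindexes_vacuumed; /* indexes vacuumed so far this cycle */
} VacuumCycle;

static void
vacuum_one_index(VacuumCycle *cycle)
{
    if (cycle->nindexes_vacuumed < cycle->nindexes)
        cycle->nindexes_vacuumed++;
}

static bool
can_recycle_dead_tids(const VacuumCycle *cycle)
{
    /* The second heap pass must wait for every index, every cycle */
    return cycle->nindexes_vacuumed == cycle->nindexes;
}
```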