Re: should vacuum's first heap pass be read-only? - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: should vacuum's first heap pass be read-only? |
Date | |
Msg-id | CAH2-WzkKM2sj8TajHyTb_QEikYVzGGatZgXH913SrnZk=xzCMw@mail.gmail.com |
In response to | Re: should vacuum's first heap pass be read-only? (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses | Re: should vacuum's first heap pass be read-only?; Re: should vacuum's first heap pass be read-only? |
List | pgsql-hackers |
On Fri, Feb 25, 2022 at 5:06 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Based on this discussion, IIUC, we are saying that now we will do the
> lazy_scan_heap every time like we are doing now. And we will
> conditionally skip the index vacuum for all or some of the indexes and
> then based on how much index vacuum is done we will conditionally do
> the lazy_vacuum_heap_rel(). Is my understanding correct?

I can only speak for myself, but that sounds correct to me.

IMO what we really want here is to create lots of options for VACUUM:
to do the work of index vacuuming when it is most convenient, based on
very recent information about what's going on in each index.

There are some specific, obvious ways that it might help. For example,
it would be nice if the failsafe didn't truly skip index vacuuming --
it could just put it off until later, after relfrozenxid has been
advanced to a safe value.

Bear in mind that the cost of lazy_scan_heap is often vastly less than
the cost of vacuuming all indexes -- and so doing a bit more work there
than is theoretically necessary is not necessarily a problem,
especially if you have something like UUID indexes, where there is no
natural locality. Many tables have 10+ indexes, even large tables.

> IMHO, if we are doing the heap scan every time and then we are going
> to get the same dead items again which we had previously collected in
> the conveyor belt. I agree that we will not add them again into the
> conveyor belt but why do we want to store them in the conveyor belt
> when we want to redo the whole scanning again?

I don't think we want to, exactly. Maybe it's easier to store redundant
TIDs than to avoid storing them in the first place (we can probably
just accept some redundancy). There is a trade-off to be made there.
I'm not at all sure what the best trade-off is, though.

> I think (without global indexes) the main advantage of using the
> conveyor belt is that if we skip the index scan for some of the
> indexes then we can save the dead item somewhere so that without
> scanning the heap again we have those dead items to do the index
> vacuum sometime in future

Global indexes are important in their own right, but ISTM that they
have similar needs to other things anyway. Having this flexibility is
even more important with global indexes, but the concepts themselves
are similar. We want options and maximum flexibility, everywhere.

> but if you are going to rescan the heap
> again next time before doing any index vacuuming then why we want to
> store them anyway.

It all depends, of course. The decision needs to be made using a cost
model. I suspect it will be necessary to try it out, and see.

--
Peter Geoghegan
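To make "the decision needs to be made using a cost model" concrete,
here is a minimal standalone C sketch of one possible per-index
"vacuum now or defer?" policy along the lines discussed above. All
struct names, fields, and thresholds here are invented for
illustration; this is not PostgreSQL code or anyone's actual proposal,
just the rough shape of such a decision.

#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical per-index bookkeeping. In a real design this
 * information would come from the conveyor belt and index statistics.
 */
typedef struct IndexVacState
{
    const char *name;
    double      dead_tids;       /* dead TIDs accumulated for this index */
    double      index_tuples;    /* approximate total tuples in the index */
    bool        failsafe_active; /* relfrozenxid is dangerously old */
} IndexVacState;

/*
 * Decide whether to vacuum this index during the current VACUUM, or
 * defer, leaving its dead TIDs in the conveyor belt for a later pass.
 */
static bool
should_vacuum_index_now(const IndexVacState *idx)
{
    /*
     * Under the failsafe, advancing relfrozenxid takes priority:
     * defer index vacuuming rather than skipping it outright, as
     * suggested above.
     */
    if (idx->failsafe_active)
        return false;

    /* Vacuum once, say, 2% of the index is dead (threshold invented). */
    return idx->dead_tids / idx->index_tuples > 0.02;
}

int
main(void)
{
    IndexVacState indexes[] = {
        {"pk_idx",   5000.0,  1000000.0, false},
        {"uuid_idx", 30000.0, 1000000.0, false},
    };

    for (int i = 0; i < 2; i++)
        printf("%s: %s\n", indexes[i].name,
               should_vacuum_index_now(&indexes[i])
               ? "vacuum now" : "defer (keep TIDs in conveyor belt)");
    return 0;
}

In this sketch only uuid_idx crosses the (made-up) 2% threshold, so
the other index's TIDs would stay in the conveyor belt until some
later VACUUM finds it worthwhile to process them.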