Re: should vacuum's first heap pass be read-only? - Mailing list pgsql-hackers
From: Dilip Kumar
Subject: Re: should vacuum's first heap pass be read-only?
Date:
Msg-id: CAFiTN-tf=cg8Xxcz5vpDozTQO3Q2tXvqmt4uVPYn_WmZOBDcdA@mail.gmail.com
In response to: Re: should vacuum's first heap pass be read-only? (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: should vacuum's first heap pass be read-only?
List: pgsql-hackers
On Mon, Feb 7, 2022 at 10:06 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Feb 4, 2022 at 4:12 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > I had imagined that we'd want to do heap vacuuming in the same way
> > as today with the dead TID conveyor belt stuff -- it just might
> > take several VACUUM operations until we are ready to do a round of
> > heap vacuuming.
>
> I am trying to understand exactly what you are imagining here. Do you
> mean we'd continue to lazy_scan_heap() at the start of every vacuum,
> and lazy_vacuum_heap_rel() at the end? I had assumed that we didn't
> want to do that, because we might already know from the conveyor belt
> that there are some dead TIDs that could be marked unused, and it
> seems strange to just ignore that knowledge at a time when we're
> scanning the heap anyway. However, on reflection, that approach has
> something to recommend it, because it would be somewhat simpler to
> understand what's actually being changed. We could just:
>
> 1. Teach lazy_scan_heap() that it should add TIDs to the conveyor
> belt, if we're using one, unless they're already there, but otherwise
> work as today.
>
> 2. Teach lazy_vacuum_heap_rel() that, if there is a conveyor belt, it
> should try to clear from the indexes all of the dead TIDs that are
> eligible.
>
> 3. If there is a conveyor belt, use some kind of magic to decide when
> to skip vacuuming some or all indexes. When we skip one or more
> indexes, the subsequent lazy_vacuum_heap_rel() can't possibly mark as
> unused any of the dead TIDs we found this time, so we should just
> skip it, unless somehow there are TIDs on the conveyor belt that were
> already ready to be marked unused at the start of this VACUUM, in
> which case we can still handle those.

Based on this discussion, IIUC, the plan is that we will still do
lazy_scan_heap() every time, as we do now; we will conditionally skip
the index vacuum for all or some of the indexes; and then, based on
how much index vacuuming was done, we will conditionally do
lazy_vacuum_heap_rel(). Is my understanding correct?

IMHO, if we do the heap scan every time, we will just collect the same
dead items that we previously stored in the conveyor belt. I agree
that we will not add them to the conveyor belt again, but why store
them in the conveyor belt at all if we are going to redo the whole
scan anyway? I think (leaving global indexes aside) the main advantage
of the conveyor belt is that, if we skip the index scan for some of
the indexes, we can save the dead items somewhere so that we can
vacuum those indexes at some point in the future without scanning the
heap again. But if we are going to rescan the heap before doing any
index vacuuming anyway, why store them at all?

IMHO, what we should do instead is this: if there are not many new
dead tuples in the heap (the total dead-tuple estimate from the
statistics minus the items already in the conveyor belt), then we
should conditionally skip the heap scan (the first pass) and jump
directly to index vacuuming for some or all of the indexes, based on
how bloated each index is.
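To make that concrete, here is a rough sketch of the check I have in
mind. The function, its parameters, and the threshold are all invented
for illustration; nothing like this exists in the tree:

    #include <stdbool.h>

    /*
     * Decide whether the first heap pass can be skipped because the
     * conveyor belt already holds almost all of the dead TIDs that a
     * fresh lazy_scan_heap() would just re-collect.
     *
     * est_dead_tuples    -- dead-tuple estimate from the statistics
     * conveyor_dead_tids -- dead TIDs already in the conveyor belt
     * skip_threshold     -- tolerated fraction of new dead tuples,
     *                       say 0.1
     */
    static bool
    should_skip_first_heap_pass(double est_dead_tuples,
                                double conveyor_dead_tids,
                                double skip_threshold)
    {
        /* Dead tuples accumulated since the conveyor belt was filled. */
        double      new_dead = est_dead_tuples - conveyor_dead_tids;

        if (new_dead < 0)
            new_dead = 0;       /* the statistics are only an estimate */

        /*
         * If few new dead tuples have accumulated, rescanning the heap
         * would mostly re-find TIDs the conveyor belt already holds, so
         * we can jump straight to vacuuming whichever indexes are
         * bloated enough to be worth it.
         */
        return new_dead <= est_dead_tuples * skip_threshold;
    }

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com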