
From: Dilip Kumar
Subject: Re: should vacuum's first heap pass be read-only?
Date:
Msg-id: CAFiTN-voaUVoKhEiJ_yjxvdgQndgnZeDN=KiMCaaU4Pv-p7vNQ@mail.gmail.com
In response to: Re: should vacuum's first heap pass be read-only? (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Fri, Feb 25, 2022 at 10:45 PM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Fri, Feb 25, 2022 at 5:06 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > Based on this discussion, IIUC, we are saying that we will still do
> > lazy_scan_heap every time, as we do now.  And we will conditionally
> > skip the index vacuum for all or some of the indexes, and then, based
> > on how much index vacuuming was done, we will conditionally do
> > lazy_vacuum_heap_rel().  Is my understanding correct?
>
> Bear in mind that the cost of lazy_scan_heap is often vastly less than
> the cost of vacuuming all indexes -- and so doing a bit more work
> there than theoretically necessary is not necessarily a problem.
> Especially if you have something like UUID indexes, where there is no
> natural locality. Many tables have 10+ indexes. Even large tables.

Completely agree with that.
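
Just so we are picturing the same control flow that I described above,
here is a rough sketch in C (purely illustrative -- none of these toy
types, helper names, or the threshold exist in the actual vacuum code):

#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these types or helpers are real PostgreSQL code. */
typedef struct IndexInfoToy
{
    double bloat_score;          /* how badly this index needs vacuuming */
    bool   vacuumed_this_cycle;
} IndexInfoToy;

typedef struct TableInfoToy
{
    IndexInfoToy *indexes;
    int           nindexes;
    size_t        ndead_collected;     /* dead TIDs found by this heap scan */
    size_t        ndead_on_conveyor;   /* dead TIDs remembered from before */
} TableInfoToy;

/* Stand-in for a per-index cost-model decision. */
static bool
index_vacuum_is_worthwhile(const IndexInfoToy *idx)
{
    return idx->bloat_score > 0.2;     /* invented threshold */
}

static void
one_vacuum_cycle(TableInfoToy *tbl)
{
    bool all_indexes_done = true;
    int  i;

    /* Phase 1: the heap scan still runs every cycle and collects dead
     * TIDs -- in the real thing this is lazy_scan_heap(). */

    /* Phase 2: vacuum only the indexes that look worth the trouble. */
    for (i = 0; i < tbl->nindexes; i++)
    {
        if (index_vacuum_is_worthwhile(&tbl->indexes[i]))
            tbl->indexes[i].vacuumed_this_cycle = true;   /* index vacuum */
        else
            all_indexes_done = false;                     /* skipped */
    }

    /*
     * Phase 3: the second heap pass (lazy_vacuum_heap_rel() in the real
     * code) can only reclaim line pointers once every index has been
     * vacuumed for them, so run it only when nothing was skipped;
     * otherwise the dead TIDs stay on the conveyor belt for later.
     */
    if (all_indexes_done)
        tbl->ndead_on_conveyor = 0;
    else
        tbl->ndead_on_conveyor += tbl->ndead_collected;
}

The point being that phase 1 and the per-index decision run every cycle,
while the second heap pass only runs once every index has caught up.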

> > IMHO, if we are doing the heap scan every time, then we are going to
> > get the same dead items again that we had previously collected in the
> > conveyor belt.  I agree that we will not add them to the conveyor belt
> > again, but why do we want to store them in the conveyor belt at all
> > when we are going to redo the whole scan anyway?
>
> I don't think we want to, exactly. Maybe it's easier to store
> redundant TIDs than to avoid storing them in the first place (we can
> probably just accept some redundancy). There is a trade-off to be made
> there. I'm not at all sure of what the best trade-off is, though.

Yeah, we can think about that.
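
If we did decide to avoid the redundancy rather than accept it, one
option (just a toy sketch, and it assumes we can stream the existing
conveyor-belt contents in heap order alongside the freshly collected
TIDs -- none of the types or functions below are real) would be a simple
ordered merge that appends only the TIDs we have not stored yet:

#include <stddef.h>

/* Toy dead-TID representation; not PostgreSQL's ItemPointerData. */
typedef struct DeadTidToy
{
    unsigned int   block;
    unsigned short offset;
} DeadTidToy;

static int
deadtid_cmp(DeadTidToy a, DeadTidToy b)
{
    if (a.block != b.block)
        return (a.block < b.block) ? -1 : 1;
    if (a.offset != b.offset)
        return (a.offset < b.offset) ? -1 : 1;
    return 0;
}

/*
 * Append only the TIDs that are not already on the conveyor belt.  Both
 * "stored" (what the belt already has) and "batch" (what this heap scan
 * collected) are assumed to be in heap order, which a sequential heap
 * scan naturally gives us.
 */
static size_t
append_nonredundant(const DeadTidToy *stored, size_t nstored,
                    const DeadTidToy *batch, size_t nbatch,
                    void (*store)(DeadTidToy))
{
    size_t i = 0, j = 0, nadded = 0;

    for (; j < nbatch; j++)
    {
        /* Skip past stored TIDs that sort before the current new TID. */
        while (i < nstored && deadtid_cmp(stored[i], batch[j]) < 0)
            i++;

        if (i < nstored && deadtid_cmp(stored[i], batch[j]) == 0)
            continue;                   /* duplicate, already stored */

        store(batch[j]);
        nadded++;
    }
    return nadded;
}

Whether reading the belt back like that is actually cheaper than just
accepting some duplicate TIDs is exactly the trade-off you are
describing, I think.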

> > I think (without global indexes) the main advantage of using the
> > conveyor belt is that if we skip the index scan for some of the
> > indexes, then we can save the dead items somewhere so that, without
> > scanning the heap again, we have those dead items available to do the
> > index vacuum sometime in the future,
>
> Global indexes are important in their own right, but ISTM that they
> have similar needs to other things anyway. Having this flexibility is
> even more important with global indexes, but the concepts themselves
> are similar. We want options and maximum flexibility, everywhere.

+1

> > but if you are going to rescan the heap
> > again next time before doing any index vacuuming, then why would we
> > want to store them anyway?
>
> It all depends, of course. The decision needs to be made using a cost
> model. I suspect it will be necessary to try it out, and see.

Yeah, right.  But I still think that we should also be thinking about
skipping the first vacuum pass conditionally.  I mean, if there are not
many new dead tuples, which we can know even before starting the heap
scan, then why not jump directly to the index vacuuming if some of the
indexes need it?  But I agree that, based on some testing and a cost
model, we can decide on the best way forward.
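
Roughly, the kind of pre-check I am imagining before the heap scan even
starts (again purely illustrative -- every name and threshold below is
invented):

#include <stdbool.h>
#include <stddef.h>

typedef struct VacuumPlanToy
{
    bool do_heap_scan;
    bool do_index_vacuum;
} VacuumPlanToy;

static VacuumPlanToy
plan_vacuum_cycle(size_t new_dead_estimate,   /* e.g. from the stats system */
                  size_t tids_on_conveyor,    /* remembered from earlier scans */
                  double worst_index_bloat)   /* some per-index need metric */
{
    VacuumPlanToy plan;

    /* Does at least one index look like it needs vacuuming? */
    plan.do_index_vacuum = (worst_index_bloat > 0.2);

    /*
     * If hardly anything new has died since the last heap scan, but some
     * index still needs work, jump straight to index vacuuming using the
     * dead TIDs already on the conveyor belt instead of scanning the
     * heap again.
     */
    plan.do_heap_scan = !(new_dead_estimate < 1000 &&
                          tids_on_conveyor > 0 &&
                          plan.do_index_vacuum);

    return plan;
}

Where the new-dead-tuples estimate comes from and what the thresholds
should be is, of course, exactly what the testing and the cost model
would have to answer.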

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


