Re: should vacuum's first heap pass be read-only? - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: should vacuum's first heap pass be read-only? |
Date | |
Msg-id | CAH2-WzkKM2sj8TajHyTb_QEikYVzGGatZgXH913SrnZk=xzCMw@mail.gmail.com |
In response to | Re: should vacuum's first heap pass be read-only? (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses | Re: should vacuum's first heap pass be read-only?; Re: should vacuum's first heap pass be read-only? |
List | pgsql-hackers |
On Fri, Feb 25, 2022 at 5:06 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Based on this discussion, IIUC, we are saying that now we will do the
> lazy_scan_heap every time like we are doing now. And we will
> conditionally skip the index vacuum for all or some of the indexes and
> then based on how much index vacuum is done we will conditionally do
> the lazy_vacuum_heap_rel(). Is my understanding correct?

I can only speak for myself, but that sounds correct to me.

IMO what we really want here is to create lots of options for VACUUM:
to do the work of index vacuuming when it is most convenient, based on
very recent information about what's going on in each index.

There are some specific, obvious ways that it might help. For example,
it would be nice if the failsafe didn't truly skip index vacuuming --
it could just put it off until later, after relfrozenxid has been
advanced to a safe value.

Bear in mind that the cost of lazy_scan_heap is often vastly less than
the cost of vacuuming all indexes -- and so doing a bit more work there
than is theoretically necessary is not necessarily a problem,
especially if you have something like UUID indexes, where there is no
natural locality. Many tables have 10+ indexes, even large tables.

> IMHO, if we are doing the heap scan every time and then we are going
> to get the same dead items again which we had previously collected in
> the conveyor belt. I agree that we will not add them again into the
> conveyor belt but why do we want to store them in the conveyor belt
> when we want to redo the whole scanning again?

I don't think we want to, exactly. Maybe it's easier to store redundant
TIDs than to avoid storing them in the first place (we can probably
just accept some redundancy). There is a trade-off to be made there.
I'm not at all sure what the best trade-off is, though.

> I think (without global indexes) the main advantage of using the
> conveyor belt is that if we skip the index scan for some of the
> indexes then we can save the dead item somewhere so that without
> scanning the heap again we have those dead items to do the index
> vacuum sometime in future

Global indexes are important in their own right, but ISTM that they
have similar needs to other things anyway. Having this flexibility is
even more important with global indexes, but the concepts themselves
are similar. We want options and maximum flexibility, everywhere.

> but if you are going to rescan the heap
> again next time before doing any index vacuuming then why we want to
> store them anyway.

It all depends, of course. The decision needs to be made using a cost
model. I suspect it will be necessary to try it out, and see.

--
Peter Geoghegan
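To make "the decision needs to be made using a cost model" concrete,
here is a minimal standalone C sketch of one possible per-index
"vacuum now or defer?" policy along the lines discussed above. All
struct names, fields, and thresholds here are invented for
illustration; this is not PostgreSQL code or anyone's actual proposal,
just the rough shape of such a decision.

#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical per-index bookkeeping. In a real design this
 * information would come from the conveyor belt and index statistics.
 */
typedef struct IndexVacState
{
    const char *name;
    double      dead_tids;       /* dead TIDs accumulated for this index */
    double      index_tuples;    /* approximate total tuples in the index */
    bool        failsafe_active; /* relfrozenxid is dangerously old */
} IndexVacState;

/*
 * Decide whether to vacuum this index during the current VACUUM, or
 * defer, leaving its dead TIDs in the conveyor belt for a later pass.
 */
static bool
should_vacuum_index_now(const IndexVacState *idx)
{
    /*
     * Under the failsafe, advancing relfrozenxid takes priority:
     * defer index vacuuming rather than skipping it outright, as
     * suggested above.
     */
    if (idx->failsafe_active)
        return false;

    /* Vacuum once, say, 2% of the index is dead (threshold invented). */
    return idx->dead_tids / idx->index_tuples > 0.02;
}

int
main(void)
{
    IndexVacState indexes[] = {
        {"pk_idx",   5000.0,  1000000.0, false},
        {"uuid_idx", 30000.0, 1000000.0, false},
    };

    for (int i = 0; i < 2; i++)
        printf("%s: %s\n", indexes[i].name,
               should_vacuum_index_now(&indexes[i])
               ? "vacuum now" : "defer (keep TIDs in conveyor belt)");
    return 0;
}

In this sketch only uuid_idx crosses the (made-up) 2% threshold, so
the other index's TIDs would stay in the conveyor belt until some
later VACUUM finds it worthwhile to process them.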