Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: decoupling table and index vacuum
Date
Msg-id CAH2-Wzk3Z3KhyMtrw6RHB1f+gVVKcaVWgPE0sBFEmjnL0ur5Fg@mail.gmail.com
Whole thread Raw
In response to Re: decoupling table and index vacuum  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: decoupling table and index vacuum  (Robert Haas <robertmhaas@gmail.com>)
Re: decoupling table and index vacuum  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Sun, Feb 6, 2022 at 11:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > One thing we could try doing in order to make that easier would be:
> > tweak things so that when autovacuum vacuums the table, it only
> > vacuums the indexes if they meet some threshold for bloat. I'm not
> > sure exactly what happens with the heap vacuuming then - do we do
> > phases 1 and 2 always, or a combined heap pass, or what? But if we
> > pick some criteria that vacuums indexes sometimes and not other times,
> > we can probably start doing some meaningful measurement of whether
> > this patch is making bloat better or worse, and whether it's using
> > fewer or more resources to do it.
>
> I think we can always trigger phase 1 and 2 and phase 2 will only
> vacuum conditionally based on if all the indexes are vacuumed for some
> conveyor belt pages so we don't have risk of scanning without marking
> anything unused.

Not sure what you mean about a risk of scanning without marking any
LP_DEAD items as LP_UNUSED. If VACUUM always does some amount of this,
then it follows that the new mechanism added by the patch just can't
safely avoid any work at all, making it all pointless. We have to
expect heap vacuuming to take place much less often with the patch.
Simply because that's what the invariant described in comments above
lazy_scan_heap() requires.

Note that this is not the same thing as saying that we do less
*absolute* heap vacuuming with the conveyor belt -- my statement about
less heap vacuuming taking place is *only* true relative to the amount
of other work that happens in any individual "shortened" VACUUM
operation. We could do exactly the same total amount of heap vacuuming
as before (in a version of Postgres without the conveyor belt but with
the same settings), but much *more* index vacuuming (at least for one
or two problematic indexes).

> And we can try to measure with other approaches as
> well where we completely avoid phase 2 and it will be done only along
> with phase 1 whenever applicable.

I believe that the main benefit of the dead TID conveyor belt (outside
of global index use cases) will be to enable us to do more (much more)
index vacuuming for one index in particular. So it's not really about
doing less index vacuuming or less heap vacuuming -- it's about doing
a *greater* amount of *useful* index vacuuming, in less time. There is
often some way in which failing to vacuum one index for a long time
does lasting damage to the index structure.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Next
From: Robert Haas
Date:
Subject: Re: decoupling table and index vacuum