Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: decoupling table and index vacuum
Msg-id CAH2-Wz=8E5QecDmzVcEWhwCyVhc2wsGRzviDZq0CyCwiv=zgLw@mail.gmail.com
In response to Re: decoupling table and index vacuum  (Andres Freund <andres@anarazel.de>)
Responses Re: decoupling table and index vacuum  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Apr 22, 2021 at 11:44 AM Andres Freund <andres@anarazel.de> wrote:
> I'm honestly getting a bit annoyed about this stuff.

You're easily annoyed.

> Yes it's a cool
> improvement, but no, it doesn't mean that there aren't still relevant
> issues in important cases. It doesn't help that you repeatedly imply
> that people that don't see it your way need to have their view "cleared
> up".

I don't think that anything that I've said about it contradicts
anything that you or Robert said. What I said is that you're missing a
couple of important subtleties (or that you seem to be). It's not
really about the optimization in particular -- it's about the
subtleties that it exploits. I think that they're generalizable. Even
if there was only a 1% chance of that being true, it would still be
worth exploring in depth.

I think that everybody's beliefs about VACUUM tend to be correct. It
almost doesn't matter if scenario A is the problem in 90% of cases
versus 10% of cases for scenario B (or vice-versa). What actually
matters is that we have good handling for both. (It's probably some
weird combination of scenario A and scenario B in any case.)

> "Bottom up index deletion" is practically *irrelevant* for a significant
> set of workloads.

You're missing the broader point, which is that we don't know how much
it helps in each case, just as we don't know how much some other
complementary optimization helps. It's important to develop
complementary techniques precisely because (say) bottom-up index
deletion only solves one class of problem. And because it's so hard to
predict.

I actually went on at length about the cases where the optimization
*doesn't* help, because those will be a disproportionate source of
problems now. And you really need to avoid all of the big sources of
trouble to get a really good outcome. Avoiding each and every source
of trouble might be much, much more useful than avoiding all but one.

> > You both seem to be assuming that everything would be fine if you
> > could somehow inexpensively know the total number of undeleted dead
> > tuples in each index at all times.
>
> I don't think we'd need an exact number. Just a reasonable approximation
> so we know whether it's worth spending time vacuuming some index.

I agree.

> You also have to assume that you have roughly evenly distributed index
> insertions and deletions. But workloads that insert into some parts of a
> value range and delete from another range are common.
>
> I even would say that *precisely* because "Bottom up index deletion" can
> be very efficient in some workloads it is useful to have per-index stats
> determining whether an index should be vacuumed or not.

Exactly!
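
To make the per-index decision concrete, here is a minimal sketch of
the kind of check being discussed. It is not taken from any patch in
this thread: the IndexStats fields, the function name, and the 20%
default threshold are all hypothetical, and the dead tuple counts are
assumed to be rough estimates rather than exact numbers.

```python
from dataclasses import dataclass


@dataclass
class IndexStats:
    """Hypothetical per-index estimates; none of these fields exist in PostgreSQL."""
    name: str
    est_live_tuples: int
    est_dead_tuples: int  # approximate count of not-yet-deleted dead index tuples


def indexes_worth_vacuuming(stats, dead_fraction_threshold=0.2):
    """Return the names of indexes whose estimated dead fraction meets the threshold.

    The threshold is an illustrative assumption, not a tuned value.
    """
    chosen = []
    for s in stats:
        total = s.est_live_tuples + s.est_dead_tuples
        if total == 0:
            continue  # empty index: nothing to gain from vacuuming it
        if s.est_dead_tuples / total >= dead_fraction_threshold:
            chosen.append(s.name)
    return chosen
```

An index where bottom-up deletion keeps up would report a low
estimated dead fraction and be skipped, while an index on the same
table that gets no such benefit would still be picked up.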

> Except that heap bloat not index bloat might be the more pressing
> concern. Or that there will be no meaningful amount of bottom-up
> deletions. Or ...

Exactly!

-- 
Peter Geoghegan
