Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From Robert Haas
Subject Re: decoupling table and index vacuum
Date
Msg-id CA+TgmoYu+iWzpbB+OQDoM2VdyzwF9r3S_4Ndmd8vW_aHKiNzjg@mail.gmail.com
In response to Re: decoupling table and index vacuum  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: decoupling table and index vacuum  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Wed, Apr 21, 2021 at 7:51 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I'm very happy to see that you've taken an interest in this work! I
> believe it's an important area. It's too important to be left to only
> two contributors. I welcome your participation as an equal partner in
> the broader project to fix problems with VACUUM.

Err, thanks. I agree this needs broad discussion and participation.

> My most ambitious goal is finding a way to remove the need to freeze
> or to set hint bits. I think that we can do this by inventing a new
> kind of VACUUM just for aborted transactions, which doesn't do index
> vacuuming. You'd need something like an ARIES-style dirty page table
> to make this cheap -- so it's a little like UNDO, but not very much.

I don't see how that works. An aborted transaction can have made index
entries, and those index entries can have already been moved by page
splits, and there can be arbitrarily many of them, so that you can't
keep track of them all in RAM. Also, you can crash after the index
entries have been made and written out to disk, but before the abort
happens. Anyway, this is probably a topic for a separate thread.

> I know I say this all the time these days, but it seems worth
> repeating now: it is a qualitative difference, not a quantitative
> difference.

For the record, I find your quantitative vs. qualitative distinction
to be mostly unhelpful in understanding what's actually going on here.
I've backed into it by reading the explanatory statements you've made
at various times (including here, in the part I didn't quote). But
that phrase in and of itself means very little to me. Other people's
mileage may vary, of course; I'm just telling you how I feel about it.

> Right. And, the differences between index A and index B will tend to
> be pretty consistent and often much larger than this.
>
> Many indexes would never have to be vacuumed, even with non-HOT
> UPDATEs, due to bottom-up index deletion -- because they literally
> won't even have one single page split for hours, while maybe one index
> gets 3x larger in the same timeframe. Eventually you'll need to vacuum
> the indexes all the same (not just the bloated index), but that's only
> required to enable safely performing heap vacuuming. It's not so bad
> if the non-bloated indexes won't be dirtied and if it's not so
> frequent (dirtying pages is the real cost to keep under control here).

Interesting.
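To make the idea above concrete, here's a toy sketch (hypothetical names and thresholds, not actual PostgreSQL code or catalogs) of what a per-index vacuum decision could look like if each index exposed its own bloat signals, such as page splits, rather than all of a table's indexes being vacuumed in lockstep:

```python
# Hypothetical sketch: choose which indexes to vacuum based on per-index
# bloat signals. The field names and thresholds are invented for
# illustration; they do not correspond to real PostgreSQL structures.

def indexes_needing_vacuum(indexes, split_threshold=100, growth_threshold=1.5):
    """Pick only the indexes whose own bloat justifies the cost.

    An index that has seen almost no page splits (e.g. because bottom-up
    index deletion kept it tidy) is skipped; one that grew 3x is chosen.
    """
    chosen = []
    for idx in indexes:
        if idx["page_splits"] >= split_threshold:
            chosen.append(idx["name"])
        elif idx["size"] / idx["baseline_size"] >= growth_threshold:
            chosen.append(idx["name"])
    return chosen

indexes = [
    {"name": "idx_tidy",    "page_splits": 0,    "size": 100, "baseline_size": 100},
    {"name": "idx_bloated", "page_splits": 5000, "size": 300, "baseline_size": 100},
]
print(indexes_needing_vacuum(indexes))  # ['idx_bloated']
```

The point is just that the tidy index is never touched (and never dirtied), even though it belongs to the same table as the bloated one; eventually all indexes would still need a pass to allow heap vacuuming.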

> The cost shouldn't be terribly noticeable because you have the
> flexibility to change your mind at the first sign of an issue. So you
> never pay an extreme cost (you pay a pretty low fixed cost
> incrementally, at worst), but you do sometimes (and maybe even often)
> get an extreme benefit -- the benefit of avoiding current pathological
> performance issues. We know that the cost of bloat is very non-linear
> in a bunch of ways that can be pretty well understood, so that seems
> like the right thing to focus on -- this is perhaps the only thing
> that we can expect to understand with a relatively high degree of
> confidence. We can live with a lot of uncertainty about what's going
> on with the workload by managing it continually, ramping up and down,
> etc.

I generally agree. You want to design a system in a way that's going
to do a good job avoiding pathological cases. The current system is
kind of naive about that. It does things that work well in
middle-of-the-road cases, but often does stupid things in extreme
cases. There are numerous examples of that; one is the "useless
vacuuming" problem, about which I've blogged at
http://rhaas.blogspot.com/2020/02/useless-vacuuming.html, where the
system keeps on vacuuming because relfrozenxid is old but never
actually succeeds in advancing it, so it's just spinning to no
purpose. Another thing is when it picks the "wrong" thing to do first,
focusing on a less urgent problem rather than a more urgent one. This
can go either way: we might spend a lot of energy cleaning up bloat
when a wraparound shutdown is imminent, but we also might spend a lot
of energy dealing with a wraparound issue that's not yet urgent while
some table bloats out of control. I think it's important not to let
the present discussion get overbroad; we won't be able to solve
everything at once, and trying to do too many things at the same time
will likely result in instability.
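The "wrong thing first" failure mode can be sketched as a prioritization problem. The following is a hypothetical illustration (the normalization constants and field names are invented, and real autovacuum scheduling works nothing like this simple loop): the idea is that an imminent wraparound should beat moderate bloat, while runaway bloat should beat a wraparound that is still far off.

```python
# Hypothetical sketch: pick the most urgent maintenance task across tables
# by normalizing two competing pressures. Thresholds are invented.

def pick_next_task(tables, xid_danger=200_000_000, bloat_danger=4.0):
    """Return ("wraparound", name) or ("bloat", name) for the most urgent table.

    Each pressure is scaled so that a value >= 1.0 means "danger": an
    imminent wraparound outranks moderate bloat, and vice versa.
    """
    best = None
    for t in tables:
        wrap_urgency = t["xid_age"] / xid_danger
        bloat_urgency = t["bloat_ratio"] / bloat_danger
        if wrap_urgency >= bloat_urgency:
            cand = (wrap_urgency, "wraparound", t["name"])
        else:
            cand = (bloat_urgency, "bloat", t["name"])
        if best is None or cand[0] > best[0]:
            best = cand
    return best[1], best[2]

tables = [
    {"name": "orders", "xid_age": 50_000_000,  "bloat_ratio": 8.0},   # bloating fast
    {"name": "logs",   "xid_age": 100_000_000, "bloat_ratio": 1.1},   # wraparound far off
]
print(pick_next_task(tables))  # ('bloat', 'orders')
```

Here "orders" wins because its bloat pressure (8.0 / 4.0 = 2.0) exceeds every other pressure in the system, which is the behavior the current system can get wrong in both directions.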

> > Clearly,
> > we do not want to vacuum each partition by scanning the 1GB partition
> > + the 50MB local index + the 50GB global index. That's insane. With
> > the above system, since everything's decoupled, you can vacuum the
> > partition tables individually as often as required, and similarly for
> > their local indexes, but put off vacuuming the global index until
> > you've vacuumed a bunch, maybe all, of the partitions, so that the
> > work of cleaning up the global index cleans up dead TIDs from many or
> > all partitions instead of just one at a time.
>
> I can't think of any other way of sensibly implementing global indexes.

Awesome.
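For the record, the decoupled scheme Peter quotes above can be sketched roughly as follows. This is a deliberately simplified illustration with invented names, not a design for the actual data structure: each partition's heap and local-index vacuum contributes its dead TIDs to a shared accumulator, and the large global index is scanned only once, after many partitions, rather than once per partition.

```python
# Hypothetical sketch: decoupled global-index cleanup for a partitioned
# table. Dead TIDs accumulate across partition vacuums; one pass over the
# global index then removes entries from many partitions at once.

class DeadTidAccumulator:
    def __init__(self):
        self.dead_tids = []  # (partition, tid) pairs awaiting index cleanup

    def vacuum_partition(self, partition, dead):
        # Heap + local-index vacuum for one partition; the global index is
        # not touched here, only the dead TIDs are recorded.
        self.dead_tids.extend((partition, tid) for tid in dead)

    def vacuum_global_index(self):
        # A single scan of the global index removes the dead TIDs from
        # *all* partitions vacuumed since the last pass.
        removed = len(self.dead_tids)
        self.dead_tids.clear()
        return removed

acc = DeadTidAccumulator()
acc.vacuum_partition("p1", [101, 102])
acc.vacuum_partition("p2", [201])
acc.vacuum_partition("p3", [301, 302, 303])
print(acc.vacuum_global_index())  # 6
```

The payoff is exactly the one described above: three partition vacuums, but only one scan of the (potentially 50GB) global index instead of three.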

-- 
Robert Haas
EDB: http://www.enterprisedb.com


