Re: decoupling table and index vacuum - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: decoupling table and index vacuum
Date:
Msg-id: 20210421213825.op5pfkjzr5lc3sqi@alap3.anarazel.de
In response to: decoupling table and index vacuum (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: decoupling table and index vacuum (Robert Haas <robertmhaas@gmail.com>)
           Re: decoupling table and index vacuum (Antonin Houska <ah@cybertec.at>)
List: pgsql-hackers
Hi,

On 2021-04-21 11:21:31 -0400, Robert Haas wrote:
> Opportunistic index cleanup strategies like
> kill_prior_tuple and bottom-up deletion may work much better for some
> indexes than others, meaning that you could have some indexes that
> badly need to be vacuumed because they are full of garbage, and other
> indexes on the same table where the opportunistic cleanup has worked
> perfectly and there is no need for vacuuming at all.

Partial indexes are another case that can lead to some individual
indexes being free of bloat while others are severely bloated.


> This requires a scheme where relations can be efficiently truncated
> from the beginning rather than only at the end, which is why I said "a
> conveyor belt" and "similar to WAL". Details deliberately vague since
> I am just brainstorming here.

I'm not sure that's the only way to deal with this. While some form of
generic "conveyor belt" infrastructure would be a useful building block,
and it'd be sensible to use it here if it existed, it seems feasible to
store the dead tids in a different way here. You could e.g. have
per-heap-vacuum files with a header containing LSNs that indicate the
age of the contents.
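
Just to make that concrete, a rough standalone sketch of what such a
file could look like - all struct and field names below are made up
for illustration, stand-ins for ItemPointerData/XLogRecPtr rather than
a proposal for an actual on-disk format:

#include <stdint.h>

/*
 * Hypothetical layout of a per-heap-vacuum dead-TID file.  The LSNs in
 * the header indicate how old the contents are, so a later index
 * vacuum can tell which portion of the file it has already processed.
 */
typedef struct DeadTidFileHeader
{
    uint64_t    start_lsn;      /* WAL position when the heap scan started */
    uint64_t    end_lsn;        /* WAL position when the heap scan finished */
    uint32_t    ntids;          /* number of DeadTid entries that follow */
} DeadTidFileHeader;

typedef struct DeadTid
{
    uint32_t    blkno;          /* heap block number */
    uint16_t    offnum;         /* line pointer offset within that block */
} DeadTid;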


> This scheme adds a lot of complexity, which is a concern, but it seems
> to me that it might have several benefits. One is concurrency. You
> could have one process gathering dead TIDs and adding them to the
> dead-TID fork while another process is vacuuming previously-gathered
> TIDs from some index.

I think it might even open the door to using multiple processes
gathering dead TIDs for the same relation.


> In fact, every index could be getting vacuumed at the same time, and
> different indexes could be removing different TID ranges.

We kind of have this feature right now, due to parallel vacuum...


> It's not completely independent: if you need to set some dead TIDs in
> the table to unused, you may have to force index vacuuming that isn't
> needed for bloat control. However, you only need to force it for
> indexes that haven't been vacuumed recently enough for some other
> reason, rather than every index.

Hm - how would we know how recently that TID has been marked dead? We
don't even have xids for dead ItemIds... Maybe you're intending to
answer that in your next paragraph, but it's not obvious to me that'd be
sufficient...

> If you have a target of reclaiming 30,000 TIDs, you can just pick the
> indexes where there are fewer than 30,000 dead TIDs behind their
> oldest-entry pointers and force vacuuming only of those. By the time
> that's done, there will be at least 30,000 dead line pointers you can
> mark unused, and maybe more, minus whatever reclamation someone else
> did concurrently.
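
As an aside, that selection rule seems simple enough to sketch. The
per-index bookkeeping below is entirely invented for illustration -
nothing tracks a per-index position in a shared dead-TID stream today:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Invented bookkeeping: for each index, how many entries of the shared
 * dead-TID stream lie behind its oldest-entry pointer, i.e. how many
 * it has already vacuumed away.
 */
typedef struct IndexVacState
{
    const char *idxname;
    uint64_t    tids_vacuumed;  /* dead TIDs already removed from this index */
} IndexVacState;

/*
 * Pick the indexes that must be force-vacuumed so that at least
 * reclaim_target dead line pointers can afterwards be set to unused in
 * the heap; indexes that are already past the target can be skipped.
 */
static void
select_indexes_to_vacuum(const IndexVacState *indexes, size_t nindexes,
                         uint64_t reclaim_target)
{
    for (size_t i = 0; i < nindexes; i++)
    {
        if (indexes[i].tids_vacuumed < reclaim_target)
            printf("force vacuum of %s (%llu of %llu TIDs processed)\n",
                   indexes[i].idxname,
                   (unsigned long long) indexes[i].tids_vacuumed,
                   (unsigned long long) reclaim_target);
    }
}

int
main(void)
{
    IndexVacState indexes[] = {
        {"idx_a", 50000},       /* far enough along already, skipped */
        {"idx_b", 12000},       /* blocks reclaiming the first 30k entries */
    };

    select_indexes_to_vacuum(indexes, 2, 30000);
    return 0;
}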



One thing that you didn't mention so far is that this'd allow us to add
dead TIDs to the "dead tid" file outside of vacuum too. In some
workloads most of the dead tuple removal happens as part of on-access
HOT pruning. While some indexes are likely to see that via the
killtuples logic, others may not. Being able to have more aggressive
index vacuum for the one or two bloated indexes, without needing to rescan
the heap, seems like it'd be a significant improvement.
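
To illustrate what I mean, a toy sketch of such a hook - the function
names are invented, this is not an existing pruning interface:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for ItemPointerData (block number + offset). */
typedef struct DeadTid
{
    uint32_t    blkno;
    uint16_t    offnum;
} DeadTid;

/* Stub standing in for "append these TIDs to the shared dead-TID file". */
static void
dead_tid_file_append(const DeadTid *tids, size_t ntids)
{
    for (size_t i = 0; i < ntids; i++)
        printf("recorded dead TID (%u,%u)\n",
               (unsigned) tids[i].blkno, (unsigned) tids[i].offnum);
}

/*
 * Invented hook: after on-access HOT pruning has marked line pointers
 * dead, hand them to the dead-TID file so that index vacuum can later
 * remove them without another heap scan.
 */
static void
record_pruned_tids(uint32_t blkno, const uint16_t *dead_offsets, size_t ndead)
{
    DeadTid     tids[291];      /* MaxHeapTuplesPerPage for 8kB pages */

    assert(ndead <= 291);
    for (size_t i = 0; i < ndead; i++)
    {
        tids[i].blkno = blkno;
        tids[i].offnum = dead_offsets[i];
    }
    dead_tid_file_append(tids, ndead);
}

int
main(void)
{
    uint16_t    dead_offsets[] = {3, 7, 12};

    record_pruned_tids(42, dead_offsets, 3);
    return 0;
}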


> Suppose index A needs to be vacuumed every hour to avoid bloat, index
> B needs to be vacuumed every 4 hours to avoid bloat, and the table
> needs dead line pointers reclaimed every 5.5 hours. Well, now you can
> gain a lot.  You can vacuum index A frequently while vacuuming index B
> only as often as it needs, and you can reclaim dead line pointers on
> their own schedule based on whatever index vacuuming was already done
> for bloat avoidance. Without this scheme, there's just no way to give
> everybody what they need without some of the participants being
> "dragged along for the ride" and forced into work that they don't
> actually need done simply because "that's how it works."

Have you thought about how we would do the scheduling of vacuums for the
different indexes? We don't really have useful stats for the number of
dead index entries to be expected in an index. It'd not be hard to track
how many entries are removed in an index via killtuples, but
e.g. estimating how many dead entries there are in a partial index seems
quite hard (at least without introducing significant overhead).
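
For illustration, the counters that would be cheap to maintain might
look something like this (hypothetical names, not existing pgstat
fields); the number scheduling actually wants - dead entries still
present in the index - is exactly what's not in there:

#include <stdint.h>

/*
 * Hypothetical per-index counters that vacuum scheduling could look
 * at.  Removals are cheap to count as they happen; dead entries that
 * nothing has visited yet - or, for a partial index, heap tuples that
 * never had an index entry at all - leave no trace here.
 */
typedef struct IndexCleanupStats
{
    uint64_t    removed_killtuples;     /* removed via kill_prior_tuple */
    uint64_t    removed_bottomup;       /* removed via bottom-up deletion */
    uint64_t    removed_by_vacuum;      /* removed by index vacuum proper */
} IndexCleanupStats;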


> One thing I don't know is whether the kind of scenario that I describe
> above is common, i.e. is the main reason we need to vacuum to control
> index bloat, where this kind of approach seems likely to help, or is
> it to reclaim dead line pointers in the heap, where it's not? I'd be
> interested in hearing from people who have some experience in this
> area, or at least better intuition than I do.

I think doing something like this has a fair bit of potential. Being
able to perform freezing independently of index scans, without needing
to scan the table again to re-discover dead line pointers, seems like
it'd be a win. More aggressive/targeted index vacuum in cases where
most tuples are removed via HOT pruning seems like a win. Not having to
restart from scratch after a cancelled autovacuum would be a
win. Additional parallelization seems like a win...


> One rather serious objection to this whole line of attack is that we'd
> ideally like VACUUM to reclaim disk space without using any more, in
> case a shortage of disk space is the motivation for running VACUUM in
> the first place.

I suspect we'd need a global limit on the space used for this data. If
we were above that limit, we'd switch to immediately performing the
work required to free up some of that space.


> A related objection is that if it's sometimes agreeable to do
> everything all at once as we currently do, the I/O overhead could be
> avoided. I think we'd probably have to retain a code path that buffers
> the dead TIDs in memory to account, at least, for the
> low-on-disk-space case, and maybe that can also be used to avoid I/O
> in some other cases, too.

We'd likely want to do some batching of insertions into the "dead tid"
map - which'd probably end up looking similar to a purely in-memory path
anyway.
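
Roughly like the sketch below - the batch size and the flush function
are invented for illustration. Once insertions are buffered like this,
the purely in-memory path is more or less just the case where the
buffer never needs to be flushed to disk:

#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for ItemPointerData. */
typedef struct DeadTid
{
    uint32_t    blkno;
    uint16_t    offnum;
} DeadTid;

#define DEAD_TID_BATCH_SIZE 1024        /* invented batch size */

typedef struct DeadTidBatch
{
    DeadTid     tids[DEAD_TID_BATCH_SIZE];
    size_t      ntids;
} DeadTidBatch;

/* Stand-in for writing a full batch out to the on-disk dead-TID map. */
static void
dead_tid_batch_flush(DeadTidBatch *batch)
{
    /* ... append batch->tids to the dead-TID file here ... */
    batch->ntids = 0;
}

/* Buffer insertions so the on-disk map sees large, sequential appends. */
static void
dead_tid_batch_add(DeadTidBatch *batch, DeadTid tid)
{
    batch->tids[batch->ntids++] = tid;
    if (batch->ntids == DEAD_TID_BATCH_SIZE)
        dead_tid_batch_flush(batch);
}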

Greetings,

Andres Freund


