Re: Emit fewer vacuum records by reaping removable tuples during pruning - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date
Msg-id 8a33d7cc-25fa-4949-a08f-998946861274@gmail.com
In response to Re: Emit fewer vacuum records by reaping removable tuples during pruning  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 1/8/24 2:10 PM, Robert Haas wrote:
> On Fri, Jan 5, 2024 at 3:57 PM Andres Freund <andres@anarazel.de> wrote:
>>> I will be astonished if you can make this work well enough to avoid
>>> huge regressions in plausible cases. There are plenty of cases where
>>> we do a very thorough job opportunistically removing index tuples.
>>
>> These days the AM is often involved with that, via
>> table_index_delete_tuples()/heap_index_delete_tuples(). That IIRC has to
>> happen before physically removing the already-marked-killed index entries. We
>> can't rely on being able to actually prune the heap page at that point, there
>> might be other backends pinning it, but often we will be able to. If we were
>> to prune below heap_index_delete_tuples(), we wouldn't need to recheck that
>> index again during "individual tuple pruning", if the to-be-marked-unused heap
>> tuple is one of the tuples passed to heap_index_delete_tuples(). Which
>> presumably will be very commonly the case.
>>
>> At least for nbtree, we are much more aggressive about marking index entries
>> as killed, than about actually removing the index entries. "individual tuple
>> pruning" would have to look for killed-but-still-present index entries, not
>> just for "live" entries.
> 
> I don't want to derail this thread, but I don't really see what you
> have in mind here. The first paragraph sounds like you're imagining
> that while pruning the index entries we might jump over to the heap
> and clean things up there, too, but that seems like it wouldn't work
> if the table has more than one index. I thought you were talking about
> starting with a heap tuple and bouncing around to every index to see
> if we can find index pointers to kill in every one of them. That
> *could* work out, but you only need one index to have been
> opportunistically cleaned up in order for it to fail to work out.
> There might well be some workloads where that's often the case, but
> the regressions in the workloads where it isn't the case seem like
> they would be rather substantial, because doing an extra lookup in
> every index for each heap tuple visited sounds pricey.

The idea of probing indexes for tuples that are now dead has come up in 
the past, and the concern has always been whether it's actually safe to 
do so. An obvious example is an expression index whose underlying 
function has since changed, so you can't reliably recompute the index 
key and determine whether a particular tuple is present in the index. 
That's bad enough during an index scan, but potentially worse while 
doing heap cleanup. Given that operators are functions, this risk 
exists to some degree even in simple indexes.
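To make that concrete, here's a minimal SQL sketch of the kind of 
hazard I mean (the function, table, and index names are made up for 
illustration):

-- An expression index whose key is computed by a user-defined function.
CREATE FUNCTION normalize_email(text) RETURNS text
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT lower($1) $$;

CREATE TABLE users (id int, email text);
CREATE INDEX users_email_idx ON users (normalize_email(email));

-- Later someone "fixes" the function despite its IMMUTABLE marking:
CREATE OR REPLACE FUNCTION normalize_email(text) RETURNS text
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT lower(btrim($1)) $$;

-- Index entries built under the old definition still contain the old
-- keys. Recomputing normalize_email(email) for a dead heap tuple can
-- now yield a key that doesn't match the entry actually stored in the
-- index, so a probe driven from the heap side could miss the entry it
-- needs to find and kill.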

Depending on the gains, this might still be worth doing, at least for 
some cases. It's hard to conceive of this breaking for indexes on plain 
integer columns, for example. But we'd still need to be cautious.
-- 
Jim Nasby, Data Architect, Austin TX



