On 1/8/24 2:10 PM, Robert Haas wrote:
> On Fri, Jan 5, 2024 at 3:57 PM Andres Freund <andres@anarazel.de> wrote:
>>> I will be astonished if you can make this work well enough to avoid
>>> huge regressions in plausible cases. There are plenty of cases where
>>> we do a very thorough job opportunistically removing index tuples.
>>
>> These days the AM is often involved with that, via
>> table_index_delete_tuples()/heap_index_delete_tuples(). That IIRC has to
>> happen before physically removing the already-marked-killed index entries. We
>> can't rely on being able to actually prune the heap page at that point, there
>> might be other backends pinning it, but often we will be able to. If we were
>> to prune below heap_index_delete_tuples(), we wouldn't need to recheck that
>> index again during "individual tuple pruning", if the to-be-marked-unused heap
>> tuple is one of the tuples passed to heap_index_delete_tuples(). Which
>> presumably will be very commonly the case.
>>
>> At least for nbtree, we are much more aggressive about marking index entries
>> as killed, than about actually removing the index entries. "individual tuple
>> pruning" would have to look for killed-but-still-present index entries, not
>> just for "live" entries.
>
> I don't want to derail this thread, but I don't really see what you
> have in mind here. The first paragraph sounds like you're imagining
> that while pruning the index entries we might jump over to the heap
> and clean things up there, too, but that seems like it wouldn't work
> if the table has more than one index. The second paragraph sounds more
> like you're talking about starting with a heap tuple and bouncing around
> to every index to see if we can find index pointers to kill in every one
> of them. That
> *could* work out, but you only need one index to have been
> opportunistically cleaned up in order for it to fail to work out.
> There might well be some workloads where that's often the case, but
> the regressions in the workloads where it isn't the case seem like
> they would be rather substantial, because doing an extra lookup in
> every index for each heap tuple visited sounds pricey.

The idea of probing indexes for tuples that are now dead has come up in
the past, and the concern has always been whether it's actually safe to
do so. An obvious example is an expression index: if the underlying
function's behavior has changed since the entries were built, you can't
reliably determine whether a particular tuple is present in the index.
That's bad enough during an index scan, but potentially worse while
doing heap cleanup. Given that operators are functions, this risk exists
to some degree even in simple indexes.
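
To make the hazard concrete: any such probe has to rebuild the index key
from the (dead) heap tuple before it can search the index, roughly along
these lines. This is only a sketch -- the wrapper function is invented for
illustration, but FormIndexDatum()/BuildIndexInfo() are the real
recomputation steps.

#include "postgres.h"

#include "catalog/index.h"
#include "executor/tuptable.h"
#include "nodes/execnodes.h"
#include "utils/rel.h"

/*
 * Hypothetical sketch of the key-recomputation step an "individual tuple
 * pruning" probe would need.  The function itself is made up.
 */
static void
recompute_probe_key(Relation indexRel, TupleTableSlot *deadTupleSlot,
                    EState *estate, Datum *values, bool *isnull)
{
    IndexInfo  *indexInfo = BuildIndexInfo(indexRel);

    /*
     * Re-evaluate the indexed columns/expressions against the dead heap
     * tuple.  For an expression index this re-runs the user's function;
     * if its behavior has drifted since the entry was inserted (i.e. it
     * was never truly immutable), the key computed here may not match
     * the key actually stored in the index, so a probe based on it can
     * miss the entry or kill the wrong one.
     */
    FormIndexDatum(indexInfo, deadTupleSlot, estate, values, isnull);
}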

Depending on the gains, this might still be worth doing, at least in
some cases; it's hard to conceive of this breaking for an index on plain
integers, for example. But we'd still need to be cautious about which
indexes qualify.
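
If we did go that direction, I'd expect it to be gated on the indexes
involved looking "safe enough" to probe, something like the check below.
Again purely a sketch: the function name and the exact set of tests are
invented, though the catalog accessors are real.

#include "postgres.h"

#include "access/genam.h"
#include "access/nbtree.h"
#include "catalog/pg_am.h"
#include "catalog/pg_proc.h"
#include "nodes/pg_list.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/relcache.h"

/*
 * Hypothetical policy check: does probing this index for entries pointing
 * at dead heap tuples look plausibly safe?  Invented for illustration.
 */
static bool
index_probe_looks_safe(Relation indexRel)
{
    /* Expression and partial indexes depend on user code; skip them. */
    if (RelationGetIndexExpressions(indexRel) != NIL ||
        RelationGetIndexPredicate(indexRel) != NIL)
        return false;

    /* Only consider btree, whose search semantics are well understood. */
    if (indexRel->rd_rel->relam != BTREE_AM_OID)
        return false;

    /* Require an immutable ordering proc for every key column. */
    for (AttrNumber attno = 1;
         attno <= IndexRelationGetNumberOfKeyAttributes(indexRel);
         attno++)
    {
        RegProcedure cmpproc = index_getprocid(indexRel, attno, BTORDER_PROC);

        if (!RegProcedureIsValid(cmpproc) ||
            func_volatility(cmpproc) != PROVOLATILE_IMMUTABLE)
            return false;
    }

    return true;
}

An int4 btree passes a check like that, while an index on a user-defined
function (or one using a suspect opclass) would not, which is roughly the
distinction I have in mind.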
--
Jim Nasby, Data Architect, Austin TX