What I'm thinking of is the regular indexscan that's done internally
by get_actual_variable_range, not whatever ends up getting chosen as
the plan for the user query. I had supposed that that would kill
dead index entries as it went, but maybe that's not happening for
some reason.
Indeed, this happens as you said: the index entries are marked as dead.
But after that, backends spend CPU time skipping these killed entries
in _bt_checkkeys:
if (scan->ignore_killed_tuples && ItemIdIsDead(iid))
{
    /* return immediately if there are more tuples on the page */
    if (ScanDirectionIsForward(dir))
    {
        if (offnum < PageGetMaxOffsetNumber(page))
            return NULL;
    }
    else
    {
        BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);

        if (offnum > P_FIRSTDATAKEY(opaque))
            return NULL;
    }
    ...
}
This is confirmed by the perf records and by the backtrace Vladimir reported earlier:
root@pgload01e ~ # perf report | grep -v '^#' | head
    56.67%  postgres  postgres  [.] _bt_checkkeys
    19.27%  postgres  postgres  [.] _bt_readpage
     2.09%  postgres  postgres  [.] pglz_decompress
     2.03%  postgres  postgres  [.] LWLockAttemptLock
     1.61%  postgres  postgres  [.] PinBuffer.isra.3
     1.14%  postgres  postgres  [.] hash_search_with_hash_value
     0.68%  postgres  postgres  [.] LWLockRelease
     0.42%  postgres  postgres  [.] AllocSetAlloc
     0.40%  postgres  postgres  [.] SearchCatCache
     0.40%  postgres  postgres  [.] ReadBuffer_common
root@pgload01e ~ #
It seems like killing dead tuples does not solve this problem.