Multiple FPI_FOR_HINT for the same block during killing btree index items

From: Masahiko Sawada
Subject: Multiple FPI_FOR_HINT for the same block during killing btree index items
Date:
Msg-id: CA+fd4k6PeRj2CkzapWNrERkja5G0-6D-YQiKfbukJV+qZGFZ_Q@mail.gmail.com
Responses: Re: Multiple FPI_FOR_HINT for the same block during killing btree index items (Alvaro Herrera <alvherre@2ndquadrant.com>)
           Re: Multiple FPI_FOR_HINT for the same block during killing btree index items (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
Hi all,

While investigating an issue one of our customers had, I realized that
_bt_killitems() can record a full-page image for the same block
multiple times. This can happen when several concurrent index scans
are processing a page that contains killable tuples: because killing
index items is done while holding the buffer lock only in shared mode,
concurrent processes can each record an FPI_FOR_HINT for the same
block.
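
For context, the FPI_FOR_HINT record is written from MarkBufferDirtyHint():
when wal_log_hints (or data checksums) is enabled and the page looks
unmodified since the last checkpoint, it calls XLogSaveBufferForHint(),
which may emit a full-page image. A heavily simplified paraphrase of that
path (not a verbatim excerpt from bufmgr.c):

    /* simplified paraphrase of MarkBufferDirtyHint() in bufmgr.c */
    if (XLogHintBitIsNeeded() && (buf_state & BM_PERMANENT))
    {
        /*
         * May write an FPI_FOR_HINT record if this looks like the first
         * change to the page since the last checkpoint; with only a
         * shared lock held, several backends can reach this point for
         * the same block.
         */
        lsn = XLogSaveBufferForHint(buffer, buffer_std);
    }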

Here is the reproducer:

cat <<EOF | psql -d postgres
drop table if exists tbl;
create table tbl (c int primary key) with (autovacuum_enabled = off);
insert into tbl select generate_series(1,300);
update tbl set c = c * -1 where c = 100;
checkpoint;
EOF

for n in `seq 1 4`
do
    psql -d postgres -c "select from tbl where c = 100" &
done

The server needs to have wal_log_hints enabled, and the script might
need to be run several times. After running it, we can see the issue
with pg_waldump:

rmgr: XLOG len (rec/tot): 49/ 8209, tx: 0, top: 0, lsn: 1/8FD1C3D8,
prev 1/8FD1C368, desc: FPI_FOR_HINT , blkref #0: rel 1663/12643/16767
blk 0 FPW
rmgr: XLOG len (rec/tot): 49/ 8209, tx: 0, top: 0, lsn: 1/8FD1E408,
prev 1/8FD1C3D8, desc: FPI_FOR_HINT , blkref #0: rel 1663/12643/16767
blk 0 FPW

This is an excerpt from _bt_killitems() as of version 12.2. Recent
changes have made the code on HEAD look quite different, but the part
in question is essentially unchanged, so the issue is reproducible on
HEAD as well.

    for (i = 0; i < numKilled; i++)
    {
        int         itemIndex = so->killedItems[i];
        BTScanPosItem *kitem = &so->currPos.items[itemIndex];
        OffsetNumber offnum = kitem->indexOffset;

        Assert(itemIndex >= so->currPos.firstItem &&
               itemIndex <= so->currPos.lastItem);
        if (offnum < minoff)
            continue;           /* pure paranoia */
        while (offnum <= maxoff)
        {
            ItemId      iid = PageGetItemId(page, offnum);
            IndexTuple  ituple = (IndexTuple) PageGetItem(page, iid);

            if (ItemPointerEquals(&ituple->t_tid, &kitem->heapTid))
            {
                /* found the item */
                ItemIdMarkDead(iid);
                killedsomething = true;
                break;          /* out of inner search loop */
            }
            offnum = OffsetNumberNext(offnum);
        }
    }

    /*
     * Since this can be redone later if needed, mark as dirty hint.
     *
     * Whenever we mark anything LP_DEAD, we also set the page's
     * BTP_HAS_GARBAGE flag, which is likewise just a hint.
     */
    if (killedsomething)
    {
        opaque->btpo_flags |= BTP_HAS_GARBAGE;
        MarkBufferDirtyHint(so->currPos.buf, true);
    }

The test under the "found the item" comment never checks whether the
item is already dead, so maybe we can add !ItemIdIsDead(iid) to that
condition. But even then there is a race condition in which multiple
FPIs can be recorded. Maybe a better solution is to take the buffer
lock in exclusive mode, at least when wal_log_hints = on, so that only
one process can run this code; the reduction in concurrency might be
won back by the fact that we don't WAL-log the page multiple times.
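
A rough, untested sketch of that first idea, adding the check to the
inner condition so that an already-dead item does not count as newly
killed:

            if (!ItemIdIsDead(iid) &&
                ItemPointerEquals(&ituple->t_tid, &kitem->heapTid))
            {
                /* found the item and it is not already marked dead */
                ItemIdMarkDead(iid);
                killedsomething = true;
                break;          /* out of inner search loop */
            }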

I understand that MarkBufferDirtyHint() can be called while holding the
buffer lock in share mode, as its header comment says, but I'd like to
improve this behavior so that we can avoid multiple FPI_FOR_HINT
records for the same block.
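
As for the second idea, a rough sketch only, since the real locking in
_bt_killitems() also has to handle re-pinning the page: take the lock
in exclusive mode only when an FPI could be generated, using the
existing XLogHintBitIsNeeded() test:

    /*
     * Sketch only: lock the page exclusively when a hint-bit change may
     * have to be WAL-logged, so that only one backend at a time can mark
     * items dead and emit the FPI_FOR_HINT record.
     */
    if (XLogHintBitIsNeeded())
        LockBuffer(so->currPos.buf, BT_WRITE);
    else
        LockBuffer(so->currPos.buf, BT_READ);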

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


