Re: Multiple FPI_FOR_HINT for the same block during killing btree index items - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Multiple FPI_FOR_HINT for the same block during killing btree index items
Msg-id 20200410.133013.920903448679757354.horikyota.ntt@gmail.com
In response to Re: Multiple FPI_FOR_HINT for the same block during killing btreeindex items  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
List pgsql-hackers
At Fri, 10 Apr 2020 12:32:31 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in 
> On Fri, 10 Apr 2020 at 04:05, Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > On Wed, Apr 8, 2020 at 10:56 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > > Here is the reproducer:
> >
> > What version of Postgres did you notice the actual customer issue on?
> > I ask because I wonder if the work on B-Tree indexes in Postgres 12
> > affects the precise behavior you get here with real world workloads.
> > It probably makes _bt_killitems() more effective with some workloads,
> > which naturally increases the likelihood of having multiple FPI issued
> > in the manner that you describe. OTOH, it might make it less likely
> > with low cardinality indexes, since large groups of garbage duplicate
> > tuples tend to get concentrated on just a few leaf pages.
> >
> > > The inner test in the comment "found the item" never tests the item
> > > for being dead. So maybe we can add !ItemIdIsDead(iid) to that
> > > condition. But a race condition where multiple FPIs are recorded
> > > can still happen. Maybe a better solution is to change the lock to
> > > exclusive, at least when wal_log_hints = on, so that only one process
> > > can run this code -- the reduction in concurrency might be won back by
> > > the fact that we don't wal-log the page multiple times.
> >
> > I like the idea of checking !ItemIdIsDead(iid) as a further condition
> > of killing the item -- there is clearly no point in doing work to kill
> > an item that is already dead. I don't like the idea of using an
> > exclusive buffer lock (even if it's just with wal_log_hints = on),
> > though.
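[The extra guard discussed above can be illustrated with a minimal stand-alone sketch. This is not PostgreSQL source: ItemIdData, ItemIdIsDead(), and kill_items() here are simplified stand-ins for the real nbtree structures, meant only to show why skipping already-dead items avoids redundant re-dirtying of the page.]

```c
#include <stdbool.h>

/* Stand-in for a line pointer: the real ItemIdData packs this into flag bits. */
typedef struct { bool lp_dead; } ItemIdData;

/* Stand-in for the real ItemIdIsDead() macro. */
static bool ItemIdIsDead(const ItemIdData *iid) { return iid->lp_dead; }

/*
 * Mark the listed items dead, skipping items that are already dead
 * (the proposed !ItemIdIsDead() condition).  Returns the number of
 * items newly killed; a zero return means the page was not touched,
 * so no hint-bit FPI would be triggered at all.
 */
static int kill_items(ItemIdData *items, const int *to_kill, int nkill)
{
    int killed = 0;

    for (int i = 0; i < nkill; i++)
    {
        ItemIdData *iid = &items[to_kill[i]];

        if (!ItemIdIsDead(iid))     /* the proposed extra check */
        {
            iid->lp_dead = true;
            killed++;
        }
    }
    return killed;
}
```

A second backend arriving with the same kill list finds every item already dead and does no work, which is exactly the case that currently issues a duplicate FPI.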
> >
> 
> Okay. I think adding only that check would also help reduce the
> likelihood. How about the changes for the current HEAD I've attached?

FWIW, looks good to me.

> Related to this behavior on btree indexes, this can also happen on
> heaps while searching for heap tuples. To reduce the likelihood more
> generally, I wonder if we can acquire a lock on the buffer descriptor
> right before XLogSaveBufferForHint() and set a flag on the buffer
> descriptor indicating that we're about to log an FPI for a hint bit,
> so that concurrent processes can be aware of it.

Makes sense if the lock were acquired just before the (BM_DIRTY |
BM_JUST_DIRTIED) check.  Could we use double-checking, similar to the
patch for ItemIdIsDead()?

> if ((pg_atomic_read_u32(&bufHdr->state) & (BM_DIRTY | BM_JUST_DIRTIED)) !=
>     (BM_DIRTY | BM_JUST_DIRTIED))
> {
> ...
>       * essential that CreateCheckpoint waits for virtual transactions
>       * rather than full transactionids.
>       */
>      /* blah, blah */
>      buf_state = LockBufHdr(bufHdr);
>
>      if ((buf_state & (BM_DIRTY | BM_JUST_DIRTIED)) !=
>          (BM_DIRTY | BM_JUST_DIRTIED))
>      {
>        MyProc->delayChkpt = delayChkpt = true;
>        lsn = XLogSaveBufferForHint(buffer, buffer_std);
>      }
>    }
>    else
>      buf_state = LockBufHdr(bufHdr);
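[The double-check pattern sketched above can be shown in a self-contained form. This is not the real bufmgr code: C11 atomics stand in for PostgreSQL's buffer-header spinlock, the BM_DIRTY/BM_JUST_DIRTIED bit values are illustrative, and fpi_count stands in for XLogSaveBufferForHint(). The point is that the cheap unlocked pre-check is re-verified after the "lock" is taken, so at most one caller logs the FPI.]

```c
#include <stdatomic.h>

/* Illustrative flag bits, not the real buffer-state layout. */
#define BM_DIRTY        (1u << 0)
#define BM_JUST_DIRTIED (1u << 1)

static atomic_uint buf_state;   /* stand-in for bufHdr->state */
static int fpi_count;           /* counts how many times we "logged" an FPI */

static void mark_buffer_dirty_hint(void)
{
    unsigned int state = atomic_load(&buf_state);

    /* Cheap unlocked pre-check, as in the current code. */
    if ((state & (BM_DIRTY | BM_JUST_DIRTIED)) != (BM_DIRTY | BM_JUST_DIRTIED))
    {
        /*
         * Re-check after "taking the header lock" (modelled here as an
         * atomic read-modify-write): another backend may have dirtied
         * the buffer, and logged the FPI, in the meantime.
         */
        state = atomic_fetch_or(&buf_state, BM_DIRTY | BM_JUST_DIRTIED);
        if ((state & (BM_DIRTY | BM_JUST_DIRTIED)) !=
            (BM_DIRTY | BM_JUST_DIRTIED))
            fpi_count++;        /* stand-in for XLogSaveBufferForHint() */
    }
}
```

Note the parenthesization: in C, != binds tighter than &, so writing `state & MASK != MASK` would compare MASK against itself first, which is why the sketch above spells out the parentheses.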

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


