Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD
Date	December 11, 2017 08:51:19
Msg-id	CAEepm=2xZUcOGP9V0O_G0=2P2wwXwPrkF=upWTCJSisUxMnuSg@mail.gmail.com
Responses	Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD (Thomas Munro <thomas.munro@enterprisedb.com>) Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD (Amit Kapila <amit.kapila16@gmail.com>)
List	pgsql-hackers

Hi hackers,

I heard a report of a 10.1 cluster hanging with several 'BtreePage'
wait_events showing in pg_stat_activity.  The query plan involved
Parallel Index Only Scan, and the table is concurrently updated quite
heavily.  I tried and failed to make a reproducer, but from the clues
available it seemed clear that somehow *all* participants in a
Parallel Index Scan must be waiting for someone else to advance the
scan.  The report came with a back trace[1] that was the same in all 3
backends (leader + 2 workers), which I'll summarise here:

  ConditionVariableSleep
  _bt_parallel_seize
  _bt_readnextpage
  _bt_steppage
  _bt_next
  btgettuple
  index_getnext_tid
  IndexOnlyNext

I think _bt_steppage() called _bt_parallel_seize() and then
_bt_readnextpage(), which I guess must have encountered a page marked
BTP_DELETED or BTP_HALF_DEAD, and so didn't take this early break out
of the loop:

    /* check for deleted page */
    if (!P_IGNORE(opaque))
    {
        PredicateLockPage(rel, blkno, scan->xs_snapshot);
        /* see if there are any matches on this page */
        /* note that this will clear moreRight if we can stop */
        if (_bt_readpage(scan, dir, P_FIRSTDATAKEY(opaque)))
            break;
    }

... and then it called _bt_parallel_seize() itself, in violation of
the rule (by my reading of the code) that you must call
_bt_parallel_release() (via _bt_readpage()) or _bt_parallel_done()
after seizing the scan.  If you call _bt_parallel_seize() again
without doing that first, you'll finish up waiting for yourself
forever.  Does this theory make sense?
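
To make the theory concrete, here is a toy single-process model of the
seize/release rule as I read it.  It is only a sketch: every name in it
(ToyScan, toy_seize, toy_release, toy_done, toy_read_next_page) is
hypothetical and not the nbtree API, the "would block" result stands in
for sleeping on the condition variable, and the release_on_skip flag is
just one assumed way of honouring the rule, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the real nbtree code. */
typedef enum { SCAN_IDLE, SCAN_ADVANCING, SCAN_DONE } ToyScanState;

typedef struct
{
    ToyScanState state;     /* who, if anyone, is advancing the scan */
    int          next_page; /* next page number to hand out */
} ToyScan;

typedef enum { SEIZE_OK, SEIZE_DONE, SEIZE_WOULD_BLOCK } ToySeizeResult;

enum { TOY_SCAN_ENDED = -1, TOY_SELF_WAIT = -2 };

/* Take ownership of advancing the scan, if nobody holds it. */
ToySeizeResult toy_seize(ToyScan *scan, int *page)
{
    if (scan->state == SCAN_DONE)
        return SEIZE_DONE;
    if (scan->state == SCAN_ADVANCING)
        return SEIZE_WOULD_BLOCK;   /* a real backend would sleep here */
    scan->state = SCAN_ADVANCING;
    *page = scan->next_page;
    return SEIZE_OK;
}

/* Give the scan back, telling other participants where to continue. */
void toy_release(ToyScan *scan, int next_page)
{
    assert(scan->state == SCAN_ADVANCING);
    scan->next_page = next_page;
    scan->state = SCAN_IDLE;
}

/* Mark the scan finished for all participants. */
void toy_done(ToyScan *scan)
{
    scan->state = SCAN_DONE;
}

/*
 * Return the next live page number, TOY_SCAN_ENDED at the end of the
 * scan, or TOY_SELF_WAIT if this caller would wait on a scan that it
 * is itself still holding.  ignorable[i] plays the role of P_IGNORE().
 */
int toy_read_next_page(ToyScan *scan, const bool *ignorable, int npages,
                       bool release_on_skip)
{
    for (;;)
    {
        int            page;
        ToySeizeResult r = toy_seize(scan, &page);

        if (r == SEIZE_WOULD_BLOCK)
            return TOY_SELF_WAIT;
        if (r == SEIZE_DONE)
            return TOY_SCAN_ENDED;
        if (page >= npages)
        {
            toy_done(scan);
            return TOY_SCAN_ENDED;
        }
        if (!ignorable[page])
        {
            toy_release(scan, page + 1);    /* the normal path */
            return page;
        }
        if (release_on_skip)
            toy_release(scan, page + 1);    /* assumed remedy */
        /* otherwise loop around while still holding the scan */
    }
}
```

With pages {live, ignorable, live} and release_on_skip = false, the
second call skips page 1 and then reports TOY_SELF_WAIT, which is the
single-participant version of the hang; with release_on_skip = true the
same call returns page 2.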

[1] http://dpaste.com/05PGJ4E

-- 
Thomas Munro
http://www.enterprisedb.com
