Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Msg-id 5203.1556642446@sss.pgh.pa.us
In response to Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> I haven't been able to reproduce this locally yet, but my guess is that
> the REINDEX wants to update some row that was already updated by the
> concurrent transaction, so it has to wait to see if the latter commits
> or not.  And, of course, waiting while holding AccessExclusiveLock on
> any index of pg_class is a Bad Idea (TM).  But I can't quite see why
> we'd be doing something like that during the reindex ...

Ah-hah: the secret to making it reproducible is to use the same build
flags as buildfarm member prion:
-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE

Here's a stack trace from reindex's side:

#0  0x00000033968e9223 in __epoll_wait_nocancel ()
    at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000787cb5 in WaitEventSetWaitBlock (set=0x22d52f0, timeout=-1,
    occurred_events=0x7ffc77117c00, nevents=1,
    wait_event_info=<value optimized out>) at latch.c:1080
#2  WaitEventSetWait (set=0x22d52f0, timeout=-1,
    occurred_events=0x7ffc77117c00, nevents=1,
    wait_event_info=<value optimized out>) at latch.c:1032
#3  0x00000000007886da in WaitLatchOrSocket (latch=0x7f90679077f4,
    wakeEvents=<value optimized out>, sock=-1, timeout=-1,
    wait_event_info=50331652) at latch.c:407
#4  0x000000000079993d in ProcSleep (locallock=<value optimized out>,
    lockMethodTable=<value optimized out>) at proc.c:1290
#5  0x0000000000796ba2 in WaitOnLock (locallock=0x2200600, owner=0x2213470)
    at lock.c:1768
#6  0x0000000000798719 in LockAcquireExtended (locktag=0x7ffc77117f90,
    lockmode=<value optimized out>, sessionLock=<value optimized out>,
    dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1050
#7  0x00000000007939b7 in XactLockTableWait (xid=2874,
    rel=<value optimized out>, ctid=<value optimized out>,
    oper=XLTW_InsertIndexUnique) at lmgr.c:658
#8  0x00000000004d4841 in heapam_index_build_range_scan (
    heapRelation=0x7f905eb3fcd8, indexRelation=0x7f905eb3c5b8,
    indexInfo=0x22d50c0, allow_sync=<value optimized out>, anyvisible=false,
    progress=true, start_blockno=0, numblocks=4294967295,
    callback=0x4f8330 <_bt_build_callback>, callback_state=0x7ffc771184f0,
    scan=0x2446fb0) at heapam_handler.c:1527
#9  0x00000000004f9db0 in table_index_build_scan (heap=0x7f905eb3fcd8,
    index=0x7f905eb3c5b8, indexInfo=0x22d50c0)
    at ../../../../src/include/access/tableam.h:1437
#10 _bt_spools_heapscan (heap=0x7f905eb3fcd8, index=0x7f905eb3c5b8,
    indexInfo=0x22d50c0) at nbtsort.c:489
#11 btbuild (heap=0x7f905eb3fcd8, index=0x7f905eb3c5b8, indexInfo=0x22d50c0)
    at nbtsort.c:337
#12 0x0000000000547e33 in index_build (heapRelation=0x7f905eb3fcd8,
    indexRelation=0x7f905eb3c5b8, indexInfo=0x22d50c0, isreindex=true,
    parallel=<value optimized out>) at index.c:2724
#13 0x0000000000548b97 in reindex_index (indexId=2662,
    skip_constraint_checks=false, persistence=112 'p', options=0)
    at index.c:3349
#14 0x00000000005490f1 in reindex_relation (relid=<value optimized out>,
    flags=5, options=0) at index.c:3592
#15 0x00000000005ed295 in ReindexTable (relation=0x21e2938, options=0,
    concurrent=<value optimized out>) at indexcmds.c:2422
#16 0x00000000007b5f69 in standard_ProcessUtility (pstmt=0x21e2cf0,
    queryString=0x21e1f18 "REINDEX TABLE pg_class;",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0x21e2de8, completionTag=0x7ffc77118d80 "") at utility.c:790
#17 0x00000000007b1689 in PortalRunUtility (portal=0x2247c38, pstmt=0x21e2cf0,
    isTopLevel=<value optimized out>, setHoldSnapshot=<value optimized out>,
    dest=0x21e2de8, completionTag=<value optimized out>) at pquery.c:1175
#18 0x00000000007b2611 in PortalRunMulti (portal=0x2247c38, isTopLevel=true,
    setHoldSnapshot=false, dest=0x21e2de8, altdest=0x21e2de8,
    completionTag=0x7ffc77118d80 "") at pquery.c:1328
#19 0x00000000007b2eb0 in PortalRun (portal=0x2247c38,
    count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x21e2de8,
    altdest=0x21e2de8, completionTag=0x7ffc77118d80 "") at pquery.c:796
#20 0x00000000007af2ab in exec_simple_query (
    query_string=0x21e1f18 "REINDEX TABLE pg_class;") at postgres.c:1215

So basically, the problem here lies in trying to re-verify uniqueness
of pg_class's indexes --- there could easily be entries in pg_class that
haven't committed yet.

I don't think there's an easy way to make this not deadlock against
concurrent DDL.  For sure I don't want to disable the uniqueness
checks.
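
To make the shape of that deadlock concrete, here is a toy wait-for
graph in Python.  It is only a sketch, not PostgreSQL code: the node
names and the find_cycle helper are made up for illustration.  The
first edge comes from the stack trace above (REINDEX sleeping in
XactLockTableWait).  The second edge is an assumption: that the
concurrent DDL transaction next needs to open the pg_class index on
which REINDEX holds AccessExclusiveLock.

```python
# Toy wait-for graph: an edge A -> B means "A is blocked until B lets go".
# Node names are illustrative labels, not PostgreSQL identifiers.

def find_cycle(waits_for):
    """Return the nodes of a wait cycle, or None if nobody is deadlocked.

    waits_for maps each blocked party to the single party it waits on.
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node == start:
                return path          # walked back to where we began
            if node in seen:
                break                # a cycle exists, but not through start
            path.append(node)
            seen.add(node)
            node = waits_for.get(node)
    return None


blocked = {
    # From the stack trace: REINDEX waits for the uncommitted xact.
    "reindex": "ddl_xact",
    # Assumption: the DDL xact waits for the index REINDEX has locked.
    "ddl_xact": "reindex",
}
cycle = find_cycle(blocked)
print(cycle)
```

With only the first edge present, find_cycle returns None: REINDEX
would simply wait for the other transaction to finish.  It is the
second edge that turns the wait into a deadlock.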

            regards, tom lane


