Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6 - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6 |
Date | |
Msg-id | 5203.1556642446@sss.pgh.pa.us |
In response to | Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6 (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
I wrote:
> I haven't been able to reproduce this locally yet, but my guess is that
> the REINDEX wants to update some row that was already updated by the
> concurrent transaction, so it has to wait to see if the latter commits
> or not. And, of course, waiting while holding AccessExclusiveLock on
> any index of pg_class is a Bad Idea (TM). But I can't quite see why
> we'd be doing something like that during the reindex ...

Ah-hah: the secret to making it reproducible is what prion is doing:

-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE

Here's a stack trace from reindex's side:

#0  0x00000033968e9223 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000787cb5 in WaitEventSetWaitBlock (set=0x22d52f0, timeout=-1, occurred_events=0x7ffc77117c00, nevents=1, wait_event_info=<value optimized out>) at latch.c:1080
#2  WaitEventSetWait (set=0x22d52f0, timeout=-1, occurred_events=0x7ffc77117c00, nevents=1, wait_event_info=<value optimized out>) at latch.c:1032
#3  0x00000000007886da in WaitLatchOrSocket (latch=0x7f90679077f4, wakeEvents=<value optimized out>, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:407
#4  0x000000000079993d in ProcSleep (locallock=<value optimized out>, lockMethodTable=<value optimized out>) at proc.c:1290
#5  0x0000000000796ba2 in WaitOnLock (locallock=0x2200600, owner=0x2213470) at lock.c:1768
#6  0x0000000000798719 in LockAcquireExtended (locktag=0x7ffc77117f90, lockmode=<value optimized out>, sessionLock=<value optimized out>, dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1050
#7  0x00000000007939b7 in XactLockTableWait (xid=2874, rel=<value optimized out>, ctid=<value optimized out>, oper=XLTW_InsertIndexUnique) at lmgr.c:658
#8  0x00000000004d4841 in heapam_index_build_range_scan (heapRelation=0x7f905eb3fcd8, indexRelation=0x7f905eb3c5b8, indexInfo=0x22d50c0, allow_sync=<value optimized out>, anyvisible=false, progress=true, start_blockno=0, numblocks=4294967295, callback=0x4f8330 <_bt_build_callback>, callback_state=0x7ffc771184f0, scan=0x2446fb0) at heapam_handler.c:1527
#9  0x00000000004f9db0 in table_index_build_scan (heap=0x7f905eb3fcd8, index=0x7f905eb3c5b8, indexInfo=0x22d50c0) at ../../../../src/include/access/tableam.h:1437
#10 _bt_spools_heapscan (heap=0x7f905eb3fcd8, index=0x7f905eb3c5b8, indexInfo=0x22d50c0) at nbtsort.c:489
#11 btbuild (heap=0x7f905eb3fcd8, index=0x7f905eb3c5b8, indexInfo=0x22d50c0) at nbtsort.c:337
#12 0x0000000000547e33 in index_build (heapRelation=0x7f905eb3fcd8, indexRelation=0x7f905eb3c5b8, indexInfo=0x22d50c0, isreindex=true, parallel=<value optimized out>) at index.c:2724
#13 0x0000000000548b97 in reindex_index (indexId=2662, skip_constraint_checks=false, persistence=112 'p', options=0) at index.c:3349
#14 0x00000000005490f1 in reindex_relation (relid=<value optimized out>, flags=5, options=0) at index.c:3592
#15 0x00000000005ed295 in ReindexTable (relation=0x21e2938, options=0, concurrent=<value optimized out>) at indexcmds.c:2422
#16 0x00000000007b5f69 in standard_ProcessUtility (pstmt=0x21e2cf0, queryString=0x21e1f18 "REINDEX TABLE pg_class;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x21e2de8, completionTag=0x7ffc77118d80 "") at utility.c:790
#17 0x00000000007b1689 in PortalRunUtility (portal=0x2247c38, pstmt=0x21e2cf0, isTopLevel=<value optimized out>, setHoldSnapshot=<value optimized out>, dest=0x21e2de8, completionTag=<value optimized out>) at pquery.c:1175
#18 0x00000000007b2611 in PortalRunMulti (portal=0x2247c38, isTopLevel=true, setHoldSnapshot=false, dest=0x21e2de8, altdest=0x21e2de8, completionTag=0x7ffc77118d80 "") at pquery.c:1328
#19 0x00000000007b2eb0 in PortalRun (portal=0x2247c38, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x21e2de8, altdest=0x21e2de8, completionTag=0x7ffc77118d80 "") at pquery.c:796
#20 0x00000000007af2ab in exec_simple_query (query_string=0x21e1f18 "REINDEX TABLE pg_class;") at postgres.c:1215

So basically, the problem here lies in trying to re-verify uniqueness of pg_class's indexes --- there could easily be entries in pg_class that haven't committed yet.

I don't think there's an easy way to make this not deadlock against concurrent DDL. For sure I don't want to disable the uniqueness checks.

			regards, tom lane
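
As an illustration only (this is not PostgreSQL code), the deadlock described above can be modelled as a cycle in a toy wait-for graph. The first edge follows the stack trace: the reindex waits in XactLockTableWait for the inserting transaction to finish. The second edge is an assumption: the concurrent DDL transaction is taken to need the pg_class index on which the reindex holds AccessExclusiveLock. The names `reindex`, `ddl_xact` and `find_cycle` are hypothetical labels invented for this sketch.

```python
# Toy wait-for graph: each key waits for the participant named by its value.
# Hypothetical model for illustration; it does not mirror PostgreSQL's
# actual deadlock detector.

def find_cycle(waits_for):
    """Return the participants of a wait cycle as a list, or None if there is none."""
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node == start:
                return path          # walked back to the start: a cycle
            if node in seen:
                break                # cycle elsewhere; it is found from its own start
            seen.add(node)
            path.append(node)
            node = waits_for.get(node)
    return None


waits_for = {
    # From the stack trace: the uniqueness re-check waits for the
    # uncommitted inserter to commit or abort.
    "reindex": "ddl_xact",
    # Assumption: the DDL transaction next needs the pg_class index
    # that the reindex holds AccessExclusiveLock on.
    "ddl_xact": "reindex",
}

print(find_cycle(waits_for))
```

Removing either edge breaks the cycle in this model, which is why the options come down to not waiting on the uncommitted entry or not holding the index lock across the wait.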