Thread: pg15b4: FailedAssertion("TransactionIdIsValid(xmax)
The sequence of events leading up to this:

0) Yesterday I upgraded an internal VM to pg15b4 using PGDG RPMs; it's the same VM that hit the prefetch_recovery bug which was fixed by adb466150. I don't think that should've left it in a weird state (since recovery was successful when prefetch was disabled, and the cluster worked fine until now).
1) This evening, I started running sqlsmith connected to the postgres DB that has some logging tables in it;
2) There was a lot of swapping, and the backend was finally killed due to OOM when sqlsmith tried to run a terrible query involving database_to_xml();
3) lots of backends started crashing with SIGABRT;
4) I was simultaneously compiling pg14b4 to run with -DRELCACHE_FORCE_RELEASE and installing it into /usr/local. I don't *think* running libraries would've been overwritten, and that shouldn't have affected the running instance anyway...

I got a lot (100+) of SIGABRTs. I suspect all the backtraces are the same. My hypothesis is that the OOM kill caused a bad xmax to be written (or rather, recovery didn't cause it to be written correctly?). We may have hit a race condition due to heavy swapping.

(gdb) bt
#0  0x00007fb8a22f31f7 in raise () from /lib64/libc.so.6
#1  0x00007fb8a22f48e8 in abort () from /lib64/libc.so.6
#2  0x000000000098f9be in ExceptionalCondition (conditionName=conditionName@entry=0x9fada4 "TransactionIdIsValid(xmax)", errorType=errorType@entry=0x9ed217 "FailedAssertion", fileName=fileName@entry=0x9fad90 "heapam_visibility.c", lineNumber=lineNumber@entry=1353) at assert.c:69
#3  0x00000000004fd4d7 in HeapTupleSatisfiesVacuumHorizon (htup=htup@entry=0x7ffc225a87e0, buffer=buffer@entry=5100, dead_after=dead_after@entry=0x7ffc225a87d0) at heapam_visibility.c:1353
#4  0x0000000000501702 in heap_prune_satisfies_vacuum (buffer=5100, tup=0x7ffc225a87e0, prstate=0x7ffc225a8a50) at pruneheap.c:504
#5  heap_page_prune (relation=relation@entry=0x7fb8a50c3438, buffer=buffer@entry=5100, vistest=vistest@entry=0xec7890 <GlobalVisDataRels>, old_snap_xmin=<optimized out>, old_snap_ts=<optimized out>, nnewlpdead=nnewlpdead@entry=0x7ffc225a964c, off_loc=off_loc@entry=0x0) at pruneheap.c:351
#6  0x0000000000502326 in heap_page_prune_opt (relation=0x7fb8a50c3438, buffer=buffer@entry=5100) at pruneheap.c:209
#7  0x00000000004f3ae3 in heapgetpage (sscan=sscan@entry=0x199b1d0, page=page@entry=2892) at heapam.c:415
#8  0x00000000004f44c2 in heapgettup_pagemode (scan=scan@entry=0x199b1d0, dir=<optimized out>, nkeys=0, key=0x0) at heapam.c:1120
#9  0x00000000004f5abe in heap_getnextslot (sscan=0x199b1d0, direction=<optimized out>, slot=0x1967be8) at heapam.c:1352
#10 0x00000000006de16b in table_scan_getnextslot (slot=0x1967be8, direction=ForwardScanDirection, sscan=<optimized out>) at ../../../src/include/access/tableam.h:1046
#11 SeqNext (node=node@entry=0x1967a38) at nodeSeqscan.c:80
#12 0x00000000006b109a in ExecScanFetch (recheckMtd=0x6de0f0 <SeqRecheck>, accessMtd=0x6de100 <SeqNext>, node=0x1967a38) at execScan.c:133
#13 ExecScan (node=0x1967a38, accessMtd=0x6de100 <SeqNext>, recheckMtd=0x6de0f0 <SeqRecheck>) at execScan.c:199
#14 0x00000000006add88 in ExecProcNodeInstr (node=0x1967a38) at execProcnode.c:479
#15 0x00000000006a6182 in ExecProcNode (node=0x1967a38) at ../../../src/include/executor/executor.h:259
#16 ExecutePlan (execute_once=<optimized out>, dest=0x1988448, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1967a38, estate=0x19677a0) at execMain.c:1636
#17 standard_ExecutorRun
(queryDesc=0x1996e80, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
#18 0x00007fb8960913bd in pgss_ExecutorRun (queryDesc=0x1996e80, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:1010
#19 0x00007fb895c6f781 in explain_ExecutorRun (queryDesc=0x1996e80, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at auto_explain.c:320
#20 0x000000000084976e in PortalRunSelect (portal=portal@entry=0x18fed30, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x1988448) at pquery.c:924
#21 0x000000000084af4f in PortalRun (portal=0x18fed30, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x1988448, altdest=0x1988448, qc=0x7ffc225a9ce0) at pquery.c:768
#22 0x000000000084679b in exec_simple_query (query_string=0x186d8a0 "SELECT alarm_id, alarm_disregard FROM alarms WHERE alarm_ack_time IS NULL AND alarm_clear_time IS NULL AND alarm_office = 'ETHERNET'") at postgres.c:1250
#23 0x000000000084848a in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4581
#24 0x0000000000495afe in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4504
#25 BackendStartup (port=0x1894250) at postmaster.c:4232
#26 ServerLoop () at postmaster.c:1806
#27 0x00000000007b0c60 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1868280) at postmaster.c:1478
#28 0x00000000004976a6 in main (argc=3, argv=0x1868280) at main.c:202

< 2022-09-09 19:44:03.329 CDT >LOG:  server process (PID 8949) was terminated by signal 6: Aborted
< 2022-09-09 19:44:03.329 CDT >DETAIL:  Failed process was running: SELECT alarm_id, alarm_disregard FROM alarms WHERE alarm_shakeout_time<=now() AND alarm_shakeout_time>now()-$1::interval AND alarm_time!=alarm_shakeout_time AND alarm_clear_time IS NULL

For context, that's a partition of the main table on this DB, and any problem there would've been immediately apparent. We have multiple instances of that query which run continuously.

I saved a copy of the data dir, but I'm going to have to bring the DB back online soon. I don't know if I can get a page image easily since GDB is a bit busted (won't show source code) and I can't remember how I fixed that last time...

(gdb) fr 6
#6 0x0000000000502326 in heap_page_prune_opt (relation=0x7fb8a50c3438, buffer=buffer@entry=5100) at pruneheap.c:209
209 pruneheap.c: No such file or directory.
(gdb) p *vistest
$2 = {definitely_needed = {value = 8074911083}, maybe_needed = {value = 8074911083}}

#2 0x000000000098f9be in ExceptionalCondition (conditionName=conditionName@entry=0x9fada4 "TransactionIdIsValid(xmax)", errorType=errorType@entry=0x9ed217 "FailedAssertion", fileName=fileName@entry=0x9fad90 "heapam_visibility.c", lineNumber=lineNumber@entry=1353) at assert.c:69
No locals.
#3 0x00000000004fd4d7 in HeapTupleSatisfiesVacuumHorizon (htup=htup@entry=0x7ffc225a87e0, buffer=buffer@entry=5100, dead_after=dead_after@entry=0x7ffc225a87d0)at heapam_visibility.c:1353 xmax = 0 tuple = 0x2aaab5a5fdf8 #4 0x0000000000501702 in heap_prune_satisfies_vacuum (buffer=5100, tup=0x7ffc225a87e0, prstate=0x7ffc225a8a50) at pruneheap.c:504 res = <optimized out> dead_after = 0 #5 heap_page_prune (relation=relation@entry=0x7fb8a50c3438, buffer=buffer@entry=5100, vistest=vistest@entry=0xec7890 <GlobalVisDataRels>,old_snap_xmin=<optimized out>, old_snap_ts=<optimized out>, nnewlpdead=nnewlpdead@entry=0x7ffc225a964c, off_loc=off_loc@entry=0x0) at pruneheap.c:351 itemid = 0x2aaab5a5fd38 htup = <optimized out> ndeleted = 0 page = 0x2aaab5a5fd00 <Address 0x2aaab5a5fd00 out of bounds> blockno = 2892 offnum = 9 maxoff = 10 prstate = {rel = 0x7fb8a50c3438, vistest = 0xec7890 <GlobalVisDataRels>, old_snap_ts = 0, old_snap_xmin = 0, old_snap_used= false, new_prune_xid = 0, latestRemovedXid = 0, nredirected = 0, ndead = 0, nunused = 0, redirected = {32696, 0, 22724, 151, 0, 0, 1114, 0, 0, 0, 434, 0, 0, 0, 434, 0, 0, 0, 1114, 0, 0, 0, 1114, 0, 0,0, 1, 0, 0, 0, 0, 0, 1, 3, 0, 0, 950, 0, 6960, 144, 0, 0, 184, 0, 2, 1, 2, 0, 0, 0, 0, 0, 0, 0, 52144, 397, 0, 0, 0, 0, 0, 0, 434, 0, 0, 0, 0, 0, 2, 3, 0, 0, 950, 0, 6960, 144, 0, 0, 184, 0, 2, 1, 2, 0, 0, 0, 0,0, 0, 0, 52144, 397, 0, 0, 0, 0, 0, 0, 1114, 0, 0, 0, 0, 0, 3, 3, 0, 0, 950, 0, 6960, 144, 0, 0, 184, 0, 2, 1, 2, 0, 0, 0, 0, 0, 0, 0, 52144, 397, 0, 0, 0, 0, 0, 0, 1114, 0, 0, 0, 0, 0, 4, 3, 0, 0, 950, 0, 51440, 139,0, 0, 63, 0, 2, 1, 2, 0, 0, 0, 0, 0, 0, 0, 52144, 397, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 5120, 60453, 16693, 62684, 0, 0, 0, 0, 60168, 397, 0, 0, 1304, 408, 0, 0, 29816, 7947, 0, 0, 3472, 408...}, nowdead= {63057, 4137, 0, 0, 64763, 42226, 32696, 0, 26544, 121, 0, 0, 36704, 8794, 32764, 0, 27788, 41847, 32696, 0, 32664, 41847, 32696, 0, 36976, 8794, 32764, 0, 36960, 8794, 32764, 0, 17, 0, 32764, 0, 0, 0,0, 0, 0, 0, 0, 0, 19304, 42256, 32696, 0, 54472, 42260, 32696, 0, 14860, 69, 0, 0, 36432, 41847, 32696, 0, 28184, 65, 0, 0, 0, 0, 1, 0, 177, 0, 1, 0, 11240, 408, 0, 0, 37160, 8794, 32764, 0, 37120, 8794, 32764, 0,1, 0, 0, 0, 19304, 42256, 32696, 0, 38056, 42260, 32696, 0, 37200, 42260, 32696, 0, 65503, 42226, 32696, 0, 0, 0, 0, 0, 19304, 42256, 32696, 0, 1, 0, 0, 16368, 0, 0, 0, 0, 1, 0, 32764, 0, 37200, 42260, 32696,0, 0, 0, 0, 0, 42289, 152, 0, 0, 53392, 406, 0, 0, 11240, 408, 0, 0, 0, 0, 0, 0, 38056, 42260, 32696, 0, 36976, 8794, 32764, 0, 36960, 8794, 32764, 0, 63057, 4137, 0, 0, 14860, 69, 0, 0, 65535, 65535, 0, 0, 3, 0,0, 0, 32664, 41847, 32696, 0, 54472, 42260, 32696, 0, 0, 0, 0, 16368, 0, 0, 0, 16400, 0, 0, 0, 0, 5243, 18350, 31457, 16228...}, nowunused = {0, 32000, 46500, 10922, 0, 11, 0, 0, 0, 17, 2560, 0, 0, 61105, 6,0, 0, 37544, 8794, 32764, 0, 28552, 408, 0, 0, 0, 0, 0, 0, 52144, 397, 0, 0, 8096, 0, 65535, 2, 0 <repeats 98 times>, 11, 0, 10, 0, 4, 25089, 78, 11265, 0, 0, 0, 0, 0, 0, 1007, 0, 42, 0, 43, 0, 2406, 0, 2407,0, 0, 0, 0, 0, 0, 0, 32639, 32639, 32639, 32639, 32639, 32639, 32639, 32639, 36353, 33032, 24272, 16340, 0, 0, 0, 0, 0, 0, 0, 16368, 0, 0, 0, 0, 0, 0, 18136, 16663, 0, 0, 0, 0, 0, 49152, 18127, 16359, 0, 0,0, 0, 59295, 42150, 63299, 48114, 0, 0, 0, 0, 28399, 5317, 64332, 48184, 0, 0, 0, 0, 8832, 36553, 22255, 16426, 0, 0, 0, 0, 0, 0, 0, 0, 64, 0, 0, 0, 0, 0, 0, 0, 55168, 390, 0, 0, 64, 0, 0, 0, 231, 0, 0, 0, 38652, 8794,32764, 0, 38640, 8794, 32764, 0, 0, 0, 0, 0, 3092, 156, 0, 0, 0, 0, 0, 0, 64, 0, 0, 
0, 6, 0 <repeats 13 times>}, marked = {false <repeats 292 times>}, htsv = "\340Xm&\343:\367/$\254\001\357\335@*q\212^D\032\070\327\004\347!6\231\275\347Ac\207\207\236\262\211%2(\373\303\240U[\207\274\033\246\374{T\355\331\177\311\310m\222\200", '\000'<repeats 17 times>, "\220\314\277MH<\262\000\000\000\000\000\221\022\000\000\065A\334\364\370>\262\000\000\000\000\000\000\024%\354\065A\334\364\330\227\035\263\252*\000\000\300\227\035\263\252*\000\000\000\375\245\265\252*\000\000\200\301\300\252\252*\000\000P\347\226\001\000\000\000\000\330\227\035\263\252*\000\000\375\017\233\260\000\000\000\000\277\303\200\000\000\000\000\000`\213\223\001\000\000\000\000\333\227\035\263\252*\000\000\000\000\000\000L\v\000\000\000\000\000\000\000\000\000\000"...} tup = {t_len = 596, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 2892}, ip_posid = 9}, t_tableOid = 17852325, t_data= 0x2aaab5a5fdf8} #6 0x0000000000502326 in heap_page_prune_opt (relation=0x7fb8a50c3438, buffer=buffer@entry=5100) at pruneheap.c:209 ndeleted = <optimized out> nnewlpdead = -186891979 page = 0x2aaab5a5fd00 <Address 0x2aaab5a5fd00 out of bounds> prune_xid = <optimized out> vistest = 0xec7890 <GlobalVisDataRels> limited_xmin = 0 limited_ts = 0 minfree = 819 #7 0x00000000004f3ae3 in heapgetpage (sscan=sscan@entry=0x199b1d0, page=page@entry=2892) at heapam.c:415 scan = 0x199b1d0 buffer = 5100 snapshot = 0x188ef18 lines = <optimized out> ntup = <optimized out> lineoff = <optimized out> lpp = <optimized out> all_visible = <optimized out> #8 0x00000000004f44c2 in heapgettup_pagemode (scan=scan@entry=0x199b1d0, dir=<optimized out>, nkeys=0, key=0x0) at heapam.c:1120 tuple = 0x199b228 backward = false page = 2892 finished = false dp = 0x2aaab5a5dd00 <Address 0x2aaab5a5dd00 out of bounds> lines = <optimized out> lineindex = 8 lineoff = <optimized out> linesleft = 0 lpp = <optimized out> #9 0x00000000004f5abe in heap_getnextslot (sscan=0x199b1d0, direction=<optimized out>, slot=0x1967be8) at heapam.c:1352 -- Justin
On Fri, Sep 09, 2022 at 09:06:37PM -0500, Justin Pryzby wrote: > #0 0x00007fb8a22f31f7 in raise () from /lib64/libc.so.6 > #1 0x00007fb8a22f48e8 in abort () from /lib64/libc.so.6 > #2 0x000000000098f9be in ExceptionalCondition (conditionName=conditionName@entry=0x9fada4 "TransactionIdIsValid(xmax)",errorType=errorType@entry=0x9ed217 "FailedAssertion", > fileName=fileName@entry=0x9fad90 "heapam_visibility.c", lineNumber=lineNumber@entry=1353) at assert.c:69 > #3 0x00000000004fd4d7 in HeapTupleSatisfiesVacuumHorizon (htup=htup@entry=0x7ffc225a87e0, buffer=buffer@entry=5100, dead_after=dead_after@entry=0x7ffc225a87d0)at heapam_visibility.c:1353 > #4 0x0000000000501702 in heap_prune_satisfies_vacuum (buffer=5100, tup=0x7ffc225a87e0, prstate=0x7ffc225a8a50) at pruneheap.c:504 > #5 heap_page_prune (relation=relation@entry=0x7fb8a50c3438, buffer=buffer@entry=5100, vistest=vistest@entry=0xec7890 <GlobalVisDataRels>,old_snap_xmin=<optimized out>, old_snap_ts=<optimized out>, > nnewlpdead=nnewlpdead@entry=0x7ffc225a964c, off_loc=off_loc@entry=0x0) at pruneheap.c:351 > #6 0x0000000000502326 in heap_page_prune_opt (relation=0x7fb8a50c3438, buffer=buffer@entry=5100) at pruneheap.c:209 > #7 0x00000000004f3ae3 in heapgetpage (sscan=sscan@entry=0x199b1d0, page=page@entry=2892) at heapam.c:415 > #8 0x00000000004f44c2 in heapgettup_pagemode (scan=scan@entry=0x199b1d0, dir=<optimized out>, nkeys=0, key=0x0) at heapam.c:1120 > #9 0x00000000004f5abe in heap_getnextslot (sscan=0x199b1d0, direction=<optimized out>, slot=0x1967be8) at heapam.c:1352 > #10 0x00000000006de16b in table_scan_getnextslot (slot=0x1967be8, direction=ForwardScanDirection, sscan=<optimized out>)at ../../../src/include/access/tableam.h:1046 > #11 SeqNext (node=node@entry=0x1967a38) at nodeSeqscan.c:80 > #12 0x00000000006b109a in ExecScanFetch (recheckMtd=0x6de0f0 <SeqRecheck>, accessMtd=0x6de100 <SeqNext>, node=0x1967a38)at execScan.c:133 > #13 ExecScan (node=0x1967a38, accessMtd=0x6de100 <SeqNext>, recheckMtd=0x6de0f0 <SeqRecheck>) at execScan.c:199 > #14 0x00000000006add88 in ExecProcNodeInstr (node=0x1967a38) at execProcnode.c:479 > #15 0x00000000006a6182 in ExecProcNode (node=0x1967a38) at ../../../src/include/executor/executor.h:259 > #16 ExecutePlan (execute_once=<optimized out>, dest=0x1988448, direction=<optimized out>, numberTuples=0, sendTuples=true,operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1967a38, estate=0x19677a0) > at execMain.c:1636 > #17 standard_ExecutorRun (queryDesc=0x1996e80, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363 > #18 0x00007fb8960913bd in pgss_ExecutorRun (queryDesc=0x1996e80, direction=ForwardScanDirection, count=0, execute_once=<optimizedout>) at pg_stat_statements.c:1010 > #19 0x00007fb895c6f781 in explain_ExecutorRun (queryDesc=0x1996e80, direction=ForwardScanDirection, count=0, execute_once=<optimizedout>) at auto_explain.c:320 > #20 0x000000000084976e in PortalRunSelect (portal=portal@entry=0x18fed30, forward=forward@entry=true, count=0, count@entry=9223372036854775807,dest=dest@entry=0x1988448) at pquery.c:924 > #21 0x000000000084af4f in PortalRun (portal=0x18fed30, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimizedout>, dest=0x1988448, altdest=0x1988448, qc=0x7ffc225a9ce0) at pquery.c:768 > #22 0x000000000084679b in exec_simple_query (query_string=0x186d8a0 "SELECT alarm_id, alarm_disregard FROM alarms WHEREalarm_ack_time IS NULL AND alarm_clear_time IS NULL AND alarm_office = 'ETHERNET'") at postgres.c:1250 > #23 
0x000000000084848a in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4581
> #24 0x0000000000495afe in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4504
> #25 BackendStartup (port=0x1894250) at postmaster.c:4232
> #26 ServerLoop () at postmaster.c:1806
> #27 0x00000000007b0c60 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1868280) at postmaster.c:1478
> #28 0x00000000004976a6 in main (argc=3, argv=0x1868280) at main.c:202
>
> I saved a copy of the data dir, but I'm going to have to bring the DB back
> online soon. I don't know if I can get a page image easily since GDB is a bit
> busted (won't show source code) and I can't remember how I fixed that last
> time...

Actually gdb seems to be fine, except it doesn't work well on the corefile.

Breakpoint 4, HeapTupleSatisfiesVacuumHorizon (htup=htup@entry=0x7ffcbdc89970, buffer=buffer@entry=690, dead_after=dead_after@entry=0x7ffcbdc898fc) at heapam_visibility.c:1196
1196    {
(gdb) n
1197            HeapTupleHeader tuple = htup->t_data;
(gdb) p *tuple
$21 = {t_choice = {t_heap = {t_xmin = 3779887563, t_xmax = 133553150, t_field3 = {t_cid = 103, t_xvac = 103}}, t_datum = {datum_len_ = -515079733, datum_typmod = 133553150, datum_typeid = 103}}, t_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 30348}, ip_posid = 11}, t_infomask2 = 8225, t_infomask = 4423, t_hoff = 32 ' ', t_bits = 0x2aaab37ebe0f "\303\377\367\377\001"}
(gdb) n
1350            Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
(gdb) p xmax
$20 = 0
(gdb) n
1353            Assert(TransactionIdIsValid(xmax));
(gdb) down
#3 0x00000000004f3817 in heap_page_prune_opt (relation=0x7f5c989043e0, buffer=buffer@entry=690) at pruneheap.c:209
209                     ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
(gdb) l
204             if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
205             {
(gdb) p ((PageHeader) page)->pd_flags
$26 = 3

=> src/include/storage/bufpage.h-#define PD_HAS_FREE_LINES   0x0001  /* are there any unused line pointers? */
src/include/storage/bufpage.h:#define PD_PAGE_FULL          0x0002  /* not enough free space for new tuple? */

(gdb) p PageGetHeapFreeSpace(page)
$28 = 180
(gdb) p minfree
$29 = 819

ts=# SELECT * FROM verify_heapam('child.alarms_null');
 blkno | offnum | attnum |                               msg
-------+--------+--------+-----------------------------------------------------------------
  2892 |      9 |        | update xid is invalid
 10336 |      9 |        | update xid is invalid
 14584 |      5 |        | update xid is invalid
 30449 |      3 |        | update xid is invalid
 30900 |      2 |        | xmin 3779946727 precedes relation freeze threshold 1:3668608554
 43078 |      2 |        | update xid is invalid
 43090 |      1 |        | update xid is invalid
(7 rows)

child | alarms_null | table | telsasoft | permanent | heap | 357 MB |
Hi,
That's interesting; I dug into it for a while but didn't make much progress.
Maybe we could add some logging to print the MultiXactMembers' xid and status when the xid is 0.
Inside MultiXactIdGetUpdateXid():
```
	nmembers = GetMultiXactIdMembers(xmax, &members, false, false);

	if (nmembers > 0)
	{
		int			i;

		for (i = 0; i < nmembers; i++)
		{
			/* Ignore lockers */
			if (!ISUPDATE_from_mxstatus(members[i].status))
				continue;

			/* there can be at most one updater */
			Assert(update_xact == InvalidTransactionId);
			update_xact = members[i].xid;
			// log here if xid is invalid
#ifndef USE_ASSERT_CHECKING

			/*
			 * in an assert-enabled build, walk the whole array to ensure
			 * there's no other updater.
			 */
			break;
#endif
		}

		pfree(members);
	}
	// and here if we didn't set update_xact at all (it shouldn't happen as designed)
	return update_xact;
```
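For illustration, the two log points might look something like this. This is a rough, untested sketch against the loop above; the elog wording and exact placement are mine, not something proposed verbatim here:

```
		for (i = 0; i < nmembers; i++)
		{
			/* Ignore lockers */
			if (!ISUPDATE_from_mxstatus(members[i].status))
				continue;

			/* sketch: report a bogus updater member before anything trips over it */
			if (!TransactionIdIsValid(members[i].xid))
				elog(LOG, "multixact %u member %d is an updater with invalid xid (status %d)",
					 xmax, i, (int) members[i].status);

			/* there can be at most one updater */
			Assert(update_xact == InvalidTransactionId);
			update_xact = members[i].xid;
#ifndef USE_ASSERT_CHECKING
			break;
#endif
		}

		pfree(members);
	}

	/* sketch: report the "no updater found" case, which shouldn't happen by design */
	if (!TransactionIdIsValid(update_xact))
		elog(LOG, "multixact %u has %d member(s) but no valid updater xid",
			 xmax, nmembers);

	return update_xact;
```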
That will help a little if we can reproduce it.
And could we see the multixact replay in the logs if the DB does recover?
Regards,
Zhang Mingli
On Sat, Sep 10, 2022 at 12:07:30PM +0800, Zhang Mingli wrote: > That’s interesting, dig into it for a while but not too much progress. > > Maybe we could add some logs to print MultiXactMembers’ xid and status if xid is 0. > > Inside MultiXactIdGetUpdateXid() > > ``` > nmembers = GetMultiXactIdMembers(xmax, &members, false, false); > > if (nmembers > 0) > { > int i; > > for (i = 0; i < nmembers; i++) > { > /* Ignore lockers */ > if (!ISUPDATE_from_mxstatus(members[i].status)) > continue; > > /* there can be at most one updater */ > Assert(update_xact == InvalidTransactionId); > update_xact = members[i].xid; > > // log here if xid is invalid > #ifndef USE_ASSERT_CHECKING > > /* > * in an assert-enabled build, walk the whole array to ensure > * there's no other updater. > */ > break; > #endif > } > > pfree(members); > } > // and here if didn’t update update_xact at all (it shouldn’t happen as designed) Yeah. I added assertions for the above case inside the loop, and for this one, and this fails right before "return". TRAP: FailedAssertion("update_xact != InvalidTransactionId", File: "src/backend/access/heap/heapam.c", Line: 6939, PID: 4743) It looks like nmembers==2, both of which are lockers and being ignored. > And could we see multixact reply in logs if db does recover? Do you mean waldump or ?? BTW, after a number of sigabrt's, I started seeing these during recovery: < 2022-09-09 19:44:04.180 CDT >LOG: unexpected pageaddr 1214/AF0FE000 in log segment 0000000100001214000000B4, offset 1040384 < 2022-09-09 23:20:50.830 CDT >LOG: unexpected pageaddr 1214/CF65C000 in log segment 0000000100001214000000D8, offset 6668288 -- Justin
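(For reference, the two assertions described above presumably looked roughly like the following; the exact form isn't shown in the thread, only that the one just before the return is the one that fired:)

```
			/* inside the member loop, right after picking up the updater: */
			update_xact = members[i].xid;
			Assert(TransactionIdIsValid(update_xact));	/* guessed form */

	/* ... and just before the function returns -- this is the one that fired: */
	Assert(update_xact != InvalidTransactionId);
	return update_xact;
```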
The OOM was at:
< 2022-09-09 19:34:24.043 CDT >LOG:  server process (PID 14841) was terminated by signal 9: Killed

The first SIGABRT was at:
< 2022-09-09 19:37:31.650 CDT >LOG:  server process (PID 7363) was terminated by signal 6: Aborted

And I've just found a bunch of "interesting" logs between the two:

< 2022-09-09 19:36:48.505 CDT telsasoft >ERROR:  MultiXactId 133553150 has not been created yet -- apparent wraparound
< 2022-09-09 19:36:48.505 CDT telsasoft >STATEMENT:  SELECT alarm_id, alarm_disregard FROM alarms WHERE alarm_ack_time IS NULL AND alarm_clear_time IS NULL AND alarm_office = 'ETHERNET'
< 2022-09-09 19:36:48.788 CDT telsasoft >ERROR:  could not access status of transaction 3779944583
< 2022-09-09 19:36:48.788 CDT telsasoft >DETAIL:  Could not read from file "pg_subtrans/E14D" at offset 98304: read too few bytes.
...
< 2022-09-09 19:37:08.550 CDT telsasoft >ERROR:  MultiXactId 133553156 has not been created yet -- apparent wraparound
...
< 2022-09-09 19:37:13.792 CDT telsasoft >ERROR:  could not access status of transaction 3779946306
< 2022-09-09 19:37:13.792 CDT telsasoft >DETAIL:  Could not read from file "pg_subtrans/E14D" at offset 98304: read too few bytes.
...
< 2022-09-09 19:37:19.682 CDT telsasoft >ERROR:  could not access status of transaction 3779946306
< 2022-09-09 19:37:19.682 CDT telsasoft >DETAIL:  Could not read from file "pg_subtrans/E14D" at offset 98304: read too few bytes.
< 2022-09-09 19:37:19.682 CDT telsasoft >CONTEXT:  while locking tuple (11755,5) in relation "alarms_null"
...
< 2022-09-09 19:37:25.835 CDT telsasoft >ERROR:  MultiXactId 133553154 has not been created yet -- apparent wraparound

BTW, if I load the datadir backup to crash it, I see: t_infomask = 4423, which is:

; 0x1000+0x0100+0x0040+0x0004+0x0002+0x0001
4423

src/include/access/htup_details.h-#define HEAP_HASNULL          0x0001  /* has null attribute(s) */
src/include/access/htup_details.h-#define HEAP_HASVARWIDTH      0x0002  /* has variable-width attribute(s) */
src/include/access/htup_details.h-#define HEAP_HASEXTERNAL      0x0004  /* has external stored attribute(s) */
src/include/access/htup_details.h-#define HEAP_XMAX_EXCL_LOCK   0x0040  /* xmax is exclusive locker */
src/include/access/htup_details.h-#define HEAP_XMIN_COMMITTED   0x0100  /* t_xmin committed */
src/include/access/htup_details.h:#define HEAP_XMAX_IS_MULTI    0x1000  /* t_xmax is a MultiXactId */

I wish I could say what autovacuum had been doing during that period, but unfortunately I have "log_autovacuum_min_duration = 9s"...
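As a sanity check of that decomposition, here's a tiny standalone decoder (the flag values are copied from the htup_details.h lines above; this is an illustration, not part of the original report):

```
/* Decode the t_infomask value seen on the bad tuple into flag names. */
#include <stdio.h>

int
main(void)
{
	/* 4423 == 0x1147: the multixact + exclusive-lock combination from the crashed tuple */
	unsigned int infomask = 4423;
	struct { unsigned int bit; const char *name; } flags[] = {
		{0x0001, "HEAP_HASNULL"},
		{0x0002, "HEAP_HASVARWIDTH"},
		{0x0004, "HEAP_HASEXTERNAL"},
		{0x0040, "HEAP_XMAX_EXCL_LOCK"},
		{0x0100, "HEAP_XMIN_COMMITTED"},
		{0x1000, "HEAP_XMAX_IS_MULTI"},
	};

	for (int i = 0; i < (int) (sizeof(flags) / sizeof(flags[0])); i++)
		if (infomask & flags[i].bit)
			printf("%s\n", flags[i].name);

	return 0;
}
```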
On Sat, Sep 10, 2022 at 5:01 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> BTW, after a number of sigabrt's, I started seeing these during
> recovery:
>
> < 2022-09-09 19:44:04.180 CDT >LOG:  unexpected pageaddr 1214/AF0FE000 in log segment 0000000100001214000000B4, offset 1040384
> < 2022-09-09 23:20:50.830 CDT >LOG:  unexpected pageaddr 1214/CF65C000 in log segment 0000000100001214000000D8, offset 6668288

That's just what it looks like when we discover the end of the WAL by hitting a page that hasn't been overwritten yet in a recycled WAL segment, so the pageaddr is off by a multiple of 16MB.  Depending on timing and chance you might be more used to seeing the error where we hit zeroes in a partially filled page, the famous 'wanted 24, got 0', and you can also hit a fully zero-initialised page, 'invalid magic number 0000'.  All of these are expected, and more exotic errors are possible with power-loss torn writes, or on crash of a streaming standby, where we currently fail to zero the rest of overwritten pages.
On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR: MultiXactId 133553154 has not been created yet -- apparent wraparound I guess what happened here is that after one of your (apparently several?) OOM crashes, crash recovery didn't run all the way to the true end of the WAL due to the maintenance_io_concurrency=0 bug. In the case you reported, it couldn't complete an end-of-recovery checkpoint until you disabled recovery_prefetch, but that's only because of the somewhat unusual way that vismap pages work. In another case it might have been able to (bogusly) complete a checkpoint, leaving things in an inconsistent state.
On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote:
> On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR:  MultiXactId 133553154 has not been created yet -- apparent wraparound
>
> I guess what happened here is that after one of your (apparently
> several?) OOM crashes, crash recovery didn't run all the way to the
> true end of the WAL due to the maintenance_io_concurrency=0 bug. In
> the case you reported, it couldn't complete an end-of-recovery
> checkpoint until you disabled recovery_prefetch, but that's only
> because of the somewhat unusual way that vismap pages work. In
> another case it might have been able to (bogusly) complete a
> checkpoint, leaving things in an inconsistent state.

I think what you're saying is that this can be explained by the
io_concurrency bug in recovery_prefetch, if run under 15b3.

But yesterday I started from initdb and restored this cluster from backup, and
started up sqlsmith, and sent some kill -9, and now got more corruption.
Looks like it took ~10 induced crashes before this happened.

At the moment, I have no reason to believe this issue is related to
prefetch_recovery; I am wondering about changes to vacuum.

< 2022-09-11 20:19:03.071 CDT telsasoft >ERROR:  MultiXactId 732646 has not been created yet -- apparent wraparound
< 2022-09-11 20:24:00.530 CDT telsasoft >ERROR:  MultiXactId 732646 has not been created yet -- apparent wraparound

Program terminated with signal 6, Aborted.
#0 0x00007f413716b1f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7_4.2.x86_64 libgcc-4.8.5-44.el7.x86_64 libxml2-2.9.1-6.el7_9.6.x86_64 lz4-1.8.3-1.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007f413716b1f7 in raise () from /lib64/libc.so.6
#1 0x00007f413716c8e8 in abort () from /lib64/libc.so.6
#2 0x0000000000962c5c in ExceptionalCondition (conditionName=conditionName@entry=0x9ce238 "P_ISLEAF(opaque) && !P_ISDELETED(opaque)", errorType=errorType@entry=0x9bad97 "FailedAssertion", fileName=fileName@entry=0x9cdcd1 "nbtpage.c", lineNumber=lineNumber@entry=1778) at assert.c:69
#3 0x0000000000507e34 in _bt_rightsib_halfdeadflag (rel=rel@entry=0x7f4138a238a8, leafrightsib=leafrightsib@entry=53) at nbtpage.c:1778
#4 0x0000000000507fba in _bt_mark_page_halfdead (rel=rel@entry=0x7f4138a238a8, leafbuf=leafbuf@entry=13637, stack=stack@entry=0x144ca20) at nbtpage.c:2121
#5 0x000000000050af1d in _bt_pagedel (rel=rel@entry=0x7f4138a238a8, leafbuf=leafbuf@entry=13637, vstate=vstate@entry=0x7ffef18de8b0) at nbtpage.c:2004
#6 0x000000000050c996 in btvacuumpage (vstate=vstate@entry=0x7ffef18de8b0, scanblkno=scanblkno@entry=36) at nbtree.c:1342
#7 0x000000000050caf8 in btvacuumscan (info=info@entry=0x7ffef18deac0, stats=stats@entry=0x142fb70, callback=callback@entry=0x67e89b <vac_tid_reaped>, callback_state=callback_state@entry=0x1461220, cycleid=<optimized out>) at nbtree.c:997
#8 0x000000000050cc2f in btbulkdelete (info=0x7ffef18deac0, stats=0x142fb70, callback=0x67e89b <vac_tid_reaped>, callback_state=0x1461220) at nbtree.c:801
#9 0x00000000004fc64b in index_bulk_delete (info=info@entry=0x7ffef18deac0, istat=istat@entry=0x0, callback=callback@entry=0x67e89b <vac_tid_reaped>, callback_state=callback_state@entry=0x1461220) at indexam.c:701
#10 0x000000000068108c in vac_bulkdel_one_index (ivinfo=ivinfo@entry=0x7ffef18deac0, istat=istat@entry=0x0, dead_items=0x1461220) at vacuum.c:2324
#11 0x00000000004f72ae in lazy_vacuum_one_index
(indrel=<optimized out>, istat=0x0, reltuples=<optimized out>, vacrel=vacrel@entry=0x142f100)at vacuumlazy.c:2726 #12 0x00000000004f738b in lazy_vacuum_all_indexes (vacrel=vacrel@entry=0x142f100) at vacuumlazy.c:2328 #13 0x00000000004f75df in lazy_vacuum (vacrel=vacrel@entry=0x142f100) at vacuumlazy.c:2261 #14 0x00000000004f7f14 in lazy_scan_heap (vacrel=vacrel@entry=0x142f100) at vacuumlazy.c:1264 #15 0x00000000004f895f in heap_vacuum_rel (rel=0x7f4138a67c00, params=0x143cbec, bstrategy=0x143ea20) at vacuumlazy.c:534 #16 0x000000000067f62b in table_relation_vacuum (bstrategy=<optimized out>, params=0x143cbec, rel=0x7f4138a67c00) at ../../../src/include/access/tableam.h:1680 #17 vacuum_rel (relid=1249, relation=<optimized out>, params=params@entry=0x143cbec) at vacuum.c:2086 #18 0x000000000068065c in vacuum (relations=0x144a118, params=params@entry=0x143cbec, bstrategy=<optimized out>, bstrategy@entry=0x143ea20,isTopLevel=isTopLevel@entry=true) at vacuum.c:475 #19 0x0000000000796a0e in autovacuum_do_vac_analyze (tab=tab@entry=0x143cbe8, bstrategy=bstrategy@entry=0x143ea20) at autovacuum.c:3149 #20 0x00000000007987bf in do_autovacuum () at autovacuum.c:2472 #21 0x0000000000798e72 in AutoVacWorkerMain (argc=argc@entry=0, argv=argv@entry=0x0) at autovacuum.c:1715 #22 0x0000000000798eed in StartAutoVacWorker () at autovacuum.c:1493 #23 0x000000000079fe49 in StartAutovacuumWorker () at postmaster.c:5534 #24 0x00000000007a0c44 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5239 #25 <signal handler called> #26 0x00007f4137225783 in __select_nocancel () from /lib64/libc.so.6 #27 0x00000000007a0fc5 in ServerLoop () at postmaster.c:1770 #28 0x00000000007a2361 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x136e9d0) at postmaster.c:1478 #29 0x00000000006ed9c4 in main (argc=3, argv=0x136e9d0) at main.c:202 #2 0x0000000000962c5c in ExceptionalCondition (conditionName=conditionName@entry=0xaa6a32 "false", errorType=errorType@entry=0x9bad97"FailedAssertion", fileName=fileName@entry=0x9c6207 "heapam.c", lineNumber=lineNumber@entry=7803) at assert.c:69 #3 0x00000000004d9db9 in index_delete_sort_cmp (deltid2=0x7ffef18dcc90, deltid1=<optimized out>) at heapam.c:7803 #4 index_delete_sort (delstate=delstate@entry=0x7ffef18ddf10) at heapam.c:7844 #5 0x00000000004e9323 in heap_index_delete_tuples (rel=0x7f4138a672b8, delstate=0x7ffef18ddf10) at heapam.c:7502 #6 0x000000000050a512 in table_index_delete_tuples (delstate=0x7ffef18ddf10, rel=0x0) at ../../../../src/include/access/tableam.h:1329 #7 _bt_delitems_delete_check (rel=rel@entry=0x7f41389dbea0, buf=buf@entry=9183, heapRel=heapRel@entry=0x7f4138a672b8, delstate=delstate@entry=0x7ffef18ddf10)at nbtpage.c:1540 #8 0x00000000004fff30 in _bt_simpledel_pass (rel=rel@entry=0x7f41389dbea0, buffer=buffer@entry=9183, heapRel=heapRel@entry=0x7f4138a672b8,deletable=deletable@entry=0x7ffef18ddfb0, ndeletable=4, newitem=<optimized out>, minoff=2, maxoff=215) at nbtinsert.c:2899 #9 0x0000000000500171 in _bt_delete_or_dedup_one_page (rel=rel@entry=0x7f41389dbea0, heapRel=heapRel@entry=0x7f4138a672b8,insertstate=insertstate@entry=0x7ffef18de3c0, simpleonly=simpleonly@entry=false, checkingunique=checkingunique@entry=true, uniquedup=uniquedup@entry=false, indexUnchanged=indexUnchanged@entry=false)at nbtinsert.c:2710 #10 0x00000000005051ad in _bt_findinsertloc (rel=rel@entry=0x7f41389dbea0, insertstate=insertstate@entry=0x7ffef18de3c0,checkingunique=checkingunique@entry=true, indexUnchanged=indexUnchanged@entry=false, 
stack=stack@entry=0x157dbc8, heapRel=heapRel@entry=0x7f4138a672b8) at nbtinsert.c:902 #11 0x00000000005055ad in _bt_doinsert (rel=rel@entry=0x7f41389dbea0, itup=itup@entry=0x157dcc8, checkUnique=checkUnique@entry=UNIQUE_CHECK_YES,indexUnchanged=indexUnchanged@entry=false, heapRel=heapRel@entry=0x7f4138a672b8) at nbtinsert.c:256 #12 0x000000000050b16c in btinsert (rel=0x7f41389dbea0, values=<optimized out>, isnull=<optimized out>, ht_ctid=0x157d1bc,heapRel=0x7f4138a672b8, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, indexInfo=0x1803df0) at nbtree.c:200 #13 0x00000000004fb95d in index_insert (indexRelation=indexRelation@entry=0x7f41389dbea0, values=values@entry=0x7ffef18de520,isnull=isnull@entry=0x7ffef18de500, heap_t_ctid=heap_t_ctid@entry=0x157d1bc, heapRelation=heapRelation@entry=0x7f4138a672b8, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=indexUnchanged@entry=false,indexInfo=indexInfo@entry=0x1803df0) at indexam.c:193 #14 0x0000000000581ae6 in CatalogIndexInsert (indstate=indstate@entry=0x157c2f8, heapTuple=heapTuple@entry=0x157d1b8) atindexing.c:158 #15 0x0000000000581b9f in CatalogTupleInsert (heapRel=heapRel@entry=0x7f4138a672b8, tup=tup@entry=0x157d1b8) at indexing.c:231 #16 0x000000000057996f in InsertPgClassTuple (pg_class_desc=0x7f4138a672b8, new_rel_desc=new_rel_desc@entry=0x7f41389d9e30,new_rel_oid=<optimized out>, relacl=relacl@entry=0, reloptions=reloptions@entry=0)at heap.c:939 #17 0x0000000000579a07 in AddNewRelationTuple (pg_class_desc=pg_class_desc@entry=0x7f4138a672b8, new_rel_desc=new_rel_desc@entry=0x7f41389d9e30,new_rel_oid=new_rel_oid@entry=500038, new_type_oid=new_type_oid@entry=0, reloftype=reloftype@entry=0, relowner=relowner@entry=16556, relkind=relkind@entry=116 't', relfrozenxid=17414307, relminmxid=730642,relacl=relacl@entry=0, reloptions=reloptions@entry=0) at heap.c:998 #18 0x000000000057a204 in heap_create_with_catalog (relname=relname@entry=0x7ffef18dea90 "pg_toast_500035", relnamespace=relnamespace@entry=20138,reltablespace=<optimized out>, relid=500038, relid@entry=0, reltypeid=reltypeid@entry=0, reloftypeid=reloftypeid@entry=0, ownerid=16556, accessmtd=2, tupdesc=tupdesc@entry=0x1800420,cooked_constraints=cooked_constraints@entry=0x0, relkind=relkind@entry=116 't', relpersistence=relpersistence@entry=116 't', shared_relation=shared_relation@entry=false, mapped_relation=mapped_relation@entry=false,oncommit=oncommit@entry=ONCOMMIT_NOOP, reloptions=reloptions@entry=0, use_user_acl=use_user_acl@entry=false, allow_system_table_mods=allow_system_table_mods@entry=true, is_internal=is_internal@entry=true,relrewrite=relrewrite@entry=0, typaddress=typaddress@entry=0x0) at heap.c:1386 #19 0x00000000005a41e2 in create_toast_table (rel=rel@entry=0x7f41389ddfb8, toastOid=toastOid@entry=0, toastIndexOid=toastIndexOid@entry=0,reloptions=reloptions@entry=0, lockmode=lockmode@entry=8, check=check@entry=false, OIDOldToast=OIDOldToast@entry=0) at toasting.c:249 #20 0x00000000005a4571 in CheckAndCreateToastTable (relOid=relOid@entry=500035, reloptions=reloptions@entry=0, lockmode=lockmode@entry=8,check=check@entry=false, OIDOldToast=OIDOldToast@entry=0) at toasting.c:88 #21 0x00000000005a45d3 in NewRelationCreateToastTable (relOid=relOid@entry=500035, reloptions=reloptions@entry=0) at toasting.c:75 #22 0x0000000000609e47 in create_ctas_internal (attrList=attrList@entry=0x17ff798, into=into@entry=0x1374c80) at createas.c:135 #23 0x000000000060a0cf in intorel_startup (self=0x1547678, operation=<optimized out>, typeinfo=0x17fc530) at createas.c:528 #24 0x0000000000694b1e 
in standard_ExecutorRun (queryDesc=queryDesc@entry=0x1569188, direction=direction@entry=ForwardScanDirection,count=count@entry=0, execute_once=execute_once@entry=true) at execMain.c:352 #25 0x00007f41307d2a2e in pgss_ExecutorRun (queryDesc=0x1569188, direction=ForwardScanDirection, count=0, execute_once=<optimizedout>) at pg_stat_statements.c:1010 #26 0x00007f41303af648 in explain_ExecutorRun (queryDesc=0x1569188, direction=ForwardScanDirection, count=0, execute_once=<optimizedout>) at auto_explain.c:320 #27 0x0000000000694c13 in ExecutorRun (queryDesc=queryDesc@entry=0x1569188, direction=direction@entry=ForwardScanDirection,count=count@entry=0, execute_once=execute_once@entry=true) at execMain.c:305 #28 0x000000000060a894 in ExecCreateTableAs (pstate=pstate@entry=0x14bd950, stmt=stmt@entry=0x1545140, params=params@entry=0x0,queryEnv=queryEnv@entry=0x0, qc=qc@entry=0x7ffef18df720) at createas.c:336 #29 0x00000000008378dc in ProcessUtilitySlow (pstate=pstate@entry=0x14bd950, pstmt=pstmt@entry=0x15ce250, queryString=queryString@entry=0x1373df0 "\n-- do paging substitutions\nCREATE TEMPORARY TABLE SU AS\n\tSELECT ...
On Sun, Sep 11, 2022 at 6:42 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I think you're saying is that this can be explained by the
> io_concurrency bug in recovery_prefetch, if run under 15b3.
>
> But yesterday I started from initdb and restored this cluster from backup, and
> started up sqlsmith, and sent some kill -9, and now got more corruption.
> Looks like it took ~10 induced crashes before this happened.

Have you tested fsync on the system? The symptoms here are all over the place.

This assertion failure seems like a pretty good sign that the problems happen during recovery, or because basic guarantees needed for crash safety aren't met:

> #2 0x0000000000962c5c in ExceptionalCondition (conditionName=conditionName@entry=0x9ce238 "P_ISLEAF(opaque) && !P_ISDELETED(opaque)", errorType=errorType@entry=0x9bad97 "FailedAssertion",
> fileName=fileName@entry=0x9cdcd1 "nbtpage.c", lineNumber=lineNumber@entry=1778) at assert.c:69
> #3 0x0000000000507e34 in _bt_rightsib_halfdeadflag (rel=rel@entry=0x7f4138a238a8, leafrightsib=leafrightsib@entry=53) at nbtpage.c:1778
> #4 0x0000000000507fba in _bt_mark_page_halfdead (rel=rel@entry=0x7f4138a238a8, leafbuf=leafbuf@entry=13637, stack=stack@entry=0x144ca20) at nbtpage.c:2121

This shows that the basic rules for page deletion have somehow seemingly been violated. It's as if a page deletion went ahead, but didn't work as an atomic operation -- there were some lost writes for some but not all pages. Actually, it looks like a mix of states from before and after both the first and the second phases of page deletion -- so not just one atomic operation.

-- Peter Geoghegan
On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote:
> > On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR:  MultiXactId 133553154 has not been created yet -- apparent wraparound
> >
> > I guess what happened here is that after one of your (apparently
> > several?) OOM crashes, crash recovery didn't run all the way to the
> > true end of the WAL due to the maintenance_io_concurrency=0 bug. In
> > the case you reported, it couldn't complete an end-of-recovery
> > checkpoint until you disabled recovery_prefetch, but that's only
> > because of the somewhat unusual way that vismap pages work. In
> > another case it might have been able to (bogusly) complete a
> > checkpoint, leaving things in an inconsistent state.
>
> I think you're saying is that this can be explained by the
> io_concurrency bug in recovery_prefetch, if run under 15b3.

Well I don't know, but it's one way I could think of that you could
have a data page referring to a multixact that isn't on disk after
recovery (because the data page happens to have been flushed, but we
didn't replay the WAL that would create the multixact).

> But yesterday I started from initdb and restored this cluster from backup, and
> started up sqlsmith, and sent some kill -9, and now got more corruption.
> Looks like it took ~10 induced crashes before this happened.

$SUBJECT says 15b4, which doesn't have the fix. Are you still using
maintenance_io_concurrency=0?
On Mon, Sep 12, 2022 at 02:25:48PM +1200, Thomas Munro wrote: > On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Mon, Sep 12, 2022 at 10:44:38AM +1200, Thomas Munro wrote: > > > On Sat, Sep 10, 2022 at 5:44 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > < 2022-09-09 19:37:25.835 CDT telsasoft >ERROR: MultiXactId 133553154 has not been created yet -- apparent wraparound > > > > > > I guess what happened here is that after one of your (apparently > > > several?) OOM crashes, crash recovery didn't run all the way to the > > > true end of the WAL due to the maintenance_io_concurrency=0 bug. In > > > the case you reported, it couldn't complete an end-of-recovery > > > checkpoint until you disabled recovery_prefetch, but that's only > > > because of the somewhat unusual way that vismap pages work. In > > > another case it might have been able to (bogusly) complete a > > > checkpoint, leaving things in an inconsistent state. > > > > I think you're saying is that this can be explained by the > > io_concurrency bug in recovery_prefetch, if run under 15b3. > > Well I don't know, but it's one way I could think of that you could > have a data page referring to a multixact that isn't on disk after > recovery (because the data page happens to have been flushed, but we > didn't replay the WAL that would create the multixact). > > > But yesterday I started from initdb and restored this cluster from backup, and > > started up sqlsmith, and sent some kill -9, and now got more corruption. > > Looks like it took ~10 induced crashes before this happened. > > $SUBJECT says 15b4, which doesn't have the fix. Are you still using > maintainance_io_concurrent=0? Yeah ... I just realized that I've already forgotten the relevant chronology. The io_concurrency bugfix wasn't included in 15b4, so (if I understood you correctly), that might explain these symptoms - right ? -- Justin
On Mon, Sep 12, 2022 at 2:27 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Mon, Sep 12, 2022 at 02:25:48PM +1200, Thomas Munro wrote: > > On Mon, Sep 12, 2022 at 1:42 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > But yesterday I started from initdb and restored this cluster from backup, and > > > started up sqlsmith, and sent some kill -9, and now got more corruption. > > > Looks like it took ~10 induced crashes before this happened. > > > > $SUBJECT says 15b4, which doesn't have the fix. Are you still using > > maintainance_io_concurrent=0? > > Yeah ... I just realized that I've already forgotten the relevant > chronology. > > The io_concurrency bugfix wasn't included in 15b4, so (if I understood > you correctly), that might explain these symptoms - right ? Yeah.
On Mon, Sep 12, 2022 at 02:34:48PM +1200, Thomas Munro wrote: > On Mon, Sep 12, 2022 at 2:27 PM Justin Pryzby <pryzby@telsasoft.com> wrote: >> Yeah ... I just realized that I've already forgotten the relevant >> chronology. >> >> The io_concurrency bugfix wasn't included in 15b4, so (if I understood >> you correctly), that might explain these symptoms - right ? > > Yeah. Could you double-check if the issues you are seeing persist after syncing up with the latest point of REL_15_STABLE? For now, I have added an open item just to be on the safe side. -- Michael
On 2022-Sep-09, Justin Pryzby wrote:

> 4) I was simultaneously compiling pg14b4 to run with
> -DRELCACHE_FORCE_RELEASE and installing it into /usr/local. I don't *think*
> running libraries would've been overwritten, and that shouldn't have
> affected the running instance anyway...

If you were installing new files with the system running and under duress, then yeah, bad things could happen -- if any ABIs are changed and new connections are opened in between, then these new connections could load new copies of the libraries with changed ABI. This might or might not be happening here, but I wouldn't waste too much time chasing broken databases created this way.

--
Álvaro Herrera               Breisgau, Deutschland — https://www.EnterpriseDB.com/
"To think that the spectre we see is illusory does not strip it of its terror;
it only adds the new terror of madness" (Perelandra, C.S. Lewis)
On Mon, Sep 12, 2022 at 11:53:14AM +0900, Michael Paquier wrote: > On Mon, Sep 12, 2022 at 02:34:48PM +1200, Thomas Munro wrote: > > On Mon, Sep 12, 2022 at 2:27 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > >> Yeah ... I just realized that I've already forgotten the relevant > >> chronology. > >> > >> The io_concurrency bugfix wasn't included in 15b4, so (if I understood > >> you correctly), that might explain these symptoms - right ? > > > > Yeah. > > Could you double-check if the issues you are seeing persist after > syncing up with the latest point of REL_15_STABLE? For now, I have > added an open item just to be on the safe side. After another round of restore-from-backup, and sqlsmith-with-kill-9, it looks to be okay. The issue was evidently another possible symptom of the recovery prefetch bug, which is already fixed in REL_15_STABLE (but not in pg15b4). -- Justin
On Mon, Sep 12, 2022 at 04:29:22PM -0500, Justin Pryzby wrote: > After another round of restore-from-backup, and sqlsmith-with-kill-9, it > looks to be okay. The issue was evidently another possible symptom of > the recovery prefetch bug, which is already fixed in REL_15_STABLE (but > not in pg15b4). Nice! Thanks for double-checking. -- Michael