Thread: Segmentation fault in backend/access/heap/pruneheap.c: heap_page_prune_opt() calling PageIsPrunable () with NULL page on FreeBSD / PowerPC
Segmentation fault in backend/access/heap/pruneheap.c: heap_page_prune_opt() calling PageIsPrunable () with NULL page on FreeBSD / PowerPC
From
Nick Withers
Date:
Hi y'all, I've been experiencing segfaults of PostgreSQL for quite a quite now (since July 2008, PostgreSQL 8.3.3, perhaps?) on a FreeBSD 7 PowerPC (7400) system (not sure if anyone really cares about this particular platform, but I'll try :-)): ____ internal# gdb postgres postgres.core GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "powerpc-marcel-freebsd"... Core was generated by `postgres'. Program terminated with signal 11, Segmentation fault. Reading symbols from /usr/local/lib/libintl.so.8...done. Loaded symbols for /usr/local/lib/libintl.so.8 Reading symbols from /usr/local/lib/libxml2.so.5...done. Loaded symbols for /usr/local/lib/libxml2.so.5 Reading symbols from /usr/lib/libssl.so.5...done. Loaded symbols for /usr/lib/libssl.so.5 Reading symbols from /lib/libcrypto.so.5...done. Loaded symbols for /lib/libcrypto.so.5 Reading symbols from /lib/libcrypt.so.4...done. Loaded symbols for /lib/libcrypt.so.4 Reading symbols from /lib/libm.so.5...done. Loaded symbols for /lib/libm.so.5 Reading symbols from /lib/libthr.so.3...done. Loaded symbols for /lib/libthr.so.3 Reading symbols from /lib/libc.so.7...done. Loaded symbols for /lib/libc.so.7 Reading symbols from /usr/local/lib/libiconv.so.3...done. Loaded symbols for /usr/local/lib/libiconv.so.3 Reading symbols from /lib/libz.so.4...done. Loaded symbols for /lib/libz.so.4 Reading symbols from /libexec/ld-elf.so.1...done. Loaded symbols for /libexec/ld-elf.so.1 #0 0x0186a1ec in heap_page_prune_opt (relation=0x224c9f70, buffer=-1, OldestXmin=28440) at pruneheap.c:79 79 if (!PageIsPrunable(dp, OldestXmin)) [New Thread 0x22401100 (LWP 100093)] (gdb) bt full #0 0x0186a1ec in heap_page_prune_opt (relation=0x224c9f70, buffer=-1, OldestXmin=28440) at pruneheap.c:79 dp = 0x0 minfree = 4294967295 #1 0x0185f790 in heapgetpage (scan=0x22970d88, page=0) at heapam.c:216 buffer = -1 snapshot = 0x224af818 dp = 0x20100 <Address 0x20100 out of bounds> lines = 184 ntup = 28449656 lineoff = 0 lpp = 0x10003 #2 0x01860848 in heapgettup_pagemode (scan=0x22970d88, dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:583 tuple = 0x22970dac backward = 0 '\0' page = 0 finished = 0 '\0' dp = 0x3f <Address 0x3f out of bounds> lines = 28315260 lineindex = 0 lineoff = 4 linesleft = 0 lpp = 0x19 #3 0x01861fa8 in heap_getnext (scan=0x22970d88, direction=ForwardScanDirection) at heapam.c:1269 No locals. #4 0x019b39e8 in SeqNext (node=0x22970c50) at nodeSeqscan.c:92 tuple = 0x7fffd1c0 scandesc = 0x22970d88 scanrelid = 4 estate = (EState *) 0x224f3018 direction = ForwardScanDirection slot = (TupleTableSlot *) 0x224f34b0 #5 0x019a07b8 in ExecScan (node=0x22970c50, accessMtd=0x19b38bc <SeqNext>) at execScan.c:103 slot = (TupleTableSlot *) 0x168 econtext = (ExprContext *) 0x22970cd8 qual = (List *) 0x0 projInfo = (ProjectionInfo *) 0x229712c8 isDone = 580330028 resultSlot = (TupleTableSlot *) 0x7fffd1f0 #6 0x019b3a74 in ExecSeqScan (node=0x22970c50) at nodeSeqscan.c:130 No locals. #7 0x01995530 in ExecProcNode (node=0x22970c50) at execProcnode.c:334 result = (TupleTableSlot *) 0x4e6958 __func__ = "ExecProcNode" #8 0x019b46e4 in ExecSort (node=0x22970bc8) at nodeSort.c:102 plannode = (Sort *) 0x224ec828 outerNode = (PlanState *) 0x22970c50 tupDesc = 0x229711a0 estate = (EState *) 0x224f3018 dir = ForwardScanDirection tuplesortstate = (Tuplesortstate *) 0x22972018 slot = (TupleTableSlot *) 0x224f4f70 #9 0x01995638 in ExecProcNode (node=0x22970bc8) at execProcnode.c:386 result = (TupleTableSlot *) 0x224d18d8 __func__ = "ExecProcNode" #10 0x019b4c3c in ExecUnique (node=0x22970b40) at nodeUnique.c:69 plannode = (Unique *) 0x224ecd58 resultTupleSlot = (TupleTableSlot *) 0x224f33b0 slot = (TupleTableSlot *) 0xffd310 outerPlan = (PlanState *) 0x22970bc8 #11 0x01995680 in ExecProcNode (node=0x22970b40) at execProcnode.c:398 result = (TupleTableSlot *) 0x224f4f70 __func__ = "ExecProcNode" #12 0x019b8db0 in SubqueryNext (node=0x224f4f70) at nodeSubqueryscan.c:58 slot = (TupleTableSlot *) 0x224f4f70 #13 0x019a07b8 in ExecScan (node=0x224f4f70, accessMtd=0x19b8d88 <SubqueryNext>) at execScan.c:103 slot = (TupleTableSlot *) 0x224e6a70 econtext = (ExprContext *) 0x22970080 qual = (List *) 0x22970b28 projInfo = (ProjectionInfo *) 0x22971710 isDone = 575586072 resultSlot = (TupleTableSlot *) 0x19 #14 0x019b8e14 in ExecSubqueryScan (node=0x224f4f70) at nodeSubqueryscan.c:85 No locals. #15 0x01995590 in ExecProcNode (node=0x224f4f70) at execProcnode.c:352 result = (TupleTableSlot *) 0x224f4f70 __func__ = "ExecProcNode" #16 0x01992868 in ExecutePlan (estate=0x224f3018, planstate=0x224f4f70, operation=CMD_INSERT, numberTuples=0, direction=ForwardScanDirection, dest=0x224ed840) at execMain.c:1335 junkfilter = (JunkFilter *) 0x2000 planSlot = (TupleTableSlot *) 0xb slot = (TupleTableSlot *) 0x224eb8b8 tupleid = 0x0 tuple_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 128}, ip_posid = 0} current_tuple_count = 0 result = (TupleTableSlot *) 0x0 __func__ = "ExecutePlan" #17 0x01990a14 in ExecutorRun (queryDesc=0x224af840, direction=ForwardScanDirection, count=0) at execMain.c:270 estate = (EState *) 0x224f3018 operation = CMD_INSERT dest = (DestReceiver *) 0x224ed840 sendTuples = 0 '\0' result = (TupleTableSlot *) 0x7fffd510 oldcontext = 0x224d17c8 #18 0x01aa17fc in ProcessQuery (plan=0x224ed7d0, params=0x0, dest=0x224ed840, completionTag=0x7fffd81e "") at pquery.c:179 queryDesc = (QueryDesc *) 0x224af840 __func__ = "ProcessQuery" #19 0x01aa3494 in PortalRunMulti (portal=0x224d7018, isTopLevel=1 '\001', dest=0x224ed840, altdest=0x224ed840, completionTag=0x7fffd81e "") at pquery.c:1242 pstmt = (PlannedStmt *) 0x224ed7d0 stmt = (Node *) 0x224ed7d0 stmtlist_item = (ListCell *) 0x224ed818 #20 0x01aa2910 in PortalRun (portal=0x224d7018, count=2147483647, isTopLevel=1 '\001', dest=0x224ed840, altdest=0x224ed840, completionTag=0x7fffd81e "") at pquery.c:813 save_exception_stack = (sigjmp_buf *) 0x7fffd920 save_context_stack = (ErrorContextCallback *) 0x0 local_sigjmp_buf = {{_sjb = {0, 2147473696, 0, 2147472896, 574843656, 567325576, 2147472832, 27928436, 671088770, 0 <repeats 16 times>, 3, 2147474872, 2147472832, 0, 0, 0, 0, 0, 0, 3, 2147474872, 2147472864, 3, 575447184, 2147473040, 2147473088, 3, 4, 2147473056, 4, 575477696, 0, 574816384, 2147473104, 1, 574816384, 2147473088, 2147473136, 27060864, 3, 2147473104, 2147473152, 3, 0, 575180896, 4, 575477832, 575180824, 2147473136, 2147473232, 14, 574816384, 2147473152, 2147473264, 22472, 575477800, 64, 3, 575592504, 575582208, 574843656, 0, 14, 574843656, 40, 575592512, 575477928, 575180824, 29967036, 0, 3, 2147473312, 2147473232, 2147473264, 29295016, 574843656, 40, 0, 3, 575477840, 2147473264, 2147473312, 25429760, 575592512, 29967036, 2147473344}}} result = 127 '\177' saveTopTransactionResourceOwner = 0x224d1488 saveTopTransactionContext = 0x22436b90 saveActivePortal = 0x0 saveActiveSnapshot = 0x0 saveResourceOwner = 0x224d1488 savePortalContext = 0x0 saveMemoryContext = 0x22436b90 __func__ = "PortalRun" #21 0x01a9bae4 in exec_simple_query ( query_string=0x22493018 "INSERT INTO Path (Path) SELECT a.Path FROM (SELECT DISTINCT Path FROM batch) AS a WHERE NOT EXISTS (SELECT Path FROM Path WHERE Path = a.Path) ") at postgres.c:986 parsetree = (Node *) 0x22494090 portal = 0x224d7018 commandTag = 0x1c9618c "INSERT" completionTag = "\000 \177��P\001\234��\"I0\030\000\000\000\217\"I0\030 \001\234�X\000\000\000\000\000\000\000T\177��P\000\000\000\003A�\v& \177��P\177��\200\001\234��\000\000\000\217\"I" querytree_list = (List *) 0x224e5c28 plantree_list = (List *) 0x224ed828 receiver = (DestReceiver *) 0x224ed840 format = 0 dest = DestRemote oldcontext = 0x22436b90 parsetree_list = (List *) 0x224940c8 parsetree_item = (ListCell *) 0x224940b8 save_log_statement_stats = 0 '\0' was_logged = 0 '\0' isTopLevel = 1 '\001' msec_str = "\"I0\030\000\000\000\217\000\000\000\003\000\000\000\000 \177��\020\177��P\001��l\000\000\000Q" __func__ = "exec_simple_query" #22 0x01aa08b0 in PostgresMain (argc=4, argv=0x22433630, username=0x22433610 "bacula") at postgres.c:3572 query_string = 0x22493018 "INSERT INTO Path (Path) SELECT a.Path FROM (SELECT DISTINCT Path FROM batch) AS a WHERE NOT EXISTS (SELECT Path FROM Path WHERE Path = a.Path) " flag = -1 dbname = 0x2241e128 "bacula" userDoption = 0x0 secure = 0 '\0' errs = 0 debug_flag = -1 guc_names = (List *) 0x0 guc_values = (List *) 0x0 ctx = PGC_USERSET gucsource = PGC_S_CLIENT am_superuser = 0 '\0' firstchar = 81 stack_base = 0 '\0' input_message = { data = 0x22493018 "INSERT INTO Path (Path) SELECT a.Path FROM (SELECT DISTINCT Path FROM batch) AS a WHERE NOT EXISTS (SELECT Path FROM Path WHERE Path = a.Path) ", len = 144, maxlen = 1024, cursor = 144} local_sigjmp_buf = {{_sjb = {1, 2147151939, -1, -1, -1, 567325576, 2147473536, 27919876, 1207959620, 0 <repeats 16 times>, 3, 2147474872, 2147473536, 0, 29945672, 2147474872, 0, 2147474192, 572907720, 67108864, 2147474200, 2147473872, 0, 574620516, 30567600, 23, 34144255, 573125272, 4095, 2147473968, 574624000, 571861904, 939524226, 0, 571861780, 572989924, 61490, 2147473936, 167772160, 3072, 0, 0, 0, 0, 0, 30567552, 573164272, 30264272, 574619648, 574840832, 30567552, 573125272, 574620296, 2147474016, 572458996, 0, 0, 2147474032, 0, 4, 574619648, 0, 574840832, 0, 574620296, 2147474080, 572459680, 0, 2147474032, 2147474096, 0, 4, 0, 0, 574824600, 574816384, 0, 0, 3, 574816384, 574824608, 2147474112, 29289624, 12, 18, 0}}} send_ready_for_query = 0 '\0' __func__ = "PostgresMain" #23 0x01a56748 in BackendRun (port=0x22407400) at postmaster.c:3207 av = (char **) 0x22433630 maxac = 10 ac = 4 secs = 285943416 usecs = 894679 protobuf = "-v196608\000@t\000\001�� \"C6 \"C6\020\000\000\000@\001��X" i = 4 __func__ = "BackendRun" #24 0x01a55988 in BackendStartup (port=0x22407400) at postmaster.c:2830 bn = (Backend *) 0x224051d0 pid = 0 __func__ = "BackendStartup" #25 0x01a52994 in ServerLoop () at postmaster.c:1274 port = (Port *) 0x22407400 i = 2 rmask = {__fds_bits = {32, 0 <repeats 31 times>}} selres = 1 readmask = {__fds_bits = {56, 0 <repeats 31 times>}} nSockets = 6 now = 1232628203 last_touch_time = 1232626270 __func__ = "ServerLoop" #26 0x01a52198 in PostmasterMain (argc=3, argv=0x7fffdda8) at postmaster.c:1029 opt = -1 status = 0 userDoption = 0x7fffde6f "/usr/local/pgsql/data" i = 64 __func__ = "PostmasterMain" #27 0x019ce288 in main (argc=3, argv=0x7fffdda8) at main.c:188 No locals. (gdb) ____ This is during a Bacula (http://www.bacula.org) backup run, by the way. Should I try adding a null-pointer check in PageIsPrunable () or might this be symptomatic of wider issues? Could this be an indication of database corruption, or definitely a run-time thing? Any pointers (including "you idiot, there's a password in your backtrace! :-)) appreciated! -- Nick Withers email: nick@nickwithers.com Web: http://www.nickwithers.com Mobile: +61 414 397 446
Attachment
Re: Segmentation fault in backend/access/heap/pruneheap.c: heap_page_prune_opt() calling PageIsPrunable () with NULL page on FreeBSD / PowerPC
From
Tom Lane
Date:
Nick Withers <nick@nickwithers.com> writes: > I've been experiencing segfaults of PostgreSQL for quite a quite now > (since July 2008, PostgreSQL 8.3.3, perhaps?) on a FreeBSD 7 PowerPC > (7400) system (not sure if anyone really cares about this particular > platform, but I'll try :-)): Hmm, is this query accessing a temporary table? > #0 0x0186a1ec in heap_page_prune_opt (relation=0x224c9f70, buffer=-1, > OldestXmin=28440) at pruneheap.c:79 > 79 if (!PageIsPrunable(dp, OldestXmin)) > [New Thread 0x22401100 (LWP 100093)] > (gdb) bt full > #0 0x0186a1ec in heap_page_prune_opt (relation=0x224c9f70, buffer=-1, > OldestXmin=28440) at pruneheap.c:79 > dp = 0x0 > minfree = 4294967295 > #1 0x0185f790 in heapgetpage (scan=0x22970d88, page=0) at heapam.c:216 > buffer = -1 > snapshot = 0x224af818 > dp = 0x20100 <Address 0x20100 out of bounds> > lines = 184 > ntup = 28449656 > lineoff = 0 > lpp = 0x10003 Buffer -1 should refer to the first local buffer (ie, the first one used for temp tables). The value of dp seems to indicate that LocalBufferBlockPointers[0] was zero, but that should certainly not be possible if ReadBuffer returned the buffer as valid. So I'm confused. Can you extract a self-contained test case to reproduce this? regards, tom lane
Re: Segmentation fault in backend/access/heap/pruneheap.c: heap_page_prune_opt() calling PageIsPrunable () with NULL page on FreeBSD / PowerPC
From
Nick Withers
Date:
On Thu, 2009-01-22 at 23:42 -0500, Tom Lane wrote: > Nick Withers <nick@nickwithers.com> writes: > > I've been experiencing segfaults of PostgreSQL for quite a quite now > > (since July 2008, PostgreSQL 8.3.3, perhaps?) on a FreeBSD 7 PowerPC > > (7400) system (not sure if anyone really cares about this particular > > platform, but I'll try :-)): > > Hmm, is this query accessing a temporary table? Couldn't tell you, I'm afraid... I'm not very knowledgeable here. > > #0 0x0186a1ec in heap_page_prune_opt (relation=0x224c9f70, buffer=-1, > > OldestXmin=28440) at pruneheap.c:79 > > 79 if (!PageIsPrunable(dp, OldestXmin)) > > [New Thread 0x22401100 (LWP 100093)] > > (gdb) bt full > > #0 0x0186a1ec in heap_page_prune_opt (relation=0x224c9f70, buffer=-1, > > OldestXmin=28440) at pruneheap.c:79 > > dp = 0x0 > > minfree = 4294967295 > > #1 0x0185f790 in heapgetpage (scan=0x22970d88, page=0) at heapam.c:216 > > buffer = -1 > > snapshot = 0x224af818 > > dp = 0x20100 <Address 0x20100 out of bounds> > > lines = 184 > > ntup = 28449656 > > lineoff = 0 > > lpp = 0x10003 > > Buffer -1 should refer to the first local buffer (ie, the first one used > for temp tables). The value of dp seems to indicate that > LocalBufferBlockPointers[0] was zero, but that should certainly not be > possible if ReadBuffer returned the buffer as valid. So I'm confused. > Can you extract a self-contained test case to reproduce this? Don't know that I can (again, don't really know what I'm doing), but if that's what's needed I'll have to have a crack...? The segfault doesn't get triggered reliably on every backup or anything like that. Sometimes there'll be no problem making relatively large updates to the database, sometimes it'll fall over on small jobs. I'm not aware of any pattern. Would've been good if I reported this earlier, really, as I'm not too sure when the problem started (it was fine for at least a year before this). It could've been between 8.3.1 and 8.3.3, or even after a shift from 8.1.10 (-ish) to 8.3.3. I can't remember what I was running when it wasn't an issue... Sorry! This is now on 8.3.5, by the way (of all the things to forget to mention!). > regards, tom lane Cheers Tom -- Nick Withers email: nick@nickwithers.com Web: http://www.nickwithers.com Mobile: +61 414 397 446