Thread: Segmentation fault in backend/access/heap/pruneheap.c: heap_page_prune_opt() calling PageIsPrunable () with NULL page on FreeBSD / PowerPC

Hi y'all,

I've been experiencing segfaults of PostgreSQL for quite a quite now
(since July 2008, PostgreSQL 8.3.3, perhaps?) on a FreeBSD 7 PowerPC
(7400) system (not sure if anyone really cares about this particular
platform, but I'll try :-)):
____

internal# gdb postgres postgres.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
details.
This GDB was configured as "powerpc-marcel-freebsd"...
Core was generated by `postgres'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/local/lib/libintl.so.8...done.
Loaded symbols for /usr/local/lib/libintl.so.8
Reading symbols from /usr/local/lib/libxml2.so.5...done.
Loaded symbols for /usr/local/lib/libxml2.so.5
Reading symbols from /usr/lib/libssl.so.5...done.
Loaded symbols for /usr/lib/libssl.so.5
Reading symbols from /lib/libcrypto.so.5...done.
Loaded symbols for /lib/libcrypto.so.5
Reading symbols from /lib/libcrypt.so.4...done.
Loaded symbols for /lib/libcrypt.so.4
Reading symbols from /lib/libm.so.5...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libthr.so.3...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /usr/local/lib/libiconv.so.3...done.
Loaded symbols for /usr/local/lib/libiconv.so.3
Reading symbols from /lib/libz.so.4...done.
Loaded symbols for /lib/libz.so.4
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0186a1ec in heap_page_prune_opt (relation=0x224c9f70, buffer=-1,
    OldestXmin=28440) at pruneheap.c:79
79 if (!PageIsPrunable(dp, OldestXmin))
[New Thread 0x22401100 (LWP 100093)]
(gdb) bt full
#0  0x0186a1ec in heap_page_prune_opt (relation=0x224c9f70, buffer=-1,
    OldestXmin=28440) at pruneheap.c:79
dp = 0x0
minfree = 4294967295
#1  0x0185f790 in heapgetpage (scan=0x22970d88, page=0) at heapam.c:216
buffer = -1
snapshot = 0x224af818
dp = 0x20100 <Address 0x20100 out of bounds>
lines = 184
ntup = 28449656
lineoff = 0
lpp = 0x10003
#2  0x01860848 in heapgettup_pagemode (scan=0x22970d88,
    dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:583
tuple = 0x22970dac
backward = 0 '\0'
page = 0
finished = 0 '\0'
dp = 0x3f <Address 0x3f out of bounds>
lines = 28315260
lineindex = 0
lineoff = 4
linesleft = 0
lpp = 0x19
#3  0x01861fa8 in heap_getnext (scan=0x22970d88,
    direction=ForwardScanDirection) at heapam.c:1269
No locals.
#4  0x019b39e8 in SeqNext (node=0x22970c50) at nodeSeqscan.c:92
tuple = 0x7fffd1c0
scandesc = 0x22970d88
scanrelid = 4
estate = (EState *) 0x224f3018
direction = ForwardScanDirection
slot = (TupleTableSlot *) 0x224f34b0
#5  0x019a07b8 in ExecScan (node=0x22970c50, accessMtd=0x19b38bc
<SeqNext>)
    at execScan.c:103
slot = (TupleTableSlot *) 0x168
econtext = (ExprContext *) 0x22970cd8
qual = (List *) 0x0
projInfo = (ProjectionInfo *) 0x229712c8
isDone = 580330028
resultSlot = (TupleTableSlot *) 0x7fffd1f0
#6  0x019b3a74 in ExecSeqScan (node=0x22970c50) at nodeSeqscan.c:130
No locals.
#7  0x01995530 in ExecProcNode (node=0x22970c50) at execProcnode.c:334
result = (TupleTableSlot *) 0x4e6958
__func__ = "ExecProcNode"
#8  0x019b46e4 in ExecSort (node=0x22970bc8) at nodeSort.c:102
plannode = (Sort *) 0x224ec828
outerNode = (PlanState *) 0x22970c50
tupDesc = 0x229711a0
estate = (EState *) 0x224f3018
dir = ForwardScanDirection
tuplesortstate = (Tuplesortstate *) 0x22972018
slot = (TupleTableSlot *) 0x224f4f70
#9  0x01995638 in ExecProcNode (node=0x22970bc8) at execProcnode.c:386
result = (TupleTableSlot *) 0x224d18d8
__func__ = "ExecProcNode"
#10 0x019b4c3c in ExecUnique (node=0x22970b40) at nodeUnique.c:69
plannode = (Unique *) 0x224ecd58
resultTupleSlot = (TupleTableSlot *) 0x224f33b0
slot = (TupleTableSlot *) 0xffd310
outerPlan = (PlanState *) 0x22970bc8
#11 0x01995680 in ExecProcNode (node=0x22970b40) at execProcnode.c:398
result = (TupleTableSlot *) 0x224f4f70
__func__ = "ExecProcNode"
#12 0x019b8db0 in SubqueryNext (node=0x224f4f70) at
nodeSubqueryscan.c:58
slot = (TupleTableSlot *) 0x224f4f70
#13 0x019a07b8 in ExecScan (node=0x224f4f70,
    accessMtd=0x19b8d88 <SubqueryNext>) at execScan.c:103
slot = (TupleTableSlot *) 0x224e6a70
econtext = (ExprContext *) 0x22970080
qual = (List *) 0x22970b28
projInfo = (ProjectionInfo *) 0x22971710
isDone = 575586072
resultSlot = (TupleTableSlot *) 0x19
#14 0x019b8e14 in ExecSubqueryScan (node=0x224f4f70) at
nodeSubqueryscan.c:85
No locals.
#15 0x01995590 in ExecProcNode (node=0x224f4f70) at execProcnode.c:352
result = (TupleTableSlot *) 0x224f4f70
__func__ = "ExecProcNode"
#16 0x01992868 in ExecutePlan (estate=0x224f3018, planstate=0x224f4f70,
    operation=CMD_INSERT, numberTuples=0,
direction=ForwardScanDirection,
    dest=0x224ed840) at execMain.c:1335
junkfilter = (JunkFilter *) 0x2000
planSlot = (TupleTableSlot *) 0xb
slot = (TupleTableSlot *) 0x224eb8b8
tupleid = 0x0
tuple_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 128}, ip_posid = 0}
current_tuple_count = 0
result = (TupleTableSlot *) 0x0
__func__ = "ExecutePlan"
#17 0x01990a14 in ExecutorRun (queryDesc=0x224af840,
    direction=ForwardScanDirection, count=0) at execMain.c:270
estate = (EState *) 0x224f3018
operation = CMD_INSERT
dest = (DestReceiver *) 0x224ed840
sendTuples = 0 '\0'
result = (TupleTableSlot *) 0x7fffd510
oldcontext = 0x224d17c8
#18 0x01aa17fc in ProcessQuery (plan=0x224ed7d0, params=0x0,
dest=0x224ed840,
    completionTag=0x7fffd81e "") at pquery.c:179
queryDesc = (QueryDesc *) 0x224af840
__func__ = "ProcessQuery"
#19 0x01aa3494 in PortalRunMulti (portal=0x224d7018, isTopLevel=1
'\001',
    dest=0x224ed840, altdest=0x224ed840, completionTag=0x7fffd81e "")
    at pquery.c:1242
pstmt = (PlannedStmt *) 0x224ed7d0
stmt = (Node *) 0x224ed7d0
stmtlist_item = (ListCell *) 0x224ed818
#20 0x01aa2910 in PortalRun (portal=0x224d7018, count=2147483647,
    isTopLevel=1 '\001', dest=0x224ed840, altdest=0x224ed840,
    completionTag=0x7fffd81e "") at pquery.c:813
save_exception_stack = (sigjmp_buf *) 0x7fffd920
save_context_stack = (ErrorContextCallback *) 0x0
local_sigjmp_buf = {{_sjb = {0, 2147473696, 0, 2147472896, 574843656,
      567325576, 2147472832, 27928436, 671088770, 0 <repeats 16 times>,
3,
      2147474872, 2147472832, 0, 0, 0, 0, 0, 0, 3, 2147474872,
2147472864, 3,
      575447184, 2147473040, 2147473088, 3, 4, 2147473056, 4, 575477696,
0,
      574816384, 2147473104, 1, 574816384, 2147473088, 2147473136,
27060864,
      3, 2147473104, 2147473152, 3, 0, 575180896, 4, 575477832,
575180824,
      2147473136, 2147473232, 14, 574816384, 2147473152, 2147473264,
22472,
      575477800, 64, 3, 575592504, 575582208, 574843656, 0, 14,
574843656, 40,
      575592512, 575477928, 575180824, 29967036, 0, 3, 2147473312,
2147473232,
      2147473264, 29295016, 574843656, 40, 0, 3, 575477840, 2147473264,
      2147473312, 25429760, 575592512, 29967036, 2147473344}}}
result = 127 '\177'
saveTopTransactionResourceOwner = 0x224d1488
saveTopTransactionContext = 0x22436b90
saveActivePortal = 0x0
saveActiveSnapshot = 0x0
saveResourceOwner = 0x224d1488
savePortalContext = 0x0
saveMemoryContext = 0x22436b90
__func__ = "PortalRun"
#21 0x01a9bae4 in exec_simple_query (
    query_string=0x22493018 "INSERT INTO Path (Path) SELECT a.Path FROM
(SELECT DISTINCT Path FROM batch) AS a WHERE NOT EXISTS (SELECT Path
FROM Path WHERE Path = a.Path) ") at postgres.c:986
parsetree = (Node *) 0x22494090
portal = 0x224d7018
commandTag = 0x1c9618c "INSERT"
completionTag = "\000 \177��P\001\234��\"I0\030\000\000\000\217\"I0\030
\001\234�X\000\000\000\000\000\000\000T\177��P\000\000\000\003A�\v&
\177��P\177��\200\001\234��\000\000\000\217\"I"
querytree_list = (List *) 0x224e5c28
plantree_list = (List *) 0x224ed828
receiver = (DestReceiver *) 0x224ed840
format = 0
dest = DestRemote
oldcontext = 0x22436b90
parsetree_list = (List *) 0x224940c8
parsetree_item = (ListCell *) 0x224940b8
save_log_statement_stats = 0 '\0'
was_logged = 0 '\0'
isTopLevel = 1 '\001'
msec_str = "\"I0\030\000\000\000\217\000\000\000\003\000\000\000\000
\177��\020\177��P\001��l\000\000\000Q"
__func__ = "exec_simple_query"
#22 0x01aa08b0 in PostgresMain (argc=4, argv=0x22433630,
    username=0x22433610 "bacula") at postgres.c:3572
query_string = 0x22493018 "INSERT INTO Path (Path) SELECT a.Path FROM
(SELECT DISTINCT Path FROM batch) AS a WHERE NOT EXISTS (SELECT Path
FROM Path WHERE Path = a.Path) "
flag = -1
dbname = 0x2241e128 "bacula"
userDoption = 0x0
secure = 0 '\0'
errs = 0
debug_flag = -1
guc_names = (List *) 0x0
guc_values = (List *) 0x0
ctx = PGC_USERSET
gucsource = PGC_S_CLIENT
am_superuser = 0 '\0'
firstchar = 81
stack_base = 0 '\0'
input_message = {
  data = 0x22493018 "INSERT INTO Path (Path) SELECT a.Path FROM (SELECT
DISTINCT Path FROM batch) AS a WHERE NOT EXISTS (SELECT Path FROM Path
WHERE Path = a.Path) ", len = 144, maxlen = 1024, cursor = 144}
local_sigjmp_buf = {{_sjb = {1, 2147151939, -1, -1, -1, 567325576,
      2147473536, 27919876, 1207959620, 0 <repeats 16 times>, 3,
2147474872,
      2147473536, 0, 29945672, 2147474872, 0, 2147474192, 572907720,
67108864,
      2147474200, 2147473872, 0, 574620516, 30567600, 23, 34144255,
573125272,
      4095, 2147473968, 574624000, 571861904, 939524226, 0, 571861780,
      572989924, 61490, 2147473936, 167772160, 3072, 0, 0, 0, 0, 0,
30567552,
      573164272, 30264272, 574619648, 574840832, 30567552, 573125272,
      574620296, 2147474016, 572458996, 0, 0, 2147474032, 0, 4,
574619648, 0,
      574840832, 0, 574620296, 2147474080, 572459680, 0, 2147474032,
      2147474096, 0, 4, 0, 0, 574824600, 574816384, 0, 0, 3, 574816384,
      574824608, 2147474112, 29289624, 12, 18, 0}}}
send_ready_for_query = 0 '\0'
__func__ = "PostgresMain"
#23 0x01a56748 in BackendRun (port=0x22407400) at postmaster.c:3207
av = (char **) 0x22433630
maxac = 10
ac = 4
secs = 285943416
usecs = 894679
protobuf = "-v196608\000@t\000\001�� \"C6 \"C6\020\000\000\000@\001��X"
i = 4
__func__ = "BackendRun"
#24 0x01a55988 in BackendStartup (port=0x22407400) at postmaster.c:2830
bn = (Backend *) 0x224051d0
pid = 0
__func__ = "BackendStartup"
#25 0x01a52994 in ServerLoop () at postmaster.c:1274
port = (Port *) 0x22407400
i = 2
rmask = {__fds_bits = {32, 0 <repeats 31 times>}}
selres = 1
readmask = {__fds_bits = {56, 0 <repeats 31 times>}}
nSockets = 6
now = 1232628203
last_touch_time = 1232626270
__func__ = "ServerLoop"
#26 0x01a52198 in PostmasterMain (argc=3, argv=0x7fffdda8) at
postmaster.c:1029
opt = -1
status = 0
userDoption = 0x7fffde6f "/usr/local/pgsql/data"
i = 64
__func__ = "PostmasterMain"
#27 0x019ce288 in main (argc=3, argv=0x7fffdda8) at main.c:188
No locals.
(gdb)
____

This is during a Bacula (http://www.bacula.org) backup run, by the way.

Should I try adding a null-pointer check in PageIsPrunable () or might
this be symptomatic of wider issues?

Could this be an indication of database corruption, or definitely a
run-time thing?

Any pointers (including "you idiot, there's a password in your
backtrace! :-)) appreciated!
--
Nick Withers
email: nick@nickwithers.com
Web: http://www.nickwithers.com
Mobile: +61 414 397 446

Attachment
Nick Withers <nick@nickwithers.com> writes:
> I've been experiencing segfaults of PostgreSQL for quite a quite now
> (since July 2008, PostgreSQL 8.3.3, perhaps?) on a FreeBSD 7 PowerPC
> (7400) system (not sure if anyone really cares about this particular
> platform, but I'll try :-)):

Hmm, is this query accessing a temporary table?

> #0  0x0186a1ec in heap_page_prune_opt (relation=0x224c9f70, buffer=-1,
>     OldestXmin=28440) at pruneheap.c:79
> 79 if (!PageIsPrunable(dp, OldestXmin))
> [New Thread 0x22401100 (LWP 100093)]
> (gdb) bt full
> #0  0x0186a1ec in heap_page_prune_opt (relation=0x224c9f70, buffer=-1,
>     OldestXmin=28440) at pruneheap.c:79
> dp = 0x0
> minfree = 4294967295
> #1  0x0185f790 in heapgetpage (scan=0x22970d88, page=0) at heapam.c:216
> buffer = -1
> snapshot = 0x224af818
> dp = 0x20100 <Address 0x20100 out of bounds>
> lines = 184
> ntup = 28449656
> lineoff = 0
> lpp = 0x10003

Buffer -1 should refer to the first local buffer (ie, the first one used
for temp tables).  The value of dp seems to indicate that
LocalBufferBlockPointers[0] was zero, but that should certainly not be
possible if ReadBuffer returned the buffer as valid.  So I'm confused.
Can you extract a self-contained test case to reproduce this?

            regards, tom lane

On Thu, 2009-01-22 at 23:42 -0500, Tom Lane wrote:
> Nick Withers <nick@nickwithers.com> writes:
> > I've been experiencing segfaults of PostgreSQL for quite a quite now
> > (since July 2008, PostgreSQL 8.3.3, perhaps?) on a FreeBSD 7 PowerPC
> > (7400) system (not sure if anyone really cares about this particular
> > platform, but I'll try :-)):
>
> Hmm, is this query accessing a temporary table?

Couldn't tell you, I'm afraid... I'm not very knowledgeable here.

> > #0  0x0186a1ec in heap_page_prune_opt (relation=0x224c9f70, buffer=-1,
> >     OldestXmin=28440) at pruneheap.c:79
> > 79 if (!PageIsPrunable(dp, OldestXmin))
> > [New Thread 0x22401100 (LWP 100093)]
> > (gdb) bt full
> > #0  0x0186a1ec in heap_page_prune_opt (relation=0x224c9f70, buffer=-1,
> >     OldestXmin=28440) at pruneheap.c:79
> > dp = 0x0
> > minfree = 4294967295
> > #1  0x0185f790 in heapgetpage (scan=0x22970d88, page=0) at heapam.c:216
> > buffer = -1
> > snapshot = 0x224af818
> > dp = 0x20100 <Address 0x20100 out of bounds>
> > lines = 184
> > ntup = 28449656
> > lineoff = 0
> > lpp = 0x10003
>
> Buffer -1 should refer to the first local buffer (ie, the first one used
> for temp tables).  The value of dp seems to indicate that
> LocalBufferBlockPointers[0] was zero, but that should certainly not be
> possible if ReadBuffer returned the buffer as valid.  So I'm confused.
> Can you extract a self-contained test case to reproduce this?

Don't know that I can (again, don't really know what I'm doing), but if
that's what's needed I'll have to have a crack...?

The segfault doesn't get triggered reliably on every backup or anything
like that. Sometimes there'll be no problem making relatively large
updates to the database, sometimes it'll fall over on small jobs. I'm
not aware of any pattern.

Would've been good if I reported this earlier, really, as I'm not too
sure when the problem started (it was fine for at least a year before
this). It could've been between 8.3.1 and 8.3.3, or even after a shift
from 8.1.10 (-ish) to 8.3.3. I can't remember what I was running when it
wasn't an issue... Sorry!

This is now on 8.3.5, by the way (of all the things to forget to
mention!).

>             regards, tom lane

Cheers Tom
--
Nick Withers
email: nick@nickwithers.com
Web: http://www.nickwithers.com
Mobile: +61 414 397 446

Attachment