On Thu, Feb 12, 2015 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dave Johansen <davejohansen@gmail.com> writes:
> I'm running Postgres 8.4.20 on RHEL 6.4 and it will occasionally crash. The
> postgres.log file just says that a PID was terminated. The output from
> dmesg has a message like this one:
> postmaster[22905]: segfault at 686 ip 0000000000000686 sp 00007fff83d72e88
> error 14 in postgres[400000+463000]
> What can I do to try and figure out what is causing the crash and fix it?
(1) install relevant postgresql-debuginfo package (assuming we're talking about a Red Hat-originated postgres package)
(2) run postmaster under "ulimit -c unlimited" (easiest way is probably to add such a command to /etc/rc.d/init.d/postgresql and restart the service)
(3) wait for crash
(4) gdb the resulting corefile (should be under your $PGDATA directory)
(5) send in a stack trace.
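For anyone following along, steps (1) through (5) might look roughly like the sketch below. It is only a sketch: the data directory, the binary path, and the helper name backtrace_cmd are assumptions for a stock RHEL 6 install and are not taken from this thread, so adjust them to your system.

```shell
# Assumed defaults for a stock RHEL 6 install; override via the environment.
PGDATA="${PGDATA:-/var/lib/pgsql/data}"
POSTGRES_BIN="${POSTGRES_BIN:-/usr/bin/postgres}"

# Hypothetical helper: print the gdb command that dumps a backtrace
# from the core file given as the first argument.
backtrace_cmd() {
    printf 'gdb --batch -ex "bt" %s %s\n' "$POSTGRES_BIN" "$1"
}

# (1) As root, install debug symbols (debuginfo-install ships in yum-utils):
#       debuginfo-install postgresql
# (2) Add "ulimit -c unlimited" to /etc/rc.d/init.d/postgresql, then:
#       service postgresql restart
# (3) Wait for the crash.
# (4) Print a gdb command for every core file found under $PGDATA.
for core in "$PGDATA"/core*; do
    if [ -e "$core" ]; then
        backtrace_cmd "$core"
    fi
done
# (5) Run the printed command and post its output.
```

The loop only prints the gdb invocation instead of running it, so you can check the paths before touching the core file.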
Here's the stack trace from gdb (if it matters, the package version from RHEL is postgresql-8.4.18-1.el6_4.x86_64):
#0 0x0000000000000686 in ?? ()
#1 0x00007f76ae551801 in ?? ()
#2 0x00000000019f7793 in ?? ()
#3 0x00007fff06ad6be0 in ?? ()
#4 0x00007fff06ad6be0 in ?? ()
#5 0x0000000000545e35 in ExecMakeFunctionResult (fcache=0x19f5680, econtext=0x19f37e8, isNull=0x19f7793 "", isDone=0x19f7b8c) at execQual.c:1870
#6 0x0000000000541096 in ExecTargetList (projInfo=<value optimized out>, isDone=0x7fff06ad704c) at execQual.c:5212
#7 ExecProject (projInfo=<value optimized out>, isDone=0x7fff06ad704c) at execQual.c:5427
#8 0x0000000000553c5b in ExecResult (node=0x1999a68) at nodeResult.c:155
#9 0x00000000005406c8 in ExecProcNode (node=0x1999a68) at execProcnode.c:344
#10 0x000000000053e942 in ExecutePlan (queryDesc=0x1990c60, direction=<value optimized out>, count=0) at execMain.c:1542
#11 standard_ExecutorRun (queryDesc=0x1990c60, direction=<value optimized out>, count=0) at execMain.c:310
... (I can include the rest, if it's needed)
Any insight? Thanks, Dave
Looking at the stack trace, the crash appeared to be happening in one of our C functions. After some digging, I found that the permissions on the directory holding those functions had been set wide open, so whenever someone built our software, the build overwrote the .so files. Normally that is done only by the postgres user when a new "version" is rolled out, but the incorrect permissions let ordinary builds bypass that restriction.
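In case it helps anyone who hits the same thing, a quick way to spot this kind of problem is to look for group- or world-writable entries in the directory that holds the shared libraries. This is a minimal sketch, not what we actually ran: the LIBDIR default and the function name find_loose_perms are made up for illustration.

```shell
# Hypothetical location of the extension libraries; substitute your own.
LIBDIR="${LIBDIR:-/usr/lib64/pgsql}"

# Print the directory itself, its immediate subdirectories, and any .so
# files directly inside it that are writable by group or by others.
find_loose_perms() {
    find "$1" -maxdepth 1 \( -type d -o -name '*.so' \) \
        \( -perm -020 -o -perm -002 \) -print
}

# Only scan if the assumed directory actually exists on this machine.
if [ -d "$LIBDIR" ]; then
    find_loose_perms "$LIBDIR"
fi
```

Anything this prints can be replaced by a user other than the owner, which is how a routine build ended up swapping .so files out from under a running server here.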
So that brings up a different question that I will start a new thread for.