Thread: Core dump on 7.1.3 on Linux 2.2.19

Core dump on 7.1.3 on Linux 2.2.19

From: Barry Lind

On a production server I am getting periodic core dumps from postgres. 
The server can go for days or weeks fine without any problems, but does 
dump core every so often.

It happened to me this afternoon while running a 'vacuum analyze 
verbose'.  I have attached the stack trace below.  I looked at a core 
from the vacuum as well as another core file from a prior operation 
(which wasn't a vacuum) and they had the same stack.  So I don't think 
this is a vacuum problem.

Any ideas?  (I intend to rebuild to get some better info in the stack 
trace, but it may be a while before I get around to that).

thanks,
--Barry

[root@xythos1 26382]# gdb postgres core
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...

warning: core file may not match specified executable file.
Core was generated by `postgres: postgres files 127.0.0.1 SELECT        '.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /usr/lib/libreadline.so.4.1...done.
Loaded symbols for /usr/lib/libreadline.so.4.1
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/local/pgsql/lib/plpgsql.so...done.
Loaded symbols for /usr/local/pgsql/lib/plpgsql.so
#0  0x80b9693 in ExecEvalVar ()
(gdb) where
#0  0x80b9693 in ExecEvalVar ()
#1  0x80ba219 in ExecEvalExpr ()
#2  0x80b9c6b in ExecEvalFuncArgs ()
#3  0x80b9ce4 in ExecMakeFunctionResult ()
#4  0x80b9e81 in ExecEvalOper ()
#5  0x80ba289 in ExecEvalExpr ()
#6  0x80ba39a in ExecQual ()
#7  0x80bdca1 in IndexNext ()
#8  0x80ba83f in ExecScan ()
#9  0x80bde78 in ExecIndexScan ()
#10 0x80b8fc1 in ExecProcNode ()
#11 0x80bf6f4 in ExecNestLoop ()
#12 0x80b8ffd in ExecProcNode ()
#13 0x80b8c5e in EvalPlanQualNext ()
#14 0x80b8c35 in EvalPlanQual ()
#15 0x80b818a in ExecutePlan ()
#16 0x80b7738 in ExecutorRun ()
#17 0x80fc3af in ProcessQuery ()
#18 0x80faebe in pg_exec_query_string ()
#19 0x80fbea6 in PostgresMain ()
#20 0x80e6fc8 in DoBackend ()
#21 0x80e6bc7 in BackendStartup ()
#22 0x80e5e3d in ServerLoop ()
#23 0x80e5888 in PostmasterMain ()
#24 0x80c7107 in main ()
#25 0x400ecf31 in __libc_start_main (main=0x80c6fd4 <main>, argc=3,
    ubp_av=0xbffffa74, init=0x8065314 <_init>, fini=0x813e19c <_fini>,
    rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbffffa6c)
    at ../sysdeps/generic/libc-start.c:129


Re: Core dump on 7.1.3 on Linux 2.2.19

From: Tom Lane

Barry Lind <barry@xythos.com> writes:
> It happened to me this afternoon while running a 'vacuum analyze 
> verbose'.  I have attached the stack trace below.

That trace is certainly not from a vacuum operation.

I'd suggest rebuilding with --enable-debug; we won't be able to learn
much without that.  Until you do that, possibly it'd help to turn on
query logging so that we can learn what query is crashing.
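A rough sketch of what the rebuild and the logging change might look like on a 7.1.x source tree. The install prefix is taken from the library path in the gdb output above; the source directory, the data directory, and the `debug_print_query` setting name are assumptions that should be checked against the local installation and the 7.1 documentation before use. Nothing runs until one of the functions is called.

```shell
#!/bin/sh
# Sketch only: all three paths are assumptions, adjust to the local setup.
PGSRC=${PGSRC:-$HOME/postgresql-7.1.3}   # hypothetical source tree location
PGPREFIX=${PGPREFIX:-/usr/local/pgsql}   # prefix seen in the gdb output
PGDATA=${PGDATA:-$PGPREFIX/data}         # assumed data directory

# Rebuild with debugging symbols so gdb can report arguments and
# source line numbers instead of bare function names.
rebuild_with_debug() {
    cd "$PGSRC" &&
    ./configure --prefix="$PGPREFIX" --enable-debug &&
    make clean &&
    make &&
    make install
}

# Append a query-logging setting to postgresql.conf.
# The setting name is assumed from the 7.1 run-time configuration docs.
enable_query_logging() {
    echo "debug_print_query = true" >> "$PGDATA/postgresql.conf"
}
```

After `enable_query_logging`, the postmaster has to reread its configuration (a restart will do) before queries show up in the server log. With the debug build installed, the same `gdb postgres core` followed by `where` should then print arguments and line numbers for each frame.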

I find the presence of EvalPlanQual in the backtrace suggestive.
I don't trust that code at all ;-) ... but without a lot more info
we're not going to be able to figure out anything.

BTW, EvalPlanQual is only called if the query is an UPDATE or DELETE
that tries to update a row that's already been updated by a
not-yet-committed transaction.  That probably explains why you don't
see the crash often --- if you deliberately set up the right
circumstances, you could perhaps reproduce it on-demand.

			regards, tom lane
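The circumstances described above can be set up by hand with two psql sessions. The sketch below only writes out the SQL for each session; the table `epq_test` and its columns are hypothetical, and it assumes the default READ COMMITTED isolation level. It exercises the recheck path for a concurrently updated row, but whether that crashes depends on the actual query, which this thread does not show.

```shell
#!/bin/sh
# Sketch only: table and column names are hypothetical.
OUT=${OUT:-$(mktemp -d)}

# One-time setup: a small table with one row to fight over.
cat > "$OUT/setup.sql" <<'EOF'
CREATE TABLE epq_test (id integer PRIMARY KEY, n integer);
INSERT INTO epq_test VALUES (1, 0);
EOF

# Session A: update the row and keep the transaction open.
cat > "$OUT/session_a.sql" <<'EOF'
BEGIN;
UPDATE epq_test SET n = n + 1 WHERE id = 1;
-- leave this open, run session B in another psql, then type:
-- COMMIT;
EOF

# Session B: update the same row while A is still uncommitted.
cat > "$OUT/session_b.sql" <<'EOF'
-- blocks until session A commits, then has to re-evaluate the
-- WHERE clause against the newer row version
UPDATE epq_test SET n = n + 1 WHERE id = 1;
EOF

echo "SQL written to $OUT: run setup.sql once, then session_a.sql and session_b.sql in separate psql sessions"
```

The statements in session A are meant to be entered interactively so that the transaction stays open while session B issues its UPDATE. The backtrace shows a nested loop over an index scan, so the real failing statement presumably involves a join; substituting it for the simple UPDATE in session B would be the closer test.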


Re: Core dump on 7.1.3 on Linux 2.2.19

From: "Dmitry G. Mastrukov" (Дмитрий Геннадьевич Мастрюков)

On Tue, 2001-11-06 at 04:41, Barry Lind wrote:
> On a production server I am getting periodic core dumps from postgres. 
> The server can go for days or weeks fine without any problems, but does 
> dump core every so often.
> 
> It happened to me this afternoon while running a 'vacuum analyze 
> verbose'.  I have attached the stack trace below.  I looked at a core 
> from the vacuum as well as another core file from a prior operation 
> (which wasn't a vacuum) and they had the same stack.  So I don't think 
> this is a vacuum problem.
> 
> Any ideas?  (I intend to rebuild to get some better info in the stack 
> trace, but it may be a while before I get around to that).
> 
I experienced a problem with 'vacuum analyze' on postgres 7.1.2 with
glibc-2.2.2, and it turned out to be a bug in libc. Upgrading to
glibc-2.2.3 solved my problem.

Regards,
Dmitry