Re: Autovacuum daemon terminated by signal 11 - Mailing list pgsql-general

From Justin Pasher
Subject Re: Autovacuum daemon terminated by signal 11
Date
Msg-id 4970FD5E.4000905@newmediagateway.com
In response to Re: Autovacuum daemon terminated by signal 11  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Autovacuum daemon terminated by signal 11  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Tom Lane wrote:
> I read it like this:
>
> #0  0x0827441d in MemoryContextAlloc ()        <-- real
> #1  0x08274467 in MemoryContextStrdup ()    <-- real
> #2  0x0826501c in database_getflatfilename ()    <-- real
> #3  0x0826504e in database_getflatfilename ()    <-- must be write_database_file
> #4  0x08265ec1 in AtEOXact_UpdateFlatFiles ()    <-- real
> #5  0x080a9111 in RecordTransactionCommit ()    <-- must be CommitTransaction
> #6  0x080a93a7 in CommitTransactionCommand ()    <-- real
> #7  0x081a6c3b in autovac_stopped ()        <-- must be process_whole_db
> #8  0x081a75cd in autovac_start ()        <-- real
> #9  0x081ae33c in ClosePostmasterPorts ()    <-- must be ServerLoop
> #10 0x081af058 in PostmasterMain ()
> #11 0x0816b3e2 in main ()
>
> although this requires one or two leaps of faith about single-call
> static functions getting inlined so that they don't produce a callstack
> entry (in particular that must have happened to AutoVacMain).  In any
> case, it's very hard to see how MemoryContextAlloc would dump core
> unless the method pointer of the context it was pointed to was
> clobbered.  So I'm pretty sure that's what happened, and now we must
> work backwards to how it happened.
>
> Justin, it's entirely possible that the only way we'll figure it out
> is for a developer to go poking at the entrails.  Are you in a position
> to give Alvaro or me ssh access to your test machine?
>
>             regards, tom lane

OK. Here's an update on this.

I was able to reduce the database cluster down to just one real database
(aside from template0/1 and postgres) and I was still getting the
segfault. I was even able to delete all the data from a lot of the
sensitive tables and still get the segfault. At least this means it's
easier for me to give access to the DB now if need be.

I recompiled from the Debian source package and added --enable-cassert
(--enable-debug was already there). I replaced the Debian standard
packages with the recompiled versions and started up the cluster. Now it
is hitting a failure on one of the assert lines, and the log message is
a little different.

2009-01-16 15:24:48 CST LOG:  transaction ID wrap limit is 1076038431, limited by database "template1"
TRAP: BadArgument("!(((context) != ((void *)0) && (((((Node*)((context)))->type) == T_AllocSetContext))))", File: "mcxt.c", Line: 502)
2009-01-16 15:24:52 CST LOG:  autovacuum process (PID 7066) was terminated by signal 6
2009-01-16 15:24:52 CST LOG:  terminating any other active server processes

A new backtrace from the core dump is below, although it looks almost
identical to me.

------------------------------
hostname:/var/lib/postgresql/8.1# gdb /usr/lib/postgresql/8.1/bin/postmaster mc-db2/core
GNU gdb 6.4.90-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".


warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libpam.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpam.so.0
Reading symbols from /usr/lib/i686/cmov/libssl.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/i686/cmov/libssl.so.0.9.8
Reading symbols from /usr/lib/i686/cmov/libcrypto.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/i686/cmov/libcrypto.so.0.9.8
Reading symbols from /usr/lib/libkrb5.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libkrb5.so.3
Reading symbols from /lib/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libcom_err.so.2
Reading symbols from /lib/tls/i686/cmov/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/tls/i686/cmov/libcrypt.so.1
Reading symbols from /lib/tls/i686/cmov/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/tls/i686/cmov/libdl.so.2
Reading symbols from /lib/tls/i686/cmov/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/tls/i686/cmov/libm.so.6
Reading symbols from /lib/tls/i686/cmov/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libk5crypto.so.3
Reading symbols from /lib/tls/i686/cmov/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/tls/i686/cmov/libresolv.so.2
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libkrb5support.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libkrb5support.so.0
(no debugging symbols found)
Core was generated by `postgres: autovacuum process mc_itec                                        '.
Program terminated with signal 6, Aborted.
#0  0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7c37811 in raise () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7c38fb9 in abort () from /lib/tls/i686/cmov/libc.so.6
#3  0x0828cdf3 in ExceptionalCondition ()
#4  0x082a8cd2 in MemoryContextAlloc ()
#5  0x082a8d67 in MemoryContextStrdup ()
#6  0x0829749c in database_getflatfilename ()
#7  0x082974ce in database_getflatfilename ()
#8  0x08298341 in AtEOXact_UpdateFlatFiles ()
#9  0x080bcc81 in RecordTransactionCommit ()
#10 0x080bcf8f in CommitTransactionCommand ()
#11 0x081cd1eb in autovac_stopped ()
#12 0x081cdbcd in autovac_start ()
#13 0x081d4c0c in ClosePostmasterPorts ()
#14 0x081d5968 in PostmasterMain ()
#15 0x0818bd22 in main ()



Justin Pasher
