Thread: Unpredictable signal 11 crashes on Mac OS X

Unpredictable signal 11 crashes on Mac OS X

From
Gareth Boden
Date:
Hi,

We've been having a lot of problems with unpredictable crashes with
7.3.4 on OS X Server (10.2.6 and 10.2.8 exhibit the same behaviour).
Having run with verbose logging for some time, we have not noticed any
real consistency in the types of queries which are causing the crash
recently. A few weeks ago we were noticing crashes most commonly when
processing queries with extremely large IN () lists (> 1000 entries)
but we have now removed most of these from the application and are
still seeing crashes. Some days we get none, some days they happen
every five minutes! Today we have connected to the backends with gdb to
get a bit more information and have seen the following problems:

Program received signal EXC_BAD_ACCESS, Could not access memory.
0x0012c284 in hash_search (hashp=0x0, keyPtr=0x1195b84,
action=3212837216, foundPtr=0x1195b80 "") at dynahash.c:512
512     {

Program received signal EXC_BAD_ACCESS, Could not access memory.
_copyRangeTblEntry (from=0x12836c0) at copyfuncs.c:1455
1455    {

Has anyone got any clues or suggestions? Looks like some kind of stack
corruption to me but I don't know what we can do to solve it. The
relevant background info is:

Hardware:
Apple Xserve (cluster node), 2 x 1GHz processors, 2GB RAM

Software:
Apple Computer, Inc. GCC version 1175, based on gcc version 3.1
20020420 (prerelease)
Darwin palin.egsgroup.com 6.8 Darwin Kernel Version 6.8: Wed Sep 10
15:20:55 PDT 2003; root:xnu/xnu-344.49.obj~2/RELEASE_PPC  Power
Macintosh powerpc
PostgreSQL 7.3.4 built with:
  ./configure --with-includes=/sw/include/ --with-libraries=/sw/lib \
      --prefix=/Volumes/Palin-RAID-Array-1/pg/postgresql-7.3.4
Fink 0.13.2 distribution version 0.5.3.cvs (for readline)

Non-default postgresql.conf settings:
max_connections = 60
shared_buffers = 200            # min max_connections*2 or 16, 8KB each
sort_mem = 1024         # min 64, size in KB

Thanks in advance
Gareth Boden

eGovernment Solutions Limited
Baird House, 15-17 St Cross Street, London EC1N 8UW
t: 020 7539 2815
f: 020 7539 2829
www.egsgroup.com

Re: Unpredictable signal 11 crashes on Mac OS X

From
Tom Lane
Date:
Gareth Boden <gareth.boden@egsgroup.com> writes:
> We've been having a lot of problems with unpredictable crashes with
> 7.3.4 on OS X Server (10.2.6 and 10.2.8 exhibit the same behaviour).

Have you looked into the possibility of bad hardware?  I routinely test
PG on OS X (10.2.6 currently) and have never noticed any instability.

            regards, tom lane

Re: Unpredictable signal 11 crashes on Mac OS X

From
Gareth Boden
Date:
On Wednesday, October 8, 2003, at 03:54 PM, Tom Lane wrote:

> Gareth Boden <gareth.boden@egsgroup.com> writes:
>> We've been having a lot of problems with unpredictable crashes with
>> 7.3.4 on OS X Server (10.2.6 and 10.2.8 exhibit the same behaviour).
>
> Have you looked into the possibility of bad hardware?  I routinely test
> PG on OS X (10.2.6 currently) and have never noticed any instability.
>

Thanks for that, Tom. We were heading down the same path with our
thoughts and are at this moment moving the database onto a backup
server to see if we have the same problems. I will let you know how we
get on.

On a separate note I can say that, with regard to crashing on very
large IN clauses, we have seen such behaviour exhibited on several
servers.  However that is no longer a problem for us since we have
removed the badly-written queries!

Regards
Gareth Boden

Re: Unpredictable signal 11 crashes on Mac OS X (SOLVED[?])

From
Gareth Boden
Date:
On Wednesday, October 8, 2003, at 03:54 PM, Tom Lane wrote:

> Gareth Boden <gareth.boden@egsgroup.com> writes:
>> We've been having a lot of problems with unpredictable crashes with
>> 7.3.4 on OS X Server (10.2.6 and 10.2.8 exhibit the same behaviour).
>
> Have you looked into the possibility of bad hardware?  I routinely test
> PG on OS X (10.2.6 currently) and have never noticed any instability.

This is probably now solved. Time will tell, since the crashes were
unpredictable to start with.

Bizarrely, it came down to an infinite recursion in a plpgsql function
we were using which exhausted memory and crashed the db server. Is
there any way of limiting stack depth/having some sort of monitor on
these functions to prevent this behaviour in future? Intuitively it
seems to me that dodgy plpgsql functions should not be able to crash
the database server (unlike dodgy C functions, say!).

For some reason the offending function was not reported in the logs on
Mac OS X; it showed up only when we compiled a new version on an x86
Linux box (with --enable-debug --enable-cassert, admittedly).

Thanks for listening!
Gareth
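
The failure described above, recursion that never reaches a base case,
can also be guarded against inside the recursive routine itself by
carrying an explicit depth counter. The following is only a minimal
sketch of that idea, written in C rather than plpgsql; the names
guarded_recurse and MAX_DEPTH and the limit of 100 are hypothetical and
are not taken from the thread or from PostgreSQL.

```c
#include <stdio.h>

#define MAX_DEPTH 100   /* hypothetical limit, chosen only for illustration */

/*
 * Recursive countdown that carries its own depth counter.
 * Returns 0 when the base case is reached, or -1 if the call chain
 * would exceed MAX_DEPTH levels.
 */
int guarded_recurse(int n, int depth)
{
    if (depth > MAX_DEPTH) {
        fprintf(stderr, "recursion limit (%d) exceeded\n", MAX_DEPTH);
        return -1;      /* fail cleanly instead of exhausting the stack */
    }
    if (n <= 0)
        return 0;       /* base case */
    return guarded_recurse(n - 1, depth + 1);
}
```

A caller starts the chain with depth 0, for example
guarded_recurse(10, 0), and treats a return value of -1 as an error to
report rather than a crash.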

Re: Unpredictable signal 11 crashes on Mac OS X (SOLVED[?])

From
Tom Lane
Date:
Gareth Boden <gareth.boden@egsgroup.com> writes:
> Bizarrely, it came down to an infinite recursion in a plpgsql function
> we were using which exhausted memory and crashed the db server. Is
> there any way of limiting stack depth/having some sort of monitor on
> these functions to prevent this behaviour in future?

There's been some talk of this, but the sticking point has been to
figure out a reasonably portable way to measure and enforce stack
usage...

            regards, tom lane
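
One way to approximate stack usage, sketched here under stated
assumptions, is to record the address of a local variable early in the
program and later compare it with the address of a local in the current
frame. This assumes a single contiguous stack and relies on converting
unrelated addresses to integers, which standard C leaves
implementation-defined; that is part of the portability concern raised
above. All names (set_stack_base, stack_depth_bytes, stack_is_too_deep,
depth_reached) and the 64 kB limit are hypothetical and are not
PostgreSQL's own code.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_STACK_BYTES (64 * 1024)   /* hypothetical limit for illustration */

/* Address of a local captured near the start of the program. */
static uintptr_t stack_base = 0;

/* Call once, as early as possible, before any deep call chain. */
void set_stack_base(void)
{
    char marker;
    stack_base = (uintptr_t) &marker;   /* keep only the integer value */
}

/* Approximate number of bytes between the reference point and here. */
size_t stack_depth_bytes(void)
{
    char here;
    uintptr_t now = (uintptr_t) &here;

    /* The stack may grow up or down, so take the absolute difference. */
    return (size_t) (now > stack_base ? now - stack_base : stack_base - now);
}

/* Nonzero if the estimate exceeds the limit. */
int stack_is_too_deep(void)
{
    return stack_depth_bytes() > MAX_STACK_BYTES;
}

/*
 * Demonstration: recursion with no base case of its own that stops
 * only because of the check. Returns the level at which it stopped.
 */
long depth_reached(long level)
{
    volatile char pad[256];     /* make each frame use visible stack space */
    long result;

    pad[0] = 0;
    if (stack_is_too_deep())
        return level;           /* report an error here in real code */
    result = depth_reached(level + 1);
    pad[1] = (char) (result & 1);   /* work after the call prevents tail-call elimination */
    return result;
}
```

The check is only an estimate: it costs one address comparison per call,
but it does not know the real size of the stack the operating system
granted, so the limit has to be set conservatively.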