Thread: Unpredictable signal 11 crashes on Mac OS X
Hi,

We've been having a lot of problems with unpredictable crashes with 7.3.4 on OS X Server (10.2.6 and 10.2.8 exhibit the same behaviour). Having run with verbose logging for some time, we have not noticed any real consistency in the types of queries that have been causing the crashes recently. A few weeks ago we were noticing crashes most commonly when processing queries with extremely large IN () lists (> 1000 entries), but we have now removed most of these from the application and are still seeing crashes. Some days we get none, some days they happen every five minutes!

Today we have connected to the backends with gdb to get a bit more information and have seen the following problems:

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    0x0012c284 in hash_search (hashp=0x0, keyPtr=0x1195b84, action=3212837216, foundPtr=0x1195b80 "") at dynahash.c:512
    512     {

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    _copyRangeTblEntry (from=0x12836c0) at copyfuncs.c:1455
    1455    {

Has anyone got any clues or suggestions? It looks like some kind of stack corruption to me, but I don't know what we can do to solve it.

The relevant background info is:

Hardware:
    Apple Xserve (cluster node), 2 x 1GHz processors, 2GB RAM

Software:
    Apple Computer, Inc. GCC version 1175, based on gcc version 3.1 20020420 (prerelease)
    Darwin palin.egsgroup.com 6.8 Darwin Kernel Version 6.8: Wed Sep 10 15:20:55 PDT 2003; root:xnu/xnu-344.49.obj~2/RELEASE_PPC Power Macintosh powerpc
    PostgreSQL 7.3.4 built with:
        ./configure --with-includes=/sw/include/ --with-libraries=/sw/lib --prefix=/Volumes/Palin-RAID-Array-1/pg/postgresql-7.3.4
    Fink 0.13.2 distribution version 0.5.3.cvs (for readline)

Non-default postgresql.conf settings:
    max_connections = 60
    shared_buffers = 200    # min max_connections*2 or 16, 8KB each
    sort_mem = 1024         # min 64, size in KB

Thanks in advance

Gareth Boden
eGovernment Solutions Limited
Baird House, 15-17 St Cross Street, London EC1N 8UW
t: 020 7539 2815 f: 020 7539 2829
www.egsgroup.com
Gareth Boden <gareth.boden@egsgroup.com> writes:
> We've been having a lot of problems with unpredictable crashes with
> 7.3.4 on OS X Server (10.2.6 and 10.2.8 exhibit the same behaviour).

Have you looked into the possibility of bad hardware? I routinely test PG on OS X (10.2.6 currently) and have never noticed any instability.

regards, tom lane
On Wednesday, October 8, 2003, at 03:54 PM, Tom Lane wrote:
> Gareth Boden <gareth.boden@egsgroup.com> writes:
>> We've been having a lot of problems with unpredictable crashes with
>> 7.3.4 on OS X Server (10.2.6 and 10.2.8 exhibit the same behaviour).
>
> Have you looked into the possibility of bad hardware? I routinely test
> PG on OS X (10.2.6 currently) and have never noticed any instability.

Thanks for that, Tom. We were heading down the same path with our thoughts and are at this moment moving the database onto a backup server to see if we have the same problems. I will let you know how we get on.

On a separate note, I can say that we have seen crashes on very large IN clauses on several servers. However, that is no longer a problem for us since we have removed the badly written queries!

Regards
Gareth Boden
On Wednesday, October 8, 2003, at 03:54 PM, Tom Lane wrote:
> Gareth Boden <gareth.boden@egsgroup.com> writes:
>> We've been having a lot of problems with unpredictable crashes with
>> 7.3.4 on OS X Server (10.2.6 and 10.2.8 exhibit the same behaviour).
>
> Have you looked into the possibility of bad hardware? I routinely test
> PG on OS X (10.2.6 currently) and have never noticed any instability.

This is probably now solved. Time will tell, since the crashes were unpredictable to start with.

Bizarrely, it came down to an infinite recursion in a plpgsql function we were using, which exhausted memory and crashed the db server. Is there any way of limiting stack depth, or of having some sort of monitor on these functions, to prevent this behaviour in future? Intuitively it seems to me that dodgy plpgsql functions should not be able to crash the database server (unlike dodgy C functions, say!).

For some reason the offending function was not reported in the logs on Mac OS X; it showed up only when we compiled a new version on an x86 Linux box (with --enable-debug --enable-cassert, admittedly).

Thanks for listening!
Gareth
Gareth Boden <gareth.boden@egsgroup.com> writes:
> Bizarrely, it came down to an infinite recursion in a plpgsql function
> we were using which exhausted memory and crashed the db server. Is
> there any way of limiting stack depth/having some sort of monitor on
> these functions to prevent this behaviour in future?

There's been some talk of this, but the sticking point has been to figure out a reasonably portable way to measure and enforce stack usage...

regards, tom lane
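
To illustrate the kind of stack-usage check being discussed, here is a minimal sketch in C of one common approach: record the address of a local variable near the start of the program, then compare it with the address of a local in the current frame before recursing further. This is not code from the thread or from PostgreSQL 7.3. The names `stack_guard_init`, `stack_too_deep`, `guarded_recurse` and the 256 KB budget are all hypothetical. It also shows the portability problem mentioned above: the technique assumes a single contiguous stack and meaningful pointer-to-integer conversions, neither of which the C standard guarantees.

```c
#include <stddef.h>
#include <stdint.h>

/* All names and the budget below are hypothetical, for illustration only. */
static uintptr_t stack_base;                    /* address recorded at startup */
static size_t max_stack_bytes = 256 * 1024;     /* assumed stack budget */

/* Call once, as early as possible, to remember roughly where the stack starts. */
void stack_guard_init(void)
{
    volatile char marker;
    stack_base = (uintptr_t) &marker;
}

/* Return 1 if the current frame is further from the base than the budget allows. */
int stack_too_deep(void)
{
    volatile char marker;
    uintptr_t here = (uintptr_t) &marker;
    /* Take the absolute distance so the check works whichever way the stack grows. */
    size_t used = (here > stack_base) ? (size_t) (here - stack_base)
                                      : (size_t) (stack_base - here);
    return used > max_stack_bytes;
}

/*
 * Recurse `remaining` more levels.
 * Returns 0 if the recursion bottomed out normally,
 * or -1 if the guard stopped it before the stack budget was exceeded.
 */
int guarded_recurse(long remaining)
{
    volatile char pad[1024];        /* make each frame use a noticeable amount of stack */
    pad[0] = (char) remaining;

    if (stack_too_deep())
        return -1;                  /* bail out with an error instead of crashing */
    if (remaining == 0)
        return 0;

    int rc = guarded_recurse(remaining - 1);
    /* Reading pad after the call keeps the frame alive (no tail-call elimination). */
    return (pad[0] == (char) remaining) ? rc : -2;
}
```

After calling `stack_guard_init()` once, a shallow call such as `guarded_recurse(10)` completes normally, while a runaway call such as `guarded_recurse(1000000)` is cut off with an error return long before the process's real stack limit is reached. An interpreter for a procedural language could make the same check on each function entry and raise an ordinary error in place of the -1.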