Thread: Multiple Crashs on OSX Intel
Hello!

Here is the problem: we have a database that works perfectly on an Xserve G5 (PPC) running 10.4.8, but we encounter multiple postmaster and postgres crashes when we try it on an Intel Mac. We tried a 10.4.8 Intel Xserve Xeon and a 10.4.8 Intel MacBook Pro, with Postgres 8.1.5 and 8.2.1, and see the same issues in every case.

The postmaster starts fine and no problem is logged. But after a while, when the machine begins to receive many requests, I see crashes at almost regular intervals in some of the postgres or postmaster processes. The Postgres log is not much help, because the crashes seem to occur "randomly", with any request, even during autovacuum.

The CrashLog shows one of these two codes for every crash:

Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0

Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

I suspected a memory problem, so I tried running the server without any special settings in postgresql.conf such as shared_buffers (except for allowing external connections), but the problem remains.

I don't know why it works on PPC and not on Intel. Should I try recompiling Postgres from source with special options? Does anyone have an idea or a hint?

Thank you in advance!
Marc

PS: This occurred on these platforms:
CPU: MacBook Pro Core 2 Duo and Xserve Xeon (both Intel CPUs)
OS version: 10.4.8 (Build 8N1051) and (Build 8N1215)
Postgres version: 8.1.5 (MacPorts and Entropy ports), 8.2.1 (Entropy port)
Marc Simonin <m.simonin@allibert-trekking.com> writes:
> The CrashLog shows one of these two codes for every crash:
> Exception: EXC_BAD_ACCESS (0x0001)
> Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0
> Exception: EXC_BAD_ACCESS (0x0001)
> Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

Always those same addresses? If so I'd wonder about a corrupt-data problem. Can you get a stack trace from the core files?

regards, tom lane
That's a good point! Let's see... In fact, I found that across 69 crashes (sic!) there were only 8 different addresses.

But if it really is a corrupt-data problem, could the same database really work without errors on another platform (PPC G5)? Note that the crashing database is a clean new Postgres install, into which I then loaded the database from another machine with pg_dumpall | psql. I always do it this way because I have never had this kind of problem, but I'm a newbie in the great PG world :-)

Best regards,
Marc Simonin

All the crashes look like this one: some process is terminated by signal 10.

**************** Log file
CETLOG: autovacuum: processing database "bozo"
CETLOG: autovacuum process (PID 24305) was terminated by signal 10
CETLOG: terminating any other active server processes
CETWARNING: terminating connection because of crash of another server process
CETDETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
CETHINT: In a moment you should be able to reconnect to the database and repeat your command.
CETLOG: all server processes terminated; reinitializing
***************

In fact... I don't really know where to find this stack trace :-D But I include one of the crash logs here and have attached the entire gzipped CrashLog file (if it makes it through the mailing list!). Hope it helps!

***************
Command: postmaster
Path: /opt/local/lib/pgsql8/bin/postmaster
Parent: postmaster [120]
Version: ??? (???)
PID: 18876
Thread: 0

Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7bc

Thread 0 Crashed:
0 postmaster 0x002635f1 AllocSetAlloc + 1158
1 postmaster 0x0026407d MemoryContextAllocZero + 105
2 postmaster 0x0023b74b InitCatCache + 189
3 postmaster 0x00244110 InitCatalogCache + 281
4 postmaster 0x00256d65 InitPostgres + 710
5 postmaster 0x001ab757 PostgresMain + 4366
6 postmaster 0x00176ca7 BackendRun + 2173
7 postmaster 0x00176141 BackendStartup + 197
8 postmaster 0x00173a1b ServerLoop + 614
9 postmaster 0x001731be PostmasterMain + 4390
10 postmaster 0x0011c96a main + 660
11 postmaster 0x0000196a _start + 216
12 postmaster 0x00001891 start + 41

Thread 0 crashed with X86 Thread State (32-bit):
eax: 0x00000000 ebx: 0x00263179 ecx: 0x0001dfe0 edx: 0x9003b7bc
edi: 0x002efd90 esi: 0x0000000b ebp: 0xbfffedf8 esp: 0xbfffed90
ss: 0x0000001f efl: 0x00010206 eip: 0x002635f1 cs: 0x00000017
ds: 0x0000001f es: 0x0000001f fs: 0x00000000 gs: 0x00000037

Binary Images Description:
0x1000 - 0x2f0fff postmaster /opt/local/lib/pgsql8/bin/postmaster
0x387000 - 0x3b9fff libssl.0.9.8.dylib /opt/local/lib/libssl.0.9.8.dylib
0x3cc000 - 0x3ddfff libz.1.dylib /opt/local/lib/libz.1.dylib
0x505000 - 0x5f4fff libcrypto.0.9.8.dylib /opt/local/lib/libcrypto.0.9.8.dylib
0x65d000 - 0x67afff libreadline.5.1.dylib /opt/local/lib/libreadline.5.1.dylib
0x8fe00000 - 0x8fe49fff dyld 46.1 /usr/lib/dyld
0x90000000 - 0x9016ffff libSystem.B.dylib /usr/lib/libSystem.B.dylib
0x901bf000 - 0x901c1fff libmathCommon.A.dylib /usr/lib/system/libmathCommon.A.dylib
0x90bcf000 - 0x90bd6fff libgcc_s.1.dylib /usr/lib/libgcc_s.1.dylib
0x94960000 - 0x9497dfff libresolv.9.dylib /usr/lib/libresolv.9.dylib
0x95a2e000 - 0x95a5cfff libncurses.5.4.dylib /usr/lib/libncurses.5.4.dylib

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marc Simonin <m.simonin@allibert-trekking.com> writes:
>> The CrashLog shows one of these two codes for every crash:
>> Exception: EXC_BAD_ACCESS (0x0001)
>> Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0
>> Exception: EXC_BAD_ACCESS (0x0001)
>> Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000000
>
> Always those same addresses? If so I'd wonder about a corrupt-data
> problem. Can you get a stack trace from the core files?
>
> regards, tom lane
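For anyone wanting to repeat the "69 crashes, 8 different addresses" tally on their own CrashReporter log, a small script along the following lines would do it. This is only a sketch: the function name `count_fault_addresses` is made up for illustration, and the regular expression assumes the "Codes: ... at 0x..." line format shown in the log above.

```python
import re
from collections import Counter

# Matches the fault address on lines such as
# "Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7bc".
# Assumption: every crash report carries one such "Codes:" line.
FAULT_RE = re.compile(r"KERN_\w+ \(0x[0-9a-fA-F]+\) at (0x[0-9a-fA-F]+)")


def count_fault_addresses(log_text):
    """Return a Counter mapping each faulting address to how often it occurs."""
    return Counter(m.group(1).lower() for m in FAULT_RE.finditer(log_text))


if __name__ == "__main__":
    # The two codes quoted in the first message of this thread.
    sample = (
        "Exception: EXC_BAD_ACCESS (0x0001)\n"
        "Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0\n"
        "Exception: EXC_BAD_ACCESS (0x0001)\n"
        "Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000000\n"
    )
    for address, occurrences in count_fault_addresses(sample).most_common():
        print(address, occurrences)
```

Feeding it the full text of the crash log file gives the number of distinct addresses as `len(count_fault_addresses(text))`.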
Marc Simonin <m.simonin@allibert-trekking.com> writes:
> But I include one of the crash logs here and have attached the entire gzipped
> CrashLog file (if it makes it through the mailing list!).

Wow, those stack traces are all over the map, aren't they. Either you are hitting a dozen different Postgres bugs that no one else has ever seen, or you've got a flaky machine. I think the second is considerably more likely --- especially since several of the crashes are in startup code that every backend process ought to execute exactly the same way every time.

Perhaps bad RAM, or a bad motherboard? I've also seen machines go nuts like this if the fan froze up, allowing the CPU to overheat. Anyway, take it back to Apple ... I hope it's still under warranty ...

regards, tom lane
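One rough way to put a number on "all over the map" is to count how many distinct crash sites (frame 0 of the crashed thread) appear across all reports. The sketch below assumes the "Thread N Crashed:" layout shown in the crash log earlier in the thread; `top_frames` and `summarize` are hypothetical helper names. A few crash sites repeated many times would point toward one reproducible bug, while many unrelated sites fit the flaky-machine reading given above. It is a heuristic, not a diagnosis.

```python
import re
from collections import Counter

# Frame 0 of the crashed thread, e.g.
# "Thread 0 Crashed: 0 postmaster 0x002635f1 AllocSetAlloc + 1158".
# \s* and \s+ allow the frame to sit on the same line or on the next one.
TOP_FRAME_RE = re.compile(
    r"Thread \d+ Crashed:\s*0\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\S+)"
)


def top_frames(log_text):
    """Count crashes per (binary image, function) taken from frame 0."""
    return Counter(m.groups() for m in TOP_FRAME_RE.finditer(log_text))


def summarize(log_text):
    """Return (total crash reports found, number of distinct crash sites)."""
    frames = top_frames(log_text)
    return sum(frames.values()), len(frames)


if __name__ == "__main__":
    # Single report taken from the crash log posted in this thread.
    sample = "Thread 0 Crashed: 0 postmaster 0x002635f1 AllocSetAlloc + 1158"
    for (image, function), occurrences in top_frames(sample).most_common():
        print(image, function, occurrences)
    print(summarize(sample))
```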
Fabrice Vincent <f.vincent@allibert-trekking.com> writes:
> Tom, it is very unlikely that the issue lies with the hardware: we
> tested on 2 brand-new machines and both exhibit exactly the same symptoms
> even though they are different models...

[ shrug... ] It's not impossible that you've got two lemons ... stranger things have happened. One pretty obvious opportunity for a common-mode failure is if you loaded them up with RAM chips from the same batch.

The symptoms shown in your crashreporter logs don't look anything like a software problem to me: they're not consistent, and a lot of the crashes are in code that is exercised exactly the same way on every process start.

Also, we're not seeing reports of similar problems from anyone else running PG on Intel Mac; which is definitely a nonempty population --- there's one in the buildfarm for instance, and it's showing zero failures in the back branches:
http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=jackal&br=REL8_1_STABLE

So I'm going to stick to my bet that it's a hardware problem.

regards, tom lane
Hi,

Marc is unavailable today, so I am taking over in order to move forward with our crash issue.

Tom, it is very unlikely that the issue lies with the hardware: we tested on 2 brand-new machines and both exhibit exactly the same symptoms even though they are different models...

Would you have any other idea of where to look for the cause of these crashes? For example, could the crashes be caused by some system library rather than the Postgres code itself?

Also, is there any debugging option we could turn on on the faulty systems in order to pinpoint where the bug is located?

Thanks a million for your help.

Best regards.
Fabrice

> From: Tom Lane <tgl@sss.pgh.pa.us>
> Date: Thu, 01 Feb 2007 12:29:01 -0500
> To: Marc Simonin <m.simonin@allibert-trekking.com>
> Cc: <pgsql-ports@postgresql.org>
> Subject: Re: [PORTS] Multiple Crashs on OSX Intel
>
> Marc Simonin <m.simonin@allibert-trekking.com> writes:
>> But I include one of the crash logs here and have attached the entire gzipped
>> CrashLog file (if it makes it through the mailing list!).
>
> Wow, those stack traces are all over the map, aren't they. Either you
> are hitting a dozen different Postgres bugs that no one else has ever
> seen, or you've got a flaky machine. I think the second is considerably
> more likely --- especially since several of the crashes are in startup
> code that every backend process ought to execute exactly the same way
> every time.
>
> Perhaps bad RAM, or a bad motherboard? I've also seen machines go nuts
> like this if the fan froze up, allowing the CPU to overheat. Anyway,
> take it back to Apple ... I hope it's still under warranty ...
>
> regards, tom lane
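On the system-library question, the "Binary Images Description" section of a crash log at least shows which loaded image a given address belongs to. The sketch below parses the address ranges and looks an address up; `parse_images` and `find_image` are illustrative names, and the sample lines are copied from the crash log posted earlier in the thread. Knowing which image contains an address does not by itself say which component is at fault.

```python
import re

# One entry of the "Binary Images Description" list, e.g.
# "0x90000000 - 0x9016ffff libSystem.B.dylib /usr/lib/libSystem.B.dylib"
IMAGE_RE = re.compile(r"(0x[0-9a-fA-F]+)\s+-\s+(0x[0-9a-fA-F]+)\s+(\S+)")

# Three entries copied from the crash log in this thread.
SAMPLE_IMAGES = """
0x1000 - 0x2f0fff postmaster /opt/local/lib/pgsql8/bin/postmaster
0x8fe00000 - 0x8fe49fff dyld 46.1 /usr/lib/dyld
0x90000000 - 0x9016ffff libSystem.B.dylib /usr/lib/libSystem.B.dylib
"""


def parse_images(log_text):
    """Return a list of (start, end, image name) tuples with integer bounds."""
    return [
        (int(start, 16), int(end, 16), name)
        for start, end, name in IMAGE_RE.findall(log_text)
    ]


def find_image(address, images):
    """Return the name of the image whose range contains address, or None."""
    for start, end, name in images:
        if start <= address <= end:
            return name
    return None


if __name__ == "__main__":
    images = parse_images(SAMPLE_IMAGES)
    print(find_image(0x9003B7BC, images))  # fault address -> libSystem.B.dylib
    print(find_image(0x002635F1, images))  # eip (AllocSetAlloc) -> postmaster
    print(find_image(0x00000000, images))  # null address -> None
```

For the report posted above, the faulting instruction (eip) is inside the postmaster binary, while the address it tried to access lies in the range mapped for libSystem.B.dylib.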