Thread: Multiple Crashs on OSX Intel

Multiple Crashs on OSX Intel

From
Marc Simonin
Date:
Hello!

Here is the problem:


We have a database that works perfectly on an Xserve G5 running 10.4.8 (PPC),
but we encounter multiple postmaster and postgres crashes when we try it on an
Intel Mac.

We tried on a 10.4.8 Intel Xserve Xeon and on a 10.4.8 Intel MacBook Pro,
with Postgres 8.1.5 and 8.2.1, with the same issues.

The postmaster starts correctly, and no problem is logged.
But after a while, when the machine begins to receive many requests, I see
almost regular crashes of postgres or postmaster child processes.

The Postgres log is not much help because the crashes seem to occur "randomly",
on any kind of request, even during autovacuum.


The CrashLog shows one of these two codes for every crash:

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

I suspected a memory problem, so I tried running the server without any
special configuration in postgresql.conf, such as shared_buffers (except for
allowing external connections), but the problem remains.


I don't know why it works on PPC and not on Intel. Should I try to recompile
Postgres from source with special options?

Does anyone have an idea or a hint?
Thank you in advance!


Marc



PS: This occurred on these platforms:

CPU: MacBook Pro Core 2 Duo & Xserve Xeon (both Intel CPUs)
OS Version: 10.4.8 (Build 8N1051) & (Build 8N1215)
Postgres Version: 8.1.5 (MacPorts & Entropy ports), 8.2.1 (Entropy port)




Re: Multiple Crashs on OSX Intel

From
Tom Lane
Date:
Marc Simonin <m.simonin@allibert-trekking.com> writes:
> The CrashLog log for every crashes one of this two codes
> Exception:  EXC_BAD_ACCESS (0x0001)
> Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0
> Exception:  EXC_BAD_ACCESS (0x0001)
> Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

Always those same addresses?  If so I'd wonder about a corrupt-data
problem.  Can you get a stack trace from the core files?

            regards, tom lane

Re: Multiple Crashs on OSX Intel

From
Marc Simonin
Date:
That's a good point!
Let's see...
In fact, I found that across 69 crashes (sic!), there were only 8 different
addresses.
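
A tally like this can be produced mechanically from the CrashReporter log. The following is only a minimal sketch in Python: the function name is hypothetical, and the sample text reuses the two addresses quoted earlier in the thread, with one of them repeated purely to show the counting.

```python
import re
from collections import Counter

# Matches the "Codes:" line that CrashReporter writes for each crash.
FAULT_RE = re.compile(r"KERN_PROTECTION_FAILURE \(0x0002\) at (0x[0-9a-fA-F]+)")

def tally_fault_addresses(crashlog_text):
    """Count how often each faulting address appears in a crash log."""
    return Counter(m.group(1).lower() for m in FAULT_RE.finditer(crashlog_text))

# Illustrative sample only: the first entry is repeated to show the counting.
sample = """\
Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0
"""

counts = tally_fault_addresses(sample)
for address, n in counts.most_common():
    print(address, n)
```

A small number of distinct addresses across many crashes is what the question above was probing for; the count alone does not say whether the cause is data, software, or hardware.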


But if it's really a corrupt-data problem, could the same database
really work without errors on another platform (PPC G5)?

Note that the crashing database is a clean new install of Postgres, into
which I then loaded the data from another machine with pg_dumpall | psql. I
always do it this way because I have never had this kind of problem, but I'm
a newbie in the great PG world :-)


Best regards,
Marc Simonin



All the crashes look like this one: some process terminated by signal 10.
****************
Log file

CETLOG:  autovacuum: processing database "bozo"
CETLOG:  autovacuum process (PID 24305) was terminated by signal 10
CETLOG:  terminating any other active server processes
CETWARNING:  terminating connection because of crash of another server
process
CETDETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly
corrupted shared memory.
CETHINT:  In a moment you should be able to reconnect to the database and
repeat your command.
CETLOG:  all server processes terminated; reinitializing

***************
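
For reference, on Darwin/BSD systems signal 10 is SIGBUS (and 11 is SIGSEGV); other operating systems number signals differently. The sketch below pulls the crashed process, its PID and the signal out of postmaster log lines like the one above. The function name and the small signal table are assumptions made for illustration, not part of Postgres.

```python
import re

# Signal numbers as defined on Darwin/BSD; other systems number them differently.
DARWIN_SIGNALS = {10: "SIGBUS", 11: "SIGSEGV"}

# e.g. "autovacuum process (PID 24305) was terminated by signal 10"
TERMINATED_RE = re.compile(
    r"(?P<what>\w[\w ]*?) \(PID (?P<pid>\d+)\) was terminated by signal (?P<sig>\d+)"
)

def find_crashed_processes(log_text):
    """Return (process description, pid, signal name) for each crash report line."""
    results = []
    for m in TERMINATED_RE.finditer(log_text):
        sig = int(m.group("sig"))
        name = DARWIN_SIGNALS.get(sig, "signal %d" % sig)
        results.append((m.group("what"), int(m.group("pid")), name))
    return results

log = """\
CETLOG:  autovacuum: processing database "bozo"
CETLOG:  autovacuum process (PID 24305) was terminated by signal 10
CETLOG:  terminating any other active server processes
"""

procs = find_crashed_processes(log)
print(procs)
```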


In fact... I don't really know where to find this stack trace :-D
But I am including one of the crash logs here and have attached the entire
gzipped CrashLog file (if it gets through the mailing list!).


Hope it helps!

***************

Command: postmaster
Path:    /opt/local/lib/pgsql8/bin/postmaster
Parent:  postmaster [120]

Version: ??? (???)

PID:    18876
Thread: 0

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7bc

Thread 0 Crashed:
0   postmaster  0x002635f1 AllocSetAlloc + 1158
1   postmaster  0x0026407d MemoryContextAllocZero + 105
2   postmaster  0x0023b74b InitCatCache + 189
3   postmaster  0x00244110 InitCatalogCache + 281
4   postmaster  0x00256d65 InitPostgres + 710
5   postmaster  0x001ab757 PostgresMain + 4366
6   postmaster  0x00176ca7 BackendRun + 2173
7   postmaster  0x00176141 BackendStartup + 197
8   postmaster  0x00173a1b ServerLoop + 614
9   postmaster  0x001731be PostmasterMain + 4390
10  postmaster  0x0011c96a main + 660
11  postmaster  0x0000196a _start + 216
12  postmaster  0x00001891 start + 41

Thread 0 crashed with X86 Thread State (32-bit):
  eax: 0x00000000    ebx: 0x00263179 ecx: 0x0001dfe0 edx: 0x9003b7bc
  edi: 0x002efd90    esi: 0x0000000b ebp: 0xbfffedf8 esp: 0xbfffed90
   ss: 0x0000001f    efl: 0x00010206 eip: 0x002635f1  cs: 0x00000017
   ds: 0x0000001f     es: 0x0000001f  fs: 0x00000000  gs: 0x00000037

Binary Images Description:
    0x1000 -   0x2f0fff postmaster      /opt/local/lib/pgsql8/bin/postmaster
  0x387000 -   0x3b9fff libssl.0.9.8.dylib      /opt/local/lib/libssl.0.9.8.dylib
  0x3cc000 -   0x3ddfff libz.1.dylib    /opt/local/lib/libz.1.dylib
  0x505000 -   0x5f4fff libcrypto.0.9.8.dylib   /opt/local/lib/libcrypto.0.9.8.dylib
  0x65d000 -   0x67afff libreadline.5.1.dylib   /opt/local/lib/libreadline.5.1.dylib
0x8fe00000 - 0x8fe49fff dyld 46.1       /usr/lib/dyld
0x90000000 - 0x9016ffff libSystem.B.dylib       /usr/lib/libSystem.B.dylib
0x901bf000 - 0x901c1fff libmathCommon.A.dylib   /usr/lib/system/libmathCommon.A.dylib
0x90bcf000 - 0x90bd6fff libgcc_s.1.dylib        /usr/lib/libgcc_s.1.dylib
0x94960000 - 0x9497dfff libresolv.9.dylib       /usr/lib/libresolv.9.dylib
0x95a2e000 - 0x95a5cfff libncurses.5.4.dylib    /usr/lib/libncurses.5.4.dylib
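
The image list above can be used to see which mapped binary a given address belongs to. The following is a minimal sketch in Python with hypothetical helper names; it uses three of the ranges listed above together with the faulting address and the eip value from the same report. It only locates the addresses and is not a diagnosis.

```python
def parse_images(lines):
    """Parse '<start> - <end> <name> ...' lines from a Binary Images section."""
    images = []
    for line in lines:
        parts = line.split()
        # Expected shape: <start> - <end> <name> [version] <path>
        if len(parts) >= 4 and parts[1] == "-":
            images.append((int(parts[0], 16), int(parts[2], 16), parts[3]))
    return images

def image_for(address, images):
    """Return the name of the image whose range contains address, or None."""
    for start, end, name in images:
        if start <= address <= end:
            return name
    return None

images = parse_images([
    "    0x1000 -   0x2f0fff postmaster      /opt/local/lib/pgsql8/bin/postmaster",
    "0x8fe00000 - 0x8fe49fff dyld 46.1       /usr/lib/dyld",
    "0x90000000 - 0x9016ffff libSystem.B.dylib       /usr/lib/libSystem.B.dylib",
])

fault_image = image_for(0x9003b7bc, images)  # address from the "Codes:" line
eip_image = image_for(0x002635f1, images)    # eip from the thread state

print(fault_image)  # libSystem.B.dylib
print(eip_image)    # postmaster
```

So in this report the instruction that faulted is in the postmaster binary, while the address it tried to access lies inside the range where libSystem.B.dylib is mapped.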



Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Marc Simonin <m.simonin@allibert-trekking.com> writes:
>> The CrashLog log for every crashes one of this two codes
>> Exception:  EXC_BAD_ACCESS (0x0001)
>> Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x9003b7e0
>> Exception:  EXC_BAD_ACCESS (0x0001)
>> Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x00000000
>
> Always those same addresses?  If so I'd wonder about a corrupt-data
> problem.  Can you get a stack trace from the core files?
>
> regards, tom lane



Re: Multiple Crashs on OSX Intel

From
Tom Lane
Date:
Marc Simonin <m.simonin@allibert-trekking.com> writes:
> But I put here one of the Crash logs and put attached the gzipped entire
> Crashlog file (if it pass through the mailing list !).

Wow, those stack traces are all over the map, aren't they.  Either you
are hitting a dozen different Postgres bugs that no one else has ever
seen, or you've got a flaky machine.  I think the second is considerably
more likely --- especially since several of the crashes are in startup
code that every backend process ought to execute exactly the same way
every time.

Perhaps bad RAM, or a bad motherboard?  I've also seen machines go nuts
like this if the fan froze up, allowing the CPU to overheat.  Anyway,
take it back to Apple ... I hope it's still under warranty ...

            regards, tom lane
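
Whether a set of crash reports really is "all over the map" can be checked by counting which function sits at the top of each crashed thread's stack. This is only a sketch under stated assumptions: the function name is hypothetical, the regular expression assumes the frame layout shown in the report earlier in the thread, and the sample repeats that one excerpt twice purely to show the counting.

```python
import re
from collections import Counter

# Frame 0 line, e.g. "0   postmaster  0x002635f1 AllocSetAlloc + 1158"
TOP_FRAME_RE = re.compile(
    r"^0\s+\S+\s+0x[0-9a-fA-F]+\s+(\S+) \+ \d+", re.MULTILINE
)

def top_frames(crashlog_text):
    """Count the function at frame 0 of each crashed thread."""
    return Counter(TOP_FRAME_RE.findall(crashlog_text))

# Excerpt from the report posted in this thread.
excerpt = """\
Thread 0 Crashed:
0   postmaster  0x002635f1 AllocSetAlloc + 1158
1   postmaster  0x0026407d MemoryContextAllocZero + 105
10  postmaster  0x0011c96a main + 660
"""

frames = top_frames(excerpt + "\n" + excerpt)
for function, n in frames.most_common():
    print(function, n)
```

Many different top-level functions across the reports would match the inconsistency described above; a single function dominating would point more toward one specific code path.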

Re: Multiple Crashs on OSX Intel

From
Tom Lane
Date:
Fabrice Vincent <f.vincent@allibert-trekking.com> writes:
> Tom, it is very unlikely that the issue is located with the hardware as we
> tested on 2 brand new hardware and both exibit exactly the same symptoms
> despite they are differents models...

[ shrug... ]  It's not impossible that you've got two lemons ... stranger
things have happened.  One pretty obvious opportunity for a common-mode
failure is if you loaded them up with RAM chips from the same batch.

The symptoms shown in your crashreporter logs don't look anything like
a software problem to me: they're not consistent, and a lot of the
crashes are in code that is exercised exactly the same way on every
process start.  Also, we're not seeing reports of similar problems from
anyone else running PG on Intel Mac; which is definitely a nonempty
population --- there's one in the buildfarm for instance, and it's
showing zero failures in the back branches:
http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=jackal&br=REL8_1_STABLE

So I'm going to stick to my bet that it's a hardware problem.

            regards, tom lane

Re: Multiple Crashs on OSX Intel

From
Fabrice Vincent
Date:
Hi,

Marc is unavailable today, so I am taking over in order to move forward with
our crash issue.

Tom, it is very unlikely that the issue lies in the hardware, as we tested on
2 brand-new machines and both exhibit exactly the same symptoms even though
they are different models...

Would you have any other idea of where to look for the cause of these crashes?
For example, could the crashes be caused by some system library rather than
the Postgres code itself?
Also, is there any debugging option we could turn on on the faulty systems
in order to pinpoint where the bug is located?

Thanks a million for your help.

Best regards.
Fabrice


> From: Tom Lane <tgl@sss.pgh.pa.us>
> Date: Thu, 01 Feb 2007 12:29:01 -0500
> To: Marc Simonin <m.simonin@allibert-trekking.com>
> Cc: <pgsql-ports@postgresql.org>
> Subject: Re: [PORTS] Multiple Crashs on OSX Intel
>
> Marc Simonin <m.simonin@allibert-trekking.com> writes:
>> But I put here one of the Crash logs and put attached the gzipped entire
>> Crashlog file (if it pass through the mailing list !).
>
> Wow, those stack traces are all over the map, aren't they.  Either you
> are hitting a dozen different Postgres bugs that no one else has ever
> seen, or you've got a flaky machine.  I think the second is considerably
> more likely --- especially since several of the crashes are in startup
> code that every backend process ought to execute exactly the same way
> every time.
>
> Perhaps bad RAM, or a bad motherboard?  I've also seen machines go nuts
> like this if the fan froze up, allowing the CPU to overheat.  Anyway,
> take it back to Apple ... I hope it's still under warranty ...
>
> regards, tom lane