Re: signal 11 on AIX: 7.4.2 - Mailing list pgsql-hackers

From Andrew Sullivan
Subject Re: signal 11 on AIX: 7.4.2
Msg-id 20040617191835.GB16886@phlogiston.dyndns.org
In response to Re: signal 11 on AIX: 7.4.2  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: signal 11 on AIX: 7.4.2  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
On Thu, Jun 17, 2004 at 01:12:10PM -0400, Bruce Momjian wrote:
> Well, the bad news is that this backtrace isn't very useful. 

No kidding.  It's pretty frustrating.

> My only guess is that getaddrinfo in your libc has a bug somehow that is
> corrupting the stack (hence the improper backtrace), then crashing.

It could be libc on AIX, I suppose, but it strikes me as sort of odd
that nobody else ever sees this.  Unless nobody else is using AIX
5.1, which is of course possible.

One hypothesis is that this is happening at startup time (however,
this core dump showed up in the init directory rather than in the
data/ area, which makes that theory a little suspect).

> As to the cause, I assume this is not reproducible, right?  Is there

Well, it's reproduced itself a few times, but it isn't reproducible at
will, and we have no clue what is causing it.

> something unusual about your DNS setup or something that might have
> changed recently that caused getaddrinfo() to do something new?

Nothing has changed recently, but we started seeing this not long
after promoting an RS/6000 running AIX 5.1 to production.  Before that we
were all-Solaris.  We have never managed to tickle this on a test
machine.  It's pretty tough to guess what might be going on, at least
for me.  If there are any AIX gurus around, I'd sure like to talk to
them.  (I do have a budget to pay such gurus, BTW!)

> Of course, the memmove() might be causing the problem and the
> getaddrinfo is a corrupt part of the backtrace too.

Yeah, which is why it's so frustrating.  If I could see what it was
doing when it did it, I'd be able to tell.  But without knowing why
it's happening, I can't exactly sit up for 6 weeks waiting for it to
happen.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.     --Dennis Ritchie

