Re: 9.4 beta1 crash on Debian sid/i386 - Mailing list pgsql-hackers

From Christoph Berg
Subject Re: 9.4 beta1 crash on Debian sid/i386
Date
Msg-id 20140519144717.GG7296@msgid.df7cb.de
In response to Re: 9.4 beta1 crash on Debian sid/i386  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: 9.4 beta1 crash on Debian sid/i386  (Christoph Berg <cb@df7cb.de>)
List pgsql-hackers
Re: Andres Freund 2014-05-19 <20140519141221.GC5098@alap3.anarazel.de>
> On 2014-05-19 09:53:11 -0400, Tom Lane wrote:
> > I think throwing an error out of a SIGBUS handler is right out.  There
> > would be no way to know exactly what code we were interrupting.  It's
> > the same reason we don't let, eg, the SIGALRM handler throw a timeout
> > error directly (in most places anyway).

Right. I just mentioned that for completeness.

> Agreed. I think if we really, really feel the need to do something about
> this - which I don't - we could allocate a separate stack very early on
> and use that.

Hmm, that'd be an extension of the other idea, "write something deep
in the stack on startup". This is probably less evil, though I agree
it's a big hammer for solving something that should probably be fixed
elsewhere.
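
For illustration, the "separate stack" idea could look roughly like the
sketch below. It is not PostgreSQL code: ALT_STACK_SIZE, fault_handler and
install_alt_stack_handler are made-up names, and the 64 kB size is an
arbitrary assumption. The handler only reports and exits, in line with
Tom's point that raising an error from inside the handler is not an option.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical size and names, chosen for illustration only. */
#define ALT_STACK_SIZE (64 * 1024)

static void
fault_handler(int signo)
{
    static const char msg[] = "fatal: SIGSEGV/SIGBUS, possibly stack exhaustion\n";

    (void) signo;
    /* Async-signal-safe calls only: report and exit, never throw an error. */
    if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0)
    {
        /* nothing sensible left to do */
    }
    _exit(2);
}

/* Call once, very early in startup.  Returns 0 on success, -1 on failure. */
int
install_alt_stack_handler(void)
{
    stack_t     ss;
    struct sigaction sa;

    /* The alternate stack must be at least as large as the system minimum. */
    assert(ALT_STACK_SIZE >= MINSIGSTKSZ);

    memset(&ss, 0, sizeof(ss));
    ss.ss_sp = malloc(ALT_STACK_SIZE);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
    {
        free(ss.ss_sp);
        return -1;
    }

    /* SA_ONSTACK makes the handler run on the alternate stack. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fault_handler;
    sa.sa_flags = SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0 ||
        sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

A program would call install_alt_stack_handler() at the top of main(),
before anything that can recurse deeply. Without SA_ONSTACK the handler
would have to run on the very stack that just overflowed.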

> > >> * PostgreSQL allocates lots of heap using brk() instead of mmap()
> > 
> > > It doesn't really do that, btw. It's the libc's mmap that makes those
> > > decisions, not postgres.
> > 
> > It occurs to me that maybe this is a glibc bug, not a kernel bug?
> 
> You think malloc() should try to be careful when calling brk() and check
> beforehand whether it'll conflict with stack_base + RLIMIT_STACK? That's
> not a bad argument, but it still seems a really bad choice to leave that
> little space for the heap. Especially when it's dependent on -pie being
> used.

It's probably both: the default ASLR layout provides too little heap,
and malloc() then runs into the stack area. I'm not sure whether the
former is the kernel's fault or libc/ld.so's; they probably need to work
together on that anyway.
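
To see how much room a given layout leaves, something like the following
rough sketch could be compiled with and without -pie and the results
compared. The names layout_info, probe_layout and heap_room are made up
for this example. It assumes the brk heap lies below the stack, and it
uses the address of a local variable as an approximation of the stack
top.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper type: a snapshot of the current process layout. */
typedef struct
{
    uintptr_t   brk_addr;       /* current program break, from sbrk(0) */
    uintptr_t   stack_addr;     /* address of a local, near the stack top */
    uintptr_t   stack_limit;    /* soft RLIMIT_STACK, in bytes */
} layout_info;

/* Fills *out and returns 0, or returns -1 if the probes fail. */
int
probe_layout(layout_info *out)
{
    struct rlimit rl;
    int         on_stack = 0;
    void       *brk_now = sbrk(0);

    if (brk_now == (void *) -1 || getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    out->brk_addr = (uintptr_t) brk_now;
    out->stack_addr = (uintptr_t) &on_stack;
    /* RLIM_INFINITY becomes a huge value, so heap_room() reports 0. */
    out->stack_limit = (uintptr_t) rl.rlim_cur;
    return 0;
}

/*
 * Bytes the break could grow before entering the region the stack is
 * allowed to use, i.e. (stack_addr - stack_limit) - brk_addr, clamped
 * at zero.
 */
uintptr_t
heap_room(uintptr_t brk_addr, uintptr_t stack_addr, uintptr_t stack_limit)
{
    uintptr_t   gap;

    if (stack_addr <= brk_addr)
        return 0;               /* layout assumption does not hold */
    gap = stack_addr - brk_addr;
    return gap > stack_limit ? gap - stack_limit : 0;
}
```

Calling probe_layout() and passing its three fields to heap_room() gives
an estimate of how far brk() can grow before it reaches the area reserved
for the stack. A check of this kind before growing the break is what
Andres describes malloc() doing.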

Disabling -pie for all 32-bit architectures seems to be the way to go
for us for now.

Does this topic warrant being mentioned in the docs?

Christoph


