Thread: Re: Why is "osprey" dumping core in REL8_2 branch?

Re: Why is "osprey" dumping core in REL8_2 branch?

From: Tom Lane

Date: 11 March 2007, 04:32:43

I wrote:
> Rémi Zara <remi_zara@mac.com> writes:
>> (gdb) info locals
>> block = 0x4395000
>> chunk = 0x4395010
>> priorfree = 0x5395020
>> chunk_size = 16777216
>> blksize = 70864912
>> (gdb) p *block
>> $5 = {aset = 0x306d10, next = 0x0, freeptr = 0x5395020 <Address 0x5395020
>> out of bounds>, endptr = 0x5395020 <Address 0x5395020 out of bounds>}

> Well, that's pretty dang interesting.  If the end of the block is indeed
> out of bounds as gdb claims, that'd explain why it crashes right here
> (actually the crash would be induced by the preceding line of code,
> where it tries to store a marker byte).  But how can that be, unless
> malloc is completely broken?  And if it is, why's it only affecting the
> 8.2 branch?  I'm confused.
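
As a sanity check on the numbers in the gdb output: the out-of-bounds address is internally consistent with the other locals, so the block bookkeeping itself looks sane and only the memory behind it is suspect. The sketch below redoes that arithmetic. The constants are copied from the gdb session; the helper names are made up for illustration, and the 32 bytes of overhead are inferred from the addresses alone, not read out of aset.c.

```c
#include <stdint.h>
#include <stdio.h>

/* Values copied from the gdb session quoted above. */
#define BLOCK_ADDR  UINT64_C(0x4395000)   /* block   */
#define CHUNK_ADDR  UINT64_C(0x4395010)   /* chunk   */
#define ENDPTR_ADDR UINT64_C(0x5395020)   /* endptr  */
#define CHUNK_SIZE  UINT64_C(16777216)    /* chunk_size (16 MB) */

/* Total bytes between the start of the block and its recorded end. */
uint64_t block_span(void)
{
    return ENDPTR_ADDR - BLOCK_ADDR;
}

/* Bytes in the block that are not payload (inferred header overhead). */
uint64_t header_overhead(void)
{
    return block_span() - CHUNK_SIZE;
}

/* Offset of the chunk header from the start of the block. */
uint64_t chunk_offset(void)
{
    return CHUNK_ADDR - BLOCK_ADDR;
}

/* Print the derived figures for a quick visual check. */
void print_block_arithmetic(void)
{
    printf("block span      = %llu bytes\n", (unsigned long long) block_span());
    printf("header overhead = %llu bytes\n", (unsigned long long) header_overhead());
    printf("chunk offset    = %llu bytes\n", (unsigned long long) chunk_offset());
}
```

Calling print_block_arithmetic() from a small main() shows that endptr is exactly chunk_size plus 32 bytes past the block start, which is what one would expect for a single-chunk block holding a 16 MB request.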

Whoa ... osprey just went green in the 8.2 branch, following what is
most surely an unrelated patch in vacuum.c.  Can anyone explain that?
I distrust gift horses ...

			regards, tom lane

Re: Why is "osprey" dumping core in REL8_2 branch?

From: Rémi Zara

Date: 11 March 2007, 06:59:59

Hi,

I know the answer :)

I tried to find the patch that caused the failure, and when doing so,
rechecking a build which had succeeded now failed. So this was an
environment problem.

The solution was to change the ulimit for data segment size. I hadn't
thought of that because I had originally enabled this conf because pg
would not otherwise BUILD...
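
For anyone wanting to check the same setting on their own build farm member, here is a minimal sketch that reads the data segment limit from inside a process, assuming a POSIX system that provides getrlimit() and RLIMIT_DATA. The function names read_data_limit and print_limit are hypothetical helpers, not anything from the PostgreSQL tree.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Fetch the soft and hard data segment limits of the current process.
 * Returns 0 on success, -1 if getrlimit() fails.
 */
int read_data_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_DATA, &rl) != 0)
        return -1;

    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Print one limit, spelling out the "no limit" case. */
void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu bytes\n", label, (unsigned long long) value);
}
```

A caller would invoke read_data_limit() and pass each value to print_limit(). The soft limit is the one that corresponds to what `ulimit -d` reports in the shell, although the shell usually shows it in kilobytes rather than bytes.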

Doesn't this mean that there is some place where the return value of
malloc is not checked for NULL?

Regards,

Rémi Zara


On 11 March 07 at 08:32, Tom Lane wrote:

> I wrote:
>> Rémi Zara <remi_zara@mac.com> writes:
>>> (gdb) info locals
>>> block = 0x4395000
>>> chunk = 0x4395010
>>> priorfree = 0x5395020
>>> chunk_size = 16777216
>>> blksize = 70864912
>>> (gdb) p *block
>>> $5 = {aset = 0x306d10, next = 0x0, freeptr = 0x5395020 <Address
>>> 0x5395020 out of bounds>, endptr = 0x5395020 <Address 0x5395020
>>> out of bounds>}
>
>> Well, that's pretty dang interesting.  If the end of the block is indeed
>> out of bounds as gdb claims, that'd explain why it crashes right here
>> (actually the crash would be induced by the preceding line of code,
>> where it tries to store a marker byte).  But how can that be, unless
>> malloc is completely broken?  And if it is, why's it only affecting the
>> 8.2 branch?  I'm confused.
>
> Whoa ... osprey just went green in the 8.2 branch, following what is
> most surely an unrelated patch in vacuum.c.  Can anyone explain that?
> I distrust gift horses ...
>
>             regards, tom lane

Re: Why is "osprey" dumping core in REL8_2 branch?

From: Tom Lane

Date: 11 March 2007, 23:53:27

Rémi Zara <remi_zara@mac.com> writes:
> The solution was to change the ulimit for data segment size.

Oh really ...

> Doesn't this mean that there is some place where the return value of
> malloc is not checked for null ?

You can see for yourself that the value *is* checked in the routine
that's at issue --- see line 520 in 8.2's aset.c.  Also the gdb'ing
you did showed that a nonzero value had been returned.
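
To make the distinction concrete, here is a minimal sketch of the general check-then-initialize pattern. It is not the code from aset.c; DemoBlock and demo_block_alloc are hypothetical names, loosely modelled on the freeptr and endptr fields gdb printed. The point is that a NULL check only catches an allocation that is refused outright. It cannot detect a non-NULL pointer to a region that is not fully usable, which is what the gdb output suggests happened here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical block header, for illustration only. */
typedef struct DemoBlock
{
    char *freeptr;   /* start of unused space in the block */
    char *endptr;    /* one past the last byte of the block */
} DemoBlock;

/*
 * Allocate a block able to hold payload_size bytes after the header.
 * Returns NULL if the size computation overflows or malloc() fails.
 */
DemoBlock *
demo_block_alloc(size_t payload_size)
{
    size_t     blksize;
    DemoBlock *block;

    if (payload_size > SIZE_MAX - sizeof(DemoBlock))
        return NULL;            /* size computation would overflow */
    blksize = sizeof(DemoBlock) + payload_size;

    block = malloc(blksize);
    if (block == NULL)
        return NULL;            /* allocation refused: report failure */

    /* Only reached when malloc() returned a non-NULL pointer. */
    block->freeptr = (char *) block + sizeof(DemoBlock);
    block->endptr = (char *) block + blksize;
    return block;
}
```

If malloc() hands back a non-NULL pointer whose tail lies outside the process's usable address space, the two assignments above still succeed, and the fault only shows up later when something writes near endptr.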

I think what you're looking at is a platform-specific bug in malloc().

			regards, tom lane