Thread: osprey dumped core on 8.2

osprey dumped core on 8.2

From: Alvaro Herrera
Osprey is a NetBSD machine running on m68k

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=osprey&dt=2007-02-22%2023:00:18

It dumped core running VACUUM:

--- 1,5 ----
  VACUUM;
! server closed the connection unexpectedly
!     This probably means the server terminated abnormally
!     before or while processing the request.
! connection to server was lost


The stack trace report looks incomplete:

================== stack trace: pgsql.27009/src/test/regress/tmp_check/data/postgres.core ==================
Core was generated by `postgres'.
Program terminated with signal 11, Segmentation fault.
#0  0x001f74d6 in AllocSetAlloc (context=0x307d10, size=16777212) at aset.c:546
546            if (set->blocks != NULL)

It's missing the "bt" part.


I don't understand how this can happen, given that "set" cannot be NULL
at this point.

-- 
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"You can choose the color of your car -- as long as it is black."
(Henry Ford)
 


Re: osprey dumped core on 8.2

From: Tom Lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Osprey is a NetBSD machine running on m68k

Yeah, it's been failing consistently on the 8.2 branch for a while, but
not either 8.1 or HEAD, which is awfully strange.

> Program terminated with signal 11, Segmentation fault.
> #0  0x001f74d6 in AllocSetAlloc (context=0x307d10, size=16777212) at aset.c:546
> 546            if (set->blocks != NULL)

> I don't understand how this can happen, given that "set" cannot be NULL
> at this point.

I talked to Remi about this last month, and we concluded that the core
dump is probably really at the line just prior, where it's trying to
stick a marker at the end of the used space:
        ((char *) AllocChunkGetPointer(chunk))[size] = 0x7E;
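For readers unfamiliar with the trick, the marker-byte idea in that line can
be sketched in isolation. This is only a minimal illustration, not the aset.c
code: DemoChunk, demo_alloc, demo_sentinel_ok and demo_free are hypothetical
names, and the layout is simplified to one malloc'd chunk per request. Only
the 0x7E marker value is taken from the line quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SENTINEL 0x7E   /* same marker value as in the quoted line */

/* Hypothetical, simplified chunk header; not PostgreSQL's real struct. */
typedef struct DemoChunk
{
    size_t chunk_size;      /* usable bytes following the header */
    size_t requested_size;  /* bytes the caller actually asked for */
} DemoChunk;

/* Allocate chunk_size usable bytes and mark the first unused byte. */
static void *
demo_alloc(size_t size, size_t chunk_size)
{
    DemoChunk *chunk;

    if (size > chunk_size)
        return NULL;
    chunk = (DemoChunk *) malloc(sizeof(DemoChunk) + chunk_size);
    if (chunk == NULL)
        return NULL;
    chunk->chunk_size = chunk_size;
    chunk->requested_size = size;
    /* The store below touches memory near the end of the chunk. */
    if (size < chunk_size)
        ((char *) (chunk + 1))[size] = SENTINEL;
    return chunk + 1;
}

/* Return 1 if the marker byte is intact (or there was no room for one). */
static int
demo_sentinel_ok(const void *ptr)
{
    const DemoChunk *chunk;

    assert(ptr != NULL);
    chunk = (const DemoChunk *) ptr - 1;
    if (chunk->requested_size >= chunk->chunk_size)
        return 1;
    return ((const char *) ptr)[chunk->requested_size] == SENTINEL;
}

static void
demo_free(void *ptr)
{
    free((DemoChunk *) ptr - 1);
}
```

The point of the sketch is that the marker store is a write to the far end
of the freshly malloc'd space, so it is the first place a too-short block
would show up.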

But neither of us could see how that could happen unless malloc is
outright broken.  Remi did some gdb'ing that seemed to indicate
that malloc had failed to provide a block as large as it claimed:

: Rémi Zara <remi_zara@mac.com> writes:
: > (gdb) info locals
: > block = 0x4395000
: > chunk = 0x4395010
: > priorfree = 0x5395020
: > chunk_size = 16777216
: > blksize = 70864912
: > (gdb) p *block
: > $5 = {aset = 0x306d10, next = 0x0, freeptr = 0x5395020 <Address 0x5395020 out of bounds>, endptr = 0x5395020 <Address 0x5395020 out of bounds>}
: 
: Well, that's pretty dang interesting.  If the end of the block is indeed
: out of bounds as gdb claims, that'd explain why it crashes right here
: (actually the crash would be induced by the preceding line of code,
: where it tries to store a marker byte).  But how can that be, unless
: malloc is completely broken?  And if it is, why's it only affecting the
: 8.2 branch?  I'm confused.

and it kinda tailed off there ...
        regards, tom lane
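
The numbers in the gdb output above can be cross-checked with a little
address arithmetic. The sketch below is only that, a sketch: BlockLayout and
analyze_block are hypothetical names, and it assumes (this is not stated in
the thread) that the block holds exactly one chunk and that the user data
starts right after the chunk header, so the chunk header size can be inferred
from the block span. Addresses are treated as 32-bit values, as on m68k.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: derived layout figures for one block. */
typedef struct BlockLayout
{
    uint32_t block_span;        /* endptr - block */
    uint32_t block_header;      /* chunk - block */
    uint32_t chunk_header;      /* inferred: span minus header minus chunk_size */
    uint32_t marker_addr;       /* where the marker byte would be stored */
    uint32_t bytes_before_end;  /* endptr - marker_addr */
} BlockLayout;

/*
 * Assumes a single-chunk block laid out as:
 *   [block header][chunk header][chunk_size bytes of data] <- endptr
 */
static BlockLayout
analyze_block(uint32_t block, uint32_t chunk, uint32_t endptr,
              uint32_t chunk_size, uint32_t size)
{
    BlockLayout r;

    assert(endptr >= chunk && chunk >= block);
    r.block_span = endptr - block;
    r.block_header = chunk - block;
    r.chunk_header = r.block_span - r.block_header - chunk_size;
    r.marker_addr = chunk + r.chunk_header + size;
    r.bytes_before_end = endptr - r.marker_addr;
    return r;
}
```

Calling analyze_block(0x4395000, 0x4395010, 0x5395020, 16777216, 16777212)
with the values from the trace gives a block span of 16777248 bytes, two
16-byte headers, and a marker address of 0x539501C, four bytes short of
endptr. Under the stated assumptions the marker store lands inside the block
the allocator believes it has, which fits the reading that the crash needs
the tail of that block to be unmapped.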