Re: PosgreSQL is crashing with a signal 11 - Bug? - Mailing list pgsql-bugs

From Rafael Martinez
Subject Re: PosgreSQL is crashing with a signal 11 - Bug?
Msg-id 1094592043.5232.38.camel@linux.site
In response to Re: PosgreSQL is crashing with a signal 11 - Bug?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: PosgreSQL is crashing with a signal 11 - Bug?
List pgsql-bugs
On Tue, 2004-09-07 at 19:58, Tom Lane wrote:

> Rafael Martinez Guerrero <r.m.guerrero@usit.uio.no> writes:
> > * Information from CORE dump we got with --enable-debug. We have
> > compiled a new version of postgres and run it through gdb with the core
> > dump we had/got from postgres without --enable-debug.
>
> Okay, theoretically that works, but it might be smarter to install the
> debug build and get a fresh core dump that definitely corresponds to it.
>

It is late in Norway and we need to sleep; we will try this tomorrow
morning.


> > #0  0xb734d07c in memcpy () from /lib/tls/libc.so.6
>
> > #1  0x0806bba8 in DataFill (data=0xb7489000 <Address 0xb7489000 out of
> > bounds>, tupleDesc=0x82fd554, value=0x82fd550, nulls=0xbfff7ec0 "  n  ",
> > infomask=0x836e904c, bit=0x836e904f "\003\f") at heaptuple.c:139
>
> If accurate, that says it's crashing here:
>
>             /* fixed-length pass-by-reference */
>             Assert(att[i]->attlen > 0);
>             data_length = att[i]->attlen;
> -->         memcpy(data, DatumGetPointer(value[i]), data_length);
>
> which suggests either that att[i]->attlen is corrupt, or that the
> computed length for the preceding column was wacko (leading to the
> data pointer being moved to a silly address), or that the provided
> value[i] is wrong.  In the context at hand none of these seem especially
> likely, but one of them must be the case.  Can you look with gdb to
> see what the value of i is, and print out the contents of the *(att[i])
> struct?  Also look at "data" and "value[i]" to see if they are sensible
> pointers or not.
>

I got this from one of our developers (from the core dump generated by
7.3.7 without --enable-debug):
--------------------------------------
(gdb) inspect i
$1 = 1

(gdb) inspect att[i]
$2 = 0x82fd6e8

(gdb) inspect *att[i]
$3 = {attrelid = 0, attname = {data = '\0' <repeats 63 times>,
alignmentDummy = 0}, atttypid = 1700, attstattarget = -1, attlen = -1,
attnum = 2, attndims = 0, attcacheoff = -1, atttypmod = 393220, attbyval
= 0 '\0', attstorage = 109 'm', attisset = 0 '\0', attalign = 105 'i',
attnotnull = 0 '\0', atthasdef = 0 '\0', attisdropped = 0 '\0',
attislocal = 1 '\001', attinhcount = 0}

(gdb) inspect data
$4 = 0xb7489000 <Address 0xb7489000 out of bounds>

(gdb) inspect value[i]
$5 = 3054556648


> How reproducible is the crash --- does it happen every time you execute
> this particular FETCH?
>

We are not sure about this. We did not log as much as we should have in
the beginning. One thing is certain: the last time, it happened after this
FETCH. We have full logging on now, so we will know more about this
if/when it crashes again.



>             regards, tom lane


Thanks for your help. I hope you/we will be able to figure this out;
right now this is a big crisis for us.

--
 Rafael Martinez, <r.m.guerrero@usit.uio.no>
 Center for Information Technology Services
 University of Oslo, Norway
