Re: Postgresql 8.4.1 segfault, backtrace - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Postgresql 8.4.1 segfault, backtrace
Msg-id 23820.1253805366@sss.pgh.pa.us
In response to Postgresql 8.4.1 segfault, backtrace  (Richard Neill <rn214@cam.ac.uk>)
List pgsql-bugs
Richard Neill <rn214@cam.ac.uk> writes:
> I've just upgraded from 8.4.0 to 8.4.1 because of a segfault in 8.4, and
> we've found that this is still happening repeatedly in 8.4.1.

Oh dear.  I just got an off-list report that seems to point to the same
kind of thing.

> The backtrace points to line 2654 in relcache.c, in
>    RelationCacheInitializePhase2()

> There is a NULL dereference of "relation"

>   => needNewCacheFile = false
>      criticalRelcachesBuilt = true

> => nothing is happening before it enters the failure code block.

<spock>Fascinating.</spock>

I think this must mean that corrupt data is being read from the relcache
init file.  The reason a restart fixes it is probably that restart
forcibly removes the old init file, which is good for recovery but not
so good for finding out what's wrong.  Could you modify
RelationCacheInitFileRemove (at the bottom of relcache.c) to rename the
file someplace else instead of deleting it?  And then send me a copy
of the bad file once you have one?
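Below is a minimal sketch of the rename-instead-of-delete idea. It is written in plain C using the standard `rename()` call rather than PostgreSQL's own file and error-reporting machinery. The helper name `preserve_instead_of_remove` and the `.bad.<pid>.<epoch>` suffix are hypothetical. This is not the actual body of RelationCacheInitFileRemove, which would still need the server's own path construction and elog/ereport handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: instead of unlink()ing a possibly corrupt cache
 * file, move it aside under a unique-ish name so it can be inspected later.
 *
 * On success, 'saved' holds the new name, or "" if there was no file to
 * preserve.  Returns 0 on success, -1 if the name would not fit in
 * 'saved' or rename() failed for a reason other than "file not found".
 */
int preserve_instead_of_remove(const char *path, char *saved, size_t savedlen)
{
    assert(path != NULL && saved != NULL);

    /* Build "<path>.bad.<pid>.<seconds since epoch>" (suffix is arbitrary). */
    int n = snprintf(saved, savedlen, "%s.bad.%ld.%ld",
                     path, (long) getpid(), (long) time(NULL));
    if (n < 0 || (size_t) n >= savedlen)
        return -1;              /* target name would be truncated */

    if (rename(path, saved) != 0) {
        if (errno == ENOENT) {
            saved[0] = '\0';    /* no init file existed: nothing to keep */
            return 0;
        }
        return -1;              /* some other error (permissions, EXDEV, ...) */
    }
    return 0;
}
```

On POSIX systems `rename()` within one filesystem is atomic, so the original name either still refers to the old file or no longer exists. If the suffix collides (same PID within the same second), the earlier copy would be overwritten; a real patch might append a counter to avoid that.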

> I can give you a core dump if anyone would like to see it, but it's 405
> MB after bzipping.

Not going to help anyone else anyway, since it's uninterpretable without
a duplicate system.  (If you have a spare machine with the same OS and
the same postgres executables, maybe you could put the core file on that
and let me ssh in to have a look?)

> One last observation: a dump and restore of the DB seems to prevent it
> crashing for about a day.

Do you have any maintenance operations that touch the system catalogs
(like maybe a forced REINDEX)?  Can you correlate the crashes with any
activity of that sort?

BTW, the other reporter claimed that the problem went away after
building with asserts+debug.  I'm not sure I believe that, especially
seeing that you evidently have debug on.  But if you don't have asserts
enabled, please rebuild with them and see if that changes anything.

            regards, tom lane
