Re: strange buildfarm failures - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: strange buildfarm failures
Msg-id 20070429162552.GH18593@alvh.no-ip.org
In response to Re: strange buildfarm failures  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: strange buildfarm failures
List pgsql-hackers
Alvaro Herrera wrote:
> Stefan Kaltenbrunner wrote:
> 
> > well - i now have a core file but it does not seem to be much worth
> > except to prove that autovacuum seems to be the culprit:
> > 
> > Core was generated by `postgres: autovacuum worker process
> >                              '.
> > Program terminated with signal 6, Aborted.
> > 
> > [...]
> > 
> > #0  0x00000ed9 in ?? ()
> > warning: GDB can't find the start of the function at 0xed9.
> 
> Interesting.  Notice how it doesn't have the database name in the ps
> display.  This means it must have crashed between the initial
> init_ps_display and the set_ps_display call just before starting to
> vacuum.  So the bug is probably in the startup code; probably the code
> dealing with the PGPROC which is the newest and weirder stuff.

Oh, another possibility: the stack is restored by longjmp(), and the
process then tries to report an error elsewhere but crashes because
something got overwritten in the meantime; i.e. a bug in the error
recovery code.  I don't know how feasible this is, or even whether it
makes sense (would longjmp() restore the ps display?), but we had
similar, very hard-to-debug errors in Mammoth Replicator, which is why
I'm mentioning it in case it rings a bell.
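
On the parenthetical question, a minimal standalone sketch may help.  In
standard C, longjmp() returns to the execution context saved by setjmp();
it does not roll back the contents of static memory.  The names
fake_ps_display, fake_set_ps_display and title_survives_longjmp below are
hypothetical stand-ins invented for this illustration, not PostgreSQL's
actual ps display code, and the database name "mydb" is made up.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

/* Hypothetical stand-in for a process title buffer (static storage). */
static char fake_ps_display[64];

/* Saved execution context, playing the role of an error recovery point. */
static jmp_buf recovery_point;

/* Hypothetical helper: overwrite the fake title with a new string. */
static void
fake_set_ps_display(const char *activity)
{
    assert(strlen(activity) < sizeof(fake_ps_display));
    strcpy(fake_ps_display, activity);
}

/* Changes the title, then "fails" by jumping back to the recovery point. */
static void
do_work_then_fail(void)
{
    fake_set_ps_display("autovacuum worker process mydb");
    longjmp(recovery_point, 1);
}

/*
 * Returns 1 if the title written after setjmp() is still in place once
 * longjmp() has returned control to the recovery point.
 */
int
title_survives_longjmp(void)
{
    fake_set_ps_display("autovacuum worker process");

    if (setjmp(recovery_point) == 0)
    {
        do_work_then_fail();
        return -1;              /* not reached */
    }

    /* Back at the recovery point: static memory was not rolled back. */
    return strcmp(fake_ps_display, "autovacuum worker process mydb") == 0;
}
```

Calling title_survives_longjmp() returns 1: the later title is still in
the buffer after the jump.  This only shows the general C behaviour; it
says nothing about where the autovacuum worker actually crashed.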

-- 
Alvaro Herrera                          Developer, http://www.PostgreSQL.org/
"The only difference is that Saddam would kill you on private, where the
Americans will kill you in public" (Mohammad Saleh, 39, a building contractor)

