Re: strange buildfarm failures - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: strange buildfarm failures
Date
Msg-id 20070502030116.GJ5867@alvh.no-ip.org
Whole thread Raw
In response to Re: strange buildfarm failures  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: strange buildfarm failures  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Oh, another thing that I think may be happening is that the stack is
> > restored in longjmp, so it is trying to report an error elsewhere but
> > it crashes because something got overwritten or something; i.e. a
> > bug in the error recovery code.
> 
> Hm, something trying to elog before the setjmp's been executed?
> Although I thought it was coded so that elog.c would just proc_exit
> if there was noplace to longjmp to.  A mistake here might explain
> the lack of any message in the postmaster log: if elog.c thinks it
> should longjmp then it doesn't print the message first.

Well, there seems to be plenty of code which is happy to elog(ERROR)
before the longjmp target block has been set; for example
InitFileAccess(), which is called on BaseInit(), which comes before
sigsetjmp() both on postgres.c and autovacuum.c.  (This particular case
is elog(FATAL) not ERROR however).  mdinit() also does some memory
allocation which could certainly fail.

I'm wondering if it wouldn't be more robust to define a longjmp target
block before calling BaseInit(), and have it exit cleanly in case of
failure (which is what you say elog.c should be doing if there is no
target block).

In errstart(), it is checked if PG_exception_stack is NULL.  Now, this
symbol is defined in elog.c and initialized to NULL, but I wonder if a
child process inherits the value that postmaster set, or it comes back
to NULL.  The backend would not inherit any of the values the postmaster
set if the latter were the case, so I'm assuming that PG_exception_stack
stays as the postmaster left it.  I wonder what happens if the child
process finds that this is an invalid point to jump to?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Patch queue triage
Next
From: Greg Smith
Date:
Subject: Re: Patch queue triage