Re: [HACKERS] ERROR: infinite recursion in proc_exit - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] ERROR: infinite recursion in proc_exit
Date
Msg-id 2486.941911693@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] ERROR: infinite recursion in proc_exit  (Bruce Momjian <maillist@candle.pha.pa.us>)
Responses Re: [HACKERS] ERROR: infinite recursion in proc_exit  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Fix applied:

>     /*
>      * If proc_exit is called too many times something bad is happening, so
>      * exit immediately.  This is crafted in two if's for a reason.
>      */
>     if (proc_exit_inprogress == 9)
>         elog(ERROR, "infinite recursion in proc_exit");
>     if (proc_exit_inprogress >= 9)
>         goto exit;

That isn't going to make things any better, because it's still laboring
under the same basic oversight: elog(ERROR) does not return to the
caller, it returns control to the main loop.  Thus, having proc_exit
call elog(ERROR) is simply *guaranteed* to create a failure.  proc_exit
must not allow control to return anywhere, under any circumstances,
because it is used as the final recourse when things are too screwed up
to consider continuing.

Although it's not hard to remove this silliness from proc_exit itself,
I suspect that the real source of the problem is probably elog(ERROR)
being called by one of the on_shmem_exit or on_proc_exit routines that
proc_exit is supposed to call.  I don't think we can hunt down and
eliminate all possible paths where that could happen (even if we could
do it today, it's too likely that future code changes would make it
possible again).

I think what we must do instead is to fix things so that if an
on_shmem_exit/on_proc_exit routine elog's, control comes back to
proc_exit and we carry on exiting with the remaining
on_shmem_exit/on_proc_exit routines (*not* calling the one that
failed again, of course).

A sketch of the necessary changes is:

1. proc_exit_inprogress becomes a global, and we change the
setjmp-catching code in postgres.c so that if proc_exit_inprogress is
nonzero, it just calls proc_exit() immediately.  This allows proc_exit
to recover control after an on_shmem_exit/on_proc_exit routine fails via
elog().  (Note: for elog(FATAL), it's already true that elog passes
control straight off to proc_exit.)

2. proc_exit and shmem_exit should decrement on_proc_exit_index and
on_shmem_exit_index as they work through the lists of registered
routines --- in effect, each routine is removed from the list just
before it is called.  That prevents the same routine from being called
again if it fails.

3. Ensure that the final exit() code is nonzero if we cycled through
proc_exit more than once --- errors during proc_exit should always be
treated as the equivalent of an outright crash, forcing a postmaster
cleanup cycle, I think.

Before I try to implement this scheme, can anyone spot a problem with
it?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Re: New version of psql
Next
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] ERROR: infinite recursion in proc_exit