Thread: ERROR: infinite recursion in proc_exit

ERROR: infinite recursion in proc_exit

From
Kristofer Munn
Date:
Hi all!

On a number of occasions over the past couple of weeks I have encountered
the following error during my nightly vacuum process.  It repeats
infinitely until outside intervention.  It seems to identify the fact that
it is stuck in an infinite recursion but unable to deal with it.  The
relevant portion of the log is at the end of the message.  Any
hints/suggestions are welcome.

Also, after exiting from the infinite loop I have the following errors
appearing during vacuum and vacuum analyze:

NOTICE:  CreatePortal: portal <vacuum> already exists
NOTICE:  Index pg_attribute_attrelid_index: NUMBER OF INDEX' TUPLES (1203) IS NOT THE SAME AS HEAP' (1193)
NOTICE:  Index pg_attribute_relid_attnum_index: NUMBER OF INDEX' TUPLES (1203) IS NOT THE SAME AS HEAP' (1193)
NOTICE:  Index pg_attribute_relid_attnam_index: NUMBER OF INDEX' TUPLES (1203) IS NOT THE SAME AS HEAP' (1193)
ERROR:  cannot find attribute 1 of relation pg_temp.13894.119

Any suggestions on cleaning this up is appreciated...

- Kristofer

Nov  4 03:37:00 mymailman logger: DEBUG:  --Relation tblvalue-- 
Nov  4 03:37:04 mymailman logger: DEBUG:  Pages 945: Changed 83, Reapped 5, Empty 0, New 0; Tup 128483: Vac 13,
Keep/VTL0/0, Crash 0, UnUsed 0, MinLen 53, MaxLen 85; Re-using: Free/Avail. Space 876/876; EndEmpty/Avail. Pages 0/5.
Elapsed0/0 sec. 
 
Nov  4 03:37:08 mymailman logger: DEBUG:  Index tblvalue_idx2: Pages 432; Tuples 128483: Deleted 13. Elapsed 0/0 sec. 
Nov  4 03:37:10 mymailman logger: DEBUG:  Index tblvalue_oid: Pages 289; Tuples 128483: Deleted 13. Elapsed 0/1 sec. 
Nov  4 03:37:20 mymailman logger: ERROR:  cannot find attribute 1 of relation pg_temp.13894.119 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: FATAL 1:  Socket command type  
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: ERROR:  unknown frontend message was received 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: NOTICE:  AbortTransaction and not in in-progress state  
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_recvbuf: recv() failed: Bad file descriptor
Nov  4 03:37:20 mymailman logger: FATAL 1:  Socket command type ? unknown 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: ERROR:  unknown frontend message was received 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: NOTICE:  AbortTransaction and not in in-progress state  
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_recvbuf: recv() failed: Bad file descriptor
Nov  4 03:37:20 mymailman logger: FATAL 1:  Socket command type ? unknown 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: ERROR:  unknown frontend message was received 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: NOTICE:  AbortTransaction and not in in-progress state  
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_recvbuf: recv() failed: Bad file descriptor
Nov  4 03:37:20 mymailman logger: FATAL 1:  Socket command type ? unknown 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: ERROR:  unknown frontend message was received 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: NOTICE:  AbortTransaction and not in in-progress state  
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_recvbuf: recv() failed: Bad file descriptor
Nov  4 03:37:20 mymailman logger: FATAL 1:  Socket command type ? unknown 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: ERROR:  unknown frontend message was received 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: NOTICE:  AbortTransaction and not in in-progress state  
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_recvbuf: recv() failed: Bad file descriptor
Nov  4 03:37:20 mymailman logger: ERROR:  infinite recursion in proc_exit 

Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: NOTICE:  AbortTransaction and not in in-progress state  
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_recvbuf: recv() failed: Bad file descriptor
Nov  4 03:37:20 mymailman logger: ERROR:  infinite recursion in proc_exit 

Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: NOTICE:  AbortTransaction and not in in-progress state  
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_recvbuf: recv() failed: Bad file descriptor
Nov  4 03:37:20 mymailman logger: ERROR:  infinite recursion in proc_exit 

Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: NOTICE:  AbortTransaction and not in in-progress state  
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
Nov  4 03:37:20 mymailman logger: pq_recvbuf: recv() failed: Bad file descriptor
Nov  4 03:37:20 mymailman logger: ERROR:  infinite recursion in proc_exit 

- K

Kristofer Munn * KMI * 973-509-9414 * AIM KrMunn * ICQ 352499 * www.munn.com



Re: [HACKERS] ERROR: infinite recursion in proc_exit

From
Tom Lane
Date:
Kristofer Munn <kmunn@munn.com> writes:
> On a number of occasions over the past couple of weeks I have encountered
> the following error during my nightly vacuum process.  It repeats
> infinitely until outside intervention.  It seems to identify the fact that
> it is stuck in an infinite recursion but unable to deal with it.

What Postgres version are you using?  This loop:

> Nov  4 03:37:20 mymailman logger: pq_recvbuf: recv() failed: Bad file descriptor
> Nov  4 03:37:20 mymailman logger: FATAL 1:  Socket command type ? unknown 
> Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
> Nov  4 03:37:20 mymailman logger: ERROR:  unknown frontend message was received 
> Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
> Nov  4 03:37:20 mymailman logger: NOTICE:  AbortTransaction and not in in-progress state  
> Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
> Nov  4 03:37:20 mymailman logger: pq_flush: send() failed: Bad file descriptor 
> Nov  4 03:37:20 mymailman logger: pq_recvbuf: recv() failed: Bad file descriptor

looks like we are trying to read a command from the frontend, failing
because the socket descriptor's been clobbered, trying to send the
elog() report to the client --- which also fails of course --- returning
to the main loop and failing again.  But as far as I can tell from
looking at either 6.5 or current code, that can't happen: a failure
return from pq_recvbuf should lead to proc_exit *without* reaching the
'Socket command type ? unknown' elog.  So I think you are working with
6.4 or older code, in which case an update would be your best bet.

If you want to try to recover without doing an update, I think you'll
still need to do a pg_dump/destroydb/createdb/reload.  It looks like
the indexes on pg_attribute have been corrupted, and there's not any
easier way to clean that up.  (If it were a user table, you could just
drop and recreate the index, but don't try that on pg_attribute...)
        regards, tom lane


Re: [HACKERS] ERROR: infinite recursion in proc_exit

From
Kristofer Munn
Date:
> What Postgres version are you using?

My apologies, should have included that with my original request:

[PostgreSQL 6.5.2 on i686-pc-linux-gnu, compiled by gcc egcs-2.91.66]

> If you want to try to recover without doing an update, I think you'll
> still need to do a pg_dump/destroydb/createdb/reload.  It looks like
> the indexes on pg_attribute have been corrupted, and there's not any
> easier way to clean that up.  (If it were a user table, you could just
> drop and recreate the index, but don't try that on pg_attribute...)

What are the ramifications of continuing with the corrupted indexes -
undefined behavior?  Filesystems have fsck to fix stuff - are there any
tools on the docket to reconstruct the indexes or other recoverable
things?  These would be useful for systems where the database is 500+ Megs
and takes quite awhile to reload.

The application involved uses temp files (you see some sort of attribute
failure in the VACUUM before everything goes haywire - perhaps unrelated)
as well as transactions.  I don't create or drop temp tables inside
transactions to avoid the accompanying error messages.  A database
maintenance program runs nightly to cull old data from the database and
then runs a vacuum.  During that time, transactions continue unabated.

- K

Kristofer Munn * KMI * 973-509-9414 * AIM KrMunn * ICQ 352499 * www.munn.com



Re: [HACKERS] ERROR: infinite recursion in proc_exit

From
Tom Lane
Date:
Kristofer Munn <kmunn@munn.com> writes:
>> What Postgres version are you using?
> My apologies, should have included that with my original request:
> [PostgreSQL 6.5.2 on i686-pc-linux-gnu, compiled by gcc egcs-2.91.66]

Hmm.  If that trace is from 6.5 code, then postgres.c should certainly
be calling proc_exit after the recv() fails.  I wonder if proc_exit is
returning because proc_exit_inprogress is nonzero?  proc_exit's use of
elog(ERROR) does look mighty bogus to me --- that path could possibly
cause a recursion just like this, but how did the code get into it to
begin with?

But that's not very relevant to your real problem, which is that
there must be something corrupted in pg_attribute's indexes.

> What are the ramifications of continuing with the corrupted indexes -
> undefined behavior?

I wouldn't recommend it.

> Filesystems have fsck to fix stuff - are there any
> tools on the docket to reconstruct the indexes or other recoverable
> things?

I've thought for some time that vacuum ought to just rebuild the indexes
from scratch.  That'd provide a recovery path for this sort of problem,
and I suspect it'd actually be faster than what vacuum does now.  I'm
not volunteering to make it happen, though.
        regards, tom lane


Re: [HACKERS] ERROR: infinite recursion in proc_exit

From
wieck@debis.com (Jan Wieck)
Date:
Tom Lane wrote:

> Kristofer Munn <kmunn@munn.com> writes:
> > [PostgreSQL 6.5.2 on i686-pc-linux-gnu, compiled by gcc egcs-2.91.66]
>
> But that's not very relevant to your real problem, which is that
> there must be something corrupted in pg_attribute's indexes.
>
> > What are the ramifications of continuing with the corrupted indexes -
> > undefined behavior?
>
> I wouldn't recommend it.
>
> > Filesystems have fsck to fix stuff - are there any
> > tools on the docket to reconstruct the indexes or other recoverable
> > things?
>
> I've thought for some time that vacuum ought to just rebuild the indexes
> from scratch.  That'd provide a recovery path for this sort of problem,
> and I suspect it'd actually be faster than what vacuum does now.  I'm
> not volunteering to make it happen, though.

    I  don't  know if you could drop/rebuild an index on a system
    catalog while the database is online.

    But there was sometimes a utility called reindexdb. That used
    the  bootstrap  processing  mode  interface of the backend to
    drop and recreate all system catalog indices.  I  don't  know
    who removed that and why, neither do I know if it would still
    be  possible  to  drop  and  reindex  through  the  bootstrap
    interface.

    Does someone remember why it's gone?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] ERROR: infinite recursion in proc_exit

From
Massimo Dal Zotto
Date:
> 
> Hmm.  If that trace is from 6.5 code, then postgres.c should certainly
> be calling proc_exit after the recv() fails.  I wonder if proc_exit is
> returning because proc_exit_inprogress is nonzero?  proc_exit's use of
> elog(ERROR) does look mighty bogus to me --- that path could possibly
> cause a recursion just like this, but how did the code get into it to
> begin with?

The proc_exit_inprogress stuff was added by me after I found some backends
doing exactly that sort of infinite recursion after a socket recv error.
It doesn't correct the original error but at least il will exit the backend
after 10 iterations. The elog(ERROR) might be bogus in this context, but how
can you otherwise notify the error? Maybe a better solution could be this:
if (proc_exit_inprogress++ == 9)    elog(ERROR, "infinite recursion in proc_exit");if (proc_exit_inprogress >= 9)
gotoexit;
 

-- 
Massimo Dal Zotto

+----------------------------------------------------------------------+
|  Massimo Dal Zotto               email: dz@cs.unitn.it               |
|  Via Marconi, 141                phone: ++39-0461534251              |
|  38057 Pergine Valsugana (TN)      www: http://www.cs.unitn.it/~dz/  |
|  Italy                             pgp: finger dz@tango.cs.unitn.it  |
+----------------------------------------------------------------------+


Re: [HACKERS] ERROR: infinite recursion in proc_exit

From
Bruce Momjian
Date:
> > 
> > Hmm.  If that trace is from 6.5 code, then postgres.c should certainly
> > be calling proc_exit after the recv() fails.  I wonder if proc_exit is
> > returning because proc_exit_inprogress is nonzero?  proc_exit's use of
> > elog(ERROR) does look mighty bogus to me --- that path could possibly
> > cause a recursion just like this, but how did the code get into it to
> > begin with?
> 
> The proc_exit_inprogress stuff was added by me after I found some backends
> doing exactly that sort of infinite recursion after a socket recv error.
> It doesn't correct the original error but at least il will exit the backend
> after 10 iterations. The elog(ERROR) might be bogus in this context, but how
> can you otherwise notify the error? Maybe a better solution could be this:
> 
>     if (proc_exit_inprogress++ == 9)
>         elog(ERROR, "infinite recursion in proc_exit");
>     if (proc_exit_inprogress >= 9)
>         goto exit;


Fix applied:
   /*    * If proc_exit is called too many times something bad is happening, so    * exit immediately.  This is crafted
intwo if's for a reason.    */   if (proc_exit_inprogress == 9)       elog(ERROR, "infinite recursion in proc_exit");
if(proc_exit_inprogress >= 9)       goto exit;
 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] ERROR: infinite recursion in proc_exit

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Fix applied:

>     /*
>      * If proc_exit is called too many times something bad is happening, so
>      * exit immediately.  This is crafted in two if's for a reason.
>      */
>     if (proc_exit_inprogress == 9)
>         elog(ERROR, "infinite recursion in proc_exit");
>     if (proc_exit_inprogress >= 9)
>         goto exit;

That isn't going to make things any better, because it's still laboring
under the same basic oversight: elog(ERROR) does not return to the
caller, it returns control to the main loop.  Thus, having proc_exit
call elog(ERROR) is simply *guaranteed* to create a failure.  proc_exit
must not allow control to return anywhere, under any circumstances,
because it is used as the final recourse when things are too screwed up
to consider continuing.

Although it's not hard to remove this silliness from proc_exit itself,
I suspect that the real source of the problem is probably elog(ERROR)
being called by one of the on_shmem_exit or on_proc_exit routines that
proc_exit is supposed to call.  I don't think we can hunt down and
eliminate all possible paths where that could happen (even if we could
do it today, it's too likely that future code changes would make it
possible again).

I think what we must do instead is to fix things so that if an
on_shmem_exit/on_proc_exit routine elog's, control comes back to
proc_exit and we carry on exiting with the remaining
on_shmem_exit/on_proc_exit routines (*not* calling the one that
failed again, of course).

A sketch of the necessary changes is:

1. proc_exit_inprogress becomes a global, and we change the
setjmp-catching code in postgres.c so that if proc_exit_inprogress is
nonzero, it just calls proc_exit() immediately.  This allows proc_exit
to recover control after an on_shmem_exit/on_proc_exit routine fails via
elog().  (Note: for elog(FATAL), it's already true that elog passes
control straight off to proc_exit.)

2. proc_exit and shmem_exit should decrement on_proc_exit_index and
on_shmem_exit_index as they work through the lists of registered
routines --- in effect, each routine is removed from the list just
before it is called.  That prevents the same routine from being called
again if it fails.

3. Ensure that the final exit() code is nonzero if we cycled through
proc_exit more than once --- errors during proc_exit should always be
treated as the equivalent of an outright crash, forcing a postmaster
cleanup cycle, I think.

Before I try to implement this scheme, can anyone spot a problem with
it?
        regards, tom lane


Re: [HACKERS] ERROR: infinite recursion in proc_exit

From
Bruce Momjian
Date:
> That isn't going to make things any better, because it's still laboring
> under the same basic oversight: elog(ERROR) does not return to the
> caller, it returns control to the main loop.  Thus, having proc_exit
> call elog(ERROR) is simply *guaranteed* to create a failure.  proc_exit
> must not allow control to return anywhere, under any circumstances,
> because it is used as the final recourse when things are too screwed up
> to consider continuing.

Massimo pointed out some problems in my fix.  The new code is:
   if (++proc_exit_inprogress == 9)         elog(ERROR, "infinite recursion in proc_exit");   if (proc_exit_inprogress
>=9)       goto exit;      /* ----------------    *  if proc_exit_inprocess > 1, then it means that we    *  are being
invokedfrom within an on_exit() handler    *  and so we return immediately to avoid recursion.      * ----------------
 */   if (proc_exit_inprogress > 1)         return;
 

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] ERROR: infinite recursion in proc_exit

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>     /* ----------------
>      *  if proc_exit_inprocess > 1, then it means that we
>      *  are being invoked from within an on_exit() handler
>      *  and so we return immediately to avoid recursion.  
>      * ----------------
>      */
>     if (proc_exit_inprogress > 1)  
>         return;

No, no, no, noooo!!!

proc_exit MUST NOT RETURN.  EVER, UNDER ANY CIRCUMSTANCES.

If it does, that means that elog(STOP) can return under some
circumstances.  The callers of elog() are not expecting that,
and they are likely to screw things up even worse if elog returns
control unexpectedly.

AFAICS, this set of problems cannot be fixed by localized patching in
proc_exit.  We have to globally change the way in which errors are
processed after proc_exit has begun execution.
        regards, tom lane


Re: [HACKERS] ERROR: infinite recursion in proc_exit

From
Bruce Momjian
Date:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> >     /* ----------------
> >      *  if proc_exit_inprocess > 1, then it means that we
> >      *  are being invoked from within an on_exit() handler
> >      *  and so we return immediately to avoid recursion.  
> >      * ----------------
> >      */
> >     if (proc_exit_inprogress > 1)  
> >         return;
> 
> No, no, no, noooo!!!
> 
> proc_exit MUST NOT RETURN.  EVER, UNDER ANY CIRCUMSTANCES.
> 
> If it does, that means that elog(STOP) can return under some
> circumstances.  The callers of elog() are not expecting that,
> and they are likely to screw things up even worse if elog returns
> control unexpectedly.
> 
> AFAICS, this set of problems cannot be fixed by localized patching in
> proc_exit.  We have to globally change the way in which errors are
> processed after proc_exit has begun execution.
> 

Oh, well.  I tried.  Code is better than it used to be at least.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026