Thread: Cause of "can't wait without a PROC structure"

Cause of "can't wait without a PROC structure"

From
Tom Lane
Date:
I've identified the reason for the occasional "can't wait without a PROC
structure" failures we've seen reported.  I had been thinking that this
must occur during backend startup, before MyProc is initialized ...
but I was mistaken.  Actually, it happens during backend shutdown,
and the reason is that ProcKill (which releases the PGPROC structure
and resets MyProc to NULL) is called before ShutdownBufferPoolAccess.
But the latter tries to acquire the bufmgr LWLock.  If it has to wait,
kaboom.

The ordering of these shutdown hooks is the reverse of the ordering
of the startup initialization of the modules.  It looks like we'll
need to rejigger the startup ordering ... and it also looks like that's
going to be a rather ticklish issue.  (See comments in BaseInit and
InitPostgres.)  Any thoughts on how to do it?
        regards, tom lane
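
The failure mode described in this message can be illustrated with a small
toy model in C. This is only a sketch under stated assumptions, not
PostgreSQL source: every identifier below (my_proc, toy_proc_kill,
toy_shutdown_buffer_access, register_exit_hook, and so on) is a hypothetical
stand-in. The sketch assumes only what the message says: shutdown hooks run
in the reverse of startup order, the process hook clears the per-backend
PROC pointer, and a later hook may need to wait on a lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are PostgreSQL's real definitions. */
typedef struct ToyProc { int pid; } ToyProc;

static ToyProc  toy_proc_slot;
static ToyProc *my_proc = NULL;         /* plays the role of MyProc */
static bool     lock_contended = false; /* would the lock acquire have to wait? */
static int      wait_failures = 0;      /* counts "can't wait without a PROC" events */

typedef void (*exit_hook)(void);
#define MAX_HOOKS 8
static exit_hook hooks[MAX_HOOKS];
static int       n_hooks = 0;

/* Hooks are pushed in startup order ... */
static void register_exit_hook(exit_hook fn)
{
    if (n_hooks < MAX_HOOKS)
        hooks[n_hooks++] = fn;
}

/* ... and popped in reverse order at shutdown. */
static void run_exit_hooks(void)
{
    while (n_hooks > 0)
        hooks[--n_hooks]();
}

/* Waiting needs a PROC structure to sleep on; without one we can only fail. */
static void toy_lock_acquire(void)
{
    if (!lock_contended)
        return;                 /* lock is free, no wait needed */
    if (my_proc == NULL)
    {
        wait_failures++;        /* the "kaboom" case */
        return;
    }
    /* a real implementation would block here until the lock is granted */
}

static void toy_shutdown_buffer_access(void) { toy_lock_acquire(); }
static void toy_proc_kill(void)              { my_proc = NULL; }

static void toy_init_buffer_access(void)
{
    register_exit_hook(toy_shutdown_buffer_access);
}

static void toy_init_process(void)
{
    my_proc = &toy_proc_slot;
    register_exit_hook(toy_proc_kill);
}

/* Returns how many times a wait was attempted without a PROC structure. */
int simulate_backend_exit(bool contended)
{
    n_hooks = 0;
    wait_failures = 0;
    my_proc = NULL;
    lock_contended = contended;

    toy_init_buffer_access();   /* initialized first, so its hook runs last */
    toy_init_process();         /* initialized second, so its hook runs first */
    run_exit_hooks();
    return wait_failures;
}
```

Calling simulate_backend_exit(false) from a main() reports no failures,
while simulate_backend_exit(true) reports one. That mirrors why the problem
only shows up occasionally: it needs lock contention at the moment of exit.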


Re: Cause of "can't wait without a PROC structure"

From
Scott Shattuck
Date:
On Wed, 2002-09-25 at 09:52, Tom Lane wrote:
> I've identified the reason for the occasional "can't wait without a PROC
> structure" failures we've seen reported.  I had been thinking that this
> must occur during backend startup, before MyProc is initialized ...
> but I was mistaken.  Actually, it happens during backend shutdown,
> and the reason is that ProcKill (which releases the PGPROC structure
> and resets MyProc to NULL) is called before ShutdownBufferPoolAccess.
> But the latter tries to acquire the bufmgr LWLock.  If it has to wait,
> kaboom.
> 

Great news that you've identified the problem. We continue to see this
every few days and it's the only thing that takes our servers down over
weeks of pounding.

> The ordering of these shutdown hooks is the reverse of the ordering
> of the startup initialization of the modules.  It looks like we'll
> need to rejigger the startup ordering ... and it also looks like that's
> going to be a rather ticklish issue.  (See comments in BaseInit and
> InitPostgres.)  Any thoughts on how to do it?
> 

Sorry I can't add any insight at this level...but I can say that it
would be significant to my customer(s) and my ability to recommend PG to
future "ex-Oracle users" ;) to see a fix make it into the 7.3 final.

ss


Scott Shattuck
Technical Pursuit Inc.




Re: Cause of "can't wait without a PROC structure"

From
Tom Lane
Date:
Scott Shattuck <ss@technicalpursuit.com> writes:
> Sorry I can't add any insight at this level...but I can say that it
> would be significant to my customer(s) and my ability to recommend PG to
> future "ex-Oracle users" ;) to see a fix make it into the 7.3 final.

Rest assured that it *will* be fixed in 7.3 final; this is a "must fix"
item in my book ... and now that we know the cause, it's just a matter
of choosing the cleanest solution.
        regards, tom lane


Re: Cause of "can't wait without a PROC structure"

From
Tom Lane
Date:
I said:
> The ordering of these shutdown hooks is the reverse of the ordering
> of the startup initialization of the modules.  It looks like we'll
> need to rejigger the startup ordering ... and it also looks like that's
> going to be a rather ticklish issue.  (See comments in BaseInit and
> InitPostgres.)  Any thoughts on how to do it?

I eventually decided that the most reasonable solution was to leave the
startup sequence alone, and fold the ProcKill and
ShutdownBufferPoolAccess shutdown hooks together.  This is a little ugly
but it seems to beat the alternatives.  ShutdownBufferPoolAccess was
effectively assuming that LWLockReleaseAll was called just before it,
so the two modules aren't really independent anyway.
        regards, tom lane
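
A rough sketch of what folding the two hooks together might look like, again
as a toy model in C rather than the actual patch. All names here
(toy_combined_exit_hook, toy_buffer_cleanup, toy_release_all_locks,
ExitResult) are hypothetical, and the internal order shown (release all
locks, then buffer cleanup, then give up the PROC pointer) is an assumption
based on the description above, not a copy of the committed code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are PostgreSQL's real definitions. */
typedef struct ToyProc { int pid; } ToyProc;

typedef struct ExitResult {
    int  wait_failures;     /* waits attempted with no PROC structure */
    bool cleanup_had_proc;  /* was the PROC pointer still set during buffer cleanup? */
    bool proc_released;     /* was the PROC pointer cleared by the end? */
} ExitResult;

static ToyProc    toy_proc_slot;
static ToyProc   *my_proc = NULL;        /* plays the role of MyProc */
static int        held_locks = 0;
static bool       lock_contended = false;
static ExitResult result;

/* Stand-in for "release every lock this backend still holds". */
static void toy_release_all_locks(void) { held_locks = 0; }

static void toy_lock_acquire(void)
{
    if (lock_contended && my_proc == NULL)
    {
        result.wait_failures++;  /* cannot wait without a PROC structure */
        return;
    }
    /* with a PROC structure, a real implementation could block here */
    held_locks++;
}

static void toy_lock_release(void)
{
    if (held_locks > 0)
        held_locks--;
}

/* Buffer-side shutdown work; it needs the lock, so it may need to wait. */
static void toy_buffer_cleanup(void)
{
    result.cleanup_had_proc = (my_proc != NULL);
    toy_lock_acquire();
    /* ... buffer bookkeeping would go here ... */
    toy_lock_release();
}

/* One hook instead of two, so the ordering is explicit in a single place. */
static void toy_combined_exit_hook(void)
{
    toy_release_all_locks();    /* the precondition buffer cleanup relied on */
    toy_buffer_cleanup();       /* runs while my_proc is still valid */
    my_proc = NULL;             /* only now give up the PROC structure */
    result.proc_released = (my_proc == NULL);
}

ExitResult simulate_folded_exit(bool contended)
{
    result = (ExitResult){0, false, false};
    lock_contended = contended;
    my_proc = &toy_proc_slot;   /* backend is running and owns a PROC */
    held_locks = 1;             /* pretend a lock was held when exit began */

    toy_combined_exit_hook();
    return result;
}
```

The point of the design is that the dependency between the two modules is
no longer implied by registration order: the combined hook states it
directly, so the startup sequence does not have to change.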


Re: Cause of "can't wait without a PROC structure"

From
Bruce Momjian
Date:
Tom Lane wrote:
> I said:
> > The ordering of these shutdown hooks is the reverse of the ordering
> > of the startup initialization of the modules.  It looks like we'll
> > need to rejigger the startup ordering ... and it also looks like that's
> > going to be a rather ticklish issue.  (See comments in BaseInit and
> > InitPostgres.)  Any thoughts on how to do it?
> 
> I eventually decided that the most reasonable solution was to leave the
> startup sequence alone, and fold the ProcKill and
> ShutdownBufferPoolAccess shutdown hooks together.  This is a little ugly
> but it seems to beat the alternatives.  ShutdownBufferPoolAccess was
> effectively assuming that LWLockReleaseAll was called just before it,
> so the two modules aren't really independent anyway.

I understand.  Sometimes the dependencies are too intricate to break
apart, and you just reorder them.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073