Thread: Cause of "can't wait without a PROC structure"
I've identified the reason for the occasional "can't wait without a PROC structure" failures we've seen reported. I had been thinking that this must occur during backend startup, before MyProc is initialized ... but I was mistaken. Actually, it happens during backend shutdown, and the reason is that ProcKill (which releases the PGPROC structure and resets MyProc to NULL) is called before ShutdownBufferPoolAccess. But the latter tries to acquire the bufmgr LWLock. If it has to wait, kaboom.

The ordering of these shutdown hooks is the reverse of the ordering of the startup initialization of the modules. It looks like we'll need to rejigger the startup ordering ... and it also looks like that's going to be a rather ticklish issue. (See comments in BaseInit and InitPostgres.) Any thoughts on how to do it?

regards, tom lane
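[Editor's sketch] The failure mode described above can be modelled in a few lines of C. This is a toy simulation, not PostgreSQL source: the hook registry, the `my_proc_valid` flag (standing in for `MyProc != NULL`), and every function name below are hypothetical. The only facts taken from the message are that shutdown hooks run in the reverse of startup order, that ProcKill therefore runs before ShutdownBufferPoolAccess, and that the latter may need to wait on a lock.

```c
#include <assert.h>

#define MAX_HOOKS 8

typedef void (*exit_hook)(void);

/* Hypothetical hook stack: callbacks run last-registered-first. */
static exit_hook hooks[MAX_HOOKS];
static int n_hooks = 0;

static int my_proc_valid = 0;      /* stands in for MyProc != NULL */
static int wait_without_proc = 0;  /* set when a wait is attempted with no PROC */

static void register_exit_hook(exit_hook fn)
{
    assert(n_hooks < MAX_HOOKS);
    hooks[n_hooks++] = fn;
}

static void run_exit_hooks(void)
{
    /* Pop from the top, so shutdown order is the reverse of startup order. */
    while (n_hooks > 0)
        hooks[--n_hooks]();
}

/* Models acquiring a lock that is contended, so the caller has to wait. */
static void acquire_contended_lock(void)
{
    if (!my_proc_valid)
        wait_without_proc = 1;  /* "can't wait without a PROC structure" */
}

static void proc_kill(void)
{
    my_proc_valid = 0;          /* releases the PROC, resets MyProc */
}

static void shutdown_buffer_pool_access(void)
{
    acquire_contended_lock();   /* needs a PROC if it has to wait */
}

/* Startup: buffer pool access is set up before the PROC is attached. */
static void init_buffer_pool_access(void)
{
    register_exit_hook(shutdown_buffer_pool_access);
}

static void init_process(void)
{
    my_proc_valid = 1;
    register_exit_hook(proc_kill);
}

/* Returns 1 if the shutdown sequence hit the failure, 0 otherwise. */
int simulate_backend_lifetime(void)
{
    n_hooks = 0;
    wait_without_proc = 0;
    init_buffer_pool_access();
    init_process();
    run_exit_hooks();           /* proc_kill runs first, then buffer shutdown */
    return wait_without_proc;
}
```

In this model `simulate_backend_lifetime()` reports the failure only because the lock is assumed to be contended; with an uncontended lock the wrong ordering goes unnoticed, which is consistent with the failures being occasional.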
On Wed, 2002-09-25 at 09:52, Tom Lane wrote:
> I've identified the reason for the occasional "can't wait without a PROC
> structure" failures we've seen reported. I had been thinking that this
> must occur during backend startup, before MyProc is initialized ...
> but I was mistaken. Actually, it happens during backend shutdown,
> and the reason is that ProcKill (which releases the PGPROC structure
> and resets MyProc to NULL) is called before ShutdownBufferPoolAccess.
> But the latter tries to acquire the bufmgr LWLock. If it has to wait,
> kaboom.
>

Great news that you've identified the problem. We continue to see this every few days and it's the only thing that takes our servers down over weeks of pounding.

> The ordering of these shutdown hooks is the reverse of the ordering
> of the startup initialization of the modules. It looks like we'll
> need to rejigger the startup ordering ... and it also looks like that's
> going to be a rather ticklish issue. (See comments in BaseInit and
> InitPostgres.) Any thoughts on how to do it?
>

Sorry I can't add any insight at this level...but I can say that it would be significant to my customer(s) and my ability to recommend PG to future "ex-Oracle users" ;) to see a fix make it into the 7.3 final.

ss

Scott Shattuck
Technical Pursuit Inc.
Scott Shattuck <ss@technicalpursuit.com> writes:
> Sorry I can't add any insight at this level...but I can say that it
> would be significant to my customer(s) and my ability to recommend PG to
> future "ex-Oracle users" ;) to see a fix make it into the 7.3 final.

Rest assured that it *will* be fixed in 7.3 final; this is a "must fix" item in my book ... and now that we know the cause, it's just a matter of choosing the cleanest solution.

regards, tom lane
I said:
> The ordering of these shutdown hooks is the reverse of the ordering
> of the startup initialization of the modules. It looks like we'll
> need to rejigger the startup ordering ... and it also looks like that's
> going to be a rather ticklish issue. (See comments in BaseInit and
> InitPostgres.) Any thoughts on how to do it?

I eventually decided that the most reasonable solution was to leave the startup sequence alone, and fold the ProcKill and ShutdownBufferPoolAccess shutdown hooks together. This is a little ugly but it seems to beat the alternatives. ShutdownBufferPoolAccess was effectively assuming that LWLockReleaseAll was called just before it, so the two modules aren't really independent anyway.

regards, tom lane
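[Editor's sketch] A toy C model of the "fold the hooks together" approach described above. It is not the committed patch: all names are hypothetical, and the order of steps inside the combined hook (release all locks, then buffer cleanup, then give up the PROC) is an assumption inferred from the remark that ShutdownBufferPoolAccess expected LWLockReleaseAll to have run just before it.

```c
#include <assert.h>

static int my_proc_valid = 0;      /* stands in for MyProc != NULL */
static int locks_held = 0;         /* locks still held by this backend */
static int wait_without_proc = 0;  /* set when a wait is attempted with no PROC */

/* Models LWLockReleaseAll: drop everything this backend still holds. */
static void release_all_locks(void)
{
    locks_held = 0;
}

/* Models the buffer pool cleanup, which takes a lock and may have to wait. */
static void buffer_pool_cleanup(void)
{
    assert(locks_held == 0);       /* relies on the release step running first */
    if (!my_proc_valid)
        wait_without_proc = 1;     /* would be the "can't wait" failure */
}

/* One combined shutdown hook instead of two independently ordered ones. */
static void proc_kill_folded(void)
{
    release_all_locks();           /* assumed first step */
    buffer_pool_cleanup();         /* PROC still attached, so waiting is safe */
    my_proc_valid = 0;             /* only now give up the PROC */
}

/* Returns 1 if the shutdown hit the failure, 0 if it completed cleanly. */
int simulate_folded_shutdown(void)
{
    my_proc_valid = 1;
    locks_held = 2;
    wait_without_proc = 0;
    proc_kill_folded();
    return wait_without_proc;
}

/* Reports whether the PROC is still attached after shutdown. */
int proc_still_attached(void)
{
    return my_proc_valid;
}
```

Because the ordering now lives inside a single function, it no longer depends on the order in which the two modules registered their hooks at startup, which is why the startup sequence can be left alone.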
Tom Lane wrote:
> I said:
> > The ordering of these shutdown hooks is the reverse of the ordering
> > of the startup initialization of the modules. It looks like we'll
> > need to rejigger the startup ordering ... and it also looks like that's
> > going to be a rather ticklish issue. (See comments in BaseInit and
> > InitPostgres.) Any thoughts on how to do it?
>
> I eventually decided that the most reasonable solution was to leave the
> startup sequence alone, and fold the ProcKill and
> ShutdownBufferPoolAccess shutdown hooks together. This is a little ugly
> but it seems to beat the alternatives. ShutdownBufferPoolAccess was
> effectively assuming that LWLockReleaseAll was called just before it,
> so the two modules aren't really independent anyway.

I understand. Sometimes the dependencies are too intricate to break apart, and you just reorder them.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073