Thread: Weird postmaster crashes
I am experiencing database server crashes quite frequently (sometimes *daily*), and I am having a hard time identifying what could possibly be causing them :-(
They seem to happen kinda randomly; I was unable to attribute them to any specific database activity going on at the time... The postgres log looks like:

2003-06-10 13:53:32 [14522] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [16915] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [14523] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [17095] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without a PROC structure
2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without a PROC structure
2003-06-10 13:53:32 [14527] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [14685] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [17093] DEBUG: pq_recvbuf: unexpected EOF on client connection
2003-06-10 13:53:32 [17092] DEBUG: pq_recvbuf: unexpected EOF on client connection
.... <snip a few identical messages (with different pids)>

2003-06-10 13:53:33 [14072] DEBUG: server process (pid 14551) exited with exit code 1
2003-06-10 13:53:33 [14072] DEBUG: terminating any other active server processes
2003-06-10 13:53:33 [1609] NOTICE: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
.....

It does not even produce a core file after this - it just silently exits and restarts itself.

Could somebody please point me to any clue as to what could possibly be wrong with it?

This is 7.2.1 - I know, I need to upgrade. I'm working on it, but it is going to take a while, and for the time being I would greatly appreciate any ideas on what I can do about this thing.

Thanks a lot!

Dima
The mantra is to always check hardware first. Do a disk and memory check.
Dmitry Tkach <dmitry@openratings.com> writes:
> I am experiencing database server crashes quite frequently
> 2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without a PROC structure
> This is 7.2.1 - I know, I need to upgrade.

Yes, you do. This is a known bug that was fixed in .3 or .4.

regards, tom lane
Tom Lane wrote:
>Yes, you do. This is a known bug that was fixed in .3 or .4.

Thanks, Tom! That's kinda what I suspected...
Could you give me some idea of what circumstances cause this to happen?

Thanks again!
Dima
Dmitry Tkach <dmitry@openratings.com> writes:
> 2003-06-10 13:53:32 [14551] FATAL 1: LWLockAcquire: can't wait without a PROC structure
> Could you give me some idea of what circumstances cause this to happen?

IIRC, it's an order-of-operations mistake during backend shutdown: the proc structure is deallocated while it's still possible to receive an interrupt from another backend --- and if you get such an interrupt, you need the proc. So from the user's point of view it's pretty unpredictable.

Short answer: upgrade. This is not the only nasty bug in 7.2.1.

regards, tom lane
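The failure Tom describes is a classic teardown race: a resource that an interrupt handler depends on is released while the handler can still run. The C sketch here is only a generic illustration of that pattern under stated assumptions, not PostgreSQL's actual code or its actual fix. `Proc`, `my_proc`, `on_interrupt` and `simulate_shutdown` are hypothetical names, and `SIGUSR1` merely stands in for "an interrupt from another backend". The buggy exit path frees the proc while the handler is still live; the safer path blocks the signal before teardown, so an interrupt arriving in the gap stays pending and is then discarded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a backend's "PROC structure". */
typedef struct {
    int pid;
} Proc;

static Proc *volatile my_proc = NULL;          /* hypothetical per-backend global */
static volatile sig_atomic_t missing_proc = 0; /* interrupts that found no proc */

/* Interrupt handler: servicing the interrupt needs the proc. */
static void on_interrupt(int signo)
{
    (void)signo;
    if (my_proc == NULL)
        missing_proc++; /* roughly where the real server gave up with FATAL */
}

static void backend_start(void)
{
    struct sigaction sa;

    my_proc = malloc(sizeof(Proc));
    assert(my_proc != NULL && "out of memory");
    my_proc->pid = 1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_interrupt;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
}

/* Buggy order: free the proc first, stop taking interrupts later. */
static void backend_exit_buggy(void)
{
    free(my_proc);
    my_proc = NULL;
    raise(SIGUSR1);           /* another backend signals us in the gap */
    signal(SIGUSR1, SIG_IGN); /* too late: the handler already ran */
}

/* Safer order: block the interrupt, tear down, then drop anything pending. */
static void backend_exit_fixed(void)
{
    sigset_t block, old;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);

    free(my_proc);
    my_proc = NULL;
    raise(SIGUSR1);           /* stays pending; the handler does not run */

    signal(SIGUSR1, SIG_IGN); /* POSIX discards a pending signal set to SIG_IGN */
    sigprocmask(SIG_SETMASK, &old, NULL);
}

/* Runs one start/exit cycle; returns how many interrupts found no proc. */
int simulate_shutdown(int fixed)
{
    missing_proc = 0;
    backend_start();
    if (fixed)
        backend_exit_fixed();
    else
        backend_exit_buggy();
    return missing_proc;
}
```

On a single-threaded POSIX system, `simulate_shutdown(0)` should return 1 (the handler ran after the proc was gone) and `simulate_shutdown(1)` should return 0. The point of the sketch is just the ordering rule: stop accepting interrupts before freeing anything an interrupt handler relies on.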
Makes sense. Thanks!
One more thing to clarify - when you said it was fixed in .3 and .4, did you mean 7.3 or 7.2.3?

Thanks!
Dima
Dmitry Tkach <dmitry@openratings.com> writes:
> One more thing to clarify - when you said it was fixed in .3 and .4, did
> you mean 7.3 or 7.2.3?

I meant I couldn't remember whether it was first fixed in 7.2.3 or 7.2.4. Doesn't matter for your purposes --- as long as you're updating, you should go to 7.2.4. 7.3.* also has the fix, of course, but updating to 7.3 is a much bigger task.

regards, tom lane