Thread: FATAL: semctl(1672698088, 12, SETVAL, 0) failed
I encountered an error when I fast shutdown 8.1.1 on Win2k:

FATAL: semctl(1672698088, 12, SETVAL, 0) failed: A blocking operation was interrupted by a call to WSACancelBlockingCall.

A similar error on 8.1/win2003 was reported on pgsql-general (sorry, I can't dig out the original post from our web archives):

From: Niederland
Date: Tues, Dec 13 2005 9:49 am

2005-12-12 20:30:00 FATAL: semctl(50884184, 15, SETVAL, 0) failed: A non-blocking socket operation could not be completed immediately.

---

There are two problems here:

(1) Why a socket error?
In port/win32.h, we have

    #undef EAGAIN
    #undef EINTR
    #define EINTR WSAEINTR
    #define EAGAIN WSAEWOULDBLOCK

What's the rationale of doing so?

(2) What's happened here?
It may come from PGSemaphoreReset(), and win32 semop() looks like this:

    ret = WaitForMultipleObjectsEx(2, wh, FALSE,
        (sops[0].sem_flg & IPC_NOWAIT) ? 0 : INFINITE, TRUE);
    ...
    else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
    {
        pgwin32_dispatch_queued_signals();
        errno = EINTR;
    }
    else if (ret == WAIT_TIMEOUT)
        errno = EAGAIN;

So it seems the EINTR is caused by an incoming signal, the EAGAIN is caused by a TIMEOUT ... any ideas?

Regards,
Qingqing
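The mapping from wait results to errno values described in point (2) can be sketched in portable C. This is only an illustration, not the PostgreSQL source: the `SK_` constants are stand-ins for the Win32 values (so the sketch builds without `<windows.h>`), `sk_classify_wait` is a hypothetical helper, and two things are assumptions because the quoted fragment elides them: that `WAIT_OBJECT_0` (the semaphore handle) means success, and that any other result is reported as `EINVAL`.

```c
#include <assert.h>
#include <errno.h>

/* Stand-ins for the Win32 wait results; on Windows these come from
 * <windows.h> as WAIT_OBJECT_0, WAIT_IO_COMPLETION, and so on. */
#define SK_WAIT_OBJECT_0      0x00000000UL
#define SK_WAIT_IO_COMPLETION 0x000000C0UL
#define SK_WAIT_TIMEOUT       0x00000102UL
#define SK_WAIT_FAILED        0xFFFFFFFFUL

/*
 * Hypothetical helper: classify a wait result the way the quoted semop()
 * fragment does. Returns 0 on success, -1 with errno set otherwise.
 */
int sk_classify_wait(unsigned long ret)
{
    assert(ret <= SK_WAIT_FAILED);  /* wait results are 32-bit values */

    if (ret == SK_WAIT_OBJECT_0)
        return 0;           /* assumption: first handle is the semaphore */

    if (ret == SK_WAIT_OBJECT_0 + 1 || ret == SK_WAIT_IO_COMPLETION)
    {
        errno = EINTR;      /* signal event fired or an APC was queued */
        return -1;
    }

    if (ret == SK_WAIT_TIMEOUT)
    {
        errno = EAGAIN;     /* zero timeout (IPC_NOWAIT) expired */
        return -1;
    }

    errno = EINVAL;         /* assumption: the original branch is elided */
    return -1;
}
```

Read this way, `EAGAIN` can only appear when the caller passed `IPC_NOWAIT` (timeout 0), while `EINTR` can appear on any wait that a signal interrupts.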
Qingqing Zhou wrote:
> I encountered an error when I fast shutdown 8.1.1 on Win2k:
>
> FATAL: semctl(1672698088, 12, SETVAL, 0) failed: A blocking operation
> was interrupted by a call to WSACancelBlockingCall.
>
> A similar error on 8.1/win2003 was reported on pgsql-general (sorry, I can't
> dig out the original post from our web archives):
>
> From: Niederland
> Date: Tues, Dec 13 2005 9:49 am
>
> 2005-12-12 20:30:00 FATAL: semctl(50884184, 15, SETVAL, 0) failed: A
> non-blocking socket operation could not be completed immediately.
>
> ---
>
> There are two problems here:
>
> (1) Why a socket error?
> In port/win32.h, we have
>
>     #undef EAGAIN
>     #undef EINTR
>     #define EINTR WSAEINTR
>     #define EAGAIN WSAEWOULDBLOCK
>
> What's the rationale of doing so?

We did this so that our code could refer to EINTR/EAGAIN without port-specific tests.

> (2) What's happened here?
> It may come from PGSemaphoreReset(), and win32 semop() looks like this:
>
>     ret = WaitForMultipleObjectsEx(2, wh, FALSE,
>         (sops[0].sem_flg & IPC_NOWAIT) ? 0 : INFINITE, TRUE);
>     ...
>     else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
>     {
>         pgwin32_dispatch_queued_signals();
>         errno = EINTR;
>     }
>     else if (ret == WAIT_TIMEOUT)
>         errno = EAGAIN;
>
> So it seems the EINTR is caused by an incoming signal, the EAGAIN is caused
> by a TIMEOUT ... any ideas?

I looked at the documentation for the function:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitformultipleobjectsex.asp

and it isn't clear what return failure values it has. We certainly could loop on WSAEINTR. Can you test it?

--
Bruce Momjian   http://candle.pha.pa.us
SRA OSS, Inc.   http://www.sraoss.com

+ If your life is a hard drive, Christ can be your backup. +
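A small portable sketch may help show why a semaphore failure ends up printed with a socket error message once EINTR/EAGAIN are remapped. It is not the PostgreSQL code: `SK_WSAEINTR` and `SK_WSAEWOULDBLOCK` are stand-ins carrying the Winsock numeric values (so the sketch builds without `<winsock2.h>`), and `sk_error_text` and `sk_format_fatal` are hypothetical helpers that imitate an error-text lookup using the two messages quoted in the reports.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Winsock error numbers, copied as plain constants for portability. */
#define SK_WSAEINTR       10004
#define SK_WSAEWOULDBLOCK 10035

/* The remapping quoted from port/win32.h, using the SK_ stand-ins. */
#undef EAGAIN
#undef EINTR
#define EINTR  SK_WSAEINTR
#define EAGAIN SK_WSAEWOULDBLOCK

/* Hypothetical lookup; the two texts are the ones seen in the reports. */
const char *sk_error_text(int code)
{
    switch (code)
    {
        case SK_WSAEINTR:
            return "A blocking operation was interrupted by a call to WSACancelBlockingCall.";
        case SK_WSAEWOULDBLOCK:
            return "A non-blocking socket operation could not be completed immediately.";
        default:
            return "unrecognized error.";
    }
}

/* Hypothetical formatter imitating the shape of the reported FATAL line. */
int sk_format_fatal(char *buf, size_t n, int semid, int semnum, int code)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "FATAL: semctl(%d, %d, SETVAL, 0) failed: %s",
                    semid, semnum, sk_error_text(code));
}
```

Under these assumptions, any code path that sets `errno = EINTR`, socket-related or not, stores 10004, and the message lookup then reports the Winsock text for it.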
"Bruce Momjian" <pgman@candle.pha.pa.us> wrote
> > In port/win32.h, we have
> >
> >     #undef EAGAIN
> >     #undef EINTR
> >     #define EINTR WSAEINTR
> >     #define EAGAIN WSAEWOULDBLOCK
> >
> > What's the rationale of doing so?
>
> We did this so that our code could refer to EINTR/EAGAIN without
> port-specific tests.
>

AFAICS, by doing so, EINTR/EAGAIN will be translated into WSAEINTR/WSAEWOULDBLOCK throughout *all* the backend code. That seems inappropriate for code that does not involve any socket stuff ... I think we need a fix here.

> > (2) What's happened here?
> > It may come from PGSemaphoreReset(), and win32 semop() looks like this:
> >
> >     ret = WaitForMultipleObjectsEx(2, wh, FALSE,
> >         (sops[0].sem_flg & IPC_NOWAIT) ? 0 : INFINITE, TRUE);
> >     ...
> >     else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
> >     {
> >         pgwin32_dispatch_queued_signals();
> >         errno = EINTR;
> >     }
> >     else if (ret == WAIT_TIMEOUT)
> >         errno = EAGAIN;
> >
> > So it seems the EINTR is caused by an incoming signal, the EAGAIN is caused
> > by a TIMEOUT ... any ideas?
>
> I looked at the documentation for the function:
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitformultipleobjectsex.asp
>
> and it isn't clear what return failure values it has. We certainly
> could loop on WSAEINTR. Can you test it?
>

Yeah, looking at other code that uses semop(), we could plug a loop into the win32 semctl():

      /* Quickly lock/unlock the semaphore (if we can) */
    + do
    + {
    +     errStatus = semop(semId, &sops, 1);
    + } while (errStatus < 0 && errno == EINTR);

      if (semop(semId, &sops, 1) < 0)
          return -1;

But:
(1) The EINTR problem happens rather rarely, so testing it is difficult;
(2) I would rather not make the above change before we understand what's happened here, especially since we have also seen an EAGAIN reported.

Regards,
Qingqing
Qingqing Zhou wrote:
>
> "Bruce Momjian" <pgman@candle.pha.pa.us> wrote
> > > In port/win32.h, we have
> > >
> > >     #undef EAGAIN
> > >     #undef EINTR
> > >     #define EINTR WSAEINTR
> > >     #define EAGAIN WSAEWOULDBLOCK
> > >
> > > What's the rationale of doing so?
> >
> > We did this so that our code could refer to EINTR/EAGAIN without
> > port-specific tests.
> >
>
> AFAICS, by doing so, EINTR/EAGAIN will be translated into
> WSAEINTR/WSAEWOULDBLOCK throughout *all* the backend code. That seems
> inappropriate for code that does not involve any socket stuff ... I think
> we need a fix here.

Uh, how do we handle it now? I thought we did just that.

> > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitformultipleobjectsex.asp
> >
> > and it isn't clear what return failure values it has. We certainly
> > could loop on WSAEINTR. Can you test it?
> >
>
> Yeah, looking at other code that uses semop(), we could plug a loop into
> the win32 semctl():
>
>       /* Quickly lock/unlock the semaphore (if we can) */
>     + do
>     + {
>     +     errStatus = semop(semId, &sops, 1);
>     + } while (errStatus < 0 && errno == EINTR);
>
>       if (semop(semId, &sops, 1) < 0)
>           return -1;
>
> But:
> (1) The EINTR problem happens rather rarely, so testing it is difficult;
> (2) I would rather not make the above change before we understand what's
> happened here, especially since we have also seen an EAGAIN reported.

OK, so how do we find the answer?

--
Bruce Momjian
On Tue, 28 Feb 2006, Bruce Momjian wrote:

> Uh, how do we handle it now? I thought we did just that.
>
> OK, so how do we find the answer?

For both problems, I am uncertain (otherwise I would have sent a patch already :-(). Calling for more artillery support here ...

Regards,
Qingqing
Thread added to TODO.detail for Win32:

    o Check WSACancelBlockingCall() for interrupts (win32intr)

---------------------------------------------------------------------------

Qingqing Zhou wrote:
>
> On Tue, 28 Feb 2006, Bruce Momjian wrote:
>
> > Uh, how do we handle it now? I thought we did just that.
> >
> > OK, so how do we find the answer?
>
> For both problems, I am uncertain (otherwise I would have sent a patch
> already :-(). Calling for more artillery support here ...
>
> Regards,
> Qingqing

--
Bruce Momjian