Thread: FATAL: semctl(1672698088, 12, SETVAL, 0) failed

FATAL: semctl(1672698088, 12, SETVAL, 0) failed

From
"Qingqing Zhou"
Date:
I encountered an error when I fast shutdown 8.1.1 on Win2k:

    FATAL:  semctl(1672698088, 12, SETVAL, 0) failed:  A blocking operation
    was interrupted by a call to WSACancelBlockingCall.

A similar error on 8.1/win2003 was reported on pgsql-general (sorry, I can't
dig out the original post from our web archives):

    From:  Niederland
    Date:  Tues, Dec 13 2005 9:49 am

    2005-12-12 20:30:00 FATAL:  semctl(50884184, 15, SETVAL, 0) failed: A
    non-blocking socket operation could not be completed immediately.

---

There are two problems here:

(1) Why a socket error?
In port/win32.h, we have

#undef EAGAIN
#undef EINTR
#define EINTR WSAEINTR
#define EAGAIN WSAEWOULDBLOCK

What's the rationale of doing so?

(2) What's happened here?
It may come from PGSemaphoreReset(), and win32 semop() looks like this:

  ret = WaitForMultipleObjectsEx(2, wh, FALSE,
          (sops[0].sem_flg & IPC_NOWAIT) ? 0 : INFINITE, TRUE);
  ...
  else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
  {
    pgwin32_dispatch_queued_signals();
    errno = EINTR;
  }
  else if (ret == WAIT_TIMEOUT)
    errno = EAGAIN;

So it seems the EINTR is caused by an incoming signal, the EAGAIN is caused
by a TIMEOUT ... any ideas?
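
To make the mapping concrete, here is a minimal portable sketch of how the two
branches above would line up with the two log messages. It assumes the error
text is looked up from the remapped number; the thread does not show that code
path, so treat it as a guess. All `sim_`/`SIM_` names are hypothetical
stand-ins, not PostgreSQL or Win32 identifiers. 10004 and 10035 are the Winsock
values of WSAEINTR and WSAEWOULDBLOCK, and the message strings are copied from
the two logs.

```c
/* Stand-ins for the Winsock codes that port/win32.h maps EINTR/EAGAIN to. */
#define SIM_WSAEINTR        10004
#define SIM_WSAEWOULDBLOCK  10035

/* Hypothetical outcomes of the wait inside the win32 semop(). */
typedef enum
{
    SIM_WAIT_SEMAPHORE,     /* the semaphore handle was signaled */
    SIM_WAIT_SIGNAL,        /* the signal event fired, or an APC ran */
    SIM_WAIT_TIMEOUT        /* zero timeout (IPC_NOWAIT) and nothing was ready */
} sim_wait_result;

/*
 * Mirrors the branch structure quoted above: returns 0 on success,
 * otherwise -1 with *err set to the remapped error code.
 */
static int
sim_semop_result(sim_wait_result ret, int *err)
{
    *err = 0;
    if (ret == SIM_WAIT_SEMAPHORE)
        return 0;
    if (ret == SIM_WAIT_SIGNAL)
        *err = SIM_WSAEINTR;        /* errno = EINTR in the real code */
    else
        *err = SIM_WSAEWOULDBLOCK;  /* errno = EAGAIN in the real code */
    return -1;
}

/* Message text as it appeared in the two logs. */
static const char *
sim_error_text(int err)
{
    switch (err)
    {
        case SIM_WSAEINTR:
            return "A blocking operation was interrupted by a call to WSACancelBlockingCall.";
        case SIM_WSAEWOULDBLOCK:
            return "A non-blocking socket operation could not be completed immediately.";
        default:
            return "unknown error";
    }
}
```

Under these assumptions a signal arriving during the wait yields the first
message (the 8.1.1/Win2k report) and a timed-out wait yields the second (the
8.1/win2003 report), even though no socket call is involved in either case.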

Regards,
Qingqing

Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed

From
Bruce Momjian
Date:
Qingqing Zhou wrote:
> I encountered an error when I fast shutdown 8.1.1 on Win2k:
>
>     FATAL:  semctl(1672698088, 12, SETVAL, 0) failed:  A blocking operation
>     was interrupted by a call to WSACancelBlockingCall.
>
> A similar error on 8.1/win2003 was reported on pgsql-general (sorry, I can't
> dig out the original post from our web archives):
>
>     From:  Niederland
>     Date:  Tues, Dec 13 2005 9:49 am
>
>     2005-12-12 20:30:00 FATAL:  semctl(50884184, 15, SETVAL, 0) failed: A
>     non-blocking socket operation could not be completed immediately.
>
> ---
>
> There are two problems here:
>
> (1) Why a socket error?
> In port/win32.h, we have
>
> #undef EAGAIN
> #undef EINTR
> #define EINTR WSAEINTR
> #define EAGAIN WSAEWOULDBLOCK
>
> What's the rationale of doing so?

We did this so that our code could refer to EINTR/EAGAIN without
port-specific tests.

> (2) What's happened here?
> It may come from PGSemaphoreReset(), and win32 semop() looks like this:
>
>   ret = WaitForMultipleObjectsEx(2, wh, FALSE,
>           (sops[0].sem_flg & IPC_NOWAIT) ? 0 : INFINITE, TRUE);
>   ...
>   else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
>   {
>     pgwin32_dispatch_queued_signals();
>     errno = EINTR;
>   }
>   else if (ret == WAIT_TIMEOUT)
>     errno = EAGAIN;
>
> So it seems the EINTR is caused by an incoming signal, the EAGAIN is caused
> by a TIMEOUT ... any ideas?

I looked at the documentation for the function:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitformultipleobjectsex.asp

and it isn't clear what return failure values it has.  We certainly
could loop on WSAEINTR.  Can you test it?

--
  Bruce Momjian   http://candle.pha.pa.us
  SRA OSS, Inc.   http://www.sraoss.com

  + If your life is a hard drive, Christ can be your backup. +

Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed

From
"Qingqing Zhou"
Date:
"Bruce Momjian" <pgman@candle.pha.pa.us> wrote
> > In port/win32.h, we have
> >
> > #undef EAGAIN
> > #undef EINTR
> > #define EINTR WSAEINTR
> > #define EAGAIN WSAEWOULDBLOCK
> >
> > What's the rationale of doing so?
>
> We did this so that our code could refer to EINTR/EAGAIN without
> port-specific tests.
>

AFAICS, by doing so, EINTR/EAGAIN will be translated into
WSAEINTR/WSAEWOULDBLOCK throughout *all* the backend code. That seems
inappropriate for code that does not involve any socket operations ... I think
we need a fix here.

> > (2) What's happened here?
> > It may come from PGSemaphoreReset(), and win32 semop() looks like this:
> >
> >   ret = WaitForMultipleObjectsEx(2, wh, FALSE,
> >           (sops[0].sem_flg & IPC_NOWAIT) ? 0 : INFINITE, TRUE);
> >   ...
> >   else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
> >   {
> >     pgwin32_dispatch_queued_signals();
> >     errno = EINTR;
> >   }
> >   else if (ret == WAIT_TIMEOUT)
> >     errno = EAGAIN;
> >
> > So it seems the EINTR is caused by an incoming signal, the EAGAIN is caused
> > by a TIMEOUT ... any ideas?
>
> I looked at the documentation for the function:
>
>     http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitformultipleobjectsex.asp
>
> and it isn't clear what return failure values it has.  We certainly
> could loop on WSAEINTR.  Can you test it?
>

Yeah, looking at other code that uses semop(), we could add a retry loop to
the win32 semctl():

   /* Quickly lock/unlock the semaphore (if we can) */
+ do
+ {
+    errStatus = semop(semId, &sops, 1);
+ } while (errStatus < 0 && errno == EINTR);

-  if (semop(semId, &sops, 1) < 0)
+  if (errStatus < 0)
    return -1;

But:
(1) The EINTR problem happens rather rarely, so testing it is difficult;
(2) I would rather not make the above change before we understand what
happened here, especially since we have also seen an EAGAIN reported.
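
For anyone who wants to experiment with the retry idea away from Windows, here
is a self-contained sketch. `sim_sem`, `sim_semop()` and `sim_semop_retry()`
are hypothetical stand-ins (the real win32 semop() takes different arguments);
the fake simply fails with EINTR a configurable number of times and then
either succeeds or fails with some other errno.

```c
#include <errno.h>

/* Hypothetical test double for a semaphore operation. */
typedef struct
{
    int interrupts_left;    /* how many calls should fail with EINTR */
    int calls;              /* how many times sim_semop() was invoked */
    int final_errno;        /* 0 = eventually succeed, else fail with this */
} sim_sem;

/* Stand-in for semop(): returns 0 on success, -1 with errno set on failure. */
static int
sim_semop(sim_sem *sem)
{
    sem->calls++;
    if (sem->interrupts_left > 0)
    {
        sem->interrupts_left--;
        errno = EINTR;
        return -1;
    }
    if (sem->final_errno != 0)
    {
        errno = sem->final_errno;
        return -1;
    }
    return 0;
}

/* Retry only while the failure is EINTR; any other error is returned as-is. */
static int
sim_semop_retry(sim_sem *sem)
{
    int errStatus;

    do
    {
        errStatus = sim_semop(sem);
    } while (errStatus < 0 && errno == EINTR);

    return errStatus;
}
```

Note that a loop like this only absorbs EINTR. An EAGAIN failure still
propagates to the caller, so it would not by itself explain or hide the second
report.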

Regards,
Qingqing

Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed

From
Bruce Momjian
Date:
Qingqing Zhou wrote:
>
> "Bruce Momjian" <pgman@candle.pha.pa.us> wrote
> > > In port/win32.h, we have
> > >
> > > #undef EAGAIN
> > > #undef EINTR
> > > #define EINTR WSAEINTR
> > > #define EAGAIN WSAEWOULDBLOCK
> > >
> > > What's the rationale of doing so?
> >
> > We did this so that our code could refer to EINTR/EAGAIN without
> > port-specific tests.
> >
>
> AFAICS, by doing so, EINTR/EAGAIN will be translated into
> WSAEINTR/WSAEWOULDBLOCK throughout *all* the backend code. That seems
> inappropriate for code that does not involve any socket operations ... I think
> we need a fix here.

Uh, how do we handle it now?  I thought we did just that.

> > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitformultipleobjectsex.asp
> >
> > and it isn't clear what return failure values it has.  We certainly
> > could loop on WSAEINTR.  Can you test it?
> >
>
> Yeah, looking at other code that uses semop(), we could add a retry loop to
> the win32 semctl():
>
>    /* Quickly lock/unlock the semaphore (if we can) */
> + do
> + {
> +    errStatus = semop(semId, &sops, 1);
> + } while (errStatus < 0 && errno == EINTR);
>
> -  if (semop(semId, &sops, 1) < 0)
> +  if (errStatus < 0)
>     return -1;
>
> But:
> (1) The EINTR problem happens rather rarely, so testing it is difficult;
> (2) I would rather not make the above change before we understand what
> happened here, especially since we have also seen an EAGAIN reported.

OK, so how do we find the answer?


Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed

From
Qingqing Zhou
Date:
On Tue, 28 Feb 2006, Bruce Momjian wrote:

>
> Uh, how do we handle it now?  I thought we did just that.
>
> OK, so how do we find the answer?
>

For both problems, I am uncertain (otherwise I would have sent a patch
already :-( ). Calling for more artillery support here ...

Regards,
Qingqing

Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed

From
Bruce Momjian
Date:
Thread added to TODO.detail for Win32:

        o Check WSACancelBlockingCall() for interrupts (win32intr)

---------------------------------------------------------------------------

Qingqing Zhou wrote:
>
>
> On Tue, 28 Feb 2006, Bruce Momjian wrote:
>
> >
> > Uh, how do we handle it now?  I thought we did just that.
> >
> > OK, so how do we find the answer?
> >
>
> For both problems, I am uncertain (otherwise I would have sent a patch
> already :-( ). Calling for more artillery support here ...
>
> Regards,
> Qingqing
