Thread: unexpected SIGALRM

unexpected SIGALRM

From
"Hiroshi Inoue"
Date:
Hi all,

I've been examining a cygwin hangup issue for quite a long time.
There seem to be multiple causes, though it's hard for me
to identify them all.

Anyway, I found some cases of unexpected SIGALRM.
They may be caused by a cygwin bug, but wouldn't it be safer to
return immediately from HandleDeadLock on any platform
unless the backend is waiting for a lock?

The following is a backtrace I was lucky enough to capture.

#0  0x77f827e8 in _libkernel32_a_iname ()
#1  0x77e56a15 in _libkernel32_a_iname ()
#2  0x77e56a3d in _libkernel32_a_iname ()
#3  0x00587535 in semop ()
#4  0x005086b0 in IpcSemaphoreLock (semId=2688, sem=1, interruptOK=0 '\000')
    at ipc.c:422
#5  0x005114f7 in LWLockAcquire (lockid=LockMgrLock, mode=LW_EXCLUSIVE)
    at lwlock.c:272
#6  0x005101ba in HandleDeadLock (postgres_signal_arg=14) at proc.c:862
#7  0x6100fb63 in _libkernel32_a_iname ()
#8  0x0058650d in do_semop ()
#9  0x00587535 in semop ()
#10 0x005086b0 in IpcSemaphoreLock (semId=2688, sem=1, interruptOK=0 '\000')
    at ipc.c:422
#11 0x005114f7 in LWLockAcquire (lockid=LockMgrLock, mode=LW_EXCLUSIVE)
    at lwlock.c:272
#12 0x0050e60b in LockRelease (lockmethod=1, locktag=0x22f338, xid=17826,
    lockmode=1) at lock.c:1018
#13 0x0050c8f5 in UnlockRelation (relation=0xa06c168, lockmode=1)
    at lmgr.c:217
#14 0x0041d29e in index_endscan (scan=0xa08a8f8) at indexam.c:288
#15 0x004ae558 in ExecCloseR (node=0xa089a88) at execAmi.c:232
#16 0x004b8de2 in ExecEndIndexScan (node=0xa089a88)
    at nodeIndexscan.c:474
#17 0x004b1e41 in ExecEndNode (node=0xa089a88, parent=0x0)
    at execProcnode.c:495
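
In the trace, HandleDeadLock (frame #6) runs while the same backend is already inside LWLockAcquire (frame #11) and then tries to take LockMgrLock again. The guard suggested above could look roughly like the following. This is only a minimal sketch, not the actual proc.c code: the flag waiting_for_lock, the counter deadlock_checks_run and the function names are hypothetical, and the real handler would run the deadlock check where the counter is incremented.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag: nonzero only while the backend sleeps waiting for a lock. */
static volatile sig_atomic_t waiting_for_lock = 0;

/* Hypothetical counter standing in for "the deadlock check was run". */
static volatile sig_atomic_t deadlock_checks_run = 0;

/* Stand-in for HandleDeadLock: ignore SIGALRM unless a lock wait is in progress. */
static void handle_deadlock_sketch(int signo)
{
    (void) signo;

    if (!waiting_for_lock)
        return;                 /* unexpected alarm: do not touch any lock */

    deadlock_checks_run++;      /* the real handler would check for deadlock here */
}

/* Install the sketch handler for SIGALRM; returns 0 on success, -1 on failure. */
static int install_alarm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_deadlock_sketch;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}
```

The flag would be set just before the backend goes to sleep on its semaphore and cleared right after it wakes up, so the test itself costs no kernel call.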

regards,
Hiroshi Inoue

Re: unexpected SIGALRM

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> Anyway, I found some cases of unexpected SIGALRM.
> They may be caused by a cygwin bug, but wouldn't it be safer to
> return immediately from HandleDeadLock on any platform
> unless the backend is waiting for a lock?

If we can't rely on the signal handling facilities to interrupt only
when they're supposed to, I think HandleDeadLock is the least of our
worries :-(.  I'm not excited about inserting an ad-hoc test to work
around (only) one manifestation of a system-level bug.

            regards, tom lane

Re: unexpected SIGALRM

From
Hiroshi Inoue
Date:
Tom Lane wrote:
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > Anyway, I found some cases of unexpected SIGALRM.
> > They may be caused by a cygwin bug, but wouldn't it be safer to
> > return immediately from HandleDeadLock on any platform
> > unless the backend is waiting for a lock?
>
> If we can't rely on the signal handling facilities to interrupt only
> when they're supposed to, I think HandleDeadLock is the least of our
> worries :-(.

I'm not sure it's a cygwin issue.
Isn't it preferable for a DBMS to be insensitive to
other (e.g. the OS's) bugs anyway?
Or how about blocking SIGALRM except when
the backend is waiting for a lock? That seems a better
fix because it would also fix another issue.

>  I'm not excited about inserting an ad-hoc test to work
> around (only) one manifestation of a system-level bug.

OK, so cygwin isn't considered a supported platform?

regards,
Hiroshi Inoue

Re: unexpected SIGALRM

From
Tom Lane
Date:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Tom Lane wrote:
>> I'm not excited about inserting an ad-hoc test to work
>> around (only) one manifestation of a system-level bug.

> OK, so cygwin isn't considered a supported platform?

I don't consider it our responsibility to work around cygwin bugs,
as opposed to reporting said bugs and expecting the cygwin folk to
fix 'em.

If the cost of such a workaround is minimal, then I'd be willing to
consider it; but in this case, you're talking about adding another pair
of kernel calls to every lock blockage.  That seems nontrivial.
But the more important argument is this: if cygwin contains a bug that
allows it to fire interrupts when it should not, how much improvement
do we really get from plugging this one hole?  Surely there are other
places that will have similar problems.  For that matter, how can you
be sure that adding a sigsetmask call will prevent it from firing the
interrupt --- how is that any more secure than setitimer?
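
For reference, the extra pair of kernel calls would look roughly like this. It is a minimal sketch under stated assumptions: it uses POSIX sigprocmask rather than the older sigsetmask, and wait_for_lock_sketch and demo_wait are hypothetical stand-ins for the place where a backend sleeps on its semaphore. It shows only the cost; it does not show that the mask is any more trustworthy than setitimer on a platform that delivers signals when it should not.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Change whether SIGALRM is blocked; how is SIG_BLOCK or SIG_UNBLOCK. */
static int set_sigalrm(int how)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    return sigprocmask(how, &set, NULL);
}

static int block_sigalrm(void)   { return set_sigalrm(SIG_BLOCK); }
static int unblock_sigalrm(void) { return set_sigalrm(SIG_UNBLOCK); }

/* Returns 1 if SIGALRM is currently blocked, 0 if not, -1 on error. */
static int sigalrm_is_blocked(void)
{
    sigset_t cur;

    if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)   /* NULL set: query only */
        return -1;
    return sigismember(&cur, SIGALRM);
}

/*
 * Hypothetical wrapper around a lock wait: SIGALRM is deliverable only
 * while sleep_fn runs. Every wait pays for two extra sigprocmask calls.
 */
static int wait_for_lock_sketch(int (*sleep_fn)(void))
{
    int rc;

    if (unblock_sigalrm() != 0)     /* extra kernel call 1 */
        return -1;
    rc = sleep_fn();                /* the real code would sleep on a semaphore */
    if (block_sigalrm() != 0)       /* extra kernel call 2 */
        return -1;
    return rc;
}

/* Stand-in for the sleep: returns 0 if SIGALRM is deliverable during the wait. */
static int demo_wait(void)
{
    return sigalrm_is_blocked() == 0 ? 0 : 1;
}
```

A backend would call block_sigalrm() once at startup and route every lock wait through the wrapper.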

I'd say the correct course of action is to report the problem to the
cygwin people first, and ask them whether a user-level workaround is
possible/useful.

            regards, tom lane

Re: unexpected SIGALRM

From
Hiroshi Inoue
Date:
Tom Lane wrote:
>
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > Tom Lane wrote:
> >> I'm not excited about inserting an ad-hoc test to work
> >> around (only) one manifestation of a system-level bug.
>
> > OK, so cygwin isn't considered a supported platform?
>
> I don't consider it our responsibility to work around cygwin bugs,
> as opposed to reporting said bugs and expecting the cygwin folk to
> fix 'em.

OK, I'll leave it as it is. I've already spent a lot of time on this.
It has been 3 months since the pgbench hangup problem was
reported by Yutaka Tanida.

regards,
Hiroshi Inoue