Thread: signal weirdness

signal weirdness

From

Peter Galbavy

Date:

29 March 1999, 08:03:55

Hi folks,

I am trying to get a version (any version !) of PostgreSQL running
on OpenBSD 2.4 upwards, but I am getting some weird behaviour with
the SIGQUIT in elog():

from kdump:

21633 postgres RET   write 56/0x38
21633 postgres CALL  sendto(0x4,0x125640,0x3a,0,0,0)
21633 postgres GIO   fd 4 wrote 58 bytes     "EERROR:  destroydb: database 'regression' does not exist      \0"
21633 postgres RET   sendto 58/0x3a
21633 postgres CALL  kill(0x5481,0x3)
21633 postgres RET   kill -1 errno 1 Operation not permitted
21633 postgres CALL  sigprocmask(0x1,0)
21633 postgres RET   sigprocmask 0
21633 postgres CALL  sigsuspend(0)

For those who cannot read hex, FYI 0x5481 == 21633. What this basically
means is that the process is getting permission denied when sending a
signal to itself. Hmm. This is with a snapshot from a few days ago, but
the results are identical for 6.4.2 as well.

Oh, the result - the regression tests hang, waiting for the postgres
process to receive the SIGQUIT that aborts the transaction, which never
comes.

The OpenBSD folks don't seem to be bothered. I have RTFM, APUE and 
kern_sig.c in OpenBSD - no joy. Any ideas anyone ? Seen this before ?

Is it some bizarre interaction of sigprocmask() or whatever ?

Regards,
-- 
Peter Galbavy
Knowledge Matters Ltd
http://www.knowledge.com/ http://www.wonderland.org/ http://www.literature.org/

Re: [HACKERS] signal weirdness

From

Peter Galbavy

Date:

29 March 1999, 09:17:14

On Mon, Mar 29, 1999 at 11:03:49AM +0100, Peter Galbavy wrote:
> Oh, the result - the regression tests hang, waiting for the postgres
> process to receive the SIGQUIT that aborts the transaction, which never
> comes.

My quick fix has been to replace the kill() with a direct call
to siglongjmp() - not sure if this is safe, but it works. I
am being a pragmatist today.

I will try to get more debugging done on the signal problem at
some stage, but I seem to have a "working" solution.

Is there any reason not to replace the kill() with a longjmp()
in the long run (no pun intended)?

Regards,
-- 
Peter Galbavy
Knowledge Matters Ltd
http://www.knowledge.com/ http://www.wonderland.org/ http://www.literature.org/

Re: [HACKERS] signal weirdness

From

Tom Lane

Date:

29 March 1999, 13:43:56

Peter Galbavy <Peter.Galbavy@knowledge.com> writes:
> Is there any reason not to replace the kill() with a longjmp()

I always wondered why elog uses such a bizarre approach to transferring
control back to the main loop, myself.  kill() self -> SIGQUIT signal
catcher -> longjmp -> main loop.
Seems to me two of these steps could be eliminated ;-)

So far there hasn't been a reason to touch the code (if it ain't broke
don't fix it) ... but if it is broken on at least one platform, the
situation is different.

I'd say OpenBSD is definitely broken, however.  A process should be
allowed to signal itself.  File a bug report...
        regards, tom lane