Thread: signal weirdness
Hi folks,

I am trying to get a version (any version!) of PostgreSQL running on OpenBSD 2.4 upwards, but I am getting some weird behaviour with the SIGQUIT in elog().

From kdump:

 21633 postgres RET write 56/0x38
 21633 postgres CALL sendto(0x4,0x125640,0x3a,0,0,0)
 21633 postgres GIO fd 4 wrote 58 bytes
       "EERROR: destroydb: database 'regression' does not exist \0"
 21633 postgres RET sendto 58/0x3a
 21633 postgres CALL kill(0x5481,0x3)
 21633 postgres RET kill -1 errno 1 Operation not permitted
 21633 postgres CALL sigprocmask(0x1,0)
 21633 postgres RET sigprocmask 0
 21633 postgres CALL sigsuspend(0)

For those who cannot read hex, FYI 0x5481 == 21633. What this basically means is that the process is getting permission denied when sending a signal to itself. Hmm.

This is with a snapshot from a few days ago, but the results are identical for 6.4.2 as well.

Oh, the result - the regression tests hang waiting for the postgres process to receive the SIGQUIT to abort transaction that never comes.

The OpenBSD folks don't seem to be bothered. I have RTFM, APUE and kern_sig.c in OpenBSD - no joy.

Any ideas, anyone? Seen this before? Is it some bizarre interaction of sigprocmask() or whatever?

Regards,
-- 
Peter Galbavy
Knowledge Matters Ltd
http://www.knowledge.com/
http://www.wonderland.org/
http://www.literature.org/
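The failing step in the trace (a process calling kill() on its own pid with SIGQUIT) can be checked outside PostgreSQL with a small standalone test. The following is only a sketch, not code from the PostgreSQL tree: the names self_signal_check and on_sigquit are made up for illustration, and it assumes a single-threaded POSIX process, where an unblocked signal sent to self is delivered before kill() returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t got_sigquit = 0;

/* Hypothetical handler name: it only records that SIGQUIT arrived. */
static void on_sigquit(int signo)
{
    (void) signo;
    got_sigquit = 1;
}

/*
 * Hypothetical helper: send SIGQUIT to ourselves.
 * Returns 0 if kill() succeeded and the handler ran,
 * the errno value if kill() failed (1, EPERM, in the trace above),
 * or -1 if kill() succeeded but the handler did not run.
 */
int self_signal_check(void)
{
    struct sigaction sa;
    sigset_t set;
    int rc;

    /* Install the handler for SIGQUIT. */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigquit;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGQUIT, &sa, NULL);
    assert(rc == 0);

    /* Make sure SIGQUIT is not blocked, so it is delivered immediately. */
    sigemptyset(&set);
    sigaddset(&set, SIGQUIT);
    rc = sigprocmask(SIG_UNBLOCK, &set, NULL);
    assert(rc == 0);
    (void) rc;

    got_sigquit = 0;
    if (kill(getpid(), SIGQUIT) == -1) {
        int saved = errno;
        fprintf(stderr, "kill(self, SIGQUIT) failed: %s\n", strerror(saved));
        return saved;
    }
    return got_sigquit ? 0 : -1;
}
```

Calling self_signal_check() from a trivial main() and printing its return value should be enough to tell whether a given system lets a plain process signal itself; it does not say anything about why the postgres backend in particular is refused.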
On Mon, Mar 29, 1999 at 11:03:49AM +0100, Peter Galbavy wrote:
> Oh, the result - the regression tests hang waiting for the postgres
> process to receive the SIGQUIT to abort transaction that never
> comes.

My quick fix has been to replace the kill() with a direct call to siglongjmp() - not sure if this is safe, but it works. I am being a pragmatist today.

I will try to get more debugging done on the signal problem at some stage, but I seem to have a "working" solution.

Is there any reason not to replace the kill() with a longjmp() in the long run - no pun intended?

Regards,
-- 
Peter Galbavy
Knowledge Matters Ltd
http://www.knowledge.com/
http://www.wonderland.org/
http://www.literature.org/
Peter Galbavy <Peter.Galbavy@knowledge.com> writes:
> Is there any reason not to replace the kill() with a longjmp()

I always wondered why elog uses such a bizarre approach to transferring control back to the main loop, myself. kill() self -> SIGQUIT signal catcher -> longjmp -> main loop. Seems to me two of these steps could be eliminated ;-)

So far there hasn't been a reason to touch the code (if it ain't broke don't fix it) ... but if it is broken on at least one platform, the situation is different.

I'd say OpenBSD is definitely broken, however. A process should be allowed to signal itself. File a bug report...

regards, tom lane
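To make the two control paths discussed above concrete, here is a rough sketch of both: the long way round (kill() self -> SIGQUIT catcher -> siglongjmp -> main loop) and the direct siglongjmp(). This is not the actual elog() or backend code; main_loop_env, quit_handler, abort_via_signal, abort_direct and run_once are all hypothetical names, and the sketch assumes a single-threaded POSIX process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical jump target standing in for the "main loop". */
static sigjmp_buf main_loop_env;

/* SIGQUIT catcher: jumps back to the main loop with value 1. */
static void quit_handler(int signo)
{
    (void) signo;
    siglongjmp(main_loop_env, 1);
}

/* Path A: kill() self -> signal catcher -> siglongjmp -> main loop. */
static void abort_via_signal(void)
{
    kill(getpid(), SIGQUIT);
    /* Reached only if kill() failed, as in the kdump trace. */
}

/* Path B: jump straight back to the main loop with value 2. */
static void abort_direct(void)
{
    siglongjmp(main_loop_env, 2);
}

/*
 * Returns 1 if control came back through the signal handler,
 * 2 if it came back through the direct jump,
 * 0 if neither jump happened (for example, kill() was refused).
 */
int run_once(int use_signal)
{
    struct sigaction sa;
    sigset_t set;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = quit_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGQUIT, &sa, NULL);
    assert(rc == 0);

    sigemptyset(&set);
    sigaddset(&set, SIGQUIT);
    rc = sigprocmask(SIG_UNBLOCK, &set, NULL);
    assert(rc == 0);
    (void) rc;

    /*
     * savemask = 1: the signal mask is saved here and restored by
     * siglongjmp(), so SIGQUIT is not left blocked after jumping
     * out of the handler.
     */
    switch (sigsetjmp(main_loop_env, 1)) {
    case 0:
        break;          /* first pass: fall through and "raise the error" */
    case 1:
        return 1;       /* back in the main loop via the signal handler */
    default:
        return 2;       /* back in the main loop via the direct jump */
    }

    if (use_signal)
        abort_via_signal();
    else
        abort_direct();
    return 0;
}
```

On a system where a process may signal itself, run_once(1) and run_once(0) both end up back at the sigsetjmp() point; the direct path simply does not depend on kill() succeeding. Whether the real code relies on anything else the signal path does (masking, for instance) is not something this sketch can answer.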