Thread: Is *that* why debugging backend startup is so hard!?

Is *that* why debugging backend startup is so hard!?

From
Tom Lane
Date:
I just spent a rather frustrating hour trying to debug a backend startup
failure --- and getting nowhere because I couldn't catch the failure in
a debugger, or even step to where I thought it might be.  I've seen this
sort of difficulty before, and always had to resort to expedients like
putting in printf's.  But tonight I finally realized what the problem is.

The early stages of startup are run under signal mask BlockSig, which we
initialize to include *EVERY SIGNAL* (except SIGUSR1 for some reason).
In particular SIGTRAP is blocked, which prevents debugger breakpoints
from working.  Even sillier, normally-fatal signals like SIGSEGV are
blocked.  I now know by observation that HPUX, at least, takes this
literally: for example, if you've blocked SEGV you don't hear about bus
errors, you just keep going.  Possibly rather slowly, if every attempted
instruction execution causes the hardware to fault to the kernel, but
by golly the system will keep trying to run your code.

Needless to say I find this braindead in the extreme.  Will anyone
object if I change the signal masks so that we never ever block
SIGABRT, SIGILL, SIGSEGV, SIGBUS, SIGTRAP, SIGCONT, SIGSYS?  Any
other candidates?  Are there any systems that do not define all
of these signal names?

BTW, once I turned this silliness off, I was able to home in on
my bug within minutes...
        regards, tom lane

PS: The postmaster spends most of its time running under BlockSig too.
Good thing we haven't had many postmaster bugs lately.


Re: Is *that* why debugging backend startup is so hard!?

From
Bruce Momjian
Date:
> Needless to say I find this braindead in the extreme.  Will anyone
> object if I change the signal masks so that we never ever block
> SIGABRT, SIGILL, SIGSEGV, SIGBUS, SIGTRAP, SIGCONT, SIGSYS?  Any
> other candidates?  Are there any systems that do not define all
> of these signal names?
> 
> BTW, once I turned this silliness off, I was able to home in on
> my bug within minutes...

Go ahead.  The current setup sounds very broken.  Why do they even bother
doing all this?  Seems we should identify the signals we want to block,
and just block those.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Is *that* why debugging backend startup is so hard!?

From
Giles Lean
Date:
> Needless to say I find this braindead in the extreme.

Wow, definitely braindead.  Trapping some of them on systems that can
programmatically generate a stack backtrace might be useful -- it
would help reporting what happened.

Blocking them and continuing seems about the most dangerous thing that
could be done; if we've just got SIGSEGV or similar, the code is
confused and isn't to be trusted to safely modify data!

> Will anyone object if I change the signal masks so that we never
> ever block SIGABRT, SIGILL, SIGSEGV, SIGBUS, SIGTRAP, SIGCONT,
> SIGSYS?  Any other candidates?  Are there any systems that do not
> define all of these signal names?

I'd expect these everywhere; certainly they're all defined in the
"Single Unix Specification, version 2".  Some of them don't exist in
ANSI C, if that matters.

Usually it's easy enough to wrap code that cares in

#ifdef SIGABRT
...
#endif

so when/if a platform shows up that lacks one or more it's easy to
fix.

Potential additions to your list:

SIGFPE
SIGSTOP (can't be blocked)

Regards,

Giles



Re: Is *that* why debugging backend startup is so hard!?

From
Peter Eisentraut
Date:
Tom Lane writes:

> I just spent a rather frustrating hour trying to debug a backend startup
> failure --- and getting nowhere because I couldn't catch the failure in
> a debugger, or even step to where I thought it might be.  I've seen this
> sort of difficulty before, and always had to resort to expedients like
> putting in printf's.  But tonight I finally realized what the problem is.

Could that be contributing to the Heisenbug I described on Sunday in "Pid
file magically disappears"?


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: Is *that* why debugging backend startup is so hard!?

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> I just spent a rather frustrating hour trying to debug a backend startup
>> failure --- and getting nowhere because I couldn't catch the failure in
>> a debugger, or even step to where I thought it might be.  I've seen this
>> sort of difficulty before, and always had to resort to expedients like
>> putting in printf's.  But tonight I finally realized what the problem is.

> Could that be contributing to the Heisenbug I described on Sunday in "Pid
> file magically disappears"?

Hm.  Maybe.  I haven't tried to reproduce the pid-file issue here
(I'm up to my eyebrows in memmgr at the moment).  But the blocking
of SEGV and friends could certainly lead to some odd behavior, due
to code plowing on after getting an error that should have crashed it.

Depending on how robust your local implementation of abort(3) is,
it's even possible that the code would fall through a failed
Assert() test...
        regards, tom lane