On Sat, Jan 16, 2021 at 3:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd argue that backtraces for those processes aren't really essential,
> and indeed that trying to make the syslogger report its own backtrace
> is damn dangerous.
I agree. Ideally I'd like to be able to use the same mechanism
everywhere and include those processes too, but surely regular
backends and parallel workers are going to be the things that come up
most often.
> (Personally, I think this whole patch fails the safety-vs-usefulness
> tradeoff, but I expect I'll get shouted down.)
You and I are frequently on opposite sides of these kinds of
questions, but I think this is a closer call than many cases. I'm
convinced that it's useful, but I'm not sure whether it's safe. On the
usefulness side, backtraces are often the only way to troubleshoot
problems that occur on production systems. I wish we had better
logging and tracing tools instead of having to ask for this sort of
thing, but we don't.
Agreed.
In theory we should be able to do this sort of thing using external trace and diagnostic tools like perf, systemtap, etc. In practice, these tools tend to be quite version-sensitive and hard to get right without multiple rounds of back-and-forth to deal with specifics of the site's setup, installed debuginfo or lack thereof, specific tool versions, etc.
It's quite common to have to fall back on attaching gdb with a breakpoint on a function in the export symbol table (so it works without debuginfo), saving a core, and then analysing the core on a separate system on which debuginfo is available for all the loaded modules. It's a major pain.
The ability to get a basic backtrace from within PostgreSQL is strongly desirable. IIRC gdb's basic unwinder works without external debuginfo, if not especially well. libunwind produces much better results, but that didn't pass the extra-dependency bar when backtracing support was introduced to core PostgreSQL.
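To make "a basic backtrace from within the process" concrete, here is a minimal sketch, assuming a platform that provides backtrace() and backtrace_symbols_fd() in <execinfo.h> (glibc does; other libcs may not). dump_backtrace() and MAX_FRAMES are hypothetical names used only for illustration, not anything in the PostgreSQL tree, and this is not meant to show how the patch under discussion works.

```c
#include <assert.h>
#include <execinfo.h>	/* backtrace(), backtrace_symbols_fd(); assumed available */
#include <unistd.h>		/* STDERR_FILENO */

#define MAX_FRAMES 64	/* hypothetical cap on captured frames */

/*
 * Hypothetical helper: capture the current call stack and write one line
 * per frame to the given file descriptor.  Returns the number of frames
 * captured.
 *
 * backtrace_symbols_fd() is used rather than backtrace_symbols() because
 * it writes straight to the descriptor and does not return a malloc'd
 * array that the caller has to free.
 */
int
dump_backtrace(int fd)
{
	void	   *frames[MAX_FRAMES];
	int			nframes;

	assert(fd >= 0);

	/* Fill frames[] with return addresses, innermost frame first. */
	nframes = backtrace(frames, MAX_FRAMES);

	/* Translate addresses to text as far as the dynamic symbol table allows. */
	backtrace_symbols_fd(frames, nframes, fd);

	return nframes;
}
```

Calling dump_backtrace(STDERR_FILENO) prints whatever the dynamic symbol table can resolve. Without debuginfo, static functions generally show up as bare addresses, and function names are typically only available for exported symbols (for the main executable that usually means linking with -rdynamic), which is the same limitation as the gdb-without-debuginfo case above.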
On a side note, to help get better diagnostics I've also been meaning to look into building --enable-debug with -ggdb3 so we can embed macro info, and using dwz to deduplicate and compress the debuginfo so we can encourage people to install it by default on production systems. I also want to start exporting pointers to all the important data symbols for diagnostic use, even if we do so in a separate ELF section just for debug use.