Re: Printing backtrace of postgres processes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Printing backtrace of postgres processes
Date
Msg-id CA+TgmobpOhhm0poFsShjoM=sOxkbzc78f2TdNyZpGz4GSJHh6g@mail.gmail.com
In response to Re: Printing backtrace of postgres processes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Printing backtrace of postgres processes  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Printing backtrace of postgres processes  (Craig Ringer <craig.ringer@enterprisedb.com>)
List pgsql-hackers
On Sat, Jan 16, 2021 at 3:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd argue that backtraces for those processes aren't really essential,
> and indeed that trying to make the syslogger report its own backtrace
> is damn dangerous.

I agree. Ideally I'd like to be able to use the same mechanism
everywhere and include those processes too, but surely regular
backends and parallel workers are going to be the things that come up
most often.

> (Personally, I think this whole patch fails the safety-vs-usefulness
> tradeoff, but I expect I'll get shouted down.)

You and I are frequently on opposite sides of these kinds of
questions, but I think this is a closer call than many cases. I'm
convinced that it's useful, but I'm not sure whether it's safe. On the
usefulness side, backtraces are often the only way to troubleshoot
problems that occur on production systems. I wish we had better
logging and tracing tools instead of having to ask for this sort of
thing, but we don't. EDB support today frequently asks customers to
attach gdb and take a backtrace that way, and that has risks which
this implementation does not: for example, suppose you were unlucky
enough to attach during a spinlock-protected critical section, and
suppose you didn't continue the stopped process before the 60-second
timeout expired and some other process caused a PANIC. Even if this
implementation were to end up emitting a backtrace with a spinlock
held, it would remove the risk of leaving the process stopped while
holding a critical lock, and would in that sense be safer. However, as
soon as you make something like this accessible via an SQL-callable
function, some people are going to start spamming it. And, as soon as
they do that, any risks inherent in the implementation are multiplied.
If it carries a 0.01% chance of crashing the system, we'll have
people taking production systems down with this all the time. At that
point I wouldn't want the feature, even if the gdb approach had the
same risk (which I don't think it does).
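
To put rough numbers on how repeated calls multiply a small per-call risk, here is a minimal sketch. It assumes each call independently carries the same crash probability, which is a simplification for illustration and not something measured for this patch; the function name crash_probability is hypothetical and the 0.01% figure is simply taken literally from the paragraph above.

```python
def crash_probability(per_call_risk: float, calls: int) -> float:
    """Chance of at least one crash across `calls` invocations.

    Assumes every call is independent and carries the same risk,
    which is an illustrative simplification.
    """
    return 1.0 - (1.0 - per_call_risk) ** calls


if __name__ == "__main__":
    per_call_risk = 0.0001  # 0.01%, the hypothetical figure from the message
    for calls in (1, 100, 10_000, 100_000):
        chance = crash_probability(per_call_risk, calls)
        print(f"{calls:>7} calls: {chance:.2%} chance of at least one crash")
```

Under these assumptions, a risk that looks negligible for a single call becomes more likely than not once the function has been invoked around ten thousand times across an installed base.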

What do you see as the main safety risks here?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


