Re: Printing backtrace of postgres processes - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Printing backtrace of postgres processes
Date
Msg-id CALj2ACVzvdjbXsUw+_pjSAoNkUWWhnmt2kCJCYDRfpOV-P0Ccw@mail.gmail.com
Whole thread Raw
In response to Re: Printing backtrace of postgres processes  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Printing backtrace of postgres processes
List pgsql-hackers
On Fri, Nov 11, 2022 at 6:04 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Nov 11, 2022 at 7:59 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > At Thu, 10 Nov 2022 15:56:35 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> > > On Mon, Apr 18, 2022 at 9:10 AM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > The attached v21 patch has the changes for the same.
> > >
> > > Thanks for the patch. Here are some comments:
> > >
> > > 1. I think there's a fundamental problem with this patch, that is, it
> > > prints the backtrace when the interrupt is processed but not when
> > > interrupt is received. This way, the backends/aux processes will
> >
> > Yeah, but the obstacle was backtrace(3) itself. Andres pointed [1]
> > that that may be doable with some care (and I agree to the opinion).
> > AFAIS no discussions followed and things have been to the current
> > shape since then.
> >
> >
> > [1] https://www.postgresql.org/message-id/20201201032649.aekv5b5dicvmovf4%40alap3.anarazel.de
> > | > Surely this is *utterly* unsafe.  You can't do that sort of stuff in
> > | > a signal handler.
> > |
> > | That's of course true for the current implementation - but I don't think
> > | it's a fundamental constraint. With a bit of care backtrace() and
> > | backtrace_symbols() itself can be signal safe:
> >
> > man 3 backtrace
> > >  *  backtrace()  and  backtrace_symbols_fd() don't call malloc() explic‐
> > >     itly, but they are part of libgcc,  which  gets  loaded  dynamically
> > >     when  first  used.   Dynamic loading usually triggers a call to mal‐
> > >     loc(3).  If you need certain calls to these  two  functions  to  not
> > >     allocate  memory (in signal handlers, for example), you need to make
> > >     sure libgcc is loaded beforehand.
>
> I missed that part. Thanks for pointing it out. The
> backtrace_symbols() seems to be returning a malloc'ed array [1],
> meaning it can't be used in signal handlers (if used, it can cause
> deadlocks as per [2]) and existing set_backtrace() is using it.
> Therefore, we need to either change set_backtrace() to use
> backtrace_symbols_fd() instead of backtrace_symobls() or introduce
> another function for the purpose of this feature. If done that, then
> we can think of preloading of libgcc which makes backtrace(),
> backtrace_symobols_fd() safe to use in signal handlers.
>
> Looks like we're not loading libgcc explicitly now into any of
> postgres processes, please correct me if I'm wrong here. If we're not
> loading it right now, is it acceptable to load libgcc into every
> postgres process for the sake of this feature?
>
> [1] https://linux.die.net/man/3/backtrace_symbols
> [2] https://stackoverflow.com/questions/40049751/malloc-inside-linux-signal-handler-cause-deadlock

I took a stab at it after discussing with Vignesh off-list. Here's
what I've done:

1. Make backtrace functions signal safe i.e. ensure libgcc is loaded,
just in case if it's not, by calling backtrace() function during the
early life of a process after SIGUSR1 signal handler and other signal
handlers are established.
2. Emit backtrace to stderr within the signal handler itself to keep
things simple so that we don't need to allocate dynamic memory inside
the signal handler.
3. Split the patch into 3 for ease of review, 0001 does preparatory
stuff, 0002 adds the function, 0003 adds the docs and tests.

I'm attaching the v23 patch set for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Strange failure on mamba
Next
From: Tom Lane
Date:
Subject: Re: Strange failure on mamba