Re: PATCH: backtraces for error messages - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: PATCH: backtraces for error messages
Msg-id CAMsr+YGdJTGygmGhRhfn+8DLi6o+Teq+tcA-Dr3kK+8vYqwzCA@mail.gmail.com
In response to Re: PATCH: backtraces for error messages  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On 25 June 2018 at 14:21, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hi.

At Mon, 25 Jun 2018 09:32:36 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in <CAMsr+YGBw9tgKRGxyihVeMzmjQx_2t8D17tE7t5-0gMdW7S6UA@mail.gmail.com>
> On 21 June 2018 at 19:09, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
> wrote:
>
> > I think this for assertion failure is no problem but I'm not sure
> > for other cases.
>
> I think it's pretty strongly desirable for PANIC.

Ah, I forgot about that. I agree. The cost of collecting the
information is not a problem on PANIC. However, I still don't
think a stack trace is valuable on all PANIC messages. I can accept
a GUC to control it, but it would be preferable for this to work
without such extra setup.

Places such as?
 

> > We could set proper context description or other
> > additional information in error messages before just dumping a
> > trace for known cases.
> >
>
> Yeah. The trouble there is that there are a _lot_ of places to touch for
> such things, and inevitably the one you really want to see will be
> something that didn't get suitably annotated.

Agreed, that is the reality.  Instead, can't we make new error
classes PANIC_STACKDUMP and ERROR_STACKDUMP to explicitly
restrict stack dumps for elog()?  Or elog_stackdump() alongside
elog() is also fine with me. Using it is easier than proper
annotation. It would be perfect if we could invent an automated
way, but I don't think that is realistic.

That needlessly complicates error severity levels with information not really related to the severity. -1 from me.
 
Mmm. If I understand you correctly, I mean that perf doesn't dump
a backtrace at a probe point, but tracepoints are usable for taking
a symbolic backtrace. (I'm sorry that I cannot provide an example,
since stap doesn't work on my box..)

perf record --call-graph dwarf -e sdt_postgresql:checkpoint__start -u postgres 
perf report -g
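
(For anyone trying this: the sdt_postgresql:* probes only exist when the
server was built with --enable-dtrace, and perf usually needs the binary
registered in its build-id cache before it will list them. The path below
is illustrative:

perf buildid-cache --add /usr/pgsql-10/bin/postgres
perf list 'sdt_postgresql:*'
)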
 
If your intention is to take backtraces without any setting (which I
think is preferable), it should be restricted to the required
points. That can be achieved with the additional error classes or
substitute error output functions.

Who's classifying all the possible points?

Which PANICs or assertion failures do you want to exempt?

I definitely do not want to emit stacks for everything, like my patch currently does. It's just a proof of concept. Later on I'll want fine-grained runtime control over when that happens, but that's out of scope for this. For now the goal is to emit stacks at times when it's obviously sensible to have a stack, and to do it in a way that doesn't require per-error-site maintenance/changes or create backport hassle.
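
To be concrete about the shape of it, something roughly like the following (a minimal sketch only, using glibc's backtrace(3); the actual mechanism in the patch may differ), called from errfinish() for elevel >= PANIC and from ExceptionalCondition(), would cover those cases without touching individual ereport()/elog() call sites:

#include <execinfo.h>
#include <unistd.h>

/*
 * Sketch only: dump the current call stack to stderr.
 * backtrace_symbols_fd() avoids malloc(), which matters when we are
 * already in a PANIC or assertion-failure path.  Error recursion and
 * signal-safety details are glossed over here.
 */
static void
emit_stack_trace(void)
{
    void   *frames[64];
    int     nframes = backtrace(frames, 64);

    backtrace_symbols_fd(frames, nframes, STDERR_FILENO);
}

Whether the output ends up on stderr, in the csvlog, or attached as errdetail is a separate question; the point is that it's one hook, not per-call-site annotation.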
 
Just an idea, but can't we use a definition file in which the
LOCATIONs of error messages that need to dump a backtrace are
listed? That list would usually be empty and should be very short
if not. The LOCATION information is easily obtained from a verbose
error message itself once it has been shown, but is a bit hard to
find otherwise..

That's again veering into selective logging control territory. Rather than doing it for stack dump control only, it should be part of a broader control over dynamic and scoped verbosity, selective logging, and log options, like Pavan raised. I see stacks as just one knob that can be turned on/off here.

> (That reminds me, I need to chat with Devrim about creating a longer-lived
> debuginfo + old-versions rpms repo for Pg itself, if not the accessory
> bits and pieces. I'm constantly frustrated by not being able to get the
> needed debuginfo packages to investigate some core or running-system
> problem because they've been purged from the PGDG yum repo as soon as a new
> version comes out.)

We in our department take care to preserve them ourselves, out of
the necessity of supporting older systems. I sometimes feel it
would be very helpful if they were available on the official ..

Maybe if I can get some interest in that, you might be willing to contribute your archives as a starter so we have them for back-versions?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
