Re: PATCH: backtraces for error messages - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: PATCH: backtraces for error messages
Date
Msg-id 20180626.114122.113588745.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: PATCH: backtraces for error messages  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
Hello. Thaks for discussing.

At Mon, 25 Jun 2018 17:08:41 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in
<CAMsr+YGdJTGygmGhRhfn+8DLi6o+Teq+tcA-Dr3kK+8vYqwzCA@mail.gmail.com>
> On 25 June 2018 at 14:21, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp
> > > I think it's pretty strongly desirable for PANIC.
> >
> > Ah, I forgot about that. I agree to that. The cost to collect the
> > information is not a problem on PANIC. However still I don't
> > think stack trace is valuable on all PANIC messages. I can accept
> > the guc to control it but it is preferable that this works fine
> > without such an extra setup.
> >
> 
> Places such as?

Sorry but I didn't get this.

> > Agreed, it is the reality.  Instaed, can't we make a new error
> > classes PANIC_STACKDUMP and ERROR_STACKDUMP to explicitly
> > restrict stack dump for elog()?  Or elog_stackdump() and elog()
> > is also fine for me. Using it is easier than proper
> > annotating. It would be perfect if we could invent an automated
> > way but I don't think it is realistic.
> >
> 
> That needlessly complicates error severity levels with information not
> really related to the severity. -1 from me.

Ok.

> > Mmm. If I understand you correctly, I mean that perf doesn't dump
> > a backtrace on a probe point but trace points are usable to take
> > a symbolic backtrace. (I'm sorry that I cannot provide an example
> > since stap doesn't work in my box..)
> >
> 
> perf record --call-graph dwarf -e sdt_postgresql:checkpoint__start -u
> postgres
> perf report -g

Good to know that but unfortunately it doesn't work for me. It
(USDT support) is still in the perf Wiki Todo list. Some googling
told me that it is available on Linux 4.4. CentOS 7's kernel is
3.10. It seems a cutting-edge feature. However, I suppose that
what it emits is different from what wanted here.

But regardless of that, I agree that it is a bit hard to use
on-site.. (stap doesn't work for me from uncertain reason..)

> > If your intention is to take back traces without any setting (I
> > think it is preferable), it should be restricted to the required
> > points. It can be achieved by the additional error classes or
> > substitute error output functions.
> >
> 
> Who's classifying all the possible points?

I didn't mean classifying all points. I meant that only the
points where it is needed would be enough for the first step.

> Which PANICs or assertion failures do you want to exempt?

I didn't intended to cover all required casees. Intended that
save just a few or several obviously valuable cases, where we
gave up to add adequate context information for some reasons and
backtrace gives valuable information. Most xlog stuff wouldn't
need it. Lock stuff might need it. Heapam doesn't seem to need
since they seem to caused by calling order. etc... But such
control is out of this patch, I know.

> I definitely do not want to emit stacks for everything, like my patch
> currently does. It's just a proof of concept. Later on I'll want control on
> a fine grained level at runtime of when that happens, but that's out of
> scope for this. For now the goal is emit stacks at times it's obviously
> pretty sensible to have a stack, and do it in a way that doesn't require
> per-error-site maintenance/changes or create backport hassle.

Ok. The PoC is focusing on the means to emit backtraces and
controlling logs are the different issue, as Andres said
upthread.

I think that backtrace() generates somewhat strange-looking
output and we can get cleaner one using unwind. So I vote for
unwind to implement this. The cost is not a matter as far as we
intend to emit this for only Asserts or PANICs. Choosing some
ERRORs by a GUC is also fine for me.

> > As just an idea but can't we use an definition file on that
> > LOCATION of error messages that needs to dump a backtrace are
> > listed? That list is usually empty and should be very short if
> > any. The LOCATION information is easily obtained from a verbose
> > error message itself if once shown but a bit hard to find
> > otherwise..
> >
> 
> That's again veering into selective logging control territory. Rather than
> doing it for stack dump control only, it should be part of a broader
> control over dynamic and scoped verbosity, selective logging, and log
> options, like Pavan raised. I see stacks as just one knob that can be
> turned on/off here.
> 
> > (That reminds me, I need to chat with Devrim about creating a longer lived
> > > debuginfo + old versions rpms repo for Pg its self, if not the accessory
> > > bits and pieces. I'm so constantly frustrated by not being able to get
> > > needed debuginfo packages to investigate some core or running system
> > > problem because they've been purged from the PGDG yum repo as soon as a
> > new
> > > version comes out.)
> >
> > We in our department take care to preserve them for ourselves for
> > the necessity of supporting older systems. I sometimes feel that
> > It is very helpful if they were available on the official
> 
> 
> Maybe if I can get some interest in that, you might be willing to
> contribute your archives as a starter so we have them for back-versions?

Unfortunately I think we cannot do that for some reasons. Apart
from the reasons, I think that the official site removes older
versions by a policy and I'm not sure breaking it in other places
is good deed or not.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: using expression syntax for partition bounds
Next
From: Isaac Morland
Date:
Subject: Re: automatic restore point