On Tue, 1 Dec 2020 at 11:31, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2020-11-30 13:35:46 +0800, Craig Ringer wrote:
> > I find that when I most often want a backtrace of a running, live
> > backend, it's because the backend is doing something that isn't
> > passing a CHECK_FOR_INTERRUPTS() so it's not responding to signals. So
> > it wouldn't help if a backend is waiting on an LWLock, busy in a
> > blocking call to some loaded library, a blocking syscall, etc. But
> > there are enough other times I want live backtraces, and I'm not the
> > only one whose needs matter.
>
> Random thought: Wonder if it could be worth adding a conditionally
> compiled mode where we track what the longest time between two
> CHECK_FOR_INTERRUPTS() calls is (with some extra logic for client
> IO).
>
> Obviously the regression tests don't tend to hit the worst cases of
> CFI()-less code, but even if they did, we currently wouldn't know from
> running the regression tests.

We can probably determine that just as well with a perf or systemtap
run on an --enable-dtrace build. Just tag CHECK_FOR_INTERRUPTS() with
an SDT marker, then record the timings externally.

It might be convenient to have it built in, I guess, but if we tag the
site and do the timing/tracing externally, we don't have to bother
with conditional compilation and special builds.
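
To make the trade-off concrete, here is a rough standalone sketch of
what the built-in variant might involve. Every name in it
(cfi_gap_note, cfi_max_gap_ns, CFI_USE_SDT, the postgresql_sketch
provider) is hypothetical and nothing like it exists in the tree. It
ignores the extra logic for client IO. The optional probe assumes
<sys/sdt.h> from systemtap is installed.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

#ifdef CFI_USE_SDT
/* Assumption: systemtap's SDT header is available on this system. */
#include <sys/sdt.h>
#define CFI_PROBE() DTRACE_PROBE(postgresql_sketch, cfi_call)
#else
#define CFI_PROBE() ((void) 0)
#endif

/* Hypothetical per-process state; not part of PostgreSQL. */
static struct timespec cfi_last_call;
static int cfi_have_last = 0;
static int64_t cfi_max_gap_ns = 0;

/* Nanoseconds elapsed from *earlier to *later. */
static int64_t
cfi_diff_ns(const struct timespec *later, const struct timespec *earlier)
{
    return (int64_t) (later->tv_sec - earlier->tv_sec) * 1000000000LL
        + (int64_t) (later->tv_nsec - earlier->tv_nsec);
}

/*
 * Would be called from the CHECK_FOR_INTERRUPTS() site in a special
 * build. Remembers the longest interval seen between two calls.
 */
static void
cfi_gap_note(void)
{
    struct timespec now;

    /* Fires only when built with CFI_USE_SDT; otherwise a no-op. */
    CFI_PROBE();

    clock_gettime(CLOCK_MONOTONIC, &now);

    if (cfi_have_last)
    {
        int64_t gap = cfi_diff_ns(&now, &cfi_last_call);

        /* A monotonic clock should never run backwards. */
        assert(gap >= 0);
        if (gap > cfi_max_gap_ns)
            cfi_max_gap_ns = gap;
    }

    cfi_last_call = now;
    cfi_have_last = 1;
}
```

With only the probe in place, the bookkeeping above could move out of
the backend entirely: the tracing tool would timestamp each firing and
compute the gaps itself.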