Re: Printing backtrace of postgres processes - Mailing list pgsql-hackers
From | Craig Ringer |
---|---|
Subject | Re: Printing backtrace of postgres processes |
Date | |
Msg-id | CAGRY4nwwEEkNvXM+7ZFseTb5ffYS+DY-d+22QhqRuJVT+BZjHw@mail.gmail.com Whole thread Raw |
In response to | Re: Printing backtrace of postgres processes (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Printing backtrace of postgres processes
Re: Printing backtrace of postgres processes |
List | pgsql-hackers |
> Surely this is *utterly* unsafe. You can't do that sort of stuff in > a signal handler. Not safely, anyway. The signal handler could be called in the middle of a malloc(), a pfree(), or all sorts of other exciting circumstances. It'd have to be extremely careful to use only local resources on the stack and I don't see how that's feasible here. It'll work - most of the time. But it could explode messily and excitingly just when you actually need it to work properly, which is rarely what you want from features clearly intended for production debugging. (It'd be interesting to add some test infrastructure that helps us fire signal handlers at awkward times in a controlled manner, so we could predictably test signal handler re-entrancy and concurrency behaviour.) > It might be all right to set a flag that would cause the next > CHECK_FOR_INTERRUPTS to print a backtrace, but I'm not sure > how useful that really is. I find that when I most often want a backtrace of a running, live backend, it's because the backend is doing something that isn't passing a CHECK_FOR_INTERRUPTS() so it's not responding to signals. So it wouldn't help if a backend is waiting on an LWLock, busy in a blocking call to some loaded library, a blocking syscall, etc. But there are enough other times I want live backtraces, and I'm not the only one whose needs matter. It'd be kinda handy when collecting samples of some backend's activity when it's taking an excessively long time doing something indeterminate. I generally use a gdb script for that because unfortunately the Linux trace tool situation is so hopeless that I can't rely on perf or systemtap being present, working, and understanding the same command line arguments across various distros and versions, so something built-in would be convenient. That convenience would probably be counterbalanced by the hassle of extracting the results from the log files unless it's possible to use a function in one backend to query the stack in another instead of just writing it to the log. So a weak +1 from me, assuming printing stacks from CHECK_FOR_INTERRUPTS() to the log. We already have most of the infrastructure for that so the change is trivial, and we already trust anyone who can read the log, so it seems like a pretty low-cost, low-risk change. Somewhat more interested favour if the results can be obtained from a function or view from another backend, but then the security implications get a bit more exciting if we let non-superusers do it. You may recall that I wanted to do something similar a while ago in order to request MemoryContextStats() without needing to attach gdb and invoke a function manually using ptrace(). Also changes to support reading TCP socket state for a process. So I find this sort of thing useful in general. If we're querying one backend from another we could read its stack with ptrace() and unwind it with libunwind within the requesting backend, which would be a whole lot safer to execute and would work fine even when blocked in syscalls or synchronous library calls. See the eu-stack command from elfutils. If the target backend had shared libraries loaded that the requested backend didn't, libunwind could load the debuginfo for us if available. The downsides would be that many system lockdowns disable ptrace() for non-root even for same-user-owned processes, and that we'd need to depend on libunwind (or other platform equivalents) so it'd probably be a contrib. In which case you have to wonder if it's that much better than running `system("eu-stack $pid")` in plperl or a trivial custom C extension function. > I would like to see some discussion of the security implications > of such a feature, as well. ("There aren't any" is the wrong > answer.) If the stack only goes to the log, I actually don't think there are significant security implications beyond those we already have with our existing backtrace printing features. We already trust anyone who can read the log almost completely, and we can already emit stacks to the log. But I'd still want it to be gated superuser-only, or a role that's GRANTable by superuser only by default, since it exposes arbitrary internals of the server. "same user id" matching is not sufficient. A less-privileged session user might be calling into SECURITY DEFINER code, code from a privileged view, or sensitive C library code. Checking for the effective user is no better because the effective user might be less privileged at the moment the bt is requested, but the state up-stack might reveal sensitive information from a more privileged user.
pgsql-hackers by date: