Thread: Use a signal to trigger a memory context dump?
Hi, I wonder if it'd make sense to allow a signal to trigger a memory context dump? I and others more than once had the need to examine memory usage on production systems and using gdb isn't always realistic. I wonder if we could install a signal handler for some unused signal (e.g. SIGPWR) to dump memory. I'd also considered adding a SQL function that uses the SIGUSR1 signal multiplexing for the purpose but that's not necessarily nice if you have to investigate while SQL access isn't yet possible. There's also the problem that not all possibly interesting processes use the sigusr1 signal multiplexing. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres, * Andres Freund (andres@2ndquadrant.com) wrote: > I wonder if it'd make sense to allow a signal to trigger a memory > context dump? I and others more than once had the need to examine memory > usage on production systems and using gdb isn't always realistic. +100 I keep thinking we have this and then keep being disappointed when I go try to find it. > I wonder if we could install a signal handler for some unused signal > (e.g. SIGPWR) to dump memory. Interesting thought, but.. > I'd also considered adding a SQL function that uses the SIGUSR1 signal > multiplexing for the purpose but that's not necessarily nice if you have > to investigate while SQL access isn't yet possible. There's also the > problem that not all possibly interesting processes use the sigusr1 > signal multiplexing. I'd tend to think this would be sufficient. You're suggesting a case where you need to debug prior to SQL access (not specifically sure what you mean by that) or processes which are hopefully less likely to have memory issues, but you don't have gdb.. Another thought along the lines of getting information about running processes would be to see the call stack or execution plan.. I seem to recall there being a patch for the latter at one point? Thanks, Stephen
On 2014-06-23 08:36:02 -0400, Stephen Frost wrote: > Andres, > > * Andres Freund (andres@2ndquadrant.com) wrote: > > I wonder if it'd make sense to allow a signal to trigger a memory > > context dump? I and others more than once had the need to examine memory > > usage on production systems and using gdb isn't always realistic. > > +100 > > I keep thinking we have this and then keep being disappointed when I go > try to find it. > > > I wonder if we could install a signal handler for some unused signal > > (e.g. SIGPWR) to dump memory. > > Interesting thought, but.. > > > I'd also considered adding a SQL function that uses the SIGUSR1 signal > > multiplexing for the purpose but that's not necessarily nice if you have > > to investigate while SQL access isn't yet possible. There's also the > > problem that not all possibly interesting processes use the sigusr1 > > signal multiplexing. > > I'd tend to think this would be sufficient. You're suggesting a case > where you need to debug prior to SQL access (not specifically sure what > you mean by that) or processes which are hopefully less likely to have > memory issues, but you don't have gdb.. prior to SQL access := Before crash recovery finished/hot standby reached consistency. And I don't agree that memory dumps from non-plain backends are that uninteresting. E.g. background workers and logical decoding walsenders both can be interesting. > Another thought along the lines of getting information about running > processes would be to see the call stack or execution plan.. I seem to > recall there being a patch for the latter at one point? I think these are *much* more complicated. I don't want to tackle them at the same time, otherwise we'll never get anywhere. Greetings, Andres Freund
Andres, * Andres Freund (andres@2ndquadrant.com) wrote: > On 2014-06-23 08:36:02 -0400, Stephen Frost wrote: > > I'd tend to think this would be sufficient. You're suggesting a case > > where you need to debug prior to SQL access (not specifically sure what > > you mean by that) or processes which are hopefully less likely to have > > memory issues, but you don't have gdb.. > > prior to SQL access := Before crash recovery finished/hot standby > reached consistency. > > And I don't agree that memory dumps from non-plain backends are that > uninteresting. E.g. background workers and logical decoding walsenders > both can be interesting. I didn't mean they're uninteresting- I meant that if you're dealing with those kinds of issues, having gdb isn't as huge a hurdle.. > > Another thought along the lines of getting information about running > > processes would be to see the call stack or execution plan.. I seem to > > recall there being a patch for the latter at one point? > > I think these are *much* more complicated. I don't want to tackle them > at the same time, otherwise we'll never get anywhere. Sure, just some things to keep in mind as you're thinking about changes in this area. Just to toss another random thought out there, what about an SQL function which does a LISTEN and then sends a signal to another backend which throws a NOTIFY with payload including the requested info? That'd be *very* useful as there are lots of cases where access to the logs isn't trivial (particularly if they've been properly locked down due to the sensitive info they can contain..). Thanks, Stephen
From: "Andres Freund" <andres@2ndquadrant.com> > I wonder if it'd make sense to allow a signal to trigger a memory > context dump? I and others more than once had the need to examine memory > usage on production systems and using gdb isn't always realistic.
+1 It would be nice to have a generic infrastructure through which the DBA can get information about running backends. I'd like a facility to dump info for all backends with a single operation as well as for one backend at a time, because it would be difficult to ask users to choose a specific backend or operate on all backends, especially on Windows. The candidate info are:
* memory context
* stack trace: I'd like to implement this.
* GUC settings: to know that backends are running with intended settings.
* prepared statements (= pg_prepared_statements): to know if applications are taking advantage of prepared statements for performance.
Regards MauMau
Andres Freund <andres@2ndquadrant.com> writes: > I wonder if it'd make sense to allow a signal to trigger a memory > context dump? I and others more than once had the need to examine memory > usage on production systems and using gdb isn't always realistic. > I wonder if we could install a signal handler for some unused signal > (e.g. SIGPWR) to dump memory. > I'd also considered adding a SQL function that uses the SIGUSR1 signal > multiplexing for the purpose but that's not necessarily nice if you have > to investigate while SQL access isn't yet possible. There's also the > problem that not all possibly interesting processes use the sigusr1 > signal multiplexing. Well, you can't just have the signal handler call MemoryContextStats directly. (Even if the memory manager's state were 100% interrupt-safe, which it ain't, fprintf itself might not be safe either.) The closest approximation that I think would be reasonable is to set a flag that would be noticed by the next CHECK_FOR_INTERRUPTS macro. So you're already buying into the assumption that the process executes CHECK_FOR_INTERRUPTS fairly often. Which probably means that assuming it's using the standard sigusr1 handler isn't a big extra limitation. regards, tom lane
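[Editor's sketch] The pattern described in the message above (the handler only sets a flag, and the dump happens at the next safe check point) could look roughly like the following. This is a minimal standalone illustration, not PostgreSQL code: dump_requested, handle_dump_signal, dump_memory_contexts, check_for_dump_request and run_demo are all hypothetical names, dump_memory_contexts merely stands in for MemoryContextStats, and check_for_dump_request stands in for whatever a CHECK_FOR_INTERRUPTS-style macro would do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag: written by the handler, read by main-line code. */
static volatile sig_atomic_t dump_requested = 0;
static int dumps_done = 0;

/* The handler only stores to a sig_atomic_t, which is async-signal-safe. */
static void handle_dump_signal(int signo)
{
    (void) signo;
    dump_requested = 1;
}

/* Stand-in for MemoryContextStats(); fprintf is fine here because this
 * runs in main-line code, never inside the signal handler. */
static void dump_memory_contexts(void)
{
    dumps_done++;
    fprintf(stderr, "memory context dump #%d would be printed here\n", dumps_done);
}

/* Stand-in for the test a CHECK_FOR_INTERRUPTS-style macro would perform. */
static void check_for_dump_request(void)
{
    if (dump_requested)
    {
        dump_requested = 0;
        dump_memory_contexts();
    }
}

/* Installs the handler, signals this process once, and returns how many
 * dumps were performed as a result (-1 if the handler could not be set). */
static int run_demo(void)
{
    struct sigaction sa;
    int before = dumps_done;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_dump_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    raise(SIGUSR1);                 /* handler has run once raise() returns */
    assert(dumps_done == before);   /* nothing was dumped inside the handler */
    check_for_dump_request();       /* the next safe point does the work */
    return dumps_done - before;
}
```

Calling run_demo() from a main() performs exactly one dump, and only at the explicit check. The trade-off is the one pointed out in the message: a process that never reaches a check point never dumps.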
On 2014-06-23 10:07:36 -0700, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > I wonder if it'd make sense to allow a signal to trigger a memory > > context dump? I and others more than once had the need to examine memory > > usage on production systems and using gdb isn't always realistic. > > I wonder if we could install a signal handler for some unused signal > > (e.g. SIGPWR) to dump memory. > > I'd also considered adding a SQL function that uses the SIGUSR1 signal > > multiplexing for the purpose but that's not necessarily nice if you have > > to investigate while SQL access isn't yet possible. There's also the > > problem that not all possibly interesting processes use the sigusr1 > > signal multiplexing. > > Well, you can't just have the signal handler call MemoryContextStats > directly. (Even if the memory manager's state were 100% interrupt-safe, > which it ain't, fprintf itself might not be safe either.)
Yea. And fprintf() definitely isn't.
> The closest approximation that I think would be reasonable is to > set a flag that would be noticed by the next CHECK_FOR_INTERRUPTS > macro. So you're already buying into the assumption that the process > executes CHECK_FOR_INTERRUPTS fairly often. Which probably means > that assuming it's using the standard sigusr1 handler isn't a big > extra limitation.
There seem to be far more subsystems doing CHECK_FOR_INTERRUPTS than using SIGUSR1 multiplexing. Several processes have their own SIGUSR1 handlers:
* bgworkers (Which certainly is a major candidate for this. And: Isn't this a bug? Think recovery conflicts.)
* startup process (certainly interesting as well)
* checkpointer
* walreceiver
* walsender
* wal writer
* bgwriter
* archiver
* syslogger
At least bgworkers, startup process, walsenders are definitely interesting from this POV. It very well might be best to provide a common sigusr1 implementation supporting a subset of multiplexing for some of those since they essentially all do the same...
Although that'd require a fair bit of surgery in procsignal.c. Greetings, Andres Freund
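[Editor's sketch] To make "a common sigusr1 implementation supporting a subset of multiplexing" concrete, here is a rough single-process illustration of multiplexing several reasons over one signal. It is not taken from procsignal.c: the reason names, the requested/pending arrays and all function names are hypothetical. In a real multi-process setup the request flags would have to live somewhere the sender can write to, such as shared memory; a static array stands in for that here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical reasons; the names are illustrative only. */
typedef enum
{
    REASON_RECOVERY_CONFLICT,
    REASON_MEMORY_DUMP,
    NUM_REASONS
} SignalReason;

/* Stand-in for flags a sender in another process would set. */
static volatile sig_atomic_t requested[NUM_REASONS];

/* Process-local flags that main-line code acts on at a safe point. */
static volatile sig_atomic_t pending[NUM_REASONS];

/* One handler for every reason: move each raised request to pending. */
static void common_sigusr1_handler(int signo)
{
    (void) signo;
    for (int i = 0; i < NUM_REASONS; i++)
    {
        if (requested[i])
        {
            requested[i] = 0;
            pending[i] = 1;
        }
    }
}

/* Returns 0 on success, -1 on failure (as sigaction does). */
static int install_common_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = common_sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Sender side: record the reason first, then deliver the shared signal
 * (to ourselves in this single-process sketch). */
static int send_reason(SignalReason reason)
{
    assert(reason >= 0 && reason < NUM_REASONS);
    requested[reason] = 1;
    return raise(SIGUSR1);
}
```

A process type that only cares about a subset of reasons would simply ignore the pending entries it does not handle, so the handler itself could be shared by all of the processes listed above.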
+1 for having an API better than GDB to make a process emit a memory usage dump. This is my top non-crash cause for use of GDB in production.
On Mon, Jun 23, 2014 at 07:21:22PM +0200, Andres Freund wrote: > On 2014-06-23 10:07:36 -0700, Tom Lane wrote: > > Andres Freund <andres@2ndquadrant.com> writes: > > > I wonder if it'd make sense to allow a signal to trigger a memory > > > context dump? I and others more than once had the need to examine memory > > > usage on production systems and using gdb isn't always realistic. > > > I wonder if we could install a signal handler for some unused signal > > > (e.g. SIGPWR) to dump memory.
SIGPWR is not widely available. Apart from SIGUSR1 and SIGUSR2, using a portable signal risks colliding with the standard use thereof.
> > > I'd also considered adding a SQL function that uses the SIGUSR1 signal > > > multiplexing for the purpose but that's not necessarily nice if you have > > > to investigate while SQL access isn't yet possible. There's also the > > > problem that not all possibly interesting processes use the sigusr1 > > > signal multiplexing.
I don't know whether to be interested in cases where SQL access is unavailable. If those cases are important, an idea for achieving it without leaning on unportable or already-used signals is to define SIGUSR2 as a second multiplexer that uses files instead of shared memory. You'd send the signal with something like this:
  : >$PGDATA/procsig/$targetpid.memdump
  kill -USR2 $targetpid
(This would probably require first converting the existing autovacuum use of SIGUSR2 to the shared memory procsig mechanism.)
> > The closest approximation that I think would be reasonable is to > > set a flag that would be noticed by the next CHECK_FOR_INTERRUPTS > > macro. So you're already buying into the assumption that the process > > executes CHECK_FOR_INTERRUPTS fairly often. Which probably means > > that assuming it's using the standard sigusr1 handler isn't a big > > extra limitation.
If it's acceptable to require SQL access and exclude would-be target processes that detach from shared memory, I favor an approach using the shared memory SIGUSR1 multiplexer. Bringing all the processes that do use shared memory into agreement about the use of SIGUSR1 feels like a valuable step forward. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
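[Editor's sketch] The receiving side of the file-based SIGUSR2 idea from this message might check for its request file along the following lines. This is only a guess at one possible shape: the function name consume_request_file is hypothetical, and the only detail taken from the message is the file layout <dir>/<pid>.<kind> (for example $PGDATA/procsig/$targetpid.memdump). The check would run from main-line code after a SIGUSR2 handler has set a flag, not from the handler itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if a request file "<dir>/<pid>.<kind>" existed and was removed,
 * 0 otherwise. The function name and signature are hypothetical.
 */
static int consume_request_file(const char *dir, pid_t pid, const char *kind)
{
    char path[1024];
    int  len;

    assert(dir != NULL && kind != NULL);

    len = snprintf(path, sizeof(path), "%s/%ld.%s", dir, (long) pid, kind);
    if (len < 0 || (size_t) len >= sizeof(path))
        return 0;               /* path would be truncated: treat as no request */

    /* unlink() doubles as the existence test, so each request file is
     * consumed at most once. */
    return unlink(path) == 0;
}
```

Because unlink() both tests for and removes the file, a signal that arrives without a matching file is simply ignored, and a second signal for the same file does not trigger a second dump.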