Thread: Use a signal to trigger a memory context dump?

Use a signal to trigger a memory context dump?

From
Andres Freund
Date:
Hi,

I wonder if it'd make sense to allow a signal to trigger a memory
context dump? I and others more than once had the need to examine memory
usage on production systems and using gdb isn't always realistic.
I wonder if we could install a signal handler for some unused signal
(e.g. SIGPWR) to dump memory.
I'd also considered adding a SQL function that uses the SIGUSR1 signal
multiplexing for the purpose but that's not necessarily nice if you have
to investigate while SQL access isn't yet possible. There's also the
problem that not all possibly interesting processes use the sigusr1
signal multiplexing.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Use a signal to trigger a memory context dump?

From
Stephen Frost
Date:
Andres,

* Andres Freund (andres@2ndquadrant.com) wrote:
> I wonder if it'd make sense to allow a signal to trigger a memory
> context dump? I and others more than once had the need to examine memory
> usage on production systems and using gdb isn't always realistic.

+100

I keep thinking we have this and then keep being disappointed when I go
try to find it.

> I wonder if we could install a signal handler for some unused signal
> (e.g. SIGPWR) to dump memory.

Interesting thought, but..

> I'd also considered adding a SQL function that uses the SIGUSR1 signal
> multiplexing for the purpose but that's not necessarily nice if you have
> to investigate while SQL access isn't yet possible. There's also the
> problem that not all possibly interesting processes use the sigusr1
> signal multiplexing.

I'd tend to think this would be sufficient.  You're suggesting a case
where you need to debug prior to SQL access (not specifically sure what
you mean by that) or processes which are hopefully less likely to have
memory issues, but you don't have gdb..

Another thought along the lines of getting information about running
processes would be to see the call stack or execution plan..  I seem to
recall there being a patch for the latter at one point?

Thanks,
    Stephen

Re: Use a signal to trigger a memory context dump?

From
Andres Freund
Date:
On 2014-06-23 08:36:02 -0400, Stephen Frost wrote:
> Andres,
> 
> * Andres Freund (andres@2ndquadrant.com) wrote:
> > I wonder if it'd make sense to allow a signal to trigger a memory
> > context dump? I and others more than once had the need to examine memory
> > usage on production systems and using gdb isn't always realistic.
> 
> +100
> 
> I keep thinking we have this and then keep being disappointed when I go
> try to find it.
> 
> > I wonder if we could install a signal handler for some unused signal
> > (e.g. SIGPWR) to dump memory.
> 
> Interesting thought, but..
> 
> > I'd also considered adding a SQL function that uses the SIGUSR1 signal
> > multiplexing for the purpose but that's not necessarily nice if you have
> > to investigate while SQL access isn't yet possible. There's also the
> > problem that not all possibly interesting processes use the sigusr1
> > signal multiplexing.
> 
> I'd tend to think this would be sufficient.  You're suggesting a case
> where you need to debug prior to SQL access (not specifically sure what
> you mean by that) or processes which are hopefully less likely to have
> memory issues, but you don't have gdb..

prior to SQL access := Before crash recovery finished/hot standby
reached consistency.

And I don't agree that memory dumps from non-plain backends are that
uninteresting. E.g. background workers and logical decoding walsenders
both can be interesting.

> Another thought along the lines of getting information about running
> processes would be to see the call stack or execution plan..  I seem to
> recall there being a patch for the latter at one point?

I think these are *much* more complicated. I don't want to tackle them
at the same time, otherwise we'll never get anywhere.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Use a signal to trigger a memory context dump?

From
Stephen Frost
Date:
Andres,

* Andres Freund (andres@2ndquadrant.com) wrote:
> On 2014-06-23 08:36:02 -0400, Stephen Frost wrote:
> > I'd tend to think this would be sufficient.  You're suggesting a case
> > where you need to debug prior to SQL access (not specifically sure what
> > you mean by that) or processes which are hopefully less likely to have
> > memory issues, but you don't have gdb..
>
> prior to SQL access := Before crash recovery finished/hot standby
> reached consistency.
>
> And I don't agree that memory dumps from non-plain backends are that
> uninteresting. E.g. background workers and logical decoding walsenders
> both can be interesting.

I didn't mean they're uninteresting- I meant that if you're dealing with
those kinds of issues, having gdb isn't as huge a hurdle..

> > Another thought along the lines of getting information about running
> > processes would be to see the call stack or execution plan..  I seem to
> > recall there being a patch for the latter at one point?
>
> I think these are *much* more complicated. I don't want to tackle them
> at the same time, otherwise we'll never get anywhere.

Sure, just some things to keep in mind as you're thinking about changes
in this area.  Just to toss another random thought out there, what about
an SQL function which does a LISTEN and then sends a signal to another
backend which throws a NOTIFY with payload including the requested info?
That'd be *very* useful as there are lots of cases where access to the
logs isn't trivial (particularly if they've been properly locked down
due to the sensitive info they can contain..).

Thanks,
    Stephen

Re: Use a signal to trigger a memory context dump?

From
"MauMau"
Date:
From: "Andres Freund" <andres@2ndquadrant.com>
> I wonder if it'd make sense to allow a signal to trigger a memory
> context dump? I and others more than once had the need to examine memory
> usage on production systems and using gdb isn't always realistic.

+1

It would be nice if there were a generic infrastructure through which the
DBA can get information about running backends.  I wish for functionality
to dump info for all backends with a single operation, as well as for one
backend at a time, because it would be difficult to ask users to choose a
specific backend or to operate on all backends, especially on Windows.  The
candidate info is:

* memory context

* stack trace: I'd like to implement this.

* GUC settings: to know that backends are running with intended settings.

* prepared statements (= pg_prepared_statements): to know if applications 
are taking advantage of prepared statements for performance.

Regards
MauMau




Re: Use a signal to trigger a memory context dump?

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> I wonder if it'd make sense to allow a signal to trigger a memory
> context dump? I and others more than once had the need to examine memory
> usage on production systems and using gdb isn't always realistic.
> I wonder if we could install a signal handler for some unused signal
> (e.g. SIGPWR) to dump memory.
> I'd also considered adding a SQL function that uses the SIGUSR1 signal
> multiplexing for the purpose but that's not necessarily nice if you have
> to investigate while SQL access isn't yet possible. There's also the
> problem that not all possibly interesting processes use the sigusr1
> signal multiplexing.

Well, you can't just have the signal handler call MemoryContextStats
directly.  (Even if the memory manager's state were 100% interrupt-safe,
which it ain't, fprintf itself might not be safe either.)

The closest approximation that I think would be reasonable is to
set a flag that would be noticed by the next CHECK_FOR_INTERRUPTS
macro.  So you're already buying into the assumption that the process
executes CHECK_FOR_INTERRUPTS fairly often.  Which probably means
that assuming it's using the standard sigusr1 handler isn't a big
extra limitation.

        regards, tom lane
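
A minimal sketch of the deferred approach described above, written against
plain POSIX rather than the PostgreSQL source: the handler only sets a flag,
and the dump runs later from ordinary code, at a point that plays the role of
CHECK_FOR_INTERRUPTS. The names dump_requested, install_dump_handler and
check_for_dump_request are hypothetical, and the fprintf call merely stands
in for the real dump.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* The handler does only async-signal-safe work: it sets this flag. */
static volatile sig_atomic_t dump_requested = 0;

static void
handle_dump_signal(int signo)
{
    (void) signo;
    dump_requested = 1;
}

/* Install the flag-setting handler for signo; returns 0 on success. */
int
install_dump_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_dump_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/*
 * Called from normal (non-handler) code, where using stdio and walking
 * memory manager state is safe.  Returns 1 if a dump was emitted, else 0.
 */
int
check_for_dump_request(FILE *out)
{
    assert(out != NULL);

    if (!dump_requested)
        return 0;
    dump_requested = 0;

    /* Stand-in for the real dump, e.g. MemoryContextStats(TopMemoryContext). */
    fprintf(out, "memory context dump requested\n");
    return 1;
}
```

A process would call install_dump_handler() once at startup and
check_for_dump_request() wherever it already checks for interrupts. A process
that rarely reaches such a point responds late, which is the limitation
noted above.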



Re: Use a signal to trigger a memory context dump?

From
Andres Freund
Date:
On 2014-06-23 10:07:36 -0700, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > I wonder if it'd make sense to allow a signal to trigger a memory
> > context dump? I and others more than once had the need to examine memory
> > usage on production systems and using gdb isn't always realistic.
> > I wonder if we could install a signal handler for some unused signal
> > (e.g. SIGPWR) to dump memory.
> > I'd also considered adding a SQL function that uses the SIGUSR1 signal
> > multiplexing for the purpose but that's not necessarily nice if you have
> > to investigate while SQL access isn't yet possible. There's also the
> > problem that not all possibly interesting processes use the sigusr1
> > signal multiplexing.
> 
> Well, you can't just have the signal handler call MemoryContextStats
> directly.  (Even if the memory manager's state were 100% interrupt-safe,
> which it ain't, fprintf itself might not be safe either.)

Yea. And fprintf() definitely isn't.

> The closest approximation that I think would be reasonable is to
> set a flag that would be noticed by the next CHECK_FOR_INTERRUPTS
> macro.  So you're already buying into the assumption that the process
> executes CHECK_FOR_INTERRUPTS fairly often.  Which probably means
> that assuming it's using the standard sigusr1 handler isn't a big
> extra limitation.

There seem to be far more subsystems doing CHECK_FOR_INTERRUPTS than
using SIGUSR1 multiplexing. Several processes have their own SIGUSR1
handlers:
* bgworkers (Which certainly is a major candidate for this. And: Isn't this a bug? Think recovery conflicts.)
* startup process (certainly interesting as well)
* checkpointer
* walreceiver
* walsender
* wal writer
* bgwriter
* archiver
* syslogger

At least bgworkers, startup process, walsenders are definitely
interesting from this POV.

It very well might be best to provide a common sigusr1 implementation
supporting a subset of multiplexing for some of those since they
essentially all do the same... Although that'd require a fair bit of
surgery in procsignal.c
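
As a rough illustration of what a common handler supporting a subset of
multiplexing might look like, here is a single-process sketch. It is not
based on procsignal.c: the enum, the slot_flags array (which stands in for a
per-process slot in shared memory) and all function names are assumptions
made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reasons that can be multiplexed over one signal. */
typedef enum
{
    SIG_REASON_MEMDUMP = 0,
    SIG_REASON_EXAMPLE_OTHER,
    SIG_NUM_REASONS
} SignalReason;

/* Stand-in for the target's slot in shared memory, written by senders. */
static volatile sig_atomic_t slot_flags[SIG_NUM_REASONS];

/* Process-local flags, consumed later from normal code. */
static volatile sig_atomic_t local_pending[SIG_NUM_REASONS];

/* One handler for every process type: move requested reasons to local flags. */
static void
common_sigusr1_handler(int signo)
{
    (void) signo;
    for (int i = 0; i < SIG_NUM_REASONS; i++)
    {
        if (slot_flags[i])
        {
            slot_flags[i] = 0;
            local_pending[i] = 1;
        }
    }
}

/* Install the shared handler; returns 0 on success. */
int
install_common_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = common_sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Sender side: record the reason, then wake the target with SIGUSR1. */
int
send_reason(pid_t pid, SignalReason reason)
{
    assert((unsigned) reason < SIG_NUM_REASONS);
    slot_flags[reason] = 1;
    return kill(pid, SIGUSR1);
}

/* Receiver side, outside the handler: returns 1 once per delivered reason. */
int
consume_reason(SignalReason reason)
{
    assert((unsigned) reason < SIG_NUM_REASONS);
    if (!local_pending[reason])
        return 0;
    local_pending[reason] = 0;
    return 1;
}
```

Each process type would then only need to decide which reasons it consumes,
instead of carrying its own SIGUSR1 handler.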

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Use a signal to trigger a memory context dump?

From
Noah Misch
Date:
+1 for having an API better than GDB to make a process emit a memory usage
dump.  This is my top non-crash cause for use of GDB in production.

On Mon, Jun 23, 2014 at 07:21:22PM +0200, Andres Freund wrote:
> On 2014-06-23 10:07:36 -0700, Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > I wonder if it'd make sense to allow a signal to trigger a memory
> > > context dump? I and others more than once had the need to examine memory
> > > usage on production systems and using gdb isn't always realistic.
> > > I wonder if we could install a signal handler for some unused signal
> > > (e.g. SIGPWR) to dump memory.

SIGPWR is not widely available.  Apart from SIGUSR1 and SIGUSR2, using a
portable signal risks colliding with the standard use thereof.

> > > I'd also considered adding a SQL function that uses the SIGUSR1 signal
> > > multiplexing for the purpose but that's not necessarily nice if you have
> > > to investigate while SQL access isn't yet possible. There's also the
> > > problem that not all possibly interesting processes use the sigusr1
> > > signal multiplexing.

I don't know whether to be interested in cases where SQL access is
unavailable.  If those cases are important, an idea for achieving it without
leaning on unportable or already-used signals is to define SIGUSR2 as a second
multiplexer that uses files instead of shared memory.  You'd send the signal
with something like this:
  : >$PGDATA/procsig/$targetpid.memdump
  kill -USR2 $targetpid

(This would probably require first converting the existing autovacuum use of
SIGUSR2 to the shared memory procsig mechanism.)
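
The receiving side of such a file-based multiplexer could look roughly like
the sketch below. Only the "<dir>/<pid>.memdump" naming comes from the
command above; the function names, the fixed path buffer, and the choice to
treat a successful unlink() as "request seen" are assumptions for
illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler; the file system is only touched outside of it. */
static volatile sig_atomic_t sigusr2_seen = 0;

static void
handle_sigusr2(int signo)
{
    (void) signo;
    sigusr2_seen = 1;
}

/* Install the SIGUSR2 handler; returns 0 on success. */
int
install_sigusr2_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigusr2;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/*
 * Called from normal code after a signal may have arrived.  Looks for
 * "<procsig_dir>/<own pid>.memdump" and removes it.  Returns 1 if a request
 * file existed (the caller would then emit the memory dump), else 0.
 */
int
consume_memdump_request(const char *procsig_dir)
{
    char path[1024];
    int  n;

    assert(procsig_dir != NULL);

    if (!sigusr2_seen)
        return 0;
    sigusr2_seen = 0;

    n = snprintf(path, sizeof(path), "%s/%ld.memdump",
                 procsig_dir, (long) getpid());
    if (n < 0 || (size_t) n >= sizeof(path))
        return 0;

    /* A successful unlink() means the file existed; each request fires once. */
    return unlink(path) == 0 ? 1 : 0;
}
```

Because the request lives in a file rather than in shared memory, this also
works for processes that have detached from shared memory, or before SQL
access is possible.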

> > The closest approximation that I think would be reasonable is to
> > set a flag that would be noticed by the next CHECK_FOR_INTERRUPTS
> > macro.  So you're already buying into the assumption that the process
> > executes CHECK_FOR_INTERRUPTS fairly often.  Which probably means
> > that assuming it's using the standard sigusr1 handler isn't a big
> > extra limitation.

If it's acceptable to require SQL access and exclude would-be target processes
that detach from shared memory, I favor an approach using the shared memory
SIGUSR1 multiplexer.  Bringing all the processes that do use shared memory
into agreement about the use of SIGUSR1 feels like a valuable step forward.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com