Re: Use a signal to trigger a memory context dump? - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Use a signal to trigger a memory context dump?
Date
Msg-id 20140624052153.GA1241113@tornado.leadboat.com
In response to Re: Use a signal to trigger a memory context dump?  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
+1 for having an API better than GDB to make a process emit a memory usage
dump.  This is my top non-crash cause for use of GDB in production.

On Mon, Jun 23, 2014 at 07:21:22PM +0200, Andres Freund wrote:
> On 2014-06-23 10:07:36 -0700, Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > I wonder if it'd make sense to allow a signal to trigger a memory
> > > context dump? I and others more than once had the need to examine memory
> > > usage on production systems and using gdb isn't always realistic.
> > > I wonder if we could install a signal handler for some unused signal
> > > (e.g. SIGPWR) to dump memory.

SIGPWR is not widely available.  Apart from SIGUSR1 and SIGUSR2, using a
portable signal risks colliding with the standard use thereof.

> > > I'd also considered adding a SQL function that uses the SIGUSR1 signal
> > > multiplexing for the purpose but that's not necessarily nice if you have
> > > to investigate while SQL access isn't yet possible. There's also the
> > > problem that not all possibly interesting processes use the sigusr1
> > > signal multiplexing.

I don't know whether to be interested in cases where SQL access is
unavailable.  If those cases are important, an idea for achieving it without
leaning on unportable or already-used signals is to define SIGUSR2 as a second
multiplexer that uses files instead of shared memory.  You'd send the signal
with something like this:
 : >$PGDATA/procsig/$targetpid.memdump
 kill -USR2 $targetpid

(This would probably require first converting the existing autovacuum use of
SIGUSR2 to the shared memory procsig mechanism.)
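
To make the file-based idea concrete, here is a rough sketch of what the
receiving side might look like.  Nothing here is existing PostgreSQL code:
the function names, the procsig_dir argument, and the ".memdump" suffix
handling are all assumptions for illustration.  The handler only sets a flag;
the request file is checked and consumed later, from a safe point in the main
loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler; nothing else happens in signal context. */
static volatile sig_atomic_t usr2_pending = 0;

static void
handle_usr2(int signo)
{
    (void) signo;
    usr2_pending = 1;
}

/* Hypothetical setup routine: route SIGUSR2 to the flag-setting handler. */
static void
install_usr2_multiplexer(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_usr2;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGUSR2, &sa, NULL);
}

/*
 * Hypothetical check, called from a safe point in the main loop.  Looks for
 * <procsig_dir>/<own pid>.memdump and consumes it.  Returns 1 if a memdump
 * request was found, 0 otherwise.
 */
static int
process_usr2_requests(const char *procsig_dir)
{
    char        path[1024];

    if (!usr2_pending)
        return 0;
    usr2_pending = 0;

    snprintf(path, sizeof(path), "%s/%ld.memdump",
             procsig_dir, (long) getpid());

    /* unlink() doubles as the existence test and the acknowledgement. */
    if (unlink(path) != 0)
        return 0;               /* signal arrived for some other reason */

    /* A real implementation would emit the memory context dump here. */
    fprintf(stderr, "memory dump requested for pid %ld\n", (long) getpid());
    return 1;
}
```

Other request types would get their own file suffix, so one signal could
carry several reasons, much as the shared memory multiplexer does for
SIGUSR1.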

> > The closest approximation that I think would be reasonable is to
> > set a flag that would be noticed by the next CHECK_FOR_INTERRUPTS
> > macro.  So you're already buying into the assumption that the process
> > executes CHECK_FOR_INTERRUPTS fairly often.  Which probably means
> > that assuming it's using the standard sigusr1 handler isn't a big
> > extra limitation.
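
For readers unfamiliar with the pattern described above, a minimal standalone
sketch follows.  It is not the actual CHECK_FOR_INTERRUPTS implementation;
CHECK_FOR_MEMDUMP, request_memdump, and the other names are made up for
illustration.  The handler does nothing but set a flag, and the dump itself
runs only when the work loop reaches its next check.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Flag set in signal context, acted on later at a safe point. */
static volatile sig_atomic_t memdump_pending = 0;
static int  dumps_done = 0;

/* Signal handler: record the request and return immediately. */
static void
request_memdump(int signo)
{
    (void) signo;
    memdump_pending = 1;
}

/* Runs outside signal context, so it may allocate, do I/O, and so on. */
static void
process_memdump_request(void)
{
    memdump_pending = 0;
    dumps_done++;
    fprintf(stderr, "dumping memory context stats (placeholder)\n");
}

/* Hypothetical stand-in for the check made at CHECK_FOR_INTERRUPTS time. */
#define CHECK_FOR_MEMDUMP() \
    do { \
        if (memdump_pending) \
            process_memdump_request(); \
    } while (0)

/* Work loop that polls the flag once per iteration; returns dumps so far. */
static int
run_work_loop(int iterations)
{
    for (int i = 0; i < iterations; i++)
    {
        /* ... one unit of real work would go here ... */
        CHECK_FOR_MEMDUMP();
    }
    return dumps_done;
}
```

The latency of the dump is bounded by how often the loop reaches the check,
which is the assumption called out above.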

If it's acceptable to require SQL access and exclude would-be target processes
that detach from shared memory, I favor an approach using the shared memory
SIGUSR1 multiplexer.  Bringing all the processes that do use shared memory
into agreement about the use of SIGUSR1 feels like a valuable step forward.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com


