Re: Server instrumentation: pg_terminate_backend, pg_reload_conf - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: Server instrumentation: pg_terminate_backend, pg_reload_conf
Date
Msg-id 200506041912.j54JCF527129@candle.pha.pa.us
In response to Server instrumentation: pg_terminate_backend, pg_reload_conf  (Andreas Pflug <pgadmin@pse-consulting.de>)
Responses Re: Server instrumentation: pg_terminate_backend, pg_reload_conf  (Andreas Pflug <pgadmin@pse-consulting.de>)
List pgsql-patches
Andreas Pflug wrote:
> This patch reenables pg_terminate_backend, allowing (superuser only, of
> course) to terminate a backend. As taken from the discussion some weeks
> earlier, SIGTERM seems to be used quite widely, without a report of
> misbehavior so while the code path is officially not too well tested,
> in practice it's working ok and helpful.

I thought we had a discussion that the places we accept SIGTERM might be
places that can exit if the postmaster is shutting down, but might not
be places we can exit if the postmaster continues running, e.g. while
holding locks.  Have you checked all the places we honor SIGTERM to
confirm that it is safe to exit there?  I know Tom had concerns about that.

Looking at ProcessInterrupts() and friends, when it is called with
QueryCancelPending set, it does elog(ERROR) and longjmps out of elog, and
that cleans up some stuff.  The problem with SIGTERM/ProcDiePending is
that it just does a FATAL, and I assume that doesn't do the same cleanups
that elog(ERROR) does to cancel a query.
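
To make the difference between the two paths concrete, here is a toy
sketch in plain C. It is not the backend code: the flag names, the
Outcome enum, run_query() and the cleanup counter are all made up for
illustration, and the only thing it models is "cancel unwinds to the
recovery code, die does not".

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the backend's pending-interrupt flags. */
static volatile bool query_cancel_pending = false;
static volatile bool proc_die_pending = false;

static jmp_buf recovery_point;  /* where a simulated ERROR unwinds to */
static int cleanups_run = 0;    /* counts simulated abort cleanups */

typedef enum { OUTCOME_NONE, OUTCOME_CANCELLED, OUTCOME_DIED } Outcome;

/* Simulated interrupt check inside a long-running operation. */
static Outcome check_interrupts(void)
{
    if (proc_die_pending)
        return OUTCOME_DIED;        /* FATAL-like: leave without unwinding */
    if (query_cancel_pending)
    {
        query_cancel_pending = false;
        longjmp(recovery_point, 1); /* ERROR-like: jump to recovery code */
    }
    return OUTCOME_NONE;
}

/* Runs one simulated query with the given interrupts already pending. */
Outcome run_query(bool cancel, bool die)
{
    query_cancel_pending = cancel;
    proc_die_pending = die;
    cleanups_run = 0;

    if (setjmp(recovery_point) != 0)
    {
        /* Recovery code: this is where the cancel path resets state. */
        assert(!query_cancel_pending);
        cleanups_run++;
        return OUTCOME_CANCELLED;
    }

    if (check_interrupts() == OUTCOME_DIED)
        return OUTCOME_DIED;        /* recovery code above never runs */
    return OUTCOME_NONE;
}
```

Calling run_query(true, false) goes through the recovery branch and
bumps cleanups_run, while run_query(false, true) returns OUTCOME_DIED
with cleanups_run still zero, which is the gap I am worried about.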

Ideally we would use another signal number that would do a query
cancel, then up in the recovery code after the longjmp, after we had
reset everything, we could then exit.  The problem, I think, is that we
don't have another signal available for use.  I see this in postgres.c:

    pqsignal(SIGHUP, SigHupHandler);    /* set flag to read config file */
    pqsignal(SIGINT, StatementCancelHandler);   /* cancel current query */
    pqsignal(SIGTERM, die);     /* cancel current query and exit */
    pqsignal(SIGQUIT, quickdie);    /* hard crash time */
    pqsignal(SIGALRM, handle_sig_alarm);        /* timeout conditions */

    /*
     * Ignore failure to write to frontend. Note: if frontend closes
     * connection, we will notice it and exit cleanly when control next
     * returns to outer loop.  This seems safer than forcing exit in the
     * midst of output during who-knows-what operation...
     */
    pqsignal(SIGPIPE, SIG_IGN);
    pqsignal(SIGUSR1, CatchupInterruptHandler);
    pqsignal(SIGUSR2, NotifyInterruptHandler);
    pqsignal(SIGFPE, FloatExceptionHandler);

It would be neat if we could do a combined Cancel/Terminate signal, but
signals don't work that way.  Any ideas on how we can do a combined
cancel/terminate?  Do we have a shared area that both the postmaster and
the backends can see?  Could we set a flag when the postmaster is
shutting down and then when a backend gets a SIGTERM, it could either
shut down right away or do the cancel and then shut down?  I don't think
we can do query cancel for server-wide backend shutdowns --- it should
be as quick as possible.
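
Roughly what I have in mind, as a toy model only. SharedState,
BackendState and both functions are hypothetical names, nothing here
exists in the tree, and a real signal handler would do far less work
than this; it just shows the decision being made from a shared flag.

```c
#include <stdbool.h>

/* Hypothetical shared area visible to postmaster and backends. */
typedef struct SharedState
{
    volatile bool shutdown_in_progress; /* set by postmaster at shutdown */
} SharedState;

/* Hypothetical per-backend state, for illustration only. */
typedef struct BackendState
{
    bool query_cancel_pending; /* cancel requested, not yet processed */
    bool exit_after_cancel;    /* exit once cancel recovery has finished */
    bool cleaned_up;           /* cancel recovery has run */
    bool exited;               /* backend has gone away */
} BackendState;

/* What a backend would do on receiving SIGTERM. */
void handle_sigterm(const SharedState *shared, BackendState *be)
{
    if (shared->shutdown_in_progress)
    {
        /* Server-wide shutdown: leave as quickly as possible. */
        be->exited = true;
        return;
    }
    /* Single-backend terminate: go through the query-cancel path first. */
    be->query_cancel_pending = true;
    be->exit_after_cancel = true;
}

/* Recovery code reached after the cancel has unwound the query. */
void recover_from_cancel(BackendState *be)
{
    if (!be->query_cancel_pending)
        return;
    be->query_cancel_pending = false;
    be->cleaned_up = true;     /* state has been reset at this point */
    if (be->exit_after_cancel)
        be->exited = true;     /* now it should be safe to exit */
}
```

With shutdown_in_progress false, the backend only exits after
recover_from_cancel() has run; with it true, handle_sigterm() exits
immediately and no cancel is attempted.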

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
