Re: Proposal for Signal Detection Refactoring - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Proposal for Signal Detection Refactoring
Date
Msg-id 1694.1564424822@sss.pgh.pa.us
Whole thread Raw
In response to Re: Proposal for Signal Detection Refactoring  (Chris Travers <chris.travers@adjust.com>)
Responses Re: Proposal for Signal Detection Refactoring  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Chris Travers <chris.travers@adjust.com> writes:
> Here's a new patch.  No rush on it.  I am moving it to next commitfest
> anyway because as code documentation I think this is a low priority late in
> the release cycle.
> The changes  mostly address Andres's feedback above.

I took a look through this version.  (BTW, please add a version number
to the patchfile name in future, so people can keep track of which
one we're talking about.)

+src/backend/utils/misc/README.SIGNAL_HANDLING

Hmm ... I'm not very sure of a good place to put this, but utils/misc/
seems kinda random, because there's no code at all in there that's
particularly related to this topic.  One alternate suggestion is
utils/init/, which is at least related by virtue of the signal flags
being defined in globals.c.  Another idea is src/backend/tcop/, since
postgres.c is where the rubber meets the road for these issues,
at least in regular backends.

Also, "README.SIGNAL_HANDLING" seems both overly long and not very
consistent with the naming of other README files.  The ones that aren't
just "README" are

README.git
doc/src/sgml/README.links
src/backend/access/heap/README.HOT
src/backend/access/heap/README.tuplock
src/backend/access/transam/README.parallel
src/backend/libpq/README.SSL
src/backend/statistics/README.dependencies
src/backend/statistics/README.mcv
src/backend/storage/lmgr/README-SSI
src/backend/storage/lmgr/README.barrier
src/interfaces/ecpg/README.dynSQL
src/interfaces/ecpg/preproc/README.parser

so there's a pretty clear preference for "README.lowercasesomething".
Hence I suggest "README.signals".

+Implementation Notes on Globals and Signal/Event Handling
+=========================================================

I'd drop "Globals and", it's not very relevant here; it's surely not
the lead keyword.  Maybe "Signal Handling in Postgres" would be even
more apropos.

+The approch to signal handling in PostgreSQL is designed to strictly conform
+with the C89 standard and designed to run with as few platform assumptions as
+possible.

s/approch/approach/, s/conform with/conform to/, s/run with/make/ (or just
drop "and designed..." entirely; C spec compliance is as much as we need
to say).

+Most signals (except SIGSEGV, and SIGKILL) are blocked by PostgreSQL
+during process startup.

The parenthetical remark is wrong, or at least incomplete; there's other
stuff we don't block such as SIGBUS.  I think I'd skip any attempt to
list them here and just say "Most blockable signals are blocked ...".
(Also, reading down, I note you return to this topic where you mention
SIGILL.  Perhaps better to enlarge on never-blocked signals at that
point.)

+An exception is made for SIGQUIT which is used by the postmaster to terminate
+backend sessions quickly when another backend dies so that the postmaster
+may re-initialize shared memory and otherwise return to a known-good state.

It'd likely be worth the verbiage to enlarge on this.  There is a clear
risk of the SIGQUIT handler malfunctioning because it interrupts something
that's not safe to interrupt.  We live with this on the theory that we're
just trying to kill the process anyway, so corruption of its internal
state is tolerable; plus the postmaster has SIGKILL-after-a-timeout logic
in case a child's SIGQUIT handler gets completely stuck.

+CHECK_FOR_INTERRUPTS() may be called only when there is no more cleanup to do
+because the query or process may be aborted.

This seems pretty wrong or at least confusing.  Maybe better
"CHECK_FOR_INTERRUPTS must be called only in places where it is safe to
invoke transaction abort and/or process termination.  Because Postgres
generally expects backend subsystems to maintain enough state to clean
up after themselves at transaction abort, this is a weak restriction.
However, functions that execute only straight-line code are allowed to
put persistent variables into transiently inconsistent states; that's
safe exactly as long as no CHECK_FOR_INTERRUPTS is executed while the
process state is unsafe for transaction abort".

+SIGHUP  - Reload config file (Postmaster Only)

Nope, that's not postmaster-only.  Backends and most other child
processes also re-read the config file on SIGHUP.

+SIGINT  - (Postmaster) Fast shutdown, (Backend) Cancel Query
+SIGQUIT - (Postmaster) Immediate Shutdown, (Backend) Exit Immediately
+SIGTERM - Terminate backend gracefully

IIRC, SIGTERM is also Smart Shutdown in the postmaster; why isn't
this documented similarly to SIGINT/SIGQUIT?

+SIGUSR1 - Pending IPC or LWLock event

Probably better to explain this as a general purpose signal multiplexor,
because that's what it is.  LWLock etc are specific use-cases.

+SIGINFO - Parent died
+SIGPWR  - Parent died (alternative)

I was totally confused by this to start with, until grepping reminded me
that POSTMASTER_DEATH_SIGNAL is implememented as one or the other of
them.  It might be better to use that name in this documentation, and
then explain that under the hood it is SIGINFO, SIGPWR, or possibly
some other otherwise-unused signal.  "Parent died" seems inappropriately
general, too, because it's specifically the postmaster we care about.

+A few other signals, such as SIGUSR2 may have different meanings in different
+worker processes.  In the case of SIGUSR2, we use it to promote a postmaster,
+but also to exit a checkpointer or to stop replication.  In general CPU-based
+interrupts (such as SIGILL) are not caught or handled by PostgreSQL.

Not sure if we want such an incomplete listing of SIGUSR2 uses as that.
Also, the point about SIGILL surely should be a separate para?

+CHECK_FOR_INTERRUPTS() may cause a non-local exit because the function it wraps
+utilizes PostgreSQL's built-in exception framework (ereport) to abort queries
+and can call exit() to exit a orocess. CHECK_FOR_INTERRUPTS() MUST NOT be called
+in places where the current function assumes that cleanup of shared resources
+may be required, or where other problems with non-local exits are foreseen in the
+case of ordinary usage of the function.  In cases where expectations of normal
+safety is suspended (critical sections and the like) there is a mechanism for
+holding off interrupts, as documented where the HOLD_INTERRUPTS() macro is defined.

s/orocess/process/, also this seems mighty repetitive with what was
written earlier.  Maybe best for the earlier text to be very minimal
and just say use of CHECK_FOR_INTERRUPTS is explained below.

+Because InterruptPending will always be set when QueryCancelPending and
+ProcDiePending are set, checking it first is a useful operation where the number
+of possible calls are very high.  In cases where the number of calls are lower,
+this micro-optimization may be omitted.

This para seems pretty pointless.  Are there any such places?

            regards, tom lane



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: should there be a hard-limit on the number of transactionspending undo?
Next
From: Alvaro Herrera
Date:
Subject: Re: minor fixes after pgindent prototype fixes