Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role) - Mailing list pgsql-hackers

This thread evolved out of an attempt to implement
pg_terminate_backend for non-superusers.  I thought -- probably
erroneously -- that the major objection to that was the known
possibility of a PID-cycling race condition, whereby a signal could be
misdirected, in the case of terminate_backend, SIGTERM.  So now this
fork of the thread is about fixing these unlikely races, and then
passing administration requests (such as "query cancel" or "die" ) as
out-of-band information via SIGUSR1, just like how LISTEN/NOTIFY and
conflict signals are passed.

To prevent ambiguity, I am using a new special number -- a 'SessionId'
-- that is guaranteed unique to all backends ever created during the
uptime of a database.  This number is currently implemented in a way
that is guessable (so it cannot be accepted from external sources),
but I actually think it may be even more useful for a number of other
uses if given a non-guessable form (like cancellation keys).  In this
respect it would fulfill pretty much the same purposes as the notion
of a "session" seen on the web.

Noah offered me these comments:
> This patch still changes the policy for pg_terminate_backend(), and it does
> not fix other SIGINT senders like processCancelRequest() and ProcSleep().  If
> you're concerned about PID-reuse races, audit all backend signalling.  Either
> fix all such problems or propose a plan to get there eventually.

Is the postmaster signaling its children intrinsically vulnerable to
PID racing?  Because it controls when it can call wait() or waitpid()
on child processes, it can unambiguously know that PIDs have not been
cycled for use.  For this reason, a credible and entirely alternate
design might be to bounce IPC requests through the postmaster, but
since postmaster is so critical I had decided not to introduce nor
change mechanics there.

The Postmaster I think keeps a private copy of cancellation keys that
are not in shared memory, if I read it properly (not 100% sure), and
uses that for cancellation requests.  This has a useful property of
allowing cancellations even in event that shared memory goes insane
(and since postmaster is typically left as last sane process of the
group I thought it wise to not have it reuse a shared-memory based
approach).

I cannot comment on ProcSleep at this time.

> Currently, when pg_terminate_backend() follows a pg_cancel_backend() on which
> the target has yet to act, the eventual outcome is a terminated process.  With
> this patch, the pg_terminate_backend() becomes a no-op with this warning:
>
>> !                      ereport(WARNING,
>> !                                      (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> !                                       (errmsg("process is busy responding to administrative "
>> !                                                       "request")),
>> !                                       (errhint("This is temporary, and may be retried."))));
>
> That's less useful than the current behavior.

Yes. It could be fixed with dynamic allocation (holding more
administration requests), but for just getting a flavor of what a
solution might be like.  I wanted to avoid additional dynamic
allocation (which would necessitate a similar condition in the form of
much-less likely OOM), but at some point I think this error condition
is inevitable in some form.  I see it as akin to EAGAIN.  Right now,
administrative requests are so short (copying and clearing a handful
of words out of PGPROC) that it's unlikely that this would be a
problem in practice.

> That being said, I can't get too excited about closing PID-reuse races.  I've
> yet to see another program do so.  I've never seen a trouble report around
> this race for any software.  Every OS I have used assigns PIDs so as to
> maximize the reuse interval, which seems like an important POLA measure given
> typical admin formulae like "kill `ps | grep ...`".  This defense can only
> matter in fork-bomb conditions, at which point a stray signal is minor.
>
> I do think it's worth keeping this idea in a back pocket for achieving those
> "more complex behaviors," should we ever desire them.

I feel mostly feel this way, too: I have a non-specific itch to solve
the race -- and I'm much more interested in pg_terminate_backend, the
original subject at hand (I'll submit a patch that is more of a
policy-fix than a signal wrangling one to the old thread).

I was also interested in sketching both as solution to this problem
and bringing down some of the non-intrinsic difficulty in
experimenting with rich inter-backend communication.  For example,
query progress (itself a much more complex beast, obviously) could
probably make use of these administration requests (with an async-back
channel akin to NOTIFY).  Or printing MemoryContextStats.  Or quick
profiling toggling/performance counters on a backend run amok without
having the ability to run gdb -p and completely freezing the process.

Thoughts appreciated.  I've marked the commitfest patch as "rejected"
for the time being, but if one is interested in peering at the work,
it is here:

https://commitfest.postgresql.org/action/patch_view?id=825

--
fdr


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: foreign key locks, 2nd attempt
Next
From: Jeff Janes
Date:
Subject: Re: Memory usage during sorting