Re: Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role) - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role) |
Date | |
Msg-id | 20120318181020.GA21999@tornado.leadboat.com Whole thread Raw |
In response to | Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role) (Daniel Farina <daniel@heroku.com>) |
List | pgsql-hackers |
On Sat, Mar 17, 2012 at 05:28:11PM -0700, Daniel Farina wrote: > Noah offered me these comments: > > This patch still changes the policy for pg_terminate_backend(), and it does > > not fix other SIGINT senders like processCancelRequest() and ProcSleep(). ?If > > you're concerned about PID-reuse races, audit all backend signalling. ?Either > > fix all such problems or propose a plan to get there eventually. > > Is the postmaster signaling its children intrinsically vulnerable to > PID racing? Because it controls when it can call wait() or waitpid() > on child processes, it can unambiguously know that PIDs have not been > cycled for use. Agreed, for Unix anyway. > For this reason, a credible and entirely alternate > design might be to bounce IPC requests through the postmaster, but > since postmaster is so critical I had decided not to introduce nor > change mechanics there. Good point, but I also agree with your decision there. > The Postmaster I think keeps a private copy of cancellation keys that > are not in shared memory, if I read it properly (not 100% sure), and > uses that for cancellation requests. This has a useful property of > allowing cancellations even in event that shared memory goes insane > (and since postmaster is typically left as last sane process of the > group I thought it wise to not have it reuse a shared-memory based > approach). Yes. > > Currently, when pg_terminate_backend() follows a pg_cancel_backend() on which > > the target has yet to act, the eventual outcome is a terminated process. ?With > > this patch, the pg_terminate_backend() becomes a no-op with this warning: > > > >> ! ? ? ? ? ? ? ? ? ? ? ?ereport(WARNING, > >> ! ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > >> ! ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (errmsg("process is busy responding to administrative " > >> ! ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? "request")), > >> ! ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (errhint("This is temporary, and may be retried.")))); > > > > That's less useful than the current behavior. > > Yes. It could be fixed with dynamic allocation (holding more > administration requests), but for just getting a flavor of what a > solution might be like. I wanted to avoid additional dynamic > allocation (which would necessitate a similar condition in the form of > much-less likely OOM), but at some point I think this error condition > is inevitable in some form. I see it as akin to EAGAIN. Right now, > administrative requests are so short (copying and clearing a handful > of words out of PGPROC) that it's unlikely that this would be a > problem in practice. I nominally agree that the new race would be rare, but not rarer than the race this patch purposes to remove. You could also fix this by having the sender wait until the target is ready to accept an admin request. For the particular case of cancel/terminate, a terminate could overwrite a cancel; a cancel can reduce to a no-op when either request is pending. I share your interest in not tying a design to the narrow needs of cancel/terminate, though.
pgsql-hackers by date: