Re: [HACKERS] Optional message to user when terminating/cancellingbackend - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [HACKERS] Optional message to user when terminating/cancellingbackend |
Date | |
Msg-id | 20180613191621.dl5ko3v5gjltxsbj@alap3.anarazel.de Whole thread Raw |
In response to | Re: [HACKERS] Optional message to user when terminating/cancellingbackend (Daniel Gustafsson <daniel@yesql.se>) |
Responses |
Re: [HACKERS] Optional message to user when terminating/cancellingbackend
|
List | pgsql-hackers |
Hi, Onder, I CCed you because it seems worthwhile to ensure that the relevant Citus code could use this instead of the gross hack you and I committed... On 2018-06-13 20:54:03 +0200, Daniel Gustafsson wrote: > > On 9 Apr 2018, at 23:55, Andres Freund <andres@anarazel.de> wrote: > > On 2017-06-20 13:01:35 -0700, Andres Freund wrote: > >> For extensions it'd also be useful if it'd be possible to overwrite the > >> error code. E.g. for citus there's a distributed deadlock detector, > >> running out of process because there's no way to interrupt lock waits > >> locally, and we've to do some ugly hacking to generate proper error > >> messages and code from another session. > > > > What happened to this request? Seems we're out of the crunch mode and > > could round the feature out a littlebit more… > > Revisiting old patches, I took a stab at this request. > > Since I don’t really have a use case for altering the sqlerrcode other than the > on that Citus.. cited, I modelled the API around that. Cool. Onder? > diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml > index b851fe023a..ad9763698f 100644 > --- a/doc/src/sgml/func.sgml > +++ b/doc/src/sgml/func.sgml > @@ -18540,7 +18540,7 @@ SELECT set_config('log_statement_stats', 'off', false); > <tbody> > <row> > <entry> > - <literal><function>pg_cancel_backend(<parameter>pid</parameter> <type>int</type>)</function></literal> > + <literal><function>pg_cancel_backend(<parameter>pid</parameter> <type>int</type> [, <parameter>message</parameter><type>text</type>])</function></literal> > </entry> > <entry><type>boolean</type></entry> > <entry>Cancel a backend's current query. This is also allowed if the > @@ -18565,7 +18565,7 @@ SELECT set_config('log_statement_stats', 'off', false); > </row> > <row> > <entry> > - <literal><function>pg_terminate_backend(<parameter>pid</parameter> <type>int</type>)</function></literal> > + <literal><function>pg_terminate_backend(<parameter>pid</parameter> <type>int</type> [, <parameter>message</parameter><type>text</type>])</function></literal> > </entry> > <entry><type>boolean</type></entry> > <entry>Terminate a backend. This is also allowed if the calling role > @@ -18596,6 +18596,14 @@ SELECT set_config('log_statement_stats', 'off', false); > The role of an active backend can be found from the > <structfield>usename</structfield> column of the > <structname>pg_stat_activity</structname> view. > + If the optional <literal>message</literal> parameter is set, the text > + will be appended to the error message returned to the signalled backend. > + <literal>message</literal> is limited to 128 bytes, any longer text > + will be truncated. An example where we cancel our own backend: > +<programlisting> > +postgres=# SELECT pg_cancel_backend(pg_backend_pid(), 'Cancellation message text'); > +ERROR: canceling statement due to user request: Cancellation message text > +</programlisting> > </para> I'm not sure I really like the appending bit. There's a security argument to be made about doing so, but from a user POV that mostly seems restrictive. I wonder if that aspect would be better handled by adding an error context that contains information about which process did the cancellation (or DETAIL?)? > +/* > + * Structure for registering a feedback payload to be sent to a cancelled, or > + * terminated backend. Each backend is registered per pid in the array which is > + * indexed by Backend ID. Reading and writing the message is protected by a > + * per-slot spinlock. > + */ > +typedef struct > +{ > + pid_t pid; This is the pid of the receiving process? If so, why do we need this in here? That's just duplicated data, no? > + slock_t mutex; > + char message[MAX_CANCEL_MSG]; > + int len; > + int sqlerrcode; I'd vote for including the pid of the process that did the cancelling in here. > +Datum > +pg_cancel_backend(PG_FUNCTION_ARGS) > +{ > + PG_RETURN_BOOL(pg_cancel_backend_internal(PG_GETARG_INT32(0), NULL)); > +} > + > +Datum > +pg_cancel_backend_msg(PG_FUNCTION_ARGS) > +{ > + pid_t pid; > + char *msg = NULL; > + > + if (PG_ARGISNULL(0)) > + PG_RETURN_NULL(); > + > + pid = PG_GETARG_INT32(0); > + > + if (PG_NARGS() == 2 && !PG_ARGISNULL(1)) > + msg = text_to_cstring(PG_GETARG_TEXT_PP(1)); > + > + PG_RETURN_BOOL(pg_cancel_backend_internal(pid, msg)); > +} Why isn't this just one function? Given that you already have a PG_NARGS check, I don't quite see the point? > /* > * Signal to terminate a backend process. This is allowed if you are a member > * of the role whose process is being terminated. > * > * Note that only superusers can signal superuser-owned processes. > */ > -Datum > -pg_terminate_backend(PG_FUNCTION_ARGS) > +static bool > +pg_terminate_backend_internal(pid_t pid, char *msg) > { > - int r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM); > + int r = pg_signal_backend(pid, SIGTERM, msg); > > if (r == SIGNAL_BACKEND_NOSUPERUSER) > ereport(ERROR, > @@ -146,7 +203,30 @@ pg_terminate_backend(PG_FUNCTION_ARGS) > (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > (errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend")))); > > - PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS); > + return (r == SIGNAL_BACKEND_SUCCESS); > +} > + > +Datum > +pg_terminate_backend(PG_FUNCTION_ARGS) > +{ > + PG_RETURN_BOOL(pg_terminate_backend_internal(PG_GETARG_INT32(0), NULL)); > +} > + > +Datum > +pg_terminate_backend_msg(PG_FUNCTION_ARGS) > +{ > + pid_t pid; > + char *msg = NULL; > + > + if (PG_ARGISNULL(0)) > + PG_RETURN_NULL(); > + > + pid = PG_GETARG_INT32(0); > + > + if (PG_NARGS() == 2 && !PG_ARGISNULL(1)) > + msg = text_to_cstring(PG_GETARG_TEXT_PP(1)); > + > + PG_RETURN_BOOL(pg_terminate_backend_internal(pid, msg)); > } > > /* > @@ -168,7 +248,6 @@ pg_reload_conf(PG_FUNCTION_ARGS) > PG_RETURN_BOOL(true); > } Same. > +/* > + * Set a message for the cancellation of the backend with the specified pid, > + * using the default sqlerrcode. > + */ > +int > +SetBackendCancelMessage(pid_t backend_pid, char *message) > +{ > + return backend_feedback(backend_pid, message, ERRCODE_QUERY_CANCELED); > +} > + > +/* > + * Set a message for the termination of the backend with the specified pid, > + * using the default sqlerrcode. > + */ > +int > +SetBackendTerminationMessage(pid_t backend_pid, char *message) > +{ > + return backend_feedback(backend_pid, message, ERRCODE_ADMIN_SHUTDOWN); > +} > + > +/* > + * Set both a message and a sqlerrcode for use when signalling the backend > + * with the specified pid. > + */ > +int > +SetBackendSignalFeedback(pid_t backend_pid, char *message, int sqlerrcode) > +{ > + return backend_feedback(backend_pid, message, sqlerrcode); > +} I'm not quite seeing the point of these variants. What advantage are they over just doing the same on the caller level? > +/* > + * Sets a cancellation message for the backend with the specified pid, and > + * returns the length of the message actually created. If the returned length > + * is less than the length of the message parameter, truncation has occurred. > + * If the backend isn't found, -1 is returned. If no message is passed, zero is > + * returned. If two backends collide in setting a message, the existing message > + * will be overwritten by the last one in. > + */ > +static int > +backend_feedback(pid_t backend_pid, char *message, int sqlerrcode) > +{ > + int i; > + int len; > + > + if (!message) > + return 1; > + > + len = pg_mbcliplen(message, strlen(message), MAX_CANCEL_MSG - 1); > + > + for (i = 0; i < MaxBackends; i++) > + { > + BackendSignalFeedbackShmemStruct *slot = &BackendSignalFeedbackSlots[i]; > + > + if (slot->pid != 0 && slot->pid == backend_pid) > + { > + SpinLockAcquire(&slot->mutex); > + if (slot->pid != backend_pid) > + { > + SpinLockRelease(&slot->mutex); > + goto error; > + } > + > + MemSet(slot->message, '\0', sizeof(slot->message)); > + memcpy(slot->message, message, len); This seems unnecessarily expensive. > + slot->len = len; > + slot->sqlerrcode = sqlerrcode; > + SpinLockRelease(&slot->mutex); > + > + if (len != strlen(message)) > + ereport(NOTICE, > + (errmsg("message is too long and has been truncated"))); > + return 0; > + } > + } So we made cancellation a O(N) proposition :/. Probably not too bad, but still... > +error: > + > + elog(LOG, "Cancellation message requested for missing backend %d by %d", > + (int) backend_pid, MyProcPid); > + > + return 1; > +} So we now do log spam if processes exit in the wrong moment? > +/* > + * Test whether there is feedback registered for the current backend that can > + * be consumed and presented to the user. > + */ > +bool > +HasBackendSignalFeedback(void) > +{ > + volatile BackendSignalFeedbackShmemStruct *slot = MyCancelSlot; > + bool has_message = false; > + > + if (slot != NULL) > + { > + SpinLockAcquire(&slot->mutex); > + has_message = ((slot->len > 0) && (slot->sqlerrcode != 0)); > + SpinLockRelease(&slot->mutex); > + } > + > + return has_message; > +} I'm somewhat uncomfortable with acquiring a mutex in an error path. We used to prcoess at least some interrupts in signal handlers in some cases - I think I removed all of them, but if not, we'd be in deep trouble here. Wonder if we shouldn't try to get around needing that... Greetings, Andres Freund
pgsql-hackers by date: