Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From: Shulgin, Oleksandr
Subject: Re: On-demand running query plans using auto_explain and signals
Date:
Msg-id: CACACo5SixNu1UQyfOvZsDiMMcT3HkeT2_a-vdG3SE-OWWzLbzA@mail.gmail.com
In response to: Re: On-demand running query plans using auto_explain and signals ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
Responses: Re: On-demand running query plans using auto_explain and signals (Pavel Stehule <pavel.stehule@gmail.com>)
Re: On-demand running query plans using auto_explain and signals (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Tue, Sep 15, 2015 at 11:00 AM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
On Mon, Sep 14, 2015 at 7:27 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:

... This way the receiver only writes to the slot and the sender only reads from it.

By the way, is it safe to assume atomic reads/writes of dsm_handle (uint32)?  I would be surprised if not.

I don't see any reason why it should not work - only a few processes will wait for data at a time - so the attach/detach shm operations will not add much overhead.

Please see attached for an implementation of this approach.  The most surprising thing is that it actually works :)
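To sketch the shape of it (hypothetical names and a trimmed-down layout here; the attached patch is the authoritative version): the target backend only ever stores the DSM handle into its slot, the requesting backend only ever loads it, and since a dsm_handle is an aligned 4-byte value, that store/load pair needs no locking:

#include "postgres.h"
#include "storage/dsm.h"

/*
 * One slot per backend: the signal handler side only ever writes the
 * handle, the requesting side only ever reads it.  Aligned 4-byte
 * loads and stores are atomic on every platform we support.
 */
typedef struct CmdStatusSlot
{
	volatile pid_t		target_pid;	/* backend this slot belongs to */
	volatile dsm_handle	handle;		/* segment holding the message queue */
} CmdStatusSlot;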

One problem still remains: if the target process exits before it gets a chance to handle the signal, the requesting process waits forever, because nobody ever attaches to or detaches from the queue.  We've discussed this before, and it looks like we need to introduce a timeout parameter to pg_cmdstatus()...

I've added the timeout parameter to the pg_cmdstatus call and, more importantly, to the sending side of the queue, so that one can limit the potential effect of handling the interrupt in case something goes really wrong.
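On the sending side this boils down to a loop around the non-blocking send with a latch wait, roughly like the following (a simplified sketch: the function name is made up and the remaining-time bookkeeping is omitted):

#include "postgres.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/latch.h"
#include "storage/shm_mq.h"

/* Sketch: give up on a reply if the receiver doesn't drain the queue. */
static void
send_reply_with_timeout(shm_mq_handle *mqh, const void *data, Size nbytes,
						long timeout_ms)
{
	for (;;)
	{
		shm_mq_result res;
		int			rc;

		res = shm_mq_send(mqh, nbytes, data, true);		/* nowait */
		if (res != SHM_MQ_WOULD_BLOCK)
			break;				/* sent, or the peer has detached */

		rc = WaitLatch(MyLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
					   timeout_ms);
		ResetLatch(MyLatch);

		if (rc & WL_POSTMASTER_DEATH)
			proc_exit(1);
		if (rc & WL_TIMEOUT)
			break;				/* receiver too slow to read: give up */
	}
}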

I've tested a number of scenarios with artificial delays in the reply, and with cancel requests or SIGTERM sent to the receiving side, and it all seems to work nicely thanks to the message queue's detach event notification.  Still, I think it helps to know there is a timeout in case the receiving side is really slow to read the message.
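(What makes the cancel/SIGTERM cases work with no extra code is that either end learns about a peer detach immediately; e.g. on the requesting side, something along these lines, with mqh set up as before:)

	shm_mq_result res;
	Size		nbytes;
	void	   *data;

	/* Blocking receive: returns at once if the sender detaches or dies. */
	res = shm_mq_receive(mqh, &nbytes, &data, false);
	if (res == SHM_MQ_DETACHED)
		ereport(ERROR,
				(errmsg("lost connection to the backend being queried")));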

I've also decided we really ought to suppress any ERROR-level messages generated during the processing of the status request, in order not to prevent the originally running transaction from completing.  In this new version of the patch, the errors so caught are just logged at LOG level and ignored.
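Concretely, the request processing is wrapped in the usual PG_TRY()/PG_CATCH() dance, roughly like this (ProcessCmdStatusRequest is a stand-in name for the actual worker routine):

#include "postgres.h"

extern void ProcessCmdStatusRequest(void);	/* stand-in */

static void
HandleCmdStatusRequest(void)
{
	MemoryContext oldcontext = CurrentMemoryContext;

	PG_TRY();
	{
		ProcessCmdStatusRequest();
	}
	PG_CATCH();
	{
		ErrorData  *edata;

		/* Return to a sane memory context before copying the error. */
		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();
		FlushErrorState();

		/* Demote to LOG so the running transaction can still complete. */
		ereport(LOG,
				(errmsg("error while processing command status request: %s",
						edata->message)));
		FreeErrorData(edata);
	}
	PG_END_TRY();
}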

I'm also submitting the instrumentation support as a separate patch on top of this.  I'm not really fond of adding a bool parameter to InstrEndLoop, but so far I haven't found a better way.
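Just to illustrate the idea (this is a sketch, not the patch itself; the in-flight caller is expected to operate on a copy of the node's Instrumentation so the live counters stay untouched):

#include "postgres.h"
#include "executor/instrument.h"

void
InstrEndLoop(Instrumentation *instr, bool in_flight)
{
	double		totaltime;

	if (!instr->running)
		return;

	/* Normally ending a loop mid-cycle is a bug; tolerate it in-flight. */
	if (!in_flight && !INSTR_TIME_IS_ZERO(instr->starttime))
		elog(ERROR, "InstrEndLoop called on running node");

	/* Fold the current cycle into the totals. */
	totaltime = INSTR_TIME_GET_DOUBLE(instr->counter);
	instr->startup += instr->firsttuple;
	instr->total += totaltime;
	instr->ntuples += instr->tuples;
	instr->nloops += 1;

	/* Reset per-cycle state only when the loop has really ended. */
	if (!in_flight)
	{
		instr->running = false;
		INSTR_TIME_SET_ZERO(instr->starttime);
		INSTR_TIME_SET_ZERO(instr->counter);
		instr->firsttuple = 0;
		instr->tuples = 0;
	}
}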

What I'm thinking about now is that we could probably extend this backend communication mechanism to query the GUC values in effect in a different backend, or even to set them.  The obvious candidate to check when you see a query misbehaving would be work_mem, for example.  Or we could provide a report of all settings that were overridden in the backend's session, to the effect of running something like this:

select * from pg_settings where context = 'user' and setting != reset_val;

The obvious candidates to be set externally are the cmdstatus_analyze/instrument_* settings: when you decide you want to turn them on, you'd rather do that carefully for a single backend than globally per cluster.  One can still modify postgresql.conf and then send SIGHUP to just a single backend, but I think a more direct way to alter the settings would be better.

In this light, should we rename the API to something like "backend control" instead of "command status"?  The SHOW/SET syntax could then be extended to support remote setting retrieval and update.

--
Alex

