Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: On-demand running query plans using auto_explain and signals
Msg-id CAFj8pRDLc4QS9tyRGGBro5iGvZFb+kgzAcSRqkvSn9GTRF7ZcA@mail.gmail.com
In response to Re: On-demand running query plans using auto_explain and signals  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
List pgsql-hackers


2015-09-16 16:31 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Tue, Sep 15, 2015 at 11:00 AM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
On Mon, Sep 14, 2015 at 7:27 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:

... This way the receiver only writes to the slot and the sender only reads from it.

By the way, is it safe to assume atomic read/writes of dsm_handle (uint32)?  I would be surprised if not.

I don't see any reason why it should not work - only a few processes will be waiting for data at any time, so lost attach/detach shm operations should not be a problem.
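For concreteness, the slot can be as simple as this (only a hypothetical sketch - the names are not from any patch; dsm_handle is just a uint32, and an aligned 4-byte load should be atomic on every platform PostgreSQL supports):

#include "postgres.h"
#include "storage/dsm.h"

/* Hypothetical per-backend slot. */
typedef struct CmdStatusSlot
{
	volatile dsm_handle handle;		/* 0 means nothing published yet */
} CmdStatusSlot;

static CmdStatusSlot *CmdStatusSlots;	/* shared memory array, one per backend */

/* Receiver side: create the queue in a DSM segment, then publish its handle. */
static void
publish_handle(int my_index, dsm_segment *seg)
{
	CmdStatusSlots[my_index].handle = dsm_segment_handle(seg);
}

/* Sender side: read the published handle; no lock is taken here. */
static dsm_handle
read_handle(int target_index)
{
	return CmdStatusSlots[target_index].handle;
}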

Please see the attachment for an implementation of this approach.  The most surprising thing is that it actually works :)

One problem still remains when the requesting process sends its signal but the target process exits before it has a chance to handle it.  The requesting process then waits forever, because nobody ever attaches to or detaches from the queue.  We've discussed this before, and it looks like we need to introduce a timeout parameter to pg_cmdstatus()...

I've added the timeout parameter to the pg_cmdstatus call, and more importantly to the sending side of the queue, so that one can limit the potential effect of handling the interrupt in case something goes really wrong.
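The sending side is essentially a loop around a non-blocking send (a simplified sketch of the idea, not the patch code itself; a real version would also track the remaining time across iterations instead of restarting the deadline):

#include "postgres.h"
#include "miscadmin.h"
#include "storage/latch.h"
#include "storage/shm_mq.h"

/* Sketch: give up after timeout_ms instead of blocking in shm_mq_send(). */
static bool
send_with_timeout(shm_mq_handle *mqh, Size nbytes, void *data, long timeout_ms)
{
	for (;;)
	{
		shm_mq_result res = shm_mq_send(mqh, nbytes, data, true);	/* nowait */

		if (res == SHM_MQ_SUCCESS)
			return true;
		if (res == SHM_MQ_DETACHED)
			return false;		/* the peer has gone away */

		/* SHM_MQ_WOULD_BLOCK: wait for the peer to read, or for the deadline */
		if (WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
					  timeout_ms) & WL_TIMEOUT)
			return false;		/* timed out */

		ResetLatch(MyLatch);
		CHECK_FOR_INTERRUPTS();
	}
}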

I don't think introducing a new user-visible timer is a good idea.  This timer should be invisible.

My idea: send the signal and wait 1 sec, then check whether the target process is still alive, and stop if it is not.  Wait another 5 sec and check the target process again.  Stop if it is dead, or raise the warning "requested process doesn't communicate, waiting.." and keep going.  The final cancel should be left to statement_timeout.

What do you think about it?
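Roughly like this (only a sketch of the idea; the real code would of course stop waiting as soon as the reply actually arrives):

#include <signal.h>

#include "postgres.h"
#include "miscadmin.h"
#include "storage/latch.h"

/* Sketch: invisible timeout - poll target liveness while waiting for a reply. */
static void
wait_for_target(pid_t target_pid)
{
	const long	delays[] = {1000, 5000};	/* 1 sec, then 5 sec, in ms */
	int			i;

	for (i = 0; i < 2; i++)
	{
		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT, delays[i]);
		ResetLatch(MyLatch);

		if (kill(target_pid, 0) != 0)
			return;				/* target process is gone - stop */
	}

	ereport(WARNING,
			(errmsg("requested process doesn't communicate, waiting..")));

	/* keep waiting; the final cancel is left to statement_timeout */
}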
 

I've tested a number of possible scenarios with artificial delays in the reply, and with a cancel request or SIGTERM sent to the receiving side, and it all seems to work nicely thanks to the proper message queue detach event notification.  Still, I think it helps to know that there is a timeout in case the receiving side is really slow to read the message.

I've also decided we really ought to suppress any possible ERROR level messages generated during processing of the status request, in order not to prevent the originally running transaction from completing.  In this new version of the patch, the errors so caught are logged at LOG level and ignored.
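The pattern is the usual try/catch around the processing (a simplified sketch; ProcessCmdStatusRequest() is a stand-in name for the actual processing function, not something from the patch):

#include "postgres.h"
#include "utils/memutils.h"

/* Sketch: run the status-request processing, demoting any ERROR to LOG. */
static void
process_status_request_safely(void)
{
	MemoryContext oldcontext = CurrentMemoryContext;

	PG_TRY();
	{
		ProcessCmdStatusRequest();	/* hypothetical worker function */
	}
	PG_CATCH();
	{
		ErrorData  *edata;

		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();
		FlushErrorState();

		ereport(LOG,
				(errmsg("error while processing status request: %s",
						edata->message)));
		FreeErrorData(edata);
	}
	PG_END_TRY();
}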

I'm also submitting the instrumentation support as a separate patch on top of this one.  I'm not really fond of adding a bool parameter to InstrEndLoop, but for now I haven't found any better way.
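To sketch what I mean (hypothetical shape only - the actual patch may look different): the flag would let a mid-query report fold the current cycle into the totals without destroying the per-cycle state the executor still needs (a real version would also have to avoid double-counting on repeated calls):

#include "postgres.h"
#include "executor/instrument.h"

/* Hypothetical variant of InstrEndLoop(); startup/firsttuple handling omitted. */
void
InstrEndLoop(Instrumentation *instr, bool reset)
{
	if (!instr->running)
		return;

	/* accumulate per-cycle statistics into totals, as today */
	instr->total += INSTR_TIME_GET_DOUBLE(instr->counter);
	instr->ntuples += instr->tuplecount;
	instr->nloops += 1;

	if (reset)
	{
		/* original behavior: reset counters for the next cycle */
		instr->running = false;
		INSTR_TIME_SET_ZERO(instr->counter);
		instr->tuplecount = 0;
	}
}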

What I'm now thinking about is that we could probably extend this backend communication mechanism to query the GUC values in effect in a different backend, or even to set them.  The obvious candidate to check when you see some query misbehaving would be work_mem, for example.  Or we could provide a report of all settings that were overridden in the backend's session, to the effect of running something like this:

select * from pg_settings where context = 'user' and setting != reset_val;

This is a good idea.  Many times I have wanted to know the current settings of a running query.  We cannot use a simple query for this, because there is no easy way to pass a PID to the pg_settings view, but one could write a function pg_settings_pid() so that the usage would look like:

select * from pg_settings_pid(xxxx, possible other params) where ...

 

The obvious candidates to be set externally are the cmdstatus_analyze/instrument_* settings: once you've decided you want to turn them on, you'd rather do that carefully for a single backend than globally per cluster.  One can still modify postgresql.conf and then send SIGHUP to just a single backend, but I think a more direct way to alter the settings would be better.

I am 100% for the possibility of reading the settings.  But reconfiguring another process is too hot a coffee - it could be made available via an extension, but not via the commonly used tools.
 

In this light, should we rename the API to something like "backend control" instead of "command status"?  The SHOW/SET syntax could be extended to support remote setting retrieval/update.

Prepare the API, and this functionality can then be part of a reference implementation in contrib.

This patch should not have too large a scope, so that it can be finished in this release cycle.

Regards

Pavel

--
Alex

