Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From: Shulgin, Oleksandr
Subject: Re: On-demand running query plans using auto_explain and signals
Msg-id: CACACo5StM7ikEEJi64B49k_zpUMXWVhdDKKyiHBS5tv+5S4VfA@mail.gmail.com
In response to: Re: On-demand running query plans using auto_explain and signals (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Fri, Sep 18, 2015 at 7:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Frankly, I think you guys are making this out to be way more
complicated than it really is.  Basically, I think the process being
queried should publish a DSM via a slot it owns.  The recipient is
responsible for reading it and then notifying the sender.  If a second
process requests data before the first process reads its data, the
second process can either (1) become an additional reader of the
already-published data or (2) wait for the first process to finish,
and then send its own inquiry.

There are some problems to solve here, but they hardly seem impossible.

Thank you, Robert, for your invaluable input on this patch!

I now believe that the use of ProcessInterrupts() in the recently proposed design, as well as the manipulation of the proc latch required by the shared memory queue, are major blockers.

In order to simplify things, I've taken a step back and looked again at the auto_explain contrib module.  I now propose the simplest thing possible, in my opinion: a new GUC option for auto_explain.  It will make every backend in which the module is loaded (via the *_preload_libraries mechanism or the CREATE EXTENSION/LOAD commands) actively publish the plans of its queries in dynamic shared memory, for as long as auto_explain.publish_plans is turned on.
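
For illustration, registering such an option in _PG_init() would look roughly like this (the C variable name is just for the sketch; the rest of the module's initialization is elided):

static bool publish_plans = false;

void
_PG_init(void)
{
    DefineCustomBoolVariable("auto_explain.publish_plans",
                             "Publish plans of running queries in dynamic shared memory.",
                             NULL,
                             &publish_plans,
                             false,
                             PGC_USERSET,
                             0,
                             NULL, NULL, NULL);

    /* ... install the executor hooks, request addin shared memory, etc. */
}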

The greatest part for me is that this approach doesn't require handling of signals and is isolated in an external module, so it can be readily used with current server versions: no need to wait for >= 9.6.

Some implementation details:

For every backend that might be running (MaxBackends) we reserve a dsm_handle slot in the addins shared memory.  When the new option is turned on, the ExecutorRun hook will produce a plan in whatever format is specified by auto_explain.log_format, allocate a DSM segment, copy the plan into the segment, and finally store the DSM handle in the backend's own slot.  No locking is required around this, because every backend writes to its slot exclusively; no other backend can be writing to it concurrently.  In the ExecutorFinish hook we invalidate the backend's slot by setting the handle to 0 and deallocate the DSM segment.
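
A simplified sketch of the publishing side (not the exact patch code; #includes and error handling omitted, and plan_slots stands for the shared dsm_handle array):

static dsm_handle   *plan_slots;        /* set up in the shmem startup hook */
static dsm_segment  *current_seg = NULL;

static void
publish_plan(QueryDesc *queryDesc)
{
    ExplainState *es = NewExplainState();

    es->format = auto_explain_log_format;
    ExplainBeginOutput(es);
    ExplainPrintPlan(es, queryDesc);
    ExplainEndOutput(es);

    current_seg = dsm_create(es->str->len + 1, 0);
    /* survive resource-owner cleanup (dsm_keep_mapping() on 9.4) */
    dsm_pin_mapping(current_seg);
    memcpy(dsm_segment_address(current_seg), es->str->data,
           es->str->len + 1);

    /* No locking: this backend is the only writer of its own slot. */
    plan_slots[MyBackendId - 1] = dsm_segment_handle(current_seg);
}

/* Called from the ExecutorFinish hook. */
static void
unpublish_plan(void)
{
    plan_slots[MyBackendId - 1] = 0;    /* invalidate the slot first */
    dsm_detach(current_seg);
    current_seg = NULL;
}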

Reading of the plan is performed by a newly added function, pg_explain_backend(PID).  Since it can determine the target process' backendId, it simply reads the DSM handle from that backend's slot and tries to attach to it (there's not much point in checking the handle for being non-zero first, because the other backend could release the segment the moment after we've checked, so we rely on dsm_attach returning non-NULL instead).  If attached successfully, we parse the contents and detach.  The last backend to detach is the one that actually releases the segment, due to reference counting.
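
The reading side, equally simplified:

Datum
pg_explain_backend(PG_FUNCTION_ARGS)
{
    int          pid = PG_GETARG_INT32(0);
    PGPROC      *proc = BackendPidGetProc(pid);
    dsm_segment *seg;
    text        *result;

    if (proc == NULL)
        PG_RETURN_NULL();

    /* dsm_attach() returns NULL if the segment is already gone */
    seg = dsm_attach(plan_slots[proc->backendId - 1]);
    if (seg == NULL)
        PG_RETURN_NULL();

    result = cstring_to_text((char *) dsm_segment_address(seg));
    dsm_detach(seg);    /* the last one to detach releases the segment */

    PG_RETURN_TEXT_P(result);
}

From psql it then becomes just "SELECT pg_explain_backend(12345);", with the PID picked from pg_stat_activity (the PID here is obviously made up).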

Handling of the plans of nested statements is an open question.  It can be really useful when the top-level plan simply displays a "SELECT my_stored_procedure()" and all the interesting stuff is happening behind the scenes, but I haven't started thinking about how this could be implemented yet.

Pavel was really interested in retrieving the complete query texts/plans, which could be over a megabyte in his case (and pg_stat_activity.query is capped at 10240 bytes, I believe).  This is now possible with the patch, but some others might still want to put a threshold on the allocation, especially given that this is shared memory.  I can envision another GUC for that, but in our experience extremely long queries (and plans) are most of the time due to the use of VALUES() or IN() clauses with a huge list of literals.

I think we could fold the VALUES/IN literal lists into "?" if the query/plan text exceeds the specified threshold, or even unconditionally (yes, pg_stat_statements, I'm looking at you), so that, for example, "IN (1, 2, 3, ..., 10000)" becomes "IN (?)".  This should help in the cases where the most interesting part is in the plan nodes near the end, but such a huge list of literals comes before it.

Future plans:

I believe this approach can be extended to enable instrumentation once again.  The backend executing the query could update the published plan every once in a while (for example, every N ms, or after every 1% of rows processed in a node), and any other process interested in this data could simply read it, without the need for signals or complex and fragile communication.  This obviously requires a core patch.
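
Purely hypothetically, the throttling could look like this (the hook point from which to call it is exactly what would need the core patch):

/* Republish the plan at most every 100 ms (hypothetical sketch). */
static instr_time last_publish_time;

static bool
should_republish(void)
{
    instr_time  now;

    INSTR_TIME_SET_CURRENT(now);
    INSTR_TIME_SUBTRACT(now, last_publish_time);
    if (INSTR_TIME_GET_MILLISEC(now) < 100)
        return false;

    INSTR_TIME_SET_CURRENT(last_publish_time);
    return true;
}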

Some problems:

There is a potential problem with the limited total number of DSM segments: it is reserved in a way that only allows 2 per backend (on average), plus 64 additional per server, so if you run with the new option enabled at all times, you're down to only 1 additional DSM segment per backend (again, on average).  Not sure how critical this can be, but no one is forced to run with this option enabled for all backends.
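
For reference, those numbers come straight from dsm.c:

/* src/backend/storage/ipc/dsm.c */
#define PG_DYNSHMEM_FIXED_SLOTS         64
#define PG_DYNSHMEM_SLOTS_PER_BACKEND   2
...
maxitems = PG_DYNSHMEM_FIXED_SLOTS
    + PG_DYNSHMEM_SLOTS_PER_BACKEND * MaxBackends;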

If you don't want to run it enabled at all times, then enabling the GUC per backend can be problematic.  It's still possible to update the conf file and send SIGHUP to a single backend, but that's harder to accomplish over psql, for example.  I think here we might still have some luck with the signals: use another array of per-backend slots with flags, set the target backend's flag and send it SIGUSR1.  The backend wakes up on the signal, examines its slot, and toggles the GUC if needed.  Sounds pretty safe, eh?

No documentation changes yet, waiting for your comments. :-)

Happy hacking!
--
Alex
