Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: On-demand running query plans using auto_explain and signals
Date
Msg-id CAFj8pRBWZbYpT-wRksbxcCUAHZ4+15JJDtq88heaxbXa-sPLiQ@mail.gmail.com
In response to Re: On-demand running query plans using auto_explain and signals  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
Responses Re: On-demand running query plans using auto_explain and signals  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
Hi




2015-09-03 18:30 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Wed, Sep 2, 2015 at 3:07 PM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
On Wed, Sep 2, 2015 at 3:04 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Well, maybe I'm missing something, but shm_mq_create() will just overwrite the contents of the struct, so it doesn't care about sender/receiver: only shm_mq_set_sender()/shm_mq_set_receiver() do.

If you create the shm_mq from scratch, then you can reuse the structure.

Please find attached a v3.

It uses a shared memory queue and also has the ability to capture plans nested deeply in the call stack.  Not sure about using the executor hook, since this is not an extension...

The LWLock is used around initializing/cleaning the shared struct and the message queue; the I/O synchronization is handled by the message queue itself.  After some testing with concurrent pgbench and intentionally deep recursive plpgsql functions (up to 700 plpgsql stack frames) I think this approach can work.  Unless there's some theoretical problem I'm just not aware of. :-)
 
Comments welcome!

I am not very happy with this design. Only one EXPLAIN PID/GET STATUS can be executed at a time per server. I remember a lot of queries that don't handle CANCEL well (that is, they don't handle interrupts well), and this can be unfriendly. I cannot say whether it is good enough for a first iteration. This is functionality that can be used for diagnostics when you have an overloaded server, and this risk looks too high (for me). The idea of a receive slot can solve this risk well. The difference from this code should not be too big, although it is not trivial: it needs work with PGPROC.
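
To illustrate the receive-slot idea, here is a minimal standalone sketch under my own assumptions, not code from the patch. It models one slot per receiving backend instead of one shared struct per server, so a request stuck on an unresponsive target ties up only the requester's own slot. The names ReceiveSlot, receive_slot_begin, receive_slot_end and MAX_BACKENDS are hypothetical; in the server this state would presumably live in or next to PGPROC in shared memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_BACKENDS 8   /* hypothetical stand-in for the real backend limit */
#define NO_TARGET    0   /* marks an idle slot */

/* Hypothetical per-backend slot, owned by the backend that receives the plan. */
typedef struct ReceiveSlot
{
    atomic_int target_pid;   /* PID being asked for its plan, or NO_TARGET */
} ReceiveSlot;

static ReceiveSlot receive_slots[MAX_BACKENDS];

/* Mark every slot as idle. */
static void
receive_slots_init(void)
{
    for (int i = 0; i < MAX_BACKENDS; i++)
        atomic_init(&receive_slots[i].target_pid, NO_TARGET);
}

/*
 * Start a request from backend `receiver` to the process `target_pid`.
 * Fails without blocking if this receiver already has a request in flight;
 * requests from other receivers are not affected.
 */
static bool
receive_slot_begin(int receiver, int target_pid)
{
    int expected = NO_TARGET;

    assert(receiver >= 0 && receiver < MAX_BACKENDS);
    assert(target_pid != NO_TARGET);
    return atomic_compare_exchange_strong(&receive_slots[receiver].target_pid,
                                          &expected, target_pid);
}

/* Finish (or abandon) the request owned by `receiver`. */
static void
receive_slot_end(int receiver)
{
    assert(receiver >= 0 && receiver < MAX_BACKENDS);
    atomic_store(&receive_slots[receiver].target_pid, NO_TARGET);
}
```

With this layout two different backends can each have a request outstanding at the same time, which the single shared struct does not allow.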



Other smaller issues:

* sending line by line is probably unnecessary - shm_mq_send can pass larger data when nowait = false
* pg_usleep(1000L); - this is related to the single shared resource
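
A small standalone sketch of the first point, under stated assumptions: fake_mq_send is a hypothetical stand-in that only counts calls and bytes, it is not shm_mq_send and does not reproduce its signature. It compares sending the plan text one line per message with sending the whole buffer in one blocking call.

```c
#include <stddef.h>
#include <string.h>

static size_t send_calls = 0;   /* number of queue sends issued */
static size_t bytes_sent = 0;   /* total payload bytes passed to the queue */

/* Hypothetical stand-in for a blocking queue send; NOT the PostgreSQL API. */
static void
fake_mq_send(const void *data, size_t nbytes)
{
    (void) data;
    send_calls++;
    bytes_sent += nbytes;
}

/* One send per line of the plan text (newline included in each message). */
static void
send_line_by_line(const char *plan)
{
    const char *p = plan;

    while (*p != '\0')
    {
        const char *nl = strchr(p, '\n');
        size_t      len = nl ? (size_t) (nl - p) + 1 : strlen(p);

        fake_mq_send(p, len);
        p += len;
    }
}

/* One send for the whole plan text. */
static void
send_whole(const char *plan)
{
    fake_mq_send(plan, strlen(plan));
}
```

Both variants deliver the same number of bytes; the difference is only the number of send calls, which grows with the number of plan lines in the first variant.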

Some ideas:

* this code shares some important parts with auto_explain (the query stack), and because it should be in core (due to signal handling, if I remember well), it can be the first step of integrating auto_explain into core.
 
 
--
Alex

