Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: On-demand running query plans using auto_explain and signals
Date
Msg-id CAFj8pRAbk-dMBYMrDJ3aUROCz4+zgt6gO9=rnWtV32h1jzPaHg@mail.gmail.com
Whole thread Raw
In response to Re: On-demand running query plans using auto_explain and signals  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: On-demand running query plans using auto_explain and signals  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
List pgsql-hackers


2015-09-03 22:06 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi




2015-09-03 18:30 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Wed, Sep 2, 2015 at 3:07 PM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
On Wed, Sep 2, 2015 at 3:04 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Well, maybe I'm missing something, but sh_mq_create() will just overwrite the contents of the struct, so it doesn't care about sender/receiver: only sh_mq_set_sender/receiver() do.

if you create sh_mq from scratch, then you can reuse structure.

Please find attached a v3.

It uses a shared memory queue and also has the ability to capture plans nested deeply in the call stack.  Not sure about using the executor hook, since this is not an extension...

The LWLock is used around initializing/cleaning the shared struct and the message queue, the IO synchronization is handled by the message queue itself.  After some testing with concurrent pgbench and intentionally deep recursive plpgsql functions (up to 700 plpgsql stack frames) I think this approach can work.  Unless there's some theoretical problem I'm just not aware of. :-)
 
Comments welcome!

I am not pretty happy from this design. Only one EXPLAIN PID/GET STATUS in one time can be executed per server - I remember lot of queries that doesn't handle CANCEL well ~ doesn't handle interrupt well, and this can be unfriendly. Cannot to say if it is good enough for first iteration. This is functionality that can be used for diagnostic when you have overloaded server and this risk looks too high (for me). The idea of receive slot can to solve this risk well (and can be used elsewhere). The difference from this code should not be too big - although it is not trivial - needs work with PGPROC. The opinion of our multiprocess experts can be interesting. Maybe I am too careful.

 



Other smaller issues:

* probably sending line by line is useless - shm_mq_send can pass bigger data when nowait = false
* pg_usleep(1000L); - it is related to single point resource

Some ideas:

* this code share some important parts with auto_explain (query stack) - and because it should be in core (due handling signal if I remember well), it can be first step of integration auto_explain to core.
 
 
--
Alex



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: On-demand running query plans using auto_explain and signals
Next
From: Alvaro Herrera
Date:
Subject: Re: pg_ctl/pg_rewind tests vs. slow AIX buildfarm members