Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From Shulgin, Oleksandr
Subject Re: On-demand running query plans using auto_explain and signals
Date
Msg-id CACACo5Rn4KMtY=RUTnW58-yj+MU2YsYNOXa-oeJTm_xFOE=reQ@mail.gmail.com
In response to Re: On-demand running query plans using auto_explain and signals  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: On-demand running query plans using auto_explain and signals
List pgsql-hackers
On Tue, Sep 29, 2015 at 8:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 29 September 2015 at 12:52, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
  
Hitting a process with a signal and hoping it will produce a meaningful response in all circumstances without disrupting its current task was way too naive. 

Hmm, I would have to disagree, sorry. For me the problem was dynamically allocating everything at the time the signal is received and getting into problems when that caused errors.

What I mean is that we need to move the actual EXPLAIN run out of ProcessInterrupts().  It can still be fine to trigger the communication with a signal.
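
A minimal sketch of that split, assuming a plain POSIX process rather than a PostgreSQL backend: the signal handler only records the request, and the plan text is copied out later from a safe point in the main loop. All names here (plan_requested, service_plan_request, and so on) are hypothetical and do not exist in PostgreSQL.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration; not PostgreSQL code. */

/* Set by the handler. Nothing else happens in signal context. */
static volatile sig_atomic_t plan_requested = 0;

static void
request_plan_handler(int signo)
{
    (void) signo;
    plan_requested = 1;
}

/* Install the handler for the signal used to request a plan. */
int
install_plan_request_handler(int signo)
{
    return signal(signo, request_plan_handler) == SIG_ERR ? -1 : 0;
}

/*
 * Called from a safe point in the main loop, outside signal context.
 * If a request is pending, copy the already-produced plan text into buf
 * (truncating if needed) and return 1; otherwise return 0.
 */
int
service_plan_request(const char *current_plan, char *buf, size_t buflen)
{
    size_t      n;

    if (!plan_requested)
        return 0;
    plan_requested = 0;

    assert(buflen > 0);
    n = strlen(current_plan);
    if (n >= buflen)
        n = buflen - 1;
    memcpy(buf, current_plan, n);
    buf[n] = '\0';
    return 1;
}
```

The handler touches only a sig_atomic_t flag, so it cannot fail or allocate; everything that can go wrong runs in normal context.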

* INIT - Allocate N areas of memory for use by queries, which can be expanded/contracted as needed. Keep a freelist of structures.
* OBSERVER - When requested, gain exclusive access to a diagnostic area, then allocate the designated process to that area, then send a signal
* QUERY - When the signal is received, dump an EXPLAIN ANALYZE to the allocated diagnostic area (set flag to show complete, set latch on observer)
* OBSERVER - process data in diagnostic area and then release area for use by next observation

If the EXPLAIN ANALYZE doesn't fit into the diagnostic chunk, LOG it as a problem and copy data only up to the size defined. Any other ERRORs that are caused by this process cause it to fail normally.
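
The four steps and the truncation rule above could be sketched roughly as follows. This is a single-process simulation under stated assumptions: a static array stands in for shared memory, there is no locking, signalling, or latch, and the pool size, area size, and all names (DiagArea, diag_acquire, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define N_DIAG_AREAS   4    /* hypothetical pool size */
#define DIAG_AREA_SIZE 32   /* hypothetical capacity, tiny on purpose */

typedef struct DiagArea
{
    struct DiagArea *next_free; /* freelist link, meaningful only while free */
    int         target_pid;     /* process being observed */
    bool        complete;       /* set by the query when the dump is done */
    bool        truncated;      /* plan did not fit into data[] */
    char        data[DIAG_AREA_SIZE];
} DiagArea;

static DiagArea areas[N_DIAG_AREAS];
static DiagArea *freelist;

/* INIT: put every area on the freelist. */
void
diag_init(void)
{
    freelist = NULL;
    for (int i = N_DIAG_AREAS - 1; i >= 0; i--)
    {
        areas[i].next_free = freelist;
        freelist = &areas[i];
    }
}

/* OBSERVER: claim an area for target_pid; NULL if all areas are busy. */
DiagArea *
diag_acquire(int target_pid)
{
    DiagArea   *a = freelist;

    if (a == NULL)
        return NULL;
    freelist = a->next_free;
    a->next_free = NULL;
    a->target_pid = target_pid;
    a->complete = false;
    a->truncated = false;
    a->data[0] = '\0';
    return a;
}

/* QUERY: copy the plan up to the area size and flag any truncation. */
void
diag_write_plan(DiagArea *a, const char *plan)
{
    size_t      len;

    assert(a != NULL);
    len = strlen(plan);
    if (len >= DIAG_AREA_SIZE)
    {
        len = DIAG_AREA_SIZE - 1;
        a->truncated = true;    /* real code would also LOG this */
    }
    memcpy(a->data, plan, len);
    a->data[len] = '\0';
    a->complete = true;         /* real code would also set the observer's latch */
}

/* OBSERVER: return the area to the freelist once the data is processed. */
void
diag_release(DiagArea *a)
{
    a->next_free = freelist;
    freelist = a;
}
```

Because nothing is allocated when the query writes its plan, the only failure mode on that side is truncation, which is recorded in the area itself.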

Do you envision problems if we do this with a newly allocated DSM every time instead of a pre-allocated area?  This would reverse the workflow, because only the QUERY knows the required segment size:

* OBSERVER - sends a signal and waits for its proc latch to be set
* QUERY - when signal is received allocates a DSM just big enough to fit the EXPLAIN plan, then locates the OBSERVER(s) and sets its latch (or their latches)

The EXPLAIN plan should already be produced somewhere in the executor, to avoid calling into explain.c from ProcessInterrupts().
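
A rough single-process sketch of this reversed workflow, with loud assumptions: malloc() stands in for creating a DSM segment, an int flag stands in for the proc latch, and the plan text is passed in as an already-produced string. PlanSegment, Observer, and both functions are hypothetical names, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a DSM segment sized exactly to the plan (hypothetical layout). */
typedef struct PlanSegment
{
    size_t      len;        /* plan length in bytes, excluding the terminator */
    char        text[];     /* plan text, NUL-terminated */
} PlanSegment;

/* Stand-in for the observer's side of the exchange. */
typedef struct Observer
{
    int         latch_set;  /* stand-in for the proc latch */
    PlanSegment *segment;   /* segment published by the query */
} Observer;

/*
 * QUERY side: the plan text already exists, so the required size is known.
 * Allocate just enough, publish the segment, then wake the observer.
 * Returns 0 on success, -1 if the allocation fails.
 */
int
query_publish_plan(Observer *obs, const char *plan_text)
{
    size_t      len;
    PlanSegment *seg;

    assert(obs != NULL && plan_text != NULL);
    len = strlen(plan_text);
    seg = malloc(sizeof(PlanSegment) + len + 1);
    if (seg == NULL)
        return -1;
    seg->len = len;
    memcpy(seg->text, plan_text, len + 1);
    obs->segment = seg;
    obs->latch_set = 1;
    return 0;
}

/*
 * OBSERVER side: once the latch is set, take ownership of the segment.
 * Returns NULL if nothing has been published; the caller frees the result.
 */
PlanSegment *
observer_collect(Observer *obs)
{
    PlanSegment *seg;

    if (!obs->latch_set)
        return NULL;
    seg = obs->segment;
    obs->segment = NULL;
    obs->latch_set = 0;
    return seg;
}
```

Unlike the pre-allocated variant, this one never truncates, but the allocation on the query side can fail and has to be handled there.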

That allows the observer to be another backend, or it allows the query process to perform self-observation based upon a timeout (e.g. >1 hour) or a row limit (e.g. when an optimizer estimate is seen to be badly wrong).
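
For illustration only, the self-observation trigger might look like the check below. The struct, its fields, and the thresholds are all assumptions made for this sketch; nothing here corresponds to existing executor code.

```c
#include <stdbool.h>

/* Hypothetical snapshot of a running query's progress. */
typedef struct ProgressSample
{
    double      elapsed_secs;   /* wall-clock time since the query started */
    double      rows_seen;      /* rows produced so far by some plan node */
    double      rows_estimated; /* optimizer's estimate for that node */
} ProgressSample;

/*
 * Return true if the query should dump its own plan: either it has run
 * longer than timeout_secs, or it has already produced more than
 * misestimate_factor times the estimated number of rows.
 */
bool
should_self_observe(const ProgressSample *s,
                    double timeout_secs,
                    double misestimate_factor)
{
    if (s->elapsed_secs > timeout_secs)
        return true;
    if (s->rows_estimated > 0 &&
        s->rows_seen > s->rows_estimated * misestimate_factor)
        return true;
    return false;
}
```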

Do you think there is one single best place in the executor code where such a check could be added?  I have very little idea about that.

--
Alex
