Hitting a process with a signal and hoping it will produce a meaningful response in all circumstances without disrupting its current task was way too naive.
Hmm, I would have to disagree, sorry. For me the problem was dynamically allocating everything at the time the signal is received and getting into problems when that caused errors.
What I mean is that we need to move the actual EXPLAIN run out of ProcessInterrupts(). It can be still fine to trigger the communication with a signal.
* INIT - Allocate N areas of memory for use by queries, which can be expanded/contracted as needed. Keep a freelist of structures.
* OBSERVER - When requested, gain exclusive access to a diagnostic area, then allocate the designated process to that area, then send a signal
* QUERY - When signal received dump an EXPLAIN ANALYZE to the allocated diagnostic area, (set flag to show complete, set latch on observer)
* OBSERVER - process data in diagnostic area and then release area for use by next observation
If the EXPLAIN ANALYZE doesn't fit into the diagnostic chunk, LOG it as a problem and copy data only up to the size defined. Any other ERRORs that are caused by this process cause it to fail normally.
Do you envision problems if we do this with a newly allocated DSM every time instead of pre-allocated area? This will have to revert the workflow, because only the QUERY knows the required segment size:
OBSERVER - sends a signal and waits for its proc latch to be set
QUERY - when signal is received allocates a DSM just big enough to fit the EXPLAIN plan, then locates the OBSERVER(s) and sets its latch (or their latches)
The EXPLAIN plan should already be produced somewhere in the executor, to avoid calling into explain.c from ProcessInterrupts().
That allows the observer to be another backend, or it allows the query process to perform self-observation based upon a timeout (e.g. >1 hour) or a row limit (e.g. when an optimizer estimate is seen to be badly wrong).
Do you think there is one single best place in the executor code where such a check could be added? I have very little idea about that.