Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From Shulgin, Oleksandr
Subject Re: On-demand running query plans using auto_explain and signals
Date
Msg-id CACACo5R3ELNEAyfXd5-Nv_WHk3UwfeYiFhGTOsLb=ecuvs4CSA@mail.gmail.com
In response to Re: On-demand running query plans using auto_explain and signals  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: On-demand running query plans using auto_explain and signals  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Mon, Sep 14, 2015 at 11:53 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 09/14/2015 10:23 AM, Shulgin, Oleksandr wrote:
On Sat, Sep 12, 2015 at 11:50 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
...
    - Attempts to get plan for simple insert queries like this

           INSERT INTO x SELECT * FROM x;

       end with a segfault, because ActivePortal->queryDesc is 0x0 for this
       query. Needs investigation.


Yes, I've hit the same problem after submitting the latest version of
the patch.  For now I've just added a check for queryDesc not being
NULL, but I guess the top of the current_query_stack might contain
something useful.  Something I need to try.

Well, the thing is we're able to do EXPLAIN on those queries, and IIRC auto_explain can log them too. So perhaps look into the hooks where they take the queryDesc in those cases - it has to be available somewhere.

Yes, this seems to work.  I was able to successfully retrieve plans for "create table as" and even "explain analyze" queries.  In both cases ActivePortal doesn't have a valid queryDesc, but the executor hook captures one.

    - The lockless approach seems fine to me, although I think the fear
       of performance issues is a bit moot (I don't think we expect large
       number of processes calling pg_cmdstatus at the same time). But
       it's not significantly more complex, so why not.


I believe the main benefit of the less-locking approach is that if
something goes wrong while two backends are communicating, it doesn't
prevent the rest of them from operating, because there is no shared (and
thus locked) communication channel.

Sure, but I think it really deserves a bit more detailed explanation of the protocol, and discussion of the expected behavior for some basic failure types.

For example - what happens when a backend requests info, but dies before receiving it, and the backend is reused for another connection? Doesn't this introduce a race condition? Perhaps not, I'm just asking.

Now explained better in code comments.

The worst thing that could happen in this case, I believe, is that the requesting backend will not receive any response from the second use of its slot, because the slot has been marked as processed by the backend that was asked first:

A:
slot->is_processed = false;
slot->is_valid = true;
/* signals backend B */;
shm_mq_read(); /* blocks */

B: /* finds the slot and prepares the response */

A: /* decides to bail out */
InvalidateCmdStatusSlot();
shm_mq_detach();
/* exits pg_cmdstatus() */

/* pg_cmdstatus() is called again */
/* initializes the slot and signals some backend */
shm_mq_read(); /* blocks */

B: /* completes preparing the response */
slot->is_processed = true;
shm_mq_send() => SHM_MQ_DETACHED
slot->is_valid = false;
/* gives up with this slot */

Now the backend that is signaled on the second call to pg_cmdstatus (which can be either some other backend, or backend B again) will not find an unprocessed slot, so it will not try to attach to or detach from the queue, and backend A will block forever.

This requires really bad timing, and the user should still be able to interrupt the querying backend A.

In any case, the backends that are being asked to send the info will be able to notice the problem (receiver detached early) and handle it gracefully.

    - The patch contains pretty much no documentation, both comments
       at the code level and user docs. The lack of user docs is not that
       big a deal at this point (the patch seems mature enough, although
       the user-level API will likely change).

       The lack of code comments is more serious, as it makes the review
       somewhat more difficult. For example it'd be very nice to document
       the contract for the lock-less interface.


I will add the code comments.  The user docs could wait before we decide
on the interface, I think.

Agreed, although I think having rudimentary user documentation would be useful for the reviewers - a summary of the goals that are a bit scattered over the e-mail thread.

OK

    - I agree that pg_cmdstatus() is not the best API. Having something
       like EXPLAIN PID would be nice, but it does not really work for
       all the request types (just CMD_STATUS_REQUEST_EXPLAIN). Maybe
       there's not a single API for all cases, i.e. we should use EXPLAIN
       PID for one case and invent something different for the other?


I can think of something like:

EXPLAIN [ ( option [, ...] ) ] PROCESS <PID>;

where option is extended with:

   QUERY
   PROGRESS
   BACKTRACE

in addition to the usual ANALYZE, VERBOSE, FORMAT, etc.

Seems OK. I'm not quite sure what PROGRESS is - I assume it's the same thing as ANALYZE? Why not use that keyword, then?

This is what CMD_INFO_PROGRESS_TAG does: return the "partial completion" command tag, like "INSERT 12345", reflecting the number of tuples processed up to this point in time.

    - Is there a particular reason why we allocate slots for auxiliary
       processes and not just for backends (NumBackends)? Do we expect those
       auxiliary processes to ever use this API?


If we extend the interface to a more general one, there still might be
some room for querying the status of the checkpointer or bgwriter.

I don't think we should mix this with monitoring of auxiliary processes. This interface is designed for monitoring SQL queries running in other backends, effectively a "remote" EXPLAIN. But those auxiliary processes are not processing SQL queries at all; they're not even using the regular executor ...

OTOH the ability to request this info (e.g. auxiliary process looking at plans running in backends) seems useful, so I'm ok with tuple slots for auxiliary processes.

Now that I think about it, reserving slots for aux processes doesn't let us query their status; it's the other way round: if we don't reserve them, an aux process would not be able to query any other process for its status. Likely this is not a problem at all, so we can remove these extra slots.

    - CleanupCmdStatusSlot seems needlessly complicated. I don't quite see
       the need for the second argument, or the internal slot variable. Why
       not simply use MyCmdStatusSlot directly?


Good point.

    - I also don't quite understand why we need to track css_pid for the
       slot? In what scenario will this actually matter?


I think it's only used for error reporting and could help in
debugging, but for now that's it.

I recommend getting rid of it unless we have a clear idea of how to use it. Otherwise I envision we won't really keep it updated properly (because it's "just for debugging"), and then one day someone actually starts using it and gets bitten by it.

The design was copied over from procsignal.c, where the pss_pid is used to "claim" the procsignal slot.  There it's used for searching the target process slot when backend id is not known.  We don't use that in cmd status slots because of the inverted slots logic, so we could really remove this field.

    - While being able to get EXPLAIN from the running process is nice,
       I'm especially interested in getting EXPLAIN ANALYZE to get insight
       into the progress of the execution. There are other ways to get the
       EXPLAIN, e.g. by opening a different connection and actually running
       it (sure, the plan might have changed since then), but currently
       there's no way to get insight into the progress.

       From the thread I get the impression that Oleksandr also finds this
       useful - correct? What are the plans in this direction?

       ISTM we need at least two things for that to work:

       (a) Ability to enable instrumentation on all queries (effectively
           what auto_explain allows), otherwise we can't get EXPLAIN ANALYZE
           on the queries later. But auto_explain is an extension, so that
           does not seem like a good match if this is supposed to be in core.
           In that case a separate GUC seems appropriate.

       (b) Being able to run the InstrEnd* methods repeatedly - the initial
           message in this thread mentions issues with InstrEndLoop for
           example. So perhaps this is non-trivial.


I was able to make this work with a simple change to InstrEndLoop and
the callers.  Basically, adding a bool parameter in_explain and passing
an appropriate value.  I guess that's not the best approach, but it
appears to work.

Adding a GUC to enable instrumentation sounds reasonable.

Nice!


Do you believe it makes sense to add instrumentation support in this
same patch or better focus on making the simplest thing work first?

Let's make it a "patch series" with two patches.

OK, I'll submit them later today after addressing all the comments.

    - And finally, I think we should really support all existing EXPLAIN
       formats, not just text. We need to support the other formats (yaml,
       json, xml) if we want to use the EXPLAIN PID approach, and it also
       makes the plans easier to process by additional tools.


Sure, that was in my plans (and see above for possible syntax).  What
would be really neat is retrieving the complete backtrace.  Not sure
what the good interface would look like, but using JSON format for the
output sounds promising.

I don't quite see the reason to encode everything as JSON, because that makes it a bit difficult for clients that would prefer YAML, for example. Why not just use the requested format? For example in YAML, we can do something like this:

            QUERY PLAN
----------------------------------
 - Plan[0]:                      +
     Node Type: "Index Only Scan"+
     Scan Direction: "Forward"   +
     ...                         +
                                 +
 - Plan[1]:                      +
     Node Type: "Index Only Scan"+
     Scan Direction: "Forward"   +
     ...                         +
                                 +
 - Plan[2]:                      +
     Node Type: "Index Only Scan"+
     Scan Direction: "Forward"   +
     ...                         +

and similarly for other formats. We don't really use nesting.

You're right, the nesting structure is too simple to require a certain format.  Do you have ideas about representing stack frames in TEXT format?  Something like what gdb does with #0, #1, etc. markers for frame depth?

--
Alex
