Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: On-demand running query plans using auto_explain and signals
Date
Msg-id 55F6B95E.4080507@2ndquadrant.com
In response to Re: On-demand running query plans using auto_explain and signals  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
Responses Re: On-demand running query plans using auto_explain and signals  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: On-demand running query plans using auto_explain and signals  ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
List pgsql-hackers

On 09/14/2015 01:15 PM, Shulgin, Oleksandr wrote:
> On Mon, Sep 14, 2015 at 11:53 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>
>
>     On 09/14/2015 10:23 AM, Shulgin, Oleksandr wrote:
>
>         On Sat, Sep 12, 2015 at 11:50 AM, Tomas Vondra
>         <tomas.vondra@2ndquadrant.com> wrote:
>
>     ...
>
>         - Attempts to get a plan for simple insert queries like this
>
>               INSERT INTO x SELECT * FROM x;
>
>           end with a segfault, because ActivePortal->queryDesc is 0x0
>           for this query. Needs investigation.
>
>
>         Yes, I've hit the same problem after submitting the latest
>         version of the patch.  For now I've just added a check that
>         queryDesc is not NULL, but I guess the top of
>         current_query_stack might contain something useful.  That's
>         something I still need to try.
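(For reference, such a guard might look like the sketch below; ActivePortal
and its queryDesc field are the actual portal globals, but the error message
is made up for illustration:)

    /* bail out gracefully if the active portal carries no executor state */
    if (ActivePortal == NULL || ActivePortal->queryDesc == NULL)
        ereport(ERROR,
                (errmsg("no query plan available for this backend")));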
>
>
>     Well, the thing is we're able to do EXPLAIN on those queries, and
>     IIRC auto_explain can log them too. So perhaps look into the hooks
>     where they take the queryDesc in those cases - it has to be
>     available somewhere.
>
>
> Yes, this seems to work.  I was able to successfully retrieve plans for
> "create table as" and even "explain analyze" queries.  In both cases
> ActivePortal doesn't have a valid queryDesc, but the executor hook
> captures one.
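(A minimal sketch of capturing the queryDesc via the standard executor
hooks; the hook API below is the stock ExecutorStart hook, but the function
and variable names are illustrative, not taken from the actual patch:)

    #include "postgres.h"
    #include "executor/executor.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    static ExecutorStart_hook_type prev_ExecutorStart = NULL;

    /* illustrative: remember the executing query's descriptor */
    static QueryDesc *current_query_desc = NULL;

    static void
    cmdstatus_ExecutorStart(QueryDesc *queryDesc, int eflags)
    {
        current_query_desc = queryDesc;

        if (prev_ExecutorStart)
            prev_ExecutorStart(queryDesc, eflags);
        else
            standard_ExecutorStart(queryDesc, eflags);
    }

    void
    _PG_init(void)
    {
        prev_ExecutorStart = ExecutorStart_hook;
        ExecutorStart_hook = cmdstatus_ExecutorStart;
    }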
>
>         - The lockless approach seems fine to me, although I think
>           the fear of performance issues is a bit moot (I don't think
>           we expect a large number of processes calling pg_cmdstatus
>           at the same time). But it's not significantly more complex,
>           so why not.
>
>
>         I believe the main benefit of the less-locking approach is
>         that if something goes wrong while two backends are
>         communicating, it doesn't prevent the rest of them from
>         operating, because there is no shared (and thus locked)
>         communication channel.
>
>
>     Sure, but I think it really deserves a bit more detailed explanation
>     of the protocol, and discussion of the expected behavior for some
>     basic failure types.
>
>     For example - what happens when a backend requests info, but dies
>     before receiving it, and the backend is reused for another
>     connection? Doesn't this introduce a race condition? Perhaps not,
>     I'm just asking.
>
>
> Now explained better in code comments.
>
> The worst thing that could happen in this case, I believe, is that the
> requesting backend will not receive any response on the second use of
> its slot, because the slot was already marked as processed by the
> backend that was asked first:
>
> A:
> slot->is_processed = false;
> slot->is_valid = true;
> /* signals backend B */;
> shm_mq_read(); /* blocks */
>
> B: /* finds the slot and prepares the response */
>
> A: /* decides to bail out */
> InvalidateCmdStatusSlot();
> shm_mq_detach();
> /* exits pg_cmdstatus() */
>
> /* pg_cmdstatus() is called again */
> /* initializes the slot and signals some backend */
> shm_mq_read(); /* blocks */
>
> B: /* completes preparing the response */
> slot->is_processed = true;
> shm_mq_send() => SHM_MQ_DETACHED
> slot->is_valid = false;
> /* gives up with this slot */
>
> Now the backend that has been signaled on the second call to
> pg_cmdstatus (it can be either some other backend, or backend B
> again) will not find an unprocessed slot, so it will not try to
> attach to or detach from the queue, and backend A will block forever.
>
> This requires really bad timing, and the user should still be able to
> interrupt the querying backend A.
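(For reference, B's give-up path in the trace corresponds to checking the
result of shm_mq_send(); a sketch, where the slot fields follow the trace
above and mqh/nbytes/data are assumed locals:)

    shm_mq_result res;

    res = shm_mq_send(mqh, nbytes, data, false);    /* nowait = false */
    if (res == SHM_MQ_DETACHED)
    {
        /*
         * The requesting backend detached (or died) before reading the
         * response: mark the slot invalid and give up on it.
         */
        slot->is_valid = false;
    }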

I don't think we can rely on this having a low probability, nor on people 
interrupting the backend manually. It should be possible to detect the 
situation and fail gracefully.

It may be possible to introduce some lock-less protocol preventing such 
situations, but it's not there at the moment. If you believe it's 
possible, you need to explain and "prove" that it's actually safe.

Otherwise we may need some basic locking - for example an LWLock for each 
slot, acquired with dontWait=true (skipping the slot if we can't get the 
lock). This should prevent most scenarios where one corrupted slot blocks 
many processes.
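A sketch of what I have in mind, assuming each slot grows a (purely 
hypothetical) lock field; in LWLock terms, dontWait maps to 
LWLockConditionalAcquire():

    /* hypothetical per-slot lock added to the slot struct */
    if (LWLockConditionalAcquire(slot->lock, LW_EXCLUSIVE))
    {
        /* got the lock: safe to (re)initialize and use the slot */
        slot->is_processed = false;
        slot->is_valid = true;
        /* ... signal the target backend, wait for the response ... */
        LWLockRelease(slot->lock);
    }
    else
    {
        /* slot busy (or stuck): give up without blocking */
        ereport(ERROR,
                (errmsg("command status slot is currently in use")));
    }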

>
> In any case, the backends that are being asked to send the info will be
> able to notice the problem (receiver detached early) and handle it
> gracefully.

Ummm, how? Maybe I missed something?


>     I don't think we should mix this with monitoring of auxiliary
>     processes. This interface is designed for monitoring SQL queries
>     running in other backends, effectively a "remote" EXPLAIN. But
>     those auxiliary processes are not processing SQL queries at all;
>     they're not even using the regular executor ...
>
>     OTOH the ability to request this info (e.g. auxiliary process
>     looking at plans running in backends) seems useful, so I'm ok with
>     tuple slots for auxiliary processes.
>
>
> Now that I think about it, reserving slots for aux processes doesn't
> let us query their status; it's the other way round.  If we don't
> reserve them, then an aux process would not be able to query any other
> process for its status.  Likely this is not a problem at all, so we
> can remove these extra slots.

I don't know. I can imagine using this from background workers, but I 
think those are counted as regular backends (not sure though).

>
>         - I also don't quite understand why we need to track css_pid
>           for the slot? In what scenario will this actually matter?
>
>
>         I think it's only being used for error reporting and could
>         help in debugging, but for now that's it.
>
>
>     I recommend getting rid of it, unless we have a clear idea of how
>     to use it. Otherwise I envision we won't really keep it updated
>     properly (because it's "just for debugging"), and then one day
>     someone actually starts using it and gets bitten by it.
>
>
> The design was copied over from procsignal.c, where pss_pid is used
> to "claim" the procsignal slot.  There it's used to find the target
> process's slot when the backend ID is not known.  We don't use that
> in cmd status slots because of the inverted slots logic, so we could
> really remove this field.
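(For reference, the "claim" in procsignal.c's ProcSignalInit() boils down
to storing the PID in the slot - abridged, not verbatim:)

    slot = &ProcSignalSlots[pss_idx - 1];
    /* ... clear any leftover signal flags ... */

    /* mark the slot with my PID: this is what "claims" it */
    slot->pss_pid = MyProcPid;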

OK.

>     I don't quite see the reason to encode everything as JSON, because
>     that makes it a bit difficult for clients that would prefer YAML,
>     for example. Why not just use the requested format? For example in
>     YAML, we can do something like this:
>
>                  QUERY PLAN
>     ----------------------------------
>       - Plan[0]:                      +
>           Node Type: "Index Only Scan"+
>           Scan Direction: "Forward"   +
>           ...                         +
>                                       +
>       - Plan[1]:                      +
>           Node Type: "Index Only Scan"+
>           Scan Direction: "Forward"   +
>           ...                         +
>                                       +
>       - Plan[2]:                      +
>           Node Type: "Index Only Scan"+
>           Scan Direction: "Forward"   +
>           ...                         +
>
>     and similarly for other formats. We don't really use nesting.
>
>
> You're right, the nesting structure is too simple to require a certain
> format.  Do you have ideas about representing stack frames in TEXT
> format?  Something like what gdb does with #0, #1, etc. markers for
> frame depth?

Yes. The simpler the format, the better.
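For example, something along these lines (purely illustrative):

    #0  Query Text: SELECT count(*) FROM t WHERE ...
        Plan:
          Aggregate  (cost=... rows=1 width=0)
            ->  Seq Scan on t  (cost=... rows=... width=0)
    #1  Query Text: ...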

regards

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


