Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From: Shulgin, Oleksandr
Subject: Re: On-demand running query plans using auto_explain and signals
Date:
Msg-id: CACACo5SKOxdPJ54MwNxuK0CdHf7pp3mB5eN-Ha5e4WDg9i1Ksw@mail.gmail.com
In response to: Re: On-demand running query plans using auto_explain and signals (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: On-demand running query plans using auto_explain and signals ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
           Re: On-demand running query plans using auto_explain and signals ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
List: pgsql-hackers
On Mon, Sep 14, 2015 at 2:11 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Now the backend that has been signaled on the second call to
pg_cmdstatus (it can be either some other backend, or backend B
again) will not find an unprocessed slot, thus it will not try to
attach to/detach from the queue, and backend A will block forever.

This requires really bad timing, and the user should still be able to
interrupt the querying backend A.

I think we can't rely on the low probability that this won't happen, and we should not rely on people interrupting the backend. It should be possible to detect the situation and fail gracefully.

It may be possible to introduce some lock-less protocol preventing such situations, but it's not there at the moment. If you believe it's possible, you need to explain and "prove" that it's actually safe.

Otherwise we may need to introduce some basic locking - for example, we may introduce an LWLock for each slot, and lock it with dontWait=true (skipping the slot if we couldn't lock it). This should prevent most scenarios where one corrupted slot blocks many processes.

OK, I will revisit this part then.
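
For the record, the per-slot locking suggested above could look roughly like this (an untested sketch; slot->lock is just an illustrative field name, not something from the current patch):

    /* Try to take the slot's lock without waiting; if another (possibly
     * stuck) process still holds it, skip this slot instead of blocking. */
    if (!LWLockConditionalAcquire(&slot->lock, LW_EXCLUSIVE))
        continue;           /* slot is busy -- move on to the next one */

    /* ... read or fill in the slot here ... */

    LWLockRelease(&slot->lock);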

In any case, the backends that are being asked to send the info will be
able to notice the problem (receiver detached early) and handle it
gracefully.

Ummm, how? Maybe I missed something?

Well, I didn't attach the updated patch (doing that now). The basic idea is that when the backend that has requested information bails out prematurely, it still detaches from the shared memory queue. This makes it possible for the backend being asked to detect the situation, either before attaching to the queue or when trying to send the data, so it won't be blocked forever if the other backend failed to wait.
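
On the sending side the idea is essentially this (a simplified sketch only; mq, seg, data and len stand for whatever the patch actually passes around): shm_mq_send() returns SHM_MQ_DETACHED once the receiver has gone away, so the queried backend can bail out instead of blocking:

    shm_mq_handle *mqh = shm_mq_attach(mq, seg, NULL);
    shm_mq_result  res;

    res = shm_mq_send(mqh, len, data, false);   /* blocking send */
    if (res == SHM_MQ_DETACHED)
    {
        /* The requesting backend has already detached (e.g. it was
         * interrupted), so nobody will read the data: clean up and return. */
    }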

    I don't think we should mix this with monitoring of auxiliary
    processes. This interface is designed for monitoring SQL queries
    running in other backends, effectively "remote" EXPLAIN. But those
    auxiliary processes are not processing SQL queries at all, they're
    not even using the regular executor ...

    OTOH the ability to request this info (e.g. auxiliary process
    looking at plans running in backends) seems useful, so I'm ok with
    tuple slots for auxiliary processes.


Now that I think about it, reserving the slots for aux processes doesn't
let us query their status; it's the other way round: if we don't
reserve them, then an aux process would not be able to query any other
process for its status.  Likely this is not a problem at all, so we can
remove these extra slots.

I don't know. I can imagine using this from background workers, but I think those are counted as regular backends (not sure though).

MaxBackends includes the background workers, yes.
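
For reference, InitializeMaxBackends() computes it roughly like this (as of current sources; the extra 1 accounts for the autovacuum launcher):

    MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
                  max_worker_processes;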

--
Alex
