Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From: Shulgin, Oleksandr
Subject: Re: On-demand running query plans using auto_explain and signals
Date:
Msg-id: CACACo5SKOxdPJ54MwNxuK0CdHf7pp3mB5eN-Ha5e4WDg9i1Ksw@mail.gmail.com
In response to: Re: On-demand running query plans using auto_explain and signals (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: On-demand running query plans using auto_explain and signals ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
           Re: On-demand running query plans using auto_explain and signals ("Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de>)
List: pgsql-hackers
On Mon, Sep 14, 2015 at 2:11 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Now the backend that has been signaled on the second call to
pg_cmdstatus (it can be either some other backend, or backend B
again) will not find an unprocessed slot, thus it will not try to
attach to/detach from the queue, and backend A will block forever.

This requires really bad timing, and the user should still be able to
interrupt the querying backend A.

I think we can't rely on the low probability that this won't happen, and we should not rely on people interrupting the backend. It should be possible to detect the situation and fail gracefully.

It may be possible to introduce some lock-less protocol preventing such situations, but it's not there at the moment. If you believe it's possible, you need to explain and "prove" that it's actually safe.

Otherwise we may need to introduce some basic locking - for example, we may introduce an LWLock for each slot, and lock it with dontWait=true (skipping the slot if we couldn't lock it). This should prevent most scenarios where one corrupted slot blocks many processes.

OK, I will revisit this part then.
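
For the record, the per-slot locking suggested above could look roughly like this (an untested sketch; slot->lock is just an illustrative field name, not something from the current patch):

    /* Try to take the slot's lock without waiting; if another (possibly
     * stuck) process still holds it, skip this slot instead of blocking. */
    if (!LWLockConditionalAcquire(&slot->lock, LW_EXCLUSIVE))
        continue;           /* slot is busy -- move on to the next one */

    /* ... read or fill in the slot here ... */

    LWLockRelease(&slot->lock);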

In any case, the backends that are being asked to send the info will be
able to notice the problem (receiver detached early) and handle it
gracefully.

Ummm, how? Maybe I missed something?

Well, I didn't attach the updated patch (doing that now). The basic idea is that when the backend that has requested information bails out prematurely, it still detaches from the shared memory queue. This makes it possible for the backend being asked to detect the situation, either before attaching to the queue or when trying to send the data, so it won't be blocked forever if the other backend failed to wait.
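
On the sending side the idea is essentially this (a simplified sketch only; mq, seg, data and len stand for whatever the patch actually passes around): shm_mq_send() returns SHM_MQ_DETACHED once the receiver has gone away, so the queried backend can bail out instead of blocking:

    shm_mq_handle *mqh = shm_mq_attach(mq, seg, NULL);
    shm_mq_result  res;

    res = shm_mq_send(mqh, len, data, false);   /* blocking send */
    if (res == SHM_MQ_DETACHED)
    {
        /* The requesting backend has already detached (e.g. it was
         * interrupted), so nobody will read the data: clean up and return. */
    }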

    I don't think we should mix this with monitoring of auxiliary
    processes. This interface is designed for monitoring SQL queries
    running in other backends, effectively "remote" EXPLAIN. But those
    auxiliary processes are not processing SQL queries at all, they're
    not even using the regular executor ...

    OTOH the ability to request this info (e.g. auxiliary process
    looking at plans running in backends) seems useful, so I'm ok with
    tuple slots for auxiliary processes.


Now that I think about it, reserving the slots for aux processes doesn't
let us query their status; it's the other way round: if we don't
reserve them, then an aux process would not be able to query any other
process for its status.  Likely this is not a problem at all, so we can
remove these extra slots.

I don't know. I can imagine using this from background workers, but I think those are counted as regular backends (not sure though).

MaxBackends includes the background workers, yes.
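
For reference, InitializeMaxBackends() computes it roughly like this (as of current sources; the extra 1 accounts for the autovacuum launcher):

    MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
                  max_worker_processes;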

--
Alex
