Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From: Shulgin, Oleksandr
Subject: Re: On-demand running query plans using auto_explain and signals
Date:
Msg-id: CACACo5Q=nEEzAUE3o29WTU0wW7wourZB+SDhwPYzGZPyggF4hg@mail.gmail.com
In response to: Re: On-demand running query plans using auto_explain and signals (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: On-demand running query plans using auto_explain and signals (Pavel Stehule <pavel.stehule@gmail.com>)
List: pgsql-hackers
On Thu, Sep 17, 2015 at 10:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 17, 2015 at 11:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

>> Second, using a shm_mq manipulates the state of the process latch.  I
>> don't think you can make the assumption that it's safe to reset the
>> process latch at any and every place where we check for interrupts.
>> For example, suppose the process is already using a shm_mq and the
>> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
>> somebody has activated this mechanism and you now go try to send and
>> receive from a new shm_mq.  But even if that and every other
>> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
>> today, it's a new coding rule that could easily trip people up in the
>> future.
>
> It is valid, and probably most important. But if we introduce own mechanism,
> we will play with process latch too (although we can use LWlocks)

With the design I proposed, there is zero need to touch the process
latch, which is good, because I'm pretty sure that is going to be a
problem.  I don't think there is any need to use LWLocks here either.
When you get a request for data, you can just publish a DSM segment
with the data and that's it.  Why do you need anything more?  You
could set the requestor's latch if it's convenient; that wouldn't be a
problem.  But the process supplying the data can't end up in a
different state than it was before supplying that data, or stuff WILL
break.
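
To make that concrete, the publishing side of Robert's proposal might look roughly like the following (a minimal sketch, not actual patch code: publish_plan_text and the slot pointer are made up, only the dsm_* and latch calls are the real APIs):

#include "postgres.h"
#include "storage/dsm.h"
#include "storage/latch.h"
#include "storage/proc.h"
#include "storage/procarray.h"

/*
 * Hypothetical handler run by the queried backend: copy the plan text
 * into a freshly created DSM segment and wake the requester up.  Note
 * that it never touches the publishing backend's own latch.
 */
static void
publish_plan_text(const char *plan_text, pid_t requester_pid,
                  dsm_handle *slot)    /* hypothetical publish location */
{
    Size         len = strlen(plan_text) + 1;
    dsm_segment *seg = dsm_create(len, 0);
    PGPROC      *proc;

    memcpy(dsm_segment_address(seg), plan_text, len);
    *slot = dsm_segment_handle(seg);    /* publish before waking anyone */

    /* Setting the requester's latch is fine; resetting our own is not. */
    proc = BackendPidGetProc(requester_pid);
    if (proc != NULL)
        SetLatch(&proc->procLatch);
}

How long the created segment stays mapped, and who eventually frees it, is exactly the open question here.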

There is still the whole problem of where exactly the backend being queried for its status should publish that DSM segment, and when to free it.

If it's a single location shared between all backends, there has to be locking around it.  That is probably not a big problem, as long as you don't expect all the backends to start querying each other rapidly.  This is actually how it was implemented in the first versions of this patch.
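
For illustration, such a shared location could be as simple as this (all names invented):

#include "postgres.h"
#include "storage/dsm.h"
#include "storage/lwlock.h"

/*
 * Hypothetical single shared-memory slot: every request/response pair
 * goes through it, hence the lock.
 */
typedef struct PlanStatusShared
{
    LWLock     *lock;       /* protects the fields below */
    pid_t       target_pid; /* backend being asked for its plan */
    dsm_handle  handle;     /* published plan text, or 0 if unset */
} PlanStatusShared;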

If we take the per-backend slot approach, the locking seems unnecessary, and there are in principle two options (sketched in code after the list):

1) The backend puts the DSM handle in its own slot and notifies the requester to read it.
2) The backend puts the DSM handle in the slot of the requester (and notifies it).
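
In shared-memory terms the two options differ only in whose slot receives the handle; a sketch, with all names hypothetical:

#include "postgres.h"
#include "storage/backendid.h"
#include "storage/dsm.h"

/* Hypothetical per-backend slot array, indexed by BackendId. */
typedef struct PlanStatusSlot
{
    dsm_handle  handle;     /* segment holding the plan text, or 0 if unset */
} PlanStatusSlot;

PlanStatusSlot *PlanStatusSlots;    /* hypothetical; sized MaxBackends + 1 */

static void
publish_handle(dsm_segment *seg, BackendId requester_id, bool own_slot)
{
    if (own_slot)
        PlanStatusSlots[MyBackendId].handle = dsm_segment_handle(seg);  /* option 1 */
    else
        PlanStatusSlots[requester_id].handle = dsm_segment_handle(seg); /* option 2 */
}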

If we go with the first option, the backend that has created the DSM will not know when it's OK to free it, so that has to be the responsibility of the requester.  If the latter exits before reading and freeing the DSM, we have a leak.  An even bigger problem is that the sending backend can no longer serve a number of concurrent requesters: while its slot is occupied by a DSM handle, it cannot reply to another backend until the slot is freed.

With the second option we have the same problems of not knowing when to free the DSM and potentially leaking it, but at least we can handle concurrent requests.

The current approach, where the requester creates and frees the DSM, doesn't suffer from these problems, so if we pre-allocate a segment just big enough we can avoid the use of shm_mq altogether.  That will take another GUC for the segment size.  Certainly no one expects a query plan to weigh a bloody megabyte, but apparently that is exactly what happens in Pavel's case.
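
The requester's side of that scheme might then look like this (again only a sketch: plan_status_segment_size is the hypothetical GUC, PROCSIG_PLAN_STATUS a hypothetical procsignal reason, and PlanStatusSlots the per-backend array from the sketch above):

#include "postgres.h"
#include "miscadmin.h"
#include "storage/dsm.h"
#include "storage/latch.h"
#include "storage/procsignal.h"

/* Hypothetical GUC: upper bound on the size of a published plan. */
int plan_status_segment_size = 1024 * 1024;

static char *
request_plan_text(pid_t target_pid, BackendId target_id)
{
    dsm_segment *seg = dsm_create(plan_status_segment_size, 0);
    char        *result;

    /* We create the segment, so freeing it is unambiguously our job. */
    PlanStatusSlots[target_id].handle = dsm_segment_handle(seg);
    SendProcSignal(target_pid, PROCSIG_PLAN_STATUS, target_id);

    /*
     * Wait for the target to attach, write the plan and set our latch.
     * A real implementation would loop here, since the latch can be
     * set for unrelated reasons, and would also handle timeouts.
     */
    WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
    ResetLatch(MyLatch);

    result = pstrdup((char *) dsm_segment_address(seg));
    dsm_detach(seg);    /* creator frees it; no shm_mq involved */
    return result;
}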

--
Alex
