Re: asynchronous and vectorized execution - Mailing list pgsql-hackers

From Robert Haas
Subject Re: asynchronous and vectorized execution
Date
Msg-id CA+Tgmoa-dGO85K1mGGa1v7rHLrWEvzH2HQWb6-miEqpvH0kHnQ@mail.gmail.com
In response to Re: asynchronous and vectorized execution  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On Tue, Aug 2, 2016 at 3:41 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Thank you for the comment.
>
> At Mon, 1 Aug 2016 10:44:56 +0530, Amit Khandekar <amitdkhan.pg@gmail.com> wrote in
> <CAJ3gD9ek4Y4SGTSuc_pzkGYwLMbrc9QOM7m1D8bj99JNW16o0g@mail.gmail.com>
>> On 21 July 2016 at 15:20, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>
>> >
>> > After some consideration, I found that ExecAsyncWaitForNode
>> > cannot be reentrant, because reentry means that control passes
>> > into async-unaware nodes while there are still not-ready nodes,
>> > which is an inconsistent state. To prevent such reentry, I
>> > allocated node identifiers in depth-first order so that the
>> > ancestor-descendant relationship can be checked in a simple way
>> > (the nested-set model), and I call ExecAsyncConfigureWait only
>> > for the descendant nodes of the planstate passed as the parameter.
>> >
>> >
>> We have estate->waiting_nodes containing a mix of async-aware and
>> non-async-aware nodes. I was thinking that an asynchrony tree would
>> have only async-aware nodes, with possibly multiple asynchrony
>> subtrees in one plan tree. If we somehow restrict the bubbling up of
>> events to the root of the asynchrony subtree, do you think we can
>> simplify some of the complexities?
>
> The current code prohibits registration of nodes outside the
> current subtree to avoid the reentrancy disaster.
>
> Indeed, leaving a "waiting node" mark or something similar on
> every root node at the first visit would let propagation stop at
> the root of any async subtree. Nevertheless, when an async child
> under an inactive async root fires, the new tuple is loaded but
> not consumed, and a subsequent firing of the same child then leads
> to a deadlock (without result queueing). However, that can be
> avoided if ExecAsyncConfigureWait doesn't register nodes that are
> already in the ready state.
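The nested-set check described in the quoted text can be sketched as below. This is a toy, not code from the patch: the ToyNode type and its id/last_desc fields are hypothetical. The idea is only that numbering nodes in depth-first (preorder) order lets "is X inside the subtree rooted at R?" reduce to two integer comparisons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy plan node; not PostgreSQL's PlanState. */
typedef struct ToyNode
{
    struct ToyNode *children[4];
    int         nchildren;
    int         id;         /* preorder (depth-first) number */
    int         last_desc;  /* largest id found in this node's subtree */
} ToyNode;

/* Assign ids in depth-first order; returns the next unused id. */
static int
assign_ids(ToyNode *node, int next_id)
{
    assert(node->nchildren <= 4);
    node->id = next_id++;
    for (int i = 0; i < node->nchildren; i++)
        next_id = assign_ids(node->children[i], next_id);
    node->last_desc = next_id - 1;
    return next_id;
}

/* True if 'node' lies in the subtree rooted at 'root' (root included). */
static bool
in_subtree(const ToyNode *root, const ToyNode *node)
{
    return node->id >= root->id && node->id <= root->last_desc;
}
```

Numbering is done once per plan tree, after which each check is O(1) with no tree walk.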

Why would a node call ExecAsyncConfigureWait in the first place if it
already had a result ready?  I think it shouldn't do that.
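A toy illustration of that rule: a node that already holds a result never registers itself as waiting. All names here are hypothetical and do not reproduce the patch's ExecAsyncConfigureWait signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 16

typedef struct ToyAsyncNode
{
    int   id;
    bool  result_ready;   /* a tuple is already buffered for the parent */
} ToyAsyncNode;

typedef struct ToyWaitSet
{
    ToyAsyncNode *nodes[MAX_WAITERS];
    int           nnodes;
} ToyWaitSet;

/*
 * Hypothetical stand-in for a node's configure-wait step: register only
 * if there is genuinely something to wait for.  Returns true if the node
 * was added to the wait set.
 */
static bool
toy_configure_wait(ToyWaitSet *ws, ToyAsyncNode *node)
{
    if (node->result_ready)
        return false;     /* parent can consume the result right away */
    assert(ws->nnodes < MAX_WAITERS);
    ws->nodes[ws->nnodes++] = node;
    return true;
}
```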

> On the other hand, two or more asynchronous nodes can share a
> synchronization object. For instance, multiple postgres_fdw scan
> nodes can share one server connection, and only one of them can
> be in a waitable state at a time. If no async child in the current
> async subtree is waitable, execution gets stuck. So I think it is
> crucial for ExecAsyncWaitForNode to force at least one child *in
> the current async subtree* into the waiting state in such a
> situation. The ancestor-descendant relationship is necessary to
> do that anyway.

This is another example of a situation where waiting only for nodes
within a subtree causes problems.

Suppose there are two Foreign Scans in completely different parts of
the plan tree that are going to use, in alternation, the same
connection to the same remote server.  When we encounter the first
one, it kicks off the query, uses ExecAsyncConfigureWait to register
itself as waiting, and returns without becoming ready.  When we
encounter the second one, it can't kick off the query and therefore
has no chance of becoming ready until after the first one has finished
with the connection.  Suppose we then wait for the second Foreign
Scan.  Well, we had better wait for the first one, too!  If we don't,
it will never finish with the connection, so the second node will
never get to use it, and now we're in trouble.
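The scenario above can be reproduced in a toy model. Everything here (ToyConn, ToyScan, toy_wait) is hypothetical and only mimics the situation: scan 0 has already sent its query and holds the connection, and scan 1 needs the same connection. Waiting on scan 1 alone makes no progress, while waiting on both lets scan 0 finish and hand the connection over.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one remote connection shared by two scans. */
typedef struct ToyConn
{
    int owner;          /* index of the scan using the connection, or -1 */
} ToyConn;

typedef struct ToyScan
{
    int  index;
    int  remaining;     /* remote work left before its result arrives */
    bool started;       /* has sent its query on the connection */
    bool done;
} ToyScan;

/* Let one scan make progress if it can; returns true if anything changed. */
static bool
toy_poll(ToyConn *conn, ToyScan *scan)
{
    if (scan->done)
        return false;
    if (!scan->started)
    {
        if (conn->owner != -1)
            return false;       /* connection busy: cannot even start */
        conn->owner = scan->index;
        scan->started = true;
        return true;
    }
    assert(scan->remaining > 0);
    if (--scan->remaining == 0)
    {
        scan->done = true;
        conn->owner = -1;       /* finished: release the connection */
    }
    return true;
}

/*
 * Wait until 'target' is done, polling only the scans in 'set'.
 * Returns false if no scan in the set can make progress (stuck).
 */
static bool
toy_wait(ToyConn *conn, ToyScan **set, int nset, ToyScan *target)
{
    while (!target->done)
    {
        bool progress = false;

        for (int i = 0; i < nset; i++)
            progress |= toy_poll(conn, set[i]);
        if (!progress)
            return false;
    }
    return true;
}
```

Passing only the second scan as the wait set corresponds to waiting within a subtree that doesn't contain the first scan. Passing both corresponds to waiting on everything in the query.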

I think what we need is for the ConnCacheEntry to have a place to note
the ForeignScanState that is using the connection and any other
PlanState objects that would like to use it.  When one
ForeignScanState is done with the ConnCacheEntry, it activates the
next one, which then takes over.  That seems simple enough, but
there's a problem here for suspended queries: if we stop executing a
plan while some scan within that plan is holding onto a
ConnCacheEntry, and then we run some other query that wants to use the
same one, we've got a problem.  Maybe we can get by with letting the
other query finish running and then executing our own query, but that
might be messy to implement.  Another idea is to somehow let any
in-progress query finish running before allowing the first query to be
suspended; that would need some new infrastructure.
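A minimal sketch of the hand-off bookkeeping proposed for ConnCacheEntry, using a hypothetical toy struct rather than postgres_fdw's real one. The entry records the node currently using the connection plus a FIFO of nodes waiting for it. Releasing the connection promotes the next waiter, which the caller would then activate. It deliberately does not address the suspended-query problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical stand-in for a PlanState that wants the connection. */
typedef struct ToyPlanState
{
    const char *name;
} ToyPlanState;

/* Toy connection entry: current user plus a ring-buffer FIFO of waiters. */
typedef struct ToyConnCacheEntry
{
    ToyPlanState *current;
    ToyPlanState *pending[MAX_PENDING];
    int           head;
    int           count;
} ToyConnCacheEntry;

/* Returns true if 'ps' may use the connection now, false if it was queued. */
static bool
toy_conn_acquire(ToyConnCacheEntry *entry, ToyPlanState *ps)
{
    if (entry->current == NULL)
    {
        entry->current = ps;
        return true;
    }
    assert(entry->count < MAX_PENDING);
    entry->pending[(entry->head + entry->count) % MAX_PENDING] = ps;
    entry->count++;
    return false;
}

/*
 * Called by the current user when it is done with the connection.
 * Hands the connection to the next waiter and returns it, so the caller
 * can activate it (e.g. let it send its query); returns NULL if nobody
 * is waiting.
 */
static ToyPlanState *
toy_conn_release(ToyConnCacheEntry *entry)
{
    assert(entry->current != NULL);
    if (entry->count == 0)
    {
        entry->current = NULL;
        return NULL;
    }
    entry->current = entry->pending[entry->head];
    entry->head = (entry->head + 1) % MAX_PENDING;
    entry->count--;
    return entry->current;
}
```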

My main point here is that I think waiting for only a subtree is an
idea that cannot work out well.  Whatever problems are pushing you
into that design, we need to confront those problems directly and fix
them.  There shouldn't be any unsolvable problems in waiting for
everything in the whole query, and I'm pretty sure that's going to be
a more elegant and better-performing design.
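To make "wait for everything in the whole query" concrete, here is a sketch that gathers every waiting node in the plan into a single wait. It uses POSIX poll() as a stand-in for PostgreSQL's own wait-event machinery. The ToyWaitingNode type and the idea that each node exposes one readable fd are assumptions made for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>     /* pipe() and write(), handy for trying it out */

#define MAX_WAIT 32

/* Hypothetical: one entry per node anywhere in the plan that is waiting. */
typedef struct ToyWaitingNode
{
    int fd;          /* socket (or pipe) the node's result arrives on */
    int ready;       /* set to 1 once data is available */
} ToyWaitingNode;

/*
 * Wait for any waiting node in the whole query, not just one subtree.
 * Marks every node whose fd became readable and returns how many did,
 * 0 on timeout, or -1 on error.
 */
static int
toy_wait_whole_query(ToyWaitingNode *nodes, int nnodes, int timeout_ms)
{
    struct pollfd pfds[MAX_WAIT];
    int         rc;
    int         nready = 0;

    assert(nnodes <= MAX_WAIT);
    for (int i = 0; i < nnodes; i++)
    {
        pfds[i].fd = nodes[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    rc = poll(pfds, (nfds_t) nnodes, timeout_ms);
    if (rc <= 0)
        return rc;
    for (int i = 0; i < nnodes; i++)
    {
        if (pfds[i].revents & (POLLIN | POLLHUP))
        {
            nodes[i].ready = 1;
            nready++;
        }
    }
    return nready;
}
```

To try it, create two pipes with pipe(), register their read ends, and write a byte into one. The call should then report exactly that node as ready. Because every waiting node is in the one set, a scan that holds a shared connection always gets to finish, whichever node the caller actually needs.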

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


