Re: Asynchronous Append on postgres_fdw nodes. - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Asynchronous Append on postgres_fdw nodes.
Date
Msg-id 20201214.175623.1287257830379316172.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Asynchronous Append on postgres_fdw nodes.  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Responses Re: Asynchronous Append on postgres_fdw nodes.
List pgsql-hackers
At Sat, 12 Dec 2020 19:06:51 +0900, Etsuro Fujita <etsuro.fujita@gmail.com> wrote in 
> On Fri, Nov 20, 2020 at 8:16 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > I looked through the nodeAppend.c and postgres_fdw.c part and those
> > are I think the core of this patch.
> 
> Thanks again for the review!
> 
> > -                * figure out which subplan we are currently processing
> > +                * try to get a tuple from async subplans
> > +                */
> > +               if (!bms_is_empty(node->as_needrequest) ||
> > +                       (node->as_syncdone && !bms_is_empty(node->as_asyncpending)))
> > +               {
> > +                       if (ExecAppendAsyncGetNext(node, &result))
> > +                               return result;
> >
> > The function ExecAppendAsyncGetNext() is a function called only here,
> > and contains only 31 lines.  It doesn't seem to me that the separation
> > makes the code more readable.
> 
> Considering the original ExecAppend() is about 50 lines long, 31 lines
> of code would not be small.  So I'd vote for separating it into
> another function as proposed.

Ok, I no longer oppose to separating some part from ExecAppend().

> > -               /* choose new subplan; if none, we're done */
> > -               if (!node->choose_next_subplan(node))
> > +               /* wait or poll async events */
> > +               if (!bms_is_empty(node->as_asyncpending))
> > +               {
> > +                       Assert(!node->as_syncdone);
> > +                       Assert(bms_is_empty(node->as_needrequest));
> > +                       ExecAppendAsyncEventWait(node);
> >
> > You moved the function to wait for events from execAsync to
> > nodeAppend. The former is a generic module that can be used from any
> > kind of executor nodes, but the latter is specialized for nodeAppend.
> > In other words, the abstraction level is lowered here.  What is the
> > reason for the change?
> 
> The reason is just because that function is only called from
> ExecAppend().  I put some functions only called from nodeAppend.c in
> execAsync.c, though.

(I think) You told me that you preferred the genericity of the
original interface, but you're doing the opposite.  If you think we
can move such a generic feature to a part of Append node, all other
features can be move the same way.  I guess there's a reason you want
only the this specific feature out of all of them be Append-spcific
and I want to know that.

> > +               /* Perform the actual callback. */
> > +               ExecAsyncRequest(areq);
> > +               if (ExecAppendAsyncResponse(areq))
> > +               {
> > +                       Assert(!TupIsNull(areq->result));
> > +                       *result = areq->result;
> >
> > Putting aside the name of the functions, the first two function are
> > used only this way at only two places. ExecAsyncRequest(areq) tells
> > fdw to store the first tuple among the already received ones to areq,
> > and ExecAppendAsyncResponse(areq) is checking the result is actually
> > set. Finally the result is retrieved directory from areq->result.
> > What is the reason that the two functions are separately exists?
> 
> I think that when an async-aware node gets a tuple from an
> async-capable node, they should use ExecAsyncRequest() /
> ExecAyncHogeResponse() rather than ExecProcNode() [1].  I modified the
> patch so that ExecAppendAsyncResponse() is called from Append, but to
> support bubbling up the plan tree discussed in [2], I think it should
> be called from ForeignScans (the sides of async-capable nodes).  Am I
> right?  Anyway, I’ll rename ExecAppendAyncResponse() to the one
> proposed in Robert’s original patch.

Even though I understand the concept but to make work it we need to
remember the parent *async* node somewhere. In my faint memory the
very early patch did something like that.

So I think just providing ExecAsyncResponse() doesn't make it
true. But if we make it true, it would be something like
partially-reversed steps from what the current Exec*()s do for some of
the existing nodes and further code is required for some other nodes
like WindowFunction. Bubbling up works only in very simple cases where
a returned tuple is thrown up to further parent as-is or at least when
the node convers a tuple into another shape. If an async-receiver node
wants to process multiple tuples from a child or from multiple
children, it is no longer be just a bubbling up.

That being said, we could avoid passing (a-kind-of) side-channel
information when ExecProcNode is called by providing
ExecAsyncResponse(). But I don't think the "side-channel" is not a
problem since it is just another state of the node.


And.. I think the reason I feel uneasy for the patch may be that the
patch uses the interface names in somewhat different context.
Origianlly the fraemework resides in-between executor nodes, not on a
node of either side. ExecAsyncNotify() notifies the requestee about an
event and ExecAsyncResonse() notifies the requestor about a new
tuple. I don't feel strangeness in this usage. But this patch feels to
me using the same names in different (and somewhat wrong) context.

> > +                       /* Perform the actual callback. */
> > +                       ExecAsyncNotify(areq);
> >
> > Mmm. The usage of the function (or its name) looks completely reverse
> > to me.  I think FDW should NOTIFY to exec nodes that the new tuple
> > gets available but the reverse is nonsense. What the function is
> > actually doing is to REQUEST fdw to fetch tuples that are expected to
> > have arrived, which is different from what the name suggests.
> 
> As mentioned in a previous email, this is an FDW callback routine
> revived from Robert’s patch.  I think the naming is reasonable,
> because the callback routine notifies the FDW of readiness of a file
> descriptor.  And actually, the callback routine tells the core whether
> the corresponding ForeignScan node is ready for a new request or not,
> by setting the callback_pending flag accordingly.

Hmm. Agreed.  The word "callback" is also used there [3]...  I
remember and it seems reasonable that the core calls AsyncNotify() on
FDW and the FDW calls ExecForeignScan as a response to it and notify
back to core of that using ExecAsyncRequestDone().  But the patch here
feels a little strange, or uneasy, to me.

[3] https://www.postgresql.org/message-id/20161018.103051.30820907.horiguchi.kyotaro%40lab.ntt.co.jp

> > postgres_fdw.c
> >
> > > postgresIterateForeignScan(ForeignScanState *node)
> > > {
> > >       PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
> > >       TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
> > >
> > >       /*
> > >        * If this is the first call after Begin or ReScan, we need to create the
> > >        * cursor on the remote side.
> > >        */
> > >       if (!fsstate->cursor_exists)
> > >               create_cursor(node);
> >
> > With the patch, cursors are also created in another place so at least
> > the comment is wrong.
> 
> Good catch!  Will fix.
> 
> > That being said, I think we should unify the
> > code except the differences between async and sync.  For example, if
> > the fetch_more_data_begin() needs to be called only for async
> > fetching, the cursor should be created before calling the function, in
> > the code path common with sync fetching.
> 
> I think that that would make the code easier to understand, but I’m
> not 100% sure we really need to do so.

And I believe that we don't tolerate even the slightest performance
degradation.

> Best regards,
> Etsuro Fujita
> 
> [1]
https://www.postgresql.org/message-id/CA%2BTgmoYrbgTBnLwnr1v%3Dpk%2BC%3DznWg7AgV9%3DM9ehrq6TDexPQNw%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CA%2BTgmoZSWnhy%3DSB3ggQcB6EqKxzbNeNn%3DEfwARnCS5tyhhBNcw%40mail.gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: pg_basebackup caused FailedAssertion
Next
From: Kyotaro Horiguchi
Date:
Subject: Some error messages are omitted while recovery.