Re: Asynchronous Append on postgres_fdw nodes. - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Asynchronous Append on postgres_fdw nodes.
Msg-id CAPmGK16rA5ODyRrVK9iPsyW-td2RcRZXsdWoVhMmLLmUhprsTg@mail.gmail.com
In response to Re: Asynchronous Append on postgres_fdw nodes.  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Mon, Dec 14, 2020 at 4:01 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> At Sat, 12 Dec 2020 18:25:57 +0900, Etsuro Fujita <etsuro.fujita@gmail.com> wrote in
> > On Fri, Nov 20, 2020 at 3:51 PM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > > The reason for
> > > the early fetching is letting fdw send the next request as early as
> > > possible. (However, I didn't measure the effect of the
> > > nodeAppend-level prefetching.)
> >
> > I agree that that would lead to an improved efficiency in some cases,
> > but I still think that that would be useless in some other cases like
> > SELECT * FROM sharded_table LIMIT 1.  Also, I think the situation
> > would get worse if we support Append on top of joins or aggregates
> > over ForeignScans, which would be more expensive to perform than these
> > ForeignScans.
>
> I'm not sure which gain we weigh on, but if doing "LIMIT 1" on Append
> for many times is more common than fetching all or "LIMIT <many
> multiples of fetch_size>", that discussion would be convincing... Is
> it really the case?

I don't have a clear answer to that.  Performance in the case you
mentioned would be improved by async execution without prefetching by
Append, so it seemed reasonable to me to remove that prefetching to
avoid unnecessary overhead in the case I mentioned.  However, I have
started to think that my proposal, which needs an additional FDW
callback routine (i.e., ForeignAsyncBegin()), might be a bad idea,
because it would increase the burden on FDW authors.

> > If we do prefetching, I think it would be better that it’s the
> > responsibility of the FDW to do prefetching, and I think that that
> > could be done by letting the FDW start another data fetch,
> > independently of the core, in the ForeignAsyncNotify callback routine,
>
> FDW does prefetching (if it means sending request to remote) in my
> patch, so I agree to that.  I suspect that you intended to say
> the opposite.  The core (ExecAppendAsyncGetNext()) controls
> prefetching in your patch.

No.  That function just tries to retrieve a tuple from any of the
ready subplans (i.e., subplans marked as as_needrequest).

> > which I revived from Robert's original patch.  I think that that would
> > be more efficient, because the FDW would no longer need to wait until
> > all buffered tuples are returned to the core.  In the WIP patch, I
>
> I don't understand. My patch sends a prefetch-query as soon as all the
> tuples of the last remote-request is stored into FDW storage.  The
> reason for removing ExecAsyncNotify() was it is just redundant as far
> as concerning Append asynchrony. But I particularly oppose to revive
> the function.

Sorry, my explanation was not good.  What I'm describing here is my
patch, not yours.  I think this FDW callback routine would be useful;
it allows an FDW to perform another asynchronous data fetch before
delivering a tuple to the core, as discussed in [1].  It would also
be useful when extending this to the case where we have intermediate
nodes, such as joins or aggregates, between an Append and a
ForeignScan, which I'll explain below.

> > only allowed the callback routine to put the corresponding ForeignScan
> > node into a state where it’s either ready for a new request or needing
> > a callback for another data fetch, but I think we could probably relax
> > the restriction so that the ForeignScan node can be put into another
> > state where it’s ready for a new request while needing a callback for
> > the prefetch.
>
> I don't understand this, too. ExecAsyncNotify() doesn't touch any of
> the bitmaps, as_needrequest, callback_pending nor as_asyncpending in
> your patch.  Am I looking into something wrong?  I'm looking
> async-wip-2020-11-17.patch.

In the WIP patch I posted, these bitmaps are modified on the core side
based on the callback_pending and request_complete flags in the
AsyncRequests returned from FDWs (see ExecAppendAsyncEventWait()).
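
To illustrate the idea (this is a toy model, not code from the WIP
patch), the core-side update could be sketched roughly as below.  The
ToyAsyncRequest and ToyAppendState types, the 32-bit masks standing in
for Bitmapsets, and the exact transitions are all assumptions made for
illustration; only the flag and bitmap names come from the discussion
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the flags an FDW sets on a request. */
typedef struct ToyAsyncRequest
{
    int  request_index;     /* which Append subplan the request belongs to */
    bool callback_pending;  /* FDW wants to be called back on a wait event */
    bool request_complete;  /* FDW has delivered a result for the request */
} ToyAsyncRequest;

/* Hypothetical Append-side bookkeeping; one bit per subplan. */
typedef struct ToyAppendState
{
    uint32_t as_needrequest;   /* subplans ready to be sent a new request */
    uint32_t as_asyncpending;  /* subplans with a request still in flight */
} ToyAppendState;

/*
 * Assumed core-side step: after the FDW has handled an event, the core
 * (not the FDW) reads the flags and updates the bitmaps.
 */
static void
toy_append_process_request(ToyAppendState *node, const ToyAsyncRequest *areq)
{
    uint32_t bit;

    assert(areq->request_index >= 0 && areq->request_index < 32);
    bit = (uint32_t) 1 << areq->request_index;

    if (areq->request_complete)
    {
        /* Result delivered: subplan can accept a new request. */
        node->as_asyncpending &= ~bit;
        node->as_needrequest |= bit;
    }
    else if (areq->callback_pending)
    {
        /* Still waiting on the remote side: keep it in the pending set. */
        node->as_asyncpending |= bit;
        node->as_needrequest &= ~bit;
    }
}
```

The point of the sketch is only that the FDW reports state through the
two flags, and the Append node owns the bitmaps.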

> (By the way, it is one of those that make the code hard to read to me
> that the "callback" means "calling an API function".  I think none of
> them (ExecAsyncBegin, ExecAsyncRequest, ExecAsyncNotify) are a
> "callback".)

I thought the word “callback” was OK, because these functions would
call the corresponding FDW callback routines, but I’ll revise the
wording.

> > The reason why I disabled async execution when executing EPQ is to
> > avoid sending asynchronous queries to the remote sides, which would be
> > useless, because scan tuples for an EPQ recheck are obtained in a
> > dedicated way.
>
> If EPQ is performed onto Append, I think it should gain from
> asynchronous execution since it is going to fetch *a* tuple from
> several partitions or children.  I believe EPQ doesn't contain Append
> in major cases, though.  (Or I didn't come up with the steps for the
> case to happen...)

Sorry, I don’t understand this part.  Could you elaborate a bit more on it?

> > What do you mean by "push-up style executor"?
>
> The reverse of the volcano-style executor, which enters from the
> topmost node and down to the bottom.  In the "push-up style executor",
> the bottom-most nodes fire on a certain trigger, then every
> intermediate node throws up the result to the parent until reaching
> the topmost node.

That is what I have in mind to support the case I mentioned above.  I
think that would allow us to find ready subplans efficiently from the
wait events that occurred in ExecAppendAsyncEventWait().
Consider a plan like this:

Append
-> Nested Loop
  -> Foreign Scan on a
  -> Foreign Scan on b
-> ...

I assume here that Foreign Scan on a, Foreign Scan on b, and Nested
Loop are all async-capable and that we have somewhere in the executor
an AsyncRequest with requestor="Nested Loop" and requestee="Foreign
Scan on a", an AsyncRequest with requestor="Nested Loop" and
requestee="Foreign Scan on b", and an AsyncRequest with
requestor="Append" and requestee="Nested Loop".  In
ExecAppendAsyncEventWait(), if a file descriptor for foreign table a
becomes ready, we would call ForeignAsyncNotify() for a, and if it
returns a tuple to the requestor node (i.e., Nested Loop) (using
ExecAsyncResponse()), then *ForeignAsyncNotify() would be called for
Nested Loop*.  Nested Loop would then call ExecAsyncRequest() for the
inner requestee node (i.e., Foreign Scan on b; I assume here that it
is a foreign scan parameterized by a).  If Foreign Scan on b returns a
tuple to the requestor node (i.e., Nested Loop) (using
ExecAsyncResponse()), then Nested Loop would match the tuples from the
outer and inner sides.  If they match, the join result would be
returned to the requestor node (i.e., Append) (using
ExecAsyncResponse()), marking the Nested Loop subplan as
as_needrequest.  Otherwise, Nested Loop would call ExecAsyncRequest()
for the inner requestee node for the next tuple, and so on.  If
ExecAsyncRequest() can't return a tuple immediately, we would wait
until a file descriptor for foreign table b becomes ready; we would
then start by calling ForeignAsyncNotify() for b when the file
descriptor becomes ready.  In this way, I think we could find ready
subplans efficiently from the wait events that occurred in
ExecAppendAsyncEventWait() when extending to the case where subplans
are joins or aggregates over Foreign Scans.  Maybe I’m missing
something, though.
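
As a rough illustration of the flow above, here is a toy,
single-threaded model in C.  It is not executor code: ToyNestLoop,
ToyAppend, and the toy_* functions are hypothetical names, tuples are
plain ints, the join condition is simple equality, and the inner scan
always answers immediately, so waiting on b's file descriptor is not
modeled.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_RESULTS 16

/* Hypothetical Nested Loop state: current outer tuple plus inner rows. */
typedef struct ToyNestLoop
{
    const int *inner;      /* rows "Foreign Scan on b" would return */
    size_t     ninner;
    size_t     inner_pos;  /* next inner row to request */
    int        outer_value;
} ToyNestLoop;

/* Hypothetical Append state: collected results and a readiness flag. */
typedef struct ToyAppend
{
    int    results[TOY_MAX_RESULTS];
    size_t nresults;
    bool   nestloop_needrequest;  /* stands in for the as_needrequest bit */
} ToyAppend;

/* Stand-in for requesting the next tuple from the inner scan. */
static bool
toy_request_inner(ToyNestLoop *nl, int *tuple)
{
    if (nl->inner_pos >= nl->ninner)
        return false;
    *tuple = nl->inner[nl->inner_pos++];
    return true;
}

/*
 * Keep requesting inner tuples until one matches the current outer
 * tuple; on a match, push the join result up to Append and mark the
 * subplan as ready for a new request.
 */
static bool
toy_nestloop_advance(ToyAppend *ap, ToyNestLoop *nl)
{
    int tuple;

    ap->nestloop_needrequest = false;
    while (toy_request_inner(nl, &tuple))
    {
        if (tuple == nl->outer_value && ap->nresults < TOY_MAX_RESULTS)
        {
            ap->results[ap->nresults++] = tuple;
            ap->nestloop_needrequest = true;
            return true;
        }
    }
    return false;           /* inner exhausted: need the next outer tuple */
}

/* Stand-in for the notify step fired when a's descriptor becomes ready. */
static bool
toy_nestloop_notify(ToyAppend *ap, ToyNestLoop *nl, int outer_tuple)
{
    nl->outer_value = outer_tuple;
    nl->inner_pos = 0;      /* rescan the inner side for the new outer tuple */
    return toy_nestloop_advance(ap, nl);
}
```

Calling toy_nestloop_notify() models an event on a; each later
toy_nestloop_advance() models Append asking the Nested Loop subplan
for its next result.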

Thanks for the comments!

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/CAPmGK153oorYtTpW_-aZrjH-iecHbykX7qbxX_5630ZK8nqVHg%40mail.gmail.com


