Re: Asynchronous Append on postgres_fdw nodes. - Mailing list pgsql-hackers
From | Etsuro Fujita
---|---
Subject | Re: Asynchronous Append on postgres_fdw nodes.
Date |
Msg-id | CAPmGK16rA5ODyRrVK9iPsyW-td2RcRZXsdWoVhMmLLmUhprsTg@mail.gmail.com
In response to | Re: Asynchronous Append on postgres_fdw nodes. (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List | pgsql-hackers
On Mon, Dec 14, 2020 at 4:01 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> At Sat, 12 Dec 2020 18:25:57 +0900, Etsuro Fujita <etsuro.fujita@gmail.com> wrote in
> > On Fri, Nov 20, 2020 at 3:51 PM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > > The reason for
> > > the early fetching is letting the FDW send the next request as early as
> > > possible. (However, I didn't measure the effect of the
> > > nodeAppend-level prefetching.)
> >
> > I agree that that would lead to improved efficiency in some cases,
> > but I still think that it would be useless in some other cases, like
> > SELECT * FROM sharded_table LIMIT 1. Also, I think the situation
> > would get worse if we support Append on top of joins or aggregates
> > over ForeignScans, which would be more expensive to perform than these
> > ForeignScans.
>
> I'm not sure which gain we weigh on, but if doing "LIMIT 1" on Append
> many times is more common than fetching all, or "LIMIT <many
> multiples of fetch_size>", that discussion would be convincing... Is
> it really the case?

I don't have a clear answer for that... Performance in the case you mentioned would be improved by async execution without prefetching by Append, so it seemed reasonable to me to remove that prefetching to avoid unnecessary overhead in the case I mentioned. BUT: I have started to think that my proposal, which needs an additional FDW callback routine (ie, ForeignAsyncBegin()), might be a bad idea, because it would increase the burden on FDW authors.

> > If we do prefetching, I think it would be better that it's the
> > responsibility of the FDW to do prefetching, and I think that that
> > could be done by letting the FDW start another data fetch,
> > independently of the core, in the ForeignAsyncNotify callback routine,
>
> The FDW does prefetching (if that means sending a request to the remote)
> in my patch, so I agree with that. I suspect that you intended to say
> the opposite.
> The core (ExecAppendAsyncGetNext()) controls
> prefetching in your patch.

No. That function just tries to retrieve a tuple from any of the ready subplans (ie, subplans marked as as_needrequest).

> > which I revived from Robert's original patch. I think that that would
> > be more efficient, because the FDW would no longer need to wait until
> > all buffered tuples are returned to the core. In the WIP patch, I
>
> I don't understand. My patch sends a prefetch query as soon as all the
> tuples of the last remote request are stored into FDW storage. The
> reason for removing ExecAsyncNotify() was that it is just redundant as
> far as Append asynchrony is concerned. But I particularly oppose
> reviving the function.

Sorry, my explanation was not good, but what I'm saying here is about my patch, not yours. I think this FDW callback routine would be useful; it allows an FDW to perform another asynchronous data fetch before delivering a tuple to the core, as discussed in [1]. Also, it would be useful when extending to the case where we have intermediate nodes between an Append and a ForeignScan, such as joins or aggregates, which I'll explain below.

> > only allowed the callback routine to put the corresponding ForeignScan
> > node into a state where it's either ready for a new request or needing
> > a callback for another data fetch, but I think we could probably relax
> > the restriction so that the ForeignScan node can be put into another
> > state where it's ready for a new request while needing a callback for
> > the prefetch.
>
> I don't understand this either. ExecAsyncNotify() doesn't touch any of
> the bitmaps (as_needrequest, callback_pending, nor as_asyncpending) in
> your patch. Am I looking at something wrong? I'm looking at
> async-wip-2020-11-17.patch.

In the WIP patch I posted, these bitmaps are modified on the core side based on the callback_pending and request_complete flags in AsyncRequests returned from FDWs (see ExecAppendAsyncEventWait()).
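For illustration, the core-side flag handling just described could be sketched like this. This is a deliberately tiny stand-in, not the WIP-patch code: the struct layouts, the array-based "bitmaps", and the function name are all invented for the example; only the field names callback_pending and request_complete and the bitmap names come from the discussion.

```c
#include <stdbool.h>

#define MAX_ASYNC_SUBPLANS 8

/* simplified stand-in for an AsyncRequest returned from an FDW */
typedef struct AsyncRequestSketch
{
    int  request_index;     /* which async subplan this request is for */
    bool callback_pending;  /* FDW still wants a ForeignAsyncNotify() call */
    bool request_complete;  /* a tuple is buffered; subplan can be asked */
} AsyncRequestSketch;

/* simplified stand-in for the Append node's per-subplan bitmaps */
typedef struct AppendStateSketch
{
    bool as_needrequest[MAX_ASYNC_SUBPLANS];   /* ready for a new request */
    bool callback_pending[MAX_ASYNC_SUBPLANS]; /* still waiting on the FDW */
} AppendStateSketch;

/* What the core would do with one returned AsyncRequest after the wait.
 * Note that nothing prevents both flags from being set at once, which is
 * the relaxed state mentioned above: ready for a new request while still
 * needing a callback for the prefetch. */
void
classify_async_request(AppendStateSketch *as, const AsyncRequestSketch *areq)
{
    as->callback_pending[areq->request_index] = areq->callback_pending;
    if (areq->request_complete)
        as->as_needrequest[areq->request_index] = true;
}
```
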
> (By the way, it is one of the things that make the code hard for me to
> read that "callback" here means "calling an API function". I think none
> of them (ExecAsyncBegin, ExecAsyncRequest, ExecAsyncNotify) are a
> "callback".)

I thought the word "callback" was OK, because these functions would call the corresponding FDW callback routines, but I'll revise the wording.

> > The reason why I disabled async execution when executing EPQ is to
> > avoid sending asynchronous queries to the remote sides, which would be
> > useless, because scan tuples for an EPQ recheck are obtained in a
> > dedicated way.
>
> If EPQ is performed onto Append, I think it should gain from
> asynchronous execution, since it is going to fetch *a* tuple from
> several partitions or children. I believe EPQ doesn't contain Append
> in major cases, though. (Or I didn't come up with the steps for that
> case to happen...)

Sorry, I don't understand this part. Could you elaborate a bit more on it?

> > What do you mean by "push-up style executor"?
>
> The reverse of the volcano-style executor, which enters from the
> topmost node and goes down to the bottom. In the "push-up style
> executor", the bottom-most nodes fire on a certain trigger, then every
> intermediate node throws its result up to its parent until it reaches
> the topmost node.

That is what I'm thinking of to be able to support the case I mentioned above. I think that that would allow us to find ready subplans efficiently from occurred wait events in ExecAppendAsyncEventWait(). Consider a plan like this:

Append
-> Nested Loop
   -> Foreign Scan on a
   -> Foreign Scan on b
-> ...

I assume here that Foreign Scan on a, Foreign Scan on b, and Nested Loop are all async-capable, and that we have somewhere in the executor an AsyncRequest with requestor="Nested Loop" and requestee="Foreign Scan on a", an AsyncRequest with requestor="Nested Loop" and requestee="Foreign Scan on b", and an AsyncRequest with requestor="Append" and requestee="Nested Loop".
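The push-up flow quoted above (bottom node fires, results thrown up to the top) can be condensed into a deliberately tiny model. Nothing here is real executor code; the PushNode type, the transform hook, and push_up() are invented solely to show the direction of control flow, which is the inverse of the volcano model where the top pulls from the bottom.

```c
#include <stddef.h>

/* toy node: knows only its parent and how it processes a "tuple" */
typedef struct PushNode
{
    struct PushNode *parent;   /* NULL at the topmost node */
    int (*transform)(int);     /* this node's processing, if any */
    int  result;               /* last value that reached this node */
} PushNode;

/* An event at the bottom-most node fires here; each node applies its
 * processing and pushes the result to its parent until the top. */
void
push_up(PushNode *node, int value)
{
    node->result = node->transform ? node->transform(value) : value;
    if (node->parent)
        push_up(node->parent, node->result);
}

/* example per-node processing for the demo below */
static int double_it(int x) { return 2 * x; }
```
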
In ExecAppendAsyncEventWait(), if a file descriptor for foreign table a becomes ready, we would call ForeignAsyncNotify() for a, and if it returns a tuple back to the requestor node (ie, Nested Loop) (using ExecAsyncResponse()), then *ForeignAsyncNotify() would be called for Nested Loop*. Nested Loop would then call ExecAsyncRequest() for the inner requestee node (ie, Foreign Scan on b; I assume here that it is a foreign scan parameterized by a). If Foreign Scan on b returns a tuple back to the requestor node (ie, Nested Loop) (using ExecAsyncResponse()), then Nested Loop would match the tuples from the outer and inner sides. If they match, the join result would be returned back to the requestor node (ie, Append) (using ExecAsyncResponse()), marking the Nested Loop subplan as as_needrequest. Otherwise, Nested Loop would call ExecAsyncRequest() for the inner requestee node for the next tuple, and so on.

If ExecAsyncRequest() can't return a tuple immediately, we would wait until a file descriptor for foreign table b becomes ready, and start by calling ForeignAsyncNotify() for b when that file descriptor becomes ready.

In this way we could find ready subplans efficiently from the wait events that occur in ExecAppendAsyncEventWait() when extending to the case where subplans are joins or aggregates over Foreign Scans, I think. Maybe I'm missing something, though.

Thanks for the comments!

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/CAPmGK153oorYtTpW_-aZrjH-iecHbykX7qbxX_5630ZK8nqVHg%40mail.gmail.com
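For what it's worth, the notify/request sequence walked through above can be compressed into a small stand-in. Integer "tuples", a trivial equality join, and the two function names are all assumptions made for the example; the real AsyncRequest plumbing between Append, Nested Loop, and the Foreign Scans is far more involved.

```c
#include <stdbool.h>

#define NO_TUPLE (-1)

/* stand-in for Foreign Scan on b, parameterized by the outer value:
 * return the first matching inner tuple, or NO_TUPLE if the scan
 * would have to wait on b's file descriptor */
static int
scan_b_request(const int *b_vals, int b_len, int outer)
{
    for (int i = 0; i < b_len; i++)
        if (b_vals[i] == outer)
            return b_vals[i];
    return NO_TUPLE;
}

/* stand-in for the chain "ForeignAsyncNotify() on a -> notify the
 * Nested Loop -> ExecAsyncRequest() on b": on a match, the joined
 * result goes up to Append and the subplan is marked as ready for a
 * new request (as_needrequest) */
int
notify_nested_loop(int outer, const int *b_vals, int b_len,
                   bool *as_needrequest)
{
    int inner = scan_b_request(b_vals, b_len, outer);

    if (inner == NO_TUPLE)
        return NO_TUPLE;        /* wait on b's file descriptor instead */
    *as_needrequest = true;     /* Append may now ask for the next tuple */
    return outer + inner;       /* stand-in for the joined tuple */
}
```
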