Re: Asynchronous Append on postgres_fdw nodes. - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: Asynchronous Append on postgres_fdw nodes. |
Date | |
Msg-id | CAPmGK17uiUOACYwVxre-qmjYeurhEPEEwTd4Rm4v-pXHRL8KvA@mail.gmail.com |
In response to | Re: Asynchronous Append on postgres_fdw nodes. (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
List | pgsql-hackers |
On Fri, Jan 15, 2021 at 4:54 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> Mmm. I meant that the function explicitly calls
> ExecAppendAsyncRequest(), which finally calls fetch_more_data_begin()
> (if needed). Conversely if the function doesn't call
> ExecAppendAsyncRequest(), the next request to remote doesn't
> happen. That is, after the tuple buffer of FDW-side is exhausted, the
> next request doesn't happen until executor requests for the next
> tuple. You seem to be saying that "postgresForeignAsyncRequest() calls
> fetch_more_data_begin() following its own decision." but this doesn't
> seem to be "prefetching".

Let me explain a bit more. Actually, the new version of the patch allows prefetching on the FDW side; for such prefetching in postgres_fdw, I think we could add a fetch_more_data_begin() call in postgresForeignAsyncNotify(). But I left that for future work, because we don’t know yet if that’s really useful. (Another reason I left it out is that we have more important issues that should be addressed [1], and I think addressing those issues is a requirement for us to commit this patch, but adding such prefetching isn’t, IMO.)

> Sorry. I think I misread you here. I agree that, the notify API is not
> so useful now but would be useful if we allow notify descendents other
> than immediate children. However, I stumbled on the fact that some
> kinds of nodes don't return a result when all the underlying nodes
> returned *a* tuple. Concretely count(*) doesn't return until *all*
> tuples of the counted relation have been returned. I remember that the
> fact might be the reason why I removed the API. After all the topmost
> async-aware node must ask every immediate child if it can return a
> tuple.

The patch I posted, which revived Robert’s original patch using stuff from your patch and Thomas’, provides ExecAsyncRequest(), which supports pull-based execution like ExecProcNode(), as well as ExecAsyncNotify(), which supports push-based execution. In the aggregate case you mentioned, I think we could call ExecAsyncRequest() repeatedly for the underlying subplan to get all tuples from it, in a similar way to ExecProcNode() in the normal case.

> EPQ retrieves a specific tuple from a node. If we perform EPQ on an
> Append, only one of the children should offer a result tuple. Since
> Append has no idea of which of its children will offer a result, it
> has no way other than asking all children until it receives a
> result. If we do that, asynchronously sending a query to all nodes
> would win.

Thanks for the explanation! But I’m still not sure why we need to send an asynchronous query to each of the asynchronous nodes in an EPQ recheck. Is it possible to explain a bit more about that?

I wrote:
> > That is what I'm thinking to be able to support the case I mentioned
> > above. I think that that would allow us to find ready subplans
> > efficiently from occurred wait events in ExecAppendAsyncEventWait().
> > Consider a plan like this:
> >
> > Append
> >   -> Nested Loop
> >        -> Foreign Scan on a
> >        -> Foreign Scan on b
> >   -> ...
> >
> > I assume here that Foreign Scan on a, Foreign Scan on b, and Nested
> > Loop are all async-capable and that we have somewhere in the executor
> > an AsyncRequest with requestor="Nested Loop" and requestee="Foreign
> > Scan on a", an AsyncRequest with requestor="Nested Loop" and
> > requestee="Foreign Scan on b", and an AsyncRequest with
> > requestor="Append" and requestee="Nested Loop".
> > In ExecAppendAsyncEventWait(), if a file descriptor for foreign
> > table a becomes ready, we would call ForeignAsyncNotify() for a, and
> > if it returns a tuple back to the requestor node (ie, Nested Loop)
> > (using ExecAsyncResponse()), then *ForeignAsyncNotify() would be
> > called for Nested Loop*. Nested Loop would then call
> > ExecAsyncRequest() for the inner requestee node (ie, Foreign Scan on
> > b; I assume here that it is a foreign scan parameterized by a). If
> > Foreign Scan on b returns a tuple back to the requestor node (ie,
> > Nested Loop) (using ExecAsyncResponse()), then Nested Loop would
> > match the tuples from the outer and inner sides. If they match, the
> > join result would be returned back to the requestor node (ie,
> > Append) (using ExecAsyncResponse()), marking the Nested Loop subplan
> > as as_needrequest. Otherwise, Nested Loop would call
> > ExecAsyncRequest() for the inner requestee node for the next tuple,
> > and so on. If ExecAsyncRequest() can't return a tuple immediately,
> > we would wait until a file descriptor for foreign table b becomes
> > ready; we would start from calling ForeignAsyncNotify() for b when
> > the file descriptor becomes ready. In this way we could find ready
> > subplans efficiently from occurred wait events in
> > ExecAppendAsyncEventWait() when extending to the case where subplans
> > are joins or aggregates over Foreign Scans, I think. Maybe I’m
> > missing something, though.

> Maybe so. As I mentioned above, in the following case..
>
> Join -1
>   Join -2
>     ForeignScan -A
>     ForeignScan -B
>   ForeignScan -C
>
> Where the Join-1 is the leader of asynchronous fetching. Even if both
> of the FS-A,B have returned one tuple each, it's unsure that Join-2
> returns a tuple. I'm not sure how to resolve the situation with the
> current infrastructure as-is.

Maybe my explanation was not good, so let me explain a bit more. Assume that Join-2 is a nested loop join as shown above.
If the tuples from the outer/inner sides didn’t match, we could call *ExecAsyncRequest()* repeatedly for the inner side until a matching tuple is found. If the inner side wasn’t able to return a tuple immediately, 1) it would return request_complete=false to Join-2 using ExecAsyncResponse(), 2) we could wait for a file descriptor for the inner side to become ready (while processing other parts of the Append tree), and 3) when the file descriptor becomes ready, recursive ExecAsyncNotify() calls would restart the Join-2 processing in a push-based manner as explained above.

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/CAPmGK14xrGe%2BXks7%2BfVLBoUUbKwcDkT9km1oFXhdY%2BFFhbMjUg%40mail.gmail.com
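
To make the Join-2 flow above concrete, here is a rough, self-contained toy model in C. It is not code from the patch: every name in it (ToyAsyncRequest, ToyInnerScan, ToyNestLoop, toy_async_request(), toy_join_advance(), toy_join_notify(), toy_scan_fd_ready()) is hypothetical, tuples are plain ints, and "the file descriptor becomes ready" is simulated by a function call. It only illustrates the idea that the join keeps pulling from the inner side until it finds a match, gives up with a "wait" status when the inner side reports request_complete=false, and is restarted in a push-based manner once the inner side has data again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an async request; NOT the patch's AsyncRequest struct. */
typedef struct ToyAsyncRequest
{
    bool request_complete;  /* false: requestee could not answer yet */
    bool has_tuple;         /* meaningful only when request_complete */
    int  tuple;
} ToyAsyncRequest;

/* Toy inner foreign scan: tuples arrive from the "remote" in batches. */
typedef struct ToyInnerScan
{
    const int *values;      /* all tuples the remote side will send */
    size_t     nvalues;
    size_t     next;        /* index of the next tuple to hand out */
    size_t     buffered;    /* tuples available without waiting */
    size_t     batch;       /* tuples delivered per "fd ready" event */
} ToyInnerScan;

typedef enum { TOY_JOIN_MATCH, TOY_JOIN_WAIT, TOY_JOIN_DONE } ToyJoinStatus;

typedef struct ToyNestLoop
{
    int           outer_tuple;  /* current outer tuple to match */
    ToyInnerScan *inner;
} ToyNestLoop;

/* Simulates the inner side's file descriptor becoming ready. */
static void
toy_scan_fd_ready(ToyInnerScan *scan)
{
    size_t remaining;

    assert(scan->next <= scan->nvalues);
    remaining = scan->nvalues - scan->next;
    scan->buffered = remaining < scan->batch ? remaining : scan->batch;
}

/* Pull-style request, in the spirit of ExecAsyncRequest(). */
static void
toy_async_request(ToyInnerScan *scan, ToyAsyncRequest *req)
{
    req->has_tuple = false;
    req->tuple = 0;

    if (scan->next >= scan->nvalues)
    {
        req->request_complete = true;   /* end of data, no tuple */
        return;
    }
    if (scan->buffered == 0)
    {
        req->request_complete = false;  /* caller must wait for the fd */
        return;
    }
    scan->buffered--;
    req->request_complete = true;
    req->has_tuple = true;
    req->tuple = scan->values[scan->next++];
}

/* Keep requesting inner tuples until a match, a wait, or end of data. */
static ToyJoinStatus
toy_join_advance(ToyNestLoop *join, int *result)
{
    ToyAsyncRequest req;

    for (;;)
    {
        toy_async_request(join->inner, &req);
        if (!req.request_complete)
            return TOY_JOIN_WAIT;
        if (!req.has_tuple)
            return TOY_JOIN_DONE;
        if (req.tuple == join->outer_tuple)
        {
            *result = req.tuple;
            return TOY_JOIN_MATCH;
        }
    }
}

/* Push-style restart, in the spirit of a chain of ExecAsyncNotify() calls. */
static ToyJoinStatus
toy_join_notify(ToyNestLoop *join, int *result)
{
    toy_scan_fd_ready(join->inner);
    return toy_join_advance(join, result);
}
```

In this toy, a caller playing the role of Append would call toy_join_advance() once, go on to other subplans whenever it gets TOY_JOIN_WAIT, and call toy_join_notify() when the simulated readiness event for the inner side occurs. How the real executor would track which subplan a wait event belongs to is exactly the open question in this thread, and the sketch makes no claim about it.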