Re: Asynchronous MergeAppend - Mailing list pgsql-hackers
From | Alena Rybakina |
---|---|
Subject | Re: Asynchronous MergeAppend |
Date | |
Msg-id | 764dd8b8-6374-4f5a-aac7-d8e3f6ebe5fd@postgrespro.ru Whole thread Raw |
In response to | Asynchronous MergeAppend (Alexander Pyhalov <a.pyhalov@postgrespro.ru>) |
List | pgsql-hackers |
Hi! Thank you for your work on this subject! I think this is a very useful optimization) While looking through your code, I noticed some points that I think should be taken into account. Firstly, I noticed only two tests to verify the functionality of this function and I think that this is not enough. Are you thinking about adding some tests with queries involving, for example, join connections with different tables and unusual operators? In addition, I have a question about testing your feature on a benchmark. Are you going to do this? On 17.07.2024 16:24, Alexander Pyhalov wrote: > Hello. > > I'd like to make MergeAppend node Async-capable like Append node. > Nowadays when planner chooses MergeAppend plan, asynchronous execution > is not possible. With attached patches you can see plans like > > EXPLAIN (VERBOSE, COSTS OFF) > SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------ > > Merge Append > Sort Key: async_pt.b, async_pt.a > -> Async Foreign Scan on public.async_p1 async_pt_1 > Output: async_pt_1.a, async_pt_1.b, async_pt_1.c > Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE (((b % > 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST > -> Async Foreign Scan on public.async_p2 async_pt_2 > Output: async_pt_2.a, async_pt_2.b, async_pt_2.c > Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE (((b % > 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST > > This can be quite profitable (in our test cases you can gain up to two > times better speed with MergeAppend async execution on remote servers). > > Code for asynchronous execution in Merge Append was mostly borrowed > from Append node. > > What significantly differs - in ExecMergeAppendAsyncGetNext() you must > return tuple from the specified slot. > Subplan number determines tuple slot where data should be retrieved > to. When subplan is ready to provide some data, > it's cached in ms_asyncresults. When we get tuple for subplan, > specified in ExecMergeAppendAsyncGetNext(), > ExecMergeAppendAsyncRequest() returns true and loop in > ExecMergeAppendAsyncGetNext() ends. We can fetch data for > subplans which either don't have cached result ready or have already > returned them to the upper node. This > flag is stored in ms_has_asyncresults. As we can get data for some > subplan either earlier or after loop in ExecMergeAppendAsyncRequest(), > we check this flag twice in this function. > Unlike ExecAppendAsyncEventWait(), it seems > ExecMergeAppendAsyncEventWait() doesn't need a timeout - as there's no > need to get result > from synchronous subplan if a tuple form async one was explicitly > requested. > > Also we had to fix postgres_fdw to avoid directly looking at Append > fields. Perhaps, accesors to Append fields look strange, but allows > to avoid some code duplication. I suppose, duplication could be even > less if we reworked async Append implementation, but so far I haven't > tried to do this to avoid big diff from master. > > Also mark_async_capable() believes that path corresponds to plan. This > can be not true when create_[merge_]append_plan() inserts sort node. > In this case mark_async_capable() can treat Sort plan node as some > other and crash, so there's a small fix for this. I think you should add this explanation to the commit message because without it it's hard to understand the full picture of how your code works. -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
pgsql-hackers by date: