Re: Asynchronous MergeAppend - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: Asynchronous MergeAppend
Date
Msg-id 764dd8b8-6374-4f5a-aac7-d8e3f6ebe5fd@postgrespro.ru
In response to Asynchronous MergeAppend  (Alexander Pyhalov <a.pyhalov@postgrespro.ru>)
List pgsql-hackers
Hi! Thank you for your work on this subject! I think this is a very 
useful optimization.

While looking through your code, I noticed some points that I think 
should be taken into account. First, I found only two tests that 
verify this feature, and I think that is not enough.
Are you planning to add more tests, for example with queries that 
join different tables or use unusual operators?

In addition, I have a question about benchmarking: are you going to 
test your feature on a benchmark?

On 17.07.2024 16:24, Alexander Pyhalov wrote:
> Hello.
>
> I'd like to make the MergeAppend node async-capable, like the Append node. 
> Currently, when the planner chooses a MergeAppend plan, asynchronous 
> execution is not possible. With the attached patches you can see plans like
>
> EXPLAIN (VERBOSE, COSTS OFF)
> SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
>                                                           QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------
>  Merge Append
>    Sort Key: async_pt.b, async_pt.a
>    ->  Async Foreign Scan on public.async_p1 async_pt_1
>          Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
>          Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
>    ->  Async Foreign Scan on public.async_p2 async_pt_2
>          Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
>          Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
>
> This can be quite profitable (in our test cases, MergeAppend async 
> execution on remote servers gave up to a twofold speedup).
>
> Code for asynchronous execution in Merge Append was mostly borrowed 
> from Append node.
>
> The significant difference is that ExecMergeAppendAsyncGetNext() must 
> return a tuple from the specified slot.
> The subplan number determines the tuple slot into which data should be 
> retrieved. When a subplan is ready to provide some data,
> the data is cached in ms_asyncresults. When we get a tuple for the subplan 
> specified in ExecMergeAppendAsyncGetNext(),
> ExecMergeAppendAsyncRequest() returns true and the loop in 
> ExecMergeAppendAsyncGetNext() ends. We can fetch data for
> subplans that either don't have a cached result ready or have already 
> returned it to the upper node. This
> flag is stored in ms_has_asyncresults. As we can get data for some 
> subplan either before or after the loop in ExecMergeAppendAsyncRequest(),
> we check this flag twice in that function.
> Unlike ExecAppendAsyncEventWait(), it seems 
> ExecMergeAppendAsyncEventWait() doesn't need a timeout, as there's no 
> need to get a result
> from a synchronous subplan if a tuple from an async one was explicitly 
> requested.
>
> Also, we had to fix postgres_fdw to avoid looking directly at Append 
> fields. Perhaps the accessors to Append fields look strange, but they allow
> us to avoid some code duplication. I suppose the duplication could be reduced 
> further if we reworked the async Append implementation, but so far I haven't
> tried to do this, to avoid a big diff from master.
>
> Also, mark_async_capable() assumes that the path corresponds to the plan. This 
> may not be true when create_[merge_]append_plan() inserts a Sort node.
> In this case mark_async_capable() can treat the Sort plan node as some 
> other node type and crash, so there's a small fix for this.

I think you should add this explanation to the commit message, because 
without it, it is hard to understand the full picture of how your code works.
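
To illustrate the flow as I understand it from the description above, 
here is a toy C model. It is not code from the patch. The names 
ToySubplan, ToyMergeState, toy_event_wait(), toy_get_next() and 
toy_merge() are hypothetical, the round-robin "arrival" of rows is an 
assumption made only so the example runs, and a linear scan stands in 
for the binary heap that MergeAppend really uses. The two arrays mirror 
the roles described for ms_asyncresults and ms_has_asyncresults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSUB 3                  /* hypothetical number of async subplans */

/* Toy stand-in for a remote subplan that returns already-sorted ints. */
typedef struct ToySubplan
{
    const int  *rows;           /* sorted input, as if ORDER BY ran remotely */
    size_t      nrows;
    size_t      next;           /* index of the next row to "arrive" */
} ToySubplan;

typedef struct ToyMergeState
{
    ToySubplan  sub[NSUB];
    int         asyncresults[NSUB];      /* one cached row per subplan */
    bool        has_asyncresults[NSUB];  /* is that cache slot filled? */
    bool        exhausted[NSUB];
    int         turn;           /* round-robin cursor for simulated arrivals */
} ToyMergeState;

/*
 * Simulated event wait: one row "arrives" for some subplan whose cache slot
 * is free.  It need not be the subplan the caller is interested in.
 */
void
toy_event_wait(ToyMergeState *ms)
{
    for (int i = 0; i < NSUB; i++)
    {
        int         s = (ms->turn + i) % NSUB;
        ToySubplan *sp = &ms->sub[s];

        if (ms->has_asyncresults[s] || ms->exhausted[s])
            continue;           /* nothing can be requested from this one */
        if (sp->next < sp->nrows)
        {
            ms->asyncresults[s] = sp->rows[sp->next++];
            ms->has_asyncresults[s] = true;
        }
        else
            ms->exhausted[s] = true;
        ms->turn = (s + 1) % NSUB;
        return;
    }
}

/* Return the next row of subplan "which"; false once it is exhausted. */
bool
toy_get_next(ToyMergeState *ms, int which, int *out)
{
    assert(which >= 0 && which < NSUB);

    /* No timeout: only a row (or end of data) for "which" ends the wait. */
    while (!ms->has_asyncresults[which] && !ms->exhausted[which])
        toy_event_wait(ms);

    if (!ms->has_asyncresults[which])
        return false;
    *out = ms->asyncresults[which];
    ms->has_asyncresults[which] = false;    /* slot may be refilled now */
    return true;
}

/* Merge all subplans into out[]; returns the number of rows written. */
size_t
toy_merge(ToyMergeState *ms, int *out, size_t cap)
{
    int         head[NSUB];
    bool        valid[NSUB];
    size_t      n = 0;

    for (int i = 0; i < NSUB; i++)
        valid[i] = toy_get_next(ms, i, &head[i]);

    while (n < cap)
    {
        int         best = -1;

        for (int i = 0; i < NSUB; i++)
            if (valid[i] && (best < 0 || head[i] < head[best]))
                best = i;
        if (best < 0)
            break;              /* every subplan is exhausted */
        out[n++] = head[best];
        valid[best] = toy_get_next(ms, best, &head[best]);
    }
    return n;
}
```

The point of the model is that toy_get_next() must keep waiting until 
the requested subplan has a row, while rows that arrive for other 
subplans in the meantime are kept in the cache until the merge asks for 
them.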

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



