Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers
From | Rafia Sabih |
---|---|
Subject | Re: [HACKERS] Parallel Append implementation |
Date | |
Msg-id | CAOGQiiOAn2SMRvSJTrNfh117M2tkP8P9tEuzySZh6_hEL_=1oA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Parallel Append implementation (Amit Khandekar <amitdkhan.pg@gmail.com>) |
List | pgsql-hackers |
On Wed, Aug 30, 2017 at 5:32 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > Hi Rafia, > > On 17 August 2017 at 14:12, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: >> But for all of the cases here, partial >> subplans seem possible, and so even on HEAD it executed Partial >> Append. So between a Parallel Append having partial subplans and a >> Partial Append having partial subplans , the cost difference would not >> be significant. Even if we assume that Parallel Append was chosen >> because its cost turned out to be a bit cheaper, the actual >> performance gain seems quite large as compared to the expected cost >> difference. So it might be even possible that the performance gain >> might be due to some other reasons. I will investigate this, and the >> other queries. >> > > I ran all the queries that were showing performance benefits in your > run. But for me, the ParallelAppend benefits are shown only for plans > that use Partition-Wise-Join. > > For all the queries that use only PA plans but not PWJ plans, I got > the exact same plan for HEAD as for PA+PWJ patch, except that for the > later, the Append is a ParallelAppend. Whereas, for you, the plans > have join-order changed. > > Regarding actual costs; consequtively, for me the actual-cost are more > or less the same for HEAD and PA+PWJ. Whereas, for your runs, you have > quite different costs naturally because the plans themselves are > different on head versus PA+PWJ. > > My PA+PWJ plan outputs (and actual costs) match exactly what you get > with PA+PWJ patch. But like I said, I get the same join order and same > plans (and actual costs) for HEAD as well (except > ParallelAppend=>Append). > > May be, if you have the latest HEAD code with your setup, you can > yourself check some of the queries again to see if they are still > seeing higher costs as compared to PA ? I suspect that some changes in > latest code might be causing this discrepancy; because when I tested > some of the explains with a HEAD-branch server running with your > database, I got results matching PA figures. > > Attached is my explain-analyze outputs. > Now, when I compare your results with the ones I posted I could see one major difference between them -- selectivity estimation errors. In the results I posted, e.g. Q3, on head it gives following -> Finalize GroupAggregate (cost=41131358.89..101076015.45 rows=455492628 width=44) (actual time=126436.642..129247.972 rows=226765 loops=1) Group Key: lineitem_001.l_orderkey, orders_001.o_orderdate, orders_001.o_shippriority -> Gather Merge (cost=41131358.89..90637642.73 rows=379577190 width=44) (actual time=126436.602..127791.768 rows=235461 loops=1) Workers Planned: 2 Workers Launched: 2 and in your results it is, -> Finalize GroupAggregate (cost=4940619.86..6652725.07 rows=13009521 width=44) (actual time=89573.830..91956.956 rows=226460 loops=1) Group Key: lineitem_001.l_orderkey, orders_001.o_orderdate, orders_001.o_shippriority -> Gather Merge (cost=4940619.86..6354590.21 rows=10841268 width=44) (actual time=89573.752..90747.393 rows=235465 loops=1) Workers Planned: 2 Workers Launched: 2 However, for the results with the patch/es this is not the case, in my results, with patch, -> Finalize GroupAggregate (cost=4933450.21..6631111.01 rows=12899766 width=44) (actual time=87250.039..90593.716 rows=226765 loops=1) Group Key: lineitem_001.l_orderkey, orders_001.o_orderdate, orders_001.o_shippriority -> Gather Merge (cost=4933450.21..6335491.38 rows=10749804 width=44) (actual time=87250.020..89125.279 rows=227291 loops=1) Workers Planned: 2 Workers Launched: 2 I think this explains the reason for drastic different in the plan choices and thus the performance for both the cases. Since I was using same database for the cases, I don't have much reasons for such difference in selectivity estimation for these queries. The only thing might be a missing vacuum analyse, but since I checked it a couple of times I am not sure if even that could be the reason. Additionally, it is not the case for all the queries, like in Q10 and Q21, the estimates are similar. However, on a fresh database the selectivity-estimates and plans as reported by you and with the patched version I posted seems to be the correct one. I'll see if I may check performance of these queries once again to verify these. -- Regards, Rafia Sabih EnterpriseDB: http://www.enterprisedb.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-hackers by date: