Re: TPC-H Q20 from 1 hour to 19 hours! - Mailing list pgsql-hackers

From Rafia Sabih
Subject Re: TPC-H Q20 from 1 hour to 19 hours!
Msg-id CAOGQiiOkSxHzv0eAhTboKxLMa2=0Z28S83gScMcqkUjOt==rdQ@mail.gmail.com
In response to Re: TPC-H Q20 from 1 hour to 19 hours!  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Fri, Mar 31, 2017 at 5:13 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Mar 30, 2017 at 8:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Mar 29, 2017 at 8:00 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>> What is however strange is that changing max_parallel_workers_per_gather
>>> affects row estimates *above* the Gather node. That seems a bit, um,
>>> suspicious, no? See the parallel-estimates.log.
>>
>> Thanks for looking at this!  Comparing the parallel plan vs. the
>> non-parallel plan:
>>
>> part: parallel rows (after Gather) 20202, non-parallel rows 20202
>> partsupp: parallel rows 18, non-parallel rows 18
>> part-partsupp join: parallel rows 88988, non-parallel rows 355951
>> lineitem: parallel rows 59986112, non-parallel rows 59986112
>> lineitem after aggregation: parallel rows 5998611, non-parallel rows 5998611
>> final join: parallel rows 131, non-parallel rows 524
>>
>> I agree with you that that looks mighty suspicious.  Both the
>> part-partsupp join and the final join have exactly 4x as many
>> estimated rows in the non-parallel plan as in the parallel plan, and
>> it just so happens that the parallel divisor here will be 4.
>>
>> Hmm... it looks like the parallel_workers value from the Gather node
>> is being erroneously propagated up to the higher levels of the plan
>> tree.   Wow.   Somehow, Gather Merge managed to get the logic correct
>> here, but Gather is totally wrong.  Argh.   Attached is a draft patch,
>> which I haven't really tested beyond checking that it passes 'make
>> check'.
>>
>
> Your patch looks good to me.  I have verified some join cases as well
> where the behaviour is sane after patch.  I have also done testing
> with force_parallel_mode=regress (ran make check-world) and everything
> seems good.
>
> --

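(For reference, the 4x factor discussed above comes from the planner's
parallel divisor. Below is a minimal standalone sketch of that
calculation, modelled on get_parallel_divisor() in costsize.c; it is
only an illustration under that assumption, not the planner code or the
draft patch.)

#include <stdio.h>

/*
 * Rough sketch of the parallel divisor: each worker is expected to
 * produce rows / divisor rows, and the leader contributes less as the
 * worker count grows, dropping out entirely at 4 or more workers.
 */
static double
parallel_divisor(int parallel_workers)
{
    double divisor = parallel_workers;
    double leader_contribution = 1.0 - 0.3 * parallel_workers;

    if (leader_contribution > 0)
        divisor += leader_contribution;

    return divisor;
}

int
main(void)
{
    for (int workers = 1; workers <= 4; workers++)
        printf("%d workers -> divisor %.1f\n",
               workers, parallel_divisor(workers));

    /*
     * With 4 workers the divisor is exactly 4.0, which matches the 4x
     * gap between the parallel and non-parallel row estimates quoted
     * above: each worker is expected to emit rows / 4, and the bug lets
     * that per-worker figure leak above the Gather node.
     */
    return 0;
}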
I tried checking the plan of Q20 with this patch and got the following results.

With the patch:
->  Merge Join  (cost=3025719.98..3499235.22 rows=6 width=16) (actual time=176440.801..245903.143 rows=118124 loops=1)
      Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
      Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))
and without the patch:
->  Merge Join  (cost=3014830.12..3511637.54 rows=2 width=16) (actual time=130994.237..208887.149 rows=118124 loops=1)
      Merge Cond: ((partsupp.ps_partkey = lineitem.l_partkey) AND (partsupp.ps_suppkey = lineitem.l_suppkey))
      Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))

So, it looks like the estimates in the problematic area are not
improving much. Please find attached the file with the query plans of
Q20 with and without the patch. I haven't evaluated the performance of
this query with the patch.
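(To put those numbers in perspective: the patch only changes this
estimate by roughly the parallel-divisor factor discussed above, while
the estimate is still off from the actual row count by several orders
of magnitude. A trivial check on the figures above:)

#include <stdio.h>

int
main(void)
{
    /* Row counts taken from the two Q20 merge-join fragments above. */
    double actual = 118124;      /* actual rows, both runs */
    double est_patched = 6;      /* estimated rows with the patch */
    double est_unpatched = 2;    /* estimated rows without the patch */

    printf("patched vs unpatched estimate: %.0fx\n",
           est_patched / est_unpatched);   /* ~3x */
    printf("actual vs patched estimate:    %.0fx\n",
           actual / est_patched);          /* ~19687x */

    return 0;
}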
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

