Re: TPC-H Q20 from 1 hour to 19 hours! - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: TPC-H Q20 from 1 hour to 19 hours!
Date
Msg-id f91b4a44-f739-04bd-c4b6-f135bd643669@2ndquadrant.com
Whole thread Raw
In response to Re: TPC-H Q20 from 1 hour to 19 hours!  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: TPC-H Q20 from 1 hour to 19 hours!  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

On 03/30/2017 12:14 AM, Tomas Vondra wrote:
>
> I've only ran the queries on 10GB data set, but that should be enough. 
> The plans are from current master - I'll rerun the script on an older 
> release later today.
> 

So, an plans from an older release (9.4) are attached. What seems to 
matter is the join between partsupp and part, which is estimated like 
this on 9.4:

    ->  Sort  (cost=172852.06..173741.94 rows=355951 width=12)
              (actual time=321.998..334.440 rows=86836 loops=1)
          Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
          Sort Method: quicksort  Memory: 7143kB
          ->  Nested Loop  (cost=0.43..140031.03 rows=355951 width=12)
                       (actual time=0.025..303.145 rows=86836 loops=1)

and like this on current master:

   ->  Sort  (cost=146617.86..146819.89 rows=80809 width=12)
             (actual time=562.513..575.599 rows=86836 loops=1)
          Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
          Sort Method: quicksort  Memory: 7143kB
          ->  Nested Loop  (cost=0.43..140031.03 rows=80809 width=12)
                      (actual time=0.054..536.003 rows=86836 loops=1)

which however seems clearly more accurate than 9.4. So perhaps there is 
some bug in computing the mergejoin estimate, but it's also possible 
correcting the estimate (lowering it) also has some impact.

But joining to the aggregated subquery also clearly plays some role, 
because without it the estimates are higher.

Another interesting observation is that only the foreign key between 
part/partsupp seems to matter - once it gets dropped, the estimates get 
back close to 9.4 values.

What is however strange is that changing max_parallel_workers_per_gather 
affects row estimates *above* the Gather node. That seems a bit, um, 
suspicious, no? See the parallel-estimates.log.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Next
From: Venkata B Nagothi
Date:
Subject: Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?