Re: -HEAD planner issue wrt hash_joins on dbt3 ? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: -HEAD planner issue wrt hash_joins on dbt3 ?
Date
Msg-id 3850.1158106859@sss.pgh.pa.us
Whole thread Raw
In response to -HEAD planner issue wrt hash_joins on dbt3 ?  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
Responses Re: -HEAD planner issue wrt hash_joins on dbt3 ?  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
List pgsql-hackers
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> Tom Lane wrote:
>> Could we see the actual EXPLAIN ANALYZE results for the slow plan?

> http://www.kaltenbrunner.cc/files/dbt3_explain_analyze.txt

Well, indeed it seems that the hash join is just an innocent bystander:
the bulk of the runtime (all but about 120 sec in fact) is spent here:
                    ->  Nested Loop  (cost=13.65..1719683.85 rows=12000672 width=49) (actual time=60.325..24923860.713
rows=11897899loops=1)                          ->  Merge Join  (cost=0.00..10248.66 rows=20000 width=41) (actual
time=16.654..2578.060rows=19837 loops=1)                              ...                          ->  Bitmap Heap Scan
onlineitem  (cost=13.65..77.16 rows=665 width=16) (actual time=13.492..1254.535 rows=600 loops=19837)
            Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)                                ->  Bitmap Index
Scanon i_l_suppkey  (cost=0.00..13.65 rows=665 width=0) (actual time=10.662..10.662 rows=600 loops=19837)
                      Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)
 

I suppose that the profile result you showed was taken during the
startup transient where it was computing the hashtables that this loop's
results are joined to ... but that's not where the problem is.  The
problem is repeating that bitmap scan on lineitem for nearly 20000
different l_suppkeys.

Apparently we've made the planner a bit too optimistic about the savings
that can be expected from repeated indexscans occurring on the inside of
a join.  The other plan uses a different join order and doesn't try to
join lineitem until it's got orders.o_orderkey, whereupon it does a
mergejoin against an indexscan on lineitem:
                    ->  Index Scan using i_l_orderkey on lineitem (cost=0.00..2715943.34 rows=60003360 width=16)
(actualtime=32.868..123668.380 rows=59991868 loops=1)
 

The runtimes for the remainders of the plans are roughly comparable, so
it's the cost of joining lineitem that is hurting here.

Is lineitem sorted (or nearly sorted) by l_orderkey?  Part of the
problem could be overestimating the cost of this indexscan.

What are the physical sizes of lineitem and its indexes, and how do
those compare to your RAM?  What are you using for planner settings
(particularly effective_cache_size)?
        regards, tom lane


pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: UUID datatype progress
Next
From: Tatsuo Ishii
Date:
Subject: Re: pgbench is badly broken since July