Home > mailing lists

Re: -HEAD planner issue wrt hash_joins on dbt3 ? - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: -HEAD planner issue wrt hash_joins on dbt3 ?
Date	September 13, 2006 00:21:19
Msg-id	3850.1158106859@sss.pgh.pa.us Whole thread Raw
In response to	-HEAD planner issue wrt hash_joins on dbt3 ? (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
Responses	Re: -HEAD planner issue wrt hash_joins on dbt3 ? (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
List	pgsql-hackers

Tree view

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> Tom Lane wrote:
>> Could we see the actual EXPLAIN ANALYZE results for the slow plan?

> http://www.kaltenbrunner.cc/files/dbt3_explain_analyze.txt

Well, indeed it seems that the hash join is just an innocent bystander:
the bulk of the runtime (all but about 120 sec in fact) is spent here:
                    ->  Nested Loop  (cost=13.65..1719683.85 rows=12000672 width=49) (actual time=60.325..24923860.713
rows=11897899loops=1)                          ->  Merge Join  (cost=0.00..10248.66 rows=20000 width=41) (actual
time=16.654..2578.060rows=19837 loops=1)                              ...                          ->  Bitmap Heap Scan
onlineitem  (cost=13.65..77.16 rows=665 width=16) (actual time=13.492..1254.535 rows=600 loops=19837)
            Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)                                ->  Bitmap Index
Scanon i_l_suppkey  (cost=0.00..13.65 rows=665 width=0) (actual time=10.662..10.662 rows=600 loops=19837)
                      Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)
 

I suppose that the profile result you showed was taken during the
startup transient where it was computing the hashtables that this loop's
results are joined to ... but that's not where the problem is.  The
problem is repeating that bitmap scan on lineitem for nearly 20000
different l_suppkeys.

Apparently we've made the planner a bit too optimistic about the savings
that can be expected from repeated indexscans occurring on the inside of
a join.  The other plan uses a different join order and doesn't try to
join lineitem until it's got orders.o_orderkey, whereupon it does a
mergejoin against an indexscan on lineitem:
                    ->  Index Scan using i_l_orderkey on lineitem (cost=0.00..2715943.34 rows=60003360 width=16)
(actualtime=32.868..123668.380 rows=59991868 loops=1)
 

The runtimes for the remainders of the plans are roughly comparable, so
it's the cost of joining lineitem that is hurting here.

Is lineitem sorted (or nearly sorted) by l_orderkey?  Part of the
problem could be overestimating the cost of this indexscan.

What are the physical sizes of lineitem and its indexes, and how do
those compare to your RAM?  What are you using for planner settings
(particularly effective_cache_size)?
        regards, tom lane

pgsql-hackers by date:

From: "Joshua D. Drake"
Date: 13 September 2006, 00:13:59
Subject: Re: UUID datatype progress

From: Tatsuo Ishii
Date: 13 September 2006, 00:39:46
Subject: Re: pgbench is badly broken since July

Re: -HEAD planner issue wrt hash_joins on dbt3 ? - Mailing list pgsql-hackers

Previous

Next