Re: -HEAD planner issue wrt hash_joins on dbt3 ? - Mailing list pgsql-hackers

From Stefan Kaltenbrunner
Subject Re: -HEAD planner issue wrt hash_joins on dbt3 ?
Date
Msg-id 4507B4C2.9060208@kaltenbrunner.cc
Whole thread Raw
In response to Re: -HEAD planner issue wrt hash_joins on dbt3 ?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: -HEAD planner issue wrt hash_joins on dbt3 ?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> Tom Lane wrote:
>>> Could we see the actual EXPLAIN ANALYZE results for the slow plan?
> 
>> http://www.kaltenbrunner.cc/files/dbt3_explain_analyze.txt
> 
> Well, indeed it seems that the hash join is just an innocent bystander:
> the bulk of the runtime (all but about 120 sec in fact) is spent here:
> 
>                      ->  Nested Loop  (cost=13.65..1719683.85 rows=12000672 width=49) (actual
time=60.325..24923860.713rows=11897899 loops=1)
 
>                            ->  Merge Join  (cost=0.00..10248.66 rows=20000 width=41) (actual time=16.654..2578.060
rows=19837loops=1)
 
>                                ...
>                            ->  Bitmap Heap Scan on lineitem  (cost=13.65..77.16 rows=665 width=16) (actual
time=13.492..1254.535rows=600 loops=19837)
 
>                                  Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)
>                                  ->  Bitmap Index Scan on i_l_suppkey  (cost=0.00..13.65 rows=665 width=0) (actual
time=10.662..10.662rows=600 loops=19837)
 
>                                        Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)
> 
> I suppose that the profile result you showed was taken during the
> startup transient where it was computing the hashtables that this loop's
> results are joined to ... but that's not where the problem is.  The
> problem is repeating that bitmap scan on lineitem for nearly 20000
> different l_suppkeys.

possible - I actually took them over a longer period of time

> 
> Apparently we've made the planner a bit too optimistic about the savings
> that can be expected from repeated indexscans occurring on the inside of
> a join.  The other plan uses a different join order and doesn't try to
> join lineitem until it's got orders.o_orderkey, whereupon it does a
> mergejoin against an indexscan on lineitem:
> 
>                      ->  Index Scan using i_l_orderkey on lineitem (cost=0.00..2715943.34 rows=60003360 width=16)
(actualtime=32.868..123668.380 rows=59991868 loops=1)
 
> 
> The runtimes for the remainders of the plans are roughly comparable, so
> it's the cost of joining lineitem that is hurting here.
> 
> Is lineitem sorted (or nearly sorted) by l_orderkey?  Part of the
> problem could be overestimating the cost of this indexscan.
> 
> What are the physical sizes of lineitem and its indexes, and how do
> those compare to your RAM?  What are you using for planner settings
> (particularly effective_cache_size)?

ouch - you are right(as usual) here.
effective_cache_size was set to 10GB(my fault for copying over the conf
from a 16GB box) during the run - lowering it just a few megabytes(!) or
to a more realistic 6GB results in the following MUCH better plan:

http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt

as for the relation sizes:

dbt3=# select pg_relation_size('lineitem'); pg_relation_size
------------------      10832764928
(1 row)

dbt3=# select pg_total_relation_size('lineitem'); pg_total_relation_size
------------------------            22960259072
(1 row)

there are nine btree indexes on lineitem all between 1,1GB and 1,4GB in 
size.


Stefan


pgsql-hackers by date:

Previous
From: Gregory Stark
Date:
Subject: Re: Simplifying "standby mode"
Next
From: Tom Dunstan
Date:
Subject: Re: Getting a move on for 8.2 beta