Re: -HEAD planner issue wrt hash_joins on dbt3 ? - Mailing list pgsql-hackers

From Stefan Kaltenbrunner
Subject Re: -HEAD planner issue wrt hash_joins on dbt3 ?
Msg-id 450D2907.3070307@kaltenbrunner.cc
In response to Re: -HEAD planner issue wrt hash_joins on dbt3 ?  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
Responses Re: -HEAD planner issue wrt hash_joins on dbt3 ?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Stefan Kaltenbrunner wrote:
> [already sent a variant of that yesterday but it doesn't look like it
> made it to the list]
> 
> Tom Lane wrote:
>> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>>> Tom Lane wrote:
>>>> Apparently we've made the planner a bit too optimistic about the savings
>>>> that can be expected from repeated indexscans occurring on the inside of
>>>> a join.
>>> effective_cache_size was set to 10GB (my fault for copying over the conf
>>> from a 16GB box) during the run - lowering it by just a few megabytes(!) or
>>> to a more realistic 6GB results in the following MUCH better plan:
>>> http://www.kaltenbrunner.cc/files/dbt3_explain_analyze2.txt
>> Interesting.  It used to be that effective_cache_size wasn't all that
>> critical... what I think this report is showing is that with the 8.2
>> changes to try to account for caching effects in repeated indexscans,
>> we've turned that into a pretty significant parameter.
> 
> took me a while due to hardware issues on my testbox - but there are new
> results (with 6GB for effective_cache_size) up at:
> 
> http://www.kaltenbrunner.cc/files/5/
> 
> there are still a few issues with the validity of the run, like the rf
> tests not actually being done right - but lowering effective_cache_size
> gave a dramatic speedup on Q5, Q7 and Q8.
> 
> this is the EXPLAIN output for the 4h+ Q9:
> 
> http://www.kaltenbrunner.cc/files/analyze_q9.txt
> 
> increasing the statistics target up to 1000 does not seem to change
> the plan, btw.
> 
> disabling nested loop leads to the following (4 times faster) plan:
> 
> http://www.kaltenbrunner.cc/files/analyze_q9_no_nest.txt
> 
> since the hash joins in there look rather slow (inappropriate hashtable
> setup due to the wrong estimates?) I disabled hash joins too:
> 
> http://www.kaltenbrunner.cc/files/analyze_q9_no_nest_no_hashjoin.txt
> 
> and amazingly this plan is by far the fastest one in runtime (15min vs
> 4.5h ...), except that the planner thinks it is 20 times more expensive ...

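To put the Q9 numbers in perspective, here is a quick back-of-the-envelope check (a sketch, not part of the original measurements). It assumes "4,5h" means 4.5 hours and that "20 times more expensive" is the ratio of the planner's total cost estimates for the nestloop+hashjoin-disabled plan vs the default plan; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the Q9 figures quoted above.
# Assumptions: 4.5 hours for the default plan, 15 minutes for the plan with
# enable_nestloop and enable_hashjoin off, and a planner cost estimate for
# the latter that is 20x higher than for the default plan.

default_runtime_min = 4.5 * 60.0   # default plan (nested loops)
fastest_runtime_min = 15.0         # nestloop and hashjoin disabled
estimated_cost_ratio = 20.0        # planner: fastest plan is "20x more expensive"

# Observed speedup of the "expensive" plan over the default one.
actual_speedup = default_runtime_min / fastest_runtime_min

# The planner predicted a 20x slowdown where an 18x speedup was observed,
# so its relative ranking of the two plans is off by the product of both.
misjudgment_factor = actual_speedup * estimated_cost_ratio

print(f"actual speedup: {actual_speedup:.0f}x")
print(f"planner misjudgment factor: {misjudgment_factor:.0f}x")
```

In other words, the planner's relative view of these two plans is off by roughly two and a half orders of magnitude, which is far more than cost-model noise.
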
some additional numbers (first one is with default settings, second is
with enable_nestloop = 'off', third one is with enable_nestloop = 'off'
and enable_hashjoin = 'off'):

http://www.kaltenbrunner.cc/files/analyze_q7.txt

here we have a 3x speedup with disabling nested loops and a 2x speedup
(over the original plan) with nested loops and hashjoins disabled.


http://www.kaltenbrunner.cc/files/analyze_q20.txt

here we have a 180x(!) speedup with both planner options disabled ...

it is worth mentioning that for both queries the estimated costs in
relation to each other look quite reasonable as soon as enable_nestloop
= 'off' (i.e. 5042928 vs 10715247 with 344sec vs 514sec for Q7, and
101441851 vs 101445468 with 10sec vs 11sec for Q20)
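A small sketch of what "reasonable in relation to each other" means here: for each query, compare the ratio of the two estimated costs with the ratio of the two measured runtimes. The numbers are copied from the paragraph above; the dictionary layout and the helper `ratios()` are hypothetical and only for illustration.

```python
# For each query: (estimated cost, runtime in seconds) of the two plans
# reported once enable_nestloop = 'off', in the order given in the message.
pairs = {
    "Q7": {"costs": (5042928, 10715247), "seconds": (344, 514)},
    "Q20": {"costs": (101441851, 101445468), "seconds": (10, 11)},
}

def ratios(entry):
    """Return (cost ratio, runtime ratio) of the second plan vs the first."""
    c1, c2 = entry["costs"]
    t1, t2 = entry["seconds"]
    return c2 / c1, t2 / t1

for query, entry in pairs.items():
    cost_ratio, time_ratio = ratios(entry)
    # A ratio > 1 on both sides means the planner ranks the plans the same
    # way the measurements do.
    print(f"{query}: estimated cost ratio {cost_ratio:.2f}, "
          f"runtime ratio {time_ratio:.2f}")
# Q7: estimated cost ratio 2.12, runtime ratio 1.49
# Q20: estimated cost ratio 1.00, runtime ratio 1.10
```

In both cases the estimates order the plans the same way the measured runtimes do, even if the magnitude is off for Q7. The Q7 runtime ratio of about 1.49 also matches the 3x and 2x speedups reported earlier (3/2 = 1.5).
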


Stefan