Re: -HEAD planner issue wrt hash_joins on dbt3 ? - Mailing list pgsql-hackers

From Stefan Kaltenbrunner
Subject Re: -HEAD planner issue wrt hash_joins on dbt3 ?
Date
Msg-id 451808FC.1030804@kaltenbrunner.cc
Whole thread Raw
In response to Re: -HEAD planner issue wrt hash_joins on dbt3 ?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
[RESENT in a modified version since the original reply seems to have ben
lost by the listserver which seems to happen sometimes lately]

Tom Lane wrote:
> Matteo Beccati <php@beccati.com> writes:
>> Tom Lane ha scritto:
>>> Matteo Beccati <php@beccati.com> writes:
>>>> I cannot see anything bad by using something like that:
>>>> if (histogram is large/representative enough)
>>> Well, the question is exactly what is "large enough"?  I feel a bit
>>> uncomfortable about applying the idea to a histogram with only 10
>>> entries (especially if we ignore two of 'em).  With 100 or more,
>>> it sounds all right.  What's the breakpoint?
> 
>> Yes, I think 100-200 could be a good breakpoint.
> 
> I've committed this change with (for now) 100 as the minimum histogram
> size to use.  Stefan, are you interested in retrying your benchmark?

spent some time retesting that and I got the following results(this is
the same box as before but with a much slower disk-setup and a newly
initdb'd cluster):

http://www.kaltenbrunner.cc/files/analyze_q9_beta1.txt

all that is with 2GB of effective_cache_size(plan does not change with
much smaller settings btw) and a statistic target of 1000 and the
following parameters:

default planner settings for the EXPLAIN and the first EXPLAIN ANALYZE -
then the same query with disabled hash_joins, the next one is with
disabled nest_loops and the fastest one is with both nest_loop and
hash_joins disabled (all run in that order - so there are possible
caching effects).

in comparision to:

http://www.kaltenbrunner.cc/files/analyze_q9.txt

we nearly got a 7 figure speedup due to the latest changes(much better
estimates at least) - however the mergejoin+sort only plan is still faster.



the other troubling query is the following:

query: http://www.kaltenbrunner.cc/files/7/power1/db/plans/power_query21.txt

plans: http://www.kaltenbrunner.cc/files/analyze_q21_beta1.txt

(default,default,enable_nestloop=off,enable_nestloop=off and
enable_hashjoin=off)

despite having not-too bad estimates for most of the key-points in the
plan the actual runtime of the choosen plan is quite disappointing.




Stefan


pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Release Notes: Major Changes in 8.2
Next
From: Tom Lane
Date:
Subject: Re: Increase default effective_cache_size?