Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> btw - the "hashjoin is bad" was more or less based on the observation
>> that nearly all of the cpu is burned in hash-related functions in the
>> profile (when profiling over a longer period of time those accumulate
>> even more % of the time than in the short profile I included in the
>> original report)
>
> [ shrug... ] Two out of the three functions you mentioned are not used
> by hash join, and anyway the other plan probably has a comparable
> execution density in sort-related functions; does that make it bad?
hmm sorry for that - I should have checked the source before I made that
assumption :-(
>
> It's possible that the large time for ExecScanHashBucket has something
> to do with skewed usage of the hash buckets due to an unfortunate data
> distribution, but that's theorizing far in advance of the data.
http://www.kaltenbrunner.cc/files/4/
has preliminary data of the dbt3/scaling 10 run I did which seems to
imply we have at least 4 queries in there that take an excessive amount
of time (query 5 is the one I started the complaint with).
However, those results have to be taken with a grain of salt: there is
an apparent bug in the dbt3 code, which seems to rely on
add_missing_from=on (as can be seen in some of the error logs of the
database), and towards the end of the throughput run I ran some of the
explain analyzes for the report (those are the small 100% spikes in the
graph, caused by the box using the second CPU to run them).
I will redo those tests later this week though ...
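
To make the bucket-skew theory quoted above a bit more concrete, here is
a toy sketch in Python. It is not PostgreSQL's hash join code: the
function name, the modulo "hash" and the sample data are all made up for
illustration. It only shows how the average number of entries scanned per
probe grows once many rows land in the same bucket.

```python
from collections import Counter


def average_scan_length(keys, nbuckets):
    """Average number of bucket entries walked per probe.

    Toy model (an assumption, not PostgreSQL's algorithm): every key in
    `keys` is inserted into a chained hash table with `nbuckets` buckets,
    then each key is probed once, and a probe walks the whole chain of
    its bucket.
    """
    # Hypothetical hash function: plain modulo on integer keys.
    chain_lengths = Counter(key % nbuckets for key in keys)
    # A bucket holding n entries is probed n times, each probe walking
    # n entries, so it contributes n * n entry visits in total.
    total_visits = sum(n * n for n in chain_lengths.values())
    return total_visits / len(keys)


# 1024 distinct keys spread evenly over 1024 buckets.
uniform_keys = list(range(1024))

# 1024 rows where half of them share the single key 0.
skewed_keys = [0] * 512 + list(range(512))

print(average_scan_length(uniform_keys, 1024))  # 1.0
print(average_scan_length(skewed_keys, 1024))   # 257.5
```

Whether the dbt3 data actually behaves like the skewed case is exactly
what the real numbers would have to show.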
Stefan