On Sunday, June 23, 2013, Simon Riggs wrote:
On 23 June 2013 03:16, Stephen Frost <sfrost@snowman.net> wrote:
> Will think on it more.
Some other thoughts related to this...
* Why are we building a special kind of hash table? Why don't we just
use the hash table code that we use in every other place in the backend?
If that code is so bad why do we use it everywhere else? That is
extensible, so we could try just using that. (Has anyone actually
tried?)
I've not looked at the hash table in the rest of the backend.
* We're not thinking about cache locality and set correspondence
either. If the join is expected to hardly ever match, then we should
be using a bitmap as a Bloom filter rather than assuming that a very
large hash table is easily accessible.
That's what I was suggesting earlier, though I don't think it's technically a Bloom filter; doesn't that require multiple hash functions? I don't think we want to require every data type to provide multiple hash functions.
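To make the bitmap idea concrete, here is a minimal sketch in Python (not PostgreSQL code) of a bitmap built from a single hash function and used as a pre-filter in front of the hash table. The names BitmapFilter and hash_join and the nbits parameter are made up for illustration, and Python's built-in hash() stands in for the one hash function each data type already provides. With only one hash function this amounts to the k = 1 case of a Bloom filter: it can report false positives but never false negatives.

```python
class BitmapFilter:
    """One bit per hash bucket, set from a single hash value per key."""

    def __init__(self, nbits):
        # nbits is a hypothetical sizing knob; a power of two lets us mask
        # the hash value instead of taking a modulus.
        assert nbits > 0 and nbits & (nbits - 1) == 0
        self.mask = nbits - 1
        self.bits = bytearray(nbits // 8 or 1)

    def add(self, hashvalue):
        bit = hashvalue & self.mask
        self.bits[bit >> 3] |= 1 << (bit & 7)

    def might_contain(self, hashvalue):
        # False means "definitely absent"; True means "maybe present".
        bit = hashvalue & self.mask
        return bool(self.bits[bit >> 3] & (1 << (bit & 7)))


def hash_join(inner, outer, nbits=1024):
    """Join two lists of (key, payload) pairs, counting hash table probes."""
    table = {}
    filt = BitmapFilter(nbits)

    # Build phase: load the inner side into the table and set its bits.
    for key, payload in inner:
        table.setdefault(key, []).append(payload)
        filt.add(hash(key))

    # Probe phase: consult the small bitmap first, and only touch the
    # (potentially large) hash table when the bit is set.
    probes = 0
    result = []
    for key, payload in outer:
        if not filt.might_contain(hash(key)):
            continue
        probes += 1
        for inner_payload in table.get(key, ()):
            result.append((key, inner_payload, payload))
    return result, probes
```

If the join hardly ever matches, most outer keys are rejected by the bitmap, which is small enough to stay in cache, and the large table is only probed for the few keys that pass. The bitmap size is the trade-off: more bits mean fewer false positives but a bigger structure to keep cache-resident.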
* The skew hash table will be hit frequently and would show good L2
cache usage. I think I'll try adding the skew table always to see if
that improves the speed of the hash join.
The skew table is just for common values, though. To be honest, I have some doubts about that structure really being a terribly good approach for anything which is completely in memory.
Thanks,
Stephen