Re: A better way than tweaking NTUP_PER_BUCKET - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: A better way than tweaking NTUP_PER_BUCKET
Date
Msg-id CAOuzzgq0H2-CcMUy-xZwY0D-6od5JMdH8nvfR=mO_KJJveotKA@mail.gmail.com
Whole thread Raw
In response to Re: A better way than tweaking NTUP_PER_BUCKET  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Sunday, June 23, 2013, Simon Riggs wrote:
On 23 June 2013 03:16, Stephen Frost <sfrost@snowman.net> wrote:

> Will think on it more.

Some other thoughts related to this...

* Why are we building a special kind of hash table? Why don't we just
use the hash table code that we in every other place in the backend.
If that code is so bad why do we use it everywhere else? That is
extensible, so we could try just using that. (Has anyone actually
tried?)

I've not looked at the hash table in the rest of the backend. 
 
* We're not thinking about cache locality and set correspondence
either. If the join is expected to hardly ever match, then we should
be using a bitmap as a bloom filter rather than assuming that a very
large hash table is easily accessible.

That's what I was suggesting earlier, though I don't think it's technically a bloom filter- doesn't that require multiple hash functions?I don't think we want to require every data type to provide multiple hash functions.   
 
* The skew hash table will be hit frequently and would show good L2
cache usage. I think I'll try adding the skew table always to see if
that improves the speed of the hash join.

The skew tables is just for common values though...   To be honest, I have some doubts about that structure really being a terribly good approach for anything which is completely in memory. 

Thanks,

Stephen 

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: A better way than tweaking NTUP_PER_BUCKET
Next
From: Dean Rasheed
Date:
Subject: Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]