Hash index with larger hashes

From: Kenneth Marshall <ktm@rice.edu>
Date: Fri, Aug 5, 2016
Hello Developers,

I have been following the recent discussions on increasing the
size of the hash function used in Postgres and the work to
provide WAL support and performance improvements for hash
indexes. I know it was mentioned when we moved to the new
hashing functions, but the existing functions already calculate
an additional 32 bits of hash that we currently do not use.
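
Roughly, the situation is like the toy sketch below. To be clear,
this is not the actual backend code, just an illustration (with
made-up names) of the point that both 32-bit words fall out of a
single pass over the key:

#include <stdint.h>
#include <stddef.h>

/*
 * Toy stand-in for a lookup3-style hash that mixes the key into two
 * 32-bit words in one pass.  Illustration only, not the real code.
 */
static void
toy_hash_two_words(const unsigned char *key, size_t len,
                   uint32_t *pc, uint32_t *pb)
{
    uint32_t    c = 0x9e3779b9;
    uint32_t    b = 0xdeadbeef;
    size_t      i;

    for (i = 0; i < len; i++)
    {
        c = (c ^ key[i]) * 0x01000193;  /* the word we keep today */
        b = (b + key[i]) * 0x85ebca6b;  /* the word currently thrown away */
    }
    *pc = c;
    *pb = b;
}

/* Today: only one 32-bit word is handed back to the caller. */
static uint32_t
toy_hash_32(const unsigned char *key, size_t len)
{
    uint32_t    c, b;

    toy_hash_two_words(key, len, &c, &b);
    return c;
}

/* Possible future: keep both words, 64 bits for the same hashing work. */
static uint64_t
toy_hash_64(const unsigned char *key, size_t len)
{
    uint32_t    c, b;

    toy_hash_two_words(key, len, &c, &b);
    return ((uint64_t) b << 32) | c;
}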

It occurred to me that one way to decrease the space used to
store the hash value would be to let the page number supply some
of the bits implicitly. For example, a hash index of 65k pages
(roughly 540 MB) would get two additional bytes of hash with no
associated storage cost. Also, if you subdivided each hash page
into, say, 128 sub-pages, you would get the extra 2 bytes of
hash in a 4 MB index. In this way, the bigger the hash index is,
the more bits you get for free.
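
To make the arithmetic concrete, here is a rough sketch, assuming
8 kB pages (the helper name is made up for illustration):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 8192          /* assumed 8 kB hash pages */

/*
 * How many hash bits the page (or sub-page) number can supply for free:
 * log2 of the number of addressable units in the index.
 */
static int
implicit_hash_bits(uint64_t index_bytes, int subpages_per_page)
{
    uint64_t    units = (index_bytes / PAGE_SIZE) * subpages_per_page;
    int         bits = 0;

    while (units > 1)
    {
        units >>= 1;
        bits++;
    }
    return bits;
}

int
main(void)
{
    /* 65536 pages of 8 kB (~540 MB): the page number alone gives 16 bits. */
    printf("%d\n", implicit_hash_bits((uint64_t) 65536 * PAGE_SIZE, 1));

    /* 4 MB index with 128 sub-pages per page: also 16 implicit bits. */
    printf("%d\n", implicit_hash_bits((uint64_t) 4 * 1024 * 1024, 128));

    return 0;
}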

Just wanted to throw it out there.

Regards,
Ken



Re: Hash index with larger hashes

From: Robert Haas
Date:
On Fri, Aug 5, 2016 at 10:39 AM, Kenneth Marshall <ktm@rice.edu> wrote:
> I have been following the recent discussions on increasing the
> size of the hash function used in Postgres and the work to
> provide WAL support and performance improvements for hash
> indexes. I know it was mentioned when we moved to the new
> hashing functions, but the existing functions already calculate
> an additional 32 bits of hash that we currently do not use.
>
> It occurred to me that one way to decrease the space used to
> store the hash value would be to let the page number supply some
> of the bits implicitly. For example, a hash index of 65k pages
> (roughly 540 MB) would get two additional bytes of hash with no
> associated storage cost. Also, if you subdivided each hash page
> into, say, 128 sub-pages, you would get the extra 2 bytes of
> hash in a 4 MB index. In this way, the bigger the hash index is,
> the more bits you get for free.
>
> Just wanted to throw it out there.

I'm not sure I understand exactly what you are proposing here.
Suppose we have 64 bits of hashcode and (1 << N) buckets.  Currently,
we store hashcode bits 0..31 on each item.  Maybe what you're saying
is that we could instead store bits B..(31+B) on each item - that is,
cram in as many extra bits on each individual item as log2(nbuckets).
The problem with that is that it would make it very hard to split
buckets.
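
For concreteness, here is that reading in sketch form. The names
are invented for illustration; this is not how the hash AM
actually lays items out:

#include <stdint.h>

/*
 * With (1 << B) buckets, the low B bits of the 64-bit hashcode are
 * implied by which bucket an item lives in, so each item could store
 * hashcode bits B..(31+B) instead of 0..31.  Assumes 0 <= B < 32.
 */
static uint32_t
stored_bits(uint64_t hashcode, int B)
{
    return (uint32_t) (hashcode >> B);  /* 32 bits starting at bit B */
}

static uint64_t
known_hash_bits(uint32_t stored, uint32_t bucket_no, int B)
{
    /* The low B bits come back from the bucket number itself. */
    return ((uint64_t) stored << B) | (bucket_no & ((1U << B) - 1));
}

One way to see the split problem: B is a function of the bucket
count, so after a split the 32-bit window already stored on
existing items no longer lines up with the new layout, and the
newly needed high bit was never stored at all; filling it in
would mean rehashing the original keys.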

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company