On Fri, Aug 5, 2016 at 10:39 AM, Kenneth Marshall <ktm@rice.edu> wrote:
> I have been following the recent discussions on increasing the
> size of the hash function used in Postgres and the work to
> provide WAL and performance improvements for hash indexes.
> I know it was mentioned when we moved to the new hashing
> functions, but the existing functions do provide an additional
> 32-bits of hash. We currently do not use them, but they are
> already calculated.
>
> It had occurred to me that one way to decrease the space needed
> to store the hash value would be to let the page number supply
> part of it. For example, a hash index of 65k pages (540 MB)
> would get two additional bytes of hash with no associated
> storage cost. Also, if you subdivided each hash page into, say,
> 128 sub-pages, you would get the extra 2 bytes of hash in a
> 4 MB index. In this way, the bigger the hash index is, the more
> bits you get for free.
>
> Just wanted to throw it out there.
I'm not sure I understand exactly what you are proposing here.
Suppose we have 64 bits of hashcode and (1 << N) buckets. Currently,
we store hashcode bits 0..31 on each item. Maybe what you're saying
is that we could instead store bits B..(31+B) on each item - that is,
cram log2(nbuckets) extra bits into each individual item, since the
bucket number already determines the low-order bits. The problem
with that is that it would make it very hard to split buckets.
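
To make the arithmetic concrete, here is a minimal Python sketch of
that scheme (the names and the 64-bit/32-bit widths are illustrative
assumptions, not actual Postgres code):

```python
# Bit-recycling sketch: with 2^B buckets, the bucket an item lives in
# already reveals the low-order B bits of its hash, so the 32-bit
# per-item field could hold bits B..(31+B) instead of bits 0..31.

STORED_BITS = 32  # width of the per-item hash field (assumed)

def stored_code(full_hash, bucket_bits):
    # Keep bits bucket_bits..(bucket_bits + 31) of the 64-bit hash.
    return (full_hash >> bucket_bits) & ((1 << STORED_BITS) - 1)

def effective_hash(stored, bucket, bucket_bits):
    # The bucket number restores the low-order bits that were shifted
    # out, giving 32 + bucket_bits recoverable hash bits in total.
    return (stored << bucket_bits) | bucket

# Example: a 64-bit hash in an index with 2^10 buckets.
h = 0xDEADBEEFCAFEF00D
B = 10
bucket = h & ((1 << B) - 1)          # low 10 bits pick the bucket
stored = stored_code(h, B)
assert effective_hash(stored, bucket, B) == h & ((1 << (STORED_BITS + B)) - 1)

# Why splitting is hard: after a split, B becomes B + 1, so each item
# would then need to store bits (B+1)..(32+B). Bit 32+B was never
# stored, and it cannot be recomputed without rehashing the key.
```

The last comment is the crux: every split shifts the window of bits
each item must carry, and the newly needed high bit is gone.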
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company