Re: Hash index with larger hashes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Hash index with larger hashes
Msg-id CA+TgmoYhe2dKuDUxn80sfZLtJ1NwfKAwRnP0ZX1hQmShrSoVmg@mail.gmail.com
In response to Hash index with larger hashes  (Kenneth Marshall <ktm@rice.edu>)
List pgsql-hackers
On Fri, Aug 5, 2016 at 10:39 AM, Kenneth Marshall <ktm@rice.edu> wrote:
> I have been following the recent discussions on increasing the
> size of the hash function used in Postgres and the work to
> provide WAL and performance improvements for hash indexes.
> I know it was mentioned when we moved to the new hashing
> functions, but the existing functions do provide an additional
> 32 bits of hash. We currently do not use them, but they are
> already calculated.
>
> It had occurred to me that one way to decrease the space used
> to store the hash value would be to include information about
> the page number to determine the actual value. For example,
> a hash index of 65k pages (~540 MB) would get two additional
> bytes of hash with no associated storage cost. Also, if you
> subdivided the hash page into, say, 128 sub-pages, you would
> get the extra 2 bytes of hash in a 4 MB index. In this way,
> the bigger the hash index is, the more bits you get for free.
>
> Just wanted to throw it out there.

I'm not sure I understand exactly what you are proposing here.
Suppose we have 64 bits of hashcode and (1 << N) buckets.  Currently,
we store hashcode bits 0..31 on each item.  Maybe what you're saying
is that we could instead store bits B..(31+B) on each item - that is,
cram in as many extra bits on each individual item as log2(nbuckets).
The problem with that is that it would make it very hard to split
buckets.
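
To spell out what I think you mean: with nbuckets = 1 << N, the bucket an
item lives in already determines the low N bits of its hash, so the 32 bits
we store on each item are partly redundant and could instead cover bits
N..(31+N) of a 64-bit hashcode. A rough sketch of the arithmetic (the names
are made up here, not the actual hash AM code, and it assumes N < 32):

#include <stdint.h>

/* Value written on an index item: bits N..(31+N) of the full hash. */
static inline uint32_t
stored_bits(uint64_t hash, unsigned N)
{
    return (uint32_t) (hash >> N);
}

/* Rebuild the low 32+N hash bits from the stored value plus the bucket
 * number, which supplies the implied low N bits. */
static inline uint64_t
rebuild_hash(uint32_t stored, uint32_t bucket, unsigned N)
{
    return ((uint64_t) stored << N) | (bucket & ((1u << N) - 1));
}

The trouble shows up at a bucket split: N goes up by one for the items being
moved, so every stored value would have to shift down by a bit, and the new
top bit of the range isn't present in the index at all; getting it back would
presumably mean recomputing the hash from the heap tuple. And since buckets
split one at a time, different buckets would imply different values of N at
the same moment.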

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


