Re: [HACKERS] Hash Functions - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: [HACKERS] Hash Functions
Date	August 4, 2017 01:08:23
Msg-id	20170803220823.63iknkhmmqzdgi5f@alap3.anarazel.de
In response to	Re: [HACKERS] Hash Functions (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: [HACKERS] Hash Functions
List	pgsql-hackers

On 2017-08-03 17:57:37 -0400, Robert Haas wrote:
> On Thu, Aug 3, 2017 at 5:50 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2017-08-03 17:43:44 -0400, Robert Haas wrote:
> >> For me, the basic point here is that we need a set of hash functions
> >> for hash partitioning that are different than what we use for hash
> >> indexes and hash joins -- otherwise when we hash partition a table and
> >> create hash indexes on each partition, those indexes will have nasty
> >> clustering.  Partitionwise hash joins will have similar problems.  So,
> >> a new set of hash functions specifically for hash partitioning is
> >> quite desirable.
> >
> > Couldn't that just as well be solved by being a bit smarter with an IV? I
> > doubt we want to end up with different hash functions for sharding,
> > partitioning, hashjoins (which seems to form a hierarchy). Having a
> > working hash-combine function, or even better a hash API that can
> > continue to use the hash's internal state, seems a more scalable
> > solution.
> 
> That's another way to go, but it requires inventing a way to thread
> the IV through the hash opclass interface.

Only if we really want to do it really well :P. Using a hash_combine()
like

/*
 * Combine two hash values, resulting in another hash value, with decent bit
 * mixing.
 *
 * Similar to boost's hash_combine().
 */
static inline uint32
hash_combine(uint32 a, uint32 b)
{
	a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
	return a;
}

between hash(IV) and the hash function should do the trick (the IV needs
to be hashed once, otherwise the bit mix is bad).
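
As a rough sketch of how this could be wired up (not actual PostgreSQL
code): the IV is hashed once into a seed, and that seed is then combined
with the output of the type's existing hash function. The names
example_hash_uint32(), example_seed_from_iv() and
example_hash_with_seed() are hypothetical, and the murmur3-style
finalizer only stands in for whatever hash function the type already has.

```c
#include <stdint.h>

typedef uint32_t uint32;

/* hash_combine() as given above */
static inline uint32
hash_combine(uint32 a, uint32 b)
{
	a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
	return a;
}

/*
 * Hypothetical stand-in for a type's existing hash function
 * (murmur3-style 32-bit finalizer); an assumption for illustration only.
 */
static inline uint32
example_hash_uint32(uint32 k)
{
	k ^= k >> 16;
	k *= 0x85ebca6b;
	k ^= k >> 13;
	k *= 0xc2b2ae35;
	k ^= k >> 16;
	return k;
}

/* Hash the IV once, so that small IVs (0, 1, 2, ...) still mix well. */
static inline uint32
example_seed_from_iv(uint32 iv)
{
	return example_hash_uint32(iv);
}

/* Combine the pre-hashed IV with the value's ordinary hash. */
static inline uint32
example_hash_with_seed(uint32 seed, uint32 value)
{
	return hash_combine(seed, example_hash_uint32(value));
}
```

With the seed computed once per partitioning level, each value costs
only one extra hash_combine() call on top of the existing hash, and the
hash opclass interface itself stays untouched.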


> That's actually sort of a
> problem anyway.  Maybe I ought to have started with the question of
> how we're going to make that end of things work.

+1 for that plan.


> We could:
> 
> - Invent a new hash_partition AM that doesn't really make indexes but
> supplies hash functions for hash partitioning.
> - Add a new, optional support function 2 to the hash AM that takes a
> value of the type *and* an IV as an argument.
> - Something else.

Not arguing for it, but one option could also be to have pg_type.hash*
function(s).

One thing that I think might be advisable to think about is that we're
atm stuck with a relatively bad hash function for hash indexes (and hash
joins/aggs), and we should probably evolve it at some point. At the same
time there are currently people out there relying on the current hash
functions remaining stable.

Greetings,

Andres Freund
