Re: [HACKERS] Hash Functions - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Hash Functions
Msg-id 13475.1494698235@sss.pgh.pa.us
In response to Re: [HACKERS] Hash Functions  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Hash Functions  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, May 13, 2017 at 12:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Can we think of defining separate portable hash functions which can be
>> used for the purpose of hash partitioning?

> I think that would be a good idea.  I think it shouldn't even be that
> hard.  By data type:

> - Integers.  We'd need to make sure that we get the same results for
> the same value on big-endian and little-endian hardware, and that
> performance is good on both systems.  That seems doable.

> - Floats.  There may be different representations in use on different
> hardware, which could be a problem.  Tom didn't answer my question
> about whether any even-vaguely-modern hardware is still using non-IEEE
> floats, which I suspect means that the answer is "no".  If every bit
> of hardware we are likely to find uses basically the same
> representation of the same float value, then this shouldn't be hard.
> (Also, even if this turns out to be hard for floats, using a float as
> a partitioning key would be a surprising choice because the default
> output representation isn't even unambiguous; you need
> extra_float_digits for that.)

> - Strings.  There's basically only one representation for a string.
> If we assume that the hash code only needs to be portable across
> hardware and not across encodings, a position for which I already
> argued upthread, then I think this should be manageable.

Basically, this is simply saying that you're willing to ignore the
hard cases, which reduces the problem to one of documenting the
portability limitations.  You might as well not even bother with
worrying about the integer case, because porting between little-
and big-endian systems is surely far less common than cases you've
already said you're okay with blowing off.

That's not an unreasonable position to take, perhaps; doing better
than that is going to be a lot more work and it's not very clear
how much real-world benefit results.  But I can't follow the point
of worrying about endianness but not encoding.
        regards, tom lane
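
To make the endianness-versus-encoding comparison concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not PostgreSQL code: a truncated SHA-256 stands in for a real hash function, and the helper names (hash_bytes, int_hash_native, int_hash_portable, string_hash) are hypothetical. It shows that hashing the in-memory image of an integer depends on byte order unless the bytes are first put into one fixed order, and that hashing a string depends on the encoding in just the same way once the text leaves plain ASCII.

```python
import hashlib
import struct


def hash_bytes(data: bytes) -> int:
    """Stand-in hash over raw bytes (assumption: truncated SHA-256,
    chosen only for illustration; it is not PostgreSQL's hash)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def int_hash_native(value: int, byteorder: str) -> int:
    """Hash the 4-byte image of a signed 32-bit integer as a machine
    with the given byte order ("little" or "big") would store it."""
    return hash_bytes(value.to_bytes(4, byteorder, signed=True))


def int_hash_portable(value: int) -> int:
    """Hash the integer after serializing it in one fixed byte order
    (little-endian here), so every platform feeds the same bytes."""
    return hash_bytes(struct.pack("<i", value))


def string_hash(text: str, encoding: str) -> int:
    """Hash the bytes of a string under a particular encoding."""
    return hash_bytes(text.encode(encoding))


# Integer case: the same value, two byte orders, two different hashes.
le = int_hash_native(1, "little")
be = int_hash_native(1, "big")
int_differs_by_endianness = le != be

# Fixing the byte order before hashing removes the dependence.
portable_matches_le = int_hash_portable(1) == le

# String case: pure ASCII has the same bytes in UTF-8 and Latin-1 ...
ascii_same = string_hash("abc", "utf-8") == string_hash("abc", "latin-1")

# ... but a non-ASCII character does not, so the hash changes with
# the encoding even on identical hardware.
accent_same = string_hash("é", "utf-8") == string_hash("é", "latin-1")

print(int_differs_by_endianness, portable_matches_le)
print(ascii_same, accent_same)
```

The integer fix is a one-line normalization of byte order. The string case has no equally cheap fix, because the stored bytes themselves differ between encodings; a hash that was portable across encodings would have to convert to a common representation first.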


