Re: Hash support for arrays - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Hash support for arrays
Date
Msg-id 16429.1288734459@sss.pgh.pa.us
Whole thread Raw
In response to Re: Hash support for arrays  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Hash support for arrays
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Nov 2, 2010, at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> However, this is largely beside the point, because that theory, as well
>> as the Java code you're arguing from, has to do with the initial hashing
>> of a raw sequence of input items.  Not with combining some existing hash
>> values.  The rotate-and-xor method I suggested for that is borrowed
>> exactly from section 6.4 of Knuth (page 512, in the first edition of
>> volume 3).

> It seems undesirable to me to have a situation where transposing two array elements doesn't change the hash value.
ButI just work here.
 

[ shrug... ]  There are always going to be collisions, and you're
overstating the importance of this one (only some transpositions will
result in a duplicate hash, not any transposition).

What's actually *important*, for our purposes, is that all bits of the
final hash value depend in a reasonably uniform way on all bits of the
input hash values.  If we don't have that property, then bucket numbers
(which we form by taking the low-order N bits of the final hash, where
N isn't known in advance) won't be as well distributed as we'd like.

It's possible that the multiply-by-31 method is as good as the
rotate-and-xor method by that measure, or even better; but it's far from
obvious that it's better.  And I'm not convinced that the multiply
method has a pedigree that should encourage me to take it on faith.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [PATCH] V3: Idle in transaction cancellation
Next
From: Alvaro Herrera
Date:
Subject: Re: [PATCH] V3: Idle in transaction cancellation