Home > mailing lists

Re: Hash support for arrays - Mailing list pgsql-hackers

From	Kevin Grittner
Subject	Re: Hash support for arrays
Date	November 2, 2010 16:34:40
Msg-id	4CD0217202000025000371A2@gw.wicourts.gov Whole thread Raw
In response to	Re: Hash support for arrays (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Hash support for arrays
List	pgsql-hackers

Tree view

Robert Haas <robertmhaas@gmail.com> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> The goal is to make those hard to predict, though.
>>
>> Really?  I think "I don't understand when this fails" isn't
>> obviously better than being able to predict when it fails ...
> 
> Isn't that the whole point of hash functions?  Collisions are
> inevitable, but you want them to be unpredictable.
Are you sure you're not confusing attributes of a good random number
generator with hash generation?  (They have much in common, but I've
only seen concern with "hard to predict" when working on random
number algorithms, as for financial audits or jury selection.)
> If you want a hash function with predictable collision behavior,
> just has the first element.  It'll be highly predictable AND
> wicked fast.
You're confusing "unpredictable" with "widely distributed", I think.
There's no reason that the hash value of an integer of the same size
as the hash shouldn't be the integer itself, for example.  It's hard
to get more predictable than that, yet it works well for hash
lookups.
I know that for a hash of a character string, the total = (total *
31) + nextcharacter has been shown to be effective.  That's hardly
random or hard to predict, but it does tend to spread out the hash
values well.  Whether it is as good for arrays, I don't know.  It
seems like a reasonable place to start, though, since a character
string is rather like an array of characters.
>> What concerns me about that is that it tends to push the bits to
>> the left --- I think the effects of the earlier inputs are going
>> to overflow out as you incorporate a lot of newer inputs.  What
>> you want is a scheme where every input item affects all the bits
>> of the final result about equally, and my gut feeling is this
>> doesn't provide that.
> 
> I don't think so.  That would be a problem if you multiplied by an
> even number.  You can see that you don't get dups if you just add
> in the same value over and over:
Yeah, that 1 in the low order position ensures that the impact of
any value which is ever added into the total is never entirely lost.
-Kevin

pgsql-hackers by date:

From: Robert Haas
Date: 02 November 2010, 16:04:13
Subject: Re: Hash support for arrays

From: Robert Haas
Date: 02 November 2010, 17:01:09
Subject: Re: ALTER TYPE recursion to typed tables

Re: Hash support for arrays - Mailing list pgsql-hackers

Previous

Next