Re: how to allow integer overflow for calculating hash code of a string? - Mailing list pgsql-admin

From Craig James
Subject Re: how to allow integer overflow for calculating hash code of a string?
Date
Msg-id CAFwQ8rdvACJpQB=2y7UyUzxTq4XqA7mRQ3Nm4HxtFNFhFkeLrA@mail.gmail.com
In response to Re: how to allow integer overflow for calculating hash code of a string?  (Haifeng Liu <liuhaifeng@live.com>)
Responses Re: how to allow integer overflow for calculating hash code of a string?  (Haifeng Liu <liuhaifeng@live.com>)
List pgsql-admin
On Thu, Sep 20, 2012 at 7:56 PM, Haifeng Liu <liuhaifeng@live.com> wrote:

On Sep 20, 2012, at 10:34 PM, Craig James <cjames@emolecules.com> wrote:



On Thu, Sep 20, 2012 at 1:55 AM, Haifeng Liu <liuhaifeng@live.com> wrote:
I want to write a hash function that behaves like String.hashCode() in Java: hash = hash * 31 + s.charAt(i)... but I get an "integer out of range" error. How can I avoid this? As far as I can see, Java does not care about int overflow; it just lets the result become negative.


Use the bitwise AND operator to mask the hash value with 0x3FFFFFF before each iteration:

  hash = (hash & 67108863) * 31 + s.charAt(i);
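
A minimal sketch of that masked loop as a complete Java method, for illustration only. The class name MaskedHash and the method name hash are hypothetical. The mask 0x3FFFFFF is 2^26 - 1, so the masked value times 31 plus one char always stays below 2^31.

```java
// Hypothetical helper class; only the loop body comes from the message above.
final class MaskedHash {
    // 0x3FFFFFF == 67108863 == 2^26 - 1
    static final int MASK = 0x3FFFFFF;

    static int hash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // After masking, hash < 2^26, so hash * 31 + char < 2^31:
            // the intermediate result never leaves the signed 32-bit range.
            hash = (hash & MASK) * 31 + s.charAt(i);
        }
        return hash;
    }
}
```

The result is never negative. For short strings it matches String.hashCode(), but once the running value exceeds 26 bits the two diverge.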

Craig

Thank you. I believe your solution is fine for a hash function, but I am aiming to create one that is consistent with the hash function our applications use. I know PostgreSQL 9.1 has a hash function called hashtext, but I don't know what algorithm it uses, and I have also seen that relying on it is not recommended. So I am trying to create a hash function that behaves exactly the same as java.lang.String.hashCode(). The latter may generate negative hash values. I guess that when the number overflows, the part that is out of range is ignored, and if the highest bit ends up as 1, the hash value becomes negative.
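
The wrap-around described above (bits beyond the 32nd are dropped, and a set top bit makes the value negative) can be reproduced explicitly with 64-bit arithmetic. The sketch below is one possible way to do it, not a definitive implementation. The class name WrapHash is hypothetical. It is written in Java so it can be checked against String.hashCode(), and the same three steps (multiply in a wider type, keep the low 32 bits, reinterpret as signed) should carry over to a language whose integers raise an error on overflow.

```java
// Hypothetical helper class that emulates 32-bit overflow by hand.
final class WrapHash {
    static int hash(String s) {
        long h = 0; // 64-bit accumulator, so h * 31 + char cannot overflow
        for (int i = 0; i < s.length(); i++) {
            // Keep only the low 32 bits after every step.
            h = (h * 31 + s.charAt(i)) & 0xFFFFFFFFL;
        }
        // If bit 31 is set, the value is negative as a signed 32-bit integer.
        if (h >= 0x80000000L) {
            h -= 0x100000000L;
        }
        return (int) h;
    }
}
```

For any string s, WrapHash.hash(s) should equal s.hashCode(), including inputs long enough to overflow and produce negative values.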

You are probably doing something where you want the application and the database to implement the exact same function, but if you stick to the Java built-in function, you will only have control over one implementation of that function.  What happens if someone working on Java changes how the Java internals work?

A better solution would be to implement your own hash function in Postgres, and then once you know exactly how it will work, re-implement it in Java with your own code.  That's the only way you can ensure consistency between the two.
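
As a rough illustration of what the Java side of such a self-defined function could look like, here is a sketch. Everything in it is an assumption made for the example rather than something specified in this thread: the class name PortableHash, the choice of hashing UTF-8 bytes, the multiplier 31, and the reduction modulo 2^31. The point is only that each step is spelled out, so a database function can follow the same steps with a 64-bit integer type.

```java
// Hypothetical, fully specified hash: no reliance on built-in hashCode().
final class PortableHash {
    static final long MODULUS = 2147483648L; // 2^31, keeps results in 0 .. 2^31 - 1

    static int hash(String s) {
        long h = 0; // 64-bit accumulator; h * 31 + 255 stays far below 2^63
        // Hash the UTF-8 bytes so both sides agree on the input encoding.
        for (byte b : s.getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
            h = (h * 31 + (b & 0xFF)) % MODULUS;
        }
        return (int) h; // always non-negative
    }
}
```

Because the result is defined only by these steps, the two implementations can be tested against each other with a shared list of sample strings.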

Craig
