Thread: hash function in Postgres

hash function in Postgres

From
Ravi Kiran
Date:
hi, 


I want to know what kind of hash function postgresql uses while joining. I was debugging through gdb, I found out that it is not using bob jenkins hash function but a different hash function hash_uint32() and hash_any() functions if the joining attribute is an integer, and a different kind of hash function for a different type of joining attribute.

I want to know whether the hash functions will change if the number of tuples in the table are very large or very low, and if it changes, please tell me what hash function it uses if the tuples are very large and what hash function it uses if the number is very low, also I came to know that the hash function will change depending on the type of the attribute on which the join takes place, but will it always remains the same for the integer type of joining attribute or will it it change.

thanks


Re: hash function in Postgres

From
Adrian Klaver
Date:
On 01/23/2015 10:42 PM, Ravi Kiran wrote:
> hi,
>
>
> I want to know what kind of hash function postgresql uses while joining.
> I was debugging through gdb, I found out that it is not using bob
> jenkins hash function but a different hash function *hash_uint32() and
> hash_any() *functions if the joining attribute is an integer, and a
> different kind of hash function for a different type of joining attribute.
>
> I want to know whether the hash functions will change if the number of
> tuples in the table are very large or very low, and if it changes,
> please tell me what hash function it uses if the tuples are very large
> and what hash function it uses if the number is very low, also I came to
> know that the hash function will change depending on the type of the
> attribute on which the join takes place, but will it always remains the
> same for the integer type of joining attribute or will it it change.

Seems a good starting point is here:


http://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/backend/access/hash;h=6e44d337d3dc3e8d5439e2dbbc54ce0c33d3b5d4;hb=8ca336f4ac3f08a5f23e76c6e9a5f2c8064f5883

In particular, hashfunc.c

>
> thanks
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: hash function in Postgres

From
Jim Nasby
Date:
On 1/24/15 9:03 AM, Adrian Klaver wrote:
> On 01/23/2015 10:42 PM, Ravi Kiran wrote:
>> hi,
>>
>>
>> I want to know what kind of hash function postgresql uses while joining.
>> I was debugging through gdb, I found out that it is not using bob
>> jenkins hash function but a different hash function *hash_uint32() and
>> hash_any() *functions if the joining attribute is an integer, and a
>> different kind of hash function for a different type of joining
>> attribute.
>>
>> I want to know whether the hash functions will change if the number of
>> tuples in the table are very large or very low, and if it changes,
>> please tell me what hash function it uses if the tuples are very large
>> and what hash function it uses if the number is very low, also I came to
>> know that the hash function will change depending on the type of the
>> attribute on which the join takes place, but will it always remains the
>> same for the integer type of joining attribute or will it it change.
>
> Seems a good starting point is here:
>
>
http://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/backend/access/hash;h=6e44d337d3dc3e8d5439e2dbbc54ce0c33d3b5d4;hb=8ca336f4ac3f08a5f23e76c6e9a5f2c8064f5883

http://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/backend/access/hash;hb=HEAD
is even better, since that's HEAD.

To answer your question, AFAIK there's no consideration made for the
number of tuples, and I rather doubt that would help much. If you want
to go mucking around with this I think it'd be more useful to look at
using a different hash algorithm for types that could be significant in
size (namely varchar, text and bytea). There are algorithms newer than
Jenkins that look to be significantly faster on larger volumes of data.

You might find https://github.com/decibel/hash_testing useful; it's way
to test hash performance of several hashes.
https://github.com/decibel/postgres/tree/xxhash is where I pulled xxhash
into Postgres to try using it for hashing shared buffer identifiers. It
didn't help there, but it might be faster than Jenkins for things like
hashing text fields.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com