Thread: hash function in Postgres
hi,
I want to know what kind of hash function postgresql uses while joining. I was debugging through gdb, I found out that it is not using bob jenkins hash function but a different hash function hash_uint32() and hash_any() functions if the joining attribute is an integer, and a different kind of hash function for a different type of joining attribute.
I want to know whether the hash functions will change if the number of tuples in the table are very large or very low, and if it changes, please tell me what hash function it uses if the tuples are very large and what hash function it uses if the number is very low, also I came to know that the hash function will change depending on the type of the attribute on which the join takes place, but will it always remains the same for the integer type of joining attribute or will it it change.
thanks
On 01/23/2015 10:42 PM, Ravi Kiran wrote: > hi, > > > I want to know what kind of hash function postgresql uses while joining. > I was debugging through gdb, I found out that it is not using bob > jenkins hash function but a different hash function *hash_uint32() and > hash_any() *functions if the joining attribute is an integer, and a > different kind of hash function for a different type of joining attribute. > > I want to know whether the hash functions will change if the number of > tuples in the table are very large or very low, and if it changes, > please tell me what hash function it uses if the tuples are very large > and what hash function it uses if the number is very low, also I came to > know that the hash function will change depending on the type of the > attribute on which the join takes place, but will it always remains the > same for the integer type of joining attribute or will it it change. Seems a good starting point is here: http://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/backend/access/hash;h=6e44d337d3dc3e8d5439e2dbbc54ce0c33d3b5d4;hb=8ca336f4ac3f08a5f23e76c6e9a5f2c8064f5883 In particular, hashfunc.c > > thanks > > -- Adrian Klaver adrian.klaver@aklaver.com
On 1/24/15 9:03 AM, Adrian Klaver wrote: > On 01/23/2015 10:42 PM, Ravi Kiran wrote: >> hi, >> >> >> I want to know what kind of hash function postgresql uses while joining. >> I was debugging through gdb, I found out that it is not using bob >> jenkins hash function but a different hash function *hash_uint32() and >> hash_any() *functions if the joining attribute is an integer, and a >> different kind of hash function for a different type of joining >> attribute. >> >> I want to know whether the hash functions will change if the number of >> tuples in the table are very large or very low, and if it changes, >> please tell me what hash function it uses if the tuples are very large >> and what hash function it uses if the number is very low, also I came to >> know that the hash function will change depending on the type of the >> attribute on which the join takes place, but will it always remains the >> same for the integer type of joining attribute or will it it change. > > Seems a good starting point is here: > > http://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/backend/access/hash;h=6e44d337d3dc3e8d5439e2dbbc54ce0c33d3b5d4;hb=8ca336f4ac3f08a5f23e76c6e9a5f2c8064f5883 http://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/backend/access/hash;hb=HEAD is even better, since that's HEAD. To answer your question, AFAIK there's no consideration made for the number of tuples, and I rather doubt that would help much. If you want to go mucking around with this I think it'd be more useful to look at using a different hash algorithm for types that could be significant in size (namely varchar, text and bytea). There are algorithms newer than Jenkins that look to be significantly faster on larger volumes of data. You might find https://github.com/decibel/hash_testing useful; it's way to test hash performance of several hashes. https://github.com/decibel/postgres/tree/xxhash is where I pulled xxhash into Postgres to try using it for hashing shared buffer identifiers. It didn't help there, but it might be faster than Jenkins for things like hashing text fields. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com