Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber) - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Msg-id 5495D871.5060509@BlueTreble.com
In response to Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
On 12/20/14, 11:51 AM, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2014-12-19 22:03:55 -0600, Jim Nasby wrote:
>>> What I am thinking is not using all of those fields in their raw form to calculate the hash value. IE: something analogous to:
>>>
>>> hash_any(SharedBufHash, (rot(forkNum, 2) | dbNode) ^ relNode) << 32 | blockNum)
>>>
>>> perhaps that actual code wouldn't work, but I don't see why we couldn't do something similar... am I missing something?
>
>> I don't think that'd improve anything. Jenkins' hash does have quite good
>> mixing properties, I don't believe that the above would improve the
>> quality of the hash.
>
> I think what Jim is suggesting is to intentionally degrade the quality of
> the hash in order to let it be calculated a tad faster.  We could do that
> but I doubt it would be a win, especially in systems with lots of buffers.
> IIRC, when we put in Jenkins hashing to replace the older homebrew hash
> function, it improved performance even though the hash itself was slower.

Right. Now that you mention it, I vaguely recall the discussions about changing the hash function to reduce collisions.
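
To make the collision concern concrete, here is a minimal sketch of one possible reading of the combination quoted above, widened to 64 bits. It is not PostgreSQL code: ToyTag, rot32 and naive_key are hypothetical names made up for illustration, and the field layout is an assumption rather than the real buffer tag.

```c
#include <stdint.h>

/* Hypothetical stand-in for a buffer tag; NOT the real PostgreSQL struct. */
typedef struct ToyTag
{
    uint32_t dbNode;
    uint32_t relNode;
    uint32_t forkNum;
    uint32_t blockNum;
} ToyTag;

/* Rotate a 32-bit value left by k bits (valid for 0 < k < 32). */
static uint32_t
rot32(uint32_t x, unsigned k)
{
    return (x << k) | (x >> (32 - k));
}

/*
 * One reading of the quoted expression (an assumption, since the original
 * was explicitly only a rough idea): OR and XOR the identifiers into the
 * upper 32 bits and keep the block number in the lower 32 bits.
 */
static uint64_t
naive_key(const ToyTag *t)
{
    uint32_t hi = (rot32(t->forkNum, 2) | t->dbNode) ^ t->relNode;

    return ((uint64_t) hi << 32) | t->blockNum;
}
```

Under this reading, distinct tags can map to the same key. For example, forkNum 1 rotated by 2 is 4, which disappears in the OR whenever dbNode already has that bit set, and swapping dbNode and relNode leaves the XOR unchanged when forkNum is 0. A mixing hash such as the Jenkins one is designed to avoid that kind of structural collision.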

I'll still take a look at fast-hash, but it's looking like there may not be anything we can do here unless we change how we identify relation files (combining dbid, tablespace id, fork number and file id, at least for searching). If we had 64-bit hash support then maybe that'd be a significant win, since you wouldn't need to hash at all. But that certainly doesn't seem to be low-hanging fruit to me...
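
As a rough illustration of the "no hashing needed" idea, the sketch below assumes a hypothetical 32-bit identifier that already stands in for the (dbid, tablespace id, fork number, file id) combination. No such identifier exists today; RelFileKey32, pack_buffer_key, key_relfile and key_block are names invented for this example only.

```c
#include <stdint.h>

/*
 * Hypothetical: a single 32-bit id that uniquely names a relation file,
 * replacing (dbid, tablespace id, fork number, file id) for lookups.
 */
typedef uint32_t RelFileKey32;

/* Pack the file id and block number into one 64-bit lookup key. */
static uint64_t
pack_buffer_key(RelFileKey32 relfile, uint32_t blockNum)
{
    return ((uint64_t) relfile << 32) | blockNum;
}

/* Recover the file id from the upper 32 bits of the key. */
static RelFileKey32
key_relfile(uint64_t key)
{
    return (RelFileKey32) (key >> 32);
}

/* Recover the block number from the lower 32 bits of the key. */
static uint32_t
key_block(uint64_t key)
{
    return (uint32_t) (key & 0xFFFFFFFFu);
}
```

Because the packing is lossless, two different (file, block) pairs never share a key, so the key could serve directly as a 64-bit hash value. How a hash table would map such a key to buckets is outside this sketch.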
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


