Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber) - Mailing list pgsql-hackers

From ktm@rice.edu
Subject Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)
Date 2014-12-19 23:37:07
Msg-id 20141219233707.GA19678@aart.rice.edu
In response to Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Dec 19, 2014 at 04:41:51PM -0600, Jim Nasby wrote:
> On 12/18/14, 5:00 PM, Jim Nasby wrote:
> >2201582 20 -- Mostly LOCALLOCK and Shared Buffer
> 
> Started looking into this; perhaps https://code.google.com/p/fast-hash/
> would be worth looking at, though it requires uint64.
> 
> It also occurs to me that we're needlessly shoving a lot of 0's into the
> hash input by using RelFileNode and ForkNumber. RelFileNode includes the
> tablespace Oid, which is pointless here because relid is unique
> per-database. We also have very few forks and typically care about very
> few databases. If we crammed dbid and ForkNum together that gets us down
> to 12 bytes, which at minimum saves us the trip through the case logic.
> I suspect it also means we could eliminate one of the mix() calls.
> 
> But I wonder if we could still do better, because we typically also won't
> have that many relations. Is there some fast way we could combine dbid,
> forkNum and relid into a uint32? That gets us down to 8 bytes, which means
> we could use fast-hash, or a stripped down mix().
> 
> Unfortunately I don't know much about hash algorithms, so I don't know how
> practical any of this actually is, or what a good method for combining
> those fields would be. My current idea is something like
> (rot(forkNum, 2) | dbid) ^ relid, but if you got unlucky with your oid
> values you could end up with a lot of collisions from that.
> 
> I can put some effort into this, but I'd like some guidance.
> -- 
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 
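
[For illustration, here is a minimal C sketch of the field combination Jim
describes above. The names are hypothetical and the code is untested; it is
a sketch of the quoted idea, not a proposal, and as Jim notes, unlucky oid
values can still collide badly.]

#include <stdint.h>

/* rotate a 32-bit value left by k bits (k must be in 1..31) */
static inline uint32_t
rot32(uint32_t x, int k)
{
    return (x << k) | (x >> (32 - k));
}

/* (rot(forkNum, 2) | dbid) ^ relid, as proposed in the quoted mail */
static inline uint32_t
combine_buffer_key(uint32_t dbid, uint32_t forkNum, uint32_t relid)
{
    return (rot32(forkNum, 2) | dbid) ^ relid;
}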

Hi,

If we are going to consider changing the hash function, we should
consider something like xxhash, which runs at 13.8GB/s for the XXH64
variant and 6.8GB/s for the XXH32 variant on a 2.7GHz x86_64; that is
roughly double the speed fast-hash reports on its page for a 3GHz
x86_64. In addition, something like that could be used as a checksum
instead of the current CRC32, although some work has already gone into
speeding up the existing CRC32 code. Otherwise, it probably makes sense
to just create an 8-byte fast path analogous to the 4-byte fast path
that was just added. Is calculating the hash the bottleneck?
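
[For what it's worth, a minimal sketch of hashing such an 8-byte key with
xxhash might look like the following. XXH32() is the xxhash library's public
entry point; the pre-combined dbid/forkNum word is hypothetical, and this is
purely illustrative, not a patch.]

#include <stdint.h>
#include "xxhash.h"

/* hash a (dbid|forkNum, relid) pair with the 32-bit xxhash variant */
static inline uint32_t
hash_combined_key(uint32_t dbid_fork, uint32_t relid)
{
    uint32_t key[2] = { dbid_fork, relid };

    return XXH32(key, sizeof(key), 0);   /* seed 0 */
}

[XXH64() could be used the same way where a 64-bit result is wanted.]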

Regards,
Ken


