Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption - Mailing list pgsql-hackers

From ktm@rice.edu
Subject Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
Date
Msg-id 20131007125023.GQ16128@aart.rice.edu
Whole thread Raw
In response to Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-hackers
On Mon, Oct 07, 2013 at 12:41:58AM +0200, Tomas Vondra wrote:
> > 2. Consider using a simpler/faster hash function, like FNV[1] or Jenkins[2].
> >    For fun, try not hashing those ints at all and see how that performs (that,
> >    I think, is what you get from HashSet<int> in Java/C#).
> 
> I've used crc32 mostly because it's easily available in the code (i.e.
> I'm lazy), but I've done some quick research / primitive benchmarking
> too. For example hashing 2e9 integers takes this much time:
> 
> FNV-1   = 11.9
> FNV-1a  = 11.9
> jenkins = 38.8
> crc32   = 32.0
> 
> So it's not really "slow" and the uniformity seems to be rather good.
> 
> I'll try FNV in the future, however I don't think that's the main issue
> right now.
> 
Hi Tomas,

If you are going to use a function that is not currently in the code,
please consider xxhash:

http://code.google.com/p/xxhash/

Here are some benchmarks for some of the faster hash functions:

Name            Speed       Q.Score   Author
xxHash          5.4 GB/s     10
MumurHash 3a    2.7 GB/s     10       Austin Appleby
SpookyHash      2.0 GB/s     10       Bob Jenkins
SBox            1.4 GB/s      9       Bret Mulvey
Lookup3         1.2 GB/s      9       Bob Jenkins
CityHash64      1.05 GB/s    10       Pike & Alakuijala
FNV             0.55 GB/s     5       Fowler, Noll, Vo
CRC32           0.43 GB/s     9
MD5-32          0.33 GB/s    10       Ronald L. Rivest
SHA1-32         0.28 GB/s    10

Regards,
Ken



pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: pgbench progress report improvements - split 3 v2 - A
Next
From: Andres Freund
Date:
Subject: Re: logical changeset generation v6.1