Re: Use simplehash.h instead of dynahash in SMgr - Mailing list pgsql-hackers

From Yura Sokolov
Subject Re: Use simplehash.h instead of dynahash in SMgr
Date
Msg-id d051e4b8f125f837ffff1d798e6268d7@postgrespro.ru
Whole thread Raw
In response to Re: Use simplehash.h instead of dynahash in SMgr  (Andres Freund <andres@anarazel.de>)
Responses Re: Use simplehash.h instead of dynahash in SMgr
List pgsql-hackers
Andres Freund писал 2021-04-26 21:46:
> Hi,
> 
> On 2021-04-25 01:27:24 +0300, Yura Sokolov wrote:
>> It is quite interesting result. Simplehash being open-addressing with
>> linear probing is friendly for cpu cache. I'd recommend to define
>> SH_FILLFACTOR with value lower than default (0.9). I believe 0.75 is
>> suitable most for such kind of hash table.
> 
> It's not a "plain" linear probing hash table (although it is on the 
> lookup
> side). During insertions collisions are reordered so that the average 
> distance
> from the "optimal" position is ~even ("robin hood hashing"). That 
> allows a
> higher load factor than a plain linear probed hash table (for which 
> IIRC
> there's data to show 0.75 to be a good default load factor).

Even for Robin Hood hashing 0.9 fill factor is too high. It leads to too
much movements on insertion/deletion and longer average collision chain.

Note that Robin Hood doesn't optimize average case. Indeed, usually 
Robin Hood
has worse (longer) average collision chain than simple linear probing.
Robin Hood hashing optimizes worst case, ie it guarantees there is no 
unnecessary
long collision chain.

See discussion on Rust hash table fill factor when it were Robin Hood:
https://github.com/rust-lang/rust/issues/38003

> 
> There of course may still be a benefit in lowering the load factor, but 
> I'd
> not start there.
> 
> David's test aren't really suited to benchmarking the load factor, but 
> to me
> the stats he showed didn't highlight a need to lower the load factor. 
> Lowering
> the fill factor does influence the cache hit ratio...
> 
> Greetings,
> 
> Andres Freund

regards,
Yura.



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: multi-version capable PostgresNode.pm
Next
From: Alvaro Herrera
Date:
Subject: Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY