Andres Freund писал 2021-04-26 21:46:
> Hi,
>
> On 2021-04-25 01:27:24 +0300, Yura Sokolov wrote:
>> It is quite interesting result. Simplehash being open-addressing with
>> linear probing is friendly for cpu cache. I'd recommend to define
>> SH_FILLFACTOR with value lower than default (0.9). I believe 0.75 is
>> suitable most for such kind of hash table.
>
> It's not a "plain" linear probing hash table (although it is on the
> lookup
> side). During insertions collisions are reordered so that the average
> distance
> from the "optimal" position is ~even ("robin hood hashing"). That
> allows a
> higher load factor than a plain linear probed hash table (for which
> IIRC
> there's data to show 0.75 to be a good default load factor).
Even for Robin Hood hashing 0.9 fill factor is too high. It leads to too
much movements on insertion/deletion and longer average collision chain.
Note that Robin Hood doesn't optimize average case. Indeed, usually
Robin Hood
has worse (longer) average collision chain than simple linear probing.
Robin Hood hashing optimizes worst case, ie it guarantees there is no
unnecessary
long collision chain.
See discussion on Rust hash table fill factor when it were Robin Hood:
https://github.com/rust-lang/rust/issues/38003
>
> There of course may still be a benefit in lowering the load factor, but
> I'd
> not start there.
>
> David's test aren't really suited to benchmarking the load factor, but
> to me
> the stats he showed didn't highlight a need to lower the load factor.
> Lowering
> the fill factor does influence the cache hit ratio...
>
> Greetings,
>
> Andres Freund
regards,
Yura.