Home > mailing lists

Why hash OIDs? - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Why hash OIDs?
Date	August 28, 2018 07:50:43
Msg-id	CAEepm=2Dme2Fg_fnnJ8h01PsfqXFvafjWPUJci25uq-byXfbeg@mail.gmail.com Whole thread Raw
Responses	Re: Why hash OIDs?
List	pgsql-hackers

Tree view

Hello,

I'm curious about something which may qualify as a stupid question.

What bad thing would happen if we used OIDs directly as hash values in
internal hash tables (that is, instead of uint32_hash() we'd use
uint32_identity(), or somehow optimise it away entirely, as you can
see in some C++ standard libraries for eg std::hash<int>)?  From what
I know of this subject, the main risk is that, in general, the
distribution of integer keys stored in a hash table might
*accidentally* happen to have regular gaps with a stride that shares a
common factor with the number of buckets leading to an unbalanced hash
table (memory addresses are an example because they tend to have a
stride corresponding to hardware alignment rules, as are some
distributed sequence generation tricks), whereas it probably takes a
non-accidental attack to get higher collision rates with a decent hash
function.  A good hash function would defend against that kind of
thing (but cost cycles to compute), and alternatively a prime number
of buckets would defend against it (but cost more cycles to %).
However, as far as I can see OIDs are expected to have an even
distribution (or at least we don't expect regular sized gaps), so the
hazard doesn't seem to apply.

One problem could be that, although collisions are not expected with
high probability when the hash table is big enough, when they do
occur, hash tables using linear probing-based collision strategies
(simplehash, but not dynahash) probably work less well if the chance
of non-empty buckets having non-empty neighbours (AKA clustering)
increases.  I'm not sure whether to consider OIDs to be clustered in
general or not (though obviously the ones created by initdb are).

-- 
Thomas Munro
http://www.enterprisedb.com

pgsql-hackers by date:

From: Daniel Wood
Date: 28 August 2018, 06:39:36
Subject: First steps to being a contributer

From: Andres Freund
Date: 28 August 2018, 08:12:14
Subject: Re: Why hash OIDs?

Why hash OIDs? - Mailing list pgsql-hackers

Previous

Next