
Re: Why hash OIDs? - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Why hash OIDs?
Date	August 28, 2018 09:41:25
Msg-id	20180828034125.lenemn6gggvp6kfe@alap3.anarazel.de
In response to	Re: Why hash OIDs? (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: Why hash OIDs?
List	pgsql-hackers

On 2018-08-28 14:45:49 +1200, Thomas Munro wrote:
> On Tue, Aug 28, 2018 at 2:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > On 2018-08-28 13:50:43 +1200, Thomas Munro wrote:
> > >> What bad thing would happen if we used OIDs directly as hash values in
> > >> internal hash tables (that is, instead of uint32_hash() we'd use
> > >> uint32_identity(), or somehow optimise it away entirely, as you can
> > >> see in some C++ standard libraries for e.g. std::hash<int>)?
> >
> > > OIDs are very much not equally distributed, so in all likelihood you'd
> > > get cases where you currently have reasonably well averaged-out usage
> > > of the hash table, and wouldn't anymore.
> >
> > Right.  In particular, most of our hash usages assume that all bits of
> > the hash value are equally "random", so that we can just mask off the
> > lowest N bits of the hash and not get values that are biased towards
> > particular hash buckets.  It's unlikely that raw OIDs would have that
> > property.
> 
> Yeah, it would be a terrible idea as a general hash function for use
> in contexts where the "avalanche effect" assumption is made about
> information being spread out over the bits (the HJ batching code
> wouldn't work for example).  I was wondering specifically about the
> limited case of hash tables that are used to look things up in caches.

I don't understand why that'd be OK there. With a simple 1:1 hash
function, which you seem to advocate, many hash tables would be much
fuller in the 1000-3000 range (where pg_class etc. all reside) than in
any other range. A lot of our tables are populated on demand, so you'd
often end up with most of the data in one or two buckets and a larger
number of buckets left largely empty.

Greetings,

Andres Freund
