Re: allowing broader use of simplehash - Mailing list pgsql-hackers

From Andres Freund
Subject Re: allowing broader use of simplehash
Msg-id 20191212195140.xmfdweada7nxj6uq@alap3.anarazel.de
In response to Re: allowing broader use of simplehash  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2019-12-11 10:50:16 -0500, Robert Haas wrote:
> On Tue, Dec 10, 2019 at 4:59 PM Andres Freund <andres@anarazel.de> wrote:
> > 3) For lots of one-off uses of hashtables that aren't performance
> >    critical, we want a *simple* API. That IMO would mean that key/value
> >    end up being separately allocated pointers, and that just a
> >    comparator is provided when creating the hashtable.
> 
> I think the simplicity of the API is a key point. Some things that are
> bothersome about dynahash:
> 
> - It knows about memory contexts and insists on having its own.

Which is a waste, in a good number of cases.


> - You can't just use a hash table in shared memory; you have to
> "attach" to it first and have an object in backend-private memory.

I'm not quite sure there's all that good an alternative to this,
tbh. For efficiency it's useful to have backend-local state, I
think. And I don't really see how to have that without needing to attach.


> - The usual way of getting a shared hash table is ShmemInitHash(), but
> that means that the hash table has its own named chunk and that it's
> in the main shared memory segment. If you want to put it inside
> another chunk or put it in DSM or whatever, it doesn't work.

I don't think it's quite realistic for the same implementation to be
used for both DSM and "normal" shared memory, although the code could
partially be shared and just specialized for each case. That is,
however, no excuse for having drastically different interfaces for the two.



> - It knows about LWLocks and if it's a shared table it needs its own
> tranche of them.
> - hash_search() is hard to wrap your head around.
>

> One thing I dislike about simplehash is that the #define-based
> interface is somewhat hard to use. It's not that it's a bad design.

I agree. It's the best I could come up with, taking the limitations of C
into account, when focusing on speed and type safety.  I really think
this type of hack is a stopgap measure, and we ought to upgrade to a
subset of C++.


> It's just you have to sit down and think for a while to figure out
> which things you need to #define in order to get it to do what you
> want. I'm not sure that's something that can or needs to be fixed, but
> it's something to consider. Even dynahash, as annoying as it is, is in
> some ways easier to get up and running.

I have been wondering about providing one simplehash wrapper in a
central place that uses simplehash to store {key*, value*} pairs, and
has a creation interface that just accepts a comparator. Plus a few
wrapper creation functions for specific types (e.g. string, oid, int64).
While we'd not want to use that for really performance-critical paths,
for 80% of the cases it'd be sufficient.
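
To make the shape of such a wrapper concrete, here is a rough sketch in
standalone C. It is not built on simplehash and is not an existing
PostgreSQL API: every name (ptrhash_*, string_hash, string_cmp) is
hypothetical, it uses malloc rather than memory contexts, and it has no
deletion. It also assumes the generic creation function takes a hash
function alongside the comparator, since a hash table needs one; the
per-type creation function hides both.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback types: hash a key, compare two keys (0 = equal). */
typedef uint32_t (*ptrhash_hash_fn) (const void *key);
typedef int (*ptrhash_cmp_fn) (const void *a, const void *b);

typedef struct ptrhash_entry
{
    const void *key;            /* NULL marks an empty slot */
    void       *value;
} ptrhash_entry;

typedef struct ptrhash
{
    ptrhash_entry *slots;
    size_t      nslots;         /* always a power of two */
    size_t      nused;
    ptrhash_hash_fn hash;
    ptrhash_cmp_fn cmp;
} ptrhash;

/* Generic creation: the caller supplies only the two callbacks. */
ptrhash *
ptrhash_create(ptrhash_hash_fn hash, ptrhash_cmp_fn cmp)
{
    ptrhash    *h;

    assert(hash != NULL && cmp != NULL);
    h = malloc(sizeof(*h));
    if (h == NULL)
        abort();
    h->nslots = 16;
    h->nused = 0;
    h->hash = hash;
    h->cmp = cmp;
    h->slots = calloc(h->nslots, sizeof(ptrhash_entry));
    if (h->slots == NULL)
        abort();
    return h;
}

/* Linear probing: return the slot holding key, or the empty slot for it. */
static ptrhash_entry *
ptrhash_find_slot(ptrhash *h, const void *key)
{
    size_t      i = h->hash(key) & (h->nslots - 1);

    while (h->slots[i].key != NULL && h->cmp(h->slots[i].key, key) != 0)
        i = (i + 1) & (h->nslots - 1);
    return &h->slots[i];
}

/* Double the slot array and reinsert every existing entry. */
static void
ptrhash_grow(ptrhash *h)
{
    ptrhash_entry *old = h->slots;
    size_t      oldn = h->nslots;

    h->nslots *= 2;
    h->slots = calloc(h->nslots, sizeof(ptrhash_entry));
    if (h->slots == NULL)
        abort();
    for (size_t i = 0; i < oldn; i++)
        if (old[i].key != NULL)
            *ptrhash_find_slot(h, old[i].key) = old[i];
    free(old);
}

/* Insert key, or overwrite its value; the table stores pointers only. */
void
ptrhash_insert(ptrhash *h, const void *key, void *value)
{
    ptrhash_entry *e;

    assert(key != NULL);
    if ((h->nused + 1) * 2 > h->nslots) /* keep load factor <= 50% */
        ptrhash_grow(h);
    e = ptrhash_find_slot(h, key);
    if (e->key == NULL)
    {
        e->key = key;
        h->nused++;
    }
    e->value = value;
}

/* Return the stored value pointer, or NULL if the key is absent. */
void *
ptrhash_lookup(ptrhash *h, const void *key)
{
    ptrhash_entry *e = ptrhash_find_slot(h, key);

    return e->key != NULL ? e->value : NULL;
}

/* Per-type wrapper for NUL-terminated strings, hashed with 32-bit FNV-1a. */
static uint32_t
string_hash(const void *key)
{
    uint32_t    hv = 2166136261u;

    for (const unsigned char *p = key; *p != '\0'; p++)
        hv = (hv ^ *p) * 16777619u;
    return hv;
}

static int
string_cmp(const void *a, const void *b)
{
    return strcmp(a, b);
}

ptrhash *
ptrhash_create_string(void)
{
    return ptrhash_create(string_hash, string_cmp);
}
```

A caller would then only write ptrhash_create_string(), followed by
ptrhash_insert(h, "name", ptr) and ptrhash_lookup(h, "name"), without
defining anything per table. The price is an indirect call per hash and
comparison plus separately allocated keys and values, which is why this
would not be the choice for performance-critical paths.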


Greetings,

Andres Freund


