Re: allowing broader use of simplehash - Mailing list pgsql-hackers

From Robert Haas
Subject Re: allowing broader use of simplehash
Date
Msg-id CA+TgmoZLOE_hJ+OHWmQ906xuUFCF4+tc74-W1qDtrO0mJv=-Yg@mail.gmail.com
In response to Re: allowing broader use of simplehash  (Andres Freund <andres@anarazel.de>)
Responses Re: allowing broader use of simplehash  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Dec 10, 2019 at 4:59 PM Andres Freund <andres@anarazel.de> wrote:
> 3) For lots of one-off uses of hashtables that aren't performance
>    critical, we want a *simple* API. That IMO would mean that key/value
>    end up being separately allocated pointers, and that just a
>    comparator is provided when creating the hashtable.

I think the simplicity of the API is a key point. Some things that are
bothersome about dynahash:

- It knows about memory contexts and insists on having its own.
- You can't just use a hash table in shared memory; you have to
"attach" to it first and have an object in backend-private memory.
- The usual way of getting a shared hash table is ShmemInitHash(), but
that means that the hash table has its own named chunk and that it's
in the main shared memory segment. If you want to put it inside
another chunk or put it in DSM or whatever, it doesn't work.
- It knows about LWLocks and if it's a shared table it needs its own
tranche of them.
- hash_search() is hard to wrap your head around.

One thing I dislike about simplehash is that the #define-based
interface is somewhat hard to use. It's not that it's a bad design.
It's just that you have to sit down and think for a while to figure out
which things you need to #define in order to get it to do what you
want. I'm not sure that's something that can or needs to be fixed, but
it's something to consider. Even dynahash, as annoying as it is, is in
some ways easier to get up and running.

Probably the two most common use cases are: (1) a fixed-size shared
memory hash table of fixed-size entries where the key is the first N
bytes of the entry and it never grows, or (2) a backend-private or
perhaps frontend hash table of fixed-size entries where the key is the
first N bytes of the entry, and it grows without limit. I think we should
consider having specialized APIs for those two cases and then more
general APIs that you can use when that's not enough.
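
To make case (2) a bit more concrete, here is a rough sketch of what such a
specialized API might look like: a private table of fixed-size entries, keyed
by the first N bytes of each entry, that doubles in size as it fills. All of
the kp_* names are hypothetical and exist only for illustration; this is not
dynahash, simplehash, or any existing PostgreSQL interface. It uses plain
malloc() rather than memory contexts, FNV-1a as a stand-in hash function, and
omits allocation-failure checks for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical API: all kp_* names are illustrative assumptions. */
typedef struct kp_table
{
	size_t		keysize;	/* key is the first keysize bytes of an entry */
	size_t		entrysize;	/* total size of one entry */
	size_t		nbuckets;	/* always a power of two */
	size_t		nused;		/* number of occupied buckets */
	unsigned char *used;	/* one flag byte per bucket */
	unsigned char *entries; /* nbuckets * entrysize bytes */
} kp_table;

/* FNV-1a over the key bytes; any reasonable hash function would do. */
static uint32_t
kp_hash(const void *key, size_t len)
{
	const unsigned char *p = key;
	uint32_t	h = 2166136261u;

	for (size_t i = 0; i < len; i++)
	{
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

static kp_table *
kp_create(size_t keysize, size_t entrysize)
{
	kp_table   *t = malloc(sizeof(kp_table));

	assert(keysize > 0 && entrysize >= keysize);
	t->keysize = keysize;
	t->entrysize = entrysize;
	t->nbuckets = 16;
	t->nused = 0;
	t->used = calloc(t->nbuckets, 1);
	t->entries = calloc(t->nbuckets, entrysize);
	return t;
}

static void
kp_destroy(kp_table *t)
{
	free(t->used);
	free(t->entries);
	free(t);
}

/* Linear probing: return key's bucket, or the empty bucket where it belongs. */
static size_t
kp_probe(const kp_table *t, const void *key)
{
	size_t		i = kp_hash(key, t->keysize) & (t->nbuckets - 1);

	/* Keys are compared bytewise, so padding inside a key must be zeroed. */
	while (t->used[i] &&
		   memcmp(t->entries + i * t->entrysize, key, t->keysize) != 0)
		i = (i + 1) & (t->nbuckets - 1);
	return i;
}

/* Double the bucket array and reinsert every existing entry. */
static void
kp_grow(kp_table *t)
{
	kp_table	old = *t;

	t->nbuckets = old.nbuckets * 2;
	t->used = calloc(t->nbuckets, 1);
	t->entries = calloc(t->nbuckets, t->entrysize);
	for (size_t i = 0; i < old.nbuckets; i++)
	{
		if (old.used[i])
		{
			unsigned char *src = old.entries + i * old.entrysize;
			size_t		j = kp_probe(t, src);

			memcpy(t->entries + j * t->entrysize, src, t->entrysize);
			t->used[j] = 1;
		}
	}
	free(old.used);
	free(old.entries);
}

/* Return the entry for key, or NULL if there is none. */
static void *
kp_lookup(const kp_table *t, const void *key)
{
	size_t		i = kp_probe(t, key);

	return t->used[i] ? t->entries + i * t->entrysize : NULL;
}

/*
 * Find or create the entry for key; *found reports which happened.  A new
 * entry is zeroed apart from its key.  Growing may move entries, so pointers
 * returned earlier are not valid after an insert.
 */
static void *
kp_insert(kp_table *t, const void *key, int *found)
{
	size_t		i;
	unsigned char *e;

	/* Keep the load factor at or below 3/4 so probing always terminates. */
	if ((t->nused + 1) * 4 > t->nbuckets * 3)
		kp_grow(t);
	i = kp_probe(t, key);
	e = t->entries + i * t->entrysize;
	*found = t->used[i];
	if (!t->used[i])
	{
		memset(e, 0, t->entrysize);
		memcpy(e, key, t->keysize);
		t->used[i] = 1;
		t->nused++;
	}
	return e;
}
```

A caller would declare an entry struct whose first member is the key, call
kp_create(sizeof(key), sizeof(entry)) once, and then use kp_insert() and
kp_lookup() with a pointer to the key. Case (1) could plausibly share the
same entry layout, with a fixed bucket count and no kp_grow().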

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
