Re: Use simplehash.h instead of dynahash in SMgr - Mailing list pgsql-hackers

From David Rowley
Subject Re: Use simplehash.h instead of dynahash in SMgr
Date
Msg-id CAApHDvq+bv13-M5JZN5Swfmy5gVWwCXZw-6hP=R6Hi1RitK0hA@mail.gmail.com
Whole thread Raw
In response to Re: Use simplehash.h instead of dynahash in SMgr  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Use simplehash.h instead of dynahash in SMgr
List pgsql-hackers
On Tue, 22 Jun 2021 at 03:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I kind of wonder if we really need four different hash table
> implementations (this being the third "generic" one, plus hash join
> has its own, and I may have forgotten others).  Should we instead
> think about revising simplehash to gain the benefits of this patch?

hmm, yeah. I definitely agree with trying to have as much reusable
code as we can when we can. It certainly reduces maintenance and bugs
tend to be found more quickly too. It's a very worthy cause.

I did happen to think of this when I was copying swathes of code out
of simplehash.h.  However, I decided that the two implementations are
sufficiently different that if I tried to merge them both into one .h
file, we'd have some unreadable and unmaintainable mess.  I just don't
think their DNA is compatible enough for the two to be mated
successfully.  For example, simplehash uses open addressing and
generichash does not.  This means that things like iterating over the
table works completely differently.  Lookups in generichash need to
perform an extra step to fetch the actual data from the segment
arrays.  I think it would certainly be possible to merge the two, but
I just don't think it would be easy code to work on if we did that.

The good thing is that that the API between the two is very similar
and it's quite easy to swap one for the other.  I did make changes
around memory allocation due to me being too cheap to zero memory when
I didn't need to and simplehash not having any means of allocating
memory without zeroing it.

I also think that there's just no one-size-fits-all hash table type.
simplehash will not perform well when the size of the stored element
is very large. There's simply too much memcpying to move data around
during insert/delete.  simplehash will also have terrible iteration
performance in sparsely populated tables.  However, simplehash will be
pretty much unbeatable for lookups where the element type is very
small, e.g single Datum, or an int.  The CPU cache efficiency there
will be pretty much unbeatable.

I tried to document the advantages of each in the file header
comments.  I should probably also add something to simplehash.h's
comments to mention generichash.h

David



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Use simplehash.h instead of dynahash in SMgr
Next
From: Japin Li
Date:
Subject: Re: Fix for segfault in logical replication on master