Re: Use simplehash.h instead of dynahash in SMgr - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Use simplehash.h instead of dynahash in SMgr
Date
Msg-id CA+hUKGLQd-h=7tvmtFGhhNEuX7Vtc=h44FC_Buu1-NU6-341WQ@mail.gmail.com
Whole thread Raw
In response to Re: Use simplehash.h instead of dynahash in SMgr  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Use simplehash.h instead of dynahash in SMgr  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Wed, Jun 30, 2021 at 11:14 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Wed, 23 Jun 2021 at 12:17, Thomas Munro <thomas.munro@gmail.com> wrote:
> Thanks for taking an interest in this. I started looking at your idea
> and I've now changed my mind from just not liking it to thinking that
> the whole idea is just completely horrible :-(

Hah.

I accept that trying to make a thing that "wraps" these data
structures and provides a simple interface is probably really quite
horrible with preprocessor voodoo.

I was mainly questioning how bad it would be if we had a generic
segmented array component (seems like a great idea, which I'm sure
would find other uses, I recall wanting to write that myself before),
and then combined that with the presence map idea to make a dense
object pool (ditto), but then, in each place where we need something
like this, just used a plain old hash table to point directly to
objects in it whenever we needed that, open coding the logic to keep
it in sync (I mean, just the way that people usually use hash tables).
That way, the object pool can give you very fast scans over all
objects in cache friendly order (no linked lists), and the hash table
doesn't know/care about its existence.  In other words, small reusable
components that each do one thing well and are not coupled together.

I think I understand now that you really, really want small index
numbers and not 64 bit pointers in the hash table.  Hmm.

> It gets really messy with all the nested pre-processor stuff around
> fetching the element from the segmented array inside simplehash.  One
> problem is that simplehash needs the address of the segments despite
> simplehash not knowing anything about segments. I've tried to make
> that work by passing in the generic hash struct as simplehash's
> private_data. This ends up with deeply nested macros all defined in
> different files. I pitty the future person debugging that.

Yeah, that sounds terrible.

> There are also a bunch of changes / API breakages that need to be done
> to make this work with simplehash.h.
>
> 1) Since I really need 8-byte buckets in the hash table to make this
> as fast as possible, I want to use the array index for the hash status
> and that means changing the simplehash API to allow that to work.
> This requires something like SH_IS_BUCKET_INUSE, SH_SET_BUCKET_INUSE,
> SH_SET_BUCKET_EMPTY.

+1 for doing customisable "is in use" checks on day anyway, as a
separate project.  Not sure if any current users could shrink their
structs in practice because, at a glance, the same amount of space
might be used by padding anyway, but when a case like that shows up...

> > 2.  A bitmapset that tracks unused elements in 1, making it easy to
> > find the lowest-index hole when looking for a place to put a new one
> > by linear search for a 1 bit, so that we tend towards maximum density
> > despite having random frees from time to time (seems good, the same
> > idea is used in  kernels to allocate the lowest unused file descriptor
> > number).
>
> I didn't use Bitmapsets. I wanted the bitmaps to be allocated in the
> same chunk of memory as the segments of the array.  Also, because
> bitmapset's nwords is variable, then they can't really do any loop
> unrolling.  Since in my implementation the number of bitmap words are
> known at compile-time, the compiler has the flexibility to do loop
> unrolling.  The bitmap manipulation is one of the biggest overheads in
> generichash.h. I'd prefer to keep that as fast as possible.

I think my hands autocompleted "bitmapset", I really meant to write
just "bitmap" as I didn't think you were using the actual thing called
bitmapset, but point taken.

> So, with all that. I really don't think it's a great idea to try and
> have this use simplehash.h code. I plan to pursue the idea I proposed
> with having seperate hash table code that is coded properly to have
> stable pointers into the data rather than trying to contort
> simplehash's code into working that way.

Fair enough.

It's not that I don't believe it's a good idea to be able to perform
cache-friendly iteration over densely packed objects... that part
sounds great... it's just that it's not obvious to me that it should
be a *hashtable's* job to provide that access path.  Perhaps I lack
imagination and we'll have to agree to differ.



pgsql-hackers by date:

Previous
From: Gurjeet Singh
Date:
Subject: Automatic notification of top transaction IDs
Next
From: David Rowley
Date:
Subject: Re: Numeric multiplication overflow errors