Re: Change GUC hashtable to use simplehash? - Mailing list pgsql-hackers

From John Naylor
Subject Re: Change GUC hashtable to use simplehash?
Date
Msg-id CANWCAZbMY6NSqg9j=_PH2ANTO95-7K8q=j7Si0d4CRGVJosPOw@mail.gmail.com
In response to Re: Change GUC hashtable to use simplehash?  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Change GUC hashtable to use simplehash?
List pgsql-hackers
On Mon, Dec 4, 2023 at 4:16 AM Jeff Davis <pgsql@j-davis.com> wrote:
> I'm trying to follow the distinctions you're making between dynahash
> and simplehash -- are you saying it's easier to do incremental hashing
> with dynahash, and if so, why?

That's a good thing to clear up. This thread has taken simplehash as a
starting point from the very beginning. It initially showed no
improvement, and then we identified problems with the hashing and
equality computations. Those fixes seem like independently committable
improvements, so I'm curious whether they help on their own, even if we
still need to switch to simplehash as a last step to meet your
performance goals.

> If I understood what Andres was saying, the exposed hash state would be
> useful for writing a hash function like guc_name_hash().

From my point of view, it would at least be useful for C-strings,
where we don't have the length available up front.
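
To make the C-string case concrete, here is a minimal sketch, not code from
any patch in this thread. The names fold_hash_state, fold_hash_init,
fold_hash_byte, fold_hash_final and cstring_hash_folded are hypothetical, and
FNV-1a is only a stand-in for whatever mixing step an incremental API would
expose. The point is that the string is walked once, with no strlen() call up
front, and the | 0x20 folding is applied per byte:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exposed hash state. FNV-1a is an assumption, used only
 * because it is short; it is not the function discussed in the thread. */
typedef struct fold_hash_state
{
    uint32_t    h;
} fold_hash_state;

static void
fold_hash_init(fold_hash_state *s)
{
    s->h = UINT32_C(2166136261);    /* FNV-1a 32-bit offset basis */
}

static void
fold_hash_byte(fold_hash_state *s, unsigned char c)
{
    s->h ^= c;
    s->h *= UINT32_C(16777619);     /* FNV-1a 32-bit prime */
}

static uint32_t
fold_hash_final(const fold_hash_state *s)
{
    return s->h;
}

/*
 * Hash a NUL-terminated name in a single pass, folding ASCII case with
 * | 0x20. The length falls out of the loop as a by-product and is
 * stored in *len_out when the caller asks for it.
 */
static uint32_t
cstring_hash_folded(const char *str, size_t *len_out)
{
    fold_hash_state s;
    const char *p = str;

    fold_hash_init(&s);
    for (; *p != '\0'; p++)
        fold_hash_byte(&s, (unsigned char) ((unsigned char) *p | 0x20));

    if (len_out != NULL)
        *len_out = (size_t) (p - str);
    return fold_hash_final(&s);
}
```

ORing every byte with 0x20 also changes some non-letters (for example '_'
becomes 0x7F). That is harmless for hashing, since names that are equal after
case folding still hash equally. It can only add collisions, which is why
equality has to be checked separately.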

Aside from that, we have multiple places that compute full 32-bit
hashes on multiple individual values, and then combine them in
various ad-hoc ways. It could be worth exploring whether an
incremental interface would be better in those places on a
case-by-case basis.
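
As a rough illustration of the difference, the sketch below hashes two 32-bit
values both ways. Everything in it is an assumption made for the example: the
inc_hash_* names are hypothetical, FNV-1a stands in for the real mixing
function, and plain XOR is used only as a deliberately simple ad-hoc combiner,
not as a description of any particular call site.

```c
#include <stdint.h>

/* Hypothetical incremental state; FNV-1a is a stand-in, not the real thing. */
typedef struct inc_hash_state
{
    uint32_t    h;
} inc_hash_state;

static void
inc_hash_init(inc_hash_state *s)
{
    s->h = UINT32_C(2166136261);
}

/* Feed one 32-bit value, low byte first, so the result does not depend
 * on the machine's endianness. */
static void
inc_hash_add_u32(inc_hash_state *s, uint32_t v)
{
    for (int i = 0; i < 4; i++)
    {
        s->h ^= (v >> (8 * i)) & 0xFF;
        s->h *= UINT32_C(16777619);
    }
}

static uint32_t
inc_hash_final(const inc_hash_state *s)
{
    return s->h;
}

/* Full hash of a single value: init, add, finalize. */
static uint32_t
hash_u32(uint32_t v)
{
    inc_hash_state s;

    inc_hash_init(&s);
    inc_hash_add_u32(&s, v);
    return inc_hash_final(&s);
}

/* Ad-hoc style: two complete hashes, then combine. XOR is symmetric,
 * so (a, b) and (b, a) always collide. */
static uint32_t
hash_pair_adhoc(uint32_t a, uint32_t b)
{
    return hash_u32(a) ^ hash_u32(b);
}

/* Incremental style: one state, both values fed in order, one finalize. */
static uint32_t
hash_pair_incremental(uint32_t a, uint32_t b)
{
    inc_hash_state s;

    inc_hash_init(&s);
    inc_hash_add_u32(&s, a);
    inc_hash_add_u32(&s, b);
    return inc_hash_final(&s);
}
```

The incremental version runs the init and finalize steps once instead of
once per value, and the order of the inputs is mixed into the result rather
than depending on how the combiner happens to behave.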

(If Andres had something else in mind, I'll let him address that.)

> But whether we
> use simplehash or dynahash is a separate question, right?

Right, the table implementation should treat the hash function as a
black box. Think of the incremental API as lower-level building blocks
for building hash functions.

> Also, while the |= 0x20 is a nice trick for lowercasing, did we decide
> that it's better than my approach in patch 0004 here:
>
> https://www.postgresql.org/message-id/27a7a289d5b8f42e1b1e79b1bcaeef3a40583bd2.camel@j-davis.com
>
> which optimizes exact hits (most GUC names are already folded) before
> trying case folding?

Note there were two aspects there: hashing and equality. In v4-0003 of

https://www.postgresql.org/message-id/CANWCAZbQ30O9j-bEZ_1zVCyKPpSjwbE4u19cSDDBJ%3DTYrHvPig%40mail.gmail.com

I demonstrated that the equality function can be optimized for
already-folded names (and in fact it measured almost the same) using
way, way, way less code.


