Re: Change GUC hashtable to use simplehash? - Mailing list pgsql-hackers

From John Naylor
Subject Re: Change GUC hashtable to use simplehash?
Date
Msg-id CANWCAZbQ30O9j-bEZ_1zVCyKPpSjwbE4u19cSDDBJ=TYrHvPig@mail.gmail.com
In response to Re: Change GUC hashtable to use simplehash?  (John Naylor <johncnaylorls@gmail.com>)
List pgsql-hackers
I wrote:

> Thinking some more, I'm not quite comfortable with the number of
> places in these patches that have to know about the pre-downcased
> strings, or whether we need that in the first place. If lower case is
> common enough to optimize for, it seems the equality function can just
> check strict equality on the char and only on mismatch try downcasing
> before returning false. Doing our own function would allow the
> compiler to inline it, or at least keep it on the same page. Further,
> the old hash function shouldn't need to branch to do the same
> downcasing, since hashing is lossy anyway. In the keyword hashes, we
> just do "*ch |= 0x20", which downcases letters and turns underscores to
> DEL. I can take a stab at that later.

v4 is a quick POC for that. I haven't verified that it's correct for
the case where the probe and the entry don't match, but if it isn't,
it should be easy to fix. I also didn't bother with SH_STORE_HASH in
my testing.

0001 adds the murmur32 finalizer -- we should do that regardless of
anything else in this thread.
0002 is just Jeff's 0001
0003 adds an equality function that downcases lazily, and teaches the
hash function about the 0x20 trick.
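
To make the idea behind 0001 and 0003 concrete, here is a minimal standalone sketch, not the code in the attached patches. All of the sketch_* names are hypothetical, the finalizer uses the well-known MurmurHash3 32-bit constants, and the rotate-and-xor mixing step is only an assumption chosen for illustration. The equality function compares bytes directly and downcases only on a mismatch; the hash function ORs in 0x20 without branching, which maps 'A'-'Z' to 'a'-'z' and '_' to DEL. That is lossy, but any two names that compare equal still hash the same.

```c
#include <stdbool.h>
#include <stdint.h>

/* MurmurHash3 32-bit finalizer (fmix32); hypothetical name for this sketch */
static inline uint32_t
sketch_murmurhash32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}

/* ASCII-only downcasing, independent of locale */
static inline unsigned char
sketch_ascii_lower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/* Case-insensitive equality that only downcases when the raw bytes differ */
static bool
sketch_guc_name_eq(const char *a, const char *b)
{
	while (*a && *b)
	{
		unsigned char cha = (unsigned char) *a++;
		unsigned char chb = (unsigned char) *b++;

		/* fast path: identical bytes need no downcasing at all */
		if (cha != chb &&
			sketch_ascii_lower(cha) != sketch_ascii_lower(chb))
			return false;
	}

	/* equal only if both strings ended at the same point */
	return *a == *b;
}

/* Branch-free "downcasing" in the hash via the 0x20 trick */
static uint32_t
sketch_guc_name_hash(const char *name)
{
	uint32_t	result = 0;

	while (*name)
	{
		unsigned char ch = (unsigned char) *name++;

		ch |= 0x20;				/* 'A' -> 'a', '_' -> DEL; lossy but consistent */

		/* assumed mixing step: rotate left by 5, then xor in the byte */
		result = (result << 5) | (result >> 27);
		result ^= (uint32_t) ch;
	}

	return sketch_murmurhash32(result);
}
```

Because the hash is allowed to be lossy, it never needs the "is this an upper-case letter" branch; only the equality function has to be exact, and it pays for downcasing only on the slow path.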

master:
latency average = 581.765 ms

v3 0001-0005:
latency average = 544.576 ms

v4 0001-0003:
latency average = 547.489 ms

This gives similar results (roughly 6% lower latency than master in
both cases) with a tiny amount of code (excluding the
simplehash conversion). I didn't check if the compiler inlined these
functions, but we can hint it if necessary. We could use the new
equality function in all the call sites that currently test for
"guc_name_compare() == 0", in which case it might not end up inlined,
but that's probably okay.

We could also try to improve the hash function's collision behavior by
collecting the bytes in a uint64 and calling our new murmur64 before
returning the lower half, but that's speculative.
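
As a rough illustration of that speculative variant (again a hedged sketch with hypothetical sketch_* names, not code from any patch in this thread), the bytes could be accumulated in a 64-bit value, passed through the MurmurHash3 64-bit finalizer, and truncated to the lower 32 bits. The rotate-and-xor accumulation is an assumption; only the finalizer constants are the standard fmix64 ones.

```c
#include <stdint.h>

/* MurmurHash3 64-bit finalizer (fmix64); hypothetical name for this sketch */
static inline uint64_t
sketch_murmurhash64(uint64_t h)
{
	h ^= h >> 33;
	h *= UINT64_C(0xff51afd7ed558ccd);
	h ^= h >> 33;
	h *= UINT64_C(0xc4ceb9fe1a85ec53);
	h ^= h >> 33;
	return h;
}

/* Accumulate in 64 bits, finalize, and return the lower half */
static uint32_t
sketch_guc_name_hash64(const char *name)
{
	uint64_t	acc = 0;

	while (*name)
	{
		unsigned char ch = (unsigned char) *name++;

		ch |= 0x20;				/* same branch-free 0x20 trick as before */

		/* assumed mixing step: rotate left by 5, then xor in the byte */
		acc = (acc << 5) | (acc >> 59);
		acc ^= (uint64_t) ch;
	}

	/* truncation keeps the lower 32 bits of the finalized value */
	return (uint32_t) sketch_murmurhash64(acc);
}
```

The wider accumulator keeps more of a long name's early bytes around before the final mix, which is where any collision improvement would have to come from; whether it is measurable here is untested.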
