Home > mailing lists

Re: Change GUC hashtable to use simplehash? - Mailing list pgsql-hackers

From	John Naylor
Subject	Re: Change GUC hashtable to use simplehash?
Date	December 6, 2023 00:39:00
Msg-id	CANWCAZZYAZUH_3GYUtETmLOcd-5oD+kVC3cX889F7tzAQwtOsA@mail.gmail.com Whole thread Raw
In response to	Re: Change GUC hashtable to use simplehash? (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: Change GUC hashtable to use simplehash? Re: Change GUC hashtable to use simplehash?
List	pgsql-hackers

Tree view

On Tue, Dec 5, 2023 at 1:57 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Mon, 2023-12-04 at 12:12 +0700, John Naylor wrote:

> There's already a patch to use simplehash, and the API is a bit
> cleaner, and there's a minor performance improvement. It seems fairly
> non-controversial -- should I just proceed with that patch?

I won't object if you want to commit that piece now, but I hesitate to
call it a performance improvement on its own.

- The runtime measurements I saw reported were well within the noise level.
- The memory usage starts out better, but with more entries is worse.

> > From my point of view, it would at least be useful for C-strings,
> > where we don't have the length available up front.
>
> That's good news.
>
> By the way, is there any reason that we would need hash_bytes(s,
> strlen(s)) == cstring_hash(s)?

"git grep cstring_hash" found nothing, so not sure what you're asking.

> Each approach has its own optimization techniques. In (a), we can use
> the |= 0x20 trick, and for equality do a memcmp() check first.

I will assume you are referring to semantics, but on the odd chance
readers take this to mean the actual C library call, that wouldn't be
an optimization, that'd be a pessimization.

> As a tangential point, we may eventually want to provide a more
> internationalized definition of "case insensitive" for GUC names. That
> would be slightly easier with (b) than with (a), but we can cross that
> bridge if and when we come to it.

The risk/reward ratio seems pretty bad.

> It seems you are moving toward (a) whereas my patches moved toward (b).
> I am fine with either approach but I wanted to clarify which approach
> we are using.

I will make my case:

> In the abstract, I kind of like approach (b) because we don't need to
> be as special/clever with the hash functions.

In the abstract, I consider (b) to be a layering violation. As a
consequence, the cleverness in (b) is not confined to one or two
places, but is smeared over a whole bunch of places. I find it hard to
follow.

Concretely, it also adds another pointer to the element struct. That's
not good for a linear open-addressing array, which simplehash has.

Further, remember the equality function is important as well. In v3,
it was "strcmp(a,b)==0", which is a holdover from the dynahash API.
One of the advantages of the simplehash API is that we can 1) use an
equality function, which should be slightly cheaper than a full
comparison function, and 2) we have the option to inline it. (It
doesn't make sense in turn, to jump to a shared lib page and invoke an
indirect function call.) Once we've done that, it's already "special",
so it's not a stretch to make it do what we want to begin with. If a
nicer API is important, why not use it?

pgsql-hackers by date:

From: Jeff Davis
Date: 05 December 2023, 23:55:11
Subject: Re: Faster "SET search_path"

From: Peter Geoghegan
Date: 06 December 2023, 00:41:06
Subject: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

Re: Change GUC hashtable to use simplehash? - Mailing list pgsql-hackers

Previous

Next