Re: Change GUC hashtable to use simplehash? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Change GUC hashtable to use simplehash?
Date
Msg-id 20231117232712.mpnzsho7ydssg2oc@awork3.anarazel.de
Whole thread Raw
In response to Re: Change GUC hashtable to use simplehash?  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Change GUC hashtable to use simplehash?
List pgsql-hackers
Hi,

On 2023-11-17 14:08:56 -0800, Jeff Davis wrote:
> On Fri, 2023-11-17 at 17:04 -0500, Tom Lane wrote:
> > I can't imagine wanting to convert *every* hashtable in the system
> > to simplehash; the added code bloat would be unreasonable.  So yeah,
> > I think we'll have two mechanisms indefinitely.  That's not to say
> > that we might not rewrite hsearch.  But simplehash was never meant
> > to be a universal solution.
> 
> OK, I will withdraw the patch until/unless it provides a concrete
> benefit.

It might already in the space domain:

SELECT count(*), sum(total_bytes) total_bytes, sum(total_nblocks) total_nblocks, sum(free_bytes) free_bytes,
sum(free_chunks)free_chunks, sum(used_bytes) used_bytes
 
FROM pg_backend_memory_contexts
WHERE name LIKE 'GUC%';

HEAD:
┌───────┬─────────────┬───────────────┬────────────┬─────────────┬────────────┐
│ count │ total_bytes │ total_nblocks │ free_bytes │ free_chunks │ used_bytes │
├───────┼─────────────┼───────────────┼────────────┼─────────────┼────────────┤
│     2 │       57344 │             5 │      25032 │          10 │      32312 │
└───────┴─────────────┴───────────────┴────────────┴─────────────┴────────────┘

your patch:
┌───────┬─────────────┬───────────────┬────────────┬─────────────┬────────────┐
│ count │ total_bytes │ total_nblocks │ free_bytes │ free_chunks │ used_bytes │
├───────┼─────────────┼───────────────┼────────────┼─────────────┼────────────┤
│     1 │       36928 │             3 │      12360 │           3 │      24568 │
└───────┴─────────────┴───────────────┴────────────┴─────────────┴────────────┘


However, it fares less well at larger number of GUCs, performance wise. At
first I thought that that's largely because you aren't using SH_STORE_HASH.
With that, it's slower when creating a large number of GUCs, but a good bit
faster retrieving them. But that slowness didn't seem right.


Then I noticed that memory usage was too large when creating many GUCs - a bit
of debugging later, I figured out that that's due to guc_name_hash() being
terrifyingly bad. There's no bit mixing whatsoever! Which leads to very large
numbers of hash conflicts - which simplehash tries to defend against a bit by
making the table larger.

(gdb) p guc_name_hash("andres.c2")
$14 = 3798554171
(gdb) p guc_name_hash("andres.c3")
$15 = 3798554170


Fixing that makes simplehash always faster, but still doesn't win on memory
usage at the upper end - the two pointers in GUCHashEntry make it too big.


I think, independent of this patch, it might be worth requiring that hash
table lookups applied the transformation before the lookup. A comparison
function this expensive is not great...

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Next
From: jian he
Date:
Subject: Re: Add pg_basetype() function to obtain a DOMAIN base type