Home > mailing lists

Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash - Mailing list pgsql-hackers

From	CharSyam
Subject	Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash
Date	April 11 04:17:07
Msg-id	CAMrLSE5DHxxBxhSMCnimZoMdsUjKagDF4PzgMoFMsZcp_=24cg@mail.gmail.com Whole thread Raw
In response to	Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash (Nathan Bossart <nathandbossart@gmail.com>)
List	pgsql-hackers

Tree view

Thank you for the thoughtful review. I've attached an updated patch (v2)
addressing your feedback.

1. Comment added

I've added a comment in find_in_bucket() explaining the hash pre-check
assumption:

/*
* If the hash values don't match, the keys certainly don't match
* either, so we can skip the expensive key comparison. Matching
* hashes don't guarantee matching keys, so equal_keys() is still
* needed for confirmation.
*/

A short "See comment in find_in_bucket()" reference is also added in
delete_key_from_bucket().

2. No regressions with short keys / cheap comparisons

I ran an additional benchmark with short keys ("1" ~ "10000", 1-5 chars)
where strcmp cost is minimal:

Test | Before | After | Change
-------------------------+-----------+-----------+------------
INSERT 10000 entries | 13.26 ms | 6.44 ms | ~51% faster
LOOKUP 10000 hits | 7.94 ms | 5.03 ms | ~37% faster
LOOKUP 10000 misses | 8.08 ms | 4.80 ms | ~41% faster
LOOKUP 50000 hits (x5) | 33.05 ms | 24.06 ms | ~27% faster

For reference, here are the original string key results ("key_1" ~
"key_10000"):

Test | Before | After | Change
-------------------------+-----------+-----------+------------
INSERT 10000 entries | 14.99 ms | 9.46 ms | ~37% faster
LOOKUP 10000 hits | 10.40 ms | 5.52 ms | ~47% faster
LOOKUP 10000 misses | 8.41 ms | 4.95 ms | ~41% faster
LOOKUP 50000 hits (x5) | 33.48 ms | 26.44 ms | ~21% faster

No regressions in either scenario. Even with short keys, the integer
hash pre-check is always cheaper than the DSM address translation plus
compare function call it avoids.

Both tests ran on macOS (arm64) using the test_dsm_registry extension
with 10,000 entries.

3. Keeping the pre-check outside equal_keys()

I considered moving the hash comparison into equal_keys(), but I think
keeping it at the call site is a better fit for the following reasons:

- equal_keys() is a pure key comparison function. Mixing in hash
comparison would change its semantics and make the name misleading.

- To pass hashes into equal_keys(), we would need to add two hash
parameters (one for the search key, one for the item). The caller
still needs to extract item->hash before the call, so the call site
doesn't actually get simpler.

- There are only two call sites, both already modified in this patch,
so there is no real maintenance burden from having the check at each
call site.

- This pattern is consistent with other hash table implementations in
PostgreSQL (e.g., dynahash.c), which also compare hashes outside
the key equality function.

I'm happy to reconsider if you feel differently.

Regards,
charsyam

2026년 4월 11일 (토) 오전 3:10, Nathan Bossart <nathandbossart@gmail.com>님이 작성:

On Sat, Apr 11, 2026 at 01:09:33AM +0900, CharSyam wrote:
> I believe this patch is ready for review. I look forward to any feedback
> and am happy to make revisions.

Sorry, I forgot to ask whether we could move the "pre-check" to within the
equal_keys() function so that it needn't be added to every one of its
callers.

--
nathan

Attachment

0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-v2.patch

pgsql-hackers by date:

From: Haibo Yan
Date: 11 April, 03:55:29
Subject: Re: [PATCH] Fix: Partitioned parent index remains invalid after child indexes are repaired

From: Tender Wang
Date: 11 April, 04:31:59
Subject: Re: pg17: XX000: no relation entry for relid 0

Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash - Mailing list pgsql-hackers

Attachment

Previous

Next