Dear PostgreSQL Hackers,
This email proposes a patch to optimize hash table lookups and
deletions in dshash. The patch file,
0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-.patch,
is attached.
Problem:
Each dshash_table_item already stores the hash value in its 'hash'
field, but find_in_bucket() and delete_key_from_bucket() never use
it. They unconditionally call the compare function for every item
in the bucket chain, incurring unnecessary overhead from DSM address
translation and key comparison on non-matching items.
Solution:
This patch adds a hash pre-check (item->hash == hash) before calling
equal_keys() in both find_in_bucket() and delete_key_from_bucket().
Items with non-matching hash values are skipped with a single integer
comparison, avoiding the expensive key comparison entirely.
All callers -- dshash_find(), dshash_find_or_insert_extended(), and
dshash_delete_key() -- already have the hash value computed, so it
is simply passed through as a new parameter. Since both functions
are static, there is no API change.
Benchmark:
I ran a benchmark using the test_dsm_registry extension with 10,000
string-keyed entries (key = 'key_1' .. 'key_10000') on an Apple
Silicon machine:
Test | Before | After | Improvement
-------------------------+-----------+-----------+------------
INSERT 10000 entries | 14.99 ms | 9.46 ms | ~37%
LOOKUP 10000 hits | 10.40 ms | 5.52 ms | ~47%
LOOKUP 10000 misses | 8.41 ms | 4.95 ms | ~41%
LOOKUP 50000 hits (x5) | 33.48 ms | 26.44 ms | ~21%
The improvement is most significant when bucket chains are longer
and key comparison is expensive (e.g., string keys).
Testing & Compatibility:
The patch compiles cleanly and passes all core regression tests
(make check) on macOS (arm64). The changes are limited to internal
static functions in dshash.c and do not affect any public API, so
full backward compatibility is maintained.
I believe this patch is ready for review. I look forward to any
feedback and am happy to make revisions.
Thank you,
charsyam