[PATCH] Use cached hash to skip unnecessary key comparisons in dshash - Mailing list pgsql-hackers

From CharSyam
Subject [PATCH] Use cached hash to skip unnecessary key comparisons in dshash
Date
Msg-id CAMrLSE4JrhPB2XYMOMzbPSsgE+6kmDaKHKiF34NWdnbQts=Cpw@mail.gmail.com
Whole thread Raw
Responses Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash
List pgsql-hackers
Dear PostgreSQL Hackers,

  This email proposes a patch to optimize hash table lookups and
  deletions in dshash.  The patch file,
  0001-Use-cached-hash-to-skip-unnecessary-key-comparisons-.patch,
  is attached.

  Problem:

  Each dshash_table_item already stores the hash value in its 'hash'
  field, but find_in_bucket() and delete_key_from_bucket() never use
  it.  They unconditionally call the compare function for every item
  in the bucket chain, incurring unnecessary overhead from DSM address
  translation and key comparison on non-matching items.

  Solution:

  This patch adds a hash pre-check (item->hash == hash) before calling
  equal_keys() in both find_in_bucket() and delete_key_from_bucket().
  Items with non-matching hash values are skipped with a single integer
  comparison, avoiding the expensive key comparison entirely.

  All callers -- dshash_find(), dshash_find_or_insert_extended(), and
  dshash_delete_key() -- already have the hash value computed, so it
  is simply passed through as a new parameter.  Since both functions
  are static, there is no API change.

  Benchmark:

  I ran a benchmark using the test_dsm_registry extension with 10,000
  string-keyed entries (key = 'key_1' .. 'key_10000') on an Apple
  Silicon machine:

    Test                     | Before    | After     | Improvement
    -------------------------+-----------+-----------+------------
    INSERT 10000 entries     | 14.99 ms  |  9.46 ms  | ~37%
    LOOKUP 10000 hits        | 10.40 ms  |  5.52 ms  | ~47%
    LOOKUP 10000 misses      |  8.41 ms  |  4.95 ms  | ~41%
    LOOKUP 50000 hits (x5)   | 33.48 ms  | 26.44 ms  | ~21%

  The improvement is most significant when bucket chains are longer
  and key comparison is expensive (e.g., string keys).

  Testing & Compatibility:

  The patch compiles cleanly and passes all core regression tests
  (make check) on macOS (arm64).  The changes are limited to internal
  static functions in dshash.c and do not affect any public API, so
  full backward compatibility is maintained.

  I believe this patch is ready for review.  I look forward to any
  feedback and am happy to make revisions.

  Thank you,
  charsyam
Attachment

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: [doc] pg_ctl: fix wrong description for -l
Next
From: Robert Treat
Date:
Subject: Re: Adding REPACK [concurrently]