Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash - Mailing list pgsql-hackers

From CharSyam
Subject Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash
Date
Msg-id CAMrLSE5DHxxBxhSMCnimZoMdsUjKagDF4PzgMoFMsZcp_=24cg@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use cached hash to skip unnecessary key comparisons in dshash  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
Thank you for the thoughtful review. I've attached an updated patch (v2)
  addressing your feedback.

  1. Comment added

  I've added a comment in find_in_bucket() explaining the hash pre-check
  assumption:

    /*
     * If the hash values don't match, the keys certainly don't match
     * either, so we can skip the expensive key comparison.  Matching
     * hashes don't guarantee matching keys, so equal_keys() is still
     * needed for confirmation.
     */

  A short "See comment in find_in_bucket()" reference is also added in
  delete_key_from_bucket().

  2. No regressions with short keys / cheap comparisons

  I ran an additional benchmark with short keys ("1" ~ "10000", 1-5 chars)
  where strcmp cost is minimal:

    Test                     | Before    | After     | Change
    -------------------------+-----------+-----------+------------
    INSERT 10000 entries     | 13.26 ms  |  6.44 ms  | ~51% faster
    LOOKUP 10000 hits        |  7.94 ms  |  5.03 ms  | ~37% faster
    LOOKUP 10000 misses      |  8.08 ms  |  4.80 ms  | ~41% faster
    LOOKUP 50000 hits (x5)   | 33.05 ms  | 24.06 ms  | ~27% faster

  For reference, here are the original string key results ("key_1" ~
  "key_10000"):

    Test                     | Before    | After     | Change
    -------------------------+-----------+-----------+------------
    INSERT 10000 entries     | 14.99 ms  |  9.46 ms  | ~37% faster
    LOOKUP 10000 hits        | 10.40 ms  |  5.52 ms  | ~47% faster
    LOOKUP 10000 misses      |  8.41 ms  |  4.95 ms  | ~41% faster
    LOOKUP 50000 hits (x5)   | 33.48 ms  | 26.44 ms  | ~21% faster

  No regressions in either scenario.  Even with short keys, the integer
  hash pre-check is always cheaper than the DSM address translation plus
  compare function call it avoids.

  Both tests ran on macOS (arm64) using the test_dsm_registry extension
  with 10,000 entries.

  3. Keeping the pre-check outside equal_keys()

  I considered moving the hash comparison into equal_keys(), but I think
  keeping it at the call site is a better fit for the following reasons:

    - equal_keys() is a pure key comparison function.  Mixing in hash
      comparison would change its semantics and make the name misleading.

    - To pass hashes into equal_keys(), we would need to add two hash
      parameters (one for the search key, one for the item).  The caller
      still needs to extract item->hash before the call, so the call site
      doesn't actually get simpler.

    - There are only two call sites, both already modified in this patch,
      so there is no real maintenance burden from having the check at each
      call site.

    - This pattern is consistent with other hash table implementations in
      PostgreSQL (e.g., dynahash.c), which also compare hashes outside
      the key equality function.

  I'm happy to reconsider if you feel differently.

  Regards,
  charsyam

2026년 4월 11일 (토) 오전 3:10, Nathan Bossart <nathandbossart@gmail.com>님이 작성:
On Sat, Apr 11, 2026 at 01:09:33AM +0900, CharSyam wrote:
> I believe this patch is ready for review.  I look forward to any feedback
> and am happy to make revisions.

Sorry, I forgot to ask whether we could move the "pre-check" to within the
equal_keys() function so that it needn't be added to every one of its
callers.

--
nathan
Attachment

pgsql-hackers by date:

Previous
From: Haibo Yan
Date:
Subject: Re: [PATCH] Fix: Partitioned parent index remains invalid after child indexes are repaired
Next
From: Tender Wang
Date:
Subject: Re: pg17: XX000: no relation entry for relid 0