Allow simplehash to use already-calculated hash values - Mailing list pgsql-hackers

From Jeff Davis
Subject Allow simplehash to use already-calculated hash values
Date
Msg-id 48abe675e1330f0c264ab2fe0d4ff23eb244f9ef.camel@j-davis.com
Whole thread Raw
Responses Re: Allow simplehash to use already-calculated hash values  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
The attached small patch adds new entry points to simplehash.h that
allow the caller to pass in the already-calculated hash value, so that
simplehash doesn't need to recalculate it.

This is helpful for Memory-Bounded Hash Aggregation[1], which uses the
hash value for multiple purposes. For instance, if the hash table is
full and the group is not already present in the hash table, it needs
to spill the tuple to disk. In that case, it would use the hash value
for the initial lookup, then to select the right spill partition.
Later, when it processes the batch, it will again need the same hash
value to perform a lookup. By separating the hash value calculation
from where it's used, we can avoid needlessly recalculating it for each
of these steps.

There is already an option for simplehash to cache the calculated hash
value and return it with the entry, but that doesn't quite fit the
need. The hash value is needed in cases where the lookup fails, because
that is when the tuple must be spilled; but if the lookup fails, it
returns NULL, discarding the calculated hash value.

I am including this patch separately from Hash Aggregation because it
is a small and independently-reviewable change.

In theory, this could add overhead for "SH_SCOPE extern" for callers
not specifying their own hash value, because it adds an extra external
function call. I looked at the generated LLVM and it's a simple tail
call, and I looked at the generated assembly and it's just an extra
jmp. I tested by doing a hash aggregation of 30M zeroes, which should
exercise that path a lot, and I didn't see any difference. Also, once
we actually use this for hash aggregation, there will be no "SH_SCOPE
extern" callers that don't specify the hash value anyway.

Regards,
    Jeff Davis

[1] 
https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel%40j-davis.com

Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: POC: Cleaning up orphaned files using undo logs
Next
From: Andres Freund
Date:
Subject: Re: POC: Cleaning up orphaned files using undo logs