Hi John,
Many thanks for your feedback!
> There's more we can do here. Above the stanzas changed in the patch
> there is this, at least for varlena/bytea:
>
> hash = DatumGetUInt32(hash_any((unsigned char *) authoritative_data,
> Min(len, PG_CACHE_LINE_SIZE)));
>
> This makes no sense to me: hash_any() calls hash_bytes() and turns the
> result into a Datum, and then we just get it right back out of the
> Datum again.
I see similar patterns in files other than bytea.c and varlena.c.
Implemented as a separate patch.
> if (len > PG_CACHE_LINE_SIZE)
> hash ^= DatumGetUInt32(hash_uint32((uint32) len));
>
> Similar here, but instead of hash_bytes_uint32(), we may as well use
> mumurhash32().
Ditto.
It's worth noting that timetz_hash() uses a similar pattern but I
choose to keep it as is. Changing it will break backward
compatibility. Also it breaks our tests.
Using hash_bytes_uint32() / hash_bytes_uint32_extended() directly in
timetz_hash() / timetz_hash_extended() is safe though. Proposed as a
separate patch.
> addHyperLogLog says "typically generated using
> hash_any()", but that function takes a uint32, not a Datum, so that
> comment should probably be changed. hash_bytes() is global, so we can
> use it directly.
Makes sense. Implemented as an independent patch.
--
Best regards,
Aleksander Alekseev