Hi,

On 2023-11-22 15:56:21 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2023-11-21 16:42:55 +0700, John Naylor wrote:
> >> The strlen call required for hashbytes() is not free. The lack of
> >> mixing in the (probably inlined after 0001) previous hash function can
> >> remedied directly, as in the attached:
>
> > I doubt this is a good hashfunction. For short strings, sure, but after
> > that... I don't think it makes sense to reduce the internal state of a hash
> > function to something this small.
>
> GUC names are just about always short, though, so I'm not sure you've
> made your point?

By short I meant <= 6 characters (32 / 5 = 6.4). After that you're
overwriting bits that were set previously, without dispersing the "overwritten"
bits throughout the hash state.
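
To make the wrap-around concrete, here is a minimal sketch. It assumes the
per-character step is "rotate the 32-bit accumulator left by 5, then XOR in
the character", which is what the 32 / 5 arithmetic suggests; the helper name
bit_offset is made up for illustration and is not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: bit position at which character i of a string of
 * length len ends up in a 32-bit accumulator, assuming every later
 * character rotates the accumulator left by 5 bits before being XORed in.
 */
unsigned
bit_offset(size_t len, size_t i)
{
    assert(i < len);
    /* character i gets rotated once for each character that follows it */
    return (unsigned) ((5 * (len - 1 - i)) % 32);
}
```

For a 6-character string the first character sits at offset 25, so its 7
significant bits still fit below bit 32. At 7 characters it is at offset 30
and starts to wrap, and bit_offset(8, 0) is 3, i.e. on top of the bits
occupied by the last character at offset 0.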

It's pretty easy to create collisions this way, even just on paper. E.g. I
think abcdefgg and cbcdefgw would have the same hash, because the accumulated
value passed to murmurhash32 is the same.
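
A rough way to check that pair, as a sketch under stated assumptions: the
accumulation step is assumed to be rotate-left-by-5 plus XOR, case folding is
left out, and finalize32 is the well-known 32-bit finalization mix from
MurmurHash3, used here as a stand-in for murmurhash32. The names rotl32,
accumulate, finalize32 and sketch_hash are illustrative only.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits, for 0 < n < 32. */
uint32_t
rotl32(uint32_t v, int n)
{
    return (v << n) | (v >> (32 - n));
}

/* Assumed accumulation step: rotate left by 5, then XOR in the next byte. */
uint32_t
accumulate(const char *s)
{
    uint32_t acc = 0;

    while (*s)
    {
        acc = rotl32(acc, 5);
        acc ^= (uint32_t) (unsigned char) *s++;
    }
    return acc;
}

/* Stand-in finalizer: the 32-bit finalization mix from MurmurHash3. */
uint32_t
finalize32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}

/* Whole sketch: equal accumulators necessarily give equal final hashes. */
uint32_t
sketch_hash(const char *s)
{
    return finalize32(accumulate(s));
}
```

Under these assumptions accumulate("abcdefgg") and accumulate("cbcdefgw")
agree: the a/c difference (0x02) is rotated left by 35 bits in total, i.e. by
3, giving 0x10, which is exactly the g/w difference in the last character. No
finalizer can separate the two after that.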

That this happens even when most of the string is identical is bad, because
it makes it more likely that strings sharing a prefix trigger such
collisions, and those are obviously common among GUC names.

Greetings,
Andres Freund