Re: Do we want a hashset type? - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Do we want a hashset type?
Date
Msg-id 6b8c54d8-b823-479a-9c93-d0d955b7d7cb@app.fastmail.com
Whole thread Raw
In response to Re: Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Do we want a hashset type?
Re: Do we want a hashset type?
List pgsql-hackers
On Fri, Jun 16, 2023, at 17:42, Joel Jacobson wrote:
> I realise int4hashset_hash() is broken,
> since two int4hashset's that are considered equal,
> can by coincidence get different hashes:
...
> Do we have any ideas on how to fix this without sacrificing performance?

The problem was due to hashset_hash() function accumulating the hashes
of individual elements in a non-commutative manner. As a consequence, the
final hash value was sensitive to the order in which elements were inserted
into the hashset. This behavior led to inconsistencies, as logically
equivalent sets (i.e., sets with the same elements but in different orders)
produced different hash values.

Solved by modifying the hashset_hash() function to use a commutative operation
when combining the hashes of individual elements. This change ensures that the
final hash value is independent of the element insertion order, and logically
equivalent sets produce the same hash.

An somewhat unfortunate side-effect of this fix, is that we can no longer
visually sort the hashset output format, since it's not lexicographically sorted.
I think this is an acceptable trade-off for a hashset type,
since the only alternative I see would be to sort the elements,
but then it wouldn't be a hashset, but a treeset, which different
Big-O complexity.

New patch is attached, which will henceforth always be a complete patch,
to avoid the hassle of having to assemble incremental patches.

/Joel
Attachment

pgsql-hackers by date:

Previous
From: Quan Zongliang
Date:
Subject: Re: Incorrect estimation of HashJoin rows resulted from inaccurate small table statistics
Next
From: Amit Kapila
Date:
Subject: Re: RFC: logical publication via inheritance root?