Re: Do we want a hashset type? - Mailing list pgsql-hackers
From | Joel Jacobson |
---|---|
Subject | Re: Do we want a hashset type? |
Date | |
Msg-id | 183b5dbc-0647-43a3-a02e-0a3abbf8a1ac@app.fastmail.com Whole thread Raw |
In response to | Re: Do we want a hashset type? ("Joel Jacobson" <joel@compiler.org>) |
Responses |
Re: Do we want a hashset type?
Re: Do we want a hashset type? |
List | pgsql-hackers |
On Sat, Jun 24, 2023, at 21:16, Joel Jacobson wrote: > New version of int4hashset_contains() that should follow the same > General Rules as MULTISET's MEMBER OF (8.16 <member predicate>). ... > SELECT hashset_contains('{}'::int4hashset, NULL::int); -- false ... > SELECT hashset_contains('{null}'::int4hashset, NULL::int); -- null When it comes to SQL, the general rule of thumb is that expressions and functions handling null usually return the null value. This is why it might feel a bit out of the ordinary to return False when checking if an empty set contains NULL. However, that's my understanding of the General Rules on page 553 of ISO/IEC 9075-2:2023(E). Rule 3 Case a) specifically states: "If N is 0 (zero), then the <member predicate> is False.", where N is the cardinality, and for an empty set, that's 0 (zero). Rule 3 Case b) goes on to say: "If at least one of XV and MV is the null value, then the <member predicate> is Unknown." But since b) follows a), and the condition for a) already matches, b) is out of the running. This leads me to believe that the result of: SELECT hashset_contains('{}'::int4hashset, NULL::int); would be False, according to the General Rules. Now, this is based on the assumption that the Case conditions are evaluated in sequence, stopping at the first match. Does that assumption hold water? Applying the same rules, we'd have to return Unknown (which we represent as null) for: SELECT hashset_contains('{null}'::int4hashset, NULL::int); Here, since the cardinality N is 1, Case a) doesn't apply, but Case b) does since XV is null. Looking ahead, we're entertaining the possibility of a future SET SQL-syntax feature and wondering how our hashset type could be adapted to be compatible and reusable for such a development. It's a common prediction that any future SET syntax feature would probably operate on Three-Valued Logic. Therefore, it's key for our hashset to handle null values, whether storing, identifying, or adding them. But here's my two cents, and remember it's just a personal viewpoint. I'm not so sure that the hashset type functions need to mirror the corresponding MULTISET language constructs exactly. In my book, our hashset catalog functions could take a more clear-cut route with null handling, as long as our data structure is prepared to handle null values. Think about this possibility: hashset_contains_null(int4hashset) -> boolean hashset_add_null(int4hashset) -> int4hashset hashset_contains(..., NULL) -> ERROR hashset_add(..., NULL) -> ERROR In my mind, this explicit null handling could simplify things, clear up any potential confusion, and at the same time pave the way for compatibility with any future SET SQL-syntax feature. Thoughts? /Joel
pgsql-hackers by date: