Re: Do we want a hashset type? - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Do we want a hashset type?
Date
Msg-id 183b5dbc-0647-43a3-a02e-0a3abbf8a1ac@app.fastmail.com
Whole thread Raw
In response to Re: Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Do we want a hashset type?
Re: Do we want a hashset type?
List pgsql-hackers
On Sat, Jun 24, 2023, at 21:16, Joel Jacobson wrote:
> New version of int4hashset_contains() that should follow the same
> General Rules as MULTISET's MEMBER OF (8.16 <member predicate>).
...
> SELECT hashset_contains('{}'::int4hashset, NULL::int); -- false
...
> SELECT hashset_contains('{null}'::int4hashset, NULL::int); -- null

When it comes to SQL, the general rule of thumb is that expressions and functions 
handling null usually return the null value. This is why it might feel a bit out 
of the ordinary to return False when checking if an empty set contains NULL.

However, that's my understanding of the General Rules on page 553 of 
ISO/IEC 9075-2:2023(E). Rule 3 Case a) specifically states:

    "If N is 0 (zero), then the <member predicate> is False.",

where N is the cardinality, and for an empty set, that's 0 (zero).

Rule 3 Case b) goes on to say:

    "If at least one of XV and MV is the null value, then the 
    <member predicate> is Unknown."

But since b) follows a), and the condition for a) already matches, b) is out of 
the running. This leads me to believe that the result of:

    SELECT hashset_contains('{}'::int4hashset, NULL::int);

would be False, according to the General Rules.

Now, this is based on the assumption that the Case conditions are evaluated in 
sequence, stopping at the first match. Does that assumption hold water?

Applying the same rules, we'd have to return Unknown (which we represent as
null) for:

    SELECT hashset_contains('{null}'::int4hashset, NULL::int);

Here, since the cardinality N is 1, Case a) doesn't apply, but Case b) does 
since XV is null.

Looking ahead, we're entertaining the possibility of a future SET SQL-syntax
feature and wondering how our hashset type could be adapted to be compatible and
reusable for such a development. It's a common prediction that any future SET
syntax feature would probably operate on Three-Valued Logic. Therefore, it's key
for our hashset to handle null values, whether storing, identifying, or adding
them.

But here's my two cents, and remember it's just a personal viewpoint. I'm not so 
sure that the hashset type functions need to mirror the corresponding MULTISET 
language constructs exactly. In my book, our hashset catalog functions could
take a more clear-cut route with null handling, as long as our data structure is
prepared to handle null values.

Think about this possibility:

hashset_contains_null(int4hashset) -> boolean
hashset_add_null(int4hashset) -> int4hashset
hashset_contains(..., NULL) -> ERROR
hashset_add(..., NULL) -> ERROR

In my mind, this explicit null handling could simplify things, clear up any
potential confusion, and at the same time pave the way for compatibility with
any future SET SQL-syntax feature.

Thoughts?

/Joel



pgsql-hackers by date:

Previous
From: Michael Banck
Date:
Subject: Re: Stampede of the JIT compilers
Next
From: Tatsuo Ishii
Date:
Subject: Row pattern recognition