Home > mailing lists

Re: Do we want a hashset type? - Mailing list pgsql-hackers

From	Joel Jacobson
Subject	Re: Do we want a hashset type?
Date	June 25, 2023 09:42:52
Msg-id	183b5dbc-0647-43a3-a02e-0a3abbf8a1ac@app.fastmail.com Whole thread Raw
In response to	Re: Do we want a hashset type? ("Joel Jacobson" <joel@compiler.org>)
Responses	Re: Do we want a hashset type? Re: Do we want a hashset type?
List	pgsql-hackers

Tree view

On Sat, Jun 24, 2023, at 21:16, Joel Jacobson wrote:
> New version of int4hashset_contains() that should follow the same
> General Rules as MULTISET's MEMBER OF (8.16 <member predicate>).
...
> SELECT hashset_contains('{}'::int4hashset, NULL::int); -- false
...
> SELECT hashset_contains('{null}'::int4hashset, NULL::int); -- null

When it comes to SQL, the general rule of thumb is that expressions and functions 
handling null usually return the null value. This is why it might feel a bit out 
of the ordinary to return False when checking if an empty set contains NULL.

However, that's my understanding of the General Rules on page 553 of 
ISO/IEC 9075-2:2023(E). Rule 3 Case a) specifically states:

    "If N is 0 (zero), then the <member predicate> is False.",

where N is the cardinality, and for an empty set, that's 0 (zero).

Rule 3 Case b) goes on to say:

    "If at least one of XV and MV is the null value, then the 
    <member predicate> is Unknown."

But since b) follows a), and the condition for a) already matches, b) is out of 
the running. This leads me to believe that the result of:

    SELECT hashset_contains('{}'::int4hashset, NULL::int);

would be False, according to the General Rules.

Now, this is based on the assumption that the Case conditions are evaluated in 
sequence, stopping at the first match. Does that assumption hold water?

Applying the same rules, we'd have to return Unknown (which we represent as
null) for:

    SELECT hashset_contains('{null}'::int4hashset, NULL::int);

Here, since the cardinality N is 1, Case a) doesn't apply, but Case b) does 
since XV is null.

Looking ahead, we're entertaining the possibility of a future SET SQL-syntax
feature and wondering how our hashset type could be adapted to be compatible and
reusable for such a development. It's a common prediction that any future SET
syntax feature would probably operate on Three-Valued Logic. Therefore, it's key
for our hashset to handle null values, whether storing, identifying, or adding
them.

But here's my two cents, and remember it's just a personal viewpoint. I'm not so 
sure that the hashset type functions need to mirror the corresponding MULTISET 
language constructs exactly. In my book, our hashset catalog functions could
take a more clear-cut route with null handling, as long as our data structure is
prepared to handle null values.

Think about this possibility:

hashset_contains_null(int4hashset) -> boolean
hashset_add_null(int4hashset) -> int4hashset
hashset_contains(..., NULL) -> ERROR
hashset_add(..., NULL) -> ERROR

In my mind, this explicit null handling could simplify things, clear up any
potential confusion, and at the same time pave the way for compatibility with
any future SET SQL-syntax feature.

Thoughts?

/Joel

pgsql-hackers by date:

From: Michael Banck
Date: 25 June 2023, 09:10:00
Subject: Re: Stampede of the JIT compilers

From: Tatsuo Ishii
Date: 25 June 2023, 12:05:09
Subject: Row pattern recognition

Re: Do we want a hashset type? - Mailing list pgsql-hackers

Previous

Next