Re: Do we want a hashset type? - Mailing list pgsql-hackers

From jian he
Subject Re: Do we want a hashset type?
Msg-id CACJufxHq_ZwBObSEL1wJrvDrLWcU1brsnw4+OQ+wkKqsZSBE9Q@mail.gmail.com
In response to Re: Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Do we want a hashset type?
List pgsql-hackers


On Mon, Jun 19, 2023 at 2:51 PM Joel Jacobson <joel@compiler.org> wrote:
>
> On Mon, Jun 19, 2023, at 02:00, jian he wrote:
> > select hashset_contains('{1,2}'::int4hashset,NULL::int);
> > should return null?
>
> Hmm, that's a good philosophical question.
>
> I notice Tomas Vondra in the initial commit opted for allowing NULL inputs,
> treating them as empty sets, e.g. in int4hashset_add() we create a
> new hashset if the first argument is NULL.
>
> I guess the easiest perhaps most consistent NULL-handling strategy
> would be to just mark all relevant functions STRICT except for the agg ones
> since we probably want to allow skipping over rows with NULL values
> without the entire result becoming NULL.
>
> But if we're not just going the STRICT route, then I think it's a bit more tricky,
> since you could argue the hashset_contains() example should return FALSE
> since the set doesn't contain the NULL value, but OTOH, since we don't
> store NULL values, we don't know if it has ever been added, hence a NULL
> result would perhaps make more sense.
>
> I think I lean on thinking that if we want to be "NULL-friendly", like we
> currently are in hashset_add(), it would probably be most user-friendly
> to be consistent and let all functions return non-null return values in
> all cases where it is not unreasonable.
>
> Since we're essentially designing a set-theoretic system, I think we should
> aim for the logical "soundness" property of it and think about how we can
> verify that it is.
>
> Thoughts?
>
> /Joel

Should the hashset_to_array function be strict?

I also noticed that hashset_symmetric_difference and hashset_difference handle NULL inputs differently. It seems they should handle NULL consistently.

For comparison, the array operators never treat a NULL element as a match:

select '{1,2,NULL}'::int[] operator (pg_catalog.@>) '{NULL}'::int[]; -- false
select '{1,2,NULL}'::int[] operator (pg_catalog.&&) '{NULL}'::int[]; -- false

So by analogy, I guess the following should return false:

select hashset_contains('{1,2}'::int4hashset,NULL::int);
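
To make the two options concrete, here is a rough sketch that uses plain int[] as a stand-in for int4hashset, so it runs on stock PostgreSQL without the extension. The function names demo_contains_strict and demo_contains_false are made up for illustration and are not part of the hashset extension; the real hashset_contains may well behave differently.

```sql
-- Hypothetical stand-ins for hashset_contains(), built on int[] only.

-- Option 1: STRICT. PostgreSQL skips the call and returns NULL
-- whenever any argument is NULL.
CREATE FUNCTION demo_contains_strict(s int[], v int) RETURNS boolean
LANGUAGE sql IMMUTABLE STRICT
AS $$ SELECT v = ANY (s) $$;

-- Option 2: not STRICT. A NULL set or a NULL value yields false,
-- in line with how @> and && treat NULL array elements.
CREATE FUNCTION demo_contains_false(s int[], v int) RETURNS boolean
LANGUAGE sql IMMUTABLE
AS $$ SELECT COALESCE(v = ANY (s), false) $$;

SELECT demo_contains_strict('{1,2}'::int[], NULL::int);  -- NULL
SELECT demo_contains_false('{1,2}'::int[], NULL::int);   -- false
SELECT demo_contains_strict('{1,2}'::int[], 1);          -- true
SELECT demo_contains_false('{1,2}'::int[], 3);           -- false
```

Whichever option is picked, the point is that it should be applied uniformly across the hashset functions.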

