Re: Do we want a hashset type? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Do we want a hashset type?
Date
Msg-id 0db4941c-d954-617c-2bb9-a39ed11a0d63@enterprisedb.com
In response to Re: Do we want a hashset type?  (jian he <jian.universality@gmail.com>)
Responses Re: Do we want a hashset type?
List pgsql-hackers

On 6/25/23 15:32, jian he wrote:
>> Or maybe I just don't understand the proposal. Perhaps it'd be best if
>> jian wrote a patch illustrating the idea, and showing how it performs
>> compared to the current approach.
> 
> Currently Joel's idea is an int4hashset, based on the code Tomas first
> wrote. It looks like a non-nested collection of unique int4 values. The
> external text format looks like {int4, int4, int4}.
> The structure looks like (header + capacity slots * int4).
> Within the capacity slots, some slots are empty, some have unique values.
> 
> The textual int4hashset looks like a one-dimensional array,
> so I copied/imitated the src/backend/utils/adt/arrayfuncs.c code and
> wrote slightly more generic hashset input and output functions.
> 
> see the attached c file.
> It works fine for non-null input and output for {int4hashset, int8hashset,
> timestamphashset, intervalhashset, uuidhashset}.
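
For readers following along, the layout described in the quoted text
(header + capacity slots * int4, with some slots empty) could be
sketched roughly as below. This is a minimal standalone illustration,
not the code from either patch: the names Int4Set, int4set_create,
int4set_add and int4set_contains are hypothetical, and the per-slot
"used" flags, the linear probing and the multiplicative hash are
assumptions made only to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: a small header followed by "capacity" int32 slots.
 * One flag byte per slot is stored right after the slots (0 = empty). */
typedef struct Int4Set
{
    uint32_t capacity;   /* number of slots, assumed to be a power of two */
    uint32_t nelements;  /* number of occupied slots */
    int32_t  slots[];    /* capacity values, then capacity flag bytes */
} Int4Set;

/* Allocate an empty set; returns NULL if the allocation fails. */
static Int4Set *int4set_create(uint32_t capacity)
{
    assert(capacity > 0 && (capacity & (capacity - 1)) == 0);

    size_t size = sizeof(Int4Set) + capacity * sizeof(int32_t) + capacity;
    Int4Set *set = calloc(1, size);

    if (set != NULL)
        set->capacity = capacity;
    return set;
}

/* Simple multiplicative hash, chosen only for illustration. */
static uint32_t hash_int4(int32_t value)
{
    return (uint32_t) value * 2654435761u;
}

/* Returns true if the value is in the set after the call,
 * false if the set is full (a real implementation would grow instead). */
static bool int4set_add(Int4Set *set, int32_t value)
{
    uint8_t *used = (uint8_t *) (set->slots + set->capacity);
    uint32_t mask = set->capacity - 1;
    uint32_t pos = hash_int4(value) & mask;

    for (uint32_t i = 0; i < set->capacity; i++)
    {
        if (!used[pos])
        {
            set->slots[pos] = value;
            used[pos] = 1;
            set->nelements++;
            return true;
        }
        if (set->slots[pos] == value)
            return true;            /* already present, keep values unique */
        pos = (pos + 1) & mask;     /* linear probing */
    }
    return false;
}

/* Returns true if the value is present. */
static bool int4set_contains(const Int4Set *set, int32_t value)
{
    const uint8_t *used = (const uint8_t *) (set->slots + set->capacity);
    uint32_t mask = set->capacity - 1;
    uint32_t pos = hash_int4(value) & mask;

    for (uint32_t i = 0; i < set->capacity; i++)
    {
        if (!used[pos])
            return false;           /* hit an empty slot, value is absent */
        if (set->slots[pos] == value)
            return true;
        pos = (pos + 1) & mask;
    }
    return false;
}
```

Note that everything here is tied to int32_t, which is the limitation
the rest of this thread is about.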

So how do you define a table with a "set" column? I mean, with the
original patch we could have done

   CREATE TABLE t (a int4hashset);

and then store / query this. How do you do that with this approach?

I've looked at the patch only very briefly - it's really difficult to
grok such patches - large, with half the comments possibly obsolete etc.
So what does reusing the array code give us, really?

I'm not against reusing some of the array code, but arrays seem to be
much more elaborate (multiple dimensions, ...) so the code needs to do
significantly more stuff in various cases.

When I previously suggested that maybe we should get "inspiration" from
the array code, I was mostly talking about (a) type polymorphism, i.e.
doing sets for arbitrary types, and (b) integrating this into grammar
(instead of using functions).

I don't see how copying arrayfuncs.c like this achieves either of these
things. It still hardcodes just a handful of selected data types, whereas
the array polymorphism relies on the automatic creation of an array type
for every scalar type.
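
To make the contrast with hardcoded types a bit more concrete, here is
a rough standalone C analogy of a container that is not tied to one
element type: it only knows the element size plus hash/equality
callbacks supplied per type. This is not PostgreSQL code and not how
the array code is written; ElemOps, GenericSet, set_create and set_add
are hypothetical names, and the hash functions are arbitrary choices
for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Per-type behaviour, supplied at run time instead of being hardcoded. */
typedef struct ElemOps
{
    size_t   size;                                  /* element size in bytes */
    uint32_t (*hash)(const void *elem);
    bool     (*equal)(const void *a, const void *b);
} ElemOps;

typedef struct GenericSet
{
    const ElemOps *ops;
    uint32_t capacity;    /* number of slots, assumed to be a power of two */
    uint32_t nelements;   /* number of occupied slots */
    uint8_t *used;        /* one flag per slot, 0 = empty */
    char    *slots;       /* capacity * ops->size bytes */
} GenericSet;

static GenericSet *set_create(const ElemOps *ops, uint32_t capacity)
{
    assert(capacity > 0 && (capacity & (capacity - 1)) == 0);

    GenericSet *set = malloc(sizeof(GenericSet));
    if (set == NULL)
        return NULL;

    set->ops = ops;
    set->capacity = capacity;
    set->nelements = 0;
    set->used = calloc(capacity, 1);
    set->slots = calloc(capacity, ops->size);
    if (set->used == NULL || set->slots == NULL)
    {
        free(set->used);
        free(set->slots);
        free(set);
        return NULL;
    }
    return set;
}

/* Returns true if the element is in the set after the call,
 * false if the set is full (growing is omitted in this sketch). */
static bool set_add(GenericSet *set, const void *elem)
{
    uint32_t mask = set->capacity - 1;
    uint32_t pos = set->ops->hash(elem) & mask;

    for (uint32_t i = 0; i < set->capacity; i++)
    {
        char *slot = set->slots + (size_t) pos * set->ops->size;

        if (!set->used[pos])
        {
            memcpy(slot, elem, set->ops->size);
            set->used[pos] = 1;
            set->nelements++;
            return true;
        }
        if (set->ops->equal(slot, elem))
            return true;            /* already present */
        pos = (pos + 1) & mask;     /* linear probing */
    }
    return false;
}

/* Two example element types sharing the same container code. */
static uint32_t int4_hash(const void *e)
{
    int32_t v;
    memcpy(&v, e, sizeof(v));
    return (uint32_t) v * 2654435761u;
}

static bool int4_equal(const void *a, const void *b)
{
    return memcmp(a, b, sizeof(int32_t)) == 0;
}

static uint32_t int8_hash(const void *e)
{
    uint64_t u;
    memcpy(&u, e, sizeof(u));
    return (uint32_t) (u ^ (u >> 32)) * 2654435761u;
}

static bool int8_equal(const void *a, const void *b)
{
    return memcmp(a, b, sizeof(int64_t)) == 0;
}

static const ElemOps int4_ops = { sizeof(int32_t), int4_hash, int4_equal };
static const ElemOps int8_ops = { sizeof(int64_t), int8_hash, int8_equal };
```

Supporting another element type in this sketch means supplying one more
ElemOps entry rather than another copy of the container code. Who
supplies that entry for arbitrary types is the part a handful of
hardcoded types does not answer.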


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


