Re: Do we want a hashset type? - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Do we want a hashset type?
Date
Msg-id 83627ce2-236b-4a68-ac05-7398d9ec701f@app.fastmail.com
In response to Re: Do we want a hashset type?  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Do we want a hashset type?
List pgsql-hackers
On Wed, Jun 14, 2023, at 11:44, Tomas Vondra wrote:
>> Perspective from a potential user: I'm currently working on something
>> where an array-like structure with fast membership test performance
>> would be very useful. The main type of query is doing an =ANY(the set)
>> filter, where the set could contain anywhere from very few to thousands
>> of entries (ints in our case). So we'd want the same index usage as
>> =ANY(array) but would like faster row checking than we get with an array
>> when other indexes are used.
>> 
>
> We kinda already do this since PG14 (commit 50e17ad281), actually. If
> the list is long enough (9 values or more), we'll build a hash table
> during query execution. So pretty much exactly what you're asking for.

Would it be feasible to teach the planner to utilize the internal hash table of
hashset directly? In the case of arrays, the hash table construction is an
ad hoc operation, whereas with hashset, the hash table already exists, which
could potentially lead to a faster execution.

Essentially, the aim would be to support:

=ANY(hashset)

Instead of the current:

=ANY(hashset_to_array(hashset))
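
To illustrate the difference, here is a rough conceptual sketch in Python. It is not PostgreSQL code, and the function names are made up for illustration. It only assumes that the array path has to build a lookup table each time the query runs, while a hashset value already carries one.

```python
# Conceptual sketch only; this does not reflect PostgreSQL internals.

def filter_with_adhoc_hash(rows, array_values):
    """Mimic =ANY(array): build a hash table first, then probe it per row."""
    lookup = set(array_values)  # build cost is paid on every execution
    return [r for r in rows if r in lookup]


def filter_with_existing_hash(rows, hashset_value):
    """Mimic the proposed =ANY(hashset): probe the table that already exists."""
    return [r for r in rows if r in hashset_value]  # no build step


rows = [1, 5, 42, 1000, 7]
members = [5, 7, 9, 11, 13, 17, 19, 23, 42]
stored = frozenset(members)  # stands in for a stored hashset value

# Both paths select the same rows; only the setup work differs.
assert filter_with_adhoc_hash(rows, members) == filter_with_existing_hash(rows, stored)
```

The potential saving is the build step, which matters most when the set is large or when the same set is probed by many queries.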

Thoughts?

/Joel


