Re: Do we want a hashset type? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Do we want a hashset type?
Date
Msg-id d52abbc7-f474-b8f4-0c8a-11bbb1bedb0e@enterprisedb.com
In response to Re: Do we want a hashset type?  (Tom Dunstan <pgsql@tomd.cc>)
Responses Re: Do we want a hashset type?
List pgsql-hackers
On 6/14/23 06:31, Tom Dunstan wrote:
> On Mon, 12 Jun 2023 at 22:37, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> 
>     Perhaps. So you're proposing to have this as a regular built-in type?
>     It's hard for me to judge how popular this feature would be, but I guess
>     people often use arrays while they actually want set semantics ...
> 
> 
> Perspective from a potential user: I'm currently working on something
> where an array-like structure with fast membership test performance
> would be very useful. The main type of query is doing an =ANY(the set)
> filter, where the set could contain anywhere from very few to thousands
> of entries (ints in our case). So we'd want the same index usage as
> =ANY(array) but would like faster row checking than we get with an array
> when other indexes are used.
> 

We kinda already do this since PG14 (commit 50e17ad281), actually. If
the list is long enough (9 values or more), we'll build a hash table
during query execution. So pretty much exactly what you're asking for.
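
To illustrate the idea (this is only a rough sketch in Python, not the actual executor code from that commit): below a threshold the values are compared one by one, at or above it a hash table is built once and probed for each row. The names MIN_VALUES_FOR_HASH and make_membership_test are made up for this example; the threshold just mirrors the "9 values or more" mentioned above.

```python
# Hypothetical threshold, mirroring the "9 values or more" figure above.
MIN_VALUES_FOR_HASH = 9


def make_membership_test(values):
    """Return a function that tests whether x is one of `values`.

    Short lists are scanned linearly; longer lists are hashed once up
    front so that each later check is a single hash probe.
    """
    if len(values) >= MIN_VALUES_FOR_HASH:
        lookup = frozenset(values)  # built once, reused for every row
        return lambda x: x in lookup
    items = tuple(values)
    return lambda x: any(x == v for v in items)  # plain linear scan


# Filter some "rows" the way x = ANY(...) would, using a 9-element list.
matches = make_membership_test([2, 3, 5, 7, 11, 13, 17, 19, 23])
filtered = [row for row in range(20) if matches(row)]
```

With nine values the hashed path is taken; with fewer, the same call falls back to the linear scan, and the result is the same either way.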

> Our app runs connecting to either an embedded postgres database that we
> control or an external one controlled by customers - this is typically
> RDS or some other cloud vendor's DB. Having such a type as a separate
> extension would make it unusable for us until all relevant cloud vendors
> decided that it was popular enough to include - something that may never
> happen, or even if it did, not any time soon.
> 

Understood, but that's really a problem / choice of the cloud vendors.

The thing is, adding stuff to core is not free - it means the community
becomes responsible for maintenance, testing, fixing issues, etc. It's
not feasible (or desirable) to have all extensions in core, and cloud
vendors generally do have ways to support some pre-vetted extensions
that they deem useful enough. Granted, it means vetting/maintenance for
them, but that's kinda the point of managed services. And it'd not be
free for us either.

Anyway, that's mostly irrelevant, as PG14 already does the hash table
for this kind of query. And I'm not strictly against adding some of
this into core, if it ends up being useful enough.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


