Re: Do we want a hashset type? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Do we want a hashset type?
Msg-id 7bf1b03d-c52a-eb2c-38c9-8019f4207387@enterprisedb.com
In response to Re: Do we want a hashset type?  (jian he <jian.universality@gmail.com>)
List pgsql-hackers
On 6/20/23 20:08, jian he wrote:
> On Wed, Jun 21, 2023 at 12:25 AM Tomas Vondra
> ...
>>  http://www.wiscorp.com/sqlmultisets.zip
> 
>> Conceptually, a multiset is an unordered collection of elements, all of the same type, with
>> duplicates permitted. Unlike arrays, a multiset is an unbounded collection, with no declared maximum
>> cardinality. This does not mean that the user can insert elements in a multiset without limit, just
>> that the standard does not mandate that there should be a limit. This is analogous to tables, which
>> have no declared maximum number of rows.
> 
> Postgres arrays don't have size limits.

Right. You can declare int[5], but we don't enforce that limit (I haven't
checked why, but presumably it's because we had arrays before the standard
existed, and they were more like lists in LISP or something).

> Does unordered mean there's no need to use subscripts?

Yeah - there's no obvious way to subscript the items when there's no
implicit ordering.

> So multiset is a more limited array type?
> 

Yes and no - both are collection types, so there are similarities and
differences. Multiset does not need to keep the ordering, so in this
sense it's a relaxed version of array.
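To make the array vs. multiset distinction concrete, here is a small Python sketch (purely illustrative, not how PostgreSQL or the hashset extension represents anything). A Python list stands in for an array, `collections.Counter` stands in for a multiset (element mapped to multiplicity), and a plain `set` stands in for a hashset, which additionally drops duplicates:

```python
from collections import Counter

# Array stand-in: ordered, positional access, duplicates allowed.
arr_a = [3, 1, 2, 1]
arr_b = [1, 1, 2, 3]

# Multiset stand-in: element -> multiplicity; order is not recorded.
ms_a = Counter(arr_a)
ms_b = Counter(arr_b)

# Set stand-in (what a hashset models): order and duplicates both dropped.
set_a = set(arr_a)

print(arr_a == arr_b)  # arrays compare position by position
print(ms_a == ms_b)    # multisets compare elements and multiplicities only
print(arr_a[0])        # subscript = position, meaningful for an array
print(ms_a[1])         # Counter's [] is a multiplicity lookup, not a position
print(set_a)           # duplicates collapsed
```

The first comparison is false and the second true: the multiset keeps everything the array does except the ordering, which is why there is nothing sensible to subscript by position.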


> NULL is fine. But personally I feel that, so far, the main feature of
> hashset is quickly aggregating unique values.
> I found that counting distinct (non-NULL) values with hashset is quite a bit faster.

True. That's related to fast membership checks.
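A rough Python sketch of why a hash set helps here (illustrative only; it is not the hashset extension's code, and `None` stands in for SQL NULL). Counting distinct values is a one-pass loop of expected O(1) inserts, and the same structure then answers membership checks in expected O(1). A sort-based count is included only as a common alternative for contrast:

```python
def count_distinct_hashed(values):
    """Count distinct non-NULL values in one pass using a hash set.

    None stands in for SQL NULL and is skipped, like COUNT(DISTINCT x).
    """
    seen = set()
    for v in values:
        if v is not None:
            seen.add(v)  # expected O(1); re-adding a duplicate is a no-op
    return len(seen)


def count_distinct_sorted(values):
    """Same result via sort + adjacent comparison, O(n log n)."""
    non_null = sorted(v for v in values if v is not None)
    count = 0
    prev = object()  # sentinel that compares unequal to every value
    for v in non_null:
        if v != prev:
            count += 1
            prev = v
    return count


data = [4, None, 2, 4, 7, 2, None, 7, 7]
print(count_distinct_hashed(data))
print(count_distinct_sorted(data))

# The same hash set also gives fast membership checks.
members = {v for v in data if v is not None}
print(7 in members, 5 in members)
```

Both counts are 3 for this input; the difference is only in how the duplicates are found (hashing vs. sorting), which is the property the distinct-count speedup and fast membership checks share.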


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


