Re: Do we want a hashset type? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Do we want a hashset type?
Msg-id 0c6099ef-1d10-7375-6775-97f10ce01e3d@enterprisedb.com
In response to Re: Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Do we want a hashset type?
List pgsql-hackers

On 6/11/23 12:26, Joel Jacobson wrote:
> On Sat, Jun 10, 2023, at 22:26, Tomas Vondra wrote:
>> On 6/10/23 17:46, Andrew Dunstan wrote:
>>>
>>> Maybe you can post a full patch as well as incremental?
>>>
>>
>> I wonder if we should keep discussing this extension here, considering
>> it's going to be out of core (at least for now). Not sure how many
>> pgsql-hackers are interested in this, so maybe we should just move it to
>> github PRs or something ...
> 
> I think there are some good arguments that speak in favour of including it in core:
> 
> 1. It's a fundamental data structure. Perhaps "set" would have been a better name,
> since the use of hash functions is an implementation detail from an end-user
> perspective, but we cannot use that word since it's a reserved keyword, hence "hashset".
> 
> 2. The addition of SQL/PGQ in SQL:2023 is evidence of a general perceived need
> among SQL users to evaluate graph queries. Even if a future implementation of SQL/PGQ
> would mean users wouldn't need to deal with the hashset type directly, the same
> type could hopefully be used, in part or in whole, under the hood by the future 
> SQL/PGQ implementation. If low-level functionality is useful on its own, I think there's
> a benefit in exposing it to users, since it allows system testing of the component
> in isolation, even if it's primarily going to be used as a smaller part of a larger,
> higher-level component.
> 
> 3. I think there is a general need for hashset, experienced by myself, Andrew and,
> I would guess, lots of other users. The general pattern that will be improved is
> when you currently would do array_agg(DISTINCT ...).
> There are probably other situations too, since it's a fundamental data structure.
> 
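To make the array_agg(DISTINCT ...) pattern from point 3 concrete, here is a
rough Python sketch (illustration only, not code from the extension or from
PostgreSQL). It contrasts deduplicating by sorting, which as far as I know is
how DISTINCT inside an aggregate is typically handled, with inserting into a
hash set as the values arrive. The function names are made up for the example.

```python
def distinct_by_sort(values):
    """Deduplicate by sorting first, then dropping adjacent repeats."""
    out = []
    for v in sorted(values):
        # After sorting, duplicates are adjacent, so comparing with the
        # last emitted value is enough.
        if not out or out[-1] != v:
            out.append(v)
    return out


def distinct_by_hash(values):
    """Deduplicate by inserting every value into a hash set."""
    seen = set()
    for v in values:
        # Re-adding an existing member is a no-op, so no sort is needed.
        seen.add(v)
    return seen


if __name__ == "__main__":
    node_ids = [3, 1, 3, 2, 1]
    print(distinct_by_sort(node_ids))
    print(distinct_by_hash(node_ids))
```

Both return the same distinct values. The hash-based result also supports
cheap membership checks afterwards, which is what graph-style queries tend
to need.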

I agree with all of that, but ...

It's just past feature freeze, so the earliest release this could appear
in is 17, about 15 months away.

Once stuff gets added to core, it's tied to the release cycle, so no new
features in between.

Presumably people would like to use the extension in the release they
already use, without backporting.

Postgres is extensible for a reason, exactly so that we don't need to
have everything in core.

> On Sat, Jun 10, 2023, at 22:12, Tomas Vondra wrote:
>>>> 3) support for other types (now it only works with int32)
>> I think we should decide what types we want/need to support, and add one
>> or two types early. Otherwise we'll have code / on-disk format making
>> various assumptions about the type length etc.
>>
>> I have no idea what types people use as node IDs - is it likely we'll
>> need to support types passed by reference / varlena types? Or can we
>> just assume it's int/bigint?
> 
> I think we should just support data types that would be sensible
> to use as a PRIMARY KEY in a fully normalised data model,
> which I believe would only include "int", "bigint" and "uuid".
> 

OK, makes sense.
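
To illustrate the earlier concern about the on-disk format assuming a type
length, here's a minimal Python sketch of a serialized set that records the
element type in its header. All three candidate types are fixed-width (int is
4 bytes, bigint 8, uuid 16), so no varlena elements are needed. The layout,
the tag values and the function names are assumptions for the example, not
the extension's actual format.

```python
import struct
import uuid

# Hypothetical type tags and element widths (in bytes) for the example.
TYPE_TAG = {"int4": 1, "int8": 2, "uuid": 3}
TAG_TYPE = {tag: name for name, tag in TYPE_TAG.items()}
ELEMENT_WIDTH = {"int4": 4, "int8": 8, "uuid": 16}

# Header: 1-byte type tag + 4-byte element count, little-endian, no padding.
HEADER = "<BI"
HEADER_SIZE = struct.calcsize(HEADER)


def _encode(type_name, value):
    """Encode one element as a fixed-width byte string."""
    if type_name == "int4":
        return struct.pack("<i", value)
    if type_name == "int8":
        return struct.pack("<q", value)
    if type_name == "uuid":
        return value.bytes  # uuid.UUID.bytes is always 16 bytes
    raise ValueError(f"unsupported element type: {type_name}")


def serialize(type_name, values):
    """Serialize the distinct values, prefixed by a self-describing header."""
    elems = sorted({_encode(type_name, v) for v in values})
    return struct.pack(HEADER, TYPE_TAG[type_name], len(elems)) + b"".join(elems)


def read_header(blob):
    """Return (type name, element count, element width) from the header."""
    tag, count = struct.unpack_from(HEADER, blob, 0)
    type_name = TAG_TYPE[tag]
    width = ELEMENT_WIDTH[type_name]
    # The reader derives the width from the tag instead of assuming int32.
    if len(blob) != HEADER_SIZE + count * width:
        raise ValueError("payload length does not match header")
    return type_name, count, width


if __name__ == "__main__":
    print(read_header(serialize("int4", [1, 2, 2, 3])))
    print(read_header(serialize("uuid", [uuid.UUID(int=1)])))
```

The point is only that the element width comes from the header, so adding
bigint or uuid later does not change how existing int sets are read.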


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


