
Re: Do we want a hashset type? - Mailing list pgsql-hackers

From	Joel Jacobson
Subject	Re: Do we want a hashset type?
Date	June 11, 2023 10:26:39
Msg-id	2040c023-1a52-4366-9716-8c8507bb6e32@app.fastmail.com
In response to	Re: Do we want a hashset type? (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: Do we want a hashset type? Re: Do we want a hashset type?
List	pgsql-hackers


On Sat, Jun 10, 2023, at 22:26, Tomas Vondra wrote:
> On 6/10/23 17:46, Andrew Dunstan wrote:
>> 
>> Maybe you can post a full patch as well as incremental?
>> 
>
> I wonder if we should keep discussing this extension here, considering
> it's going to be out of core (at least for now). Not sure how many
> pgsql-hackers are interested in this, so maybe we should just move it to
> github PRs or something ...

I think there are some good arguments in favour of including it in core:

1. It's a fundamental data structure. Perhaps "set" would have been a better name,
since from an end-user perspective the use of hash functions is an implementation
detail, but we cannot use that word since it's a reserved keyword, hence "hashset".

2. The addition of SQL/PGQ in SQL:2023 is evidence of a generally perceived need
among SQL users to evaluate graph queries. Even if a future implementation of SQL/PGQ
meant that users wouldn't need to deal with the hashset type directly, the same
type could hopefully be used, in part or in whole, under the hood by that
implementation. If low-level functionality is useful on its own, I think there is
a benefit in exposing it to users, since it allows system testing of the component
in isolation, even if it's primarily going to be used as part of a larger,
higher-level component.

3. I think there is a general need for hashset, experienced by myself, Andrew and,
I would guess, lots of other users. The general pattern that would be improved is
wherever you would currently use array_agg(DISTINCT ...). There are probably other
situations too, since it's a fundamental data structure.
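As an illustration of points 2 and 3, here is a minimal sketch in Python (chosen for brevity; it is not the extension's C code) of a breadth-first graph traversal, the kind of work a graph query does. The graph, node IDs and function names are hypothetical. The two functions differ only in the type of the visited collection: with a list, every membership check scans all nodes seen so far, whereas a hash-based set gives average constant-time checks and inserts. The latter is what a hashset type could offer in SQL where arrays are used as sets today.

```python
from collections import deque


def reachable_with_array(edges, start):
    """Visited nodes kept in a list: each membership check is a linear scan."""
    visited = [start]
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in visited:  # O(len(visited)) scan over the list
                visited.append(nxt)
                frontier.append(nxt)
    return sorted(visited)


def reachable_with_hashset(edges, start):
    """Visited nodes kept in a hash-based set: average O(1) checks and inserts."""
    visited = {start}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in visited:  # hash lookup instead of a scan
                visited.add(nxt)
                frontier.append(nxt)
    return sorted(visited)


# Hypothetical directed graph: node ID -> list of neighbouring node IDs.
edges = {1: [2, 3], 2: [3, 4], 3: [1], 4: [], 5: [1]}
print(reachable_with_hashset(edges, 1))  # [1, 2, 3, 4]
```

Both functions return the same result; only the cost of the membership checks changes as the visited collection grows.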

On Sat, Jun 10, 2023, at 22:12, Tomas Vondra wrote:
>>> 3) support for other types (now it only works with int32)
> I think we should decide what types we want/need to support, and add one
> or two types early. Otherwise we'll have code / on-disk format making
> various assumptions about the type length etc.
>
> I have no idea what types people use as node IDs - is it likely we'll
> need to support types passed by reference / varlena types? Or can we
> just assume it's int/bigint?

I think we should just support data types that would be sensible
to use as a PRIMARY KEY in a fully normalised data model,
which I believe would only include "int", "bigint" and "uuid".
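On the question of pass-by-reference and varlena types: int (int4, 4 bytes) and bigint (int8, 8 bytes) are fixed-length, and so is uuid (16 bytes), although uuid is passed by reference. None of them is varlena, so a set restricted to these types can give every element the same width. The following hypothetical Python sketch (not the extension's actual on-disk format) shows what that buys: an open-addressing set with linear probing over one flat buffer, where slot i always sits at offset i * elem_size and no per-element length word is needed.

```python
import struct


class FixedWidthHashSet:
    """Hypothetical open-addressing set of fixed-width values in one flat buffer.

    Every element has the same size, so slot i lives at offset i * elem_size
    and the set could be serialized as flags + data without length words.
    """

    def __init__(self, capacity=16, fmt="<i"):
        self.fmt = fmt                                # "<i" ~ int4, "<q" ~ int8
        self.elem_size = struct.calcsize(fmt)
        self.capacity = capacity
        self.used = bytearray(capacity)               # 1 byte per slot: 0 = empty, 1 = occupied
        self.data = bytearray(capacity * self.elem_size)
        self.count = 0

    def _probe(self, value):
        """Linear probing from the hashed slot; returns (slot, found)."""
        start = hash(value) % self.capacity
        for step in range(self.capacity):
            i = (start + step) % self.capacity
            if not self.used[i]:
                return i, False                       # first empty slot: value is absent
            (stored,) = struct.unpack_from(self.fmt, self.data, i * self.elem_size)
            if stored == value:
                return i, True
        return None, False                            # table full and value absent

    def add(self, value):
        """Insert value; returns True if it was new, False if already present."""
        i, found = self._probe(value)
        if found:
            return False
        if i is None:
            raise OverflowError("set is full; a real implementation would resize")
        struct.pack_into(self.fmt, self.data, i * self.elem_size, value)
        self.used[i] = 1
        self.count += 1
        return True

    def __contains__(self, value):
        return self._probe(value)[1]


s = FixedWidthHashSet(capacity=8, fmt="<q")  # bigint-sized slots
s.add(42)
s.add(42)                                    # duplicate, ignored
print(42 in s, 7 in s, s.count)              # True False 1
```

For uuid, a format of "16s" with 16-byte bytes values would work the same way. Resizing and deletion are left out to keep the sketch short.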

/Joel
