Re: Do we want a hashset type? - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Do we want a hashset type?
Date
Msg-id cbfb0346-4f05-493d-a7dc-4c6bdc7efc99@app.fastmail.com
In response to Re: Do we want a hashset type?  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Tue, Jun 20, 2023, at 02:04, Tomas Vondra wrote:
> For UPDATE, it'd be pretty clear too, I think. It's possible to do
>
>    UPDATE table SET col = SET[1,2,3]
>
> and it's clear the first is the command SET, while the second is a set
> constructor. For SELECT there'd be conflict, and for ALTER TABLE it'd be
> possible to do
>
>    ALTER TABLE table ALTER COLUMN col SET DEFAULT SET[1,2,3];
>
> Seems clear to me too, I think.
...
> It's a matter of personal taste, I guess. I'm fine with calling function
> API and what not, but a sensible SQL syntax seems nicer.

Now that I see it written out, I actually agree it looks nice.

>> I think it's still meaningful to continue hacking on the int4-type
>> hashset extension, to see if we can agree on the semantics,
>> especially around null handling and sorting.
>> 
>
> Definitely. It certainly was not my intention to derail the work by
> proposing more and more stuff. So feel free to pursue what makes sense
> to you / helps the use case.

OK, cool, and I didn't mean at all that you did. I appreciate the long-term
perspective; otherwise our short-term work might go to waste.

> TBH I don't particularly see why we'd want to sort sets.

Me neither; sorting sets in the conventional, visually coherent sense
(i.e., lexicographically) doesn't seem necessary. However, to support
ORDER BY on hashset values, we need a stable and deterministic ordering.

This can be achieved efficiently by computing a commutative hash of
the hashset, XORing each new value's hash into set->hash:

        set->hash ^= hash;

...and then sort primarily by set->hash.

Though resulting in an apparently random order, this approach, already employed
in int4hashset_add_element() and int4hashset_cmp(), ensures a deterministic and
stable sorting order.
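
To make the idea concrete, here is a minimal standalone sketch, not the
extension's actual code. The struct, the demo_* function names, the integer
mixer, and the tie-break on element count are all assumptions made for
illustration; only the XOR accumulation and "sort primarily by set->hash"
come from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the extension's set struct. */
typedef struct demo_set
{
    uint32_t hash;      /* XOR of the hashes of all distinct elements */
    size_t   nelements; /* number of distinct elements added */
} demo_set;

/*
 * Illustrative integer mixer (the murmur3 32-bit finalizer). This is an
 * assumption; the extension may use a different hash function.
 */
static uint32_t
demo_hash_int32(int32_t value)
{
    uint32_t h = (uint32_t) value;

    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

/*
 * Fold a NEW element into the set hash. The caller must ensure the value
 * is not already present: XORing the same hash twice would cancel it out.
 */
static void
demo_set_add_new(demo_set *set, int32_t value)
{
    set->hash ^= demo_hash_int32(value);    /* XOR is order-independent */
    set->nelements++;
}

/*
 * Compare primarily by the set hash, then by element count. A complete
 * comparator would still need a final element-wise tie-break for distinct
 * sets whose hash and count both collide.
 */
static int
demo_set_cmp(const demo_set *a, const demo_set *b)
{
    if (a->hash != b->hash)
        return (a->hash < b->hash) ? -1 : 1;
    if (a->nelements != b->nelements)
        return (a->nelements < b->nelements) ? -1 : 1;
    return 0;
}
```

Because XOR is commutative and associative, adding 1, 2, 3 and adding
3, 1, 2 yield the same set->hash, so the comparison result does not depend
on insertion order.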

I think this is an acceptable trade-off, better than not supporting ORDER BY.

Jian He had some comments on hashset_cmp() which I will look at.

/Joel


