Re: Do we want a hashset type? - Mailing list pgsql-hackers

From jian he
Subject Re: Do we want a hashset type?
Date
Msg-id CACJufxG5OM61djHJ54QkZa6MTytsSN6wkcALT_hOFp9J+=ge3A@mail.gmail.com
Whole thread Raw
In response to Re: Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Do we want a hashset type?
List pgsql-hackers
On Tue, Jun 27, 2023 at 4:27 PM Joel Jacobson <joel@compiler.org> wrote:
>
> On Tue, Jun 27, 2023, at 04:35, jian he wrote:
> > in SQLMultiSets.pdf(previously thread) I found a related explanation
> > on page 45, 46.
> > /home/jian/hashset/0001-make-int4hashset_contains-strict-and-header-file-change.patch
> > (CASE WHEN OP1 IS NULL OR OP2 IS NULL THEN NULL ELSE MULTISET ( SELECT
> > T1.V FROM UNNEST (OP1) AS T1 (V) INTERSECT SQ SELECT T2.V FROM UNNEST
> > (OP2) AS T2 (V) ) END)
> >
> > CASE WHEN OP1 IS NULL OR OP2 IS NULL THEN NULL ELSE MULTISET ( SELECT
> > T1.V FROM UNNEST (OP1) AS T1 (V) UNION SQ SELECT T2.V FROM UNNEST
> > (OP2) AS T2 (V) ) END
> >
> > (CASE WHEN OP1 IS NULL OR OP2 IS NULL THEN NULL ELSE MULTISET ( SELECT
> > T1.V FROM UNNEST (OP1) AS T1 (V) EXCEPT SQ SELECT T2.V FROM UNNEST
> > (OP2) AS T2 (V) ) END)
>
> Thanks! This was exactly what I was looking for, I knew I've seen it but failed to find it.
>
> Attached is a new incremental patch as well as a full patch, since this is a substantial change:
>
>     Align null semantics with SQL:2023 array and multiset standards
>
>     * Introduced a new boolean field, null_element, in the int4hashset_t type.
>
>     * Rename hashset_count() to hashset_cardinality().
>
>     * Rename hashset_merge() to hashset_union().
>
>     * Rename hashset_equals() to hashset_eq().
>
>     * Rename hashset_neq() to hashset_ne().
>
>     * Add hashset_to_sorted_array().
>
>     * Handle null semantics to work as in arrays and multisets.
>
>     * Update int4hashset_add() to allow creating a new set if none exists.
>
>     * Use more portable int32 typedef instead of int32_t.
>
>     This also adds a thorough test suite in array-and-multiset-semantics.sql,
>     which aims to test all relevant combinations of operations and values.
>
>  Makefile                                       |   2 +-
>  README.md                                      |   6 ++--
>  hashset--0.0.1.sql                             |  37 +++++++++++---------
>  hashset-api.c                                  | 208
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------
>  hashset.c                                      |  12 ++++++-
>  hashset.h                                      |  11 +++---
>  test/expected/array-and-multiset-semantics.out | 365
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  test/expected/basic.out                        |  12 +++----
>  test/expected/reported_bugs.out                |   6 ++--
>  test/expected/strict.out                       | 114 ------------------------------------------------------------
>  test/expected/table.out                        |   8 ++---
>  test/sql/array-and-multiset-semantics.sql      | 232
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  test/sql/basic.sql                             |   4 +--
>  test/sql/benchmark.sql                         |  14 ++++----
>  test/sql/reported_bugs.sql                     |   6 ++--
>  test/sql/strict.sql                            |  32 -----------------
>  test/sql/table.sql                             |   2 +-
>  17 files changed, 823 insertions(+), 248 deletions(-)
>
> /Joel

Hi there.
I changed the function hashset_contains to strict.
also change the way to return an empty array.

in benchmark.sql, would it be ok to use EXPLAIN to demonstrate that
int4hashset can speed distinct aggregate and distinct counts?
like the following:

explain(analyze, costs off, timing off, buffers)
SELECT array_agg(DISTINCT i) FROM benchmark_input_100k \watch c=3

explain(analyze, costs off, timing off, buffers)
SELECT hashset_agg(i) FROM benchmark_input_100k \watch c=3

explain(costs off,timing off, analyze,buffers)
select count(distinct rnd) from benchmark_input_100k \watch c=3

explain(costs off,timing off, analyze,buffers)
SELECT hashset_cardinality(x) FROM (SELECT hashset_agg(rnd) FROM
benchmark_input_100k) sub(x) \watch c=3

Attachment

pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: sslinfo extension - add notbefore and notafter timestamps
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication