Re: Do we want a hashset type? - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Do we want a hashset type?
Date
Msg-id 242f23fb-3780-4e04-8fee-335fef8a6ae6@app.fastmail.com
Whole thread Raw
In response to Re: Do we want a hashset type?  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Do we want a hashset type?
List pgsql-hackers
New patch attached:

Add customizable params to int4hashset() and collision count function

This commit enhances int4hashset() by introducing adjustable capacity,
load, and growth factors, providing flexibility for performance optimization.

Also added is a new function, hashset_collisions(), to report collision
counts, aiding in performance tuning.

Aggregate functions are renamed to hashset_agg() for consistency with
array_agg() and range_agg().

A new test file, test/sql/benchmark.sql, is added for evaluating the
performance of hash functions. It's not run automatically by
make installcheck.

The adjustable parameters and the naive hash function are useful for testing
and performance comparison. However, to keep things simple and streamlined
for users, these features are likely to be removed in the final release,
emphasizing the use of well-optimized default settings.

SQL-function indentation is also adjusted to align with the PostgreSQL
source repo, improving readability.

In the benchmark results below, it was a bit surprising the naive hash
function had no collisions, but that only held true when the input
elements were sequential integers. When tested with random integers,
all three hash functions caused collisions.

Timing results not statistical significant, the purpose is just to
give an idea of the execution times.

*** Elements in sequence 1..100000
- Testing default hash function (Jenkins/lookup3)
psql:test/sql/benchmark.sql:23: NOTICE:  hashset_count: 100000
psql:test/sql/benchmark.sql:23: NOTICE:  hashset_capacity: 262144
psql:test/sql/benchmark.sql:23: NOTICE:  hashset_collisions: 31195
DO
Time: 1342.564 ms (00:01.343)
- Testing Murmurhash32
psql:test/sql/benchmark.sql:40: NOTICE:  hashset_count: 100000
psql:test/sql/benchmark.sql:40: NOTICE:  hashset_capacity: 262144
psql:test/sql/benchmark.sql:40: NOTICE:  hashset_collisions: 30879
DO
Time: 1297.823 ms (00:01.298)
- Testing naive hash function
psql:test/sql/benchmark.sql:57: NOTICE:  hashset_count: 100000
psql:test/sql/benchmark.sql:57: NOTICE:  hashset_capacity: 262144
psql:test/sql/benchmark.sql:57: NOTICE:  hashset_collisions: 0
DO
Time: 1400.936 ms (00:01.401)
*** Testing 100000 random ints
    setseed
---------

(1 row)

Time: 3.591 ms
- Testing default hash function (Jenkins/lookup3)
psql:test/sql/benchmark.sql:77: NOTICE:  hashset_count: 100000
psql:test/sql/benchmark.sql:77: NOTICE:  hashset_capacity: 262144
psql:test/sql/benchmark.sql:77: NOTICE:  hashset_collisions: 30919
DO
Time: 1415.497 ms (00:01.415)
    setseed
---------

(1 row)

Time: 1.282 ms
- Testing Murmurhash32
psql:test/sql/benchmark.sql:95: NOTICE:  hashset_count: 100000
psql:test/sql/benchmark.sql:95: NOTICE:  hashset_capacity: 262144
psql:test/sql/benchmark.sql:95: NOTICE:  hashset_collisions: 30812
DO
Time: 2079.202 ms (00:02.079)
    setseed
---------

(1 row)

Time: 0.122 ms
- Testing naive hash function
psql:test/sql/benchmark.sql:113: NOTICE:  hashset_count: 100000
psql:test/sql/benchmark.sql:113: NOTICE:  hashset_capacity: 262144
psql:test/sql/benchmark.sql:113: NOTICE:  hashset_collisions: 30822
DO
Time: 1613.965 ms (00:01.614)

/Joel
Attachment

pgsql-hackers by date:

Previous
From: Nikita Malakhov
Date:
Subject: Re: Pluggable toaster
Next
From: shveta malik
Date:
Subject: Re: Support logical replication of DDLs