Re: Do we want a hashset type? - Mailing list pgsql-hackers
From | Joel Jacobson |
---|---|
Subject | Re: Do we want a hashset type? |
Date | |
Msg-id | 242f23fb-3780-4e04-8fee-335fef8a6ae6@app.fastmail.com Whole thread Raw |
In response to | Re: Do we want a hashset type? ("Joel Jacobson" <joel@compiler.org>) |
Responses |
Re: Do we want a hashset type?
|
List | pgsql-hackers |
New patch attached: Add customizable params to int4hashset() and collision count function This commit enhances int4hashset() by introducing adjustable capacity, load, and growth factors, providing flexibility for performance optimization. Also added is a new function, hashset_collisions(), to report collision counts, aiding in performance tuning. Aggregate functions are renamed to hashset_agg() for consistency with array_agg() and range_agg(). A new test file, test/sql/benchmark.sql, is added for evaluating the performance of hash functions. It's not run automatically by make installcheck. The adjustable parameters and the naive hash function are useful for testing and performance comparison. However, to keep things simple and streamlined for users, these features are likely to be removed in the final release, emphasizing the use of well-optimized default settings. SQL-function indentation is also adjusted to align with the PostgreSQL source repo, improving readability. In the benchmark results below, it was a bit surprising the naive hash function had no collisions, but that only held true when the input elements were sequential integers. When tested with random integers, all three hash functions caused collisions. Timing results not statistical significant, the purpose is just to give an idea of the execution times. *** Elements in sequence 1..100000 - Testing default hash function (Jenkins/lookup3) psql:test/sql/benchmark.sql:23: NOTICE: hashset_count: 100000 psql:test/sql/benchmark.sql:23: NOTICE: hashset_capacity: 262144 psql:test/sql/benchmark.sql:23: NOTICE: hashset_collisions: 31195 DO Time: 1342.564 ms (00:01.343) - Testing Murmurhash32 psql:test/sql/benchmark.sql:40: NOTICE: hashset_count: 100000 psql:test/sql/benchmark.sql:40: NOTICE: hashset_capacity: 262144 psql:test/sql/benchmark.sql:40: NOTICE: hashset_collisions: 30879 DO Time: 1297.823 ms (00:01.298) - Testing naive hash function psql:test/sql/benchmark.sql:57: NOTICE: hashset_count: 100000 psql:test/sql/benchmark.sql:57: NOTICE: hashset_capacity: 262144 psql:test/sql/benchmark.sql:57: NOTICE: hashset_collisions: 0 DO Time: 1400.936 ms (00:01.401) *** Testing 100000 random ints setseed --------- (1 row) Time: 3.591 ms - Testing default hash function (Jenkins/lookup3) psql:test/sql/benchmark.sql:77: NOTICE: hashset_count: 100000 psql:test/sql/benchmark.sql:77: NOTICE: hashset_capacity: 262144 psql:test/sql/benchmark.sql:77: NOTICE: hashset_collisions: 30919 DO Time: 1415.497 ms (00:01.415) setseed --------- (1 row) Time: 1.282 ms - Testing Murmurhash32 psql:test/sql/benchmark.sql:95: NOTICE: hashset_count: 100000 psql:test/sql/benchmark.sql:95: NOTICE: hashset_capacity: 262144 psql:test/sql/benchmark.sql:95: NOTICE: hashset_collisions: 30812 DO Time: 2079.202 ms (00:02.079) setseed --------- (1 row) Time: 0.122 ms - Testing naive hash function psql:test/sql/benchmark.sql:113: NOTICE: hashset_count: 100000 psql:test/sql/benchmark.sql:113: NOTICE: hashset_capacity: 262144 psql:test/sql/benchmark.sql:113: NOTICE: hashset_collisions: 30822 DO Time: 1613.965 ms (00:01.614) /Joel
Attachment
pgsql-hackers by date: