Re: Next Steps with Hash Indexes - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Next Steps with Hash Indexes |
Date | |
Msg-id | CAA4eK1KM+vFv9PBAkE9QKgMsuHKY-kzRdaaodSN3suDo8kYCYg@mail.gmail.com Whole thread Raw |
In response to | Re: Next Steps with Hash Indexes (Sadhuprasad Patro <b.sadhu@gmail.com>) |
Responses |
Re: Next Steps with Hash Indexes
|
List | pgsql-hackers |
On Fri, Aug 27, 2021 at 4:27 PM Sadhuprasad Patro <b.sadhu@gmail.com> wrote: > > IMHO, as discussed above, since other databases also have the > limitation that if you create a multi-column hash index then the hash > index can not be used until all the key columns are used in the search > condition. So my point is that users might be using the hash index > with this limitation and their use case might be that they want to > gain the best performance when they use this particular case and they > might not be looking for much flexibility like we provide in BTREE. > > For reference: > https://dev.mysql.com/doc/refman/8.0/en/index-btree-hash.html > https://docs.microsoft.com/en-us/sql/relational-databases/in-memory-oltp/indexes-for-memory-optimized-tables?view=sql-server-ver15 > > We already know that performance will be better with a single hash for > multiple columns, but still I just wanted to check the performance > difference in PG. This might help us to decide the approach we need to > go for. With a quick POC of both the ideas, I have observed there is a > major performance advantage with single combined hash for multi-key > columns. > > Performance Test Details: (Used PGBENCH Tool) > Initialize cmd: “./pgbench -i -s 100 -d postgres" > > postgres=# \d+ pgbench_accounts > > Table "public.pgbench_accounts" > > Column | Type | Collation | Nullable | Default | Storage > | Compression | Stats target | Description > > ----------+---------------+-----------+----------+---------+----------+-------------+--------------+------------- > > aid | integer | | not null | | plain > | | | > bid | integer | | | | plain > | | | > abalance | integer | | | | plain > | | | > filler | character(84) | | | | extended > | | | > > Indexes: > "pgbench_accounts_idx" hash (aid, bid) > Access method: heap > Options: fillfactor=100 > > Test Command: “./pgbench -j 1 postgres -C -M prepared -S -T 300” > > Performance Test Results: > Idea-1: Single Hash value for multiple key columns > TPS = ~292 > > Idea-2: Separate Hash values for each key column. But use only the > first one to search the bucket. Other hash values are used as payload > to get to the matching tuple before going to the heap. > TPS = ~212 > > Note: Here we got near to 25% better performance in a single combine > hash approach with only TWO key columns. > That's a significant difference. Have you checked via perf or some other way what causes this difference? I have seen that sometimes single client performance with pgbench is not stable, so can you please once check with 4 clients or so and possibly with a larger dataset as well. One more thing to consider is that it seems that the planner requires a condition for the first column of an index before considering an indexscan plan. See Tom's email [1] in this regard. I think it would be better to see what kind of work is involved there if you want to explore a single hash value for all columns idea. [1] - https://www.postgresql.org/message-id/29263.1506483172%40sss.pgh.pa.us -- With Regards, Amit Kapila.
pgsql-hackers by date: