Re: Next Steps with Hash Indexes - Mailing list pgsql-hackers
From | Sadhuprasad Patro |
---|---|
Subject | Re: Next Steps with Hash Indexes |
Date | |
Msg-id | CAFF0-CHqhQKv7F_Yo8QwyKkbCjCxWyJusRN+jvb_sC3ompwfEQ@mail.gmail.com Whole thread Raw |
In response to | Re: Next Steps with Hash Indexes (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: Next Steps with Hash Indexes
|
List | pgsql-hackers |
On Fri, Aug 13, 2021 at 11:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Aug 13, 2021 at 9:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Aug 12, 2021 at 8:30 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > On Thu, Aug 12, 2021 at 12:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > The design of the patch has changed since the initial proposal. It > > > > tries to perform unique inserts by holding a write lock on the bucket > > > > page to avoid duplicate inserts. > > > > > > Do you mean that you're holding a buffer lwlock while you search the > > > whole bucket? If so, that's surely not OK. > > > > > > > I think here you are worried that after holding lwlock we might > > perform reads of overflow pages which is not a good idea. I think > > there are fewer chances of having overflow pages for unique indexes so > > we don't expect such cases in common > > I think if we identify the bucket based on the hash value of all the > columns then there should be a fewer overflow bucket, but IIUC, in > this patch bucket, is identified based on the hash value of the first > column only so there could be a lot of duplicates on the first column. IMHO, as discussed above, since other databases also have the limitation that if you create a multi-column hash index then the hash index can not be used until all the key columns are used in the search condition. So my point is that users might be using the hash index with this limitation and their use case might be that they want to gain the best performance when they use this particular case and they might not be looking for much flexibility like we provide in BTREE. For reference: https://dev.mysql.com/doc/refman/8.0/en/index-btree-hash.html https://docs.microsoft.com/en-us/sql/relational-databases/in-memory-oltp/indexes-for-memory-optimized-tables?view=sql-server-ver15 We already know that performance will be better with a single hash for multiple columns, but still I just wanted to check the performance difference in PG. This might help us to decide the approach we need to go for. With a quick POC of both the ideas, I have observed there is a major performance advantage with single combined hash for multi-key columns. Performance Test Details: (Used PGBENCH Tool) Initialize cmd: “./pgbench -i -s 100 -d postgres" postgres=# \d+ pgbench_accounts Table "public.pgbench_accounts" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description ----------+---------------+-----------+----------+---------+----------+-------------+--------------+------------- aid | integer | | not null | | plain | | | bid | integer | | | | plain | | | abalance | integer | | | | plain | | | filler | character(84) | | | | extended | | | Indexes: "pgbench_accounts_idx" hash (aid, bid) Access method: heap Options: fillfactor=100 Test Command: “./pgbench -j 1 postgres -C -M prepared -S -T 300” Performance Test Results: Idea-1: Single Hash value for multiple key columns TPS = ~292 Idea-2: Separate Hash values for each key column. But use only the first one to search the bucket. Other hash values are used as payload to get to the matching tuple before going to the heap. TPS = ~212 Note: Here we got near to 25% better performance in a single combine hash approach with only TWO key columns. If we go for separate Hash values for all key columns mentioned then there will be a performance dip and storage also will be relatively higher when we have more key columns. I have just done separate POC patches to get the performance results as mentioned above, there are many other scenarios like split case, to be taken care further. Attaching the POC patches here just for reference… Thanks & Regards SadhuPrasad EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: