Dear PostgreSQL Hackers:
After following the hackers mailing list for quite a while,
I am going to start investigating what will need to be done
to improve hash index performance. Below are the pieces of
this project that I am currently considering:
1. Characterize the current hash index implementation against the BTree index, with a focus on space utilization and
lookup performance against a collection of test data. This will give a baseline performance test to evaluate the impact
of changes. I do not plan to benchmark hash index creation at first, since my initial focus will be on lookup
performance.
2. Evaluate the performance of different hash index implementations and/or changes to the current implementation. My
current plan is to keep the implementation as simple as possible and still provide the desired performance. Several
hash index suggestions deal with changing the layout of the keys on a page to improve lookup performance, including
reducing the bucket size to a fraction of a page or only storing the hash value on the page, instead of the index
value itself. My goal in this phase is to produce one or more versions with better performance than the current BTree.
3. Look at build time and concurrency issues by adding further tests to the test bed. (1)
4. Repeat as needed.
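To make the "only storing the hash value on the page" idea from item 2 concrete, here is a minimal Python sketch. It is not PostgreSQL code and not a proposed implementation: the class names, the use of CRC32 as the hash function, the fixed bucket count, and the in-memory list standing in for the heap are all assumptions made for illustration. It shows a bucket that keeps only sorted (hash code, tuple id) pairs, so a lookup can binary-search within the bucket and must then recheck each candidate against the heap, because different keys can share a hash code.

```python
import bisect
import zlib


class HashOnlyBucket:
    """Hypothetical bucket page: stores (hash_code, tuple_id) pairs, never the key."""

    def __init__(self):
        self.entries = []  # kept sorted by (hash_code, tuple_id)

    def insert(self, hash_code, tuple_id):
        bisect.insort(self.entries, (hash_code, tuple_id))

    def candidates(self, hash_code):
        # (hash_code,) sorts before every (hash_code, tuple_id) pair, so
        # bisect_left lands on the first entry with this hash code, if any.
        i = bisect.bisect_left(self.entries, (hash_code,))
        while i < len(self.entries) and self.entries[i][0] == hash_code:
            yield self.entries[i][1]
            i += 1


class HashOnlyIndex:
    """Hypothetical index over a list of string keys standing in for the heap."""

    def __init__(self, heap, n_buckets=8):
        self.heap = heap  # position in the list plays the role of a tuple id
        self.buckets = [HashOnlyBucket() for _ in range(n_buckets)]
        for tuple_id, key in enumerate(heap):
            h = self.hash_key(key)
            self.buckets[h % n_buckets].insert(h, tuple_id)

    @staticmethod
    def hash_key(key):
        # CRC32 is an assumption chosen only because it is in the stdlib.
        return zlib.crc32(key.encode("utf-8"))

    def lookup(self, key):
        h = self.hash_key(key)
        bucket = self.buckets[h % len(self.buckets)]
        # Recheck against the heap: equal hash codes do not imply equal keys.
        return [tid for tid in bucket.candidates(h) if self.heap[tid] == key]
```

The trade-off this sketch tries to illustrate is that fixed-width hash codes make the page entries small and binary-searchable, at the cost of a heap visit to confirm every match.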
This is the rough plan. Does anyone see anything critical that
is missing at this point? Please send me any suggestions for test
data and various performance test ideas, since I will be working
on that first.
Regards,
Ken Marshall