Re: Hash index todo list item - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Hash index todo list item |
Date | |
Msg-id | 200709281848.l8SIm8O22044@momjian.us Whole thread Raw |
In response to | Re: Hash index todo list item (Tom Raney <twraney@comcast.net>) |
List | pgsql-hackers |
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- Tom Raney wrote: > We are pleased to announce an upcoming patch to the hash index code > which improves build time and index size, based on this item in the > TODO list: > During index creation, pre-sort the tuples to improve build speed > http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php > > Using our implementation, build times and index sizes are > comparable with btree index build times and index sizes. > For example, for a particular 12 million row relation, the > 8.2.4 release requires over 2.8 hours to build the hash index. > With our patch, the index is built in 80 seconds. > Here is the link for a graph, showing a comparative analysis of > btree and hash index builds and describing the benchmark data. > http://web.cecs.pdx.edu/~raneyt/pg/ > > We are currently cleaning up the patch and will submit it asap. > > Regards, > Shreya Bhargava <shreya_bhargav@yahoo.com> > Tom Raney <twraney@comcast.net> > > > Kenneth Marshall wrote: > > Dear PostgreSQL Hackers: > > > > After following the hackers mailing list for quite a while, > > I am going to start investigating what will need to be done > > to improve hash index performance. Below are the pieces of > > this project that I am currently considering: > > > > 1. Characterize the current hash index implementation against > > the BTree index, with a focus on space utilization and > > lookup performance against a collection of test data. This > > will give a baseline performance test to evaluate the impact > > of changes. I initially do not plan to bench the hash creation > > process since my initial focus will be on lookup performance. > > > > 2. Evaluate the performance of different hash index implementations > > and/or changes to the current implementation. My current plan is > > to keep the implementation as simple as possible and still provide > > the desired performance. Several hash index suggestions deal with > > changing the layout of the keys on a page to improve lookup > > performance, including reducing the bucket size to a fraction of > > a page or only storing the hash value on the page, instead of > > the index value itself. My goal in this phase is to produce one > > or more versions with better performance than the current BTree. > > > > 3. Look at build time and concurrency issues with the addition of > > some additional tests to the test bed. (1) > > > > 4. Repeat as needed. > > > > This is the rough plan. Does anyone see anything critical that > > is missing at this point? Please send me any suggestions for test > > data and various performance test ideas, since I will be working > > on that first. > > > > Regards, > > Ken Marshall > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 5: don't forget to increase your free space map settings > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
pgsql-hackers by date: