Re: Hash index todo list item - Mailing list pgsql-hackers

From Tom Raney
Subject Re: Hash index todo list item
Date
Msg-id 46F30C7D.6060402@comcast.net
Whole thread Raw
In response to Hash index todo list item  (Kenneth Marshall <ktm@rice.edu>)
Responses Re: Hash index todo list item
Re: Hash index todo list item
List pgsql-hackers
We are pleased to announce an upcoming patch to the hash index code
which improves build time and index size, based on this item in the
TODO list:
During index creation, pre-sort the tuples to improve build speed
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php

Using our implementation, build times and index sizes are
comparable with btree index build times and index sizes.
For example, for a particular 12 million row relation, the
8.2.4 release requires over 2.8 hours to build the hash index. 
With our patch, the index is built in 80 seconds.
Here is the link for a graph, showing a comparative analysis of
btree and hash index builds and describing the benchmark data.
http://web.cecs.pdx.edu/~raneyt/pg/

We are currently cleaning up the patch and will submit it asap.

Regards,
Shreya Bhargava <shreya_bhargav@yahoo.com>
Tom Raney <twraney@comcast.net>


Kenneth Marshall wrote:
> Dear PostgreSQL Hackers:
>
> After following the hackers mailing list for quite a while,
> I am going to start investigating what will need to be done
> to improve hash index performance. Below are the pieces of
> this project that I am currently considering:
>
> 1. Characterize the current hash index implementation against
>    the BTree index, with a focus on space utilization and
>    lookup performance against a collection of test data. This
>    will give a baseline performance test to evaluate the impact
>    of changes. I initially do not plan to bench the hash creation
>    process since my initial focus will be on lookup performance.
>
> 2. Evaluate the performance of different hash index implementations
>    and/or changes to the current implementation. My current plan is
>    to keep the implementation as simple as possible and still provide
>    the desired performance. Several hash index suggestions deal with
>    changing the layout of the keys on a page to improve lookup
>    performance, including reducing the bucket size to a fraction of
>    a page or only storing the hash value on the page, instead of
>    the index value itself. My goal in this phase is to produce one
>    or more versions with better performance than the current BTree.
>    
> 3. Look at build time and concurrency issues with the addition of
>    some additional tests to the test bed. (1)
>
> 4. Repeat as needed.
>
> This is the rough plan. Does anyone see anything critical that
> is missing at this point? Please send me any suggestions for test
> data and various performance test ideas, since I will be working
> on that first.
>
> Regards,
> Ken Marshall 
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [COMMITTERS] pgsql: Silence Solaris compiler warnings, per buildfarm.
Next
From: Neil Conway
Date:
Subject: Re: stored procedure stats in collector