On Thu, Jul 17, 2008 at 5:26 AM, Xiao Meng <mx.cogito@gmail.com> wrote:
> The patch store hash code only in the index tuple.
> It based on Neil Conway's patch with an old version of PostgreSQL.
> It passes the regression test but I didn't test the performance yet.
> Anyone interested can make a performance test;-)
> You can undefine the macro HASHVALUE_ONLY in hash.h to get the
> original implementation.
> It's a preliminary implementation and I'm looking for input here.
> Hope to hear from you.
I spent some time today running tests similar to those described here:
http://archives.postgresql.org/pgsql-hackers/2007-09/msg00208.php

Using a word list of 2650024 unique words (maximum length is 118
bytes), build times are still high, and I'm not really seeing any
performance improvement over b-tree. I haven't profiled it yet, but
my test was as follows:
- Created the dict table
- Loaded the dict table
- Counted the records in the dict table
- Created the index
- Shut down the database
- Randomly selected 200 entries from the word list and built a file
full of (SELECT * FROM dict WHERE word = '<word>') queries using them.
- Cleared out the kernel cache
- Started the database
- Ran the query file
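
For anyone who wants to repeat this, the query-file step could be
sketched roughly as below. This is not the script I actually used; the
function name, the file paths, and the seed parameter are made up for
illustration, and it assumes the word list fits in memory and that no
word contains a backslash.

```python
import random


def build_query_file(words, out_path, n=200, seed=None):
    """Write n point-lookup queries for randomly chosen words to out_path.

    Hypothetical helper: the name and parameters are illustrative only.
    Returns the list of words that were picked.
    """
    rng = random.Random(seed)
    # sample() picks n distinct positions, so no entry is chosen twice.
    picked = rng.sample(list(words), n)
    with open(out_path, "w", encoding="utf-8") as f:
        for word in picked:
            # Double embedded single quotes so the word stays a valid
            # SQL string literal. Backslashes are NOT handled here.
            literal = word.replace("'", "''")
            f.write(f"SELECT * FROM dict WHERE word = '{literal}';\n")
    return picked
```

Feeding it the word list (one word per line) and running the resulting
file with something like psql -f should reproduce the last step, with
the cache flush and restart done in between as listed above.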
The result is an improvement of 5-10 ms in the overall execution time
of all 200 queries, so the per-query difference is practically
unnoticeable. As this is in the range of noise, methinks there's a
larger problem with hash indexes. I haven't looked heavily into their
implementation, but do any of you know of any major design flaws?
--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/