Home > mailing lists

Fixing hash index build time - Mailing list pgsql-hackers

From	Tom Lane
Subject	Fixing hash index build time
Date	March 20, 2007 16:52:37
Msg-id	16911.1174420341@sss.pgh.pa.us Whole thread Raw
Responses	Re: Fixing hash index build time
List	pgsql-hackers

Tree view

There are several reasons why Postgres' hash indexes currently suck,
but one of the bigger ones is that the time to build an index on a
large existing table is excessive, eg
http://archives.postgresql.org/pgsql-novice/2007-03/msg00064.php

I'm not sure if this has been discussed before, but I suddenly realized
while responding to the above message that the reason for the awful
performance is pretty obvious: hashbuild starts with a minimum-size
index (two buckets) and repeatedly splits buckets as insertions are
done, exactly the same as ordinary dynamic growth of the index would do.
This means that for an N-row table, approximately N/entries-per-bucket
splits need to occur during index build, which results in roughly O(N^2)
performance because we have to reprocess already-inserted entries over
and over.  This explains the empiric observation I made a long time ago:
http://archives.postgresql.org/pgsql-hackers/2002-04/msg01379.php

This could be fixed with a relatively small amount of new code: when
beginning hashbuild, estimate the parent table's rowcount (the same
method used by the planner will do fine, viz RelationGetNumberOfBlocks
times an estimated tuple density) and construct the appropriate number
of buckets immediately.  No splits, just write out empty pages as fast
as we can.  *Then* do the insertions.

Comments?
        regards, tom lane

pgsql-hackers by date:

From: Tom Lane
Date: 20 March 2007, 16:26:51
Subject: Re: Money type todos?

From: "D'Arcy J.M. Cain"
Date: 20 March 2007, 17:05:47
Subject: Re: Money type todos?

Fixing hash index build time - Mailing list pgsql-hackers

Previous

Next