Home > mailing lists

Re: Fixing hash index build time - Mailing list pgsql-hackers

From	Hannu Krosing
Subject	Re: Fixing hash index build time
Date	March 22, 2007 06:06:49
Msg-id	1174554204.3703.3.camel@localhost.localdomain Whole thread
In response to	Re: Fixing hash index build time (Bruce Momjian <bruce@momjian.us>)
Responses	Re: Fixing hash index build time Re: Fixing hash index build time
List	pgsql-hackers

Tree view

Ühel kenal päeval, K, 2007-03-21 kell 17:25, kirjutas Bruce Momjian:
> Added to TODO:
> 
>         o During index creation, pre-sort the tuples to improve build speed
> 
>          http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php

Maybe the TODO text should mention, that it is about HASH indexes ?

> ---------------------------------------------------------------------------
> 
> Tom Lane wrote:
> > I wrote:
> > > I'm not sure if this has been discussed before, but I suddenly realized
> > > while responding to the above message that the reason for the awful
> > > performance is pretty obvious: hashbuild starts with a minimum-size
> > > index (two buckets) and repeatedly splits buckets as insertions are
> > > done, exactly the same as ordinary dynamic growth of the index would do.
> > > This means that for an N-row table, approximately N/entries-per-bucket
> > > splits need to occur during index build, which results in roughly O(N^2)
> > > performance because we have to reprocess already-inserted entries over
> > > and over.
> > 
> > Well, unfortunately this theory seems to be all wet.  Given that the
> > bucket loading is reasonably even, the time to split a bucket is about
> > constant and so there's no O(N^2) effect.  (The multiplier hidden inside
> > O(N) is pretty awful, but it doesn't change with N.)
> > 
> > The real reason why performance falls off a cliff for building large
> > hash indexes seems to be much harder to fix: basically, once the size
> > of your index exceeds working memory, it's nap time.  Given that the
> > incoming data has randomly distributed hash values, each bucket is about
> > as likely to be touched next as any other; there is no locality of
> > access and so the "working set" is the same size as the index.  Once it
> > doesn't fit in RAM anymore you're into swap hell.
> > 
> > The only way I can see to fix that is to try to impose some locality of
> > access during the index build.  This is not impossible: for example,
> > given a choice for the number of buckets, we could sort all the index
> > tuples by hashed bucket number and then start inserting.  btree does a
> > preliminary sort, and its index build times are way more reasonable
> > than hash's currently are, so the cost of the sort isn't outrageous.
> > (I note this is mainly because we know how to do sorting with locality
> > of access...)  Before we start inserting we will know exactly how many
> > tuples there are, so we can pre-create the right number of buckets and
> > be sure that no on-the-fly splits will be needed for the rest of the
> > build.  If we guessed wrong about the number of buckets there will be
> > some places in the process where we concurrently insert into several
> > buckets not just one, or perhaps come back to a bucket that we touched
> > earlier, but that's still maintaining plenty of locality of access.
> > 
> > This is looking like more work than I want to do in the near future,
> > but I thought I'd put it into the archives for someone to tackle.
> > Bruce, would you add a TODO item linking to this:
> > 
> >     * Improve hash index build time by sorting
> > 
> >             regards, tom lane
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: In versions below 8.0, the planner will ignore your desire to
> >        choose an index scan if your joining column's datatypes do not
> >        match
> 
-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com

NOTICE: This communication contains privileged or other confidential
information. If you have received it in error, please advise the sender
by reply email and immediately delete the message and any attachments
without copying or disclosing the contents.

pgsql-hackers by date:

From: Grzegorz Jaskiewicz
Date: 22 March 2007, 04:15:55
Subject: Re: [PATCHES] Bitmapscan changes

From: "Dany DeBontridder"
Date: 22 March 2007, 06:19:18
Subject: Re: Patch for pg_dump

Re: Fixing hash index build time - Mailing list pgsql-hackers

Previous

Next