Home > mailing lists

Re: Fixing hash index build time - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: Fixing hash index build time
Date	March 22, 2007 11:22:21
Msg-id	200703221422.l2MEM7r21548@momjian.us Whole thread Raw
In response to	Re: Fixing hash index build time (Hannu Krosing <hannu@skype.net>)
List	pgsql-hackers

Tree view

Hannu Krosing wrote:
> ?hel kenal p?eval, K, 2007-03-21 kell 17:25, kirjutas Bruce Momjian:
> > Added to TODO:
> > 
> >         o During index creation, pre-sort the tuples to improve build speed
> > 
> >          http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php
> 
> Maybe the TODO text should mention, that it is about HASH indexes ?

It is in the HASH section of the TODO list.

---------------------------------------------------------------------------


> 
> > ---------------------------------------------------------------------------
> > 
> > Tom Lane wrote:
> > > I wrote:
> > > > I'm not sure if this has been discussed before, but I suddenly realized
> > > > while responding to the above message that the reason for the awful
> > > > performance is pretty obvious: hashbuild starts with a minimum-size
> > > > index (two buckets) and repeatedly splits buckets as insertions are
> > > > done, exactly the same as ordinary dynamic growth of the index would do.
> > > > This means that for an N-row table, approximately N/entries-per-bucket
> > > > splits need to occur during index build, which results in roughly O(N^2)
> > > > performance because we have to reprocess already-inserted entries over
> > > > and over.
> > > 
> > > Well, unfortunately this theory seems to be all wet.  Given that the
> > > bucket loading is reasonably even, the time to split a bucket is about
> > > constant and so there's no O(N^2) effect.  (The multiplier hidden inside
> > > O(N) is pretty awful, but it doesn't change with N.)
> > > 
> > > The real reason why performance falls off a cliff for building large
> > > hash indexes seems to be much harder to fix: basically, once the size
> > > of your index exceeds working memory, it's nap time.  Given that the
> > > incoming data has randomly distributed hash values, each bucket is about
> > > as likely to be touched next as any other; there is no locality of
> > > access and so the "working set" is the same size as the index.  Once it
> > > doesn't fit in RAM anymore you're into swap hell.
> > > 
> > > The only way I can see to fix that is to try to impose some locality of
> > > access during the index build.  This is not impossible: for example,
> > > given a choice for the number of buckets, we could sort all the index
> > > tuples by hashed bucket number and then start inserting.  btree does a
> > > preliminary sort, and its index build times are way more reasonable
> > > than hash's currently are, so the cost of the sort isn't outrageous.
> > > (I note this is mainly because we know how to do sorting with locality
> > > of access...)  Before we start inserting we will know exactly how many
> > > tuples there are, so we can pre-create the right number of buckets and
> > > be sure that no on-the-fly splits will be needed for the rest of the
> > > build.  If we guessed wrong about the number of buckets there will be
> > > some places in the process where we concurrently insert into several
> > > buckets not just one, or perhaps come back to a bucket that we touched
> > > earlier, but that's still maintaining plenty of locality of access.
> > > 
> > > This is looking like more work than I want to do in the near future,
> > > but I thought I'd put it into the archives for someone to tackle.
> > > Bruce, would you add a TODO item linking to this:
> > > 
> > >     * Improve hash index build time by sorting
> > > 
> > >             regards, tom lane
> > > 
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 9: In versions below 8.0, the planner will ignore your desire to
> > >        choose an index scan if your joining column's datatypes do not
> > >        match
> > 
> -- 
> ----------------
> Hannu Krosing
> Database Architect
> Skype Technologies O?
> Akadeemia tee 21 F, Tallinn, 12618, Estonia
> 
> Skype me:  callto:hkrosing
> Get Skype for free:  http://www.skype.com
> 
> NOTICE: This communication contains privileged or other confidential
> information. If you have received it in error, please advise the sender
> by reply email and immediately delete the message and any attachments
> without copying or disclosing the contents.
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
> 
>                 http://www.postgresql.org/about/donate

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

pgsql-hackers by date:

From: "Merlin Moncure"
Date: 22 March 2007, 11:11:00
Subject: Re: Function cache regeneration

From: Tom Lane
Date: 22 March 2007, 11:41:23
Subject: Re: Fixing hash index build time

Re: Fixing hash index build time - Mailing list pgsql-hackers

Previous

Next