Re: Hash index build patch has *worse* performance at small table sizes - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Hash index build patch has *worse* performance at small table sizes
Date
Msg-id 200806212234.m5LMYl114305@momjian.us
Whole thread Raw
In response to Hash index build patch has *worse* performance at small table sizes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Hash index build patch has *worse* performance at small table sizes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Did we ever do anything about this?

---------------------------------------------------------------------------

Tom Lane wrote:
> I've been reviewing the hash index build patch submitted here:
> http://archives.postgresql.org/pgsql-patches/2007-10/msg00154.php
> 
> Although it definitely helps on large indexes, it's actually
> counterproductive on not-so-large ones.  The test case I'm using
> is random integers generated like this:
>     create table foo as select (random() * N)::int as f1
>       from generate_series(1,N);
>     select count(*) from foo; -- force hint bit updates
>     checkpoint;
> then timing
>     create index fooi on foo using hash(f1);
> 
> Using all-default configuration settings on some not-very-new hardware,
> at N = 1E6 I see
> 
> 8.3.1:                    30 sec
> With pre-expansion of index (CVS HEAD):    24 sec
> With sorting:                72 sec
> To build a btree index on same data:    34 sec
> 
> Now this isn't amazingly surprising, because the original argument for
> doing sorting was to improve locality of access to the index during
> the build, and that only matters if you've got an index significantly
> bigger than memory.  If the index fits in RAM then the sort is pure
> overhead.
> 
> The obvious response to this is to use the sorting approach only when
> the estimated index size exceeds some threshold.  One possible choice of
> threshold would be shared_buffers (or temp_buffers for a temp index)
> but I think that is probably too conservative, because in most scenarios
> the kernel's disk cache is available too.  Plus you can't tweak that
> setting without a postmaster restart.  I'm tempted to use
> effective_cache_size, which attempts to measure an appropriate number
> and can be set locally within the session doing the CREATE INDEX if
> necessary.  Or we could invent a new GUC parameter, but that is probably
> overkill.
> 
> Comments?
> 
>             regards, tom lane
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Not valid dump [8.2.9, 8.3.1]
Next
From: Tom Lane
Date:
Subject: Re: Hash index build patch has *worse* performance at small table sizes