Thread: CREATE INDEX rather sluggish

CREATE INDEX rather sluggish

From
Gavin Hamill
Date:
Hullo, I have pg 8.1.3 on an 8-CPU AIX 5.3 box with 16GB of RAM, and I'm
finding that it's taking an age to CREATE INDEX on a large table:

    Column     |          Type          |                              Modifiers
----------------+------------------------+---------------------------------------------------------------------
 ID             | integer                | not null default nextval(('public.keyword_id_seq'::text)::regclass)
 Text           | character varying(200) |
 Longitude      | numeric(16,5)          |
 Latitude       | numeric(16,5)          |
 AreaID         | integer                |
 SearchCount    | integer                | not null default 0
 Radius         | integer                |
 LanguageID     | integer                |
 KeywordType    | character varying(20)  |
 LowerText      | character varying(200) |
 NumberOfHotels | integer                |
 CountryID      | integer                |
 FriendlyText   | character varying(200) |
Indexes:


2006-03-29 21:39:38 BST LOG:  duration: 41411.625 ms  statement: CREATE INDEX ix_keyword_areaid ON "Keyword" USING
btree("AreaID"); 
2006-03-29 21:42:46 BST LOG:  duration: 188550.644 ms  statement: CREATE INDEX ix_keyword_lonlat ON "Keyword" USING
btree("Longitude", "Latitude"); 
2006-03-29 21:46:41 BST LOG:  duration: 234864.571 ms  statement: CREATE INDEX ix_keyword_lowertext ON "Keyword" USING
btree("LowerText"); 
2006-03-29 21:52:32 BST LOG:  duration: 350757.565 ms  statement: CREATE INDEX ix_keyword_type ON "Keyword" USING btree
("KeywordType");

The table has just under six million rows - should it really be taking
nearly six minutes to add an index? These log snippets were taking
during a pg_restore on a newly created db, so there should be no issues
with the table needing vacuuming.

What parameters in the postgresql.conf are pertinent here? I have

shared_buffers 120000
work_mem 16384
maintenance_work_mem = 262144

for starters... any advice would be warmly welcomed!

Cheers,
Gavin.


Re: CREATE INDEX rather sluggish

From
Tom Lane
Date:
Gavin Hamill <gdh@laterooms.com> writes:
> The table has just under six million rows - should it really be taking
> nearly six minutes to add an index?

Try running it with trace_sort enabled to get more info about where the
time is going.

We've been doing some considerable work on the sorting code in the last
couple months, so 8.2 should be better, but I'd like to verify that
you're not seeing something we don't know about.

> maintenance_work_mem = 262144

Fooling with this might affect the results some.

            regards, tom lane

Re: CREATE INDEX rather sluggish

From
Simon Riggs
Date:
On Thu, 2006-03-30 at 09:26 +0100, Gavin Hamill wrote:
> Hullo, I have pg 8.1.3 on an 8-CPU AIX 5.3 box with 16GB of RAM, and I'm
> finding that it's taking an age to CREATE INDEX on a large table:
>
>     Column     |          Type          |                              Modifiers
> ----------------+------------------------+---------------------------------------------------------------------
>  ID             | integer                | not null default nextval(('public.keyword_id_seq'::text)::regclass)
>  Text           | character varying(200) |
>  Longitude      | numeric(16,5)          |
>  Latitude       | numeric(16,5)          |
>  AreaID         | integer                |
>  SearchCount    | integer                | not null default 0
>  Radius         | integer                |
>  LanguageID     | integer                |
>  KeywordType    | character varying(20)  |
>  LowerText      | character varying(200) |
>  NumberOfHotels | integer                |
>  CountryID      | integer                |
>  FriendlyText   | character varying(200) |
> Indexes:
>
>
> 2006-03-29 21:39:38 BST LOG:  duration: 41411.625 ms  statement: CREATE INDEX ix_keyword_areaid ON "Keyword" USING
btree("AreaID"); 
> 2006-03-29 21:42:46 BST LOG:  duration: 188550.644 ms  statement: CREATE INDEX ix_keyword_lonlat ON "Keyword" USING
btree("Longitude", "Latitude"); 
> 2006-03-29 21:46:41 BST LOG:  duration: 234864.571 ms  statement: CREATE INDEX ix_keyword_lowertext ON "Keyword"
USINGbtree ("LowerText"); 
> 2006-03-29 21:52:32 BST LOG:  duration: 350757.565 ms  statement: CREATE INDEX ix_keyword_type ON "Keyword" USING
btree("KeywordType"); 
>
> The table has just under six million rows - should it really be taking
> nearly six minutes to add an index? These log snippets were taking
> during a pg_restore on a newly created db, so there should be no issues
> with the table needing vacuuming.

The index build time varies according to the number and type of the
datatypes, as well as the distribution of values in the table. As well
as the number of rows in the table.

Note the x10 factor to index AreaID (integer) v KeywordType (vchar(20))

> What parameters in the postgresql.conf are pertinent here? I have
>
> shared_buffers 120000
> work_mem 16384
> maintenance_work_mem = 262144

Try trace_sort = on and then rerun the index builds to see what's
happening there. We've speeded sort up by about 2.5 times in the current
development version, but it does just run in single threaded mode so
your 8 CPUs aren't helping there.

Looks like you might be just over the maintenance_work_mem limit for the
last index builds. You can try doubling maintenance_work_mem.

The extended runtime for KeywordType is interesting in comparison to
LowerText, which on the face of it is a longer column. My guess would be
that LowerText is fairly unique and sorts quickly, whereas KeywordType
is fairly non-unique with a high average row length that require
complete string comparison before deciding it is actually the same
value. You might want to try using codes rather than textual
KeywordTypes.

You might try using partial indexes also, along the lines of

CREATE INDEX ix_keyword_type ON "Keyword" USING btree ("KeywordType") WHERE KeywordType IS NOT NULL;

Best Regards, Simon Riggs


Re: CREATE INDEX rather sluggish

From
Gavin Hamill
Date:
Tom Lane wrote:

>Gavin Hamill <gdh@laterooms.com> writes:
>
>
>>The table has just under six million rows - should it really be taking
>>nearly six minutes to add an index?
>>
>>
>
>Try running it with trace_sort enabled to get more info about where the
>time is going.
>
>We've been doing some considerable work on the sorting code in the last
>couple months, so 8.2 should be better, but I'd like to verify that
>you're not seeing something we don't know about.
>
>
>
OKies, I dropped the db, created again so it's all clean, ran pg_restore
again with trace_sort on - here's the output from one of the larger
CREATE INDEXes:

2006-03-30 16:48:53 BST LOG:  begin index sort: unique = f, workMem =
262144, randomAccess = f
2006-03-30 16:49:04 BST LOG:  switching to external sort: CPU
0.88s/9.99u sec elapsed 10.90 sec

2006-03-30 16:49:44 BST LOG:  autovacuum: processing database "postgres"
2006-03-30 16:50:38 BST LOG:  performsort starting: CPU 1.69s/102.73u
sec elapsed 104.58 sec
2006-03-30 16:50:44 BST LOG:  autovacuum: processing database "laterooms"
2006-03-30 16:51:44 BST LOG:  autovacuum: processing database "postgres"
2006-03-30 16:52:23 BST LOG:  finished writing run 1: CPU 2.40s/206.53u
sec elapsed 209.30 sec
2006-03-30 16:52:39 BST LOG:  finished writing final run 2: CPU
2.51s/222.98u sec elapsed 225.89 sec
2006-03-30 16:52:40 BST LOG:  performsort done (except final merge): CPU
2.59s/223.99u sec elapsed 226.98 sec
2006-03-30 16:52:44 BST LOG:  autovacuum: processing database "laterooms"
2006-03-30 16:52:53 BST LOG:  external sort ended, 21292 disk blocks
used: CPU 3.65s/233.10u sec elapsed 239.35 sec
2006-03-30 16:52:53 BST LOG:  duration: 239381.535 ms  statement: CREATE
INDEX ix_keyword_lowertext ON "Keyword" USING btree ("LowerText");


During all this, there's been about 900KB/sec of disk activity. The
disks are RAID1 and will happily sustain 50MB/sec with minimal system
overhead.

I'm guessing then that an external sort means disk-based...

>>maintenance_work_mem = 262144
>>
>>
>
>Fooling with this might affect the results some.
>
>

OK will tinker with that - it's not a massive problem since I hope I
never have to do a pg_restore once the live server is running fulltime :)

Right - I bumped maintenance_work_mem up to 1GB, tried dropping the
index and recreating, and sure enough it's an internal sort now,
chopping 10% off the time taken:

2006-03-30 21:15:57 BST LOG:  begin index sort: unique = f, workMem =
1048576, randomAccess = f
2006-03-30 21:16:03 BST LOG:  autovacuum: processing database "laterooms"
2006-03-30 21:16:12 BST LOG:  performsort starting: CPU 1.20s/13.85u sec
elapsed 15.07 sec
2006-03-30 21:17:03 BST LOG:  autovacuum: processing database "laterooms"
2006-03-30 21:18:03 BST LOG:  autovacuum: processing database "laterooms"
2006-03-30 21:19:03 BST LOG:  autovacuum: processing database "laterooms"
2006-03-30 21:19:28 BST LOG:  performsort done: CPU 1.20s/210.34u sec
elapsed 211.69 sec
2006-03-30 21:19:36 BST LOG:  internal sort ended, 336538 KB used: CPU
2.06s/212.61u sec elapsed 218.80 sec
2006-03-30 21:19:36 BST LOG:  duration: 218847.055 ms  statement: CREATE
INDEX ix_keyword_lowertext on "Keyword" USING btree ("LowerText");

If that's reasonable performance from 8.1, then that's fine - I just
didn't want to be inadvertantly running way under average :)

Cheers,
Gavin.


Re: CREATE INDEX rather sluggish

From
Gavin Hamill
Date:
On Thu, 30 Mar 2006 18:08:44 +0100
Simon Riggs <simon@2ndquadrant.com> wrote:

Hello again Simon :)

> The index build time varies according to the number and type of the
> datatypes, as well as the distribution of values in the table. As well
> as the number of rows in the table.
>
> Note the x10 factor to index AreaID (integer) v KeywordType (vchar(20))

Fair enough. :) Is there much of a performance increase by using fixed-length character fields instead of varchars?

> Try trace_sort = on and then rerun the index builds to see what's
> happening there. We've speeded sort up by about 2.5 times in the current
> development version, but it does just run in single threaded mode so
> your 8 CPUs aren't helping there.

Yum - I look forward to the 8.2 release =)

> Looks like you might be just over the maintenance_work_mem limit for the
> last index builds. You can try doubling maintenance_work_mem.

You were right - needed ~370MB ... I'm happy to alloc 1GB to allow for db growth..

> The extended runtime for KeywordType is interesting in comparison to
> LowerText, which on the face of it is a longer column. My guess would be
> that LowerText is fairly unique and sorts quickly, whereas KeywordType
> is fairly non-unique with a high average row length that require
> complete string comparison before deciding it is actually the same
> value.

From looking at a few samples of the millions of rows it seems that it's actually KeywordType that's more unique -
LowerTextis simply an lowercase representation of the name of this search-keyword, so it's much less unique. Fun stuff
:)

> You might want to try using codes rather than textual KeywordTypes.

That makes sense - I can't get a grip on the data in KeywordType at the moment .. many are more obvious like 'RGN'
'AREA''MKT' 'LK' for Region, Area, Market and Lake, but many other rows have '1'. 

> You might try using partial indexes also, along the lines of
>
> CREATE INDEX ix_keyword_type ON "Keyword" USING btree ("KeywordType") WHERE KeywordType IS NOT NULL;

Well, each row does have a KeywordType, so no row has a NULL entry...

> Best Regards, Simon Riggs

Cheers :)
Gavin.