Thread: CREATE INDEX rather sluggish
Hullo, I have pg 8.1.3 on an 8-CPU AIX 5.3 box with 16GB of RAM, and I'm finding that it's taking an age to CREATE INDEX on a large table: Column | Type | Modifiers ----------------+------------------------+--------------------------------------------------------------------- ID | integer | not null default nextval(('public.keyword_id_seq'::text)::regclass) Text | character varying(200) | Longitude | numeric(16,5) | Latitude | numeric(16,5) | AreaID | integer | SearchCount | integer | not null default 0 Radius | integer | LanguageID | integer | KeywordType | character varying(20) | LowerText | character varying(200) | NumberOfHotels | integer | CountryID | integer | FriendlyText | character varying(200) | Indexes: 2006-03-29 21:39:38 BST LOG: duration: 41411.625 ms statement: CREATE INDEX ix_keyword_areaid ON "Keyword" USING btree("AreaID"); 2006-03-29 21:42:46 BST LOG: duration: 188550.644 ms statement: CREATE INDEX ix_keyword_lonlat ON "Keyword" USING btree("Longitude", "Latitude"); 2006-03-29 21:46:41 BST LOG: duration: 234864.571 ms statement: CREATE INDEX ix_keyword_lowertext ON "Keyword" USING btree("LowerText"); 2006-03-29 21:52:32 BST LOG: duration: 350757.565 ms statement: CREATE INDEX ix_keyword_type ON "Keyword" USING btree ("KeywordType"); The table has just under six million rows - should it really be taking nearly six minutes to add an index? These log snippets were taking during a pg_restore on a newly created db, so there should be no issues with the table needing vacuuming. What parameters in the postgresql.conf are pertinent here? I have shared_buffers 120000 work_mem 16384 maintenance_work_mem = 262144 for starters... any advice would be warmly welcomed! Cheers, Gavin.
Gavin Hamill <gdh@laterooms.com> writes: > The table has just under six million rows - should it really be taking > nearly six minutes to add an index? Try running it with trace_sort enabled to get more info about where the time is going. We've been doing some considerable work on the sorting code in the last couple months, so 8.2 should be better, but I'd like to verify that you're not seeing something we don't know about. > maintenance_work_mem = 262144 Fooling with this might affect the results some. regards, tom lane
On Thu, 2006-03-30 at 09:26 +0100, Gavin Hamill wrote: > Hullo, I have pg 8.1.3 on an 8-CPU AIX 5.3 box with 16GB of RAM, and I'm > finding that it's taking an age to CREATE INDEX on a large table: > > Column | Type | Modifiers > ----------------+------------------------+--------------------------------------------------------------------- > ID | integer | not null default nextval(('public.keyword_id_seq'::text)::regclass) > Text | character varying(200) | > Longitude | numeric(16,5) | > Latitude | numeric(16,5) | > AreaID | integer | > SearchCount | integer | not null default 0 > Radius | integer | > LanguageID | integer | > KeywordType | character varying(20) | > LowerText | character varying(200) | > NumberOfHotels | integer | > CountryID | integer | > FriendlyText | character varying(200) | > Indexes: > > > 2006-03-29 21:39:38 BST LOG: duration: 41411.625 ms statement: CREATE INDEX ix_keyword_areaid ON "Keyword" USING btree("AreaID"); > 2006-03-29 21:42:46 BST LOG: duration: 188550.644 ms statement: CREATE INDEX ix_keyword_lonlat ON "Keyword" USING btree("Longitude", "Latitude"); > 2006-03-29 21:46:41 BST LOG: duration: 234864.571 ms statement: CREATE INDEX ix_keyword_lowertext ON "Keyword" USINGbtree ("LowerText"); > 2006-03-29 21:52:32 BST LOG: duration: 350757.565 ms statement: CREATE INDEX ix_keyword_type ON "Keyword" USING btree("KeywordType"); > > The table has just under six million rows - should it really be taking > nearly six minutes to add an index? These log snippets were taking > during a pg_restore on a newly created db, so there should be no issues > with the table needing vacuuming. The index build time varies according to the number and type of the datatypes, as well as the distribution of values in the table. As well as the number of rows in the table. Note the x10 factor to index AreaID (integer) v KeywordType (vchar(20)) > What parameters in the postgresql.conf are pertinent here? I have > > shared_buffers 120000 > work_mem 16384 > maintenance_work_mem = 262144 Try trace_sort = on and then rerun the index builds to see what's happening there. We've speeded sort up by about 2.5 times in the current development version, but it does just run in single threaded mode so your 8 CPUs aren't helping there. Looks like you might be just over the maintenance_work_mem limit for the last index builds. You can try doubling maintenance_work_mem. The extended runtime for KeywordType is interesting in comparison to LowerText, which on the face of it is a longer column. My guess would be that LowerText is fairly unique and sorts quickly, whereas KeywordType is fairly non-unique with a high average row length that require complete string comparison before deciding it is actually the same value. You might want to try using codes rather than textual KeywordTypes. You might try using partial indexes also, along the lines of CREATE INDEX ix_keyword_type ON "Keyword" USING btree ("KeywordType") WHERE KeywordType IS NOT NULL; Best Regards, Simon Riggs
Tom Lane wrote: >Gavin Hamill <gdh@laterooms.com> writes: > > >>The table has just under six million rows - should it really be taking >>nearly six minutes to add an index? >> >> > >Try running it with trace_sort enabled to get more info about where the >time is going. > >We've been doing some considerable work on the sorting code in the last >couple months, so 8.2 should be better, but I'd like to verify that >you're not seeing something we don't know about. > > > OKies, I dropped the db, created again so it's all clean, ran pg_restore again with trace_sort on - here's the output from one of the larger CREATE INDEXes: 2006-03-30 16:48:53 BST LOG: begin index sort: unique = f, workMem = 262144, randomAccess = f 2006-03-30 16:49:04 BST LOG: switching to external sort: CPU 0.88s/9.99u sec elapsed 10.90 sec 2006-03-30 16:49:44 BST LOG: autovacuum: processing database "postgres" 2006-03-30 16:50:38 BST LOG: performsort starting: CPU 1.69s/102.73u sec elapsed 104.58 sec 2006-03-30 16:50:44 BST LOG: autovacuum: processing database "laterooms" 2006-03-30 16:51:44 BST LOG: autovacuum: processing database "postgres" 2006-03-30 16:52:23 BST LOG: finished writing run 1: CPU 2.40s/206.53u sec elapsed 209.30 sec 2006-03-30 16:52:39 BST LOG: finished writing final run 2: CPU 2.51s/222.98u sec elapsed 225.89 sec 2006-03-30 16:52:40 BST LOG: performsort done (except final merge): CPU 2.59s/223.99u sec elapsed 226.98 sec 2006-03-30 16:52:44 BST LOG: autovacuum: processing database "laterooms" 2006-03-30 16:52:53 BST LOG: external sort ended, 21292 disk blocks used: CPU 3.65s/233.10u sec elapsed 239.35 sec 2006-03-30 16:52:53 BST LOG: duration: 239381.535 ms statement: CREATE INDEX ix_keyword_lowertext ON "Keyword" USING btree ("LowerText"); During all this, there's been about 900KB/sec of disk activity. The disks are RAID1 and will happily sustain 50MB/sec with minimal system overhead. I'm guessing then that an external sort means disk-based... >>maintenance_work_mem = 262144 >> >> > >Fooling with this might affect the results some. > > OK will tinker with that - it's not a massive problem since I hope I never have to do a pg_restore once the live server is running fulltime :) Right - I bumped maintenance_work_mem up to 1GB, tried dropping the index and recreating, and sure enough it's an internal sort now, chopping 10% off the time taken: 2006-03-30 21:15:57 BST LOG: begin index sort: unique = f, workMem = 1048576, randomAccess = f 2006-03-30 21:16:03 BST LOG: autovacuum: processing database "laterooms" 2006-03-30 21:16:12 BST LOG: performsort starting: CPU 1.20s/13.85u sec elapsed 15.07 sec 2006-03-30 21:17:03 BST LOG: autovacuum: processing database "laterooms" 2006-03-30 21:18:03 BST LOG: autovacuum: processing database "laterooms" 2006-03-30 21:19:03 BST LOG: autovacuum: processing database "laterooms" 2006-03-30 21:19:28 BST LOG: performsort done: CPU 1.20s/210.34u sec elapsed 211.69 sec 2006-03-30 21:19:36 BST LOG: internal sort ended, 336538 KB used: CPU 2.06s/212.61u sec elapsed 218.80 sec 2006-03-30 21:19:36 BST LOG: duration: 218847.055 ms statement: CREATE INDEX ix_keyword_lowertext on "Keyword" USING btree ("LowerText"); If that's reasonable performance from 8.1, then that's fine - I just didn't want to be inadvertantly running way under average :) Cheers, Gavin.
On Thu, 30 Mar 2006 18:08:44 +0100 Simon Riggs <simon@2ndquadrant.com> wrote: Hello again Simon :) > The index build time varies according to the number and type of the > datatypes, as well as the distribution of values in the table. As well > as the number of rows in the table. > > Note the x10 factor to index AreaID (integer) v KeywordType (vchar(20)) Fair enough. :) Is there much of a performance increase by using fixed-length character fields instead of varchars? > Try trace_sort = on and then rerun the index builds to see what's > happening there. We've speeded sort up by about 2.5 times in the current > development version, but it does just run in single threaded mode so > your 8 CPUs aren't helping there. Yum - I look forward to the 8.2 release =) > Looks like you might be just over the maintenance_work_mem limit for the > last index builds. You can try doubling maintenance_work_mem. You were right - needed ~370MB ... I'm happy to alloc 1GB to allow for db growth.. > The extended runtime for KeywordType is interesting in comparison to > LowerText, which on the face of it is a longer column. My guess would be > that LowerText is fairly unique and sorts quickly, whereas KeywordType > is fairly non-unique with a high average row length that require > complete string comparison before deciding it is actually the same > value. From looking at a few samples of the millions of rows it seems that it's actually KeywordType that's more unique - LowerTextis simply an lowercase representation of the name of this search-keyword, so it's much less unique. Fun stuff :) > You might want to try using codes rather than textual KeywordTypes. That makes sense - I can't get a grip on the data in KeywordType at the moment .. many are more obvious like 'RGN' 'AREA''MKT' 'LK' for Region, Area, Market and Lake, but many other rows have '1'. > You might try using partial indexes also, along the lines of > > CREATE INDEX ix_keyword_type ON "Keyword" USING btree ("KeywordType") WHERE KeywordType IS NOT NULL; Well, each row does have a KeywordType, so no row has a NULL entry... > Best Regards, Simon Riggs Cheers :) Gavin.