GIN index creation extremely slow ? - Mailing list pgsql-hackers

From Stefan Kaltenbrunner
Subject GIN index creation extremely slow ?
Date
Msg-id 44A014A6.9060405@kaltenbrunner.cc
Responses Re: GIN index creation extremely slow ?  (Oleg Bartunov <oleg@sai.msu.su>)
Re: GIN index creation extremely slow ?  (Christopher Kings-Lynne <chris.kings-lynne@calorieking.com>)
Re: GIN index creation extremely slow ?  (Teodor Sigaev <teodor@sigaev.ru>)
Re: GIN index creation extremely slow ?  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
On IRC somebody mentioned that it took >34h to create a GIN index (on a
tsvector) on a ~3 million row table (wikipedia dump) with a
reasonably specced box (AMD 3400+).
After getting hold of a dump of said table (around 4.1GB in size), I
managed to get the following timings:

test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector);
CREATE INDEX
Time: 416122.896 ms

So about 7 minutes, which sounds very reasonable.

test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector);
CREATE INDEX
Time: 52681605.101 ms

Ouch, that makes for a whopping 14.6 hours(!). During that time the box is
completely CPU-bottlenecked and doing virtually no I/O at all (varying
maintenance_work_mem does not seem to make any noticeable difference).
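As a quick cross-check of the two psql \timing values above (this snippet is an editorial addition, not part of the original report), the following Python converts the milliseconds to minutes and hours and computes how much slower the GIN build was than the GiST build on the same table. The constant names are hypothetical; the numbers are copied verbatim from the session output.

```python
# Index build times as printed by psql's \timing (milliseconds),
# copied from the CREATE INDEX runs above.
GIST_MS = 416122.896
GIN_MS = 52681605.101


def ms_to_minutes(ms: float) -> float:
    """Convert milliseconds to minutes."""
    return ms / 60_000


def ms_to_hours(ms: float) -> float:
    """Convert milliseconds to hours."""
    return ms / 3_600_000


gist_minutes = ms_to_minutes(GIST_MS)
gin_hours = ms_to_hours(GIN_MS)
slowdown = GIN_MS / GIST_MS  # how many times longer the GIN build took

print(f"GiST build: {gist_minutes:.1f} min")  # GiST build: 6.9 min
print(f"GIN build:  {gin_hours:.1f} h")       # GIN build:  14.6 h
print(f"GIN/GiST:   {slowdown:.0f}x")         # GIN/GiST:   127x
```

In other words, on this data set the GIN build took roughly 127 times as long as the GiST build.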

That box is a fast dual Opteron 2.6GHz with 8GB RAM, a 4-disk RAID10
for the WAL and 12 disks for the data, running a very recent -HEAD
checkout ...

It looks like we still don't have any docs for GIN in the tree, so I
don't know whether those timings are expected or not ...


Stefan

