
Index creation time and distribution - Mailing list pgsql-performance

From	Guillaume Smet
Subject	Index creation time and distribution
Date	May 22, 2008 09:32:51
Msg-id	1d4e0c10805220532v514c2bd5je4c6c20ceb485495@mail.gmail.com
Responses	Re: Index creation time and distribution Re: Index creation time and distribution
List	pgsql-performance


Hi -performance,

This morning I ran into a performance problem while importing a dump
into an 8.1 database.

The table has 5 million rows. When the dump creates an index on a
specific text column called clazz, the build takes 27 minutes. Indexes
on the other columns take only 10 to 20 seconds each:
LOG:  duration: 1636301.317 ms  statement: CREATE INDEX index_journal_clazz ON journal USING btree (clazz);
LOG:  duration: 20613.009 ms  statement: CREATE INDEX index_journal_date ON journal USING btree (date);
LOG:  duration: 10653.290 ms  statement: CREATE INDEX index_journal_modifieur ON journal USING btree (modifieur);
LOG:  duration: 15031.579 ms  statement: CREATE INDEX index_journal_objectid ON journal USING btree (objectid);
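To put these durations on a common scale, here is a small sketch that parses the LOG lines quoted above and reports each build in minutes and relative to the date index. The helper name index_durations and the regular expression are illustrative assumptions, not part of any PostgreSQL tool. It also assumes each log entry sits on a single line, as it does in the server log.

```python
import re

# The four CREATE INDEX log entries quoted above, one entry per line.
LOG = """\
LOG:  duration: 1636301.317 ms  statement: CREATE INDEX index_journal_clazz ON journal USING btree (clazz);
LOG:  duration: 20613.009 ms  statement: CREATE INDEX index_journal_date ON journal USING btree (date);
LOG:  duration: 10653.290 ms  statement: CREATE INDEX index_journal_modifieur ON journal USING btree (modifieur);
LOG:  duration: 15031.579 ms  statement: CREATE INDEX index_journal_objectid ON journal USING btree (objectid);
"""

# Capture the duration in ms and the index name (hypothetical parsing rule).
PATTERN = re.compile(r"duration: ([\d.]+) ms\s+statement: CREATE INDEX\s+(\S+)")


def index_durations(log_text):
    """Return {index_name: duration_in_seconds} for each CREATE INDEX entry."""
    return {name: float(ms) / 1000.0 for ms, name in PATTERN.findall(log_text)}


durations = index_durations(LOG)
baseline = durations["index_journal_date"]
for name, secs in durations.items():
    # Minutes, and how many times slower than the date index build.
    print(f"{name:26s} {secs / 60:6.1f} min  {secs / baseline:5.1f}x date index")
```

For the clazz index this works out to about 27.3 minutes, roughly 79 times the date index build.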

The only unusual thing about this column is that 4.7 million of the
rows have exactly the same value. A partial index that excludes this
value is very fast to create. However, the database is accessed via
JDBC and prepared statements, so the index is useless to us: the plan
is created before the BIND, and the planner therefore cannot use the
partial index. FWIW, we can't use ?protocolVersion=2 with this
application, so that is not an option.

As part of this application's deployment process, we often need to
drop, create and restore the database, and 25 minutes is far longer
than we can afford.

So my questions are:
- Is index creation time really that strongly correlated with the
value distribution? I was quite surprised by this behaviour. The time
is essentially CPU time.
- If not, what can I check to diagnose this problem?
- IIRC, 8.3 would allow me to use the partial index, since the query
should be planned after the BIND (the statements are unnamed). Am I
right?

Thanks for any input.

--
Guillaume
