Thread: Slow index

Slow index

From
Matthew Wakeling
Date:
Hi all. I'm having an interesting time performance-wise with a set of indexes.
Any clues as to what is going on or tips to fix it would be appreciated.

My application runs lots of queries along the lines of:

SELECT * from table where field IN (.., .., ..);

There is always an index on the field in the table, but the table is not
necessarily clustered on the index.

There are six different indexes which the application hits quite hard, that I
have investigated. These are:

gene__key_primaryidentifier      (index size 20MB)  (table size 72MB)
gene__key_secondaryidentifier    (index size 20MB)  (table size 72MB)
ontologyterm__key_name_ontology  (index size 2.5MB) (table size 10MB)
protein__key_primaryacc          (index size 359MB) (table size 1.2GB)
publication__key_pubmed          (index size 12MB)  (table size 48MB)
synonym__key_synonym             (index size 3GB)   (table size 3.5GB)

These six indexes all perform very differently.

These are the results from a few thousand queries on each index, from our
application logs. Generally, the same value is not used more than once in all
the queries.

                                  (1)   (2)   (3)   (4)    (5)
gene__key_primaryidentifier      22    17    417   19     24.5
gene__key_secondaryidentifier    8.5   5.3   21    2.4    3.9
ontologyterm__key_name_ontology  6.5   6.5   9.4   1.4    1.4
protein__key_primaryacc          73    8.1   164   2.2    20
publication__key_pubmed          52    31    156   3.0    5.0
synonym__key_synonym             335   66    245   0.7    3.7

(1) - Average number of values in the IN list.
(2) - Average number of rows returned by the queries.
(3) - Average time taken to execute the query, in ms.
(4) - Average time per value in the IN lists.
(5) - Average time per row returned.

All the queries are answered with a bitmap index scan on the correct
index.

I have also plotted all the log entries on an XY graph, with number of
elements in the IN list against time taken, which is at
http://wakeling.homeip.net/~mnw21/slow_index1.png. It is clear that the
gene__key_primaryidentifier index runs a lot slower than some of the other
indexes.

The thing is, the table and the index are both small. The machine has 16GB of
RAM, and its disc subsystem is a RAID array of 16 15krpm drives with a BBU
caching RAID controller. The entire table and index should be in the cache. Why
it is taking 20 milliseconds per value is beyond me. Moreover, the synonym
index is MUCH larger, has bigger queries, and performs better.

If we concentrate on just this index, it seems that some queries are
answered very quickly indeed, while others are answered a lot slower. I
have plotted just this one index on an XY graph, with two colours for
values in the IN list and actual rows returned, which is at
http://wakeling.homeip.net/~mnw21/slow_index2.png. It is clear that there
is a gap in the graph between the slow queries and the fast queries.

Is there something else going on here which is slowing the system down? The
table is not bloated. There is quite heavy simultaneous write traffic, but
little other read traffic, and the 16 spindles and BBU cache should take care
of that quite happily. I don't think it's slow parsing the query, as it seems
to manage on other queries in a millisecond or less.

Any ideas welcome.

Also, the mailing list server doesn't seem to be able to cope with image
attachments.

Matthew

--
import oz.wizards.Magic;
   if (Magic.guessRight())...           -- Computer Science Lecturer

Re: Slow index

From
Tom Lane
Date:
Matthew Wakeling <matthew@flymine.org> writes:
> Hi all. I'm having an interesting time performance-wise with a set of indexes.
> Any clues as to what is going on or tips to fix it would be appreciated.

Are the indexed columns all the same datatype?  (And which type is it?)

It might be helpful to REINDEX the "slow" index.  It's possible that
what you're seeing is the result of a chance imbalance in the btree,
which reindexing would fix.

            regards, tom lane

Re: Slow index

From
Matthew Wakeling
Date:
On Thu, 25 Sep 2008, Tom Lane wrote:
> Matthew Wakeling <matthew@flymine.org> writes:
>> Hi all. I'm having an interesting time performance-wise with a set of indexes.
>> Any clues as to what is going on or tips to fix it would be appreciated.
>
> Are the indexed columns all the same datatype?  (And which type is it?)

Gene.key_primaryidentifier is a text column
Gene.key_secondaryidentifier is a text column followed by an integer
OntologyTerm.key_name_ontology is a text column followed by an integer
Protein.key_primaryacc is a text column
Publication.key_pubmed is a text column
Synonym.key_synonym is an integer, two texts, and an integer

In most cases, the first text will be enough to uniquely identify the
relevant row.

> It might be helpful to REINDEX the "slow" index.  It's possible that
> what you're seeing is the result of a chance imbalance in the btree,
> which reindexing would fix.

That's unlikely to be the problem. When the application starts, the
database has just been loaded from a dump, so the indexes are completely
fresh. The behaviour starts out bad, and does not get progressively worse.

I don't know - is there likely to be any locking getting in the way? Our
write traffic is fairly large chunks of binary COPY in. Could it be
locking the index while it adds the write traffic to it?

Matthew

--
Most books now say our sun is a star. But it still knows how to change
back into a sun in the daytime.