Re: [SQL] selecting from indexes - Mailing list pgsql-sql

From Tom Lane
Subject Re: [SQL] selecting from indexes
Date
Msg-id 2440.943024902@sss.pgh.pa.us
Whole thread Raw
In response to selecting from indexes  ("Tim Joyce" <tim@hoop.co.uk>)
List pgsql-sql
"Tim Joyce" <tim@hoop.co.uk> writes:
> I am trying to improve search times on a moderately large table (approx 1
> GB).

> eg, my query is:

> SELECT id FROM books WHERE category_key = 1471;

> (this takes ages on a table not ordered by category_key even if I have an
> index on category_key)

This should *not* take a long time if you have an index on category_key.
What does EXPLAIN show as the query plan?  (I am wondering if maybe the
planner doesn't know the table is large, which it wouldn't if you've
never vacuumed it... in that case it might be picking a sequential scan
instead of using the index.)

Also, how many rows are actually selected by the above?


> If I created an index:

> CREATE INDEX books_category_id ON books(category_key,id);

> and then run the above query,  it has no need to go to the books table to
> retrieve the id,

Yes it does, because the index is only a hint.  The executor must still
fetch each tuple fingered by the index in order to find out whether the
tuples are valid (committed).  But that fetching should cost at most
one disk read per potentially-interesting tuple.

Adding id to the index as you show above would be counterproductive,
at least for this query.  It'd just inflate the size of the index
and thus require more I/O to scan the index.  A 2-column index is
only useful for queries where WHERE constrains both columns.

> Another option would be to do the clustering using a view, but:

A view doesn't provide any performance advantage, it's only a rule
for rewriting your query before it's executed.
        regards, tom lane


pgsql-sql by date:

Previous
From: "Tim Joyce"
Date:
Subject: selecting from indexes
Next
From: "Tim Joyce"
Date:
Subject: Re: [SQL] selecting from indexes