Thread: LARGE table won't use index?

LARGE table won't use index?

From
bemowski@yahoo.com (Paul Bemowski)
Date:
I'm having a problem getting PostgreSQL 7.2.1 to use an index when it
is painfully obvious that the index will be faster.  Here is the
situation:

2 tables:

1) keyword
keyword_id integer primary key
word varchar(50)

- the word is indexed (keyword_word_index)
- the table contains ~100k rows

2) article_keyword (this is a mapping table for a many-many)
article_id integer
keyword_id integer
relevenace integer
primary_key(article_id, keyword_id)

- the keyword_id col is indexed (ak_kwid_index)
- the table contains ~2 million rows.

OK. Consider the follwing psql/explain output:
-------------------------------------------------------------------------
search.dev=# select * from keyword where word='spam';
 keyword_id | word
------------+------
       2616 | spam
(1 row)

search.dev=# explain select * from article_keyword where
keyword_id=2616;
NOTICE:  QUERY PLAN:

Index Scan using ak_kwid_index on article_keyword  (cost=0.00..2990.85
rows=810 width=12)

EXPLAIN
search.dev=# explain select * from article_keyword where keyword_id in
search.dev-# (select keyword_id from keyword where word='spam');
NOTICE:  QUERY PLAN:

Seq Scan on article_keyword  (cost=0.00..3378740757.90 rows=1820389
width=12)
  SubPlan
    ->  Materialize  (cost=1856.04..1856.04 rows=1345 width=4)
          ->  Index Scan using keyword_word_index on keyword
(cost=0.00..1856.04 rows=1345 width=4)

EXPLAIN
---------------------------------------------------------------------------

Note that when I query using the explicit keyword_id, the query plan
uses the index at a cost of about 3000.  When I use a subselect (which
returns the EXACT same keyword_id and I explicitly used) the plan uses
a sequence scan of 2 million rows at a cost of about 3.3 billion!!

I really need for postgres to use the indexes on the mapping table,
what can I do.  Any help from the PostgreSQL gods is appreciated.

Paul Bemowski
bemowski@yahoo.com

Re: LARGE table won't use index?

From
Edwin Grubbs
Date:
You can't get it to use indexes on the subselect since postgres doesn't
know if the subselect corresponds directly to a tables index since it
often doesn't. There are two ways to fix the speed problem.

1. Make the subselect smaller:

SELECT *
FROM article_keyword t1
WHERE keyword_id IN
    (SELECT keyword_id
     FROM keyword sub1
     WHERE word = 'spam'
        AND t1.keyword_id = sub1.keyword_id
    );

2. Join the tables:

SELECT t1.*
FROM article_keyword t1 JOIN keyword t2 USING (keyword_id)
WHERE t2.word = 'spam';


On 15 Jun 2001, Paul Bemowski wrote:
> search.dev=# explain select * from article_keyword where keyword_id in
> search.dev-# (select keyword_id from keyword where word='spam');
> NOTICE:  QUERY PLAN:
>
> Seq Scan on article_keyword  (cost=0.00..3378740757.90 rows=1820389
> width=12)
>   SubPlan
>     ->  Materialize  (cost=1856.04..1856.04 rows=1345 width=4)
>           ->  Index Scan using keyword_word_index on keyword
> (cost=0.00..1856.04 rows=1345 width=4)
>
> EXPLAIN
> ---------------------------------------------------------------------------