Does not use index - Mailing list pgsql-novice

From Martin Hampl
Subject Does not use index
Date
Msg-id E42C0410-9944-11D8-8E8E-000393674318@stud.uni-erlangen.de
Whole thread Raw
List pgsql-novice
Hi,

I'm trying to optimize a query and are stuck and don't understand why
PostgreSQL does not do what I want. It would be appreciated very much
if someone could have a look at it. Thanks in advance.

I am using PostgreSQL 7.4.1

I have the following tables:

              Table "public.token"
   Column  |          Type          | Modifiers
----------+------------------------+-----------
  text_id  | integer                | not null
  position | integer                | not null
  word     | character varying(255) |
Indexes:
     "token_pkey" primary key, btree (text_id, "position")
     "word_idx" btree (word)

and

        Table "public.s"
  Column  |  Type   | Modifiers
---------+---------+-----------
  text_id | integer | not null
  s       | integer | not null
  start   | integer |
  stop    | integer |
Indexes:
     "s_pkey" primary key, btree (text_id, s)
     "s_begin_idx" btree (text_id, "start")
     "s_end_idx" btree (text_id, stop)
Foreign-key constraints:
     "$1" FOREIGN KEY (text_id, "start") REFERENCES token(text_id,
"position")
     "$2" FOREIGN KEY (text_id, stop) REFERENCES token(text_id,
"position")

(I hope it's readable)

and the following query:

select * from
     token,
     s
where
     token.word = 'FACTSHEET' and
     s.text_id = token.text_id and
     s.start = token.position


PostgreSQL generates the following query plan

                                             QUERY PLAN
------------------------------------------------------------------------
--------------------------
  Nested Loop  (cost=0.00..39120.95 rows=1 width=32)
    ->  Index Scan using s_pkey on s  (cost=0.00..24698.44 rows=3367
width=16)
    ->  Index Scan using token_pkey on token  (cost=0.00..4.27 rows=1
width=16)
          Index Cond: (("outer".text_id = token.text_id) AND
("outer"."start" = token."position"))
          Filter: ((word)::text = 'FACTSHEET'::text)

and the execution takes ages.

Now, 'FACTSHEET' is *very* seldom (5 of 23 million) and I intended it
to first search for 'FACTSHEET' and then use index s_begin_idx (and the
values of token.text_id and token.position) to derive the corresponding
records in s.

I tried to force PostgreSQL in doing it in this order by trying the
following:

select * from (select text_id, position from token where word =
'FACTSHEET') t left join s on (s.text_id = t.text_id and  s.start =
t.position )

                                           QUERY PLAN
------------------------------------------------------------------------
-----------------------
  Hash Left Join  (cost=24715.28..56630.78 rows=7892 width=24)
    Hash Cond: (("outer".text_id = "inner".text_id) AND
("outer"."position" = "inner"."start"))
    ->  Index Scan using word_idx on token  (cost=0.00..31402.51
rows=7892 width=8)
          Index Cond: ((word)::text = 'FACTSHEET'::text)
    ->  Hash  (cost=24698.44..24698.44 rows=3367 width=16)
          ->  Index Scan using s_pkey on s  (cost=0.00..24698.44
rows=3367 width=16)


Now it *does* first search for 'FACTSHEET' but it still does not use
s_begin_idx and I have no idea why. Any ideas, what I could do?

Thanks,
Martin.


pgsql-novice by date:

Previous
From: "Chari Clark"
Date:
Subject: Restoring a SQL Server 2000 database to PostGreSQL...
Next
From: Gastón Simone
Date:
Subject: Hey Mr. PG! End my transaction, BUT KEEP MY CURSORS OPEN!!