sequential scan unduly favored over text search gin index - Mailing list pgsql-performance

From Sushant Sinha
Subject sequential scan unduly favored over text search gin index
Date
Msg-id 1308584339.2488.12.camel@dragflick
Whole thread Raw
Responses Re: sequential scan unduly favored over text search gin index  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Re: sequential scan unduly favored over text search gin index  (Jesper Krogh <jesper@krogh.cc>)
Re: sequential scan unduly favored over text search gin index  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
I have a tsvector column docvector and a gin index on it
docmeta1_docvector_idx

I have a simple query "select * from docmeta1 where docvector @@
plainto_tsquery('english', 'free');"

I find that the planner chooses a sequential scan of the table even when
the index performs orders of magnitude. I set random_page_cost = 1.0 for
the database to favor index use. However, I still see that the cost
estimate for sequential scan of the entire table (23000) is cheaper than
the cost of using the index (33000). The time taken for sequential
access is 5200 ms and for index usage is only 85 ms.

Details here:

postgres version 9.0.2
statistics on docvector is set to 10000 and as you can see the row
estimates are fine.

lawdb=# \d docmeta1
       Table "public.docmeta1"
   Column    |   Type    | Modifiers
-------------+-----------+-----------
 tid         | integer   | not null
 docweight   | integer   |
 doctype     | integer   |
 publishdate | date      |
 covertids   | integer[] |
 titlevector | tsvector  |
 docvector   | tsvector  |
Indexes:
    "docmeta1_pkey" PRIMARY KEY, btree (tid)
    "docmeta1_date_idx" btree (publishdate)
    "docmeta1_docvector_idx" gin (docvector)
    "docmeta1_title_idx" gin (titlevector)

lawdb=# SELECT relpages, reltuples FROM pg_class WHERE relname
='docmeta1';
relpages | reltuples
----------+-----------
    18951 |    329940


lawdb=# explain analyze select * from docmeta1 where docvector @@
plainto_tsquery('english', 'free');
                                                    QUERY
PLAN

--------------------------------------------------------------------------------
-----------------------------------
 Seq Scan on docmeta1  (cost=0.00..23075.25 rows=35966 width=427)
(actual time=0
.145..5189.556 rows=35966 loops=1)
   Filter: (docvector @@ '''free'''::tsquery)
 Total runtime: 5196.231 ms
(3 rows)

lawdb=# set enable_seqscan = off;
SET
lawdb=# explain analyze select * from docmeta1 where docvector @@
plainto_tsquery('english', 'free');
                                                                QUERY
PLAN

--------------------------------------------------------------------------------
-----------------------------------------------------------
 Bitmap Heap Scan on docmeta1  (cost=14096.25..33000.83 rows=35966
width=427) (a
ctual time=9.543..82.754 rows=35966 loops=1)
   Recheck Cond: (docvector @@ '''free'''::tsquery)
   ->  Bitmap Index Scan on docmeta1_docvector_idx  (cost=0.00..14087.26
rows=35
966 width=0) (actual time=8.059..8.059 rows=35967 loops=1)
         Index Cond: (docvector @@ '''free'''::tsquery)
 Total runtime: 85.304 ms
(5 rows)


-Sushant.


pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Inoptimal query plan for max() and multicolumn index
Next
From: "Kevin Grittner"
Date:
Subject: Re: sequential scan unduly favored over text search gin index