How can I get the query planner to use a bitmap index scap instead of an index scan ? - Mailing list pgsql-performance

From Mohan Krishnan
Subject How can I get the query planner to use a bitmap index scap instead of an index scan ?
Date
Msg-id CAK6uQa-zUheQrLLqR5gjBypce=YWbkULYqyfTQY1y5smXm56FQ@mail.gmail.com
Whole thread Raw
Responses Re: How can I get the query planner to use a bitmap index scap instead of an index scan ?  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
Hello folks,

I have a table of about 700k rows in Postgres 9.3.3, which has the
following structure:

Columns:
 content_body  - text
 publish_date  - timestamp without time zone
 published     - boolean

Indexes:
    "articles_pkey" PRIMARY KEY, btree (id)
    "article_text_gin" gin (article_text)
    "articles_publish_date_id_index" btree (publish_date DESC NULLS
LAST, id DESC)

The query that I am making has a full text search query and a limit, as follows:

When I search for a string which is in my index with a limit and order
in the query it is fast:

explain analyze select * from "articles" where article_text @@
plainto_tsquery('pg_catalog.simple', 'in_index') order by id limit 10;
                                                                QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..1293.88 rows=10 width=1298) (actual
time=2.073..9.837 rows=10 loops=1)
   ->  Index Scan using articles_pkey on articles
(cost=0.42..462150.49 rows=3573 width=1298) (actual time=2.055..9.711
rows=10 loops=1)
         Filter: (article_text @@ '''in_index'''::tsquery)
         Rows Removed by Filter: 611
 Total runtime: 9.952 ms

However if the string is not in the index it takes much longer:

explain analyze select * from "articles" where article_text @@
plainto_tsquery('pg_catalog.simple', 'not_in_index') order by id limit
10;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..1293.88 rows=10 width=1298) (actual
time=5633.684..5633.684 rows=0 loops=1)
   ->  Index Scan using articles_pkey on articles
(cost=0.42..462150.49 rows=3573 width=1298) (actual
time=5633.672..5633.672 rows=0 loops=1)
         Filter: (article_text @@ '''not_in_index'''::tsquery)
         Rows Removed by Filter: 796146
 Total runtime: 5633.745 ms

However if I remove the order clause it is fast for either case:

explain analyze select * from "articles" where article_text @@
plainto_tsquery('pg_catalog.simple', 'in_index')  limit 10;
                                                              QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=55.69..90.22 rows=10 width=1298) (actual
time=7.748..7.853 rows=10 loops=1)
   ->  Bitmap Heap Scan on articles  (cost=55.69..12390.60 rows=3573
width=1298) (actual time=7.735..7.781 rows=10 loops=1)
         Recheck Cond: (article_text @@ '''in_index'''::tsquery)
         ->  Bitmap Index Scan on article_text_gin  (cost=0.00..54.80
rows=3573 width=0) (actual time=5.977..5.977 rows=8910 loops=1)
               Index Cond: (article_text @@ '''in_index'''::tsquery)
 Total runtime: 7.952 ms


explain analyze select * from "articles" where article_text @@
plainto_tsquery('pg_catalog.simple', 'not_in_index')  limit 10;
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=55.69..90.22 rows=10 width=1298) (actual
time=0.083..0.083 rows=0 loops=1)
   ->  Bitmap Heap Scan on articles  (cost=55.69..12390.60 rows=3573
width=1298) (actual time=0.065..0.065 rows=0 loops=1)
         Recheck Cond: (article_text @@ '''not_in_index'''::tsquery)
         ->  Bitmap Index Scan on article_text_gin  (cost=0.00..54.80
rows=3573 width=0) (actual time=0.047..0.047 rows=0 loops=1)
               Index Cond: (article_text @@ '''not_in_index'''::tsquery)
 Total runtime: 0.163 ms

Removing the limit clause has the same effect, although the in index
query is noticably slower:

explain analyze select * from "articles" where article_text @@
plainto_tsquery('pg_catalog.simple', 'in_index') order by id;
                                                              QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=12601.46..12610.40 rows=3573 width=1298) (actual
time=106.347..140.481 rows=8910 loops=1)
   Sort Key: id
   Sort Method: external merge  Disk: 12288kB
   ->  Bitmap Heap Scan on articles  (cost=55.69..12390.60 rows=3573
width=1298) (actual time=5.618..50.329 rows=8910 loops=1)
         Recheck Cond: (article_text @@ '''in_index'''::tsquery)
         ->  Bitmap Index Scan on article_text_gin  (cost=0.00..54.80
rows=3573 width=0) (actual time=4.243..4.243 rows=8910 loops=1)
               Index Cond: (article_text @@ '''in_index'''::tsquery)
 Total runtime: 170.987 ms

explain analyze select * from "articles" where article_text @@
plainto_tsquery('pg_catalog.simple', 'not_in_index') order by id;
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=12601.46..12610.40 rows=3573 width=1298) (actual
time=0.067..0.067 rows=0 loops=1)
   Sort Key: id
   Sort Method: quicksort  Memory: 25kB
   ->  Bitmap Heap Scan on articles  (cost=55.69..12390.60 rows=3573
width=1298) (actual time=0.044..0.044 rows=0 loops=1)
         Recheck Cond: (article_text @@ '''not_in_index'''::tsquery)
         ->  Bitmap Index Scan on article_text_gin  (cost=0.00..54.80
rows=3573 width=0) (actual time=0.026..0.026 rows=0 loops=1)
               Index Cond: (article_text @@ '''not_in_index'''::tsquery)
 Total runtime: 0.148 ms

The little I can deduce is that overall, a bitmap index scan+bitmap
heap scan is overall better for my queries then an index scan. How can
I tell the query planner to do that though?


--
Mohan


pgsql-performance by date:

Previous
From: "acanada"
Date:
Subject: Re: Query taking long time
Next
From: Jeff Janes
Date:
Subject: Re: How can I get the query planner to use a bitmap index scap instead of an index scan ?