Index Scan/Bitmap Index Scan for queries with FTS and ORDER+LIMIT - Mailing list pgsql-general

From Sergey Konoplev
Subject Index Scan/Bitmap Index Scan for queries with FTS and ORDER+LIMIT
Date
Msg-id c3a7de1f0910210548v6b971f13rf8ccdad9cba1e7c0@mail.gmail.com
Whole thread Raw
Responses Re: Index Scan/Bitmap Index Scan for queries with FTS and ORDER+LIMIT
List pgsql-general
Hi, All

Well what do we have?

8 core, 32 GB, RAID 10, CentOS 5.2, Pg 8.3

A query using tsearch in WHERE block with ORDER and LIMIT:
select * from test_table where obj_tsvector @@
make_tsquery('some_words', 'utf8_russian') and obj_status_did = 1
order by obj_created desc limit 10;

Two indexes - one for FTS, another for simple scan:
"i_test_table__tsvector_1" gist (obj_tsvector) WHERE obj_status_did = 1
"i_test_table__created" btree (obj_created) WHERE obj_status_did = 1

It's clear that using i_test_table__created index is better when
querying words occurring frequently in test_table and
i_test_table__tsvector_1 in opposite case. But with enable_indexscan
turned on planner force to use i_test_table__created that is worth for
querying sparce words.

Is there a way (or workaround) to make it use i_test_table__created
for frequent and i_test_table__tsvector_1 for sparse words? May be
some PG internal that would give us a hint whether the words is
frequent or not?

Here is a test that reflects the problem:

test_db=# set enable_indexscan to on;
SET
test_db=# explain analyze select * from test_table where obj_tsvector
@@ make_tsquery('frequent_words', 'utf8_russian') and obj_status_did =
1 order by obj_created desc limit 10;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2843.83 rows=10 width=621) (actual
time=0.830..6.360 rows=10 loops=1)
   ->  Index Scan Backward using i_test_table__created on test_table
(cost=0.00..2235820.48 rows=7862 width=621) (actual time=0.829..6.355
rows=10 loops=1)
         Filter: (obj_tsvector @@ '''frequent_words'''::tsquery)
 Total runtime: 6.407 ms
(4 rows)

test_db=#
test_db=# set enable_indexscan to off;
SET
test_db=# explain analyze select * from test_table where obj_tsvector
@@ make_tsquery('frequent_words', 'utf8_russian') and obj_status_did =
1 order by obj_created desc limit 10;
ERROR:  canceling statement due to statement timeout

test_db=#
test_db=# set enable_indexscan to on;
SET
test_db=# explain analyze select * from test_table where obj_tsvector
@@ make_tsquery('sparse_words', 'utf8_russian') and obj_status_did = 1
order by obj_created desc limit 10;
ERROR:  canceling statement due to statement timeout

test_db=#
test_db=# set enable_indexscan to off;
SET
test_db=# explain analyze select * from test_table where obj_tsvector
@@ make_tsquery('sparse_words', 'utf8_russian') and obj_status_did = 1
order by obj_created desc limit 10;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=29038.86..29038.89 rows=10 width=621) (actual
time=344.218..344.223 rows=10 loops=1)
   ->  Sort  (cost=29038.86..29058.52 rows=7862 width=621) (actual
time=344.217..344.220 rows=10 loops=1)
         Sort Key: obj_created
         Sort Method:  top-N heapsort  Memory: 43kB
         ->  Bitmap Heap Scan on test_table  (cost=469.20..28868.97
rows=7862 width=621) (actual time=292.314..344.176 rows=21 loops=1)
               Recheck Cond: (obj_status_did = 1)
               Filter: (obj_tsvector @@ '''sparse_words'''::tsquery)
               ->  Bitmap Index Scan on i_test_table__tsvector_1
(cost=0.00..467.23 rows=7862 width=0) (actual time=290.202..290.202
rows=2208 loops=1)
                     Index Cond: (obj_tsvector @@ '''sparse_words'''::tsquery)
 Total runtime: 344.289 ms
(10 rows)

--
Regards,
Sergey Konoplev

pgsql-general by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: How to send multiple parameters to a pl/pgsql function
Next
From: Gaini Rajeshwar
Date:
Subject: How to use Logical Operators in Fulltext Search?