Thread: Query planner doesn't use index scan on tsvector GIN index if LIMIT is specifiedQuery planner doesn't use index scan on tsvector GIN index if LIMIT is specified

Some info:
PostgreSQL version: 9.1.2

Table "cache":
Rows count: 3 471 081
Column "tsv" tsvector
Index "cache_tsv" USING gin (tsv)

If i do query like THIS:

SELECT id FROM table WHERE tsv @@ to_tsquery('test:*');
It uses index and returns results immediately:

explain analyze 
'Bitmap Heap Scan on cache  (cost=1441.78..63802.63 rows=19843 width=4) (actual time=29.309..31.518 rows=1358 loops=1)'
'  Recheck Cond: (tsv @@ to_tsquery('test:*'::text))'
'  ->  Bitmap Index Scan on cache_tsv  (cost=0.00..1436.82 rows=19843 width=0) (actual time=28.966..28.966 rows=1559 loops=1)'
'        Index Cond: (tsv @@ to_tsquery('test:*'::text))'
'Total runtime: 31.789 ms'


But the performance problems starts when i do the same query specifying LIMIT. 
SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') limit 20;

By some reason index is not used. 

explain analyze 
'Limit  (cost=0.00..356.23 rows=20 width=4) (actual time=7.984..765.550 rows=20 loops=1)'
'  ->  Seq Scan on cache  (cost=0.00..353429.50 rows=19843 width=4) (actual time=7.982..765.536 rows=20 loops=1)'
'        Filter: (tsv @@ to_tsquery('test:*'::text))'
'Total runtime: 765.620 ms'


Some more debug notes:
1) If i set SET enable_seqscan=off; then query uses indexes correctly
2) Also i notified, if i use: to_tsquery('test') without wildcard search :*, then index is used correctly in both queries, with or without LIMIT

Any ideas how to fix the problem?
Thank you
darklow <darklow@gmail.com> writes:
> But the performance problems starts when i do the same query specifying
> LIMIT.
> *SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') limit 20;*
> By some reason index is not used.

It apparently thinks there are enough matches that it might as well just
seqscan the table and expect to find some matches at random, in less
time than using the index would take.

The estimate seems to be off quite a bit, so maybe raising the stats
target for this column would help.

            regards, tom lane

On 2012-01-10 18:04, Tom Lane wrote:
> darklow<darklow@gmail.com>  writes:
>> But the performance problems starts when i do the same query specifying
>> LIMIT.
>> *SELECT id FROM cache WHERE tsv @@ to_tsquery('test:*') limit 20;*
>> By some reason index is not used.
> It apparently thinks there are enough matches that it might as well just
> seqscan the table and expect to find some matches at random, in less
> time than using the index would take.
>
> The estimate seems to be off quite a bit, so maybe raising the stats
> target for this column would help.
The cost of matching ts_match_vq against a toasted column
is not calculated correctly. This is completely parallel with
http://archives.postgresql.org/pgsql-hackers/2011-11/msg01754.php

Try raising the cost for ts_match_vq(tsvector,tsquery) that help a bit, but
its hard to get the cost high enough.

Raising statistics target  helps too..

--
Jesper