Re: Full text search with ORDER BY performance issue - Mailing list pgsql-performance

From Robert Haas
Subject Re: Full text search with ORDER BY performance issue
Date
Msg-id 603c8f070907290702r2201e63k8c4c9ec7767dc42@mail.gmail.com
Whole thread Raw
In response to Re: Full text search with ORDER BY performance issue  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: Full text search with ORDER BY performance issue
Re: Full text search with ORDER BY performance issue
List pgsql-performance
On Mon, Jul 20, 2009 at 8:12 AM, Oleg Bartunov<oleg@sai.msu.su> wrote:
>> Here's a couple of queries:
>>
>> archive=> explain analyze select * from a where  comment_tsv @@
>> plainto_tsquery('love') order by timestamp desc limit 24 offset 0;
>>
>> QUERY PLAN
>> ----------
>> Limit  (cost=453248.73..453248.79 rows=24 width=281) (actual
>> time=188441.047..188441.148 rows=24 loops=1)
>>  ->  Sort  (cost=453248.73..453882.82 rows=253635 width=281) (actual
>> time=188441.043..188441.079 rows=24 loops=1)
>>        Sort Key: "timestamp"
>>        Sort Method:  top-N heapsort  Memory: 42kB
>>        ->  Bitmap Heap Scan on a  (cost=17782.16..446166.02 rows=253635
>> width=281) (actual time=2198.930..187948.050 rows=256378 loops=1)
>>              Recheck Cond: (comment_tsv @@ plainto_tsquery('love'::text))
>>              ->  Bitmap Index Scan on timestamp_comment_gin
>> (cost=0.00..17718.75 rows=253635 width=0) (actual time=2113.664..2113.664
>> rows=259828 loops=1)
>>                    Index Cond: (comment_tsv @@
>> plainto_tsquery('love'::text))
>> Total runtime: 188442.617 ms
>> (9 rows)
>>
>> archive=> explain analyze select * from a where  comment_tsv @@
>> plainto_tsquery('love') limit 24 offset 0;
>>
>> QUERY PLAN
>> ----------
>> Limit  (cost=0.00..66.34 rows=24 width=281) (actual time=14.632..53.647
>> rows=24 loops=1)
>>  ->  Seq Scan on a  (cost=0.00..701071.49 rows=253635 width=281) (actual
>> time=14.629..53.588 rows=24 loops=1)
>>        Filter: (comment_tsv @@ plainto_tsquery('love'::text))
>> Total runtime: 53.731 ms
>> (4 rows)
>>
>> First one runs painfully slow.
>
> Hmm, everything is already written in explain :) In the first query 253635
> rows should be readed from disk and sorted, while in the
> second query only 24 (random) rows readed from disk, so there is 4
> magnitudes
> difference and in the worst case you should expected time for the 1st query
> about 53*10^4 ms.

If love is an uncommon word, there's no help for queries of this type
being slow unless the GIN index can return the results in order.  But
if love is a common word, then it would be faster to do an index scan
by timestamp on the baserel and then treat comment_tsv @@
plainto_tsquery('love') as a filter condition.  Is this a selectivity
estimation bug?

...Robert

pgsql-performance by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: hyperthreaded cpu still an issue in 8.4?
Next
From: Tom Lane
Date:
Subject: Re: Full text search with ORDER BY performance issue